Publications
AI-powered virtual tissues from spatial proteomics for clinical diagnostics and biomedical discovery
Johann Wenckstern*, Eeshaan Jain*, Kiril Vasilev, Matteo Pariset, Andreas Wicki, Gabriele Gut, Charlotte Bunne
ArXiv Preprint
Spatial proteomics technologies have transformed our understanding of complex tissue architectures by enabling simultaneous analysis of multiple molecular markers and their spatial organization. The high dimensionality of these data, varying marker combinations across experiments and heterogeneous study designs pose unique challenges for computational analysis. Here, we present Virtual Tissues (VirTues), a foundation model framework for biological tissues that operates across the molecular, cellular and tissue scale. VirTues introduces innovations in transformer architecture design, including a novel tokenization scheme that captures both spatial and marker dimensions, and attention mechanisms that scale to high-dimensional multiplex data while maintaining interpretability. Trained on diverse cancer and non-cancer tissue datasets, VirTues demonstrates strong generalization capabilities without task-specific fine-tuning, enabling cross-study analysis and novel marker integration. As a generalist model, VirTues outperforms existing approaches across clinical diagnostics, biological discovery and patient case retrieval tasks, while providing insights into tissue function and disease mechanisms.
Clique Number Estimation via Differentiable Functions of Adjacency Matrix Permutations
Indradyumna Roy*, Eeshaan Jain*, Soumen Chakrabarti, Abir De
ICLR 2025
MxNet is a fully differentiable clique number estimator that learns from distant supervision, without explicit clique demonstrations. We reformulate the maximum clique problem (MCP) as detecting dense submatrices via learned permutations within a nested subgraph matching task.
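The dense-submatrix view of the maximum clique problem can be illustrated with a small brute-force sketch. This is not MxNet and involves no learning: it only shows that a graph has a clique of size k exactly when some node ordering puts an all-ones (off-diagonal) k x k block in the top-left corner of the adjacency matrix. The helper names `adjacency`, `leading_block_is_dense`, and `clique_number_bruteforce` are hypothetical and chosen for this example.

```python
from itertools import combinations


def adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj


def leading_block_is_dense(adj, order, k):
    """True if, after reordering nodes by `order`, the top-left k x k block
    of the adjacency matrix is all ones off the diagonal (a k-clique)."""
    first = order[:k]
    return all(adj[u][v] == 1 for u, v in combinations(first, 2))


def clique_number_bruteforce(adj):
    """Exhaustive clique number; exponential time, for tiny graphs only."""
    n = len(adj)
    best = 0
    for k in range(1, n + 1):
        # Only the set of nodes placed first matters, so enumerate
        # k-subsets instead of all n! permutations.
        if any(leading_block_is_dense(adj, list(nodes), k)
               for nodes in combinations(range(n), k)):
            best = k
        else:
            break  # no k-clique implies no larger clique
    return best


# Triangle 0-1-2 with a path 2-3-4 attached.
adj = adjacency(5, [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)])
print(clique_number_bruteforce(adj))
```

A learned estimator replaces the exhaustive search over node orderings with a differentiable relaxation; the brute-force version above is useful only as a reference on very small graphs.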
Graph Edit Distance Evaluation Datasets: Pitfalls and Mitigation
Eeshaan Jain*, Indradyumna Roy*, Saswat Meher, Soumen Chakrabarti, Abir De
LoG 2024 (Extended Abstract)
Graph Edit Distance (GED) is a powerful framework for modeling both symmetric and asymmetric relationships between graph pairs under various cost settings. Due to the combinatorial intractability of exact GED computation, recent advancements have focused on neural GED estimators that approximate GED by leveraging data distribution characteristics. However, the datasets commonly used to benchmark such neural models exhibit two critical flaws: (1) significant isomorphism bias and (2) reliance on uniform edit costs for GED ground truths. Our datasets eliminate isomorphism leakage and incorporate a range of edit costs, facilitating more accurate assessment of GED methods.
Graph Edit Distance with General Costs Using Neural Set Divergence
Eeshaan Jain*, Indradyumna Roy*, Saswat Meher, Soumen Chakrabarti, Abir De
NeurIPS 2024 and LoG 2024 (Extended Abstract)
GraphEdx is the first neural GED framework to incorporate variable edit costs. It can model both symmetric and asymmetric graph (dis)similarities, allowing more flexible and accurate GED estimation than earlier methods.
Efficient Data Subset Selection to Generalize Training Across Models: Transductive and Inductive Networks
Eeshaan Jain, Tushar Nandy, Gaurav Aggarwal, Ashish V. Tendulkar, Rishabh K Iyer, Abir De
NeurIPS 2023
Existing subset selection methods for efficient learning predominantly employ discrete combinatorial and model-specific approaches, which lack generalizability: for each new model, the algorithm has to be executed from the beginning. We propose `SubSelNet`, a non-adaptive subset selection framework that tackles these problems.