Biologically Guided Variational Inference for Interpretable Multimodal Single-Cell Integration and Mechanistic Discovery
Abstract
Multi-omics technologies enable detailed characterization of cell types and states across multiple omics layers, helping to identify features that differentiate biological conditions, such as chemical or CRISPR-based perturbations. However, current tools that apply variational inference to single-cell datasets, including methods for paired and mosaic integration, transfer learning, and modality imputation, typically act as black boxes. This lack of interpretability makes it difficult to evaluate whether biological variation is preserved, which can compromise downstream analyses. Here, we introduce NetworkVI, a sparse deep generative model designed for the integration and interpretation of multimodal single-cell data. NetworkVI uses biological prior knowledge as an inductive bias: it relies on gene-gene interactions inferred from topologically associated domains and on structured ontologies such as the Gene Ontology to aggregate gene embeddings into cell embeddings, which enhances interpretability at the gene and subcellular levels. While achieving state-of-the-art performance in data integration, modality imputation, and cell label transfer via query-to-reference mapping on benchmarks across bimodal and trimodal datasets, NetworkVI additionally excels at providing biologically meaningful modality- and cell type-specific interpretations. NetworkVI aids researchers in identifying associations between genes and biological processes and uncovers immune evasion mechanisms in a Perturb-CITE-seq dataset of melanoma cells. NetworkVI will support researchers in interpreting cellular disease mechanisms, guiding biomarker discovery, and ultimately aiding the development of targeted therapies in large-scale single-cell multimodal atlases.
NetworkVI is available at http://github.com/LArnoldt/networkVI.
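The idea of using an ontology as an inductive bias for a sparse network can be illustrated with a small sketch. This is not the NetworkVI implementation: the gene names, term names, memberships, and the helpers `build_mask` and `masked_forward` are hypothetical, and the sketch assumes only that sparsity is imposed by zeroing weights between genes and ontology terms they are not annotated to. It shows a single masked layer that aggregates per-gene inputs into per-term activities, so that each term node depends only on its member genes.

```python
import numpy as np

# Hypothetical toy annotation (not real Gene Ontology data):
# which genes are annotated to which ontology term.
GENES = ["gene_1", "gene_2", "gene_3", "gene_4"]
TERMS = {
    "term_A": ["gene_1", "gene_2"],
    "term_B": ["gene_2", "gene_3", "gene_4"],
}

def build_mask(genes, terms):
    """Binary (n_genes, n_terms) matrix; 1 where a gene is annotated to a term."""
    index = {g: i for i, g in enumerate(genes)}
    mask = np.zeros((len(genes), len(terms)))
    for j, members in enumerate(terms.values()):
        for g in members:
            mask[index[g], j] = 1.0
    return mask

def masked_forward(x, weights, mask):
    """Sparse linear layer: weights outside the mask are zeroed before use."""
    return np.tanh(x @ (weights * mask))

rng = np.random.default_rng(0)
mask = build_mask(GENES, TERMS)
weights = rng.normal(size=mask.shape)       # dense parameters, sparsified by the mask
cells = rng.normal(size=(3, len(GENES)))    # toy input: 3 cells x 4 genes
term_activity = masked_forward(cells, weights, mask)  # 3 cells x 2 terms
```

Because `gene_1` is not annotated to `term_B` in this toy setup, changing its input cannot change the `term_B` activity. That property is what makes each hidden unit attributable to a named biological term rather than to an arbitrary mixture of genes.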