Fast Optimization of Robust Transcriptomics Embeddings using Probabilistic Inference Autoencoder Networks for multi-Omics
Abstract
Advances in single-cell genomics technologies enable the routine acquisition of atlases with millions of cells. These datasets often include multiple sources of variation, such as donors, sequencing platforms, developmental timepoints, and species. Although these covariates provide new opportunities for discovery, they also present challenges for downstream analyses. Dataset integration, which mitigates unwanted sources of variation, is therefore the starting point for most analyses. However, existing methods struggle to integrate large and complex datasets. To address these limitations, we developed PIANO, a variational autoencoder framework that uses a negative binomial generalized linear model for stronger batch correction and code compilation for up to ten times faster training than existing tools. We first benchmark PIANO against commonly used integration methods on single-species datasets and demonstrate performant integration. We then show that PIANO enables superior analyses of multiple atlases, solving challenging integration tasks across sequencing platforms, developmental timepoints, and species while preserving desired biological signals. Our contributions include a novel, high-performance integration method and recommendations for integration applications.
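For readers unfamiliar with the modelling term used above, the following is a minimal sketch of a negative binomial likelihood whose mean comes from a log-link generalized linear model with a batch covariate. It is not PIANO's implementation: the function names (`nb_log_pmf`, `glm_mean`), the mean/inverse-dispersion parameterization, the library-size offset, and all toy numbers are assumptions chosen for illustration only.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with
    mean mu and inverse dispersion theta (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def glm_mean(library_size, latent, batch_onehot, w_latent, w_batch, intercept):
    """Log-link GLM for the mean (hypothetical form):
    log(mu) = log(library_size) + intercept + latent . w_latent + batch . w_batch
    """
    eta = intercept
    eta += sum(z * w for z, w in zip(latent, w_latent))
    eta += sum(b * w for b, w in zip(batch_onehot, w_batch))
    return library_size * math.exp(eta)


# Toy example: one gene in one cell, two latent dimensions, two batches.
mu = glm_mean(
    library_size=1000.0,
    latent=[0.5, -1.0],
    batch_onehot=[0, 1],      # the cell belongs to the second batch
    w_latent=[0.2, 0.1],
    w_batch=[0.0, 0.3],       # additive batch effect on the log scale
    intercept=-7.0,
)
log_lik = nb_log_pmf(3, mu, theta=2.0)
```

In this sketch the batch term enters the linear predictor additively on the log scale, so a batch effect rescales the expected count multiplicatively. In a variational autoencoder, a decoder of this general kind would supply the reconstruction term of the training objective; how PIANO structures its decoder is not specified in the abstract.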