Robust Integration of Sparse Single-Cell Alternative Splicing and Gene Expression Data with SpliceVI
Abstract
Alternative splicing (AS) and gene expression (GE) are tightly related regulatory processes that are critical for defining cell types and states, yet they are rarely modeled together in single-cell analyses. This hinders a comprehensive understanding of cellular identity. We address this by introducing SpliceVI, a model adapted from MultiVI (Multi-modal Variational Inference) specifically to handle AS. Applied to a large multisample mouse Smart-seq2 dataset (n = 142,315 cells/nuclei), SpliceVI jointly learns from both AS and GE using a partial variational autoencoder that effectively handles the sparsity and missingness of splicing data. We show that SpliceVI’s joint embeddings are more expressive and more informative of biological correlates such as age than those of a GE-only approach (scVI). SpliceVI also uncovers splicing-based differences between neuronal subclusters. This approach reveals the distinct yet synergistic relationship between AS and GE in shaping cellular diversity in the mouse.
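The abstract does not state which likelihood SpliceVI uses, so the following is only a minimal sketch of how a reconstruction loss can be restricted to observed splicing measurements, which is the general idea behind handling missingness in a partial autoencoder. It assumes a binomial likelihood over junction read counts; the function name `masked_binomial_nll` and its arguments are hypothetical and are not taken from SpliceVI or MultiVI.

```python
import numpy as np


def masked_binomial_nll(junction_counts, total_counts, psi_pred, eps=1e-8):
    """Average binomial negative log-likelihood over observed entries only.

    Hypothetical helper for illustration; not the SpliceVI implementation.
    junction_counts: reads supporting a junction, shape (cells, junctions)
    total_counts:    reads covering the splicing event, same shape
    psi_pred:        predicted inclusion probabilities in (0, 1), same shape
    The binomial coefficient is omitted because it does not depend on psi_pred.
    """
    junction_counts = np.asarray(junction_counts, dtype=float)
    total_counts = np.asarray(total_counts, dtype=float)
    # Clip predictions so that the logarithms stay finite.
    psi = np.clip(np.asarray(psi_pred, dtype=float), eps, 1.0 - eps)

    # An entry is treated as observed only if the event has read coverage.
    observed = total_counts > 0

    log_lik = (
        junction_counts * np.log(psi)
        + (total_counts - junction_counts) * np.log1p(-psi)
    )

    n_observed = observed.sum()
    if n_observed == 0:
        # No splicing information at all: contribute nothing to the loss.
        return 0.0
    # Unobserved entries are excluded from both the sum and the average.
    return float(-(log_lik * observed).sum() / n_observed)
```

Under this assumption, predictions at entries without coverage have no effect on the loss, so the model is never penalized for values it could not have observed.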