CellSpliceNet for Interpretable Multimodal Modeling of Alternative Splicing Across Neurons in C. elegans
Abstract
Alternative splicing profoundly diversifies the transcriptome and proteome, but decoding its regulatory mechanisms remains a challenge. We introduce CellSpliceNet, an interpretable, transformer-based multimodal deep learning framework designed to predict splicing outcomes across the neurons of C. elegans. By integrating four complementary data modalities (long-range genomic sequence, local regions of interest (ROIs) in the RNA sequence, RNA secondary structure, and gene expression), CellSpliceNet captures the complex interplay of factors that influence splicing decisions within their cellular context. CellSpliceNet uses modality-specific embeddings for RNA sequence at two scales (exon-flanking and whole-gene), for computationally generated RNA structure, and for cell-type-specific gene expression. The gene expression and structure encoders are expressive graph signal processing models built on the graph scattering transform. CellSpliceNet further introduces a novel multimodal multi-head attention mechanism that preserves the integrity of each modality while permitting selective cross-modal interactions; notably, it allows cellular gene expression to inform sequence- and structure-based predictions. Attention-based pooling within each modality highlights biologically critical elements, such as canonical intron–exon splice boundaries and accessible single-stranded RNA loops within exons. We apply CellSpliceNet to a unique multimodal dataset comprising DNA sequencing, RNA sequencing, and alternative splicing frequencies measured in purified neuronal subtypes of C. elegans. CellSpliceNet outperforms several current models, and ablation studies show that each of its modules is essential for optimal performance.
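The abstract names the graph scattering transform as the basis of the gene expression and structure encoders but does not give its configuration. Below is a minimal NumPy sketch of a generic geometric scattering transform, assuming lazy random-walk diffusion wavelets, three scales, and moment-based pooling. The function names (`lazy_random_walk`, `diffusion_wavelets`, `scattering_features`), the scale count, and the moments are hypothetical, illustrative choices, not CellSpliceNet's implementation. A node signal x (for example, one value per nucleotide on an RNA secondary-structure graph whose edges are backbone links and base pairs) is filtered by wavelets Psi_0 = I - P and Psi_j = P^(2^(j-1)) - P^(2^j). It is then passed through an absolute-value nonlinearity and summarized by q-th moments, which makes the resulting features independent of node ordering.

```python
import numpy as np


def lazy_random_walk(adj: np.ndarray) -> np.ndarray:
    """Lazy random-walk matrix P = 0.5 * (I + A D^{-1}) for an undirected graph."""
    degree = adj.sum(axis=0)
    # Isolated nodes (zero degree) get a zero column in A D^{-1} instead of dividing by zero.
    inv_degree = np.where(degree > 0, 1.0 / np.maximum(degree, 1e-12), 0.0)
    n = adj.shape[0]
    # Multiplying by inv_degree[np.newaxis, :] scales column j by 1 / d_j, i.e. A D^{-1}.
    return 0.5 * (np.eye(n) + adj * inv_degree[np.newaxis, :])


def diffusion_wavelets(P: np.ndarray, num_scales: int) -> list:
    """Dyadic diffusion wavelets: Psi_0 = I - P, Psi_j = P^(2^(j-1)) - P^(2^j)."""
    n = P.shape[0]
    wavelets = [np.eye(n) - P]
    p_prev = P  # P^(2^0)
    for _ in range(1, num_scales):
        p_next = p_prev @ p_prev  # squaring doubles the diffusion time
        wavelets.append(p_prev - p_next)
        p_prev = p_next
    return wavelets


def scattering_features(adj: np.ndarray, signal: np.ndarray,
                        num_scales: int = 3, moments=(1, 2, 3, 4)) -> np.ndarray:
    """Zeroth-, first- and second-order scattering moments of a node signal."""
    P = lazy_random_walk(adj)
    wavelets = diffusion_wavelets(P, num_scales)

    def pooled(v: np.ndarray) -> list:
        # Sum over nodes of |v|^q: invariant to how the nodes are numbered.
        return [float(np.sum(np.abs(v) ** q)) for q in moments]

    feats = pooled(signal)  # zeroth order: moments of the raw signal
    first = [np.abs(W @ signal) for W in wavelets]
    for u in first:  # first order: |Psi_j x|
        feats.extend(pooled(u))
    for j, u in enumerate(first):  # second order: |Psi_j' |Psi_j x|| with j' > j
        for jp in range(j + 1, num_scales):
            feats.extend(pooled(np.abs(wavelets[jp] @ u)))
    return np.array(feats)


if __name__ == "__main__":
    # Toy RNA structure graph: 6-nt backbone path plus one base pair between nt 0 and nt 5.
    n = 6
    adj = np.zeros((n, n))
    for i in range(n - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    adj[0, 5] = adj[5, 0] = 1.0
    x = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 1.0])  # hypothetical per-nucleotide signal
    print(scattering_features(adj, x).shape)  # (28,)
```

With three scales and four moments, the feature vector has 4 zeroth-order, 12 first-order, and 12 second-order entries. Because every feature is a sum over nodes, relabeling the nucleotides (or genes, for an expression graph) leaves the output unchanged. This property makes scattering a natural fixed, non-learned front end for graph-structured modalities, although the encoders described above may combine it with learned layers in ways the abstract does not specify.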