An Enhanced Variant-Aware Deep Learning Model for Individual Gene Expression Prediction
Abstract
Accurate prediction of gene expression from individual whole-genome sequences is critical for understanding disease mechanisms and advancing precision medicine. Current methods, however, struggle to account for individual-specific genetic variation and to integrate detailed sequence context. To address this, we introduce GenomicVariExpress (GVE), a novel deep learning model that builds on a pre-trained sequence encoder and incorporates an Enhanced Variant Integration Module (EVIM). EVIM explicitly encodes and fuses multi-dimensional variant features, such as variant type, allele frequency, and predicted functional impact, enabling GVE to capture how individual variants modulate gene expression. We evaluate GVE on paired whole-genome and RNA sequencing data from the GTEx Whole Blood cohort. In our experiments, GVE consistently outperforms state-of-the-art baselines, and an ablation study confirms that EVIM is critical to this improvement. Further analyses indicate that GVE offers improved biological interpretability and retains its advantage across multiple tissues and for genes influenced by rare variants. GVE represents a significant step towards accurate, individual-level gene expression prediction, offering a powerful tool for genomic function research and personalized healthcare applications.
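To make the idea of encoding and fusing variant features concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract names only the feature kinds (type, allele frequency, predicted functional impact); everything else here is an assumption made for illustration: the `Variant` class, the three variant categories, the [0, 1] impact scale, the linear projection, and the choice of additive fusion at the variant position. A real model would learn the projection weights and would likely use a richer fusion mechanism.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical variant categories; the abstract only states that "type" is encoded.
VARIANT_TYPES = ("SNV", "insertion", "deletion")


@dataclass
class Variant:
    position: int       # 0-based index into the sequence window (assumed)
    vtype: str          # one of VARIANT_TYPES
    allele_freq: float  # population allele frequency in [0, 1]
    impact: float       # predicted functional impact, assumed scaled to [0, 1]


def encode_variant(v: Variant) -> List[float]:
    """One-hot variant type followed by allele frequency and impact score."""
    one_hot = [1.0 if v.vtype == t else 0.0 for t in VARIANT_TYPES]
    return one_hot + [v.allele_freq, v.impact]


def project(features: List[float], weights: List[List[float]]) -> List[float]:
    """Linear projection; `weights` has one row per output dimension."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def fuse(seq_embeddings: List[List[float]],
         variants: List[Variant],
         weights: List[List[float]]) -> List[List[float]]:
    """Add each projected variant vector to the embedding at its position.

    Additive fusion is an assumption chosen for simplicity; the input
    embeddings are copied, not modified in place.
    """
    fused = [list(row) for row in seq_embeddings]
    for v in variants:
        delta = project(encode_variant(v), weights)
        fused[v.position] = [e + d for e, d in zip(fused[v.position], delta)]
    return fused


# Toy example: three positions with 2-dimensional embeddings standing in for
# the output of a pre-trained sequence encoder, and one SNV at position 1.
seq = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0],  # output dim 0 responds to the SNV indicator
    [0.0, 0.0, 0.0, 1.0, 1.0],  # output dim 1 sums allele frequency and impact
]
variants = [Variant(position=1, vtype="SNV", allele_freq=0.25, impact=0.5)]
fused = fuse(seq, variants, weights)
```

In this toy setup only the embedding at the variant's position changes, which mirrors the intuition that an individual's variants perturb an otherwise shared reference representation. Positions without variants pass through unchanged.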