Multifaceted Representation of Genes via Deep Learning of Gene Expression Networks

Abstract

Accurate predictive modeling of human gene relationships would fundamentally transform our ability to uncover the molecular mechanisms that underpin key biological processes and disease development. Recent studies have employed advanced AI techniques to model the complexities of gene networks using large gene expression datasets [1–11]. However, the extent and nature of the biological information these models can learn is not fully understood. Likewise, the potential for improving model performance through alternative data types, model architectures, and methodologies remains underexplored. Here, we developed GeneRAIN models by training on a large dataset of 410K human bulk RNA-seq samples, rather than the single-cell RNA-seq datasets used by most previous studies. We showed that although the models were trained only on gene expression data, they learned a wide range of biological information well beyond expression. We introduced GeneRAIN-vec, a state-of-the-art, multifaceted vectorized representation of genes. Further, we showcased the capabilities and broad applicability of our approach by making 62.5M predictions, equating to 4,797 biological attribute predictions for each of the 13,030 long non-coding RNAs. These achievements stem from various methodological innovations, including experimenting with multiple model architectures and a new ‘Binning-By-Gene’ normalization method. Comprehensive evaluation of our models demonstrated that they significantly outperformed current state-of-the-art models [3,12]. This study improves our understanding of the capabilities of Transformer and self-supervised deep learning when applied to extensive expression data. Our methodological advancements offer crucial insights into refining these techniques, which are set to significantly advance our understanding and exploration of biology.