TEA-GCN: Constructing Accurate, Species-comparable, and Explainable Gene Regulatory and Functional Networks via Condition-Specific Ensemble Co-expression
Abstract
Gene co-expression networks (GCNs) are widely used for gene function prediction, yet their performance often suffers from low dataset quality. To address this problem, we have developed TEA-GCN (Two-Tier Ensemble Aggregation-GCN), a novel method that constructs high-performing ensemble GCNs by leveraging unsupervised transcriptomic data partitioning and multi-metric co-expression scoring. Benchmarking on over 450,000 public RNA-seq samples across Saccharomyces cerevisiae, Arabidopsis thaliana, Homo sapiens, and nine additional angiosperms, we show that TEA-GCN consistently outperforms state-of-the-art methods in predicting gene functions, recovering transcription factor-target relationships, and inferring gene regulatory networks. TEA-GCN uses natural language processing to provide an unprecedented level of explainability by identifying co-expression relationships unique to specific tissues or conditions. Additionally, TEA-GCNs exhibit enhanced cross-species conservation, improving comparative analyses across diverse plants. We provide both an open-source pipeline (https://github.com/pengkenlim/TEA-GCN) and a user-friendly Plant-GCN web resource for ten angiosperm species (https://plantgcn.connectome.tools/), supporting widespread application of this framework in gene function prediction and comparative transcriptomics.
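To make the idea of partition-wise, multi-metric ensemble scoring concrete, the following is a minimal sketch in plain Python. It is not the TEA-GCN pipeline: the function names (`pearson`, `spearman`, `ensemble_coexpression`) are hypothetical, the sample partitions are assumed to come from some upstream unsupervised clustering step, and the two aggregation rules (maximum across partitions, then mean across metrics) are illustrative assumptions rather than the authors' stated procedure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Assumption: two metrics are enough to illustrate multi-metric scoring.
METRICS = (pearson, spearman)


def ensemble_coexpression(expr, partitions):
    """Score every gene pair.

    expr: {gene: [expression value per sample]}
    partitions: list of lists of sample indices (assumed to be precomputed
                by an unsupervised clustering step, not shown here)
    """
    pairs = list(combinations(sorted(expr), 2))
    per_metric = []
    for metric in METRICS:
        # Tier 1 (assumed rule): keep the strongest coefficient over partitions.
        per_metric.append({
            (a, b): max(
                metric([expr[a][i] for i in part], [expr[b][i] for i in part])
                for part in partitions
            )
            for a, b in pairs
        })
    # Tier 2 (assumed rule): average the per-metric scores.
    return {p: sum(m[p] for m in per_metric) / len(per_metric) for p in pairs}


# Toy data: genes A and B are co-expressed in samples 0-3 only.
expr = {
    "A": [1, 2, 3, 4, 1, 2, 3, 4],
    "B": [1, 2, 3, 4, 4, 3, 2, 1],
}
partitions = [[0, 1, 2, 3], [4, 5, 6, 7]]

scores = ensemble_coexpression(expr, partitions)
global_r = pearson(expr["A"], expr["B"])
print(scores[("A", "B")], global_r)
```

In this toy case the correlation computed over all eight samples is zero, because the positive relationship in the first partition is cancelled by the negative one in the second, while the partition-wise ensemble score for the pair is 1.0. This illustrates why condition-specific scoring can recover relationships that a single global coefficient misses.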