TEA-GCN: Constructing Accurate, Species-comparable, and Explainable Gene Regulatory and Functional Networks via Condition-Specific Ensemble Co-expression


Abstract

Gene co-expression networks (GCNs) are widely used for gene function prediction, yet their performance often suffers from low dataset quality. To address this problem, we have developed TEA-GCN (Two-Tier Ensemble Aggregation-GCN), a novel method that constructs high-performing ensemble GCNs by leveraging unsupervised transcriptomic data partitioning and multi-metric co-expression scoring. Benchmarking on over 450,000 public RNA-seq samples across Saccharomyces cerevisiae, Arabidopsis thaliana, Homo sapiens, and nine additional angiosperms, we show that TEA-GCN consistently outperforms the state of the art in predicting gene functions, recovering transcription factor-target relationships, and inferring gene regulatory networks. TEA-GCN uses natural language processing to provide an unprecedented level of explainability by identifying co-expression relationships unique to specific tissues or conditions. Additionally, TEA-GCNs exhibit enhanced cross-species conservation, improving comparative analyses across diverse plants. We provide both an open-source pipeline (https://github.com/pengkenlim/TEA-GCN) and a user-friendly Plant-GCN web resource for ten angiosperm species (https://plantgcn.connectome.tools/), supporting widespread application of this framework in gene function prediction and comparative transcriptomics.
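
The general idea named in the abstract (partition the samples without supervision, score co-expression with more than one metric inside each partition, then aggregate) can be illustrated with a minimal sketch. This is not the TEA-GCN pipeline: the choice of k-means for partitioning, Pearson and Spearman as the two metrics, the maximum as the first-tier aggregate, the mean as the second-tier aggregate, and all function names below are assumptions made for illustration only. The published method and its repository should be consulted for the actual procedure.

```python
import numpy as np


def rank_rows(x):
    # Ordinal ranks per gene (row). Ties are broken by position rather than
    # averaged, which is a simplification compared with a full Spearman rank.
    return np.argsort(np.argsort(x, axis=1), axis=1).astype(float)


def kmeans_labels(samples, k, n_iter=20, seed=0):
    # Plain Lloyd's k-means on a samples x genes matrix (hypothetical helper,
    # standing in for whatever unsupervised partitioning is actually used).
    rng = np.random.default_rng(seed)
    centers = samples[rng.choice(len(samples), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = samples[labels == j].mean(axis=0)
    return labels


def ensemble_coexpression(expr, k=2, min_samples=3):
    # expr is a genes x samples expression matrix.
    # Returns a genes x genes matrix of ensemble co-expression scores.
    labels = kmeans_labels(expr.T, k)
    per_partition = []
    for j in range(k):
        part = expr[:, labels == j]
        if part.shape[1] < min_samples:
            continue  # too few samples for a meaningful correlation
        pearson = np.corrcoef(part)
        spearman = np.corrcoef(rank_rows(part))
        # Tier 1 (assumed): combine metrics within a partition.
        per_partition.append(np.maximum(pearson, spearman))
    if not per_partition:
        raise ValueError("no partition has enough samples")
    # Tier 2 (assumed): combine scores across partitions.
    return np.mean(per_partition, axis=0)
```

Calling `ensemble_coexpression(expr, k=2)` on a genes x samples NumPy array returns a symmetric score matrix. Genes that are constant within a partition produce NaN correlations in this sketch, so real data would need filtering first.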
