GRNPred: A Multimodal Graph Transformer with Masked Gene Expression Pretraining for Gene Regulatory Network Inference

Abstract

Gene regulatory network (GRN) inference is a fundamental problem in systems biology that aims to identify transcription factor (TF)–target gene interactions from high-dimensional gene expression data. Accurate GRN reconstruction remains challenging because labeled regulatory data are limited, class imbalance is severe, and transcriptional regulation is complex and nonlinear. Here, we introduce GRNPred, a multimodal graph transformer framework for robust GRN inference that integrates gene expression, functional annotations, semantic gene descriptions, regulatory binding motif priors, and gene co-expression network topology. GRNPred follows a two-stage training strategy. In the first stage, self-supervised pretraining, a graph transformer encoder is trained on TF-centered gene co-expression subgraphs using masked gene-expression reconstruction, enabling the model to learn transcriptional context from unlabeled data. In the second stage, the pretrained encoder is fine-tuned for supervised TF–target edge prediction using available regulatory annotations. Transformer-based attention allows GRNPred to capture long-range and context-dependent regulatory interactions that are difficult to model with conventional graph neural networks. Extensive evaluation across 7 benchmark datasets and 3 regulatory network constructions demonstrates that GRNPred consistently outperforms state-of-the-art GRN inference methods, achieving AUROC scores of up to 0.94 and AUPRC scores of up to 0.93, while remaining robust across diverse biological contexts.
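
To make the pretraining objective concrete, the following is a minimal sketch of masked gene-expression reconstruction on a single toy subgraph. It is not the GRNPred implementation: the abstract does not state the mask ratio, the mask value, or the loss, so `mask_frac`, `mask_value`, the mean-squared-error loss, and the function names `mask_expression` and `masked_mse` are all assumptions made for illustration. A mean-of-visible-values predictor stands in for the graph transformer encoder.

```python
import random

def mask_expression(values, mask_frac=0.15, mask_value=0.0, seed=0):
    """Hide a random subset of node expression values (hypothetical helper).

    Returns the masked copy and the sorted indices that were hidden.
    """
    rng = random.Random(seed)
    n = len(values)
    k = max(1, round(mask_frac * n))  # always mask at least one node
    masked_idx = sorted(rng.sample(range(n), k))
    masked = list(values)
    for i in masked_idx:
        masked[i] = mask_value
    return masked, masked_idx

def masked_mse(pred, target, masked_idx):
    """Mean squared error over the masked positions only (assumed loss)."""
    return sum((pred[i] - target[i]) ** 2 for i in masked_idx) / len(masked_idx)

# Toy subgraph: made-up expression values for a TF and its co-expressed neighbours.
expr = [2.1, 0.4, 3.3, 1.8, 0.0, 2.7, 1.1, 0.9]
masked, idx = mask_expression(expr, mask_frac=0.25)

# Stand-in "model": predict the mean of the visible values for every node.
visible = [v for i, v in enumerate(expr) if i not in idx]
pred = [sum(visible) / len(visible)] * len(expr)

loss = masked_mse(pred, expr, idx)
print(f"masked nodes: {idx}, reconstruction loss: {loss:.3f}")
```

Scoring the loss only at masked positions means the encoder is rewarded for inferring hidden expression values from their neighbours rather than for copying its input, which is the sense in which such an objective can learn transcriptional context from unlabeled data.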
