GRNITE: Gene Regulatory Network Inference with Text Embeddings
Abstract
Gene regulatory networks (GRNs) capture the complex regulatory relationships that govern gene expression in cells. Inference of GRNs from single-cell RNA-seq (scRNA-seq) data has been an active research topic for the past several years. However, despite improvements in data quality, GRN inference remains challenging, and the performance of many approaches varies with the organism and cell type. To improve the quality of GRN inference and enable more comprehensive exploratory analyses of GRNs across various phenotypes of interest, we developed a two-stage meta-method called GRNITE. In the first stage, GRNITE leverages LLM-based embeddings of plain-text gene descriptions to create a prior gene interaction graph, which is then optimized with a graph neural network (GNN) to obtain a “universal” biological prior for GRN inference. In the second stage, GRNITE uses a GNN to incorporate into this prior the information from a GRN inferred from scRNA-seq data with any baseline inference method. The result of this two-stage approach is a near-universal improvement in the AUROC and recall of the evaluated methods, with minor trade-offs in precision. Furthermore, GRNITE is a lightweight meta-method that adds minimal compute time on top of the original GRN inference. GRNITE and our pre-trained universal prior GRN are available on GitHub: https://github.com/aliaaz99/GRNITE.
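As a rough illustration of the first stage, the sketch below builds a prior gene graph from embedding vectors by linking each gene to its most cosine-similar neighbours. This is an assumption made for illustration only: the abstract does not state how GRNITE constructs its prior graph, the function name `cosine_knn_prior` and the parameter `k` are hypothetical, and the toy vectors stand in for real LLM embeddings of gene descriptions. The GNN optimization and the second stage are not shown.

```python
import numpy as np


def cosine_knn_prior(embeddings: np.ndarray, k: int = 2) -> np.ndarray:
    """Return a symmetric 0/1 adjacency matrix that links each gene
    to its k most cosine-similar genes (hypothetical helper)."""
    # Normalise rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T

    # Exclude self-similarity so a gene is never its own neighbour.
    np.fill_diagonal(sim, -np.inf)

    n = sim.shape[0]
    neighbours = np.argsort(-sim, axis=1)[:, :k]  # top-k per row
    adj = np.zeros((n, n), dtype=int)
    adj[np.repeat(np.arange(n), k), neighbours.ravel()] = 1

    # Symmetrise: keep an edge if either endpoint selected the other.
    return np.maximum(adj, adj.T)


# Toy stand-ins for text embeddings of four gene descriptions (assumed data).
toy_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
])
prior = cosine_knn_prior(toy_embeddings, k=1)
print(prior)
```

In this toy case the first two genes and the last two genes have nearly parallel vectors, so each pair ends up connected. A graph of this kind would only be a starting point that a downstream model could then refine.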