Efficient Inference of Direct Gene–Gene Associations via High-Dimensional Precision Matrix with Rigorous FDR Control
Abstract
Gene coexpression networks (GCNs) provide a useful representation of coordinated gene activity, but their reconstruction from single-cell RNA sequencing (scRNA-seq) data is challenged by high dimensionality, sparsity and heterogeneous noise. Most existing approaches rely on marginal correlation or machine-learning regressors, which tend to capture indirect associations and typically lack rigorous statistical error control.

We propose dGGAPM (Direct Gene–Gene Associations via Precision Matrix), a precision-matrix-based framework for inferring direct gene–gene associations from scRNA-seq data. dGGAPM employs node-wise sparse regression to estimate partial correlations between genes, and then applies an adaptive thresholding rule derived from high-dimensional asymptotic theory to obtain a sparse network under a pre-specified false discovery rate (FDR). This yields a signed gene coexpression network in which edges correspond to statistically supported conditional dependencies, providing a principled alternative to ad hoc correlation cutoffs.

We evaluate dGGAPM on three time-resolved scRNA-seq datasets covering human and mouse embryonic stem cells and mouse hematopoietic stem cells, using ChIP-seq–derived regulatory networks as proxies for ground truth. Across these benchmarks, dGGAPM achieves competitive or improved area under the ROC curve compared with several widely used methods (scLink, GENIE3, SCODE and GRNBoost2), and recovers biologically coherent modules enriched for processes such as lineage commitment, DNA replication and hematopoiesis. Case studies on key regulators including NANOG, SOX2 and GATA2 further illustrate that dGGAPM can reveal interpretable modules and hub genes. These results indicate that precision-matrix-based inference with explicit FDR control is a useful and robust strategy for reconstructing gene coexpression networks from single-cell transcriptomic data.
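
The two-stage idea described in the abstract (node-wise sparse regression, then FDR-controlled thresholding) can be illustrated with a minimal Python sketch. This is not the dGGAPM implementation. The function names (`lasso_cd`, `partial_correlations`, `bh_edges`), the penalty `lam` and the toy chain-structured data are all hypothetical. The first stage uses the standard identity that the node-wise regression coefficients satisfy b_jk * b_kj = rho_jk^2, with the sign of rho_jk equal to the sign of b_jk. The second stage swaps in a Fisher z-test with the Benjamini-Hochberg procedure as a stand-in for the adaptive threshold, which the abstract does not specify; that stand-in is only heuristic after lasso shrinkage and assumes more cells than genes.

```python
import numpy as np
from math import erfc, sqrt

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()                      # residual y - X @ beta (beta starts at 0)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for k in range(p):
            # Covariance of column k with the residual that excludes column k
            rho = X[:, k] @ resid / n + col_sq[k] * beta[k]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]
            resid += X[:, k] * (beta[k] - new)
            beta[k] = new
    return beta

def partial_correlations(X, lam):
    """Signed partial correlations from node-wise lasso regressions."""
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each gene
    B = np.zeros((p, p))
    for j in range(p):
        others = np.delete(np.arange(p), j)
        B[j, others] = lasso_cd(Z[:, others], Z[:, j], lam)
    prod = B * B.T
    # Keep a pair only when both regressions agree on the sign
    R = np.where(prod > 0, np.sign(B) * np.sqrt(np.abs(prod)), 0.0)
    return np.clip(R, -1.0, 1.0)

def bh_edges(R, n, alpha=0.05):
    """Stand-in FDR step (assumption): Fisher z p-values + Benjamini-Hochberg."""
    p = R.shape[0]
    iu = np.triu_indices(p, k=1)
    r = np.clip(R[iu], -0.999999, 0.999999)
    z = np.arctanh(r) * sqrt(max(n - p - 1, 1))   # conditioning on p - 2 genes
    pvals = np.array([erfc(abs(v) / sqrt(2.0)) for v in z])
    m = pvals.size
    order = np.argsort(pvals)
    passed = pvals[order] <= alpha * np.arange(1, m + 1) / m
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()           # largest rank passing the BH line
        keep[order[:k + 1]] = True
    A = np.zeros_like(R)
    A[iu] = np.where(keep, R[iu], 0.0)
    return A + A.T                                # symmetric signed network

# Toy data (hypothetical): a chain g0 -> g1 -> ... -> g4, so only
# neighbouring genes are conditionally dependent.
rng = np.random.default_rng(0)
n, p = 500, 5
X = np.zeros((n, p))
X[:, 0] = rng.normal(size=n)
for j in range(1, p):
    X[:, j] = 0.6 * X[:, j - 1] + rng.normal(size=n)

R = partial_correlations(X, lam=0.05)
net = bh_edges(R, n=n, alpha=0.05)
print(np.round(net, 2))
```

In this sketch `net` is a symmetric matrix whose nonzero entries are the retained signed edges. In a realistic scRNA-seq setting the number of genes typically exceeds the number of cells, where the Fisher z approximation used here no longer applies and a threshold justified by high-dimensional theory, such as the one the abstract refers to, would be needed instead; the penalty `lam` would also be tuned rather than fixed.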