Three Classes of Confound in Gene-Regulatory-Network Inference: A Systematic Audit and Open-Source Diagnostic Toolkit
Abstract
Background: Inferred gene regulatory networks (GRNs) from single-cell RNA-seq are used to prioritize transcription-factor–target hypotheses, yet edge rankings can be inflated by confounds that are rarely audited systematically. Individual confound classes (technical batch effects, genomic proximity co-expression, and degree-distribution artifacts) have been studied in isolation, but no prior work has conducted a unified audit across all three on the same datasets and inference methods.

Results: We present grn_confound_audit, an open-source Python package that implements a unified three-class confound audit covering technical bias (batch, donor, and assay-method leakage), genomic-structural bias (chromosomal proximity inflation), and topological bias (degree-distribution artifacts). Across three Tabula Sapiens tissues and 12 inference methods, the tool reveals that: (i) donor and batch identity are recoverable from edge features at AUC 0.85–0.97; (ii) prior-heavy methods show 2–3× genomic-proximity enrichment that attenuates to 1.15–1.28× under degree-preserving rewiring; (iii) no individual edge reaches FDR ≤ 0.10 under topological null calibration despite strong global separation (z-scores 12–60). The three confound classes are largely orthogonal, and joint filtering retains only ~28% of candidate edges. Perturbation validation using CRISPR data shows that technically blacklisted edges have 2.7-fold lower perturbation-significant rates.

Conclusions: The grn_confound_audit toolkit enables routine multi-class confound diagnostics for any scored GRN edge list, producing per-edge quality indices, standardised reports, and actionable recommendations. We propose that confound auditing should become a standard component of GRN publications alongside accuracy benchmarks.
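
Result (ii) compares observed genomic-proximity enrichment against a degree-preserving rewiring null. The following is a minimal sketch of that idea, not the grn_confound_audit API: the function names are hypothetical, "proximity" is simplified to "TF and target on the same chromosome" (an assumption; the abstract does not give the package's proximity definition), and the toy edge list is invented purely for illustration.

```python
import random
from collections import Counter


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomise a directed TF->target edge list by swapping targets between
    edge pairs, so every node keeps its in-degree and out-degree.
    Hypothetical helper; assumes the input has no duplicate edges."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_1, new_2 = (a, d), (c, b)
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or new_1 in present or new_2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges


def same_chrom_fraction(edges, chrom):
    """Fraction of edges whose TF and target share a chromosome
    (a simplified stand-in for genomic proximity)."""
    return sum(chrom[tf] == chrom[tg] for tf, tg in edges) / len(edges)


def proximity_enrichment(edges, chrom, n_null=200, seed=0):
    """Observed same-chromosome fraction divided by its mean over
    degree-preserving rewired networks."""
    observed = same_chrom_fraction(edges, chrom)
    null = [
        same_chrom_fraction(
            degree_preserving_rewire(edges, 10 * len(edges), seed + k), chrom
        )
        for k in range(n_null)
    ]
    null_mean = sum(null) / len(null)
    return observed / null_mean if null_mean > 0 else float("inf")


# Toy data, invented for illustration only.
toy_edges = [
    ("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3"),
    ("TF2", "G4"), ("TF3", "G1"), ("TF3", "G4"),
]
toy_chrom = {
    "TF1": "chr1", "G1": "chr1", "G2": "chr1",
    "TF2": "chr2", "G3": "chr2", "G4": "chr2", "TF3": "chr3",
}

rewired = degree_preserving_rewire(toy_edges, n_swaps=100, seed=1)
# Degree sequences are unchanged by construction.
assert Counter(tf for tf, _ in rewired) == Counter(tf for tf, _ in toy_edges)
assert Counter(tg for _, tg in rewired) == Counter(tg for _, tg in toy_edges)
print(proximity_enrichment(toy_edges, toy_chrom, n_null=50))
```

Because the null keeps each gene's degree fixed, any enrichment that survives it cannot be explained by hub genes alone, which is why an enrichment measured against a naive null can shrink once degree is controlled.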