SignifiKANTE: Efficient P-value computation for gene regulatory networks
Abstract
Gene regulatory networks (GRNs) are graph-based representations of regulatory relationships between transcription factors and target genes. Various tools exist to infer GRNs from gene expression data, but since this task is computationally intensive, statistical significance estimates are often omitted. While permutation-based empirical P-value computation methods are relatively straightforward to implement, they are prohibitively expensive when applied to popular regression-based GRN inference methods and realistically sized datasets. To address this bottleneck, we developed SignifiKANTE. SignifiKANTE is based on the key insight that the background count distributions of groups of target genes may be highly similar, even if their expression vectors show distinct behavior. Relying on this insight, SignifiKANTE employs gene clustering based on the 1-Wasserstein distance to create a small, constant number of background distributions, which enables the simultaneous computation of approximate empirical P-values for multiple target genes. This reduces runtime by orders of magnitude (for some datasets, from several weeks to a few hours) without compromising the faithfulness of the obtained P-values. SignifiKANTE extends the popular GRN inference package Arboreto and is available as a Python package on GitHub (https://github.com/bionetslab/SignifiKANTE) and PyPI (https://pypi.org/project/signifikante/).
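The sketch below illustrates the idea of sharing permutation backgrounds across clustered target genes. It is not SignifiKANTE's implementation or API, and several parts are assumptions. It uses |Pearson r| between a single TF and each target as a stand-in for the regression-based importance scores that Arboreto produces. The helper names (`abs_pearson`, `wasserstein1`, `pick_representatives`, `empirical_pvalues`) and the farthest-point choice of k representative genes are hypothetical. It also computes the 1-Wasserstein distance only between equal-size samples of expression values. The reasoning behind it: shuffling a target's expression vector leaves its multiset of values unchanged. Genes with identical value distributions therefore have identical permutation nulls, and genes whose distributions are close in W1 plausibly have similar ones. On that basis, one null per cluster can stand in for one null per gene.

```python
import random
from math import sqrt


def abs_pearson(x, y):
    """Stand-in importance score (assumption): |Pearson r| between a TF and a target."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / sqrt(sxx * syy)


def wasserstein1(u, v):
    """1-Wasserstein distance between two equal-size 1-D samples with uniform weights."""
    if len(u) != len(v):
        raise ValueError("this sketch only handles equal-size samples")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)


def pick_representatives(targets, k):
    """Hypothetical farthest-point choice of k representative genes under W1."""
    names = list(targets)
    reps = [names[0]]
    while len(reps) < min(k, len(names)):
        candidates = [g for g in names if g not in reps]
        farthest = max(
            candidates,
            key=lambda g: min(wasserstein1(targets[g], targets[r]) for r in reps),
        )
        reps.append(farthest)
    return reps


def empirical_pvalues(tf, targets, k=2, n_perm=999, seed=0):
    """Approximate empirical P-values using one permutation null per cluster."""
    rng = random.Random(seed)
    reps = pick_representatives(targets, k)

    # Build k null distributions (not one per target) by shuffling each
    # representative's expression values and rescoring against the TF.
    null = {}
    for r in reps:
        values = list(targets[r])
        scores = []
        for _ in range(n_perm):
            rng.shuffle(values)
            scores.append(abs_pearson(tf, values))
        null[r] = scores

    pvals, assignment = {}, {}
    for gene, y in targets.items():
        # Each target borrows the null of its nearest representative under W1.
        rep = min(reps, key=lambda r: wasserstein1(y, targets[r]))
        observed = abs_pearson(tf, y)
        exceed = sum(s >= observed for s in null[rep])
        pvals[gene] = (1 + exceed) / (1 + n_perm)  # add-one empirical P-value
        assignment[gene] = rep
    return pvals, assignment


# Toy data (invented for illustration): geneB holds geneA's values in another order.
TF = [0.1, 0.4, 0.9, 1.3, 2.0, 2.2, 3.1, 3.5, 4.0, 4.8]
TARGETS = {
    "geneA": [0.2, 0.5, 1.0, 1.2, 2.1, 2.4, 3.0, 3.6, 4.1, 4.7],  # tracks the TF
    "geneB": [4.7, 0.5, 3.0, 1.2, 2.4, 1.0, 4.1, 0.2, 3.6, 2.1],  # same values, reordered
    "geneC": [10.0, 10.5, 9.8, 10.2, 10.1, 9.9, 10.3, 10.4, 9.7, 10.6],
}

if __name__ == "__main__":
    pvals, assignment = empirical_pvalues(TF, TARGETS, k=2, n_perm=999, seed=1)
    for gene in sorted(pvals):
        print(gene, assignment[gene], round(pvals[gene], 4))
```

In this toy run, geneA and geneB are at W1 distance 0 and share one background. geneA, which tracks the TF, receives a small P-value. geneB, with the same values in a scrambled order, receives a large one. Only two nulls are built for three genes. The runtime saving comes from the number of nulls staying at k while the number of target genes grows.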