Causal single-cell RNA-seq simulation, in silico perturbation, and GRN inference benchmarking using GRouNdGAN-Toolkit
Abstract
Background
Rapid advances in high-throughput single-cell sequencing technologies, coupled with the development of computational methods capable of leveraging large datasets, have led to the emergence of numerous approaches for deciphering regulatory interactions in the form of Gene Regulatory Networks (GRNs). However, in the absence of context-specific gold-standard ground truths, particularly those containing causal interactions, systematically benchmarking GRN inference methods remains a challenge. Thus, we previously developed GRouNdGAN, a causal implicit generative model capable of simulating realistic observational and interventional scRNA-seq data following a user-defined GRN on any biological system of interest. Importantly, we demonstrated that GRouNdGAN generates datasets that bridge the gap between experimentation and simulation for GRN inference benchmarking.
Method
Building upon the GRouNdGAN framework, we developed an extended toolkit that offers additional features, including interactive model visualization, training monitoring, more customizable GRN creation options, synthetic data similarity and GRN inference benchmarking metrics, and an intuitive TF knockout prediction module. Here, we provide a step-by-step procedure for implementing the protocol from start to finish and introduce alternative variations to adapt GRouNdGAN to studies with different experimental setups. GRouNdGAN-Toolkit is publicly available as a Python code repository and a containerized application, and is accompanied by a tutorial website featuring a collection of simulated datasets. Model training time depends largely on the graphics hardware and on the size and density of the input GRN, and typically takes around 75 h on a single GPU. Excluding model training, the protocol typically takes less than 25 min to complete.
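To make the idea of GRN inference benchmarking concrete, the following is a minimal sketch of how inferred TF-to-target edges can be scored against the user-defined ground-truth GRN that the simulated data follow. It is not the toolkit's own API: the function name `edge_metrics`, the edge representation, and the gene names are hypothetical and chosen only for illustration, and the metrics shown (precision, recall, F1) are generic choices rather than the specific metrics the toolkit reports.

```python
from typing import Dict, Iterable, Set, Tuple

# An edge is a (transcription factor, target gene) pair.
# This representation is an assumption made for illustration.
Edge = Tuple[str, str]


def edge_metrics(true_edges: Iterable[Edge],
                 inferred_edges: Iterable[Edge]) -> Dict[str, float]:
    """Score inferred GRN edges against a ground-truth GRN (hypothetical helper)."""
    truth: Set[Edge] = set(true_edges)
    pred: Set[Edge] = set(inferred_edges)

    # True positives: inferred edges that are present in the ground truth.
    tp = len(truth & pred)

    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground-truth GRN imposed during simulation.
truth = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3"), ("TF2", "G4")]
# Hypothetical output of a GRN inference method run on the simulated data.
pred = [("TF1", "G1"), ("TF2", "G3"), ("TF2", "G5")]

metrics = edge_metrics(truth, pred)
print(metrics)
```

Because the simulated data are generated from a known GRN, every inferred edge can be labeled as correct or incorrect, which is what makes this kind of systematic scoring possible.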
Discussion
GRouNdGAN-Toolkit is a versatile simulator with a user-friendly interface that does not assume advanced computational genomics expertise, making it accessible to a wide range of users.