DeepBioSim: Efficient and Versatile Methods for Microbiome Data Simulation with Minimal Statistical Assumptions
Abstract
Background
The human microbiome profoundly influences health and disease. Robust computational and statistical tools for identifying causal microbe–disease links are therefore critical to uncovering the mechanistic basis of these associations. Yet benchmarking such tools remains difficult: microbiome datasets are sparse, high-dimensional, and contain complex dependencies, and no gold-standard reference set exists. Realistic simulated data with embedded ground truth are essential for fair evaluation of analytical tools. Current simulators often impose strong assumptions, require hard-to-obtain auxiliary information, or fail to scale to large, high-dimensional datasets.
Results
We introduce DeepBioSim, a DEEP-learning framework for BIOlogical SIMulation of microbiome data. DeepBioSim uses variational autoencoders (VAEs) to generate realistic microbiome datasets by sampling directly from a latent distribution learned from metagenomic or metatranscriptomic count data.
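To illustrate what "sampling from the latent distribution" means for count data, here is a minimal, dependency-free sketch of the generation step of a VAE-style count simulator. It is not DeepBioSim's implementation. It assumes a Poisson observation model and a single linear decoder layer (log-rate = W z + b). The function names (`reparameterize`, `decode_rates`, `poisson`, `simulate_counts`) and the decoder weights are hypothetical toy values, not trained parameters. A latent vector z is drawn as z = mu + sigma * eps with eps ~ N(0, I). With mu = 0 and log_var = 0 this is the standard-normal prior. The vector is then decoded into per-taxon rates, and a count is drawn for each taxon.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps, eps ~ N(0, I) (the VAE reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def decode_rates(z, weights, bias, max_log_rate=6.0):
    """Hypothetical linear decoder: rate_j = exp(W_j . z + b_j).

    Log-rates are clamped so exp() stays small enough for the simple
    Poisson sampler below (Knuth's method degrades for very large rates).
    """
    rates = []
    for row, b in zip(weights, bias):
        log_rate = sum(w * zi for w, zi in zip(row, z)) + b
        rates.append(math.exp(min(log_rate, max_log_rate)))
    return rates


def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for the moderate rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n_samples, weights, bias, mu=None, log_var=None, seed=0):
    """Generate synthetic count profiles (one list of per-taxon counts per sample).

    By default z is drawn from the N(0, I) prior. Passing an encoder's
    mu and log_var instead samples near one observed profile.
    """
    rng = random.Random(seed)
    latent_dim = len(weights[0])
    mu = mu if mu is not None else [0.0] * latent_dim
    log_var = log_var if log_var is not None else [0.0] * latent_dim
    samples = []
    for _ in range(n_samples):
        z = reparameterize(mu, log_var, rng)
        rates = decode_rates(z, weights, bias)
        samples.append([poisson(r, rng) for r in rates])
    return samples


if __name__ == "__main__":
    # Toy decoder: 4 taxa, 2 latent dimensions (hypothetical values).
    W = [[0.8, -0.2], [0.1, 0.5], [-0.6, 0.3], [0.0, -1.0]]
    b = [1.5, 0.5, -1.0, 2.0]
    for row in simulate_counts(n_samples=5, weights=W, bias=b, seed=42):
        print(row)
```

In a real simulator the decoder would be a trained neural network fitted to an observed taxa-by-sample count matrix. A count likelihood suited to sparse, overdispersed microbiome data (for example, a negative binomial or zero-inflated model) might replace the Poisson assumed here. The sampling logic, drawing z and then decoding it, would be unchanged.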
Conclusions
The approach is fast, accurate, and scalable, generating highly realistic synthetic microbiome datasets without extensive hyperparameter tuning or phylogenetic input. Tests on human RNA-seq data confirm the versatility of DeepBioSim, showing that it can also reliably simulate single-organism omics profiles.