DeepBioSim: Efficient and Versatile Methods for Microbiome Data Simulation with Minimal Statistical Assumptions

Abstract

Background

The human microbiome profoundly influences health and disease. Robust computational and statistical tools for identifying causal microbe–disease links are therefore critical to uncovering the mechanistic basis of these associations. Yet benchmarking such tools remains difficult: microbiome datasets are sparse, high-dimensional, and contain complex dependencies, and no gold-standard reference set exists. Realistic simulated data with embedded ground truth are essential for fair evaluation of analytical tools. Current simulators often impose strong assumptions, require hard-to-obtain auxiliary information, or fail to scale to large, high-dimensional datasets.

Results

We introduce DeepBioSim, a DEEP-learning framework for BIOlogical SIMulation of microbiome data. DeepBioSim uses variational autoencoders (VAEs) to generate realistic microbiome datasets by sampling directly from the latent distribution of metagenomic or metatranscriptomic count data.
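The generative step of a VAE-based simulator can be illustrated with a short sketch. It draws latent codes from a standard normal prior, decodes them into per-taxon proportions, and samples counts. The abstract does not describe DeepBioSim's decoder architecture or count likelihood, so this example makes several assumptions. It uses a single linear layer with a softmax, a negative binomial observation model (implemented as a gamma-Poisson mixture), and random weights as placeholders for a trained decoder. The function name `sample_counts` and all parameter values are hypothetical.

```python
import numpy as np

def sample_counts(decoder_w, decoder_b, n_samples, library_size, dispersion, rng):
    """Hypothetical sketch: draw synthetic count profiles from a VAE-style decoder.

    Assumptions (not taken from DeepBioSim): a linear + softmax decoder and a
    negative binomial likelihood with a shared dispersion parameter.
    """
    latent_dim = decoder_w.shape[0]
    # 1. Sample latent codes from the standard normal prior p(z) = N(0, I).
    z = rng.standard_normal((n_samples, latent_dim))
    # 2. Decode each latent code into per-taxon proportions (numerically stable softmax).
    logits = z @ decoder_w + decoder_b
    logits -= logits.max(axis=1, keepdims=True)
    props = np.exp(logits)
    props /= props.sum(axis=1, keepdims=True)
    # 3. Scale proportions to expected counts for a fixed sequencing depth.
    mu = props * library_size
    # 4. Negative binomial via gamma-Poisson mixture: mean mu, variance mu + mu^2 / r.
    lam = rng.gamma(shape=dispersion, scale=mu / dispersion)
    return rng.poisson(lam)

# Usage: random weights stand in for a decoder trained on real count data.
rng = np.random.default_rng(0)
n_taxa, latent_dim = 50, 8
W = rng.normal(scale=2.0, size=(latent_dim, n_taxa))  # placeholder weights
b = rng.normal(size=n_taxa)                           # placeholder biases
counts = sample_counts(W, b, n_samples=100, library_size=10_000,
                       dispersion=0.5, rng=rng)
print(counts.shape)  # (100, 50)
```

A small dispersion value makes the counts overdispersed and zero-heavy. This loosely mimics the sparsity of microbiome tables. In a trained model, the decoder weights would encode the dependencies among taxa learned from real data.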

Conclusions

The approach is fast, accurate, and scalable, generating highly realistic synthetic microbiome datasets without extensive hyperparameter tuning or phylogenetic input. Tests on human RNA-seq data confirm the versatility of DeepBioSim, showing that it can also reliably simulate single-organism omics profiles.

Article activity feed