MB-GAN: Microbiome Simulation via Generative Adversarial Network
This article has been reviewed by the following groups
Listed in
- Evaluated articles (GigaScience)
Abstract
Background
Trillions of microbes inhabit the human body and have a profound effect on human health. The recent development of metagenome-wide association studies and other quantitative analysis methods has accelerated the discovery of associations between the human microbiome and diseases. To assess the strengths and limitations of these analytical tools, simulating realistic microbiome datasets is critically important. However, simulating realistic microbiome data is challenging because the correlation structure of such data is difficult to capture with explicit statistical models.
Results
To address the challenge of simulating realistic microbiome data, we designed a novel simulation framework, termed MB-GAN, that uses a generative adversarial network (GAN) and draws on methodological advances from the deep learning community. MB-GAN can automatically learn from given microbial abundances and compute simulated abundances that are indistinguishable from them. In practice, MB-GAN showed the following advantages. First, MB-GAN avoids explicit statistical modeling assumptions and requires only real datasets as inputs. Second, unlike traditional GANs, MB-GAN is easily applicable and can converge efficiently.
Conclusions
By applying MB-GAN to a case-control gut microbiome study of 396 samples, we demonstrated that the simulated data and the original data had similar first-order and second-order properties, including sparsity, diversities, and taxa-taxa correlations. These advantages make MB-GAN suitable for further microbiome methodology development where high-fidelity microbiome data are needed.
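The comparison of real and simulated data described above can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes a samples-by-taxa table of relative abundances, uses the Shannon index as one example of a diversity measure, and uses Pearson correlation as one example of a taxa-taxa correlation. The function names and the toy tables are hypothetical.

```python
import math

def sparsity(abundances):
    """Fraction of zero entries in a samples-by-taxa table."""
    values = [v for row in abundances for v in row]
    return sum(1 for v in values if v == 0) / len(values)

def shannon_index(sample):
    """Shannon diversity of one sample (assumed choice of diversity measure)."""
    total = sum(sample)
    props = [v / total for v in sample if v > 0]
    return -sum(p * math.log(p) for p in props)

def pearson(x, y):
    """Pearson correlation between two taxa across samples.

    Raises ZeroDivisionError if either taxon is constant across samples.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def summarize(abundances):
    """Collect the summary statistics for one table (hypothetical helper)."""
    taxa = list(zip(*abundances))  # transpose: one tuple per taxon
    return {
        "sparsity": sparsity(abundances),
        "mean_shannon": sum(shannon_index(s) for s in abundances) / len(abundances),
        "corr_taxa_0_1": pearson(taxa[0], taxa[1]),
    }

# Toy tables for illustration only; these are not data from the study.
real = [[0.5, 0.5, 0.0], [0.2, 0.3, 0.5], [0.0, 0.6, 0.4]]
simulated = [[0.4, 0.6, 0.0], [0.3, 0.3, 0.4], [0.0, 0.5, 0.5]]

real_summary = summarize(real)
simulated_summary = summarize(simulated)
```

Comparing `real_summary` with `simulated_summary` entry by entry gives a rough sense of whether a simulator preserves these properties. A fuller comparison would look at the whole distribution of each statistic across samples and at all pairs of taxa.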
Article activity feed
Now published in GigaScience doi: 10.1093/gigascience/giab005
Ruichen Rong (1), Shuang Jiang (2), Lin Xu (1), Guanghua Xiao (1, 3, 4), Yang Xie (1, 3, 4), Dajiang J. Liu (5, 6), Qiwei Li (7), Xiaowei Zhan (1, 8)
1. Quantitative Biomedical Research Center, Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, 75390, USA
2. Department of Statistical Sciences, Southern Methodist University, Dallas, TX 75275, USA
3. Harold C. Simmons Cancer Center, University of Texas Southwestern Medical Center, Dallas, Texas, 75390, USA
4. Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, 75390, USA
5. Institute for Personalized Medicine, College of Medicine, Pennsylvania State University, Hershey, Pennsylvania, 17033, USA
6. Division of Biostatistics, Department of Public Health Sciences, College of Medicine, Pennsylvania State University, Hershey, Pennsylvania, 17033, USA
7. Department of Mathematical Sciences, The University of Texas at Dallas, Dallas, Texas, 75080, USA
8. Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, Texas, 75390, USA
For correspondence: Xiaowei.Zhan@UTSouthwestern.edu
A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giab005 ), where the paper and peer reviews are published openly under a CC-BY 4.0 license.
These peer reviews were as follows:
Reviewer 1: http://dx.doi.org/10.5524/REVIEW.102648
Reviewer 2: http://dx.doi.org/10.5524/REVIEW.102649