ABaCo: Addressing Heterogeneity Challenges in Metagenomic Data Integration with Adversarial Generative Models
Abstract
The rapid advancement of high-throughput metagenomics has produced extensive and heterogeneous datasets with significant implications for environmental and human health. Integrating these datasets is crucial for understanding the functional roles of microbiomes and the interactions within microbial communities. However, integration remains challenging because of technical heterogeneity between studies and the inherent complexity of these biological systems. To address these challenges, we introduce ABaCo, a generative model that combines a Variational Autoencoder (VAE) with an adversarial discriminator and is specifically designed to handle the unique characteristics of metagenomic data. Our results demonstrate that ABaCo effectively integrates metagenomic data from multiple studies, corrects technical heterogeneity, outperforms existing methods, and preserves taxonomic-level biological signals. We have developed ABaCo as an open-source, fully documented Python library to support metagenomics research in the scientific community.
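To make the idea of pairing a VAE with an adversarial discriminator concrete, the sketch below computes a generic adversarial-VAE objective in plain Python. It is not ABaCo's implementation or API: the abstract does not state the loss, so the function names, the uniform-target confusion term, and the `kl_weight` and `adv_weight` parameters are all illustrative assumptions. The reconstruction term is passed in as a number because the abstract does not specify the likelihood used for metagenomic counts.

```python
import math

# Illustrative sketch only: a generic adversarial-VAE objective,
# NOT the loss used by the ABaCo library (all names are hypothetical).

def gaussian_kl(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I) for one sample."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def discriminator_loss(batch_logits, true_batch):
    """Cross-entropy the discriminator minimizes to recover the batch (study) label."""
    return -math.log(softmax(batch_logits)[true_batch])

def confusion_loss(batch_logits):
    """Cross-entropy against a uniform target; the encoder minimizes it so that
    the latent code carries as little batch information as possible."""
    probs = softmax(batch_logits)
    return -sum(math.log(p) for p in probs) / len(probs)

def encoder_objective(recon_nll, mu, log_var, batch_logits,
                      kl_weight=1.0, adv_weight=1.0):
    """Reconstruction + KL regularizer + adversarial batch-confusion term."""
    return (recon_nll
            + kl_weight * gaussian_kl(mu, log_var)
            + adv_weight * confusion_loss(batch_logits))
```

In a training loop of this kind, the discriminator and the encoder would typically be updated in alternation: the discriminator lowers `discriminator_loss` on the current latent codes, and the encoder and decoder then lower `encoder_objective` with the discriminator held fixed. The confusion term reaches its minimum, log K for K batches, when the discriminator's prediction is uniform.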