Ribosomal protein phylogeography offers quantitative insights into the efficacy of genome-resolved surveys of microbial communities
Abstract
The increasing availability of microbial genomes is essential to gain insights into microbial ecology and evolution that can propel biotechnological and biomedical advances. Recent advances in genome recovery have significantly expanded the catalogue of microbial genomes from diverse habitats. However, explaining how well a set of genomes accounts for the diversity in a given environment remains challenging for individual studies and biome-specific databases. Here we present EcoPhylo, a computational workflow to characterize the phylogeography of any gene family through integrated analyses of genomes and metagenomes, and apply this approach to ribosomal proteins to quantify phylogeny-aware genome recovery rates across three biomes. Our findings show that genome recovery rates vary widely across taxa and biomes, and that single amplified genomes, metagenome-assembled genomes, and isolate genomes have non-uniform yet quantifiable representation of environmental microbes. EcoPhylo reveals highly resolved, reference-free, multi-domain phylogenies in conjunction with distribution patterns of individual clades across environments, providing a means to assess genome recovery in individual studies and benchmark biome-level genome collections.