EcoFoldDB: Protein Structure‐Guided Functional Profiling of Ecologically Relevant Microbial Traits at the Metagenome Scale

Abstract

Microbial communities are fundamental to planetary health and ecosystem processes. High‐throughput metagenomic sequencing has provided unprecedented insights into the structure and function of these communities. However, functional profiling of metagenomes remains constrained by the limited sensitivity of existing sequence homology‐based methods when annotating evolutionarily divergent genes. Protein structure, which is more conserved than sequence and intrinsically tied to molecular function, offers a solution. Capitalising on recent breakthroughs in structural bioinformatics, we present EcoFoldDB, a database of protein structures curated for ecologically relevant microbial traits, and its companion pipeline, EcoFoldDB‐annotate, which leverages Foldseek with the ProstT5 protein language model for rapid structural homology searching directly from sequence data. EcoFoldDB‐annotate outperforms state‐of‐the‐art sequence‐based methods in annotating metagenomic proteins, in both sensitivity and precision. To demonstrate its utility and scalability, we performed structure‐guided functional profiling of 32 million proteins encoded by 8000 high‐quality metagenome‐assembled genomes from the global soil microbiome. EcoFoldDB‐annotate resolved the phylogenetic partitioning of important nitrogen cycling pathways, from taxonomically restricted nitrifiers to more widespread denitrifiers, and identified novel, uncultivated bacterial taxa enriched in plant growth‐promoting traits. We anticipate that EcoFoldDB will enable researchers to extract ecological insights from environmental genomes and metagenomes and accelerate discoveries in microbial ecology.
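
The abstract compares annotation methods by sensitivity and precision. As a rough illustration of how these two metrics are commonly computed for per-protein annotations, the following minimal sketch may help. It is not the authors' benchmarking code: the function name `sensitivity_precision`, the protein identifiers, and the toy labels are all hypothetical, and it assumes one reference label per protein.

```python
from typing import Dict, Optional, Tuple


def sensitivity_precision(
    truth: Dict[str, str],
    predicted: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Return (sensitivity, precision) for per-protein annotations.

    truth:     protein id -> reference function label (assumed known).
    predicted: protein id -> assigned label, or None if left unannotated.
    """
    # Only consider predictions for proteins that have a reference label.
    annotated = {
        pid: label
        for pid, label in predicted.items()
        if label is not None and pid in truth
    }
    correct = sum(1 for pid, label in annotated.items() if truth[pid] == label)

    # Sensitivity: fraction of reference proteins annotated correctly.
    sensitivity = correct / len(truth) if truth else 0.0
    # Precision: fraction of assigned annotations that are correct.
    precision = correct / len(annotated) if annotated else 0.0
    return sensitivity, precision


# Toy data (hypothetical): four proteins, one mis-annotated, one missed.
truth = {"p1": "nifH", "p2": "amoA", "p3": "nirK", "p4": "nosZ"}
predicted = {"p1": "nifH", "p2": "amoA", "p3": "nirS", "p4": None}

sens, prec = sensitivity_precision(truth, predicted)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

In this toy case, a method that annotates more divergent proteins correctly raises sensitivity, while precision drops only if the additional annotations are wrong, which is why the two are reported together.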
