Global Metagenomics Reveals Hidden Protist Diversity


Abstract

Protists, defined as eukaryotes distinct from animals, plants, and fungi, are a polyphyletic group that dominates the eukaryotic tree of life, exhibiting significant phylogenetic diversity and fulfilling critical ecological roles. Historically, research has prioritized protists associated with animals and plants, particularly those of medical significance, thereby overlooking the majority of protist diversity. Conventional molecular methods, such as 18S rRNA gene amplicon sequencing, frequently encounter limitations, including primer binding bias and PCR bias caused by gene length variations, resulting in a biased representation of protistan diversity. Further, most protist lineages are notoriously difficult to cultivate. Here, we analyzed over 30,000 metagenome assemblies and protist single-cell genomes, together with 21 long-amplicon 18S datasets, spanning marine, freshwater, and soil ecosystems. We recovered 157,956 18S rRNA gene sequences (≥800 bp), which clustered into 103,338 operational taxonomic units (OTUs) at 97% sequence identity and 24,438 OTUs at 85% identity. Notably, 69% of 13,238 non-singleton clusters at 85% identity consisted exclusively of environmental sequences, uncovering a wealth of novel, uncultivated, and unclassified protist diversity. A comprehensive taxonomic framework of eukaryotes based on concatenated 18S and 28S rRNA genes that incorporated most novel lineages revealed substantial underrepresentation of Amoebozoa, Discoba, and Rhizaria in reference databases, with many lacking isolate or genome sequence representation. Further, we identified 13 lineages that likely represent deeply branching diversity, including candidates at approximately class- to phylum-level depth, that lack representation in public databases. The corresponding 85% OTUs were primarily affiliated with Excavata, with some branching deeply in the eukaryotic tree. 
Comprehensive analysis of the global distribution of eukaryotes revealed uneven microbial eukaryotic diversity across supergroups and ecosystems, with notable novelty particularly in soil and marine environments. We then examined co-occurrence between protists and prokaryotes, predicting putative symbiotic or predator-prey relationships, particularly among understudied protist groups with bacteria such as Verrucomicrobia and Rickettsiales. Our results extend current reference coverage and provide a global, contig-based framework for protistan diversity and distribution, highlighting major gaps in curated databases and metabarcoding coverage and guiding targeted studies of these organisms’ ecological roles.
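
As an illustration of how a novelty figure such as the 69% of non-singleton clusters composed exclusively of environmental sequences could be tallied, the following minimal Python sketch counts non-singleton clusters and the fraction that contain no reference sequence. It is not the authors' pipeline: the function name `novelty_fraction`, the `(cluster_id, source)` input layout, the labels `"environmental"` and `"reference"`, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def novelty_fraction(assignments):
    """Summarize cluster novelty from (cluster_id, source) pairs.

    `source` is assumed to be either "environmental" or "reference";
    both labels are hypothetical and chosen only for this sketch.
    Returns (non_singleton_count, environmental_only_count, fraction).
    """
    clusters = defaultdict(list)
    for cluster_id, source in assignments:
        clusters[cluster_id].append(source)

    # Keep only clusters with more than one member sequence.
    non_singletons = {cid: srcs for cid, srcs in clusters.items() if len(srcs) > 1}

    # A cluster counts as "environmental only" if no member is a reference sequence.
    env_only = [
        cid for cid, srcs in non_singletons.items()
        if all(s == "environmental" for s in srcs)
    ]

    if not non_singletons:
        return 0, 0, 0.0
    return len(non_singletons), len(env_only), len(env_only) / len(non_singletons)


# Toy data (invented for illustration, not taken from the study).
toy = [
    ("otu1", "environmental"), ("otu1", "environmental"),
    ("otu2", "environmental"), ("otu2", "reference"),
    ("otu3", "environmental"),  # singleton, excluded from the tally
    ("otu4", "environmental"), ("otu4", "environmental"), ("otu4", "environmental"),
]

total, env_only, frac = novelty_fraction(toy)
print(total, env_only, round(frac, 2))  # prints: 3 2 0.67
```

In this toy case three clusters are non-singletons and two of them contain only environmental sequences. Applied to real cluster assignments, the same tally would depend on how reference and environmental sequences are labeled upstream.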
