ASVNet: Inferring microbes from 16S rRNA amplicon sequencing data
Abstract
Amplicon sequencing of the variable regions of the 16S rRNA gene is one of the most common techniques in microbiome research. A crucial step in analysing these data is to cluster raw sequences into biologically interpretable groups that approximate the actual organisms in a community, which reduces data complexity, facilitates diversity analyses, and enables clearer ecological and functional interpretations. Most approaches rely on fixed sequence similarity thresholds, typically operational taxonomic units (OTUs) at 97% or amplicon sequence variants (ASVs) at 99-100%. However, finding the correct correspondence between amplicons and the original organisms in a complex community remains challenging, primarily because 16S rRNA gene copy numbers vary within genomes and amplicon sequences are highly similar among species. While the stringent threshold in ASVs prevents merging sequences from different species, it often splits sequences from the same genome into multiple ASVs, inflating species counts. Conversely, the 97% similarity threshold in OTUs frequently collapses multiple distinct organisms into a single unit regardless of their functional or ecological (dis)similarity, thereby underestimating true diversity. To address these limitations, we introduce ASVNet, an algorithmic framework that leverages amplicon sequence similarity and co-abundance networks to better reflect the underlying organisms in a community. ASVNet builds on the principle that amplicon counts from the same organism co-vary across samples, and it therefore clusters these co-occurring, similar sequences into empirical OTUs (eOTUs). Comparative analyses in synthetic and natural root-associated microbial communities with genome-sequenced isolates show that ASVNet-derived eOTUs more accurately represent the original microbes and possess greater biological relevance than OTUs defined at any fixed threshold. By providing a biologically grounded framework for interpreting amplicon sequence data, ASVNet extends our ability to extract meaningful insights from complex microbiome datasets.
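The principle stated in the abstract, that sequences from the same organism are both similar and co-abundant across samples, can be illustrated with a minimal sketch. This is not the ASVNet implementation: the function names (`identity`, `cluster_eotus`), the thresholds (`min_identity`, `min_corr`), the use of Pearson correlation on log-transformed counts, and the grouping by connected components are all assumptions made here for illustration, and the sequences are assumed to be pre-aligned to equal length.

```python
import numpy as np


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_eotus(seqs, counts, min_identity=0.97, min_corr=0.8):
    """Group ASVs that are similar in sequence AND co-abundant across samples.

    seqs:   list of n aligned ASV sequences (hypothetical input format)
    counts: (n, n_samples) array of ASV counts
    Returns a list of n integer cluster labels, numbered in order of appearance.
    """
    counts = np.asarray(counts, dtype=float)
    n = len(seqs)
    # Pearson correlation between ASVs on log-transformed counts (an assumption;
    # an ASV with constant counts yields NaN, which never passes the threshold).
    corr = np.corrcoef(np.log1p(counts))

    # Union-find over ASVs: an edge joins two ASVs only if both criteria hold.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if identity(seqs[i], seqs[j]) >= min_identity and corr[i, j] >= min_corr:
                parent[find(j)] = find(i)

    # Relabel the component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Toy data: ASV 0 and 1 are similar and co-vary; ASV 2 is similar but
# anti-correlated; ASV 3 co-varies with ASV 0 but is dissimilar in sequence.
seqs = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAA", "TTTTTTTTTT"]
counts = [
    [10, 20, 40, 80],
    [5, 10, 20, 40],
    [80, 40, 20, 10],
    [10, 20, 40, 80],
]
labels = cluster_eotus(seqs, counts, min_identity=0.9, min_corr=0.8)
print(labels)  # [0, 0, 1, 2]
```

In this toy example only the first two sequences are merged: the third fails the co-abundance criterion and the fourth fails the similarity criterion, which is the behaviour a fixed similarity threshold alone cannot reproduce.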