A Bioinformatic Pipeline for Consensus Taxonomic Classification of Long-Read Amplicons

Ashley A. Paulsen
Breah LaSarre
Drew Delp
Gwyn A. Beattie
Larry J. Halverson


Abstract

Characterizing community composition is fundamental to understanding microbial community function. Recent advances in Oxford Nanopore Technologies (ONT) long-read sequencing now allow community profiling using full-length gene amplicons, affording better taxonomic resolution than standard short-amplicon Illumina sequencing. However, robust ONT-compatible profiling workflows are lacking. To address this, we created the Amplicon Consensus Taxonomy (ACT) pipeline for classifying long-read amplicons. ACT combines output from three existing pipelines (Emu, Sintax, and LACA) to leverage the strengths of each while offsetting their individual limitations. We also developed the ACT database (ACT-DB), a sequence-similarity-aware reference database that clusters highly similar sequences into multi-taxa groups to reduce overclassification. We benchmarked ACT against Emu and Sintax using a defined simple mock community, simulated datasets, and a complex rhizosphere community supplemented with novel species. ACT generally performed comparably to or better than Emu and Sintax across datasets, and it showed a marked advantage in identifying novel and low-abundance taxa in both simple and complex communities. This advantage resulted in significantly higher species-richness estimates that better reflected those observed in prior Illumina amplicon studies. Furthermore, by clustering ambiguous reference sequences, ACT-DB allowed ACT to resolve reads to meaningful multi-species groups, improving resolution without forcing artificial precision. Together, ACT and ACT-DB form a robust long-read amplicon profiling workflow that confidently identifies known species while reducing overclassification and preserving low-abundance and unknown taxa.
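The abstract does not state how ACT merges the output of the three pipelines. As a rough illustration of what a consensus step could look like, the sketch below assumes each pipeline yields one taxon label per read or cluster and applies a simple majority vote. The function name `consensus_taxon`, the `min_agree` parameter, and the per-read input format are all hypothetical and are not taken from the ACT implementation.

```python
from collections import Counter


def consensus_taxon(calls, min_agree=2):
    """Return the label shared by at least `min_agree` classifiers.

    `calls` maps a classifier name to its taxon label (or None when the
    classifier made no call). This majority-vote rule is an assumption
    made for illustration, not the published ACT logic.
    """
    votes = Counter(label for label in calls.values() if label is not None)
    if not votes:
        return "unresolved"
    label, count = votes.most_common(1)[0]
    return label if count >= min_agree else "unresolved"


# Hypothetical calls for a single read from the three pipelines.
calls = {
    "emu": "Pseudomonas putida",
    "sintax": "Pseudomonas putida",
    "laca": "Pseudomonas fluorescens",
}
print(consensus_taxon(calls))
```

Requiring agreement from at least two sources is one way to keep a confident call while falling back to an unresolved label when the classifiers disagree; the actual pipeline may weight or combine its inputs differently.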

IMPORTANCE

Microbial communities are frequently characterized by amplicon sequencing of marker genes, such as the bacterial 16S rRNA gene and the fungal ITS region. Historically, the standard profiling method has been Illumina sequencing of 200-300 bp amplicons. Improved accuracy of ONT long-read sequencing now makes it possible to sequence amplicons spanning full genes of any size, prompting the need for tools optimized for long amplicons. Here, we describe the ACT bioinformatic pipeline for assigning taxonomy to amplicons of any length. Using full-length 16S amplicon data, we evaluated ACT performance relative to that of two commonly used pipelines. Additionally, we developed a sequence-ambiguity-aware ACT database (ACT-DB) of 16S rRNA sequences to further improve classification accuracy and resolution.
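To make the idea of an ambiguity-aware reference database concrete, here is a minimal sketch of grouping reference taxa whose sequences are nearly identical into multi-taxa groups. It assumes toy, equal-length sequences compared position by position and single-linkage grouping at an arbitrary identity threshold. The names `identity` and `group_taxa`, the threshold value, and the toy sequences are illustrative assumptions; how ACT-DB actually measures similarity and forms clusters is not described in this text.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity requires equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_taxa(refs, threshold=0.85):
    """Group taxa whose reference sequences are >= `threshold` identical.

    Uses union-find for single-linkage grouping. Both the threshold and
    the linkage rule are assumptions chosen for this illustration.
    """
    names = list(refs)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if identity(refs[a], refs[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Toy reference sequences (not real 16S data).
refs = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTAT",  # differs from species_A at one position
    "species_C": "TTTTGGGGCC",
}
print(group_taxa(refs))
```

A read that matches the shared sequence would then be reported as the group containing species_A and species_B rather than being forced to a single species, which is the kind of overclassification the abstract says ACT-DB is meant to reduce.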

Version published to 10.64898/2026.04.29.721641 on bioRxiv
Apr 30, 2026
