Discovering the unseen: a performance comparison of taxonomic classification methods for unknown DNA barcodes

Johanna Orsholm
Alessandro Zito
Panu Somervuo
Jesse P Harrison
Markus Koskela
Otso Ovaskainen
Mariana P Braga
Nicolas Chazot
Tomas Roslin
Brendan Furneaux


Abstract

DNA barcoding and metabarcoding have emerged as cost-efficient, standardized methods for characterizing local biodiversity. Because they rely on sequencing a short, targeted gene fragment, it is in principle possible to identify a wide diversity of taxa by comparing the resulting sequences with reference databases. However, a key obstacle to accurate taxonomic classification is that these databases are incomplete, so most query sequences have no species-level match.

Where species-level matches are missing, it may still be possible to classify a query sequence to a higher taxonomic level, such as genus or family, based on its similarity to related reference taxa. The challenge then lies in confidently recognizing whether the sequence belongs to an unobserved (here, “novel”) taxon at a given taxonomic level.
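To make the idea of classifying at a higher rank concrete, here is a minimal sketch of a best-hit, threshold-based assignment. It assumes pre-aligned sequences and hypothetical per-rank identity cut-offs, and it is not one of the methods evaluated in the study. It only shows how a query without a close species-level match can still be placed at genus or family level while the lower ranks are flagged as novel. The names `RANK_THRESHOLDS`, `Reference`, `identity` and `classify` are illustrative.

```python
from dataclasses import dataclass

# Hypothetical per-rank identity thresholds, ordered from the most to the least
# specific rank; real cut-offs differ between markers (e.g. COI vs ITS) and taxa.
RANK_THRESHOLDS = [("species", 0.97), ("genus", 0.95), ("family", 0.85)]


@dataclass
class Reference:
    seq: str
    lineage: dict[str, str]  # rank -> taxon name


def identity(a: str, b: str) -> float:
    """Fraction of identical positions, assuming the sequences are already aligned."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def classify(query: str, refs: list[Reference]) -> dict[str, str]:
    """Take each rank from the best hit, or report 'novel' if the hit is too distant."""
    best = max(refs, key=lambda r: identity(query, r.seq))
    best_id = identity(query, best.seq)
    return {
        rank: best.lineage[rank] if best_id >= threshold else "novel"
        for rank, threshold in RANK_THRESHOLDS
    }
```

Because the thresholds decrease with rank, a query that falls below the species and genus cut-offs but above the family cut-off is reported as a novel species in a novel genus within a known family. Note that this rule treats every sub-threshold rank as novel, which conflates true novelty with low confidence; telling the two apart is exactly the difficulty described above.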

In this study, we evaluated the performance and utility of several taxonomic classification methods. Methods were assessed on classification accuracy for both observed and novel taxa, as well as on training time, space requirements, and run time. We did this for two cases: the COI barcode for arthropods and the ITS barcode for fungi, the latter representing a case with substantially greater within-class variation in sequence similarity. To test classification of novel taxa, we used well-curated datasets in which the taxonomic composition of the training and test sets only partially overlapped. Novel taxa occurred at all evaluated taxonomic levels, for example novel species in observed genera and novel genera in observed families. We further assessed how performance changes when the full-length barcodes in the test dataset are replaced with the shorter sequences generated through metabarcoding.
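One possible way to score accuracy separately for observed and novel taxa at a single rank is sketched below: a test query whose true taxon is absent from the training set counts as correct only if the classifier reports it as novel. This scoring rule is an assumption for illustration, not the study's documented protocol, and `rank_accuracy` and the `"novel"` label are hypothetical.

```python
import math


def rank_accuracy(true_labels, predicted, training_taxa, novel_label="novel"):
    """Return (accuracy on observed taxa, accuracy on novel taxa) at a single rank.

    A query is 'observed' if its true taxon occurs in the training set; otherwise
    the only correct prediction is `novel_label`.
    """
    correct = {"observed": 0, "novel": 0}
    total = {"observed": 0, "novel": 0}
    for truth, pred in zip(true_labels, predicted):
        if truth in training_taxa:
            group, expected = "observed", truth
        else:
            group, expected = "novel", novel_label
        total[group] += 1
        correct[group] += int(pred == expected)
    # NaN signals that a group had no queries at this rank.
    return tuple(
        correct[g] / total[g] if total[g] else math.nan
        for g in ("observed", "novel")
    )
```

Reporting the two accuracies separately matters because a classifier can look strong overall simply by always assigning the nearest known taxon, while failing on every novel query.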

This study sheds light on the strengths and limitations of different classification algorithms across varied ecological contexts and provides guidance for researchers selecting algorithms for DNA barcoding and metabarcoding applications. In particular, it demonstrates the superior performance of phylogenetic placement methods such as EPA-ng for classifying arthropod COI barcodes, and of composition-based classifiers such as SINTAX, RDP-NBC, and IDTAXA for fungal ITS.
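Composition-based classifiers compare the k-mer (short word) content of sequences rather than aligning them. The sketch below is a simplified, SINTAX-like bootstrap classifier written for illustration. It is not the USEARCH implementation or the configuration used in the study. In each iteration it draws a random subset of the query's k-mers and finds the reference sharing the most of them. It then reports, per rank, the fraction of iterations supporting the winning taxon. The function names and the defaults (k = 8, 32 k-mers per draw, 100 iterations) are assumptions chosen for the example.

```python
import random
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers (words of length k) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sintax_like(query, refs, k=8, n_boot=100, n_sample=32, seed=0):
    """Bootstrap k-mer classification in the spirit of SINTAX (simplified sketch).

    refs: list of (sequence, lineage) pairs, lineage being a tuple of taxon
    names ordered from the highest to the lowest rank (same length for all).
    Returns a list of (taxon, support) pairs, one per rank.
    """
    rng = random.Random(seed)
    ref_words = [(kmers(seq, k), lineage) for seq, lineage in refs]
    query_words = sorted(kmers(query, k))  # sorted so the seed gives reproducible draws
    if not query_words:
        raise ValueError("query is shorter than k")
    votes = []
    for _ in range(n_boot):
        # Draw a random subset of the query's k-mers (without replacement).
        sample = set(rng.sample(query_words, min(n_sample, len(query_words))))
        # Top hit: the reference sharing the most sampled k-mers (ties -> first listed).
        _, lineage = max(ref_words, key=lambda r: len(sample & r[0]))
        votes.append(lineage)
    # Support at each rank = fraction of iterations voting for the winning name.
    # Ranks are tallied independently, a simplification of the real tool.
    result = []
    for level in range(len(votes[0])):
        name, count = Counter(v[level] for v in votes).most_common(1)[0]
        result.append((name, count / n_boot))
    return result
```

In practice the support at each rank is compared with a cut-off, and ranks below it are left unassigned. A query from an unobserved genus therefore tends to receive high support only at the ranks its relatives share.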

Version published to 10.1101/2025.10.13.681976 on bioRxiv, Oct 14, 2025
