Benchmarking the impact of reference genome selection on taxonomic profiling accuracy

Abstract

Motivation

Over the past decades, genome databases have expanded exponentially, often incorporating highly similar genomes at the same taxonomic level. This redundancy can hinder taxonomic classification by making closely related sequences harder to distinguish and by increasing computational demands. While some novel taxonomic classification tools address this redundancy by filtering the input genome database, little work has explored the impact of different sequence dereplication methods across taxonomic classification tools.

Results

We assess the effect of genome selection approaches on taxonomic classification using both bacterial and viral datasets. Our results demonstrate that careful selection of reference genomes can improve classification accuracy. We also show that using prior knowledge about a metagenomic sample, such as sampling location, can significantly improve classification accuracy. Finally, we find that using a redundancy-filtered genome database generally reduces the computational resources required, with minimal loss in classification accuracy.
