Naïve Bayes Classifier as an Out-of-Distribution Detector of Novel Taxa

Abstract

Detecting sequences from novel taxa remains a key challenge in metagenomic classification, as reference databases rarely capture the full extent of microbial diversity. We investigate the Naïve Bayes Classifier (NBC++) as an out-of-distribution (OOD) detector by analyzing its log-likelihood scores across simulated and real metagenomic datasets. By partitioning reference databases and introducing taxonomic novelty, we derive thresholds that distinguish known from unknown reads at multiple taxonomic levels. These thresholds remain consistent across database sizes, indicating that once a lineage is represented, novelty detection performance stabilizes. Applied to a human gut metagenome, the thresholds reflect differences in database density and classification confidence. This work characterizes how NBC++ responds to novelty and illustrates its use in evaluating unclassified metagenomic reads.
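As a rough illustration of the general idea, here is a Python sketch. Its simplifying assumptions are ours and are not taken from NBC++. Each reference genome is modeled as a multinomial distribution over k-mers with add-one smoothing. A read's score is the sum of its k-mer log-probabilities under the best-matching reference, and the read is flagged as novel when that score falls below a cutoff. The names `KmerProfile`, `classify_or_flag`, and `pick_threshold`, the toy genomes, and the balanced-accuracy rule for choosing the cutoff are hypothetical. NBC++'s actual scoring details and the authors' threshold derivation may differ. Because a summed log-likelihood scales with the number of k-mers, the sketch assumes all reads have the same length.

```python
import math
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerProfile:
    """Hypothetical multinomial k-mer model of one reference genome (add-alpha smoothing)."""

    def __init__(self, genome: str, k: int, alpha: float = 1.0):
        self.k = k
        self.alpha = alpha
        self.counts = Counter(kmers(genome, k))
        # Smoothed denominator: observed k-mers plus pseudocounts over all 4^k DNA k-mers.
        self.denom = sum(self.counts.values()) + alpha * 4 ** k

    def log_likelihood(self, read: str) -> float:
        # Naive Bayes assumption: k-mers are independent, so log-probabilities add.
        # Counter returns 0 for unseen k-mers, which then receive only the pseudocount.
        return sum(
            math.log((self.counts[km] + self.alpha) / self.denom)
            for km in kmers(read, self.k)
        )


def classify_or_flag(read: str, profiles: dict[str, KmerProfile], threshold: float):
    """Assign the best-scoring reference, or 'novel' if its score is below the cutoff."""
    scores = {name: p.log_likelihood(read) for name, p in profiles.items()}
    best = max(scores, key=scores.get)
    return ("novel" if scores[best] < threshold else best), scores[best]


def pick_threshold(known_scores: list[float], novel_scores: list[float]) -> float:
    """Hypothetical rule: the observed score that best separates reads from
    represented lineages (score >= t) from held-out ones (score < t)."""

    def balanced_accuracy(t: float) -> float:
        tpr = sum(s >= t for s in known_scores) / len(known_scores)
        tnr = sum(s < t for s in novel_scores) / len(novel_scores)
        return (tpr + tnr) / 2

    candidates = sorted(set(known_scores) | set(novel_scores))
    return max(candidates, key=balanced_accuracy)
```

For example, with two toy references built as `KmerProfile("ACGT" * 50, 4)` and `KmerProfile("AAAACCCCGGGGTTTT" * 12, 4)`, the read `"ACGTACGTACGTACGT"` is assigned to the first reference at a cutoff of -50. The read `"GATTACAGATTACAGA"`, none of whose 4-mers occur in either reference, is flagged as novel. In a setup like the one described above, the two score lists passed to `pick_threshold` would come from reads whose source lineage is kept in the database versus reads whose lineage has been removed from it.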
