Fine-Grained Taxonomy with Vision Models: A Benchmark on Long-Tailed and Domain-Adaptive Classification

Abstract

Ground beetles are highly sensitive and speciose biological indicators, critical for biodiversity monitoring, yet their taxonomic classification remains underutilized because differentiating species by subtle morphological variations demands substantial manual effort. In this paper, we present a benchmark for fine-grained taxonomic classification, evaluating 12 vision models across four diverse, long-tailed datasets spanning over 230 genera and 1,769 species. These datasets include both controlled laboratory images and challenging field-collected (in-situ) photographs. We investigate two key real-world challenges: sample efficiency and domain adaptation. Our results show that 1) a vision-and-language transformer with an MLP head achieves the best performance, with 97% genus-level and 94% species-level accuracy; 2) efficient subsampling allows the training data to be cut in half with minimal performance degradation; and 3) model performance drops significantly under domain shift from lab to in-situ settings, highlighting a critical domain gap. Overall, our study lays a foundation for scalable, fine-grained taxonomic classification of beetles and supports broader applications in sample-efficient and domain-adaptive learning for ecological computer vision.
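The second finding, that training data can be halved with little performance loss, presupposes a subsampling scheme that does not erase the rare tail classes of a long-tailed dataset. The paper's exact procedure is not given here; the sketch below shows one common approach under that assumption: class-stratified subsampling that keeps a fixed fraction of each class (and at least one example), so rare species remain represented. All names are illustrative, not from the paper.

```python
# Class-stratified subsampling for a long-tailed dataset: keep a fixed
# fraction of each class so the tail distribution is preserved while the
# training set shrinks. Illustrative sketch, not the paper's method.
import math
import random
from collections import defaultdict

def stratified_subsample(labels, fraction=0.5, seed=0):
    """Return indices keeping ~`fraction` of each class (at least one)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_keep = max(1, math.ceil(len(indices) * fraction))  # never drop a class
        kept.extend(indices[:n_keep])
    return sorted(kept)

# Example: a long-tailed label set with one head class and rare tail classes.
labels = ["a"] * 100 + ["b"] * 10 + ["c"] * 2 + ["d"] * 1
subset = stratified_subsample(labels, fraction=0.5)
```

Sampling per class rather than globally is what makes halving the data safe: a uniform random 50% cut would drop singleton species entirely, whereas the stratified version keeps every class present.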
