Investigating the application of LLMs to invertebrate palaeontology through the development of automated taxonomy assistants for brachiopod identification

Abstract

Taxonomic identification is a central practice in palaeontology, underpinning biostratigraphic correlations, palaeobiogeographic reconstructions, and analyses of macroevolutionary patterns. Despite its importance, taxonomy depends on a limited number of specialists and on the synthesis of extensive descriptive literature that is often difficult to access. Recent developments in artificial intelligence provide potential tools to support taxonomic work and improve accessibility and efficiency in fossil identification. Most automated approaches have so far relied on deep learning models trained on photographic datasets of fossil specimens. While effective for some microfossil groups, these systems face substantial limitations when applied to macrofossils, which are often incompletely preserved, morphologically complex, and poorly suited to standardized imaging workflows. Because palaeontological taxonomy is fundamentally text-based, relying on diagnoses, descriptions, and comparative remarks published in the literature, Large Language Models (LLMs) offer an alternative framework for automated assistance. Here we explore the application of LLM-augmented taxonomy systems (LATS) to invertebrate fossil identification through the development of a prototype system for brachiopods. The system is trained on genus-level diagnoses extracted from the Treatise on Invertebrate Paleontology, Part H: Brachiopoda (Revised), one of the most comprehensive and authoritative compilations of fossil invertebrate taxonomy. Strategies were implemented to address the brevity of diagnoses, including integration with descriptions of higher-rank taxa and an adjustable retrieval knowledge base. Preliminary testing indicates that the system reliably provides plausible candidate matches and handles complex morphological terminology effectively. LATS thus represent a promising approach for developing automated assistants in macrofossil taxonomy, with potential future integration of expanded textual databases and image-based analyses.
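
The abstract does not describe the implementation, so the following is only a minimal sketch of one way the two strategies it names could fit together: each short genus diagnosis is extended with the descriptions of its higher-rank taxa, and the number of higher levels included is an adjustable parameter. All names here (`Taxon`, `TAXA`, `augmented_text`, `rank_genera`, the `levels` parameter) and all taxon entries are hypothetical placeholders, not text from the Treatise, and simple token overlap stands in for whatever retrieval method and LLM step the authors actually use.

```python
from dataclasses import dataclass
from typing import Optional
import re


@dataclass(frozen=True)
class Taxon:
    name: str
    rank: str               # e.g. "order", "family", "genus"
    parent: Optional[str]   # name of the next higher taxon, if any
    text: str               # diagnosis or description


# Invented placeholder entries for illustration only (not Treatise text).
TAXA = {
    t.name: t
    for t in [
        Taxon("OrderX", "order", None, "shell biconvex punctate"),
        Taxon("FamilyA", "family", "OrderX", "costate with strong fold and sulcus"),
        Taxon("FamilyB", "family", "OrderX", "smooth shell"),
        Taxon("GenusA1", "genus", "FamilyA", "large, hinge line wide"),
        Taxon("GenusA2", "genus", "FamilyA", "small, hinge line short"),
        Taxon("GenusB1", "genus", "FamilyB", "large, hinge line wide"),
    ]
}


def tokens(text):
    """Lowercase word set; a crude stand-in for a real text representation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def augmented_text(name, levels):
    """Genus diagnosis plus descriptions of up to `levels` higher-rank taxa."""
    parts = []
    current = TAXA[name]
    for _ in range(levels + 1):
        parts.append(current.text)
        if current.parent is None:
            break
        current = TAXA[current.parent]
    return " ".join(parts)


def rank_genera(query, levels=2, top_k=3):
    """Rank genera by the fraction of query tokens found in the augmented text."""
    q = tokens(query)
    scored = []
    for taxon in TAXA.values():
        if taxon.rank != "genus":
            continue
        doc = tokens(augmented_text(taxon.name, levels))
        score = len(q & doc) / len(q) if q else 0.0
        scored.append((taxon.name, round(score, 3)))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top_k]


if __name__ == "__main__":
    query = "large costate shell with wide hinge line"
    print(rank_genera(query, levels=0))  # diagnoses only
    print(rank_genera(query, levels=2))  # diagnoses + family + order text
```

In this toy data, two genera from different families share the same brief diagnosis, so diagnoses alone (`levels=0`) cannot separate them; including the family and order descriptions (`levels=2`) does. In a fuller system the ranked candidates would presumably be passed to an LLM as context rather than returned directly.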
