Investigating the application of LLMs to invertebrate palaeontology through the development of automated taxonomy assistants for brachiopod identification

Alessandro Carniti
Michael Stephenson
Jiaxi Yang
Shuzhong Shen
Junxuan Fan
Jieping Ye

Abstract

Taxonomic identification is a central practice in palaeontology, underpinning biostratigraphic correlations, palaeobiogeographic reconstructions, and analyses of macroevolutionary patterns. Despite its importance, taxonomy depends on a limited number of specialists and on the synthesis of extensive descriptive literature that is often difficult to access. Recent developments in artificial intelligence provide potential tools to support taxonomic work and improve accessibility and efficiency in fossil identification. Most automated approaches have so far relied on deep learning models trained on photographic datasets of fossil specimens. While effective for some microfossil groups, these systems face substantial limitations when applied to macrofossils, which are often incompletely preserved, morphologically complex, and poorly suited to standardized imaging workflows. Because palaeontological taxonomy is fundamentally text-based, relying on diagnoses, descriptions, and comparative remarks published in the literature, Large Language Models (LLMs) offer an alternative framework for automated assistance. Here we explore the application of LLM-augmented taxonomy systems (LATS) to invertebrate fossil identification through the development of a prototype system for brachiopods. The system is trained on genus-level diagnoses extracted from the Treatise on Invertebrate Paleontology, Part H: Brachiopoda (Revised), one of the most comprehensive and authoritative compilations of fossil invertebrate taxonomy. Strategies were implemented to address the brevity of diagnoses, including integration with descriptions of higher-rank taxa and an adjustable retrieval knowledge base. Preliminary testing indicates that the system reliably provides plausible candidate matches and handles complex morphological terminology effectively. LATS thus represent a promising approach for developing automated assistants in macrofossil taxonomy, with potential future integration of expanded textual databases and image-based analyses.
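
The abstract does not specify how retrieval is implemented, so the following is only a minimal illustrative sketch of the general idea it describes: short genus-level diagnoses are optionally extended with the description of a higher-rank taxon, and a specimen description is matched against them to produce candidate genera. Everything here is an assumption made for illustration: the records are invented placeholders (not Treatise content), the names `RECORDS`, `build_documents`, and `rank_candidates` are hypothetical, TF-IDF cosine similarity stands in for whatever retrieval method the authors use, and no LLM call is included.

```python
import math
import re
from collections import Counter

# Hypothetical toy records, NOT taken from the Treatise. Each genus has a
# brief diagnosis plus the description of a higher-rank taxon (here a family).
RECORDS = [
    {"genus": "GenusA",
     "family_text": "shell biconvex, strongly costate, hinge line short",
     "diagnosis": "small, sulcus deep, beak erect"},
    {"genus": "GenusB",
     "family_text": "shell concavo-convex, spinose, hinge line wide",
     "diagnosis": "large, ventral valve geniculate, spines on ears"},
    {"genus": "GenusC",
     "family_text": "shell biconvex, smooth, hinge line short",
     "diagnosis": "medium, fold low, foramen large"},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_documents(records, include_higher_rank=True):
    """Return {genus: text}. The flag is a stand-in for an adjustable
    knowledge base: diagnoses alone, or diagnoses plus higher-rank text."""
    docs = {}
    for record in records:
        text = record["diagnosis"]
        if include_higher_rank:
            text = record["family_text"] + " " + text
        docs[record["genus"]] = text
    return docs


def rank_candidates(query, docs, top_k=2):
    """Rank genera by TF-IDF cosine similarity to a specimen description."""
    counts = {genus: Counter(tokenize(text)) for genus, text in docs.items()}
    n_docs = len(counts)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    # Smoothed inverse document frequency for terms seen in the knowledge base.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in doc_freq.items()}

    def weight(term_counts):
        # Terms absent from the knowledge base get weight 0.
        return {t: c * idf.get(t, 0.0) for t, c in term_counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vec = weight(Counter(tokenize(query)))
    scored = [(genus, cosine(query_vec, weight(c))) for genus, c in counts.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    specimen = "spinose concavo-convex shell with geniculate ventral valve"
    for genus, score in rank_candidates(specimen, build_documents(RECORDS)):
        print(f"{genus}\t{score:.3f}")
```

In a fuller system, the ranked candidates and their source texts would presumably be passed to a language model as context. In this toy setup, switching `include_higher_rank` off shows why brief diagnoses alone can fail to match a description phrased in family-level terms.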

Version published to 10.31223/x5d48j
Mar 19, 2026

Related articles

LLM-augmented taxonomy for >4500 palaeopalynology genera

This article has 7 authors:
1. Michael Stephenson
2. Jiaxi Yang
3. Alessandro Carniti
4. Shuzhong Shen
5. Junxuan Fan
6. Jan Hennissen
7. Jieping Ye
This article has no evaluations
Latest version Feb 7, 2026

Accessing the challenges of descriptive morphology in Barychelidae and Theraphosidae through a morphometric approach (Araneae, Mygalomorphae)

This article has 4 authors:
1. Hector M. O. GONZALEZ-FILHO
2. Maria T. COLPANI-SARTORI
3. Arthur GALLETI-LIMA
4. José Paulo L. GUADANUCCI
This article has no evaluations
Latest version Jan 30, 2026

A middle Cambrian macroscopic tardigrade ancestor

This article has 2 authors:
1. Marc Mapalo
2. Javier Ortega-Hernández
This article has no evaluations
Latest version Feb 23, 2026
