Comparative Assessment of Large Language Models for Microbial Phenotype Annotation

Abstract

Large language models (LLMs) are increasingly used to extract knowledge from text, yet their coverage and reliability in biology remain unclear. Microbial phenotypes are especially important to assess: they underpin our understanding of microbial characteristics, functional roles, and applications, yet comprehensive phenotype data remain sparse outside well-studied organisms. Here, we systematically assessed the biological knowledge encoded in publicly available LLMs for structured phenotype annotation of microbial species. We evaluated the performance of over 50 LLMs, including state-of-the-art models such as Claude Sonnet 4 and the GPT-5 family. Across phenotypes, LLMs produced accurate assignments for many species, but performance varied widely by model and trait, and no single model dominated. Model self-reported confidence is informative: higher confidence aligns with higher accuracy, so it can be used to prioritize phenotype assignments by effectively distinguishing high- from low-confidence inferences. Overall, our study outlines the utility and limitations of text-based LLMs for phenotype characterization in microbiology.
