Benchmarking Large Language Models for Pathogen–Disease Classification in Post-Acute Infection Syndromes

Abstract

Post-Acute Infection Syndromes (PAIS) are medical conditions that persist following acute infections from pathogens such as SARS-CoV-2, Epstein-Barr virus, and influenza virus. Despite growing global awareness of PAIS and the exponential increase in biomedical literature, only a small fraction of this literature pertains specifically to PAIS, making the identification of pathogen–disease associations within such a vast, heterogeneous, and unstructured corpus a significant challenge for researchers. This study evaluated the effectiveness of large language models (LLMs) in extracting these associations through a binary classification task on a curated dataset of 1,000 manually labeled PubMed abstracts. We benchmarked a wide range of open-source LLMs of varying sizes (4B–70B parameters), including generalist, reasoning, and biomedical-specific models. We also investigated the extent to which prompting strategies, including zero-shot, few-shot, and Chain-of-Thought (CoT) prompting, can improve classification performance. Our results indicate that model performance varied by size, architecture, and prompting strategy. Zero-shot prompting produced the most reliable results, with Mistral-Small-Instruct-2409 and Nemotron-70B achieving strong balanced accuracy scores of 0.81 and macro F1 scores of up to 0.80, while maintaining minimal invalid outputs. While few-shot and CoT prompting often degraded performance in generalist models, reasoning models such as DeepSeek-R1-Distill-Llama-70B and QwQ-32B demonstrated improved accuracy and consistency when provided with additional context.
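The setup described above can be sketched in miniature: a zero-shot prompt asking whether an abstract reports a pathogen–disease association, plus the two reported metrics (balanced accuracy and macro F1) computed from scratch on toy predictions. The prompt wording, labels, and data below are illustrative assumptions, not the authors' actual protocol or dataset.

```python
# Hypothetical zero-shot prompt template for the binary classification task.
# The exact wording used in the study is not given in the abstract.
ZERO_SHOT_PROMPT = (
    "You are a biomedical expert. Does the following PubMed abstract "
    "describe a pathogen-disease association relevant to Post-Acute "
    "Infection Syndromes (PAIS)? Answer only 'yes' or 'no'.\n\n"
    "Abstract: {abstract}"
)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the two classes (0/1)."""
    recalls = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n = sum(1 for t in y_true if t == cls)
        recalls.append(tp / n if n else 0.0)
    return sum(recalls) / len(recalls)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if (prec + rec) else 0.0)
    return sum(f1s) / len(f1s)

# Toy evaluation: 1 = abstract labeled as reporting a relevant association.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(round(balanced_accuracy(y_true, y_pred), 3))  # -> 0.75
print(round(macro_f1(y_true, y_pred), 3))           # -> 0.75
```

Because the labeled PAIS corpus is likely imbalanced (only a small fraction of abstracts are relevant), balanced accuracy and macro F1 are sensible choices: both weight the minority class equally rather than letting majority-class performance dominate the score.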