Can large language models reliably extract human disease genes from full-text scientific literature?

Danqing Yin
Matthew Ka Siu Leung
Darren Wan Ho Pun
Fiona Haixin Chen
Julie Yujin Kwon
Xinyi Lin
Joshua W. K. Ho

Abstract

Manual extraction of high-fidelity gene-disease-phenotype information from the human genetics literature is labor-intensive: trained human genetics researchers must read through many primary research papers. This presents a major challenge for keeping human disease genetics databases up to date. Recent work on large language models (LLMs) opens new directions for automating this manual process. However, most approaches depend on pre-training, fine-tuning, or specialized generative artificial intelligence (GenAI) tools, and there is a lack of empirical evidence on whether commercially available LLMs can be used directly to reliably extract gene-disease-phenotype information for human genetic diseases. Herein, we benchmark three zero-shot prompted LLMs, namely GPT-4, DeepSeek and Claude, without task-specific fine-tuning, for extracting human genetic information directly from the full text of scientific papers. Using known congenital heart disease (CHD) genes from the open-access CHDgene database ( https://chdgene.victorchang.edu.au/ ) as the benchmark data set, GPT-4o achieved 88.8% overall extraction accuracy across 23 gene entries containing over 57 references, with 100% accuracy in the gene name field and 78.3% and 76.7% in the disease and phenotype fields, respectively. This work introduces a lightweight, easy-to-deploy, yet robust LLM-based agent named GeneAgent, analyzes sources of disagreement, and highlights the feasibility of integrating powerful LLMs into genetic evidence synthesis workflows.

Highlights

First systematic benchmark of LLMs for extracting human gene–disease–phenotype relationships from full-text biomedical articles

GeneAgent: a lightweight, highly accurate prompt-only LLM agent

New domain- and task-specific evaluation framework

Version published to 10.1101/2025.07.27.667022 on bioRxiv
Jul 31, 2025

Related articles

aiDIVA – Diagnostics of Rare Genetic Diseases Using Large Language Models

This article has 7 authors:
1. Dominic Boceck
2. Lucia Laugwitz
3. Marc Sturm
4. Daniela Bezdan
5. Axel Gschwind
6. Tobias B. Haack
7. Stephan Ossowski
This article has no evaluations
Latest version Sep 7, 2025

BioPars: A Pretrained Biomedical Large Language Model for Persian Biomedical Text Mining

This article has 6 authors:
1. Baqer M. Merzah
2. Tania Taami
3. Salman Asoudeh
4. Amir reza Hossein pour
5. Saeed Mirzaee
6. Amir Ali Bengari
This article has no evaluations
Latest version Jul 21, 2025

What Large Language Models Know About Plant Molecular Biology

This article has 8 authors:
1. Manuel Fernandez Burda
2. Lucia Ferrero
3. Nicolás Gaggion
4. Camille Fonouni-Farde
5. The MoBiPlant Consortium
6. Martín Crespi
7. Federico Ariel
8. Enzo Ferrante
This article has no evaluations
Latest version Sep 4, 2025
