Automating Candidate Gene Prioritization with Large Language Models: From Naive Scoring to Literature-Grounded Validation

Taushif Khan
Mohammed Toufiq
Marina Yurieva
Nitaya Indrawattana
Akanitt Jittmittraphap
Nathamon Kosoltanapiwat
Pornpan Pumirat
Passanesh Sukphopetch
Muthita Vanaporn
Karolina Palucka
Basirudeen Kabeer
Darawan Rinchai
Damien Chaussabel

Abstract

Background

Identifying promising therapeutic targets from thousands of genes in transcriptomic studies remains a major bottleneck in biomedical research. While large language models (LLMs) show potential for gene prioritization, they suffer from hallucination and lack systematic validation against expert knowledge.

Methods

We developed a two-stage computational framework that combines LLM-based screening with literature validation for systematic gene prioritization. Starting with 10,824 genes from the BloodGen3 repertoire, we applied multi-criteria evaluation for sepsis relevance, followed by retrieval-augmented generation (RAG) using 6,346 curated sepsis publications. A novel faithfulness evaluation system verified that LLM predictions aligned with retrieved literature evidence.
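The two stages described above can be illustrated with a toy sketch. This is not the authors' implementation: the class `GeneAssessment`, the functions `passes_screen` and `faithfulness`, the 0-10 score scale, the thresholds, and the token-overlap rule are all hypothetical stand-ins chosen for illustration. The real framework uses an LLM for scoring and a vector database for retrieval.

```python
from dataclasses import dataclass


@dataclass
class GeneAssessment:
    """Hypothetical container for one gene's LLM output (not from the paper)."""
    gene: str
    criteria_scores: dict[str, float]  # assumed: one score per criterion, 0-10
    claims: list[str]                  # statements the LLM made about the gene


def passes_screen(a: GeneAssessment, threshold: float = 7.0) -> bool:
    """Stage 1 (assumed rule): keep a gene if its mean criterion score
    reaches the threshold."""
    scores = list(a.criteria_scores.values())
    return bool(scores) and sum(scores) / len(scores) >= threshold


def _tokens(text: str) -> set[str]:
    # Crude content-word extraction: lowercase words longer than 3 characters.
    return {t.strip(".,;()").lower() for t in text.split() if len(t) > 3}


def faithfulness(claims: list[str], passages: list[str],
                 min_overlap: float = 0.5) -> float:
    """Stage 2 (assumed rule): fraction of claims whose content words
    mostly appear in at least one retrieved passage."""
    if not claims:
        return 0.0
    supported = 0
    for claim in claims:
        words = _tokens(claim)
        if words and any(
            len(words & _tokens(p)) / len(words) >= min_overlap
            for p in passages
        ):
            supported += 1
    return supported / len(claims)
```

In this sketch a gene would advance only if it passes the screen and its claims score above some faithfulness cutoff against the retrieved passages. A token-overlap check is far weaker than an LLM-based or entailment-based judge and is used here only to keep the example self-contained.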

Results

The framework identified 609 sepsis-relevant genes with >94% filtering efficiency, demonstrating strong enrichment for inflammatory pathways including TNF-α signaling, complement activation, and interferon responses. Literature validation yielded 30 ultra-high confidence therapeutic candidates, including both established sepsis genes (IL10, TREM1, S100A9, NLRP3) and novel targets warranting investigation. Benchmark validation against expert-curated databases achieved 71.2% recall, with systematic correlation between computational confidence and evidence quality. The final candidate set balanced discovery (11 novel genes) with validation (19 known genes), maintaining biological coherence throughout the filtering process.
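The counts reported above can be cross-checked with a few lines of arithmetic. This sketch assumes that "filtering efficiency" means the fraction of input genes removed by the first stage, a reading the abstract does not state explicitly. The variable names are illustrative.

```python
# Counts taken from the abstract.
total_genes = 10_824      # BloodGen3 repertoire
retained_genes = 609      # sepsis-relevant genes after screening
novel_candidates = 11
known_candidates = 19

# Assumed definition: share of input genes filtered out in stage 1.
filtered_fraction = 1 - retained_genes / total_genes
final_candidates = novel_candidates + known_candidates

print(f"Filtered out: {filtered_fraction:.1%}")
print(f"Final candidates: {final_candidates}")
```

Under that assumption the filtered fraction is consistent with the reported ">94%", and the novel and known genes sum to the 30 final candidates.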

Conclusions

This framework demonstrates that rigorous methodology can transform unreliable LLM outputs into systematically validated biological insights. By combining computational efficiency with literature grounding, the approach provides a practical tool for prioritizing experimental validation efforts. The modular design enables adaptation to other diseases through knowledge base substitution, offering a systematic approach to literature-guided biomarker discovery.

Availability

Source code and implementation details are available at https://github.com/taushifkhan/llm-geneprioritization-framework, the vector database at https://doi.org/10.5281/zenodo.15802241, and an interactive demonstration at https://llm-geneprioritization.streamlit.app/

Version published to 10.1101/2025.09.17.676837 on bioRxiv
Sep 20, 2025
