Benchmarking Large Language Models for Replication of Guideline-Based PGx Recommendations


Abstract

We evaluated the ability of large language models (LLMs) to generate clinically accurate pharmacogenomic (PGx) recommendations aligned with CPIC guidelines. Using a benchmark of 599 curated gene–drug–phenotype scenarios, we compared five leading models, including GPT-4o and fine-tuned LLaMA variants, using both standard lexical metrics and a novel semantic evaluation framework (LLM Score) validated by expert review. General-purpose models frequently produced incomplete or unsafe outputs, while our domain-adapted model achieved superior performance, with an LLM Score of 0.92 and substantially faster inference. These results highlight the importance of fine-tuning and structured prompting over model scale alone. This work establishes a robust framework for evaluating PGx-specific LLMs and demonstrates the feasibility of safer, AI-driven personalized medicine.
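To make the semantic evaluation concrete, below is a minimal sketch of how an LLM-as-judge metric like LLM Score could be computed. The judge model, prompt wording, 0.0–1.0 scale, and the `llm_score` and `benchmark` helpers are illustrative assumptions, not the authors' published protocol.

```python
# Minimal sketch of an LLM-as-judge "LLM Score" for PGx recommendations.
# Assumptions (not from the paper): judge model, prompt wording, and the
# 0.0-1.0 scale below are illustrative, not the authors' exact setup.
import re
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are a clinical pharmacogenomics expert.
Reference (CPIC guideline recommendation):
{reference}

Candidate (model-generated recommendation):
{candidate}

Rate how well the candidate matches the reference in clinical meaning
(drug, dosing action, and safety caveats), ignoring wording differences.
Reply with a single number between 0.0 (contradictory or unsafe) and
1.0 (clinically equivalent)."""

def llm_score(reference: str, candidate: str,
              judge_model: str = "gpt-4o") -> float:
    """Ask a judge LLM for a semantic-equivalence score in [0, 1]."""
    resp = client.chat.completions.create(
        model=judge_model,
        temperature=0,  # deterministic judging
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(reference=reference,
                                                  candidate=candidate)}],
    )
    text = resp.choices[0].message.content
    match = re.search(r"\d*\.?\d+", text)  # extract the numeric rating
    return min(max(float(match.group()), 0.0), 1.0) if match else 0.0

def benchmark(pairs: list[tuple[str, str]]) -> float:
    """Mean score over (reference, candidate) scenario pairs."""
    return sum(llm_score(r, c) for r, c in pairs) / len(pairs)
```

A benchmark-level figure such as the reported 0.92 would then correspond to the mean judge score over all 599 gene–drug–phenotype scenarios, with the judge itself validated against expert review as the abstract describes.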