Leveraging Large Language Models for Accurate Retrieval of Patient Information From Medical Reports: Systematic Evaluation Study

Abstract

Background

The digital transformation of health care has introduced both opportunities and challenges, particularly in managing and analyzing the vast amounts of unstructured medical data generated daily. There is a need to assess the feasibility of generative artificial intelligence solutions for extracting data from medical reports and organizing it according to specific criteria.

Objective

This study aimed to investigate the application of large language models (LLMs) for the automated extraction of structured information from unstructured medical reports, using the LangChain framework in Python.
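
As a concrete illustration, a minimal LangChain extraction chain might look like the sketch below. The model name, field list, and prompt wording are illustrative assumptions, not the study's exact prompts or schema:

# Hypothetical sketch: zero-shot extraction of structured fields from an
# unstructured medical report with LangChain (fields and prompt are assumed).
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser

llm = ChatOpenAI(model="gpt-4o", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Extract the following fields from the medical report and answer with "
     "JSON only: name, age, diagnosis, medications. Use null when a field "
     "is absent from the text."),
    ("human", "{report}"),
])

# LCEL pipeline: prompt -> model -> JSON parsed into a Python dict
chain = prompt | llm | JsonOutputParser()

record = chain.invoke({"report": "John Doe, 54, admitted with pneumonia; "
                                 "started on amoxicillin 500 mg."})
print(record)  # e.g. {'name': 'John Doe', 'age': 54, ...}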

Methods

Through a systematic evaluation of leading LLMs—GPT-4o, Llama 3, Llama 3.1, Gemma 2, Qwen 2, and Qwen 2.5—using zero-shot prompting and storing report embeddings in a vector database for retrieval, this study assessed the models' performance in extracting patient demographics, diagnostic details, and pharmacological data.
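
A sketch of the embedding step follows, assuming a FAISS vector store and OpenAI embeddings; the study does not name a specific vector database, embedding model, or chunking scheme, so these choices are illustrative:

# Hypothetical sketch: chunking a report and storing embeddings in a vector
# database for later retrieval (chunk sizes and store choice are assumed).
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

report_text = "Discharge summary: John Doe, 54, treated for pneumonia ..."

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(report_text)

# Embed each chunk and index it in FAISS
store = FAISS.from_texts(chunks, OpenAIEmbeddings())

# Retrieve the chunks most relevant to a target extraction category
hits = store.similarity_search("current medications", k=3)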

Results

Evaluation metrics, including accuracy, precision, recall, and F1-score, revealed high efficacy across most categories, with GPT-4o achieving the highest overall performance (91.4% accuracy).
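
For reference, these metrics can be computed per extraction category with scikit-learn; the binary correctness labels below are toy values, not the study's data:

# Toy sketch: scoring per-field extraction as a binary task (1 = the model's
# value matched the gold annotation). The labels here are invented examples.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 1, 0, 1, 1, 0, 1, 1]  # gold annotations
y_pred = [1, 1, 0, 1, 0, 1, 1, 1]  # model outputs

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))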

Conclusions

The findings highlight notable differences in precision and recall between models, particularly in extracting names and age-related information, and expose persistent challenges in processing unstructured medical text, including variability in model performance across data types. Our findings demonstrate the feasibility of integrating LLMs into health care workflows: LLMs offer substantial improvements in data accessibility and can support clinical decision-making. We also describe the role of retrieval-augmented generation (RAG) techniques in enhancing information retrieval accuracy and in addressing issues such as hallucinations and outdated knowledge in LLM outputs. Future work should pursue optimization through larger and more diverse training datasets, advanced prompting strategies, and the integration of domain-specific knowledge to improve model generalizability and precision.
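
As one way to picture the RAG setup described above, the sketch below grounds the model's answer in retrieved report chunks; the prompt, store, and model choices are assumptions, not the study's exact pipeline:

# Hypothetical sketch: a retrieval-augmented chain that answers only from
# retrieved report text, reducing the room for hallucinated values.
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

chunks = ["John Doe, 54, admitted 2024-03-01.",
          "Medications on discharge: amoxicillin 500 mg three times daily."]
retriever = FAISS.from_texts(chunks, OpenAIEmbeddings()).as_retriever(
    search_kwargs={"k": 2})

prompt = ChatPromptTemplate.from_template(
    "Answer using ONLY this context; reply 'not found' otherwise.\n"
    "Context:\n{context}\n\nQuestion: {question}")

def join_docs(docs):
    # Concatenate retrieved chunks into a single context string
    return "\n".join(d.page_content for d in docs)

chain = ({"context": retriever | join_docs, "question": RunnablePassthrough()}
         | prompt | ChatOpenAI(model="gpt-4o", temperature=0) | StrOutputParser())

print(chain.invoke("What medications is the patient taking?"))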
