Comparison of Large Language Model and Manual Review for Clinical Data Curation in Breast Cancer
Abstract
Purpose: Large language models (LLMs) offer the potential to automate the extraction of clinical data; however, few studies have examined their feasibility in oncology. This study evaluated the feasibility and accuracy of LLM-based processing compared with manual physician review for extracting clinical data from breast cancer records.

Methods: This retrospective study analyzed breast cancer records from five academic hospitals (2019). Two independent cohorts were compared: a manual physician-review group (n = 1,366) and an LLM-based processing group using Claude 3.5 Sonnet (n = 1,734). Primary outcomes included missing-value rates, accuracy of data extraction, and inter-cohort concordance. Secondary outcomes included comparison with national registry data, processing time, and resource utilization.

Results: The groups yielded comparable results for most clinical parameters. The LLM group had better documentation of lymph node assessment (91.2% vs. 78.5%) but a larger proportion of missing data for cancer staging (12.2% vs. 3.1%). Breast-conserving surgery rates were similar (63.5% vs. 63.9%). The LLM achieved 90.8% accuracy in the validation analysis while requiring substantially less processing time (12 days vs. 7 months) and fewer physicians (two vs. five). The LLM group's stage distribution aligned more closely with the national registry data than the manual-review group's (Cramér's V = 0.03 vs. 0.07), and it captured more survival events (41 vs. 11; P = 0.002).

Discussion: LLM-based processing was comparably effective to manual review by physicians while substantially reducing processing time and resource utilization. Despite limitations in integrated assessments, this approach shows potential for efficient clinical data curation in oncology research.
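The Results compare stage distributions against the national registry using Cramér's V, an effect-size measure for contingency tables (0 = no association, 1 = complete association; values near 0 indicate close agreement with the registry distribution). A minimal sketch of the computation, using small hypothetical counts rather than the study's actual stage tables:

```python
import math

def cramers_v(table):
    """Cramér's V for an r x c contingency table of counts.

    V = sqrt(chi2 / (n * (min(r, c) - 1))), where chi2 is the
    Pearson chi-squared statistic of the table.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical 2x2 example (not study data): stage category x data source
print(round(cramers_v([[10, 20], [20, 10]]), 4))  # → 0.3333
```

In the study's setting, each row would be a data source (LLM cohort or registry) and each column a cancer stage; a smaller V for the LLM group (0.03 vs. 0.07) means its stage distribution deviated less from the registry's.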