Comparison of Large Language Model and Manual Review for Clinical Data Curation in Breast Cancer


Abstract

Purpose: Large language models (LLMs) offer the potential to automate the extraction of clinical data; however, few studies have been published on their feasibility in oncology. This study evaluated the feasibility and accuracy of LLM-based processing compared with manual physician review for the extraction of clinical data from breast cancer records.

Methods: This retrospective study analyzed breast cancer records from five academic hospitals (2019). Two independent cohorts were compared: a manual physician review group (n = 1,366) and an LLM-based processing group using Claude 3.5 Sonnet (n = 1,734). Primary outcomes included missing value rates, accuracy of data extraction, and inter-cohort concordance. Secondary outcomes included comparison with national registry data, processing time, and resource utilization.

Results: The two groups yielded comparable results for most clinical parameters. The LLM group had more complete documentation of lymph node assessment (91.2% vs. 78.5%) but a larger proportion of missing data for cancer staging (12.2% vs. 3.1%). Breast-conserving surgery rates were similar (63.5% vs. 63.9%). The LLM achieved 90.8% accuracy in the validation analysis while requiring substantially less processing time (12 days vs. 7 months) and fewer physicians (two vs. five). The LLM group’s stage distribution aligned more closely with the national registry data than that of the manual-review group (Cramér’s V = 0.03 vs. 0.07), and the LLM group captured more survival events (41 vs. 11; P = 0.002).

Discussion: LLM-based processing demonstrated effectiveness comparable to manual review by physicians while substantially reducing processing time and resource utilization. Despite limitations in integrated assessments, this approach shows potential for efficient clinical data curation in oncology research.
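
The registry comparison above is reported as Cramér’s V, an effect-size measure of association between two categorical variables that ranges from 0 (no association, here meaning closely matching stage distributions) to 1. The abstract does not describe how the statistic was computed, so the following is only a minimal sketch of the standard definition, V = sqrt(chi² / (n · min(r − 1, c − 1))). The function name `cramers_v` and the counts in the usage lines are hypothetical and are not taken from the study.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table (list of rows of counts).

    Illustrative helper, not the authors' code. Assumes every row and
    column total is positive so that all expected counts are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Normalize by sample size and the smaller table dimension minus one.
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical stage counts (stages I to IV) for a cohort and a registry;
# these numbers are placeholders for illustration only.
cohort = [400, 350, 150, 100]
registry = [4100, 3400, 1600, 900]
v = cramers_v([cohort, registry])
```

A value of `v` close to 0 would indicate that the cohort's stage distribution differs little from the registry's, which is how the 0.03 vs. 0.07 comparison in the Results is read.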
