Comparison of Large Language Model and Manual Review for Clinical Data Curation in Breast Cancer
Abstract
Purpose: Large language models (LLMs) offer the potential to automate the extraction of clinical data; however, few studies have examined their feasibility in oncology. This study evaluated the feasibility and accuracy of LLM-based processing compared with manual physician review for extracting clinical data from breast cancer records.

Methods: This retrospective study analyzed breast cancer records from five academic hospitals (2019). Two independent cohorts were compared: a manual physician-review group (n = 1,366) and an LLM-based processing group using Claude 3.5 Sonnet (n = 1,734). Primary outcomes included missing-value rates, accuracy of data extraction, and inter-cohort concordance. Secondary outcomes included comparison with national registry data, processing time, and resource utilization.

Results: The groups yielded comparable results for most clinical parameters. The LLM group had better documentation of lymph node assessment (91.2% vs. 78.5%) but a larger proportion of missing data for cancer staging (12.2% vs. 3.1%). Breast-conserving surgery rates were similar (63.5% vs. 63.9%). The LLM achieved 90.8% accuracy in the validation analysis while requiring substantially less processing time (12 days vs. 7 months) and fewer physicians (two vs. five). The LLM group's stage distribution aligned more closely with the national registry data than the manual-review group's (Cramér's V = 0.03 vs. 0.07), and it captured more survival events (41 vs. 11; P = 0.002).

Discussion: LLM-based processing was comparably effective to manual review by physicians while substantially reducing processing time and resource utilization. Despite limitations in integrated assessments, this approach shows potential for efficient clinical data curation in oncology research.
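The Results compare stage distributions against the national registry using Cramér's V, an effect-size measure for contingency tables (0 = no association, 1 = complete association; values near 0 indicate close agreement with the registry distribution). A minimal sketch of the computation, using small hypothetical counts rather than the study's actual stage tables:

```python
import math

def cramers_v(table):
    """Cramér's V for an r x c contingency table of counts.

    V = sqrt(chi2 / (n * (min(r, c) - 1))), where chi2 is the
    Pearson chi-squared statistic of the table.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical 2x2 example (not study data): stage category x data source
print(round(cramers_v([[10, 20], [20, 10]]), 4))  # → 0.3333
```

In the study's setting, each row would be a data source (LLM cohort or registry) and each column a cancer stage; a smaller V for the LLM group (0.03 vs. 0.07) means its stage distribution deviated less from the registry's.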