Automated large language model screening of unstructured radiology reports for timely osteoporosis intervention.
Abstract
Our objective was to evaluate the feasibility of a large language model (LLM)–based approach for classifying vertebral osteoporotic compression fractures (VOCFs) from unstructured lumbar spine radiology reports for integration into osteoporosis clinical workflows. Two open-source, on-premise LLM tools were assessed using a few-shot learning strategy to classify reports into five fracture subcategories and to identify “recent fractures for notification.” Performance was evaluated on two real-world datasets authored by 100 unique radiologists and benchmarked against a radiologist-defined reference standard. Both LLMs demonstrated satisfactory performance, achieving sensitivities above 0.70 and specificities above 0.80 across datasets, with one model showing higher sensitivity consistent with a screening role. Inter-run agreement was high (κ > 0.81), and performance was consistent across datasets from different timepoints (Model 1: χ² = 2.46, p = 0.12; Model 2: χ² = 1.50, p = 0.22). Common errors were related to temporal context and complex syntax. These findings demonstrate the feasibility of using open-source LLMs, without fine-tuning, to identify and flag recent osteoporotic fractures from unstructured radiology reports, supporting early referral for expedited osteoporosis screening and management.
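The few-shot classification and screening-metric evaluation described above can be sketched as follows. This is a hypothetical illustration only: the label names, example reports, and prompt wording are invented for demonstration and are not the paper's actual five subcategories, prompts, or LLM tooling.

```python
# Illustrative sketch of a few-shot prompt for classifying lumbar spine
# reports, plus the screening metrics (sensitivity/specificity) used to
# benchmark predictions against a radiologist reference standard.
# All labels and example texts below are placeholders, not the paper's.

FRACTURE_LABELS = [
    "no fracture",
    "old fracture",
    "recent fracture",
    "indeterminate-age fracture",
    "non-osteoporotic fracture",
]  # placeholder five-way label set

FEW_SHOT_EXAMPLES = [
    ("Chronic-appearing wedge deformity of L1, unchanged from prior study.",
     "old fracture"),
    ("Acute compression fracture of L2 with associated marrow edema.",
     "recent fracture"),
]

def build_prompt(report_text: str) -> str:
    """Assemble a few-shot classification prompt for a local LLM."""
    parts = ["Classify the lumbar spine report into one of: "
             + ", ".join(FRACTURE_LABELS) + "."]
    for example, label in FEW_SHOT_EXAMPLES:
        parts.append(f"Report: {example}\nLabel: {label}")
    parts.append(f"Report: {report_text}\nLabel:")
    return "\n\n".join(parts)

def sensitivity_specificity(preds, truth, positive="recent fracture"):
    """Screening metrics for the 'recent fracture for notification' flag."""
    tp = sum(p == positive and t == positive for p, t in zip(preds, truth))
    fn = sum(p != positive and t == positive for p, t in zip(preds, truth))
    tn = sum(p != positive and t != positive for p, t in zip(preds, truth))
    fp = sum(p == positive and t != positive for p, t in zip(preds, truth))
    return tp / (tp + fn), tn / (tn + fp)
```

In a screening role, the prompt would be sent to the on-premise model and the returned label compared against the reference standard; the sensitivity threshold (here, above 0.70) governs how many recent fractures are caught for notification.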