Implementing a Resource-Light and Low-Code Large Language Model System for Information Extraction from Mammography Reports: A Case Study
Abstract
Background
Large Language Models (LLMs) have been successfully used to extract structured data from free-text radiology reports. Most current studies used proprietary models accessed via an Application Programming Interface (API). We aimed to evaluate the feasibility of using open-source LLMs, deployed on limited local hardware resources, to extract structured information from free-text mammography reports according to a Common Data Elements (CDE)-based framework.
Methods
Seventy-nine CDEs were defined by an interdisciplinary expert panel, reflecting real-world reporting practice. Sixty-one reports were classified by two independent researchers, with 1533 classifications assigned to establish ground truth. Five open-source LLMs, each deployable on a single GPU, were used for data extraction via the general-classifier Python package. Extractions were performed with two different prompt approaches, and classification metrics were calculated overall and for subgroups. Additional analyses were conducted using thresholds for the relative probability of classifications.
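Agreement between two independent raters is commonly summarized with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The following is a minimal sketch of that calculation in plain Python. It is not the authors' code; the function name `cohens_kappa` and the example labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")

    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal frequencies, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up labels for four items, for demonstration only.
rater_a = ["yes", "yes", "no", "no"]
rater_b = ["yes", "no", "no", "no"]
kappa = cohens_kappa(rater_a, rater_b)
```

In this toy example the raters agree on three of four items (observed agreement 0.75) while chance agreement is 0.5, so kappa is 0.5.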
Results
High inter-rater agreement was observed between manual classifiers (Cohen’s Kappa 0.83). Using default prompts, the LLMs achieved accuracies of 59.23–72.86%. Adapting prompts to better explain classification tasks improved performance for all models, with accuracies of 64.71–85.32%. Setting certainty thresholds further improved accuracies to >90% but reduced the coverage rate to <50%.
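The trade-off between accuracy and coverage under a certainty threshold can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: it assumes each classification comes with a score per candidate class, and it takes the "relative probability" to be the top class's share of the summed scores, which the abstract does not define. The names `relative_probability` and `evaluate_with_threshold` are hypothetical and do not come from the general-classifier package.

```python
def relative_probability(scores):
    """Return the top-scoring label and its share of the total score.

    Assumption: 'relative probability' is the top class's score divided by
    the sum over all candidate classes.
    """
    total = sum(scores.values())
    label = max(scores, key=scores.get)
    return label, scores[label] / total


def evaluate_with_threshold(predictions, truths, threshold):
    """Compute accuracy on retained classifications and the coverage rate.

    A classification is retained only if its relative probability reaches
    the threshold; all others are treated as unanswered.
    """
    kept = 0
    correct = 0
    for scores, truth in zip(predictions, truths):
        label, rel_prob = relative_probability(scores)
        if rel_prob >= threshold:
            kept += 1
            correct += int(label == truth)

    coverage = kept / len(truths)
    accuracy = correct / kept if kept else None  # undefined if nothing is retained
    return accuracy, coverage


# Made-up scores for four classifications, for demonstration only.
predictions = [
    {"a": 0.9, "b": 0.1},
    {"a": 0.55, "b": 0.45},
    {"a": 0.2, "b": 0.8},
    {"a": 0.6, "b": 0.4},
]
truths = ["a", "b", "b", "a"]

no_threshold = evaluate_with_threshold(predictions, truths, 0.0)
with_threshold = evaluate_with_threshold(predictions, truths, 0.7)
```

On this toy data, raising the threshold from 0.0 to 0.7 discards the two least certain classifications, so accuracy on the retained items rises while coverage falls by half. This mirrors the direction of the effect reported above, though the numbers are invented.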
Conclusion
Locally deployed open-source LLMs can effectively extract information from mammography reports with good accuracy, addressing data privacy concerns while maintaining compatibility with limited computational resources. Prompt engineering substantially increases performance, highlighting the importance of optimization in clinical applications. Using a CDE-based framework provides clear semantics and structure, facilitating interoperability and consistent data extraction.