Leveraging Language Models for Automated Label Generation in Traumatic Brain Injury Radiology Reports
Abstract
Timely interpretation of head CT scans is critical for managing traumatic brain injury (TBI), yet delays in radiology reporting can slow urgent clinical decisions. To address this challenge, we developed natural language processing (NLP) frameworks that automatically convert free-text radiology reports into structured, machine-readable findings. Using 4,038 de-identified head CT reports, including 444 expert-annotated samples, we compared several strategies for improving the accuracy of models that identify clinical findings and their locations. A lexicon-weighted domain-adaptive pretraining approach, designed to emphasize key disease-related terms, achieved the best overall performance, reaching a weighted F1-score of 0.92 under five-fold cross-validation. A location-aware cascade model further improved recognition of anatomical sites, enhancing transparency and clinical relevance. Semi-supervised learning using unlabeled reports produced moderate gains over standard supervised models. These results demonstrate that domain-specific adaptation and structured modeling can reliably extract critical findings from radiology text, enabling faster, more consistent interpretation of radiology reports. By supporting automated report generation from imaging and multimodal data, this framework may help shorten reporting turnaround times and facilitate more data-driven neurotrauma care.
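For readers unfamiliar with the reported metric, the following is a minimal sketch of how a weighted F1-score can be computed: each class's F1 is averaged with a weight proportional to that class's frequency in the reference labels. The function name `weighted_f1`, the single-label setup, and the toy labels are assumptions made for illustration; they are not taken from the study, which may use a multi-label formulation and would typically compute the score per fold and then average across the five folds.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (single-label case)."""
    support = Counter(y_true)  # number of reference examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each class by its share of the reference labels.
        score += (n / total) * f1
    return score


# Toy labels invented for illustration only; not data from the study.
y_true = ["hemorrhage", "hemorrhage", "fracture", "normal", "normal", "normal"]
y_pred = ["hemorrhage", "fracture", "fracture", "normal", "normal", "hemorrhage"]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.678
```

Because the weights follow class frequency, common findings dominate the score, so a high weighted F1 does not by itself guarantee strong performance on rare findings.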