Automated ACR TI-RADS Classification of Thyroid Nodules from Narrative Ultrasound Reports Using a Fine-Tuned Open-Source Language Model: A Reproducible and Low-Resource Framework

Abstract

Background: Assigning ACR TI-RADS categories manually from narrative ultrasound reports is a key component of thyroid nodule risk stratification, but it is laborious and subject to inter-observer variability. Large language models (LLMs) offer a potential solution, but existing approaches often rely on proprietary models or require extensive computational resources, which limits widespread adoption. This study aimed to develop and validate a reproducible, low-resource framework that uses a fine-tuned open-source LLM to automate this task.

Methods: This retrospective study used 1,850 de-identified thyroid ultrasound reports from a single primary center. Radiologists annotated the reports to establish the ground truth. An open-source 7-billion-parameter model (Qwen1.5-7B) was fine-tuned on a training set (n=1,480) using Low-Rank Adaptation (LoRA) on a single consumer-grade GPU. Performance was evaluated on a held-out internal test set (n=370) and on a separate external validation set (n=210) from another institution.

Results: On the internal test set, the fine-tuned model achieved an overall accuracy of 93.0% and a macro-averaged F1-score of 0.950. On the external validation set, it achieved an accuracy of 88.6% and a macro-averaged F1-score of 0.891, indicating that performance largely carried over to reports from a different institution. On both datasets, it significantly outperformed a zero-shot LLM baseline and a traditional machine learning model (TF-IDF with SVM).

Conclusions: Fine-tuning an accessible, open-source language model on local, consumer-grade hardware is an effective and resource-efficient strategy for automating ACR TI-RADS classification from narrative reports. The approach offers a practical blueprint that healthcare institutions can adapt to build their own AI tools, potentially improving workflow efficiency and diagnostic consistency while preserving data privacy.
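The two headline metrics reported above, overall accuracy and macro-averaged F1-score, can be illustrated with a minimal sketch in plain Python. This is not the authors' evaluation code: the label strings (`TR1` to `TR5`), the function names, and the toy predictions are assumptions made for illustration only. Macro averaging computes one F1-score per TI-RADS category and then takes the unweighted mean, so every category counts equally regardless of how many reports fall into it; the result can therefore sit above or below the overall accuracy.

```python
from collections import Counter

# Hypothetical label strings for the five ACR TI-RADS levels (an assumption;
# the preprint's actual label encoding is not given in the abstract).
TI_RADS_LEVELS = ["TR1", "TR2", "TR3", "TR4", "TR5"]


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted category matches the annotation."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-category F1-scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted category gains a false positive
            fn[t] += 1  # true category gains a false negative
    per_class = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Convention chosen here: a category with no true or predicted
        # instances contributes an F1 of 0.
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy data (invented, not from the study): one TR5 report is misread as TR4.
y_true = ["TR1", "TR2", "TR3", "TR4", "TR5", "TR5"]
y_pred = ["TR1", "TR2", "TR3", "TR4", "TR5", "TR4"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred, TI_RADS_LEVELS):.3f}")
```

In this toy case the single error lowers the F1-score of only two of the five categories, which shows how the two metrics respond differently to the same mistake.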
