Can Large Language Models Reliably Interpret Radiology Reports? A Systematic Evaluation for Tumor Progression Classification
Abstract
Radiology reports, typically recorded as unstructured free text or with varying levels of structure, contain critical information on tumor evolution but remain difficult to mine for care optimization or research without advanced language processing. We evaluated 15 open-source Large Language Models (LLMs) for classifying tumor evolution from French imaging reports, using a gold-standard corpus of 310 cases. We tested models across varied architectures, hyperparameter configurations, and prompting strategies, and compared them with rule-based and BERT-based baselines. We also systematically assessed development time and carbon emissions. Properly selected and configured, LLMs outperformed state-of-the-art baselines without requiring large manually annotated datasets, but consumed substantial computational resources. In contrast, fine-tuned BERT models, trained on high-quality annotations, achieved only slightly lower performance at reduced hardware and computational cost. Our results highlight a trade-off between human annotation effort and computational infrastructure, offering guidance for transforming unstructured clinical reports into structured, actionable data.
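
To make the setup concrete, the sketch below illustrates the kind of zero-shot, prompt-based classification the abstract describes, assuming the Hugging Face transformers library; the model name, label set, and French prompt wording are illustrative assumptions, not the paper's actual protocol.

# Not from the paper's materials: a minimal sketch of prompt-based tumor
# evolution classification, assuming the Hugging Face `transformers`
# library. Model name, labels, and prompt are illustrative assumptions.
from transformers import pipeline

LABELS = ["progression", "stable", "réponse"]  # hypothetical label set

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumed open-source LLM
)

def classify_report(report_text: str) -> str:
    # Zero-shot prompt asking for a single-word label, written in French
    # since the corpus consists of French imaging reports.
    prompt = (
        "Classe l'évolution tumorale décrite dans ce compte rendu "
        f"parmi : {', '.join(LABELS)}.\n"
        f"Compte rendu : {report_text}\n"
        "Réponds par un seul mot."
    )
    completion = generator(
        prompt, max_new_tokens=8, return_full_text=False
    )[0]["generated_text"].strip().lower()
    # Map the free-form output back onto the label set; fall back to
    # 'stable' when the model answers off-label (an arbitrary choice).
    return next((label for label in LABELS if label in completion), "stable")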