Development of a RAG-based Expert LLM for Clinical Support in Radiation Oncology
Abstract
The ability of pre-trained large language models (LLMs) to rapidly master novel natural language processing tasks holds transformative potential. However, pre-trained LLMs often struggle to achieve high performance in specialized domains such as oncology and tend to deliver incorrect information confidently (“hallucinate”), limiting their utility in such contexts. Retrieval-augmented generation (RAG) addresses this limitation by dynamically incorporating authoritative, domain-specific knowledge directly into the LLM’s inference process, significantly enhancing performance without the extensive fine-tuning or retraining such specialization typically requires.
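The RAG mechanism described above can be sketched minimally: retrieve the knowledge-base passages most relevant to a query, then prepend them to the prompt so the model answers from authoritative text rather than parametric memory. The toy word-overlap retriever below is an illustrative assumption; production pipelines use dense embedding similarity and an actual LLM call.

```python
# Minimal sketch of a RAG pipeline. The word-overlap retriever and the
# sample knowledge base are illustrative assumptions, not the paper's method;
# real systems retrieve with dense embeddings and pass the prompt to an LLM.

def retrieve(query, knowledge_base, k=2):
    """Rank knowledge-base passages by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda passage: len(q_words & set(passage.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, passages):
    """Prepend retrieved context so the model answers from it, not memory."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
    )

kb = [
    "Prostate cancer is commonly treated with external beam radiotherapy.",
    "NCCN guidelines recommend risk stratification before treatment.",
    "Unrelated passage about clinic scheduling.",
]
query = "What do NCCN guidelines recommend before treatment?"
prompt = build_prompt(query, retrieve(query, kb))
```

The prompt produced here would then be sent to the generator model; grounding the answer in retrieved text is what suppresses hallucination.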
In this study, we demonstrate the strong performance of a minimalist RAG pipeline (without additional model fine-tuning) on radiation oncology board-style examinations. Leveraging a curated knowledge base sourced from Gunderson & Tepper’s Clinical Radiation Oncology, Fifth Edition and the NCCN guidelines, our model substantially surpassed contemporary OpenAI models, achieving 91.5% accuracy on the 2021 American College of Radiology (ACR) TXIT examination. This result markedly exceeds the benchmarks set by previous LLM-based approaches in this field, which attained a maximum accuracy of 74%.
Crucially, our model exhibited robust self-awareness of its knowledge boundaries, addressing a key weakness of pre-trained LLMs: questions answered incorrectly were reliably flagged with low confidence scores (mean 4.12/10 vs. 7.36/10 for correct answers), highlighting areas inadequately represented in the RAG knowledge base. This uncertainty estimation underscores RAG’s strength in enhancing not only accuracy but also the reliability and interpretability of model outputs.
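Because incorrect answers cluster at low self-reported confidence, a simple threshold can route uncertain answers to human review. The sketch below assumes a 1–10 confidence scale as in the abstract; the threshold value of 6.0 is a hypothetical choice for illustration, not a figure from the study.

```python
# Hedged sketch: triage model answers by self-reported confidence (1-10).
# The 6.0 threshold is an illustrative assumption; in practice it would be
# tuned on a validation set against observed accuracy per confidence bin.

def triage(answers, threshold=6.0):
    """Split (answer, confidence) pairs into accepted vs. flagged-for-review."""
    accepted = [a for a, c in answers if c >= threshold]
    flagged = [a for a, c in answers if c < threshold]
    return accepted, flagged

answers = [("Answer A", 8.5), ("Answer B", 4.0), ("Answer C", 7.0)]
accepted, flagged = triage(answers)
# accepted == ["Answer A", "Answer C"]; flagged == ["Answer B"]
```

Flagged items would be deferred to a clinician, turning the confidence signal into a safety mechanism rather than just a score.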
We demonstrate that integrating domain-specific knowledge via RAG significantly enhances large language model performance in radiation oncology, enabling reliable confidence scoring previously unattainable with pre-trained LLMs. This scalable approach may be well-suited for clinical decision support and medical education. Future efforts will incorporate clinical guidelines and select primary literature to broaden applicability.