Systematic Evaluation of Multilingual Retrieval-Augmented Generation for Gastrointestinal Tumor Board Decision Support


Abstract

Large language models (LLMs) have been proposed as decision support tools for multidisciplinary tumor boards, yet systematic preclinical validation of retrieval-augmented generation (RAG) pipelines remains lacking. In this retrospective framework validation study using real-world clinical data, we applied a modular evaluation framework to 100 gastrointestinal tumor board cases spanning five cancer types, systematically testing 16 configurations varying model variant, multilingual retrieval strategy, query formulation, and corpus scope. Baseline concordance with multidisciplinary team recommendations ranged from 79–85%. Combining query rewriting with curated guideline retrieval improved concordance to 93–95% (p < 0.01), with prompt design and corpus curation exerting greater influence than model selection. Among residual discordant cases in optimal configurations, approximately 60% represented clinically inappropriate recommendations rather than acceptable therapeutic alternatives. These findings demonstrate that systematic RAG optimization substantially improves clinical decision support concordance, while the high rate of inappropriate residual errors underscores the necessity of mandatory expert oversight before any clinical deployment.
