Clinical Large Language Models with Multi-Stage Instruction Tuning and Advanced Retrieval-Augmented Generation

Abstract

The demand for efficient and accurate Clinical Decision Support Systems (CDSS) is growing rapidly, driven by the escalating volume of medical data. While Large Language Models (LLMs) offer significant potential, their direct application in healthcare is limited by issues like hallucinations and lack of domain-specific knowledge. Retrieval-Augmented Generation (RAG) addresses these challenges by grounding LLMs with external knowledge, and recent lightweight RAG-based CDSS have shown promise. Building on this, we propose Enhanced Clinical RAG-LLM (ECRAG-LLM), a novel system designed to elevate performance in complex clinical scenarios. ECRAG-LLM utilizes a robust yet lightweight Mistral-based LLM, integrated with a multi-stage instruction tuning strategy that first adapts to general medical knowledge and then reinforces context-aware and causal reasoning using a custom dataset of structured clinical cases. We employ BioSimCSE for domain-specific embeddings and introduce an enhanced RAG architecture featuring hybrid retrieval, cross-encoder-based contextual re-ranking, and context summarization to optimize retrieved information. Extensive experiments on medical benchmarks demonstrate that ECRAG-LLM consistently outperforms baseline lightweight fine-tuned LLMs, achieving significant improvements in diagnostic accuracy, treatment appropriateness, and explanatory quality, particularly in tasks requiring deep clinical reasoning. An ablation study confirms the synergistic contributions of our innovations, and an error analysis highlights a substantial reduction in critical errors, positioning ECRAG-LLM as a more reliable and intelligent solution for resource-constrained clinical environments.
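The retrieval pipeline described above — hybrid retrieval over domain embeddings, cross-encoder re-ranking, and context summarization before generation — can be illustrated with a minimal, self-contained sketch. All function names and the toy scoring below are hypothetical stand-ins for illustration only; the actual system uses BioSimCSE embeddings, a learned cross-encoder, and a Mistral-based LLM, none of which are reproduced here.

```python
# Illustrative sketch of the three RAG stages: hybrid retrieval
# (sparse keyword score + dense embedding score), re-ranking, and
# context summarization. Toy stand-ins, not the authors' implementation.
import math
from collections import Counter

DOCS = [
    "Metformin is first-line therapy for type 2 diabetes.",
    "ACE inhibitors are used to treat hypertension and heart failure.",
    "Type 2 diabetes management includes lifestyle changes and metformin.",
]

def keyword_score(query, doc):
    # Sparse component: simple term-overlap proxy for BM25.
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def embed(text):
    # Dense component: toy character-trigram vector standing in for BioSimCSE.
    grams = Counter(text.lower()[i:i + 3] for i in range(len(text) - 2))
    norm = math.sqrt(sum(v * v for v in grams.values()))
    return {g: v / norm for g, v in grams.items()}

def dense_score(query, doc):
    # Cosine similarity between the toy embeddings.
    qe, de = embed(query), embed(doc)
    return sum(qe[g] * de.get(g, 0.0) for g in qe)

def hybrid_retrieve(query, docs, k=2, alpha=0.5):
    # Blend sparse and dense scores, keep the top-k candidates.
    scored = [(alpha * keyword_score(query, d) + (1 - alpha) * dense_score(query, d), d)
              for d in docs]
    return [d for _, d in sorted(scored, reverse=True)[:k]]

def rerank(query, candidates):
    # Stand-in for a cross-encoder: jointly re-score each (query, doc) pair.
    return sorted(candidates, key=lambda d: dense_score(query, d), reverse=True)

def summarize(contexts, max_words=20):
    # Context summarization: compress retrieved passages before prompting.
    words = " ".join(contexts).split()
    return " ".join(words[:max_words])

query = "first-line treatment for type 2 diabetes"
context = summarize(rerank(query, hybrid_retrieve(query, DOCS)))
prompt = f"Context: {context}\nQuestion: {query}"
```

In a production pipeline, the summarized context would be prepended to the user's question and passed to the instruction-tuned LLM; the blending weight (here `alpha`) and the re-ranking depth are the usual tuning knobs for trading recall against prompt length.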
