Domain-Specific Fine-Tuning in a Retrieval-Augmented Generation Framework for Precision Geriatric Medical QA
Abstract
Large language models (LLMs) have demonstrated remarkable general-purpose capabilities, yet they often struggle to accurately answer complex questions in specialized domains such as geriatric medicine. Although existing Retrieval-Augmented Generation (RAG) methods can mitigate hallucinations by leveraging external knowledge sources, they lack robust domain adaptation, resulting in persistent issues of inaccurate or irrelevant answers. To address this limitation, we present a novel domain-specific fine-tuning approach within the RAG framework, complemented by a specially constructed geriatric medical dataset designed for RAG tasks. Through full-parameter fine-tuning of a large language model, our method achieves a 6–10 percentage-point increase in GPT-4-based answer accuracy over general-purpose RAG baselines on a geriatric medicine test set. Human evaluations further confirm the enhanced professionalism and clinical relevance of the model's outputs. Notably, the model maintains strong performance on LongBench's general capability benchmarks, underscoring both the specificity and generalizability of our strategy. This study provides a new pathway toward building more reliable and domain-focused medical QA systems, offering insights for future RAG applications in specialized fields.