RAGCare-QA: A Benchmark Dataset for Evaluating Retrieval-Augmented Generation Pipelines in Theoretical Medical Knowledge


Abstract

This paper introduces RAGCare-QA, a dataset of 420 theoretical medical knowledge questions for assessing Retrieval-Augmented Generation (RAG) pipelines in medical education and evaluation settings. The dataset comprises single-choice questions from six medical specialties (Cardiology, Endocrinology, Gastroenterology, Family Medicine, Oncology, and Neurology) at three levels of complexity (Basic, Intermediate, and Advanced). Each question is annotated with the RAG implementation complexity level that best fits it: Basic RAG (315 questions, 75.0%), Multi-vector RAG (82 questions, 19.5%), or Graph-enhanced RAG (23 questions, 5.5%). The questions emphasize theoretical medical knowledge of fundamental concepts, pathophysiology, diagnostic criteria, and treatment principles important in medical education. The dataset is a useful tool for assessing RAG-based medical education systems, allowing researchers to fine-tune retrieval methods for different categories of theoretical medical knowledge questions.
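A typical first step with such a benchmark is to stratify questions by their annotated RAG complexity level before routing them to different retrieval pipelines. The following sketch illustrates this, assuming a hypothetical record schema (`specialty`, `difficulty`, `rag_level`); the actual field names and file format of RAGCare-QA may differ.

```python
from collections import Counter

# Hypothetical records mimicking the RAGCare-QA annotation schema;
# the real dataset's field names and serialization may differ.
questions = [
    {"specialty": "Cardiology", "difficulty": "Basic", "rag_level": "Basic RAG"},
    {"specialty": "Neurology", "difficulty": "Intermediate", "rag_level": "Multi-vector RAG"},
    {"specialty": "Oncology", "difficulty": "Advanced", "rag_level": "Graph-enhanced RAG"},
    {"specialty": "Endocrinology", "difficulty": "Basic", "rag_level": "Basic RAG"},
]

def rag_level_distribution(records):
    """Count questions per recommended RAG architecture and compute each share."""
    counts = Counter(r["rag_level"] for r in records)
    total = len(records)
    return {level: (n, 100.0 * n / total) for level, n in counts.items()}

for level, (n, pct) in sorted(rag_level_distribution(questions).items()):
    print(f"{level}: {n} questions ({pct:.1f}%)")
```

On the full dataset, the same tally would reproduce the reported split (75.0% Basic, 19.5% Multi-vector, 5.5% Graph-enhanced), and the per-level subsets can then be fed to the matching retrieval strategy for evaluation.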

VALUE OF THE DATA

  • The RAGCare-QA dataset is designed to benchmark state-of-the-art RAG architecture recommendations for theoretical medical knowledge through 420 human-annotated single-choice questions, well distributed across six medical specialties.

  • Researchers can leverage this resource to build more effective educational tools that adapt their retrieval strategies based on question complexity and medical specialty.

  • The dataset fills a gap in medical AI by providing a standardized benchmark that supports the development of AI-based adaptive educational tools.

  • The dataset classifies each question by the most suitable RAG architecture (Basic, Multi-vector, or Graph-enhanced) needed for context retrieval, enabling precise performance comparisons across retrieval strategies.

  • The dataset can serve as a foundation for the development of specialized retrieval strategies to enhance learning outcomes in medical education.
