Resource-efficient medical vision language model for dermatology via a synthetic data generation framework
Abstract
Vision-language models (VLMs), with their ability to integrate visual and textual information, have enabled unified and interpretable multimodal reasoning. However, developing explainable, image-based artificial intelligence (AI) systems for medicine requires locally deployable models that support privacy-preserving data workflows. Here, we present SCALEMED (Scalable Clinical Assistants and LEarning for MEDicine), a modular framework that enables the development of locally deployable medical VLMs using small models and synthetic data. The SCALEMED framework integrates clinician data annotation, open-source image-text data collection, synthetic data generation through knowledge transfer from larger VLMs, and fine-tuning of small VLMs to develop domain-specific medical AI systems. As a use case in dermatology, we train a resource-efficient VLM, DermatoLlama, which achieves higher success rates in report generation than state-of-the-art VLMs across text- and image-based evaluation datasets. DermatoLlama, based on Llama 3.2, is trained on DermaSynth, a dataset comprising 1.2 million synthetic text samples generated from 367 expert-crafted seed tasks and 82,379 open-source dermatological images. The SCALEMED framework offers a practical solution for developing explainable and accessible medical AI systems, particularly in resource-constrained healthcare environments.
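To make the synthetic-data step of the pipeline concrete, the sketch below illustrates how a larger "teacher" VLM might be prompted with expert-crafted seed tasks over a directory of dermatological images to produce image-text instruction pairs for fine-tuning a small VLM. This is a minimal illustration under our own assumptions, not the authors' released code: the function query_teacher_vlm, the files seed_tasks.json and dermasynth_sample.jsonl, and the images/ directory are all hypothetical placeholders.

```python
"""
Minimal sketch of a SCALEMED-style synthetic-data generation loop.
A larger teacher VLM answers expert-written seed tasks about each image,
and the answers are stored as instruction-tuning records (JSONL).
All file names and the teacher call are illustrative placeholders.
"""
import json
from pathlib import Path


def query_teacher_vlm(image_path: str, prompt: str) -> str:
    """Placeholder for a call to a larger vision-language model
    (hosted API or locally served). Replace with a real client call."""
    return f"[teacher answer to '{prompt}' for {Path(image_path).name}]"


def build_synthetic_dataset(image_dir: str, seed_task_file: str, out_file: str) -> None:
    # Expert-crafted seed tasks, e.g. [{"instruction": "Describe the lesion ..."}, ...]
    seed_tasks = json.loads(Path(seed_task_file).read_text())

    with open(out_file, "w") as f:
        for image_path in sorted(Path(image_dir).glob("*.jpg")):
            for task in seed_tasks:
                # Knowledge transfer: the teacher VLM generates the response text.
                answer = query_teacher_vlm(str(image_path), task["instruction"])
                record = {
                    "image": str(image_path),
                    "instruction": task["instruction"],
                    "response": answer,
                }
                f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    build_synthetic_dataset("images/", "seed_tasks.json", "dermasynth_sample.jsonl")
```

The resulting JSONL records (image path, instruction, teacher response) are the kind of paired image-text samples that could then be used to fine-tune a small, locally deployable VLM such as a Llama 3.2-based model.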