Semantic Encoding in Medical LLMs for Vocabulary Standardisation

Abstract

High-quality, standardised medical data availability remains a bottleneck for digital health and AI model development. A major hurdle is translating noisy free text into controlled clinical vocabularies to achieve harmonisation and interoperability, especially when source datasets are inconsistent or incomplete. We benchmark domain-specific encoder models against general LLMs for semantic-embedding retrieval using minimal vocabulary building blocks and test several prompting techniques. We also evaluate prompt augmentation with LLM-generated differential definitions. We test these prompts on open-source Llama and medically fine-tuned Llama models to steer their alignment toward accurate concept assignment across multiple prompt formats. Domain-tuned models consistently outperform general models of the same size in retrieval and generative tasks. However, performance is sensitive to prompt design and model size, and the benefits of adding LLM-generated context are inconsistent. While newer, larger foundation models are closing the gap, today’s lightweight open-source generative LLMs lack the stability and embedded clinical knowledge needed for deterministic vocabulary standardisation.
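
The semantic-embedding retrieval setup described above can be illustrated with a minimal sketch: encode the vocabulary's concept terms and the incoming free text with a sentence-embedding model, then assign the nearest concept by cosine similarity. This is not the paper's actual pipeline; the model name and the toy vocabulary below are illustrative placeholders.

```python
# Minimal sketch of embedding-based vocabulary standardisation.
# Assumptions: sentence-transformers is installed; the encoder and the
# three-concept "vocabulary" are stand-ins, not the benchmarked setup.
from sentence_transformers import SentenceTransformer, util

# Tiny stand-in for a controlled clinical vocabulary (concept_id, preferred term).
vocabulary = [
    ("C0011849", "Diabetes mellitus"),
    ("C0020538", "Hypertensive disease"),
    ("C0004096", "Asthma"),
]

# Any sentence-embedding encoder could be slotted in here; the abstract compares
# domain-tuned biomedical encoders against general-purpose models at this step.
model = SentenceTransformer("all-MiniLM-L6-v2")

concept_texts = [term for _, term in vocabulary]
concept_embeddings = model.encode(concept_texts, convert_to_tensor=True)

def standardise(free_text: str):
    """Map noisy free text to the closest vocabulary concept by cosine similarity."""
    query_embedding = model.encode(free_text, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, concept_embeddings)[0]
    best = int(scores.argmax())
    return vocabulary[best], float(scores[best])

# Example: a noisy clinical phrase mapped to its standardised concept.
print(standardise("pt has high blood pressure"))
```

The prompt-based variants in the abstract replace or augment this retrieval step with a generative LLM, for example by adding LLM-generated differential definitions of candidate concepts to the prompt before asking the model to pick one.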
