Large Language Models Struggle to Encode Medical Concepts — A Multilingual Benchmarking and Comparative Analysis


Abstract

Interoperability in health information systems is crucial for accurate data exchange across environments such as electronic health records, clinical notes, and medical research. The main challenge arises from the wide variation in biomedical concepts, their differing representations across systems and languages, and the limited context available, all of which complicate data integration and standardization. Inspired by recent advances in large language models (LLMs), this study explores their potential role as biomedical knowledge engineers to (semi-)automate multilingual biomedical concept normalization, a key task for the semantic interoperability of medical concepts. We developed a novel multilingual dataset comprising 59,104 unique terms mapped to 27,280 distinct biomedical concepts, designed to assess language model performance on this task across five European languages: English, French, German, Spanish, and Turkish. We then proposed a multi-stage pipeline based on a retrieve-then-rerank approach that combines sparse and dense retrievers, rerankers, and fusion methods, leveraging both discriminative and generative LLMs, with a predefined primary knowledge organization system. Our experiments show that the best discriminative model, e5, achieves an accuracy of 71%, surpassing the best generative model, Mistral, by 2 percentage points (p < 0.001). For semi-automated workflows, e5 maintained its superior performance with 82% recall@10 versus Mistral's 78%. Our findings demonstrate how LLM-based approaches can advance the normalization of multilingual biomedical terms, while also exposing the limitations of LLMs in encoding biomedical concepts.
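
To make the retrieve-then-rerank idea concrete, below is a minimal sketch of its first stage with reciprocal rank fusion, assuming a toy in-memory terminology. The concept identifiers, the sparse and dense stand-in scorers, and the retrieve_then_rerank helper are illustrative placeholders, not the authors' implementation or models.

```python
# Minimal sketch of stage 1 of a retrieve-then-rerank pipeline for concept
# normalization, with reciprocal rank fusion (RRF) over two retrievers.
# The terminology, scorers, and helper names are hypothetical examples.
from collections import Counter
import math

# Hypothetical primary knowledge organization system: concept ID -> preferred term
TERMINOLOGY = {
    "C0011849": "diabetes mellitus",
    "C0020538": "hypertensive disease",
    "C0004096": "asthma",
}

def sparse_score(query, candidate):
    """Bag-of-words token overlap as a stand-in for a sparse retriever (e.g. BM25)."""
    q, c = Counter(query.lower().split()), Counter(candidate.lower().split())
    return sum((q & c).values())

def dense_score(query, candidate):
    """Character trigram cosine similarity as a stand-in for a dense retriever."""
    def ngrams(s, n=3):
        s = s.lower()
        return Counter(s[i:i + n] for i in range(len(s) - n + 1))
    q, c = ngrams(query), ngrams(candidate)
    dot = sum(q[g] * c[g] for g in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in c.values()))
    return dot / norm if norm else 0.0

def retrieve_then_rerank(mention, k=10):
    """Rank all concepts with both retrievers and fuse the two rankings via RRF.
    Stage 2 (LLM-based reranking of the top-k shortlist) is omitted here."""
    fused = Counter()
    for scorer in (sparse_score, dense_score):
        ranked = sorted(TERMINOLOGY, key=lambda cid: scorer(mention, TERMINOLOGY[cid]), reverse=True)
        for rank, cid in enumerate(ranked, start=1):
            fused[cid] += 1.0 / (60 + rank)  # 60 is the conventional RRF constant
    return [cid for cid, _ in fused.most_common(k)]

print(retrieve_then_rerank("diabetes type 2"))  # shortlist led by C0011849
```

In the setting the abstract describes, the stand-in scorers would be replaced by actual sparse and dense retrievers (for example, an e5-style embedding model on the dense side), and the fused shortlist would then be passed to a discriminative or generative LLM reranker.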
