Generalist Large Language Models in a Specialized World: Evidence from the Italian National Medical Education Pathway
Abstract
Creating language-specific and domain-specific large language models (LLMs) presents substantial challenges, including computational demands and limited data availability. While it is commonly believed that the benefits of specialized models justify these costs, we dispute this notion with a comparative evaluation in a low-resource language and a specialized medical domain. We analyze the performance of various LLMs on the Italian healthcare domain using novel, previously unpublished datasets consisting of five-choice questions from national pre-university and post-university medical exams, covering both clinical and preclinical fields. As part of this work, we release these datasets to the research community. We evaluated multilingual and Italian-specific models, along with general-purpose and healthcare-specific models, spanning open-source and proprietary architectures of varying sizes. Our results show that multilingual, general-purpose large models consistently exceed the pass threshold across all tests, with the best models achieving over 90% accuracy on postgraduate-level exams. Model size emerged as the most critical factor influencing performance, whereas domain specialization and single-language localization offered no measurable advantage. These findings challenge the traditional pretrain-then-finetune paradigm for domain and language localization of language models, suggesting that advances in general-purpose multilingual models may render domain-specific pretraining unnecessary in many specialized settings.