Prior Knowledge Shapes Success When Large Language Models Are Fine-Tuned for Biomedical Term Normalization

Abstract

Large language models (LLMs) often fail to correctly associate biomedical terms with their standardized ontology identifiers, posing challenges for downstream applications that rely on accurate, machine-readable codes. These linking failures can compromise the integrity of data used in precision medicine, clinical decision support, and population health. Fine-tuning can partially remedy these issues, but the degree of improvement varies across terms and terminologies. Focusing on the Human Phenotype Ontology (HPO), we show that a model’s prior knowledge of term–identifier pairs, acquired during pre-training, strongly predicts whether fine-tuning will enhance its linking accuracy. We evaluate prior knowledge in three complementary ways: (1) latent probabilistic knowledge, revealed through stochastic prompting, which captures hidden associations not evident in deterministic output; (2) partial subtoken knowledge, reflected in incomplete but non-random generation of identifier components; and (3) term familiarity, inferred from annotation frequencies in the biomedical literature, which serve as a proxy for training exposure. We then assess how these forms of prior knowledge influence the accuracy of deterministic identifier linking. Fine-tuning performance varies most for terms in what we call the reactive middle zone of the ontology: terms with intermediate levels of prior knowledge that are neither absent nor fully consolidated. Fine-tuning was most successful when prior knowledge, as measured by partial subtoken knowledge, was ‘weak’ or ‘medium’, or when prior knowledge, as measured by latent probabilistic knowledge, was ‘unknown’ or ‘weak’ (p < 0.001). These terms from the reactive middle exhibited the largest gains or losses in accuracy during fine-tuning, suggesting that the success of knowledge injection depends critically on the level of term–identifier pair knowledge in the LLM before fine-tuning.
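The sketch below illustrates how two of the probes named above, latent probabilistic knowledge and partial subtoken knowledge, might be operationalized. It is a minimal illustration under stated assumptions, not the authors' protocol: the prompt wording, function names, HPO identifier pattern, example identifiers, and the `sample_completion` stand-in for a temperature-sampled LLM call are all hypothetical.

```python
import re
import random
from typing import Callable

# HPO identifiers have the form "HP:" followed by seven digits, e.g. HP:0001250.
HPO_ID_PATTERN = re.compile(r"HP:\d{7}")


def latent_probabilistic_knowledge(
    sample_completion: Callable[[str], str],
    term: str,
    gold_id: str,
    n_samples: int = 50,
) -> float:
    """Fraction of stochastic (temperature > 0) completions containing the gold
    identifier for `term`. A nonzero score reveals a latent association even
    when the deterministic (greedy) answer is wrong."""
    prompt = f"What is the Human Phenotype Ontology identifier for '{term}'?"
    hits = 0
    for _ in range(n_samples):
        match = HPO_ID_PATTERN.search(sample_completion(prompt))
        if match and match.group(0) == gold_id:
            hits += 1
    return hits / n_samples


def partial_subtoken_knowledge(predicted_id: str, gold_id: str) -> float:
    """One plausible scoring of incomplete but non-random identifier generation:
    the proportion of gold digits reproduced in the correct positions."""
    pred_digits = re.sub(r"\D", "", predicted_id).zfill(7)[:7]
    gold_digits = re.sub(r"\D", "", gold_id).zfill(7)[:7]
    return sum(p == g for p, g in zip(pred_digits, gold_digits)) / len(gold_digits)


if __name__ == "__main__":
    # Dummy sampler standing in for repeated temperature-sampled LLM calls.
    def dummy_sampler(prompt: str) -> str:
        return random.choice(["HP:0001250", "HP:0001251", "I am not sure."])

    print(latent_probabilistic_knowledge(dummy_sampler, "seizure", "HP:0001250"))
    print(partial_subtoken_knowledge("HP:0001251", "HP:0001250"))  # 6 of 7 digits match
```

In this framing, a term with a moderate sampling hit rate or a partially correct digit string would fall in the reactive middle zone described in the abstract, where fine-tuning outcomes vary most.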