An active learning pipeline to automatically identify candidate terms for a CDSS ontology—measures, experiments, and performance

Shailesh Alluri
Keerthana Komatineni
Rohan Goli
Richard D. Boyce
Nina Hubig
Hua Min
Yang Gong
Dean F. Sittig
David Robinson
Paul Biondich
Adam Wright
Christian Nøhr
Timothy Law
Arild Faxvaag
Ronald Gimbel
Lior Rennert
Xia Jing

Abstract

In this paper, we present an active learning framework designed to automatically identify key phrases relevant to a Clinical Decision Support System (CDSS) ontology. A CDSS ontology can play a critical role in standardizing medical vocabulary and enabling seamless data integration across healthcare systems. Traditional methods for ontology development are manual and labor-intensive, and they require significant domain expertise. Our approach combines a BiLSTM-CRF model with a human-in-the-loop active learning pipeline that progressively incorporates human experts' feedback to improve the model's accuracy. We implement uncertainty sampling as our core document selection strategy, prioritizing for human review the instances where the model exhibits low confidence. We introduce new uncertainty aggregation methods (KPSum, KPAvg, DOCSum, and DOCAvg) which, in combination with uncertainty measures such as Maximum Token Probability (MTP), Token Entropy (TE), and Margin, are used to calculate document-level confidence scores. These methods improve document selection in active learning and make it transparent and replicable, ensuring that the most informative documents are prioritized for annotation. This study underscores the value of active learning in ontology development, where it can play a significant role in reducing manual effort and supporting human experts, especially during the long-term maintenance stage of an ontology.
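The abstract names the uncertainty measures and aggregation methods but does not define them, so the following is only a minimal sketch under assumed definitions, not the authors' implementation. It assumes that each token comes with a probability distribution over tags, that MTP uncertainty is one minus the highest tag probability, that Margin uncertainty is one minus the gap between the two highest probabilities, and that the "KP" variants aggregate over tokens predicted to lie inside a key phrase while the "DOC" variants aggregate over every token in the document. The names `doc_score`, `rank_documents`, and `O_TAG_INDEX` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

# One token = a probability distribution over tags, e.g. [P(O), P(B-KP), P(I-KP)].
TokenProbs = Sequence[float]
O_TAG_INDEX = 0  # assumption: index 0 is the "outside any key phrase" tag


def mtp_uncertainty(p: TokenProbs) -> float:
    """Assumed MTP-based uncertainty: 1 - highest tag probability."""
    return 1.0 - max(p)


def token_entropy(p: TokenProbs) -> float:
    """Shannon entropy of the tag distribution (natural log)."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def margin_uncertainty(p: TokenProbs) -> float:
    """Assumed margin-based uncertainty: 1 - (top prob - second prob)."""
    top, second = sorted(p, reverse=True)[:2]
    return 1.0 - (top - second)


def in_key_phrase(p: TokenProbs) -> bool:
    """A token counts as key-phrase text if its most likely tag is not O."""
    return max(range(len(p)), key=lambda i: p[i]) != O_TAG_INDEX


def doc_score(doc: List[TokenProbs],
              measure: Callable[[TokenProbs], float],
              aggregation: str) -> float:
    """Aggregate token uncertainties with KPSum, KPAvg, DOCSum, or DOCAvg."""
    if aggregation not in {"KPSum", "KPAvg", "DOCSum", "DOCAvg"}:
        raise ValueError(f"unknown aggregation: {aggregation}")
    tokens = [t for t in doc if in_key_phrase(t)] if aggregation.startswith("KP") else doc
    values = [measure(t) for t in tokens]
    if not values:  # no tokens (or no predicted key phrases) to score
        return 0.0
    total = sum(values)
    return total if aggregation.endswith("Sum") else total / len(values)


def rank_documents(docs: Dict[str, List[TokenProbs]],
                   measure: Callable[[TokenProbs], float],
                   aggregation: str) -> List[str]:
    """Return document ids, most uncertain first (candidates for annotation)."""
    return sorted(docs, key=lambda d: doc_score(docs[d], measure, aggregation),
                  reverse=True)


# Toy example with made-up probabilities.
docs = {
    "doc_a": [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25]],
    "doc_b": [[0.98, 0.01, 0.01], [0.05, 0.90, 0.05]],
}
print(rank_documents(docs, mtp_uncertainty, "DOCAvg"))
print(rank_documents(docs, mtp_uncertainty, "KPSum"))
```

Under these assumptions, the choice of aggregation can change which document is selected: a sum favors long documents or documents with many predicted key phrases, while an average normalizes for length. Making the measure and aggregation explicit parameters is one way to keep the selection step transparent and repeatable.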

Version published to 10.1101/2025.04.15.25325868 on medRxiv
Apr 17, 2025

