An active learning pipeline to automatically identify candidate terms for a CDSS ontology—measures, experiments, and performance

Abstract

In this paper, we present an active learning framework designed to automatically identify key phrases relevant to a Clinical Decision Support System (CDSS) ontology. A CDSS ontology can play a critical role in standardizing medical vocabulary and enabling seamless data integration across healthcare systems. Traditional methods for ontology development are manual and labor-intensive, and they require significant domain expertise. Our approach combines a BiLSTM-CRF model with a human-in-the-loop active learning pipeline that progressively incorporates feedback from human experts to improve the model's accuracy. We implement uncertainty sampling as our core document selection strategy, prioritizing for human review the instances on which the model exhibits low confidence. We introduce new uncertainty aggregation methods (KPSum, KPAvg, DOCSum, and DOCAvg) which, in combination with uncertainty measures such as Maximum Token Probability (MTP), Token Entropy (TE), and Margin, are used to calculate document-level confidence scores. These methods improve document selection in active learning and make it transparent and replicable, ensuring that the most informative documents are prioritized for annotation. This study underscores the value of active learning in ontology development, where it can significantly reduce manual effort and support human experts, especially during the long-term maintenance stage of an ontology.
