Active learning pipeline to automatically identify candidate terms for a CDSS ontology—measures, experiments, and performance

Abstract

In this paper, we present an active learning framework designed to improve the automatic identification of key phrases relevant to a Clinical Decision Support System (CDSS) ontology. A CDSS ontology can play a critical role in standardizing medical vocabulary and enabling seamless data integration across healthcare systems. Traditional methods for ontology development are manual and labor-intensive, and they require significant domain expertise. Our approach combines a Named Entity Recognition (NER) component based on a BiLSTM-CRF model with an active learning loop that progressively incorporates human experts’ feedback to improve the model’s accuracy. We implement uncertainty sampling as our core data selection strategy, prioritizing for human review the instances on which the model exhibits low confidence. We introduce new uncertainty aggregation methods (KPSum, KPAvg, DOCSum, and DOCAvg) which, in combination with uncertainty measures such as Maximum Token Probability (MTP), Token Entropy (TE), and Margin, are used to calculate document-level confidence scores. These methods improve the selection process in active learning by ensuring that the most informative documents are prioritized for annotation. This study underscores the value of active learning in facilitating ontology development.
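
The document-level scoring described in the abstract can be illustrated with a minimal Python sketch. The abstract names the measures and aggregation methods but does not define them, so everything below is an assumption: MTP is taken as one minus the highest label probability of a token, Margin as one minus the gap between the two most probable labels, the KP* methods as aggregating only over tokens inside predicted key phrases, and the DOC* methods as aggregating over every token. The function names and the input layout (per-token label probabilities plus a key-phrase flag per token) are hypothetical and are not taken from the paper.

```python
import math

# Token-level uncertainty measures. Each takes one token's label probability
# distribution (a list summing to 1) and returns a score where higher means
# "less confident". These definitions are assumptions, not the paper's.

def mtp_uncertainty(probs):
    # Maximum Token Probability: low top probability -> high uncertainty.
    return 1.0 - max(probs)

def token_entropy(probs):
    # Shannon entropy of the label distribution (natural log).
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def margin_uncertainty(probs):
    # Small gap between the two most likely labels -> high uncertainty.
    top1, top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top1 - top2)

def aggregate(scores, in_phrase, method):
    # Assumed semantics: "KP*" uses only tokens flagged as part of a
    # predicted key phrase, "DOC*" uses all tokens; "*Sum" adds the
    # selected scores, "*Avg" averages them.
    if method.startswith("KP"):
        selected = [s for s, flag in zip(scores, in_phrase) if flag]
    else:
        selected = list(scores)
    if not selected:
        return 0.0
    total = sum(selected)
    return total / len(selected) if method.endswith("Avg") else total

def rank_documents(docs, measure, method):
    # docs maps doc_id -> (token_probs, in_phrase).
    # Returns doc ids ordered from most to least uncertain.
    scored = []
    for doc_id, (token_probs, in_phrase) in docs.items():
        scores = [measure(p) for p in token_probs]
        scored.append((aggregate(scores, in_phrase, method), doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

Under these assumptions, the documents at the front of the list returned by `rank_documents` would be the ones sent to human experts first in each active learning round. Sum-style aggregation tends to favor longer documents, while average-style aggregation normalizes for length, which is one plausible reason to compare both variants.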
