MedError: A Machine-Assisted Framework for Systematic Error Analysis in Clinical Concept Extraction

Abstract

Error analysis is a critical step in evaluating and improving models for clinical concept extraction, the most common clinical natural language processing (NLP) task. Unlike corpus annotation, which follows standardized protocols for creating gold-standard datasets, error analysis requires nuanced judgment grounded in both clinical expertise and NLP knowledge. This is especially important given the heterogeneity of clinical text, where variations in documentation style, note structure, and terminology can substantially influence model behavior. Despite its importance, there is currently no standardized, user-level framework to support systematic error analysis in clinical concept extraction tasks. In this study, we developed and validated MedError, a machine-assisted, human-in-the-loop framework designed to standardize and enhance error analysis for clinical concept extraction tasks. We collected and manually curated a corpus of 1,187 unique errors from a total of 4,237 notes across three distinct hospitals. The error categories were defined using our previously validated error taxonomy and included 480 false negatives and 707 false positives across 25 error types and 48 clinical concept categories. We evaluated the performance of three proprietary and three open-source large language models (LLMs) in automatically classifying these errors into 26 and 15 predefined categories. MedError integrates best practices in error analysis, LLM-assisted classification and reasoning, and a user-friendly interface to enable more efficient, reproducible, and context-aware error analysis. The framework supports both single-site and federated multisite error analysis, facilitating the effective translation of clinical NLP systems into real-world settings.
