Validation of Natural Language Processing for Surgical Complication Surveillance: Detecting Eleven Postoperative Complications from Electronic Health Records

Abstract

Background

Postoperative complication (PC) rates are crucial quality metrics in surgery, as they reflect patient outcomes, the effectiveness of perioperative care, and the strain on healthcare resources. Despite their importance, efficient, accurate, and affordable methods for tracking PCs are lacking. This study aimed to evaluate whether natural language processing (NLP) models could detect eleven PCs from surgical electronic health records (EHRs) at a level comparable to human curation.

Methods

A total of 17 486 surgical cases from 18 hospitals across two regions in Denmark, spanning six years, were included. The dataset was divided into training, validation, and test sets (50.2%/33.6%/16.2%) for NLP-model development and evaluation. Model performance was compared against the current method of PC monitoring (ICD-10 codes) and against manual curation, the latter serving as the gold standard.
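The abstract reports the split only as proportions; the approximate per-split case counts implied by those proportions can be sketched as follows (the exact counts are not stated in the abstract, so the rounded figures below are illustrative):

```python
# Approximate split sizes implied by the reported proportions
# (50.2% / 33.6% / 16.2% of 17 486 cases); rounding is illustrative.
total = 17486

train = round(total * 0.502)   # training set
val = round(total * 0.336)     # validation set
test = total - train - val     # test set takes the remainder

print(train, val, test)
```

Assigning the remainder to the test set guarantees the three splits sum exactly to the total despite rounding.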

Results

The NLP models achieved ROC AUCs between 0.901 and 0.999 on the test set. Compared with manual curation, model sensitivity ranged from 0.701 to 1.00, except for myocardial infarction (0.500). Positive predictive value (PPV) ranged from 0.0165 to 0.947, and negative predictive value (NPV) from 0.995 to 1.00. The NLP models significantly outperformed ICD-10 coding in detecting PCs; only 16.3% of cases would require manual curation to reach a PPV of 1.00.
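For readers less familiar with the reported metrics, the standard definitions of sensitivity, PPV, and NPV from a binary confusion matrix can be sketched as below. The counts used are hypothetical and do not come from the study:

```python
# Standard confusion-matrix metrics as reported in the Results.
# tp/fp/fn/tn counts here are hypothetical, for illustration only.

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true complications the model detects."""
    return tp / (tp + fn)

def ppv(tp: int, fp: int) -> float:
    """Fraction of model-flagged cases that are true complications."""
    return tp / (tp + fp)

def npv(tn: int, fn: int) -> float:
    """Fraction of model-cleared cases that truly have no complication."""
    return tn / (tn + fn)

tp, fp, fn, tn = 90, 10, 30, 870  # hypothetical counts
print(sensitivity(tp, fn), ppv(tp, fp), npv(tn, fn))
```

A very low PPV alongside a very high NPV, as seen for some complications here, is typical when the complication is rare: even a sensitive model produces many false positives relative to the few true cases, which is why a manual-curation step is needed to reach a PPV of 1.00.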

Conclusion

The NLP models alone detected PCs at an acceptable level and performed better than ICD-10 codes. Reaching a PPV of 1.00 required combining NLP-based detection with manual curation. NLP algorithms therefore present a potential solution for comprehensive, real-time monitoring of PCs across the surgical field.
