Automated detection of bicuspid aortic valve from echocardiographic reports using natural language processing: a large-scale Veterans Affairs study


Abstract

Background

Bicuspid aortic valve (BAV) is the most common congenital heart defect but often evades timely diagnosis due to variable clinical presentations. Prior to October 2024, no specific diagnosis code existed for BAV, limiting retrospective identification.

Objectives

To develop and validate a natural language processing (NLP) system for automated extraction of heart valve morphology from echocardiographic reports, with a focus on BAV detection.

Methods

We developed a rule-based NLP system using MedSpaCy to analyze echocardiographic reports from the Veterans Affairs Corporate Data Warehouse. The system was trained on 555 manually annotated reports and validated on 170 held-out reports. Performance was measured using precision, recall, and F1-score for valve leaflet structure identification.
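To illustrate the general idea of rule-based extraction with negation handling, the following is a minimal sketch in plain Python using only the standard `re` module. It is not the study's MedSpaCy pipeline: the patterns `BAV_MENTION` and `NEGATION_CUE` and the function `classify_bav` are hypothetical names and rules chosen for illustration, and a real system would need far richer rules for context such as uncertainty, history, and other valves.

```python
import re

# Hypothetical rules for illustration only; the study's actual MedSpaCy
# rules are not described in the abstract.
BAV_MENTION = re.compile(r"\b(?:bicuspid|BAV)\b", re.IGNORECASE)
NEGATION_CUE = re.compile(r"\b(?:no|not|without|negative for)\b", re.IGNORECASE)


def classify_bav(report: str) -> str:
    """Return 'affirmed', 'negated', or 'absent' for BAV mentions in a report."""
    found_negated = False
    # Naive sentence split on periods and semicolons followed by whitespace.
    for sentence in re.split(r"(?<=[.;])\s+", report):
        match = BAV_MENTION.search(sentence)
        if match is None:
            continue
        # Treat the mention as negated if a cue precedes it in the same sentence.
        if NEGATION_CUE.search(sentence[:match.start()]):
            found_negated = True
        else:
            return "affirmed"
    return "negated" if found_negated else "absent"


print(classify_bav("The aortic valve is bicuspid with mild stenosis."))
print(classify_bav("No evidence of bicuspid aortic valve."))
print(classify_bav("Trileaflet aortic valve. Normal LV function."))
```

In this sketch a single affirmed mention anywhere in the report takes precedence over negated mentions; that precedence is a design assumption, not something stated in the abstract.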

Results

The NLP system detected BAV with a precision of 0.984, recall of 0.955, and F1-score of 0.969. When applied to 14,453,591 echocardiographic documents from 3,478,658 patients, the system identified 84,019 patients (2.42%) with affirmed BAV. Among patients identified by the ICD-10 code Q23.81, NLP showed 86.1% concordance, with manual review confirming NLP accuracy in discordant cases.
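The reported figures are internally consistent, which can be checked with a short calculation. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` is introduced here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
f1 = f1_score(0.984, 0.955)
prevalence_pct = 100 * 84_019 / 3_478_658

print(round(f1, 3))              # 0.969
print(round(prevalence_pct, 2))  # 2.42
```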

Conclusions

This NLP approach enables large-scale retrospective identification of BAV patients from clinical text, creating the largest BAV cohort to date and facilitating future cardiovascular research and clinical decision-making.
