ECG classification with convolutional neural networks demonstrates resilience to sex-imbalances in data

Abstract

Background

Many ECG-AI models have been developed to predict a wide range of cardiovascular outcomes. The underrepresentation of women in cardiovascular disease studies has raised concerns about whether these models are equally predictive in women and in men. We tested the effect of sex-imbalance in training datasets on the predictive performance of ECG-AI models, investigating imbalance in representation (the ratio of women to men), in outcome prevalence, and in the percentage of outcome misclassification.

Methods

We used a dataset containing raw 12-lead ECGs (n = 474,006) of 181,755 individuals who visited any of the non-cardiology departments of the University Medical Center Utrecht between July 1997 and August 2023, and sampled a sex-balanced dataset (n = 165,156) that included only one ECG per individual. Multiple deep convolutional neural networks were trained to predict four outcomes: left bundle branch block, long QT syndrome, left ventricular hypertrophy, and ECGs classified as ‘abnormal’ by a physician. Using subsampling, we simulated sex-imbalance in representation (5 scenarios) for all outcomes and in disease prevalence (5 scenarios), in both representation and disease prevalence (20 scenarios), and in disease misclassification (7 scenarios) for ‘abnormal’. Model performance was evaluated per scenario for women and men separately, using the area under the receiver operating characteristic curve (AUC) and the smooth expected calibration error (smECE).
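
The subsampling design can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the `Record` type, the functions `subsample_scenario` and `misclassify`, and the synthetic pool are hypothetical, and misclassification is assumed to mean relabelling a fraction of one sex's positive cases as negative, since the abstract does not specify the scheme.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical per-individual record (one ECG per person, as in the balanced dataset)."""
    patient_id: int
    sex: str    # "F" or "M"
    label: int  # 1 = outcome present, 0 = absent


def subsample_scenario(records, n_per_sex, prevalence_per_sex, seed=0):
    """Sample a subset with a chosen size and outcome prevalence for each sex.

    Varying n_per_sex mimics imbalance in representation; varying
    prevalence_per_sex mimics imbalance in outcome prevalence.
    """
    rng = random.Random(seed)
    chosen = []
    for sex, n in n_per_sex.items():
        pos = [r for r in records if r.sex == sex and r.label == 1]
        neg = [r for r in records if r.sex == sex and r.label == 0]
        n_pos = round(n * prevalence_per_sex[sex])
        n_neg = n - n_pos
        if n_pos > len(pos) or n_neg > len(neg):
            raise ValueError(f"pool too small for sex={sex!r}")
        chosen.extend(rng.sample(pos, n_pos))
        chosen.extend(rng.sample(neg, n_neg))
    rng.shuffle(chosen)
    return chosen


def misclassify(records, sex, fraction, seed=0):
    """Relabel a fraction of one sex's positive cases as negative.

    Assumption: misclassification is modelled here as under-labelling of
    positives in one sex; the study's exact scheme is not described.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, r in enumerate(records) if r.sex == sex and r.label == 1]
    flip = set(rng.sample(pos_idx, round(len(pos_idx) * fraction)))
    return [Record(r.patient_id, r.sex, 0) if i in flip else r
            for i, r in enumerate(records)]


# Synthetic pool: 5,000 women and 5,000 men, 20% positive in each sex.
pool = [Record(i, "F" if i % 2 == 0 else "M", 1 if i % 5 == 0 else 0)
        for i in range(10_000)]

# Representation scenario: 30% women, equal 10% prevalence in both sexes.
train = subsample_scenario(pool, {"F": 600, "M": 1400}, {"F": 0.10, "M": 0.10})
# Misclassification scenario: half of the female positives relabelled as negative.
train_noisy = misclassify(train, "F", 0.5)
```

Each simulated scenario would then correspond to one call with different `n_per_sex` or `prevalence_per_sex` values (or a different misclassification fraction), followed by training and evaluating a model on the resulting subset.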

Results

Across all scenarios, the AUC remained stable, with small absolute differences between women and men for sex-imbalance in representation (ΔAUC: [0.002-0.025]), in disease prevalence (ΔAUC: [0.01-0.02]), in both representation and disease prevalence (ΔAUC: [0.003-0.039]), and in outcome misclassification (ΔAUC: [0.007-0.077]). Differences in calibration error between sexes (max ΔsmECE: 0.26) were observed only when disease prevalence was sex-imbalanced in both the training and test data, with similar patterns for women and men.
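
The per-sex comparison behind these ΔAUC values can be illustrated with a small sketch. It assumes ΔAUC is the absolute difference between the AUC in women and in men, and it swaps in a standard equal-width binned expected calibration error for smECE (smECE is instead computed by kernel smoothing, which is not reproduced here). The function names `auc`, `binned_ece` and `per_sex_metrics`, the "F"/"M" coding, and the toy predictions are hypothetical.

```python
def auc(y_true, y_score):
    """ROC AUC as P(random positive scores above random negative), ties counted as 1/2.

    Quadratic pairwise version for clarity; a library routine such as
    sklearn.metrics.roc_auc_score would be used on real data.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binned_ece(y_true, y_prob, n_bins=10):
    """Equal-width binned expected calibration error (a stand-in, NOT smECE)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    total = len(y_true)
    ece = 0.0
    for b in bins:
        if b:
            observed = sum(y for y, _ in b) / len(b)   # observed outcome rate in bin
            predicted = sum(p for _, p in b) / len(b)  # mean predicted probability in bin
            ece += len(b) / total * abs(observed - predicted)
    return ece


def per_sex_metrics(sexes, y_true, y_prob):
    """AUC and binned ECE per sex, plus absolute between-sex differences."""
    out = {}
    for sex in sorted(set(sexes)):
        ys = [y for s, y in zip(sexes, y_true) if s == sex]
        ps = [p for s, p in zip(sexes, y_prob) if s == sex]
        out[sex] = {"auc": auc(ys, ps), "ece": binned_ece(ys, ps)}
    out["delta_auc"] = abs(out["F"]["auc"] - out["M"]["auc"])
    out["delta_ece"] = abs(out["F"]["ece"] - out["M"]["ece"])
    return out


# Toy predictions for four women and four men.
sexes = ["F", "F", "F", "F", "M", "M", "M", "M"]
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.8, 0.3, 0.35, 0.6]
metrics = per_sex_metrics(sexes, y_true, y_prob)
```

With these toy predictions, every female positive outranks every female negative (AUC 1.0), while one of the four male positive-negative pairs is misordered (AUC 0.75), giving ΔAUC = 0.25.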

Conclusion

The neural networks in this study demonstrated resilience to sex-imbalance in training ECG data.

Graphical summary of the study methodology and results, showing that ECG classification with convolutional neural networks is not sensitive to sex-imbalances in datasets. AUC = area under the receiver operating characteristic curve; smECE = smooth expected calibration error. Created in BioRender. Meijer, I. (2025) https://BioRender.com/nxkwvoi.