Reliability of Artificial Intelligence-enhanced Electrocardiography

Lovedeep S Dhingra
Philip M Croon
Bruno Batinica
Arya Aminorroaya
Aline F Pedroso
Evangelos K Oikonomou
Rohan Khera

Abstract

Background

The scientific literature on artificial intelligence-enabled electrocardiography (AI-ECG) has demonstrated robust performance of AI models in detecting and predicting several structural heart disorders (SHDs) using ECGs. However, as with any diagnostic test, the real-world clinical utility of AI-ECG depends on its reliability, that is, the consistency of its results when the test is repeated under similar conditions.

Aim

To evaluate the reliability of AI-ECG models across different ECGs from the same person, across different diagnostic labels, and across varied modeling approaches.

Methods

We used ECG images (2000-2024) from 5 hospitals and an outpatient network within a large, integrated US health system. For each individual, we identified multiple ECGs recorded within a 30-day period. We evaluated 7 models: 6 convolutional neural networks (CNNs), each trained to detect an individual SHD, including LV systolic dysfunction, left-sided valvular diseases, and severe LVH; and an ensemble XGBoost model integrating the individual CNNs as a composite screen for multiple SHDs. We used the concordance correlation coefficient (CCC), Spearman correlation, Cohen’s kappa, and percent agreement in binary screen status to test model reliability. We evaluated factors associated with discordant AI-ECG outputs (Δ probability > 0.5) and assessed stability across ECG layouts (digital, printed, photo).
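The reliability metrics named above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy probabilities, and the 0.5 cutoff used to binarize screen status are assumptions for illustration only (the abstract does not state the screening threshold). The Δ probability > 0.5 rule for flagging a discordant pair is taken from the text.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired probabilities."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    # Population (1/n) variances and covariance, as in Lin's definition.
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)


def percent_agreement(a, b):
    """Share of pairs with the same binary screen status."""
    return sum(i == j for i, j in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two binary (0/1) label sequences."""
    n = len(a)
    observed = percent_agreement(a, b)
    pa, pb = sum(a) / n, sum(b) / n
    # Agreement expected by chance from the marginal positive rates.
    # Undefined (division by zero) if expected == 1, e.g. all labels identical
    # and constant; real code should guard against that case.
    expected = pa * pb + (1 - pa) * (1 - pb)
    return (observed - expected) / (1 - expected)


def is_discordant(p1, p2, delta=0.5):
    """Flag a pair whose predicted probabilities differ by more than delta."""
    return abs(p1 - p2) > delta


# Hypothetical model outputs for the first and second ECG of six pairs.
first = [0.10, 0.80, 0.30, 0.95, 0.20, 0.60]
second = [0.15, 0.75, 0.90, 0.90, 0.25, 0.40]

SCREEN_THRESHOLD = 0.5  # assumed cutoff, not reported in the abstract
screen_first = [int(p >= SCREEN_THRESHOLD) for p in first]
screen_second = [int(p >= SCREEN_THRESHOLD) for p in second]

ccc = concordance_correlation(first, second)
kappa = cohen_kappa(screen_first, screen_second)
agreement = percent_agreement(screen_first, screen_second)
discordant = [is_discordant(p, q) for p, q in zip(first, second)]
```

Unlike a rank-based measure such as Spearman correlation, CCC also penalizes a systematic shift between the two sets of predictions, which may be why both are reported side by side.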

Results

Across sites, we identified 1,118,263 ECG pairs, with a median of 1 (1-3) days between ECGs. The ensemble XGBoost model had higher test-retest correlation (CCC: 0.89-0.92) and agreement (kappa: 0.75-0.82) between pairs than the CNNs (CCC: 0.78-0.88; kappa: 0.57-0.72). After adjusting for demographics, ECG pairs that included one or two inpatient ECGs were significantly more likely to yield unstable predictions (ORs: 1.60 [1.50-1.70] and 1.91 [1.78-2.05], respectively) than pairs in which both ECGs were obtained in outpatient settings. Among outpatient pairs across sites, the XGBoost model had a CCC of 0.89-0.94, a Spearman correlation of 0.90-0.94, and a kappa of 0.78-0.84, with concordance rates of 89-92%. Notably, ensemble model predictions were also stable across different ECG layouts.

Conclusion

An ensemble AI-ECG model integrating multiple CNN predictions had higher reliability compared with models for individual disorders. Discordance was more common in inpatient ECGs, suggesting instability in high-acuity settings. Reliable ensemble AI-ECG model outputs support readiness for clinical implementation for SHD screening.

GRAPHICAL ABSTRACT

Study Design

Abbreviations: AR, aortic regurgitation; AS, aortic stenosis; CNN, convolutional neural network; ECG, electrocardiogram; FC, fully connected layers; LVSD, left ventricular systolic dysfunction; MR, mitral regurgitation; SHD, structural heart disease; sLVH, severe left ventricular hypertrophy; XGBoost, extreme gradient boosting.

Version published to 10.1101/2025.11.04.25339526 on medRxiv
Nov 6, 2025
