Machine learning to classify left ventricular hypertrophy using ECG feature extraction by variational autoencoder

Amulya Gupta
Christopher J. Harvey
Ashley DeBauge
Sumaiya Shomaji
Zijun Yao
Yongkuk Lee
Amit Noheria

Abstract

Background

Traditional ECG criteria for left ventricular hypertrophy (LVH) have modest diagnostic yield.

Objective

Develop and validate machine learning models for LVH diagnosis from ECG.

Methods

ECG summary features (rate, intervals, axis), R-wave, S-wave and overall-QRS amplitudes, and QRS voltage-time integrals (VTI_QRS) were extracted from 12-lead, vectorcardiographic X-Y-Z-lead, and 3D (L2 norm) representative-beat ECGs. Latent features (30 per ECG) were extracted from X-Y-Z-lead representative-beat ECG signals using a variational autoencoder trained on >1 million unselected ECGs. Logistic regression, random forest, light gradient boosted machine (LGBM), residual network (ResNet) and multilayer perceptron network (MLP) models using ECG features and sex, and a convolutional neural network (CNN) using ECG signals alone, were trained to predict LVH (indexed left ventricular mass >95 g/m² in women, >115 g/m² in men) on 482,734 adult ECG-echocardiogram pairs obtained within 45 days of each other. ROC-AUCs for LVH classification are reported from a separate hold-out test set.
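As a rough illustration of two quantities named above, the following Python sketch computes a 3D (L2 norm) signal from the X, Y and Z leads, a QRS voltage-time integral, and the sex-specific LVH label. It is not the authors' code. The function names, the definition of VTI_QRS as the area under the absolute voltage between QRS onset and offset, the sample-index QRS bounds, the sampling rate argument `fs_hz`, and the "F"/"M" sex coding are all assumptions made for this example.

```python
import numpy as np


def vector_magnitude(x, y, z):
    """3D (L2 norm) signal built from the orthogonal X, Y and Z leads."""
    x, y, z = (np.asarray(lead, dtype=float) for lead in (x, y, z))
    return np.sqrt(x**2 + y**2 + z**2)


def qrs_vti(signal, qrs_on, qrs_off, fs_hz):
    """Area under |voltage| from QRS onset to offset (assumed definition).

    qrs_on and qrs_off are inclusive sample indices (hypothetical inputs).
    With the signal in mV, the result is in mV*ms.
    """
    segment = np.abs(np.asarray(signal, dtype=float)[qrs_on:qrs_off + 1])
    dt_ms = 1000.0 / fs_hz
    # Trapezoidal rule written out by hand, so it does not depend on
    # which NumPy version is installed.
    return float(np.sum((segment[1:] + segment[:-1]) / 2.0) * dt_ms)


def has_lvh(lvmi_g_m2, sex):
    """LVH label from indexed LV mass: >95 g/m² in women, >115 g/m² in men."""
    threshold = 95.0 if sex == "F" else 115.0
    return lvmi_g_m2 > threshold


# Tiny synthetic beat, three samples at an assumed 1000 Hz (not real ECG data).
magnitude = vector_magnitude([0.0, 3.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0])
vti_3d = qrs_vti(magnitude, qrs_on=0, qrs_off=2, fs_hz=1000)
```

The same `qrs_vti` call could be applied to any single lead, for example the Z lead, to obtain a per-lead integral under the same assumed definition.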

Results

In the test set (n=54,984), AUC for LVH classification was higher for ML models using ECG features (LGBM 0.794, MLP 0.793, ResNet 0.795) compared with the best individual ECG variable (VTI_QRS-Z 0.707), the best traditional criterion (Cornell voltage-duration product 0.716), and the CNN using ECG signals (0.788). Among patients without LVH who had a follow-up echocardiogram more than 1 year later (the one closest to 5 years), LGBM false positives, compared to true negatives, had 3.07-fold (95% CI 2.44, 3.86) higher odds of developing future LVH (p<0.0001).
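The two kinds of statistic reported above, an ROC-AUC and an odds ratio with a 95% confidence interval, can be sketched in plain Python as follows. The abstract does not say how the authors computed either one, so this is only one common approach: a pairwise (Mann-Whitney) AUC and an unadjusted odds ratio with a Wald interval from a 2x2 table. The function names and all numbers in the example are hypothetical and are not taken from the study.

```python
import math


def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as 0.5. The pairwise loop is O(n_pos * n_neg), which is
    fine for a sketch but slow for very large test sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: group 1 with outcome      b: group 1 without outcome
    c: group 2 with outcome      d: group 2 without outcome
    Here group 1 would be false positives and group 2 true negatives.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical toy inputs, chosen only to exercise the functions.
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
or_est, ci_low, ci_high = odds_ratio_wald(20, 80, 10, 90)
```

A Wald interval is symmetric on the log-odds scale, which is why the bounds are computed from the log of the estimate and then exponentiated.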

Conclusions

ML models are superior to traditional ECG criteria for classifying LVH. Models trained on extracted ECG features, including latent variational autoencoder representations, can outperform CNN models trained directly on ECG signals.

Version published to 10.1101/2024.10.14.24315460 on medRxiv
Oct 15, 2024
