Contrastive Multi-modal Training with Electrocardiography and Natural Language Echocardiography Reports for Zero-shot Prediction of Structural Heart Disease
Abstract
Background
Machine learning models for predicting structural heart disease (SHD) from electrocardiography (ECG) have traditionally required structured echocardiographic data. The information contained in natural language echocardiography (ECHO) reports remains largely untapped. We describe MERL-ECHO, a multimodal model that uses contrastive language-image pre-training (CLIP) to align ECGs with ECHO natural language reports for zero-shot SHD prediction.
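The CLIP-style alignment described above can be illustrated with a generic symmetric contrastive (InfoNCE) loss over a batch of matched ECG and report embeddings. This is a minimal sketch, not the MERL-ECHO objective, which may differ. It assumes that both encoders already output fixed-length vectors and that row i of each batch forms a matched pair. The function names are hypothetical, and the temperature of 0.07 is a commonly used default rather than a value reported for this model. Plain Python is used so the sketch runs without dependencies. A real implementation would use batched tensor operations.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_loss(ecg_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss for N matched (ECG, report) embedding pairs.

    Hypothetical helper: row i of ecg_emb and row i of text_emb are the
    positive pair, and every other row in the batch acts as a negative.
    The temperature of 0.07 is an assumed default.
    """
    ecg = [l2_normalize(v) for v in ecg_emb]
    txt = [l2_normalize(v) for v in text_emb]
    n = len(ecg)
    # logits[i][j] = cosine(ecg_i, report_j) / temperature
    logits = [[sum(a * b for a, b in zip(ecg[i], txt[j])) / temperature
               for j in range(n)] for i in range(n)]
    # ECG -> report: each row should assign most probability to its own column
    loss_e2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # report -> ECG: each column should assign most probability to its own row
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2e = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_e2t + loss_t2e) / 2
```

With two orthogonal embeddings, a correctly paired batch gives a loss near zero. Swapping the reports raises it to roughly 1/0.07 ≈ 14.3, which is the signal that pulls matched ECGs and reports together during pre-training.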
Methods
We conducted a multi-center retrospective study using paired ECGs and ECHO natural language reports from Queen Mary Hospital and Tung Wah Hospital in Hong Kong. MERL-ECHO was trained on 45,016 ECG-ECHO pairs. Performance was evaluated on an internal test set covering 10 SHDs and on an external test set of 5,442 ECGs with ECHO-derived labels for 6 SHDs from Columbia University Irving Medical Center, USA.
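Zero-shot evaluation against ECHO-derived labels can be sketched with the common CLIP recipe, although the abstract does not describe MERL-ECHO's procedure. In that recipe, an ECG embedding is compared with the text embeddings of a "disease present" prompt and a "disease absent" prompt, and a two-way softmax over the two similarities becomes the predicted probability. The prompt vectors are assumed to come from the trained text encoder applied to hypothetical wording such as "aortic stenosis" versus "no aortic stenosis". The function name and the temperature are also illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_score(ecg_vec, pos_prompt_vec, neg_prompt_vec, temperature=0.07):
    """Hypothetical zero-shot probability that an ECG shows a given SHD.

    Applies a two-way softmax over the temperature-scaled cosine similarities
    between the ECG embedding and the 'present' / 'absent' prompt embeddings.
    No disease-specific training is involved; only the prompts change per SHD.
    """
    s_pos = cosine(ecg_vec, pos_prompt_vec) / temperature
    s_neg = cosine(ecg_vec, neg_prompt_vec) / temperature
    m = max(s_pos, s_neg)  # subtract the max for numerical stability
    e_pos = math.exp(s_pos - m)
    e_neg = math.exp(s_neg - m)
    return e_pos / (e_pos + e_neg)
```

An ECG embedding that lies equally close to both prompts scores 0.5. Scores computed this way for every test ECG can then be compared with the echocardiographic labels.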
Results
The cohort included 8,192 patients (mean age 73.7±16.5 years; 55.3% male). In the internal test set, MERL-ECHO achieved an average AUROC of 0.69, with the strongest performance for left ventricular dilation (0.78), right ventricular systolic dysfunction (0.71), and tricuspid regurgitation (0.71). In the external test set, the average AUROC was 0.72, with the highest performance for left ventricular systolic dysfunction (0.76) and aortic stenosis (0.76). Pre-training improved AUROC by up to 5%, performance scaled with larger datasets, and ResNet18 outperformed ViT-Tiny as the ECG encoder by 7%. Saliency analysis revealed interpretable ECG features, including unexpected P-wave changes in aortic stenosis, suggesting novel disease markers.
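The per-condition AUROCs and their average can be computed as shown below. This sketch assumes that "average AUROC" means the unweighted (macro) mean across conditions, which the abstract does not state explicitly. It uses the Mann-Whitney formulation of AUROC: the fraction of (positive, negative) case pairs in which the positive case receives the higher score, with ties counted as one half. The function names and the input layout are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def macro_auroc(per_disease):
    """Unweighted mean AUROC over conditions.

    per_disease maps a condition name to a (labels, scores) tuple.
    Returns (mean_auroc, {condition: auroc}).
    """
    values = {name: auroc(y, s) for name, (y, s) in per_disease.items()}
    return sum(values.values()) / len(values), values
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` is 0.75 because three of the four positive-negative pairs are ranked correctly. The pairwise loop is quadratic in cohort size, so a rank-based implementation would be preferable for test sets of several thousand ECGs.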
Conclusions
MERL-ECHO leverages ECHO natural language reports for multimodal training with ECG. This CLIP-based model enables accurate zero-shot prediction of SHDs and highlights interpretable ECG features with potential clinical relevance.