CardioFM: A Multimodal Foundation Model for Joint ECG and PPG Representation Learning

Md Hassanuzzaman
Tilendra Choudhary
Alasdair Gent
Mihai Podgoreanu
Suresh Agarwal
Vijay Krishnamoorthy
Sivasubramanium Bhavani
Philip Yang
Annette Esper
Rishikesan Kamaleswaran

Abstract

Electrocardiography (ECG) and photoplethysmography (PPG) arise from the same heartbeat and are routinely co-acquired at every monitored bedside, yet no foundation model jointly encodes both modalities. Existing approaches are ECG-specific, PPG-specific, or domain-agnostic, and none captures the cross-modal physiological coupling between cardiac electrical activity and peripheral hemodynamics. We present CardioFM, a self-supervised multimodal foundation model that integrates ECG Lead-II and PPG through bidirectional cross-modal attention and adaptive residual vector quantization. CardioFM is pretrained on over 500,000 hours of recordings from approximately 63,000 patients across intensive care, surgical, ambulatory, and consumer-wearable settings, learning unified representations that transfer across contexts without retraining. CardioFM achieves an F1-score of 0.86 for cardiovascular disease classification on PTB-XL, estimates the QT interval with a mean error of 20.2 ms, approaching expert inter-observer variability, and measures pulse arrival time with a mean error of 22.7 ms, which is sufficient to support non-invasive hemodynamic trending. When used as a feature extractor, CardioFM embeddings provide better discrimination for intensive care false alarm reduction than ECG-FM, PaPaGei, and TimesFM, despite requiring substantially smaller representations. In contrast, generic temporal pretraining fails to encode clinically relevant waveform morphology. Demographic inference from waveform embeddings (age MAE: 10.4 years; gender AUC: 0.97; BMI MAE: 0.66 kg/m²) confirms that the learned representations encode fundamental biological characteristics without requiring diagnostic labels. The model maintains zero-shot reconstruction fidelity across five independent datasets spanning heterogeneous sensor hardware, sampling rates, and patient populations, with the cross-modal attention mechanism providing robustness to single-modality signal degradation.
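The abstract names adaptive residual vector quantization as one of CardioFM's core mechanisms but does not describe how it works. As a point of reference only, the sketch below shows plain greedy residual vector quantization (RVQ), in which each stage quantizes the error left over by the previous stages. It is not CardioFM's adaptive variant: the function names, the two-stage two-dimensional codebooks, and the input vector are hypothetical toy values, whereas in a trained model the codebooks are learned and operate on encoder latents.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Codebook = Sequence[Sequence[float]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(x: Sequence[float], codebooks: Sequence[Codebook]) -> Tuple[List[int], Vector]:
    """Plain greedy RVQ (hypothetical helper, not CardioFM's adaptive scheme).

    Each stage picks the codeword nearest to the current residual, adds it
    to the running reconstruction, and passes the remaining residual on.
    """
    residual = list(x)
    reconstruction = [0.0] * len(x)
    indices: List[int] = []
    for book in codebooks:
        # Nearest-neighbour lookup in this stage's codebook.
        dists = [squared_distance(residual, code) for code in book]
        best = dists.index(min(dists))
        indices.append(best)
        reconstruction = [r + c for r, c in zip(reconstruction, book[best])]
        residual = [r - c for r, c in zip(residual, book[best])]
    return indices, reconstruction


def rvq_decode(indices: Sequence[int], codebooks: Sequence[Codebook]) -> Vector:
    """Rebuild a vector by summing the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, book in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, book[idx])]
    return out


# Hypothetical toy codebooks (2-D, two stages); real codebooks would be learned.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],    # coarse stage
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],  # fine stage
]

if __name__ == "__main__":
    x = [1.3, 0.9]  # stand-in for one latent frame produced by an encoder
    idx, recon = rvq_encode(x, CODEBOOKS)
    print(idx, recon)  # [1, 1] [1.25, 1.0]
```

Because each toy stage includes an all-zero codeword, a stage can never increase the reconstruction error; here the squared error falls from about 0.1 after the coarse stage to about 0.0125 after the fine stage.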
The 17.11-million-parameter encoder is compatible with edge-deployment constraints, and the model uses only signal modalities already acquired by standard bedside monitors and consumer wearables, requiring no additional sensing hardware. These findings demonstrate that a single multimodal foundation model can consolidate the fragmented landscape of cardiac biosignal analysis, providing a unified representational framework across clinical monitoring systems and wearable health technologies that may extend to broader critical illness surveillance.
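To put the 17.11-million-parameter figure in perspective, the sketch below converts it to weight-storage size at a few common numeric precisions. The precision options and the decimal-megabyte convention are assumptions for illustration; the abstract does not state which precision the encoder uses on device, and the estimate covers weights only.

```python
# Parameter count reported in the abstract (17.11 million, rounded).
N_PARAMS = 17_110_000

# Bytes per stored weight for common numeric formats (assumed options;
# the abstract does not say which precision CardioFM is deployed in).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_megabytes(n_params: int, dtype: str) -> float:
    """Weight storage in decimal megabytes (1 MB = 10**6 bytes).

    Counts weights only; activations, buffers, and runtime overhead
    are excluded.
    """
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        # Roughly 68.44 MB (float32), 34.22 MB (float16), 17.11 MB (int8).
        print(f"{dtype:>8}: {weight_megabytes(N_PARAMS, dtype):6.2f} MB")
```

Whether a given wearable or monitor can host the encoder also depends on activation memory and input window length, which this arithmetic does not capture.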

Version published to 10.21203/rs.3.rs-9652631/v1 on Research Square
May 11, 2026

Related articles

Enhanced precision of tensor electrocardiography through increased cumulative distribution function resolution: Validation in healthy individuals

This article has 8 authors:
1. Yayoi Tetsuo Tsukada
2. Hiroaki Hirayama
3. Kenji Yodogawa
4. Hiroshige Murata
5. Yu-ki Iwasaki
6. Takeo Fujino
7. Akihiro Shiozawa
8. Shingo Tsukada
This article has no evaluations. Latest version Jun 2, 2026.

An ECG foundation model for generalizable cardiac function prediction across the lifespan

This article has 5 authors:
1. Yuting Yang
2. Lorenzo Peracchio
3. Joshua Mayourian
4. Timothy Miller
5. William G. La Cava
This article has no evaluations. Latest version May 27, 2026.

Deep learning optimisation for cardiology: Neural Architecture Search-driven arrhythmia classification with electrocardiograms

This article has 4 authors:
1. Erik Vanegas Müller
2. Arese Joe-Oshodi
3. Abhirup Banerjee
4. Mauricio Villarroel
This article has no evaluations. Latest version May 30, 2026.