Echo-Vision-FM: A Pre-training and Fine-tuning Framework for Echocardiogram Videos Vision Foundation Model

Ziyang Zhang
Qinxin Wu
Sirui Ding
Xiaolong Wang
Jiancheng Ye

Abstract

Background

Echocardiograms provide vital insights into cardiac health, but their complex, multi-dimensional data presents challenges for analysis and interpretation. Current deep learning models for echocardiogram analysis often rely on supervised training, limiting their generalizability and robustness across datasets and clinical environments.

Objective

To develop and evaluate EchoVisionFM (Echocardiogram video Vision Foundation Model), a self-supervised video learning framework designed to pre-train a video encoder on large-scale, unlabeled echocardiogram data. EchoVisionFM aims to produce robust and transferable spatiotemporal representations, improving downstream performance across diverse echocardiogram datasets and clinical conditions.

Methods

Our framework employs Echo-VideoMAE, an autoencoder-based video transformer that compresses and reconstructs echocardiogram video data by masking non-overlapping video patches and leveraging a ViT encoder-decoder structure. For enhanced representation, we introduce STFF-Net, a SpatioTemporal Feature Fusion Network, to integrate spatial and temporal features from the manifold representations. We pre-trained EchoVisionFM using the MIMIC-IV-ECHO dataset and fine-tuned it on the EchoNet-Dynamic dataset for downstream tasks, including classification and regression of key cardiac parameters.
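The masking step described in the Methods can be illustrated with a minimal sketch. It only selects which patch tokens are hidden; it does not implement the encoder or decoder. Everything specific here is an assumption for illustration and is not stated in the abstract: the clip size (16 frames of 224 x 224), the patch size (2 frames x 16 x 16 pixels), the 0.9 mask ratio, the "tube" strategy of hiding the same spatial positions in every temporal slot, and the helper names `patch_grid` and `tube_mask`.

```python
import random


def patch_grid(frames, height, width, tubelet=2, patch=16):
    """Return (t, h, w): the count of non-overlapping patches along each axis."""
    if frames % tubelet or height % patch or width % patch:
        raise ValueError("clip dimensions must be divisible by the patch size")
    return frames // tubelet, height // patch, width // patch


def tube_mask(grid, mask_ratio=0.9, seed=0):
    """Hide the same random spatial positions in every temporal slot.

    Returns a flat list of t*h*w booleans, where True means "masked".
    The mask ratio and the tube strategy are illustrative assumptions.
    """
    t, h, w = grid
    spatial = h * w
    n_masked = int(spatial * mask_ratio)
    rng = random.Random(seed)
    hidden = set(rng.sample(range(spatial), n_masked))
    frame_mask = [i in hidden for i in range(spatial)]
    return frame_mask * t  # repeat the spatial pattern over time


grid = patch_grid(frames=16, height=224, width=224)
mask = tube_mask(grid)
# Only the visible tokens would be passed to the encoder.
visible = [i for i, is_masked in enumerate(mask) if not is_masked]
print(grid, len(mask), len(visible))
```

In a masked-autoencoder setup of this kind, the encoder sees only the visible tokens and the decoder is trained to reconstruct the hidden ones, which is what makes pre-training possible without labels.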

Results

EchoVisionFM demonstrated superior performance in classifying left ventricular ejection fraction (LVEF), achieving an accuracy of 89.12%, an F1 score of 0.9323, and an AUC of 0.9364. In regression tasks, EchoVisionFM outperformed state-of-the-art models, with LVEF prediction reaching a mean absolute error (MAE) of 4.18% and an R² of 0.8022. The model also showed significant improvements in estimating end-systolic and end-diastolic volumes, with R² values of 0.8006 and 0.7296, respectively. Incorporating STFF-Net led to further performance gains across tasks.
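For readers less familiar with the reported metrics, the sketch below shows the standard definitions of MAE, R², accuracy, and F1 in plain Python. The input values are made up for illustration and are not taken from the study; the function names are hypothetical, and AUC is omitted because it requires ranked prediction scores rather than hard labels.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy_and_f1(labels, predictions):
    """Accuracy and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, f1


# Made-up LVEF values in percent, for illustration only.
true_lvef = [55.0, 62.0, 35.0, 48.0, 70.0]
pred_lvef = [58.0, 60.0, 40.0, 45.0, 66.0]
print(mean_absolute_error(true_lvef, pred_lvef))
print(r_squared(true_lvef, pred_lvef))

# Made-up binary labels, for illustration only.
print(accuracy_and_f1([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))
```

Because LVEF is itself expressed in percent, an MAE reported as a percentage here means percentage points of ejection fraction, not a relative error.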

Conclusion

Our results indicate that large-scale self-supervised pre-training on echocardiogram videos enables the extraction of transferable and clinically relevant features, outperforming traditional CNN-based methods. The EchoVisionFM framework, particularly with STFF-Net, enhances the extraction of spatiotemporal features, improving the predictive accuracy for various cardiac parameters. EchoVisionFM offers a powerful, scalable approach for echocardiogram analysis, with potential applications in clinical diagnostics and research.

Version published to 10.1101/2024.10.09.24315195v2 on medRxiv
Oct 26, 2024
Version published to 10.1101/2024.10.09.24315195v1 on medRxiv
Oct 10, 2024

Related articles

A Robust and Data-Efficient Deep Learning Model for Cardiac Assessment without Segmentation

This article has 2 authors:
1. Conor Artman
2. Ricardo Henao
This article has no evaluations
Latest version Oct 28, 2024

MedMAE: A Self-Supervised Backbone for Medical Imaging Tasks

This article has 5 authors:
1. Anubhav Gupta
2. Islam Osman
3. Mohamed S. Shehata
4. John W. Braun
5. Rebecca E. Feldman
This article has no evaluations
Latest version Oct 11, 2024

An Ensemble Deep Learning Algorithm for Structural Heart Disease Screening Using Electrocardiographic Images: PRESENT SHD

This article has 13 authors:
1. Lovedeep S Dhingra
2. Arya Aminorroaya
3. Veer Sangha
4. Aline F Pedroso
5. Sumukh Vasisht Shankar
6. Andreas Coppi
7. Murilo Foppa
8. Luisa CC Brant
9. Sandhi M Barreto
10. Antonio Luiz P Ribeiro
11. Harlan M Krumholz
12. Evangelos K Oikonomou
13. Rohan Khera
This article has no evaluations
Latest version Nov 7, 2024
