Multi-Modal Data Fusion With Federated Multi-Head Attention for Diabetic Retinopathy Severity Classification

Abstract

Purpose: Diabetic retinopathy (DR) is a leading cause of vision loss. Deep learning models have achieved high accuracy in DR detection from fundus images, but they often ignore valuable patient information in electronic health records (EHR). This study develops a multi-modal data fusion approach for DR severity classification that integrates fundus images with key EHR features (glycated hemoglobin (HbA1c), diabetes duration, and age) to improve performance. The approach is also extended to a federated learning framework that supports decentralized training across sites while preserving privacy. Two research questions guide the work: Can attention-based fusion of imaging and EHR data enhance DR classification over single-modality models? Does federated learning achieve results comparable to centralized training without sharing sensitive data?

Methods: The architecture uses ResNet-18 to extract features from fundus images and a multilayer perceptron (MLP) to process tabular EHR data. A multi-head attention mechanism fuses the two modalities to learn task-relevant interactions. Federated learning is implemented via FedML, simulating training across four client sites. Experiments use the RetinaMNIST dataset (1,600 fundus images) augmented with synthetic EHR data.

Results: The multi-modal model outperformed the image-only and EHR-only baselines, achieving higher accuracy and area under the ROC curve (AUROC). The federated variant performed close to the centrally trained model, with only a minor decrease in AUROC. Visualizations of attention weights revealed clinically relevant retinal regions.

Conclusion: Incorporating EHR data via attention-based fusion significantly improves DR severity grading, while federated learning facilitates secure multi-center collaboration without data sharing. This approach holds promise for real-world clinical deployment in privacy-sensitive environments.
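
The abstract does not say which aggregation rule the four simulated sites use, so the following is only an illustrative sketch of one common choice, sample-size-weighted averaging in the style of FedAvg. It does not use the FedML API; the function name `fedavg`, the parameter name `fusion.bias`, and all weights and sample counts are hypothetical values chosen for the example.

```python
from typing import Dict, List

# A model is represented here as a mapping from parameter name to a flat
# list of floats. This is a simplification made for illustration.
Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average per-client parameters, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Four hypothetical client sites with toy parameters and made-up sample counts.
clients = [
    {"fusion.bias": [0.0, 1.0]},
    {"fusion.bias": [1.0, 1.0]},
    {"fusion.bias": [2.0, 1.0]},
    {"fusion.bias": [3.0, 1.0]},
]
sizes = [100, 100, 100, 100]

global_weights = fedavg(clients, sizes)
```

In a setup like this, only the parameter values and sample counts leave each site; the fundus images and EHR records stay local, which is the privacy property the abstract relies on.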
