ProtoMFL: A Robust Multimodal Federated Learning Framework via Cross-Modal Prototype Integration

Abstract

Multimodal federated learning (MFL) has made substantial progress in aggregating multimodal knowledge across distributed environments, yet it still faces persistent challenges from modality-missing data at the client level. Traditional knowledge-distillation-based approaches offer only limited performance in these modality-missing scenarios. To mitigate the performance degradation caused by modality dropout, this paper proposes Prototype-based Multimodal Federated Learning (ProtoMFL). By replacing sample-level representations with category-level prototypes as knowledge carriers, ProtoMFL enables more efficient cross-modal knowledge aggregation. The framework consists of three core components: Cross-Modal Prototype Regularisation reduces distributional discrepancies between client and global models; Cross-Modal Prototype Contrast uses contrastive learning to pull similar prototypes together and push dissimilar ones apart; and Cross-Modal Alignment enforces semantic alignment between modalities at the feature level, thereby mitigating the adverse effects of modality dropout. Experimental results show that ProtoMFL significantly outperforms existing methods in both accuracy and robustness across multiple benchmark datasets. Even under severe modality dropout, ProtoMFL maintains stable performance, achieving an average improvement of approximately 2.8% over CreamFL, a baseline without prototype mechanisms, and effectively mitigating the model drift caused by modality heterogeneity.
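The abstract gives no implementation details, but the prototype machinery it describes can be illustrated with a short sketch. The PyTorch code below shows one plausible reading of the three components, assuming mean-pooled, L2-normalised class prototypes; all function names (class_prototypes, prototype_regularisation, prototype_contrast, cross_modal_alignment) and design choices are hypothetical, not the authors' actual formulation.

```python
import torch
import torch.nn.functional as F

def class_prototypes(features, labels, num_classes):
    # Category-level prototype: the L2-normalised mean feature of each class.
    protos = features.new_zeros(num_classes, features.size(1))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(dim=0)
    return F.normalize(protos, dim=1)

def prototype_regularisation(local_protos, global_protos):
    # Pull each client's prototypes toward the aggregated global ones,
    # shrinking the client/global distribution gap (MSE as one simple choice).
    return F.mse_loss(local_protos, global_protos)

def prototype_contrast(local_protos, global_protos, temperature=0.1):
    # InfoNCE-style contrast: a local prototype should match the global
    # prototype of its own class and repel those of other classes.
    logits = local_protos @ global_protos.t() / temperature
    targets = torch.arange(local_protos.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)

def cross_modal_alignment(image_protos, text_protos):
    # Align per-class prototypes across modalities via cosine distance,
    # so an image prototype and its text counterpart share semantics.
    return (1.0 - F.cosine_similarity(image_protos, text_protos, dim=1)).mean()

# Toy round: 4 classes, 128-d features from hypothetical image/text encoders.
img = class_prototypes(torch.randn(32, 128), torch.randint(0, 4, (32,)), 4)
txt = class_prototypes(torch.randn(32, 128), torch.randint(0, 4, (32,)), 4)
glb = class_prototypes(torch.randn(64, 128), torch.randint(0, 4, (64,)), 4)
loss = (prototype_regularisation(img, glb)
        + prototype_contrast(img, glb)
        + cross_modal_alignment(img, txt))
```

Because only per-class prototype vectors (rather than sample-level representations) would be exchanged under such a scheme, the communication cost per round scales with the number of classes instead of the dataset size, which is consistent with the efficiency claim in the abstract.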