Generalizable and Resilient Multimodal Temporal Learning

Abstract

Comprehending human sleep mechanisms is vital for diagnosing a range of neurological and physiological conditions. Traditional sleep staging relies on expert annotation of polysomnographic recordings, a process that is labor-intensive and susceptible to inconsistency. Although automated sleep staging has gained traction, most current systems depend predominantly on EEG signals, which limits their robustness in clinical scenarios where signal quality is often compromised. In this work, we propose MedFuseSleep, a multimodal temporal learning architecture built to classify sleep stages under imperfect data conditions. The model is specifically designed to maintain high performance even in the presence of missing or noisy inputs by adaptively incorporating EEG, EOG, and auxiliary physiological modalities. Drawing inspiration from mid-to-late fusion strategies and grounded in a multi-objective learning framework, MedFuseSleep facilitates cross-modal representation learning while preserving tolerance to corrupted or absent signals. This design enables effective sleep stage inference even when key modalities such as EEG are degraded or unavailable. We validate MedFuseSleep on the SHHS-1 dataset, a large-scale benchmark, and report consistent gains over both unimodal baselines and existing multimodal techniques. Notably, we find that multimodal training not only improves performance on full data but also leads to better unimodal generalization compared to training with unimodal inputs alone. Our findings emphasize the utility of resilient multimodal modeling and advocate for broader integration of robust fusion techniques in clinical time series applications.
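
To make the fusion idea in the abstract concrete, the sketch below illustrates one plausible reading of a mid-to-late fusion model with tolerance to missing modalities: per-modality temporal encoders, a masked fusion step, and per-modality heads that could serve a multi-objective loss. This is not the authors' implementation; all class, layer, and parameter names (e.g. MedFuseSleepSketch, hidden, present_mask) are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class MedFuseSleepSketch(nn.Module):
    """Minimal sketch of mid-to-late multimodal fusion for sleep staging.

    Hypothetical structure inferred from the abstract, not the published model:
    one temporal encoder per modality (EEG, EOG, auxiliary signal), a fused
    classifier over masked modality features, and unimodal heads that can
    contribute auxiliary objectives.
    """

    def __init__(self, in_channels=(1, 1, 1), hidden=64, n_stages=5):
        super().__init__()
        # One lightweight temporal encoder per modality.
        self.encoders = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(c, hidden, kernel_size=7, padding=3),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # collapse the time axis
                nn.Flatten(),
            )
            for c in in_channels
        )
        # Per-modality heads allow a multi-objective (fused + unimodal) loss.
        self.unimodal_heads = nn.ModuleList(
            nn.Linear(hidden, n_stages) for _ in in_channels
        )
        # Fusion head operates on the masked average of modality features.
        self.fusion_head = nn.Linear(hidden, n_stages)

    def forward(self, signals, present_mask):
        # signals: list of (batch, channels, time) tensors, one per modality.
        # present_mask: (batch, n_modalities) floats, 1 = modality usable.
        feats = torch.stack(
            [enc(x) for enc, x in zip(self.encoders, signals)], dim=1
        )  # (batch, n_modalities, hidden)
        mask = present_mask.unsqueeze(-1)
        # Average only over modalities that are actually present.
        fused = (feats * mask).sum(1) / mask.sum(1).clamp(min=1e-6)
        unimodal_logits = [h(feats[:, i]) for i, h in enumerate(self.unimodal_heads)]
        return self.fusion_head(fused), unimodal_logits
```

Under this reading, robustness to degraded or absent EEG would come from randomly zeroing entries of present_mask during training (a form of modality dropout) and from combining a cross-entropy loss on the fused logits with weighted cross-entropy terms on each unimodal head, so that each encoder remains useful on its own.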
