Emotion Recognition from rPPG via Physiologically-Inspired Temporal Encoding and Attention-based Curriculum Learning
Abstract
Remote photoplethysmography (rPPG) enables non-contact physiological measurement for emotion recognition, yet the temporally sparse nature of emotional cardiovascular responses, intrinsic measurement noise, weak session-level labels, and the subtle physiological correlates of valence pose critical challenges. To address these issues, we propose a physiologically inspired deep learning framework comprising a Multi-scale Temporal Dynamics Encoder (MTDE) to capture autonomic nervous system dynamics across multiple timescales, an adaptive sparse α-entmax attention mechanism to identify salient emotional segments amidst noisy signals, Gated Temporal Pooling for robust aggregation of emotional features, and a structured three-phase curriculum learning strategy to systematically handle temporal sparsity, weak labels, and noise. Evaluated on the MAHNOB-HCI dataset (27 subjects, 527 sessions, subject-independent split), our temporal-only model achieved competitive performance in arousal recognition (66.04% accuracy, 61.97% weighted F1), surpassing prior CNN-LSTM baselines. However, lower performance on valence (62.26% accuracy) revealed inherent physiological limitations of unimodal temporal cardiovascular analysis. These findings establish clear benchmarks for temporal-only rPPG emotion recognition and underscore the necessity of incorporating spatial or multimodal information to effectively capture nuanced emotional dimensions such as valence, guiding future research directions in affective computing.
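To illustrate the two aggregation ideas named in the abstract, the sketch below implements sparsemax, the α = 2 special case of the α-entmax family (the paper's mechanism adapts α, which this sketch does not), followed by a gate-weighted temporal pooling step. The function names, feature dimensions, and toy scores are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax (alpha-entmax with alpha = 2): Euclidean projection of the
    score vector z onto the probability simplex. Unlike softmax, it can
    assign exactly zero weight to low-scoring (noisy) segments."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    # support: largest k with 1 + k * z_(k) > cumulative sum of top-k scores
    k_z = k[1 + k * z_sorted > cumsum][-1]
    tau = (cumsum[k_z - 1] - 1.0) / k_z
    return np.maximum(z - tau, 0.0)

def gated_pool(features, gates):
    """Gate-weighted temporal pooling: per-step weights in [0, 1] re-weight
    each time step before averaging over the session."""
    gates = np.asarray(gates, dtype=float)
    return (features * gates[:, None]).sum(axis=0) / (gates.sum() + 1e-8)

# Toy example: 4 temporal segments with 3-dim encoder features and
# scalar relevance scores (both hypothetical).
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3))
scores = np.array([2.5, 0.1, -1.0, 0.2])
weights = sparsemax(scores)        # sparse: low-scoring segments get weight 0
pooled = gated_pool(feats, weights)
print(weights)                     # -> [1. 0. 0. 0.]
```

Because sparsemax zeroes out entire segments, only the portions of the session it deems emotionally salient contribute to the pooled representation, which is the motivation the abstract gives for sparse attention over noisy rPPG signals.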