Multi-Scale Temporal Fusion Network for Real-Time Multimodal Emotion Recognition in IoT Environments
Abstract
The proliferation of Internet of Things (IoT) devices has created opportunities for continuous emotion monitoring, but existing systems struggle to process multimodal sensor data in real time while maintaining accuracy across diverse temporal scales. This paper presents EmotionTFN (Emotion-aware Multi-Scale Temporal Fusion Network), a novel architecture for real-time multimodal emotion recognition in IoT environments. The system integrates physiological signals from electroencephalography (EEG), photoplethysmography (PPG), and galvanic skin response (GSR) sensors, along with visual and audio data, using a hierarchical temporal attention mechanism that captures emotion-relevant features across short-term (0.5-2 s), medium-term (2-10 s), and long-term (10-60 s) time windows. Edge computing optimizations, including model compression, quantization, and adaptive sampling, enable deployment on resource-constrained devices. Extensive experiments on the MELD, DEAP, and G-REx datasets demonstrate that EmotionTFN achieves 94.2% accuracy on discrete emotion classification and 0.087 mean absolute error on dimensional emotion prediction, outperforming baseline approaches by 6.8%. The system maintains sub-200 ms latency on typical IoT hardware, remains robust under sensor failures, and improves energy efficiency by 40%. A four-week real-world deployment in smart home environments confirms practical applicability, with 97.2% system uptime and high user satisfaction, while on-device processing preserves privacy.
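To make the multi-scale fusion idea concrete, the following is a minimal PyTorch sketch of attention over short-, medium-, and long-term temporal windows. It is an illustration under stated assumptions, not the authors' implementation: the class name MultiScaleTemporalAttention, the mapping of the 2 s / 10 s / 60 s windows to time steps (here, features at 1 Hz), the 128-dimensional fused multimodal embedding, and the 7-class output head are all hypothetical choices for the sketch.

```python
import torch
import torch.nn as nn

class MultiScaleTemporalAttention(nn.Module):
    """Illustrative multi-scale temporal fusion: per-step multimodal features
    are pooled over short/medium/long windows, then the three scale summaries
    are combined with learned attention weights."""

    def __init__(self, feat_dim: int, windows=(2, 10, 60), n_classes: int = 7):
        super().__init__()
        # Window lengths in time steps; at 1 Hz these correspond to the
        # paper's 2 s / 10 s / 60 s upper bounds (an assumption of this sketch).
        self.windows = windows
        self.scale_attn = nn.Linear(feat_dim, 1)   # one attention score per scale
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time_steps, feat_dim) -- already-fused multimodal features
        scale_feats = []
        for w in self.windows:
            w = min(w, x.size(1))
            # Average-pool the most recent w steps into one vector per scale
            scale_feats.append(x[:, -w:, :].mean(dim=1))
        scales = torch.stack(scale_feats, dim=1)            # (batch, n_scales, feat_dim)
        attn = torch.softmax(self.scale_attn(scales), dim=1)  # (batch, n_scales, 1)
        fused = (attn * scales).sum(dim=1)                  # (batch, feat_dim)
        return self.classifier(fused)

# Usage: 60 time steps of 128-dim features, batch of 4 -> logits of shape (4, 7)
model = MultiScaleTemporalAttention(feat_dim=128)
logits = model(torch.randn(4, 60, 128))
```

The design choice worth noting is that attention is applied across temporal scales rather than across time steps: each window yields a single summary vector, and the network learns which horizon is most informative for the current input, which is one plausible reading of the hierarchical temporal attention the abstract describes.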