DMMAF-HAR: Dynamic Multi-Modal Adaptive Fusion for Human Activity Recognition in Complex Environments
Abstract
Human Activity Recognition (HAR) faces significant challenges in dynamic real-world environments. This paper introduces DMMAF-HAR, a novel deep learning framework for robust HAR that integrates dynamic visual analysis, modality-specific enhancement, and context-aware adaptive fusion. The framework comprises a Dynamic Visual Chronometer Module (DVCM) that models video-based dynamics at physical time scales; a Modality-Specific Enhancement and Feature Extractor (MSEFE) for tailored processing of IMU, body-conduction, and acoustic data; and a Context-Adaptive Fusion and Classifier (CAFC) for intelligent, context-aware fusion across modalities. Evaluated on the challenging MobiAct++ dataset, DMMAF-HAR achieves state-of-the-art performance, significantly outperforming a range of single-modal and multi-modal baselines. Ablation studies confirm each module's contribution, and further analyses highlight the framework's robustness, cross-modality benefits, and computational efficiency. A complementary user study validates its practical utility and perceived reliability. Our contributions include the integration of physical time scales, comprehensive modality-specific processing, and a novel context-aware adaptive fusion mechanism, which together yield superior robustness and accuracy for real-world HAR.
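To make the three-module structure named above concrete, the following is a minimal PyTorch sketch of how such a pipeline could be wired together. It is not the authors' implementation: all module internals, feature dimensions, the gated-attention fusion scheme, the four-modality setup (video, IMU, body conduction, audio), and the class count are assumptions for illustration; only the DVCM/MSEFE/CAFC decomposition comes from the abstract.

```python
# Hypothetical sketch of the DMMAF-HAR pipeline described in the abstract.
# Every architectural detail below (layer choices, dimensions, fusion gate,
# n_classes) is an assumption; only the three-module structure is from the text.
import torch
import torch.nn as nn

class DVCM(nn.Module):
    """Dynamic Visual Chronometer Module: encodes video dynamics (assumed 3D conv)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, dim))

    def forward(self, video):            # video: (B, 3, T, H, W)
        return self.net(video)           # -> (B, dim)

class MSEFE(nn.Module):
    """Modality-Specific Enhancement and Feature Extractor: one 1D-conv branch per sensor stream."""
    def __init__(self, in_ch, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, dim))

    def forward(self, x):                # x: (B, in_ch, T)
        return self.net(x)               # -> (B, dim)

class CAFC(nn.Module):
    """Context-Adaptive Fusion and Classifier: softmax gate over per-modality features."""
    def __init__(self, dim=128, n_modalities=4, n_classes=12):
        super().__init__()
        self.gate = nn.Linear(dim * n_modalities, n_modalities)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, feats):            # feats: list of M tensors, each (B, dim)
        stacked = torch.stack(feats, dim=1)                        # (B, M, dim)
        weights = torch.softmax(self.gate(stacked.flatten(1)), -1) # (B, M)
        fused = (weights.unsqueeze(-1) * stacked).sum(dim=1)       # (B, dim)
        return self.head(fused)          # -> (B, n_classes)

class DMMAF_HAR(nn.Module):
    """Assumed top-level model: DVCM for video, one MSEFE per sensor stream, CAFC fusion."""
    def __init__(self, n_classes=12):
        super().__init__()
        self.dvcm = DVCM()
        self.imu, self.body, self.audio = MSEFE(6), MSEFE(1), MSEFE(1)
        self.cafc = CAFC(n_modalities=4, n_classes=n_classes)

    def forward(self, video, imu, body, audio):
        feats = [self.dvcm(video), self.imu(imu),
                 self.body(body), self.audio(audio)]
        return self.cafc(feats)

# Smoke test with random inputs at assumed shapes.
model = DMMAF_HAR()
logits = model(torch.randn(2, 3, 16, 64, 64),   # video clip
               torch.randn(2, 6, 100),          # 6-axis IMU window
               torch.randn(2, 1, 100),          # body-conduction channel
               torch.randn(2, 1, 100))          # acoustic channel
print(logits.shape)  # torch.Size([2, 12])
```

In this reading, the "context-aware" aspect is reduced to a learned gate that re-weights modalities per sample, which is one common realization of adaptive fusion; the paper's actual mechanism may differ.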