Research on multimodal fatigue driving detection method based on bidirectional temporal modeling and cross-attention mechanism
Abstract
Traffic safety issues caused by driver fatigue are becoming increasingly prominent, necessitating an efficient, robust, and deployable fatigue detection assistance system. Traditional geometric feature-based fatigue detection methods typically rely on manually set fixed thresholds and judge fatigue status from single or fused facial geometric features; they lack adaptability to individual differences and struggle to maintain stability and accuracy in complex scenarios. Purely image feature-driven methods, while able to capture rich visual details, often overlook the temporal evolution of keypoints, limiting their ability to model the dynamics of fatigue states. To overcome these challenges, this paper proposes a multimodal feature fusion-based fatigue detection model, MM-DMBICA (MediaPipe MobileNetV3-Dual Modal BiGRU CrossAttention). The model employs a dual-branch architecture: the geometry branch uses MediaPipe to extract facial keypoint coordinate sequences and models their temporal dynamics with a bidirectional GRU (BiGRU), while the image branch uses MobileNetV3 as a frame-level feature extractor combined with a BiGRU to capture temporal dependencies across video frames. A bidirectional cross-attention mechanism, CrossAttention, is then introduced; it leverages a learnable query vector to strengthen the interaction between the geometric and image modalities, enabling each modality to attend to the other's salient temporal information. Finally, a gated fusion mechanism adaptively integrates the two attended representations, dynamically balancing the contributions of the different features and improving classification robustness. Experiments demonstrate that the model effectively integrates spatial visual details with temporal behavioral patterns, significantly improving the discrimination of fatigue states in complex environments and providing a highly accurate solution for real-time fatigue monitoring in assisted driving systems.
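To make the described architecture concrete, the following is a minimal PyTorch sketch of the dual-branch design: a keypoint BiGRU, an image-feature BiGRU, bidirectional cross-attention with a learnable query, and gated fusion. All module names, feature dimensions (e.g. 478 MediaPipe face-mesh landmarks, 576-dimensional MobileNetV3-Small frame embeddings), and the exact attention and gating formulations are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of an MM-DMBICA-style model; dimensions and formulations are assumptions.
import torch
import torch.nn as nn


class CrossAttentionBlock(nn.Module):
    """Bidirectional cross-attention: each modality attends to the other's
    temporal sequence, guided by a shared learnable query (assumed formulation)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # learnable query vector
        self.attn_g2i = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_i2g = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, geo_seq: torch.Tensor, img_seq: torch.Tensor):
        # geo_seq, img_seq: (batch, time, dim)
        q = self.query.expand(geo_seq.size(0), -1, -1)
        # geometry-conditioned query attends over the image sequence, and vice versa
        geo_ctx, _ = self.attn_g2i(q + geo_seq.mean(1, keepdim=True), img_seq, img_seq)
        img_ctx, _ = self.attn_i2g(q + img_seq.mean(1, keepdim=True), geo_seq, geo_seq)
        return geo_ctx.squeeze(1), img_ctx.squeeze(1)


class MMDMBICA(nn.Module):
    def __init__(self, n_keypoints: int = 478, hidden: int = 128, n_classes: int = 2):
        super().__init__()
        # Geometry branch: per-frame keypoint (x, y) coordinates -> BiGRU
        self.geo_gru = nn.GRU(n_keypoints * 2, hidden, batch_first=True, bidirectional=True)
        # Image branch: per-frame CNN embeddings (e.g. MobileNetV3-Small) -> BiGRU
        self.img_gru = nn.GRU(576, hidden, batch_first=True, bidirectional=True)
        self.cross = CrossAttentionBlock(2 * hidden)
        # Gated fusion: a sigmoid gate balances the two attended representations
        self.gate = nn.Linear(4 * hidden, 2 * hidden)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, keypoints: torch.Tensor, frame_feats: torch.Tensor):
        # keypoints: (batch, time, n_keypoints * 2); frame_feats: (batch, time, 576)
        geo_seq, _ = self.geo_gru(keypoints)
        img_seq, _ = self.img_gru(frame_feats)
        geo_ctx, img_ctx = self.cross(geo_seq, img_seq)
        g = torch.sigmoid(self.gate(torch.cat([geo_ctx, img_ctx], dim=-1)))
        fused = g * geo_ctx + (1.0 - g) * img_ctx
        return self.classifier(fused)


if __name__ == "__main__":
    model = MMDMBICA()
    kp = torch.randn(2, 30, 478 * 2)   # 30 frames of face-mesh coordinates
    feats = torch.randn(2, 30, 576)    # 30 frames of CNN embeddings
    print(model(kp, feats).shape)      # torch.Size([2, 2])
```

In this sketch the sigmoid gate interpolates between the two attended representations, which is one common way to realize adaptive bimodal fusion; the paper's gating function and attention wiring may differ in detail.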