Voice-Assisted Multimodal Fusion Network for Difficult Airway Assessment
Abstract
Difficult airway management poses significant risks in anesthesia and emergency medicine, including hypoxemia and airway injury. While image-based methods have improved airway assessment, they often fail to capture functional information such as airway patency and vocal cord movement. To address this, we propose VAMF-Net (Voice-Assisted Multimodal Fusion Network), which integrates three-view airway images with voice data to improve the accuracy of difficult airway evaluation. VAMF-Net combines contrastive learning pretraining with an early fusion of multi-view image features, allowing the views to interact early and capture complementary information. Furthermore, we introduce a voice-assisted cross-attention (VCA) module, which treats image data as the primary source and uses voice data as supplementary input. We constructed a dataset of 1,106 samples (89 difficult and 1,017 easy cases), each comprising three airway images (frontal with mouth open, frontal with tongue extended, and lateral with head tilted back) and voice data. VAMF-Net achieved an AUC of 0.917, a sensitivity of 0.931, and a specificity of 0.815, outperforming existing methods.
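The abstract does not specify the internals of the VCA module, so the following is only a minimal sketch of one common way to make images primary and voice supplementary: generic scaled dot-product cross-attention in which image tokens form the queries, voice tokens form the keys and values, and a residual connection keeps the image features as the base signal. The function names, the projection matrices `w_q`, `w_k`, `w_v`, the token counts, and the feature dimension are all hypothetical and are not taken from the paper.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Stand-in for learned weights or extracted features (illustration only)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def voice_assisted_cross_attention(image_tokens, voice_tokens, w_q, w_k, w_v):
    """Hypothetical sketch: image tokens query the voice tokens.

    image_tokens: (n_img, dim), voice_tokens: (n_voice, dim),
    w_q, w_k, w_v: (dim, dim) projection matrices (assumed shapes).
    Returns the fused image tokens and the attention weights.
    """
    q = matmul(image_tokens, w_q)  # queries come from the primary modality
    k = matmul(voice_tokens, w_k)  # keys come from the supplementary modality
    v = matmul(voice_tokens, w_v)  # values come from the supplementary modality
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))  # (n_img, n_voice)
    weights = [softmax([s / scale for s in row]) for row in scores]
    attended = matmul(weights, v)  # voice information gathered per image token
    # Residual connection: voice only adds to the image features.
    fused = [
        [x + a for x, a in zip(img_row, att_row)]
        for img_row, att_row in zip(image_tokens, attended)
    ]
    return fused, weights


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4  # assumed feature size
    image_tokens = random_matrix(3, dim, rng)  # e.g. one token per view (assumption)
    voice_tokens = random_matrix(5, dim, rng)  # e.g. five audio frames (assumption)
    w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))
    fused, weights = voice_assisted_cross_attention(
        image_tokens, voice_tokens, w_q, w_k, w_v
    )
    print(len(fused), len(fused[0]), len(weights[0]))
```

In this sketch the output has one row per image token, so the voice stream can only adjust the image representation and never replaces it. If the value projection contributes nothing, the fused output reduces to the original image features.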