Voice-Assisted Multimodal Fusion Network for Difficult Airway Assessment

Abstract

Difficult airway management poses significant risks in anesthesia and emergency medicine, including hypoxemia and airway injury. While image-based methods have improved airway assessment, they often fail to capture functional information such as airway patency and vocal cord movement. To address this, we propose VAMF-Net (Voice-Assisted Multimodal Fusion Network), which integrates three-view airway images with voice data to improve the accuracy of difficult airway evaluation. VAMF-Net employs an early fusion strategy for multi-view image features with contrastive learning pretraining, enabling early interaction between views to capture complementary information. Furthermore, we introduce a voice-assisted cross-attention (VCA) module, which treats image data as the primary source and voice data as supplementary input. A dataset of 1,106 samples (89 difficult and 1,017 easy cases) was constructed, with each sample comprising three airway images (frontal open mouth, frontal tongue extended, and side head tilted back) and voice data. VAMF-Net achieved an AUC of 0.917, sensitivity of 0.931, and specificity of 0.815, outperforming existing methods.
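The voice-assisted cross-attention idea described above can be sketched in a minimal form: image tokens act as queries against voice-derived keys and values, and a residual connection keeps the image stream dominant. This is only an illustrative NumPy sketch under stated assumptions, not the paper's implementation; the function name, projection matrices, and token shapes are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def voice_assisted_cross_attention(img_feats, voice_feats, Wq, Wk, Wv):
    """Illustrative cross-attention: image tokens query voice tokens.

    img_feats:   (n_img, d)   fused multi-view image features (primary source)
    voice_feats: (n_voice, d) voice features (supplementary input)
    Wq, Wk, Wv:  (d, d)       hypothetical learned projections
    """
    Q = img_feats @ Wq            # queries come from the image stream
    K = voice_feats @ Wk          # keys come from the voice stream
    V = voice_feats @ Wv          # values come from the voice stream
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    attn = softmax(scores, axis=-1)
    # residual add keeps image features primary; voice contributes as a refinement
    return img_feats + attn @ V

rng = np.random.default_rng(0)
d = 8
img = rng.normal(size=(3, d))     # e.g. one token per airway view
voice = rng.normal(size=(5, d))   # e.g. voice embedding tokens
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out = voice_assisted_cross_attention(img, voice, Wq, Wk, Wv)
print(out.shape)  # (3, 8): one refined feature per image token
```

The output retains the image-token shape, consistent with the design choice of using voice only to refine, not replace, the image representation.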
