ADFF-Net: An Attention-Based Dual-Stream Feature Fusion Network for Respiratory Sound Classification
Abstract
Deep learning–based respiratory sound classification has emerged as a promising non-invasive and cost-effective approach to assisting clinical diagnosis. However, existing methods often face challenges such as suboptimal feature representation and limited model expressiveness. To address these issues, we propose an Attention-based Dual-stream Feature Fusion Network (ADFF-Net). Built upon the Audio Spectrogram Transformer, ADFF-Net takes Mel-filter-bank and Mel-spectrogram features as dual-stream inputs, and an attention-based fusion module with skip connections is introduced to emphasize pathological spectral regions and preserve multi-scale time–frequency information. Extensive experiments on the ICBHI 2017 database with the official train–test split show that ADFF-Net achieves superior performance on the four-class classification task, outperforming traditional fusion strategies and matching state-of-the-art approaches, with a specificity of 81.39%, a sensitivity of 42.91%, and an overall score of 62.14%. These findings highlight the effectiveness of dual-stream acoustic feature fusion and demonstrate the potential of ADFF-Net for clinical decision support in respiratory disease diagnosis.
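The fusion mechanism described above — per-stream attention weights over the two acoustic feature maps, plus a skip connection that passes the raw streams through — can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's actual module: the projection matrix `w`, the softmax-over-streams weighting, and the 0.5 skip scaling are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(fbank, melspec, w):
    """Fuse two (time, freq) feature maps with per-frame stream attention.

    fbank, melspec: (T, F) Mel-filter-bank and Mel-spectrogram features.
    w: (2F, 2) hypothetical learned projection producing one score per
       stream per frame; softmax turns scores into stream weights.
    """
    concat = np.concatenate([fbank, melspec], axis=-1)  # (T, 2F)
    scores = concat @ w                                 # (T, 2)
    alpha = softmax(scores, axis=-1)                    # weights sum to 1 per frame
    fused = alpha[:, :1] * fbank + alpha[:, 1:] * melspec
    # skip connection: re-inject both streams to keep time-frequency detail
    return fused + 0.5 * (fbank + melspec)

T, F = 100, 64
fbank = rng.standard_normal((T, F))
melspec = rng.standard_normal((T, F))
w = rng.standard_normal((2 * F, 2)) * 0.1
out = attention_fusion(fbank, melspec, w)  # (100, 64) fused feature map
```

In a full model, both streams would first pass through the Audio Spectrogram Transformer backbone, and `w` would be learned jointly with the classifier rather than sampled at random.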