ADFF-Net: An Attention-Based Dual-Stream Feature Fusion Network for Respiratory Sound Classification

Abstract

Deep learning-based respiratory sound classification (RSC) has emerged as a promising non-invasive approach to assist clinical diagnosis. However, existing methods often face challenges such as sub-optimal feature representation and limited model expressiveness. To address these issues, we propose an Attention-based Dual-stream Feature Fusion Network (ADFF-Net). Built upon the pre-trained Audio Spectrogram Transformer, ADFF-Net takes Mel filter-bank and Mel-spectrogram features as dual-stream inputs, and an attention-based fusion module with a skip connection is introduced to preserve both the raw energy and the relevant tonal variations within the multi-scale time–frequency representation. Extensive experiments on the ICBHI2017 database with the official train–test split show that, although its sensitivity remains low at 42.91%, ADFF-Net achieves state-of-the-art performance on aggregated metrics in the four-class RSC task, with an overall accuracy of 64.95%, specificity of 81.39%, and harmonic score of 62.14%. The results confirm the effectiveness of the proposed attention-based dual-stream acoustic feature fusion module for the RSC task, while also highlighting substantial room for improving the detection of abnormal respiratory events. Furthermore, we outline several promising research directions, including addressing class imbalance, enriching signal diversity, advancing network design, and enhancing model interpretability.
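To make the fusion idea concrete, the following is a minimal PyTorch sketch of an attention-based dual-stream fusion with a skip connection, as described in the abstract. It is not the authors' implementation: the module name, embedding dimension, head count, and the choice of cross-attention (Fbank stream querying the Mel-spectrogram stream) are illustrative assumptions.

```python
# Minimal sketch (assumed design, not the paper's code) of attention-based
# dual-stream feature fusion with a skip connection.
import torch
import torch.nn as nn

class DualStreamAttentionFusion(nn.Module):
    """Fuse Fbank and Mel-spectrogram token sequences via cross-attention,
    with a skip connection preserving the raw Fbank stream."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Cross-attention: the Fbank stream attends to the Mel-spectrogram stream.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, fbank: torch.Tensor, mel: torch.Tensor) -> torch.Tensor:
        # fbank, mel: (batch, time, dim) embeddings from the two
        # Audio Spectrogram Transformer streams.
        fused, _ = self.attn(query=fbank, key=mel, value=mel)
        # Skip connection keeps the raw energy information carried by fbank.
        return self.norm(fbank + fused)

# Usage with dummy tensors:
fusion = DualStreamAttentionFusion(dim=768)
out = fusion(torch.randn(2, 100, 768), torch.randn(2, 100, 768))
print(out.shape)  # torch.Size([2, 100, 768])
```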
