Environmental Sound Classification Using Feature Fusion of MFCCs, Mel-spectrogram, and Chroma

Abstract

Environmental sound recognition allows automated systems to analyze and react to a wide range of acoustic surroundings, supporting applications in safety, surveillance, and intelligent technologies. Nevertheless, classifying environmental sounds remains difficult because of their varied categories and intricate acoustic characteristics, which span both natural and artificial sound sources. Capturing the fine-grained differences among these sounds requires a strong and informative feature representation. In this work, we introduce an environmental sound classification approach based on feature fusion. To obtain a comprehensive representation of audio signals, we integrate three distinct features: Mel-Frequency Cepstral Coefficients (MFCCs), mel spectrograms, and chroma features derived from the Short-Time Fourier Transform (STFT). We evaluate the proposed approach with multiple classifiers, including a Support Vector Machine (SVM), Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (BiLSTM) networks, using the UrbanSound8K dataset. Experimental results indicate that combining multiple audio features substantially improves classification performance compared to single-feature representations. Notably, the BiLSTM-based model achieved a classification accuracy of 93.81% on UrbanSound8K.
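
For concreteness, the sketch below shows one way the described feature fusion could be implemented with librosa. It is a minimal illustration, not the authors' exact pipeline: the parameter choices (n_mfcc=40, n_mels=128) and the time-averaging fusion strategy are assumptions made here for the example.

```python
# Minimal sketch of fusing MFCCs, a mel spectrogram, and chroma-STFT
# features into one vector per clip. Parameters are illustrative.
import numpy as np
import librosa

def extract_fused_features(path, sr=22050, n_mfcc=40, n_mels=128):
    """Load an audio clip and return a fused feature vector built from
    MFCCs, a log-mel spectrogram, and chroma-STFT features."""
    y, sr = librosa.load(path, sr=sr)

    # Three complementary time-frequency representations.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)           # (n_mfcc, T)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)  # (n_mels, T)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)                 # (12, T)

    # Convert the mel spectrogram to a log (dB) scale, as is common practice.
    mel_db = librosa.power_to_db(mel, ref=np.max)

    # Fuse by averaging each representation over time and concatenating,
    # yielding a fixed-length vector suitable for classifiers such as SVM.
    fused = np.concatenate([
        mfcc.mean(axis=1),
        mel_db.mean(axis=1),
        chroma.mean(axis=1),
    ])
    return fused  # shape: (n_mfcc + n_mels + 12,)
```

For the sequence models (LSTM and BiLSTM), the frame-wise features would typically be stacked along the time axis rather than averaged, so the network receives a (time, features) sequence instead of a single vector.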
