Acoustic Feature Synergy and Self-Supervised Learning for Robust Tabla Stroke Classification

Jaipreet Kaur
Rajdeep Singh Sohal
Manbir Kaur
Satinder Kaur

Abstract

Accurate and robust automatic classification of tabla strokes is essential for music information retrieval and performance analysis, yet it remains challenging because of complex timbral structures and subtle acoustic variability across stroke categories. To address this challenge, we propose a robust tabla stroke classification framework that integrates multidomain handcrafted features, spanning spectral, temporal, cepstral, and perceptual descriptors, with self-supervised learning (SSL) representations derived from a newly developed, manually annotated tabla dataset. This dataset is accompanied by an augmented counterpart that simulates realistic acoustic variability, enabling systematic evaluation under domain-shift conditions. ANOVA F-test-based feature selection is applied to retain the most discriminative attributes, and a range of machine learning classifiers is employed. Experimental results show that multidomain feature synergy significantly improves classification performance, with the Hybrid-8 configuration achieving up to 97.56% accuracy under in-domain evaluation, while SSL representations exhibit superior cross-domain robustness, attaining 94.07% accuracy when trained on original data and tested on augmented data. Handcrafted multidomain features thus yield near-ceiling accuracy in controlled settings, whereas SSL representations are more resilient to acoustic variability. These findings reveal a trade-off between peak discriminative performance and cross-domain generalization, and they highlight the complementary strengths of handcrafted features and SSL representations for developing robust and generalizable tabla stroke classification systems.
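
The evaluation protocol described in the abstract (ANOVA F-test feature selection followed by a classifier, scored both in-domain and on perturbed data) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the feature matrix is synthetic rather than the authors' tabla dataset, the `perturb` helper is a crude hypothetical stand-in for their augmentation, and the choice of an RBF SVM with k = 8 retained features is not taken from the paper. The scikit-learn calls (`SelectKBest`, `f_classif`, `make_pipeline`) are standard.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_features(n_per_class=60, n_classes=4, n_features=40,
                            n_informative=8, seed=0):
    """Synthetic stand-in for a stroke feature matrix (NOT the paper's data)."""
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(n_classes), n_per_class)
    X = rng.normal(size=(y.size, n_features))
    # Only the first n_informative columns carry class information.
    centers = rng.normal(scale=3.0, size=(n_classes, n_informative))
    X[:, :n_informative] += centers[y]
    return X, y


def perturb(X, seed=1):
    """Hypothetical proxy for augmentation: per-feature gain plus additive noise."""
    rng = np.random.default_rng(seed)
    gain = rng.uniform(0.9, 1.1, size=X.shape[1])
    return X * gain + rng.normal(scale=0.3, size=X.shape)


X, y = make_synthetic_features()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# ANOVA F-test keeps the k features whose class means differ most
# relative to within-class variance; the classifier sees only those.
model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=8), SVC(kernel="rbf"))
model.fit(X_train, y_train)  # trained on "original" features only

selected = model.named_steps["selectkbest"].get_support(indices=True)
in_domain_acc = model.score(X_test, y_test)              # original test split
cross_domain_acc = model.score(perturb(X_test), y_test)  # perturbed test split

print("selected feature indices:", selected)
print(f"in-domain accuracy:    {in_domain_acc:.3f}")
print(f"cross-domain accuracy: {cross_domain_acc:.3f}")
```

Comparing the two printed accuracies mirrors the in-domain versus cross-domain comparison reported above. The numbers produced by this toy data say nothing about the paper's actual results.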

Version published to 10.21203/rs.3.rs-9043295/v1 on Research Square
Mar 24, 2026

Related articles

A Comprehensive Review in Unimodal and Multimodal Emotion Recognition

This article has 39 authors:
1. Jiachen Luo
2. Qu Yang
3. Jiajun He
4. Yining Hua
5. Zheng Lian
6. Yuanchao Li
7. Siyang Song
8. Wen Wu
9. Dingdong Wang
10. Shuai Shen
11. Jingyao Wu
12. Guimin Hu
13. He Hu
14. Yong Li
15. Zixing Zhang
16. Jiadong Wang
17. Sifan Zhou
18. Zuojin Tang
19. Canran Xiao
20. Sheng Xu
21. Zhenjun Zhao
22. Xiangyang Xue
23. Sicheng Zhao
24. Yong Dai
25. Tomoki Toda
26. Licai Sun
27. Kailai Yang
28. Liyun Zhang
29. Cong Cai
30. Jiamin Du
31. Ziyang Ma
32. Mingjie Chen
33. Chengxuan Qian
34. Zhenlong Yuan
35. Xie Chen
36. Huy Phan
37. Lin Wang
38. Björn Schuller
39. Joshua Reiss
This article has no evaluations. Latest version: Mar 30, 2026

Evaluating Early, Late and Hybrid Fusion in Multimodal Emotion Detection with Pretrained Models

This article has 3 authors:
1. Syed Riyas Ahamed
2. Sandip Saha
3. Awani Bhushan
This article has no evaluations. Latest version: Apr 13, 2026

Automated Yoga Pose Classification Using Deep Learning on Image-Based Datasets

This article has 8 authors:
1. Anish Antony
2. M.A.H. Farquad
3. Ashvini Alashetty
4. Sachin Kumar
5. Punitkumar Basavaraj Nayak
6. Geethanjali P P
7. Sachin Sharma
8. ( Kamanuri Sekhar) Sekhar K. Sekhar
This article has no evaluations. Latest version: Apr 14, 2026
