Investigating Sibilant Fricative Representation in Bangla Telemedicine Speech: A Cost-Aware Sampling Rate Optimization Study

Prajat Paul
Mohamed Mehfoud Bouh
Manan Vinod Shah
Forhad Hossain
Ashir Ahmed

Abstract

Automatic speech recognition (ASR) has advanced rapidly for high-resource languages, yet performance remains limited for low-resource languages such as Bangla, particularly in telehealth settings. Most systems rely on a standardized 16 kHz sampling rate. This design choice persists despite evidence that Bangla contains sibilant fricatives and other phonetic cues with substantial high-frequency energy, which may be suppressed under bandwidth and latency constraints. This study evaluates audio sampling rate as a controllable signal-level parameter for Bangla telehealth ASR, with the aim of identifying an empirically grounded operating range that balances transcription accuracy, execution time, and network bandwidth. Twenty real-world Bangla doctor–patient consultations recorded at 32 kHz were deterministically resampled to 55 configurations between 8 kHz and 32 kHz and transcribed using a fixed cloud-based ASR system. Session-level Word Error Rate (WER), execution latency, and payload bandwidth were measured, and high-frequency phonetic content was characterized using a composite sibilant-likelihood score. WER decreased from 0.338 at 8 kHz to a local minimum of 0.232 at 18.75 kHz, with gains plateauing beyond this range despite substantial bandwidth increases. Elbow-point, Pareto-frontier, weighted-scoring, and Minimum Acceptable Trade-off analyses converged on an optimal region between 17.25 and 18.75 kHz, demonstrating that sampling-rate optimization improves ASR accuracy without proportional resource costs in telehealth settings.
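The abstract does not spell out the study's exact formulas, so the following is only a minimal sketch, under stated assumptions, of three quantities being traded off. Session-level WER is computed with the standard definition: word-level edit distance divided by the number of reference words. Payload bandwidth assumes uncompressed 16-bit mono PCM, which is an assumption because the encoding used in the study is not stated here. The Pareto frontier over (WER, bandwidth) keeps a configuration only if no other configuration is at least as good on both objectives and strictly better on one. The function names and the (rate, WER) values in the demo are hypothetical placeholders, not results from the study.

```python
"""Sketch of WER, PCM payload bandwidth, and a (WER, bandwidth) Pareto frontier.

ASSUMPTIONS (not taken from the study): the payload is uncompressed 16-bit mono
PCM, and the (rate, WER) pairs in the demo are made-up placeholders.
"""
from typing import List, Sequence, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def pcm_bandwidth_kbps(sample_rate_hz: float, bits_per_sample: int = 16,
                       channels: int = 1) -> float:
    """Raw bit rate in kbit/s, ASSUMING an uncompressed PCM payload."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def pareto_frontier(
    points: Sequence[Tuple[float, float, float]],
) -> List[Tuple[float, float, float]]:
    """Keep (rate, wer, bandwidth) tuples not dominated on (wer, bandwidth).

    Both objectives are minimized. A point is dominated when another point is
    no worse on both objectives and strictly better on at least one.
    """
    frontier = []
    for p in points:
        dominated = any(
            q[1] <= p[1] and q[2] <= p[2] and (q[1] < p[1] or q[2] < p[2])
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier, key=lambda t: t[0])


if __name__ == "__main__":
    # Hypothetical (rate_hz, wer) pairs for illustration only.
    placeholder = [(8000, 0.40), (12000, 0.35), (16000, 0.30),
                   (18750, 0.25), (24000, 0.26), (32000, 0.25)]
    points = [(r, w, pcm_bandwidth_kbps(r)) for r, w in placeholder]
    print([p[0] for p in pareto_frontier(points)])  # → [8000, 12000, 16000, 18750]
```

With these placeholder values, the 24 kHz and 32 kHz points drop out because the 18.75 kHz point matches or beats their WER at lower bandwidth. This kind of dominance pruning lets a Pareto analysis discard high-rate configurations that add bandwidth without lowering WER.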

Version published to 10.20944/preprints202603.1320.v1
Mar 17, 2026

Related articles

Benchmarking Self-Supervised Speech Models on Multilingual Nigerian Speech

This article has 2 authors:
1. Omotayo Omoyemi
2. Ifeoluwa Oladeni
This article has no evaluations. Latest version Mar 20, 2026

Disentangling sociophonetic and physiological variation in /s/ acoustics across 12 languages

This article has 3 authors:
1. Massimo Lipari
2. Morgan Sonderegger
3. Meghan Clayards
This article has no evaluations. Latest version Mar 28, 2026

Morse Code Based ESP32 Communication with LLM Integration for Healthcare Applications

This article has 4 authors:
1. S. V. Ashok Sainaadh
2. M. Neil Kumar
3. B. Sai Sundhar Reddy
4. Mithun Kumar Kar
This article has no evaluations. Latest version Mar 25, 2026