Adaptive Baseline Calibration for Voice Stress Assessment in Speech Disfluency Monitoring

Nazar Kozak

Abstract

Voice stress assessment systems commonly apply fixed thresholds to classify acoustic features (jitter, shimmer, F0 variability) into stress levels. We show that fixed thresholds produce highly skewed stress score distributions when applied to diverse speakers: in the SEP-28K dataset, 61.4% of clips are scored as high-stress (≥0.8). Given the informal podcast recording context, this is likely an artifact of inter-speaker vocal variability rather than genuine stress variation. We propose an adaptive baseline algorithm that uses Welford's online algorithm for per-speaker calibration, followed by exponential moving average tracking. Applied to 14,645 clips with valid pitch estimates, the adaptive approach produces a more symmetric distribution (μ=0.530, σ=0.162) with substantially fewer extreme scores. Because ground-truth stress labels are unavailable, we evaluate calibration quality by distribution shape rather than classification accuracy, a limitation shared by most voice stress analysis systems. We additionally report that YIN-based pitch detection achieves a 98.1% F0 extraction rate on SEP-28K, compared to 12.1% with naive autocorrelation; reliable F0 extraction is a prerequisite for reliable voice stress features. We discuss implications for pediatric speech applications, where children's vocal characteristics (F0 range 250–400 Hz) differ substantially from those of adults and make fixed thresholds particularly problematic. The adaptive baseline algorithm is implemented in DisfluoSDK, an on-device framework for speech disfluency monitoring.
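
The two-stage idea in the abstract (Welford calibration, then exponential moving average tracking) can be illustrated with a minimal Python sketch. This is not the DisfluoSDK implementation. The class names, the warm-up length, the smoothing factor `alpha`, and the logistic mapping from z-score to a score in (0, 1) are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math


class RunningStats:
    """Welford's online algorithm for mean and variance of a stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; zero until at least two values have been seen.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


class AdaptiveBaseline:
    """Hypothetical per-speaker baseline for one acoustic feature.

    Assumed design: the first `warmup` values calibrate the baseline with
    Welford's algorithm; later values are scored against the baseline, which
    then drifts via exponentially weighted updates of mean and variance.
    """

    def __init__(self, warmup=5, alpha=0.1):
        self.warmup = warmup  # assumed calibration length, not from the paper
        self.alpha = alpha    # assumed EMA smoothing factor
        self.stats = RunningStats()
        self.mean = 0.0
        self.var = 0.0

    def score(self, x):
        if self.stats.n < self.warmup:
            # Calibration phase: accumulate statistics, return a neutral score.
            self.stats.update(x)
            if self.stats.n == self.warmup:
                self.mean = self.stats.mean
                self.var = self.stats.variance
            return 0.5

        # Score relative to this speaker's own baseline (assumed logistic map).
        std = max(math.sqrt(self.var), 1e-9)
        z = max(-30.0, min(30.0, (x - self.mean) / std))
        result = 1.0 / (1.0 + math.exp(-z))

        # Tracking phase: exponentially weighted mean and variance update.
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1.0 - self.alpha) * (self.var + diff * incr)
        return result
```

In this sketch, a feature value equal to the speaker's baseline mean maps to 0.5, and values far above the baseline approach 1. Because the score depends on each speaker's own mean and spread, a speaker with naturally high jitter or F0 variability is not pushed toward the high-stress end by default.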

Version published to 10.31224/6768
Apr 7, 2026

Related articles

Voice Stress Markers Are Orthogonal to Speech Disfluency Labels: A Large-Scale Analysis on SEP-28K

This article has 1 author:
1. Nazar Kozak
This article has no evaluations. Latest version: Apr 7, 2026

Speech-Adaptive Detection of Unnatural Intra-Sentential Pauses Using Contextual Anomaly Modeling for Interpreter Training

This article has 7 authors:
1. Hyoeun Kang
2. Jin-Dong Kim
3. Juriae Lee
4. Hee-Jo Nam
5. Kon Woo Kim
6. Joowon Lim
7. Hyun-Seok Park
This article has no evaluations. Latest version: Apr 3, 2026

Severity-Dependent Speech Characteristics and Clear Speech Response in Parkinson’s Disease: Perceptual, Acoustic, and Lingual Kinematic Findings

This article has 3 authors:
1. Austin Thompson
2. Lifeng Lin
3. Yunjung Kim
This article has no evaluations. Latest version: Mar 25, 2026
