Voice Stress Markers Are Orthogonal to Speech Disfluency Labels: A Large-Scale Analysis on SEP-28K
Abstract
The relationship between voice stress markers and speech disfluency events has not been systematically quantified at scale, even though both are targets of clinical assessment in stuttering populations. We examine correlations between four acoustic stress features (jitter, shimmer, the standard deviation of fundamental frequency (F0), and a composite stress score) and five disfluency types (prolongation, block, sound repetition, word repetition, and interjection) across 14,645 three-second clips with valid pitch estimates from the SEP-28K dataset. Using Pearson and point-biserial correlations with Bonferroni correction for the 20 feature-by-type comparisons, we find that no absolute correlation exceeds 0.05, so every effect size is negligible by Cohen's convention (|r| < 0.10). The strongest observed association (composite stress × prolongation, r = -0.050) explains only 0.25% of variance. Distribution comparisons between fluent and disfluent clips yield Cohen's d < 0.10 for all stress features. These findings suggest that, at least for linear associations in this dataset, acoustic voice stress markers and disfluency labels carry largely non-overlapping information. Although marginal correlations alone cannot rule out non-linear or conditional dependencies, the negligible effect sizes indicate that multimodal speech assessment systems may benefit from treating disfluency detection and stress monitoring as separate modules rather than modeling them jointly. We release analysis code and detailed statistical outputs to support reproducibility.
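As a rough illustration of the kind of analysis the abstract describes, the sketch below computes point-biserial correlations between stress features and binary disfluency labels, applies a Bonferroni threshold for 20 comparisons, and computes Cohen's d between two groups of clips. It is not the authors' released code. The function names, the synthetic random data (which stands in for SEP-28K features), the 30% label rate, and the normal approximation used for p-values are all assumptions made for this example.

```python
import math
import random

# Feature and label names follow the abstract; everything else is hypothetical.
STRESS_FEATURES = ["jitter", "shimmer", "f0_std", "composite_stress"]
DISFLUENCY_TYPES = ["prolongation", "block", "sound_rep", "word_rep", "interjection"]
ALPHA = 0.05
N_TESTS = len(STRESS_FEATURES) * len(DISFLUENCY_TYPES)  # 4 x 5 = 20
BONFERRONI_ALPHA = ALPHA / N_TESTS


def pearson_r(x, y):
    """Pearson r; when x is coded 0/1 this equals the point-biserial r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_two_sided_p(r, n):
    """Two-sided p-value for r via a large-sample normal approximation
    to the t statistic (an assumption; reasonable only for large n, |r| < 1)."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((v - ma) ** 2 for v in group_a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in group_b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled_sd


# Synthetic stand-in data: independent features and labels, NOT real SEP-28K values.
random.seed(0)
n_clips = 5000
labels = {name: [1 if random.random() < 0.3 else 0 for _ in range(n_clips)]
          for name in DISFLUENCY_TYPES}
features = {name: [random.gauss(0.0, 1.0) for _ in range(n_clips)]
            for name in STRESS_FEATURES}

# One correlation per (feature, disfluency type) pair: 20 tests in total.
results = {}
for feat in STRESS_FEATURES:
    for dis in DISFLUENCY_TYPES:
        r = pearson_r(labels[dis], features[feat])
        p = approx_two_sided_p(r, n_clips)
        results[(feat, dis)] = (r, p, p < BONFERRONI_ALPHA)

for (feat, dis), (r, p, significant) in results.items():
    print(f"{feat} x {dis}: r={r:+.3f}, p={p:.3g}, significant={significant}")

# Fluent versus disfluent comparison for one feature.
any_disfluency = [max(labels[d][i] for d in DISFLUENCY_TYPES) for i in range(n_clips)]
disfluent = [v for v, flag in zip(features["jitter"], any_disfluency) if flag]
fluent = [v for v, flag in zip(features["jitter"], any_disfluency) if not flag]
print(f"Cohen's d (jitter, disfluent vs fluent): {cohens_d(disfluent, fluent):+.3f}")

# Variance explained by the strongest reported correlation: r^2 as a percentage.
print(f"r = -0.050 explains {(-0.050) ** 2 * 100:.2f}% of variance")
```

The last line reproduces the arithmetic behind the abstract's variance figure: squaring r = -0.050 gives 0.0025, or 0.25%. Computing Pearson's r with a 0/1 label is numerically the same as the point-biserial coefficient, so the sketch uses one function for both.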