Psychiatric Voice Biomarkers: Methodological flaws in pediatric populations

Hammza Jabbar Abd Sattar Hamoudi
Mon-Ju Wu
Marsal Sanches
Cesar A. Soutullo
Carolina Olmos
Leslie K. Taylor
Giovanna Zunta-Soares
Jair C. Soares
Benson Mwangi

Abstract

Introduction

Psychiatric assessments rely on patient self-reports, clinician observations, and standardized scales, while objective technological tools are not yet reliable enough for clinical use. Voice could serve as a biomarker in several scenarios, including differential diagnosis, assessment of symptom severity, and prediction of suicidality. However, its use depends on accurate automatic speech recognition (ASR). Current gold-standard open-source ASR systems are trained mainly on adult speech and perform poorly on children's speech, limiting their application in pediatric psychiatry.

Methods

We benchmarked two open-source ASR models (NVIDIA Parakeet and Whisper-small) on the Ohio Child Speech Corpus (303 children, ages 4–9), using the reference human transcripts provided with the dataset. Audio was resampled to each model’s expected sampling rate. No model fine-tuning or adaptation was performed. For each utterance, we computed word error rate (WER) and character error rate (CER), and assessed semantic fidelity using Sentence Mover’s Distance (SMD) and BERTScore F1. Metrics were summarized overall, stratified by single-year age bins (4, 5, 6, 7, 8, 9), and also grouped into two broader categories: younger children (ages 4–6) and older children (ages 7–9). We compared WER, CER, SMD, and BERTScore F1 between the two age groups and evaluated age trends using nonparametric statistical tests.
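The per-utterance error metrics and the two-group age split described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the lowercase and whitespace normalization, the decision to count spaces as characters in CER, and the averaging of per-utterance WER within each group are all assumptions, since the abstract does not state how text was normalized or how scores were aggregated.

```python
from collections import defaultdict
from statistics import mean


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(text):
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = normalize(reference).split()
    hyp_words = normalize(hypothesis).split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    ref_chars = list(normalize(reference))
    hyp_chars = list(normalize(hypothesis))
    if not ref_chars:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def age_group(age):
    """Map a single-year age to the two broader groups (4-6 and 7-9)."""
    if 4 <= age <= 6:
        return "younger"
    if 7 <= age <= 9:
        return "older"
    raise ValueError("age outside the 4-9 range of the corpus")


def mean_wer_by_group(records):
    """records: iterable of (age, reference, hypothesis) tuples."""
    scores = defaultdict(list)
    for age, reference, hypothesis in records:
        scores[age_group(age)].append(wer(reference, hypothesis))
    return {group: mean(values) for group, values in scores.items()}
```

Each record pairs a child's age with the human reference transcript and the model output for one utterance. Because WER divides by the number of reference words, it can exceed 1.0 when the model inserts many extra words, which is worth keeping in mind when averaging over short child utterances.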

Results

Both models showed significant age effects: younger children had markedly higher word error rates (WER >40%) and character error rates (CER >30%) than older children (WER ∼30%, CER ∼20%). Sentence Mover’s Distance improved with age, while BERTScore F1 remained stable. Despite age-related improvements, overall transcription accuracy was low.

Discussion

Commonly used open-source ASR systems are currently inadequate for pediatric audio transcription, particularly in younger children. To build clinically translatable tools, child-specific data collection and model fine-tuning through structured speech paradigms are essential.

Version published to 10.1101/2025.10.13.25337901 on medRxiv
Oct 15, 2025
