What is the retest reliability of computationally extractable speech and language markers?
Abstract
Speech is a signal rich in information about cognitive and affective states, which can be of high clinical utility for detecting and monitoring mental health conditions. Numerous studies have employed natural language processing (NLP) and AI-based language models to derive potential markers of psychological and neurocognitive states from spontaneous speech. However, only a few studies have investigated the test-retest reliability of commonly used features, a basic psychometric property crucial to clinical applications. In the present study, we use a crowdsourcing approach to test the reliability of a comprehensive set of speech and language markers across three speech elicitation tasks (free speech, picture and cartoon descriptions) and four time points, using the intra-class correlation coefficient (ICC). We also explore the underlying factor structure of the feature space through an exploratory factor analysis (EFA). Results indicate that acoustic-prosodic features exhibit high test-retest reliability across all sessions. In contrast, semantic measures (e.g., semantic similarity, information density, and perplexity), speech quantity metrics, and syntactic complexity exhibit low reliability, even when the stimulus materials used for speech elicitation were kept identical. Although semantic features showed strong within-subject variability, EFA across the feature space revealed a latent factor specifically comprising BERT-based semantic features with a moderate-to-high ICC of 0.76. There was limited evidence that free speech yielded lower ICCs than the other tasks. Demographic, emotional, and physical state factors contributed negligibly to ICC variance, indicating that these external factors had minimal impact on the consistency of the acoustic and semantic features. Overall, we find that acoustic-prosodic and text-based features have markedly different psychometric properties: the latter show low test-retest reliability individually, although semantic features form intercorrelated clusters that are more stable and capture a substantial share of the variance.
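As a minimal sketch of the kind of test-retest analysis described above, the example below computes an ICC for a single speech feature measured over repeated sessions using the pingouin library. The data, column names, feature (mean F0), and choice of ICC variant (two-way random effects, absolute agreement) are illustrative assumptions, not the study's actual code or data.

```python
# Hypothetical sketch: test-retest reliability (ICC) of one speech feature
# across repeated sessions. All values and names below are illustrative.
import pandas as pd
import pingouin as pg

# Long-format data: one row per participant x session, holding the value of
# a single extracted feature (e.g., mean F0 from a picture-description task).
df = pd.DataFrame({
    "participant": ["p01"] * 4 + ["p02"] * 4 + ["p03"] * 4,
    "session":     [1, 2, 3, 4] * 3,
    "mean_f0_hz":  [118.2, 121.5, 119.8, 120.1,
                    205.4, 201.9, 208.3, 203.7,
                    162.0, 158.6, 160.9, 161.4],
})

# Treat sessions as repeated "raters" of each participant.
icc = pg.intraclass_corr(
    data=df,
    targets="participant",   # subjects whose consistency we assess
    raters="session",        # repeated measurement occasions
    ratings="mean_f0_hz",    # the extracted speech feature
)

# ICC2 = two-way random effects, absolute agreement, single measurement.
print(icc.set_index("Type").loc["ICC2", ["ICC", "CI95%"]])
```

The same procedure would be repeated per feature and per elicitation task; features whose single-measure ICC stays high across sessions (as reported here for acoustic-prosodic measures) are the better candidates for longitudinal clinical monitoring.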