Classification errors distort findings in automated speech processing: examples and solutions from child-development research

Lucas Gautheron
Evan Kidd
Anton Malko
Marvin Lavechin
Alejandrina Cristia

Abstract

With the advent of wearable recorders, scientists are increasingly turning to automated methods of analysis of audio and video data in order to measure children's experience, behavior, and outcomes, with a sizable literature employing long-form audio-recordings to study language acquisition. While numerous articles report on the accuracy and reliability of the most popular automated classifiers, less has been written on the downstream effects of classification errors on measurements and statistical inferences (e.g., the estimate of correlations and effect sizes in regressions). This paper proposes a Bayesian approach to study the effects of algorithmic errors on key scientific questions, including the effect of siblings on children's language experience and the association between children's production and their input. In both the most commonly used system, LENA, and an open-source alternative (the Voice Type Classifier from the ACLEW system), we find that classification errors can significantly distort estimates. For instance, automated annotations underestimated the negative effect of siblings on adult input by 20-80%, potentially placing it below statistical significance thresholds. We further show that a Bayesian calibration approach for recovering unbiased estimates of effect sizes can be effective and insightful, but does not provide a fool-proof solution. Both the issue reported and our solution may apply to any classifier involving event detection and classification with non-zero error rates.
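
To make the mechanism in the abstract concrete, the following is a minimal sketch of how classification errors could shrink an observed sibling effect. It is not the authors' Bayesian model. All names and numbers (`RECALL_ADULT`, `CHILD_TO_ADULT_CONFUSION`, the segment counts) are hypothetical values chosen for illustration, and the assumed error pattern (some adult speech is missed, some speech from other children is mislabeled as adult) is only one possibility.

```python
# Illustrative sketch only: hypothetical numbers, not data or code from the paper.

# Assumed classifier behavior (hypothetical):
RECALL_ADULT = 0.7              # fraction of true adult segments labeled "adult"
CHILD_TO_ADULT_CONFUSION = 0.2  # fraction of other-child segments mislabeled "adult"

# Assumed true segment counts per recording for two groups of children (hypothetical).
TRUE_COUNTS = {
    "no_siblings": {"adult": 1000.0, "other_child": 50.0},
    "siblings": {"adult": 800.0, "other_child": 400.0},
}


def observed_adult_count(true_adult: float, true_other_child: float) -> float:
    """Expected number of segments the classifier labels as adult speech."""
    detected = RECALL_ADULT * true_adult
    mislabeled = CHILD_TO_ADULT_CONFUSION * true_other_child
    return detected + mislabeled


def relative_effect(with_siblings: float, without_siblings: float) -> float:
    """Relative change in adult input associated with having siblings."""
    return (with_siblings - without_siblings) / without_siblings


true_effect = relative_effect(
    TRUE_COUNTS["siblings"]["adult"], TRUE_COUNTS["no_siblings"]["adult"]
)

observed_effect = relative_effect(
    observed_adult_count(**{
        "true_adult": TRUE_COUNTS["siblings"]["adult"],
        "true_other_child": TRUE_COUNTS["siblings"]["other_child"],
    }),
    observed_adult_count(**{
        "true_adult": TRUE_COUNTS["no_siblings"]["adult"],
        "true_other_child": TRUE_COUNTS["no_siblings"]["other_child"],
    }),
)

# Share of the true effect that is lost in the automated measurement.
underestimation = 1.0 - observed_effect / true_effect

print(f"true effect:      {true_effect:.3f}")
print(f"observed effect:  {observed_effect:.3f}")
print(f"underestimation:  {underestimation:.1%}")
```

Under these assumptions the observed effect keeps the correct sign but is noticeably smaller in magnitude than the true one. The reason is that children with siblings hear more speech from other children, so more of it is mislabeled as adult speech in that group. The error is therefore correlated with the predictor of interest, which is why a simple global correction factor would not be enough in this toy setting.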

Version published to 10.31234/osf.io/u925y_v1 on OSF Preprints
Jul 3, 2025

Related articles

Performance and biases of the LENA® and ACLEW algorithms in analyzing language environments in Down, Fragile X, Angelman syndromes, and populations at elevated likelihood for autism

This article has 4 authors:
1. Marvin Lavechin
2. Lisa R. Hamrick
3. Bridgette Kelleher
4. Amanda Seidl
This article has no evaluations. Latest version Jan 6, 2026

Cognition governs whether neural alpha indexes speech-in-noise outcomes

This article has 3 authors:
1. Julian Ockelmann
2. Stefan Elmer
3. Nathalie Giroud
This article has no evaluations. Latest version Jan 9, 2026

Characterizing the frequency and content of speech from other children in daylong recordings of infants’ input

This article has 3 authors:
1. Federica Bulgarelli
2. Marzie Samimifar
3. Giselle Yinjie Yao
This article has no evaluations. Latest version Dec 19, 2025
