Psychometric Validation of a Deep Learning-based Foreground Speech Detection Algorithm for Everyday Conversation Detection

Amanda Marie Bernal
Johannes Leonhard Klinz
Valeria Pfeifer
David Sbarra
Charles L Raison
Nicole Nugent
Rajat Hebbar
Shrikanth Narayanan
Matthias R. Mehl


Abstract

Passive ambient audio sampling bears great potential for objectively measuring daily social activity and its association with wellbeing and health. However, detecting true conversations via human annotation is labor-intensive, and automatic labeling via audio signal processing has thus far only received proof-of-concept validation. Here, we conduct a comprehensive psychometric validation of a deep learning-based foreground speech detection algorithm for conversation activity detection (CAD) from ambient audio sampled with the Electronically Activated Recorder (EAR) method (Hebbar et al., 2021). We assess the CAD algorithm’s validity as an objective measure of conversation activity using four archival EAR datasets with human ground-truth conversation annotations (N = 566 participants, n = 167,539 audio recordings). Specifically, we evaluate, across the four samples, the degree to which the CAD algorithm converges with human-annotated conversation activity, yields temporal stability estimates of conversation activity, and replicates patterns of external correlates (with demographic, wellbeing, and personality measures) comparable to those derived from human ground-truth annotations. We further compare the distributional properties of conversation activity derived from the CAD algorithm and human ground-truth annotations and use this information for thresholding the algorithm’s continuous conversation activity estimates. Overall, the CAD algorithm evidences strong psychometric properties for estimating conversation activity across a range of participants and study characteristics, suggesting that it is suitable for at-scale deployment to objectively measure daily socializing from passively sampled ambient audio.
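
The abstract describes two steps that a small example can make concrete: thresholding the algorithm's continuous conversation activity estimates, and checking how well the resulting person-level conversation rates converge with human annotations. The following is a minimal sketch only, not the authors' procedure. The toy data, the function names, and the base-rate-matching rule for choosing the cutoff are all illustrative assumptions; the abstract says only that distributional properties informed the thresholding.

```python
from statistics import mean

# Hypothetical toy data: per-participant algorithm scores (0..1) and human
# ground-truth labels (1 = conversation) for the same audio recordings.
toy = {
    "p1": {"scores": [0.9, 0.8, 0.2, 0.1], "human": [1, 1, 0, 0]},
    "p2": {"scores": [0.7, 0.3, 0.2, 0.1], "human": [1, 0, 0, 0]},
    "p3": {"scores": [0.95, 0.85, 0.75, 0.4], "human": [1, 1, 1, 0]},
}

def pick_cutoff(all_scores, base_rate):
    """Assumed rule: choose the cutoff so that the share of recordings
    flagged as conversation matches the human-annotated base rate.
    Tied scores at the cutoff can flag slightly more than that share."""
    ranked = sorted(all_scores, reverse=True)
    k = round(base_rate * len(ranked))
    return ranked[k - 1] if k > 0 else float("inf")

def rate(flags):
    """Proportion of a participant's recordings that contain conversation."""
    return sum(flags) / len(flags)

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Pool all recordings to get the human base rate and the matching cutoff.
all_scores = [s for p in toy.values() for s in p["scores"]]
all_human = [h for p in toy.values() for h in p["human"]]
cutoff = pick_cutoff(all_scores, rate(all_human))

# Person-level conversation rates from the thresholded scores and from humans.
algo_rates = [rate([1 if s >= cutoff else 0 for s in p["scores"]]) for p in toy.values()]
human_rates = [rate(p["human"]) for p in toy.values()]

# Convergent validity in this sketch: correlation of the two person-level rates.
r = pearson(algo_rates, human_rates)
print(cutoff, algo_rates, human_rates, r)
```

With real data, the same person-level rates could also be correlated across sampling days for temporal stability, or with external measures to compare correlate patterns, as the abstract outlines.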

Version published to 10.31234/osf.io/xca5j_v1 on OSF Preprints
Mar 6, 2026

Related articles

Real-Time Audio–Visual Emotion Detection for Human–AI Interaction Using a Cross-Modal Transformer

This article has 3 authors:
1. Nanda Gopal Malladi
2. Vrisheeka Mulakala
3. Deepa N
This article has no evaluations
Latest version Mar 12, 2026

Unsupervised Cross-Domain Adaptation for Wearable-based Human Activity Recognition

This article has 4 authors:
1. Indrajeet Ghosh
2. Garvit Chugh
3. Abu Zaher Md Fari
4. Nirmalya Roy
This article has no evaluations
Latest version Mar 12, 2026

Personalized Worker Physiological Load Assessment Using Multimodal Wearable PPG Analysis and Activity Recognition

This article has 3 authors:
1. Olena Litovska
2. Myroslav Mishchuk
3. Olena Pavliuk
This article has no evaluations
Latest version Mar 10, 2026
