Classifying Child-Directed Speech and Shared Book Reading from LENA: Language-Specific Modeling and Temporal Resolution Effects
Abstract
Language directed to children predicts early language development, motivating efforts to automate annotation in large-scale naturalistic corpora. Prior validation has focused on Western languages, leaving typologically distinct systems such as Korean underexplored. This study evaluated machine learning–based classification of caregiver–child interactional contexts in daylong Korean recordings, addressing three research objectives: examining the relative performance of cross-linguistic transfer versus language-specific training; evaluating the impact of 1-minute versus 5-minute temporal resolution in training; and exploring the automatic detection of shared book reading (BR), a low-frequency but developmentally important subtype of child-directed speech. A model pretrained on English and Spanish recordings generalized poorly to Korean data, whereas training the same model architecture on Korean recordings substantially improved performance. This indicates that patterns captured in LENA-derived acoustic and conversational features may not be readily portable across languages without adaptation. Automated detection of shared book reading showed moderate reliability, likely reflecting its sparse distribution in naturalistic data, though classification performance indicated strong discriminative ability. These findings support the feasibility of scalable, automated analysis of early language environments in a non-Western context and highlight the importance of language-specific training for extending automated approaches across diverse linguistic and cultural contexts.