Applying a Transformer-based machine-learning model to classify caregiver and infant behaviours during dyadic interactions.

Alexander Turner
Aly Magassouba
Sobanawartiny Wijeakumar


Abstract

Multimodal caregiver-infant interactions have both concurrent and long-term impacts on children's attention, cognitive and social skills. These multimodal behaviours are typically coded manually by human researchers, which makes the approach susceptible to observer bias, dependent on inter-rater reliability, and demanding of time and resources. In this study, we aimed to develop a multimodal machine-learning model capable of automatically detecting and classifying multimodal behaviours from video recordings of caregivers and their infants (N = 81; infant mean age = 251.3 ± 34.9 days) engaging with objects. We focused on four behaviours: caregiver scaffolding, caregiver intrusiveness, infant object engagement and infant distractibility. Low-level features from audio, video, and pose data were extracted using specific AI models and input into a Transformer-based architecture capable of learning temporal patterns across modalities. Our findings revealed a significant contrast in model performance depending on how the data were partitioned. When the dataset was split such that data from all dyads contributed to the training, validation, and test sets, the models achieved a classification accuracy of over 98%. However, under 5-fold cross-validation with dyad-level separation, which ensured that test-set dyads were entirely unseen during training and validation, performance dropped markedly to ~55%. These results suggest that the models did not learn the behaviours of interest but instead relied on video-specific or dyad-specific details. This work lays a foundation for future research aimed at refining these models and extending their applicability across diverse caregiving contexts.
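
The dyad-level separation described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `dyad_level_folds`, the round-robin assignment of dyads to folds, and the figure of 10 segments per dyad are assumptions made purely for illustration. The sketch only shows the train/test partition; a validation set would be carved out of the training dyads in the same way.

```python
import random


def dyad_level_folds(dyad_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no dyad appears on both sides.

    dyad_ids[i] is the dyad that sample (e.g. video segment) i came from.
    """
    unique_dyads = sorted(set(dyad_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_dyads)
    # Assign whole dyads (not individual samples) to folds, round-robin.
    fold_of = {dyad: i % n_folds for i, dyad in enumerate(unique_dyads)}
    for k in range(n_folds):
        test_idx = [i for i, d in enumerate(dyad_ids) if fold_of[d] == k]
        train_idx = [i for i, d in enumerate(dyad_ids) if fold_of[d] != k]
        yield train_idx, test_idx


# Hypothetical data: 81 dyads with 10 labelled segments each (assumed count).
dyad_ids = [dyad for dyad in range(81) for _ in range(10)]
folds = list(dyad_level_folds(dyad_ids))

for train_idx, test_idx in folds:
    train_dyads = {dyad_ids[i] for i in train_idx}
    test_dyads = {dyad_ids[i] for i in test_idx}
    # Every test dyad is entirely unseen during training.
    assert train_dyads.isdisjoint(test_dyads)
```

By contrast, shuffling individual segments before splitting would place segments from the same dyad in both training and test sets, which is the condition under which the abstract reports accuracy above 98%.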

Version published to 10.31234/osf.io/wrv9s_v1 on OSF Preprints
Sep 25, 2025

Related articles

Caregiver-infant behaviours during multi-object play are associated with infant visual working memory.

This article has 3 authors:
1. Sobanawartiny Wijeakumar
2. Christina Davidson
3. Aimee Theyer
This article has no evaluations
Latest version Sep 24, 2025
Caregivers’ multimodal actions scaffold word learning and vocabulary growth in the early years

This article has 5 authors:
1. Antonia Jordan-Barros
2. Francesco Cabiddu
3. Ed Donnellan
4. Yan Gu
5. Gabriella Vigliocco
This article has no evaluations
Latest version Nov 19, 2025
CrySenseNet: A Deep Learning-Based Acoustic Intelligence System for Decoding Infant Cries

This article has 7 authors:
1. Krishna S
2. Anushka B R
3. Swetha Saju
4. Amrutha K V
5. Devika S Babu
6. Sishu Shankar Muni
7. Swetha P
This article has no evaluations
Latest version Sep 29, 2025
