Applying a Transformer-based machine-learning model to classify caregiver and infant behaviours during dyadic interactions.

Abstract

Multimodal caregiver-infant interactions have both concurrent and long-term impacts on children's attention, cognitive skills, and social skills. These multimodal behaviours are manually coded by human researchers, which makes the approach susceptible to observer bias, dependent on inter-rater reliability, and demanding of substantial time and resources. In this study, we aimed to develop a multimodal machine-learning model capable of automatically detecting and classifying multimodal behaviours from video recordings of caregivers and their infants (N = 81; infant mean age = 251.3 ± 34.9 days) engaging with objects. We focused on four behaviours: caregiver scaffolding, caregiver intrusiveness, infant object engagement, and infant distractibility. Low-level features from audio, video, and pose data were extracted using specific AI models and input into a Transformer-based architecture capable of learning temporal patterns across modalities. Our findings revealed a marked contrast in model performance depending on how the data were partitioned. When the dataset was split such that data from all dyads contributed to the training, validation, and test sets, the models achieved a classification accuracy of over 98%. However, under 5-fold cross-validation with dyad-level separation, which ensured that test-set dyads were entirely unseen during training and validation, performance dropped to approximately 55%. These results suggest that the models did not learn the behaviours of interest but instead relied on video-specific or dyad-specific details. This work lays a foundation for future research aimed at refining these models and extending their applicability across diverse caregiving contexts.
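
The difference between the two partitioning schemes can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `dyad_level_folds` and `shared_dyads`, the toy data (10 segments per dyad), and the 80/20 random split are assumptions made for illustration only. It shows how a dyad-level 5-fold split keeps every test dyad unseen during training, whereas a segment-level random split lets the same dyads appear on both sides.

```python
import random
from collections import defaultdict


def dyad_level_folds(dyad_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no dyad is on both sides.

    dyad_ids[i] is the dyad that segment i was recorded from.
    """
    by_dyad = defaultdict(list)
    for idx, dyad in enumerate(dyad_ids):
        by_dyad[dyad].append(idx)

    dyads = sorted(by_dyad)
    random.Random(seed).shuffle(dyads)

    for k in range(n_folds):
        # Every n_folds-th dyad (starting at k) is held out for this fold.
        test_dyads = set(dyads[k::n_folds])
        train_idx = [i for d in dyads if d not in test_dyads for i in by_dyad[d]]
        test_idx = [i for d in dyads if d in test_dyads for i in by_dyad[d]]
        yield train_idx, test_idx


def shared_dyads(dyad_ids, train_idx, test_idx):
    """Return the set of dyads that contribute segments to both splits."""
    return {dyad_ids[i] for i in train_idx} & {dyad_ids[i] for i in test_idx}


# Hypothetical toy data: 81 dyads with 10 segments each.
dyad_ids = [d for d in range(81) for _ in range(10)]

# Dyad-level split: test dyads never appear in training.
folds = list(dyad_level_folds(dyad_ids))
leaks_grouped = [len(shared_dyads(dyad_ids, tr, te)) for tr, te in folds]

# Segment-level random split (assumed 80/20): dyads appear on both sides.
indices = list(range(len(dyad_ids)))
random.Random(0).shuffle(indices)
cut = int(0.8 * len(indices))
leaks_random = len(shared_dyads(dyad_ids, indices[:cut], indices[cut:]))
```

In this sketch `leaks_grouped` is zero for every fold by construction, while `leaks_random` counts the dyads that leak across a segment-level split; a model evaluated on the latter can score well by recognising dyad-specific details rather than the behaviours themselves.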
