Detecting Cry in Daylong Audio Recordings using Machine Learning: The Development and Evaluation of Binary Classifiers
Abstract
Cry signals distress in infants and children and may serve as an early, ecologically valid, and scalable indicator of irritability, a transdiagnostic mental health risk marker. Machine learning may allow researchers to identify cry (amid other vocalizations and environmental noise) in daylong audio recordings to predict neurodevelopmental outcomes. We (1) re-implemented an existing cry detection algorithm and developed a novel one, and (2) evaluated cry detection performance relative to ground truth (i.e., cry annotated by trained raters). In PyTorch, we re-implemented a support vector machine classifier that uses acoustic features (e.g., chroma) and deep spectral features from a modified AlexNet. We developed a novel classifier combining Meta’s wav2vec 2.0 with conventional audio features and gradient boosting machines. Both classifiers were trained and evaluated on an open-source dataset (N=21) previously annotated for cry. In a new dataset (N=100), we annotated cry and examined each classifier’s performance in identifying this ground truth. The existing and novel algorithms performed well in identifying ground-truth cry both in the dataset in which they were developed (AUCs=0.897 and 0.936, respectively) and in the new dataset (AUCs=0.841 and 0.902), underscoring generalization to unseen data. Bayesian comparison demonstrated that the novel algorithm outperformed the existing one in identifying cry in both datasets; we attribute the improvement in noisy, natural environments to the novel algorithm’s richer feature space and use of gradient boosting machines. This research provides a foundation for efficient cry detection in neurodevelopmental studies, with implications for early identification of irritability and psychopathology and for intervention.
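The novel classifier described above (self-supervised embeddings concatenated with conventional audio features, fed to a gradient boosting machine, evaluated by AUC) can be sketched in outline. This is not the authors' implementation: the random arrays below are stand-ins for real wav2vec 2.0 embeddings and acoustic features, the feature dimensions and synthetic labels are illustrative assumptions, and scikit-learn's `GradientBoostingClassifier` stands in for whatever gradient boosting library the paper used.

```python
# Hedged sketch of the pipeline shape only, not the paper's method.
# Stand-in random features replace real wav2vec 2.0 embeddings and
# conventional acoustic features (e.g., chroma, MFCCs).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400                                   # hypothetical number of audio segments
deep = rng.normal(size=(n, 32))           # stand-in for wav2vec 2.0 embeddings
conv = rng.normal(size=(n, 8))            # stand-in for conventional features
X = np.hstack([deep, conv])               # concatenated feature space

# Synthetic labels weakly tied to the features so the classifier has signal.
y = (X[:, 0] + 0.5 * X[:, 32] + rng.normal(size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Segment-level AUC against the (synthetic) ground-truth cry labels.
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"AUC = {auc:.3f}")
```

In the paper's setting, the labels would come from trained raters' cry annotations and the features from real daylong audio; the sketch only shows how the two feature families combine before boosting and how AUC scores the binary cry/non-cry decision.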