The Sleep-Wake Classification Performance of Pediatric-Trained Machine Learning Algorithms for Raw Accelerometer Data

Pin-Wei Chen
Christopher Cielo
Olivia Walch
Morgan McDonald
Peter X.K. Song
Cathy Goldstein
Jennette P. Moreno
Erica C. Jansen
Jonathan A. Mitchell

Abstract

Introduction

Actigraphy sleep-wake classification methods increasingly seek to leverage raw acceleration data and machine-learning-based classification, but performance evaluation in pediatrics is limited. We trained machine-learning models using pediatric data and compared their sleep-wake classification performance with existing algorithms for children.

Methods

Sixty-five children (46% female, ages 5.3–17.7 years) completed in-lab overnight polysomnography and wore a GENEActiv device on their non-dominant wrist. The acceleration data were converted into 30-second epochs and aligned with physician-scored sleep-wake data from electroencephalography. Seven machine-learning models were trained using leave-one-subject-out cross-validation. Epoch-by-epoch analyses generated performance metrics (e.g., balanced accuracy [BA]), and discrepancy analyses provided overall sleep duration bias estimates. Models were ranked by Euclidean distance scores that combined performance and bias, where a lower score represents performance closer to perfect and bias closer to zero. For benchmarking, we included GGIR sleep scoring algorithms and an adult-trained random forest classifier.
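The leave-one-subject-out splitting and the distance-based ranking described above can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, the metrics entering the score, or how bias was scaled, so `distance_score`, its `bias_scale_minutes` parameter, and the toy model names and values below are all hypothetical assumptions, not the authors' implementation.

```python
from math import sqrt


def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx) once per unique subject.

    subject_ids has one entry per epoch, naming the child it belongs to.
    All epochs from the held-out child go to the test set, so no child
    contributes data to both training and testing within a fold.
    """
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


def distance_score(balanced_accuracy, bias_minutes, bias_scale_minutes=60.0):
    """Euclidean distance from the ideal point (BA = 1, bias = 0).

    ASSUMPTION: bias is divided by bias_scale_minutes so that both axes are
    on a comparable scale. The source does not state its normalisation.
    """
    performance_gap = 1.0 - balanced_accuracy
    scaled_bias = bias_minutes / bias_scale_minutes
    return sqrt(performance_gap ** 2 + scaled_bias ** 2)


def rank_models(results):
    """Return model names sorted best-first (lowest distance score first).

    results maps a model name to (balanced_accuracy, bias_minutes).
    """
    return sorted(results, key=lambda name: distance_score(*results[name]))


# Toy data: six epochs from three hypothetical children.
epoch_subjects = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(leave_one_subject_out(epoch_subjects))

# Toy, made-up results for three hypothetical models.
toy_results = {
    "model_a": (0.90, -10.0),
    "model_b": (0.80, -5.0),
    "model_c": (0.70, 40.0),
}
ranking = rank_models(toy_results)
```

With 65 children, this splitting scheme would produce 65 folds. The ranking depends on the assumed scaling: a different `bias_scale_minutes` changes how much weight bias receives relative to balanced accuracy.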

Results

Overall, 560.1 hours of polysomnography and actigraphy data were collected (74.4% of epochs were scored as sleep). The pediatric-trained local-global long short-term memory (LSTM) classifier had the best epoch-by-epoch performance (e.g., BA=0.85, sensitivity=0.88, specificity=0.83, ROC-AUC=0.95, and Cohen’s kappa=0.67). These metrics exceeded those of an adult-trained random forest classifier and GGIR-based algorithms. Discrepancy analyses revealed that the LSTM classifier underestimated overall sleep duration by an average of 25 minutes, with no proportional bias.
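For readers unfamiliar with these metrics, the following Python sketch shows how sensitivity, specificity, balanced accuracy, Cohen's kappa, and a sleep duration bias can be computed from epoch-by-epoch labels. It assumes sleep is the positive class (1 = sleep, 0 = wake), which is the usual actigraphy convention but is not stated in the abstract. The function name `epoch_metrics` and the toy labels are hypothetical.

```python
def epoch_metrics(true_labels, predicted_labels, epoch_seconds=30):
    """Compute epoch-by-epoch agreement metrics for one recording.

    Labels are 1 for sleep and 0 for wake (ASSUMED convention).
    Both classes must be present in true_labels, and the two label
    sequences must not be in perfect chance agreement, or a division
    by zero occurs.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sleep scored as sleep
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # wake scored as wake
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # wake scored as sleep
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sleep scored as wake
    n = len(pairs)

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)

    # Sleep duration bias: predicted minus reference sleep time, in minutes.
    # A negative value means the classifier underestimates sleep.
    bias_minutes = (tp + fp - (tp + fn)) * epoch_seconds / 60

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "kappa": kappa,
        "bias_minutes": bias_minutes,
    }


# Toy example: ten 30-second epochs (made-up labels).
reference = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
metrics = epoch_metrics(reference, predicted)
```

In a discrepancy analysis, a bias like this would typically be computed per participant and then averaged, and proportional bias would be assessed by checking whether the discrepancy varies with the amount of reference sleep.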

Conclusion

We trained seven pediatric sleep-wake classifiers that showed strong ability to detect sleep and wake, with the LSTM classifier performing best.

Version published to 10.64898/2026.05.28.26354364 on medRxiv
Jun 1, 2026

