Wearable sleep staging using photoplethysmography and accelerometry across sleep apnea severity: a focus on very severe sleep apnea

Sho Ogaki
Michiru Kaneda
Tomoyuki Nohara
Syuhei Fujita
Naoshi Osako
Tomoko Yagi
Yasuhiro Tomita
Takanori Ogata


Abstract

Study Objectives

To evaluate wearable sleep staging across sleep apnea severity, including very severe sleep apnea defined as an apnea–hypopnea index (AHI) ≥ 50 events/h, and to assess how training-set composition affects performance in this subgroup.

Methods

We analyzed 552 overnight recordings: 318 from the Sleep Lab Dataset and 234 from the Hospital Dataset, 26.5% of which (N=62) had very severe sleep apnea. A deep learning model performed sleep staging from photoplethysmography-derived RR intervals and accelerometry recorded by a wrist-worn device. Baseline performance was assessed by 4-fold cross-validation using randomly partitioned folds from the combined datasets. We examined night-level associations between staging performance and AHI severity. We also compared the baseline model with an ablation model trained on the same number of recordings, comprising all Sleep Lab Dataset recordings plus lower-AHI Hospital Dataset recordings, and evaluated both models in the very severe subgroup.
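The severity grouping and random fold partitioning described here can be illustrated with a minimal sketch. Only the very severe threshold (AHI ≥ 50 events/h) and the fold count (4) come from the abstract; the 5, 15, and 30 events/h cutoffs are the conventional adult categories and are assumed here, and the function names `ahi_severity` and `random_folds` are hypothetical, not the authors' code.

```python
import random


def ahi_severity(ahi: float) -> str:
    """Map an apnea-hypopnea index (events/h) to a severity label.

    The >= 50 cutoff for "very severe" is taken from the abstract.
    The 5 / 15 / 30 cutoffs are the conventional adult categories and
    are an assumption; the abstract does not state them.
    """
    if ahi >= 50:
        return "very severe"
    if ahi >= 30:
        return "severe"
    if ahi >= 15:
        return "moderate"
    if ahi >= 5:
        return "mild"
    return "normal"


def random_folds(recording_ids, k=4, seed=0):
    """Randomly partition recording IDs into k disjoint folds.

    Each recording lands in exactly one fold, so every night is used
    for testing exactly once across the k cross-validation rounds.
    """
    ids = list(recording_ids)
    random.Random(seed).shuffle(ids)
    # Striding over the shuffled list gives folds that differ in size by at most 1.
    return [ids[i::k] for i in range(k)]


# 552 placeholder recording IDs, matching the combined dataset size.
folds = random_folds(range(552), k=4)
```

Partitioning at the recording level, as sketched, keeps all epochs of one night in the same fold. Whether the authors also grouped by participant is not stated in the abstract.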

Results

For 5-stage classification, Cohen’s kappa was 0.586 in the Sleep Lab Dataset and 0.446 in the Hospital Dataset. Under 4-stage staging, the gap narrowed, with kappa values of 0.632 and 0.525, respectively. In the Hospital Dataset, kappa declined with AHI severity, with median kappa differing by about 0.2 between mild and very severe groups. In the very severe subgroup, kappa decreased from 0.365 (baseline) to 0.303 (ablation).
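For readers unfamiliar with the metric, the following sketch computes Cohen's kappa on a toy night and shows how merging stages can raise it. It assumes that the 4-stage scheme merges N1 and N2 into a single light-sleep class, which is a common convention but is not stated in the abstract. The epoch labels are made up for illustration and are not study data.

```python
from collections import Counter

# Assumed 5-to-4 stage mapping (N1 and N2 merged); not confirmed by the abstract.
FIVE_TO_FOUR = {"W": "W", "N1": "Light", "N2": "Light", "N3": "Deep", "REM": "REM"}


def cohen_kappa(reference, predicted):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) over paired epoch labels."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(reference)
    # Observed agreement: fraction of epochs where both labels match.
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of the marginal label frequencies, summed over classes.
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    p_e = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when only one class occurs
    return (p_o - p_e) / (1.0 - p_e)


# Toy epoch labels in which the only errors are N1/N2 confusions.
reference = ["W", "N1", "N2", "N2", "N3", "REM", "N1", "N2"]
predicted = ["W", "N2", "N2", "N1", "N3", "REM", "N1", "N2"]

kappa_5 = cohen_kappa(reference, predicted)
kappa_4 = cohen_kappa(
    [FIVE_TO_FOUR[s] for s in reference],
    [FIVE_TO_FOUR[s] for s in predicted],
)
```

In this toy case every disagreement falls between the two merged stages, so the 4-stage kappa is higher than the 5-stage kappa. The reported narrowing of the gap between cohorts is consistent with this kind of effect, although the abstract does not break errors down by stage.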

Conclusions

Wearable sleep staging performance tended to decline with greater sleep apnea severity. Clinical utility may benefit from training data spanning the target severity spectrum and staging granularity matched to the intended use.

Statement of Significance

Repeated laboratory polysomnography is impractical for long-term sleep apnea management. Wearable sleep staging could support scalable monitoring, yet its reliability in clinically severe sleep apnea has remained unclear. This study developed and evaluated a wearable sleep staging approach in both sleep-laboratory and hospital cohorts. The hospital cohort included many severe and very severe cases. Performance was lower in the hospital cohort and declined with greater sleep apnea severity. A coarser staging scheme reduced the gap between cohorts, and models trained without representative very severe cases performed worse in this target population. These findings highlight the value of severity-aware model development and motivate future multi-night home validation with reliability cues.

Version published to 10.64898/2026.04.09.26350266 on medRxiv
Apr 10, 2026

