Wearable sleep staging using photoplethysmography and accelerometry across sleep apnea severity: a focus on very severe sleep apnea

Abstract

Study Objectives

To evaluate wearable sleep staging across sleep apnea severity, including very severe sleep apnea, defined by an apnea–hypopnea index (AHI) threshold of 50 events/h, and to assess how training-set composition affects performance in this subgroup.

Methods

We analyzed 552 overnight recordings: 318 from the Sleep Lab Dataset and 234 from the Hospital Dataset; 26.5% (N=62) of the Hospital Dataset recordings had very severe sleep apnea. A deep learning model performed sleep staging from photoplethysmography-derived RR intervals and accelerometry recorded by a wrist-worn device. Baseline performance was assessed by 4-fold cross-validation using randomly partitioned folds from the combined datasets. We examined night-level associations between staging performance and AHI severity. We also compared the baseline model with an ablation model trained on the same number of recordings but composed of all Sleep Lab Dataset recordings plus lower-AHI Hospital Dataset recordings, evaluating both models in the very severe subgroup.
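
The random 4-fold partitioning described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the recording IDs, the seed, and the helper names `random_folds` and `cross_validation_splits` are hypothetical, and the abstract does not state whether folds were stratified or grouped by participant.

```python
import random


def random_folds(recording_ids, k=4, seed=0):
    """Shuffle recording IDs and deal them into k near-equal folds."""
    ids = list(recording_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility (assumption)
    return [ids[i::k] for i in range(k)]


def cross_validation_splits(recording_ids, k=4, seed=0):
    """Yield (train_ids, test_ids) pairs, one per held-out fold."""
    folds = random_folds(recording_ids, k, seed)
    for i, test_ids in enumerate(folds):
        train_ids = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train_ids, test_ids


# Hypothetical IDs matching the reported dataset sizes (318 + 234 = 552).
all_ids = [f"lab_{i}" for i in range(318)] + [f"hosp_{i}" for i in range(234)]
splits = list(cross_validation_splits(all_ids, k=4, seed=0))
```

Each recording appears in exactly one test fold, so every night receives a prediction from a model that did not see it during training.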

Results

For 5-stage classification, Cohen’s kappa was 0.586 in the Sleep Lab Dataset and 0.446 in the Hospital Dataset. Under 4-stage staging, the gap narrowed, with kappa values of 0.632 and 0.525, respectively. In the Hospital Dataset, kappa declined with AHI severity, with median kappa differing by about 0.2 between mild and very severe groups. In the very severe subgroup, kappa decreased from 0.365 (baseline) to 0.303 (ablation).
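
For readers unfamiliar with the metric, the sketch below computes Cohen's kappa from epoch-level labels and shows how merging stages can raise agreement. The 5-to-4 mapping used here (N1 and N2 merged into a single "Light" stage) is a common convention and an assumption; the abstract does not specify the mapping. The stage labels and example sequences are illustrative, not study data.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) over paired epoch labels."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    # Chance agreement from the marginal label frequencies of each rater.
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when both raters use one label
    return (observed - expected) / (1.0 - expected)


# Assumed 5-stage to 4-stage mapping: N1 and N2 merged into "Light".
FIVE_TO_FOUR = {"W": "W", "N1": "Light", "N2": "Light", "N3": "Deep", "REM": "REM"}


def collapse(stages):
    """Map 5-stage labels onto the assumed 4-stage scheme."""
    return [FIVE_TO_FOUR[s] for s in stages]


# Illustrative epochs only.
reference = ["W", "N1", "N2", "N2", "N3", "REM", "REM", "W"]
predicted = ["W", "N2", "N2", "N1", "N3", "REM", "N2", "W"]

kappa_5 = cohens_kappa(reference, predicted)
kappa_4 = cohens_kappa(collapse(reference), collapse(predicted))
```

In this toy example the N1/N2 confusions stop counting as errors after the merge, so the 4-stage kappa is higher than the 5-stage kappa, which is the same direction of effect reported above.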

Conclusions

Wearable sleep staging performance tended to decline with increasing sleep apnea severity. Clinical utility may benefit from training data spanning the target severity spectrum and staging granularity matched to the intended use.

Statement of Significance

Repeated laboratory polysomnography is impractical for long-term sleep apnea management. Wearable sleep staging could support scalable monitoring, yet its reliability in clinically severe sleep apnea has remained unclear. This study developed and evaluated a wearable sleep staging approach in both sleep-laboratory and hospital cohorts. The hospital cohort included many severe and very severe cases. Performance was lower in the hospital cohort and declined with greater sleep apnea severity. A coarser staging scheme reduced the gap between cohorts, and models trained without representative very severe cases performed worse in this target population. These findings highlight the value of severity-aware model development and motivate future multi-night home validation with reliability cues.
