A Reproducible Pipeline for Processing Commercial Wearable Step-Count Data in Aging Cohorts: Application and Evaluation in the STRRIDE-PD Reunion Study
Abstract
Wearable devices can objectively characterize free-living physical activity; however, raw step-count data generated by commercial devices require systematic processing before they can support rigorous inference. We describe a transparent, reproducible standard operating procedure (SOP) for transforming epoch-level step-count data from commercial Garmin devices into participant-level analytic variables, and we demonstrate its application in the STRRIDE-PD Reunion study, a long-term follow-up of older adults originally enrolled in a supervised exercise intervention trial. The pipeline standardizes timestamps, reconstructs daily epoch grids, infers wear time from observed step patterns, and applies a prespecified valid-day threshold (≥10 hours of inferred wear time) to generate participant-level summaries. Among 67 participants (mean age 71.4 years, 65.7% women), the median number of valid days was 10 and median average daily steps were 5,794; participant-level estimates were identical under the ≥10-hour and ≥6-hour valid-day thresholds. Wearable-derived step counts were significantly associated with 9 of 16 cardiometabolic and fitness outcomes, including cardiorespiratory fitness, body composition, and lipid profiles. By contrast, self-reported exercise, assessed via a frequency-by-duration composite ranked into deciles, was not significantly associated with any outcome. A regression calibration framework applied to the full sample quantified the attenuation underlying this discrepancy: the naive self-report model systematically underestimated associations relative to both the observed Garmin model and the calibration-corrected estimates. These findings demonstrate that the measurement approach is a determinant of scientific conclusions in physical activity research, and that reproducible wearable data pipelines are essential infrastructure for aging epidemiology.
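The processing steps summarized above (timestamp standardization, daily epoch-grid reconstruction, wear-time inference, valid-day filtering, participant-level summary) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's SOP: the 15-minute epoch length, the rule that treats runs of ≥60 minutes of zero steps as non-wear, the filling of missing epochs with zero steps, the treatment of naive timestamps as UTC, and all function names are hypothetical. Only the ≥10-hour valid-day threshold (and the ≥6-hour sensitivity threshold) comes from the study description.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean

# Hypothetical parameters for illustration only.
EPOCH_MIN = 15        # assumed epoch length (minutes)
NONWEAR_RUN_MIN = 60  # assumed: zero-step runs lasting >= 60 min are non-wear
EPOCHS_PER_DAY = 24 * 60 // EPOCH_MIN


def standardize(records, utc_offset_hours=0):
    """Parse ISO-8601 timestamps and convert them to a single local time zone."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    out = []
    for ts, steps in records:
        dt = datetime.fromisoformat(ts)
        if dt.tzinfo is None:  # assumption: naive timestamps are UTC
            dt = dt.replace(tzinfo=timezone.utc)
        out.append((dt.astimezone(tz), steps))
    return out


def daily_grids(records):
    """Rebuild a complete epoch grid per calendar day.
    Epochs with no record are filled with 0 steps (an assumption)."""
    grids = defaultdict(lambda: [0] * EPOCHS_PER_DAY)
    for dt, steps in records:
        idx = (dt.hour * 60 + dt.minute) // EPOCH_MIN
        grids[dt.date()][idx] += steps
    return grids


def wear_minutes(grid):
    """Infer wear time: epochs inside a zero-step run lasting at least
    NONWEAR_RUN_MIN are non-wear; all other epochs count as worn."""
    min_run = NONWEAR_RUN_MIN // EPOCH_MIN
    worn_epochs, run = 0, 0
    for steps in grid + [1]:  # sentinel nonzero epoch closes a trailing run
        if steps == 0:
            run += 1
            continue
        if run < min_run:  # short pause: assume the device was still worn
            worn_epochs += run
        run = 0
        worn_epochs += 1
    return (worn_epochs - 1) * EPOCH_MIN  # exclude the sentinel epoch


def summarize(records, valid_hours=10, utc_offset_hours=0):
    """Participant-level summary restricted to valid days."""
    grids = daily_grids(standardize(records, utc_offset_hours))
    valid_totals = [sum(g) for g in grids.values()
                    if wear_minutes(g) >= valid_hours * 60]
    return {
        "valid_days": len(valid_totals),
        "mean_daily_steps": mean(valid_totals) if valid_totals else None,
    }
```

Calling `summarize(records, valid_hours=6)` reruns the same summary under the ≥6-hour sensitivity threshold. Inferring wear from zero-step runs is one common heuristic when a device exports no wear flag; the run length is a tunable assumption rather than a fixed rule.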
Highlights
- A reproducible standard operating procedure processed Garmin step-count data that lack explicit wear-time indicators.
- Wearable-derived steps were significantly associated with 9 of 16 outcomes; self-reported exercise was associated with none.
- Regression calibration revealed that self-report systematically underestimated physical activity–health associations.
- Measurement approach determines physical activity–health conclusions in aging cohort research.
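The attenuation that regression calibration corrects can be illustrated with a minimal single-predictor sketch in Python. It assumes, for illustration only, that the self-report is on the same scale as device steps and carries classical additive error, and it treats the device measure as the reference exposure. The study's decile-ranked composite, covariate adjustment, and outcome models are not reproduced, and the function names are hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope from the simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def regression_calibration(w, x, y):
    """w: error-prone self-report, x: device-measured steps, y: outcome.

    Fits the calibration model E[X | W] = a + lam * W, then regresses the
    outcome on the calibrated exposure. In this single-predictor case the
    calibrated slope equals the naive slope divided by lam.
    """
    lam = ols_slope(w, x)
    mw, mx = mean(w), mean(x)
    x_hat = [mx + lam * (wi - mw) for wi in w]  # predicted X given W
    return {
        "lambda": lam,
        "naive": ols_slope(w, y),           # outcome on self-report
        "calibrated": ols_slope(x_hat, y),  # outcome on E[X | W]
        "reference": ols_slope(x, y),       # outcome on device steps
    }
```

Under classical error, `lam` estimates the reliability ratio var(X)/var(W), so the naive slope is shrunk toward zero by that factor and dividing by `lam` undoes the shrinkage. With additional covariates, the calibration model would include them as predictors as well.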