Population-Scale Developmental Risk Screening from Retail Transaction Data Using Privacy-Preserving Federated Learning

Abstract

Developmental delays adversely affect children's long-term outcomes. About one in six children has a developmental delay, yet most are identified later than is optimal for intervention. This lag stems from reliance on clinic-based screening and uneven access to pediatric expertise. We propose RetailHealth, a privacy-preserving digital phenotyping framework that leverages household retail purchasing patterns as an ambient behavioral signal for early estimation of developmental risk. The system integrates a clinician-aligned taxonomy (DevelopMap), temporal sequence models, and a federated learning protocol with formal differential privacy guarantees (ε < 1.0, δ = 10⁻⁵) to enable multi-retailer collaboration without sharing raw customer data. On a behaviorally grounded synthetic cohort of 100,000 families, RetailHealth achieves AUROC values of 0.79 for language delay, 0.74 for motor delay, 0.84 for autism spectrum indicators, and 0.77 for ADHD indicators. The system identifies elevated developmental risk 8 to 14 months before typical clinical diagnosis, with a 14.2-month lead time for autism spectrum indicators. Fairness analyses show equitable performance across income and geographic strata, with true positive rate gaps under 0.05. Privacy-utility trade-off experiments show at least 95% utility retention at ε = 0.5. Expert review by developmental clinicians yielded strong concordance (r = 0.76, p < 0.001) between model signals and clinical judgement. These results suggest that retail purchasing behavior can serve as a passive, scalable, and privacy-safe digital biomarker for developmental risk estimation, offering a pathway to broad, low-burden pediatric screening at population scale.
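
The fairness criterion quoted in the abstract, a true positive rate gap under 0.05 across strata, can be illustrated with a minimal sketch. The abstract does not say how the gap was computed, so the definition used here (largest minus smallest per-group TPR) is an assumption, and the function names, group labels, and toy data are all hypothetical.

```python
from collections import defaultdict


def tpr_by_group(y_true, y_pred, groups):
    """Return {group: true positive rate}, using only rows whose true label is 1."""
    hits = defaultdict(int)       # correctly flagged positives per group
    positives = defaultdict(int)  # all true positives per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            hits[group] += int(pred == 1)
    return {g: hits[g] / positives[g] for g in positives}


def tpr_gap(y_true, y_pred, groups):
    """Assumed gap definition: max per-group TPR minus min per-group TPR."""
    rates = tpr_by_group(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: two income strata, five families each.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0, 0, 1]
groups = ["low"] * 5 + ["high"] * 5

gap = tpr_gap(y_true, y_pred, groups)
print(f"TPR gap: {gap:.2f}, within 0.05 threshold: {gap < 0.05}")
```

On this toy data the gap is far above 0.05, so the check fails; a model meeting the reported criterion would show nearly equal detection rates for truly at-risk children in every stratum.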
