VALORIS: One-shot and lossless vertical logistic regression for privacy-protecting multi-site health analytics
Abstract
Health analytics increasingly relies on variables held by different entities, such as clinical, laboratory, environmental, and genomic data. Due to legal, ethical, and social acceptability constraints, these vertically partitioned data often cannot be shared across the organizations that hold them. Conducting statistical analyses in such settings requires methods that protect privacy. We introduce VALORIS (Vertically partitioned Analytics under the LOgistic Regression model for Inference in Statistics), a novel method that enables lossless statistical inference (equivalent to the pooled analysis) under a logistic regression model without disclosing any individual-level data, including the outcome variable. VALORIS is a practical, one-shot algorithm that requires no third-party coordinator. We mathematically assess the privacy-preserving properties of VALORIS and provide a privacy-aware, setting-dependent framework to ensure individual-data privacy. We demonstrate the accuracy and feasibility of VALORIS by investigating potential factors associated with kidney failure among pediatric patients with chronic kidney disease, using real health data from Necker–Enfants Malades Hospital. We further validate the proposed algorithm on a larger scale with a reproducible application using the MIMIC-IV database.
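To make the setting concrete, the following minimal sketch illustrates what "vertically partitioned" and "equivalent to the pooled analysis" mean. It is not the VALORIS algorithm, which the abstract does not specify, and it shares data freely between the two blocks, which a privacy-preserving method would not do. Everything in it is an assumption made for illustration: the data are simulated, the two sites (`X_a`, `X_b`), the coefficient values, and the helper `fit_logistic_newton` are hypothetical. The sketch fits the pooled maximum-likelihood estimate that a lossless method would be expected to reproduce, and shows that the linear predictor splits into one contribution per site.

```python
import numpy as np


def fit_logistic_newton(X, y, n_iter=25, tol=1e-10):
    """Pooled maximum-likelihood logistic regression via Newton-Raphson.

    Hypothetical helper for illustration only; it needs all columns in
    one place, which is exactly what vertical partitioning forbids.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        grad = X.T @ (y - p)                    # score vector
        w = p * (1.0 - p)                       # Bernoulli variances
        hess = X.T @ (X * w[:, None])           # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated cohort: the same n patients, with columns split across two sites.
rng = np.random.default_rng(0)
n = 500
X_a = np.column_stack([np.ones(n), rng.normal(size=n)])  # site A: intercept + 1 covariate
X_b = rng.normal(size=(n, 2))                            # site B: 2 covariates
true_beta = np.array([-0.5, 1.0, -0.8, 0.3])             # assumed values

X_pooled = np.hstack([X_a, X_b])                         # the pooled design matrix
prob = 1.0 / (1.0 + np.exp(-(X_pooled @ true_beta)))
y = rng.binomial(1, prob).astype(float)                  # binary outcome

# Reference (pooled) estimate that a lossless method should match.
beta_hat = fit_logistic_newton(X_pooled, y)
beta_a, beta_b = beta_hat[:2], beta_hat[2:]

# The linear predictor decomposes into one site-local term per data holder.
eta_pooled = X_pooled @ beta_hat
eta_split = X_a @ beta_a + X_b @ beta_b

# At the maximum-likelihood estimate the score is numerically zero.
score = X_pooled.T @ (y - 1.0 / (1.0 + np.exp(-eta_pooled)))
```

The additive decomposition of the linear predictor is the structural property of the logistic model that vertically partitioned methods in general can exploit; how VALORIS achieves the pooled estimate in one shot, without a coordinator and without sharing the outcome, is described in the paper itself rather than here.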