Differentially Private Distributed Inference
Abstract
How can agents exchange information to learn from each other despite their privacy needs and security concerns? Consider healthcare centers that want to collaborate on a multicenter clinical trial but are concerned about sharing sensitive patient information. Preserving individual privacy and enabling efficient social learning are both important desiderata, yet they seem fundamentally at odds. We attempt to reconcile these desiderata by controlling information leakage using statistical disclosure control methods based on differential privacy (DP). Our agents use log-linear rules to update their belief statistics after communicating with their neighbors. DP randomization of beliefs offers communicating agents plausible deniability with regard to their private information and is amenable to rigorous performance guarantees for the quality of statistical inference. We consider two information environments: one for distributed maximum likelihood estimation (MLE) given a finite number of private signals available at the start of time, and another for online learning from an infinite, intermittent stream of private signals that arrive over time. Noisy information aggregation in the finite case leads to interesting trade-offs between rejecting low-quality states and ensuring that all high-quality states are admitted in the algorithm output. The MLE setting has natural applications to binary hypothesis testing, which we formalize with relevant statistical guarantees. Our results flesh out the nature of the trade-offs between the quality of the inference, learning accuracy, communication cost, and the level of privacy protection that the agents are afforded. In simulation studies, we perform a differentially private, distributed survival analysis on real-world data from an AIDS Clinical Trials Group (ACTG) study to determine whether new treatments improve over standard care.
In addition, we use data from clinical trials in advanced cancer patients to determine whether certain biomedical indices affect patient survival. We show that our methods can achieve privacy-preserving inference with significantly more efficient computations than existing privacy-aware methods based on homomorphic encryption, and at lower error rates than first-order differentially private distributed optimization methods.
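As a rough illustration of the kind of mechanism the abstract describes, the sketch below pairs the standard Laplace mechanism with log-linear (geometric) pooling of beliefs: each agent perturbs its log-beliefs before sharing, then combines its own belief with neighbors' noisy reports and a local log-likelihood term. This is a minimal sketch under assumed conventions, not the paper's exact algorithm; the function names, uniform weights, and unit sensitivity bound are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def privatize_log_belief(log_belief, epsilon, sensitivity=1.0):
    """Laplace mechanism: perturb each log-belief entry with noise of
    scale sensitivity/epsilon before it is shared with neighbors.
    (Unit sensitivity is an assumption for this sketch.)"""
    noise = rng.laplace(0.0, sensitivity / epsilon, size=log_belief.shape)
    return log_belief + noise

def log_linear_update(own_log_belief, noisy_neighbor_log_beliefs,
                      log_likelihood, weights):
    """Log-linear pooling: weighted average of own and neighbors' (noisy)
    log-beliefs, plus the local log-likelihood of new private signals,
    renormalized so the belief is a probability distribution."""
    pooled = weights[0] * own_log_belief
    for w, lb in zip(weights[1:], noisy_neighbor_log_beliefs):
        pooled = pooled + w * lb
    updated = pooled + log_likelihood
    return updated - np.log(np.sum(np.exp(updated)))  # normalize

# Example: two states, one neighbor, stronger privacy => noisier reports.
prior = np.log(np.array([0.5, 0.5]))
neighbor_report = privatize_log_belief(prior, epsilon=1.0)
posterior = log_linear_update(prior, [neighbor_report],
                              np.log(np.array([0.7, 0.3])),
                              weights=[0.5, 0.5])
```

Smaller values of `epsilon` give stronger plausible deniability but inject more noise into the aggregation, which is the source of the inference-quality/privacy trade-off the abstract discusses.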