Distributed Conformal Prediction Without Raw Data Sharing
Abstract
Conformal prediction is a powerful framework for constructing distribution-free prediction regions with guaranteed coverage. Existing conformal prediction methods are typically developed in a centralized setting, which assumes that all data are accessible at a single location. In practice, however, data are often distributed across multiple machines because of privacy concerns or storage constraints. When sharing raw data across machines is restricted, conventional conformal prediction methods become infeasible. In this paper, we first propose a distributed conformal prediction method that leverages random forests without sharing raw data across machines. To construct prediction intervals efficiently, we design a novel distributed quantile estimation algorithm based on the bisection method. Furthermore, we investigate a distributed conformalized quantile regression method that adapts the interval lengths to heteroscedasticity in the data. To improve predictive efficiency, especially under complex error distributions such as skewed or multimodal structures, we further introduce a distributed conformal prediction method based on the highest density set, which yields narrower and more informative prediction regions. We establish upper and lower bounds on the coverage of the proposed methods. Numerical simulations and an illustrative airline data example demonstrate the effectiveness of the proposed methods.
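To make the idea of bisection-based distributed quantile estimation concrete, the following is a minimal sketch and not the algorithm from the paper, whose details are not given in the abstract. It assumes that each machine holds a list of nonconformity scores, that machines may report only their sample size and the count of local scores at or below a threshold, and that bounds `lo` and `hi` on the scores are known in advance. The function names, the bounds, and the tolerance `tol` are all hypothetical.

```python
import math
from typing import Sequence


def local_count_leq(scores: Sequence[float], t: float) -> int:
    """Runs on one machine: only this count leaves the machine, not the scores."""
    return sum(1 for s in scores if s <= t)


def distributed_quantile(
    machines: Sequence[Sequence[float]],
    alpha: float,
    lo: float,
    hi: float,
    tol: float = 1e-8,
) -> float:
    """Approximate the ceil((n+1)(1-alpha))-th smallest pooled score by bisection.

    Assumption (hypothetical setup): lo lies strictly below every score and
    hi lies at or above every score, so the target is inside (lo, hi].
    """
    n = sum(len(m) for m in machines)        # machines share sample sizes only
    k = math.ceil((n + 1) * (1 - alpha))     # standard split-conformal rank
    if k > n:
        return math.inf                      # too few scores for this alpha

    # Invariant: pooled count(<= lo) < k <= pooled count(<= hi).
    while hi - lo > tol:
        mid = (lo + hi) / 2
        total = sum(local_count_leq(m, mid) for m in machines)
        if total >= k:
            hi = mid                         # target rank is at or below mid
        else:
            lo = mid                         # target rank is above mid
    return hi                                # within tol above the k-th score


# Toy example with three "machines"; the scores are made up for illustration.
machines = [[0.1, 0.5, 0.9], [0.2, 0.4], [0.3, 0.7, 0.8, 0.6, 1.0]]
q = distributed_quantile(machines, alpha=0.2, lo=0.0, hi=2.0)
```

Each bisection step needs one integer from each machine, so the number of communication rounds grows with log2((hi - lo) / tol) rather than with the sample size. Returning the upper endpoint keeps the estimate at or above the exact order statistic, which errs on the side of wider intervals.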