Penalized communication-efficient algorithm for quantile regression with high-dimensional and large-scale longitudinal data
Abstract
With the development of science and technology, massive datasets distributed across multiple machines are becoming increasingly common. Traditional statistical methods are often infeasible for analyzing such large-scale datasets due to excessive computing time, memory limitations, high communication costs, and privacy concerns. In this paper, we develop a penalized communication-efficient smoothed quantile regression approach for high-dimensional and large-scale longitudinal data in a distributed computing environment. First, we transform the smoothed quantile regression into a weighted least-squares regression using Newton's method. Second, we construct a surrogate loss function based on a Taylor expansion of the proposed weighted quadratic loss. We then derive penalized surrogate estimating equations from the penalized surrogate loss and provide algorithms for obtaining the final distributed iterated parameter estimators. Furthermore, we investigate the statistical properties of the proposed estimators under mild conditions. Simulation studies demonstrate that the proposed method outperforms traditional non-distributed algorithms in variable selection and in the accuracy and efficiency of parameter estimation. Finally, we apply the method to a real dataset to illustrate its practical performance.