Scalable Nonlinear Cox Modeling via Random Fourier Features with Analytic Uncertainty

Abstract

Background: The Cox proportional hazards model assumes that the log-hazard is linear in the covariates, so it often fails to capture complex biomedical risk structures such as U-shaped biomarker associations. Existing kernel-based generalizations offer the necessary flexibility, but their $O(n^3)$ computational complexity limits their use in large-scale cohort studies. Furthermore, most non-linear machine learning methods lack closed-form analytical measures of uncertainty for individual predictions.

Methods: We developed a Random Fourier Features-based Cox regression approach (RFF-Cox) to model non-linear risk relationships within a scalable framework. By mapping stationary kernels into an explicit finite-dimensional feature space of dimension m, the method reduces computational complexity to $O(nm^2)$. Model parameters are estimated with the Newton–Raphson algorithm applied to a ridge-regularized partial likelihood, and the bandwidth parameter (σ) is selected automatically using a marginal likelihood criterion based on the Laplace approximation. A distinguishing feature of the approach is that the Fisher information matrix is stabilized via eigen-decomposition, which allows analytical 95% confidence intervals for individual survival estimates to be obtained through the delta method and the log–log transformation. Performance was evaluated using controlled simulations and six real-world datasets with sample sizes ranging from 432 to 9,105.

Results: In simulation scenarios, RFF-Cox was markedly more accurate than the classical Cox model at capturing U-shaped risk functions (C-index: 0.84 vs. 0.68). In real-world applications, its discriminatory power was competitive with Random Survival Forests and Gradient Boosting methods, and it was computationally more efficient; for instance, training time on the SUPPORT2 dataset was reduced from 126 seconds to 1.4 seconds. IPCW-weighted calibration analyses yielded a low Integrated Calibration Index (ICI < 0.05) across all time horizons, supporting the reliability of the probability estimates. Moreover, uncertainty in individual predictions, quantified via the analytical confidence intervals, varied significantly across risk groups.

Conclusions: RFF-Cox provides a practical survival analysis framework that combines automatic hyperparameter selection, computational efficiency, and transparent reporting of statistical uncertainty. The method overcomes the limitations of classical linear models while offering the speed and interpretability required to serve as a viable alternative to machine learning algorithms in large-scale data settings.
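
The explicit feature map described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian (RBF) kernel with bandwidth σ, which the abstract does not specify, and the function names `sample_rff_params`, `rff_map`, and `rbf_kernel` are hypothetical. The sketch draws m random frequencies and phases, builds the feature vector z(x), and checks that inner products of features approximate the exact kernel. A Cox model fitted on these m features would then work with an m × m Hessian rather than an n × n kernel matrix.

```python
import math
import random

def sample_rff_params(d, m, sigma, rng):
    """Draw frequencies w_j ~ N(0, sigma^-2 I) and phases b_j ~ U[0, 2*pi)."""
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(m)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(m)]
    return W, b

def rff_map(x, W, b):
    """Feature vector z(x) such that z(x).z(y) approximates the RBF kernel."""
    scale = math.sqrt(2.0 / len(W))
    return [
        scale * math.cos(sum(wk * xk for wk, xk in zip(w, x)) + bj)
        for w, bj in zip(W, b)
    ]

def rbf_kernel(x, y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - c) ** 2 for a, c in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Illustrative settings (assumed, not taken from the paper).
rng = random.Random(0)
sigma, d, m, n = 1.5, 2, 2000, 20

X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
W, b = sample_rff_params(d, m, sigma, rng)
Z = [rff_map(x, W, b) for x in X]  # n rows, m columns

# Largest absolute gap between the approximate and exact kernel entries.
max_err = max(
    abs(sum(p * q for p, q in zip(Z[i], Z[j])) - rbf_kernel(X[i], X[j], sigma))
    for i in range(n)
    for j in range(n)
)
print(f"max kernel approximation error with m={m}: {max_err:.3f}")
```

The approximation error shrinks roughly as 1/√m, so m trades accuracy against the cost of the downstream fit.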
