Scalable Nonlinear Cox Modeling via Random Fourier Features with Analytic Uncertainty

Fahrettin KAYA

Abstract

Background: The Cox proportional hazards model often fails to capture complex biomedical risk structures, such as U-shaped biomarker associations, because it assumes that the log-hazard is linear in the covariates. Existing kernel-based generalizations offer the necessary flexibility, but their O(n³) computational complexity limits their applicability in large-scale cohort studies. Furthermore, most non-linear machine learning methods lack closed-form analytical measures of uncertainty for individual predictions.

Methods: We developed a Random Fourier Features-based Cox regression approach (RFF-Cox) to model non-linear risk relationships within a scalable framework. By mapping stationary kernels into an explicit finite-dimensional feature space of dimension m, the method reduces computational complexity to O(nm²). Model parameters are estimated via the Newton–Raphson algorithm on a ridge-regularized partial likelihood, while the bandwidth parameter (σ) is selected automatically using a marginal likelihood criterion based on the Laplace approximation. A distinguishing feature of our approach is the stabilization of the Fisher information matrix via eigen-decomposition, which enables analytical 95% confidence intervals for individual survival estimates through the delta method and a log–log transformation. Performance was evaluated using controlled simulations and six real-world datasets with sample sizes ranging from 432 to 9,105.

Results: In simulation scenarios, the RFF-Cox model was markedly more accurate than the classical Cox model in capturing U-shaped risk functions (C-index: 0.84 vs. 0.68). In real-world applications, the model exhibited discriminatory power competitive with Random Survival Forests and Gradient Boosting methods while being more computationally efficient; for instance, training time on the SUPPORT2 dataset was reduced from 126 seconds to 1.4 seconds. IPCW-weighted calibration analyses yielded a low Integrated Calibration Index (ICI < 0.05) across all time horizons, supporting the reliability of the probability estimates. Moreover, uncertainty in individual predictions, quantified via analytical confidence intervals, varied significantly across risk groups.

Conclusions: RFF-Cox provides a practical survival analysis framework that combines automatic hyperparameter selection, computational efficiency, and transparent reporting of statistical uncertainty. The method overcomes the limitations of classical linear models while offering the speed and interpretability required to serve as a viable alternative to machine learning algorithms in large-scale data settings.
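The feature-mapping step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian kernel with bandwidth σ as the stationary kernel, and the function name `rff_features`, its parameters, and the synthetic data are all hypothetical. It only shows that an explicit m-dimensional map can approximate the n × n kernel matrix, which is what would allow the downstream Cox fit to work with an n × m design matrix instead.

```python
import numpy as np

def rff_features(X, m, sigma, seed=0):
    """Map X (n x d) to m random Fourier features for a Gaussian kernel.

    Hypothetical helper for illustration; the Gaussian kernel is an assumption.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density, N(0, sigma^-2 I).
    W = rng.normal(scale=1.0 / sigma, size=(d, m))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)
    return np.sqrt(2.0 / m) * np.cos(X @ W + b)

# Synthetic covariates (not data from the article).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
sigma = 1.5

Z = rff_features(X, m=2000, sigma=sigma)   # explicit n x m feature matrix
K_approx = Z @ Z.T                         # inner products approximate the kernel

# Exact Gaussian kernel matrix, for comparison only.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-sq_dists / (2.0 * sigma**2))

max_err = np.abs(K_approx - K_exact).max()
print(f"max abs kernel error with m=2000: {max_err:.3f}")
```

In a pipeline of this kind, `Z` would replace the raw covariates as the design matrix of a ridge-penalized Cox fit; the approximation error shrinks roughly as 1/√m, so m trades accuracy against the O(nm²) cost.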

Version published to 10.21203/rs.3.rs-9108239/v1 on Research Square
Mar 20, 2026

Related articles

Performance of Bayesian Additive Regression Trees (BART)-Survival implementation in Python: A Comparison with Traditional and R-Based BART Survival Analysis Methods

This article has 5 authors:
1. Jacob Tiegs
2. Julia Raykin
3. Stacey Adjei
4. Ilia Rochlin
5. Anna Bratcher
This article has no evaluations. Latest version: Jan 23, 2026

The Survival Double Descent: Generalization Dynamics of Deep Neural Networks in Time-to-Event Analysis

This article has 2 authors:
1. Steven Hart
2. Ann Oberg
This article has no evaluations. Latest version: Mar 4, 2026

Sparse Foundation Models for Continous-Time EHRs

This article has 3 authors:
1. William Ferrel
2. Timothy Chang
3. Samuel Lawrence
This article has no evaluations. Latest version: Feb 10, 2026
