Performance of Bayesian Additive Regression Trees (BART)-Survival implementation in Python: A Comparison with Traditional and R-Based BART Survival Analysis Methods

Jacob Tiegs
Julia Raykin
Stacey Adjei
Ilia Rochlin
Anna Bratcher

Abstract

Background: Time-to-event analysis, widely used in public health to model outcomes such as disease progression and survival, has traditionally relied on maximum likelihood-based approaches such as Cox proportional hazards models. Recent advances in non-parametric machine learning, notably Bayesian Additive Regression Trees (BART), have enhanced predictive performance in the complex, high-dimensional data common to public health studies. BART has primarily been implemented for survival analysis in R, offering a flexible alternative to conventional models. With Python's growing adoption for large healthcare data analytics, this study evaluates a novel Python-based BART-Survival algorithm (p-BART), benchmarking its performance against an R-based BART implementation (r-BART) and traditional survival models using both simulated and real-world healthcare data.

Methods: The p-BART algorithm was evaluated using simulated datasets of varying complexity, including non-linear effects, interactions, and non-proportional hazards. Its performance was also assessed using real-world administrative healthcare data (Premier Healthcare Database). Key evaluation metrics included root mean squared error, bias, and coverage probability. Comparisons were made against the Kaplan-Meier (KM) estimator, the Cox proportional hazards model (CPHM), and the r-BART implementation.

Results: In simpler scenarios, such as one or two sampling distributions and multivariate regression with proportional hazards, p-BART performed similarly to traditional statistical models and to r-BART. In more complex scenarios, including non-proportional hazards and models with nonlinear interactions, p-BART outperformed traditional methods while matching r-BART performance. In real-world data analysis, p-BART produced hazard ratio and survival probability estimates comparable to those of the CPHM and r-BART. Like other approaches, p-BART showed the greatest coverage loss and reduced reliability under extreme censoring scenarios.

Conclusions: This study demonstrated the accuracy and robustness of p-BART, even when the proportional hazards assumption was violated. It performed well in complex scenarios and was comparable to r-BART, making it valuable for public health researchers using Python. Like other methods, p-BART performs worse under high censoring. As Python and machine learning applications in public health expand, p-BART provides an advanced statistical method for analyzing large healthcare datasets, improving decision-making in clinical practice and public health.

Clinical Trial Number: Not applicable.
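The abstract names root mean squared error, bias, and coverage probability as its evaluation metrics but does not define them. As a minimal sketch, assuming the usual simulation-study definitions (estimates compared with a known true survival probability at each time point, across simulation replicates), they could be computed as below. The function name `evaluation_metrics`, the array layout, and the toy numbers are illustrative assumptions, not taken from the study or from any BART package.

```python
import numpy as np


def evaluation_metrics(estimates, lower, upper, truth):
    """Summarise survival-probability estimates against a known truth.

    Hypothetical helper, not the study's code.
    estimates, lower, upper: arrays of shape (n_replicates, n_times)
        holding point estimates and interval bounds.
    truth: array of shape (n_times,) with the true survival probabilities.
    """
    estimates = np.asarray(estimates, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    truth = np.asarray(truth, dtype=float)

    error = estimates - truth  # broadcasts truth across replicates
    covered = (lower <= truth) & (truth <= upper)
    return {
        "bias": error.mean(axis=0),                  # mean signed error
        "rmse": np.sqrt((error ** 2).mean(axis=0)),  # root mean squared error
        "coverage": covered.mean(axis=0),            # share of intervals containing the truth
    }


# Toy input: two replicates, two time points (made-up numbers).
truth = np.array([0.8, 0.5])
estimates = np.array([[0.9, 0.5],
                      [0.7, 0.5]])
lower = estimates - 0.05
upper = estimates + 0.05

metrics = evaluation_metrics(estimates, lower, upper, truth)
```

In this toy input the errors at the first time point cancel, so the bias is near zero while the RMSE is about 0.1, and neither interval contains the truth. This shows why the three metrics are reported together rather than singly.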

Version published to 10.21203/rs.3.rs-8605537/v1 on Research Square
Jan 23, 2026

Related articles

A Unified Framework for Survival Prediction: Combining Machine Learning Feature Selection with Traditional Survival Analysis in Heart Failure and METABRIC Breast Cancer

This article has 7 authors:
1. Fangya Tan
2. Jian-Guo Zhou
3. Shuqiao Li
4. Bowen Long
5. Srikar Bellur
6. Yang Zhou
7. Mark Newman
This article has no evaluations. Latest version: Jan 29, 2026

Development and validation of an interpretable machine learning model for predicting the risk of 8-year all-cause mortality in Cardiovascular-Kidney-Metabolic Syndrome among older adults: A multicenter and cohort study

This article has 5 authors:
1. Zhiren Zhu
2. Jie Zhang
3. Peirao Wu
4. Huiping Xue
5. Dongmei Gu
This article has no evaluations. Latest version: Jan 25, 2026

Bayesian Elastic‑Net Cox Models for Time‑to‑Event Prediction: Application with Breast‑Cancer Cohort

This article has 3 authors:
1. Ersin Yılmaz
2. Syed Ejaz Ahmed
3. Dursun Aydın
This article has no evaluations. Latest version: Jan 6, 2026
