The impact of the number and the size of clusters on prediction performance of the stratified and the conditional shared gamma frailty Cox proportional hazards models

Abstract

Biomedical researchers often analyse data that are subject to clustering. Development and validation of risk prediction models generally assume independence of observations. For survival outcomes, the Cox proportional hazards regression model is commonly used to estimate an individual’s risk at fixed time horizons. The stratified Cox proportional hazards and the shared gamma frailty Cox proportional hazards regression models are two common approaches to account for the presence of clustering in the data. The accuracy of the predictions of these two approaches has not been examined. We conducted a set of Monte Carlo simulations to assess the impact of the number of clusters, the size of the clusters, and the within-cluster correlation in outcomes on the accuracy of the conditional predictions developed using the stratified and the shared gamma frailty Cox proportional hazards regression models. We compared the accuracy of the predictions in terms of discrimination, calibration and overall performance metrics. We found that the stratified and the shared gamma frailty models had similar performance, especially with larger clusters and a greater number of clusters. For small clusters, we observed slightly better discrimination and overall performance for the stratified model and better calibration for the shared gamma frailty model at shorter prediction horizons. The utility of the stratified Cox proportional hazards model for risk prediction is limited, especially when within-cluster correlation is high, when clusters are small, and at longer prediction horizons. Our results are accompanied by two applications using open-source data on myelodysplastic syndrome and bladder cancer.
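
To make the simulation setting concrete, the following is a minimal sketch of how clustered survival times with a shared gamma frailty could be generated and how a conditional risk at a fixed horizon follows from the model. It is not the authors' simulation design, which the abstract does not specify. The exponential baseline hazard, the single standard normal covariate, the absence of censoring, and all names and default values (`simulate_gamma_frailty`, `conditional_risk`, `kendalls_tau`, `beta`, `lambda0`, `theta`) are assumptions made for illustration.

```python
import numpy as np


def simulate_gamma_frailty(n_clusters, cluster_size, theta,
                           beta=0.5, lambda0=0.1, seed=0):
    """Simulate uncensored survival times under a shared gamma frailty model.

    Assumed model (illustrative only):
        h(t | x, w) = w * lambda0 * exp(beta * x)
    where w is a cluster-level frailty with mean 1 and variance theta.
    """
    rng = np.random.default_rng(seed)
    # One frailty per cluster: Gamma(shape=1/theta, scale=theta) has mean 1, variance theta.
    w = rng.gamma(shape=1.0 / theta, scale=theta, size=n_clusters)
    # Cluster label for each subject.
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    # A single standard normal covariate per subject (assumption).
    x = rng.normal(size=n_clusters * cluster_size)
    # Subject-specific constant hazard, shared frailty within cluster.
    rate = w[cluster] * lambda0 * np.exp(beta * x)
    # Exponential event times with the given rate (no censoring in this sketch).
    time = rng.exponential(scale=1.0 / rate)
    return cluster, x, w, time


def conditional_risk(t, x, w, beta=0.5, lambda0=0.1):
    """Risk of an event by horizon t, conditional on the cluster frailty w."""
    return 1.0 - np.exp(-w * lambda0 * t * np.exp(beta * x))


def kendalls_tau(theta):
    """Within-cluster Kendall's tau implied by a gamma frailty with variance theta."""
    return theta / (theta + 2.0)


if __name__ == "__main__":
    cluster, x, w, time = simulate_gamma_frailty(
        n_clusters=20, cluster_size=10, theta=0.5)
    # True conditional 5-unit risk for every subject, using the simulated frailties.
    risk = conditional_risk(5.0, x, w[cluster])
    print("subjects:", time.shape[0])
    print("mean conditional risk at t=5:", risk.mean())
    print("Kendall's tau for theta=0.5:", kendalls_tau(0.5))
```

Varying `n_clusters`, `cluster_size`, and `theta` in such a generator corresponds to the three factors the study examines: the number of clusters, the cluster size, and the within-cluster correlation. Fitting and comparing the stratified and frailty Cox models on the simulated data is outside the scope of this sketch.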
