Variable selection for competing risk regression models: recommendations for analyzing data from epidemiological studies

J. Mullaert
Sandra Schmeller
Peter C. Austin
A. Latouche

Abstract

When fitting competing risks regression models, a variety of variable selection methods exist, including backward selection on the subdistribution hazard, on the cause-specific hazards, and penalized methods. However, a benchmark study comparing these different procedures is lacking.

We conducted an extensive simulation study to compare three variable selection procedures in terms of both model selection ability and predictive accuracy. A total of 5120 datasets were simulated under various conditions intended to be representative of real applications in clinical epidemiology. Results show that the backward selection procedure can lead to a high false discovery rate (FDR) because of implementation choices. Even in scenarios with a high number of events per variable (EPV), the true model is rarely identified by any of the tested procedures. Survival predictions were assessed with time-dependent AUC and showed similar performance for all methods. We also provide an application to real data from stem cell transplant patients in hematology.

We conclude that identifying the true model in competing risk regression is a very difficult task, and we make two recommendations to analysts: (1) report the number of events per variable for the event type of interest, and (2) use multiple methods to deal with model uncertainty and avoid implementation pitfalls.
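As an illustration of recommendation (1), the following is a minimal sketch of how EPV for the event type of interest might be computed. The status coding (0 = censored, 1 = event of interest, 2 = competing event), the function name `events_per_variable`, and the toy data are assumptions made for this example and do not come from the article.

```python
def events_per_variable(status, n_candidate_params, event_of_interest=1):
    """Return events per variable (EPV) for one event type.

    EPV is taken here as the number of observed events of the type of
    interest divided by the number of candidate regression parameters.
    Censored observations and competing events are not counted.
    """
    if n_candidate_params <= 0:
        raise ValueError("n_candidate_params must be a positive integer")
    n_events = sum(1 for s in status if s == event_of_interest)
    return n_events / n_candidate_params


# Hypothetical status codes (assumption):
# 0 = censored, 1 = event of interest, 2 = competing event
status = [0, 1, 2, 1, 0, 1, 2, 2, 1, 0, 1, 1]

# Three candidate parameters are assumed for the illustration.
epv = events_per_variable(status, n_candidate_params=3)
print(f"EPV for the event of interest: {epv:.1f}")
```

Counting only events of the type of interest, rather than all events, is the point of the recommendation: a dataset with many competing events can have a large total event count and still a low EPV for the outcome being modeled.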

Version published to 10.1101/2024.11.25.24317882 on medRxiv
Nov 26, 2024

Related articles

Retrospective analysis of macroscopic health, socioeconomic, and demographic risk predictors for COVID-19 accumulated mortality ratio
Murat Razi, Manuel Grana
Latest version: Sep 5, 2025

Bayesian LASSO with Categorical Predictors: Coding Strategies, Uncertainty Quantification, and Healthcare Applications
Xi Lu, Jieni Li, Rajender R. Aparasu, Nebil Yusuf, Cen Wu
Latest version: Oct 8, 2025

CHARIOT: Development and Internal Validation of a Cardiovascular Health Assessment and Risk-based Intervention Optimisation Tool
Alexander Pate, Bowen Jiang, Yun-Ting Huang, Sophie Griffiths, David Stables, Brian McMillan, Matthew Sperrin
Latest version: Sep 19, 2025
