Variable selection for competing risk regression models: recommendations for analyzing data from epidemiological studies
Abstract
When fitting competing risks regression models, a variety of variable selection methods exist, including backward selection on the subdistribution hazard, on the cause-specific hazards, and penalized methods. However, a benchmark study comparing these different procedures is lacking.
We conducted an extensive simulation study to compare three variable selection procedures in terms of both model selection ability and predictive accuracy. We simulated 5120 datasets under a range of conditions chosen to be representative of real applications in clinical epidemiology. Results show that the backward selection procedure can lead to a high false discovery rate (FDR) because of implementation choices. Even in scenarios with a high number of events per variable (EPV), the true model is rarely identified by any of the tested procedures. Survival predictions were assessed with time-dependent AUC and showed similar performance across all methods. We also provide an application to real data from stem cell transplant recipients in hematology.
We conclude that identifying the true model in competing risk regression is a very difficult task, and we offer two recommendations to analysts: (1) report the number of events per variable for the event type of interest and (2) use multiple methods to deal with model uncertainty and avoid implementation pitfalls.
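As an illustration of recommendation (1), the sketch below computes the events per variable for one event type. It assumes EPV is defined as the number of observed events of the type of interest divided by the number of candidate variables; the abstract does not state the exact definition used in the study. The function name `events_per_variable`, the status coding (0 = censored, 1 = event of interest, 2 = competing event), and the data are hypothetical.

```python
from collections import Counter


def events_per_variable(event_codes, n_candidate_variables, event_of_interest=1):
    """Return EPV for one event type.

    Assumed definition: count of events of the type of interest divided by
    the number of candidate variables considered for selection.
    """
    if n_candidate_variables <= 0:
        raise ValueError("n_candidate_variables must be a positive integer")
    counts = Counter(event_codes)
    return counts[event_of_interest] / n_candidate_variables


# Hypothetical status column: 0 = censored, 1 = event of interest, 2 = competing event
status = [1, 0, 2, 1, 1, 0, 2, 1, 0, 1, 2, 1]

epv_interest = events_per_variable(status, n_candidate_variables=3)
epv_competing = events_per_variable(status, n_candidate_variables=3, event_of_interest=2)

print(f"EPV, event of interest: {epv_interest:.2f}")
print(f"EPV, competing event:   {epv_competing:.2f}")
```

Censored observations and competing events do not count toward the EPV of the event of interest, which is why the two values can differ substantially within the same dataset.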