A Multi-Factor Machine Learning Model for Predicting and Preventing Clinical Trial Failures
Abstract
About 15% of clinical trials are terminated prematurely (fail), causing financial losses and delaying treatment development. This study used a subset of interventional trial records from the 471,252 studies registered in ClinicalTrials.gov up to November 2023 to develop a machine learning tool for assessing clinical trial failure risk and to examine the factors that lead to trial failure. The model incorporated features describing trial design, participant demographics, eligibility criteria, disease categorization, and eligibility criteria complexity. LightGBM outperformed XGBoost, Random Forest, CatBoost, and AdaBoost, achieving a balanced accuracy of 0.677 in the final model, with F1-scores of 0.770 for completed trials and 0.442 for terminated trials. SHapley Additive exPlanations (SHAP) analysis identified eligibility criteria readability as one of the most important features for the model's predictions. These findings demonstrate the model's potential to identify trials at risk of failure, providing an opportunity to prevent clinical trial failure.
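The abstract reports balanced accuracy together with per-class F1-scores, which matters because terminated trials are the minority class. As a minimal sketch of how these metrics are defined, and not the authors' evaluation code, the snippet below computes both from predicted and true labels. The function name `per_class_metrics` and the toy labels are assumptions made purely for illustration.

```python
def per_class_metrics(y_true, y_pred, labels=("completed", "terminated")):
    """Return (balanced accuracy, {label: F1}) for the given class labels.

    Hypothetical helper for illustration; not taken from the study.
    """
    recalls = []
    f1 = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall for this class.
        f1[label] = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
    # Balanced accuracy is the unweighted mean of per-class recall.
    balanced_accuracy = sum(recalls) / len(recalls)
    return balanced_accuracy, f1


# Toy labels invented for illustration only (not the study's data):
# 8 completed trials and 2 terminated trials.
y_true = ["completed"] * 8 + ["terminated"] * 2
y_pred = ["completed"] * 6 + ["terminated"] * 2 + ["terminated", "completed"]

bal_acc, f1_scores = per_class_metrics(y_true, y_pred)
print(f"balanced accuracy: {bal_acc:.3f}")
for label, score in f1_scores.items():
    print(f"F1 ({label}): {score:.3f}")
```

In this toy example, 7 of the 10 predictions are correct, so plain accuracy is 0.7, while balanced accuracy is lower because only half of the terminated trials are recovered. The same effect explains why a model can score well on the majority class yet show a much lower F1 on terminated trials.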