The Clinical Trial Duration Prediction Dataset: A Resource for AI Research
Abstract
Early forecasting of clinical trial duration supports planning, optimized resource allocation, and cost control. Developing effective AI predictors remains challenging and requires collaboration between medical and computational experts. Although a publicly available dataset exists for this task, it is not well suited to developing an AI-driven predictor that can accurately predict clinical trial duration. This limitation arises from its lack of critical features that significantly influence trial length, as well as from the presence of incompletely and inappropriately selected trials. To empower the development of AI predictors, we introduce the Clinical Trial Duration Prediction (CTDP) dataset, derived from 81,943 completed drug trials across 1,520 ICD-10 categories. The CTDP dataset contains 51 features describing clinical trial characteristics, including phase, purpose, condition, eligibility, enrollment, arm information, design, endpoint, site, sponsor, and summary, that together influence trial duration. Developed through collaboration between clinical trial experts and AI researchers, the CTDP dataset will enable the creation of robust and accurate AI-driven applications for predicting clinical trial duration.