Supporting Reanalysis and Reuse of Clinical Trial Data: A Case Study
Abstract
Background: Reproducing published findings from clinical trials is a critical component of scientific transparency, yet it remains a challenging and under-practiced task. Despite increasing emphasis on reproducibility and data reuse in research policies, few real-world examples exist in which independent teams have reproduced complex analyses using clinical trial data. In this case study, we aimed to independently reproduce the key findings of a high-impact clinical trial on rectal cancer treatment using shared trial data.

Method: We organized a multi-team datathon in which each team received the same dataset and supporting materials and was tasked with reproducing the results of the CAO/ARO/AIO-04 trial, with optional additional analyses. We contacted the original investigators to obtain data access and permission for reuse, and consulted them to understand the clinical and scientific context of the study.

Results: Five teams used R or Python to reproduce the statistical results; the corresponding scripts are available on GitLab. All teams reproduced the analyses of the primary outcome, disease-free survival (DFS). The key findings on DFS were consistently reproduced, reinforcing confidence in the trial's main conclusions. The robustness of the results was examined using different analytical software or alternative statistical models. Nevertheless, the teams encountered challenges because the supplementary materials were not easy to locate. Minor reporting issues were identified in the original publication.

Conclusion: Reproduction of a major oncology clinical trial confirmed the reliability of its main conclusions. Divergences highlighted reporting gaps, such as incomplete protocols and broken links, that future trials should address. This case study demonstrates the value of systematic reproducibility checks for transparency in clinical research and highlights the challenges of data sharing for reproducibility.
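The abstract does not describe the teams' analysis code. As a hedged illustration of what reproducing a DFS result typically involves, the sketch below implements the Kaplan-Meier product-limit estimator, a standard method for right-censored time-to-event endpoints such as DFS. It is not taken from the teams' GitLab scripts, and it does not claim to be the method used in the trial; the function name `kaplan_meier` and the toy data are hypothetical. A real reanalysis would rely on an established package, such as `lifelines` in Python or `survival` in R.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Hypothetical helper: Kaplan-Meier survival steps for right-censored data.

    times  -- follow-up time per patient (e.g. months to DFS event or censoring)
    events -- 1 if the DFS event (e.g. recurrence or death) was observed, 0 if censored
    Returns a list of (time, estimated survival) pairs at each observed event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    # Observed events per time point, and all exits (events + censorings) per time point.
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths[t]  # Counter returns 0 for times with no observed event
        if d:
            # Product-limit step: multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set after t.
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Toy, made-up data for six patients (not trial data).
    toy_times = [6, 12, 12, 18, 24, 30]
    toy_events = [1, 1, 0, 1, 0, 0]
    for t, s in kaplan_meier(toy_times, toy_events):
        print(f"month {t}: S(t) = {s:.3f}")
```

With the toy data, the estimated survival steps down at months 6, 12 and 18 to roughly 0.83, 0.67 and 0.44. Patients censored at the same time as an event are still counted as at risk at that time, which is the usual convention. Comparing such curves between treatment arms (for example with a log-rank test or a Cox model) would be the next step in a DFS reanalysis.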