Evaluating the Reporting Quality of 21,041 Randomized Controlled Trial Articles

Abstract

Incomplete reporting of a study's methods and results hinders efforts to evaluate and reproduce research findings in randomized controlled trials (RCTs), leading to potential harm. While the CONSORT guidelines were established to ensure transparency and reproducibility in RCTs, comprehensively assessing adherence to them has been infeasible at scale. We demonstrate that GPT-4o-mini, used out of the box, achieves state-of-the-art performance in evaluating RCT reporting quality (F1 score: 0.85; precision: 0.96), with results validated against expert human annotators at 92.24% agreement across 50 papers. Applying this tool to 21,041 open-access RCTs (1966-2024), we reveal temporal and domain trends: overall CONSORT compliance has improved substantially over time, rising from 27.3% in 1966-1990 to 56.1% in 2010-2024, yet critical methodological components remain severely underreported. Randomization procedures (9.7%), allocation concealment mechanisms (15.25%), and protocol access information (2.22%) are particularly deficient. Compliance also varies substantially across medical disciplines (35-63%), with urology/nephrology and critical care showing the highest rates and pharmacology the lowest. Trial characteristics, including FDA regulation status, the presence of a data monitoring committee, reporting of adverse events, and mortality outcomes, showed statistically significant but practically negligible differences in compliance rates. Our work provides a scalable AI framework for auditing and improving RCT reporting, offering actionable insights for journals, researchers, and policymakers seeking to enhance research integrity and clinical translation.
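The abstract reports precision and F1 but not recall; assuming the standard harmonic-mean definition of F1, the implied recall can be recovered from the two reported figures (a back-of-envelope derivation, not a value stated by the authors):

```latex
F_1 = \frac{2PR}{P + R}
\;\Rightarrow\;
R = \frac{F_1 \, P}{2P - F_1}
  = \frac{0.85 \times 0.96}{2(0.96) - 0.85}
  \approx 0.76
```

Read this way, the model behaves conservatively: when it marks a CONSORT item as reported it is almost always correct (precision 0.96), while it misses roughly a quarter of genuinely reported items (implied recall of about 0.76).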
