RAPID: Reliable and efficient Automatic generation of submission rePorting checklists with Large language moDels
Abstract
Importance
Medical reporting guidelines are important for improving the transparency, quality, and integrity of medical research, particularly in randomized clinical trials; adherence to these guidelines supports research interpretability and has direct implications for downstream applications, such as patient treatment. However, with over 600 distinct reporting guidelines in existence, manually assessing adherence is time-consuming and labor-intensive.
Objective
To evaluate RAPID, an automated reporting checklist generation tool built on large language models and retrieval-augmented generation.
Design, Setting, and Participants
This study used large language models to design a retrieval-augmented generation architecture. Published journal articles were collected as training and validation sets, used respectively to optimize the prompts within the framework and to evaluate the framework's performance. The articles reported 50 randomized controlled trials that did not involve AI interventions and 41 randomized controlled trials that did.
Main Outcomes and Measures
Two metrics were calculated to evaluate the tool: classification accuracy (Reported/Not Reported), defined as the number of correct judgments divided by the total number of judgments, and a content consistency score, defined as the number of judgments for which the content retrieved by the tool matched the content retrieved by researchers, divided by the total number of judgments.
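As an illustration only, the two metrics can be sketched in Python as follows. The `Judgment` record, its field names, and the toy data are assumptions introduced for this sketch; they are not taken from the RAPID implementation or from the study data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Judgment:
    """One checklist-item judgment for one article (hypothetical record)."""
    tool_label: str        # "Reported" or "Not Reported", as output by the tool
    reference_label: str   # the researchers' label for the same item
    content_matches: bool  # True if the tool retrieved the same content as the researchers


def classification_accuracy(judgments: Sequence[Judgment]) -> float:
    """Number of correct Reported/Not Reported judgments divided by all judgments."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    correct = sum(j.tool_label == j.reference_label for j in judgments)
    return correct / len(judgments)


def content_consistency_score(judgments: Sequence[Judgment]) -> float:
    """Number of judgments with matching retrieved content divided by all judgments."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    matching = sum(j.content_matches for j in judgments)
    return matching / len(judgments)


# Toy data for illustration; these are not results from the study.
sample = [
    Judgment("Reported", "Reported", True),
    Judgment("Reported", "Reported", False),
    Judgment("Not Reported", "Not Reported", True),
    Judgment("Reported", "Not Reported", False),
]
print(classification_accuracy(sample))    # 0.75
print(content_consistency_score(sample))  # 0.5
```

Both metrics share the same denominator (the total number of judgments), so a content consistency score below the accuracy, as in the toy data, indicates items whose label was correct but whose retrieved content differed from the researchers'.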
Results
RAPID accepts articles in the widely used Word document and Portable Document Format as input files. Fifty published journal articles on trials without AI interventions and 41 published journal articles on trials with AI interventions were collected as the CONSORT and CONSORT-AI datasets, respectively. For each of the two reporting guidelines, a training set of 5 articles was selected from the corresponding dataset to refine the prompts used to generate the reporting checklist, and the remaining articles formed the validation set used to assess RAPID's performance. All 37 CONSORT reporting items were included in the tool. RAPID achieved an average accuracy of 92.11% and a content consistency score of 81.14% on the CONSORT dataset. Of the CONSORT-AI reporting items, the 11 items related to AI interventions were included in the tool. RAPID achieved an average accuracy of 83.81% and a content consistency score of 72.51% on the CONSORT-AI dataset. A RAPID graphical user interface was built using JavaScript and Vue.
Conclusions and Relevance
The RAPID tool is designed to assist in the reporting of various types of trials. RAPID is highly scalable and can be easily adapted to different medical reporting guidelines without transfer learning on a large dataset. RAPID may save time and improve working efficiency for different user groups, for example by 1) simplifying the submission process and improving report quality by verifying manuscript completeness for medical authors; 2) facilitating evaluation of report quality for medical researchers; 3) expediting manuscript distribution for medical editors; and 4) identifying reporting deficiencies and providing deeper insights for review comments for reviewers.
Key Points
Question
Can large language model tools automatically generate medical reporting checklists for manuscripts of different types of clinical trials?
Findings
An automated reporting checklist generation tool using large language models and retrieval-augmented generation, RAPID, was developed. RAPID was evaluated on all included items across two separate datasets corresponding to the Consolidated Standards of Reporting Trials (CONSORT) and CONSORT-AI reporting guidelines. On the CONSORT dataset, RAPID achieved an average accuracy of 92.11%; on the CONSORT-AI dataset, it reached an average accuracy of 83.81%. RAPID is also highly scalable and can be easily adapted to different medical reporting guidelines without transfer learning on a large dataset.
Meaning
RAPID may save time and improve working efficiency for different user groups, such as medical authors, researchers, editors, and reviewers.