AutoReporter: Development of an artificial intelligence tool for automated assessment of research reporting guideline adherence
Abstract
Objective
To develop AutoReporter, a large-language-model system that automates evaluation of adherence to research reporting guidelines.
Materials and Methods
Eight prompt-engineering and retrieval strategies coupled with reasoning and general-purpose LLMs were benchmarked on the SPIRIT-CONSORT-TM corpus. The top-performing approach, AutoReporter, was validated on BenchReport, a novel benchmark dataset of expert-rated reporting guideline assessments from 10 systematic reviews.
Results
AutoReporter, a zero-shot, no-retrieval prompt coupled with the o3-mini reasoning LLM, achieved the best accuracy (CONSORT: 90.09%; SPIRIT: 92.07%), runtime (CONSORT: 617.26 seconds; SPIRIT: 544.51 seconds), and cost (CONSORT: 0.68 USD; SPIRIT: 0.65 USD). On the BenchReport benchmark, AutoReporter achieved a mean accuracy of 91.8% and substantial agreement (Cohen's κ > 0.6) with expert ratings.
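For readers unfamiliar with the agreement statistic reported here, the following is a minimal sketch of how Cohen's κ compares two raters' labels. The adherence labels and item counts below are hypothetical illustrations, not data from the study:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed agreement: fraction of items on which the raters agree.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement by chance, from each rater's marginal frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical adherence labels for six reporting-checklist items.
model  = ["met", "met", "not met", "met",     "not met", "met"]
expert = ["met", "met", "not met", "not met", "not met", "met"]
print(round(cohens_kappa(model, expert), 3))  # → 0.667
```

A κ above 0.6, as in this toy example, is conventionally interpreted as substantial agreement (Landis and Koch scale), which is the threshold the abstract refers to.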
Discussion
Structured prompting alone can match or exceed fine-tuned domain models while forgoing manually annotated corpora and computationally intensive training.
Conclusion
LLMs can feasibly automate reporting guideline adherence assessments for scalable quality control in scientific research reporting. AutoReporter is publicly accessible at https://autoreporter.streamlit.app.