AutoReporter: Development of an artificial intelligence tool for automated assessment of research reporting guideline adherence

Abstract

Objective

To develop AutoReporter, a large language model (LLM) system that automates evaluation of adherence to research reporting guidelines.

Materials and Methods

Eight prompt-engineering and retrieval strategies coupled with reasoning and general-purpose LLMs were benchmarked on the SPIRIT-CONSORT-TM corpus. The top-performing approach, AutoReporter, was validated on BenchReport, a novel benchmark dataset of expert-rated reporting guideline assessments from 10 systematic reviews.
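
As a minimal illustration (not necessarily the authors' exact implementation), a zero-shot, no-retrieval adherence check of the kind benchmarked here can be issued through the OpenAI chat completions API in a few lines of Python; the prompt wording, checklist item, and file name below are illustrative assumptions.

    # Hypothetical sketch of a zero-shot, no-retrieval guideline check.
    # Prompt wording, checklist item, and file name are assumptions,
    # not the authors' exact implementation.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    checklist_item = "CONSORT 1a: Identification as a randomised trial in the title."
    manuscript_text = open("manuscript.txt", encoding="utf-8").read()

    response = client.chat.completions.create(
        model="o3-mini",
        messages=[
            {"role": "system",
             "content": "You assess whether a manuscript adheres to one reporting "
                        "guideline item. Answer 'reported' or 'not reported' and "
                        "quote the supporting passage if present."},
            {"role": "user",
             "content": f"Item: {checklist_item}\n\nManuscript:\n{manuscript_text}"},
        ],
    )
    print(response.choices[0].message.content)

Assessing a full guideline then amounts to repeating this call for each checklist item and aggregating the verdicts.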

Results

AutoReporter, a zero-shot, no-retrieval prompt coupled with the o3-mini reasoning LLM, achieved the best accuracy (CONSORT: 90.09%; SPIRIT: 92.07%), run time (CONSORT: 617.26 seconds; SPIRIT: 544.51 seconds), and cost (CONSORT: 0.68 USD; SPIRIT: 0.65 USD) of the benchmarked strategies. On the BenchReport benchmark, AutoReporter achieved a mean accuracy of 91.8% and substantial agreement (Cohen's κ > 0.6) with expert ratings.
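
For context, Cohen's κ corrects the observed agreement between two raters for the agreement expected by chance:

    \kappa = \frac{p_o - p_e}{1 - p_e}

where p_o is the observed proportion of agreement and p_e the proportion expected by chance. With hypothetical values p_o = 0.9 and p_e = 0.7, κ = (0.9 − 0.7)/(1 − 0.7) ≈ 0.67, which falls in the 0.61–0.80 band conventionally interpreted as substantial agreement.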

Discussion

Structured prompting alone can match or exceed fine-tuned domain-specific models while requiring neither manually annotated corpora nor computationally intensive training.

Conclusion

LLMs can feasibly automate reporting guideline adherence assessments for scalable quality control in scientific research reporting. AutoReporter is publicly accessible at https://autoreporter.streamlit.app.
