Scaling Reproducibility: An AI-Assisted Workflow for Large-Scale Reanalysis

Yiqing Xu
Leo Yang

Abstract

Reproducibility is central to research credibility, yet large-scale reanalysis of empirical data remains costly because replication packages vary widely in structure, software environment, and documentation. We develop and evaluate an agentic AI workflow that addresses this execution bottleneck while preserving scientific rigor. The system separates scientific reasoning from computational execution: researchers design fixed diagnostic templates, and the workflow automates the acquisition, harmonization, and execution of replication materials using pre-specified, version-controlled code. A structured knowledge layer records resolved failure patterns, enabling adaptation across heterogeneous studies while keeping each pipeline version transparent and stable. We evaluate this workflow on 92 instrumental variable (IV) studies, including 67 with manually verified reproducible two-stage least squares (2SLS) estimates and 25 newly published IV studies under identical criteria. For each paper, we analyze up to three 2SLS specifications, 215 in total. Across the 92 papers, the system achieves 87% end-to-end success overall. Conditional on accessible data and code, reproducibility is 100% at both the paper and specification levels. The framework substantially lowers the cost of executing established empirical protocols and can be adapted to empirical settings where analytic templates and norms of transparency are well established.
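The abstract does not show the diagnostic templates themselves. As a rough illustration of the kind of 2SLS specification such a template would re-execute, the following is a minimal sketch on simulated data. The helper name `two_stage_least_squares`, the data-generating process, and all coefficient values are assumptions made for illustration and are not taken from the paper or its replication materials.

```python
import numpy as np


def two_stage_least_squares(y, x_endog, z, controls=None):
    """Hypothetical helper: point estimates from a basic 2SLS regression.

    y        : outcome, shape (n,)
    x_endog  : endogenous regressor, shape (n,)
    z        : excluded instrument(s), shape (n,) or (n, k)
    controls : optional exogenous controls, shape (n, p)
    Returns coefficients ordered as [intercept, controls..., x_endog].
    """
    n = y.shape[0]
    const = np.ones((n, 1))
    exog = const if controls is None else np.column_stack([const, controls])

    # Instrument matrix: exogenous regressors plus excluded instruments.
    Z = np.column_stack([exog, z])
    # Regressor matrix: exogenous regressors plus the endogenous regressor.
    X = np.column_stack([exog, x_endog])

    # First stage: project every column of X onto the instrument space.
    first_stage_coef, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ first_stage_coef

    # Second stage: regress the outcome on the fitted regressors.
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta


# Simulated data (assumed for illustration): u confounds x and y,
# z shifts x but affects y only through x. The true effect of x is 2.0.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * u + rng.normal(size=n)

beta_iv = two_stage_least_squares(y, x, z)

# Naive OLS for comparison; it absorbs the confounding through u.
X_ols = np.column_stack([np.ones(n), x])
beta_ols, *_ = np.linalg.lstsq(X_ols, y, rcond=None)

print("2SLS estimate of the effect of x:", beta_iv[-1])
print("OLS estimate of the effect of x: ", beta_ols[-1])
```

In this simulated setting the 2SLS estimate should land near the true coefficient while OLS is biased upward. The sketch returns point estimates only; standard errors and first-stage diagnostics are omitted.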

Version published to 10.31235/osf.io/ru5fa_v1 on OSF Preprints
Feb 18, 2026

Related articles

LLM-Assisted Replication as Scientific Infrastructure

This article has 6 authors:
1. So Kubota
2. Hiromu Yakura
3. Sho Yamada
4. Yuki Nakamura
5. Tobias Werner
6. Samuel Coavoux
This article has no evaluations. Latest version: Mar 13, 2026

Why Risk it, When You Can {rix} it: A Tutorial for Computational Reproducibility Focused on Simulation Studies

This article has 3 authors:
1. Felipe Fontana Vieira
2. Jason Geller
3. Bruno Rodrigues
This article has no evaluations. Latest version: Jan 28, 2026