Innovations in Machine Assessment of Replicability

Abstract

Automated methods for assessing the replicability of scientific claims offer a scalable complement to replication studies and traditional peer review. Drawing on a large dataset of claims, human judgments, and a limited set of replication outcomes, we developed and evaluated three distinct artificial intelligence systems designed to predict human expert assessments of replicability. The systems employed diverse methodologies, including synthetic prediction markets, interpretable feature-based modeling, knowledge graph reasoning, and semantic parsing with argument structures. Although these systems achieved modest calibration to the distributions of human judgments, they failed to discriminate between replicable and non-replicable claims. Our findings suggest that machine assessments of research replicability may complement human reasoning, but their current performance limitations and potential for bias demand careful evaluation before real-world deployment.
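As a minimal sketch of the distinction the abstract draws, the snippet below shows how a predictor can be calibrated in aggregate (its mean prediction matches the observed replication base rate) while having no discriminative power (AUROC near chance). The 50% base rate, the synthetic data, and all variable names are assumptions for illustration; none of the paper's systems, datasets, or evaluation protocols are reproduced here.

```python
# Illustration on assumed data: calibration without discrimination.
# Predictions hover around the base rate but carry no claim-level signal.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_claims = 1000

# Hypothetical replication outcomes: 1 = claim replicated (50% base rate assumed).
outcomes = rng.binomial(1, 0.5, size=n_claims)

# Predictions drawn around the base rate, independent of the outcomes.
predictions = np.clip(rng.normal(0.5, 0.1, size=n_claims), 0.0, 1.0)

print(f"base rate:       {outcomes.mean():.3f}")      # ~0.5
print(f"mean prediction: {predictions.mean():.3f}")   # ~0.5 -> calibrated in aggregate
print(f"AUROC:           {roc_auc_score(outcomes, predictions):.3f}")  # ~0.5 -> chance-level discrimination
```

Under these assumptions the predictor looks well behaved on calibration summaries yet is useless for ranking individual claims, which is the failure mode the abstract reports.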
