Predicting Replicability Challenge: Rounds 1 and 2

Abstract

Assessing the credibility of research claims is a central and continuous part of the scientific process, but current assessment strategies often require substantial time and effort. If automated assessment methods could deliver comparable performance at a fraction of the cost, they would enable researchers, funders, and policymakers to direct attention toward high-confidence claims and improve strategic allocation of resources toward investigating claims that are important but uncertain. To date, these methods are promising but unproven. The Center for Open Science launched the Predicting Replicability Challenge in 2025 as a public competition to advance the goal of automated assessment of research claims. Across three rounds, participating teams are provided access to a training set of replication outcomes and are tasked with scoring a held-out set of published claims based on their likelihood of being successfully replicated in a new sample of data. Participating teams demonstrated improved predictive performance across the first two rounds of the Challenge. No models in the first round outperformed the baseline Brier score of .25 (the score obtained by assigning every claim a 50% chance of replicating), while nearly all did so in the second round, with similar improvements found for ROC-AUC (Round 1 median = .53, Round 2 median = .66) and accuracy (Round 1 median = .51, Round 2 median = .63). Gains in predictive performance have been driven by improved calibration (predictions matching observed replication rates more closely) and discrimination (greater separation between the predictions for replicated and non-replicated claims). Improvements in the second round have brought the top models in line with or ahead of prior efforts to predict these claims, depending on the comparison set.
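
For context on the metrics named above, the following is a minimal Python sketch of how Brier score, ROC-AUC, and accuracy are computed for probabilistic replication predictions. The outcome and prediction values are invented for illustration, and scikit-learn is assumed; this is not the Challenge's actual scoring pipeline.

```python
# Illustrative only: invented outcomes and predictions, not Challenge data.
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss, roc_auc_score

# 1 = claim replicated in the new sample, 0 = claim did not replicate.
outcomes = np.array([1, 0, 1, 1, 0, 0, 1, 0])
# A model's predicted probability that each claim replicates.
predictions = np.array([0.80, 0.30, 0.55, 0.70, 0.60, 0.20, 0.90, 0.45])

# Brier score: mean squared error of the probabilities (lower is better).
# Predicting 0.5 for every claim yields 0.25 regardless of the outcomes,
# which is the baseline referenced in the abstract.
brier = brier_score_loss(outcomes, predictions)

# ROC-AUC: the probability that a randomly chosen replicated claim receives
# a higher prediction than a non-replicated one (discrimination).
auc = roc_auc_score(outcomes, predictions)

# Accuracy: fraction of correct classifications at a 0.5 threshold.
acc = accuracy_score(outcomes, (predictions >= 0.5).astype(int))

print(f"Brier = {brier:.3f}, ROC-AUC = {auc:.3f}, accuracy = {acc:.3f}")
```

Under this framing, calibration improvements show up as a lower Brier score (predicted probabilities tracking observed replication rates), while discrimination improvements show up as a higher ROC-AUC (replicated claims consistently receiving higher scores than non-replicated ones).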
