Assessments of Credibility in the Social and Behavioral Sciences

Abstract

Credibility assessment, determining whether research findings are trustworthy or believable, is essential to the research process. One aspect of credibility is repeatability, which includes assessing whether consistent results are obtained when using new data to answer the same question (replicability), when repeating the original analyses with the original data (reproducibility), or when conducting alternative analyses about the same question with the original data (robustness). These features of repeatability differ in the resources required to investigate them, and it is unknown how they relate to one another and to other features of credibility. We investigated relationships among credibility measures in a stratified random sample of claims made across the social and behavioral sciences. Measures of repeatability were modestly correlated with each other (r’s = 0.30, -0.04, -0.23), though the correlation between robustness and reproducibility is likely misestimated because selecting claims for robustness testing was partly contingent on reproducibility success. Replicability and human and machine predictions of replicability were modestly correlated (Median r = 0.23; Range = -0.10 to 0.47). Though estimated with substantial uncertainty in some cases, no discipline showed consistently higher repeatability than other disciplines across measures. For example, Education had the highest replicability estimate (0.63, 95% CI [.32 - .86]) and the lowest reproducibility estimate (0.25, 95% CI [.06 - .38]), whereas Economics had the lowest replicability estimate (0.43, 95% CI [.28 - .64]) and nearly the highest reproducibility estimate (0.70, 95% CI [.56 - .85]). Repeatability measures were modestly and heterogeneously associated with other potential indicators of credibility (Median |r| = 0.08; Range = 0.01 to 0.55). Credibility assessment is multidimensional, with substantial opportunity for innovation and validation of its measurement.
