On the Unreliability of Test-Retest Reliability


Abstract

The Test-Retest Coefficient (TRC) is a central metric of reliability in Classical Test Theory and modern psychological assessments. Originally developed by early 20th-century psychometricians, it relies on the assumptions of fixed (i.e., perfectly stable) true scores and independent error scores. However, these assumptions are rarely, if ever, tested, despite the fact that their violation can introduce significant biases. This article explores the foundations of these assumptions and examines the performance of the TRC under varying conditions, including different sample sizes, true score stability, and error score dependence. Using simulated data, results show that decreasing true score stability biases TRC estimates, leading to underestimations of reliability. Additionally, error score dependence can inflate TRC values, making unreliable measures appear reliable. More fundamentally, when these assumptions are violated, the TRC becomes underidentified, meaning that multiple, substantively different data-generating processes can yield the same coefficient, thus undermining its interpretability. These findings call into question the TRC's suitability for applied settings, especially when traits fluctuate over time or measurement conditions are uncontrolled. Alternative approaches are briefly discussed.
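The abstract's simulation design can be illustrated with a minimal sketch (not the authors' code). It assumes a simple data-generating process: an autoregressive true-score model in which `stability` is the correlation between true scores across occasions, bivariate normal errors whose cross-occasion correlation is `error_corr`, and a TRC computed as the Pearson correlation of the two observed scores. The function name and parameters are hypothetical.

```python
import numpy as np

def simulate_trc(n=500, reliability=0.8, stability=1.0, error_corr=0.0, seed=0):
    """Simulate test-retest data under a classical test theory model and
    return the Test-Retest Coefficient (Pearson correlation of observed scores).

    reliability : proportion of observed-score variance due to true scores
    stability   : correlation between true scores at time 1 and time 2
    error_corr  : correlation between error scores across occasions
    """
    rng = np.random.default_rng(seed)
    var_t = reliability          # true-score variance (total variance fixed at 1)
    var_e = 1.0 - reliability    # error variance

    # True scores: T2 is an autoregressive function of T1 with the given stability
    t1 = rng.normal(0.0, np.sqrt(var_t), n)
    t2 = stability * t1 + np.sqrt(var_t * (1 - stability**2)) * rng.normal(size=n)

    # Error scores: correlated across occasions when error_corr > 0
    cov_e = var_e * np.array([[1.0, error_corr], [error_corr, 1.0]])
    e1, e2 = rng.multivariate_normal([0.0, 0.0], cov_e, size=n).T

    # Observed scores and their test-retest correlation
    x1, x2 = t1 + e1, t2 + e2
    return np.corrcoef(x1, x2)[0, 1]

# Reliability is held fixed, yet the TRC shifts with stability and error dependence:
print(simulate_trc(stability=1.0, error_corr=0.0))  # ~0.80 when both assumptions hold
print(simulate_trc(stability=0.7, error_corr=0.0))  # lower: unstable true scores deflate the TRC
print(simulate_trc(stability=1.0, error_corr=0.5))  # higher: correlated errors inflate the TRC
```

Under this toy model the expected TRC is `stability * reliability + error_corr * (1 - reliability)`, which makes the underidentification point concrete: different combinations of stability and error dependence can produce the same coefficient.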
