Replicate me if you can: Assessing measurement reliability of individual differences in reading across measurement occasions and methods

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Psycholinguistic theories traditionally assume similar cognitive mechanisms across different speakers. However, more recently, researchers have begun to recognize the need to consider individual differences when explaining human cognition. An increasing number of studies have investigated how individual differences influence human sentence processing. Implicitly, these studies assume that individual-level effects can be replicated across experimental sessions and different assessment methods such as eye-tracking and self-paced reading. However, this assumption is challenged by the reliability paradox (Hedge, 2018). Thus, a crucial first step for a principled investigation of individual differences in sentence processing is to establish their measurement reliability, that is, the correlation of individual-level effects across multiple measurement occasions and methods. In this work, we present the first naturalistic eye movement corpus of reading data with four experimental sessions from each participant (two eye-tracking sessions and two self-paced reading sessions). We deploy a two-task Bayesian hierarchical model to assess the measurement reliability of individual differences in a range of psycholinguistic phenomena that are well-established at the population level, namely effects of word length, lexical frequency, surprisal, dependency length and number of to-be-integrated dependents. While our results indicate high reliability across measurement occasions for the word length effect, it is only moderate for higher-level psycholinguistic predictors such as lexical frequency, dependency distance, and the number of to-be-integrated dependencies, and even low for surprisal. Moreover, even after accounting for spillover effects, we observe only low to moderate reliability at the individual level across methods (eye-tracking and self-paced reading) for most predictors, and poor reliability for predictors of syntactic integration. These findings underscore the importance of establishing measurement reliability before drawing inferences about individual differences in sentence processing.

Article activity feed