Replicate me if you can: Assessing measurement reliability of individual differences in reading across measurement occasions and methods

Patrick Haller
Cui Ding
Maja Stegenwallner-Schütz
David Robert Reich
Iva Koncic
Silvia Makowski
Lena A. Jäger


Abstract

Psycholinguistic theories traditionally assume similar cognitive mechanisms across different speakers. More recently, however, researchers have begun to recognize the need to consider individual differences when explaining human cognition, and a growing number of studies have investigated how individual differences influence human sentence processing. These studies implicitly assume that individual-level effects replicate across experimental sessions and across assessment methods such as eye-tracking and self-paced reading. This assumption is challenged by the reliability paradox (Hedge et al., 2018). A crucial first step for a principled investigation of individual differences in sentence processing is therefore to establish their measurement reliability, that is, the correlation of individual-level effects across multiple measurement occasions and methods. In this work, we present the first naturalistic eye-movement corpus of reading data with four experimental sessions from each participant (two eye-tracking sessions and two self-paced reading sessions). We deploy a two-task Bayesian hierarchical model to assess the measurement reliability of individual differences in a range of psycholinguistic phenomena that are well established at the population level, namely the effects of word length, lexical frequency, surprisal, dependency length, and the number of to-be-integrated dependents. Our results indicate high reliability across measurement occasions for the word length effect, but only moderate reliability for higher-level psycholinguistic predictors such as lexical frequency, dependency length, and the number of to-be-integrated dependents, and low reliability for surprisal. Moreover, even after accounting for spillover effects, we observe only low to moderate reliability at the individual level across methods (eye-tracking and self-paced reading) for most predictors, and poor reliability for predictors of syntactic integration.
These findings underscore the importance of establishing measurement reliability before drawing inferences about individual differences in sentence processing.
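The abstract defines measurement reliability as the correlation of individual-level effects across measurement occasions. As a rough illustration of that definition only (this is not the authors' method), the sketch below simulates a hypothetical word-length effect for each participant, estimates each participant's slope separately in two sessions by ordinary least squares, and correlates the two sets of estimates. The authors instead fit a two-task Bayesian hierarchical model, which estimates the cross-session correlation directly; a two-stage estimate like this one is attenuated by trial-level noise, which the simulation makes visible. Every number here (a 200 ms intercept, a mean effect of 20 ms per character, word lengths of 2 to 12 characters, the noise levels) and every function name is an invented assumption.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (one predictor plus intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def simulate_session(true_slopes, n_words, noise_sd, rng):
    """Per-participant word-length slopes estimated from one simulated session.

    Hypothetical generative model: reading time (ms) = 200 + slope * length + noise.
    """
    estimates = []
    for slope in true_slopes:
        lengths = [rng.randint(2, 12) for _ in range(n_words)]
        times = [200 + slope * n + rng.gauss(0, noise_sd) for n in lengths]
        estimates.append(ols_slope(lengths, times))
    return estimates


def session_reliability(n_participants=50, n_words=200, slope_mean=20.0,
                        slope_sd=5.0, noise_sd=60.0, seed=1):
    """Two-stage test-retest reliability of a simulated word-length effect."""
    rng = random.Random(seed)
    # Each participant's true effect is identical in both sessions.
    true_slopes = [rng.gauss(slope_mean, slope_sd) for _ in range(n_participants)]
    session1 = simulate_session(true_slopes, n_words, noise_sd, rng)
    session2 = simulate_session(true_slopes, n_words, noise_sd, rng)
    # Correlate per-participant estimates across the two measurement occasions.
    return pearson(session1, session2)
```

Calling `session_reliability()` with the defaults yields a high correlation, while `session_reliability(slope_sd=1.0)` keeps the population-level effect at 20 ms but yields a much lower one: the same estimation noise now swamps a smaller between-participant spread. This mirrors the reliability paradox the abstract cites, where a robust group-level effect need not support reliable individual differences.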

Version published to 10.31234/osf.io/muv4q_v2 on OSF Preprints, Aug 31, 2025
Version published to 10.31234/osf.io/muv4q on OSF Preprints, Sep 5, 2023

