Capturing learning on the fly: an eye-tracking method to quantify prediction errors and updating the prior

Curation statements for this article:
  • Curated by eLife

    eLife Assessment

    This study presents a valuable framework that uses anticipatory eye movements to track how expectations are formed and revised during implicit probabilistic sequence learning. The evidence supporting a behavioural dissociation between errors arising from environmental noise and errors reflecting an inaccurate internal model is solid, but the oculomotor data describe behaviour rather than explain the underlying computational mechanisms, and the stronger mechanistic claims - that learning is more repetition-based than error-driven - remain incomplete without formal comparison against computational models of error-driven learning. The emerging reaction-time difference between conditions appears driven by slowing to low-probability stimuli rather than facilitation of high-probability ones, an asymmetry that requires decomposition and consideration of alternative explanations. The potential contamination of the anticipatory measure by starting gaze position should be addressed through control analyses, and the "process-pure" framing should be tempered, given that oculomotor behaviour is itself subject to motor learning.

Abstract

The ability to build predictive models of the environment fundamentally drives adaptive behavior. Yet, the real-time dynamics of how these internal models are formed and updated remain poorly understood. Conventional methods often rely on indirect, offline measures or noisy motor responses, limiting insight into the fine-grained computational processes underlying learning. Here, we introduce a generalizable, gaze-based analytical framework that directly tracks the trial-by-trial dynamics of expectation formation and updating. Applying this framework to an unsupervised probabilistic learning task, we categorized anticipatory saccades to dissociate prediction errors arising from environmental stochasticity from those reflecting an inaccurate internal model, and quantified how these predictions were iteratively revised. Learners differentiated between these error types: noise-driven errors were more frequent and triggered fewer updates than errors reflecting insufficient knowledge of the regularity. At the same time, participants exhibited a strong preference to repeat their previous predictions. This repetition bias was amplified when predictions aligned with the underlying regularity, but was also present for non-aligned responses. Critically, updating depended more strongly on whether a prior belief was consistent with the task’s probabilistic structure than on whether the predicted stimulus matched the stimulus actually presented. These findings suggest that statistical learning may not be strongly driven by errors; rather, it may rely on conservative updating with a relatively low learning rate, or on a Hebbian, repetition-based process. Our framework thus offers a dual contribution: a broadly applicable tool for quantifying real-time expectations, and evidence for a learning strategy that prioritizes model stability in noisy environments.

Article activity feed

  2. Reviewer #1 (Public review):

    Summary:

    This manuscript presents an original quantitative approach for tracking the online formation and updating of prior beliefs. In an Alternating Serial Reaction Time task, participants were exposed to probabilistic visual streams, and their pre-stimulus saccadic behavior (i.e., the first eye movement after the previous stimulus disappeared) was monitored via eye-tracking. Since the stimuli followed an alternating probabilistic sequence, upcoming events did not appear with full certainty: some stimuli had a higher, some a lower probability. By comparing anticipatory oculomotor behavior between high and low probability events, the authors dissociated between learning/belief updating and general oculomotor noise. Noise-driven errors were more frequent than learning-dependent errors, which nonetheless triggered more belief updating (i.e., a change in oculomotor behavior in a subsequent encounter of the same event). Interestingly, updating depended more strongly on whether a prior belief was consistent with the task's probabilistic structure than on prediction errors. These findings suggest that incidental, implicit statistical learning may rely on conservative updating with a relatively low learning rate, or on errorless algorithms, rather than prediction errors per se.

    Strengths:

    By applying a fine-grained analysis of anticipatory oculomotor behavior, this work establishes new continuous metrics to quantify the gradual learning and refinement of prior expectations during statistical learning. These metrics provide convincing evidence of the dynamics of anticipatory oculomotor behavior.

    The method is paradigm-independent, offering generalizable metrics for tracking the dynamic formation and refinement of predictive models in any task involving probabilistic stimulus streams. In the future, computational modeling may leverage these continuous metrics to better dissect the mechanisms underlying statistical learning.

    Weaknesses:

    The authors subscribe to the idea that statistical learning is not a unified concept but rather is implemented via multiple underlying mechanisms. However, it remains unspecified what these different mechanisms could be, and how eye movements could contribute to distinguishing between them.

    The authors claim that they developed a novel methodological approach to probe whether anticipatory eye movements directly reflect priors, thereby filling an outstanding gap. However, this claim ignores mounting relevant work on structure learning using eye-tracking in the developmental field.

    The authors claim that their framework quantifies trial-by-trial oculomotor dynamics, while in fact the analyses use epochs (i.e., groups of multiple trials) as predictors. Why not use trial number as a predictor to truly investigate trial-by-trial dynamics that directly reflect anticipation, surprisal, and revision?

  3. Reviewer #2 (Public review):

    Summary:

    Hann and colleagues introduce a gaze-based analytical framework designed to capture, on a trial-by-trial basis, how people form and revise their predictions during implicit probabilistic sequence learning. Using an eye-tracking adaptation of an alternating sequence task, they record the first anticipatory saccade during the response-stimulus interval and classify each such saccade along two dimensions: whether it was directed toward a high- or low-probability upcoming stimulus (the learning-dependent vs. not-learning-dependent distinction), and whether the anticipated location coincided with the stimulus that actually appeared. A complementary iterative-updating metric codes whether a participant's prediction for a given three-element context is repeated or revised on successive encounters of that context.
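    The two-dimensional classification and the iterative-updating code summarized above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the integer location codes, and the representation of a context as an arbitrary hashable key are all assumptions made for the example.

```python
def classify_saccade(anticipated, high_prob_location, actual):
    """Label one anticipatory saccade on the two dimensions described above.

    anticipated        -- location the first saccade was directed to (hypothetical code)
    high_prob_location -- location of the high-probability continuation
    actual             -- location where the stimulus actually appeared
    """
    learning = ("learning-dependent" if anticipated == high_prob_location
                else "not-learning-dependent")
    outcome = "correct" if anticipated == actual else "error"
    return f"{learning} {outcome}"


def code_updates(trials):
    """Code each prediction as a repeat or an update of the previous one
    made in the same context.

    trials -- iterable of (context, anticipated) pairs in presentation order.
    Returns a list with None for the first encounter of a context,
    otherwise "repeat" or "update".
    """
    last_prediction = {}
    codes = []
    for context, anticipated in trials:
        if context not in last_prediction:
            codes.append(None)  # nothing to compare against yet
        elif last_prediction[context] == anticipated:
            codes.append("repeat")
        else:
            codes.append("update")
        last_prediction[context] = anticipated
    return codes
```

    Note that successive encounters of the same context need not be adjacent trials, which is why the sketch keeps the last prediction per context rather than comparing neighbouring trials.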

    On the basis of these measures, the authors report that errors congruent with the inferred regularity - which they interpret as reflecting environmental noise - become progressively more frequent than errors reflecting an inaccurate internal model; that participants show a pronounced tendency to repeat their previous prediction rather than revise it; and that updates depend more on whether a prior belief is congruent with the task's statistical structure than on whether the previous prediction was confirmed. They interpret these results as evidence that statistical learning is less error-driven and more repetition-based (Hebbian in character) than is typically assumed.

    Strengths:

    The methodological ambition of the work is considerable, and the paper makes several contributions that are likely to be useful to the implicit-learning and predictive-processing communities. Using the first anticipatory saccade as a pre-response behavioral readout of prediction is conceptually well-motivated: it provides a trial-by-trial index of predictive orienting at a temporal resolution that manual reaction times cannot deliver, and it does so before the outcome of the trial is known. The explicit distinction between errors arising because the task's outcome is stochastic - that is, predictions congruent with the statistical structure but unconfirmed by the stochastic sample - and errors arising because the internal model is inaccurate is a theoretically meaningful move: predictive-coding and Bayesian accounts have long argued that these two sources of surprise should carry different weight for model revision, and the authors offer a behavioral operationalization of that distinction. The analytical pipeline is not tied to the specific paradigm used here and could be applied to other probabilistic sequence-learning tasks, which gives it broader methodological utility than a single-paradigm report. Finally, the demonstration that learners maintain their prior across successive occurrences of the same context, even when it has been disconfirmed by the most recent outcome, is a robust behavioral observation that speaks directly to an unresolved debate about whether statistical learning is dominantly error-driven.

    Weaknesses:

    The framework and the core behavioral observations are valuable, but several inferential steps - from the gaze signal to the cognitive constructs the authors invoke - are not fully supported by the present design, and these gaps affect how readers should interpret the stronger theoretical conclusions.

    The "process-pure" framing conflates sensitivity with construct purity. The authors repeatedly describe the eye-tracking measure as providing a more process-pure index of statistical learning than manual-response paradigms. Anticipatory saccades are themselves a learned motor behavior - the oculomotor system is among the most plastic motor outputs the primate brain generates, and sequence learning in the saccadic system is well-documented. The present design does not dissociate learning of the statistical structure from learning of the oculomotor sequence that expresses it, so the measure is not, on its face, free from the motor-learning confound that the authors criticize in button-press paradigms. The framing should be read as aspirational rather than as demonstrated by the present data.

    The oculomotor reaction-time data do not show the canonical signature of statistical learning. Reaction times for low-probability trials rise across epochs while those for high-probability trials remain approximately flat (Figure 5). The emerging difference between the two trial types, therefore, appears to be driven by a slowing of responses to low-probability stimuli rather than by a facilitation of responses to high-probability ones, and the authors do not rule out the alternative interpretations that this pattern reflects fatigue, a motor floor effect, or inhibition of unexpected locations. Because no fixation constraint is imposed during the response-stimulus interval, pre-stimulus gaze drift toward the anticipated location will artifactually reduce reaction time on precisely those trials the authors wish to treat as learning-driven; the fact that measured reaction times remain well above zero even on trials classified as correct anticipations is itself evidence that this contamination is present. The oculomotor reaction-time data, therefore, do not provide as clean a verification of learning as the manuscript implies.

    The correct/error labeling of anticipatory saccades incorporates information that the participant did not have. Because the first saccade occurs during the response-stimulus interval - that is, before the upcoming stimulus is revealed - the participant's internal predictive state is identical whether the trial is subsequently classified as a learning-dependent correct response or a learning-dependent error. Any difference in the epochwise frequency of these two categories must therefore be driven, at least in part, by the external stochastic structure of the task rather than by a difference in the predictive process itself. In particular, the observation that learning-dependent errors are the most frequent saccade type (Figure 7) is predicted by the prior probabilities of the outcomes alone, given a high-probability prediction, without appeal to any difference in predictive state. Readers should recognize that the theoretically meaningful contrast is between learning-dependent and not-learning-dependent anticipations (two categories), and that the four-way split risks confounding predictive state with outcome stochasticity.
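    The point about outcome stochasticity can be made concrete with a small sketch. It assumes, purely for illustration, that the predictive state is summarized by a single probability of making a learning-dependent anticipation, and that the chance of that anticipation being confirmed is set by the task alone. All parameter names and values are hypothetical and are not taken from the manuscript.

```python
import math


def expected_four_way_split(p_ld, p_hit_given_ld, p_hit_given_nld):
    """Expected proportions of the four saccade categories.

    p_ld            -- probability of a learning-dependent anticipation
                       (the participant's predictive state)
    p_hit_given_ld  -- probability the task confirms such an anticipation
    p_hit_given_nld -- probability the task confirms a not-learning-dependent one
    """
    return {
        "learning-dependent correct": p_ld * p_hit_given_ld,
        "learning-dependent error": p_ld * (1 - p_hit_given_ld),
        "not-learning-dependent correct": (1 - p_ld) * p_hit_given_nld,
        "not-learning-dependent error": (1 - p_ld) * (1 - p_hit_given_nld),
    }


def collapse_to_two_way(split):
    """Sum over the outcome dimension, leaving only the predictive state."""
    return {
        "learning-dependent": split["learning-dependent correct"]
        + split["learning-dependent error"],
        "not-learning-dependent": split["not-learning-dependent correct"]
        + split["not-learning-dependent error"],
    }


# Same predictive state, two different (hypothetical) outcome structures.
split_a = expected_four_way_split(p_ld=0.6, p_hit_given_ld=0.5, p_hit_given_nld=0.2)
split_b = expected_four_way_split(p_ld=0.6, p_hit_given_ld=0.3, p_hit_given_nld=0.1)
```

    In this sketch the four-way proportions differ between `split_a` and `split_b` even though the predictive state is identical, whereas the two-way collapse is the same in both cases. That is the sense in which the two-category contrast isolates the predictive process.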

    The iterative-updating metric does not distinguish prior revision from alternative processes. The binary update / no-update code, computed across non-contiguous occurrences of the same three-element context, does not discriminate between a genuine update of the internal model, simple episodic retrieval of a previously encountered triplet, and oculomotor perseveration. Without a formal generative model to anchor the interpretation, the central theoretical claim - that statistical learning is less error-driven than commonly assumed - is underdetermined by the data. The repetition pattern the authors observe is equally consistent with an error-driven model equipped with a low learning rate in a stable environment, an interpretation the authors themselves acknowledge in the Discussion. Adjudicating between these possibilities requires comparison against explicit computational models, which the present manuscript does not provide.
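    To illustrate why repetition after a disconfirmed prediction does not by itself rule out error-driven learning, the following sketch applies a standard delta-rule update to the expectations held for one context. It is an assumption-laden toy model, not a model fitted or proposed in the manuscript: the starting values, the learning rates, and the rule that the predicted location is the one with the highest value are all chosen for illustration.

```python
def delta_rule_update(values, outcome, learning_rate):
    """Move each location's expectation toward the observed outcome.

    values        -- current expectation for each location in one context
    outcome       -- index of the location where the stimulus appeared
    learning_rate -- step size of the error-driven update
    """
    return [
        v + learning_rate * ((1.0 if i == outcome else 0.0) - v)
        for i, v in enumerate(values)
    ]


def predicted_location(values):
    """Assumed read-out: anticipate the location with the highest expectation."""
    return max(range(len(values)), key=values.__getitem__)


prior = [0.7, 0.1, 0.1, 0.1]   # hypothetical expectations; location 0 is anticipated
disconfirming_outcome = 1       # stimulus appears elsewhere

slow = delta_rule_update(prior, disconfirming_outcome, learning_rate=0.1)
fast = delta_rule_update(prior, disconfirming_outcome, learning_rate=0.6)
```

    With the low learning rate, the error changes the expectations but not the anticipated location, so the binary code would register a repeat. With the high learning rate, the same error flips the anticipated location and would register an update. A binary repeat/update code therefore cannot on its own separate a slow error-driven learner from a repetition-based one.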

    Data loss and the absence of fixation control. An interpretable saccade is detected on fewer than half of all trials (48.76%; line 889), and the manuscript does not report the distribution of saccade counts per interval, the per-condition trial counts after all exclusions, or the decomposition of the 20% missing-data threshold into its underlying causes. Given that the entire inferential apparatus rests on this subset of trials, the degree of data loss is a relevant context for the reader. Separately, no fixation constraint is imposed between trials: the participant's starting gaze position at the onset of each response-stimulus interval is whatever position was reached at the end of the preceding response, and this starting position carries trial-history information correlated with the upcoming stimulus. This leaves open the possibility that what is classified as predictive orienting partly reflects the mechanical consequences of where the eye happened to be at the end of the previous trial. The authors defend the absence of a fixation cross on the grounds that it would transform the transitional structure of the task, but this is an empirical claim presented without a supporting citation.

    Heterogeneity within the high-probability condition is not addressed. The two routes to a high-probability triplet in the design - pattern-random-pattern (50% of trials) and random-pattern-random (12.5%) - differ both in their base rate and in the reliability of the contextual cue they provide. Collapsing across these subtypes is an analytical choice that may conceal heterogeneity in the underlying learning process.
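    The base rates quoted above imply a strongly unbalanced mix inside the collapsed high-probability condition. The short calculation below works this out from the two percentages given in the review; the variable names are illustrative only.

```python
from fractions import Fraction

# Share of all trials for each route to a high-probability triplet,
# as stated in the review.
pattern_random_pattern = Fraction(1, 2)   # 50% of trials
random_pattern_random = Fraction(1, 8)    # 12.5% of trials

high_probability_total = pattern_random_pattern + random_pattern_random

# Composition of the collapsed high-probability condition.
share_pattern_ending = pattern_random_pattern / high_probability_total
share_random_ending = random_pattern_random / high_probability_total

print(high_probability_total)  # 5/8
print(share_pattern_ending)    # 4/5
print(share_random_ending)     # 1/5
```

    Four in five high-probability trials therefore come from the pattern-random-pattern route, so any collapsed estimate is dominated by that subtype.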

    Appraisal: Do the results support the authors' conclusions?

    The framework succeeds in providing a trial-by-trial behavioral readout of predictive orienting that is more fine-grained than conventional reaction-time measures, and the behavioral dissociation between errors congruent with the regularity and errors reflecting an inaccurate internal model is a genuine empirical contribution. The conclusions about the mechanistic nature of statistical learning should be read as motivating hypotheses for future modeling work rather than as settled empirical claims.

    Impact and utility:

    The analytical framework introduced here is likely to be useful to researchers working on implicit learning, predictive processing, and Bayesian models of perception and cognition. The measure of predictive orienting and the iterative-updating code could be adapted to a range of probabilistic learning paradigms, and the behavioral dissociation between noise-driven and model-mismatch errors fills a methodological gap that the field has long acknowledged. The authors share their data and code openly, which will facilitate reuse. The most durable contribution of the paper is methodological; the theoretical claims about the nature of statistical learning will require additional computational modeling before they can be regarded as established.

  4. Author response:

    We thank the Reviewers for their time and effort in reviewing our manuscript. We are particularly thankful for the literature recommendations of Reviewer 1 and the analysis ideas of Reviewer 2.

    We are glad that both Reviewers agree that the method we developed provides value to the field. We furthermore agree that our theoretical claims and conclusions could be supported by further analyses. Thus, we primarily plan to focus on this.

    We plan to strengthen our statements by:

    - Comparing our metrics to those of alternative learning processes and hypotheses

    - Running additional analyses, including ones using standardized learning scores, collapsed saccade likelihoods for learning-dependent and not-learning-dependent saccades, angular deviations instead of the binary update variable, and a breakdown of high-probability triplets into ones that end with a pattern element or a random one.

    - Adding further information regarding saccades, trials without saccades, and saccade starting points.

    Furthermore, we plan to strengthen our Methods section: some of the Reviewers’ points potentially stem from our unclear description of the ASRT task, so the Task & Procedure section needs deeper and clearer explanations. Lastly, we will extend the Introduction, citing the literature recommended in the reviews, which could indeed provide further depth.