A retinotopic reference frame for space throughout human visual cortex
Curation statements for this article:
Curated by eLife
eLife Assessment
This is a useful study, bolstering our understanding of spatial reference frames of visual perception. The high-resolution data and sophisticated analyses confirm and enhance earlier findings that visual representations operate in a predominantly retinotopic reference frame throughout the visual hierarchy in the human cortex. However, these analyses are currently incomplete, leaving open the possibility that eye-position gain and/or spatiotopic representations may also be present.
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (eLife)
Abstract
We perceive the world as stable despite our rapid eye movements. To explain our sense of visual stability, it has been suggested that the brain encodes the location of attended visual stimuli in an external, or spatiotopic, reference frame. However, such spatiotopy is seemingly at odds with the fundamental retinotopic organization of visual inputs. Here, we probe the spatial reference frame of vision using ultra-high-field (7T) fMRI and voxel-level receptive field modeling, while manipulating both gaze direction and spatial attention. To manipulate spatial attention, participants performed an equally demanding visual task on either a bar stimulus that traversed the visual field, or a small stimulus at fixation. To dissociate retinal stimulus position from its real-world position, the entire stimulus array was placed at one of three distinct horizontal screen positions in each run. We found that population receptive fields in all cortical visual field maps are pinioned to the retina, irrespective of how spatial attention is deployed. This pattern of results is strong evidence for a fully retinotopic reference frame for visual-spatial processing. Reasoning that a spatiotopic reference frame could independently be computed at the population level of entire visual areas rather than in individual voxels, we additionally used Bayesian decoding of stimulus location from the BOLD response patterns in visual areas. We found that decoded stimulus locations also adhere to the retinotopic frame of reference. Again, this result holds for all visual areas and irrespective of the deployment of spatial attention. Our findings reorient the search for visual stability mechanisms toward transient sensorimotor interactions rather than static spatiotopic maps.
Article activity feed
-
Reviewer #1 (Public review):
In this study, Szinte et al. measured the spatial selectivity of fMRI BOLD responses while subjects viewed dynamic noise stimuli vignetted by a moving bar aperture. Subjects viewed these moving bar stimuli as they fixated at one of three screen locations. This design enabled the authors to test whether fMRI responses are better explained by a model in which stimulus location is encoded relative to the retina or relative to the screen (in other words, 'retinotopic' vs. 'spatiotopic' encoding). In retinotopic encoding, the pRFs should move with the eyes. In spatiotopic encoding, the pRFs should be locked to particular screen locations, regardless of eye position. The results are unambiguous: the retinotopic model wins.
A number of prior human fMRI studies have addressed this issue, and there is an overwhelming consensus in the field that spatial encoding throughout human visual cortex (and high-level cortex) is retinotopic (during fixation). All of the results shown in the present manuscript are consistent with these earlier observations. Szinte et al. also find that the degree of retinotopic selectivity is not affected by the task or locus of spatial attention. This too has been observed in multiple prior studies.
So, while this manuscript is primarily confirmatory, the study does nonetheless provide valuable measurements at 7T, with a higher signal-to-noise ratio and higher spatial resolution than previous studies. The authors also apply an innovative Bayesian decoding analysis (which is beautifully documented on their webpage, with a step-by-step tutorial and ample examples). So, a major strength of this paper is its methods; the study sets a high standard and is an ideal example of a rigorous, replicable analysis pipeline and cutting-edge statistical inference.
The results focus on the spatial profile of pRFs with different eye positions. However, the main idea behind eye-position gain fields is that the amplitude of the visual responses changes with eye position. I could not find any analysis testing response amplitude as a function of eye position. In the Discussion, the authors assert: "We did not find an influence of gaze position at the level of individual voxels nor at the level of visual areas." The authors speculate that this might be because gain fields have a salt-and-pepper organization in the cortex that cancels out when pooled across a voxel. While the salt-and-pepper explanation seems like perfectly fine speculation, here they are discussing a result that isn't shown in the Results!
Several prior human fMRI studies have reported eye position gain fields in humans, suggesting that the salt-and-pepper explanation is not correct. Rather, it is likely the case that the authors did not test a sufficiently wide range of eye positions to detect a gain modulation. For example, a study from Merriam et al. (J. Neurosci, 2013), which is mysteriously not cited here, measured both the spatial selectivity of visual receptive fields AND the response amplitude at 8 different eye positions that were spaced by as much as 24 degrees of visual angle (including both vertical and horizontal changes in eye position). Under these conditions, Merriam et al. did find reliable modulation in response amplitude with changes in eye position, even though the spatial selectivity of the responses did not change. Importantly, Merriam et al. found that visual response selectivity was consistent with a retinotopic reference frame (not a spatiotopic reference frame) and that this selectivity was invariant to the attention task. Consideration of these issues suggests that the experimental design used in the current experiment may have precluded the detection of eye position gain fields. The current manuscript would be much improved by a careful consideration of this prior literature, which is so closely related to what the authors report here.
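The gain-field account raised here is easy to state concretely. Below is a minimal, hypothetical sketch (function names and parameter values are illustrative, not taken from either study): spatial tuning stays retinal, while response amplitude is scaled by a linear function of eye position.

```python
import numpy as np

def gaussian_prf_response(stim_x, prf_center, prf_size):
    """Response of a 1-D Gaussian pRF to a stimulus at stim_x (retinal coords)."""
    return np.exp(-0.5 * ((stim_x - prf_center) / prf_size) ** 2)

def response_with_gain_field(stim_x, eye_x, prf_center, prf_size, gain_slope):
    """Retinotopic pRF whose amplitude is scaled by eye position.

    The spatial tuning stays retinal; only the multiplicative gain
    (1 + gain_slope * eye_x) depends on where the eyes point.
    """
    gain = 1.0 + gain_slope * eye_x
    return gain * gaussian_prf_response(stim_x, prf_center, prf_size)

# Same retinal stimulus at three eye positions: tuning is unchanged,
# but response amplitude scales with gaze, as a gain-field analysis would test.
for eye_x in (-12.0, 0.0, 12.0):
    r = response_with_gain_field(stim_x=0.0, eye_x=eye_x,
                                 prf_center=0.0, prf_size=2.0, gain_slope=0.02)
    print(eye_x, round(float(r), 3))
```

Detecting the gain term requires eye positions far enough apart for `gain_slope * eye_x` to exceed measurement noise, which is the reviewer's point about the limited range of fixation positions used here.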
-
Reviewer #2 (Public review):
Summary:
This manuscript describes a study using fMRI voxel-wise receptive field modeling and Bayesian decoding to assess the reference frame (spatiotopic vs retinotopic) of visual information. Participants viewed sequences of visual stimuli that moved across different screen locations. Across different conditions, participants either fixated at the screen center and viewed stimuli drifting across the full screen (full-screen condition), or fixated at a central, left, or right fixation position while stimuli drifted across a 4-deg aperture centered on that fixation (gaze-center, gaze-left, gaze-right conditions). Within each of those conditions, participants either attended to visual changes around fixation (attend-fix) or in the stimulus bar (attend-bar). First, standard population receptive field mapping was conducted on the full-screen conditions to obtain fiducial maps for each subject. Then, a variety of different analyses were performed, testing retinotopic vs spatiotopic predictions for the gaze-left and gaze-right conditions. Across the extensive set of analyses performed, and across all ROIs tested, the results always best matched the retinotopic predictions. This was the case for both attend-fix and attend-bar conditions. The authors conclude that visual representations operate in a retinotopic reference frame throughout the visual hierarchy, necessitating a "re-orienting" of the search for visual stability mechanisms.
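The retinotopic-vs-spatiotopic comparison described above reduces to two simple predictions for where a voxel's pRF should sit when fixation moves. A hedged sketch (illustrative function, not the authors' pipeline):

```python
def predicted_center(center_at_gaze_center, gaze_shift, model):
    """Predicted pRF center in screen coordinates after fixation moves
    by gaze_shift degrees of visual angle.

    retinotopic: the pRF is yoked to the eyes, so it shifts with gaze;
    spatiotopic: the pRF is locked to the screen, so it stays put.
    """
    if model == "retinotopic":
        return center_at_gaze_center + gaze_shift
    if model == "spatiotopic":
        return center_at_gaze_center
    raise ValueError(f"unknown model: {model}")

# A pRF 3 deg right of fixation in the gaze-center condition:
for model in ("retinotopic", "spatiotopic"):
    print(model,
          predicted_center(3.0, -4.0, model),   # gaze-left run
          predicted_center(3.0, +4.0, model))   # gaze-right run
```

Each model's predicted responses can then be scored against the measured BOLD time series of the gaze-left and gaze-right runs, which is the logic of the out-of-sample prediction analysis.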
Strengths:
The analyses are sophisticated and thorough, and the results are convincingly in favor of retinotopic representations. The attention manipulation is carefully done. And the finding that the most informative/reliable voxels are the most retinotopic is an important novel contribution.
Weaknesses:
(1) The theoretical advance of this work is unclear, because the finding that visual representations operate in a retinotopic reference frame throughout the visual hierarchy, and regardless of the deployment of spatial attention, has already been demonstrated with fMRI pattern analysis almost 15 years ago (Golomb & Kanwisher, 2012). To be clear, the techniques used in this current study are considerably more modern and sophisticated, and the attention manipulation is much better, but the finding is the same. More importantly, it is never really explained why, from a theoretical perspective, the results might have been expected to differ. Referring to this as an open question feels like a cop-out. The manuscript needs to engage more with the prior findings and explain the motivation for the current study. Was there something about the prior findings that caused them to doubt the retinotopic conclusion? Did they think that the 7T resolution or alternative decoding approaches might uncover something different? Was this intended as a replication test with more sophisticated techniques?
(2) I think there are definitely some new and useful things this study has to offer, but the overall theoretical contribution needs to be better clarified and contextualized within the prior literature. I would strongly recommend revisiting things like the title (not a novel contribution of this study) and the implication that the current findings "reframe" or "reorient" the search for visual stability mechanisms away from static spatiotopic maps (the field has arguably been "reoriented" in that way for some time now, and this study is certainly not the first to suggest a reframing along these lines). The discussion section, in particular, has little to no acknowledgement that these findings and ideas have been shown before.
(3) The analyses always pit retinotopic vs spatiotopic predictions. But what if both types co-existed, just with retinotopic more predominant? I think this general idea needs some discussion, if not additional analyses. Would the analyses be sensitive enough to pick up sparse spatiotopic coding if present?
Additional questions/critiques/suggestions:
(4) For the out-of-sample predictions analysis (Figure 2):
a) The spatiotopic predictions are much worse for earlier visual regions, but don't seem so different from gaze-center or retinotopic in later areas. How much might this be driven by the fact that pRF size increases along the hierarchy, and for large pRF sizes, the retinotopic and spatiotopic predictions might not be very differentiable? Is there a way to quantify this or include a control model that is neither retinotopic nor spatiotopic?
b) It looks like in some of the regions, the retinotopic (and maybe even spatiotopic) R² change compared to gaze-center is reliably positive. Why would this be? Is there a reason the fit should be better for the gaze-right or gaze-left conditions than for gaze-center?
(5) For the fitting retinotopic and spatiotopic pRF models (Figure 3) and other voxel-specific analyses:
a) For many of the statistics, results are averaged across voxels. This makes sense. But it also seems to me that taking a simple average might obscure some of the potential advantages of this voxel-wise approach. For example, what if there are sparse spatiotopic effects that are washed out by the averaging? Perhaps some way of looking at the statistical distribution of voxels' RFIs could be worth considering?
b) Are there some spatiotopic areas in the searchlight maps? It looks like there may be some blue clusters, but these cortical map figures are really hard to resolve.
(6) For the RFI as a function of model overlap and explained variance (Figure 4):
a) I like this analysis; I find it convincing and novel. Could it be further quantified by correlating reliability (e.g., explained variance) with RFI on a voxelwise basis?
b) I'm intrigued by the seemingly reliable blueish (spatiotopic) cells at the bottom of the V1-V3 grids. These seem to suggest that for the voxels with less spatial relevance (overlap), there might be something spatiotopic, even for relatively informative voxels (high explained variance)?
c) On a related note, is the "spatial relevance" measure the same as, or correlated with, eccentricity? It sounds like voxels with high spatial relevance (overlap with the central 4-deg aperture) are the more foveal voxels. Intuitively, foveal voxels might be expected to be more retinotopic, right? In addition to clarifying this measure, it'd be nice to see a similar plot with eccentricity on the y-axis.
(7) For the Bayesian decoding (Figure 5):
a) A benefit of the Bayesian decoding (e.g., over the earlier studies using non-Bayesian decoding of retinotopic vs spatiotopic) is the uncertainty estimates. I think these analyses are interesting and should be in the main text figures, not a supplement.
b) Instead of line plots showing the decoded (best) position using the posterior distribution STD as the error shading, could you show the actual posterior distribution as heat maps (like the cartoon in B)? Is it possible there could be a second peak (or clear absence of one) at the spatiotopic prediction location?
(8) Also note that Golomb & Kanwisher also calculated the RFI measure for similar ROIs for both of their attention conditions. It may be worth comparing.
(9) Methods:
a) Is it true that 2 of the authors were actually naïve as to the purpose of the study? Regardless, given the small number of subjects and high ratio of authors as subjects, it might be nice to confirm that the results are not driven by the author-participants.
b) I think 44ms TR is a typo?
c) Why was the order of the bar movement directions always the same? Wouldn't this make the stimuli very predictable for the subjects, which could be potentially problematic?
d) I'm also curious why the gaze conditions were all presented in separate runs, as opposed to different blocks within a run.
e) The eccentricity maps for the fiducial maps (Figure 1G) seem a bit strange to me. Shouldn't the foveal representation be centered at the occipital pole, not the lateral surface?
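For reference, the RFI mentioned in points (5a) and (8) can be written, in one plausible formulation (exact definitions vary across studies; this sketch is not the authors' code), as a normalized contrast between the two models' goodness of fit:

```python
def reference_frame_index(fit_retinotopic, fit_spatiotopic):
    """Normalized contrast between model fits for one voxel:
    +1 = fully retinotopic, -1 = fully spatiotopic, 0 = no preference.

    fit_retinotopic / fit_spatiotopic are goodness-of-fit values
    (e.g., correlations with the data) for each model's prediction.
    """
    denom = abs(fit_retinotopic) + abs(fit_spatiotopic)
    if denom == 0:
        return 0.0
    return (fit_retinotopic - fit_spatiotopic) / denom

print(reference_frame_index(0.8, 0.1))   # positive: retinotopic preference
print(reference_frame_index(0.1, 0.8))   # negative: spatiotopic preference
```

Inspecting the full distribution of per-voxel RFI values, rather than its mean, is what would reveal the sparse spatiotopic subpopulations raised in points (3) and (5a).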