An image reconstruction framework for characterizing initial visual encoding

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This rigorous computational study simulates the sampling of the visual image by cone photoreceptors in the human eye, and explains how the image content can be reconstructed from those cone signals. The authors show that a number of properties of the human retina and of human perception are predicted from these simulations. The manuscript could be further improved by analysis of how these conclusions compare to those reached by alternate theoretical approaches, and by a consideration of human eye movements.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

Abstract

We developed an image-computable observer model of the initial visual encoding that operates on natural image input, based on the framework of Bayesian image reconstruction from the excitations of the retinal cone mosaic. Our model extends previous work on ideal observer analysis and evaluation of performance beyond psychophysical discrimination, takes into account the statistical regularities of the visual environment, and provides a unifying framework for answering a wide range of questions regarding the visual front end. Using the error in the reconstructions as a metric, we analyzed variation in the relative numbers of the different photoreceptor types in the human retina as an optimal design problem. In addition, the reconstructions allow both visualization and quantification of the information loss due to physiological optics and cone mosaic sampling, and of how this loss varies with eccentricity. Furthermore, in simulations of color deficiencies and interferometric experiments, we found that the reconstructed images provide a reasonable proxy for modeling subjects’ percepts. Lastly, we used the reconstruction-based observer to analyze psychophysical thresholds, and found notable interactions between spatial frequency and chromatic direction in the resulting spatial contrast sensitivity function. Our method is widely applicable to experiments and applications in which the initial visual encoding plays an important role.
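
The forward half of this framework, from an image to noisy cone excitations, can be sketched in a few lines. The sketch below is a toy one-dimensional stand-in and not ISETBio: it assumes a Gaussian point-spread function for the optics, cones that each sample one pixel of the blurred image, and Poisson isomerization noise. The names `blur_rows`, `render`, `poisson_sample` and `sample_excitations` are hypothetical.

```python
import math
import random


def blur_rows(n_pixels, sigma):
    """Toy optics (assumed): each row is a normalized Gaussian point-spread function."""
    rows = []
    for i in range(n_pixels):
        weights = [math.exp(-0.5 * ((j - i) / sigma) ** 2) for j in range(n_pixels)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def render(image, psf_rows, cone_positions, gain):
    """Mean cone excitations: blur the image, then sample it at each cone position."""
    retinal_image = [sum(w * x for w, x in zip(row, image)) for row in psf_rows]
    return [gain * retinal_image[p] for p in cone_positions]


def poisson_sample(mean, rng):
    """Knuth's multiplication method; adequate for the modest means used here."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_excitations(image, psf_rows, cone_positions, gain, seed=0):
    """One noisy draw of cone excitations for the given image."""
    rng = random.Random(seed)
    means = render(image, psf_rows, cone_positions, gain)
    return [poisson_sample(m, rng) for m in means]
```

For example, `sample_excitations([0.2, 0.2, 0.8, 0.8, 0.2, 0.2, 0.8, 0.8], blur_rows(8, 1.0), [0, 2, 4, 6], 50.0)` returns four Poisson counts, one per (assumed) cone. A reconstruction step would then invert this mapping using a prior over images.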

Article activity feed

  1. Joint Public Review:

    The authors combine their existing software (ISETBio) for simulating the responses of the human cone mosaic to arbitrary spatiochromatic stimuli with a Bayesian model of image reconstruction, in order to explore hypotheses concerning the efficiency and the perceptual consequences of early visual processing. The simulated cone signals include the effects of human optics, the spectral sensitivities of the cones, a hexagonal cone mosaic, the relative numbers of L, M and S cones, and the Poisson noise of photopigment isomerization. They do not include any noise of biological origin or any synaptic processing beyond the photoreceptors. The image-reconstruction model computes Bayes-optimal estimates of ground-truth natural images given a root-mean-squared-error (RMSE) cost function and an estimated prior for the spatio-chromatic structure of natural images. The assumption is that the average reconstruction error for natural images is a good proxy for the average information contained in the cone responses under natural conditions like those that drove the evolution of the human retina.
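
    Why a Bayes-optimal estimate under squared error is a natural choice follows from a standard result: for squared-error loss, the Bayes estimate is the posterior mean. The sketch below checks this numerically in the simplest case, assuming a scalar Gaussian prior and additive Gaussian noise (a common approximation to Poisson noise at high counts). It is a textbook illustration, not the authors' estimator, which works with a natural-image prior over whole images; `posterior_mean` and `compare_rmse` are hypothetical names.

```python
import random


def posterior_mean(y, prior_mean, prior_var, noise_var):
    """Posterior mean of x given y = x + noise, with Gaussian prior and Gaussian noise."""
    weight = prior_var / (prior_var + noise_var)  # shrinkage toward the prior mean
    return prior_mean + weight * (y - prior_mean)


def compare_rmse(n_trials=20000, prior_mean=10.0, prior_var=4.0, noise_var=9.0, seed=1):
    """RMSE of the raw measurement versus the posterior-mean estimate."""
    rng = random.Random(seed)
    se_raw = se_bayes = 0.0
    for _ in range(n_trials):
        x = rng.gauss(prior_mean, prior_var ** 0.5)  # ground truth drawn from the prior
        y = x + rng.gauss(0.0, noise_var ** 0.5)     # noisy measurement of x
        se_raw += (y - x) ** 2
        se_bayes += (posterior_mean(y, prior_mean, prior_var, noise_var) - x) ** 2
    return (se_raw / n_trials) ** 0.5, (se_bayes / n_trials) ** 0.5
```

    With these settings the expected RMSE falls from 3.0 for the raw measurement to sqrt(4 * 9 / 13), about 1.66, for the posterior mean, because the estimate borrows strength from the prior.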

    The authors' simulations address several interesting questions. The first is whether there is a principled explanation for why the human cone mosaic contains mostly L and M cones and relatively few S cones, and why the small proportion of S cones is fairly consistent across individuals whereas the relative number of L and M cones is highly variable. They vary the relative numbers of cones in their simulations and find that an S-cone proportion of about 10% is optimal for image reconstruction, and that reconstruction quality is largely invariant to the relative numbers of L and M cones. It is not clear whether the accommodation state of the lens was varied, which would test whether the overall minimum RMS reconstruction error occurs when the eye is focused at a wavelength near the peak of the L- and M-cone spectral sensitivities (where human accommodation tends to be centered).
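
    The manipulated variable in this analysis is the composition of the mosaic. A minimal sketch of generating mosaics with a chosen S-cone fraction and L:M ratio follows. It assumes cone types are assigned to positions at random, which ignores the semi-regular spacing of S cones and their absence from the central fovea, and it omits the reconstruction-error evaluation itself; `make_mosaic` and `sweep` are hypothetical names.

```python
import random
from collections import Counter


def make_mosaic(n_cones, s_fraction, l_to_m_ratio, seed=0):
    """Assign L, M and S labels to n_cones positions with the requested proportions."""
    n_s = round(n_cones * s_fraction)
    n_lm = n_cones - n_s
    n_l = round(n_lm * l_to_m_ratio / (1.0 + l_to_m_ratio))
    n_m = n_lm - n_l
    labels = ["L"] * n_l + ["M"] * n_m + ["S"] * n_s
    random.Random(seed).shuffle(labels)  # assumed: random spatial arrangement
    return labels


def sweep(n_cones=1000, s_fractions=(0.0, 0.05, 0.10, 0.20), ratios=(0.5, 1.0, 2.0, 4.0)):
    """Cone counts for each composition; a real study would score each mosaic by RMSE."""
    return {
        (s, r): Counter(make_mosaic(n_cones, s, r))
        for s in s_fractions
        for r in ratios
    }
```

    Scoring each mosaic in the grid by average reconstruction error over a set of natural images would give the kind of error surface over S fraction and L:M ratio that the review describes.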

    The second question concerns color appearance in color-deficient individuals. The authors created reconstructions of natural images for protanopia (no L cones), deuteranopia (no M cones), tritanopia (no S cones), and deuteranomaly (highly overlapping L and M cone spectral sensitivities). They found reconstructions qualitatively similar to other attempts to estimate color appearance in these individuals. While this is an important result, it is not clear whether there is any way to determine which estimates of color appearance are more accurate. It is also not clear which alternative approaches would fail this qualitative comparison.
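
    One simple way to simulate a dichromatic mosaic, assumed here rather than taken from the paper, is to relabel every cone of the missing class as one of the remaining classes so that the total number of cones stays fixed. The hypothetical helper `make_dichromat` below does this; modeling deuteranomaly would instead require shifting a cone class's spectral sensitivity, which this sketch does not cover.

```python
from collections import Counter

# Cone class absent in each form of dichromacy.
MISSING_CLASS = {"protanopia": "L", "deuteranopia": "M", "tritanopia": "S"}


def make_dichromat(labels, condition, replacement):
    """Relabel every cone of the missing class as `replacement` (an assumption)."""
    missing = MISSING_CLASS[condition]
    if replacement == missing:
        raise ValueError("replacement must differ from the missing class")
    return [replacement if label == missing else label for label in labels]
```

    For example, `make_dichromat(["L", "M", "L", "S"], "protanopia", "M")` returns `["M", "M", "M", "S"]`.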

    The third question concerns the effect of cone sampling on image appearance as a function of retinal eccentricity, and when the optics of the eye are bypassed so that high-contrast, high-frequency grating (striped) patterns can be formed on the retina. Simulations of reconstructed natural images as a function of retinal eccentricity show complex effects because the optics and the cone mosaic change simultaneously with eccentricity. For example, reconstructed colors can be more desaturated at intermediate eccentricities than in either the fovea or the far periphery. These are interesting predictions that should be testable in perception experiments. Simulations of reconstructed images of gratings when the optics are bypassed are qualitatively consistent with previous reports of grating appearance in the fovea and periphery.
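
    How cone sampling limits resolvable detail can be quantified with the Nyquist limit of the mosaic. For a hexagonal lattice with center-to-center spacing s (in degrees), rows are sqrt(3)/2 * s apart, so the highest frequency sampled without aliasing is 1 / (sqrt(3) * s) cycles per degree. The sketch below evaluates this for a few assumed spacings and, as a one-dimensional simplification, computes the frequency to which a finer interference grating would alias. The spacing values and the names `hex_nyquist_cpd` and `alias_cpd` are illustrative assumptions, not data from the paper.

```python
import math


def hex_nyquist_cpd(spacing_deg):
    """Nyquist limit (cycles/deg) of a hexagonal mosaic with the given cone spacing."""
    # Rows are sqrt(3)/2 * spacing apart; Nyquist = 1 / (2 * row spacing).
    return 1.0 / (math.sqrt(3.0) * spacing_deg)


def alias_cpd(grating_cpd, nyquist_cpd):
    """1-D approximation: frequency a grating folds to when sampled at 2 * Nyquist."""
    sampling_rate = 2.0 * nyquist_cpd
    return abs(grating_cpd - round(grating_cpd / sampling_rate) * sampling_rate)


# Assumed center-to-center spacings in arcmin, coarsening with eccentricity.
for spacing_arcmin in (0.5, 1.0, 2.0):
    f_n = hex_nyquist_cpd(spacing_arcmin / 60.0)
    print(f"{spacing_arcmin} arcmin: Nyquist {f_n:.1f} c/deg, "
          f"100 c/deg grating aliases to {alias_cpd(100.0, f_n):.1f} c/deg")
```

    Because the folded frequency drops as spacing grows, the same interference grating is predicted to look coarser in the periphery, which is the kind of effect the reconstructions visualize; the real two-dimensional mosaic also changes the apparent orientation, which this 1-D sketch ignores.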

    The last question concerns the effect of the reconstruction computation on predicted detection performance for sinewave grating stimuli, summarized by the contrast sensitivity function (CSF). The authors find that for diffraction-limited optics, the CSFs of the ideal observer for yellow (L+M) and red-green (L-M) gratings of equal cone contrast are approximately the same. However, for the image-reconstruction-based observer that applies a matched template to the reconstructed image, the CSFs for L+M and L-M stimuli differ markedly, in a way that is qualitatively (but not quantitatively) consistent with human CSFs. This is a striking difference, but it is not obvious whether this result would hold for the more realistic optics assumed in most of the other simulations. As the authors show, the real optics matter because they strongly affect the reconstruction.
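
    The template-matching step of the reconstruction-based observer can be sketched as follows: a template equal to the expected difference between stimulus and blank responses is dotted with each noisy response, d' is estimated from the two resulting distributions, and d' is converted to two-alternative forced-choice (2AFC) percent correct. The sketch assumes independent Gaussian noise of fixed standard deviation, which is a simplification, since noise in reconstructed images need not be independent across pixels. The names `template_dprime`, `pc_2afc` and `grating` are hypothetical.

```python
import math
import random


def template_dprime(stim_mean, blank_mean, noise_sd, n_trials=2000, seed=0):
    """Monte Carlo d' for a matched template applied to noisy response vectors."""
    rng = random.Random(seed)
    # Matched template: expected response difference between stimulus and blank.
    template = [s - b for s, b in zip(stim_mean, blank_mean)]

    def decision_variable(mean):
        noisy = [m + rng.gauss(0.0, noise_sd) for m in mean]  # assumed i.i.d. Gaussian noise
        return sum(t * r for t, r in zip(template, noisy))

    stim = [decision_variable(stim_mean) for _ in range(n_trials)]
    blank = [decision_variable(blank_mean) for _ in range(n_trials)]
    mu_s, mu_b = sum(stim) / n_trials, sum(blank) / n_trials
    var_s = sum((v - mu_s) ** 2 for v in stim) / (n_trials - 1)
    var_b = sum((v - mu_b) ** 2 for v in blank) / (n_trials - 1)
    return (mu_s - mu_b) / math.sqrt(0.5 * (var_s + var_b))


def pc_2afc(dprime):
    """2AFC percent correct: Phi(d' / sqrt(2)) = 0.5 * (1 + erf(d' / 2))."""
    return 0.5 * (1.0 + math.erf(dprime / 2.0))


def grating(contrast, n=16, cycles=2):
    """A toy 1-D sinusoidal response profile with the given contrast."""
    return [contrast * math.sin(2.0 * math.pi * cycles * k / n) for k in range(n)]
```

    For white noise the analytic value is d' = |template| / noise_sd; with `grating(0.5)` and unit noise that is 0.5 * sqrt(8), about 1.41, which the simulation approximates. An ideal observer would instead apply the likelihood ratio directly to the cone excitations.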

    The proposed reconstruction-error cost function is appealing because it can be applied to a wide range of questions and can generate predictions for both performance and subjective appearance. One potential weakness of reconstruction error as a cost function is that not all of the information captured by the error measure may be relevant to the tasks that humans (and other primates) needed to perform to survive and reproduce during the period when the properties of the retina evolved. A useful direction for future work would be to compare reconstruction-based and task-based cost functions that both take into account the statistics of natural images. Other recommendations include a more realistic model of eye movements during human vision and the inclusion of neural processing beyond the cone photoreceptors.