Comparing a computational model of visual problem solving with human vision on a difficult vision task
Abstract
Human vision is not merely a passive process of interpreting sensory input but incorporates generative mechanisms that infer and synthesize plausible interpretations of ambiguous or noisy data. This synergy between the generative and discriminative components, often described as analysis-by-synthesis, enables robust perception and rapid adaptation to out-of-distribution inputs. By leveraging top-down feedback, human vision excels in constructing meaningful interpretations even in challenging scenarios. In this work, we investigate a computational implementation of the analysis-by-synthesis paradigm using search in a generative model applied to an underspecified image dataset inspired by star constellations. The search is guided by low-level cues based on the structural fitness of solutions to the test images. This dataset serves as a testbed for exploring how inferred signals can guide the synthesis of suitable solutions in ambiguous conditions. Drawing on insights from human experiments, we develop a generative search algorithm and compare its performance to humans, examining factors such as accuracy, reaction time, and overlap in drawings. Our results shed light on possible mechanisms of human visual inference and highlight the potential of generative search models to emulate aspects of this process.
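The search described above can be illustrated with a minimal sketch: candidate drawings are proposed as edge sets over the star points, scored by a structural-fitness cue, and the best-scoring candidate is retained. All names, the toy fitness function, and the hill-climbing proposal scheme here are hypothetical illustrations, not the paper's actual algorithm.

```python
import random

def structural_fitness(edges, points):
    """Toy structural cue: reward covering many points, penalize long edges.
    (Hypothetical stand-in for the low-level cues used in the study.)"""
    if not edges:
        return 0.0
    covered = {i for e in edges for i in e}
    total_len = sum(
        ((points[a][0] - points[b][0]) ** 2 +
         (points[a][1] - points[b][1]) ** 2) ** 0.5
        for a, b in edges)
    return len(covered) - 0.1 * total_len

def generative_search(points, n_iters=500, seed=0):
    """Stochastic search: propose toggling one edge at a time,
    keep the candidate drawing whenever its fitness improves."""
    rng = random.Random(seed)
    best = frozenset()
    best_score = structural_fitness(best, points)
    for _ in range(n_iters):
        a, b = rng.sample(range(len(points)), 2)
        edge = (min(a, b), max(a, b))
        cand = best ^ {edge}  # add the edge if absent, remove it if present
        score = structural_fitness(cand, points)
        if score > best_score:
            best, best_score = frozenset(cand), score
    return best, best_score
```

In this sketch the generative component is the proposal of candidate drawings, and the inferred low-level signal (the fitness score) guides which syntheses are kept, mirroring the analysis-by-synthesis loop at a cartoon level.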
Author summary
Human vision is not just about passively receiving information from the environment. Rather, it also involves actively making sense of what we see. When faced with unclear or incomplete visual input, our brains use prior knowledge to fill in gaps and create the most likely interpretation. This ability helps us recognize objects and patterns even in difficult conditions. In this study, we explore how this process can be replicated using computer models. Specifically, we test a method that generates possible interpretations of underspecified visual data, inspired by the way people recognize star constellations. By comparing the model’s performance with that of human participants, we examine how well it mirrors human perception. We analyze factors such as accuracy, response time, and similarities in the interpretations produced.
Our findings offer insights into how people make sense of uncertain visual information and suggest ways in which computer models can be designed to mimic this ability. These results not only contribute to the understanding of human vision but may also help advance artificial vision systems beyond technologies that rely on pattern recognition alone.