Comparing a computational model of visual problem solving with human vision on a difficult vision task

Abstract

Human vision is not merely a passive process of interpreting sensory input; it can also function as a problem-solving process that incorporates generative mechanisms to interpret ambiguous or noisy data. This synergy between generative and discriminative components, often described as analysis-by-synthesis, enables robust perception and rapid adaptation to out-of-distribution inputs. In this work, we investigate a computational implementation of the analysis-by-synthesis paradigm using genetic search in a generative model, applied to a visual problem-solving task inspired by star constellations. The search is guided by low-level cues based on the structural fitness of candidate solutions relative to the test images. The constellation dataset serves as a testbed for exploring how inferred signals can guide the synthesis of suitable solutions under ambiguous conditions, framing visual inference as an instance of complex problem solving. Drawing on insights from human experiments, we develop a generative search algorithm and compare its performance to that of humans, examining factors such as accuracy, reaction time, and overlap in drawings. Our results shed light on possible mechanisms of human visual problem solving and highlight the potential of generative search models to emulate aspects of this process.
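For concreteness, the sketch below illustrates the general shape of a fitness-guided genetic search of the kind the abstract describes. It is a minimal, illustrative sketch only: the candidate representation (sets of 2D points standing in for constellation drawings), the fitness function (point-to-point distance rather than a structural comparison against rendered test images), and all names and parameters are assumptions for illustration, not the authors' actual implementation.

```python
import random

POINTS = 8          # number of vertices per candidate "constellation" drawing
POP_SIZE = 50       # candidates per generation
GENERATIONS = 100
MUTATION_STD = 0.05 # scale of positional jitter applied during mutation

def random_candidate():
    """A candidate solution: a list of 2D points in the unit square."""
    return [(random.random(), random.random()) for _ in range(POINTS)]

def fitness(candidate, target):
    """Toy structural fitness: negative summed distance between corresponding
    points of the candidate and the target configuration (higher is better).
    The paper's fitness compares candidate drawings to test images; this
    point-matching stand-in is an assumption for illustration."""
    return -sum(((cx - tx) ** 2 + (cy - ty) ** 2) ** 0.5
                for (cx, cy), (tx, ty) in zip(candidate, target))

def mutate(candidate):
    """Jitter each point with small Gaussian noise."""
    return [(x + random.gauss(0, MUTATION_STD), y + random.gauss(0, MUTATION_STD))
            for x, y in candidate]

def crossover(a, b):
    """Uniform crossover: each point is inherited from one parent at random."""
    return [pa if random.random() < 0.5 else pb for pa, pb in zip(a, b)]

def genetic_search(target):
    """Evolve a population toward the target, keeping an elite each generation."""
    population = [random_candidate() for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        population.sort(key=lambda c: fitness(c, target), reverse=True)
        elite = population[: POP_SIZE // 5]  # keep the fittest 20%
        offspring = [mutate(crossover(random.choice(elite), random.choice(elite)))
                     for _ in range(POP_SIZE - len(elite))]
        population = elite + offspring
    return max(population, key=lambda c: fitness(c, target))

if __name__ == "__main__":
    target = [(random.random(), random.random()) for _ in range(POINTS)]
    best = genetic_search(target)
    print("best fitness:", fitness(best, target))
```

The key design choice this sketch highlights is that the search never needs gradients: candidates are generated, scored against the stimulus by a fitness signal, and recombined, which is what lets an analysis-by-synthesis loop cope with ambiguous or out-of-distribution inputs.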

Author summary

Human vision is not just about passively receiving information from the environment; it also involves actively making sense of what we see. When faced with unclear or incomplete visual input, our brains use prior knowledge to fill in gaps and construct the most likely interpretation. This ability helps us recognize objects and patterns even under difficult conditions. In this study, we explore how this process can be replicated with computer models. Specifically, we test a method that generates possible interpretations of ambiguous visual data, inspired by the way people recognize star constellations. By comparing the model's performance with that of human participants, we examine how well it mirrors human perception. We analyze factors such as accuracy, response time, and similarities in the interpretations produced.

Our findings offer insights into how people make sense of uncertain visual information and suggest ways in which computer models can be designed to mimic this ability. These results can contribute to our understanding of human perception as well as help advance artificial vision systems beyond simple pattern recognition.
