A joint probabilistic model of human scene and object recognition via non-hierarchical residual computation
Abstract
Visual object recognition and scene recognition have each been studied extensively, but largely separately. Here we propose that the two processes could be intrinsically linked in the neural system. We developed a Joint Residual Variational Autoencoder (JRVAE) comprising two networks: VAE1 performs coarse scene recognition, and VAE2 performs object recognition on the residuals left by VAE1’s reconstructions. When conditioned on the information reduction characteristic of peripheral vision, the model shows emergent functional specialization: quantitative analysis confirms that VAE1 excels at representing scenes, whereas VAE2 specializes in representing objects. This architecture naturally implements figure-ground segmentation and aligns with neurobiological evidence for distinct cortical pathways. Our findings suggest that residual computation enables joint visual processing that mirrors the coarse-to-fine principle of human perception.
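The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class `TinyVAE`, the function `jrvae_forward`, the linear encoder and decoder, the layer sizes, and the random untrained weights are all hypothetical stand-ins, and the peripheral information reduction is omitted. The sketch only shows how a second VAE could take as input the residual between an image and the first VAE's reconstruction.

```python
import numpy as np


class TinyVAE:
    """Untrained linear VAE, used only to illustrate data flow (hypothetical)."""

    def __init__(self, n_in, n_latent, rng):
        self.rng = rng
        # Small random weights; a real model would learn these.
        self.W_mu = rng.normal(0.0, 0.1, (n_in, n_latent))
        self.W_logvar = rng.normal(0.0, 0.1, (n_in, n_latent))
        self.W_dec = rng.normal(0.0, 0.1, (n_latent, n_in))

    def forward(self, x):
        # Encoder: mean and log-variance of the approximate posterior.
        mu = x @ self.W_mu
        logvar = x @ self.W_logvar
        # Reparameterization trick: z = mu + sigma * eps.
        eps = self.rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * logvar) * eps
        # Decoder: linear reconstruction of the input.
        return z @ self.W_dec, mu, logvar


def kl_to_standard_normal(mu, logvar):
    """KL divergence from N(mu, sigma^2) to N(0, I), one value per sample."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)


def jrvae_forward(x, vae1, vae2):
    """Two-stage pass: VAE1 on the image, VAE2 on what VAE1 failed to explain."""
    recon1, mu1, logvar1 = vae1.forward(x)
    residual = x - recon1  # input to the second network
    recon2, mu2, logvar2 = vae2.forward(residual)
    return {
        "recon1": recon1,
        "residual": residual,
        "recon2": recon2,
        # Assumption: the two reconstructions are summed to approximate x.
        "combined": recon1 + recon2,
        "kl1": kl_to_standard_normal(mu1, logvar1),
        "kl2": kl_to_standard_normal(mu2, logvar2),
    }


rng = np.random.default_rng(0)
x = rng.random((4, 64))  # four flattened 8x8 "images" (toy data)
vae1 = TinyVAE(n_in=64, n_latent=4, rng=rng)  # coarse stage, small latent
vae2 = TinyVAE(n_in=64, n_latent=16, rng=rng)  # residual stage
out = jrvae_forward(x, vae1, vae2)
print(out["residual"].shape, out["combined"].shape)
```

Giving the first stage a smaller latent space than the second is one illustrative way to bias it toward coarse structure; the abstract does not state how the two networks are sized.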