A method for automatically generating semantic information distribution maps of images

Abstract

Semantic information in images—such as meaningful, recognizable regions or objects that are incongruent with the overall scene—can effectively capture attention. The meaning map is currently the primary method for mapping the distribution of semantic information across visual stimuli. However, this approach relies on subjective human ratings of semantic meaningfulness and requires extensive manual annotation of each image prior to analysis. To address these limitations and enable rapid, efficient, and reproducible generation of semantic distribution maps for any given image, this paper proposes an automated method for constructing semantic information maps using multimodal large language models (MLLMs). Our approach generates two types of semantic maps: local semantic information maps, which quantify the semantic content at each spatial location within an image, and global semantic maps, which assess the contextual relevance of local regions to the overall scene. Additionally, the method can generate distribution maps of visual information associated with specific concepts. We argue that this method substantially improves the precision and flexibility with which semantic information in visual stimuli can be controlled, measured, and manipulated, thereby advancing research in visual attention and visual language processing. The method is currently under further refinement.
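The abstract does not specify the pipeline, but a local semantic map of the kind described could plausibly be assembled by tiling an image into patches, asking an MLLM to rate each patch's semantic meaningfulness, and arranging the scores into a grid. The sketch below illustrates only that scaffolding; all function names are hypothetical, and the MLLM query is stubbed with a simple variance proxy so the example runs end to end.

```python
import random

def rate_patch_meaningfulness(patch):
    # Hypothetical stand-in for the MLLM query: in the actual method each
    # patch would be sent to a multimodal large language model with a prompt
    # asking for a semantic-meaningfulness rating. A pixel-variance proxy
    # substitutes here so the pipeline is runnable.
    mean = sum(patch) / len(patch)
    return sum((v - mean) ** 2 for v in patch) / len(patch)

def local_semantic_map(image, patch_size=2):
    """Tile a 2-D grid of pixel values into patches and score each patch."""
    rows = len(image) // patch_size
    cols = len(image[0]) // patch_size
    scores = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            patch = [image[i * patch_size + di][j * patch_size + dj]
                     for di in range(patch_size) for dj in range(patch_size)]
            scores[i][j] = rate_patch_meaningfulness(patch)
    # Normalize to [0, 1] so maps are comparable across images.
    flat = [v for row in scores for v in row]
    lo, hi = min(flat), max(flat)
    if hi > lo:
        scores = [[(v - lo) / (hi - lo) for v in row] for row in scores]
    return scores

random.seed(0)
image = [[random.random() for _ in range(8)] for _ in range(8)]
smap = local_semantic_map(image, patch_size=2)
print(len(smap), len(smap[0]))  # 4 4
```

A global semantic map would differ mainly in the prompt: rather than rating a patch in isolation, the model would be asked how relevant the patch is to the scene as a whole, so each query would include both the patch and the full image as context.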
