Focal Correlation and Event-based Focal Visual-Content Text Attention for Past Event Search

Abstract

Every minute, people worldwide upload countless hours of video and an enormous number of images to the internet and social media platforms. Over just a few years, users of these platforms, often recording with their smartphones, accumulate tens of thousands of photos and many hours of video. These recordings capture cherished moments such as weddings, family gatherings, and birthday parties, creating a vivid visual archive of human experience and offering a unique window into contemporary life. When analyzed correctly, these images and videos can help reconstruct important events, from personal milestones to significant historical occurrences such as war crimes, human rights violations, and terrorist acts. Each photo and video holds a wealth of information, often arranged in sequence by timestamp and accompanied by text annotations, tags, or other metadata. Handling this multimodal data is challenging, however, particularly when some photos or videos lack annotations. We develop a method to manage these irregularities, ensuring that relevant videos are displayed as evidence alongside direct answers derived from the input data. The system not only provides precise answers but also returns pertinent evidential photos or text snippets to substantiate its reasoning. Because manually examining every photo, video, or text query and answer is time-consuming, it is crucial to identify relevant photos or videos quickly so that answers can be verified efficiently. In this study, we propose a Long Short-Term Memory (LSTM) based visual question answering (VQA) model integrated with convolutional-layer non-linearity to tackle this challenge. The proposed adaptive memory network with an attention mechanism is designed and evaluated on the Common Objects in Context (COCO) dataset, with attention to weight initialization, regularization, and model optimization. Building on this, we propose the Event-based Focal Visual-Content Text Attention (EFVCTA) framework for the automated retrieval of past events through VQA. The system can identify information about past events from photos of training programmes, workshops, conferences, annual social gatherings, and similar activities conducted in academic institutions.
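To make the described architecture concrete, the following is a minimal sketch of an LSTM-based VQA model that encodes the question with an LSTM, attends over CNN image-region features, and fuses the attended visual evidence with the question encoding for answer classification. All layer sizes, the class name `AttentiveVQA`, and the additive-attention fusion scheme are illustrative assumptions, not the authors' exact EFVCTA implementation.

```python
# Sketch of an LSTM-based VQA model with attention over CNN region features.
# Hyperparameters and the fusion scheme are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveVQA(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512,
                 img_feat_dim=512, num_answers=1000):
        super().__init__()
        # Question encoder: word embeddings fed to an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Project image region features (e.g. a 7x7 CNN feature map
        # flattened to 49 region vectors) into the question space.
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)
        # Additive attention scoring over image regions.
        self.att = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(hidden_dim * 2, num_answers)

    def forward(self, img_feats, question_tokens):
        # img_feats: (B, R, img_feat_dim) from a pretrained CNN backbone.
        # question_tokens: (B, T) integer token ids.
        _, (h, _) = self.lstm(self.embed(question_tokens))
        q = h[-1]                                   # (B, hidden_dim)
        v = torch.relu(self.img_proj(img_feats))    # (B, R, hidden_dim)
        # Score each region against the question and normalize.
        scores = self.att(torch.tanh(v + q.unsqueeze(1)))  # (B, R, 1)
        alpha = F.softmax(scores, dim=1)
        attended = (alpha * v).sum(dim=1)           # (B, hidden_dim)
        # Fuse attended visual evidence with the question encoding.
        return self.classifier(torch.cat([attended, q], dim=-1))
```

As a quick smoke test, `AttentiveVQA(vocab_size=10000)(torch.randn(2, 49, 512), torch.randint(0, 10000, (2, 12)))` returns a `(2, 1000)` tensor of answer logits, one score per candidate answer.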
