Focal Correlation and Event-based Focal Visual-Content Text Attention for Past Event Search

Pranita P. Deshmukh
Shanmugam Poonkuntran

Read the full article

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Every minute, countless hours of video and an enormous number of images are uploaded by people worldwide to the internet and social media platforms. Over a span of just a few years, users on these platforms, often utilizing their smartphones, accumulate tens of thousands of photos and countless hours of video. These recordings capture cherished moments such as weddings, family gatherings, and birthday parties, creating a vivid visual archive of human experiences and offering a unique window into contemporary life. When analyzed correctly, these images and videos can help reconstruct important events, from personal milestones to significant historical occurrences like war crimes, human rights violations, and terrorist acts. Each photo and video holds a wealth of information, often arranged in sequences based on timestamps and accompanied by text annotations, tags, or other metadata. However, handling this multimodal data presents challenges, particularly when some photos or videos lack annotations. A sophisticated method has been developed to manage these irregularities, ensuring relevant videos are displayed as evidence alongside direct answers derived from the input data. This system not only provides precise answers but also includes pertinent evidential photos or text snippets to substantiate the reasoning process. Examining every photo, video, or text query and answer manually is time-consuming. Therefore, it is crucial to identify relevant photos or videos quickly to verify answers efficiently. In this study, we propose a Long Short-Term Memory (LSTM) based visual question answering model integrated with convolutional layer non-linearity to tackle this challenge. The proposed adaptive memory network with attention mechanism model is designed and evaluated on common objects in context dataset with weight initialization and regularization, and model optimization. The Event-based Focal Visual-Content Text Attention (EFVCTA) framework is proposed in this paper. The system is designed on the basis of the proposed EFVCTA for the automated retrieval of historical events through the utilization of visual question answering (VQA) technology. The system can be used to identify the past event’s information from the photos of the various training programmes, workshops, conferences, annual social gatherings, etc. conducted in the academic institutions.

Version published to 10.20944/preprints202505.1872.v1
May 23, 2025

A Multimodal Information Mining and Classification Framework for Textual Content Understanding in Complex Video Scenes

This article has 3 authors:
1. Kinsley Harper
2. Wyne Nasir
3. Jaxon Everett
This article has no evaluationsLatest version May 16, 2025
Leveraging 3DCNN and Weighted Similarity Metrics forEnhanced Content-Based Video Retrieval

This article has 3 authors:
1. Farooq Shaik
2. Ashu Abdul
3. Jatindra Kumar Dash
This article has no evaluationsLatest version Jun 16, 2025
AI-Powered Fake News Detection Tool for Nepali Media

This article has 1 author:
1. Ghimire Plan
This article has no evaluationsLatest version May 30, 2025

Listed in

Abstract

Article activity feed

Related articles

A Multimodal Information Mining and Classification Framework for Textual Content Understanding in Complex Video Scenes

Leveraging 3DCNN and Weighted Similarity Metrics forEnhanced Content-Based Video Retrieval

AI-Powered Fake News Detection Tool for Nepali Media