Dynamic Contextual Relational Alignment Network for Open-Vocabulary Video Visual Relation Detection
Abstract
Video Visual Relation Detection plays a central role in understanding complex video content by identifying evolving spatio-temporal interactions between object tracklets. However, current approaches are hindered by long-tailed predicate distributions, the gap between image-based semantics and video dynamics, and the challenge of generalizing to unseen relation categories. We introduce the Dynamic Contextual Relational Alignment Network (DCRAN), an end-to-end framework designed to address these issues. DCRAN integrates a spatio-temporal gating mechanism that enriches tracklet representations with surrounding context, a dynamic relational prompting module that produces adaptive predicate prompts for each subject-object pair, and a multi-granular semantic alignment module that jointly aligns object features and relational representations with their corresponding textual cues through hierarchical contrastive learning. Experiments on standard benchmarks show that DCRAN substantially improves the detection of both frequent and previously unseen relations, demonstrating the value of dynamic prompting and multi-level alignment for robust video relational understanding.
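For intuition, the sketch below illustrates the three components the abstract names in minimal PyTorch form. All module names, dimensions, and the specific gating, prompting, and contrastive formulations are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatioTemporalGate(nn.Module):
    """Hypothetical gated fusion of a tracklet feature with pooled
    surrounding context, in the spirit of DCRAN's gating mechanism."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, tracklet, context):
        # tracklet, context: (batch, dim)
        g = torch.sigmoid(self.gate(torch.cat([tracklet, context], dim=-1)))
        # Learned per-dimension blend of the tracklet and its context.
        return g * tracklet + (1 - g) * context


class DynamicRelationalPrompt(nn.Module):
    """Illustrative stand-in for the prompting module: maps a
    subject-object pair to a small set of adaptive prompt tokens."""
    def __init__(self, dim, prompt_len=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, prompt_len * dim),
        )
        self.prompt_len, self.dim = prompt_len, dim

    def forward(self, subj, obj):
        # subj, obj: (batch, dim) -> (batch, prompt_len, dim) prompt tokens
        p = self.mlp(torch.cat([subj, obj], dim=-1))
        return p.view(-1, self.prompt_len, self.dim)


def contrastive_alignment(visual, text, temperature=0.07):
    """Symmetric InfoNCE between visual features and text embeddings;
    one level of a hierarchical (object- and relation-level) alignment."""
    v = F.normalize(visual, dim=-1)
    t = F.normalize(text, dim=-1)
    logits = v @ t.T / temperature
    targets = torch.arange(v.size(0), device=v.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))


if __name__ == "__main__":
    dim, batch = 256, 8
    subj, obj, ctx = (torch.randn(batch, dim) for _ in range(3))
    subj_enriched = SpatioTemporalGate(dim)(subj, ctx)
    tokens = DynamicRelationalPrompt(dim)(subj_enriched, obj)  # (8, 4, 256)
    text = torch.randn(batch, dim)  # stand-in for predicate text embeddings
    loss = contrastive_alignment(tokens.mean(dim=1), text)
    print(loss.item())
```

Under these assumptions, the contrastive loss would be applied at both the object level (tracklet features against object-name embeddings) and the relation level (pooled prompt tokens against predicate embeddings), which is one plausible reading of "multi-granular" alignment.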