Contextual Knowledge Infusion via Iterative Semantic Tracing for Vision–Language Understanding

Abstract

The challenge of integrating external knowledge into visual reasoning frameworks has motivated growing interest in models that bridge perceptual understanding with abstract, non-visual information. Unlike conventional visual question answering (VQA), knowledge-driven VQA demands joint interpretation of visible cues and facts absent from the image itself. This paper introduces a new perspective on the task and proposes KV-Trace, a unified semantic tracing framework built on iterative knowledge refinement and structured visual interpretation. Instead of treating the visual and knowledge modalities as homogeneous sources, our framework explicitly distinguishes their representational roles and organizes them into a progressive reasoning pipeline. Through a dynamic knowledge memory space and a query-sensitive semantic propagation mechanism, KV-Trace composes multi-stage reasoning steps that evolve with the underlying question. Extensive experiments on the KRVQR and FVQA benchmarks demonstrate improved reasoning depth and generalization capacity, and ablation studies verify the contribution of each reasoning component and highlight the interpretability gained from explicit knowledge structuring.
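The abstract does not specify the mechanism in detail. As a rough illustration only, the sketch below shows one plausible reading of "query-sensitive semantic propagation over a dynamic knowledge memory": a question embedding attends over knowledge slots, reads out evidence, and refines itself over several steps. All names here (IterativeSemanticTracer, query_update, read_proj) and the GRU-based update are our own assumptions, not the authors' design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IterativeSemanticTracer(nn.Module):
    """Illustrative sketch: iterated query-conditioned reads over a knowledge memory."""

    def __init__(self, dim: int, num_steps: int = 3):
        super().__init__()
        self.num_steps = num_steps
        self.query_update = nn.GRUCell(dim, dim)  # assumed update rule: refine query with evidence
        self.read_proj = nn.Linear(dim, dim)      # projects memory slots into attention keys

    def forward(self, query: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # query: (batch, dim) question embedding; memory: (batch, slots, dim) fact embeddings
        for _ in range(self.num_steps):
            keys = self.read_proj(memory)                        # (batch, slots, dim)
            scores = torch.einsum("bd,bsd->bs", query, keys)     # query-sensitive relevance
            attn = F.softmax(scores / keys.size(-1) ** 0.5, dim=-1)
            read = torch.einsum("bs,bsd->bd", attn, memory)      # retrieved knowledge readout
            query = self.query_update(read, query)               # one reasoning step
        return query

# Usage: trace a batch of 4 question embeddings over 8 knowledge slots of width 256.
tracer = IterativeSemanticTracer(dim=256)
q, mem = torch.randn(4, 256), torch.randn(4, 8, 256)
out = tracer(q, mem)  # (4, 256) refined representation after 3 steps
```

The key property this sketch captures is that the attention distribution is recomputed at every step from the evolving query, so successive reads can follow a chain of facts rather than retrieving everything at once.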
