UREPTrack: Unified RGB-Event Visual Tracking via PoolFormer Backbone
Abstract
Visual Object Tracking (VOT) faces significant challenges under conditions such as fast motion, motion blur, and extreme illumination. RGB-only trackers often degrade in these scenarios, while event cameras provide microsecond latency and high dynamic range but lack rich spatial semantics. We introduce UREPTrack, a unified, single-stage, attention-free RGB-event tracker built on a lightweight PoolFormer backbone. Raw event data are voxelized into compact spatiotemporal tensors and, together with RGB template and search patches, embedded and concatenated into a single token stream processed by a shared backbone. A fully convolutional head jointly predicts classification confidence, center offsets, and box size, eliminating the need for multi-branch Siamese pipelines and costly self-attention. UREPTrack achieves state-of-the-art performance, setting new benchmarks on COESOT (S 64.4, P 77.5, NP 76.2, BOC 23.7) at 170 FPS, VisEvent (S 55.46, SR0.5 67.01, SR0.75 46.96, P 71.58, NP 75.22), and FE108 (P 94.3, S 65.9). Ablation studies confirm (i) the complementarity of RGB and event modalities, (ii) the superiority of event voxelization over image-like alternatives, and (iii) favorable accuracy and efficiency scaling across PoolFormer sizes. UREPTrack provides a practical, high-speed solution for real-time, multi-modal tracking on resource-constrained hardware. Our code will be publicly released at https://github.com/HamadYA/UREPTrack.
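The abstract describes voxelizing raw event streams into compact spatiotemporal tensors before embedding. A minimal sketch of one common voxelization scheme is shown below; the function name, bin count, and polarity handling are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def voxelize_events(events, num_bins, height, width):
    """Accumulate raw events into a (num_bins, height, width) voxel grid.

    `events` is an (N, 4) array with columns x, y, timestamp, polarity,
    where polarity is in {-1, +1}. Timestamps are normalized so each
    event contributes its polarity to one temporal bin. This is a
    generic scheme; the paper's voxelization may differ in detail.
    """
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2]
    p = events[:, 3].astype(np.float32)

    # Map timestamps into [0, num_bins) and clamp the final event,
    # which would otherwise land in bin num_bins.
    span = max(t.max() - t.min(), 1e-9)
    bins = np.clip(((t - t.min()) / span * num_bins).astype(np.int64),
                   0, num_bins - 1)

    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    # Scatter-add each event's polarity into its (bin, y, x) cell;
    # np.add.at handles repeated indices correctly.
    np.add.at(grid, (bins, y, x), p)
    return grid
```

The resulting tensor can then be patch-embedded alongside the RGB template and search crops and concatenated into the single token stream the abstract describes.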