Weakly Supervised Temporal Action Localization Based on Feature Enhancement
Abstract
Weakly-supervised Temporal Action Localization (WTAL) aims to accurately localize and classify action instances in untrimmed long videos using only video-level annotations. Although most existing WTAL methods leverage pre-trained feature extractors to obtain RGB and optical flow features—thereby reducing computational costs—this strategy suffers from two critical limitations: (1) limited temporal receptive fields, resulting in inadequate exploitation of contextual information; and (2) interference from irrelevant background content, which degrades overall performance. To address these issues, we propose a Feature-Enhanced Network (FE-Net), which comprises three key components: the Local Feature Expansion and Enhancement Module (LF-EEM), the Cross-modal Fusion Enhancement Module (CEM), and the Cross-temporal Gated Feature Fusion Module (CGFF). Specifically, LF-EEM expands the temporal receptive field to better capture complete action instances. CEM leverages the complementary nature of auxiliary and primary modalities to suppress background noise in the primary modality through cross-modal fusion. Furthermore, CGFF employs a cross-temporal gating mechanism during feature fusion to emphasize salient changes across time, replacing simple concatenation. Extensive experiments conducted on two large-scale benchmark datasets, THUMOS-14 and ActivityNet v1.2, demonstrate that FE-Net significantly enhances the performance of existing WTAL methods. These results validate the effectiveness of our proposed modules and provide new insights for advancing temporal action localization under weak supervision.
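To make the idea of cross-temporal gated fusion concrete, the following is a minimal NumPy sketch of one plausible formulation: a per-snippet gate driven by temporal feature differences that blends RGB and optical-flow features, in place of plain concatenation. The exact gating function, parameters (`w`, `b`), and use of temporal differences are illustrative assumptions, not the paper's published equations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_temporal_gated_fusion(rgb, flow, w, b):
    """Blend two modality streams with a gate that reacts to temporal change.

    rgb, flow: (T, D) snippet-level features from the two modalities.
    w: (2*D,) hypothetical gate weights; b: scalar bias.
    Returns a (T, D) fused feature sequence.
    """
    # Temporal differences highlight salient changes across adjacent snippets;
    # prepending the first row keeps the output length equal to T.
    delta_rgb = np.diff(rgb, axis=0, prepend=rgb[:1])
    delta_flow = np.diff(flow, axis=0, prepend=flow[:1])
    # One scalar gate in [0, 1] per snippet, computed from both streams' changes.
    gate = sigmoid(np.concatenate([delta_rgb, delta_flow], axis=1) @ w + b)
    # Convex per-snippet combination of the two modalities.
    return gate[:, None] * rgb + (1.0 - gate[:, None]) * flow
```

With zero weights the gate is 0.5 everywhere and the result is a simple average of the two streams; learned weights would instead shift the balance toward whichever modality is more informative at moments of salient change.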