AFR: Adaptive Feature Refinement for Fine-Grained Video Anomaly Detection

Abstract

Video anomaly detection (VAD) remains a challenging task, especially when identifying occluded, small-scale, and transient anomalies in complex contexts. Existing methods (such as optical flow analysis, trajectory modeling, and sparse coding) often neglect the enhancement of fine-grained target features and lack adaptive, precise attention mechanisms, which limits the robustness and generalization of anomaly detection. To this end, we propose an adaptive feature refinement (AFR) method to improve the detection of small-scale anomalies. AFR integrates a small-object attention module (SAM) into the feature pyramid network (FPN) of a CLIP-driven multi-scale instance learning architecture to adaptively enhance the feature representation of key regions. In addition, we incorporate the contrastive language-image pre-training (CLIP) model to enrich semantic information and improve generalization across scenes. Specifically, the SAM module guides the model to attend to the discriminative patterns of small-scale anomalies through channel recalibration and spatial attention, while the semantic priors of the CLIP model further strengthen the expressiveness of the visual features. By jointly optimizing SAM and CLIP, AFR achieves superior generalization in cross-scale and cross-scene anomaly detection tasks. Extensive experiments on two common benchmark datasets, UCF-Crime and XD-Violence, show that AFR outperforms existing state-of-the-art methods, verifying its effectiveness and transferability in real-world video anomaly detection.
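The abstract does not specify the internals of the SAM module beyond "channel recalibration and spatial attention mechanisms". As a rough illustration of that combination, the following is a minimal NumPy sketch of a CBAM-style gate applied to one FPN feature map: a squeeze-and-excitation channel gate followed by a spatial gate. All shapes, the bottleneck ratio `r`, and the MLP weights `w_down`/`w_up` are hypothetical, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def small_object_attention(feat, w_down, w_up):
    """Hypothetical SAM-style refinement of one FPN level.

    feat:   (C, H, W) feature map.
    w_down: (C//r, C) bottleneck weights of the channel-gate MLP (assumed).
    w_up:   (C, C//r) expansion weights of the channel-gate MLP (assumed).
    """
    # Channel recalibration: global average pool -> bottleneck MLP -> sigmoid gate.
    pooled = feat.mean(axis=(1, 2))                           # (C,)
    gate_c = sigmoid(w_up @ np.maximum(w_down @ pooled, 0.0))  # (C,) in (0, 1)
    feat = feat * gate_c[:, None, None]

    # Spatial attention: channel-wise mean map -> sigmoid -> broadcast gate.
    gate_s = sigmoid(feat.mean(axis=0))                       # (H, W) in (0, 1)
    return feat * gate_s[None, :, :]

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w_down = 0.1 * rng.standard_normal((C // r, C))
w_up = 0.1 * rng.standard_normal((C, C // r))
y = small_object_attention(x, w_down, w_up)
print(y.shape)  # (8, 4, 4)
```

Because both gates lie in (0, 1), the module can only attenuate feature responses, re-weighting the map toward the regions and channels it scores highly; in a trained model those would be the small-object regions the paper targets.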
