Lightweight Pose-Based Shoplifting Detection for Retail: A Confidence-Weighted LSTM Approach using Human Pose Trajectories
Abstract
We present a privacy-preserving shoplifting detector that operates solely on human pose trajectories, avoiding any identifiable imagery. The system standardizes 2D keypoints, derives kinematics-aware features (torso-normalized positions, velocities, and limb-orientation cues), and applies attention over a lightweight, causal LSTM to capture the short, subtle actions typical of retail incidents. To handle pose noise and label ambiguity, we introduce a confidence-weighted training objective that scales each window's contribution by per-sample pose quality (joint confidence, visible-joint ratio, and limb stability), preceded by a sliding-window quality-control (QC) stage with bounded interpolation and smoothing. We evaluate on a curated UCF-Crime shoplifting subset (30 videos after strict pose-quality control) and, using our focal-weighted model, achieve 71% weighted accuracy, 77.23% AUC-ROC at the person-ID (PID) level, and 85% recall. Using our PID-level test aggregation, which averages probabilities across QC-passed windows per individual, the approach exhibits notable robustness and stable performance across varied scenarios, delivering results comparable to or better than state-of-the-art pose-based anomaly detectors while remaining lightweight enough for edge deployment. The classifier runs in real time on commodity CPUs when provided precomputed keypoints, enabling use where privacy policies prohibit pixel processing. Importantly, PID-level aggregation combined with confidence weighting materially reduces false alarms in deployment, improving practical reliability without increasing computational cost, and offers a compact, privacy-compliant solution for multi-camera retail environments.
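The two mechanisms central to the abstract, the confidence-weighted objective and PID-level aggregation, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the multiplicative combination of the three quality cues and the 0.5 decision threshold are assumptions, and all function names are hypothetical.

```python
import numpy as np

def confidence_weighted_loss(window_losses, joint_conf, visible_ratio, limb_stability):
    """Scale each window's loss by a per-sample pose-quality score.
    The product of the three cues is an assumed combination; the paper
    only states that quality weights each window's contribution."""
    quality = joint_conf * visible_ratio * limb_stability
    return float(np.sum(quality * window_losses) / np.sum(quality))

def pid_aggregate(window_probs, threshold=0.5):
    """PID-level aggregation: average predicted probabilities across all
    QC-passed windows for one tracked person, then threshold (assumed 0.5)."""
    score = float(np.mean(window_probs))
    return score, score > threshold

# Hypothetical example: three QC-passed windows for one person
probs = np.array([0.62, 0.71, 0.55])
score, flagged = pid_aggregate(probs)
```

Averaging over a person's windows before thresholding, rather than flagging on any single window, is what suppresses transient false alarms caused by momentary pose noise.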