Device-Free Hand Gesture Recognition with ESP32 Wi-Fi CSI: Formal Doppler Modeling and Lightweight Deep Learning
Abstract
Wi-Fi Channel State Information (CSI) has emerged as a powerful modality for device-free gesture recognition, enabling human–computer interaction without cameras or wearables. Existing systems, however, often rely on PC-class network interface cards (NICs) and computationally heavy neural networks, which limits deployment in resource-constrained IoT settings. This paper presents a complete, mathematically grounded pipeline for non-contact hand gesture recognition using low-cost ESP32 modules that expose CSI. We model gesture-induced CSI as a superposition of static and Doppler-shifted multipath components, derive a time–frequency representation based on the short-time Fourier transform (STFT), and pose gesture recognition as a multi-class classification problem on CSI spectrogram tensors. A lightweight depthwise separable CNN (DS-CNN) front-end and a gated recurrent unit (GRU) back-end form a compact deep architecture with fewer than 150,000 trainable parameters. An ESP32 AP–STA testbed at 2.4 GHz collects CSI at 100 Hz for ten alphanumeric gestures plus a steady class, yielding approximately 2,000 labeled trials from eight users. The proposed model attains 97.2% accuracy and a macro F1-score of 0.971 in in-session evaluation and 92.1% accuracy in cross-session tests, with a median inference latency of 20 ms on a Raspberry Pi 4 edge node. We compare against an SVM with hand-crafted features and a heavier CNN baseline, analyze robustness to user orientation and distance, and discuss generalization through a learning-theoretic lens. The results demonstrate that ESP32-based Wi-Fi CSI, coupled with principled signal modeling and lightweight deep learning, can support practical, privacy-preserving gesture interfaces in smart environments.
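As an illustration of the signal model summarized above, the following minimal sketch simulates one CSI subcarrier as a static path plus a single Doppler-shifted path, then recovers the Doppler frequency from an STFT magnitude. It is not the authors' implementation. Only the 100 Hz sampling rate comes from the abstract. The function names, path gains, 64-sample Hann window, 16-sample hop, and 12.5 Hz test shift are assumptions chosen for illustration, and the model is idealized (no noise and no hardware phase offsets).

```python
import cmath
import math

FS = 100.0  # CSI sampling rate in Hz (from the abstract)


def simulate_csi(n_samples, doppler_hz, static_gain=1.0, dynamic_gain=0.3, fs=FS):
    """Idealized single-subcarrier CSI: static path + one Doppler-shifted path.

    The gains are illustrative assumptions, not measured values.
    """
    return [
        static_gain + dynamic_gain * cmath.exp(2j * math.pi * doppler_hz * n / fs)
        for n in range(n_samples)
    ]


def stft_magnitude(x, win_len=64, hop=16):
    """Return a list of frames, each a list of |DFT| values for win_len bins."""
    # Periodic Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = x[start:start + win_len]
        # Subtract the per-frame mean to suppress the static (zero-Doppler) component.
        mean = sum(seg) / win_len
        seg = [(s - mean) * w for s, w in zip(seg, window)]
        # Direct DFT, kept explicit so the sketch needs no third-party packages.
        spectrum = [
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                    for n in range(win_len)))
            for k in range(win_len)
        ]
        frames.append(spectrum)
    return frames


def peak_doppler_hz(spectrum, fs=FS):
    """Map the strongest DFT bin to a signed frequency in Hz."""
    n_bins = len(spectrum)
    k = max(range(n_bins), key=lambda i: spectrum[i])
    if k >= n_bins // 2:  # upper half of the DFT holds negative frequencies
        k -= n_bins
    return k * fs / n_bins


if __name__ == "__main__":
    csi = simulate_csi(n_samples=200, doppler_hz=12.5)  # 2 s of simulated CSI
    spectrogram = stft_magnitude(csi)
    print(len(spectrogram), "frames; peak Doppler in first frame:",
          peak_doppler_hz(spectrogram[0]), "Hz")
```

In a real pipeline the stacked frames (across time and subcarriers) would form the spectrogram tensor passed to the classifier. Measured ESP32 CSI would also need phase sanitization or amplitude-only processing before this step, which the sketch omits.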