Device-Free Hand Gesture Recognition with ESP32 Wi-Fi CSI: Formal Doppler Modeling and Lightweight Deep Learning

Saurav Chaudhari
Ketan Pise
Dinesh Fukate
Shantanu Gawande


Abstract

Wi-Fi Channel State Information (CSI) has emerged as a powerful modality for device-free gesture recognition, enabling human–computer interaction without cameras or wearables. Existing systems, however, often rely on PC-class network interface cards (NICs) and computationally heavy neural networks, which limits deployment in resource-constrained IoT settings. This paper presents a complete, mathematically grounded pipeline for non-contact hand gesture recognition using low-cost ESP32 modules that expose CSI. We model gesture-induced CSI as a superposition of static and Doppler-shifted multipath components, derive a time–frequency representation based on short-time Fourier transforms (STFT), and pose gesture recognition as a multi-class classification problem on CSI spectrogram tensors. A lightweight depthwise separable CNN (DS-CNN) front-end and a gated recurrent unit (GRU) back-end form a compact deep architecture with fewer than 150,000 trainable parameters. An ESP32 AP–STA testbed at 2.4 GHz collects CSI at 100 Hz for ten alphanumeric gestures plus a steady class, yielding approximately 2,000 labeled trials from eight users. The proposed model attains 97.2% accuracy and a macro F1-score of 0.971 in in-session evaluation and 92.1% accuracy in cross-session tests, with 20 ms median inference latency on a Raspberry Pi 4 edge node. We compare against an SVM with hand-crafted features and a heavier CNN baseline, analyze robustness to user orientation and distance, and discuss generalization through a learning-theoretic lens. The results demonstrate that ESP32-based Wi-Fi CSI, coupled with principled signal modeling and lightweight deep learning, can support practical, privacy-preserving gesture interfaces in smart environments.
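
The signal model summarized in the abstract (a static multipath term plus a Doppler-shifted term, followed by an STFT) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Only the 100 Hz sampling rate and the 2.4 GHz band come from the abstract. The single-subcarrier model, the hand speed, the path gains, the window and hop lengths, the mean-subtraction step, and the function names `simulate_csi` and `stft_spectrogram` are all illustrative assumptions, and the Doppler relation f_D = 2v/λ is the idealized upper bound for a reflector moving directly along the propagation path.

```python
import numpy as np

FS = 100.0          # CSI sampling rate in Hz (stated in the abstract)
WAVELENGTH = 0.125  # metres, approx. c / 2.4 GHz

def simulate_csi(duration_s=2.0, hand_speed=0.5, static_gain=1.0, dyn_gain=0.2):
    """Toy single-subcarrier CSI: static path + one Doppler-shifted path.

    hand_speed (m/s) and the gains are assumed values for illustration.
    """
    t = np.arange(int(duration_s * FS)) / FS
    f_doppler = 2.0 * hand_speed / WAVELENGTH  # idealized upper-bound Doppler shift
    static = static_gain * np.ones_like(t, dtype=complex)
    dynamic = dyn_gain * np.exp(2j * np.pi * f_doppler * t)
    return t, static + dynamic, f_doppler

def stft_spectrogram(x, win_len=64, hop=8):
    """Hann-windowed STFT power spectrogram, shape (frequency, time)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.fft(frames, axis=1)) ** 2
    freqs = np.fft.fftfreq(win_len, d=1.0 / FS)
    return freqs, power.T

t, csi, f_d = simulate_csi()
csi_dynamic = csi - csi.mean()  # crude static-component removal (assumption)
freqs, spec = stft_spectrogram(csi_dynamic)

# Frequency bin with the highest time-averaged power
peak_freq = freqs[np.argmax(spec.mean(axis=1))]
print(f"expected Doppler: {f_d:.2f} Hz, spectrogram peak: {peak_freq:.2f} Hz")
```

With these assumed values the spectrogram peak should fall within one frequency bin (FS / win_len ≈ 1.56 Hz) of the simulated Doppler shift. A real ESP32 capture would additionally need phase sanitization and per-subcarrier processing, which this sketch omits.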

Version published to 10.20944/preprints202602.0018.v1
Feb 2, 2026

