EYECON: A MULTIMODAL HUMAN-COMPUTER INTERFACE INTEGRATING GAZE TRACKING AND VOICE COMMANDS

Abstract

Human-Computer Interaction (HCI) continues to evolve towards more intuitive and accessible control paradigms, particularly for users with motor impairments. This paper presents EyeCon, a novel multimodal interface that synergizes real-time eye tracking with robust voice recognition to provide a hands-free method for system navigation and control. The system leverages a standard webcam and the Mediapipe framework for high-fidelity facial landmark and iris tracking, translating gaze direction into cursor movements on the screen. Concurrently, an offline voice recognition module processes vocal commands such as "click," "scroll," and "open," which are mapped to system-level actions. An integration layer serves as the core of EyeCon, intelligently fusing the continuous stream of gaze data with discrete voice commands to execute user intentions accurately. This dual-modal approach significantly reduces the ambiguity and cognitive load associated with purely gaze-based or voice-based systems, specifically addressing the "Midas touch" problem. We detail the system architecture and the implementation of core algorithms for calibration, gaze mapping, and command fusion, and present a comprehensive performance evaluation. The results demonstrate the system's potential as an effective, low-cost, and accessible assistive technology that requires no specialized hardware.
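To make the described pipeline concrete, the sketch below illustrates one plausible way to combine Mediapipe iris tracking with a voice-gated action layer, roughly following the abstract's description. It is not the authors' implementation: the linear gaze-to-screen mapping stands in for the paper's calibration step, `poll_voice_command()` is a hypothetical placeholder for the offline voice module, and `pyautogui` is assumed for cursor and click control.

```python
# Illustrative sketch only: gaze moves the cursor continuously, while a
# discrete voice command triggers actions, mitigating the "Midas touch"
# problem described in the abstract.
import cv2
import mediapipe as mp
import pyautogui

SCREEN_W, SCREEN_H = pyautogui.size()

# Face Mesh with refine_landmarks=True adds iris landmarks (indices
# 468-477); landmark 468 is one iris centre.
face_mesh = mp.solutions.face_mesh.FaceMesh(
    max_num_faces=1,
    refine_landmarks=True,
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5,
)

def poll_voice_command():
    """Hypothetical stand-in for the offline voice recognition module.

    A real implementation would return strings such as 'click', 'scroll',
    or 'open' when a command is recognised, and None otherwise.
    """
    return None

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break

    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        iris = results.multi_face_landmarks[0].landmark[468]
        # Naive linear mapping from normalised iris position to screen
        # coordinates; the paper's calibration procedure would replace this.
        pyautogui.moveTo(iris.x * SCREEN_W, iris.y * SCREEN_H)

    # Gaze alone never triggers an action; only an explicit voice command
    # does, which is the fusion rule the abstract attributes to EyeCon.
    command = poll_voice_command()
    if command == "click":
        pyautogui.click()
    elif command == "scroll":
        pyautogui.scroll(-200)

    if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
        break

cap.release()
```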