Edge-Deployed Context-Aware LLM Framework for Low-Latency Bi-Directional Gaze-Speech HRI

Yuxin Zhai
Yao-Tian Chian

Abstract

The integration of Large Language Models (LLMs) has advanced Human-Robot Interaction (HRI), enabling more natural and context-aware multimodal exchanges via gaze and speech. However, current cloud-reliant LLM-HRI frameworks pose critical challenges for edge deployment: latency, network dependency, privacy risks, and high operational costs. To address these, we propose the Efficient Edge-Deployed Multi-modal Human-Robot Interaction (EM-HRI) framework. EM-HRI features a novel hybrid LLM architecture that combines lightweight local models for rapid intent recognition with a compact, edge-deployable LLM for complex contextual reasoning. Optimized multimodal perception and a dynamic context graph further support real-time efficiency, robust situational awareness, and interaction quality. Experimental results on an ambiguous robot manipulation task demonstrate that EM-HRI substantially reduces response time and energy consumption while maintaining or improving task completion, user confidence, and ambiguity resolution relative to cloud LLM solutions. These findings validate EM-HRI's potential for real-time, efficient, and intelligent HRI on edge devices, fostering more autonomous and reliable robotic applications.

Version published to 10.20944/preprints202602.0308.v1
Feb 4, 2026

Related articles

Multi-Modal LLMs and Multi-Camera Tracking across an Edge-Fog-Cloud Architecture: Toward Inclusive Smart Urban Mobility

This article has 4 authors:
1. Mourad Raif
2. Abdessamad El Rharras
3. Rachid Saadane
4. Abdellah Chehri
This article has no evaluations. Latest version: Mar 9, 2026

ConsciousDriver: A Context-Aware Multimodal Personalized Autonomous Driving System

This article has 2 authors:
1. Noah Fang
2. Salma Ali
This article has no evaluations. Latest version: Jan 20, 2026

CMAFNet: Efficient Cross-Modal Alignment and Fusion for Real-Time RGB–Infrared Object Detection in Autonomous Driving

This article has 3 authors:
1. Zi-Han Huang
2. Chen-Wei Liang
3. Mu-Jiang-Shan Wang
This article has no evaluations. Latest version: Mar 5, 2026
