Multi-Modal LLMs and Multi-Camera Tracking across an Edge-Fog-Cloud Architecture: Toward Inclusive Smart Urban Mobility


Abstract

The integration of Generative Artificial Intelligence (Gen-AI) with edge computing opens a broad spectrum of opportunities in intelligent transportation systems (ITS), where emerging mobility infrastructures can significantly benefit vulnerable road users, and specifically visually impaired people (VIP). In this paper, we propose a human-aware, three-tier Edge-Fog-Cloud architecture that enables mutual recognition between vehicles and visually impaired individuals through distributed multimodal perception, reasoning, and communication. Unlike conventional centralized frameworks, this architecture integrates multi-camera tracking, multimodal large language models (MLLMs), and Generative AI in a coherent, latency-aware pipeline. At the Edge tier, perception detectors produce privacy-preserving embeddings and support lightweight bidirectional communication protocols between pedestrians and vehicles. The results are passed to the Fog tier, which performs cross-camera association using graph-based and occlusion-aware transformer algorithms. The Cloud tier hosts the model hub, knowledge base, and model governance, with updates aligned with ISO/IEC 42001, IEEE 7000, and the NIST AI Risk Management Framework. MLLMs provide grounded scene narration and answer instruction-based queries for inclusive interaction. The insights and methods presented in this work highlight the potential of our architecture to extend ITS capabilities from perception to understanding by combining generative recovery and re-identification in challenging environments, within a trustworthy AI governance framework. Our contribution defines a reference framework and design principles for future Assistive Vehicular Intelligence.
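To make the three-tier data flow concrete, the following is a minimal, hypothetical sketch of the Fog-tier association step described in the abstract: Edge cameras emit privacy-preserving appearance embeddings, and the Fog tier groups detections of the same pedestrian across cameras. All class names, field names, and the simple distance-threshold matching are illustrative assumptions standing in for the paper's graph-association and occlusion-aware transformer algorithms.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EdgeDetection:
    """One detection emitted by an Edge camera node (illustrative schema)."""
    camera_id: str
    track_id: int                # local track ID at the Edge tier
    embedding: List[float]       # privacy-preserving appearance embedding


class FogAssociator:
    """Assigns a global identity to detections arriving from multiple
    cameras, using a Euclidean embedding-distance threshold as a
    stand-in for the graph-association step."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.global_tracks: Dict[int, List[EdgeDetection]] = {}
        self._next_gid = 0

    @staticmethod
    def _dist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def associate(self, det: EdgeDetection) -> int:
        # Compare against the most recent detection of each global track.
        for gid, dets in self.global_tracks.items():
            if self._dist(det.embedding, dets[-1].embedding) < self.threshold:
                dets.append(det)
                return gid
        # No match: open a new global track (new identity).
        gid = self._next_gid
        self._next_gid += 1
        self.global_tracks[gid] = [det]
        return gid


fog = FogAssociator(threshold=0.5)
g1 = fog.associate(EdgeDetection("cam-A", 1, [0.10, 0.90]))
g2 = fog.associate(EdgeDetection("cam-B", 7, [0.12, 0.88]))  # same person, new camera
g3 = fog.associate(EdgeDetection("cam-C", 3, [0.95, 0.05]))  # different person
print(g1, g2, g3)  # → 0 0 1
```

In the full architecture, the resulting global identities would be forwarded upward for Cloud-side governance and downward to drive the MLLM-based scene narration for VIP users.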
