Multi-Modal LLMs and Multi-Camera Tracking across an Edge-Fog-Cloud Architecture: Toward Inclusive Smart Urban Mobility


Abstract

The integration of Generative Artificial Intelligence (Gen-AI) with edge computing opens a broad spectrum of opportunities in intelligent transportation systems (ITS), where emerging mobility infrastructures can significantly benefit vulnerable road users, and specifically visually impaired people (VIP). In this paper, we propose a human-aware, three-tier Edge-Fog-Cloud architecture that enables mutual recognition between vehicles and visually impaired individuals through distributed multimodal perception, reasoning, and communication. Unlike conventional centralized frameworks, this architecture integrates multi-camera tracking, multimodal large language models (MLLMs), and Generative AI in a coherent, latency-aware pipeline. At the Edge tier, perception detectors produce privacy-preserving embeddings and support lightweight bidirectional communication protocols between pedestrians and vehicles. The results are passed to the Fog tier, which performs cross-camera association using graph-based and occlusion-aware transformer algorithms. The Cloud tier hosts the model hub, knowledge base, and model governance, with updates aligned with ISO/IEC 42001, IEEE 7000, and the NIST AI Risk Management Framework. MLLMs provide grounded scene narration and answer instruction-based queries for inclusive interaction. The insights and methods presented in this work highlight the potential of our architecture to extend ITS capabilities from perception to understanding by combining generative recovery and re-identification in challenging environments, within a trustworthy AI governance framework. Our contribution defines a reference framework and design principles for future Assistive Vehicular Intelligence.
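To make the three-tier data flow concrete, the following is a minimal, hypothetical sketch of the Fog-tier association step described in the abstract: Edge cameras emit privacy-preserving appearance embeddings, and the Fog tier groups detections of the same pedestrian across cameras. All class names, field names, and the simple distance-threshold matching are illustrative assumptions standing in for the paper's graph-association and occlusion-aware transformer algorithms.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EdgeDetection:
    """One detection emitted by an Edge camera node (illustrative schema)."""
    camera_id: str
    track_id: int                # local track ID at the Edge tier
    embedding: List[float]       # privacy-preserving appearance embedding


class FogAssociator:
    """Assigns a global identity to detections arriving from multiple
    cameras, using a Euclidean embedding-distance threshold as a
    stand-in for the graph-association step."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.global_tracks: Dict[int, List[EdgeDetection]] = {}
        self._next_gid = 0

    @staticmethod
    def _dist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def associate(self, det: EdgeDetection) -> int:
        # Compare against the most recent detection of each global track.
        for gid, dets in self.global_tracks.items():
            if self._dist(det.embedding, dets[-1].embedding) < self.threshold:
                dets.append(det)
                return gid
        # No match: open a new global track (new identity).
        gid = self._next_gid
        self._next_gid += 1
        self.global_tracks[gid] = [det]
        return gid


fog = FogAssociator(threshold=0.5)
g1 = fog.associate(EdgeDetection("cam-A", 1, [0.10, 0.90]))
g2 = fog.associate(EdgeDetection("cam-B", 7, [0.12, 0.88]))  # same person, new camera
g3 = fog.associate(EdgeDetection("cam-C", 3, [0.95, 0.05]))  # different person
print(g1, g2, g3)  # → 0 0 1
```

In the full architecture, the resulting global identities would be forwarded upward for Cloud-side governance and downward to drive the MLLM-based scene narration for VIP users.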
