Sensing in Smart Cities: A Multimodal Machine Learning Perspective
Abstract
Smart cities generate vast multimodal data from IoT devices, surveillance systems, health monitors, and environmental monitoring infrastructure. The seamless integration and interpretation of such multimodal data are essential for intelligent decision-making and adaptive urban services. Multimodal machine learning (MML) provides a unified framework for fusing and analyzing these diverse sources, surpassing conventional unimodal and rule-based approaches. This review surveys the role of MML in smart city sensing across the mobility, public safety, healthcare, and environmental domains, outlining key data modalities, enabling technologies, and state-of-the-art fusion architectures. We analyze major methodological and deployment challenges, including data alignment, scalability, modality-specific noise, infrastructure limitations, privacy, and ethics, and identify future directions toward scalable, interpretable, and responsible MML for urban systems. This survey serves as a reference for AI researchers, urban planners, and policymakers seeking to understand, design, and deploy multimodal learning solutions for intelligent urban sensing frameworks.