Sensing in Smart Cities: A Multimodal Machine Learning Perspective
Abstract
Smart cities rely on diverse sensing infrastructures that generate vast multimodal data from IoT devices, surveillance systems, health monitors and environmental sensors. The seamless integration and interpretation of such multimodal data is key to enabling intelligent decision-making and adaptive urban services. Multimodal machine learning (MML) offers a powerful paradigm that surpasses traditional unimodal and rule-based methods, enabling the effective integration of heterogeneous data from diverse applications across the transportation, public safety, healthcare and environmental monitoring domains in smart cities. This review surveys the role of MML in smart city sensing, covering key data modalities, enabling technologies and state-of-the-art MML techniques such as fusion methods and deep learning-based architectures. We identify leading challenges both in MML methods, including alignment, scalability and modality-specific noise, and in urban deployment scenarios, including infrastructure constraints, privacy concerns and ethical implications. Finally, we suggest future research directions toward the development of scalable, interpretable and ethically informed MML systems for smart cities. This survey serves as a reference for AI researchers, urban planners and policymakers seeking to understand, design and deploy multimodal learning solutions for complex urban environments.