Integrated Multimodal Data Pipelines for Intelligent Security Monitoring in Smart Cities
Abstract
This study proposes an integrated multimodal data pipeline designed for intelligent security monitoring in smart cities. A unified neural architecture was developed to simultaneously process video frames, acoustic signals, and environmental metrics through parallel encoding and late fusion. To address class imbalance and data scarcity, a hybrid training strategy combining synthetic and field-labeled samples was adopted, together with a weighted cross-entropy loss function. Experimental evaluations were conducted across three metropolitan districts with 1,800 hours of multimodal data and a total of 5,600 annotated event samples. The proposed system achieved an F1-score of 0.947, an average improvement of 8.3% over single-modality baselines. Even under sensor dropout conditions, performance degradation remained below 3%. In terms of efficiency, the model converged within 30 epochs and maintained an inference latency of 18.7 ms, comparable to lightweight baseline models while outperforming them in accuracy. Furthermore, Bayesian uncertainty estimation confirmed that 95% of predictions fell within their estimated confidence intervals, supporting the robustness and reliability of the proposed framework. These findings highlight the potential of integration-centric approaches for building scalable, fault-tolerant, and high-accuracy surveillance infrastructures in smart cities.
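The abstract describes parallel per-modality encoders joined by late fusion, but the paper's layer configuration is not given here. The following is a minimal PyTorch-style sketch of that pattern under assumed dimensions; the class name LateFusionClassifier, all feature sizes, and the optional dropout_mask hook (for emulating the sensor-dropout condition) are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Parallel per-modality encoders followed by late fusion.

    Hypothetical sketch: all dimensions and encoder choices are
    illustrative placeholders, not the published architecture.
    """

    def __init__(self, video_dim=512, audio_dim=128, env_dim=16,
                 hidden_dim=256, num_classes=8):
        super().__init__()
        # One independent encoder per modality (parallel encoding).
        self.video_enc = nn.Sequential(nn.Linear(video_dim, hidden_dim), nn.ReLU())
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
        self.env_enc = nn.Sequential(nn.Linear(env_dim, hidden_dim), nn.ReLU())
        # Late fusion: concatenate encoded features, then classify.
        self.head = nn.Linear(3 * hidden_dim, num_classes)

    def forward(self, video, audio, env, dropout_mask=None):
        feats = [self.video_enc(video), self.audio_enc(audio), self.env_enc(env)]
        if dropout_mask is not None:
            # Zero out features of unavailable sensors to emulate
            # the sensor-dropout robustness test described above.
            feats = [f * m for f, m in zip(feats, dropout_mask)]
        return self.head(torch.cat(feats, dim=-1))
```

Concatenation-then-classify is the simplest late-fusion variant; attention-based or gated fusion would slot into the same structure without changing the per-modality encoders.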
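The weighted cross-entropy loss named in the abstract can be expressed compactly. The class counts below are hypothetical placeholders (the paper reports 5,600 annotated events overall, not a per-class breakdown), and inverse-frequency weighting is one common choice that may differ from the authors' actual weights.

```python
import torch
import torch.nn as nn

# Hypothetical per-class event counts; rare classes get larger weights
# via inverse class frequency (one standard weighting scheme).
class_counts = torch.tensor([4200., 600., 450., 350.])
weights = class_counts.sum() / (len(class_counts) * class_counts)

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(32, 4)          # a batch of model outputs
labels = torch.randint(0, 4, (32,))  # ground-truth event classes
loss = criterion(logits, labels)     # imbalance-aware training loss
```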
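The abstract does not specify how the Bayesian uncertainty estimates were obtained. Monte Carlo dropout is one standard approximation; the sketch below assumes that choice purely for illustration and is not necessarily the authors' procedure.

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model, inputs, n_samples=30):
    """Monte Carlo dropout: keep dropout layers active at inference
    and average over stochastic forward passes, yielding a predictive
    mean and a per-class variance as an uncertainty estimate."""
    model.train()  # enables dropout during inference
    probs = torch.stack([
        torch.softmax(model(*inputs), dim=-1) for _ in range(n_samples)
    ])
    # inputs is assumed to be a tuple of modality tensors, e.g.
    # (video, audio, env) for the fusion model sketched earlier.
    return probs.mean(dim=0), probs.var(dim=0)
```

The per-class variance can then be thresholded to flag low-confidence detections for human review, which is consistent with the reliability framing in the abstract.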