Transferring population group knowledge from multimodal large language model to small model: using urban safety perception evaluation as case study

Abstract

Human-centric urban planning curates environments that respond to people's behaviors, perceptions, and feelings. Conventional methods for understanding people's perceptions and feelings rely on questionnaires and surveys, which are time-consuming and labor-intensive. Recent studies have developed deep-learning-based computer vision tools to decipher safety perception. Yet these tools provide only a generalized approximation of human perception and fail to incorporate personalized and local assessments. Large language models (LLMs), trained on extensive human knowledge, have demonstrated the potential to simulate human understanding of images, space, and place. In view of these challenges and motivations, our study proposes an LLM-based framework that adapts general safety perception evaluation models, in a zero-shot manner, to specific demographics and perception scenarios. The framework first generates (1) general and (2) demographic- and scenario-specific textual descriptions for street view images (SVIs) in specific areas. The SVIs and general descriptions are then used to train a safety perception model, and the specific descriptions are used to fine-tune it via a pseudo-labeling strategy. To validate our method's effectiveness, we conduct experiments using individual-level safety perception data from Stockholm. The results show that the general model's accuracy decreased by 19.7–25 percent when evaluating the safety perception of a specific population group, while the accuracy of the model fine-tuned with our method improved by 14.9–24 percent. We further employ this framework to map the safety perceptions of four predefined demographic groups (middle-aged, elderly, women, and men) in Hong Kong across two perception scenarios: traffic accidents and crime. Our framework provides governments with a new tool for large-scale automated evaluation of urban perceptions across different groups.
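The abstract does not specify how the scenario-specific descriptions become training labels or what architecture the small model uses, so the following is only a toy sketch of pseudo-label fine-tuning under stated assumptions, not the authors' implementation. It assumes (hypothetically) that each SVI is already reduced to a feature vector, that the "small model" is a logistic-regression head giving P(perceived as safe), that an LLM has already turned each group- and scenario-specific description into a safety probability (`llm_scores`), and that only confident scores are kept as hard pseudo-labels. All names (`make_pseudo_labels`, `fine_tune`, `margin`, the two illustrative features) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy setup: each street view image (SVI) is a precomputed feature
# vector, and the "small model" is a logistic-regression head whose output is
# P(perceived as safe). This stands in for a real deep safety perception model.


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability that an SVI with features x is perceived as safe."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def make_pseudo_labels(llm_scores: Sequence[float], margin: float = 0.3) -> List[Tuple[int, int]]:
    """Turn LLM-derived safety probabilities into hard pseudo-labels.

    Only images whose score lies at least `margin` away from 0.5 are kept,
    so ambiguous group-specific descriptions do not drive fine-tuning.
    """
    labels = []
    for idx, p in enumerate(llm_scores):
        if abs(p - 0.5) >= margin:
            labels.append((idx, 1 if p > 0.5 else 0))
    return labels


def fine_tune(
    weights: Sequence[float],
    bias: float,
    features: Sequence[Sequence[float]],
    pseudo_labels: Sequence[Tuple[int, int]],
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Start from the general model's parameters and run SGD on the pseudo-labeled subset."""
    w = list(weights)
    b = bias
    for _ in range(epochs):
        for idx, y in pseudo_labels:
            x = features[idx]
            err = predict(w, b, x) - y  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


# Illustrative data (hypothetical): feature 0 ~ "wide sidewalk", feature 1 ~ "heavy traffic".
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
general_w, general_b = [1.0, 1.0], 0.0  # general model rates both cues as safe

# Hypothetical LLM scores for one group and scenario (e.g., elderly, traffic accidents).
llm_scores = [0.9, 0.1, 0.5, 0.5]
pseudo = make_pseudo_labels(llm_scores)  # ambiguous images 2 and 3 are dropped
specific_w, specific_b = fine_tune(general_w, general_b, features, pseudo)

print("general:", round(predict(general_w, general_b, features[1]), 3))
print("group-specific:", round(predict(specific_w, specific_b, features[1]), 3))
```

In this toy run the general model rates the heavy-traffic image as safe, while the fine-tuned copy, trained only on the confident pseudo-labels, rates it as unsafe for the hypothetical group. Filtering by `margin` is one common way to limit noise from pseudo-labels; the actual framework may weight, soften, or select labels differently.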

Article activity feed