Low-data, image-based urban noise estimation

Abstract

Self-supervised representation learning (SSL) methods have shown advances in learning invariant representations for computer vision tasks. However, their robustness under practical deployment conditions remains underexplored. Urban noise pollution represents a critical environmental challenge with significant public health implications. Traditional noise monitoring requires expensive sensor networks, limiting scalability. We introduce a machine learning framework that predicts background noise from casual smartphone imagery using only 400 samples. Our pipeline combines transformer-based semantic segmentation (SegFormer) with interpretable hand-crafted features (natural/man-made proportions, movable-object metrics, visual complexity) fed into Support Vector Regression. Critically, we introduce a perceptually grounded evaluation based on the ±3 dB just-noticeable-difference threshold established in psychoacoustic research. Perceptually adjusted evaluation (MAE = 2.92 dB, RMSE = 4.53 dB, R² = 0.61) demonstrates accuracy comparable to large-scale systems while requiring dramatically fewer resources, enabling scalable, citizen-science-driven urban acoustic monitoring.
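The abstract reports a "perceptually adjusted" error based on the ±3 dB just-noticeable-difference (JND) threshold, but does not spell out the formula. One plausible reading is an MAE in which prediction errors inside the JND band are treated as perceptually zero. The sketch below illustrates that interpretation only; the function name, the clipping rule, and the exact adjustment are assumptions, not the authors' published method.

```python
import numpy as np

JND_DB = 3.0  # +/-3 dB just-noticeable difference from psychoacoustics


def perceptual_mae(y_true, y_pred, jnd=JND_DB):
    """Hypothetical perceptually adjusted MAE.

    Absolute errors smaller than the JND band are counted as zero,
    since listeners cannot reliably perceive such differences; errors
    beyond the band contribute only their excess over the threshold.
    This is one plausible reading of the abstract's metric, not a
    reproduction of the paper's definition.
    """
    err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return float(np.mean(np.maximum(err - jnd, 0.0)))
```

Under this reading, a prediction of 57 dB against a ground truth of 55 dB would count as a perfect hit, while a 4 dB miss would contribute only 1 dB of adjusted error.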
