Street images classification according to COVID-19 risk in Lima, Peru: a convolutional neural networks feasibility analysis


Abstract

During the COVID-19 pandemic, convolutional neural networks (CNNs) have been used in clinical medicine (e.g., X-ray classification). Whether CNNs can inform the epidemiology of COVID-19 by classifying street images according to COVID-19 risk is unknown, yet such classification could pinpoint high-risk places and relevant features of the built environment. In a feasibility study, we trained CNNs to classify the area surrounding bus stops in Lima, Peru, as moderate or extreme COVID-19 risk.

Design

CNN analysis based on images of bus stops and their surrounding areas. We used transfer learning and updated the output layer of five CNNs: NASNetLarge, InceptionResNetV2, Xception, ResNet152V2 and ResNet101V2. We chose the best-performing CNN, which was then further tuned. We used Grad-CAM to understand the classification process.
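The transfer-learning setup described above (a pretrained backbone with only the output layer retrained) can be sketched in Keras. The backbone choice, input size and two-class softmax head below are illustrative assumptions, not the authors' exact configuration:

```python
# Sketch: transfer learning with a frozen pretrained backbone and a new
# two-class output layer (moderate vs extreme COVID-19 risk).
# Backbone and input size are illustrative; the paper compares five
# architectures (NASNetLarge, InceptionResNetV2, Xception, ResNet152V2,
# ResNet101V2), all pretrained on ImageNet.
import tensorflow as tf
from tensorflow.keras import layers, models


def build_classifier(input_shape=(224, 224, 3), n_classes=2,
                     weights="imagenet"):
    # Pretrained convolutional base without its original classification head.
    base = tf.keras.applications.ResNet101V2(
        include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = False  # freeze the backbone; train only the new head

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(n_classes, activation="softmax"),  # moderate / extreme
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Freezing the backbone means only the final dense layer's parameters are learned, which is why a relatively small image set can still train a usable classifier.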

Setting

Bus stops from Lima, Peru. We used five images per bus stop.

Primary and secondary outcome measures

Bus stop images were classified according to COVID-19 risk into two labels: moderate or extreme.

Results

NASNetLarge outperformed the other CNNs except on recall for the moderate label and precision for the extreme label, where ResNet152V2 performed better (85% vs 76% and 63% vs 60%, respectively). NASNetLarge was further tuned. The best recall (75%) and F1 score (65%) for the extreme label were reached with data augmentation techniques. Areas close to buildings or with people were often classified as extreme risk.
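Data augmentation of the kind credited above with the best extreme-label recall can be sketched with simple random transforms. The specific transforms (flip, brightness jitter) and their ranges here are assumptions for illustration, not the paper's exact pipeline:

```python
# Sketch: simple random augmentation to expand a small or imbalanced image
# set. Transforms and ranges are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)


def augment(image):
    """Randomly flip and brightness-jitter one image (H, W, C, values in [0, 1])."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]          # horizontal flip
    shift = rng.uniform(-0.1, 0.1)     # small brightness shift
    return np.clip(out + shift, 0.0, 1.0)
```

Applying such transforms at training time yields a different variant of each image every epoch, which is especially useful for a rare label like extreme risk.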

Conclusions

This feasibility study showed that CNNs have the potential to classify street images according to levels of COVID-19 risk. In addition to applications in clinical medicine, CNNs and street images could advance the epidemiology of COVID-19 at the population level.

Article activity feed

  1. SciScore for 10.1101/2021.09.06.21263188:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    | Sentences | Resources |
    | --- | --- |
    | We used the Python libraries google_streetview. | Python (suggested: IPython, RRID:SCR_001658) |

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Future work should take under consideration the potential caveats of imbalanced data to improve the classification accuracy for the extreme risk label and other labels with fewer observations. Ongoing and future work includes the development of a classification model for the four outcome labels (i.e., moderate, high, very high and extreme COVID-19 risk). We will implement techniques that can potentiate the classification capacity of the neural networks, including ensemble models,11 novel loss functions not currently implemented in the Keras environment (e.g., squared earth mover’s distance-based loss function),12 and we may try other architectures (e.g., SqueezeNet13) with similar precision yet less computationally expensive. Strengths and limitations: In this preliminary work, we followed a pre-defined analytical protocol which included transfer learning leveraging on large and deep neural networks trained with millions of images (ImageNet). We still had to train the parameters of the output layer, for which we did not have a massive number of images. Future work could expand our analysis with information and images from more bus stops or other public spaces to train a more robust model. Ideally, these images should come from different cities. This information may be available in other countries. There are further limitations we must acknowledge. First, the images and labels were not synchronic; that is, the figures and the labels were not collected on the same date. This is...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.