Machine learning applied on chest x-ray can aid in the diagnosis of COVID-19: a first experience from Lombardy, Italy

This article has been Reviewed by the following groups

Read the full article See related articles

Abstract

Background

We aimed to train and test a deep learning classifier to support the diagnosis of coronavirus disease 2019 (COVID-19) using chest x-ray (CXR) on a cohort of subjects from two hospitals in Lombardy, Italy.

Methods

We used for training and validation an ensemble of ten convolutional neural networks (CNNs) with mainly bedside CXRs of 250 COVID-19 and 250 non-COVID-19 subjects from two hospitals (Centres 1 and 2). We then tested such system on bedside CXRs of an independent group of 110 patients (74 COVID-19, 36 non-COVID-19) from one of the two hospitals. A retrospective reading was performed by two radiologists in the absence of any clinical information, with the aim to differentiate COVID-19 from non-COVID-19 patients. Real-time polymerase chain reaction served as the reference standard.

Results

At 10-fold cross-validation, our deep learning model classified COVID-19 and non-COVID-19 patients with 0.78 sensitivity (95% confidence interval [CI] 0.74–0.81), 0.82 specificity (95% CI 0.78–0.85), and 0.89 area under the curve (AUC) (95% CI 0.86–0.91). For the independent dataset, deep learning showed 0.80 sensitivity (95% CI 0.72–0.86) (59/74), 0.81 specificity (29/36) (95% CI 0.73–0.87), and 0.81 AUC (95% CI 0.73–0.87). Radiologists’ reading obtained 0.63 sensitivity (95% CI 0.52–0.74) and 0.78 specificity (95% CI 0.61–0.90) in Centre 1 and 0.64 sensitivity (95% CI 0.52–0.74) and 0.86 specificity (95% CI 0.71–0.95) in Centre 2.

Conclusions

This preliminary experience based on ten CNNs trained on a limited training dataset shows an interesting potential of deep learning for COVID-19 diagnosis. Such tool is in training with new CXRs to further increase its performance.

Article activity feed

  1. SciScore for 10.1101/2020.04.08.20040907: (What is this?)

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    This study has some limitations. First, we trained our model on a limited number of cases, from the same geographical area. We could improve performances and generalizability of our model by adding new images, in particular from different geographical regions than Lombardy. Second, the external validation set was only temporally separate but not geographically separate from the training one and also relatively small. Third, we did not include other data such as clinical conditions such as symptoms and pulse oximeter data as complementary information to be given to the AI model and the human readers, a perspective to be explored in future studies. In conclusion, we preliminarily showed that a CNN-based AI system applied to bedside CXR in patients suspected to be infected with COVID-19, even though trained on a limited number of cases, enable to reach a 0.80 sensitivity and a 0.81 specificity in an independent temporally separate patient group. The system could be used as a second opinion tool in studies aimed at assessing its usefulness for improving the final sensitivity and specificity in different geographical and temporal setting. Its performance could be improved by training on larger multi-institutional datasets.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.