Development of a convolutional neural network to differentiate among the etiology of similar appearing pathological B lines on lung ultrasound: a deep learning study

Abstract

Lung ultrasound (LUS) is a portable, low-cost respiratory imaging tool, but its use is challenged by user dependence and a lack of diagnostic specificity. It is unknown whether the advantages of LUS implementation could be paired with deep learning (DL) techniques to match or exceed human-level diagnostic specificity among similar-appearing pathological LUS images.

Design

A convolutional neural network (CNN) was trained on LUS images with B lines of different aetiologies. CNN diagnostic performance, as validated using a 10% data holdback set, was compared with surveyed LUS-competent physicians.
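
To make the design concrete, a minimal training sketch is given below. It is not the authors' code: it assumes a TensorFlow/Keras pipeline over frames exported from the LUS clips into one folder per class, a small placeholder CNN stands in for the study's Xception-based model (named in the resources table of the activity feed below), and all paths, image sizes and hyperparameters are illustrative.

```python
# Minimal sketch of the design above, not the authors' code. Assumes LUS video
# frames have been exported to one folder per class (lus_frames/COVID, ...).
# Paths, image size and hyperparameters are illustrative.
import tensorflow as tf

CLASSES = ["COVID", "NCOVID", "HPE"]
IMG_SIZE = (128, 128)

# 10% of the frames are held back for validation. NB: the study's holdback was
# constructed to keep patients separate; a naive frame-level split like this
# one can leak near-duplicate frames between the two sets.
train_ds, val_ds = tf.keras.utils.image_dataset_from_directory(
    "lus_frames",
    validation_split=0.10,
    subset="both",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=32,
    label_mode="categorical",
)

# Small placeholder CNN; the study itself used an Xception-based model
# (see the resources table in the activity feed below).
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=IMG_SIZE + (3,)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(len(CLASSES), activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)
```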

Setting

Two tertiary Canadian hospitals.

Participants

612 LUS videos (121 381 frames) of B lines from 243 distinct patients with (1) COVID-19 (COVID), (2) non-COVID acute respiratory distress syndrome (NCOVID) or (3) hydrostatic pulmonary edema (HPE).

Results

The trained CNN performance on the independent dataset showed an ability to discriminate among COVID (area under the receiver operating characteristic curve (AUC) 1.0), NCOVID (AUC 0.934) and HPE (AUC 1.0) pathologies. This was significantly better than physician ability (AUCs of 0.697, 0.704 and 0.967 for the COVID, NCOVID and HPE classes, respectively; p<0.01).
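
For reference, the per-class AUCs quoted above correspond to one-vs-rest ROC curves over the holdback predictions. The snippet below is an illustrative sketch (not the authors' code) using scikit-learn's roc_auc_score; the label and probability arrays are placeholders.

```python
# Illustrative one-vs-rest ROC AUC per class; y_true and y_prob are placeholders
# standing in for the holdback labels and the model's softmax outputs.
import numpy as np
from sklearn.metrics import roc_auc_score

CLASSES = ["COVID", "NCOVID", "HPE"]

y_true = np.array([0, 1, 2, 0, 2, 1])          # integer class labels
y_prob = np.array([[0.90, 0.05, 0.05],          # softmax outputs, shape (n, 3)
                   [0.20, 0.70, 0.10],
                   [0.10, 0.10, 0.80],
                   [0.60, 0.30, 0.10],
                   [0.05, 0.15, 0.80],
                   [0.30, 0.60, 0.10]])

for i, name in enumerate(CLASSES):
    auc = roc_auc_score((y_true == i).astype(int), y_prob[:, i])
    print(f"{name}: AUC = {auc:.3f}")
```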

Conclusions

A DL model can distinguish similar-appearing LUS pathology, including COVID-19, that cannot be distinguished by humans. The performance gap between humans and the model suggests that subvisible biomarkers could exist within ultrasound images, and that multicentre research is merited.

Article activity feed

  1. SciScore for 10.1101/2020.10.13.20212258:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    Sentence: "The model’s weights were initialized with weights originating from an Xception model that was pre-trained on ImageNet [19], which are freely available via TensorFlow."
    Resource: TensorFlow (suggested: tensorflow, RRID:SCR_016345)

    Sentence: "All code was written in Python 3.7."
    Resource: Python (suggested: IPython, RRID:SCR_001658)
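
    A minimal sketch of the weight-initialization step quoted above, assuming the TensorFlow/Keras Xception application: Keras downloads the ImageNet-pretrained weights, which then serve as the starting point for a new classifier head. The three-class head and pooling choice here are assumptions, not the authors' exact configuration.

    ```python
    # Sketch of initializing a model from ImageNet-pretrained Xception weights
    # (downloaded by Keras); the classifier head below is an assumption.
    from tensorflow.keras import Model, layers
    from tensorflow.keras.applications import Xception

    backbone = Xception(weights="imagenet", include_top=False, pooling="avg")
    head = layers.Dense(3, activation="softmax")(backbone.output)  # COVID / NCOVID / HPE
    model = Model(backbone.input, head)
    model.summary()
    ```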

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Our study has some important limitations. The first relates to the opaqueness that is implicit to deep neural networks. Despite using Grad-CAM, the decisions by the trained model are not outwardly justified and we are unable to critique its methods and must trust its predictions. Our benchmarking survey did not exactly replicate the questions posed to our neural network which made our statistical analysis more complex than it might have needed to be otherwise. The other limitations of our study related to the data size and sources. Though our model performance signal was strong, the addition of further training data can only aid with generalizability of the model. Future work will focus on validation through multi-center collation of COVID LUS encounters. Lastly, our data were all from hospitalized patients and our results may not generalize to those who are less ill.
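
    For readers unfamiliar with the Grad-CAM technique mentioned in these limitations, the sketch below is an assumed re-implementation for a tf.keras classifier, not the authors' code; `model`, `last_conv_layer_name` and `img` (a preprocessed batch containing one frame) must be supplied by the reader.

    ```python
    # Assumed Grad-CAM re-implementation for a tf.keras classifier.
    import tensorflow as tf

    def grad_cam(model, img, last_conv_layer_name, class_index=None):
        # Map the input to both the last conv feature maps and the predictions.
        grad_model = tf.keras.Model(
            model.input,
            [model.get_layer(last_conv_layer_name).output, model.output],
        )
        with tf.GradientTape() as tape:
            conv_out, preds = grad_model(img)
            if class_index is None:
                class_index = int(tf.argmax(preds[0]))
            class_score = preds[:, class_index]
        grads = tape.gradient(class_score, conv_out)         # d(score)/d(feature maps)
        weights = tf.reduce_mean(grads, axis=(0, 1, 2))      # average gradient per channel
        cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)  # weighted sum of feature maps
        cam = tf.nn.relu(cam)                                # keep positive evidence only
        return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()   # normalised heatmap to overlay
    ```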

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.