Multi-population generalizability of a deep learning-based chest radiograph severity score for COVID-19
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
To tune and test the generalizability of a deep learning-based model for assessment of COVID-19 lung disease severity on chest radiographs (CXRs) from different patient populations.
A published convolutional Siamese neural network-based model previously trained on hospitalized patients with COVID-19 was tuned using 250 outpatient CXRs. This model produces a quantitative measure of COVID-19 lung disease severity, the pulmonary x-ray severity (PXS) score. The model was evaluated on CXRs from 4 test sets, including 3 from the United States (patients hospitalized at an academic medical center (N = 154), patients hospitalized at a community hospital (N = 113), and outpatients (N = 108)) and 1 from Brazil (patients at an academic medical center emergency department (N = 303)). Radiologists from both countries independently assigned reference standard CXR severity scores, which were correlated with the PXS scores as a measure of model performance (Pearson R). The Uniform Manifold Approximation and Projection (UMAP) technique was used to visualize the neural network results.
After tuning with outpatient data, the deep learning model showed high performance in the 2 United States hospitalized patient datasets (R = 0.88 and R = 0.90, compared to baseline R = 0.86). Model performance was similar, though slightly lower, when tested on the United States outpatient and Brazil emergency department datasets (R = 0.86 and R = 0.85, respectively). UMAP showed that the model learned disease severity information that generalized across test sets.
A deep learning model that extracts a COVID-19 severity score from CXRs showed generalizable performance across multiple populations from 2 continents, including outpatients and hospitalized patients.
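As an illustration of the performance measure described in the abstract, the sketch below averages several radiologists' severity ratings per image into a reference standard and then computes the Pearson correlation between that reference and the model's PXS scores. It is a minimal pure-Python example, not the authors' code (the study reports using the scipy package for statistics). The function names `mean_rater_score` and `pearson_r` and all numeric values are hypothetical and made up for illustration.

```python
import math


def mean_rater_score(ratings_per_image):
    """Average the ratings of all raters for each image (hypothetical helper)."""
    return [sum(ratings) / len(ratings) for ratings in ratings_per_image]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the summed squared deviations
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Made-up example data: two raters per image, five images.
rater_scores = [[0, 1], [2, 3], [4, 4], [6, 7], [8, 8]]
pxs_scores = [0.5, 2.1, 3.9, 6.8, 8.2]

reference = mean_rater_score(rater_scores)
r = pearson_r(pxs_scores, reference)
print(f"Pearson R = {r:.3f}")
```

Averaging across raters before correlating is one way to reduce the effect of inter-rater variability on the reference standard; the exact scoring scale and number of raters used in the study are not given on this page.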
Article activity feed
SciScore for 10.1101/2020.09.15.20195453:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
Sentence: The model training was implemented in Python (version 3.6.9) with the Pytorch package (version 1.5.0), using the Adam optimizer [19] (initial learning rate = 0.00002, β1 = 0.9, β2 = 0.999).
Resource: Pytorch; suggested: (PyTorch, RRID:SCR_018536)
Sentence: Statistical tests were performed using the scipy Python package (version 1.1.0), with an a priori threshold for statistical significance set at P<0.05.
Resource: Python; suggested: (IPython, RRID:SCR_001658)
Results from OddPub: Thank you for sharing your code.
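To make the quoted optimizer settings concrete, the sketch below applies the standard Adam update rule to a single scalar parameter using the reported values (initial learning rate 0.00002, β1 = 0.9, β2 = 0.999). It is a pure-Python illustration of the update rule, not the authors' PyTorch training code. The epsilon value is an assumption (it is not stated here), and the function name `adam_step` and the toy loss are hypothetical.

```python
import math

LEARNING_RATE = 0.00002  # initial learning rate quoted above
BETA1 = 0.9              # decay rate for the first-moment estimate
BETA2 = 0.999            # decay rate for the second-moment estimate
EPS = 1e-8               # assumption: not stated in the source


def adam_step(param, grad, m, v, t,
              lr=LEARNING_RATE, beta1=BETA1, beta2=BETA2, eps=EPS):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy example: minimize loss(w) = w**2, whose gradient is 2 * w.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    grad = 2.0 * w
    w, m, v = adam_step(w, grad, m, v, t)
print(w)
```

Because of the bias correction, the first step moves the parameter by roughly the learning rate regardless of the gradient's magnitude, which is why such a small learning rate yields correspondingly small weight changes per step when tuning a pretrained model.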
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
There are limitations to this study. First, the reference standard label used for disease severity assessment on CXRs is determined by radiologists, which has inherent variability. We used the average of multiple radiologist raters for the reference standard to decrease the variability in this study. However, other reference standards such as CT-derived scores may be promising, as has been found using digitally reconstructed radiographs from CT [23]. Second, while studying the technical properties of deep learning-based models like PXS score is necessary, making such CXR-based severity scores clinically useful in addressing the COVID-19 pandemic is a different avenue of important research. Future work into how radiologists and other clinicians can use the PXS score (and other developed lung disease severity scores) to guide patient management or workflows will be essential to deliver value.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.