Deep learning models for COVID-19 chest x-ray classification: Preventing shortcut learning using feature disentanglement

Anusua Trivedi
Caleb Robinson
Marian Blazes
Anthony Ortiz
Jocelyn Desbiens
Sunil Gupta
Rahul Dodhia
Pavan K. Bhatraju
W. Conrad Liles
Jayashree Kalpathy-Cramer
Aaron Y. Lee
Juan M. Lavista Ferres

This article has been reviewed by the following group: ScreenIT.


Listed in: Evaluated articles (ScreenIT)

Abstract

In response to the COVID-19 global pandemic, recent research has proposed deep-learning-based models that use chest radiographs (CXRs) for a variety of clinical tasks to help manage the crisis. However, existing datasets of CXRs from COVID-19+ patients are relatively small, and researchers often pool CXR data from multiple sources, for example, images acquired with different x-ray machines in different patient populations under different clinical scenarios. Deep learning models trained on such pooled datasets have been shown to overfit to spurious features instead of learning pulmonary characteristics, a phenomenon known as shortcut learning. We propose adding feature disentanglement to the training process. This technique forces the models to identify pulmonary features from the images and penalizes them for learning features that can discriminate between the original datasets the images come from. We find that models trained in this way have better generalization performance on unseen data; in the best case, it improved AUC by 0.13 on held-out data. We further find that this approach outperforms two recently proposed methods for removing biases in CXR datasets: masking out the non-lung parts of the CXRs and performing histogram equalization.
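The abstract describes the dataset-source penalty only at a high level. One common way to implement this kind of penalty, and an assumption here rather than a description of the authors' architecture, is adversarial training with gradient reversal: a shared encoder feeds a task head (COVID-19+ or not) and an adversary head that tries to predict which source dataset an image came from, and the encoder is updated to minimize the task loss minus λ times the adversary's loss. The NumPy sketch below uses a toy linear encoder and logistic heads in place of a CNN; all names (`forward`, `grads`, `train_step`, `lam`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(p, t, eps=1e-12):
    # Mean binary cross-entropy between probabilities p and 0/1 targets t.
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))


def forward(X, W, w_y, w_s):
    h = X @ W               # shared features from a toy linear "encoder" (assumption)
    p_y = sigmoid(h @ w_y)  # task head: probability the image is COVID-19+
    p_s = sigmoid(h @ w_s)  # adversary head: probability the image came from source 1
    return h, p_y, p_s


def encoder_objective(X, y, s, W, w_y, w_s, lam):
    # What the encoder minimizes: low task loss, high adversary (source) loss.
    _, p_y, p_s = forward(X, W, w_y, w_s)
    return bce(p_y, y) - lam * bce(p_s, s)


def grads(X, y, s, W, w_y, w_s, lam):
    n = X.shape[0]
    h, p_y, p_s = forward(X, W, w_y, w_s)
    r_y = (p_y - y) / n  # d(mean BCE)/d(logit) for the task head
    r_s = (p_s - s) / n  # same for the adversary head
    g_wy = h.T @ r_y     # task head descends the task loss
    g_ws = h.T @ r_s     # adversary descends the source-prediction loss
    # Gradient reversal: the adversary's gradient reaches the encoder negated and scaled by lam.
    g_W = X.T @ (np.outer(r_y, w_y) - lam * np.outer(r_s, w_s))
    return g_W, g_wy, g_ws


def train_step(X, y, s, W, w_y, w_s, lam=1.0, lr=0.1):
    # One simultaneous gradient step for the encoder, the task head and the adversary.
    g_W, g_wy, g_ws = grads(X, y, s, W, w_y, w_s, lam)
    return W - lr * g_W, w_y - lr * g_wy, w_s - lr * g_ws
```

Setting `lam=0` recovers ordinary training; larger values push the shared features toward carrying less information about the source dataset, possibly at some cost to task fit. In a real CXR model the linear encoder would be a convolutional network and the hand-written gradients would come from an autodiff framework.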

Version published to 10.1371/journal.pone.0274098
Oct 6, 2022
ScreenIT
Mar 1, 2021
SciScore for 10.1101/2021.02.11.20196766:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
No key resources detected.
Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
Clinicians should be aware of potential limitations and biases when incorporating model predictions into their clinical assessment. Finally, our approach has potential clinical applications beyond automated diagnosis. CXR diagnostic models that rely on relevant pulmonary findings may also be useful for the development of prognostic models, by combining the CXR information with other clinical and demographic data to predict which patients are at risk for severe disease.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
No funding statement was detected.
No protocol registration statement was detected.
About SciScore
SciScore is an automated tool designed to assist expert reviewers by finding formulaic information scattered throughout a paper and presenting it in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) and for rigor criteria such as sex and investigator blinding.
Version published to 10.1101/2021.02.11.20196766 on medRxiv
Feb 13, 2021
