Public Covid-19 X-ray datasets and their impact on model bias – A systematic review of a significant problem

This article has been reviewed by the following groups

Abstract

No abstract available

Article activity feed

  1. SciScore for 10.1101/2021.02.15.21251775:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    Sentence: “For the indirect search, PubMed and preprint services (medRxiv, bioRxiv and arXiv) were queried with the search terms “COVID-19” & “X-ray” & “dataset”.”
    Resources:
      PubMed - suggested: (PubMed, RRID:SCR_004846)
      bioRxiv - suggested: (bioRxiv, RRID:SCR_003933)
      arXiv - suggested: (arXiv, RRID:SCR_006500)

    Sentence: “In the parallel indirect search to yield datasets from papers, all papers with fewer than 10 citations based on Google Scholar were filtered out and the remaining papers were analysed to extract the datasets employed.”
    Resources:
      Google Scholar - suggested: (Google Scholar, RRID:SCR_008878)
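
    The indirect search summarized in Table 2 could, in principle, be scripted. The sketch below is an assumption about how such a query might look, not the authors' pipeline: it sends the same conjunction of terms to PubMed via NCBI's public E-utilities esearch endpoint. The preprint servers (medRxiv, bioRxiv, arXiv) expose their own separate APIs, and the second step, filtering out papers with fewer than 10 Google Scholar citations, would need an additional source of citation counts, since Google Scholar has no official API.

```python
# Illustrative sketch only: one way to run the Table 2 PubMed query.
# Endpoint and parameters are NCBI's public E-utilities esearch API;
# this is an assumption about how the search could be scripted,
# not the authors' actual pipeline.
import requests

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def search_pubmed(term: str, retmax: int = 100) -> list:
    """Return PubMed IDs (PMIDs) matching the query term."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    resp = requests.get(EUTILS_ESEARCH, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]

if __name__ == "__main__":
    # Same conjunction of terms as the reported indirect search.
    pmids = search_pubmed('"COVID-19" AND "X-ray" AND "dataset"')
    print(f"{len(pmids)} PubMed records matched; first few: {pmids[:5]}")
```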

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Some authors (Cohen et al. (2020c)) have already acknowledged that many datasets do not represent the real-world distribution of cases, that the presence of selection bias is highly probable (particularly in case study collections), and therefore that clinical claims must take these limitations into account. However, the first step to tackle these issues is to have a good description of datasets in order to implement some strategy to reduce the bias, or at least to be fully aware of model limitations and range of applicability. Unknown confounders and collider bias are not as problematic in prediction models as they are in causal inference (Griffith et al. (2020); Wynants et al. (2020)). However, model generalizability is compromised and its prediction power can only be maintained when training and target populations remain similar and go through the same sampling mechanism. Even in this particular case, specifying the optimal target population cannot be done without knowing the training population characteristics. There have been some recent efforts to address the general problem of bias in AI, and in particular regarding the use of human data. In Mitchell et al. (2019), for example, the authors encourage transparent model reporting and propose a framework to describe many aspects of model building, including dataset description. General considerations about clinical prediction models (Steyerberg (2009)) are as relevant in AI models as in linear regression models, although in th...
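
    The generalizability point above, that a model's predictive power is only maintained when training and target populations are similar and drawn through the same sampling mechanism, can be made concrete with a fully synthetic sketch. All names and numbers below (the weak "clinical" signal, the "source" confounder, the sampling probabilities) are invented for illustration and are not data from the review.

```python
# Synthetic illustration of the sampling-mechanism point: a classifier that
# exploits a dataset-level confounder (here, "source") looks accurate on data
# drawn the same way as the training set, but degrades when the target
# population is sampled differently. All numbers are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, p_source_given_pos, p_source_given_neg):
    """Draw labels plus two features: a weakly informative 'clinical' signal and
    a 'source' indicator whose association with the label depends on sampling."""
    y = rng.integers(0, 2, size=n)
    clinical = y * 0.5 + rng.normal(0, 1, size=n)       # weak true signal
    p_source = np.where(y == 1, p_source_given_pos, p_source_given_neg)
    source = rng.binomial(1, p_source)                   # confounded acquisition site
    return np.column_stack([clinical, source]), y

# Training sample: positives almost always come from one source (selection bias).
X_train, y_train = sample(5000, p_source_given_pos=0.9, p_source_given_neg=0.1)
# Held-out data drawn by the same biased mechanism as training.
X_test, y_test = sample(5000, p_source_given_pos=0.9, p_source_given_neg=0.1)
# Target population: source is unrelated to the label.
X_target, y_target = sample(5000, p_source_given_pos=0.5, p_source_given_neg=0.5)

model = LogisticRegression().fit(X_train, y_train)
print("accuracy, same sampling mechanism:", model.score(X_test, y_test))
print("accuracy, shifted target population:", model.score(X_target, y_target))
```

    In this synthetic setup the classifier leans on the source feature, so accuracy looks strong on data drawn by the same biased mechanism and drops on the differently sampled target population.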

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
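
    As a rough illustration of the kind of formulaic information such a tool looks for, the snippet below scans text for RRID-style identifiers with a regular expression; the pattern and behaviour are assumptions for demonstration, not SciScore's actual implementation.

```python
# Rough illustration only: detect RRID-style identifiers (e.g. "RRID:SCR_004846")
# in manuscript text. This is not SciScore's implementation.
import re

RRID_PATTERN = re.compile(r"RRID:\s*([A-Z]+_?\w+)")

def find_rrids(text: str) -> list:
    """Return all RRID accession strings mentioned in the text."""
    return RRID_PATTERN.findall(text)

sentence = ("PubMed (RRID:SCR_004846) and bioRxiv (RRID:SCR_003933) "
            "were queried for COVID-19 X-ray datasets.")
print(find_rrids(sentence))   # ['SCR_004846', 'SCR_003933']
```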