Algorithmic Fairness and Bias Mitigation for Clinical Machine Learning: Insights from Rapid COVID-19 Diagnosis by Adversarial Learning

Abstract

Machine learning is becoming increasingly prominent in healthcare. Although its benefits are clear, growing attention is being given to how machine learning may exacerbate existing biases and disparities. In this study, we introduce an adversarial training framework that is capable of mitigating biases that may have been acquired through data collection or magnified during model development. For example, if one class is over-represented, or if errors and inconsistencies in clinical practice are reflected in the training data, then a model can learn these biases. To evaluate our adversarial training framework, we used the statistical definition of equalized odds. We evaluated our model on the task of rapidly predicting COVID-19 for patients presenting to hospital emergency departments, and aimed to mitigate the regional (hospital) and ethnic biases present in the data. We trained our framework on a large, real-world COVID-19 dataset and showed that adversarial training measurably improves outcome fairness (with respect to equalized odds) while still achieving clinically effective screening performance (NPV > 0.98). We compared our method to the benchmark set by related previous work, and performed prospective and external validation on four independent hospital cohorts. Our method can be generalized to any outcome, model, and definition of fairness.
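The abstract outlines the core technique: a main network predicts COVID-19 status while an adversary tries to recover a protected attribute (hospital site or ethnicity) from its output; conditioning the adversary on the true label targets equalized odds, which requires equal true-positive and false-positive rates across groups. Below is a minimal sketch of this style of adversarial debiasing, assuming a PyTorch setup; the class names, layer sizes, and the lambda_adv weight are illustrative and do not reproduce the paper's actual architecture or hyperparameters.

```python
# A minimal sketch of adversarial debiasing for equalized odds, assuming a
# PyTorch setup. Names, layer sizes, and lambda_adv are illustrative; they
# are not the paper's actual architecture or hyperparameters.
import numpy as np
import torch
import torch.nn as nn


class Predictor(nn.Module):
    """Main network: predicts COVID-19 status from clinical features."""

    def __init__(self, n_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 1),  # logit for COVID-19 positive
        )

    def forward(self, x):
        return self.net(x)


class Adversary(nn.Module):
    """Tries to recover the protected attribute (e.g. hospital site or
    ethnicity) from the predictor's output. Conditioning on the true label
    y targets equalized odds: P(y_hat | a, y) should not depend on a."""

    def __init__(self, n_groups: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 32), nn.ReLU(),
            nn.Linear(32, n_groups),
        )

    def forward(self, y_logit, y_true):
        return self.net(torch.cat([y_logit, y_true], dim=1))


def training_step(pred, adv, opt_pred, opt_adv, x, y, a, lambda_adv=1.0):
    """One alternating update. x: features; y: float labels in {0, 1},
    shape (N, 1); a: long group indices, shape (N,)."""
    bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

    # 1) Update the adversary on detached predictor outputs.
    adv_loss = ce(adv(pred(x).detach(), y), a)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

    # 2) Update the predictor: classify well, make the adversary fail.
    y_logit = pred(x)
    pred_loss = bce(y_logit, y) - lambda_adv * ce(adv(y_logit, y), a)
    opt_pred.zero_grad(); pred_loss.backward(); opt_pred.step()
    return pred_loss.item(), adv_loss.item()


def equalized_odds_gaps(y_true, y_hat, a):
    """Largest across-group differences in TPR and FPR (both 0 under
    perfect equalized odds). Assumes every group has both outcomes."""
    tprs = [y_hat[(a == g) & (y_true == 1)].mean() for g in np.unique(a)]
    fprs = [y_hat[(a == g) & (y_true == 0)].mean() for g in np.unique(a)]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)
```

The intuition: once the adversary cannot beat chance at identifying the protected group from (y_logit, y), the predictor's error rates are approximately equal across groups, which is the equalized-odds criterion the abstract refers to.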

Article activity feed

  1. SciScore for 10.1101/2022.01.13.22268948:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Experimental Models: Organisms/Strains
    Sentences: They found that machine learning-based screening tests – CURIAL-1.0 [12] and CURIAL-Rapide/-Lab [13] – could rapidly detect COVID-19 amongst patients presenting to ED, and performed effectively as tests-of-exclusion (quick identification of patients who are most likely to test negative) during external validation across three NHS trusts.
    Resources: CURIAL-Rapide/-Lab (suggested: None)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    One limitation could be that our data was skewed with respect to the protected features. As we are using neural networks, different distributions of the protected label can give significantly different results for the adversary model. This has previously been discussed [16], as using balanced data was found to have a much stronger effect on adversarial training. Thus, future experiments would greatly benefit from balanced training data.

    With respect to debiasing against ethnicity, one limitation is the ambiguity of certain categories, namely, “Unknown,” “Mixed,” and “Other.” In our experiments, we kept these categories in order to maximize the number of cases (especially COVID-19 positive cases) used in training. This may have impacted the adversary network’s ability to confidently differentiate between different ethnicities, hindering its influence on the main network.

    Bias may also still exist with respect to data missingness. Although we used population median imputation to “fill-in” missing values, the nature of the missing data may have conveyed important information, or reflected biases such as differences in access, practice, or recording protocols.

    Another limitation is the difficulty in understanding how social, behavioral, and genetic factors independently and collectively impact outcomes. For example, consistent genetic effects across racial groups can result in genetic variants with a common biological effect; however, that effect can also be modified by both envi...
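    For context on the missingness point above, this is a minimal illustration of population-median imputation, assuming a pandas DataFrame; the column names are hypothetical and the paper's actual feature set is not reproduced here.

```python
import pandas as pd

# Hypothetical clinical features with missing values; column names are
# illustrative only, not the paper's feature set.
df = pd.DataFrame({
    "crp": [12.0, None, 33.5, 7.1],
    "lymphocytes": [1.1, 0.8, None, 1.6],
})

# Population median imputation: replace each missing value with the
# column-wise median over the whole cohort. As the quoted limitation notes,
# this discards any signal carried by the missingness pattern itself.
df_imputed = df.fillna(df.median(numeric_only=True))
print(df_imputed)
```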

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of the rigor criteria and the tools shown here, including references cited, please follow this link.