Domain Shifts in Machine Learning Based Covid-19 Diagnosis From Blood Tests

Abstract

Many previous studies claim to have developed machine learning models that diagnose COVID-19 from blood tests. However, we hypothesize that changes in the underlying distribution of the data, so-called domain shifts, affect the predictive performance and reliability and are a reason for the failure of such machine learning models in clinical application. Domain shifts can be caused, e.g., by changes in the disease prevalence (spreading or tested population), by refined RT-PCR testing procedures (way of taking samples, laboratory procedures), or by virus mutations. Therefore, machine learning models for diagnosing COVID-19 or other diseases may not be reliable and degrade in performance over time. We investigate whether domain shifts are present in COVID-19 datasets and how they affect machine learning methods. We further set out to estimate the mortality risk based on routinely acquired blood tests in a hospital setting throughout pandemics and under domain shifts. We reveal domain shifts by evaluating the models on a large-scale dataset with different assessment strategies, such as temporal validation. We present the novel finding that domain shifts strongly affect machine learning models for COVID-19 diagnosis and deteriorate their predictive performance and credibility. Therefore, frequent re-training and re-assessment are indispensable for robust models enabling clinical utility.
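To make the idea of temporal validation concrete, the following is a minimal sketch on purely synthetic data, not the authors' pipeline or dataset. It compares a random train/test split with a split at a cutoff date (train on earlier samples, test on later ones). Everything in it is an assumption made for illustration: the two-feature data generator, the deliberately extreme shift at the hypothetical `SHIFT_DATE`, the toy mean-difference scorer, and all function names.

```python
import random
from datetime import date, timedelta

SHIFT_DATE = date(2020, 9, 1)  # hypothetical date of the synthetic domain shift


def make_samples(n=2000, seed=0):
    """Synthetic stand-in for dated blood tests; NOT real clinical data."""
    rng = random.Random(seed)
    samples = []
    for i in range(n):
        day = date(2020, 1, 1) + timedelta(days=i * 366 // n)
        label = 1 if rng.random() < 0.3 else 0
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        # Before the shift, feature 0 carries the signal; afterwards, feature 1.
        x[0 if day < SHIFT_DATE else 1] += 2.0 * label
        samples.append({"date": day, "x": x, "y": label})
    return samples


def random_split(samples, test_fraction, seed=0):
    """Shuffle, then hold out a fraction; ignores the time order."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(samples, cutoff):
    """Train on samples dated before `cutoff`, test on the rest."""
    train = [s for s in samples if s["date"] < cutoff]
    test = [s for s in samples if s["date"] >= cutoff]
    return train, test


def fit(train):
    """Toy linear scorer: weights are the difference of the class means."""
    def mean(rows, j):
        return sum(r["x"][j] for r in rows) / len(rows)
    pos = [s for s in train if s["y"] == 1]
    neg = [s for s in train if s["y"] == 0]
    return [mean(pos, j) - mean(neg, j) for j in range(2)]


def roc_auc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def evaluate(weights, test):
    scores = [sum(w * v for w, v in zip(weights, s["x"])) for s in test]
    return roc_auc([s["y"] for s in test], scores)


samples = make_samples()
rand_train, rand_test = random_split(samples, test_fraction=0.3)
temp_train, temp_test = temporal_split(samples, cutoff=SHIFT_DATE)
auc_random = evaluate(fit(rand_train), rand_test)
auc_temporal = evaluate(fit(temp_train), temp_test)
print(f"random split AUC:   {auc_random:.2f}")
print(f"temporal split AUC: {auc_temporal:.2f}")
```

In this toy setting the random split mixes pre-shift and post-shift samples into both sets, so its estimate looks more favorable than the model's performance on data collected after the shift. The temporal split exposes that gap. Real shifts are usually subtler than the one simulated here.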

Article activity feed

  1. SciScore for 10.1101/2021.04.06.21254997:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: In particular, the model classes RF, KNN and SVM are trained with the scikit-learn package 0.22.1.
    Resource: scikit-learn
    Suggested: (scikit-learn, RRID:SCR_002577)

    Sentence: XGB is trained with the XGBClassifier from the Python package XGBoost 1.3.1.
    Resource: Python
    Suggested: (IPython, RRID:SCR_001658)

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    One limitation of our work could be that we did not evaluate the generalization of our model to other hospitals. A transfer of a COVID-19 diagnostic model should only be done with thorough re-assessments, as a domain shift between hospitals might be present. Besides others, such domain shifts from one institution to another could result from different testing strategies, laboratory equipment or demographics of the population in the hospital catchment area. Re-training of models rather than transferring to another hospital should be considered to obtain a skilled and trustworthy model. However, this is not part of our investigation. Our findings and suggestions about domain shifts should be accounted for in all hospitals when applying a COVID-19 model. We evaluate our models on different cohorts to show the high performance as well as to reveal the domain shifts. However, the 2020 cohort only contains subjects that were tested for COVID-19 and where a blood test was taken. Hence, the 2020 cohort only is a subset of the total patient cohort on which the model will be applied. To counteract missing samples from a particular group, we also use the pre-pandemic negatives, which should cover a wide variety of negatives due to the large data set. An evaluation of all blood tests of 2020 just is not possible due to the lack of RT-PCR tests which serve as labels in our ML approach. Non-tested subjects of 2020 cannot be assumed to be negatives, therefore we discard them. This could onl...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.