Machine Learning for Prediction of Patients on Hemodialysis with an Undetected SARS-CoV-2 Infection

This article has been Reviewed by the following groups

Read the full article See related articles

Abstract

We developed a machine learning (ML) model that predicts the risk of a patient on hemodialysis (HD) having an undetected SARS-CoV-2 infection that is identified after the following ≥3 days.

Methods

As part of a healthcare operations effort, we used patient data from a national network of dialysis clinics (February–September 2020) to develop an ML model (XGBoost) that uses 81 variables to predict the likelihood of an adult patient on HD having an undetected SARS-CoV-2 infection that is identified in the subsequent ≥3 days. We used a 60%:20%:20% randomized split of COVID-19–positive samples for the training, validation, and testing datasets.

Results

We used a select cohort of 40,490 patients on HD to build the ML model (11,166 patients who were COVID-19 positive and 29,324 patients who were unaffected controls). The prevalence of COVID-19 in the cohort (28% COVID-19 positive) was by design higher than the HD population. The prevalence of COVID-19 was set to 10% in the testing dataset to estimate the prevalence observed in the national HD population. The threshold for classifying observations as positive or negative was set at 0.80 to minimize false positives. Precision for the model was 0.52, the recall was 0.07, and the lift was 5.3 in the testing dataset. Area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC) for the model was 0.68 and 0.24 in the testing dataset, respectively. Top predictors of a patient on HD having a SARS-CoV-2 infection were the change in interdialytic weight gain from the previous month, mean pre-HD body temperature in the prior week, and the change in post-HD heart rate from the previous month.

Conclusions

The developed ML model appears suitable for predicting patients on HD at risk of having COVID-19 at least 3 days before there would be a clinical suspicion of the disease.

Article activity feed

  1. SciScore for 10.1101/2020.06.15.20131680: (What is this?)

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board StatementIRB: This analysis was performed in adherence with the Declaration of Helsinki under a protocol reviewed by New England Independent Review Board (NEIRB).
    Consent: This retrospective analysis was determined to be exempt and did not require consent (Needham Heights, MA, United States; NEIRB#1-17-1302368-1).
    Randomizationnot detected.
    Blindingnot detected.
    Power Analysisnot detected.
    Sex as a biological variablenot detected.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a protocol registration statement.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.