A Machine Learning Model Incorporating Laboratory Blood Tests Discriminates Between SARS-CoV-2 and Influenza Infections at Emergency Department Visit

Abstract

Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and influenza virus are contagious respiratory pathogens that cause similar symptoms but require different treatment and management strategies. This study investigated whether laboratory blood tests can discriminate between SARS-CoV-2 and influenza infections at emergency department (ED) presentation.

Methods

We retrospectively analyzed 723 influenza A/B-positive (2018/1/1 to 2020/3/15) and 1,281 SARS-CoV-2-positive (2020/3/11 to 2020/6/30) ED patients. Laboratory test results completed within 48 hours before the virus RT-PCR results were reported, together with patient demographics, were used to train and validate a random forest (RF) model. The dataset was randomly divided into training (2/3) and testing (1/3) sets with the same SARS-CoV-2/influenza ratio. The Shapley Additive Explanations technique was used to visualize the impact of each laboratory test on the differentiation.
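
The split-and-train procedure described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the feature matrix is random, the 18 columns only stand in for 15 laboratory tests plus three demographic variables, and the number of trees and the random seeds are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cohort: 2,004 rows (723 + 1,281 in the abstract)
# and 18 numeric columns. The values are random and carry no clinical meaning.
X, y = make_classification(
    n_samples=2004,
    n_features=18,
    n_informative=6,
    weights=[723 / 2004],  # approximate share of class 0 ("influenza")
    random_state=0,
)

# Random 2/3 vs 1/3 split; stratify=y keeps the class ratio the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=0
)

# Assumed hyperparameters; the abstract does not report the ones actually used.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# AUC is computed from the predicted probability of class 1 on the held-out set.
scores = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, scores)

print(f"class-1 share, train vs test: {y_train.mean():.3f} vs {y_test.mean():.3f}")
print(f"test-set AUC on synthetic data: {auc:.2f}")
```

The AUC printed here reflects only the synthetic data and says nothing about the study's results; the point of the sketch is the stratified split and the use of held-out predicted probabilities for evaluation.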

Results

The RF model incorporating results from 15 laboratory tests and demographic characteristics discriminated SARS-CoV-2 and influenza infections, with an area under the ROC curve (AUC) of 0.90 in the independent testing set. The overall agreement with the RT-PCR results was 83% (95% CI: 80-86%). The test with the greatest impact on the differentiation was serum total calcium level. Further, the model achieved an AUC of 0.82 in a new dataset including 519 SARS-CoV-2 ED patients (2020/12/1 to 2021/2/28) and the previous 723 influenza positive patients. Serum calcium level remained the most impactful feature for the differentiation.
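
As a rough plausibility check on the reported interval, the sketch below computes a normal-approximation (Wald) confidence interval for an agreement proportion of 0.83. Two things are assumptions here: the testing-set size of 668 is inferred as one third of 723 + 1,281 = 2,004 patients, and the abstract does not say which interval method the authors used.

```python
import math


def wald_ci(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Assumption: the testing set holds one third of the 2,004 patients.
n_test = 2004 // 3

low, high = wald_ci(0.83, n_test)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval rounds to 80% to 86%, in line with the reported range.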

Conclusion

We identified characteristic laboratory test profiles differentiating SARS-CoV-2 and influenza infections, which may be useful in preparing for overlapping COVID-19 resurgence and future seasonal influenza.

Article activity feed

  1. SciScore for 10.1101/2021.08.06.21261713:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethics (IRB): This study was approved by the Institutional Review Board (IRB) of Weill Cornell Medicine and deemed IRB exempt by the University of Buffalo.
    Sex as a biological variable: not detected.
    Randomization: The whole data set was randomly split into a training set (2/3 of cases) and a testing set (1/3 cases) with the same ratio of SARS-CoV-2/influenza cases as the ratio for the overall cases.
    Blinding: not detected.
    Power Analysis: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentence: Subsequently, a random forest classifier model was developed incorporating the results of 15 selected laboratory tests and patient age, gender, and race, using the Python scikit-learn package 0.23.2.
    Resources:
    • Python, suggested: (IPython, RRID:SCR_001658)
    • scikit-learn, suggested: (scikit-learn, RRID:SCR_002577)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    A study limitation is that our model’s performance has not been validated in a dataset including concurrent SARS-CoV-2 and influenza positive patients as Influenza RT-PCR testing was suspended from March to September 2020 to prioritize resources for SARS-CoV-2 testing. We attempted to collect new data from November 2020 to February 2021, however, there was only one influenza positive case during this time in our hospital ED. This observation was consistent with the extremely low level of seasonal influenza in North America [12]. Despite a lack of direct comparison, the characteristic profile of SARS-CoV-2 in comparison to influenza infection is still valid and has the potential to impact patient care. The performance of our model could be further improved when it is trained with more concurrent influenza and SARS-CoV-2 patient data.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.