Computing SARS-CoV-2 Infection Risk From Symptoms, Imaging, and Test Data: Diagnostic Model Development



Abstract

Assigning meaningful probabilities of SARS-CoV-2 infection risk presents a diagnostic challenge across the continuum of care.

Objective

The aim of this study was to develop and clinically validate an adaptable, personalized diagnostic model to assist clinicians in ruling in and ruling out COVID-19 in patients who may have the disease. We compared the diagnostic performance of probabilistic, graphical, and machine learning models against a previously published benchmark model.

Methods

We integrated patient symptoms and test data using machine learning and Bayesian inference to quantify individual patient risk of SARS-CoV-2 infection. We trained models with 100,000 simulated patient profiles based on 13 symptoms, and we estimated local prevalence and imaging and molecular diagnostic performance from published reports. We tested these models with consecutive patients who presented with a COVID-19–compatible illness at the University of California San Diego Medical Center over the course of 14 days starting in March 2020.
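The Methods describe training on simulated patient profiles drawn from prevalence and per-symptom probabilities. The Python sketch below illustrates that idea under stated assumptions: it uses three hypothetical symptoms with made-up conditional probabilities (the study used 13 symptoms with values taken from published reports), and it scores a profile with a naive Bayes posterior, which assumes symptoms are conditionally independent given infection status. This is a simplification for illustration, not the authors' Bayesian inference network, distance metric learning, or ensemble model; the names `simulate_patients`, `posterior`, `PREVALENCE`, and `SYMPTOM_PROBS` are hypothetical.

```python
import random

# Hypothetical inputs for illustration only; NOT the values used in the study.
PREVALENCE = 0.10  # assumed pre-test probability of SARS-CoV-2 infection
# Assumed (P(symptom | infected), P(symptom | not infected)) for three symptoms.
SYMPTOM_PROBS = {
    "fever": (0.80, 0.30),
    "cough": (0.70, 0.40),
    "anosmia": (0.40, 0.05),
}


def simulate_patients(n, prevalence=PREVALENCE, probs=SYMPTOM_PROBS, seed=0):
    """Draw n synthetic (infected, symptoms) pairs: sample the infection label
    from the prevalence, then sample each symptom independently given that label."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        infected = rng.random() < prevalence
        symptoms = {
            name: rng.random() < (p_pos if infected else p_neg)
            for name, (p_pos, p_neg) in probs.items()
        }
        patients.append((infected, symptoms))
    return patients


def posterior(symptoms, prevalence=PREVALENCE, probs=SYMPTOM_PROBS):
    """P(infected | symptoms) by Bayes' rule under the naive Bayes assumption
    that symptoms are conditionally independent given infection status."""
    like_pos, like_neg = prevalence, 1.0 - prevalence
    for name, present in symptoms.items():
        p_pos, p_neg = probs[name]
        like_pos *= p_pos if present else 1.0 - p_pos
        like_neg *= p_neg if present else 1.0 - p_neg
    return like_pos / (like_pos + like_neg)


if __name__ == "__main__":
    cohort = simulate_patients(100_000)
    infected_share = sum(1 for infected, _ in cohort if infected) / len(cohort)
    print(f"simulated infected share: {infected_share:.3f}")
    print(f"P(infected | all three symptoms): "
          f"{posterior({'fever': True, 'cough': True, 'anosmia': True}):.3f}")
```

With these made-up inputs, a patient reporting all three symptoms gets a posterior of about 0.806, and the infected share of the 100,000 simulated profiles lands close to the assumed 10% prevalence. A labeled cohort like this is the kind of training data a discriminative model could be fit to.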

Results

We included 55 consecutive patients with fever (n=43, 78%) or cough (n=42, 77%) presenting for ambulatory (n=11, 20%) or hospital care (n=44, 80%). In total, 51% (n=28) were female and 49% (n=27) were aged <60 years. Common comorbidities included diabetes (n=12, 22%), hypertension (n=15, 27%), cancer (n=9, 16%), and cardiovascular disease (n=7, 13%). Of the 55 patients, 69% (n=38) were confirmed via reverse transcription-polymerase chain reaction (RT-PCR) to be positive for SARS-CoV-2 infection, and 20% (n=11) had repeated negative nucleic acid testing and an alternate diagnosis. The Bayesian inference network, distance metric learning (DML), and ensemble models discriminated between patients with SARS-CoV-2 infection and those with alternate diagnoses with sensitivities of 81.6%-84.2%, specificities of 58.8%-70.6%, and accuracies of 61.4%-71.8%. After imaging and laboratory test statistics were integrated with the predictions of the Bayesian inference network, the change in diagnostic uncertainty at each step of the simulated clinical evaluation process was highly sensitive to the choice of location, symptoms, and diagnostic tests.
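The Results describe how diagnostic uncertainty changes as imaging and laboratory results are added to a symptom-based estimate. A minimal sketch of that sequential step, assuming each test result updates the current probability through Bayes' rule using the test's sensitivity and specificity, and assuming binary entropy (in bits) as the uncertainty measure, which the abstract does not specify. The chest CT sensitivity and specificity and the RT-PCR sensitivity below are placeholders; only the RT-PCR specificity of 99.8% appears in this document. The helpers `update`, `entropy_bits`, and `evaluate` are hypothetical names.

```python
import math


def update(prior, sensitivity, specificity, positive):
    """Post-test probability of infection after one test result (Bayes' rule)."""
    if positive:
        num = sensitivity * prior
        den = num + (1.0 - specificity) * (1.0 - prior)
    else:
        num = (1.0 - sensitivity) * prior
        den = num + specificity * (1.0 - prior)
    return num / den


def entropy_bits(p):
    """Binary entropy in bits: 1.0 at p = 0.5 (maximal uncertainty), 0.0 at p = 0 or 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def evaluate(prior, steps):
    """Apply (name, sensitivity, specificity, positive) steps in order and return
    (step name, probability of infection, entropy) after each step."""
    trace = [("symptoms", prior, entropy_bits(prior))]
    p = prior
    for name, sens, spec, positive in steps:
        p = update(p, sens, spec, positive)
        trace.append((name, p, entropy_bits(p)))
    return trace


if __name__ == "__main__":
    # Placeholder test characteristics, except RT-PCR specificity (99.8%).
    steps = [
        ("chest CT positive", 0.90, 0.60, True),
        ("RT-PCR negative", 0.70, 0.998, False),
    ]
    # The starting probability stands in for a symptom-based estimate.
    for prior in (0.2, 0.6):
        for name, p, h in evaluate(prior, steps):
            print(f"start={prior:.1f}  {name:<18} P={p:.3f}  H={h:.2f} bits")
```

Running the same steps from different starting probabilities illustrates the qualitative point in the Results: how much each test reduces (or increases) uncertainty depends on the pre-test probability, which in turn depends on location and symptoms, and on which tests are chosen.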

Conclusions

Decision support models that incorporate symptoms and available test results can help providers diagnose SARS-CoV-2 infection in real-world settings.

Article activity feed

  1. SciScore for 10.1101/2020.09.18.20197582

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    RT-PCR specificity of 99.8% is based on published data from Abbott Molecular. | Abbott, suggested: (Abbott, RRID:SCR_010477)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Our study has limitations. First, we used simulated patient data based on prevalence and conditional symptom probabilities to train and validate our DML and ensemble models, which biased the ensemble model toward heavily weighting the DML model predictions. Second, the number of patients in our clinical test dataset was relatively small, and this dataset was enriched for SARS-CoV-2-positive patients because all elective procedures were cancelled and almost all patient visits were conducted by telemedicine during the study period, leaving clinics and hospitals open primarily for COVID-19 patients and the acutely ill. Third, 80% of the patients in our clinical test dataset were from inpatient services, potentially biasing model accuracy by disease severity. Fourth, we chose RT-PCR results for SARS-CoV-2 infection as the reference standard despite outstanding questions about false-negative rates of nucleic acid amplification tests (NAATs) due to operator dependency and patient-level differences in viral load across upper respiratory tract sites [25]. Overall, we found that the Bayesian inference network, metric learning, and ensemble models trained and validated on a simulated patient dataset had sensitivities of 81.6%-84.2% and specificities of 58.8%-70.6% for discriminating between SARS-CoV-2 infection and other potential diagnoses in real clinical settings. These models had higher sensitivities than reported for most commonly used diagnostics, and model specificities were higher...
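The limitations note that the clinical test set was small and enriched for SARS-CoV-2-positive patients (69% were RT-PCR positive). As a sketch of why case mix matters, the code below computes sensitivity, specificity, and accuracy from a confusion matrix, plus positive and negative predictive values at a chosen prevalence. Accuracy equals prevalence × sensitivity + (1 − prevalence) × specificity, so it shifts with case mix, and positive predictive value falls sharply at lower prevalence even when sensitivity and specificity are fixed. The counts and rates are hypothetical, not the study's, and `Confusion`, `ppv`, and `npv` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing model calls with a reference standard (e.g. RT-PCR)."""
    tp: int  # reference positive, model positive
    fn: int  # reference positive, model negative
    tn: int  # reference negative, model negative
    fp: int  # reference negative, model positive

    @property
    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self):
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self):
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: P(infected | model positive) at a given prevalence."""
    tp = sensitivity * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    return tp / (tp + fp)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value: P(not infected | model negative) at a given prevalence."""
    tn = specificity * (1.0 - prevalence)
    fn = (1.0 - sensitivity) * prevalence
    return tn / (tn + fn)


if __name__ == "__main__":
    c = Confusion(tp=80, fn=20, tn=60, fp=40)  # hypothetical counts
    print(f"sensitivity={c.sensitivity:.2f} specificity={c.specificity:.2f} "
          f"accuracy={c.accuracy:.2f}")
    for prevalence in (0.69, 0.05):
        print(f"prevalence={prevalence:.2f} "
              f"PPV={ppv(c.sensitivity, c.specificity, prevalence):.2f} "
              f"NPV={npv(c.sensitivity, c.specificity, prevalence):.2f}")
```

With these made-up rates (sensitivity 0.80, specificity 0.60), PPV is about 0.82 at 69% prevalence but about 0.10 at 5%, which is why performance measured on an enriched, mostly inpatient cohort may not carry over directly to lower-prevalence settings.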

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.