Variants in SARS-CoV-2 associated with mild or severe outcome

This article has been Reviewed by the following groups


Abstract

Introduction

The coronavirus disease 2019 (COVID-19) pandemic is a global public health emergency causing a disparate burden of death and disability around the world. The viral genetic variants associated with outcome severity are still being discovered.

Methods

We downloaded 155 958 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes from GISAID. Of these, 3637 included usable metadata on patient outcomes. Using this subset, we evaluated whether SARS-CoV-2 viral genomic variants improved prediction of reported severity beyond age and region. First, we established whether including genomic variants as model features meaningfully increased the predictive power of our model. Next, we evaluated specific variants to determine the magnitude of their association with severity and their frequency among SARS-CoV-2 genomes.
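The model comparison above rests on the area under the ROC curve (AUC). The authors computed AUC with Scikit-learn. The following is a hedged, dependency-free illustration of the same quantity, computed from its rank interpretation. It is the probability that a randomly chosen severe case is scored above a randomly chosen mild case, with ties counting as half. The function name `roc_auc` and the toy scores are hypothetical and are not study data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney interpretation (1 = severe, 0 = mild).

    Counts, over every (severe, mild) pair, how often the severe case gets the
    higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one severe and one mild label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores, not study data).
labels = [0, 0, 1, 1]
baseline = [0.3, 0.6, 0.5, 0.7]        # e.g. an age/region-only model
with_variants = [0.1, 0.2, 0.8, 0.9]   # e.g. a model that adds variant features
print(roc_auc(labels, baseline), roc_auc(labels, with_variants))  # 0.75 1.0
```

The pairwise loop is quadratic in sample size. For thousands of genomes, `sklearn.metrics.roc_auc_score` or a rank-sum formulation is the practical choice.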

Results

Logistic regression models that included viral genomic variants outperformed other models (area under the curve = 0.91 compared with 0.68 for age and gender alone; P < 0.001). We found 84 variants with odds ratios greater than 2 for outcome severity (17 associated with higher severity and 67 with lower severity). The median frequency of associated variants was 0.15% (interquartile range 0.09–0.45%). Altogether, 85% of genomes had at least one variant associated with patient outcome.
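These counts combine two per-variant quantities:

- an odds ratio from a 2×2 table of carrier status by outcome;
- the variant's frequency among genomes.

The following is a hedged, standard-library illustration of both. Several details are assumptions, not confirmed by the source:

- the cell layout `a`–`d`;
- the 0.5 correction for empty cells;
- reading "lower severity" as an odds ratio below 1/2 for a severe outcome.

```python
import statistics
from typing import List, Sequence


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio of a severe outcome for carriers vs non-carriers of a variant.

    a: carriers, severe      b: carriers, mild
    c: non-carriers, severe  d: non-carriers, mild
    """
    if 0 in (a, b, c, d):
        # Assumption: Haldane-Anscombe correction for empty cells; the source
        # does not say how zero counts were handled.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


def severity_direction(or_value: float, threshold: float = 2.0) -> str:
    """Assumed reading: 'higher' if OR > threshold, 'lower' if OR < 1/threshold."""
    if or_value > threshold:
        return "higher"
    if or_value < 1.0 / threshold:
        return "lower"
    return "none"


def frequency_summary(carrier_counts: Sequence[int], n_genomes: int) -> List[float]:
    """Return [median, Q1, Q3] of per-variant frequencies, in percent."""
    freqs = [100.0 * k / n_genomes for k in carrier_counts]
    q1, _, q3 = statistics.quantiles(freqs, n=4, method="inclusive")
    return [statistics.median(freqs), q1, q3]


# Toy counts, not study data.
print(odds_ratio(20, 10, 80, 90))                        # 2.25
print(severity_direction(odds_ratio(20, 10, 80, 90)))    # higher
print(frequency_summary([1, 2, 3, 4, 5], 100))           # [3.0, 2.0, 4.0]
```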

Conclusion

Numerous SARS-CoV-2 variants are associated with a 2-fold or greater change in the odds of a mild or severe outcome, and collectively these variants are common. In addition to comprehensive mitigation efforts, public health measures should be prioritized to control the more severe manifestations of COVID-19 and the transmission chains linked to these severe cases.

Lay summary: This study explores which, if any, SARS-CoV-2 viral genomic variants are associated with mild or severe COVID-19 patient outcomes. Our results suggest that there are common genomic variants in SARS-CoV-2 that are more often associated with negative patient outcomes, which may impact downstream public health measures.

Article activity feed

  1. Hebah Al Khatib

    Review 2: "Variants in SARS-CoV-2 Associated with Mild or Severe Outcome"

    This preprint reports that viral variants can improve classification of COVID-19 outcomes compared with models using only age and region, and that some individual variants are associated with disease severity. Reviewers suggest major revisions to improve and clarify the data analysis.

  2. Min Xie

    Review 1: "Variants in SARS-CoV-2 Associated with Mild or Severe Outcome"

    This preprint reports that viral variants can improve classification of COVID-19 outcomes compared with models using only age and region, and that some individual variants are associated with disease severity. Reviewers suggest major revisions to improve and clarify the data analysis.

  3. SciScore for 10.1101/2020.12.01.20242149:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    FASTA sequences were aligned to the reference sequence, Wuhan-Hu-1 (NCBI: NC_045512.2; GISAID: EPI_ISL_402125) using Minimap2 (version 2.17).(13) Resulting VCF (Variant Call Format) files were annotated using SnpEff (version 5.0) and filtered using SnpSift.(14, 15) The shell scripts used for sequence alignment and variant calling, along with the Python scripts used to perform the steps described below, are available on GitHub at https://github.com/mskar/variants.
    Minimap2
    suggested: (Minimap2, RRID:SCR_018550)
    SnpEff
    suggested: (SnpEff, RRID:SCR_005191)
    Python
    suggested: (IPython, RRID:SCR_001658)
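After annotation, each VCF data line carries SnpEff's effect predictions in the `ANN` INFO field. This is a comma-separated list of pipe-delimited entries whose leading fields are:

- Allele
- Annotation
- Annotation_Impact
- Gene_Name
- Gene_ID
- Feature_Type
- Feature_ID
- Transcript_BioType
- Rank
- HGVS.c
- HGVS.p

The following parses one such line with the standard library. It is a hedged sketch; the authors' scripts in the linked repository may differ.

Several elements are hypothetical choices made for illustration:

- the dictionary keys;
- the `variant_id` naming scheme;
- the example record. It is hand-written to encode the spike D614G substitution (A23403G), and its IDs are illustrative rather than copied from real SnpEff output.

```python
from typing import Dict, List

# Leading ANN sub-fields, in SnpEff's documented order (indices 0-10).
ANN_FIELDS = ["allele", "effect", "impact", "gene", "gene_id", "feature_type",
              "feature_id", "biotype", "rank", "hgvs_c", "hgvs_p"]


def parse_snpeff_record(line: str) -> List[Dict[str, str]]:
    """Return one dict per SnpEff annotation attached to a single VCF data line."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, ref, alt, info = cols[0], cols[1], cols[3], cols[4], cols[7]
    ann_value = ""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "ANN":
            ann_value = value
    records = []
    for ann in filter(None, ann_value.split(",")):
        fields = ann.split("|")
        fields += [""] * (len(ANN_FIELDS) - len(fields))  # pad short entries
        record = dict(zip(ANN_FIELDS, fields))
        # Hypothetical column name for the wide matrix, e.g. "A23403G".
        record.update(chrom=chrom, pos=pos, ref=ref, alt=alt,
                      variant_id=f"{ref}{pos}{alt}")
        records.append(record)
    return records


# Hand-written illustrative record (spike D614G).
example = ("NC_045512.2\t23403\t.\tA\tG\t.\t.\t"
           "ANN=G|missense_variant|MODERATE|S|GU280_gp02|transcript|GU280_gp02|"
           "protein_coding|1/1|c.1841A>G|p.Asp614Gly|1841/3822|1841/3822|614/1273||")
for rec in parse_snpeff_record(example):
    print(rec["variant_id"], rec["gene"], rec["hgvs_p"])  # A23403G S p.Asp614Gly
```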
    Variant and Metadata Modeling: Annotated VCF files were parsed, pivoted to wide format, and joined with GISAID patient data using Pandas (version 1.0.3).(16) Logistic regression models with an L1 penalty (Lasso regularization) were fit to the patient (rows) and variant (columns) data using Scikit-learn (version 0.23.2).(17) Logistic regression model Area Under the Curve (AUC) and accuracy values were calculated using Scikit-learn.(17) Models were persisted as pickle files using joblib (version 0.14.1).
    Scikit-learn
    suggested: (scikit-learn, RRID:SCR_002577)
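The modeling step involves three stages:

1. Pivot long-format variant calls into a samples × variants presence matrix.
2. Join outcome labels.
3. Fit an L1-penalized logistic regression.

The authors used Pandas and Scikit-learn. The following is a hedged, dependency-free stand-in that fits the same kind of model by proximal gradient descent (ISTA). ISTA is a solver swapped in for illustration and is not Scikit-learn's.

The function names, penalty strength `lam`, step size, and toy data are assumptions. Age and region covariates are omitted for brevity.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pivot_and_join(calls: Sequence[Tuple[str, str]], outcomes: Dict[str, int]
                   ) -> Tuple[List[str], List[List[int]], List[int]]:
    """Pivot (sample_id, variant_id) calls to a 0/1 matrix and attach labels.

    Keeps every sample with outcome metadata (1 = severe, 0 = mild); samples
    without calls get an all-zero row, calls without metadata are dropped.
    """
    columns = sorted({variant for _, variant in calls})
    present = set(calls)
    samples = sorted(outcomes)
    X = [[1 if (s, v) in present else 0 for v in columns] for s in samples]
    y = [outcomes[s] for s in samples]
    return columns, X, y


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _soft_threshold(x: float, t: float) -> float:
    # Proximal operator of t*|x|: shrink toward zero, exactly zero if |x| <= t.
    return math.copysign(max(abs(x) - t, 0.0), x)


def fit_l1_logistic(X: List[List[int]], y: List[int], lam: float = 0.05,
                    step: float = 0.5, n_iter: int = 2000) -> Tuple[List[float], float]:
    """Minimise mean log-loss + lam * ||w||_1 by ISTA; the intercept is unpenalised."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residual = predicted probability of severe outcome - observed label.
        res = [_sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
               for row, yi in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(res, X)) / n for j in range(p)]
        b -= step * sum(res) / n
        w = [_soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Toy data: variant A is in every severe case, B is equally common in both groups.
calls = [("s1", "A"), ("s1", "B"), ("s2", "A"), ("s3", "A"), ("s3", "B"),
         ("s4", "A"), ("s5", "B"), ("s7", "B")]
outcomes = {"s1": 1, "s2": 1, "s3": 1, "s4": 1, "s5": 0, "s6": 0, "s7": 0, "s8": 0}
columns, X, y = pivot_and_join(calls, outcomes)
weights, intercept = fit_l1_logistic(X, y)
print(dict(zip(columns, (round(v, 2) for v in weights))))
```

With this toy data, the fit behaves as follows:

- Variant `A` receives a large positive weight.
- Variant `B`, which carries no information once `A` is known, is shrunk to zero.

This is the feature-selection behaviour that motivates an L1 penalty. In Scikit-learn, the equivalent model is `LogisticRegression(penalty="l1", solver="liblinear")`. There, penalty strength is set through the inverse parameter `C`.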
    Plotting and Statistical Analysis: Scatter and bar plots were created using Pandas (version 1.0.3),(16) Matplotlib (version 3.2.1),(18) and Seaborn (version 0.10.1).(19) Logistic regression model AUC p-values and Chi-square test p-values for association of variants with “Severe” outcomes were obtained using Scipy (version 1.5.0).(20) Variant frequency was calculated using Pandas.(16) Genome position tracks were added to scatterplots using DNA Features Viewer (version 3.0.3).(21) ROC curves were plotted using Scikit-learn (version 0.23.2),(17) and Matplotlib.(18)
    Matplotlib
    suggested: (MatPlotLib, RRID:SCR_008624)
    Scipy
    suggested: (SciPy, RRID:SCR_008058)
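The chi-square association test named above can be illustrated on a 2×2 table of carrier status by outcome. This hedged sketch computes Pearson's chi-square statistic and its one-degree-of-freedom p-value with the standard library. It uses the identity P(χ²₁ > x) = erfc(√(x/2)).

It omits Yates' continuity correction, which `scipy.stats.chi2_contingency` applies to 2×2 tables by default. Its p-values will therefore be somewhat smaller than SciPy's defaults. The excerpt does not state which SciPy function or settings the authors used, and the function name here is hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test, without continuity correction, on the table

                    severe  mild
        carrier        a      b
        non-carrier    c      d

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, chi2 = Z**2, so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


print(chi_square_2x2(10, 10, 10, 10))   # (0.0, 1.0): no association
print(chi_square_2x2(30, 10, 10, 30))   # statistic 20.0, p on the order of 1e-5
```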

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    The COVID-19 pandemic demonstrated the limitations of the global healthcare system in intensive care units, mechanical ventilators, and emerging therapeutics and other medical countermeasures.(46, 47) Early in the outbreak, cities such as New York became inundated with infections and their ability to adequately sort and treat patients was quickly overwhelmed.(48) The existence of a rapid and accurate tool that could help identify COVID-19 patients or clusters that are more likely to experience severe symptoms or require intensive medical resources (e.g., inpatient hospitalization and ventilation) may be able to help healthcare systems allocate resources to the regions with the most critical needs. Therefore, by providing a molecular risk factor for more severe outcomes, these findings could help prioritize limited treatment supplies to those at greatest risk, particularly as therapeutic interventions for infectious disease often need to be given early in the disease course (e.g., empiric antivirals for influenza). There are limitations with our analyses. First, the SARS-CoV-2 genomes uploaded to GISAID are not necessarily representative of all circulating genomes, which can introduce a selection or sampling bias into our analyses based on region, patient severity, or other unmeasured factors. In Supplemental Figure S2 we show the sampling patterns over time by our patient severity categorization. We sought to mitigate these limitations by eliminating the categories that had a...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.