Generalized linear models provide a measure of virulence for specific mutations in SARS-CoV-2 strains

Abstract

This study aims to highlight SARS-CoV-2 mutations that are associated with increased or decreased viral virulence. We utilize genetic data from all strains available from GISAID and countries’ regional information, such as deaths and cases per million, as well as COVID-19-related public health austerity measure response times. Initial indications of selective advantage of specific mutations can be obtained by calculating their frequencies across viral strains. By applying modelling approaches, we provide additional information that is not evident from standard statistics or mutation frequencies alone. We therefore propose a more precise way of selecting informative mutations. We highlight two interesting mutations found in genes N (P13L) and ORF3a (Q57H). The former appears to be significantly associated with decreased deaths and cases per million according to our models, while the latter shows an opposing association, with decreased deaths and increased cases per million. Moreover, protein structure prediction tools show that the mutations induce conformational changes that significantly alter the protein's structure relative to the reference protein.

Article activity feed

  1. SciScore for 10.1101/2020.08.17.253484

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: phangorn, lme4, dfoptim, car, reshape2, ggplot2, gridExtra, PredictABEL, dplyr, tidyr, scales, ggpubr.
    Resource: ggplot2
    Suggested: (ggplot2, RRID:SCR_014601)

    Sentence: I-TASSER was selected for protein structure modelling, since it outperformed other servers according to results from the 13th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13)[30]
    Resource: I-TASSER
    Suggested: (I-TASSER, RRID:SCR_014627)

    Sentence: The PyMOL software (https://pymol.org/2/) was used for the visualization of the protein molecules.
    Resource: PyMOL
    Suggested: (PyMOL, RRID:SCR_000305)

    Sentence: Protein-protein complexes were constructed using the ClusPro (v2.0)[33] and HDOCK[31] algorithms and binding affinities were calculated using the PRODIGY webserver[34].
    Resource: ClusPro
    Suggested: (ClusPro, RRID:SCR_018248)

    Sentence: The DynaMut webserver[36], was used to visualize non-covalent molecular interactions, calculated by the Arpeggio algorithm[37].
    Resource: Arpeggio
    Suggested: (Arpeggio, RRID:SCR_010876)

    Sentence: Finally, binding affinities and dissociation constants (Kd) were calculated using the PRODIGY webserver[34].
    Resource: PRODIGY
    Suggested: None

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We found bar graphs of continuous data. We recommend replacing bar graphs with more informative graphics, as many different datasets can lead to the same bar graph. The actual data may suggest different conclusions from the summary statistics. For more information, please see Weissgerber et al (2015).


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.