Characterization of SARS-CoV-2 viral diversity within and across hosts

Abstract

In light of the current COVID-19 pandemic, there is an urgent need to accurately infer the evolutionary and transmission history of the virus to inform real-time outbreak management, public health policies and mitigation strategies. Current phylogenetic and phylodynamic approaches typically use consensus sequences, essentially assuming the presence of a single viral strain per host. Here, we analyze 621 bulk RNA sequencing samples and 7,540 consensus sequences from COVID-19 patients, and identify multiple strains of the virus, SARS-CoV-2, in four major clades that are prevalent within and across hosts. In particular, we find evidence for (i) within-host diversity across phylogenetic clades, (ii) putative cases of recombination, multi-strain and/or superinfections as well as (iii) distinct strain profiles across geographical locations and time. Our findings and algorithms will facilitate more detailed evolutionary analyses and contact tracing that specifically account for within-host viral diversity in the ongoing COVID-19 pandemic as well as future pandemics.

Article activity feed

  1. SciScore for 10.1101/2020.05.07.083410:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Specifically, we used BWA (Li and Durbin, 2009) to map sequencing reads to the SARS-CoV-2 reference genome (NC_045512.2).
    BWA
    suggested: (BWA, RRID:SCR_010910)
    Subsequently, we marked PCR duplicates using GATK (DePristo et al., 2011).
    GATK
    suggested: (GATK, RRID:SCR_001876)
    Phylogenetic Analysis of Identified SARS-CoV-2 Viral Strains: We used RAxML (Stamatakis, 2014) from the Augur (Neher and Bedford, 2015) pipeline to generate the phylogenetic tree over the k = 25 viral strains.
    RAxML
    suggested: (RAxML, RRID:SCR_006086)
    Protein Structure and Mutation Analysis: To characterize the 27 missense mutations on protein structure, we searched the Protein Data Bank [PDB, Berman et al. (2000)] and found solved structures for three SARS-CoV-2 proteins: nsp12 [PDB ID: 7BV1, (Yin et al., 2020)], nsp15 [PDB ID: 6VWW, (Kim et al., 2020)] and the S protein [PDB ID: 6VXX, (Walls et al., 2020)].
    Mutation Analysis
    suggested: None
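
The read-processing steps quoted in the table (mapping with BWA, then marking PCR duplicates with GATK) could be chained roughly as in the sketch below. This is an illustration, not the authors' pipeline: the quoted sentences do not say which BWA subcommand, which GATK version, or which intermediate steps were used. The choice of `bwa mem`, the `samtools sort` step, the GATK4-style `MarkDuplicates` invocation, and all file names are assumptions. The function only builds the command lines and does not run them.

```python
from typing import List


def build_commands(reference: str, reads_1: str, reads_2: str, sample: str) -> List[List[str]]:
    """Return illustrative command lines for mapping and duplicate marking.

    All file names are hypothetical; subcommands are assumptions, since the
    source only names the tools (BWA, GATK).
    """
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    marked_bam = f"{sample}.markdup.bam"
    metrics = f"{sample}.markdup_metrics.txt"
    return [
        # Index the reference genome once before mapping.
        ["bwa", "index", reference],
        # Map paired-end reads to the reference (assumed: bwa mem).
        ["bwa", "mem", "-o", sam, reference, reads_1, reads_2],
        # Coordinate-sort the alignments (assumed intermediate step).
        ["samtools", "sort", "-o", sorted_bam, sam],
        # Mark PCR duplicates (assumed: GATK4 MarkDuplicates).
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", marked_bam, "-M", metrics],
    ]


if __name__ == "__main__":
    commands = build_commands(
        reference="NC_045512.2.fasta",   # hypothetical local copy of the reference
        reads_1="sample_1.fastq.gz",     # hypothetical read files
        reads_2="sample_2.fastq.gz",
        sample="sample",
    )
    for cmd in commands:
        print(" ".join(cmd))
```

Printing the commands rather than executing them keeps the sketch runnable without the tools installed; each list could be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.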

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding formulaic information scattered throughout a paper and presenting it in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.
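
As a small illustration of what an RRID presence check can look like, the sketch below pulls the tool names and identifiers out of "suggested:" lines shaped like those in Table 2. It is not SciScore's method, which the report does not describe. The regular expression and the function name `extract_rrids` are assumptions, and the pattern only covers the `PREFIX_digits` shape seen in this report.

```python
import re
from typing import Dict

# Matches lines such as "suggested: (BWA, RRID:SCR_010910)".
# Assumption: identifiers look like LETTERS_DIGITS, as in this report.
SUGGESTED = re.compile(r"suggested:\s*\(([^,()]+),\s*RRID:([A-Za-z]+_\d+)\)")


def extract_rrids(report_text: str) -> Dict[str, str]:
    """Map each suggested resource name to its RRID identifier."""
    return {m.group(1).strip(): m.group(2) for m in SUGGESTED.finditer(report_text)}


# Resource lines copied from Table 2 of the report above.
report = """\
BWA
suggested: (BWA, RRID:SCR_010910)
GATK
suggested: (GATK, RRID:SCR_001876)
RAxML
suggested: (RAxML, RRID:SCR_006086)
Mutation Analysis
suggested: None
"""

if __name__ == "__main__":
    for name, rrid in extract_rrids(report).items():
        print(f"{name}: RRID:{rrid}")
```

Entries reported as "suggested: None", such as "Mutation Analysis", produce no match and are simply skipped.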