SARS-CoV-2 exhibits intra-host genomic plasticity and low-frequency polymorphic quasispecies


Abstract

In December 2019, an outbreak of atypical pneumonia (coronavirus disease 2019, COVID-19) associated with a novel coronavirus (SARS-CoV-2) was reported in Wuhan city, Hubei province, China. The outbreak was traced to a seafood wholesale market, and human-to-human transmission was confirmed. The rapid spread and the death toll of the new epidemic warrant immediate intervention. Understanding the intra-host genomic variability of SARS-CoV-2 is pivotal for developing effective antiviral agents and vaccines, as well as for designing accurate diagnostics.

We analyzed next-generation sequencing (NGS) data derived from clinical samples of three Chinese patients infected with SARS-CoV-2 in order to identify small- and large-scale intra-host variations in the viral genome. We identified tens of low- or higher-frequency single nucleotide variations (SNVs) with variable density across the viral genome, affecting 7 out of 10 protein-coding viral genes. The majority of these SNVs corresponded to missense changes. Annotation of the identified SNVs, together with all currently circulating strain variations, revealed that both intra-host and strain-specific SNVs colocalize with primers and probes currently used in molecular diagnostic assays. Moreover, we assembled the viral genome de novo in order to isolate and validate intra-host structural variations and recombination breakpoints. The bioinformatics analysis disclosed genomic rearrangements over poly-A / poly-U regions located in the ORF1ab and spike (S) genes, including a potential recombination hot-spot within the S gene.
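The colocalization check described above amounts to an interval-overlap test between SNV positions and primer/probe binding sites. The following is a minimal sketch of that idea, not the authors' pipeline: the `Region` type, the function name `snvs_in_regions`, and all coordinates are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple


class Region(NamedTuple):
    """A primer or probe binding site (hypothetical type for this sketch)."""
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def snvs_in_regions(snv_positions: Iterable[int],
                    regions: Iterable[Region]) -> Dict[str, List[int]]:
    """Map each region name to the sorted SNV positions that fall inside it.

    Regions without any overlapping SNV are omitted from the result.
    """
    positions = list(snv_positions)
    hits: Dict[str, List[int]] = {}
    for region in regions:
        inside = sorted(p for p in positions if region.start <= p <= region.end)
        if inside:
            hits[region.name] = inside
    return hits


# Made-up coordinates, NOT real SARS-CoV-2 assay positions.
primers = [
    Region("assay_A_forward", 100, 121),
    Region("assay_A_probe", 130, 154),
    Region("assay_B_reverse", 400, 422),
]
snvs = [95, 110, 154, 300]

overlaps = snvs_in_regions(snvs, primers)
print(overlaps)
```

In practice the region list would come from published assay coordinates mapped onto the same reference used for variant calling; a mismatch in coordinate systems (0-based versus 1-based) is the most likely source of error in such a check.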

Our results highlight the intra-host genomic diversity and plasticity of SARS-CoV-2 and point to genomic regions that are prone to alterations. The isolated SNVs and genomic rearrangements reflect the intra-patient capacity of the polymorphic quasispecies, which may arise rapidly during the outbreak, allowing immunological escape of the virus, offering resistance to antiviral drugs, and affecting the sensitivity of molecular diagnostic assays.

Article activity feed

  1. SciScore for 10.1101/2020.03.27.009480:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentence: We aligned the raw read data on reference strain MN975262.1 using bowtie2 [18], after quality check with FastQC (www.bioinformatics.bbsrc.ac.uk/projects/fastqc).
    Resources:
      - bowtie2; suggested: (Bowtie 2, RRID:SCR_016368)
      - FastQC; suggested: (FastQC, RRID:SCR_014583)

    Sentence: After removing PCR duplicates, SNVs were called with a Bonferroni-corrected P-value threshold of 0.05 using samtools [20] and LoFreq [21].
    Resources:
      - samtools; suggested: (SAMTOOLS, RRID:SCR_002105)
      - LoFreq; suggested: (LoFreq, RRID:SCR_013054)

    Sentence: We annotated the variations to the reference strain using snpEff [22], SNVs effects were further filtered with snpSift [23] and we estimated the average mutation rate per gene across the viral genome using R scripts.
    Resources:
      - snpEff; suggested: (SnpEff, RRID:SCR_005191)
      - snpSift; suggested: (SnpSift, RRID:SCR_015624)

    Sentence: To investigate intra-host genomic rearrangements, we performed de novo assembly of the SARS-CoV-2 genomes using Spades [25], and the resulting contigs were analyzed with BLAST [26] and confirmed by remapping of the raw reads.
    Resources:
      - Spades; suggested: (SPAdes, RRID:SCR_000131)
      - BLAST; suggested: (BLASTX, RRID:SCR_001653)
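Two of the quoted steps, the Bonferroni-corrected P-value threshold and the per-gene estimate, come down to simple arithmetic. The sketch below illustrates that arithmetic only. It does not reproduce LoFreq's internals or the authors' R scripts. The number of tests (three alternate alleles at each of 30,000 positions), the gene names and coordinates, and the use of SNVs per kilobase as a stand-in for "average mutation rate per gene" are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def snv_density_per_gene(snv_positions: Iterable[int],
                         genes: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """SNVs per kilobase for each gene.

    `genes` maps a gene name to (start, end), 1-based and inclusive.
    """
    positions = list(snv_positions)
    density = {}
    for name, (start, end) in genes.items():
        length = end - start + 1
        count = sum(1 for p in positions if start <= p <= end)
        density[name] = 1000.0 * count / length
    return density


# Assumption: one test per alternate allele (3) at each of 30,000 positions.
threshold = bonferroni_threshold(alpha=0.05, n_tests=3 * 30_000)

# Hypothetical candidate SNVs as position -> raw P-value.
candidates = {10: 1e-12, 500: 1e-9, 1999: 1e-8, 2100: 2e-7, 2300: 1e-4}
significant = sorted(p for p, pval in candidates.items() if pval < threshold)

# Hypothetical gene coordinates, not a real SARS-CoV-2 annotation.
genes = {"geneA": (1, 2000), "geneB": (2001, 2500)}
density = snv_density_per_gene(significant, genes)

print(f"per-test threshold: {threshold:.3e}")
print("significant positions:", significant)
print("SNVs per kb:", density)
```

Normalizing by gene length matters here because ORF1ab is far longer than the other genes, so raw SNV counts alone would overstate its variability.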

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • No conflict of interest statement was detected. If there are no conflicts, we encourage authors to explicitly state so.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.