Insights on early mutational events in SARS-CoV-2 virus reveal founder effects across geographical regions

Abstract

Here we aim to describe early mutational events across samples from publicly available SARS-CoV-2 sequences from the Sequence Read Archive (SRA) and GenBank repositories. Up until 27 March 2020, we downloaded 50 Illumina datasets, mostly from China, USA (WA State) and Australia (VIC). A total of 30 datasets (60%) contain at least a single founder mutation, and most of the variants are missense (over 63%). Five point mutations with clonal (founder) effect were found in USA next-generation sequencing samples. Sequencing samples from North America in GenBank (22 April 2020) present this signature with up to 39% allele frequencies among samples (n = 1,359). Australian variant signatures were more diverse than those of USA samples, but clonal events were still found in these samples. Mutations in the helicase, encoded by the ORF1ab gene in SARS-CoV-2, were predominant, among others, suggesting that these regions are actively evolving. Finally, we firmly urge that primer sets for diagnosis be carefully designed, since rapidly occurring variants would affect the performance of reverse transcription quantitative PCR (RT-qPCR) based viral testing.

Article activity feed

  1. SciScore for 10.1101/2020.04.09.034462:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Data Collection: Raw Illumina sequencing data were downloaded from the following NCBI SRA BioProjects: SRA: PRJNA601736 (Chinese datasets), SRA: PRJNA603194 (Chinese dataset) (Wu et al. 2020b), SRA: PRJNA605907 (Chinese datasets) (Shen et al. 2020), SRA: PRJNA607948 (USA-Wisconsin datasets), SRA: PRJNA608651 (Nepal dataset), SRA: PRJNA610428 (USA-Washington datasets), SRA: PRJNA612578 (USA-San-Diego dataset), SRA: PRJNA231221 (USA-Washington dataset) (Sichtig et al. 2019), SRA: PRJNA613958 (Australian-Victoria datasets), SRA: PRJNA231221 (USA-Maryland dataset), and SRA: PRJNA614995 (USA-Utah datasets).
    NCBI SRA BioProjects
    suggested: None
    Data processing: Raw reads were aligned with bowtie2 aligner (v2.2.6) (Langmead & Salzberg 2012) against SARS-CoV-2 reference genome NC_045512.2 (https://www.ncbi.nlm.nih.gov/nuccore/NC_045512), using the following parameters: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50.
    bowtie2
    suggested: (Bowtie 2, RRID:SCR_016368)
    Samtools v1.9 (using htslib v1.9) (Li et al. 2009) was used to sort SAM files, remove duplicate reads, and index BAM files. bcftools v1.9 (part of the samtools framework) was used to obtain depth of coverage in each aligned sample.
    Samtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
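
The processing steps quoted in the table above can be sketched as follows. This is a minimal illustration in Python that only assembles command lines; it is not the authors' actual script. The file names, the index prefix, and the choice of `samtools rmdup` for duplicate removal are assumptions (the quoted text says only that duplicates were removed with Samtools v1.9). The bowtie2 parameters are copied verbatim from the quoted sentence; they correspond to bowtie2's `--very-sensitive` preset. The bcftools depth-of-coverage step is left out because the quoted text does not name the command used.

```python
import shlex

# Alignment parameters quoted in the SciScore table
# (the same values as bowtie2's --very-sensitive preset).
BOWTIE2_PARAMS = ["-D", "20", "-R", "3", "-N", "0", "-L", "20", "-i", "S,1,0.50"]


def bowtie2_command(index_prefix, sam_out, reads_1, reads_2=None):
    """Build a bowtie2 argument list for single-end or paired-end reads.

    index_prefix is a hypothetical bowtie2 index built from NC_045512.2.
    """
    cmd = ["bowtie2", *BOWTIE2_PARAMS, "-x", index_prefix]
    if reads_2 is None:
        cmd += ["-U", reads_1]                 # unpaired reads
    else:
        cmd += ["-1", reads_1, "-2", reads_2]  # paired-end mates
    return cmd + ["-S", sam_out]


def samtools_commands(sam_in, sample, single_end=False):
    """Build sort, duplicate-removal, and index commands for one sample.

    Assumption: `samtools rmdup` is used for duplicate removal; the source
    does not say which subcommand the authors ran.
    """
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    rmdup = ["samtools", "rmdup"]
    if single_end:
        rmdup.append("-s")                     # treat reads as single-end
    return [
        ["samtools", "sort", "-o", sorted_bam, sam_in],
        rmdup + [sorted_bam, dedup_bam],
        ["samtools", "index", dedup_bam],
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    steps = [bowtie2_command("NC_045512.2", "sample.sam", "sample_1.fastq", "sample_2.fastq")]
    steps += samtools_commands("sample.sam", "sample")
    for step in steps:
        print(shlex.join(step))
```

Building the commands as argument lists keeps them safe to pass to `subprocess.run` without shell quoting; printing them first makes it easy to check each step against the methods text before running anything.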

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.