Insights on early mutational events in SARS-CoV-2 virus reveal founder effects across geographical regions
Listed in
- Evaluated articles (PeerJ)
- Evaluated articles (ScreenIT)
Abstract
Here we describe early mutational events across publicly available SARS-CoV-2 sequences from the Sequence Read Archive (SRA) repository. Up to March 27, 2020, we downloaded 53 Illumina datasets, mostly from China, the USA (Washington DC) and Australia (Victoria). Of 30 high-quality datasets, 27 (90%) contain at least one founder mutation, and most of the variants (over 63%) are missense. Five point mutations with a clonal (founder) effect were found in USA sequencing samples. Sequencing samples from the USA in GenBank present this signature with 50% allele frequencies among samples. Australian mutation signatures were more diverse than those of the USA samples, but clonal events were still found in those samples. Mutations in the helicase and orf1a coding regions of SARS-CoV-2 were predominant, among others, suggesting that these proteins are prone to evolve by natural selection. Finally, we strongly urge that primer sets for diagnosis be carefully designed, since rapidly occurring variants would affect the performance of reverse transcription quantitative PCR (RT-qPCR) based viral testing.
Article activity feed
SciScore for 10.1101/2020.04.09.034462:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
- Institutional Review Board Statement: not detected.
- Randomization: not detected.
- Blinding: not detected.
- Power Analysis: not detected.
- Sex as a biological variable: not detected.

Table 2: Resources
Software and Algorithms (each quoted sentence is followed by the detected resource)
- Data Collection: Raw illumina sequencing data were downloaded from the following NCBI SRA BioProjects: SRA: PRJNA601736 (Chinese datasets), SRA: PRJNA603194 (Chinese dataset) (Wu et al. 2020b), SRA: PRJNA605907 (Chinese datasets) (Shen et al. 2020), SRA: PRJNA607948 (USA-Wisconsin datasets), SRA: PRJNA608651 (Nepal dataset), SRA: PRJNA610428 (USA-Washington datasets), SRA: PRJNA612578 (USA-San-Diego dataset), SRA: PRJNA231221 (USA-Washington dataset) (Sichtig et al. 2019), SRA: PRJNA613958 (Australian-Victoria datasets), SRA: PRJNA231221 (USA-Maryland dataset), and SRA: PRJNA614995 (USA-Utah datasets).
  Resource: NCBI SRA BioProjects (suggested: None)
- Data processing: Raw reads were aligned with bowtie2 aligner (v2.2.6) (Langmead & Salzberg 2012) against SARS-CoV-2 reference genome NC_045512.2 (https://www.ncbi.nlm.nih.gov/nuccore/NC_045512), using the following parameters: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50.
  Resource: bowtie2 (suggested: Bowtie 2, RRID:SCR_016368)
- Samtools v1.9 (using htslib v1.9) (Li et al. 2009) was used to sort sam files, remove duplicate reads and index bam files. bcftools v1.9 (part of the samtools framework) was used to obtain depth of coverage in each aligned sample.
  Resource: Samtools (suggested: SAMTOOLS, RRID:SCR_002105)

Results from OddPub: Thank you for sharing your code and data.
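The data-processing steps quoted in Table 2 (align, sort, remove duplicates, index) can be sketched as a small Python helper that only assembles the command lines. This is an illustrative sketch, not the authors' script. The alignment parameters are the ones quoted above and match bowtie2's `--very-sensitive` preset. The function name, file names and index prefix are hypothetical, and `samtools rmdup` is an assumption, since the source does not say which duplicate-removal subcommand was used. The bcftools depth step is omitted because its exact invocation is not given.

```python
import shlex

# Alignment parameters exactly as quoted in the Resources table.
BOWTIE2_PARAMS = ["-D", "20", "-R", "3", "-N", "0", "-L", "20", "-i", "S,1,0.50"]


def build_commands(index_prefix, reads, sample, paired=True):
    """Return argv lists for: align -> sort -> remove duplicates -> index.

    All arguments are hypothetical inputs. index_prefix is assumed to be a
    bowtie2 index already built from the NC_045512.2 reference genome.
    """
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"

    # bowtie2 takes -1/-2 for paired-end reads and -U for unpaired reads.
    if paired:
        read_args = ["-1", reads[0], "-2", reads[1]]
    else:
        read_args = ["-U", reads[0]]

    # Assumption: duplicates are removed with `samtools rmdup`
    # (-s selects single-end mode); the source names no subcommand.
    rmdup = ["samtools", "rmdup"] + ([] if paired else ["-s"])

    return [
        ["bowtie2", *BOWTIE2_PARAMS, "-x", index_prefix, *read_args, "-S", sam],
        ["samtools", "sort", "-o", sorted_bam, sam],
        rmdup + [sorted_bam, dedup_bam],
        ["samtools", "index", dedup_bam],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # without bowtie2 or samtools installed.
    for cmd in build_commands("sarscov2_index", ["s_1.fastq", "s_2.fastq"], "sample1"):
        print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the tools and input files are in place. Building argv lists rather than shell strings avoids quoting problems with file names.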
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
