SARS-CoV-2 sequence typing, evolution and signatures of selection using CoVa, a Python-based command-line utility
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
The ongoing COVID-19 pandemic, caused by SARS-CoV-2, has resulted in millions of infections worldwide within a few months. Global efforts to tackle this situation have produced a tremendous body of genomic data, which can be used for tracing transmission routes, characterizing isolates, and monitoring variants with potential for unusual virulence. Several groups have analyzed these genomes using different approaches. However, as new data become available, the research community needs a pipeline that performs a set of routine analyses and can quickly incorporate new genome sequences and update the analysis reports. We developed a programmatic tool, CoVa, with this objective. It is a fast, accurate and user-friendly utility for performing a variety of genome analyses on hundreds of SARS-CoV-2 sequences. Using CoVa, we define a modified sequence typing nomenclature and identify sites under positive selection. Further analysis identified some peptides and sites showing geographical patterns of selection. Specifically, we show differences in sequence type distribution between sequences from India and those from the rest of the world. We also find that several sites show signatures of positive selection uniquely in sequences from India. Preliminary evolutionary analysis, using features that will be incorporated into CoVa in the near future, shows a mutation rate of 7.4 × 10⁻⁴ substitutions/site/year, confirms a temporal signal with a November 2019 origin of SARS-CoV-2, and reveals heterogeneity in the geographical distribution of Indian samples.
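To put the reported per-site rate on a more intuitive scale, it can be converted into an expected number of substitutions per genome. The sketch below is only an illustration, not part of CoVa: it assumes the rate applies uniformly to every site and uses the 29,903-nucleotide length of the RefSeq reference NC_045512, a figure that does not appear in the abstract. The function name is hypothetical.

```python
# Length of the NCBI RefSeq reference genome NC_045512, in nucleotides.
# This value is an assumption brought in from outside the abstract.
GENOME_LENGTH_NT = 29_903

# Rate reported in the abstract, in substitutions/site/year.
RATE_PER_SITE_PER_YEAR = 7.4e-4


def substitutions_per_genome(rate_per_site_per_year, genome_length, years=1.0):
    """Expected substitutions over the whole genome during `years`.

    Assumes the same rate at every site, which is a simplification.
    """
    return rate_per_site_per_year * genome_length * years


per_year = substitutions_per_genome(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH_NT)
per_month = substitutions_per_genome(
    RATE_PER_SITE_PER_YEAR, GENOME_LENGTH_NT, years=1 / 12
)

print(f"{per_year:.1f} substitutions/genome/year")    # 22.1
print(f"{per_month:.1f} substitutions/genome/month")  # 1.8
```

Under these assumptions the rate corresponds to roughly 22 substitutions per genome per year, or just under two per month.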
Article activity feed
SciScore for 10.1101/2020.06.09.082834:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms

| Sentences | Resources |
| --- | --- |
| NCBI Refseq accession NC_045512 is used as the variant calling reference in the pipeline. | Refseq, suggested: (RefSeq, RRID:SCR_003496) |
| Similarly, CoVa limits split-support computation in FastTree to 100 runs for both speed and memory optimization without compromising on accuracy. | FastTree, suggested: (FastTree, RRID:SCR_015501) |
| One of the key advantages of using MAFFT in CoVa is its ability to quickly incorporate new sequences to an existing MSA (6). | CoVa, suggested: (COVA, RRID:SCR_005175) |
| Evolution of SARS-CoV-2: Two multiple sequence alignments built using - 1) only Indian samples and 2) samples across the globe (excluding Indian samples) were merged together as a single multiple sequence alignment (MSA) using the mafft --merge option (MAFFT reference). | MAFFT, suggested: (MAFFT, RRID:SCR_011811) |
| This pipeline was created using python 2.7. | python, suggested: (IPython, RRID:SCR_001658) |
| We used TempEst (12) to find the root of the tree such that it optimised for the temporal signal by trying all possible roots and chose the one that minimised the mean of the square of the residuals. | TempEst, suggested: (TempEst, RRID:SCR_017304) |

Results from OddPub: Thank you for sharing your code.
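The rooting criterion quoted above for TempEst (try candidate roots, keep the one that minimises the mean of the squared residuals) can be illustrated with a root-to-tip regression. The following is a minimal sketch under stated assumptions, not TempEst's or CoVa's implementation: it takes precomputed root-to-tip distances for each candidate root rather than searching a tree, and all function names, dates, and distances are hypothetical values invented for illustration.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of distance = slope * date + intercept.

    Returns (slope, intercept, mean squared residual).
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    residuals = [d - (slope * t + intercept) for t, d in zip(dates, distances)]
    return slope, intercept, sum(r * r for r in residuals) / n


def pick_root(dates, candidates):
    """Return the candidate root name with the smallest mean squared residual."""
    return min(
        candidates,
        key=lambda name: root_to_tip_regression(dates, candidates[name])[2],
    )


# Hypothetical sampling dates (decimal years) and root-to-tip distances
# (substitutions/site) for two candidate roots; not real data.
dates = [2020.0, 2020.1, 2020.2, 2020.3, 2020.4]
candidates = {
    "root_A": [0.0001, 0.0002, 0.0003, 0.0004, 0.0005],  # clock-like
    "root_B": [0.0003, 0.0001, 0.0005, 0.0002, 0.0004],  # noisy
}

best = pick_root(dates, candidates)
slope, intercept, msr = root_to_tip_regression(dates, candidates[best])
origin = -intercept / slope  # date at which the fitted distance is zero

print(best, slope, origin)
```

In this kind of analysis the slope of the fitted line is read as the substitution rate and the x-intercept as the estimated date of the most recent common ancestor, which is how a rate and an origin date can both come from the same regression.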
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on page 12. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.