Genomic epidemiology of SARS-CoV-2 reveals multiple lineages and early spread of SARS-CoV-2 infections in Lombardy, Italy
Listed in
- Evaluated articles (ScreenIT)
Abstract
From February to April 2020, Lombardy (Italy) reported the highest numbers of SARS-CoV-2 cases worldwide. By analyzing 346 whole SARS-CoV-2 genomes, we demonstrate the presence of seven viral lineages in Lombardy, frequently sustained by local transmission chains and at least two likely to have originated in Italy. Six single nucleotide polymorphisms (five of them non-synonymous) characterized the SARS-CoV-2 sequences, none of them affecting N-glycosylation sites. The seven lineages, and the presence of local transmission clusters within three of them, revealed that sustained community transmission was underway before the first COVID-19 case had been detected in Lombardy.
Article activity feed
SciScore for 10.1101/2020.07.19.20152322:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms

Sentence: The mapping of cleaned reads was performed against the GenBank reference genome NC_045512.2 (Wuhan, collection date: December 2019) using BWA-mem [4], and consensus sequences were generated using samtools 1.10 [5]. Single nucleotide polymorphisms (SNP variants) were called through a pipeline based on samtools/bcftools [6], and all SNPs having a minimum supporting read frequency of 40% with a depth ≥100 were retained in the consensus sequence.
Resources:
- BWA-mem, suggested: None
- samtools, suggested: (SAMTOOLS, RRID:SCR_002105)

Sentence: Sequences were aligned using ClustalX and manually inspected in Bioedit.
Resources:
- Bioedit, suggested: (BioEdit, RRID:SCR_007361)

Sentence: In order to obtain a corresponding time-scaled maximum clade credibility tree, a Bayesian coalescent tree analysis was undertaken with BEAST v1.10.4 [11], using the HKY+Γ4 substitution model with gamma-distributed rate variation with an exponential population growth tree prior and an uncorrelated relaxed molecular clock, under a noninformative continuous-time Markov chain (CTMC) reference prior [12].
Resources:
- BEAST, suggested: (BEAST, RRID:SCR_010228)

Sentence: The maximum clade credibility (MCC) tree was inferred from the Bayesian posterior tree distribution using TreeAnnotator, and visualized with FigTree 1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
Resources:
- FigTree, suggested: (FigTree, RRID:SCR_008515)

Sentence: Statistical Analysis: Data were analyzed using Rgui and the statistical software package SPSS (v32.0; SPSS Inc., Chicago, IL).
Resources:
- SPSS, suggested: (SPSS, RRID:SCR_002865)

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
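The SNP retention rule quoted in the Software and Algorithms sentences above (minimum supporting read frequency of 40% with a depth ≥100) can be illustrated with a minimal Python sketch. This is not the authors' samtools/bcftools pipeline. The `SnpCall` record, the `retain` and `apply_snps` helpers, and the toy reference string are hypothetical names introduced only for illustration. Treating both thresholds as inclusive is also an assumption.

```python
from dataclasses import dataclass

# Thresholds taken from the quoted methods sentence; inclusive bounds are assumed.
MIN_FREQ = 0.40   # minimum fraction of reads supporting the alternate base
MIN_DEPTH = 100   # minimum number of reads covering the position


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical per-site record; not a bcftools data structure."""
    pos: int        # 1-based position on the reference
    ref: str        # reference base
    alt: str        # alternate base
    depth: int      # total reads covering the position
    alt_reads: int  # reads supporting the alternate base

    @property
    def alt_freq(self) -> float:
        # Guard against division by zero at uncovered positions.
        return self.alt_reads / self.depth if self.depth else 0.0


def retain(call: SnpCall) -> bool:
    """Return True if the call passes both the depth and frequency thresholds."""
    return call.depth >= MIN_DEPTH and call.alt_freq >= MIN_FREQ


def apply_snps(reference: str, calls: list[SnpCall]) -> str:
    """Write retained alternate bases into a copy of the reference sequence."""
    seq = list(reference)
    for call in calls:
        if retain(call):
            seq[call.pos - 1] = call.alt
    return "".join(seq)


if __name__ == "__main__":
    toy_calls = [
        SnpCall(pos=2, ref="C", alt="T", depth=200, alt_reads=100),  # passes both
        SnpCall(pos=3, ref="G", alt="A", depth=50, alt_reads=50),    # depth too low
        SnpCall(pos=4, ref="T", alt="C", depth=300, alt_reads=90),   # frequency too low
    ]
    print(apply_snps("ACGT", toy_calls))
```

In this toy input only the first call meets both thresholds, so only position 2 of the reference is changed.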
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
Our study may have some limitations. The analysis of phylogenetic structures during such an early phase of the pandemic should be interpreted carefully, as the number of mutations that define phylogenetic lineages is small and may be similar to the rate of potential errors introduced during reverse transcription, PCR amplification, or sequencing [22]. To overcome these problems, Bayesian approach, known to be a powerful way to estimate species divergence, and thus expected to provide more robust results, was applied. Moreover, the integration of host characteristics (such as geographical location, collection date and clinical manifestations) aided phylogenetic interpretation. Moreover, the intra-host variability of SARS-CoV-2, and the role of potential existing minority variants has not been investigated here. Initial evidences suggest that intra-host variation of SARS-CoV-2 can be frequently found among clinical samples (median number of intra-host variants: 1–4), but at the same time these variants were not observed in the population as polymorphisms, probably suggesting a bottleneck or purifying selection involved [23,24]. Thus, ad hoc designed studies are necessary to provide an extensive overview of SARS-CoV-2 intra-host variability and minority variants description, if and how these minority variants can spread in the population, and their potential role in virulence and transmissibility. In the peak of epidemic, SARS-CoV-2 diagnosis was mainly addressed to symptomatic cases ...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.