Phylodynamics of SARS-CoV-2 transmission in Spain

Abstract

Objectives

SARS-CoV-2 whole-genome analysis has identified three large clades spreading worldwide, designated G, V and S. This study aims to analyze the diffusion of SARS-CoV-2 in Spain and Europe.

Methods

Maximum likelihood phylogenetic and Bayesian phylodynamic analyses were performed to estimate the most probable temporal and geographic origin of different phylogenetic clusters and the diffusion pathways of SARS-CoV-2.
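
Time-scaled analyses of this kind are usually preceded by a check for temporal signal, in which root-to-tip genetic distance is regressed against sampling date. The sketch below illustrates that idea with an ordinary least-squares fit; it is not the authors' pipeline. The function name `root_to_tip_regression` and all input values are hypothetical, and it assumes dates in decimal years and distances in substitutions per site.

```python
from statistics import mean


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    dates:     sampling times in decimal years (assumed unit)
    distances: root-to-tip genetic distances in substitutions/site
    Returns (rate, tmrca, r_squared), where rate is the slope and
    tmrca is the x-intercept, i.e. the date at which distance is zero.
    """
    if len(dates) != len(distances) or len(dates) < 2:
        raise ValueError("need two equal-length series with >= 2 points")

    mean_t, mean_d = mean(dates), mean(distances)
    s_tt = sum((t - mean_t) ** 2 for t in dates)
    s_dd = sum((d - mean_d) ** 2 for d in distances)
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    if s_tt == 0:
        raise ValueError("all sampling dates are identical")

    rate = s_td / s_tt                      # slope: substitutions/site/year
    intercept = mean_d - rate * mean_t
    tmrca = -intercept / rate if rate != 0 else float("nan")
    r_squared = (s_td ** 2) / (s_tt * s_dd) if s_dd > 0 else float("nan")
    return rate, tmrca, r_squared


# Hypothetical, made-up inputs purely for illustration (not study data).
example_dates = [2020.00, 2020.05, 2020.10, 2020.15, 2020.20]
example_distances = [1.0e-4, 1.3e-4, 1.9e-4, 2.1e-4, 2.6e-4]
rate, tmrca, r2 = root_to_tip_regression(example_dates, example_distances)
print(f"rate={rate:.2e} subs/site/year, tMRCA~{tmrca:.2f}, R^2={r2:.2f}")
```

A positive slope with a reasonably high R² suggests enough temporal signal to justify a molecular-clock analysis; the x-intercept is only a rough point estimate of the root date and carries no credible interval.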

Results

Phylogenetic analyses of the first 28 SARS-CoV-2 whole-genome sequences obtained from patients in Spain revealed that most of them fall in the G and S clades (13 sequences in each), with the remaining two sequences branching in the V clade. Eleven of the Spanish viruses of the S clade and six of the G clade grouped in two different monophyletic clusters (S-Spain and G-Spain, respectively), with the S-Spain cluster also comprising 8 sequences from 6 other countries in Europe and the Americas. The most recent common ancestor (MRCA) of the SARS-CoV-2 pandemic was estimated to have existed in the city of Wuhan, China, around November 24, 2019, with a 95% highest posterior density (HPD) interval from October 30 to December 17, 2019. The origins of the S-Spain and G-Spain clusters were estimated to be in Spain around February 14 and 18, 2020, respectively, with a possible ancestry of S-Spain in Shanghai.
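
For readers unfamiliar with the HPD interval reported above, it can be understood as the narrowest interval that contains a given share of the posterior samples. The following is a minimal sketch of that definition, assuming a unimodal posterior represented as a plain list of sampled values; the function name `hpd_interval` and the sample values are hypothetical and are not taken from the study or from any BEAST output.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the samples.

    Assumes a unimodal posterior; for multimodal posteriors the HPD
    region may be a union of intervals, which this sketch ignores.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")

    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover

    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


# Hypothetical root-date samples in decimal years (illustration only).
posterior_dates = [2019.85, 2019.88, 2019.89, 2019.90, 2019.90,
                   2019.91, 2019.92, 2019.93, 2019.95, 2019.70]
low, high = hpd_interval(posterior_dates, mass=0.5)
print(f"50% HPD: {low:.2f} to {high:.2f}")
```

Unlike an equal-tailed interval, the HPD interval is not forced to cut the same probability from each tail, so for skewed posteriors it tends to exclude a long tail rather than split it.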

Conclusions

Multiple SARS-CoV-2 introductions have been detected in Spain and at least two resulted in the emergence of locally transmitted clusters, with further dissemination of one of them to at least 6 other countries. These results highlight the extraordinary potential of SARS-CoV-2 for rapid and widespread geographic dissemination.

Article activity feed

  1. SciScore for 10.1101/2020.04.20.050039:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    Sentence: Sequences were aligned using MAFFT software and sequences were manually edited using AliView v1.26 [12, 13].
    Resources:
      MAFFT (suggested: MAFFT, RRID:SCR_011811)
      AliView (suggested: AliView, RRID:SCR_002780)

    Sentence: Phylogenetic and evolutionary analyses: Phylogenies of large alignments were inferred by FastTree software v2.1.11 [14].
    Resources:
      FastTree (suggested: FastTree, RRID:SCR_015501)

    Sentence: Root-to-tip genetic distances against sample collection dates were measured with TempEst v1.5.1 and Bayesian time-scaled phylogenetic analyses were performed with BEAST v1.10.4 to estimate the date and location of the most recent common ancestors (MRCA) as well as to estimate the rate of evolution of the virus [15, 16].
    Resources:
      TempEst (suggested: TempEst, RRID:SCR_017304)
      BEAST (suggested: BEAST, RRID:SCR_010228)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    The main limitations of the present study are related to the fact that genomic sequences are being generated by diverse strategies following different steps that could affect the quality of the sequences. Different sample preparation techniques are being used, including overlapping amplicons, targeted capture where the viral RNA is enriched, and metagenomic total RNA sequencing of rRNA-depleted samples. The first two methods require less sequencing effort, but the possibility that some RNA molecules could be missed cannot be ruled out. In contrast, the metagenomic approach is hypothesis-free, but requires a high number of sequencing reads. Another point to take into account is the sequencing strategy per se, since several approaches are being used, including Sanger sequencing and next-generation sequencing platforms, such as iSeq, MiSeq, NextSeq and NovaSeq from Illumina, MinION and GridION from Nanopore and IonTorrent from ThermoFisher [6]. All these technologies also have their own biases. Finally, the informatics employed to analyze the data is the step where the greatest diversity of options is found. For all these reasons, some of the genetic differences observed between samples could be attributable to the error rate of sequencing platforms, indicating that genomes may be more similar than observed. On the other hand, the use of a reference genome to align the reads instead of following a de novo approach could mask some real genetic differences. In this sense, ...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on page 11. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.