Genomic epidemiology of SARS-CoV-2 in Esteio, Rio Grande do Sul, Brazil

Abstract

Background

Brazil is the third most affected country by coronavirus disease 2019 (COVID-19), yet viral evolution at municipal resolution remains poorly understood there, even though it is crucial for understanding the epidemiology of viral spread. We aimed to track the molecular evolution and spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Esteio (Southern Brazil) using phylogenetic and phylodynamic inferences from 21 new genomes in global and regional context. Importantly, the case fatality rate (CFR) in Esteio (3.26%) is slightly higher than that of Rio Grande do Sul (RS) state (2.56%) and of Brazil as a whole (2.74%).

Results

We provide a comprehensive view of mutations from a representative sample collected from May to October 2020, highlighting two frequent mutations in the spike glycoprotein (D614G and V1176F), an emergent mutation (E484K) in the spike receptor-binding domain (RBD) characteristic of the B.1.351 and P.1 lineages, and the replacement of two adjacent amino acids in the nucleocapsid phosphoprotein (R203K and G204R). E484K was found in two genomes from mid-October, which is the earliest description of this mutation in Southern Brazil. Lineages containing this substitution must be subject to intense surveillance because of its association with immune evasion. We also found two epidemiologically related clusters, including one from patients of the same neighborhood. Phylogenetic and phylodynamic analyses demonstrate multiple introductions of the most prevalent Brazilian lineages (B.1.1.33 and B.1.1.248) and the establishment of Brazilian lineages that spread from the Southeast to other Brazilian regions.
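
The substitution labels above follow the usual convention of reference residue, position in the protein, and replacement residue (D614G: aspartate at position 614 replaced by glycine). A minimal Python sketch of reading that notation is given below. The names `Substitution`, `parse_substitution` and `are_adjacent` are illustrative assumptions and are not taken from the study's code.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    """One amino acid substitution, e.g. D614G (hypothetical helper type)."""
    ref: str       # residue in the reference protein
    position: int  # 1-based position in the protein
    alt: str       # residue observed in the sample


# One-letter residue, digits, one-letter residue (e.g. "E484K").
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'D614G' into its three parts."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    ref, position, alt = match.groups()
    return Substitution(ref, int(position), alt)


def are_adjacent(a: Substitution, b: Substitution) -> bool:
    """True when two substitutions affect neighbouring positions."""
    return abs(a.position - b.position) == 1


if __name__ == "__main__":
    # Labels quoted in the abstract; grouping by protein is done by hand here.
    spike = [parse_substitution(s) for s in ("D614G", "V1176F", "E484K")]
    nucleocapsid = [parse_substitution(s) for s in ("R203K", "G204R")]
    print(spike)
    print(are_adjacent(*nucleocapsid))
```

The adjacency check mirrors the abstract's description of R203K and G204R as replacements at neighbouring nucleocapsid positions.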

Conclusions

Our data show the value of correlating clinical, epidemiological and genomic information for understanding viral evolution and its spatial distribution over time. This is of paramount importance for informing policy-making strategies to fight COVID-19.

Article activity feed

  1. SciScore for 10.1101/2021.01.21.21249906

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: IRB: Ethics statement: Ethical approval was obtained from the Brazilian National Ethics Committee (Comissão Nacional de Ética em Pesquisa, CONEP) under process number 30934020.5.0000.0008.
    Randomization: Latitudes and longitudes were attributed to a randomly selected point next to the center of each region or continent.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Briefly, quality control was performed with FastQC v0.11.9 and low-quality reads and adapters were removed using Trimmomatic v0.39 (Bolger et al. 2014).
    FastQC
    suggested: (FastQC, RRID:SCR_014583)
    Trimmomatic
    suggested: (Trimmomatic, RRID:SCR_011848)
    PCR duplicates were discarded using Picard MarkDuplicates v2.23.8 (https://broadinstitute.github.io/picard/) with REMOVE_DUPLICATES=true.
    Picard
    suggested: (Picard, RRID:SCR_006525)
    Reads were mapped to the reference SARS-CoV-2 genome (GenBank accession number NC_045512.2) using Burrows–Wheeler Aligner (BWA-MEM) v0.7.17 (Li and Durbin 2009) and unmapped reads were discarded.
    BWA-MEM
    suggested: (Sniffles, RRID:SCR_017619)
    Coverage values for each genome were calculated using bedtools v2.26.0 (Quinlan 2014) and plotted using the karyoploteR v1.12.4 package (Gel and Serra 2017).
    bedtools
    suggested: (BEDTools, RRID:SCR_006646)
    Virome analysis: As the respiratory panel kit used allows the detection of ∼40 respiratory viral pathogens, the viral composition of each sample was verified using Kaiju v1.7.3 (Menzel et al. 2016) and Kraken v2.0.7-beta (Wood et al. 2019) against a reference database of viral sequences.
    Kraken
    suggested: (Kraken, RRID:SCR_005484)
    Mutation analysis: Sequence positions in this work refer to GenBank RefSeq sequence NC_045512.2, a genome isolated and sequenced from Wuhan (China), early in the pandemic.
    RefSeq
    suggested: (RefSeq, RRID:SCR_003496)
    SNPs and insertions/deletions (INDELs) were assessed in each sample by using snippy variant calling and core genome alignment pipeline v4.6.0 (https://github.com/tseemann/snippy), which uses FreeBayes v1.3.2 (Garrison and Marth 2012) variant caller and snpEff v5.0 (Cingolani et al. 2012) to annotate and predict the effects of variants on genes and proteins.
    FreeBayes
    suggested: (FreeBayes, RRID:SCR_010761)
    snpEff
    suggested: (SnpEff, RRID:SCR_005191)
    Briefly, this pipeline uses the augur toolkit to (i) exclude short and low-quality sequences or those with an incomplete sampling date; (ii) align filtered sequences using MAFFT v7.471 (Katoh and Standley 2013); (iii) mask uninformative sites from the alignment; (iv) perform context subsampling using genomes genetically closely related to our focal subset, prioritizing sequences geographically closer to RS state, Brazil; (v) build a maximum likelihood (ML) phylogenetic tree using IQ-TREE v2.0.3, employing the best-fit model of nucleotide substitution as indicated by ModelFinder (Nguyen et al. 2015); (vi) generate a time-scaled tree, resolving polytomies and internal nodes with TreeTime v0.7.6 under a strict clock and a skyline coalescent prior with a rate of 8×10⁻⁴ substitutions per site per year (Sagulenko et al. 2018); (vii) label clades, assign mutations and infer geographic movements; and (viii) export results to JSON format to enable interactive visualization through auspice.us.
    IQ-TREE
    suggested: (IQ-TREE, RRID:SCR_017254)
    The ML tree was inspected in TempEst v1.5.3 (Rambaut et al. 2016) to investigate the temporal signal through regression of root-to-tip genetic divergence against sampling dates.
    TempEst
    suggested: (TempEst, RRID:SCR_017304)
    Phylodynamics and phylogeographic analysis: All global sequences (until December 24, 2020) belonging to lineages B.1.1.248 (n=405) and B.1.1.33 (n=725), found in high frequency in this study, were recovered from the filtered MAFFT alignment performed inside Nextstrain ncov pipeline in the previous step.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    The TMRCA and the spatial diffusion of these important circulating lineages through Brazil were separately estimated for each lineage using a Bayesian Markov Chain Monte Carlo (MCMC) approach as implemented in BEAST v1.10.4 (Suchard et al. 2018), using the BEAGLE library v3 (Ayres et al. 2012) to save computational time.
    BEAST
    suggested: (BEAST, RRID:SCR_010228)
    BEAGLE
    suggested: (BEAGLE, RRID:SCR_001789)
    Two MCMC chains were run for at least 120 million generations and convergence of the MCMC chains was inspected using Tracer v1.7.1 (Rambaut et al. 2018).
    Tracer
    suggested: (Tracer, RRID:SCR_019121)
    MCC trees were visualized using FigTree v1.4 (http://tree.bio.ed.ac.uk/software/figtree/) and additional annotations were performed in ggtree R package v2.0.4 (Yu et al. 2017).
    FigTree
    suggested: (FigTree, RRID:SCR_008515)
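
The temporal-signal check and the fixed clock rate quoted in the table above can be illustrated with a short Python sketch. It fits an ordinary least-squares line of root-to-tip divergence against sampling date, which is the idea behind the TempEst inspection, and converts the quoted rate of 8×10⁻⁴ substitutions per site per year into substitutions per genome per year. The function names and the sample data are hypothetical; the genome length is that of the NC_045512.2 reference (29,903 nt). This is not the authors' code and it does not call TempEst or TreeTime.

```python
from statistics import mean

GENOME_LENGTH = 29_903   # nucleotides in reference NC_045512.2
CLOCK_RATE = 8e-4        # substitutions per site per year, as quoted above


def root_to_tip_regression(dates, divergences):
    """Least-squares fit of divergence (subs/site) on date (decimal years).

    Returns (slope, intercept, r_squared). The slope estimates the clock
    rate; r_squared indicates the strength of the temporal signal.
    """
    if len(dates) != len(divergences) or len(dates) < 2:
        raise ValueError("need two or more paired observations")
    x_bar, y_bar = mean(dates), mean(divergences)
    sxx = sum((x - x_bar) ** 2 for x in dates)
    syy = sum((y - y_bar) ** 2 for y in divergences)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dates, divergences))
    if sxx == 0 or syy == 0:
        raise ValueError("dates and divergences must both vary")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept, sxy ** 2 / (sxx * syy)


def substitutions_per_genome_per_year(rate=CLOCK_RATE, length=GENOME_LENGTH):
    """Convert a per-site rate into an expected per-genome rate."""
    return rate * length


if __name__ == "__main__":
    # Hypothetical tips: sampling dates in decimal years, divergence in
    # substitutions per site. Values are invented for illustration only.
    dates = [2020.35, 2020.45, 2020.55, 2020.65, 2020.80]
    divergences = [0.00034, 0.00046, 0.00050, 0.00062, 0.00071]
    slope, intercept, r2 = root_to_tip_regression(dates, divergences)
    print(f"slope={slope:.2e} subs/site/year, R^2={r2:.3f}")
    print(f"root date estimate={-intercept / slope:.2f}")
    print(f"{substitutions_per_genome_per_year():.1f} subs/genome/year")
```

Under the quoted rate, a genome of this length is expected to accumulate roughly 24 substitutions per year, or about two per month.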

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Some limitations should be considered. Firstly, it was not possible to analyze a larger sample size. Moreover, the low quantity of sequences from the RS state to contextualize our sequences limited the inference of events of introduction and movement of the virus with municipal and state resolution. Still in this respect, we have observed a dramatic drop in the sequencing efforts from Brazil after April (Candido et al. 2020b), which made it difficult to measure the main circulating lineages in the country during our investigation period (May-October, 2020) and may introduce confounding factors. Nevertheless, our results provide a comprehensive view of viral mutations from a time- and age-representative sample from May to October 2020, highlighting two frequent mutations in Spike glycoprotein (D614G and V1176F), an emergent mutation in Spike RBD (E484K) characteristic of the South African lineage B.1.351, and the adjacent replacement of 2 amino acids in Nucleocapsid phosphoprotein (R203K and G204R). A significant viral diversity was evidenced by the absence of identical isolates in our samples. Furthermore, we identified patterns of SARS-CoV-2 viral diversity inside Southern Brazil, demonstrating the major role of community transmission in viral spreading and the establishment of Brazilian lineages. This fact was demonstrated by the dominance of lineages B.1.1.248 and B.1.1.33, widely distributed throughout the Brazilian states and with very low occurrence in other countries. ...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.