SARS-CoV-2 surveillance in Italy through phylogenomic inferences based on Hamming distances derived from pan-SNPs, -MNPs and -InDels

This article has been Reviewed by the following groups

Read the full article See related articles

Abstract

Background

Faced with the ongoing global pandemic of coronavirus disease, the ‘National Reference Centre for Whole Genome Sequencing of microbial pathogens: database and bioinformatic analysis’ (GENPAT) formally established at the ‘Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise’ (IZSAM) in Teramo (Italy) is in charge of the SARS-CoV-2 surveillance at the genomic scale. In a context of SARS-CoV-2 surveillance requiring correct and fast assessment of epidemiological clusters from substantial amount of samples, the present study proposes an analytical workflow for identifying accurately the PANGO lineages of SARS-CoV-2 samples and building of discriminant minimum spanning trees (MST) bypassing the usual time consuming phylogenomic inferences based on multiple sequence alignment (MSA) and substitution model.

Results

GENPAT constituted two collections of SARS-CoV-2 samples. The first collection consisted of SARS-CoV-2 positive swabs collected by IZSAM from the Abruzzo region (Italy), then sequenced by next generation sequencing (NGS) and analyzed in GENPAT ( n  = 1592), while the second collection included samples from several Italian provinces and retrieved from the reference Global Initiative on Sharing All Influenza Data (GISAID) ( n  = 17,201). The main results of the present work showed that (i) GENPAT and GISAID detected the same PANGO lineages, (ii) the PANGO lineages B.1.177 (i.e. historical in Italy) and B.1.1.7 (i.e. ‘UK variant’) are major concerns today in several Italian provinces, and the new MST-based method (iii) clusters most of the PANGO lineages together, (iv) with a higher dicriminatory power than PANGO lineages, (v) and faster that the usual phylogenomic methods based on MSA and substitution model.

Conclusions

The genome sequencing efforts of Italian provinces, combined with a structured national system of NGS data management, provided support for surveillance SARS-CoV-2 in Italy. We propose to build phylogenomic trees of SARS-CoV-2 variants through an accurate, discriminant and fast MST-based method avoiding the typical time consuming steps related to MSA and substitution model-based phylogenomic inference.

Article activity feed

  1. SciScore for 10.1101/2021.05.25.21257370: (What is this?)

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    SentencesResources
    II/ Isolation and sequencing: Concerning the samples from the first collection, acquisition of sequencing data implied successively sampling (oropharyngeal swab transport medium or bronchoalveolar lavage), virus inactivation (PrimeStore® MTM, in BSL3 biocontainment laboratory), nucleic acid purification (MagMaxTM CORE from Thermofisher), real-time RT-PCR-based SARS-CoV-2 RNA detection (TaqManTM 2019-nCoV Assay Kit v1 or v2 from Thermofisher) [23], RNA reverse transcription through multiplexing PCR (primer scheme nCoV-2019/V1) following the ARTIC protocol (https://artic.network/) [95], cDNA purification (AMPure XP beads, Agencourt), cDNA quantification (Qubit dsDNA HS Assay Kit and Qubit fluorometer 2.0 from Thermofisher or QuantiFluor ONE dsDNA System from Promega and FLUOstar OMEGA from BMG Labtech), library preparation (Illumina DNA Prep kit) and 150 bp paired-end read sequencing (MiniSeq or NextSeq500 from Illumina).
    Thermofisher
    suggested: (ThermoFisher; SL 8; Centrifuge, RRID:SCR_020809)
    MiniSeq
    suggested: None
    More precisly, we implemented a mapping-based variant calling analysis including functional variant annotations based on Trimmomatic [68], BWA [39], FreeBayes [42] and SNPeff [96] implemented in Snippy [98] because this workflow is fast and already well packaged in Docker (Figure 1).
    Trimmomatic
    suggested: (Trimmomatic, RRID:SCR_011848)
    BWA
    suggested: (BWA, RRID:SCR_010910)
    FreeBayes
    suggested: (FreeBayes, RRID:SCR_010761)
    SNPeff
    suggested: (SnpEff, RRID:SCR_005191)

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.