SARS-CoV-2 surveillance in Italy through phylogenomic inferences based on Hamming distances derived from pan-SNPs, -MNPs and -InDels

Adriano Di Pasquale
Nicolas Radomski
Iolanda Mangone
Paolo Calistri
Alessio Lorusso
Cesare Cammà

Abstract

Background

Faced with the ongoing global pandemic of coronavirus disease, the ‘National Reference Centre for Whole Genome Sequencing of microbial pathogens: database and bioinformatic analysis’ (GENPAT), formally established at the ‘Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise’ (IZSAM) in Teramo (Italy), is in charge of SARS-CoV-2 surveillance at the genomic scale. Because this surveillance requires correct and fast assessment of epidemiological clusters from a substantial number of samples, the present study proposes an analytical workflow for accurately identifying the PANGO lineages of SARS-CoV-2 samples and building discriminant minimum spanning trees (MST), bypassing the usual time-consuming phylogenomic inferences based on multiple sequence alignment (MSA) and substitution models.

Results

GENPAT assembled two collections of SARS-CoV-2 samples. The first collection consisted of SARS-CoV-2 positive swabs collected by IZSAM from the Abruzzo region (Italy), then sequenced by next generation sequencing (NGS) and analyzed in GENPAT (n = 1592), while the second collection included samples from several Italian provinces retrieved from the reference Global Initiative on Sharing All Influenza Data (GISAID) (n = 17,201). The main results of the present work showed that (i) GENPAT and GISAID detected the same PANGO lineages, (ii) the PANGO lineages B.1.177 (i.e. historical in Italy) and B.1.1.7 (i.e. ‘UK variant’) are major concerns today in several Italian provinces, and that the new MST-based method (iii) clusters most of the PANGO lineages together, (iv) has a higher discriminatory power than PANGO lineages, and (v) is faster than the usual phylogenomic methods based on MSA and substitution models.

Conclusions

The genome sequencing efforts of Italian provinces, combined with a structured national system of NGS data management, provided support for SARS-CoV-2 surveillance in Italy. We propose to build phylogenomic trees of SARS-CoV-2 variants through an accurate, discriminant and fast MST-based method that avoids the typical time-consuming steps related to MSA and substitution model-based phylogenomic inference.
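
The idea of building an MST from Hamming distances, rather than from an MSA and a substitution model, can be illustrated with a minimal sketch. It is not the authors' implementation, which this page does not describe. It assumes each sample is encoded as a presence/absence vector over the pooled set of variants (SNPs, MNPs and InDels) observed across all samples, and it uses Prim's algorithm as one possible way to build the tree. The sample names, the toy profiles and the function names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(profiles):
    """Map each unordered pair of sample names to its Hamming distance."""
    return {
        frozenset((s, t)): hamming(profiles[s], profiles[t])
        for s, t in combinations(sorted(profiles), 2)
    }


def minimum_spanning_tree(profiles):
    """Prim's algorithm; returns (sample_in_tree, new_sample, distance) edges."""
    samples = sorted(profiles)
    if not samples:
        return []
    dist = pairwise_distances(profiles)
    in_tree = {samples[0]}
    edges = []
    while len(in_tree) < len(samples):
        # Pick the cheapest edge joining the tree to a sample outside it.
        d, u, v = min(
            (dist[frozenset((u, v))], u, v)
            for u in in_tree
            for v in samples
            if v not in in_tree
        )
        edges.append((u, v, d))
        in_tree.add(v)
    return edges


# Hypothetical toy profiles: 1 = variant present, 0 = variant absent.
profiles = {
    "sample_A": [0, 0, 0, 0, 0],
    "sample_B": [1, 0, 0, 0, 0],
    "sample_C": [1, 1, 0, 0, 0],
    "sample_D": [0, 0, 0, 1, 1],
}
tree = minimum_spanning_tree(profiles)
for parent, child, distance in tree:
    print(parent, child, distance)
```

This all-pairs version costs quadratic time in the number of samples for the distances alone, so it only suits small illustrations. It needs no alignment step because every sample is compared on the same fixed list of variant positions.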

Version published to 10.1186/s12864-021-08112-0
Oct 30, 2021

SciScore for 10.1101/2021.05.25.21257370:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms

Sentences	Resources
II/ Isolation and sequencing: Concerning the samples from the first collection, acquisition of sequencing data successively involved sampling (oropharyngeal swab transport medium or bronchoalveolar lavage), virus inactivation (PrimeStore® MTM, in BSL3 biocontainment laboratory), nucleic acid purification (MagMax™ CORE from Thermofisher), real-time RT-PCR-based SARS-CoV-2 RNA detection (TaqMan™ 2019-nCoV Assay Kit v1 or v2 from Thermofisher) [23], RNA reverse transcription through multiplexing PCR (primer scheme nCoV-2019/V1) following the ARTIC protocol (https://artic.network/) [95], cDNA purification (AMPure XP beads, Agencourt), cDNA quantification (Qubit dsDNA HS Assay Kit and Qubit fluorometer 2.0 from Thermofisher or QuantiFluor ONE dsDNA System from Promega and FLUOstar OMEGA from BMG Labtech), library preparation (Illumina DNA Prep kit) and 150 bp paired-end read sequencing (MiniSeq or NextSeq500 from Illumina).	Thermofisher suggested: (ThermoFisher; SL 8; Centrifuge, RRID:SCR_020809); MiniSeq suggested: None
More precisely, we implemented a mapping-based variant calling analysis including functional variant annotations based on Trimmomatic [68], BWA [39], FreeBayes [42] and SNPeff [96] implemented in Snippy [98] because this workflow is fast and already well packaged in Docker (Figure 1).	Trimmomatic suggested: (Trimmomatic, RRID:SCR_011848); BWA suggested: (BWA, RRID:SCR_010910); FreeBayes suggested: (FreeBayes, RRID:SCR_010761); SNPeff suggested: (SnpEff, RRID:SCR_005191)

Results from OddPub: Thank you for sharing your code.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.

Version published to 10.1101/2021.05.25.21257370 on medRxiv
May 25, 2021

Overview of SARS-CoV-2 Genomic Surveillance in Central America and the Dominican Republic from February 2020 to January 2023: The Impact of PAHO and COMISCA's Collaborative Efforts

This article has 31 authors:
1. Sofia Herrera Agüero
2. Aldo Sosa
3. Alexander Martínez
4. Ambar Moreno
5. César Roberto Conde Pereira
6. Claudia Gonzalez
7. Claudio Soto Garita
8. Daniel Ulate
9. Estela Cordero-Laurent
10. Hebleen Brenes
11. Isaac Miguel Sánchez
12. Jairo Mendez-Rico
13. Jessica Góndola
14. Jose Arturo Molina-Mora
15. Juliana Leite
16. Leticia Franco
17. Linda Mendoza
18. Lionel Gresh
19. Lucia De La Cruz
20. Mitzi Castro Paz
21. Monica Barahona
22. Naomi Iihoshi
23. Oris Chavarria
24. Priscila Born
25. Ruby Melany Aguillón
26. Ruth Carolina Vasquez Cordova
27. Selene Gonzalez
28. Sofia Carolina Alvarado Silva
29. Xochitl Sandoval López
30. Yvonne Imbert
31. Francisco Duarte-Martínez
This article has no evaluations
Latest version Jan 14, 2026

Regional prospective whole-genome sequencing surveillance of ESBL-producing Escherichia coli and Klebsiella pneumoniae in the Netherlands: a multicentre study on nosocomial and interhospital transmission

This article has 9 authors:
1. Julinha M. Thelen
2. Veronica A.T.C. Weterings
3. Andreas L.E. van Arkel
4. Wouter van den Bijllaardt
5. Jean-Luc Murk
6. Jeroen Tjhie
7. Jaco J. Verweij
8. Bas Wintermans
9. Joep J.J.M. Stohr
This article has no evaluations
Latest version Jan 7, 2026

Two years of genomic surveillance capacity development in Guinea: an operational roadmap for local implementation in low-income countries and tracking of SARS-CoV-2 circulation dynamics

This article has 46 authors:
1. Magassouba Magassouba
2. Emanuele Gustani-Buss
3. Kékoura Ifono
4. Emily Victoria Nelson
5. Jacob Camara
6. Annibaldis Giuditta
7. Annick Renevey
8. Julia Hinzmann
9. Mette Hinrichs
10. Sarah Ryter
11. Ehizojie Emua
12. Saa Lucien Millimono
13. Eugene Kolie
14. Moussa Condé
15. Bakary Sylla
16. Nourdine Ibrahim
17. Stephane Mely
18. Hugo Soubrier
19. Joëlle Goüy de Bellocq
20. Beatriz Escudero-Pérez
21. Laura N. Cuypers
22. Elodie Moissonnier
23. Lien De Caluwé
24. Jonas Müller
25. Anke Thielebein
26. Alexandru Tomazatos
27. Christine Jacobsen
28. Meike Pahlmann
29. Beate Becker-Ziaja
30. Cyril Erameh
31. Sylvanus Okogbenin
32. Fara Raymond Koundouno
33. Youssouf Sidibé
34. Kaba Keïta
35. Mamadou Boye Keita
36. Gianluca Loi
37. Moke Fundji Jean Marie Kipela
38. Georges Alfred Ki-Zerbo
39. Seydou Dia
40. Philippe Lemey
41. Stephan Günther
42. Camara
43. Barré Soropogui
44. Liana Eleni Kafetzopoulou
45. Sanaba Boumbaly
46. Sophie Duraffour
This article has no evaluations
Latest version Jan 22, 2026
