A Snapshot of SARS-CoV-2 Genome Availability up to April 2020 and its Implications: Data Analysis

Carla Mavian
Simone Marini
Mattia Prosperi
Marco Salemi

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been growing exponentially, affecting over 4 million people and causing enormous distress to economies and societies worldwide. A plethora of analyses based on viral sequences has already been published both in scientific journals and through non–peer-reviewed channels to investigate the genetic heterogeneity and spatiotemporal dissemination of SARS-CoV-2. However, a systematic investigation of phylogenetic information and sampling bias in the available data is lacking. Although the number of available genome sequences of SARS-CoV-2 is growing daily and the sequences show increasing phylogenetic information, country-specific data still present severe limitations and should be interpreted with caution.

Objective

The objective of this study was to determine the quality of the currently available SARS-CoV-2 full genome data in terms of sampling bias as well as phylogenetic and temporal signals to inform and guide the scientific community.

Methods

We used maximum likelihood–based methods to assess the presence of sufficient information for robust phylogenetic and phylogeographic studies in several SARS-CoV-2 sequence alignments assembled from GISAID (Global Initiative on Sharing All Influenza Data) data released between March and April 2020.

Results

Although the number of high-quality full genomes is growing daily, and sequence data released in April 2020 contain sufficient phylogenetic information to allow reliable inference of phylogenetic relationships, country-specific SARS-CoV-2 data sets still present severe limitations.

Conclusions

At the present time, studies assessing within-country spread or transmission clusters should be considered preliminary or hypothesis-generating at best. Hence, current reports should be interpreted with caution, and concerted efforts should continue to increase the number and quality of sequences required for robust tracing of the epidemic.

SciScore for 10.1101/2020.03.16.20034470:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement	not detected.
Randomization	not detected.
Blinding	not detected.
Power Analysis	not detected.
Sex as a biological variable	not detected.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Evaluation of the presence of phylogenetic signal satisfying resolved phylogenetic relationships among sequences was carried out with IQ-TREE, allowing the software to search for all possible quartets using the best-fitting nucleotide substitution model 11.	IQ-TREE suggested: (IQ-TREE, RRID:SCR_017254)
Exploration of temporal structure, i.e., presence of a molecular clock in the data, was assessed by regression of divergence (root-to-tip genetic distance) against sampling time using TempEst 19.	TempEst suggested: (TempEst, RRID:SCR_017304)
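
The root-to-tip regression mentioned in the last row can be illustrated with a minimal sketch. It assumes the tree is already rooted and that each tip's root-to-tip distance and sampling date (as a decimal year) have been extracted beforehand; it does not reproduce TempEst itself, which also searches for a best-fitting root. The function name `root_to_tip_regression` and the toy values are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def root_to_tip_regression(
    dates: Sequence[float], distances: Sequence[float]
) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Returns (slope, x_intercept, r_squared):
      slope       -- estimated substitutions per site per unit of time
      x_intercept -- date at which the fitted line reaches zero divergence,
                     a rough estimate of the time of the root
      r_squared   -- share of variance explained, a crude gauge of clock signal
    """
    n = len(dates)
    if n != len(distances) or n < 3:
        raise ValueError("need at least 3 paired (date, distance) observations")

    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    syy = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all sampling dates are identical; no temporal spread")

    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    x_intercept = -intercept / slope if slope != 0 else float("nan")
    return slope, x_intercept, r_squared


if __name__ == "__main__":
    # Hypothetical toy data: decimal-year sampling dates and distances
    # (substitutions per site) that happen to lie exactly on a line.
    toy_dates = [2020.0, 2020.1, 2020.2, 2020.3]
    toy_distances = [0.0001, 0.0002, 0.0003, 0.0004]
    rate, root_date, r2 = root_to_tip_regression(toy_dates, toy_distances)
    print(f"rate={rate:.2e} subs/site/year, root date={root_date:.2f}, R^2={r2:.3f}")
```

A positive slope with a reasonably high R² is usually read as evidence of temporal signal; a flat or negative slope suggests the sampling window is too short, or the data too noisy, for molecular clock analysis.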

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
No funding statement was detected.
No protocol registration statement was detected.

Version published to 10.2196/19170
Jun 1, 2020
Version published to 10.2196/preprints.19170
Apr 6, 2020
Version published to 10.1101/2020.03.16.20034470 on medRxiv
Mar 20, 2020

Related articles

Genomic characterization of SARS-CoV-2 variants circulating in the population of Bangui, Central African Republic (CAR) in 2022.

This article has 15 authors:
1. Pulchérie Pelembi
2. Philippe Colson
3. Alain Farra
4. Ornella Anne Sibiro-Demi
5. Christian Noël Malaka
6. Aurélia Kwasiborski
7. Véronique Hourdel
8. Gilles Landry Ngaya
9. Romaric Nzoumbou-Boko
10. Jean-Claude Manuguerra
11. Emmanuel Ryvalin Nakoune-Yandoko
12. Guy VERNET
13. Bernard La Scola
14. Valérie Caro
15. Alexandre Manirakiza
This article has no evaluations. Latest version Jan 19, 2026.

Overview of SARS-CoV-2 Genomic Surveillance in Central America and the Dominican Republic from February 2020 to January 2023: The Impact of PAHO and COMISCA's Collaborative Efforts

This article has 31 authors:
1. Sofia Herrera Agüero
2. Aldo Sosa
3. Alexander Martínez
4. Ambar Moreno
5. César Roberto Conde Pereira
6. Claudia Gonzalez
7. Claudio Soto Garita
8. Daniel Ulate
9. Estela Cordero-Laurent
10. Hebleen Brenes
11. Isaac Miguel Sánchez
12. Jairo Mendez-Rico
13. Jessica Góndola
14. Jose Arturo Molina-Mora
15. Juliana Leite
16. Leticia Franco
17. Linda Mendoza
18. Lionel Gresh
19. Lucia De La Cruz
20. Mitzi Castro Paz
21. Monica Barahona
22. Naomi Iihoshi
23. Oris Chavarria
24. Priscila Born
25. Ruby Melany Aguillón
26. Ruth Carolina Vasquez Cordova
27. Selene Gonzalez
28. Sofia Carolina Alvarado Silva
29. Xochitl Sandoval López
30. Yvonne Imbert
31. Francisco Duarte-Martínez
This article has no evaluations. Latest version Jan 14, 2026.

DIVERSITY AND CLINICAL CORRELATIONS OF SARS-CoV-2 VARIANT DURING THE INTRODUCTION OF THE DELTA VARIANT IN GUATEMALA

This article has 13 authors:
1. Claudia Carranza
2. Lucia Ortiz
3. Maria Eugenia Castellanos
4. Ana Silvia Gonzalez-Reiche
5. Renata Mendizabal-Cabrera
6. Zain Khalil
7. Adriana van DeGuchte
8. Keith Farrugia
9. Mariana Herrera
10. Ernesto Mena
11. Celia Cordon-Rosales
12. Harm van Bakel
13. Daniel R. Perez
Reviewed by Access Microbiology

This article has 3 evaluations. Latest version Feb 3, 2026. Latest activity Jul 20, 2025.
