Landscape of SARS-CoV-2 genomic surveillance, public availability extent of genomic data, and epidemic shaped by variants: a global descriptive study
Abstract
Background
Genomic surveillance has shaped our understanding of SARS-CoV-2 variants, which proliferated globally in 2021. Characterizing global genomic surveillance, sequencing coverage, and the extent of publicly available genomic data, coupled with traditional epidemiologic data, can provide evidence to inform SARS-CoV-2 surveillance and control strategies.
Methods
We collected country-specific data on SARS-CoV-2 genomic surveillance, sequencing capabilities, public genomic data, and aggregated publicly available variant data. We divided countries into three levels of genomic surveillance and sequencing availability based on predefined criteria. We downloaded the merged and deduplicated SARS-CoV-2 sequences from multiple public repositories, and used different proxies to estimate the sequencing coverage and public availability extent of genomic data, in addition to describing the global dissemination of variants.
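The abstract does not spell out the proxies themselves. As a rough sketch only, assuming that sequencing coverage is taken as publicly deposited sequences divided by confirmed infections, and that the public availability extent is taken as publicly deposited variant sequences divided by officially reported variant cases, the two ratios could be computed as follows. The function names and the numbers in the usage lines are hypothetical and are not taken from the study.

```python
from typing import Optional


def percent(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator/denominator as a percentage, or None if undefined."""
    if denominator <= 0:
        return None  # no confirmed cases or no official report: ratio undefined
    return 100.0 * numerator / denominator


def sequencing_coverage(public_sequences: int, confirmed_cases: int) -> Optional[float]:
    """Assumed proxy: share of confirmed infections with a public sequence."""
    return percent(public_sequences, confirmed_cases)


def sharing_extent(public_variant_sequences: int,
                   reported_variant_cases: int) -> Optional[float]:
    """Assumed proxy: public variant sequences relative to officially
    reported variant cases. Values above 100 are possible when official
    reporting lags behind repository uploads."""
    return percent(public_variant_sequences, reported_variant_cases)


# Made-up counts for illustration only.
coverage = sequencing_coverage(public_sequences=3_500, confirmed_cases=100_000)
extent = sharing_extent(public_variant_sequences=120, reported_variant_cases=100)
```

Returning `None` rather than zero for an undefined ratio keeps countries without official variant reporting distinguishable from countries that report variants but share no sequences.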
Findings
Since the start of 2021, the global COVID-19 epidemic featured increasing circulation of the Alpha variant, which was rapidly replaced by the Delta variant starting around May 2021; Delta reached a global prevalence of 96.6% at the end of July 2021. SARS-CoV-2 genomic surveillance and sequencing availability varied markedly across countries, with 63 countries performing routine genomic surveillance and 79 countries having high availability of SARS-CoV-2 sequencing. Fewer than 3.5% of confirmed SARS-CoV-2 infections were sequenced globally since September 2020, with the lowest sequencing coverage in the WHO regions of Eastern Mediterranean, South-East Asia, and Africa. Across different variants, 28-52% of countries with explicit reporting on variants shared less than half of their variant sequences in public repositories. More than 60% of demographic and 95% of clinical data were absent in GISAID metadata accompanying sequences.
Interpretation
Our findings indicate an urgent need to expand sequencing capacity for virus isolates, enhance the sharing of sequences, standardize metadata files, and build supportive networks for countries with no sequencing capability.
Research in context
Evidence before this study
On September 3, 2021, we searched PubMed for articles in any language published after January 1, 2020, using the following search terms: (“COVID-19” OR “SARS-CoV-2”) AND (“Global” OR “Region”) AND (“genomic surveillance” OR “sequencing” OR “spread”). Among the 43 papers identified, few discussed the global diversity in genomic surveillance, sequencing, and public availability of genomic data, or the global spread of SARS-CoV-2 variants. A paper by Furuse used publicly available GISAID data to evaluate the SARS-CoV-2 sequencing effort by country from the perspectives of “fraction”, “timeliness”, and “openness”. A viewpoint paper by a team at Case Western Reserve University discussed the impediments to genomic surveillance in several countries during the COVID-19 pandemic. A paper by Campbell and colleagues used GISAID data to present the global spread and estimated transmissibility of recently emerged SARS-CoV-2 variants. We also found several studies that reported country-level genomic surveillance and spread of variants. To our knowledge, no research has quantitatively depicted global SARS-CoV-2 genomic surveillance, sequencing ability, and the public availability extent of genomic data.
Added value of this study
This study collected country-specific data on SARS-CoV-2 genomic surveillance, sequencing capabilities, public genomic data, and aggregated publicly available variant data as of 20 August 2021. We found that genomic surveillance strategies and sequencing availability are globally diverse. Fewer than 3.5% of confirmed SARS-CoV-2 infections were sequenced globally since September 2020. Our analysis of publicly deposited SARS-CoV-2 sequences and officially reported numbers of variants implied that the public availability extent of genomic data is low in some countries, and more than 60% of demographic and 95% of clinical data were absent in GISAID metadata accompanying sequences. We also described the pandemic dynamics shaped by VOCs.
Implications of all the available evidence
Our study provides a landscape of global sequencing coverage and the public availability extent of sequences, as well as evidence for the rapid spread of SARS-CoV-2 variants. The pervasive spread of the Alpha and Delta variants further highlights the threat of SARS-CoV-2 mutations despite the availability of vaccines in many countries. It raises an urgent need to do more work on defining the ideal sampling schemes for different purposes (e.g., identifying new variants), with an additional call to share these data in public repositories to allow for further rapid scientific discovery.
Article activity feed
-
SciScore for 10.1101/2021.09.06.21263152: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Ethics: not detected.
Sex as a biological variable: not detected.
Randomization: not detected.
Blinding: not detected.
Power Analysis: not detected.
Table 2: Resources
No key resources detected.
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study: Notably, several countries, many of which are classified as low- or lower-middle-income countries by the World Bank, lack genomic surveillance data, likely due to limitations in infrastructure capacity and resources. However, even some countries classified as high-income have suffered from a slow and inconsistent process of adopting genomics-based surveillance [34]. Establishment of reference laboratories and networks to provide sequencing services for countries without established sequencing capacity may enable improved detection and tracking of emerging variants worldwide. The detection of most variants relies on full-length or partial genomic sequencing, but the sequences only become available to the global community when laboratories have established sequencing capacity, are willing to share, and are legally allowed to upload them. Discrepancies in sharing were observed in each region, which confirmed that some countries are sequencing but not uploading. However, our study observed a sharing extent exceeding 100% in some countries, likely due to delays in the official reporting of sequencing results or an incomplete official reporting system. The timely sharing of sequences makes it possible to adequately contextualize local data when looking at introductions and to examine transmission routes, as well as to look for sites of repeated mutations that can guide laboratory work on characterizing those mutations' effects on therapeutics and vaccine efficacy. The underlying reaso...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.
-