Genomic Epidemiology of SARS-CoV-2 in Pakistan

Abstract

COVID-19 has spread globally, and Pakistan is no exception. To investigate the initial introductions and transmission of SARS-CoV-2 in Pakistan, we performed the largest genomic epidemiology study of COVID-19 in Pakistan and generated 150 complete SARS-CoV-2 genome sequences from samples collected from March 16 to June 1, 2020. We identified a total of 347 mutated positions, 31 of which were over-represented in Pakistan. We also found over 1000 intra-host single-nucleotide variants (iSNVs). Several of them occurred concurrently, indicating possible interactions among them or coevolution. Some of the high-frequency iSNVs in Pakistan were not observed in the global population, suggesting strong purifying selection. The genomic epidemiology revealed five distinct spreading clusters. The largest cluster consisted of 74 viruses that were derived from different geographic locations in Pakistan and formed a deep hierarchical structure, indicating extensive and persistent nationwide transmission of the virus that was probably attributable to a signature mutation of this cluster (G8371T in ORF1ab). Furthermore, 28 putative international introductions were identified, several of which are consistent with the epidemiological investigations. In all, this study has inferred the possible pathways of introduction and transmission of SARS-CoV-2 in Pakistan, which could aid ongoing and future viral surveillance and COVID-19 control.

Article activity feed

SciScore for 10.1101/2021.06.24.21255875: (What is this?)

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Reads with high mapping quality (MQ > 25) were retained by SAMtools(14), and duplicated reads were marked with MarkDuplicates package in Genome Analysis Toolkit (GATK)(15).	Genome Analysis Toolkit suggested: None
Genomic variants were identified using uniquely mapped reads by HaplotypeCaller package in GATK.	HaplotypeCaller suggested: None GATK suggested: (GATK, RRID:SCR_001876)
Detection of intra-host variations: To identify the intra-host variant, mpileup files were generated by samtools v1.8 and then parsed by VarScan v2.3.9 along with an in-house script to identify intra-host variants.	samtools suggested: (SAMTOOLS, RRID:SCR_002105) VarScan suggested: (VARSCAN, RRID:SCR_006849)
All intra-host variants identified had to satisfy the following criteria: (1) sequencing depth ≥ 100, (2) minor allele frequency ≥ 5%, (3) minor allele frequency ≥ 2% on each strand, (4) minor allele counts ≥ 10 on each strand, (5) strand bias of the minor allele < 10, (6) minor allele was supported by the inner part of the read (excluding 10 base pairs on each end), and (7) minor allele was supported by ≥ 10 reads that mapped exclusively to the genome of Betacoronavirus by Kraken v2.0.8-beta on each strand.	Kraken suggested: (Kraken, RRID:SCR_005484)
Multiple sequence alignment was performed with MUSCLE v 3.8.31(20), and the UTR sequences of all sequences were truncated based on nucleotide coordinates of the reference genome (GenBank: MN908947.3)(12).	MUSCLE suggested: (MUSCLE, RRID:SCR_011812)
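
The intra-host variant criteria quoted in the table above can be read as a sequence of threshold checks on per-site read counts. The following is only a minimal sketch of that reading, not the authors' in-house script: the `SiteCounts` fields and the function names are hypothetical, the strand-bias statistic is assumed to be the ratio of the larger to the smaller per-strand minor-allele count (the quoted sentence does not define it), and criterion 6 is assumed to require at least one supporting read outside the terminal 10 bp.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Per-site read counts. All field names are hypothetical."""
    depth: int               # total reads covering the site
    depth_fwd: int           # reads covering the site, forward strand
    depth_rev: int           # reads covering the site, reverse strand
    minor_fwd: int           # minor-allele reads, forward strand
    minor_rev: int           # minor-allele reads, reverse strand
    minor_inner: int         # minor-allele reads with the site >10 bp from both read ends
    minor_betacov_fwd: int   # minor-allele reads classified as Betacoronavirus, forward
    minor_betacov_rev: int   # minor-allele reads classified as Betacoronavirus, reverse


def strand_bias(s: SiteCounts) -> float:
    """Assumed definition: larger / smaller per-strand minor-allele count."""
    low, high = sorted((s.minor_fwd, s.minor_rev))
    return float("inf") if low == 0 else high / low


def passes_isnv_filters(s: SiteCounts) -> bool:
    """Apply criteria (1)-(7) in order; return False at the first failure."""
    minor = s.minor_fwd + s.minor_rev
    if s.depth < 100:                                    # (1) depth >= 100
        return False
    if minor / s.depth < 0.05:                           # (2) overall MAF >= 5%
        return False
    if s.depth_fwd == 0 or s.depth_rev == 0:             # guard against division by zero
        return False
    if (s.minor_fwd / s.depth_fwd < 0.02
            or s.minor_rev / s.depth_rev < 0.02):        # (3) MAF >= 2% on each strand
        return False
    if s.minor_fwd < 10 or s.minor_rev < 10:             # (4) >= 10 minor reads per strand
        return False
    if strand_bias(s) >= 10:                             # (5) strand bias < 10 (assumed ratio)
        return False
    if s.minor_inner < 1:                                # (6) support away from read ends (assumed >= 1)
        return False
    if s.minor_betacov_fwd < 10 or s.minor_betacov_rev < 10:  # (7) Betacoronavirus-only reads
        return False
    return True


good = SiteCounts(400, 200, 200, 20, 20, 30, 18, 19)
shallow = SiteCounts(80, 40, 40, 20, 20, 30, 18, 19)
biased = SiteCounts(400, 200, 200, 150, 10, 100, 100, 10)
```

Under these assumptions `good` passes every check, `shallow` fails on depth, and `biased` fails on the strand-bias ratio (150 / 10 = 15). Ordering the checks from cheapest to most specific keeps the per-strand divisions behind the depth guard.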

Results from OddPub: Thank you for sharing your data.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.

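Related articles

Genomic characterization of SARS-CoV-2 variants circulating in the population of Bangui, Central African Republic (CAR) in 2022.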

Pulchérie Pelembi
Philippe Colson
Alain Farra
Ornella Anne Sibiro-Demi
Christian Noël Malaka
Aurélia Kwasiborski
Véronique Hourdel
Gilles Landry Ngaya
Romaric Nzoumbou-Boko
Jean-Claude Manuguerra
Emmanuel Ryvalin Nakoune-Yandoko
Guy VERNET
Bernard La Scola
Valérie Caro
Alexandre Manirakiza

DIVERSITY AND CLINICAL CORRELATIONS OF SARS-CoV-2 VARIANT DURING THE INTRODUCTION OF THE DELTA VARIANT IN GUATEMALA

Claudia Carranza
Lucia Ortiz
Maria Eugenia Castellanos
Ana Silvia Gonzalez-Reiche
Renata Mendizabal-Cabrera
Zain Khalil
Adriana van DeGuchte
Keith Farrugia
Mariana Herrera
Ernesto Mena
Celia Cordon-Rosales
Harm van Bakel
Daniel R. Perez

Reviewed by Access Microbiology

Overview of SARS-CoV-2 Genomic Surveillance in Central America and the Dominican Republic from February 2020 to January 2023: The Impact of PAHO and COMISCA's Collaborative Efforts

Sofia Herrera Agüero
Aldo Sosa
Alexander Martínez
Ambar Moreno
César Roberto Conde Pereira
Claudia Gonzalez
Claudio Soto Garita
Daniel Ulate
Estela Cordero-Laurent
Hebleen Brenes
Isaac Miguel Sánchez
Jairo Mendez-Rico
Jessica Góndola
Jose Arturo Molina-Mora
Juliana Leite
Leticia Franco
Linda Mendoza
Lionel Gresh
Lucia De La Cruz
Mitzi Castro Paz
Monica Barahona
Naomi Iihoshi
Oris Chavarria
Priscila Born
Ruby Melany Aguillón
Ruth Carolina Vasquez Cordova
Selene Gonzalez
Sofia Carolina Alvarado Silva
Xochitl Sandoval López
Yvonne Imbert
Francisco Duarte-Martínez
