Computational Analysis of SARS-CoV-2 and SARS-Like Coronavirus Diversity in Human, Bat and Pangolin Populations

Abstract

In 2019, a novel coronavirus, SARS-CoV-2/nCoV-19, emerged in Wuhan, China, and has been responsible for the current COVID-19 pandemic. The evolutionary origins of the virus remain elusive, and understanding its complex mutational signatures could guide vaccine design and development. As part of the international “CoronaHack” in April 2020, we employed a collection of contemporary methodologies to compare the genomic sequences of coronaviruses isolated from human (SARS-CoV-2; n = 163), bat (bat-CoV; n = 215) and pangolin (pangolin-CoV; n = 7) hosts, all available in public repositories. We also noted that the pangolin-CoV isolate MP789 bears a stronger resemblance to SARS-CoV-2 than other pangolin-CoV isolates do. Following de novo gene annotation prediction, we undertook gene–gene similarity network, codon usage bias and variant discovery analyses. Strong host-associated divergences were noted in ORF3a, ORF6, ORF7a, ORF8 and S, and in codon usage bias profiles. Last, we characterised several high-impact variants (in-frame insertion/deletion or stop gain) in bat-CoV and pangolin-CoV populations, some of which occur at the same amino acid position and may highlight loci of potential functional relevance.

SciScore for 10.1101/2020.11.24.391763:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement	not detected.
Randomization	not detected.
Blinding	not detected.
Power Analysis	not detected.
Sex as a biological variable	not detected.

Table 2: Resources

Software and Algorithms
Sentences	Resources
In this regard, for defining genes, we first employed PROKKA (Rapid Prokaryotic Genome Annotation) to curate the genes for each of the coronavirus genomes.	PROKKA suggested: (Prokka, RRID:SCR_014732)
Prodigal is an unsupervised ab initio prediction method and therefore does not rely on previous knowledge to predict ORFs, which, unlike sequence homology based tools such as BLAST, does not require previously annotated sequence data to identify potential genes within novel genomes.	Prodigal suggested: (Prodigal, RRID:SCR_011936)
However, to overcome the limitations and intricacies of contemporary ab initio genome annotation techniques, BLAST was used to identify additional genes with strong homology to those present in the SARS-CoV-2 reference genome released by Ensembl v100 (SARS-CoV-2 ref) ASM985889v3 [15](https://covid-19.ensembl.org).	Ensembl suggested: (Ensembl, RRID:SCR_002344)
Clustal Omega 1.2.4 [70] was used to perform a multiple sequence alignment for each of the genomes with default parameters.	Clustal Omega suggested: (Clustal Omega, RRID:SCR_001591)
The phylogenetic tree was inferred from the multiple sequence alignment with RAxML [71] using default parameters apart from the GTRGAMMA option and bootstrapping set to 20.	RAxML suggested: (RAxML, RRID:SCR_006086)
Gene sequence outputs of the PROKKA and BLAST searches (where the correct frame was present) were collated and BLAST searched against the SARS-CoV-2 ref genes; genes that returned a BLAST result were included and annotated with the corresponding SARS-CoV-2 gene.	BLAST suggested: (BLASTX, RRID:SCR_001653)
Custom Python scripts (available on GitHub: https://github.com/coronahack2020/final_paper.git) were used to summarise the frequencies of each of the codons.	Python suggested: (IPython, RRID:SCR_001658)
Haplotype-aware variant consequences were generated using VEP (Variant Effect Predictor) [78] [79] and BCFtools/csq [80].	Variant Effect Predictor suggested: None
After trimming the raw reads using Trimmomatic v.	Trimmomatic suggested: (Trimmomatic, RRID:SCR_011848)
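
The codon-frequency step quoted in the table above can be illustrated with a minimal sketch. This is not the authors' script from the linked repository; the function name `codon_frequencies`, the handling of ambiguous bases and the toy sequence are all assumptions made for illustration. It splits an in-frame coding sequence into codons and reports each codon's share of the total.

```python
from collections import Counter


def codon_frequencies(cds):
    """Return the relative frequency of each codon in an in-frame coding sequence.

    Hypothetical helper for illustration only; not taken from the paper's code.
    """
    # Normalise case and treat RNA input (U) as DNA (T).
    seq = cds.upper().replace("U", "T")
    # Ignore any trailing bases that do not form a complete codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Assumption: codons containing ambiguous bases (e.g. N) are skipped.
    codons = [c for c in codons if set(c) <= set("ACGT")]
    counts = Counter(codons)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in sorted(counts.items())}


# Toy sequence (made up): ATG GCT GCT GCA TAA
example = codon_frequencies("ATGGCTGCTGCATAA")
for codon, freq in example.items():
    print(codon, round(freq, 2))
```

Comparing such per-gene frequency tables across human, bat and pangolin isolates is one simple way to look for the host-associated codon usage differences the abstract describes; a fuller analysis would normally normalise by synonymous codon family.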

Results from OddPub: Thank you for sharing your code and data.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
