Sequence Analysis for SNP Detection and Phylogenetic Reconstruction of SARS-CoV-2 Isolated from Nigerian COVID-19 Cases

Abstract

Background

Coronaviruses are a group of viruses that belong to the family Coronaviridae, which includes the genus Betacoronavirus. In December 2019, a new coronavirus disease (COVID-19) characterized by severe respiratory symptoms was discovered. The causative pathogen was a novel coronavirus known first as 2019-nCoV and later as SARS-CoV-2. Within two months of its discovery, COVID-19 became a pandemic causing widespread morbidity and mortality.

Methodology

Whole genome sequence data of SARS-CoV-2 isolated from Nigerian COVID-19 cases were downloaded from the GISAID database. A total of 18 sequences that satisfied quality assurance (length ≥ 29700 nts and number of unknown bases denoted as ‘N’ ≤ 5%) were used for the study. Multiple sequence alignment (MSA) was done in MAFFT (Version 7.471), SNP calling was implemented in DnaSP (Version 6.12.03), and the results were then visualized in Jalview (Version 2.11.1.0). Phylogenetic analysis was performed with MEGA X software.
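
The quality assurance criteria above can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the function names `passes_qc` and `filter_genomes` are hypothetical, the toy sequences are made up, and the sketch assumes the 5% limit is measured against each sequence's own length.

```python
def passes_qc(seq, min_length=29700, max_n_fraction=0.05):
    """Return True if a genome meets the length and unknown-base thresholds.

    Assumption: the 'N' fraction is taken relative to the sequence's own length.
    """
    seq = seq.upper()
    if len(seq) < min_length:
        return False
    return seq.count("N") / len(seq) <= max_n_fraction


def filter_genomes(genomes):
    """Keep only the named sequences that pass both QC thresholds."""
    return {name: seq for name, seq in genomes.items() if passes_qc(seq)}


# Toy inputs (not real genomes): one acceptable, one too short, one with too many Ns.
toy_genomes = {
    "good": "ACGT" * 7500,                   # 30,000 nts, no N
    "short": "ACGT" * 100,                   # 400 nts
    "too_many_n": "N" * 2000 + "A" * 28000,  # 30,000 nts, about 6.7% N
}
kept = filter_genomes(toy_genomes)
print(sorted(kept))
```

In practice the sequences would be read from the downloaded FASTA file before being passed to such a filter.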

Results

Nigerian SARS-CoV-2 genomes had 99.9% genomic similarity, with four large conserved genomic regions. A total of 66 SNPs were identified, of which 31 were informative. Nucleotide diversity assessment gave Pi = 0.00048 and an average SNP frequency of 2.22 SNPs per 1000 nts. Non-coding genomic regions, particularly the 5’UTR and 3’UTR, had SNP densities of 3.77 and 35.4 respectively. The coding region with the highest SNP density was ORF10, with a frequency of 8.55 SNPs/1000 nts. The majority (72.2%) of viruses in Nigeria are of the L lineage, with a preponderance of the D614G mutation, which accounted for 11 (61.1%) of the 18 viral sequences. Nigerian SARS-CoV-2 sequences formed 3 major clades, namely Oyo, Ekiti and Osun, on a maximum likelihood phylogenetic tree.
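
The summary figures above follow from simple ratios, sketched below. Two inputs are assumptions rather than values stated in the abstract: the alignment length is taken to be 29,700 nts (the minimum length threshold from the Methodology), and the L lineage count of 13 is inferred from 72.2% of 18 sequences. The helper names are hypothetical.

```python
def snps_per_kb(snp_count, region_length_nt):
    """SNP density expressed as SNPs per 1000 nucleotides."""
    return snp_count / region_length_nt * 1000


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return part / whole * 100


ASSUMED_ALIGNMENT_LENGTH = 29700  # assumption: not reported in the abstract
ASSUMED_L_LINEAGE_COUNT = 13      # assumption: inferred from 72.2% of 18

density = round(snps_per_kb(66, ASSUMED_ALIGNMENT_LENGTH), 2)
l_lineage_share = round(percent(ASSUMED_L_LINEAGE_COUNT, 18), 1)
d614g_share = round(percent(11, 18), 1)

print(density)          # 2.22
print(l_lineage_share)  # 72.2
print(d614g_share)      # 61.1
```

Under these assumptions the reported 2.22 SNPs per 1000 nts, 72.2% and 61.1% are all reproduced.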

Conclusion and Recommendation

Nigerian SARS-CoV-2 reveals a high mutation rate together with a preponderance of L lineage and D614G mutants. The implications of these mutations for SARS-CoV-2 virulence and the need for more aggressive testing and treatment of COVID-19 in Nigeria are discussed. Additionally, attempts to produce testing kits for COVID-19 in Nigeria should consider the conserved regions identified in this study. Strict adherence to COVID-19 preventive measures is recommended in view of the phylogenetic clustering pattern of Nigerian SARS-CoV-2, which suggests intensive community transmission, possibly rooted in the communal culture characteristic of many ethnicities in Nigeria.

Article activity feed

  1. SciScore for 10.1101/2020.09.25.310078:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    Sentence: The genomes were initially aligned with MAUVE to check for large scale genomic changes including large deletions, gene inversion, and genome rearrangements.
    Resources:
      MAUVE, suggested: (Mauve, RRID:SCR_012852)

    Sentence: Then, the sequences were re-aligned in MAFFT (Fig.1) to produce aligned sequences that were fed into DnaSP for SNP and haplotype analysis and subsequently into Jalview 2.11.1.0 for visualization and automatic determination of allelic frequency of SNPs.
    Resources:
      DnaSP, suggested: (DnaSP, RRID:SCR_003067)
      Jalview, suggested: (Jalview, RRID:SCR_006459)

    Sentence: Phylogenetic Analysis: Maximum likelihood phylogenetic tree construction was implemented in MEGA X using sequences that had been aligned by MAFFT employing Tamura-Nei evolutionary model under assumption of uniform nucleotide substitution [21, 22].
    Resources:
      MEGA, suggested: (Mega BLAST, RRID:SCR_011920)
      MAFFT, suggested: (MAFFT, RRID:SCR_011811)
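
The SNP analysis described in the sentences above (variable sites, informative sites and nucleotide diversity from an aligned set of sequences) can be illustrated with a short sketch. This is not DnaSP's implementation: the function names are hypothetical, the toy alignment is made up, and the sketch assumes that any column containing a gap or ambiguous base is skipped. A site is counted as informative when at least two different bases each occur in at least two sequences.

```python
from collections import Counter
from itertools import combinations

VALID_BASES = set("ACGT")


def usable_columns(alignment):
    """Column indices where every sequence has an unambiguous base.

    Assumption: columns with gaps or ambiguous bases are excluded entirely.
    """
    return [
        col for col in range(len(alignment[0]))
        if all(seq[col].upper() in VALID_BASES for seq in alignment)
    ]


def classify_sites(alignment):
    """Return (variable, informative) lists of 0-based column indices."""
    variable, informative = [], []
    for col in usable_columns(alignment):
        counts = Counter(seq[col].upper() for seq in alignment)
        if len(counts) > 1:
            variable.append(col)
            # Informative: at least two bases each seen in two or more sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative.append(col)
    return variable, informative


def nucleotide_diversity(alignment):
    """Average number of pairwise differences per usable site (Pi)."""
    cols = usable_columns(alignment)
    pairs = list(combinations(alignment, 2))
    if not cols or not pairs:
        return 0.0
    diffs = sum(
        sum(a[c].upper() != b[c].upper() for c in cols) for a, b in pairs
    )
    return diffs / (len(pairs) * len(cols))


# Toy alignment of four equal-length sequences (not real data).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "TCCTAC-T"]
variable_sites, informative_sites = classify_sites(toy_alignment)
pi = nucleotide_diversity(toy_alignment)
print(variable_sites, informative_sites, round(pi, 4))
```

For the toy alignment, column 0 is a singleton (variable but not informative), column 6 is skipped because of the gap, and columns 2 and 7 are informative.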

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    A limitation of this study is the small sample size of eighteen genome sequences used in this study, which we consider too small for detection of all SARS-CoV-2 SNPs in Nigeria. In subsequent studies, it will be of interest to see if distribution of SNPs and conserved regions identified in this study are peculiar to Nigerian SARS-CoV-2 or have complete overlap with genomes of SARS-CoV-2 found elsewhere. If there are regions of non-overlap, any attempt to produce testing kits with high sensitivity for Nigeria’s COVID-19 cases should take note of the conserved regions and SNP distribution in the genomes of SARS-CoV-2 detected in this study. The frequency of SNPs observed in the Nigerian SARS-CoV-2 genome is higher than twice the frequency of SNPs in human genome which is generally taken to be 1 SNP per 1000 bps. The UTRs, especially 3’UTR, are mutation hot spots in Nigerian SARS-CoV-2 genome. This is expected because UTRs, unlike coding sequences, are generally under more relaxed or neutral selection pressure which allow mutations to accumulate at a higher rate in that region [25]. Relatively high SNP densities were also recorded in some of the coding regions of Nigerian SARS-CoV-2. An example is ORF 10 region with a SNP density of 8.55 SNPs/1000 nts. Although Coronaviridae have proofreading capability due to their exonuclease activity during nucleotide replication, mutation rate of SARS-CoV-2 was estimated at ∼6 × 10⁻⁴ nucleotides/genome/year with the capacity to mutate durin...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.