Whole genome sequence analysis showing unique SARS-CoV-2 lineages of B.1.524 and AU.2 in Malaysia

Abstract

SARS-CoV-2 has spread throughout the world since its discovery in China, and Malaysia is no exception. Whole genome sequencing (WGS) has been a crucial approach in studying the evolution and genetic diversity of SARS-CoV-2 in the ongoing pandemic. Although a considerable number of SARS-CoV-2 genome sequences have been submitted to the GISAID and NCBI databases, data from Malaysia remain scarce. This study aims to report new Malaysian lineages of the virus responsible for the sustained spikes in COVID-19 cases during the third wave of the pandemic. Patients whose nasopharyngeal and/or oropharyngeal swabs were confirmed COVID-19 positive by real-time RT-PCR with a Ct value < 25 were chosen for WGS. The selected SARS-CoV-2 isolates were then sequenced, characterized and analyzed along with 986 sequences of the dominant lineages of D614G variants currently circulating throughout Malaysia. The prevalence of clades GH and G provided strong grounds for the presence of two Malaysian lineages, AU.2 and B.1.524, that have caused sustained spikes of cases in the country. Statistical analysis revealed a significant association of gender and age group with the Malaysian lineages (p < 0.05). Phylogenetic analysis revealed dispersion of 41 lineages, of which 22 are still active. Mutational analysis showed the presence of a unique G1223C missense mutation in the transmembrane domain of the spike protein. For a better understanding of SARS-CoV-2 evolution in Malaysia, especially with reference to the reported lineages, large-scale studies based on WGS are warranted.
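
The substitution notation used in the abstract (G1223C: glycine at spike position 1223 replaced by cysteine) can be illustrated with a minimal Python sketch. This is not the authors' pipeline; the function name `call_substitutions`, the five-residue toy fragment, and its flanking residues are hypothetical placeholders chosen only so that the third residue falls at position 1223.

```python
def call_substitutions(reference, query, start=1):
    """List amino-acid substitutions between two aligned protein sequences.

    reference, query: aligned strings of equal length ('-' marks a gap).
    start: position of the first reference residue in the full protein.
    Returns labels such as 'G1223C' (reference residue, position, new residue).
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")

    substitutions = []
    position = start - 1
    for ref_aa, query_aa in zip(reference, query):
        if ref_aa == "-":
            # Insertion relative to the reference: no reference position to report.
            continue
        position += 1
        if query_aa in ("-", "X"):
            # Deletion or ambiguous residue: not a substitution.
            continue
        if ref_aa != query_aa:
            substitutions.append(f"{ref_aa}{position}{query_aa}")
    return substitutions


if __name__ == "__main__":
    # Toy fragment only: flanking residues are placeholders, not the real spike sequence.
    toy_reference = "LIGVA"
    toy_isolate = "LICVA"
    print(call_substitutions(toy_reference, toy_isolate, start=1221))
```

A dedicated tool such as Nextclade additionally handles the alignment and codon translation steps, which this sketch assumes have already been done.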

Article activity feed

  1. SciScore for 10.1101/2021.08.11.21261902:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    To do this, 1356 viruses were analysed manually using Pivot table and the date was filtered to months and year in Excel.
    Pivot
    suggested: (Pivot, RRID:SCR_013999)
    Excel
    suggested: None
    The multiple sequence alignment was performed using DECIPHER [26] and SeqinR [27] packages in R version 4.0.2 and finalized using MEGA X 11 [28].
    DECIPHER
    suggested: (DECIPHER, RRID:SCR_006552)
    MEGA
    suggested: (Mega BLAST, RRID:SCR_011920)
    Mutation analysis via computational prediction tools: Mutation analysis were analyzed using Nextclade v.1.5.2, a web-based analysis server (https://clades.nextstrain.org) by comparing against a wild-type of Wuhan-Hu-1 (NC_045512.2).
    Mutation
    suggested: (mutationSeq, RRID:SCR_006815)
    Next, the potential pathogenicity effect of the amino acid substitution on TM domain biological function was investigated by uploading a 3D structure of TM domain PDB ID: 7LC8 and TM domain amino acid sequence onto mCSM-membrane [31] and uploading TM amino acid sequence onto Protein Variation Effect Analyzer (PROVEAN) [32] and SNAP 2 tools [33]; the web-based servers for predicting the effect of mutations on the biological function of a protein.
    PROVEAN
    suggested: (PROVEAN, RRID:SCR_002182)
    SNAP
    suggested: (SNAP, RRID:SCR_007936)
    Chi-square test were carried out using IBM SPSS v25.0 to testing the statistical significance association of gender, patient status and age groups with Malaysian lineages.
    SPSS
    suggested: (SPSS, RRID:SCR_002865)
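
The chi-square test of association quoted in the last entry above can be sketched in plain Python. The study used IBM SPSS; the code below is only an illustration of the same kind of test, and the function name `chi_square_independence` and the 2x2 counts are hypothetical, not data from the manuscript. It assumes every row and column total is non-zero.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic for a contingency table of counts.

    table: list of rows, e.g. rows = gender, columns = lineage.
    Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


if __name__ == "__main__":
    # Hypothetical counts: rows = two gender groups, columns = two lineages.
    counts = [[30, 10],
              [20, 40]]
    stat, dof = chi_square_independence(counts)
    # 3.841 is the chi-square critical value for 1 degree of freedom at alpha = 0.05.
    print(f"chi2 = {stat:.3f}, df = {dof}, significant at 0.05: {stat > 3.841}")
```

For tables with more than one degree of freedom, the critical value changes, so a statistics package that reports the p-value directly is the more practical choice.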

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    This study has some limitations. First, the work on WGS in characterizing the circulating variants in Malaysia needed to be underscored systematically by representing Malaysian cases. Therefore, in order to success in combating the spread of COVID-19 in Malaysia, utilizing a viral genomics sequencing is critical to be used as a key tool for understanding the spread of COVID-19. By integrating viral genomics with epidemiological and modelling data, local transmission chains and regional spread were able to be tracked and audited in real time [64]. This strategy was proven to curb the spread of COVID-19 in a developed country for example Australia [65] and New Zealand [64]. Second, lack of patient clinical status details deposited to GISAID database hampered the analysis of the impact of the distribution of each individual clades on the disease epidemiology locally. Therefore, specifying whether the virus samples were collected from asymptomatic or mild symptoms, to severe or deceased might help to identify the prevalence of each major clade and lineage frequently detected. We also discovered a plethora of unclear entries that offer very little information about the real source of a sample. All of these issues can affect the effectiveness and accuracy of association studies. We therefore advocate for SARS-CoV-2 genomic data providers to comprehensively when submitting metadata, and encourage genomic database maintainers to be aware of potential errors in incoming samples and to...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.