Genome-wide identification and prediction of SARS-CoV-2 mutations show an abundance of variants: Integrated study of bioinformatics and deep neural learning
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Genomic data analysis is fundamental to monitoring pathogen evolution and outbreaks of infectious disease. Using bioinformatics and deep learning, this study was designed to identify the genomic variability of SARS-CoV-2 worldwide and to predict future mutation rates. Analysis of 259044 SARS-CoV-2 isolates identified 3334545 mutations (14.01 mutations per isolate), suggesting a high mutation rate. Strains from India showed the highest number of mutations (48), followed by strains from Scotland, the USA, the Netherlands, Norway, and France, which had up to 36 mutations. Besides the most prominent mutations (D416G, F106F, P314L, and UTR:C241T), we identify L93L, A222V, A199A, V30L, and A220V, which are among the 10 most frequent mutations. The multi-nucleotide mutations GGG>AAC, CC>TT, TG>CA, and AT>TA also appear in our analysis, among the 20 most frequent mutations. Mutation rate analysis predicts future increases of 17%, 7%, and 3% for C>T, A>G, and A>T, respectively. Conversely, decreases of 7%, 7%, and 6% are estimated for T>C, G>A, and G>T mutations, respectively. T>G/A, C>G/A, and A>T/C are not anticipated in the future. Since SARS-CoV-2 is evolving continuously, our findings will facilitate the tracking of mutations and help map the progression of COVID-19 intensity worldwide.
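To make the kind of tally summarized in the abstract concrete (substitution classes such as C>T, and mutations per isolate), here is a minimal Python sketch. It is not the authors' pipeline: the record layout `(isolate_id, ref, alt)`, the function name `tally_substitutions`, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def tally_substitutions(records, n_isolates):
    """Count substitution classes and the mean number of mutations per isolate.

    records: iterable of (isolate_id, ref, alt) tuples (hypothetical layout).
    n_isolates: total isolates analysed, including any with no mutations.
    """
    if n_isolates <= 0:
        raise ValueError("n_isolates must be positive")
    classes = Counter()
    total = 0
    for _isolate_id, ref, alt in records:
        # Label each event by its class, e.g. "C>T" or "GGG>AAC".
        classes[f"{ref}>{alt}"] += 1
        total += 1
    return classes, total / n_isolates


# Toy data, invented for illustration only (not taken from the study).
toy_records = [
    ("iso1", "C", "T"),
    ("iso1", "A", "G"),
    ("iso2", "C", "T"),
    ("iso2", "GGG", "AAC"),
]
classes, mean_per_isolate = tally_substitutions(toy_records, n_isolates=2)
print(classes.most_common(2))
print(mean_per_isolate)
```

Passing the isolate count separately, rather than inferring it from the records, keeps isolates with no called mutations in the denominator.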
Article activity feed
SciScore for 10.1101/2021.05.23.445341:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
Sentences: Alignment of total 259044 SARS-CoV-2 genome sequences was done by using NUCMER v3.1 algorithm 28 where NC_045512.2 was considered as reference sequence.
Resources: NUCMER (suggested: None)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- No funding statement was detected.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.