Genome-wide identification and prediction of SARS-CoV-2 mutations show an abundance of variants: Integrated study of bioinformatics and deep neural learning

Md. Shahadat Hossain
A. Q. M. Sala Uddin Pathan
Md. Nur Islam
Mahafujul Islam Quadery Tonmoy
Mahmudul Islam Rakib
Md. Adnan Munim
Otun Saha
Atqiya Fariha
Hasan Al Reza
Maitreyee Roy
Newaz Mohammed Bahadur
Md. Mizanur Rahaman

Abstract

Genomic data analysis is fundamental to monitoring pathogen evolution and outbreaks of infectious disease. Using bioinformatics and deep learning, this study was designed to identify the genomic variability of SARS-CoV-2 worldwide and to predict future mutation rates. Analysis of 259044 SARS-CoV-2 isolates identified 3334545 mutations (14.01 mutations per isolate), suggesting a high mutation rate. Strains from India showed the highest number of mutations (48), followed by Scotland, USA, Netherlands, Norway, and France, with up to 36 mutations. Besides the most prominent mutations (D416G, F106F, P314L, and UTR:C241T), we identified L93L, A222V, A199A, V30L, and A220V, which are among the 10 most frequent mutations. The multi-nucleotide mutations GGG>AAC, CC>TT, TG>CA, and AT>TA also appeared in our analysis and are among the 20 most frequent mutations. Future mutation rate analysis predicts increases of 17%, 7%, and 3% for C>T, A>G, and A>T, respectively. Conversely, decreases of 7%, 7%, and 6% are estimated for T>C, G>A, and G>T mutations, respectively. T>G/A, C>G/A, and A>T/C are not anticipated in the future. Since SARS-CoV-2 is evolving continuously, our findings will facilitate the tracking of mutations and help to map the progression of COVID-19 intensity worldwide.
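The summary statistics in the abstract (mutations per isolate, the most frequent mutation types, and the share of each substitution class) are simple tallies over per-isolate variant calls. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: the `calls` dictionary, the isolate names, and the helper functions `tally` and `shares` are hypothetical, and the toy data are invented purely for illustration.

```python
from collections import Counter

# Hypothetical toy input (not data from the study):
# isolate ID -> list of (reference bases, observed bases) changes.
calls = {
    "isolate_A": [("C", "T"), ("A", "G"), ("C", "T"), ("GGG", "AAC")],
    "isolate_B": [("C", "T"), ("G", "T")],
    "isolate_C": [("C", "T"), ("A", "G"), ("CC", "TT")],
}


def tally(calls):
    """Return (mean mutations per isolate, Counter of 'REF>ALT' types)."""
    types = Counter(
        f"{ref}>{alt}" for muts in calls.values() for ref, alt in muts
    )
    total = sum(types.values())
    return total / len(calls), types


def shares(types):
    """Percentage share of each single-nucleotide substitution type.

    Keys such as 'C>T' have length 3; longer keys such as 'GGG>AAC'
    are multi-nucleotide events and are excluded from the shares.
    """
    snv = {k: v for k, v in types.items() if len(k) == 3}
    n = sum(snv.values())
    return {k: 100 * v / n for k, v in snv.items()}


mean_per_isolate, types = tally(calls)
print(f"mean mutations per isolate: {mean_per_isolate:.2f}")
print("most frequent types:", types.most_common(3))
print("SNV shares (%):", shares(types))
```

Tracking such shares across sampling dates would give the kind of time series a forecasting model could be trained on; how the authors actually built their prediction model is not described in this abstract.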

SciScore for 10.1101/2021.05.23.445341:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Alignment of total 259044 SARS-CoV-2 genome sequences was done by using NUCMER v3.1 algorithm 28 where NC_045512.2 was considered as reference sequence.	NUCMER suggested: None
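
To illustrate what reference-based comparison yields, the sketch below lists the substitutions between a reference and a query sequence and merges adjacent mismatches into single multi-nucleotide events. This is a deliberately simplified stand-in, not NUCMER: it assumes the two sequences are already aligned, have equal length, and contain no gaps, whereas NUCMER performs the whole-genome alignment itself. The function name `call_substitutions` and the toy sequences are hypothetical.

```python
def call_substitutions(ref, qry):
    """List (1-based position, ref bases, query bases) for each mismatch run.

    Assumes ref and qry are pre-aligned, equal-length, gap-free strings.
    Adjacent mismatching positions are merged into one event, so a run
    such as CC -> TT is reported once rather than as two C>T changes.
    """
    if len(ref) != len(qry):
        raise ValueError("sequences must be pre-aligned to equal length")
    events = []
    start = None  # index where the current mismatch run began, if any
    for i, (r, q) in enumerate(zip(ref, qry)):
        if r != q:
            if start is None:
                start = i
        elif start is not None:
            events.append((start + 1, ref[start:i], qry[start:i]))
            start = None
    if start is not None:  # a mismatch run extends to the end of the sequence
        events.append((start + 1, ref[start:], qry[start:]))
    return events


# Toy sequences invented for illustration only.
reference = "ACGGGTCCA"
query = "ATGGGTTTA"
events = call_substitutions(reference, query)
for pos, r, q in events:
    print(f"{pos}\t{r}>{q}")
```

Insertions and deletions, ambiguous bases, and alignment itself are outside the scope of this sketch.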

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
No funding statement was detected.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.

Version published to 10.1101/2021.05.23.445341 on bioRxiv: May 24, 2021
Version published to 10.1016/j.imu.2021.100798: Jan 1, 2021
