SARS-CoV-2 ORF8 sequence conservation and mutational analysis — insight into the influence of dataset size on identifying top mutations
Abstract
The rapid mutation of SARS-CoV-2 has been a major source of concern throughout the COVID-19 pandemic. The ORF8 accessory protein is reported to have undergone many mutations, which makes it an intriguing target for investigating how these mutations might affect overall ORF8 activity. In this study, we performed conservation and mutational analysis on SARS-CoV-2 ORF8 protein sequences to identify conserved and mutated residues. We also split the ORF8 sequence data into SARS-CoV-2 variant datasets to identify the top mutations in each variant. The conserved and mutated residues were visualised on the available structure of ORF8 to highlight sites that might hold biological significance. Finally, we investigated how sequence dataset size affects the capture of top mutations following multiple sequence alignment.
Author Summary
The COVID-19 pandemic was caused by the SARS-CoV-2 virus, which changes over time, i.e., it mutates, giving rise to different variants. The ORF8 accessory protein encoded by the SARS-CoV-2 genome is known to undergo these changes frequently. In our study, we used SARS-CoV-2 ORF8 protein sequences from various variants to identify mutations among them. We also identified sites that remain unchanged over time, a phenomenon known as conservation. We think that these unchanged and changed sites could be biologically important, and that studying them alongside existing experimental data will help in understanding how ORF8 interacts with partner proteins. Lastly, we examined how much sequence data is sufficient to identify the top mutated sites.
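To make the idea of "top mutations" and "conserved sites" concrete, the following is a minimal sketch of how they could be counted from already-aligned protein sequences, and how a random subsample can be compared against the full dataset. It is an illustration only and not the pipeline used in the study: the function names (`top_mutations`, `conserved_positions`, `subsample_top`), the gap and unknown-residue handling, and the toy sequences are all assumptions introduced here.

```python
import random
from collections import Counter

def top_mutations(aligned, reference, k=3):
    """Return the k most frequent substitutions relative to the reference.

    Assumes every sequence is aligned to the reference with equal length.
    Gaps ('-') and unknown residues ('X') are skipped (an assumption).
    """
    counts = Counter()
    for seq in aligned:
        for pos, (ref_aa, aa) in enumerate(zip(reference, seq), start=1):
            if aa != ref_aa and aa not in "-X":
                counts[f"{ref_aa}{pos}{aa}"] += 1
    return counts.most_common(k)

def conserved_positions(aligned, reference):
    """Return 1-based positions where every sequence matches the reference."""
    return [
        pos
        for pos, ref_aa in enumerate(reference, start=1)
        if all(seq[pos - 1] == ref_aa for seq in aligned)
    ]

def subsample_top(aligned, reference, size, k=3, seed=0):
    """Top mutations in a random subsample drawn without replacement."""
    rng = random.Random(seed)
    return top_mutations(rng.sample(aligned, size), reference, k)

# Toy, made-up aligned sequences (not real ORF8 data).
reference = "MKFLVF"
aligned = ["MKFLVF", "MKFSVF", "MKFSVF", "MKFLIF", "MKF-VF"]

print(top_mutations(aligned, reference))
print(conserved_positions(aligned, reference))
print(subsample_top(aligned, reference, size=3))
```

Calling `subsample_top` with increasing `size` and comparing each result with `top_mutations` on the full dataset gives a rough sense of how large a sample must be before the top-ranked mutations stabilise.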