Common low complexity regions for SARS-CoV-2 and human proteomes as potential multidirectional risk factor in vaccine development
Listed in
- Evaluated articles (ScreenIT)
Abstract
The rapid spread of COVID-19 demands an immediate response from the scientific community. Appropriate countermeasures require a thoughtful and educated choice of viral targets (epitopes). Several articles discuss such choices in the SARS-CoV-2 proteome; others focus on phylogenetic traits and the history of the Coronaviridae genome/proteome. However, none consider the low complexity regions (LCRs) of viral proteins. We recently created the first methods able to compare such fragments. We show that five LCRs in three proteins (nsp3, S and N) encoded by the SARS-CoV-2 genome are highly similar to regions from the human proteome. As many as 21 predicted T-cell epitopes and 27 predicted B-cell epitopes overlap with the five SARS-CoV-2 LCRs that are similar to human proteins. Interestingly, replication proteins encoded in the central part of the viral RNA are devoid of LCRs. The similarity of SARS-CoV-2 LCRs to human proteins may have implications for the ability of the virus to counteract immune defenses. Vaccines that target these LCRs may be ineffective or, alternatively, may lead to the development of autoimmune diseases. These findings are crucial to the selection of new epitopes for drugs or vaccines, which should omit such regions.
Author summary
The outbreak of COVID-19 affects people all over the globe. More and more people get sick, and many die, because of the deadly SARS-CoV-2 virus. The whole machinery of this pathogen is encoded in a short sequence of nucleotides, the building blocks of both RNA and DNA strands. This RNA virus encodes fewer than 30 protein sequences that change the fate of our societies. Its proteins are composed of 20 amino acids (building bricks) that proteins usually use quite freely. However, there are fragments in which only one or a few amino acids are used. We call these low complexity regions (LCRs). We invented the first programmes able to compare such LCRs. Using this new methodology, we were able to show the similarity of some viral proteins to human ones. This discovery has serious implications for the design of vaccines and drugs. It means that companies should not use these LCRs as targets, because doing so may trigger an autoimmune disease. On the other hand, this specific similarity may suggest that viral proteins are in some way disguised within the machinery of human cells.
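To make the idea of a low complexity region concrete, the sketch below flags stretches of a protein sequence in which only a few amino acids are used, by measuring the Shannon entropy of sliding windows. It is a generic illustration, not the method used in the preprint; the function names, the window length of 12 and the entropy cutoff of 1.5 bits are all assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(fragment: str) -> float:
    """Shannon entropy (in bits) of the amino acid composition of a fragment."""
    counts = Counter(fragment)
    total = len(fragment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_windows(sequence: str, window: int = 12, max_entropy: float = 1.5):
    """Return merged half-open (start, end) spans built from low-entropy windows.

    The defaults for window and max_entropy are illustrative assumptions,
    not parameters taken from the preprint.
    """
    spans = []
    for start in range(len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) <= max_entropy:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Overlapping or adjacent to the previous span: extend it.
                spans[-1] = (spans[-1][0], max(spans[-1][1], end))
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    # Toy sequence (made up): a serine run between two diverse flanks.
    toy = "ACDEFGHIKLMN" + "S" * 12 + "PQRTVWYACDEF"
    print(low_complexity_windows(toy))
```

A window made of a single repeated residue has zero entropy, while a window of 12 different residues reaches about 3.6 bits, so a lower cutoff flags only stretches dominated by one or a few amino acids.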
Article activity feed
SciScore for 10.1101/2020.08.11.245993:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
| Sentences | Resources |
|---|---|
| SARS-CoV-2 protein sequences: All full-length protein sequences of the SARS-CoV-2 proteome were retrieved on 28 April 2020 from the ViralZone web portal (https://viralzone.expasy.org/8996) which provides pre-release access to the SARS Coronavirus 2 protein sequences in UniProt. | ViralZone, suggested: (ViralZone, RRID:SCR_006563); UniProt, suggested: (UniProtKB, RRID:SCR_004426) |
| Repeat is defined as at least 3 times the occurrence of a specific amino acid pattern. | Repeat, suggested: (ProRepeat, RRID:SCR_006113) |
| To annotate human proteins with their corresponding GO terms from Biological Process, Molecular Function and Cellular Component namespaces we used BiomaRt R package [102]. | BiomaRt, suggested: (biomaRt, RRID:SCR_019214) |
| Statistical analysis was performed with topGO R package [103] and to assess overrepresentation of GO term annotations in obtained clusters we applied hypergeometric test with false discovery Benjamin-Hochberg multiple testing correction with adjusted p-value cutoff 5%. | topGO, suggested: (topGO, RRID:SCR_014798) |
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
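The statistical sentence quoted in Table 2 describes a hypergeometric over-representation test followed by Benjamini-Hochberg correction. The sketch below shows the arithmetic behind those two steps using only the Python standard library. It is not the topGO implementation that the manuscript reports using; the function names and all counts in the usage example are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance of seeing at least k annotated proteins in a cluster
    of size n, when K of the N background proteins carry the GO term."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts: (hits in cluster, cluster size, term size, background).
    tests = [(8, 40, 100, 5000), (2, 40, 300, 5000), (5, 40, 60, 5000)]
    raw = [hypergeom_pvalue(*t) for t in tests]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        # A term counts as over-represented here when the adjusted p-value is below 5%.
        print(f"raw={p:.3g} adjusted={q:.3g} significant={q < 0.05}")
```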
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- No funding statement was detected.
- No protocol registration statement was detected.
