Common low complexity regions for SARS-CoV-2 and human proteomes as potential multidirectional risk factor in vaccine development

Abstract

The rapid spread of COVID-19 demands an immediate response from the scientific community. Appropriate countermeasures require a thoughtful and educated choice of viral targets (epitopes). Several articles discuss such choices in the SARS-CoV-2 proteome; others focus on phylogenetic traits and the history of the Coronaviridae genome/proteome. However, none consider the low complexity regions (LCRs) of viral proteins. We recently created the first methods able to compare such fragments. We show that five LCRs in three proteins (nsp3, S and N) encoded by the SARS-CoV-2 genome are highly similar to regions from the human proteome. As many as 21 predicted T-cell epitopes and 27 predicted B-cell epitopes overlap with the five SARS-CoV-2 LCRs that are similar to human proteins. Interestingly, replication proteins encoded in the central part of the viral RNA are devoid of LCRs. The similarity of SARS-CoV-2 LCRs to human proteins may have implications for the ability of the virus to counteract immune defenses. Vaccines targeting LCRs may be ineffective or, alternatively, may lead to the development of autoimmune diseases. These findings are crucial to the selection of new epitopes for drugs or vaccines, which should omit such regions.

Author summary

The outbreak of COVID-19 affects people all over the globe. More and more people get sick, and many die, because of the deadly SARS-CoV-2 virus. The whole machinery of this pathogen is contained in a short sequence of nucleotides, the building blocks of both RNA and DNA strands. This RNA virus encodes fewer than 30 protein sequences that change the fate of our societies. Its proteins are composed of 20 amino acids (building bricks), which proteins usually use quite freely. However, there are fragments where only one or a few amino acids are used. We call these low complexity regions (LCRs). We invented the first programmes able to compare such LCRs. Using this new methodology, we were able to show the similarity of some viral proteins to human ones. This discovery has serious implications for the design of vaccines and drugs. It means that companies should not use these LCRs as targets, because doing so may trigger an autoimmune disease. On the other hand, this specific similarity may suggest that viral proteins are in some way disguised within the machinery of human cells.
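
To make the notion of a low complexity region concrete, the following is a minimal sketch that flags stretches of a protein sequence dominated by one or a few amino acids, using Shannon entropy over a sliding window. It is not the authors' method, which this page does not describe; the function names, the window length of 12 and the entropy threshold of 2.2 bits are all illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the amino acid composition of a segment."""
    n = len(segment)
    counts = Counter(segment)
    # p * log2(1/p) summed over residues; 0.0 for a single-residue segment.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def find_low_complexity(seq: str, window: int = 12, threshold: float = 2.2):
    """Return merged half-open (start, end) intervals of low-entropy windows.

    `window` and `threshold` are hypothetical defaults chosen for illustration,
    not values taken from the preprint.
    """
    regions = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= threshold:
            if regions and i <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], i + window)
            else:
                regions.append((i, i + window))
    return regions


if __name__ == "__main__":
    # Toy sequence: a serine-rich run embedded in a more varied context.
    toy = "MKTAYIAKQR" + "S" * 12 + "LVEDHGFPNW"
    print(find_low_complexity(toy))
```

Because windows that only partly overlap a homopolymeric run can still fall under the threshold, the reported intervals are usually somewhat wider than the run itself.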

Article activity feed

  1. SciScore for 10.1101/2020.08.11.245993:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    SARS-CoV-2 protein sequences: All full-length protein sequences of the SARS-CoV-2 proteome were retrieved on 28 April 2020 from the ViralZone web portal (https://viralzone.expasy.org/8996) which provides pre-release access to the SARS Coronavirus 2 protein sequences in UniProt.
    ViralZone
    suggested: (ViralZone, RRID:SCR_006563)
    UniProt
    suggested: (UniProtKB, RRID:SCR_004426)
    A repeat is defined as at least three occurrences of a specific amino acid pattern.
    Repeat
    suggested: (ProRepeat, RRID:SCR_006113)
    To annotate human proteins with their corresponding GO terms from Biological Process, Molecular Function and Cellular Component namespaces we used BiomaRt R package [102].
    BiomaRt
    suggested: (biomaRt, RRID:SCR_019214)
    Statistical analysis was performed with the topGO R package [103]; to assess overrepresentation of GO term annotations in the obtained clusters, we applied the hypergeometric test with Benjamini-Hochberg false discovery rate correction for multiple testing, with an adjusted p-value cutoff of 5%.
    topGO
    suggested: (topGO, RRID:SCR_014798)
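
The enrichment step quoted above (a hypergeometric test followed by Benjamini-Hochberg correction) can be illustrated with a short standard-library sketch. The manuscript reports using the topGO R package; the Python functions below are hypothetical stand-ins written for illustration only, and they assume a simple one-sided over-representation test per GO term.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric.

    M: number of proteins in the background
    n: background proteins annotated with the GO term
    N: number of proteins in the cluster
    k: cluster proteins annotated with the GO term
    """
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up counts for three GO terms in one cluster (illustration only).
    raw = [hypergeom_sf(k, 1000, n, 50) for k, n in [(8, 40), (3, 60), (1, 20)]]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p={p:.4g}  adjusted={q:.4g}  significant={q <= 0.05}")
```

A term would be reported as over-represented when its adjusted p-value does not exceed the 5% cutoff mentioned in the quoted sentence.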

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.