An integrated in silico immuno-genetic analytical platform provides insights into COVID-19 serological and vaccine targets

Abstract

Background

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has imposed a major global health and socio-economic burden. It has prompted the mobilisation of resources into developing control tools such as diagnostics and vaccines. The poor performance of some serological diagnostic tools has highlighted the need for up-to-date immunoinformatic analyses to guide the selection of viable targets for further study. This requires integrating and analysing genetic and immunological data for SARS-CoV-2, together with its sequence homology to other human coronavirus species, in order to understand cross-reactivity.

Methods

We have developed an online “immuno-analytics” resource to facilitate SARS-CoV-2 research. It combines an extensive meta-analysis of mapped and predicted B- and T-cell epitopes, sequence homology mapping against other human coronaviruses, and protein database annotation with an updated variant database and geospatial tracking of >7,800 non-synonymous mutation positions derived from >150,000 whole genome sequences. To demonstrate its utility, we present an integrated analysis of the SARS-CoV-2 spike and nucleocapsid proteins, both of which are vaccine and serological diagnostic targets, including an analysis of how relevant mutation frequencies changed over time.
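
A variant database of non-synonymous mutation positions rests on translating each open reading frame and comparing it with the reference protein; the SciScore report on this preprint notes that the authors used EMBOSS transeq plus an in-house script for this step. The following is a minimal Python sketch of that comparison, not the authors' script. It assumes the reference and sample ORFs are already aligned, in frame and of equal length (so indels are not handled), and the function names `translate` and `nonsynonymous_mutations` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; gapped or ambiguous codons become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def nonsynonymous_mutations(ref_cds: str, sample_cds: str) -> list[str]:
    """Label amino acid changes between two aligned, in-frame ORFs, e.g. 'D614G'."""
    ref_protein, sample_protein = translate(ref_cds), translate(sample_cds)
    mutations = []
    for position, (ref_aa, alt_aa) in enumerate(zip(ref_protein, sample_protein), start=1):
        # Skip identical residues and positions the sample could not resolve.
        if ref_aa != alt_aa and alt_aa != "X":
            mutations.append(f"{ref_aa}{position}{alt_aa}")
    return mutations
```

For example, `nonsynonymous_mutations("ATGGATTTA", "ATGGGTTTG")` returns `["D2G"]`: the GAT to GGT change alters the encoded residue, while TTA to TTG is synonymous (both encode leucine) and is not reported.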

Results

Our analysis reveals that the nucleocapsid protein in its native form appears to be a sub-optimal target for serological diagnostic platforms. The most frequent mutations were spike D614G and nsp12 L314P, both common (>86%) across all geographical regions. Our automated algorithms detected that some spike mutations (e.g. A222V and L18F) increased in frequency in Europe during the latter half of 2020. The tool also suggests that orf3a proteins may be a suitable alternative target for serological diagnostic assays in a post-vaccine surveillance setting.
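
The abstract does not describe how the automated algorithms flag a rising mutation. One simple possibility, sketched here as a hypothetical rule rather than the authors' method, is to compute a mutation's frequency per ISO week within a region and flag it when the ordinary least-squares slope of those weekly frequencies exceeds a threshold. The record layout `(collection_date, region, mutations)`, the mutation labels such as "S:A222V" and the `min_slope` default of 0.01 per week are all assumptions.

```python
from collections import defaultdict
from datetime import date


def weekly_frequencies(samples, mutation, region):
    """Return [((iso_year, iso_week), frequency)] for sequences from `region`.

    `samples` is an iterable of (collection_date, region, set_of_mutations) tuples.
    """
    totals = defaultdict(int)
    carriers = defaultdict(int)
    for collection_date, sample_region, mutations in samples:
        if sample_region != region:
            continue
        iso_year, iso_week, _ = collection_date.isocalendar()
        week = (iso_year, iso_week)
        totals[week] += 1
        if mutation in mutations:
            carriers[week] += 1
    # sorted() orders (year, week) tuples chronologically.
    return [(week, carriers[week] / totals[week]) for week in sorted(totals)]


def ols_slope(values):
    """Ordinary least-squares slope of `values` against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def is_rising(samples, mutation, region, min_slope=0.01):
    """Hypothetical rule: flag a mutation whose weekly frequency trends upwards."""
    frequencies = [freq for _, freq in weekly_frequencies(samples, mutation, region)]
    return ols_slope(frequencies) >= min_slope


if __name__ == "__main__":
    toy = [
        (date(2020, 7, 1), "Europe", set()),
        (date(2020, 7, 8), "Europe", {"S:A222V"}),
        (date(2020, 7, 9), "Europe", set()),
        (date(2020, 7, 15), "Europe", {"S:A222V"}),
    ]
    print(weekly_frequencies(toy, "S:A222V", "Europe"))
    print(is_rising(toy, "S:A222V", "Europe"))
```

The slope is taken against week index, so weeks without sequences are skipped rather than treated as gaps; a practical rule would also require a minimum number of sequences per week before trusting a frequency.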

Conclusions

The immuno-analytics tool can be accessed online (http://genomics.lshtm.ac.uk/immuno) and will serve as a useful resource for biological discovery and surveillance in the fight against SARS-CoV-2. The tool could also be adapted to identify biological targets in future outbreaks, including potential emerging human coronaviruses that spill over from animal hosts.

  1. SciScore for 10.1101/2020.05.11.089409:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Whole genome sequence data analysis: SARS-CoV-2 nucleotide sequences were downloaded from NCBI (https://www.ncbi.nlm.nih.gov) and GISAID (https://www.gisaid.org).
    https://www.ncbi.nlm.nih.gov
    suggested: (GENSAT at NCBI - Gene Expression Nervous System Atlas, RRID:SCR_003923)
    As a part of an automated in-house pipeline, sequences were aligned using MAFFT software (v7.2) [12] and trimmed to the beginning of the first reading frame (orf1ab-nsp1).
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Using data available from the NCBI COVID-19 resource, a modified annotation (GFF) file was generated and open reading frames (ORFs) for each respective viral protein were extracted (taking into account ribosomal slippage) using the bedtools ‘getfasta’ function [13].
    bedtools
    suggested: (BEDTools, RRID:SCR_006646)
    Each ORF was translated using EMBOSS transeq software [14] and the variants for each protein sequence were identified using an in-house script.
    EMBOSS
    suggested: (EMBOSS, RRID:SCR_008493)
    Using BLASTp [26] we mapped short amino acid epitope sequences onto the canonical sequence of SARS-CoV-2 proteins.
    BLASTp
    suggested: (BLASTP, RRID:SCR_001010)
    Coronavirus homology analysis: Reference proteomes for SARS, MERS, OC43, 229E, HKU1 and NL63 α and β coronavirus (-CoV) species were sourced from the UniProt database.
    UniProt
    suggested: (UniProtKB, RRID:SCR_004426)
    Homologous peptide sequences with a BLAST bitscore indicating 10 or more residues mapped to the target sequence were recorded and parsed for display on the graph.
    BLAST
    suggested: (BLASTX, RRID:SCR_001653)
    The BioCircos.js library [10] was used to generate the interactive plot, and the DataTables.net libraries to generate the table.
    BioCircos
    suggested: None
    For the temporal/geographic non-synonymous mutation plots, we partitioned the whole genome sequencing dataset by week and continent and plotted non-synonymous allele frequencies using the Google Charts JavaScript libraries.
    Google
    suggested: (Google, RRID:SCR_017097)
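
The pipeline trims the MAFFT alignment to the beginning of orf1ab-nsp1. Because gaps shift alignment columns, this requires converting a reference coordinate into an alignment column. A minimal sketch, assuming the alignment is held in memory as a dict of equal-length gapped strings with gaps written as "-"; the helper names are hypothetical. In the Wuhan-Hu-1 reference (NC_045512.2), orf1ab starts at nucleotide 266.

```python
def reference_column(aligned_ref: str, ref_position: int) -> int:
    """Map a 1-based ungapped reference position to a 0-based alignment column."""
    seen = 0
    for column, base in enumerate(aligned_ref):
        if base != "-":
            seen += 1
            if seen == ref_position:
                return column
    raise ValueError(f"position {ref_position} lies beyond the aligned reference")


def trim_alignment(alignment: dict[str, str], ref_id: str, start: int) -> dict[str, str]:
    """Drop every alignment column before reference position `start` (1-based)."""
    column = reference_column(alignment[ref_id], start)
    return {name: seq[column:] for name, seq in alignment.items()}


ORF1AB_START = 266  # start of orf1ab in the NC_045512.2 (Wuhan-Hu-1) reference
```

Trimming on the reference's coordinates rather than a fixed column keeps the cut consistent even when other sequences introduce gaps into the reference row.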
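
ORF extraction used bedtools getfasta with a modified GFF. The pure-Python stand-in below (not bedtools, and with hypothetical helper names) illustrates what accounting for ribosomal slippage means in practice: a CDS annotated as several segments is joined in coordinate order, and the -1 frameshift in orf1ab appears as two segments sharing one nucleotide, which is therefore read twice. In the NC_045512.2 annotation, orf1ab is given as join(266..13468,13468..21555). The sketch assumes plus-strand features (true of all SARS-CoV-2 ORFs) and that each CDS row carries a `Name` attribute.

```python
from collections import defaultdict


def cds_segments_from_gff(gff_lines):
    """Group GFF3 CDS rows by Name (falling back to ID) -> sorted (start, end) list."""
    segments = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF directives/comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue
        attributes = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        name = attributes.get("Name", attributes.get("ID", "unnamed"))
        segments[name].append((int(fields[3]), int(fields[4])))
    return {name: sorted(parts) for name, parts in segments.items()}


def extract_orf(genome: str, segments: list[tuple[int, int]]) -> str:
    """Concatenate 1-based inclusive segments; overlapping ends model a -1 frameshift."""
    return "".join(genome[start - 1:end] for start, end in segments)
```

With segments `[(1, 4), (4, 6)]`, position 4 is emitted twice, mirroring how the ribosome re-reads one nucleotide at the slippage site.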
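
Epitope mapping onto the canonical SARS-CoV-2 proteins used BLASTp. For epitopes that occur verbatim in the canonical protein, an exact substring search yields the same coordinates and is easy to reason about. The sketch below is that simplified stand-in, not a substitute for BLASTp, which also reports near matches; `map_epitopes` is a hypothetical name.

```python
def map_epitopes(protein: str, epitopes: list[str]) -> dict[str, list[tuple[int, int]]]:
    """Return 1-based inclusive (start, end) positions of every exact epitope match."""
    hits = {}
    for epitope in epitopes:
        if not epitope:
            continue  # an empty string would "match" everywhere
        positions = []
        start = protein.find(epitope)
        while start != -1:
            positions.append((start + 1, start + len(epitope)))
            # Restart one residue later so overlapping occurrences are also found.
            start = protein.find(epitope, start + 1)
        hits[epitope] = positions
    return hits
```

Epitopes with no exact match map to an empty list, which is where an alignment tool such as BLASTp becomes necessary.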
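
The homology analysis kept BLAST hits indicating 10 or more mapped residues. Taking alignment length as a proxy for that criterion (an assumption; the report phrases it in terms of the bitscore), hits in BLAST+ tabular output (`-outfmt 6`, default 12 columns) can be filtered as sketched here. The `Hit` record, `filter_hits` and its `min_length` parameter are hypothetical.

```python
from typing import NamedTuple

# Default 12 columns of BLAST+ tabular output (-outfmt 6), in order.
OUTFMT6_FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


class Hit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    length: int
    subject_start: int
    subject_end: int
    bitscore: float


def filter_hits(lines, min_length=10):
    """Keep tabular BLAST hits whose alignment spans at least `min_length` residues."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        row = dict(zip(OUTFMT6_FIELDS, line.rstrip("\n").split("\t")))
        length = int(row["length"])
        if length < min_length:
            continue
        hits.append(Hit(
            query=row["qseqid"],
            subject=row["sseqid"],
            percent_identity=float(row["pident"]),
            length=length,
            subject_start=int(row["sstart"]),
            subject_end=int(row["send"]),
            bitscore=float(row["bitscore"]),
        ))
    return hits
```

Filtering on an explicit column keeps the threshold transparent; a stricter variant could instead count identical residues as `round(pident * length / 100)`.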

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.