Artificial intelligence predicts the immunogenic landscape of SARS-CoV-2: toward universal blueprints for vaccine designs
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
The global population is currently suffering from a pandemic of coronavirus disease 2019 (COVID-19), caused by the novel coronavirus Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). The goal of this study was to use artificial intelligence (AI) to predict blueprints for designing universal vaccines against SARS-CoV-2 that contain a sufficiently broad repertoire of T-cell epitopes to provide coverage and protection across the global population. To this end, we profiled the entire SARS-CoV-2 proteome across the 100 most frequent HLA-A, HLA-B and HLA-DR alleles in the human population, using host-infected cell surface antigen presentation and immunogenicity predictors from the NEC Immune Profiler suite of tools, and generated comprehensive epitope maps. We then used these epitope maps as input for a Monte Carlo simulation designed to identify statistically significant “epitope hotspot” regions in the virus that are most likely to be immunogenic across a broad spectrum of HLA types. We then removed epitope hotspots that shared significant homology with proteins in the human proteome, to reduce the chance of inducing off-target autoimmune responses. We also analyzed the antigen presentation and immunogenic landscape of all the nonsynonymous mutations across 3400 different sequences of the virus, and identified a trend whereby SARS-CoV-2 mutations are predicted to have reduced potential to be presented by host-infected cells and, consequently, to be detected by the host immune system. A sequence conservation analysis then removed epitope hotspots that occurred in less-conserved regions of the viral proteome.
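As a rough illustration of the Monte Carlo hotspot idea described above, the sketch below scores every fixed-width window of a per-residue epitope profile and compares it against a null distribution built by shuffling the profile. This is an assumption-laden toy, not the authors' procedure: the abstract does not specify the null model, window width or significance threshold, and the names `window_means` and `hotspot_p_values` are hypothetical.

```python
import random


def window_means(scores, width):
    """Mean score of every contiguous window of the given width."""
    total = sum(scores[:width])
    means = [total / width]
    for i in range(width, len(scores)):
        # Slide the window one residue to the right.
        total += scores[i] - scores[i - width]
        means.append(total / width)
    return means


def hotspot_p_values(scores, width, n_shuffles=1000, seed=0):
    """Empirical p-value for each window (hypothetical helper).

    Null model (an assumption): residue scores are exchangeable, so the
    profile is shuffled and the best window mean of each shuffle is kept.
    Comparing against the per-shuffle maximum corrects for scanning many
    windows at once.
    """
    rng = random.Random(seed)
    observed = window_means(scores, width)
    shuffled = list(scores)
    null_max = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null_max.append(max(window_means(shuffled, width)))
    p_values = []
    for obs in observed:
        exceed = sum(1 for m in null_max if m >= obs)
        # Add-one correction so an empirical p-value is never exactly zero.
        p_values.append((exceed + 1) / (n_shuffles + 1))
    return p_values


# Toy profile: a block of high-scoring residues inside a low-scoring protein.
profile = [0.1] * 15 + [0.9] * 10 + [0.1] * 25
p = hotspot_p_values(profile, width=10)
hotspot_starts = [i for i, value in enumerate(p) if value < 0.05]
```

In this toy profile the windows overlapping the high-scoring block receive small p-values, while windows made up only of low-scoring residues do not. A real analysis would need per-allele scores and a null model that respects the structure of the predictions.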
Finally, we used a database of the HLA genotypes of approximately 22 000 individuals to develop a “digital twin” type simulation to model how effectively different combinations of hotspots would work in a diverse human population, and used the approach to identify an optimal constellation of epitope hotspots that could provide maximum coverage in the global population. By combining the NEC Immune Profiler's predictions of antigen presentation at the infected-host cell surface and of immunogenicity with a robust Monte Carlo and digital twin simulation, we have profiled the entire SARS-CoV-2 proteome and identified a subset of epitope hotspots that could be harnessed in a vaccine formulation to provide broad coverage across the global population.
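The population-coverage step can be pictured with a small sketch. The abstract does not say how the "optimal constellation" is searched for, so the greedy selection below is a swapped-in stand-in, and the coverage rule (an individual counts as covered if any of their HLA alleles is predicted to respond to any chosen hotspot) is an assumption. The genotypes, hotspot names and `greedy_hotspot_selection` function are all hypothetical.

```python
def greedy_hotspot_selection(population, hotspots, max_hotspots):
    """Pick hotspots one at a time, each time taking the hotspot that
    covers the most still-uncovered individuals (greedy max coverage).

    population: list of sets of HLA alleles, one set per individual.
    hotspots:   dict mapping hotspot name -> set of alleles it is
                predicted to be immunogenic for.
    Returns the chosen hotspot names and the covered population fraction.
    """
    uncovered = set(range(len(population)))
    chosen = []
    while uncovered and len(chosen) < max_hotspots:
        best_name, best_hits = None, set()
        for name, alleles in hotspots.items():
            if name in chosen:
                continue
            # Individuals sharing at least one allele with this hotspot.
            hits = {i for i in uncovered
                    if not population[i].isdisjoint(alleles)}
            if len(hits) > len(best_hits):
                best_name, best_hits = name, hits
        if best_name is None:
            break  # no remaining hotspot covers anyone new
        chosen.append(best_name)
        uncovered -= best_hits
    coverage = 1 - len(uncovered) / len(population)
    return chosen, coverage


# Made-up genotypes and hotspot allele sets, for illustration only.
population = [
    {"A*02:01", "B*07:02"},
    {"A*01:01", "B*08:01"},
    {"A*02:01", "B*08:01"},
    {"A*24:02", "B*35:01"},
]
hotspots = {
    "hs1": {"A*02:01"},
    "hs2": {"B*08:01"},
    "hs3": {"A*24:02"},
    "hs4": {"A*01:01"},
}
chosen, coverage = greedy_hotspot_selection(population, hotspots, max_hotspots=2)
```

Greedy selection is simple and has a known approximation guarantee for maximum coverage, but it is not guaranteed to find the best combination; an exhaustive or stochastic search over combinations may behave differently.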
Article activity feed
SciScore for 10.1101/2020.04.21.052084:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
Sentences: Vaccine design process: Concretely, we approach the vaccine design process in four steps: Each step is described in more detail in the supplementary methods Variant immunogenic potential across the mutating sequences of SARS-CoV-2: We downloaded all the strains available in the GISAID database [36] as of 31.03.2020, and ran them through the Nexstrain/Augur software suite with default parameters [39].
Resources: Nexstrain/Augur (suggested: None)
Sentences: These sets were individually processed using the Clustal Omega (v1.2.4) [40] software via the command line interface with default parameter settings.
Resources: Clustal Omega (suggested: Clustal Omega, RRID:SCR_001591)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- No funding statement was detected.
- No protocol registration statement was detected.
