AIVE: accurate predictions of SARS-CoV-2 infectivity from comprehensive analysis

Curation statements for this article:
  • Curated by eLife

    eLife Assessment

    The study provides valuable insight into the biological significance of SARS-CoV-2 by using a series of computational analyses of viral proteins. While evidence is solid, it is obscured by a lack of clarity about the objectives of the analyses and in the overall writing of the article. The study will be impactful to the researchers in the field but will benefit from improved presentation.

Abstract

This study presents an innovative research model that combines big data science with AI-based protein structure prediction software. Far more SARS-CoV-2 data have accumulated than for previous infectious diseases, enabling insights into the virus's evolutionary process and more thorough analyses. We identified amino acid substitutions from hydrophilic to hydrophobic or positively charged amino acids in the RBM region. An increased frequency of substitutions to lysine (K) and arginine (R) was detected in Variants of Concern (VOCs) and in viral sequencing data. As the virus evolved to Omicron, commonly occurring mutations became fixed components of the new viral sequence. Furthermore, at specific positions only one type of amino acid substitution was detected, and mutations at D467 were notably absent across viral sequences in VOCs. Binding affinity with the ACE2 receptor increased in later lineages. We developed APESS, a mathematical model that evaluates infectivity based on biochemical and mutational properties calculated from protein structures predicted by AlphaFold. We validated the features discovered through APESS. Infectivity was evaluated in silico using real-world viral sequences and in vitro using viral entry assays. Using machine learning, we predicted mutations with the potential to become more prominent. APESS and the characteristics we discovered are featured in AIVE, a web-based system accessible at https://ai-ve.org. AIVE reports an infectivity measurement for mutations entered by users, with fast APESS calculation and visualization of results, and requires no GPU installation. We established a clear link between specific viral properties and increased infectivity. Comprehensive analysis and specialized AIVE reporting enhance our understanding of SARS-CoV-2 and enable more accurate predictions of infectivity.
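
The property-based view of substitutions described in the abstract (hydrophilic to hydrophobic, or to positively charged residues such as K and R) can be illustrated with a minimal Python sketch. This is not the APESS model, whose formula is not given here. The side-chain grouping, the function name `classify_substitution`, and the example mutation strings are all assumptions chosen for illustration, and published grouping schemes differ (tyrosine and histidine in particular are classified differently by different sources).

```python
import re

# ASSUMPTION: one coarse textbook-style grouping of the 20 standard amino
# acids by side-chain property. This is not taken from the article.
PROPERTY_CLASS = {
    **dict.fromkeys("KRH", "positive"),
    **dict.fromkeys("DE", "negative"),
    **dict.fromkeys("STNQCY", "polar"),
    **dict.fromkeys("AVLIMFWPG", "hydrophobic"),
}

# Mutations written as <reference residue><position><new residue>, e.g. E484K.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def classify_substitution(mutation):
    """Describe a substitution by the property class of each residue.

    Hypothetical helper for illustration only; it does not compute an
    infectivity score.
    """
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"not a substitution of the form A123B: {mutation!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if ref not in PROPERTY_CLASS or alt not in PROPERTY_CLASS:
        raise ValueError(f"unknown amino acid code in {mutation!r}")
    before, after = PROPERTY_CLASS[ref], PROPERTY_CLASS[alt]
    return {
        "position": pos,
        "from": before,
        "to": after,
        # True when the new residue is positively charged and the old one was not.
        "gains_positive_charge": after == "positive" and before != "positive",
    }


if __name__ == "__main__":
    # Illustrative inputs only.
    for name in ("E484K", "T478K", "K417N"):
        print(name, classify_substitution(name))
```

Counting the share of substitutions with `gains_positive_charge` set across a set of sequences would give a rough analogue of the K/R frequency trend the abstract reports, under the grouping assumed above.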

Article activity feed

  1. eLife Assessment

    The study provides valuable insight into the biological significance of SARS-CoV-2 by using a series of computational analyses of viral proteins. While evidence is solid, it is obscured by a lack of clarity about the objectives of the analyses and in the overall writing of the article. The study will be impactful to the researchers in the field but will benefit from improved presentation.

  2. Reviewer #1 (Public review):

    Summary:

    Park et al. conducted various analyses attempting to elucidate the biological significance of SARS-CoV-2 mutations. However, the study lacks a clear objective. The specific goals of the analyses in each subsection are unclear, as is how the results from these subsections are interconnected. Compiling results from unrelated analyses into a single paper can be confusing for readers. Clarifying the objective and narrowing down the topics would make the paper's purpose clearer.

    The logic of the study is also unclear. For instance, the authors developed an evaluation score, APESS, for analyzing viral sequences. Although they state that the APESS score correlates with viral infectivity, there is no explanation in the results section about why this is the case.

    The structure of the paper should be reconsidered.

  3. Reviewer #2 (Public review):

    Summary:

    The authors have developed a machine learning tool, AIVE, to predict the infectivity of SARS-CoV-2 variants, along with a scoring metric to measure infectivity. A large number of virus sequences were analyzed in detail, incorporating hydrophobic, hydrophilic, acidic, and alkaline characteristics. Protein structures were also considered in measuring infectivity and searching for core mutations. The study focused especially on the S protein of SARS-CoV-2. Its contents would be of interest to many researchers in this area, and the web service would make it easy to analyze such data without in-depth bioinformatics expertise.

    Strengths:

    - Analysis of large-scale data.

    - Experimental validation on a partial set of searched mutations.

    - A user-friendly web-based analysis platform that is made public.

    Weaknesses:

    - Complexity of the research.