AI-based search for convergently expanding, advantageous mutations in SARS-CoV-2 by focusing on oligonucleotide frequencies

This article has been reviewed by the following groups

Abstract

Among the mutations that occur in SARS-CoV-2, efficiently identifying those that are advantageous for viral replication and transmission is important for characterizing and defeating this rampant virus. Mutations that rapidly expand in frequency in a viral population are candidates for advantageous mutations, but neutral mutations hitchhiking with advantageous ones are also likely to be included among them. To distinguish the two, we focus on mutations that appear to have occurred independently in different lineages and to have expanded in frequency in a convergent evolutionary manner. Batch-learning SOM (BLSOM) can separate SARS-CoV-2 genome sequences by lineage when given only their oligonucleotide composition. Focusing on remarkably expanding 20-mers, each of which is represented by only one copy in the viral genome, allows us to link the expanding 20-mers to specific mutations. Using the visualization functions of BLSOM, we can efficiently identify mutations that have expanded remarkably both in the Omicron lineage, which is phylogenetically distinct from other lineages, and in other lineages. Most of these mutations involved amino acid changes, but a few did not, such as an intergenic mutation.
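The abstract rests on two ingredients: an oligonucleotide-composition vector per genome (the input BLSOM clusters by lineage) and single-copy 20-mers whose frequency expands in more than one lineage. The Python sketch below illustrates both under stated assumptions; it is not the authors' pipeline. BLSOM itself is not implemented. The two-period "early/late" split, the `min_rise` threshold, the `min_lineages` criterion and every function name are hypothetical choices made for illustration, and mapping a 20-mer back to a mutation assumes a single-base substitution relative to a reference genome.

```python
from collections import Counter, defaultdict
from itertools import product

K = 20  # oligonucleotide length used for mutation tracking in the abstract


def kmers(seq, k=K):
    """All overlapping k-mers of seq that contain only A/C/G/T."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)
            if set(seq[i:i + k]) <= set("ACGT")]


def oligo_composition(seq, k=4):
    """Normalized frequencies of all 4**k words: one SOM input vector per genome.

    k=4 is an illustrative assumption, not the paper's setting.
    """
    counts = Counter(kmers(seq, k))
    total = sum(counts.values())
    words = ("".join(p) for p in product("ACGT", repeat=k))
    return {w: (counts[w] / total if total else 0.0) for w in words}


def single_copy_kmers(seq, k=K):
    """k-mers occurring exactly once in a genome, so each maps to one position."""
    return {km for km, c in Counter(kmers(seq, k)).items() if c == 1}


def kmer_frequencies(genomes, k=K):
    """Fraction of genomes in which each single-copy k-mer is present."""
    present = Counter()
    for g in genomes:
        present.update(single_copy_kmers(g, k))
    return {km: c / len(genomes) for km, c in present.items()}


def expanding_kmers(early, late, min_rise=0.5, k=K):
    """Hypothetical criterion: frequency rises by >= min_rise between periods."""
    f0, f1 = kmer_frequencies(early, k), kmer_frequencies(late, k)
    return {km for km, f in f1.items() if f - f0.get(km, 0.0) >= min_rise}


def convergent_kmers(by_lineage, min_lineages=2, min_rise=0.5, k=K):
    """k-mers that expand independently in at least min_lineages lineages.

    by_lineage maps lineage name -> (early_genomes, late_genomes).
    """
    hits = defaultdict(set)
    for lineage, (early, late) in by_lineage.items():
        for km in expanding_kmers(early, late, min_rise, k):
            hits[km].add(lineage)
    return {km: lins for km, lins in hits.items() if len(lins) >= min_lineages}


def locate_substitution(kmer, reference):
    """Return (ref_pos, ref_base, alt_base) if the k-mer differs from the
    reference at exactly one base in one consistent way; otherwise None."""
    k, found = len(kmer), set()
    for i in range(len(reference) - k + 1):
        window = reference[i:i + k]
        diffs = [j for j in range(k) if window[j] != kmer[j]]
        if len(diffs) == 1:
            j = diffs[0]
            found.add((i + j, window[j], kmer[j]))
    return found.pop() if len(found) == 1 else None
```

A point substitution produces up to 20 new single-copy 20-mers (one per window covering the changed base), so all of them should rise together and `locate_substitution` should resolve each to the same reference position. Requiring the rise in at least two lineages is a crude stand-in for the "convergent" signal the abstract describes; the paper instead reads this off BLSOM visualizations.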

Article activity feed

  1. SciScore for 10.1101/2022.05.13.491763:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to this paper type.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    However, there are obvious limitations for shortening the oligonucleotide. The present method is suitable for excavating advantageous mutations in regions that have received less attention, rather than in the S gene region where the concentration of functionally influential mutations is apparent. Since the mutations in Table 1 cover rather randomly a wide range of lineages, most (if not all) of the mutations are thought to have expanded in a convergent evolutionary way. We will here discuss a reason that Lambda and Iota were more closely related to Omicron than other lineages, both in the BLSOM and in Table 1. Mutations presented in Table 1 are thought to have occurred far prior to the time when the Omicron genome sequence was first reported (Nov. 2021), because almost all Omicron strains isolated in the Nov. 2021 had these mutations. In the case of coronaviruses, recombination plays an important role in their evolution, and therefore, it is difficult to identify the true origin of each mutation in focus. The proximity of Omicron to Alpha and Iota in the BLSOM and in Table 1 may not reflect their evolutionary origins. Analysis of the newly accumulated Omicron genome sequences: One peculiarity of SARS-CoV-2 research is that, regardless of when genome sequences were obtained for analysis, a large number of new sequences have accumulated during the analysis and manuscript preparation; in the present study, the accumulation of genome sequences of Omicron, especially the BA.2 subl...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.