Machine learning-based approach KEVOLVE efficiently identifies SARS-CoV-2 variant-specific genomic signatures

Dylan Lebatteux
Hugo Soudeyns
Isabelle Boucoiran
Soren Gantt
Abdoulaye Baniré Diallo

This article has been Reviewed by the following groups

Read the full article

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

Evaluated articles (ScreenIT)

Abstract

Machine learning was shown to be effective at identifying distinctive genomic signatures among viral sequences. These signatures are defined as pervasive motifs in the viral genome that allow discrimination between species or variants. In the context of SARS-CoV-2, the identification of these signatures can assist in taxonomic and phylogenetic studies, improve in the recognition and definition of emerging variants, and aid in the characterization of functional properties of polymorphic gene products. In this paper, we assess KEVOLVE, an approach based on a genetic algorithm with a machine-learning kernel, to identify multiple genomic signatures based on minimal sets of k -mers. In a comparative study, in which we analyzed large SARS-CoV-2 genome dataset, KEVOLVE was more effective at identifying variant-discriminative signatures than several gold-standard statistical tools. Subsequently, these signatures were characterized using a new extension of KEVOLVE (KANALYZER) to highlight variations of the discriminative signatures among different classes of variants, their genomic location, and the mutations involved. The majority of identified signatures were associated with known mutations among the different variants, in terms of functional and pathological impact based on available literature. Here we showed that KEVOLVE is a robust machine learning approach to identify discriminative signatures among SARS-CoV-2 variants, which are frequently also biologically relevant, while bypassing multiple sequence alignments. The source code of the method and additional resources are available at: https://github.com/bioinfoUQAM/KEVOLVE .

Version published to 10.1371/journal.pone.0296627
Jan 19, 2024
ScreenIT
Feb 10, 2022
SciScore for 10.1101/2022.02.07.479343: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
No key resources detected.
Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement …
SciScore for 10.1101/2022.02.07.479343: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
No key resources detected.
Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.
About SciScore
SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
Read the original source
Version published to 10.1101/2022.02.07.479343 on bioRxiv
Feb 7, 2022

Rapid Phylogenomic Analysis of Thousands Outbreak‐Causing Viral Genomes Using Covary

This article has 1 author:
1. Marvin I. De los Santos
This article has no evaluationsLatest version Dec 22, 2025
Machine Learning–Driven Discovery of Host Genetic Factors for Paratuberculosis in Goats Within the One Health Framework

This article has 11 authors:
1. Yalçın Yaman
2. Ahmet ESER
3. Devran Coşkun
4. Ramazan Aymaz
5. Yiğit Emir Kişi
6. Murat Keleş
7. Serdar Yağcı
8. Özgül Gülaydın
9. Serkan Süleyman Şengül
10. Kıvanç İrak
11. Memiş Bolacalı
This article has no evaluationsLatest version Jan 30, 2026
Cross-Geographic Validation Demonstrates Universal Transcriptomic Signatures for Tuberculosis Diagnosis: A Machine Learning Study

This article has 1 author:
1. Siddalingaiah H.S.
This article has no evaluationsLatest version Dec 31, 2025

This article has been Reviewed by the following groups

Discuss this preprint

Listed in

Abstract

Article activity feed

Related articles

Rapid Phylogenomic Analysis of Thousands Outbreak‐Causing Viral Genomes Using Covary

Machine Learning–Driven Discovery of Host Genetic Factors for Paratuberculosis in Goats Within the One Health Framework

Cross-Geographic Validation Demonstrates Universal Transcriptomic Signatures for Tuberculosis Diagnosis: A Machine Learning Study