CAN A MACHINE LEARNING ALGORITHM IDENTIFY SARS-COV-2 VARIANTS BASED ON CONVENTIONAL rRT-PCR? PROOF OF CONCEPT

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been, and remains, one of the major challenges humanity has faced. Over the past few months, large amounts of information have been collected that are only now beginning to be assimilated. The present work investigates whether residual information exists in the large number of positive rRT-PCR results among the almost half a million tests performed during the pandemic. This residual information is believed to be closely related to a pattern in the number of amplification cycles (the cycle threshold, Ct) needed for a positive sample to be detected as such. A database of more than 20,000 positive samples was therefore collected, and two supervised classification algorithms (a support vector machine and a neural network) were trained to locate each sample in time, that is, to assign it to a period of the pandemic, based solely on the number of cycles determined in that individual's rRT-PCR. The classification results show that the appearance of each wave coincides with the surge of one of the variants present in the region of Galicia (Spain) during the SARS-CoV-2 pandemic, and that these waves are clearly identified by the classification algorithm.
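The abstract describes training a support vector machine to assign each positive sample to a pandemic period from its Ct values alone. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: it assumes the feature vector is one Ct value per target gene, uses a linear SVM trained with the Pegasos stochastic sub-gradient method in a one-vs-rest scheme, and runs on entirely synthetic data. All function names, period labels, cluster centres and hyperparameters (`lam`, `steps`) are hypothetical, and the neural-network classifier mentioned in the abstract is not sketched.

```python
"""Toy one-vs-rest linear SVM that assigns an rRT-PCR sample to a pandemic
period using only its Ct values. Hypothetical sketch on synthetic data."""
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def standardize(rows: Sequence[Vector]) -> Tuple[List[Vector], Vector, Vector]:
    """Z-score each Ct column so genes with wider Ct ranges do not dominate."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        (sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5 or 1.0
        for j in range(d)
    ]
    scaled = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    return scaled, means, stds


def train_pegasos(X: Sequence[Vector], y: Sequence[int], lam: float = 0.01,
                  steps: int = 5000, seed: int = 0) -> Vector:
    """Binary linear SVM (labels +1/-1) trained with Pegasos sub-gradient steps.
    A constant 1.0 is appended to each sample, so the last weight is a bias."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    for t in range(1, steps + 1):
        i = rng.randrange(len(X))
        x = list(X[i]) + [1.0]
        eta = 1.0 / (lam * t)                      # Pegasos step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, x))
        w = [(1.0 - eta * lam) * wj for wj in w]   # shrink: L2 regularizer
        if margin < 1.0:                           # hinge loss is active
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, x)]
    return w


def train_one_vs_rest(X: Sequence[Vector], labels: Sequence[str],
                      **kw) -> Dict[str, Vector]:
    """Train one binary SVM per period label (e.g. one per pandemic wave)."""
    return {
        c: train_pegasos(X, [1 if lab == c else -1 for lab in labels], **kw)
        for c in sorted(set(labels))
    }


def predict(models: Dict[str, Vector], x: Vector) -> str:
    """Return the period whose SVM gives the highest decision score."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: sum(w * v for w, v in zip(models[c], xb)))


def synthetic_ct_data(seed: int = 1, per_class: int = 50):
    """Made-up Ct triples (three hypothetical target genes) for three
    hypothetical periods. These are NOT data from the study."""
    rng = random.Random(seed)
    centres = {"wave_1": (20.0, 30.0, 25.0),
               "wave_2": (30.0, 20.0, 25.0),
               "wave_3": (25.0, 25.0, 35.0)}
    rows, labels = [], []
    for label, centre in centres.items():
        for _ in range(per_class):
            rows.append([rng.gauss(mu, 1.0) for mu in centre])
            labels.append(label)
    return rows, labels


if __name__ == "__main__":
    raw, labels = synthetic_ct_data()
    X, means, stds = standardize(raw)
    models = train_one_vs_rest(X, labels)
    correct = sum(predict(models, x) == lab for x, lab in zip(X, labels))
    print(f"training accuracy: {correct / len(X):.2f}")
```

Standardizing the Ct columns first keeps the regularizer from favoring genes with wider Ct ranges. In practice, a library implementation (for example scikit-learn's `LinearSVC` or `SVC`) evaluated on a held-out test set would be the more usual choice than this hand-rolled trainer.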

  1. SciScore for 10.1101/2021.11.12.21266286

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    5.1 Limitations: The main limitation of this study is that it is applied only to data from a single area. This restriction was necessary because the training dataset had to be free of any trace of data that could let the algorithm distinguish samples by something other than their Ct pattern. In fact, as mentioned above, this reduced the size of our dataset, since many tests had to be discarded because they used a different set of target genes. The purpose of this work is to show that a distinguishable signature in the Ct pattern seems to exist; until this is confirmed across different laboratories, test conditions, etc., no generalization beyond the mere existence of this pattern should be made.

    5.2 Assessment of the potential interest of the proven concept: The ML tool proposed in this work is an additional tool that can increase the value of rRT-PCR results. The classical interpretation of a qualitative, Boolean result (positive or negative) can be complemented with an estimate of the probable virus variant (if recognized by an ML algorithm). This can be useful for many purposes, such as pandemic control (quick detection of the arrival of new variants), screening of samples for virus sequencing, and quality checks of tests and/or reagents. In addition, this work is only the first step toward a completely new methodology applied to rRT-PCR, not only for SARS-CoV-2 but also for many other diseases.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.