A machine learning-based approach to determine infection status in recipients of BBV152 whole virion inactivated SARS-CoV-2 vaccine for serological surveys

Abstract

Data science has been an invaluable part of the COVID-19 pandemic response with multiple applications, ranging from tracking viral evolution to understanding the effectiveness of interventions. Asymptomatic breakthrough infections have been a major problem during the ongoing global surge of the Delta variant. Serological discrimination of vaccine response from infection has so far been limited to Spike protein vaccines used in higher-income regions. Here, we show for the first time how statistical and machine learning (ML) approaches can discriminate SARS-CoV-2 infection from the immune response to an inactivated whole virion vaccine (BBV152, Covaxin, India), thereby permitting real-world vaccine effectiveness assessments from cohort-based serosurveys in Asia and Africa, where such vaccines are commonly used. Briefly, we accessed serial data on Anti-S and Anti-NC antibody concentration values, along with age, sex, number of doses, and number of days since the last vaccine dose, for 1823 Covaxin recipients. An ensemble ML model, incorporating a consensus clustering approach alongside a support vector machine (SVM) model, was built on 1063 samples where reliable qualifying data existed, and then applied to the entire dataset. Of 1448 self-reported negative subjects, 724 were classified as infected. Since the vaccine contains wild-type virus and the antibodies it induces will neutralize the wild type much better than the Delta variant, we determined the relative ability of a random subset of such samples to neutralize the Delta variant versus the wild-type strain. In 100 of 156 samples where the ML prediction differed from the self-reported uninfected status, the Delta variant was neutralized more effectively than the wild type, which cannot happen without infection. The fraction rose to 71.8% (28 of 39) in subjects predicted to be infected during the surge, which is concordant with the percentage of sequences classified as Delta (75.6%-80.2%) over the same period.
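
The proportions quoted in the abstract can be re-derived from the raw counts it gives. The following is a minimal arithmetic sketch, not part of the authors' analysis; the helper name `percent` and the variable names are illustrative assumptions.

```python
def percent(numerator, denominator):
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)

# Counts below are taken verbatim from the abstract.
# Self-reported negative subjects that the model classified as infected.
reclassified = percent(724, 1448)

# Discordant samples that neutralized Delta better than the wild type.
delta_better = percent(100, 156)

# The same comparison, restricted to subjects predicted infected during the surge.
delta_better_surge = percent(28, 39)

print(reclassified, delta_better, delta_better_surge)
```

The last value reproduces the 71.8% stated in the abstract; the first two correspond to 50.0% and 64.1%, figures the abstract reports only as counts.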

Article activity feed

  1. SciScore for 10.1101/2021.12.16.21267889:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Antibodies
    Sentences | Resources
    Separated plasma was stored at -80°C until used to assess antibodies against recombinant protein representing Nucleocapsid (NC) and Spike (S) antigens of SARS-CoV-2 using Elecsys Anti-SARS-CoV-2 kits (Roche Diagnostics) based on Electro-chemiluminescence Immunoassay (ECLIA) according to manufacturer’s procedure.
    antibodies against recombinant protein representing Nucleocapsid (NC)
    suggested: None
    Individuals with a Cut-off index (COI) value of > 1.0 and a value of > 0.8 U/mL were considered to be positive for Anti-NC and Anti-S antibodies, respectively.
    Anti-NC
    suggested: None
    Anti-S
    suggested: None
    Software and Algorithms
    Sentences | Resources
    We used the Python library, namely, Scikit-learn (version 0.24.1) for predictive modeling.
    Python
    suggested: (IPython, RRID:SCR_001658)
    Scikit-learn
    suggested: (scikit-learn, RRID:SCR_002577)
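
The positivity cut-offs quoted in the Antibodies rows above (COI > 1.0 for Anti-NC, > 0.8 U/mL for Anti-S) amount to a simple threshold rule. A minimal sketch of that rule follows; the function name `serostatus`, the constant names, and the returned dictionary keys are hypothetical and do not come from the manuscript.

```python
# Cut-offs as quoted from the manuscript; the names are illustrative assumptions.
ANTI_NC_COI_CUTOFF = 1.0   # cut-off index (COI), unitless
ANTI_S_U_ML_CUTOFF = 0.8   # U/mL

def serostatus(anti_nc_coi, anti_s_u_ml):
    """Apply the strict '>' cut-offs to one sample's two assay readings."""
    return {
        "anti_nc_positive": anti_nc_coi > ANTI_NC_COI_CUTOFF,
        "anti_s_positive": anti_s_u_ml > ANTI_S_U_ML_CUTOFF,
    }

# A reading exactly at the cut-off is not positive, because the rule is strict.
print(serostatus(1.0, 0.8))
print(serostatus(1.5, 0.2))
```

For recipients of a whole virion vaccine, these two flags alone do not separate infection from vaccine response, which is the gap the manuscript's ML model is meant to address.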

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Though this study fills the gap in the field, it comes with certain limitations. First, the self-reported status is questionnaire-based which might have a certain level of inconsistency. Second, not infected individuals might get infected at any time after the questionnaire form filling, which means that longitudinal data might have been better at capturing the real infection rate. Third, the samples come from different employees of institutes and their relatives, which might not be the real representation of the overall country’s population, especially in rural areas. The geographical locations covered by our study include CSIR labs located throughout the country (Naushin et al., 2021) which complements the predominantly rural locale by the ICMR study. Further, a recent ICMR paper of pilot study with 114 individuals shows that people with infection and 1 dose of Covaxin are equally responsive to the ones with infection naïve 2 doses (Kumar et al., 2021). This study was able to address an important gap in available literature for calculating vaccine effectiveness with whole virion vaccine from serology-based surveys. Thus, our work fills the gap where we developed and validated a method to identify asymptomatic COVID-19 infected individuals and thus were able to calculate vaccine efficacy in one of the largest cohorts of India. Finally, we were able to provide the protection efficacy (PE) of 55% (95%CI 43%–64%) for fully vaccinated subjects. This was per the available literat...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.