A machine learning explanation of the pathogen-immune relationship of SARS-CoV-2 (COVID-19), model to predict immunity, and therapeutic opportunity

Abstract

Importance

The clinical impacts of this study are that it: (1) identified three immunological factors that differentiate asymptomatic, or resistant, COVID-19 patients; (2) identified the levels of those factors that clinicians can use to predict who is likely to be asymptomatic or symptomatic; (3) identified a novel COVID-19 therapeutic for further testing; and (4) ordinally ranked 34 common immunological factors by their importance in predicting disease severity.

Objectives

The primary objectives of this study were to learn if machine learning could identify patterns in the pathogen-host immune relationship that differentiate or predict COVID-19 symptom immunity and, if so, which ones and at what levels. The secondary objective was to learn if machine learning could take such differentiators to build a model that could predict COVID-19 immunity with clinical accuracy. The tertiary objective was to learn about the relevance of other immune factors.

Design

This was a comparative effectiveness research study on 53 common immunological factors using machine learning on clinical data from 74 similarly-grouped Chinese COVID-19-positive patients, 37 of whom were symptomatic and 37 asymptomatic.

Setting

A single-center primary-care hospital in the Wanzhou District of China.

Participants

Immunological factors were measured in patients who were diagnosed as SARS-CoV-2 positive by reverse transcriptase-polymerase chain reaction (RT-PCR) in the 14 days before the observations were recorded. The median age of the 37 asymptomatic patients was 41 years (range 8-75 years); 22 were female and 15 were male. For comparison, 37 symptomatic RT-PCR-positive patients were selected and matched to the asymptomatic group by age, comorbidities, and sex.

Main Outcome

The primary study outcome was that asymptomatic COVID-19 patients could be identified by three distinct immunological factors and their levels: stem-cell growth factor-beta (SCGF-β) (> 127637), interleukin-16 (IL-16) (> 45), and macrophage colony-stimulating factor (M-CSF) (> 57). The secondary study outcome was the novel suggestion that stem-cell therapy with SCGF-β may be a valuable new therapeutic for COVID-19.

Results

When SCGF-β was included in the machine-learning analysis, decision-tree and extreme gradient boosting algorithms classified and predicted COVID-19 symptom immunity with 100% accuracy. When SCGF-β was excluded, a random-forest algorithm classified and predicted asymptomatic and symptomatic COVID-19 cases with an area under the ROC curve (AUC) of 94.8% (95% CI 90.17% to 100%). Thirty-four (34) common immune factors have statistically significant (P-value < .05) associations with COVID-19 symptoms, and 19 immune factors appear to have no statistically significant association.
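As a rough illustration of what an area under the ROC curve measures, the sketch below computes AUC as the probability that a randomly chosen asymptomatic case receives a higher model score than a randomly chosen symptomatic case, with ties counted as half. The function name `roc_auc` and the toy scores are hypothetical; they are not the study's code or data.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC by pairwise comparison of positive-class and negative-class scores.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up model scores, for illustration only (not study data).
asymptomatic_scores = [0.9, 0.4]
symptomatic_scores = [0.5, 0.1]
print(roc_auc(asymptomatic_scores, symptomatic_scores))  # 0.75
```

Three of the four asymptomatic/symptomatic pairs are ordered correctly in this toy example, which gives 0.75. An AUC of 1.0 would mean every pair is ordered correctly.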

Conclusion

People with an SCGF-β level > 127637, or with an IL-16 level > 45 and an M-CSF level > 57, appear to be immune to COVID-19 symptoms; models using these factors predicted asymptomatic status with 100% accuracy and 94.8% ROC AUC, respectively. Testing levels of these three immunological factors may be a valuable tool at the point of care for managing and preventing outbreaks. Further, stem-cell therapy via SCGF-β and/or M-CSF appears to be a promising novel therapeutic for COVID-19.
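The threshold rule stated in the conclusion can be written as a single boolean expression. The following is a minimal sketch under stated assumptions: the function and constant names are hypothetical, the abstract does not give measurement units, and this is not the authors' trained model, only the reported cutoffs.

```python
# Cutoffs as reported in the abstract (units are not stated in the source).
SCGF_BETA_CUTOFF = 127637
IL_16_CUTOFF = 45
M_CSF_CUTOFF = 57


def predicted_asymptomatic(scgf_beta, il_16, m_csf):
    """Return True when the reported threshold rule predicts an asymptomatic case.

    Rule: SCGF-beta above its cutoff, OR both IL-16 and M-CSF above theirs.
    """
    return scgf_beta > SCGF_BETA_CUTOFF or (
        il_16 > IL_16_CUTOFF and m_csf > M_CSF_CUTOFF
    )


# Hypothetical measurements, for illustration only.
print(predicted_asymptomatic(130000, 10, 10))  # True: SCGF-beta alone suffices
print(predicted_asymptomatic(1000, 50, 60))    # True: IL-16 and M-CSF both high
print(predicted_asymptomatic(1000, 50, 40))    # False: M-CSF below its cutoff
```

The comparisons are strict, matching the ">" signs in the text, so a value exactly at a cutoff does not satisfy the rule.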

Article activity feed

  1. SciScore for 10.1101/2020.07.27.20162867

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: Minitab 19 (version 19.2020.1, Minitab LLC) was used to calculate means, 95% confidence intervals, P-values, and two-sample T-tests of statistical significance.
    Resource: Minitab; suggested: (Minitab, RRID:SCR_014483)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    This study has several limitations. First, it is unknown from the dataset how many days passed between exposure to the virus and immunological testing, or whether it was universally the same number of days. Second, because immune profiles are temporally sensitive, ideally, several tests would have been taken over several days, which did not occur [23]. Third, immunological signaling and processing are multifactorial and complex. Therefore, it is unclear why SCGF-β levels are categorically high in asymptomatic patients and low in symptomatic patients, or whether they are causal to SARS-CoV-2 response. Fourth, combinatorial and sequential analysis of these immunological elements may be an important future research area to optimize therapeutic research outcomes. Fifth, at least one study in a leading journal, Lancet, found that Chinese SARS-CoV-2 case data may have been misreported by as much as 400% [24]. That study, and much higher case and fatality numbers in over 200 countries, have created distrust and skepticism of SARS-CoV-2-related data originating in China. Future research could ameliorate these limitations and focus on a more extensive study group to attempt to reproduce the results. Moreover, a prospective case-control study of patients with decreased SCGF-β levels and supplementation was protective against SARS-CoV-2 severity and symptoms.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.

  2. SciScore for 10.1101/2020.07.27.20162867

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: Correlation coefficients were also computed using Minitab via Spearman rho because the data was distributed nonparametrically.
    Resource: Minitab; suggested: (Minitab, RRID:SCR_014483)
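    For readers unfamiliar with Spearman rho, the sketch below shows the idea in plain Python: convert each variable to ranks (tied values share the mean rank), then take the Pearson correlation of the ranks. The study used Minitab; the function names and toy values here are hypothetical and only illustrate the statistic.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy values: a monotone but nonlinear relationship still gives rho = 1.
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0
```

    Because only the ordering matters, the statistic does not assume normally distributed data, which is the reason the quoted sentence gives for choosing it.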

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

    This study has several limitations. One, it is unknown from the dataset how many days passed between exposure to the virus and immunological testing, or whether it was universally the same number of days. Two, because immune profiles are temporally sensitive, ideally, several tests would have been taken over several days, which did not occur (Janford, 2020). Three, immunological signaling and processing are multifactorial and complex. Therefore, it is unclear why SCGF-β levels are categorically high in asymptomatic patients and low in symptomatic patients, or whether they are causal to SARS-CoV-2 response. Four, combinatorial and sequential analysis of these immunological elements may be an important future research area to optimize therapeutic research outcomes. Five, at least one study in a leading journal, Lancet, found that Chinese SARS-CoV-2 case data may have been misreported by as much as 400% (Tsang, 2020). That study, and much higher case and fatality numbers in over 200 countries, have created distrust and skepticism of SARS-CoV-2-related data originating in China. Future research could ameliorate these limitations and focus on a more extensive study group to attempt to reproduce the results. Moreover, a prospective case-control study of patients with decreased SCGF-β levels and supplementation was protective against SARS-CoV-2 severity and symptoms.

    Conclusion

    One implication of these findings is that if we can predict the 80% of society who may be immune or resistant to SARS-CoV-2, or asymptomatic, it may profoundly impact public health intervention decisions as to who needs to be protected and how much. If, for example, 80% of the shelter-in-place orders and the resultant dramatic reduction in economic and social activity could have been prevented by accurately predicting who is at low risk of infection, the economic benefits alone may have been valued in US$ trillions.

    The second implication of these findings is evidence that elevated levels of SCGF-β, IL-16, and M-CSF may have a causal relationship with SARS-CoV-2 immunity or resistance and may have utility as diagnostic determinants to (a) inform public health policy decisions to prioritize and reduce shelter-in-place orders to minimize economic and social impacts; (b) advance therapeutic research; and (c) prioritize vaccine distribution to benefit those with the greatest need and risks first.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore is not a substitute for expert review. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) in the manuscript, and detects sentences that appear to be missing RRIDs. SciScore also checks to make sure that rigor criteria are addressed by authors. It does this by detecting sentences that discuss criteria such as blinding or power analysis. SciScore does not guarantee that the rigor criteria that it detects are appropriate for the particular study. Instead it assists authors, editors, and reviewers by drawing attention to sections of the manuscript that contain or should contain various rigor criteria and key resources. For details on the results shown here, including references cited, please follow this link.

  3. SciScore for 10.1101/2020.07.27.20162867

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: Ratt randomly partitioned the data to select and train on 80% (n=59), validate on 10% (7), and test on 10% (7) of observations.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: The median age of the 37 asymptomatic patients was 41 years (range 8-75 years), 22 were female, 15 were male.
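    A random 80/10/10 partition of the kind described in the Randomization entry can be sketched as follows. This is an illustration only, not the tool used in the study: the function name, the seed, and the choice to assign any rounding remainder to the test set are all assumptions. The source reports counts of 59, 7, and 7 and does not say how rounding was handled.

```python
import random


def partition_indices(n, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle indices 0..n-1 and split them into train/validation/test lists.

    Assumption: any remainder left after rounding down goes to the test set.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# 74 observations, matching the number of patients in the abstract.
train, val, test = partition_indices(74)
print(len(train), len(val), len(test))  # 59 7 8
```

    With 74 observations this sketch yields 59/7/8 because the leftover observation lands in the test set; the three subsets are disjoint and together cover every index.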

    Table 2: Resources

    Software and Algorithms
    Sentence: Correlation coefficients were also computed using Minitab via Spearman rho because the data was distributed nonparametrically.
    Resource: Minitab; suggested: (Minitab, RRID:SCR_014483)

    Data from additional tools added to each annotation on a weekly basis.
