A machine learning explanation of the pathogen-immune relationship of SARS-CoV-2 (COVID-19), model to predict immunity, and therapeutic opportunity
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Importance
The clinical impacts of this study are that it: (1) identified three immunological factors that differentiate asymptomatic, or resistant, COVID-19 patients; (2) identified the levels of those factors that clinicians can use to predict who is likely to be asymptomatic or symptomatic; (3) identified a novel COVID-19 therapeutic for further testing; and (4) ordinally ranked 34 common immunological factors by their importance in predicting disease severity.
Objectives
The primary objectives of this study were to learn whether machine learning could identify patterns in the pathogen-host immune relationship that differentiate or predict COVID-19 symptom immunity and, if so, which ones and at what levels. The secondary objective was to learn whether machine learning could use such differentiators to build a model that predicts COVID-19 immunity with clinical accuracy. The tertiary objective was to learn about the relevance of other immune factors.
Design
This was a comparative effectiveness research study of 53 common immunological factors, using machine learning on clinical data from 74 similarly grouped Chinese COVID-19-positive patients, 37 of whom were symptomatic and 37 asymptomatic.
Setting
A single-center primary-care hospital in the Wanzhou District of China.
Participants
Immunological factors were measured in patients who were diagnosed as SARS-CoV-2 positive by reverse transcriptase-polymerase chain reaction (RT-PCR) in the 14 days before the observations were recorded. The median age of the 37 asymptomatic patients was 41 years (range 8-75 years); 22 were female and 15 were male. For comparison, 37 symptomatic RT-PCR-positive patients were selected and matched to the asymptomatic group by age, comorbidities, and sex.
Main Outcome
The primary study outcome was that asymptomatic COVID-19 patients could be identified by three distinct immunological factors and their levels: stem-cell growth factor-beta (SCGF-β) (> 127637), interleukin-16 (IL-16) (> 45), and macrophage colony-stimulating factor (M-CSF) (> 57). The secondary study outcome was the novel suggestion that stem-cell therapy with SCGF-β may be a valuable new therapeutic for COVID-19.
Results
When SCGF-β was included in the machine-learning analysis, decision-tree and extreme gradient boosting algorithms classified and predicted COVID-19 symptom immunity with 100% accuracy. When SCGF-β was excluded, a random-forest algorithm classified and predicted asymptomatic and symptomatic COVID-19 cases with an area under the ROC curve (AUC) of 94.8% (95% CI 90.17% to 100%). Thirty-four common immune factors had statistically significant (P < .05) associations with COVID-19 symptoms, and 19 immune factors appeared to have no statistically significant association.
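The ROC AUC quoted above can be read as the probability that a randomly chosen asymptomatic patient receives a higher model score than a randomly chosen symptomatic patient. A minimal sketch of that rank-based (Mann-Whitney) computation follows; the function name `roc_auc` and the scores are hypothetical and are not study data. The 95% CI would need an extra step, such as bootstrapping or DeLong's method, which this sketch omits.

```python
from itertools import product


def roc_auc(pos_scores, neg_scores):
    """Rank-based ROC AUC: the fraction of (positive, negative) pairs in which
    the positive case scores higher; tied pairs count as half a win."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Illustrative scores only (not study data); higher = more likely asymptomatic.
asymptomatic = [0.9, 0.8, 0.7, 0.4]
symptomatic = [0.6, 0.3, 0.2, 0.4]
print(roc_auc(asymptomatic, symptomatic))  # 0.90625
```

Counting pairs directly is quadratic in the group sizes, which is harmless at n = 37 per group and keeps the link to the probabilistic reading explicit.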
Conclusion
People with an SCGF-β level > 127637, or with an IL-16 level > 45 and an M-CSF level > 57, appear to be predictively immune to COVID-19, with 100% accuracy and a ROC AUC of 94.8%, respectively. Testing levels of these three immunological factors may be a valuable point-of-care tool for managing and preventing outbreaks. Further, stem-cell therapy via SCGF-β and/or M-CSF appears to be a promising novel therapeutic approach for COVID-19.
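The cutoffs in the Conclusion can be written as a single boolean rule. The sketch below is a simplification under stated assumptions: it collapses the two reported models (the SCGF-β-based decision-tree/extreme gradient boosting model and the SCGF-β-free random forest) into fixed thresholds, uses strict inequalities as written above, and treats the values as unitless because the source gives no units. `ImmuneProfile` and `predicts_asymptomatic` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass

# Cutoffs as quoted in the abstract; units are not stated in the source.
SCGF_BETA_CUTOFF = 127637
IL16_CUTOFF = 45
MCSF_CUTOFF = 57


@dataclass
class ImmuneProfile:
    """Hypothetical container for one patient's measured levels."""
    scgf_beta: float
    il16: float
    mcsf: float


def predicts_asymptomatic(p: ImmuneProfile) -> bool:
    """Simplified reading of the abstract: SCGF-beta above its cutoff, or both
    IL-16 and M-CSF above theirs. Not the authors' fitted models."""
    if p.scgf_beta > SCGF_BETA_CUTOFF:
        return True
    return p.il16 > IL16_CUTOFF and p.mcsf > MCSF_CUTOFF


# Made-up example values, not patient data.
print(predicts_asymptomatic(ImmuneProfile(scgf_beta=130000, il16=10, mcsf=10)))  # True
print(predicts_asymptomatic(ImmuneProfile(scgf_beta=90000, il16=50, mcsf=60)))   # True
print(predicts_asymptomatic(ImmuneProfile(scgf_beta=90000, il16=50, mcsf=40)))   # False
```

A fixed-threshold rule like this is easy to apply at the point of care, but it discards the interactions a random forest can capture, so its accuracy should not be assumed to match the reported figures.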
Article activity feed
SciScore for 10.1101/2020.07.27.20162867
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
Sentences: Minitab 19 (version 19.2020.1, Minitab LLC) was used to calculate means, 95% confidence intervals, P-values, and two-sample t-tests of statistical significance.
Resources: Minitab, suggested: (Minitab, RRID:SCR_014483)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
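The two-sample t-tests mentioned in the resources table compare a factor's mean level between the asymptomatic and symptomatic groups. The source does not say whether the equal-variance or the unequal-variance form was used; the sketch below assumes Welch's unequal-variance form and is not Minitab's implementation. It returns the t statistic and the Welch-Satterthwaite degrees of freedom; a p-value would then come from the t distribution, for example via SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`, which is left out to keep the block standard-library only. The name `welch_t` and the sample values are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of
    freedom; does not assume equal group variances."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observations")
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Illustrative values only, not study measurements.
group_a = [5.0, 6.0, 7.0]
group_b = [1.0, 2.0, 3.0]
t, df = welch_t(group_a, group_b)
print(round(t, 4), round(df, 4))  # 4.899 4.0
```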
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
This study has several limitations. First, it is unknown from the dataset how many days passed between exposure to the virus and immunological testing, or whether it was universally the same number of days. Second, because immune profiles are temporally sensitive, ideally, several tests would have been taken over several days, which did not occur [23]. Third, immunological signaling and processing are multifactorial and complex. Therefore, it is unclear why SCGF-β levels are categorically high in asymptomatic patients and low in symptomatic patients, or whether they are causal in the response to SARS-CoV-2. Fourth, combinatorial and sequential analysis of these immunological elements may be an important future research area to optimize therapeutic research outcomes. Fifth, at least one study in a leading journal, The Lancet, found that Chinese SARS-CoV-2 case data may have been misreported by as much as 400% [24]. That study, and much higher case and fatality numbers in over 200 countries, have created distrust and skepticism of SARS-CoV-2-related data originating in China. Future research could ameliorate these limitations and focus on a more extensive study group to attempt to reproduce the results. Moreover, a prospective case-control study could test whether SCGF-β supplementation in patients with decreased SCGF-β levels is protective against SARS-CoV-2 severity and symptoms.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
SciScore for 10.1101/2020.07.27.20162867
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
Sentences: Correlation coefficients were also computed using Minitab via Spearman's rho because the data were not normally distributed.
Resources: Minitab, suggested: (Minitab, RRID:SCR_014483)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
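Spearman's rho, which the resources table says was chosen because the data were non-normal, is the Pearson correlation computed on ranks rather than raw values, so it measures monotonic rather than linear association. The standard-library sketch below gives tied values the average of their ranks; the function names are hypothetical, and this is not Minitab's implementation. Python 3.12 and later also accept `statistics.correlation(x, y, method="ranked")` for the same quantity.

```python
import statistics


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined for constant input")
    return cov / (sx * sy)


# A monotonic but nonlinear relationship still gives rho = 1.
print(round(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]), 6))  # 1.0
```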
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
This study has several limitations. One, it is unknown from the dataset how many days passed between exposure to the virus and immunological testing, or whether it was universally the same number of days. Two, because immune profiles are temporally sensitive, ideally, several tests would have been taken over several days, which did not occur (Janford, 2020). Three, immunological signaling and processing are multifactorial and complex. Therefore, it is unclear why SCGF-β levels are categorically high in asymptomatic patients and low in symptomatic patients, or whether they are causal in the response to SARS-CoV-2. Four, combinatorial and sequential analysis of these immunological elements may be an important future research area to optimize therapeutic research outcomes. Five, at least one study in a leading journal, The Lancet, found that Chinese SARS-CoV-2 case data may have been misreported by as much as 400% (Tsang, 2020). That study, and much higher case and fatality numbers in over 200 countries, have created distrust and skepticism of SARS-CoV-2-related data originating in China. Future research could ameliorate these limitations and focus on a more extensive study group to attempt to reproduce the results. Moreover, a prospective case-control study could test whether SCGF-β supplementation in patients with decreased SCGF-β levels is protective against SARS-CoV-2 severity and symptoms.
Conclusion
One implication of these findings is that if we can predict the 80% of society who may be immune or resistant to SARS-CoV-2, or asymptomatic, it may profoundly affect public health intervention decisions about who needs to be protected and how much. If, for example, 80% of the shelter-in-place orders, and the resulting dramatic reduction in economic and social activity, could have been prevented by accurately predicting who is at low risk of infection, the economic benefits alone might have been worth trillions of US dollars.
The second implication of these findings is that elevated levels of SCGF-β, IL-16, and M-CSF, which may have a causal relationship with SARS-CoV-2 immunity or resistance, may have utility as diagnostic determinants to (a) inform public health policy decisions that prioritize and reduce shelter-in-place orders to minimize economic and social impacts; (b) advance therapeutic research; and (c) prioritize vaccine distribution so that those with the greatest needs and risks benefit first.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
About SciScore
SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore is not a substitute for expert review. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) in the manuscript, and detects sentences that appear to be missing RRIDs. SciScore also checks to make sure that rigor criteria are addressed by authors. It does this by detecting sentences that discuss criteria such as blinding or power analysis. SciScore does not guarantee that the rigor criteria that it detects are appropriate for the particular study. Instead it assists authors, editors, and reviewers by drawing attention to sections of the manuscript that contain or should contain various rigor criteria and key resources. For details on the results shown here, including references cited, please follow this link.
SciScore for 10.1101/2020.07.27.20162867
Table 1: Rigor
Institutional Review Board Statement: not detected.
Randomization: Ratt randomly partitioned the data to select and train on 80% (n=59), validate on 10% (7), and test on 10% (7) of observations.
Blinding: not detected.
Power Analysis: not detected.
Sex as a biological variable: The median age of the 37 asymptomatic patients was 41 years (range 8-75 years); 22 were female and 15 were male.
Table 2: Resources
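The randomization sentence above describes a three-way hold-out split into training, validation, and test sets. A minimal plain-Python sketch of such a split follows; it is an illustration under assumptions, not the tool the authors used, and `partition` and its seed are hypothetical. Because it hands any rounding remainder to the test split, 74 observations become 59/7/8 rather than the 59/7/7 quoted in the report.

```python
import random


def partition(items, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle a copy of `items` and split it into train/validation/test.
    Any rounding remainder goes to the test split."""
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(train_frac * len(shuffled))
    n_val = round(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = partition(range(74))
print(len(train), len(val), len(test))  # 59 7 8
```

With only seven or eight observations in the held-out sets, a single misclassification moves accuracy by well over ten percentage points, which is worth keeping in mind when reading the reported accuracy figures.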
Software and Algorithms
Sentences: Correlation coefficients were also computed using Minitab via Spearman's rho because the data were not normally distributed.
Resources: Minitab, suggested: (Minitab, SCR_014483)
Data from additional tools added to each annotation on a weekly basis.