A multipurpose machine learning approach to predict COVID-19 negative prognosis in São Paulo, Brazil

Abstract

The new coronavirus disease (COVID-19) is a challenge for clinical decision-making and the effective allocation of healthcare resources. An accurate prognostic assessment is necessary to improve patient survival, especially in developing countries. This study proposes to predict the risk of developing critical conditions in COVID-19 patients by training multipurpose algorithms. We followed a total of 1040 patients with a positive RT-PCR diagnosis for COVID-19 at a large hospital in São Paulo, Brazil, from March to June 2020, of whom 288 (28%) presented a severe prognosis, i.e., Intensive Care Unit (ICU) admission, use of mechanical ventilation, or death. We used routinely collected laboratory, clinical, and demographic data to train five machine learning algorithms (artificial neural networks, extra trees, random forests, CatBoost, and extreme gradient boosting). We used a random sample of 70% of patients to train the algorithms, and the remaining 30% were held out for performance assessment, simulating new unseen data. To assess whether the algorithms could capture general severe prognostic patterns, each model was trained by combining two of the three outcomes to predict the third. All algorithms presented very high predictive performance (average AUROC of 0.92, sensitivity of 0.92, and specificity of 0.82). The three most important variables for the multipurpose algorithms were the lymphocyte-to-C-reactive-protein ratio, C-reactive protein, and the Braden Scale. The results highlight the possibility that machine learning algorithms are able to predict nonspecific negative COVID-19 outcomes from routinely collected data.
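The multipurpose design described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the outcome keys (`icu`, `mv`, `death`), the helper names, and the reading that the training label is "either of the two other outcomes" while the evaluation label is the held-out outcome are all assumptions made for illustration. It uses only the Python standard library.

```python
import random

# Hypothetical outcome keys; the source does not give column names.
OUTCOMES = ("icu", "mv", "death")


def multipurpose_labels(patient, target):
    """Return (train_label, eval_label) for one patient record.

    train_label is 1 if the patient had either of the two outcomes
    other than `target`; eval_label is 1 if the patient had `target`.
    """
    others = [o for o in OUTCOMES if o != target]
    train_label = int(any(patient[o] for o in others))
    eval_label = int(bool(patient[target]))
    return train_label, eval_label


def split_70_30(patients, seed=0):
    """Randomly assign 70% of patients to training and 30% to testing."""
    shuffled = list(patients)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) from binary labels.

    Assumes y_true contains at least one positive and one negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

Under this reading, a model evaluated on ICU admission would be fitted with `multipurpose_labels(p, "icu")[0]` as the label for each training patient and scored against `multipurpose_labels(p, "icu")[1]` for each test patient.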

SciScore for 10.1101/2020.08.26.20182584:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement	IRB: The study was approved by the Institutional Review Board (IRB) of BP - A Beneficência Portuguesa de São Paulo (CAAE:31177220.4.3001.5421), including a waiver of informed consent. Consent: The study was approved by the Institutional Review Board (IRB) of BP - A Beneficência Portuguesa de São Paulo (CAAE:31177220.4.3001.5421), including a waiver of informed consent.
Randomization	Due to the unbalanced nature of the outcomes, random undersampling was performed in the training set, by randomly selecting examples from the majority class for exclusion.
Blinding	not detected.
Power Analysis	not detected.
Sex as a biological variable	not detected.
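
The random undersampling mentioned in the Randomization row can be sketched as follows. This is an illustrative standard-library version, not the authors' implementation; the function name and the 1:1 target class ratio are assumptions, since the source does not state how many majority-class examples were removed.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Randomly drop majority-class examples until both classes are equal.

    Assumption: classes are balanced 1:1; the source does not state the ratio.
    Labels are expected to be 0 or 1.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority example and an equally sized random majority subset.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]
```

Undersampling is applied to the training set only, as the report states, so the test set keeps the original class distribution.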

Table 2: Resources

Software and Algorithms
Sentences	Resources
All the analyses were performed using the Python programming language with the scikit-learn library.	Python suggested: (IPython, RRID:SCR_001658) scikit-learn suggested: (scikit-learn, RRID:SCR_002577)

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

The study has a few limitations that need to be mentioned. First, some of the outcomes overlap, which may have helped the performance of the aggregated models, even though in the majority of cases the outcomes were independent. In the case of ICU admission, 55% of the patients did not die or use MV, while in the case of MV and death, 63% and 70% of their respective aggregated model was trained on other outcomes. Ideally, the outcomes would never overlap, but this is clinically unfeasible given the interlaced nature of negative prognostic outcomes. Another limitation is that we analyzed data from an urban COVID-19 hotspot in Brazil, in a period when clinical protocols for the disease were still being established, so this could affect the incidence of prognostic outcomes, and the results may not directly generalize to other periods.
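
The overlap described in the first limitation can be quantified with a small calculation. The sketch below shows one way to compute, for a given outcome, the share of affected patients who had neither of the other two outcomes (the kind of figure quoted as 55% for ICU admission). The function name, the outcome keys, and the record layout are hypothetical, and the toy data in the usage note is invented purely to show the call.

```python
def fraction_without_other_outcomes(patients, target,
                                    outcomes=("icu", "mv", "death")):
    """Share of patients with `target` who had none of the other outcomes.

    `patients` is a list of dicts mapping each outcome key to 0/1.
    The keys are hypothetical; the source does not give column names.
    """
    others = [o for o in outcomes if o != target]
    with_target = [p for p in patients if p[target]]
    if not with_target:
        raise ValueError(f"no patients with outcome {target!r}")
    isolated = sum(1 for p in with_target if not any(p[o] for o in others))
    return isolated / len(with_target)
```

For example, with four toy ICU patients of whom two had neither mechanical ventilation nor death, `fraction_without_other_outcomes(toy, "icu")` returns 0.5.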

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
