A multipurpose machine learning approach to predict COVID-19 negative prognosis in São Paulo, Brazil
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
The new coronavirus disease (COVID-19) is a challenge for clinical decision-making and the effective allocation of healthcare resources. An accurate prognostic assessment is necessary to improve survival of patients, especially in developing countries. This study proposes to predict the risk of developing critical conditions in COVID-19 patients by training multipurpose algorithms. We followed a total of 1040 patients with a positive RT-PCR diagnosis for COVID-19 at a large hospital in São Paulo, Brazil, from March to June 2020, of whom 288 (28%) presented a severe prognosis, i.e., Intensive Care Unit (ICU) admission, use of mechanical ventilation, or death. We used routinely collected laboratory, clinical, and demographic data to train five machine learning algorithms (artificial neural networks, extra trees, random forests, CatBoost, and extreme gradient boosting). We used a random sample of 70% of patients to train the algorithms, and the remaining 30% were held out for performance assessment, simulating new unseen data. To assess whether the algorithms could capture general patterns of severe prognosis, each model was trained on a combination of two of the three outcomes and used to predict the third. All algorithms presented very high predictive performance (average AUROC of 0.92, sensitivity of 0.92, and specificity of 0.82). The three most important variables for the multipurpose algorithms were the lymphocyte-to-C-reactive protein ratio, C-reactive protein, and the Braden Scale. The results highlight the possibility that machine learning algorithms can predict nonspecific negative COVID-19 outcomes from routinely collected data.
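The evaluation design in the abstract (train on two outcomes, test on the third, with a 70/30 random split) can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the `evaluate_held_out_outcome` helper is hypothetical, a random forest stands in for all five algorithms, and combining two outcomes with a logical OR is an assumption, since the abstract does not say how the outcomes were combined.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's data): 1040 rows, 5 features,
# and a latent "severity" score that drives all three outcomes.
n = 1040
X = rng.normal(size=(n, 5))
severity = X[:, 0] + 0.5 * X[:, 1]
outcomes = {
    name: (severity + rng.normal(scale=0.5, size=n) > 1.0).astype(int)
    for name in ("icu", "mv", "death")
}


def evaluate_held_out_outcome(target):
    """Train on the union of the other two outcomes, then test on `target`."""
    others = [name for name in outcomes if name != target]
    # Assumption: "combining" two outcomes means a logical OR of their labels.
    y_combined = np.maximum(outcomes[others[0]], outcomes[others[1]])
    y_target = outcomes[target]

    # 70% of patients for training, 30% held out as unseen data.
    idx_train, idx_test = train_test_split(
        np.arange(n), test_size=0.30, random_state=0
    )
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X[idx_train], y_combined[idx_train])

    # Score the held-out patients against the outcome the model never saw.
    prob = model.predict_proba(X[idx_test])[:, 1]
    pred = (prob >= 0.5).astype(int)
    tn, fp, fn, tp = confusion_matrix(
        y_target[idx_test], pred, labels=[0, 1]
    ).ravel()
    return {
        "auroc": roc_auc_score(y_target[idx_test], prob),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


results = {name: evaluate_held_out_outcome(name) for name in outcomes}
for name, metrics in results.items():
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

The numbers printed here reflect only the synthetic data and say nothing about the performance reported in the abstract.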
Article activity feed
SciScore for 10.1101/2020.08.26.20182584:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement
- IRB: The study was approved by the Institutional Review Board (IRB) of BP - A Beneficência Portuguesa de São Paulo (CAAE:31177220.4.3001.5421), including a waiver of informed consent.
- Consent: The study was approved by the Institutional Review Board (IRB) of BP - A Beneficência Portuguesa de São Paulo (CAAE:31177220.4.3001.5421), including a waiver of informed consent.
Randomization: Due to the unbalanced nature of the outcomes, random undersampling was performed in the training set, by randomly selecting examples from the majority class for exclusion.
Blinding: not detected.
Power Analysis: not detected.
Sex as a biological variable: not detected.
Table 2: Resources
Software and Algorithms
Sentence: All the analyses were performed using the Python programming language with the scikit-learn library.
Resources:
- Python, suggested: (IPython, RRID:SCR_001658)
- scikit-learn, suggested: (scikit-learn, RRID:SCR_002577)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
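Since no code was shared, the random undersampling mentioned under Randomization can only be illustrated here, not reproduced. The sketch below is an assumption-laden example: the `random_undersample` helper is hypothetical, it balances the classes to a 1:1 ratio (the report does not state the target ratio), and it uses plain NumPy rather than whatever tooling the authors used.

```python
import numpy as np


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until every class has equal size.

    Hypothetical helper: the 1:1 target ratio is an assumption.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()  # size of the smallest (minority) class
    # Keep n_min randomly chosen rows from each class, without replacement.
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)  # avoid leaving the rows grouped by class
    return X[keep], y[keep]


# Toy training set: 7 majority-class rows and 3 minority-class rows.
X_train = np.arange(20).reshape(10, 2)
y_train = np.array([0] * 7 + [1] * 3)
X_bal, y_bal = random_undersample(X_train, y_train)
print(np.bincount(y_bal))  # → [3 3]
```

As in the report, the resampling would be applied to the training set only, so the held-out set keeps its original class balance.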
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study: The study has a few limitations that need to be mentioned. First, some of the outcomes overlap, which may have helped the performance of the aggregated models, even though in the majority of cases the outcomes were independent. In the case of ICU admission, 55% of the patients did not die or use MV, while in the case of MV and death, 63% and 70% of their respective aggregated model was trained on other outcomes. Ideally, the outcomes would never overlap, but this is clinically unfeasible given the interlaced nature of negative prognostic outcomes. Another limitation is that we analyzed data from an urban COVID-19 hotspot in Brazil, in a period where clinical protocols for the disease were still being established, so this could affect the incidence of prognostic outcomes and may not directly generalize to other periods.
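The overlap concern raised in these sentences can be quantified with a simple count: for each outcome, the share of affected patients who had neither of the other two outcomes. The sketch below uses made-up toy labels and a hypothetical `isolated_share` helper. It does not reproduce the 55%, 63%, or 70% figures, whose exact definitions are not given here.

```python
import numpy as np

# Toy labels for 8 hypothetical patients (NOT the study's data).
labels = {
    "icu":   np.array([1, 1, 1, 1, 0, 0, 0, 0]),
    "mv":    np.array([1, 0, 0, 0, 1, 0, 0, 0]),
    "death": np.array([0, 1, 0, 0, 1, 1, 0, 0]),
}


def isolated_share(target):
    """Share of patients with `target` who had neither of the other outcomes."""
    has_target = labels[target] == 1
    has_other = np.zeros_like(has_target)
    for name, values in labels.items():
        if name != target:
            has_other |= values == 1
    return float((has_target & ~has_other).sum() / has_target.sum())


isolated = {name: isolated_share(name) for name in labels}
print(isolated)
```

A share close to 1 means the outcome rarely co-occurs with the others, so a model trained on the other two outcomes has seen few of those patients labeled as positive.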
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.