Effectiveness, Explainability and Reliability of Machine Meta-Learning Methods for Predicting Mortality in Patients with COVID-19: Results of the Brazilian COVID-19 Registry
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Objective
To provide a thorough comparative study of state-of-the-art machine learning and statistical methods for predicting in-hospital mortality in COVID-19 patients from data available at hospital admission; to study the reliability of the predictions of the most effective methods by correlating the predicted probability of the outcome with the accuracy of the methods; and to investigate how explainable the predictions produced by the most effective methods are.
Materials and Methods
De-identified data were obtained from COVID-19-positive patients in 36 participating hospitals, from March 1 to September 30, 2020. Demographic, comorbidity, clinical presentation and laboratory data were used as training data to develop COVID-19 mortality prediction models. Multiple machine learning and traditional statistical models were trained on this prediction task using a k-fold cross-validation procedure, from which we assessed performance and interpretability metrics.
Results
The Stacking of machine learning models improved on the previous state-of-the-art results by more than 26% in predicting the class of interest (death), achieving an AUROC of 87.1% and a macro F1 score of 73.9%. We also show that some machine learning models can be highly interpretable and reliable, yielding more accurate predictions while providing a good explanation of why they were made.
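As a rough illustration of the kind of meta-learning ensemble and metrics the results above refer to, the sketch below trains a stacking classifier and evaluates it with AUROC and macro F1. The synthetic dataset, base learners and hyperparameters are illustrative assumptions, not the registry data or the authors' actual configuration.

```python
# Sketch of a stacking ensemble evaluated with AUROC and macro F1.
# All data and model choices here are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for tabular admission data
# (demographics, comorbidities, labs); death is the minority class.
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.8, 0.2], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Level-0 learners produce out-of-fold predictions that a logistic
# meta-learner combines (the "Stacking" idea).
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # internal k-fold used to build the meta-features
)
stack.fit(X_tr, y_tr)

proba = stack.predict_proba(X_te)[:, 1]
auroc = roc_auc_score(y_te, proba)
macro_f1 = f1_score(y_te, stack.predict(X_te), average="macro")
print(f"AUROC={auroc:.3f}  macro-F1={macro_f1:.3f}")
```

The design point worth noting is that the meta-learner sees only out-of-fold base-model predictions, which is what keeps the stacked model from simply memorizing its strongest base learner.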
Conclusion
The best results were obtained using the meta-learning ensemble model, Stacking. State-of-the-art explainability techniques such as SHAP values can be used to draw useful insights into the patterns learned by machine learning algorithms. Machine learning models can be more explainable than traditional statistical models while also yielding highly reliable predictions.
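To make the SHAP idea concrete, the sketch below computes exact SHAP values for a linear model in closed form (for a linear model with independent features, the SHAP value of feature i is w_i·(x_i − E[x_i]) on the log-odds scale). This is a simplified stand-in: the dataset and model are assumptions, and explaining tree ensembles as in the paper would instead use the `shap` library's tree explainer.

```python
# Sketch: exact SHAP values for a linear model, computed in closed form.
# Illustrates the additive property SHAP explanations rely on; the data
# and model here are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

w = model.coef_.ravel()
mu = X.mean(axis=0)   # background expectation E[x]
x = X[0]              # one patient/row to explain
phi = w * (x - mu)    # per-feature SHAP values (log-odds units)

# Additivity check: the contributions sum exactly to f(x) - f(E[x]).
fx = x @ w + model.intercept_[0]
f_base = mu @ w + model.intercept_[0]
assert np.isclose(phi.sum(), fx - f_base)

# Mean |SHAP| over all rows ranks global feature importance,
# which is how "age is the most important feature" claims are made.
global_importance = np.abs(w * (X - mu)).mean(axis=0)
print(global_importance)
```

The additivity check is the key property: every prediction decomposes into a baseline plus per-feature contributions, which is what makes SHAP-based explanations auditable case by case.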
Article activity feed
-
SciScore for 10.1101/2021.11.01.21265527: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
- Sentence: "A prespecified case report form was used, applying Research Electronic Data Capture (REDCap) tools (15)."
- Resource: REDCap (suggested: RRID:SCR_003445)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study: This is an intrinsic limitation of regression models, and the variable may be seen as non-significant due to the fact that it is a non-linear association. As previously mentioned, an important limitation of regression models is collinearity. When exploiting LASSO regression in our previous work (4), we had to exclude some features which had shown to be important in the boosting model due to high collinearity. This may explain the difference in the features included in both models, despite the fact that all features included in both had previous evidence of association with COVID-19 prognosis. Another interesting remark is shown in Fig 4, in which we can see the relative importance of each feature. Here, again, age is the most important single feature (due to higher mean SHAP value), which is in line with previous studies (3,31,32). In an American study in intensive care units, age has shown higher discriminatory capacity when used in isolation (AUC 0.66) than the Sequential Organ Failure Assessment (SOFA) score (0.55) for mortality prediction, in a cohort study of adult patients from 18 ICUs in the US, with COVID-19 pneumonia. This score is widely used at emergency departments and ICUs worldwide to determine the extent of a person's organ function or rate of failure (42). In the present study, the remaining features, when combined, yield higher predictive value in this task than just age. Reliability: Finally, we investigate issues related to the reliability of the models. Ne...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.