Forecasting Covid-19 Dynamics in Brazil: A Data Driven Approach

Igor Gadelha Pereira
Joris Michel Guerin
Andouglas Gonçalves Silva Júnior
Gabriel Santos Garcia
Prisco Piscitelli
Alessandro Miani
Cosimo Distante
Luiz Marcos Garcia Gonçalves


Listed in

Evaluated articles (ScreenIT)

Abstract

The contribution of this paper is twofold. First, a new data-driven approach for predicting the dynamics of the Covid-19 pandemic is introduced. Second, the results obtained with this approach for the Brazilian states are reported and discussed, with predictions starting on 4 May 2020. As a preliminary study, we first used a Long Short Term Memory for Data Training-SAE (LSTM-SAE) network model. Although this first approach led to somewhat disappointing results, it served as a good baseline for testing other artificial neural network (ANN) types. Subsequently, in order to identify relevant countries and regions to be used for training ANN models, we cluster the world’s regions where the pandemic is at an advanced stage. This clustering is based on manually engineered features representing a country’s response to the early spread of the pandemic, and the resulting clusters are used to select the relevant countries for training the models. The final models retained are Modified Auto-Encoder (MAE) networks, which are trained on these clusters and learn to predict future data for Brazilian states. These predictions are used to estimate important statistics about the disease, such as peaks and numbers of confirmed cases. Finally, curve fitting is carried out to find the distribution that best fits the outputs of the MAE and to refine the estimates of the peaks of the pandemic. The predicted numbers reach a total of more than one million infected Brazilians, distributed among the different states, with São Paulo leading with about 150 thousand predicted confirmed cases. The results indicate that the pandemic is still growing in Brazil, with most states’ peaks of infection estimated in the second half of May 2020. The estimated end of the pandemic (97% of cases reaching an outcome) is spread between June and the end of August 2020, depending on the state.
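To make the two summary statistics mentioned in the abstract concrete, here is a minimal Python sketch of how a peak date and an "end" date (97% of cases reaching an outcome) could be read off a predicted series. It is not the authors' code: the function names `peak_day` and `end_day`, the toy numbers, and the assumption that daily counts of new cases and of resolved cases are available as plain lists are all hypothetical.

```python
from datetime import date, timedelta
from itertools import accumulate


def peak_day(daily_new_cases):
    """Return the index of the day with the most new cases (first one on ties)."""
    return max(range(len(daily_new_cases)), key=daily_new_cases.__getitem__)


def end_day(daily_outcomes, total_cases, share=0.97):
    """Return the first index at which cumulative outcomes reach
    `share` of `total_cases`, or None if that never happens."""
    for i, cumulative in enumerate(accumulate(daily_outcomes)):
        if cumulative >= share * total_cases:
            return i
    return None


# Toy series (hypothetical numbers, not taken from the paper).
start = date(2020, 5, 4)                       # first predicted day
daily_new = [10, 20, 40, 30, 15, 5, 0, 0]      # predicted new cases per day
daily_out = [0, 0, 10, 20, 40, 30, 15, 5]      # cases reaching an outcome per day
total = sum(daily_new)

peak = start + timedelta(days=peak_day(daily_new))
end_idx = end_day(daily_out, total)
end = start + timedelta(days=end_idx) if end_idx is not None else None
print("peak:", peak, "end:", end)
```

The 0.97 threshold mirrors the definition quoted in the abstract; everything else, including treating the end date as a simple cumulative-share crossing, is an illustrative simplification.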

Version published to 10.3390/ijerph17145115
Jul 15, 2020

SciScore for 10.1101/2020.05.11.20098392:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
To solve this issue, we use the scikit-learn [25] implementation of Affinity Propagation [26] with a damping factor of 0.8, applied to the UMAP embedded space.	scikit-learn suggested: (scikit-learn, RRID:SCR_002577)
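The clustering step quoted in the table above can be sketched as follows. This is only an illustration under stated assumptions: the synthetic 2-D points stand in for the UMAP-embedded features (the real features and UMAP settings are not given here), and `random_state=0` is added purely for reproducibility. Only the damping factor of 0.8 comes from the quoted sentence.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical stand-in for the UMAP-embedded space: three synthetic 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
embedding = np.vstack(
    [center + rng.normal(scale=0.5, size=(20, 2)) for center in centers]
)

# Affinity Propagation with the damping factor quoted in the paper (0.8).
# The number of clusters is not specified in advance; the algorithm selects
# exemplar points and assigns every other point to one of them.
clusterer = AffinityPropagation(damping=0.8, random_state=0)
labels = clusterer.fit_predict(embedding)

n_clusters = len(set(labels))
print("clusters found:", n_clusters)
```

Because Affinity Propagation infers the number of clusters itself, it suits a setting where the number of groups of countries is not known beforehand; a higher damping value such as 0.8 mainly helps the message-passing updates avoid oscillation.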

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.


Version published to 10.1101/2020.05.11.20098392 on medRxiv
May 18, 2020

Related articles

Is Excess Mortality Returning to Pre-Pandemic Levels? A Multi-Model Stochastic Approach for COVID-19: The Spanish Case

This article has 2 authors:
1. Julio Ibáñez-Soriano
2. Francisco G. Morillas-Jurado
This article has no evaluations. Latest version: Jan 27, 2026

Machine Learning Analysis of COVID19 Transmission Dynamics Demographic Risk and Contact Tracing Outcomes in Nigeria

This article has 7 authors:
1. Bolanle Adefowoke Ojokoh
2. Oluwafemi A. Sarumi
3. Sadura Priscilla Akinrinwa
4. Abimbola H. Afolayan
5. Tobore V. Igbe
6. Abiola Ezekiel Taiwo
7. Uchechukwu M. Chukwuocha
This article has no evaluations. Latest version: Dec 12, 2025

Comparative Study of Arima, Lstm and Prophet Models for Time Series Forecasting: A Comprehensive Review

This article has 1 author:
1. Hiteash Mahajan
This article has no evaluations. Latest version: Jan 27, 2026
