Forecasting COVID-19 cases at the Amazon region: a comparison of classical and machine learning models

Dalton Garcia Borges de Souza
Francisco Tarcísio Alves Júnior
Nei Yoshihiro Soma


Abstract

BACKGROUND

Since the first reports of COVID-19, decision-makers have relied on traditional epidemiological models to forecast the course of the disease. However, growing computational power, the demand for adaptable predictive frameworks, the short history of the disease, and uncertainties in input data and prediction rules also make other classical and machine learning techniques viable options.

OBJECTIVE

This study investigates the efficiency of six models in forecasting confirmed COVID-19 cases 17 days ahead. We compare autoregressive integrated moving average (ARIMA), Holt-Winters, support vector regression (SVR), k-nearest neighbors regressor (KNN), random trees regressor (RTR), seasonal linear regression with change-points (Prophet), and simple logistic regression (SLR).
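As a rough illustration of what a 17-day-ahead forecast from one of the classical models looks like, the sketch below implements Holt's linear-trend exponential smoothing in plain Python. This is a simplified relative of Holt-Winters with no seasonal term, not the authors' implementation; the function name `holt_forecast` and the smoothing parameters `alpha` and `beta` are assumptions chosen for the example.

```python
def holt_forecast(series, horizon=17, alpha=0.5, beta=0.3):
    """Forecast `horizon` steps ahead with Holt's linear-trend smoothing.

    Simplified sketch: no seasonal component, and alpha/beta are
    illustrative values rather than fitted parameters.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Initialise the level with the first value and the trend with the first difference.
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Blend the new observation with the previous one-step-ahead forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Blend the latest change in level with the previous trend estimate.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Extrapolate the final level along the final trend.
    return [level + h * trend for h in range(1, horizon + 1)]


if __name__ == "__main__":
    # Made-up cumulative counts, for illustration only.
    cases = [10, 12, 14, 16, 18]
    print(holt_forecast(cases)[:3])
```

On a perfectly linear series the method continues the line, which makes it easy to sanity-check before applying it to real case counts.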

MATERIAL AND METHODS

We apply the models to data provided by the health surveillance secretary of Amapá, a Brazilian state located entirely within the Amazon rainforest, which has been experiencing high infection rates. We evaluate the models according to their capacity to forecast in different historical scenarios of the COVID-19 progression, such as exponential increases, sudden decreases, and stability periods of daily cases. To do so, we use a rolling forward splitting approach for out-of-sample validation. We use RMSE, R-squared, and sMAPE to evaluate the models across the cross-validation sections.
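The rolling forward splitting and the three metrics named above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the initial training length (`initial_train`), the step between forecast origins (`step`), the expanding-window design, and the sMAPE convention 2|a - p| / (|a| + |p|) are all assumptions, since the abstract does not specify them.

```python
import math


def rolling_origin_splits(n_obs, horizon=17, initial_train=60, step=17):
    """Yield (train_end, test_end) index pairs for an expanding window.

    Training data is series[0:train_end]; test data is series[train_end:test_end].
    initial_train and step are illustrative assumptions.
    """
    train_end = initial_train
    while train_end + horizon <= n_obs:
        yield train_end, train_end + horizon
        train_end += step


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination; NaN when the actual values are constant."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return float("nan") if ss_tot == 0 else 1.0 - ss_res / ss_tot


def smape(actual, predicted):
    """Symmetric MAPE in percent (0 to 200), assumed convention."""
    terms = []
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        # Treat a 0/0 term as zero error.
        terms.append(0.0 if denom == 0 else 2.0 * abs(a - p) / denom)
    return 100.0 * sum(terms) / len(terms)


if __name__ == "__main__":
    # Made-up series; the "model" is a naive last-value forecast.
    series = [float(i * i) for i in range(100)]
    for train_end, test_end in rolling_origin_splits(len(series)):
        actual = series[train_end:test_end]
        predicted = [series[train_end - 1]] * len(actual)
        print(train_end, rmse(actual, predicted), smape(actual, predicted))
```

Each split trains only on observations that precede the forecast window, so every score is out-of-sample; scoring several origins separately is what lets growth, decline, and stable periods be compared.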

FINDINGS

All models outperform SLR, especially Holt-Winters, which performs satisfactorily in all scenarios. SVR and ARIMA perform better in isolated scenarios. To implement the comparisons, we created a web application, which is available online.

CONCLUSION

This work is an effort to assist the decision-makers of Amapá in future decisions, especially under scenarios of sudden variations in the number of confirmed cases, which could be caused, for instance, by new contamination waves or vaccination. It is also an attempt to highlight alternative models that could be used in future epidemics.

ScreenIT
Oct 15, 2020
SciScore for 10.1101/2020.10.09.332908: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement not detected.
Randomization not detected.
Blinding not detected.
Power Analysis not detected.
Sex as a biological variable not detected.
Table 2: Resources
No key resources detected.
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
No conflict of interest statement was detected. If there are no conflicts, we encourage authors to explicitly state so.
No funding statement was detected.
No protocol registration statement was detected.
About SciScore
SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
Version published to 10.1101/2020.10.09.332908 on bioRxiv
Oct 9, 2020
