Machine Learning Model for Predicting Number of COVID19 Cases in Countries with Low Number of Tests

Samy Hashim
Sally Farooq
Eleni Syriopoulos
Kai de la Lande Cremer
Alexander Vogt
Nol de Jong
Victor L. Aguado
Mihai Popescu
Ashraf K. Mohamed
Muhamed Amin

This article has been reviewed by the following group: ScreenIT

Listed in: Evaluated articles (ScreenIT)

Abstract

The COVID-19 pandemic has presented a series of new challenges to governments and health care systems. Testing is one important method for monitoring, and therefore controlling, the spread of COVID-19. Yet, given the serious discrepancy in available resources between rich and poor countries, not every country is able to employ widespread testing. Here we developed machine learning models, based on multiple linear regression and neural networks, for predicting the number of COVID-19 cases in a country. The models are trained on data from US states and tested against the reported infections in European countries. The model is based on four features: number of tests, population, percentage of urban population, and Gini index. Population and number of tests have the strongest correlation with the number of infections. On the European test data, the coefficient of determination (R²) between actual and predicted cases was 0.88 for the multiple linear regression model and 0.91 for the neural network model. The model predicts that the actual number of infections in countries where the number of tests is less than 10% of the population is at least 26 times greater than the reported number.
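The multiple linear regression step described in the abstract can be sketched as follows. The authors report building their model with scikit-learn; to stay dependency-free, this hypothetical sketch instead fits ordinary least squares directly by solving the normal equations (XᵀX)w = Xᵀy and scores held-out predictions with the coefficient of determination, R² = 1 − SS_res/SS_tot. The feature order, units, coefficients, and every data row below are invented placeholders, not the study's data, and the neural network model is not reproduced.

```python
# Minimal sketch (assumption): multiple linear regression on the four features
# named in the abstract, fit by ordinary least squares via the normal equations
# and scored with the coefficient of determination R^2. All data are
# hypothetical, synthetic stand-ins, not the study's data.
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, w1, ..., wk] that minimise the sum of squared errors."""
    rows = [[1.0] + list(r) for r in X]  # prepend a column of ones for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X^T X) w = X^T y


def predict(coef: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Apply the fitted intercept and weights to each feature row."""
    return [coef[0] + sum(w * v for w, v in zip(coef[1:], r)) for r in X]


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical feature rows: [tests (millions), population (millions),
# urban population (%), Gini index]. Training rows stand in for US states,
# held-out rows stand in for European countries.
train_X = [
    [1.0, 5.0, 70.0, 0.45],
    [2.5, 12.0, 85.0, 0.48],
    [0.8, 3.0, 60.0, 0.41],
    [4.0, 20.0, 90.0, 0.50],
    [1.7, 8.0, 75.0, 0.43],
    [3.2, 15.0, 65.0, 0.47],
]
test_X = [[0.5, 10.0, 80.0, 0.30], [6.0, 60.0, 72.0, 0.35]]


def synthetic_cases(row: Sequence[float]) -> float:
    # Hypothetical linear ground truth used only to generate toy targets.
    return 2.0 + 0.5 * row[0] + 0.1 * row[1] + 0.02 * row[2] + 3.0 * row[3]


train_y = [synthetic_cases(r) for r in train_X]
test_y = [synthetic_cases(r) for r in test_X]

coef = fit_ols(train_X, train_y)
r2 = r_squared(test_y, predict(coef, test_X))
print([round(c, 4) for c in coef], round(r2, 4))
```

Because the toy targets are generated by an exactly linear rule, the fit recovers the generating coefficients and R² on the held-out rows is essentially 1; real case counts would of course be noisier. With scikit-learn, the equivalent calls are `LinearRegression().fit(X, y)` and `sklearn.metrics.r2_score(y_true, y_pred)`. Solving the normal equations is adequate for four features, but it is numerically less robust than least-squares solvers that avoid forming XᵀX explicitly.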

ScreenIT
Jul 16, 2021
SciScore for 10.1101/2021.07.12.21260298
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
Sentences	Resources
The multiple linear regression model was built using Scikit-learn library [16].	Scikit-learn suggested: (scikit-learn, RRID:SCR_002577)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.
About SciScore
SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.
Version published to 10.1101/2021.07.12.21260298 on medRxiv
Jul 14, 2021

Related articles

Machine Learning Analysis of COVID19 Transmission Dynamics Demographic Risk and Contact Tracing Outcomes in Nigeria

This article has 7 authors:
1. Bolanle Adefowoke Ojokoh
2. Oluwafemi A. Sarumi
3. Sadura Priscilla Akinrinwa
4. Abimbola H. Afolayan
5. Tobore V. Igbe
6. Abiola Ezekiel Taiwo
7. Uchechukwu M. Chukwuocha
This article has no evaluations. Latest version: Dec 12, 2025

Predicting COVID-19 case counts using SARS-CoV-2 genetic diversity from wastewater

This article has 21 authors:
1. Sana Naderi
2. Steven G. Sutcliffe
3. Gavin M. Douglas
4. Sukriye Celikkol Aydin
5. Inès Levade
6. Judith Fafard
7. Lila Naouelle Salhi
8. Fernando Sanchez-Quete
9. Sarah Reiling
10. Ju-ling Liu
11. Marc-Denis Rioux
12. David Dreifuss
13. Ivan Topolsky
14. Niko Beerenwinkel
15. Selena M. Sagan
16. Stephanie K. Loeb
17. Peter Vanrolleghem
18. Sarah Dorner
19. Dominic Frigon
20. Jiannis Ragoussis
21. B. Jesse Shapiro
This article has no evaluations. Latest version: Nov 10, 2025

Acute respiratory infections risk prediction using machine learning among Ethiopian children Aged 6 Months to 2 Years

This article has 3 authors:
1. Ewunate Assaye Kassaw
2. Biruk Beletew Abate
3. Ashenafi Kibret Sendekie
This article has no evaluations. Latest version: Dec 9, 2025