Covid-19 Pandemic Data Analysis and Forecasting using Machine Learning Algorithms

Sohini Sengupta
Sareeta Mugde
Garima Sharma

This article has been reviewed by ScreenIT.

Listed in: Evaluated articles (ScreenIT)

Abstract

India reported its first Covid-19 case on 30th Jan 2020, and the number of reported cases escalated sharply from March 2020. This research paper analyses COVID-19 data first at a global level and then drills down to the scenario in India. Data is gathered from multiple sources, including several authentic government websites. The need of the hour is to accurately forecast when the numbers will reach their peak and then diminish. Such a forecast will be of great help to public welfare professionals in planning preventive measures while also maintaining the economic balance of the country. Variables such as gender, geographical location and age have been represented using Python and data visualization techniques. Time series forecasting techniques, including machine learning models (Linear Regression, Support Vector Regression and Polynomial Regression) and a deep learning forecasting model, LSTM (Long Short-Term Memory), are deployed to study the probable rise in cases in the near future. A comparative analysis is also done to understand which model best fits our data. Data is considered up to 30th July 2020. The results show that a statistical model, the sigmoid model, outperforms the other models. The sigmoid model also gives an estimate of the day on which the number of active cases can be expected to reach its peak and of when the curve will start to flatten. The strength of the sigmoid model lies in providing these dates, which no other model offers; it is therefore the best model for predicting Covid case counts, and this is the unique feature of the analysis in this paper. Certain feature engineering techniques have been used to transform data onto a logarithmic scale, as this affords better comparison by compressing extreme values and outliers. Based on the predictions over a short-term interval, our model can be tuned to forecast over longer time intervals.
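The abstract credits a sigmoid model with producing a peak date and a flattening date, but it does not give the model's equation or fitting procedure. The following is a minimal Python sketch under stated assumptions. The model is taken to be a three-parameter logistic curve C(t) = K / (1 + exp(-r*(t - t0))) fitted to cumulative confirmed cases, so the inflection day t0 marks the peak of daily new cases. The paper's stated target is active cases, which may be modelled differently. For each candidate K on a grid, the transform ln(K/C - 1) = -r*t + r*t0 turns the fit into ordinary least squares. This linearize-and-grid-search approach is a swapped-in stand-in for a nonlinear least-squares routine such as `scipy.optimize.curve_fit`. The names `fit_given_K`, `fit_logistic`, `key_dates`, `k_grid`, `day_zero` and `flat_fraction` are hypothetical, and treating 99% of K as "flattened" is an illustrative choice.

```python
import math
from datetime import date, timedelta


def logistic(t, K, r, t0):
    """Sigmoid (logistic) curve: cumulative cases on day t.

    K  -- final size of the outbreak (upper asymptote)
    r  -- growth rate per day
    t0 -- inflection day, where daily new cases peak
    """
    z = min(-r * (t - t0), 700.0)  # clamp so math.exp cannot overflow
    return K / (1.0 + math.exp(z))


def fit_given_K(days, cases, K):
    """For a fixed K, ln(K/C - 1) = -r*t + r*t0 is linear in t, so
    ordinary least squares on the transformed points yields r and t0."""
    xs, ys = [], []
    for t, c in zip(days, cases):
        if 0 < c < K:  # the transform is defined only for 0 < C < K
            xs.append(t)
            ys.append(math.log(K / c - 1.0))
    if len(xs) < 2:
        return None
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r = -slope
    if r <= 0:
        return None  # not a rising sigmoid for this K
    intercept = my - slope * mx  # equals r * t0
    return r, intercept / r


def fit_logistic(days, cases, k_grid):
    """Try each K in k_grid; keep (K, r, t0) with the smallest squared error."""
    best = None
    for K in k_grid:
        params = fit_given_K(days, cases, K)
        if params is None:
            continue
        r, t0 = params
        sse = sum((c - logistic(t, K, r, t0)) ** 2 for t, c in zip(days, cases))
        if best is None or sse < best[0]:
            best = (sse, K, r, t0)
    if best is None:
        raise ValueError("no K in k_grid gave a valid fit")
    return best[1:]


def key_dates(r, t0, day_zero, flat_fraction=0.99):
    """Calendar dates of the inflection (peak of daily new cases) and of the day
    cumulative cases reach flat_fraction * K (treated here as 'flattened')."""
    t_flat = t0 + math.log(flat_fraction / (1.0 - flat_fraction)) / r
    peak = day_zero + timedelta(days=round(t0))
    flat = day_zero + timedelta(days=round(t_flat))
    return peak, flat


if __name__ == "__main__":
    # Synthetic, noise-free series for illustration only (not the paper's data)
    days = list(range(101))
    cases = [logistic(t, 1000.0, 0.2, 50.0) for t in days]
    K, r, t0 = fit_logistic(days, cases, k_grid=[900 + 10 * i for i in range(30)])
    peak, flat = key_dates(r, t0, day_zero=date(2020, 3, 1))
    print(K, round(r, 3), round(t0, 1), peak, flat)
```

On this synthetic series the grid search recovers K = 1000, r ≈ 0.2 and t0 ≈ 50. That places the peak of daily new cases 50 days after `day_zero` and the 99% point about 73 days after it. With real, noisy counts, the true K must lie above the largest observed cumulative count, so the grid should extend well beyond it. The fitted dates are only as reliable as the assumption that the outbreak follows a single logistic wave.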

ScreenIT
Mar 1, 2021
SciScore for 10.1101/2020.06.25.20140004:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement: not detected.
Randomization: not detected.
Blinding: not detected.
Power Analysis: not detected.
Sex as a biological variable: not detected.
Table 2: Resources
No key resources detected.
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a protocol registration statement.
About SciScore
SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
Version published to 10.1101/2020.06.25.20140004v2 on medRxiv
Aug 12, 2020

SciScore for 10.1101/2020.06.25.20140004:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms

Sentences / Resources

The visualizations are created in Python where using the matplotlib, seaborn, plotly libraries and also datetime library for time series data analysis.

Python
suggested: (IPython, SCR_001658)

matplotlib
suggested: (MatPlotLib, SCR_008624)

CLUSTER   Mortality Rate   Recovery Rate
0         High             Low
1         Low              High
2         Medium           Medium
We can see countries belonging to cluster 1 are at a comparatively safer zone with low mortality rate and high recovery rate.

CLUSTER
suggested: (Cluster, SCR_013505)

arXiv preprint arXiv:2004.03147, 2020 10) Anuradha Tomar, Neeraj Gupta.

arXiv
suggested: (arXiv, SCR_006500)
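The clustering excerpt quoted above groups countries into three clusters by mortality rate and recovery rate, but it does not name the clustering algorithm. The sketch below assumes k-means (Lloyd's algorithm) with k = 3 on (mortality %, recovery %) pairs computed from confirmed, death and recovered counts. It shows how the grouping and the Low/Medium/High labelling could be produced. The helper names (`rates`, `kmeans`, `describe`), the explicit starting centroids and all country figures are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (mortality rate %, recovery rate %)


def rates(confirmed: int, deaths: int, recovered: int) -> Point:
    """Mortality and recovery rates as percentages of confirmed cases."""
    return (100.0 * deaths / confirmed, 100.0 * recovered / confirmed)


def kmeans(points: List[Point], init: List[Point], max_iter: int = 100):
    """Lloyd's k-means; `init` holds one starting centroid per cluster."""
    centroids = list(init)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean)
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


def describe(centroids: List[Point]) -> Dict[int, Tuple[str, str]]:
    """For exactly 3 clusters: rank each centroid Low/Medium/High on
    mortality and, separately, on recovery."""
    levels = ["Low", "Medium", "High"]
    by_mort = sorted(range(len(centroids)), key=lambda k: centroids[k][0])
    by_rec = sorted(range(len(centroids)), key=lambda k: centroids[k][1])
    return {
        k: (levels[by_mort.index(k)], levels[by_rec.index(k)])
        for k in range(len(centroids))
    }


if __name__ == "__main__":
    # Hypothetical (confirmed, deaths, recovered) counts, for illustration only
    countries = {
        "A": (1000, 80, 400), "B": (2000, 180, 700),
        "C": (1000, 10, 900), "D": (2000, 30, 1700),
        "E": (1000, 40, 650), "F": (2000, 90, 1200),
    }
    names = list(countries)
    points = [rates(*countries[n]) for n in names]
    # Seed one centroid per expected group to keep the example deterministic
    labels, centroids = kmeans(points, init=[points[0], points[2], points[4]])
    levels = describe(centroids)
    for n, lab in zip(names, labels):
        print(n, "cluster", lab, "mortality/recovery:", levels[lab])
```

With these made-up figures, countries A and B land in a High-mortality/Low-recovery cluster, C and D in Low/High, and E and F in Medium/Medium. That is the same pattern as the quoted table, but the example shows the mechanics only and does not reproduce the paper's data. Because k-means depends on its starting centroids and on the relative scale of the two features, real use would normally standardise the rates and try several initialisations.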

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from OddPub: We did not find a statement about open data. We also did not find a statement about open code. Researchers are encouraged to share open data when possible (see Nature blog).

Version published to 10.1101/2020.06.25.20140004v1 on medRxiv
Jun 26, 2020

Related articles

Machine Learning and Probabilistic Approaches for Forecasting COVID-19 Transmission and Cases

This article has 7 authors:
1. Md Sakhawat Hossain
2. Ravi Goyal
3. Natasha K Martin
4. Victor DeGruttola
5. Tanvir Ahammed
6. Christopher McMahan
7. Lior Rennert
This article has no evaluations. Latest version: Jun 24, 2025.

Granular Insights: A Wastewater-Based Machine Learning Approach for Localized COVID-19 Hospitalization Forecasting

This article has 9 authors:
1. Nusrat Tabassum
2. Mohammad Mihrab Chowdhury
3. Christopher S McMahan
4. Stella Self
5. Mirza Isanovic
6. Karlen Correa-Velez
7. Sarah C. Sellers
8. R. Sean Norman
9. Lior Rennert
This article has no evaluations. Latest version: Jun 26, 2025.

Enhancing Pandemic Prediction: A Deep Learning Approach Using Transformer Neural Networks and Multi-Source Data Fusion for Infectious Disease Forecasting

This article has 5 authors:
1. Jiande Wu
2. Shakhawat Tanim
3. MinJae Woo
4. Tanvir Ahammed
5. Lior Rennert
This article has no evaluations. Latest version: Jun 24, 2025.
