Covid-19 Pandemic Data Analysis and Forecasting using Machine Learning Algorithms

Abstract

India reported its first Covid-19 case on 30 January 2020, and the number of reported cases escalated sharply from March 2020. This paper analyses Covid-19 data first at a global level and then drills down to the situation in India. Data are gathered from multiple sources, including several official government websites. The need of the hour is to forecast accurately when case numbers will peak and then decline. Such forecasts would help public welfare professionals plan preventive measures while maintaining the country's economic balance. Variables such as gender, geographical location and age are presented using Python and data visualization techniques. Time series forecasting techniques, including machine learning models (Linear Regression, Support Vector Regression and Polynomial Regression) and a deep learning forecasting model, LSTM (long short-term memory), are deployed to study the probable rise in cases in the near future. A comparative analysis is also carried out to determine which model best fits the data. Data up to 30 July 2020 are considered. The results show that a statistical model, the sigmoid model, outperforms the other models. The sigmoid model also estimates the day on which the number of active cases can be expected to peak and the day on which the curve will start to flatten. The strength of the sigmoid model lies in providing these date estimates, which none of the other models offers, and it is therefore the best model for predicting Covid-19 case counts; this is a unique feature of the analysis in this paper. Feature engineering techniques are used to transform the data to a logarithmic scale, as this affords better comparison by reducing the effect of data extremes and outliers. Based on the predictions for short-term intervals, the model can be tuned to forecast over longer time intervals.
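
The abstract does not state the functional form or the fitting procedure of the sigmoid model, so the following is only a minimal sketch. It assumes a three-parameter logistic curve fitted to cumulative case counts. The data are synthetic and noise-free, not the paper's data. The function names `fit_sigmoid` and `day_reaching`, the grid of candidate plateaus, and the 95% threshold used for "flattening" are illustrative assumptions, not taken from the paper.

```python
import math


def sigmoid(t, cap, rate, midpoint):
    """Logistic curve: modelled cumulative case count on day t."""
    return cap / (1.0 + math.exp(-rate * (t - midpoint)))


def fit_sigmoid(days, cases, cap_candidates):
    """Grid-search the plateau `cap`; for each candidate, fit rate and
    midpoint by ordinary least squares on the logit-transformed counts.
    All counts must be strictly positive."""
    n = len(days)
    mean_t = sum(days) / n
    var_t = sum((t - mean_t) ** 2 for t in days)
    best = None
    for cap in cap_candidates:
        if cap <= max(cases):
            continue  # the plateau must lie above every observed count
        # ln(y / (cap - y)) = rate * t - rate * midpoint is linear in t
        z = [math.log(y / (cap - y)) for y in cases]
        mean_z = sum(z) / n
        rate = sum((t - mean_t) * (zi - mean_z) for t, zi in zip(days, z)) / var_t
        if rate <= 0:
            continue  # a falling curve is not a valid growth model here
        midpoint = mean_t - mean_z / rate
        sse = sum((sigmoid(t, cap, rate, midpoint) - y) ** 2
                  for t, y in zip(days, cases))
        if best is None or sse < best[0]:
            best = (sse, cap, rate, midpoint)
    if best is None:
        raise ValueError("no candidate cap produced a valid fit")
    return best[1:]


def day_reaching(fraction, rate, midpoint):
    """Day on which the curve reaches `fraction` of its plateau."""
    return midpoint + math.log(fraction / (1.0 - fraction)) / rate


# Synthetic, noise-free data (assumed values, for illustration only).
days = list(range(80))
cases = [sigmoid(t, 100_000, 0.1, 60.0) for t in days]

cap, rate, midpoint = fit_sigmoid(days, cases, range(50_000, 200_001, 10_000))
peak_growth_day = midpoint  # inflection point: daily new cases are highest
flattening_day = day_reaching(0.95, rate, midpoint)  # assumed 95% threshold
```

Under this assumed form, the fitted midpoint is the day of fastest growth and the curve flattens as it approaches the plateau, which is how a sigmoid fit can yield date estimates that a plain regression forecast does not. The logit step also shows one way a logarithmic transformation can turn the fit into a linear problem.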

Article activity feed

  1. SciScore for 10.1101/2020.06.25.20140004

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a protocol registration statement.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.

  2. SciScore for 10.1101/2020.06.25.20140004

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: The visualizations are created in Python where using the matplotlib, seaborn, plotly libraries and also datetime library for time series data analysis.
    Resources:
      Python, suggested: (IPython, SCR_001658)
      matplotlib, suggested: (MatPlotLib, SCR_008624)

    Sentence: CLUSTER Mortality Rate Recovery Rate 0 High Low 1 Low High 2 Medium Medium We can see countries belonging to cluster 1 are at a comparatively safer zone with low mortality rate and high recovery rate.
    Resources:
      CLUSTER, suggested: (Cluster, SCR_013505)

    Sentence: arXiv preprint arXiv:2004.03147, 2020 10) Anuradha Tomar, Neeraj Gupta.
    Resources:
      arXiv, suggested: (arXiv, SCR_006500)
    

    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.


    Results from OddPub: We did not find a statement about open data. We also did not find a statement about open code. Researchers are encouraged to share open data when possible (see Nature blog).


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore is not a substitute for expert review. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) in the manuscript, and detects sentences that appear to be missing RRIDs. SciScore also checks to make sure that rigor criteria are addressed by authors. It does this by detecting sentences that discuss criteria such as blinding or power analysis. SciScore does not guarantee that the rigor criteria that it detects are appropriate for the particular study. Instead it assists authors, editors, and reviewers by drawing attention to sections of the manuscript that contain or should contain various rigor criteria and key resources. For details on the results shown here, including references cited, please follow this link.