Real time scalable data acquisition of COVID-19 in six continents through PySpark - a big data tool

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was declared a global emergency in January 2020 because of its pandemic outbreak. To examine the effects of coronavirus disease 2019 (COVID-19), various data are being generated through different platforms. This study focuses on clinical COVID-19 data and relies on Python programming. Here, we propose a machine learning approach to provide insights into COVID-19 information. PySpark, the Python interface to Apache Spark, is an accurate tool that returns search results in less time than Hadoop and other tools. The World Health Organization (WHO) started gathering coronavirus patients' data in the last week of February 2020. On March 11, 2020, the WHO declared COVID-19 a global pandemic. Cases became more evident and common after mid-March. This paper uses the live OWID (Our World in Data) dataset to analyse and report the following: (1) the daily coronavirus scenario on various continents using PySpark, measured in microseconds of processor time; (2) how the introduction of various antibodies affects daily new cases, shown through various graphs; (3) a tabular representation of new COVID-19 cases on all the continents.
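The per-continent aggregation and processor-time measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it uses only the Python standard library rather than PySpark, the rows in `SAMPLE_CSV` are made-up placeholders, and the column names (`continent`, `location`, `date`, `new_cases`) are assumed to match the OWID CSV layout.

```python
import csv
import io
import time
from collections import defaultdict

# Hypothetical placeholder rows, NOT real OWID figures.
# Column names are assumed to follow the OWID CSV layout.
SAMPLE_CSV = """continent,location,date,new_cases
Asia,CountryA,2021-05-14,100
Asia,CountryB,2021-05-14,50
Africa,CountryC,2021-05-14,20
Europe,CountryD,2021-05-14,
Europe,CountryE,2021-05-14,30
"""


def new_cases_by_continent(csv_text):
    """Sum new_cases per continent, skipping rows with missing values."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row["new_cases"]
        if row["continent"] and value:
            totals[row["continent"]] += float(value)
    return dict(totals)


# Measure CPU (processor) time, not wall-clock time, in microseconds.
start_ns = time.process_time_ns()
totals = new_cases_by_continent(SAMPLE_CSV)
elapsed_us = (time.process_time_ns() - start_ns) / 1_000

for continent in sorted(totals):
    print(f"{continent}: {totals[continent]:.0f}")
print(f"processor time: {elapsed_us:.1f} microseconds")
```

In a Spark setting the same grouping would typically be expressed as a DataFrame group-by and sum; the plain-Python version is used here only so the sketch runs without a cluster or extra dependencies.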

Article activity feed

  1. SciScore for 10.1101/2021.07.04.21259983:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethics: not detected.
    Sex as a biological variable: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentence: Python’s prominence is increasing, widely in the all areas like Bio informatics and data analysis as well as in programming too (Latif et al. 2020).
      Resource: Python’s; suggested: (PyMVPA, RRID:SCR_006099)

    Sentence: Various Python big data research libraries, such as pandas, matplotlib, NumPy, and seaborn, are also used in this analysis (Figure-3).
      Resource: matplotlib; suggested: (MatPlotLib, RRID:SCR_008624)
      Resource: NumPy; suggested: (NumPy, RRID:SCR_008633)

    Sentence: Python libraries are commonly used to visualize data, and visualization tool is becoming simpler with the use of these free libraries.
      Resource: Python; suggested: (IPython, RRID:SCR_001658)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    The limitation of this study is that our analysis did not consider data from the seventh continent, Antarctica, as it has no permanent residents, although at least 36 people there have been confirmed with COVID-19 infection (https://www.barrons.com/; https://www.coolantarctica.com/).
    Asia: Asia had a total of 23,225,625 COVID-19 positive cases and 22,836,076 recovered cases as of May 15, 2021. On this basis, the mortality rate was 1.6359 percent as of May 15, 2021. The daily recovered cases and daily confirmed cases from February 24, 2020 to May 15, 2021 are depicted in Figure-4 (B) (scatter graph). On September 16, 2020, the highest number of new cases was registered in India: 97,894 new cases out of 1,136,613 new tests. According to Figure-5 (B) (histogram), COVID-19 cases peaked in September 2020 and then started to decline. A total of 373,625,749 tests were performed in Asia, with 23,225,625 confirmed cases, so 6.2163 percent of tests were positive.
    Africa: Africa had a total of 4,027,046 COVID-19 positive cases and 3,912,657 recovered cases as of May 15, 2021. On this basis, the death ratio is 2.6701%. Figure-4 (E) shows the daily recovered cases and positive cases. Figure-5 (E) shows the positive cases from February 25, 2020 to May 15, 2021. Although Africa has 17.2% of the world’s population, it has only about 5% of diagnosed COVID-19 cases and 3% of COVID-19-related mortality (Bamgboye et al. 2020).
    Europe: Eur...
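The test-positivity figure quoted for Asia follows from dividing confirmed cases by total tests. A minimal sketch of that arithmetic, using only the two totals stated in the excerpt above (the helper name `percent` is illustrative and does not come from the manuscript):

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Totals for Asia as quoted in the excerpt (as of May 15, 2021).
asia_confirmed = 23_225_625
asia_tests = 373_625_749

asia_positivity = percent(asia_confirmed, asia_tests)
print(f"Asia test positivity: {asia_positivity:.4f}%")
```

The quoted mortality percentages cannot be rechecked the same way, because the excerpt gives confirmed and recovered counts but not the death counts they were presumably derived from.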

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.