Real time scalable data acquisition of COVID-19 in six continents through PySpark - a big data tool
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was declared a global emergency in January 2020 because of its pandemic outbreak. To examine the effects of coronavirus disease 2019 (COVID-19), various data are being generated through different platforms. This study focuses on clinical COVID-19 data and relies on Python programming. We propose a machine learning approach to provide insights into COVID-19 information. PySpark, the Python interface to Apache Spark, is an accurate tool that returns search results in less time than Hadoop and other tools. The World Health Organization (WHO) started gathering data on corona patients in the last week of February 2020. On March 11, 2020, the WHO declared COVID-19 a global pandemic. Cases became more evident and common after mid-March. This paper uses the live OWID (Our World in Data) dataset to analyse and report the following: (1) the daily coronavirus scenario on various continents, computed with PySpark in microseconds of processor time; (2) how the introduction of various antibodies affects daily new cases, shown through various graphs; (3) a tabular representation of new COVID-19 cases on all the continents.
Article activity feed
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
- Ethics: not detected.
- Sex as a biological variable: not detected.
- Randomization: not detected.
- Blinding: not detected.
- Power Analysis: not detected.
Table 2: Resources
Software and Algorithms
- Sentence: Python’s prominence is increasing, widely in the all areas like Bio informatics and data analysis as well as in programming too (Latif et al. 2020).
  Resource: Python’s, suggested: (PyMVPA, RRID:SCR_006099)
- Sentence: Various Python big data research libraries, such as pandas, matplotlib, NumPy, and seaborn, are also used in this analysis (Figure-3).
  Resource: matplotlib, suggested: (MatPlotLib, RRID:SCR_008624)
  Resource: NumPy, suggested: (NumPy, RRID:SCR_008633)
- Sentence: Python libraries are commonly used to visualize data, and visualization tool is becoming simpler with the use of these free libraries.
  Resource: Python, suggested: (IPython, RRID:SCR_001658)
Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
The limitation of this study is that our analysis did not consider data from the seventh continent, Antarctica, as it has no permanent residents, although at least 36 people there have been confirmed with COVID infection (https://www.barrons.com/; https://www.coolantarctica.com/).
Asia: Asia had a total of 23,225,625 COVID-19 positive cases and 22,836,076 recovered cases as of May 15th, 2021. On this basis, the mortality rate is 1.6359 percent as of May 15th, 2021. The daily recovered cases and daily confirmed cases from February 24, 2020 to May 15th, 2021 are depicted in Figure-4 (B) (scatter graph). The highest number of new cases was registered in India on September 16, 2020, with 97,894 new cases out of 1,136,613 new tests. According to Figure-5 (B) (histogram), COVID-19 cases peaked in September 2020 and then started to decline. A total of 373,625,749 tests were performed in Asia, with 23,225,625 confirmed cases, so 6.2163 percent of tests were positive.
Africa: Africa had a total of 4,027,046 COVID-19 positive cases and 3,912,657 recovered cases as of May 15th, 2021. On that basis, the death ratio is 2.6701%. Figure-4 (E) shows the daily recovered cases and positive cases. Figure-5 (E) shows the positive cases from February 25th, 2020 to May 15th, 2021. Although Africa has 17.2% of the world’s population, it has only about 5% of diagnosed COVID-19 cases and 3% of COVID-19-related mortality (Bamgboye et al. 2020).
Europe: Eur...
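The Asia test-positivity figure quoted above can be checked with a few lines of Python. This is a minimal sketch, not the authors' code: the helper name `percent` and the variable names are illustrative assumptions, and only the two totals quoted in the text are used as inputs.

```python
def percent(part: float, whole: float, ndigits: int = 4) -> float:
    """Return part/whole expressed as a percentage, rounded to ndigits places."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, ndigits)


# Totals quoted in the text for Asia as of May 15th, 2021.
asia_confirmed = 23_225_625
asia_tests = 373_625_749

# Share of tests that came back positive (hypothetical variable name).
asia_positivity = percent(asia_confirmed, asia_tests)
print(asia_positivity)  # 6.2163
```

The quoted mortality and death-ratio percentages depend on death counts that are not given in the excerpt, so they are not recomputed here.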
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.