Identification of spatial variations in COVID-19 epidemiological data using K-Means clustering algorithm: a global perspective

Viswa Chaitanya Chandu

Listed in

Evaluated articles (ScreenIT)

Abstract

Background

Discerning spatial variations in COVID-19 through quantitative analysis of geographically designated socio-demographic and epidemiological datasets facilitates strategy planning to curtail transmission of the disease and helps articulate the necessary interventions in an informed manner.

Methods

K-means clustering was applied to the available country-specific COVID-19 epidemiological data and influential background characteristics. Country-specific case fatality rates and the average number of people testing positive for COVID-19 per 10,000 population in each country were derived from WHO COVID-19 situation report 107. These were used for clustering along with two background characteristics: the proportion of the country’s population aged >65 years and the percentage of GDP spent as public health expenditure.
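As a rough illustration of the procedure described above, the following Python sketch derives the two epidemiological attributes from raw counts and runs K-means with K=2 on four attributes. It is not the author's SPSS procedure. The formulas for case fatality rate and cases per 10,000, the z-score standardization step, the function names, and the six toy "countries" with their numbers are all assumptions made for illustration and do not come from the study or from the WHO report.

```python
import random
from statistics import mean, pstdev


def derive_features(row):
    """Build the four clustering attributes from raw counts (assumed formulas)."""
    cfr = 100.0 * row["deaths"] / row["cases"]                   # case fatality rate, %
    cases_per_10k = 10_000.0 * row["cases"] / row["population"]  # positives per 10,000
    return [cfr, cases_per_10k, row["pct_over_65"], row["health_pct_gdp"]]


def standardize(matrix):
    """Z-score each column so that no single attribute dominates the distance.

    Assumption: the abstract does not say whether the variables were standardized.
    """
    cols = list(zip(*matrix))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in matrix]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, seed=0, max_iter=100):
    """Plain Lloyd's algorithm with randomly chosen initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(c) for c in zip(*members)]
    return labels, centroids, iteration


# Hypothetical toy data: invented numbers, not taken from the study.
toy = {
    "A": dict(cases=50_000, deaths=3_500, population=10_000_000, pct_over_65=19.0, health_pct_gdp=9.5),
    "B": dict(cases=80_000, deaths=6_000, population=20_000_000, pct_over_65=21.0, health_pct_gdp=10.0),
    "C": dict(cases=30_000, deaths=2_400, population=8_000_000, pct_over_65=18.0, health_pct_gdp=8.8),
    "D": dict(cases=2_000, deaths=40, population=30_000_000, pct_over_65=4.0, health_pct_gdp=3.0),
    "E": dict(cases=1_500, deaths=45, population=25_000_000, pct_over_65=5.0, health_pct_gdp=2.5),
    "F": dict(cases=900, deaths=18, population=15_000_000, pct_over_65=3.5, health_pct_gdp=3.5),
}

names = list(toy)
features = standardize([derive_features(toy[n]) for n in names])
labels, centroids, n_iter = kmeans(features, k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label + 1)
```

Cluster numbering depends on the random initial centroids, so only the grouping is meaningful, not which group is labelled ‘1’ or ‘2’.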

Results

The algorithm grouped the 89 countries into cluster ‘1’ and cluster ‘2’, of sizes 54 and 35, respectively. The Americas, European countries, and Australia formed a major part of cluster ‘2’, which was characterized by a higher COVID-19 case fatality rate, a higher proportion of the population testing positive for COVID-19, a higher percentage of GDP spent as public health expenditure, and a greater percentage of the population aged more than 65 years.

Conclusion

Despite the positive correlation of high public health expenditure (% GDP) with COVID-19 incidence and case fatality rate, the immediate task ahead for most low- and middle-income countries is to strengthen their public health systems, recognizing that the correlation found in this study could be spurious in light of underreported case numbers and poor death registration.

SciScore for 10.1101/2020.06.03.20121194:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement	not detected.
Randomization	K-means clustering with K=2 was performed, and it took 4 iterations for the data points to stably cluster into 2 groups with the initial randomly selected centroids moved to the true centroids of the clusters.
Blinding	not detected.
Power Analysis	not detected.
Sex as a biological variable	not detected.

Table 2: Resources

Software and Algorithms
Sentences	Resources
4 After selecting the aforementioned attributes to be used in the partitioning algorithm, K-means clustering analysis was performed using SPSS version 20 software (IBM SPSS statistics for windows version 20, Armonk, NY, USA).	SPSS suggested: (SPSS, RRID:SCR_002865)

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Version published to 10.1101/2020.06.03.20121194 on medRxiv
Jun 5, 2020

Related articles

Municipality-Level Spatial Clustering and Socio-Environmental Determinants of Tuberculosis in Nepal, 2019-2024

This article has 3 authors:
1. Nabin Bisht
2. Chiranjivi Adhikari
3. Harikishor Yadav
This article has no evaluations. Latest version: Jan 27, 2026

Climate, Spatial Clustering and Hotspots of Non-Communicable Disease Mortality in Sub-Saharan Africa: A Bayesian Spatial Epidemiology Study, 2000–2019

This article has 2 authors:
1. Tsikai Solomon Chinembiri
2. Godfrey Pachavo
This article has no evaluations. Latest version: Dec 18, 2025

Social Determinants and Outbreak Dynamics of the 2025 Measles Epidemic in Mexico: A Nationwide Analysis of Linked Surveillance Data

This article has 9 authors:
1. Judith Carolina De Arcos-Jiménez
2. Pedro Martínez-Ayala
3. Oscar Francisco Fernández-Diaz
4. Sergio Sánchez-Enríquez
5. Patricia Noemi Vargas-Becerra
6. Ana María López-Yáñez
7. Roberto Miguel Damian-Negrete
8. Sofía Gutierrez-Perez
9. Jaime Briseno-Ramírez
This article has no evaluations. Latest version: Jan 14, 2026
