Spatial Prediction of COVID-19 Pandemic Dynamics in the United States
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
The impact of COVID-19 across the United States (US) has been heterogeneous, with rapid spread and greater mortality in some areas compared with others. We used geographically-linked data to test the hypothesis that the risk for COVID-19 was defined by location and sought to define which demographic features were most closely associated with elevated COVID-19 spread and mortality. We leveraged geographically-restricted social, economic, political, and demographic information from US counties to develop a computational framework using a structured Gaussian process to predict county-level case and death counts during the pandemic’s initial and nationwide phases. After identifying the most predictive information sources by location, we applied an unsupervised clustering algorithm and topic modeling to identify groups of features most closely associated with COVID-19 spread. Our model successfully predicted COVID-19 case counts of unseen locations after examining case counts and demographic information of neighboring locations, with an overall Pearson’s correlation coefficient and proportion of variance explained of 0.96 and 0.84 during the initial phase and 0.95 and 0.87 during the nationwide phase, respectively. Aside from population metrics, presidential vote margin was the most consistently selected spatial feature in our COVID-19 prediction models. Urbanicity and 2020 presidential vote margins were more predictive than other demographic features. Models trained using death counts showed similar performance metrics. Topic modeling showed that counties with similar socioeconomic and demographic features tended to group together, and some of these feature sets were associated with COVID-19 dynamics. Clustering of counties based on the feature groups found by topic modeling revealed groups of counties that experienced markedly different COVID-19 spread.
We conclude that topic modeling can be used to group similar features and identify counties with similar features in epidemiologic research.
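The prediction setup described in the abstract (estimating values at unseen locations from neighboring locations, then scoring with Pearson's correlation and proportion of variance explained) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the data are synthetic points on a grid, the kernel is a plain squared-exponential rather than the authors' structured Gaussian process, and "variance explained" is taken to mean 1 - SS_res/SS_tot, which the abstract does not specify. The function names are hypothetical.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two coordinate tuples."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_predict(train_xy, train_y, test_xy, length_scale=1.0, noise=1e-2):
    """GP posterior mean at test locations (prior mean = training mean)."""
    mean = sum(train_y) / len(train_y)
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(train_xy)]
         for i, a in enumerate(train_xy)]
    alpha = solve(K, [y - mean for y in train_y])
    return [mean + sum(rbf(t, a, length_scale) * w
                       for a, w in zip(train_xy, alpha))
            for t in test_xy]

def pearson_r(obs, pred):
    """Pearson's correlation coefficient between observed and predicted."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

def variance_explained(obs, pred):
    """1 - SS_res / SS_tot (assumed definition, not confirmed by the source)."""
    mo = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Synthetic "locations" on a 5x5 grid with a smooth spatial signal.
grid = [(x, y) for x in range(5) for y in range(5)]
truth = {p: math.sin(p[0]) + 0.5 * p[1] for p in grid}
held_out = [(1, 1), (1, 3), (2, 2), (3, 1), (3, 3)]  # unseen locations
train = [p for p in grid if p not in held_out]

pred = gp_predict(train, [truth[p] for p in train], held_out, length_scale=1.5)
obs = [truth[p] for p in held_out]
r = pearson_r(obs, pred)
r2 = variance_explained(obs, pred)
print(f"Pearson r = {r:.3f}, variance explained = {r2:.3f}")
```

Under this assumed definition, variance explained can never exceed the square of Pearson's r, because r is unaffected by a constant offset or rescaling of the predictions while 1 - SS_res/SS_tot penalizes both. That would be consistent with the abstract reporting 0.84 alongside r = 0.96 (r squared is about 0.92), although the source does not state how the proportion was computed.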
Article activity feed
SciScore for 10.1101/2022.03.27.22271628:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
Sentence: Boundary shapefile of counties downloaded from TIGER/Line database (https://www.census.gov).
Resource: https://www.census.gov, suggested: (U.S. Census Bureau, RRID:SCR_011587)
Results from OddPub: Thank you for sharing your data.
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study: The development and implementation of spatially-informed prediction models suffer from several limitations. Our models did not include mitigation measures or vaccine coverage, due in part to inconsistencies in implementation and data availability. The end date for the nationwide phase analysis, March 31, was before vaccine availability was opened to the general public in most states, but differences in vaccine uptake to that point represent a potential confounder. Early case numbers were heavily influenced by low test availability, leading to significant missing data. However, our analyses found similar features predicted case dynamics throughout the pandemic, suggesting that the effect of this missing data may be minimal. Finally, TM and Louvain clustering generate highly overlapping feature sets that may be specific to the breadth of data included. Thus, while spatial analysis provides a powerful predictive tool, the precise effect of each feature or set of features is likely to be context-specific. In conclusion, we show that spatial features account for the majority of variation in COVID-19 case and death dynamics across the US. Predictive modeling based on combinations of spatial features can identify counties at greatest risk for COVID-19 spread and can be used to direct aggressive mitigation strategies and limited resource pools to these areas. Finally, we show that topic modeling provides a new approach to dimensional reduction in epidemiologic data and may be of value...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.