Regional Characteristics of the Second Wave of SARS-CoV-2 Infections and COVID-19 Deaths in Germany
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
(1) Background: In the absence of individual-level information, this study aimed to identify the key regional features explaining SARS-CoV-2 infections and COVID-19 deaths during the upswing of the second wave in Germany. (2) Methods: We used county-level COVID-19 diagnoses and deaths from 1 October to 15 December 2020, differentiating five two-week periods. For each period, we calculated the age-standardized COVID-19 incidence and death rates at the county level. We trained gradient boosting models to predict the incidence and death rates from 155 indicators and identified the top 20 associations using SHAP values. (3) Results: Counties with low socioeconomic status (SES) had higher infection and death rates, as did those with high international migration, a high proportion of foreigners, and a large nursing home population. The importance of these characteristics changed over time. During the period of intense exponential increase in infections, the proportion of the population that voted for the Alternative for Germany (AfD) party in the last federal election was among the top characteristics correlated with high incidence and death rates. (4) Conclusions: Machine learning approaches can reveal regional characteristics that are associated with high rates of infection and mortality.
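The age-standardized rates mentioned in the Methods can be illustrated with a minimal sketch of direct standardization. This is not the authors' code: the age groups, standard population weights, and county figures below are hypothetical and chosen only to show the arithmetic.

```python
# Minimal sketch of direct age standardization.
# All age groups, weights, and counts are hypothetical illustrations,
# not values taken from the study.

def age_standardized_rate(cases, population, standard_weights, per=100_000):
    """Weighted sum of age-specific rates using standard population weights."""
    if not (len(cases) == len(population) == len(standard_weights)):
        raise ValueError("all inputs must have one entry per age group")
    total_weight = sum(standard_weights)
    rate = 0.0
    for c, n, w in zip(cases, population, standard_weights):
        # Age-specific rate (c / n) weighted by the standard population share.
        rate += (c / n) * (w / total_weight)
    return rate * per

# Hypothetical age groups: 0-34, 35-59, 60-79, 80+
standard_weights = [0.40, 0.35, 0.20, 0.05]

# County A: same age structure as the standard population.
county_a_cases = [40, 35, 20, 5]
county_a_pop = [40_000, 35_000, 20_000, 5_000]

# County B: older population with a higher rate in the oldest group.
county_b_cases = [20, 30, 30, 60]
county_b_pop = [20_000, 30_000, 30_000, 20_000]

crude_b = sum(county_b_cases) / sum(county_b_pop) * 100_000
std_a = age_standardized_rate(county_a_cases, county_a_pop, standard_weights)
std_b = age_standardized_rate(county_b_cases, county_b_pop, standard_weights)

print(f"County A standardized rate: {std_a:.1f} per 100,000")
print(f"County B crude rate: {crude_b:.1f} per 100,000")
print(f"County B standardized rate: {std_b:.1f} per 100,000")
```

In this toy example, County B's crude rate is inflated by its older age structure; standardization brings it much closer to County A, which is why standardized rates are preferable when comparing counties with different age distributions.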
Article activity feed
SciScore for 10.1101/2021.04.14.21255474:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
Sentences: All analyses were performed using Python 3.8.3.
Resources: Python, suggested: (IPython, RRID:SCR_001658)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
STUDY LIMITATIONS: Our study is hampered by a series of limitations. Reliance on the county level introduces the problem of the modifiable areal unit (Kirby et al., 2017). County-level data might be too coarse, but also too finely graded, to detect important features driving the pandemic. To overcome the limitation that the macro variables are restricted to Germany, we included the age-standardized incidence in neighbouring countries for counties with international borders. True infection rates are not known for SARS-CoV-2 due to asymptomatic individuals, regional approval criteria for testing that resulted in different testing rates, and differences in reporting by local health departments to the RKI. In addition, these data report the time of diagnosis rather than the time of infection. There was also a strong weekday effect, with lower reporting rates on weekends. Our 14-day period averages over these different lags to give an average picture of infections during this period. In addition, our models included information about infections in the previous period. Different machine learning algorithms identify different features and their importance. We obtained similar results regardless of the machine learning algorithm used (Random Forests, with results available upon request, versus CatBoost algorithms), with the latter better reflecting the data. Nevertheless, it is important to keep in mind that the interpreted Shapley values explain the model and not the data.
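The closing caveat, that Shapley values explain the model and not the data, can be made concrete with a small sketch. It computes exact Shapley values by enumerating feature coalitions for a hypothetical two-feature toy model; it does not use the authors' models or data, and the function names (`model`, `coalition_value`, `shapley_values`) are illustrative assumptions rather than any library's API.

```python
from itertools import combinations
from math import factorial

def model(x):
    # Hypothetical toy model: it uses only feature 0 and ignores feature 1.
    return 2.0 * x[0]

def coalition_value(model, instance, background, subset):
    # Average prediction when features in `subset` take the instance's values
    # and all other features are drawn from the background rows.
    total = 0.0
    for row in background:
        mixed = [instance[j] if j in subset else row[j]
                 for j in range(len(instance))]
        total += model(mixed)
    return total / len(background)

def shapley_values(model, instance, background):
    # Exact Shapley values by enumerating all coalitions (feasible only
    # for a handful of features).
    n = len(instance)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = coalition_value(model, instance, background,
                                         set(subset) | {i})
                without_i = coalition_value(model, instance, background,
                                            set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi

# Made-up data in which feature 1 is an exact copy of feature 0.
background = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
instance = [3.0, 3.0]

phi = shapley_values(model, instance, background)
print(phi)
```

Although the two features are perfectly correlated in the data, all of the attribution goes to feature 0, because that is the only feature the model uses. A different model fitted to the same data could just as well rely on feature 1, which is why feature rankings of this kind describe a fitted model rather than the underlying process.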
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.