Social disparities in the first wave of COVID-19 incidence rates in Germany: a county-scale explainable machine learning approach
Abstract
Knowledge about the socioeconomic spread of the first wave of COVID-19 infections in Germany is scattered across different studies. We explored whether COVID-19 incidence rates differed between counties according to their socioeconomic characteristics using a wide range of indicators.
Data and method
We used data from the Robert Koch-Institute (RKI) on 204 217 COVID-19 diagnoses in the total German population of 83.1 million, distinguishing five distinct periods between 1 January and 23 July 2020. For each period, we calculated age-standardised incidence rates of COVID-19 diagnoses on the county level and characterised the counties by 166 macro variables. We trained gradient boosting models to predict the age-standardised incidence rates with the macrostructures of the counties and used SHapley Additive exPlanations (SHAP) values to characterise the 20 most prominent features in terms of negative/positive correlations with the outcome variable.
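As a rough illustration of the age-standardisation step, the sketch below applies direct standardisation: each age-specific rate in a county is weighted by the share of that age group in a reference population. The abstract does not say which standardisation method or reference population was used, so the function name, age bands, weights and case counts here are all hypothetical and chosen only for illustration.

```python
from typing import Mapping


def age_standardised_rate(
    cases: Mapping[str, float],
    population: Mapping[str, float],
    standard_weights: Mapping[str, float],
    per: float = 100_000,
) -> float:
    """Directly age-standardised incidence rate per `per` persons.

    Assumption: direct standardisation against a fixed reference
    population; the source does not confirm this choice.
    """
    total_weight = sum(standard_weights.values())
    rate = 0.0
    for group, weight in standard_weights.items():
        # Age-specific crude rate in this county
        age_specific = cases[group] / population[group]
        # Weight it by the reference population share of this age group
        rate += age_specific * weight / total_weight
    return rate * per


def crude_rate(
    cases: Mapping[str, float],
    population: Mapping[str, float],
    per: float = 100_000,
) -> float:
    """Unadjusted incidence rate per `per` persons."""
    return sum(cases.values()) / sum(population.values()) * per


# Hypothetical reference population shares (not from the study)
STANDARD = {"0-34": 0.40, "35-59": 0.35, "60+": 0.25}

# Two hypothetical counties with identical age-specific rates
# (0.001, 0.002, 0.004) but different age structures
YOUNG_POP = {"0-34": 60_000, "35-59": 30_000, "60+": 10_000}
YOUNG_CASES = {"0-34": 60, "35-59": 60, "60+": 40}
OLD_POP = {"0-34": 20_000, "35-59": 30_000, "60+": 50_000}
OLD_CASES = {"0-34": 20, "35-59": 60, "60+": 200}

if __name__ == "__main__":
    for name, c, p in [("young", YOUNG_CASES, YOUNG_POP), ("old", OLD_CASES, OLD_POP)]:
        print(name, round(crude_rate(c, p), 1), round(age_standardised_rate(c, p, STANDARD), 1))
```

In this toy example the crude rates differ between the two counties purely because of their age structure, while the standardised rates coincide. That is the property that makes standardised rates comparable across counties before they are passed to a predictive model.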
Results
The first COVID-19 wave started as a disease in wealthy rural counties in southern Germany and ventured into poorer urban and agricultural counties during the course of the first wave. High age-standardised incidence in low socioeconomic status (SES) counties became more pronounced from the second lockdown period onwards, when wealthy counties appeared to be better protected. Features related to economic and educational characteristics of the young population in a county played an important role at the beginning of the pandemic up to the second lockdown phase, as did features related to the population living in nursing homes; those related to international migration and a large proportion of foreigners living in a county became important in the postlockdown period.
Conclusion
High mobility of high SES groups may drive the pandemic at the beginning of waves, while mitigation measures, beliefs about the seriousness of the pandemic, and compliance with mitigation measures may put lower SES groups at higher risk later on.
Article activity feed
SciScore for 10.1101/2020.12.22.20248386: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
- Institutional Review Board Statement: not detected.
- Randomization: not detected.
- Blinding: not detected.
- Power Analysis: not detected.
- Sex as a biological variable: not detected.
Table 2: Resources
No key resources detected.
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study: Study Limitations: Our study is hampered by a series of limitations. Resorting to county level data does not only introduce the possibility of an ecological fallacy if results are interpreted on an individual rather than an aggregate level, but also the problem of the modifiable areal unit (Kirby et al. 2017). County level data might be too coarse but also too finely graded to detect important features driving the pandemic. Furthermore, they are limited to Germany and do not reflect if or how infections are acquired locally or internationally, with the exception of the variable “distance to Ischgl”. True infection rates are unknown for COVID-19 because of asymptomatic individuals, regional eligibility criteria for testing leading to different testing rates, as well as differences in reporting of the local “Gesundheitsämter” to the RKI. To further complicate analyses, data from the RKI do not report the time of infection but rather of diagnosis, and by mid-April the date of the start of the illness was only known for 62% of the cases (an der Heiden und Hamouda 2020). Of these 50% were reported to the RKI within seven days, on 21 March it took 6.6 days, on 31 March it was 9.9, and in April it took 7.6 days. However, it has been shown that infected individuals are most contagious two to three days before symptoms start. In addition there was a strong weekday effect with lower numbers reported on weekends. Our 14-day time period averages over these various delays, yielding an ave...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We found bar graphs of continuous data. We recommend replacing bar graphs with more informative graphics, as many different datasets can lead to the same bar graph. The actual data may suggest different conclusions from the summary statistics. For more information, please see Weissgerber et al (2015).
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
