Social Media Polls on Twitter and Mastodon: Rapid Data Collection for Public Health
Abstract
Background
After COVID-19 was declared a pandemic by the WHO in March 2020, global responses relied heavily on non-pharmaceutical interventions such as physical distancing and mask mandates. These measures were guided by mathematical models built on empirical data. Although traditional methods such as surveys and observational studies provide high-quality data, they are often slow and resource-intensive. Social media polls (SMPs) offer a faster, more cost-effective alternative. This study evaluates the feasibility of SMPs as a rapid supplementary tool for collecting epidemiological data and compares their representativeness and quality with conventional approaches.
Methods
In this cross-sectional observational study in Germany, we used SMPs on Twitter and Mastodon to collect data on infections and demographic attributes. To assess data quality, we compared SMP results with conventional data sources, including the Multilocal and Serial Prevalence Study of Antibodies Against Respiratory Infectious Diseases (MuSPAD), the COVID-19 Snapshot Monitoring (COSMO) survey, official Robert Koch Institute reports, and demographic data from the German Federal Statistical Office. The data covered the period from 2019 to 2024. We analyzed infection rates, sociodemographic representativeness, and overall data quality using descriptive statistics.
Findings
SMPs proved feasible as a rapid data collection tool. Self-reported infection frequency aligned closely with conventional sources such as MuSPAD, with similar proportions of respondents reporting zero, one, or multiple infections. However, demographic analyses revealed biases: individuals aged 40–59 and those with higher education were overrepresented, while one-person households were underrepresented. We used bootstrapping to address these biases; the results indicated that the effect of sampling bias on overall infection numbers was low. By design, SMPs do not provide detailed demographic data, which limits the options for subgroup analyses.
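As a rough illustration of how bootstrapping can be used to gauge the effect of sampling bias, the sketch below resamples poll responses with weights that pull the sample toward a reference population. It is not the study's actual procedure: the function name `weighted_bootstrap`, the group labels, the population shares, and the toy responses are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def weighted_bootstrap(records, sample_share, target_share, n_boot=1000, seed=0):
    """Bootstrap the mean of a binary outcome after reweighting by group.

    records      -- list of (group, value) tuples, value is 0 or 1
    sample_share -- share of each group among poll respondents
    target_share -- share of each group in the reference population
    Returns the mean of the bootstrap estimates and a 95% percentile interval.
    """
    rng = random.Random(seed)
    # Respondents from over-represented groups get weights below 1,
    # respondents from under-represented groups get weights above 1.
    weights = [target_share[g] / sample_share[g] for g, _ in records]
    values = [v for _, v in records]

    estimates = []
    for _ in range(n_boot):
        # Draw a resample of the original size, with replacement.
        resample = rng.choices(values, weights=weights, k=len(values))
        estimates.append(mean(resample))

    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return mean(estimates), (lower, upper)


# Hypothetical toy data (NOT from the study): 60 respondents aged 40-59,
# half reporting at least one infection, and 40 others, a quarter reporting one.
records = (
    [("40-59", 1)] * 30 + [("40-59", 0)] * 30
    + [("other", 1)] * 10 + [("other", 0)] * 30
)
sample_share = {"40-59": 0.6, "other": 0.4}   # shares in the toy poll
target_share = {"40-59": 0.3, "other": 0.7}   # assumed reference shares

if __name__ == "__main__":
    estimate, interval = weighted_bootstrap(records, sample_share, target_share)
    print("raw share infected:", mean(v for _, v in records))
    print("reweighted bootstrap estimate:", estimate, "interval:", interval)
```

Comparing the raw share with the reweighted estimate gives a sense of how much a skewed sample shifts the overall figure. In this toy setup the reweighted expectation is 0.3 × 0.5 + 0.7 × 0.25 = 0.325, against a raw share of 0.4.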
Interpretation
We found SMPs to be a practical and cost-effective method for quickly gathering epidemiological insights. In particular, self-reported infection frequency can be informative during phases of an epidemic when self-tests are widely available. SMPs alone are arguably insufficient for comprehensive public health modeling because they do not allow real-time monitoring of, for example, population-based infection frequency estimates derived from serological indicators. However, they complement traditional methods by providing near-real-time data at low cost to guide interventions, inform policymaking, and refine epidemiological models. Further refinement and integration with established approaches could enhance their utility for public health decision-making.
Funding
This study was conducted within the infoXpand project, which is funded by the German Federal Ministry of Education and Research (funding codes 031L0300A, 031L0300C, 031L0300D). This work was additionally supported by the Helmholtz Association, the European Union’s Horizon 2020 research and innovation program (grant number 101003480), and intramural funds of the Helmholtz Center for Infection Research.