Disentangling the rhythms of human activity in the built environment for airborne transmission risk: An analysis of large-scale mobility data

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This is a valuable study characterizing seasonal deviations in indoor activity at the county level in the United States with relevance to respiratory disease transmission. Whereas the data are compelling, some of the main claims are only partially supported and need more work. This study and its results are of potential interest to those people constructing more evidence-based infectious disease transmission models.

Abstract

Since the outset of the COVID-19 pandemic, substantial public attention has focused on the role of seasonality in impacting transmission. Misconceptions have relied on seasonal mediation of respiratory diseases driven solely by environmental variables. However, seasonality is expected to be driven by host social behavior, particularly in highly susceptible populations. A key gap in understanding the role of social behavior in respiratory disease seasonality is our incomplete understanding of the seasonality of indoor human activity.

Methods:

We leverage a novel data stream on human mobility to characterize activity in indoor versus outdoor environments in the United States. We use an observational mobile app-based location dataset encompassing over 5 million locations nationally. We classify locations as primarily indoor (e.g. stores, offices) or outdoor (e.g. playgrounds, farmers markets), disentangling location-specific visits into indoor and outdoor, to arrive at a fine-scale measure of indoor to outdoor human activity across time and space.

Results:

We find the proportion of indoor to outdoor activity during a baseline year is seasonal, peaking in winter months. The measure displays a latitudinal gradient with stronger seasonality at northern latitudes and an additional summer peak in southern latitudes. We statistically fit this baseline indoor-outdoor activity measure to inform the incorporation of this complex empirical pattern into infectious disease dynamic models. However, we find that the disruption of the COVID-19 pandemic caused these patterns to shift significantly from baseline and the empirical patterns are necessary to predict spatiotemporal heterogeneity in disease dynamics.

Conclusions:

Our work empirically characterizes, for the first time, the seasonality of human social behavior at a large scale with a high spatiotemporal resolution and provides a parsimonious parameterization of seasonal behavior that can be included in infectious disease dynamics models. We provide critical evidence and methods necessary to inform the public health of seasonal and pandemic respiratory pathogens and improve our understanding of the relationship between the physical environment and infection risk in the context of global change.

Funding:

Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number R01GM123007.

Article activity feed

  1. Author Response

    Reviewer #2 (Public Review):

    Susswein et al. analyze a fine-scale, novel data stream of human mobility, openly available from SafeGraph, based on the usage of mobile apps with GPS and sampled from over 45 million smartphone devices. They define a metric $\sigma_{it}$, properly normalized, that quantifies the propensity for visits to indoor locations relative to outdoor locations in a given county $i$ at week $t$. For each pair of counties $i$ and $j$, they compute the Pearson correlation coefficient $\rho_{ij}$ between the corresponding $\sigma$ metrics. This generates a correlation matrix that can be interpreted as the adjacency matrix of a network. They then perform community detection on this network/matrix, effectively clustering together time series that are correlated. This identifies three main clusters of counties, located geographically in the north of the country, in the south of the country, and possibly in tourism-active areas. They then show, via a simple model, how including over-simplified models of seasonality may affect infectious disease models.
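
    The pipeline summarized above, from visit counts to a county-by-county correlation matrix, can be sketched as follows. This is a minimal illustration, not the authors' code: the normalization used here (indoor share of visits divided by each county's own mean share over time) is an assumption standing in for "properly normalized", and the function names and toy data are hypothetical.

```python
import numpy as np

def indoor_activity_metric(indoor, outdoor):
    """Assumed form of sigma_it: indoor share of visits, scaled by the county's mean share.

    indoor, outdoor: visit counts with shape (n_counties, n_weeks).
    """
    indoor = np.asarray(indoor, dtype=float)
    outdoor = np.asarray(outdoor, dtype=float)
    share = indoor / (indoor + outdoor)
    # Dividing by the per-county mean makes counties comparable (each row averages 1).
    return share / share.mean(axis=1, keepdims=True)

def county_correlations(sigma):
    """Pearson correlation rho_ij between the weekly series of every pair of counties."""
    return np.corrcoef(sigma)  # rows are treated as variables (counties)

# Toy data: two winter-peaking counties and one summer-peaking county.
weeks = np.arange(52)
cycle = np.cos(2 * np.pi * weeks / 52)
indoor = np.vstack([70 + 20 * cycle, 60 + 10 * cycle, 70 - 20 * cycle])
outdoor = 100 - indoor

sigma = indoor_activity_metric(indoor, outdoor)
rho = county_correlations(sigma)
```

    In this toy example the two winter-peaking counties are strongly positively correlated with each other and negatively correlated with the summer-peaking one, which is the kind of structure a clustering step would then pick up.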

    This work is very interesting for the infectious disease modeling community, as it addresses a complex problem introducing a new data stream.

    This work builds on several strengths, among which:

    It is the first analysis of the SafeGraph dataset to capture seasonality in indoor behavior.

    It provides a simple metric to quantify indoor activity that, thanks to the dataset, can be computed with a high level of spatial detail.

    It aims at characterizing clusters of counties with a similar pattern of indoor activity.

    It aims at quantifying the impact of neglecting finer-scale patterns of seasonality, for example considering seasonality to be homogeneous at the US level.

    We thank the reviewer for the positive review of our work.

    At the same time, it presents several weaknesses that should be addressed to improve the methodology, its results, and the implication:

    There is no quantitative comparison of the newly introduced metric for indoor activity with other proxies of seasonality (e.g. temperature or relative humidity). The (dis)similarity with other proxies may help in assessing the importance of this metric, showing why it cannot be exchanged with other data sources (like temperature data) that are widely available and are not affected by sampling issues (more on that later).

    We have now added a supplementary figure (Figure S3) to illustrate how indoor activity seasonality compares with temperature and humidity. We have also added text to the Results and the Discussion to address this point.

    A major flaw of the analysis is to perform community detection on a network defined by the correlation between time series with an algorithm that is based on modularity optimization. As explained in MacMahon and Garlaschelli [1], all modularity optimization methods rely on null assumptions that in the case of correlation between time series are violated. Therefore, there is a very strong potential bias in their results that is not accounted for. Possible solutions could be to proceed via the methodology presented in [1] or via a different type of algorithm (e.g. Infomap [2]). In both cases, as the network is thresholded (considering only a correlation larger than 0.9), a more quantitative assessment of the impact of the threshold value should be included.

    References

    [1] Mel MacMahon and Diego Garlaschelli, Phys. Rev. X 5, 021006 (2015).

    [2] Martin Rosvall and Carl T. Bergstrom, PNAS 105, 1118 (2008).

    We thank the reviewer for making this excellent point. We have now added Supplementary Figures S13 and S14. In Figure S13, we demonstrate the robustness of our clustering results to different correlation thresholds. (We have also corrected a typo in our original Methods section, which mistakenly stated our correlation threshold as 0.9 rather than the 90th percentile that we actually used.) In Figure S14, we show the clustering results obtained with a different clustering algorithm: to test a non-network-based approach, we use hierarchical clustering and find a partition of the US consistent with our main results.
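
    The distinction between a fixed cutoff of 0.9 and a 90th-percentile cutoff can be made concrete with a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the input is random toy data rather than the SafeGraph series, and the sweep over percentiles merely mimics the idea of a threshold sensitivity check, not the analysis behind Figure S13.

```python
import numpy as np

def threshold_by_percentile(rho, q=90.0):
    """Keep only edges whose correlation is at or above the q-th percentile of all pairs."""
    rho = np.asarray(rho, dtype=float)
    upper = np.triu_indices_from(rho, k=1)  # each county pair once, diagonal excluded
    cutoff = np.percentile(rho[upper], q)
    adjacency = np.where(rho >= cutoff, rho, 0.0)
    np.fill_diagonal(adjacency, 0.0)  # no self-loops
    return adjacency, cutoff

def edge_density(adjacency):
    """Fraction of county pairs that remain connected after thresholding."""
    n = adjacency.shape[0]
    n_edges = np.count_nonzero(np.triu(adjacency, k=1))
    return n_edges / (n * (n - 1) / 2)

# Toy data: 40 unrelated random series standing in for county time series.
rng = np.random.default_rng(0)
rho = np.corrcoef(rng.normal(size=(40, 52)))

# Sweep the percentile to see how many edges survive at each threshold.
densities = {q: edge_density(threshold_by_percentile(rho, q)[0]) for q in (80, 90, 95)}
```

    A percentile cutoff retains a fixed share of the strongest edges whatever the overall level of correlation, whereas a fixed cutoff such as 0.9 can retain almost all edges or almost none depending on the data.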

    It is not clear what the added value of the data on indoor activity is, as no fitting to real data is performed. Although this may be considered beyond the scope of this paper, I think it would be crucial to quantify how much better a data-informed model would describe real epidemic data (for example in the case of COVID-19). For now, only the impact of neglecting heterogeneity in indoor activity is shown, comparing a model with region-average parameters vs a model with county-level average parameters. Given that the dataset comes with potential bias in sampling (more on this later) it would be good to assess how well it predicts real epidemic spread. When showing results from different models, no visible errors are shown on the plot. How have the errors been estimated?

    We appreciate this point by the reviewer, and agree that future work will have to consider how indoor activity seasonality affects our ability to capture observed transmission trends. However, such work would additionally need careful characterization of other seasonal factors hypothesized to drive transmission (including environmental and other behavioral factors), and is beyond the scope of our work. Instead, in Figure 4 we aim to (a) provide the infectious disease modeling community with empirically-inferred parameters for a simple sinusoidal model which is commonly used in infectious disease models to capture transmission seasonality; and (b) demonstrate the implications of ignoring geographic heterogeneity in transmission seasonality in theoretical models of disease dynamics, which are commonly used for scenario analysis and model-based intervention design. As we demonstrate, transmission seasonality described by such sinusoidal models, even when they are empirically characterized as in our case, can lead to meaningfully different epidemic dynamics when transmission seasonality varies from the assumptions.

    Additionally, there is no uncertainty shown in Figure 4B because transmission seasonality is based either on a single empirical data point per time step or on the fitted sinusoidal model (where the estimated parameters have negligible standard errors).
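
    As an illustration of what fitting a simple sinusoidal seasonality model involves, the sketch below estimates a mean, amplitude, and phase by linear least squares. It is not the authors' fitting procedure: the parameterization, the 52-week period, the function name, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def fit_sinusoid(t, y, period=52.0):
    """Least-squares fit of y ~ mean + amplitude * sin(2*pi*t/period + phase)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    omega = 2 * np.pi / period
    # Writing the sinusoid as a*sin + b*cos makes the problem linear in (mean, a, b).
    design = np.column_stack([np.ones_like(t), np.sin(omega * t), np.cos(omega * t)])
    (mean, a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    amplitude = float(np.hypot(a, b))
    phase = float(np.arctan2(b, a))  # a = A*cos(phase), b = A*sin(phase)
    return float(mean), amplitude, phase

# Synthetic two-year weekly series with known parameters plus a little noise.
rng = np.random.default_rng(1)
t = np.arange(104)
y = 1.0 + 0.3 * np.sin(2 * np.pi * t / 52 + 1.2) + rng.normal(scale=0.02, size=t.size)

mean, amplitude, phase = fit_sinusoid(t, y)
```

    Because the model is linear in the sine and cosine coefficients, no iterative optimizer or starting values are needed; the amplitude and phase are recovered from the two coefficients afterwards.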

    The dataset is presented as representative of the US population. However, this has not been assessed over time. As adherence to social distancing is influenced by several socio-economic determinants the lack of representativity in certain strata of the population at a given time may introduce an important bias in the dataset. Although this is an inherent limitation of the dataset, it should be discussed in the paper more thoroughly.

    We agree with the reviewer that this is a limitation. However, we have no way of assessing demographic representation in the dataset over time. We have instead added a sentence to the Discussion section acknowledging this point.

    In conclusion, I think that the methodology should be revised to account for the fact that the analysis is performed on a correlation matrix. Capturing seasonal patterns of indoor activity can help in tackling the crucial problem of seasonality in human behavior. This could help in identifying effective strategies of disease containment able to curb disease spread at a lower societal cost than fully-fledged lockdowns.

    We thank the reviewer again for their helpful suggestions.

  2. eLife assessment

    This is a valuable study characterizing seasonal deviations in indoor activity at the county level in the United States with relevance to respiratory disease transmission. Whereas the data are compelling, some of the main claims are only partially supported and need more work. This study and its results are of potential interest to those people constructing more evidence-based infectious disease transmission models.

  3. Reviewer #1 (Public Review):

    In this article, Susswein and colleagues use SafeGraph mobile device location data to characterize seasonal trends in indoor activity in the United States at the county level with relevance for respiratory disease transmission. They find substantial variation in indoor activity over the course of the year, ranging from roughly 25% (summer trough) to 200% (winter peak) of the average/baseline indoor activity in each county. Additionally, they identify two main regions with distinct seasonal trends in indoor activity: one in the north, where indoor activity follows a roughly standard sinusoidal trend, and one in the south where indoor activity may feature an additional summer peak. They also identify a third minor region with spring and fall peaks in indoor activity, corresponding to mountainous areas that are hubs for winter tourism. Using a simple mathematical disease transmission model, they demonstrate that using different seasonal forcing terms as inputs can yield substantially different epidemic curves.
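
    The final point, that different seasonal forcing terms can produce different epidemic curves, can be illustrated with a generic SIR model. This is a sketch under assumed parameters, not the model used in the paper: the cosine forcing, the transmission and recovery rates, and the function names are all hypothetical choices for the example.

```python
import math

def seasonal_beta(beta0, amplitude, peak_day=0.0, period=365.0):
    """Return a transmission rate beta(t) with cosine seasonal forcing (assumed form)."""
    def beta(t):
        return beta0 * (1.0 + amplitude * math.cos(2 * math.pi * (t - peak_day) / period))
    return beta

def simulate_sir(beta, gamma=0.2, i0=1e-4, days=365, dt=0.1):
    """Forward-Euler SIR on population fractions; returns the infected fraction once per day."""
    s, i, r = 1.0 - i0, i0, 0.0
    steps_per_day = round(1 / dt)
    infected = []
    for step in range(days * steps_per_day):
        if step % steps_per_day == 0:
            infected.append(i)
        t = step * dt
        new_infections = beta(t) * s * i * dt
        new_recoveries = gamma * i * dt
        s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
    return infected

# Same average transmission rate, with and without seasonal forcing.
flat = simulate_sir(seasonal_beta(0.3, 0.0))
forced = simulate_sir(seasonal_beta(0.3, 0.4))
peak_shift_days = forced.index(max(forced)) - flat.index(max(flat))
```

    With these assumed values, the forced run starts during the high-transmission season, so its epidemic peak arrives earlier and is larger than in the unforced run, even though the average transmission rate is the same.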

    This study's main strength is the volume and resolution of the data. Because of this, the authors are able to provide convincing evidence that seasonal variation in indoor activity exists, that it is substantial, and that it varies geographically across the US. Another important strength is the approach that the authors used to identify regions with similar seasonal trends in indoor activity. By using a network community detection algorithm, they were able to avoid making a priori assumptions about the number, size, and geographic connectivity of the regions, allowing them to make better use of the data itself to inform the delineation between regions.

    Despite the volume of the underlying dataset, it is geographically limited to the United States and only captures the locations of mobile devices for which their users have opted in to sharing location data. This calls into question the generalizability of the findings to other countries and to other populations within the US that may have reduced access to mobile devices or may be less likely to share location data. The assessment of between-county differences in seasonal indoor activity trends and the assessment of the impact of the COVID-19 pandemic on indoor activity could benefit from greater detail, as they currently rest mainly on visual inspection of the trends.

    Overall, the authors have largely achieved their aims of characterizing indoor seasonal activity in the United States at a fine geographic resolution. This work will be immediately useful for the construction of more evidence-based infectious disease transmission models. The authors have made available their estimates of seasonal deviation in indoor activity at the county level, which can be incorporated directly into disease transmission models. Their descriptions are also sufficient for building models that do not incorporate the full county-level detail but nevertheless account for important regional differences in indoor activity across the US.

  4. Reviewer #2 (Public Review):

    Susswein et al. analyze a fine-scale, novel data stream of human mobility, openly available from SafeGraph, based on the usage of mobile apps with GPS and sampled from over 45 million smartphone devices. They define a metric $\sigma_{it}$, properly normalized, that quantifies the propensity for visits to indoor locations relative to outdoor locations in a given county $i$ at week $t$. For each pair of counties $i$ and $j$, they compute the Pearson correlation coefficient $\rho_{ij}$ between the corresponding $\sigma$ metrics. This generates a correlation matrix that can be interpreted as the adjacency matrix of a network. They then perform community detection on this network/matrix, effectively clustering together time series that are correlated. This identifies three main clusters of counties, located geographically in the north of the country, in the south of the country, and possibly in tourism-active areas. They then show, via a simple model, how including over-simplified models of seasonality may affect infectious disease models.

    This work is very interesting for the infectious disease modeling community, as it addresses a complex problem introducing a new data stream.

    This work builds on several strengths, among which:
    It is the first analysis of the SafeGraph dataset to capture seasonality in indoor behavior.
    It provides a simple metric to quantify indoor activity that, thanks to the dataset, can be computed with a high level of spatial detail.
    It aims at characterizing clusters of counties with a similar pattern of indoor activity.
    It aims at quantifying the impact of neglecting finer-scale patterns of seasonality, for example considering seasonality to be homogeneous at the US level.

    At the same time, it presents several weaknesses that should be addressed to improve the methodology, its results, and the implication:
    There is no quantitative comparison of the newly introduced metric for indoor activity with other proxies of seasonality (e.g. temperature or relative humidity). The (dis)similarity with other proxies may help in assessing the importance of this metric, showing why it cannot be exchanged with other data sources (like temperature data) that are widely available and are not affected by sampling issues (more on that later).
    A major flaw of the analysis is to perform community detection on a network defined by the correlation between time series with an algorithm that is based on modularity optimization. As explained in MacMahon and Garlaschelli [1], all modularity optimization methods rely on null assumptions that in the case of correlation between time series are violated. Therefore, there is a very strong potential bias in their results that is not accounted for. Possible solutions could be to proceed via the methodology presented in [1] or via a different type of algorithm (e.g. Infomap [2]). In both cases, as the network is thresholded (considering only a correlation larger than 0.9), a more quantitative assessment of the impact of the threshold value should be included.
    It is not clear what the added value of the data on indoor activity is, as no fitting to real data is performed. Although this may be considered beyond the scope of this paper, I think it would be crucial to quantify how much better a data-informed model would describe real epidemic data (for example in the case of COVID-19). For now, only the impact of neglecting heterogeneity in indoor activity is shown, comparing a model with region-average parameters vs a model with county-level average parameters. Given that the dataset comes with potential bias in sampling (more on this later) it would be good to assess how well it predicts real epidemic spread.
    When showing results from different models, no visible errors are shown on the plot. How have the errors been estimated?
    The dataset is presented as representative of the US population. However, this has not been assessed over time. As adherence to social distancing is influenced by several socio-economic determinants the lack of representativity in certain strata of the population at a given time may introduce an important bias in the dataset. Although this is an inherent limitation of the dataset, it should be discussed in the paper more thoroughly.

    In conclusion, I think that the methodology should be revised to account for the fact that the analysis is performed on a correlation matrix. Capturing seasonal patterns of indoor activity can help in tackling the crucial problem of seasonality in human behavior. This could help in identifying effective strategies of disease containment able to curb disease spread at a lower societal cost than fully-fledged lockdowns.

    References
    [1] Mel MacMahon and Diego Garlaschelli, Phys. Rev. X 5, 021006 (2015).
    [2] Martin Rosvall and Carl T. Bergstrom, PNAS 105, 1118 (2008).

  5. Reviewer #3 (Public Review):

    The authors used smartphone-based mobility data to assess indoor and outdoor activities. By doing so, they were able to show seasonality in the ratio between indoor and outdoor activities and to relate it to a certain extent to seasonality in infectious diseases. They were also able to show that data at the county level is necessary to achieve proper assessment of behavior and that the COVID pandemic considerably impacted behavior patterns.

    The major strength of the paper is the simplicity of the concept (proportion of indoor activities compared to outdoor activities), which makes it very straightforward to understand. Another strength is the considerable amount of data (5 million locations) that have been taken into account, and the comparison between the 3 years.

    There is nonetheless a limitation in the interpretation of the results, as the definition of indoor and outdoor is not always easy and, most importantly, home is not part of the considered locations. This limitation is clearly acknowledged by the authors and reflected in their discussion.

    The authors have been able to demonstrate how human behavior, among other factors, could influence seasonality, which is not strictly related to climate or weather conditions. Moreover, they used the results to show how COVID impacted behavior (whether because of the disease or non-pharmaceutical interventions), and how precise data are necessary to perform appropriate modelling.

    This is an important article, as it shows the potential influence of human behavior on infectious disease seasonality, and also presents a very straightforward method that could be reproduced easily.

    Finally, it also confirms the necessity of taking the seasonality of human behavior into account in future modelling, in order to provide relevant information to public health decision-makers.

  6. SciScore for 10.1101/2022.04.07.22273578

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: For community detection, we use the Louvain method [49], a multiscale method in which modularity is first optimized using a greedy local algorithm, on the similarity network with edge weights (i.e. time series correlations) using an igraph implementation in Python [50].
    Resource: Python (suggested: IPython, RRID:SCR_001658)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.