Species clustering, climate effects, and introduced species in 5 million city trees across 63 US cities

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This paper will be of interest to urban foresters, ecologists, and planners. It provides an urban tree dataset across US cities that can be used to address questions on urban biodiversity and ecosystem services. It contains clear descriptions about the data processing and structures in general, but would need further clarifications about the sample completeness and representativeness of the data.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

This article has been Reviewed by the following groups

Abstract

Sustainable cities depend on urban forests. City trees, pillars of urban forests, improve our health, clean the air, store CO2, and cool local temperatures. Comparatively less is known about city tree communities as ecosystems, particularly regarding spatial composition, species diversity, tree health, and the abundance of introduced species. Here, we assembled and standardized a new dataset of N = 5,660,237 trees from 63 of the largest US cities with detailed information on location, health, species, and whether a species is introduced or naturally occurring (i.e., “native”). We further designed new tools to analyze spatial clustering and the abundance of introduced species. We show that trees significantly cluster by species in 98% of cities, potentially increasing pest vulnerability (even in species-diverse cities). Further, introduced species significantly homogenize tree communities across cities, while naturally occurring trees (i.e., “native” trees) comprise 0.51–87.4% (median = 45.6%) of city tree populations. Introduced species are more common in drier cities, and climate also shapes tree species diversity across urban forests. Parks have greater tree species diversity than urban settings. Compared to past work which focused on canopy cover and species richness, we show the importance of analyzing spatial composition and introduced species in urban ecosystems (and we develop new tools and datasets to do so). Future work could analyze city trees alongside sociodemographic variables or bird, insect, and plant diversity (e.g., from citizen-science initiatives). With these tools, we may evaluate existing city trees in new, nuanced ways and design future plantings to maximize resistance to pests and climate change. We depend on city trees.

Article activity feed

  1. Author Response

    Reviewer #2 (Public Review):

    McCoy et al. has developed a new urban tree species database from existing city tree inventories. They designed procedures to collect and clean a large amount of data, i.e., more than five million trees from 63 US cities. They found that urban trees were significantly clustered by species in 93% of cities using the compiled data. They also showed that climate significantly shaped both nativity and tree diversity. Also, they identified the homogenization effect of the non-native species. The interest in patterns of urban biodiversity and its driving mechanism has been rising recently. This paper provides an important data source for addressing research questions on this topic. The finding presented by the authors exemplified its potential.

    Strengths

    Compared to the existing urban tree database, such as the one developed by Ossola et al.(Global Ecology and Biogeography 2020), the new database added information on spatial location, nativity statuses, and tree health conditions besides occurrences. The new information expands data usability and saves valuable time for researchers. The authors also make the tools available so others can use them to process their own data sets.

    Because of the added information, various analyses of the diversity pattern of urban trees and the potential driving mechanism could be conducted. The authors found that individual species nonrandomly clustered urban trees. This finding corroborates the existing knowledge that some common species dominate urban trees. Nevertheless, the authors showed that the dominance was apparent in the spatial dimension. The preliminary finding that the native status of a tree had no apparent impact on tree health is interesting. It can potentially contribute to the debate on native vs. exotic in urban tree species selection, which the author mentioned in the paper.

    Thank you for the feedback!

    Weakness

    While the new database and the analysis based on it has strengths, some aspects of the concepts and data analysis need to be clarified and extended.

    We appreciate these helpful comments and have made many changes in response, detailed below.

    First, the authors need to define several critical concepts used in the paper, including city trees, urban forests, biodiversity, and species diversity. The authors used city trees and urban forests interchangeably throughout the paper. Nevertheless, a widely accepted definition of the urban forest is:"All woody and associated vegetation in and around dense human settlements." Konijnendijk et al. had a good discussion on the terminology used in urban forestry (Urban Forestry & Urban Greening, 2006). Similarly, biodiversity is different from species diversity. Effective species number is a diversity indicator. Therefore, it is challenging to accept conclusions being drawn on biodiversity in urban forests without clear definitions.

    We appreciate these clarifications– we have clarified our terminology throughout and added these important definitions.

    • “...urban forests, which are the woody and associated vegetation in and around dense human settlements (Konijnendijk et al., 2006).”

    • “City tree communities, an essential component of urban forests, provide many services.”

    We replaced the term “biodiversity” throughout the text where really we meant to say “tree species diversity” or just “diversity.”

    Second, the tree inventories varied significantly regarding the number of records (214~720,140). The variation can be due to the actual variation of tree abundance in studied cities or incomplete inventories. Biases can be introduced into the findings when comparing these inventories without adjusting the unequal sample sizes. The authors did not detail how they dealt with this issue when conducting the analysis.

    We redid all of our relevant analyses and applied Chao’s rarefaction and extrapolation techniques throughout the manuscript. The (substantial) changes are fully described above in the “Essential Revisions” section. We also copy them here.

    First, we redid all of our diversity calculations applying Chao’s rarefaction and extrapolation techniques through the R package iNext. Therefore, our summary datasheet now has many new columns to include the following values for each city:

    ○ Effective species number:

    ■ Raw effective species number

    ■ Asymptotic estimate of effective species number with confidence interval

    ■ Estimate of effective species number for a given population size (37,000 trees– the median population size rounded to the nearest 1,000) with confidence interval

    ○ Species richness:

    ■ Raw species richness (number of species)

    ■ Asymptotic estimate of number of species with confidence interval

    ■ Estimate of number of species for a given population size (37,000 trees– the median population size rounded to the nearest 1,000) with confidence interval

    ○ The same for the native-only population of trees in each city (e.g., not just the raw effective number of native species but also the iNext estimates and confidence intervals)

    ○ Whether or not each of the values above was calculated using extrapolation or interpolation

    ○ Sample coverage estimates
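    As an illustration of the "raw" diversity values listed above, the following is a minimal sketch of Hill numbers (effective species numbers) of order q computed directly from species counts. It is not the authors' code and does not reproduce iNext's asymptotic or standardized estimates; the function name `hill_number` and the toy inventory are hypothetical, and the response does not state here which order q the reported "effective species number" corresponds to.

```python
import math
from collections import Counter

def hill_number(counts, q):
    """Empirical Hill number (effective number of species) of order q.

    counts: iterable of per-species tree counts (zeros are ignored).
    """
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # The q -> 1 limit is the exponential of Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))

# Hypothetical toy inventory: one species label per recorded tree.
trees = ["Acer rubrum"] * 50 + ["Quercus alba"] * 30 + ["Ginkgo biloba"] * 20
counts = list(Counter(trees).values())

richness = hill_number(counts, 0)  # q = 0: species richness
shannon = hill_number(counts, 1)   # q = 1: exp(Shannon entropy)
simpson = hill_number(counts, 2)   # q = 2: inverse Simpson concentration
```

    Hill numbers never increase with q, so richness is at least the q = 1 value, which is at least the q = 2 value. The gap between them reflects how strongly a few species dominate the inventory.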

    Second, we re-ran our models testing for significant correlations between species diversity in a city and other factors (including climate), where we used the extrapolated / interpolated effective species numbers from iNext. Specifically, we found the best-fit model, which included the following predictors: environmental PCA1, environmental PCA1:environmental PCA2, and whether or not a city was designated as a Tree City USA. Then, we ran this model under six sensitivity conditions, varying the dependent variable and/or which cities we included based on the completeness of their sample. Climate was still a significant correlate of diversity.

    ○ first, with dependent variable = effective species as calculated for a given population of 37,000 trees ("effective species for a standardized population size");

    ○ second, dependent variable = the asymptotic estimate of the effective species number for that city as calculated using iNext;

    ○ third, dependent variable = the raw effective species number;

    ○ fourth, excluding cities with fewer than 10,000 trees;

    ○ fifth, excluding cities with <50% spatial coverage;

    ○ sixth, excluding cities with <0.995 sample coverage as calculated by iNext.

    ○ For the fourth, fifth, and sixth models, the dependent variable was effective species for a standardized population size of 37,000 trees.

    Third, we redid our comparisons of tree populations in parks versus those in urban areas. Parks were still more diverse than urban areas.

    ○ Specifically, we used iNext to calculate diversity metrics based on the smaller of the two population sizes (park vs urban) to enable fair comparison for each city.

    ○ We reported comparison results for (i) raw effective species number, (ii) asymptotic estimate, and (iii) estimate for a given population.

    ○ In doing so, we eliminated Milwaukee from the comparison (it had only 28 trees recorded as being in an urban setting).

    Fourth, we redid our pairwise comparisons of tree community composition between cities in order to account for different population sizes and sampling efforts. To do so, we randomly subsampled the larger city to make its population equal to the smaller city, calculated comparison metrics, and repeated this process 50 times. We report the average comparison metrics.

    Our new Methods text is copied here for your convenience:

    ○ “Throughout our analyses, it was necessary to control for different sample sizes (and different, but unknown, sampling efforts across cities). To do so, we relied on the rarefaction / extrapolation methods developed by Chao and colleagues (Chao et al., 2015, 2014; Chao & Jost, 2012) and implemented through the R software package iNext (Hsieh et al., 2016). In short, these methods use statistical rarefaction and/or extrapolation to generate comparable estimates of diversity across populations with different sampling efforts or population sizes, alongside confidence intervals for these diversity estimates. iNext performs these tasks for Hill numbers of orders q = 0, 1, and 2. We used two techniques in iNext to allow for comparisons across cities (and between parks and urban areas within cities). First, we generated asymptotic diversity estimates for each; second, we generated diversity estimates for a given standardized population size. For our diversity analyses, the standardized population size we used was 37,000 trees (the rounded median of all cities). For analyses of the diversity of native trees, we used a standardized population size of 10,000 trees. For comparisons of the diversity between park and urban areas in a city, we used the smaller of the two population sizes (park or urban). In all cases we also recorded confidence estimates, and plotted rarefaction/extrapolation curves.

    ○ To control for variation in how uniformly trees were sampled across a city’s geographic range, we developed a procedure to score each city’s spatial coverage (see section Spatial Structure below).

    ○ We identified the best-fitting model, and then repeated our analysis under six sensitivity conditions to control for differences in population size, sampling effort, spatial coverage, and sample coverage. Our sensitivity analyses were as follows: first, with dependent variable = effective species as calculated for a given population of 37,000 trees ("effective species for a standardized population size"); second, dependent variable = the asymptotic estimate of the effective species number for that city as calculated using iNext; third, dependent variable = the raw effective species number; fourth, excluding cities with fewer than 10,000 trees; fifth, excluding cities with <50% spatial coverage; sixth, excluding cities with <0.995 sample coverage as calculated by iNext. For the fourth, fifth, and sixth models, the dependent variable was effective species for a standardized population size of 37,000 trees.”
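    To make the idea of a "standardized population size" concrete, here is a minimal sketch of classic individual-based rarefaction for species richness (q = 0 only). It uses the standard hypergeometric expectation rather than iNext, so it covers interpolation only: it does not extrapolate beyond the observed sample, handle q = 1 or 2, or produce confidence intervals. The function name `rarefied_richness` and the park/urban counts are hypothetical.

```python
from math import comb

def rarefied_richness(counts, m):
    """Expected number of species in a random subsample of m trees.

    counts: per-species tree counts for one inventory.
    m: subsample size; must not exceed the inventory size (no extrapolation).
    """
    n = sum(counts)
    if not 0 <= m <= n:
        raise ValueError("rarefaction only interpolates: need 0 <= m <= n")
    denom = comb(n, m)
    # For each species, 1 - P(species is absent from the subsample).
    return sum(1 - comb(n - c, m) / denom for c in counts if c > 0)

# Hypothetical inventories for one city (counts per species).
park_counts = [5, 5, 5, 5]        # 20 trees, 4 species
urban_counts = [90, 5, 3, 1, 1]   # 100 trees, 5 species

# Compare at the smaller of the two population sizes, as described above.
m = min(sum(park_counts), sum(urban_counts))
park_at_m = rarefied_richness(park_counts, m)
urban_at_m = rarefied_richness(urban_counts, m)
```

    In this toy case the urban inventory has more species in total, but at an equal sample size the park inventory is expected to show more, which is the kind of reversal that standardization is meant to expose.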

    Reviewer #3 (Public Review):

    This paper's strength is in the utility of the assembled datasets and some interesting and creative proof of concept analyses. This is an amazing resource for comparative analysis. However the paper felt a little sparse in the conceptual and methodological underpinnings of the questions asked to demonstrate the utility of the analysis. Specifically, I suggest:

    A) More substance in the introduction (currently only two short paragraphs) and a clear statement of research questions.

    We have added text to frame our goals and hypotheses:

    ○ “In particular, we wanted to know whether local climatic conditions are associated with the species diversity of city tree communities, how species diversity was distributed in space within cities, and whether introduced tree species contribute to biotic homogenization among urban ecosystems.”

    B) Add data on the extent to which each dataset represents a complete sample of each city's trees. I know some are complete inventories, but some consist of 720 trees and cannot be a complete sample. A column in the meta data indicating effort and if there were any bias in where sampling occurred if the dataset is not complete are needed for others to use this data appropriately. For example, we know tree cover/diversity increases with wealth (which the author rightly cites). Let's say in City X, trees were only inventoried in one wealthy neighborhood. They would not be a representative sample of the city and dataset users need to be aware of this before they draw incorrect conclusions about City X where the sample was biased compared to city Y where the inventory was complete, including a sampling of all affluent and poor areas. This is also needed to support the research questions throughout the paper.

    We completely agree, and have made two major changes in response.

    First, we redid all of our diversity analyses after applying Chao’s rarefaction and extrapolation methods to permit comparison between populations of different sizes and sampling efforts. We added new columns to our datasheet with sample coverage estimates, asymptotic estimates of diversity, and diversity estimates for a given population size.

    Second, we also examined spatial coverage in a city because of the valid concern you raised that trees may only be sampled from particular neighborhoods or areas. In short, we divided each city into grid cells, counted trees per grid cell, and calculated metrics of coverage (adjusted number of trees per grid cell, and proportion of grid cells that were empty) and bias (skew and kurtosis of the number of trees in occupied grid cells). These factors are presented in Spatial_Coverage_Supplement.zip. As you can see even just from a glance at the spatial coverage plots, some cities are indeed extremely biased! Therefore, we ran a sensitivity analysis where we excluded cities with <50% spatial coverage.
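    The grid-based procedure described above can be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: it overlays a rectangular grid on a bounding box (a real analysis would presumably clip to the city boundary), it omits the "adjusted" trees-per-cell metric because the adjustment is not defined here, and the names `grid_coverage`, `bounds`, and `cell_size` are hypothetical.

```python
import math
from collections import Counter

def grid_coverage(points, bounds, cell_size):
    """Summarize how evenly (x, y) tree locations fill a rectangular grid.

    points: iterable of (x, y) in projected units (e.g., meters),
            assumed to lie inside bounds.
    bounds: (xmin, ymin, xmax, ymax) rectangle standing in for the city.
    cell_size: side length of a square grid cell, same units as points.
    """
    xmin, ymin, xmax, ymax = bounds
    ncols = max(1, math.ceil((xmax - xmin) / cell_size))
    nrows = max(1, math.ceil((ymax - ymin) / cell_size))

    per_cell = Counter()
    for x, y in points:
        # Clamp so points on the far edge fall into the last cell.
        col = min(int((x - xmin) // cell_size), ncols - 1)
        row = min(int((y - ymin) // cell_size), nrows - 1)
        per_cell[(col, row)] += 1

    occupied = list(per_cell.values())
    if not occupied:
        raise ValueError("no points supplied")
    n = len(occupied)
    mu = sum(occupied) / n
    # Central moments of tree counts across occupied cells.
    m2 = sum((c - mu) ** 2 for c in occupied) / n
    m3 = sum((c - mu) ** 3 for c in occupied) / n
    m4 = sum((c - mu) ** 4 for c in occupied) / n
    return {
        "prop_empty": 1 - n / (ncols * nrows),
        "mean_per_occupied_cell": mu,
        "skew": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
    }

# Hypothetical example: four trees on a 2 x 2 grid of unit cells.
stats = grid_coverage(
    [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3), (1.5, 1.5)],
    bounds=(0.0, 0.0, 2.0, 2.0),
    cell_size=1.0,
)
```

    A threshold such as the <50% spatial coverage cutoff mentioned above could then be applied to `1 - prop_empty`, assuming coverage is defined as the share of occupied cells.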

    C) The authors chose to use effective species counts as their alpha diversity metric of choice. They explain why: "effective species counts (a measure that allows comparison between cities of different sizes)" (Ln 109). While effective species number is an excellent metric with much better behavior and attributes in linear modeling, I believe it is still strongly dependent on both city area and the number of individual trees sampled and so the above statement and all of the comparisons that flow out of it in the manuscript are currently unsupported. Just as species richness needs to be rarified or extrapolated to be compared at an equivalent # of individuals or area to be accurate so too does EFN (effective species count). Fortunately there is an R package (iNext) based on Chao's method (citation below) that makes it very easy to create effective species accumulation curves for each city by tree individuals sampled.

    a. Chao, Anne, Nicholas J. Gotelli, T. C. Hsieh, Elizabeth L. Sander, K. H. Ma, Robert K. Colwell, and Aaron M. Ellison. 2014. "Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies." Ecological Monographs 84 (1): 45-67. https://doi.org/10.1890/13-0133.1.

    b. The standardization (rarefaction/extrapolation) of EFN or richness for # individual trees sampled needs to be made for all analyses that make claims to compare diversity metrics across cities or between groups like urban and park areas (i.e. Fig 2a,b,c; Fig 3b; Fig 5a,b, S1a, S2a, S5, Table S2)

    c. If the authors have an argument for why diversity/area or diversity/sampling effort relationships do not apply for a particular question, then they should make that case instead.

    We very much appreciate this suggestion. Indeed, as described above, we applied Chao’s method to all of our analyses.

    D) The question posed by the Beta diversity analysis is fascinating (i.e., is it non-native species that are driving biotic homogenization across species?). However, while frequency (which I assume is relative abundance but maybe it is incidence data; please define) is used to deal with different sample sizes, consider whether it makes sense to include incomplete or very small city datasets in the analysis even with frequency data. For example, one city only has ~720 trees listed. If this is an incomplete dataset, which seems likely, it will probably be much more differentiated (overlap less) from another city with small numbers simply due to incomplete sampling. Diversity analysis in cities always requires tradeoffs and cannot be identical to methods used in "natural" forested ecosystems, but I encourage the authors to explore this a bit. Perhaps a sensitivity analysis could help where incomplete or small sample sizes are dropped or datasets are resampled via random draw to equalize sizes? The latter would handle incomplete samples but would not deal with bias in which neighborhoods were sampled (see point B above).

    Great suggestion. We redid this analysis using a random-draw approach, as you suggested, to equalize sizes. The new analysis found the same results as our old analysis, with slightly different values. The new method is described here:

    ○ “How similar are species compositions across cities? For N = 1953 city-city comparisons of street tree communities, we could calculate weighted measures of similarity because we had frequency data. We calculated similarity scores for the entire tree population, the naturally-occurring trees only, and the introduced trees only. We used chi-square distance metrics on species frequency data, and we controlled for different population sizes (and potentially, sampling efforts) between cities by sub-sampling the larger city 50 times to match the smaller city’s tree population size and calculating average metrics. In this manner we controlled for differences in sample size.”
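    The subsampling scheme quoted above can be sketched as follows. The exact chi-square distance formula the authors used is not given here, so this example assumes one common symmetric form computed on relative frequencies; the function names, the fixed random seed, and the toy cities are likewise hypothetical.

```python
import random
from collections import Counter

def chi_square_distance(counts_a, counts_b):
    """Symmetric chi-square distance between two species-frequency profiles.

    Assumed form: sum over species of (p - q)^2 / (p + q), where p and q are
    relative frequencies. It ranges from 0 (identical) to 2 (no shared species).
    """
    na, nb = sum(counts_a.values()), sum(counts_b.values())
    dist = 0.0
    for species in set(counts_a) | set(counts_b):
        p = counts_a.get(species, 0) / na
        q = counts_b.get(species, 0) / nb
        dist += (p - q) ** 2 / (p + q)
    return dist

def subsampled_distance(trees_a, trees_b, reps=50, seed=0):
    """Average distance after repeatedly subsampling the larger city.

    trees_a, trees_b: lists with one species label per tree.
    """
    rng = random.Random(seed)
    small, large = sorted((trees_a, trees_b), key=len)
    small_counts = Counter(small)
    total = 0.0
    for _ in range(reps):
        # Draw without replacement so the larger city matches the smaller.
        subsample = rng.sample(large, len(small))
        total += chi_square_distance(small_counts, Counter(subsample))
    return total / reps

# Hypothetical cities of unequal size.
city_a = ["oak"] * 60 + ["elm"] * 40
city_b = ["oak"] * 500 + ["maple"] * 300 + ["elm"] * 200
d = subsampled_distance(city_a, city_b)
```

    With 63 cities there are 63 × 62 / 2 = 1953 unordered city pairs, which matches the N reported in the quoted text.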

    E) Additional context/conceptual underpinning the clustering analysis would be great.

    a. The authors state in Line 390-395:"For city trees, which are often organized along grids or the underlying street layout of a city, this method can more meaningfully cluster trees than merely calculating the meters between trees and identifying nearest neighbors (which may be close as the crow flies but separated from each other by tall buildings)."- I very much agree with this sentiment and it is biologically meaningful for animal and plant dispersal, but as written it is unclear to me how the method described in the text "knows" that a tall building or elevation or some sort of feature exists to separate clusters rather than empty space or a ball field. Please clarify.

    We appreciate these comments, and we have added text and references for the interested reader. Here is the new description in full:

    ○ “We wanted to quantify the degree to which trees were spatially clustered by species within a city (rather than randomly arranged). To do so, we first clustered all trees within each city using hierarchical density based spatial clustering through the hdbscan library in Python (McInnes et al., 2017). HDBSCAN, unlike typical methods such as “k nearest neighbors”, takes into account the underlying spatial structure of the dataset and allows the user to modify parameters in order to find biologically meaningful clusters. For city trees, which are often organized along grids or the underlying street layout of a city, this method can more meaningfully cluster trees than merely calculating the meters between trees and identifying nearest neighbors (which may be close as the crow flies but separated from each other by tall buildings). In particular, using the Manhattan metric rather than Euclidean metrics improves clustering analysis in cities (which tend to be organized along city blocks). For further discussion of why hdbscan is preferable to other clustering metrics, see (Berba, 2020; McInnes et al., 2016; McInnes et al., 2017).”
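    To illustrate the metric choice mentioned in the quoted text, here is a small sketch comparing Euclidean and Manhattan distances for trees on a street grid. It shows only the distance calculation, not the clustering itself, and the coordinates are hypothetical projected positions in meters with streets assumed to run parallel to the axes.

```python
import math

def euclidean(a, b):
    """Straight-line ("as the crow flies") distance between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def manhattan(a, b):
    """Distance traveled along axis-aligned streets between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

# Hypothetical positions in meters on an axis-aligned street grid.
tree = (0.0, 0.0)
same_street = (70.0, 0.0)     # further along the same street
across_block = (50.0, 50.0)   # diagonally across a city block

euclid_same = euclidean(tree, same_street)
euclid_across = euclidean(tree, across_block)
manhattan_same = manhattan(tree, same_street)
manhattan_across = manhattan(tree, across_block)
```

    Under the Euclidean metric the two neighbors are almost equally distant, while under the Manhattan metric the tree on the same street is clearly closer. The metric has no knowledge of buildings; it simply penalizes diagonal displacement, and that penalty depends on the street grid being roughly aligned with the coordinate axes.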

    b. Would you ever expect composition to be truly random either in a city or a natural forest given environmental conditions etc.? In some sense, the ones closest to random are the most surprising. Can you dive into one to give an example of what is going on in that city?

    c. It seems like there are two metrics here- the size of the cluster and then the observed/expected EFN per cluster. The latter is analyzed in this paper but is there any important information in the former? It seems like an interesting structural measurement of the city and possibly useful in its own right.

    d. Are there any target levels of randomness? Could the authors suggest how this might be determined moving forward with their datasets to illustrate this for foresters?

    Great points. We have given a lot of thought to your comments; these are large and interesting questions! In the end, we think these questions fall mostly beyond the scope of this study, but we added a substantial amount of text to address your comments:

    ○ “Clustering by species is not necessarily a negative, nor indeed should we necessarily expect trees to be randomly arranged (see suggestions for further research in “Future Analyses” section below). Here, we take a first step toward making spatial clustering a metric of interest in city tree planning.”

    ○ “Researchers could also use this dataset to perform more refined analysis of clustering. For example, what is the biological significance of variation in cluster size (as determined by the hdbscan clustering algorithms)? The size and arrangement of the clusters themselves may be useful metrics. How clustered should we expect trees to be in both wild and urban settings? That is, what are our null expectations? Further, researchers could apply network theory to predict how pest species would proliferate through each of these cities (depending on the spatial arrangement of pest-sensitive trees).”

    F) The statement that this dataset enables "the design of rich heterogenous ecosystems built around urban forests" (Ln 72) seems strange. To my mind this tool will enable a more nuanced evaluation of the urban forests that already exist and suggest ways to target future plantings for increased resilience to climate, pest resistance, biodiversity support etc. I don't understand what ecosystem you would build around and not in the urban forest. If this is what is meant please elaborate. For example, do you mean non-tree installations?

    We agree with you and have changed the text as follows:

    ○ “With these tools, we may evaluate existing city tree communities with more nuance and design future plantings to maximize resistance to pests and climate change. We depend on city trees.”

  2. Evaluation Summary:

    This paper will be of interest to urban foresters, ecologists, and planners. It provides an urban tree dataset across US cities that can be used to address questions on urban biodiversity and ecosystem services. It contains clear descriptions about the data processing and structures in general, but would need further clarifications about the sample completeness and representativeness of the data.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

  3. Reviewer #1 (Public Review):

    McCoy et al. provided a dataset of urban trees across large US cities. Compared to past work focusing on canopy cover and species richness, this new dataset adds information on the trees' spatial distribution, nativity status, and health condition. The authors provided example analyses and relevant tools to explore how this new dataset can provide new insights on spatial composition and nativity status in urban forests. They also suggested potential avenues to combine this new dataset with other data sources (e.g., citizen-science data, social and demographic data) to ignite new research and discoveries on urban biodiversity, ecosystem services, and social science. The authors describe the data processing and structure clearly and offer good guidance on the use of this dataset through several example analyses. The analyses were sound in general but would need some further clarification.

  4. Reviewer #2 (Public Review):

    McCoy et al. has developed a new urban tree species database from existing city tree inventories. They designed procedures to collect and clean a large amount of data, i.e., more than five million trees from 63 US cities. They found that urban trees were significantly clustered by species in 93% of cities using the compiled data. They also showed that climate significantly shaped both nativity and tree diversity. Also, they identified the homogenization effect of the non-native species. The interest in patterns of urban biodiversity and its driving mechanism has been rising recently. This paper provides an important data source for addressing research questions on this topic. The finding presented by the authors exemplified its potential.

    Strengths

    Compared to the existing urban tree database, such as the one developed by Ossola et al.(Global Ecology and Biogeography 2020), the new database added information on spatial location, nativity statuses, and tree health conditions besides occurrences. The new information expands data usability and saves valuable time for researchers. The authors also make the tools available so others can use them to process their own data sets.

    Because of the added information, various analyses of the diversity pattern of urban trees and the potential driving mechanism could be conducted. The authors found that individual species nonrandomly clustered urban trees. This finding corroborates the existing knowledge that some common species dominate urban trees. Nevertheless, the authors showed that the dominance was apparent in the spatial dimension. The preliminary finding that the native status of a tree had no apparent impact on tree health is interesting. It can potentially contribute to the debate on native vs. exotic in urban tree species selection, which the author mentioned in the paper.

    Weakness

    While the new database and the analysis based on it has strengths, some aspects of the concepts and data analysis need to be clarified and extended.

    First, the authors need to define several critical concepts used in the paper, including city trees, urban forests, biodiversity, and species diversity. The authors used city trees and urban forests interchangeably throughout the paper. Nevertheless, a widely accepted definition of the urban forest is:"All woody and associated vegetation in and around dense human settlements." Konijnendijk et al. had a good discussion on the terminology used in urban forestry (Urban Forestry & Urban Greening, 2006). Similarly, biodiversity is different from species diversity. Effective species number is a diversity indicator. Therefore, it is challenging to accept conclusions being drawn on biodiversity in urban forests without clear definitions.

    Second, the tree inventories varied significantly regarding the number of records (214~720,140). The variation can be due to the actual variation of tree abundance in studied cities or incomplete inventories. Biases can be introduced into the findings when comparing these inventories without adjusting the unequal sample sizes. The authors did not detail how they dealt with this issue when conducting the analysis.

  5. Reviewer #3 (Public Review):

    This paper's strength is in the utility of the assembled datasets and some interesting and creative proof of concept analyses. This is an amazing resource for comparative analysis. However the paper felt a little sparse in the conceptual and methodological underpinnings of the questions asked to demonstrate the utility of the analysis.

    Specifically, I suggest:

    A) More substance in the introduction (currently only two short paragraphs) and a clear statement of research questions.

    B) Add data on the extent to which each dataset represents a complete sample of each city's trees. I know some are complete inventories, but some consist of 720 trees and cannot be a complete sample. A column in the meta data indicating effort and if there were any bias in where sampling occurred if the dataset is not complete are needed for others to use this data appropriately. For example, we know tree cover/diversity increases with wealth (which the author rightly cites). Let's say in City X, trees were only inventoried in one wealthy neighborhood. They would not be a representative sample of the city and dataset users need to be aware of this before they draw incorrect conclusions about City X where the sample was biased compared to city Y where the inventory was complete, including a sampling of all affluent and poor areas. This is also needed to support the research questions throughout the paper.

    C) The authors chose to use effective species counts as their alpha diversity metric of choice. They explain why: "effective species counts (a measure that allows comparison between cities of different sizes)" (Ln 109). While effective species number is an excellent metric with much better behavior and attributes in linear modeling, I believe it is still strongly dependent on both city area and the number of individual trees sampled and so the above statement and all of the comparisons that flow out of it in the manuscript are currently unsupported. Just as species richness needs to be rarified or extrapolated to be compared at an equivalent # of individuals or area to be accurate so too does EFN (effective species count). Fortunately there is an R package (iNext) based on Chao's method (citation below) that makes it very easy to create effective species accumulation curves for each city by tree individuals sampled.
    a. Chao, Anne, Nicholas J. Gotelli, T. C. Hsieh, Elizabeth L. Sander, K. H. Ma, Robert K. Colwell, and Aaron M. Ellison. 2014. "Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies." Ecological Monographs 84 (1): 45-67. https://doi.org/10.1890/13-0133.1.
    b. The standardization (rarefaction/extrapolation) of EFN or richness for # individual trees sampled needs to be made for all analyses that make claims to compare diversity metrics across cities or between groups like urban and park areas (i.e. Fig 2a,b,c; Fig 3b; Fig 5a,b, S1a, S2a, S5, Table S2)
    c. If the authors have an argument for why diversity/area or diversity/sampling effort relationships do not apply for a particular question, then they should make that case instead.

    D) The question posed by the Beta diversity analysis is fascinating (i.e., is it non-native species that are driving biotic homogenization across species?). However, while frequency (which I assume is relative abundance but maybe it is incidence data; please define) is used to deal with different sample sizes, consider whether it makes sense to include incomplete or very small city datasets in the analysis even with frequency data. For example, one city only has ~720 trees listed. If this is an incomplete dataset, which seems likely, it will probably be much more differentiated (overlap less) from another city with small numbers simply due to incomplete sampling. Diversity analysis in cities always requires tradeoffs and cannot be identical to methods used in "natural" forested ecosystems, but I encourage the authors to explore this a bit. Perhaps a sensitivity analysis could help where incomplete or small sample sizes are dropped or datasets are resampled via random draw to equalize sizes? The latter would handle incomplete samples but would not deal with bias in which neighborhoods were sampled (see point B above).

    E) Additional context/conceptual underpinning the clustering analysis would be great.
    a. The authors state in Line 390-395:"For city trees, which are often organized along grids or the underlying street layout of a city, this method can more meaningfully cluster trees than merely calculating the meters between trees and identifying nearest neighbors (which may be close as the crow flies but separated from each other by tall buildings)."- I very much agree with this sentiment and it is biologically meaningful for animal and plant dispersal, but as written it is unclear to me how the method described in the text "knows" that a tall building or elevation or some sort of feature exists to separate clusters rather than empty space or a ball field. Please clarify.
    b. Would you ever expect composition to be truly random either in a city or a natural forest given environmental conditions etc.? In some sense, the ones closest to random are the most surprising. Can you dive into one to give an example of what is going on in that city?
    c. It seems like there are two metrics here- the size of the cluster and then the observed/expected EFN per cluster. The latter is analyzed in this paper but is there any important information in the former? It seems like an interesting structural measurement of the city and possibly useful in its own right.
    d. Are there any target levels of randomness? Could the authors suggest how this might be determined moving forward with their datasets to illustrate this for foresters?

    F) The statement that this dataset enables "the design of rich heterogenous ecosystems built around urban forests" (Ln 72) seems strange. To my mind this tool will enable a more nuanced evaluation of the urban forests that already exist and suggest ways to target future plantings for increased resilience to climate, pest resistance, biodiversity support etc. I don't understand what ecosystem you would build around and not in the urban forest. If this is what is meant please elaborate. For example, do you mean non-tree installations?