Automating an insect biodiversity metric using distributed optical sensors: an evaluation across Kansas, USA cropping systems

Curation statements for this article:
  • Curated by eLife

    eLife Assessment

    The authors propose a new methodology to survey insects, using new sensors and analytical capabilities that could be valuable for addressing urgent conservation challenges. While the results of the optical sensors appear to be comparable to those obtained with classical survey methodologies, current analyses are considered incomplete.

This article has been Reviewed by the following groups

Abstract

Global ecosystems and food supply depend on insect biodiversity for key functions such as pollination and decomposition. High-resolution, accurate data on invertebrate populations and communities across scales are critical for informing conservation efforts. However, conventional data collection methodologies for invertebrates are expensive and labor intensive and require substantial taxonomic expertise, limiting researchers, practitioners, and policymakers. Novel optical techniques show promise for automating such data collection across scales as they operate unsupervised in remote areas. In this work, optical insect sensors were deployed in 20 agricultural fields in Kansas, USA. Measurements were compared to conventional assessments of insect diversity from sweep nets and Malaise traps. Species richness was estimated from optical insect data by applying a clustering algorithm to the optical insect sensor’s signal features of wing-beat frequency and body-to-wing ratio. Species richness correlated more strongly between the optical richness estimate and each of the conventional methods than between the two conventional methods, suggesting sensors can be a reliable indicator of invertebrate richness. Shannon and Simpson indices were calculated for all three methods but were largely uncorrelated, including between the two conventional methods. Although the technology is relatively new, optical sensors may provide next-generation insight into the spatiotemporal dynamics of invertebrate biodiversity and its conservation.

Article activity feed

  1. eLife Assessment

    The authors propose a new methodology to survey insects, using new sensors and analytical capabilities that could be valuable for addressing urgent conservation challenges. While the results of the optical sensors appear to be comparable to those obtained with classical survey methodologies, current analyses are considered incomplete.

  2. Reviewer #2 (Public review):

    Summary:

    The manuscript proposes a new technology to survey insects. The authors deployed optical sensors in agricultural landscapes and contrasted the results with those from classical Malaise trap and sweep net surveys. They found the optical sensor results to be comparable with those of the classical survey methodologies. The authors discuss the pros and cons of their near-infrared sensor.

    Strengths:

    Contrasting the results of the optical sensors with those from classical Malaise traps and sweep nets was a clever idea.

    Weaknesses:

    The submitted materials on Revision 1 (in particular the response to reviewers) are difficult to follow. I encourage the authors to provide a point-by-point response to the first set of comments, as well as to this second review.

    A new version of the manuscript needs to ensure that variability in the system (different crops) is taken into consideration. Also, a stronger analysis that reflects our current understanding of biodiversity metrics (including measures of sample coverage, sample completeness, and Hill numbers, among others) will be important to establish that the new methodology is suitable for use as a standard methodology.
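
    The metrics named above can be illustrated with a minimal sketch. It assumes per-taxon counts are available (the counts below are hypothetical, not data from the study) and uses the standard definition of Hill numbers together with Good's singleton-based estimate as one simple choice of sample coverage; nothing here reflects the authors' analysis.

```python
import math

def hill_number(counts, q):
    """Hill number (effective number of taxa) of order q from per-taxon counts."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        raise ValueError("at least one individual is required")
    props = [c / n for c in counts]
    if q == 1:
        # Order 1 is defined as the limit q -> 1: the exponential of Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))

def good_coverage(counts):
    """Good's estimate of sample coverage: 1 - (number of singletons / individuals)."""
    n = sum(counts)
    if n == 0:
        raise ValueError("at least one individual is required")
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / n

# Hypothetical counts for three taxa in one sample.
example_counts = [6, 3, 1]
richness = hill_number(example_counts, 0)          # order 0: plain richness
shannon_diversity = hill_number(example_counts, 1)  # order 1: exp(Shannon)
simpson_diversity = hill_number(example_counts, 2)  # order 2: inverse Simpson
coverage = good_coverage(example_counts)
```

    Orders 0, 1, and 2 weight rare taxa progressively less, so reporting all three alongside coverage shows how much a richness estimate depends on rare or undersampled taxa.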

    While this new version is stronger and much clearer, I also agree with Reviewer 1 that the usage of terminology is weak. The paper and the new methodology are sound. It is the application to real ecosystems, questions, and datasets that is not properly addressed in the manuscript.

  3. Author response:

    The following is the authors’ response to the original reviews.

    Reviewer 1:

    Authors reject the substance of Reviewer 1’s feedback, primarily because it reflects a clear lack of understanding of typical parameterization practices used to avoid overfitting. To ensure the accuracy of the Spearman rank correlation, 70% of all data was withheld from the optimization process and used solely for testing to yield Figure 6. Because the data was withheld prior to model parameterization, the analysis avoids Reviewer 1’s charge of “artificially forcing the correlation”. Authors did appreciate the request for clarification of additional definitions and the minor reorganization suggestions. Below we provide specific responses to each numbered point (note: multiple responses are provided for some of the reviewer points).

    Point 1: Clarify Metrics Definition and Evaluation

    Authors clarified the description of biodiversity metrics. The metrics associated with manual methods are detailed in the third paragraph of the Materials and Methods: Data Analysis section, while the sensor-based metric is described in the second paragraph, and summarized in its last sentence.

    Text Additions:

    Authors added clarification to the introduction’s first paragraph defining biodiversity metrics, including species richness.

    Authors added detailed definitions of community metrics and their significance in community ecology in the Materials and Methods section (3rd paragraph of “Data Analysis” section). The discussion was updated to include a reference to community ecology and the benefits of big data, specifically highlighting the potential of autonomous optical sensors in entomology.

    Methods Reorganization

    We have reorganized the Methods section for clarity. The updated section clarifies the metrics studied, the location and dates, and the descriptions and methods for optical sensors, Malaise traps, and sweep netting.

    Text Additions:

    An overview paragraph was added to “Data analysis” (3rd paragraph) detailing key metrics used, specifying metrics such as abundance, richness, Shannon index, and Simpson index.

    Visualization methods for sensor data to deliver analogous metrics of abundance, richness, and diversity indices were added to the “Data analysis” section.

    Supplementary Table 1 and the first paragraph of the Materials and Methods section cover location, dates, and other general information.

    Detailed descriptions and methods for optical sensors, Malaise traps, and sweeping are provided.
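
    For readers unfamiliar with the metrics listed above, the following is a minimal sketch of how abundance, richness, the Shannon index, and the Simpson index can be computed from a list of identified specimens. The taxon names are hypothetical, and the Simpson index is written here in its Gini-Simpson form (1 minus the sum of squared proportions), which is an assumption; the manuscript may use a different form.

```python
import math
from collections import Counter

def community_metrics(specimens):
    """Compute abundance, richness, Shannon index, and Gini-Simpson index.

    `specimens` is a list with one label per individual, e.g. a species name
    for a trap catch or a cluster label for a sensor event.
    """
    counts = Counter(specimens)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("at least one specimen is required")
    props = [c / n for c in counts.values()]
    return {
        "abundance": n,
        "richness": len(counts),
        "shannon": -sum(p * math.log(p) for p in props),
        "simpson": 1.0 - sum(p * p for p in props),
    }

# Hypothetical catch: ten individuals from three taxa.
sample = ["aphid"] * 6 + ["lady_beetle"] * 3 + ["hoverfly"] * 1
metrics = community_metrics(sample)
```

    The same function applies to sensor output if each insect event carries a cluster label in place of a species name, which is what makes the three methods comparable on one set of metrics.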

    Integration of Metrics

    In the 3rd paragraph of the discussion, authors integrated two paragraphs explaining the fundamental differences between the conventional methods and the presented method of biodiversity measurement.

    Point 2: Body-to-Wing Ratio Calculation

    The backscattered optical cross-section is now clearly defined as the value measured at the maximum point of the event. Specifically, we have added the word ‘maximum’ to our methods section for clarity.

    Point 3: Ecosystem Services Paragraph

    We have shortened and edited this paragraph for clarity. The revised text is now more straightforward and comprehensible.

    Point 4: Results Section Structure

    We believe restructuring the results section around each metric would result in redundancy. The value of our analysis is in the comparison of different methods; therefore, instead of discussing methods in isolation, we provide an integrated discussion and comparison of all three methods across all metrics. We have maintained our current structure but ensured that the metrics are consistently described and analyzed.

    Point 5: Abundance Correlation

    We agree that the lack of a correlation between methods for abundance remains an open question. However, we maintain that fitting a linear model would be inappropriate and potentially misleading in the absence of significant correlation. We have clarified this in our manuscript.

    Point 6: Richness and Diversity Evaluations

    The authors disagree with Reviewer 1's feedback, citing a clear misunderstanding of standard parameterization practices used to prevent overfitting. Specifically, authors implemented a 30/70 training/testing split. Therefore, only 30% of the data was used to fit the model, and 70% of the dataset was reserved for testing to ensure the validity and reliability of our clustering results. By validating with a 70% testing dataset, we ensure that the clustering model can accurately group new data points and is robust against overfitting. This process helps verify that the identified clusters are meaningful and consistent across different subsets of the data. Spearman's rho converts the data values into ranks and does not assume a linear relationship between the variables or require the data to follow a normal distribution. By focusing on ranks, Spearman's rank correlation offers robustness against non-linearity and outliers. This approach is explained in the 4th paragraph of the “Data Analysis” section.
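
    The two ingredients described above, a 30/70 hold-out split and a rank-based correlation, can be sketched as follows. This is an illustrative sketch only: the unit that is split (here generic items, such as site identifiers), the seed, and the function names are assumptions and are not taken from the manuscript.

```python
import random

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def split_holdout(items, train_fraction=0.3, seed=0):
    """Shuffle items and return (training, testing) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# A monotonic but non-linear relationship: rho is 1 while Pearson's r is not.
linear = [1, 2, 3, 4, 5]
cubed = [v ** 3 for v in linear]
rho = spearman_rho(linear, cubed)
r = pearson(linear, cubed)

# Parameters would be tuned on `train` only; `test` is touched once, at the end.
train, test = split_holdout(range(20))
```

    The hold-out only protects against overfitting if no information from the testing portion reaches parameter selection, which is why the split is made before any tuning.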

    Point 7: Clustering Method Credibility

    Authors acknowledge the variability in optical sensor features. However, the Law of Large Numbers supports increased accuracy and stability of insect measurements from optical insect sensors, owing to the larger number of observations they make compared to conventional methods. The manuscript now includes a detailed discussion of these aspects in the 3rd paragraph of the discussion, emphasizing the correlation observed despite variability.

    Reviewer 2:

    Authors appreciate Reviewer 2’s feedback, especially regarding contextualization. While authors disagree with the need for more specific experimental questions in a methods paper and with the suggested need for more complex analysis, we agree with the essence of the review and added text regarding potential questions, method applications, and ecosystem processes for contextualization.

    Point 1: Larger Question Framing

    We present this article as a methodological paper rather than asking a specific experimental question. This approach is justified by the generalizable nature of methods papers, akin to those describing ImageJ or mass spectrometers. The method is widely applicable to a range of scientific questions.

    We provided a discussion on how this technology could be applied in community ecology, conservation, and managed ecological systems like agriculture.

    In the Conclusion section we provided elaboration on the potential research questions and applications.

    Point 2: Complex Analyses

    While complex analyses like NMDS are useful for specific questions, this paper aims to establish the method. Once established, this method can be applied to various research questions in future studies. Therefore, as we are not directly asking an experimental question, more complex analysis is unnecessary.

    Point 3: Ecosystem Process (Granivory) Assay

    We have improved the contextualization and explanation of the ecosystem process assay throughout the manuscript, ensuring it is well-integrated and clear to readers.

  4. eLife assessment

    This study presents useful work comparing different techniques for monitoring insect species in agricultural settings, including a brand new one using optical sensors. That said, the data were analysed using an inadequately-described -- or potentially inadequate -- framework, and more careful thought must be given to the interpretation of the results before the new methodology can be used as a starting point for insect studies in agricultural fields and beyond.

  5. Reviewer #1 (Public Review):

    The article offers a comparative study between various methodologies to evaluate the abundance, richness, and diversity of insects from data obtained in a large-scale field experiment. The experiment is impressive in view of the number of locations, its spatial coverage, the number of instruments or methods used, and the data collected appears rich and worthy of multiple publications. The paper focuses on the validation of a novel approach based on optical sensors. These sensors collect the backscattered light from flying insects in their field of view and can retrieve the wingbeat frequency and the body-to-wing backscattering ratios.
    Unfortunately, the paper is poorly written and hard to read, with a lack of clear sections, and an overall confusing structure. The methods, metrics, and data analysis are not properly and thoroughly described, making it sometimes difficult to evaluate the validity of the approach.
    Most importantly, the methodology to retrieve the richness and diversity from optical sensors seems flawed. While the scope and scale of the experiment are valuable, I do not believe that this article supports the authors' claim. The main criticisms are described in more detail below.

    1. The Material and Method section is poorly structured. The article focuses on a series of metrics to evaluate biodiversity from three independent methods: optical sensors, malaise traps, and net sweeping. The authors need to provide a clear and thorough description of what the metrics to be studied are, and how those metrics are evaluated for each method. While it is the main focus of the paper, the term "biodiversity metrics" is never properly defined. It is used in the singular form in both the title and abstract, then in its plural form in the rest of the paper, making the reader further doubt what exactly it means. It is then discussed using the correlation value retrieved when studying richness, so is the biodiversity metric the same as richness? Studying biodiversity remains a complex and sometimes contentious subject, and the meaning of this term, especially when measured by three different methods, is far from obvious. The term "community metrics" is defined as abundance, richness, and diversity; is that the same as biodiversity metrics? In any case, the method section should thoroughly describe how each of those metrics is calculated from the raw data collected by each method. This information is somewhat there, but in a very unorganized way, making it difficult to read. I would recommend organizing this section into multiple clear sections: 1) describing the metrics that are meant to be studied, 2) the location, dates and time, type of crops, and other general information about the experiment, 3) description and methods around optical sensors, 4) description and methods around malaise traps, 5) description and methods around the sweeping. The last 3 sections should describe how each method retrieves the previously defined metrics, potentially using equations.

    2. Regarding the calculation of the body-to-wing ratio, sigma is described as a "signal" on line 195, then as intensity counts in Figure 2; isn't it really the backscattering optical cross-section? It changes significantly over time during the signal, so how is one value of sigma calculated? Is it the average of the whole insect event? The maximum?

    3. The "ecosystem services" paragraph is really confusing and needs to be rewritten.

    4. Like for the method section, the result section should be structured around the comparison of each metric, abundance, richness, and diversity, or any other properly defined metrics described in the method, so that the result section is consistent with the method section.

    5. The abundance is not correlated; interestingly, malaise traps and sweeping are even less correlated which further supports the claims by the authors that new and improved methods are needed. This part of the results could be further developed. A linear fit could be added to Figure 4.

    6. Richness and diversity are the most problematic. Again, the method is poorly described, with pieces of explanation spread out throughout the paper, but my understanding is the following: the optical sensor retrieves two features from each insect signal, wingbeat frequency (wbf) and body-to-wing ratio (BWR). Clustering is done using DBSCAN, which has 2 parameters: minimum number of signals and merge distance. It is important to note that these two parameters will greatly influence the number of clusters found by DBSCAN. The richness obtained by optical sensors is defined as the number of clusters, and the diversity is evaluated from it as well. Hence, both diversity and richness are greatly dependent on the chosen parameters. The DBSCAN parameters are chosen by maximizing the Spearman correlation between richness obtained by the optical sensors and richness by the capture methods. I see a major problem here: if you optimize the parameters, which directly impact the diversity and richness retrieved by optical sensors, to obtain the best correlation with either the richness or diversity of the other methods, you will automatically create a correlation between the richness and diversity retrieved by the optical sensors and by the alternative methods. The p-value in Figure 6 no longer represents the probability of the correlation hypothesis being false, since the whole process is based on artificially forcing the correlation from the start.

    7. In addition, the clustering method provides values higher than 80, which is quite unrealistic with just 2 features, wbf and BWR. It is clear from many studies using optical sensors that the features from optical sensors are subject to variability. Wbf naturally has some variance within the same species, not to mention temperature dependency. Backscattering cross sections will also depend heavily on the insect's orientation (facing or sideways) while crossing the cone of light, and, even though BWR is a ratio, the collection efficiency of the instrument telescope and the scattering efficiency of the target will be affected by the position of the insect within the cone of light, which will also add to the variability of the BWR. While those features can still be used, obtaining 80 clusters from two variables with such statistical fluctuations is simply not credible. Additional features could help, such as the two wavelengths, which are mentioned in the description of the optical sensor but never mentioned again.
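
    The parameter dependence raised in points 6 and 7 can be illustrated with a minimal, textbook-style DBSCAN sketch. It is not the authors' implementation: the two-dimensional points stand in for already-scaled (wbf, BWR) pairs and are entirely hypothetical, and `eps` and `min_samples` are the conventional names for the merge distance and the minimum number of signals.

```python
import math

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # Each point's neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # only core points expand a cluster
    return labels

def cluster_count(points, eps, min_samples):
    """Number of clusters found, ignoring noise (the 'richness' analogue)."""
    return len(set(dbscan(points, eps, min_samples)) - {-1})

# Hypothetical scaled feature pairs: two tight groups and one isolated event.
events = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (1.0, 0.0), (1.1, 0.0), (1.0, 0.1),
    (0.5, 2.0),
]
tight = cluster_count(events, eps=0.2, min_samples=3)
merged = cluster_count(events, eps=1.0, min_samples=3)
strict = cluster_count(events, eps=0.2, min_samples=4)
```

    On this toy data the same seven events yield two, one, or zero clusters depending only on the parameter pair, so any cluster-count richness has to be read together with how those parameters were chosen.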

    The conclusion then states that the study serves as the first field validation. I disagree; the abundance doesn't correlate, and the richness and diversity evaluations are flawed. While I do think there is great value in the work done by the authors through this impressive field experiment, and in general in their work toward the development of entomological optical sensors, I believe the data analysis and communication of the results do not support the conclusions drawn.

  6. Reviewer #2 (Public Review):

    Summary:

    The manuscript by Rydhmer et al. proposes a new technology to survey insects. The authors deployed optical sensors in agricultural landscapes and contrasted the results with those from classical Malaise trap and sweep net surveys. They found the optical sensor results to be comparable with those of the classical survey methodologies. The authors discuss the pros and cons of their near-infrared sensor.

    Strengths:
    Contrasting the results of optical sensors with those obtained with classical malaise and sweep nets was a clever idea.

    Weaknesses:
    Perhaps the most important shortcoming is the lack of a larger question that the new technology can help to answer. If the authors could frame their aims not only as a new tool to sample insects but also along the lines of a hypothesis to test in their (agricultural) field of research, this could be a more meaningful article.

    The second most important shortcoming is the lack of more complex analyses. The authors seem to be so fixed on counts of abundance and species that they miss the opportunity to look for more complex patterns in their data. The addition of a simple analysis like an NMDS (to test composition changes) could improve the manuscript significantly.
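
    An NMDS ordination starts from a matrix of pairwise dissimilarities between samples, and Bray-Curtis is a common choice for count data. The sketch below computes only that input matrix; the field names and counts are hypothetical, and the ordination step itself (which would be done with a dedicated library) is not shown.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: count} samples."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - 2.0 * shared / total

def dissimilarity_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis values, in the order given."""
    names = list(samples)
    return [[bray_curtis(samples[r], samples[c]) for c in names] for r in names]

# Hypothetical catches from three fields.
fields = {
    "field_a": {"aphid": 6, "lady_beetle": 3, "hoverfly": 1},
    "field_b": {"aphid": 2, "lady_beetle": 3, "thrips": 5},
    "field_c": {"moth": 4},
}
matrix = dissimilarity_matrix(fields)
```

    Values run from 0 (identical composition) to 1 (no shared taxa), so composition changes between crops would show up as structure in this matrix even when richness is the same.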

    The ecosystem process (granivory) assay is currently poorly contextualized and explained across the text; I was surprised to find this part in M&M without previous warning. It seems to me that adding this part could be a nice addition to the manuscript (see my comment above). But this needs to be explained better in all sections of the manuscript.

    As I think that addressing my previous points will reshape the manuscript in important ways, I refrain from giving more specific details at this point. But there are some! Maybe only to mention that Figures 4 and 6 would benefit from individual regressions by crop and Figure 5 from adding results from optical sensors.