Increased signal-to-noise ratios within experimental field trials by regressing spatially distributed soil properties as principal components

Jeffrey C Berry
Mingsheng Qi
Balasaheb V Sonawane
Amy Sheflin
Asaph Cousins
Jessica Prenni
Daniel P Schachtman
Peng Liu
Rebecca S Bart

Curated by eLife

Evaluation Summary:

The manuscript will be of interest to researchers focusing on understanding phenotypes using data collected from field studies. It provides a rigorous strategy for appropriately adjusting for confounding effects and for performing statistical analysis of noisy data from field plots.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)


Abstract

Environmental variability poses a major challenge to any field study. Researchers attempt to mitigate this challenge through replication. Thus, the ability to detect experimental signals is determined by the degree of replication and the amount of environmental variation (noise) within the experimental system. A major source of noise in field studies comes from the natural heterogeneity of soil properties, which creates microtreatments throughout the field. In addition, the variation within different soil properties is often nonrandomly distributed across a field. We explore this challenge through a sorghum field trial dataset with accompanying plant, microbiome, and soil property data. Diverse sorghum genotypes and two watering regimes were applied in a split-plot design. We describe a process of identifying, estimating, and controlling for the effects of spatially distributed soil properties on plant traits and microbial communities using minimal degrees of freedom. Importantly, this process provides a method with which sources of environmental variation in field data can be identified and adjusted for, improving our ability to resolve effects of interest and to quantify subtle phenotypes.
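
The abstract does not spell out the computation, so the following is only a minimal sketch of the general idea under stated assumptions: soil properties measured at every plot are standardized, collapsed into a first principal component (found here by power iteration rather than by any particular library), and that component's linear effect is regressed out of a plant trait. The helper names, the single-component choice, and the toy numbers are all hypothetical; the authors' model may retain several components and fit them jointly with the design terms.

```python
import math
import random


def standardize_columns(rows):
    """Center each soil property and scale it to unit sample variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def first_principal_component(rows, iters=500, seed=0):
    """Approximate the leading eigenvector of the correlation matrix by power iteration."""
    n, p = len(rows), len(rows[0])
    corr = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # arbitrary non-uniform starting vector
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def regress_out(trait, covariate):
    """Residuals of the trait after ordinary least squares on one covariate."""
    n = len(trait)
    mx, my = sum(covariate) / n, sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in covariate)
    slope = sum((x - mx) * (y - my) for x, y in zip(covariate, trait)) / sxx
    return [y - my - slope * (x - mx) for x, y in zip(covariate, trait)]


def soil_adjusted_trait(soil_rows, trait):
    """Hypothetical helper: score plots on soil PC1, then remove its effect from the trait."""
    z = standardize_columns(soil_rows)
    loadings = first_principal_component(z)
    scores = [sum(a * b for a, b in zip(row, loadings)) for row in z]
    return regress_out(trait, scores)


# Toy data: two soil properties track one field gradient; the trait follows the
# same gradient plus a +1 effect in alternating (treated) plots.
soil = [[0.1, -0.2], [1.0, -2.1], [2.2, -3.9], [2.9, -6.0],
        [4.1, -8.1], [5.0, -9.9], [5.8, -12.0], [7.1, -14.2]]
treated = [0, 1, 0, 1, 0, 1, 0, 1]
height = [10, 14, 16, 20, 22, 26, 28, 32]
adjusted = soil_adjusted_trait(soil, height)
```

In this toy case the gradient dominates the raw trait variance, while the adjusted values are mostly the treatment contrast; the sign of the principal component is arbitrary, which does not matter because the regression slope absorbs it.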

Version published to 10.7554/elife.70056 on eLife
Jul 12, 2022
eLife
Mar 31, 2022

Reviewer #1 (Public Review):

In this study, the authors took an experimental, empirical approach to tackle the thorny problem of micro-scale variation in soil properties within and among field plots that confounds statistical analyses. The issue is that in field experiments, small variation in one or more soil property variables can obscure true effects of experimental variables on plant phenotypes. In this case, the authors used sorghum (many accessions) as the focal plant and a drought vs. well-watered treatment as the main experimental variable. In effect, the genotype × drought treatment interaction was the central question of the study. Many other variables were measured, including the microbiome (the authors cite a preprint where this part of the study is explained in more detail, although, unless I am confused, there seems to have been an experimental treatment of microbes in the bioRxiv study; was this also the case in the present study, or did the microbes colonize the rhizosphere naturally?). Overall, the PC-based approach to denoising these kinds of datasets is sound and provides an important advance, in the sense that pulling out subtle phenotypic effects in field trials may now be more straightforward given the results of this study and the tools that the authors provide. The main result is that without their framework they would not have found the association between water treatment, plant growth, and Microvirga bacterial abundance; it would have been lost to the noise inherent in these kinds of large-scale experiments with relatively modest degrees of freedom.

Reviewer #2 (Public Review):

The authors present a simple statistical workflow to strengthen biological signals of interest by accounting for spatially-structured environmental heterogeneity in field settings. Oftentimes, environmental variation is not neatly partitioned among sections of an experimental plot (e.g., rows or blocks). As a result, statistical models that do not account for the shape of environmental "noise" across the landscape will poorly capture (and poorly control for) its confounding effects.

The presented approach addresses this problem by testing for spatial structure in each of a range of assayed environmental variables, collectively capturing the numerous spatially structured variables into a few principal components, and regressing out their effects on experimental outcomes. This is noteworthy because it focuses on measuring a suite of environmental characteristics and modeling their collective effects on the outcome directly, rather than attempting to account for them indirectly by modeling spatial variation in the outcome itself. The authors justifiably posit that their approach, in comparison to spatially-unaware models, has the potential to (1) boost signals of interest by reducing background noise and (2) reduce false positives that arise when unmodeled environmental variation correlates with the spatial distribution of treatment effects. Their application of this approach to a rich empirical dataset offers an opportunity to explore its utility.
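
The review does not name the test used to detect spatial structure in each soil variable. One common choice, assumed here purely for illustration, is global Moran's I with inverse-distance weights and a one-sided permutation p-value; a property with a positive, significant I would be a candidate for the principal component step. Plot coordinates are assumed to be distinct, and the toy field and variable name are hypothetical.

```python
import math
import random


def inverse_distance_weights(coords):
    """Spatial weight matrix w[i][j] = 1 / distance, with zeros on the diagonal."""
    n = len(coords)
    return [[0.0 if i == j else 1.0 / math.dist(coords[i], coords[j])
             for j in range(n)] for i in range(n)]


def morans_i(values, w):
    """Global Moran's I: values above zero mean nearby plots have similar values."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(map(sum, w))
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def moran_permutation_test(values, coords, n_perm=999, seed=1):
    """Compare the observed I with I after randomly reassigning values to plots."""
    w = inverse_distance_weights(coords)
    observed = morans_i(values, w)
    rng = random.Random(seed)
    shuffled = list(values)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, w) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


# Toy field: a 5 x 5 grid of plots with a smooth east-west gradient in one soil property.
coords = [(x, y) for x in range(5) for y in range(5)]
soil_property = [10.0 + 2.0 * x for x, _ in coords]
i_obs, p_value = moran_permutation_test(soil_property, coords)
```

Permutation keeps the set of values fixed and destroys only their spatial arrangement, so a small p-value indicates that the arrangement itself, not just the spread of values, is unusual.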

It should be noted that no major component of this approach is new, even in the very specific case of soil elemental composition and plant field trials. For example, Pauli et al. 2018 (G3, doi: 10.1534/g3.117.300479) used very similar methods to measure soil elements, interpolate missing data points, and account for the local soil environmental effects on phenotypes of interest. Additionally, Murren et al. 2020 (Am. J. Bot., doi: 10.1002/ajb2.1420) used principal components regression to reduce the dimensionality of environmental variation (including soil elemental properties) and quantify its effect on plant traits (in this case, fitness). The main advance of the current paper is demonstrating that integrating these approaches is a simple and effective way to address environmental variation that poses a nuisance to the study.

Strengths:

The authors describe a potential way to boost power and reduce false positives without the costly (and yet still imperfect!) approach of massively increasing experimental sample sizes. Additionally, the authors demonstrated that environmental states at unmeasured locations can be successfully interpolated from spatial relationships among the sampled locations, so their approach can be used even when it's infeasible to measure environmental variables for the majority of samples. Finally, their use of dimensionality reduction for environmental features (in this case, principal components analysis) allows very rich environmental profiles with many variables to be included in an analysis, without greatly increasing the complexity of the statistical models used to test for treatment effects. Overall, these advantages make the approach feasible and broadly applicable across field studies.
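
The review does not say which interpolator was used to predict soil values at unsampled plots. As a simpler stand-in for geostatistical methods such as kriging, the sketch below uses inverse distance weighting; the power parameter of 2, the grid layout, and the helper names are assumptions for illustration, not details from the study.

```python
import math


def idw_predict(samples, target, power=2.0):
    """Inverse-distance-weighted estimate of a soil property at `target`.

    samples: list of ((x, y), value) pairs from measured plots.
    """
    num = 0.0
    den = 0.0
    for location, value in samples:
        d = math.dist(location, target)
        if d == 0.0:
            return value  # target coincides with a measured plot
        weight = 1.0 / d ** power
        num += weight * value
        den += weight
    return num / den


def fill_grid(samples, width, height, power=2.0):
    """Predict the property for every plot on a width x height grid."""
    return {(x, y): idw_predict(samples, (x, y), power)
            for x in range(width) for y in range(height)}


# Two measured plots along one row; the plot midway between them gets their average.
measured = [((0, 0), 1.0), ((2, 0), 3.0)]
grid = fill_grid(measured, width=3, height=1)
```

Unlike kriging, inverse distance weighting does not estimate a variogram from the data, so it cannot report prediction uncertainty; it is shown only because it makes the "borrow from nearby samples" logic explicit in a few lines.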

Another advantage of this approach, relative to methods that account for spatial variation in outcomes without modeling environmental contributors, is the potential to reveal mechanistic insights. For example, the authors identify specific soil properties (e.g., phosphate levels) that correlate with individual plant or microbial community traits. First, this naturally generates additional hypotheses to test through future experiments. Second, it allows researchers to leverage previously known mechanistic insights into their experimental systems when choosing which environmental features to measure.

The authors demonstrate how their approach can be applied to a rich empirical dataset. As dependent variables, the dataset includes plant harvest traits, leaf traits, metabolomic profiles, and microbiomes of the plant root, adjacent soil, and root-soil interface. As independent variables, it includes plant genotypes, drought treatments, and their interaction. And as environmental variables, it includes soil composition properties. The authors demonstrate that a subset of the soil composition properties are spatially structured, and that accounting for their effects yields new insight. For example, the authors identify an OTU that is correlated with increased plant height, suggesting potential growth-promoting effects, but only when adjusting both OTU abundance and plant height for soil properties. Such promising results suggest that modest effort to measure and model spatially structured environmental variation can illuminate findings in real experimental settings that would otherwise be obscured, and that integrative approaches to measure a variety of environmental features can add significant value to a study.
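
Adjusting both variables for soil properties before correlating them amounts to a partial correlation computed from residuals. The minimal sketch below assumes a single soil principal-component score as the only covariate, and uses made-up numbers constructed so that a soil gradient pushes OTU abundance and plant height in opposite directions, masking a shared positive signal; it illustrates the logic only and does not reproduce the study's data.

```python
import math


def residuals(y, covariate):
    """Residuals of y after ordinary least squares on one covariate (with intercept)."""
    n = len(y)
    mx, my = sum(covariate) / n, sum(y) / n
    sxx = sum((c - mx) ** 2 for c in covariate)
    slope = sum((c - mx) * (v - my) for c, v in zip(covariate, y)) / sxx
    return [v - my - slope * (c - mx) for c, v in zip(covariate, y)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def soil_adjusted_correlation(otu, height, soil_pc):
    """Correlate OTU abundance and plant height after removing the soil PC from both."""
    return pearson(residuals(otu, soil_pc), residuals(height, soil_pc))


# Toy data: `shared` is a signal common to both traits and uncorrelated with soil.
soil_pc = [-2, -1, 0, 1, 2, -2, -1, 0, 1, 2]
shared = [1, -1, 1, -1, 0, -1, 1, -1, 1, 0]
otu = [5 + 2 * s + u for s, u in zip(soil_pc, shared)]
height = [100 - 3 * s + u for s, u in zip(soil_pc, shared)]

raw_r = pearson(otu, height)
adjusted_r = soil_adjusted_correlation(otu, height, soil_pc)
```

Here the raw correlation is negative because the soil gradient dominates both variables, while the residual correlation recovers the shared positive signal.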

Weaknesses:

The paper lacks some components that I would expect to see when presenting an analytical approach and arguing for its effectiveness. Because the study's purpose is framed as providing "a tool with which sources of environmental variation in field data can be identified and removed, improving our ability to resolve effects of interest and to quantify subtle phenotypes", critically and thoroughly evaluating its performance is of utmost importance. Below, I describe a few reasons to be a bit cautious when interpreting the results.

First, the extent to which the approach improves inferences of treatment effects is not comprehensively shown. Rather, aspects of a few of the most promising results from a series of tests are presented. A presentation of the method's performance across the full series of tests conducted (i.e., for each dependent variable considered for a given model formulation) is needed to truly understand how it strengthens the analyses. This would include a comparison of each statistical model before and after adjusting for environmental variability. Metrics such as the increase in variance explained by experimental treatments and the increase in the proportion of treatment effects that are significant, across the full range of tests, would complement the current presentation and better demonstrate the extent to which the approach boosts signals and reduces unexplained noise in field data.
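
One way to report the metric suggested here is the share of variance attributable to treatment (eta squared, from a one-way ANOVA decomposition) before and after soil adjustment. The sketch below is a hypothetical illustration with made-up numbers; it regresses out a single soil covariate assumed to be balanced across treatments, whereas a real analysis would fit treatment and soil terms in the same model so that a soil covariate correlated with treatment cannot absorb the treatment effect.

```python
def eta_squared(values, groups):
    """Fraction of the total sum of squares explained by group means."""
    n = len(values)
    grand = sum(values) / n
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = 0.0
    for g in set(groups):
        members = [v for v, k in zip(values, groups) if k == g]
        group_mean = sum(members) / len(members)
        ss_between += len(members) * (group_mean - grand) ** 2
    return ss_between / ss_total


def residuals(y, covariate):
    """Residuals of y after ordinary least squares on one covariate."""
    n = len(y)
    mx, my = sum(covariate) / n, sum(y) / n
    sxx = sum((c - mx) ** 2 for c in covariate)
    slope = sum((c - mx) * (v - my) for c, v in zip(covariate, y)) / sxx
    return [v - my - slope * (c - mx) for c, v in zip(covariate, y)]


# Toy split: 5 well-watered and 5 drought plots sharing the same soil values.
treatment = ["wet"] * 5 + ["dry"] * 5
soil = [-2, -1, 0, 1, 2] * 2
trait = [10 + (1 if t == "dry" else 0) + 3 * s for t, s in zip(treatment, soil)]

before = eta_squared(trait, treatment)
after = eta_squared(residuals(trait, soil), treatment)
```

Computing such a pair of numbers for every dependent variable, rather than for a few selected ones, is the kind of across-the-board summary the reviewer asks for.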

Second, comparisons to other approaches to account for spatial variation in field trials were not conducted, and a strong conceptual discussion of how, when, and why its performance may differ from these other approaches is lacking. Some other methods (e.g., Velazco et al. 2017, Theor. Appl. Genet; doi: 10.1007/s00122-017-2894-4) model how each focal observation differs from expectations based on spatially neighboring observations, but do not measure or directly model the sources of environmental variation (e.g., soil properties). This may have different strengths and weaknesses. For example, if the environmental variables with the largest effects are not measured, they would be missed in the current study's approach but could be indirectly accounted for by other methods. Knowledge about the relative performance of different approaches under realistic scenarios (perhaps both empirical and simulated) would be important to researchers considering use of the presented approach, who must gauge if the additional effort and resources needed to measure environmental parameters are likely to improve statistical inferences beyond other spatially-aware statistical methods that do not require measuring such covariates.
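
For contrast, the neighbour-based family of methods can be illustrated with the classic Papadakis nearest-neighbour covariate, assumed here as a representative example; it is not necessarily the model of Velazco et al. 2017. Each plot's covariate is the mean residual (plot value minus its genotype mean) of the adjacent plots in the same row, so it tracks local fertility without measuring any soil property and would then enter the genotype model as a regression term. The sketch assumes every row has at least two plots.

```python
def papadakis_covariate(values, genotypes):
    """Mean genotype-mean residual of each plot's immediate row neighbours.

    values and genotypes are listed in field order along one row of plots.
    """
    totals, counts = {}, {}
    for v, g in zip(values, genotypes):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    resid = [v - totals[g] / counts[g] for v, g in zip(values, genotypes)]

    covariate = []
    for i in range(len(resid)):
        # Edge plots have only one neighbour; interior plots average two.
        neighbours = [resid[j] for j in (i - 1, i + 1) if 0 <= j < len(resid)]
        covariate.append(sum(neighbours) / len(neighbours))
    return covariate


# Two genotypes alternating along a row with a left-to-right fertility trend.
row_values = [1.0, 2.0, 3.0, 4.0]
row_genotypes = ["A", "B", "A", "B"]
cov = papadakis_covariate(row_values, row_genotypes)
```

The trade-off the reviewer describes is visible here: this covariate needs no soil sampling and absorbs unmeasured drivers, but it says nothing about which environmental factor is responsible.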

Thus, while promising results in support of this approach are presented, a reliable picture of its overall effectiveness will require further investigation.

Reviewer #3 (Public Review):

This work provides a rigorous strategy to reduce the noise in data collected from field studies. Based on a sorghum field trial, the authors propose a tool to identify and estimate the effects of soil properties with minimal degrees of freedom. Such a procedure can help us better understand phenotypes of interest through appropriate statistical analyses.

Strengths: Accounting for the confounding effects from all kinds of variables is crucial for the analysis of field data. The tool presented in the manuscript can be used for field studies in general and furthermore, extended to other types of trials with minimal modifications.

Weakness: It would be helpful to include the full details of the statistical model in the manuscript.

Version published to 10.1101/2021.04.29.441834 on bioRxiv
Apr 30, 2021