The Fallacy and Bias of Averages on Vegetation Indices based Plant Phenotyping
Abstract
Background
Vegetation indices (VIs) from remote sensing are widely used for non-destructive plant phenotyping, and they are often averaged across a plot or image region to give a single value per plot. However, by Jensen’s inequality (also known as the “fallacy of the average”), this averaging can bias trait estimates when the relationship between a VI and the target trait is nonlinear. We systematically assessed the severity of this bias and tested a correction method. VI values were simulated from six beta distributions with varying shape and skewness, and were also taken from normalized difference vegetation index (NDVI) images of a paddy rice experiment to evaluate the bias under real conditions. Nonlinear link functions (concave, convex, logistic) with different noise levels were applied to model VI–trait relationships.
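The effect of Jensen’s inequality on an averaged VI can be illustrated with a minimal sketch. It is not the study’s simulation code: the beta parameters, the three link functions, and the helper name `jensen_gap` are assumptions chosen only for illustration. For a concave link the trait predicted from the mean VI overestimates the mean of the pixel-level traits, for a convex link it underestimates it, and for a linear link the two agree.

```python
import math
import random


def jensen_gap(pixels, link):
    """Return link(mean(pixels)) - mean(link(p) for p in pixels).

    A positive gap means averaging the VI first overestimates the
    plot-level trait; a negative gap means it underestimates it.
    """
    mean_vi = sum(pixels) / len(pixels)
    trait_from_mean_vi = link(mean_vi)
    mean_of_pixel_traits = sum(link(v) for v in pixels) / len(pixels)
    return trait_from_mean_vi - mean_of_pixel_traits


# Hypothetical plot: 10,000 pixel VI values from a right-skewed Beta(2, 5).
rng = random.Random(0)
pixels = [rng.betavariate(2.0, 5.0) for _ in range(10_000)]

# Illustrative link functions (assumed, not taken from the study).
def concave(v):
    return math.sqrt(v)

def convex(v):
    return v ** 2

def linear(v):
    return 2.0 * v + 1.0

gap_concave = jensen_gap(pixels, concave)  # expected sign: positive
gap_convex = jensen_gap(pixels, convex)    # expected sign: negative
gap_linear = jensen_gap(pixels, linear)    # zero up to rounding error

print(f"concave gap: {gap_concave:+.4f}")
print(f"convex gap:  {gap_convex:+.4f}")
print(f"linear gap:  {gap_linear:+.2e}")
```

The size of the gap depends on how spread out the pixel values are and how strongly the link curves over that range, which is why the bias varies with the VI distribution and the link function.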
Results
Averaging under nonlinear relationships reduced predictive performance, lowering the coefficient of determination (R²) between true and predicted traits by up to 82%. In the rice NDVI simulation, R² was reduced by up to 58% around the tillering stage. Our correction method, which predicts traits from VI before averaging, substantially mitigated the bias, improving R² by up to 0.68 depending on noise level, VI distribution, and link function. To facilitate application, we established an interactive R Shiny website that lets users quantify the potential bias and the efficacy of the correction in this workflow under their own research conditions.
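The idea of predicting traits from VI before averaging can be sketched as follows. This is a toy simulation under stated assumptions, not the study’s implementation: the logistic link is treated as known (in practice it would have to be estimated), R² is computed as 1 − SS_res/SS_tot, and all names, plot counts, beta parameters, and the noise level are hypothetical.

```python
import math
import random


def logistic_link(v, midpoint=0.5, steepness=12.0):
    """Assumed VI-to-trait link; midpoint and steepness are arbitrary."""
    return 1.0 / (1.0 + math.exp(-steepness * (v - midpoint)))


def r_squared(true_values, predicted_values):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(true_values) / len(true_values)
    ss_tot = sum((t - mean_true) ** 2 for t in true_values)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values))
    return 1.0 - ss_res / ss_tot


rng = random.Random(42)
true_trait, naive_pred, corrected_pred = [], [], []

for _ in range(200):  # 200 hypothetical plots
    # Each plot gets its own mean VI and its own within-plot spread.
    mean_vi = rng.uniform(0.3, 0.7)
    concentration = rng.uniform(1.0, 20.0)
    a, b = mean_vi * concentration, (1.0 - mean_vi) * concentration
    pixels = [rng.betavariate(a, b) for _ in range(500)]

    # Plot-level trait: mean of pixel-level traits plus measurement noise.
    pixel_traits = [logistic_link(v) for v in pixels]
    mean_pixel_trait = sum(pixel_traits) / len(pixel_traits)
    true_trait.append(mean_pixel_trait + rng.gauss(0.0, 0.02))

    # Naive workflow: average the VI, then apply the link.
    naive_pred.append(logistic_link(sum(pixels) / len(pixels)))
    # Corrected workflow: apply the link per pixel, then average.
    corrected_pred.append(mean_pixel_trait)

r2_naive = r_squared(true_trait, naive_pred)
r2_corrected = r_squared(true_trait, corrected_pred)
print(f"R2 naive (average first):     {r2_naive:.3f}")
print(f"R2 corrected (predict first): {r2_corrected:.3f}")
```

In this sketch the corrected workflow differs from the true trait only by the added noise, so its R² is limited by the noise level alone, while the naive workflow also carries the averaging bias.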
Conclusion
In summary, averaging VIs without accounting for nonlinear relationships can introduce substantial bias and degrade phenotyping accuracy. This bias should be explicitly considered in phenotyping analyses, and correction methods applied when appropriate to improve reliability.