Reducing Bias in Cropland Soil Organic Carbon and Clay Predictions using Sentinel-2 Composites and Data Balancing



Abstract

Accurate maps of cropland soil organic carbon stocks (SOCS) and clay content are essential for climate-smart agriculture. Soil reflectance composites (SRC), derived from multispectral bare soil observations, offer a scalable approach to high-resolution soil mapping. While studies often focus on maximizing model performance, two challenges remain: (1) the bias introduced by masking and excluding soil samples during SRC generation, and (2) the accurate representation of the full range and distribution of soil properties in the resulting maps. Evaluating different SRC parameters, we found that commonly used indices such as the Normalized Burn Ratio 2 (NBR2) and the Normalized Difference Vegetation Index (NDVI) were significantly correlated with clay content and SOCS, respectively. These dependencies can lead to the systematic exclusion of samples with high SOCS (>80 Mg ha⁻¹) and high clay content (>30 mass%) during SRC generation, introducing bias in the resulting maps. Models trained solely on SRC bands failed to capture the full range of the training data, limiting the applicability of the soil property maps. Although including additional remote sensing features, such as spectral-temporal metrics and indices, significantly improved prediction accuracy, representing the imbalanced samples remained challenging. We demonstrated that a combined framework of spatial data augmentation and majority undersampling was effective in improving the range and the concordance correlation coefficient (CCC) of the predictions (SOCS = 0.82; clay = 0.9). Our findings emphasize the importance of (1) evaluating excluded samples to identify potential SRC-induced bias, and (2) optimizing model predictions to reflect the observed data range, thereby improving the reliability and usability of the resulting soil maps.
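For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of the standard definitions of NDVI, NBR2, and Lin's concordance correlation coefficient. It is not the authors' code. The function names are illustrative, and the Sentinel-2 band assignments in the comments (B8/B4 for NDVI, B11/B12 for NBR2) are the conventional choices and are assumed here, since the abstract does not state which bands were used.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b) element-wise, NaN where a + b == 0."""
    a = np.atleast_1d(np.asarray(a, dtype=float))
    b = np.atleast_1d(np.asarray(b, dtype=float))
    denom = a + b
    out = np.full_like(denom, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def ndvi(nir, red):
    # Conventional Sentinel-2 choice (assumed): NIR = B8, red = B4.
    return normalized_difference(nir, red)


def nbr2(swir1, swir2):
    # Conventional Sentinel-2 choice (assumed): SWIR1 = B11, SWIR2 = B12.
    return normalized_difference(swir1, swir2)


def concordance_cc(observed, predicted):
    """Lin's CCC: 2*cov / (var_obs + var_pred + (mean_obs - mean_pred)^2).

    Uses population (1/n) variances and covariance.
    """
    x = np.asarray(observed, dtype=float)
    y = np.asarray(predicted, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return 2.0 * cov / (x.var() + y.var() + (mx - my) ** 2)
```

Unlike Pearson correlation, the CCC penalizes both a shift in the mean and a compressed prediction range, which is why it suits the abstract's concern with reproducing the full observed range of SOCS and clay values.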
