Spatiotemporal prediction of soil organic carbon density for Europe (2000–2022) in 3D+T based on Landsat-based spectral indices time-series
Abstract
The paper describes a comprehensive framework for modeling and mapping soil organic carbon density (SOCD, kg/m³), based on spatiotemporal Random Forest (RF) and Quantile Regression Forests (QRF). A total of 22,428 SOCD measurements and a wide range of covariate layers, particularly the 30 m Landsat-based spectral indices, were used to fit models and produce 30 m SOCD maps for the entire EU at four-year intervals from 2000 to 2022 and for four soil depth intervals (0–20 cm, 20–50 cm, 50–100 cm, and 100–200 cm), each accompanied by per-pixel 95% prediction intervals (PI, between P0.025 and P0.975). Model evaluation indicates consistent prediction accuracy under both 5-fold spatial cross-validation with model refitting (MAE = 8.64 kg/m³, MedAE = 4.31 kg/m³, MAPE = 0.54, and bias = -2.95 kg/m³) and independent testing (MAE = 7.73 kg/m³, MedAE = 3.54 kg/m³, MAPE = 0.45, and bias = -3.04 kg/m³), with R² values exceeding 0.7 and concordance correlation coefficients (CCC) greater than 0.8 in both cases. Validation of the PI estimates confirmed that the PIs effectively capture prediction uncertainty, although with reduced accuracy for higher SOCD values. Exploratory analysis using Shapley values identified soil depth as the most important feature, with vegetation (Landsat biophysical indices) and long-term bioclimatic features as the two main contributing feature groups. Although the per-pixel prediction uncertainty is substantial, spatial aggregation was shown to reduce it by about 70%. Suggested uses of the data include: (1) time-series / trend analysis to detect potential land degradation hotspots, (2) optimization of sampling designs based on prediction uncertainty, and (3) prediction of future soil carbon potential by extrapolating models under different land use / climate scenarios.
The data and code used are publicly available under an open license from https://doi.org/10.5281/zenodo.13754344 and https://github.com/AI4SoilHealth/SoilHealthDataCube/.
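For readers less familiar with the accuracy measures quoted in the abstract, the following is a minimal sketch of how MAE, MedAE, MAPE, bias, CCC, and the empirical coverage of a prediction interval can be computed from paired observed and predicted values. It is not taken from the authors' repository: the function names `evaluation_metrics` and `pi_coverage` are hypothetical, and the sign convention for bias (predicted minus observed) and the expression of MAPE as a unitless fraction are assumptions.

```python
import numpy as np


def evaluation_metrics(observed, predicted):
    """Return MAE, MedAE, MAPE, bias, and Lin's CCC for paired values."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    # Assumed convention: error = predicted - observed,
    # so a negative bias means underprediction on average.
    err = pred - obs
    mean_obs, mean_pred = obs.mean(), pred.mean()
    # Population covariance and variances, as in Lin's concordance coefficient.
    cov = np.mean((obs - mean_obs) * (pred - mean_pred))
    ccc = 2.0 * cov / (obs.var() + pred.var() + (mean_obs - mean_pred) ** 2)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MedAE": float(np.median(np.abs(err))),
        # Unitless fraction (assumption); requires non-zero observations.
        "MAPE": float(np.mean(np.abs(err) / np.abs(obs))),
        "bias": float(err.mean()),
        "CCC": float(ccc),
    }


def pi_coverage(observed, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    obs = np.asarray(observed, dtype=float)
    lo = np.asarray(lower, dtype=float)
    hi = np.asarray(upper, dtype=float)
    return float(np.mean((obs >= lo) & (obs <= hi)))


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study.
    obs = np.array([10.0, 20.0, 30.0, 40.0])
    pred = np.array([12.0, 18.0, 27.0, 36.0])
    print(evaluation_metrics(obs, pred))
    print(pi_coverage(obs, pred - 3.0, pred + 3.0))
```

For a well-calibrated 95% interval, the coverage returned by `pi_coverage` on held-out data would be expected to be close to 0.95; values well below that would suggest intervals that are too narrow.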