Improved Inversion and Digital Mapping of Soil Organic Carbon Content by Combining Crop-Lush Period Vegetation Indices with Ensemble Learning: A Case Study for Liaoning, Northeast China
Abstract
Soil organic carbon (SOC) is a crucial indicator of soil quality and carbon cycling. While remote sensing and machine learning enable regional-scale SOC prediction, most studies rely on vegetation indices (VIs) derived from bare-soil periods, potentially neglecting vegetation–soil interactions during crop growth. Given the bidirectional relationship between SOC and crop growth, we hypothesized that using crop-lush period VIs (VIs_lush) instead of bare-soil period VIs (VIs_bare) would increase the inversion accuracy. To test this hypothesis, we chose the cropland area in Liaoning Province as the study area and developed three modeling strategies (MS-1: VIs_lush + other features; MS-2: VIs_bare + other features; and MS-3: without VIs) using Landsat 8 imagery, topographic and precipitation data, and ensemble learning models (XGBoost, RF, and AdaBoost), with SHapley Additive exPlanations (SHAP) analysis for variable interpretation. We found that (1) all models achieved their highest performance under MS-1, with XGBoost outperforming the others across all modeling strategies; (2) for XGBoost, MS-1 yielded the highest inversion accuracy (R² = 0.84, RMSE = 2.22 g·kg⁻¹, RPD = 2.49, and RPIQ = 3.25); compared with MS-2, MS-1 reduced the RMSE by 0.31 g·kg⁻¹, increased R² from 0.77 to 0.84, and increased the RPD by 0.31 and the RPIQ by 0.40, and compared with MS-3, MS-1 reduced the RMSE by 0.41 g·kg⁻¹, increased R² from 0.79 to 0.84, and increased the RPD by 0.39 and the RPIQ by 0.51; (3) the SHAP analysis of the three modeling strategies identified precipitation, terrain, and terrain-analysis variables as important predictors of SOC content and confirmed that VIs_lush contributed more than VIs_bare, supporting the rationale for using lush-period imagery; and (4) SOC in Liaoning Province exhibited distinct spatial patterns (mean: 13.08 g·kg⁻¹), with values ranging from 2.19 g·kg⁻¹ (sandy central–western area) to 33.86 g·kg⁻¹ (eastern mountains/coast). This study demonstrates that integrating growth stage-specific VIs with ensemble learning can significantly enhance regional-scale SOC prediction.
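The four accuracy measures reported above (R², RMSE, RPD, and RPIQ) can be illustrated with a minimal Python sketch. The abstract does not state the formulas, so the sketch assumes the definitions commonly used in soil spectroscopy: RPD as the sample standard deviation of the observed values divided by the RMSE, and RPIQ as their interquartile range divided by the RMSE. The function name `soc_accuracy_metrics`, the quartile method, and the toy numbers are hypothetical and are not taken from the study.

```python
import math
import statistics


def soc_accuracy_metrics(observed, predicted):
    """Return R2, RMSE, RPD and RPIQ for paired observed/predicted values.

    Assumed definitions (not confirmed by the source):
      RPD  = sample standard deviation of observed / RMSE
      RPIQ = interquartile range (Q3 - Q1) of observed / RMSE
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    mean_obs = statistics.fmean(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_res == 0 or ss_tot == 0:
        raise ValueError("metrics are undefined for a perfect fit or constant observations")

    rmse = math.sqrt(ss_res / len(observed))
    # Quartiles of the observed values, linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")

    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "RPD": statistics.stdev(observed) / rmse,
        "RPIQ": (q3 - q1) / rmse,
    }


if __name__ == "__main__":
    # Toy SOC values in g/kg, invented purely for illustration.
    obs = [10, 12, 14, 16, 18]
    pred = [11, 11, 15, 15, 19]
    for name, value in soc_accuracy_metrics(obs, pred).items():
        print(f"{name}: {value:.3f}")
```

Under these assumed definitions, the numerators of RPD and RPIQ depend only on the observed values, so for a fixed validation set a lower RMSE always gives a higher RPD and RPIQ.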