Improved Inversion and Digital Mapping of Soil Organic Carbon Content by Combining Crop-Lush Period Vegetation Indices with Ensemble Learning: A Case Study in Liaoning, Northeast China


Abstract

Soil organic carbon (SOC) is a crucial indicator of soil quality and carbon cycling. While remote sensing and machine learning enable regional-scale SOC prediction, most studies rely on vegetation indices (VIs) derived from bare-soil periods, potentially neglecting vegetation–soil interactions during crop growth. Given the bidirectional relationship between SOC and crop growth, we hypothesized that using crop-lush period VIs (VIs_lush) instead of bare-soil period VIs (VIs_bare) would increase the inversion accuracy. To test this hypothesis, we chose the cropland area in Liaoning Province as the study area and developed three modelling strategies (MS-1: VIs_lush + other features; MS-2: VIs_bare + other features; MS-3: without VIs) using Landsat 8 imagery, topographic and precipitation data, and ensemble learning models (XGBoost, RF, and AdaBoost), with SHapley Additive exPlanations (SHAP) analysis for variable interpretation. We found that 1) all models achieved their highest performance under MS-1, with XGBoost outperforming the others across all modelling strategies; 2) for XGBoost, MS-1 yielded the highest inversion accuracy (R² = 0.84, RMSE = 2.22 g·kg⁻¹, RPD = 2.49, and RPIQ = 3.25): compared with MS-2, MS-1 reduced the RMSE by 0.31 g·kg⁻¹, increased R² from 0.77 to 0.84, and increased the RPD by 0.31 and the RPIQ by 0.40, and compared with MS-3, MS-1 reduced the RMSE by 0.41 g·kg⁻¹, increased R² from 0.79 to 0.84, and increased the RPD by 0.39 and the RPIQ by 0.51; 3) SHAP analysis confirmed that VIs_lush contributed more than VIs_bare, supporting the rationale of using lush-period imagery; and 4) Liaoning Province exhibited distinct SOC spatial patterns (mean: 13.08 g·kg⁻¹), with values ranging from 2.19 g·kg⁻¹ (sandy central–western area) to 33.86 g·kg⁻¹ (eastern mountains/coast). This study demonstrates that integrating growth stage-specific VIs with ensemble learning can significantly enhance regional-scale SOC prediction.
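The four accuracy measures reported above (R², RMSE, RPD, and RPIQ) are related: RPD and RPIQ both divide a measure of spread in the observed SOC values by the RMSE, so a lower RMSE raises both ratios. The following is a minimal sketch of how such metrics are commonly computed, not the authors' code. The function name `soc_accuracy_metrics`, the use of the sample standard deviation for RPD, the inclusive quartile method for RPIQ, and the toy values are all assumptions made for illustration.

```python
import math
import statistics


def soc_accuracy_metrics(observed, predicted):
    """Return R2, RMSE, RPD and RPIQ for paired observed/predicted values.

    Assumed conventions (the abstract does not state which were used):
      - RMSE uses the population form, dividing by n
      - RPD  = sample standard deviation of observed / RMSE
      - RPIQ = interquartile range of observed / RMSE (inclusive quartiles)
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_obs = statistics.fmean(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares

    rmse = math.sqrt(ss_res / len(observed))
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "rpd": statistics.stdev(observed) / rmse,
        "rpiq": (q3 - q1) / rmse,
    }


# Hypothetical SOC values in g/kg, for illustration only (not data from the study).
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 11.0, 15.0, 15.0, 19.0]
metrics = soc_accuracy_metrics(observed, predicted)
```

Because the spread of the observed values is fixed for a given validation set, comparing two modelling strategies on that set should show RMSE falling exactly when RPD and RPIQ rise.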
