Predicting AquaCrop-Simulated Durum Wheat Yield with Machine Learning: Algorithm Comparison and Agronomic Signal Convergence in the Capitanata Plain
Abstract
Five machine learning algorithms were trained and compared for predicting durum wheat (Triticum durum Desf.) grain yield simulated by AquaCrop-GIS across the Capitanata plain (Southern Italy): Linear Regression (LR), Multilayer Perceptron (MLP), Support Vector Machine for regression (SMOreg), RandomTree, and Reduced Error Pruning Tree (REPTree). A dataset of 342 instances was constructed by crossing 25 soil profiles, three sowing dates, and two irrigation regimes over 15 climatic grid cells (2014–2023), and the models were evaluated by stratified 10-fold cross-validation. MLP achieved the highest accuracy (R = 0.983; MAE = 0.059 t ha⁻¹; RMSE = 0.083 t ha⁻¹), whereas the four interpretable models clustered at R = 0.891–0.907 (RMSE = 0.192–0.203 t ha⁻¹). All models converged on consistent agronomic signals: standard sowing (1 November) yielded 0.53 t ha⁻¹ more than late sowing (15 November), supplemental irrigation added 0.17 t ha⁻¹, and high-silt and clay soils produced superior yields. The SMOreg normalised weight vector identified autumn temperature (Tmin_oct_nov: −0.462; Tmax_oct_nov: −0.405) as the dominant climate predictor, reflecting the AquaCrop phenological mechanism whereby elevated early-season thermal loads curtail tillering. The convergence of directional signals across fundamentally different algorithmic architectures (linear, kernel-based, and tree-based) confirms that ML surrogates can efficiently emulate AquaCrop response surfaces for scenario analysis and decision support in Mediterranean dryland farming systems.
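The three accuracy measures quoted in the abstract (R, MAE, RMSE) can be illustrated with a minimal sketch. It assumes that R denotes the Pearson correlation between simulated and predicted yield, which the abstract does not state explicitly. The data are synthetic and hypothetical (only the instance count of 342 is borrowed from the abstract), the single-predictor least-squares model is a stand-in rather than any of the five algorithms compared, and the folds are plain shuffled folds, not the stratified folds used in the study. All function names are illustrative.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation between reference and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    vt = sum((t - mt) ** 2 for t in y_true)
    vp = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(vt * vp)


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def kfold_predictions(xs, ys, k=10, seed=0):
    """Out-of-fold predictions from plain (unstratified) k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(xs)
    for fold in range(k):
        test = idx[fold::k]          # every k-th shuffled index forms one fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = slope * xs[i] + intercept
    return preds


# Synthetic, hypothetical data: yield falls as autumn temperature rises.
rng = random.Random(42)
temps = [rng.uniform(8.0, 16.0) for _ in range(342)]
yields = [5.0 - 0.12 * t + rng.gauss(0.0, 0.1) for t in temps]

preds = kfold_predictions(temps, yields, k=10)
print(f"R = {pearson_r(yields, preds):.3f}")
print(f"MAE = {mae(yields, preds):.3f} t/ha")
print(f"RMSE = {rmse(yields, preds):.3f} t/ha")
```

Because every instance is predicted exactly once while held out of training, the three metrics are computed over the full set of out-of-fold predictions. RMSE is never smaller than MAE, which matches the ordering of the values reported in the abstract.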