Wheat Yield Prediction Based on Random Forest Method
Abstract
Crop yield prediction is essential for enhancing agricultural productivity, managing risks, ensuring food security, and improving the sustainability of farming systems. This study evaluated the effectiveness of Random Forest (RF) regression for predicting wheat yield from a time-series dataset of climate- and soil-related variables, and sought to identify the key factors influencing wheat yield. Multiple Linear Regression (MLR) served as the benchmark model, and both models were evaluated using mean absolute error (MAE) and root mean squared error (RMSE). RF outperformed MLR: the RF model achieved an MAE of 135.88 and an RMSE of 163.90, whereas the MLR model produced substantially higher errors, with an MAE of 435.74 and an RMSE of 653.39. Variable importance analysis from the RF model indicated that year and CO₂ emission were the most influential predictors. Nitrogen fertilizer use, wheat cultivated area, and phosphate fertilizer application were also associated with improved wheat yield. The partial dependence plot for year revealed an increasing trend in wheat yield from 2000 to 2010, followed by a plateau after 2010. Overall, these findings indicate that RF provides better predictive performance than MLR on this dataset and is a useful tool for forecasting wheat yield and identifying key drivers of yield variation.
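The two error metrics reported in the abstract can be illustrated with a minimal sketch. The yield values below are made up for demonstration and are not taken from the study's dataset; the function names `mae` and `rmse` are likewise hypothetical helpers, not the authors' code. Both metrics are expressed in the same units as the yield itself.

```python
import math

def mae(observed, predicted):
    """Mean absolute error: the average magnitude of the prediction errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean squared error: squares each error before averaging,
    so large errors weigh more heavily than in MAE."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)

# Illustrative (made-up) yields, not values from the study.
observed = [3000, 3200, 3100, 3400]
predicted = [2900, 3300, 3100, 3000]

print(f"MAE:  {mae(observed, predicted):.2f}")
print(f"RMSE: {rmse(observed, predicted):.2f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE, and the gap between them widens when a few predictions miss by a large margin. Read this way, the wider gap between RMSE and MAE reported for MLR than for RF is consistent with MLR making some comparatively large errors, although the abstract does not say so directly.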