Comparative evaluation of the performance of nine Machine Learning models for predicting corn yield based on uncalibrated empirical data in Cameroon

Abstract

Background: Predicting maize yields is a major challenge for food security and the optimization of agricultural systems in Sub-Saharan Africa. While climate and spectral data are often prioritized, the predictive power of socio-economic factors remains less explored. This study aimed to evaluate the performance of diverse Machine Learning (ML) models in predicting maize yield from farmer survey data. Methodology: We assessed seven ML models, including multiple linear regression (MLR), K-Nearest Neighbors (KNN), Support Vector Regression (SVR), Random Forest (RF), Artificial Neural Network (ANN), and XGBoost, together with two ensemble methods (Stacking and AdaBoost), using survey data collected from 354 maize farmers in Vina, Cameroon. Variable importance analysis, model training, and 5-fold cross-validation (k = 5) were conducted. Results: Variable importance analysis revealed that profit per hectare was the dominant predictor (importance score = 0.895; r = 0.85 with yield), followed by fertilizer use and agricultural expenditure. Ensemble methods outperformed the other models. XGBoost achieved the best performance (R² = 0.98; RMSE = 0.128), closely followed by Random Forest and Stacking, while the linear model and KNN exhibited insufficient accuracy. Cross-validation confirmed the robustness and generalization ability of the ensemble models, with XGBoost, Random Forest, and Stacking all maintaining high accuracy (R² ≈ 0.93–0.94) and low variance across folds. Conclusion: Our findings demonstrate that socio-economic data, when analyzed with robust ensemble algorithms, can provide reliable and accurate maize yield predictions. This approach offers a cost-effective alternative to traditional climate- or spectral-based modeling, presenting new opportunities for agricultural planning and precision agriculture in resource-limited areas of Sub-Saharan Africa.
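To make the evaluation protocol concrete, the following is a minimal sketch of 5-fold cross-validation scored with R² and RMSE, the two metrics reported above. It is not the authors' pipeline: the data are synthetic placeholders (a single hypothetical predictor named `profit` and a response `yield_t`), and two simple models (one-variable least squares and a nearest-neighbours average) stand in for the algorithms compared in the study. All function names are illustrative assumptions.

```python
import math
import random
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns a predict function."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, n_neighbors=5):
    """Nearest-neighbours regression on one predictor; returns a predict function."""
    train = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:n_neighbors]
        return mean(y for _, y in nearest)

    return predict


def k_fold_scores(xs, ys, fit, k=5, seed=0):
    """Return one (R2, RMSE) pair per held-out fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all rows
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the other k-1 folds only, then score on the held-out fold.
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        y_true = [ys[i] for i in held_out]
        y_pred = [predict(xs[i]) for i in held_out]
        scores.append((r2(y_true, y_pred), rmse(y_true, y_pred)))
    return scores


# Synthetic placeholder data (assumption): 354 rows to mirror the sample size only.
rng = random.Random(42)
profit = [rng.uniform(0, 10) for _ in range(354)]
yield_t = [0.5 + 0.3 * p + rng.gauss(0, 0.2) for p in profit]

results = {}
for name, fit in {"linear": fit_line, "knn": fit_knn}.items():
    fold_scores = k_fold_scores(profit, yield_t, fit, k=5)
    results[name] = (mean(s[0] for s in fold_scores), mean(s[1] for s in fold_scores))
    print(f"{name}: mean R2 = {results[name][0]:.3f}, mean RMSE = {results[name][1]:.3f}")
```

Averaging the scores over held-out folds, rather than reporting a single train/test split, is what makes it possible to compare a single-split figure with a cross-validated one and to judge variance across folds.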
