Agrofusion Potato Yield Prediction Framework (Af-pypf): District-level Study of Uttar Pradesh, India
Abstract
Accurate and timely yield prediction is essential for increasing crop production, supporting sustainable agricultural practices, and improving trade benefits. Potato (Solanum tuberosum L.) is a staple food in many countries, including India; improving its yield is therefore necessary to ensure food security and support related industries. Potato yield forecasting remains a significant challenge because the crop is sensitive to both agroclimatic variability and plant nutritional dynamics. Previous studies have addressed these two dimensions separately, and potato yield prediction is still limited in terms of the number of varieties and the field sample size. Models that rely on a single source of information often perform worse than models that combine data from multiple sources, such as agroclimatic indices with foliar nutritional variables (Haverkort et al., 1986). Previous studies have also shown that combining multiple image feature parameters can improve yield prediction accuracy. This study aimed to provide a hybrid machine learning framework that integrates NASA POWER agroclimatic indices with foliar nutritional variables for accurate potato yield forecasting. The framework combines multi-source data, feature engineering, model training, and ensemble prediction to support sustainable agriculture and decision making. We used machine learning algorithms, including Elastic Net, Random Forest (RF), Support Vector Regression (SVR), Extreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP), together with an ensemble of SVR, XGBoost, and RF, under three feature configurations: climate-only, nutrition-only, and integrated climate and nutrition data (Liu et al., 2020). The integrated dataset outperformed the single-source baselines across all algorithms.
The weighted ensemble achieved a test R² of 0.5621 with an RMSE of 3.27 t/ha. The most important climate indices for yield prediction were PRECIPSIV (evapotranspiration) and PRECIPSIVcritical (cumulative precipitation) during Stage IV (45–65 days), with 54% importance. Among the nutritional predictors, calcium (Ca) was the most influential variable, followed by phosphorus (P) and nitrogen (N).
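To make the weighted ensemble and the reported metrics concrete, here is a minimal Python sketch of how predictions from three base models could be combined and scored with RMSE and R². The abstract does not say how the ensemble weights were chosen, so weighting by inverse validation RMSE is an assumption made here for illustration only. The function names, the model keys, and all yield values (in t/ha) are hypothetical and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the yield (t/ha)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def inverse_rmse_weights(y_val, val_preds):
    """Assumed weighting scheme: each weight is proportional to 1 / validation RMSE."""
    inverse = {name: 1.0 / rmse(y_val, preds) for name, preds in val_preds.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def weighted_ensemble(preds, weights):
    """Weighted average of the base-model predictions, one value per sample."""
    n_samples = len(next(iter(preds.values())))
    return [
        sum(weights[name] * preds[name][i] for name in preds)
        for i in range(n_samples)
    ]


# Hypothetical validation yields and base-model predictions (t/ha).
y_val = [20.0, 25.0, 30.0, 22.0]
val_preds = {
    "svr": [21.0, 24.0, 28.0, 23.0],
    "xgboost": [19.5, 25.5, 30.5, 22.5],
    "rf": [22.0, 23.0, 31.0, 20.0],
}

weights = inverse_rmse_weights(y_val, val_preds)
ensemble_pred = weighted_ensemble(val_preds, weights)

print("weights:", weights)
print("ensemble RMSE:", rmse(y_val, ensemble_pred))
print("ensemble R2:", r2_score(y_val, ensemble_pred))
```

In practice the weights would be fitted on a validation split and the metrics reported on a separate test split, so that the test R² and RMSE are not inflated by the weight selection.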