Explainable Machine Learning for Crop Yield Classification Using Foliar Nutrient Analysis and Management Data in Colombia


Abstract

Accurate crop yield prediction is essential for improving agricultural productivity, resource management, and food security, particularly in heterogeneous environments such as Colombia. Recent advances in machine learning have enhanced predictive capabilities; however, most existing approaches rely predominantly on climatic or image-based data, which limits their direct applicability to agronomic decision-making. This study proposes a machine learning framework for crop yield classification based on foliar nutrient analysis, fertilization practices, and geographic variables, using open-access agricultural data. The approach formulates yield prediction as a multi-class classification problem, so that the predicted performance levels are more interpretable and actionable in practical contexts. Four machine learning models were evaluated: Logistic Regression, Decision Tree, Random Forest, and Gradient Boosting. The results show that the ensemble-based methods outperform the other two models, with Random Forest achieving the highest accuracy (96.27%) and macro F1-score (0.9261), followed by Gradient Boosting (95.34% accuracy). Feature importance analysis reveals that geographic location, crop type, and foliar nutrients such as sulphur, nitrogen, magnesium, calcium, potassium, and zinc are the most influential predictors. These findings indicate that nutrient-based variables provide a direct and meaningful representation of crop performance, offering advantages over models based solely on environmental proxies. By integrating foliar analysis with management practices, the proposed framework enhances interpretability and supports agronomic decision-making, contributing to the advancement of precision agriculture in data-scarce and heterogeneous contexts.
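The abstract reports both accuracy and a macro F1-score for a multi-class problem. As a rough illustration of how the two metrics differ, the sketch below computes each from scratch on a toy set of labels. The class names (`low`, `medium`, `high`) and the label vectors are invented for this example and are not taken from the study; the function names are likewise hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores.

    Every class counts equally, however many samples it has, so a
    poorly predicted minority class pulls the score down more than
    it pulls down accuracy.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # t was the true class but was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (assumption): three yield classes, with "low" under-represented.
y_true = ["high"] * 4 + ["medium"] * 4 + ["low"] * 2
y_pred = ["high"] * 4 + ["medium"] * 4 + ["medium", "low"]

print("accuracy:", round(accuracy(y_true, y_pred), 3))
print("macro F1:", round(macro_f1(y_true, y_pred), 3))
```

In this toy case a single error in the smallest class leaves accuracy at 9 out of 10 while the macro F1 drops further, which is one reason both metrics are commonly reported together for multi-class problems. Whether class imbalance explains the gap between the two figures in the abstract cannot be determined from the abstract alone.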
