Rice Yield Prediction using Machine Learning and Remote Sensing Vegetation Indices from Sentinel2, Landsat and MODIS in Mali


Abstract

Rice is the second most important cereal crop in Mali, a country marked by climatic variability and agricultural vulnerability. According to UN FAOSTAT, its production has almost tripled over the past 15 years, while its rice-growing land has more than doubled. Given ongoing conflict and instability in central Mali, accurate rice yield prediction is crucial for sustaining crop production and addressing food security. This study compares four widely used machine learning algorithms, Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGB), and k-Nearest Neighbors (kNN), for estimating rice yield from satellite-derived vegetation indices. The analysis integrates NDVI and EVI data from three major remote sensing platforms (Sentinel2, Landsat, and MODIS) with historical rice yield records spanning four growing seasons across two of Mali's major rice-producing regions. Models were evaluated using RMSE, MAE, and the coefficient of determination (R²). A core objective was to assess how spatial resolution and vegetation index source influence the accuracy with which self-declared rice yields can be predicted in a Sub-Saharan setting. We developed a modeling framework that combines regional rice yield data with multisource NDVI and EVI time series to estimate end-of-season rice yields. Results indicate that the RF model using Landsat NDVI achieved the best overall performance, with an RMSE of 0.55 ton ha⁻¹ and an R² of 88%, outperforming all other combinations. SVM performed best with Sentinel2 data (R² of 87.3%, RMSE of 0.545 ton ha⁻¹), while XGB yielded strong results when trained on Landsat data (R² of 87.4%). The lowest performance was observed with kNN, particularly on MODIS NDVI, indicating limited suitability for this prediction task.
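As a rough illustration of the quantities named above, the following Python sketch computes NDVI and EVI from band reflectances and the three evaluation metrics (RMSE, MAE, R²). It uses the standard textbook definitions; the function names and the sample values are hypothetical and are not taken from the study, whose exact preprocessing is not described in the abstract.

```python
import math


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), from surface reflectances."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else float("nan")


def evi(nir, red, blue):
    """EVI with the commonly used coefficients G=2.5, C1=6, C2=7.5, L=1."""
    denom = nir + 6.0 * red - 7.5 * blue + 1.0
    return 2.5 * (nir - red) / denom if denom != 0 else float("nan")


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reflectances and yields (ton/ha), for illustration only.
example_ndvi = ndvi(nir=0.5, red=0.1)
example_evi = evi(nir=0.5, red=0.1, blue=0.05)
observed = [3.0, 4.0, 5.0]
predicted = [2.5, 4.0, 5.5]
example_scores = (rmse(observed, predicted),
                  mae(observed, predicted),
                  r2(observed, predicted))
```

In practice the indices would be computed per pixel and aggregated over each region and date before being passed to a model; the scalar version here only shows the arithmetic.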
ANOVA tests showed no statistically significant difference in predictive accuracy across satellite platforms (p = 0.227), but significant variation across algorithms (p = 2.76 × 10⁻⁷) and rice-growing regions (p = 0.000224), highlighting the importance of model selection and local agroecological context. While NDVI remains a commonly used vegetation proxy, its limitations, especially spectral saturation and interference from standing water, suggest the need for future exploration of alternative indices or data fusion strategies. These findings contribute to the growing body of knowledge on the application of machine learning and Earth observation in Sub-Saharan agricultural systems and offer practical implications for improving rice yield predictions in Mali.
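The ANOVA comparison reported above can be illustrated with a minimal Python sketch of the F statistic. It assumes a one-way design with one factor at a time (for example, model error grouped by satellite platform); the abstract does not state which ANOVA design was used, and the function name and error values below are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per factor level.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up prediction errors for three hypothetical groups, for illustration only.
errors_by_group = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(errors_by_group)
```

Converting the F statistic to a p-value requires the F distribution; `scipy.stats.f_oneway` performs both steps if SciPy is available.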
