Multisource Grapevine Phenology Dataset for Smart Farming and AI Modeling

Abstract

Artificial Intelligence and Machine Learning rely on large, high-quality datasets to produce accurate and robust models, yet data scarcity remains a major challenge, especially in smart farming. Phenology modeling, a key application, studies how plant biological events relate to climate and seasons. Accurate phenology models improve crop quality, support climate adaptation, and guide decisions such as pesticide use and harvest timing, enhancing environmental and economic sustainability. However, agricultural data are highly diverse and heterogeneous, which complicates model development. This study presents a georeferenced dataset for Machine Learning-based grapevine phenology prediction across 3 Protected Designations of Origin in Aragón, Spain. Developed by a multidisciplinary team, it combines 9 datasets from 8 sources (including meteorological time series, field phenology observations, and Copernicus Sentinel-2 multispectral imagery) covering the period 2016–2022. It supports both physical and Machine Learning-based phenology modeling and facilitates knowledge extraction in agronomy and plant biology. Its relevance lies in its comprehensive scope, the inclusion of 9 phenological stages, and a rigorous methodology that ensures reproducibility. The same methodology can be used to create similar datasets for other regions or crops, advancing smart farming through scalable, data-driven solutions, and the code is published openly to support this objective. We also anticipate that the dataset could contribute to the development of foundation models and to new knowledge in biology and agronomy.
