The FIP 1.0 Data Set: Highly resolved annotated image time series of 4,000 wheat plots grown in 6 years

Abstract

Background

Understanding genotype–environment interactions of plants is crucial for crop improvement, yet limited by the scarcity of quality phenotyping data. This Data Note presents the Field Phenotyping Platform 1.0 data set, a comprehensive resource for winter wheat research that combines imaging, trait, environmental, and genetic data.

Findings

We provide time-series data for more than 4,000 wheat plots, including aligned high-resolution image sequences totaling more than 153,000 aligned images across 6 years. Measurement data for 8 key wheat traits are included—namely, canopy cover values, plant heights, wheat head counts, senescence ratings, heading date, final plant height, grain yield, and protein content. Genetic marker information and environmental data complement the time series. Data quality is demonstrated through heritability analyses and genomic prediction models, achieving accuracies aligned with previous research.

Conclusions

This extensive data set offers opportunities for advancing crop modeling and phenotyping techniques, enabling researchers to develop novel approaches for understanding genotype–environment interactions, analyzing growth dynamics, and predicting crop performance. By making this resource publicly available, we aim to accelerate research in climate-adaptive agriculture and foster collaboration between plant science and machine learning communities.

Article activity feed

  1. This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf051), which carries out open, named peer review. These reviews are published under a CC-BY 4.0 license and were as follows:

    Reviewer: Wanneng Yang

    The manuscript presents a comprehensive dataset spanning six years, encompassing data from eight key growth stages of wheat, along with corresponding phenotypic data. The construction of such a comprehensive dataset is highly valuable. However, from the perspective of dataset construction itself, quality control and consistency checks require further refinement. Specific issues are as follows:

    1. How is the consistency of parameters such as canopy cover and plant height across the eight key growth stages ensured? Quality control and consistency checks are particularly crucial for parameters such as phenological stages and senescence assessment, which are determined through visual evaluation and are thus susceptible to subjective influence. It is recommended to add a detailed explanation of these checks.

    2. For all images (151,150 out of 158,891 images), the success rate of alignment and within-field detection exceeded 95%. Does this mean that the final RGB sequence image dataset consists of 151,150 images?

    3. Regarding plant height measurement, the text mentions that "TLS (2016, 2017) or UAV (2018 to 2022) was used to measure plant height." Given the potential differences in height measurements obtained from these two methods, how were these differences addressed in the manuscript?

    4. Does this dataset cater to different tasks and include annotated data? If so, it is recommended to specify the concrete annotation methods and data.

    5. If possible, it is recommended to provide a summary table that specifies the different types of data contained in the dataset along with their respective quantities, facilitating readers' comprehensive understanding of the dataset.

    6. What are the potential limitations of this dataset? It is recommended to point them out.
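
The image counts quoted in point 2 can be checked with a few lines of arithmetic. The following is a minimal sketch, not taken from the manuscript; the function name `success_rate` and the variable names are chosen here for illustration only.

```python
def success_rate(n_success: int, n_total: int) -> float:
    """Return the fraction of images that passed alignment and within-field detection."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_success / n_total


# Counts quoted in point 2 of the review.
aligned = 151_150
total = 158_891

rate = success_rate(aligned, total)
excluded = total - aligned

print(f"success rate: {rate:.2%}")
print(f"images excluded: {excluded}")
```

With these counts the rate comes out just above the 95% threshold named in the comment, and 7,741 images fall outside the aligned set.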

  2. This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf051), which carries out open, named peer review. These reviews are published under a CC-BY 4.0 license and were as follows:

    Reviewer: Abhishek Gogna

    Thank you for the submission. The dataset surely holds value for the plant breeding community, but my major concerns are (1) the availability of genetic data and (2) non-conformity to MIAPPE standards (https://www.miappe.org/). These restrict the value of the otherwise excellent publication. I would welcome a submission addressing these major points. In addition, I have some minor points for specific sections. Please use the strings in quotation marks ("") to locate the specific sections.

    1. Context
       * Change of equipment: Please indicate how the change of equipment from TLS to drone affects data interoperability.
       * "Figure 2, gray bars": Kindly update Figure 2 to clarify the representation of the gray bars.
       * "Heads were annotated": Does this mean that not all relevant images were annotated? If so, please modify the title to avoid confusion.
       * Description of FAIR: Please revise this section. Both links listed under "Findable" and "Accessible" are eligible for these tags. Please modify "Interoperability" with reference to the publication listed in the "Re-use Potential."

    2. Reference measurements
       * "Senescence was": Was this measurement done for all relevant images? Please include this information.
       * "Adjusted genotype means with year calculation": Please add variance decomposition data for traits.

    3. Compilation as Data set
       * "pure GABI-WHEAT set for the extended set": Please revise this sentence for clarity.

    4. Heritabilities of intermediate and target traits
       * "y of the public marker": Please revise the sentence for clarity.

    5. Genomic prediction ability of unseen multi-environment trial
       * Is the CDC data part of the data publication? Please add this information.

    6. Example 1 to 6
       * Please revise all code for consistency and updated results. Also, include the necessary packages required to run the code.

    7. Availability of Source code and Requirement
       * Please create connectivity between repositories and add descriptive README files outlining their usage. Additionally, please provide instructions on how individual repositories may be used.

    I appreciate your attention to these points and believe that addressing them will strengthen your manuscript.