INCREASING PHENOMIC PREDICTION EFFICIENCY USING A PRINCIPAL COMPONENT ANALYSIS BASED PRE-PROCESSING OF NEAR INFRARED SPECTRA

Clément Bienvenu
Jean-Michel Roger
Mamadou Séne
Sergio Antonio Castro-Pacheco
Mathilde Singer
Bakolinirina Laurencia Felaniaina
Nancy Terrier
Fabien De Bellis
David Pot
Hugues de Verdal
Vincent Segura


Abstract

Phenomic prediction (PP) is a breeding value prediction method using near infrared spectroscopy (NIRS). Spectra pre-processing is a key step in the analysis pipeline of PP and generally involves chemometric methods. However, there is still little understanding in the genetics community of what pre-processing does and why it improves performance. Consequently, the choice of pre-processing is made either arbitrarily or through a search for the optimal set of methods and associated parameters. In this study, we propose a PCA-based pre-processing method in which genetic values of spectra are estimated on a set of principal components instead of individual wavelengths. This way, estimations are based on a few informative and orthogonal features of spectra instead of many correlated, uninformative wavelengths. We tested this new pre-processing method on five data sets representing four plant species (maize, rice, sorghum and grapevine). Results show that it performs as well as, or better than, the best classical chemometric pre-processing methods in almost all cases. Combining PCA-based and classical chemometric pre-processing methods maximizes predictive ability. Moreover, this pre-processing method opens up possibilities for better understanding and selecting the parts of the spectral information that are relevant for the prediction of breeding values. Indeed, components that together represent about 1% of spectral variability were found to be responsible for most of the predictive ability of PP.
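The idea of working on principal component scores rather than individual wavelengths can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the function names `pca_scores` and `genotype_means` are hypothetical, and a plain per-genotype average of scores is used as a crude stand-in for the genetic value estimation described in the abstract.

```python
import numpy as np

def pca_scores(spectra, n_components):
    """Project column-centered spectra onto their first principal components."""
    centered = spectra - spectra.mean(axis=0)
    # SVD of the centered matrix: rows of vt are the principal axes (loadings).
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    # Share of total spectral variance carried by each component.
    explained = s**2 / np.sum(s**2)
    return scores, vt[:n_components], explained[:n_components]

def genotype_means(scores, genotype_ids):
    """Average replicate scores per genotype (assumption: a crude
    stand-in for a proper genetic value estimate)."""
    labels, inverse = np.unique(genotype_ids, return_inverse=True)
    sums = np.zeros((labels.size, scores.shape[1]))
    np.add.at(sums, inverse, scores)
    counts = np.bincount(inverse, minlength=labels.size)
    return labels, sums / counts[:, None]

# Synthetic "spectra": 20 genotypes x 3 replicates, 200 correlated wavelengths.
rng = np.random.default_rng(0)
n_genotypes, n_reps, n_wavelengths = 20, 3, 200
genotype_ids = np.repeat(np.arange(n_genotypes), n_reps)
spectra = rng.normal(size=(n_genotypes * n_reps, n_wavelengths)).cumsum(axis=1)

scores, loadings, explained = pca_scores(spectra, n_components=5)
labels, features = genotype_means(scores, genotype_ids)
print(scores.shape, features.shape)
```

The resulting `features` matrix has one row per genotype and a handful of orthogonal columns, which is the kind of compact input a downstream prediction model would receive in place of hundreds of correlated wavelengths.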

Plain language summary

Cultivated plants are the result of a breeding process in which genetic values are used to select the individuals to breed. Estimating breeding values requires substantial experimental resources and is time consuming. Phenomic prediction is a low-cost, high-throughput method for estimating genetic values that is increasingly being used. It often uses near infrared spectroscopy measurements, which are easy to collect and thus routinely used in many species, as predictors of genetic values. However, near infrared spectra generally require pre-processing before being used in prediction. Currently used pre-processing methods come from the chemometrics community and are not yet well understood by geneticists. In this study, we propose a new pre-processing approach that performs as well as or better than the best chemometric pre-processing generally used, reduces computation time, and allows for a better understanding of which parts of the spectral information are relevant for prediction.

Core Ideas

Working on principal components of spectra instead of wavelengths increases the predictive ability of phenomic prediction and performs as well as or better than classical chemometric pre-processing
Working on principal components of spectra requires less parameter optimization than chemometric pre-processing
About 1% of spectral variance is responsible for most of the predictive power of phenomic prediction
Working on principal components of spectra that were first pre-processed with classical chemometric methods can increase predictive ability even further
PCA-based methods are valuable for optimizing the predictive ability of phenomic prediction and could be used more widely in the field of quantitative genetics
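To make the statement about "about 1% of spectral variance" concrete, the sketch below computes the share of total variance carried by a chosen subset of principal components. The singular values are invented for illustration and the helper `variance_share` is hypothetical; nothing here reproduces the study's data or results.

```python
import numpy as np

def variance_share(singular_values, component_indices):
    """Fraction of total variance carried by the selected components."""
    # Component variances are proportional to squared singular values.
    variances = np.asarray(singular_values, dtype=float) ** 2
    return variances[list(component_indices)].sum() / variances.sum()

# Made-up singular values: two dominant components, two minor ones.
singular_values = [10.0, 3.0, 1.0, 0.5]

# Share of variance held by the two minor components (indices 2 and 3).
share = variance_share(singular_values, [2, 3])
print(f"{share:.4f}")
```

In this toy case the two minor components hold a little over 1% of the variance, which shows how components that look negligible in a scree plot can still be isolated and tested for predictive value.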

Version published to 10.64898/2026.05.10.724118 on bioRxiv
May 13, 2026
