CLCNet: a contrastive learning and chromosome-aware network for genomic prediction in plants
Abstract
Genomic selection (GS), which integrates genomic markers with phenotypic data, has emerged as a powerful breeding strategy for predicting phenotypes and estimating breeding values in candidate populations. The success of GS depends fundamentally on the accuracy of genomic prediction (GP) models. However, conventional GP models often struggle to capture inter-individual variability and are particularly challenged by the curse of dimensionality, in which the number of features (e.g. single nucleotide polymorphisms, SNPs) far exceeds the number of samples, which ultimately limits their predictive performance. To overcome these limitations, we propose the Contrastive Learning and Chromosome-aware Network (CLCNet), a novel deep learning framework that, for the first time, integrates multi-task learning with contrastive learning for GP. CLCNet consists of two key components: (i) a contrastive learning module, designed to enhance the model's ability to capture fine-grained genotype-dependent phenotypic differences across individuals; and (ii) a chromosome-aware module, which performs feature selection at both the chromosome level and the genome-wide level to retain the most informative SNPs. The performance of CLCNet was evaluated on ten distinct traits across four major crop species: maize (Zea mays), cotton (Gossypium hirsutum), rapeseed (Brassica napus), and soybean (Glycine max). This evaluation involved a comprehensive comparison with three classical linear models (rrBLUP, Bayesian Ridge, Bayesian Lasso), two machine learning models (LightGBM, SVR), and two deep learning models (DNNGP, DeepGS). CLCNet consistently outperformed all comparator models, surpassing the average of the classical linear models by ~3.84%, the best-performing machine learning model by 4.84%, and the leading deep learning models by over 5%. These results highlight CLCNet as a promising and robust tool for accelerating genetic gain through more accurate GS in plant breeding.
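To make the idea of two-level feature selection concrete, the following is a minimal sketch and not the CLCNet implementation. The abstract does not specify the selection criterion, so the sketch assumes a simple one: SNPs are ranked by absolute Pearson correlation with the phenotype, first within each chromosome and then across the whole genome. The function names (`abs_corr`, `two_stage_select`), the parameters (`per_chrom`, `genome_wide`), and the simulated data are all hypothetical.

```python
import numpy as np


def abs_corr(X, y):
    """Absolute Pearson correlation of each SNP column with the phenotype."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Monomorphic SNPs have zero variance; dividing by inf scores them 0.
    return np.abs(Xc.T @ yc) / np.where(denom == 0, np.inf, denom)


def two_stage_select(X, y, chrom, per_chrom=2, genome_wide=3):
    """Assumed two-stage filter: chromosome-level, then genome-wide.

    X      : (n_samples, n_snps) genotype matrix coded 0/1/2
    y      : (n_samples,) phenotype vector
    chrom  : (n_snps,) chromosome label of each SNP
    Returns the sorted column indices of the retained SNPs.
    """
    scores = abs_corr(X, y)

    # Stage 1: keep the best `per_chrom` SNPs on every chromosome.
    kept = []
    for c in np.unique(chrom):
        idx = np.flatnonzero(chrom == c)
        order = np.argsort(scores[idx])[::-1]
        kept.extend(idx[order[:per_chrom]].tolist())
    kept = np.array(kept)

    # Stage 2: rank the survivors genome-wide and keep the best ones.
    order = np.argsort(scores[kept])[::-1]
    return np.sort(kept[order[:genome_wide]])


# Simulated example: 200 individuals, 12 SNPs on 3 chromosomes,
# with SNPs 1 and 9 carrying the (hypothetical) causal signal.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 12)).astype(float)
chrom = np.repeat([1, 2, 3], 4)
y = X[:, 1] + X[:, 9] + 0.1 * rng.standard_normal(200)

selected = two_stage_select(X, y, chrom)
print(selected)
```

Filtering within chromosomes before filtering genome-wide prevents a single chromosome with many correlated markers from crowding out signal elsewhere, which is one plausible motivation for a chromosome-aware design. In a real pipeline the scores would have to be computed on training folds only, to avoid leaking phenotype information into validation data.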