RAGP: a retrieval-augmented deep learning model for genomic prediction in crop breeding
Abstract
Genomic prediction (GP) plays a pivotal role in expediting genetic gains for crop breeding. Recently, deep learning models have attracted growing attention in this field due to their superior performance over traditional models. However, most advanced models rely on simplified numerical representations of genomic variants and treat individuals in isolation, failing to capture the full complexity of genetic variation and the genetic relatedness among individuals. In response, we propose RAGP, a deep learning model driven by a retrieval-augmented mechanism composed of two synergistic components: 1) an embedding-based retrieval module that identifies a group of nearest neighbors highly relevant to the target sample as references and employs a gene-specific discriminator to refine the representation space and capture informative genetic structures; 2) an augmentation module that integrates the retrieved references to enhance the target representation. The enriched representation is then propagated through a regression network for final prediction. This retrieval-guided strategy enables more informed and fine-grained utilization of genomic data and strengthens the model’s capacity to capture individual-level genetic relationships. Extensive experiments show that RAGP consistently outperforms both conventional methods and state-of-the-art deep learning models in predictive accuracy, offering a promising direction for advancing genomic selection in crop breeding. Our code is available at https://github.com/l00907l/RAGP.
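To make the retrieve-then-augment idea concrete, the following NumPy sketch walks through the three stages named in the abstract: retrieving nearest neighbors in an embedding space, fusing them with the target representation, and regressing a phenotype value. It is an illustration under stated assumptions, not the RAGP implementation. The cosine-similarity retrieval, the softmax-weighted fusion, the linear regression head, the random embeddings, and all function names (`retrieve_neighbors`, `augment`, `predict`) are hypothetical choices made here, and the gene-specific discriminator is omitted entirely.

```python
import numpy as np

def retrieve_neighbors(target, bank, k):
    """Return indices and cosine similarities of the k bank rows closest to target."""
    bank_unit = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    target_unit = target / np.linalg.norm(target)
    sims = bank_unit @ target_unit
    top = np.argsort(-sims)[:k]          # highest similarity first
    return top, sims[top]

def augment(target, neighbors, sims, temperature=1.0):
    """Concatenate the target with a softmax-weighted mean of its neighbors (assumed fusion rule)."""
    w = np.exp((sims - sims.max()) / temperature)
    w /= w.sum()
    context = w @ neighbors              # weighted mean of the neighbor embeddings
    return np.concatenate([target, context])

def predict(enriched, weights, bias):
    """Linear stand-in for the regression network."""
    return float(enriched @ weights + bias)

rng = np.random.default_rng(0)
bank = rng.normal(size=(50, 8))          # hypothetical reference embeddings
target = rng.normal(size=8)              # hypothetical target embedding

idx, sims = retrieve_neighbors(target, bank, k=5)
enriched = augment(target, bank[idx], sims)
weights = rng.normal(size=enriched.shape[0])   # untrained, for shape illustration only
y_hat = predict(enriched, weights, bias=0.0)
```

In this sketch the enriched vector is twice the embedding width because the target and the neighbor context are concatenated; other fusion rules, such as attention over the neighbors, would fit the same description.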