RAGP: a retrieval-augmented deep learning model for genomic prediction in crop breeding

Lingling Zuo
Xinger Li
Xiaoli Wang
Rui Man
Yang Yang

Abstract

Genomic prediction (GP) plays a pivotal role in accelerating genetic gain in crop breeding. Recently, deep learning models have attracted growing attention in this field because of their superior performance over traditional models. However, most advanced models rely on simplified numerical representations of genomic variants and treat individuals in isolation, failing to capture the full complexity of genetic variation and the genetic relatedness among individuals. In response, we propose RAGP, a deep learning model driven by a retrieval-augmented mechanism composed of two synergistic components: 1) an embedding-based retrieval module that identifies a group of nearest neighbors highly relevant to the target sample as references and employs a gene-specific discriminator to refine the representation space and capture informative genetic structures; and 2) an augmentation module that integrates the retrieved references to enhance the target representation. The enriched representation is then passed through a regression network for final prediction. This retrieval-guided strategy enables more informed and fine-grained use of genomic data and strengthens the model’s capacity to capture individual-level genetic relationships. Extensive experiments show that RAGP consistently outperforms both conventional methods and state-of-the-art deep learning models. By incorporating retrieval-augmented mechanisms, RAGP substantially improves predictive accuracy and offers a promising direction for advancing genomic selection in crop breeding. Our code is available at https://github.com/l00907l/RAGP.
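The abstract describes the retrieve-then-augment pipeline only at a high level. Below is a minimal NumPy sketch of that general idea, not the authors' implementation (the linked repository is the authoritative source). It assumes a fixed random projection with tanh in place of the learned embedding network, cosine similarity as the retrieval metric, a softmax-weighted average of the k retrieved neighbors as the augmentation step, and closed-form ridge regression in place of the regression network; the gene-specific discriminator is omitted entirely. All function names (`embed`, `retrieve_neighbors`, `augment`, `fit_ridge`, `predict_ridge`, `demo`) and the simulated genotypes and phenotypes are hypothetical.

```python
# Hypothetical sketch of retrieval-augmented genomic prediction.
# NOT the RAGP implementation: encoder, retrieval metric, fusion rule and
# regression head are all simplified stand-ins chosen for illustration.
import numpy as np


def embed(genotypes, W):
    """Map 0/1/2-coded genotypes (n, m) to embeddings (n, d).

    Assumption: a fixed random projection + tanh stands in for a learned encoder.
    """
    return np.tanh(genotypes @ W)


def retrieve_neighbors(query, refs, k, exclude_self=False, eps=1e-12):
    """Return indices and cosine similarities of the top-k references per query."""
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + eps)
    r = refs / (np.linalg.norm(refs, axis=1, keepdims=True) + eps)
    sims = q @ r.T                          # (n_query, n_ref)
    if exclude_self:                        # only valid when query set == reference set
        np.fill_diagonal(sims, -np.inf)     # a sample must not retrieve itself
    idx = np.argsort(-sims, axis=1)[:, :k]  # most similar first
    return idx, np.take_along_axis(sims, idx, axis=1)


def augment(query, refs, idx, sims, temperature=0.1):
    """Concatenate each query embedding with a similarity-weighted mean of its neighbors."""
    logits = sims / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)                     # softmax over the k neighbors
    context = np.einsum("qk,qkd->qd", w, refs[idx])       # (n_query, d)
    return np.concatenate([query, context], axis=1), w


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept (stand-in regression head)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    penalty = lam * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict_ridge(X, beta):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ beta


def demo(seed=0, n_train=200, n_test=50, n_snps=500, dim=32, k=10):
    """Run the sketch end to end on simulated (hypothetical) data."""
    rng = np.random.default_rng(seed)
    G = rng.integers(0, 3, size=(n_train + n_test, n_snps)).astype(float)
    effects = rng.normal(0.0, 0.05, size=n_snps)
    y = G @ effects + rng.normal(0.0, 0.5, size=n_train + n_test)

    W = rng.normal(0.0, 1.0 / np.sqrt(n_snps), size=(n_snps, dim))
    E = embed(G, W)
    E_tr, E_te = E[:n_train], E[n_train:]

    # Training samples retrieve from the training pool, excluding themselves.
    idx_tr, s_tr = retrieve_neighbors(E_tr, E_tr, k, exclude_self=True)
    X_tr, _ = augment(E_tr, E_tr, idx_tr, s_tr)
    beta = fit_ridge(X_tr, y[:n_train])

    # Test samples retrieve references from the training pool only.
    idx_te, s_te = retrieve_neighbors(E_te, E_tr, k)
    X_te, w_te = augment(E_te, E_tr, idx_te, s_te)
    return predict_ridge(X_te, beta), y[n_train:], idx_tr, w_te


if __name__ == "__main__":
    y_hat, y_test, _, _ = demo()
    print("Pearson r on simulated test set:", np.corrcoef(y_hat, y_test)[0, 1])
```

Masking the diagonal when training samples retrieve from their own pool keeps a sample from using itself as a reference, which would otherwise leak its own features into the augmented representation; test samples only ever retrieve from the training pool.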

Version published to 10.21203/rs.3.rs-7229420/v1 on Research Square
Nov 11, 2025
