End-to-end genomic prediction: direct prediction of images and text from genome-wide molecular markers

Mark T. Watson
Mitchell Feldmann
Haipeng Yu
Hao Cheng

Abstract

Background

Methods to predict the heritable component of phenotypes from genetic markers, collectively known as genomic prediction, have been widely applied in plant and animal breeding and in human genetics. Currently, genomic prediction is limited to numeric phenotypes. In some cases, though, plant and animal phenotypes are better understood through images and text than through numbers. The current best practice for incorporating images and text into genomic prediction is to first extract scalar numeric phenotypes from the images or text, and then to perform genomic prediction on those numeric phenotypes. While this approach is effective for some traits, it discards most of the information in the image or text, some of which may be useful. Additionally, numeric phenotypes derived from images and text may be less interpretable than the images and text themselves.

Approach

We present a novel approach for predicting images and text from SNP markers, which we refer to as end-to-end genomic prediction, and validate it using genotypes together with text and image phenotypes from a strawberry (Fragaria × ananassa) diversity panel. Our approach combines nonlinear latent space encoding with linear genomic prediction to generate accurate breeding values for a high-dimensional phenotype, such as an image or a text description.
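The three stages named above (encode the high-dimensional phenotype into a latent space, predict the latent scores from markers with a linear model, then decode predicted latents back into a phenotype) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: PCA stands in for the nonlinear encoder, ridge regression on SNP dosages (in the spirit of RR-BLUP) stands in for the linear genomic prediction step, the data are simulated, and all function names and the penalty `lam` are hypothetical.

```python
import numpy as np


def fit_pca_encoder(Y, k):
    """Fit a k-dimensional linear encoder/decoder to flattened phenotypes Y (n x d).

    Assumption: PCA is a stand-in for the nonlinear latent encoder (e.g. an
    autoencoder) described in the abstract; it is not the authors' encoder.
    """
    mean = Y.mean(axis=0)
    # Rows of vt are orthonormal principal directions in phenotype space.
    _, _, vt = np.linalg.svd(Y - mean, full_matrices=False)
    return mean, vt[:k]


def encode(Y, mean, components):
    """Project phenotypes onto the latent axes: n x d -> n x k."""
    return (Y - mean) @ components.T


def decode(Z, mean, components):
    """Map latent scores back to phenotype space: n x k -> n x d."""
    return Z @ components + mean


def fit_marker_ridge(X, Z, lam):
    """Ridge regression of every latent dimension on centred SNP dosages.

    Dual form beta = Xc^T (Xc Xc^T + lam*I)^-1 Zc needs only an n x n solve,
    which suits the usual setting of many more markers than individuals.
    Each column of Z is effectively treated as a separate trait.
    """
    x_mean, z_mean = X.mean(axis=0), Z.mean(axis=0)
    xc = X - x_mean
    alpha = np.linalg.solve(xc @ xc.T + lam * np.eye(xc.shape[0]), Z - z_mean)
    return x_mean, z_mean, xc.T @ alpha  # beta: p x k marker effects


def end_to_end_predict(X_train, Y_train, X_new, k, lam):
    """Encode training phenotypes, predict latents from markers, decode for new genotypes."""
    mean, comps = fit_pca_encoder(Y_train, k)
    Z_train = encode(Y_train, mean, comps)
    x_mean, z_mean, beta = fit_marker_ridge(X_train, Z_train, lam)
    Z_new = (X_new - x_mean) @ beta + z_mean
    return decode(Z_new, mean, comps)


# Usage on hypothetical simulated data: 100 individuals, 500 SNPs (0/1/2 dosages),
# and 8x8 "images" (64 pixels) driven by k genetic factors plus noise.
rng = np.random.default_rng(0)
n, p, d, k = 100, 500, 64, 5
X = rng.integers(0, 3, size=(n, p)).astype(float)
Y = (X - X.mean(axis=0)) @ rng.normal(scale=0.1, size=(p, k)) @ rng.normal(size=(k, d))
Y += rng.normal(scale=0.5, size=(n, d))
Y_hat = end_to_end_predict(X[:80], Y[:80], X[80:], k=k, lam=50.0)
print(Y_hat.shape)  # (20, 64)
```

Replacing PCA with a trained nonlinear encoder would change only `encode` and `decode`; the marker model stays linear. A scalar trait (for example, mean pixel intensity) can then be extracted from each predicted image and compared with the same trait predicted directly from markers.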

Results

For both genome-to-image and genome-to-text prediction, we found that predicting images and text and then extracting numeric traits from them was, in some cases, as accurate as directly predicting the extracted numeric phenotypes. This demonstrates for the first time that genome-to-image prediction accuracy can be comparable to conventional genomic prediction accuracy.

Conclusions

Based on our proof of concept, which applies the same core end-to-end method to both images and text, we believe end-to-end genomic prediction could be useful for a wide range of visual and multidimensional phenotypes in plants and animals, although further work is needed to improve the accuracy of the embedding and genomic prediction steps.

Version published on bioRxiv, Nov 5, 2025 (doi: 10.1101/2025.11.03.686395)

Related articles

Decoding Complex Genotype-Phenotype Interactions by Discretizing the Genome

This article has 6 authors:
1. Jędrzej Kubica
2. Hetvi Jethwani
3. Krzysztof H. Banecki
4. Mauricio Moldes
5. Dariusz Plewczynski
6. Ben Busby
This article has no evaluations. Latest version Dec 17, 2025
Derivation of prediction error variance for non-genotyped individuals in genomic selection

This article has 3 authors:
1. Vinícius Junqueira
2. Marcos Jun-Iti Yokoo
3. Fernando Flores
This article has no evaluations. Latest version Dec 17, 2025
Understanding Pathways in Bioinformatics, Genomics, and Health Applications

This article has 1 author:
1. Diptarup Mallick
This article has no evaluations. Latest version Jan 19, 2026
