Learning genetic values of individuals with incomplete pedigree, genomic and phenotypic data

Daniel Gianola
Ignacio Aguilar
Olga Ravagnolo

Abstract

Prediction of outcomes is important in personalized medicine and in animal and plant breeding. Typical inputs for building prediction models in agriculture include genealogies, molecular markers and phenotypes. Information is seldom complete, i.e., some individuals may lack at least one of these inputs. For instance, all individuals may possess pedigree data but only a fraction are genotyped for molecular markers. A solution for such a situation is known as “single-step best linear unbiased prediction” (SS-BLUP). A more general scenario is one where, in addition to the setting of SS-BLUP, there are subjects with genomic data but lacking genealogy, with or without phenotypes. Our study presents a novel “single-step” prediction method that accommodates a wider degree of incompleteness than SS-BLUP. It does not employ imputation or approximations and is based on basic Bayesian principles of combining distinct prior opinions. The proposed method, Hy-BLUP (“Hy” for “hybrid”), uses a prior that combines knowledge from the population about variation derived from pedigree and from markers, as if these two sources of information were independent. Such an assumption may overstate prior precision, but Bayesian theory dictates that it should be overridden as information from data accrues. The Bayesian logic defines the weights assigned to the sources implicitly. However, additional weights (w_A and w_G for pedigree and genomic information, respectively) may be introduced as tuning parameters. From an inferential perspective, the weights and the variance components are not jointly identified in the likelihood function. However, given the variance components, some Bayesian learning about the weights can be obtained. The prior induces a precision matrix (inverse of the covariance matrix) automatically, without use of cumbersome matrix algebra arguments or approximations.
The prior is combined with the data and, given the weights (if any) and variance parameters, the estimating “mixed model” equations can be built and computed directly. The method was evaluated using a publicly available data set consisting of 599 inbred lines of wheat genotyped for binary markers and with full pedigree information; the target trait was grain yield. The evaluation used several experiments that simulated various patterns of incompleteness and a training-testing layout supplemented by bootstrapping or random reconstruction of sets. There were minor differences between SS-BLUP and Hy-BLUP in predictive ability. The discussion includes a multiple-trait generalization of Hy-BLUP that may be useful in situations where some individuals are not phenotyped for some trait (e.g., animal carcass weight in a fully pedigreed breeding nucleus) while others (not pedigreed) are genotyped, scored and destroyed for commercial or laboratory purposes. The study provides a proof of concept of the potential usefulness of Hy-BLUP for routine genome-enabled prediction in individuals with irregular patterns of information.
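
The idea of combining two prior opinions "as if independent" can be illustrated with a small sketch. It is not the authors' implementation, and it rests on assumptions that the abstract does not state: every individual has both pedigree and marker information, the two priors are zero-mean Gaussians with covariances A·var_a and G·var_g, and the phenotypes are already centered so that no fixed effects are needed. Under those assumptions, multiplying the two densities gives a precision matrix that is a weighted sum of the two inverse covariances, and the prediction equations follow by adding the data precision. All function and parameter names (`combined_precision`, `predict`, `var_a`, `var_g`, `var_e`, `w_a`, `w_g`) are hypothetical, and the abstract's treatment of individuals that lack one source is not covered here.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(m: Matrix, b: Vector) -> Vector:
    """Solve m x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        # Pick the largest pivot in this column for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def inverse(m: Matrix) -> Matrix:
    """Invert a small square matrix column by column."""
    n = len(m)
    cols = [solve(m, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def combined_precision(a: Matrix, g: Matrix, var_a: float, var_g: float,
                       w_a: float = 1.0, w_g: float = 1.0) -> Matrix:
    """Precision of the product of two Gaussian priors (assumed form).

    Returns w_a * inv(A) / var_a + w_g * inv(G) / var_g.
    """
    a_inv, g_inv = inverse(a), inverse(g)
    n = len(a)
    return [[w_a * a_inv[i][j] / var_a + w_g * g_inv[i][j] / var_g
             for j in range(n)] for i in range(n)]


def predict(y: Vector, precision: Matrix, var_e: float) -> Vector:
    """Solve (I / var_e + precision) u = y / var_e for centered phenotypes y."""
    n = len(y)
    lhs = [[precision[i][j] + (1.0 / var_e if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    rhs = [v / var_e for v in y]
    return solve(lhs, rhs)


if __name__ == "__main__":
    # Toy relationship matrices for three individuals (made-up values).
    A = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
    G = [[1.0, 0.4, 0.1], [0.4, 1.0, 0.1], [0.1, 0.1, 1.0]]
    P = combined_precision(A, G, var_a=1.0, var_g=1.0)
    print(predict([1.2, 0.3, -1.5], P, var_e=1.0))
```

Setting `w_g=0.0` in this sketch reduces the prior to the pedigree-only case, which is one way to see how the optional weights act as tuning parameters. A practical implementation would use a linear algebra library and sparse inverses rather than the dense elimination shown here.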

Version published to 10.1101/2025.09.11.675626 on bioRxiv
Sep 16, 2025

Related articles

Derivation of prediction error variance for non-genotyped individuals in genomic selection

This article has 3 authors:
1. Vinícius Junqueira
2. Marcos Jun-Iti Yokoo
3. Fernando Flores
This article has no evaluations
Latest version Dec 17, 2025

Impact of scale parameter for marker variance prior in some Bayesian whole-genome regression methods

This article has 2 authors:
1. Özge KOZAKLI
2. Ayhan CEYHAN
This article has no evaluations
Latest version Jan 20, 2026

Comparison of BLUPF90IOD3 and MiXBLUP implementations of the single-step model applied to the Polish national dairy cattle evaluation

This article has 4 authors:
1. Dawid Słomian
2. Michalina Jakimowicz
3. Tomasz Suchocki
4. Joanna Szyda
This article has no evaluations
Latest version Dec 22, 2025
