Learning genetic values of individuals with incomplete pedigree, genomic and phenotypic data

Abstract

Prediction of outcomes is important in personalized medicine and in animal and plant breeding. Typical inputs for building prediction models in agriculture include genealogies, molecular markers and phenotypes. Information is seldom complete: some individuals may lack one or more of these inputs. For instance, all individuals may have pedigree data while only a fraction are genotyped for molecular markers. A solution for such a situation is known as “single-step best linear unbiased prediction” (SS-BLUP). A more general scenario is one where, in addition to the setting of SS-BLUP, there are subjects with genomic data but no genealogy, with or without phenotypes. Our study presents a novel “single-step” prediction method that accommodates a greater degree of incompleteness than SS-BLUP. It does not employ imputation or approximations and is based on basic Bayesian principles for combining distinct prior opinions. The proposed method, Hy-BLUP (“Hy” for “hybrid”), uses a prior that combines knowledge from the population about variation derived from pedigree and from markers, as if these two sources of information were independent. Such an assumption may overstate prior precision, but Bayesian theory dictates that it should be overridden as information from the data accrues. The Bayesian logic defines the weights assigned to the sources implicitly. However, additional weights (w_A and w_G for pedigree and genomic information, respectively) may be introduced as tuning parameters. From an inferential perspective, the weights and the variance components are not jointly identified in the likelihood function. However, given the variance components, some Bayesian learning about the weights is possible. The prior induces a precision matrix (the inverse of the covariance matrix) automatically, without cumbersome matrix algebra arguments or approximations.
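A minimal sketch of how combining two prior opinions can produce a precision matrix directly. It assumes (our reading, not necessarily the authors' exact formulation) that the pedigree prior is Gaussian with covariance Aσ²_A over the pedigreed individuals, the genomic prior is Gaussian with covariance Gσ²_G over the genotyped individuals, and each density is raised to its weight before the two are multiplied as if independent. Under that assumption the exponents add, so the combined precision is w_A A⁻¹/σ²_A + w_G G⁻¹/σ²_G, with each term padded with zeros for individuals absent from that source. The function names (`invert`, `embed`, `hybrid_precision`) and the toy matrices are hypothetical; small matrices are inverted in pure Python only to keep the example dependency-free.

```python
# Hypothetical sketch: combine a pedigree-based and a genomic Gaussian prior
# on genetic values as if they were independent sources of information.

def invert(m):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(m)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def embed(sub, ids, n):
    """Place a k x k matrix at rows/columns `ids` of an n x n zero matrix."""
    full = [[0.0] * n for _ in range(n)]
    for a, i in enumerate(ids):
        for b, j in enumerate(ids):
            full[i][j] = sub[a][b]
    return full


def hybrid_precision(n, ped_ids, A, var_a, geno_ids, G, var_g, w_A=1.0, w_G=1.0):
    """Precision matrix of the product of two weighted Gaussian prior densities.

    The pedigree prior N(0, A*var_a) covers individuals `ped_ids`; the genomic
    prior N(0, G*var_g) covers `geno_ids`. Raising a density to a weight scales
    its precision; multiplying the two densities adds their precisions.
    """
    KA = embed(invert(A), ped_ids, n)
    KG = embed(invert(G), geno_ids, n)
    return [[w_A * KA[i][j] / var_a + w_G * KG[i][j] / var_g for j in range(n)]
            for i in range(n)]


# Toy example: individuals 0 and 1 are pedigreed; 1 and 2 are genotyped.
A = [[1.0, 0.5], [0.5, 1.0]]
G = [[1.0, 0.25], [0.25, 1.0]]
K = hybrid_precision(3, [0, 1], A, 1.0, [1, 2], G, 1.0)
for row in K:
    print([round(x, 4) for x in row])
```

In this toy case individual 1 appears in both sources, and its prior precision (2.4) is the sum of what pedigree alone (4/3) and markers alone (16/15) would assign, which illustrates how treating the sources as independent can overstate prior precision.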
The prior is combined with the data and, given the weights (if any) and the variance parameters, the estimating “mixed model” equations can be built and solved directly. The method was evaluated using a publicly available data set of 599 inbred wheat lines genotyped for binary markers and with full pedigree information; the target trait was grain yield. The evaluation comprised several experiments that simulated various patterns of incompleteness, using a training-testing layout supplemented by bootstrapping or by random reconstruction of the sets. Differences in predictive ability between SS-BLUP and Hy-BLUP were minor. The discussion includes a multiple-trait generalization of Hy-BLUP that may be useful where some individuals are not phenotyped for some trait (e.g., animal carcass weight in a fully pedigreed breeding nucleus) while others (not pedigreed) are genotyped, scored and destroyed for commercial or laboratory purposes. The study provides a proof of concept of the potential usefulness of Hy-BLUP for routine genome-enabled prediction in individuals with irregular patterns of information.
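Once a prior precision matrix K for the genetic values and a residual variance are fixed, the estimating equations take the familiar Henderson mixed-model form. A minimal sketch, assuming a single trait, the model y = Xb + Zg + e with Z an incidence matrix linking records to individuals, and K already scaled by the genetic variance, so that the genetic block of the left-hand side is Z'Z + σ²_e K. The function names (`solve`, `mme_solve`), the hypothetical 3×3 matrix K (of the kind a combined pedigree/genomic prior could induce) and the phenotypes are illustrative only; a dense pure-Python solver is used because the example is tiny, and this is not the authors' implementation.

```python
# Hypothetical sketch: Henderson's mixed-model equations for y = Xb + Zg + e,
# where K is the prior precision of g (already including the genetic variance).

def solve(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(m, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("coefficient matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # Back substitution on the upper-triangular system.
    sol = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - s) / aug[i][i]
    return sol


def mme_solve(y, X, record_ids, K, var_e):
    """Return (b_hat, g_hat) from the mixed-model equations.

    y: phenotypes; X: fixed-effect design, one row per record;
    record_ids[r]: index of the individual that produced record r;
    K: n x n prior precision of g; var_e: residual variance.
    Individuals without records still get a prediction through K.
    """
    n, p = len(K), len(X[0])
    # Z is an incidence matrix: one 1 per row, at the individual's column.
    Z = [[1.0 if j == i else 0.0 for j in range(n)] for i in record_ids]
    W = [list(xr) + zr for xr, zr in zip(X, Z)]  # W = [X Z]
    dim = p + n
    lhs = [[sum(w[i] * w[j] for w in W) for j in range(dim)] for i in range(dim)]
    rhs = [sum(w[i] * obs for w, obs in zip(W, y)) for i in range(dim)]
    # Genetic block becomes Z'Z + var_e * K.
    for i in range(n):
        for j in range(n):
            lhs[p + i][p + j] += var_e * K[i][j]
    sol = solve(lhs, rhs)
    return sol[:p], sol[p:]


# Toy example: 3 individuals, phenotypes only for individuals 0 and 1.
K = [[4 / 3, -2 / 3, 0.0], [-2 / 3, 2.4, -4 / 15], [0.0, -4 / 15, 16 / 15]]
b, g = mme_solve(y=[5.0, 7.0], X=[[1.0], [1.0]], record_ids=[0, 1], K=K, var_e=1.0)
print("b:", b, "g:", g)
```

Individual 2 has no phenotype, yet it receives a nonzero prediction (1/23 in this toy case) through its off-diagonal link to individual 1 in K. Under the stated assumptions, this is how genotyped but unphenotyped individuals would obtain predicted genetic values from the same set of equations.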