Relating multivariate shapes to genescapes using phenotype-biological process associations for craniofacial shape

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This paper offers a new take on multivariate genotype-phenotype mapping that identifies the joint phenotypic effect of genes involved in known biological processes that impact craniofacial variation. More specifically, the work extends the traditional idea of candidate gene investigations to candidate biological process investigations, grouping multiple genes into a single analysis. In doing so, the authors show the joint effects of three strong candidate processes (chondrocyte differentiation, determination of left/right symmetry, and palate development) on multidimensional craniofacial shape in the heterogeneous Diversity Outbred mouse population.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their names with the authors.)

Abstract

Realistic mappings of genes to morphology are inherently multivariate on both sides of the equation. The importance of coordinated gene effects on morphological phenotypes is clear from the intertwining of gene actions in signaling pathways, gene regulatory networks, and developmental processes underlying the development of shape and size. Yet, current approaches tend to focus on identifying and localizing the effects of individual genes and rarely leverage the information content of high-dimensional phenotypes. Here, we explicitly model the joint effects of biologically coherent collections of genes on a multivariate trait – craniofacial shape – in a sample of n = 1145 mice from the Diversity Outbred (DO) experimental line. We use biological process Gene Ontology (GO) annotations to select skeletal and facial development gene sets and solve for the axis of shape variation that maximally covaries with gene set marker variation. We use our process-centered, multivariate genotype-phenotype (process MGP) approach to determine the overall contributions to craniofacial variation of genes involved in relevant processes and how variation in different processes corresponds to multivariate axes of shape variation. Further, we compare the directions of effect in phenotype space of mutations to the primary axis of shape variation associated with broader pathways within which they are thought to function. Finally, we leverage the relationship between mutational and pathway-level effects to predict phenotypic effects beyond craniofacial shape in specific mutants. We also introduce an online application that provides users the means to customize their own process-centered craniofacial shape analyses in the DO. The process-centered approach is generally applicable to any continuously varying phenotype and thus has wide-reaching implications for complex trait genetics.
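
The idea of solving for the shape axis that maximally covaries with marker variation can be sketched with an ordinary, unregularized two-block PLS, computed as a singular value decomposition of the marker-by-shape cross-covariance matrix. This is a minimal illustration on simulated data, not the authors' implementation: the function name `first_pls_axes`, the sample sizes, and the simulated effect are all assumptions made for the example, and no regularization is applied.

```python
import numpy as np


def first_pls_axes(genotypes, shapes):
    """Return marker loadings, shape axis, and covariance of the first PLS pair."""
    # Center both blocks so their cross-product is a cross-covariance matrix.
    G = genotypes - genotypes.mean(axis=0)
    S = shapes - shapes.mean(axis=0)
    cross_cov = G.T @ S / (G.shape[0] - 1)
    # The leading singular vectors maximize cov(G @ u, S @ v) over unit vectors.
    U, singular_values, Vt = np.linalg.svd(cross_cov, full_matrices=False)
    return U[:, 0], Vt[0], singular_values[0]


# Simulated stand-in data (hypothetical; not the DO sample).
rng = np.random.default_rng(0)
n_mice, n_markers, n_shape_vars = 500, 12, 30
genotypes = rng.normal(size=(n_mice, n_markers))
true_shape_axis = rng.normal(size=n_shape_vars)
true_shape_axis /= np.linalg.norm(true_shape_axis)
# Markers 0 and 1 jointly shift shape along a single simulated axis.
latent = genotypes[:, 0] + 0.5 * genotypes[:, 1]
noise = rng.normal(scale=0.3, size=(n_mice, n_shape_vars))
shapes = np.outer(latent, true_shape_axis) + noise

marker_loadings, shape_axis, covariance = first_pls_axes(genotypes, shapes)
print("largest |loading| at marker", int(np.argmax(np.abs(marker_loadings))))
print("|cosine| with simulated axis:", round(abs(float(shape_axis @ true_shape_axis)), 3))
```

The marker loadings indicate how strongly each marker in the set contributes to the joint axis, and the shape axis is the corresponding direction in phenotype space. The sign of the pair is arbitrary, which is why the comparison with the simulated axis uses an absolute value.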

Article activity feed

  2. Reviewer #1 (Public Review):

    In this paper, the authors use a multivariate genotype-phenotype method to assess the broader association of a group of related genes with a set of multivariate complex phenotypes. In particular, they investigate the association of genes related to a specific Gene Ontology (GO) term with a multivariate representation of craniofacial shape. With this type of analysis, they demonstrate that different 'processes', i.e., different GO terms, influence different aspects of craniofacial shape. Using regularized partial least squares, the authors quantify the proportion of variation in craniofacial shape that can be attributed to genetic differences in a particular process. The association between the process and aspects of craniofacial shape is further explored by examining the changes in those same aspects of craniofacial shape in mice that have been genetically manipulated. A web app is available to use the data and methods described in this paper to identify associations between an MGP genetic axis derived for a particular process, the aspects of craniofacial shape associated with that genetic axis, and the changes in those aspects of craniofacial shape induced by the genetic manipulation of a single gene.
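
    One plausible way to express "proportion of shape variation attributable to a process" is to regress every shape variable on the first PLS genotype score and divide the fitted sum of squares by the total. The sketch below does this on simulated data. It is an assumption about how such a quantity could be computed, not the estimator used in the manuscript; the function name `shape_variance_explained` and all data are hypothetical, and the value is an in-sample estimate that will tend to be optimistic.

```python
import numpy as np


def shape_variance_explained(genotypes, shapes):
    """Fraction of total shape variance fitted by the first PLS genotype score."""
    G = genotypes - genotypes.mean(axis=0)
    S = shapes - shapes.mean(axis=0)
    # First genotype axis of an unregularized two-block PLS.
    U, _, _ = np.linalg.svd(G.T @ S, full_matrices=False)
    score = G @ U[:, 0]
    # Least-squares slope of each shape variable on the (centered) score.
    slopes = score @ S / (score @ score)
    fitted = np.outer(score, slopes)
    return float((fitted ** 2).sum() / (S ** 2).sum())


# Simulated stand-in data (hypothetical).
rng = np.random.default_rng(1)
n_mice, n_markers, n_shape_vars = 500, 12, 30
genotypes = rng.normal(size=(n_mice, n_markers))
axis = rng.normal(size=n_shape_vars)
axis /= np.linalg.norm(axis)
latent = genotypes[:, 0] + 0.5 * genotypes[:, 1]
shapes = np.outer(latent, axis) + rng.normal(scale=0.3, size=(n_mice, n_shape_vars))

explained = shape_variance_explained(genotypes, shapes)
# Shuffling mice breaks the genotype-shape link and gives a rough chance baseline.
shuffled = shape_variance_explained(genotypes, shapes[rng.permutation(n_mice)])
print(f"explained: {explained:.3f}, shuffled baseline: {shuffled:.3f}")
```

    Comparing the observed value against the shuffled baseline gives a first impression of how much of the estimate could arise from fitting noise alone.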

    Strengths

    The authors have an extensive data set from Diversity Outbred mice on craniofacial shape and genetic variation. With over a thousand mice, they have ample power for these types of analyses.

    Much of traditional complex trait genetic analysis focuses on breaking a complex trait down into quantitative components that can be measured precisely and on examining one genetic marker at a time. However, this traditional approach is at odds with what we know about complex traits. With this method, the decision about how to capture the genetic influences on multiple correlated and highly interdependent quantitative measures of a biological phenomenon is driven by the data rather than by the researcher. This method also allows the user to break away from the one-gene, one-trait mentality: it acknowledges that disruption of any number of genes can often produce a similar phenotypic outcome, and that disruption of the process is more relevant to the outcome than disruption of any single gene.

    Weakness

    One of the challenges with multivariate analyses of this type is how to measure the success of the model. In this case, the authors compared their genotype-phenotype results to phenotype results from genetically manipulated mice. While this method is recognized to have advantages, there are disadvantages to this approach that were not fully addressed.

    Within the manuscript, there is an emphasis on the concordant direction of association between the process MGP axis and the axis of shape variation of a relevant mutant phenotype. The reviewers had concerns about the assumptions made and the implications of those assumptions for the interpretations of the results.
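
    A common way to quantify agreement in direction between two shape vectors is their cosine (vector correlation), judged against what random directions of the same dimensionality would give. The sketch below illustrates that general idea on simulated data; it is not necessarily the statistic used in the manuscript, and the function names, the 150-dimensional shape space, and the simulated mutant effect are all assumptions.

```python
import numpy as np


def mutant_effect_vector(mutant_shapes, wildtype_shapes):
    """Mean shape difference between mutant and wild-type specimens."""
    return mutant_shapes.mean(axis=0) - wildtype_shapes.mean(axis=0)


def vector_correlation(a, b):
    """Cosine of the angle between two vectors in shape space."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def random_abs_correlations(n_dims, n_draws, rng):
    """|cosine| between a fixed direction and uniformly random directions."""
    draws = rng.normal(size=(n_draws, n_dims))
    draws /= np.linalg.norm(draws, axis=1, keepdims=True)
    # By rotational symmetry the fixed direction can be the first coordinate axis.
    return np.abs(draws[:, 0])


# Simulated stand-in data (hypothetical): 150 shape variables, 20 mice per group.
rng = np.random.default_rng(2)
n_dims = 150
process_axis = rng.normal(size=n_dims)
process_axis /= np.linalg.norm(process_axis)
wildtype = rng.normal(scale=0.05, size=(20, n_dims))
mutant = rng.normal(scale=0.05, size=(20, n_dims)) + 0.5 * process_axis

effect = mutant_effect_vector(mutant, wildtype)
# An SVD-derived axis has an arbitrary sign, so compare absolute values.
observed = abs(vector_correlation(process_axis, effect))
null = random_abs_correlations(n_dims, 5000, rng)
print(f"observed |cos|: {observed:.2f}, null 95th percentile: {np.quantile(null, 0.95):.2f}")
```

    In high-dimensional shape spaces, random directions are nearly orthogonal, so even moderate cosines can exceed this null. Whether a uniform random-direction null is the appropriate comparison is itself an assumption, since real shape variation is concentrated along a few dominant axes.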

    Overall, the discussion section is too strongly worded.

  3. Reviewer #2 (Public Review):

    Despite the strong premise, the implementation of the multivariate genotype-phenotype (MGP) approach from biological processes also presents a few shortcomings. First, as properly introduced by the authors, candidate and genome-wide marker set investigations are two distinct approaches, each with its own advantages and disadvantages. The proposed methodology is based on candidate selections of processes, and therefore of groups of genes, in support of hypothesis-driven research. In contrast to hypothesis-free investigations (e.g., genome-wide association scans, GWAS), such an approach does not allow one to "discover" new associations outside today's known genome annotations, and therefore cannot help solve the mystery of the non-coding (non-gene) parts of DNA or discover new gene pathways and interactions. However, combining a multitude of markers across multiple genes in an unsupervised and genome-wide manner as input to a multivariate genotype to multivariate phenotype investigation remains problematic. These issues are well discussed and acknowledged by the authors.

    The deployed MGP methodology, based on partial least squares (PLS), was presented in 2016 (1), following the authors' citation (in fact, something similar was presented earlier, in 2012 (2)), but to the best of my knowledge an actual genome-wide use of the technique has not yet been reported. The main reason, in my opinion, is that this PLS technique is indeed prone to overfitting, as stated by the authors and in the 2012 work, and that statistical testing relies on permutation/randomization or cross-validation. These are computationally intractable at the scale of the millions of SNPs investigated in GWAS today. An alternative is canonical correlation analysis (CCA), which closely resembles PLS, with the distinction that it optimizes correlation instead of covariance when searching for latent dimensions that connect two multivariate variables. That is, the two methods are closely related (3), and both are prone to overfitting. However, CCA does have the advantage of reporting parametric p-values that are computationally tractable, which has been exploited in a recent GWAS on multivariate facial shape (4). The main difference from the current work is that (4) and its predecessor (5) performed only a simpler SNP-by-SNP investigation, to avoid overfitting, while still modeling multivariate facial shape. However, the literature on gene-based and/or haplotype-based GWAS, as opposed to SNP-by-SNP GWAS, also lists CCA, among others, as a common tool, and it is of interest to relate the work presented here methodologically to what is done in such multivariate genotype to multivariate phenotype GWAS. It is observed that multiple SNPs within a single gene or haplotype do require extensive pruning before being input to multivariate association techniques. Of great distinction, and worth emphasizing, is that these remain limited to the level of a single gene at most, whereas the presented work, for the first time, associates across multiple genes (of note, each gene is represented by an average of only two genetic markers within the gene, so that no single gene is oversampled in comparison to the other genes in the group).
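
    The distinction between optimizing covariance (PLS) and correlation (CCA), and the overfitting both can show, can be illustrated with a small numerical sketch. The code below is an illustration on pure-noise data, with hypothetical function names and sizes; CCA is computed from orthonormal bases of the centered blocks (a standard QR-based route), which assumes both blocks have full column rank.

```python
import numpy as np


def center(a):
    return a - a.mean(axis=0)


def first_pls_covariance(X, Y):
    """Largest covariance between linear combinations of X and Y (unit weights)."""
    cross_cov = center(X).T @ center(Y) / (len(X) - 1)
    return float(np.linalg.svd(cross_cov, compute_uv=False)[0])


def first_canonical_correlation(X, Y):
    """Largest correlation between linear combinations of X and Y."""
    # Singular values of Qx^T Qy are the canonical correlations when Qx and Qy
    # are orthonormal bases of the centered column spaces.
    Qx, _ = np.linalg.qr(center(X))
    Qy, _ = np.linalg.qr(center(Y))
    return float(np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0])


# Pure noise: there is no true association between the two blocks.
rng = np.random.default_rng(3)
n, p, q = 60, 20, 20
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))
noise_cca = first_canonical_correlation(X, Y)
noise_pls = first_pls_covariance(X, Y)

# Rescaling columns changes the PLS covariance but not the canonical correlation.
X_rescaled = X * np.linspace(1.0, 10.0, p)
rescaled_cca = first_canonical_correlation(X_rescaled, Y)
rescaled_pls = first_pls_covariance(X_rescaled, Y)
print(f"CCA on noise: {noise_cca:.2f} (rescaled: {rescaled_cca:.2f})")
print(f"PLS on noise: {noise_pls:.2f} (rescaled: {rescaled_pls:.2f})")
```

    With many variables relative to the number of samples, the first canonical correlation is large even though the blocks are unrelated, which is the overfitting concern in concrete form. The PLS value depends on the scale of the inputs, so it is only interpretable relative to a reference such as a permutation distribution.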

    On the matter of overfitting, the authors deploy a regularization and restrict themselves to the first PLS component as the outcome of the association. Although necessary from an overfitting perspective, this at the same time reduces my enthusiasm for the results presented. First, any kind of regularization is typically user-defined and tuned, making it hard to judge how robust the results are and how well they generalize. Unfortunately, despite the interesting overlap with mutant phenotypes, the work does not present an independent replication of the associations in a separate dataset. Second, in the case of CCA, and most likely by relationship in PLS as well, it is not always the case that the first latent dimension is the meaningful one. Therefore, the question becomes what is missed by not including additional components, or at least by not testing how many components seem relevant. Third, as a by-product of the regularization, alongside the focus on a single latent component, the results as presented go from a group of genes to a focus on only one or a few of the genes. In other words, the question now is to what extent the group analysis is more powerful than a gene-by-gene analysis, since regularization especially forces a sparse loading on multiple input features (in this case genes).
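
    One simple way to ask how many components seem relevant is to compare each observed PLS singular value with the same-rank singular value obtained after shuffling the pairing between genotypes and shapes. The sketch below is a rough heuristic on simulated data with a single true axis, not a procedure from the manuscript: the function names and data are hypothetical, and the rank-wise comparison without deflation is a simplification that can be anti-conservative or conservative for later components.

```python
import numpy as np


def pls_singular_values(G, S, n_components):
    """Leading singular values of the genotype-shape cross-covariance matrix."""
    Gc = G - G.mean(axis=0)
    Sc = S - S.mean(axis=0)
    cross_cov = Gc.T @ Sc / (len(G) - 1)
    return np.linalg.svd(cross_cov, compute_uv=False)[:n_components]


def permutation_p_values(G, S, n_components=3, n_permutations=199, seed=0):
    """Rank-wise permutation p-values for the leading PLS singular values."""
    rng = np.random.default_rng(seed)
    observed = pls_singular_values(G, S, n_components)
    exceed = np.zeros(n_components)
    for _ in range(n_permutations):
        # Shuffling rows of S breaks the genotype-shape pairing.
        permuted = S[rng.permutation(len(S))]
        exceed += pls_singular_values(G, permuted, n_components) >= observed
    # Add one to numerator and denominator so p-values are never exactly zero.
    return (exceed + 1) / (n_permutations + 1)


# Simulated stand-in data (hypothetical) with one true genotype-shape axis.
rng = np.random.default_rng(4)
n_mice, n_markers, n_shape_vars = 500, 12, 30
G = rng.normal(size=(n_mice, n_markers))
axis = rng.normal(size=n_shape_vars)
axis /= np.linalg.norm(axis)
latent = G[:, 0] + 0.5 * G[:, 1]
S = np.outer(latent, axis) + rng.normal(scale=0.3, size=(n_mice, n_shape_vars))

p_values = permutation_p_values(G, S)
print("permutation p-values for components 1-3:", np.round(p_values, 3))
```

    The cost of such a test grows with the number of permutations and markers, which is the computational burden noted above for genome-wide use.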

    While the three example processes are interesting and easy to follow in terms of why they are of interest, concern remains about the interpretation of the follow-up analyses.

    It is worth noting that it is generally very hard to visualize high-dimensional data, and the authors did a great job. Still, it is somewhat disappointing to start by introducing a complex multidimensional problem and a potential methodological solution (PLS), and then, in contradiction, to work with limited dimensions throughout. In the future, with larger datasets and therefore a reduced danger of overfitting, it will be of great interest to expand the dimensionalities explored.

    1. Mitteroecker P, Cheverud JM, Pavlicev M. Multivariate Analysis of Genotype-Phenotype Association. Genetics. 2016 Apr;202(4):1345-63.

    2. Le Floch E, Guillemot V, Frouin V, Pinel P, Lalanne C, Trinchera L, et al. Significant correlation between a set of genetic polymorphisms and a functional brain network revealed by feature selection and sparse Partial Least Squares. NeuroImage. 2012 Oct 15;63(1):11-24.

    3. Sun L, Ji S, Yu S, Ye J. On the equivalence between canonical correlation analysis and orthonormalized partial least squares. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 2009. p. 1230-5. (IJCAI'09).

    4. White JD, Indencleef K, Naqvi S, Eller RJ, Hoskens H, Roosenboom J, et al. Insights into the genetic architecture of the human face. Nat Genet. 2021 Jan;53(1):45-53.

    5. Claes P, Roosenboom J, White JD, Swigut T, Sero D, Li J, et al. Genome-wide mapping of global-to-local genetic effects on human facial shape. Nat Genet. 2018 Mar;50(3):414-23.

  4. Reviewer #3 (Public Review):

    This paper aims to generate biologically and developmentally meaningful genotype-phenotype maps of craniofacial shape variation in mice. The authors acknowledge that genotype-phenotype maps are multivariate in nature (many loci have joint effects on complex phenotypes) and therefore look for associations between multiple loci and multivariate measures of craniofacial shape. And, to gain developmentally relevant information, they constrain the analysis to genetic variation that is found in known biological processes/pathways. To find genotype-phenotype associations they use regularized partial least squares that estimates the vectors of phenotypic and genetic variation (in the genes that correspond to the biological process of interest) that have maximum correlation - as a result the overall morphological effect of the pathway is identified, as well as the relative importance of each of the genes for such phenotypic variation.

    This approach sheds new light on how natural (found in outbred mice) genetic variation in well-understood biological processes affects adult craniofacial shape, and allows the comparison between phenotypic effects of different pathways. The authors also developed a web interface that will allow anyone to explore the phenotypic effects of their biological process of interest, not restricted to the ones explored in the manuscript.

    The study offers a very useful new perspective on how genetic variation translates into phenotypic variation in a multivariate context, and it should be relevant not only for shape phenotypes but for any other complex multivariate phenotype like gene expression or behavioral measurements. However, there are two points that should be taken into consideration when assessing the novelty and predictive power of the approach:

    The novelty of this method is considerably overstated throughout the paper. The authors state that they use a method previously published by Mitteroecker et al. (2016), with the twist of restricting the analysis to known biological processes. It is not clear from the manuscript how much of their approach is actually new and how much is Mitteroecker's method applied to a subset of markers.

    The approach provides the phenotypic effect of genetic variation in already known pathways, but it does not result in new genotype-phenotype associations; this is acknowledged in the text. However, the manuscript suggests that the results generate testable hypotheses, which this reviewer found to be overreaching given the data presented.