Complementary evolution of coding and noncoding sequence underlies mammalian hairlessness

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    Several mammal species, including dolphins, have evolved to be relatively "hairless". Kowalczyk and colleagues scan the genomes of multiple species to identify genomic regions that appear to have evolved at a faster or slower evolutionary rate along hairless lineages. They identify a number of protein-coding genes as well as noncoding regions that might explain how hairlessness evolved in mammals. This study is of interest to those investigating the development of the skin and its appendages as well as evolutionary biologists, especially those investigating instances of convergent evolution and those developing phylogenomic methods for genome comparisons.

Abstract

Body hair is a defining mammalian characteristic, but several mammals, such as whales, naked mole-rats, and humans, have notably less hair. To find the genetic basis of reduced hair quantity, we used our evolutionary-rates-based method, RERconverge, to identify coding and noncoding sequences that evolve at significantly different rates in so-called hairless mammals compared to hairy mammals. Using RERconverge, we performed a genome-wide scan over 62 mammal species using 19,149 genes and 343,598 conserved noncoding regions. In addition to detecting known and potential novel hair-related genes, we also discovered hundreds of putative hair-related regulatory elements. Computational investigation revealed that genes and their associated noncoding regions show different evolutionary patterns and influence different aspects of hair growth and development. Many genes under accelerated evolution are associated with the structure of the hair shaft itself, while evolutionary rate shifts in noncoding regions also included the dermal papilla and matrix regions of the hair follicle that contribute to hair growth and cycling. Genes that were top ranked for coding sequence acceleration included known hair and skin genes KRT2, KRT35, PKP1, and PTPRM that surprisingly showed no signals of evolutionary rate shifts in nearby noncoding regions. Conversely, accelerated noncoding regions are most strongly enriched near regulatory hair-related genes and microRNAs, such as mir205, ELF3, and FOXC1, that themselves do not show rate shifts in their protein-coding sequences. Such dichotomy highlights the interplay between the evolution of protein sequence and regulatory sequence to contribute to the emergence of a convergent phenotype.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    In this manuscript, Kowalczyk and colleagues report on identifying coding and non-coding genetic determinants of hairlessness in mammals using an approach they developed called RER-converge. The approach has been employed to examine several different traits in previous publications from this group. The authors determine that hairlessness is associated with relaxed evolutionary constraint at genetic loci and identify both coding genes and non-coding sequences associated with this phenotype. Several known hair-associated and novel genes and microRNAs are observed.

    This is a strong manuscript with interesting results. It is remarkable how robust this method is. There are a few places where I was not fully convinced of the choice to highlight a gene as "significant" however.

    In Figure 4 and the associated text and figure legend the claim is made that non-coding regions exhibit accelerated evolution of matrix and dermal papilla elements. However, the enrichment, even prior to multiple testing correction is not significant. Should this be reported on?

    We agree that some of the results that we displayed had borderline significance and we have clarified this in the text so the reader is aware. We highlight tissue annotations with borderline-significant enrichment in the noncoding analyses (matrix p=0.078 adj.p=0.18, dermal sheath p=0.059 adj.p=0.16, dermal papilla p=0.049 adj.p=0.16) because we believe that they are an honest depiction of the trends we see in this scan (with alpha=0.2 for adjusted p-values), particularly when supported by effect sizes as reported through AUC. We prefer to set a generous threshold to avoid missing any meaningful results rather than setting a more stringent threshold. A more generous alpha is also more forgiving of the noise related to identifying noncoding regions and assigning them to genes.
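
    The effect of the threshold choice can be illustrated with a short sketch. This is not the authors' code: the `benjamini_hochberg` helper assumes a Benjamini-Hochberg correction, which the response does not specify, and all function names are hypothetical. Only the three adjusted p-values are taken from the text above.

```python
# Adjusted p-values quoted in the response (noncoding enrichment results).
REPORTED_ADJ_P = {
    "matrix": 0.18,
    "dermal sheath": 0.16,
    "dermal papilla": 0.16,
}


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the input order.

    Assumption: the correction method is not named in the text; BH is used
    here only as a common example of a multiple-testing adjustment.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def passing(adjusted, alpha):
    """Names of annotations whose adjusted p-value is at or below alpha."""
    return sorted(name for name, p in adjusted.items() if p <= alpha)


lenient = passing(REPORTED_ADJ_P, alpha=0.2)   # threshold used in the response
strict = passing(REPORTED_ADJ_P, alpha=0.05)   # a conventional stricter cutoff
```

    With the reported values, all three annotations pass at alpha=0.2 and none pass at alpha=0.05, which is the trade-off the response describes.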

    Related to the above, Table 1 includes just one 'significant gene,' with the remainder of the genes highlighted because they have a Bayes Factor ratio >5. Should a gene with a BF HvM be highlighted as a gene "whose evolutionary rates are significantly associated with the hairless phenotype?" Perhaps I am incorrect, but the hypothesis that is being tested by this approach seems distinct from "is the gene associated with hair loss."

    Similar to pathway enrichment analyses, we also used generous significance thresholds for gene-specific results to show our top, most significant results from protein-coding analyses. Significance of noncoding enrichment was not a criterion for inclusion/exclusion of genes in Table 1. Generally, some genes with significant convergent evolutionary rate shifts in protein coding sequence also have significant enrichment of convergent rate shifts in nearby noncoding regions (like PTPRM in Table 1), but many do not, which is also shown in Figure 6A. We have clarified column titles in Table 1 by adding (Gene) or (Noncoding) to indicate which sequences the values refer to.

    Bayes factors (BF) are a complementary Bayesian approach to analyze statistical associations that we use here to supplement information we get from our more traditional Kruskal-Wallis test. BF are easy to interpret because they directly describe the amount of support for our alternative hypothesis rather than indirectly describing support as p-values do. For example, in Table 1, the hypothesis that the evolution of FGF11 is associated with the evolution of mammalian hairlessness has 6,354.7 times more support than the null hypothesis that phenotype and gene evolution are not related. These large values are interpreted as supporting the alternative, which is equivalent to what we want to be able to interpret from p-values (i.e. low p-values allow us to reject the null and implicitly support the alternative).

    BF values in Table 1 are calculated using evolutionary rates in protein coding sequence and so are not expected to match values in the “Noncoding” columns. “BF Hairless” is directly related to the “Statistic” and “p-adj” columns, which is why the “BF Hairless” values are all quite large, indicating a large amount of support for an association between gene and phenotype evolution.

    The hypotheses that are tested with the “Statistic” and “p-adj” columns and the “BF HvM” column are colloquially the same: they both test to determine if the evolutionary rate of the gene is different in hairy mammals compared to hairless mammals. Only the details are different. The traditional statistics test for an association without accounting for marine mammals as a potential confounder. The BF tests check for a significant association that is driven more strongly by hairlessness than by marine habitat.
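
    For readers unfamiliar with the traditional test mentioned above, the following is a minimal sketch of a two-group Kruskal-Wallis test on relative evolutionary rates. It is an illustration under stated assumptions, not the RERconverge implementation: the function names and the example rates are hypothetical, and the p-value uses the chi-square approximation with one degree of freedom.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_two_groups(foreground, background):
    """Return (H, p) comparing rates on foreground vs. background branches.

    Assumes the pooled values are not all identical (the tie correction
    would otherwise be zero).
    """
    pooled = list(foreground) + list(background)
    n = len(pooled)
    ranks = average_ranks(pooled)
    n_fg = len(foreground)
    rank_sums = (sum(ranks[:n_fg]), sum(ranks[n_fg:]))
    sizes = (n_fg, n - n_fg)
    h = 12 / (n * (n + 1)) * sum(r * r / s for r, s in zip(rank_sums, sizes)) - 3 * (n + 1)
    # Standard tie correction.
    h /= 1 - sum(t**3 - t for t in Counter(pooled).values()) / (n**3 - n)
    h = max(h, 0.0)
    # Chi-square survival function with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p


# Hypothetical relative rates: hairless (foreground) vs. hairy (background).
h_stat, p_value = kruskal_wallis_two_groups(
    [1.2, 0.9, 1.5], [-0.3, 0.1, -0.5, 0.0, 0.2]
)
```

    Note that this test, like the "Statistic" and "p-adj" columns described above, does not account for marine habitat as a confounder.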

    Slightly more description of the Bayes factor calculation would be beneficial to the supplement. e.g. is the BayesFactor R package being used here... or something else?

    We agree that a clearer description of Bayes factors is appropriate and have modified the methods description as follows:

    “In addition to calculating element-specific association statistics, Bayes factors were calculated for each gene for the marine and hairless phenotypes using the BayesFactor R package (Morey & Rouder, 2021). These values were calculated to disentangle the two phenotypes, which are heavily confounded since nearly all marine mammals in the genome alignment used for this work are hairless. Briefly, Bayes factors are a Bayesian approach complementary to more standard statistical tests. Instead of returning statistics and p-values, Bayes factors directly quantify the amount of support for an alternative hypothesis. For example, a Bayes factor value of 5 for a particular statistical test would indicate 5 times more support for the alternative hypothesis than the null hypothesis. Bayes factors can also be used to compare different alternative hypotheses by calculating the ratio of two Bayes factors. When considering the hairless phenotype, we use Bayes factors to quantify the support for a linear model predicting phenotype using evolutionary rate information from each gene, with a higher Bayes factor indicating greater support. We perform this calculation for two alternative hypotheses: 1) a gene shows different evolutionary rates in hairless versus hairy species, and 2) a gene shows different evolutionary rates in marine species versus non-marine species. The ratio of Bayes factors between the hairless and marine phenotypes quantifies the level of support of one phenotype over the other and thus can be used to tease apart intricacies of the two heavily-confounded phenotypes. When the Bayes factor for the hairless phenotype is much larger than the Bayes factor for the marine phenotype, that indicates stronger support for signal driven by hairlessness.”
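
    The ratio described in this passage can be sketched as follows. This is a rough illustration only: it swaps in the BIC approximation to the Bayes factor rather than the priors used by the BayesFactor R package, so its numbers will not match the manuscript. The function name, the rates, and the phenotype labels are all hypothetical.

```python
import math


def bic_bayes_factor(rates, phenotype):
    """Approximate Bayes factor for phenotype ~ rate vs. an intercept-only model.

    Uses BF10 ~= exp((BIC_null - BIC_alt) / 2). This is an assumption made for
    illustration, not the calculation performed by the BayesFactor R package.
    Requires variation in both inputs and an imperfect linear fit.
    """
    n = len(rates)
    mean_x = sum(rates) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in rates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rates, phenotype))
    rss_null = sum((y - mean_y) ** 2 for y in phenotype)
    rss_alt = rss_null - sxy**2 / sxx  # residual sum of squares with a slope
    bic_null = n * math.log(rss_null / n) + 1 * math.log(n)
    bic_alt = n * math.log(rss_alt / n) + 2 * math.log(n)
    return math.exp((bic_null - bic_alt) / 2)


# Hypothetical relative rates for one gene across eight species.
rates = [1.1, 0.9, 1.3, 0.2, -0.1, 0.1, -0.2, 0.0]
hairless = [1, 1, 1, 0, 0, 0, 0, 0]  # hypothetical hairless labels
marine = [1, 1, 0, 0, 0, 0, 0, 1]    # hypothetical marine labels

bf_hairless = bic_bayes_factor(rates, hairless)
bf_marine = bic_bayes_factor(rates, marine)
bf_hvm = bf_hairless / bf_marine  # analogous to the "BF HvM" column
```

    In this toy example the rate shift tracks the hairless labels more closely than the marine labels, so the ratio is well above the value of 5 mentioned in the reviewer's comment.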

    Why are the qq-plot distributions of non-coding elements so distinct compared to coding? Some comment on this would be appreciated in the main text, even if briefly.

    We have added the following text as a tentative speculation about why noncoding elements seem to show more signal than coding elements:

    “Interestingly, noncoding regions appeared to show even stronger deviation from uniformity than coding regions, perhaps because regulatory changes more strongly underlie the convergent evolution of hairlessness.”
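
    "Deviation from uniformity" refers to the fact that p-values are uniformly distributed when no association exists, so a qq-plot compares observed p-values with uniform quantiles. A minimal sketch of how such points could be computed is given below; the function names are hypothetical and this is not the authors' plotting code.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-value pairs, most significant first.

    Expected quantiles are i / (n + 1) for the i-th smallest of n p-values,
    which is what a uniform null distribution predicts. Assumes all p > 0.
    """
    ordered = sorted(pvalues)
    n = len(ordered)
    return [
        (-math.log10((i + 1) / (n + 1)), -math.log10(p))
        for i, p in enumerate(ordered)
    ]


def fraction_above_diagonal(points):
    """Share of points whose observed value exceeds the uniform expectation."""
    return sum(obs > exp for exp, obs in points) / len(points)


# Hypothetical p-values, for illustration only.
points = qq_points([0.5, 0.01, 0.2])
excess = fraction_above_diagonal(points)
```

    A larger share of points above the diagonal indicates more signal than the null would produce, which is the pattern the quoted text describes for noncoding regions.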

    Reviewer #3 (Public Review):

    The authors present a phylogenetic analysis of evolutionary rates as they correlate with independently derived "hairlessness" across mammals. This is a very good paper, well written and very carefully analyzed. This paper makes a number of interesting biological insights, including the identification of protein coding as well as noncoding regions that appear to evolve in correlated fashion with hairlessness.

    I have several recommendations:

    1. The main assumption behind this experiment is that species "use" the same genes to accomplish hairlessness. Only then would one predict correlated rate shifts along hairless lineages. If, on the other hand, each hairless species used a unique gene to accomplish hairlessness, then one might only see a rate shift on that species' lineage. Therefore, a complementary approach might be to i) define all genes with known involvement in hair morphology (i.e., genes in the categories listed in Fig. 1C). ii) test how many of those genes show a significant rate shift in at least one hairless lineage. iii) test whether hair genes are more likely to show at least one rate shift compared to genomic background. This complementary analysis would relax the assumption that all hairless species show similar rate shifts compared to haired species.

    Our analyses focus on convergently evolving genomic elements associated with hairlessness, rather than species-specific changes, for two reasons. First, species-specific analyses may detect genomic changes associated with any unique phenotypes in a particular species and it is difficult to distinguish which of those genomic changes are associated with hairlessness. Second, we are seeking genomic elements associated with hair growth in all mammals and species-specific adaptations will not be shared across all mammals.

    Nevertheless, we conducted a complementary analysis to test for rate shifts specific to each hairless species compared to all of the non-hairless species. We then tested for enrichment of hair follicle genes among genes with significant rate shifts in different numbers of hairless species. For example, among all genes with significant rate shifts in at least one hairless species, is there an enrichment of hair follicle genes? Then, among all genes with significant rate shifts in at least two hairless species, is there an enrichment of hair follicle genes? And so on, until we test for enrichment only in genes with rate shifts in all ten hairless species. As expected, the signal of enrichment gets stronger as more species share the rate shift (the “convergent signal”). This happens because the genes with shared rate shifts are more hair-specific than the genes with unshared rate shifts.
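
    The stepwise procedure described above can be sketched as a loop over the minimum number of species sharing a rate shift. The sketch below assumes a one-sided hypergeometric test for enrichment, which the response does not specify, and uses hypothetical gene identifiers and counts.

```python
from math import comb


def hypergeom_upper_tail(overlap, universe, annotated, selected):
    """P(X >= overlap) when `selected` genes are drawn at random from
    `universe` genes, `annotated` of which carry the annotation."""
    total = comb(universe, selected)
    top = min(annotated, selected)
    hits = sum(
        comb(annotated, x) * comb(universe - annotated, selected - x)
        for x in range(overlap, top + 1)
    )
    return hits / total


def enrichment_by_sharing(shift_counts, annotated_genes, max_k):
    """For k = 1..max_k, test annotated-gene enrichment among genes with a
    significant rate shift in at least k hairless species.

    Returns {k: (number selected, overlap, p-value)}.
    """
    universe = len(shift_counts)
    annotated = sum(1 for gene in shift_counts if gene in annotated_genes)
    results = {}
    for k in range(1, max_k + 1):
        selected = {gene for gene, count in shift_counts.items() if count >= k}
        overlap = len(selected & annotated_genes)
        p = hypergeom_upper_tail(overlap, universe, annotated, len(selected))
        results[k] = (len(selected), overlap, p)
    return results


# Hypothetical data: number of hairless species with a significant shift.
TOY_SHIFT_COUNTS = {
    "gene01": 3, "gene02": 3, "gene03": 2, "gene04": 1, "gene05": 1,
    "gene06": 0, "gene07": 0, "gene08": 1, "gene09": 0, "gene10": 0,
}
TOY_HAIR_GENES = {"gene01", "gene02", "gene03"}  # hypothetical annotation set

toy_results = enrichment_by_sharing(TOY_SHIFT_COUNTS, TOY_HAIR_GENES, max_k=2)
```

    In the toy data, requiring a shift in at least two species removes the unannotated genes from the selection and lowers the p-value, mirroring the strengthening of the convergent signal reported above.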

    We also performed another analysis to test for enrichment of hair follicle genes among genes with significant rate shifts per hairless species. For example, in orca, are the genes with significant rate shifts enriched for hair follicle genes? To complement this analysis, we also repeated the procedure for non-hairless species for comparison. Only two of the ten hairless species show species-specific hair follicle enrichments, which indicates that most of the hairless species alone are insufficient to detect hair signal at all. Even among the two species with significant enrichment, there are thousands of total genes identified, many of which are likely related to unique characteristics of those species other than hairlessness, and it is impossible to distinguish the hair-related genes from the other genes without additional information.

    All of these results are reported in the manuscript in the text and figures shown below:

    Species-Specific Analyses

    In addition to conducting convergent evolution analyses to identify genetic elements evolving at different rates across all hairless species, we also conducted complementary analyses to detect elements evolving at different rates in individual hairless species to demonstrate the importance of convergent evolution in our analyses. Indeed, the strength of enrichment for hair follicle-related genes among top hits steadily increases as more hairless species share rate shifts in those genes, an indicator of the power of the convergent signal (Figure 2). Further, analyses on single species alone only show enrichment for hair follicle-related genes among top hits in two hairless species out of ten – armadillo and pig (Figure 2 Supplement 1). Together, these results demonstrate the importance of testing for convergent evolutionary rate shifts across all hairless mammals to best detect hair-related elements.

    It is also important to note that every individual hairless species has thousands of genes with significant rate shifts (Supp. File 10). It is impossible to tell which of those rate shifts is associated with hairlessness specifically because the species have many unique phenotypes other than hairlessness that could be responsible for rate shifts in their respective genes. Convergent analyses allow for more concrete identification of hair-related elements by weeding out rate shifts that are not shared across species with the convergent hairless phenotype.

    2. It would be interesting to break up noncoding into additional strata. For example, one might predict that rate shifts in predicted transcription factor binding sites would have a larger functional impact than rate shifts in noncoding regions with no function. Or... that rate shifts in highly conserved noncoding regions vs. less conserved noncoding regions.

    We have performed extensive analyses to investigate the roles of TFBSs in the convergent evolution of hairlessness and found little enrichment of specific TFBS in our top noncoding regions from RERconverge. Perhaps because the noncoding regions are highly conserved, they contain many potential locations for TF binding and so it may be more reasonable to consider their full stretch of sequence as functional than it would be if they were less conserved.

    We have calculated conservation scores for noncoding regions and found no global association between RERconverge results and sequence conservation score.

    3. Why is aardvark considered a haired species? Aardvarks have as much (or as little) hair as pigs.

    Body hair is a difficult phenotype to categorize in mammals because all mammals do have hair. In order to create a binary distinction between hairy and hairless mammals, we needed to make a choice about where to draw that line. We were particularly concerned about the impact of assigning some of the hairier mammals, like pig, armadillo, and human, as hairless, so we performed the drop-out tests shown in Figure 4 to demonstrate that removing individual hairless species from our analyses does not change the overall signal. Indeed, removing pig impacts detection of genes in the two hair-related pathways shown less than removing clearly hairless species like killer whale or dolphin. We believe that these results are sufficient to demonstrate that subtle differences in phenotyping decisions will not substantially change the findings stated in our manuscript.

    4. The primary goal of the paper is to identify coding/noncoding regions that show shifts in evolutionary rate that are correlated on hairless vs. haired lineages. I was left wondering... when these correlations are found, how often is it due to the same mutations hitting the regions vs. mutations randomly hitting the same regions. If the former, this would suggest some limited way that species can achieve "hairlessness".

    In general, we do not expect amino acid convergence (for genes) or nucleotide convergence (for noncoding regions) to drive much of the signal we detect using RERconverge. For species separated by millions of years of evolutionary time, it is highly unlikely that a change in a single amino acid (or nucleotide) would drive exactly the same phenotypic change for a highly complex phenotype like hairlessness. However, we argue that there do appear to be some limited ways that species become hairless, albeit at the scale of evolutionary rates across a length of sequence rather than individual bases.

    Related to this point is the distinction between positively selected regions compared to regions under reduced constraint, which we would expect to accumulate mutations randomly.

    For genes, we believe that accelerated evolution of specific genomic regions in hairless species is caused by an accumulation of random mutations, not positive selection or specific targeted mutations. As stated in the manuscript, we performed branch-site tests for positive selection on our top genes, all KRTs, and all KRTAPs, and we found little indication that quickly evolving genes are undergoing positive selection specific to hairless species. This conclusion is also consistent with the hypothesis that genes under relaxation of evolutionary constraint will have rate shifts that are easier to detect over long periods of evolutionary time compared to genes under more subtle and short-lived periods of positive selection in association with the establishment of a new phenotype.

    For noncoding regions, it is much more difficult to distinguish positive selection from relaxation of evolutionary constraint because it is difficult to establish an estimate of neutral evolution for those sequences. Modeling positive selection in regulatory sequence is an emerging area of research, and current models are not yet reliable enough to distinguish positive selection from the accumulation of random mutations.

  2. eLife assessment

    Several mammal species, including dolphins, have evolved to be relatively "hairless". Kowalczyk and colleagues scan the genomes of multiple species to identify genomic regions that appear to have evolved at a faster or slower evolutionary rate along hairless lineages. They identify a number of protein-coding genes as well as noncoding regions that might explain how hairlessness evolved in mammals. This study is of interest to those investigating the development of the skin and its appendages as well as evolutionary biologists, especially those investigating instances of convergent evolution and those developing phylogenomic methods for genome comparisons.

  3. Reviewer #1 (Public Review):

    In this manuscript, Kowalczyk and colleagues report on identifying coding and non-coding genetic determinants of hairlessness in mammals using an approach they developed called RER-converge. The approach has been employed to examine several different traits in previous publications from this group. The authors determine that hairlessness is associated with relaxed evolutionary constraint at genetic loci and identify both coding genes and non-coding sequences associated with this phenotype. Several known hair-associated and novel genes and microRNAs are observed.

    This is a strong manuscript with interesting results. It is remarkable how robust this method is. There are a few places where I was not fully convinced of the choice to highlight a gene as "significant" however.

    In Figure 4 and the associated text and figure legend the claim is made that non-coding regions exhibit accelerated evolution of matrix and dermal papilla elements. However, the enrichment, even prior to multiple testing correction is not significant. Should this be reported on?

    Related to the above, Table 1 includes just one 'significant gene,' with the remainder of the genes highlighted because they have a Bayes Factor ratio >5. Should a gene with a BF HvM be highlighted as a gene "whose evolutionary rates are significantly associated with the hairless phenotype?" Perhaps I am incorrect, but the hypothesis that is being tested by this approach seems distinct from "is the gene associated with hair loss."

    Slightly more description of the Bayes factor calculation would be beneficial to the supplement. e.g. is the BayesFactor R package being used here... or something else?

    Why are the qq-plot distributions of non-coding elements so distinct compared to coding? Some comment on this would be appreciated in the main text, even if briefly.

  4. Reviewer #2 (Public Review):

    'Hairlessness' has convergently evolved numerous times in mammals. In this paper the authors look for patterns in the rate of DNA sequence evolution across the mammalian phylogeny to identify regions of the genome that are independently evolving at similar rates in hairless mammals. The authors find that signatures of convergent accelerated sequence evolution in hairless mammals are biased towards coding and gene regulatory regions known to be involved in hair biology, likely reflecting genetic drift following hair reduction. This bias toward hair-relevant genomic regions also highlights the utility of this approach to identify new candidate regions of the genome that haven't previously been implicated in hair biology and the authors describe several intriguing coding and non-coding candidates. Authors further find that genes and putative gene-regulatory regions have non-random patterns of drift, with mutations in coding regions biased toward proteins that compose physical aspects of the hair sheath.

    The analysis in this paper is centered on the RERconverge tool. Importantly, the authors have taken numerous steps to address potential issues with such an approach. One issue with RERconverge is the need to include/exclude ancestral branches as having a trait, which introduces assumptions about ancestral states. The authors controlled for this by running multiple variations of RERconverge with and without ancestral states as being 'hairless' with no major impact on results. The authors also controlled for whether certain lineages are driving the correlation signal, and found that removal of any given lineage does not impact skin or hair follicle enrichments. Finally, the authors have adequately distinguished whether other common phenotypes in hairless mammals (e.g. marine lifestyle or body size) drive the convergent signals in the dataset and found the reported genetic signatures are best explained by hair loss compared to these other traits.

    The paper should be of interest to a broad selection of biologists interested in evolution, development and phylogenomic methods. The candidate genes identified in this paper provide a compelling launching point for future experimental studies into the genetic basis of hair.

  5. Reviewer #3 (Public Review):

    The authors present a phylogenetic analysis of evolutionary rates as they correlate with independently derived "hairlessness" across mammals. This is a very good paper, well written and very carefully analyzed. This paper makes a number of interesting biological insights, including the identification of protein coding as well as noncoding regions that appear to evolve in correlated fashion with hairlessness.

    I have several recommendations:

    1. The main assumption behind this experiment is that species "use" the same genes to accomplish hairlessness. Only then would one predict correlated rate shifts along hairless lineages. If, on the other hand, each hairless species used a unique gene to accomplish hairlessness, then one might only see a rate shift on that species' lineage. Therefore, a complementary approach might be to i) define all genes with known involvement in hair morphology (i.e., genes in the categories listed in Fig. 1C). ii) test how many of those genes show a significant rate shift in at least one hairless lineage. iii) test whether hair genes are more likely to show at least one rate shift compared to genomic background. This complementary analysis would relax the assumption that all hairless species show similar rate shifts compared to haired species.

    2. It would be interesting to break up noncoding into additional strata. For example, one might predict that rate shifts in predicted transcription factor binding sites would have a larger functional impact than rate shifts in noncoding regions with no function. Or... that rate shifts in highly conserved noncoding regions vs. less conserved noncoding regions.

    3. Why is aardvark considered a haired species? Aardvarks have as much (or as little) hair as pigs.

    4. The primary goal of the paper is to identify coding/noncoding regions that show shifts in evolutionary rate that are correlated on hairless vs. haired lineages. I was left wondering... when these correlations are found, how often is it due to the same mutations hitting the regions vs. mutations randomly hitting the same regions. If the former, this would suggest some limited way that species can achieve "hairlessness".