Evolution of novel mimicry polymorphisms through Haldane’s sieve and rare recombination

Curation statements for this article:
  • Curated by eLife

    eLife Assessment

    This important study provides new and nuanced insights into the evolution of morphs in a textbook example of Batesian mimicry. The evidence supporting the claims about the origin and dominance relationships among morphs is convincing, but the interpretation of signals needs improvement with complementary analysis and some nuanced interpretation. Pending a revision, this work will be of interest to a broad range of evolutionary biologists.

Abstract

Origins of phenotypic novelty represent a paradox. Maintenance of distinct, canalized morphs usually requires a complex array of polymorphisms, whose co-retention requires a genetic architecture resistant to recombination, involving inversions and master regulators. Here, we reveal how such a constraining architecture can still accommodate novel morphs in evolving polymorphisms using the classic polymorphic Batesian mimicry in Papilio polytes, whose supergene-like genetic architecture is maintained in a large inversion. We show that rapidly evolving alleles of the conserved gene, doublesex, within this inversion underlie the genetic basis of this polymorphism. Using a precisely dated phylogeny and breeding experiments, we show that novel adaptive mimetic morphs and underlying alleles evolved in a sequentially dominant manner, undergoing selective sweeps in the mimetic species as predicted under Haldane’s sieve. Furthermore, we discovered that mimetic forms share precise inversion breakpoints, allowing rare exon swaps between the universally dominant and a recessive allele to produce a novel, persistent intermediate phenotype, ultimately facilitating the acquisition of phenotypic novelty. Thus, genetic dominance, selective sweeps, rapid molecular divergence, and rare recombination promote novel forms in this iconic evolving polymorphism, resolving the paradox of phenotypic novelty arising even in highly constrained genetic architectures.

Article activity feed

  1. Author Response:

    Reviewer #1 (Public review):

    In this study, Deshmukh et al. provide an elegant illustration of Haldane's sieve, the population genetics concept stating that novel advantageous alleles are more likely to fix if dominant because dominant alleles are more readily exposed to selection. To achieve this, the authors rely on a uniquely suited study system, the female-polymorphic butterfly Papilio polytes.
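
    The intuition behind Haldane's sieve described above can be illustrated with Haldane's classic approximation for the fixation probability of a new beneficial mutation, roughly 2hs, where h is the dominance coefficient and s the selection coefficient. The sketch below is an illustration only and is not part of the study's analyses; the function name and the numerical values are hypothetical, and the approximation assumes a large population, weak selection, and an allele that is not fully recessive.

```python
def fixation_probability(h: float, s: float) -> float:
    """Approximate fixation probability (2hs) of a new beneficial allele.

    h: dominance coefficient (1.0 = fully dominant, 0.5 = additive).
    s: selection coefficient of the homozygote (small and positive).
    Assumption: large population, weak selection, h well above 0.
    """
    if not 0 < h <= 1:
        raise ValueError("h must be in (0, 1] for this approximation")
    if s <= 0:
        raise ValueError("s must be positive for a beneficial allele")
    return 2 * h * s


# Hypothetical values chosen only to show the effect of dominance.
s = 0.01
for label, h in [("dominant", 1.0), ("additive", 0.5), ("nearly recessive", 0.1)]:
    print(f"{label:>16}: P(fix) ~ {fixation_probability(h, s):.4f}")
```

    Under these assumed values, a fully dominant allele is about ten times more likely to escape loss by drift than a nearly recessive one with the same homozygous benefit, which is the bias that Haldane's sieve refers to.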

    Deshmukh et al. first reconstruct the chronology of allele evolution in the P. polytes species group, clearly establishing the non-mimetic cyrus allele as ancestral, followed by the origin of the mimetic allele polytes/theseus, via a previously characterized inversion of the dsx locus, and most recently, the origin of the romulus allele in the P. polytes lineage, after its split from P. javanus. The authors then examine the two crucial predictions of Haldane's sieve, using the three alleles of P. polytes (cyrus, polytes, and romulus). First, they report with compelling evidence that these alleles are sequentially dominant, or put in other words, novel adaptive alleles either are or quickly become dominant upon their origin. Second, the authors find a robust signature of positive selection at the dsx locus, across all five species that share the polytes allele.

    In addition to exquisitely exemplifying Haldane's sieve, this study characterizes the genetic differences (or lack thereof) between mimetic alleles at the dsx locus. Remarkably, the polytes and romulus alleles are profoundly differentiated, despite their short divergence time (< 0.5 my), whereas the polytes and theseus alleles are indistinguishable across both coding and intronic sequences of dsx. Finally, the study reports incidental evidence of exon swaps between the polytes and romulus alleles. These exon swaps caused intermediate colour patterns and suggest that (rare) recombination might be a mechanism by which novel morphs evolve.

    This study advances our understanding of the evolution of the mimicry polymorphism in Papilio butterflies. This is an important contribution to a system already at the forefront of research on the genetic and developmental basis of sex-specific phenotypic morphs, which are common in insects. More generally, the findings of this study have important implications for how we think about the molecular dynamics of adaptation. In particular, I found the extensive genetic divergence between the polytes and romulus alleles striking, and it challenges the way I used to think about the evolution of this and other otherwise conserved developmental genes. I think that this study is also a great resource for teaching evolution. By linking classic population genetic theory to modern genomic methods, while using visually appealing traits (colour patterns), this study provides a simple yet compelling example to bring to a classroom.

    In general, I think that the conclusions of the study, in terms of the evolutionary history of the locus, the dominance relationships between P. polytes alleles, and the inference of a selective sweep in spite of contemporary balancing selection, are strongly supported; the data set is impressive and the analyses are all rigorous. I nonetheless think that there are a few ways in which the current presentation of these data could lead to confusion, and should be clarified and potentially also expanded.

    We thank the reviewer for the kind and encouraging assessment of our work.

    (1) The study is presented as addressing a paradox related to the evolution of phenotypic novelty in "highly constrained genetic architectures". If I understand correctly, these constraints are assumed to arise because the dsx inversion acts as a barrier to recombination. I agree that recombination in the mimicry locus is reduced and that recombination can be a source of phenotypic novelty. However, I'm not convinced that the presence of a structural variant necessarily constrains the potential evolution of novel discrete phenotypes. Instead, I'm having a hard time coming up with examples of discrete phenotypic polymorphisms that do not involve structural variants. If there is a paradox here, I think it should be more clearly justified, including an explanation of what a constrained genetic architecture means. I also think that the Discussion would be the place to return to this supposed paradox, and tell us exactly how the observations of exon swaps and the genetic characterization of the different mimicry alleles help resolve it.

    The paradox that we refer to here is essentially the challenge of evolving new, genetically regulated adaptive traits while maintaining the existing adaptive trait(s) at their fitness peak(s). While one of the mechanisms to achieve this could be differential structural rearrangement at the chromosomal level, it could also arise through alternative alleles or splice variants of a key gene (caste determination in Cardiocondyla ants) or differential regulation of expression (the spatial regulation of melanization in nymphalid butterflies by the ivory lncRNA). In each of these cases, a new mutation would have to give rise to a new phenotype without diluting the existing adaptive traits. We focused on structural variants because that was the case in our study system; however, the point we were making referred to the evolution of novel traits in general. We will add a section in the revised discussion to address this.

    (2) While Haldane's sieve is clearly demonstrated in the P. polytes lineage (with cyrus, polytes, and romulus alleles), there is another allele trio (cyrus, polytes, and theseus) for which Haldane's sieve could also be expected. However, the chronological order in which polytes and theseus evolved remains unresolved, precluding a similar investigation of sequential dominance. Likewise, the locus that differentiates polytes from theseus is unknown, so it's not currently feasible to identify a signature of positive selection shared by P. javanus and P. alphenor at this locus. I, therefore, think that it is premature to conclude that the evolution of these mimicry polymorphisms generally follows Haldane's sieve; of two allele trios, only one currently shows the expected pattern.

    We agree with the reviewer that the genetic basis of f. theseus requires further investigation. f. theseus occupies the same level on the dominance hierarchy of dsx alleles as f. polytes (Clarke and Sheppard, 1972) and the allelic variant of dsx present in both these female forms is identical, so there exists just one trio of alleles of dsx. Based on this evidence, we cannot comment on the origin of forms theseus and polytes. They could have arisen at the same time or sequentially. Since our paper is largely focused on the sequential evolution of dsx alleles through Haldane’s sieve, we have included f. theseus in our conclusions. We think that it fits into the framework of Haldane’s sieve due to its genetic dominance over the non-mimetic female form. However, this aspect needs to be explored further in a more specific study focusing on the characterization, origin, and developmental genetics of f. theseus in the future.

    Reviewer #2 (Public review):

    Summary:

    Deshmukh and colleagues studied the evolution of mimetic morphs in the Papilio polytes species group. They investigate the timing of origin of haplotypes associated with different morphs, their dominance relationships, associations with different isoform expressions, and evidence for selection and recombination in the sequence data. P. polytes is a textbook example of a Batesian mimic, and this study provides important nuanced insights into its evolution, and will therefore be relevant to many evolutionary biologists. I find the results regarding dominance and the sequence of events generally convincing, but I have some concerns about the motivation and interpretation of some other analyses, particularly the tests for selection.

    We thank the reviewer for these insightful remarks.

    Strengths:

    This study uses widespread sampling, large sample sizes from crossing experiments, and a wide range of data sources.

    We appreciate this point. This strength has indeed helped us illuminate the evolutionary dynamics of this classic example of balanced polymorphism.

    Weaknesses:

    (1) Purpose and premise of selective sweep analysis

    A major narrative of the paper is that new mimetic alleles have arisen and spread to high frequency, and their dominance over the pre-existing alleles is consistent with Haldane's sieve. It would therefore make sense to test for selective sweep signatures within each morph (and its corresponding dsx haplotype), rather than at the species level. This would allow a test of the prediction that those morphs that arose most recently would have the strongest sweep signatures.

    Sweep signatures erode over time - see Figure 2 of Moest et al. 2020 (https://doi.org/10.1371/journal.pbio.3000597), and it is unclear whether we expect the signatures of the original sweeps of these haplotypes to still be detectable at all. Moest et al. show that sweep signatures are completely eroded by 1N generations after the event, and probably not detectable much sooner than that, so assuming effective population sizes of these species of a few million, at what time scale can we expect to detect sweeps? If these putative sweeps are in fact more recent than the origin of the different morphs, perhaps they would more likely be associated with the refinement of mimicry, but not necessarily providing evidence for or against a Haldane's sieve process in the origin of the morphs.

    Our original plan was to test for signatures of sweeps in individual morphs, but the very small sample sizes for individual morphs in some species made this analysis difficult. We agree that signatures of selective sweeps cannot give us an estimate of the possible timescale of a sweep; they simply indicate that there may have been a sweep in a certain genomic region. Therefore, with the data from selective sweeps alone, we cannot determine whether these sweeps accompanied the refinement of mimicry or the origin of the mimetic phenotype itself. We have thus made no interpretations regarding the time scales or causal events of the sweeps. Additionally, we discuss that the results we obtained for individual alleles could represent what occurred at the point of origin of mimetic resemblance or in the course of perfecting the resemblance, although we cannot differentiate between the two at this point (lines 320 to 333).
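
    The time-scale question in this exchange can be made concrete with simple arithmetic. The sketch below converts the "1N generations" erosion horizon mentioned in the reviewer's comment into years. It is a rough illustration, not an analysis from the manuscript: the effective population sizes and the number of generations per year are placeholder assumptions, and the function name is hypothetical. The 0.5 my figure is the upper bound on the polytes/romulus divergence time quoted in the reviews.

```python
def erosion_horizon_years(effective_size: float, generations_per_year: float) -> float:
    """Years corresponding to N generations, the horizon after which
    sweep signatures are said to be fully eroded (per the reviewer's
    reading of Moest et al. 2020)."""
    if effective_size <= 0 or generations_per_year <= 0:
        raise ValueError("both arguments must be positive")
    return effective_size / generations_per_year


# Upper bound on allele divergence time quoted in the reviews (< 0.5 my).
DIVERGENCE_UPPER_BOUND_YEARS = 500_000

# Placeholder assumptions, NOT values reported in the study.
for n_e in (1_000_000, 3_000_000):
    for gens in (4, 8):
        years = erosion_horizon_years(n_e, gens)
        within = years >= DIVERGENCE_UPPER_BOUND_YEARS
        print(f"Ne={n_e:>9,}  gen/yr={gens}  horizon={years:>9,.0f} yr  "
              f"covers 0.5 my: {within}")
```

    Whether a sweep dating to the origin of an allele could still leave a trace therefore depends strongly on the assumed population size and generation time, which is why neither value should be read as an estimate for these species.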

    (2) Selective sweep methods

    A tool called RAiSD was used to detect signatures of selective sweeps, but this manuscript does not describe what signatures this tool considers (reduced diversity, skewed frequency spectrum, increased LD, all of the above?). Given the comment above, would this tool be sensitive to incomplete sweeps that affect only one morph in a species-level dataset? It is also not clear how RAiSD could identify signatures of selective sweeps at individual SNPs (line 206). Sweeps occur over tracts of the genome and it is often difficult to associate a sweep with a single gene.

    RAiSD (https://www.nature.com/articles/s42003-018-0085-8) detects selective sweeps using the μ statistic, a composite score that combines the SFS, LD, and genetic diversity along a chromosome. The tool is quite sensitive and is able to detect soft sweeps. RAiSD takes a VCF file of SNP data as input and uses an SNP-driven sliding-window approach to scan the genome for signatures of sweeps. Using an SNP file instead of runs of sequences prevents repeated calculations in regions that are sparse in variants, thereby optimizing execution time. Due to the nature of the input we used, the μ statistic was also calculated per site. We then annotated the SNPs based on the genes in which they occur and found that all species showing mimicry had at least one site with a signature of a sweep within the dsx locus.
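
    To illustrate what an SNP-driven sliding window means in practice, the sketch below builds windows that each contain a fixed number of SNPs, so that no window is ever evaluated over a region without variants. This is a toy illustration of the windowing idea only; it does not compute RAiSD's μ statistic and does not use RAiSD's code or interface. The function name, the window size, and the SNP positions are all hypothetical.

```python
from typing import List, Tuple


def snp_driven_windows(positions: List[int],
                       snps_per_window: int = 4) -> List[Tuple[int, int, int]]:
    """Slide a window over SNP indices rather than base pairs.

    Returns one (start_bp, end_bp, span_bp) tuple per window, where each
    window contains exactly `snps_per_window` consecutive SNPs.
    """
    if snps_per_window < 2:
        raise ValueError("a window needs at least two SNPs")
    ordered = sorted(positions)
    windows = []
    for i in range(len(ordered) - snps_per_window + 1):
        start = ordered[i]
        end = ordered[i + snps_per_window - 1]
        windows.append((start, end, end - start))
    return windows


# Hypothetical SNP positions with a variant-poor stretch in the middle.
toy_positions = [100, 150, 220, 300, 5000, 9000, 9100, 9150, 9200]
windows = snp_driven_windows(toy_positions, snps_per_window=4)

# A window that spans many base pairs for the same number of SNPs marks
# a region of locally reduced variation.
widest = max(windows, key=lambda w: w[2])
print(len(windows), "windows; widest:", widest)
```

    Because the number of windows scales with the number of SNPs rather than with sequence length, variant-poor regions add no extra computation, which matches the efficiency argument in the response above.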

    (3) Episodic diversification

    Very little information is provided about the Branch-site Unrestricted Statistical Test for Episodic Diversification (BUSTED) and Mixed Effects Model of Evolution (MEME), and what hypothesis the authors were testing by applying these methods. Although it is not mentioned in the manuscript, a quick search reveals that these are methods to study codon evolution along branches of a phylogeny. Without this information, it is difficult to understand the motivation for this analysis.

    Thank you for bringing this to our notice. We will add a few lines in the Methods about the hypothesis we were testing and the motivation behind this analysis. We will additionally cite a previous study from our group that used these and other methods to study the molecular evolution of dsx across insect lineages.

    (4) GWAS for form romulus

    The authors argue that the lack of SNP associations within dsx for form romulus is caused by poor read mapping in the inverted region itself (line 125). If this is true, we would expect strong association in the regions immediately outside the inversion. From Figure S3, there are four discrete peaks of association, and the location of dsx and the inversion are not indicated, so it is difficult to understand the authors' interpretation in light of this figure.

    We indeed observe the regions flanking dsx showing the highest association in our GWAS. This is a bit tricky to demonstrate in the figure as the genome is not assembled at the chromosome level. However, the association peaks occur on scf 908437033 at positions 2192979, 1181012 and 1352228 (Fig. S3c, Table S3) while dsx is located between 1938098 and 2045969. We will add the position of dsx in the figure legend of the revised manuscript.
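
    The coordinates given in this response can be checked against the dsx interval directly. The short sketch below takes the three peak positions and the dsx boundaries quoted above and reports, for each peak, on which side of dsx it lies along the scaffold and how far it is from the nearest dsx boundary. The function and variable names are hypothetical, and "lower" and "higher" refer only to scaffold coordinates, not to gene orientation.

```python
from typing import Tuple

# Coordinates on the scaffold as quoted in the author response.
DSX_START, DSX_END = 1_938_098, 2_045_969
PEAK_POSITIONS = [2_192_979, 1_181_012, 1_352_228]


def relation_to_interval(pos: int, start: int, end: int) -> Tuple[str, int]:
    """Return the side of the interval on which `pos` lies and its
    distance in bp to the nearest interval boundary (0 if inside)."""
    if pos < start:
        return ("lower-coordinate flank", start - pos)
    if pos > end:
        return ("higher-coordinate flank", pos - end)
    return ("inside", 0)


summary = {p: relation_to_interval(p, DSX_START, DSX_END) for p in PEAK_POSITIONS}
for pos, (side, distance) in summary.items():
    print(f"peak at {pos:>9,}: {side}, {distance:,} bp from dsx")
```

    With the quoted coordinates, none of the three peaks falls inside dsx: one lies about 147 kb beyond the higher-coordinate boundary and two lie roughly 586 kb and 757 kb below the lower-coordinate boundary.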

    (5) Form theseus

    Since there appears to be only one sequence available for form theseus (actually it is said to be "P. javanus f. polytes/theseus"), is it reasonable to conclude that "the dsx coding sequence of f. theseus was identical to that of f. polytes in both P. javanus and P. alphenor" (Line 151)? Looking at the Clarke and Sheppard (1972) paper cited in the statement that "f. polytes and f. theseus show equal dominance" (line 153), it seems to me that their definition of theseus is quite different from that here. Without addressing this discrepancy, the results are difficult to interpret.

    Among the P. javanus individuals that we sampled, we obtained just one individual with f. theseus and the H P allele. However, the data we added from a previously published study (Zhang et al. 2017) contributed nine more individuals of this form (Fig. S4b and S7). Although we did not show these individuals in Fig. 3 (which was based on PCR amplification and sequencing of individual exons of dsx), all the analyses with sequence data were performed on 10 theseus individuals in total. When comparing theseus and polytes dsx alleles, Zhang et al. observed what we now know are species-specific differences and not allele-specific differences. Our observations were consistent with these findings.

  2. eLife Assessment

    This important study provides new and nuanced insights into the evolution of morphs in a textbook example of Batesian mimicry. The evidence supporting the claims about the origin and dominance relationships among morphs is convincing, but the interpretation of signals needs improvement with complementary analysis and some nuanced interpretation. Pending a revision, this work will be of interest to a broad range of evolutionary biologists.

  3. Reviewer #1 (Public review):

    In this study, Deshmukh et al. provide an elegant illustration of Haldane's sieve, the population genetics concept stating that novel advantageous alleles are more likely to fix if dominant because dominant alleles are more readily exposed to selection. To achieve this, the authors rely on a uniquely suited study system, the female-polymorphic butterfly Papilio polytes.

    Deshmukh et al. first reconstruct the chronology of allele evolution in the P. polytes species group, clearly establishing the non-mimetic cyrus allele as ancestral, followed by the origin of the mimetic allele polytes/theseus, via a previously characterized inversion of the dsx locus, and most recently, the origin of the romulus allele in the P. polytes lineage, after its split from P. javanus. The authors then examine the two crucial predictions of Haldane's sieve, using the three alleles of P. polytes (cyrus, polytes, and romulus). First, they report with compelling evidence that these alleles are sequentially dominant, or put in other words, novel adaptive alleles either are or quickly become dominant upon their origin. Second, the authors find a robust signature of positive selection at the dsx locus, across all five species that share the polytes allele.

    In addition to exquisitely exemplifying Haldane's sieve, this study characterizes the genetic differences (or lack thereof) between mimetic alleles at the dsx locus. Remarkably, the polytes and romulus alleles are profoundly differentiated, despite their short divergence time (< 0.5 my), whereas the polytes and theseus alleles are indistinguishable across both coding and intronic sequences of dsx. Finally, the study reports incidental evidence of exon swaps between the polytes and romulus alleles. These exon swaps caused intermediate colour patterns and suggest that (rare) recombination might be a mechanism by which novel morphs evolve.

    This study advances our understanding of the evolution of the mimicry polymorphism in Papilio butterflies. This is an important contribution to a system already at the forefront of research on the genetic and developmental basis of sex-specific phenotypic morphs, which are common in insects. More generally, the findings of this study have important implications for how we think about the molecular dynamics of adaptation. In particular, I found the extensive genetic divergence between the polytes and romulus alleles striking, and it challenges the way I used to think about the evolution of this and other otherwise conserved developmental genes. I think that this study is also a great resource for teaching evolution. By linking classic population genetic theory to modern genomic methods, while using visually appealing traits (colour patterns), this study provides a simple yet compelling example to bring to a classroom.

    In general, I think that the conclusions of the study, in terms of the evolutionary history of the locus, the dominance relationships between P. polytes alleles, and the inference of a selective sweep in spite of contemporary balancing selection, are strongly supported; the data set is impressive and the analyses are all rigorous. I nonetheless think that there are a few ways in which the current presentation of these data could lead to confusion, and should be clarified and potentially also expanded.

    (1) The study is presented as addressing a paradox related to the evolution of phenotypic novelty in "highly constrained genetic architectures". If I understand correctly, these constraints are assumed to arise because the dsx inversion acts as a barrier to recombination. I agree that recombination in the mimicry locus is reduced and that recombination can be a source of phenotypic novelty. However, I'm not convinced that the presence of a structural variant necessarily constrains the potential evolution of novel discrete phenotypes. Instead, I'm having a hard time coming up with examples of discrete phenotypic polymorphisms that do not involve structural variants. If there is a paradox here, I think it should be more clearly justified, including an explanation of what a constrained genetic architecture means. I also think that the Discussion would be the place to return to this supposed paradox, and tell us exactly how the observations of exon swaps and the genetic characterization of the different mimicry alleles help resolve it.

    (2) While Haldane's sieve is clearly demonstrated in the P. polytes lineage (with cyrus, polytes, and romulus alleles), there is another allele trio (cyrus, polytes, and theseus) for which Haldane's sieve could also be expected. However, the chronological order in which polytes and theseus evolved remains unresolved, precluding a similar investigation of sequential dominance. Likewise, the locus that differentiates polytes from theseus is unknown, so it's not currently feasible to identify a signature of positive selection shared by P. javanus and P. alphenor at this locus. I, therefore, think that it is premature to conclude that the evolution of these mimicry polymorphisms generally follows Haldane's sieve; of two allele trios, only one currently shows the expected pattern.

  4. Reviewer #2 (Public review):

    Summary:

    Deshmukh and colleagues studied the evolution of mimetic morphs in the Papilio polytes species group. They investigate the timing of origin of haplotypes associated with different morphs, their dominance relationships, associations with different isoform expressions, and evidence for selection and recombination in the sequence data. P. polytes is a textbook example of a Batesian mimic, and this study provides important nuanced insights into its evolution, and will therefore be relevant to many evolutionary biologists. I find the results regarding dominance and the sequence of events generally convincing, but I have some concerns about the motivation and interpretation of some other analyses, particularly the tests for selection.

    Strengths:

    This study uses widespread sampling, large sample sizes from crossing experiments, and a wide range of data sources.

    Weaknesses:

    (1) Purpose and premise of selective sweep analysis

    A major narrative of the paper is that new mimetic alleles have arisen and spread to high frequency, and their dominance over the pre-existing alleles is consistent with Haldane's sieve. It would therefore make sense to test for selective sweep signatures within each morph (and its corresponding dsx haplotype), rather than at the species level. This would allow a test of the prediction that those morphs that arose most recently would have the strongest sweep signatures.

    Sweep signatures erode over time - see Figure 2 of Moest et al. 2020 (https://doi.org/10.1371/journal.pbio.3000597), and it is unclear whether we expect the signatures of the original sweeps of these haplotypes to still be detectable at all. Moest et al. show that sweep signatures are completely eroded by 1N generations after the event, and probably not detectable much sooner than that, so assuming effective population sizes of these species of a few million, at what time scale can we expect to detect sweeps? If these putative sweeps are in fact more recent than the origin of the different morphs, perhaps they would more likely be associated with the refinement of mimicry, but not necessarily providing evidence for or against a Haldane's sieve process in the origin of the morphs.

    (2) Selective sweep methods

    A tool called RAiSD was used to detect signatures of selective sweeps, but this manuscript does not describe what signatures this tool considers (reduced diversity, skewed frequency spectrum, increased LD, all of the above?). Given the comment above, would this tool be sensitive to incomplete sweeps that affect only one morph in a species-level dataset? It is also not clear how RAiSD could identify signatures of selective sweeps at individual SNPs (line 206). Sweeps occur over tracts of the genome and it is often difficult to associate a sweep with a single gene.

    (3) Episodic diversification

    Very little information is provided about the Branch-site Unrestricted Statistical Test for Episodic Diversification (BUSTED) and Mixed Effects Model of Evolution (MEME), and what hypothesis the authors were testing by applying these methods. Although it is not mentioned in the manuscript, a quick search reveals that these are methods to study codon evolution along branches of a phylogeny. Without this information, it is difficult to understand the motivation for this analysis.

    (4) GWAS for form romulus

    The authors argue that the lack of SNP associations within dsx for form romulus is caused by poor read mapping in the inverted region itself (line 125). If this is true, we would expect strong association in the regions immediately outside the inversion. From Figure S3, there are four discrete peaks of association, and the location of dsx and the inversion are not indicated, so it is difficult to understand the authors' interpretation in light of this figure.

    (5) Form theseus

    Since there appears to be only one sequence available for form theseus (actually it is said to be "P. javanus f. polytes/theseus"), is it reasonable to conclude that "the dsx coding sequence of f. theseus was identical to that of f. polytes in both P. javanus and P. alphenor" (Line 151)? Looking at the Clarke and Sheppard (1972) paper cited in the statement that "f. polytes and f. theseus show equal dominance" (line 153), it seems to me that their definition of theseus is quite different from that here. Without addressing this discrepancy, the results are difficult to interpret.