Selection on many loci drove the origin and spread of a key innovation

Abstract

Key innovations are fundamental to biological diversification, but their genetic architecture is poorly understood. A recent transition from egg-laying to live-bearing in Littorina snails provides the opportunity to study the architecture of an innovation that has evolved repeatedly in animals. Samples do not cluster by reproductive mode in a genome-wide phylogeny, but local genealogical analysis revealed numerous genomic regions where all live-bearers carry the same core haplotype. Associated regions show evidence for live-bearer-specific positive selection, and are enriched for genes that are differentially expressed between egg-laying and live-bearing reproductive systems. Ages of selective sweeps suggest live-bearing alleles accumulated gradually, involving selection at different times in the past. Our results suggest that innovation can have a polygenic basis, and that novel functions can evolve gradually, rather than in a single step.

Article activity feed

  1. Instead, the bulk of the distribution fell close to the center of the triangle, revealing extensive ILS due to rapid diversification relative to the effective population size (10, 11). Thus, although well-supported statistically, the genome-wide tree is a very poor predictor of evolutionary relationships at any given genomic region.

    This is not terribly surprising to me, honestly, particularly given that it's known a priori that the two lineages are closely related and that hybridization is not uncommon. It may also be worth mentioning/emphasizing here that the phylogeny was inferred using concatenated SNPs. Whether using full sequences or SNP datasets, concatenation can often lead to inflated topological support.

    Have you considered using an approach like SVDquartets to infer a phylogeny complementary to the one inferred with RAxML? The approach is consistent with the multispecies coalescent and treats sites as unlinked, scoring each quartet of taxa from its site-pattern frequencies before amalgamating the quartets into a full tree. Individuals could be pooled by sampling locality/species, so you could still infer a population-level tree, but under an independent phylogenetic model.

  2. Simulated distributions of weights. A greater opportunity for lineage sorting (i - iii) biases the distribution toward the topology that matches the demographic history. Incomplete lineage sorting yields genealogies that are a better fit to one of the discordant trees, but the distribution is always symmetrical between the left and right half triangles. Additional factors, including gene flow, create a bias toward one of the discordant genealogies (panels iv - vi).

    I'm super intrigued by the use/utility of simulation here. I know that these simulations were conducted using msprime, but I can't help but wonder about including varying intensities of positive selection on loci in these simulations. This would of course expand the parameter space of potential simulations considerably, but it seems explicitly tied to the hypotheses being tested here.

    More generally, it seems to me that this framework could lend itself nicely to machine-learning or deep-learning approaches for assigning genomic windows to alternative evolutionary histories (e.g., migration, selection), and potentially even for parameter inference, such as with a convolutional neural network (CNN). Obviously this is outside the scope of the current paper, but I'm curious whether this is a space you all have explored.
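
To make the symmetry expectation in the quoted caption concrete, here is a minimal sketch of a toy three-taxon coalescent in plain Python. It is not the authors' msprime setup and it does not compute Twisst weights; it only counts gene-tree topologies for an assumed species tree ((A,B),C) with an internal branch measured in coalescent units. The function names and parameters are hypothetical, chosen for illustration.

```python
import math
import random


def simulate_topology_counts(internal_branch, n_loci=10000, seed=1):
    """Count gene-tree topologies under a toy three-taxon coalescent.

    Assumed species tree: ((A,B),C). `internal_branch` is the length of the
    branch ancestral to A and B, in coalescent units. No gene flow and no
    selection are modelled, so any discordance here is ILS only.
    """
    rng = random.Random(seed)
    counts = {"AB": 0, "AC": 0, "BC": 0}
    for _ in range(n_loci):
        # Waiting time for the A and B lineages to coalesce is Exp(1).
        if rng.expovariate(1.0) < internal_branch:
            # Coalescence inside the internal branch: concordant gene tree.
            counts["AB"] += 1
        else:
            # All three lineages reach the root population; the first pair
            # to coalesce is chosen uniformly at random.
            counts[rng.choice(("AB", "AC", "BC"))] += 1
    return counts


def expected_concordance(internal_branch):
    """Standard three-taxon result: P(concordant) = 1 - (2/3) * exp(-T)."""
    return 1.0 - (2.0 / 3.0) * math.exp(-internal_branch)


if __name__ == "__main__":
    for branch in (0.1, 1.0, 3.0):
        c = simulate_topology_counts(branch)
        n = sum(c.values())
        print(branch, {k: round(v / n, 3) for k, v in c.items()},
              round(expected_concordance(branch), 3))
```

Under these assumptions, a longer internal branch shifts loci toward the concordant topology, while the two discordant topologies stay roughly equal in frequency. Adding gene flow or selection to this toy model would be one way to break that symmetry, but that is beyond what the sketch does.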

  3. Topology weighting reveals genomic regions associated with reproductive mode.

    First off, I can't not compliment the "Twisst & Tern" pun... Excellent stuff.

    But more specifically, I just want to say I think the use of the ternary plot in combination with both empirical and simulated data to explore and quantify genealogical asymmetries is so clever. I doubt I'll be alone in saying that I think this type of approach has immense potential, not only for hypothesis testing, but even for parameter testing.
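
As a rough illustration of how a left/right asymmetry on a ternary plot might be quantified, the sketch below maps a triple of topology weights to 2D coordinates and compares the number of windows falling in each half triangle. The corner layout, the function names, and the simple count-based index are all assumptions made for this example; they are not taken from the paper or from the Twisst software.

```python
import math


def ternary_xy(w_concordant, w_disc1, w_disc2):
    """Map three non-negative topology weights to 2D ternary coordinates.

    Assumed layout: concordant topology at the top corner (0, sqrt(3)/2),
    discordant topology 1 at (-0.5, 0), discordant topology 2 at (0.5, 0).
    Weights are normalised to sum to 1 before mapping.
    """
    total = w_concordant + w_disc1 + w_disc2
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    a = w_concordant / total
    b = w_disc1 / total
    c = w_disc2 / total
    x = 0.5 * (c - b)              # negative: left half, positive: right half
    y = (math.sqrt(3) / 2.0) * a   # height grows with the concordant weight
    return x, y


def half_triangle_asymmetry(weights):
    """Return (right - left) / (right + left) over windows off the midline.

    `weights` is an iterable of (w_concordant, w_disc1, w_disc2) triples,
    one per genomic window. A value near 0 is what ILS alone would predict;
    a value far from 0 indicates a bias toward one discordant topology.
    """
    left = right = 0
    for w in weights:
        x, _ = ternary_xy(*w)
        if x < 0:
            left += 1
        elif x > 0:
            right += 1
    if left + right == 0:
        return 0.0
    return (right - left) / (left + right)
```

A real analysis would compare the observed index against its distribution in neutral simulations rather than against zero, since sampling noise alone produces some imbalance.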