Dating the origin and spread of specialization on human hosts in Aedes aegypti mosquitoes

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This fundamental study by Rose and colleagues addresses key challenges in demographic inference in non-model systems with an innovative approach to model parameter calibration based on known historical events. Using this approach, they convincingly show that human specialization in Ae. aegypti mosquitoes likely evolved due to a past climate event around 5,000 years ago, and that recent rapid urbanization has continued to fuel its spread in West Africa in the past 20-40 years. This work will be of broad interest to population geneticists working on demographic inference, and to mosquito biologists working on the monitoring and control of this important vector species.

Abstract

The globally invasive mosquito subspecies Aedes aegypti aegypti is an effective vector of human arboviruses, in part because it specializes in biting humans and breeding in human habitats. Recent work suggests that specialization first arose as an adaptation to long, hot dry seasons in the West African Sahel, where Ae. aegypti relies on human-stored water for breeding. Here, we use whole-genome cross-coalescent analysis to date the emergence of human-specialist populations and thus further probe the climate hypothesis. Importantly, we take advantage of the known migration of specialists out of Africa during the Atlantic Slave Trade to calibrate the coalescent clock and thus obtain a more precise estimate of the older evolutionary event than would otherwise be possible. We find that human-specialist mosquitoes diverged rapidly from ecological generalists approximately 5000 years ago, at the end of the African Humid Period, a time when the Sahara dried and water stored by humans became a uniquely stable, aquatic niche in the Sahel. We also use population genomic analyses to date a previously observed influx of human-specialist alleles into major West African cities. The characteristic length of tracts of human-specialist ancestry present on a generalist genetic background in Kumasi and Ouagadougou suggests the change in behavior occurred during rapid urbanization over the last 20–40 years. Taken together, we show that the timing and ecological context of two previously observed shifts towards human biting in Ae. aegypti differ; climate was likely the original driver, but urbanization has become increasingly important in recent decades.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    Demographic inference is a notoriously difficult problem in population genetics, especially for non-model systems in which key population genetic parameters are often unknown and where the reality is always a lot more complex than the model. In this study, Rose et al. provided an elegant solution to these challenges in their analysis of the evolutionary history of human specialization in Ae. aegypti mosquitoes. They first applied state-of-the-art statistical phasing methods to obtain haplotype information in previously published mosquito sequences. Using these phased data, they conducted cross-coalescent and isolation-with-migration analyses, and they innovatively took advantage of a known historical event, i.e., the spread of Ae. aegypti to South America, to infer the key model parameters of generation time and mutation rate. With these parameters, they were able to confirm a previous hypothesis, which suggests that human specialists evolved at the end of the African Humid Period around 5,000 years ago when Ae. aegypti mosquitoes in the Sahel region had to adapt to human-derived water storage as their breeding sites during intense dry seasons. The authors further carried out an ancestry tract length analysis, showing that human specialists have recently introgressed into Ae. aegypti populations in West African cities in the past 20-40 years, likely driven by rapid urbanization in these cities.

    Given all the complexities and uncertainties in the system, the authors have done an outstanding job coming up with well-informed research questions and hypotheses, carrying out analyses that are most appropriate to their questions, and presenting their findings in a clear and compelling fashion. Their results reveal the deep connections between mosquito evolution and past climate change as well as human history and demonstrate that future mosquito control strategies should take these important interactions into account, especially in the face of ongoing climate change and urbanization. Methodologically, the analytical approach presented in this paper will be of broad interest to population geneticists working on demographic inference in a diversity of non-model organisms.

    In my opinion, the only major aspect that this paper can still benefit from is more explicit and in-depth communication and discussion about the assumptions made in the analyses and the uncertainties of the results. There is currently one short paragraph on this in the discussion section, but I think several other assumptions and sources of uncertainties could be included, and a few of them may benefit from some quantitative sensitivity analyses. To be clear, I don't think that most of these will have a huge impact on the main results, but some explicit clarification from the authors would be useful.

    Below are some examples:

    Thank you very much for your kind words and your feedback! We have expanded our discussion of assumptions and uncertainties and respond to each point below:

    1. Phasing accuracy: statistical phasing is a relatively new tool for non-model species, and it is unclear from the manuscript how accurate it is given the sample size, sequencing depth, population structure, genetic diversity, and levels of linkage disequilibrium in the study system. If authors would like to inspire broader adoption of this workflow, it would be very helpful if they could also briefly discuss the key characteristics of a study system that could make phasing successful/difficult, and how sensitive cross-coalescent analyses are to phasing accuracy.

    We agree that this is an important topic to expand on. We have clarified as follows:

    Results, Page 4, last paragraph: “Over 95% of prephase calls had maximal HAPCUT2 phred-scaled quality scores of 100 and prephase blocks (i.e. local haplotypes) were 728bp long on average (interquartile range 199-1009bp). We then used SHAPEIT4.2 to assemble the prephase blocks into chromosome-level haplotypes, using statistical linkage patterns present across our panel of 389 individuals (25).”

    Discussion, Page 8, last paragraph: “Overall linkage disequilibrium is relatively low in Ae. aegypti, dropping off quickly over a few kilobases and reaching half its maximum value within about 50kb (37); this is likely sufficient for assembling shorter, high-confidence prephase blocks into longer haplotypes in many cases. However, phase-switch errors may be common across longer distances – potentially affecting inferences in the most recent time windows. Nevertheless, the similar results we obtain using different proxy populations (and thus different input haplotype structures) for human-specialist and generalist lineages (see Figure S1) suggest that our results are robust to potential mistakes in long-range haplotype phasing.”

    Discussion, Page 9, paragraph 2: “Here, we take advantage of a continent-wide set of genomes, combined with read-based prephasing and population-wide statistical phasing to develop a phasing panel that should enable future studies in Ae. aegypti with a lower barrier to entry. The same approach may work for other study organisms with similar population genomic properties; high levels of diversity are helpful for prephasing and at least moderate levels of linkage disequilibrium are important for the assembly of prephase blocks.”

    2. Estimation of mutation rate and generation time: the estimation of these important parameters is made based on the assumption that they should maximize the overlap between the distribution of estimated migration rate and the number of enslaved people crossing the Atlantic, but how reasonable is this assumption, and how much would the violation of this assumption affect the main result? Particularly, in the MSMC-IM paper (Wang et al. 2020, Fig 2A), even with a simulated clean split scenario, the estimated migration rate would have a wide distribution with a lot of uncertainty on both sides, so I believe that the exact meaning and limitations of such estimated migration rate over time should be clarified. This discussion would also be very helpful to readers who are thinking about using similar methods in their studies. Furthermore, the authors have taken 15 generations per year as their chosen generation time and based their mutation rate estimates on this assumption, but how much will the violation of this assumption affect the result?

    This is a great point. We have expanded our discussion of how this assumption affects our conclusions (see Discussion page 9, first paragraph): “Furthermore, we chose a scaling factor that maximized overlap between the peak of estimated Ae. aegypti migration and the peak of the Atlantic Slave Trade (Fig. 2B). If we instead consider alternative scenarios where peak migration occurred at the very beginning of the slave trade era, around 1500, then our inferred mutation rate would be lower (about 2.4e-9, assuming 15 generations per year), pushing back the split of human-specialist lineages to about 10,000 years before present. This scenario seems less plausible, in part because our isolation-with-migration analyses suggest a gradual onset of migration between continents rather than a single, early-pulse model. It would also make it harder to explain the timing of the bottleneck we see in invasive populations; the first signs of this bottleneck occur at the beginning of the slave trade (~500 years ago) with our current calibration (Fig. S1A), but would be pushed to a pre-trade date in this alternative scenario. We can also consider a scenario in which peak Ae. aegypti migration occurred more recently, perhaps around 1850, corresponding to increased global shipping traffic outside the slave trade alone. In this case, our inferred mutation rate would be higher (or generation time lower), and the split of human-specialist lineages would be placed at about 3,000 years ago. Overall, the best match between the existing literature and our data corresponds to our main estimates, but alternative scenarios could gain support if future research finds evidence for a different time course of invasion than is suggested by the epidemiological literature.”
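The alternative scenarios above follow from a simple proportionality: for a fixed coalescent-scaled time and a fixed number of generations per year, the date in years is inversely proportional to the assumed mutation rate. The sketch below illustrates this using only the values quoted in the response (4.85e-9 and about 5,000 years for the main calibration, about 2.4e-9 for the early-peak scenario). The function name `rescale_date` is hypothetical and is not taken from the authors' code.

```python
def rescale_date(date_years, mu_used, mu_alternative):
    """Rescale a calibrated date to a different per-generation mutation rate.

    Assumes the number of generations per year is held fixed, so the date in
    years is inversely proportional to the mutation rate.
    """
    if mu_used <= 0 or mu_alternative <= 0:
        raise ValueError("mutation rates must be positive")
    return date_years * mu_used / mu_alternative


# Main calibration: split about 5,000 years ago at mu = 4.85e-9.
# Early-peak scenario: mu of about 2.4e-9 (both values quoted in the response).
early_peak_split = rescale_date(5000, 4.85e-9, 2.4e-9)
print(round(early_peak_split))  # on the order of 10,000 years
```

Halving the mutation rate roughly doubles the inferred age of the split, which is why the early-peak scenario lands near 10,000 years before present.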

    We have slightly expanded our description of calibration in Results, page 5, last paragraph: “The fact that we see good overlap between the two distributions (yellow–white color) across a wide range of reasonable mutation rates and generation times for Ae. aegypti is consistent with our understanding of the species’ recent history and supports our approach. For example, if we take the common literature value of 15 generations per year (0.067 years per generation) (17, 20), the de novo mutation rate that maximizes correspondence between the two datasets is 4.85 × 10⁻⁹ (black dot in Figure 2A, used in Figure 2B), which is on the order of values documented in other insects. We chose to carry forward this calibrated scaling factor (corresponding to any combination of mutation rate and generation time found along the line in Figure 2A) into subsequent analyses.”
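What the calibration fixes is effectively the product of the per-generation mutation rate and the number of generations per year, which is why any combination along the line in Figure 2A yields the same dates. The sketch below assumes that coalescent times are reported in mutation-scaled units (per-generation mutation rate times generations), as is conventional for MSMC-style output. The function names, the 10-generations-per-year pairing, and the example scaled time are illustrative assumptions, not values from the paper.

```python
def scaled_time_to_years(scaled_time, mu_per_generation, generations_per_year):
    """Convert a mutation-scaled time (mu * generations) into calendar years."""
    generations = scaled_time / mu_per_generation
    return generations / generations_per_year


def yearly_scaling_factor(mu_per_generation, generations_per_year):
    """Per-site mutation rate per year: the quantity fixed by the calibration."""
    return mu_per_generation * generations_per_year


# Literature value of 15 generations per year with the calibrated rate.
factor = yearly_scaling_factor(4.85e-9, 15)

# A different pairing with the same product (hypothetical: 10 generations per year).
mu_alt = factor / 10

# An arbitrary scaled time, chosen only for illustration.
t_scaled = 1e-4
years_main = scaled_time_to_years(t_scaled, 4.85e-9, 15)
years_alt = scaled_time_to_years(t_scaled, mu_alt, 10)
print(factor, years_main, years_alt)
```

Both parameter pairs give the same date in years, so uncertainty in generation time alone does not shift the calibrated timeline as long as the product is held at its calibrated value.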

    We have also expanded on the uncertainty of our analyses (see Discussion page 8, last paragraph): “First, the temporal resolution of our inferences is relatively low, and both previously published simulations (39) and our own bootstrap replicates (Figure 2B–D, grey lines) suggest relatively wide bounds for the precise timing of events.”

    3. The effect of selection: all analyses in this paper assume that no selection is at play, and the authors have excluded loci previously found to be under selection from these analyses, but how effective is this? In the ancestry tract length analysis, in particular, the authors have found that the human-specialist ancestry tends to concentrate in key genomic regions and suggested that selection could explain this, but doesn't this mean that excluding known loci under selection was insufficient? If the selection has indeed played an important role at a genome-wide level, how would it affect the main results (qualitatively)?

    We have clarified that we excluded those loci from our timing estimates for both MSMC and ancestry tract analyses, but then re-ran the ancestry tract analysis with all regions included to visualize and assess how tracts were distributed along chromosomes. See Methods, page 12, paragraph 2: “Since selection associated with adaptation to urban habitats could shape lengths of admixture tracts, we masked regions previously identified as under selection between human-specialists and generalists when estimating admixture timing—namely, the outlier regions in (2). However, we used an unmasked analysis to determine and visualize the genome-wide distribution of ancestries (Fig. 3).”

    We have also added additional discussion of the expected effects of selection on our analyses (see Discussion, page 9, last paragraph): “Positive selection during adaptive introgression can increase tract lengths and make admixture appear to be more recent than it actually is. For this reason, we masked regions of the genome thought to underlie adaptation to human habitats before running our analysis. Nevertheless, if selection has acted outside these regions, admixture may be somewhat older than we estimate.”
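The direction of this bias can be illustrated with a textbook single-pulse approximation, in which introgressed tracts are exponentially distributed with mean 1 / ((1 - m) * T) Morgans, where m is the introgressed ancestry fraction and T is the number of generations since admixture. This is a minimal sketch under that simplifying assumption and is not the authors' inference method. The tract lengths and ancestry fraction below are hypothetical inputs chosen for illustration; only the figure of 15 generations per year comes from the response.

```python
GENERATIONS_PER_YEAR = 15  # literature value quoted in the response


def generations_from_mean_tract(mean_tract_morgans, introgressed_fraction):
    """Approximate generations since a single admixture pulse.

    Uses the approximation that introgressed tracts have mean length
    1 / ((1 - m) * T) Morgans, solved here for T.
    """
    if mean_tract_morgans <= 0 or not 0 <= introgressed_fraction < 1:
        raise ValueError("need a positive tract length and 0 <= m < 1")
    return 1.0 / ((1.0 - introgressed_fraction) * mean_tract_morgans)


def years_from_mean_tract(mean_tract_morgans, introgressed_fraction):
    """Convert the generation estimate to years at the assumed generation rate."""
    generations = generations_from_mean_tract(mean_tract_morgans, introgressed_fraction)
    return generations / GENERATIONS_PER_YEAR


# Hypothetical inputs: 10% introgressed ancestry, mean tract of 0.25 cM.
neutral_years = years_from_mean_tract(0.0025, 0.10)

# If selection lengthens tracts (say to 0.40 cM), the naive estimate gets younger.
selected_years = years_from_mean_tract(0.0040, 0.10)
print(neutral_years, selected_years)
```

Longer tracts map to fewer generations, so tracts lengthened by positive selection would make admixture look more recent than it was, consistent with the caveat in the quoted passage.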
