GADMA2: more efficient and flexible demographic inference from genetic data


Abstract

Background

Inference of complex demographic histories provides information about events that happened in the past of the studied populations. Existing methods for demographic inference typically require input from the researcher in the form of a parameterized model. With the growing variety of methods and tools, each with its own interface, model specification becomes tedious and error-prone. Moreover, the optimization algorithms used to find model parameters are sometimes inefficient, for instance because they are not properly tuned or depend heavily on a user-provided initialization. The open-source software GADMA addresses these problems by providing automatic demographic inference. It offers a common interface for several likelihood engines and provides global parameter optimization based on a genetic algorithm.

Results

Here, we introduce the new GADMA2 software and describe its added and expanded features in detail. GADMA2 has a renovated core code base, new likelihood engines, an updated optimization algorithm, and a flexible setup for automatic model construction. We provide a full overview of the GADMA2 enhancements, compare the performance of the supported likelihood engines on simulated data, and demonstrate GADMA2 usage on two empirical datasets.

Conclusions

We demonstrate the improved performance of the genetic algorithm in GADMA2 by comparing it to the initial version of GADMA and to other existing optimization approaches. Our experiments on simulated data indicate that GADMA2’s likelihood engines provide accurate estimates of demographic parameters even under misspecified models. We also report improved model parameters for two empirical datasets of inbred species.

Article activity feed

  1. Background

    Ilan Gronau: This manuscript describes updates made to GADMA, which was published two years ago. GADMA uses likelihood-based demography inference methods as likelihood-computation engines and replaces their generic optimization technique with a more sophisticated technique based on a genetic algorithm. The version of GADMA described in this manuscript has several important added features. It supports two additional inference engines, more flexible models, and additional input and output formats, and it provides better values for the hyper-parameters used by the genetic algorithm. This is indeed a substantial improvement over the original version of GADMA. The manuscript clearly describes the different features added to GADMA and then demonstrates them with a series of analyses. These analyses establish three main things: (1) they show that the new hyper-parameters improve performance; (2) they show how GADMA can be used to compare the performance of different approaches to calculating data likelihood for demography inference; (3) they showcase new features of GADMA (supporting model structure and inbreeding inference). Overall, the presentation is very clear and the results are interesting and compelling. Thus, despite being a publication about a method update, it shows substantial improvement, provides interesting new insights, and will likely lead to an expansion of the user base for GADMA.

    The only major comment I have is about the part of the study that optimizes the hyper-parameters. The hyper-parameter optimization is a very important improvement in GADMA2. The setup for this analysis is very good, with three inference engines, four data sets used for training, and six diverse data sets used for testing. However, because of complications with SMAC for discrete hyper-parameters, the analysis ends up considering six separate attempts. The comparison between the hyper-parameters produced by these six attempts is mostly done manually across data sets and inference engines. This somewhat defeats the purpose of the well-designed setup. Eventually, it is very difficult for the reader to assess the expected improvement of the final suggested hyper-parameter values (attempt 2) over the default ones. I have two comments/suggestions about this part.

    First, I am wondering whether there is a formal way to compare the eventual parameters of the six attempts across the four training sets. I can see why you would need to run SMAC six separate times to deal with the discrete parameters. However, why do you not use the SMAC score to compare the final settings produced by these six runs?

    Second, as a reader, I would like to see a single table/figure summarizing the improvement obtained with whatever hyper-parameters you end up suggesting compared to the default settings used in GADMA1. This should cover all the inference engines and all the data sets in one coherent table/figure. Using such a table/figure, you could report improvement statistics, such as the average increase in log-likelihood or the average decrease in convergence time. These important results get lost in the many improved figures and tables.

    These are my main suggestions for revisions of the current version. I also have some more minor comments that the authors may wish to consider in their revised version, which I list below.

    Introduction:

    para 2: The survey of demography inference methods focuses on likelihood-based methods, but there is a substantial family of Bayesian inference methods, such as MPP, IMa, and G-PhoCS. Bayesian methods solve the parameter estimation problem by Bayesian sampling. I admit that this is somewhat tangential to what GADMA is doing, but this distinction between likelihood-based methods and Bayesian methods probably deserves a brief mention in the introduction.

    para 2,3: You mention a result from the original GADMA paper showing that GADMA improves on the optimization methods implemented by current demography inference methods. Readers of this paper might benefit from a brief summary of the improvement you were able to achieve using the original version of GADMA. Can you add 2-3 sentences providing the highlights of the improvement you were able to show in the first paper?

    para 3: The statement "GADMA separates two regular components" is not very clear. Can you rephrase to clarify?

    Materials and methods - Hyper-parameter optimization:

    I didn't fully understand what you use for the cost function in SMAC here. It seems to me that there are two criteria: accuracy and speed. You wish the final model to be as accurate as possible (high log-likelihood), but you want to obtain this result with few optimization iterations. Can you briefly describe how these two objectives are addressed in your use of SMAC? It is also not completely clear how results from different engines and different data sets are incorporated into the SMAC cost. Can you provide more details about this in the supplement?

    para 2: "That eliminate three combinations" should be "This eliminates three combinations".

    para 3: "Each attempt is running" should be "Each attempt ran".

    para 3: "We take 200×number of parameters as the stop criteria". Can you clarify? Does this mean that you set the number of GADMA iterations to 200 times the number of demographic model parameters? Why should it be a linear function of the number of parameters? The following text explains the justification, but

    Table 1: I would merge Table S2 with this one (by adding the ranges of all hyper-parameters as a first column). It is important to see the ranges when examining the different selections.

    Materials and methods - Performance test of GADMA2 engines:

    para 2: "ROS-STRUCT-NOMIG" should be "DROS-STRUCT-NOMIG". Also, "This notation could be read" - maybe replace by "This notation means" to signal that you're explaining the structure notation.

    para 4 (describing comparisons for momi on Orangutan data): "ORAN-NOMIG model is compared with three …". You also consider ORAN-STRUCT-NOMIG in the momi analysis, right?

    Results - Performance test of GADMA2 engines:

    Inference for the Drosophila data set under models with migration: You mention that the models with migration obtain lower likelihoods than the models without migration. You cannot directly compare likelihoods in these two models, since the likelihood surface is not identical. So, I'm not sure that the fact that you get higher likelihoods in the models without migration is a clear enough indication of model fit. The fact that the inferred migration rates are low is a good indication of that. It also seems that, despite converging to models with very low migration rates, the other parameters are inferred with higher noise. For example, the size of the European bottleneck is significantly increased in these inferences compared to that of the NOMIG model. So, potentially the problem here is that more time is required for these complex models to converge.

    Inference for the Drosophila data set under structured model (2,1): The values inferred by moments and momentsLD appear to neatly fit the true values. However, it is not straightforward to compare an exponential increase in population size to an instantaneous increase. Maybe this can be done with some time-averaged population size, or the average time until coalescence in the two models? This would allow you to quantify how well the two exponential models fit the true model with instantaneous increase. (A brief sketch of such a comparison appears after this review.)

    Inference for the Orangutan data set under structured model (2,1) without migration: You argue that a constant population size is inferred for Bor by moments and momi because of the restriction on population sizes after the split. You base this claim on a comparison between the log-likelihoods obtained in this model (STRUCT-NOMIG) and the standard model (NOMIG) in which you add this restriction. I didn't fully understand how you can conclude from this comparison that the constant size inferred for Bor is due to the restriction on the initial population size after the split. I think what you need to do to establish this is run the STRUCT model without this restriction and see that you get an exponential decrease. Can you elaborate more on your rationale? A detailed explanation should appear in the supplement and a brief summary in the main text.

    Inference for the Orangutan data set with models with pulse migration: This is a nice result showing that the more pulses you include, the better the estimates become. However, your main example in the main text uses the inferred migration rates. This is a poor example, because migration rates in a pulse model cannot be compared to rates in a continuous model. If migration is spread along a longer time range, then you expect the rates to decrease. So, there is no expectation of getting the same rates. You do expect, however, to get the other parameters reasonably accurately. It seems that this is achieved with 7 pulses, but not so much with one pulse. This should be the main focus of the discussion of these results.

    Results - Inference of inbreeding coefficients:

    When you describe the results you obtained for the cabbage data set, you say "the population size for the most recent epoch in our results is underestimated (6 vs 592 individuals) for model 1 without inbreeding and overestimated (174,960,000 vs. 215,000 individuals) for model 2 with inbreeding". The usage of under/overestimated is not ideal here, because it would imply that the original dadi estimates are more correct. You should probably simply say that they are lower/higher than the estimates originally obtained by dadi. Or maybe even suggest that the original estimates were over/underestimated?

    Supplementary materials:

    Page 4, para 2: "Figure ??" should be "Figure S1".

    Page 4, para 4: Can you clarify what you mean by "unsupervised demographic history with structure (2, 1)"?

    Page 22, para 2: "Compared to dadi and moments engines momentsLD provide slightly worse approximations for migration rates". I don't really see this in Supplementary Table S16. Estimates seem to be very similar in all methods. Am I missing anything? You make the same statement again for the STRUCT-MIG model (page 23).

    Page 22, para 4: "The best history for the ORAN-NOMIG model with restriction on population sizes is -175,106 compared to 174,309 obtained for the ORAN-STRUCT-NOMIG model". There is a missing minus sign before the second log-likelihood. You should also specify that this refers to the moments engine. Also see my comment above about this result.
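    The comparison suggested in the comment above, between an inferred exponential size change and a true instantaneous one, can be made concrete with a time-averaged population size. The snippet below is a minimal sketch in plain Python, not code from GADMA2 or the manuscript; the function name and all numerical values are hypothetical placeholders. It summarizes an epoch by the harmonic mean of N(t), which is the natural average for coalescence rates, for both an exponential-growth epoch and a constant-size epoch following an instantaneous change.

        # Summarize an epoch by the harmonic mean of N(t): coalescence rates scale
        # with 1/N, so two epochs with similar harmonic-mean sizes have similar
        # expected coalescence behaviour.  All numbers below are made up.
        import numpy as np

        def harmonic_mean_size(n_start, n_end, duration, exponential=True, steps=10_000):
            """Harmonic mean of N(t) over an epoch lasting `duration` generations.

            exponential=True: N(t) interpolates exponentially from n_start to n_end.
            exponential=False: constant size n_end (instantaneous change at the boundary).
            """
            t = np.linspace(0.0, duration, steps)
            if exponential:
                n_t = n_start * (n_end / n_start) ** (t / duration)
            else:
                n_t = np.full_like(t, float(n_end))
            # harmonic mean = duration / integral of 1/N(t) dt
            return duration / np.trapz(1.0 / n_t, t)

        # Hypothetical epoch of 2,000 generations ending at the present.
        exp_mean = harmonic_mean_size(1_000, 50_000, 2_000, exponential=True)
        inst_mean = harmonic_mean_size(1_000, 50_000, 2_000, exponential=False)
        print(f"harmonic-mean N, exponential epoch:   {exp_mean:,.0f}")
        print(f"harmonic-mean N, instantaneous epoch: {inst_mean:,.0f}")

    An inferred exponential epoch and a simulated instantaneous change that yield similar harmonic-mean sizes over the same time span produce roughly similar expected coalescence rates over that span, so this single number offers one simple basis for the quantitative comparison requested above.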
  2. Abstract

    Ryan Gutenkunst: In this paper, the authors present GADMA 2, an update of their population genomic inference software GADMA. The authors' software serves as a driver for other population genomics software, enabling a consistent user interface and a different parameter optimization approach. GADMA 2 extends GADMA by adding two new inference engines (momi2 and momentsLD), hyperparameter optimization for the genetic algorithm, demes visualization, modeling of selection, dominance, and inbreeding, and a new method for specifying model structures. In this paper, the authors show that their optimized genetic algorithm is somewhat more effective than with the original hyperparameter settings. They also compare among inference engines, finding some differences in behavior. Lastly, they compare with dadi itself on two models with inbreeding, finding better-likelihood parameter sets than those previously published.

    GADMA has already found some use in the population genomics community, and GADMA 2 is a substantial update. The manuscript describes the updates in good detail and demonstrates the effectiveness of GADMA 2 on two real-world data sets. Overall, this is a strong contribution, and we have few major concerns.

    Major Technical Concerns:

    1) The authors claim to now support inference of selection and dominance. But what they support is very limited and not very biological. In particular, they currently support inferences that assume a single selection and dominance coefficient for the entire data set (as in Williamson et al. (2005) PNAS). In reality, any AFS will include sites with a variety of selection coefficients, usually summarized by a distribution of fitness effects. Since Keightley and Eyre-Walker (2007) Genetics, this has been the standard for inferring selection from the AFS. The authors should be clear about the limitations of what they have implemented.

    2) Figure 4 shows that optimization runs using GADMA 2 tend to find better likelihoods than bare dadi optimization runs. But the advice for using dadi or moments is to run multiple optimizations and take the best likelihood found, with some heuristic for assessing convergence. So most users would not (or at least should not) stop with the result of a single dadi optimization run. It does seem that GADMA 2 reduces the complexity of assessing convergence between multiple dadi optimization runs. But another important consideration is computational cost. (At an extreme, if each dadi run were 100 times faster than a single GADMA 2 run, then the correct comparison would be between the best of 100 dadi runs and a single GADMA 2 run.) It is not clear from the paper how the 100 GADMA 2 runs compare to the 100 dadi runs in terms of computational cost. It would be very helpful to have a table or some text describing the average computational cost (in CPU hours) of those runs.

    Major Writing / Presentation Concerns:

    1) Bottom of page 5: The authors are sharing the results of their hyperparameter optimizations from their own server, with an uncertain lifetime. These results should be moved to an archival service such as Dryad.

    Minor Technical Concerns:

    1) The authors note that the DROS-MIG models had worse likelihoods than the DROS-NOMIG models. Since these are nested models, the DROS-MIG model must mathematically have a better global optimum likelihood. It would be worth pointing out that the likelihoods they found indicate a failure of the optimization algorithms. The authors should also present the DROS-MIG model results in a supplementary table.

    2) The Godambe parameter uncertainties in Tables S20 and S21 are pretty extreme, sometimes 10^-13 to 10^12. This may be due to instability of the Godambe approximation versus step size. In Blischak et al. (2020) Mol Biol Evol, the authors tried several step sizes and sought consistent results between them (Tables S1-S4). We suggest the authors take that approach here. (A sketch of such a step-size check appears after this review.)

    Minor Writing / Presentation Concerns:

    1) The authors claim that "GADMA does not require model specification". However, it seems that the GADMA "structure model" rather describes a different and perhaps broader way to specify demographic models rather than completely eliminating model specification.

    2) The authors use the term "inference engine" for the four tools GADMA 2 builds upon. But to us, the act of inference includes parameter optimization. In this case, these tools are not being used for the inference itself, but rather to calculate the (composite) likelihood of the data. Perhaps a better term would be "likelihood calculator"?

    3) The authors suggest engine-specific hyperparameter optimization as a future goal. But the optimal hyperparameters are also likely to be model-specific. (For example, 2- versus 4-population models might benefit from different optimization regimes.) Can the authors comment on this?

    Writing Nitpicks:

    1) Abstract: "optimization algorithms used to find model parameters sometimes turn out to be inefficient" → vague: more details on why/how they are inefficient would be helpful.

    2) Introduction: "Inference of complex demographic histories… in the population's past." needs a citation.

    3) Page 2: "parameter to infer, for example, all migration" is a comma splice and should be split into two sentences.

    4) Supplement page 4: Figure ?? reference is broken.
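    The step-size check recommended in minor technical concern 2 above can be automated with a small script. The sketch below is plain Python; uncert_at_eps is a hypothetical stand-in for the actual Godambe uncertainty call (in dadi this would be built around dadi.Godambe.GIM_uncert with the fitted model, bootstrap spectra, and best-fit parameters), and the numbers it returns are invented purely to illustrate the consistency test.

        # Compute standard errors at several finite-difference step sizes and keep
        # only parameters whose errors agree across steps (cf. Blischak et al. 2020).
        import numpy as np

        def uncert_at_eps(eps):
            """Hypothetical stand-in for the Godambe-based uncertainty call at
            finite-difference step `eps`; the returned values are made up."""
            made_up = {1e-2: [0.04, 0.11, 2.8],
                       1e-3: [0.05, 0.12, 250.0],
                       1e-4: [0.05, 0.13, 1.0e9]}
            return np.array(made_up[eps])

        step_sizes = [1e-2, 1e-3, 1e-4]
        ses = np.array([uncert_at_eps(eps) for eps in step_sizes])

        # Treat an uncertainty as reliable only if the standard errors agree
        # (here: within an order of magnitude) across all step sizes.
        consistent = (ses.max(axis=0) / ses.min(axis=0)) < 10.0
        for i, ok in enumerate(consistent):
            print(f"parameter {i}: SEs across eps = {ses[:, i]}, consistent = {ok}")

    Parameters whose standard errors swing by orders of magnitude between step sizes, like the third parameter in this toy example, are the ones whose Godambe intervals should not be reported at face value.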