The genomic footprint of social stratification in admixing American populations

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This valuable study combines population genetic theory and deep learning approaches to estimate the extent of assortative mating and sex bias in modern admixed populations in the Americas. The new approach provides solid evidence for their main conclusions that socially constructed hierarchies have influenced mating behaviors, though certain results would benefit from further consideration. This paper would be of interest to human population geneticists and social scientists, particularly those studying demographic processes.

This article has been Reviewed by the following groups

Abstract

Cultural and socioeconomic differences stratify human societies and shape their genetic structure beyond the sole effect of geography. Although mating is limited by sociocultural stratification, most demographic models in population genetics assume random mating. Taking advantage of the correlation between sociocultural stratification and the proportion of genetic ancestry in admixed populations, we sought to infer the former process in the Americas. To this end, we define a mating model where the individual proportions of the genome inherited from Native American, European, and sub-Saharan African ancestral populations constrain the mating probabilities through ancestry-related assortative mating and sex bias parameters. We simulate a wide range of admixture scenarios under this model. Then, we train a deep neural network and obtain good performance in predicting mating parameters from genomic data. Our results show how population stratification, shaped by socially constructed racial and gender hierarchies, has constrained the admixture processes in the Americas since the European colonization and the subsequent Atlantic slave trade.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    In their study Mas Sandoval and colleagues estimate, from human genomic data, two important parameters that measure how intermarriages have been affected by social stratification in the Americas: sex-biased admixture (SB), which refers to sex differences in the chances to intermarry with another ethnic group, and ancestry-based assortative mating (AM), which refers to the higher probability of partners to intermarry when they carry similar genetic ancestries. To do so, the authors train a deep neural network (DNN) with simulations of admixture with non-random mating and use ancestry tract length distributions to infer the two parameters. They show that their approach estimates SB and AM parameters with a relatively good accuracy in a number of scenarios. When applying the DNN to empirical data, they find solid evidence that social stratification has constrained the admixture processes in the Americas for the last centuries.

    In contrast with the vast majority of population genetic studies, which assume random mating, this study assesses if mating has been random or not in American populations. Furthermore, the study is very valuable because it leverages, for the first time, a deep learning approach and local ancestry inference to co-estimate the extent of SB and AM from genomic data.

    One limitation of the study, however, is that it assumes that (i) the admixture date in the simulations is known and equals 19 generations and (ii) admixture started at the same time in all admixed American populations. The authors also implicitly assume that the variance of the difference between male and female ancestry proportions only depends on AM, and not admixture timing. This may be problematic, as it has been shown that linkage disequilibrium between local ancestry tracts depends both on AM and admixture timing (Zaitlen et al., Genetics 2017).

    To clarify the assumption of a fixed admixture date, we have added the following sentence in the results section (line 170), where the model is first described: “In both models we assume a continuous admixture process that starts 19 generations ago, knowing that the populations analysed trace the first contact of Native American and European populations to the first half of the 16th century and assuming a generation time of 26.9 years (Wang et al., 2023). In contrast with the approaches that aim to find an admixture date assuming random mating, we assume that the admixture process starts with the contact, and that it is continuous and modulated by the mating parameters.”
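
    As a quick check on the numbers quoted above, the 19 generations and the 26.9-year generation time can be multiplied out. The reference year from which generations are counted back is not stated in the response, so the sampling date of about 2010 used in the second line is only an assumption made for illustration.

```latex
\begin{align*}
19 \ \text{generations} \times 26.9 \ \tfrac{\text{years}}{\text{generation}} &= 511.1 \ \text{years} \\
2010 - 511.1 &\approx 1499 \quad \text{(assumed sampling year of 2010)}
\end{align*}
```

    Under that assumption the start of admixture falls at about 1500, in line with the first contact in the first half of the 16th century described above.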

    We thank the reviewer for pointing out this important reference, which we had not included in our manuscript and whose findings support the basis of our approach. It is now cited on line 70 to justify the analysis of the length of the ancestry tracts: “Herein, we argue that the tract length information can measure the non-randomness of mating associated with genetic ancestry and, therefore, it can also monitor the permeability of socioeconomic and cultural barriers between subpopulations with different genetic ancestries (Zaitlen et al., 2017).”

    This is also suggested by the authors' results, showing that AM estimates are much lower in admixed Americans under the two-pulse model, relative to the one-pulse model, i.e., when admixture extends over time. Estimates of AM in admixed Americans may thus be biased, if admixture actually started less (or more) than 19 generations ago.

    We evaluated the resemblance of the footprints left by either assortative mating or gene flow by testing how a neural network trained on models with gene flow due to a second migration pulse predicts migration size on data generated by models with assortative mating only and no second migration pulse. We then tested how neural networks trained on models with assortative mating detect assortative mating in data generated with migration only and no assortative mating. Results are summarised in Figure 4 – supplements 1 and 2 and show a strong correlation between the predicted size of the second migration pulse and the simulated level of assortative mating. Likewise, there is a strong correlation between the predicted assortative mating level and the size of the second migration pulse. Below, we respond to the reviewers in more detail regarding this question.

    Another potential limitation concerns local ancestry inference. The authors assume that RFMix makes no errors when inferring ancestry tracts. This can be a concern, as recent studies have shown that RFMix has reduced accuracy compared to other methods (Hilmarsson et al., bioRxiv 2022).

    In response to this comment, we performed a local ancestry analysis with Gnomix and generated the tract length profile from its results. One possible issue shared by Gnomix and RFMix is that they may infer a higher fraction of short tracts (at the expense of breaking longer ones). This issue was reported by Gravel et al. (2012), who decided to filter out the short tracts because these tracts showed a high rate of false positives and false negatives. Therefore, we conducted an experiment to test whether filtering out the shortest tract length window (i) improves the accuracy of the predictions on simulated test data, as measured by the Mean Squared Error (MSE), (ii) decreases the uncertainty of the estimates, and (iii) increases the correlation between Gnomix- and RFMix-based estimates, with (ii) and (iii) assessed through the generalised variance.

    We also tested a modification of the tract length profile by dividing (or not) the profile by the total number of tracts in either the autosomes or the X chromosome. Our goal was to force the neural network to focus on the shape of the profile rather than on the absolute number of tracts in each window, to mitigate possible bias in the tract length profile. Our experimental set-up consisted of three combinations of modifications of the tract length profile, in addition to the non-modified one.

    In Figure 4 – supplements 3 to 7, we show the predicted mating parameters using the modifications of the tract length profile resulting from the local ancestry inference. Each point represents a prediction using RFMix and Gnomix tract length profiles (x and y axis, respectively) as input for each of the 1000 trained neural networks with the same architecture. We evaluated the uncertainty of the estimates for both Gnomix and RFMix and the correlation between them through the Generalised Variance. The Generalised Variance is the determinant of the covariance matrix, which grows as the covariance of the bivariate distribution decreases and as the respective variances increase. The parameter estimates based on the tract length profile normalised by the total number of fragments in the autosomes or the X chromosome had both low Generalised Variance in the Gnomix-RFMix bivariate distribution of predicted parameters and low MSE in the prediction of simulated test data. These results indicate that, when the tract length profile is normalised by the total number of fragments, the distribution is still informative and less sensitive to possible biases introduced by errors in the local ancestry analysis.
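
    To make the Generalised Variance concrete: for two paired sets of predictions with variances σx² and σy² and correlation ρ, the determinant of the 2×2 covariance matrix equals σx²σy²(1 − ρ²), so it shrinks when the two sets agree closely or when either set is tightly concentrated. The sketch below computes it with NumPy. The function name and the simulated "predictions" are made up for illustration and are not the authors' code or data.

```python
import numpy as np


def generalised_variance(x, y):
    """Determinant of the 2x2 sample covariance matrix of paired values."""
    cov = np.cov(np.vstack([x, y]))  # rows are the two variables
    return float(np.linalg.det(cov))


# Hypothetical stand-ins for 1000 paired predictions (not real results).
rng = np.random.default_rng(0)
rfmix_like = rng.normal(size=1000)
gnomix_agreeing = rfmix_like + 0.05 * rng.normal(size=1000)  # strongly correlated
gnomix_unrelated = rng.normal(size=1000)                     # uncorrelated

gv_agreeing = generalised_variance(rfmix_like, gnomix_agreeing)
gv_unrelated = generalised_variance(rfmix_like, gnomix_unrelated)
print(gv_agreeing < gv_unrelated)
```

    In this toy case the strongly correlated pair gives the smaller determinant, which is the behaviour the paragraph above relies on when comparing tract length profile modifications.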

    Therefore, we present the results obtained from this RFMix profile in the main figures and tables, while showing the other predictions in the supplementary figures.

    In addition, the authors do not report a measure of uncertainty for the estimation of SB and AM, which is another important weakness. Interpretation of parameter estimates is limited if no measures of uncertainty are provided.

    We now provide the 95% CI for each parameter, obtained from the distribution of parameters predicted by the 1000 trained neural networks, using both the RFMix and the Gnomix tract length profiles.
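
    One common way to turn a set of 1000 predictions into a 95% interval is to take its 2.5th and 97.5th percentiles. The response does not say which construction was used, so the percentile interval below is an assumption, and the function name and simulated values are illustrative only.

```python
import numpy as np


def percentile_ci(predictions, level=0.95):
    """Central percentile interval of a set of predictions (assumed method)."""
    preds = np.asarray(predictions, dtype=float)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(preds, [alpha, 1.0 - alpha])
    return float(lower), float(upper)


# Made-up predictions standing in for the output of 1000 trained networks.
rng = np.random.default_rng(1)
am_predictions = rng.normal(loc=0.6, scale=0.05, size=1000)

lower, upper = percentile_ci(am_predictions)
print(f"95% interval: [{lower:.3f}, {upper:.3f}]")
```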

    Finally, the authors compare the likelihood of two competing models, assuming a single or two admixture pulses, but do not determine the accuracy of their model choice procedure.

    We now include the confidence intervals of the composite likelihood by replicating the test for each of the 1000 bootstrapped tract length profiles for each population. With the exception of the sub-Saharan ancestry in the Colombian (CLM) population, none of the 95% confidence intervals includes both negative and positive values, so each of them supports either the one-pulse or the two-pulse model.

    Overall, besides these methodological limitations, I expect that the study by Mas Sandoval and colleagues could be of great and broad interest for the scientific community studying population genetics, anthropology, sociology and history.

    Reviewer #2 (Public Review):

    This paper introduces a method to quantify how genetic ancestry drives non-random mating in admixed populations. Admixed American populations are structured by racial, gender, and class hierarchies. This has the potential to cause both ancestry-related assortative mating, in which the ancestry of mates tends to be correlated, and ancestry-related sex bias, in which individuals have a preference for mates with a particular ancestry composition. By applying their method to several African American and Latin American populations, Sandoval et al. further our understanding of ancestry-based population structure in this region more broadly.

    Strengths

    As many others have recently done, Sandoval et al. leverage the ability of a neural network to predict demographic parameters from high-dimensional population genomic data. Sandoval et al. first develop a clever probabilistic model of mating by defining the probability of a male and female mating as a function of the difference in ancestry between the individuals. They use this model to simulate population genomic data under various demographic scenarios, and then train a neural network on these simulated data. Finally, they apply the neural network to empirical data and learn the parameters of the underlying probability distribution, which can be related back to assortative mating and sex bias.

    One clear strength of this paper is their ability to jointly assess assortative mating and sex bias, as well as their ability to apply their model to multiple contemporary admixed populations.

    Importantly, the authors couch their results in an intersectional understanding of populations and consistently refer to research from historians and other social scientists throughout their paper, which reflects a very thoughtful awareness of the interdisciplinary nature of this research.

    Weaknesses

    The definition of assortative mating is conceptually confusing - in the text, assortative mating is introduced as genetic similarity between mates, i.e. positive assortative mating. However, based on the definition of assortative mating in their model, a population can have high assortative mating for a particular ancestry component even when there is non-zero sex bias for that component (e.g. males with low Native American ancestry are more likely to mate with females with high Native American ancestry). Fundamentally, this scenario cannot reflect positive assortative mating; rather, it reflects negative assortative mating (i.e. there is structured genetic dissimilarity between mates). However, the authors do not discuss the fact that the interpretation of the assortative mating parameter changes with the value of the sex bias parameter.

    We acknowledge that our definition of assortative mating requires more clarity. We now define it on line 155 as: “The AM parameter measures the non-randomness of mating associated with a genetic ancestry. This includes both positive assortative mating (genetic similarity between mates, when SB is zero) and negative assortative mating (genetic dissimilarity between mates, when SB is not zero). This approach accounts for the male-female direction of negative assortative mating through the SB parameter.”
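
    A toy weighting scheme can illustrate why the reading of AM changes with SB. The sketch below is not the authors' model. It assumes, purely for illustration, that the weight of a male-female pair is Gaussian in the difference between their ancestry proportions, centred on a hypothetical `sb` value, with a hypothetical spread `sigma` that would shrink as mating becomes less random. The sign convention (male minus female) is also an assumption.

```python
import numpy as np


def mating_weights(male_anc, female_anc, sb=0.0, sigma=0.1):
    """Row-normalised pair weights; rows index males, columns index females.

    Illustrative assumption: weight is Gaussian in (male - female) ancestry,
    centred on `sb`, with standard deviation `sigma`.
    """
    diff = male_anc[:, None] - female_anc[None, :]
    weights = np.exp(-0.5 * ((diff - sb) / sigma) ** 2)
    return weights / weights.sum(axis=1, keepdims=True)


ancestry = np.linspace(0.0, 1.0, 11)  # hypothetical ancestry proportions
no_bias = mating_weights(ancestry, ancestry, sb=0.0)
biased = mating_weights(ancestry, ancestry, sb=-0.3)

# Most likely partner ancestry for the male with ancestry proportion 0.5.
best_no_bias = ancestry[no_bias[5].argmax()]
best_biased = ancestry[biased[5].argmax()]
print(best_no_bias, best_biased)
```

    With `sb` at zero the most likely partner has the same ancestry proportion as the male (similarity between mates). With a non-zero `sb` and the same `sigma`, mating is equally non-random but the preferred partner has a different ancestry proportion (structured dissimilarity), which is the distinction the definition above draws.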

    In addition, the results of the inference in ASW are difficult to interpret. They find that males of high African ancestry are more likely to mate with females of low African ancestry. This result seems counterintuitive given the body of literature that suggests sex-biased admixture in African Americans has greater male European and female African contributions. The authors do not suggest potential explanations for this observation.

    We agree that results regarding the ASW population can be confusing. Our hypothesis to explain such results is that the sex bias parameter captures both sex-biased migrations and sex-biased admixture. Therefore, it is difficult to accommodate the complex genetic history of ASW. We have extended the discussion on this aspect as follows on line 380:

    In addition, African American populations might have a complex genetic history involving, on the one hand, male-biased sub-Saharan migration and, on the other hand, admixture that is female-biased for the sub-Saharan ancestry. However, our current model can only accommodate this demographic scenario with a single sex-bias parameter, and the results regarding this population should be interpreted with caution.

    Lastly, the authors have not done any simulations to assess how accurate parameter estimates are if the demographic model is misspecified, which weakens the interpretability of the results.

    We have performed a new analysis where we vary AM to generate tract length profiles to predict GFR, and vice versa. The results of this analysis are shown in the new Figure 4 – supplement 1. They show that assortative mating and multiple-pulse migration leave similar footprints in the genomes of admixing populations. In the discussion we argue that both the one-pulse and the two-pulse models must be considered because they are supported by results obtained using the X chromosome and the autosomes, respectively. We discuss how accounting for migration reduces AM values and how the resulting admixture dynamics resemble each other in both cases.

  2. eLife assessment

    This valuable study combines population genetic theory and deep learning approaches to estimate the extent of assortative mating and sex bias in modern admixed populations in the Americas. The new approach provides solid evidence for their main conclusions that socially constructed hierarchies have influenced mating behaviors, though certain results would benefit from further consideration. This paper would be of interest to human population geneticists and social scientists, particularly those studying demographic processes.

  3. Reviewer #1 (Public Review):

    In their study Mas Sandoval and colleagues estimate, from human genomic data, two important parameters that measure how intermarriages have been affected by social stratification in the Americas: sex-biased admixture (SB), which refers to sex differences in the chances to intermarry with another ethnic group, and ancestry-based assortative mating (AM), which refers to the higher probability of partners to intermarry when they carry similar genetic ancestries. To do so, the authors train a deep neural network (DNN) with simulations of admixture with non-random mating and use ancestry tract length distributions to infer the two parameters. They show that their approach estimates SB and AM parameters with a relatively good accuracy in a number of scenarios. When applying the DNN to empirical data, they find solid evidence that social stratification has constrained the admixture processes in the Americas for the last centuries.

    In contrast with the vast majority of population genetic studies, which assume random mating, this study assesses if mating has been random or not in American populations. Furthermore, the study is very valuable because it leverages, for the first time, a deep learning approach and local ancestry inference to co-estimate the extent of SB and AM from genomic data.

    One limitation of the study, however, is that it assumes that (i) the admixture date in the simulations is known and equals 19 generations and (ii) admixture started at the same time in all admixed American populations. The authors also implicitly assume that the variance of the difference between male and female ancestry proportions only depends on AM, and not admixture timing. This may be problematic, as it has been shown that linkage disequilibrium between local ancestry tracts depends both on AM and admixture timing (Zaitlen et al., Genetics 2017). This is also suggested by the authors' results, showing that AM estimates are much lower in admixed Americans under the two-pulse model, relative to the one-pulse model, i.e., when admixture extends over time. Estimates of AM in admixed Americans may thus be biased, if admixture actually started less (or more) than 19 generations ago.

    Another potential limitation concerns local ancestry inference. The authors assume that RFMix makes no errors when inferring ancestry tracts. This can be a concern, as recent studies have shown that RFMix has reduced accuracy compared to other methods (Hilmarsson et al., bioRxiv 2022).

    In addition, the authors do not report a measure of uncertainty for the estimation of SB and AM, which is another important weakness. Interpretation of parameter estimates is limited if no measures of uncertainty are provided.

    Finally, the authors compare the likelihood of two competing models, assuming a single or two admixture pulses, but do not determine the accuracy of their model choice procedure.

    Overall, besides these methodological limitations, I expect that the study by Mas Sandoval and colleagues could be of great and broad interest for the scientific community studying population genetics, anthropology, sociology and history.

  4. Reviewer #2 (Public Review):

    This paper introduces a method to quantify how genetic ancestry drives non-random mating in admixed populations. Admixed American populations are structured by racial, gender, and class hierarchies. This has the potential to cause both ancestry-related assortative mating, in which the ancestry of mates tends to be correlated, and ancestry-related sex bias, in which individuals have a preference for mates with a particular ancestry composition. By applying their method to several African American and Latin American populations, Sandoval et al. further our understanding of ancestry-based population structure in this region more broadly.

    Strengths
    As many others have recently done, Sandoval et al. leverage the ability of a neural network to predict demographic parameters from high-dimensional population genomic data. Sandoval et al. first develop a clever probabilistic model of mating by defining the probability of a male and female mating as a function of the difference in ancestry between the individuals. They use this model to simulate population genomic data under various demographic scenarios, and then train a neural network on these simulated data. Finally, they apply the neural network to empirical data and learn the parameters of the underlying probability distribution, which can be related back to assortative mating and sex bias.

    One clear strength of this paper is their ability to jointly assess assortative mating and sex bias, as well as their ability to apply their model to multiple contemporary admixed populations.

    Importantly, the authors couch their results in an intersectional understanding of populations and consistently refer to research from historians and other social scientists throughout their paper, which reflects a very thoughtful awareness of the interdisciplinary nature of this research.

    Weaknesses
    The definition of assortative mating is conceptually confusing - in the text, assortative mating is introduced as genetic similarity between mates, i.e. positive assortative mating. However, based on the definition of assortative mating in their model, a population can have high assortative mating for a particular ancestry component even when there is non-zero sex bias for that component (e.g. males with low Native American ancestry are more likely to mate with females with high Native American ancestry). Fundamentally, this scenario cannot reflect positive assortative mating; rather, it reflects negative assortative mating (i.e. there is structured genetic dissimilarity between mates). However, the authors do not discuss the fact that the interpretation of the assortative mating parameter changes with the value of the sex bias parameter.

    In addition, the results of the inference in ASW are difficult to interpret. They find that males of high African ancestry are more likely to mate with females of low African ancestry. This result seems counterintuitive given the body of literature that suggests sex-biased admixture in African Americans has greater male European and female African contributions. The authors do not suggest potential explanations for this observation.

    Lastly, the authors have not done any simulations to assess how accurate parameter estimates are if the demographic model is misspecified, which weakens the interpretability of the results.