Parental haplotypes reconstruction in up to 440,209 individuals reveals recent assortative mating dynamics
Abstract
Assortative mating (AM), the tendency for individuals to choose partners with similar traits, plays an important role in social stratification and has a widespread impact on the genetic architecture of complex human traits. A genetic footprint of this behaviour, genetic assortative mating (GAM), has been documented for many traits. However, most existing approaches to estimating GAM rely either on genotyped couples, which are rare in large biobanks, or on estimates based on gametic phase disequilibrium (GPD), which reflect cumulative effects over multiple generations and cannot detect short-term events.
We introduce a novel, scalable approach that infers genetic data for the parental generation to boost statistical power and improve interpretability when estimating GAM in biobank cohorts. Our method reconstructs the two parental haplotypes of biobank individuals by leveraging state-of-the-art inter-chromosomal phasing based on close relative information. By correlating polygenic scores computed separately on maternally and paternally inherited haplotypes, we infer the extent of GAM.
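The final step, correlating polygenic scores across the two parental haplotypes, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 0/1 allele coding, and the simulated data are assumptions made for illustration, and it omits phasing, parent-of-origin assignment, and any adjustment for population structure, none of which the abstract specifies.

```python
import numpy as np


def haplotype_pgs(haplotypes, weights):
    """Polygenic score per individual for one parental haplotype.

    haplotypes: (n_individuals, n_variants) array of 0/1 effect-allele
                indicators (hypothetical coding).
    weights:    (n_variants,) per-variant effect sizes.
    """
    return np.asarray(haplotypes, dtype=float) @ np.asarray(weights, dtype=float)


def gam_estimate(maternal_hap, paternal_hap, weights):
    """Pearson correlation between maternal- and paternal-haplotype scores.

    Under random mating the two scores are expected to be uncorrelated;
    a positive correlation is the signal attributed to GAM.
    """
    pgs_maternal = haplotype_pgs(maternal_hap, weights)
    pgs_paternal = haplotype_pgs(paternal_hap, weights)
    return float(np.corrcoef(pgs_maternal, pgs_paternal)[0, 1])


# Simulated example (assumed data): independent haplotypes mimic random mating.
rng = np.random.default_rng(0)
n_individuals, n_variants = 5000, 200
weights = rng.normal(size=n_variants)
maternal = rng.integers(0, 2, size=(n_individuals, n_variants))
paternal = rng.integers(0, 2, size=(n_individuals, n_variants))

r_random_mating = gam_estimate(maternal, paternal, weights)
print(f"haplotype PGS correlation under random mating: {r_random_mating:.3f}")
```

Because the simulated haplotypes are drawn independently, the printed correlation should be close to zero; in real data, each individual's two haplotypes would come from the reconstructed parental haplotypes rather than from a random number generator.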
Applied to 245,884 individuals from the UK Biobank and 194,325 individuals from the Estonian Biobank, our haplotype-based estimates showed strong concordance with GAM estimates from genotyped mate pairs across 69 traits and improved performance compared to existing GPD-based methods. We replicated previously identified patterns of GAM for many traits (educational attainment, height, BMI, and alcohol consumption), and revealed new ones (overall health and sedentary lifestyle). Temporal and geographic stratification revealed accelerated assortment in recent generations, particularly for education and height, and modest differences between urban and rural contexts.
Our method enables scalable, interpretable, and generation-specific estimation of GAM in large biobank cohorts, providing new insights into the dynamic nature of mate choice and its impact on the genetic makeup of populations.