Genetic ancestry and population structure in the All of Us Research Program cohort
Abstract
The NIH All of Us Research Program (All of Us) aims to build one of the world’s most diverse population biomedical datasets in support of equitable precision medicine. For this study, we analyzed participant genomic variant data to assess the extent of population structure and to characterize patterns of genetic ancestry in the All of Us cohort (n=297,549). Unsupervised clustering of genomic principal component analysis (PCA) data revealed a non-uniform distribution of genetic diversity and substantial population structure in the cohort, with dense clusters of closely related participants interspersed among less dense regions of genomic PC space. Supervised genetic ancestry inference was performed using genetic similarity between All of Us participants and global reference population samples. Participants show diverse genetic ancestry, with major contributions from European (66.4%), African (19.5%), Asian (7.6%), and American (6.3%) continental ancestry components. The participant genetic similarity clusters show group-specific patterns of continental and subcontinental ancestry. We also explored how genetic ancestry varies over space and time in the United States (US). African and American ancestry are enriched in the southeast and southwest regions of the country, respectively, whereas European ancestry is more evenly distributed across the US. The diversity of All of Us participants’ genetic ancestry is negatively correlated with age: younger participants show higher levels of genetic admixture than older participants. Our results underscore the ancestral genetic diversity of the All of Us cohort, a crucial prerequisite for genomic health equity.
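The abstract does not say how ancestry diversity was quantified. As a rough illustration only, the sketch below assumes Shannon entropy over continental ancestry fractions as the diversity measure and applies it to the cohort-level percentages reported above. The function names are hypothetical and are not taken from the study.

```python
import math

# Cohort-level continental ancestry percentages as reported in the abstract.
reported = {"European": 66.4, "African": 19.5, "Asian": 7.6, "American": 6.3}


def normalize(fractions):
    """Rescale non-negative ancestry values so that they sum to 1."""
    total = sum(fractions.values())
    if total <= 0:
        raise ValueError("ancestry fractions must have a positive sum")
    return {name: value / total for name, value in fractions.items()}


def shannon_diversity(fractions):
    """Shannon entropy (in nats) of an ancestry profile.

    ASSUMPTION: entropy stands in for the unspecified diversity
    measure; the study may have used a different statistic.
    """
    proportions = normalize(fractions)
    return -sum(p * math.log(p) for p in proportions.values() if p > 0)


def normalized_diversity(fractions):
    """Entropy divided by its maximum, log(k), giving a value in [0, 1]."""
    k = len(fractions)
    return shannon_diversity(fractions) / math.log(k) if k > 1 else 0.0


cohort_entropy = shannon_diversity(reported)
cohort_evenness = normalized_diversity(reported)
print(f"entropy = {cohort_entropy:.3f} nats, evenness = {cohort_evenness:.3f}")
```

Under this assumption, a participant with a single ancestry component scores 0 and a participant with four equal components scores log(4). The reported age trend could then be examined by correlating per-participant scores with age.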