Representation learning-based genome-wide association mapping discovers genes underlying complex traits.
Abstract
Genome-wide association studies (GWAS) have provided key insights into the genetic architecture of complex traits. However, traditional approaches fall short in accounting for polygenicity, epistatic interactions, and linkage disequilibrium. We present Representation Learning-Based Association Mapping (RBAM), a framework that leverages variational autoencoders to learn latent genotype embeddings for improved association mapping and phenotype prediction. We apply RBAM to 17 complex traits, including brain disorders, immunological conditions, cancers, and cardiometabolic phenotypes, using genotypes from the UK Biobank, dbGaP, and WTCCC, totalling 136,458 samples. RBAM enhanced gene discovery and identified DisGeNET-validated gene-disease associations, outperforming REGENIE and SKAT. Simulation studies confirm that RBAM maintains a controlled Type I error rate. When used as input to machine learning classifiers, the latent embeddings outperform polygenic risk score (PRS) estimates for complex diseases. Functional annotations show biologically plausible enrichments and pleiotropic genes shared across distinct complex diseases. The RBAM framework bridges the gap between unsupervised representation learning and association mapping.
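To make the idea of a latent genotype embedding concrete, the following is a minimal sketch of a single variational autoencoder forward pass on toy data. It is not the RBAM architecture, which the abstract does not specify: the layer sizes, the scaling of dosages to [0, 1], the cross-entropy reconstruction term, and every function and variable name here are assumptions made for illustration. The weights are random and no training is performed.

```python
import numpy as np


def init_params(n_snps, n_hidden, n_latent, rng):
    """Small random weights for a one-hidden-layer encoder and decoder (hypothetical sizes)."""
    def w(rows, cols):
        return rng.normal(0.0, 0.1, size=(rows, cols))
    return {
        "enc_h": w(n_snps, n_hidden),
        "enc_mu": w(n_hidden, n_latent),
        "enc_logvar": w(n_hidden, n_latent),
        "dec_h": w(n_latent, n_hidden),
        "dec_out": w(n_hidden, n_snps),
    }


def encode(params, x):
    """Map scaled genotypes to the mean and log-variance of q(z | x)."""
    h = np.tanh(x @ params["enc_h"])
    return h @ params["enc_mu"], h @ params["enc_logvar"]


def decode(params, z):
    """Map a latent vector back to per-SNP values in (0, 1)."""
    h = np.tanh(z @ params["dec_h"])
    return 1.0 / (1.0 + np.exp(-(h @ params["dec_out"])))


def negative_elbo(params, x, rng):
    """Reconstruction loss plus KL divergence to a standard normal prior."""
    mu, logvar = encode(params, x)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps          # reparameterization trick
    x_hat = np.clip(decode(params, z), 1e-7, 1.0 - 1e-7)
    recon = -np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat), axis=1)
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)
    return float(np.mean(recon + kl)), mu


rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(8, 50))     # toy dosages 0/1/2, not real data
x = genotypes / 2.0                              # assumption: scale dosages to [0, 1]
params = init_params(n_snps=50, n_hidden=16, n_latent=4, rng=rng)
loss, embedding = negative_elbo(params, x, rng)
print(embedding.shape)
```

In a trained model, the posterior mean `embedding` (one row per sample) would play the role of the latent genotype representation that is passed on to association tests or to a downstream classifier.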