Mechanistic genotype-phenotype translation using hierarchical transformers
Abstract
Genome-wide association studies have linked millions of genetic variants to biomedical phenotypes, but their utility has been limited by a lack of mechanistic understanding and by widespread epistatic interactions. Recently, Transformer models have emerged as a powerful general-purpose architecture in machine learning, with the potential to address these and other challenges. Accordingly, here we introduce the Genotype-to-Phenotype Transformer (G2PT), a framework for modeling hierarchical information flow among variants, genes, multigenic functions, and phenotypes. As a proof of concept, we use G2PT to model the genetics of TG/HDL (the ratio of triglycerides to high-density lipoprotein cholesterol), an indicator of metabolic health. G2PT learns to predict this trait via high attention to genetic variants underlying 24 functions, including immune response and cholesterol transport, with accuracy exceeding that of state-of-the-art methods. It implicates unexpected epistatic interactions, including those between APOC1 and CETP. This work positions hierarchical Transformers as a general approach to functionally interpret polygenic risk. The source code is available at https://github.com/idekerlab/G2PT.
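The hierarchical information flow described in the abstract (variants to genes to multigenic functions to phenotype) can be illustrated with a toy sketch. This is not the G2PT implementation from the repository. It is a minimal standard-library example that assumes plain scaled dot-product attention with no learned projections, and every name in it (`attend`, `propagate`, `v1`..`v4`, `gene_A`, `gene_B`, `function_X`) and every embedding value is hypothetical. The only idea it demonstrates is that each parent node attends solely to its own children, so the attention weights can be read per gene and per function.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, children):
    """Scaled dot-product attention of one parent query over its children.

    children maps a child name to its embedding vector.
    Returns the attention-weighted average of the child embeddings and
    the attention weight assigned to each child.
    """
    names = list(children)
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, children[n])) / scale for n in names
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * children[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return pooled, dict(zip(names, weights))


def propagate(parent_queries, membership, child_embeddings):
    """One hierarchy step: each parent sees only its own children (the mask)."""
    embeddings, attention = {}, {}
    for parent, members in membership.items():
        kids = {m: child_embeddings[m] for m in members}
        embeddings[parent], attention[parent] = attend(parent_queries[parent], kids)
    return embeddings, attention


# Hypothetical toy hierarchy and 2-dimensional embeddings (illustrative only).
variant_emb = {"v1": [1.0, 0.0], "v2": [0.0, 1.0], "v3": [1.0, 1.0], "v4": [0.5, -0.5]}
gene_members = {"gene_A": ["v1", "v2"], "gene_B": ["v3", "v4"]}
gene_query = {"gene_A": [1.0, 0.0], "gene_B": [0.0, 1.0]}
function_members = {"function_X": ["gene_A", "gene_B"]}
function_query = {"function_X": [1.0, 1.0]}
phenotype_members = {"phenotype": ["function_X"]}
phenotype_query = {"phenotype": [1.0, 0.0]}

# Information flows upward: variants -> genes -> functions -> phenotype.
gene_emb, gene_attn = propagate(gene_query, gene_members, variant_emb)
func_emb, func_attn = propagate(function_query, function_members, gene_emb)
pheno_emb, pheno_attn = propagate(phenotype_query, phenotype_members, func_emb)

for parent, weights in {**gene_attn, **func_attn, **pheno_attn}.items():
    print(parent, {child: round(w, 3) for child, w in weights.items()})
```

In a trained model the queries and embeddings would be learned, and a prediction head would map the phenotype embedding to the trait value. Here the point is only that restricting attention to the hierarchy yields per-level weights that can be inspected for interpretation.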