Leveraging genomic large language models to enhance causal genotype-brain-clinical pathways in Alzheimer’s disease
Abstract
Genome-wide association studies (GWAS) have identified numerous Alzheimer’s disease (AD)-associated variants. However, how these variants contribute to the etiology of AD remains largely elusive. Recent advances in genomic large language models (LLMs) offer new opportunities to interpret the genetic variation observed in personal genomes. In this study, we propose epiBrainLLM, a novel computational framework that leverages a genomic LLM to enhance our understanding of the causal pathways from genotypes to brain measures to AD-related clinical phenotypes. epiBrainLLM first converts the personal DNA sequence into a diverse set of genomic and epigenomic features using a pretrained genomic LLM, and then uses these features to predict phenotypes. Across various experimental settings, epiBrainLLM significantly improves causal analysis compared with traditional genotype association approaches. We conclude that epiBrainLLM provides a novel perspective for understanding the regulatory mechanisms underlying AD etiology, potentially offering insights into complex disease mechanisms beyond AD.
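The two-stage idea described in the abstract (sequence to features, then features to phenotype) can be illustrated with a minimal sketch. This is not the epiBrainLLM implementation: the function `sequence_features` is a hypothetical stand-in that uses dinucleotide frequencies in place of a pretrained genomic LLM, `fit_line` is a plain one-variable least-squares fit, and the sequences and brain-measure values are synthetic placeholders chosen only for illustration.

```python
from itertools import product
from statistics import mean

# All 16 dinucleotides, in a fixed order, define the feature vector layout.
KMERS = ["".join(p) for p in product("ACGT", repeat=2)]


def sequence_features(seq):
    """Hypothetical stand-in for a pretrained genomic LLM.

    Maps a personal DNA sequence to dinucleotide frequencies. A real
    genomic LLM would instead output predicted genomic and epigenomic
    signals; this placeholder only mimics the "sequence in, features out"
    interface.
    """
    n_windows = max(len(seq) - 1, 1)
    counts = {kmer: 0 for kmer in KMERS}
    for i in range(len(seq) - 1):
        window = seq[i:i + 2]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    return [counts[kmer] / n_windows for kmer in KMERS]


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic personal sequences that differ by single-nucleotide variants.
sequences = ["ACGTACGT", "ACGTACTT", "ACTTACTT"]
# Synthetic brain measure per individual (placeholder values, not real data).
brain_measure = [1.0, 2.0, 3.0]

# Stage 1: sequence -> features. Stage 2: one feature -> phenotype.
ct_index = KMERS.index("CT")
x = [sequence_features(s)[ct_index] for s in sequences]
slope, intercept = fit_line(x, brain_measure)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The point of the sketch is the interface rather than the model: variants change the sequence-derived features, and the downstream association is estimated on those features instead of on raw genotype codes.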