Vision Transformer Autoencoders for Unsupervised Representation Learning: Capturing Local and Non-Local Features in Brain Imaging to Reveal Genetic Associations
Abstract
The discovery of genetic loci associated with brain architecture can provide deeper insights into neuroscience and improve personalized medicine outcomes. Previously, we designed the Unsupervised Deep learning-derived Imaging Phenotypes (UDIPs) approach to extract endophenotypes from brain imaging using a convolutional neural network (CNN) autoencoder, and conducted brain imaging GWAS on the UK Biobank (UKBB). In this work, we leverage a vision transformer (ViT) model because of its different inductive bias and its potential to capture unique patterns through its pairwise attention mechanism. Our approach, based on 128 endophenotypes derived by average pooling, discovered 10 loci previously unreported by the CNN-based UDIP model, 3 of which had no previously reported associations with brain structure in the GWAS catalog. Our interpretation results demonstrate the ViT’s capability to capture non-local patterns, such as left-right hemisphere symmetry, within brain MRI data by leveraging its attention mechanism and positional embeddings. Our results highlight the advantages of transformer-based architectures for feature extraction and representation in genetic discovery.
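The abstract describes deriving 128 endophenotypes per subject by average pooling the ViT encoder's output. A minimal sketch of that pooling step is shown below; the token count, embedding width, and array names are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ViT encoder output for one subject: a sequence of patch
# tokens from a 3D brain MRI volume, each embedded in a 128-dimensional
# space (both sizes are assumptions for illustration).
n_patches, embed_dim = 216, 128
patch_tokens = rng.standard_normal((n_patches, embed_dim))

# Average pooling across the patch dimension collapses the token
# sequence into a single 128-dimensional endophenotype vector, which
# would then serve as the phenotype set for GWAS.
endophenotypes = patch_tokens.mean(axis=0)

print(endophenotypes.shape)  # (128,)
```

Average pooling is order-invariant over tokens, so the positional information it retains comes only from what the positional embeddings injected into each token before pooling.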