Parameter Efficient Fine-tuning of Transformer-based Masked Autoencoder Enhances Resource Constrained Neuroimage Analysis

Abstract

Recent innovations in artificial intelligence (AI) have increasingly focused on large-scale foundation models that are general-purpose, in contrast to conventional models trained to perform specialized tasks. Transformer-based architectures have become the standard backbone of foundation models across data modalities (image, text, audio, video). There has been keen interest in applying parameter-efficient fine-tuning (PEFT) methods to adapt these models to specialized downstream tasks in language and vision. These methods are particularly valuable for medical image analysis, where the limited availability of training data can lead to overfitting. In this work, we evaluated different types of PEFT methods on pre-trained vision transformers relative to typical training approaches, such as full fine-tuning and training from scratch. We used a transformer-based masked autoencoder (MAE) framework to pretrain a vision encoder on T1-weighted (T1-w) brain MRIs. The pretrained vision transformers were then fine-tuned using different PEFT methods that reduced the trainable parameters to as few as 0.04% of the original model size. Our study shows that: 1) PEFT methods were competitive with or outperformed the reference full fine-tuning approach, and outperformed training from scratch, with only a fraction of the trainable parameters; 2) PEFT methods with a 32% reduction in model size boosted Alzheimer’s disease (AD) classification by 3% relative to full fine-tuning and by 11% relative to a 3D CNN, with only 258 training scans; 3) PEFT methods performed well on diverse neuroimaging tasks, including AD and Parkinson’s disease (PD) classification and “brain-age” prediction from T1-w MRI, a standard benchmark for deep learning models in neuroimaging; and 4) smaller models were competitive with larger models in test performance. Our results show the value of adapting foundation models to neuroimaging tasks efficiently and effectively, in contrast to training stand-alone, special-purpose models.
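To make the parameter counts above concrete, the sketch below shows one representative PEFT method, low-rank adaptation (LoRA), applied to a single transformer projection layer, assuming PyTorch. The toy embedding dimension, rank, and stand-in "pretrained" layer are illustrative assumptions, not the configuration reported in the paper; the same pattern (freeze the pretrained weights, train a small low-rank update) underlies many PEFT variants.

```python
# Minimal LoRA sketch, assuming PyTorch. Dimensions and rank are
# illustrative only, not the paper's actual configuration.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update:
    y = W x + (alpha / r) * B(A(x)), where A: d_in -> r and B: r -> d_out."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # update starts at zero, so the
                                             # wrapped layer initially matches base
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

# Example: wrap a hypothetical pretrained qkv projection from a ViT block.
embed_dim = 768
qkv = nn.Linear(embed_dim, 3 * embed_dim)    # stands in for a pretrained layer
peft_qkv = LoRALinear(qkv, r=8)

trainable = sum(p.numel() for p in peft_qkv.parameters() if p.requires_grad)
total = sum(p.numel() for p in peft_qkv.parameters())
print(f"trainable: {trainable} / {total} ({100 * trainable / total:.2f}%)")
```

Because only the two small low-rank matrices receive gradients, the trainable fraction of this one layer is roughly 1%; applying such adapters selectively across a full vision transformer is how PEFT methods reach trainable-parameter fractions as small as those cited in the abstract.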