An Improved Pipeline for Constructing UK Biobank Brain Imaging Confounds
Abstract
UK Biobank (UKB) brain imaging data are a one-of-a-kind resource for studying the links between the brain and demographic, lifestyle and genetic data. When establishing such links, it is crucial to account for confounding effects caused by the acquisition of fMRI images, as well as for demographic confounding factors. UKB brain imaging confounds are constructed by variable selection from tens of thousands of candidate confounds, based on the proportion of variance they explain in the Imaging Derived Phenotypes (IDPs). The current implementation of this pipeline is computationally intensive and has a large memory footprint, largely because of the varying patterns of missing data in IDPs, which makes it impractical for many users of UK Biobank brain imaging data. We propose a fast and memory-efficient multivariate pipeline for constructing imaging confounds using mean imputation combined with a bias-corrected estimator of R², the proportion of variance in an IDP explained by the confounds. Building on this, we also improve the pipeline to better select confounds that explain unique variance in IDPs and in non-imaging variables of interest, so-called nIDPs. The new implementation leads to a more compact set of confounds that explains roughly the same amount of variance, and runs in around 1 hour on a single CPU.
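To make the two ingredients named above concrete, the following is a minimal Python sketch of mean imputation followed by an R² computation on a single simulated IDP. It is not the pipeline's implementation. The abstract does not state the form of the bias-corrected estimator, so the textbook adjusted R² is used here purely as a stand-in: it corrects for the number of predictors, not for imputation. The function names (`mean_impute`, `r_squared`, `adjusted_r_squared`) and the simulated data are hypothetical.

```python
import numpy as np


def mean_impute(y):
    """Replace NaNs in a 1-D array with the mean of the observed entries."""
    y = np.asarray(y, dtype=float)
    filled = y.copy()
    filled[np.isnan(y)] = np.nanmean(y)
    return filled


def r_squared(X, y):
    """Plain R^2 from an ordinary least-squares fit of y on X plus an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Textbook adjusted R^2 for n samples and p predictors.

    Stand-in only: this is NOT the estimator proposed in the paper.
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Simulated example (hypothetical data): one confound, one IDP, 20% missing.
rng = np.random.default_rng(0)
n = 500
confound = rng.standard_normal(n)
idp = 0.5 * confound + rng.standard_normal(n)
idp[rng.random(n) < 0.2] = np.nan

idp_filled = mean_impute(idp)
r2 = r_squared(confound, idp_filled)
r2_adj = adjusted_r_squared(r2, n=n, p=1)
print(f"R^2 after mean imputation: {r2:.3f}, adjusted: {r2_adj:.3f}")
```

Because imputation gives every IDP the same set of rows, all IDPs can in principle be regressed on the confounds in one matrix operation rather than one fit per missingness pattern, which is consistent with the speed and memory gains the abstract describes. The imputed entries carry no information about the confound, though, which is presumably why a bias correction is needed.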