Accounting for Structured Missingness in Canonical Correlation Analysis


Abstract

A particularly challenging form of missing data is structured missingness, where sets of subjects and variables consistently have missing data. For tabular data from sub-studies or modalities, structured missingness can arise from non-participation in follow-up studies, which creates large blocks of missing data. Canonical Correlation Analysis (CCA) is a multivariate modelling tool commonly used to link two sets of variables; in neuroimaging it has typically been used to find associations between imaging and non-imaging variables. Motivated by CCA, we propose a new method for covariance estimation from incomplete data that handles a mix of structured and unstructured missingness, assuming the data are Missing at Random (MAR). We compare the proposed method with existing methodology on simulated data and on real data from subjects in the UK Biobank brain imaging cohort.
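To make the setting concrete, the following minimal Python sketch simulates block-wise (structured) missingness and computes a pairwise-complete covariance estimate. This is a simple baseline only and is not the method proposed in the article. The function names `make_structured_missing` and `pairwise_covariance`, the sample sizes, and the single-latent-factor data model are all hypothetical choices made for illustration.

```python
import random


def make_structured_missing(n_subjects, n_vars, block_subjects, block_vars, seed=0):
    """Simulate data in which a block of subjects lacks a block of variables."""
    rng = random.Random(seed)
    data = []
    for i in range(n_subjects):
        shared = rng.gauss(0.0, 1.0)  # latent factor that correlates the variables
        row = [shared + rng.gauss(0.0, 1.0) for _ in range(n_vars)]
        if i in block_subjects:
            for j in block_vars:
                row[j] = None  # whole block missing, e.g. a skipped follow-up
        data.append(row)
    return data


def pairwise_covariance(data):
    """Estimate each covariance entry from subjects observed on both variables."""
    p = len(data[0])
    cov = [[None] * p for _ in range(p)]
    for a in range(p):
        for b in range(a, p):
            pairs = [(r[a], r[b]) for r in data
                     if r[a] is not None and r[b] is not None]
            n = len(pairs)
            if n < 2:
                continue  # too few jointly observed subjects; leave entry as None
            mean_a = sum(x for x, _ in pairs) / n
            mean_b = sum(y for _, y in pairs) / n
            c = sum((x - mean_a) * (y - mean_b) for x, y in pairs) / (n - 1)
            cov[a][b] = cov[b][a] = c
    return cov


# Assumed toy design: subjects 120..199 did not take part in the sub-study
# that measured variables 2 and 3.
data = make_structured_missing(
    n_subjects=200, n_vars=4,
    block_subjects=set(range(120, 200)), block_vars=[2, 3],
)
cov = pairwise_covariance(data)
n_complete = sum(all(v is not None for v in row) for row in data)
```

In this toy design, entries involving variables 2 or 3 are estimated from 120 subjects, while the entry for variables 0 and 1 uses all 200. Because different entries rest on different subsets of subjects, a pairwise-complete estimate is not guaranteed to be positive semidefinite, which is one reason a dedicated covariance estimator can be preferable before running CCA.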
