MIND: Multimodal Integration with Neighbourhood-aware Distributions

Abstract

Multi-omics profiling has become a powerful tool for biomedical applications such as cancer patient stratification and clustering. However, the characterisation and integration of multi-omics data remain challenging because of missingness and inherent heterogeneity. Methods such as imputation and sample exclusion often rely on strong assumptions that can lead to information loss or distortion. To address these limitations, we propose a multi-omics integration framework that learns patient-specific embeddings from incomplete multi-omics data using a multimodal Variational Autoencoder with a data-driven prior. Specifically, we inject the neighbourhood structure of the observed dataset, encoded as affinity matrices, into the prior over the embeddings through exponential tilting, and use this prior to penalise latent configurations according to the discrepancy between the neighbourhood structures in the data spaces and in the latent space. The proposed method handles high missing rates and unbalanced missingness patterns well, and is robust to data with a low signal-to-noise ratio. Compared with existing data integration methods, it achieves better performance on a range of supervised and unsupervised downstream tasks on both synthetic and real data.
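
To make the tilting mechanism concrete, here is a minimal sketch in our own notation rather than the authors' (the discrepancy measure $D$ and the weight $\lambda$ are illustrative placeholders): given a base prior $p(z)$ over the latent embeddings, an affinity matrix $A_{\mathcal{X}}$ computed from the observed data, and the affinity matrix $A_{\mathcal{Z}}(z)$ induced by the embeddings, exponential tilting yields a prior of the form
\[
\tilde{p}(z) \;\propto\; p(z)\,\exp\!\bigl\{-\lambda\, D\bigl(A_{\mathcal{X}},\, A_{\mathcal{Z}}(z)\bigr)\bigr\},
\]
so latent configurations whose neighbourhood structure diverges from that of the data spaces receive lower prior mass, with $\lambda$ controlling the strength of the penalty.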
