Evaluating Multiomics Integration Architectures for Training With Structured Missingness

Abstract

Multimodal bioinformatics datasets are increasingly common in biomedical research, for tasks such as cancer subtyping and outcome prediction. Data from a given patient, or even all data from a given institution, may not cover every modality: the data available depend both on the assays chosen at each institution and on technical factors and the associated drop-out. Consequently, machine learning algorithms must tolerate structured missingness (or occurrence) of entire modalities during training. In this paper, we compare general strategies for training multimodal models under structured modality missingness, each suited to a different stage of modality integration: early, by concatenating features; intermediate, by max pooling latent features; and late, by probabilistically aggregating model predictions. We evaluate these strategies on a real-world bioinformatics dataset for the task of breast cancer subtyping, constructing a range of structured missingness scenarios. We highlight that, despite their inability to learn cross-modality interactions, late integration models outperform early and intermediate integration strategies across a range of scenarios, depending on the level and nature of missingness. Logistic regression models, although simple, also outperform neural networks in the same settings. Fundamentally, we show that understanding the structure of missingness within a dataset is necessary when selecting an integration method, and that simple models and approaches should not be dismissed when working with structured missingness.
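
The three integration stages described above can be illustrated with a minimal sketch. This is not the authors' implementation: the two-modality setup, layer sizes, zero-filling of absent modalities for early integration, and averaging of class probabilities for late integration are all assumptions made for the example.

```python
# Minimal sketch (assumed, not the authors' code) of early, intermediate,
# and late integration for two modalities with modality-level missingness.
import torch
import torch.nn as nn


def early_integration(xs, present):
    """Early: concatenate features, zero-filling modalities that are absent,
    then pass the concatenated vector to a single downstream classifier."""
    filled = [x if p else torch.zeros_like(x) for x, p in zip(xs, present)]
    return torch.cat(filled, dim=-1)


class IntermediateIntegration(nn.Module):
    """Intermediate: per-modality encoders whose latent features are
    element-wise max-pooled over the modalities that are present."""

    def __init__(self, dims, latent_dim, n_classes):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, latent_dim), nn.ReLU()) for d in dims]
        )
        self.head = nn.Linear(latent_dim, n_classes)

    def forward(self, xs, present):
        latents = [enc(x) for enc, x, p in zip(self.encoders, xs, present) if p]
        pooled, _ = torch.stack(latents).max(dim=0)  # max pool across modalities
        return self.head(pooled)


def late_integration(per_modality_logits, present):
    """Late: average the class probabilities of the unimodal models
    trained on whichever modalities are available for the sample."""
    probs = [l.softmax(dim=-1) for l, p in zip(per_modality_logits, present) if p]
    return torch.stack(probs).mean(dim=0)
```

In this sketch, only the late strategy leaves each unimodal model untouched by missingness in the other modality, which is one way to read the abstract's observation that late integration can remain competitive even though it cannot learn cross-modality interactions.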