Unsupervised learning of multi-omics data enables disease risk prediction in the UK Biobank

Chiara Rohrer
Justus F. Gräf
Marc Pielies Avelli
Ricardo Hernandez Medina
Henry Webel
Kirstine Ravn
Simon Rasmussen

Abstract

The size and complexity of biomedical datasets continue to grow, driving the development of methods that reduce dimensionality while preserving biological signals. Yet when deep learning is applied to such data, the impact of preprocessing choices and dataset properties on model behavior is often overlooked. Here, we applied our framework, Multi-Omics Variational autoEncoder (MOVE), to multi-omics data from 452,026 UK Biobank participants, aiming both to evaluate the power of the learned representations for disease risk prediction and to critically analyze how non-biological factors, such as dataset properties and preprocessing decisions, can shape the results. We show that reducing the dimensionality of the data by a factor of 80 still yields comparable prediction performance across 15 different diseases. We further demonstrate how dataset properties and preprocessing choices affect model performance, latent representations, and downstream results. Our findings underline the need for thorough analysis and understanding of a model’s behavior before drawing conclusions from its results.

Version published to 10.1101/2025.10.02.679853 on bioRxiv
Oct 3, 2025

