Multi-omics integration and batch correction using a modality-agnostic deep learning framework
Abstract
State-of-the-art biotechnologies allow the detection of different molecular species in the same biological sample, generating complex, high-dimensional multi-modal datasets. Gaining a holistic understanding of biological phenomena, such as oncogenesis or aging, requires integrating these diverse modalities into low-dimensional data representations while correcting for technical artifacts. Here we present MIMA, a modular, unsupervised AI framework for multi-omics data integration and batch correction. Applied to complex spatial and single-cell datasets, MIMA effectively removes batch effects while preserving biologically relevant information, and learns representations predictive of expert pathologist annotations. Additionally, it enables cross-modal translation, uncovers molecular patterns not captured by manual annotations, and, despite being modality-agnostic, performs on par with specialized state-of-the-art tools. MIMA's flexibility and scalability make it a powerful tool for multimodal data analysis and a foundation for AI-based, multi-omics-augmented digital pathology frameworks, offering new opportunities for improved patient stratification and precision medicine through the comprehensive integration of high-dimensional molecular data and histopathological imaging.