Coherent Cross-modal Generation of Synthetic Biomedical Data to Advance Multimodal Precision Medicine

Raffaele Marchesi
Nicolò Lazzaro
Walter Endrizzi
Gianluca Leonardi
Matteo Pozzi
Flavio Ragni
Stefano Bovo
Monica Moroni
Venet Osmani
Giuseppe Jurman

Abstract

Integration of multimodal, multi-omics data is critical for advancing precision medicine, yet its application is frequently limited by incomplete datasets in which one or more modalities are missing. To address this challenge, we developed a generative framework capable of synthesizing any missing modality from an arbitrary subset of available modalities. We introduce Coherent Denoising, a novel ensemble-based generative diffusion method that aggregates predictions from multiple specialized, single-condition models and enforces consensus during the sampling process. We compare this approach against a multi-condition generative model that uses a flexible masking strategy to handle arbitrary subsets of inputs. The results show that our architectures generate high-fidelity data that preserve the complex biological signals required for downstream tasks. We demonstrate that the generated synthetic data can be used to maintain the performance of predictive models on incomplete patient profiles and to support counterfactual analysis that guides the prioritization of diagnostic tests. We validated the framework’s efficacy on a large-scale multimodal, multi-omics cohort from The Cancer Genome Atlas (TCGA) comprising over 10,000 samples across 20 tumor types, using data modalities such as copy-number alterations (CNA), transcriptomics (RNA-Seq), proteomics (RPPA), and histopathology (WSI). This work establishes a robust and flexible generative framework to address sparsity in multimodal datasets, providing a key step toward improving precision oncology.
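
The idea of aggregating single-condition models during sampling can be illustrated with a minimal sketch. It is not the preprint's implementation: the abstract does not specify the consensus rule, so plain averaging of noise predictions is an assumption here, as are the noise schedule and the names `make_toy_denoiser` and `coherent_sample`. The toy denoisers are hypothetical stand-ins for trained networks, each conditioned on one available modality.

```python
import numpy as np


def make_toy_denoiser(target):
    """Hypothetical stand-in for a trained single-condition diffusion model.

    The returned function predicts the noise in x_t under the belief that the
    clean sample equals `target` (what this one conditioning modality implies).
    """
    def predict_noise(x_t, alpha_bar_t):
        return (x_t - np.sqrt(alpha_bar_t) * target) / np.sqrt(1.0 - alpha_bar_t)
    return predict_noise


def coherent_sample(denoisers, dim, n_steps=50, seed=0):
    """DDPM-style reverse sampling with an ensemble of denoisers.

    Assumption: consensus is enforced by averaging the noise predictions of
    all available single-condition models at every reverse step.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.05, n_steps)   # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(dim)               # start from pure Gaussian noise
    for t in range(n_steps - 1, -1, -1):
        # One prediction per available modality, merged into a single estimate.
        eps = np.mean([d(x, alpha_bars[t]) for d in denoisers], axis=0)
        # Standard DDPM posterior mean computed from the merged noise estimate.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added at the final step.
        noise = rng.standard_normal(dim) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x


# Two available "modalities" that imply slightly different clean samples.
a = np.array([1.0, 2.0, 3.0])
b = np.array([3.0, 0.0, -1.0])
sample = coherent_sample([make_toy_denoiser(a), make_toy_denoiser(b)], dim=3)
print(sample)
```

Because the list of denoisers can have any length, the same loop handles any subset of available modalities. In this toy setting the averaged prediction pulls the sample toward the midpoint of the individual targets; real single-condition models would disagree in less trivial ways, and the consensus step would need to be chosen accordingly.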

Version published to 10.1101/2025.08.22.671728 on bioRxiv
Aug 27, 2025

