Combining Real and Synthetic Data to Overcome Limited Training Datasets in Multimodal Learning
Abstract
Biomedical data are inherently multimodal, capturing complementary aspects of a patient's condition. Deep learning (DL) algorithms that integrate multiple biomedical modalities can significantly improve clinical decision-making, especially in domains where data collection is difficult and the data are highly heterogeneous. However, developing effective and reliable multimodal DL methods remains challenging, as it requires large training datasets with paired samples from the modalities of interest. An increasing number of de-identified biomedical datasets are publicly accessible, though they still tend to be unimodal. For example, several publicly available skin lesion datasets support automated dermatology clinical decision-making, yet they lack annotated reports paired with the images, limiting the advancement and use of multimodal DL algorithms. This work presents a strategy that exploits real and synthesized data in a multimodal architecture that encodes fine-grained text representations within image embeddings to create a robust representation of skin lesion data. Large language models (LLMs) are used to synthesize textual descriptions from image metadata, which are subsequently paired with the original skin lesion images and used for model development. The architecture is evaluated on the classification of skin lesion images across nine internal and external data sources. The proposed multimodal representation outperforms the unimodal one on skin lesion image classification, achieving superior performance on every tested dataset.
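The abstract outlines a pipeline in which an LLM turns image metadata into a synthetic textual description, and the resulting text embedding is fused with the image embedding before classification. The following is a minimal sketch of that idea only, assuming a concatenation-based fusion and placeholder encoders; the metadata fields, prompt, embedding dimensions, and class count are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of pairing LLM-synthesized descriptions with image embeddings.
# Model choices, prompt wording, and the fusion scheme are assumptions.
import torch
import torch.nn as nn


def metadata_to_prompt(meta: dict) -> str:
    """Turn skin lesion image metadata into a prompt for an LLM that
    synthesizes a free-text description (hypothetical fields)."""
    return (f"Describe a skin lesion located on the {meta['site']} of a "
            f"{meta['age']}-year-old {meta['sex']} patient.")


class MultimodalClassifier(nn.Module):
    """Fuses an image embedding with the text embedding of the
    LLM-generated description, then classifies the lesion."""

    def __init__(self, img_dim=512, txt_dim=512, hidden=256, n_classes=8):
        super().__init__()
        self.proj_img = nn.Linear(img_dim, hidden)
        self.proj_txt = nn.Linear(txt_dim, hidden)
        # Simple concatenation fusion followed by a linear head (assumption).
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(2 * hidden, n_classes),
        )

    def forward(self, img_emb, txt_emb):
        fused = torch.cat([self.proj_img(img_emb), self.proj_txt(txt_emb)], dim=-1)
        return self.classifier(fused)


# Usage with placeholder embeddings; real ones would come from pretrained
# image and text encoders applied to the image and its synthesized report.
model = MultimodalClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 512))
```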