MVCBench: A Multimodal Benchmark for Drug-induced Virtual Cell Phenotypes

Bo Li
Qing Wang
Shihang Wang
Bob Zhang
Yuzhong Peng
Pinxian Zeng
Chengliang Liu
Mengran Li
Ziyang Tang
Xiaojun Yao
Chuxia Deng
Qianqian Song

Abstract

Drugs induce coordinated phenotypic changes across multiple modalities, including transcriptional reprogramming and cellular morphological remodeling. Predicting these drug-induced changes is central to drug discovery, mechanism-of-action studies and precision therapeutics; however, prediction performance depends critically on how both drug compounds and cellular states are represented. Despite rapid advances in drug molecular and gene representation methods, a systematic evaluation of these methods remains lacking. Herein, we introduce MVCBench, a comprehensive benchmarking framework for evaluating drug molecular and gene representation methods in predicting drug-induced multimodal virtual cell (MVC) phenotypes. MVCBench leverages large-scale transcriptomic and high-content imaging data and systematically evaluates 24 representation methods (12 drug molecular and 12 gene representation methods) across nearly 1.1 million drug-induced profiles, under both in-distribution and out-of-distribution settings spanning unseen compounds, cell lines, assay plates and datasets. Our benchmarking reveals a pronounced modality-dependent asymmetry: advanced drug molecular representations substantially improve the prediction of drug-induced morphological phenotypes but provide only limited gains for gene expression prediction relative to classical fingerprints, whereas task-specific gene representations outperform general-purpose foundation models in predicting drug-induced transcriptomic responses. Predictive performance also deteriorates sharply under distribution shift, highlighting persistent challenges in cross-dataset and cross-platform generalization. We further show that integrating transcriptomic and morphological modalities consistently improves prediction accuracy, and we derive practical design principles for MVC architectures, including modality-aware loss calibration and fusion strategies. Together, these results make MVCBench a systematic foundation for evaluating representation methods and offer guidance for developing robust MVC models of drug-induced cellular responses.
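
To make the out-of-distribution setting in the abstract concrete, the following is a minimal sketch of an "unseen compounds" split, in which whole compounds are held out so that no test compound appears in training. It is not taken from MVCBench: the function name `split_by_group`, the record fields `compound` and `cell_line`, and the toy data are all hypothetical and chosen only for illustration.

```python
import random


def split_by_group(records, key, test_fraction=0.2, seed=0):
    """Hold out whole groups (e.g. compounds) so test groups never occur in training.

    Hypothetical helper for illustration; not part of MVCBench.
    """
    groups = sorted({r[key] for r in records})   # unique group labels, deterministic order
    rng = random.Random(seed)                    # local RNG so the split is reproducible
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[key] not in test_groups]
    test = [r for r in records if r[key] in test_groups]
    return train, test


# Toy profiles: 5 assumed compounds measured in 2 assumed cell lines.
profiles = [
    {"compound": c, "cell_line": cl}
    for c in ["cmpd_A", "cmpd_B", "cmpd_C", "cmpd_D", "cmpd_E"]
    for cl in ["line_1", "line_2"]
]

train, test = split_by_group(profiles, key="compound")
train_compounds = {r["compound"] for r in train}
test_compounds = {r["compound"] for r in test}
```

Passing `key="cell_line"` would give the analogous unseen-cell-line split; an in-distribution split would instead shuffle individual profiles without grouping.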

Version published to 10.64898/2026.04.22.720110 on bioRxiv
Apr 24, 2026

Related articles

Condition-matched in silico prediction of drug transcriptional responses enables mechanism-guided screening and combination discovery

This article has 5 authors:
1. Meisheng Xiao
2. Yiping He
3. Jianhua Hu
4. Fei Zou
5. Baiming Zou
This article has no evaluations. Latest version: Mar 31, 2026
DrugPTM-Bench: A Large-Scale Dataset for Predictive Modeling of Drug-Induced Cell Type-Specific Protein Post-Translational Modifications

This article has 4 authors:
1. Amitesh Badkul
2. Mohammadsadeq Mottaqi
3. Li Xie
4. Lei Xie
This article has no evaluations. Latest version: Apr 30, 2026
Cell-Level Virtual Screening

This article has 5 authors:
1. Caleb N. Ellington
2. Sohan Addagudi
3. Jiaqi Wang
4. Benjamin Lengerich
5. Eric P. Xing
This article has no evaluations. Latest version: May 13, 2026