Genetic-to-Chemical Perturbation Transfer Learning Through Unified Multimodal Molecular Representations

Abstract

The artificial intelligence virtual cell (AIVC) holds transformative potential for biomedical research. Central to this vision is the systematic modeling of genetic and chemical perturbation phenotypes, so that cellular dynamic states can be accurately predicted from diverse interventions. However, differences in screening agents, library scales, experimental technologies, and data production efficiency hinder the integration, modeling, and analysis of data across these two domains. Here we present UniPert-G2CP, a two-phase deep learning approach comprising i) UniPert, a multimodal molecular representation model that bridges the genetic and chemical domains, and ii) G2CP (Genetic-to-Chemical Perturbation transfer learning), which systematically transfers insights from CRISPR screen-based genetic perturbations to chemical perturbation modeling for cost-effective in silico drug screening. UniPert not only encodes multimodal perturbagens into a unified, functionally interpretable semantic embedding space, but also improves phenotypic effect prediction for previously unseen gene perturbations and drug treatments. Building upon UniPert, G2CP successfully modeled large-scale cellular post-perturbation states spanning 4,994 gene and 7,821 compound perturbagens, while reducing modeling data costs by over 60%. We demonstrate that UniPert-G2CP enables efficient, generalizable simulation of multicellular, multi-domain perturbation cause-effect spaces, revealing differential cellular biological causality and informing mechanism-driven therapy. UniPert-G2CP opens new avenues for building biological causal foundation models, creating AIVCs, and advancing AI-powered precision medicine.
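The abstract does not describe how UniPert builds its unified embedding space. Purely as a generic illustration of what "encoding multimodal perturbagens into a unified embedding space" can mean, the minimal sketch below uses CLIP-style contrastive alignment, a common technique that is an assumption here and not necessarily UniPert's method. Two hypothetical linear projection heads map gene-side features (for example, target-protein features) and compound-side features (for example, molecular fingerprints) into one shared, unit-normalized space, and a symmetric InfoNCE loss rewards matched gene-compound pairs for lying close together. The function names, feature sizes, and random weights are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def project(features, weight):
    """Hypothetical linear projection head: modality-specific features -> shared space."""
    return l2_normalize(features @ weight)


def info_nce(gene_emb, drug_emb, temperature=0.1):
    """Symmetric InfoNCE loss; row i of each matrix is assumed to be a matched pair."""
    logits = gene_emb @ drug_emb.T / temperature
    labels = np.arange(len(logits))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()  # log-probability of the true partner

    # Average the gene->drug and drug->gene directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))


# Usage with random stand-in data (hypothetical sizes: 4 pairs, 16/32-dim inputs, 8-dim shared space).
rng = np.random.default_rng(0)
gene_feats = rng.normal(size=(4, 16))
drug_feats = rng.normal(size=(4, 32))
w_gene = rng.normal(size=(16, 8))
w_drug = rng.normal(size=(32, 8))

g = project(gene_feats, w_gene)
d = project(drug_feats, w_drug)
cross_modal_similarity = g @ d.T  # cosine similarity between every gene and every compound
loss = info_nce(g, d)
```

In a real model the projection weights would be trained to minimize this loss, so that once training is done, genes and compounds with related functional effects become directly comparable by cosine similarity in the shared space.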