Function-Driven Molecular Design Enabled by Instruction-Tuned Large Language Models
Abstract
Translating high-level functional design intent into concrete molecular structures remains a fundamental challenge in generative molecular discovery, particularly for biomolecular targets governed by non-pocket-like recognition. Here, we introduce SemantiChem, an instruction-tuned generative framework for function-driven molecular design that maps functional objectives expressed in natural language directly to chemically meaningful molecular structures, without relying on predefined geometric constraints, molecular scaffolds, or pocket-centric assumptions. We apply this framework to G-quadruplexes (G4), a representative system characterized by diffuse and topology-driven molecular recognition, and experimentally validate model-generated candidates through assays of G4 stabilization, polymerase stalling, and cellular response. The same design pipeline is further evaluated on a structurally distinct RNA target and, for contrast, on a pocket-dominated protease target. Together, these results establish a function-level molecular design strategy with regime-dependent applicability, highlighting a complementary path for molecular discovery in biomolecular systems where conventional structure-centric paradigms are insufficient.