Function-Driven Molecular Design Enabled by Instruction-Tuned Large Language Models

Abstract

Translating high-level functional design intent into concrete molecular structures remains a fundamental challenge in generative molecular discovery, particularly for biomolecular targets governed by non-pocket-like recognition. Here, we introduce SemantiChem, an instruction-tuned generative framework for function-driven molecular design that maps functional objectives expressed in natural language directly to chemically meaningful molecular structures, without relying on predefined geometric constraints, molecular scaffolds, or pocket-centric assumptions. We apply this framework to G-quadruplexes (G4), a representative system characterized by diffuse and topology-driven molecular recognition, and experimentally validate model-generated candidates through assays of G4 stabilization, polymerase stalling, and cellular response. The same design pipeline is further evaluated on a structurally distinct RNA target and, for contrast, on a pocket-dominated protease target. Together, these results establish a function-level molecular design strategy with regime-dependent applicability, highlighting a complementary path for molecular discovery in biomolecular systems where conventional structure-centric paradigms are insufficient.
