Efficient Search of Ultra-Large Synthesis On-Demand Libraries with Chemical Language Models

Abstract

Ultra-large building block catalogs provide inexpensive access to billions of synthesis-on-demand molecules, but the combinatorial scale renders conventional virtual screening impractical. We present Vector Virtual Screen (VVS), a score-function-agnostic machine learning framework for efficient navigation of combinatorial libraries and rapid identification of promising molecules for experimental validation. VVS comprises four key innovations: (i) the Embedding Decomposer, which factors molecules into building blocks in latent space; (ii) ChemRank, a correlation-based loss that improves retrieval precision; (iii) BBKNN, an algorithm for nearest-neighbor search directly in building block space; and (iv) a multi-scale hill-climbing algorithm for gradient-based navigation of molecular embedding vector databases. Across diverse scoring functions, VVS consistently outperforms existing methods in retrieving high-scoring molecules while evaluating only a fraction of the library, achieving orders-of-magnitude runtime improvements. By turning ultra-large libraries into tractable search spaces, VVS enables virtual screening to keep pace with the rapid expansion of chemical space and adapt seamlessly to future advances in scoring functions.
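
To illustrate the general idea of reaching a high-scoring molecule while scoring only a fraction of a library, the following is a minimal sketch of a generic, gradient-free greedy hill climb over nearest neighbors in an embedding space. It is not the VVS algorithm: the abstract describes a multi-scale, gradient-based procedure whose details are not given here. The names `nearest_neighbors`, `hill_climb`, `library`, and `toy_score`, the brute-force neighbor lookup (a stand-in for a vector database), and the toy two-dimensional "embeddings" are all assumptions made for illustration.

```python
import math


def nearest_neighbors(embeddings, query_idx, k):
    """Return indices of the k entries closest to query_idx (brute force).

    A real system would use an approximate nearest-neighbor index instead.
    """
    q = embeddings[query_idx]
    dists = [
        (math.dist(q, vec), i)
        for i, vec in enumerate(embeddings)
        if i != query_idx
    ]
    dists.sort()
    return [i for _, i in dists[:k]]


def hill_climb(embeddings, score_fn, start_idx, k=8, max_steps=100):
    """Greedy local search over the embedding neighborhood.

    score_fn is treated as an expensive black box and is called at most
    once per library entry. Returns (best_idx, best_score, n_scored).
    """
    scored = {start_idx: score_fn(start_idx)}
    current = start_idx
    for _ in range(max_steps):
        neighbors = nearest_neighbors(embeddings, current, k)
        for i in neighbors:
            if i not in scored:
                scored[i] = score_fn(i)
        best = max(neighbors + [current], key=scored.get)
        if scored[best] <= scored[current]:
            break  # no neighbor improves on the current entry
        current = best
    return current, scored[current], len(scored)


# Hypothetical toy library: points on a 20 x 20 grid stand in for embeddings.
library = [(x, y) for x in range(20) for y in range(20)]
target_idx = 15 * 20 + 4  # index of the point (15, 4)


def toy_score(i):
    """Toy score: higher is better, peaking at the target point."""
    return -math.dist(library[i], library[target_idx])


best_idx, best_score, n_scored = hill_climb(library, toy_score, start_idx=0)
print(best_idx, n_scored, len(library))
```

On this toy landscape the score has a single peak, so the greedy walk reaches it after scoring a small subset of the 400 entries. Real scoring functions are rugged, which is presumably why a multi-scale strategy is needed in practice.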
