Learning a CoNCISE language for small-molecule binding


Abstract

Rapid advances in deep learning have improved in silico methods for drug-target interaction (DTI) prediction. However, current methods do not scale to the massive catalogs that list millions or billions of commercially available small molecules. Here, we introduce CoNCISE, a method that accelerates DTI prediction by 2-3 orders of magnitude while maintaining high accuracy. CoNCISE uses a novel vector-quantized codebook approach and residual-learning-based training of hierarchical codes. Strikingly, we find that much of the binding-specificity information in small-molecule space can be compressed into just 15 bits per compound, sorting all small molecules into 32,768 hierarchically organized binding categories. Our DTI architecture, which combines these compact ligand representations with fixed-length protein embeddings in a cross-attention framework, achieves state-of-the-art prediction accuracy at unprecedented speed. We demonstrate CoNCISE’s practical utility by indexing 6.4 billion ligands in the Enamine dataset, enabling researchers to query vast chemical libraries against a protein target in seconds. A “CoNCISE + docking” pipeline screened Enamine to propose strong binders (predicted K_D ≈ 10-20 µM) of three difficult-to-drug targets, each within two hours. CoNCISE’s advance could democratize access to large-scale computational drug discovery, potentially enabling rapid identification of promising molecules for therapeutic targets and cellular perturbations.
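
To make the 15-bit figure concrete, the following is a minimal sketch of residual vector quantization, the general technique the abstract refers to. It is not the authors' implementation. The 15 bits and 32,768 categories come from the abstract; the split into three levels of 32 codes each, the embedding dimension, the random (untrained) codebooks, and the function names `residual_quantize` and `pack_codes` are assumptions chosen for illustration.

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Return one code index per level.

    Each level picks the nearest codeword to the residual left over
    by the previous levels, so later codes refine earlier ones.
    """
    residual = np.asarray(x, dtype=float).copy()
    codes = []
    for cb in codebooks:  # cb has shape (n_codes, dim)
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

def pack_codes(codes, bits_per_level):
    """Pack hierarchical codes into a single integer category ID."""
    packed = 0
    for idx in codes:
        packed = (packed << bits_per_level) | idx
    return packed

# Assumed layout: 3 levels x 32 codes = 3 x 5 bits = 15 bits per compound.
LEVELS, CODES_PER_LEVEL, DIM = 3, 32, 16
rng = np.random.default_rng(0)
# Random codebooks stand in for learned ones (hypothetical, untrained).
codebooks = [rng.normal(size=(CODES_PER_LEVEL, DIM)) for _ in range(LEVELS)]

embedding = rng.normal(size=DIM)  # placeholder for a ligand embedding
codes = residual_quantize(embedding, codebooks)
category = pack_codes(codes, bits_per_level=5)
print(codes, category)
```

Under this assumed layout, compounds that share the first code fall in the same coarse category and differ only in the finer levels, which is one way a hierarchy of 32 × 32 × 32 = 32,768 categories can arise.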
