CCGM: a Compound Coarse Grain Model representation for enhanced chemotype exploration, annotation and screening
Abstract
Structurally similar compounds often exhibit similar bioactivity, making similarity estimation an essential step in many cheminformatics workflows. Traditionally, compound similarity has been evaluated using diverse molecular representations, such as molecular fingerprints, compound 3D structural features, and physicochemical properties. These methods have proven effective, particularly during the early stages of drug discovery, where the primary goal is to identify initial hits from large compound libraries. However, these representations and methods often fall short during the hit-to-lead development phase, where modifications to the core scaffold or chemotype are performed and evaluated. To address this limitation, we developed the Compound-Coarse-Grain-Model (CCGM), a framework that represents structural features of a compound as nodes and edges within a simplified graph. This approach augments the pharmacophore and chemotype features of the compound within the graph, enabling the identification of compounds with similar chemotype and pharmacophore features more effectively than conventional methods. CCGM is particularly useful when screening large libraries to identify compounds with similar chemotypes and when filtering generative designs to retain those with similar pharmacophore features.
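To make the idea of a simplified feature graph concrete, the following is a minimal illustrative sketch, not the CCGM implementation. It assumes a compound has already been reduced to a handful of labelled feature nodes (for example, an aromatic ring or a hydrogen-bond donor) joined by edges, and it compares two such graphs with a Tanimoto coefficient over counted node labels and edge label pairs. The `CoarseGraph` class, the feature labels, and the scoring scheme are all hypothetical choices made for this example; the abstract does not specify how CCGM builds or compares its graphs.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CoarseGraph:
    """Hypothetical coarse-grained view of a compound (not the CCGM data model)."""

    nodes: dict[int, str] = field(default_factory=dict)        # node id -> feature label
    edges: set[tuple[int, int]] = field(default_factory=set)   # undirected, stored as (low, high)

    def add_node(self, node_id: int, label: str) -> None:
        self.nodes[node_id] = label

    def add_edge(self, a: int, b: int) -> None:
        if a == b or a not in self.nodes or b not in self.nodes:
            raise ValueError("edges must join two distinct, existing nodes")
        self.edges.add((min(a, b), max(a, b)))

    def descriptors(self) -> Counter:
        """Count node labels and the label pairs joined by each edge."""
        counts = Counter(("node", label) for label in self.nodes.values())
        for a, b in self.edges:
            pair = tuple(sorted((self.nodes[a], self.nodes[b])))
            counts[("edge",) + pair] += 1
        return counts


def tanimoto(g1: CoarseGraph, g2: CoarseGraph) -> float:
    """Tanimoto coefficient on the two descriptor multisets (assumed scoring)."""
    c1, c2 = g1.descriptors(), g2.descriptors()
    shared = sum((c1 & c2).values())   # multiset intersection: minimum counts
    total = sum((c1 | c2).values())    # multiset union: maximum counts
    return shared / total if total else 1.0


# Two made-up compounds that share a ring-acceptor core but differ in one feature.
query = CoarseGraph()
for node_id, label in enumerate(["aromatic_ring", "hbond_acceptor", "hbond_donor"]):
    query.add_node(node_id, label)
query.add_edge(0, 1)
query.add_edge(1, 2)

analog = CoarseGraph()
for node_id, label in enumerate(["aromatic_ring", "hbond_acceptor", "hydrophobe"]):
    analog.add_node(node_id, label)
analog.add_edge(0, 1)
analog.add_edge(1, 2)

print(f"query vs analog: {tanimoto(query, analog):.3f}")
```

In this toy comparison the two graphs share the ring node, the acceptor node, and the edge between them, so the score reflects the common core while penalizing the differing third feature. A real workflow would derive the nodes and edges from actual structures with a cheminformatics toolkit rather than writing them by hand.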