Systematic Generation of Drug-Like Molecules via Biologically Safe Fragment-Based Rules Reveals Chemical Space Saturation Using RDKit and PubChem

Yathu Krishna Y K

Abstract

Finding new drug-like compounds remains a major challenge in cheminformatics and pharmaceutical development. Conventional de novo molecular design often relies on complex generative models, yet it remains unclear whether simple, rule-based techniques can produce chemically novel structures. Here, we systematically generate structurally diverse, drug-like compounds using fragment-based, biologically safe rules that are encoded in Python and validated with RDKit. We built a stochastic SMILES generator that combines a small set of atoms (C, N, and O), basic aliphatic and aromatic ring systems, and frequently occurring functional groups such as amides, esters, and alkyl chains; the generator mimics key aspects of drug-likeness while avoiding toxic or unstable chemotypes. Generated structures were canonicalized to remove redundancy, filtered for chemical validity, and then evaluated for novelty against the PubChem database through the PubChemPy interface. Despite the combinatorial diversity of the generated structures, the vast majority matched known compounds in PubChem, revealing a very high degree of overlap between randomly built drug-like molecules and existing chemical space. This finding implies that even very simple generative rules can reproduce a large portion of known chemistry, a phenomenon known as chemical space saturation. Our results offer a solid foundation for assessing the true novelty of AI-based molecular generators and highlight the importance of benchmarking such systems against existing chemical repositories, in addition to checking structural validity and drug-likeness. This work also underscores the need for more sophisticated rules or higher-order logic to move beyond the limits of the available public datasets and explore truly novel regions of chemical space.
To encourage openness and reproducibility, all code and datasets are made publicly available.
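
The workflow described in the abstract (fragment assembly, RDKit validation and canonicalization, then a PubChem lookup) can be sketched roughly as follows. This is a minimal illustration and not the authors' released code: the fragment lists, the function names, and the rule of one core plus one linker plus one carbon-first terminal group are all assumptions made for the example. RDKit and PubChemPy are imported lazily, so the generator itself runs with the standard library only.

```python
import random
from typing import Callable, Iterable, Optional, Set, Tuple

# Hypothetical fragment vocabulary restricted to C, N and O.
# The preprint's actual fragment lists are not given in the abstract.
CORES = ["c1ccccc1", "C1CCCCC1", "C1CCNCC1", "C1CCOCC1"]
LINKERS = ["C(=O)N", "C(=O)O", "CC", "CCC"]        # amide, ester, alkyl chains
CAPS = ["C", "CC", "CCO", "CCN", "c1ccccc1"]       # carbon-first, so no N-N, N-O or O-O bonds


def random_smiles(rng: random.Random) -> str:
    """Concatenate one ring core, one linker and one terminal group."""
    return rng.choice(CORES) + rng.choice(LINKERS) + rng.choice(CAPS)


def rdkit_canonical(smiles: str) -> Optional[str]:
    """Return RDKit's canonical SMILES, or None if the string does not parse."""
    from rdkit import Chem  # lazy import: only needed when this default is used
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


def pubchem_has_match(smiles: str) -> bool:
    """Query PubChem by SMILES via PubChemPy (needs network access).

    Records without a CID are treated as non-matches; this is an assumption
    about how misses are reported.
    """
    import pubchempy as pcp  # lazy import
    return any(c.cid for c in pcp.get_compounds(smiles, "smiles"))


def generate_unique(
    n_samples: int,
    seed: int = 0,
    canonicalize: Callable[[str], Optional[str]] = rdkit_canonical,
) -> Set[str]:
    """Sample n_samples strings, drop invalid ones, and deduplicate by canonical form."""
    rng = random.Random(seed)
    unique: Set[str] = set()
    for _ in range(n_samples):
        canon = canonicalize(random_smiles(rng))
        if canon is not None:
            unique.add(canon)
    return unique


def novelty_split(
    smiles: Iterable[str],
    is_known: Callable[[str], bool] = pubchem_has_match,
) -> Tuple[Set[str], Set[str]]:
    """Partition molecules into (known, novel) according to the lookup function."""
    all_smiles = set(smiles)
    known = {s for s in all_smiles if is_known(s)}
    return known, all_smiles - known
```

With RDKit and PubChemPy installed, `known, novel = novelty_split(generate_unique(1000))` would give the two sets whose relative sizes correspond to the overlap the abstract reports. The `canonicalize` and `is_known` parameters are injectable so the pipeline can be exercised offline with stand-in functions, and so that lookups can be cached or throttled when querying PubChem at scale.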

Version published to 10.21203/rs.3.rs-7278857/v1 on Research Square
Aug 5, 2025

Related articles

Systematic molecular glue drug discovery with a high-throughput effector protein remodeling platform

This article has 26 authors:
1. Jia Lu
2. Kate Stuart
3. Rebecca Teague
4. Chloe Tarry
5. Robert Yan
6. Tabitha Morgan
7. Elizabeth Nock
8. Darcie S Mulhearn
9. Matt Jones
10. Michael H Knaggs
11. Ruben Alvarez Fernandez
12. Willie Yen
13. Aris Aristodemou
14. Eleanor Thompson
15. Penny Hayward
16. Juliane F Ripka
17. Bethany C Atkinson
18. Abigail Dear
19. Aruba Farooq
20. Mat Calder
21. Miguel B Coelho
22. Laura R Butler
23. Alberto Moreno
24. Christian Dillon
25. Richard J Boyce
26. Benedict CS Cross
This article has no evaluations
Latest version Jul 18, 2025

MGMG: Cell Morphology-Guided Molecule Generation for Drug Discovery

This article has 10 authors:
1. Qiaosi Tang
2. Daoyun Ding
3. Xiaoyong Yuan
4. Gustavo Seabra
5. Peter A Ramdhan
6. Chi-Yuan Liu
7. My T. Thai
8. Chenglong Li
9. Hendrik Luesch
10. Yanjun Li
This article has no evaluations
Latest version Jul 17, 2025

A universal model for drug-receptor interactions

This article has 12 authors:
1. Filipe Menezes
2. Adam Wahida
3. Tony Fröhlich
4. Phillip Grass
5. Jan Zaucha
6. Katarzyna Pustelny
7. Agata Barzowska-Gogola
8. Anna Czarna
9. Andreas Hochhaus
10. Johannes Nissen-Meyer
11. Marcus Conrad
12. Grzegorz M. Popowicz
This article has no evaluations
Latest version Aug 2, 2025
