Artificial Intelligence and Machine Learning for De Novo Cancer Drug Discovery: A Systematic Review of Generative Design and Validation Gaps
Abstract
Background: Generative artificial intelligence (AI) and machine learning (ML) are emerging as powerful tools for de novo drug discovery. Oncology, where drug development timelines are long and arduous, could benefit considerably from these approaches. Previous reviews have described generative models in general terms, but none has provided a systematic, quantitative synthesis of their application to cancer drug discovery.

Methods: A PRISMA-guided systematic review of PubMed was carried out covering January 2015 to June 20, 2025. Eligible studies applied generative AI or ML architectures to design new molecules with cancer relevance. Extracted data included study targets, model families, docking scores, binding free energies, in vitro potency (IC₅₀/EC₅₀), in vivo validation, ADME(T) assessments, code availability, and comparator performance. Analyses were descriptive and aimed to map the coverage and distribution of reported outcomes.

Results: Of 1,130 records screened, 57 studies met the eligibility criteria. Kinases were the most frequent targets (49%), followed by enzymes, GPCRs, and immune proteins. Publications rose sharply after 2021. Fewer than half of the studies reported docking scores or in vitro potency values, and 14% described in vivo testing. Binding free energy values appeared in 26% of papers and ADME(T) assessments in 37%. Code availability was inconsistent: code was publicly released in 54% of papers, highlighting reproducibility gaps.

Conclusion: Generative AI shows potential for designing biologically active anticancer compounds. However, the evidence consists predominantly of computational results with limited experimental validation. Future work should prioritize consistent reporting, benchmarking frameworks, open code and data, and prospective in vitro and in vivo testing.