Pharmacogenomics-Driven Multimodal Data Integration Improves Predictions of Adverse Drug Reactions in Cancer Patients using Machine Learning
Abstract
Accurately predicting adverse drug reactions (ADRs) in cancer remains challenging. We applied a pharmacogenomics-driven machine learning framework that integrates genomic, environmental, and comorbidity data to enhance ADR prediction. Using UK Biobank data, we analysed 26,235 antineoplastic-treated patients, identifying ADRs via ICD-10 codes. Features included GWAS-derived SNPs from 169 pharmacogenes, curated PharmGKB variants, demographics, lifestyle, laboratory biomarkers, and comorbidities. Five supervised models were trained; subgroup analyses assessed performance in drug-specific and ADR-specific cohorts. Logistic regression (LR) and multilayer perceptron (MLP) models performed best. In the drug-specific cohort, genetic data alone achieved AUC-ROC 0.82 (LR) and 0.80 (MLP), improving to 0.85 and 0.86 when all features were included. For secondary thrombocytopenia, LR and MLP achieved AUC-ROC 0.94 using genetic data only and 0.97 with all features. SHAP and univariate analyses highlighted female gender, elevated cystatin C, and alkaline phosphatase (all p < 0.001); haematologic and digestive cancers showed higher ADR risk than other cancer types. This integrative approach supports data-driven clinical decision-making to reduce ADRs.
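For readers less familiar with the AUC-ROC metric reported above, the following is a minimal sketch of how it can be computed from binary outcome labels and model scores, using the rank-sum (Mann-Whitney U) formulation. The function name `auc_roc` and the toy labels and scores are hypothetical illustrations only; they are not taken from the study, and this is not the authors' evaluation code.

```python
def auc_roc(labels, scores):
    """AUC-ROC via the rank-sum (Mann-Whitney U) formulation.

    labels: sequence of 0/1 outcomes (1 = event, e.g. an ADR occurred).
    scores: sequence of model scores, higher meaning higher predicted risk.
    Tied scores receive their average rank, so a tie counts as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC needs at least one positive and one negative")

    # Indices sorted by ascending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)

    # Walk through groups of tied scores and assign each group its average rank.
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class, normalised by the number of
    # positive/negative pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy data (not from the study): two non-events, two events.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_roc(toy_labels, toy_scores)
```

Under this formulation, an AUC-ROC of 0.5 corresponds to scores that do not separate events from non-events, and 1.0 to perfect separation, which gives a scale for reading the reported values.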