Discovery of Novel Natural Product-Derived EGFR Inhibitors Using Multiple Linear Regression, Stacked Ensemble Regression, and Fingerprinting Approaches
Abstract
This study developed and validated Quantitative Structure-Activity Relationship (QSAR) models to predict the inhibitory activity (\(\mathrm{pIC}_{50}\)) of 225 EGFR inhibitors. A genetic algorithm selected eight molecular descriptors, which were used to construct two models: a multiple linear regression (MLR) model and a stacked ensemble regression (SER) model. The SER model was only marginally more accurate (\(\Delta r^{2} = +0.022\)) but was less stable (\(\Delta r_{m(\mathrm{test})}^{2} = 0.0802\) vs. 0.0184 for MLR) and less interpretable. MLR was therefore retained as the primary model because of its OECD-compliant mechanistic transparency and superior generalizability. Applicability domain analysis confirmed the reliability of the MLR model. Molecular docking (PDB ID: 8A27) identified a top-ranked inhibitor (Compound 121) with high binding affinity (-12.023 kcal/mol), forming critical hydrogen bonds and hydrophobic interactions with EGFR's active site. Virtual screening of 32 structural analogs of Compound 121 revealed additional promising candidates. This work provides a robust framework for EGFR inhibitor discovery that combines computational modeling with structural insights.
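The abstract compares the two models on \(\Delta r_{m(\mathrm{test})}^{2}\) without defining it. The sketch below shows one commonly used formulation of the \(r_m^2\) family of metrics, in which \(r_m^2 = r^2\,(1 - \sqrt{|r^2 - r_0^2|})\) is computed once with observed values regressed through the origin on predicted values, once with the axes swapped, and \(\Delta r_m^2\) is the absolute difference between the two. It is an assumption that the study used this exact definition. The function names and the pIC50 values are hypothetical and are not taken from the paper.

```python
from math import sqrt


def _squared_pearson(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def _r2_through_origin(target, regressor):
    """Coefficient of determination for target = k * regressor (no intercept)."""
    k = sum(t * r for t, r in zip(target, regressor)) / sum(r * r for r in regressor)
    mean_t = sum(target) / len(target)
    ss_res = sum((t - k * r) ** 2 for t, r in zip(target, regressor))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot


def rm2_metrics(observed, predicted):
    """Return r_m^2, its axis-swapped counterpart, their mean, and delta r_m^2.

    Assumed definition (not confirmed by the source):
        r_m^2 = r^2 * (1 - sqrt(|r^2 - r_0^2|))
    """
    r2 = _squared_pearson(observed, predicted)
    r02 = _r2_through_origin(observed, predicted)      # observed vs. predicted
    r02_rev = _r2_through_origin(predicted, observed)  # axes swapped
    rm2 = r2 * (1.0 - sqrt(abs(r2 - r02)))
    rm2_rev = r2 * (1.0 - sqrt(abs(r2 - r02_rev)))
    return {
        "rm2": rm2,
        "rm2_rev": rm2_rev,
        "rm2_avg": (rm2 + rm2_rev) / 2.0,
        "delta_rm2": abs(rm2 - rm2_rev),
    }


# Illustrative test-set values only (hypothetical, not data from the study).
observed_pic50 = [5.1, 6.3, 7.0, 7.8, 8.4]
predicted_pic50 = [5.4, 6.0, 7.2, 7.5, 8.6]

if __name__ == "__main__":
    for name, value in rm2_metrics(observed_pic50, predicted_pic50).items():
        print(f"{name}: {value:.4f}")
```

Under this formulation, a smaller \(\Delta r_m^2\) means the two regressions through the origin agree more closely, which is consistent with the abstract reading the MLR value of 0.0184 as more stable than the SER value of 0.0802.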