Predictive Bioactivity Modeling and Structural Binding Analysis for the Identification of Potential SMYD3 Modulators
Abstract
SMYD3 is a lysine methyltransferase involved in epigenetic regulation and oncogenic transcription, making it an attractive yet challenging therapeutic target. This study presents an integrated computational workflow that combines machine-learning-based quantitative structure-activity relationship (ML-QSAR) modelling, external bioactivity prediction, molecular docking, molecular dynamics (MD) simulations, and network analysis to prioritize potential SMYD3 inhibitors. ML-QSAR models were constructed using multiple molecular descriptor representations and regression algorithms. A MACCS fingerprint-based Random Forest model showed the most reliable external predictivity, supported by cross-validation, applicability domain assessment, and Y-randomization analysis. Feature interpretation using SHAP highlighted a small set of chemically meaningful structural patterns that consistently influenced activity prediction. The validated model was then applied to an external compound library, and bioactivity was predicted only for compounds lying within the defined applicability domain. This screening prioritized in-domain candidates with moderate predicted potency and acceptable structural coverage relative to the training space. Structure-based evaluation using the crystallographic SMYD3 structure showed that the selected compounds bind within the experimentally validated active site and engage key residues observed in the co-crystal complex. Extended 250 ns MD simulations indicated that CHEMBL4472528 maintained stable binding and persistent polar and hydrophobic interactions, with favorable binding free energies compared with both the co-crystal ligand and the other screened candidates. Network and pathway analysis further placed SMYD3 within a focused chromatin-associated and transcriptional regulatory context, supporting the biological relevance of the target.
This work provides a reproducible computational framework for SMYD3 inhibitor prioritization and highlights CHEMBL4472528 as a promising scaffold for further investigation.
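The abstract states that bioactivity was predicted only for compounds inside the applicability domain, but it does not say how that domain was defined. As an illustration only, the sketch below uses one common choice: a query compound counts as in-domain when its mean Tanimoto similarity to its k most similar training compounds reaches a cutoff. The fingerprints are represented as sets of on-bit indices (as one might obtain from MACCS keys), and the function names, `k = 3`, the `0.5` threshold, and the toy fingerprints are all assumptions made for this example, not values from the study.

```python
from typing import FrozenSet, Sequence

# A fingerprint is modelled as the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(a & b) / union


def mean_knn_similarity(
    query: Fingerprint, training: Sequence[Fingerprint], k: int = 3
) -> float:
    """Mean similarity of the query to its k most similar training compounds."""
    if not training:
        raise ValueError("training set must not be empty")
    sims = sorted((tanimoto(query, t) for t in training), reverse=True)
    top = sims[:k]  # fewer than k values if the training set is small
    return sum(top) / len(top)


def in_domain(
    query: Fingerprint,
    training: Sequence[Fingerprint],
    k: int = 3,
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> bool:
    """True if the query is close enough to the training space."""
    return mean_knn_similarity(query, training, k) >= threshold


# Toy fingerprints (hypothetical bit indices, for illustration only).
training = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({2, 3, 4, 6}),
]
close = frozenset({1, 2, 3, 6})   # shares most bits with the training set
far = frozenset({10, 11, 12})     # shares no bits with the training set

for name, fp in (("close", close), ("far", far)):
    print(name, in_domain(fp, training))
```

In a screening setting, a filter of this kind would run before the regression model, so that predictions are reported only for library compounds that pass `in_domain`. Other definitions (leverage-based or distance-to-centroid approaches, for example) are equally plausible readings of the abstract.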