Interpretable machine learning-driven QSAR modeling for coagulation factor X inhibitors: from molecular descriptors to predictive potency
Abstract
Inhibition of coagulation Factor Xa (FXa) is a clinically validated therapeutic strategy; however, developing safer and more selective inhibitors remains a major challenge. In this study, we developed an interpretable machine learning–based QSAR framework to predict both the inhibitory potency and the activity class of small molecules targeting FXa. A structurally curated dataset of 6400 compounds was retrieved from ChEMBL, standardized, and encoded using 391 non-redundant Mordred descriptors retained after systematic filtering. Benchmarking of 42 regression and 42 classification algorithms identified ExtraTreesRegressor and XGBoostClassifier as the most robust models. The regression model achieved an R² of 0.760 and an RMSE of 0.831 on the independent test set, while the classification model reached an accuracy of 0.91, with balanced precision and recall and an ROC-AUC of 0.962. SHAP (SHapley Additive exPlanations) analysis improved interpretability by showing that electrostatic, topological, and polar surface descriptors were the dominant contributors to FXa inhibitory potency. Applicability domain assessment using Williams plots confirmed that most compounds in both the training and test sets lay within the model's reliable prediction space. Overall, the proposed QSAR pipeline combines strong predictive performance with mechanistic interpretability and rigorous validation, offering a practical computational tool for the virtual screening and rational design of novel FXa inhibitors.
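The applicability domain check mentioned above can be illustrated with a minimal sketch of the quantities behind a Williams plot: the leverage of each compound in descriptor space and its standardized residual. The abstract does not give the authors' implementation, so everything here is an assumption: the function names `leverages` and `williams_flags` are hypothetical, the warning leverage uses the commonly quoted convention h* = 3(p + 1)/n, the residual cutoff is the usual ±3 standard deviations, and the descriptor matrix is assumed to be already scaled.

```python
import numpy as np


def leverages(X_train, X_query):
    """Leverage h_i = x_i^T (X^T X)^{-1} x_i for each row of X_query.

    X_train: (n, p) descriptor matrix used to fit the model (assumed scaled).
    X_query: (m, p) descriptor matrix of the compounds to assess.
    """
    # pinv instead of inv guards against a singular X^T X
    # when some descriptors are collinear.
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)
    # Row-wise quadratic form: sum_j sum_k X[i, j] * A[j, k] * X[i, k]
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)


def williams_flags(X_train, X_query, residuals, train_residual_std):
    """Return leverages, standardized residuals, h*, and an in-domain mask.

    residuals: observed minus predicted potency for the query compounds.
    train_residual_std: standard deviation of the training residuals
        (assumed here to be the standardization constant).
    """
    n, p = X_train.shape
    h_star = 3.0 * (p + 1) / n          # conventional warning leverage
    h = leverages(X_train, X_query)
    std_res = np.asarray(residuals) / train_residual_std
    # In-domain: low leverage and a standardized residual within +/- 3
    inside = (h <= h_star) & (np.abs(std_res) <= 3.0)
    return h, std_res, h_star, inside
```

Plotting `std_res` against `h`, with a vertical line at `h_star` and horizontal lines at ±3, gives the Williams plot. Compounds to the right of the vertical line are structurally far from the training set, so their predictions deserve less trust even when the residual is small.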