Interpretable Machine Learning-Driven QSAR Modeling for Coagulation Factor X Inhibitors: From Molecular Descriptors to Predictive Potency
Abstract
Inhibition of activated coagulation factor X (FXa) is a clinically validated strategy in anticoagulant therapy; however, developing safer and more selective inhibitors remains a critical challenge. In this study, we present a machine learning–enhanced quantitative structure–activity relationship (QSAR) modeling framework to predict the inhibitory potency (pKi) of small molecules targeting FXa. Bioactivity data were curated from the ChEMBL database and standardized, yielding a filtered dataset of 6400 structurally validated compounds. Molecular descriptors were calculated with the Mordred platform and filtered for statistical robustness. Two predictive approaches were employed: regression with an ExtraTrees Regressor and binary classification with an XGBoost Classifier. On the test set, the regression model achieved an R² of 0.760 and an RMSE of 0.831 pKi units. The classification model performed strongly across all key metrics, with an accuracy of 0.91, precision of 0.92 (class 0) and 0.89 (class 1), recall of 0.89 (class 0) and 0.92 (class 1), and an F1-score of 0.91 for both classes, indicating balanced and robust predictive capability for both active and inactive compounds. SHAP (SHapley Additive exPlanations) analysis identified the key structural features driving activity, with electrostatic and topological descriptors being the most influential. The applicability domain was assessed with the leverage approach, and Williams plots indicated that all compounds in both the training and test sets fell within the reliable prediction space of the regression model. Together, these results suggest that the models offer both strong predictive performance and interpretable insights, and could help guide the rational design and screening of novel FXa inhibitors in anticoagulant drug discovery.
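The leverage-based applicability domain mentioned above can be illustrated with a short NumPy sketch. It is not the authors' code: it assumes the common QSAR convention in which the leverage of compound i is h_i = x_iᵀ(XᵀX)⁻¹x_i (X being the training descriptor matrix with n compounds and p descriptors), the warning leverage is h* = 3(p + 1)/n, and compounds are treated as reliably predicted when h_i ≤ h* and their standardized residual lies within ±3, which are the two axes of a Williams plot. The function names and the choice of a pseudo-inverse (to tolerate collinear Mordred descriptors) are illustrative assumptions.

```python
import numpy as np


def leverage(X_train: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Hat values h_i = x_i^T (X_train^T X_train)^+ x_i for each row of X.

    The pseudo-inverse is an assumption here, used so that collinear
    descriptors do not make X_train^T X_train singular.
    """
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)
    # Row-wise quadratic form: sum_jk X[i, j] * xtx_inv[j, k] * X[i, k]
    return np.einsum("ij,jk,ik->i", X, xtx_inv, X)


def warning_leverage(n_train: int, n_descriptors: int) -> float:
    """Conventional warning leverage h* = 3(p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train


def standardized_residuals(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Residuals divided by their sample standard deviation (one of several conventions)."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return r / r.std(ddof=1)


def williams_domain(X_train, X, y_true, y_pred, resid_cutoff: float = 3.0):
    """Return leverages, standardized residuals, h*, and an in-domain mask."""
    h = leverage(X_train, X)
    h_star = warning_leverage(*X_train.shape)
    s = standardized_residuals(y_true, y_pred)
    inside = (h <= h_star) & (np.abs(s) <= resid_cutoff)
    return h, s, h_star, inside
```

Plotting `s` against `h`, with a vertical line at `h_star` and horizontal lines at ±3, gives a Williams plot; under these assumptions, a statement such as "all compounds fell within the reliable prediction space" corresponds to `inside` being true for every compound.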