Systematic Evaluation of Molecular Descriptors for Machine Learning–Based IC₅₀ Prediction

I. Shokair
F. J. Dominguez-Gutierrez
A. Krzyczmonik
E. Gopi
A. Aligayev
M. Pruszynski

Abstract

Accurate prediction of molecular bioactivity is a critical challenge in early-stage drug discovery, because it enables efficient prioritization of compounds within vast chemical space. Among bioactivity measures, the half-maximal inhibitory concentration (IC₅₀) is widely used to quantify compound potency against specific targets. Machine learning (ML) methods provide powerful tools for modeling IC₅₀ values, but their performance depends strongly on the choice of molecular descriptors. In this study, we systematically compare four descriptor classes (physicochemical properties, MACCS structural keys, Morgan circular fingerprints, and Mordred-generated descriptors) for their ability to predict IC₅₀ values against the SSTR2 receptor. Curated and preprocessed datasets were used to train ML models, including ensemble stacking frameworks, to assess descriptor complementarity and robustness. Our results show that MACCS keys consistently outperform the other descriptors, achieving R² values close to 0.9, which reflects their ability to capture pharmacophore-relevant structural motifs through predefined SMARTS patterns. To complement predictive benchmarking, SHAP (SHapley Additive exPlanations) analysis was applied to quantify feature contributions, linking statistical importance to chemically interpretable patterns. These results demonstrate the practical utility of substructure-focused fingerprints in ML-driven IC₅₀ prediction and provide guidance for descriptor selection strategies that enhance accuracy, interpretability, and generalizability in computational drug discovery.

Scientific Contribution: This study presents a systematic evaluation of four molecular descriptor classes (physicochemical properties, MACCS structural keys, Morgan circular fingerprints, and Mordred descriptors) for their ability to predict IC₅₀ values against the SSTR2 receptor using machine learning models and ensemble frameworks. The results demonstrate that MACCS keys consistently outperform more complex descriptor families, achieving R² values close to 0.9, owing to their SMARTS-based encoding of pharmacophore-relevant structural features. Beyond predictive benchmarking, we employed SHAP (SHapley Additive exPlanations) analysis to link statistical feature importance with chemically interpretable patterns, thereby validating model robustness and providing mechanistic insights into descriptor performance. Collectively, these contributions highlight the practical utility of substructure-focused fingerprints in cheminformatics workflows and provide guidance for selecting interpretable, high-performing descriptors to enhance accuracy and generalizability in computational drug discovery.
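The R² quoted in the abstract is the coefficient of determination between measured and predicted potencies. The following is a minimal sketch of how such a score can be computed, not the authors' pipeline. It assumes the regression target is pIC₅₀ (the negative base-10 logarithm of IC₅₀ in mol/L), which the abstract does not state; the function names `pic50_from_nm` and `r_squared` are hypothetical, and the numbers are invented for illustration and are not data from the study.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    return 1.0 - ss_res / ss_tot


# Illustrative values only (assumption): measured IC50 in nM and model
# predictions already expressed as pIC50.
measured_ic50_nm = [1.0, 10.0, 100.0, 1000.0]
predicted_pic50 = [8.8, 8.1, 7.3, 6.0]

measured_pic50 = [pic50_from_nm(v) for v in measured_ic50_nm]
score = r_squared(measured_pic50, predicted_pic50)
print(f"R^2 = {score:.3f}")
```

Working on a logarithmic scale is a common choice because IC₅₀ values span several orders of magnitude; an R² computed on raw concentrations would be dominated by the weakest compounds.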

Version published to 10.21203/rs.3.rs-7775414/v1 on Research Square
Nov 2, 2025

