Duality in Simplicity and Accuracy in QSPR: A Machine Learning Framework for Predicting the Solubility of Diverse Pharmaceutical Acids in Deep Eutectic Solvents
Abstract
This work presents a systematic machine-learning study of the solubility of diverse pharmaceutical acids in deep eutectic solvents (DESs). Using an automated Dual-Objective Optimization with Iterative feature pruning (DOO-IT) framework, we analyze a solubility dataset compiled from the literature for eight pharmaceutically important carboxylic acids and augmented with new measurements for mefenamic and niflumic acids in choline chloride- and menthol-based DESs, yielding N = 1,020 data points. Analysis with the corrected Akaike Information Criterion (AICc) reveals two distinct basins of high performance: an ultra-parsimonious 6-descriptor model and a high-accuracy 16-descriptor model, exposing a previously unrecognized duality in optimal model complexity. The 6-descriptor model offers excellent predictive power suitable for rapid virtual screening, while the 16-descriptor model, which includes a COSMO-RS-derived solubility descriptor, delivers the best absolute accuracy for applications requiring maximum quantitative fidelity. These complementary models enable a practical two-tier screening strategy. The dual-solution landscape clarifies the trade-off between complexity and cost in QSPR for DES systems and shows that physically meaningful energetic descriptors can replace or enhance explicit COSMO-RS predictions, depending on the application.
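For readers unfamiliar with AICc, the following minimal sketch shows how the criterion penalizes model size. It is not the DOO-IT implementation. It assumes the common least-squares form with Gaussian errors, AICc = n ln(RSS/n) + 2k + 2k(k+1)/(n − k − 1), with the additive constant dropped. It also assumes, purely for illustration, that k equals the number of descriptors; the abstract does not state how the authors count parameters. The function names are hypothetical.

```python
import math


def aicc_penalty(n: int, k: int) -> float:
    """Complexity penalty of AICc: the AIC term 2k plus the small-sample correction."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return 2 * k + (2 * k * (k + 1)) / (n - k - 1)


def aicc_least_squares(rss: float, n: int, k: int) -> float:
    """AICc for a least-squares fit with Gaussian errors (additive constant dropped).

    rss: residual sum of squares, n: number of data points,
    k: number of fitted parameters (assumed here to equal the descriptor count).
    """
    if rss <= 0:
        raise ValueError("rss must be positive")
    return n * math.log(rss / n) + aicc_penalty(n, k)


if __name__ == "__main__":
    n_points = 1020  # dataset size reported in the abstract
    for n_descriptors in (6, 16):
        # Only the penalty is shown; the fit term needs each model's actual RSS,
        # which the abstract does not report.
        print(n_descriptors, aicc_penalty(n_points, n_descriptors))
```

Under these assumptions, the 16-descriptor model has to lower n ln(RSS/n) by more than the difference between the two penalties to score a better (lower) AICc than the 6-descriptor model.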