Duality in Simplicity and Accuracy in QSPR: A Machine Learning Framework for Predicting the Solubility of Diverse Pharmaceutical Acids in Deep Eutectic Solvents

Abstract

We present a systematic machine-learning study of the solubility of diverse pharmaceutical acids in deep eutectic solvents (DESs). Using an automated Dual-Objective Optimization with Iterative feature pruning (DOO-IT) framework, we analyze a solubility dataset compiled from the literature for eight pharmaceutically important carboxylic acids and augmented with new measurements for mefenamic and niflumic acids in choline chloride- and menthol-based DESs, yielding N = 1,020 data points. Analysis with the corrected Akaike Information Criterion (AICc) reveals two distinct basins of high performance: an ultra-parsimonious 6-descriptor model and a high-accuracy 16-descriptor model, exposing a previously unrecognized duality in optimal model complexity. The 6-descriptor model offers excellent predictive power suitable for rapid virtual screening, while the 16-descriptor model, which includes a COSMO-RS-derived solubility descriptor, delivers the best absolute accuracy for applications requiring maximum quantitative fidelity. Together, these complementary models enable a practical two-tier screening strategy. The dual-solution landscape clarifies the trade-off between complexity and cost in QSPR for DES systems and shows that physically meaningful energetic descriptors can replace or enhance explicit COSMO-RS predictions, depending on the application.
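
To make the AICc comparison concrete, the sketch below uses the standard least-squares form of the criterion, AICc = n ln(RSS/n) + 2k + 2k(k+1)/(n - k - 1). It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and k is taken to be the number of descriptors, whereas the parameter count used in the study may differ (for example, if the underlying regressor is nonlinear). No solubility data or fitted errors are assumed; only the complexity penalty is evaluated, at N = 1,020.

```python
import math


def aicc_penalty(n: int, k: int) -> float:
    """Complexity penalty of AICc: 2k plus the small-sample correction.

    n: number of data points; k: number of fitted parameters
    (ASSUMPTION: k is equated with the number of descriptors here).
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return 2 * k + (2 * k * (k + 1)) / (n - k - 1)


def aicc_least_squares(rss: float, n: int, k: int) -> float:
    """AICc for a least-squares fit with Gaussian errors (up to a constant).

    rss: residual sum of squares of the fitted model.
    """
    if rss <= 0:
        raise ValueError("rss must be positive")
    return n * math.log(rss / n) + aicc_penalty(n, k)


if __name__ == "__main__":
    n_points = 1020  # dataset size reported in the abstract
    for n_descriptors in (6, 16):
        penalty = aicc_penalty(n_points, n_descriptors)
        print(f"k = {n_descriptors:2d}: penalty = {penalty:.3f}")
```

Under these assumptions the penalty gap between the two models is about 20.5 AICc units, so the 16-descriptor model would be preferred by AICc only if it lowers the goodness-of-fit term n ln(RSS/n) by more than that amount relative to the 6-descriptor model.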