From QbD to Explainable AI: Interpretable Random Forest Surrogates for Design Space Understanding of Voriconazole–β-Cyclodextrin Inclusion Complexes

Abstract

Background: Voriconazole formulation development is often constrained by limited aqueous solubility and variable dissolution behavior. β-Cyclodextrin (β-CD) inclusion complexation prepared by solvent-free co-grinding is a practical solubility-enhancement strategy. A recent Quality-by-Design (QbD) study optimised this system using a central composite design (CCD); however, polynomial response surfaces can be difficult to interpret locally across the design space.

Objective: To perform an explainable artificial intelligence (XAI) secondary reanalysis of a published QbD CCD dataset for voriconazole–β-CD inclusion complexes, generating interpretable Random Forest (RF) surrogates for design-space understanding and comparing model behavior against published QbD checkpoints.

Methods: Factor–response data (13 CCD runs) were extracted exactly as reported for β-CD amount (A, mg) and grinding time (B, min) with responses solubility (Y1, mg/mL) and cumulative drug release (Y2, %CDR). Two RF regression surrogates (RF–Y1 and RF–Y2) were trained and evaluated by leave-one-out cross-validation (LOOCV). Published checkpoints were used for benchmarking against QbD predictions. Explainability was implemented using TreeSHAP, permutation feature importance (PFI), partial dependence/ICE plots, and LIME. RF-based response surfaces and a multi-response desirability map were generated to identify high-performance regions.

Results: LOOCV indicated modest predictive performance (Y1: R² = 0.1629, MAE = 11.2217, RMSE = 14.4720; Y2: R² = 0.2208, MAE = 12.3883, RMSE = 15.5143). RF design-space mapping indicated increasing Y1 and Y2 with higher A and B, with a broad high-response region. The RF desirability optimum occurred at A = 544.99 mg and B = 26.84 min with predicted Y1 = 66.09 mg/mL, Y2 = 89.08%, and desirability = 0.887. At the published high-performance checkpoint (A = 600 mg, B = 30 min), RF predictions closely matched the experimental results (Y1 ≈ 66.09 vs 65.86 mg/mL; Y2 ≈ 89.08 vs 85.93%), whereas the QbD polynomial overpredicted, especially for Y2. SHAP global importance suggested A dominated Y1 (mean |SHAP|: A = 7.37; B = 3.78), while Y2 depended on both factors (A = 7.48; B = 7.82); PFI supported strong influence of A (ΔMAE: Y1 A ≈ 11.03, B ≈ 5.42; Y2 A ≈ 12.31, B ≈ 5.97).

Conclusion: Explainable ML did not replace QbD; it augmented a published QbD dataset with transparent, multi-view interpretability and an alternative design-space depiction. RF + XAI triangulated factor priority (carrier-driven solubility; joint carrier–process control of release), highlighted plateau-like high-performance regions, and provided calibration-friendly predictions at the optimised condition. This workflow offers a practical template for integrating explainable AI into formulation-oriented QbD analyses.
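The multi-response desirability score mentioned in the Methods can be illustrated with a minimal sketch. It assumes the common Derringer-Suich "larger is better" form combined by a geometric mean; the abstract does not state which desirability function, bounds, or weights the authors used, so the names `desirability_maximize` and `overall_desirability` and all `*_LOW` / `*_TARGET` values below are hypothetical. The result is therefore not expected to reproduce the reported desirability of 0.887.

```python
import math


def desirability_maximize(y, low, target, weight=1.0):
    """Larger-is-better desirability in [0, 1] (Derringer-Suich form).

    Returns 0 at or below `low`, 1 at or above `target`, and a power-law
    ramp in between. `weight` shapes the ramp (1.0 gives a linear ramp).
    """
    if target <= low:
        raise ValueError("target must be greater than low")
    if y <= low:
        return 0.0
    if y >= target:
        return 1.0
    return ((y - low) / (target - low)) ** weight


def overall_desirability(values):
    """Geometric mean of individual desirabilities.

    Any single zero makes the overall score zero, so a condition that
    fails one response cannot be rescued by the other.
    """
    if not values:
        raise ValueError("at least one desirability value is required")
    if any(v == 0.0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical acceptance bounds, NOT taken from the article.
Y1_LOW, Y1_TARGET = 20.0, 70.0   # solubility, mg/mL
Y2_LOW, Y2_TARGET = 40.0, 95.0   # cumulative drug release, %

# RF predictions reported in the abstract at the desirability optimum.
d1 = desirability_maximize(66.09, Y1_LOW, Y1_TARGET)
d2 = desirability_maximize(89.08, Y2_LOW, Y2_TARGET)
D = overall_desirability([d1, d2])

print(f"d1={d1:.3f}  d2={d2:.3f}  D={D:.3f}")
```

Evaluating the same two functions over a grid of (A, B) points, with the surrogate predictions in place of the fixed values used here, would give a desirability map of the kind described in the abstract.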
