Benchmarking model-based design of experiment approaches with a pharmaceutical crystallisation emulator
Abstract
This study presents a benchmarking framework for model-based design of experiments (MB-DoE) in pharmaceutical crystallisation using an in-silico emulator. The emulator simulates a generic cooling-antisolvent crystallisation and is underpinned by a mechanistic population balance model (PBM) calibrated with experimental data from the self-driving Scale-Up CMC DataFactory. The PBM generated over 20,000 synthetic experiments to train a random forest emulator with an error below 1%, which served as the benchmark function for evaluating optimisation strategies. Twelve initial design strategies were assessed, combining Latin hypercube sampling (LHS), Sobol sequences, and random sampling with varying sample sizes. These were followed by Bayesian optimisation using a Gaussian process (GP) and six acquisition functions: Mean, Random Sampling, Space Filling, Expected Hypervolume Improvement (EHVI), Noisy EHVI (NEHVI), and Pareto efficient global optimisation (NParEGO), leading to 360 emulated experimental campaigns. Performance was assessed using the hypervolume metric. Sobol outperformed other methods at low sample counts, while LHS showed consistent improvement with larger sample sizes. EHVI consistently delivered the highest hypervolume values, indicating effective identification of, and convergence towards, optimal crystallisation conditions. The emulator was also demonstrated on a non-standard optimisation method using gradient boosting; however, the GP with expected improvement outperformed gradient boosting by 6 to 12% across differing initial conditions. These results highlight the emulator’s utility in revealing performance differences between optimisation approaches. The benchmark function demonstrated in this work represents a novel, adaptable and scalable framework for systematically evaluating new and existing MB-DoE strategies under realistic, noisy pharmaceutical conditions.
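The hypervolume metric used to score each campaign can be illustrated with a minimal sketch. The abstract does not state which objectives were used or how the reference point was set, so the code below assumes two objectives that are both maximised and a reference point chosen by the user; the function names `pareto_front` and `hypervolume_2d` are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Return the non-dominated points, assuming both objectives are maximised."""
    # Sort by the first objective (descending), breaking ties on the second.
    ordered = sorted(points, key=lambda p: (-p[0], -p[1]))
    front: List[Point] = []
    best_second = float("-inf")
    for p in ordered:
        # A point is kept only if it improves the second objective
        # over every point with a larger or equal first objective.
        if p[1] > best_second:
            front.append(p)
            best_second = p[1]
    return front


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded below by the reference point `ref`."""
    # Points that do not beat the reference point contribute nothing.
    valid = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    front = pareto_front(valid)  # first objective descending, second ascending
    volume = 0.0
    prev_second = ref[1]
    for f1, f2 in front:
        # Add the horizontal slab between the previous and current second objective.
        volume += (f1 - ref[0]) * (f2 - prev_second)
        prev_second = f2
    return volume


# Illustrative (made-up) objective values, not data from the study.
campaign = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0)]
print(hypervolume_2d(campaign, ref=(0.0, 0.0)))
```

A larger hypervolume means the observed points dominate more of the objective space, which is why it can be used to compare campaigns that start from different initial designs or use different acquisition functions.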