Machine learning driven acceleration of biopharmaceutical formulation development using Excipient Prediction Software (ExPreSo)
Abstract
Formulation development of protein biopharmaceuticals has become increasingly challenging due to new modalities and higher target drug substance concentrations. The limited amount of drug substance available during development, coupled with extensive analytical requirements, restricts the number of excipients that can be empirically screened. There is a strong need for in silico tools to optimize excipient pre-selection before wet lab experiments. Here, we introduce Excipient Prediction Software (ExPreSo), a supervised machine learning algorithm that suggests excipients based on the properties of the protein drug substance and the target product profile. ExPreSo was trained on a dataset comprising 335 regulatory-approved peptide and protein drug products. Predictive features included protein structural properties, protein language model embeddings, and drug product characteristics. ExPreSo showed good performance for the nine most prevalent excipients in biopharmaceutical formulations, with minimal overfitting. A fast variant of ExPreSo using only sequence-based input features showed predictive power similar to that of slower models that relied on molecular modeling. Notably, an ExPreSo variant using only protein-based input features also showed good performance, indicating resilience to the influence of platform formulations. To our knowledge, this is the first machine learning algorithm to suggest biopharmaceutical excipients based on a dataset of regulatory-approved drug products. Overall, ExPreSo shows great potential to reduce the time, costs, and risks associated with excipient screening during formulation development.
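The abstract frames excipient suggestion as a supervised problem: each approved drug product is a training example, its protein and product properties are the features, and the presence of each excipient is a yes/no label. The abstract does not state which model type ExPreSo uses, so the minimal sketch below substitutes simple nearest-neighbour voting purely to illustrate that framing. The feature columns, their values, the excipient labels, and the function names are all hypothetical and are not taken from the ExPreSo dataset or code.

```python
import math

# Hypothetical toy data, invented for illustration only.
# Features per product: [isoelectric point, concentration in mg/mL, is_liquid]
# Labels: the set of excipients in that (made-up) formulation.
PRODUCTS = [
    ([8.5, 100.0, 1.0], {"polysorbate 80", "histidine", "sucrose"}),
    ([8.1, 120.0, 1.0], {"polysorbate 80", "histidine", "sucrose"}),
    ([8.8, 150.0, 1.0], {"polysorbate 80", "histidine"}),
    ([5.2, 1.0, 0.0], {"mannitol", "phosphate"}),
    ([5.6, 2.0, 0.0], {"mannitol", "phosphate"}),
    ([5.0, 0.5, 0.0], {"mannitol"}),
]


def column_stats(rows):
    """Return per-feature (mean, std) so features can share a common scale."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        stats.append((mean, math.sqrt(var) or 1.0))  # avoid dividing by zero
    return stats


def scale(row, stats):
    """Standardize one feature row using precomputed (mean, std) pairs."""
    return [(v - m) / s for v, (m, s) in zip(row, stats)]


def excipient_scores(query, products, k=3):
    """Score each excipient as the fraction of the k nearest products containing it."""
    stats = column_stats([features for features, _ in products])
    q = scale(query, stats)
    ranked = sorted(products, key=lambda p: math.dist(q, scale(p[0], stats)))
    neighbours = ranked[:k]
    excipients = set().union(*(labels for _, labels in products))
    return {
        e: sum(e in labels for _, labels in neighbours) / k
        for e in sorted(excipients)
    }


def suggest(query, products, k=3, threshold=0.5):
    """Return excipients whose neighbour-vote score meets the threshold."""
    scores = excipient_scores(query, products, k)
    return [e for e, s in scores.items() if s >= threshold]


if __name__ == "__main__":
    # A hypothetical high-concentration liquid product.
    print(suggest([8.4, 110.0, 1.0], PRODUCTS))
```

Treating each excipient as its own yes/no question means one product can receive several suggestions at once, which matches how formulations combine buffers, stabilizers, and surfactants. A real evaluation would also need held-out products to check for overfitting, which this toy example does not attempt.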