Hybrid modeling framework for bioprocesses with minimal prior knowledge and limited data
Abstract
A hybrid model combines a mechanistic model, described by ordinary differential equations, with a data-driven component, such as a neural network. This strategy leverages both the physical knowledge of the system and the predictive power of machine-learning methods, and it has been applied to a variety of bioprocesses. Nevertheless, it faces two important challenges: (I) the often limited availability of experimental data and (II) the incorporation of uncertain mechanistic knowledge. To address these limitations, we propose a framework that generates mathematical models enforcing only a minimal set of essential qualitative properties, such as non-negativity of the state variables, biogenesis, and the requirement that biomass mediates both substrate uptake and product formation. This avoids committing to specific kinetic expressions and ensures that only knowledge known with certainty is introduced, thus addressing (II). Because batch cultures are the standard experimental setup in early-stage bioprocess research and are routinely repeated under different conditions, we focus on this type of experiment to address (I). We propose a minibatch optimization scheme based on the ADAM optimizer, in which each batch experiment naturally corresponds to one minibatch of the algorithm. Using both synthetic and experimental datasets, we show that only a few batch experiments are sufficient to train accurate and generalizable hybrid models, and that incorporating stochastic regularization improves the interpretability of the model. Overall, this work provides a structured and biologically grounded alternative to existing hybrid modeling approaches, improving the integration of machine-learning tools into bioprocess modeling when data and prior knowledge are limited.
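To make the qualitative constraints concrete, here is a minimal sketch (not the authors' implementation) of how they can be hard-wired into a hybrid right-hand side. It assumes a state of biomass X, substrate S and product P; a tiny tanh network with hypothetical helpers `init_params`, `specific_rates` and `simulate` that sees only (S, P); "biogenesis" read as "biomass arises only from existing biomass"; and uptake written as -k_S·S·X so it vanishes when the substrate is exhausted. No Monod-type or other kinetic law is assumed: the network supplies the rates, and the structure only fixes signs and which variables multiply them. The exponential Euler step is an illustrative choice that keeps the states non-negative after discretization as well.

```python
import numpy as np


def softplus(z):
    """Numerically stable log(1 + exp(z)); always >= 0."""
    return np.logaddexp(0.0, z)


def init_params(n_hidden=8, seed=0):
    """Hypothetical tiny tanh network: inputs (S, P), three raw outputs."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, (n_hidden, 2)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.3, (3, n_hidden)),
        "b2": np.zeros(3),
    }


def specific_rates(params, S, P):
    """Map (S, P) to per-biomass rates; the network replaces any kinetic law."""
    h = np.tanh(params["W1"] @ np.array([S, P]) + params["b1"])
    raw = params["W2"] @ h + params["b2"]
    mu = raw[0]              # specific growth rate; sign left free (growth or decay)
    k_S = softplus(raw[1])   # uptake coefficient, constrained >= 0
    q_P = softplus(raw[2])   # specific production rate, constrained >= 0
    return mu, k_S, q_P


def simulate(params, y0, t_end=10.0, n_steps=200):
    """Integrate dX/dt = mu X, dS/dt = -k_S S X, dP/dt = q_P X.

    Every rate carries a factor X (biomass mediates growth, uptake and
    production), and uptake also carries a factor S, so it stops when the
    substrate is exhausted. An exponential Euler step keeps X and S
    non-negative after discretization, and P can only increase.
    """
    dt = t_end / n_steps
    X, S, P = (float(v) for v in y0)
    traj = [(X, S, P)]
    for _ in range(n_steps):
        mu, k_S, q_P = specific_rates(params, S, P)
        # Right-hand side is evaluated with the old X, S, P (explicit step).
        X, S, P = (
            X * np.exp(mu * dt),          # positive factor: X >= 0 preserved
            S * np.exp(-k_S * X * dt),    # factor in [0, 1]: S >= 0 preserved
            P + q_P * X * dt,             # non-negative increment: P never decreases
        )
        traj.append((X, S, P))
    return np.array(traj)
```

With X0 = 0 every rate is multiplied by zero, so the simulated culture stays empty and S and P stay at their initial values, which is the biogenesis property expressed structurally rather than learned from data.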
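The training idea, one batch experiment per ADAM minibatch, can be sketched with a plain NumPy implementation of Adam. This is an illustration under stated assumptions, not the authors' code: the loss is any hypothetical `loss_fn(theta, experiment)`, gradients come from central finite differences instead of automatic differentiation or adjoint sensitivities, and the toy log-linear growth model (`log_growth_loss`, `make_experiment`) merely stands in for a full hybrid model.

```python
import numpy as np


def fd_gradient(loss_fn, theta, experiment, h=1e-5):
    """Central finite-difference gradient (stand-in for autodiff)."""
    grad = np.zeros_like(theta)
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = h
        grad[k] = (loss_fn(theta + step, experiment)
                   - loss_fn(theta - step, experiment)) / (2 * h)
    return grad


def adam_by_experiment(loss_fn, theta0, experiments, epochs=200, lr=0.05,
                       beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Adam in which every minibatch is exactly one batch experiment."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)   # first-moment estimate
    v = np.zeros_like(theta)   # second-moment estimate
    t = 0
    for _ in range(epochs):
        # Shuffle experiment order each epoch; one Adam step per experiment.
        for i in rng.permutation(len(experiments)):
            g = fd_gradient(loss_fn, theta, experiments[i])
            t += 1
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g**2
            m_hat = m / (1 - beta1**t)   # bias corrections
            v_hat = v / (1 - beta2**t)
            theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta


def log_growth_loss(theta, experiment):
    """Hypothetical toy loss: log-biomass of exponential growth, rate theta[0]."""
    t, X_obs = experiment
    log_pred = np.log(X_obs[0]) + theta[0] * t
    return np.mean((log_pred - np.log(X_obs)) ** 2)


def make_experiment(X0, t_end, mu_true=0.3, n=11):
    """Hypothetical noise-free synthetic batch culture."""
    t = np.linspace(0.0, t_end, n)
    return t, X0 * np.exp(mu_true * t)
```

For example, `adam_by_experiment(log_growth_loss, [0.0], [make_experiment(0.05, 10.0), make_experiment(0.2, 8.0)])` fits a shared growth rate across two simulated cultures. Keeping each experiment whole as a minibatch seems natural here because its time points form a single initial-value problem that must be integrated together, while shuffling the experiment order still provides the stochasticity that Adam's updates rely on.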