Trajectory-level dynamic validation of a phase-structured grey-box model for consolidated bioprocessing using literature-derived ethanol trajectories
Abstract
Consolidated bioprocessing (CBP) is a promising route for lignocellulosic ethanol production because enzyme generation, biomass hydrolysis, and fermentation can be integrated within a single process. However, the strong nonlinear coupling and limited observability of these stages make CBP difficult to model dynamically. This study presents a time-series-informed grey-box framework for dynamic validation of CBP using secondary, literature-derived ethanol trajectories. A curated benchmark dataset was assembled from heterogeneous published CBP studies, harmonized at the trajectory level, and partitioned by series; after curation and trimming, the dataset retained 23 trajectories and 211 observations. A data-driven timepoint model was first developed as a benchmark for ethanol prediction across diverse operating conditions. The same dataset was then used to calibrate a phase-structured grey-box model representing overlapping enzyme-production, hydrolysis, and fermentation stages through direct fitting to observed ethanol time series. Model performance was assessed using trajectory-level RMSE, MAE, final-point error, coefficient of determination, and residual diagnostics, together with analyses of fitted parameter distributions, phase activation behavior, and reconstructed trajectories. The grey-box model reproduced the dominant temporal patterns of literature-derived CBP ethanol trajectories while preserving mechanistic interpretability. Fitted parameters indicated that many trajectories could be captured through moderate adjustments in phase-specific kinetic capacity and phase timing rather than through major distortion of the underlying process structure. In direct comparison, the data-driven benchmark provided higher predictive accuracy, whereas the grey-box framework offered a more informative representation of CBP progression by linking ethanol accumulation to coordinated upstream and downstream process stages. 
Overall, the results show that secondary time-series data can extend CBP modeling beyond endpoint analysis toward dynamic validation, trajectory reconstruction, and soft-sensor-oriented process interpretation for lignocellulosic ethanol systems.
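The trajectory-level metrics named above (RMSE, MAE, final-point error, and coefficient of determination) can be illustrated with a minimal sketch. The function name `trajectory_metrics`, the signed definition of final-point error (predicted minus observed at the last time point), and the example values are assumptions for illustration only; they are not taken from the study's code or data.

```python
import math


def trajectory_metrics(observed, predicted):
    """Compute error metrics for one ethanol trajectory.

    `observed` and `predicted` are equal-length sequences of ethanol
    concentrations sampled at the same time points (hypothetical layout).
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)                    # root-mean-square error
    mae = sum(abs(r) for r in residuals) / n        # mean absolute error

    # Assumption: final-point error is signed, predicted minus observed.
    final_point_error = predicted[-1] - observed[-1]

    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # R^2 is undefined for a constant observed trajectory.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {
        "rmse": rmse,
        "mae": mae,
        "final_point_error": final_point_error,
        "r2": r2,
    }


# Illustrative values only (not data from the study).
observed = [0.0, 5.0, 12.0, 18.0, 20.0]
predicted = [0.5, 4.0, 13.0, 17.0, 21.0]
metrics = trajectory_metrics(observed, predicted)
```

Computing the metrics per trajectory, rather than pooling all observations, keeps long series from dominating the summary and matches the series-level partitioning described in the abstract.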