Data-driven identification of biomedical systems using multi-scale analysis
Abstract
Biomedical systems inherently exhibit multi-scale dynamics, which makes accurate system identification particularly challenging because a wide spectrum of time scales must be captured. Traditional methods capable of addressing this issue rely on explicit equations, which limits their applicability when only observational data are available. To overcome this limitation, we propose a data-driven framework that integrates the Sparse Identification of Nonlinear Dynamics (SINDy) method, the multi-scale analysis algorithm Computational Singular Perturbation (CSP) and neural networks (NNs). This framework partitions the available dataset into subsets characterized by similar dynamics, so that system identification can proceed within each subset without facing a wide spectrum of time scales. Accordingly, when the full dataset does not allow SINDy to identify the proper model, CSP is employed to generate subsets of similar dynamics, which are then fed into SINDy. CSP requires the gradient of the vector field, which is estimated by the NNs. The framework is tested on the Michaelis-Menten model, for which various reduced models in analytic form exist in different parts of the phase space. It is demonstrated that the CSP-based data subsets allow SINDy to identify the proper reduced model in cases where the full dataset does not. In addition, it is demonstrated that the framework succeeds even when the available dataset originates from stochastic versions of the Michaelis-Menten model. This framework is algorithmic, so system identification is not hindered by the dimensions of the dataset.
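To make the notion of a wide time scale spectrum concrete, the following minimal sketch computes the fast and slow time scales of a two-variable Michaelis-Menten model from the eigenvalues of its Jacobian. It is an illustration under stated assumptions, not the article's implementation: the rate constants `K1`, `KM1`, `K2` and the total enzyme `E0` are hypothetical values, the Jacobian is written analytically as a stand-in for the NN estimate, and the eigenvalue-based time scales are only a simplified proxy for the full CSP analysis.

```python
import math

# Hypothetical parameters (not taken from the article): rate constants of
# S + E <-> C -> P + E and the total enzyme concentration.
K1, KM1, K2, E0 = 1.0, 1.0, 1.0, 0.01


def jacobian(s, c):
    """Analytic Jacobian of the (s, c) Michaelis-Menten vector field
    ds/dt = -K1*(E0 - c)*s + KM1*c,  dc/dt = K1*(E0 - c)*s - (KM1 + K2)*c.
    In the article's setting this matrix would be estimated from data."""
    return [
        [-K1 * (E0 - c), K1 * s + KM1],
        [K1 * (E0 - c), -(K1 * s + KM1 + K2)],
    ]


def time_scales(s, c):
    """Return (tau_fast, tau_slow) as inverse eigenvalue magnitudes."""
    (a, b), (g, d) = jacobian(s, c)
    trace = a + d
    det = a * d - b * g
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    lam_fast = (trace - disc) / 2.0   # largest-magnitude eigenvalue
    lam_slow = det / lam_fast         # product of eigenvalues equals det
    tau_fast = 1.0 / abs(lam_fast)
    tau_slow = math.inf if lam_slow == 0 else 1.0 / abs(lam_slow)
    return tau_fast, tau_slow


if __name__ == "__main__":
    for s, c in [(1.0, 0.0), (0.5, 0.002), (0.1, 0.0005)]:
        tau_fast, tau_slow = time_scales(s, c)
        print(f"s={s}, c={c}: tau_fast={tau_fast:.3g}, "
              f"tau_slow={tau_slow:.3g}, gap={tau_slow / tau_fast:.3g}")
```

With these assumed parameters the two time scales differ by orders of magnitude, which is the situation in which a single fit over the whole dataset can struggle. The slow eigenvalue is obtained from the determinant rather than by subtraction to avoid cancellation when the gap is large.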
Author summary
Biomedical systems often evolve across multiple time scales, posing major challenges for constructing accurate models directly from data. Traditional model reduction techniques require explicit equations and thus cannot be applied when only observational data are available. To address this, we developed a data-driven framework that combines Sparse Identification of Nonlinear Dynamics (SINDy), Computational Singular Perturbation (CSP) and neural networks (NNs). Our approach automatically partitions a dataset into subsets characterized by similar dynamics, allowing valid reduced models to be identified in each region. When SINDy fails to recover a global model from the full dataset, CSP, leveraging Jacobian estimates from NNs, successfully isolates dynamical regimes where SINDy can be applied locally. We validated this framework using the Michaelis-Menten biochemical model, which is known to admit multiple reduced models in different regions of the phase space. Our method consistently identified the appropriate reduced dynamics, even when the data originated from stochastic simulations. Because our approach is algorithmic and equation-free, it is scalable to high-dimensional systems and robust to noise, offering a promising solution for data-driven model discovery in complex biomedical systems.
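The sparse regression step that SINDy relies on can be sketched with sequentially thresholded least squares over a small candidate library. This is a simplified illustration, not the article's pipeline: the rate constants are hypothetical, the candidate library `[1, s, c, s*c]` and the threshold are assumptions, and the derivatives are evaluated exactly from an assumed Michaelis-Menten vector field on a grid of states, whereas in practice they would be estimated from time series.

```python
# Hypothetical parameters (not taken from the article).
K1, KM1, K2, E0 = 1.0, 1.0, 1.0, 0.5


def vector_field(s, c):
    """Assumed full Michaelis-Menten dynamics used to generate sample data."""
    ds = -K1 * (E0 - c) * s + KM1 * c
    dc = K1 * (E0 - c) * s - (KM1 + K2) * c
    return ds, dc


def library(s, c):
    """Candidate terms [1, s, c, s*c]; the choice is an assumption."""
    return [1.0, s, c, s * c]


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def least_squares(theta, y, active):
    """Least squares on the active columns via the normal equations."""
    A = [[sum(r[i] * r[j] for r in theta) for j in active] for i in active]
    b = [sum(r[i] * yi for r, yi in zip(theta, y)) for i in active]
    coef = [0.0] * len(theta[0])
    for i, v in zip(active, solve_linear(A, b)):
        coef[i] = v
    return coef


def stlsq(theta, y, threshold=0.05, iters=10):
    """Sequentially thresholded least squares: fit, drop small terms, refit."""
    active = list(range(len(theta[0])))
    coef = least_squares(theta, y, active)
    for _ in range(iters):
        kept = [i for i in active if abs(coef[i]) >= threshold]
        if kept == active:
            break
        if not kept:
            return [0.0] * len(theta[0])
        active = kept
        coef = least_squares(theta, y, active)
    return coef


# Sample states on a grid (a stand-in for observational data).
states = [(0.1 * i, 0.05 * j) for i in range(1, 11) for j in range(1, 10)]
theta = [library(s, c) for s, c in states]
coef_ds = stlsq(theta, [vector_field(s, c)[0] for s, c in states])
coef_dc = stlsq(theta, [vector_field(s, c)[1] for s, c in states])
print("ds/dt coefficients for [1, s, c, s*c]:", coef_ds)
print("dc/dt coefficients for [1, s, c, s*c]:", coef_dc)
```

In this idealized setting the data are noise-free and the true dynamics lie inside the library, so the regression is well posed. The framework described above targets the harder case, in which a fit over the full dataset does not yield the proper model and the data are first split into subsets of similar dynamics.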