Multiscale Probabilistic Modeling: A Bayesian Approach to Augment Mechanistic Models of Cell Signaling with Machine-Learning Predictions of Binding Affinity

Abstract

Computational models in systems biology are often underdetermined: there is little data relative to the size and complexity of the model. This lack of data stems primarily from limits in our ability to observe specific biological systems, and it restricts the utility of computational models. However, the number of experimental databases in biology is growing. While these databases provide more observations, the observations often do not match the system of interest exactly. For example, database measurements might be collected under different experimental conditions or on a different scale than the system of interest. Here, we investigate what information can be gleaned by generalizing databases across these differences in the context of modeling a specific system: cell signaling. Ultimately, our goal is to better determine models of specific systems, thereby increasing their utility. To do this, we propose a novel, multiscale, probabilistic framework. We use this framework to integrate measurements of protein structure from the Protein Data Bank and measurements of amino acid sequence from the Universal Protein Resource into the parameter inference of cell signaling models. Then, we quantify exactly what information is gained from these measurements when modeling cell signaling. We chose dynamic cell signaling models as the setting for investigating the utility of these databases because experimental measurements of the variables of interest, protein dynamics, are still quite limited. We find that we can successfully integrate measurements from these databases to significantly improve parameter estimation of signaling models. The impact of sequence and structure measurements on model predictions depends on the sensitivity of the prediction to perturbations in the parameter values.
Overall, this study demonstrates that measurements of protein structure and amino acid sequence can be leveraged to better inform parameters in models of cell signaling.
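
The general idea of letting a binding-affinity prediction inform parameter inference can be illustrated with a minimal sketch. This is not the authors' framework: it assumes a toy equilibrium binding model (fraction bound = L / (L + Kd)), a single unknown parameter, a Gaussian prior on log10(Kd) standing in for a machine-learning prediction, and a grid approximation of the posterior. All numbers (measurements, noise level, prior mean and width) are hypothetical and chosen only for illustration.

```python
import math

def fraction_bound(ligand, kd):
    """Toy equilibrium binding model (an assumption, not the paper's model)."""
    return ligand / (ligand + kd)

def log_likelihood(log10_kd, ligand_conc, observed, noise_sd):
    """Gaussian log-likelihood of sparse measurements given log10(Kd)."""
    kd = 10.0 ** log10_kd
    total = 0.0
    for ligand, y in zip(ligand_conc, observed):
        resid = y - fraction_bound(ligand, kd)
        total += -0.5 * (resid / noise_sd) ** 2
    return total

def log_gaussian_prior(log10_kd, mean, sd):
    """Unnormalized Gaussian log-prior on log10(Kd)."""
    return -0.5 * ((log10_kd - mean) / sd) ** 2

def grid_posterior(grid, log_post_fn):
    """Normalize exp(log posterior) over a uniform grid."""
    logs = [log_post_fn(x) for x in grid]
    peak = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logs]
    norm = sum(weights)
    return [w / norm for w in weights]

def mean_and_sd(grid, probs):
    """Posterior mean and standard deviation on the grid."""
    mean = sum(x * p for x, p in zip(grid, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(grid, probs))
    return mean, math.sqrt(var)

# Hypothetical sparse data: two noisy fraction-bound measurements.
ligand_conc = [0.1, 1.0]
observed = [0.12, 0.45]
noise_sd = 0.15

# Hypothetical ML prediction of log10(Kd) with its uncertainty.
ml_mean, ml_sd = 0.0, 0.5

# Uniform grid over log10(Kd) from -3 to 3.
grid = [-3.0 + 6.0 * i / 600 for i in range(601)]

# Posterior with a flat prior on the grid (data only).
flat = grid_posterior(
    grid, lambda x: log_likelihood(x, ligand_conc, observed, noise_sd)
)

# Posterior with the ML-informed prior (data plus prediction).
informed = grid_posterior(
    grid,
    lambda x: log_likelihood(x, ligand_conc, observed, noise_sd)
    + log_gaussian_prior(x, ml_mean, ml_sd),
)

flat_mean, flat_sd = mean_and_sd(grid, flat)
informed_mean, informed_sd = mean_and_sd(grid, informed)
print(f"flat prior:     mean={flat_mean:.2f}, sd={flat_sd:.2f}")
print(f"informed prior: mean={informed_mean:.2f}, sd={informed_sd:.2f}")
```

In this toy setting the informed posterior is narrower than the data-only posterior, which is the sense in which an affinity prediction can help determine a parameter when direct measurements are sparse. How much it helps in practice depends on how accurate and well calibrated the prediction is.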

Author Summary

Computational models of cell signaling have provided mechanistic insights into complex biological systems in both physiological and disease settings. Accurate, predictive modeling depends critically on precise estimation of model parameters, which is often hindered by the limited availability of experimental data. In this study, we present a novel multiscale probabilistic inference framework that broadens the range of data types that can be leveraged to estimate the parameters of cell signaling models. The framework integrates a machine learning pipeline with a generalizable parameter inference approach, enabling the use of experimental data across scales. Specifically, we demonstrate that incorporating protein amino acid sequence and 3D structural data enhances parameter estimation compared with using traditional measurements alone, such as protein concentrations over time. Improving parameter estimation increases the robustness and applicability of cell signaling models. Ultimately, our framework facilitates the use of a broader range of data and supports the development of predictive computational models that increase our understanding of cell signaling.
