Bayesian inference of kinetic schemes for ion channels by Kalman filtering

Curation statements for this article:
  • Curated by eLife

    Summary: The manuscript is well written and overall clear, and the mathematical treatment is a rigorous tour-de-force. However, the reviewers raised a number of points that need further clarification, better discussion or amendment. These concerns are likely to be addressable largely by changes to the main text and software documentation, along with some additional analyses. The study is very nice and ambitious, but clarity is a bit impaired by dealing with perhaps too many issues. The state inference and the Bayesian model selection are very important but completely different issues. The authors should consider whether they may be better treated separately, or for a more specialized audience.

Abstract

Inferring adequate kinetic schemes for ion channel gating from ensemble currents is a daunting task due to limited information in the data. We address this problem by using a parallelized Bayesian filter to specify hidden Markov models for current and fluorescence data. We demonstrate the flexibility of this algorithm by including different noise distributions. Our generalized Kalman filter outperforms both a classical Kalman filter and a rate equation approach when applied to patch-clamp data exhibiting realistic open-channel noise. The derived generalization also enables inclusion of orthogonal fluorescence data, making unidentifiable parameters identifiable and increasing the accuracy of the parameter estimates by an order of magnitude. By using Bayesian highest credibility volumes, we found that our approach, in contrast to the rate equation approach, yields a realistic uncertainty quantification. Furthermore, the Bayesian filter delivers negligibly biased estimates for a wider range of data quality. For some data sets, it identifies more parameters than the rate equation approach. These results also demonstrate the power of assessing the validity of algorithms by Bayesian credibility volumes in general. Finally, we show that our Bayesian filter is more robust against errors induced by either analog filtering before analog-to-digital conversion or by limited time resolution of fluorescence data than a rate equation approach.

Article activity feed

  1. Reviewer #3:

    The work by Münch et al addresses an important problem of modeling data that originates from multiple channels (100s-1000s) by establishing a Bayesian inference-based framework to extend an existing Kalman filter-based method. They convincingly demonstrate that their approach is much more accurate at quantifying channels than previous approaches, and is impressively able to combine multiple experimental modalities. Most importantly, as a Bayesian method, this approach allows the incorporation of prior information such as the diffusion limit or previous experiments, and also allows one to perform model selection to select the best kinetic model of the data (although this aspect is less developed). In particular, the Bayesian approach of this work is an important advance in the field.

    1. The manuscript needs line editing and proofreading (e.g., on line 494, "Roa" should be "Rao"; missing an equals sign in equation 13). Additionally, in many paragraphs, several of the sentences are tangential and distract from communicating the message of the paper (e.g., line 55). Removing them will help to streamline the text, which is quite long.

    2. Even more emphasis should be placed in the main text on the approximation of n(t) as being distributed according to a multivariate normal, and thus being continuous. To my understanding, this limits the applicability of the method to data with more than roughly hundreds of channels, although, as far as I could find, this point is not investigated. In Fig. 3, it seems the method is only benchmarked down to a lower limit of ~500 channels. Although an investigation of performance below that point would be interesting, it is only necessary to discuss the approximate lower-bound cutoff.
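    A minimal, hypothetical sketch (not from the manuscript) of the point above: for a two-state channel the exact count distribution is binomial, and its mismatch with the moment-matched Gaussian shrinks as the channel number N grows. Function and variable names are illustrative.

    ```python
    # Total variation distance between the exact Binomial(N, p_open) count
    # distribution and its moment-matched Gaussian approximation.
    import numpy as np
    from scipy import stats

    def tv_distance(N, p_open):
        k = np.arange(N + 1)
        binom_pmf = stats.binom.pmf(k, N, p_open)
        mu, sd = N * p_open, np.sqrt(N * p_open * (1 - p_open))
        # Integrate the normal density over unit-width bins centred on each k.
        gauss_pmf = stats.norm.cdf(k + 0.5, mu, sd) - stats.norm.cdf(k - 0.5, mu, sd)
        return 0.5 * np.abs(binom_pmf - gauss_pmf).sum()

    for N in (10, 100, 1000, 10000):
        print(f"N = {N:6d}: TV distance = {tv_distance(N, 0.2):.4f}")
    ```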

    3. The methods section should include information concerning the parameter initialization choices, the HMC settings (e.g., number of steps), and any burn-in period used in the analyses of Figs. 3-6.

    4. In the section on priors, the entire part concerning the use of a beta distribution should be removed or replaced, because it is a probabilistic misrepresentation of the actual prior information that the authors claim to have in the manuscript text. The max-entropy prior derived for the situation described in the text (i.e., an unknown magnitude where you don't know any moments but do have upper and lower bounds; the latter could come from the length of the experiment) is actually P(x) = (ln(x_{max}) - ln(x_{min}))^{-1} * x^{-1}. I'm happy to discuss more with the authors.
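    A brief, hypothetical illustration of the prior the reviewer proposes: it is simply uniform in ln x, restricted to [x_min, x_max]. The bounds and names below are placeholders.

    ```python
    import numpy as np

    def log_uniform_pdf(x, x_min, x_max):
        # P(x) = 1 / (x * (ln(x_max) - ln(x_min))) on [x_min, x_max], else 0.
        norm = np.log(x_max) - np.log(x_min)
        return np.where((x >= x_min) & (x <= x_max), 1.0 / (x * norm), 0.0)

    def log_uniform_sample(size, x_min, x_max, rng=None):
        # Sampling uniformly in log space yields the 1/x density in linear space.
        rng = np.random.default_rng() if rng is None else rng
        return np.exp(rng.uniform(np.log(x_min), np.log(x_max), size))

    # Numerical check that the density integrates to 1 over its support.
    x = np.geomspace(1e-2, 1e4, 1_000_000)
    pdf = log_uniform_pdf(x, 1e-2, 1e4)
    print(np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x)))  # ~1.0
    ```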

    5. Achieving the ability to rigorously perform model selection is a very impressive aspect of this work and a large contribution to the field. However, the manuscript offers too many solutions for performing that model selection, along with a long discussion of the field (for instance, lines 376-395 could be completely cut). Since probabilistic model selection is an entire area of study by itself, the authors do not need to present underdeveloped investigations of each of them in a paper on modeling channel data (e.g., of course WAIC outperforms AIC. Why not cover BIC and WBIC?). The authors should pick one, and perhaps write a second paper on the others instead of presenting non-rigorous comparisons (e.g., one kinetic scheme and set of parameters). As a side note, it is strange that the authors did not consider obtaining evidences or Bayes factors to directly perform Bayesian model selection - for instance, they could have used thermodynamic integration since they used MC to obtain posteriors anyway (cf. Computing Bayes Factors Using Thermodynamic Integration by Lartillot and Philippe, Systematic Biology, 2006, 55(2), 195-207. DOI: 10.1080/10635150500433722).
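    To make the suggestion concrete, here is a minimal, hypothetical sketch of thermodynamic integration on a toy conjugate-Gaussian model where the exact evidence is available for comparison; in the channel-modeling setting, the power posteriors would instead be sampled with the authors' HMC machinery. All names and values are illustrative.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sigma, tau = 1.0, 3.0                       # known noise SD and prior SD
    y = rng.normal(0.7, sigma, size=50)         # toy data set

    def log_likelihood(theta, y):
        # Log-likelihood of the data for each sampled theta (vectorized).
        return stats.norm.logpdf(y[:, None], theta[None, :], sigma).sum(axis=0)

    def sample_power_posterior(beta, n_samples):
        # p_beta(theta) is proportional to p(y|theta)^beta * p(theta);
        # Gaussian here, so it can be sampled exactly (MCMC in general).
        prec = beta * len(y) / sigma**2 + 1.0 / tau**2
        mean = beta * y.sum() / sigma**2 / prec
        return rng.normal(mean, 1.0 / np.sqrt(prec), size=n_samples)

    # Temperature ladder concentrated near beta = 0, as commonly recommended.
    betas = np.linspace(0.0, 1.0, 32) ** 5
    expect = np.array([log_likelihood(sample_power_posterior(b, 5000), y).mean()
                       for b in betas])
    log_Z_ti = np.sum(0.5 * (expect[1:] + expect[:-1]) * np.diff(betas))

    # Exact log evidence: y is jointly Gaussian with covariance sigma^2 I + tau^2 11^T.
    n = len(y)
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    log_Z_exact = stats.multivariate_normal.logpdf(y, mean=np.zeros(n), cov=cov)
    print(f"thermodynamic integration: {log_Z_ti:.2f}   exact: {log_Z_exact:.2f}")
    ```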

  2. Reviewer #2:

    Extracting ion channel kinetic models from experimental data is an important and perennial problem. Much work has been done over the years by different groups, with theoretical frameworks and computational algorithms developed for specific combinations of data and experimental paradigms, from single channels to real-time approaches in live neurons. At one extreme of the data spectrum, single channel currents are traditionally analyzed by maximum likelihood fitting of dwell time probability distributions; at the other extreme, macroscopic currents are typically analyzed by fitting the average current and other extracted features, such as activation curves. Robust analysis packages exist (e.g., HJCFIT, QuB), and they have been put to good use in the literature.

    Münch et al focus here on several areas that need improvement: dealing with macroscopic recordings containing relatively low numbers of channels (i.e., hundreds to tens of thousands), combining multiple types of data (e.g., electrical and optical signals), incorporating prior information, and selecting models. The main idea is to approach the data with a predictor-corrector type of algorithm, implemented via a Kalman filter that approximates the discrete-state process (a meta-Markov model of the ensemble of active channels in the preparation) with a continuous-state process that can be handled efficiently within a Bayesian estimation framework, which is also used for parameter estimation and model selection.

    With this approach, one doesn't fit the macroscopic current against a predicted deterministic curve, but rather infers - point by point - the ensemble state trajectory given the data and a set of parameters, themselves treated as random variables. This approach, which originated in the signal processing literature as the Forward-Backward procedure (and the related Baum-Welch algorithm), has been applied since the early 90s to single channel recordings (e.g., Chung et al, 1990), and later has been extended to macroscopic data, in a breakthrough study by Moffatt (2007). In this respect, the study by Münch et al is not necessarily a conceptual leap forward. However, their work strengthens the existing mathematical formalism of state inference for macroscopic ion channel data, and embeds it very nicely in a rigorous Bayesian estimation framework.
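    For orientation, a minimal, hypothetical sketch of the predictor-corrector idea described above (not the authors' implementation): the channel-count vector is tracked by its mean and covariance, propagated by the single-step transition matrix plus multinomial noise, and corrected against each current sample, broadly in the spirit of Moffatt (2007). The rate-matrix convention, parameter values, and names are illustrative.

    ```python
    import numpy as np
    from scipy.linalg import expm

    def kalman_loglik(y, Q, i_unit, N, dt, var_meas):
        """Innovation log-likelihood of a current trace y for an ensemble of N
        independent channels with rate matrix Q (Q[i, j] = rate from i to j)."""
        n_states = Q.shape[0]
        T = expm(Q * dt).T                   # T[:, j] = transition probabilities out of j
        # Equilibrium occupancy used as the initial ensemble distribution (assumption).
        w, v = np.linalg.eig(Q.T)
        p_eq = np.real(v[:, np.argmin(np.abs(w))])
        p_eq /= p_eq.sum()
        m = N * p_eq                                        # mean channel counts
        P = N * (np.diag(p_eq) - np.outer(p_eq, p_eq))      # multinomial covariance
        H = np.asarray(i_unit, float)                       # unitary current per state
        loglik = 0.0
        for yk in y:
            # Predict: propagate mean/covariance and add one-step multinomial noise.
            B = sum(m[j] * (np.diag(T[:, j]) - np.outer(T[:, j], T[:, j]))
                    for j in range(n_states))
            m, P = T @ m, T @ P @ T.T + B
            # Correct: standard Kalman update against the measured current sample.
            S = H @ P @ H + var_meas                        # innovation variance
            K = (P @ H) / S
            resid = yk - H @ m
            loglik += -0.5 * (np.log(2 * np.pi * S) + resid**2 / S)
            m, P = m + K * resid, P - np.outer(K, H @ P)
        return loglik

    # Example: two-state channel C <-> O (rates in 1/s, illustrative values).
    k_on, k_off = 100.0, 200.0
    Q = np.array([[-k_on, k_on], [k_off, -k_off]])
    # loglik = kalman_loglik(current_trace, Q, i_unit=[0.0, 1.0], N=1000, dt=1e-4, var_meas=4.0)
    ```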

    The main results are very convincing: basically, model parameters can be estimated with greater precision - as much as an order of magnitude better - relative to the traditional approach where the macroscopic data are treated as noisy but deterministic (but see my comments below). Estimate uncertainty can be further improved by incorporating prior information on parameters (e.g., diffusion limits), and by including other types of data, such as fluorescence. The manuscript is well written and overall clear, and the mathematical treatment is a rigorous tour-de-force.

    There are several issues that should be addressed by the authors, as listed below.

    1. I think packaging this study as a single manuscript for a broad audience is not optimal. First, the subject is very technical and complex, and the target audience is probably small. Second, the study is very nice and ambitious, but I think clarity is a bit impaired by dealing with perhaps too many issues. The state inference and the Bayesian model selection are very important but completely different issues that may be better treated separately, perhaps for a more specialized readership where they can be developed in more detail. Tutorial-style computational examples must be provided, along with well-commented and documented code. Interested readers should be able to implement the method described here in their own code or program.

    2. The authors should clearly discuss the types of data and experimental paradigms that can be optimally handled by this approach, and they must explain when and where it fails, cannot be applied, or becomes inefficient in comparison with other methods. One must be aware that ion channel data are very often subject to noise and artifacts that alter the structure of microscopic fluctuations. Thus, I would guess that the state inference algorithm would work optimally with low-noise, stable, patch-clamp recordings (and matching fluorescence recordings) in heterologous expression systems (e.g., HEK293 cells), where the currents are relatively small and only the channel of interest is expressed (macropatches?). I imagine it would not be effective with large currents that are recorded with low gain, are subject to finite series resistance, limited rise time, restricted bandwidth, and colored noise, are contaminated by other currents that are only (partially) eliminated with the P/n protocol (with the side effect of altering the noise structure), or are affected by 50/60 Hz power-line noise, baseline fluctuations, etc. This basically excludes some types of experimental data and experimental paradigms, such as recordings from neurons in brain slices or in vivo, oocytes, etc. Of course, artifacts can affect all estimation algorithms, but approaches based on fitting the predicted average current have the obvious benefit of averaging out some of these artifacts.

    The discussion in the manuscript is insufficient in this regard and must be expanded. Furthermore, I would like to see the method tested under non-ideal but commonly occurring conditions, such as limited bandwidth and in the presence of contaminating noise. For example, compare estimates obtained without filtering against estimates obtained with 2-3x over-filtering, with and without large measurement noise added (whole-cell recordings with low-gain feedback resistors and series resistance compensation are quite noisy), and with and without 50/60 Hz interference. How does the algorithm deal with limited bandwidth that distorts the noise spectrum? How are the estimated parameters affected? The reader will have to get a sense of how sensitive this method is to artifacts.
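    A small, hypothetical sketch of this kind of stress test: degrade a simulated current trace with over-filtering, extra measurement noise, and 50 Hz line interference before re-running the estimation. Filter order, cutoffs, and amplitudes below are illustrative.

    ```python
    import numpy as np
    from scipy.signal import bessel, lfilter

    def degrade(current, fs, f_cut, extra_noise_sd=0.0, line_amp=0.0, line_freq=50.0):
        """Low-pass filter the trace (4-pole Bessel, as in typical patch-clamp
        hardware), then add white noise and a power-line sinusoid."""
        b, a = bessel(4, f_cut / (fs / 2))
        t = np.arange(len(current)) / fs
        rng = np.random.default_rng(1)
        return (lfilter(b, a, current)
                + extra_noise_sd * rng.standard_normal(len(current))
                + line_amp * np.sin(2 * np.pi * line_freq * t))

    # Example: 10 kHz sampling; compare a lightly filtered trace with an
    # over-filtered, noisy, interference-contaminated one.
    fs = 10_000.0
    clean = np.random.default_rng(2).standard_normal(50_000)   # placeholder trace
    mild = degrade(clean, fs, f_cut=4_000.0)
    harsh = degrade(clean, fs, f_cut=1_000.0, extra_noise_sd=2.0, line_amp=5.0)
    ```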

    3. A better comparison with alternative parameter estimation approaches is necessary. First of all, explain more clearly what is different from the predictor-corrector formalism originally proposed by Moffatt (2007). The manuscript mentions that it expands on that, but exactly how? If it is only an incremental improvement, a more specialized audience is more appropriate.

    Second, the method proposed by Celentano and Hawkes, 2004, is not of the predictor-corrector type, but it utilizes the full covariance matrix between data values at different time points. It seems to me that the covariance matrix approach uses all the information contained in the macroscopic data and should be on par with the state inference approach. However, this method is only briefly mentioned here and then it's quickly dismissed as "impractical". I am not at all convinced that it's impractical. We all agree that it's a slower computation than, say, fitting exponentials, but so is the Kalman filter. Where do we draw the line of impracticability? Computational speed should be balanced against computational simplicity, estimation accuracy, and parameter and model identifiability. Moreover, that method was published in 2004, and the computational costs reported there should be projected to present-day computational power. I am not saying that the authors should code the C&H procedure and run it here, but they should at least give it credit and discuss its potential against the KF method.

    The only comparison provided in the manuscript is with the "rate equation" approach, by which the authors understand the family of methods that fit the data against a predicted average trajectory. In principle, this comparison is sufficient, but there are some issues with the way it's done.

    Table 3 compares different features of their state inference algorithm and the "rate equation fitting", referencing Milescu et al, 2005. However, there seems to be a misunderstanding: the algorithm presented in that paper does in fact predict and use not only the average but also - optionally - the variance of the current, as contributed by stochastic state fluctuations and measurement noise. These quantities are predicted at any point in time as a function of the initial state, which is calculated from the experimental conditions. In contrast, the KF calculates the average and variance at one point in time as a projection of the average and variance at the previous point. However, both methods (can) compare the data value against a predicted probability distribution. The Kalman filter can produce more precise estimates, but presumably at the cost of more complex and slower computation, and increased sensitivity to data artifacts.

    Fig. 3 is very informative in this sense, showing that estimates obtained with the state inference (KF) algorithm are about 10 times more precise than those obtained with the "rate equation" approach. However, for this test, the "rate equation" method was allowed to use only the average, not the variance.

    Considering this, the comparison made in Fig 3 should be redone against a "rate equation" method that utilizes not only the expected average but also the expected variance to fit the data, as in Milescu et al, 2005. Calculating this variance is trivial and the authors should be able to implement it easily (and I'll be happy to provide feedback). The comparison should include calculation times, as well as convergence.
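    A minimal, hypothetical sketch of this kind of "mean plus variance" prediction (in the spirit of Milescu et al, 2005, not taken from that paper): state probabilities are propagated deterministically from the initial conditions, and each data point is scored against a Gaussian whose mean and variance follow from the occupancies, the channel count, and the measurement noise. Names and conventions are illustrative.

    ```python
    import numpy as np
    from scipy.linalg import expm

    def predicted_mean_and_variance(Q, p0, i_unit, N, dt, n_samples, var_meas):
        """Deterministic prediction of the macroscopic current's mean and variance
        for N independent channels with rate matrix Q (Q[i, j] = rate i -> j)."""
        T = expm(Q * dt)                          # row-stochastic transition matrix
        i_unit = np.asarray(i_unit, float)
        p = np.asarray(p0, float).copy()
        means, variances = np.empty(n_samples), np.empty(n_samples)
        for k in range(n_samples):
            p = p @ T
            mean_1 = i_unit @ p                   # single-channel mean current
            var_1 = (i_unit**2) @ p - mean_1**2   # single-channel current variance
            means[k] = N * mean_1
            variances[k] = N * var_1 + var_meas
        return means, variances

    def gaussian_loglik(y, means, variances):
        # Score each sample against its own predicted Gaussian, not just the mean.
        return -0.5 * np.sum(np.log(2 * np.pi * variances) + (y - means)**2 / variances)
    ```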

    4. As shown in Milescu et al, 2005, fitting macroscopic currents is asymptotically unbiased. In other words, the estimates are accurate, unless the number of channels is small (tens or hundreds), in which case the multinomial distribution is not very well approximated by a Gaussian. What about the predictor-corrector method? How accurate are the estimates, particularly at low channel counts (10 or 100)? Since the Kalman filter also uses a Gaussian to approximate the multinomial distribution of state fluctuations, I would also expect asymptotic accuracy. Parameter accuracy should be tested, not just precision.

    5. The manuscript nicely points out that a "rate equation" approach would need 10 times more channels (N) to attain the same parameter precision as with the Kalman filter, when the number of channels is in the approximate range of 10^2 ... 10^4. With larger N, the two methods become comparable in this respect.

    This is very important, because it means that estimate precision increases with N regardless of the method, which also means that one should try to optimize the experimental approach to maximize the number of channels in the preparation. However, I would like to point out that one could simply repeat the recording protocol 10 times (in the same cell or across cells) to accumulate 10 times more channels, and then use a "rate equation" algorithm to obtain estimates that are just as good. Presumably, the "rate equation" calculation is significantly faster than the Kalman filter (particularly when one fits "features", such as activation curves), and repeating a recording may only add seconds or minutes of experiment time, compared to a comprehensive data analysis that likely involves hours and perhaps days. Although obvious, this point can easily be missed by the casual reader, so it would be useful to mention it in the manuscript.

    6. Another misunderstanding is that current normalization is mandatory with "rate equation" algorithms. This is really not the case, as shown in Milescu et al, 2005, where it is demonstrated clearly that one can explicitly use the channel count and unitary current to predict the observed macroscopic data. Consequently, these quantities can also be estimated, but the state variance must be included in the calculation. Without variance, one can only estimate the product i x N, where i is the unitary current and N is the channel count. This should be clarified in the manuscript: any method that uses variance can be used to estimate i and N, not just the Kalman filter. In fact, non-stationary noise analysis does exactly that: a model-blind estimation of N and i from non-equilibrium data. Also, one should be realistic here: in some circumstances it is far more efficient to fit data "features", such as the activation curve, in which case the current needs to be normalized.
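    A short, hypothetical sketch of the classical non-stationary noise analysis mentioned above: across repeated sweeps, the ensemble variance is a parabolic function of the mean current, var(I) = i*I - I^2/N + background, from which the unitary current i and channel count N follow without any kinetic model. Array names are illustrative.

    ```python
    import numpy as np

    def noise_analysis(sweeps):
        """sweeps: array of shape (n_sweeps, n_samples) of repeated responses."""
        mean_I = sweeps.mean(axis=0)
        var_I = sweeps.var(axis=0, ddof=1)
        # Least-squares fit of var = a*mean + b*mean^2 + c, with i = a and N = -1/b.
        A = np.column_stack([mean_I, mean_I**2, np.ones_like(mean_I)])
        (a, b, c), *_ = np.linalg.lstsq(A, var_I, rcond=None)
        return {"i_unit": a, "N": -1.0 / b, "background_var": c}

    # e.g. estimates = noise_analysis(repeated_sweeps)   # repeated_sweeps: (n_sweeps, n_samples)
    ```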

    7. I think it's great that the authors develop a rigorous Bayesian formalism here, but I think it would be a good idea to explain - even briefly - how to implement a (presumably simpler) maximum likelihood version that uses the Kalman filter. This should satisfy those readers who are less interested in the Bayesian approach, and will also be suitable for situations when no prior information is available.
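    A minimal, hypothetical sketch of such a maximum-likelihood variant: maximize the Kalman-filter innovation log-likelihood over the rate constants instead of sampling a posterior. It reuses the kalman_loglik sketch shown earlier; the two-state parameterization and starting values are illustrative.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def neg_loglik(log_rates, current_trace, i_unit, N, dt, var_meas):
        # Optimize in log space so the rate constants stay positive.
        k_on, k_off = np.exp(log_rates)
        Q = np.array([[-k_on, k_on], [k_off, -k_off]])
        # kalman_loglik is the hypothetical sketch defined earlier in this feed.
        return -kalman_loglik(current_trace, Q, i_unit, N, dt, var_meas)

    # Example call (current_trace is a measured or simulated current vector):
    # fit = minimize(neg_loglik, x0=np.log([50.0, 50.0]),
    #                args=(current_trace, [0.0, 1.0], 1000, 1e-4, 4.0),
    #                method="Nelder-Mead")
    # k_on_hat, k_off_hat = np.exp(fit.x)
    ```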

    8. The Bayesian formalism is not the only way of incorporating prior knowledge into an estimation algorithm. In fact, it seems to me that the reader would have more practical and pressing problems than guessing what the parameter prior distribution should be, whether uniform, Gaussian, or other. More likely, one would want to enforce a certain KD, microscopic (ir)reversibility, an (in)equality relationship between parameters, a minimum or maximum rate constant value, or complex model properties and behaviors, such as maximum Popen or half-activation voltage. A comprehensive framework for handling these situations via parameter constraints (linear or non-linear) and cost function penalties has recently been published (Salari et al/Navarro et al, 2018). Obviously, the Bayesian approach has merit, but the authors should discuss how it can better handle the types of practical problems presented in those papers, if it is to be considered an advance in the field, or at least a usable alternative.
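    A tiny, hypothetical sketch of the constraint/penalty alternative mentioned above (not taken from Salari et al/Navarro et al, 2018): prior knowledge of, say, an equilibrium constant KD = k_off/k_on is encoded as a quadratic penalty added to whatever data-misfit cost is being minimized. The weight and target are illustrative.

    ```python
    import numpy as np

    def penalized_cost(log_rates, data_cost, kd_target, weight=10.0):
        """data_cost: callable returning the data-misfit term for these rates."""
        k_on, k_off = np.exp(log_rates)
        # Penalize deviation of the implied KD from its target on a log scale.
        penalty = weight * (np.log(k_off / k_on) - np.log(kd_target))**2
        return data_cost(log_rates) + penalty
    ```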

    9. Discuss the practical aspects of optimization. For example, how is convergence established? How many iterations does it take to reach convergence? How long does it take to run? How does it scale with the data length, channel count, and model state count? How long does it take to optimize a large model (e.g., 10 or 20 states)? Provide some comparison with the "rate equation" method.

    10. Here and there, the manuscript somehow gives the impression that existing algorithms that extract kinetic parameters by fitting the average macroscopic current ("fitting rate equations") are less "correct", or ignorant of the true mathematical description of the data. This is not the case. Published algorithms that I know of clearly state what data they apply to, what their limitations are, and what approximations were made, and thus they are correct within that defined context and are meant to be more effective than alternatives. Some quick editing throughout the manuscript should eliminate this impression.

    11. The manuscript refers to the method where the data are fitted against a predicted current as "rate equations". I don't actually understand what that means. The rate equation is something intrinsic to the model, not a feature of any algorithm. An alternative terminology must be found. Perhaps different algorithms could be classified based on what statistical properties are used and how, e.g., average (+ variance) predicted from the starting probabilities (Milescu et al, 2005), full covariance (Celentano and Hawkes, 2004), point-by-point predictor-corrector (Moffatt, 2007).

  3. Reviewer #1:

    The authors develop a Bayesian approach to modeling macroscopic signals arising from ensembles of individual units described by a Markov process, such as a collection of ion channels. Their approach utilizes a Kalman filter to account for temporal correlations in the bulk signal. For simulated data from a simple ion channel model where ligand binding drives pore opening, they show that their approach enhances parameter identifiability over an existing approach based on fitting average current responses. Furthermore, the approach can include simultaneous measurement of multiple signals (e.g. current and fluorescence) which further increases parameter identifiability. They also show how appropriate choice of priors can help model and parameter identification.

    The application of Bayesian approaches to kinetic modeling has recently become popular in the ion channel community. The need for approaches that inform on parameter distributions and their identifiability, as well as allow model selection, is unquestioned. Also, it is ideal to use as much information in the experimental data as possible, including temporal correlations. As such, the authors’ addition is a valuable contribution.

    Comments:

    I note that my comments are restricted largely to the results rather than the mathematical derivation of the authors' approach.

    1. I understand that this is somewhat secondary to the paper's intellectual contribution. However, one thing that would be enormously useful is accompanying software usable by others. The supplied code is not well commented, and it is unclear whether it is applicable beyond the specific models examined in the paper. It was supplied as .txt files, but looks like C code. I did not spend the time to get it working, so an accompanying GitHub page or some such with detailed instructions for how to apply this approach for one's own model of interest would make this contribution infinitely better. Even better if there was a GUI, although easily adaptable code is of primary importance.

    2. What are the temporal resolutions of the current and fluorescence simulations shown in Fig 1? I assume that they are the same. However, most current recordings have much higher temporal resolution than fluorescence recordings. If you were to reduce the sample rate of the binding fluorescence relative to the current simulations to something experimentally reasonable, how would the resulting time averaging of the binding signal impact its enhancement of parameter identifiability?
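    A brief, hypothetical sketch of this test: emulate a slower fluorescence acquisition by block-averaging the simulated binding signal over camera exposure windows before the joint fit. The decimation factor and names are illustrative.

    ```python
    import numpy as np

    def block_average(signal, factor):
        """Average consecutive blocks of `factor` samples, mimicking the time
        averaging of a slow fluorescence detector relative to the current."""
        n = len(signal) // factor * factor
        return signal[:n].reshape(-1, factor).mean(axis=1)

    # Example: current sampled at 10 kHz, fluorescence effectively at 100 Hz.
    # fluor_slow = block_average(fluor_fast, factor=100)
    ```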

    3. For comparison, it would also be nice to see how the addition of the binding signal to the data helps the RE approach. That is, is the addition of the binding signal more important than the choice of RE vs. KF, or is the optimization method still an important factor in terms of correctly identifying the model's rate constants or selecting the true model?

    4. Fig 7: For PC data, why does the RE model BC appear to be better than the KF model BC if the KF model does a better job at estimating the parameters and setting non-true rates to zero? Doesn't this suggest that RE with cross-validation is better than the proposed KF approach? In terms of parameter estimates (i.e., as shown in Fig. 3), how does RE + BC stack up?
