Information, certainty, and learning


Abstract

More than four decades ago, Gibbon and Balsam (1981) showed that the acquisition of Pavlovian conditioning in pigeons is directly related to the informativeness of the conditioned stimulus (CS) about the unconditioned stimulus (US), where informativeness is defined as the ratio of the US-US interval (C) to the CS-US interval (T). However, the evidence for this relationship in other species has been equivocal. Here, we describe an experiment that measured the acquisition of appetitive Pavlovian conditioning in 14 groups of rats trained with different C/T ratios (ranging from 1.5 to 300) to establish how learning is related to informativeness. We show that the number of trials required for rats to start responding to the CS is determined by the C/T ratio and, remarkably, the specific scalar relationship between the rate of learning and informativeness aligns very closely with that previously obtained with pigeons. We also found that the response rate after extended conditioning is strongly related to T, with the terminal CS response rate being a scalar function of the CS reinforcement rate (1/T). Moreover, this same scalar relationship extended to the rats’ response rates during the (never-reinforced) inter-trial interval, which were directly proportional to the contextual rate of reinforcement (1/C). The findings establish that animals encode rates of reinforcement, and that conditioning is directly related to how much information the CS provides about the US. The consistency of the data across species, captured by a simple regression function, suggests a universal model of conditioning.

Article activity feed

  1. eLife Assessment

    This important paper shows that the acquisition and expression of Pavlovian conditioned responding are lawfully related to temporal characteristics of an animal's conditioning experience. It showcases a rigorous experimental design, several different approaches to data analysis, careful consideration of prior literature, and a thorough introduction. The evidence supporting the conclusions is strong and convincing. The paper will have a general appeal to those interested in the behavioral and neural analysis of Pavlovian conditioning.

  2. Joint Public Review:

    The subject area will have general appeal to those interested in the study of Pavlovian conditioning. The paper is important, showcasing a rigorous experimental design, several different approaches to data analysis, careful consideration of prior literature, and a thorough introduction. The results indicate that the rate of Pavlovian learning is determined by the ratio of the reward rate during the cue to the overall reward rate, and that the asymptotic response rate is determined by the reward rate during the cue. These findings provide context to many conflicting recent results on this topic and are supported by strong and convincing evidence.

    It is additionally claimed that the parameter that governs the acquisition and asymptote of responding in rats is exactly the same as that which governs the acquisition and asymptote of responding in the Gibbon and Balsam (1981) study that used pigeons as experimental subjects; and that the rates of responding during the inter-trial interval and the cue are proportional to the corresponding reward rates with the same proportionality constant. In both of these respects, there are several points that stand in need of clarification - at present, the strength of the evidence in support of these claims is solid. More generally, there are some points that could clarify aspects of rate estimation theory and, thereby, increase the rating of the paper from important to fundamental. These points range from analytical to conceptual and are presented below.

    ANALYTICAL

    (1) A key claim made here is that the same relationship (including the same parameter) describes data from pigeons by Gibbon and Balsam (1981; Figure 1) and the rats in this study (Figure 3). The evidence for this claim, as presented here, is not as strong as it could be. This is because the measure used for identifying trials to criterion in Figure 1 appears to differ from any of the criteria used in Figure 3, and the exact measure used for identifying trials to criterion influences the interpretation of Figure 3***. To make the claim that the quantitative relationship is one and the same in the Gibbon-Balsam and present datasets, one would need to use the same measure of learning on both datasets and show that the resultant plots are statistically indistinguishable, rather than simply plotting the dots from both data sets and spotlighting their visual similarity. In terms of their visual characteristics, it is worth noting that the plots are on log-log axes and, as such, slight visual changes can mean a big difference in actual numbers. For instance, between Figures 3B and 3C, the highest information group moves up only "slightly" on the y-axis but the difference is a factor of 5 in the real numbers. Thus, in order to support the strong claim that the quantitative relationships obtained in the Gibbon-Balsam and present datasets are identical, a more rigorous approach is needed for the comparisons.

    ***The measure of acquisition in Figure 3A is based on a previously established metric, whereas the measure in Figure 3B employs the relatively novel nDKL measure that is argued to be a better and theoretically grounded metric. Surprisingly, when r and r2 values are converted to the same metric across analyses, it appears that this new metric (Figure 3B) does well but not as well as the approach in Figure 3A. This raises questions about why a theoretically derived measure might not perform as well in this analysis, and whether the more effective measure is more reliable or taps into some aspect of the processes underlying acquisition that is not accounted for by the nDKL metric.

    (2) Another interesting claim here is that the rates of responding during ITI and the cue are proportional to the corresponding reward rates with the same proportionality constant. This too requires more quantification and conceptual explanation. For quantification, it would be more convincing to calculate the regression slope for the ITI data and the cue data separately and then show that the corresponding slopes are not statistically distinguishable from each other. Conceptually, it is not clear why the data used to test the ITI proportionality came from the last 5 conditioning sessions. What were the decision criteria used to decide on averaging the final 5 sessions as terminal responses for the analyses in Figure 5? Was this based on consistency with previous work, or based on the greatest number of sessions where stable data for all animals could be extracted?

    If the model is that animals produce response rates during the ITI (a period with no possible rewards) based on the overall rate of rewards in the context, wouldn't it be better to test this before cue learning has occurred? Before cue learning, the animals would presumably only have attributed rewards in the context to the context and would thus produce overall response rates in proportion to the contextual reward rate. After cue learning, the animals could technically know that the rate of rewards during the ITI is zero. Why wouldn't it be better to test the plotted relationship for the ITI before cue learning has occurred? Further, based on Figure 1, it seems that the overall ITI response rate reduces considerably with cue learning. What is the expected ITI response rate prior to learning based on the authors' conceptual model? Why does this rate differ between pre- and post-cue learning? Finally, if the authors' conceptual framework predicts that the ITI response rate after cue learning should be proportional to the contextual reward rate, why should the cue response rate be proportional to the cue reward rate instead of the cue reward rate plus the contextual reward rate?

    (3) There is a disconnect between the gradual nature of learning shown in Figures 7 and 8 and the information-theoretic model proposed by the authors. To the extent that we understand the model, the animals should simply learn the association once the evidence crosses a threshold (nDKL > threshold) and then produce behavior in proportion to the expected reward rate. If so, why should there be a gradual component of learning as shown in these figures? In terms of the proportional response rule to the rate of rewards, why is it changing as animals go from 10% to 90% of peak response? The manuscript would be greatly strengthened if these results were explained within the authors' conceptual framework. If these results are not anticipated by the authors' conceptual framework, this should be explicitly stated in the manuscript.

    (4) Page 27, Procedure, final sentence: The measure of magazine responding during the ITI is defined over the 20 s period immediately before CS onset. The range of ITI values (Table 1) starts as low as 15 s in all 14 groups. Even on a trial where the preceding ITI was exactly 20 s, the start of this period would coincide with the termination of the CS from the previous trial and the delivery (and presumably consumption) of a pellet. It should be indicated whether the definition of the ITI period was modified on trials where the preceding ITI was < 20 s, and whether any other criteria were used to define the ITI. Were the rats exposed to the reinforcers/pellets in their home cage prior to acquisition?

    (5) For all the analyses, the exact models that were fit and the software used should be provided. For example, it is not clear to the reader (particularly in the absence of degrees of freedom) whether the model discussed in Figure 3 was fit to the individual subject data points or to the group medians. Similarly, in Figure 6 there is no indication of whether a single regression model was fit to all the plotted data or whether tests of different slopes for each of the conditions were compared. With regard to the statistics in Figure 6, depending on how this was run, it is also a potential problem that the analyses do not correct for the potentially highly correlated multiple measurements from the same subjects, i.e., each rat provides 4 data points which are very unlikely to be independent observations.

    CONCEPTUAL

    (1) We take the point that where traditional theories (e.g., Rescorla-Wagner) and rate estimation theory (RET) both explain some phenomenon, the explanation in terms of RET may be preferred as it will be grounded in aspects of an animal's experience rather than a hypothetical construct. However, like traditional theories, RET does not explain a range of phenomena - notably, those that require some sort of expectancy/representation as part of their explanation. This being said, traditional theories have been incorporated within models that have the representational power to explain a broader array of phenomena, which makes me wonder: Can rate estimation be incorporated in models that have representational power; and, if so, what might this look like? Alternatively, do the authors intend to claim that expectancy and/or representation - which follow from probabilistic theories in the RW mould - are unnecessary for explanations of animal behaviour?***

    ***If the authors choose to reply to these points, they should consider taking advantage of an "Ideas and Speculation" subsection within the Discussion that is supported by eLife [ https://elifesciences.org/inside-elife/e3e52a93/elife-latest-including-ideas-and-speculation-in-elife-papers ].

    (2) The discussion of Rescorla's (1967) and Kamin's (1968) findings needs some elaboration. These findings are already taken to mean that the target CS in each design is not informative about the occurrence of the US - hence, learning about this CS fails. In the case of blocking, we also know that changes in the rate of reinforcement across the shift from stage 1 to stage 2 of the protocol can produce unblocking. Perhaps more interesting from a rate estimation perspective, unblocking can also be achieved in a protocol that maintains the rate of reinforcement while varying the sensory properties of the US (Wagner). How does rate estimation theory account for these findings and/or the demonstrations of trans-reinforcer blocking (Pearce-Ganesan)? Are there other ways that the rate estimation account can be distinguished from traditional explanations of blocking and contingency effects? If so, these would be worth citing in the discussion. More generally, if one is going to highlight seminal findings (such as those by Rescorla and Kamin) that can be explained by rate estimation, it would be appropriate to acknowledge findings that challenge the theory - even if only to note that the theory, in its present form, is not all-encompassing. For example, it appears to me that the theory should not predict one-trial overshadowing or the overtraining reversal effect - both of which are amenable to discussion in terms of rates. I assume that the signature characteristics of latent inhibition and extinction would also pose a challenge to rate estimation theory, just as they pose a challenge to Rescorla-Wagner and other probability-based theories. Is this correct?

  3. Author response:

    ANALYTICAL

    (1) Figure 3 shows that the relationship between learning rate and informativeness for our rats was very similar to that shown with pigeons by Gibbon and Balsam (1981). We used multiple criteria to establish the number of trials to learn in our data, with the goal of demonstrating that the correspondence between the data sets was robust. To establish that they are effectively the same does require applying to our data a decision criterion equivalent to the one used for Gibbon and Balsam’s data. However, the criterion they used—at least one peck at the response key on at least 3 out of 4 consecutive trials—cannot be sensibly applied to our magazine entry data because rats make magazine entries during the inter-trial interval (whereas pigeons do not peck at the response key in the inter-trial interval). Therefore, evidence for conditioning in our paradigm must involve a comparison between the response rate during the CS and the baseline response rate. There are two ways one could adapt the Gibbon and Balsam criterion to our data. One way is to use a non-parametric signed rank test for evidence that the CS response rate exceeds the pre-CS response rate, adopting a statistical criterion equivalent to Gibbon and Balsam’s 3-out-of-4 consecutive trials (p < .3125). The second method estimates the nDkl for the criterion used by Gibbon and Balsam. This could be done by assuming there are no responses in the inter-trial interval and a response probability of at least 0.75 during the CS (their criterion). This would correspond to an nDkl of 2.2 (odds ratio 27:1). The obtained nDkl could then be applied to our data to identify when the distribution of CS response rates has diverged by an equivalent amount from the distribution of pre-CS response rates.
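
    As a concrete illustration of the second method, the computation can be sketched as follows (a minimal sketch, assuming Poisson-distributed response counts; the function names are ours, and the published nDkl measure in Gallistel & Latham, 2023, includes refinements not shown here):

    ```python
    import numpy as np

    def ndkl_poisson(cs_counts, pre_counts, cs_dur, pre_dur):
        """nDkl after n trials: n times the Kullback-Leibler divergence of the
        estimated CS response-rate distribution from the pre-CS (baseline)
        distribution, under a Poisson model of response counts."""
        lam_cs = max(np.mean(cs_counts) / cs_dur, 1e-6)     # responses/s during CS
        lam_pre = max(np.mean(pre_counts) / pre_dur, 1e-6)  # responses/s pre-CS
        # Per-trial divergence between Poisson count distributions:
        # D(Pois(lam_cs*d) || Pois(lam_pre*d)) = d*(lam_cs*ln(lam_cs/lam_pre) - lam_cs + lam_pre)
        d = cs_dur * (lam_cs * np.log(lam_cs / lam_pre) - lam_cs + lam_pre)
        return len(cs_counts) * d

    def trials_to_criterion(cs_counts, pre_counts, cs_dur, pre_dur, crit=2.2):
        """First trial on which the accumulated nDkl reaches the criterion
        discussed above (2.2, roughly 27:1 odds); None if never reached."""
        for n in range(1, len(cs_counts) + 1):
            if ndkl_poisson(cs_counts[:n], pre_counts[:n], cs_dur, pre_dur) >= crit:
                return n
        return None
    ```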

    (2) A single regression line, as shown in Figure 6, is the simplest possible model of the relationship between response rate and reinforcement rate and it explains approximately 80% of the variance in response rate. Fixing the log-log slope at 1 yields the maximally simple model. (This regression is done in the logarithmic domain to satisfy the homoscedasticity assumption.) When transformed into the linear domain, this model assumes a truly scalar relation (linear, intercept at the origin) and assumes the same scale factor and the same scalar variability in response rates for both sets of data (ITI and CS). Our plot supports such a model. Its simplicity is its own motivation (Occam’s razor).
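
    For concreteness, this one-parameter fit amounts to estimating a single scale factor in log-log coordinates. A minimal sketch on synthetic stand-in data (the scale factor of 40 and the noise level are arbitrary choices for illustration, not the paper's estimates):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in for the Figure 6 data: reinforcement rates spanning
    # about two orders of magnitude, responding with scalar (multiplicative) noise.
    reinf = 10 ** rng.uniform(-3.5, -1.5, 116)          # reinforcements per second
    resp = 40.0 * reinf * rng.lognormal(0.0, 0.4, 116)  # responses per second

    x, y = np.log(reinf), np.log(resp)

    # One-parameter model: slope fixed at 1, so only the scale factor k is fit
    # (the least-squares intercept given unit slope).
    log_k = np.mean(y - x)
    y_hat = x + log_k
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    print(f"k = {np.exp(log_k):.2f}, R^2 = {r2:.2f}")
    ```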

    If regression lines are fitted to the CS and ITI data separately, there is a small increase in explained variance (R2 = 0.82). We leave it to further research to determine whether such a complex model, with 4 parameters, is required. However, we do not think the present data warrant comparing the simplest possible model, with one parameter, to any more complex model for the following reasons:

    · When a brain—or any other machine—maps an observed (input) rate to a rate it produces (output rate), there is always an implicit scalar. In the special case where the produced rate equals the observed rate, the implicit scalar has value 1. Thus, there cannot be a simpler model than the one we propose, which is, in and of itself, interesting.

    · The present case is an intuitively accessible example of why the MDL (Minimum Description Length) approach to model complexity (Barron, Rissanen, & Yu, 1998; Grünwald, Myung, & Pitt, 2005; Rissanen, 1999) can yield a very different conclusion from the conclusion reached using the Bayesian Information Criterion (BIC) approach. The MDL approach measures the complexity of a model, when given N data specified with a precision of B bits per datum, by computing (or approximating) the sum of the maximum likelihoods of the model’s fits to all possible sets of N data with B bits of precision per datum. The greater the sum over the maximum likelihoods, the more complex the model, that is, the greater its measured wiggle room, its capacity to fit data. Recall that von Neumann remarked to Fermi that with 4 parameters he could fit an elephant. His deeper point was that multi-parameter models bring neither insight nor predictive power; they explain only post hoc, after one has adjusted their parameters in the light of the data. For realistic data sets like ours, the sums of maximum likelihoods are finite but astronomical. However, just as Stirling’s approximation allows one to work with astronomical factorials, it has proved possible to develop readily computable approximations to these sums, which can be used to take model complexity into account when comparing models. Proponents of the MDL approach point out that the BIC is inadequate because models with the same number of parameters can have very different amounts of wiggle room. A standard illustration of this point is the contrast between a logarithmic model and a power-function model. Log regressions must be concave, whereas power-function regressions can be concave, linear, or convex—yet the two models have the same number of parameters (one or two, depending on whether one counts the scale parameter that is always implicit). The MDL approach captures this difference in complexity because it measures wiggle room; the BIC approach does not, because it only counts parameters.

    · In the present case, one is comparing a model with no pivot and no vertical displacement at the boundary between the black dots and the red dots (the 1-parameter unilinear model) to a bilinear model that allows both a change in slope and a vertical displacement for both lines (see the sketch after this list). The 4-parameter model is superior if we use the BIC to take model complexity into account. However, the 4-parameter model has ludicrously more wiggle room. It will provide excellent fits—high maximum likelihoods—to data sets in which the red points have slope > 1, slope 0, or slope < 0, and in which it is also true that the intercept for the red points lies well below or well above the black points (non-overlap in the marginal distributions of the red and black data). The 1-parameter model, on the other hand, will provide terrible fits to all such data (very low maximum likelihoods). Thus, we believe the BIC does not properly capture the immense actual difference in complexity between the 1-parameter model (unilinear with slope 1) and the 4-parameter model (bilinear with neither the slope nor the intercept fixed in the linear domain).

    · In any event, because the pivot (change in slope between the black and red data sets), if any, is small, and likewise for the displacement (vertical change), it suffices for now to know that the variance captured by the 1-parameter model is only marginally improved by adding three more parameters. Researchers using the properly corrected measured rate of head poking to measure the rate of reinforcement a subject expects can therefore assume that they have an approximately scalar measure of the subject’s expectation. Given our data, they won’t be far wrong even near the extremes of the values commonly used for rates of reinforcement. That is a major advance in current thinking, with strong implications for formal models of associative learning. It implies that the performance function that maps from the neurobiological realization of the subject’s expectation to its observed response rate is not an unknown function. On the contrary, it’s the simplest possible function, the scalar function. That is a powerful constraint on brain-behavior linkage hypotheses, such as the many hypothesized relations between mesolimbic dopamine activity and the expectation that drives responding in Pavlovian conditioning (Berridge, 2012; Jeong et al., 2022; Y. Niv, Daw, Joel, & Dayan, 2007; Y. Niv & Schoenbaum, 2008).
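
    To make the model comparison above concrete, here is a sketch of the BIC calculation for the two models, on synthetic stand-in data generated from the unilinear model (all numbers are illustrative, not the experimental values; we include it to show the form of the comparison, not to endorse the BIC’s verdict):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic stand-in for Figure 6: black points (ITI rates vs 1/C) and
    # red points (CS rates vs 1/T), generated here from the unilinear model.
    x_iti = rng.uniform(np.log(1e-4), np.log(1e-2), 50)      # log reinforcement rates
    x_cs = rng.uniform(np.log(1e-3), np.log(1e-1), 50)
    y_iti = x_iti + np.log(40.0) + rng.normal(0.0, 0.4, 50)  # log response rates
    y_cs = x_cs + np.log(40.0) + rng.normal(0.0, 0.4, 50)

    def bic(y, y_hat, k):
        # Gaussian-error BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)
        n = len(y)
        return n * np.log(np.sum((y - y_hat) ** 2) / n) + k * np.log(n)

    x_all = np.concatenate([x_iti, x_cs])
    y_all = np.concatenate([y_iti, y_cs])

    # 1-parameter unilinear model: unit slope, one fitted scale factor.
    bic_1 = bic(y_all, x_all + np.mean(y_all - x_all), 1)

    # 4-parameter bilinear model: separate slope and intercept per data set.
    y_hat_4 = np.concatenate([
        np.polyval(np.polyfit(x_iti, y_iti, 1), x_iti),
        np.polyval(np.polyfit(x_cs, y_cs, 1), x_cs),
    ])
    bic_4 = bic(y_all, y_hat_4, 4)
    print(f"BIC (1 parameter) = {bic_1:.1f}, BIC (4 parameters) = {bic_4:.1f}")
    ```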

    The data in Figure 6 are taken from the last 5 sessions of training. The exact number of sessions was somewhat arbitrary but was chosen to meet two goals: (1) to capture asymptotic responding, which is why we restricted this to the end of the training, and (2) to obtain a sufficiently large sample of data to estimate reliably each rat’s response rate. We have checked what the data look like using the last 10 sessions, and can confirm it makes very little difference to the results.

    Finally, as noted by the reviewers, the relationship between the contextual rate of reinforcement and ITI responding should also be evident if we had measured context responding prior to introducing the CS. However, there was no period in our experiment when rats were given unsignalled reinforcement (such as is done during “magazine training” in some experiments). Therefore, we could not measure responding based on contextual conditioning prior to the introduction of the CS. This is a question for future experiments that use an extended period of magazine training or “poor positive” protocols in which there are reinforcements during the ITIs as well as during the CSs. The learning rate equation has been shown to predict reinforcements to acquisition in the poor-positive case (Balsam, Fairhurst, & Gallistel, 2006).

    (3) One of us (CRG) has earlier suggested that responding appears abruptly when the accumulated evidence that the CS reinforcement rate is greater than the contextual rate exceeds a decision threshold (C.R. Gallistel, Balsam, & Fairhurst, 2004). The new, more extensive data require a more nuanced view. Evidence about the manner in which responding changes over the course of training is to some extent dependent on the analytic method used to track those changes. We presented two different approaches. The approach shown in Figures 7 and 8, extending that developed by Harris (2022), assumes a monotonic increase in response rate and uses the slope of the cumulative response record to identify when responding exceeds particular milestones (percentiles of the asymptotic response rate). This analysis suggests a steady rise in responding over trials. Within our theoretical model, this might reflect an increase in the animal’s certainty about the CS reinforcement rate with accumulated evidence from each trial. While this method should be able to distinguish between a gradual change and a single abrupt change in responding (Harris, 2022), it may not distinguish between a gradual change and multiple step-like changes in responding, and it cannot account for decreases in response rate.
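
    For readers who want the gist of the milestone analysis in code, a simplified sketch follows (ours, for exposition; the published method in Harris, 2022, fits the cumulative record rather than smoothing a numerical derivative):

    ```python
    import numpy as np

    def milestone_trials(per_trial_rates, milestones=(0.1, 0.5, 0.9), w=10):
        """Trial numbers at which responding first reaches given fractions of
        the terminal response rate, read from the slope of the cumulative
        response record."""
        cum = np.cumsum(per_trial_rates)
        # Smoothed local slope of the cumulative record (responses per trial)
        slope = np.convolve(np.gradient(cum), np.ones(w) / w, mode="same")
        terminal = slope[-w:].mean()  # estimate of the asymptotic response rate
        out = {}
        for m in milestones:
            hits = np.flatnonzero(slope >= m * terminal)
            out[m] = int(hits[0]) + 1 if hits.size else None
        return out
    ```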

    The other analytic method we used relies on the information-theoretic measure of divergence, the nDkl (Gallistel & Latham, 2023), to identify each point of change (up or down) in the response record. With that method, we discern three trends. First, the onset tends to be abrupt, in that the initial step up is often large (an increase in response rate by 50% or more of the difference between its initial value and its terminal value is common, and there are instances where the initial step is to the terminal rate or higher). Second, there is marked within-subject variability in the response rate, characterised by large steps up and down in the parsed response rates following the initial step up, but this variability tends to decrease with further training (there tend to be fewer and smaller steps in both the ITI response rates and the CS response rates as training progresses). Third, the overall trend, seen most clearly when one averages across subjects within groups, is toward a moderately higher rate of responding later in training than just after the initial rise. We think that the first tendency reflects an underlying decision process whose latency is controlled by diminishing uncertainty about the two reinforcement rates and hence about their ratio. We think that decreasing uncertainty about the true values of the estimated rates of reinforcement is also likely to be an important part of the explanation for the second tendency (decreasing within-subject variation in response rates). It is less clear whether diminishing uncertainty can explain the trend toward a somewhat greater difference between the two response rates as conditioning progresses. It is perhaps worth noting that the distribution of the estimates of the informativeness ratio is likely to be heavy-tailed and to have peculiar properties (witness, for example, the distribution of the ratio of two gamma-distributed variables with arbitrary shape and scale parameters), but we are unable at this time to propound an explanation of the third trend.

    (4) There is an error in the description provided in the text. The pre-CS period used to measure the ITI responding was 10 s rather than 20 s. There was always at least a 5-s gap between the end of the previous trial and the start of the pre-CS period.

    (5) Details about model fitting will be added in a revision. The question about fitting a single model or multiple models to the data in Figure 6 is addressed in response 2 above. In Figure 6, each rat provides 2 behavioural data points (ITI response rate and CS response rate) and 2 values for reinforcement rate (1/C and 1/T). There is a weak but significant correlation between the ITI and CS response rates (r = 0.28, p < 0.01; log transformed to correct for heteroscedasticity). By design, there is no correlation between the log reinforcement rates (r = 0.06, p = .404).

    CONCEPTUAL

    (1) It is important for the field to realize that the RW model cannot be used to explain the results of Rescorla’s (Rescorla, 1966; Rescorla, 1968, 1969) contingency-not-pairing experiments, despite what was claimed by Rescorla and Wagner (Rescorla & Wagner, 1972; Wagner & Rescorla, 1972) and has subsequently been claimed in many modelling papers and in most textbooks and reviews (Dayan & Niv, 2008; Y. Niv & Montague, 2008). Rescorla programmed reinforcements with a Poisson process. The defining property of a Poisson process is its flat hazard function; the reinforcements were equally likely at every moment in time when the process was running. This makes it impossible to say when non-reinforcements occurred and, a fortiori, to count them. The non-reinforcements are causal events in the RW algorithm and in subsequent versions of it. Their effects on associative strength are essential to the explanations proffered by these models. Non-reinforcements—failures to occur, updates when reinforcement is set to 0, hence also the lambda parameter—can have causal efficacy only when the reinforcements can be predicted to occur at specified times (during “trials”). When reinforcements are programmed by a Poisson process, there are no such times. Attempts to apply the RW formula to reinforcement learning soon foundered on this problem (Gibbon, 1981; Gibbon, Berryman, & Thompson, 1974; Hallam, Grahame, & Miller, 1992; L. J. Hammond, 1980; L. J. Hammond & Paynter, 1983; Scott & Platt, 1985). The enduring popularity of the delta-rule updating equation in reinforcement learning depends on “big-concept” papers that don’t fit models to real data and that discretize time into states while claiming to be real-time models (Y. Niv, 2009; Y. Niv, Daw, & Dayan, 2005).

    The information-theoretic approach to associative learning, which has sometimes historically traveled as RET (rate estimation theory), is unabashedly and inescapably representational. It assumes a temporal map and arithmetic machinery capable in principle of implementing any implementable computation. In short, it assumes a Turing-complete brain. It assumes that, whatever the material basis of memory may be, it must make sense to ask of it how many bits can be stored in a given volume of material. This question is seldom posed in associative models of learning, nor by neurobiologists committed to the hypothesis that the Hebbian synapse is the material basis of memory. Many—including the new Nobelist, Geoffrey Hinton—would agree that the question makes no sense. When you assume that brains learn by rewiring themselves rather than by acquiring and storing information, it makes no sense.

    When a subject learns a rate of reinforcement, it bases its behavior on that expectation, and it alters its behavior when that expectation is disappointed. Subjects also learn probabilities when they are defined. They base some aspects of their behavior on those expectations, making computationally sophisticated use of their representation of the uncertainties (Balci, Freestone, & Gallistel, 2009; Chan & Harris, 2019; J. A. Harris, 2019; J.A. Harris & Andrew, 2017; J. A. Harris & Bouton, 2020; J. A. Harris, Kwok, & Gottlieb, 2019; Kheifets, Freestone, & Gallistel, 2017; Kheifets & Gallistel, 2012; Mallea, Schulhof, Gallistel, & Balsam, 2024 in press).

    (2) Rate estimation theory is oblivious to the temporal order in which experience with different predictors occurs. The matrix computation finds the additive solution, if it exists, to the data so far observed, on the assumption that predicted rates have remained the same. This is the stationarity assumption, which is implicit in a rate computation and was made explicit in the formulation of RET (C.R. Gallistel, 1990). When the additive solution does not exist, the RET algorithm treats the compound of two predictors as a third predictor, and computes the additive solution to the 3-predictor problem. Because it is oblivious to the order in which the data have been acquired, it predicts one-trial overshadowing and retroactive blocking and unblocking (C.R. Gallistel, 1990, pp. 439 & 452-455).
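
    The matrix computation can be illustrated with a hypothetical blocking protocol (a simplified rendering of the computation in Gallistel, 1990; the published formulation normalizes the time matrix, but the additive logic is the same). Cue B only ever occurs together with pretrained cue A, and the additive solution credits B with a zero rate:

    ```python
    import numpy as np

    # lambda = T^-1 n, where T[i, j] is the time predictor j was present while
    # predictor i was present, and n[i] is the number of reinforcements
    # delivered in predictor i's presence. Numbers are hypothetical.
    T = np.array([
        [1200.0, 600.0],  # A present 1200 s; B present during 600 s of that time
        [600.0, 600.0],   # B present 600 s, always together with A
    ])
    n = np.array([40.0, 20.0])  # reinforcements during A and during B

    lam = np.linalg.solve(T, n)
    print(lam)  # -> [0.0333, 0.0]: all of the rate is credited to A (blocking)
    ```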

    The RET algorithm is but one component of the information-theoretic model of associative learning (aka TATAL, The Analytic Theory of Associative Learning; Wilkes & Gallistel, 2016). It solves the assignment-of-credit problem, not the change-detection problem. Because rates of reinforcement do sometimes change, the stationarity assumption, which is essential to the RET algorithm, must be tested when each new reinforcement occurs and when the interval since the last reinforcement has become longer than would be expected, or the number of reinforcements has become significantly fewer than would be expected given the current estimate of the probability of reinforcement (C. R. Gallistel, Krishan, Liu, Miller, & Latham, 2014). In the information-theoretic approach to associative learning, detecting non-stationarity is done by an information-theoretic change-detecting algorithm. The algorithm correctly predicts that omitted reinforcements to extinction will be a constant (C.R. Gallistel, 2024 under review; Gibbon, Farrell, Locurto, Duncan, & Terrace, 1980). To put the prediction another way, unreinforced trials to extinction will increase in proportion to the trials per reinforcement during training (C.R. Gallistel, 2012; Wilkes & Gallistel, 2016). In other words, it predicts the best and most systematic data on the partial reinforcement extinction effect (PREE) known to us. The profound challenge that the PREE poses to neo-Hullian delta-rule updating models has been recognized for the better part of a century. To the best of our knowledge, no other formalized model of associative learning has overcome this challenge (Dayan & Niv, 2008; Mellgren, 2012). Explaining extinction algorithmically is straightforward when one adopts an information-theoretic perspective, because computing, reinforcement by reinforcement, the Kullback-Leibler divergence of a sequence of earlier rate (or probability!) estimates from the most recent estimate and multiplying the vector of divergences by the vector of effective sample sizes (C. R. Gallistel & Latham, 2022) detects and localizes changes in the rates and probabilities of reinforcement (C.R. Gallistel, 2024 under review). The computation presupposes the existence of a temporal map, a time-stamped record of past events. This supposition is strongly resisted by neuroscience-oriented reinforcement-learning modelers, who try to substitute the assumption of decaying eligibility traces.
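
    In symbols (our gloss on the prediction above, with Ω introduced here for exposition): if reinforcement occurs on a random fraction p of training trials and the change detector requires a constant expected number Ω of omitted reinforcements, then the predicted number of unreinforced trials to extinction is

    $$N_{\text{ext}} \approx \frac{\Omega}{p} = \Omega \times \frac{\text{trials}}{\text{reinforcement}},$$

    so trials to extinction grow in proportion to the thinning of the training schedule, while the number of omitted reinforcements, p·N_ext ≈ Ω, remains constant.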

    The very interesting Pearce-Ganesan findings (Ganesan & Pearce, 1988) are not predicted by RET, but nor do they run counter to its predictions. RET has nothing to say about how subjects categorize appetitive reinforcements; nor, at this time, does the information-theoretic approach to an understanding of associative learning have anything to say about that.

    The same is not true for the Betts, Brandon & Wagner results (Betts, Brandon, & Wagner, 1996). They pretrained a blocking cue that predicted a painful paraorbital shock to one eye of a rabbit. This cue elicited an anticipatory blink in the threatened eye. It also potentiated the startle reflex made to a loud noise in one ear. A new cue was then introduced, which always occurred in compound with the pretrained blocking cue. In one group, the painful shock continued to be delivered to the same eye as before; in another group, it was delivered to the skin around the other eye. In the group that continued to receive the shock to the same eye, the old cue effectively blocked conditioning of the new cue for both the eyeblink and the potentiated startle response. However, in the group for which the location of the shock changed to the other eye, the old cue did not block conditioning of the eyeblink response to the new cue but did block conditioning of the startle response to the new cue. The information-theoretic analysis of associative learning focusses on the encoding of measurable predictive temporal relationships, rather than on general and, to our mind, vague notions like CS processing and US processing. A painful shock elicits fear in a rabbit no matter where on the body surface it is experienced, because fear is a reaction to a very broad category of dangers, and fear potentiates the startle reflex regardless of the threat that causes the fear. Once the prediction of such a threat is encoded, redundant cues will not be encoded the same way, because the RET algorithm blocks the encoding of redundant predictions. A painful shock near an eye elicits a blink of the threatened eye as well as the fear that potentiates the startle. An appropriate encoding for the eyeblink must specify the location of the threat. RET will attribute the prediction of the threat to the new eye to the new cue—and not to the old cue, the pretrained blocker—while continuing to attribute to the old cue the prediction of a fear-causing threat, because the change in location does not alter that prediction. Therefore, the new cue will be encoded as predicting the new location of the threat to the eye, but not as predicting the large category of non-specific threats that elicit fear and the potentiation of the startle, because that prediction remains valid. Changing that prediction would violate the stationarity assumption; predictive relations do not change unless the data imply that they must have changed. Unless we have made a slip in our logic, this would seem to explain Betts et al.’s (1996) results. It does so with no free parameters, unlike AESOP, which has a notoriously large number of free parameters.

    Balci, F., Freestone, D., & Gallistel, C. R. (2009). Risk assessment in man and mouse. Proceedings of the National Academy of Sciences, 106(7), 2459-2463. doi:10.1073/pnas.0812709106

    Balsam, P. D., Fairhurst, S., & Gallistel, C. R. (2006). Pavlovian contingencies and temporal information. Journal of Experimental Psychology: Animal Behavior Processes, 32, 284-294.

    Barron, A., Rissanen, J., & Yu, B. (1998). The minimum description length principle in coding and modeling. IEEE Transactions on Information Theory, 44(6), 2743-2760.

    Berridge, K. C. (2012). From prediction error to incentive salience: Mesolimbic computation of reward motivation. European Journal of Neuroscience.

    Betts, S. L., Brandon, S. E., & Wagner, A. R. (1996). Dissociation of the blocking of conditioned eyeblink and conditioned fear following a shift in US locus. Animal Learning and Behavior, 24(4), 459-470.

    Chan, C. K. J., & Harris, J. A. (2019). The partial reinforcement extinction effect: The proportion of trials reinforced during conditioning predicts the number of trials to extinction. Journal of Experimental Psychology: Animal Learning and Cognition, 45(1). doi:10.1037/xan0000190

    Dayan, P., & Niv, Y. (2008). Reinforcement learning: The good, the bad and the ugly. Current Opinion in Neurobiology, 18(2), 185-196.

    Gallistel, C. R. (1990). The organization of learning. Cambridge, MA: Bradford Books/MIT Press.

    Gallistel, C. R. (2012). Extinction from a rationalist perspective. Behav Processes, 90, 66-88. doi:10.1016/j.beproc.2012.02.008

    Gallistel, C. R. (2024 under review). Reconceptualized associative learning. Perspectives on Behavior Science (Special Issue for SQAB 2024).

    Gallistel, C. R., Balsam, P. D., & Fairhurst, S. (2004). The learning curve: Implications of a quantitative analysis. Proceedings of the National Academy of Sciences, 101(36), 13124-13131.

    Gallistel, C. R., Krishan, M., Liu, Y., Miller, R. R., & Latham, P. E. (2014). The perception of probability. Psychological Review, 121, 96-123. doi:10.1037/a0035232

    Gallistel, C. R., & Latham, P. E. (2022). Bringing Bayes and Shannon to the study of behavioral and neurobiological timing. Timing & Time Perception, 1-61. doi:10.1163/22134468-bja10069

    Ganesan, R., & Pearce, J. M. (1988). Effect of changing the unconditioned stimulus on appetitive blocking. Journal of Experimental Psychology: Animal Behavior Processes, 14, 280-291.

    Gibbon, J. (1981). The contingency problem in autoshaping. In C. M. Locurto, H. S. Terrace, & J. Gibbon (Eds.), Autoshaping and conditioning theory (pp. 285-308). New York: Academic.

    Gibbon, J., & Balsam, P. (1981). Spreading association in time. In C. M. Locurto, H. S. Terrace, & J. Gibbon (Eds.), Autoshaping and conditioning theory (pp. 219-253). New York: Academic Press.

    Gibbon, J., Berryman, R., & Thompson, R. L. (1974). Contingency spaces and measures in classical and instrumental conditioning. Journal of the Experimental Analysis of Behavior, 21(3), 585-605. doi: 10.1901/jeab.1974.21-585

    Gibbon, J., Farrell, L., Locurto, C. M., Duncan, H. J., & Terrace, H. S. (1980). Partial reinforcement in autoshaping with pigeons. Animal Learning and Behavior, 8, 45-59. doi:10.3758/BF03209729

    Grünwald, P. D., Myung, I. J., & Pitt, M. A. (2005). Advances in minimum description length: theory and applications. Cambridge, MA: MIT Press.

    Hallam, S. C., Grahame, N. J., & Miller, R. R. (1992). Exploring the edges of Pavlovian contingency space: An assessment of contingency theory and its various metrics. Learning and Motivation, 23, 225-249.

    Hammond, L. J. (1980). The effect of contingency upon the appetitive conditioning of free operant behavior. Journal of the Experimental Analysis of Behavior, 34, 297-304. doi:10.1901/jeab.1980.34-297

    Hammond, L. J., & Paynter, W. E. (1983). Probabilistic contingency theories of animal conditioning: A critical analysis. Learning and Motivation, 14, 527-550. doi:10.1016/0023-9690(83)90031-0

    Harris, J. A. (2019). The importance of trials. Journal of Experimental Psychology: Animal Learning and Cognition, 45(4).

    Harris, J. A. (2022). The learning curve, revisited. Journal of Experimental Psychology: Animal Learning and Cognition, 48, 265-280.

    Harris, J. A., & Andrew, B. J. (2017). Time, Trials and Extinction. Journal of Experimental Psychology: Animal Learning and Cognition, 43(1), 15-29.

    Harris, J. A., & Bouton, M. E. (2020). Pavlovian conditioning under partial reinforcement: The effects of non-reinforced trials versus cumulative CS duration. The Journal of Experimental Psychology: Animal Learning & Cognition, 46, 256-272.

    Harris, J. A., Kwok, D. W. S., & Gottlieb, D. A. (2019). The partial reinforcement extinction effect depends on learning about nonreinforced trials rather than reinforcement rate. Journal of Experimental Psychology: Animal Learning and Cognition, 45(4). doi:10.1037/xan0000220

    Jeong, H., Taylor, A., Floeder, J. R., Lohmann, M., Mihalas, S., Wu, B., . . . Namboodiri, V. M. K. (2022). Mesolimbic dopamine release conveys causal associations. Science. doi:10.1126/science.abq6740

    Kheifets, A., Freestone, D., & Gallistel, C. R. (2017). Theoretical implications of quantitative properties of interval timing and probability estimation in mouse and rat. Journal of the Experimental Analysis of Behavior, 108(1), 39-72. doi:10.1002/jeab.261

    Kheifets, A., & Gallistel, C. R. (2012). Mice take calculated risks. Proceedings of the National Academy of Sciences, 109, 8776-8779. doi:10.1073/pnas.1205131109

    Mallea, J., Schulhof, A., Gallistel, C. R., & Balsam, P. D. (2024 in press). Both probability and rate of reinforcement can affect the acquisition and maintenance of conditioned responses. Journal of Experimental Psychology: Animal Learning and Cognition.

    Mellgren, R. (2012). Partial reinforcement extinction effect. In N. M. Seel (Ed.), Encyclopedia of the Sciences of Learning. Boston, MA: Springer.

    Niv, Y. (2009). Reinforcement learning in the brain. Journal of Mathematical Psychology, 53, 139-154.

    Niv, Y., Daw, N. D., & Dayan, P. (2005). How fast to work: response vigor, motivation and tonic dopamine. In Y. Weiss, B. Schölkopf, & J. R. Platt (Eds.), NIPS 18 (pp. 1019–1026). Cambridge, MA: MIT Press.

    Niv, Y., Daw, N. D., Joel, D., & Dayan, P. (2007). Tonic dopamine: Opportunity costs and the control of response vigor. Psychopharmacology, 191(3), 507-520.

    Niv, Y., & Montague, P. R. (2008). Theoretical and empirical studies of learning. In P. W. Glimcher et al. (Eds.), Neuroeconomics: Decision-Making and the Brain (pp. 329-349). New York: Academic Press.

    Niv, Y., & Schoenbaum, G. (2008). Dialogues on prediction errors. Trends in Cognitive Sciences, 12(7), 265-272. doi:10.1016/j.tics.2008.03.006

    Rescorla, R. A. (1966). Predictability and the number of pairings in Pavlovian fear conditioning. Psychonomic Science, 4, 383-384.

    Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66(1), 1-5. doi:10.1037/h0025984

    Rescorla, R. A. (1969). Conditioned inhibition of fear resulting from negative CS-US contingencies. Journal of Comparative and Physiological Psychology, 67, 504-509.

    Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II (pp. 64-99). New York: Appleton-Century-Crofts.

    Rissanen, J. (1999). Hypothesis selection and testing by the MDL principle. The Computer Journal, 42, 260–269. doi:10.1093/comjnl/42.4.260

    Scott, G. K., & Platt, J. R. (1985). Model of response-reinforcement contingency. Journal of Experimental Psychology: Animal Behavior Processes, 11(2), 152-171.

    Wagner, A. R., & Rescorla, R. A. (1972). Inhibition in Pavlovian conditioning: Application of a theory. In R. A. Boakes & S. Halliday (Eds.), Inhibition and learning. New York: Academic.

    Wilkes, J. T., & Gallistel, C. R. (2016). Information Theory, Memory, Prediction, and Timing in Associative Learning (original long version).