

Interval Timing as an Emergent Learning Property

Valentin Dragoi
Massachusetts Institute of Technology

J. E. R. Staddon, Richard G. Palmer, and Catalin V. Buhusi

Duke University

Interval timing in operant conditioning is the learned covariation of a temporal dependent measure such as wait time with a temporal independent variable such as fixed-interval duration. The dominant theories of interval timing all incorporate an explicit internal clock, or “pacemaker,” despite its lack of independent evidence. The authors propose an alternative, pacemaker-free view that demonstrates that temporal discrimination can be explained by using only 2 assumptions: (a) variation and selection of responses through competition between reinforced behavior and all other, elicited, behaviors and (b) modulation of the strength of response competition by the memory for recent reinforcement. The model departs radically from existing timing models: It shows that temporal learning can emerge from a simple dynamic process that lacks a periodic time reference such as a pacemaker.

When an activity occurs is as important as what the activity is, yet timing is an ability that is still imperfectly understood. Most studied is circadian timing, the synchronization of activity with the 24-hr diurnal cycle, which is shown by almost every organism. Less understood, and apparently dependent on quite different neurophysiological mechanisms, is interval timing, the learned locating of behavior with respect to a time marker. Mammals, birds, and fish will learn to restrict an operant response for food reinforcement to times when the response has been effective in the past. Here are two examples. On fixed-interval (FI) reinforcement schedules, the operant response is rewarded only after a fixed time since the preceding reward (the time marker). Subjects learn to restrict operant responding to the latter part of the interval. Animals will also learn to space successive responses in time if such spacing is a condition of reinforcement (Ferster & Skinner, 1957—spaced responding; here the response is also the time marker), and in this case the average interresponse time (IRT) is proportional to the required minimum.

In these examples, animals learn not to respond during the early part of the to-be-timed interval, and this wait time is usually proportional to the interval (this is termed proportional timing). A second characteristic is Weber’s law (which is also true of most sensory dimensions)—namely, that in the steady state, over a considerable range of times, the variability in temporal measures such as wait time or IRT is proportional to the mean; that is, the coefficient of variation (COV; the standard deviation divided by the mean) is approximately constant.

In order to replicate the response-wait pattern observed during temporal learning, an adequate model must be able to generate the appropriate periodic response pattern, for example, a fixed interresponse interval (spaced responding) or a fixed wait time (FI responding). The simplest way to do this is to assume an explicit internal clock (or pacemaker) with fixed period that simply counts time (e.g., Church, Meck, & Gibbon, 1994; Gibbon, 1977, 1991) or switches between different behavioral states in a certain order (e.g., Killeen, 1991; Killeen & Fetterman, 1988). Other ways to generate oscillatory response patterns are to assume the existence of explicit internal oscillators with built-in temporal properties (Church & Broadbent, 1990) or explicit variables tuned to particular (fixed) time intervals (e.g., Grossberg & Schmajuk, 1989; Machado, 1997b). However, none of these models explains how these dedicated temporal units (internal clocks, pacemakers, or oscillators) emerge during learning to exhibit the temporal properties they have been assigned. Are the temporal regularities in behavior a consequence of explicit internal representations of time, or do they stem from processes that have no explicit temporal representation? This question has rarely been asked by temporal-learning researchers. Despite the dominance of theories that hypothesize a timing mechanism separate from other learning processes (e.g., Church et al., 1994; Gibbon, 1977, 1991; Killeen, 1991; Killeen & Fetterman, 1988), it seemed to us important on grounds of parsimony and generality to explore the possibility that some aspects of timing, at least, might be emergent properties of nontemporal learning processes.

Previous work by Staddon and Higa (1999) and Staddon (2001) has suggested that certain static and dynamic properties of interval timing can be deduced from the properties of event memory. However, these models lack the ability to identify the relevant time marker; that is, they do not attempt to solve the assignment-of-credit problem that must be solved by any learning process (cf. Staddon & Zhang, 1991).

Valentin Dragoi, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology; J. E. R. Staddon and Catalin V. Buhusi, Department of Psychological and Brain Sciences, Duke University; Richard G. Palmer, Department of Physics, Duke University.

We thank Armando Machado for insightful comments on the manuscript. We acknowledge support by McDonnell-Pew Foundation and Merck, Inc., fellowships to Valentin Dragoi and by National Institute of Mental Health Grants to J. E. R. Staddon.

Correspondence concerning this article should be addressed to Valentin Dragoi, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 45 Carleton Street, E25-235, Cambridge, Massachusetts 02139. E-mail: [email protected]

Psychological Review, 2003, Vol. 110, No. 1, 126–144. Copyright 2003 by the American Psychological Association, Inc. 0033-295X/03/$12.00 DOI: 10.1037/0033-295X.110.1.126


We show here that timing with respect to the appropriate time marker on simple reinforcement schedules such as FI and spaced responding (also called differential reinforcement of low rate; DRL) can in fact be deduced from elementary learning assumptions about arousal and response competition.

Internal Clocks (Pacemakers)

The two most influential theories of interval timing are scalar expectancy theory (SET; Church et al., 1994; Gibbon, 1977; Treisman, 1963) and the behavioral theory of timing (BET; Killeen, 1991; Killeen & Fetterman, 1988). SET assumes an internal pacemaker whose pulses are accumulated until the occurrence of an important event, such as reinforcement. The number of accumulated pulses is then stored in a reference (or long-term) memory. The model generates responses by comparing the value stored in reference memory with the current accumulator total: When the ratio falls below a threshold, responding begins. Organisms on timing schedules are presumed to learn which values of the internal scale are associated with reward and to respond predominantly at those times (proportional timing).

BET (Killeen, 1991; Killeen & Fetterman, 1988), another pacemaker-based theory, assumes a “clock” consisting of a fixed sequence of states with the transition from one state to the next driven by a Poisson pacemaker (for a deterministic version of BET that adds an associative feature, see also Higa & Staddon, 1997; Machado, 1997a). Each state is associated with different classes of behavior, and, the theory claims, these behaviors serve as discriminative stimuli that set the occasion for appropriate operant responses (although there is not a 1:1 correspondence between a state and a class of behavior). The added assumption that pacemaker rate varies directly with reinforcement rate allows the model to handle some experimental results not covered by SET, although it has failed some specific tests (see Staddon & Higa, 1999, for a review).

The pacemaker idea is familiar and intuitively appealing. However, it fits behavioral data only at some cost in theoretical simplicity and has weak physiological support. A pacemaker–accumulator system implies greater relative timing accuracy at longer time intervals (Gibbon, 1977; Staddon & Higa, 1999). If there is no error in the accumulator memory for pulses, or if the error is independent of accumulator value, and if there is pulse-by-pulse variability in the pacemaker rate, then relative error (COV) must be less at longer time intervals: The standard deviation of the temporal estimate should increase as the square root of time, not linearly. But in fact the relative error is roughly constant: The standard deviation is proportional to the mean, not to the square root of the mean (Weber’s law). The linear pulse-accumulator assumption is also problematic, because it implies a biological process that can increase without limit.
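To make the square-root argument explicit (a restatement in our notation, with λ_p the mean pacemaker rate): for a Poisson pacemaker, the pulse count n(t) accumulated over an interval of duration t has mean λ_p t and standard deviation √(λ_p t), so COV(t) = √(λ_p t)/(λ_p t) = 1/√(λ_p t), which shrinks as t grows, whereas Weber’s law requires a roughly constant COV.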

These limitations are well-known, and for this reason the proponents of SET have suggested solutions such as a Poisson pacemaker that has a different mean rate across trials, or a random variable that multiplies the number of pulses in the accumulator before the storage in long-term memory (scalar variability; Gibbon & Church, 1992). Although these solutions are successful in the sense that they correct these specific errors, they seem to have been favored over other possibilities, such as abandoning the pacemaker idea entirely, largely on the basis of curve fits to selected data—and the intuitive plausibility of the pacemaker clock. Other criteria for the evaluation of theoretical constructs, such as generality, biological plausibility, and parsimony, have played little role.

Theoretical criticism of the pacemaker hypothesis is sometimes answered by pointing to physiological and pharmacological data that appear to support it. However, the physiological basis for an interval-timing pacemaker is equivocal. Although “clock” neurons, also known as central pattern generators in motor systems, have been identified in many brain areas (e.g., the suprachiasmatic nuclei [SCN] of the hypothalamus or in the motor cortex), they seem to be involved mainly in the circadian patterning of activity (Turek, 1994) and endocrine output (Welsh, Logothetis, Meister, & Reppert, 1995), in locomotion (Cahill, Hurd, & Batchelor, 1998; Pearson & Fourtner, 1975), or in heart-rate regulation (Mirmiran et al., 1992).

Most studies of the neuronal mechanisms of interval timing have used neuropharmacological and lesion techniques to identify the specific brain areas involved in temporal discrimination. These studies have led researchers to conclude that both the cerebellum and basal ganglia are involved. For instance, research by several groups (e.g., Church, 1984; Meck, 1996) has shown that it is possible to disrupt the animals’ sensitivity to time by affecting dopamine regulation in the basal ganglia (Meck, 1996; Spetch & Treit, 1984). Cerebellar lesions also impair interval timing (Breukelaar & Dalrymple-Alford, 1999; Perrett, Ruiz, & Mauk, 1993). The problems are as follows. First, it is by no means clear that the effects of these interventions are either exclusively or even predominantly on timing processes (cf. Cevik, 2000; Odum, Lieving, & Schaal, 2002). Second, even if an effect on timing can be demonstrated, that effect by itself is not sufficient to show that timing is caused by a pacemaker-driven clock.

There are also technical problems with the neuropharmacological approach—for example, that it neglects possible effects of changing the balance between different neurotransmitter systems. Injections or oral ingestion of specific pharmacological agents may have diffuse effects that propagate throughout the brain, modifying the global balance between different neurotransmitter systems, with unpredictable effects in many brain areas. However, the main problem is that although these studies show that temporal discrimination in pigeons, rats, or rabbits can be affected in a variety of ways by lesion and/or pharmacological techniques, they do not demonstrate the existence of a centralized internal clock for interval timing—because they do not really attempt to do so. In most studies researchers have simply assumed the existence of a pacemaker clock and then used this framework to interpret any timing disruptions—as clock speeding up or slowing down or partial clock reset (Meck, 1996; Spetch & Treit, 1984; Welzl, Berz, & Battig, 1991) or as errors in long-term memory (Maricq, Roberts, & Church, 1981; Meck, 1996; Meck, Church, Wenk, & Olton, 1987; Santi, Weise, & Kuiper, 1995; Shurtleff, Raslear, Genovese, & Simmons, 1992), for example. Hence, the pacemaker–clock assumption is not in any sense tested, much less proved, by these studies. Consequently, they do not rule out the possibility that interval timing is carried out by something other than a pacemaker-driven clock.

Resetting Mechanism

SET and BET are not concerned with how the organism identifies the relevant time marker: the assignment-of-credit problem.


This problem is related to causal attribution: Which stimulus or response constitutes the cause, or predictor, of reward or other important event? Both these lacks are shared by a recent competitor for SET and BET, the multiple-time-scale (MTS) theory (Staddon & Higa, 1999). MTS theory dispenses with the pacemaker but, like SET and BET, is silent on the assignment-of-credit process. All three theories retain the assumption that time discrimination is driven by comparison between “working” and “reference” temporal memories.

The major timing theories are also for the most part static rather than dynamic. The model discussed by Machado (1997a) and Higa and Staddon (1997), a dynamic version of BET, is a partial exception, as is a recent dynamic neural-network model for classical-conditioning timing (Buhusi & Schmajuk, 1999) and a dynamic version of the MTS model (Staddon, Chelaru, & Higa, 2002). However, none of these theories deals with spaced-responding schedules or with many transitions from one procedure to another. The Machado article discusses the credit-assignment problem but does not attempt to solve it. The Buhusi and Schmajuk approach is potentially more comprehensive (although the article deals only with classical conditioning) but also much more complex. Like many neural-net models, theirs has numerous free parameters (18) and on the order of 100 state variables. Neither the Machado nor the Buhusi and Schmajuk model provides a simple dynamic and pacemaker-free alternative to SET and BET.

The Timing Theory

We argue in this article that the major facts of interval timing can be explained without reference to an internal clock, time scale, or explicit comparison process, in a way that incidentally provides a solution to the resetting problem. We demonstrate temporal discrimination in a model that has no pacemaker or fixed internal scale for time and has no comparator beyond the familiar winner-take-all (WTA) response-competition rule. The model relies on just two assumptions: (a) variation and selection of responses through competition between response classes and (b) competition modulation by the overall arousal level linked to reinforcement rate. The first assumption is widely accepted. The second assumption, that the strength of behavioral competition (which determines the frequency with which the system switches between responses) is controlled by the overall reinforcement rate in the training context, resembles the BET idea that clock rate depends on reinforcement rate (Killeen, 1991) and has been applied in a different context by Gibbon (1995) to explain the dynamics of time matching. Both assumptions are elements in a model for nontemporal aspects of operant conditioning that we have already explored extensively (Dragoi, 1997; Dragoi & Staddon, 1999).

In this article, we implement these two principles, competition and arousal, as a set of interrelated differential equations. We show first how a two-unit competitive network can behave as an oscillator with arbitrary period. We then show how the period can be modulated by overall arousal level. Finally, we show that if arousal level is suitably linked to reinforcement rate, the system will show temporal discrimination on standard reinforcement schedules.

There is abundant empirical evidence that three classes of behavior typically occur during different parts of the to-be-timed interval on timing tasks (e.g., Staddon, 1977; Staddon & Simmelhag, 1971; see also Dragoi, 1997; Dragoi & Staddon, 1999; Killeen & Fetterman, 1988): elicited responses (behaviors that typically occur immediately after reinforcement, on FI schedules, or before the last response preceding reinforcement, on spaced-responding schedules), interim responses (behaviors that occur in the middle of the interreinforcement interval), and the terminal response (the reinforced response, which normally occurs in the final segment of the interreinforcement interval). For simplicity, we group together elicited and interim responses and label them other (O) responses (or behaviors). Terminal responses (e.g., key pecks by pigeons, lever presses by rats) are treated separately and are labeled R responses. These two response classes are the minimum needed for temporal discrimination.

We implement behavioral competition between O and R responses as a simple two-unit network (see Figure 1A). Associated with each unit are two variables: O(t) and A_O(t) for the O unit, and R(t) and A_R(t) for the R unit. O(t) and R(t) are output variables that denote the strengths of the O and R responses at time t. A_O(t) and A_R(t) are unit activations that are linear functions of O(t − 1) and R(t − 1). All unit variables are thresholded between 0 and 1.

Formally, the two-unit network is described by the following equations:

A_O(t) = w_OR R(t − 1) (1)

and

A_R(t) = w_RO O(t − 1), (2)

where w_OR and w_RO are the connection strengths between behaviors O and R.

The response that is generated at time t follows a WTA nonlinear rule, that is,

if A_O > A_R then O = 1, else dO/dt = −αO;

if A_R > A_O then R = 1, else dR/dt = −αR, (3)

where 0 < α < 1 is the fixed decay rate.

Thus, the response trace is either set to 1 for the winner unit or decays at a fixed rate for the loser unit. Notice that this rule means that the system retains no memory of how long (for how many successive time units) a winner has been active: Information about winner dwell time is entirely carried by the recencies—traces R(t) and O(t)—of the losers. The winner is represented as a binary output unit, which is 1 if R > O and 0 otherwise.

The dynamics of the two-unit network can be understood formally in the following way. Equations 1 and 2 define a switching line in a unit square: On one side response R dominates; on the other, response O dominates. Because R(t) and O(t) are constrained to be between 0 and 1, the switching space is just the unit square (see Figure 1B), and the switching line has the equation

R = O (w_RO / w_OR), (4)

which is a line through the origin. Figure 1B shows an example of a typical trajectory, which is associated with an output of the form O³R: 000100010001, and so on.


We can now trace the evolution of the two-unit system with respect to the switching line. For instance, suppose that w_OR = 0.2, w_RO = 0.15, and α = 0.1. If we assume that we start the system with R = 0 and O = 1, then the sequence of O and R values is indicated in Figure 1C. In the first iteration A_R > A_O, and therefore O declines according to the differential equation dO/dt = −αO, which in discrete form is equivalent to O(t + 1) = O(t) − αO(t) (the integration time step is 1). Thus, O(t + 1) becomes (1 − α)O(t) = 0.9 and R(t) = 1. In the next iteration, with the same connection weight configuration, A_O > A_R, and therefore O = 1 and R decays to (1 − α)R = 0.9. In the next iteration, A_O > A_R is still true, and R declines further to (1 − α)²R = 0.81 and O = 1 again. In this way, R declines until (1 − α)^N falls below the switching value, which is w_RO/w_OR (see Equation 4). The general condition for switching is therefore

N = ⌊ln(w_RO/w_OR) / ln(1 − α)⌋, (5)

where ⌊·⌋ denotes the integer part.

Figure 2A illustrates the time evolution of A_O and A_R following A_R = 0.14 and A_O = 0.2 (Figure 1C). We label the winner as output in Figure 1A. Figure 2, B and C, shows that, according to the WTA response rule, the winner is 1 when R > O and 0 when O > R.

Thus, Equation 5 and Figures 1–2 show that a two-unit system that relies on response competition (WTA rule) is able to generate sequences of the form O^N R (N = 3 in the case discussed above), which is sufficient for it to act as a timer. However, because model parameters are fixed, N is fixed, and therefore the model behaves like a simple oscillator (see Figure 2C and the Output column in Figure 1B) that responds whenever R > O (the strength of terminal behaviors is greater than that of “other” behaviors) and waits whenever O > R (the strength of “other” behaviors is greater than that of terminal behaviors).
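To make the switching dynamics concrete, here is a minimal simulation sketch of the two-unit oscillator (Equations 1–3), assuming 1-s time steps and the worked-example parameters; the function name and loop structure are ours, not part of the model specification.

```python
def run_oscillator(alpha, steps=20, w_or=0.2, w_ro=0.15):
    """Two-unit WTA oscillator: emits 1 for an R response, 0 for an O response."""
    O, R = 1.0, 0.0                        # initial traces, as in the worked example
    output = []
    for _ in range(steps):
        A_O = w_or * R                     # Equation 1
        A_R = w_ro * O                     # Equation 2
        if A_O > A_R:                      # Equation 3: WTA rule
            O, R = 1.0, (1 - alpha) * R    # O wins; the losing R trace decays
            output.append(0)
        else:
            R, O = 1.0, (1 - alpha) * O    # R wins; the losing O trace decays
            output.append(1)
    return output

print(run_oscillator(alpha=0.1))   # 1, 0, 0, 0, 1, ...: three O's per R (N = 3)
print(run_oscillator(alpha=0.02))  # slower trace decay yields much longer O runs
```

With fixed parameters the pattern repeats indefinitely, which is exactly the simple-oscillator behavior described above.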

Adaptive Timing

Behavioral timing is adaptive, however. The terminal response typically shifts with training on any periodic schedule so that it occurs with highest probability in the vicinity of reinforcement. There are two ways to make the period of the simple oscillator (i.e., N + 1) sensitive to temporal regularities in the environment: Either adjust the connection strengths between the two response units, O and R (i.e., the two w variables), or adjust the factor α, which governs the decay rate for both response units.

Our timing model is the first in which resetting by the time marker is an automatic consequence of the timing process, rather than an ad hoc add-on. We now show in two steps how this works. First, we convert the simple oscillator in Figure 1A to an adaptive timer. And second, we show that this timer resets on reinforcement as an emergent property.

The simple oscillator can be converted into an adaptive timer by allowing short-term memory parameter α to vary between 0 and 1 (cf. Equation 3). Larger α means faster forgetting, and, because of the nonlinear WTA response rule, faster forgetting means a higher frequency of transitions between R and O states, which means a higher switching rate between the two responses. Also (cf. Equation 5), larger α (values close to 1) implies a larger −ln(1 − α) (which is an increasing function for α between 0 and 1), which determines a smaller value for N (equal to the period of the adaptive timer minus 1). In short, α determines the period of the oscillator, the number of O responses that intervene between successive R responses.

Figure 1. Two-unit network. O and R are responses (“other” and terminal responses, respectively). A: A_O(t) and A_R(t) are unit activations that are linear functions of O(t − 1) and R(t − 1), where t = time. w_OR and w_RO are the connection strengths between behaviors O and R. Output is 1 if R > O and 0 otherwise. WTA = winner take all. B: Typical model behavior when R = 0 and O = 1. C: Dynamic trajectory in the O–R plane for the situation described in B.


How much should the decay rate, α, increase or decrease, and when? Because temporal discrimination is under the control of reinforcement, it is reasonable to assume that some form of memory for reinforcement should control the strength of behavioral competition. Thus, our second principle is that the overall arousal level, integrated over the recent past (short-term memory for reinforcement), controls the switching rate of the two types of responses (R and O) by modulating the response trace decay rates (i.e., by modulating α).

Figure 3 illustrates that the terminal response (R) competes with the “other” behaviors (O) in a WTA fashion, and then, depending on the response pattern that is generated (the output), the obtained reinforcements are stored at the appropriate rate in a short-term memory. Next, the response decay rate is controlled by the short-term memory for reinforcement (STM_RF), which modulates the strength of interresponse competition.

The short-term memory reinforcement trace is described by the following equation:

dSTM_RF/dt = β₁ RF − β₂ STM_RF, (6)

where β₁ and β₂ control the increase and decay rates of reinforcement short-term memory, and RF is the reinforcement signal (0 or 1). Equation 6 ensures that STM_RF is directly related to reinforcement intensity and duration and, on average, is proportional to reinforcement frequency. (In the simulations, we assume that each reinforcement has a fixed intensity and duration.)
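To see the proportionality, average Equation 6 over a long run and set the mean change to zero (a one-line restatement in our notation): ⟨STM_RF⟩ = (β₁/β₂)⟨RF⟩, so the trace settles, on average, at a fixed multiple of the reinforcement frequency.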

The control of the response decay rate (α) by the reinforcement short-term memory is given by

dα/dt = γ(STM_RF − α), (7)

where γ is a fixed parameter that controls the rate at which each reinforcement changes α. Equation 7 ensures that at steady state α stabilizes to a value proportional to the short-term memory for reinforcement. When STM_RF = 0, α decays to zero with rate constant γ. We now rewrite the response decay term from Equation 3 by introducing a proportionality factor between α and the response variable, that is,

dR/dt = −λαR

and

dO/dt = −λαO, (8)

where λ is a proportionality factor (see also Equation 3).

Figure 2. Model dynamics. O and R are responses (“other” and terminal responses, respectively). A: Dynamics of response activations (A_O and A_R). B: Response dynamics. C: The output temporal pattern is equivalent to the sequence OOOROOOROOOR . . . .

Figure 3. Model schematic representation. Terminal (R) and “other” (O) responses compete in a winner-take-all (WTA) fashion; depending on the pattern of responding (the output), the reinforcement is obtained at a certain rate and then stored in a short-term memory (STM); the arousal level (reinforcement STM) controls the strength of behavioral competition by modulating the decay rates. A_O and A_R are response activations; w_OR and w_RO are the connection strengths between behaviors O and R.

Note that response competition and modulation of the strength of competition by recent memory for reinforcement combine to make an effective resetting mechanism that suppresses responding immediately after reinforcement. Suppose that a response (R) occurs and produces a reinforcement. If R = 1, then by Equation 1, A_O increases to become greater than A_R. According to the WTA response rule, it follows that O = 1, and thus the response is reset. The second principle of our model (modulation of competition strength by the short-term memory for reinforcement) prevents subsequent R responses from resuming responding before a certain waiting time has elapsed (except for responses due to noise—see the next section). Indeed, Equation 5 ensures that O responses will be emitted continuously for an interval correlated with the short-term memory for reinforcement (which directly controls the response decay rate α).
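Read as difference equations on the 1-s time grid used in our simulations, the memory and decay-rate updates are compact. A sketch under that discretization (parameter values anticipate the Model Implementation section):

```python
def update_memory_and_decay(stm, alpha, rf, beta1=0.3, beta2=0.1, gamma=0.0002):
    """One 1-s step of Equations 6 and 7; rf is the binary reinforcement signal."""
    stm += beta1 * rf - beta2 * stm    # Equation 6: reinforcement short-term memory
    alpha += gamma * (stm - alpha)     # Equation 7: alpha relaxes toward STM_RF
    return stm, alpha

def decay_losing_trace(trace, alpha, lam=0.097):
    """Equation 8: the losing response trace decays at rate lam * alpha."""
    return (1 - lam * alpha) * trace
```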

Response Variation

In this section we investigate the stability of the model-schedule system over the course of prolonged training. Because experimental evidence (e.g., Staddon, 1965) suggests that response rates are relatively high during early training, we chose a large initial value for α (α = 1). This value ensures frequent transitions between response classes O and R (cf. Equation 3). Because the initial value for the reinforcement short-term memory is 0, Equation 7 ensures that α decays until it reaches asymptote. However, it follows from Equation 5 that each small decrease of α causes a small increase of N (equivalent to the IRT). If IRTs increase, then the reinforcement frequency decreases, causing a decrease of the STM_RF (Equation 6). It then follows (Equation 7) that α further decreases, closing a positive feedback loop that increases IRTs and thus decreases reinforcement probability. One way to prevent this situation is to increase response variability—for instance, by using independent noise coefficients, ξ_O(t) and ξ_R(t), added to the two response activations, A_O and A_R (Equations 1 and 2):

A_O(t) = w_OR R(t − 1) + ν ξ_O(t) (9)

and

A_R(t) = w_RO O(t − 1) + ν ξ_R(t), (10)

where ξ_O(t) and ξ_R(t) are drawn independently at each t from a normal distribution between 0 and 1 and ν is the noise amplitude (see Figure 4A).

Why do the additive noise coefficients have a stabilizing effect? According to Equations 9 and 10, increasing response variability yields a higher probability that a response will occur due to noise alone, thus forcing the IRTs to decrease and counteracting the consequences of a decrease in α.
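In simulation, the noisy activations are a two-line change to the sketches above; here we read the noise source as a standard normal deviate clipped to [0, 1], which is one plausible interpretation of the “normal distribution between 0 and 1” described under Equation 10:

```python
import random

def noisy_activations(O, R, nu, w_or=0.2, w_ro=0.15):
    """Equations 9 and 10: activations with independent additive noise."""
    clip = lambda x: min(max(x, 0.0), 1.0)
    xi_o, xi_r = clip(random.gauss(0, 1)), clip(random.gauss(0, 1))
    A_O = w_or * R + nu * xi_o    # Equation 9
    A_R = w_ro * O + nu * xi_r    # Equation 10
    return A_O, A_R
```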

Figure 4, B–D, presents simulation results from a temporal learning paradigm in which the reinforcement is given if the IRTs are longer than or equal to 10 s (DRL 10—the properties of this type of responding are examined in more detail in the Spaced Responding section). The variability in the response pattern was controlled by parameter ν. We show the time evolution of key model variables, that is, the response decay rate (α), the short-term memory for reinforcement, and the IRT, choosing Time 0 as the time when α has reached the asymptote in the ν = 0.05 condition. It is clear from Figure 4, B–D, that despite the fluctuations in STM_RF and IRTs induced by response variability, the response decay rate stabilizes at a close-to-optimal value (IRTs vary around 10 s). However, if ν = 0, all the model variables decay to zero, which is equivalent to an exponential increase in IRTs until STM_RF stabilizes at 0.

Figure 4. Response variation has a stabilizing effect. A: Response variation is represented by the additive noise parameter ν. B: Decay rate dynamics in two situations: ν = 0.05 (simple line) and ν = 0 (thick line). C: Variations in the short-term memory for reinforcement (STM_RF) as a function of noise amplitude. D: Interresponse time (IRT) variations as a function of noise amplitude. O and R are responses (“other” and terminal responses, respectively); A_O and A_R are response activations; w_OR and w_RO are the connection strengths between behaviors O and R; ξ_O and ξ_R are independent noise coefficients. WTA = winner take all.

What is the relationship between IRT and the amplitude of the response variation parameter (ν)? Figure 5A shows that, for DRL 10, when the noise increases, the mean IRT decreases, which causes a decrease in reinforcement frequency. For low levels of noise (see Figure 5A, inset) the response frequency decreases until reinforcement is suppressed. How does additive noise in the response rule (Equations 9–10) affect the correlation between the mean IRT and the reinforcement schedule requirement (DRL N)? Although at large DRL values the model generates IRTs that, on average, are below the schedule requirement, Figure 5B shows that, for ν = 0.05, there is clearly a positive correlation between N and mean IRT. This feature of the model is quite surprising: We show here that adding independent noise coefficients to the response learning rule increases response variability, which then acts as a feedback mechanism to stabilize the response rate, thus arguing for a stabilizing role of behavioral variability (cf. Machado, 1997a).

Figure 5. The relationship between response variation and mean interresponse time (IRT). A: Mean IRT as a function of noise amplitude. IRT values are calculated for a differential reinforcement of low rate (DRL) 10-s situation. Inset: IRT increases exponentially with decreases in the noise level. B: At high DRL values the model generates IRTs that are below the schedule requirement. All simulations are generated with additive noise parameter ν = 0.05.


We next use our adaptive model to address several well-known timing phenomena: spaced responding (DRL), interval timing (FI), rapid temporal control, and timing two intervals (mixed FI-FI). We then describe how temporal regulation develops in real time by characterizing the dynamics of acquisition and extinction on both DRL and FI schedules.

Model Implementation

The output of all the units described in this article (Equations 1–10) varies between 0 and 1. The computer simulations used discrete time steps set equal to 1 s of real time. The reinforcement input is of fixed magnitude, set equal to 1 throughout. We used the following parameter values for all simulations. Initial values: α = 1 and STM_RF = 0 (it might be argued that because animals are typically exposed to continuous reinforcement before the experimental training begins, the initial value of STM_RF should be higher than 0, but model performance is in fact insensitive to initial values of STM_RF). Connection strengths: w_OR = 0.2 and w_RO = 0.15 (cf. Equations 1 and 2); this asymmetry ensures a bias toward O responses, an assumption justified by the fact that the repertoire of interim, or O, responses is richer than that of terminal, or R, responses, and therefore the probability of O responses should be higher. Other parameters: β₁ = 0.3, β₂ = 0.1, γ = 0.0002, λ = 0.097. Parameter values were chosen to ensure that steady-state performance is reached after 400–500 reinforcements in both DRL and FI schedule training (the parameter that controls the pace of learning is γ, the rate constant of α increase and decay). To generate accurate quantitative predictions, we set the response variability parameter (ν) to 0.05 for DRL schedules and to 0.03 for FI schedules (the model can generate qualitatively similar predictions for a relatively large range of ν values—but see Figure 5A).
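As a worked illustration of the implementation just described, the following self-contained sketch runs the model on DRL 10 with the parameter values listed above (1-s steps; the schedule bookkeeping, variable names, and the clipped-normal noise are our assumptions):

```python
import random

def simulate_drl(drl=10, steps=30000, nu=0.05, w_or=0.2, w_ro=0.15,
                 beta1=0.3, beta2=0.1, gamma=0.0002, lam=0.097):
    """Run the competition model on a DRL schedule; returns the list of IRTs."""
    O, R, stm, alpha = 1.0, 0.0, 0.0, 1.0      # initial values from the text
    clip = lambda x: min(max(x, 0.0), 1.0)
    last_r, irts = None, []
    for t in range(steps):
        A_O = w_or * R + nu * clip(random.gauss(0, 1))   # Equation 9
        A_R = w_ro * O + nu * clip(random.gauss(0, 1))   # Equation 10
        rf = 0
        if A_R > A_O:                          # R response emitted this second
            if last_r is not None:
                irts.append(t - last_r)
                rf = 1 if t - last_r >= drl else 0       # DRL contingency
            last_r = t
            R, O = 1.0, (1 - lam * alpha) * O  # Equations 3 and 8
        else:                                  # O response emitted this second
            O, R = 1.0, (1 - lam * alpha) * R
        stm += beta1 * rf - beta2 * stm        # Equation 6
        alpha += gamma * (stm - alpha)         # Equation 7
    return irts

irts = simulate_drl()
print(sum(irts[-500:]) / 500)   # mean IRT late in the run (cf. Figure 6F)
```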

Spaced Responding

DRL schedules (spaced responding) selectively reinforce IRTs longer than a specified value. During the initial phase of DRL training, animals tend to respond at a high rate. After stable performance is reached, the rate of responding on DRL schedules is directly related to the maximum reinforcement rate (1/minimum reinforced IRT) specified by the schedule, and the IRT distribution peaks at or just before the minimum required IRT (e.g., Kelleher, Fry, & Cook, 1959; Staddon, 1965; Wilson & Keller, 1953). The DRL procedure generates a peaked distribution of IRTs over a limited range of DRL values (up to about 30 s for pigeons with food reinforcement and the peck response), but there is often a peak at the shortest recorded IRT interval due to “bursts” of very short IRTs. Spaced responding has not been addressed by current interval timing models.

Figure 6 shows the emergence of temporal discrimination during exposure to a schedule that reinforced responses if they followed either a response or a reinforcement by 10 s or more (DRL 10). The figure illustrates typical model behavior, both early and late in training. We began the simulations with a high decay rate, α = 1, that is, very brief memory for previous responses, so that the switching rate of the two types of responses, O and R, is very high. We began with high α for consistency with experimental evidence (e.g., Staddon, 1965) that response rates are high during initial training—see Figure 6A. After the first reinforcement the decay rate of the two response units, α, starts to decrease (see Equation 7) and diminishes the response switching rate. The strength of behavioral competition increases, because decreasing α means that the trace of the losing response decays more slowly, so that the winner inhibits the competing response for longer. Figure 6A also shows that as training progresses, terminal responses become more widely spaced, and this leads to an increased frequency of reinforcement (see Figure 6C) because spaced responding is actually a condition of reinforcement. If training continues further, and reinforcements continue to accumulate in short-term memory, the response trace decay parameter, α, stabilizes to a value (see Figure 6E) that ensures relatively constant response rates (in our simulations the steady state for DRL 10 s is reached after approximately 500 reinforcements; see also Figures 6B and D, which illustrate the pattern of spaced responses and reinforcements after 15,500 s of training).

Figure 6F shows that the average IRT, measured as reinforcements accumulate, stabilizes to a value close to 10, which is the DRL schedule requirement. Consistent with Sidman (1956) and Staddon (1965), we also obtain very short IRTs (bursts); these short IRTs are much more frequent during early training and decrease in frequency with continued exposure to the timing situation (Staddon, 1965). These shorter IRTs are the main source of variability in the average IRT (see Figure 6F).

Figure 7 shows IRT distributions for DRL 10, 20, 30, 40, 50, and 60 (see Figure 7A) and the average IRT value for the same DRL conditions, plotted against the DRL value and compared with experimental data (see Figure 7B). In Figure 7A, the IRTs are plotted in bins that are 20% of the prevailing DRL value (cf. Staddon, 1965). The points on each graph represent the mean IRTs after stabilization under each condition. Only IRTs in cells 6 through 10 were reinforced. The IRT distribution is a measure of temporal discrimination: When responding is random with respect to time, the distribution function is exponential; any peak in the distribution function indicates some form of temporal discrimination. In agreement with experimental data (Staddon, 1965; Wilson & Keller, 1953), Figure 7A indicates good temporal discrimination for DRL values of 20 s or less and poorer temporal discrimination under DRL values of 30 s or more. As the DRL values increase, the IRT distributions also tend to shift toward a point before the minimum reinforced interresponse interval (Staddon, 1965).

The average IRT values after the steady state is reached in different DRL schedules are shown in Figure 7B. For DRL values less than 30 s, the simulation shows functions close to the matching line, similar to pigeon experimental data (Staddon, 1965, e.g., Pigeon 422). For DRL values of 30 s or more, the average IRT is increasingly below the DRL requirement, in partial agreement with data. For both rats and pigeons, there is an upper limit on the sustainable DRL value, but the limit is longer for rats than for pigeons and longer for pigeons pressing a treadle than for pigeons pecking a key (Richelle & Lejeune, 1980). Moreover, beyond the limit, behavior changes drastically: Response rate increases, and the peaked IRT distribution shifts toward the random, exponential form (cf. Staddon, 1965, Figures 1 and 4). In contrast to this drastic change, the model shows a gradual increase in the disparity between average IRT and DRL value as the DRL value increases. Thus, the model implies impaired temporal discrimination for DRL schedules longer than 20 s, for two reasons. First, there are limitations imposed by the short-term memory for reinforcement, which has small values for very long interreinforcement intervals (associated with long IRTs). Second, response variability is also responsible for the failure of both the model and the organism to track long DRL schedules. In these schedules, because the reinforcement frequency is low, the value of the model’s key variables decreases relative to the noise coefficients, such that many responses occur because of the noise itself. Therefore, the tracking of the DRL requirement gradually deteriorates (cf. Figure 5B).

The model makes several other predictions about discrimination performance on DRL schedules. When the reinforcement requirements are changed such that DRL T1 (T1 is the temporal requirement of the reinforcement schedule) shifts to DRL T2, the appropriate change also occurs in the IRT distribution, which is able to track the new reinforcement conditions (Staddon, 1965). Figure 7A also shows that at asymptote responding occurs most frequently close to the lower bound of possible IRTs, thus decreasing the range of reinforced intervals when the DRL value is increased (cf. Malott & Cumming, 1964). This behavior reflects the Weber-law property of timing. The model also correctly predicts that, after sufficient training, IRTs increase during extinction. The reason is that if RF = 0, short-term memory for reinforcement decays to 0 (cf. Equation 6), and thus the response-trace decay rate, α, gradually declines (cf. Equation 7), so increasing the IRT.

Figure 6. Illustration of model dynamics during exposure to a differential reinforcement of low rate 10-s schedule (early and late training). A: Response sequence during early training (high response rates). B: Response sequence during late training (spaced responding). C: Reinforcement sequence during early training (low reinforcement rates). D: Reinforcement sequence during late training (high reinforcement rates). E: The response trace decay rate (α) stabilizes after about 500 reinforcements to a value that ensures relatively fixed response rates. F: Average interresponse time measured over trials.

Figure 7. A: Interresponse time (IRT) distributions for the differential reinforcement of low rate (DRL) 10-, 20-, 30-, 40-, 50-, and 60-s conditions. The size of the class interval in each case is one fifth of the DRL value. Each point is the arithmetic mean of the IRTs recorded during the last 5,000 time units. B: Average IRT values for the DRL 10-, 20-, 30-, 40-, 50-, and 60-s conditions in comparison with the matching function and with experimental data (Staddon, 1965, Subject 422).

Interval Schedules

The prototype of temporal control is steady state performance on FI schedules of reinforcement. On FI, reinforcement is given for the first response at least N s after the previous reinforcer (FI N). After sufficient FI training, animals develop a stable response pattern (FI scallop) in which, after a waiting time proportional to the interval length, responding increases as the time for reinforcement delivery approaches (Ferster & Skinner, 1957). In the following sections we present simulations of the model’s exposure to different FI schedules and evaluate the timing performance by cumulating responses generated during specific segments of the interreinforcement interval. Each curve that we show represents response rates averaged across all interreinforcement intervals (trials) at a given stage of training for a given FI. This averaging is necessary because the slow dynamics of α (Equation 7) allows a maximum of two responses to be generated during each interreinforcement interval. However, responses are not emitted at fixed times. Indeed, the response variation parameter, ν, ensures trial-by-trial variation in the time when responses are generated, with a peak in the response time distribution toward the end of the trial. In the next sections we show the model predictions by averaging responses across trials, and then we show how the model generates temporal response patterns in individual trials by assuming a population of systems like the one in Figure 3 operating in parallel.

Emergence of timing behavior. Characterizing the transient phases of learning is one of the most important objectives of an adaptive model. Consequently, we have examined our model’s behavior during early exposure to an FI schedule. Figure 8 shows the comparison of representative data and model response profiles across a 120-s FI schedule during different learning phases: the second, fourth, and sixth sessions (data—Figure 8A) and the first, fourth, and sixth sessions (model—Figure 8B). Each model session consists of 60 reinforced trials (interreinforcement intervals); cumulative responses are calculated by recording the time of each response within a trial (120 s) and then cumulating responses for all trials in the session. Figure 8B shows that as training progresses, the waiting time increases until it reaches about half the length of the interreinforcement interval. Because the simulations start with a high response decay rate, α = 1, in the first session the response frequency is maximal and therefore the cumulative histogram shows a linear profile, in agreement with Ferster and Skinner’s (1957) data. As learning progresses, α decreases to diminish the response frequency, and this condition ensures that the first response within a trial is generated at least T s after the onset of that trial (in Session 6, T is almost half the length of the trial). The pattern of responding generated by the model, that is, a linear cumulative response histogram and then, in Sessions 4 and 6, a transition to a well-marked scallop (see Figure 8A), is representative of the transition from continuous reinforcement to an FI schedule. This scenario agrees with that reported by Ferster and Skinner.

Figure 8. Development of typical fixed-interval (FI) performance. A: Cumulative records of 1 pigeon during the second (A), fourth (B), and sixth (C) sessions of a 120-s FI schedule. From Schedules of Reinforcement (pp. 137 and 144) by C. Ferster and B. F. Skinner, 1957, New York: Appleton-Century-Crofts. Copyright 1957 by Appleton-Century-Crofts. Adapted with permission. B: Model cumulative records during the first, fourth, and sixth sessions of a 120-s FI schedule.

Assignment of credit. We next explain how the model solves the assignment-of-credit problem in temporal learning by showing how time begins to control behavior in FI schedules. Figure 9 shows how training ensures that the postreinforcement response begins at increasingly later times (wait time increases). Suppose that we start the simulations with a high decay rate parameter that is associated with high response rates (α = 1; cf. Equation 3) or short wait times (e.g., 2–3 s). As training progresses, the short-term memory for reinforcement increases in strength (Equation 6) and thus α begins to decay (Equation 7 and Figure 9A). However, as Equation 5 shows, N, which in the context of FI schedules represents the average response time since the last reinforcement, increases with decaying α, and thus the wait time gradually increases (see Figure 9B) to generate the response pattern shown in Figure 8B. Therefore, the operant R response, controlled by the reinforcement through the decay rate parameter α, begins to show evidence of temporal discrimination.

Figure 9. Assignment of credit. A: Decay rate (α) as a function of response number. B: Wait time gradually increases during fixed-interval training.

Extinction. Figure 10 shows simulation results for extinction after exposure to an FI 30 schedule, in which the average IRT (the inverse of response rate) gradually increases during extinction (which begins at Time 0 in Figure 10). We explain this result (Clark, 1964; Ferster & Skinner, 1957) as follows: When the reinforcement is suppressed (RF = 0 in Equation 6), the short-term memory for reinforcement starts to decay, discharging the response decay rate, α, to zero. When α decreases, the wait time should increase, which is equivalent to increasing the IRTs. Thus, the response switching rate decreases until R responses are no longer emitted. However, although extinction invariably causes response suppression, response rate in extinction does not decrease immediately following the last reinforcer. According to Figure 10, response rate decreases slowly for approximately 1,000 responses, and, after the initial slow decay stage, it decreases at a higher rate until responding is totally abolished. This exponential decay of the response rate after extinction follows Ferster and Skinner’s scenario, according to which response rate during early extinction is maintained for some time and decreases only slowly.

Figure 10. Extinction—average interresponse time (IRT) as a function of the number of responses since the onset of extinction (Time 0).

The explanation for the profile shown in Figure 10 follows from Equations 6 and 7: During extinction the reinforcement is made zero, and thus the STM_RF decays exponentially according to

STM_RF(t) = STM_RF(0) e^(−β₂t), (11)

where t = 0 represents the time when extinction begins. In this situation, the response decay rate α has the solution (see Equation 7)

α(t) = [α(0) + k] e^(−γt) − k e^(−β₂t), (12)

where k = γ STM_RF(0)/(β₂ − γ). It then follows (Equations 5 and 8) that the IRT at time t is given by

N = ⌊ln(w_RO/w_OR) / ln(1 − λα)⌋. (13)



Because β₂ ≫ γ, α(t) can be approximated by [α(0) + k]e^(−γt). Because the product λα(t) is small, we can approximate ln[1 − λα(t)] by −λα(t). Therefore, the IRT in extinction can be approximated by N(t) ≈ c e^(γt), where c = ln(w_OR/w_RO)/{λ[α(0) + k]}; that is, during extinction the IRTs increase exponentially with time (cf. Ferster & Skinner, 1957).
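Chaining these approximations in one line (our restatement): N(t) = ⌊ln(w_RO/w_OR)/ln(1 − λα(t))⌋ ≈ ln(w_OR/w_RO)/[λα(t)] ≈ {ln(w_OR/w_RO)/(λ[α(0) + k])} e^(γt) = c e^(γt).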

We further investigated the dynamics of extinction by looking at how the model characterizes different phases of extinction following FI training. In the experiment by Machado and Cevik (1998), pigeons were exposed to one of two FI schedules (FI 40 s and FI 80 s) for many sessions. After acquisition performance had stabilized, during the second phase of the experiment, each session started with reinforced trials and ended with extinction. Figure 11 shows the normalized response rate immediately after the last food trial, averaged over the first five (early) sessions and the last five (late) sessions of extinction for the two FI schedules. In agreement with the experimental procedures, we exposed the model to 15 extinction sessions (50 trials/session), and we set the duration of each extinction trial to the duration of the FI trial (i.e., either 40 or 80 s). Because there was no pause between model extinction sessions, we deviated slightly from the experimental procedure and omitted the reinforced trials at the beginning of each extinction session. Figure 11 shows that, in agreement with experimental data and confirming the predictions in Figure 10, response rate in the early extinction sessions (calculated as the inverse of the IRT and estimated on a response-by-response basis) is roughly similar for both FI schedules (we calculated the average response rate by binning the FI intervals into 40-s or 80-s subintervals). This is due to the rapid decay of the short-term memory for reinforcement, which makes responding during extinction insensitive to the training history. During the late extinction sessions, however, response rate decreased faster than in the first sessions. The slope ratios of the regression lines for the late versus early phases of extinction are as follows: ratio = 1.2 for FI 40 s and ratio = 1.44 for FI 80 s. These simulation results confirm our mathematical analysis of extinction dynamics: Response rate decreases only slightly during the early phases of extinction and then decreases at a higher rate, approaching an exponential increase of IRTs with time.

Figure 11. Normalized response rate during the early and late phases of extinction for fixed-interval (FI) 40-s (top) and FI 80-s (bottom) acquisition. Data points were adapted from the experiment by Machado and Cevik, 1998 (Figure 5) and then normalized for each schedule. Model response rates were averaged for the first three (early) extinction sessions and for the last three (late) extinction sessions. For each set of lines we calculated the slope of the linear regression to compare the time course of extinction for early and late sessions.

FI timing. We show first how the model predicts the temporal pattern of responses in individual trials for basic FI schedules. The analysis in Figure 5A indicates that there is a relationship between the value of the response variation parameter, ν, and the timing of responses; that is, increasing ν leads to a decrease in the time when a response is generated. This suggests that, by considering a population of units such as those described in Figure 3 by Equations 1–7, where each unit has a different ν value and generates independent responses, it is possible to sum responses that occur at different times to obtain the temporal response pattern observed in individual trials (Ferster & Skinner, 1957). Figure 12A shows that reinforcement, which is common to all the units, controls the response decay parameter (the same for all the units) to generate responses R_1, R_2, . . . , R_N at times controlled by the response variation parameters ν_1, ν_2, . . . , ν_N. Because we have already demonstrated that each response unit has a tendency to generate responses toward the end of the interval, by summing responses from all units we can get a pattern that resembles the sigmoidal response curves in individual trials. And indeed, Figure 12B shows that, for two different interval schedules, FI 30 s and FI 60 s, cumulative-response histograms in successive 1-s segments of the interval generally follow a monotonically increasing curve. For these simulations we used ν parameters in the range 0.02–0.06 (i.e., in the quasi-linear regime; cf. Figure 5A, inset), and we collected all responses generated in Trial 500 for each FI schedule (the results hold for responses collected in any other trial during the final stage of FI acquisition, i.e., after performance has stabilized).

Figure 12. Temporal response patterns in individual trials. A: Parallel response unit model. Each box represents a unit similar to that described in Figure 3. Reinforcement is common to all response units. Responses R_1, R_2, . . . , R_N are generated at times controlled by the response variation parameters ν_1, ν_2, . . . , ν_N. B–C: Cumulative response curves in individual trials shown in successive 1-s segments of the interval for different fixed-interval (FI) schedules. In these simulations we used 40 response units with ν in the range 0.02–0.06.
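A population version is a thin wrapper over a single unit. The following sketch reflects our construction (the fixed per-unit α and the reset of the traces at trial onset are simplifying assumptions, not part of the published simulations):

```python
import random

def unit_trial(trial_len, nu, alpha=0.3, w_or=0.2, w_ro=0.15, lam=0.097):
    """One FI trial of a single unit with fixed alpha; 1 marks an R response."""
    O, R = 1.0, 0.0                     # reset by the reinforcer at trial onset
    clip = lambda x: min(max(x, 0.0), 1.0)
    resp = []
    for _ in range(trial_len):
        A_O = w_or * R + nu * clip(random.gauss(0, 1))   # Equation 9
        A_R = w_ro * O + nu * clip(random.gauss(0, 1))   # Equation 10
        if A_R > A_O:
            R, O = 1.0, (1 - lam * alpha) * O
            resp.append(1)
        else:
            O, R = 1.0, (1 - lam * alpha) * R
            resp.append(0)
    return resp

# 40 units with nu spread over 0.02-0.06, summed second by second (cf. Figure 12)
nus = [0.02 + 0.04 * i / 39 for i in range(40)]
per_second = [sum(col) for col in zip(*(unit_trial(30, nu) for nu in nus))]
cumulative = [sum(per_second[:t + 1]) for t in range(30)]
```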

Figure 10. Extinction: Average interresponse time (IRT) as a function of the number of responses since the onset of extinction (Time 0).

Figure 9. Assignment of credit. A: Decay rate (λ) as a function of response number. B: Wait time gradually increases during fixed-interval training.

We next use our model to predict average performance levels for FI schedules (see Figure 13). Figure 13A represents the total number of responses during each 4-s portion of the interval between successive reinforcements, calculated after the steady state was reached. We consider all responses for the first 300 reinforcements for each of the FI 32, FI 64, FI 128, and FI 256 schedules. The model is consistent with experimental data (Schneider, 1969) in predicting that, on average, the animal pauses immediately after the reinforcement and then, after about two thirds of the interval, the average response rate follows a monotonically increasing curve (see Figure 13A). One interesting feature of this response pattern is shown in Figure 13B: The wait time is proportional to the length of the interval (we measure wait time as the interval segment in which the response rate is at most 5% of the terminal response rate). This behavior is a consequence of the dynamics of the decay parameter, λ, which adjusts continuously during learning. According to Equations 5 and 7, higher reinforcement rates (low FIs) ensure higher values of λ (see Figure 13D), which, according to Equation 5, produce shorter wait times.
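As a concrete reading of this wait-time criterion, the sketch below (Python) scans a hypothetical binned rate profile for the first bin whose rate exceeds a criterion fraction of the terminal rate; note that the text states 5% whereas the Figure 13B caption states 10%, so the threshold is left as a parameter. The profile values are invented for illustration.

```python
import numpy as np

def wait_time(binned_rate, bin_s, criterion=0.05):
    """Wait time read off a binned response-rate profile: the time at
    which rate first exceeds `criterion` times the terminal rate.
    This is an assumed reading of the criterion, not the authors' code."""
    terminal = binned_rate[-1]
    above = np.nonzero(binned_rate > criterion * terminal)[0]
    return above[0] * bin_s if above.size else len(binned_rate) * bin_s

# Hypothetical 4-s-binned profile for an FI 32 trial (cf. Figure 13A):
profile = np.array([0.0, 0.0, 0.1, 0.4, 1.2, 2.5, 3.8, 4.6])
print(wait_time(profile, 4.0))   # -> 12.0 s for this invented profile
```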

After the first reinforcement is obtained, the decay rate of the two response units, λ, starts decreasing (see Equation 7), reducing the response switching rate (λ modulates the strength of behavioral competition; see also the explanation of spaced responding in the Spaced Responding section). As the reinforcement schedule is made richer (e.g., FI 32 in Figure 13A), the frequency of reinforcement increases and λ stabilizes at larger values (see Equation 7 and Figure 13D). However, because λ controls the response switching rate, smaller decay rates ensure that responses are emitted less frequently, and therefore the probability of long O responses ("other" responses, or wait time) increases as the reinforcement frequency decreases.

Figure 11. Normalized response rate during the early and late phases of extinction for fixed-interval (FI) 40-s (top) and FI 80-s (bottom) acquisition. Data points were adapted from the experiment by Machado and Cevik (1998, Figure 5) and then normalized for each schedule. Model response rates were averaged over the first three (early) extinction sessions and the last three (late) extinction sessions. For each set of lines we calculated the slope of the linear regression to compare the time course of extinction for early and late sessions.

Figure 12. Temporal response patterns in individual trials. A: Parallel response unit model. Each box represents a unit similar to that described in Figure 3. Reinforcement is common to all response units. Responses R1, R2, . . . , RN are generated at times controlled by the response variation parameters σ1, σ2, . . . , σN. B–C: Cumulative response curves in individual trials shown in successive 1-s segments of the interval for different fixed-interval (FI) schedules. In these simulations we used 40 response units with σ in the range 0.02–0.06.

Therefore, as Figure 13, A and D, show, the richer the reinforcement schedule, the higher the asymptote to which λ decays, determining shorter wait times and higher response rates.
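One way to see the link between a lower asymptotic λ and longer wait times is to treat switching out of the O state as a stochastic process whose rate is set by λ, so that the mean O-bout duration scales as 1/λ. The sketch below (Python) makes that assumption explicit; the exponential bout distribution and the particular λ values are illustrative, not taken from the model equations.

```python
import numpy as np

rng = np.random.default_rng(4)

# If lambda sets the rate of switching out of the "other" (O) state,
# O-bout durations are roughly exponential with mean 1/lambda, so the
# lower asymptotic lambda of lean schedules implies longer wait times.
for fi, lam in [(32, 0.08), (256, 0.02)]:   # hypothetical asymptotic lambdas
    bouts = rng.exponential(1.0 / lam, size=100_000)
    print(f"FI {fi:3d}: mean O bout ~ {bouts.mean():5.1f} s")
```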

Proportional timing and Weber-law (scalar) timing are illustrated in Figure 13, B and C. Proportional timing is proportionality between a measure of temporal control, such as peak time (on the peak-interval procedure; Catania, 1970; Church, Meck, & Gibbon, 1994) or wait time (on FI), and the to-be-timed interval. Weber-law (scalar) timing is the approximate constancy of the COV (standard deviation divided by the mean of a temporal dependent variable), or the superposition of normalized distributions scaled to their mean value (i.e., response rate at any point in the interval is a fixed proportion of the maximum rate at the end of the interval). We show a comparable plot in Figure 13C, which presents average response rates relative to the maximum response rate along the interval, normalizing both axes, for the FI schedules analyzed in Figure 13A. Figure 13C shows that the curves for FIs ranging from 32 s to 256 s superimpose, illustrating Weber-law timing (the scalar property). Figure 13C also shows that as the FI requirement increases, for example, FI 256, the model predicts that the peak response is obtained in the interval just before reinforcement, a deviation from proportional timing. This behavior is a consequence of the way variability is incorporated into the model.

Figure 13. Fixed-interval (FI) performance. A: Total number of responses during each 4-s portion of the interval between successive reinforcements, calculated for the first 300 reinforcements after stable performance was reached, for each of the FI 32, FI 64, FI 128, and FI 256 schedules. Data are adapted from Schneider (1969). B: Proportional timing: the ratio between the wait time (response rate less than 10% of the terminal rate) and the corresponding FI value is approximately constant. C: Relative response rates as a function of relative time in the interval for the four reinforcement schedules from A. D: Response trace decay rate, λ, as a function of the number of reinforcements in each of the four schedules from A.

When the FI requirement is large, the weight of the additive noise increases relative to that of the response decay parameter, λ (Figure 13D shows that for high FIs λ stabilizes at lower values), degrading the timing performance (the same "degradation" feature of the model was found for large DRL schedules; see the Spaced Responding section). This result is not entirely inconsistent with experimental data: For instance, Zeiler and Powell (1994) showed that the Weber-law property of interval timing does not hold at longer FIs. One way to correct this problem of the model is to decrease the parameter σ, which controls the strength of response variation (Equations 9 and 10). Decreasing σ would cause the IRTs to increase on average (see Figure 5A) so as to generate better temporal discrimination at high FI schedules (however, this would degrade to some extent the model's behavior at low FIs by violating Weber's law, which might be inconsistent with experimental data; but see Schneider's, 1969, data at low FIs). We were able to solve this puzzle (data not shown) by randomizing the σ parameter between 0.02 and 0.05 and running the simulations with a random σ for each block of sessions (because σ controls the strength of response variation, we suggest that individual variation in this parameter could explain the individual differences in timing performance observed experimentally).
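The scalar property invoked here reduces to the constancy of the COV. As a minimal numerical check of what that constancy means (with an invented wait-time generator, not the model's equations), the sketch below (Python) draws wait times whose standard deviation grows in proportion to their mean and confirms that the COV comes out the same at every FI.

```python
import numpy as np

rng = np.random.default_rng(2)

# Scalar (Weber-law) timing in miniature: if wait-time spread grows in
# proportion to its mean, the coefficient of variation is constant.
# Hypothetical generator: mean wait = 0.4*FI, sd = 0.12*FI.
for fi in (32, 64, 128, 256):
    waits = rng.normal(loc=0.4 * fi, scale=0.12 * fi, size=10_000)
    print(f"FI {fi:3d}: COV = {waits.std() / waits.mean():.3f}")  # ~0.30 each
```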

Rapid timing. The model's ability to explain the dynamics of timing was tested with the "impulse" experiment, a procedure used by Higa, Wynne, and Staddon (1991) to demonstrate rapid temporal learning in pigeons. Figure 14 shows experimental data along with model predictions for the impulse experiment: Pigeons were exposed to 100 FI 15-s intervals interspersed with occasional shorter FIs (5 s in duration, the impulse). The location of the impulse was unpredictable. The question is whether exposure to the short FI has an effect on subsequent wait times. Figure 14 presents the results from a baseline condition (top), in which all FIs were 15 s, and from the impulse condition (bottom). On average, pigeons had shorter postfood pauses just in the 15-s interval after the impulse (the dotted line in Figure 14, bottom), and then the wait time in subsequent FIs returned to the levels observed in the FIs preceding the impulse. Our model can explain this type of rapid timing (see Figure 14) through the dynamic interplay between the short-term memory for reinforcement, STMRF, and the response decay parameter, λ. When the short FI (or impulse) is presented, STMRF transiently increases because the reinforcement occurs earlier than during the preceding trials. The increase in STMRF causes a transient increase in λ (Equation 7), which thus decreases the IRT (Equation 5) and, correspondingly, the wait time. Because subsequent trials are characterized by longer FIs (i.e., 15 s), STMRF decreases again to restore baseline wait times. The only model parameter controlling this effect is the rate of increase and decrease of λ (i.e., β; see Equation 7). Therefore, to increase the model's impulse sensitivity, we ran the Figure 14 simulations using a high β value (0.2); this setting ensures that rapid timing will also occur after longer impulse intervals or at large FIs (unlike another dynamic model of temporal learning, Machado, 1997b, which predicts that the results of the impulse experiment will not hold at large FI values).
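Because Equations 5 and 7 are not reproduced in this section, the following sketch (Python) substitutes assumed first-order dynamics: a reinforcement memory STM_RF that jumps to the latest reinforcement rate 1/FI, a decay rate lambda that tracks it at the rate β = 0.2 quoted above, and a wait time read out as a constant divided by lambda. It reproduces the qualitative impulse effect, a transiently shortened wait followed by recovery; every functional form and constant besides β is an assumption.

```python
beta = 0.2                  # rate of increase/decrease of lambda (from text)
lam = 0.05                  # assumed initial decay rate

def after_reinforcement(fi_s, lam):
    """Assumed first-order stand-in for Equations 5 and 7 (not
    reproduced here): the reinforcement memory STM_RF jumps to the
    latest reinforcement rate 1/FI, lambda tracks it at rate beta,
    and wait time is read out as a constant divided by lambda."""
    stm_rf = 1.0 / fi_s                 # memory tracks the latest interval
    lam += beta * (stm_rf - lam)        # Equation-7-like update
    return lam

schedule = [15.0] * 10 + [5.0] + [15.0] * 6   # one "impulse" at trial 11
for trial, fi in enumerate(schedule, 1):
    lam = after_reinforcement(fi, lam)
    print(f"trial {trial:2d}  FI {fi:4.1f} s  wait ~ {0.4 / lam:4.1f} s")
```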

Timing two intervals. Experimental evidence shows that animals are able to time more than one interval, even in the absence of discriminative cues. For instance, Catania and Reynolds (1968) exposed pigeons to two FIs in random order, that is, mixed FI-FI. Each reward is followed by either an FI-N1 or an FI-N2 schedule (N1 < N2), the two independent variables across conditions being the length of the short interval (N1) and its probability of occurrence (p). Because our model incorporates only one adaptive oscillator, one might expect it to be unable to deal with more than one to-be-timed interval. However, because the data from experiments such as Catania and Reynolds's are averages over many intervals, it is reasonable to ask whether our model can show average behavior of this sort. Indeed, if the response pattern is highly variable (high noise level), "random" responses are likely to be assigned to either short or long intervals. Therefore, when the schedule components N1 and N2 have equal probabilities, that is, a high probability of repetitive sequences of the form N1^k and N2^k, the short-term memory for reinforcement can change quickly in response to changes in reinforcement frequency to produce an average response pattern that shows two peaks, given a suitable noise level.

Figure 15 shows simulation results for a mixed FI N-FI 240 schedule, where N equaled 100, 140, and 240 s (small arrows). Each of the two mixed FIs occurred randomly with probability .5. One feature of the mixed FI-FI schedules is that for large differences between the two intervals, the animal's response pattern shows two peaks centered at N1 and N2. This response profile is illustrated in Figure 15A, in which N1 = 100 s and N2 = 240 s.

Figure 14. The impulse experiment. Data were adapted from Experiment 3 of Higa et al. (1991). Top: The baseline condition shows data and model normalized wait time (t′) for 32 nonimpulse trials (fixed interval [FI] 15 s). Bottom: The impulse condition shows data and model normalized wait time during the impulse (FI 5 s) and the 15 preceding and 16 following nonimpulse trials. The data are normalized such that t′ = (t − tmin)/(tmax − tmin). The values of tmin and tmax were calculated separately for data and model wait times in each condition according to the shortest and longest wait time.

In agreement with Catania and Reynolds (1968), the magnitude of the first peak is larger than that corresponding to the longer interval, although our model's prediction overestimates the difference in magnitude. Another interesting feature of mixed FI-FI schedules is that as N1 increases, responding starts later in the interval. Figure 15B shows that when N1 = 140 s, the response pattern shifts toward the right, and thus the wait time increases; when N1 = 240 s, there is only one peak, around 240 s (see Figure 15C). As expected, when the model was tested with lower short-interval probabilities (p), we failed to predict Catania and Reynolds's experimental data (e.g., when p is low, animals are still able to track the two intervals, except that the magnitude of the first peak is much smaller than the terminal response rate). We also failed to predict the correct behavior when N1 was made much smaller than N2. Figure 15D confirms that response variation is the key variable controlling the occurrence of multiple peaks and shows the correlation between the noise level and the magnitude of the first peak (at the short interval) for the representative mixed FI 100-FI 240 schedule. Confirming our prediction, the magnitude of the first peak increases at large noise levels. Altogether, the mixed FI-FI simulations suggest that even a model incorporating only one adaptive oscillator can track two intervals, provided that their occurrence probabilities and the response variability level are high enough.
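A toy version of this two-peak argument (Python; not the model of Equations 1–7) is sketched below: on each trial a quickly updated target tracks whichever interval was just reinforced, and broad response noise spreads responses around it, so the pooled histogram shows peaks near both N1 and N2 when the noise is ample and a single peak when N1 = N2. All distributions and constants are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def pooled_histogram(n1, n2, noise_sd, n_trials=20_000, bin_s=10):
    """Toy average over mixed FI n1 / FI n2 trials (p = .5 each):
    response times cluster near whichever interval the quickly
    updated reinforcement memory currently tracks, broadened by
    response noise."""
    targets = rng.choice([n1, n2], size=n_trials)
    times = rng.normal(targets, noise_sd)
    times = times[times >= 0]
    hist, _edges = np.histogram(times, bins=np.arange(0, n2 + 60, bin_s))
    return hist

# Two peaks for FI 100 / FI 240 with ample noise; one peak if N1 = N2.
print(pooled_histogram(100, 240, noise_sd=25.0))
print(pooled_histogram(240, 240, noise_sd=25.0))
```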

Discussion

The study of interval timing in animals has long been dominated by SET, a cognitive, information-processing approach (Gibbon, 1991). Nevertheless, there are several other theories waiting in the wings: the BET (Killeen, 1991; Killeen & Fetterman, 1988; Machado, 1997b), a multiple-oscillator neural-net theory similar to SET in several respects (Church & Broadbent, 1990), a spectral-timing real-time neural-net theory (Grossberg & Schmajuk, 1989), and, most recently, a memory-based theory that relates interval timing and habituation (the MTS theory; Staddon & Higa, 1999). Why add to this long list? Our reason is that, with the exception of the MTS model, all these approaches rely on some kind of internal oscillator, pacemaker, or discretization of time by units tuned to a certain time scale, for which the evidence is slight. If interval timing is to be understood, we believe that the domain of theory has to be searched just as thoroughly as the domain of data. The almost total dominance of the theoretical arena by explicit time-representation theories constitutes an unjustifiable historical bias. Our exploration of the adaptive timer is intended to draw attention to the largely unexplored set of simple timing processes that do not incorporate any kind of pacemaker but depend instead on familiar learning processes such as memory, competition, and response variation.

Figure 15. Total responses in mixed fixed-interval/fixed-interval (FI-FI) schedules. A: Mixed FI 100-FI 240. The arrows indicate the duration of each of the two FIs. The shorter FI occurred with probability .5. B: Mixed FI 140-FI 240. C: Simple FI 240 schedule. D: Amplitude of the first peak (for the short interval) relative to the maximum (max.) peak as a function of noise level. In all FI-FI simulations we averaged all responses generated during the last 100,000 time units after the asymptote in responding was reached.

The idea behind our model is that temporal discrimination can emerge from reward-modulated transitions between response classes. This assumption is motivated by experimental data showing a significant relationship between the rate of transition between response classes and measures of response timing (Staddon & Simmelhag, 1971). Indeed, when reinforcement depends on a response (as in spaced responding or interval timing), this response follows the same time course as food-related terminal responses (or R responses). In contrast, when reinforcement is random in time (as in variable-interval schedules) and the response pattern does not show temporal regularity, interim responses (or O responses) occur in random alternation with food-related responses (for instance, pigeons responding on variable-interval schedules often show brief turning-away movements alternating with key pecking; Staddon & Simmelhag, 1971).

In contrast to the more traditional internal clock-pacemaker theories, which have always claimed strong neurobiological support, our adaptive timer model is not specific about the precise neural mechanism underlying timing behavior and does not assume that timing mechanisms are similar in all brain systems (e.g., from those controlling eyeblink conditioning to those regulating key pecking or bar pressing). Instead, we propose a simple network mechanism that could have several neural implementations. For example, it is possible that the two types of response, R and O (Equations 1–4), could be represented by two populations of neurons that mutually inhibit each other (through two different types of interneuron populations). A third neuronal population could inhibit both the R and O populations to control the response decay rate, λ (Equations 7–8), which could be implemented as an experience-dependent decrease in neuronal firing rates (e.g., changes in neuronal responses induced by adaptation; Dragoi, Sharma, & Sur, 2000). The most difficult question is how to implement the multiplicative interaction between the response decay rate, λ, and the strength of the response (Equations 1–2). However, it has been shown previously (Salinas & Abbott, 1996) that multiplicative responses can arise in a neural network model through population effects. Specifically, neurons in a recurrent network with excitatory connections between nearby neurons (for instance, subpopulations of R responses) and inhibitory connections between more distant neurons (for instance, connections between R and O populations) can perform a product operation on additive synaptic inputs (see also Mel, 1993, for a cellular implementation of nonlinear operations).
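The circuit proposed here can be caricatured in a few lines of rate-model code. The sketch below (Python) implements two mutually inhibitory populations, R and O, plus a third signal g that multiplicatively scales R's drive, standing in for the decay-rate modulation and for the population-based product of Salinas and Abbott (1996); all weights, gains, and time constants are invented for illustration.

```python
def step(r, o, g, inp_r, inp_o, dt=0.01, w=1.2):
    """One Euler step of a toy rate model of the circuit sketched
    above (all weights and time constants are assumptions): R and O
    mutually inhibit each other, and a third population's output g
    scales R's drive multiplicatively, a stand-in for the product
    operation of Salinas and Abbott (1996)."""
    dr = -r + max(0.0, g * inp_r - w * o)   # gated, rectified drive to R
    do = -o + max(0.0, inp_o - w * r)       # rectified drive to O
    return r + dt * dr, o + dt * do

for g in (0.5, 1.5):                        # low vs. high modulatory gain
    r, o = 0.1, 0.1
    for _ in range(2000):
        r, o = step(r, o, g, inp_r=1.0, inp_o=0.9)
    print(f"g = {g}: R = {r:.2f}, O = {o:.2f}")  # O wins at low g, R at high g
```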

The present study is original in two ways. First, we implement our analysis at a different, lower level than all previous models of temporal learning and use a parsimonious approach in our search for the basis of timing behavior. We deviate from most current theorizing about temporal learning by eschewing completely any kind of internal clock, internal time-representation assumption, or resetting mechanism. Instead, we test the effects of a simple "nontemporal" hypothesis: that overall reinforcement rate (arousal) modulates the strength of response variation and selection. We use this simple principle to explain several aspects of the temporal regulation of behavior. We suggest that assumptions about internal representations of time (including the ubiquitous clock hypothesis) are not necessary to explain timing behavior, and we then show that our model's principles capture basic properties of spaced responding and interval timing, the most representative temporal-learning phenomena in animals.

Second, our timing model is a real-time model, in contrast to most current models of temporal control, which emphasize steady-state performance with little concern for the behavioral trajectory preceding asymptote. We describe both the steady states and the dynamics of timing behavior, in both spaced responding and interval timing. Our model accounts reasonably well for major results in FI schedules and other procedures. We were able to predict basic acquisition-extinction performance in simple FI schedules, including the scalar property of timing; rapid temporal control; and, in some conditions, performance in timing two intervals (mixed FI-FI schedules). However, because of the simplicity of the model, we could not avoid certain incorrect predictions.

In its basic implementation (Equations 1–7), the model is unable to generate the within-trial pattern of responding after reinforcement in interval timing experiments, although it correctly predicts the average temporal pattern over multiple trials. However, we have addressed this limitation, which is shared by all steady-state timing models, by showing that a population of units such as that described in Figure 3 (but with different response variation parameters) can act in parallel to characterize the response pattern on individual trials. The population model is also capable of explaining the other timing data sets discussed in the article, but for reasons of parsimony we studied the single-unit model first (see Figure 3).

Because model parameters are kept fixed for most simulations, the model is sometimes unable to cover the full parametric range encountered in timing experiments. For instance, in mixed FI-FI schedules we were unable to predict that animals exposed to a low-probability small FI and a high-probability large FI are still able to time both intervals accurately. The model is also unable to predict that when the short FI is much smaller than the long FI (e.g., Leak & Gibbon, 1995) but the probability of the short interval is high, animals are able to time the two intervals independently and simultaneously. We suggest that to explain the full range of such multiple-timing phenomena, the model should be endowed with at least two adaptive oscillators operating in parallel with different short-term memory parameters.

Finally, the model lacks resources for stimulus–stimulus and stimulus–response associations. This limitation is not a problem for explaining the temporal properties of responses in operant conditioning, where the onset of a trial is signaled by the delivery of reinforcement. However, the lack of external stimulus inputs means that the simple model cannot be applied to timing in classical conditioning paradigms. Two previous models (Buhusi & Schmajuk, 1999; Grossberg & Schmajuk, 1989) are able to explain temporal discrimination in simple conditioning and occasion-setting experiments. However, these models have not been applied to static and dynamic operant conditioning paradigms.

In summary, we have demonstrated that a timing process that incorporates familiar learning principles is able to explain the basic steady-state properties of temporal discrimination in animals, as well as to describe how temporal regulation develops in real time. We showed that two principles, response variation and selection and its modulation by overall arousal level, are sufficient to achieve temporal regulation, opening a promising avenue toward a unified framework for "timing" and "nontiming" learning phenomena.

References

Breukelaar, J. W., & Dalrymple-Alford, J. C. (1999). Effects of lesions to the cerebellar vermis and hemispheres on timing and counting in rats. Behavioral Neuroscience, 113, 78–90.

Buhusi, C. V., & Schmajuk, N. A. (1999). Timing in simple conditioning and occasion setting: A neural network approach. Behavioural Processes, 45, 33–57.

Cahill, G. M., Hurd, M. W., & Batchelor, M. M. (1998). Circadian rhythmicity in the locomotor activity of larval zebrafish. Neuroreport, 26, 3445–3449.

Catania, A. C. (1970). Reinforcement schedules and the psychophysical judgements: A study of some temporal properties of behavior. In W. N. Schoenfeld (Ed.), The theory of reinforcement schedules (pp. 1–42). New York: Appleton-Century-Crofts.

Catania, A. C., & Reynolds, G. S. (1968). A quantitative analysis of the responding maintained by interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 11, 327–383.

Cevik, M. O. (2000). Changes in signal durations and methamphetamine on duration discrimination. Unpublished doctoral dissertation, Indiana University, Bloomington.

Church, R. M. (1984). Properties of the internal clock. In J. Gibbon & L. G. Allan (Eds.), Annals of the New York Academy of Sciences: Timing and time perception (pp. 566–582). New York: New York Academy of Sciences.

Church, R. M., & Broadbent, H. A. (1990). A connectionist model of timing. In M. L. Commons, S. Grossberg, & J. E. R. Staddon (Eds.), Quantitative models of behavior: Neural networks and conditioning (pp. 135–155). Hillsdale, NJ: Erlbaum.

Church, R. M., Meck, W. H., & Gibbon, J. (1994). Application of scalar timing theory to individual trials. Journal of Experimental Psychology: Animal Behavior Processes, 20, 135–155.

Clark, F. C. (1964). Effects of repeated VI reinforcement and extinction upon operant behavior. Psychological Reports, 15, 943–955.

Dragoi, V. (1997). A dynamic theory of acquisition and extinction in operant learning. Neural Networks, 10, 201–229.

Dragoi, V., Sharma, J., & Sur, M. (2000). Adaptation-induced plasticity of orientation tuning in adult visual cortex. Neuron, 28, 287–298.

Dragoi, V., & Staddon, J. E. R. (1999). The dynamics of operant conditioning. Psychological Review, 106, 20–61.

Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. New York: Appleton-Century-Crofts.

Gibbon, J. (1977). Scalar expectancy theory and Weber's law in animal timing. Psychological Review, 84, 279–325.

Gibbon, J. (1991). Origins of scalar timing. Learning and Motivation, 22, 3–38.

Gibbon, J. (1995). Dynamics of time matching: Arousal makes better seem worse. Psychonomic Bulletin & Review, 2, 208–215.

Gibbon, J., & Church, R. M. (1992). Comparison of variance and covariance patterns in parallel and serial theories of timing. Journal of the Experimental Analysis of Behavior, 57, 393–406.

Grossberg, S., & Schmajuk, N. A. (1989). Neural dynamics of adaptive timing and temporal discrimination during associative learning. Neural Networks, 2, 79–102.

Higa, J. J., & Staddon, J. E. R. (1997). Dynamic models of rapid temporal control in animals. In C. M. Bradshaw & E. Szabadi (Eds.), Time and behavior: Psychological and neurobiological analyses (pp. 1–40). Amsterdam: Elsevier Science.

Higa, J. J., Wynne, C. D., & Staddon, J. E. R. (1991). Dynamics of time discrimination. Journal of Experimental Psychology: Animal Behavior Processes, 17, 281–291.

Kelleher, R. T., Fry, W., & Cook, L. (1959). Interresponse time distribution as a function of differential reinforcement of temporally spaced responses. Journal of the Experimental Analysis of Behavior, 2, 91–106.

Killeen, P. R. (1991). The behavior's time. In G. H. Bower (Ed.), The psychology of learning and motivation (pp. 295–334). New York: Academic Press.

Killeen, P. R., & Fetterman, J. G. (1988). A behavioral theory of timing. Psychological Review, 95, 274–285.

Leak, T. M., & Gibbon, J. (1995). Simultaneous timing of multiple intervals: Implications of the scalar property. Journal of Experimental Psychology: Animal Behavior Processes, 21, 3–19.

Machado, A. (1997a). Increasing the variability of response sequences in pigeons by adjusting the frequency of switching between two keys. Journal of the Experimental Analysis of Behavior, 68, 1–25.

Machado, A. (1997b). Learning the temporal dynamics of behavior. Psychological Review, 104, 241–265.

Machado, A., & Cevik, M. (1998). Acquisition and extinction under periodic reinforcement. Behavioural Processes, 44, 237–262.

Malott, R. W., & Cumming, W. W. (1964). Schedules of interresponse time reinforcement. Psychological Record, 14, 211–252.

Maricq, A. V., Roberts, V., & Church, R. M. (1981). Methamphetamine and time estimation. Journal of Experimental Psychology: Animal Behavior Processes, 7, 18–30.

Meck, W. H. (1996). Neuropharmacology of timing and time perception. Cognitive Brain Research, 3, 227–242.

Meck, W. H., Church, R. M., Wenk, G. L., & Olton, D. S. (1987). Nucleus basalis magnocellularis and medial septal area lesions differentially impair temporal memory. Journal of Neuroscience, 7, 3505–3511.

Mel, B. W. (1993). Synaptic integration in an excitable dendritic tree. Journal of Neurophysiology, 70, 1086–1101.

Mirmiran, M., Swaab, D. F., Kok, J. H., Hofman, M. A., Witting, W., & Van Gool, W. A. (1992). Circadian rhythms and the suprachiasmatic nucleus in perinatal development, aging and Alzheimer's disease. Progress in Brain Research, 93, 151–162.

Odum, A. L., Lieving, L. M., & Schaal, D. W. (2002). Effects of d-amphetamine in a temporal discrimination procedure: Selective changes in timing or rate dependency? Journal of the Experimental Analysis of Behavior, 78, 195–214.

Pearson, K. G., & Fourtner, C. R. (1975). Nonspiking interneurons in the walking system of the cockroach. Journal of Neurophysiology, 38, 33–52.

Perrett, S. P., Ruiz, B. P., & Mauk, M. D. (1993). Cerebellar cortex lesions disrupt learning-dependent timing of conditioned eyelid responses. Journal of Neuroscience, 13, 1708–1718.

Richelle, M., & Lejeune, H. (1980). Time in animal behavior. Oxford, England: Pergamon.

Salinas, E., & Abbott, L. F. (1996). A model of multiplicative neural responses in parietal cortex. Proceedings of the National Academy of Sciences, USA, 93, 11956–11961.

Santi, A., Weise, L., & Kuiper, D. (1995). Memory for event duration in rats. Learning and Motivation, 26, 83–100.

Schneider, B. (1969). A two-state analysis of fixed-interval responding in the pigeon. Journal of the Experimental Analysis of Behavior, 12, 667–687.

Shurtleff, D., Raslear, T. G., Genovese, R. F., & Simmons, L. (1992). Perceptual bisection in rats: The effects of physostigmine, scopolamine and pirenzepine. Physiology and Behavior, 51, 381–390.

Sidman, M. (1956). Time discrimination and behavioral interaction in a free operant situation. Journal of Comparative and Physiological Psychology, 49, 469–473.

Spetch, M. L., & Treit, D. (1984). The effect of d-amphetamine on short-term memory for time in pigeons. Pharmacology Biochemistry and Behavior, 21, 663–666.

Staddon, J. E. R. (1965). Some properties of spaced responding in pigeons. Journal of the Experimental Analysis of Behavior, 8, 19–27.

Staddon, J. E. R. (1977). Schedule-induced behavior. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 125–152). Englewood Cliffs, NJ: Prentice-Hall.

Staddon, J. E. R. (2001). Adaptive dynamics: The theoretical analysis of behavior. Cambridge, MA: MIT Press.

Staddon, J. E. R., Chelaru, I. M., & Higa, J. J. (2002). A tuned-trace theory of interval-timing dynamics. Journal of the Experimental Analysis of Behavior, 77, 105–124.

Staddon, J. E. R., & Higa, J. J. (1999). Time and memory: Towards a pacemaker-free theory of interval timing. Journal of the Experimental Analysis of Behavior, 71, 215–251.

Staddon, J. E. R., & Simmelhag, V. G. (1971). The "superstition" experiment: A reexamination of its implications for the principles of adaptive behavior. Psychological Review, 78, 3–43.

Staddon, J. E. R., & Zhang, Y. (1991). On the assignment-of-credit problem in operant learning. In M. L. Commons, S. Grossberg, & J. E. R. Staddon (Eds.), Neural networks of conditioning and action: The XIIth Harvard Symposium (pp. 279–293). Hillsdale, NJ: Erlbaum.

Treisman, M. (1963). Temporal discrimination and the indifference interval: Implications for a model of the internal clock. Psychological Monographs, 77, 1–31.

Turek, F. W. (1994). Circadian rhythms. Recent Progress in Hormone Research, 49, 43–90.

Welsh, D. K., Logothetis, D. E., Meister, M., & Reppert, S. M. (1995). Individual neurons dissociated from rat suprachiasmatic nucleus express independently phased circadian firing rhythms. Neuron, 14, 697–706.

Welzl, H., Berz, S., & Battig, K. (1991). The effects of the noncompetitive NMDA receptor antagonist MK 801 on DRL performance in rats. Psychobiology, 19, 211–216.

Wilson, M. P., & Keller, F. S. (1953). On the selective reinforcement of spaced responses. Journal of Comparative and Physiological Psychology, 46, 190–193.

Zeiler, M. D., & Powell, D. G. (1994). Temporal control in fixed-interval schedules. Journal of the Experimental Analysis of Behavior, 61, 1–9.

Received January 8, 1999
Revision received January 8, 2002
Accepted January 24, 2002
