Rescorla & Wagner 1972

Embed Size (px)

Citation preview

  • 8/11/2019 Rescorla & Wagner 1972

    1/18

    Rescorla, R. A, & Wagner, AR. (1972). A theory of Pavlovian conditioning:variations in the effectiveness of reinforcement and non reinforcement.In AH. Black & W.F. Prokasy (eds.), Classical conditioning II: current researchand theory (pp. 64-99) New York: Appleton-Century-Crofts.

    ROBERT A. RESCORLA

    ALLAN R. WAGNER

    3A Theory of Pavlovian Conditioning:Variations in the Effectiveness of

    Reinforcement and Nonreinforcement

    In several recent papers (Rescorla, 1969; Wagner, 1969a, 1969b) wehave entertained similar theories of Pavlovian conditioning. The separatestatements have in [act differed more in the language of their expressionthan in their substance. The major intent of the present paper is to ex-plicate a more precise version of the form of theory involved, and toindicate how it may be usefully applied to a variety of phenomena involving asso('iative learning.

    The impetus for a new theoretical model is not generally a newdatum which dearly disconfirms existing theory. It is more likely to bethe accumulation of a salient pattern of data, separate portions of whichmay be ade(luately handled by separate existing theories, but which apopears to invite a more integrated theoretical account. Such, at least, isthe better description of the background of the present work.

    In the sections which follow we will first describe certain data fromour laboratories which exemplify the kind of observations which haveencouraged the present theorizing. The theory will then be presented insufficient detail to show how it may beapplied to experimental situationsinvolving a variety of Pavlovian conditioning arrangements. Finally, wewill briefly discuss the theory in relationship to more conventional

    approaches.

    BACKGROUND

    The background data pauern embraces a considerable range ofphenomena. At the core, however, is a rather simple set of observationsinvolving Pavlovian conditioning with compound CSs.

    Suppose we have inferential knowledge concerning the "associative

    Tht, pn'p;lrariol\ of this manuscript and the research reportt.d wt.'rc slIpportt'

  • 8/11/2019 Rescorla & Wagner 1972

    2/18

    66 Assumptions, Theories, Models

    Immediately following training all Ss received 16 reinforced testtrials with X presented alone for the first time. In each of the three lrain-ing conditions X had been experienced an equal number of times, andhad always been followed by reinforcement. However, the conditionswere designed to encourage different degrees of conditioning or "asso-ciative strength" to A with which X was always experienced in com-

    pound. In comparison to Group Uwhich received only compound trials,reinforcing A alone in Group I should have increased the associativestrength of A and hence of the AX compound during training, whereasnonreinforcing A alone in Group lU should have decreased the strengthof A and hence of the AX compound during training. The question waswhether X would be differentially responded to in the three conditions

    as a function of this differential experience with A.Figure 1 summarizes the percentage test trial responses of the three

    groups to the X element. Also included for comparison are the per-centages of conditioned eyeblink responses to the AX compound and tothe A element, where appropriate, during the immediately precedingblock of training trials, As may be seen, relative to Condition II in-creasing the associative strength of A in Condition I decreased condi-tioned responding acquired by X, whereas depreciating the associative

    strength of A in Condition lU increased conditioned responding acquired

    I

    GROUPS

    n lIT

    A X+ A + AX+ STIMULI

    A X+ A -

    Figur. 1. Median percentage conditioned eyeblink responses to on AX com-

    pound and to the A and X elements alone, in three groups receiving either no

    training with A alone (lI), training with A alone reinforced (I), or training with

    A alone nonreinforced (III), contemporaneous with AX reinforced.

    Rescorla & Wagner 67

    lJy X. This onlerinll; of lhc treatments is nol only slatisti"ally reliahle,but is very replOdu(eablc in ditlerent siluations, \Vall;ner (I!Hi!lh) hasreportcd esscutially identi(al results I"l"Omsimilar (omparisous iuvolvinl;Conditioned Emotional Response (CER) naiuiug or disniminated bar-

    press training wilh rats.Res.:orla, in a previously unpuhlished experillleut. ohtained similar

    etreCls, hut in a situation in whi(:h Ihe asso('iativc value 01 the A .:ue was

    manipulated pr iOl' to the start 01'AX traininl;. Four groups of rats werefirst trained tohar-press on a VI sdledule for food ,-eiuforcement. Theseveral groups then re(:eived ditlerelll Pavlovian (Olulitioning treatmentswith a 2,min. tone CS (A) and a0.5 sec l,m.l. f,x,t shoc'k, while confineIlin a separate shoc'k ch;unher. These treatments were designed toestah-lish different hehavior etrel:ls to A. For all groups 12 preselllations of Aoccurred in each of 52-hour conditioning sessions. In Group ,1\-0, Awas traiued toelicit fear by presenting the shock with a prohahility of. Rduring the CS but never in its ahsence. For a semnd group (C;roup 0-.Il)shoc'kson-lined with a frequen(:y of,tl per 2-min. interval in the ahsenfCof A bUI the onset of A signalled a 4,min. pericxl free from sho(:ks. Thispnxedure ("(HI hi he expened (e.g" Rescor1a, I!I!l)tomake 1\ a (ondi,tioned inhihitor 01" feaL The remaininf!; IWOgroups were ,ontrol groupsin whidl lhe ('ollilitioninf!; trcatmcnts ("(mid he expe'-Ied to leave Arela,tively neutra\. Thus, Group Control 0-.1\ re.:eived Ihe same numhn ofshoc'ks as Group 0-.1\, and the same IIllmber of exposures to1\, bul thetwo were tllKorrelated in lime. Group Shock re(-eived the same scheduleof shoc'ksas Group 0- .R,hut never cxperienced A,

    Following this c'onditioning to 1\ alone. all Ss re('eived 1\lrials inwhi(:h a I\ashing light (X) was presented in fOnjunclon wilh 1\ and thecompound was reinforced with sho"k on a !iO 'i ;, sdledule. Finally, oneach of four test days following this wmpoullll (-ondilioning all Ss re,ceived 4 nonreinford test presentalions of X alone while bar'pressing.

    Figure 2 sUlllmarizes the results of thcse extinnion tesl sessions, inthe form of mean suppression ratios (I\nnau & Kamin, l!)()l). This l-atioyields a value of lero when the Ui completely disrupts har-pressing, anda value of . r , when har-pressing behavior is nnatleCled hy the CS, Thus.

    the lower the value indicated, the more eHenive was X alone. Ilshoul(lalso he noted that the hehavior of the IWOreferen('e ~roups (Group Sho..kand Group Control, O-.R) was virtually iclentil:al throu~hout testing, sothat the two groups have heen ("(nnhined in Figure 2.

    Although X was experiencl, amI had been followed by l'einfon'e,melll an elJllal number of times in the several groups, X was not similarlyeffective during tesling. In ("(1II1parisonto the performance of the refer-ence groups, pretraining cue 1\ toeli..it fear in C;roup .tl-O decreased theanuisition of fear to X as a result of the AX reinlor ..emellls. In adclilion,pretrainin~ nie 1\to be an inhihitor of fear iu Group 0-.1\ i/lnellsed theaUluisition 01" I"ear10 X as a result of the I\X reinfon:ements.

  • 8/11/2019 Rescorla & Wagner 1972

    3/18

    68

    x

    ot-

    o~o ::

    ~CI)CI)

    I . L I

    o ::Q.Q.

    : : : >CI)

    z~I . L I

    ~

    .4

    .3

    .2

    .1

    Assumptions, Theories, Models Rescorla & Wagner

    /,./~,/p

    78-0 ,/

    /

    _ /1' lleference

    I/ A/A. //

    ,/

    cl

    69

    The A and 8 cues, by virtue of lhe difl"cretJl lIumhers of reinlITe-ments ill their presen, were designed lo havc diflcrcnl assOlialivestrenglhs. i.e., A was designed to he a relatively slrong (uc. and H arela.tively weak cue hy Ihc clld of anuisilion. For half of the Ss A was aflashing lighl ami H was a vilnati,," al'plic:d lo Ss' dH~S\. For the remain .illg Ss the' lIat ure of Ihe nIes was rc,crsed.

    The X cue, which for all .'Iswas a !lI tifl hz lone, was lhe elemenl ofspecial illlercs\. hllmedialely followin/{ anluisitioll. Ss were assigned loone of two treatment cOllditions alld adminislered 32 extinction trials,in whid. X was presented and nonreinfon-ed. For half of lhe Ss X \Vaspresented during extinction in wmpoulld with Ss' A (:ue, while for theremaining IR .'Isil was presemed in wmpmmd with the B Ule.

    On the !l2 lrials immediately folluwing the extinctiou phase. X wasagain presented alone to all Ss ami was reiuforce - ~ - { J ,

    w /~ "~ o/ I'/ '-euCl: 20wQ.

    1'JY10

    I 2 3 4 5 6 7

    BLOCKS OF 32A, 4B,and32 XTRIALS

    2 3

    SESSIONS

    Figur. 2. Mean suppression ratio for X fallowing AX reinforced trials. Groups.8-0 and 0-.8 had prior excitatory and inhibilory training, respectively, to A,while the reference groups had prior treatment nol expected to influence theassociative strenglh ofA.

    TEST

    Varia/ioll in the effects of nonreinforcemen/. A study conducted by

    Wagner, Saavedra, and Lehmann (Wagner, 1969b) was designed to evaluate whether nOllreinforcement would also have different effects upona stimulus element, depending upon the strength of the compound inwhich the element was imbedded. The study was conducted in eyelidconditioning and generally employed parameters similar to those in theearlier Wagner and Saavedra study.

    Thirty-six rabbits were first conditioned to three separate stimuluselements, which will be referred toas A, B, and X. Over the course oftwo days training there were 224 A, 28 B, and 224 X trials, irregularlyordered, in which the respective cues were presented alone and reinforred.

    EXTINCTION REACQUISITIONSE LECTED COMPOUNDI X

    After

    U ..r~~-BLOCKS OF 8 TRIALS

    234

    Figure 3. Mean percenloge eyeblink responses during three Iroining phases, invalvillg acquisition to each of three separale component CSs, extinction with oneof Iwo compounds farmed from Ihe acquisilion componenls, and reacquisilion lothe component common lo the twa exlinction compounds. (FromWagner, 1969b.l

  • 8/11/2019 Rescorla & Wagner 1972

    4/18

    70 Assumptions, Theories, Models Rescorla & Wagner 71

  • 8/11/2019 Rescorla & Wagner 1972

    5/18

    72 Assumptions, Theories, Models

    GROUPS

    .0 MA .5 MA I MA

    I.20

    5 2

    l-e:{It:

    z .15OInInIo ::Q.. .10Q..~(/)

    ze:{

    .05I

    :l

    e e x e e x e e x

    Figur. 5. Medion suppre .. ion ratio lae alone and Iheex compound. Slimulus Xhad received prior inhibitory training contrasted with different intensities ofIhe US.

    nonreinforcement of X in compound with the relatively weak B cue.In all of the previous studies the associative strength of the cue with

    which X was eventually treated in compound was manipulated by vary-ing the schedule of reinforcements and nonreinforcements with respect tothat cue alone. There are, however, other variables which should in-Ouence the learning which would accrue to such cues alone and il mightbe expected that these variables would have an influence similar to that

    produced by varying the reinforcement schedule. For example, in theWagner and Saavedra inhibition study above, it might have been aseffective 10bring A and B to differential strengths as a result of thesame number of pairings with a US, but with a higher intensity US

    associated with A than with B.Such reasoning gains support from an unpublished experiment by

    Rescorla. Following VI food-rewarded barpress training, three groupsof 8 rats received CER conditioning, with a 1200 Hz tone alone (A) anda compound (AX) composed of this tone plus a Oashing light. In total,45 A and 75 AX trials were irregularly distributed over 30 training days.

    Rescorla & Wagner 73

    For all Ss the AX trials were consistently nonreinfon:ed. The groups dif-fered in the intensity of a .5 sec. shock US which they received on the Aalone trials, being eilher (l,ma. (nonreinfonemem), .S-ma., or l'ma .

    In onlel' to evaluale the inhibitory effects of X in each of the threetreatments it was ne(:essary to train the CER to an additional nIe (C).This cue was a 2S0 Hz tone, introduced for all Ss after the differentialtreatmem phase and reinforced with a .S-ma. shock on a S0'7oreinforce,ment schedule. Testing was then accomplished by evaluating the degreeof suppression produced when X was presellled in compound with C. ascompared to C alone, Two C and two ex trials, each nonreinforced,were presented on each of 6 consecutive test days.

    Figure 5 presents the median suppression ratios during testing underthe two cue conditions in the three gToUpS.Again it should be noted thatthe smaller the ratio value the more effective was the CS in disruptingbar-pressing. As may be seen, C alone produced equivalent degrees ofsuppression in all three groups. The addition of X had little effect uponsuppression in the O-ma. group but increasingly interfered with suppres-sion in the .5-ma. and I-rna. groups, There was a clear and statisticallyreliable tendency for X tobe a more effective conditioned inhibitor ofthe CER as a result of AX nonreinforcement, the more intense the US

    with which the A cue alone was paired.

    THE BASIC THEORY

    The generalization which applies toall of the results in the previoussection is thai the effect of a reinforcement or nonreinforcement inchanging the associative strength of a stimulus depends upon the existingassociative strength, not only of that stimulus, but also of other stimuliconcurrently present. It appears that the changes in associative strengthof a stimulus as a result of a trial can be well-predicted from the com-posite strength resulting from all stimuli preselll on that trial. If thiscomposite strength is low, the ability of a reinfon:ement to produce in-crements in the strength of component stimuli will be high; if thecomposite strength is high, reinfonement will he relatively less efteetive.

    Similar generalizations appear to govern the eft'elliveness of a nonrein-forced stimulus presentation. Ifthe composite associative strength of astimulus compound is high, then the degree to which a nonreinforcedpresentation will produce decrements in the associative strength of thecomponents will be large; if the composite strength is low, the effectof a nOllreinforcement will be redured.

    Certain similarities and differences between these generalizations andHull's postulates for growth of 11H11will !J e readily recognized. Thechanges in associative strength are acknowledged to depend upon cur-rent levels of that strength. However, the statements above assert that

  • 8/11/2019 Rescorla & Wagner 1972

    6/18

    74 Assumptions, Theories, Models

    changes in the strength of a stimulus depend upon the total associativestrength of the compound in which that stimulus appears, whereas forHull only the strength of the component in question was relevant, It is

    just this dependence upon total associative strength which is central tothe theory we wish 10develop here.

    There are a variety of theoretical languages in which this centralidea can be expressed. One rather peripheralistic formulation has been

    suggested by Rescorla (1969). He proposed that the change in CR condi-tioned toa CS, as a result of a CS-US pairing may depend upon thediscrepancy between the CR actually evoked on that trial and the maxi-mum CR which the particular US will support. The CR occurring on atrial arises from all of the stimuli present on that trial, not simply theCS in question. Increments in conditioning may be assumed to occurwhen the actual CR evoked on a trial is smaller than the maximumwhich the ensuing US will support. Correspondingly. decrements resultwhen Ihe actual CR is larger than the maximum CR. Rescorla hasparticularly emphasized the implication that Pavlovian conditioned in-hibition can be established toa CS by presenting it at a time when theactual CR is larger than the maximum CR which the subsequent US

    will support.A somewhat different version of the central notion, that condition-

    ing depends upon the associative strength of all stimuli occurring on atrial, has been suggested by Wagner (1969a, b). Wagner couched hisproposal in terms of the (:hanges in "signal value" of a cue, an associa-tive construct meant to embrace both the incremental effects of rein-forcement and the decremental effects of nonreinforcement. Specifically,the changes in signal value as a result of a trial were assumed tobe linearfunctions of the composite signal value resulting from all stimuli presenton that trial. Separate sets of such linear functions were suggested to beappropriate for the cases of reinforcement and nonreinforcement. Theresultant signal value of the stimulus would presumably be reflected inthe overt CR, although the specific relationship was not treated.

    A less completely formulated version of these ideas has been sug-gested by Kamin (1968). Indeed, it was Kamin's notions concerning the

    "surprisingness" of a US that originally encouraged the formulations ofRescorla and \Vagner. Attempting to accounl for his data on the so-calledblocking effect, Kamin argued that conditioning will occur only when theUS event is somehow "surprising" for the animal. Although the condi-tions which produce this surprise were not detailed, Kamin clearly in-tended that the surprise generated by a US be assumed to be reduced ifthat US is preceded by a CS which has previously been paired with it.Conse

  • 8/11/2019 Rescorla & Wagner 1972

    7/18

    76 Assumptions. Theories. Models

    associative s trengths into responding in any part icular s ituation. S ec-

    ondly. we will explidtly remgnize that learning is tied to various exter-

    nal s timuli and discuss associative s trength to various s timuli . In

    recognition of these two modifi( 'a tions, we will descrihe the model interms of VI' the strength of association to stimulus j,

    It is a lso important to note thaI VI wil l be al lowed to take on both

    positive and negative values, corresponding roughly to conditioned exci-

    tation and conditioned inhibition. But, the most significant departure

    from the l inear model is that when a st imulus compound, AX, is fol-

    lowed I)y a US , the changes in the s trength to each of the component

    s timuli , A and X, wil l be taken to be a function of VAX' i.e., the strengthof the compound, rather than the strength of the I'espective components.

    When a mmpound, AX, is followed by US" the changes in associa-

    tive strength of the respective components may be represented as:

    ~VA =aAt ( 1 - VAX)and

    INX =ax, ( 1 - VAX)'

    If AX is followed by a different valued US, i .e . US. , which may

    include O or nonreinforcement, the changes in associative strength of therespective components may be represented as:

    ~VA =aA~ ( 2 - VAX)and

    ~Vx =aX2 ( 2 - V A X )

    As may he seen in the ahove equation, there are three sets of para-

    meters which affect the magnitude of the changes involved. The alphasare learning rate parameters, each associated with one component stimu-

    lus. and are appropriately subscripted to indicate this identification. Thevalue of a lpha roughly represents s timulus sal ience and indicates our

    assumption that different s timuli may acquire associative s trength a tdifferent rates despite equal reinforcement. The betas are learning rate

    parameters associated with the USs. The ass ignment of different beta

    values to different USs indicates our assumption that the rate of learning

    may depend upon the part icular US employed. Alpha and beta values

    are confined lo the unit illlerval, {)~a, ~1. Finally, the values repre-sent the asymptotic level of associative s trength which each US will

    support; presumably different USs will yield different asymptotic levels.

    Although is not formally hounded, changing the range of its permis-

    sible values simply shifts the scale on which we observe Vs.

    In order to apply the model. two further specifications are needed,

    T he as so ci at iv e s tr en gt h o f t he c om po un d, V A X' mu st s om eh ow b e

    specified in terms of the s trengths of the components. The s imples t

    assumption, and the one we will make here is, VAX= VA+ Vx. Notice

    Rescorla & Wogner 77

    that ll though the Vs are in prilu: iple unhounded, iu appli, a tion the values .~et limits on lhe ("(lIupound Vs.

    S(("(lIldly. wc IIced lo providc somc mapl'ill~ ut V voIl1C'silllo hC',

    havior. "Ve arc Ilot prepared to makc delai led ass llmptiulls i ll this in,

    st an

  • 8/11/2019 Rescorla & Wagner 1972

    8/18

    78 Assumptions, Theories, Models

    spersing trials of A alone with reinforced AX trials. In the case whereA is reinforced on those intermixed trials, again VA will be large andresult in an enlarged VAX' thus limiting the amount of conditioniugwhich can acnue 10 X on lhe compound trials. Early in condilioning,reinforcements will occur 10 AX while VAX is still below asymptote andconsequently Vx will increase initially. Nevertheless, VX will eventuallydecrease lo zero. Since VA will increase toward , as a result of the A

    alone trials. VAX will come to exceed . Notice that when this happens,. the result of a,-lI/orced AX trial is to durement the associative strength

    of lhe components. As A and AX are both reinforced, increments 10 Awill occur on thc reinforced A trials and decrements to A and X onthe reinforced compound trials. The result will be a transfer to A ofwhatever associative strength X may have initially acquired. It is animportalll characteristic of the model that even on reinforced trials, ifVAX ex('eeds , A and X will be decremented.

    A similar account can be given of the results of nonreinforcedpresentations of A alone in the Wagner and Saavedra study. Thesepresentations should lead to a reduction of VA and hence a reductionof VAX' This provides increased opportunity to condition X on the AXtrials, as compared 10a condition involving only reinforced AX trials.

    Kamin (1968) has provided considerable additional data for theparticular case in which AX reinforcement is preceded by a history ofreinforcement of A. His experiments, carried out in a CER situation,indicate that with a high degree of prior conditioning to A, reinforce-ment of AX can be rendered almost completely ineffective in condi-tioning fear to X.

    Several variations in the treatment of A, however, were found toattenuate the ability of A to "block" conditioning of X. For instance,as the number of prior conditioning trials to A was reduced, the abilityof A to block the conditioning of X was lessened. Alternatively, if A -was first highly conditioned but then extinguished, the extinction dis-rupted A's ability toblock. Finally, ifthe intensity of A was decreased,blocking was redmed.

    All of the lauer manipulations might be expected to yield a lower

    VA and thus a lower VAX at the time of reinforcement of the compound.This deduction should require no elaboration in the cases where numberof reinfon-ed or extinguished trials to A alone was manipulated. In thecase of decreased A intensity, it is only necessary to make the reasonableassumption that a lower VA was attained prior to compound trainingas a result of a lower aassociated with the weaker stimulus.

    Kamin also reponed that the nature of the US at the time AX isintroduced is critical. For instance, if the prior conditioning of A isdone with a loma. shock and then AX is followed hy a I-rna. shock,a large interfel'ence with the conditioning of X is observed. But if the

    Rescorla & Wagner 79

    AX compound is followed instead by a 4ma, shoc'k, wnsiderahly moreconditioning to X results. This conditioning to X depends not simplyupon the use of a high sho('k intensity, but requires an innease in shockintensity from the conditioning of A to that of AX. Alternalively, ifA is followed by a single shock and then AX is followed hy two shocksin close succession, similar conditioning to X results.

    There is a natural way for the present model to handle these out-

    comes. There is evidence available (e.g., Annau ! le Kamin, 1961) thathigher asymptotes of conditioning result from higher shock intensities.Thus it would not be unrcasonable lO assume that the value associa-ted with a 4-ma. shock is larger than lhat associated with a l-ma, shock.The result of increasing shock intensity when shifting from conditioningof A to conditioning of AX is that the potelllial for conditioning X isenhanced; consequently, the reduction of the blocking is not surprising.Notice that it is the increasing of the value between the two stages ofthe experiment, rather than simply having a larger throughout, thatis critical to this prediflion. A similar kind of reasoning might be appliedto the case of shifting from a single shock following A to a double shock

    following AX"There are aspects of Kamin's data, however, with which the pre-

    sent model does not deal so well. For instance, Kamin found that theinitial trial on which X is presented in conjunction with A generallyyields less suppression than the previous A-alone trial. Furthermore,he provided evidence that most of the conditioning to X which occursin his paradigm results from that first AX trial. The present theoryprovides no statement of performance axioms which might lead us toexpect the introduction of X to interfere with suppression to A unlessit is assumed that X is initially associated with a negative V. Moreproblematic is the fact that while the theory predicts that the first AXtrial will produce more X conditioning than any subsequent compoundtrial, it does not anticipate Kamin's claim that the first AX trial accountsfor all of the conditioning to X. Further analysis of this problem mustawait additional data collection as well as the development of more detailed performance statements for the theory.

    Kamin has also investigated a phenomenon related to the blockingeffect, so-called "overshadowing." If an AX compound is repeatedlyreinforced and A is simply a more salient stimulus, little conditioning

    I It lIIiRht I",1I0led in passing that jusI as Ihe preM'nt 1II0dt'1 P""licls a possibleincn"III('nl in as..uxiative slH'nRth of X wh('n the reinforcement lIIa~nittJ(l(" fur A Xis int'fl 't5l'd uvc'" thai for X, so il pn'cliets l d(cr ...llwnt in associative valut' of X whenIhl' H'infol(C'lIu'nt 1113Jtniuu!t' Jor A X i s dt'Crt'flM'd with rt.~p('ct lO Ihat fur X. In Ihatcas...., " will be Inw...f(d and 3!l'HlillinK that \ '~ h3~ hern made' lo approach the pre-~hifl "l'V AX will he.' AH'aler Ih an r he P O SI -s hi fl X . rr-sultinR in .lccnlIItIUS to holhf t " and X, IfX h"Rills wilh 110associalivt' Slrt 'lIKth, this proct't1nre miKht be ("'clt'd toIHochu;(' a l 'CJluliliol1ed inhibitur. F.xptritlu'nu arc cUlTt'nlly und ...Tway in our labora-

    lorics to illVt'stiKalt' this pos.ihililY,

  • 8/11/2019 Rescorla & Wagner 1972

    9/18

    80 Assumptions, Theories, Models

    to X may result. Even though A is not reinforced more frequently thanX, because it is a more salient stimulus it may overshadow X and inter-fere with developmcnt of associative strength to the latter cue. Kaminhas demonstrated that the degree to which A overshadows X is depen-dent upon the relative intensities of the two stimuli, relatively moreintense stimuli yielding more overshadowing. From the point of viewof the present model, the effect of having A be a more salient stimulus

    is that it will have a larger alpha value. Hence when AX is reinforced,VA will grow rapidly with respect to Vl[. Consequently, the more salientA is with respect to X, the greater proportion of VAX will be due toVAand the more limited the conditioning to X. More precisely, it is ex-pected that when VA and Vx both begin conditioning at zero, that afterany large number of conditioning trials with AX reinforced that Vx

    will be equal to ~ VAX'all.+ ax

    Notice that, nevertheless, the more potent the US used, the highershould be and the more conditioning to AX and hence to X shouldresult, Thus, overshadowing, measured in terms of absolute respondingto X. should be attenuated by employing greater US magnitudes, a find-ing confirmed by Kamin (1968).

    Nonreinforcement of compound .stimuli. The preceding para-graphs indicate that the present model is capable of integrating a con-siderable amount of data on the effects of reinforcement, both fromour own laboratories and those of others. We now turn to an accountof some elementary effects of nonreinforcement. Because of the sym-metry of the model, the arguments for nonreinforcement are analogousto lhose for reinforcement and canbe presented briefb'.

    As pointed out above, if we assume that the value associated witha liS of zero intensity is zero, then the model naturally generates anegatively accelcrated extinction function. Considering the nonreinforccdpresentation of a compound stimulus, the changes in component associa-tive strength should be dependent upon the total V of the ,compound:the larger VAX' the larger should be the decrement expected in both

    A and X as a result of a nonreinforced presentation of the compound.Conse(luently, any operation which enhances VA and hence VAX shouldresult in a larger decrement to X as a result of nonreinforcement ofAX. This prediction is consistent with the findings from the two eye-blink conditioning studies from Wagner's laboratory reported above,in which the number of prior reinforcements of A critically affected thedecrementing of X. In fact, in the Wagner and Saavedra study whereX was presumably introduced with a zero V, greater conditioned in-hibition accrued to X when it was nonreinforced in conjunction withthat stimulus which had been more frequently reinforced in the past. If

    Rescorla & Wagner 81

    we assume that dilferent intensities of the US result in difrerent levels ofconditionin!,\, the conditioned inhihition results of the Res('orla experi-ment also fall inlo place. IfA is followed hy a mort' inlcnse US amI asa consequence VA is larger, then nonreinfof("ement of X in the pre-sence of A should result in greater conditioned inhibition to X.

    There is an additional case of nonreinforccment which is of interestto consider even though no data are currently available, Suppose lhat

    stimulus A were pretreated so as to give it a negative V and that a novelcue, X, were then combined with it and the AX compound nonrein-forced. The value of VAX should then be negative, and assuming that the associated with nonreinforcement is zero, - VAX should be positive,Thus lIonreinfo7Ting X in conjunction with an inhibitor should give Xpositive associative strength. This prediction, however paradoxical,should further emphasize the kind of symmetry inherent in the model'streatment of nonreinforcement and reinforcement.

    A P PL I CA T IO N T O A P R OI LE M I N D I SC R IM I N AT IO N L E AR N IN G

    In the foregoing applications of lhe model, reasonably adequatepredictions from the basic theory might well have been drawn withoutbenefit of any quantitative formulations. The cue of interest was neverpresented in compound with more than a single additional stimulus,and it would generally have been possible and sufficient. in order toaccount for the data dis('ussed, to specify lhat the associative strengthof the latter stimulus had been manipulated to have various ordereddegrees of excitatory or inhihitory value.

    In other instances to which the theory should be applil"able, how-ever, it is not possible to proceed at such a level. The present sectionwill attempt to illustrate one such instance, in which certain quantita-tive assumptions become critical.

    The problem to be considered involves the dillerelllial reinforre-ment of stimulus compounds coutaining a so,called ('Ollllllon ('ue. Sup-pose a compound CS, composed of experimentally isolalable eues A amIX is consistently reinforced, while another compound CS composed of

    cues B and X is consistently nonreinforced. If the several componentsare ade(luately discriminable we should, of course, expe(:t that differen-tial associative strengths will be acquired, with VAX approaching thatasymptote appropriate to the US employed, and VIIX approaching thatasymptote appropriate to nonreinforcement. But what should be theexpected fate of Vx?

    What makes this question especially interesting is that there isreason to believe (e.g., Wagner. Logan, Haberlandt, &: Price, 1968) thata cue occupying the place of X in an AX, BX discrimination will cometo be less responded to alone than might be expected simply on the

  • 8/11/2019 Rescorla & Wagner 1972

    10/18

    82 Assumptions, Theories, Models

    basis of the schedule of reinforcement and nonreinforcement with whichit is associated. And, several theories have been advanced (e.g., Restle,1957; Sutherland, 1964; Mackintosh, 1965) which propose that such acommon, or "irrelevant" cue will become "adapted," "neutralized," or"unattended-to," by virtue of the availability of more valid cues.

    The present theory includes no such special "neutralization" as-sumption, but indicates only that the learning which accrues to Xshould be a function of the trial by trial strength of both AX and BXin relationship to the reinforcing events with which they are separatelyassociated. But, should X then be expected to become neutral? Or, atleast more "neutral" than in various comparison treatments not in-volving a discrimination?

    In order to illustrate certain relatively general features expectedin the course of discrimination learning when AX trials are alternatedwith BX trials, the former followed by reinforcement, and the latter bynonreinforcement, a sample learning run was computed. It should berecalled that the theory specifies that,

    liVA= aA{J1(l- VAX),liVx= ax{Jt (l-VAX)

    when AX is reinforced, and,liVB= aB{J2(2- VBX),liVx= aX{J2(2- VIIX)

    when BX is Ilonreinforced.For purposes of this example, A, B, and X were assumed to begin

    with Vs of zero prior to the first learning trial, and were assumed to beequally salient (aA= aB= ax= 1.0). Reinforcement and nonreinforce-ment were taken to be associated with Sof 1,0 and O respectively, andwith equal rate parameters (l= (J 2 = .05). Any set of parameter as-sumptions identifies a special case, but this set was intended to havesome semblance to certain experimental arrangements (e.g., the initialVs of zero might be relatively typical of untrained cues) and to avoidany assumptions that would beg justification (e.g., unequal as or (Js).

    The left panel of Figure 6 depicts the mean VAXand mean VIIXcomputed over successive blocks of 4 trials. The right panel of Figure 6depicts the corresponding V values of the separate components. Severalfeatures of the plotted functions should be noted. With all Vs beginningat zero, VAXand VBXboth increase over the early trials. Although therate parameters {JI and {J 2 associated with reinforcement and nonreinforcement were set equal, l- VAXis initially large, whereas 2- VBXis concurrently small. As a consequence Vx as well as VAincrease rapidlyin relationship to a slower decline in VB.Only when the absolute valueof (JI(I - VAX) becomes equal 10 (J 2 (2 - VBX)does Vx cease to grow

    Rescorla & Wagner

    1.00

    ,75

    .50

    V .25

    ~OL

    -.25

    83

    4 8- . 5 0 ' , I I I

    OBLOCKS Of fOUR TRIALS

    8 12 16

    Fi"u .. 6. Mean V. computed from a sample learning run in which AX reinforced

    trials were alternaled with BX nonreinforced Irial . The left panel depicts Ihecompound theorelieal values, the right panel the corre.ponding componenl

    values.

    and does the rate of decrease in VII equal the rate of increase in VA' Astrials continue, VAXand VBXnecessarily approach 1 and 2, so that inthis example, the terminal value of VA is sufficiently positive so thaiVA+ Vx= I.and the value of VIIis sufficiently negative that VII+VX= O.

    Itis unnecessary for present purposes 10completely detail the wayin which this picture changes as all the values of the model are manipu,

    lated. For example, whereas VAXand VIIXwill always approach I and2, the terminal values of VA' Vn, and Vx in relationship to I and 2will depend appreciably upon their initial starting values. But, for theconclusions we wish to draw, we can restrict ourselves without dangerto instances like that exemplified, in which all Vs begin at zero,

    It is worth noting, however, the effects of variations in a and {J .For instance, if {JI is made larger than {J 2 ' the initial positive course ofVox, associated with the nonreinforced compound, would attain a higherabsolute level and be more protracted. Similarly, Vx would initially riseto a higher value and then fall. However, for all values of {JI and {J 2

  • 8/11/2019 Rescorla & Wagner 1972

    11/18

    84 Assumptions, Theories, Models

    greater than zero, the asymptotic levels of VA' Vx' and VII as well asVAXand VIIXarc indepeJl(lclll of the s.

    Variatiun in lhc re1alive salienccs 0 1 " i\, B, ami X h as a mo rcmarked effect. For example, if ax is made larger relative to aA and all,the formation of the discrimination in VAXand VIlXis not only slowed,but the terminal values of VAo VII' and Vx are modified. It can be shown

    that under the conditions, A I

    =I,

    and A

    2 =0,

    and where aA= all' theaxasymptotic value of Vx is e'I'"'' lU _

    aA+ 2axThe picture of discrimination learning in VAXand VIIX shown in

    Figure fi rcsembles many empirical functions. Especially satisfying islhe initial rise predicted in the strength of the nonreinforced compound,a phenomenon which is frequently observed (e.g., Wagner, 1968). It isevident. however, that the theory does not generally predict that theassociative slrength of a common cue such as X will attain a value ofzero. Rather as indicated ahove, X should attain some associative strengthdepending upon its relative salience in comparison to that of the dis-niminative cues. Itseems, in fact. that an important expectation fromthe model for the case described is that X will have some positive value,such that discriminative cues, in the nonreinforced compound must be

    come i"hiIJi/lny in order for the strength of the latter compound toapproach zero.

    Still, it may he more informative to ask whelher X should be more"neutral" as a result of being imbedded in such a discrimination than asa result of other treatments involving the same associated schedule ofreinforcement.

    We llIay simply declare, that the strength of X should, accordingto the model, generally be less following discrimination training thanfollowing an idelllical partial reinforcement schedule of X in isolation.BUI, grallled the discussion thus far, this is a rather uninteresting comparison, since it can largely be viewed in terms of the overshadowingthat occurs when a cue is trained in compound with other cues, as compared to being trained in isolation.

    A more interesting empirical comparison was provided hy Wagner,Logan, Haberland!, and Price (1968). In hath CER and eyelid conditioning, these investigators compared responding toan isolatable common cue ocCtlpying the place of X in an AX, BX discrimination, withthe responding to a similar cue, experienced in a "pseudodiscrimination"treatment in whi('h AX and BX were both partially reinforced on a50% schedule. Although X in the two treatments was associated duringtraining Wilh the same reinforcement schedule, and in compound withthe same cues, it was much more responded to when tested alone follow-ing pseudodisuimination as compared to discrimination training.

    Rescorla & Wagner 85

    We have already specified the asymptotic value of Vx to be ex-pected in an AX, BX discrimination in which eadl compound is ex-perienced on half of the trials. It will thus he usclnl lo ("()Jnpare lhisexpected value with the comparable value expected following a pseudo-discrimination procedure in which half of each of the AX and BX trialsare reinforced and half nonreinforced. in the manner of Wagner, et al.

    (1968).It can be shown that according to the model, the asymptotic value

    approached by a partially reinforced compound should be equal to

    'lT\At - ('IT-I) 2A2

    'lT I - ( ' I T - I)Jh

    where 'IT is the proportion of reinforced trials. Adopting, as we havepreviously, the assumption that l = 1.0 and A2=0, then in the case of

    a50% reinforcement schedule this becomes __ _ I_I +~

    This quantity thus expresses the theoretical asymptotic value of VAXandVRX in the pseudodiscrimination case under consideration.

    Of course VAX= VA +Vx ami VIIX= VII + Vx and it can be demonstrated that in the instance in which X is presented on all trials, and A

    and B each on half the trials that.

    2ax 2ax

    Thus the asymptotic Vx attained under the pseudodiscrimination

    procedure should be:

    An appreciation of the relative Vx expected in the discriminationand pseudodiscrimination treatments is most evident if we now expressthe asymptotic Vx in the pseudodiscrimination condition, minus the

    asymptotic Vx in the discrimination condition, which difference is:

    ~ [ 2 (_I )- I ]I + 2

    Wagner (1969b) has noted that the present form of the theory canaccount for the greater responding to X alone under a pseudodiscrimi-nation as compared to a discrimination treatment, only if one is willingto specify certain quantitative differences in the effects of reinforced andnonreinforced trials. The mathematical expression above makes thisevident in terms of the current model. Only when the rate parameter

  • 8/11/2019 Rescorla & Wagner 1972

    12/18

    86 Assumptions, Theories, Models

    associated with reinforcement (.) is greater than the rate parameterassociated with nonreinforcement (2) will the quantity expressed bepositive, i.e.. will Vx be greater in the pseudodiscrimination than in thediscrimination treatment. The above expression also indicates that themagnitude of this effect will depend upon the cue saliences. so that anydifference between the two conditions will be augmented as ax ap-proaches \.0 and aA becomes small.

    The robustness of the effect demonstrated by Wagner, Logan, Haberlandt, and Price, would suggest that l is considerably larger than 2in the situations which they employed. How adequate this assumptionwill otherwise turn out to be remains to be determined. It is worthnoting, however, that a similar assumption has frequently been deemednecessary in the application of related models to other data areas (e.g.,Bush S c Mosteller, 1955; Lovejoy, 1966). The advantage of the presentanalysis is that it makes dear the kinds of additional quantitative as-sumptions which are necessary in order to account for the differencesin strength of the common cue in these different treatments.2

    APPLICATION TO BACKGROUND STIMULI AS COMPONENTS

    Although the present model is stated in terms of increments anddecrements in the associative strength of component stimuli as a result

    2 Bet:ause of the summalion assumption, il may appear that the presenl model willbe inappropriate for certain cases of discrimination learning. For in.tance, there is someevidrnce thai organisms are capable of learning to respond lo a compound stimuluswhile wilhholding their rrsponse to its components. and vice versa (Woodbury, 194~),Such discriminations seem lo call for an appeal to some special characlcristic of lhecompound nol present in its components. However, Mr, Donald Rightmer has sug-gested to us a way of conceptualizing such discriminations without recoune to"configuring." Consider a compound composed of two componenl slimuli. Thesecomponents, although readily discriminable from each other, wilt nevertheless con,tain some COllllllon properties. To indicate Ihis, we may dncribe the components asAX amI BX and lhc compound formed from lheir combinations as ABX, Ifwe applythe model lO a case ill which this compound is consistenlly reinforced and lhesecomponenls lion reinforced. it correclly predicts learning of that discrimination. Inlhis case VAliX approaches At while VAX and VIIX both approach A2 as a result ofVA and V

    bolh approaching A.-A~. while Vx approaches 2 A2-At. Much of lhe

    hurden of lhe learning resls Wilh the common pans of the component stimuli and

    con"-'qu"ntly lhe rate of learning wilt depend upon the assumed salience of X. Asimilar result occurs for the case of reinforcement of AX and BX and the nonrein

    forcement of ABX. We mention these examples only to indicale lhat al leasl onekind of evidence commonly ciled in criticism of summation notions is compatablewith the preseIlt model.

    A common approach to discrimination learning (e.g., Estes S c Burke, 195~) is toassllme. howewr. not only that any pair of CSs can be lheorelically conceptualizedas being composed of unique and common ClIcS. bUI lhat the discriminability of theC's d"pI'nds only upon lhe relalive weights of the IWO sets of elements. While thislauer manner of allalysis mighl appear especially congenial 10 lhe present lheory,it pn'SelllS suicent difficulties thai we would prefer not to commil ourselves to lhislUore: Kt:'IIC'ralstrateg)'. A consideration of the alternatives would, unfortunately, take

    us beyond lhe scope of the present paper.

    lI

    .;

    I(

    I

    J .

    I -

    Rescorla & Wagner 87

    of reinforcement and nonreinforcement of compounds, it also has im,plications for situations not obviously involving compound stimuli, Oneinteresting such application is to data recently collected by Rescorla(1969) pointing to the importance of CS-lJS correlalions in Pavlovianfear condilioning. Consider a situalion in whic:h an animal re('eivesbrief, intense electric shocks randomly distributed in lime. Supposefurther, that tonal stimuli are presented irregularly without regard to

    the occurrence of the shocks, i.e., in such a way that shocks may ocnlrin both the presence and absence of the CS, and there is no correlationbetween the CS and shock. The queslion of interest is to what degreethe tones will acquire associative strength.

    A typical experiment asking this question employed a CER pro,cedure with rats (Rescorla. 1968, Experiment 1). Three groups of animalsreceived tone CSs and shock USs. Group I received the tones and shocksin random relation to each other, with shocks oc('urring both in thepresence and absense of the CS. Group Il received the identical treatment except that all shocks programmed to occur in the absence of theCS were omitted. Notice that these two groups received the same numberof shocks during the CS; they differed only in that the first gronp alsoreceived shocks at other times. Finally, a third group re('eived the samereduced number of shocks as Group IJ, but those shocks were distrib,uted randomly in time, in the manner of Group I.When these stimuliwere subsequently presented while the rats bar'pressed for food reward,only Group II showed fear of the CS. The two groups for which the CSand shock were independent showed no measurable fear conditioning.

    This result suggests that the correlation of the CS and US, in addition to the number of reinforced CS presentations, is an importantdeterminant of fear conditioning. One way 10 describe this correlationis in terms of the probability of occurrence of the US in the presenceand absence of the CS. When the two events are positively correlated,then the probability of shock is higher during the CS lhan in its absence;when they are uncorrelated, lhen those probabilities are equal. Further-more, this description suggests a third case: when the prohability of theUS is higher in the absence of the CS than in its presenc'e, the two

    events are negatively correlated. In a series of experiments, Res('orla(1969) has accumulated evidence that these relative probabilities areimportant in determining the amount of conditioning oblained, According to that evidence, when the probability of shock is higher during theCS than in its absem:e, the CS becomes a conditioned elicitor of fear;when the CS signals a period which is relatively free from otherwiseprobable shocks. it becomes a conditioned inhibitor of fear. Finally,when the probabilities of shock are equal in the presenc'e or absence ofthe CS, little or no conditioning of either sort occurs.

    Somehow the organism appears to evaluate the probability with

  • 8/11/2019 Rescorla & Wagner 1972

    13/18

    Rescorla & WagnerI

    1.0

    .8

    .6

    .4

    .2

    >

    -.2

    -.4

    -. 6

    -. 8

    -1.0TRIALS

    88 Assumptions, Theories, Models

    which shocks occur both in the presence and in the absence of the CS,and il is the relation between these two probabilities which determinesthe amount of fear conditioning observed to the CS. The organism isapparently hehaving as a relatively complex probability comparitor.What we wish to suggest here is that the preselll model may provideone way of understanding how lhe animal can be sensitive to such subtlerelations via a relatively simple process.

    The important point to notice for this analysis is that the CS occursagainst a background of uncontrolled stimuli. To speak of shocksoccurring in the absence of the CS is to say that they occur in the pre-sence of situational stimuli arising from the experimental environment.Although these stimuli are not explicitly manipulated by the experi-menter, they nevertheless can be expected 10influence the animal. Thus,one way to think about the occurrence of the CS is as an event trans-forming the background stimulus, A, into background-plus-CS, AX. Thepresent model, of course, has been designed to account for the condi-tioning of X when it appears in such a compound, as a function of thetreatment of A elsewhere.

    In order to exemplify the application of the model to this particu-lar case, lhe experimental session was taken to be divisible into timesegments the length of the CS duration. Each segment containing the

    CS is thus treated as an AX "trial" and each segment not containingthe CS as an A "trial." It is possible then to specify the sequence of re-inforcement and nonreinforcement over each of the two kinds of trials.

    Sample learning TUnswere computed from the model with sched-ules of background alone (A) and background-plus-CS (AX) as mightbe the case in experiments such as those of Rescorla (1968). Figure 7shows the results of one such application of the model. This figuredescribes the V value of the CS over trials, as a function of differentshock probabilities in the presence and absence of the CS. The firstdigit labeling each curve indicates shock probability during the CS;the second the probability in the absence of the CS. The particularparameter selections used in arriving at the functions plotted were asfollows: The CS was assumed to be present 1/5 of the time according toan irregular sequence and to have a salience 5 times that of the back-ground (a A = .1, ax = .5); the values associated with reinforcementand nonreinforcement were taken to be 1 and O,respectively, while therate parameter associated with reinforcement was set at twice that as-sociated with nonreinforccment (t= .J, 2 = .05).

    The asymptotic values of the functions represented in Figure 7 arein general agreement with Rescorla's data in that they are clearly orderedby the relative probability of shock in the presence and absence of theCS. In addition. positive V values are associated with positive correla-tions between the CS and shock; negative V values are associated withnegative correlations. Furthermore, the magnitude of the correlation

    89

    ~ .8-0

    - - - - - - -------- .8-.2

    .8-.4

    .8-.8

    .4-.8

    .2-.8

    0-.8

    FiIlU'" 7. Predicted .trenllth of a.sociation of X as a function of different USprobabilitie. in the pre.ence and absence ofX . The first number next to eachcurve indicate. the probability of the US during the CS; the second, the prob-ability in the absence of theCS .

    may be seen to be important, with stronger correlations ~enerating Vsmore removed from zero. Finally, the asymptotir V of a CS \IIHorrelatedwith the occurrence of shock is zero.

    In fact, it is possible to arrive at a relatively simple expressiondescribing the asymptote of comlitioning (Vx) for these various treat-ments. The equation for the asymplotic value of VA is:

    VA = 'T rA .

    Similarly, the equation for the asymptotic value of VAX is:

    'TrAX.

    1 T ,\ x . - (1-1TAx) "VAX =

  • 8/11/2019 Rescorla & Wagner 1972

    14/18

    Random control procedure. It is of interest to consider further thecase in which shock probability is equal in the presence and absence ofthe CS. This is the "truly random" control treatment suggested by Res-corIa (1967) as a procedure against which to evaluate the effects of CS-UScontingencies. As noted above, asymptotically the predictions from themodel agree with Rescorla's findings of little conditioning using such aprocedure. However, the model suggests that early in conditioning such

    a CS may in fact show an initial rise in V, followed by a return to thezero point.

    The basis for this initial rise in V lies in the rates of conditioningof A and AX. Early in conditioning, reinforcement of the AX compoundoccurs while VAand hence VAXis relatively low. C'..onsequently, Vx canreceive a sizable increment. As VA increases due to shocks in the absenceof the CS, VAXapproaches its asymptotic value and the increments in Xcorrespondingly decrease in size. Furthermore, when V A approaches itsasymptote, VAXwill begin to exceed its asymptote due to the contribu-tion of Vx. When this occurs, VAXwill be decremented. What follows is

    Rescorla& Wagner 91

    a period in which VAis incremented from shocks in the absence of theCS and VAXis decremented because is exceeds its asymptote; effectivelythis produces a redistribution of the associative strength of AX amongthe components A and X. The process stops when VA and VAXare equal,i.e., Vx is zero.

    This account makes dear that the degree of initial conditioning ofthe CS predicted by the model in the random produre is a function of

    the relative conditioning rates of A and AX. There are a variety ofexperimental manipulations and parameters of the model which conse-quently should inuence the magnitude and duration of this rise. Twoexperimental conditions are especially important. The first is the overallprobability of shock in both the presence and absence of the CS. As theoverall shock probability increases, the magnitude of the initial condi-tioning to X is greater, and the approach to the final zero asymptote isslower. Itis interesting to note, however, that the stage of training atwhich the maximum Vx is reached remains the same. A second inRuen-tial manipulation is the proportion of the total session during which theCS is present. In Figure 7, the CS was assumed to be present 1 /5 of lhetime, as it was in fact in Rescorla's experiments. However, il can beshown that as the proportion of the session during which lhe CS is pre-sent is increased, the magnitude of the initial rise in V of the CS is in-creased; furthermore, the peak magnitude occurs earlier in conditioning.and the attainment of the final asymptote is retarded.

    In addition to these experimental manipulations, two parametersof the model are important in determining the magnitude of the initialconditioning of X. As might be expected, one of these is the relativestimulus salience assumed for the background and the CS. Accordingto the model, as the relative salience of the CS increases, the initial posi-tive value taken on by a random CS is enlarged and its duration pro-longed. Finally, the assumed relative importance of reinforcement andnonreinforcement is also relevant. As the rate parameters associated withreinforcement is assumed to be progressively larger than the rate param-eter associated with nonreinforcemem, the magnitude of this initialrise would be expected to increase.

    Since most of the experiments employing this procedure have onlyassessed conditioning to the CS after extended training, there is rela-tively little direct evidence bearing on the details of these predictionsfrom the model. However, because the manipulations which are pre-dicted to affect the magnitude of the initial rise are also predicted toprolong its presence, some studies employing extended training mightyet reRect the effects in question. One interesting example is a recentdissertation carried out at McMaster by Kremer. Kremer (1968) reportednonzero terminal levels of fear following a random procedure. l-lisconditions were similar to those of Rescorla except that he used a more

    90 Assumptions, Theories, Models

    Thus, one may arrive at the asymptotic value of Vx as:

    Vx = V A X - V Altmay be seen that V x will depend only upon the probability of

    reinforcement in the presence of X (lTAX) and in the absence of X (lTA)'as well as upon the rate parameters associated with reinforcement and

    nonreinforcement. Since the lauer parameters are constants in the twoequations forV AX andV Ait should be evident howV x will vary with therelative probabilities of reinforcement. When 1T.u is greater than lTA'V x will have a positive value, as lI'AX becomes equal to lI'A' V x willapproach zero, and finally when lI'A is greater than lTAX'V x will becomenegative.

    Although we will discuss below the effects of other variables uponVx prior to asymptote, none of these influence the final product oflearning. In particular it is worth pointing out that in this instancethe initial Vs, at the begining of any of the probability treatments.leave no permanent effect; whatever the starting VA and Vx' a giventreatment will eventually yield asymptote values, as specified, appro-priate to that treatment. For example. should Vx first be incrementedto some high value prior to a .8-.8 "extinction" treatment, the asymp-

    totic value of X would still be zero according to the mel.Whatever the overall shock probability, if AX and A are reinforced

    with equal probability, the value of X will approach zero. Notice, how-ever, that this zero level of conditioning for X occurs against differentlevels of conditioning to A which is dependent upon the overall shock

    probability.

  • 8/11/2019 Rescorla & Wagner 1972

    15/18

    92 Assumptions, Theories, Models Rescorla & Wagner 93

    salient CS (white noise vs. 720 Hz tone) and the CS was present a largerproportion of the session (1/3 or 1/2 vs. 1/5). It should be noted thatthese modifications of Rescorla's procedure are ones which according tothe present model should enhance and prolong the initial positiveconditioning of the CS. Furthermore, Kremer found that more frequentalternation between CS and non-CS periods produced more fear con-ditioning in the random procedure. It seems likely that more frequent

    alternation of these periods would also enhance CS salience, in whichcase the model again predicts a prolongation of positive phase of theCS. Consequently, if we assume that Kremer's data were collected priorto asymptotic levels of conditioning. his results would generally fit withthe model.

    Some comment should be made about the consequences of thesepredictions for the suitability of the truly random procedure as a "con-trol" treatment in Pavlovian conditioning. This procedure was designedas a control for a particular operation, namely the establishment of acontingency between CS and US. On the assumption that contingenciesare important in conditioning, this procedure gives a baseline of nocorrelation with which to make comparisons. This application of theprocedure remains indifferent to the assignment of any particular theo-retical learning value to the CS. What the present account attempts is

    a theoretical understanding of the results of arranging various degreesof correlation between a CS and US. And according to that account,although 'conditioning results are ordered by degree of correlation atevery stage of learning and although asymptotically the random pro-cedure does attain an associative value of zero, nevertheless there arestages at which it has positive value. But other theoretical accounts ofthis procedure are also possible and all would leave equally unaffectedthe status of the random treatment as a procedural control.

    X. The maKnitllde of this overshootinK in \Ix will depend upon lherelative rates of conditioning of A and AX. Thus. the same parameterswill affect the rise in the case of positive wrrelations as were importantin the case of no correlation.

    Negative CS-US correlations. One of the more interesting resultsfrom experiments exploring correlations between CSs and USs is lhe

    finding that negative correlations lead to CSs which are conditioned inhibitors. Furthermore, the magnitude of the conditioned inhibition isa function of the degree of negative correlation (Rescor1a, 1969). Thepresent model is in general agreement with lhese findings; however,conditions which asymptotically generate negative Vs may. accordingto the model, initially generate positive Vs. For instance, the .4-.8condition in Figure 7, although asymptoting at a level of -.33, attainsconsiderable positive associative strength early in conditioning. Again,this initial rise is controlled by parameters affecting relative rates ofconditioning for A and AX; it is only when VA is sufficiently large thatVAX exceeds its asymptote that X can begin to acquire negative associa-tive value. Thus prior to the setting up of X as a conditioned inhibitor,A must first be established as a conditioned excitor. Furthermore, noticethat if a negatively correlated treatment is terminated early in comli-

    tioning, conditioned excitation may be observed despite the fact thatthe negative correlation was in force from the outset.

    In summary, the present model seems consistent with the majorasymptotic results of arranging various correlations hetween a CS andUS. Furthermore, it specifies the set of experimental manipnlationswhich might be expected to inuence these results. In addition, it makesa number of interesting predictions abont preasymptotic consequencesof arranging correlations between CSsand USs.

    Positive CS-US correlations. From Figure 7 it is dear that through-out learning the degree of excitatory conditioning varies with the mag-nitude of the correlation between the CS and US; furthermore, positivecorrelations always yield positive Vs. However. the learning curves

    predicted from this account of positive correlated situations differ fromtypical learning curves in that they are not all monotonic. When shocksare delivered both in the presence and absence of the CS but with ahigher probability during the CS, the V associated with the CS maysometimes attain a value early in conditioning which exceeds its finalasymptotic value. The reason for this is similar to that for the initialrise in the random treatment; initially AX may approach its asymptotemore rapidly than A approaches its final level. Consequently, after AXhas ceased togrow, A continues to increase and during AX trials thecompound V is redistributed among the components at the expense of

    RELAliON 10 AnENTlONAL THEORY

    The model we have presented was designed to account for instancesin which identical stimuli, although associated wilh equal reinforcement

    schedules. nevertheless acquire different associative strengths, as a resultof the stimulus context in which they are imbedded. The correspondencebetween the model and data, as described in the preceding sections,would appear to offer encouragement to the line of theorizing whichhas been developed. There is, however, another plausible, more conven-tional theoretical approach to this same general problem, and one whichhas otherwise received some measure of support (e.g., Mackintosh, 1965).It thus becomes pertinent lo ask what relative advantage, if any, is en

    joyed by the present theory.The alternative which must be considered is an "attemional" or

  • 8/11/2019 Rescorla & Wagner 1972

    16/18

    94 Assumptions, Theories, Models Rescorla & Wagner 95

    "stimulus selection" interpretation (e.g., Sutherland, 1964). The familiarnotion is that an organism can learn only about those cues to which it isattending. and tl':lt it has a limited attentional capacity. Attending toone cue is thus presumed to decrease the likelihood of attending to, andhence learning about, other available cues. Such theory has no apparentdifficulty, for example, in accounting for the fact that the reinforcementof one stimulus element will have less incremental effects upon the

    learning to that element, if there is concurrently present a stronger (bet-ter attended-to) cue.

    The kind of "in principle" arguments in favor of attentional theoryare very seductive. Certainly the organism does not have an unlimitedcapacity to process sensory information. Thus, it must be expected thatunder some circumstances an environmental stimulus'will not be reactedto, or may be less reacted 10. as a result of the processing of concurrentstimuli. We would hardly quarrel with this. The proper question, how-ever, is whether the organism:s capacity is so limited that it is necessaryto assume that several highly distinctive stimuli, as employed in com-pound-stimulus Pavlovian training, cannot generally be simultaneouslyattended to. And even if it were advisable to make such assumptions,would an attentianal theory still account for the range of data with

    which we are presently concerned?In many instances an attentional theory appears quite adequate. Ifwe pretrain associative strength to a stimulus (A) and then reinforcethe same stimulus in conjunction with a novel stimulus (X), the resultantassociative strength of X will be reduced compared with that of a groupnot pretrained on A. Itseems natural to assume that pretraining on Aleads the animal to attend to A to the deteriment of X, and thus toshow little conditioning to X. A similar account seems applicable tothe failure of a common cue to acquire considerable associative strengthin a discriminative conditioning situation. The animal may be assumed10 attend 10 the dimension defining the primary discriminanda andthereby fail to attend to stimuli less well correlated with the US. Andthe same reasoning may appear to apply 10 the failure to observe sub-stantial evidence of learning in the "truly random" conditioning pro-

    cedure. The CS is no more informative than are the contextual cuesconcerning the occurrence of the US, is thus not especially attended to,and is not learned about.

    A major difficulty with this approach is that we are not providedwith a specification of the trial-by-trial events which control attentionin Pavlovian conditioning. It is not sufficient to argue that stimuli un-correlated with reinforcement are not attended to; one needs to knowhow the trial-by-trial events which compose the uncorrelated treatmentare processed by the animal so as to generate failure to attend to the

    CS. In the absence of such mechanisms the attentional account is lillIemore than a redescription of the data.

    But, to our view, the most significant fact is that while there areobvious symmetries in the results we have discussed, the attentionalaccount seems only to apply to portions of the data. For inslance, justas prior reinforcement of A will reduce the amount learned about X onsubsequent reinforced AX trials, so prior inhibitory training of A will

    augment the amount learned about X on reinforced AX trials. Itis notclear what modification in allenIion to A could be produced by inhibi-tory training which would enhance the amount learned about X on AXtrials. It does not seem plausible to argue that inhibitory training of Amakes the animal especially fail to attend 10 that cue since Rescorla(1969) has shown that such training gives A deeremental control overresponding. In addition, an attentional account of the effects of priorreinforcement of A upon the subsequent nonreinforcement of AX doesnot seem satisfactory. Why should training an animal to allend to Amake nonreinforced AX trials especially potent in conditioning inhibition to X?

    Even in the case of certain phenomena which attentional theory hasbeen thought to handle well, the apparent adequacy of the theory maybe illusionary. Consider Kamin's (I96R) blocking experiment in which

    prior conditioning to A makes the subsequent reinforcement of AXpractically ineffective in conditioning X. According 10auentianal theory,pretreatment of A causes the animal 10attend to A on AX trials, so thatall of the reinforcement effects that occur go to A. However, since A isalready well conditioned, the influence of this reinforcement is difficultto detect. According to the present model. the prior conditioning of Aresults in an AX with a high associative value which in turn devaluesthe reinforcer; consequently, neither A nor X should receive additionalconditioning.

    An unpublished experiment from Rescorla's laboratory was de-signed to evaluate these alternative interpretations. The strategy wasto produce "blocking" using a high VAX arrived at through a low levelof pretraining to both A and X rather than considerable training of A

    alone. According to the present model, any way of arriving at the sameVAX should interfere equally with the effectiveness of the reinforcer andpreclude further conditioning toeither stimulus on the AX trials. A('-cording 10an attentional notion, however, the reinforcer remains effec-tive. Since both A and X separately. should have low associative values,any selection of one of the stimuli to which to attend should result infurther conditioning of that stimulus. Conse

  • 8/11/2019 Rescorla & Wagner 1972

    17/18

    96 Assumptions, Theories, Models

    reinforcement. They then all received 6 conditioning trials while bar-pressing. with a 2-min. CS ami a 0.5-sec., loma. shock; on three lrials theCS was a flashings light, on three a 1200 hz tone. This training resultedin less than asymptotic suppression to both CSs, such that subsequentTL compound trials could he shown to produce more complete suppres-sion. Group TL tl~en received 10 conditioning trials on which the tone-light compound terminated in shock. Groups T and L each received 10

    reinforced trials with the tone or light alone, respectively. Group Nreceived no further conditioning with either stimulus. All animals werethen tested with 2 nonreinforced presentations of each component stimu-lus over each of 5 test days. Finally, each animal received litest sessionsduring which the TL compound was presented 4 times.

    Figure 8 shows the mean suppression ratios for the light and toneseparately for each of the 4 groups over the initial 5 test days. Lookingfirst at the suppression observed to the tone CS, as represented in theleft panel of Figure 8, it is clear that further conditioning to the tonealone (Group T) resulted in more suppression than did failure to giveadditional training (Group N). Group L, which had received onlyfurther conditioning to the light, showed suppression to the tone similarto that of Group N. The most interesting result, however, was that thesuppression in Croup TL was not different from that of Croup N, but

    was considerably less than that of Croup T. Reinforcing the tone in thepresence of the light evidently produced no additional acquisition offear to the lone.

    The suppression observed to the light CS, as shown in the rightpanel of Figure 8, indicates that the findings for the tone in Group TLwere not due simply to all animals attending to the light during com-pound training. The overall suppression to the light was greater thanthat to the tone, but the pattern of results was similar. Croups N, T,and TL did not differ in responding to the light, but all suppressedless than did Croup L.Thus, additional training to the light alone, butnot to the light in compound with the tone, yielded further condition-ing to the light.

    The results of the subsequent compound test trials were consonant

    with the ahove results obtained with the components. Over the threedays of testing the TL compound, Groups TL and N gave a mean sup-pression ratio of .23 and .28; the combined T and L groups gave a ratioof .13. Thus, additional training to either T or L alone yielded moresubsequent suppression to the TL compound than did additional train-ing to the compound itself.

    These data clearly demonstrate that by giving a small amount ofprior conditioning to each of two stimuli, it is possible to interfereseverely with further conditioning to both stimuli when their compoundpresentation is reinforced. This finding is difficult to reconcile with an

    III

    ~Io10

    ort)

    o

    OI.lV~ NOISS3~ddns NV3W

    97

  • 8/11/2019 Rescorla & Wagner 1972

    18/18

    98 Assumptions, Theories, Model. Re.carla & Wagner 99

    C O N C LU D IN G C O M M E N TS

    Konorski. J. Conditioned refleus and neuron organization. Cambridge: TheUniversity Press. 1948.

    Kremer. E. Pavlovian conditioning and lhe random control procedure. Unpublished docloral dissertalion. McMaster University, 1968.

    Lovejoy, E. P. An analysis of the overlearning reversal effect. PsychologicalReview, 1966, 7J, 87-101I.

    Mackintosh, N.J . Selective attelliion ill animal diseriminalion learning. Psy-chological Bulletin, 1965. 64, 124-150.

    Pavlov. I. P. Conditioned reflexes. London: Oxford UniversilY Press, 1927.Rescorla. R. A. Pavlovian conditioning and ilS proper connol procedures.Psychological Review, 1967, 74,71-80.

    Rescorla, R. A. ProbabililY of shock in the presence and absence of CS in fearcondilioning. Journal of Comparative and Physiological Psychology, 1968.66,1-5.

    Rescorla, R. A. Conditioned inhibition of fear. In W. K. Honig and N. J.Mackintosh (Eds.), Fundamental i"ues in associative learning. Halifax:Dalhousie University Press, 1969.

    Rescorla, R. A., ! t e LoLordo, V. M. Inhibition of avoidance behavior. Journalof Comparative and Physiological Psychology, 1965.59,406--412.

    Restle. F. A theory of discrimination learning. Psychological Review, 1955. 62,11-19.

    SUlherland, N. S. Visual discrimination in animals. British Medical Bul/etin,1964,20,54-59.

    Wagner, A. R. Slimulus validity and slimulus seleclion. In W. K. Honig andN.J. Mackinl05h (Eds.).Fundamental i"ues in associative learning. Halifax:Dalhousie Universily Press, 1969, (a).

    Wagner, A. R. Stimulusselection and a "modified continuity lheory." In G. H.Bower and J . T. Spence (Eds.), The psychology of learning and motivation.Vol. li. New York: Academic Press, 1969, (b).

    Wagner. A. R. Incidenlal stimuli and discrimination learning. In G. Gilbertand N. S. Sutherland (Eds.), DiJcrimination learning. London: AcademicPress, 1968.

    Wagner. A. R., Logan, F. A.. Haberlandt, K.,I le Price. T. Slimulus selection inanimal discrimination learning. Journal of Experimental Psychology, 1968,76,171-180.

    Woodbury, C. B. The learning of stimulus patterns by dogs. Journal of Comparative Psychology, 1943, H, 29-40.

    attentional theory, but is an obvious deduction from the present model.Whatever the other virtues or liabilities of auentional theory, it

    simply does not fare well in relationship tothe present model when ap-plied to the range of Pavlovian conditioning arrangements under eon-

    sideration.

    We have attempted to point out some general principles governing

    the effectiveness of reinforcement and nonreinforcement in Pavlovian

    conditioning situations. Experiments from our own laboratories indicate

    that the incremental or decremental effects upon a component stimulus

    as a result of the reinforcement or nonreinforcement of a stimulus com

    pound containing that component, depend upon the total associative

    strength of the compound, not simply upon the associative strength of

    the component. This general dependence incorporated within a morequantitative formulation of our earlier theoretical position (e.g., Wag-

    ner, 1969a, 1969b; Rescorla, 1969) provides a way of integrating a size-

    able number of empirical findings. Several sample derivations made fromthe theory have been demonstrated to match well with available data.

    But, the greater value of the more specific theoretical formulation which

    has been proposed may be, as we have seen in several instances, in theidentification of additional variables of importance to Pavlovian condi

    tioning. It at least invites a fresh look at a data area in which the avail-

    able theoretical alternatives have been meager.

    Reference.

    Annau, Z., ! t e Kamin, L. J. The conditioned emotional response as a function ofintensity of the US. Journal of Comparative and Physiological Psychology,

    1961,54,428-H2.Bush, R. R., I le Mosleller, F. Stochastic models for learning. New York: Wiley,

    1955.Egger, D. M., I le Miller, N. E. Secondary reinforcement in rais as a function of

    information value and reliability of lhe stimulus. Journal of Experimental

    Psychology, 1962.64,97-104.Estes, W. K. I le Burke, C. J. A lheory of slimulus variability in learning. Psycho-

    logical Review, I9511.60,276--286.Hull. C. L. Principles of behavior. New York.: Appleton-Century-Crofts, 19411.Kamin. L. J. Attention-like processes in classical conditioning. In M. R. Jones

    (Ed.), Miami Symposium on the prediction of behavior: Aversive stimula-tion. Miami: Universily of Miami Press, 1968.

    Kamin, L.J. Predictabilily, surprise, attenlion, and condilioning. In R. Churchand B. Campbell (Eds.). Punishment and aversive behavior. New York.:

    Appleton-CenturyCrofts. 1969.