An Associative Analysis of Instrumental Learning

  • 8/12/2019 An Associative Analysis of Intrumental Learning

    Animal Learning & Behavior 1995, 23 (2), 218-233

    RUTH M. COLWILL and BETH A. DELAMATER
    Brown University, Providence, Rhode Island

    Three different techniques were employed to analyze the associative structures mediating performance on an instrumental biconditional discrimination. In all three experiments, rats were trained concurrently on two tasks in which different stimuli signaled which one of two responses would be followed by reward. In each task, one response was rewarded in one stimulus and the other response was rewarded in the other stimulus. Correct responses earned pellets in one task and sucrose in the other task. The transfer procedure was used in Experiment 1A to identify whether or not an association developed between a biconditional discriminative stimulus and its instrumental outcome. Evidence was obtained that a biconditional cue elevated preferentially a new response trained with the same outcome. Experiments 1B and 3 examined the potential contribution of this stimulus-outcome association to biconditional performance by training the biconditional cues as signals (S-s) for the nonreinforcement of a different response. There was no evidence that this operation interfered with the ability of a biconditional cue to control performance of its correct response. In Experiments 1B and 2, the value of the instrumental outcome was reduced in an attempt to assess the contribution of stimulus-response associations to performance on the biconditional discrimination. The results of Experiments 1B and 2 reveal that correct responses were depressed following devaluation of the outcome used to train them, suggesting that learning about the response-outcome relation occurs. The implications of these results for binary and hierarchical models of instrumental learning are discussed.

    In recent years, considerable progress has been made in identifying the content of the associations that mediate performance on simple instrumental discriminations. Two findings suggest that learning about a rewarding outcome occurs in a situation in which a stimulus (S+) signals when a response will be followed by that outcome. First, there is evidence that instrumental performance is affected by postconditioning manipulations of the value of the outcome (Colwill & Rescorla, 1990a; Rescorla & Colwill, 1989). Second, it has been shown that discriminative stimuli trained with one response-outcome relation will selectively promote performance of other responses trained with the same outcome (Colwill & Rescorla, 1988). These results have encouraged

    This research was supported by National Science Foundation Grant IBN 8915342. We thank Eric Wolfinger for his careful assistance with data collection. Reports based on various portions of this work were presented at the 63rd Annual Meeting of the Eastern Psychological Association in Boston, April 1992; at the Centennial Meeting of the American Psychological Association in Washington, D.C., August 1992; and at the 33rd Annual Meeting of the Psychonomic Society in St. Louis, November 1992. A thesis based on Experiments 1A and 1B was submitted by B.A.D. to the Department of Psychology, Brown University, in partial fulfillment of the bachelor's degree requirements for honors in psychology, 1992. A brief report of Experiment 2 appears in Colwill (1993a, 1994). Correspondence concerning this article should be addressed to Ruth M. Colwill, Department of Psychology, Brown University, Box 1853, Providence, RI 02912. Accepted by previous editor, Vincent M. LoLordo.

    the view that, over the course of S+ training, associations develop between the instrumental response and the outcome (R-O association) and between the discriminative stimulus and the outcome (S-O association).

    Other work has focused on an analysis of the associations learned in a situation in which a stimulus (S-) signals when a response will not be followed by a rewarding outcome (Bonardi, 1989; Colwill, 1991). Two observations have been made about the operation of an S-. First, several authors have found that an S- preferentially suppresses the response used for its training (Bonardi, 1989; Richeson & Colwill, 1994). Moreover, suppression of the original response appears independent of both the value of the consequences associated with the original response and the availability of the outcome predicted by the stimulus (Colwill, 1991, 1993b). This evidence suggests that an S- specifically inhibits its nonreinforced response (S-|R). Second, it has been reported that the transfer of S-s to new instrumental responses is mediated in part by the identity of the instrumental outcome. Colwill (1991) has shown that an S- will suppress another response trained with the same outcome better than a response trained with a different outcome. This finding of outcome-dependent transfer implies that an S- develops an inhibitory association with the instrumental outcome (S-|O).

    The goal of the present experiments was to explore further the adequacy of an analysis of instrumental learning in terms of these types of binary associations. Of particular interest was the question of how animals

    Copyright 1995 Psychonomic Society, Inc.

    INSTRUMENTAL BICONDITIONAL DISCRIMINATION 219

    solve an instrumental biconditional discrimination in which two stimuli differentially signal which one of two concurrently available responses will be reinforced and which will not (Trapold, 1970). In this kind of task, one response (R1) is rewarded only during one stimulus (S1) and the other response (R2) is rewarded only during the other stimulus (S2). The present paper reports a series of experiments that examined how rats learn which response to make in each stimulus when correct responses are followed by the same outcome (O).

    There are two straightforward versions of a binary associative analysis of performance on this simple biconditional discrimination. One approach employs a mixture of R-O, S-O, and S-|R associations. According to this view, performance of the correct response is attributed to a combination of S-O and R-O associations. Essentially, it is assumed that discriminative stimuli control the responses with which they are trained in the same way as they transfer to or promote performance of new instrumental responses trained with the same outcome. This assumption is supported by recent work showing that instrumental discriminative stimuli depend upon an intact S-O association to control performance of their original responses (Colwill, 1993b). However, in order to generate differential responding in the presence of the biconditional discriminative stimuli trained with a single outcome, promotion of the incorrect response by the S-O association must be counteracted. One way to accomplish this is through an inhibitory association between the stimulus and the incorrect response (Bonardi, 1989; Colwill, 1991).

    The other version of a binary account of biconditional discrimination learning also makes use of an association between the discriminative stimulus and the response. However, correct performance is attributed to the development of excitatory S-R associations. Because rewards only follow performance of the correct response in each stimulus, an association between the discriminative stimulus and the correct response will be selectively strengthened (e.g., Hull, 1943). In this way, the unique S-R associations guarantee performance of the correct response in each stimulus. What is important to note about this classical S-R account is that the outcome, although responsible for producing the association that permits this problem to be solved, is not itself represented in that associative structure. Consequently, manipulations of the value of the outcome after learning has taken place will have no differential impact on biconditional discriminative performance.

    The present experiments employed three different techniques to analyze the associative structures mediating performance on a biconditional discrimination task. First, the transfer procedure was used in Experiment 1A to identify whether or not an association developed between the biconditional discriminative stimulus and the instrumental outcome. Second, in Experiments 1B and 3, the biconditional cues were retrained as signals (S-s) for the nonreinforcement of a different response to assess the involvement of the potential S-O association in performance of the correct response on the biconditional discrimination. This operation has been shown to produce a virtual elimination of the ability of a discriminative stimulus to control its original response when the outcome employed for S- training is the same as that used for original discrimination training (Colwill, 1993b). Third, Experiments 1B and 2 manipulated the value of the instrumental outcome in an attempt to assess the contribution of excitatory S-R associations to performance on the biconditional discrimination.

    EXPERIMENT 1A

    The purpose of Experiment 1A was to use the transfer procedure to assess the presence of S-O associations in biconditional discrimination learning. Rats were trained concurrently on two independent biconditional discrimination tasks. In one task using auditory discriminative stimuli, one response (R1) was rewarded with one outcome (O1) in the presence of one stimulus (A1), and the other response (R2) was rewarded with the same outcome (O1) in the presence of the other stimulus (A2). In the other task using visual discriminative stimuli, R1 was rewarded with a different outcome (O2) in the presence of one stimulus (V1), and R2 was rewarded with that outcome (O2) in the presence of the other stimulus (V2).

    Following acquisition of the biconditional discriminations, two new instrumental responses were trained, one (R3) with O1 and one (R4) with O2. Each discriminative stimulus was then tested with the two new responses. If information about outcome identity is provided by the biconditional discriminative stimuli, then each stimulus should selectively promote performance of the response that was trained with the same outcome. Thus, A1 and A2 should elevate R3, whereas V1 and V2 should elevate R4. The basic design of the biconditional training procedure and the transfer test is shown in Figure 1.
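The design logic described above can be written out as a small lookup, which makes the transfer prediction explicit. This is only an illustrative sketch: the dictionary and function names are hypothetical and are not part of the original report; only the stimulus, response, and outcome labels come from the text.

```python
# Hypothetical encoding of the Experiment 1A design. The names TRAINING,
# TRANSFER, and predicted_transfer_response are illustrative assumptions.

# Biconditional training: stimulus -> (correct response, outcome earned).
TRAINING = {
    "A1": ("R1", "O1"),
    "A2": ("R2", "O1"),
    "V1": ("R1", "O2"),
    "V2": ("R2", "O2"),
}

# Transfer training: new response -> outcome that reinforced it.
TRANSFER = {"R3": "O1", "R4": "O2"}


def predicted_transfer_response(stimulus: str) -> str:
    """Return the transfer response that shares its outcome with the stimulus."""
    _, outcome = TRAINING[stimulus]
    matches = [resp for resp, out in TRANSFER.items() if out == outcome]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one matching response for {stimulus}")
    return matches[0]


if __name__ == "__main__":
    for stimulus in TRAINING:
        print(stimulus, "should elevate", predicted_transfer_response(stimulus))
```

If the stimuli carry outcome information, the auditory cues map onto R3 and the visual cues onto R4, which is the pattern the transfer test is designed to detect.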

    Method

    Subjects
    The subjects were 16 experimentally naive, male Holtzman-derived Sprague-Dawley rats (Harlan Co.), approximately 100 days old at the start of the experiment. They were housed individually. Water was always available in the home cage. Daily food intake was regulated so that the animals were maintained at 80% of their free-feeding body weight.

    Apparatus
    The apparatus consisted of eight identical Skinner boxes measuring 22.9 x 20.3 x 20.3 cm. The two end walls of the chamber were aluminum, and the side walls and ceiling were made of Plexiglas. The floor of the chamber was composed of 0.4-cm stainless steel rods spaced 1.9 cm apart. Each chamber had a recessed food magazine in the center of one end wall. Sucrose and pellets were delivered through separate tubes inserted through the roof of the food magazine. Each operation of the sucrose dispenser allowed 0.2 ml of 8% sucrose to flow onto the floor of the food magazine,

    220 COLWILL AND DELAMATER

    Basic Design of Experiment 1A: Discrimination Training and Transfer Test

    Discrimination Training    Transfer Training    Transfer Testing
    A1: R1-O1, R2-             R3-O1                A1: R3 v R4
    A2: R2-O1, R1-             R4-O2                V2: R3 v R4
    V1: R1-O2, R2-                                  V1: R3 v R4
    V2: R2-O2, R1-                                  A2: R3 v R4

    Figure 1. Basic design of biconditional discrimination training and the transfer test in Experiment 1A. A1 and A2 denote auditory discriminative cues (noise and tone), and V1 and V2 denote visual discriminative cues (steady light and flashing light). R1, R2, R3, and R4 are instrumental responses (nose poke, handle pull, lever press, and chain pull). O1 and O2 are reinforcers (food pellets and liquid sucrose).

    where it collected in a shallow indentation. Each operation of the pellet dispenser allowed a single 45-mg food pellet (Formula A, J. Noyes Co.) to drop onto the floor of the magazine.

    Each box was equipped with four manipulanda: a lever, a chain pull, a nose poke, and a handle pull. The lever was mounted 2.5 cm from the right-hand wall of the food magazine. The chain was suspended from a microswitch mounted on the roof of the chamber. The end of the chain was 11 cm from the grid floor and 3 cm from the left-hand wall of the food magazine. Located 5.5 cm directly above the roof of the magazine was the nose-poke manipulandum, which consisted of a circular aperture, 2 cm in diameter and 1.3 cm deep. The back of this aperture was covered by a metal plate that operated a microswitch whenever it was depressed. Mounted on the same side of the chamber as the chain, but 1.5 cm below the grid floor, was the handle-pull manipulandum. This consisted of a short flat rod protruding 3 cm into the chamber. Whenever the rod was pulled upward, a microswitch was closed and a response recorded. The same model of microswitch (Unimax Switch Co., 2HBT-1) was used to detect responding on all four manipulanda. Access to these manipulanda was prevented by covering the lever with a metal plate or by retracting it, by retracting the chain through an opening in the ceiling, by inserting a metal cover into the aperture of the nose-poke manipulandum, and by withdrawing the arm of the handle pull.

    Each Skinner box was enclosed in a sound-attenuating and light-resistant shell. Two loudspeakers were mounted on the back wall inside the shell, one in the upper left-hand corner and one in the upper right-hand corner. One speaker permitted presentation of a white noise (N), and the other speaker allowed presentation of an 1800-Hz tone (T). A 6-W houselight (F) was mounted from the ceiling of each shell such that it was located over the center of the operant chamber. It was flashed once per second. Another 6-W light (L) was mounted on the side wall of each operant chamber about 3 cm above the grid floor. Experimental events were controlled and recorded automatically by interfacing (Med Associates) and an XT microprocessor located in an adjoining room.

    Procedure
    Magazine training. The subjects were given one session of magazine training in which 10 food-pellet reinforcers followed by 10 liquid-sucrose reinforcers were delivered on a variable-time (VT) 60-sec schedule. No manipulanda were available during this session.

    Response training. All subjects were then trained to nose poke and handle pull for pellet and sucrose rewards. Initially, each response was trained on a continuous reinforcement (CRF) schedule until 30 reinforcers had been earned. The subjects were trained to nose poke for pellets in the first session, to handle pull for pellets in the second session, to nose poke for sucrose in the third session, and to handle pull for sucrose in the fourth session.

    Following CRF training, the subjects received two 20-min sessions with each response-outcome combination. In each session, responding was reinforced on a variable-interval (VI) 30-sec schedule. The first four sessions occurred in the following sequence: handle pulling for sucrose, nose poking for pellets, handle pulling for pellets, and nose poking for sucrose. The next four sessions were given in the reverse order.

    Biconditional discrimination training. The subjects were trained concurrently on two biconditional discrimination tasks. For one task, two visual stimuli (L and F) served as the conditional cues. For the other task, two auditory stimuli (T and N) served as conditional cues. Training sessions were administered daily, and the order of training on the two tasks followed a repeating ABBABAAB sequence. The subjects received a total of 14 sessions of training on each task.

    For both tasks, each training session contained 16 30-sec presentations of each of the two stimuli. Both the nose-poke and the handle-pull responses were available in these sessions. A response was reinforced on a VI 30-sec schedule only in the presence of its correct stimulus. Food pellets served as the reinforcer for one biconditional discrimination task, and sucrose liquid served as the reinforcer for the other biconditional discrimination task. Nose poking was designated as the correct response for one of the auditory and one of the visual stimuli; handle pulling was designated as the correct response for the other auditory and visual stimuli.

    The intertrial interval (ITI) was gradually lengthened over the course of training. For the first training session with each pair of discriminative stimuli, the mean ITI was 15 sec. For the next four sessions on each task, it was lengthened to a mean of 30 sec; for the next seven sessions on each task, the mean ITI was 60 sec; and for the final two sessions on each task, it was increased to a mean of 90 sec.

    The specific designations of correct responses and outcome identities were counterbalanced across animals in the following way. Half of the animals always earned pellets during the sessions with the auditory cues and sucrose during the sessions with the visual cues; this contingency was reversed for the remaining animals. Within each of these conditions, half of the animals were rewarded for nose poking in the noise and in the steady light and for handle pulling in the tone and in the flashing light. The remaining

    animals were rewarded for handle pulling in the noise and in the steady light and for nose poking in the tone and in the flashing light.

    Because of initial low levels of responding by some animals, remedial sessions of VI training were administered to elevate responding. These sessions were given after two sessions of conditional discrimination training on each task had been completed. An additional VI training session with the sucrose reward was also given after five further sessions on each conditional discrimination task.

    Transfer-response training. Two new responses, lever pressing and chain pulling, were then trained. Each response was trained initially on a CRF schedule until 30 reinforcers had been earned. The subjects were then given one 20-min session of training with each response in which its outcome was available on a VI 30-sec schedule. This was followed by four 20-min sessions in which both responses were available, but each was reinforced on separate VI 60-sec schedules. For half of the animals, pellets served as the reinforcer for lever pressing and sucrose served as the reinforcer for chain pulling. The reverse was true for the remaining animals. Following this training, the subjects were given an 8-min extinction session in which both responses were available.

    Transfer testing. The discriminative stimuli were examined for their ability to transfer to the lever and chain in a series of test sessions. In each test session, both the lever and the chain were available but responding was never reinforced. Within each session, there were an equal number of presentations of one visual and one auditory stimulus with a mean 30-sec ITI; in this way, each session contained one stimulus trained with pellets and one stimulus trained with sucrose.

    In the first test session with each pair of auditory and visual stimuli, there were four 30-sec presentations of each stimulus scheduled in an ABBABAAB sequence. In the second test session with each pair of auditory and visual stimuli, there were eight 30-sec presentations of each stimulus. The first eight trials were scheduled in an ABBABAAB sequence; this sequence was reversed for the last eight trials. For all subjects, N and F were tested in the same session and T and L were tested together.

    Between the first and second series of test sessions, the subjects received three sessions of biconditional discrimination training with each of the two pairs of stimuli. In addition, each of the second test sessions was preceded by a 20-min VI training session with the two transfer responses. The procedural details of these training sessions were identical to those described for original training. The purpose of these additional training sessions was to attenuate any decremental effects that might have been produced by initial testing.

    Results and Discussion

    Biconditional Discrimination Training
    Acquisition of the biconditional discriminations proceeded smoothly; by the end of training, responding occurred predominantly during the stimulus when it was followed by a reward. Analysis of the terminal session of discrimination training collapsed across stimulus, response, and outcome identity revealed that performance of the correct response (25.0 responses per minute) was significantly higher than performance of the incorrect response (7.9 responses per minute) during the biconditional stimuli [Wilcoxon T(16) = 0, p < .01]. Furthermore, relative to the ITI rate (3.2 responses per minute), the biconditional cues elevated performance of both the correct response [T(16) = 0, p < .01] and the incorrect response [T(16) = 0, p < .01]. Similar results were also obtained from an analysis of the session of discrimination training preceding the second transfer test. The rate of correct responses (22.8 responses per minute) was significantly higher than the rate of incorrect responses (7.5 responses per minute) and the ITI rate (2.0 responses per minute) [Ts(16) = 0, p < .01]. There were also more incorrect responses than ITI responses [T(16) = 0, p < .01].

    Transfer-Response Training
    Training on the transfer responses proceeded smoothly. On the last day of VI training, the mean rate of responding for pellets (10.7 responses per minute) was significantly higher than the mean rate of responding for sucrose (7.4 responses per minute) [T(16) = 22, p < .05]. There was no significant difference between the rates of lever pressing and chain pulling (10.4 and 7.8 responses per minute, respectively). A similar pattern of results emerged during the 8-min extinction test. Responding trained with pellets was significantly higher than that trained with sucrose (11.5 and 8.2 responses per minute, respectively) [T 15 2 p .05], and the rate of lever pressing (11.1 responses per minute) did not differ significantly from that of chain pulling (8.6 responses per minute). In the sessions of VI training that immediately preceded the second series of transfer testing, there was no significant effect of either response or outcome identity. The mean rates of lever pressing and chain pulling were 10.0 and 7.6 responses per minute, respectively; the mean rates of pellet- and sucrose-trained responses were 9.8 and 7.8 responses per minute, respectively.

    Transfer Testing
    The results of the first test session with each biconditional discriminative stimulus were moderately encouraging. Relative to the ITI rate (4.5 responses per minute), the discriminative stimuli promoted performance of the transfer response trained with the same outcome (5.7 responses per minute) [T(16) = 25, p < .05] but had no significant effect on the different response (5.0 responses per minute). However, comparison of the rates of same and different responses failed to reveal a significant difference.
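The bracketed T values reported above are Wilcoxon matched-pairs signed-ranks statistics. As a minimal sketch of how such a statistic is obtained, the function below ranks the absolute within-subject differences and returns the smaller of the two signed rank sums. The function name and the example rates are hypothetical and serve only to illustrate the calculation; they are not data from this experiment.

```python
def wilcoxon_t(x, y):
    """Return (T, n) for paired samples x and y.

    T is the smaller of the positive and negative rank sums; n is the
    number of pairs left after zero differences are dropped.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # Indices of the differences, ordered by absolute size.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(positive, negative), n


if __name__ == "__main__":
    # Invented response rates for four hypothetical animals.
    correct = [20, 25, 30, 22]
    incorrect = [8, 7, 9, 10]
    print(wilcoxon_t(correct, incorrect))
```

When every animal differs in the same direction, one rank sum is empty and T equals 0, which is why T = 0 accompanies the clearest effects in the results.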

    Of primary interest are the results of the second transfer-test session shown in Figure 2. Responding is shown separately during the stimuli trained with the same outcome as the response (SAME), during stimuli trained with a different outcome (DIFF), and during the ITI. There was a substantial elevation of the same response relative both to the different response [T(16) = 0, p < .01] and to the ITI rate [T(16) = 0, p < .01]. Responding on the different response, however, did not differ significantly from the ITI [T 16 7 p .10].

    The pattern of these results is identical to that reported by Colwill and Rescorla (1988). The selective elevation of a new instrumental response as a function of shared association with an outcome suggests that the biconditional discriminative stimuli developed associations with the instrumental outcome. The present results,

    together with those of Colwill and Rescorla (1988), encourage the view that subjects encode information about the identity of the outcome during training on a variety of appetitive instrumental tasks.

    Figure 2. Transfer test results from Experiment 1A. Responding is shown when a stimulus signaled the same outcome as the transfer response (filled circles), a different outcome from the transfer response (open circles), and during the ITI (open triangles) when no stimulus was present.

    EXPERIMENT 1B

    The first goal of Experiment 1B was to evaluate the role of the S-O association in performance on the biconditional discrimination. Recent evidence suggests that in simple instrumental discriminations performance of the response may be undermined by treatments that extinguish the S-O association (Colwill, 1993b). In one study, Colwill (1993b) trained one discriminative stimulus with one response-outcome relation (S1: R1-O1) and a different discriminative stimulus with another response-outcome relation (S2: R2-O2). Both stimuli were then trained as signals for the nonreinforcement of another instrumental response (R3) trained with one of these outcomes (either O1 or O2). Thus, one of the stimuli (same S+/S-) signaled that R3 would not be followed by the outcome used for the original training of that stimulus, whereas the other stimulus (different S+/S-) signaled the omission of a different response-contingent outcome. In a subsequent extinction test with the original responses, the same S+/S- no longer evidenced an ability to elicit the response with which it had been trained, although the different S+/S- continued to evoke its original response. These results suggest that the S-O association mediates performance of the instrumental response in a simple instrumental discrimination.

    Experiment 1B used this technique to modify the S-O association in order to evaluate its contribution to performance on the biconditional discrimination task. The basic design of this stage of the experiment is shown in Figure 3. Following retraining on the original discriminations, each of the four stimuli was trained as a signal (S-) for the nonreinforcement of one of the two transfer responses (R3 and R4). Two of the stimuli (one visual and one auditory) were trained as signals for the omission of the same outcome used for their original training; the other two stimuli signaled the omission of a different outcome. Finally, the subjects were tested with the original responses (R1 and R2) in the presence of each of the four stimuli. It was expected that if S-O associations contributed to performance of the original responses, the same S+/S- stimuli should be less effective than the different S+/S- stimuli in promoting their original responses.

    The second goal of Experiment 1B was to provide a preliminary assessment of the role of the outcome in determining performance on the biconditional discrimination. The outcome-revaluation procedure was used to probe the sensitivity of the correct response to the current value of its outcome. Because of the present subjects' extensive experience with the outcomes, a satiation procedure was used to decrease the attractiveness of the instrumental outcome. The subjects were tested on each biconditional discrimination first following satiation on one outcome and then following satiation on the other outcome. To the degree that S-R learning underlies instrumental performance on the biconditional task, there should be no differential effect on performance on the two tasks of devaluing the outcome for one of those tasks.

    Basic Design of Experiment 1B: S- Training and Testing

    S- Training                  Testing
    R3-O1, A1: R3-, V2: R3-      A1: R1 v R2
                                 V2: R1 v R2
    R4-O2, V1: R4-, A2: R4-      V1: R1 v R2
                                 A2: R1 v R2

    Figure 3. Basic design of S- training and the test of its effects on original biconditional discriminative performance in Experiment 1B. A1 and A2 denote auditory discriminative cues (noise and tone), and V1 and V2 denote visual discriminative cues (steady light and flashing light). R1, R2, R3, and R4 are instrumental responses (nose poke, handle pull, lever press, and chain pull). O1 and O2 are reinforcers (food pellets and liquid sucrose).

    Method

    Subjects and Apparatus
    The subjects were the same as those used in Experiment 1A. They were housed and maintained according to the procedures described for Experiment 1A. The apparatus was also the same as that used in Experiment 1A.

    Procedure
    Retraining of the biconditional discriminations. Using the procedure of Experiment 1A, two sessions of biconditional discrimination training with each pair of cues were given. The purpose of this retraining was to ensure reliable performance on the two discriminations after transfer testing.

    Transfer-response retraining. The lever and chain responses were retrained separately for 10 min. These responses were reinforced on a VI 30-sec schedule with the same outcomes used for their original training in Experiment 1A.

    S- training. In this phase, the four biconditional discriminative stimuli were trained as signals for the omission of a response-contingent outcome that was either the same as or different from the outcome used in the original discriminations. Each session contained one auditory and one visual stimulus with one response manipulandum: in 16 sessions, N and F were trained with one response (either lever or chain), and in the other 16 sessions, T and L were trained with a different response (either chain or lever). Sessions were conducted daily and followed a repeating ABBABAAB sequence. Within each session, there were 16 30-sec presentations of each of the two stimuli separated by a mean ITI

    of 30 sec. The order of trial presentations was randomized inblocks of eight trials. Responses were reinforced on a VI 30-secschedule, except during the stimulus presentations.The various combinations of stimuli, responses, and outcomeswere counterbalanced across subjects in the following way. For 4

    of the animals, Nand F signaled the omission of pellets for leverpressing, and T and L signaled the omission of sucrose for chainpulling; for four different animals, N and F signaled the omissionof pellets for chain pulling, and T and L signaled the omission ofsucrose for lever pressing; for another 4 animals, Nand F signaledthe omission of sucrose for lever pressing, and T and L signaledthe omission of pellets for chain pulling; finally, for the remaining4 animals, N and F signaled the omission of sucrose for chain pull-ing, and T and L signaled the omission ofpellets for lever pressing.Testing for control of original responses. Each of the fourdiscriminative stimuli was tested with the original responses, nosepoke and handle pull. The first test session contained four 30-secpresentations of each of Nand F. The tri al sequence wasNFFNFNNF, and trials were separated by an IT of 90 sec. Thesecond test session conducted the following day was identical tothe first, except that T and L were presented. The trial sequencewasTLLTLTTL.During testing, responding wasnever reinforced,Satiation and test. Two sessions of retraining were given oneach biconditional task. The procedural details were the same asthose described for original training.The subjects were sated first on sucrose and then, on a differentday,on pellets. Satiation was accomplished by delivering the des-ignated outcome on a VT 30-sec schedule until the animal hadstopped consuming the reward. The subject was removed from theoperant chamber but was replaced 5min later and given additionaldeliveries of the to-be-sated outcome until consumption ceased.This cycle was repeated twice. 
During this satiation procedure, nomanipulanda were available and no discriminative stimuli werepresented.The subjects were then removed from the chambers and placedin holding cages while the chambers were prepared for testing.The test session contained two presentations of each of the fourdiscriminative stimuli separated by an IT of 90 sec. The order oftesting was NFFNTLLT following satiation on the sucrose, andTLLTNFFN following satiation on thepellets. Both the nose-pokeand the handle-pull manipulanda were available in these test ses-sions, but responses were never reinforced.
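The within-session trial order described for S- training (16 presentations of each of two stimuli, randomized in blocks of eight trials) could be generated along the following lines. This is only a sketch: the text does not say how each block was composed, so the code assumes that every block of eight holds four presentations of each stimulus. The function name `block_randomized_order` and the seed are illustrative, not part of the original procedure.

```python
import random


def block_randomized_order(stimuli=("N", "F"), n_per_stimulus=16,
                           block_size=8, seed=None):
    """Return a trial order shuffled independently within each block.

    Assumption (not stated in the text): each block holds an equal
    number of presentations of every stimulus.
    """
    per_block = block_size // len(stimuli)       # e.g., 4 of each stimulus
    if per_block * len(stimuli) != block_size:
        raise ValueError("block_size must be a multiple of the stimulus count")
    if n_per_stimulus % per_block != 0:
        raise ValueError("presentations must fill a whole number of blocks")
    n_blocks = n_per_stimulus // per_block       # 16 / 4 = 4 blocks
    rng = random.Random(seed)
    order = []
    for _block_index in range(n_blocks):
        block = [stimulus for stimulus in stimuli for _rep in range(per_block)]
        rng.shuffle(block)                       # randomize within the block only
        order.extend(block)
    return order


session = block_randomized_order(seed=1)
print(len(session), session[:8])
```

Shuffling within blocks rather than across the whole session keeps the two stimuli evenly spread over the session, which is the usual reason for block randomization.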

    Results and Discussion

    Retraining

    Retraining proceeded uneventfully. Performance on the last session of each discrimination was similar to that observed at the end of original training. Mean rates of correct and incorrect responses during a stimulus were 24.1 and 9.2 responses per minute, respectively. The mean rate was 2.6 responses per minute during the ITI. Analysis of these data revealed that the correct response was significantly elevated by its stimulus relative to performance of the incorrect response [T(16) = 0, p .10].

    Suppression ratios for the first and last sessions of training were calculated for each subject by dividing the rate of responding during the stimulus by the sum of that rate and the ITI rate. S- training resulted in a significant increase in the suppression of responding during stimulus presentations relative to the ITI between the first and last training sessions (.53 and .35, respectively) [T(16) = 0, p < .01].

    Testing for Control of Original Responses

    The results of the extinction test with the discriminative stimuli and their original responses are shown in Figure 4. Responding is shown in the presence of the two


    stimuli that were trained to signal the omission of an outcome that was either the same as (top panel) or different from (lower panel) the outcome used for original training. Within each panel, responding is shown separately for the correct response, the incorrect response, and during the ITI. Inspection of Figure 4 reveals that the discriminative stimuli continued to exhibit differential control over their original responses regardless of which outcome had been used for S- training. During presentations of stimuli trained as S-s with the same outcome, performance of the correct response was significantly elevated relative to both the incorrect response [T(16) = 26, p < .05] and the ITI rate [T(16) = 13, p < .01]. Similar results were obtained when the stimuli had been trained as S-s with a different outcome. Performance of the correct response was significantly elevated relative to both the incorrect response [T(16) = 20.5, p < .05] and responding during the ITI [T(16) = 24, p < .05]. In neither condition did performance of the incorrect response differ significantly from the ITI [Ts(16) 5, p > .10]. Furthermore, direct comparison of the rates of correct responding across the different- and same-outcome conditions yielded no significant difference [T(16) = 56, p > .05]. In contrast to the results obtained by Colwill (1993b) with simple discriminative stimuli, there was no evidence that S- training with the same outcome had any adverse effect on the ability of biconditional stimuli to promote performance of their responses. Whereas Colwill (1993b) found that S- training with the same outcome removed the ability of that stimulus to increase performance of its response relative to the ITI, there was no evidence for a similar effect in the present experiment.

    These results show that, regardless of the outcome used, the training of biconditional discriminative stimuli as signals for the nonreinforcement of an instrumental response does not affect their ability to promote their original reinforced responses. The failure to obtain an outcome-specific decremental effect of S- training in this experiment suggests that performance of the correct response is not promoted by an S-O association in this type of instrumental discrimination.

    Figure 4. Mean rates of responding on the biconditional discriminations following S- training in Experiment 1B. The top panel displays performance on the discrimination whose cues were trained with the same outcome used for S- training; the bottom panel displays the comparable data when S- training employed a different outcome. In each panel, correct (filled circles) and incorrect responses (open circles) are plotted separately. Responding is also shown during the ITI (open triangles) when no stimuli were presented.

    Satiation and Test

    The retraining sessions were sufficient to reestablish good performance on the two tasks. Performance on the last session of each discrimination was similar to that observed at the end of original training. Mean response rates during a stimulus were 20.9 responses per minute for the correct response, 7.1 responses per minute for the incorrect response, and 2.7 responses per minute during the ITI. Analysis of these data revealed that the correct response was significantly elevated by its stimulus relative to performance of the incorrect response [T(16) = 0, p < .01]. Both correct and incorrect responses occurred at a higher rate during the stimuli than during the ITI [Ts(16), p < .01].

    The results of primary interest concern the effect of outcome satiation on performance of the correct response. For the purposes of analysis, data from the first test trial with each stimulus were combined across the two test sessions for the sated and nonsated conditions. Inspection of these data reveals that the correct response was less likely in the presence of the stimuli in which it had previously earned the sated outcome (6.5 responses per minute) than during the stimuli in which it had produced the nonsated outcome (9.0 responses per minute). An analysis of these data found this difference to be statistically significant [T(16) = 26.5, p < .05]. That correct responding was sensitive to differences in the value of the outcome predicted by the discriminative cue suggests that the rats learned about the identity of the outcome. Such a result is clearly not conducive to an S-R analysis of biconditional discriminative performance.

    In summary, the results of Experiment 1B do not appear to favor accounts of biconditional discrimination


    learning in terms of the binary models outlined in the introduction. First, the fact that performance of the correct response was not selectively disrupted by modifying the information that the discriminative stimulus provided about the occurrence of its outcome suggests that the response is not promoted by the S-O association. Second, the observation that changing the value of the outcome earned by that response does affect its performance suggests that responding is not controlled through an S-R association. These results were confirmed and strengthened by Experiments 2 and 3.

    EXPERIMENT 2

    The purpose of Experiment 2 was to examine further the impact of outcome devaluation on biconditional discrimination performance. The design of the biconditional task used in Experiment 1B was modified in order to increase its sensitivity to the effect of outcome devaluation. In Experiment 1B, the two biconditional tasks employed different outcomes but used the same pair of instrumental responses. Thus, each response was associated with both outcomes. Colwill and Rescorla (1990a) have reported that a discriminative stimulus will encourage performance of its response even after the outcome with which they share an association has been devalued, provided there is some valued consequence associated with that response. Thus, in Experiment 1B, the continued attractiveness of one outcome may have provided support for performance of the response in the presence of the stimulus in which it had previously earned the currently devalued (i.e., sated) outcome. To eliminate this source of support for responding, the subjects were again trained on two separate biconditional discriminations, but each employed a different pair of instrumental responses.

    The basic design of Experiment 2 is shown in Figure 5. Following acquisition of the two biconditional discriminations, the value of one of the outcomes was reduced by pairing the outcome with a nausea-inducing agent. The subjects were then tested in extinction on each biconditional discrimination. The question of interest is whether performance of the correct response will be depressed when the outcome used to train that response has been devalued. Such a result is clearly not anticipated if biconditional stimuli control their responses through S-R connections.

    Method

    Subjects and Apparatus

    Sixteen experimentally naive Holtzman-derived Sprague-Dawley rats served as subjects. They were housed and maintained under the same conditions as the subjects in Experiment 1A. The apparatus used was the same as that employed in Experiment 1A.

    Procedure

    Magazine training. The subjects received one session of magazine training in which 10 food pellets followed by 10 liquid-sucrose outcomes were delivered on a VT 60-sec schedule. No response manipulanda were available.

    Response training. All subjects were trained to lever press, chain pull, nose poke, and handle pull for either food-pellet or liquid-sucrose rewards. The responses were trained in separate sessions on a CRF schedule until 30 reinforcers had been earned. The order of training was lever press, chain pull, nose poke, and handle pull.

    Following CRF training, each response was trained on a VI 30-sec schedule for two 20-min sessions. The order of training was handle pull, nose poke, chain pull, and lever press for the first four sessions. This sequence was reversed for the next four sessions. For half of the animals, pellets served as the reinforcer for lever press and chain pull responses, and sucrose served as the reinforcer for nose poking and handle pulling. These response-outcome combinations were switched for the remaining animals.

    Biconditional discrimination training. The subjects were trained on two biconditional discrimination tasks, one using the visual cues (L and F) and one using the auditory cues (T and N). Within each task, each stimulus uniquely signaled which one of two concurrently available responses would be reinforced and which one would not. For half of the animals, lever press and chain pull were trained with the visual cues, and nose poke and handle pull with the auditory cues; the remaining animals had the opposite combination of response and stimulus pairs. For all animals, one response from each pair was designated correct in one stimulus and the other response was correct during the other stimulus. Correct responses were reinforced on a VI 30-sec schedule with the same outcome used for their initial training; incorrect responses and ITI responses were never reinforced. Thus, pellets served as the reinforcer for one biconditional discrimination and sucrose served as the reinforcer for the other biconditional discrimination.

    Basic Design of Experiment 2

    Training:     A1: R1-O1, R2-
                  A2: R2-O1, R1-
                  V1: R3-O2, R4-
                  V2: R4-O2, R3-
    Devaluation:  O1+, O2-
                  O2+, O1-
    Testing:      A1: R1 v R2
                  A2: R1 v R2
                  V1: R3 v R4
                  V2: R3 v R4

    Figure 5. Basic design of Experiment 2. A1 and A2 denote auditory discriminative cues (noise and tone), and V1 and V2 denote visual discriminative cues (steady light and flashing light). R1, R2, R3, and R4 are instrumental responses (nose poke, handle pull, lever press, and chain pull). O1 and O2 are reinforcers (food pellets and liquid sucrose). + and - denote whether the outcome was devalued or not.

    The specific stimulus, response, and outcome combinations that were distributed across subjects were as follows. Four animals were given pellets for chain pulling in N and lever pressing in T and were given sucrose for nose poking in L and handle pulling in F. Four other animals were given pellets for nose poking in N and handle pulling in T and were given sucrose for chain pulling in L and lever pressing in F. Another four animals received sucrose for lever pressing in N and chain pulling in T and received pellets for handle pulling in L and nose poking in F. The remaining four animals received sucrose for handle pulling during N and nose poking during T and received pellets for lever pressing in L and chain pulling in F.

    There were 12 sessions of training on the task with the visual biconditional stimuli and 12 sessions on the task with the auditory biconditional stimuli. Each session contained 16 30-sec presentations of each biconditional stimulus. For each task, the mean duration of the ITI was 15 sec for the first session, 30 sec for the next 5 sessions, 60 sec for the following 3 sessions, and finally 90 sec for the final 3 sessions.
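The counterbalancing scheme for the four groups of animals can be written out as data and checked for balance. The sketch below transcribes the four assignments given in the text and counts how many rats experienced each response-outcome and stimulus-outcome pairing; the names `GROUPS` and `pairing_counts` are illustrative and do not come from the paper.

```python
from collections import Counter

# Each group of four rats: {stimulus: (correct response, outcome)}.
# N and T are the auditory cues; L and F are the visual cues.
GROUPS = [
    {"N": ("chain", "pellets"), "T": ("lever", "pellets"),
     "L": ("nose poke", "sucrose"), "F": ("handle", "sucrose")},
    {"N": ("nose poke", "pellets"), "T": ("handle", "pellets"),
     "L": ("chain", "sucrose"), "F": ("lever", "sucrose")},
    {"N": ("lever", "sucrose"), "T": ("chain", "sucrose"),
     "L": ("handle", "pellets"), "F": ("nose poke", "pellets")},
    {"N": ("handle", "sucrose"), "T": ("nose poke", "sucrose"),
     "L": ("lever", "pellets"), "F": ("chain", "pellets")},
]
RATS_PER_GROUP = 4


def pairing_counts(groups, rats_per_group):
    """Count how many rats experienced each pairing of design factors."""
    response_outcome = Counter()
    stimulus_outcome = Counter()
    for group in groups:
        for stimulus, (response, outcome) in group.items():
            response_outcome[(response, outcome)] += rats_per_group
            stimulus_outcome[(stimulus, outcome)] += rats_per_group
    return response_outcome, stimulus_outcome


response_outcome, stimulus_outcome = pairing_counts(GROUPS, RATS_PER_GROUP)
for pairing, n_rats in sorted(response_outcome.items()):
    print(pairing, n_rats)
```

With these assignments, every response and every stimulus is paired with pellets for half of the 16 rats and with sucrose for the other half, so group-level effects of outcome identity are not confounded with a particular response or cue.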

    Outcome devaluation. There were five 2-day cycles of outcome devaluation training. On the first day of each cycle, the to-be-devalued outcome was delivered on a VT 60-sec schedule for 20 min or until five reinforcers had accumulated in the food magazine. The subjects were then injected with a 5 ml/kg solution of 0.6 M lithium chloride (LiCl) administered intraperitoneally before being returned to their home cages. On the second day of each cycle, the other outcome was delivered on a VT 60-sec schedule. At the end of this session, no injection was administered and the subjects were simply returned to their home cages. The doors of the sound-attenuating boxes were propped open for the last two conditioning cycles in order to permit observation of the animals' consumption of the outcomes. During this period, the room lights were dimmed.

    Testing. Each pair of biconditional discriminative stimuli was tested with its original response pair. In each test session, there were eight 30-sec presentations of each of the two stimuli with a 30-sec ITI. For all subjects, the lever and chain were available in the first test session, and the nose poke and handle pull were available in the second test session. During testing, responding was never reinforced.

    Testing was repeated following training designed to elevate overall levels of responding. That training involved one 20-min session with each of the four responses reinforced on a VI 30-sec schedule with the valued outcome. This was followed by one 20-min session with each of the two pairs of responses in which each response was reinforced with the valued outcome on a VI 60-sec schedule. The test procedure was identical to that described above, except that the order of the two tests was reversed.
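For readers who want the lithium chloride injection from the devaluation phase (5 ml/kg of a 0.6 M solution) expressed per unit body weight, the conversion is simple arithmetic. The sketch below assumes the standard molar mass of LiCl (about 42.39 g/mol), which is a textbook value rather than something reported in the paper; the function name is illustrative.

```python
# Molar mass of LiCl in g/mol (Li 6.94 + Cl 35.45); a standard value,
# not taken from the paper. Numerically, g/mol equals mg/mmol.
LICL_MOLAR_MASS_G_PER_MOL = 42.39


def licl_dose_mg_per_kg(molarity_mol_per_l, volume_ml_per_kg):
    """Convert an injection volume and concentration to mg of LiCl per kg."""
    # mol/L * mL/kg = mmol/kg
    mmol_per_kg = molarity_mol_per_l * volume_ml_per_kg
    # mmol/kg * mg/mmol = mg/kg
    return mmol_per_kg * LICL_MOLAR_MASS_G_PER_MOL


dose = licl_dose_mg_per_kg(0.6, 5.0)
print(f"{dose:.1f} mg/kg")
```

Under that assumption the injection works out to 3 mmol/kg, or roughly 127 mg of LiCl per kilogram of body weight.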

    Results and Discussion

    Biconditional Discrimination Training

    Both biconditional discriminations were acquired successfully in that a response occurred most often during the stimulus in which it was reinforced. Analysis of performance in the final session of discrimination training collapsed across stimulus, response, and outcome identity revealed that performance of the correct response (16.9 responses per minute) was significantly higher than performance of the incorrect response (4.2 responses per minute) during the biconditional stimuli [T(16) = 0, p < .01]. Furthermore, relative to the ITI rate (3.0 responses per minute), the biconditional cues significantly elevated performance of both correct responses [T(16) = 0, p < .01] and incorrect responses [T(16) = 2, p < .01].
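The bracketed T(16) values are consistent with Wilcoxon signed-rank tests on the 16 rats, although the test is not named in this section, so that reading is an assumption. Under it, T is the smaller of the two signed-rank sums, and T = 0 means that every rat differed in the same direction. The pure-Python sketch below computes such a statistic for paired rates; the function name and the example rates are made up for illustration.

```python
def wilcoxon_t(first, second):
    """Smaller of the positive and negative signed-rank sums for paired data.

    Zero differences are dropped and tied absolute differences share
    their average rank, as in the usual signed-rank procedure.
    """
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        mean_rank = (i + j) / 2 + 1      # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = mean_rank
        i = j + 1
    positive = sum(r for r, d in zip(ranks, ordered) if d > 0)
    negative = sum(r for r, d in zip(ranks, ordered) if d < 0)
    return min(positive, negative)


# Hypothetical correct and incorrect rates (responses per minute) for four rats.
correct = [18.0, 15.5, 20.0, 12.0]
incorrect = [4.0, 6.5, 3.0, 5.0]
print(wilcoxon_t(correct, incorrect))
```

Because every hypothetical rat responds more on the correct manipulandum, the negative rank sum is empty and the statistic is 0, the same pattern as the T(16) = 0 results reported for the correct versus incorrect comparison.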

    The assignment of which outcome was to be devalued was balanced with respect to the various stimulus, response, and outcome combinations. In addition, terminal biconditional discrimination performance was matched so that there were no significant differences in responding as a function of whether or not the outcome was to be devalued. The mean correct, incorrect, and ITI response rates, respectively, were 16.1, 4.0, and 3.1 responses per minute for the to-be-devalued outcome and 17.6, 4.4, and 2.9 responses per minute for the nondevalued outcome.

    Outcome Devaluation

    Outcome devaluation was accomplished smoothly. Over the course of the conditioning cycles, the subjects began to reject the outcome that was paired with the toxin but continued to consume the nondevalued outcome. By the end of this phase, no animal consumed the devalued sucrose and all but three animals showed complete rejection of the devalued pellets. All subjects consumed the valued outcome. In the consumption test administered after the second test session, all animals rejected the devalued outcome.

    Testing

    The results of primary interest from the first extinction tests with the discriminative stimuli and their original responses are shown in Figure 6. Responding trained with the devalued outcome is shown on the left of Figure 6, and responding trained with the valued outcome

    Figure 6. Experiment 2: Mean rates of responding on the biconditional discriminations in the first extinction test after the training outcome had been devalued (left panel) or not (right panel). In each panel, performance is shown separately for correct responses (filled circles) and incorrect responses (open circles) during the discriminative stimuli and during the ITI (open triangles) when no stimuli were delivered.


    Figure 7. Experiment 2: Mean rates of responding on the biconditional discriminations in the second extinction test after the training outcome had been devalued (left panel) or not (right panel). Prior to this test, responses had been given VI training with the valued outcome. In each panel, performance is shown separately for correct responses (filled circles) and incorrect responses (open circles) during the discriminative stimuli and during the ITI (open triangles) when no stimuli were delivered.

    is shown on the right of Figure 6. In each case, performance is shown separately for the correct response, the incorrect response, and during the ITI. It is clear that devaluation of the outcome produced a substantial decrease in performance of the correct response trained with that outcome. The mean rate of correct responses was significantly lower for the devalued outcome than for the nondevalued outcome [T(16) = 19.5, p < .01]. Neither comparison of incorrect responses nor comparison of ITI responses revealed any significant effect of the outcome value.

    Biconditional discrimination performance continued to show sensitivity to the value of the current outcome even after performance of the devalued response had been elevated by training with the valued outcome. The results of that test are shown in Figure 7. There were no significant differences in either overall ITI rates or overall incorrect response rates as a function of the value of the outcome. However, in the test session, the mean rate of correct responses that had earned the outcome whose value had been reduced was significantly lower than the rate of correct responses trained with the currently valued outcome [T(16) = 21.5, p < .05]. These results confirm those of Experiment 1B and show that responses trained in a biconditional discrimination are affected by changes in the value of their instrumental outcomes. Such sensitivity of instrumental performance to the current value of the outcome is clearly not consonant with a classical S-R analysis of biconditional discrimination learning.

    EXPERIMENT 3

    On the basis of the results of the present experiments, it is clear that rats have knowledge about the identity of the reward used in a biconditional discrimination and that they use this knowledge to guide performance of the correct response. Furthermore, the finding in Experiment 1B that subsequent training of the biconditional cue as an S- did not selectively undermine performance of the correct response renders implausible an explanation for performance of that response in terms of the product of R-O and S-O binary associations. The purpose of Experiment 3 was to assess the generality of this finding by combining the design of Experiment 2 with the S- extinction manipulation of Experiment 1B.

    The basic design of Experiment 3 is outlined in Figure 8. The subjects were trained concurrently on two independent biconditional discriminations. As in Experiment 2, each discrimination employed a different pair of responses and a unique rewarding outcome. Following acquisition of both discriminations, the four biconditional cues were established as signals for the nonreinforcement of a different response, displacement of a joystick. In the absence of the stimuli, joystick responses earned food pellets for half of the animals and sucrose liquid for the remaining animals. Thus, the biconditional cues for one task signaled the omission of an outcome that was the same as that used in original biconditional discrimination training, whereas the cues for the other task signaled the omission of a different outcome. Each pair of biconditional cues was then tested with its original pair of responses. It was anticipated that if the biconditional discriminations had been solved using binary S-O and R-O associations, then S- training would produce an outcome-dependent disruption of correct responses in those discriminations.

    Method

    Subjects and Apparatus

    Sixteen experimentally naive Holtzman-derived Sprague-Dawley rats served as subjects. They were housed and maintained under the same conditions as subjects in Experiment 1A. The apparatus used was the same as that employed in Experiment 1A with one additional response, a joystick manipulandum (Model 80111, Lafayette Instrument). This manipulandum consisted of a steel pole, measuring approximately 12.1 cm in length and 0.6 cm in diameter, inserted through a hole in the ceiling of the operant chamber. The end of the pole was approximately 6 cm above the grid floor and 7 cm from the center of the end wall opposite the food magazine. A response was recorded whenever the joystick was displaced from its centered position. Access to the joystick was prevented by removing the joystick from the operant chamber.

    Procedure

    Magazine and response training. The procedures for magazine training and initial training of lever pressing, chain pulling, nose poking, and handle pulling were identical to those described for Experiment 2.

    Biconditional discrimination training. As in Experiment 2, the subjects were trained on two biconditional discrimination tasks, one using the visual cues and one using the auditory cues. Within each task, each stimulus uniquely signaled which one of two concurrently available responses would be reinforced and which one would not. Details about the counterbalancing of various combinations of stimuli, responses, and outcomes across animals were identical to those of Experiment 2. In addition, the same


    procedure used for training the subjects in Experiment 2 was used to establish discriminative performance in this experiment.

    S- training. In this phase, the four biconditional discriminative stimuli were trained as signals for the nonreinforcement of a joystick response that otherwise earned pellets for half of the animals and sucrose for the other animals. To this end, all subjects were initially trained to displace a joystick for either food pellets or sucrose liquid. In the first session, each joystick response earned a reward. Responding was then reinforced on a VI 30-sec schedule for five 20-min sessions.

    Following this training, the subjects received two types of training sessions distributed randomly across days. These sessions differed only in whether the auditory or the visual stimuli were presented. In each session, there were 16 30-sec presentations of each of the two biconditional cues affiliated with the same task. The order of trial presentations within a session was randomized in blocks of eight trials, and the mean ITI was 30 sec. Joystick responding was reinforced with the outcome used for its initial training on a VI 30-sec schedule, except during stimulus presentations, when no outcomes were available. The particular combinations of cue and outcome identity were counterbalanced across animals. The subjects were given 10 sessions of training with the visual cues and 10 sessions with the auditory cues. The design of this phase ensured that for each subject the two cues for one biconditional task signaled the omission of the same outcome used for their original training and that the two cues for the other task signaled the omission of a different outcome. That arrangement makes the design maximally sensitive to any decremental effect that S- training with the same outcome might have by taking advantage of the fact that generalization between stimuli in the same sensory modality is likely to be greater than that between stimuli from different modalities.

    Testing for control of original responses. Each pair of biconditional discriminative stimuli was tested with its original pair of responses. In each test session, there were eight 30-sec presentations of each of the two stimuli with an ITI of 30 sec. The trial sequence followed an ABBABAAB schedule. For all subjects, the lever and chain were available in the first test session, and the nose poke and handle pull were available in the second test session. During testing, responding was never reinforced.
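The effect of S- training is summarized in this paper by a suppression ratio: the rate of responding during the stimulus divided by the sum of that rate and the ITI rate. A minimal Python sketch of that calculation follows. The function names and the example rates are hypothetical and are not taken from the study's data, and averaging the per-subject ratios into a group value is an assumption about how the reported ratios were obtained.

```python
def suppression_ratio(stimulus_rate, iti_rate):
    """Rate during the stimulus divided by the sum of that rate and the ITI rate.

    0.5 means equal responding in both periods; values below 0.5 mean
    responding is suppressed during the stimulus.
    """
    total = stimulus_rate + iti_rate
    if total == 0:
        raise ValueError("ratio is undefined when no responses occurred")
    return stimulus_rate / total


def mean_suppression_ratio(rate_pairs):
    """Average one ratio per subject (assumed way of forming a group value)."""
    ratios = [suppression_ratio(stim, iti) for stim, iti in rate_pairs]
    return sum(ratios) / len(ratios)


# Hypothetical rates (responses per minute) for three rats: (stimulus, ITI).
example = [(4.0, 8.0), (3.0, 9.0), (5.0, 5.0)]
print(round(mean_suppression_ratio(example), 3))
```

A ratio computed per subject and then averaged need not equal the ratio of the group mean rates, which is worth keeping in mind when comparing reported ratios with reported mean response rates.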

    Results and Discussion

    Biconditional Discrimination Training

    Acquisition of both biconditional discriminations proceeded smoothly, with the result that responding occurred most often during the stimulus in which it was reinforced. Analysis of performance in the final session of discrimination training collapsed across stimulus, response, and outcome identity revealed that performance of the correct response (16.7 responses per minute) was significantly higher than performance of the incorrect response (5.3 responses per minute) during the biconditional stimuli [T(16) = 0, p < .01]. Furthermore, relative to the ITI rate (2.6 responses per minute), the biconditional cues significantly elevated performance of both the correct response [T(16) = 0, p < .01] and the incorrect response [T(16) = , p < .01].

    The selection of pellets or sucrose as the reinforcer for joystick responding was balanced with respect to the various stimulus, response, and outcome combinations. In addition, terminal biconditional discrimination performance was matched so that there were no significant differences in responding as a function of whether or not the outcome used for S- training was the same as or different from that earned in the presence of the cues. The mean correct, incorrect, and ITI response rates, respectively, were 17.4, 6.0, and 3.1 responses per minute for the same-outcome condition and 16.1, 4.6, and 2.1 responses per minute for the different-outcome condition. There were no significant differences in the rates of correct, incorrect, or ITI responses between the same- and different-outcome conditions.

    S- Training

    Over the course of S- training, the level of responding during the ITI increased, whereas responding during the stimuli declined gradually. During the first session of S- training, the mean rates of responding during the same- and different-stimulus conditions and the ITI were 5.1, 5.3, and 4.5 responses per minute, respectively. None of these differences was significant. Analysis of the terminal S- session revealed that responding during the ITI (8.3 responses per minute) was significantly higher than responding during either the same stimulus (4.0 responses per minute) or the different stimulus, 3.2

    Basic Design of Experiment 3

    Training:     A1: R1-O1, R2-
                  A2: R2-O1, R1-
                  V1: R3-O2, R4-
                  V2: R4-O2, R3-
    S- Training:  R5-O1, A1: R5-, A2: R5-
                  R5-O1, V1: R5-, V2: R5-
                  R5-O2, A1: R5-, A2: R5-
                  R5-O2, V1: R5-, V2: R5-
    Testing:      A1: R1 v R2
                  A2: R1 v R2
                  V1: R3 v R4
                  V2: R3 v R4

    Figure 8. Basic design of S- training and the test of its effects on original biconditional discriminative performance (Experiment 3). A1 and A2 denote auditory discriminative cues (noise and tone), and V1 and V2 denote visual discriminative cues (steady light and flashing light). R1, R2, R3, R4, and R5 are instrumental responses (nose poke, handle pull, lever press, chain pull, and joystick). O1 and O2 are reinforcers (food pellets and liquid sucrose).


    responses per minute [Ts(16), ps ≤ .01]. Although the mean rate of responding during the same stimulus was slightly higher than that during the different stimulus, this difference was not significant [T(16) = 50.5, p > .10].

    Suppression ratios for the first and last sessions of training were calculated for each subject by dividing the rate of responding during the stimulus by the sum of that rate and the ITI rate. S- training resulted in a significant increase in the suppression of responding during stimulus presentations relative to the ITI between the first and last training sessions (.52 and .30, respectively) [T(16) = 0, p < .01].

    Testing for Control of Original Responses

    Of most interest are the results of the extinction test with the discriminative stimuli and their original responses. These data are summarized in Figure 9. Responding is shown in the presence of the biconditional stimuli trained as signals for the omission of either the same outcome (top panel) or a different outcome (lower panel). Within each panel, responding is shown separately for the correct response, the incorrect response, and during the ITI. Inspection of Figure 9 reveals that the discriminative stimuli continued to exhibit differential control over their original responses regardless of which outcome had been used for S- training. In the presence of stimuli trained as S-s with the same outcome, performance of the correct response was significantly elevated relative to both the incorrect response [T(16) = , p < .01] and the ITI [T(16) = 0, p < .01]. Similar results were obtained when the stimuli had been trained as S-s with a different outcome. Performance of the correct response was significantly elevated relative to the rate of incorrect responses [T(16) = 2, p < .01] and to responding during the ITI [T(16) = 0, p < .01]. For both the same- and the different-outcome conditions, the rates of incorrect responses were greater than their respective ITI rates [Ts(16) = 5, ps < .01].

    That biconditional control was not differentially affected by S- training as a function of whether the same outcome or a different outcome was used is further supported by the results of direct comparisons of correct and incorrect response rates across the different- and same-outcome conditions. There was no significant difference between performance of the correct response [T(16) = 63.5, p > .05] or between the incorrect response rates [T(16) = 41, p > .10].

    In summary, these results strengthen and extend the related findings of Experiment 1B. First, they replicate the finding that S- training with the same outcome has no adverse effect on the ability of biconditional stimuli to promote performance of their responses. Second, they show that the failure of S- training in Experiment 1B to disrupt discriminative control was not due to masking by the association between the correct response and its other valued outcome. Thus, in contrast to a simple S+, a biconditional discriminative stimulus retains the ability to promote performance of the response whose reinforcement it signals after training as a cue that another response will not be followed by that rewarding outcome.

    GENERAL DISCUSSION

    The results of the present experiments reveal threecharacteristics ofbiconditional discrimination learning.First, a biconditional discriminative stimulus providesinformation about the identity of the outcome earned inits presence. Using the transfer procedure, Experi-ment l demonstrated that a biconditional cue wouldpromote performance of a new response tra ined withthe same outcome but not one tra ined with a differentoutcome. Second, biconditional performance does notdepend upon the integrity of the information that thestimulus provides about the identity of the earned out-come. In Experiments IB and 3, there was no specificdecremental effect on performance of the original re-sponse of training a biconditional stimulus as a signalthat a different response would not be followed by thatoutcome. Third, biconditional performance is sensitiveto postlearning manipulations of the value of the out-come. Correct responding was reduced in the presenceof its discriminative stimulus when the outcome was de-valued using either motivational Experiment IB or con-ditioning Experiment 2 operations.These results have important implications for analy-ses of biconditional discrimination learning in terms ofbinary associations. With respect to classical S R the-ory, the present observations of response sensitivity tooutcome devaluation confirm the inadequacy of an ac-count of instrumental performance purely in terms ofanS R association. However, it should be noted that otheraspects of the results may appear to be consistent withthe view that S R associations may make some contribu-tion to performance. For instance, both the ineffective-ness of S- training in eliminating discriminative con-trol and the residual performance following outcomedevaluation are to be expected ifan association exists be-tween the stimulus and the correct response. 
Consequently, it is of some interest to examine the extent to which elaborations of classical S-R theory can predict the complete pattern of results obtained in the present studies.

    Two-process theories were developed in an attempt to preserve the classical S-R account of instrumental performance and to accommodate evidence that learning about the outcome occurs during instrumental training. The latter goal was accomplished by allowing a Pavlovian association to develop between the stimulus and the outcome. The function of this Pavlovian S-O association was to provide either additional stimulus support (Trapold & Overmier, 1972) or motivational support (Rescorla & Solomon, 1967) for the instrumental response. Thus, Trapold and Overmier (1972) proposed that the Pavlovian S-O association yields an outcome expectancy that becomes part of the stimulus complex associated with the response. Rescorla and Solomon (1967), on the other hand, argued that the Pavlovian S-O association simply energizes the expression of instrumental behavior. Colwill and Rescorla (1986) have commented on the ability of these two-process accounts to handle the basic demonstrations that responses are sensitive to the value of the outcomes used to train them. However, both accounts have difficulty explaining the pattern of outcome-devaluation effects obtained in the present experiments.

    Figure 9. Mean rates of responding on the biconditional discriminations following S- training in Experiment 3. The top panel displays performance on the discrimination whose cues were trained with the same outcome used for S- training; the bottom panel displays the comparable data when S- training employed a different outcome. In each panel, correct (filled circles) and incorrect (open circles) responses are plotted separately. Responding is also shown during the ITI (open triangles) when no stimuli were presented.

    The predictions of the two-process theory developed by Trapold and Overmier (1972) regarding the effects of outcome devaluation in Experiments 1 and 2 turn out to be indistinguishable from those made by classical S-R theory. In those experiments, each biconditional discrimination task employed only one outcome. This feature of the designs guarantees that the two discriminative stimuli within a task would each develop an association with the same outcome. Thus, according to Trapold and Overmier (1972), both cues within a task would come to evoke the same outcome expectancy. Consequently, relative to the physical features of the biconditional cues, that outcome expectancy would be uninformative about which of the two responses was to be followed by the outcome. Elsewhere, it has been reported that such incidental or redundant stimuli do not gain control over behavior (Mackintosh, 1984; Wagner, Logan, Haberlandt, & Price, 1968). Thus, in the tasks used in the present study, the contribution of the outcome expectancy generated by the Pavlovian S-O association to performance would be negligible. Therefore, there should have been no differential effect on instrumental responding of outcome devaluation in either Experiment 1B or Experiment 2.

    The pattern of devaluation effects obtained in the present experiments also poses a problem for the version of two-process theory proposed by Rescorla and Solomon (1967). According to their account, performance of both the correct and the incorrect responses should have been depressed following devaluation of the instrumental outcome. Consistent with that account is the finding that performance of the correct response was reduced in the presence of its discriminative stimulus following a decrease in the value of its outcome. However, there was no comparable effect for incorrect responding. Even after the general level of responding had been increased, the likelihood of an incorrect response in the presence of the stimulus associated with the devalued outcome was not significantly different from the likelihood of incorrect responding in the stimulus that signaled the valued outcome.
Such results provide little encouragement for the idea that the observed devaluation effects were mediated by a reduction in the motivation for responding normally provided by the Pavlovian S-O association.

    Various shortcomings of two-process theories of instrumental learning have been noted on previous occasions by proponents of the view that both responses and discriminative stimuli become associated with their instrumental outcomes (Colwill & Rescorla, 1988, 1990b; Rescorla & Colwill, 1989). The combined R-O and S-O account of instrumental learning rightly anticipates the sensitivity that correct responses showed to manipulations of the value of their outcomes in Experiments 1B and 2. However, the finding that biconditional performance was resistant to the destructive consequences that S- training has been found to exert on simple discriminative control was not anticipated by this view. A graceful reconciliation of these different effects of S- training on discriminative control does not appear possible within this particular binary framework. At best, it might be argued that the conflict in signaling whether or not the same outcome will occur that is inherent in the biconditional tasks used here prevents the development of an inhibitory S-O association during subsequent S- training. Although there is little theoretical support for this speculation, it is relatively easy to test empirically by using different outcomes to reinforce correct responses within a biconditional task. Such training should render vulnerable to the effects of subsequent S- training whichever biconditional cue had previously signaled the outcome that was used for S- training. Colwill (1994) tested this prediction and found no evidence that such training of biconditional cues reduced their immunity to the disruptive effects of S- training with the same outcome.

    Before abandoning a binary analysis of biconditional discrimination learning, it may be useful to consider a model of instrumental learning in which the contribution to performance of the various binary connections that have been described is determined by features of the training situation. It seems quite reasonable that simple S+ training encourages the development of predominantly S-O and R-O associations. However, in biconditional training, the concurrent signaling of the occurrence of O for one response and its omission for another response may selectively undermine development of an S-O association. Under these circumstances, the subject may come to rely on an S-R association to retrieve a representation of the correct response. Performance of that response would then be determined by evaluation of the consequences of that response activated by the R-O association. This combination of S-R and R-O associations would predict the devaluation data of Experiment 2. Such a view would also anticipate the preservation of discriminative control following S- training in Experiments 1B and 3. Unfortunately, this view has difficulty explaining the remaining results. First, a selective effect of outcome devaluation should not have been found in Experiment 1B. Although the S-R association would lead to retrieval of the correct response for that stimulus, the R-O association would result in activation of both the valued and the devalued outcomes associated with that correct response. Consequently, there should not have been differential performance of that response. For the same reason, this view would not have predicted the transfer effect found in Experiment 1A. Because both of the outcomes associated with the response would have been activated as a consequence of the S-R and R-O connections, there should have been no opportunity for selective control over the transfer responses. An additional point worth noting about this argument is that Colwill (1994) has found no evidence that transfer can in fact be mediated by the combination of S-R and R-O associations. An explanation of instrumental biconditional discrimination learning in terms of binary connections does not seem very promising.

    Several authors have cautioned against analyses of instrumental learning in terms of binary associations between various pairs of elements in an instrumental task (e.g., Colwill & Rescorla, 1986; Mackintosh & Dickinson, 1979; Skinner, 1938). They have argued instead for a single associative structure incorporating all three terms of the task.
One particularly attractive version of this point of view is that a discriminative stimulus becomes associated with the particular R-O relations arranged in its presence, S→(R-O) and S⊣(R-O). Support for this hierarchical view of instrumental learning has come from studies employing the classic switching design in which the identity of the discriminative stimulus disambiguates which responses lead to which outcomes. Specifically, two responses are reinforced with different outcomes in the presence of two different stimuli. Each stimulus, however, signals unique response-outcome combinations. Thus, one stimulus (S1) signals that one response (R1) leads to one outcome (O1) and that the other response (R2) leads to a different outcome (O2); but the other stimulus (S2) signals the opposite combinations of responses and outcomes (i.e., R1 leads to O2 and R2 leads to O1). Because each outcome follows both responses and occurs in the presence of both stimuli, the binary associations are uninformative about which outcome follows a particular response in a given stimulus.

    Recent work has documented that in this type of discrimination task rats learn about the higher order relations (Colwill & Rescorla, 1990b; Rescorla, 1990). For example, Colwill and Rescorla (1990b) trained rats on a switching task and then made one of the outcomes unattractive. In a subsequent extinction test with the stimuli and responses, the subjects showed a preference within each stimulus for the response that had previously earned the currently attractive outcome in that stimulus. The observation that the prevailing response preference was determined by the identity of the discriminative stimulus strongly suggests that the rats had encoded the hierarchical relations.

    The hierarchical model is able to provide a satisfactory explanation for the three major findings obtained in the present experiments. First, the hierarchical model attributes transfer effects to differential generalization across R-O relations.
Novel combinations of a discriminative stimulus and an R-O relation are treated as more similar to the original training condition when both responses share an association with the same outcome. Second, the immunity of biconditional performance to the potential decremental effects of establishing the discriminative cues as S-s for other responses is attributed to changes in the generalization gradients produced by original training. In the present experiments, each biconditional cue was simultaneously established as a signal for the reinforcement of one response and the nonreinforcement of a different response. This discrimination training may have sharpened the generalization gradients such that additional training of the cue as a signal for the nonreinforcement of yet another response would have little impact on the ability of that cue to elicit its original R-O relation (see Mackintosh, 1974). Third, the hierarchical model predicts the pattern of outcome devaluation effects found in Experiments 1B and 2. Because the response-outcome relation is associated with a specific stimulus, performance of the correct response would be susceptible to the influence of a shift in the value of the outcome. The incompleteness of the devaluation effect is not necessarily in conflict with a hierarchical account of instrumental learning. Colwill and Rescorla (1990a) have offered several suggestions for the residual behavior observed following outcome devaluation that are quite compatible with the general thesis that responses are associated with their outcomes.

    It would be somewhat remiss not to acknowledge quite a different conceptualization of the three-term contingency that continues to be invoked to account for discrimination learning (e.g., Preston, Dickinson, & Mackintosh, 1986; Wilson & Pearce, 1989). This alternative assumes the learning of associations between configurations of particular stimulus and response combinations and their consequences, SR→O and SR⊣O. Colwill and Rescorla (1990b) have discussed application of the configural cue account to the results of outcome devaluation on discriminative control in a switching design. The present findings that outcome devaluation affects biconditional performance pose no problem for this view. Furthermore, the failure of the outcome to mediate a selective impairment of S- training on biconditional control is also consistent with the predictions of the configural account. Any generalization of extinction from the SR configuration in S- training to the SR configurations in the biconditional discriminations should be equivalent regardless of which outcome is used in S- training. However, Colwill's (1993b) original observation that S- training with the same outcome selectively disrupts the ability of an S+ to control its response is not predicted by the configural cue model. Even more damaging for this position are data showing that discriminative control established in a switching design may be selectively affected by manipulations of its R-O relation in another stimulus context (Rescorla, 1990). In both cases, the configural cue account does not anticipate differential generalization between stimulus-response configurations as a function of outcome identity.
It does not seem very plausible that subjects would employ only a configural cue solution to the biconditional discriminations used in the present experiments.

    In this discussion, it has been assumed that a biconditional cue employs a symmetrical structure for encoding information about the consequences of both its reinforced (correct) response and its nonreinforced (incorrect) response. However, evidence for hierarchical learning has come exclusively from situations in which stimuli signal that responses will be followed by rewarding outcomes. Thus, whereas empirical confirmation exists for learning of S→(R-O) relations, there is no comparable support for S⊣(R-O) associations. In fact, analyses of signals for the nonreinforcement of instrumental responses seem to suggest that such stimuli develop direct inhibitory connections with those responses (Bonardi, 1989; Colwill, 1991). Given the present interpretation of the various effects of S- training on previously established discriminative control, it is crucial that future studies inspect further the mechanisms by which biconditional cues control their incorrect responses.

    In summary, the present analysis has favored the opinion that biconditional discriminations using a single outcome are solved with hierarchical associations. This conclusion is of special interest because various binary solutions to the problem were also feasible. Thus, the present data make clear that hierarchical solutions are not deployed only under conditions in which binary solutions fail to capture the accuracy of the instrumental contingencies that are in effect. Rather, it seems that at the very least hierarchical learning occurs when discriminative cues disambiguate the multiple consequences associated with an instrumental response. Whether hierarchical associations mediate all types of instrumental discriminations or whether multiple codes are employed depending upon the nature of the instrumental task is an issue that remains to be settled.
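
    As an illustrative aside, the logic of the switching design discussed above can be sketched in a few lines of Python. This is only a toy enumeration under stated assumptions, not a model fitted to any data: the labels S1, S2, R1, R2, O1, and O2 follow the text, whereas the names CONTINGENCIES, binary_predictions, and preferred_response are hypothetical and introduced here purely for illustration. The sketch shows that pooling outcomes by stimulus alone or by response alone yields both outcomes in every case, whereas a lookup keyed on the stimulus-response combination recovers a unique outcome.

```python
from collections import defaultdict

# Switching design as described in the text: each stimulus signals a unique
# response-outcome pairing, and the pairings are reversed across stimuli.
# (The dictionary name is a hypothetical label for this sketch.)
CONTINGENCIES = {
    ("S1", "R1"): "O1",
    ("S1", "R2"): "O2",
    ("S2", "R1"): "O2",
    ("S2", "R2"): "O1",
}


def binary_predictions(contingencies):
    """Pool outcomes by stimulus alone (S-O) and by response alone (R-O)."""
    by_stimulus = defaultdict(set)
    by_response = defaultdict(set)
    for (stimulus, response), outcome in contingencies.items():
        by_stimulus[stimulus].add(outcome)
        by_response[response].add(outcome)
    return dict(by_stimulus), dict(by_response)


def preferred_response(contingencies, stimulus, devalued):
    """Responses in `stimulus` whose outcome is still valued.

    This is a lookup on the full stimulus-response-outcome triple, i.e. a
    toy stand-in for the hierarchical account, not a learning model.
    """
    return sorted(
        response
        for (stim, response), outcome in contingencies.items()
        if stim == stimulus and outcome != devalued
    )


if __name__ == "__main__":
    s_o, r_o = binary_predictions(CONTINGENCIES)
    print("S-O:", s_o)  # every stimulus is paired with both outcomes
    print("R-O:", r_o)  # every response is paired with both outcomes
    for stim in ("S1", "S2"):
        print(stim, preferred_response(CONTINGENCIES, stim, devalued="O1"))
```

    With O1 treated as the devalued outcome, the lookup returns R2 in S1 and R1 in S2, which corresponds to the stimulus-dependent response preference described for the switching task.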

    REFERENCES

    BONARDI, C. (1989). Inhibitory discriminative control is specific to both the response and the reinforcer. Quarterly Journal of Experimental Psychology, 41B, 225-242.
    COLWILL, R. M. (1991). Negative discriminative stimuli provide information about the identity of omitted response-contingent outcomes. Animal Learning & Behavior, 19, 326-336.
    COLWILL, R. M. (1993a). An associative analysis of instrumental learning. Current Directions in Psychological Science, 2, 111-116.
    COLWILL, R. M. (1993b). Signaling the omission of a response-contingent outcome reduces discriminative control. Animal Learning & Behavior, 21, 337-345.
    COLWILL, R. M. (1994). Associative representations of instrumental contingencies. In D. Medin (Ed.), The psychology of learning and motivation (Vol. 31, pp. 1-72). New York: Academic Press.
    COLWILL, R. M., & RESCORLA, R. A. (1986). Associative structures in instrumental learning. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 20, pp. 55-104). New York: Academic Press.
    COLWILL, R. M., & RESCORLA, R. A. (1988). Associations between the discriminative stimulus and the reinforcer in instrumental learning. Journal of Experimental Psychology: Animal Behavior Processes, 14, 155-164.
    COLWILL, R. M., & RESCORLA, R. A. (1990a). Effect of reinforcer devaluation on discriminative control of instrumental behavior. Journal of Experimental Psychology: Animal Behavior Processes, 16, 40-47.
    COLWILL, R. M., & RESCORLA, R. A. (1990b). Evidence for the hierarchical structure of instrumental learning. Animal Learning & Behavior, 18, 71-82.
    HULL, C. L. (1943). Principles of behavior. New York: Appleton-Century-Crofts.
    MACKINTOSH, N. J. (1974). The psychology of animal learning. London: Academic Press.
    MACKINTOSH, N. J. (1984). Conditioning and associative learning. Oxford: Oxford University Press.
    MACKINTOSH, N. J., & DICKINSON, A. (1979). Instrumental (Type II) conditioning. In A. Dickinson & R. A. Boakes (Eds.), Mechanisms of learning and motivation (pp. 143-167). Hillsdale, NJ: Erlbaum.
    PRESTON, G. C., DICKINSON, A., & MACKINTOSH, N. J. (1986). Contextual conditional discriminations. Quarterly Journal of Experimental Psychology, 38B, 217-237.
    RESCORLA, R. A. (1990). Evidence for an association between the discriminative stimulus and the response-outcome association in instrumental learning. Journal of Experimental Psychology: Animal Behavior Processes, 16, 326-334.
    RESCORLA, R. A., & COLWILL, R. M. (1989). Associations with anticipated and obtained outcomes in instrumental learning. Animal Learning & Behavior, 17, 291-303.
    RESCORLA, R. A., & SOLOMON, R. L. (1967). Two-process learning theory: Relationships between Pavlovian conditioning and instrumental learning. Psychological Review, 74, 151-182.


    RICHESON, J. A., & COLWILL, R. M. (1994, April). Mechanisms of response suppression by negative discriminative stimuli. Poster presented at the 65th Annual Meeting of the Eastern Psychological Association in Providence, RI.
    SKINNER, B. F. (1938). The behavior of organisms. New York: Appleton-Century-Crofts.
    TRAPOLD, M. A. (1970). Are expectancies based upon different positive reinforcing events discriminably different? Learning & Motivation, 1, 129-140.
    TRAPOLD, M. A., & OVERMIER, J. B. (1972). The second learning process in instrumental learning. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 427-452). New York: Appleton-Century-Crofts.
    WAGNER, A. R., LOGAN, F. A., HABERLANDT, K., & PRICE, T. (1968). Stimulus selection in animal discrimination learning. Journal of Experimental Psychology, 76, 171-180.
    WILSON, P. N., & PEARCE, J. M. (1989). A role for stimulus generalization in conditional discrimination learning. Quarterly Journal of Experimental Psychology, 41B, 243-273.

    (Manuscript received May 17, 1993; revision accepted for publication June 14, 1994.)