david-a-townsend
Operant Conditioning:
Causal Factors and Explanations
Contingency and Contiguity
Conditioning and Learning
Conditioning is the process by which an activity originates or is changed through reacting to an encountered situation, provided that the change in activity cannot be explained on the basis of native tendencies, maturation, or temporary states. Hilgard (1956)
Conditioning is the learning of relations among events so as to allow the organism to represent its environment. Rescorla (1988)
S-R
S-S
WHAT PRODUCES CONDITIONING: CONTIGUITY OR CONTINGENCY?
The power of reinforcement to shape and sustain operant behavior is pervasive.
So, too, is its potential as a tool in a wide range of practical applications.
But what is it about the response-reinforcer relation that produces conditioning?
Operant Conditioning
S+ R O*
WHAT PRODUCES CONDITIONING: CONTIGUITY OR CONTINGENCY?
Two answers have been offered:
Contiguity – Temporal proximity between response and reinforcer
Contingency – Probabilistic relation between reinforcement and responding versus not responding
Contiguity is a temporal relationship: temporal contiguity refers to the delivery of the reinforcer immediately after the response.
Contingency is a causal relationship: response-reinforcer contingency refers to the extent to which the response is necessary and sufficient for the occurrence of the reinforcer.
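The contingency definition above can be made concrete with a small sketch. It assumes the common index of contingency, the difference between the probability of the reinforcer given a response and given no response; the slides do not name this index, and the function names `delta_p` and `describe` are hypothetical.

```python
def delta_p(p_reinf_given_response: float, p_reinf_given_no_response: float) -> float:
    """Contingency index (an assumption, not named on the slides):
    P(reinforcer | response) - P(reinforcer | no response)."""
    for p in (p_reinf_given_response, p_reinf_given_no_response):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return p_reinf_given_response - p_reinf_given_no_response


def describe(dp: float, tol: float = 1e-9) -> str:
    """Label the sign of the contingency."""
    if dp > tol:
        return "positive"
    if dp < -tol:
        return "negative"
    return "zero"


# Response is necessary and sufficient for the reinforcer: perfect contingency.
print(describe(delta_p(1.0, 0.0)))
# Reinforcer is equally likely with or without the response: zero contingency.
print(describe(delta_p(0.5, 0.5)))
```

Contiguity is not captured by this index at all: it concerns only the delay between a response and the reinforcer, which is why the two accounts can be pulled apart experimentally.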
Evidence for Contiguity
Generally, as delay between response and reinforcer increases, rate of operant responding decreases.
Temporal contiguity is thus necessary for conditioning to occur.
But, is temporal contiguity also sufficient?
Can’t tell because most studies require responses to produce reinforcers.
Evidence for Contiguity
Skinner’s 1948 superstition project:
Studied 8 hungry pigeons.
Food was given every 15 sec regardless of the pigeon’s behavior.
6/8 pigeons performed idiosyncratic patterns of unnecessary behavior.
Responding rose as time to food approached.
Why did all of this happen?
Evidence for Contiguity
Food happened to follow something each pigeon was doing.
Different behaviors were strengthened for different pigeons.
The higher the rate of a response, the more likely food would again follow that response.
Responding rose as time to food neared because the probability of a response-food pairing rose the longer the time since the last food.
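The accidental-strengthening account above can be sketched as a toy simulation. This is an illustration under stated assumptions, not Skinner's procedure or data: the bird emits one of several arbitrary behaviors each second, food arrives every 15 s regardless of behavior, and whichever behavior happens to precede food gets a fixed boost. The function name, the number of behaviors, and the size of the boost are all hypothetical.

```python
import random


def simulate_superstition(n_behaviors=5, seconds=3000, food_interval=15,
                          boost=1.0, seed=0):
    """Return each behavior's final share of responding.

    Assumption: a behavior is emitted with probability proportional to its
    weight, and the behavior occurring when food arrives gains `boost`.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_behaviors
    for t in range(1, seconds + 1):
        behavior = rng.choices(range(n_behaviors), weights=weights)[0]
        if t % food_interval == 0:       # food is response-independent
            weights[behavior] += boost   # but it still follows *something*
    total = sum(weights)
    return [w / total for w in weights]


for bird_seed in range(3):
    shares = simulate_superstition(seed=bird_seed)
    print([round(s, 2) for s in shares])
```

Running several seeds stands in for several birds: because a strengthened behavior is more likely to be occurring at the next food delivery, the shares tend to drift apart, and which behavior gains can differ from seed to seed.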
Evidence for Contiguity
Skinner thus concluded that the necessary and sufficient condition for operant conditioning was that a reinforcer closely follow a response.
Why is response-reinforcer contingency so effective?
It guarantees response-reinforcer contiguity.
Evidence for Contiguity
Skinner’s results and conclusions have been questioned.
His results may be difficult to replicate.
His conclusions may not be general.
But, beyond the superstition experiment, there may be good evidence to support the importance of response-reinforcer contiguity.
What leads to conditioning?
Contiguity – Stimuli that are close to one another in time and in space become associated
Co-occurrence – Proximity critical
Contingency – When one stimulus depends on the other, they will become associated
Information – Predictive value critical
Evidence for Contiguity
Thomas (1981) study on contiguity-promoting schedules.
P(Food|Press) = P(Food|No Press)
[Figure: Thomas schedule, two 20 s trials (Trial 1, Trial 2). No response: reward (S*) in each trial. Subject responds: bar press (R) and reward (S*) in each trial. Timeline: presses (p) and food deliveries (f) across four trials, with four food deliveries.]
Evidence for Contiguity
Thomas (1981) study on contiguity-promoting schedules.
P(Food|Press) = P(Food|No Press)
No press-food contingency.
But response-food contiguity was promoted by the novel schedule.
So too was lever pressing: rats increased lever pressing.
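A minimal sketch of how a schedule can promote contiguity without any contingency. It assumes, as a reading of the slide diagram rather than a confirmed detail of Thomas (1981), that food arrives at the end of every 20 s trial when no press occurs and arrives immediately after a press when one occurs. The names `thomas_trial` and `p_food` are hypothetical.

```python
TRIAL_SECONDS = 20  # trial length given on the slide


def thomas_trial(press_time=None):
    """One trial under the assumed rules.

    press_time: seconds into the trial at which the rat presses, or None.
    """
    if press_time is None:
        # No press: food still arrives, at the end of the trial.
        return {"food_time": TRIAL_SECONDS, "press_to_food_delay": None}
    if not 0 <= press_time <= TRIAL_SECONDS:
        raise ValueError("press_time must fall inside the trial")
    # Press: food is moved forward so that it follows the press at once.
    return {"food_time": press_time, "press_to_food_delay": 0}


def p_food(trials):
    """Proportion of trials in which food was delivered."""
    return sum(t["food_time"] is not None for t in trials) / len(trials)


press_trials = [thomas_trial(t) for t in (3, 8, 15)]
no_press_trials = [thomas_trial(None) for _ in range(3)]
print(p_food(press_trials), p_food(no_press_trials))  # both 1.0
```

Under these assumptions pressing changes when food arrives, never whether it arrives, so any increase in pressing has to be credited to contiguity.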
Evidence for Contiguity
Extra wrinkle of Thomas (1981) study:
P(Food|Press) < P(Food|No Press)
Thus, negative press-food contingency.
[Figure: Second Thomas schedule, 20 s trials (Trials 1 to 4). No response: reward (S*) in each trial. Subject responds: bar press (R) is rewarded (S*), but a rewarded response causes the next 20 s trial to be unrewarded (no S*). Timeline: presses (p) and food deliveries (f) across four trials, with three food deliveries.]
Evidence for Contiguity
Extra wrinkle of Thomas (1981) study:
P(Food|Press) < P(Food|No Press)
Thus, negative press-food contingency.
Response-food contiguity was still promoted by the second schedule.
So too was lever pressing: rats increased lever pressing.
The power of contiguity is very strong; it can even override the effects of contingency.
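The negative contingency in the second schedule can be sketched in a few lines. The rules below are assumptions drawn from the slide note that a rewarded response causes the next 20 s trial to be unrewarded: an eligible trial always delivers food (immediately after a press, otherwise at the end of the trial), and a press on an eligible trial cancels food on the following trial. The function name is hypothetical, and the treatment of presses during a cancelled trial (they earn and cancel nothing) is a guess.

```python
def second_thomas_schedule(presses):
    """presses: one bool per 20 s trial (True = the rat pressed).

    Returns one bool per trial: was food delivered on that trial?
    """
    food = []
    cancel_next = False
    for pressed in presses:
        if cancel_next:
            food.append(False)     # food withheld because of the earlier press
            cancel_next = False
        else:
            food.append(True)      # eligible trial: food is always delivered
            cancel_next = pressed  # a rewarded press costs the next trial's food
    return food


never_press = second_thomas_schedule([False] * 4)
always_press = second_thomas_schedule([True] * 4)
print(sum(never_press), sum(always_press))  # 4 versus 2 food deliveries
```

Pressing lowers the total amount of food earned, yet each rewarded press is still followed immediately by food, which is the contrast the slide draws between contingency and contiguity.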
Operant Conditioning: Contiguity
S+ R O*
O* stamps in S+-R relationship
Pavlov
Hull
Skinner
CONTINGENCY LEARNING
Attempts to assess contingency learning in operant conditioning parallel studies in Pavlovian conditioning.
Operant studies suggest that organisms can distinguish dependence from independence between response and reinforcer.
Cause and effect
CONTINGENCY LEARNING
Figure 8.4: Contiguity without Contingency

General layout:

              S* 2    no S* 2
S+ 1           a        b
No S+ 1        c        d

A quick test for contingency:
a·d > c·b: positive contingency
a·d = c·b: zero contingency
a·d < c·b: negative contingency

Example: bird and plane are paired

              bird    no bird   total
airplane       10       20        30
no plane       20       40        60

prob.(bird | plane) = 10/30 = .33
prob.(bird | no plane) = 20/60 = .33
You can have a positive contingency even when pairing is the least frequent possibility.
Example: can you learn that seeing a cat and hearing “cat” are associated?

              hear “cat”   no “cat”    total
see cat          100          900       1,000
no cat           200        9,800      10,000

p(“cat” | see cat) = 100/1,000 = .10
p(“cat” | no cat) = 200/10,000 = .02
Positive contingency
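The quick test and the two conditional probabilities can be checked against both tables with a short sketch. The cell labels a, b, c, d follow the slide's 2×2 layout; the function names are hypothetical.

```python
def quick_contingency_test(a, b, c, d):
    """Compare a*d with c*b, as in the slide's quick test.

    a: both events occur      b: first event only
    c: second event only      d: neither event
    """
    left, right = a * d, c * b
    if left > right:
        return "positive"
    if left < right:
        return "negative"
    return "zero"


def conditional_probs(a, b, c, d):
    """Return P(second | first) and P(second | no first)."""
    return a / (a + b), c / (c + d)


# Bird and plane: pairings occur, but the contingency is zero.
print(quick_contingency_test(10, 20, 20, 40))   # zero
# Seeing a cat and hearing "cat": pairing is the rarest cell, yet positive.
print(quick_contingency_test(100, 900, 200, 9800))  # positive
print(conditional_probs(100, 900, 200, 9800))
```

For the bird and plane table, a·d = 10·40 = 400 and c·b = 20·20 = 400, which is the zero-contingency case. For the cat table, a·d = 980,000 exceeds c·b = 180,000.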
Learning: seeking cause-and-effect relationships
CONTINGENCY LEARNING
Head turn, mobile: infants given a positive contingency procedure (Watson, 1967):
– Infants’ head turning increased, plus they smiled when the mobile moved
Infants put on a zero contingency procedure:
– Infants’ head turning did not increase, plus they stopped smiling when the mobile moved
Carolyn Rovee-Collier
CONTINGENCY LEARNING
Infants discriminate response-dependent from response-independent reinforcement: shown by head turning.
Infants differentially enjoy response-dependent and response-independent reinforcers: shown by smiling and cooing.
Both cognition and affect may be changed by control by consequence.
Cause and effect
Learned Helplessness (Seligman)
Phase I - Learning to Escape
[Figure: control dogs and yoked dogs receiving shock]
•A long-lasting shock is given to both groups every once in a while
•Control dogs can turn the shock off by pushing a panel
•Yoked dogs’ shock turns off too, when the control dog pushes the panel
•Yoked dogs can do nothing themselves to escape the shock
Contiguity or Contingency?
Spot
– Periodically shocked
– Can terminate shock by pressing lever with his nose
Lassie
– Periodically shocked
– Has no control over shocks, but when Spot’s shock is terminated, so is Lassie’s
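The logic of the yoked design can be sketched as a toy simulation: both dogs receive exactly the same shock offsets, but only one dog's behavior controls them. Everything numeric here is an assumption for illustration (discrete time steps, a response probability of 0.2 per step for each dog), and the function name is hypothetical; none of it comes from Seligman's procedure.

```python
import random


def yoked_session(n_steps=20000, p_response=0.2, seed=1):
    """Return (contingency for Spot, contingency for Lassie).

    Contingency is estimated as
    P(shock ends | dog responds) - P(shock ends | dog does not respond).
    """
    rng = random.Random(seed)
    # counts[dog][responded] = [times shock ended, number of steps]
    counts = {dog: {True: [0, 0], False: [0, 0]} for dog in ("spot", "lassie")}
    for _ in range(n_steps):
        spot_resp = rng.random() < p_response
        lassie_resp = rng.random() < p_response
        shock_ends = spot_resp  # only Spot's response terminates shock, for both dogs
        for dog, resp in (("spot", spot_resp), ("lassie", lassie_resp)):
            counts[dog][resp][0] += shock_ends
            counts[dog][resp][1] += 1

    def contingency(dog):
        ended_r, n_r = counts[dog][True]
        ended_n, n_n = counts[dog][False]
        return ended_r / n_r - ended_n / n_n

    return contingency("spot"), contingency("lassie")


spot_dp, lassie_dp = yoked_session()
print(round(spot_dp, 2), round(lassie_dp, 2))
```

Spot's contingency is exactly 1 in this sketch, while Lassie's hovers near 0, even though some of Lassie's responses are followed by shock offset purely by chance. That chance pairing is contiguity without contingency.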
Phase 2 - Avoidance Learning
•Shock is delivered to one side of the box
•If the dog jumps the hurdle to the other side, there is no shock
Control dogs learn to avoid shock
Yoked dogs don’t
Yoked dogs have learned that they can’t stop the shock
They have learned to be helpless
Learned Helplessness
Yoked dog seems to have learned that its behavior does not matter:
– It not only fails to learn
– It stops reacting to shock
The phenomenon of learned helplessness strongly suggests that organisms can discriminate response-dependent from response-independent events.
Learned Helplessness
Animals must learn to jump barrier to avoid shock
Results
– Spot learns
– Lassie yelps but eventually becomes passive and accepts shocks
Contingency
– Spot learns his actions matter
– Lassie learned that it was helpless
Contiguity
– Spot learned to press lever
– Lassie learned to act passively
Seligman’s Learned Helplessness Study
Two groups of dogs are exposed to shock
– Control group could escape shock
– “No escape” group could NOT escape shock
Later, when escape was possible, “no escape” dogs didn’t even try
Learned that they had NO CONTROL
OPERANT CONDITIONING: WHAT IS LEARNED?
In any operant conditioning study, three events need to be considered:
– Response (R)
– Reinforcer or punisher it produces (O*)
– Stimulus situation in which response occurs (S)
– The three occur in an S-R-S* sequence
What associations among the three elements are formed when an animal learns to make an operant response?
OPERANT CONDITIONING: WHAT IS LEARNED?
R-O* association
Seems to require foresight: acting in accord with future consequences.
Thorndike famously denied that animals know what the consequence of their behavior will be.
Law of effect thus emphasized past consequences.
Operant Conditioning
S+ (R O*)
OPERANT CONDITIONING: WHAT IS LEARNED?
S+-R association
Thorndike’s idea
Situation evokes behavior (S-R).
Reinforcers strengthen S-R bond.
S+ becomes more likely to evoke R.
Operant Conditioning: Contiguity
S+ R O*
O* stamps in S+-R relationship
Pavlov
Hull
Skinner
OPERANT CONDITIONING: WHAT IS LEARNED?
Two-process theory:
– S-R association (operant)
– S-O* association (Pavlovian)
Sight of lever not only triggers lever pressing, but it also makes animal “think” about upcoming food.
Anticipation of reinforcer motivates operant response.
OPERANT CONDITIONING: R-S* Learning
Strongest evidence comes from studies using devaluation procedure (Colwill & Rescorla, 1985).
Chain pull → sugar water
Lever press → food pellet
Food pellet → illness (devaluation)
Choice: chain pull versus lever press
Rats pull chain much more than press lever.
R-O* association
Colwill & Rescorla (1985)

Training      Devaluation      Test
R1 → O1       O1 → LiCl        R1 and R2
R2 → O2       O2 → nothing

[Figure: mean responses per minute over time at test, plotted separately for R1 (outcome was devalued) and R2 (outcome not devalued).]
OPERANT CONDITIONING: R-O* Learning
Association of food with illness does not change stimulus aspects of the situation that might generate responses.
Lever press does not occur because it is associated with the chamber (S-R), but because it is associated with the reinforcer (R-S*).
When the value of the reinforcer is eliminated, so too is the impetus for the response.
OPERANT CONDITIONING: R-O* Learning
Operant conditioning involves learning to expect responses to produce reward.
Rats not only expect reward, but a particular kind of reward.
Devaluation procedure could not work unless rats had specifically remembered that one response produced food pellets and the other produced sugar water.
Operant Conditioning
(S+ ) R (O*)
OPERANT CONDITIONING: S-O* Learning
Rats trained to panel press.
A light or noise was always present.
S1 = sugar water and S2 = food pellets.
Lever press = sugar water and chain pull = food pellets.
S1 increased lever pressing, but not chain pulling.
S2 increased chain pulling, but not lever pressing.
S-O* Learning
OPERANT CONDITIONING: S-O* Learning
For a rat to show these selective increases in responding, it must have learned which stimulus was associated with which reward.
Therefore, this study provides evidence of S-O* associations in operant conditioning (Colwill & Rescorla, 1988).
S-O association
Colwill & Rescorla (1988)

Sd training       Response training    Test
S1: R1 → O1       R3 → O1              S1: R3 vs R4
S2: R2 → O2       R4 → O2              S2: R3 vs R4

[Figure: mean responses per minute across test trials, plotted separately for the same-outcome and different-outcome responses.]
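The prediction behind this transfer design can be written out as a small lookup. The sketch assumes only the mappings in the design table (S1 and R3 share O1; S2 and R4 share O2); the dictionary and function names are hypothetical, and the code states the S-O* prediction rather than reproducing any data.

```python
# Mappings taken from the design table.
STIMULUS_OUTCOME = {"S1": "O1", "S2": "O2"}  # learned during Sd training
RESPONSE_OUTCOME = {"R3": "O1", "R4": "O2"}  # learned during response training


def same_outcome_responses(stimulus):
    """Responses that earn the same outcome the stimulus signals.

    If animals learn S-O* associations, these are the responses the
    stimulus should selectively increase at test.
    """
    outcome = STIMULUS_OUTCOME[stimulus]
    return [r for r, o in RESPONSE_OUTCOME.items() if o == outcome]


print(same_outcome_responses("S1"))  # ['R3']
print(same_outcome_responses("S2"))  # ['R4']
```

R3 and R4 were never trained in the presence of S1 or S2, so a selective increase cannot come from a direct S-R bond; it requires the stimulus and the response to be linked through their shared outcome.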
OPERANT CONDITIONING: S-R Learning
Devaluation studies find a reduction in the response that leads to the devalued reward.
But the response is rarely eliminated.
Residual responding may represent behavior triggered by the stimulus situation in which responding was rewarded.
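One way to picture reduced but not eliminated responding is a toy additive model, offered here purely as an assumption: response rate is the sum of an S-R habit term that ignores outcome value and an R-O* term that scales with the current value of the outcome. The model form, the function name, and all numbers are hypothetical and are not taken from the devaluation studies.

```python
def response_rate(habit, r_o_strength, outcome_value):
    """Toy model (an assumption): rate = S-R habit + R-O* strength * outcome value."""
    return habit + r_o_strength * outcome_value


HABIT = 1.0         # S-R component, unaffected by devaluation
R_O_STRENGTH = 4.0  # R-O* component

before = response_rate(HABIT, R_O_STRENGTH, outcome_value=1.0)
after = response_rate(HABIT, R_O_STRENGTH, outcome_value=0.0)  # outcome devalued
print(before, after)  # 5.0 1.0
```

In this sketch devaluation removes the R-O* contribution and leaves only the habit term, which matches the slide's reading of residual responding as S-R behavior.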
OPERANT CONDITIONING: WHAT IS LEARNED?
Summary statement:
Research suggests organisms learn associations between response and reinforcer (R-O*), environmental stimuli and reinforcer (S-O*), and stimuli and response (S-R).
The “simple” process of operant conditioning is not so simple after all.