7/27/2019 PSY211 Operant Conditioning Reinforcement
Psychology of Learning
PSY211 Operant/Instrumental Conditioning:
Reinforcement
B. Charles Tatum
Operant/Instrumental Conditioning
Definition: A form of learning (conditioning) in which the organism
is free to respond to (operate on) the environment and changes in
behavior occur as the result of the stimulus consequences
(reinforcement/punishment) of the spontaneous actions.
Trial and Error (Trial and Success) Learning
Thorndike's Puzzle Box/Skinner's Operant Chamber/Tolman's Maze
Law of Effect
Reinforcement and Punishment
Reinforcement: Any stimulus event that increases the likelihood of a
preceding response.
Positive Reinforcement: The presence of a stimulus increases the
likelihood of the preceding response (e.g., food, money, praise, drugs,
electrical stimulation of pleasure centers in the brain). Sometimes
called Reward.
Negative Reinforcement: The removal of a stimulus increases the
likelihood of the preceding response (e.g., remove hand from a warm
stove, improve grades to lift restriction, work hard not to get fired).
Punishment: Any stimulus event that decreases (suppresses) the likelihood
of a preceding response.
Positive Punishment: The presence of a stimulus (usually aversive, such
as a slap, a scolding, or a dirty look) decreases (suppresses) the likelihood
of a preceding response.
Negative Punishment: The removal of a stimulus (usually
something pleasant such as TV privileges or a desirable object)
decreases (suppresses) the likelihood of a preceding response. When the
stimulus that is removed is a reinforcer, we call this extinction.
Response -> Stimulus Consequence (Onset/Offset) -> Reinforcer (Positive/Negative) or Punisher (Positive/Negative/Extinction)
Stimulus Consequence: Produce (Onset)
Future Responses Increase: Positive Reinforcement, Reward (e.g., praise)
Future Responses Decrease: Positive Punishment (e.g., spanking)
Stimulus Consequence: Remove (Offset)
Future Responses Increase: Negative Reinforcement (e.g., nagging)
Future Responses Decrease: Negative Punishment, Extinction (e.g., time out)
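The two-by-two classification above (stimulus produced or removed, future responses increased or decreased) can be sketched as a small lookup. This is only an illustrative sketch: the names `Change`, `Effect`, and `classify` are hypothetical and do not come from the slides.

```python
from enum import Enum


class Change(Enum):
    ONSET = "produce"   # the stimulus is presented after the response
    OFFSET = "remove"   # the stimulus is taken away after the response


class Effect(Enum):
    INCREASE = "increase"  # future responses become more likely
    DECREASE = "decrease"  # future responses become less likely


def classify(change: Change, effect: Effect) -> str:
    """Return the cell of the 2x2 table for a stimulus change and its effect."""
    table = {
        (Change.ONSET, Effect.INCREASE): "Positive Reinforcement",
        (Change.OFFSET, Effect.INCREASE): "Negative Reinforcement",
        (Change.ONSET, Effect.DECREASE): "Positive Punishment",
        (Change.OFFSET, Effect.DECREASE): "Negative Punishment",
    }
    return table[(change, effect)]


if __name__ == "__main__":
    # Praise is presented and the behavior increases.
    print(classify(Change.ONSET, Effect.INCREASE))
    # Nagging stops and the behavior increases.
    print(classify(Change.OFFSET, Effect.INCREASE))
```

The point of the sketch is that the label depends on two independent questions, what happened to the stimulus and what happened to the behavior, and not on whether the stimulus is pleasant.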
Reinforcement and Punishment as Reciprocal Processes
The Problem Employee
Attending Meetings + Doughnuts = Increase Attendance (Positive Reinforcement)
Playing Computer Games + No Doughnuts = Reduce Game Playing (Negative Punishment/Extinction)
The Confused Soldier
"Hey Dude" to Officer + Criticism = Reduce Verbal Salutations (Positive Punishment)
Saluting an Officer + No Criticism = Increase Saluting as Form of Address (Negative Reinforcement)
Reinforcement and Punishment as Dynamic Processes
Parental Dynamics
Child: Response = Dirty Words -> Parent: Stimulus = Spanking -> Future Result: Reduce Foul Language (Positive Punishment)
Parent: Response = Spanking -> Child: Stimulus = Nice Words -> Future Result: Increase Use of Spankings (Positive Reinforcement)
Marital Dynamics
Husband: Response = Bring Gifts -> Wife: Stimulus = Stop Sulking -> Future Result: Increased Gift Giving (Negative Reinforcement)
Wife: Response = Sulking -> Husband: Stimulus = Gifts -> Future Result: Increased Sulking (Positive Reinforcement)
Primary versus Secondary Reinforcement
Primary: Naturally or innately reinforcing stimuli (e.g., food, water, sex)
Secondary (Conditioned): Reinforcers that are dependent on their
association with other reinforcers (e.g., praise, recognition, money).
UCS (Food) -> UCR (Satisfaction)
CS (Money) -> CR (Satisfaction)
Generalized Reinforcer: Secondary reinforcers that have been paired with
a wide variety of primary reinforcers (e.g., money, praise)
Comparison Between Classical (Pavlovian) and Operant (Instrumental) Conditioning
Responses
Classical/Pavlovian: Elicited (Reflex)
Operant/Instrumental: Emitted (Spontaneous)
Stimuli
Classical/Pavlovian: Unconditioned (UCS), Conditioned (CS)
Operant/Instrumental: Unobserved (Internal), Reinforcing/Punishing, Discriminative (External)
Peripheral Nervous System
Classical/Pavlovian: Autonomic (Involuntary)
Operant/Instrumental: Somatic (Voluntary)
Examples
Classical/Pavlovian: Light - Air Puff; Tone - Knee Tap; Bell - Food Powder
Operant/Instrumental: Snap Fingers - Roll over - Treat; Deadline - Work late - Bonus; Exam - Study - Good grades
Association
Classical/Pavlovian: S-S
Operant/Instrumental: S-R-S
Acquisition, Extinction, and Spontaneous Recovery
of an Operantly/Instrumentally Conditioned Response
[Figure: curve plotted against time or trials. Reinforced time or trials: Acquisition. Non-reinforced time or trials: Extinction, with an Extinction Burst, and two Spontaneous Recovery phases, each following a Break (Interruption).]
Phases and Principles of Operant/Instrumental Conditioning
Acquisition: Gradual increase in responding when reinforcing stimulus
follows the behavior (e.g., toilet training, athletic skills, stupid pet tricks)
Successive Approximation (Shaping)
Chaining: Performing behaviors in a sequence (e.g., ordering take-out)
Forward Chaining: Train first-to-last
Backward Chaining: Train last-to-first
Superstitious Behavior
Conditions of Reinforcement
Reward Delay (delayed gratification)
Reward Contingency (predictability is good)
Reward Preference (chocolate better than raisins)
Reward Amount
Diminishing returns
Contrast effects
Frequency effects
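The forward and backward chaining orders listed above can be sketched as training stages. This sketch assumes that each stage adds one more step to the portion of the chain already trained, which is a common way of describing chaining but is not spelled out on the slide. The function names and the breakdown of "ordering take-out" into steps are hypothetical.

```python
from typing import List


def forward_chaining(steps: List[str]) -> List[List[str]]:
    """Training stages that grow from the first step toward the last."""
    return [steps[: i + 1] for i in range(len(steps))]


def backward_chaining(steps: List[str]) -> List[List[str]]:
    """Training stages that grow from the last step toward the first."""
    return [steps[i:] for i in range(len(steps) - 1, -1, -1)]


if __name__ == "__main__":
    # Hypothetical breakdown of "ordering take-out"; the steps are illustrative.
    chain = ["pick restaurant", "place order", "pay", "collect food"]
    for stage in forward_chaining(chain):
        print("forward:", stage)
    for stage in backward_chaining(chain):
        print("backward:", stage)
```

In both orders the final stage is the complete chain; only the direction in which the trained portion grows differs.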
Changes in Effectiveness of Reward Amount
[Figure: axes labeled Work Proficiency and Salary Increase, each scaled from 0 to 100.]
Phases and Principles of Operant/Instrumental Conditioning
(Continued)
Acquisition (continued)
Response Characteristics
Skeletal muscles (voluntary [somatic] nervous system) easier to
condition than smooth muscles and glands (involuntary [autonomic]
nervous system)
Simple responses easier to condition than complex responses
Motivational Level: Learning is faster and stronger when learner is
deprived of rewards (better for primary than secondary rewards)
Competing Rewards: Conditioning is slow and weak if other
(competing) behaviors are also being rewarded
Awareness
Not necessary for conditioning
Leads to faster conditioning
Phases and Principles of Operant/Instrumental Conditioning
(Continued)
Extinction: Reduced responding when the reinforcing stimulus is removed
(e.g., ignore bed-time tantrums)
Extinction Burst
Reinforcement Variability (Schedules of Reinforcement)
Continuous
Intermittent (partial)
Stimulus Variability (e.g., extinguishing smoking habit)
Response Variability (e.g., extinguishing athletic skills)
Spontaneous Recovery: Return of the extinguished behavior following an
interruption (e.g., child stays with grandma)
Resurgence: Return of a behavior following the extinction of another
behavior (e.g., extinction of day-care tantrums produces return to bed-time
tantrums). Similar to regression, but not always a return to a more
primitive behavior.
Theories of Reinforcement
Hedonic Theory
Reinforcement strengthens behavior because it produces pleasurable
sensations
Problems with theory
Masochism: Painful (unpleasant) sensations can act as reinforcers
Negative reinforcement: Removal of aversive stimuli reduces
discomfort but is not pleasurable
Tautology: If it makes you feel good, it's a reinforcer. If it's a
reinforcer, it makes you feel good (circular reasoning).
Drive Reduction (Hull)
Drive: A motivational force. Tension from unfulfilled needs or desires
Primary Drives (e.g., hunger, thirst)
Secondary Drives (e.g., success, popularity)
Reinforcer: Any stimulus that reduces drive by fulfilling the needs and
desires (e.g., food, water, money)
Difficulties with the theory
Some reinforcers do not reduce drives (e.g., electrical stimulation of
the brain, copulation without ejaculation)
Some motivations do not create states of tension that need to be
reduced (e.g., exploratory behavior)
Theories of Reinforcement
(continued)
Relative Value (Premack)
Reinforcers viewed as behaviors (e.g., food smell vs. chewing behavior)
Relative value: Some behaviors are more probable (more preferred)
than others (e.g., partying vs. studying)
Premack Principle: High probability (preferred) behavior reinforces
low probability (non-preferred) behavior
Problems with theory
How to explain strong secondary reinforcers (e.g., why is verbal
praise such a powerful reward?)
Sometimes low probability behavior reinforces high probability
behavior if the less likely behavior has been prevented
(e.g., deprivation of study time)
Response Deprivation (Timberlake & Allison): Relative value of responses
depends on relative deprivation. Behaviors that are not allowed to occur
will reinforce other, less deprived, behaviors (e.g., Prohibition in the 1920s
made drinking booze a much stronger reward).
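The contrast between the Premack Principle and the response deprivation account can be sketched as two comparisons. This is a simplified illustration, not Timberlake and Allison's formal condition: deprivation is assumed here to be the fraction of a behavior's baseline level that is currently withheld, baseline levels are assumed to be greater than zero, and all function names and numbers are hypothetical.

```python
from typing import Dict


def premack_can_reinforce(baseline: Dict[str, float], reward: str, target: str) -> bool:
    """Premack Principle: the more probable behavior can reinforce the less probable one."""
    return baseline[reward] > baseline[target]


def deprivation(baseline: Dict[str, float], allowed: Dict[str, float], behavior: str) -> float:
    """Fraction of the baseline level currently withheld (0 = none, 1 = all)."""
    return 1.0 - allowed[behavior] / baseline[behavior]


def deprivation_can_reinforce(
    baseline: Dict[str, float], allowed: Dict[str, float], reward: str, target: str
) -> bool:
    """Simplified deprivation reading: the more deprived behavior reinforces the less deprived one."""
    return deprivation(baseline, allowed, reward) > deprivation(baseline, allowed, target)


if __name__ == "__main__":
    # Illustrative baseline probabilities for the slide's partying vs. studying example.
    baseline = {"partying": 0.7, "studying": 0.3}
    # Studying has been completely prevented; partying is unrestricted.
    allowed = {"partying": 0.7, "studying": 0.0}

    print(premack_can_reinforce(baseline, "partying", "studying"))
    print(premack_can_reinforce(baseline, "studying", "partying"))
    print(deprivation_can_reinforce(baseline, allowed, "studying", "partying"))
```

With these numbers the Premack comparison says studying cannot reinforce partying, while the deprivation comparison says it can once studying has been prevented, which is the case the slide lists as a problem for the relative value theory.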
Escape Conditioning
Example # 1: UCS (hot water) -> UCR (pain reaction) -> Operant Response (turn nozzle) -> Negative Reinforcer (remove hot water)
Example # 2: UCS (shock) -> UCR (pain reaction) -> Operant Response (jump hurdle) -> Negative Reinforcer (remove shock)
Avoidance Conditioning
Example # 1: Warning Signal (flushing toilet) -> Operant Response (turn nozzle) -> Negative Reinforcer (avoid hot water)
Example # 2: Warning Signal (ringing bell) -> Operant Response (jump hurdle) -> Negative Reinforcer (avoid shock)
Theories of Avoidance
Two Processes
Classical Conditioning
UCS: A noxious stimulus that produces an unpleasant reaction
(e.g., flinch, startle reaction) or an escape response (e.g., jump aside, run away)
CS: Some signal that precedes the noxious stimulus (light, bell, flush)
Operant Conditioning
Operant Response: Response that removes the noxious stimulus
Negative Reinforcer: Termination of a noxious stimulus
Explanation: The CS becomes noxious and the animal learns to escape the noxious CS
Example: CS (flush) -> UCS (hot water) -> UCR (jump aside); Operant Response (turn nozzle) -> Negative Reinforcer (remove hot water)
Problems
Avoidance continues even after CS loses its aversive qualities
Avoidance response does not extinguish even though CS is no
longer paired with UCS
One Process: Only operant conditioning is involved in avoidance. The warning signal (CS)
becomes a discriminative stimulus