PSY211 Operant Conditioning Reinforcement


  • 7/27/2019 PSY211 Operant Conditioning Reinforcement

    1/17

    Psychology of Learning

    PSY211 Operant/Instrumental Conditioning: Reinforcement

    B. Charles Tatum


    Operant/Instrumental Conditioning

    Definition: A form of learning (conditioning) in which the organism is free to respond to (operate on) the environment, and changes in behavior occur as a result of the stimulus consequences (reinforcement/punishment) of its spontaneous actions.

    Trial and Error (Trial and Success) Learning

    Thorndike's Puzzle Box / Skinner's Operant Chamber / Tolman's Maze

    Law of Effect


    Reinforcement and Punishment

    Reinforcement: Any stimulus event that increases the likelihood of a preceding response

        Positive Reinforcement: The presence of a stimulus increases the likelihood of the preceding response (e.g., food, money, praise, drugs, electrical stimulation of pleasure centers in the brain). Sometimes called Reward.

        Negative Reinforcement: The removal of a stimulus increases the likelihood of the preceding response (e.g., remove hand from a warm stove, improve grades to lift restriction, work hard not to get fired).

    Punishment: Any stimulus event that decreases (suppresses) the likelihood of a preceding response

        Positive Punishment: The presence of a stimulus (usually aversive, such as a slap, a scolding, or a dirty look) decreases (suppresses) the likelihood of a preceding response.

        Negative Punishment: The removal of a stimulus (usually something pleasant, such as TV privileges or a desirable object) decreases (suppresses) the likelihood of a preceding response. When the stimulus that is removed is a reinforcer, we call this extinction.


    Response -> Stimulus Consequence (Onset/Offset) -> Reinforcer (Positive/Negative) or Punisher (Positive/Negative/Extinction)

                             Future Responses
    Stimulus Consequence     Increase                           Decrease
    Produce (Onset)          Positive Reinforcement / Reward    Positive Punishment
                             (e.g., praise)                     (e.g., spanking)
    Remove (Offset)          Negative Reinforcement             Negative Punishment / Extinction
                             (e.g., nagging)                    (e.g., time out)
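    The four-way classification above (stimulus produced or removed, future responding increased or decreased) can be sketched as a small lookup. This is only an illustration of the grid; the names `StimulusChange`, `ResponseTrend`, and `classify_consequence` are hypothetical and do not come from the slides.

```python
from enum import Enum


class StimulusChange(Enum):
    PRODUCE = "onset"   # a stimulus is presented after the response
    REMOVE = "offset"   # a stimulus is taken away after the response


class ResponseTrend(Enum):
    INCREASE = "increase"  # the response becomes more likely in the future
    DECREASE = "decrease"  # the response becomes less likely in the future


# Hypothetical lookup table mirroring the 2x2 grid on the slide.
CONSEQUENCE_TYPES = {
    (StimulusChange.PRODUCE, ResponseTrend.INCREASE): "positive reinforcement",
    (StimulusChange.REMOVE, ResponseTrend.INCREASE): "negative reinforcement",
    (StimulusChange.PRODUCE, ResponseTrend.DECREASE): "positive punishment",
    (StimulusChange.REMOVE, ResponseTrend.DECREASE): "negative punishment",
}


def classify_consequence(change, trend):
    """Return the consequence type for one cell of the grid."""
    return CONSEQUENCE_TYPES[(change, trend)]


# Praise is presented and the behavior increases.
print(classify_consequence(StimulusChange.PRODUCE, ResponseTrend.INCREASE))
# Privileges are removed (time out) and the behavior decreases.
print(classify_consequence(StimulusChange.REMOVE, ResponseTrend.DECREASE))
```

    Note that the label depends on the observed effect on behavior, not on whether the stimulus seems pleasant: the same stimulus can fall in different cells for different learners.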


    Reinforcement and Punishment as Reciprocal Processes

    The Problem Employee

        Attending Meetings + Doughnuts = Increase Attendance (Positive Reinforcement)

        Playing Computer Games + No Doughnuts = Reduce Game Playing (Negative Punishment/Extinction)

    The Confused Soldier

        "Hey Dude" to Officer + Criticism = Reduce Verbal Salutations (Positive Punishment)

        Saluting an Officer + No Criticism = Increase Saluting as Form of Address (Negative Reinforcement)


    Reinforcement and Punishment as Dynamic Processes

    Parental Dynamics

        Child Response = Dirty Words -> Parent Stimulus = Spanking -> Future Result: Reduce Foul Language (Positive Punishment)

        Parent Response = Spanking -> Child Stimulus = Nice Words -> Future Result: Increase Use of Spankings (Positive Reinforcement)

    Marital Dynamics

        Husband Response = Bring Gifts -> Wife Stimulus = Stop Sulking -> Future Result: Increased Gift Giving (Negative Reinforcement)

        Wife Response = Sulking -> Husband Stimulus = Gifts -> Future Result: Increased Sulking (Positive Reinforcement)


    Primary versus Secondary Reinforcement

    Primary: Naturally or innately reinforcing stimuli (e.g., food, water, sex)

    Secondary (Conditioned): Reinforcers that are dependent on their association with other reinforcers (e.g., praise, recognition, money).

        UCS (Food) -> UCR (Satisfaction)

        CS (Money) -> CR (Satisfaction)

    Generalized Reinforcer: Secondary reinforcers that have been paired with a wide variety of primary reinforcers (e.g., money, praise)


    Comparison Between Classical (Pavlovian) and Operant (Instrumental) Conditioning

                                 Classical/Pavlovian        Operant/Instrumental
    Responses                    Elicited (Reflex)          Emitted (Spontaneous)
    Stimuli                      Unconditioned (UCS)        Unobserved (Internal)
                                 Conditioned (CS)           Reinforcing/Punishing
                                                            Discriminative (External)
    Peripheral Nervous System    Autonomic (Involuntary)    Somatic (Voluntary)
    Examples                     Light - Air Puff           Snap Fingers - Roll over - Treat
                                 Tone - Knee Tap            Deadline - Work late - Bonus
                                 Bell - Food Powder         Exam - Study - Good grades
    Association                  S-S                        S-R-S


    Acquisition, Extinction, and Spontaneous Recovery of an Operantly/Instrumentally Conditioned Response

    [Figure: response curve across phases. Acquisition (reinforced time or trials); Extinction (non-reinforced time or trials), with an Extinction Burst marked; Spontaneous Recovery following each Break (Interruption).]
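    The phases named in the figure can be illustrated with a toy simulation. This is a minimal sketch under loud assumptions, not a model from the slides: response strength moves a fixed fraction toward 1 on reinforced trials and toward 0 on non-reinforced trials (a simple linear-operator update), and a break restores an arbitrary fraction of the lost strength to mimic spontaneous recovery. The extinction burst is not modeled. The function name and the parameters `learning_rate` and `recovery_fraction` are hypothetical.

```python
def simulate_response_strength(schedule, learning_rate=0.2, recovery_fraction=0.3):
    """Simulate response strength over a schedule of events.

    schedule: sequence of "R" (reinforced trial), "N" (non-reinforced trial),
    or "B" (break/interruption). All parameter values are illustrative only.
    """
    strength = 0.0
    peak = 0.0
    history = []
    for event in schedule:
        if event == "R":
            # Acquisition: move a fraction of the way toward maximum strength.
            strength += learning_rate * (1.0 - strength)
            peak = max(peak, strength)
        elif event == "N":
            # Extinction: move a fraction of the way toward zero.
            strength += learning_rate * (0.0 - strength)
        elif event == "B":
            # Spontaneous recovery (assumed form): regain part of the lost strength.
            strength += recovery_fraction * (peak - strength)
        else:
            raise ValueError(f"unknown event: {event!r}")
        history.append(strength)
    return history


# 10 reinforced trials, 10 non-reinforced trials, a break, then 5 more non-reinforced trials.
schedule = ["R"] * 10 + ["N"] * 10 + ["B"] + ["N"] * 5
history = simulate_response_strength(schedule)
for trial, (event, value) in enumerate(zip(schedule, history), start=1):
    print(f"{trial:2d} {event} {value:.3f}")
```

    In this sketch the strength rises during acquisition, falls during extinction, jumps partway back after the break, and then declines again, which is the qualitative shape the figure labels describe.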


    Phases and Principles of Operant/Instrumental Conditioning

    Acquisition: Gradual increase in responding when a reinforcing stimulus follows the behavior (e.g., toilet training, athletic skills, stupid pet tricks)

        Successive Approximation (Shaping)

        Chaining: Performing behaviors in a sequence (e.g., ordering take-out)

            Forward Chaining: Train first-to-last

            Backward Chaining: Train last-to-first

        Superstitious Behavior

        Conditions of Reinforcement

            Reward Delay (delayed gratification)

            Reward Contingency (predictability is good)

            Reward Preference (chocolate better than raisins)

            Reward Amount

                Diminishing returns

                Contrast effects

                Frequency effects
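    The difference between forward and backward chaining, mentioned in the list above, is only the order in which links of the sequence are added during training. The sketch below lists the cumulative training stages for each method. The function `training_stages` and the take-out steps are hypothetical illustrations, not material from the slides.

```python
def training_stages(chain, method):
    """Return the cumulative training stages for a behavior chain.

    Forward chaining starts with the first link and adds later links;
    backward chaining starts with the last link and adds earlier links.
    """
    if method == "forward":
        return [chain[: i + 1] for i in range(len(chain))]
    if method == "backward":
        return [chain[-(i + 1):] for i in range(len(chain))]
    raise ValueError(f"unknown method: {method!r}")


# Hypothetical steps for the "ordering take-out" example.
steps = ["find menu", "call restaurant", "place order", "pay"]

for stage in training_stages(steps, "forward"):
    print("forward :", stage)
for stage in training_stages(steps, "backward"):
    print("backward:", stage)
```

    Both methods end with the learner performing the full chain; in backward chaining every training stage ends at the final step of the sequence.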


    [Figure: Changes in Effectiveness of Reward Amount. Axes: Work Proficiency and Salary Increase, each scaled 0 to 100.]


    Phases and Principles of Operant/Instrumental Conditioning (Continued)

    Acquisition (continued)

        Response Characteristics

            Skeletal muscles (voluntary [somatic] nervous system) are easier to condition than smooth muscles and glands (involuntary [autonomic] nervous system)

            Simple responses are easier to condition than complex responses

        Motivational Level: Learning is faster and stronger when the learner is deprived of rewards (better for primary than secondary rewards)

        Competing Rewards: Conditioning is slow and weak if other (competing) behaviors are also being rewarded

        Awareness

            Not necessary for conditioning

            Leads to faster conditioning


    Phases and Principles of Operant/Instrumental Conditioning (Continued)

    Extinction: Reduced responding when the reinforcing stimulus is removed (e.g., ignore bed-time tantrums)

        Extinction Burst

        Reinforcement Variability (Schedules of Reinforcement)

            Continuous

            Intermittent (partial)

        Stimulus Variability (e.g., extinguishing smoking habit)

        Response Variability (e.g., extinguishing athletic skills)

    Spontaneous Recovery: Return of the extinguished behavior following an interruption (e.g., child stays with grandma)

    Resurgence: Return of a behavior following the extinction of another behavior (e.g., extinction of day-care tantrums produces return to bed-time tantrums). Similar to regression, but not always a return to a more primitive behavior.


    Theories of Reinforcement

    Hedonic Theory

        Reinforcement strengthens behavior because it produces pleasurable sensations

        Problems with theory

            Masochism: Pain (an unpleasant sensation) can be a reinforcer

            Negative reinforcement: Removal of aversive stimuli reduces discomfort but is not pleasurable

            Tautology: If it makes you feel good, it's a reinforcer. If it's a reinforcer, it makes you feel good (circular reasoning).

    Drive Reduction (Hull)

        Drive: A motivational force. Tension from unfulfilled needs or desires

            Primary Drives (e.g., hunger, thirst)

            Secondary Drives (e.g., success, popularity)

        Reinforcer: Any stimulus that reduces drive by fulfilling needs and desires (e.g., food, water, money)

        Difficulties with the theory

            Some reinforcers do not reduce drives (e.g., electrical stimulation of the brain, copulation without ejaculation)

            Some motivations do not create states of tension that need to be reduced (e.g., exploratory behavior)


    Theories of Reinforcement (continued)

    Relative Value (Premack)

        Reinforcers viewed as behaviors (e.g., food smell vs. chewing behavior)

        Relative value: Some behaviors are more probable (more preferred) than others (e.g., partying vs. studying)

        Premack Principle: High probability (preferred) behavior reinforces low probability (non-preferred) behavior

        Problems with theory

            How to explain strong secondary reinforcers (e.g., why is verbal praise such a powerful reward?)

            Sometimes low probability behavior reinforces high probability behavior if the less likely behavior has been prevented (e.g., deprivation of study time)

    Response Deprivation (Timberlake & Allison): Relative value of responses depends on relative deprivation. Behaviors that are not allowed to occur will reinforce other, less deprived, behaviors (e.g., Prohibition in the 1920s made drinking booze a much stronger reward).
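    The contrast between the Premack principle and response deprivation can be made concrete with a small sketch. The Premack check simply compares baseline probabilities. The deprivation check uses the condition commonly stated for the response-deprivation hypothesis, I/C > Oi/Oc, where I and C are the amounts of instrumental and contingent behavior in the schedule and Oi and Oc are their unrestricted baseline levels; that formula is not given on the slides and is included here as an assumption, as are the function names and the example numbers.

```python
def premack_can_reinforce(baseline_contingent, baseline_instrumental):
    """Premack principle: the contingent behavior can reinforce the
    instrumental behavior only if it is the more probable of the two."""
    return baseline_contingent > baseline_instrumental


def deprivation_can_reinforce(required_instrumental, allowed_contingent,
                              baseline_instrumental, baseline_contingent):
    """Response-deprivation condition (assumed form): I/C > Oi/Oc.

    Cross-multiplied to avoid division; all four amounts must be positive.
    """
    amounts = (required_instrumental, allowed_contingent,
               baseline_instrumental, baseline_contingent)
    if any(amount <= 0 for amount in amounts):
        raise ValueError("all amounts must be positive")
    return (required_instrumental * baseline_contingent
            > baseline_instrumental * allowed_contingent)


# Hypothetical baselines in minutes per day: partying 120, studying 30.
# Premack: studying (less probable) should not reinforce partying.
print(premack_can_reinforce(baseline_contingent=30, baseline_instrumental=120))

# A schedule allowing only 5 minutes of studying per 60 minutes of partying
# holds studying below its baseline, so studying can act as the reinforcer.
print(deprivation_can_reinforce(required_instrumental=60, allowed_contingent=5,
                                baseline_instrumental=120, baseline_contingent=30))
```

    Under these illustrative numbers the first check returns False and the second returns True, which matches the slide's point that a low probability behavior can reinforce a high probability one when the less likely behavior has been prevented.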


    Escape Conditioning

        Example # 1: UCS (hot water) -> UCR (pain reaction) -> Operant Response (turn nozzle) -> Negative Reinforcer (remove hot water)

        Example # 2: UCS (shock) -> UCR (pain reaction) -> Operant Response (jump hurdle) -> Negative Reinforcer (remove shock)

    Avoidance Conditioning

        Example # 1: Warning Signal (flushing toilet) -> Operant Response (turn nozzle) -> Negative Reinforcer (avoid hot water)

        Example # 2: Warning Signal (ringing bell) -> Operant Response (jump hurdle) -> Negative Reinforcer (avoid shock)


    Theories of Avoidance

    Two Processes

        Classical Conditioning

            UCS: A noxious stimulus that produces an unpleasant reaction (e.g., flinch, startle reaction) or an escape response (e.g., jump aside, run away)

            CS: Some signal that precedes the noxious stimulus (light, bell, flush)

        Operant Conditioning

            Operant Response: Response that removes the noxious stimulus

            Negative Reinforcer: Termination of a noxious stimulus

        Explanation: The CS becomes noxious and the animal learns to escape the noxious CS

        Example: CS (flush) -> UCS (hot water) -> UCR (jump aside) -> Operant Response (turn nozzle) -> Negative Reinforcer (remove hot water)

        Problems

            Avoidance continues even after CS loses its aversive qualities

            Avoidance response does not extinguish even though CS is no longer paired with UCS

    One Process: Only operant conditioning is involved in avoidance. The warning signal (CS) becomes a discriminative stimulus