
Integrated Learning for Interactive Characters Bruce Blumberg, Marc Downie, Yuri Ivanov, Matt Berlin, Michael P. Johnson, Bill Tomlinson



Page 1: Integrated Learning for Interactive Characters Bruce Blumberg, Marc Downie, Yuri Ivanov, Matt Berlin, Michael P. Johnson, Bill Tomlinson

Integrated Learning for Interactive Characters

Bruce Blumberg, Marc Downie, Yuri Ivanov, Matt Berlin, Michael P. Johnson, Bill Tomlinson

Page 2

Practical & compelling real-time learning

• Easy for interactive characters to learn what they ought to be able to learn

• Easy for a human trainer to guide learning process

• A compelling user experience

• Provide heuristics and practical design principles


Page 3

Related Work

• Reinforcement learning: Barto & Sutton 98, Mitchell 97, Kaelbling 90, Drescher 91

• Animal training: Lindsay 00, Lorenz & Leyhausen 73, Ramirez 99, Pryor 99, Coppinger 01

• Motor learning: van de Panne et al. 93, 94, Grzeszczuk & Terzopoulos 95, Hodgins & Pollard 97, Gleicher 98, Faloutsos et al. 01

• Behavior architectures: Reynolds 87, Tu & Terzopoulos 94, Perlin & Goldberg 96, Funge et al. 99, Burke et al. 01

• Computer games & digital pets: Dogz, AIBO, Black & White


Page 4

Dobie T. Coyote Goes to School

[Video clip]

Page 5

Reinforcement Learning (R.L.) As Starting Point

       A1       A2       A3
S1   Q(1,1)   Q(1,2)   Q(1,3)
S2   Q(2,1)   Q(2,2)   Q(2,3)
S3   Q(3,1)   Q(3,2)   Q(3,3)

Q(2,3): utility of taking action A3 in state S2
A1..A3: set of all possible actions
S1..S3: set of all possible states of the world

• Dogs solve a simpler problem in a much larger space, and one that is more relevant to interactive characters
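A minimal sketch of the tabular Q representation shown in the table above (the state and action names, and the 0.7 value, are illustrative, not from the system):

```python
# Tabular Q: maps a (state, action) pair to a utility, as in the S1..S3 / A1..A3 grid.
from collections import defaultdict

Q = defaultdict(float)  # unseen (state, action) pairs default to utility 0.0

# Utility of taking action A3 in state S2 (illustrative value):
Q[("S2", "A3")] = 0.7

def best_action(state, actions):
    """Greedy policy: pick the action with the highest Q-value in this state."""
    return max(actions, key=lambda a: Q[(state, a)])

print(best_action("S2", ["A1", "A2", "A3"]))  # "A3", the only pair with nonzero utility
```

The table grows with the product of the state and action spaces, which is why the slide contrasts it with the much larger, but structurally simpler, problem dogs solve.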

Page 6

D.L.: Take Advantage of Predictable Regularities

• Constrain search for causal agents by taking advantage of temporal proximity & the natural hierarchy of state spaces
  • Use consequences to bias choice of action
  • But vary performance and attend to differences
• Explore state and action spaces on an “as-needed” basis
  • Build models on demand


Page 7

D.L.: Make Use of All Feedback: Explicit & Implicit

• Use the rewarded action as context for identifying
  • Promising state space and action space to explore
  • Good examples from which to construct perceptual models, e.g.,
    • A good example of a “sit-utterance” is one that occurs within the context of a rewarded Sit


Page 8

D.L.: Make Them Easy to Train

• Respond quickly to “obvious” contingencies
• Support luring and shaping
  • Techniques to prompt infrequently expressed or novel motor actions
• “Trainer friendly” credit assignment
  • Assign credit to the candidate that matches the trainer’s expectation


Page 9

The System

Page 10

Representation of State: Percept

• Percepts are atomic perception units
• Recognize and extract features from sensory data
• Model-based
• Organized in a dynamic hierarchy


Page 11

Representation of State-Action Pairs: Action Tuples

[Diagram: an Action Tuple pairs a triggering Percept with an Action, annotated with Value, Novelty, and Reliability statistics; Percept activation feeds into value]

Action Tuples are organized in a dynamic hierarchy and compete probabilistically based on their learned value and reliability.
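A sketch of such probabilistic competition. The field names and the weighting by value × reliability are illustrative assumptions, not the system's actual formula:

```python
import random
from dataclasses import dataclass

@dataclass
class ActionTuple:
    percept: str        # trigger, e.g. "sit-utterance"
    action: str         # e.g. "Sit"
    value: float        # learned value of taking the action in this context
    reliability: float  # how reliably the action paid off in this context

def choose(tuples, active_percepts):
    """Probabilistic competition among tuples whose trigger percept is active."""
    eligible = [t for t in tuples if t.percept in active_percepts]
    # Weight each eligible tuple by value * reliability (floored so no weight is zero).
    weights = [max(t.value * t.reliability, 1e-6) for t in eligible]
    return random.choices(eligible, weights=weights)[0]

tuples = [
    ActionTuple("sit-utterance", "Sit", value=1.0, reliability=0.9),
    ActionTuple("sit-utterance", "Beg", value=0.2, reliability=0.1),
]
# "Sit" wins most of the time, but "Beg" is still occasionally expressed,
# which keeps some variation in performance.
print(choose(tuples, {"sit-utterance"}).action)
```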

Page 12

Representation of Action: Labeled Path Through Space of Body Configurations

• A motor program generates a path through a graph of annotated poses, e.g.,
  • Sit animation
  • Follow-your-nose procedure
• Paths can be compared and classified just like perceptual events, using Motor Model Percepts


Page 13

Use Time to Constrain Search for Causal Agents

[Timeline: Scratch → Sit → Good Thing]

Attention window (before the action): look here for cues that appear correlated with an increased likelihood of the action being followed by a good thing.

Consequences window (after the action): assume any good or bad things that happen here are associated with the preceding action and the context in which it was performed.
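The two windows can be sketched as simple intervals around the action. The window lengths and event timestamps below are illustrative assumptions:

```python
# Events are (time_in_seconds, label) pairs on a timeline like the one above.
ATTENTION_WINDOW = 2.0     # look this many seconds before the action starts
CONSEQUENCES_WINDOW = 3.0  # look this many seconds after the action ends

def candidate_cues(events, action_start):
    """Cues in the attention window may be correlated with the action paying off."""
    return [e for t, e in events if action_start - ATTENTION_WINDOW <= t <= action_start]

def consequences(events, action_end):
    """Good/bad things here are credited to the preceding action and its context."""
    return [e for t, e in events if action_end < t <= action_end + CONSEQUENCES_WINDOW]

events = [(0.5, "scratch"), (2.6, "sit-utterance"), (5.5, "good-thing")]
sit_start, sit_end = 3.0, 4.0
print(candidate_cues(events, sit_start))  # ['sit-utterance'] -- scratch is too early
print(consequences(events, sit_end))      # ['good-thing']
```

Restricting the search to these windows is what makes credit assignment tractable without enumerating the full state space.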

Page 14

Four Important Tasks Are Performed During Credit Assignment

• Choose the most worthy Action Tuple heuristically, based on reliability and novelty statistics
• Update value
• Create new Action Tuples as appropriate
• Guide state and action space discovery


Page 15

Most Worthy Action Tuple Gets Credit

[Timeline: “sit-utterance” perceived → <true/Sit> begins → “click” perceived → Good Thing]

<true/Sit> begins, but credit goes to <“sit-utterance”/Sit>.
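One way to sketch the "most worthy" heuristic: among the candidate tuples consistent with what just happened, prefer a specific triggering percept over the catch-all <true/...> trigger, matching the trainer's expectation. The scoring rule below is an illustrative assumption:

```python
def most_worthy(candidates):
    """Pick the candidate tuple that should get credit for the reward.

    Prefer a specific triggering percept over the catch-all "true" trigger,
    breaking ties with the reliability statistic."""
    def score(c):
        specificity = 0 if c["percept"] == "true" else 1
        return (specificity, c["reliability"])
    return max(candidates, key=score)

# The dog sat because <true/Sit> fired, but a "sit-utterance" was heard first,
# so the trainer expects <"sit-utterance"/Sit> to be reinforced.
candidates = [
    {"percept": "true", "action": "Sit", "reliability": 0.5},
    {"percept": "sit-utterance", "action": "Sit", "reliability": 0.4},
]
print(most_worthy(candidates)["percept"])  # 'sit-utterance'
```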

Page 16

Create New Action Tuples As Appropriate

Page 17

Implicit Feedback Guides State Space Discovery

[Timeline: Scratch → “beg” utterance → Beg → Good Thing]

An utterance occurs within the window but is not classified by any existing Percept. When the Good Thing appears, create a new Percept with the “beg” example as its initial model. This means that Percepts are only created to recognize “promising” utterances.

Page 18

Implicit Feedback Identifies Good Examples

[Timeline: Scratch → “beg” utterance → Beg → Good Thing]

The utterance is classified as “beg”. When the Good Thing appears, update the model of the “beg” utterance using the “beg” that occurred in the attention window. This means the model is built using good examples.
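The two rules on this slide and the previous one can be sketched together: try to classify the rewarded utterance; if a model matches, fold the good example in, otherwise create a new Percept from it. The nearest-example classifier on scalar "features" is an illustrative stand-in for the system's real perceptual models:

```python
percepts = {}  # percept name -> list of example feature values (the "model")

def classify(example, threshold=1.0):
    """Return the name of the percept whose stored examples lie closest,
    if any lies within the distance threshold."""
    best, best_d = None, threshold
    for name, examples in percepts.items():
        d = min(abs(example - e) for e in examples)
        if d < best_d:
            best, best_d = name, d
    return best

def on_reward(example, proposed_name):
    """A Good Thing appeared after this utterance: update a matching model with
    this good example, or create a new Percept so that only 'promising'
    utterances get models."""
    name = classify(example)
    if name is None:
        percepts[proposed_name] = [example]
        return f"created {proposed_name}"
    percepts[name].append(example)
    return f"updated {name}"

print(on_reward(4.2, "beg"))   # nothing matches -> a new "beg" Percept is created
print(on_reward(4.5, "beg2"))  # close to the stored 4.2 -> the "beg" model is updated
```

Unrewarded examples are simply never passed to `on_reward`, which is the point of the next slide.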

Page 19

Unrewarded Examples Don’t Get Added to Models

[Timeline: Scratch → “Leg” utterance → Beg → Sit]

The utterance is classified as “Beg” by mistake, and Beg becomes active. Beg ends without food appearing: do not update the model, since the example may have been bad. (Actually, bad examples can be used to build a model of “not-Beg.”)

Page 20

Implicit Feedback Guides Action Space Discovery

[Timeline: Follow-your-nose → Good Thing]

The “Follow-your-nose” action accumulates a path through pose-space. When the Good Thing appears, compare the accumulated path to known paths. Down gets the credit for the Good Thing appearing, rather than “Follow-your-nose.”

Page 21

If Path Is Novel, Create a New Motor Program and Action

[Timeline: Follow-your-nose → Good Thing]

The “Follow-your-nose” action accumulates a path through pose-space. When the Good Thing appears, compare the accumulated path to known paths. If the path is novel, Figure-8 is created, and subsequent examples of Figure-8 are used to improve the model of the path.
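A sketch of the path-comparison step on these two slides. The pose alphabet, the mismatch-count distance, and the threshold are illustrative assumptions; the system's Motor Model Percepts would do the real classification:

```python
known_paths = {"Down": ["stand", "crouch", "lie"]}

def path_distance(a, b):
    """Crude distance between two pose sequences: pairwise mismatches
    plus a penalty for differing lengths."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def credit_path(accumulated, threshold=2):
    """On reward, credit the matching known action; if the accumulated path
    is novel, mint a new action whose model starts as this path."""
    for name, path in known_paths.items():
        if path_distance(accumulated, path) < threshold:
            return name  # e.g. Down gets the credit, not Follow-your-nose
    name = f"NewAction{len(known_paths)}"
    known_paths[name] = accumulated  # later examples would refine this model
    return name

print(credit_path(["stand", "crouch", "lie"]))           # matches the Down path
print(credit_path(["circle-left", "circle-right"] * 2))  # novel -> new action created
```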

Page 22

Dobie T. Coyote…

[Video clip]

Page 23

Limitations and Future Work

• Important extensions
  • Other kinds of learning (e.g., social or spatial)
  • Generalization
  • Sequences
  • Expectation-based emotion system
• How will the system scale?


Page 24

Useful Insights

• Use
  • Temporal proximity to limit search
  • Hierarchical representations of state, action, and state-action space, with implicit feedback to guide exploration
  • “Trainer friendly” credit assignment
• Luring and shaping are essential


Page 25

Acknowledgements

• Members of the Synthetic Characters Group, past, present & future
• Gary Wilkes
• Funded by the Digital Life Consortium


Page 26

Page 27

Reinforcement Learning (R.L.) as starting point

• Goal
  • Learn an optimal set of actions that will take the creature from any arbitrary state to a goal state
• Approach
  • Probabilistically explore states, actions, and their outcomes to learn how to act in any given state

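The standard tabular Q-learning update behind this goal and approach, as a minimal sketch (the two-state "grid" and the learning-rate/discount values are illustrative):

```python
# Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
from collections import defaultdict

Q = defaultdict(float)
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (illustrative)

def update(s, a, reward, s_next, actions):
    """One Q-learning backup after observing (s, a) -> reward, s_next."""
    target = reward + GAMMA * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

actions = ["left", "right"]
# Repeatedly reward "right" in state S1 for reaching the goal state G:
for _ in range(10):
    update("S1", "right", 1.0, "G", actions)

print(round(Q[("S1", "right")], 3))  # approaches 1.0 as updates accumulate
```

Each backup probabilistically learned from explored outcomes like this must visit each (state, action) pair often enough, which is exactly the scaling pressure the dog-inspired approach earlier in the deck tries to relieve.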