
Learning from Observation Using Primitives

Darrin Bentivegna


Outline

- Motivation
- Test environments
- Learning from observation
- Learning from practice
- Contributions
- Future directions


Motivation

- Reduce the learning time needed by robots.
- Quickly learn skills from observing others.
- Improve performance through practice.
- Adapt to environment changes.
- Create robots that can interact with and learn from humans in a human-like way.


Real World Marble Maze


Real World Air Hockey


Research Strategy

- Domain knowledge: a library of primitives.
- Manually defining primitives is a natural way to specify domain knowledge.
- The focus of this research is on how to use a fixed library of primitives.

[Figure: Marble Maze Primitives — Roll To Corner, Roll Off Wall, Guide, Roll From Wall, Leave Corner]


Primitives in Air Hockey

- Right Bank Shot
- Straight Shot
- Left Bank Shot
- Defend Goal
- Static Shot
- Idle


Take-home message

- Learning using primitives greatly speeds up learning and allows robots to perform more complex tasks.
- Memory-based learning makes learning from observation easy.
- I created a way to do memory-based reinforcement learning. The problem is that there is no fixed set of parameters to adjust; instead, the system learns by adjusting the distance function.
- I present algorithms that learn from both observation and practice.


Observe Critical Events in Marble Maze

[Figure: raw observed data; wall contact is inferred.]


Observe Critical Events in Air Hockey

[Figure: shots made by the human, the human's paddle movement, and the puck's movement, plotted as Paddle X, Paddle Y, Puck X, and Puck Y over time.]


Learning From Observation

- Memory-based learner: learn by storing experiences.
- Primitive selection: k-nearest neighbor.
- Sub-goal generation: kernel regression (distance-weighted averaging) based on remembered primitives of the appropriate type.
- Action generation: learned or fixed policy.


Three-Level Structure

[Diagram: Primitive Selection → Sub-goal Generation → Action Generation]


Learning from Observation Framework

[Diagram: learning from observation feeding the three-level structure — Primitive Selection, Sub-goal Generation, Action Generation.]


Observe Primitives Performed by a Human

[Figure: primitives observed on the maze board (positions in board units): ◊ Guide, ○ Roll To Corner, □ Roll Off Wall, * Roll From Wall, X Leave Corner.]


Primitive Database

Create a data point for each observed primitive:
- The primitive type performed: TYPE.
- The state of the environment at the start of the primitive performance: $(M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y)$.
- The state of the environment at the end of the primitive performance: $(ME_x, ME_y, \dot{ME}_x, \dot{ME}_y, BE_x, BE_y)$.

Each data point thus maps $(M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y) \rightarrow (TYPE, ME_x, ME_y, \dot{ME}_x, \dot{ME}_y, BE_x, BE_y)$.
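
As a concrete sketch, one observed primitive could be stored as a small record like the following (Python; field names are illustrative and follow the slide's notation, with M the marble state, B the board tilt, and the E suffix marking the end state):

```python
from dataclasses import dataclass

@dataclass
class PrimitiveDataPoint:
    # Hypothetical record for one observed primitive performance.
    type: str        # TYPE, e.g. "roll_to_corner"
    start: tuple     # (Mx, My, Mdot_x, Mdot_y, Bx, By) at the primitive's start
    end: tuple       # (MEx, MEy, MEdot_x, MEdot_y, BEx, BEy) at its end

point = PrimitiveDataPoint(
    type="roll_off_wall",
    start=(12.1, 18.2, 0.30, -0.10, 0.02, -0.01),
    end=(12.9, 18.8, 0.00, 0.00, 0.00, 0.00),
)
```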


Marble Maze Example

[Diagram: the three-level structure applied to the marble maze — Primitive Selection, Sub-goal Generation, Action Generation.]


Primitive Type Selection

- Look up using the environment state, $\mathbf{q} = (M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y)$, with a weighted nearest-neighbor distance:

$d(\mathbf{x}, \mathbf{q}) = \sqrt{\sum_j w_j (x_j - q_j)^2}$

- Many ways to select a primitive type:
  - Use the closest point.
  - Use the n nearest points to vote, either by highest frequency or weighted by distance from the query point.

Each stored data point maps $(M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y) \rightarrow (TYPE, ME_x, ME_y, \dot{ME}_x, \dot{ME}_y, BE_x, BE_y)$.
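
A minimal sketch of this weighted nearest-neighbor vote, reusing the data-point record from the earlier sketch (the weight vector and k are illustrative parameters):

```python
import math
from collections import defaultdict

def distance(x, q, w):
    # Weighted Euclidean distance d(x, q) over the environment state.
    return math.sqrt(sum(wj * (xj - qj) ** 2 for wj, xj, qj in zip(w, x, q)))

def select_primitive_type(query, points, weights, k=5):
    # Rank stored data points by distance from the query state...
    ranked = sorted(((distance(p.start, query, weights), p) for p in points),
                    key=lambda pair: pair[0])
    # ...then let the k nearest vote, weighted by inverse distance.
    votes = defaultdict(float)
    for d, p in ranked[:k]:
        votes[p.type] += 1.0 / (d + 1e-6)
    return max(votes, key=votes.get)
```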



Sub-goal Generation

- Locally weighted average over nearby primitives (data points) of the same type.
- Use a kernel function to control the influence of nearby data points: $K(d) = e^{-d^2}$.

$\hat{y}(\mathbf{q}) = \frac{\sum_{i=1}^{n} K(d(\mathbf{x}_i, \mathbf{q}))\, y_i}{\sum_{i=1}^{n} K(d(\mathbf{x}_i, \mathbf{q}))}$

where each $y_i$ is the stored end state $(ME_x, ME_y, \dot{ME}_x, \dot{ME}_y, BE_x, BE_y)$ of data point $i$.
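
A matching sketch of the sub-goal computation, reusing `distance` from the selection sketch above (illustrative code, not the thesis implementation):

```python
import math

def kernel(d):
    # K(d) = exp(-d^2): influence fades quickly with distance.
    return math.exp(-d * d)

def generate_subgoal(query, same_type_points, weights):
    # Kernel-weighted average of stored end states -> predicted sub-goal.
    dims = len(same_type_points[0].end)
    num, den = [0.0] * dims, 0.0
    for p in same_type_points:
        k = kernel(distance(p.start, query, weights))
        num = [n + k * e for n, e in zip(num, p.end)]
        den += k
    return tuple(n / den for n in num)
```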



Action Generation

Provides the action (motor command) to perform at each time step.

Locally weighted regression (LWR), neural networks, a physical model, etc.



Creating an Action Generation Module (Roll to Corner)

Record at each time step from the beginning to the end of the primitive:
- Environment state: $(M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y)$
- Actions taken: $(B_x, B_y)$
- End state: $(ME_x, ME_y, \dot{ME}_x, \dot{ME}_y, BE_x, BE_y)$

[Figure: the observed environment states from the start to the end of a Roll To Corner primitive, with the incoming velocity vector.]


Transform to a Local Coordinate Frame

Global information:

$((M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y),\ (ME_x, ME_y, \dot{ME}_x, \dot{ME}_y, BE_x, BE_y)) \rightarrow (B_x, B_y)$

Primitive-specific local information:

$((X, \dot{M}_x, B_x, B_y),\ (\dot{ME}_x, BE_x, BE_y)) \rightarrow (B_x, B_y)$

[Figure: the local frame for the primitive, with +X pointing along the wall from a reference point; X is the distance to the end.]
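
The exact local features are primitive-specific; the following sketch only illustrates the general mechanism, rotating and translating the marble state into a frame whose +X axis points from a reference point toward the primitive's end (all names are illustrative):

```python
import math

def to_local_frame(marble_xy, marble_vel, ref_point, end_point):
    # Angle of the local +X axis: from the reference point toward the end.
    theta = math.atan2(end_point[1] - ref_point[1], end_point[0] - ref_point[0])
    c, s = math.cos(theta), math.sin(theta)
    dx, dy = marble_xy[0] - ref_point[0], marble_xy[1] - ref_point[1]
    # Rotate by -theta so that direction becomes the local +X axis.
    local_pos = (c * dx + s * dy, -s * dx + c * dy)
    local_vel = (c * marble_vel[0] + s * marble_vel[1],
                 -s * marble_vel[0] + c * marble_vel[1])
    # Distance to the end of the primitive, as in the slide's X feature.
    dist_to_end = math.hypot(end_point[0] - marble_xy[0],
                             end_point[1] - marble_xy[1])
    return local_pos, local_vel, dist_to_end
```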


Learning the Maze Task from Only Observation


Related Research: Primitive Recognition

- Survey of research in human motion analysis and recognition of human activities from image sequences: Aggarwal and Cai.
- Recognition over time: HMMs, Brand, Oliver, and Pentland; template matching, Davis and Bobick.
- Discovering primitives: Fod, Mataric, and Jenkins.


Related Research: Primitive Selection

- Predefined sequence: virtual characters, Hodgins et al., Faloutsos et al., and Mataric et al.; mobile robots, Balch et al. and Arkin et al.
- Learn from observation: assembly, Kuniyoshi, Inaba, Inoue, and Kang.
- Use a planning system: assembly, Thomas and Wahl; RL, Ryan and Reid.


Related Research: Primitive Execution

- Predefined execution policy: virtual characters, Mataric et al. and Hodgins et al.; mobile robots, Brooks et al. and Arkin.
- Learn while operating in the environment: mobile robots, Mahadevan and Connell; RL, Kaelbling, Dietterich, and Sutton et al.
- Learn from observation: mobile robots, Larson and Voyles, Hugues and Drogoul, and Grudic and Lawrence; high-DOF robots, Aboaf et al., Atkeson, and Schaal.


Review

[Diagram: learning from observation feeding the three-level structure — Primitive Selection, Sub-goal Generation, Action Generation.]


Using Only Observed Data

- Tries to mimic the teacher.
- Cannot always perform primitives as well as the teacher.
- Sometimes selects the wrong primitive type for the observed state.
- Does not know what to do in states it has not observed, and has no way to know it should try something different.
- Solution: learning from practice.


Improving Primitive Selection and Sub-goal Generation from Practice

[Diagram: learning from observation and learning from practice feeding the three-level structure — Primitive Selection, Sub-goal Generation, Action Generation.]


Improving Primitive Selection and Sub-goal Generation Through Practice

- Need task specification information to create a reward function.
- Learn by adjusting the distance to the query: scale the distance function by the value of using a data point.
- $f(\text{data point location}, \text{query location})$ is related to the Q value, e.g. $1/Q$ or $e^{-Q}$.
- Associate scale values with each data point; the scale values must be stored and learned.

$d(\mathbf{x}, \mathbf{q}) = \sqrt{\sum_j w_j (x_j - q_j)^2}, \qquad d'(\mathbf{x}, \mathbf{q}) = d(\mathbf{x}, \mathbf{q}) \cdot f(\mathbf{x}, \mathbf{q})$
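
A sketch of the scaled distance using the exp(-Q) form mentioned above, reusing the weighted `distance` from the selection sketch: a data point with a high learned Q value appears closer to the query and is therefore preferred (illustrative code):

```python
import math

def scaled_distance(x, q, w, q_value):
    # d'(x, q) = d(x, q) * f(x, q) with f = exp(-Q): high-valued data
    # points look closer to the query; low-valued points are pushed away.
    return distance(x, q, w) * math.exp(-q_value)
```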


Store Values in Function Approximator

- Look-up table: fixed size.
- Locally Weighted Projection Regression (LWPR), Schaal et al.
- Create a model for each data point, indexed by the difference between the query point and the data point's state (the delta-state).

Each model is associated with one stored data point $(M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y) \rightarrow (TYPE, ME_x, ME_y, \dot{ME}_x, \dot{ME}_y, BE_x, BE_y)$.
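
One way the per-data-point storage might look in code; here a simple discretized table stands in for the LWPR models actually used (illustrative only):

```python
from collections import defaultdict

class DataPointValueModel:
    """Scale/Q values for one data point, keyed by discretized delta-state."""

    def __init__(self, resolution=0.5, default_q=0.0):
        self.resolution = resolution
        self.table = defaultdict(lambda: default_q)

    def _key(self, query, point_state):
        # Delta-state: query minus this data point's stored state.
        return tuple(round((qi - xi) / self.resolution)
                     for qi, xi in zip(query, point_state))

    def lookup(self, query, point_state):
        return self.table[self._key(query, point_state)]

    def update(self, query, point_state, target_q, alpha=0.1):
        k = self._key(query, point_state)
        self.table[k] += alpha * (target_q - self.table[k])
```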


Learn Values Using a Reinforcement Learning Strategy

- State: delta-state.
- Action: using this data point.
- Reward assignment:
  - Positive: making progress through the maze.
  - Negative: falling into a hole, going backwards through the maze, taking time performing the primitive.
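
A sketch of how this reward assignment might be coded for the maze; the numeric values are illustrative, not from the thesis:

```python
def primitive_reward(progress_m, fell_in_hole, went_backwards, duration_s):
    # Reward for one execution of a primitive, per the assignment above.
    reward = 10.0 * progress_m            # progress through the maze
    if fell_in_hole:
        reward -= 100.0                   # falling into a hole
    if went_backwards:
        reward -= 20.0                    # going backwards through the maze
    reward -= 1.0 * duration_s            # time spent performing the primitive
    return reward
```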


Learning the Value of Choosing a Data Point (Simulation)

[Figure: an observed Roll Off Wall primitive with its incoming velocity vector, near board location (12.9, 18.8). Computed scale values over the testing area are shown on a BAD-to-GOOD scale for two marked marble positions, with the incoming velocity as shown, at which the LWPR model associated with this primitive is queried.]


Maze Learning from Practice

[Plot: cumulative failures per meter (0-200) over about 300 trials, comparing the look-up table, LWPR, and observation-only learners, in the real world and in simulation.]


Learning New Strategies

[Demonstrations: the human; learning from observation; after learning from practice.]


Learning Action Generation from Practice

[Diagram: learning from observation and learning from practice feeding the three-level structure; practice now improves Action Generation.]


Improving Action Generation Through Practice

- The environment changes over time.
- Need to compensate for structural modeling error.
- Cannot learn everything from only observing others.


Knowledge for Making a Hit

After the hit location has been determined, making a shot requires knowledge of:
- Puck movement
- Puck-paddle collision
- Paddle placement
- Paddle movement timing

[Figure: the path of the incoming puck to the hit location on the hit line, and the absolute post-hit velocity toward the target location on the target line.]

[Diagram: Target Location → (Puck Motion) → Outgoing Puck Velocity → (Impact) → Incoming Paddle Velocity → (Robot) → Robot Trajectory]
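
As a small illustration of the first link in this chain, the required outgoing puck velocity simply points from the hit location toward the target location (a straight shot with no friction or banks; the speed is left as a free parameter):

```python
import math

def outgoing_puck_velocity(hit_xy, target_xy, speed):
    # Unit direction from the hit location to the target, scaled by speed.
    dx, dy = target_xy[0] - hit_xy[0], target_xy[1] - hit_xy[1]
    dist = math.hypot(dx, dy)
    return (speed * dx / dist, speed * dy / dist)
```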


Results of Learning Straight Shots (Simulation)

- Observed 44 straight shots made by the human.
- Practiced 100 shots.
- Plotted as a running average of 5 shots.
- Simulation was used because there is too much noise in hardware sensing.

[Plot: target error (meters) vs. number of shots taken.]

[Diagram: Target Location → (Puck Motion) → Outgoing Puck Velocity → (Impact) → Incoming Paddle Velocity → (Robot) → Robot Trajectory]


Robot Model (Real World)

[Diagram: Target Location → (Puck Motion) → Outgoing Puck Velocity → (Impact) → Incoming Paddle Velocity → (Robot) → Robot Trajectory; the Robot model is the focus here.]


Obtaining Proper Robot Movement

- Six set robot configurations; compute joint angles for a paddle position P(x, y) by interpolating between the four surrounding configurations.
- Paddle command: desired end location and time of the trajectory, (x, y, t).
- The paddle follows a fifth-order polynomial trajectory with zero start and end velocity and acceleration.
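
With zero start and end velocity and acceleration, the boundary conditions pin the fifth-order polynomial down to the familiar minimum-jerk blend; a sketch (illustrative, not the thesis implementation):

```python
def quintic_point(p0, pf, T, t):
    # Fifth-order blend with zero velocity/acceleration at both ends:
    # s(u) = 10u^3 - 15u^4 + 6u^5, u = t/T.
    u = max(0.0, min(1.0, t / T))
    blend = 10 * u**3 - 15 * u**4 + 6 * u**5
    return tuple(a + (b - a) * blend for a, b in zip(p0, pf))

# Sample a paddle move for the command (x, y, t) = (0.30, 0.25, 0.4 s):
path = [quintic_point((0.20, 0.05), (0.30, 0.25), 0.4, 0.04 * i)
        for i in range(11)]
```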


Robot Model

[Diagram: from the desired state of the puck at hit time, compute the movement command (x, y, t), wait a pre-set time delay, then generate the robot trajectory.]

[Plot: a commanded robot trajectory in the x-y plane (x from 0.2 to 0.4 m, y from 0 to 0.3 m), from the starting location to the commanded (x, y, t).]


Robot Movement Errors

[Plot: the desired trajectory vs. the observed path of the paddle, with the desired hit location and the location of highest paddle velocity marked.]

Movement accuracy is determined by many factors:
- Speed of the movement.
- Friction between the paddle and the board.
- Hydraulic pressure applied to the robot.
- Operating within the designed performance parameters.


Robot Model

- Learn to properly place the paddle.
- Learn the timing of the paddle.
- The robot observes its own actions:
  - The actual hit point (the point of highest velocity).
  - The time from when the command is given to when the paddle is observed at the hit position.

[Diagram: Target Location → (Puck Motion) → Outgoing Puck Velocity → (Impact) → Incoming Paddle Velocity → (Robot) → Robot Trajectory]


Improving the Robot Model

[Diagram: improved pipeline. From the desired state of the puck at hit time, an LWPR robot-movement model computes the movement command (x, y, t), and the time delay is taken from an LWPR model trained on observed timing information, before the robot trajectory is generated. This replaces the fixed command computation and pre-set time delay shown earlier.]


Using the Improved Robot Model

[Plots: the desired trajectory vs. the observed path of the paddle with the improved model, with the desired hit location and the location of highest paddle velocity marked.]


Real-World Air Hockey


Major Contributions

- A framework has been created as a tool with which to perform research in learning from observation using primitives. Its flexible structure allows the use of various learning algorithms, and it can also learn from practice.
- Presented learning methods that learn quickly from observed information and also have the ability to increase performance through practice.
- Created a unique algorithm that gives a robot the ability to learn the effectiveness of data points in a database and then use that information to change its behavior as it operates in the environment.
- Presented a method of breaking the learning problem into small learning modules. Individual modules have more opportunities to learn and generalize.


Some Future Directions

- Automatically defining primitive types.
- Explore how to represent learned information so it can be used in other tasks/environments.
- Can robots learn about the world from playing these games?
- Explore other ways to select primitives and sub-goals.
- Use the observed information to create a planner.
- Investigate methods of exploration at primitive selection and sub-goal generation.
- Simultaneously learn primitive selection and action generation.