



STANFORD

Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion

J. Zico Kolter, Pieter Abbeel, Andrew Y. Ng

(Figure: quadruped crossing terrain from Initial Position to Goal)

1. Motivating Application
• Planning footsteps for a quadruped robot over challenging, irregular, previously unseen terrain

• Good footsteps need to properly trade off several features: slope, proximity to drop-offs, stability of the robot’s pose, etc.

• Highly non-trivial to hand-specify the reward function for a planner: doing so requires manually determining relative weights for all of these features

2. Apprenticeship Learning Background
• Key idea of Apprenticeship Learning: it is often easier to demonstrate good behavior than to specify a reward function that induces this behavior

• Two factors make Apprenticeship Learning hard to apply to large, complex problems such as quadruped planning:

1. Very difficult, even for a domain expert, to specify a good complete path (e.g., a full set of footsteps across terrain)

2. Even given a reward function, planning (e.g., finding a complete set of footsteps) is a hard, high-dimensional task

5. Experimental Results
Multi-room Grid World

(Figure: multi-room grid world with start S and goal G)

• A 10x10 grid of rooms connected by doors, where each room is itself a 10x10 grid world

• The high-level demonstration shows only the room-to-room path (generated using the true reward function)

• The low-level demonstration shows only local greedy actions at the grid level (see the sketch after this list)
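To make the two levels concrete, here is a minimal Python sketch of the state abstraction as we read it off the bullets above; the helper name room_of and the coordinate convention are our own, not the authors' code.

    # Two-level state space of the grid-world experiment: a 10x10 grid of
    # rooms, each room itself a 10x10 grid of cells (100x100 cells overall).
    ROOM_SIZE = 10  # cells per room side

    def room_of(cell):
        """Map a low-level cell (row, col) to its high-level room index."""
        r, c = cell
        return (r // ROOM_SIZE, c // ROOM_SIZE)

    # A high-level demonstration is just a sequence of rooms ...
    high_level_demo = [(0, 0), (0, 1), (1, 1)]
    # ... while a low-level demonstration is one greedy action in one cell.
    low_level_demo = {"state": (3, 7), "action": "right"}

    assert room_of((3, 7)) == (0, 0)  # cell (3, 7) lies in the start room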

6. Related Work
• Apprenticeship Learning: Abbeel and Ng (2004), Ratliff et al. (2006, 2007), Neu and Szepesvari (2007), Syed and Schapire (2007)

• Hierarchical Reinforcement Learning: Parr and Russell (1998), Sutton et al. (1999), Dietterich (2000), Barto and Mahadevan (2003)

7. Conclusion
• Presented a novel algorithm for applying apprenticeship learning to large, complex domains via hierarchical decomposition

• Demonstrated the algorithm on a multi-room grid world and a challenging quadruped task, where we achieve state-of-the-art performance

• More generally, the algorithm is applicable whenever the reward function can be hierarchically decomposed as described above

Quadruped Robot

• Evaluated the algorithm on easier terrain for training and harder terrain for testing

• On the training terrain, demonstrated a single high-level body path and 20 greedy low-level foot placements (~10 minutes to gather all data)

• The system achieves state-of-the-art performance on this task

3. Hierarchical Apprenticeship Learning: Main Idea

Step 1 (high level): Plan a path for the center of the robot body

Step 2 (low level): Plan footsteps along the body path (sketched in code at the end of this section)


1) Decompose the planning task into multiple levels of abstraction
2) Demonstrate good behavior at each level separately

Easier to specify a path in the reduced, abstract state space than in the full state space

(Figures: high-level body path from Initial Position to Goal; low-level view showing the current foot positions and a footstep specified by the teacher)

Easier to demonstrate greedy actions than long-term optimal actions
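A minimal, self-contained sketch of this two-step scheme, assuming a linear footstep reward w·φ; the straight-line body path and the candidate-footstep generator below are placeholder stand-ins for the actual planners, which the poster does not specify.

    import numpy as np

    def plan_body_path(start, goal, n_waypoints=10):
        """High level: waypoints for the body's center (here just a line)."""
        return [start + (goal - start) * t
                for t in np.linspace(0.0, 1.0, n_waypoints)]

    def footstep_features(terrain, foot):
        """Features a footstep trades off (placeholder: height, local slope)."""
        x, y = int(foot[0]), int(foot[1])
        return np.array([terrain[x, y], abs(terrain[x, y] - terrain[x - 1, y])])

    def plan_footsteps(terrain, body_path, w):
        """Low level: greedy footstep near each waypoint under reward w . phi."""
        offsets = np.array([[1, 0], [0, 1], [1, 1], [-1, 0]])
        return [max((pose + d for d in offsets),
                    key=lambda c: w @ footstep_features(terrain, c))
                for pose in body_path]

    terrain = np.random.rand(20, 20)  # toy height map
    body = plan_body_path(np.array([1.0, 1.0]), np.array([18.0, 18.0]))
    steps = plan_footsteps(terrain, body, w=np.array([-1.0, -2.0]))

Learning enters only through w: the planners stay fixed while apprenticeship learning fits the reward weights.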

4. Convex Formulation
• Two assumptions on the reward function (formalized in the sketch after this list):

1. Reward is linear in state features

2. High-level rewards are averages of low-level rewards
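In symbols (our notation, not the poster's): writing \phi(s) for the features of low-level state s, w for the weight vector, and \mathcal{S}_h for the set of low-level states aggregated into high-level state h, the two assumptions read

    R(s) = w^\top \phi(s),
    \qquad
    R_h(h) = \frac{1}{|\mathcal{S}_h|} \sum_{s \in \mathcal{S}_h} w^\top \phi(s)
           = w^\top \Big( \frac{1}{|\mathcal{S}_h|} \sum_{s \in \mathcal{S}_h} \phi(s) \Big),

so the high-level reward is linear in the same w, just with averaged features; this is what lets both levels share one set of unknowns.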

• High-level demonstrations imply constraints on the value function

• Low-level demonstrations imply constraints on the reward function

• The high- and low-level constraints can be combined (with slack variables) into a single, unified convex optimization problem, sketched below
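A sketch of that unified problem, reconstructed from the bullets above; the margin and regularization details are our assumptions, so see the paper for the exact form. With \mu(p) the summed features along a body path p, and \phi(s, a) the features of taking footstep a in state s:

    \min_{w,\ \xi \ge 0} \quad \lambda \|w\|_2^2
        + \sum_i \xi^{\text{hi}}_i + \sum_j \xi^{\text{lo}}_j

    \text{s.t.} \quad w^\top \mu(p^{(i)}_{\ast}) \ \ge\ w^\top \mu(p) - \xi^{\text{hi}}_i
        \quad \text{for every alternative body path } p \quad \text{(value constraints)}

    \qquad\quad w^\top \phi(s_j, a^{(j)}_{\ast}) \ \ge\ w^\top \phi(s_j, a) + 1 - \xi^{\text{lo}}_j
        \quad \text{for every alternative footstep } a \quad \text{(reward constraints)}

Every constraint is linear in w, so the whole problem is a convex quadratic program; the slacks \xi let the two kinds of expert demonstrations disagree without making the problem infeasible.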


(Figure: planned footsteps on the training and testing terrain under four conditions: no planning, hierarchical apprenticeship learning, high-level (body path) constraints only, and low-level (footstep) constraints only)

High-level demonstration: demonstrate a body path across the terrain

Low-level demonstration: greedy local footsteps at a few key locations