STANFORD
Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion
J. Zico Kolter, Pieter Abbeel, Andrew Y. Ng
[Figure: quadruped traversing terrain from initial position to goal]
1. Motivating Application
•Planning footsteps for a quadruped robot over challenging, irregular, previously unseen terrain
•Good footsteps need to properly trade off several features: slope, proximity to drop-offs, stability of robot’s pose, etc.
•Highly non-trivial to hand-specify the reward function for a planner, which requires manually determining relative weights for all features
2. Apprenticeship Learning Background
•Key idea of Apprenticeship Learning: often easier to demonstrate good behavior than to specify a reward that induces this behavior
•Two factors make Apprenticeship Learning hard to apply to large, complex problems such as quadruped planning:
1. Very difficult, even for a domain expert, to specify a good complete path (e.g., a full set of footsteps across terrain)
2. Even given a reward function, planning (e.g., finding a complete set of footsteps) is a hard, high-dimensional task
5. Experimental Results
Multi-room Grid World
[Figure: multi-room grid world with start (S) and goal (G)]
• 10x10 rooms connected by doors, where each room is a 10x10 grid world
• High level demonstration shows only room-to-room path (using true reward function)
• Low-level demonstration shows only local greedy action at grid level
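The grid-world setup above can be sketched as a toy hierarchical planner. This is an illustrative miniature, not the paper's implementation: it uses a 3x3 grid of rooms (the experiments use 10x10), plans room-to-room with BFS at the high level, and takes greedy cell-level moves at the low level.

```python
from collections import deque

def room_path(start_room, goal_room, n=3):
    """High level: BFS over the room graph (a 4-connected n x n grid of rooms)."""
    prev = {start_room: None}
    q = deque([start_room])
    while q:
        r = q.popleft()
        if r == goal_room:
            break
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r[0] + dr, r[1] + dc)
            if 0 <= nxt[0] < n and 0 <= nxt[1] < n and nxt not in prev:
                prev[nxt] = r
                q.append(nxt)
    path, r = [], goal_room
    while r is not None:
        path.append(r)
        r = prev[r]
    return path[::-1]

def greedy_step(cell, target):
    """Low level: one greedy move that reduces Manhattan distance to the target."""
    r, c = cell
    if r != target[0]:
        return (r + (1 if target[0] > r else -1), c)
    if c != target[1]:
        return (r, c + (1 if target[1] > c else -1))
    return cell

path = room_path((0, 0), (2, 2))
print(path)  # room-to-room path from start room to goal room
```

In the paper's setting, the teacher demonstrates only the high-level path and only local greedy moves; the point of the toy is that neither demonstration requires solving the full low-level planning problem.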
6. Related Work
•Apprenticeship Learning: Abbeel and Ng (2004), Ratliff et al. (2006, 2007), Neu and Szepesvári (2007), Syed and Schapire (2007)
•Hierarchical Reinforcement Learning: Parr and Russell (1998), Sutton et al. (1999), Dietterich (2000), Barto and Mahadevan (2003)
7. Conclusion
•Presented a novel algorithm for applying apprenticeship learning to large, complex domains via hierarchical decomposition
•Demonstrated the algorithm on a multi-room grid world and a challenging quadruped task, where we achieve state-of-the-art performance
•More generally, algorithm is applicable whenever reward function can be hierarchically decomposed as described above
Quadruped Robot
• Evaluated algorithm on easier terrain for training, and harder terrain for testing
• On training terrain, demonstrated a single high-level body path and 20 greedy low-level foot placements (~10 minutes to gather all data)
• System achieves state-of-the-art performance on this task
3. Hierarchical Apprenticeship Learning: Main Idea
Step 1 (high level): Plan a path for the center of the robot body
Step 2 (low level): Plan footsteps along the body path
1) Decompose the planning task into multiple levels of abstraction
2) Demonstrate good behavior at each level separately
Easier to specify a path in the reduced, abstract state space than in the full state space
[Figure: body path from initial position to goal; close-up shows current foot positions and a footstep specified by the teacher]
Easier to demonstrate greedy actions than long-term optimal actions
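One way to picture how a single greedy demonstration becomes training data: the teacher's chosen footstep is compared against every alternative candidate, yielding one reward constraint per alternative. The candidate set, features, and chosen index below are hypothetical stand-ins for terrain features such as slope and stability.

```python
import numpy as np

rng = np.random.default_rng(2)
candidates = rng.standard_normal((6, 3))  # 6 candidate footsteps, 3 features each
chosen = 2                                # hypothetical index the teacher selected

# Each non-chosen candidate yields one constraint: "chosen footstep scores higher"
constraints = [(candidates[chosen], candidates[j])
               for j in range(len(candidates)) if j != chosen]
print(len(constraints))  # one constraint per rejected alternative
```

A few such local demonstrations can therefore generate many constraints on the reward weights without the teacher ever producing a full footstep plan.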
4. Convex Formulation
•Two assumptions on the reward function:
1. Reward is linear in state features
2. High level rewards are averages of low level rewards
•High-level demonstrations imply constraints on value function
•Low-level demonstrations imply constraints on reward function
•Can combine high- and low-level constraints (plus slack variables) into a single, unified convex optimization problem
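The unified problem can be sketched as a regularized hinge loss over the linear reward weights w: each demonstration (high level or low level) contributes a constraint that the teacher's choice should outscore an alternative. The sketch below uses synthetic separable features and plain subgradient descent as a stand-in for a QP solver; it illustrates the shape of the formulation, not the paper's exact program.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
w_true = rng.standard_normal(d)  # hidden "true" weights, used only to generate data

def make_pairs(n):
    """(teacher features, alternative features), teacher genuinely better by margin 1."""
    pairs = []
    while len(pairs) < n:
        a, b = rng.standard_normal(d), rng.standard_normal(d)
        if w_true @ a > w_true @ b + 1.0:
            pairs.append((a, b))
    return pairs

low_pairs = make_pairs(20)   # greedy footstep comparisons (reward constraints)
high_pairs = make_pairs(10)  # body-path comparisons (value constraints; by
                             # assumption 2 these are averaged low-level features,
                             # so they are still linear in w)

def hinge_subgrad(w, pairs):
    """Subgradient of the hinge penalty for violated margin constraints."""
    g = np.zeros_like(w)
    for phi_t, phi_a in pairs:
        if w @ phi_t < w @ phi_a + 1.0:  # constraint violated (slack active)
            g += phi_a - phi_t
    return g

w = np.zeros(d)
lam = 0.01  # regularization weight (illustrative choice)
for _ in range(500):
    g = lam * w + hinge_subgrad(w, low_pairs) + hinge_subgrad(w, high_pairs)
    w -= 0.05 * g

violations = sum(w @ a < w @ b for a, b in low_pairs + high_pairs)
print("constraint violations:", violations)
```

Because both levels constrain the same weight vector, a handful of high-level and low-level demonstrations jointly pin down the reward, which is the point of the unified formulation.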
[Figure: planned footsteps on training and testing terrain, comparing No Planning, High Level (Body Path) Constraints Only, Low Level (Footstep) Constraints Only, and Hierarchical Apprenticeship Learning]
High level demonstration: Demonstrate body path across terrain
Low level demonstration: Greedy local footsteps at a few key locations