



STANFORD

Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion

J. Zico Kolter, Pieter Abbeel, Andrew Y. Ng

(Figure: quadruped crossing terrain from Initial Position to Goal)

1. Motivating Application
• Planning footsteps for a quadruped robot over challenging, irregular, previously unseen terrain

• Good footsteps need to properly trade off several features: slope, proximity to drop-offs, stability of the robot’s pose, etc.

• Highly non-trivial to hand-specify the reward function for a planner: doing so requires manually determining relative weights for all of these features

2. Apprenticeship Learning Background
• Key idea of Apprenticeship Learning: it is often easier to demonstrate good behavior than to specify a reward function that induces this behavior

• Two factors make Apprenticeship Learning hard to apply to large, complex problems such as quadruped planning:

1. Very difficult, even for a domain expert, to specify a good complete path (e.g., a full set of footsteps across terrain)

2. Even given a reward function, planning (e.g., finding a complete set of footsteps) is a hard, high-dimensional task

5. Experimental Results
Multi-room Grid World

(Figure: multi-room grid world with start S and goal G)

• A 10x10 grid of rooms connected by doors, where each room is itself a 10x10 grid world

• The high-level demonstration shows only the room-to-room path (generated using the true reward function)

• The low-level demonstration shows only local greedy actions at the grid level (see the sketch after this list)
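To make the two levels concrete, here is a minimal Python sketch of the state abstraction as we read it off the bullets above; the helper name room_of and the coordinate convention are our own, not the authors' code.

    # Two-level state space of the grid-world experiment: a 10x10 grid of
    # rooms, each room itself a 10x10 grid of cells (100x100 cells overall).
    ROOM_SIZE = 10  # cells per room side

    def room_of(cell):
        """Map a low-level cell (row, col) to its high-level room index."""
        r, c = cell
        return (r // ROOM_SIZE, c // ROOM_SIZE)

    # A high-level demonstration is just a sequence of rooms ...
    high_level_demo = [(0, 0), (0, 1), (1, 1)]
    # ... while a low-level demonstration is one greedy action in one cell.
    low_level_demo = {"state": (3, 7), "action": "right"}

    assert room_of((3, 7)) == (0, 0)  # cell (3, 7) lies in the start room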

6. Related Work
• Apprenticeship Learning: Abbeel and Ng (2004), Ratliff et al. (2006, 2007), Neu and Szepesvari (2007), Syed and Schapire (2007)

• Hierarchical Reinforcement Learning: Parr and Russell (1998), Sutton et al. (1999), Dietterich (2000), Barto and Mahadevan (2003)

7. Conclusion
• Presented a novel algorithm for applying apprenticeship learning to large, complex domains via hierarchical decomposition

• Demonstrated the algorithm on a multi-room grid world and a challenging quadruped task, where we achieve state-of-the-art performance

• More generally, the algorithm is applicable whenever the reward function can be hierarchically decomposed as described above

Quadruped Robot

• Evaluated the algorithm on easier terrain for training and harder terrain for testing

• On the training terrain, demonstrated a single high-level body path and 20 greedy low-level foot placements (~10 minutes to gather all data)

• The system achieves state-of-the-art performance on this task

3. Hierarchical Apprenticeship Learning: Main Idea

Step 1 (high level): Plan a path for the center of the robot body

Step 2 (low level): Plan footsteps along the body path (sketched in code at the end of this section)


1) Decompose the planning task into multiple levels of abstraction
2) Demonstrate good behavior at each level separately

Easier to specify a path in the reduced, abstract state space than in the full state space

(Figures: high-level body path from Initial Position to Goal; low-level view showing the current foot positions and a footstep specified by the teacher)

Easier to demonstrate greedy actions than long-term optimal actions
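A minimal, self-contained sketch of this two-step scheme, assuming a linear footstep reward w·φ; the straight-line body path and the candidate-footstep generator below are placeholder stand-ins for the actual planners, which the poster does not specify.

    import numpy as np

    def plan_body_path(start, goal, n_waypoints=10):
        """High level: waypoints for the body's center (here just a line)."""
        return [start + (goal - start) * t
                for t in np.linspace(0.0, 1.0, n_waypoints)]

    def footstep_features(terrain, foot):
        """Features a footstep trades off (placeholder: height, local slope)."""
        x, y = int(foot[0]), int(foot[1])
        return np.array([terrain[x, y], abs(terrain[x, y] - terrain[x - 1, y])])

    def plan_footsteps(terrain, body_path, w):
        """Low level: greedy footstep near each waypoint under reward w . phi."""
        offsets = np.array([[1, 0], [0, 1], [1, 1], [-1, 0]])
        return [max((pose + d for d in offsets),
                    key=lambda c: w @ footstep_features(terrain, c))
                for pose in body_path]

    terrain = np.random.rand(20, 20)  # toy height map
    body = plan_body_path(np.array([1.0, 1.0]), np.array([18.0, 18.0]))
    steps = plan_footsteps(terrain, body, w=np.array([-1.0, -2.0]))

Learning enters only through w: the planners stay fixed while apprenticeship learning fits the reward weights.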

4. Convex Formulation
• Two assumptions on the reward function (formalized in the sketch after this list):

1. Reward is linear in state features

2. High-level rewards are averages of low-level rewards
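In symbols (our notation, not the poster's): writing \phi(s) for the features of low-level state s, w for the weight vector, and \mathcal{S}_h for the set of low-level states aggregated into high-level state h, the two assumptions read

    R(s) = w^\top \phi(s),
    \qquad
    R_h(h) = \frac{1}{|\mathcal{S}_h|} \sum_{s \in \mathcal{S}_h} w^\top \phi(s)
           = w^\top \Big( \frac{1}{|\mathcal{S}_h|} \sum_{s \in \mathcal{S}_h} \phi(s) \Big),

so the high-level reward is linear in the same w, just with averaged features; this is what lets both levels share one set of unknowns.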

• High-level demonstrations imply constraints on the value function

• Low-level demonstrations imply constraints on the reward function

• The high- and low-level constraints can be combined (with slack variables) into a single, unified convex optimization problem, sketched below
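A sketch of that unified problem, reconstructed from the bullets above; the margin and regularization details are our assumptions, so see the paper for the exact form. With \mu(p) the summed features along a body path p, and \phi(s, a) the features of taking footstep a in state s:

    \min_{w,\ \xi \ge 0} \quad \lambda \|w\|_2^2
        + \sum_i \xi^{\text{hi}}_i + \sum_j \xi^{\text{lo}}_j

    \text{s.t.} \quad w^\top \mu(p^{(i)}_{\ast}) \ \ge\ w^\top \mu(p) - \xi^{\text{hi}}_i
        \quad \text{for every alternative body path } p \quad \text{(value constraints)}

    \qquad\quad w^\top \phi(s_j, a^{(j)}_{\ast}) \ \ge\ w^\top \phi(s_j, a) + 1 - \xi^{\text{lo}}_j
        \quad \text{for every alternative footstep } a \quad \text{(reward constraints)}

Every constraint is linear in w, so the whole problem is a convex quadratic program; the slacks \xi let the two kinds of expert demonstrations disagree without making the problem infeasible.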


(Figure: planned footsteps on the training and testing terrain under four conditions: no planning, hierarchical apprenticeship learning, high-level (body path) constraints only, and low-level (footstep) constraints only)

High-level demonstration: demonstrate a body path across the terrain

Low-level demonstration: greedy local footsteps at a few key locations