Efficient Policy Gradient Optimization/Learning of Feedback Controllers
Chris Atkeson
Punchlines
• Optimize and learn policies: switch from “value iteration” to “policy iteration”.
• This is a big switch from optimizing and learning value functions.
• Use gradient-based policy optimization.
Motivations
• Efficiently design nonlinear policies.
• Make policy-gradient reinforcement learning practical.
Model-Based Policy Optimization
• Simulate the policy u = π(x,p) from a set of initial states x0 to find the policy cost.
• Use your favorite local or global optimizer to optimize the simulated policy cost.
• If gradients are used, they are typically estimated numerically.
• Δp = -ε Σ_x0 w(x0) V_p (first-order gradient step)
• Δp = -(Σ_x0 w(x0) V_pp)^(-1) Σ_x0 w(x0) V_p (second-order Newton step)
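A minimal sketch of this simulate-and-numerically-differentiate loop, to make its cost concrete. The function names, rollout horizon, and finite-difference step h are illustrative assumptions, not code from the slides:

```python
import numpy as np

def simulate_cost(p, x0, f, policy, L, horizon=100):
    """Roll out u = pi(x, p) from x0 and accumulate the cost L(x, u)."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = policy(x, p)
        cost += L(x, u)
        x = f(x, u)
    return cost

def numerical_policy_gradient(p, x0s, w, f, policy, L, h=1e-5):
    """Central-difference estimate of sum_x0 w(x0) V_p; p must be a float array."""
    g = np.zeros_like(p)
    for i in range(len(p)):
        dp = np.zeros_like(p)
        dp[i] = h
        for x0, wi in zip(x0s, w):
            g[i] += wi * (simulate_cost(p + dp, x0, f, policy, L)
                          - simulate_cost(p - dp, x0, f, policy, L)) / (2 * h)
    return g

# First-order update: Delta p = -eps * sum_x0 w(x0) V_p
def first_order_step(p, x0s, w, f, policy, L, eps=1e-2):
    return p - eps * numerical_policy_gradient(p, x0s, w, f, policy, L)
```

Note the expense: one gradient estimate takes 2·dim(p)·|x0s| full rollouts, which is exactly what the next question is about.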
Can we make model-based policy gradient more efficient?
Analytic Gradients
• Deterministic policy: u = π(x,p)
• Policy iteration (Bellman equation):
V^(k-1)(x,p) = L(x,π(x,p)) + V^k(f(x,π(x,p)),p)
• Linear models:
f(x,u) = f0 + f_x Δx + f_u Δu
L(x,u) = L0 + L_x Δx + L_u Δu
π(x,p) = π0 + π_x Δx + π_p Δp
V(x,p) = V0 + V_x Δx + V_p Δp
• Policy gradient (backward recursion):
V_x^(k-1) = L_x + L_u π_x + V_x (f_x + f_u π_x)
V_p^(k-1) = (L_u + V_x f_u) π_p + V_p
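The two recursions give the exact gradient of a rollout's cost in a single backward pass. A minimal sketch, assuming the local Jacobians have already been evaluated along one simulated trajectory; the slide notation is kept (Lx, Lu, fx, fu, pix, pip), but the list-of-dicts container and function name are illustrative assumptions:

```python
import numpy as np

def analytic_policy_gradient(trajectory):
    """trajectory[k]: dict of local models at step k of one rollout.
       Shapes: Lx (1,n), Lu (1,m), fx (n,n), fu (n,m), pix (m,n), pip (m,P)."""
    n = trajectory[0]["fx"].shape[0]
    P = trajectory[0]["pip"].shape[1]
    Vx = np.zeros((1, n))  # dV/dx after the last step (no terminal cost assumed)
    Vp = np.zeros((1, P))  # dV/dp accumulated backward
    for m in reversed(trajectory):
        # V_p^(k-1) = (L_u + V_x f_u) pi_p + V_p   (uses V_x at step k)
        Vp = (m["Lu"] + Vx @ m["fu"]) @ m["pip"] + Vp
        # V_x^(k-1) = L_x + L_u pi_x + V_x (f_x + f_u pi_x)
        Vx = m["Lx"] + m["Lu"] @ m["pix"] + Vx @ (m["fx"] + m["fu"] @ m["pix"])
    return Vp  # exact rollout-cost gradient from one backward pass
```

One forward rollout plus one backward pass replaces the 2·dim(p) rollouts per initial state that the finite-difference estimate needs.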
Handling Constraints
• Lagrange multiplier approach, with constraint violation value function.
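One hedged reading of this bullet: keep a multiplier λ on a constraint-violation value function G(p) alongside the policy cost V(p), descend the policy parameters on V + λG, and ascend λ on the remaining violation. The function handles and step sizes below are illustrative assumptions, not from the slides:

```python
def constrained_step(p, lam, grad_V, grad_G, G_value, eps_p=1e-2, eps_lam=1e-1):
    """One primal-dual step on the augmented cost V(p) + lam * G(p)."""
    p_new = p - eps_p * (grad_V(p) + lam * grad_G(p))   # primal descent on parameters
    lam_new = max(0.0, lam + eps_lam * G_value(p_new))  # dual ascent on the violation
    return p_new, lam_new
```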
V_pp: Second-Order Models
Regularization
LQBR: Linear (dynamics) Quadratic (cost) Bilinear (policy) Regulator
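A hedged sketch of the LQBR setting as I read the acronym: linear dynamics x' = Ax + Bu, quadratic cost, and a static output-feedback policy u = -K y, which is bilinear in the gain K and the measurement y = Cx. The matrices A, B, C, Q, R, K and the horizon are illustrative placeholders:

```python
import numpy as np

def lqbr_rollout_cost(A, B, C, Q, R, K, x0, horizon=200):
    """Simulated quadratic cost of the policy u = -K C x from x0."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        y = C @ x        # measurement
        u = -K @ y       # bilinear policy: linear in K, linear in y
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost
```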
Timing Test
Antecedents
• Optimizing control “parameters” in DDP: Dyer and McReynolds 1970.
• Optimal output feedback design (1960s-1970s)
• Multiple model adaptive control (MMAC)
• Policy gradient reinforcement learning
• Adaptive critics, Werbos: HDP, DHP, GDHP, ADHDP, ADDHP
When Will LQBR Work?
• An initial stabilizing policy is known (“output stabilizable”).
• L_uu is positive definite.
• L_xx is positive semi-definite and (√L_xx, f_x) is detectable.
• The measurement matrix C has full row rank.
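These conditions can be checked numerically. The sketch below assumes discrete-time dynamics and uses a PBH-style rank test for detectability; matrix names follow the slides (Luu, Lxx, fx, C), but the test itself is my assumption:

```python
import numpy as np

def lqbr_conditions_hold(Luu, Lxx, fx, C, tol=1e-9):
    ok = bool(np.all(np.linalg.eigvalsh(Luu) > tol))      # Luu positive definite
    ok &= bool(np.all(np.linalg.eigvalsh(Lxx) > -tol))    # Lxx positive semi-definite
    n = fx.shape[0]
    H = np.linalg.cholesky(Lxx + tol * np.eye(n)).T       # H with H^T H ~ Lxx
    for lam in np.linalg.eigvals(fx):
        if abs(lam) >= 1.0:                               # unstable or marginal mode
            pbh = np.vstack([fx - lam * np.eye(n), H])
            ok &= np.linalg.matrix_rank(pbh) == n         # mode must be observable
    return ok and np.linalg.matrix_rank(C) == C.shape[0]  # C has full row rank
```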
Locally Linear Policies
Local Policies
GOAL
Cost Of One Gradient Calculation
Continuous Time
Other Issues
• Model Following
• Stochastic Plants
• Receding Horizon Control/MPC
• Adaptive RHC/MPC
• Combine with Dynamic Programming
• Dynamic Policies -> Learn State Estimator
Optimize Policies
• Policy Iteration, with gradient-based policy improvement step.
• Analytic gradients are easy.
• Non-overlapping sub-policies make second-order gradient calculations fast.
• Big problem: how to choose the policy structure?