47
430.729 (003) – Spring 2020 Deep Reinforcement Learning Gaussian Process Regression Prof. Songhwai Oh ECE, SNU Prof. Songhwai Oh 1 Deep Reinforcement Learning ‐ Spring 2020

430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

430.729 (003) – Spring 2020

Deep Reinforcement Learning

Gaussian Process Regression

Prof. Songhwai OhECE, SNU

Prof. Songhwai Oh  1Deep Reinforcement Learning ‐ Spring 2020

Page 2: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

REVIEW OF PROBABILITY THEORY

Prof. Songhwai Oh  Deep Reinforcement Learning ‐ Spring 2020 2

Page 3: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Random Variables

3Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

A

B

XX‐1(B)

X‐1(A)

Page 4: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Gaussian Random Variable

4Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 5: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Multivariate Gaussian

5Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 6: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Conditional Density of Multivariate Gaussian

6Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 7: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Proof Continued

7Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 8: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Proof Continued (Determinant)

8Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 9: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Proof Continued (A)

9Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 10: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Proof Continued (A)

10Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 11: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Matrix Inversion Lemma

11Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 12: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Matrix Inversion Lemma

12Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Matrix Inversion Lemma

Page 13: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Random Processes

13Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 14: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Gaussian Processes

14Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 15: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Summary (up to now)

15Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 16: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

GAUSSIAN PROCESS REGRESSION (GPR)

[Reference]Gaussian Processes for Machine Learning, C. E. Rasmussen and C. K. I. Williams, MIT Press, 2006.

16Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 17: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Gaussian Process Regression

Prof. Songhwai Oh  Deep Reinforcement Learning ‐ Spring 2020 17

Page 18: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

WEIGHT‐SPACE VIEW

18Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 19: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Linear Regression

19Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 20: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Weight‐Space View: Bayesian Formulation

20Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 21: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Weight‐Space View: Posterior Distribution

Prof. Songhwai Oh  Deep Reinforcement Learning ‐ Spring 2020 21

Page 22: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Weight‐Space View: Predictive Distribution

Prof. Songhwai Oh  Deep Reinforcement Learning ‐ Spring 2020 22

Page 23: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Weight‐Space View: Kernel Trick (1)

Prof. Songhwai Oh  Deep Reinforcement Learning ‐ Spring 2020 23

Page 24: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Weight‐Space View: Kernel Trick (2)

Prof. Songhwai Oh  Deep Reinforcement Learning ‐ Spring 2020 24

Page 25: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Weight‐Space View: Kernel Trick (3)

Prof. Songhwai Oh  Deep Reinforcement Learning ‐ Spring 2020 25

Page 26: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

FUNCTION‐SPACE VIEW

26Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 27: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Function‐Space View

Prof. Songhwai Oh  Deep Reinforcement Learning ‐ Spring 2020 27

Page 28: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Function‐Space View: Prediction w/o Noise

Prof. Songhwai Oh  Deep Reinforcement Learning ‐ Spring 2020 28

Page 29: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Function‐Space View: Prediction w/ Noise

Prof. Songhwai Oh  Deep Reinforcement Learning ‐ Spring 2020 29

Page 30: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Function‐Space View: Linear Predictor

Prof. Songhwai Oh  Deep Reinforcement Learning ‐ Spring 2020 30

Page 31: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

MORE ON GPR

31Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 32: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Learning

32Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 33: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Comments on GPR

33Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Data GPR Prediction

Page 34: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Kriging

34Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 35: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

• Neural network with a single hidden layer with NH units

Neural Networks

35Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

1

2f(x)

vj

ui,j

x1

x2

x3

x4 NH

[Cybenko 1989, Hornik 1993]• Neural network with one hidden layer is a universal approximator as NH ∞

• That is, it can approximate any continuous function on a compact support under mild conditions.

Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303‐314.Hornik, K. (1993). Some New Results on Neural Network Approximation. Neural Networks, 6(8):1069–1072.

Page 36: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Neural Network Converges to a GP

36Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Neal, R. M. (1996). Bayesian Learning for Neural Networks. Springer, New York. Lecture Notes in Statistics 118.

Page 37: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Gaussian process regression:• Weight‐space view: Bayesian approach to linear regression 

(with the kernel trick)• Function‐space view: MMSE estimate, linear predictor • Best linear unbiased prediction (kriging)

• Provides the predictive variance for an unseen data• Can be computationally intensive (for prediction, O(n3))

• A single hidden layer neural network converges to a Gaussian process

Summary

37Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 38: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

REAL‐TIME AUTONOMOUS ROBOTNAVIGATION

38Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, Songhwai Oh, "Real‐Time Nonparametric Reactive Navigation of Mobile Robots in Dynamic Environments," Robotics and Autonomous Systems, vol. 91, pp. 11–24, May 2017.Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, and Songhwai Oh, "Leveraged Non‐Stationary Gaussian Process Regression for Autonomous Robot Navigation," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2015.

Page 39: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Knightscope K5 Meets a Guy

39Deep Reinforcement Learning ‐ Spring 2020Prof. Songhwai Oh 

Page 40: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Real‐Time Navigation

Deep Reinforcement Learning ‐ Spring 2020 40

Autoregressive Gaussian Process Motion Model

Autoregressive Gaussian Process Motion Controller

Control a robot to avoid an incoming object (with sample based MPC)

AR‐GPMC

These data will be servedas a motion controller

(     ,     ). .(     ,     )

(     ,     )Collect state‐action pairs from diverse scenarios

Input Output

Prof. Songhwai Oh 

Page 41: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Real‐Time Navigation

Deep Reinforcement Learning ‐ Spring 2020 41

Training Phase (slow) Execution Phase (fast)

(Trajectory, Control) pairs are collected from diverse scenarios. This is a time‐consuming work. 

Learned motion controller is used in the execution phase. It takes less than 50ms to make one control. 

Prof. Songhwai Oh 

Page 42: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Real‐Time Navigation

Deep Reinforcement Learning ‐ Spring 2020 42

Selecting the Number of Training Data

Gaussian process predictive variance

Closest point

Prof. Songhwai Oh 

Page 43: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Real‐Time Navigation

Deep Reinforcement Learning ‐ Spring 2020 43

Selecting the Number of Training Data

With two following assumptions:

And combined with following theorem:

Distribution of the distance from the origin to the mth closest point

n training data points are uniformly distributed

We assume that the test input is located

in the center

Prof. Songhwai Oh 

Page 44: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Real‐Time Navigation

Deep Reinforcement Learning ‐ Spring 2020 44

Selecting the Number of Training Data

We can efficiently approximate the upper bound of the learning curve of Gaussian process regression as follows:

Two integrals with respect to training data configurations and test input location ⋆ become one integral with respect to .

Blue: Learning curve

Red: Proposed upper-bound

Green: Original upper-bound

As an isotropic kernel function is assumed, predictive variance is a function of a distance.

Distance distribution from a binomial point process.

Prof. Songhwai Oh 

Page 45: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Real‐Time Navigation

Deep Reinforcement Learning ‐ Spring 202045

Selecting the Number of Training Data

Radius is 5 / dimension is 6

The volume of integration is reduced from 0, to 0, .

Prof. Songhwai Oh 

Page 46: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Real‐Time Navigation

Deep Reinforcement Learning ‐ Spring 2020 46

Hierarchical Motion Controller (Mixture of Experts)

Prof. Songhwai Oh 

Page 47: 430.729 (003) –Spring 2020 Deep Reinforcement Learningrllab.snu.ac.kr/courses/deeprl_2020/files/lecture03_gpr.pdf · Data GPR Prediction. Kriging Prof. Songhwai Oh Deep Reinforcement

Real‐Time Navigation

Deep Reinforcement Learning ‐ Spring 2020 47

Experiments

QuadCore 2.7GHzProf. Songhwai Oh