430.729 (003) – Spring 2020
Deep Reinforcement Learning
Gaussian Process Regression
Prof. Songhwai Oh, ECE, SNU
REVIEW OF PROBABILITY THEORY
Random Variables
[Figure: a random variable X as a map from the sample space to the real line; the preimages X⁻¹(A) and X⁻¹(B) of sets A and B are events.]
Gaussian Random Variable
Multivariate Gaussian
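As a reference for this slide, the density of a multivariate Gaussian x ∼ N(μ, Σ) on Rⁿ, in standard form (the slides' own notation is not preserved in this transcript):

```latex
p(x) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
  \exp\!\left(-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right)
```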
Conditional Density of Multivariate Gaussian
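The result proved on this and the following slides is the standard Gaussian conditioning identity; a reference statement in standard partitioned notation:

```latex
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\sim \mathcal{N}\!\left(
  \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix},
  \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}
\right)
\;\Longrightarrow\;
x_1 \mid x_2 \sim \mathcal{N}\!\left(
  \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\;
  \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}
\right)
```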
Proof Continued
Proof Continued (Determinant)
Proof Continued (A)
Proof Continued (A)
Matrix Inversion Lemma
Matrix Inversion Lemma
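For reference, the matrix inversion lemma (Woodbury identity) in its standard form, for conformable matrices with A and C invertible:

```latex
(A + UCV)^{-1} \;=\; A^{-1} \;-\; A^{-1} U \left(C^{-1} + V A^{-1} U\right)^{-1} V A^{-1}
```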
Matrix Inversion Lemma
Random Processes
Gaussian Processes
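A Gaussian process is specified by a mean and a covariance (kernel) function, and any finite collection of function values is jointly Gaussian. A minimal sketch of drawing a sample path from a zero-mean GP prior with a squared-exponential kernel (the kernel choice, grid, and hyperparameters here are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel: k(x, x') = s2 * exp(-(x - x')^2 / (2 l^2))."""
    sq = (x1[:, None] - x2[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq / length_scale ** 2)

# Any finite set of GP function values is jointly Gaussian, so a sample
# path evaluated on a grid is one draw from N(0, K).
rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 100)
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))  # small jitter for numerical stability
f = rng.multivariate_normal(np.zeros(len(x)), K)  # one sample path of the GP prior
```

The jitter term is a common numerical safeguard: the RBF Gram matrix is positive semi-definite in theory but can lose definiteness in floating point.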
Summary (up to now)
GAUSSIAN PROCESS REGRESSION (GPR)
[Reference] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
Gaussian Process Regression
WEIGHT‐SPACE VIEW
Linear Regression
Weight‐Space View: Bayesian Formulation
Weight‐Space View: Posterior Distribution
Weight‐Space View: Predictive Distribution
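For reference, the standard weight-space results from Rasmussen and Williams (Ch. 2), for the model y = X⊤w + ε with ε ∼ N(0, σₙ²I), prior w ∼ N(0, Σₚ), and d×n design matrix X:

```latex
A = \sigma_n^{-2} X X^\top + \Sigma_p^{-1}, \qquad
w \mid X, y \sim \mathcal{N}\!\left(\tfrac{1}{\sigma_n^2} A^{-1} X y,\; A^{-1}\right),
```

```latex
f_* \mid x_*, X, y \sim \mathcal{N}\!\left(\tfrac{1}{\sigma_n^2}\, x_*^\top A^{-1} X y,\; x_*^\top A^{-1} x_*\right)
```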
Weight‐Space View: Kernel Trick (1)
Weight‐Space View: Kernel Trick (2)
Weight‐Space View: Kernel Trick (3)
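The endpoint of the kernel-trick slides is the predictive distribution rewritten so that the feature map φ appears only through inner products. With Φ the matrix of training features, φ₊ = φ(x₊), and K = Φ⊤ΣₚΦ (Rasmussen and Williams, Eq. 2.12):

```latex
\bar{f}_* = \phi_*^\top \Sigma_p \Phi \left(K + \sigma_n^2 I\right)^{-1} y, \qquad
\mathbb{V}[f_*] = \phi_*^\top \Sigma_p \phi_*
  - \phi_*^\top \Sigma_p \Phi \left(K + \sigma_n^2 I\right)^{-1} \Phi^\top \Sigma_p \phi_*
```

Defining k(x, x') = φ(x)⊤Σₚφ(x'), every term depends on the features only through the kernel k, so φ never needs to be computed explicitly.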
FUNCTION‐SPACE VIEW
Function‐Space View
Function‐Space View: Prediction w/o Noise
Function‐Space View: Prediction w/ Noise
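The noisy-observation predictive equations, mean k₊⊤(K + σₙ²I)⁻¹y and variance k(x₊, x₊) − k₊⊤(K + σₙ²I)⁻¹k₊, fit in a few lines of NumPy. A sketch, where the RBF kernel, toy data, and hyperparameters are illustrative assumptions:

```python
import numpy as np

def rbf(a, b, ell=1.0, sf2=1.0):
    """Squared-exponential kernel k(a, b) = sf2 * exp(-(a - b)^2 / (2 ell^2))."""
    return sf2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gpr_predict(X, y, Xs, noise_var=0.1):
    """GP posterior mean and variance at test inputs Xs, given noisy data (X, y).

    mean = Ks^T (K + sn^2 I)^{-1} y
    var  = diag(Kss) - diag(Ks^T (K + sn^2 I)^{-1} Ks)
    """
    K = rbf(X, X) + noise_var * np.eye(len(X))   # noisy-observation covariance
    Ks = rbf(X, Xs)                              # train/test cross-covariance
    Kss = rbf(Xs, Xs)                            # test covariance
    L = np.linalg.cholesky(K)                    # Cholesky instead of an explicit inverse
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss - v.T @ v)
    return mean, var

X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.sin(X)
mean, var = gpr_predict(X, y, np.array([0.0, 3.0]))
```

Note how the predictive variance grows as the test point moves away from the training data; this is the uncertainty estimate that plain point predictors lack.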
Function‐Space View: Linear Predictor
MORE ON GPR
Learning
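Learning in GPR typically means choosing the kernel hyperparameters θ (and the noise variance σₙ²) by maximizing the log marginal likelihood (Rasmussen and Williams, Ch. 5):

```latex
\log p(y \mid X, \theta) =
  -\tfrac{1}{2}\, y^\top \left(K_\theta + \sigma_n^2 I\right)^{-1} y
  - \tfrac{1}{2} \log \left|K_\theta + \sigma_n^2 I\right|
  - \tfrac{n}{2} \log 2\pi
```

The first term rewards data fit, the second penalizes model complexity, and the trade-off between them is handled automatically by the marginalization.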
Comments on GPR
[Figure: data (left) and GPR prediction (right).]
Kriging
Neural Networks
• Neural network with a single hidden layer with NH units
[Figure: network with inputs x1, x2, x3, x4, a hidden layer of NH units with input weights ui,j and output weights vj, and output f(x).]
• A neural network with one hidden layer is a universal approximator as NH → ∞ [Cybenko 1989, Hornik 1993].
• That is, it can approximate any continuous function on a compact support under mild conditions.

Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314.
Hornik, K. (1993). Some new results on neural network approximation. Neural Networks, 6(8):1069–1072.
Neural Network Converges to a GP
Neal, R. M. (1996). Bayesian Learning for Neural Networks. Springer, New York. Lecture Notes in Statistics 118.
Summary

Gaussian process regression:
• Weight‐space view: Bayesian approach to linear regression (with the kernel trick)
• Function‐space view: MMSE estimate, linear predictor
• Best linear unbiased prediction (kriging)
• Provides the predictive variance for unseen data
• Can be computationally intensive (O(n³) for prediction)
• A neural network with a single hidden layer converges to a Gaussian process
REAL‐TIME AUTONOMOUS ROBOT NAVIGATION
Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, and Songhwai Oh, "Real‐Time Nonparametric Reactive Navigation of Mobile Robots in Dynamic Environments," Robotics and Autonomous Systems, vol. 91, pp. 11–24, May 2017.
Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, and Songhwai Oh, "Leveraged Non‐Stationary Gaussian Process Regression for Autonomous Robot Navigation," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2015.
Knightscope K5 Meets a Guy
Real‐Time Navigation
Autoregressive Gaussian Process Motion Model
Autoregressive Gaussian Process Motion Controller (AR‐GPMC)
• Collect state‐action pairs (input, output) from diverse scenarios; these data serve as a motion controller.
• Control a robot to avoid an incoming object (with sample‐based MPC).
Real‐Time Navigation
• Training phase (slow): (trajectory, control) pairs are collected from diverse scenarios; this is time‐consuming work.
• Execution phase (fast): the learned motion controller is used; it takes less than 50 ms to compute one control.
Real‐Time Navigation
Selecting the Number of Training Data
[Figure: Gaussian process predictive variance and the closest training point.]
Real‐Time Navigation
Selecting the Number of Training Data
• Two assumptions: the n training data points are uniformly distributed, and the test input is located at the center.
• Combined with the following theorem: the distribution of the distance from the origin to the mth closest point.
Real‐Time Navigation
Selecting the Number of Training Data
• We can efficiently approximate the upper bound of the learning curve of Gaussian process regression.
• The two integrals, over training data configurations and over the test input location, become a single integral.
• As an isotropic kernel function is assumed, the predictive variance is a function of distance.
• The distance distribution comes from a binomial point process.
[Figure: learning curve (blue), proposed upper bound (red), and original upper bound (green).]
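The slide's exact statement is not preserved in this transcript, but for n points drawn independently and uniformly in a d-dimensional ball of radius R, the standard form of the distance distribution follows from the observation that the mth closest point lies within r exactly when at least m points do:

```latex
P\!\left(D_{(m)} \le r\right) = \sum_{k=m}^{n} \binom{n}{k}\, p^{k} (1-p)^{\,n-k},
\qquad p = \left(\frac{r}{R}\right)^{d}
```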
Real‐Time Navigation
Selecting the Number of Training Data
• Example: the radius is 5 and the dimension is 6.
• The volume of integration is reduced from [0, ] to [0, ].
Real‐Time Navigation
Hierarchical Motion Controller (Mixture of Experts)
Real‐Time Navigation
Experiments
QuadCore 2.7 GHz