Structural Return Maximization for Reinforcement Learning

Josh Joseph, Alborz Geramifard, Javier Velez, Jonathan How, Nicholas Roy


Page 1: Structural Return Maximization for Reinforcement Learning

Structural Return Maximization for Reinforcement Learning

Josh Joseph, Alborz Geramifard, Javier Velez, Jonathan How, Nicholas Roy

Page 2: Structural Return Maximization for Reinforcement Learning

How should we act in the presence of complex, unknown dynamics?

Page 6: Structural Return Maximization for Reinforcement Learning

What do I mean by complex dynamics?

• Can’t derive from first principles / intuition
• Any dynamics model will be approximate
• Limited data
  – Otherwise just do nearest neighbors
• Batch data
  – Trying to keep it as simple as possible for now
  – Fairly straightforward to extend to active learning

Page 8: Structural Return Maximization for Reinforcement Learning

How does RL solve these problems?

• Assume some representation class for:
  – Dynamics model
  – Value function
  – Policy
• Collect some data
• Find the “best” representation based on the data

Page 10: Structural Return Maximization for Reinforcement Learning

How does RL solve these problems?

• The “best” representation based on the data
• This defines the best policy… not the best representation

[Equation not captured: the value (return) of a policy from the starting state, under the reward and the unknown dynamics model]
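The “value (return)” quantity labeled above can be sketched concretely; a minimal illustration, assuming a discount factor and a hand-picked reward sequence (both hypothetical, not from the talk):

```python
def discounted_return(rewards, gamma=0.95):
    """Return of one episode: the discounted sum of its rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Rewards collected along one episode from the starting state.
episode_rewards = [1.0, 0.0, 0.0, 5.0]
ret = discounted_return(episode_rewards)
```

The value of a policy is then the expectation of this quantity over episodes generated by the (unknown) dynamics.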

Page 13: Structural Return Maximization for Reinforcement Learning

…but does RL actually solve this problem?

• Policy Search
  – The policy is directly parameterized

[Equation not captured: the empirical estimate of return, averaged over the number of episodes]
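The “empirical estimate over a number of episodes” on this slide is just an average of sampled returns; a sketch with a toy stochastic chain environment (the environment and the fixed policy are invented for illustration, not from the talk):

```python
import random

def run_episode(policy, rng, horizon=10):
    """Toy chain world: integer state, actions are +1/-1 steps, the
    intended step succeeds with probability 0.8, reward 1 while state > 0."""
    state, episode_return = 0, 0.0
    for _ in range(horizon):
        action = policy(state)
        state += action if rng.random() < 0.8 else -action
        episode_return += 1.0 if state > 0 else 0.0
    return episode_return

def empirical_return(policy, n_episodes=500, seed=0):
    """Average return over a number of episodes: the empirical estimate."""
    rng = random.Random(seed)
    return sum(run_episode(policy, rng) for _ in range(n_episodes)) / n_episodes

go_right = lambda s: +1  # a fixed policy for the sketch
```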

Page 17: Structural Return Maximization for Reinforcement Learning

…but does RL actually solve this problem?

• Model-based RL
  – Dynamics model = [equation not captured]

Maximizing likelihood != maximizing return

…similar story for value-based methods
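The “maximizing likelihood != maximizing return” point can be made concrete with a two-action bandit sketch (all numbers and both candidate models are invented for illustration): the model that fits the batch data best can still induce the worse policy.

```python
import math

# Two candidate dynamics models from a misspecified class: each specifies
# success probabilities for actions A and B (reward 1 on success).
M1 = {"A": 0.50, "B": 0.40}
M2 = {"A": 0.45, "B": 0.55}

# Batch data as (action, successes, failures). Action A is heavily sampled.
data = [("A", 500, 500), ("B", 6, 4)]

def log_likelihood(model, data):
    """Bernoulli log-likelihood of the batch under a model."""
    return sum(s * math.log(model[a]) + f * math.log(1 - model[a])
               for a, s, f in data)

def planned_action(model):
    """The policy induced by a model picks the action it thinks is best."""
    return max(model, key=model.get)

true_p = {"A": 0.5, "B": 0.6}  # true (unknown) success probabilities

best_by_likelihood = max((M1, M2), key=lambda m: log_likelihood(m, data))
# M1 wins the likelihood comparison, but its induced policy plays A, whose
# true expected return (0.5) is lower than B's (0.6).
```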

Page 22: Structural Return Maximization for Reinforcement Learning

ML model selection in RL

• So why do we do it?
  – It’s easy
  – It sometimes works really well
  – Intuitively, it feels like finding the most likely model should result in a high-performing policy
• Why does it fail?
  – It chooses an “average” model based on the data
  – It ignores the reward function
• What do we do then?

Page 25: Structural Return Maximization for Reinforcement Learning

Our Approach

• Model-based RL
  – Dynamics model = [equation not captured]

[Equation not captured: choose the model by the empirical estimate of return]

Page 28: Structural Return Maximization for Reinforcement Learning

“Planning with Misspecified Model Classes” (us)

Page 29: Structural Return Maximization for Reinforcement Learning

Our Approach

• Model-based RL
  – Dynamics model = [equation not captured]

[Equation not captured: choose the model by the empirical estimate of return]

We can do the same thing in a value-based setting.

Page 31: Structural Return Maximization for Reinforcement Learning

…but

• We are indirectly choosing a policy representation
• The win of this indirect representation is that it can be “small”
• Small = less data?
  – Intuitively you’d think so
  – Empirical evidence from toy problems
• But all of our guarantees rely on infinite data
• …maybe there’s a way to be more concrete

Page 33: Structural Return Maximization for Reinforcement Learning

What we want

• How does the representation space relate to true return?
• …they’ve been doing this in classification since the 60s
  – Relationship between the “size” of the representation space and the amount of data

Page 36: Structural Return Maximization for Reinforcement Learning

How to get there

Model-based, value-based, policy search

Map RL to classification      → Empirical Risk Minimization
Measuring function class size → Bound on true risk
Structure of function classes → Structural risk minimization

Page 42: Structural Return Maximization for Reinforcement Learning

Classification

A linear classifier on inputs x₁, x₂:

f(x) = sign(θᵀx),  where x = [x₁, x₂]ᵀ and θ = [θ₁, θ₂]ᵀ

Page 45: Structural Return Maximization for Reinforcement Learning

Classification

Risk: the expected loss (cost) under the unknown data distribution P(x, y):

R(f) = ∫ L(f(x), y) dP(x, y)

Page 47: Structural Return Maximization for Reinforcement Learning

Empirical Risk Minimization

The data distribution is unknown, so we minimize the empirical estimate over the number of samples N:

R_emp(f) = (1/N) Σᵢ L(f(xᵢ), yᵢ)
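A minimal empirical-risk-minimization sketch for the linear classifiers above, using 0-1 loss and a small hypothetical candidate set of parameter vectors (the data and candidates are invented for illustration):

```python
def sign(v):
    return 1 if v >= 0 else -1

def f(theta, x):
    """Linear classifier: sign of the inner product theta^T x."""
    return sign(theta[0] * x[0] + theta[1] * x[1])

def empirical_risk(theta, samples):
    """Average 0-1 loss over the N samples (x, y)."""
    return sum(1 for x, y in samples if f(theta, x) != y) / len(samples)

# Linearly separable toy data: label is the sign of x1.
samples = [((1.0, 0.2), 1), ((2.0, -1.0), 1),
           ((-1.5, 0.3), -1), ((-0.5, -2.0), -1)]

# ERM over a small hypothetical candidate set of parameter vectors.
candidates = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (1.0, 1.0)]
best = min(candidates, key=lambda th: empirical_risk(th, samples))
```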

Page 49: Structural Return Maximization for Reinforcement Learning

Mapping RL to Classification

52

How to get there

Model-based, value-based, policy search

Map RL to classification Empirical Risk Minimization

Measuring function class size Bound on true risk

Structure of function classes Structural risk minimization

Page 53: Structural Return Maximization for Reinforcement Learning

Measuring the size of a function class: VC Dimension

• Introduces a notion of “shattering”
  – I pick the inputs
  – You pick the labels
  – VC Dim = max number of points I can perfectly decide

VCDim(linear classifiers in 2-D) = 3
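The shattering game on this slide can be checked by brute force; a sketch for 2-D linear classifiers sign(w·x + b) over a hypothetical finite parameter grid: three points in general position admit every labeling, while the XOR labeling of four points is unrealizable, so VCDim = 3.

```python
from itertools import product

GRID = [-2, -1, -0.5, 0, 0.5, 1, 2]  # hypothetical finite search grid

def predict(w0, w1, b, x):
    """2-D linear classifier with bias: sign(w0*x0 + w1*x1 + b)."""
    return 1 if w0 * x[0] + w1 * x[1] + b >= 0 else -1

def can_shatter(points):
    """True if every +/-1 labeling of the points is realized by some
    classifier on the grid (a crude brute-force check, for illustration)."""
    for labels in product([1, -1], repeat=len(points)):
        if not any(all(predict(w0, w1, b, p) == y
                       for p, y in zip(points, labels))
                   for w0, w1, b in product(GRID, repeat=3)):
            return False
    return True

three = [(0, 0), (1, 0), (0, 1)]
four = [(0, 0), (1, 1), (1, 0), (0, 1)]  # the XOR labeling is unrealizable
```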

Page 60: Structural Return Maximization for Reinforcement Learning

Measuring the size of a function class: VC Dimension

• Introduces a notion of “shattering”
  – I pick the inputs
  – You pick the labels
  – VC Dim = max number of points I can perfectly decide
• Magically, shattering (VC Dim) can be used to bound true risk
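The bound alluded to above has a standard form in statistical learning theory (this is one common statement of the VC bound, not an equation shown in the talk): with probability at least 1 − η, R(f) ≤ R_emp(f) + the capacity term sketched below.

```python
import math

def vc_confidence(h, n, eta=0.05):
    """Capacity term of a standard VC bound:
    sqrt((h * (ln(2N/h) + 1) - ln(eta/4)) / N).
    It grows with VC dimension h and shrinks as the sample count N grows."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)
```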

Page 63: Structural Return Maximization for Reinforcement Learning

For those of you familiar with statistical learning theory…

• VC Dim
  – Only known for a few function classes
  – Difficult to estimate or bound
• Rademacher complexity
  – Use the data to estimate the “volume” of the function class
  – This volume can then be used in a similar bound
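The “use the data to estimate the volume” idea can be sketched as a Monte Carlo estimate of empirical Rademacher complexity (the tiny threshold-classifier classes and data below are invented for illustration):

```python
import random

def empirical_rademacher(function_class, xs, n_draws=200, seed=0):
    """Monte Carlo estimate of E_sigma[ sup_f (1/N) sum_i sigma_i f(x_i) ]:
    how well the class can correlate with random +/-1 labels on the data."""
    rng = random.Random(seed)
    n, total = len(xs), 0.0
    for _ in range(n_draws):
        sigma = [rng.choice([-1, 1]) for _ in range(n)]
        total += max(sum(s * f(x) for s, x in zip(sigma, xs)) / n
                     for f in function_class)
    return total / n_draws

# Threshold classifiers on the real line; the bigger class contains the smaller.
small_class = [lambda x: 1 if x >= 0 else -1]
big_class = [(lambda t: (lambda x: 1 if x >= t else -1))(t)
             for t in (-2, -1, 0, 1, 2)]
xs = [-1.5, -0.5, 0.5, 1.5]
```

Because the big class is a superset of the small one, its supremum (and hence its estimated complexity) can only be larger on the same data.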

Page 64: Structural Return Maximization for Reinforcement Learning

Measuring the size of a function class

• Now we can say concrete things about why we may prefer one representation over another with limited data

Page 66: Structural Return Maximization for Reinforcement Learning

How to get there

Model-based, value-based, policy search

Map RL to classification      → Empirical Risk Minimization
Measuring function class size → Bound on true risk
Structure of function classes → Structural risk minimization

Page 67: Structural Return Maximization for Reinforcement Learning

Empirical Risk Minimization

R_emp(f) = (1/N) Σᵢ L(f(xᵢ), yᵢ)   (empirical estimate over N samples; the data distribution is unknown)

Page 68: Structural Return Maximization for Reinforcement Learning

Empirical Risk Minimization and Limited Data

• If we have limited data (the bound is large), we cannot expect small empirical risk to result in small true risk
• …so what do we do?
• Choose the function class which minimizes the bound!

Page 71: Structural Return Maximization for Reinforcement Learning

Structural Risk Minimization

• Using a “structure” of function classes (e.g., nested classes F1 ⊆ F2 ⊆ …)
• For N data, we choose the function class that minimizes the bound on true risk

Many natural structures of policy classes!
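Picking the class that minimizes (empirical risk + capacity term) can be sketched directly; the nested structure, VC dimensions, and empirical risks below are all invented for illustration, and the capacity term is the standard VC form:

```python
import math

def vc_confidence(h, n, eta=0.05):
    """Standard VC capacity term (one common form)."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

def srm_choose(structure, n, eta=0.05):
    """Structural risk minimization: pick the class whose bound
    (empirical risk + capacity term) is smallest for N data."""
    return min(structure, key=lambda c: c[2] + vc_confidence(c[1], n, eta))

# (name, VC dimension, empirical risk): richer classes fit the data better
# (lower empirical risk) but pay a larger capacity penalty.
structure = [("F1", 3, 0.20), ("F2", 10, 0.10), ("F3", 50, 0.02)]
```

With little data the penalty dominates and a small class wins; with plenty of data the richer class’s lower empirical risk wins.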

Page 74: Structural Return Maximization for Reinforcement Learning

Is this Bayesian?

• Prior knowledge
  – Structure encodes prior knowledge
• Robust to over-fitting
  – Choose the function class based on the risk bound
• No Bayes update
• No assumption that the true function is somewhere in the structure
  – Breaks most (all?) Bayesian nonparametrics

Page 77: Structural Return Maximization for Reinforcement Learning

Contribution

• Classification to RL mapping
• Transferred probabilistic bounds from statistical learning theory to RL
• Applied structural risk minimization to RL

Page 79: Structural Return Maximization for Reinforcement Learning

Backup Slides

Page 80: Structural Return Maximization for Reinforcement Learning

From last time…

{m_c, m_p, l}

Page 83: Structural Return Maximization for Reinforcement Learning

Measuring the size of a function class

• Rademacher complexity