Lab 6-2: Q Network for Cart Pole
Reinforcement Learning with TensorFlow & OpenAI Gym
Sung Kim

Source: hunkim.github.io/ml/RL/rl06-l2.pdf (GitHub Pages), 2017-10-02. The setup code begins with: import numpy as np; import tensorflow as tf; import gym; gym.make('CartPole-v0')

Cart Pole

https://gym.openai.com/docs

Random trials
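The random-trial code on this slide did not survive extraction. As a rough stand-in, the sketch below plays episodes with uniformly random actions and reports each episode's total reward. So that it runs without gym installed, it uses a hypothetical MiniCartPole class that re-implements the classic cart-pole equations (a push of 10 in either direction, 0.02 s Euler steps, failure beyond 2.4 units of cart travel or about 12 degrees of pole tilt) and imitates the pre-0.26 gym interface: reset() returns the state and step(action) returns (state, reward, done, info). The 200-step cap mirrors CartPole-v0's episode limit; the function names and seeds are assumptions for illustration.

```python
import math
import random


class MiniCartPole:
    """Hypothetical stand-in for gym's CartPole-v0 (classic cart-pole dynamics, Euler steps)."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.gravity, self.masscart, self.masspole = 9.8, 1.0, 0.1
        self.length = 0.5                       # half of the pole's length
        self.force_mag, self.tau = 10.0, 0.02   # push strength, seconds per step
        self.theta_limit = 12 * 2 * math.pi / 360
        self.x_limit = 2.4
        self.state = None

    def reset(self):
        # Start near upright: x, x_dot, theta, theta_dot each drawn from [-0.05, 0.05].
        self.state = [self.rng.uniform(-0.05, 0.05) for _ in range(4)]
        return list(self.state)

    def step(self, action):
        x, x_dot, theta, theta_dot = self.state
        force = self.force_mag if action == 1 else -self.force_mag  # 1 = right, 0 = left
        total_mass = self.masscart + self.masspole
        pole_ml = self.masspole * self.length
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        temp = (force + pole_ml * theta_dot ** 2 * sin_t) / total_mass
        theta_acc = (self.gravity * sin_t - cos_t * temp) / (
            self.length * (4.0 / 3.0 - self.masspole * cos_t ** 2 / total_mass))
        x_acc = temp - pole_ml * theta_acc * cos_t / total_mass
        # Euler integration: positions use the old velocities.
        x, x_dot = x + self.tau * x_dot, x_dot + self.tau * x_acc
        theta, theta_dot = theta + self.tau * theta_dot, theta_dot + self.tau * theta_acc
        self.state = [x, x_dot, theta, theta_dot]
        done = abs(x) > self.x_limit or abs(theta) > self.theta_limit
        return list(self.state), 1.0, done, {}  # +1 for every step survived


def random_trials(env, episodes=10, max_steps=200, seed=0):
    """Play episodes with uniformly random actions and return each episode's total reward."""
    rng = random.Random(seed)
    totals = []
    for _ in range(episodes):
        env.reset()
        total, done, steps = 0.0, False, 0
        while not done and steps < max_steps:
            _, reward, done, _ = env.step(rng.randint(0, 1))
            total += reward
            steps += 1
        totals.append(total)
    return totals


scores = random_trials(MiniCartPole(seed=0))
print("Reward for each episode:", scores)
```

Because every surviving step earns +1, an episode's total reward equals its length. With an older gym release installed, gym.make('CartPole-v0') could be passed to random_trials in place of MiniCartPole.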

Rewards

Cart Pole Q-network

[Figure: Q-network diagram, (1) input state s, (2) output Ws]
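The diagram has a single weight matrix W mapping the state s directly to Q-values. A minimal numpy sketch of that forward pass follows, assuming CartPole's 4-number state and 2 actions, with W stored as a (2, 4) array so that W @ s matches the slide's Ws. The random initial weights and the example state are placeholders, not values from the lab.

```python
import numpy as np

INPUT_SIZE, OUTPUT_SIZE = 4, 2   # CartPole state size, number of actions

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(OUTPUT_SIZE, INPUT_SIZE))  # the network's only parameters


def q_values(W, s):
    """Single-layer Q-network: Q(s) = W s, one estimate per action."""
    return W @ np.asarray(s, dtype=float)


s = np.array([0.01, -0.02, 0.03, 0.04])   # example (x, x_dot, theta, theta_dot)
q = q_values(W, s)
greedy_action = int(np.argmax(q))          # 0 = push left, 1 = push right
print(q.shape, greedy_action)
```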

Q-Network training (Network construction)

Q-Network training (linear regression)

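$$y = r + \gamma \max Q(s')$$

$$\mathrm{cost}(W) = (Ws - y)^2$$

where r is the reward, \gamma the discount factor, and s' the next state.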
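A sketch of one update on a single transition (s, a, r, s') for the linear network. Assumptions: gamma = 0.9 and a learning rate of 0.1 are illustrative values; at a terminal state the target is taken to be just r, since there is no next state to bootstrap from; and the squared error is applied only to the Q-value of the action actually taken, because y is only known for that action. The gradient of (Ws)[a] - y with respect to W is nonzero only in row a, where it equals s.

```python
import numpy as np

gamma = 0.9   # assumed discount factor
lr = 0.1      # assumed learning rate


def td_target(reward, next_state, W, done):
    """y = r + gamma * max Q(s'); at a terminal state there is no s', so y = r."""
    if done:
        return reward
    return reward + gamma * np.max(W @ next_state)


def cost_and_grad(W, state, action, y):
    """Squared error ((Ws)[a] - y)^2 for the taken action, and its gradient w.r.t. W."""
    error = (W @ state)[action] - y
    grad = np.zeros_like(W)
    grad[action] = 2.0 * error * state   # only the row producing Q(s, a) is affected
    return error ** 2, grad


rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(2, 4))
s, a, r = rng.uniform(-0.05, 0.05, 4), 1, 1.0
s_next = rng.uniform(-0.05, 0.05, 4)

y = td_target(r, s_next, W, done=False)   # the target is held fixed during the step
before, grad = cost_and_grad(W, s, a, y)
W = W - lr * grad                         # one gradient-descent step on W
after, _ = cost_and_grad(W, s, a, y)
print(before, after)                      # the cost on this transition shrinks
```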

Code: Network and setup
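The slide's TensorFlow code is not recoverable here, so this is a framework-free numpy sketch of a comparable setup. Everything named in it is an assumption: the hyperparameter values, Glorot/Xavier-uniform initialization of W, and an epsilon-greedy policy whose epsilon shrinks as 1/(episode/10 + 1).

```python
import numpy as np

# Assumed hyperparameters for this sketch (not taken from the slides).
INPUT_SIZE = 4       # CartPole observation: x, x_dot, theta, theta_dot
OUTPUT_SIZE = 2      # actions: 0 = push left, 1 = push right
GAMMA = 0.9
LEARNING_RATE = 0.1


def init_weights(rng):
    """Glorot/Xavier-uniform initialization of the (OUTPUT_SIZE, INPUT_SIZE) matrix W."""
    limit = np.sqrt(6.0 / (INPUT_SIZE + OUTPUT_SIZE))
    return rng.uniform(-limit, limit, size=(OUTPUT_SIZE, INPUT_SIZE))


def epsilon_for(episode):
    """Exploration rate that decays with the episode index: 1.0 at episode 0, 0.5 at episode 10."""
    return 1.0 / (episode / 10 + 1)


def choose_action(W, state, epsilon, rng):
    """Epsilon-greedy: a random action with probability epsilon, otherwise argmax of W s."""
    if rng.random() < epsilon:
        return int(rng.integers(OUTPUT_SIZE))
    return int(np.argmax(W @ state))


rng = np.random.default_rng(0)
W = init_weights(rng)
print(W.shape, epsilon_for(0), epsilon_for(10))   # (2, 4) 1.0 0.5
```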

Code: Training
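Putting the pieces together, a hypothetical train function can run online Q-learning with the linear network: act epsilon-greedily, compute y = r + gamma * max Q(s'), and take one gradient step on (Ws - y)^2 after every environment step, with no replay memory and no separate target network. It expects the pre-0.26 gym interface, so with an older gym release train(gym.make('CartPole-v0')) should work; the FiveStepEnv stub is hypothetical and exists only so the sketch runs without gym. Hyperparameters are illustrative assumptions.

```python
import numpy as np


def train(env, num_episodes=500, gamma=0.9, lr=0.1, seed=0):
    """Online Q-learning with a linear Q-network Q(s) = W s (a sketch, not the lab's TF code).

    `env` must follow the pre-0.26 gym interface: reset() -> state,
    step(action) -> (state, reward, done, info).
    """
    rng = np.random.default_rng(seed)
    n_actions, n_inputs = 2, 4
    limit = np.sqrt(6.0 / (n_inputs + n_actions))
    W = rng.uniform(-limit, limit, size=(n_actions, n_inputs))
    steps_per_episode = []
    for episode in range(num_episodes):
        epsilon = 1.0 / (episode / 10 + 1)        # decaying exploration
        s = np.asarray(env.reset(), dtype=float)
        done, steps = False, 0
        while not done:
            q = W @ s
            a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(q))
            s_next, r, done, _ = env.step(a)
            s_next = np.asarray(s_next, dtype=float)
            # Target y = r + gamma * max Q(s'); no bootstrap term at a terminal state.
            y = r if done else r + gamma * np.max(W @ s_next)
            # Gradient step on (Ws - y)^2 for the chosen action's row of W.
            W[a] -= lr * 2.0 * (q[a] - y) * s
            s, steps = s_next, steps + 1
        steps_per_episode.append(steps)
    return W, steps_per_episode


class FiveStepEnv:
    """Hypothetical stub: random 4-D states, every episode lasts exactly 5 steps."""

    def __init__(self, seed=0):
        self.rng, self.t = np.random.default_rng(seed), 0

    def reset(self):
        self.t = 0
        return self.rng.uniform(-0.05, 0.05, 4)

    def step(self, action):
        self.t += 1
        return self.rng.uniform(-0.05, 0.05, 4), 1.0, self.t >= 5, {}


W, history = train(FiveStepEnv(), num_episodes=20)
print(W.shape, history[:3])   # (2, 4) [5, 5, 5]
```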

Code: apply
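Applying a learned network means acting greedily, with no exploration, and adding up rewards until the episode ends. The play function and the LeftEndsEnv stub below are hypothetical; the stub ends an episode as soon as action 0 is taken and otherwise after 3 steps, just so the effect of W on the score is visible without gym.

```python
import numpy as np


def play(env, W, max_steps=10_000):
    """Run one episode greedily w.r.t. Q(s) = W s and return the total reward (the score)."""
    s = np.asarray(env.reset(), dtype=float)
    total, done, steps = 0.0, False, 0
    while not done and steps < max_steps:
        a = int(np.argmax(W @ s))            # always the best-looking action, no exploration
        s, reward, done, _ = env.step(a)
        s = np.asarray(s, dtype=float)
        total += reward
        steps += 1
    return total


class LeftEndsEnv:
    """Hypothetical stub: action 0 ends the episode at once; otherwise it lasts 3 steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return [1.0, 0.0, 0.0, 0.0]

    def step(self, action):
        self.t += 1
        return [1.0, 0.0, 0.0, 0.0], 1.0, action == 0 or self.t >= 3, {}


prefers_right = np.array([[0.0, 0, 0, 0], [1.0, 0, 0, 0]])
prefers_left = np.array([[1.0, 0, 0, 0], [0.0, 0, 0, 0]])
print(play(LeftEndsEnv(), prefers_right), play(LeftEndsEnv(), prefers_left))  # 3.0 1.0
```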

Results: really poor!

Why doesn't it work? Is the network too shallow?
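One way to probe the "too shallow?" question is to put a hidden layer between s and the Q-values. A minimal numpy sketch follows, assuming 10 ReLU hidden units, bias terms (the single-layer Ws network has none), and hand-written backpropagation for the same squared error on the taken action's Q-value. The layer width, initialization scale, learning rate, and example state are arbitrary choices for illustration.

```python
import numpy as np

INPUT_SIZE, HIDDEN_SIZE, OUTPUT_SIZE = 4, 10, 2   # hidden width is an arbitrary choice
rng = np.random.default_rng(0)
params = {
    "W1": rng.normal(scale=0.5, size=(HIDDEN_SIZE, INPUT_SIZE)), "b1": np.zeros(HIDDEN_SIZE),
    "W2": rng.normal(scale=0.5, size=(OUTPUT_SIZE, HIDDEN_SIZE)), "b2": np.zeros(OUTPUT_SIZE),
}


def forward(p, s):
    """Two-layer Q-network: Q(s) = W2 relu(W1 s + b1) + b2, plus cached activations."""
    h_pre = p["W1"] @ s + p["b1"]
    h = np.maximum(h_pre, 0.0)
    return p["W2"] @ h + p["b2"], h_pre, h


def sgd_step(p, s, a, y, lr=0.01):
    """One gradient step on (Q(s)[a] - y)^2, backpropagated by hand; returns the cost before it."""
    q, h_pre, h = forward(p, s)
    error = q[a] - y
    dq = np.zeros_like(q)
    dq[a] = 2.0 * error                       # d cost / d Q(s)
    dh_pre = (p["W2"].T @ dq) * (h_pre > 0)   # back through the ReLU
    grads = {"W2": np.outer(dq, h), "b2": dq, "W1": np.outer(dh_pre, s), "b1": dh_pre}
    for name, g in grads.items():             # all gradients computed before any update
        p[name] -= lr * g
    return error ** 2


s = np.array([0.02, -0.3, 0.05, 0.4])
before = sgd_step(params, s, a=1, y=1.0)
after = (forward(params, s)[0][1] - 1.0) ** 2
print(before, after)   # the cost on this transition goes down
```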

Exercise

• Why doesn't it work?

• Hint: DQN
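Experience replay is one of the ingredients DQN adds on top of a plain Q-network: transitions are stored, and training draws random minibatches from the store instead of learning from each step as it happens, which weakens the correlation between consecutive updates. (DQN also computes the target y with a separate, periodically updated copy of the network.) The ReplayBuffer class below is a minimal, hypothetical sketch with uniform sampling; the capacity and seed are illustrative.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (s, a, r, s_next, done) transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def add(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        """Return batch_size distinct transitions chosen uniformly at random."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buf = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buf.add([t] * 4, t % 2, 1.0, [t + 1] * 4, t == 4)
print(len(buf))                                   # 3: only the newest three remain
print(sorted(tr[0][0] for tr in buf.sample(2)))   # two of the stored transitions
```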