Source: hunkim.github.io/ml/RL/rl06-l2.pdf · 2017-10-02

Lab 6-2: Q Network for Cart Pole

Reinforcement Learning with TensorFlow & OpenAI Gym
Sung Kim <hunkim+ml@gmail.com>

Cart Pole

https://gym.openai.com/docs

Random trials

Rewards

Cart Pole Q-network

(1) s → (2) Ws

Q-Network training (Network construction)

(1) s → (2) Ws
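
The single-layer network on this slide maps a state s to one Q-value per action through a weight matrix W. The following is a minimal sketch of that forward pass in plain Python, so it runs without TensorFlow. It assumes CartPole's 4-dimensional observation and 2 discrete actions; the names `init_weights`, `q_values`, and `greedy_action`, the seed, and the initial weight range are illustrative choices, not taken from the lab code.

```python
import random

STATE_SIZE = 4   # cart position, cart velocity, pole angle, pole angular velocity
NUM_ACTIONS = 2  # push left, push right


def init_weights(seed=0):
    """Return a NUM_ACTIONS x STATE_SIZE matrix W with small random entries (assumed range)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.01, 0.01) for _ in range(STATE_SIZE)]
            for _ in range(NUM_ACTIONS)]


def q_values(W, s):
    """Compute Ws: one predicted Q-value per action for state s."""
    return [sum(w * x for w, x in zip(row, s)) for row in W]


def greedy_action(W, s):
    """Pick the action whose predicted Q-value is largest."""
    q = q_values(W, s)
    return max(range(NUM_ACTIONS), key=lambda a: q[a])
```

During training an agent would usually mix `greedy_action` with random exploration; that part is left out here to keep the sketch focused on the network itself.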

Q-Network training (linear regression)

(1) s → (2) Ws

$y = r + \gamma \max Q(s')$

$\mathrm{cost}(W) = (Ws - y)^2$
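
The target and cost above can be sketched as one regression step on a linear Q-function. This is a plain-Python illustration under stated assumptions, not the lab's TensorFlow code: the discount `GAMMA`, the learning rate, the function names, and the choice to use y = r at a terminal step are all assumptions, and the squared error is applied only to the action that was actually taken.

```python
GAMMA = 0.9  # discount factor; value assumed for this sketch


def q_values(W, s):
    """Compute Ws: one predicted Q-value per action for state s."""
    return [sum(w * x for w, x in zip(row, s)) for row in W]


def td_target(W, r, s_next, done, gamma=GAMMA):
    """y = r + gamma * max Q(s'); at a terminal step (assumed) y = r."""
    if done:
        return r
    return r + gamma * max(q_values(W, s_next))


def cost(W, s, a, y):
    """Squared error (Ws - y)^2 for the action a that was taken."""
    return (q_values(W, s)[a] - y) ** 2


def sgd_step(W, s, a, y, lr=0.1):
    """One gradient-descent step on the cost; only row a of W changes."""
    error = q_values(W, s)[a] - y
    new_W = [row[:] for row in W]
    # d/dW[a][j] of (W[a].s - y)^2 is 2 * error * s[j]
    new_W[a] = [w - lr * 2.0 * error * x for w, x in zip(W[a], s)]
    return new_W
```

Calling `td_target`, then `sgd_step`, once per environment step gives the online update the slide describes. The target y is treated as a constant during the step, which is the usual convention for this kind of regression.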

Code: Network and setup

Code: Training

Code: apply

Results: really poor!

Why does it not work? Too shallow?

Exercise

• Why does it not work?

• Hint: DQN
