5
5/30/17 1 AlphaGo and Monte Carlo Tree Search Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade 1 ©Sham Kakade 2017 A feat previously thought to be (a decade?) away!!! 2016 26 AlphaGo deep RL defeats Lee Sedol (4-1)

AlphaGo andMonteCarloTreeSearch - University of Washington5/30/17 1 AlphaGo andMonteCarloTreeSearch Machine0Learning0forBig0Data00 CSE547/STAT548, University0of0Washington Sham0Kakade

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: AlphaGo andMonteCarloTreeSearch - University of Washington5/30/17 1 AlphaGo andMonteCarloTreeSearch Machine0Learning0forBig0Data00 CSE547/STAT548, University0of0Washington Sham0Kakade

5/30/17

1

AlphaGo and  Monte  Carlo  Tree  Search

Machine  Learning  for  Big  Data    CSE547/STAT548,  University  of  Washington

Sham  Kakade

1©Sham  Kakade  2017

A  feat  previously  thought  to  be  (a  decade?)  away!!!

9/28/16

12

2011

25

http://www.youtube.com/watch?v=WFR3lOm_xhE

2016

26

AlphaGo deep RL defeats Lee Sedol (4-1)

Page 2: AlphaGo andMonteCarloTreeSearch - University of Washington5/30/17 1 AlphaGo andMonteCarloTreeSearch Machine0Learning0forBig0Data00 CSE547/STAT548, University0of0Washington Sham0Kakade

5/30/17

2

Chess  vs.  Alpha  Go

1997,  AI  named  “Deep  Blue”  beat  chess  world  champion.  

Search  space:  𝑏": 𝑏 = 35, 𝑑 = 80Search  space:  𝑏": 𝑏 = 250, 𝑑 = 150

Key  Points

• Given  current  game  state,  – how  to  decide  next  action  “a”?– How  to  evaluate  each  legal  action?

Page 3: AlphaGo andMonteCarloTreeSearch - University of Washington5/30/17 1 AlphaGo andMonteCarloTreeSearch Machine0Learning0forBig0Data00 CSE547/STAT548, University0of0Washington Sham0Kakade

5/30/17

3

Monte  Carlo  Tree  Search  (MCTS)• A  popular  heuristic  search  algorithm  for  game  play– By    lots  of  simulations  and  select  the  most  visited  action.  

Evaluation

Node  value:  Wining  rate

Key  point:  how  to  calculate  this  valueOne  playout  to  simulate  the  game

Monte  Carlo  Tree  Search  (MCTS)• AlphaGo

– Key  point:  how  to  calculate  the  value  for  each  action  (path)

Page 4: AlphaGo andMonteCarloTreeSearch - University of Washington5/30/17 1 AlphaGo andMonteCarloTreeSearch Machine0Learning0forBig0Data00 CSE547/STAT548, University0of0Washington Sham0Kakade

5/30/17

4

The  visited  number  for  this  edge

The  mean  value  of  all  simulations  passing  through  it.

Winning  rate

Winning  rate  is  predicted  by  deep  reinforcement  learning  andpredicted  by  fast  rollout  playout

Action  probability  predicted  by  supervised  and  reinforcement  learning

V(s)  is  estimated  in  two  different  ways.  

(Fast)  Methods  to  evaluate  V  

30  million  games

30  million  self-­‐play  games

Page 5: AlphaGo andMonteCarloTreeSearch - University of Washington5/30/17 1 AlphaGo andMonteCarloTreeSearch Machine0Learning0forBig0Data00 CSE547/STAT548, University0of0Washington Sham0Kakade

5/30/17

5

Thanks!

• ML  for  big  data:  – lots  of  different  methods/challenges  in  the  wild.– Participate  in  the  ML  community.

• Posters:  – Looking  forward  to  the  poster  session.

• Have  a  great  summer!

• Some  slides  modified  from:  prlab.tudelft.nl/sites/default/files/deepmind_go_nature.pptx

Acknowledgments

©Sham  Kakade  2017