41

Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul
Page 2: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Go in numbers

10^17040MPlayers Positions

3,000Years Old

Page 3: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

The Rules of Go

Capture Territory

Page 4: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Brute force search intractable:

1. Search space is huge

2. “Impossible” for computers to evaluate who is winning

Game tree complexity = bd

Why is Go hard for computers to play?

Page 5: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul
Page 6: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul
Page 7: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Convolutional neural network

Page 8: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Value networkEvaluation

Position

s

v (s)

Page 9: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Policy network Move probabilities

Position

s

p (a|s)

Page 10: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Exhaustive search

Page 11: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Monte-Carlo rollouts

Page 12: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Reducing depth with value network

Page 13: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Reducing depth with value network

Page 14: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Reducing breadth with policy network

Page 15: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Deep reinforcement learning in AlphaGo

Human expertpositions

Supervised Learningpolicy network

Self-play data Value networkReinforcement Learningpolicy network

Page 16: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Policy network: 12 layer convolutional neural network

Training data: 30M positions from human expert games (KGS 5+ dan)

Training algorithm: maximise likelihood by stochastic gradient descent

Training time: 4 weeks on 50 GPUs using Google Cloud

Results: 57% accuracy on held out test data (state-of-the art was 44%)

Supervised learning of policy networks

Page 17: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Policy network: 12 layer convolutional neural network

Training data: games of self-play between policy network

Training algorithm: maximise wins z by policy gradient reinforcement learning

Training time: 1 week on 50 GPUs using Google Cloud

Results: 80% vs supervised learning. Raw network ~3 amateur dan.

Reinforcement learning of policy networks

Page 18: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Value network: 12 layer convolutional neural network

Training data: 30 million games of self-play

Training algorithm: minimise MSE by stochastic gradient descent

Training time: 1 week on 50 GPUs using Google Cloud

Results: First strong position evaluation function - previously thought impossible

Reinforcement learning of value networks

Page 19: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul
Page 20: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Monte-Carlo tree search in AlphaGo: selection

P prior probabilityQ action value

Page 21: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Monte-Carlo tree search in AlphaGo: expansion

Policy networkP prior probability p

Page 22: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Monte-Carlo tree search in AlphaGo: evaluation

Value networkv

Page 23: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Monte-Carlo tree search in AlphaGo: rollout

Value networkr Game scorerv

Page 24: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Monte-Carlo tree search in AlphaGo: backup

Value networkr Game scorerv Q Action value

Page 25: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Deep Blue

Handcrafted chess knowledge

Alpha-beta search guided by

heuristic evaluation function

200 million positions / second

AlphaGo

Knowledge learned from expert

games and self-play

Monte-Carlo search guided by

policy and value networks

60,000 positions / second

Page 26: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Nature AlphaGo Seoul AlphaGo

Page 27: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Evaluating Nature AlphaGo against computers

Zen

Crazy Stone

3000

2500

2000

1500

1000

500

0

3500

Go Programs

AlphaGo

9p7p5p3p1p9d

7d

5d

3d

1d1k

3k

5k

7k

Professional dan (p)

Amateur

dan (d)Beginner kyu (k)

Pachi

494/495 against computer opponents

>75% winning rate with 4 stone handicap

Even stronger using distributed machines

Fuego

Gnu

Go

Elo rating

Page 28: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Evaluating Nature AlphaGo against humans

Fan Hui (2p): European Champion 2013 - 2016

Match was played in October 2015

AlphaGo won the match 5-0

First program ever to beat a professional

on a full size 19x19 in an even game

Page 29: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Deep Reinforcement Learning (as Nature AlphaGo)

● Improved value network

● Improved policy network

● Improved search

● Improved hardware (TPU vs GPU)

Seoul AlphaGo

Page 30: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Evaluating Seoul AlphaGo against computers

Zen

Crazy Stone

3000

2500

2000

1500

1000

500

0

3500

AlphaGo (N

ature v13)

Pachi

Fuego

Gnu

Go

4000

4500 AlphaGo (Seoul v18)

9p7p5p3p1p9d

7d

5d

3d

1d1k

3k

5k

7k

Professional dan (p)

Amateur

dan (d)Beginner kyu (k)

>50% against Nature AlphaGo with 3 to 4 stone handicap

CAUTION: ratings based on self-play results

Page 31: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Evaluating Seoul AlphaGo against humans

Lee Sedol (9p): winner of 18 world titles

Match was played in Seoul, March 2016

AlphaGo won the match 4-1

Page 32: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul
Page 33: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

AlphaGo vs Lee Sedol: Game 1

Page 34: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

AlphaGo vs Lee Sedol: Game 2

Page 35: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

AlphaGo vs Lee Sedol: Game 3

Page 36: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

AlphaGo vs Lee Sedol: Game 4

Page 37: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

AlphaGo vs Lee Sedol: Game 5

Page 39: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

What’s Next?

Page 40: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Team

David Silver1*, Aja Huang1*, Chris J. Maddison1, Arthur Guez1, Laurent Sifre1, George van den Driessche1,Julian Schrittwieser1, Ioannis Antonoglou1, , Veda Panneershelvam1, Marc Lanctot1, Sander Dieleman1, Dominik Grewe1John Nham2, Nal Kalchbrenner1, Ilya Sutskever2, Timothy Lillicrap1, Madeleine Leach1, Koray Kavukcuoglu1,Thore Graepel1 & Demis Hassabis1

With thanks to: Lucas Baker, David Szepesvari, Malcolm Reynolds, Ziyu Wang, Nando De Freitas, Mike Johnson, Ilya Sutskever, Jeff Dean, Mike Marty, Sanjay Ghemawat.

DaveSilver

ChrisMaddison

ArthurGuez

LaurentSifre

GeorgeVan Den Driessche

JulianSchrittwieser

IoannisAntonoglou

VedaPanneershelvam

MarcLanctot

SanderDieleman

DominikGrewe

JohnNham

NalKalchbrenner

TimLillicrap

MaddyLeach

KorayKavukcuoglu

ThoreGraepel

AjaHuang

DemisHassabis

YutianChen

Page 41: Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Demis Hassabis