Artificial Intelligence and Go (Inteligencia Artificial y Go)

AI and Go in Cadiz, Spain

Thanks to all contributors to this event. Thanks to all MoGo people and MoGoTW people.

Inria, CNRS, LRI, Univ. Paris-Sud, NUTN, CJCU, G5K, Univ. Maastricht, Sara.nl...

Cadiz, December 2009.

This talk is about:

● Artificial intelligence

● The game of Go, and why the game of Go is interesting for artificial intelligence.

– My expertise: Artificial Intelligence, not Go

– My level in Go: 15 kyu (at best).

● I only repeat comments from good players,

● I will ask for your opinion during the talk (we'll have fun).

Outline

● AI (artificial intelligence)
● Games
● More technical stuff
● Go

Artificial intelligence

● Find things which are difficult for computers (possibly: and easy for humans)

● Solve them

● Use the solutions for important applications

Difficult for computers

Easy for computers

Difficult for computers

We'll see much easier situations that are poorly understood.

(komi 7.5)

Difficult for computers

● Cooking

● Washing the floor

● Medical solutions

Discrete time, high dimension control = important family of problems.

No satisfactory solution in many cases.

A challenge: doing as well as humans.

Discrete time, high dimension control

Example of an important application: planning of electric production. Similarity with Go:

Playing a stone = choosing plants

Moves of the opponent = weather + demand + technical troubles

Why test on the game of Go instead of planning of electric production?

The first trials of new techniques always fail.

Losing a game of Go = no problem.

Stupid moves: OK for Go, not for electric production

MoGo vs Catalin Taranu, 2008

Discrete time, high dimension control

● Artificial intelligence = useful!

Not artificial intelligence!

● Tools for the game of Go have already been used in many other applications.

YES!

Recently: big improvements

New techniques developed in games (in particular in Go). (2006-2009)

Very efficient in other applications as well.

I promise it's true, I don't do computer Go just for Go :-)

Outline

● AI
● Games
● More technical stuff
● Go

Games

● We'll see the case of Go later

● Here, other games in which humans are stronger than computers (by far, for now)

Difficult games: Havannah

The players alternately fill a location.

Linking two corners or three edges or making a cycle = winning.

Very difficult for computers.

What else? First-person shooters (partially observable)

What else? Strategy games (multiple actors, partially observable)

What else? Sports (continuous control)

“Real” games

Assumption: if a computer understands and guesses spins (for example on a ping-pong serve), then this robot will be efficient for something other than just games.

(holds true for Go)

What else? Collaborative sports

Playing = working!

GWAP = games with a purpose

You play on the web:

● The computer shows an image and a list of taboo words

● You type some words describing the image

● Other people are playing the same game (same image)

● You earn points each time you find a word which
– has also been chosen by at least one other person
– is not in the taboo list
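The scoring rule above can be sketched in a few lines. This is only an illustration: the function name `score_words`, the case-insensitive matching and the sample words are assumptions, not details of any actual GWAP site.

```python
def score_words(my_words, other_players_words, taboo):
    """Return the words that earn a point: chosen by at least one
    other player and absent from the taboo list (hypothetical helper)."""
    seen_by_others = set()
    for words in other_players_words:
        seen_by_others |= {w.lower() for w in words}
    taboo_set = {w.lower() for w in taboo}

    scoring = []
    for word in my_words:
        key = word.lower()
        if key in seen_by_others and key not in taboo_set and key not in scoring:
            scoring.append(key)
    return scoring


# Made-up round: "dog" is taboo, "grass" was typed by nobody else.
example = score_words(
    my_words=["dog", "grass", "ball"],
    other_players_words=[["dog", "ball"], ["puppy", "dog"]],
    taboo=["dog"],
)
```

Words agreed on by several players are the useful by-product: they are candidate labels for the image.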

Incidentally, you are helping a program for classifying images.

Humans are still necessary in the process :-)

Many other “games with a purpose” (you play, the result is a work).

Outline

● AI
● Games
● More technical stuff
● Go

Monte-Carlo Tree Search

● Monte-Carlo Tree Search (MCTS) appeared in games.

● Its best-known variant is termed Upper Confidence Trees (UCT).

● Here I present UCT:
– Bandits;
– Monte-Carlo approach for tree search;
– UCT.

A “bandit” problem: choosing between exploration and exploitation

● $p_1, \dots, p_N$ unknown probabilities $\in [0,1]$

● At each time step $i \in [1,n]$

– choose $u_i \in \{1,\dots,N\}$ (as a function of $u_j$ and $r_j$, $j<i$)

– with probability $p_{u_i}$: win ($r_i = 1$); otherwise: lose ($r_i = 0$)

A “bandit” problem: the target

Regret: $R_n = n \max_k p_k - \sum_{j=1}^{n} r_j$

How to minimize the regret (worst case on $p$)?
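To make the definition concrete, here is a minimal sketch of the bandit setting and of the regret, using a baseline that picks arms uniformly at random. The arm probabilities, the seed and the function names are illustrative assumptions.

```python
import random


def pull(p, arm, rng):
    """Reward 1 with probability p[arm], else 0."""
    return 1 if rng.random() < p[arm] else 0


def regret(p, rewards):
    """R_n = n * max(p) - sum of the n observed rewards."""
    return len(rewards) * max(p) - sum(rewards)


def play_uniform(p, n, seed=0):
    """Baseline policy: pick an arm uniformly at random at every step."""
    rng = random.Random(seed)
    return [pull(p, rng.randrange(len(p)), rng) for _ in range(n)]


p = [0.2, 0.5, 0.8]              # hypothetical arm probabilities
rewards = play_uniform(p, 1000)
print(regret(p, rewards))         # grows roughly linearly in n for this baseline
```

A policy that never learns pays a constant price per step, hence linear regret. The point of bandit algorithms is to do much better than that.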

Bandits – a classical solution

Regret: $R_n = n \max_k p_k - \sum_{j=1}^{n} r_j$

UCB1: at time step $i$, choose $u$ maximizing the compromise:

(empirical average for decision $u$) $+ \sqrt{\log(i) / (\text{number of trials with decision } u)}$

==> optimal regret $O(\log(n))$

(Lai et al; Auer et al)
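A minimal sketch of UCB1 with exactly the score written above. Trying each arm once before applying the formula is a common convention assumed here; the arm probabilities and the seed are illustrative.

```python
import math
import random


def ucb1_choose(wins, trials, i):
    """Pick the arm maximizing empirical average + sqrt(log(i) / trials)."""
    for arm, t in enumerate(trials):
        if t == 0:                # assumption: try every arm once first
            return arm
    scores = [w / t + math.sqrt(math.log(i) / t) for w, t in zip(wins, trials)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb1(p, n, seed=0):
    """Play n steps of UCB1 on Bernoulli arms with probabilities p."""
    rng = random.Random(seed)
    wins = [0] * len(p)
    trials = [0] * len(p)
    for i in range(1, n + 1):
        u = ucb1_choose(wins, trials, i)
        wins[u] += 1 if rng.random() < p[u] else 0
        trials[u] += 1
    return wins, trials


wins, trials = run_ucb1([0.2, 0.5, 0.8], n=5000)
print(trials)   # how often each arm was chosen
```

The second term shrinks as an arm is tried more often, so rarely tried arms are revisited from time to time (exploration) while the empirical average favours arms that have paid well so far (exploitation).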

Infinite bandit: progressive widening

UCB1: at time step $i$, choose $u$ maximizing the compromise:

(empirical average for decision $u$) $+ \sqrt{\log(i) / (\text{number of trials with decision } u)}$

==> argmax only over the first $i^{\alpha}$ arms, with $\alpha \in [0.25, 0.5]$

(Coulom, Chaslot et al, Wang et al)
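A minimal sketch of progressive widening on top of the UCB1 score above. It assumes the arms are already sorted (for instance by a heuristic), that the number of eligible arms at step i is the integer part of i to the power alpha (at least 1), and that alpha defaults to 0.5; the function names are illustrative.

```python
import math


def widened_arms(i, alpha=0.5):
    """Number of arms eligible at time step i: floor(i ** alpha), at least 1."""
    return max(1, int(math.floor(i ** alpha)))


def ucb1_widened(wins, trials, i, alpha=0.5):
    """UCB1 restricted to the first widened_arms(i) arms."""
    k = min(widened_arms(i, alpha), len(trials))
    for arm in range(k):
        if trials[arm] == 0:      # a newly unlocked arm is tried once
            return arm

    def score(arm):
        return wins[arm] / trials[arm] + math.sqrt(math.log(i) / trials[arm])

    return max(range(k), key=score)


# At step 2 only the first arm is eligible; at step 5 the second one opens up.
print(ucb1_widened([0, 0, 0, 0], [1, 0, 0, 0], i=2))
print(ucb1_widened([0, 0, 0, 0], [1, 0, 0, 0], i=5))
```

With many (or infinitely many) arms, this avoids spending the whole budget trying each arm once: new arms are unlocked slowly as the number of simulations grows.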

Bandits: much more

What is a bandit:

- a criterion (here a bandit) defines the problem

- usually a score (typically exploration + exploitation) defines a criterion

==> an optimal score for one criterion is not optimal for another ==> a wide literature

Bandits and trees

- we have seen the definition of discrete time control problems;

- we have seen what bandits are;

- we now introduce trees and UCT

Coulom (06); Chaslot, Saito & Bouzy (06); Kocsis & Szepesvari (06)

UCT (Upper Confidence Trees)

Kocsis & Szepesvari (06)

Exploitation ...

... or exploration?

Go: from 29 to 6 stones

Asymptotically optimal move.

But the whole tree is visited infinitely often!
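The idea can be sketched on a toy game. The code below is not MoGo's implementation: it is a minimal UCT, assuming a single-pile Nim (take 1 to 3 stones, whoever takes the last stone wins), uniformly random playouts and the UCB1 score from the bandit slides. All names are illustrative.

```python
import math
import random


class Node:
    def __init__(self, pile, player):
        self.pile = pile        # stones left
        self.player = player    # player to move at this node (0 or 1)
        self.children = {}      # move -> Node
        self.visits = 0
        self.wins = 0.0         # wins for the player who moved INTO this node


def legal_moves(pile):
    return [m for m in (1, 2, 3) if m <= pile]


def rollout(pile, player, rng):
    """Random playout; returns the player who takes the last stone."""
    while True:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            return player
        player = 1 - player


def select_child(node):
    """UCB1 over the children: empirical average + sqrt(log(N) / n)."""
    def score(child):
        bonus = math.sqrt(math.log(node.visits) / child.visits)
        return child.wins / child.visits + bonus
    return max(node.children.values(), key=score)


def uct(pile, n_simulations, seed=0):
    rng = random.Random(seed)
    root = Node(pile, player=0)
    for _ in range(n_simulations):
        node, path = root, [root]
        # Descend while the node is non-terminal and fully expanded.
        while node.pile > 0 and len(node.children) == len(legal_moves(node.pile)):
            node = select_child(node)
            path.append(node)
        # Expand one untried move.
        if node.pile > 0:
            move = next(m for m in legal_moves(node.pile) if m not in node.children)
            node.children[move] = Node(node.pile - move, 1 - node.player)
            node = node.children[move]
            path.append(node)
        # Evaluate: terminal nodes are won by the player who just moved.
        if node.pile == 0:
            winner = 1 - node.player
        else:
            winner = rollout(node.pile, node.player, rng)
        # Back up the result along the path.
        for visited in path:
            visited.visits += 1
            if winner != visited.player:
                visited.wins += 1
    best = max(root.children, key=lambda m: root.children[m].visits)
    return best, root


best_move, root = uct(pile=5, n_simulations=200)
print(best_move, {m: c.visits for m, c in root.children.items()})
```

Each node runs its own bandit over its children, so simulations concentrate on promising branches while every branch keeps being visited occasionally.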

What is used in implementations that work?

Formula for simulation

Not consistent! Sometimes:
- a good move might have 0/1
- a bad move 1/(N-1) after N simulations
==> we only simulate the bad move!

Other (better) estimates, but still inconsistent

$\arg\max \dfrac{\text{nbWins} + 1}{\text{nbLosses} + 2}$

==> consistency ==> frugality
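A small numerical check of the 0/1 trap. The inconsistent formulas of the slides are not reproduced in this text, so the plain empirical win rate is used below as a stand-in (an assumption); the second function is the score quoted above. The counts are made up.

```python
def naive_rate(wins, losses):
    """Plain empirical win rate, nbWins / nbSimulations (0 if untried)."""
    total = wins + losses
    return wins / total if total else 0.0


def smoothed_score(wins, losses):
    """The score quoted above: (nbWins + 1) / (nbLosses + 2)."""
    return (wins + 1) / (losses + 2)


good = (0, 1)      # good move, unlucky: 0 wins, 1 loss
bad = (10, 90)     # bad move: 10 wins, 90 losses

# A greedy choice on the naive rate keeps preferring the bad move ...
print(naive_rate(*good), naive_rate(*bad))
# ... whereas the smoothed score eventually comes back to the good move.
print(smoothed_score(*good), smoothed_score(*bad))
```

With the smoothed score, a move that keeps losing sees its score fall below that of a move tried only once, so the unlucky good move gets another chance without any explicit exploration term.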


Outline

● AI
● Games
● More technical stuff
● Go

Outline for Go

● Programs and results
● Comments on games
● Weaknesses
● Future?

Computers in 19x19 Go

1998: ManyFaces loses with 29 stones against M. Mueller

2008: win against a pro (8p) 19x19, H9, MoGo
2008: win against a pro (4p) 19x19, H8, CrazyStone
2008: win against a pro (4p) 19x19, H7, CrazyStone
2009: win against a pro (9p) 19x19, H7, MoGo
2009: win against a pro (1p) 19x19, H6, MoGo

● But also many losses with similar handicap.
● Recently:
- 9p wins everything with H7
- All strong bots at the same level

Reaching human level in 9x9

2007: win against a pro (5p) 9x9 (blitz), MoGo
2008: win against a pro (5p) 9x9, white, MoGo
2009: win against a pro (5p) 9x9, black, MoGo
2009: win against a pro (9p) 9x9, white, Fuego
2009: win against a pro (9p) 9x9, black, MoGoTW

In 19x19 ==> still 6 stones at least!

Programs today

Strong programs:
● Fuego (Canada)
● ManyFaces (USA)
● MoGo (France)
● Zen (Japan)
● CrazyStone (France)
● MoGoTW (France-Taiwan)
● Maybe KCC Igo? (North Korea, doubts around plagiarism)

A lot of technical similarities:
● All are UCT / MCTS / BBMCP (yesterday's talk, sorry!)
● All have “sequence-like” Monte-Carlo
● All have massive parallelization

Outline

● Programs and results
● Comments on games
● Weaknesses
● Future?

Bots strong in fights

Zen vs Shen-Su Chang 6D

Mistake! Group dead...

...except when there are multiple unfinished fights

Humans keep in memory their solutions to local fights. Computers don't.

==> multiple unfinished fights make the program slower, and therefore weaker everywhere on the board

Multiple unresolved fights

9x9 opening books

● Are there mistakes in the 9x9 opening books of programs?
– Self-built (MoGo) (small expert initialization)
– Handcrafted by experts (Fuego, Zen)

9x9 handcrafted opening book (Fuego, black)

Move 1

Move 2

Move 3: lots of debates; error or not error?

9x9 handcrafted opening book (Fuego, black) with komi 7.5

Move 1

Move 2

Move 3: lots of debates; error or not error?

9x9 handcrafted opening book (Fuego, white; short opening, wins)

9x9 handcrafted opening book (Zen, black; short opening, wins)

Comments?

9x9 handcrafted opening book (Zen, white; short opening, wins)

Comments?

Almost self-built 9x9 opening book (MoGo, white)

According to some observers:
- opening OK
- bad move later

Do you agree with this?

Almost self-built 9x9 opening book (MoGo, black)

Comments? Correct opening for black?

Bots too aggressive

MoGo trying to kill the two white stones

Outline

● Results
● Comments on games
● Weaknesses
● Future?

Weaknesses

● Stupid moves in desperate situations
● Computers stupid in liberty races
● Too many stones for securing the center
● Life and death problems

Computers play stupid moves when in a desperate situation

Is it because computers are crazy?

Computers choose moves with the highest probability of winning:

– Reasonable moves have winning probability 0 (in desperate situations)

– Some strange moves have probability 0.01 (well, if the human is weak, or sleeping or drunk...)

If your opponent has a nuclear weapon, and you have just a sword, use the sword.

==> Leads to stupid endgames

==> But does not change the overall probability of success (you're dead anyway)
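The argument fits in a few lines. The move names and probabilities below are made up for illustration; the only point is that a program maximizing the estimated winning probability prefers the move with 0.01 over reasonable-looking moves with 0.

```python
def choose_move(win_probability):
    """Return the move with the highest estimated winning probability."""
    return max(win_probability, key=win_probability.get)


# Hypothetical estimates in a lost position.
estimates = {
    "solid defence": 0.0,
    "reasonable endgame move": 0.0,
    "strange overplay": 0.01,
}
print(choose_move(estimates))
```

The program does not care about the margin of the loss, only about the probability of winning, which is why its play looks stupid once the game is decided.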

Weaknesses

● (Stupid moves in desperate situations)
● Computers stupid in liberty races (semeai)
● Too many stones for securing the center
● Life and death problems

Computers stupid in semeai

Semeai (for people who don't play Go)

Plenty of equivalent situations!

They are randomly sampled, with no generalization.

50% of estimated win probability!

It does not work. Why?

In each node up in the tree:
● The first simulations ==> ~ 50%
● Later, simulations go to 100% or 0% (depending on the chosen move)
● But, then, we switch to another node (~ 8! x 8! such nodes)

And the humans? 50% of estimated win probability!

In each node up in the tree:
● The first simulations ==> ~ 50%
● Later, simulations go to 100% or 0% (depending on the chosen move)
● But, then, the human does not switch to another node! Humans know that the order does not matter.
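A rough count behind the “8! x 8!” figure, assuming each side has 8 interchangeable liberty-filling moves (an interpretation of the slide, not a statement about a specific position). The function names are illustrative.

```python
import math
from itertools import permutations


def count_orderings(k):
    """Move orders when each side fills its k liberties in any order."""
    return math.factorial(k) ** 2


def distinct_final_positions(k):
    """The same enumeration, but positions compared as sets of stones."""
    black = [f"b{i}" for i in range(k)]
    white = [f"w{i}" for i in range(k)]
    finals = {
        frozenset(b) | frozenset(w)
        for b in permutations(black)
        for w in permutations(white)
    }
    return len(finals)


print(count_orderings(8))            # 1625702400 orderings for 8 liberties per side
print(count_orderings(3))            # 36 orderings on a smaller example ...
print(distinct_final_positions(3))   # ... which all reach 1 and the same position
```

A tree search that treats every ordering as a separate node has to relearn the same result again and again, whereas a human reasons about the set of liberties once.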

Semeais

Should white play in the semeai (G1) or capture (J15)?

Semeais

Should black play the semeai?

Useless!

Weaknesses

● (Stupid moves in desperate situations)
● Computers stupid in liberty races
● Too many stones for securing the center
● Life and death problems

Computers spend too many stones on the center

● We spent 24h of CPU to find out which first move is the best according to Monte-Carlo programs.

● And the result is ...

...probably not the right answer.

But nowadays MoGo agrees that K10 is a bad idea.

● Less true than two years ago (your opinion?)

● Essential reason for this improvement: more diversity in the Monte-Carlo

ManyFaces

Are there too many stones securing the center?

Weaknesses

● (Stupid moves in desperate situations)
● Computers stupid in liberty races
● Too many stones for securing the center?
● Life and death

Example: life and death problem

Major improvements since 2006

● Much better default policy (in MoGo, default policy by Y. Wang ==> “sequence-like simulations”, now more or less in all efficient programs)
● Multi-core parallelization
● Message-passing parallelization
● Bias in the tree (patterns, rules)
● RAVE values ===> permutation
● Opening book by meta-UCT
● Diversity preservation

Just other bricks in the wall?

Deep limitations remain.

● Recent improvements = big improvement in self-play
– Parallelization
– Opening books

● But small progress against humans.

● No - some situations are very poorly handled by all programs.

The wall

● For sure, MoGo won one game with H7 against a top pro (9p, winner of LG Cup).

● But MoGo then lost many games with H7.

● Zen and ManyFaces recently also lost all their H7 games against 9p players.

Just other bricks in the wall?

==> many improvements but no “big” improvement against humans;

Limit = long-term effects (life and death, semeai);

==> many trials, no success.

(conditional Monte-Carlo, learning...)

Outline

● Results
● Comments on games
● Weaknesses
● Future?

More patterns? Probably not.

Optimization of the tree policy (bandit formula)?

● Already very complicated in Go

● Only minor improvements

● Interesting for other applications, probably not for Go

Optimization of the Monte-Carlo

● Handcrafted for the moment

● Some of the biggest recent improvements

● Won't solve liberty races

Conditional Monte-Carlo – mixing Monte-Carlo and tactical search

● Remove simulations which are not consistent with tactical solvers

● Nice idea

● Not yet efficient

Idea: remove simulations with black alive without killing white

Don't overestimate tactical solvers

● When MoGo does not solve a situation, people often tell me “simple solvers solve this”

● This is not true:
– Solvers solve it if you remove everything other than the problem from the goban.
– They don't solve it in a real situation, within a complete goban.

Contextual Monte-Carlo

● Keep statistics in order to improve the Monte-Carlo during a game

● Nice idea

● Not yet (very) efficient

Learning in Monte-Carlo Go

● If C5 E5 is absolutely obvious after D3, the program will perhaps find it: i.e. it will only consider sequences with D3 followed by C5 E5

● It will do the computation once for D3 as a first move

● And once more each time D3 is considered in the tree ==> humans certainly don't do that!

● We want to generalize from one branch to another

Opening book in 9x9?

● Still too small, and sometimes bad moves (humans: your opinion?).

● We already spent over 1 century of CPU time building it

● Possibilities:
– Building opening books by playing against pros instead of self-play (needs plenty of pro brains instead of CPU-years)
– Using something like seti@home
– Or handcrafted opening books (as in Fuego)?

Thanks! + Bibliography

Bandits: Lai, Robbins, Auer, Cesa-Bianchi...
UCT: Kocsis, Szepesvari, Coquelin, Munos...
MCTS (Go): Coulom, Chaslot, Fiter, Gelly, Hoock, Silver, Muller, Pérez, Rimmel, Wang...
Tree + DP for industrial applications: Péret, Garcia...
Bandits with infinitely many arms: Audibert, Coulom, Munos, Wang...
Applications far from Go: Rolet, Teytaud (F), Rimmel, De Mesmay...
Links with “macro-actions”

