olivier-teytaud
English & Spanish, presented in Cadiz, 2010
1
AI and Go in Cadiz, Spain
Artificial Intelligence and Go
Thanks to all contributors to this event. Thanks to all MoGo people and MoGoTW people.
Inria, CNRS, LRI, Univ. Paris-Sud, NUTN, CJCU, G5K, Univ. Maastricht, Sara.nl...
Cadiz, December 2009.
2
This talk is about:
● Artificial intelligence
● The game of Go, and why the game of Go is interesting for artificial intelligence.
3
– My expertise: artificial intelligence, not Go.
– My level in Go: 15 kyu (at best).
● I only repeat comments from good players.
● I will ask for your opinion during the talk (we'll have fun).
4
Outline
● AI (artificial intelligence)
● Games
● More technical stuff
● Go
5
Artificial intelligence
● Find things that are difficult for computers
● Solve them
● Use the solutions for important applications
6
Artificial intelligence
● Find things which are difficult for computers (and possibly easy for humans)
● Solve them
● Use the solutions for important applications
7
Difficult for computers
8
Easy for computers
9
Difficult for computers
We'll see much easier situations that are poorly understood.
(komi 7.5)
10
Difficult for computers
● Cooking
● Washing the floor
● Medical solutions
11
Discrete-time, high-dimensional control = an important family of problems.
No satisfactory solution in many cases.
A challenge: doing as well as humans.
12
Discrete-time, high-dimensional control
= an important family of problems.
No satisfactory solution in many cases.
A challenge: doing as well as humans.
13
Discrete-time, high-dimensional control
Example of an important application: planning of electricity production. Similarity with Go:
Playing a stone = choosing power plants
Moves of the opponent = weather + demand + technical failures
14
Why apply knowledge from Go to electricity production?
The first versions of new technologies always fail.
Losing a game of Go is not a problem.
15
Why test on the game of Go instead of on electricity production planning?
The first trials of new techniques always fail.
Losing a game of Go = no problem.
16
Why test on the game of Go instead of on electricity production planning?
17
Stupid moves: OK for Go, not for electricity production
MoGo vs Catalin Taranu, 2008
18
Discrete-time, high-dimensional control
● Artificial intelligence = useful!
Not artificial intelligence!
20
Discrete-time, high-dimensional control
● Tools for the game of Go have already been used in many other applications.
YES!
21
Recently: big improvements
New techniques developed in games (in particular in Go) (2006-2009).
Very efficient in other applications as well.
I promise it's true: I don't do computer Go just for Go :-)
22
Outline
● AI
● Games
● More technical stuff
● Go
23
Games
● We'll see the case of Go later.
● Here, other games in which humans are stronger than computers (by far, for now).
24
Difficult games: Havannah
Players alternately fill one location each.
Linking two corners, or three edges, or making a cycle = winning.
Very difficult for computers.
25
What else? First-person shooters (partially observable)
26
What else? Strategy games (multiple actors, partially observable)
27
What else? Sports (continuous control)
28
“Real” games
Assumption: if a computer understands and guesses spins (for instance on a ping-pong serve), then this robot will be useful for something other than just games.
(This holds true for Go.)
29
“Real” games
Assumption: if a computer understands and guesses spins, then this robot will be useful for something other than just games.
VS
30
What else? Collaborative sports
31
Playing = working!
GWAP = games with a purpose
You play on the web:
● The computer shows an image and a list of taboo words
32
GWAP = games with a purpose
You play on the web:
● The computer shows an image and a list of taboo words
● You type some words describing the image
33
GWAP = games with a purpose
You play on the web:
● The computer shows an image and a list of taboo words
● You type some words describing the image
● Other people are playing the same game (same image)
34
GWAP = games with a purpose
You play on the web:
● The computer shows an image and a list of taboo words
● You type some words describing the image
● Other people are playing the same game (same image)
● You earn points each time you find a word which
  ● has also been chosen by at least one other person, and
  ● is not in the taboo list
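The scoring rule above can be sketched as a small function. This is only an illustration under stated assumptions: one point per distinct matching word and case-insensitive matching are our choices, and `gwap_score` is a hypothetical name, not part of any real GWAP system.

```python
def gwap_score(my_words, other_players_words, taboo):
    """Score one round of the image-labelling game.

    Hypothetical rule: one point per distinct word of mine that at least one
    other player also typed for the same image and that is not taboo.
    Case-insensitive matching is an assumption."""
    mine = {w.lower() for w in my_words}
    taboo_set = {w.lower() for w in taboo}
    others = set()
    for words in other_players_words:      # one list of words per other player
        others.update(w.lower() for w in words)
    return sum(1 for w in mine if w in others and w not in taboo_set)


print(gwap_score(["dog", "grass", "ball", "Sky"],
                 [["Dog", "tree"], ["ball", "sky"]],
                 taboo=["sky"]))  # prints 2 ("dog" and "ball"; "sky" is taboo)
```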
35
Playing = working!
Incidentally, you are helping a program that classifies images.
Humans are still necessary in the process :-)
There are many other “games with a purpose” (you play, and the result is useful work).
36
Outline
● AI
● Games
● More technical stuff
● Go
Monte-Carlo Tree Search
● Monte-Carlo Tree Search (MCTS) appeared in games.
● Its best-known variant is called Upper Confidence Trees (UCT).
● Here I present UCT:
  – bandits;
  – the Monte-Carlo approach to tree search;
  – UCT.
A “bandit” problem: choosing between exploration and exploitation
● $p_1,\dots,p_N$: unknown probabilities in $[0,1]$
● At each time step $i \in \{1,\dots,n\}$:
  – choose $u_i \in \{1,\dots,N\}$ (as a function of $u_j$ and $r_j$, $j<i$);
  – with probability $p_{u_i}$, win ($r_i=1$);
  – otherwise, lose ($r_i=0$).
A “bandit” problem: the target
● $p_1,\dots,p_N$: unknown probabilities in $[0,1]$
● At each time step $i \in \{1,\dots,n\}$:
  – choose $u_i \in \{1,\dots,N\}$ (as a function of $u_j$ and $r_j$, $j<i$);
  – with probability $p_{u_i}$, win ($r_i=1$);
  – otherwise, lose ($r_i=0$).
Regret: $R_n = n\,\max_k p_k - \sum_{j=1}^{n} r_j$
How can we minimize the regret (in the worst case over $p$)?
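The bandit game and its regret can be simulated directly. A minimal sketch, assuming Bernoulli rewards as defined above; `play_bandit` and the policy-as-function interface are our own illustrative choices. Arms are indexed from 0 in the code, so `probs[k]` plays the role of $p_{k+1}$.

```python
import random


def play_bandit(probs, policy, n, seed=0):
    """Play n rounds of the Bernoulli bandit.

    policy(history) sees only the past (arm, reward) pairs and returns an arm
    index. Returns the history and the realised regret
    R_n = n * max(p) - sum of rewards."""
    rng = random.Random(seed)
    history = []
    for _ in range(n):
        u = policy(history)                      # u_i depends only on u_j, r_j, j < i
        r = 1 if rng.random() < probs[u] else 0  # win with probability p_{u_i}
        history.append((u, r))
    regret = n * max(probs) - sum(r for _, r in history)
    return history, regret


# A policy that ignores everything it sees and always plays arm 0:
_, regret = play_bandit([0.3, 0.7], lambda history: 0, n=1000)
print(regret)  # about 400: 700 expected for the best arm minus roughly 300 wins
```

A policy that never explores pays regret linear in $n$ whenever it sticks to a suboptimal arm; the solution on the next slide keeps it logarithmic.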
Bandits: a classical solution
Regret: $R_n = n\,\max_k p_k - \sum_{j=1}^{n} r_j$
UCB1: choose $u$ maximizing the compromise
$\bar{r}_u + \sqrt{\log(i) / n_u}$,
where $\bar{r}_u$ is the empirical average reward of decision $u$ and $n_u$ is the number of trials with decision $u$.
==> optimal regret $O(\log n)$
(Lai et al.; Auer et al.)
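UCB1 as written on the slide can be sketched in a few lines. Note an assumption: Auer et al.'s UCB1 uses the bonus $\sqrt{2\log(i)/n_u}$; here we follow the slide and drop the constant. Untried decisions are played first, since their bonus would be infinite. The function names are illustrative.

```python
import math
import random


def ucb1_choose(counts, sums, i):
    """argmax_u  sums[u]/counts[u] + sqrt(log(i) / counts[u]).

    counts[u] = n_u (number of trials of decision u), sums[u] = total reward.
    Decisions never tried are played first."""
    for u, n_u in enumerate(counts):
        if n_u == 0:
            return u
    return max(range(len(counts)),
               key=lambda u: sums[u] / counts[u] + math.sqrt(math.log(i) / counts[u]))


def run_ucb1(probs, n, seed=0):
    """Play n steps of UCB1 on a Bernoulli bandit; return the trial counts."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    sums = [0] * len(probs)
    for i in range(1, n + 1):
        u = ucb1_choose(counts, sums, i)
        reward = 1 if rng.random() < probs[u] else 0
        counts[u] += 1
        sums[u] += reward
    return counts


print(run_ucb1([0.3, 0.7], n=2000))  # most trials should go to the 0.7 arm
```

The worse arm keeps being tried only as often as its shrinking bonus requires, which is what keeps the regret logarithmic.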
Infinite bandit: progressive widening
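UCB1: choose $u$ maximizing the compromise $\bar{r}_u + \sqrt{\log(i) / n_u}$ (empirical average of decision $u$, plus an exploration term; $n_u$ = number of trials with decision $u$)
==> argmax only over the first $\lfloor i^{\alpha} \rfloor$ arms, with $\alpha \in [0.25, 0.5]$
(Coulom; Chaslot et al.; Wang et al.)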
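Progressive widening only changes the set over which the argmax is taken. A minimal sketch, assuming the arms are already sorted by some prior (for example a heuristic ranking of moves), so that "the first arms" are the most promising ones; the ordering, the floor, and the names are our assumptions.

```python
import math


def widened_arm_count(i, alpha=0.25, n_arms=None):
    """Number of arms eligible at step i: floor(i ** alpha), at least 1."""
    k = max(1, math.floor(i ** alpha))
    return k if n_arms is None else min(k, n_arms)


def pw_ucb1_choose(counts, sums, i, alpha=0.25):
    """UCB1 restricted to the first widened_arm_count(i) arms.

    Assumes arms are pre-sorted by a prior, most promising first."""
    k = widened_arm_count(i, alpha, len(counts))
    for u in range(k):
        if counts[u] == 0:
            return u
    return max(range(k),
               key=lambda u: sums[u] / counts[u] + math.sqrt(math.log(i) / counts[u]))


for i in (1, 20, 100, 10_000):
    print(i, widened_arm_count(i), widened_arm_count(i, alpha=0.5))
```

With a small $\alpha$, new arms (moves) enter very slowly, so simulations concentrate on a few candidates even when the number of arms is huge or infinite.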
Bandits: much more
What is a bandit?
- a criterion (here a bandit) defines the problem;
- usually a score (typically exploration + exploitation) defines a criterion.
==> a score that is optimal for one criterion is not optimal for another ==> a wide literature
Bandits and trees
- we have seen the definition of discrete-time control problems;
- we have seen what bandits are;
- we now introduce trees and UCT.
Coulom (06); Chaslot, Saito & Bouzy (06); Kocsis & Szepesvari (06)
UCT (Upper Confidence Trees)
Kocsis & Szepesvari (06)
Exploitation...
...or exploration?
Go: from 29 to 6 stones
Asymptotically optimal move.
But the whole tree is visited infinitely often!
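A compact sketch of UCT as described above: a bandit (the UCB1 formula of the earlier slide) at every node of the tree, one new node per iteration, a random Monte-Carlo playout, and backpropagation of the result. The game is a toy assumption standing in for Go (a pile of stones, each player removes 1 or 2, whoever takes the last stone wins); all names are illustrative.

```python
import math
import random


def legal_moves(pile):
    return [m for m in (1, 2) if m <= pile]


def random_playout(pile, rng):
    """Random moves until the pile is empty; True if the player to move
    at `pile` took the last stone."""
    to_move_wins = False
    while pile > 0:
        pile -= rng.choice(legal_moves(pile))
        to_move_wins = not to_move_wins
    return to_move_wins


class Node:
    def __init__(self, pile):
        self.pile = pile
        self.children = {}  # move -> Node
        self.visits = 0
        self.wins = 0       # wins of the player who moved INTO this node


def uct_search(root_pile, n_iter, seed=0):
    rng = random.Random(seed)
    root = Node(root_pile)
    for _ in range(n_iter):
        node, path = root, [root]
        # 1. Selection: UCB1 choice at every fully expanded, non-terminal node.
        while node.pile > 0 and len(node.children) == len(legal_moves(node.pile)):
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.wins / ch.visits
                       + math.sqrt(math.log(parent.visits) / ch.visits))
            path.append(node)
        # 2. Expansion: add one untried move.
        if node.pile > 0:
            move = next(m for m in legal_moves(node.pile) if m not in node.children)
            node.children[move] = Node(node.pile - move)
            node = node.children[move]
            path.append(node)
        # 3. Monte-Carlo simulation from the new node.
        mover_wins = not random_playout(node.pile, rng)
        # 4. Backpropagation, switching point of view at each level.
        for nd in reversed(path):
            nd.visits += 1
            nd.wins += 1 if mover_wins else 0
            mover_wins = not mover_wins
    # Play the most simulated move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)


print(uct_search(7, n_iter=5000))
```

In this toy game, piles that are a multiple of 3 are lost for the player to move, so the search should settle on the move that leaves such a pile. As the slide says, the guarantee is only asymptotic, and every node keeps being revisited.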
What is used in implementations that work?
Formula for simulation
Not consistent! Sometimes, after N simulations:
- a good move has 0 wins out of 1 simulation;
- a bad move has 1 win out of N-1 simulations.
==> we only simulate the bad move!
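The failure above can be reproduced with a minimal sketch. We assume (our reading, since the formula image is missing) that the inconsistent rule is "simulate the move with the highest empirical win rate wins/simulations". The function name and setup are illustrative.

```python
import random


def greedy_on_average(p, n, first_rewards, seed=0):
    """Run n simulations in total, always on the move with the best empirical
    win rate wins/simulations. Each move starts with one simulation whose
    reward is first_rewards[k]. Returns the number of simulations per move."""
    rng = random.Random(seed)
    wins = list(first_rewards)
    sims = [1] * len(p)
    for _ in range(n - len(p)):
        u = max(range(len(p)), key=lambda k: wins[k] / sims[k])
        wins[u] += 1 if rng.random() < p[u] else 0
        sims[u] += 1
    return sims


# Good move (p = 0.9) lost its first simulation; bad move (p = 0.1) won its first.
print(greedy_on_average([0.9, 0.1], n=1000, first_rewards=[0, 1]))  # prints [1, 999]
```

The good move stays at 0/1 forever, while the bad move never drops to 0 once it has a win, so it gets every remaining simulation.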
Other (better) estimates, but still inconsistent
$\arg\max \dfrac{\text{nbWins} + 1}{\text{nbLosses} + 2}$
==> consistency ==> frugality
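The same unlucky start can be replayed with the $(\text{nbWins}+1)/(\text{nbLosses}+2)$ rule. A minimal sketch under the same assumptions as before (Bernoulli outcomes, one initial simulation per move, illustrative names). Each loss lowers a move's score, so the bad move loses its lead after a few losses and the good move is simulated again. A move that keeps winning keeps a high score, so few simulations are wasted on the others.

```python
import random


def frugal_choice(p, n, first_rewards, seed=0):
    """Run n simulations in total, always on the move maximizing
    (nbWins + 1) / (nbLosses + 2). Returns simulations per move."""
    rng = random.Random(seed)
    wins = list(first_rewards)
    losses = [1 - r for r in first_rewards]
    for _ in range(n - len(p)):
        u = max(range(len(p)), key=lambda k: (wins[k] + 1) / (losses[k] + 2))
        if rng.random() < p[u]:
            wins[u] += 1
        else:
            losses[u] += 1
    return [w + l for w, l in zip(wins, losses)]


print(frugal_choice([0.9, 0.1], n=1000, first_rewards=[0, 1]))
```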
57
Outline
● AI
● Games
● More technical stuff
● Go
58
Outline for Go
● Programs and results
● Comments on games
● Weaknesses
● Future?
Computers in 19x19 Go
1998: ManyFaces loses with 29 stones against M. Mueller
2008: win against a pro (8p), 19x19, H9, MoGo
2008: win against a pro (4p), 19x19, H8-H7, CrazyStone
2009: win against a pro (9p), 19x19, H7, MoGo
2009: win against a pro (1p), 19x19, H6, MoGo
● But also many losses with similar handicaps.
● Recently:
  - 9p players win everything at H7;
  - all strong bots are at the same level.
Reaching human level in 9x9
2007: win against a pro (5p), 9x9 (blitz), MoGo
2008: win against a pro (5p), 9x9, white, MoGo
2009: win against a pro (5p), 9x9, black, MoGo
2009: win against a pro (9p), 9x9, white, Fuego
2009: win against a pro (9p), 9x9, black, MoGoTW
2008: win against a pro (8p), 19x19, H9, MoGo
2008: win against a pro (4p), 19x19, H8, CrazyStone
2008: win against a pro (4p), 19x19, H7, CrazyStone
2009: win against a pro (9p), 19x19, H7, MoGo
2009: win against a pro (1p), 19x19, H6, MoGo
==> still at least 6 stones!
Programs today
Strong programs:
● Fuego (Canada)
● ManyFaces (USA)
● MoGo (France)
● Zen (Japan)
● CrazyStone (France)
● MoGoTW (France-Taiwan)
● Maybe KCC Igo? (North Korea; doubts about plagiarism)
A lot of technical similarities:
● all are UCT / MCTS / BBMCP (yesterday's talk, sorry!);
● all have “sequence-like” Monte-Carlo;
● all have massive parallelization.
62
Outline
● Programs and results
● Comments on games
● Weaknesses
● Future?
63
Bots strong in fights
Zen vs Shen-Su Chang 6D
64
Bots strong in fights
65
Bots strong in fights
Mistake! Group dead...
66
Bots strong in fights
...except when there are multiple unfinished fights.
Humans keep in memory their solutions to local fights. Computers don't.
==> multiple unfinished fights make the program slower, and therefore weaker everywhere on the board.
67
Multiple unresolved fights
68
9x9 opening books
● Are there mistakes in the 9x9 opening books of programs?
  – Self-built (MoGo) (small expert initialization)
  – Handcrafted by experts (Fuego, Zen)
69
9x9 handcrafted opening book (Fuego, black)
Move 1
Move 3: lots of debate; error or not?
Move 2
70
9x9 handcrafted opening book (Fuego, black) with komi 7.5
Move 1
Move 3: lots of debate; error or not?
Move 2
71
9x9 handcrafted opening book (Fuego, white; short opening, wins)
72
9x9 handcrafted opening book (Zen, black; short opening, wins)
Comments?
73
9x9 handcrafted opening book (Zen, white; short opening, wins)
Comments?
74
Almost self-built 9x9 opening book (MoGo, white)
According to some observers:
- opening OK;
- bad move later.
Do you agree with this?
75
Almost self-built 9x9 opening book (MoGo, black)
Comments? Correct opening for black?
76
Bots too aggressive
77
MoGo trying to kill the two white stones
78
Outline
● Results
● Comments on games
● Weaknesses
● Future?
79
Weaknesses
● Stupid moves in desperate situations
● Computers stupid in liberty races
● Too many stones for securing the center
● Life and death problems
80
Computers play stupid moves when in a desperate situation
81
Computers play stupid moves in risky situations
Are computers crazy?
Computers choose the moves with the highest chance of winning.
– In risky situations, reasonable moves have no chance of winning.
– Strange moves have a 0.01 chance (people may be asleep...).
82
Computers play stupid moves when in a desperate situation
Is it because computers are crazy?
Computers choose the moves with the highest probability of winning.
– Reasonable moves have winning probability 0 (in desperate situations).
– Some strange moves have probability 0.01 (well, if the human is weak, or sleeping, or drunk...).
83
Computers play stupid moves when in a desperate situation
If your opponent has a nuclear weapon and you have just a sword, use the sword.
==> Leads to stupid endgames
==> But does not change the overall probability of success (you're dead anyway)
84
Weaknesses
● (Stupid moves in desperate situations)
● Computers stupid in liberty races (semeai)
● Too many stones for securing the center
● Life and death problems
85
Computers stupid in semeai
Semeai, for people who don't play Go
Plenty of equivalent situations!
They are randomly sampled, with no generalization.
50% estimated win probability!
It does not work. Why?
In each node up in the tree:
● the first simulations give ~50%;
● later, simulations go to 100% or 0% (depending on the chosen move);
● but then we switch to another node (~8! × 8! such nodes).
And the humans?
In each node up in the tree:
● the first simulations give ~50%;
● later, simulations go to 100% or 0% (depending on the chosen move);
● but then the human does not switch to another node! We humans know that the order does not matter.
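For scale, reading the slide's ~8! × 8! as the number of ways of ordering eight moves for each side (our interpretation of the missing figure), the number of tree nodes that all describe the same semeai is:

$$8! \times 8! = 40\,320 \times 40\,320 = 1\,625\,702\,400 \approx 1.6 \times 10^{9}$$

A program that does not recognize that the order does not matter may end up re-estimating the same local result in over a billion separate nodes.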
99
Semeais
Should white play in the semeai (G1) or capture (J15)?
100
Semeais
Should black play the semeai?
101
Semeais
Should black play the semeai?
102
Semeais
Should black play the semeai?
Useless!
103
Weaknesses
● (Stupid moves in desperate situations)
● Computers stupid in liberty races
● Too many stones for securing the center
● Life and death problems
104
Computers spend too many stones on the center
● We spent 24 hours of CPU time to find which move is best at the very beginning of the game, according to Monte-Carlo programs.
● And the result is...
105
Computers spend too many stones on the center
...probably not the right answer.
106
Computers spend too many stones on the center
But nowadays MoGo agrees that K10 is a bad idea.
107
Computers spend too many stones on the center
● Less true than two years ago (your opinion?)
● Essential reason for this improvement: more diversity in the Monte-Carlo simulations
108
ManyFaces
Are there too many stones securing the center?
109
Weaknesses
● (Stupid moves in desperate situations)
● Computers stupid in liberty races
● Too many stones for securing the center?
● Life and death
110
Example: life and death problem
111
Major improvements since 2006
● Much better default policy (in MoGo, default policy by Y. Wang ==> “sequence-like simulations”, now in more or less all efficient programs)
● Multi-core parallelization
● Message-passing parallelization
● Bias in the tree (patterns, rules)
● RAVE values ==> permutation
● Opening book by meta-UCT
● Diversity preservation
112
Just other bricks in the wall?
Deep limitations remain.
● Recent improvements = big improvement in self-play:
  – parallelization;
  – opening books.
● But small progress against humans.
● No: some situations are very poorly handled by all programs.
113
The wall
● Admittedly, MoGo won one game at H7 against a top pro (9p, winner of the LG Cup).
● But MoGo then lost many games at H7.
● Zen and ManyFaces recently also lost all their H7 games against 9p players.
114
Just other bricks in the wall?
==> many improvements, but no “big” improvement against humans;
(limitation = long-term effects: life and death, semeai)
==> many trials, no success (conditional Monte-Carlo, learning...).
115
Outline
● Results
● Comments on games
● Weaknesses
● Future?
116
More patterns? Probably not.
117
Optimization of the tree policy (bandit formula)?
● Already very complicated in Go
● Only minor improvements
● Interesting for other applications, probably not for Go
118
Optimization of the Monte-Carlo
● Handcrafted for the moment
● Some of the biggest recent improvements
● Won't solve liberty races
119
Conditional Monte-Carlo: mixing Monte-Carlo and tactical search
● Remove simulations which are not consistent with tactical solvers
● Nice idea
● Not yet efficient
120
Idea: remove simulations with black alive without killing white
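The filtering idea can be sketched as an estimator that only counts the simulations a tactical check accepts. This is an illustration under assumptions: we suppose a tactical solver has established that black cannot live without killing white, and we represent each playout by a hypothetical dictionary of final facts. `is_consistent` is a placeholder hook, not a real solver API.

```python
def conditional_win_rate(simulations, is_consistent):
    """Win-rate estimate from only the simulations a tactical check accepts.

    simulations: list of (outcome, facts) pairs, outcome 1 = win, 0 = loss.
    is_consistent: hypothetical hook standing in for a tactical solver."""
    kept = [outcome for outcome, facts in simulations if is_consistent(facts)]
    if not kept:
        return None  # every simulation was rejected: no estimate
    return sum(kept) / len(kept)


def not_alive_without_kill(facts):
    # The slide's idea: reject playouts where black lives without killing white.
    return not (facts["black_alive"] and not facts["white_killed"])


sims = [
    (1, {"black_alive": True,  "white_killed": False}),  # rejected
    (1, {"black_alive": True,  "white_killed": True}),
    (0, {"black_alive": False, "white_killed": False}),
    (0, {"black_alive": False, "white_killed": True}),
]
print(conditional_win_rate(sims, not_alive_without_kill))  # 1 win out of 3 kept
```

The hard part, as the slides note, is the tactical check itself: discarding playouts is cheap only if the solver's verdict is reliable on the full board.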
121
Don't overestimate tactical solvers
● When MoGo does not solve a situation, people often tell me “simple solvers solve this”.
● This is not true:
  – solvers solve it when you remove everything other than the problem from the goban;
  – they don't solve it in the real situation, on a complete goban.
122
Contextual Monte-Carlo
● Keep statistics in order to improve the Monte-Carlo during a game
● Nice idea
● Not yet (very) efficient
123
Learning in Monte-Carlo Go
● If C5 E5 is absolutely obvious after D3, the program will perhaps find it, i.e. it will only consider sequences with D3 followed by C5 E5.
● It will do this computation once for D3 as a first move,
● and once again each time D3 is considered elsewhere in the tree ==> humans certainly don't do that!
● We want to generalize from one branch to another.
124
Opening book in 9x9?
● Still too small, and sometimes bad moves (humans: your opinion?).
● We have already spent over a century of CPU time building it.
● Possibilities:
  – building opening books by playing against pros instead of self-play (needs plenty of pro brains instead of CPU-years);
  – using something like seti@home;
  – or handcrafted opening books (as in Fuego)?
Thank you! + Bibliography
● Bandits: Lai, Robbins, Auer, Cesa-Bianchi...
● UCT: Kocsis, Szepesvari, Coquelin, Munos...
● MCTS (Go): Coulom, Chaslot, Fiter, Gelly, Hoock, Silver, Muller, Pérez, Rimmel, Wang...
● Tree + DP for industrial applications: Péret, Garcia...
● Bandits with infinitely many arms: Audibert, Coulom, Munos, Wang...
● Applications far from Go: Rolet, Teytaud (F), Rimmel, De Mesmay...
● Links with “macro-actions”