24
By Lauren Gillespie, Gabby Gonzales and Jacob Schrum [email protected] [email protected] [email protected] Howard Hughes Medical Institute Comparing Direct and Indirect Encodings Using Both Raw and Hand-Designed Features in Tetris Southwestern University

Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum [email protected] [email protected]

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

By Lauren Gillespie, Gabby Gonzales and Jacob [email protected] [email protected] [email protected]

Howard Hughes Medical Institute

Comparing Direct and Indirect Encodings Using Both

Raw and Hand-Designed Features in Tetris

Southwestern University

Page 2: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

Introduction• Challenge: Use less domain-specific knowledge

✦ Important for general game-playing agents

✦ Requires using raw features

✦ Difficult to train agents

• This Research

✦ Compare evolutionary algorithm HyperNEAT to NEAT

✦ See if indirect encoding of HyperNEAT advantageous

✦ Also compare with hand-designed features

Page 3: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

Tetris Domain• Consists of 10 x 20 game board

• Orient tetrominoes to clear lines

• Clearing multiple lines = more points

• Hole: open spaces with at least one block above

• Previous Work ✦ All use hand-designed features

✦ Reinforcement learning† and evolutionary computation‡

†Bertsekas et al. 1996. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming. ‡ Boumaza 2009. On the Evolution of Artificial Tetris Players.

Page 4: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

• One feature per game state element

• Little input processing ✦ Less limited by

domain† ✦ Less human expertise

needed ✦ Large input space &

networks ✦ Harder to learn, more

time

Hand-Designed vs. Raw Features

• Hand-picked information of game state as input

• User processes input ✦ Smaller input space,

easier to learn ✦ Very domain-

specific, not versatile ✦ Human expertise

needed

Pros:

Cons:

Hand-Designed Raw

Pros:

Cons:

Page 5: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

NEAT vs. HyperNEAT

NEAT HyperNEAT

Evolved network and agent network

Direct Encoding Indirect Encoding

Evolved network (CPPN)

Agent network

UTILITY

X

Y

X

Y

Bias

… ……

Page 6: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

• Board configuration: ✦ 2 input sets: location of all blocks, location of all holes

• NEAT: Inputs sets given as linear sequence • HyperNEAT: Two 2D input substrates

Raw Features Setup

Substrate layers

Input substrates CPPN

UTILITY

X

Y

X

Y

Bias

Game State

detailed view

Agent network… ……

Page 7: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

• Bertsekas et al. features† plus additional hole per column feature

• All scaled to [0,1]

✦ Column height

✦ Height difference

✦ Tallest column

✦ Number of holes

✦ Holes per column

Hand-Designed Features Setup

† Bersekas et al. 1996. Neuro-Dynamic Programming

X

Y

X

Y

Bias

UTILITY

HEIGHTS DIFFS HOLES

MAX HEIGHT

TOTAL HOLES

✦ ✦ ✦

HyperNEAT Setup

Page 8: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

NEAT vs. HyperNEAT: Raw Features

0

50

100

150

200

250

300

350

400

0 100 200 300 400 500

Gam

e S

core

Generation

HyperNEAT RawNEAT Raw

Page 9: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

NEAT vs. HyperNEAT: Hand-Designed Features

0

5000

10000

15000

20000

25000

30000

35000

0 100 200 300 400 500

Gam

e S

core

Generation

HyperNEAT FeaturesNEAT Features

Page 10: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

Raw Features Champion Behavior

NEAT with Raw Features HyperNEAT with Raw Features

Page 11: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

Hand-Designed Features Behavior

NEAT with Hand-Designed Features

HyperNEAT with Hand-Designed Features

Page 12: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

Conclusion• Raw features

✦ Indirect encoding of HyperNEAT effective

✦ Geometric awareness an advantage

• Hand-designed features

✦ Ultimately NEAT produced better agents

• Future work:

✦ HybrID might combine strengths of both

✦ Raw Features in other domains, Ms. Pac-Man

Page 13: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

Questions?• Contact info:

[email protected] [email protected] [email protected]

• Movies and Code:  https://tinyurl.com/tetris-gecco2017

Page 14: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Auxiliary Slides

Page 15: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

Visualizing Substrates

Hidden OutputInputs Result

Page 16: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

Experimental Setup• Agent networks evaluated each piece placement

• Each experiment evaluated with 30 runs

✦ 500 generations/run, 50 agents/generation

✦ Objectives averaged across 3 trials/agent

❖ Noisy domain, multiple trials needed

• NSGA-II objectives: game score & survival time

Page 17: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

NSGA-II• Pareto-based multiobjective EA optimization • Parent population, μ, evaluated in domain • Child population, λ, evolved from μ and evaluated • μ + λ sorted into non-dominated Pareto fronts

• Pareto front: All individual such that • v = (v1, . . . ,vn) dominates vector u = (u1, . . . ,un) iff 1.∀i ∈{1,...,n}:vi ≥ui , and

2.∃i ∈{1,...,n}:vi >ui.• New μ picked from highest fronts • Tetris objectives: Game score, time

Time alive

Game score

Pareto front

Page 18: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

Visualizing Link Weights

Page 19: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

Future Work• HybrID†

✦ Start with HyperNEAT, switch to NEAT

✦ Gain advantage of both encodings

• Raw feature Tetris with Deep Learning

• Raw features in other visual domains

✦ Video games: DOOM, Mario, Ms. Pac-Man

✦ Board games: Othello, Checkers

† Clune et al. 2004. HybrID: A Hybridization of Indirect and Direct Encodings for Evolutionary Computation.

Page 20: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

NEAT• NeuroEvolution of Augmenting Topologies†

• Synaptic and structural mutations

• Direct encoding

✦ Network size proportional to genome size

• Crossover alignment via historical markings

• Inefficient with large input sets

✦ Mutations do not alter behavior effectively

Perturb Weight Add Connection Add Node

† Stanley & Miikkulainen. 2002. Evolving Neural Networks Through Augmenting Topologies

Page 21: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

HyperNEAT• Hypercube-based NEAT† • Extension of NEAT • Indirect encoding

✦ Evolved CPPNs encode larger substrate-based agent ANNs • Compositional Pattern-Producing Networks (CPPNs)

✦ CPPN queried across substrate to create agent ANN ✦ Inputs = neuron coordinates, outputs = link weights

• Substrates ✦ Layers of neurons with geometric coordinates ✦ Substrate layout determined by domain/experimenter† Stanley et al. 2009. A Hypercube- based Encoding for Evolving Large-scale Neural Networks

UTILITY

X

Y

X

Y

Bias

Page 22: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

Evolutionary Algorithms (EA)NEAT HyperNEAT

• NeuroEvolution of Augmenting Topologies†

• Mutates structure and weights

• Direct encoding

✦ Network size = genome size

• Inefficient with large input sets

✦ Mutations not as effective

• Hypercube-based NEAT†

• Indirect encoding ✦ Evolved CPPN indirectly plays

game through agent network • Geometric awareness

✦ Agents can learn from domain geometry

• Better with large input sets ✦ Geometric awareness gives

agents more information

† Stanley & Miikkulainen. 2002. Evolving Neural Networks Through Augmenting Topologies

Page 23: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

Afterstate Evaluation• Evolved agents used as afterstate evaluators

• Determine next move from state after placing piece

• All possible piece locations determined, evaluated

• Placement with best evaluation from state chosen

• If placements lead to loss, not considered

• Agent moves piece to best placement, repeats

Page 24: Comparing Direct and Indirect Encodings Using Both Raw ...schrum2/SCOPE/RACW...By Lauren Gillespie, Gabby Gonzales and Jacob Schrum gillespl@southwestern.edu gonzale9@alumni.southwestern.edu

Research & Creative Works Symposium 2018

• Board configuration:

✦ Two input sets

1. Location of all blocks

❖ block = 1, no block = 0

2. Location of all holes

❖ hole = -1, no hole = 0

• NEAT: Inputs in linear sequence

• HyperNEAT: Two 2D input substrates

Raw Features Setup