Save the princess! Simon Belak @sbelak [email protected]


Page 1: Save the princess

Save the princess!

Simon Belak

@sbelak

Page 2: Save the princess

We will build an AI that plays a silly little game. Its policy network is defined using Cortex and trained with a hot new training algorithm, which we will first implement from the paper using Neanderthal and then make massively parallel using Onyx.

Page 3: Save the princess

The game

• Find the shortest path to the princess

• Moves: up, down, left, right

• Don’t fall off the edge of the world
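The rules above can be sketched in a few lines of plain Clojure. This is only an illustration under assumptions the slides do not state: a square board of side `size`, positions written as `[row col]`, and no obstacles. The names `moves`, `on-board?`, `step` and `shortest-path-length` are hypothetical.

```clojure
;; Hypothetical board model: positions are [row col] on a size x size grid.
(def moves
  {:up    [-1 0]
   :down  [1 0]
   :left  [0 -1]
   :right [0 1]})

(defn on-board?
  "True when position [row col] lies inside a board of side `size`."
  [size [row col]]
  (and (< -1 row size) (< -1 col size)))

(defn step
  "Applies `move` to the hero. Returns the new position, or nil when the
  hero would fall off the edge of the world."
  [size hero move]
  (let [position (mapv + hero (moves move))]
    (when (on-board? size position)
      position)))

(defn shortest-path-length
  "Assuming no obstacles, the shortest path with these four moves is the
  Manhattan distance between hero and princess."
  [hero princess]
  (reduce + (map (fn [a b] (Math/abs (long (- a b)))) hero princess)))
```

For example, `(step 3 [0 0] :up)` returns `nil` because the hero would leave the board, while `(step 3 [0 0] :right)` returns `[0 1]`.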

Page 5: Save the princess

Computers playing computer games

Page 6: Save the princess

Reinforcement learning

• Interact with the environment [embodied cognition]

• Learns not a single solution but which action to take given the state of the environment [model of the world + model of self, consciousness?]

• Learns via positive/negative feedback

Page 7: Save the princess

Reinforcement learning: how it’s usually done

Train a deep neural network on raw sensor data, usually pixels (i.e. no feature engineering)

Page 8: Save the princess

… but there is another way

Page 9: Save the princess
Page 10: Save the princess

Classic evolutionary algorithm (diagram): population; sample weighted; mutate, crossover; next generation

Evolution strategies (diagram): solution; populate; jitter, jitter, …, jitter; combine weighted; update

Page 11: Save the princess

Using ES to train a neural network

Benefits

• highly parallelizable

• more robust (fewer hyperparameters, more stable, doesn’t care about the properties of the reward function)

• can exploit structure

• less computationally expensive

Downsides

• takes longer to converge

• noise must lead to different outcomes

Instead of backpropagation, use ES on the weights
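A minimal sketch of one ES iteration in plain Clojure, assuming the update rule from the OpenAI evolution strategies paper (Salimans et al., 2017): θ ← θ + α/(nσ) Σ Fᵢ εᵢ, with the rewards Fᵢ standardized before they are combined, which is a common variant. Whether this is exactly the variant the talk implements is an assumption, and `gaussian-vector`, `es-step` and the option names are hypothetical.

```clojure
(import 'java.util.Random)

(defn gaussian-vector
  "A vector of `n` samples from a standard normal distribution."
  [^Random rng n]
  (vec (repeatedly n #(.nextGaussian rng))))

(defn es-step
  "One evolution strategies update of the weight vector `theta`.
  `reward-fn` maps a weight vector to a number (higher is better)."
  [reward-fn theta {:keys [population-size sigma learning-rate rng]}]
  (let [n        (count theta)
        ;; populate: one noise vector per member of the population
        noises   (vec (repeatedly population-size #(gaussian-vector rng n)))
        ;; jitter: score theta + sigma * noise for every noise vector
        rewards  (mapv (fn [eps]
                         (reward-fn (mapv (fn [t e] (+ t (* sigma e))) theta eps)))
                       noises)
        ;; standardize rewards so the step size ignores the reward scale
        mean     (/ (reduce + rewards) population-size)
        variance (/ (reduce + (map (fn [r] (let [d (- r mean)] (* d d))) rewards))
                    population-size)
        std      (Math/sqrt (double variance))
        scores   (mapv (fn [r] (/ (- r mean) (+ std 1e-8))) rewards)
        ;; combine weighted: sum of noise vectors weighted by their scores
        gradient (apply mapv +
                        (map (fn [score eps] (mapv #(* score %) eps))
                             scores noises))
        scale    (/ learning-rate (* population-size sigma))]
    ;; update: move theta along the estimated gradient
    (mapv (fn [t g] (+ t (* scale g))) theta gradient)))
```

Each jitter evaluation is independent of the others, so only the noise (or a seed) and a single reward number have to travel between workers. That independence is what makes the method easy to parallelize.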

Page 12: Save the princess

Let’s build it!

Page 13: Save the princess

1. ES

Page 14: Save the princess
Page 15: Save the princess

Neanderthal

• Blazing fast matrix and linear algebra library

• Based on ATLAS and LAPACK

• Runs on CPUs and GPUs

• A study in writing efficient code

• Somewhat terse API (fluokitten helps)

Page 16: Save the princess
Page 17: Save the princess
Page 18: Save the princess
Page 19: Save the princess
Page 20: Save the princess

x+y ax+y ax+by
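The three expressions above are the classic BLAS-style vector updates. As a hedged illustration, here they are written over plain Clojure vectors; the names follow BLAS conventions, and to my understanding Neanderthal exposes similarly named operations over native vectors, but the functions below are stand-ins and not Neanderthal's API.

```clojure
;; Plain-vector stand-ins for the BLAS-style operations (illustrative only).
(defn xpy
  "x + y, element by element."
  [x y]
  (mapv + x y))

(defn axpy
  "a*x + y for a scalar `a` and vectors `x`, `y`."
  [a x y]
  (mapv (fn [xi yi] (+ (* a xi) yi)) x y))

(defn axpby
  "a*x + b*y for scalars `a`, `b` and vectors `x`, `y`."
  [a x b y]
  (mapv (fn [xi yi] (+ (* a xi) (* b yi))) x y))
```

Jittering the weights in ES, θ + σε, has exactly the `axpy` shape: `(axpy sigma eps theta)`.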

Page 24: Save the princess
Page 25: Save the princess
Page 26: Save the princess

1.1 ES parallelized

Page 27: Save the princess

Onyx

a masterless, cloud scale, fault tolerant, high performance distributed computation system

Page 28: Save the princess

Job = workflow + flow conditions + catalogue

workflow

[[:input :processing-1]
 [:input :processing-2]
 [:processing-1 :output-1]
 [:processing-2 :output-2]]

flow conditions

[{:flow/from :input-stream
  :flow/to [:process-adults]
  :flow/predicate :my.ns/adult?
  :flow/doc "Emits segment if an adult."}]

catalogue

[{:onyx/name :add-5
  :onyx/fn :my/adder
  :onyx/type :function
  :my/n 5
  :onyx/params [:my/n]}

 {:onyx/name :in
  :onyx/plugin :onyx.plugin.core-async/input
  :onyx/type :input
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Reads segments from a core.async channel"}

 {:onyx/name :out
  :onyx/plugin :onyx.plugin.core-async/output
  :onyx/type :output
  :onyx/medium :core.async
  :onyx/doc "Writes segments to a core.async channel"}]

Page 29: Save the princess

Describing computation with data
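Because the workflow is an ordinary vector of `[from to]` pairs, it can be inspected with everyday Clojure functions before any job is submitted. The sketch below uses no Onyx API at all; `successors` and `sinks` are hypothetical helpers written only to illustrate the idea.

```clojure
;; The workflow from the slide: a plain vector of [from to] edges.
(def workflow
  [[:input :processing-1]
   [:input :processing-2]
   [:processing-1 :output-1]
   [:processing-2 :output-2]])

(defn successors
  "Maps each task to the tasks that receive its output."
  [workflow]
  (reduce (fn [graph [from to]]
            (update graph from (fnil conj []) to))
          {}
          workflow))

(defn sinks
  "Tasks that receive segments but never send any on."
  [workflow]
  (let [sources (set (map first workflow))]
    (vec (distinct (remove sources (map second workflow))))))
```

Here `(successors workflow)` shows that `:input` fans out to both processing tasks, and `(sinks workflow)` lists the two outputs. The same kind of check could validate a generated workflow in a unit test.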

Page 30: Save the princess
Page 31: Save the princess

[Diagram: workflow with tasks in, populate, jitter (repeated), update, out, monitor]

Page 32: Save the princess

[Diagram: workflow with tasks in, populate, jitter (repeated), update, out, monitor. Annotation: same channel]

Page 33: Save the princess

[Diagram: workflow with tasks in, populate, jitter (repeated), update, out, monitor. Annotation: accumulates state :(]

Page 34: Save the princess

Resilience and handling state

• Activity log

• Window and trigger states checkpointed

• Resume points (transfer state from job to job)

• Configurable flux policies (continue/kill/recover)

Page 35: Save the princess

Computation graphs are a great way to structure data processing code

Page 36: Save the princess

2. Policy network

Page 37: Save the princess

Cortex

• Neural networks, regression and feature learning

• Clean idiomatic Clojure API

• Computation encoded as data (and makes good use of it)

• Uses core.matrix for heavy lifting

Page 38: Save the princess

Encode princess = 1, hero = -1
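One way to turn the board into network input with this encoding is sketched below. The slide only fixes princess = 1 and hero = -1; the row-major flattening, the 0.0 for empty cells and the name `encode-board` are assumptions made for illustration.

```clojure
(defn encode-board
  "Flattens a size x size board into a vector of doubles:
  1.0 at the princess, -1.0 at the hero, 0.0 elsewhere (assumed).
  Positions are [row col]; cells are laid out row by row."
  [size hero princess]
  (let [index (fn [[row col]] (+ (* row size) col))]
    (-> (vec (repeat (* size size) 0.0))
        (assoc (index princess) 1.0)
        ;; written last, so the hero wins if both share a cell
        (assoc (index hero) -1.0))))
```

On a 3 x 3 board with the hero at `[0 0]` and the princess at `[2 2]`, the result has nine entries, with -1.0 first and 1.0 last.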

Page 39: Save the princess

3. Game

Page 40: Save the princess

Simulation

• Find the shortest path to the princess

• Don’t fall off the edge of the world

Page 41: Save the princess

Reward function

• Play the entire game (planning)

• Collect multiple playthroughs to lessen the effects of randomness
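A reward function along these lines could look like the sketch below. The scoring is an assumption, not taken from the slides: a finished game scores minus the number of steps, while falling off the board or running out of steps scores a fixed penalty. Start positions are drawn at random, and `play`, `random-position`, `reward` and `greedy-policy` are hypothetical names.

```clojure
(import 'java.util.Random)

(def moves {:up [-1 0] :down [1 0] :left [0 -1] :right [0 1]})

(defn play
  "Plays one entire game. `policy` maps hero and princess positions to a
  move keyword. Returns minus the number of steps on success, or a
  penalty of -2 * max-steps on falling off the board or timing out."
  [policy size hero princess max-steps]
  (loop [hero hero, steps 0]
    (cond
      (= hero princess)    (- steps)
      (>= steps max-steps) (- (* 2 max-steps))
      :else
      (let [[row col :as position] (mapv + hero (moves (policy hero princess)))]
        (if (and (< -1 row size) (< -1 col size))
          (recur position (inc steps))
          (- (* 2 max-steps)))))))

(defn random-position
  "A uniformly random [row col] on a size x size board."
  [^Random rng size]
  [(.nextInt rng (int size)) (.nextInt rng (int size))])

(defn reward
  "Average score over `playthroughs` games with random start positions."
  [policy size playthroughs rng]
  (/ (reduce + (repeatedly playthroughs
                           #(play policy size
                                  (random-position rng size)
                                  (random-position rng size)
                                  (* size size))))
     (double playthroughs)))

;; A hand-written baseline policy: walk straight towards the princess.
(defn greedy-policy [[hero-row hero-col] [princess-row princess-col]]
  (cond
    (< hero-row princess-row) :down
    (> hero-row princess-row) :up
    (< hero-col princess-col) :right
    :else                     :left))
```

Averaging over many playthroughs means a single lucky or unlucky start position does not dominate the score that ES sees for a given set of weights.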

Page 42: Save the princess

Takeaways

Page 43: Save the princess

Explore

Have fun

Go on an adventure!

Page 44: Save the princess

Questions

Simon Belak

@sbelak