Save the princess


Simon Belak

@sbelak

simon@metabase.com

We will build an AI that plays a silly little game. The AI is a policy network defined with Cortex and trained with a hot new algorithm (evolution strategies), which we will implement from the paper, first with Neanderthal and then in a massively parallel version with Onyx.

The game

• Find the shortest path to the princess

• Moves: up, down, left, right

• Don’t fall off the edge of the world
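The rules are small enough to state as one pure function. This is a minimal sketch, assuming a square board with positions stored as `[row col]` vectors; the names `moves`, `on-board?` and `step` and the keys of the state map are hypothetical, not taken from the talk's code.

```clojure
;; Hypothetical representation: a size x size board, positions as [row col].
(def moves
  {:up    [-1 0]
   :down  [1 0]
   :left  [0 -1]
   :right [0 1]})

(defn on-board?
  "True when [r c] lies inside a size x size board."
  [size [r c]]
  (and (< -1 r size) (< -1 c size)))

(defn step
  "Applies `move` to the hero and records the outcome in :status:
  :fell (left the board), :won (reached the princess) or :playing."
  [{:keys [size hero princess] :as state} move]
  (let [pos (mapv + hero (moves move))]
    (assoc state
           :hero pos
           :status (cond
                     (not (on-board? size pos)) :fell
                     (= pos princess)           :won
                     :else                      :playing))))
```

For example, on a 3 × 3 board with the hero at `[0 0]` and the princess at `[0 2]`, two `:right` moves win the game, while a single `:up` falls off the edge.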


Computers playing computer games

Reinforcement learning

• Interact with the environment [embodied cognition]

• Learns not a single solution but which action to take given the state of the environment [model of the world + model of self, consciousness?]

• Learns via positive/negative feedback

Reinforcement learning: how it’s usually done

Train a deep neural network on raw sensor data, usually pixels (i.e. no feature engineering)

… but there is another way

Diagram: Classic evolutionary algorithm (population → sample weighted → mutate, crossover → next generation) vs. Evolution strategies (solution → populate → jitter, jitter, … jitter → combine weighted → update)
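The evolution-strategies side of the diagram fits in a few lines: jitter the current solution with Gaussian noise, score every jittered copy, and move the solution along the reward-weighted noise, i.e. θ ← θ + α/(nσ) · Σᵢ Fᵢ εᵢ. This is a minimal plain-Clojure sketch, not the talk's Neanderthal implementation; the names `es-step` and `standardize`, the option keys, and the standardization of rewards (a common variance-reduction trick) are assumptions.

```clojure
(import 'java.util.Random)

(defn standardize
  "Rescales rewards to zero mean and unit variance (assumed trick)."
  [xs]
  (let [n    (count xs)
        mean (/ (reduce + xs) n)
        sd   (Math/sqrt (/ (reduce + (map #(let [d (- % mean)] (* d d)) xs)) n))]
    (if (zero? sd)
      (vec (repeat n 0.0))
      (mapv #(/ (- % mean) sd) xs))))

(defn es-step
  "One ES generation: jitter `theta` `population` times with Gaussian noise
  scaled by `sigma`, score each jittered copy with `f` (higher is better),
  then step along the reward-weighted sum of the noise vectors."
  [^Random rng f theta {:keys [population sigma alpha]}]
  (let [dim     (count theta)
        noise   (vec (repeatedly population
                                 #(vec (repeatedly dim (fn [] (.nextGaussian rng))))))
        rewards (standardize
                 (mapv (fn [eps] (f (mapv + theta (map #(* sigma %) eps)))) noise))
        scale   (/ alpha (* population sigma))]
    ;; theta + alpha/(n*sigma) * sum_i reward_i * eps_i
    (reduce (fn [th [r eps]] (mapv + th (map #(* scale r %) eps)))
            theta
            (map vector rewards noise))))
```

On a toy objective such as the negative squared distance to a target vector, iterating `es-step` a few hundred times with `{:population 50 :sigma 0.1 :alpha 0.01}` should bring θ close to the target. Every jittered copy is scored independently, which is what makes the method easy to parallelize.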

Using ES to train a neural network

Benefits

• highly parallelizable

• more robust (fewer hyperparameters, more stable, doesn't care about the properties of the reward function, e.g. it need not be differentiable)

• can exploit structure

• less computationally expensive (no backpropagation, only forward passes)

Downsides

• takes longer to converge

• the noise must lead to different outcomes (otherwise there is no signal to learn from)

Instead of backpropagation, use ES directly on the network's weights
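For ES to jitter a network, its weights have to look like one flat parameter vector, and each jittered vector has to be turned back into layers before the network can play. A minimal sketch, assuming weights are stored as plain Clojure vectors of row vectors; `flatten-weights` and `unflatten-weights` are hypothetical helpers (Cortex and Neanderthal use their own storage).

```clojure
(defn flatten-weights
  "Concatenates weight matrices (vectors of row vectors) into one flat
  parameter vector that ES can jitter."
  [layers]
  (vec (mapcat #(apply concat %) layers)))

(defn unflatten-weights
  "Inverse of flatten-weights: cuts a flat vector back into matrices
  with the given [rows cols] shapes."
  [shapes flat]
  (loop [shapes shapes, xs flat, acc []]
    (if-let [[[rows cols] & more] (seq shapes)]
      (let [k (* rows cols)]
        (recur more
               (drop k xs)
               (conj acc (mapv vec (partition cols (take k xs))))))
      acc)))
```

A fitness function for ES would then unflatten each jittered vector, run the resulting network on some games, and return the score.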

Let’s build it!

1. ES

Neanderthal

• Blazing fast matrix and linear algebra library

• Based on ATLAS and LAPACK

• Runs on CPUs and GPUs

• A study in writing efficient code

• Somewhat terse API (fluokitten helps)

x + y, ax + y, ax + by
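The progression x + y, ax + y, ax + by is the family of BLAS-style vector operations (axpy, axpby) that Neanderthal is built around, and a chain of them is essentially the whole ES update. The sketch below uses plain Clojure vectors for illustration only; it is not Neanderthal's API, which operates on native CPU/GPU vectors and is far faster. The function names are hypothetical.

```clojure
(defn axpy
  "Returns a*x + y for equal-length vectors x and y."
  [a x y]
  (mapv (fn [xi yi] (+ (* a xi) yi)) x y))

(defn axpby
  "Returns a*x + b*y for equal-length vectors x and y."
  [a x b y]
  (mapv (fn [xi yi] (+ (* a xi) (* b yi))) x y))

(defn es-update
  "theta + alpha/(n*sigma) * sum_i reward_i * eps_i, written as one
  axpy per jittered copy (n = number of noise vectors)."
  [theta alpha sigma rewards noise]
  (let [scale (/ alpha (* (count noise) sigma))]
    (reduce (fn [th [r eps]] (axpy (* scale r) eps th))
            theta
            (map vector rewards noise))))
```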

1.1 ES parallelized

Onyx

A masterless, cloud-scale, fault-tolerant, high-performance distributed computation system

Job = workflow + flow conditions + catalogue

Workflow:

[[:input :processing-1]
 [:input :processing-2]
 [:processing-1 :output-1]
 [:processing-2 :output-2]]

Flow conditions:

[{:flow/from :input-stream
  :flow/to [:process-adults]
  :flow/predicate :my.ns/adult?
  :flow/doc "Emits segment if an adult."}]

Catalogue:

[{:onyx/name :add-5
  :onyx/fn :my/adder
  :onyx/type :function
  :my/n 5
  :onyx/params [:my/n]
  :onyx/batch-size batch-size}
 {:onyx/name :in
  :onyx/plugin :onyx.plugin.core-async/input
  :onyx/type :input
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Reads segments from a core.async channel"}
 {:onyx/name :out
  :onyx/plugin :onyx.plugin.core-async/output
  :onyx/type :output
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/doc "Writes segments to a core.async channel"}]
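The `:add-5` task names `:my/adder` as its function and lists `:my/n` under `:onyx/params`; Onyx passes the values of those keys as arguments before the segment. A minimal sketch of such a function; its body and the `:n` key in the segment are assumptions, since the slide only shows the catalogue.

```clojure
(ns my)

;; With {:onyx/fn :my/adder :my/n 5 :onyx/params [:my/n]} in the catalogue,
;; Onyx calls (adder 5 segment) for every incoming segment.
(defn adder
  "Hypothetical body: adds n to the segment's :n value."
  [n segment]
  (update segment :n + n))
```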


Describing computation with data
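Because a workflow is just a vector of edges, ordinary Clojure functions can inspect it, for example to see which tasks feed which, or where segments enter the job. A small sketch over the workflow from the job above; `downstream` and `roots` are hypothetical helpers, not part of Onyx.

```clojure
(require 'clojure.set)

(def workflow
  [[:input :processing-1]
   [:input :processing-2]
   [:processing-1 :output-1]
   [:processing-2 :output-2]])

(defn downstream
  "Maps each task to the set of tasks it sends segments to."
  [wf]
  (reduce (fn [m [from to]] (update m from (fnil conj #{}) to)) {} wf))

(defn roots
  "Tasks with no incoming edges, i.e. where segments enter the job."
  [wf]
  (clojure.set/difference (set (map first wf)) (set (map second wf))))
```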

Diagram: ES as an Onyx job (in → jitter, jitter, … jitter → update → out, monitor; populate). Callouts: "same channel", "accumulates state :("

Resilience and handling state

• Activity log

• Window and trigger states checkpointed

• Resume points (transfer state from job to job)

• Configurable flux policies (continue/kill/recover)
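The stateful part of the ES job is the update step: it can only move the parameters once every jittered copy of the current generation has been scored, so it has to keep partial sums between segments. That is the kind of state that, in Onyx, would live in a window and be checkpointed. A pure-function sketch of the accumulation itself (not Onyx's window API); `accumulate` and `empty-state` are hypothetical names.

```clojure
(def empty-state {:seen 0 :sum nil})

(defn accumulate
  "Folds one scored jitter, {:reward r :noise eps}, into the running state.
  Once `population` results have arrived, returns the reward-weighted noise
  sum under :emit and resets the state for the next generation."
  [population state {:keys [reward noise]}]
  (let [weighted (mapv #(* reward %) noise)
        sum      (if-let [s (:sum state)] (mapv + s weighted) weighted)
        seen     (inc (:seen state))]
    (if (= seen population)
      {:state empty-state :emit sum}
      {:state {:seen seen :sum sum} :emit nil})))
```

Losing this state mid-generation would throw away work, which is why checkpointing and resume points matter for the ES job.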

Computation graphs are a great way to structure data processing code

2. Policy network

Cortex

• Neural networks, regression and feature learning

• Clean idiomatic Clojure API

• Computation encoded as data (and makes good use of it)

• Uses core.matrix for heavy lifting

Encode the board as the network's input: princess = 1, hero = -1
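A minimal sketch of that encoding, plus a deliberately tiny linear policy that shows how the network's output becomes a move. Encoding empty cells as 0, the row-major layout, and the single-layer `act` function are assumptions; the talk defines the real network with Cortex.

```clojure
(defn encode
  "Flattens a size x size board row by row into the network's input:
  princess = 1, hero = -1, every other cell 0 (assumed)."
  [size hero princess]
  (vec (for [r (range size) c (range size)]
         (condp = [r c]
           princess 1.0
           hero     -1.0
           0.0))))

(def actions [:up :down :left :right])

(defn act
  "Hypothetical one-layer policy: one weight row per action; returns the
  action whose row scores highest against the input."
  [weights input]
  (let [scores (mapv #(reduce + (map * % input)) weights)]
    (actions (apply max-key scores (range (count actions))))))
```

In the ES setting, `weights` is exactly what gets jittered; a real policy network would add hidden layers between input and scores.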

3. Game

Simulation

• Find the shortest path to the princess

• Don’t fall off the edge of the world

Reward function

• Play the entire game (planning)

• Collect multiple playthroughs to lessen the effects of randomness
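Scoring a policy by playing whole games and averaging over several playthroughs could look like the sketch below. The scoring constants (1 for reaching the princess minus 0.01 per move, -1 for falling off, 0 after a 50-move cap) and the use of random start positions as the source of randomness are hypothetical choices, not from the talk.

```clojure
(import 'java.util.Random)

(def deltas {:up [-1 0] :down [1 0] :left [0 -1] :right [0 1]})

(defn play
  "Plays one whole game with `policy` (a fn from state to move) and returns
  the hypothetical score: 1 - 0.01 per move on reaching the princess
  (shorter is better), -1 for falling off, 0 when out of moves."
  [policy {:keys [size princess] :as start} max-moves]
  (loop [state start, moves 0]
    (let [[r c] (:hero state)]
      (cond
        (not (and (< -1 r size) (< -1 c size))) -1.0
        (= (:hero state) princess)              (- 1.0 (* 0.01 moves))
        (= moves max-moves)                     0.0
        :else (recur (update state :hero #(mapv + % (deltas (policy state))))
                     (inc moves))))))

(defn reward
  "Averages `play` over several boards with random hero and princess
  positions to dampen the effect of any single lucky or unlucky game."
  [^Random rng policy size playthroughs]
  (let [rand-pos #(vector (.nextInt rng size) (.nextInt rng size))]
    (/ (reduce + (repeatedly playthroughs
                             #(play policy
                                    {:size size :hero (rand-pos) :princess (rand-pos)}
                                    50)))
       playthroughs)))
```

This averaged score is what an ES fitness function would return for each jittered set of weights.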

Takeaways

Explore

Have fun

Go on an adventure!

Questions?
