simon-belak
We will build an AI to play a silly little game. Its policy network is defined using Cortex and trained with a hot new algorithm, which we implement from the paper, first using Neanderthal, and then make massively parallel using Onyx.
The game
• Find the shortest path to the princess
• Moves: up, down, left, right
• Don’t fall off the edge of the world
Computers playing computer games
Reinforcement learning
• Interact with the environment [embodied cognition]
• Not a single solution but an action to take given environment [model of the world + model of self, consciousness?]
• Learns via positive/negative feedback
Reinforcement learning: how it’s usually done
Train a deep neural network using raw sensor data, usually pixels (i.e. no feature engineering)
… but there is another way
[Diagram: Classic evolutionary algorithm vs. Evolution strategies. Labels, in slide order: population, mutate, crossover, next generation, solution, jitter, jitter, …, jitter, update, populate, sample weighted, combine weighted]
Using ES to train a neural network
Benefits
• highly parallelizable
• more robust (fewer hyperparameters, more stable, doesn’t care about the properties of the reward function)
• can exploit structure
• less computationally expensive
Downsides
• takes longer to converge
• noise must lead to different outcomes
Instead of backpropagation use ES on weights
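The populate, jitter, combine and update steps can be sketched in plain Clojure. This is a minimal sketch, not the Neanderthal implementation from the talk: `es-step`, `gaussian-noise` and the option keys `:population`, `:sigma`, `:alpha` and `:rng` are names assumed here, and standardizing the rewards is an added stabilizer that the slides do not mention.

```clojure
(import 'java.util.Random)

(defn gaussian-noise
  "Vector of n samples from a standard normal distribution."
  [^Random rng n]
  (vec (repeatedly n #(.nextGaussian rng))))

(defn es-step
  "One evolution-strategies update of the weight vector theta.
  reward-fn maps a weight vector to a number (higher is better)."
  [reward-fn theta {:keys [population sigma alpha rng]}]
  (let [n       (count theta)
        noises  (vec (repeatedly population #(gaussian-noise rng n)))
        ;; populate: jitter theta with each noise vector and score the result
        rewards (mapv (fn [eps]
                        (reward-fn (mapv (fn [t e] (+ t (* sigma e))) theta eps)))
                      noises)
        ;; assumption: standardize rewards so the step does not depend on their scale
        mean    (/ (double (reduce + rewards)) population)
        sd      (Math/sqrt (/ (reduce + (map (fn [r] (let [d (- (double r) mean)] (* d d)))
                                             rewards))
                              population))
        weights (mapv (fn [r] (/ (- (double r) mean) (+ sd 1e-8))) rewards)
        ;; combine weighted: reward-weighted sum of the noise vectors
        grad    (apply mapv (fn [& es] (reduce + (map * weights es))) noises)]
    ;; update: move theta along the estimated direction of improvement
    (mapv (fn [t g] (+ t (* (/ alpha (* population sigma)) g))) theta grad)))
```

Calling `(iterate #(es-step reward-fn % opts) theta0)` gives the sequence of weight vectors. Only `reward-fn` is ever evaluated, never differentiated, which is why the properties of the reward function matter so little.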
Let’s build it!
1. ES
Neanderthal
• Blazing fast matrix and linear algebra library
• Based on ATLAS and LAPACK
• Runs on CPUs and GPUs
• A study in writing efficient code
• Somewhat terse API (fluokitten helps)
x+y ax+y ax+by
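The three operations on this slide are the building blocks of the ES update. As a rough illustration, here they are on ordinary Clojure vectors. These are plain stand-ins written for this example, not Neanderthal calls; the names simply follow the BLAS naming convention.

```clojure
(defn xpy
  "x + y, element-wise."
  [x y]
  (mapv + x y))

(defn axpy
  "a*x + y: scale x by the scalar a, then add y."
  [a x y]
  (mapv (fn [xi yi] (+ (* a xi) yi)) x y))

(defn axpby
  "a*x + b*y: scale both vectors, then add them."
  [a x b y]
  (mapv (fn [xi yi] (+ (* a xi) (* b yi))) x y))
```

With these, jittering a weight vector `theta` by noise `eps` scaled by `sigma` is `(axpy sigma eps theta)`.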
1.1 ES parallelized
Onyx
A masterless, cloud scale, fault tolerant, high performance distributed computation system
Job = workflow + flow conditions + catalogue

Workflow:
[[:input :processing-1]
 [:input :processing-2]
 [:processing-1 :output-1]
 [:processing-2 :output-2]]

Flow conditions:
[{:flow/from :input-stream
  :flow/to [:process-adults]
  :flow/predicate :my.ns/adult?
  :flow/doc "Emits segment if an adult."}]

Catalogue:
[{:onyx/name :add-5
  :onyx/fn :my/adder
  :onyx/type :function
  :my/n 5
  :onyx/params [:my/n]}
 {:onyx/name :in
  :onyx/plugin :onyx.plugin.core-async/input
  :onyx/type :input
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Reads segments from a core.async channel"}
 {:onyx/name :out
  :onyx/plugin :onyx.plugin.core-async/output
  :onyx/type :output
  :onyx/medium :core.async
  :onyx/doc "Writes segments to a core.async channel"}]
Describing computation with data
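One consequence of describing a job with data is that ordinary functions can inspect it. The sketch below walks the workflow vector from the earlier slide. `downstream` and `leaves` are hypothetical helpers written for this illustration; they are not part of Onyx.

```clojure
(def workflow
  [[:input :processing-1]
   [:input :processing-2]
   [:processing-1 :output-1]
   [:processing-2 :output-2]])

(defn downstream
  "Tasks that receive segments directly from `task`."
  [workflow task]
  (vec (for [[from to] workflow :when (= from task)] to)))

(defn leaves
  "Tasks that nothing flows out of, i.e. the outputs of the graph."
  [workflow]
  (let [sources (set (map first workflow))]
    (vec (distinct (remove sources (map second workflow))))))
```

The same approach works for validating or generating workflows, since they are just vectors of keyword pairs.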
[Diagram, shown in three steps: the ES loop as an Onyx workflow with the tasks in, populate, jitter, jitter, …, jitter, update, out and monitor. Annotations: "same channel"; "accumulates state :("]
Resilience and handling state
• Activity log
• Window and trigger states checkpointed
• Resume points (transfer state from job to job)
• Configurable flux policies (continue/kill/recover)
Computation graphs are a great way to structure data processing code
2. Policy network
Cortex
• Neural networks, regression and feature learning
• Clean idiomatic Clojure API
• Computation encoded as data (and makes good use of it)
• Uses core.matrix for heavy lifting
Encode princess = 1, hero = -1
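A minimal sketch of that encoding, turning the board into the input vector for the policy network. The game map shape (`:width`, `:height`, `:hero`, `:princess`), the `[x y]` coordinates and the use of 0 for empty cells are assumptions made for this example; the slide only fixes princess = 1 and hero = -1.

```clojure
(defn encode-board
  "Flattens a width x height grid, row by row, into a vector of numbers:
  1 at the princess, -1 at the hero, 0 elsewhere (0 is an assumption).
  Positions are [x y] pairs with [0 0] in the top-left corner."
  [{:keys [width height hero princess]}]
  (vec (for [y (range height)
             x (range width)]
         (cond
           (= [x y] princess) 1.0
           (= [x y] hero)     -1.0
           :else              0.0))))
```

For a 3 x 2 board this yields a vector of six numbers, one per cell.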
3. Game
Simulation
• Find the shortest path to the princess
• Don’t fall off the edge of the world
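The rules above fit in a few lines. This is a sketch under assumed names: the game map, `moves`, `on-board?`, `step` and the `:status` values are all made up for this example rather than taken from the talk's code.

```clojure
(def moves
  "Each move as an [dx dy] offset; y grows downwards."
  {:up [0 -1] :down [0 1] :left [-1 0] :right [1 0]})

(defn on-board?
  "True if position [x y] lies inside the width x height world."
  [{:keys [width height]} [x y]]
  (and (< -1 x width) (< -1 y height)))

(defn step
  "Applies one move. Returns the game with the hero moved and a :status of
  :fell (left the board), :won (reached the princess) or :playing."
  [{:keys [hero princess] :as game} move]
  (let [pos (mapv + hero (moves move))]
    (assoc game
           :hero pos
           :status (cond
                     (not (on-board? game pos)) :fell
                     (= pos princess)           :won
                     :else                      :playing))))
```

A playthrough is then a matter of calling `step` with the policy's chosen move until the status is no longer `:playing`.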
Reward function
• Play the entire game (planning)
• Collect multiple playthroughs to lessen the effects of randomness
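Both points can be sketched generically: score a game only once it has been played to the end, then average over several playthroughs. The function names and the keys `:step-fn`, `:done?`, `:score` and `:max-steps` are assumptions for this example, as is the step cap that stops a policy from wandering forever.

```clojure
(defn episode-reward
  "Plays one game to the end and only then scores it.
  step-fn: state -> next state (the policy picks and applies a move),
  done?:   state -> boolean,
  score:   (state, steps taken) -> number."
  [{:keys [step-fn done? score max-steps]} start]
  (loop [state start, steps 0]
    (if (or (done? state) (>= steps max-steps))
      (score state steps)
      (recur (step-fn state) (inc steps)))))

(defn mean-reward
  "Average reward over n playthroughs. start-fn returns a (possibly random)
  starting state for each playthrough."
  [game start-fn n]
  (/ (reduce + (repeatedly n #(episode-reward game (start-fn))))
     (double n)))
```

A score such as minus the number of steps taken rewards shorter paths, and a large penalty for an unfinished or lost game discourages falling off the edge.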
Takeaways
Explore
Have fun
Go on an adventure!