Future of AI on the JVM
Scala Days Amsterdam 2015
Adam Gibson, creator of Deeplearning4j (and 4s :)
What is AI?
● Not Terminator (despite our name)
● Many subfields
● Our focus: machine learning
Big Data?
Problem Space
● Spam classification
● Summarization
● Face detection
● Eye tracking
● Targeted ads
● Recommendation engines
Current State of ML
● Simpler models
● Most of industry barely uses logistic regression (see the sketch after this list)
● Many problems are binary
  o e.g. fraud, spam
● Some unsupervised (clustering, recommendations)
● Lots of ML frameworks on the JVM
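To ground the logistic regression point, here is a minimal sketch using Spark MLlib's 2015-era Scala API; the toy features and labels are invented purely for illustration.

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.{SparkConf, SparkContext}

object SpamBaseline {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("spam-baseline").setMaster("local[*]"))

    // Toy data: label (1.0 = spam) plus two hand-made features.
    val training = sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.dense(0.9, 0.7)),
      LabeledPoint(0.0, Vectors.dense(0.1, 0.2)),
      LabeledPoint(1.0, Vectors.dense(0.8, 0.9)),
      LabeledPoint(0.0, Vectors.dense(0.2, 0.1))
    ))

    val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(training)
    println(model.predict(Vectors.dense(0.85, 0.8))) // expect 1.0 (spam)
    sc.stop()
  }
}
```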
ML Frameworks on the JVM...
● Apache Mahout
● Spark's MLlib
● Weka (is that R?)
ML GUIs
● Prediction.io
● Encog
Problems
● Monolithic
● Makes assumptions about data
● Hard to use
● No separation of concerns
Ring a Bell?
● We call that “monolithic”
● Separate ML concerns (sketched as interfaces below):
  o Data pipelines/vectorization
  o Scoring
  o Model training
  o Evaluation
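As a concrete illustration of that separation, here is a minimal sketch of the four concerns as independent Scala interfaces. The trait names and signatures are hypothetical, not from any existing library.

```scala
// Hypothetical interfaces: each ML concern stands alone and can be
// swapped out without dragging in a monolithic framework.

trait Vectorizer[Raw] {
  def vectorize(input: Raw): Array[Double] // raw record -> feature vector
}

trait Trainer[Model] {
  def train(data: Seq[(Array[Double], Double)]): Model
}

trait Scorer[Model] {
  def score(model: Model, features: Array[Double]): Double
}

trait Evaluator[Model] {
  // Returns a single metric, e.g. F1 or AUC, on held-out data.
  def evaluate(model: Model, test: Seq[(Array[Double], Double)]): Double
}
```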
Micro-Services + ML?
● Kinda like micro-services
● Reduce lock-in
● Take math, data cleaning, model training, choosing algorithms ...
● … and separate them
Math
● Parametric models (matrices!)
● Non-parametric (random forests)
● Focusing on matrices (the hard part of ML systems)
Matrices
● NDArrays (> 2d)
● Tensors (think of pages of matrices)
● Example: a 2 x 2 x 2 tensor is two 2 x 2 matrices, stacked (see the ND4J sketch below)
● Applies to graphs w/ sparse representations
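A minimal ND4J sketch of the tensor example above, callable from Scala; the values are arbitrary.

```scala
import org.nd4j.linalg.factory.Nd4j

object TensorExample extends App {
  // A 2 x 2 x 2 NDArray: two 2 x 2 matrices stacked along the first axis.
  val tensor = Nd4j.linspace(1, 8, 8).reshape(2, 2, 2)

  println(tensor.slice(0)) // first 2 x 2 "page"
  println(tensor.slice(1)) // second 2 x 2 "page"
}
```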
Chips/Hardware/Matrices
● CPUs - we work with these
● GPUs - CUDA, ditto
● FPGAs
  o Intel bought Altera, an FPGA maker, for $17 billion this month
  o The edge, the cloud
Why New Chips?
● See the numbers yourself:
● http://www.slideshare.net/airbots/cuda-29330283
● http://devblogs.nvidia.com/parallelforall/bidmach-machine-learning-limit-gpus/
● http://jcuda.org
Mixed clusters
● GPUs aren't good for all workloads, because of transfer latency
● Data has to be uploaded to the device: not worth it for small problems
● Mixed CPU/GPU clusters are the best bet
Data Pipelines
● More data will be binary
● Frameworks today can't process binary well
● Binary data has different semantics
● Moving windows for audio (see the sketch after this list)
● 3d for images ...
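A minimal sketch of the moving-window idea for audio, in plain Scala; the synthetic samples, window size, and stride are assumptions for illustration.

```scala
object MovingWindow extends App {
  // Fake audio: 16 samples of a sine wave.
  val samples: Vector[Double] = Vector.tabulate(16)(i => math.sin(i * 0.5))

  val windowSize = 4 // samples per window (arbitrary)
  val stride     = 2 // overlap between consecutive windows (arbitrary)

  // Each window becomes one feature vector / training example.
  val windows: Iterator[Vector[Double]] = samples.sliding(windowSize, stride)
  windows.foreach(w => println(w.mkString(", ")))
}
```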
People Roll Their Own b/c
● Current frameworks assume clean data :(
● Pipelines are brittle, hard to maintain
● Moving towards being composable (reuse)
Dedicated Libraries
● Let's focus on vectorization -- now!
● Because IoT
● Because more access to raw media (a raw-media sketch follows this list)
● Should fit into current big data frameworks
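A hedged illustration of vectorizing raw media. This is not Canova's actual API, just plain JDK image I/O: flatten an image's pixels into a normalized feature vector.

```scala
import java.io.File
import javax.imageio.ImageIO

object ImageVectorizer {
  // Turn an image file into a flat feature vector of grayscale intensities.
  def vectorize(file: File): Array[Double] = {
    val img = ImageIO.read(file)
    val pixels =
      for {
        y <- 0 until img.getHeight
        x <- 0 until img.getWidth
      } yield {
        val rgb = img.getRGB(x, y)
        val r = (rgb >> 16) & 0xff
        val g = (rgb >> 8) & 0xff
        val b = rgb & 0xff
        (r + g + b) / (3.0 * 255.0) // normalized grayscale in [0, 1]
      }
    pixels.toArray
  }
}
```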
Scoring
● AUC
● F1 (a minimal sketch follows below)
● Different loss functions
● Hyperparameter optimization

All independent
● These things work for different models
● Shouldn't be tied to a particular system
● Should be embeddable
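A minimal, framework-independent F1 sketch in plain Scala, matching the embeddability point: it only needs (predicted, actual) label pairs, not any particular ML system.

```scala
object F1Score {
  // pairs: (predicted label, actual label), with 1 = positive, 0 = negative.
  def f1(pairs: Seq[(Int, Int)]): Double = {
    val tp = pairs.count { case (p, a) => p == 1 && a == 1 } // true positives
    val fp = pairs.count { case (p, a) => p == 1 && a == 0 } // false positives
    val fn = pairs.count { case (p, a) => p == 0 && a == 1 } // false negatives

    val precision = if (tp + fp == 0) 0.0 else tp.toDouble / (tp + fp)
    val recall    = if (tp + fn == 0) 0.0 else tp.toDouble / (tp + fn)

    if (precision + recall == 0) 0.0
    else 2 * precision * recall / (precision + recall)
  }
}
```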
Training
● Split train/test (sketch below)
● Sample the data (no, not all the data ;) to validate the model
● Increasingly compute intensive
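A minimal train/test split sketch in plain Scala; the 80/20 ratio and fixed seed are assumptions for illustration.

```scala
import scala.util.Random

object TrainTestSplit {
  // Shuffle, then hold out a fraction of the data for validation.
  def split[A](data: Seq[A],
               trainFraction: Double = 0.8,
               seed: Long = 42L): (Seq[A], Seq[A]) = {
    val shuffled = new Random(seed).shuffle(data)
    val cut = (shuffled.size * trainFraction).toInt
    (shuffled.take(cut), shuffled.drop(cut)) // (train, test)
  }
}
```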
Deep Learning
● Most done in Python...
● Normal training time is measured in hours/days -- weeks!?
● Work being done in HPC (model parallelism)
● DistBelief (data parallelism)
Automatic Learning
● Good at unstructured data
● Images, text, audio and sensors
● Quick, baseline feature engineering
● Not good at feature introspection
Or are they?
t-SNE
Where Does Scala Fit In?
● Akka - real-time streaming analytics/micro-services (see the scoring-service sketch below)
● Spark - DataFrames/number crunching
● JVM key/value stores
● Pistachio (powers Yahoo's ad network)
  o http://yahooeng.tumblr.com/post/118860853846/distributed-word2vec-on-top-of-pistachio
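A hedged sketch of an embeddable scoring micro-service with classic Akka actors (2015-era API). The Score message and the hard-coded logistic weights are hypothetical, used only to show the shape of the design.

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical message: a feature vector to be scored.
case class Score(features: Array[Double])

// Wraps a trained model behind an actor, so scoring can be deployed as an
// independent service instead of living inside a monolithic framework.
class ScoringActor(weights: Array[Double]) extends Actor {
  def receive: Receive = {
    case Score(features) =>
      val z = weights.zip(features).map { case (w, x) => w * x }.sum
      val p = 1.0 / (1.0 + math.exp(-z)) // logistic score
      println(s"prediction: ${if (p > 0.5) 1.0 else 0.0}")
  }
}

object ScoringService extends App {
  val system = ActorSystem("scoring")
  val scorer = system.actorOf(Props(new ScoringActor(Array(0.5, -0.25))), "scorer")
  scorer ! Score(Array(1.0, 2.0)) // fire-and-forget scoring request
}
```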
The Way We Learn Now
● Monolithic ML frameworks
● No per-chip optimizations
● No tensors (come on guys, it's 2015...)
● Need isolation and less lock-in
● JVM is the platform to make it happen
Other Links
● http://deeplearning4j.org/
● http://nd4j.org/
● https://github.com/deeplearning4j/Canova
Questions?
● [email protected]
● @agibsonccc
● github.com/agibsonccc