H2O.ai
Open Source Machine Learning for Intelligent Applications
H2O.ai Machine Intelligence
Time is the only non-renewable resource
Speed Matters!
Law of Large Numbers
Sampling
Data scientists & Analysts will not write Java MapReduce
Per Node
2M Row ingest / sec
50M Row Regression / sec
750M Row Aggregates / sec
On Premise
On / Off Hadoop
On EC2
[Architecture diagram]
Tableau, R, JSON, Scala, Java, Python, Excel
SDK / API
H2O Prediction Engine: Ensembles, Deep learning, Cluster, Regression, Classify, Trees, Boosting, Forests, Solvers, Gradients
Nano Fast Scoring Engine
Memory Manager, Columnar Compression
Query Processor, R-engine
In-Mem Map Reduce, Distributed fork/join
HDFS, S3, SQL, NoSQL
Infrastructure
Parallelism
Data Parallel: Chunking Express!
Algorithm Parallel: Parallel Code blocks
Math Parallelism: ADMM, HogWild
Distribution
Zero-Serialization: endian wars have ended
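The data-parallel chunking and in-memory map/reduce idea on this slide can be sketched roughly as follows. This is only an illustration in plain Python on one machine, not H2O's engine: the function names (`chunked`, `partial_stats`, `combine`, `parallel_mean_variance`) and the thread pool are assumptions made for the example. Each chunk is summarized independently (map), and the small partial results are merged (reduce).

```python
import functools
from concurrent.futures import ThreadPoolExecutor


def chunked(values, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_stats(chunk):
    """Map step: per-chunk count, sum, and sum of squares."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def combine(a, b):
    """Reduce step: merge two partial results element-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def parallel_mean_variance(values, chunk_size=4, workers=4):
    """Compute mean and population variance from per-chunk partial results."""
    chunks = chunked(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    n, total, total_sq = functools.reduce(combine, partials, (0, 0.0, 0.0))
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance


if __name__ == "__main__":
    print(parallel_mean_variance([1, 2, 3, 4, 5, 6, 7, 8]))
```

Only the three-number partial results cross chunk boundaries, which is why this pattern scales: the raw rows never have to move.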
Scalable Machine Learning for Smarter Applications
H2O.ai
Programmable Internet
Programmable Devices
AdSense Sense
Correlation Causality
Data
Sensors, Devices
Semi-structured data. JSON. High velocity. High dimensions.
Events. Signals. Time Series.
Streaming Data
Scoring from prediction
Anomaly and Outliers Detection
Unsupervised Learning
Historical Data
Streaming Data
Anomaly and Outliers Detection
Historical Data
model
Scoring from prediction
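The flow on this slide (build a model from historical data, then score streaming data against it) might look like the sketch below. It uses a simple z-score rule as a stand-in detector; the slides do not say which algorithm is used, so the rule, the threshold of 3.0, and the names `fit_model`, `score`, and `score_stream` are all assumptions for illustration.

```python
import statistics


def fit_model(historical):
    """Train step: summarize historical readings by mean and standard deviation."""
    stdev = statistics.pstdev(historical)
    if stdev == 0:
        raise ValueError("historical data has zero variance; z-scores are undefined")
    return {"mean": statistics.mean(historical), "stdev": stdev}


def score(model, value, threshold=3.0):
    """Score step: return (z_score, is_anomaly) for one streamed reading."""
    z = (value - model["mean"]) / model["stdev"]
    return z, abs(z) > threshold


def score_stream(model, stream, threshold=3.0):
    """Lazily yield (value, is_anomaly) as readings arrive."""
    for value in stream:
        _, is_anomaly = score(model, value, threshold)
        yield value, is_anomaly


if __name__ == "__main__":
    model = fit_model([20.0, 21.0, 19.0, 20.5, 19.5])
    for value, flagged in score_stream(model, [20.2, 35.0, 19.8]):
        print(value, "ANOMALY" if flagged else "ok")
```

The model is fitted once on the historical batch and is then cheap to apply per event, which is what makes it usable on a stream.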
Streaming Data
Clustering / Unsupervised Learning
Historical Data
model
Scoring from prediction
https://developer.nest.com/documentation/api-reference/devices
Take Models to Production in Java
Onset of Rita
Common ensemble techniques
Bayesian Classifiers: Ensembles of all hypotheses in the hypothesis space.
Bagging: Each model votes with equal weight. Bagging trains each model on a randomly drawn subset of the training data.
Boosting: Incrementally builds an ensemble by training each new model to emphasize the training instances that previous models misclassified.
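As a rough illustration of the bagging bullets above, the sketch below trains several tiny models on bootstrap samples (random draws with replacement) and lets them vote with equal weight. The base learner here is a one-feature threshold "stump", chosen only to keep the example short; `train_stump`, `bagging`, and `predict` are hypothetical names, not part of any H2O API.

```python
import random
from collections import Counter


def train_stump(rows):
    """Fit a one-feature threshold classifier with the fewest training errors.

    rows is a list of (x, label) pairs with label in {0, 1}.
    """
    best = None
    for threshold, _ in rows:
        for positive_above in (True, False):
            errors = sum(
                1 for x, y in rows
                if ((x > threshold) == positive_above) != (y == 1)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, positive_above)
    _, threshold, positive_above = best
    return lambda x: int((x > threshold) == positive_above)


def bagging(rows, n_models=25, seed=0):
    """Train n_models stumps, each on its own bootstrap sample of rows."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(rows) for _ in rows]  # draw with replacement
        models.append(train_stump(sample))
    return models


def predict(models, x):
    """Equal-weight majority vote across all models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    data = [(x, 0) for x in range(1, 6)] + [(x, 1) for x in range(6, 11)]
    ensemble = bagging(data)
    print(predict(ensemble, 2), predict(ensemble, 9))
```

Boosting differs in that the models are built one after another rather than independently, so it would need a sequential loop that reweights the misclassified rows instead of the bootstrap step shown here.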
Gradient Boosting Machine
Variable Importance Comparison
Random Forest, 50 trees
Gradient Boosting Machine, 50 trees
Generalized Linear Modeling – Variable Importance
GLM, Elastic Net (Binomial), Categorical expansion on Age
GLM, Elastic Net (Binomial)
Variable Importance Comparison
Deep Learning (Tanh / 4-layer)
Deep Learning (Tanh / 3-layer)
Every generation needs to invent its math.
Our data, our tools!
Power-Law
Code is incomplete without Community!
Open Source Matters!
Community
Committers: 30
Meetups: 90 in 12 months
Coverage
Conference Speakers
Curriculum: Stanford, MIT, CSU, SUNY, SJSU, Purdue
Data Driven Decision Making is hard!
Courage Matters!
Winning customer trust not just quarters!
Mindset matters!
Thanks
Courtney, Nick & MLConf
for bringing us to ATL
Sparkling Water Application Life Cycle
[Diagram: Sparkling App jar file; spark-submit; Spark Master JVM; three Spark Worker JVMs; Sparkling Water Cluster of three Spark Executor JVMs, each running H2O; arrows labeled (1) to (4)]
(1) User submits App to Spark cluster Master node
(2) App distributed to Spark cluster Worker nodes
(3) Spark Executor JVMs start for App
(4) H2O instance starts within each Executor JVM
(5) App's Scala main program runs
Sparkling Water Data Distribution
[Diagram: Data Source (e.g. HDFS) feeding a Sparkling Water Cluster of Spark Executor JVMs, each running H2O and holding a Spark RDD and an H2O RDD; arrows labeled (1) to (3)]
(1) Use Spark SQL to read data into a Spark RDD
(2) Convert Spark RDD to H2O RDD; H2O RDD is column-based and highly compressed
(Not shown) Run modeling and prediction workflows with H2O
(3) Convert H2O RDD (e.g. predictions) back to Spark RDD
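To make the "column-based and highly compressed" point in step (2) concrete, here is a minimal sketch that pivots row-oriented records into columns and compresses each column. It uses plain Python lists in place of Spark and H2O data structures, and run-length encoding as a stand-in codec; the slides do not say which compression schemes H2O applies, so both choices are assumptions, and all function names are hypothetical.

```python
from itertools import groupby


def rows_to_columns(rows, names):
    """Pivot a list of row tuples into a dict mapping column name to values."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(column):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    column = []
    for value, count in pairs:
        column.extend([value] * count)
    return column


def columns_to_rows(columns, names):
    """Pivot columns back into row tuples, as in step (3)."""
    return list(zip(*(columns[name] for name in names)))


if __name__ == "__main__":
    names = ["city", "grade", "id"]
    rows = [("ATL", "A", 1), ("ATL", "A", 2), ("ATL", "B", 3), ("SFO", "B", 4)]
    columns = rows_to_columns(rows, names)
    print(rle_encode(columns["city"]))
    print(columns_to_rows(columns, names) == rows)
```

Values within one column tend to repeat or share a narrow range, so a per-column codec has far more to work with than one applied to mixed-type rows.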
[Diagram: three deployment stacks, listed top to bottom]
Standalone: H2O, HDFS
YARN: H2O, YARN, HDFS
H2O in MR: Hadoop MR, H2O, HDFS
HortonWorks, Cloudera, MapR, Intel
H2O – The Killer-App for Spark
Sparkling Water
HDFS=DATA
MLlib, H2O, SQL
H2ORDD
In-Memory: Big Data, Columnar
ML: 100x faster Algos
R: CRAN, API, fast engine
API: Spark API, Java MM
Community: Devs, Data Science
examples
Fraud / No-fraud
1/1000 unbalanced
Click-Stream Browse / Click / Buy
Propensity Models
Merchants-to-Users
Lifetime Value of Customer
Pricing Engines