How to win data science competitions
Silicon Valley Big Data Science, Mountain View, 3/25/2015
Arno Candel, H2O.ai
Matt Dowle, H2O.ai
Mark Landry, Team H2O.ai
H2O, @ArnoCandel
Outline
Introduction to H2O (5 mins)
Kaggle Problems (55 mins)
Otto Group
Rain
https://github.com/h2oai/h2o-dev/tree/master/h2o-r/demos/kaggle
Teamwork at H2O.ai
Java, Apache v2 Open-Source
#1 Java Machine Learning on GitHub
Join the community!
H2O Architecture - Designed for speed, scale, accuracy & ease of use
Key technical points:
• distributed JVMs + REST API
• no Java GC issues (data in byte[], Double)
• loss-less number compression
• Hadoop integration (v1, YARN)
• R package (CRAN)
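To illustrate what "loss-less number compression" can mean in principle, here is a minimal sketch that stores an integer column as a bias plus the narrowest unsigned width that fits the value range. It is not H2O's actual implementation; the function names `compress_ints` and `decompress_ints` are hypothetical and chosen only for this example.

```python
from array import array


def compress_ints(values):
    """Pack integers as (bias, packed array) using the narrowest width that fits."""
    bias = min(values)
    span = max(values) - bias
    # Pick the smallest unsigned storage type that can hold every (value - bias).
    for typecode, limit in (("B", 2**8), ("H", 2**16), ("L", 2**32)):
        if span < limit:
            return bias, array(typecode, (v - bias for v in values))
    raise ValueError("value range too wide for this sketch")


def decompress_ints(bias, packed):
    """Recover the original integers exactly."""
    return [v + bias for v in packed]


column = [100003, 100000, 100250, 100017]
bias, packed = compress_ints(column)
restored = decompress_ints(bias, packed)
assert restored == column  # the round trip loses nothing
print(packed.typecode, packed.itemsize * len(packed), "bytes")
```

Because the values span only 250 units, one byte per value is enough here even though each raw number would need more; the bias is stored once per column.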
Pre-built fully featured algos: K-Means, NB, PCA, CoxPH, GLM, RF, GBM, DeepLearning
H2O World
http://h2o.ai/h2o-world/ http://learn.h2o.ai Watch the Videos
Day 1
• Hands-On Training
• Supervised
• Unsupervised
• Advanced Topics
• Marketing Use Case
• Product Demos
• Hacker-Fest with Cliff Click (CTO, Hotspot)
Day 2
• Speakers from Academia & Industry
• Trevor Hastie (ML)
• John Chambers (S, R)
• Josh Bloch (Java API)
• Many use cases from customers
• 3 Top Kaggle Contestants (Top 10)
• 3 Panel discussions
Join us at H2O World 2015!
Past H2O Kaggle Starter R Scripts
Otto Group Challenge
Data:
• 93 numerical features
• 9 output classes
• 62k training set rows
• 144k test set rows
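With nine output classes, a model has to produce a probability for each class per row. The slide does not state how submissions were scored; the sketch below assumes multi-class logarithmic loss, the usual metric for this kind of problem, and the function name `multiclass_logloss` is hypothetical. It shows the score that a uniform guess over nine classes would get, which is a useful floor to beat.

```python
import math


def multiclass_logloss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of class indices (0..8 for nine classes)
    probs:  list of per-row probability lists, one entry per class
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0), then renormalise so each row sums to 1.
        clipped = [min(max(p, eps), 1 - eps) for p in row]
        norm = sum(clipped)
        total += -math.log(clipped[label] / norm)
    return total / len(y_true)


n_classes = 9
uniform = [[1 / n_classes] * n_classes for _ in range(3)]
baseline = multiclass_logloss([0, 4, 8], uniform)
print(round(baseline, 4))  # equals ln(9) for a uniform guess
```

Any model worth submitting should score below this uniform baseline, since lower log loss is better.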
How much did it rain?
Hands-On: Rain Challenge
Trouble: “List columns”, missing values, outliers, noise, …
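A minimal sketch of one way to handle these troubles, assuming each "list column" cell holds a space-separated string of readings and that certain sentinel numbers mark missing values. The sentinel codes and the function names `parse_list_column` and `summarize` are assumptions for illustration; the slides do not specify them. The median is included because it is less sensitive to outliers than the mean.

```python
import math

# Assumed sentinel codes for "no reading"; the slides do not list them.
MISSING_CODES = {-99900.0, -99901.0, -99903.0, 999.0}


def parse_list_column(cell):
    """Turn a space-separated string of readings into floats, mapping sentinels to NaN."""
    values = []
    for token in cell.split():
        try:
            x = float(token)
        except ValueError:
            x = math.nan  # unparseable tokens are treated as missing
        values.append(math.nan if x in MISSING_CODES else x)
    return values


def summarize(values):
    """Count, mean and median over valid readings only."""
    valid = sorted(v for v in values if not math.isnan(v))
    if not valid:
        return {"n": 0, "mean": math.nan, "median": math.nan}
    mid = len(valid) // 2
    median = valid[mid] if len(valid) % 2 else (valid[mid - 1] + valid[mid]) / 2
    return {"n": len(valid), "mean": sum(valid) / len(valid), "median": median}


readings = parse_list_column("12.5 -99900.0 14.0 13.5 999.0")
stats = summarize(readings)
print(stats)
```

Collapsing each list into a few scalar summaries gives fixed-width features that any tabular learner can consume.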
by Matt Dowle
Beating the Benchmark
Score: 0.00973, Benchmark: 0.011776 by Mark Landry
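To put the two numbers in perspective, the sketch below computes the relative improvement over the benchmark, assuming a lower score is better. It also sketches the metric under the assumption that it was a Continuous Ranked Probability Score over 70 one-millimetre thresholds; the slides do not name the metric, and the function names are hypothetical.

```python
def crps(pred_cdfs, actual_mm, n_thresholds=70):
    """Mean squared gap between predicted CDFs and the step function at the observed value."""
    total = 0.0
    for cdf, z in zip(pred_cdfs, actual_mm):
        for n in range(n_thresholds):
            # Step function: 1 once the threshold reaches the observed amount.
            heaviside = 1.0 if n >= z else 0.0
            total += (cdf[n] - heaviside) ** 2
    return total / (n_thresholds * len(actual_mm))


def relative_improvement(score, benchmark):
    """Fraction by which a lower-is-better score beats the benchmark."""
    return (benchmark - score) / benchmark


# Figures from the slide.
gain = relative_improvement(0.00973, 0.011776)
print(f"{gain:.1%}")  # roughly a 17% reduction

# A perfect forecast for 2 mm of rain: the CDF jumps from 0 to 1 at threshold 2.
perfect = [[0.0 if n < 2 else 1.0 for n in range(70)]]
perfect_score = crps(perfect, [2.0])
```

Under this metric a perfect forecast scores 0, so a drop from 0.011776 to 0.00973 is a move toward that ideal.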
More can be done with H2O
Key Take-Aways
H2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning.
Join our Community and Meetups!
• https://github.com/h2oai
• h2ostream community forum
• www.h2o.ai
• @h2oai
Thank you!