How to win data science competitions

Silicon Valley Big Data Science
Mountain View, 3/25/2015

Arno Candel, H2O.ai
Matt Dowle, H2O.ai
Mark Landry, Team H2O.ai

H2O, @ArnoCandel

Outline

Introduction to H2O (5 mins)

Kaggle Problems (55 mins)

Otto Group

Rain

https://github.com/h2oai/h2o-dev/tree/master/h2o-r/demos/kaggle

Teamwork at H2O.ai

Java, Apache v2 Open-Source

#1 Java Machine Learning on GitHub

Join the community!

H2O: Open-Source (Apache v2) Predictive Analytics Platform

H2O Architecture - Designed for speed, scale, accuracy & ease of use

Key technical points:
• distributed JVMs + REST API
• no Java GC issues (data in byte[], Double)
• loss-less number compression
• Hadoop integration (v1, YARN)
• R package (CRAN)

Pre-built fully featured algos: K-Means, NB, PCA, CoxPH, GLM, RF, GBM, DeepLearning
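
To make the "loss-less number compression" point above concrete, here is a minimal Python sketch of one way such a scheme can work: store a column of integers as one byte per value plus a shared bias whenever the value range allows it. This is an illustration under stated assumptions, not H2O's actual implementation; the function names `compress_column` and `decompress_column` are hypothetical.

```python
from array import array


def compress_column(values):
    """Pack a list of ints into one byte each when the value range allows it.

    Returns (bias, payload). If max - min fits in 0..255, payload is a bytes
    object holding each value's offset from the bias. Otherwise payload is an
    array of signed 64-bit integers and the bias is 0.
    """
    if not values:
        return 0, b""
    lo, hi = min(values), max(values)
    if hi - lo <= 255:
        return lo, bytes(v - lo for v in values)
    return 0, array("q", values)


def decompress_column(bias, payload):
    """Invert compress_column exactly (no information is lost)."""
    return [bias + v for v in payload]


# Hypothetical column: large values, but a narrow range.
column = [1000, 1003, 1001, 1255, 1000]
bias, payload = compress_column(column)
restored = decompress_column(bias, payload)
```

In this sketch the five values take five bytes plus one shared bias, and `restored` is equal to `column`, which is what "loss-less" means here.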

H2O GitBooks

https://leanpub.com/u/h2oai

H2O World

http://h2o.ai/h2o-world/
http://learn.h2o.ai
Watch the Videos

Day 1
• Hands-On Training
• Supervised
• Unsupervised
• Advanced Topics
• Marketing Use Case
• Product Demos
• Hacker-Fest with Cliff Click (CTO, Hotspot)

Day 2
• Speakers from Academia & Industry
• Trevor Hastie (ML)
• John Chambers (S, R)
• Josh Bloch (Java API)
• Many use cases from customers
• 3 Top Kaggle Contestants (Top 10)
• 3 Panel discussions

Join us at H2O World 2015!

iPython Notebooks

Sparkling Water: Spark+H2O

Otto Group Challenge

Data:
• 93 numerical features
• 9 output classes
• 62k training set rows
• 144k test set rows

Otto Group Challenge

Install H2O (h2o-dev)

Flow-based GUI

Otto Group Challenge

LB score: 0.501 (Benchmark: 1.56)
#332 out of 1203
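
To put the leaderboard numbers above in context, here is a minimal Python sketch of multi-class logarithmic loss. It assumes the leaderboard (LB) score is a multi-class log loss where lower is better; the slides do not state the metric. The function name `multiclass_log_loss` and the toy labels are hypothetical.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Average negative log-probability assigned to the true class.

    y_true: list of integer class indices.
    probs:  list of per-row probability lists, one entry per class.
    Probabilities are clipped away from 0 and 1, then renormalised per row.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        clipped = [min(max(p, eps), 1 - eps) for p in row]
        norm = sum(clipped)
        total += -math.log(clipped[label] / norm)
    return total / len(y_true)


# With 9 output classes, a uniform guess assigns 1/9 to every class.
n_classes = 9
labels = [0, 3, 8, 5]  # toy labels, not competition data
uniform = [[1.0 / n_classes] * n_classes for _ in labels]
uniform_loss = multiclass_log_loss(labels, uniform)  # equals ln(9), about 2.197
```

Under that assumption, both the benchmark (1.56) and the score above (0.501) are better than a uniform guess over nine classes, which scores ln(9).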

Hands-On: Rain Challenge

Trouble: “List columns”, missing values, outliers, noise, …
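
The "list columns" and missing-value troubles named above can be sketched in Python as follows. The sketch assumes a cell holds several space-separated readings and that some sentinel numbers stand for "missing". The sentinel codes, the function names `parse_list_column` and `summarize`, and the example row are all hypothetical and are not taken from the slides.

```python
import math

# Hypothetical sentinel codes standing in for "missing" readings.
MISSING_CODES = {-99900.0, -99901.0, -99903.0, 999.0}


def parse_list_column(cell, missing_codes=MISSING_CODES):
    """Split a space-separated cell into floats, dropping missing readings."""
    readings = []
    for token in cell.split():
        try:
            value = float(token)
        except ValueError:
            continue  # skip tokens that are not numbers at all
        if math.isnan(value) or value in missing_codes:
            continue  # skip NaN and sentinel codes
        readings.append(value)
    return readings


def summarize(readings):
    """Collapse a variable-length list into fixed-width features."""
    if not readings:
        return {"n": 0, "mean": None, "min": None, "max": None}
    return {
        "n": len(readings),
        "mean": sum(readings) / len(readings),
        "min": min(readings),
        "max": max(readings),
    }


row = "0.5 -99900.0 1.5 nan 4.0"  # hypothetical cell content
features = summarize(parse_list_column(row))
```

Turning each list into a few summary statistics gives every row the same number of columns, which is what most learning algorithms expect. Outliers and noise would need further handling, for example clipping or using a median instead of the mean.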

Please Welcome Matt Dowle!

Beating the Benchmark

Score: 0.00973, Benchmark: 0.011776 by Mark Landry
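
To show how a score on the scale above could arise, here is a minimal Python sketch of the Continuous Ranked Probability Score (CRPS). It assumes the metric is CRPS over 70 integer thresholds (0 to 69) and that lower is better; the slides do not name the metric. The function names and the toy data are hypothetical.

```python
def heaviside(x):
    """Step function: 1 when x >= 0, else 0."""
    return 1.0 if x >= 0 else 0.0


def crps(actuals, predicted_cdfs, n_thresholds=70):
    """Mean squared gap between predicted and observed step CDFs.

    actuals:        observed amounts, one per row.
    predicted_cdfs: per-row lists with P(amount <= n) for n = 0..n_thresholds-1.
    """
    total = 0.0
    for z, cdf in zip(actuals, predicted_cdfs):
        for n in range(n_thresholds):
            total += (cdf[n] - heaviside(n - z)) ** 2
    return total / (n_thresholds * len(actuals))


# Toy data: three dry rows and one row with an amount of 2.0.
actuals = [0.0, 0.0, 2.0, 0.0]
# Naive prediction: always "zero with certainty", i.e. CDF = 1 at every threshold.
always_zero = [1.0] * 70
score = crps(actuals, [always_zero] * len(actuals))  # 2 / 280, about 0.00714
```

In this toy case only the wet row is penalised, and only at thresholds 0 and 1. When most rows are dry, even a naive prediction scores close to zero, so small absolute differences between scores can be meaningful.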

More can be done with H2O

Key Take-Aways

H2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning.

Join our Community and Meetups!
https://github.com/h2oai
h2ostream community forum
www.h2o.ai
@h2oai

Thank you!