sri-ambati
Agenda
1. A short, short history of R
2. What is H2O
3. Getting H2O and reading documentation
4. Data exploration
5. Model building
Getting H2O & Docs
1. http://h2o.ai/download/
   a. Bleeding Edge (link)
   b. Install in R (tab)
2. Build H2O (https://github.com/h2oai/h2o-3#4-building-h2o-3)
3. http://docs.h2o.ai/ -> H2O 3.0 -> R Users (link) -> R docs (link)
H2O.ai Machine Intelligence
A Brief History of R:
- R first appeared 22 years ago (1993)*
- Implementation of S (which was created by John Chambers @ Bell Labs)
* Python first appeared 24 years ago (1991)
H2O is what exactly?
Services:
- Interfaces to mainstream data science languages (R, Python, Scala)
- I/O for common data formats (CSV, zipped, HDFS, ORC, parquet!?)
- Interface with modern big data infrastructures: Hadoop, Spark, H2O
- Feature-generation capabilities
- High Performance State-of-the-Art Machine Learning Algorithms
H2O is what exactly?
Object Taxonomy in H2O
- H2OFrame: A 2D collection of uniformly typed columns
- H2OModel: An H2O model object
- ID/Key: An identifier for an H2O object
H2O is what exactly?
Feature Generation Capabilities
- > 100 operations to perform on an H2OFrame
- Aggregations:
  - mean, min, max, sum, or any user-defined reduction
  - distributed parallel group-by
  - table, cut
- Simple String manipulation: trim, sub, gsub
- Date Formatting/Extraction: get/set timezones, month, year, dayOfWeek
- Transformations: sqrt, log, *,+, …
- Filtering: R-like slicing
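The distributed parallel group-by listed above can be pictured as a two-phase reduction: each chunk of rows computes partial aggregates, and the partials are then merged. What follows is a minimal single-machine sketch of that idea in Python, not H2O's implementation; the function names (`partial_groupby`, `merge_partials`, `groupby_mean`) and the sample rows are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partial_groupby(rows):
    """Map phase: per-chunk [sum, count] for each group key."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        acc[key][0] += value
        acc[key][1] += 1
    return dict(acc)


def merge_partials(partials):
    """Reduce phase: combine per-chunk partials into a mean per group."""
    total = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for key, (s, n) in part.items():
            total[key][0] += s
            total[key][1] += n
    return {key: s / n for key, (s, n) in total.items()}


def groupby_mean(rows, n_chunks=2):
    """Split rows into chunks, aggregate each in parallel, then merge."""
    chunks = [rows[i::n_chunks] for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(partial_groupby, chunks))
    return merge_partials(partials)


# Hypothetical sample data: (group key, value) pairs.
rows = [("a", 1.0), ("b", 10.0), ("a", 3.0), ("b", 20.0), ("a", 5.0)]
means = groupby_mean(rows)
```

Carrying a (sum, count) pair rather than a running mean is what makes the merge step exact regardless of how rows are split across chunks.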
H2O is what exactly?
Infrastructure for:
- K-fold Cross-Validation
- Grid Search
- Model Import/Export
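To make the first two items concrete, here is a small Python sketch of what K-fold cross-validation and grid search enumerate. It is illustrative only and says nothing about how H2O schedules the work; the modulo fold assignment is just one simple choice, and the helper names and hyperparameter values are hypothetical.

```python
from itertools import product


def kfold_indices(n_rows, k):
    """Assign row i to fold i % k; yield (train, holdout) index lists."""
    for fold in range(k):
        holdout = [i for i in range(n_rows) if i % k == fold]
        train = [i for i in range(n_rows) if i % k != fold]
        yield train, holdout


def expand_grid(hyper_params):
    """Enumerate every combination of the listed hyperparameter values."""
    names = sorted(hyper_params)
    for values in product(*(hyper_params[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical sizes and values, chosen only to keep the output small.
folds = list(kfold_indices(n_rows=6, k=3))
grid = list(expand_grid({"alpha": [0.0, 0.5], "lambda": [1e-3, 1e-2, 1e-1]}))
```

With K folds and a grid of G combinations, a full search trains on the order of K x G models, which is why this infrastructure benefits from running on a cluster.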
Driving H2O From R
STEP 2 (diagram: R, H2O cluster, data location)
- 2.1 R function call: h2o.importFile()
- 2.2 HTTP REST API request to H2O
- 2.3 H2O cluster initiates distributed ingest
- 2.4 Request data (data.csv) from some data location
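The first hop in this step, an R call to h2o.importFile() turning into an HTTP REST API request, can be sketched in a few lines of Python. The slide does not name the route, so the `/3/ImportFiles` endpoint with a `path` query parameter is an assumption here, as are the localhost address, port 54321, and the file path. The sketch only builds the URL and sends nothing.

```python
from urllib.parse import urlencode

# Assumed defaults for a locally started cluster.
H2O_IP = "localhost"
H2O_PORT = 54321


def import_files_url(path, ip=H2O_IP, port=H2O_PORT):
    """Build the URL for a REST call asking H2O to import `path`.

    The route and parameter name are assumptions, not taken from the slide.
    """
    query = urlencode({"path": path})
    return f"http://{ip}:{port}/3/ImportFiles?{query}"


# Hypothetical file location, for illustration only.
url = import_files_url("/data/data.csv")
```

The point is that the client sends only a location; the cluster, not the R session, reads the bytes.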
Driving H2O From R
STEP 3 (diagram: R, H2O cluster, data location)
- 3.1 Data (data.csv) provided from the data location
- 3.2 Distributed H2OFrame in DKV (H2O cluster)
- 3.3 Return pointer to data in REST API JSON response
- 3.4 h2o_df object created in R, holding the cluster IP, cluster port, and pointer to data
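The h2o_df object in this step is a lightweight handle (cluster IP, cluster port, pointer to data), while the frame itself stays in the cluster. A minimal Python sketch of such a handle follows. The class name `FrameHandle`, the `/3/Frames/<key>` route, and the shape of the JSON response are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameHandle:
    """Client-side pointer to a frame that lives in the cluster."""
    ip: str
    port: int
    key: str  # identifier of the H2OFrame in the cluster's K/V store

    def url(self):
        """REST URL for the frame this handle points to (assumed route)."""
        return f"http://{self.ip}:{self.port}/3/Frames/{self.key}"


def handle_from_response(ip, port, response):
    """Build a handle from a parsed JSON response carrying the frame key."""
    return FrameHandle(ip, port, response["destination_frame"]["name"])


# Hypothetical JSON response body, already parsed into a dict.
response = {"destination_frame": {"name": "data.hex"}}
h2o_df = handle_from_response("localhost", 54321, response)
```

Because the handle holds only three small fields, copying or passing it around on the client costs nothing, however large the frame is.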
R Script Starting H2O GLM
(Diagram: call path from the R script to the GLM algorithm)
User process (standard R process):
- R script
- h2o.glm()
- .h2o.startModelJob(): POST /3/ModelBuilders/glm
- HTTP REST/JSON over TCP/IP
H2O process:
- HTTP REST/JSON over TCP/IP
- /3/ModelBuilders/glm endpoint
- Job
- GLM algorithm
- GLM tasks
- Fork/Join framework, K/V store framework
Legend: network layer, REST layer, H2O - algos, H2O - core, user process, H2O process
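The POST /3/ModelBuilders/glm request that starts the model job can be sketched with Python's standard library. This builds the request without sending it; the form-encoded body, the parameter names (`training_frame`, `response_column`, `family`), and their values are assumptions for illustration and are not taken from the slide.

```python
from urllib.parse import urlencode
from urllib.request import Request


def glm_build_request(params, ip="localhost", port=54321):
    """Build (but do not send) a POST to the GLM model-builder endpoint."""
    url = f"http://{ip}:{port}/3/ModelBuilders/glm"
    body = urlencode(params).encode("utf-8")
    return Request(url, data=body, method="POST")


# Hypothetical parameter values; the frame key and column name are made up.
req = glm_build_request({
    "training_frame": "data.hex",
    "response_column": "y",
    "family": "binomial",
})
```

Passing `req` to `urllib.request.urlopen` would submit it to a running cluster. Note that the request refers to the training data by key only, so no data travels from the client.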
R Script Retrieving H2O GLM Result
(Diagram: call path from the R script to the stored model)
User process (standard R process):
- R script
- h2o.glm()
- h2o.getModel(): GET /3/Models/glm_model_id
- HTTP REST/JSON over TCP/IP
H2O process:
- HTTP REST/JSON over TCP/IP
- /3/Models endpoint
- Fork/Join framework, K/V store framework
Legend: network layer, REST layer, H2O - algos, H2O - core, user process, H2O process
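Retrieving the result is the mirror image: a GET on /3/Models/glm_model_id followed by parsing the JSON that comes back. The Python sketch below builds the URL and parses a hand-written sample body. The `models` list and the fields inside it are an assumed, heavily trimmed response shape used only for illustration.

```python
import json
from urllib.parse import quote


def model_url(model_id, ip="localhost", port=54321):
    """URL for fetching a finished model by its ID/key."""
    return f"http://{ip}:{port}/3/Models/{quote(model_id, safe='')}"


def extract_model(body):
    """Pull the first model entry out of a JSON response body."""
    return json.loads(body)["models"][0]


# Hypothetical, heavily trimmed response body for illustration only.
sample_body = '{"models": [{"model_id": {"name": "glm_model_id"}, "algo": "glm"}]}'
url = model_url("glm_model_id")
model = extract_model(sample_body)
```

The model ID plays the same role here as the frame key does for data: it is the only thing the client needs to hold in order to fetch the object later.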