Copyright © 2014, SAS Institute Inc. All rights reserved.

Big Data Analytics with Hadoop

Jos van Dongen, Arno Klijnman

What is…

Distributed storage and processing of (big) data on large clusters of commodity hardware

• HDFS

• Map/Reduce

HDFS - Distributed storage for big files
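As a rough illustration of what "distributed storage for big files" means, the sketch below cuts a file into fixed-size blocks and assigns each block to several nodes. The function names, node names, sizes and the round-robin placement are assumptions made for this example; real HDFS placement is rack-aware and is decided by the NameNode, so this is not how Hadoop itself chooses nodes.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks that a file of file_size bytes is cut into."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication):
    """Assign every block to `replication` distinct nodes (simple round-robin).

    Hypothetical placement policy for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


# Toy numbers: a 300-byte "file", 128-byte blocks, 4 nodes, 3 copies per block.
nodes = ["node1", "node2", "node3", "node4"]
blocks = split_into_blocks(file_size=300, block_size=128)
placement = place_replicas(len(blocks), nodes, replication=3)

for block_id, holders in placement.items():
    print(f"block {block_id} ({blocks[block_id]} bytes) -> {holders}")
```

Because every block lives on several nodes, losing one node does not lose the file, and different blocks can be read in parallel.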

Map/Reduce - Distributed processing for big data
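The Map/Reduce idea can be sketched in a few lines of plain Python: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group. The word-count task, the function names and the sample lines are assumptions chosen for illustration. Everything here runs in a single process, whereas Hadoop runs the map and reduce steps in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data on Hadoop", "Hadoop stores big files"])
print(counts)
```

Each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, which is what allows the work to be spread over many machines.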

The Hadoop Jungle

SAS & Hadoop Capabilities

WITH Hadoop ON Hadoop IN Hadoop

HDFS

• SAS Data Quality Accelerator

• SAS Scoring Accelerator

• SAS Code Accelerator

SAS & Hadoop Integration

[Diagram: SAS® User and Next-Gen SAS® User on top of a layered stack]

Layers: User Interface, Metadata, Data Access, Data Processing, File System

Components shown:

• SAS Metadata

• In-Memory Data Access

• Hive, Pig

• Map Reduce

• HDFS

• Base SAS & SAS/ACCESS® to Hadoop™

• SAS® Data Management

• SAS® Visual Analytics

• SAS® Visual Statistics

• SAS® Enterprise Miner™

• SAS® Studio

• SAS® LASR™ Analytic Server

• SAS Embedded Process

• SAS® In-memory Statistics for Hadoop

Two Paradigms

• Hadoop as a Data Platform

• Hadoop as a core component of next generation analytical platform

[Diagram: lifecycle wheel with the phases MANAGE DATA, EXPLORE DATA, DEVELOP MODELS, DEPLOY & MONITOR]

Paradigm two: Hadoop as a core component of next generation analytical platform

[Diagram: lifecycle wheel with the phases MANAGE DATA, EXPLORE DATA, DEVELOP MODELS, DEPLOY & MONITOR]

• SAS/ACCESS

• SAS Data Management

• SAS Federation Server

• SAS Event Stream Processing

• SAS Data Loader for Hadoop

• SAS Data Quality Accelerator for Hadoop

• SAS Code Accelerator for Hadoop

• SAS Data Loader for Hadoop

• SAS Visual Analytics

• SAS In-memory Statistics for Hadoop

• SAS High Performance Analytics Products

• SAS Visual Statistics

• SAS In-memory Statistics for Hadoop

• SAS Scoring Accelerator for Hadoop

• SAS Decision Manager

• SAS Visual Analytics

SAS runs the Entire Analytical Lifecycle in/on/with Hadoop

Lifecycle phases: IDENTIFY / FORMULATE PROBLEM, DATA PREPARATION, DATA EXPLORATION, TRANSFORM & SELECT, BUILD MODEL, VALIDATE MODEL, DEPLOY MODEL, EVALUATE / MONITOR RESULTS

• BASE SAS

• SAS/ACCESS

• SAS Data Loader for Hadoop

• SAS DI Studio

• SAS Visual Analytics

• SAS Visual Statistics

• SAS High Performance Analytics Offerings

• SAS In-Memory Statistics for Hadoop

Done using either the Data Preparation, Data Exploration or Build Model Tools

• SAS High Performance Analytics Offerings

• SAS In-Memory Statistics for Hadoop

• SAS Visual Statistics

Done using the Build Model Tools and other checks

• SAS Scoring Accelerator for Hadoop

• SAS Code Accelerator for Hadoop

• SAS Visual Analytics

USER ROLES & THE ANALYTICS LIFECYCLE

Lifecycle phases: IDENTIFY / FORMULATE PROBLEM, DATA PREPARATION, DATA EXPLORATION, TRANSFORM & SELECT, BUILD MODEL, VALIDATE MODEL, DEPLOY MODEL, EVALUATE / MONITOR RESULTS

• BUSINESS MANAGER: Domain Expert, Makes Decisions, Evaluates Processes and ROI

• IT SYSTEMS / MANAGEMENT: Model Validation, Model Deployment, Data Preparation

• BUSINESS ANALYST: Data Exploration, Data Visualization, Report Creation

• ANALYST / DATA SCIENTIST: Exploratory Analysis, Descriptive Segmentation, Predictive Modeling

SAS® Data Loader for Hadoop

A new SAS Web-based Business user interface

• Point & Click User Menus

• Little or no Hadoop experience needed

• Self-Service UI

• HTML 5 Interface

Enables Self-Service approach to managing data in Hadoop environment

SAS® Data Loader for Hadoop

Transform Data in Hadoop

• Filtering Rules

• Column Selections

• Aggregation

No coding, scripting or specialized skills required
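The product itself is point and click, so no code is needed to use it. Purely to make the three transformation types concrete, the sketch below applies a filtering rule, a column selection and an aggregation to a small in-memory table. The sample rows, column names and helper functions are invented for this example and say nothing about how SAS Data Loader implements these steps inside Hadoop.

```python
from collections import defaultdict

# Hypothetical sample data standing in for a table stored in Hadoop.
rows = [
    {"region": "north", "product": "a", "amount": 10},
    {"region": "north", "product": "b", "amount": 5},
    {"region": "south", "product": "a", "amount": 7},
    {"region": "south", "product": "b", "amount": 0},
]


def filter_rows(rows, predicate):
    """Filtering rule: keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


def select_columns(rows, columns):
    """Column selection: keep only the named columns of every row."""
    return [{column: row[column] for column in columns} for row in rows]


def aggregate_sum(rows, group_by, value):
    """Aggregation: sum one column per distinct value of another column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_by]] += row[value]
    return dict(totals)


filtered = filter_rows(rows, lambda row: row["amount"] > 0)
selected = select_columns(filtered, ["region", "amount"])
totals = aggregate_sum(selected, "region", "amount")
print(totals)
```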

SAS® Data Loader for Hadoop

Query Hadoop data

• Select Source Tables

• Apply Query Criteria

• See subset of data in Table Viewer

Simple Drag & Drop approach to Query Data inside Hadoop

SAS® Data Loader for Hadoop

Profile Hadoop Data

• Select Source Table

• View Reports in Column Display

• View Reports in Table Display

Run standard metrics on data inside Hadoop and generate reports
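The slide does not list which "standard metrics" are computed. As an assumption, the sketch below uses a typical set (row count, missing values, distinct values, minimum and maximum) to show what a per-column profile report can look like. The sample rows and function names are hypothetical.

```python
def profile_column(values):
    """Compute simple profile metrics for the values of one column."""
    present = [value for value in values if value is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def profile_table(rows):
    """Profile every column that appears in a list of row dictionaries."""
    columns = sorted({column for row in rows for column in row})
    return {
        column: profile_column([row.get(column) for row in rows])
        for column in columns
    }


# Hypothetical sample table.
report = profile_table([
    {"age": 31, "city": "Huizen"},
    {"age": None, "city": "Utrecht"},
    {"age": 45, "city": "Huizen"},
])

for column, metrics in report.items():
    print(column, metrics)
```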

View Data

SAS® Data Loader for Hadoop

Copy Data to Distributed SAS® LASR Server

• Select Source Table

• Copy Data to distributed SAS® LASR Servers

• Optional: Visualize Data in SAS® Visual Analytics

Explore Hadoop data quickly and easily for faster insights

SAS Scoring Accelerator for Hadoop

• SAS Model Manager

• Export Score Code (EM, SAS/STAT, VS)

• Scoring File(s)

• Hadoop Publish Macro

Demo flow

Tools, in order: SAS DI, Data Loader, VA Explorer, VS, IMSTAT, Scoring Accelerator

Areas: SAS Data Management, SAS Interactive Analytics On Hadoop, SAS Analytics

• Access to Hadoop

• Transform

• Write back to Hadoop

• Write to LASR

• Show table

• Profile

• Build Query

• Write result to LASR

• Discover relations

• Understand the data

• Discover a model

• Determine significance

• Cluster variables

• Recommendation

• Datastep to enrich original dataset with recommendation results

• Write to LASR

• Deploy model

• Run model

• Back to Data Loader

SAS & Hadoop: 3 Things to Remember

WITH Hadoop ON Hadoop IN Hadoop

HDFS

Demo Environment Infrastructure

[Diagram: Internet, Elastic IP Address, AWS-Cloud, with items labeled 1, 2 and 3]

Setup:

- CentOS operating system

- Local users on all Amazon servers

- Internal network for all Amazon servers

- Open firewall for all ports between workstation & server

- No integration with mail server

- No SSL

9 October 2014

Huizen
