What's hot: Big Data Analytics with Hadoop

Jos van Dongen, Arno Klijnman

Copyright © 2014, SAS Institute Inc. All rights reserved.


What is…

Distributed storage and processing of (big) data on large clusters of commodity hardware

• HDFS

• Map/Reduce


HDFS - Distributed storage for big files
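
As a rough illustration of what "distributed storage for big files" means, the sketch below splits a file into fixed-size blocks and assigns each block to several nodes. The block size of 128 MB and the replication factor of 3 are common HDFS defaults, not values taken from the slides. The node names and the round-robin placement are assumptions made for illustration only; real HDFS placement is rack-aware.

```python
from math import ceil

BLOCK_SIZE_MB = 128   # common HDFS default; configurable per cluster
REPLICATION = 3       # common HDFS default; configurable per cluster


def place_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Toy placement: split a file into blocks and assign each block to
    `replication` distinct nodes, round-robin. Real HDFS placement is rack-aware."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(n_blocks):
        # Consecutive nodes (wrapping around) hold the replicas of this block.
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement


# Hypothetical 1000 MB file on a hypothetical four-node cluster.
layout = place_blocks(file_size_mb=1000, nodes=["node1", "node2", "node3", "node4"])
for block, replicas in layout.items():
    print(block, replicas)
```

Because every block lives on several nodes, losing a single node does not lose data, and different blocks of the same file can be read in parallel.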


Map/Reduce - Distributed processing for big data
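
The Map/Reduce idea can be sketched in a few lines of plain Python using the classic word count. This is a single-process illustration of the map, shuffle and reduce steps, not Hadoop code; the function names and the input lines are made up for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data on Hadoop", "big files on HDFS"]  # hypothetical input
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

On a cluster, the map calls run on the nodes that hold the input blocks and the reduce calls run per key, which is what makes the processing distributed.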


The Hadoop Jungle


SAS & Hadoop Capabilities

WITH Hadoop ON Hadoop IN Hadoop

HDFS

• SAS Data Quality Accelerator

• SAS Scoring Accelerator

• SAS Code Accelerator


SAS & Hadoop Integration

[Diagram: SAS & Hadoop integration stack]

Layers: User Interface, Metadata, Data Access, Data Processing, File System

Users: SAS® User, Next-Gen SAS® User

Hadoop components: Hive, Pig, Map Reduce, HDFS

SAS components: SAS Metadata, Base SAS & SAS/ACCESS® to Hadoop™, In-Memory Data Access, SAS® Data Management, SAS® Visual Analytics, SAS® Visual Statistics, SAS® Enterprise Miner™, SAS® Studio, SAS® LASR™ Analytic Server, SAS Embedded Process, SAS® In-memory Statistics for Hadoop


Two Paradigms

• Hadoop as a Data Platform

• Hadoop as a core component of next generation analytical platform

[Diagram: MANAGE DATA, EXPLORE DATA, DEVELOP MODELS, DEPLOY & MONITOR]


Paradigm two: Hadoop as a core component of next generation analytical platform

[Diagram: MANAGE DATA, EXPLORE DATA, DEVELOP MODELS, DEPLOY & MONITOR]

• SAS/ACCESS

• SAS Data Management

• SAS Federation Server

• SAS Event Stream Processing

• SAS Data Loader for Hadoop

• SAS Data Quality Accelerator for Hadoop

• SAS Code Accelerator for Hadoop

• SAS Data Loader for Hadoop

• SAS Visual Analytics

• SAS In-memory Statistics for Hadoop

• SAS High Performance Analytics Products

• SAS Visual Statistics

• SAS In-memory Statistics for Hadoop

• SAS Scoring Accelerator for Hadoop

• SAS Decision Manager

• SAS Visual Analytics


SAS runs the Entire Analytical Lifecycle in/on/with Hadoop

[Diagram: analytics lifecycle: IDENTIFY / FORMULATE PROBLEM, DATA PREPARATION, DATA EXPLORATION, TRANSFORM & SELECT, BUILD MODEL, VALIDATE MODEL, DEPLOY MODEL, EVALUATE / MONITOR RESULTS]

• Base SAS

• SAS/ACCESS

• SAS Data Loader for Hadoop

• SAS DI Studio

• SAS Visual Analytics

• SAS Visual Statistics

• SAS High Performance Analytics Offerings

• SAS In-Memory Statistics for Hadoop

Done using either the Data Preparation, Data Exploration or Build Model Tools

• SAS High Performance Analytics Offerings

• SAS In-Memory Statistics for Hadoop

• SAS Visual Statistics

Done using the Build Model Tools and other checks

• SAS Scoring Accelerator for Hadoop

• SAS Code Accelerator for Hadoop

• SAS Visual Analytics


USER ROLES & THE ANALYTICS LIFECYCLE

[Diagram: analytics lifecycle: IDENTIFY / FORMULATE PROBLEM, DATA PREPARATION, DATA EXPLORATION, TRANSFORM & SELECT, BUILD MODEL, VALIDATE MODEL, DEPLOY MODEL, EVALUATE / MONITOR RESULTS]

BUSINESS MANAGER

• Domain Expert

• Makes Decisions

• Evaluates Processes and ROI

IT SYSTEMS / MANAGEMENT

• Model Validation

• Model Deployment

• Data Preparation

BUSINESS ANALYST

• Data Exploration

• Data Visualization

• Report Creation

ANALYST / DATA SCIENTIST

• Exploratory Analysis

• Descriptive Segmentation

• Predictive Modeling


SAS® Data Loader for Hadoop: A new SAS Web-based Business user interface

• Point & Click

• User Menus

• Little or no Hadoop experience needed

• Self-Service UI

• HTML 5 Interface

Enables Self-Service approach to managing data in Hadoop environment


SAS® Data Loader for Hadoop: Transform Data in Hadoop

• Filtering Rules

• Column Selections

• Aggregation

No coding, scripting or specialized skills required
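
For readers who want to see what the three transform operations amount to, here is a minimal plain-Python sketch of a filtering rule, a column selection and an aggregation. It does not show how SAS Data Loader works internally; the table, column names and threshold are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for a table stored in Hadoop.
rows = [
    {"region": "north", "product": "a", "amount": 120.0},
    {"region": "north", "product": "b", "amount": 30.0},
    {"region": "south", "product": "a", "amount": 75.0},
    {"region": "south", "product": "b", "amount": 210.0},
]

# 1. Filtering rule: keep rows with an amount of at least 50.
filtered = [r for r in rows if r["amount"] >= 50]

# 2. Column selection: keep only the columns needed downstream.
selected = [{"region": r["region"], "amount": r["amount"]} for r in filtered]

# 3. Aggregation: total amount per region.
totals = defaultdict(float)
for r in selected:
    totals[r["region"]] += r["amount"]

print(dict(totals))
```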


SAS® Data Loader for Hadoop: Query Hadoop Data

• Select Source Tables

• Apply Query Criteria

• See subset of data in Table Viewer

Simple Drag & Drop approach to Query Data inside Hadoop


SAS® Data Loader for Hadoop: Profile Hadoop Data

• Select Source Table

• View Reports in Column Display

• View Reports in Table Display

Run standard metrics on data inside Hadoop and generate reports
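
The slides do not say which metrics a profile contains. As an assumption, the sketch below computes a few typical ones (row count, missing values, distinct values, minimum and maximum) per column. The function name and the sample table are hypothetical.

```python
def profile_column(values):
    """Return simple profiling metrics for one column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


# Hypothetical table, stored column-wise for brevity.
table = {
    "age": [34, 51, None, 28, 51],
    "city": ["Huizen", "Utrecht", "Huizen", None, None],
}

report = {name: profile_column(col) for name, col in table.items()}
for name, metrics in report.items():
    print(name, metrics)
```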


View Data


SAS® Data Loader for Hadoop: Copy Data to Distributed SAS® LASR Server

• Select Source Table

• Copy Data to distributed SAS® LASR Servers

• Optional: Visualize Data (SAS® Visual Analytics)

Explore Hadoop data quickly and easily for faster insights


SAS Scoring Accelerator for Hadoop

• SAS Model Manager

• Export Score Code (EM, SAS/STAT, VS)

• Scoring File(s)

• Hadoop Publish Macro


Demo flow

SAS DI

• Access to Hadoop

• Transform

• Write back to Hadoop

• Write to LASR

Data Loader

• Show table

• Profile

• Build Query

• Write result to LASR

VA Explorer

• Discover relations

• Understand the data

VS

• Discover a model

• Determine significance

• Cluster variables

IMSTAT

• Recommendation

• Datastep to enrich original dataset with recommendation results

• Write to LASR

Scoring Accelerator

• Deploy model

• Run model

• Back to Data Loader

SAS Data Management SAS Interactive Analytics On Hadoop SAS Analytics


SAS & Hadoop: 3 Things to Remember

WITH Hadoop ON Hadoop IN Hadoop

HDFS


Demo Environment Infrastructure

[Diagram: Internet, Elastic IP Address, AWS-Cloud; callouts 1, 2, 3]

Setup:

- CentOS operating system

- Local users on all Amazon servers

- Internal network for all Amazon servers

- Open firewall for all ports between workstation & server

- No mail server integration

- No SSL


9 October 2014, Huizen
