Data Integration in a Grid-Enabled Environment

Cheryl Doninger, R&D Director, SAS

Nancy Rausch, Senior Software Mgr, SAS

Copyright © 2008 SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

30 Years Ago - the Mainframe

Exploiting multiple processors in a machine

Grid goes beyond a single machine

SAS Grid Manager

SAS Grid Manager Key Capabilities

Distributed Enterprise Scheduling

• Distribute jobs within workflows to a shared pool of resources.

Workload Balancing

• Distribute workloads to a shared pool of resources.

Parallelized Workload Balancing

• Distribute parallelized SAS workloads to a shared pool of resources.

Optimize the Efficiency and Utilization of Computing Resources

What Products Can Leverage SAS Grid Manager?

Distributed Enterprise Scheduling

• SAS Data Integration Studio

• SAS Web Report Studio

• SAS Marketing Automation

• SAS Marketing Optimization

• Any SAS program*

*(with modification)

Workload Balancing

• SAS Data Integration Studio

• SAS Enterprise Guide*

• SAS Workspace Server

• Any SAS program*

• SAS Stored Processes**

*(with wrapper) **(with limitations)

Parallelized Workload Balancing

• SAS Data Integration Studio

• SAS Enterprise Miner

• SAS Risk Dimensions*

• Any SAS program*

• SAS Stored Processes**

*(with modification) **(with limitations)

SAS Code Importer

Once Imported ...

http://support.sas.com/documentation/onlinedoc/gridmgr/index.html

SAS Data Integration Studio – Distributed Enterprise Scheduling

SAS Data Integration Studio – Multi-User Workload Balancing …

PUBLIC SECTOR

MANUFACTURING

FINANCIAL

LIFE SCIENCES

Data Integration Studio on a Grid: Loops and Iterations

Example: A simple job

Specific physical tables referenced

Specific transform logic

Repetition can be helpful

Processing data in multiple pieces

Same process over several data sets

Examples:

• Same process every hour

• Same process for multiple stores

• Same process for every state

How to do repetitive things?

Here is one way

Copy, Paste

Edit in new job

Problem: Multiple maintenance

Doing this more automatically

Use Looping

• Loop

• Loop End

How to do iteration

Loop input:

• list of items to repeat over

Loop body:

• one or more jobs and transforms to run repeatedly

Loop output: status table (optional)

• Can be input into next loop
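
The three parts of a loop can be sketched in a few lines of Python. This is only an illustration of the idea, not the code that Data Integration Studio generates; `run_loop`, `load_state`, and the status-table fields are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List


def run_loop(items: List[str], body: Callable[[str], int]) -> List[Dict[str, object]]:
    """Run `body` once per item and return a status table (one row per item)."""
    status_table: List[Dict[str, object]] = []
    for item in items:
        try:
            rows = body(item)
            status_table.append({"item": item, "status": "ok", "rows": rows})
        except Exception as exc:
            # Record the failure in the status table instead of stopping.
            status_table.append(
                {"item": item, "status": "error", "rows": 0, "message": str(exc)}
            )
    return status_table


def load_state(state: str) -> int:
    """Hypothetical loop body: pretend to load one state's data, return a row count."""
    if not state:
        raise ValueError("empty state code")
    return len(state) * 100


# Loop input: the list of items to repeat over.
status = run_loop(["NC", "GA", ""], load_state)

# Loop output: the status table can feed the next loop, e.g. to retry failures.
failed = [row["item"] for row in status if row["status"] == "error"]
```

Here the status table is a plain list of dictionaries; in practice it would be a table that a following loop reads as its own input.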

How to loop

Sequential

• One SAS session runs them all

Parallel

• Connect licensed

− Parallel on same machine (SMP)

• Grid Manager licensed

− Parallel on a grid

How to loop in parallel – Lots of options

1 per CPU

• Don't overload machine

Specified number

• Help prevent overload

• Can double up per CPU

Run all

• Let 'er rip!
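
As a rough sketch of how these three options differ, the Python below maps each one to a worker count and runs the loop body with that much concurrency. Threads stand in for separate SAS sessions here, and the policy names (`one_per_cpu`, `specified`, `run_all`) and both functions are hypothetical, not Data Integration Studio settings.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def worker_count(policy: str, n_items: int, specified: Optional[int] = None) -> int:
    """Translate a loop policy into a number of concurrent workers (at least 1)."""
    if policy == "one_per_cpu":
        # Never more workers than CPUs, so the machine is not overloaded.
        return max(1, min(n_items, os.cpu_count() or 1))
    if policy == "specified":
        # A fixed cap; it may exceed the CPU count to "double up" per CPU.
        if specified is None or specified < 1:
            raise ValueError("specified must be a positive integer")
        return max(1, min(n_items, specified))
    if policy == "run_all":
        # Start every iteration at once.
        return max(1, n_items)
    raise ValueError(f"unknown policy: {policy}")


def run_parallel(
    items: Sequence[str],
    body: Callable[[str], str],
    policy: str,
    specified: Optional[int] = None,
) -> List[str]:
    """Run `body` over `items` with the concurrency implied by `policy`."""
    workers = worker_count(policy, len(items), specified)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(body, items))  # results keep the input order


results = run_parallel(["atlanta", "chicago", "miami"], str.upper, "specified", 2)
```

The only thing that changes between the options is the cap on simultaneous iterations; the loop input and loop body stay the same.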

Controlling iteration with parameters

By default, tables are specific physical locations

Many things in SAS can accept macro variables

Parameters are macro-enabled ETL objects

Data Integration Studio provides user interface

Input values can be mapped to parameters

Creating Parameters

Parameter name

Macro variable

• &StateParm

Default value

• Used in many ETL/S activities

− Running a test job

− Viewing data

Parameters on objects

Property tab: add parameters for that object

Jobs can import them

• From referenced tables

• From included nested objects

Loop transform will use them

Can supply a default for testing

Some Good Examples of Parameters

In a table name

• RETAIL&StateParm

In a file path

In a library path

ODS Titles

Mapping

SQL Query

…Anywhere you want…

Real Example: Start with 1 Retail Store

1,000,000 orders

1 year =

80 MB data

Scale up the data

10,000 stores

52 billion orders

5 years =

4.2 terabytes of data
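
A quick consistency check on these figures, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), which the slides do not state: 80 MB for 1,000,000 orders works out to about 80 bytes per order, and 52 billion orders at that size comes to roughly 4.2 TB.

```python
BYTES_PER_MB = 10**6    # assumption: decimal megabytes
BYTES_PER_TB = 10**12   # assumption: decimal terabytes

# One store, one year: 1,000,000 orders in 80 MB.
bytes_per_order = 80 * BYTES_PER_MB / 1_000_000

# Scaled-up data set: 52 billion orders at the same size per order.
total_tb = 52_000_000_000 * bytes_per_order / BYTES_PER_TB

print(f"{bytes_per_order:.0f} bytes per order, {total_tb:.2f} TB in total")
```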

Run Jobs in Parallel by Looping

Loop transform

1 Atlanta Store

2 Chicago Store

3 Miami Store

4 …

Substitute Variables

1 Week1

2 Week2

3 Week3

4 …

Add parameter to Table:

Name = &week

1 OutWeek1

2 OutWeek2

3 OutWeek3

4 …

Add parameter to Table:

Name = Out&week

Existing Job: Input → Process → Load → Output
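
The substitution shown on this slide can be imitated in Python to make the mechanics concrete. This is a simplified stand-in for SAS macro variable resolution, not a reimplementation of it: it handles only `&name` references with an optional closing period, and `resolve` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

# A reference is "&" plus a name, optionally closed by a period (as in "&week.").
_MACRO_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9_]*)\.?")


def resolve(template: str, params: Dict[str, str]) -> str:
    """Replace each &name reference in `template` with its parameter value."""
    # Macro variable names are case-insensitive, so compare in lower case.
    values = {name.lower(): value for name, value in params.items()}

    def lookup(match: "re.Match[str]") -> str:
        return values[match.group(1).lower()]  # KeyError if undefined

    return _MACRO_REF.sub(lookup, template)


# One (input table, output table) pair per loop iteration.
weeks = ["Week1", "Week2", "Week3"]
pairs: List[Tuple[str, str]] = [
    (resolve("&week", {"week": w}), resolve("Out&week", {"week": w})) for w in weeks
]
```

Each iteration of the loop supplies a different value for the parameter, so the same job definition reads `Week1` and writes `OutWeek1`, then reads `Week2` and writes `OutWeek2`, and so on.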

The Results were very good …

3.22 terabytes per hour

50 GB / minute

~1 GB / sec
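
The three figures describe the same rate at different time scales. Converting the hourly figure, assuming decimal units (1 TB = 1000 GB), suggests that the per-minute and per-second values on the slide are rounded.

```python
tb_per_hour = 3.22
gb_per_tb = 1000  # assumption: decimal units

gb_per_minute = tb_per_hour * gb_per_tb / 60
gb_per_second = gb_per_minute / 60

print(f"{gb_per_minute:.1f} GB/minute, {gb_per_second:.2f} GB/second")
```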

Grid Partitions

[Figure: grid partition diagram. Labels: Data Integration Studio, Enterprise Miner, Connect Client, SAS Grid Mgr, SAS Servers (Base, Connect), DI grid, EM grid.]

Useful Loop Transform Options

Grid Partitions

Restart sessions

Log directory

Error handling

• Abort all remaining

• Abort only current

• Continue on error

• …others
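
One way to picture the three named error-handling choices is the sketch below. The semantics are an assumed reading of the option names, not taken from the product documentation: each iteration is a sequence of steps, "abort all remaining" stops the whole loop, "abort only current" abandons the failing iteration and moves on, and "continue on error" keeps running the failing iteration's later steps. `OnError`, `run_iterations`, `extract`, and `load` are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, List, Sequence

Step = Callable[[str], None]


class OnError(Enum):
    ABORT_ALL_REMAINING = 1
    ABORT_ONLY_CURRENT = 2
    CONTINUE_ON_ERROR = 3


def run_iterations(
    items: Sequence[str], steps: Sequence[Step], on_error: OnError
) -> Dict[str, str]:
    """Return a final status per item: 'ok', 'error', or 'skipped'."""
    status = {item: "skipped" for item in items}
    for item in items:
        status[item] = "ok"
        for step in steps:
            try:
                step(item)
            except Exception:
                status[item] = "error"
                if on_error is OnError.ABORT_ALL_REMAINING:
                    return status  # later items stay 'skipped'
                if on_error is OnError.ABORT_ONLY_CURRENT:
                    break  # drop this item's remaining steps
                # CONTINUE_ON_ERROR: go on to this item's next step
    return status


loaded: List[str] = []


def extract(item: str) -> None:
    """Hypothetical first step that fails for one item."""
    if item == "b":
        raise RuntimeError("bad input")


def load(item: str) -> None:
    """Hypothetical second step that records which items were loaded."""
    loaded.append(item)


result = run_iterations(["a", "b", "c"], [extract, load], OnError.ABORT_ONLY_CURRENT)
```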

Another Case Study: Census data

Running sequentially

Data from 50 states

Running on one computer at a time

About 580 minutes

(just under 10 hours)

Running in parallel

Running on six computers

About 108 minutes

(under 2 hours)

Adding more computers

Across nine computers

[Figure: Census Data - Parallel Run 13 - Execution Profile (ETL running 1 job at a time on 9 blades, 1 slot/blade). One bar per state job; x-axis: minutes, 0 to 80, showing start time and duration of each job.]

~ 77 minutes

Did we keep the computers busy?

In this case, we really did

Running 6 jobs at a time on 6 processors
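
The census timings give a simple way to quantify how busy the machines were. Using the elapsed times quoted on the slides (about 580 minutes sequentially, about 108 minutes on six computers, about 77 minutes on nine), speedup is the sequential time divided by the parallel time, and efficiency is speedup divided by the number of computers. The function name below is hypothetical, and the inputs are the slides' approximate values.

```python
from typing import Tuple


def speedup_and_efficiency(
    sequential_minutes: float, parallel_minutes: float, machines: int
) -> Tuple[float, float]:
    """Return (speedup, efficiency) for one parallel run."""
    speedup = sequential_minutes / parallel_minutes
    return speedup, speedup / machines


SEQUENTIAL_MINUTES = 580
parallel_runs = {6: 108, 9: 77}  # computers -> elapsed minutes, from the slides

summary = {
    machines: speedup_and_efficiency(SEQUENTIAL_MINUTES, minutes, machines)
    for machines, minutes in parallel_runs.items()
}

for machines, (speedup, efficiency) in summary.items():
    print(f"{machines} computers: speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
```

On these numbers the six-computer run is about 5.4 times faster than the sequential run (roughly 90% efficiency), and the nine-computer run is about 7.5 times faster (roughly 84%).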

Additional Case Studies

Unstructured Data

…See the paper for more examples

Using Grid harnesses the power of your enterprise

Questions? Want to know more?

• Achieving High Availability in a SAS® Grid Environment, Paper 001-2009

• What's New in SAS® Data Integration Studio 4.2, Paper 093-2009

• For Base SAS® Users: Welcome to SAS® Data Integration!, Paper 092-2009

• Cross Validation and Learning Curve Model Comparison with JMP® Genomics and Grid Computing, Paper 286-2009

• ISO's Evolution to BI on the Grid: A Customer Perspective, Mon, 5:30 PM, Maryland 3; Paper 269-2009

• Going from Good to Great: The Value of an Analytic Grid Platform at ISO, Tues 11:00 PM, National Harbor 12; Presentation only

• The University of Phoenix Wins Big with SAS® Grid, Tues 11:00 PM, National Harbor 5; Presentation only

Supply a Default Value for Testing

Default value: Week1

Existing Job: Input → Extract → Load → Output

Reuse existing job and run in parallel

1 Portugal

2 France

3 Spain

4 …

[Diagram: the Existing Job box repeated five times, one copy per parallel iteration]

Iteration (Looping) in Parallel

1 Portugal

2 France

3 Spain

4 …

Add parameter to Table:

Name = &Country

1 OutPortugal

2 OutFrance

3 OutSpain

4 …

Add parameter to Table:

Name = Out&Country

Existing Job: Input → Extract → Load → Output

What products can leverage SAS Grid Manager?

Optimize the Efficiency and Utilization of Computing Resources

Distributed Enterprise Scheduling

• SAS Data Integration Studio

• SAS Web Report Studio

• SAS Marketing Automation

• SAS Marketing Optimization

• Any SAS program*

*(with modification)

Multi-User Workload Balancing

• SAS Data Integration Studio

• Any SAS program*

• SAS Enterprise Guide*

• SAS Workspace Server

• SAS Stored Processes**

*(with modification) **(with limitations)

Parallel Workload Balancing

• SAS Data Integration Studio

• SAS Enterprise Miner

• SAS Risk Dimensions*

• Any SAS program*

• SAS Stored Processes**

*(with modification) **(with limitations)