Copyright © 2005, SAS Institute Inc. All rights reserved. Improving Batch Application Service Through Tuning and Parallelism Dan Squillace Mainframe Support Manager SAS Institute Cary, NC USA [email protected]



Some Business Drivers for Performance Improvement Activities

Increasing data volumes

• More customers

• More data about each customer, needed for increasingly sophisticated analytics that aid better and more timely decision-making.

Decreasing processing window

• Improve BI application availability by shortening ETL elapsed time.

Increasing pressure to reduce costs

• Lower resource requirements

• Improve competitive position


Session Overview

This session focuses on processing improvements beneficial to handling large data volumes.

Performance improvement areas

• CPU optimization

• Reducing I/O

• Improved overlap and parallelism

• Elapsed time optimization (Not the same

Focus Areas

• DATA STEP tuning

• New SAS9 features


Session Outline

Don’t forget the basics! - A Short Tuning Case Study

DATA Step Views

PROC SUMMARY w/DATA Step View

DATA Step hash table functions

SAS Scalable Performance Data Engine (SPDE)

SAS/Connect Pipes

Wrap-up


Back to Basics: High-Volume DATA Step Optimization

Before implementing parallel operations, make sure basic processing flow is efficient

When processing high volumes of data, even apparently small changes can have a large effect

The following customer case study illustrates several points.


Program processes 36 million MXG TYPE74 records (436 CPU seconds on a 9672 G6)

DATA FILE.A;
  SET INFILE1.TYPE74;

  KOUNT = 1;
  IF VOLSER = '.' OR VOLSER = ' ' THEN DELETE;
  IF SYSTEM = '888K' OR SYSTEM = '888Z' OR SYSTEM = '888Q' OR
     SYSTEM = '888V' OR SYSTEM = '888P' THEN DO;
    IF DATEPART(SYNCTIME) < '03APR04'D THEN SYNCTIME = SYNCTIME - '06:00:00.00'T;
    IF DATEPART(SYNCTIME) > '02APR04'D THEN SYNCTIME = SYNCTIME - '05:00:00.00'T;
  END;

  SYMNUM = 0;
  IF DATEPART(SYNCTIME) < '17MAY04'D THEN DO;
    IF DEVNR > 58FFX AND DEVNR < 5FFFX THEN SYMNUM = 111;
    IF DEVNR > 6FFFX AND DEVNR < 7FFFX THEN SYMNUM = 456;
    IF DEVNR > 7FFFX THEN SYMNUM = 234;
    IF DEVNR => 5000X AND DEVNR < 5200X THEN SYMNUM = 234;
    IF DEVNR => 5FFFX AND DEVNR < 7000X THEN SYMNUM = 876;
  END;

  IF DATEPART(SYNCTIME) > '17MAY04'D THEN DO;
    IF DEVNR > 4FFFX AND DEVNR < 7000X THEN SYMNUM = 223;
    IF DEVNR > 6FFFX AND DEVNR < 7FFFX THEN SYMNUM = 456;
    IF DEVNR > 7FFFX THEN SYMNUM = 234;
  END;

  TIPPCT = (IORATE * (AVGCONMS + AVGDISMS)) / 10;
  FORMAT TIPPCT 5.2;
  IF SYMNUM = 0 THEN DELETE;
  IO_1111 = 0; IO_4563 = 0; IO_234 = 0; IO_8765 = 0; IO_22355 = 0;
  IF SYMNUM = 1111 THEN IO_1111 = IORATE;
  IF SYMNUM = 4563 THEN IO_4563 = IORATE;
  IF SYMNUM = 234 THEN IO_234 = IORATE;
  IF SYMNUM = 8765 THEN IO_8765 = IORATE;
  IF SYMNUM = 22355 THEN IO_22355 = IORATE;

  DATE = DATEPART(SYNCTIME);
  FORMAT DATE DATE7.;
  INTE = TIMEPART(SYNCTIME);
  FORMAT INTE TIME19.2;

  EMCTYPE = 'ESCON';
  IF SYMNUM = 22355 THEN EMCTYPE = 'FICON';

  IF IORATE < 10 THEN DELETE;

  KEEP VOLSER DEVNR TIPPCT DATE INTE SYMNUM IO_1111 IO_4563 IO_234 IO_8765
       SYNCTIME IO_22355 EMCTYPE IORATE AVGRSPMS AVGIOQMS AVGPNDMS AVGCONMS
       AVGDISMS AVGPNCHA AVGPNCUB AVGPNDEV AVGPNDIR PCTDVCON PCTDVUSE KOUNT;


Do filtering as early as possible

  TIPPCT = (IORATE * (AVGCONMS + AVGDISMS)) / 10;
  FORMAT TIPPCT 5.2;
  IF SYMNUM = 0 THEN DELETE;
  IO_1111 = 0; IO_4563 = 0; IO_234 = 0; IO_8765 = 0; IO_22355 = 0;
  IF SYMNUM = 1111 THEN IO_1111 = IORATE;
  IF SYMNUM = 4563 THEN IO_4563 = IORATE;
  IF SYMNUM = 234 THEN IO_234 = IORATE;
  IF SYMNUM = 8765 THEN IO_8765 = IORATE;
  IF SYMNUM = 22355 THEN IO_22355 = IORATE;

  DATE = DATEPART(SYNCTIME);
  FORMAT DATE DATE7.;
  INTE = TIMEPART(SYNCTIME);
  FORMAT INTE TIME19.2;

  EMCTYPE = 'ESCON';
  IF SYMNUM = 22355 THEN EMCTYPE = 'FICON';

  IF IORATE < 10 THEN DELETE;
  KEEP VOLSER DEVNR TIPPCT DATE INTE SYMNUM IO_1111 IO_4563 IO_234 IO_8765
       SYNCTIME IO_22355 EMCTYPE IORATE AVGRSPMS AVGIOQMS AVGPNDMS AVGCONMS
       AVGDISMS AVGPNCHA AVGPNCUB AVGPNDEV AVGPNDIR PCTDVCON PCTDVUSE KOUNT;

Move the filtering (DELETE) statements to the top of the DATA step

CPU Time reduction

67%
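
The reordering can be sketched on a small scale. This is an illustration only, not the customer's revised program: WORK.TYPE74 is a made-up stand-in for the MXG data, its values are invented, and only the variables needed for the point are present. The rejection tests run first, so deleted rows never reach the arithmetic below them.

```sas
/* Hypothetical stand-in for INFILE1.TYPE74; all values are invented. */
data work.type74;
  input VOLSER $ IORATE AVGCONMS AVGDISMS;
  datalines;
VOL001 25 1.2 0.8
VOL002 3 0.9 0.4
VOL003 12 2.0 1.5
;
run;

data work.a;
  set work.type74;

  /* Cheap rejection tests first: a deleted row skips everything below. */
  if IORATE < 10 then delete;
  if VOLSER = '.' or VOLSER = ' ' then delete;

  /* Only surviving rows (VOL001 and VOL003 here) pay for this work. */
  TIPPCT = (IORATE * (AVGCONMS + AVGDISMS)) / 10;
  format TIPPCT 5.2;
run;
```

A test that depends on a derived value, such as IF SYMNUM = 0 THEN DELETE, can move up only as far as the point where that value has been assigned.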


Additional Steps

Specify KEEP= as a data set option on the input data set to bring fewer variables into the DATA step. Note: This decreases CPU time, but not I/O time.

Use IF-THEN-ELSE or SELECT instead of just IF-THEN.

Eliminate redundant DATEPART function calls.

Cumulative CPU time reduction:

80%
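
A minimal sketch of the three changes together, under stated assumptions: WORK.TYPE74 and its values (including the unused variable EXTRA) are invented, and only the post-17MAY04 device ranges from the case study are shown. The range tests are the original conditions, chained with ELSE so that testing stops at the first match.

```sas
/* Hypothetical stand-in for INFILE1.TYPE74; all values are invented. */
data work.type74;
  length VOLSER $6;
  format SYNCTIME datetime19.;
  VOLSER = 'VOL001'; DEVNR = 5100X; IORATE = 25;
  SYNCTIME = '20MAY04:10:00:00'DT; EXTRA = 1; output;
  VOLSER = 'VOL002'; DEVNR = 7100X; IORATE = 12;
  SYNCTIME = '21MAY04:11:30:00'DT; EXTRA = 2; output;
run;

data work.a;
  /* KEEP= on the input: EXTRA never enters the DATA step. */
  set work.type74(keep=VOLSER DEVNR IORATE SYNCTIME);

  /* Call DATEPART once and reuse the result. */
  DATE = datepart(SYNCTIME);
  format DATE date7.;

  SYMNUM = 0;
  if DATE > '17MAY04'D then do;
    /* ELSE skips the remaining tests once one range has matched. */
    if DEVNR > 7FFFX then SYMNUM = 234;
    else if DEVNR > 6FFFX and DEVNR < 7FFFX then SYMNUM = 456;
    else if DEVNR > 4FFFX and DEVNR < 7000X then SYMNUM = 223;
  end;
  if SYMNUM = 0 then delete;
run;
```

The ELSE chain is safe here because the three ranges do not overlap; with overlapping ranges, plain IF-THEN and IF-THEN-ELSE can give different results.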


Final Step

Move the filtering of blank VOLSER and IORATE < 10 to a WHERE= data set option.

Total cumulative CPU time reduction:

86%

Net savings of 368 CPU seconds
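
As a sketch of this last step (again on an invented WORK.TYPE74, not the customer's data), the two DELETE tests become a WHERE= option on the input data set, so rejected rows are filtered before they are brought into the DATA step at all.

```sas
/* Hypothetical stand-in for INFILE1.TYPE74; all values are invented. */
data work.type74;
  length VOLSER $6;
  VOLSER = 'VOL001'; IORATE = 25; AVGCONMS = 1.2; AVGDISMS = 0.8; output;
  VOLSER = 'VOL002'; IORATE = 3;  AVGCONMS = 0.9; AVGDISMS = 0.4; output;
  VOLSER = ' ';      IORATE = 40; AVGCONMS = 1.0; AVGDISMS = 1.0; output;
run;

data work.a;
  /* WHERE= keeps a row only if it passes both tests. */
  set work.type74(where=(IORATE >= 10 and VOLSER not in ('.', ' ')));
  TIPPCT = (IORATE * (AVGCONMS + AVGDISMS)) / 10;
  format TIPPCT 5.2;
run;
```

IORATE >= 10 is the complement of IF IORATE < 10 THEN DELETE, including for missing values, which compare low in both forms and are therefore dropped either way.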


The Value of CPU Time Reduction

Always important on the mainframe because it is inherently a multi-workload beast.

Often considered unimportant (or less so anyway) on Windows and UNIX platforms because of dedicated nature of systems. Elapsed time is often more important.

Changing with increasing use of server virtualization. Affects how many virtual servers can run on a physical platform.

• Logical Partitions or Domains on UNIX systems

• Virtual Machines on Windows and Linux systems


Some General Strategies for Improving Processing of Large Data Volumes

Reduce volume of data passed (e.g. keep only required variables in intermediate files)

Reduce number of data passes

Eliminate or reduce use of non-linearly scalable techniques such as sorting.

Exploit memory

Exploit processing overlap and parallelism


Exploiting New SAS Features

We’ll use two scenarios from common processing challenges encountered when processing transaction data for performance and service level reporting purposes.

The improvements made to the processing strategy for these scenarios:

• Reduce number of data passes

• Eliminate or reduce use of non-linearly scalable techniques such as sorting.

• Exploit memory

• Exploit processing overlap and parallelism


General Scenario Characteristics

Very high data volumes (millions of records, tens or hundreds of gigabytes)

Multiple summarizations desired

Detail records retained only for exceptional cases.


Scenario One

High-volume transaction data, say from web log, CICS, DB2, SAP

Want a summarized file for service level management, accounting, performance and capacity management.

Not interested in keeping every detail transaction record.


DATA Step Views

Can be used to eliminate data passes

Runs two tasks in parallel, but does not multi-process

In this case, eliminates one pass of the data.

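data lib.a / view=lib.a;
  infile ……;
  input x ……;
run;

/* A view cannot be sorted in place, so OUT= is added here; lib.sorted is a hypothetical name. */
proc sort data=lib.a out=lib.sorted;
  by x;
run;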


SAS DATA Step View caveats

Can inhibit use of indexed I/O; Data Set Option WHERE clause cannot use index with a DATA Step view.

DATA Step views are sensitive not only to SAS release and version levels, but sometimes to maintenance levels.


DATA Step Views with Proc Summary

Eliminates data passes and saves disk space.

Eliminates sort

Can produce multiple summarization data sets in one pass

Benefits from large region size (enough to hold crossings)

SUMMARY in SAS 9.1

• Multithreaded

• Does not keep n-way in memory unless needed.

data lib.a / view=lib.a;
  infile ……;
  input a b x y ……;
run;

proc summary data=lib.a;
  CLASS statement;
  TYPES statement;
  OUTPUT statement(s);
run;
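
To make the placeholders above concrete, here is one possible shape for the view plus PROC SUMMARY combination. It is a sketch under assumptions: the fileref RAW, the variable names (region, appl, resp, cpu), the output data set names, and the sample records are all invented for illustration. The view reads the raw file only when PROC SUMMARY asks for rows, and a single pass feeds three summary data sets with no sort.

```sas
/* Write a few invented raw records to a temporary file. */
filename raw temp;
data _null_;
  file raw;
  put 'EAST CICS 0.12 0.03';
  put 'EAST DB2 0.40 0.10';
  put 'WEST CICS 0.20 0.05';
  put 'WEST CICS 0.30 0.07';
run;

/* The view stores the program, not the data. */
data work.trans / view=work.trans;
  infile raw;
  input region $ appl $ resp cpu;
run;

/* One pass, no sort, three summary data sets. */
proc summary data=work.trans;
  class region appl;
  types region appl region*appl;
  var resp cpu;
  /* With CLASS region appl: _TYPE_=2 is region, 1 is appl, 3 is region*appl. */
  output out=work.by_region(where=(_type_=2)) sum(cpu)=cpu_total mean(resp)=resp_avg;
  output out=work.by_appl(where=(_type_=1))   sum(cpu)=cpu_total mean(resp)=resp_avg;
  output out=work.by_both(where=(_type_=3))   sum(cpu)=cpu_total mean(resp)=resp_avg;
run;
```

The TYPES statement limits the work to the crossings that are actually wanted, which matters when region size has to hold them in memory.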


SAS9 Threaded Procedures

SORT

SUMMARY/MEANS

TABULATE

REPORT

SQL

REG, GLM, LOESS, DMREG, DMINE


Scenario Two

High-volume, time-oriented event data (e.g. ARM log)

Transactions must be constructed from multiple event records

• Type S – transaction start (ID, start time, code)

• Type E – transaction end (ID, end time, CPU time)


Data arrival pattern

Start 1

Start 2

End 1 (write out 1)

Start 3

End 2 (write out 2)

Start 4

Start 5

End 4 (write out 4)

End 5 (write out 5)

End 3 (write out 3)


DATA Step Hash Table Support (New in SAS9)

Can replace lookup formats

Can have entries dynamically added, modified, and removed

For this Scenario, use a Hash Table to accumulate transaction records from start and end events.


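data transactions / view=transactions;
  infile ……;

  /* Create the hash table once, not on every iteration. */
  if _n_ = 1 then do;
    declare hash transactions();
    transactions.defineKey("tr_id");
    transactions.defineData("tr_start", "tr_code");
    transactions.defineDone();
  end;

  input type $ @;
  if type = 'S' then do;
    /* Start event: remember start time and code under the transaction ID. */
    input tr_id tr_code tr_start;
    rc = transactions.add();
  end;
  else if type = 'E' then do;
    /* End event: retrieve the matching start, write the transaction, free the entry. */
    input tr_id tr_end tr_cpu;
    rc = transactions.find();
    response = tr_end - tr_start;
    output;
    rc = transactions.remove();
  end;
run;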
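
For experimenting with the technique, a self-contained variant can read in-stream records that follow the arrival pattern shown earlier (starts 1 to 5, ends in the order 1, 2, 4, 5, 3). This is a sketch, not the presenter's program: the hash object name open_tr, the character transaction codes, the timestamps, and the CPU values are invented, and the output is an ordinary data set rather than a view.

```sas
data work.transactions(keep=tr_id tr_code tr_start tr_end tr_cpu response);
  length type $1 tr_id 8 tr_code $4 tr_start tr_end tr_cpu response 8;

  /* Build the hash table once; it holds transactions that are still open. */
  if _n_ = 1 then do;
    declare hash open_tr();
    open_tr.defineKey("tr_id");
    open_tr.defineData("tr_start", "tr_code");
    open_tr.defineDone();
  end;

  input type $ @;
  if type = 'S' then do;
    input tr_id tr_code $ tr_start;
    rc = open_tr.add();
  end;
  else if type = 'E' then do;
    input tr_id tr_end tr_cpu;
    /* Write a transaction only if its start event was seen. */
    if open_tr.find() = 0 then do;
      response = tr_end - tr_start;
      output;
      rc = open_tr.remove();
    end;
  end;
  datalines;
S 1 ORDR 100
S 2 PAYM 105
E 1 130 0.40
S 3 ORDR 140
E 2 150 0.25
S 4 PAYM 160
S 5 ORDR 165
E 4 170 0.10
E 5 180 0.30
E 3 200 0.90
;
run;
```

Testing the return code of find() is a small design choice added here: an end record with no matching start is skipped instead of producing a row with a missing response time. Because each entry is removed once its transaction is written, memory use tracks the number of transactions in flight, not the size of the log.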


The Scalable Performance Data Engine (SPDE)

New in SAS 9.1

Included with BASE

Available on all 9.1 platforms

Advantages

• Parallel data loading and index creation

• Parallel reads and searches

• Uses multiple indices to resolve a search


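SPDE – Scalable Performance Data Engine

[Diagram: the SAS System Scalable Performance Data Engine stores a data set as separate metadata, data, and index components. The data component is split into partitions (data1, data2, data3, data4); the index component is a hybrid bitmap/B-tree index.]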


SAS SPDE implementation on z/OS

USS thread services

USS directory-based file systems

• zFS

• HFS

• NFS file systems

Exploitation

• Define file system

• Change LIBNAME engine specification
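
The LIBNAME change might look like the following sketch. The libref PERF and every directory path are hypothetical, and the directories must already exist in the USS file system before the statement runs. DATAPATH= and INDEXPATH= place the data partitions and index files in separate locations; if they are omitted, everything goes to the primary path.

```sas
/* Before: default Base engine (hypothetical path). */
/* libname perf '/u/sasdata/perf'; */

/* After: same libref, SPDE engine, components spread over several directories. */
libname perf spde '/u/sasdata/perf/meta'
        datapath=('/u/sasdata/perf/data1' '/u/sasdata/perf/data2')
        indexpath=('/u/sasdata/perf/index');
```

Programs that reference the libref need no other change, which is why the slide lists only the file system definition and the engine specification.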


SPDE data set allocation on z/OS

NFS – follow same guidelines as for Open Systems

HFS – Use separate HFS file systems for DATA and INDEX components; perhaps multiple for DATA. Spread HFS’s across Shark (ESS 2105) loops.

zFS - No special considerations! Use multi-volume zFS particularly if

• Storage system has Parallel Access Volumes (PAV)

• ESS 2105-800 has Arrays Across Loops feature


Scalability – SAS 9.1: SAS Scalable Architecture in SAS Foundation

[Diagram: components of the scalable architecture are Scalable SAS/ACCESS (Oracle, DB2, Sybase, Teradata), Scalable Performance Data Access, Threaded Procedures (thread 1, thread 2, … thread N), and SAS/CONNECT sessions running on CPU 1, CPU 2, and a remote host, linked to SAS by piping.]


MP Connect Pipes

New in SAS9

Uses TCP/IP socket engine

Superior to DATA Step View approach

Provides true multi-processing


/* ----- DATA STEP - PROCESS P1 ------ */

SIGNON P1 SASCMD='!SASCMD';
RSUBMIT P1 WAIT=NO;
  LIBNAME OUTLIB SASESOCK ":PIPE1";

  data outlib.transactions;
    infile ……;

    /* Create the hash table once, not on every iteration. */
    if _n_ = 1 then do;
      declare hash transactions();
      transactions.defineKey("tr_id");
      transactions.defineData("tr_start", "tr_code");
      transactions.defineDone();
    end;

    input type $ @;
    if type = 'S' then do;
      input tr_id tr_code tr_start;
      rc = transactions.add();
    end;
    else if type = 'E' then do;
      input tr_id tr_end tr_cpu;
      rc = transactions.find();
      response = tr_end - tr_start;
      output;
      rc = transactions.remove();
    end;
  run;
ENDRSUBMIT;

/* ---- SUMMARY - PROCESS P2 ----- */

SIGNON P2 SASCMD='!SASCMD';
RSUBMIT P2 WAIT=NO;
  LIBNAME INLIB SASESOCK ":PIPE1";

  /* Reads rows from the pipe while P1 is still writing them. */
  proc summary data=inlib.transactions;
    CLASS statement;
    TYPES statement;
    OUTPUT statement(s);
  run;

  PROC PRINT; RUN;
ENDRSUBMIT;

WAITFOR _ALL_ P1 P2;


In Summary

Remember the importance of basic SAS program tuning skills which have been well-known for years.

Take advantage of the significant SAS9 features which can help you

• Improve response and turnaround times

• Improve availability times for BI applications by shortening the batch window.

• Reduce costs by cutting resource consumption and utilizing the most effective combination of CPU, memory, and I/O resources
