© 2012 IBM Corporation z/OS New Year's resolutions for Saving CPU Cycles & Improving I/O Utilization February 12, 2013


Session Agenda

DB2 and IMS Database and Storage Integration Overview

DB2 and IMS System Level Backup Methodologies and Storage System Integration

DB2 and IMS Backups Using Storage-Based Fast Replication

Exposing DFSMShsm Resource Utilization

Optimizing Your Batch Window


Database and Storage Administration Trends and Directions

Large IMS and DB2 systems require high availability
– Fast and non-intrusive backup and cloning facilities are required
– Fast recovery capabilities minimize downtime and promote high availability
– Most backup, recovery and cloning solutions do not leverage storage-based fast-replication facilities

Storage-based fast-replication facilities are under-utilized
– Tend to be used by storage organizations
– Tend not to be used by database administrators (DBAs)

Storage-aware database products allow DBAs to use fast-replication in a safe and transparent manner
– Provide fast and non-intrusive backup and cloning operations
– Simplify recovery operations and reduce recovery time
– Simplify disaster recovery procedures


Database and Storage Integration

• Organizational Integration
• New Backup Methods
• New Recovery Strategies
• Business Recovery Monitoring
• Cloning Automation
• Disaster Restart Solutions

[Diagram: storage-aware database tools connect the Application and Database Management domain (mainframe database systems) with the Storage Administration and Business Continuity domain; the source database is copied for backup, clone and DR]


Host Based Data Copy Options

Volume copy options
– DFSMSdss (IBM)
– FDR (Innovation Data Processing)
– TDMF (IBM)
– FDRPAS (Innovation Data Processing)

Data set copy options
– DFSMSdss (IBM)
– FDR (Innovation Data Processing)

Data copy processes use host-based CPU and I/O facilities

More costly and slower than storage-based fast replication

[Diagram: host-based copy process]


What is Storage-based Fast Replication?

An instant copy of a volume/data set at a specific point in time

– Builds a bitmap to describe the source volume
– After the bitmap has been created, the source and target volume data can be used immediately

Data movement (CPU and I/O) offloaded to storage processor
– Frees up resources on host processor
– No host CPU or I/O costs

For volume replication a relationship is established between a source and a target
– Geometrically similar devices

Consistency Groups
– Group of volumes copied at exactly the same point in time while maintaining the order of dependent writes

[Diagram: storage processor-based copy process]


Advantages Using Storage-Based Fast Replication

Fast
• Copies data instantaneously

Provides high availability
• Provides a consistent copy of production without sacrificing availability
• Allows clones or recoveries to be available quicker

Provides huge cost savings
• Doesn’t use host CPU or I/O resources
  – Copy process is done in the storage processor
    > Save CPU and I/O costs
• Save personnel time



Database System Level Backup Overview

A Backup or Clone of the entire DB2 or IMS environment at a point in time

– Recorded in metadata repositories

Leverages storage-based fast replication to drive the volume backup

– Backup instantly - performed in seconds
– Offloading data copy process to the storage processor saves CPU and I/O resources
– Faster than data set copies

Backup DB2 and IMS without affecting applications
– Backup windows reduced by replacing image copies
– Extends processing windows

Data consistency ensures data is dependent-write consistent
– DB2 suspend, IMS suspend
– Storage-based consistency functions
– Equivalent to a power failure


[Diagram: a storage-aware DB2 and IMS backup uses the storage processor APIs to copy the source DB2 or IMS volumes to target volumes, producing the DB2 or IMS system backup]


Database System Level Backup Overview

Backup validation each time ensures successful recoveries
– Insurance that a backup is available

Automated backup offload (archive/recall)
– Copies system backup from fast replication disk to tape for use at either local or disaster site (or both)

Can be used in combination with image copies

[Diagram: storage-aware backup and recovery uses the storage processor APIs to create the system level backup (SLB) from the source DB2 or IMS database volumes, then offloads the system backup to tape]


Benefits of SLB over Image Copies and Change Accums

Creating SLB with Fast Replication is equivalent to:
– Creating all Image Copies with < 1 second of IMS or DB2 unavailable time
– SLB created using storage processor CPU (not Host CPU)
– Significant CPU cost savings

Guaranteed Recoverability
– Validation of IMS and DB2 configuration each time SLB is created

Fast restore with Parallel Log Apply
– Reduces recovery time and complexity
– Executes the restore in parallel with the log apply

Change Accumulations may not be needed
– System Level Backups can be created frequently
– Save host CPU and I/O

Significantly reduce costs by using less CPU and I/O resources
– Reduce costs to create backups
– Save cost by reducing number of image copies needed


SLB Disaster Recovery Benefits

Simplifies disaster recovery operations
– System level backup for restart
– System level backup and roll forward

Taking full volume dumps for disaster recovery?
– System level backups add automation and a metadata repository
  • Can now use the backup for multiple purposes

Basis for tape-based DB2 and IMS coordinated recovery
– Restore IMS and DB2 systems back to a transactionally consistent point, which is the backup time or the end of the last common log

[Graphic: Simplify, Automate, Coordinate]


Integrating SLBs into Recovery Using an Intelligent Recovery Manager

Recovers application, individual database, or indexes
– Using Current, Timestamp, or PITR

Application profile is created in advance
– Single database or group of databases
– Logically related databases and indexes can be included automatically

Determines best recovery method
– Restores from either IC or SLB
– Indexes that cannot be restored are rebuilt
– Recovery using log apply needs one pass of the logs
– Access to DBs is automatically stopped and restarted at end of recovery

Storage-based fast-replication performs restore
– Performs an instantaneous data set restore process


Customer Experience Using SLB Resource Assessment Tool


Customer Experience

EXCP Consumption for Image Copies over 28 day period

– Top 5 systems

IMS System   EXCPs
1. IMS1      573,323,342
2. IMS2      549,197,344
3. IMS3      547,836,773
4. IMS4      446,749,090
5. IMS5      263,317,210

DB2 System   EXCPs
1. DB21       88,390,971
2. DB22       85,007,495
3. DB23       78,792,982
4. DB24       53,788,217
5. DB25       34,337,687


Customer Experience

Minimizing EXCP Consumption
– Product using Fast Replication Technologies
– Offloads the backup processing
  • From the CPU to the Storage Processor
– Reducing the number of EXCPs results in:
  • CPU reduction
  • Shorter elapsed time to execute
  • Freed-up resources for other business processing

[Chart: EXCPs consumed today vs. estimated EXCPs using SLBs, for IMS and DB2]


Customer Experience

Backup Processing

– 9 IMS systems
  • More than 60 hours of elapsed time running Image Copy backups

– 15 DB2 systems
  • More than 57 hours of elapsed time running Image Copy backups


Financials

Projected Image Copy vs. SLB Cost Savings for IMS

SECTION A - Monthly Image Copy Costs

CPU and I/O Cost:
Total Image Copy CPU seconds                                      429,536
Total Image Copy EXCPs                                      3,316,752,222
Total CPU costs for Image Copies                            $   50,685.21
Total EXCP cost for Image Copies                            $  132,670.09
Total CPU and EXCP costs for Image Copies                   $  183,355.30
Total annual cost of image copies                           $ 2,200,263.64

SECTION B - System Level Backup

CPU and EXCP Cost:
Per volume CPU seconds (default 0.023)                              0.023
Per volume EXCP (default 155)                                         155
Total CPU costs for specified number of volumes             $       19.74
Total EXCP costs for specified number of volumes            $       45.09
Total CPU and EXCP costs                                    $       64.82

Total Cost:
System level backups per day - 1 per day / per system                   9
Weekly system level backup cost                             $    4,083.82
Yearly system level backup cost                             $  212,358.86
Total annual cost of system level backups                   $  212,358.86

Note: Costs of CPU and EXCPs are agreed upon by Rocket Software and the customer. Default values were used for the purpose of this assessment. The CPU cost used is $0.118 per second and the EXCP cost used is $0.04 per 1,000 EXCPs.
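The Section A arithmetic can be reproduced from the two rates in the note. The following is a minimal sketch, not the assessment tool itself: the function and constant names are hypothetical, and the results differ from the slide by a few cents, presumably because the slide's inputs are rounded.

```python
# Rates quoted in the note on this slide (defaults used for the assessment).
CPU_COST_PER_SECOND = 0.118      # dollars per CPU second
EXCP_COST_PER_THOUSAND = 0.04    # dollars per 1,000 EXCPs


def resource_cost(cpu_seconds: float, excps: float) -> float:
    """Return the dollar cost of the given CPU seconds and EXCP count."""
    cpu_cost = cpu_seconds * CPU_COST_PER_SECOND
    excp_cost = excps / 1000 * EXCP_COST_PER_THOUSAND
    return cpu_cost + excp_cost


# Section A inputs for IMS, taken from the slide.
monthly_image_copy_cost = resource_cost(429_536, 3_316_752_222)
annual_image_copy_cost = monthly_image_copy_cost * 12

# Section B: cost of backing up a single volume, using the slide's defaults.
slb_cost_per_volume = resource_cost(0.023, 155)

print(f"Monthly image copy cost: ${monthly_image_copy_cost:,.2f}")
print(f"Annual image copy cost:  ${annual_image_copy_cost:,.2f}")
print(f"SLB cost per volume:     ${slb_cost_per_volume:.6f}")
```

The per-volume figure makes the contrast visible: a system level backup costs a fraction of a cent per volume at these rates, while the image copy cost scales with every EXCP issued by the host.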


Financials

Projected Image Copy vs. SLB Cost Savings for DB2

Note: Costs of CPU and EXCPs are agreed upon by Rocket Software and the customer. Default values were used for the purpose of this assessment. The CPU cost used is $0.118 per second and the EXCP cost used is $0.04 per 1,000 EXCPs.

SECTION A - Monthly Image Copy Costs

CPU and I/O Cost:
Total Image Copy CPU seconds (includes DBM1 Address Space Work)         94,640
Total Image Copy EXCPs (includes DBM1 Address Space Work)        1,244,865,263
Total CPU costs for Image Copies                                 $   11,167.47
Total EXCP cost for Image Copies                                 $   49,794.61
Total CPU and EXCP costs for Image Copies                        $   60,962.08
Total annual cost of image copies                                $  731,544.92

SECTION B - System Level Backup

CPU and EXCP Cost:
Per volume CPU seconds (default 0.023) *                                 0.023
Per volume EXCP (default 155) *                                            155
Total CPU costs for specified number of volumes                  $        7.11
Total EXCP costs for specified number of volumes                 $       16.24
Total CPU and EXCP costs                                         $       23.35

* Includes the CPU or EXCPs from the system level backup and the DB2 address space for the system level backup operation, from testing performed at Rocket.

Total Cost:
System level backups per day - 1 per day / per system                       15
Weekly system level backup cost                                  $    2,451.31
Yearly system level backup cost                                  $  127,467.88
Total annual cost of system level backups                        $  127,467.88


Financial Summary

Projected Image Copy vs. SLB Cost Savings Summary

– IMS

System level backup Versus Image Copy Savings
Estimated annual cost of image copies                                                               $ 2,200,263.64
Estimated savings by replacing 95% of image copies with system level backups                        $ 2,090,250.46
Estimated annual cost of image copies (retain 5% of image copies) when using system level backups   $   110,013.18
Estimated annual cost using system level backup                                                     $   212,358.86
Estimated annual cost using system level backup with remaining (5%) image copies                    $   322,372.05
Total estimated annual savings using system level backups                                           $ 1,877,891.59

– DB2

System level backup Versus Image Copy Savings
Estimated annual cost of image copies                                                               $   731,544.92
Estimated savings by replacing 95% of image copies with system level backups                        $   694,967.67
Estimated annual cost of image copies (retain 5% of image copies) when using system level backups   $    36,577.25
Estimated annual cost using system level backup                                                     $   127,467.88
Estimated annual cost using system level backup with remaining (5%) image copies                    $   164,045.13
Total estimated annual savings using system level backups                                           $   567,499.79
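The summary figures follow from three inputs: the annual image copy cost, the annual system level backup cost, and the 5% of image copies that are retained. A minimal sketch of that calculation, with a hypothetical function name and the slide's dollar amounts as inputs:

```python
def slb_savings(annual_image_copy_cost: float,
                annual_slb_cost: float,
                retained_fraction: float = 0.05) -> dict:
    """Estimate annual savings when most image copies are replaced by SLBs.

    retained_fraction is the share of image copies still taken (5% on the slide).
    """
    retained_cost = annual_image_copy_cost * retained_fraction
    combined_cost = annual_slb_cost + retained_cost
    return {
        "retained_image_copy_cost": retained_cost,
        "combined_cost": combined_cost,
        "net_savings": annual_image_copy_cost - combined_cost,
    }


ims = slb_savings(2_200_263.64, 212_358.86)
db2 = slb_savings(731_544.92, 127_467.88)

for name, result in (("IMS", ims), ("DB2", db2)):
    print(f"{name}: combined cost ${result['combined_cost']:,.2f}, "
          f"net savings ${result['net_savings']:,.2f}")
```

The results agree with the slide to within a cent of rounding. Changing `retained_fraction` shows how sensitive the savings are to the number of image copies kept.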



Exposing DFSMShsm Resource Utilization and Associated Costs


A Look Inside Your DFSMShsm

Costs
– What resources are used by DFSMShsm to perform scheduled and requested work?
– Migration / Recall / Backup / Recycle
– Successful vs. Unsuccessful (failures)
– Data duplication

Efficiency
– Where can performance and configuration tuning help?
– Reduce failed migrations, reduce backup failures

Savings
– Can the reclaimed resource savings save CPU?
– Lost tapes, questionable old DFSMShsm data, failed cycles, thrashing, etc.



DFSMShsm Migration Failures

Data that won’t migrate
– HSM attempts to migrate the data sets every day, using both CPU and I/O until the process fails
– This can go on every day for months, even years, because the administrator is not aware that it’s failing
– These data sets remain on disk, occupying space that should have been released for new allocations

Why won’t the data migrate?
– Structural errors
– Not enough space on ML1
– Unknown DSORG or otherwise not manageable by HSM


Migration Failure Error Summary Example

RC   Count   Message
05   9202    NO MIGRATION VOLUME AVAILABLE
06     15    DUPLICATE DSN IN MCDS
16      8    PRIMARY COPY READ ERROR
19    447    DATA SET IN USE
24      4    DATA SET NOT AVAILABLE FOR MIGRATION
37   1344    NO SPACE ON MIGRATION VOLUME
39      1    RACF PROCESSING ERROR
58     13    MIGRATION OR DBA DBU FAILED
82     24    TAPE MIGRATION UNSUPPORTED
99   3296    UNSUPPORTED DS
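A summary like this is most useful when ranked, because a handful of return codes usually account for most of the wasted attempts. The sketch below uses the counts from the table; the dictionary layout and variable names are assumptions for illustration, not the format of any DFSMShsm report.

```python
# Return code -> (failure count, message), copied from the summary table.
failures = {
    "05": (9202, "NO MIGRATION VOLUME AVAILABLE"),
    "06": (15, "DUPLICATE DSN IN MCDS"),
    "16": (8, "PRIMARY COPY READ ERROR"),
    "19": (447, "DATA SET IN USE"),
    "24": (4, "DATA SET NOT AVAILABLE FOR MIGRATION"),
    "37": (1344, "NO SPACE ON MIGRATION VOLUME"),
    "39": (1, "RACF PROCESSING ERROR"),
    "58": (13, "MIGRATION OR DBA DBU FAILED"),
    "82": (24, "TAPE MIGRATION UNSUPPORTED"),
    "99": (3296, "UNSUPPORTED DS"),
}

total_failures = sum(count for count, _ in failures.values())

# Rank return codes by failure count, highest first.
ranked = sorted(failures.items(), key=lambda item: item[1][0], reverse=True)

for rc, (count, message) in ranked[:3]:
    share = 100 * count / total_failures
    print(f"RC {rc}: {count:>5} failures ({share:.1f}%) {message}")
```

In this example the top three return codes (05, 99 and 37) cover most of the failures, so fixing migration volume availability and space would remove the bulk of the repeated attempts.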



DFSMShsm Backup Failures

Data that Fails Backup
– HSM attempts to back up the data every day
– Sometimes this goes on every day for months, even years, because the administrator is not aware that it’s failing
– These data sets may rely on HSM for backup

Why can’t the data be backed up?
– Data sets are in use during backup
– Unknown DSORG or otherwise not manageable by HSM
– Errors in the data set

Does this data really need to be backed up by HSM?
– Are multiple backups of the data occurring?
– Which backup is the right backup?


DFSMShsm Recall Failures

Data that Fails Recall
– HSM must move the data from ML1 or ML2 storage back to primary DASD
– When a recall request fails, the requesting application may fail as well, causing an outage
– If the data set cannot be recalled and no backup copy exists, application disruption may occur until the situation is resolved

Why do data recalls fail?
– Data sets are not migrated, not managed by HSM
– Users issue multiple recalls for the same data set
– Tape volume not available


DFSMShsm Data Thrashing Analysis

Data Sets that are Thrashing
– Thrashing is data that is migrated and recalled, migrated and recalled, migrated and recalled in a short period of time
  • These data sets are typically production GDGs that are created earlier in the month and then used again in weekly or monthly processing

Thrashing Costs in Terms of CPU
– HSM uses CPU and I/O to migrate and recall data, compressing and decompressing data from ML1
  • The compression/decompression is all CPU
  • Data migrated to ML2 uses both CPU and I/O – the data may be compressed by the hardware but not by DFSMShsm
  • If data on ML2 is being recalled from physical tape, this typically takes longer (wall-clock time) than ML1
  • A high number of recalls can place a burden on a virtual tape subsystem since the data has been de-staged to physical tape and must be re-staged into the cache
– Executing jobs (or TSO sessions) wait for recalls
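One way to spot thrashing is to count how often each data set is recalled within a short window. The sketch below is illustrative only: the event tuples, data set names, 30-day window and three-recall threshold are all assumptions, and it treats each recall as evidence of a prior migration rather than reading any real DFSMShsm log format.

```python
from collections import defaultdict
from datetime import date, timedelta


def find_thrashing(events, window_days=30, min_recalls=3):
    """Return data set names recalled at least min_recalls times in one window.

    events: iterable of (data_set_name, day, action) tuples, where action is
    "MIGRATE" or "RECALL". This layout is hypothetical.
    """
    recall_days = defaultdict(list)
    for name, day, action in events:
        if action == "RECALL":
            recall_days[name].append(day)

    window = timedelta(days=window_days)
    thrashing = set()
    for name, days in recall_days.items():
        days.sort()
        # Compare each recall with the one min_recalls - 1 positions later.
        for i in range(len(days) - min_recalls + 1):
            if days[i + min_recalls - 1] - days[i] <= window:
                thrashing.add(name)
                break
    return thrashing


# Hypothetical activity: one GDG recalled weekly, one data set recalled rarely.
sample_events = [
    ("PROD.SALES.G0001V00", date(2013, 1, 5), "RECALL"),
    ("PROD.SALES.G0001V00", date(2013, 1, 8), "MIGRATE"),
    ("PROD.SALES.G0001V00", date(2013, 1, 12), "RECALL"),
    ("PROD.SALES.G0001V00", date(2013, 1, 15), "MIGRATE"),
    ("PROD.SALES.G0001V00", date(2013, 1, 19), "RECALL"),
    ("TEST.ARCHIVE.DATA", date(2013, 1, 5), "RECALL"),
    ("TEST.ARCHIVE.DATA", date(2013, 6, 5), "RECALL"),
]

thrashing_data_sets = find_thrashing(sample_events)
print(sorted(thrashing_data_sets))
```

Data sets flagged this way are candidates for a longer residency on primary DASD, so that the CPU and I/O spent on repeated migration and recall is avoided.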


Managing Aged (Unreferenced) Data in DFSMShsm



Retaining Data in DFSMShsm

DFSMShsm is a Life Cycle Management System
– It makes perfect sense to retain data in DFSMShsm until it expires; that is why we have DFSMShsm!
– However, there is a substantial cost associated with it!

Where are the costs?
– In daily RECYCLE
– In the daily backup of the DFSMShsm control data sets
– In duplicating ML2 tapes and/or a mirrored virtual tape subsystem
– In moving the data every 3 years or so to refresh storage media


Cost of Managing Aged Data in DFSMShsm

Managing inactive data in DFSMShsm for long periods of time has a cost in terms of daily CPU, I/O and storage resources

Inactive data is data that is 2 years old or older and has not been recalled (used) in 1 year or more
– CPU and I/O to RECYCLE the tapes (recycle typically runs daily)
  • Recycle is the act of deleting expired data and moving non-expired data to another tape
– Data Storage Costs
  • DFSMShsm data is typically stored on DASD, physical tape or virtual tape
– DFSMShsm Backup and Reorganization Costs
  • Every migrated data set has at least 2 CDS records; 3 if VSAM
  • Every data set that is backed up has at least 2 CDS records plus 1 MCC record for each backup copy
  • DFSMShsm Control Data Sets are backed up daily; catalogs are backed up multiple times per day
– Duplication of ML2 and Backup Data
  • The cost of storing inactive data in DFSMShsm is further exacerbated by the duplication of this data for DR purposes
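The record counts above give a lower bound on how many control data set records DFSMShsm carries for a given population of data sets. A minimal sketch of that estimate follows; the function name and the sample counts are hypothetical and chosen only to show the arithmetic.

```python
def minimum_cds_records(migrated_non_vsam: int,
                        migrated_vsam: int,
                        backed_up_data_sets: int,
                        backup_copies: int) -> int:
    """Lower bound on CDS records, using the per-data-set minimums above.

    - each migrated non-VSAM data set: at least 2 records
    - each migrated VSAM data set: at least 3 records
    - each backed-up data set: at least 2 records, plus 1 MCC record
      for every backup copy kept
    """
    migration_records = 2 * migrated_non_vsam + 3 * migrated_vsam
    backup_records = 2 * backed_up_data_sets + backup_copies
    return migration_records + backup_records


# Hypothetical shop: 1,000,000 migrated non-VSAM and 200,000 migrated VSAM
# data sets, plus 500,000 backed-up data sets with 3 backup copies each.
estimate = minimum_cds_records(
    migrated_non_vsam=1_000_000,
    migrated_vsam=200_000,
    backed_up_data_sets=500_000,
    backup_copies=500_000 * 3,
)
print(f"At least {estimate:,} CDS records")
```

Every one of these records is carried through the daily CDS backups, which is why removing long-inactive data reduces backup and reorganization cost as well as storage.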


Cost of Managing Aged Data in DFSMShsm

Bottom Line…the costs associated are:

– Daily recycle

– Daily backup of the control data sets

– Daily backup of catalogs

– Data duplication (remote mirroring for Business Continuity)

[Graphic: Recycle, Backups, Duplication]


In Addition…

The majority of customers polled are now using a virtual tape subsystem for DFSMShsm ML2 data
– More tape drives available
– Faster recalls (typically)
– Capability to mirror remotely

A virtual tape subsystem has an average life span of just 3 years
– Data must be copied from one virtual tape subsystem to another every 3 years
  • Data must be “migrated” to newer technology when disk is replaced
– This adds to the cost of data storage for long term retention

Data that needs to be retained longer than three years will outlive the virtual tape it’s stored on
– Data with long term storage requirements must be housed on media that can support the requirement
  • Generally, all tape media used in z/OS environments meet these requirements
  • Current tape media has a 10-15 year life span


If You MUST Keep This Data…

Data sets that were migrated more than 2 years ago and have not been recalled are candidates for archival
– Possible solution is to use an Archive Manager
– Deletes entries from the MCDS, BCDS and Catalog
– Improves performance and saves CPU resources

Benefits of archiving aged data
– HSM MCDS and BCDS record count reduction
– DASD space requirements reduction for CDSs and CDS backup copies
– Saved CPU and I/O from moving aged data from tape to tape during recycle
– Tape recycle activity and CPU time reduction
– Related data archived together; expires together
– Possible catalog record count reduction, catalog backup and CPU time reduction



Optimizing Your Batch Window


Performance Challenges are Increasing

Online availability requirements are increasing
– High demand for information access and up-to-the-minute data

Service Level Agreements are more stringent

Determining system-wide impact of application tuning activities is difficult

Batch windows are getting smaller
– Batch jobs get bottlenecked by extensive I/Os, blocking their ability to run at peak speed
– Little time to optimize batch jobs



Why is Optimization of I/O Important?

Growth and batch processing window constraints

Business needs out of alignment with application design

Legacy application integration with e-business

Data center consolidations

Extending business without the need to upgrade systems

Time = $$$

Optimization Saves Time



Batch Window Constraints

Batch processing is composed of:

– CPU Cycles

– Memory

– I/O

How can I/O constraints be reduced to improve batch elapsed time?



Reducing Batch I/O Constraints

Look for an intelligent, intuitive and integrated optimization tool that:

– Significantly reduces elapsed times of batch processing

– Reduces batch processing requirements

– Is storage platform independent

– Automatically enhances buffering to improve batch cycles



I/O Without Using an Optimization Product

Inefficient I/O operations

Relying on system defaults

Improper tuning

Lack of flexibility when change is required from sequential to random access (or vice versa)

Low performance

System is not utilized to its maximum capacity

[Diagram: on the host, the program issues Get / Put requests against a small buffer, resulting in many EXCP I/O operations to DASD]



I/O Using an Optimization Product

Reduces number of I/Os dramatically

Automatically adjusts the buffers

Increases performance

Frees system resources

No need for application modification

[Diagram: on the host, the program issues Get / Put requests against a large buffer, resulting in fewer EXCP I/O operations to DASD]
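The relationship between buffer size and I/O count can be illustrated with a small experiment. This is only an analogy in Python, not z/OS access-method behavior: the in-memory stream, the 1,000,000-byte payload and both buffer sizes are assumptions chosen to make the effect visible.

```python
import io


def count_reads(data: bytes, buffer_size: int) -> int:
    """Count the read operations needed to consume data with a given buffer."""
    stream = io.BytesIO(data)
    reads = 0
    # Each non-empty read stands in for one physical I/O operation.
    while stream.read(buffer_size):
        reads += 1
    return reads


payload = bytes(1_000_000)  # hypothetical 1,000,000-byte file

small_buffer_reads = count_reads(payload, 4_096)
large_buffer_reads = count_reads(payload, 262_144)

print(f"4 KiB buffer:   {small_buffer_reads} reads")
print(f"256 KiB buffer: {large_buffer_reads} reads")
```

The program logic is identical in both runs; only the buffer size changes, which is the sense in which buffer tuning needs no application modification.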



Optimization Product Functions

Automated Batch I/O Tuning Solution
– Significantly improve system-wide performance for VSAM and non-VSAM batch processing
– Reductions of batch elapsed time in the range of 25-75%
– Benefits VSAM, non-VSAM (QSAM, BSAM) and VSAM loads

Accomplishes this by:
– Reducing CPU overhead associated with I/O (EXCPs)
– Exploiting “locality of reference” principle in real storage
  • Refers to ‘reuse of specific data, and/or resources, within a relatively small time duration’
– Adapting NSR/LSR Buffering to changes in file processing
– Enabling VSAM LSR and Hiperspace for high level code



Customer Experience

Results of I/O Optimization – Wall Clock Savings

Job              Without I/O Optimization   With I/O Optimization   Percent Improvement
VSAM Job 1       00:00:12.06                00:00:02.20             81.76
VSAM Job 2       00:01:17.53                00:00:17.68             77.20
VSAM Job 3       00:01:38.01                00:00:19.05             80.56
Non-VSAM Job 1   00:00:11.97                00:00:06.36             46.87
Non-VSAM Job 2   00:00:11.74                00:00:06.44             45.14
Load Job 1       00:01:20.71                00:00:14.02             82.63
Load Job 2       00:00:23.03                00:00:05.08             77.94
Load Job 3       00:03:34.37                00:00:33.88             84.20
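The percent improvement column is the reduction in elapsed time relative to the run without optimization. A minimal sketch of that calculation, with hypothetical helper names, checked against two rows of the table:

```python
def to_seconds(elapsed: str) -> float:
    """Convert an hh:mm:ss.ss elapsed time string to seconds."""
    hours, minutes, seconds = elapsed.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def percent_improvement(before: str, after: str) -> float:
    """Return the percentage reduction in elapsed time from before to after."""
    before_seconds = to_seconds(before)
    after_seconds = to_seconds(after)
    return 100 * (before_seconds - after_seconds) / before_seconds


vsam_job_1 = percent_improvement("00:00:12.06", "00:00:02.20")
load_job_3 = percent_improvement("00:03:34.37", "00:00:33.88")

print(f"VSAM Job 1: {vsam_job_1:.2f}%")
print(f"Load Job 3: {load_job_3:.2f}%")
```

Both values round to the figures shown in the table (81.76 and 84.20).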


Customer Experience

Results of I/O Optimization – EXCP Savings

Job              EXCPs Without I/O Optimization   EXCPs With I/O Optimization   Percent Improvement
VSAM Job 4       1,457,551                          110,461                     92
VSAM Job 5         847,287                           89,247                     89
VSAM Job 6       2,589,771                          334,058                     87
Non-VSAM Job 3   4,839,708                        1,995,825                     58
Non-VSAM Job 4   3,800,729                        1,454,560                     61
Load Job 4       9,498,212                          227,177                     97
Load Job 5       8,665,813                          205,981                     97
Load Job 6       8,694,282                          184,257                     97

