Advanced Technical Skills (ATS) North America © 2010 IBM Corporation Accelerate with ATS: TPC 4.1.1 Performance Management Enhancements and Demonstration John Hollis

Page 1: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

Advanced Technical Skills (ATS) North America

© 2010 IBM Corporation

Accelerate with ATS: TPC 4.1.1 Performance Management Enhancements and Demonstration

John Hollis

Page 2: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

Agenda

• TPC 4.1.1 (available October 2009): review of the significant new performance metrics, alerts and thresholds
– New performance metrics
– New performance alerts and thresholds
– New filters on thresholds for selected times and percentages
– New alert triggers and suppression
– Demo of using the "volume utilization" metric
• Demo of the Storage Optimizer
– Comparison of the Storage Optimizer's recommendations with findings using the "volume utilization" metric
• Positioning the use of filtered reports, alerts, SAN Planner and Storage Optimizer

Page 3: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

New Performance Metrics, Alerts and Thresholds

New performance metrics:
• Port Send/Receive/Overall Bandwidth Percentage for SVC and switch ports
• Port Send/Receive/Overall Utilization Percentage for storage subsystem ports
• Port FCP/FICON/PPRC Send/Receive/Total I/O Rate
• Port FCP/FICON/PPRC Send/Receive/Total Data Rate
• Port FCP/FICON/PPRC Send/Receive/Total Response Time
• Read/Write/Total HPF I/O Rate
• HPF I/O Percentage
• Volume Utilization
• New SVC 4.3.1 Performance Counters
– Peak Backend Read/Write Response Time
– Peak Backend Read/Write Queue Times
– Non-Preferred Node Usage Percentage
– Overall Host Attributed Response Time Percentage

Utilization is the percentage of time busy; calculating it requires the service time, i.e. the response time (RT), from the storage device. IBM Education class SGA07 covers port and volume utilization.

Volume Utilization: the approximate utilization percentage of a volume over a time interval. Available on systems that report both I/O rate and response time.

Overall Host Attributed Response Time Percentage: this provides an aid to diagnosing slow hosts and poorly performing fabrics. This is the time taken for a host to respond to a transfer-ready notification from the node (for read), or the time taken for a host to send the write data after the node has responded to a transfer-ready notification (for write).

Port Bandwidth Percentage: based on data rate.

Page 4: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

New Performance Metrics, Alerts and Thresholds

New performance alerts and thresholds:
• Total Backend I/O Rate for ESS/DS6K/DS8K arrays/SVC MDisk
• Total Backend Data Rate for ESS/DS6K/DS8K arrays/SVC MDisk
• Backend Read/Write Response Time for ESS/DS6K/DS8K arrays/SVC MDisk
• Backend Overall Response Time for SVC MDisk
• Backend Read/Write Queue Time for SVC MDisks
• Backend Peak Write Response Time for SVC Nodes
• Port to Local Node Send/Receive Response Time for SVC Nodes
• Port to Local Node Send/Receive Queue Time for SVC Nodes
• Non-Preferred Node Usage Percentage for SVC I/O Groups
• Port Send/Receive Busy (time) Utilization Percentage for storage subsystem ports
• Port Send/Receive Bandwidth Percentage for SVC and switch ports

Existed prior to 4.1.1, but an I/O rate threshold boundary was added; covered on the next slide.

Port to Local Node Send Response Time: sets thresholds on the average number of milliseconds it took to service each send operation to another node in the local SVC cluster. Violation of these threshold boundaries means that it is taking too long to send data between nodes (on the fabric), and suggests either congestion around these FC ports or an internal SVC microcode problem.

This threshold is enabled by default (SVC and switches), with default boundaries 85, 75, -1, -1.
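As a rough illustration of how a four-value boundary set such as 85, 75, -1, -1 might be read, the sketch below assumes the order is critical stress, warning stress, warning idle, critical idle, and that a negative value means "boundary not set". Both the ordering and the comparison operators are assumptions for illustration, not a statement of how TPC evaluates thresholds internally; the function name `classify` is hypothetical.

```python
from typing import Sequence


def classify(value: float, boundaries: Sequence[float]) -> str:
    """Classify a metric sample against four boundaries.

    Assumed order: (critical stress, warning stress, warning idle, critical idle).
    Assumed convention: a negative boundary is disabled.
    """
    critical_stress, warning_stress, warning_idle, critical_idle = boundaries
    if critical_stress >= 0 and value >= critical_stress:
        return "critical"
    if warning_stress >= 0 and value >= warning_stress:
        return "warning"
    if critical_idle >= 0 and value <= critical_idle:
        return "critical"
    if warning_idle >= 0 and value <= warning_idle:
        return "warning"
    return "normal"


# Default boundaries quoted on the slide for port bandwidth percentage.
defaults = (85, 75, -1, -1)
for sample in (90, 80, 10):
    print(sample, classify(sample, defaults))
```

With the idle boundaries disabled, a low value such as 10 is simply "normal"; only the two stress boundaries can raise a condition.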

Page 5: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

New Performance Metrics, Alerts and Thresholds

New filters on thresholds for selected times and percentages:

• If the I/O rate is less than a user-specified value (ops/sec), no alert is generated even if the response time exceeds the threshold boundary.
• Write Cache Delay Percentage (pre-populated 10%, 3,, I/O 10)
• Overall Backend Response Time Threshold (SVC) (pre-populated blank, blank…I/O 5)
• Non-preferred Node Usage Percentage (SVC)
• Backend Write Queue Time (SVC) (pre-populated 5, 3,, I/O 5)
• Backend Read Queue Time (SVC) (pre-populated 5, 3,, I/O 5)
• Backend Write Response Time (pre-populated 120, 80,, I/O 5)
• Backend Read Response Time (pre-populated 35, 25,, I/O 5)
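The I/O rate filter described in the first bullet can be sketched as a simple guard in front of the boundary check. This is a minimal sketch, not TPC's implementation: the function and parameter names are hypothetical, and the example values assume that the pre-populated 35 for Backend Read Response Time is a boundary in milliseconds and that "I/O 5" means a filter of 5 ops/sec.

```python
def should_alert(response_time_ms: float,
                 io_rate_ops: float,
                 boundary_ms: float,
                 min_io_rate_ops: float) -> bool:
    """Return True only if the boundary is exceeded AND the volume of I/O is meaningful."""
    if io_rate_ops < min_io_rate_ops:
        # Too little I/O for the response time to matter: no alert.
        return False
    return response_time_ms > boundary_ms


# Slow response but almost idle: filtered out.
print(should_alert(response_time_ms=50, io_rate_ops=2, boundary_ms=35, min_io_rate_ops=5))
# Slow response under real load: alert.
print(should_alert(response_time_ms=50, io_rate_ops=20, boundary_ms=35, min_io_rate_ops=5))
```

The point of the filter is that a handful of slow operations on a nearly idle resource should not generate noise.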

Page 6: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

New Performance Metrics, Alerts and Thresholds

Alert and threshold definition mechanism additions:

• Trigger alerts based on critical/warning condition levels
• Suppress alerts when there are insufficient repetitions
• Suppress alerts when there are repeated conditions
• All events shown in “Constraint Violations” reports
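The two suppression ideas above can be sketched as a small state machine: hold back an alert until a violation has repeated often enough, then avoid re-sending it while the same condition persists, while still counting every violation for reporting. The class name, the `min_consecutive` parameter and the reset-on-clear behavior are all hypothetical choices for this sketch; the actual TPC options may differ.

```python
class AlertSuppressor:
    """Illustrative alert suppression for one monitored resource."""

    def __init__(self, min_consecutive: int = 3) -> None:
        self.min_consecutive = min_consecutive  # hypothetical setting
        self.streak = 0            # consecutive violations seen so far
        self.alerted = False       # True once an alert was sent for this streak
        self.violations_logged = 0  # every violation is still recorded

    def observe(self, violated: bool) -> bool:
        """Process one sample; return True if an alert should be sent."""
        if not violated:
            self.streak = 0
            self.alerted = False
            return False
        self.violations_logged += 1
        self.streak += 1
        if self.streak < self.min_consecutive:
            return False  # insufficient repetitions
        if self.alerted:
            return False  # repeated condition, already reported
        self.alerted = True
        return True


suppressor = AlertSuppressor(min_consecutive=2)
samples = [True, True, True, False, True, True]
print([suppressor.observe(v) for v in samples])
print(suppressor.violations_logged)
```

Only two alerts are sent for five violations in this run, but the violation count keeps all five, in the same spirit as suppressed alerts still appearing in the Constraint Violations report.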

Page 7: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

Make use of new thresholds and alert suppression

Tailor the alert suppression options to an “attention getting” level. All alerts, including suppressed alerts, are shown in the “Constraint Violations” report.

Tailor thresholds to “attention getting” levels. If the alerts are being ignored, they are of no value. Consider adjusting alert levels such that the administrator will take action.

Page 8: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

Make use of “volume utilization”

Define the data columns you want to report on. Performance management experts may add columns for data used in diagnosing performance problems (for example, I/O rates and response times).

This screen is shown in the demo. It is here for reference.

Page 9: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

In performance management, there is a concept called “Population.” It is “Total I/O Rate” multiplied by “Overall Response Time,” divided by 1000. The calculation for “Volume Utilization” is based on “Population.” Techniques for analyzing “Population” and “Volume Utilization” are covered in IBM Education course SGA07.
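The “Population” arithmetic can be written out directly. The sketch below assumes the I/O rate is in operations per second and the response time in milliseconds per operation, so that dividing by 1000 converts milliseconds to seconds; under that assumption the result is the average number of I/Os in flight. The function name is hypothetical, and the further step from “Population” to “Volume Utilization” is not given on the slide, so it is not reproduced here.

```python
def population(total_io_rate_ops_per_s: float,
               overall_response_time_ms: float) -> float:
    """Population = Total I/O Rate x Overall Response Time / 1000."""
    return total_io_rate_ops_per_s * overall_response_time_ms / 1000.0


# Example: 500 ops/s at 4 ms per operation.
print(population(500, 4))  # 2.0
```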

Make use of “volume utilization”

Press the save icon to save the report in “My Reports”

As you gain experience in your environment, add the names of volumes you want excluded. For example, exclude volumes that you have seen always violating the thresholds, that will not be fixed, and that you no longer want to see in this report. Also, the volume name filter “LIKE” can be used to define reports on critical volumes, applications, servers, and so on (if clever volume naming policies have been used).
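The effect of “LIKE” and “NOT LIKE” name filters can be imitated outside the product with wildcard matching. This is only an analogy built on Python's standard `fnmatch` module, assuming `*` acts as a wildcard as in the `*162*` pattern used in the demo; the volume names and the `filter_volumes` helper are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def filter_volumes(names: Iterable[str],
                   like: Optional[str] = None,
                   not_like: Iterable[str] = ()) -> List[str]:
    """Keep names matching `like` (if given) and matching none of `not_like`."""
    exclusions = list(not_like)
    kept = []
    for name in names:
        if like is not None and not fnmatchcase(name, like):
            continue
        if any(fnmatchcase(name, pattern) for pattern in exclusions):
            continue
        kept.append(name)
    return kept


# Hypothetical volume names that follow a naming policy.
volumes = ["ora_prod_0162", "ora_prod_0170", "sap_test_0162", "sap_prod_0201"]
print(filter_volumes(volumes, not_like=["*162*"]))  # hide known offenders
print(filter_volumes(volumes, like="*prod*"))       # report on production only
```

A consistent naming policy is what makes the second call useful: one pattern selects every volume belonging to an application or environment.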

This screen is shown in the demo. It is here for reference.

Page 10: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

Make use of “volume utilization”

Use the “Drill up” option to go to reports that may provide insight to the root cause.

This volume had many occurrences of high utilization so it is a “volume of performance interest”

This volume had only one occurrence of high utilization so it is not yet a “volume of performance interest”
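The distinction between many occurrences and a single occurrence can be made mechanical by counting, per volume, how many samples exceed a utilization level. The sketch below is one possible way to do that on exported report rows; the threshold of 50%, the minimum of 3 occurrences, and the sample data are hypothetical values chosen for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def volumes_of_interest(samples: Iterable[Tuple[str, float]],
                        utilization_threshold: float = 50.0,
                        min_occurrences: int = 3) -> List[str]:
    """Return volumes with at least `min_occurrences` samples above the threshold."""
    counts = Counter(volume for volume, utilization in samples
                     if utilization > utilization_threshold)
    return sorted(volume for volume, n in counts.items() if n >= min_occurrences)


# (volume name, utilization %) pairs, one per sample interval.
rows = [("vol_a", 80), ("vol_a", 75), ("vol_a", 90), ("vol_b", 85), ("vol_b", 10)]
print(volumes_of_interest(rows))
```

Here `vol_a` qualifies with three high samples, while `vol_b`, with only one, is not yet a volume of performance interest.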

This screen is shown in the demo. It is here for reference.

Page 11: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

Make use of the Storage Optimizer

P0 is in need of attention

Thresholds define not only the colors used in the heat map, but also the parameters within which the optimizer must work. For example, if a threshold of 20% were chosen, and moving a volume off of P0 resulted in the target pool’s utilization going over 20%, that possible action would be eliminated. It is possible to set thresholds so low that no action is recommended. Working with the Storage Optimizer is an iterative process.
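The elimination rule in the 20% example can be sketched as a filter over candidate target pools. This is a deliberately simplified model, not the Storage Optimizer's algorithm: it assumes a moved volume adds its utilization directly to the target pool, and the pool names other than P0, the utilization figures, and the `allowed_targets` helper are hypothetical.

```python
from typing import Dict, List


def allowed_targets(volume_utilization: float,
                    pool_utilization: Dict[str, float],
                    source_pool: str,
                    threshold: float) -> List[str]:
    """Return pools that stay at or below `threshold` after receiving the volume."""
    return [pool for pool, utilization in pool_utilization.items()
            if pool != source_pool
            and utilization + volume_utilization <= threshold]


pools = {"P0": 35.0, "P1": 12.0, "P2": 18.0}  # utilization in percent
print(allowed_targets(5.0, pools, source_pool="P0", threshold=20.0))
print(allowed_targets(5.0, pools, source_pool="P0", threshold=10.0))
```

With a 20% threshold only P1 remains a valid target; with a 10% threshold no move survives, which is the "thresholds so low that no action is recommended" case and the reason the process is iterative.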

This screen is shown in the demo. It is here for reference.

Page 12: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

Make use of the Storage Optimizer

The recommendations will spread some of the workload that was previously only on P0. The results are circled below. This is a much better balance.

These volumes are also identified in a filtered “Volume Utilization” report when it is run for the same dates as this Storage Optimizer analysis.

[Figure panels: Before, After]

This screen is shown in the demo. It is here for reference.

Page 13: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

Positioning the use of filtered reports, alerts, SAN Planner and Storage Optimizer

• Filtered reports
– Use historic performance information
– Good for trending, “spikes,” and root cause analysis
– Use them daily/weekly/monthly/as needed
• Alerts
– Thresholds are compared to performance data as it is gathered
– Note: For storage systems, these are generally at a controller or I/O group level (i.e. no alert/threshold for “volume utilization”)
– Good for immediate notification of situations
– Use them regularly for Constraint Violations report analysis of affected volumes and hosts
• SAN Planner
– Uses historic performance information for planning new volumes
– Use it as needed for creating new volumes
• Storage Optimizer
– Uses historic performance information for optimizing performance of existing volumes
– Use it as needed for performance problem resolution and weekly/monthly for performance problem avoidance

Page 14: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

Questions

Page 15: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

Trademarks and notes

IBM and the IBM logo are registered trademarks, and other company, product or service names may be trademarks or service marks of International Business Machines Corporation in the United States, other countries, or both. For a list of IBM trademarks, please see: http://www.ibm.com/legal/copytrade.shtml

Intel and related trademarks and logos, IT Infrastructure Library and ITIL, Java and all Java-based trademarks and logos, Linux, Microsoft and Windows, and UNIX are trademarks or service marks of others as described under “Special attributions” at: http://www.ibm.com/legal/copytrade.shtml

Other company, product and service names may be trademarks or service marks of others.

References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.

IBM’s provision of products or services does not constitute the provision of legal advice, and IBM does not represent or warrant that its services or products will guarantee or assist your compliance with any laws or regulations. You are solely responsible for identifying, interpreting and ensuring your compliance with all applicable laws, regulations and rules relevant to your business needs and should seek competent legal advice as needed.

Page 16: Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

Demonstration contents

– Options on creating a “write cache delay percentage” threshold, defaults, alert suppression
– “Alerts –> Storage Subsystem” sort by date
– Constraint violations report, change begin date to 2009, DS6KA Disk Utiliz% thresholds, drill up, “affected volumes” report
– DS6KA perf monitor logfile thresholds and defaults
– Filtered “Exercise - Volume Utilization” report:
• Utilization, I/O rate, response time column comparison
• Change start date to Nov 28, 2009, 1:11am, sort results on time
• “NOT LIKE” *162*
– “Exercise – Storage Optimizer”
– Options used in creating a volume with the SAN Planner
– Slide 13 – “Positioning.”