Advanced Technical Skills (ATS) North America
© 2010 IBM Corporation
Accelerate with ATS: TPC 4.1.1 Performance Management Enhancements and Demonstration
John Hollis
Agenda
TPC 4.1.1 (Available October 2009) – review of the significant new performance metrics, alerts and thresholds
– New performance metrics
– New performance alerts and thresholds
– New filters on thresholds for selected times and percentages
– New alert triggers and suppression
– Demo of using the "volume utilization" metric
Demo of the Storage Optimizer
– Comparison of the Storage Optimizer's recommendations with findings using "volume utilization" metric
Positioning the use of filtered reports, alerts, SAN Planner and Storage Optimizer
New Performance Metrics, Alerts and Thresholds
New performance metrics:
• Port Send/Receive/Overall Bandwidth Percentage for SVC and switch ports
• Port Send/Receive/Overall Utilization Percentage for storage subsystem ports
• Port FCP/FICON/PPRC Send/Receive/Total I/O Rate
• Port FCP/FICON/PPRC Send/Receive/Total Data Rate
• Port FCP/FICON/PPRC Send/Receive/Total Response Time
• Read/Write/Total HPF I/O Rate
• HPF I/O Percentage
• Volume Utilization
• New SVC 4.3.1 Performance Counters
– Peak Backend Read/Write Response Time
– Peak Backend Read/Write Queue Times
– Non-Preferred Node Usage Percentage
– Overall Host Attributed Response Time Percentage
Utilization is the percentage of time busy; it requires the service time (i.e., response time, RT) from the storage device. IBM Education class SGA07 will cover port and volume utilization.
Volume Utilization: the approximate utilization percentage of a volume over a time interval. Available on systems that report I/O rate and RT.
Overall Host Attributed Response Time Percentage: an aid to diagnosing slow hosts and poorly performing fabrics. It is based on the time taken for a host to respond to a transfer-ready notification from the node (for read) or the time taken for a host to send the write data after the node has responded to a transfer-ready notification (for write).
Bandwidth Percentage: based on data rate.
New Performance Metrics, Alerts and Thresholds
New performance alerts and thresholds:
• Total Backend I/O Rate for ESS/DS6K/DS8K arrays/SVC MDisk
• Total Backend Data Rate for ESS/DS6K/DS8K arrays/SVC MDisk
• Backend Read/Write Response Time for ESS/DS6K/DS8K arrays/SVC MDisk
• Backend Overall Response Time for SVC MDisk
• Backend Read/Write Queue Time for SVC MDisks
• Backend Peak Write Response Time for SVC Nodes
• Port to Local Node Send/Receive Response Time for SVC Nodes
• Port to Local Node Send/Receive Queue Time for SVC Nodes
• Non-Preferred Node Usage Percentage for SVC I/O Groups
• Port Send/Receive Busy (time) Utilization Percentage for storage subsystem ports
• Port Send/Receive Bandwidth Percentage for SVC and switch ports
Existed prior to 4.1.1, but an I/O rate threshold boundary was added (covered in the next slide).
Port to Local Node Send Response Time: sets thresholds on the average number of milliseconds it took to service each send operation to another node in the local SVC cluster. A violation of these threshold boundaries means that it is taking too long to send data between nodes (on the fabric), and suggests either congestion around these FC ports or an internal SVC microcode problem.
Port Send/Receive Bandwidth Percentage: this threshold is enabled by default (SVC and switches), with default boundaries 85, 75, -1, -1.
New Performance Metrics, Alerts and Thresholds
New filters on Thresholds for selected times and percentages:
• If the I/O Rate is less than a user-specified value (ops/sec), no alert will be generated even if the response time exceeds the threshold boundary.
• Write Cache Delay Percentage (pre-populated 10%, 3,, I/O 10)
• Overall Backend Response Time Threshold (SVC) (pre-populated blank, blank…I/O 5)
• Non-preferred Node Usage Percentage (SVC)
• Backend Write Queue Time (SVC) (pre-populated 5, 3,, I/O 5)
• Backend Read Queue Time (SVC) (pre-populated 5, 3,, I/O 5)
• Backend Write Response Time (pre-populated 120, 80,, I/O 5)
• Backend Read Response Time (pre-populated 35, 25,, I/O 5)
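The I/O rate filter described above can be sketched as a small check. This is a minimal illustration, not TPC's implementation: the function name `classify`, its parameters, and the reading of the pre-populated values as (critical boundary, warning boundary, ..., I/O rate filter) are assumptions made for the example.

```python
from typing import Optional


def classify(value: float,
             io_rate: float,
             critical: Optional[float],
             warning: Optional[float],
             min_io_rate: float = 0.0) -> str:
    """Return 'critical', 'warning', or 'none' for one performance sample.

    All names here are hypothetical; None stands for a blank boundary.
    """
    # The filter: below the user-specified I/O rate, no alert is generated,
    # even if the measured value exceeds a threshold boundary.
    if io_rate < min_io_rate:
        return "none"
    if critical is not None and value > critical:
        return "critical"
    if warning is not None and value > warning:
        return "warning"
    return "none"


# Backend Read Response Time with boundaries 35 / 25 and an I/O filter of 5 ops/sec.
print(classify(40.0, io_rate=2.0, critical=35, warning=25, min_io_rate=5))
print(classify(40.0, io_rate=50.0, critical=35, warning=25, min_io_rate=5))
print(classify(30.0, io_rate=50.0, critical=35, warning=25, min_io_rate=5))
```

The first call shows the point of the filter: a 40 ms response time on a nearly idle component is not reported, because a handful of slow operations rarely matters to the workload.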
New Performance Metrics, Alerts and Thresholds
Alert and threshold definition mechanism additions:
• Trigger alerts based on critical/warning condition levels
• Suppress alerts when there are insufficient repetitions
• Suppress alerts when there are repeated conditions
• All events shown in “Constraint Violations” reports
Make use of new thresholds and alert suppression
Tailor the alert suppression options to an “attention getting” level. All alerts, including suppressed alerts, are shown in the “Constraint Violations” report.
Tailor thresholds to “attention getting” levels. If the alerts are being ignored, they are of no value. Consider adjusting alert levels such that the administrator will take action.
Make use of “volume utilization”
Define the data columns you want to report on. Performance management experts may add columns for data used in diagnosing performance problems (for example, I/O rates and response times).
This screen is shown in the demo. It is here for reference.
Make use of “volume utilization”
In performance management, there is a concept called “Population.” It is “Total IO rate” multiplied by “Overall Response Time” divided by 1000. The calculation for “Volume Utilization” is based on “Population.” Techniques for analyzing “Population” and “Volume Utilization” are covered in IBM Education course SGA07.
Press the save icon to save the report in “My Reports.”
As you gain experience in your environment, add the names of volumes you want excluded. For example: volumes that always violate the thresholds, will not be fixed, and that you no longer want to see in this report. Also: the volume name filter “LIKE” can be used to define reports on critical volumes, applications, servers, and so on (if clever volume naming policies have been used).
This screen is shown in the demo. It is here for reference.
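The “Population” calculation on this slide can be written out directly. The sketch below assumes the I/O rate is in operations per second and the response time in milliseconds, which is what the division by 1000 implies; the function name and sample values are made up for illustration. The slide does not give the formula that turns “Population” into “Volume Utilization,” so that step is not shown.

```python
def population(total_io_rate: float, overall_response_time_ms: float) -> float:
    """Average number of I/Os in flight during the interval.

    ops/sec multiplied by seconds per operation (ms / 1000) gives the
    average number of outstanding operations, as in Little's law.
    """
    return total_io_rate * overall_response_time_ms / 1000.0


# Hypothetical sample: 500 ops/sec at 4 ms overall response time.
print(population(500.0, 4.0))
```

A population of 2 here means that, on average, two I/Os were outstanding against the volume at any moment in the interval.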
Make use of “volume utilization”
Use the “Drill up” option to go to reports that may provide insight to the root cause.
This volume had many occurrences of high utilization so it is a “volume of performance interest”
This volume had only one occurrence of high utilization so it is not yet a “volume of performance interest”
This screen is shown in the demo. It is here for reference.
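The distinction drawn on this slide, many occurrences of high utilization versus only one, can be sketched as a count per volume. The function name, the utilization threshold, the minimum occurrence count, and the sample data are all hypothetical; the slide does not state the values used in the demo.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def volumes_of_interest(samples: Iterable[Tuple[str, float]],
                        threshold_pct: float,
                        min_occurrences: int) -> List[str]:
    """Return volumes with at least min_occurrences samples above threshold_pct."""
    # Count, per volume, the samples whose utilization exceeds the threshold.
    counts = Counter(name for name, util in samples if util > threshold_pct)
    return sorted(name for name, n in counts.items() if n >= min_occurrences)


# (volume name, volume utilization %) per collection interval; made-up data.
samples = [
    ("vol_a", 82.0), ("vol_a", 91.0), ("vol_a", 77.0),
    ("vol_b", 85.0), ("vol_b", 12.0),
    ("vol_c", 5.0),
]
print(volumes_of_interest(samples, threshold_pct=70.0, min_occurrences=3))
```

With these inputs only `vol_a` qualifies; `vol_b` exceeded the threshold once and so is not yet a “volume of performance interest.”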
Make use of the Storage Optimizer
P0 is in need of attention
Thresholds define not only the colors used in the heat map, but also the parameters within which the optimizer must work. For example, if a threshold of 20% were chosen, and moving a volume off of P0 resulted in the target pool’s utilization going over 20%, that possible action would be eliminated. It is possible to set thresholds so low that no action is recommended. Working with the Storage Optimizer is an iterative process.
This screen is shown in the demo. It is here for reference.
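The threshold rule described on this slide can be illustrated with a simple filter over candidate target pools. This is not the Storage Optimizer's algorithm: the function name, the pool names other than P0, the utilization figures, and the assumption that a volume's utilization simply adds to the target pool's utilization are all simplifications made for the example.

```python
from typing import Dict, List


def allowed_targets(volume_util_pct: float,
                    pool_util_pct: Dict[str, float],
                    source_pool: str,
                    threshold_pct: float) -> List[str]:
    """Return target pools that would not go over the threshold after the move."""
    return sorted(
        pool for pool, util in pool_util_pct.items()
        # Assumption: the moved volume's load adds linearly to the target pool.
        if pool != source_pool and util + volume_util_pct <= threshold_pct
    )


pools = {"P0": 45.0, "P1": 15.0, "P2": 8.0}  # hypothetical pool utilization (%)
print(allowed_targets(6.0, pools, source_pool="P0", threshold_pct=20.0))
print(allowed_targets(6.0, pools, source_pool="P0", threshold_pct=10.0))
```

The second call returns no targets, which is the situation the slide warns about: a threshold set so low that no action can be recommended.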
Make use of the Storage Optimizer
The recommendations will spread some of the workload that was previously only on P0. The results are circled below. This is a much better balance.
These volumes are also identified in a filtered “Volume Utilization” report when it is run for the same dates as this Storage Optimizer analysis.
Before
After
This screen is shown in the demo. It is here for reference.
Positioning the use of filtered reports, alerts, SAN Planner and Storage Optimizer
Filtered reports
– Uses historic performance information
– Good for trending, “spikes,” root cause analysis
– Use them daily/weekly/monthly/as-needed
Alerts
– Thresholds compared to performance data as it is gathered
– Note: For storage systems, these are generally at a controller or I/O group level (i.e. no alert/threshold for “volume utilization”)
– Good for: immediate notification of situations
– Use them regularly for constraint violations report analysis of affected volumes and hosts
SAN Planner
– Uses historic performance information for planning new volumes
– Use it as needed for creating new volumes
Storage Optimizer
– Uses historic performance information for optimizing performance of existing volumes
– Use it as needed for performance problem resolution and weekly/monthly for performance problem avoidance
Questions
Trademarks and notes
IBM and the IBM logo are registered trademarks, and other company, product or service names may be trademarks or service marks of International Business Machines Corporation in the United States, other countries, or both. For a list of IBM trademarks, please see: http://www.ibm.com/legal/copytrade.shtml
Intel and related trademarks and logos, IT Infrastructure Library and ITIL, Java and all Java-based trademarks and logos, Linux, Microsoft and Windows, and UNIX are trademarks or service marks of others as described under “Special attributions” at: http://www.ibm.com/legal/copytrade.shtml
Other company, product and service names may be trademarks or service marks of others.
References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.
IBM’s provision of products or services does not constitute the provision of legal advice, and IBM does not represent or warrant that its services or products will guarantee or assist your compliance with any laws or regulations. You are solely responsible for identifying, interpreting and ensuring your compliance with all applicable laws, regulations and rules relevant to your business needs and should seek competent legal advice as needed.
Demonstration contents
– Options on creating a “write cache delay percentage” threshold, defaults, alert suppression
– “Alerts –> Storage Subsystem” sort by date
– Constraint violations report, change begin date to 2009, DS6KA Disk Utiliz% thresholds, drill up, “affected volumes” report
– DS6KA perf monitor logfile thresholds and defaults
– Filtered “Exercise - Volume Utilization” report:
• utilization, I/O rate, response time column comparison
• Change start date to Nov 28, 2009, 1:11am, sort results on time
• “NOT LIKE” *162*
– “Exercise – Storage Optimizer”
– Options used in creating a volume with the SAN planner
– Slide 13 – “Positioning.”