IntelliMagic Vision for z/OS Overview
Ogé Nduka, Enterprise Account Manager
Dave Heggen, z/OS Performance Consultant
2
Agenda
• Introduction
• IntelliMagic Vision for z/OS Disk and z/OS Systems Overview and Demonstration
‒ Dashboards and Thresholds
‒ New Process SMF record type 113 (Processor Hardware Capacity Reporting)
‒ WLM Reporting
‒ New Data Set Summary Reporting
‒ New GTF IO Summary trace analysis
• IntelliMagic Vision as a Service
3
IntelliMagic
• Leadership in “Availability Intelligence” Solutions:
‒ Provides new visibility of threats to continuous availability by using built-in expert knowledge automatically applied to the data (RMF, SMF, etc.)
• More than 20 years of solutions for deep analysis
• Privately held, financially independent
• Customer centric, responsive
• Solutions used daily in some of the world’s largest data centers
4
Some IntelliMagic Clients
“…My team has been able to deal proactively with upcoming resource bottlenecks in our storage installation. Complexity had previously kept these bottlenecks hidden. …gave us visibility to easily and timely reveal emerging constraints inside our storage systems…”
- Andreas Reimann, Finanz Informatik (126.9M Accounts | 2.3 PB | 86 Disk Systems | 3 Datacenters)
“IntelliMagic Vision proactively and centrally monitors storage performance and SLAs. Potential problems are discovered early, analyzed and prevented from developing any further. Competently and reliably.”
- Thomas Ehmke, Schaeffler (50+ Storage Systems | 20+ Locations | 3 Continents)
5
What is Availability Intelligence?
• Availability:
‒ “The ability of a configuration item or IT service to perform its agreed function when required.” (ITIL v.3)
• Intelligence:
‒ “Foreknowledge of an adversary” – (Sun Tzu, 500 BC)
• Availability Intelligence:
‒ What: Knowledge about hidden threats to availability
‒ Why: Better protect continuous availability at the primary site
‒ How: Automatically apply expert knowledge in the analysis of performance and configuration data
6
Application of Expert Knowledge
• Built-in expert knowledge automatically applied to RMF/SMF in 7 areas
• Protects availability by assessing risk every day, every device, every data center
• The only way to continually fulfil the ITIL v3 definition of Capacity Management:
– The Process responsible for ensuring that the Capacity of IT Services and the IT Infrastructure is able to deliver agreed Service Level Targets in a Cost Effective and timely manner… considers all Resources required to deliver the IT Service...
7
Business Use Cases
• High Visibility of risks with single-pane-of-glass
‒ Multiple storage HW vendors, data centers
‒ Avoid Service Disruptions with foresight to take action
• Accelerate Recovery from issues with quick root cause analysis
• SLA/SLO Compliance
• Safely Provision & Extend Life of Hardware
‒ Optimize performance, throughput, balance
‒ Maximize investment
• Capacity Planning
8
Data Center Views of Key Risk Indicators
8 © IntelliMagic 2014
Disk Storage Systems
Performance Metrics
Key Risk Indicators
Highest Rating for this Dashboard
Consolidate individual ratings on infrastructure resources into data center views to see risk across enterprise at a glance
9
Visualizing Risk to Continuous Availability
Automatically rate key metrics according to built-in expert knowledge to obtain intelligence about threats that you can use to protect availability
No Border, No Rating
Green Border, Good
Yellow Border, Early Warning
Red Border, Performance Exceptions
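The border colors and the "Highest Rating for this Dashboard" roll-up can be sketched in a few lines of Python. This is only an illustration of the idea, not IntelliMagic Vision's implementation: the names `Rating`, `rate` and `dashboard_rating` are hypothetical, the threshold values are made up, and it assumes a metric for which higher values are worse.

```python
from enum import IntEnum


class Rating(IntEnum):
    NONE = 0       # no border: the metric is not rated
    GOOD = 1       # green border
    WARNING = 2    # yellow border: early warning
    EXCEPTION = 3  # red border: performance exception


def rate(value, warning=None, exception=None):
    """Rate one metric against two thresholds (higher value = worse)."""
    if warning is None or exception is None:
        return Rating.NONE  # no thresholds defined, so nothing to rate
    if value >= exception:
        return Rating.EXCEPTION
    if value >= warning:
        return Rating.WARNING
    return Rating.GOOD


def dashboard_rating(ratings):
    """Roll up individual ratings: the dashboard shows the highest one."""
    return max(ratings, default=Rating.NONE)


# Hypothetical response times (ms) rated against assumed thresholds.
ratings = [rate(v, warning=10, exception=20) for v in (3.2, 12.5, 7.9)]
overall = dashboard_rating(ratings)
```

In this example one metric crosses the assumed warning threshold, so the roll-up is an early warning even though the other two metrics are rated good.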
10
Rate Risk using Contextual Knowledge
A three-level rating system based on hardware capabilities
A three-level, dynamic rating based on both workload characteristics and hardware
11
Adjusting or Disabling Thresholds
• Thresholds come pre-populated based on domain expertise about the hardware and configuration, but they can be adjusted or disabled
• Provides the ability to match your installation policies to IntelliMagic Vision threshold settings
• Rated Element goes from rated value to gray
• Rest of the chart unchanged, Rating for chart recalculated for remaining rated values
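The effect of disabling a threshold can be sketched as follows. This is a hedged illustration of the behavior described on this slide, not the product's code: the function `chart_rating`, the `SEVERITY` table and the element names are assumptions made for the example.

```python
# Severity order used for the roll-up; "gray" marks a disabled threshold.
SEVERITY = {"gray": 0, "green": 1, "yellow": 2, "red": 3}


def chart_rating(element_ratings, disabled=frozenset()):
    """Return (colors shown per element, overall chart color).

    Elements whose threshold is disabled are shown in gray and are
    left out when the overall chart rating is recalculated.
    """
    shown = {
        name: ("gray" if name in disabled else color)
        for name, color in element_ratings.items()
    }
    rated = [color for color in shown.values() if color != "gray"]
    overall = max(rated, key=SEVERITY.__getitem__, default="gray")
    return shown, overall


# Hypothetical chart with three rated elements.
ratings = {"DSS001": "green", "DSS002": "red", "DSS003": "yellow"}
before = chart_rating(ratings)
after = chart_rating(ratings, disabled={"DSS002"})
```

With all thresholds active the chart is rated red. After the threshold for `DSS002` is disabled, that element turns gray and the chart rating drops to yellow, the worst of the remaining rated values.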
12
Analysis and Reporting
Previously, search did not go past the report sets. With drilldowns included, search results increased by as much as 106x.
Improved Search – search beyond Report Previews
13
New DSSPORT REDUCE Specification
• IntelliMagic Vision default Replication Technique is SYNC
• DSSPORT allows the Replication Technique to be specified as ASYNC, XRC, SYNC, or NONE
• Works with ASYNC D/R environments (e.g. GM, zGM, SRDF/A, HUR)
• Corrects treatment of Couple Datasets in zHyperswap environment
• DSSPORT REPLICATIONTYPE=ASYNC, Include=((SERIAL=(IBM-12345, EMC-12345 HDS-12345), PORT=(ALL)));
• Port addresses are specified in Hex, ‘*’ Wildcard is supported
14
Analysis and Reporting Improvements in Accuracy and Usability - all reporting
• Web Reporter adds export to .CSV and report search feature
• Decimal rather than Hexadecimal values for IBM DSS Extent Pools
• Datasets focal point expanded to Datasets and Summarized Datasets
• PowerPoint Template updated to support MS PowerPoint 2007, 2010 and 2013
15
SMF 113 Records
• Requires the Reduce command ‘COLLECT SMF113’ and a license key for VISZMVS
• Measurements taken from the processor chips
• Covers the following areas:
‒ # instructions and cycles executed
• both overall and in problem-state
‒ Cache architecture effectiveness including Cache Levels
‒ Translation Lookaside Buffer delays and effectiveness
‒ Cryptographic Coprocessor Activity
• Difficult to relate measurements to applications
• Potential to improve overall efficiency and MIPS delivered
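As a rough illustration of what the instruction and cycle counts make possible, the sketch below derives two simple ratios: cycles per instruction (CPI) and the share of instructions executed in problem state. The function and argument names are hypothetical and are not SMF 113 field names; the input values are invented.

```python
def cycles_per_instruction(cycles, instructions):
    """Average number of processor cycles consumed per instruction."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return cycles / instructions


def problem_state_share(problem_instructions, total_instructions):
    """Percentage of all instructions that ran in problem state."""
    if total_instructions <= 0:
        raise ValueError("instruction count must be positive")
    return 100.0 * problem_instructions / total_instructions


# Invented counter deltas for one measurement interval.
cpi = cycles_per_instruction(cycles=8_000, instructions=2_000)
share = problem_state_share(problem_instructions=1_500, total_instructions=2_000)
```

A CPI that rises at a constant workload suggests that more cycles are being lost to cache or TLB misses, which is where the potential to improve efficiency lies.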
16
Relative Nest Intensity For all Processor Hardware Counters by Processor Complex serial
IntelliMagic Vision supports the derived metrics specified by IBM like Relative Nest Intensity.
17
Cache Miss Probabilities for CPs only
IntelliMagic Vision analyzes the cache hierarchy
18
Estimated TLB1 CPU Miss % of Total CPU (%) For all Processor Hardware Counters by Processor Complex serial
IntelliMagic Vision reports on Translation Lookaside Buffers
19
Estimated Impact Cache and TLB Misses (Cycles/Inst) For all Processor Hardware Counters
IntelliMagic Vision shows how the cache miss delay is distributed over the TLB and cache hierarchy.
20
Importance LPAR Workload
WLM Improvements – Goal Reporting
21
WLM Improvements
22
WLM Improvements (cont.)
23
Data Set Analysis (1 of 2)
• Requires use of License Key VISZJDSN
• Obtained from SMF type 42 records
‒ Recommend same duration and time offset as RMF
• Individual Data Set Selection Examples:
‒ Just I/O Rate: INCLUDE(MINIORATE=10)
‒ Just I/O Intensity: INCLUDE(MININTENSITY=50)
‒ I/O Rate OR I/O Intensity:
• INCLUDE(MINIORATE=10,MININTENSITY=50)
‒ I/O Rate AND I/O Intensity:
• INCLUDE(MINIORATE=10), INCLUDE(MININTENSITY=50)
24
Data Set Analysis (2 of 2)
• REDUCE’s new ‘AGGREGATE’ statement
‒ Subparameter of COLLECT DATASET=(
‒ AGGREGATE=(OFF|DSNTYPE|SERVICECLASS|SSID| STORAGEGROUP)
‒ AGGREGATE=(DSNTYPE) is the default
• Creates Optional Data Set Summaries
‒ Data set summaries are built from all data set records in the SMF data
‒ Usually, it’s impractical to keep all individual data set records in the Vision DB, hence the selection statements for minimum I/O Rate and/or minimum I/O Intensity
25
Data Set Focal Points
• New reports on maximum service and response times
• Summary of all data sets, not just the busy ones
26
Sample Max Service Time Report
• Reports on all data set records stored in the Vision database
27
Summarized Data Sets (1 of 5)
• Compares captured I/Os in data set record (SMF 42) vs RMF (type 74)
[Charts by disk storage system: DSS001, DSS002, DSS003, DSS004]
28
Summarized Data Sets (2 of 5)
• zHPF candidate report covers all data set records, not just the active ones
29
Summarized Data Sets (3 of 5)
• Easily track data set activity type over time of day
30
Summarized Data Sets (4 of 5)
• Drill downs available from other focal points
31
Summarized Data Sets (5 of 5)
• Drill down from DSS level pend time to data set types to find culprit for early AM pend time – STCDB2 (DB2 master)
32
What is a GTF I/O Summary Trace?
• Generalized Trace Facility (GTF) is a tool on MVS (a.k.a. z/OS)
‒ z/OS V2R1.0 MVS Diagnosis: Tools and Service Aids (GA32-0905-00)
‒ http://www-01.ibm.com/support/docview.wss?uid=pub1ga32090500
• I/O Summary trace records added in the 1990s
‒ Less overhead than tracing entire channel program (CCW)
‒ Single event record (IOX) including:
• Start/End timestamps
• Control info
• Orientation
• Data transfer
• Components of response time
33
Sample GTF IOX Trace Request
• Describe what you want…
1. Concurrent GTF IOX trace on LPARs SYSA, SYSB
2. SGDB2WRK volumes only
3. Any weekday, Tuesday-Friday, 11am-noon (1 hour)
• Show GTF control file needed
The GTF trace control file in SYS1.PARMLIB would look something like:
TRACE=IOXP
IOX=(3F5F-3F62,3F68-3F77,3FDF,3FE3-3FEA,3FF1,3FF2)
END
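Building the device list for a storage group by hand is error-prone, so a small helper can collapse the device numbers into ranges. The sketch below only reproduces the comma-and-range layout shown in the sample above; the function name `device_ranges` is hypothetical, and the exact operand syntax should be checked against the GTF documentation.

```python
def device_ranges(devices):
    """Collapse hex device numbers into 'LOW-HIGH' ranges, comma separated."""
    numbers = sorted({int(device, 16) for device in devices})
    runs = []  # each entry is [first, last] of a run of consecutive numbers
    for number in numbers:
        if runs and number == runs[-1][1] + 1:
            runs[-1][1] = number      # extend the current run
        else:
            runs.append([number, number])  # start a new run
    parts = []
    for low, high in runs:
        if low == high:
            parts.append(f"{low:04X}")
        else:
            parts.append(f"{low:04X}-{high:04X}")
    return ",".join(parts)


# A few of the device numbers from the sample, in arbitrary order.
operand = device_ranges(["3F62", "3F5F", "3F60", "3F61", "3FDF"])
```

Note that this helper writes any run of two or more consecutive devices as a range, whereas the sample lists `3FF1,3FF2` individually; both forms name the same devices.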
34
IntelliMagic Vision for z/OS GTF Summary Trace Analysis
• Takes GTF Summary Trace data
‒ Breaks down data into a .CSV for analysis
‒ Provides an IO level view of activity
• Microsecond granularity
• Allows investigation into situations that have gotten buried in Averages (Bursty IO)
• Best to isolate Trace Activity to a specific device address or set of addresses
• Run the GTF trace through RMFPACK before sending it via FTP to IntelliMagic
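To show why I/O-level data reveals what averages bury, the sketch below groups individual I/Os into 0.1-second intervals and averages the response time per interval. It is a minimal illustration under stated assumptions: each event is a hypothetical `(start_us, response_us)` pair in microseconds, which is not the actual layout of the .CSV produced by the analysis.

```python
from collections import defaultdict


def response_by_interval(events, interval_us=100_000):
    """Group I/Os into fixed intervals (default 0.1 s).

    events: iterable of (start_us, response_us) pairs in microseconds.
    Returns {interval_start_us: (io_count, average_response_us)}.
    """
    buckets = defaultdict(list)
    for start_us, response_us in events:
        interval_start = start_us // interval_us * interval_us
        buckets[interval_start].append(response_us)
    return {
        start: (len(values), sum(values) / len(values))
        for start, values in sorted(buckets.items())
    }


# Invented trace: two fast I/Os, then one slow I/O in the next interval.
events = [(0, 200), (50_000, 400), (150_000, 5_000)]
summary = response_by_interval(events)
```

Over the whole trace the average response time is about 1.9 ms, which hides the fact that the second 0.1-second interval averaged 5 ms while the first averaged 0.3 ms.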
35
Sample Reports… HyperPAV Consumption
36
Sample Reports… I/O Response @0.1 sec Intervals
37
Sample Reports… Write I/Os with Overlap (time & extent)
38
IntelliMagic Vision as a Service
Easy Starting Point
• Good fit for SaaS, frequently updated HW knowledge
• Allows for very quick deployment (~24 hours)
• Okay for security - no PII in infrastructure measurement data
• Easy report distribution, data manipulation off host
• Easy augmentation with specialized/expert consultants