© 2016 OpenPOWER Foundation
Emerging Workload Performance Evaluation on Future Generation OpenPOWER Processors
Saritha Vinod
Power Systems Performance Analyst
IBM Systems
Agenda
• Emerging Workloads Characteristics and Performance
• Performance Modelling Lifecycle for Future Generation Processors
• Workload Tracing Process
• Workload Tracing Methods & Tools
• Key Challenges in Workload Tracing
• Performance Evaluations using Traces
  • Microarchitecture Design Analysis
  • Software Performance Optimizations
  • Performance Verification
• Summary
Emerging Workloads Characteristics and Performance
• New industry trends are producing emerging workloads in domains such as cognitive computing, deep learning, analytics and cloud.
• To achieve the best performance, next generation processor designs need to address emerging workload characteristics such as:
  • Instruction mixes & compute needs
  • Cache access patterns & prefetch
  • Data access patterns
  • Sharing of data
  • Data affinities
  • Branch prediction
  • OS and Hypervisor calls
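As a rough illustration of the first characteristic above, an instruction mix can be tallied from a decoded instruction trace. The sketch below assumes the trace has already been reduced to a list of mnemonics. The `CLASS_OF` table and the sample `trace` are hypothetical and cover only a handful of Power mnemonics; they are not part of any tool named in this deck.

```python
from collections import Counter

# Hypothetical mnemonic-to-class table (assumption): a real study would
# cover the full ISA rather than this small subset.
CLASS_OF = {
    "ld": "load", "lwz": "load",
    "std": "store", "stw": "store",
    "add": "integer", "addi": "integer",
    "fadd": "floating-point", "fmul": "floating-point",
    "b": "branch", "bne": "branch", "beq": "branch",
    "sc": "system call",
}


def instruction_mix(mnemonics):
    """Return the fraction of traced instructions in each class."""
    counts = Counter(CLASS_OF.get(m, "other") for m in mnemonics)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}


# Made-up eight-instruction trace, for illustration only.
trace = ["ld", "addi", "add", "std", "bne", "ld", "fadd", "sc"]
mix = instruction_mix(trace)
for cls, share in sorted(mix.items()):
    print(f"{cls:15s} {share:.1%}")
```

A similar tally over branch or system call records would give a first view of the branch prediction and OS/Hypervisor call characteristics.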
Performance Modelling Lifecycle for Future Generation Processors
[Figure: Processor Performance Modeling Lifecycle. Stages: Workloads and Instruction Traces; Develop/Configure Processor Model; Design/Feature Evaluation; Identify Bottlenecks; Design Enhancements; Remodel; decision "Reached Target Performance?"; Final Processor Model]
• Traces provide key workload characteristics
• Enable performance evaluation of future generation processors
Workload Tracing Process
[Figure: Workload Tracing Process. Trace Generation: Workload traced with a Functional Simulator, HW Trace or Valgrind, followed by Trace Post processing & Validation, with a Recapture Trace loop. Performance Modelling: Input: Instruction Traces; Models: Core Model, I/O Model, Memory Model; Output: Model statistics, Pipeline Visualizations]
Workload Tracing Methods & Tools
Functional Simulator
• Highly controlled simulation environment
• Supports sampling of multi-phase workloads
• System level tracing
• Not well-suited for workloads with complex stacks, large memory, or many threads
Hardware Traces
• Used for commercial workloads with high core counts and memory requirements
• Instruction and bus traces
• System level tracing
• Complex setup process
• Lacks support for generating sampled traces
Valgrind Framework
• Useful for tracing hot functions or problem areas in the application
• Supports sampling
• Provides only application tracing, no system level
Reference: IBM SDK for Linux on Power, https://www-304.ibm.com/webapp/set2/sas/f/lopdiags/sdklop.html
Reference: IBM POWER8 Functional Simulator (systemsim), http://www-304.ibm.com/webapp/set2/sas/f/pwrfs/pwrfsinstall.html
Key Challenges in Workload Tracing
Challenges
• Hardware models can execute only a subset of instructions, while most workloads run for billions of instructions.
• The overall runtime of emerging workloads is increasing.
• Design studies need a smaller subset of the runtime that shows representative workload behavior.
• The selection depends on the design needs and the workload characteristics.
• The selected segment needs to retain the original workload characteristics.
Resolutions
• Identify the workload interval to trace: workload steady state or phases, based on performance counter data
• Select a representative trace segment: sampled, contiguous, filtered or at unit level
• Validate the trace profile: capture the right application runtime and maintain the CPI characteristics
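The interval identification and CPI validation steps can be sketched as follows. This is a minimal illustration, not the method used by the author: it assumes performance counter data is available as (cycles, instructions) pairs per sampling interval, treats "steady state" as the longest run of intervals whose CPI stays within a tolerance of the median, and accepts a trace if its CPI is within the same tolerance of the reference. The function names, the 5% tolerance and the sample numbers are all hypothetical.

```python
from statistics import median


def steady_state(samples, tolerance=0.05):
    """Return (start, end) of the longest run of intervals whose CPI is
    within `tolerance` of the median CPI. `end` is exclusive."""
    cpis = [cycles / instructions for cycles, instructions in samples]
    mid = median(cpis)
    best = (0, 0)
    start = None
    # The infinite sentinel is never "stable", so it closes a trailing run.
    for idx, value in enumerate(cpis + [float("inf")]):
        stable = abs(value - mid) <= tolerance * mid
        if stable and start is None:
            start = idx
        elif not stable and start is not None:
            if idx - start > best[1] - best[0]:
                best = (start, idx)
            start = None
    return best


def window_cpi(samples, start, end):
    """Aggregate CPI over the intervals in [start, end)."""
    chunk = samples[start:end]
    return sum(c for c, _ in chunk) / sum(i for _, i in chunk)


def trace_matches(trace_cpi, reference_cpi, tolerance=0.05):
    """Check that a captured trace keeps the CPI of the reference interval."""
    return abs(trace_cpi - reference_cpi) <= tolerance * reference_cpi


# Made-up counter samples: warm-up, a steady phase, then a tail phase.
samples = [(3000, 1000), (2600, 1000), (1500, 1000), (1520, 1000),
           (1480, 1000), (1510, 1000), (2400, 1000)]
start, end = steady_state(samples)
reference = window_cpi(samples, start, end)
print(f"trace intervals {start}..{end - 1}, reference CPI {reference:.4f}")
print("trace accepted:", trace_matches(1.55, reference))
```

A multi-phase workload would need one such window per phase, which is where sampled rather than contiguous traces become useful.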
Performance Evaluations using Traces
Microarchitecture Design
• Design evaluations of new processor features
• Tuning and trade-off analysis
Software Performance Optimizations
• Analysis of hot functions and bottlenecks in applications
• Compiler optimizations
• System tuning
Performance Verification
• Hardware model performance verification
[Figure: Workload Traces as the common input to the three areas above]
Microarchitecture Design Analysis and Optimization
• Tuning and trade-off analysis
  • Determine capacity: cache size, queue size
  • Sensitivity analysis using various categories of workload traces
• New design evaluations
  • New techniques for load-store handling
  • Branch prediction algorithms
  • Data prefetch design
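A cache capacity sensitivity study of the kind listed above can be illustrated with a toy model. The sketch below replays a data address trace through a fully associative LRU cache and sweeps the number of lines. It is a deliberately simplified stand-in for a cycle accurate processor model: the function name, the 128-byte line size, and the synthetic looping trace are assumptions made for the example.

```python
from collections import OrderedDict


def miss_rate(addresses, num_lines, line_bytes=128):
    """Miss rate of a fully associative LRU cache with `num_lines` lines."""
    if not addresses:
        return 0.0
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)        # refresh LRU position on a hit
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(addresses)


# Synthetic trace: four passes over a working set of eight cache lines.
addresses = [line * 128 for _ in range(4) for line in range(8)]
for num_lines in (4, 8, 16):
    print(f"{num_lines:2d} lines: miss rate {miss_rate(addresses, num_lines):.2f}")
```

With this trace the miss rate drops sharply once the cache holds the whole working set and stays flat beyond that, which is the kind of knee a capacity trade-off study looks for.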
Software Performance Optimizations
• Analyzing application performance bottlenecks
  • Back-to-back latency issues, LSU stalls, branch mispredictions, etc.
• Compiler optimizations
  • Microarchitecture dependent
    • Scheduling, ISA exploitation
  • Microarchitecture independent
    • Inlining, unrolling, etc.
  • Flag tuning
• System tuning
  • SMT levels
  • Prefetch settings
  • Large pages
Micro-architecture Pipeline View for Optimizations
Cycle accurate simulator
• Micro-architecture statistics
• Pipeline view for the instruction mix
References: IBM Power 8 Performance Simulator https://www-304.ibm.com/webapp/set2/sas/f/lopdiags/sdkdownload.html
Performance Verification
• Workload traces are used for performance verification of the hardware model
  • Broader performance comparison of the final hardware model and the performance model
  • Identifies gaps (deltas) in performance between the two
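The comparison step can be pictured as a per-trace check of the hardware model against the performance model. The sketch below is only an illustration under assumptions: it uses CPI as the compared metric, flags any trace whose relative gap exceeds a hypothetical 5% threshold, and the workload names and CPI values are invented for the example, not measured results.

```python
def performance_gaps(model_cpi, hardware_cpi, threshold=0.05):
    """Return {trace: relative gap} for traces where the hardware model CPI
    deviates from the performance model CPI by more than `threshold`."""
    gaps = {}
    for name, expected in model_cpi.items():
        measured = hardware_cpi[name]
        gap = (measured - expected) / expected
        if abs(gap) > threshold:
            gaps[name] = gap
    return gaps


# Invented CPI values per workload trace, for illustration only.
model_cpi = {"analytics": 1.20, "deep_learning": 0.80, "cloud": 1.50}
hardware_cpi = {"analytics": 1.23, "deep_learning": 0.92, "cloud": 1.49}

gaps = performance_gaps(model_cpi, hardware_cpi)
for name, gap in gaps.items():
    print(f"{name}: hardware model CPI differs by {gap:+.1%}")
```

Traces flagged this way would be the starting point for a closer look at either the hardware model or the performance model.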
Summary
• OpenPOWER processors are designed to deliver superior performance
• Performance evaluation and micro-architecture analysis tools and methods are available for open innovation
• Key insights are derived from emerging workloads through traces
  • Traces enable micro-architecture design evaluations, trade-off analysis, software/compiler optimizations and verification
Thank you