Allen D. Malony, Sameer Shende, Li Li, Kevin Huck {malony,sameer,lili,khuck}@cs.uoregon.edu
Department of Computer and Information Science
Performance Research Laboratory
University of Oregon
Parallel Performance Mapping, Diagnosis, and Data Mining
Research Motivation
Tools for performance problem solving
Empirical-based performance optimization process
  [Figure: a cycle connecting Performance Observation, characterization, Performance Experimentation, properties, Performance Diagnosis, hypotheses, and Performance Tuning]
Performance technology concerns
  Instrumentation, Measurement, Analysis, Visualization
  Experiment management
  Performance storage
Challenges in Performance Problem Solving
How to make the process more effective (productive)?
Process may depend on scale of parallel system
What are the important events and performance metrics?
  Tied to application structure and computational model
  Tied to application domain and algorithms
Process and tools can/must be more application-aware
  Tools have poor support for application-specific aspects
What are the significant issues that will affect the technology used to support the process?
Enhance application development and benchmarking
New paradigm in performance process and technology
Large Scale Performance Problem Solving
How does our view of this process change when we consider very large-scale parallel systems?
What are the significant issues that will affect the technology used to support the process?
Parallel performance observation is clearly needed
In general, there is the concern for intrusion
  Seen as a tradeoff with performance diagnosis accuracy
Scaling complicates observation and analysis
  Performance data size becomes a concern
  Analysis complexity increases
Nature of application development may change
Role of Intelligence, Automation, and Knowledge
Scale forces the process to become more intelligent
Even with intelligent and application-specific tools, the decisions of what to analyze are difficult and intractable
More automation and knowledge-based decision making
  Build automatic/autonomic capabilities into the tools
  Support broader experimentation methods and refinement
  Access and correlate data from several sources
  Automate performance data analysis / mining / learning
  Include predictive features and experiment refinement
Knowledge-driven adaptation and optimization guidance
  Will allow scalability issues to be addressed in context
Outline of Talk
Performance problem solving
  Scalability, productivity, and performance technology
  Application-specific and autonomic performance tools
TAU parallel performance system (Bernd said “No!”)
Parallel performance mapping
Performance data management and data mining
  Performance Data Management Framework (PerfDMF)
  PerfExplorer
Model-based parallel performance diagnosis
  Poirot and Hercule
Conclusions
TAU Performance System
Semantics-Based Performance Mapping
Associate performance measurements with high-level semantic abstractions
Need mapping support in the performance measurement system to assign data correctly
Hypothetical Mapping Example
Particles distributed on surfaces of a cube

Particle* P[MAX];  /* Array of particles */

int GenerateParticles() {
  /* distribute particles over all faces of the cube */
  for (int face = 0, last = 0; face < 6; face++) {
    /* particles on this face */
    int particles_on_this_face = num(face);
    for (int i = last; i < last + particles_on_this_face; i++) {
      /* particle properties are a function of face */
      P[i] = ... f(face);
      ...
    }
    last += particles_on_this_face;
  }
}
Hypothetical Mapping Example (continued)
How much time (flops) is spent processing face i particles?
What is the distribution of performance among faces?
How is this determined if execution is parallel?
(A mapping sketch addressing these questions follows the code below.)
int ProcessParticle(Particle *p) {
  /* perform some computation on p */
}

int main() {
  GenerateParticles();           /* create a list of particles */
  for (int i = 0; i < N; i++)    /* iterate over the list */
    ProcessParticle(P[i]);
}
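As a hedged illustration (not from the original slides) of how these questions could be answered, the sketch below attributes ProcessParticle time to faces with TAU's mapping API, reusing the TAU_MAPPING_* call pattern from the MPIScheduler slide later in the talk; the face field added to Particle and the per-face event names are assumptions, not part of the original example.

int GenerateParticles() {
  for (int face = 0, last = 0; face < 6; face++) {
    /* create one mapping per face, keyed by the face id (needs <cstdio>) */
    char facename[32];
    snprintf(facename, sizeof facename, "face %d", face);
    TAU_MAPPING_CREATE(facename, "[GenerateParticles]",
                       (TauGroup_t)face, facename, 0);
    int particles_on_this_face = num(face);
    for (int i = last; i < last + particles_on_this_face; i++) {
      P[i] = ... f(face);
      P[i]->face = face;         /* embedded association (assumed field) */
    }
    last += particles_on_this_face;
  }
}

int ProcessParticle(Particle *p) {
  TAU_MAPPING_OBJECT(facetimer)
  TAU_MAPPING_LINK(facetimer, (TauGroup_t)p->face)  /* look up timer by face */
  TAU_MAPPING_PROFILE_TIMER(faceprofiler, facetimer, 0)
  TAU_MAPPING_PROFILE_START(faceprofiler, 0);
  /* ... perform some computation on p ... */
  TAU_MAPPING_PROFILE_STOP(0);
}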
No Performance Mapping versus Mapping
Typical performance tools report performance with respect to routines
  They do not provide support for mapping
TAU's performance mapping can observe performance with respect to the scientist's programming and problem abstractions
[Profile screenshots: TAU (no mapping) vs. TAU (w/ mapping)]
Performance Mapping Approaches
ParaMap (Miller and Irvin)
  Low-level performance mapped to high-level source constructs
  Noun-Verb (NV) model to describe the mapping
    a noun is a program entity
    a verb represents an action performed on a noun
    sentences (nouns and verbs) map to other sentences
  Mappings: static, dynamic, set of active sentences (SAS)
Semantic Entities / Abstractions / Associations (SEAA)
  Entities defined at any level of abstraction (user-level)
  Attribute entities with semantic information
  Entity-to-entity associations
  Target measurement layer and asynchronous operation
SEAA Implementation
Two association types (implemented in the TAU API, sketched below):
  Embedded: extends the associated object to store the performance measurement entity
  External: creates an external look-up table, using the address of the object as a key to locate the performance measurement entity
Applied to performance measurement problems:
  callpath/phase profiling, C++ templates, …
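A minimal sketch of the two association types in plain C++, assuming a generic PerfEntity measurement object and a Patch application object (both names hypothetical, not the TAU API):

#include <map>

struct PerfEntity { /* performance measurement entity (assumed) */ };

/* Embedded association: the application object is extended with a
   pointer to its measurement entity */
struct Patch {
  /* ... application data ... */
  PerfEntity* perf;
};

/* External association: a look-up table keyed by the object's address */
static std::map<const void*, PerfEntity*> perfTable;

PerfEntity* lookup(const void* obj) {
  return perfTable[obj];  /* locate the measurement entity for obj */
}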
Uintah Problem Solving Environment (PSE)
Uintah component architecture for the Utah C-SAFE project
Application programmers provide:
  description of computation (tasks and variables)
  code to perform a task on a single “patch” (sub-region of space)
Components for scheduling, partitioning, load balancing, …
Uintah Computational Framework (UCF)
  Execution model based on software (macro) dataflow (see the sketch after this list)
  computations expressed as directed acyclic graphs of tasks
  inputs/outputs specified for each patch in a structured grid
Abstraction of global single-assignment memory
Task graph gets mapped to processing resources
Communication schedule approximates the global optimum
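A generic macro-dataflow sketch (illustrative only; Uintah's actual UCF interfaces differ): each task names the variables it requires and computes, and runs once all of its inputs have been produced.

#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Task {
  std::string name;
  std::vector<std::string> inputs;   // variables this task requires (per patch)
  std::vector<std::string> outputs;  // variables this task computes (per patch)
};

// Execute tasks in dataflow order: a task runs once every variable it
// requires has been produced. Assumes the task graph is acyclic.
void run(std::vector<Task>& graph) {
  std::map<std::string, bool> produced;
  std::vector<bool> done(graph.size(), false);
  std::size_t remaining = graph.size();
  while (remaining > 0) {
    for (std::size_t t = 0; t < graph.size(); t++) {
      if (done[t]) continue;
      bool ready = true;
      for (const auto& v : graph[t].inputs)
        if (!produced[v]) { ready = false; break; }
      if (!ready) continue;
      // ... run graph[t] on its patch, or hand it to a parallel scheduler ...
      for (const auto& v : graph[t].outputs) produced[v] = true;
      done[t] = true;
      remaining--;
    }
  }
}

A real scheduler would map ready tasks to processors and overlap communication; this loop only shows the dataflow ordering.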
Uintah Task Graph (Material Point Method)
Diagram of named tasks (ovals) and data (edges)
  Imminent computation
  Dataflow-constrained
MPM: Newtonian material point motion time step
  Solid: values defined at a material point (particle)
  Dashed: values defined at a vertex (grid)
  Prime (’): values updated during the time step
Task Execution in Uintah Parallel Scheduler
Profile methods and functions in the scheduler and in the MPI library
Need to map performance data!
Task execution time dominates (what task?)
MPI communication overheads (where?)
[Figure: task execution time distribution]
Mapping Instrumentation in UCF (example)
Use the TAU performance mapping API:

void MPIScheduler::execute(const ProcessorGroup * pc,
                           DataWarehouseP & old_dw,
                           DataWarehouseP & dw ) {
  ...
  /* create a mapping keyed by the task's name */
  TAU_MAPPING_CREATE(task->getName(), "[MPIScheduler::execute()]",
                     (TauGroup_t)(void*)task->getName(),
                     task->getName(), 0);
  ...
  TAU_MAPPING_OBJECT(tautimer)
  TAU_MAPPING_LINK(tautimer, (TauGroup_t)(void*)task->getName());
  // EXTERNAL ASSOCIATION
  ...
  /* time the task body under the mapped timer */
  TAU_MAPPING_PROFILE_TIMER(doitprofiler, tautimer, 0)
  TAU_MAPPING_PROFILE_START(doitprofiler, 0);
  task->doit(pc);
  TAU_MAPPING_PROFILE_STOP(0);
  ...
}
Task Performance Mapping (Profile)
Performance mapping for different tasks
Mapped task performance across processes
Work Packet-to-Task Mapping (Trace)
Work packet computation events colored by task type
Distinct phases of computation can be identified based on task
Comparing Uintah Traces for Scalability Analysis
[Trace views compared: 8 processes vs. 32 processes]
Important Questions for Application Developers
How does performance vary with different compilers?
Is poor performance correlated with certain OS features?
Has a recent change caused unanticipated performance?
How does performance vary with MPI variants?
Why is one application version faster than another?
What is the reason for the observed scaling behavior?
Did two runs exhibit similar performance?
How are performance data related to application events?
Which machines will run my code the fastest and why?
Which benchmarks predict my code performance best?
Performance Problem Solving Goals
Answer questions at multiple levels of interest
Data from low-level measurements and simulations
  used to predict application performance
High-level performance data spanning dimensions
  machine, applications, code revisions, data sets
  examine broad performance trends
Discover general correlations between application performance and features of the external environment
Develop methods to predict application performance from lower-level metrics
Discover performance correlations between a small set of benchmarks and a collection of applications that represent a typical workload for a given system (a small sketch follows)
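As a minimal sketch of the benchmark-to-application correlation idea (the data layout is an assumption: one runtime per machine for each code):

#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation between a benchmark's runtimes and an application's
// runtimes measured across the same set of machines (hypothetical data).
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
  const std::size_t n = x.size();   // assumes x.size() == y.size() > 1
  double mx = 0.0, my = 0.0;
  for (std::size_t i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
  mx /= n; my /= n;
  double sxy = 0.0, sxx = 0.0, syy = 0.0;
  for (std::size_t i = 0; i < n; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) * (x[i] - mx);
    syy += (y[i] - my) * (y[i] - my);
  }
  return sxy / std::sqrt(sxx * syy); // near 1.0: benchmark tracks the app
}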
Empirical-Based Performance Optimization
[Figure: the empirical optimization cycle (Performance Observation, characterization, Performance Experimentation, properties, Performance Diagnosis, hypotheses, Performance Tuning), extended with experiment management: observability requirements (?), experiment schemas, and experiment trials]
Performance Data Management Framework
ICPP 2005 paper
PerfExplorer (K. Huck, Ph.D. student, UO)
Performance knowledge discovery framework
  Uses the existing TAU infrastructure: TAU instrumentation data, PerfDMF
  Client-server based system architecture
  Data mining analysis applied to parallel performance data (see the sketch below)
    comparative, clustering, correlation, dimension reduction, ...
Technology integration
  Relational Database Management Systems (RDBMS)
  Java API and toolkit
  R-project / Omegahat statistical analysis
  WEKA data mining package
  Web-based client
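A minimal k-means sketch over per-process profile vectors (each vector holding, say, exclusive time per instrumented event) shows the clustering idea; PerfExplorer itself delegates this analysis to WEKA and R rather than using code like this.

#include <cstddef>
#include <limits>
#include <vector>

using Profile = std::vector<double>;

// squared Euclidean distance between two profiles of equal length
static double dist2(const Profile& a, const Profile& b) {
  double d = 0.0;
  for (std::size_t i = 0; i < a.size(); i++) {
    const double t = a[i] - b[i];
    d += t * t;
  }
  return d;
}

// returns a cluster id per process; assumes procs.size() >= k
std::vector<int> kmeans(const std::vector<Profile>& procs, int k, int iters) {
  std::vector<Profile> centers(procs.begin(), procs.begin() + k); // naive seeding
  std::vector<int> label(procs.size(), 0);
  for (int it = 0; it < iters; it++) {
    // assignment step: attach each process profile to its nearest center
    for (std::size_t p = 0; p < procs.size(); p++) {
      double best = std::numeric_limits<double>::max();
      for (int c = 0; c < k; c++) {
        const double d = dist2(procs[p], centers[c]);
        if (d < best) { best = d; label[p] = c; }
      }
    }
    // update step: move each center to the mean of its assigned profiles
    for (int c = 0; c < k; c++) {
      Profile mean(procs[0].size(), 0.0);
      int n = 0;
      for (std::size_t p = 0; p < procs.size(); p++) {
        if (label[p] != c) continue;
        for (std::size_t i = 0; i < mean.size(); i++) mean[i] += procs[p][i];
        n++;
      }
      if (n > 0) {
        for (double& m : mean) m /= n;
        centers[c] = mean;
      }
    }
  }
  return label;  // e.g., groups of processes with similar behavior
}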
PerfExplorer Architecture
SC’05 paper
PerfExplorer Client GUI
Hierarchical and K-means Clustering (sPPM)
Miranda Clustering on 16K Processors
Parallel Performance Diagnosis
Performance tuning process
  Process to find and report performance problems
  Performance diagnosis: detect and explain problems
  Performance optimization: performance problem repair
Experts approach it systematically and use experience
  Expertise is hard to formulate and automate
  Performance optimization is fundamentally hard
Focus on the performance diagnosis problem
  Characterize diagnosis processes
  How it integrates with performance experimentation
  Understand the knowledge engineering
Parallel Performance Diagnosis Architecture
Performance Diagnosis System Architecture
Problems in Existing Diagnosis Approaches
Low-level abstraction of properties/metrics
  Independent of program semantics
  Related to component structure, not algorithmic structure or the parallelism model
Insufficient explanation power
  Hard to interpret in the context of program semantics
  Performance behavior not tied to operational parallelism
Low applicability and adaptability
  Difficult to apply in different contexts
  Hard to adapt to new requirements
Poirot Project
Lack of a formal theory of diagnosis processes
  Compare and analyze performance diagnosis systems
  Use theory to create a system that is automated / adaptable
Poirot performance diagnosis (theory, architecture)
  Survey of diagnosis methods / strategies in tools
  Heuristic classification approach (match to characteristics)
  Heuristic search approach (based on problem knowledge)
Problems
  Descriptive results do not explain with respect to context
    users must reason about high-level causes
  Performance experimentation not guided by diagnosis
  Lacks automation
Model-Based Approach
Knowledge-based performance diagnosis
  Capture knowledge about performance problems
  Capture knowledge about how to detect and explain them
Where does the knowledge come from?
  Extract it from parallel computational models
    structural and operational characteristics
  Associate computational models with performance
Do parallel computational models help in diagnosis?
  Enable better understanding of problems
  Enable more specific experimentation
  Enable more effective hypothesis testing and search
Implications for Performance Diagnosis
Models benefit performance diagnosis
  Base instrumentation on program semantics
  Capture performance-critical features
  Enable explanations close to the user's understanding
    of computation operation
    of performance behavior
  Reuse performance analysis expertise on commonly-used models
Model examples
  master-worker, pipeline, divide-and-conquer, domain decomposition, phase-based, compositional
Hercule Project
Goals: automation, adaptability, validation
Approach
Make use of model knowledge to diagnose performance
  Start with commonly-used computational models
Engineer the model knowledge
Integrate model knowledge with the performance measurement system
Build a cause inference system
  define “causes” at the parallelism level
  build a causality relation between the low-level “effects” and the “causes”
Master-Worker Parallel Computation Model
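The slide's figure is a diagram of the model; as a hedged illustration (not Hercule's code), a minimal MPI master-worker skeleton is sketched below. In the inference tree that follows, time workers spend blocked waiting for a task corresponds to worker starvation, and the master's request queue is where the K1 threshold applies.

#include <mpi.h>

enum { TAG_REQUEST = 1, TAG_TASK = 2 };

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int ntasks = 100;              /* assumed workload size */
  if (rank == 0) {                     /* master */
    int next = 0, retired = 0, req;
    MPI_Status st;
    while (retired < size - 1) {
      /* requests queue up here: the K1 threshold below concerns
         how many are waiting at once */
      MPI_Recv(&req, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST,
               MPI_COMM_WORLD, &st);
      int task = (next < ntasks) ? next++ : -1;   /* -1: no more work */
      if (task < 0) retired++;
      MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, TAG_TASK, MPI_COMM_WORLD);
    }
  } else {                             /* worker */
    for (;;) {
      int req = rank, task;
      MPI_Send(&req, 1, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
      /* blocking here while the master serves others is "worker starvation" */
      MPI_Recv(&task, 1, MPI_INT, 0, TAG_TASK, MPI_COMM_WORLD,
               MPI_STATUS_IGNORE);
      if (task < 0) break;
      /* ... compute the task ... */
    }
  }
  MPI_Finalize();
  return 0;
}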
Performance Diagnosis Inference Tree (MW)
[Figure: inference tree. Legend: Ki = threshold; + = coexistence; nodes are the hypothesis, observations, and numbered causes (number = priority).]
Hypothesis: Low speedup
  Cause 1: Insufficient parallelism
    Observation: initialization or finalization time is significant
  Cause 2: Fine granularity
    Observations (+): master assign-task time is significant; a large amount of messages exchanged every time; waiting a long time for the master to assign each individual task; such intervals < K2
  Cause 3: Master being a bottleneck
    Observations (+): worker number saturation; worker starvation; the number of requests in the master queue > K1 in some time intervals; the workers waited quite a while in the master queue in some time intervals; such intervals > K2
  Cause 4: Some workers noticeably inefficient
    Observation: time imbalance
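A speculative sketch of how the tree's checks might be coded; the thresholds K1 and K2 and the notion of "significant" are model parameters whose values are hypothetical, and the attachment of observations to causes follows the reconstruction above.

enum Cause { NONE = 0, INSUFFICIENT_PARALLELISM = 1, FINE_GRANULARITY = 2,
             MASTER_BOTTLENECK = 3, INEFFICIENT_WORKERS = 4 };

struct Observations {
  bool initFinalTimeSignificant;  /* init./final. time significant */
  bool assignTimeSignificant;     /* master assign-task time significant */
  int  queueOverK1Intervals;      /* intervals with master queue length > K1 */
  int  longWaitIntervals;         /* intervals of long task-assignment waits */
  bool timeImbalance;             /* uneven busy time across workers */
};

/* Test causes in the tree's priority order for the "low speedup" hypothesis. */
Cause diagnoseLowSpeedup(const Observations& o, int K2) {
  if (o.initFinalTimeSignificant)
    return INSUFFICIENT_PARALLELISM;                          /* cause 1 */
  if (o.assignTimeSignificant && o.longWaitIntervals < K2)
    return FINE_GRANULARITY;                                  /* cause 2 */
  if (o.queueOverK1Intervals > 0 && o.longWaitIntervals > K2)
    return MASTER_BOTTLENECK;                                 /* cause 3 */
  if (o.timeImbalance)
    return INEFFICIENT_WORKERS;                               /* cause 4 */
  return NONE;
}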
Knowledge Engineering - Abstract Event (MW)
Uses the CLIPS expert system building tool
Diagnosis Results Output (MW)
Experimental Diagnosis Results (MW)
Concluding Discussion
Performance tools must be used effectively
  More intelligent performance systems for productive use
Evolve to application-specific performance technology
  Deal with scale by “full range” performance exploration
  Autonomic and integrated tools
  Knowledge-based and knowledge-driven process
Performance observation methods do not necessarily need to change in a fundamental sense
  They must be more automatically controlled and efficiently used
Support model-driven performance diagnosis
Develop next-generation tools and deliver them to the community
Support Acknowledgements
Department of Energy (DOE) Office of Science contracts
University of Utah ASCI Level 1 sub-contract
ASC/NNSA Level 3 contract
NSF High-End Computing Grant
Research Centre Juelich, John von Neumann Institute (Dr. Bernd Mohr)
Los Alamos National Laboratory