
Performance Measurement

Assignment?
Timing

#include <sys/time.h>

/* Return the current wall-clock time in seconds. */
double When()
{
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return ((double)tp.tv_sec + (double)tp.tv_usec * 1e-6);
}

A Quantitative Basis for Design

Parallel programming is an optimization problem.
Must take into account several factors:
  – execution time
  – scalability
  – efficiency
Also must take into account the costs:
  – memory requirements
  – implementation costs
  – maintenance costs etc.
Mathematical performance models are used to assess these costs and predict performance.

Defining Performance

How do you define parallel performance?
What do you define it in terms of?
Consider
  – Distributed databases
  – Image processing pipeline
  – Nuclear weapons testbed

Amdahl's Law

Every algorithm has a sequential component.
Sequential component limits speedup.

$$\text{Sequential component} = s, \qquad \text{Maximum speedup} = \frac{1}{s}$$
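
The slide states only the limiting case, 1/s. A minimal sketch of the usual finite-processor form of Amdahl's law follows, assuming s is the sequential fraction of the run time and p the processor count; the function names are illustrative and not from the slides.

```c
/* Amdahl's law: speedup on p processors when a fraction s of the
 * work is strictly sequential (assumes 0 < s <= 1 and p >= 1). */
double amdahl_speedup(double s, int p)
{
    return 1.0 / (s + (1.0 - s) / (double)p);
}

/* Upper bound on speedup as p grows without limit: 1/s. */
double amdahl_limit(double s)
{
    return 1.0 / s;
}
```

For any p, amdahl_speedup(s, p) stays below amdahl_limit(s), which is the bound shown on the slide.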

Amdahl's Law

[Figure: Speedup plotted against the sequential component s]

What's wrong?

Works fine for a given algorithm.
  – But what if we change the algorithm?
We may change algorithms to increase parallelism and thus eventually increase performance.
  – May introduce inefficiency

Metrics for Performance

Efficiency
Speedup
Scalability
Others ...

Efficiency

$$E = \frac{T_1}{p\,T_p}$$

The fraction of time a processor spends doing useful work.
What about when $p\,T_p < T_1$?
  – Does cache make a processor work at 110%?
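
The efficiency formula can be sketched as a one-line function. This is only an illustration: the parameter names are assumptions, and the timings in the usage note are made up.

```c
/* Efficiency E = T1 / (p * Tp), where t1 is the one-processor time
 * and tp the time on p processors (both in the same unit). */
double efficiency(double t1, double tp, int p)
{
    return t1 / ((double)p * tp);
}
```

With hypothetical timings t1 = 10 s and tp = 2 s on 5 processors, the result is exactly 1.0. A value above 1.0 corresponds to the p*Tp < T1 case raised on the slide.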

Speedup

$$S = \frac{\text{Speed}_P}{\text{Speed}_1}$$

What is Speed?
What algorithm for Speed_1?
What is the work performed?
How much work?

Two kinds of Speedup

Relative
  – Uses parallel algorithm on 1 processor
  – Most common
Absolute
  – Uses best known serial algorithm
  – Eliminates overheads in calculation.

Speedup

Algorithm A
  – Serial execution time is 10 sec.
  – Parallel execution time is 2 sec.
Algorithm B
  – Serial execution time is 2 sec.
  – Parallel execution time is 1 sec.
What if I told you A = B?
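
As a small sketch, the speedups for the two sets of timings above can be computed directly. The function name is illustrative; the numbers are the ones on the slide.

```c
/* Speedup as the ratio of serial to parallel execution time. */
double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}
```

Here speedup(10.0, 2.0) is 5 for A and speedup(2.0, 1.0) is 2 for B, so A reports the larger speedup even though B finishes in half the time. The ratio depends heavily on which serial baseline is used.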

Logic

The art of thinking and reasoning in strict accordance with the limitations and incapacities of the human misunderstanding.
The basis of logic is the syllogism, consisting of a major and minor premise and a conclusion.

Example

Major Premise: Sixty men can do a piece of work sixty times as quickly as one man.
Minor Premise: One man can dig a post-hole in sixty seconds.
Conclusion: Sixty men can dig a post-hole in one second.

Performance Analysis Statements

There is always a trade-off between time and solution quality.
We should compare the quality of the answer for a given execution time.
For any performance reporting, find and clearly state the quality measure.

Speedup

Conventional speedup is defined as the reduction in execution time.
Consider running a problem on a slow parallel computer and on a faster one.
  – Same serial component
  – Speedup will be lower on the faster computer.

Speedup and Amdahl's Law

Conventional speedup penalizes faster absolute speed.
Assumption that task size is constant as the computing power increases results in an exaggeration of task overhead.
Scaling the problem size reduces these distortion effects.

Solution

Gustafson introduces scaled speedup.
Scale the problem size as you increase the number of processors.
Calculated in two ways
  – Experimentally
  – Analytical models

Traditional Speedup

$$\text{Speedup} = \frac{T_1(N)}{T_P(N)}$$

T_1 is time taken on a single processor
T_P is time taken on P processors

Scaled Speedup

$$\text{Speedup} = \frac{T_1(PN)}{T_P(PN)}$$

T_1 is time taken on a single processor
T_P is time taken on P processors

Scaled Speedup vs Traditional

Traditional Speedup

[Figure: Speedup vs. Number of Processors, with ideal and measured curves]

Scaled Speedup

[Figure: Speedup vs. Number of Processors, with the ideal curve and curves for a small, medium, and large problem]

Performance Measurement

There is not a perfect way to measure and report performance.
Wall clock time seems to be the best.
But how much work do you do?
Best Bet:
  – Develop a model that fits experimental results.
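
Wall clock time can be measured by bracketing the region of interest with two calls to the When() timer from the first slide. In the sketch below, busy_work and time_busy_work are hypothetical stand-ins for a real workload, not part of the original material.

```c
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds, as defined on the first slide. */
double When(void)
{
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return (double)tp.tv_sec + (double)tp.tv_usec * 1e-6;
}

/* Hypothetical workload: sum the integers 1..n in floating point. */
double busy_work(long n)
{
    volatile double sum = 0.0;
    for (long i = 1; i <= n; i++)
        sum += (double)i;
    return sum;
}

/* Time one call of busy_work(n). Stores the workload's result in
 * *result and returns the elapsed wall-clock time in seconds. */
double time_busy_work(long n, double *result)
{
    double start = When();
    *result = busy_work(n);
    return When() - start;
}
```

gettimeofday reports calendar time, so a clock adjustment during the run can distort a single measurement. This is one more reason to repeat measurements.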

A Parallel Programming Model

Goal: Define an equation that predicts execution time as a function of
  – Problem size
  – Number of processors
  – Number of tasks
  – Etc.

$$T = f(N, P, \ldots)$$

A Parallel Programming Model

Execution time can be broken up into
  – Computing
  – Communicating
  – Idling

$$T = \frac{1}{P}\left(\sum_{i=0}^{P-1} T^{i}_{comp} + \sum_{i=0}^{P-1} T^{i}_{comm} + \sum_{i=0}^{P-1} T^{i}_{idle}\right)$$
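
The sum above can be sketched in code as follows. It assumes the per-processor computing, communicating, and idling times are already available in three arrays of length P; the function and parameter names are illustrative.

```c
#include <stddef.h>

/* T = (1/P) * (sum of T_comp^i + sum of T_comm^i + sum of T_idle^i),
 * with i running over the p processors 0 .. p-1. */
double execution_time(const double *t_comp, const double *t_comm,
                      const double *t_idle, size_t p)
{
    double total = 0.0;
    for (size_t i = 0; i < p; i++)
        total += t_comp[i] + t_comm[i] + t_idle[i];
    return total / (double)p;
}
```

For two processors with made-up times comp = {4, 6}, comm = {1, 1}, idle = {2, 0}, the total is 14 and the modeled execution time is 7.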

Computation Time

Normally depends on problem size
Also depends on machine characteristics
  – Processor speed
  – Memory system
  – Etc.
Often, experimentally obtained

Communication Time

The amount of time spent sending & receiving messages
Most often is calculated as
  – Cost of sending a single message * #messages
Single message cost
  – T = startuptime + time_to_send_one_word * #words
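
A minimal sketch of this cost model, assuming every message has the same length; the function names and the numbers in the usage note are illustrative only.

```c
/* Cost of one message: startup time plus per-word transfer time
 * multiplied by the message length in words. */
double message_time(double startup, double per_word, long words)
{
    return startup + per_word * (double)words;
}

/* Total communication time for a number of equal-sized messages. */
double comm_time(double startup, double per_word, long words, long messages)
{
    return (double)messages * message_time(startup, per_word, words);
}
```

With a startup time of 1.0, a per-word time of 0.5, and 4-word messages, one message costs 3.0 time units and ten such messages cost 30.0.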

Idle Time

Difficult to determine
This is often the time waiting for a message to be sent to you.
Can be avoided by overlapping communication and computation.

Finite Difference Example

Finite Difference Code
512 x 512 x 5 Elements
Nine-point stencil
Row-wise decomposition
  – Each processor gets n/p*n*z elements
16 IBM RS6000 workstations
Connected via Ethernet

[Figure: grid of n x n x z elements]

Finite Difference Model

Execution Time (per iteration)
  – ExTime = (Tcomp + Tcomm)/P
Communication Time (per iteration)
  – Tcomm = 2 (lat + 2*n*z*bw)
Computation Time
  – Estimate using some sample code
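
The model can be written down directly as two small functions. This is a sketch under assumptions: lat is taken to be the per-message latency, and bw is treated as the time to transfer one word, since the slide's formula multiplies by it. Tcomp is passed in as a measured or estimated value.

```c
/* Tcomm = 2 (lat + 2*n*z*bw), as given on the slide.
 * lat: per-message latency; bw: per-word transfer time (assumption). */
double fd_tcomm(double lat, double bw, int n, int z)
{
    return 2.0 * (lat + 2.0 * n * z * bw);
}

/* ExTime = (Tcomp + Tcomm) / P, as written on the slide. */
double fd_extime(double tcomp, double tcomm, int p)
{
    return (tcomp + tcomm) / (double)p;
}
```

The slide's grid would use n = 512 and z = 5. The slides do not give values for lat, bw, or Tcomp, so those must come from measurement.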

Estimated Performance

Finite Difference Example

What was wrong?

Ethernet
  – Shared bus
Change the computation of Tcomm
  – Reduce the bandwidth
  – Scale the message volume by the number of processors sending concurrently.
  – Tcomm = 2 (lat + 2*n*z*bw * P/2)
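
The corrected communication term can be sketched the same way. As in the formula, bw multiplies the word count, so it is treated here as a per-word transfer time; that reading and the function name are assumptions.

```c
/* Shared-bus variant: Tcomm = 2 (lat + 2*n*z*bw * P/2).
 * The message volume is scaled by P/2, the number of processors
 * assumed to be sending concurrently on the shared Ethernet. */
double fd_tcomm_shared(double lat, double bw, int n, int z, int p)
{
    return 2.0 * (lat + 2.0 * n * z * bw * (double)p / 2.0);
}
```

For P = 2 this reduces to 2 (lat + 2*n*z*bw), the formula without bus contention. For larger P the volume term grows linearly with the processor count.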

Finite Difference Example

Using analytical models

Examine the control flow of the algorithm
Find a general algebraic form for the complexity (execution time).
Fit the curve with experimental data.
If the fit is poor, find the missing terms and repeat.
Calculate the scaled speedup using formula.

Example

Serial Time = 2 + 12 N seconds
Parallel Time = 4 + 12 N/P + 5P seconds
Let N/P = 128
Scaled Speedup for 4 processors is:

$$\frac{C_1(PN)}{C_P(PN)} = \frac{2 + 12(4 \cdot 128)}{4 + 12(4 \cdot 128)/4 + 5(4)} = \frac{6146}{1560} \approx 3.94$$
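
The arithmetic in this example can be checked with a short sketch. The function names are illustrative; the two timing formulas are the ones given on the slide.

```c
/* Serial Time = 2 + 12 N seconds. */
double serial_time(double n)
{
    return 2.0 + 12.0 * n;
}

/* Parallel Time = 4 + 12 N/P + 5P seconds. */
double parallel_time(double n, int p)
{
    return 4.0 + 12.0 * n / (double)p + 5.0 * (double)p;
}

/* Scaled speedup: the total problem size grows with p so that each
 * processor keeps n_per_proc elements. */
double scaled_speedup(double n_per_proc, int p)
{
    double n = n_per_proc * (double)p;
    return serial_time(n) / parallel_time(n, p);
}
```

Calling scaled_speedup(128.0, 4) evaluates 6146 / 1560, the ratio worked out above.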

Performance Evaluation

Identify the data
  – Execution time
  – Be sure to examine a range of data points
Design the experiments to obtain the data
  – Make sure the experiment measures what you intend to measure.
  – Remember: Execution time is max time taken.
  – Repeat your experiments many times
  – Validate data by designing a model
Report data
  – Report all information that affects execution
  – Results should be separate from Conclusions
  – Present the data in an easily understandable format.
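
Two of the points above, "execution time is max time taken" and "repeat your experiments many times", can be sketched as follows. The sketch assumes the per-process times of each run have already been collected into an array. Averaging over runs is one possible summary, not something the slides prescribe, and both function names are illustrative.

```c
#include <stddef.h>

/* Execution time of one run: the maximum over all p processes,
 * since the run is finished only when the slowest process is done. */
double run_time(const double *per_process, size_t p)
{
    double worst = 0.0;
    for (size_t i = 0; i < p; i++)
        if (per_process[i] > worst)
            worst = per_process[i];
    return worst;
}

/* Mean execution time over repeated runs (assumes count > 0). */
double mean_time(const double *runs, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += runs[i];
    return sum / (double)count;
}
```

For made-up per-process times {1.0, 3.5, 2.0}, the run time is 3.5. Two runs taking 2.0 and 4.0 average to 3.0.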