Code Tuning Chapter 25-26


Page 1: Code Tuning

Code Tuning

Chapter 25-26

Page 2: Code Tuning

Textbook & Reference

Steve McConnell, Code Complete: A Practical Handbook of Software Construction, 2nd Edition, Microsoft Press, 2004. Chapters 25 and 26.

Jon Bentley, Programming Pearls, 2nd Edition, Addison-Wesley, 2000.

Page 3: Code Tuning

Outline

• Options for Performance Improvement
• What Is Code Tuning?
• Common Sources of Inefficiency
• Code Tuning Process
• Code Tuning Techniques
  • Logic
  • Loops
  • Data Transformations
  • Expressions
  • Object Creation and Flyweight/Factory Patterns

Page 4: Code Tuning

Options for Performance Improvement

• Program requirements
  • Make sure it is a problem that needs to be solved
  • Boehm's story (2000):
    • A system at TRW initially requiring subsecond response time led to a highly complex design and an estimated cost of $100M
    • Further analysis determined that users would be satisfied with a 4-second response 90% of the time
• Program design
  • Some problems have to be addressed at the design level
  • Search algorithms

Page 5: Code Tuning

Options for Performance Improvement

• OS interactions
  • Perhaps the OS routines are slow or fat
  • Code that works with external files, dynamic memory, or output devices probably interacts with the OS
  • Your compiler may generate (or your libraries may invoke) system calls you would never dream of
• Code compilation
  • There may be no need to think about optimizing speed any further if the right compiler is chosen
• Hardware
  • Sometimes buying new hardware is the cheapest and best way

Page 6: Code Tuning

What Is Code Tuning?

• The practice of modifying correct code in ways that make it run more efficiently
  • Small-scale changes that affect a single class, a single routine, or a few lines of code
• Not the most effective, easiest, or cheapest way to improve performance
• Not necessarily "better" code
  • Usually no one but programmers cares how tight the code is

Page 7: Code Tuning

Why Is Code Tuning Appealing?

• It is incredibly satisfying to reduce execution time by tweaking a few lines!
• Mastering the art of writing efficient code is a rite of passage to becoming a serious programmer
  • Programming culture: writing micro-efficient code proves you are cool
  • Like a tennis player picking up a tennis ball

Page 8: Code Tuning

Pareto Principle: 80/20 Rule

• "Get 80% of the result with 20% of the effort"
• Boehm (1987): 20% of a program's routines consume 80% of its execution time
• Knuth (1971): less than 4% of a program usually accounts for more than 50% of its run time
• Bentley (1988): a 1 KLOC program spent 80% of its time in a 5-line square root routine
• Measure the code to find the hot spots and put your resources into optimizing those few percent
• Write most of the code in an interpreted language and rewrite the hot spots in a faster language

Page 9: Code Tuning

Which One Is Faster?

Version 1:

    for i = 1 to 10 {
        a[i] = i;
    }

Version 2:

    a[1] = 1;
    a[2] = 2;
    a[3] = 3;
    a[4] = 4;
    a[5] = 5;
    a[6] = 6;
    a[7] = 7;
    a[8] = 8;
    a[9] = 9;
    a[10] = 10;

Page 10: Code Tuning

Old Wives’ Tales

• Reducing LOC improves the speed or size of the resulting machine code: false
  • Previous example in Java: 12.6 for the loop vs 3.23 for the straight-line code
• Certain operations are probably faster or smaller than others: false
  • There is no room for "probably" when talking about performance
  • The rules of the game change every time you change languages, compilers, versions of libraries, processor, or amount of memory
• "Fast" is as important as "correct": false

Page 11: Code Tuning

Old Wives’ Tales - cont

• You should optimize as you go: false
  • If you strive to write the fastest and smallest possible code for each routine, will your program be fast and small?
• Problematic!
  • It is almost impossible to identify performance bottlenecks before a program is working completely
  • Programmers are very bad at guessing which 4% of the code accounts for 50% of the execution time
  • Focusing on optimization during initial development detracts from achieving other program objectives

Page 12: Code Tuning

When to Tune

• Don't optimize until you know you need to
  • Use a high-quality design
  • Make the program right
  • Make it modular and easily modifiable
  • Check performance when it is complete and correct
• Compiler optimizations
  • They might be more powerful than you expect
  • Compiler-optimized code may be faster than "tricky" code

Page 13: Code Tuning

Compiler optimizations

• With a good optimizing compiler, your code speed can improve 40+ percent
• Many of the techniques described in the next chapter produce gains of only 15-30 percent
• Why not just write clear code and let the compiler do the work?
• Here are the results of a few tests to check how much an optimizer sped up an insertion-sort routine. The only difference between versions of the routine was that compiler optimizations were turned off for the first compile and turned on for the second.

Page 14: Code Tuning

Compiler Optimizations - cont

Straightforward version:

    sum = 0;
    for (row = 0; row < rowCount; row++) {
        for (column = 0; column < columnCount; column++) {
            sum = sum + matrix[row][column];   // expensive multiplications for access to 2D array
        }
    }

Pointer version:

    sum = 0;
    elementPointer = &matrix[0][0];
    lastElementPointer = &matrix[rowCount-1][columnCount-1] + 1;
    while (elementPointer < lastElementPointer) {
        sum = sum + *elementPointer++;
    }

No improvement: digging into the assembly code showed that the compiler had already done the optimization.

Page 15: Code Tuning

Common Sources of Inefficiency

• Input/output operations
  • If you have a choice of working with a file in memory vs on disk, in a database, or across a network, use an in-memory structure unless space is critical
• System calls: expensive because of the context switch (saving the program's state, recovering the kernel's state, and the reverse)
  • Write your own service if only part of the function is needed
  • Avoid going to the system
  • Work with the system vendor to make the call faster
• Interpreted languages
  • Must process each instruction before creating and executing machine code

Page 16: Code Tuning

16

Input/output operations

Interpreted languages

Page 17: Code Tuning

Are They Functionally Equivalent?

Version 1:

    for (column = 0; column < MAX_COLUMNS; column++) {
        for (row = 0; row < MAX_ROWS; row++) {
            table[row][column] = BlankTableElement();
        }
    }

Version 2:

    for (row = 0; row < MAX_ROWS; row++) {
        for (column = 0; column < MAX_COLUMNS; column++) {
            table[row][column] = BlankTableElement();
        }
    }

1. Each element of table is about 4K bytes long
2. Pages are usually at least 4K bytes in size

Page 18: Code Tuning

Common Sources of Inefficiency - cont

• Paging
  • An operation that causes the OS to swap pages of memory is much slower than an operation that works on only one page of memory
• Errors
  • Leaving debugging code turned on
  • Forgetting to deallocate memory
  • Polling nonexistent devices until timeout

Page 19: Code Tuning

Paging Example

Assume each row is about the size of one memory page.

    for (column = 0; column < MAX_COLUMNS; column++) {
        for (row = 0; row < MAX_ROWS; row++) {
            table[row][column] = BlankTableElement();
        }
    }   // accessing a different row causes a page fault

    for (row = 0; row < MAX_ROWS; row++) {
        for (column = 0; column < MAX_COLUMNS; column++) {
            table[row][column] = BlankTableElement();
        }
    }   // 1000 times faster on a machine with limited memory!

Page 20: Code Tuning

20

If table has too many rows, every time the program accesses a different row, the operating system will have to switch memory pages.

The way the loop is structured, every single array access switches rows, which means that every single array access causes paging to disk.
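The two loop orders can be put side by side in a small Java sketch. Several things here are assumptions for illustration: the table holds plain int values rather than 4K-byte elements, the sizes are made up, and the class and method names (`TraversalOrder`, `fillColumnFirst`, `fillRowFirst`) are hypothetical. Java also stores a 2D array as an array of row arrays rather than one contiguous block, so the sketch only demonstrates that both orders produce the same table; any speed difference has to be measured on the target machine.

```java
import java.util.Arrays;

final class TraversalOrder {
    static final int MAX_ROWS = 200;      // assumed size, for illustration only
    static final int MAX_COLUMNS = 300;   // assumed size, for illustration only

    // Column-first order: every consecutive access lands in a different row.
    static int[][] fillColumnFirst(int value) {
        int[][] table = new int[MAX_ROWS][MAX_COLUMNS];
        for (int column = 0; column < MAX_COLUMNS; column++) {
            for (int row = 0; row < MAX_ROWS; row++) {
                table[row][column] = value;
            }
        }
        return table;
    }

    // Row-first order: consecutive accesses stay inside one row.
    static int[][] fillRowFirst(int value) {
        int[][] table = new int[MAX_ROWS][MAX_COLUMNS];
        for (int row = 0; row < MAX_ROWS; row++) {
            for (int column = 0; column < MAX_COLUMNS; column++) {
                table[row][column] = value;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        boolean same = Arrays.deepEquals(fillColumnFirst(7), fillRowFirst(7));
        System.out.println("Same contents: " + same);
    }
}
```

Both methods are functionally equivalent; only the order in which memory is touched differs.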

Page 21: Code Tuning

Relative Costs of Common Operations

Operation                                 Example             C++    Java
Baseline (int assignment)                 i = j               1      1
Call private routine with no parameters   this.foo()          1      0.5
Object routine call                       bar.foo()           2      1
Polymorphic routine call                  abstractBar.foo()   2.5    2
Object reference                          i = obj.num         1      1
Integer division                          i = j / k           5      1.5
Floating-point square root                x = sqrt(y)         15     4
Floating-point e^y                        x = exp(y)          50     20
Floating-point logarithm                  x = log(y)          25     20

• Measurements are sensitive to the local machine environment and compiler.
• Measurements between C++ and Java are not directly comparable.

Page 22: Code Tuning

Measurement

• Performance aspects can be counterintuitive
  • Experience from an old machine or language does not help much
• If it is not worth measuring to know that a change is more efficient, it is not worth sacrificing clarity for a performance gamble
  • Develop software by using well-designed code that is easy to understand and modify
• Measurements need to be precise
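As a rough illustration of what measuring can look like, here is a minimal Java timing sketch. It is an assumption-laden starting point, not a rigorous benchmark: the `Stopwatch` class and `averageNanos` method are hypothetical names, and the single warm-up call is only a crude way to give the JIT compiler a chance to run before the timed pass.

```java
final class Stopwatch {
    // Runs the task `repetitions` times and returns the average time per run in nanoseconds.
    static double averageNanos(Runnable task, int repetitions) {
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            task.run();
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / repetitions;
    }

    public static void main(String[] args) {
        int[] a = new int[10_000];
        Runnable fill = () -> {
            for (int i = 0; i < a.length; i++) {
                a[i] = i;
            }
        };
        averageNanos(fill, 1_000);   // warm-up pass; result deliberately ignored
        System.out.println("avg ns per run: " + averageNanos(fill, 1_000));
    }
}
```

Results from a harness like this vary from run to run and from machine to machine, which is exactly why each tuning change should be measured rather than assumed.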

Page 23: Code Tuning

Code Tuning Process

If the performance is poor:

1. Save a working version
2. Measure the system to find hot spots
3. Determine where the weak performance comes from; go back if code tuning is not appropriate
4. Tune the bottleneck identified
5. Measure each improvement one at a time
6. If there is no improvement, revert to the saved code

Page 24: Code Tuning

Code Tuning Techniques

• Logic
• Loops
• Data Transformations
• Expressions
• Object Creation

Page 25: Code Tuning

Logic

• Stop testing when you know the answer
  • if (5 < x) and (x < 10) then …   // specification
  • Use short-circuit operators: && vs. &
  • if (5 < x) then if (x < 10) then …   // if the language does not support short-circuit evaluation
• Search whether a negative value is present:

    negativeInputFound = false;
    for (i = 0; i < count; i++) {
        if (input[i] < 0) {
            negativeInputFound = true;
        }
    }

Page 26: Code Tuning

A better approach is to stop scanning as soon as you find a negative value. Any of these approaches would solve the problem:

• Add a break statement after the negativeInputFound = true line.
• If your language doesn't have break, emulate a break with a goto that goes to the first statement after the loop.
• Change the for loop to a while loop, and check for negativeInputFound as well as for incrementing the loop counter past count.
• Change the for loop to a while loop, put a sentinel value in the first array element after the last value entry, and simply check for a negative value in the while test. After the loop terminates, see whether the position of the first found value is in the array or one past the end.
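The first option, adding a break, might look like the following in Java. This is a minimal sketch; the wrapper class `NegativeScan` and the method name `containsNegative` are hypothetical, while `negativeInputFound` and `input` follow the names used on the slide.

```java
final class NegativeScan {
    // Returns true if any element is negative, stopping at the first one found.
    static boolean containsNegative(int[] input) {
        boolean negativeInputFound = false;
        for (int i = 0; i < input.length; i++) {
            if (input[i] < 0) {
                negativeInputFound = true;
                break;   // the answer is known, so stop testing
            }
        }
        return negativeInputFound;
    }

    public static void main(String[] args) {
        System.out.println(containsNegative(new int[] {3, -1, 4, 1, 5}));
    }
}
```

How much the early exit saves depends on where the first negative value tends to appear in the data.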

Page 28: Code Tuning

Can You Tune This Code?

Keyboard input in a word processor:

    Select inputCharacter
        Case "+", "="
            ProcessMathSymbol(inputCharacter)
        Case "0" To "9"
            ProcessDigit(inputCharacter)
        Case ",", ".", ":", ";", "!", "?"
            ProcessPunctuation(inputCharacter)
        Case " "
            ProcessSpace(inputCharacter)
        Case "A" To "Z", "a" To "z"
            ProcessAlpha(inputCharacter)
        Case Else
            ProcessError(inputCharacter)
    End Select

Page 29: Code Tuning

Logic - cont

• Arrange tests so that the one that is fastest and most likely to be true is performed first
  • If you know the likely frequency of input characters, put the most common cases first

    Select inputCharacter
        Case "A" To "Z", "a" To "z"
            ProcessAlpha(inputCharacter)
        Case " "
            ProcessSpace(inputCharacter)
        Case ",", ".", ":", ";", "!", "?"
            ProcessPunctuation(inputCharacter)
        Case "0" To "9"
            ProcessDigit(inputCharacter)
        Case "+", "="
            ProcessMathSymbol(inputCharacter)
        Case Else
            ProcessError(inputCharacter)
    End Select

Page 30: Code Tuning

Can You Tune This Code?

    …
    if ((('a' <= inputChar) && (inputChar <= 'z')) ||
        (('A' <= inputChar) && (inputChar <= 'Z'))) {
        charType = CharacterType.Letter;
    }
    else if ((inputChar == ' ') || (inputChar == ',') ||
             (inputChar == '.') || (inputChar == '!') || (inputChar == '(') ||
             (inputChar == ')') || (inputChar == ':') || (inputChar == ';') ||
             (inputChar == '?') || (inputChar == '-')) {
        charType = CharacterType.Punctuation;
    }
    else if (('0' <= inputChar) && (inputChar <= '9')) {
        charType = CharacterType.Digit;
    }
    …

Page 31: Code Tuning

Logic - cont

• Substitute table lookups for complicated expressions (trading space for time)
  • Compute the results only once and store them in a table
  • Example 1: character type
    • Store the type of each character in an array that is accessed by character code
    • charType = charTypeTable[inputChar];
  • Example 2: integer square roots of the integers 1..100
• Lazy evaluation: avoid doing any work until it is needed
  • For a table of 5K entries, compare generating the whole table at startup time with generating only the small percentage of entries that are actually used
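A character-type table along the lines of Example 1 could be sketched in Java as follows. The assumptions are worth stating: the table covers only the 128 ASCII codes, the punctuation set is taken from the earlier if/else chain, and the `OTHER` type, the `classify` method, and the `CharTypeLookup` class are hypothetical additions for illustration.

```java
import java.util.Arrays;

final class CharTypeLookup {
    enum CharacterType { LETTER, PUNCTUATION, DIGIT, OTHER }

    // One entry per ASCII code, computed once when the class is loaded.
    private static final CharacterType[] CHAR_TYPE_TABLE = buildTable();

    private static CharacterType[] buildTable() {
        CharacterType[] table = new CharacterType[128];
        Arrays.fill(table, CharacterType.OTHER);
        for (char c = 'a'; c <= 'z'; c++) table[c] = CharacterType.LETTER;
        for (char c = 'A'; c <= 'Z'; c++) table[c] = CharacterType.LETTER;
        for (char c = '0'; c <= '9'; c++) table[c] = CharacterType.DIGIT;
        for (char c : " ,.!():;?-".toCharArray()) table[c] = CharacterType.PUNCTUATION;
        return table;
    }

    // A single array access replaces the chain of comparisons.
    static CharacterType classify(char inputChar) {
        return inputChar < CHAR_TYPE_TABLE.length
                ? CHAR_TYPE_TABLE[inputChar]
                : CharacterType.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify('q') + " " + classify('7') + " " + classify(';'));
    }
}
```

The cost of building the table is paid once; whether that trade of space for time pays off depends on how often `classify` is called.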

Page 32: Code Tuning

Substitute table lookups

Suppose you want to assign a category number to something based on which of three groups (Groups A, B, and C) it falls into:
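The slide's own figure is not reproduced here, so the following Java sketch uses made-up category rules purely to illustrate the idea. The rules in `categoryByLogic`, the table contents, and all names are hypothetical; the point is only that a three-way membership test can be replaced by one indexed lookup.

```java
final class CategoryLookup {
    // Hypothetical category rules expressed as complicated logic.
    static int categoryByLogic(boolean a, boolean b, boolean c) {
        if ((a && !c) || (a && b && c)) return 1;
        if ((b && !a) || (a && c && !b)) return 2;
        if (c && !a && !b) return 3;
        return 0;
    }

    // The same rules precomputed; indexed as [a][b][c] with 0 = false and 1 = true.
    private static final int[][][] CATEGORY_TABLE = {
        { {0, 3}, {2, 2} },   // a is false
        { {1, 2}, {1, 1} }    // a is true
    };

    static int categoryByTable(boolean a, boolean b, boolean c) {
        return CATEGORY_TABLE[a ? 1 : 0][b ? 1 : 0][c ? 1 : 0];
    }

    public static void main(String[] args) {
        System.out.println(categoryByTable(true, false, true));
    }
}
```

A table like this is also easier to change than the nested logic, as long as the indexing convention is documented next to it.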

Page 34: Code Tuning

Can You Tune This Code?

    for (i = 0; i < count; i++) {
        if (sumType == SUMTYPE_NET) {
            netSum = netSum + amount[i];
        }
        else {
            grossSum = grossSum + amount[i];
        }
    }

Page 35: Code Tuning

Loops - Unswitching

    if (sumType == SUMTYPE_NET) {
        for (i = 0; i < count; i++) {
            netSum = netSum + amount[i];
        }
    }
    else {
        for (i = 0; i < count; i++) {
            grossSum = grossSum + amount[i];
        }
    }

Page 36: Code Tuning

Loops - Unswitching

• If the decision doesn't change while the loop is executing, you can unswitch the loop by making the decision outside the loop
• Put loops inside the conditional rather than putting the conditional inside the loop

Page 37: Code Tuning

Loops - Minimizing the work inside loops

Before:

    for (i = 0; i < rateCount; i++) {
        netRate[i] = baseRate[i] * rates->discounts->factors->net;
    }

After:

    qualityDiscount = rates->discounts->factors->net;
    for (i = 0; i < rateCount; i++) {
        netRate[i] = baseRate[i] * qualityDiscount;
    }

Page 38: Code Tuning

Can You Tune This Code?

    found = FALSE;
    i = 0;
    while ((!found) && (i < count)) {   // compound test
        if (item[i] == testValue)
            found = TRUE;
        else
            i++;
    }
    if (found) {
        …
    }

Page 39: Code Tuning

Loops - Sentinel values

• For a loop with a compound test, you can often save time by simplifying the test
• For a search loop, one can use a sentinel value
  • A value that is put just past the end of the search range and is guaranteed to terminate the search

Page 40: Code Tuning

Loops - Sentinel values: cont

    // set sentinel value, preserving the original value
    initialValue = item[count];
    item[count] = testValue;

    i = 0;
    while (item[i] != testValue) {   // one test per iteration instead of three
        i++;
    }

    if (i < count) {
        …
    }

Time savings: Java (44%), C# (23%) for a 100-element array of integers
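A runnable Java version of the sentinel search might look like this. It is a sketch under one important assumption that the slide leaves implicit: the array must have at least one spare slot at index `count` to hold the sentinel. The class name `SentinelSearch`, the method `indexOf`, the `-1` return convention, and the step that restores the slot afterwards are illustrative additions.

```java
final class SentinelSearch {
    // Searches item[0..count-1] for testValue.
    // Assumes item.length > count, so item[count] can hold the sentinel.
    // Returns the index of the first match, or -1 if there is none.
    static int indexOf(int[] item, int count, int testValue) {
        int initialValue = item[count];   // preserve the original value
        item[count] = testValue;          // the sentinel guarantees termination

        int i = 0;
        while (item[i] != testValue) {    // single test per iteration
            i++;
        }

        item[count] = initialValue;       // put the original value back
        return (i < count) ? i : -1;
    }

    public static void main(String[] args) {
        int[] data = {5, 8, 13, 21, 0};   // last slot is reserved for the sentinel
        System.out.println(indexOf(data, 4, 13));
    }
}
```

The technique changes the array during the search, so it is not safe if another thread reads the array at the same time.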

Page 41: Code Tuning

Loops - Putting the busiest loop on the inside

    for (column = 0; column < 100; column++) {
        for (row = 0; row < 5; row++) {
            sum = sum + table[row][column];
        }
    }

• The outer loop has many more iterations (100) than the inner loop (5)
• Each time a loop executes, it has to initialize the loop index, increment it on each pass through the loop, and check it after each pass
• Total number of loop executions: 100 + 100*5 = 600
• After switching the inner and outer loops: 5 + 5*100 = 505

Page 42: Code Tuning

Loops - Strength Reduction

Replace an expensive operation (e.g. multiplication) with a cheaper operation (e.g. addition)

Before:

    for (i = 0; i <= saleCount-1; i++) {
        commission[i] = (i+1) * revenue * baseCommission * discount;
    }

After:

    incrementalCommission = revenue * baseCommission * discount;
    cumulativeCommission = incrementalCommission;
    for (i = 0; i <= saleCount-1; i++) {
        commission[i] = cumulativeCommission;
        cumulativeCommission = cumulativeCommission + incrementalCommission;
    }
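The two forms can be compared directly in Java. In this sketch the class `CommissionTable` and the method names are hypothetical, and the use of `double` is an assumption; with floating-point values, repeated addition can differ from multiplication in the last few bits, so the results should be compared with a small tolerance rather than with `==`.

```java
final class CommissionTable {
    // Original form: a chain of multiplications for every element.
    static double[] withMultiplication(int saleCount, double revenue,
                                       double baseCommission, double discount) {
        double[] commission = new double[saleCount];
        for (int i = 0; i < saleCount; i++) {
            commission[i] = (i + 1) * revenue * baseCommission * discount;
        }
        return commission;
    }

    // Strength-reduced form: the product is computed once, then accumulated by addition.
    static double[] withAddition(int saleCount, double revenue,
                                 double baseCommission, double discount) {
        double[] commission = new double[saleCount];
        double incrementalCommission = revenue * baseCommission * discount;
        double cumulativeCommission = incrementalCommission;
        for (int i = 0; i < saleCount; i++) {
            commission[i] = cumulativeCommission;
            cumulativeCommission += incrementalCommission;
        }
        return commission;
    }

    public static void main(String[] args) {
        double[] a = withMultiplication(5, 1000.0, 0.05, 0.9);
        double[] b = withAddition(5, 1000.0, 0.05, 0.9);
        System.out.println(a[4] + " vs " + b[4]);
    }
}
```

A modern compiler may perform this transformation on its own, so the gain is worth measuring before committing to the less direct form.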

Page 43: Code Tuning

Data Transformations

• Use integers rather than floating-point numbers
• Use the fewest array dimensions possible
  • One-dimensional representation of an array:

    for (entry = 0; entry < numRows*numColumns; entry++) {
        matrix[entry] = 0;
    }

  • Time savings: C++: 11%; Java: 47%; C#: 9%

Page 44: Code Tuning

Is Tuning Possible?

    for (discountType = 0; discountType < typeCount; discountType++) {
        for (discountLevel = 0; discountLevel < levelCount; discountLevel++) {
            rate[discountLevel] = rate[discountLevel] * discount[discountType];
        }
    }

Page 45: Code Tuning

Data Transformations - cont

Minimize array references: the reference to discount[discountType] doesn't change when discountLevel changes in the inner loop.

Before:

    for (discountType = 0; discountType < typeCount; discountType++) {
        for (discountLevel = 0; discountLevel < levelCount; discountLevel++) {
            rate[discountLevel] = rate[discountLevel] * discount[discountType];
        }
    }

After:

    for (discountType = 0; discountType < typeCount; discountType++) {
        thisDiscount = discount[discountType];
        for (discountLevel = 0; discountLevel < levelCount; discountLevel++) {
            rate[discountLevel] = rate[discountLevel] * thisDiscount;
        }
    }

Page 46: Code Tuning

Expressions

• Use strength reduction
  • Replace multiplication with addition
  • Replace exponentiation with multiplication
  • Replace trigonometric routines with their trigonometric identities
  • Replace long long integers with longs and ints
  • Replace floating-point numbers with fixed-point numbers
  • Replace double precision with single precision
  • Replace integer multiplication-by-two and division-by-two with shift operations

Page 47: Code Tuning

Is Tuning Possible?

• Ax^2 + Bx + C: the letters A, B, and C are coefficients, and x is a variable
• Write code to evaluate an nth-order polynomial

Page 48: Code Tuning

48

Evaluate a polynomial – cont’

Page 49: Code Tuning

Evaluate a polynomial - cont

(Ax + B)x + C
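The nested form generalizes to any order and is commonly known as Horner's rule. The Java sketch below contrasts it with a direct evaluation that calls `Math.pow` for each term. The class and method names are hypothetical, and the convention that `coefficient[k]` multiplies x^k is an assumption chosen for the example.

```java
final class Polynomial {
    // Direct form: coefficient[k] multiplies x^k, with one Math.pow call per term.
    static double evaluateWithPow(double[] coefficient, double x) {
        double value = 0.0;
        for (int power = 0; power < coefficient.length; power++) {
            value += coefficient[power] * Math.pow(x, power);
        }
        return value;
    }

    // Nested form: one multiplication and one addition per coefficient.
    static double evaluateHorner(double[] coefficient, double x) {
        double value = 0.0;
        for (int power = coefficient.length - 1; power >= 0; power--) {
            value = value * x + coefficient[power];
        }
        return value;
    }

    public static void main(String[] args) {
        double[] abc = {1.0, 2.0, 3.0};   // C = 1, B = 2, A = 3
        System.out.println(evaluateWithPow(abc, 2.0) + " " + evaluateHorner(abc, 2.0));
    }
}
```

For the quadratic case the nested loop computes exactly (Ax + B)x + C, replacing exponentiation with multiplication as the strength-reduction list suggests.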

Page 50: Code Tuning

Is Tuning Necessary?

    unsigned int log2(unsigned int x) {
        return (unsigned int) (log(x) / log(2));
    }

Page 51: Code Tuning

Expressions - cont

Initialize at compile time or precompute

    const double LOG2 = 0.69314718;
    …
    unsigned int log2(unsigned int x) {
        return (unsigned int) (log(x) / LOG2);
    }

Any other ways to tune the code?

Page 52: Code Tuning

Expressions - cont

• Be wary of system routines
  • log2 returned an integer value but used a floating-point log() routine to compute it

    unsigned int log2(unsigned int x) {
        if (x < 2) return 0;
        if (x < 4) return 1;
        if (x < 8) return 2;
        if (x < 16) return 3;
        …
        if (x < 2147483648) return 30;
        return 31;
    }

    // hard to understand; should avoid unless you have a good reason
    unsigned int log2(unsigned int x) {
        unsigned int i = 0;
        while ((x = (x >> 1)) != 0) {
            i++;
        }
        return i;
    }
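For comparison, here is a Java rendering of the shift-based version next to an alternative built on `Integer.numberOfLeadingZeros` from the standard library. Java has no unsigned int, so this sketch assumes the argument is positive; the class name `IntegerLog2` and both method names are hypothetical.

```java
final class IntegerLog2 {
    // Shift-based version, as on the slide; assumes x > 0.
    static int log2ByShift(int x) {
        int i = 0;
        while ((x = (x >>> 1)) != 0) {
            i++;
        }
        return i;
    }

    // Alternative using a standard library bit operation; assumes x > 0.
    static int log2ByLeadingZeros(int x) {
        return 31 - Integer.numberOfLeadingZeros(x);
    }

    public static void main(String[] args) {
        System.out.println(log2ByShift(1024) + " " + log2ByLeadingZeros(1024));
    }
}
```

Both avoid floating-point math entirely; the library call states the intent more directly than the loop, which speaks to the readability concern raised above.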

Page 53: Code Tuning

Is Tuning Possible?

    payment = loanAmount / (
        (1.0 - Math.pow(1.0 + (interestRate / 12.0), -months)) /
        (interestRate / 12.0)
    );

Page 54: Code Tuning

Expressions - cont

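Eliminate common subexpressions

Before:

    payment = loanAmount / (
        (1.0 - Math.pow(1.0 + (interestRate / 12.0), -months)) /
        (interestRate / 12.0)
    );

After:

    monthlyInterest = interestRate / 12.0;
    payment = loanAmount / (
        (1.0 - Math.pow(1.0 + monthlyInterest, -months)) /
        monthlyInterest
    );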

Page 55: Code Tuning

Can Performance Be Improved?

• Draw 1000 circles with 6 different colors
• Version 1: Circle.java, Test.java

    for (int i = 0; i < NUMBER_OF_CIRCLES; ++i) {
        Circle circle = new Circle(getRandomColor());
        circle.draw(g, getRandomX(), getRandomY(), getRandomR());   // 1000 objects created
    }

• Improvement with Flyweight
  • Version 2: Circle.java, CircleFactory.java, Test.java
  • From Java Design Patterns at a Glance: http://www.javacamp.org/designPattern/

Page 56: Code Tuning

Flyweight

• Intent: use sharing to support large numbers of fine-grained objects efficiently
• The Flyweight pattern describes how to support a large number of fine-grained objects efficiently by sharing commonalities in state.
• For example, when designing a word processor application, you might create an object for each character typed.
• Each Character object might contain information such as the font face, size, and weight of the character.
• The problem here is that a lengthy document might contain tens of thousands of characters, and therefore tens of thousands of objects: quite a memory killer!
• The Flyweight pattern addresses the problem by creating a new object to store such information, which is shared by all characters with the same formatting.
• So, if I had a ten-thousand word document with 800 characters in Bold Times-New-Roman, these 800 characters would each contain a reference to a flyweight object that stores their common formatting information.
• The key here is that you only store the information once, so memory consumption is greatly reduced. -- TobinHarris

Page 57: Code Tuning

57

Flyweight Structure Summary

Page 59: Code Tuning

CircleFactory.java

    import java.awt.Color;
    import java.util.HashMap;

    class CircleFactory {
        // one shared Circle per color
        private static final HashMap<Color, Circle> circleByColor = new HashMap<>();

        public static Circle getCircle(Color color) {
            Circle circle = circleByColor.get(color);
            if (circle == null) {
                circle = new Circle(color);
                circleByColor.put(color, circle);
                // see how many objects we create on the command line
                System.out.println("Creating " + color + " circle");
            }
            return circle;
        }
    }
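Because the factory above depends on a Circle class and on java.awt, a self-contained variant can make the effect easier to check. In the sketch below everything is a stand-in: colors are plain strings, the nested `Circle` holds only its color, and `createdCount` is a hypothetical helper added so that the number of created objects can be inspected.

```java
import java.util.HashMap;
import java.util.Map;

final class FlyweightDemo {
    // Minimal stand-in for the slide's Circle; the color is the shared state.
    static final class Circle {
        final String color;
        Circle(String color) { this.color = color; }
    }

    static final class CircleFactory {
        private final Map<String, Circle> circleByColor = new HashMap<>();

        // Creates a Circle only the first time a color is requested.
        Circle getCircle(String color) {
            return circleByColor.computeIfAbsent(color, Circle::new);
        }

        int createdCount() {
            return circleByColor.size();
        }
    }

    public static void main(String[] args) {
        String[] colors = {"red", "green", "blue", "yellow", "orange", "black"};
        CircleFactory factory = new CircleFactory();
        for (int i = 0; i < 1000; i++) {
            factory.getCircle(colors[i % colors.length]);
        }
        System.out.println("Circles created: " + factory.createdCount());
    }
}
```

Position and radius stay outside the shared object and are passed in when drawing, which is what lets one instance per color serve every request.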

Page 60: Code Tuning

Object Creation and Flyweight Pattern

• Create instances of classes on the fly, only when they are needed, to improve performance
  • Example: showing a file system with directories
    • No need to load all the files or directories at one loading time
    • Show the upper-level folders first
    • If the user clicks a folder, then load its subdirectories and files
• The Java String class is designed with Flyweight

    String s1 = "hello";
    String s2 = "hello";               // stored in a string pool
    String s3 = new String("hello");
    System.out.println(s1 == s2);      // true, share the memory address
    System.out.println(s1 == s3);      // false

Page 61: Code Tuning

More about Code Tuning

• Other techniques
  • Recoding in a low-level language
  • …
• Code tuning is a controversial, emotional topic
  • Apply these techniques with care if you decide to use them