Upload
sakura
View
27
Download
1
Tags:
Embed Size (px)
DESCRIPTION
Code Tuning. Chapter 25-26. Textbook & Reference. Steve McConnell. Code Complete: A Practical Handbook of Software Construction . 2nd Edition. Microsoft Press, 2004. Chapters 25 and 26 Jon Bentley, Programming Pearls , Second Edition, Addison-Wesley, Inc., 2000. Outline. - PowerPoint PPT Presentation
Citation preview
Code Tuning
Chapter 25-26
Textbook & Reference
2
Steve McConnell. Code Complete: A Practical Handbook of Software Construction. 2nd Edition. Microsoft Press, 2004. Chapters 25 and 26
Jon Bentley, Programming Pearls, Second Edition, Addison-Wesley, Inc., 2000.
Outline
3
Options of Performance Improvement What Is Code Tuning? Common Sources of Inefficiency Code Tuning Process Code Tuning Techniques
Logic Loops Data Transformations Expressions Object Creation and Flyweight/Factory Patterns
Options for Performance Improvement
4
Program requirements Make sure it is a problem that needs to be solved Boehm’s story (2000):
A system at TRW initially requiring subsecond response time led to a highly complex design and an estimated cost $100M
Further analysis determined that users would be satisfied with 4 second response 90% of the time
Program design Some problems have to be addressed at design
level Search algorithms
Options for Performance Improvement
5
OS interactions Perhaps the OS routines are slow or fat Working with external files, dynamic memory, or
output device probably interact with the OS Your compiler may generate (or your libraries
may invoke) system calls you would never dream of.
Code compilation Maybe no need to think about optimizing speed
any further if the right compiler is chosen Hardware
Sometimes buying new hardware is the cheapest and best way
What Is Code Tuning?
6
The practice of modifying correct code in ways that make it run more efficiently Small-scale changes that affect a single class /
routine or a few lines of code
Not the most effective or easiest or cheapest way to improve performance
Not necessarily ‘better’ code. No one, but programmers, usually cares how tight
the code is
Why Is Code Tuning Appealing?
7
It is incredibly satisfying to reduce execution time by tweaking a few lines!
Mastering the art of writing efficient code is a rite of passage to becoming a serious programmer Programming culture - writing micro-efficient code
proves you are cool Like a tennis player picking up a tennis ball
Pareto Principle: 80/20 Rule
8
‘get 80% of the result with 20% of the effort’ Boehm (1987): 20% of a program’s routines
consumes 80% of its execution time Knuth (1971): less than 4% of a program usually
accounts for more than 50% of its runtime. Bentley (1988): a 1 KLOC program spent 80% of
its time in a 5-line square root routine Measure the code to find the hot spots and
put your resources into optimizing the few percent
Write most of code in an interpreted language and rewrite the hot spots in a faster language
Which One Is Faster?
9
for i=1 to 10{ a[i] = i; }
a[1] = 1;a[2] = 2; a[3] = 3;a[4] = 4;a[5] = 5; a[6] = 6;a[7] = 7;a[8] = 8; a[9] = 9;a[10] = 10;
Old Wives’ Tales
10
Reducing LOC improves the speed or size of the resulting machine code – false Previous Java example: 12.6 vs 3.23
Certain operations are probably faster or smaller than others – false No room for probably when talking about performance The rule of the game change every time you change languages,
compilers, versions of libraries, processor, amount of memory ‘Fast’ is as important as ‘correct’ - false
Old Wives’ Tales - cont
11
You should optimize as you go – false If you strive to write the fastest and smallest
possible code for each routine, your program will be fast and small?
Problematic! It is almost impossible to identify performance
bottlenecks before a program is working completely. Programmers are very bad at guessing which 4%
accounts for 50% of the execution time. Focusing on optimization during initial development
detracts from achieving other program objectives.
When to Tune
12
Don’t optimize until you know you need to Use a high-quality design Make the program right Make it modular and easily modifiable Check performance when it is complete and
correct
Complier optimizations They might be more powerful than you expect Compiler-optimized code may be faster than
‘tricky’ code
13
Complier optimizations With a good optimizing compiler,
your code speed can improve 40+ percent
Many of the techniques described in the next chapter produce gains of only 15–30 percent.
Why not just write clear code and let the compiler do the work?
Here are the results of a few tests to check how much an optimizer speeded up an insertion-sort routine: The only difference between
versions of the routine was that compiler optimizations were turned off for the first compile and turned on for the second.
Compiler Optimizations - cont
14
sum=0; for (row=0; row<rowCount; row++) {
for (column=0; column<columnCount; column++) {sum = sum + martix[row][column]
} // expensive multiplications for access to 2D array}
sum=0;elementPointer = matrix;lastElementPointer = matrix[rowCount-1][columnCount-1]+1;while (elementPoint <lastElementPointer) {
sum = sum + elementPointer++; }
No improvement - digging into the assembly code turned out that the complier did the optimization.
Common Sources of Inefficiency
15
Input/output operations If you have a choice of working with a file in memory vs
on disk, in a database, or across a network, use an in-memory structure unless space is critical.
System calls: expensive due to context switch (Saving program states, recovering the kernel states and the reverse) Write own services if part of the function is needed Avoid going to the system Work with the system vendor to make the call faster
Interpreted languages Process each instruction before creating and executing
machine code
16
Input/output operations
Interpreted languages
Are They Functionally Equivalent?
17
for (column=0; column<MAX_COLUMNS; column++) {for (row=0; row<MAX_ROWS; row++) {
table [row][column] = BlankTableElement();}
}
for (row=0; row<MAX_ROWS; row++) {for (column=0; column<MAX_COLUMNS; column++) {
table [row][column] = BlankTableElement();}
}
1. Each element of table is about 4k bytes long2. Pages are usually at least 4K bytes in size
Common Sources of Inefficiency - cont
18
Paging An operation that causes the OS to swap pages of
memory It is much slower than an operation that works on
only one page of memory
Errors Leaving debugging code turned on Forgetting to de-allocate memory Polling non-existent devices until timeout
Paging Example
19
Assume: each row is about the same page size.
for (column=0; column<MAX_COLUMNS; column++) {for (row=0; row<MAX_ROWS; row++) {
table [row][column] = BlankTableElement();}
} // accessing a different row causes a page fault
for (row=0; row<MAX_ROWS; row++) {for (column=0; column<MAX_COLUMNS; column++) {
table [row][column] = BlankTableElement();}
} // 1000 times faster on a machine with limited memory,!
20
If table has too many rows, every time the program accesses a different row, the operating system will have to switch memory pages.
The way the loop is structured, every single array access switches rows, which means that every single array access causes paging to disk.
Relative Costs of Common OperationsOperation Example C++ Java
Baseline (int assignment) i=j 1 1Call private routine with no para. this.foo() 1 0.5Object routine call bar.foo() 2 1Polymorphic routine call abstractBar.foo() 2.5 2Object reference i=obj.num 1 1Integer division i=j/k 5 1.5Floating point sqrt x=sqrt(y) 15 4Floating point ey x = exp(y) 50 20Floating point logarithm x=log(y) 25 20
21
Measurements are sensitive to local machine environment and compiler.Measurements between C++ and Java are not directly comparable.
Measurement
22
Performance aspects can be counterintuitive. Experience from old machine or language does
not help much. It is not worth sacrificing clarity for a
performance gamble, if it is not worth measuring to know that it is more efficient Develop software by using well-designed code that
is easy to understand and modify. Measurements need to be precise.
Code Tuning Process
23
If the performance is poor: Save a working version Measure the system to find hot spots Determine where weak performance is
from: Go back if tuning is not appropriate
Tune the bottleneck identified Measure each improvement at a time If no improvement, revert to the code
saved
Code Tuning Techniques
24
Logic Loops Data Transformations Expressions Object Creation
Logic
25
Stop testing when you know the answer If (5<x) and (x<10) then … // Specification
Use short-circuit op: && vs. & If (5<x) then if (x<10) then … // If the language does
not support short-circuit evaluation. Search whether a negative is present
negativeInputFound = false;for (i=0; i<count; i++) {
if (input [i]<0) {negativeInputFound = true;
}}
26
A better approach? Is to stop scanning as soon as you find a negative
value. Any of these approaches would solve the problem:
Add a break statement after the negativeInputFound = true line.
If your language doesn't have break, emulate a break with a goto that goes to the first statement after the loop.
Change the for loop to a while loop, and check for negativeInputFound as well as for incrementing the loop counter past count.
Change the for loop to a while loop, put a sentinel value in the first array element after the last value entry, and simply check for a negative value in the while test. After the loop terminates, see whether the position of the first found value is in the array or one past the end.
27
Can You Tune This Code?
28
Keyboard input in a word processorSelect inputCharacter
Case “+”, “=“ ProcessMathSymbol(inputCharacter)
Case “0” to “9“ ProcessDigit(inputCharacter)
Case “ ,”, “.”, “:”, “;”, “!”, “?”ProcessPunctuation(inputCharacter)
Case “ ”ProcessSpace(inputCharacter)
Case “A” to “Z”, “a” to “z”ProcessAlpha(inputCharacter)
Case ElseProcessError(inputCharacter)
End Select
Logic - cont
29
Arrange test so that the one that is fastest and most likely to be true is performed first If you know the likely frequency of input
characters, put the most common cases first.Select inputCharacter
Case “A” to “Z”, “a” to “z” ProcessAlpha(inputCharacter)Case “ ” ProcessSpace(inputCharacter)Case “ ,”, “.”, “:”, “;”, “!”, “?”
ProcessPunctuation(inputCharacter)Case “0” to “9“ ProcessDigit(inputCharacter)Case “+”, “=“ ProcessMathSymbol(inputCharacter)Case Else ProcessError(inputCharacter)
End Select
Can You Tune This Code?
30
…if ( ( (‘a’<=inputChar) && (inputChar <=‘z’)) || ( (‘A’<=inputChar) && (inputChar <=‘Z’))) {
charType = CharacterType.Letter;}else if ( (inputChar==‘ ‘) ||(inputChar == ‘,’) ||
(inputChar==‘.‘) || (inputChar==‘!‘) || (inputChar==‘(‘) ||(inputChar==‘)‘) || (inputChar==‘:‘) || (inputChar==‘;‘) ||(inputChar==‘?‘) || (inputChar==‘-‘)) {charType = CharacterType.Punctuation;
} else if ((‘0’<=inputChar) && (inputChar <=‘9’)) {
charType = CharacterType.Digit; }…
Logic - cont
31
Substitute table lookups for complicated expressions (space for time) Computing only once and storing results in a
table. Example1: Character type:
Store the type of each character in an array that’s accessed by the character type code
charType = charTypeTable[inputChar]; Example 2: Integer square root of integers 1..100
Lazy evaluation – avoid doing any work until needed For a table of 5K entries, generate the whole table
at startup time vs the small percentage used
32
Substitute tableSuppose you want to assign a category number to something based on which of three groups— Groups A, B, and C—it falls into:
33
Can You Tune This Code?
34
for (i=0; i<count; i++) {if (sumType == SUMTYPE_NET) {
netSum = netSum+amount[i];}else {
grossSum = grossSum+ amount[i];}
}
Loops -Unswitching
35
if (sumType == SUMTYPE_NET) {for (i=0; i<count; i++) {
netSum = netSum+amount[i];}
}else {
for (i=0; i<count; i++) {grossSum = grossSum+ amount[i];
}}
36
Loops -Unswitching If the decision doesn't change while the loop is executing,
you can unswitch the loop by making the decision outside the loop putting loops inside the conditional rather than putting the
conditional inside the loop
Loops - Minimizing the work inside loops
37
for (i=0; i<rateCount; i++) {netRate[i] = baseRate[i] * rates ->discounts-
>factors->net;}
qualityDiscount = rates ->discounts->factors->net;for (i=0; i<rateCount; i++) {
netRate[i] = baseRate[i] * qualityDiscount;}
Can You Tune This Code
38
found = FALSE;i=0;while ( (!found) && (i<count) ) {
if (item[i] == testValue)found=TRUE;
else i++;
}if (found) {…}
Compound test
Loops - Sentinel values
39
For a loop with a compound test, you can often save time by simplifying the test.
For a search loop, one can use a sentinel value A value that is put just past the end of the search
range and guaranteed to terminate
Loops - Sentinel values: cont
40
// set sentinel value, preserving the original value initialValue = item[count];item[count] = testValue;
i=0;while (item[i] !=testValue) {
i++;} // 3 tests to 1 test
if (i<count) {…}
Time savings: Java (44%), C#(23%) for a 100-elemenmt of integers
Loops - Putting the busiest loop on the inside
41
for (column=0; column<100; column++) {for (row =0; row<5; row++) {
sum = sum + table[row][column]}
} The outer executes much more often than the
inner Each time the loop executes, it has to initialize
the loop index, increment it on each pass through the loop, and check it after each pass
Total number of loop executions: 100 + 100*5 Switching the inner and outer: 5 + 100*5
Loops - Strength Reduction
42
Replace an expensive operation (e.g. multiplication) with a cheaper operation (e.g. addition) for (i=0; i<=saleCount-1;i++){
commission(i) = (i+1) * revenue*basedCommission*discount
}
incrementalCommission = revenue*bassCommission*discountcumulativeCommission = incrementalCommissionfor (i=0;i<=saleCount-1;i++){
commission(i) = cumulativeCommissioncumulativeCommission = cumulativeCommission +
incrementalCommission}
Data Transformations
43
Use integers rather than floating point numbers
Use the fewest array dimensions possible One dimensional representation of an array
for (entry=0; entry<numRows*numColumns; entries++) {matrix [entry] =0;
}
Time savings: C++: 11%; Java: 47%, C#: 9%
Is Tuning Possible?
44
for (discountType =0; discountType <typeCount; discountType++) {for (discountLevel=0;
discountLevel<levelCount; discountLevel++){rate[discountLevel] = rate[discountLevel] *
discount[discountType];}
}
Data Transformations - cont
45
The reference to discount[discountType] doesn't change when discountLevel changes in the inner loop.
Minimize array referencesfor (discountType =0; discountType <typeCount;
discountType++) {for (discountLevel=0; discountLevel<levelCount;
discountLevel++){rate[discountLevel] = rate[discountLevel] *
discount[discountType];}
}thisDiscount = discount[discountType];for (discountLevel=0; discountLevel<levelCount;
discountLevel++){rate[discountLevel] = rate[discountLevel] *thisDiscount;
}
Expressions
46
Use strength reduction Replace multiplication with addition Replace exponentiation with multiplication Replace trigonometric routines with their
trigonometric identities Replace longlong integers with longs and ints Replace floating point with fixed-point numbers Replace double precision with single precision Replace integer multiplication-by-two and division-
by-two with shift operations
47
Is Tuning Possible? Ax2 + Bx + C. The letters A, B, and C are coefficients, and x is a
variable. Write code to evaluate an nth-order polynomial
48
Evaluate a polynomial – cont’
49
Evaluate a polynomial – cont’ (Ax + x)x + C
Is Tuning Necessary?
50
unsigned int log2(unsigned int x) {return (unsigned int) (log(x)/log(2));
}
Expressions - cont
51
Initialize at compile time or precompute
const double LOG2 = 0.69314718;…unsigned int log2(unsigned int x) {
return (unsigned int) (log(x)/LOG2);}
Any other ways to tune the code?
Expressions - cont
52
Be wary of system routines log2 returned an integer value but used a
floating point log() routine to compute it
unsigned int log2(unsigned int x) {if (x<2) return 0;if (x<4) return 1;if (x<8) return 2;if (x<16) return 3;…if (x<2147483648) return 30;return 31;}
unsigned int log2(unsigned int x) { unsigned int i=0; while ((x=(x>>1))!=0) { i++; } return i;}// hard to understand; should avoid // unless you have a good reason
Is Tuning Possible?
53
payment = loanAmount / ((1.0-Math.pow(1.0+(interestRate/12.0), -
months))/(interestRate/12.0)
);
Expressions - cont
54
Eliminate common subexpressionspayment = loanAmount / (
(1.0-Math.pow(1.0+(interestRate/12.0), -months))/
(interestRate/12.0)); monthlyInterest = interestRate/12.0;
Can Performance Be Improved?
55
Draw 1000 circles with 6 different colors. Version 1: Circle.java, Test.java
for (int i=0; i < NUMBER_OF_CIRCLES; ++i) { Circle circle = new Circle(getRandomColor()); circle.draw(g, getRandomX(), getRandomY(),
getRandomR());//1000 object created.
} Improvement with Flyweight
Version 2: Cricle.java, CircleFactory.java, Test.java From Java Design Patterns at a Glance
http://www.javacamp.org/designPattern/
56
Flyweight Intent: Use sharing to support large numbers of fine-grained objects
efficiently The FlyweightPattern describes how to support a large number of fine
grained objects efficiently, by sharing commonalities in state. For example, when designing a word processor application, you might
create an object for each character typed. Each Character object might contain information such as the font face, size
and weight of each character. The problem here is that a lengthy document might contain tens of
thousands of characters, and objects - quite a memory killer! The Flyweight pattern addresses the problem by
creating a new object to store such information, which is shared by all characters with the same formatting.
So, if I had a ten-thousand word document, with 800 characters in Bold Times-New-Roman, these 800 characters would contain a reference to a flyweight object that stores their common formatting information.
The key here is that you only store the information once, so memory consumption is greatly reduced. -- TobinHarris
57
Flyweight Structure Summary
58
59
CircleFactory.javaclass CircleFactory { //store color private static final HashMap circleByColor = new HashMap();
public static Circle getCircle(Color color) { Circle circle = (Circle)circleByColor.get(color);
if(circle == null) { circle = new Circle(color); circleByColor.put(color, circle); System.out.println("Creating " + color + " circle"); //see how many objects we create on command line } return circle; }}
Object Creation and Flyweight Pattern
60
Make instances of classes on the fly to improve performance efficiently
Show a file system with directories No need to load all the files or directories at one loading
time. Show the upper level folders first. If the user clicks a folder, then load its subdirectories and
files. Java String Class is designed with Flyweight
String s1 = "hello"; String s2 = "hello"; //store in a string pool. String s3 = new String("hello"); System.out.println(s1==s2); //true, share the memory
address System.out.println(s1==s3); //false
More about Code Tuning
61
Other techniques Recoding in a low-level language …
Code tuning is a controversial, emotional topic Apply them with care if you decide to use