The “Evils” of Optimization
Or: Performance Anxiety Can Cause Premature Optimization
James Hare – Application Architect, Scottrade
What is Optimization?
• Definition from freedictionary.com:
  – “The procedure or procedures used to make a system or design as effective or functional as possible, especially the mathematical techniques involved”
• This is a general term applied to the optimization of any process or system.
• What does it mean for us?
Ada Lovelace
“In almost every computation a great variety of arrangements for the succession of the processes is possible, and various considerations must influence the selection amongst them for the purposes of a Calculating Engine. One essential object is to choose that arrangement which shall tend to reduce to a minimum the time necessary for completing the calculation.”
- Ada Byron’s Notes on Charles Babbage's Analytical Engine, 1842
Software Optimization
• Software can be optimized in several areas:
  – In the design of the system
    • Choice of appropriate techniques and structures.
  – In the code that implements the system
    • Choice of logic that implements a feature.
  – In the compiler that builds the system
    • Improvements the compiler bakes into assemblies.
  – In the runtime that executes the system
    • Choices the CLR/JIT can make in executing assemblies.
So Optimization is Good, Right?
• Well, yes and no…
  – Yes:
    • Time spent optimizing design is nearly always well spent.
    • Optimizing a known bottleneck will increase speed.
    • Compiler and CLR optimizations are already there and do not impact readability or maintainability.
  – No:
    • “Slower” is nearly always better than wrong.
    • Optimizing code before you have a measured need can quickly become an anti-pattern.
William Wulf
“More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason – including blind stupidity.”
– "A Case Against the GOTO," Proceedings of the 25th National ACM Conference, August 1972, pp. 791-97.
Is an Anti-Pattern a Design Pattern?
• No, most of us are familiar with the term Design Pattern:
  – A general, reusable solution to a commonly occurring problem in software design.
  – Not a finished design, but a general description or template for how to solve a problem.
  – Became very popular as a concept after the book Design Patterns: Elements of Reusable Object-Oriented Software was released in 1994 by the “Gang of Four”.
Okay, So What’s an Anti-Pattern?
• It is the antithesis of a Design Pattern.
• It is a pattern that tends to be commonly used, but which usually turns out to be ineffective and/or counterproductive.
• Term was coined by Andrew Koenig in 1995 in his article Patterns and Antipatterns.
• Made popular in 1998 with the book AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis.
How Do Anti-Patterns Relate to Optimization?
• Anti-Patterns cover a wide range of areas:
  – Organizational (Analysis Paralysis, etc.)
  – Project Management (Death March, etc.)
  – Analysis (Bystander Apathy)
  – Software Design (Big Ball of Mud, etc.)
  – Object-Oriented Design (God Object, etc.)
  – Programming (Spaghetti Code, etc.)
  – Configuration Management (Dependency Hell, etc.)
  – Methodological (Premature Optimization, etc.)
Donald Knuth
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified”
– “Structured Programming with Goto Statements”, Computing Surveys 6:4 (1974), 261–301.
Premature Optimization?
• Performance anxiety disproportionately affects the design of a piece of code.
• Oftentimes in premature optimization, the developer has no measurable data to show the code needs optimization.
• This can result in code optimizations that have no net impact on overall performance.
• This often happens when optimization “rules of thumb” are misapplied.
Premature Optimization Can Introduce Bugs
• Code may become overly complicated because the programmer is distracted by the perceived need to optimize.
• Bugs may be immediately introduced during an incorrect optimization.
• Since code becomes less maintainable, future enhancements are more likely to cause bugs.
• Remember: A slightly slower program is better than an erroneous program.
Premature Optimization Can Be a Waste of Time
• Remember Knuth: Ignore small efficiency gains 97% of the time, and pay attention to the critical 3% only after they have been identified.
• Optimizing code that is not a bottleneck is almost always a waste of time.
• Slows the entire process for little benefit, or worse yet may add maintenance costs or bugs.
• Correct code delivered sooner is often better than the fastest code delivered late and incorrect!
Rob Pike
“Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that's where the bottleneck is.”
– “Notes on Programming in C”, February 21, 1989
So Is Optimization Always Bad?
• Not at all, only optimizing for the sake of optimizing is bad.
• Good places for optimization are:
  – Early in system and component design.
  – When you find measurable bottlenecks.
  – In those rare, but necessary, “real-time” systems.
  – When your group has “standard” optimizations.
Best Place to Catch Bottlenecks?
[Figure: waterfall model diagram. Source: http://en.wikipedia.org/wiki/Waterfall_model]
Design Optimizations
• Most system bottlenecks are from bad design.
• When designing systems:
  – Use multithreading correctly when appropriate.
  – Use best algorithms/collections for the problem.
  – Avoid designing a single component/process that will become a bottleneck to the whole system.
  – Choose the best communication protocol and methodology for the problem.
  – Prefer to design web methods to be chunky.
  – Cache infrequently changing remote data.
Design: Multi-Threading Wisely
• Use the .NET 4.0 Concurrent Collections wherever appropriate.
• Strongly consider the .NET 4.0 TPL over traditional multi-threading.
• Use locking judiciously: keep locks as small in scope as possible and avoid holding multiple locks at the same time.
• Avoid serial thread processing (handing off from bucket to bucket to bucket…)
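As a minimal sketch of the first two bullets above, the following combines a concurrent collection with the TPL. The `WordCounter` class and its `Count` method are hypothetical names chosen for illustration; only `ConcurrentDictionary` and `Parallel.ForEach` come from the framework.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original slides.
public static class WordCounter
{
    // Counts word occurrences in parallel without any explicit lock.
    public static ConcurrentDictionary<string, int> Count(IEnumerable<string> words)
    {
        var counts = new ConcurrentDictionary<string, int>();

        // Parallel.ForEach (TPL) partitions the input across worker threads.
        Parallel.ForEach(words, word =>
        {
            // AddOrUpdate applies the change atomically per key; the update
            // delegate may be re-run if another thread changes the key first.
            counts.AddOrUpdate(word, 1, (key, current) => current + 1);
        });

        return counts;
    }
}
```

The collection handles its own synchronization, so the calling code holds no locks of its own. Whether this actually beats a simple serial loop depends on the workload and should be measured.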
Design: Use Best Algorithms
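• Favor keeping algorithmic complexity low:

| Size | Constant O(1) | Log O(log n) | Linear O(n) | Linear-Log O(n log n) | Quadratic O(n²) | Cubic O(n³) |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 2 | 2 | 4 | 8 |
| 4 | 1 | 2 | 4 | 8 | 16 | 64 |
| 8 | 1 | 3 | 8 | 24 | 64 | 512 |
| 16 | 1 | 4 | 16 | 64 | 256 | 4096 |
| 1024 | 1 | 10 | 1024 | 10,240 | ~10⁶ | ~10⁹ |
| 1,048,576 | 1 | 20 | 1,048,576 | 20,971,520 | ~10¹² | ~10¹⁸ |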
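To make the growth rates concrete, here is a small sketch of the same task (detecting a duplicate value) written first as a quadratic algorithm and then as a linear one. The `DuplicateFinder` class is a hypothetical example, not code from the slides.

```csharp
using System.Collections.Generic;

// Hypothetical example class for illustration only.
public static class DuplicateFinder
{
    // O(n^2): compares every pair of items.
    public static bool HasDuplicateQuadratic(IList<int> items)
    {
        for (int i = 0; i < items.Count; i++)
        {
            for (int j = i + 1; j < items.Count; j++)
            {
                if (items[i] == items[j]) return true;
            }
        }
        return false;
    }

    // O(n) on average: HashSet<T>.Add returns false when the value
    // is already present, so each item is examined once.
    public static bool HasDuplicateLinear(IEnumerable<int> items)
    {
        var seen = new HashSet<int>();
        foreach (int item in items)
        {
            if (!seen.Add(item)) return true;
        }
        return false;
    }
}
```

For a handful of items the difference is negligible. At around a million items, the quadratic version performs on the order of 10¹² comparisons in the worst case, which is why this kind of choice belongs in design rather than in late tuning.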
Design: Use Best Algorithms
• Know and use your LINQ algorithms:
  – Finds: Any(), First(), Find() (on List<T>), etc.
  – Queries: Select(), Where(), etc.
  – Grouping: GroupBy()
  – Sorting: OrderBy()
  – Etc.
• Already written, optimized, and unit tested.
• Can make code much easier to read and maintain.
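A short sketch of several of these operators working together follows. The `Order` type and the `OrderQueries` helper are hypothetical; the LINQ methods themselves are the standard `System.Linq` extension methods.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical data type for illustration only.
public class Order
{
    public string Customer { get; set; }
    public decimal Total { get; set; }
}

public static class OrderQueries
{
    // Any() stops at the first match instead of scanning everything.
    public static bool HasLargeOrder(IEnumerable<Order> orders, decimal threshold)
    {
        return orders.Any(o => o.Total >= threshold);
    }

    // GroupBy + Select + OrderBy: total per customer, sorted by customer name.
    public static List<KeyValuePair<string, decimal>> TotalsByCustomer(IEnumerable<Order> orders)
    {
        return orders
            .GroupBy(o => o.Customer)
            .Select(g => new KeyValuePair<string, decimal>(g.Key, g.Sum(o => o.Total)))
            .OrderBy(pair => pair.Key)
            .ToList();
    }
}
```

Each query reads close to its own description, which is most of the maintainability argument: there is no hand-written loop or temporary dictionary to get wrong.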
Design: Use Best Collection
• Know each collection’s strengths and weaknesses.
• In general choose based on:
  – Ordering: Do you need to maintain order?
  – Lookup: Is fast lookup the ultimate goal?
  – Insert/Deletes: Where are inserts/deletes performed? Are they needed?
  – Synchronization: Do you need multi-threaded access of a mutable collection?
| Collection | Ordered | Sequential | Direct Access | Lookup | Insert | Notes |
|---|---|---|---|---|---|---|
| Dictionary | No | Yes | Via Key | Key: O(1) | O(1) | Best for high performance lookups. |
| LinkedList | No | No | No | Value: O(n) | O(1) | Best for lists where inserting/deleting in middle is common and no direct access required. |
| List | No | Yes | Via Index | Index: O(1), Value: O(n) | O(n)* | Best for smaller lists where direct access required and no ordering. |
| Queue | No | Yes | Only Front | Front: O(1) | O(1)* | Essentially same as List<T> except only process as FIFO. |
| SortedDictionary | Yes | No | Via Key | Key: O(log n) | O(log n) | Compromise of Dictionary speed and ordering, uses binary search tree. |
| SortedList | Yes | Yes | Via Key | Key: O(log n) | O(n) | Very similar to SortedDictionary, except tree is implemented in an array, so has faster lookup on preloaded data, but slower loads. |
| Stack | No | Yes | Only Top | Top: O(1) | O(1)* | Essentially same as List<T> except only process as LIFO. |
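The trade-off between the Dictionary and SortedDictionary rows can be sketched as follows. The `CollectionChoice` class and the ticker-style keys used with it are hypothetical; an ordinal comparer is passed explicitly so that the sort order does not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical example class for illustration only.
public static class CollectionChoice
{
    // Dictionary: O(1) key lookup, but no guaranteed enumeration order.
    public static Dictionary<string, int> BuildLookup(IEnumerable<KeyValuePair<string, int>> items)
    {
        var lookup = new Dictionary<string, int>();
        foreach (var item in items)
        {
            lookup[item.Key] = item.Value;
        }
        return lookup;
    }

    // SortedDictionary: O(log n) lookup, but keys always enumerate in sorted order.
    public static List<string> SortedKeys(IEnumerable<KeyValuePair<string, int>> items)
    {
        var sorted = new SortedDictionary<string, int>(StringComparer.Ordinal);
        foreach (var item in items)
        {
            sorted[item.Key] = item.Value;
        }
        return sorted.Keys.ToList();
    }
}
```

If the code only ever looks items up by key, the plain Dictionary is the better fit. If it also has to walk the keys in order, SortedDictionary avoids re-sorting on every pass.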
Design: Avoid Bottlenecks
• If one process or component in a system has to serialize and process all items slowly, it doesn’t matter how parallel the rest of your system is.
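The slides do not name it, but this point is commonly quantified with Amdahl's law: if a fraction s of the work must run serially, the best possible speedup with n workers is 1 / (s + (1 - s) / n). The sketch below, with a hypothetical `Amdahl` helper, only evaluates that formula.

```csharp
using System;

// Hypothetical helper; Amdahl's law is an assumption brought in for illustration.
public static class Amdahl
{
    // speedup = 1 / (s + (1 - s) / n), where s is the serial fraction
    // of the work and n is the number of parallel workers.
    public static double Speedup(double serialFraction, int workers)
    {
        if (serialFraction < 0.0 || serialFraction > 1.0)
            throw new ArgumentOutOfRangeException("serialFraction");
        if (workers < 1)
            throw new ArgumentOutOfRangeException("workers");

        return 1.0 / (serialFraction + (1.0 - serialFraction) / workers);
    }
}
```

With a serial fraction of 0.25, four workers give a speedup of roughly 2.29, and no number of workers can push it past 4. The serialized component sets the ceiling for the whole system.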
Design: Use Best Communication
• Know the best method for the problem:
  – Do you need broadcast communication, or can you tolerate some dropped packets? Consider UDP.
  – Do you need connection-oriented reliable synchronous communication? Consider TCP.
  – Do you need to be able to process messages asynchronously? Consider message queues.
  – Do you need to be able to make cross-platform calls? Consider web methods or message queues.
Design: Web Methods
• Good OO design ≠ good distributed design.
• Prefer “chunky” to “chatty” web methods as they will have less network overhead.
  – “chunky”: one method returns all data needed.
  – “chatty”: many small methods that return pieces.
• Do not return DataSet!
  – Very, very, very large!
  – Return array or List<T> of custom type instead.
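The difference between the two styles can be sketched with a pair of hypothetical service contracts. None of these types come from the slides; `FakeAccountService` is an in-memory stand-in that counts calls, where each call would be a network round trip in a real web service.

```csharp
using System.Collections.Generic;

// Hypothetical custom return type, used instead of a DataSet.
public class AccountSummary
{
    public string Name { get; set; }
    public decimal Balance { get; set; }
    public List<string> OpenOrders { get; set; }
}

// "Chatty": three round trips to assemble one screen of data.
public interface IChattyAccountService
{
    string GetName(int accountId);
    decimal GetBalance(int accountId);
    List<string> GetOpenOrders(int accountId);
}

// "Chunky": one round trip returns everything the caller needs.
public interface IChunkyAccountService
{
    AccountSummary GetSummary(int accountId);
}

// In-memory stand-in that counts calls to make the difference visible.
public class FakeAccountService : IChattyAccountService, IChunkyAccountService
{
    public int RoundTrips { get; private set; }

    public string GetName(int accountId) { RoundTrips++; return "Test Account"; }
    public decimal GetBalance(int accountId) { RoundTrips++; return 100m; }
    public List<string> GetOpenOrders(int accountId) { RoundTrips++; return new List<string> { "BUY 10 XYZ" }; }

    public AccountSummary GetSummary(int accountId)
    {
        RoundTrips++;
        return new AccountSummary
        {
            Name = GetNameCore(),
            Balance = 100m,
            OpenOrders = new List<string> { "BUY 10 XYZ" }
        };
    }

    private static string GetNameCore() { return "Test Account"; }
}
```

A client built on the chatty contract pays the network latency three times for the same data. The chunky contract pays it once, at the cost of an interface that looks less like a fine-grained object model.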
Design: Caching
• When remote data never changes, cache it!
  – Can use unsynchronized Dictionary.
  – Much faster than network lookup.
• When remote data changes infrequently, cache with expiration or refresh.
  – Consider ConcurrentDictionary.
  – In ASP.NET use Application and Session caches.
• Consider distributed cache when appropriate.
  – AppFabric (Velocity), Coherence, etc.
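One possible shape for "cache with expiration" on top of ConcurrentDictionary is sketched below. `ExpiringCache` and its members are hypothetical names, and the design is deliberately simple: expired entries are reloaded on the next read, and under contention the loader may run more than once for the same key.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical cache type for illustration; not a framework class.
public class ExpiringCache<TKey, TValue>
{
    private class Entry
    {
        public TValue Value;
        public DateTime ExpiresUtc;
    }

    private readonly ConcurrentDictionary<TKey, Entry> _entries =
        new ConcurrentDictionary<TKey, Entry>();
    private readonly TimeSpan _timeToLive;
    private readonly Func<TKey, TValue> _loader;

    // loader stands in for the slow remote lookup being cached.
    public ExpiringCache(TimeSpan timeToLive, Func<TKey, TValue> loader)
    {
        _timeToLive = timeToLive;
        _loader = loader;
    }

    public TValue Get(TKey key)
    {
        Entry entry;
        if (_entries.TryGetValue(key, out entry) && entry.ExpiresUtc > DateTime.UtcNow)
        {
            return entry.Value; // still fresh: no remote call
        }

        // Missing or expired: reload and overwrite (last write wins).
        var fresh = new Entry
        {
            Value = _loader(key),
            ExpiresUtc = DateTime.UtcNow + _timeToLive
        };
        _entries[key] = fresh;
        return fresh.Value;
    }
}
```

A production cache would usually also bound its size and evict stale entries in the background. Those concerns are omitted here to keep the expiration logic visible.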
Measurable Bottlenecks
• Most bottlenecks can be avoided in design.
• However, some bottlenecks only become apparent after testing or possibly even later in production after user or data size growth.
• Measure, Improve, and Re-Measure:
  – Don’t guess at location or cause ever!
  – Profile the code to find the bottleneck.
  – Solve the revealed bottleneck and document.
  – Re-profile to make sure code is streamlined.
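A profiler is the right tool for locating a bottleneck, but once a suspect is identified, a rough before-and-after timing can be taken with `System.Diagnostics.Stopwatch`. The `Measure` helper below is a hypothetical sketch of that measure, improve, re-measure loop, not a replacement for profiling.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper for illustration only.
public static class Measure
{
    // Runs the action once as a warm-up (so JIT compilation is not counted),
    // then times the requested number of iterations.
    public static TimeSpan Time(Action action, int iterations)
    {
        action();

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed;
    }
}
```

Timing the same call before and after a change, with the same inputs and iteration count, gives a number to put in the documentation step. Results from a single run are noisy, so repeating the measurement is advisable.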
Bottleneck in Lock Contention
After Removing Unnecessary Lock
“Real-time” Systems
• Obviously, there are some rare cases where performance is paramount:
  – Flight control systems.
  – Algorithmic trading.
  – Gaming.
• In these cases, you can always code with an eye towards performance.
• The decisions on how to optimize, though, should be well known and uniform across the group.
“Standard” Optimizations
• These are a double-edged sword because they can often be misapplied or incorrectly understood.
• Framework changes may alter the efficiency of an operation/method to the point where the net gain is nil.
• Any such optimizations should be decided on as a group and standardized.
• In general, these should be avoided except for Microsoft recommended best practices.
Microsoft Performance Recommendations
• Throw fewer exceptions
• Make chunky calls
• Use value types for small immutable data
• Use AddRange() when possible
• Trim your Working Set
• Use for loops for string iteration (careful!)
• Use StringBuilder for complex String manipulation
• Use jagged arrays vs. rectangular arrays
• Etc.
“Standard” Misuse
• Example: Someone hears that StringBuilder is more efficient than concatenation and sacrifices readability for “performance”:
“Standard” Misuse
• String concatenation (+) is faster for a single concatenation expression with two or more operands.
  – It knows the size of the operands in advance and computes the correct buffer size.
  – StringBuilder uses a default buffer size and then re-allocates if it needs more space.
• String concatenation of literals is done at compile time and has no run-time impact.
• Don’t apply “standard” rules without fully knowing the ramifications.
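The kind of misuse described above might look like the following sketch. The `Greeting` class is a hypothetical illustration and not the code from the original slide.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical example class for illustration only.
public static class Greeting
{
    // "Optimized" version: more code and harder to read, with no benefit
    // for a single expression with a handful of operands.
    public static string WithStringBuilder(string first, string last)
    {
        var builder = new StringBuilder();
        builder.Append("Hello, ");
        builder.Append(first);
        builder.Append(" ");
        builder.Append(last);
        builder.Append("!");
        return builder.ToString();
    }

    // Plain concatenation: one expression, which the C# compiler
    // lowers to a single String.Concat call.
    public static string WithConcat(string first, string last)
    {
        return "Hello, " + first + " " + last + "!";
    }

    // StringBuilder is the better fit when the number of appends
    // is not known up front, such as inside a loop.
    public static string JoinLines(IEnumerable<string> lines)
    {
        var builder = new StringBuilder();
        foreach (string line in lines)
        {
            builder.Append(line).Append('\n');
        }
        return builder.ToString();
    }
}
```

Both greeting methods return the same string. The second is shorter and clearer, and per the points above it is not the slower one.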
Summary
• Optimizing design and known bottlenecks is nearly always a beneficial activity.
• Sacrificing maintainability for unmeasured performance gain is problematic:
  – Code that is harder to maintain is more likely to have initial bugs or bugs in modification.
  – Most of the time micro-optimizations have no effect on overall system performance and just create a time sink.
Michael A. Jackson
“There are two rules for when to optimize: 1. Don't do it. 2. (For experts only) Don't do it yet.”
– “Principles of Program Design”, Academic Press, London and New York, 1975.
Questions?
• Blog:
  – http://www.geekswithblogs.net/BlackRabbitCoder
• Email:
  – [email protected]
• Twitter:
  – http://twitter.com/BlkRabbitCoder