
Page 1

CS 213, Fall 1998

A gentle introduction to algorithms

Further reading: Introduction to Algorithms by Cormen, Leiserson, and Rivest (MIT Press, 1990).

Page 2

• Very often in life, one finds oneself forced to examine different data structures and their associated algorithms, and choose between them.

• For example, imagine your collection of CDs at home. They will always be in some order (order of preference, order of purchase, alphabetical order), but the order that you choose will affect how the CDs are used.

• For example, if you choose order of preference, the great new Dave Matthews CD may be inserted into the beginning of the list, forcing you to spend hours moving every CD over by one. Also, new tastes may force you to re-order your entire collection every couple of months.

Page 3

• But if you order the CDs by time of purchase, new acquisitions won’t make you re-order your collection, right? True, but now finding your favorite CDs will be more difficult; you may have to look through the entire set before you find what you’re looking for.

• If you decide to order them alphabetically, it’s the opposite trade-off: it’s easier to locate individual CDs now, but new purchases will still require a reorg.

• Just as different choices of sort order affect the “run-time” performance of your CD collection, different choices of data structures and algorithms will affect the performance of the code that you write.

Page 4

• To discuss algorithms, we first need some notation and definitions.

• An algorithm is a computational procedure that takes a set of values as input and returns a set of values as output (of course, a set may only have one element, or none).

• The running time of an algorithm is the time it takes to “transform” the input into the output, expressed in terms of the size of the input.

• For example, the running time of a square matrix addition could be expressed as cn^2 + d, where c and d are constants and n is the number of rows or columns in each matrix.
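
• As a concrete illustration, the nested loops below show where that polynomial comes from (a minimal sketch; the function name and the array-of-row-pointers representation are just assumptions for the example):

void MatrixAdd (int** ppiA, int** ppiB, int** ppiSum, int n) {
    for (int iRow = 0; iRow < n; iRow++)        // n rows...
        for (int iCol = 0; iCol < n; iCol++)    // ...times n columns: n^2 additions, the cn^2 term
            ppiSum[iRow][iCol] = ppiA[iRow][iCol] + ppiB[iRow][iCol];
}

The d term covers the fixed overhead outside the loops (such as the call itself).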

Page 5

• There are two different ways to express running time: worst case and average case.

• The worst case running time of an algorithm is an upper bound on the running time. For any input into the algorithm, its running time will be no worse than this upper bound.

• The average case (or expected) running time of an algorithm is the running time of the algorithm when given the average input. The average case can be thought of in statistical or “random” terms, although sometimes it’s not that easy to determine what the “average” input actually is.

Page 6

• For example, if we’re inserting elements into a sorted array, the worst case running time is of course cn + d, where n is the number of elements already inserted into the array.

• However, the average case running time is somewhat less, because the “average” insertion will end up somewhere near the middle of the array. So the average case running time is cn/2 + d (see the sketch below).

• One can also define the notion of best case running time, but this usually isn’t a useful consideration unless the best case is also the expected case.
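
• To make the cn and cn/2 terms concrete, here is a minimal sketch of sorted-array insertion (the function name is an assumption for the example, as is the requirement that piArray have room for at least iCount + 1 elements):

void SortedInsert (int piArray[], int iCount, int iNew) {
    int i = iCount;
    while (i > 0 && piArray[i - 1] > iNew) {    // scan from the right, shifting larger elements over by one
        piArray[i] = piArray[i - 1];
        i--;
    }
    piArray[i] = iNew;                          // drop the new element into the gap
}

The loop runs iCount times in the worst case (inserting a new smallest element) and about iCount/2 times on average.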

Page 7

• Up until now, algorithm running times have been expressed as polynomials with constant coefficients. However, these polynomials can be pointlessly complex, with many insignificant terms. A more useful way of expressing running time is in terms of its order of growth.

• So, for example, if the size of the input (n) is very large, the polynomial an^2 + bn + c is dominated by the n^2 term. The leading coefficient a is also ignored, because a constant factor does not change the rate of growth.

• Therefore, we say that an algorithm with a worst case running time expressed as the polynomial an^2 + bn + c has a worst case running time of “order n squared” or O(n^2).

Page 8

• So what does O(f(n)) mean?

• The formal definition is the following: for a given function f(n), we denote by O(f(n)) the set of functions

O(f(n)) = { g(n) : there exist positive constants c and n_0 such that 0 <= g(n) <= c·f(n) for all n >= n_0 }. (CLR, p. 26)

• What this means in English is that a function g(n) is O(f(n)) if, beyond a given input size n_0, g(n) is less than or equal to c·f(n) for some constant c.

• In other words, g(n) is O(f(n)) if g’s rate of growth is no worse than f’s. So n is O(n), and also O(n^2).
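
• For example, 3n^2 + 10n is O(n^2): choosing c = 4 and n_0 = 10 gives 3n^2 + 10n <= 4n^2 for all n >= 10.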

Page 9

• O-notation is generally considered to be the standard notation for describing algorithm running times. However, there are other notations (little o, omega, theta) that are also useful. CS 410 discusses them at length. We won’t.

• Most commonly used algorithms tend to possess one of the following runtimes:

O(1)        Excellent
O(lg n)     Very good (the log uses base 2 by default)
O(n)        Pretty good
O(n lg n)   Decent
O(n^2)      Slow
O(n^3)      Very slow

Page 10

• Of course, the “rating” of an algorithm’s running time depends on what the algorithm is doing. For example, (correct) matrix multiplication in O(n lg n) is so good that it’s impossible! The best matrix multiplication algorithms are between O(n^2) and O(n^3).

• General case sorting algorithms, on the other hand, are usually O(n lg n). There are sorting algorithms that are O(n), but they achieve this by making some type of assumption about the nature of the input.

• For example, a sorting algorithm called counting sort assumes that each of the elements in the array to be sorted is an integer between 1 and k, for some integer k.
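
• Here is a minimal sketch of counting sort under that assumption (the function name is an assumption for the example; every element of piArray is assumed to be an integer between 1 and k):

void CountingSort (int piArray[], int iLength, int k) {
    int* piCount = new int[k + 1];
    for (int iValue = 0; iValue <= k; iValue++)     // clear the counters
        piCount[iValue] = 0;
    for (int i = 0; i < iLength; i++)               // count the occurrences of each value: O(n)
        piCount[piArray[i]]++;
    int iOut = 0;
    for (int iValue = 1; iValue <= k; iValue++)     // rewrite the array in sorted order: O(n + k)
        while (piCount[iValue]-- > 0)
            piArray[iOut++] = iValue;
    delete [] piCount;
}

Because it never compares elements against each other, the whole sort is O(n + k) rather than O(n lg n).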

Page 11

• The classic sorting algorithm is called Quicksort. It uses a recursive “divide and conquer” scheme to provide an expected case running time of O(n lg n) with small constants. Its worst case running time, however, is O(n^2).

• Quicksort works by choosing a “pivot” element and partitioning the array into two pieces: one containing the elements that are larger than the pivot element and one containing the elements that are smaller.

• Repeated recursive calls to Quicksort on the two pieces finally yield a sorted array. Each call in turn chooses a new pivot element and partitions the sub-array again.

Page 12

template <class T> void Swap (T* pSortBy, int iLeft, int iRight) {
    T tTempSortBy;
    tTempSortBy = pSortBy[iLeft];
    pSortBy[iLeft] = pSortBy[iRight];
    pSortBy[iRight] = tTempSortBy;
}

template <class T> void QuickSort (T* pSortBy, int iLeft, int iRight) {
    int i, iLast;
    if (iLeft >= iRight)                                // fewer than two elements: nothing to sort
        return;
    Swap<T> (pSortBy, iLeft, (iLeft + iRight) / 2);     // move the middle element (the pivot) to the front
    iLast = iLeft;
    for (i = iLeft + 1; i <= iRight; i++)               // partition: smaller elements move left of iLast
        if (pSortBy[i] < pSortBy[iLeft])
            Swap<T> (pSortBy, ++iLast, i);
    Swap<T> (pSortBy, iLeft, iLast);                    // put the pivot into its final position
    QuickSort<T> (pSortBy, iLeft, iLast - 1);           // sort the left partition
    QuickSort<T> (pSortBy, iLast + 1, iRight);          // sort the right partition
}

Page 13

• Quicksort is a frequently used sorting algorithm because it has good average case performance and it sorts arrays in place. This means that it is fast and needs no “scratch” memory outside of the original array.

• In real life, however, recursive implementations of Quicksort can consume significant amounts of memory because of the accumulation of stack frames. Performance also tends to suffer, because function calls incur substantial overhead: saving and re-loading registers, pipeline stalls caused by conditional branching, etc.

• Iterative implementations of Quicksort will run faster and consume less memory.
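
• One possible shape for such an implementation is sketched below: the recursion is replaced by an explicit stack of (left, right) sub-array ranges, reusing the Swap helper from the previous page (std::vector is used here for brevity; a fixed-size array of ranges would avoid heap allocation entirely):

#include <utility>
#include <vector>

template <class T> void QuickSortIterative (T* pSortBy, int iLeft, int iRight) {
    std::vector< std::pair<int, int> > ranges;      // pending sub-array ranges
    ranges.push_back (std::make_pair (iLeft, iRight));
    while (!ranges.empty ()) {
        int l = ranges.back ().first;
        int r = ranges.back ().second;
        ranges.pop_back ();
        if (l >= r)                                 // fewer than two elements: nothing to sort
            continue;
        Swap<T> (pSortBy, l, (l + r) / 2);          // move the middle element (the pivot) to the front
        int iLast = l;
        for (int i = l + 1; i <= r; i++)            // partition, exactly as in the recursive version
            if (pSortBy[i] < pSortBy[l])
                Swap<T> (pSortBy, ++iLast, i);
        Swap<T> (pSortBy, l, iLast);                // put the pivot into its final position
        ranges.push_back (std::make_pair (l, iLast - 1));
        ranges.push_back (std::make_pair (iLast + 1, r));
    }
}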

Page 14

• As a simple algorithmic example, we’ll consider the problem of maintaining a sorted set of integers. The first implementation we’ll consider is a simple array or vector.

• Insertion of an integer into a sorted array is an O(n) operation, both worst case and expected case. That’s because an insertion must scan through the array to find the appropriate location and then shift the rest of the elements over by one; between the scanning and the shifting, most of the array is touched. Deletion is the same: O(n).

• Deletion includes a scanning phase and a compaction phase.
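
• A minimal sketch of both phases (the function name is an assumption for the example; the function returns the new element count):

int SortedDelete (int piArray[], int iCount, int iVictim) {
    int i = 0;
    while (i < iCount && piArray[i] != iVictim)     // scanning phase: locate the element, O(n)
        i++;
    if (i == iCount)                                // not found: nothing to delete
        return iCount;
    for (; i < iCount - 1; i++)                     // compaction phase: close the gap, O(n)
        piArray[i] = piArray[i + 1];
    return iCount - 1;
}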

Page 15

• The search case is more interesting: the naive approach is to scan the array from left to right until the element is found. This is an O(n) operation.

• However, a more intelligent approach will give us an algorithm with O(lg n) running time.

• It’s called binary search, and its clearest implementation is recursive. It takes advantage of the fact that the array is sorted: it repeatedly divides the current “search region” of the array into two pieces, only one of which can contain the desired value, and then recursively calls itself on the new, smaller region. This implies about log base 2 of n operations if there are n elements: hence O(lg n).

Page 16

int BinarySearch (int piArray[], int iSearch, int iLeft, int iRight) {
    if (iLeft > iRight)                                 // empty search region: not found
        return -1;
    int iMiddle = (iLeft + iRight) / 2;
    if (piArray[iMiddle] == iSearch)                    // found it: return the index
        return iMiddle;
    if (piArray[iMiddle] > iSearch)                     // desired value is in the left half
        return BinarySearch (piArray, iSearch, iLeft, iMiddle - 1);
    else                                                // desired value is in the right half
        return BinarySearch (piArray, iSearch, iMiddle + 1, iRight);
}

Page 17

• But insertion and deletion are still O(n). There’s really no way to improve upon this if we use linear data structures, such as vectors or linked lists. To optimize insertion and deletion, we’ll need to use a different data structure.

• We consider trees next.

• Trees are data structures that are organized into a set of nodes and edges. Ordered trees are trees that have a “root” node and distinguish between “ancestors” and “descendants” judging by their proximity to the root node. The nodes of ordered trees also enumerate their “children” and assign ordinal values to them.

Page 18

• Binary trees are a special case of ordered trees.

• A binary tree is a structure that either:
  - contains no nodes, or
  - is comprised of three disjoint sets of nodes: a root node, a binary tree called its left subtree, and a binary tree called its right subtree. (CLR, p. 94)

• Each node thus has a left subtree and a right subtree, either of which may be empty.

• The nodes at the bottom of the tree are called “leaf” nodes. So computer science trees are upside down when compared to “real” trees.

Page 19

• If we use a binary tree to store our set of sorted integers, we can ensure that they remain sorted by pushing them down the tree using the following criterion: if the value being inserted is greater than the current node’s value, insert it into the right subtree; otherwise, insert it into the left subtree.

• When the insertion hits the bottom of the tree, a new node is created for it.

• The expected insertion time for binary trees is O(lg n). However, the worst case insertion time is O(n), because if the input arrives in sorted order, the binary tree essentially becomes a linked list.
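
• A minimal sketch of the insertion rule described above (the Node structure and function name are assumptions for the example):

struct Node {
    int iValue;
    Node* pLeft;
    Node* pRight;
};

Node* TreeInsert (Node* pRoot, int iNew) {
    if (pRoot == 0) {                       // hit the bottom of the tree: create a new node
        Node* pNew = new Node;
        pNew->iValue = iNew;
        pNew->pLeft = 0;
        pNew->pRight = 0;
        return pNew;
    }
    if (iNew > pRoot->iValue)               // greater than the current value: go right
        pRoot->pRight = TreeInsert (pRoot->pRight, iNew);
    else                                    // otherwise: go left
        pRoot->pLeft = TreeInsert (pRoot->pLeft, iNew);
    return pRoot;
}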

Page 20

• Deletions and searches in binary trees are also O(lg n) expected case, but O(n) worst case.

• The problem with binary trees is the issue of balance. A non-random input will unbalance the tree and hurt the performance of its operations.

• There exists a class of tree called “balanced trees” that handle this problem adequately: red/black trees, 2-3-4 trees, B trees, B+ trees, R trees, etc. All of these trees use rather complicated algorithms to keep the sub-trees of each node balanced. However, the price paid for this feature is twofold: increased complexity and larger running time constants.

Page 21

• Trees are very important data structures in every field of software development.

• File systems use them extensively because they provide good performance and scalability when loaded with large numbers of files and directories. Databases also make extensive use of trees to provide functionality like fast indexing on specific data fields from huge data sets. Trees are also used to provide priority queues, which are important for problems such as scheduling.

• In other words, O(lg n) is an enormous gain over O(n) when there is a lot of data to scan through or insert into.

Page 22

• Hash-tables are data structures that allow very fast lookups of data given the data’s key. They use special algorithms called “hash functions” to produce an index from the given key.

• The index is used to look up the data in a large array of “buckets,” each of which (in the most common hash-table implementation, chaining) contains a linked list of elements, one of which is the node with the desired data.

• In order for a hash-table to be efficient, its hash function must be capable of distributing keys (with their associated data) evenly over the range of the array of buckets. A bad hash function will lead to a slow hash-table.
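
• A minimal sketch of a chained lookup for integer keys (the structure names and the trivial modulo hash are assumptions for the example; a real hash function would do more work to spread the keys evenly, and keys are assumed to be non-negative):

struct Entry {
    int iKey;
    int iData;
    Entry* pNext;                               // next entry in this bucket's chain
};

int Hash (int iKey, int iNumBuckets) {
    return iKey % iNumBuckets;                  // map the key to a bucket index
}

Entry* Lookup (Entry** ppBuckets, int iNumBuckets, int iKey) {
    // Walk the chain in the one bucket the key hashes to; with a good
    // hash function the chain has O(1) expected length.
    for (Entry* p = ppBuckets[Hash (iKey, iNumBuckets)]; p != 0; p = p->pNext)
        if (p->iKey == iKey)
            return p;
    return 0;                                   // not found
}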

Page 23

• Insertion into a hash-table with chaining is always O(1), assuming that the hash function is O(1).

• The worst-case running time for hash-table deletions and searches is of course O(n). If the data distribution is bad enough that each key hashes to the same bucket, then a hash-table is essentially a linked list.

• However, the expected running time for deletions and searches is O(1). A reasonably loaded hash-table can be expected to have O(1) elements in each bucket, so every search (a deletion is just a search followed by a linked list deletion) will be O(1).

Page 24

• Hash tables are very useful data structures because of their excellent expected running times for common operations.

• Hash-tables, like Quicksort, are a good illustration of the fact that data structures and algorithms with bad worst-case running times aren’t necessarily always slow. The worst case may not happen very often, or steps can be taken to prevent it from becoming likely.

• Hash-tables are commonly used in databases and in some operating system functions. They are very useful for lookups using string or integer keys. However, they maintain no ordering of their contents, and are thus unfit for storing sorted data sets.