Page 1

91.102 - Computing II

Sorting.

How can we sort efficiently?

You are in charge of the 1890 census and you just took delivery of one of the new “card sorters” that Mr. Hollerith got you to buy...

If you have a “card sorter”, how do you sort?

WHAT is a card sorter?

A machine that takes a deck of cards and has 26+ bins (one for each letter of the alphabet + blank + digits) to stuff the cards in.

How does it do it?

Page 2

1) Put the deck in at one end. The last character of the “key” determines the bin the card ends up in. At the end, you have 26+ small decks, each with the same last symbol in the “key”.

2) Collect up the little decks, being careful not to mess the order, and pass the total deck through the sorter again, sorting on the “next-to-last” symbol.

3) Collect up the little decks, and repeat the process until you have sorted by the first symbol of the key.

4) Done…

Page 3

With n items and a key m characters long, you have just carried out m*n “comparisons”. If n gets larger, m will remain the same as long as you don’t change the length of the key.

Not bad…O(m*n) ~ O(n), since m is constant...
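To make the card-sorter procedure concrete, here is a minimal sketch of an LSD radix sort in C. Everything in it (the key length M, the character-string representation, the helper sorter_pass) is an assumption for illustration, not something from the slides: it makes one stable "bin" pass per key position, last position first.

#include <stdio.h>
#include <stdlib.h>

#define M    3      /* key length (assumed) */
#define BINS 256    /* one bin per possible character */

/* One pass of the "card sorter": distribute the cards by the character at
   position pos, then collect the bins back up in order (a stable pass). */
static void sorter_pass(const char *keys[], const char *tmp[], int n, int pos)
{
    int count[BINS + 1] = {0};
    int i, b;

    for (i = 0; i < n; i++)                       /* count cards per bin        */
        count[(unsigned char) keys[i][pos] + 1]++;
    for (b = 1; b <= BINS; b++)                   /* turn counts into bin starts */
        count[b] += count[b - 1];
    for (i = 0; i < n; i++) {                     /* distribute, keeping order   */
        b = (unsigned char) keys[i][pos];
        tmp[count[b]++] = keys[i];
    }
    for (i = 0; i < n; i++)                       /* collect the bins back up    */
        keys[i] = tmp[i];
}

void RadixSort(const char *keys[], int n)
{
    const char **tmp = malloc(n * sizeof *tmp);
    int pos;
    for (pos = M - 1; pos >= 0; pos--)            /* last symbol of the key first */
        sorter_pass(keys, tmp, n, pos);
    free(tmp);
}

int main(void)
{
    const char *deck[] = { "CAB", "ACE", "BAD", "ABE", "CAD" };
    int n = sizeof deck / sizeof deck[0], i;

    RadixSort(deck, n);
    for (i = 0; i < n; i++)
        printf("%s\n", deck[i]);
    return 0;
}

Each of the m passes touches every one of the n keys once, which is exactly the O(m*n) count above.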

Page 4

This method is called "Radix Sorting" and is quite efficient (O(n)) - as long as you know certain things about the data in advance: in this case, the maximum length of the key and the range of each key element.

If you don’t know much, what can you do? How efficient can you hope to be? O(n) for a set of n items? Worse?

Since you can't find the right place for something unless you have looked at it, you certainly won't expect anything better than O(n)...

Page 5

Sorting Themes

Comparison Based:
  Transposition Sorting: BubbleSort
  Insert and Keep Sorted: InsertionSort, TreeSort
  Priority Queue Sorting: SelectionSort, HeapSort
  Divide and Conquer: QuickSort, MergeSort
  Diminishing Increment Sorting: ShellSort

Address Calculation: ProxMapSort, RadixSort

Page 6

Lower Bound on the Cost of Sorting by Comparisons.

In how many ways can n items be presented?

There are n ways in which you can choose the first item, n - 1 ways for the second, n - 2 for the third, etc…

Total number of different ways:

n*(n - 1)*(n - 2)*…*3*2*1 = n!

There are n! ways in which a sorted set could be scrambled - so there are n! possible ways in which our set could arrive, waiting to be sorted.

Page 7

Comparisons are “binary” operations: something is or is not less than something else (we need an ordering test, not just “equal” or “not equal” - which is also a binary operation). If we think of a comparison as providing the decision to “go left” or “go right” in a binary tree, we can think of sorting as choosing the path from the root (the incoming set) to a leaf (the sorted set) in a binary tree.

How many levels must a binary tree with n! leaves have (that is, how many comparisons - at least - must we expect to carry out, at one comparison per level)? The answer is log2(n!) - remember that each level going UP the tree has half the number of nodes of the level below it.

Page 8

It is not too hard to prove that:

log2(n!) ~ n log2(n)

This can be done via a formula called Stirling’s Approximation, usually proven in a Calculus II course.
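A rough way to see the growth rate without invoking Stirling (a sketch, not part of the slides):

\[
\log_2(n!) \;=\; \sum_{k=1}^{n}\log_2 k \;\le\; n\log_2 n,
\qquad
\log_2(n!) \;\ge\; \sum_{k=\lceil n/2\rceil}^{n}\log_2 k \;\ge\; \frac{n}{2}\log_2\frac{n}{2},
\]

so \(\log_2(n!)\) is squeezed between \(\tfrac{n}{2}\log_2\tfrac{n}{2}\) and \(n\log_2 n\), both of order \(n\log_2 n\).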

What we have just shown is that “sorting by comparison” is, at best, an O(n log(n)) process… more precisely, an Ω(n log(n)) process. No O(n) for us down this path.

Unfortunately, we have NOT YET provided any O(n log(n)) algorithm to sort… Let’s go find us some...

Page 9

There are a number of methods we can develop. Let’s start with some based on “priority queues”.

We will examine, primarily, methods that work when the whole data set can be kept in memory and for which the amount of extra space needed is small. Since one “desirable” property of a sorted set is that of quick search, and the fastest reliable search method we know is “binary search”, we will expect our sorted set to be kept in an array (hashing, anyone?).

We will start with the set (sorted or unsorted - we just have no way of knowing) in an array, say from 1 to n.

5 6 3 2 7 1 9 8 4 0

Sort it in ascending order.

Page 10

5 6 3 2 7 1 9 8 4 0

We are going to divide this array into TWO parts: one will be the “priority queue” part, and the other will be the “sorted” part.

To begin with, the “priority queue” part must go from index 1 to index n, while the “sorted” part will be empty.

Index: 1 2 3 4 5 6 7 8 9 10
Value: 5 6 3 2 7 1 9 8 4 0
       (all ten positions form the Priority Queue; the Sorted part is empty)

Page 11

ExtractMax(PQ) = 9, or: Select the Largest (using, for example, a function from Selection Sort), swap it with the LAST entry and reduce the size of PQ by 1:

Original:
Index: 1 2 3 4 5 6 7 8 9 10
Value: 5 6 3 2 7 1 9 8 4 0

New (the 9 swapped with the last entry; the Priority Queue is now positions 1 through 9):
Index: 1 2 3 4 5 6 7 8 9 10
Value: 5 6 3 2 7 1 0 8 4 9

Page 12

The maximum extracted (9) now sits in the freed array position:

Index: 1 2 3 4 5 6 7 8 9 10
Value: 5 6 3 2 7 1 0 8 4 | 9
       (Priority Queue)    (Sorted)

ExtractMax = 8: Select the largest again and swap it with the last entry of the queue. Reduce the Queue by 1.

Index: 1 2 3 4 5 6 7 8 9 10
Value: 5 6 3 2 7 1 0 4 | 8 9
       (Priority Queue)  (Sorted)

Repeating the process until the priority queue is empty leaves us with a sorted array occupying the same space as the original one.

Page 13

Cost?

Each ExtractMax looks at all the remaining elements of the PQ: n on the first pass, n - 1 on the second, etc., plus it has to perform a swap and some housekeeping. The cost is then (n - 1) + … + 2 + 1 comparisons to find all the successive largest elements, and (n - 1) swaps. Adding them up, we get n(n - 1)/2 from the comparisons and n - 1 from the swaps - O(n^2) in total… This is not too good, since it means that doubling the number of items to be sorted quadruples the time it takes to sort them… 1,000 times as many items, 1,000,000 times as long… bummer…
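A minimal sketch of this selection-based scheme (plain int keys and 0-based indexing are assumptions for illustration, not the slides' code):

void SelectionSort(int A[], int n)
{
    int queue_size, i, max_pos, temp;

    for (queue_size = n; queue_size > 1; queue_size--) {
        /* ExtractMax by scanning the whole remaining queue */
        max_pos = 0;
        for (i = 1; i < queue_size; i++)
            if (A[i] > A[max_pos])
                max_pos = i;

        /* swap the maximum with the last entry of the queue; */
        /* it now sits in its final, "sorted" position        */
        temp = A[max_pos];
        A[max_pos] = A[queue_size - 1];
        A[queue_size - 1] = temp;
    }
}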

Can we do better? Well, there is another way of managing a Priority Queue - a Heap… that might be better.

Page 14

The requirement is that we first make a Heap out of the array - how cheap is that?

Index: 1 2 3 4 5 6 7 8 9 10
Value: 5 6 3 2 7 1 9 8 4 0

Think of this as being a complete binary tree. Start at position n/2, and look at positions n and n+1 (the children of n/2 in the heap). If the item at n/2 is smaller than either of its children, swap it with the larger of the two.

Since 7 > 0 and array position 11 doesn’t exist, leave everything as it is.

Page 15

The array is still:

Index: 1 2 3 4 5 6 7 8 9 10
Value: 5 6 3 2 7 1 9 8 4 0

Move up to n/2 - 1. This is position 4, with children at positions 8 and 9. Since 8 > 2 and 8 > 4, swap the 2 and the 8 to get:

Index: 1 2 3 4 5 6 7 8 9 10
Value: 5 6 3 8 7 1 9 2 4 0

Since index 8 has no children, stop. Move up one more position: 3, with children at 6 and 7. Since 9 > 3 (and 9 > 1), swap the 3 and the 9:

Index: 1 2 3 4 5 6 7 8 9 10
Value: 5 6 9 8 7 1 3 2 4 0

The 3, now at index 7, has no children, so stop.

Page 16

Now to position 2, with children at 4 and 5: 8 > 6, and 8 > 7. Swap 6 and 8:

Index:  1 2 3 4 5 6 7 8 9 10
Before: 5 6 9 8 7 1 3 2 4 0
After:  5 8 9 6 7 1 3 2 4 0

We have to check that the item swapped into position 4 satisfies the heap property with respect to its children at 8 and 9: since 6 > 2 and 6 > 4, everything is OK.

To position 1. 5 < 8, 5 < 9. Swap 5 and 9.

Page 17

Index: 1 2 3 4 5 6 7 8 9 10
Value: 9 8 5 6 7 1 3 2 4 0

Notice 5 is larger than either of its children - at 6 and 7. Stop. The next index on the left would be 0: we have a heap.

Page 18

What did it cost? We started from the middle, so we repeated the process only floor(n/2) times. Each time we compared an item with its immediate descendants, and then (maybe) with a next pair of descendants, and so on, until we had no more descendants. Since we are looking at a complete binary tree, the number of descendants of each item we look at will never be more than 2*log2(n) (two for the root, two for the chosen child, two for the chosen grandchild, up to a maximum of log2(n) generations of descendants), so the worst case for constructing a heap is O(n log2(n)) - one can actually show that it is better than that, but we don’t need to.
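A sketch of the construction just described, on a 1-indexed array A[1..n] of int keys (index 0 unused) to match the slides' indexing; the function names SiftDown and BuildHeap are assumptions:

void SiftDown(int A[], int n, int p)
{
    int child, temp;

    while (2 * p <= n) {                      /* while p has at least one child  */
        child = 2 * p;                        /* left child                      */
        if (child + 1 <= n && A[child + 1] > A[child])
            child++;                          /* pick the larger of the two      */
        if (A[p] >= A[child])
            break;                            /* heap property already holds     */
        temp = A[p]; A[p] = A[child]; A[child] = temp;
        p = child;                            /* follow the swapped item down    */
    }
}

void BuildHeap(int A[], int n)
{
    int p;
    for (p = n / 2; p >= 1; p--)              /* start at n/2 and move left      */
        SiftDown(A, n, p);
}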

Page 19

We know that ExtractMax will have cost O(log2(n)) - since all we need is to rebuild a heap. Let’s take a look:

Index: 1 2 3 4 5 6 7 8 9 10
Value: 9 8 5 6 7 1 3 2 4 0

ExtractMax = 9; the 0 in position 10 is moved to position 1 and “percolates” to its correct position:

Page 20

0 8 5 6 7 1 3 2 4 _        (0 moved to the root; position 10 is free)
8 0 5 6 7 1 3 2 4 _        (0 swapped with its larger child, 8)
8 7 5 6 0 1 3 2 4 _        (0 swapped with 7; Heap | Free)
8 7 5 6 0 1 3 2 4 9        (the extracted 9 placed at position 10; Heap | Sorted)

Repeat: ExtractMax = 8;

4 7 5 6 0 1 3 2 _ 9        (last heap item, 4, moved to the root; Heap | free | Sorted)
7 4 5 6 0 1 3 2 _ 9        (4 swapped with 7)
7 6 5 4 0 1 3 2 _ 9        (4 swapped with 6)

Index: 1 2 3 4 5 6 7 8 9 10

Page 21

Index: 1 2 3 4 5 6 7 8 9 10
Value: 7 6 5 4 0 1 3 2 | 8 9
       (Heap)            (Sorted)

ExtractMax = 7; etc… Continue until done…

What is the cost of this part? We must extract n elements, and reconstructing the heap each time costs

O(log2(size_of_heap)) ≤ O(log2(n)).
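A sketch of this extraction phase, reusing the SiftDown and BuildHeap sketch from a few pages back (an illustration, not the textbook's code; 1-indexed array A[1..n]):

void HeapSort(int A[], int n)
{
    int heap_size, temp;

    BuildHeap(A, n);                        /* O(n log2(n)) at worst          */
    for (heap_size = n; heap_size > 1; heap_size--) {
        temp = A[1];                        /* ExtractMax: the root ...       */
        A[1] = A[heap_size];                /* last heap item moves to root   */
        A[heap_size] = temp;                /* max goes into the sorted region */
        SiftDown(A, heap_size - 1, 1);      /* rebuild the heap: O(log2(n))   */
    }
}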

Page 22

Total cost:

O(n log2(n)) (to make the heap)

+ O(n log2(n)) (to sort with the Priority Queue method)

= O(n log2(n)).

Looks better, but more complicated (the law of no free lunch…).

Page 23

What other methods can we cook up? If we can split the array into two roughly equal parts, we might be able to sort both parts separately and “glue” the results together. A requirement would be that the “glue” be cheap: little or no extra work… Can we do it?

Index: 1 2 3 4 5 6 7 8 9 10
Value: 5 6 3 2 7 1 9 8 4 0

This (just splitting) won't quite do it - but maybe a variant will… after all, if you split in half enough times, you get subarrays with just one element, and ALL sets with just one element are SORTED. The question becomes: do we do the work before we split or after?

Page 24

Let's try doing the work before:

Move all the “small” elements into the “bottom half” and the large ones into the “top half” :

Index: 1 2 3 4 5 6 7 8 9 10
Value: 1 4 3 2 0 5 9 8 6 7

Notice that we have “just done it” - no algorithm given - without requiring that anything be sorted. Repeat the process with the left half and with the right half:

Page 25

1 0 3 2 4 5 6 8 9 7
0 1 2 3 4 5 6 7 9 8
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9

(Index: 1 2 3 4 5 6 7 8 9 10)

How many times did we divide the set? O(log2(n)), since we halved the size of the set on each pass. Unfortunately we did something else: move all the small items into the left half and all the large ones into the right half. This will cost O(n) on each pass...

Page 26

Total Cost O(n log2(n)) - not bad, EXCEPT: we have no algorithm to decide on how to move the small items left and the large ones right. What is “small”? We don’t know what is coming in, and we would like to avoid a pass through the array just to find out - and it may not be too useful anyway.

Small: pick an “arbitrary” element of the array: small is everything smaller than it, large is everything larger than it. Split the array into two parts based on this arbitrary choice. Repeat as needed, until you have arrays of 1 element - which are sorted by definition.

Page 27

Arbitrary: element in first position (any fixed, or random choice will do: the text chooses the MIDDLE element of the array).

There is NO BEST CHOICE… You may want to choose a random index position at each choice point: the problem is that you may use more computational power to look for a good pivot than you save by the choice...

Page 28

Index: 1 2 3 4 5 6 7 8 9 10
Value: 5 6 3 2 7 1 9 8 4 0

Pivot = 5 (the element in the first position)

Start incrementing indices from the left (i) as long as you have elements LESS than the pivot (you stop immediately here).

Start decrementing indices from the right (j) for as long as you have elements GREATER THAN the pivot (you stop immediately here too).

If i ≤ j swap:

Page 29

Index: 1 2 3 4 5 6 7 8 9 10
Value: 0 6 3 2 7 1 9 8 4 5      (i = 1, j = 10; Pivot = 5)

Increment i, decrement j, and go back to the top. Notice that i is 2 and j is 9: the inequalities are again right for a swap.

Index: 1 2 3 4 5 6 7 8 9 10
Value: 0 6 3 2 7 1 9 8 4 5      (i = 2, j = 9; Pivot = 5)

Page 30

Index: 1 2 3 4 5 6 7 8 9 10
Value: 0 4 3 2 7 1 9 8 6 5      (after the swap; i = 2, j = 9; Pivot = 5)

Value: 0 4 3 2 7 1 9 8 6 5      (scan again: i stops at 5 on the 7, j stops at 6 on the 1)

Value: 0 4 3 2 1 7 9 8 6 5      (after the swap: now j = 5 and i = 6 - they have crossed. Stop...)

Value: 0 4 3 2 1 7 9 8 6 5

Repeat on each of the subarrays...

Page 31

How about some code?

void QuickSort(SortingArray A, int m, int n)
{
   // sorts the subarray A[m:n] of array A into ascending order
   int i, j;

   if (m < n) {
      i = m;                    // initially i and j point to the
      j = n;                    // first and last items
      Partition(A, &i, &j);     // partitions A[m:n] into A[m:j] and A[i:n]
      QuickSort(A, m, j);
      QuickSort(A, i, n);
   }
}

Page 32

void Partition(SortingArray A, int *i, int *j)
{
   KeyType Pivot, Temp;

   Pivot = A[(*i + *j) / 2];              // middle key as pivot

   do {
      while (A[*i] < Pivot) (*i)++;       // find leftmost i such that A[*i] >= Pivot
      while (A[*j] > Pivot) (*j)--;       // find rightmost j such that A[*j] <= Pivot

      if (*i <= *j) {                     // if i and j didn't cross, swap
         Temp = A[*i];                    // A[*i] and A[*j]
         A[*i] = A[*j];
         A[*j] = Temp;
         (*i)++;                          // move i one space right
         (*j)--;                          // move j one space left
      }
   } while (*i <= *j);                    // while i and j have not crossed yet
}

Page 33

What is the down side? Sort:

0 1 2 3 4 5 6 7 8 9      (Pivot = 0, the first element - as in our walkthrough)
  1 2 3 4 5 6 7 8 9      (Pivot = 1)
    2 3 4 5 6 7 8 9      (Pivot = 2)
...

The process is repeated NOT log2(n) times but n times…

The size of the sets to be partitioned goes down by 1 at each level, and we have lost the O(n log2(n)) sort…

Page 34

This method, called QuickSort, does all its work at the time of partitioning the original set into two - you hope - “equal size” subsets. If the data come in skewed - so that the two sides of the partition are no longer “roughly equal” (I haven’t defined this, and it is NOT as bad as it sounds) - the “quick sorting” becomes O(n^2) and you might as well use other methods.

Page 35

Since the partitioning phase is where QuickSort is vulnerable, is there some other way of “dividing and conquering” where the “divide” is reliable?

Divide in half BEFORE you do any work - so there is no chance that you could mess up the division - and leave the work for later. The ideal algorithm of the procrastinator! (There IS value in the procrastinator’s approach: if you can afford to wait, many problems go away by themselves...)

Page 36

MergeSort. We will see that this algorithm is also good for data sets that are too large to fit in memory. It was originally devised when the only large scale memory available was the external TAPE...

5 6 3 2 7 1 9 8 4 0
Split in half:
5 6 3 2 7 | 1 9 8 4 0
Split in half:
5 6 | 3 2 7 | 1 9 | 8 4 0
Split in half:
5 | 6 | 3 | 2 7 | 1 | 9 | 8 | 4 0
Split in half:
5 | 6 | 3 | 2 | 7 | 1 | 9 | 8 | 4 | 0

Page 37

We are all split up. Now what?

Note that all sets of cardinality 1 are sorted - by definition. We started with an unsorted set and we are finishing with n (= 10) sorted ones. The idea: put them back together, two at a time, keeping the sorted property.

5 | 6 | 3 | 2 | 7 | 1 | 9 | 8 | 4 | 0      (ten sorted singletons)

Merging the smallest pieces first produces 2 7 and 0 4:
5 | 6 | 3 | 2 7 | 1 | 9 | 8 | 0 4

and then, one level up:
5 6 | 2 3 7 | 1 9 | 0 4 8

Let’s see exactly how the algorithm works: merge I = 5 6 with II = 2 3 7.

Page 38

I = 5 6 (j at the 5)      II = 2 3 7 (k at the 2)      III = empty (l at the first slot)

Start with two indices, referring to the beginning positions of the two sorted arrays (I[j], II[k]). Allocate an empty array III[l] of size equal to the sum of the sizes of I and II. Since II[k] < I[j], insert the contents of II[k] into III[l] and increment k and l.

I = 5 6 (j at the 5)      II = 2 3 7 (k at the 3)      III = 2

Since II[k] < I[j], repeat the copy of II[k] into III[l] and increment k and l.

Page 39

I = 5 6 (j at the 5)      II = 2 3 7 (k at the 7)      III = 2 3

Now I[j] < II[k], so copy I[j] into III[l] and increment j and l.

I = 5 6 (j at the 6)      II = 2 3 7 (k at the 7)      III = 2 3 5

Again I[j] < II[k], so copy I[j] into III[l] and increment j and l.

I = 5 6 (j past the end)      II = 2 3 7 (k at the 7)      III = 2 3 5 6

Now j is out of range for I, so just copy the rest of II into III...

Page 40

I = 5 6      II = 2 3 7      III = 2 3 5 6 7      Done!!!

Repeat for the other half (1 9 and 0 4 8):

0 1 4 8 9

And now repeat with the two original halves:

2 3 5 6 7      0 1 4 8 9

Page 41

void MergeSort(ListType List)
{
   if (the List has more than one item in it) {
      (break the List into two half-lists, L = LeftList and R = RightList)
      (sort the LeftList using MergeSort(L))
      (sort the RightList using MergeSort(R))
      (merge L and R into a single sorted List)
   } else {
      (do nothing, since the list is already sorted)
   }
}
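A concrete, array-based rendering of this pseudocode, using the merge scheme from the walkthrough. The function names, the inclusive bounds lo/hi and the temporary buffer III are assumptions for illustration, not the textbook's code:

#include <stdlib.h>

/* Merge combines A[lo..mid] and A[mid+1..hi], both already sorted. */
static void Merge(int A[], int lo, int mid, int hi)
{
    int size = hi - lo + 1;
    int *III = malloc(size * sizeof *III);     /* the "third array"         */
    int j = lo, k = mid + 1, l = 0;

    while (j <= mid && k <= hi)                /* copy the smaller head      */
        III[l++] = (A[j] <= A[k]) ? A[j++] : A[k++];
    while (j <= mid) III[l++] = A[j++];        /* one side ran out: copy the */
    while (k <= hi)  III[l++] = A[k++];        /* rest of the other side     */

    for (l = 0; l < size; l++)                 /* copy back over A[lo..hi]   */
        A[lo + l] = III[l];
    free(III);
}

void MergeSortRange(int A[], int lo, int hi)
{
    if (lo < hi) {
        int mid = (lo + hi) / 2;               /* split in half              */
        MergeSortRange(A, lo, mid);            /* sort the left half         */
        MergeSortRange(A, mid + 1, hi);        /* sort the right half        */
        Merge(A, lo, mid, hi);                 /* glue the two halves        */
    }
}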

Page 42

Cost?

In TIME: since “splitting in half” occurs O(log2(n)) times, we are going to have that many “levels of merging” before we are back up with the fully merged array. At each level, the merging will position one element correctly for every comparison - so O(n) comparisons (and swaps) per level. Total O(n log2(n)).

Since the splitting decision did NOT depend on chance, there is no way this scheme can go wrong… except…

Page 43

Cost?

In SPACE: the mergings seem to require a NEW array of size equal to the sum of the sizes of the arrays to be merged: drats! Twice as much space. And I thought I was going to get a free lunch (or at least a cheap one)!

If we used linked lists, we would not need as much space… but that would complicate our “splittings”, and the subsequent search: again, no free lunch.

Page 44

One possibility for space reduction observes that the two parts of the array are contiguous, copies the first half into a new array (only half the size) and frees enough space to carry out the procedure using the old array as the target.

We use only 50% more space; need more time to copy into the array before we merge; gain a little time (probabilistically), since the tail end - if nonempty - of the second part does not need to be copied at all.

Using the halves from before - I = 5 6 and II = 2 3 7 sitting side by side in the array - copy I out into a separate buffer (half the total size). The positions I used to occupy are now free and become the start of the merge target III, while II = 2 3 7 stays where it is. Merge the buffer (index j) and II (index k) back into the array from the left (index l)...
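A sketch of this half-space merge (the names buf, lo, mid, hi are assumptions), with j indexing the copied-out first half, k the second half, and l the target, as in the picture above:

#include <stdlib.h>

void MergeHalfSpace(int A[], int lo, int mid, int hi)
{
    int left_size = mid - lo + 1;
    int *buf = malloc(left_size * sizeof *buf);   /* only 50% extra space      */
    int j, k = mid + 1, l = lo;

    for (j = 0; j < left_size; j++)               /* copy the first half out   */
        buf[j] = A[lo + j];

    j = 0;
    while (j < left_size && k <= hi)              /* merge buf and A[k..hi]    */
        A[l++] = (buf[j] <= A[k]) ? buf[j++] : A[k++];   /* back into A        */
    while (j < left_size)
        A[l++] = buf[j++];                        /* leftover buffer items     */
    /* any leftover items of the second half are already in their final
       positions: the tail end of the second part is never copied at all */

    free(buf);
}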

Page 45

Another possibility for space reduction would use a linked-list based queue. Elements of the first subarray about to be overwritten are enqueued and then overwritten. The comparison between the heads of the first and second subarrays uses the queue to obtain the current head of the first subarray (unless the queue is empty).

In this case, it would be important to have the queue implemented so that calls to malloc and free are minimized (the queue maintains its own free list).

This method is likely (probabilistically) to use less space (about 1/2 for large data items - why?) than the previous one, and to require fewer copies to be made. Even more complex to code…

Page 46

Why is this method good for “external sorting”?

Take three tapes: A, B, C. The sets on A and B are sorted in increasing order, C is empty.

1) read the first items from A and from B.

2) Compare the item from A with the item from B.

3) Copy the smaller item into C starting at the beginning of empty space.

4) Read a new item from the tape that provided the smaller item.

5) if there is a new item go back to 2), otherwise

6) Copy all the unread items in the remaining tape to tape C.

7) Quit.
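A sketch of steps 1)-7) for two sorted "tapes" held as ordinary text files of one integer per line (the file format and the function name MergeFiles are assumptions; error handling is kept minimal):

#include <stdio.h>

void MergeFiles(const char *nameA, const char *nameB, const char *nameC)
{
    FILE *A = fopen(nameA, "r"), *B = fopen(nameB, "r"), *C = fopen(nameC, "w");
    int a, b, haveA, haveB;

    if (A == NULL || B == NULL || C == NULL) return;   /* minimal error handling */

    haveA = (fscanf(A, "%d", &a) == 1);        /* 1) read the first items        */
    haveB = (fscanf(B, "%d", &b) == 1);

    while (haveA && haveB) {                   /* 2) compare the two items       */
        if (a <= b) {
            fprintf(C, "%d\n", a);             /* 3) copy the smaller one to C   */
            haveA = (fscanf(A, "%d", &a) == 1);/* 4) refill from the same tape   */
        } else {
            fprintf(C, "%d\n", b);
            haveB = (fscanf(B, "%d", &b) == 1);
        }
    }                                          /* 5) loop until one tape is done */
    while (haveA) {                            /* 6) copy the rest of A          */
        fprintf(C, "%d\n", a);
        haveA = (fscanf(A, "%d", &a) == 1);
    }
    while (haveB) {                            /* 6) ... or the rest of B        */
        fprintf(C, "%d\n", b);
        haveB = (fscanf(B, "%d", &b) == 1);
    }
    fclose(A); fclose(B); fclose(C);           /* 7) quit                        */
}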

Page 47

If the two tapes A and B are unsorted, you need two more tapes, C and D.

a) Take the first element of A and the first element of B and write them, in sorted order, onto C; take the next element of each and write them, sorted, onto D;

b) Repeat a) until A and B have been read and copied. If one of the tapes ends before the other, just take two successive elements from the one still containing data, sort them, and keep alternating the output until done. Now C and D have sorted PAIRS.

Page 48

c) C -> A; D -> B; swap the tapes.

d) Repeat the process, with the first two elements of A and the first two of B sorted into C; the next two pairs get sorted into D;

e) repeat with sets of cardinality 4, 8, 16, 32, etc… until done.

Page 49

Notice that in External MergeSort we never needed to hold more than two items in memory at the same time.

Why don’t we use this method all the time?

Tapes are REALLY, REALLY slow…

How about disk?

Same problem: memory-to-memory copy takes on the order of nanoseconds, while disk-to-disk copy takes 10 milliseconds or more… a factor of 10^5 or worse.

Page 50

How about Linked Lists? After all, all you need to do there is move a pointer… no extra space.

Great! But fast searching still requires an array (or tree), so we need to convert from sorted linked lists to arrays (or trees): where do we get the space from?

You could copy the sorted list out to disk, and bring it back in as an array…

There is no point in bringing it in as a tree, since this would require building a balanced binary search tree from a sorted list… you might as well have built the balanced binary search tree to begin with - it would have been cheaper (why?).

Page 51

Insertion Sort (into an array).

Value: 5 6 3 2 7 1 9 8 4 0      (all of it as yet unsorted)

Remove the first item in the unsorted portion of the array and insert it into the sorted portion so that it will remain sorted. Notice the removal freed a position at the end of the sorted portion:

Value: _ 6 3 2 7 1 9 8 4 0      (Sorted: empty, plus the freed slot | as yet unsorted)

Since the initially sorted portion was empty, inserting this first element into the freed position will provide a sorted array of length 1:

Page 52

Value: 5 _ 3 2 7 1 9 8 4 0      (Sorted: 5 | the 6 has been removed | as yet unsorted)

Remove the next element from the unsorted subarray. Insert it - comparing from the beginning - into the sorted subarray, making sure that it remains sorted:

Value: 5 6 3 2 7 1 9 8 4 0      (Sorted: 5 6 | as yet unsorted)

Repeat: Remove, Compare and Insert until done.

Value: 5 6 _ 2 7 1 9 8 4 0      (the 3 has been removed)
Value: 3 5 6 2 7 1 9 8 4 0      (Sorted: 3 5 6 | as yet unsorted)

Page 53

Cost:

With no elements in the sorted part: 0 comparisons, 0 moves, 1 insertion.

With 1 element: 1 comparison and either 1 move and 1 insertion or 1 insertion (the new item goes AFTER the existing one).

With k elements in the sorted part: up to k comparisons and moves, plus one insertion.

To manage all n elements, the worst case is therefore about

1 + 2 + … + n = n(n + 1)/2 ~ O(n^2) operations.

It’s easy to code, though….
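A sketch of the idea in C for a 0-indexed int array (note: this version scans the sorted part from the back rather than "comparing from the beginning" as on the slide - the cost bound is the same):

void InsertionSort(int A[], int n)
{
    int i, k, key;

    for (i = 1; i < n; i++) {            /* A[0..i-1] is already sorted       */
        key = A[i];                      /* remove the next unsorted item     */
        k = i;
        while (k > 0 && A[k - 1] > key) {
            A[k] = A[k - 1];             /* slide larger keys rightward       */
            k--;
        }
        A[k] = key;                      /* drop the item into the open hole  */
    }
}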

Page 54

TreeSort: Insert the incoming data into a binary search tree. The tree could rely on randomness of input or be one of those guaranteed to remain balanced (AVL, Red-Black, etc.). In either case, each insertion takes time (probabilistically or deterministically) proportional to log2(k), where k is the number of items already in the tree.

log2(1) + log2(2) + … + log2(k) + … + log2(n) ~ n log2(n).

Downside: the probabilistic search tree can become a list, with each insertion costing time proportional to n - total O(n^2). The AVL trees are fairly complex to code up correctly and have more overhead...
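A sketch of TreeSort with a plain, unbalanced binary search tree (all names are illustrative; a balanced AVL or Red-Black tree would replace Insert to guarantee the bound):

#include <stdlib.h>

struct Node { int key; struct Node *left, *right; };

static struct Node *Insert(struct Node *t, int key)
{
    if (t == NULL) {
        t = malloc(sizeof *t);
        t->key = key; t->left = t->right = NULL;
    } else if (key < t->key) {
        t->left = Insert(t->left, key);
    } else {
        t->right = Insert(t->right, key);   /* duplicates go to the right */
    }
    return t;
}

static int InOrder(struct Node *t, int A[], int out)
{
    if (t != NULL) {
        out = InOrder(t->left, A, out);
        A[out++] = t->key;                  /* visit keys in ascending order */
        out = InOrder(t->right, A, out);
    }
    return out;
}

void TreeSort(int A[], int n)
{
    struct Node *root = NULL;
    int i;
    for (i = 0; i < n; i++)
        root = Insert(root, A[i]);
    InOrder(root, A, 0);                    /* tree nodes are leaked in this sketch */
}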

Page 55

ShellSort:

The idea: things that are far apart will probably remain apart (this may or may not be true).

Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Value: 5 11 3 12 7 1 10 8 4 0 2 9 13 14 6

So: first move things that are far apart, and then move things that are closer together.

Start with an “increment” that is fairly large, say 5.

Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Value: 5 11 3 12 7 1 10 8 4 0 2 9 13 14 6

Positions 0, 5, 10 form one subarray, positions 1, 6, 11 the next, and so on; each such subarray will be sorted by means of an InsertionSort algorithm.

Page 56

Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Value: 5 11 3 12 7 1 10 8 4 0 2 9 13 14 6

Sort positions 0, 5, 10; then 1, 6, 11; etc.:

1 11 3 12 7 2 10 8 4 0 5 9 13 14 6      (positions 0, 5, 10 sorted)
1 9 3 12 7 2 10 8 4 0 5 11 13 14 6      (positions 1, 6, 11 sorted)
1 9 3 12 7 2 10 8 4 0 5 11 13 14 6      (positions 2, 7, 12: already in order)
1 9 3 4 7 2 10 8 12 0 5 11 13 14 6      (positions 3, 8, 13 sorted)
1 9 3 4 0 2 10 8 12 6 5 11 13 14 7      (positions 4, 9, 14 sorted)

Change the “increment” to something smaller, say 3: choosing a sequence of increments that are mutually relatively prime should avoid repetitions of work - hopefully.

Page 57

1 9 3 4 0 2 10 8 12 6 5 11 13 14 7      (start of the increment-3 passes)
1 9 3 4 0 2 6 8 12 10 5 11 13 14 7      (positions 0, 3, 6, 9, 12 sorted)
1 0 3 4 5 2 6 8 12 10 9 11 13 14 7      (positions 1, 4, 7, 10, 13 sorted)
1 0 2 4 5 3 6 8 7 10 9 11 13 14 12      (positions 2, 5, 8, 11, 14 sorted)

Now increment by 1:

Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Value: 1 0 2 4 5 3 6 8 7 10 9 11 13 14 12

Since the elements are now nearly sorted, it makes sense to perform the insertion “from the back”: check if the new element, from the unsorted part, is LARGER than the last element of the sorted part - if true, ONE comparison and no swaps are enough...

Page 58

void ShellSort(SortingArray A)
{
   int i, delta;

   delta = n;                      // pick a value - length of the array?
   do {
      delta = 1 + delta / 3;
      for (i = 0; i < delta; ++i) {
         DeltaInsertionSort(A, i, delta);
      }
   } while (delta > 1);
}

Page 59

void DeltaInsertionSort(SortingArray A, int i, int delta)
{
   int j, k;
   KeyType KeyToInsert;
   bool NotDone;

   j = i + delta;
   while (j < n) {
      // obtain a new KeyToInsert
      KeyToInsert = A[j];

      // move each Key > KeyToInsert rightward by delta spaces
      // to open up a hole in which to place the KeyToInsert
      k = j;
      NotDone = true;
      do {
         if (A[k - delta] <= KeyToInsert) {
            NotDone = false;
         } else {
            A[k] = A[k - delta];
            k -= delta;
            if (k == i) NotDone = false;
         }
      } while (NotDone);

Page 60

      // Continue...
      // put KeyToInsert in the hole A[k] opened by moving
      // keys > KeyToInsert rightward
      A[k] = KeyToInsert;

      // consider next KeyToInsert at an increment of delta
      // to the right
      j += delta;
   }
}

Page 61

Analysis (i.e., cost): after more than 40 years there is still no complete theoretical study of this sorting method. Lots of people have tried; nobody has succeeded except in some special cases. Empirical studies indicate a time complexity of about O(n^1.25).

There are other sorts - one of the early ones is known as BubbleSort: the elements just “bubble up” to their proper position. Easy to code, bad for efficiency.

Page 62

Some time comparisons.

Array Size | QuickSort    | HeapSort     | ShellSort | BubbleSort | InsertionSort | SelectionSort | MergeSort
Bound      | O(n log2(n)) | O(n log2(n)) | O(n^1.25) | O(n^2)     | O(n^2)        | O(n^2)        | O(n log2(n))
        64 | 0.40         | 0.61         | 0.42      | 2.76       | 1.12          | 1.40          | 0.99
       128 | 0.98         | 1.43         | 1.04      | 11.36      | 4.47          | 5.56          | 2.28
       256 | 2.22         | 3.28         | 2.37      | 46.42      | 17.58         | 22.18         | 5.13
       512 | 4.94         | 7.43         | 5.44      | 189.35     | 69.89         | 88.66         | 11.45
      1024 | 10.86        | 16.57        | 11.97     | 766.22     | 280.27        | 354.48        | 25.11

Some other considerations:

size of the data set vs. ease of coding;

What are you comparing? Integers, floats, strings, playing cards?

Page 63

The Radix Sort we introduced at the beginning is O(n) - deterministically so. If you have enough information about the incoming data, this may be preferable to any other method…

If you are confident you can always choose a decent pivot, use QuickSort.

If you are convinced that a malevolent demon will always send you the worst possible data set for your sorting scheme, and the data set is large, then use HeapSort or MergeSort (if you have lots of extra memory - or too little, and so have to go 'external').

If you have a small data set, use Insertion Sort.

For a medium size data set, reasonably scrambled, you could use ShellSort.