46
Sorting Algorithms Biostatistics 615/815 Lecture 6

615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

  • Upload
    trantu

  • View
    236

  • Download
    1

Embed Size (px)

Citation preview

Page 1: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Sorting Algorithms

Biostatistics 615/815Lecture 6

Page 2: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Last Lecture …

Recursive Functions• Natural expression for many algorithms

Dynamic Programming• Automatic strategy for generating efficient

versions of recursive algorithms

Page 3: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Today …

Properties of Sorting Algorithms

Elementary Sorting Algorithms• Selection Sort• Insertion Sort• Bubble Sort

Page 4: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Homework 1

Limits of floating point

Important concepts …• Precision is limited and relative• Errors can accumulate and lead to error• Mathematical soundness may not be enough

Page 5: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Floating Point PrecisionSmallest value that can be added to 1• 2-52 or 2.2 * 10-16

• 2-23 or 1.2 * 10-7

Smallest value that can be subtracted from 1• 2-53 or 1.1 * 10-16

• 2-24 or 6.0 * 10-8

Smallest value that is distinct from zero• 2-1074 or 4.9 * 10-324

• 2-149 or 1.4 * 10-45

Page 6: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Calculating Powers of Φ

Two possibilities

12

1

−−

−=

=

nnn

nn

φφφ

φφφ

Page 7: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Results Using Product Formula

0 20 40 60 80

0.0

0.2

0.4

0.6

0.8

1.0

Exponent

Cal

cula

tion

Usi

ng P

rodu

ct

Page 8: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Results Using Difference Formula

0 20 40 60 80

-0.5

0.0

0.5

1.0

Exponent

Cal

cula

tion

Usi

ng D

iffer

ence

Page 9: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Relative Error on Log Scale

0 20 40 60 80

-15

-10

-50

510

15

log(abs(product - difference)/product)

Exponent

Loga

rithm

of R

elat

ive

Erro

r

Page 10: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Applications of Sorting

Facilitate searching• Building indices

Identify quantiles of a distribution

Identify unique values

Browsing data

Page 11: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Elementary Methods

Suitable for• Small datasets• Specialized applications

Prelude to more complex methods• Illustrate ideas• Introduce terminology• Sometimes useful complement

Page 12: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

... but beware!

Elementary sorts are very inefficient• Typically, time requirements are O(N2)

Probably, most common inefficiency in scientific computing• Make programs “break” with large datasets

Page 13: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Aim

Rearrange a set of keys• Using some predefined order

• Integers• Doubles• Indices for records in a database

Keys stored as array in memory• More complex sorts when we can only load

part of the data

Page 14: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Basic Building Blocks

An type for each element#define Item int

Compare two elements

Exchange two elements

Compare and exchange two elements

Page 15: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Comparing Two Elements

Define a function to compare two elements

bool isLess(Item a, Item b){ return a < b; }

Alternative is to use macros, but I don’t recommend it

#define isLess(a,b) ((a)<(b))

Page 16: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Exchanging Two ElementsThe best way is to use a C++ function

void Exchange(Item & a, Item & b){ Item temp = a; a = b; b = temp; }

But using a macro is still an alternative

#define Exchange(a,b) \{ \Item tmp = (a); \(a) = (b); \(b) = tmp; \}

Page 17: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Comparing And Exchange

Using C++ function

Item CompExch(Item & a, Item & b){ if (isLess(b, a))

Exchange(a, b); }

Using a macro

#define CompExch(a,b) \if (isLess((b),(a))) Exchange((a),(b));

Page 18: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

A Simple Sort

Gradually sort the array by:

Sorting the first 2 elementsSorting the first 3 elements…Sort all N elements

Page 19: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

A Simple Sort Routinevoid sort(Item a[], int start, int stop){int i, j;

for (i = start + 1; i <= stop; i++)for (j = i; j > start; j--)

CompExch(a[j-1], a[j]);}

Page 20: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Properties of this Simple Sort

Non-adaptive• Comparisons do not depend on data

Stable • Preserves relative order for duplicates

Requires O(N2) running time

Page 21: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Sorts We Will Examine Today

Selection Sort

Insertion Sort

Bubble Sort

Page 22: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Recipe: Selection Sort

Find the smallest element• Place it at beginning of array

Find the next smallest element• Place it in the second slot

Page 23: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

C Code: Selection Sortvoid sort(Item a[], int start, int stop)

{ int i, j;

for (i = start; i < stop; i++){int min = i;for (j = i + 1; j < stop; j++)

if (isLess(a[j], a[min])min = j;

Exchange(a[i], a[min]);}

}

Page 24: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Selection Sort

Notice:

Each exchange moves elementinto final position.

Right portion of array looks random.

Page 25: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Properties of Selection Sort

Running time does not depend on input• Random data• Sorted data• Reverse ordered data…

Performs exactly N-1 exchanges

Most time spent on comparisons

Page 26: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Recipe: Insertion Sort

The “Simple Sort” we first considered

Consider one element at a time• Place it among previously considered elements• Must move several elements to “make room”

Can be improved, by “adapting to data”

Page 27: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Improvement I

Decide when further comparisons are futile

Stop comparisons when we reach a smaller element

What speed improvement do you expect?

Page 28: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Insertion Sort (I)void sort(Item a[], int start, int stop){int i, j;

for (i = start + 1; i <= stop; i++)for (j = i; j > start; j--)

if (isLess(a[j], a[j-1])Exchange(a[j-1], a[j]);

elsebreak;

}

Page 29: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Improvement II

Notice that inner loop continues until:• First element reached, or• Smaller element reached

If smallest element is at the beginning…• Only one condition to check

Page 30: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Insertion Sort (II)void sort(Item a[], int start, int stop)

{int i, j;

// This ensures that smallest element is at the beginningfor (i = stop; i > start; i--)CompExch(a[i-1], a[i]);

// Now, we don’t need to check that j > startfor (i = start + 2; i <= stop; i++){int j = i;while (isLess(a[j], a[j-1]))

{Exchange(a[j], a[j-1]); j--;}

}}

Page 31: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Improvement III

The basic approach requires many exchanges involving each element

Instead of carrying out many exchanges …

Find out position for the new element and shift elements to the right to make room

Page 32: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Insertion Sort (III)void sort(Item a[], int start, int stop)

{int i, j;

for (i = stop; i > start; i--)CompExch(a[i-1], a[i]);

for (i = start + 2; i <= stop; i++){int j = i; Item val = a[j]; // Store the value of new elementwhile (isLess(val, a[j-1])) // Proceed through larger elements

{a[j] = a[j-1]; // Shifting things to the right …j--; }

a[j] = val; // Finally, insert new element in place }

}

Page 33: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Insertion Sort

Notice:

Elements in left portion of arraycan still change position.

Right remains untouched.

Page 34: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Properties of Insertion Sort

Adaptive version running time depends on input• About 2x faster on random data• Improvement even greater on sorted data• Similar speed on reverse ordered data

Stable sort

Page 35: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Recipe: Bubble Sort

Pass through the array• Exchange elements that are out of order

Repeat until done…

Very “popular”• Very inefficient too!

Page 36: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

C Code: Bubble Sortvoid sort(Item a[], int start, int stop){int i, j;

for (i = start; i <= stop; i++)for (j = stop; j > i; j--)

CompExch(a[j-1], a[j]);}

Page 37: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Bubble Sort

Notice:

Each pass moves one elementinto position.

Right portion of array is partiallysorted

Page 38: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Shaker Sort

Notice:

Things improve slightly if bubblesort alternates directions…

Page 39: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Notes on Bubble Sort

Similar to non-adaptive Insertion Sort• Moves through unsorted portion of array

Similar to Selection Sort• Does more exchanges per element

Stop when no exchanges performed• Adaptive, but not as effective as Insertion Sort

Page 40: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Selection Insertion Bubble

Page 41: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Performance Characteristics

Selection, Insertion, Bubble Sorts

All quadratic• Running time differs by a constant

Which sorts do you think are stable?

Page 42: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Selection Sort

Exchanges• N – 1

Comparisons• N * (N – 1) / 2

Requires about N2 / 2 operationsIgnoring updates to min variable

Page 43: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Adaptive Insertion SortHalf - Exchanges• About N2 / 4 on average (random data)• N * (N – 1) / 2 (worst case)

Comparisons• About N2 / 4 on average (random data)• N * (N – 1) / 2 (worst case)

Requires about N2 / 4 operationsRequires nearly linear time on sorted data

Page 44: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Bubble Sort

Exchanges• N * (N – 1) / 2

Comparisons• N * (N – 1) / 2

Average case and worst case very similar, even for adaptive method

Page 45: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Empirical Comparison

13818262119854000

34451529212000

8114751000

ShakerBubbleInsertion (adaptive)InsertionSelectionN

Sorting Strategy

(Running times in seconds)

Page 46: 615.06 -- Basic Sorting - University of Michigancsg.sph.umich.edu/abecasis/class/2006/615.06.pdf · • Selection Sort • Insertion Sort • Bubble Sort. ... zThe best way is to

Reading

Sedgewick, Chapter 6