23
CMPS 6610/4610 Algorithms 1 CMPS 6610/4610 – Fall 2016 Order Statistics Carola Wenk Slides courtesy of Charles Leiserson with additions by Carola Wenk

Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

CMPS 6610/4610 Algorithms 1

CMPS 6610/4610 – Fall 2016

Order StatisticsCarola Wenk

Slides courtesy of Charles Leiserson with additions by Carola Wenk

Page 2: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Order statisticsSelect the ith smallest of n elements (the element with rank i).• i = 1: minimum;• i = n: maximum;• i = (n+1)/2 or (n+1)/2: median.

Naive algorithm: Sort and index ith element.Worst-case running time = (n log n + 1)

= (n log n),using merge sort (not quicksort).

CMPS 6610/4610 Algorithms 2

Page 3: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Randomized divide-and-conquer algorithm

RAND-SELECT(A, p, q, i) i-th smallest of A[ p . . q] if p = q then return A[p]r RAND-PARTITION(A, p, q)k r – p + 1 k = rank(A[r])if i = k then return A[r]if i < k

then return RAND-SELECT(A, p, r – 1, i )else return RAND-SELECT(A, r + 1, q, i – k )

A[r] A[r]rp q

k

CMPS 6610/4610 Algorithms 3

Page 4: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Example

pivoti = 76 10 13 5 8 3 2 11

k = 4

Select the 7 – 4 = 3rd smallest recursively.

Select the i = 7th smallest:

2 5 3 6 8 13 10 11Partition:

CMPS 6610/4610 Algorithms 4

Page 5: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Intuition for analysis

Lucky:101log 3/4 nn

CASE 3T(n) = T(3n/4) + dn

= (n)Unlucky:

T(n) = T(n – 1) + dn= (n2)

arithmetic series

Worse than sorting!

(All our analyses today assume that all elements are distinct.)

for RAND-PARTITION

CMPS 6610/4610 Algorithms 5

Page 6: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Analysis of expected time• Call a pivot good if its rank lies in [n/4,3n/4].• How many good pivots are there?A random pivot has 50% chance of being good.

• Let T(n,s) be the runtime random variable

T(n,s) T(3n/4,s) + X(s)dntime to reduce array size to 3/4n

#times it takes tofind a good pivot

n/2

Runtime of partition

CMPS 6610/4610 Algorithms 6

Page 7: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Analysis of expected timeLemma: A fair coin needs to be tossed an expected number of 2 times until the first “heads” is seen.

Proof: Let E(X) be the expected number of tosses until the first “heads”is seen.• Need at least one toss, if it’s “heads” we are done.• If it’s “tails” we need to repeat (probability ½).

E(X) = 1 + ½ E(X) E(X) = 2

CMPS 6610/4610 Algorithms 7

Page 8: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Analysis of expected time

T(n,s) T(3n/4,s) + X(s)dntime to reduce array size to 3/4n

#times it takes tofind a good pivot

Runtime of partition

E(T(n,s)) E(T(3n/4,s)) + E(X(s)dn) E(T(n,s)) E(T(3n/4,s)) + E(X(s))dn E(T(n,s)) E(T(3n/4,s)) + 2dn Texp(n) Texp(3n/4) + (n) Texp(n) (n)

Linearity of expectation

Lemma

CMPS 6610/4610 Algorithms 8

Page 9: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Summary of randomized order-statistic selection

• Works fast: linear expected time.• Excellent algorithm in practice.• But, the worst case is very bad: (n2).

Q. Is there an algorithm that runs in linear time in the worst case?

IDEA: Generate a good pivot recursively.This algorithm has large constants though andtherefore is not efficient in practice.

A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].

CMPS 6610/4610 Algorithms 9

Page 10: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Worst-case linear-time order statistics

if i = k then return xelseif i < k

then recursively SELECT the ith smallest element in the lower part

else recursively SELECT the (i–k)th smallest element in the upper part

SELECT(i, n)1. Divide the n elements into groups of 5. Find

the median of each 5-element group by rote.2. Recursively SELECT the median x of the n/5

group medians to be the pivot.3. Partition around the pivot x. Let k = rank(x).4.

Same as RAND-SELECT

CMPS 6610/4610 Algorithms 10

Page 11: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Choosing the pivot

CMPS 6610/4610 Algorithms 11

Page 12: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Choosing the pivot

1. Divide the n elements into groups of 5.

CMPS 6610/4610 Algorithms 12

Page 13: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Choosing the pivot

lesser

greater

1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.

CMPS 6610/4610 Algorithms 13

Page 14: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Choosing the pivot

lesser

greater

1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.

2. Recursively SELECT the median x of the n/5group medians to be the pivot.

x

CMPS 6610/4610 Algorithms 14

Page 15: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Developing the recurrence

if i = k then return xelseif i < k

then recursively SELECT the ith smallest element in the lower part

else recursively SELECT the (i–k)th smallest element in the upper part

SELECT(i, n)1. Divide the n elements into groups of 5. Find

the median of each 5-element group by rote.2. Recursively SELECT the median x of the n/5

group medians to be the pivot.3. Partition around the pivot x. Let k = rank(x).4.

T(n)

(n)

T(n/5)(n)

T( )?

CMPS 6610/4610 Algorithms 15

Page 16: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Analysis

lesser

greater

x

At least half the group medians are x, which is at least n/5 /2 = n/10 group medians.

(Assume all elements are distinct.)

CMPS 6610/4610 Algorithms 16

Page 17: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Analysis

lesser

greater

x

At least half the group medians are x, which is at least n/5 /2 = n/10 group medians.• Therefore, at least 3 n/10elements are x.

(Assume all elements are distinct.)

CMPS 6610/4610 Algorithms 17

Page 18: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Analysis

lesser

greater

x

At least half the group medians are x, which is at least n/5 /2 = n/10 group medians.• Therefore, at least 3 n/10elements are x.• Similarly, at least 3 n/10elements are x.

(Assume all elements are distinct.)

CMPS 6610/4610 Algorithms 18

Page 19: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

• At least 3 n/10elements are x at most n-3 n/10elements are x

• At least 3 n/10elements are x at most n-3 n/10elements are x

• The recursive call to SELECT in Step 4 is executed recursively on n-3 n/10elements.

Analysis (Assume all elements are distinct.)

Need “at most” for worst-case runtime

CMPS 6610/4610 Algorithms 19

Page 20: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

• Use fact that a/b (a-(b-1))/b (page 51)• n-3 n/10 n-3·(n-9)/10 = (10n -3n +27)/10 7n/10 + 3

• The recursive call to SELECT in Step 4 is executed recursively on at most 7n/10+3elements.

Analysis (Assume all elements are distinct.)

CMPS 6610/4610 Algorithms 20

Page 21: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Developing the recurrence

if i = k then return xelseif i < k

then recursively SELECT the ith smallest element in the lower part

else recursively SELECT the (i–k)th smallest element in the upper part

SELECT(i, n)1. Divide the n elements into groups of 5. Find

the median of each 5-element group by rote.2. Recursively SELECT the median x of the n/5

group medians to be the pivot.3. Partition around the pivot x. Let k = rank(x).4.

T(n)

(n)

T(n/5)(n)

T(7n/10+3)

CMPS 6610/4610 Algorithms 21

Page 22: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Solving the recurrencednnTnTnT

3

107

51)(

if c is chosen large enough, e.g., c=10d)3(

101)3(

3109

)33107()3

51()(

nc

dncnnc

dnccn

dnncncnTBig-Oh Induction:T(n) c(n - 3)

for (n)

Technical trick. This shows that T(n) O(n)

CMPS 6610/4610 Algorithms 22

,

Page 23: Order Statistics - School of Science & Engineering · Worst-case linear-time order statistics if i = k then return x elseif i < k thenrecursively SELECT the ith smallest element in

Conclusions• Since the work at each level of recursion is

basically a constant fraction (9/10) smaller, the work per level is a geometric series dominated by the linear work at the root.

• In practice, this algorithm runs slowly, because the constant in front of n is large.

• The randomized algorithm is far more practical.

Exercise: Try to divide into groups of 3 or 7.CMPS 6610/4610 Algorithms 23