229
Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian Wild [email protected] originates from joint work with Martin Aumüller, Martin Dietzfelbinger, Conrado Martínez, and Markus Nebel AofA 2017 28th International Meeting on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 0 / 16

Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Median-of-k Quicksort Is Optimal For Many Equal Keys

Sebastian [email protected]

originates from joint work with

Martin Aumüller, Martin Dietzfelbinger,Conrado Martínez, and Markus Nebel

AofA 201728th International Meeting on Probabilistic,

Combinatorial and Asymptotic Methodsfor the Analysis of Algorithms

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 0 / 16

Page 2: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Outline

0 Intro0 Intro

1 Quicksort and Search Trees1 Quicksort and Search Trees

2 Saturated Fringe-Balanced Trees2 Saturated Fringe-Balanced Trees

3 Back to Multiset Permutations3 Back to Multiset Permutations

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 0 / 16

Page 3: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 4: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 5: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 6: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 7: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 8: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 9: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrences

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 10: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 11: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 12: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 13: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction method

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 14: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 15: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

continuous master theorem

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 16: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

continuous master theorem

pivot sampling

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 17: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

continuous master theorem

pivot sampling

Insertionsort cutoff

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 18: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

continuous master theorem

pivot sampling

Insertionsort cutoff

multiway partitioning

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 19: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

continuous master theorem

pivot sampling

Insertionsort cutoff

multiway partitioning

Quickselect

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 20: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

continuous master theorem

pivot sampling

Insertionsort cutoff

multiway partitioning

Quickselect

constant space

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 21: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

continuous master theorem

pivot sampling

Insertionsort cutoff

multiway partitioning

Quickselect

constant space

key comparisons

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 22: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

continuous master theorem

pivot sampling

Insertionsort cutoff

multiway partitioning

Quickselect

constant space

key comparisons

symbol comparisons

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 23: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

continuous master theorem

pivot sampling

Insertionsort cutoff

multiway partitioning

Quickselect

constant space

key comparisons

symbol comparisons

swaps

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 24: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

continuous master theorem

pivot sampling

Insertionsort cutoff

multiway partitioning

Quickselect

constant space

key comparisons

symbol comparisons

swaps

scanned elements

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 25: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

continuous master theorem

pivot sampling

Insertionsort cutoff

multiway partitioning

Quickselect

constant space

key comparisons

symbol comparisons

swaps

scanned elements

branch misses

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 26: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

continuous master theorem

pivot sampling

Insertionsort cutoff

multiway partitioning

Quickselect

constant space

key comparisons

symbol comparisons

swaps

scanned elements

branch misses

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 27: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

continuous master theorem

pivot sampling

Insertionsort cutoff

multiway partitioning

Quickselect

constant space

key comparisons

symbol comparisons

swaps

scanned elements

branch misses

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 28: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

continuous master theorem

pivot sampling

Insertionsort cutoff

multiway partitioning

Quickselect

constant space

key comparisons

symbol comparisons

swaps

scanned elements

branch misses

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 29: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Don’t we know everything about Quicksort by now?

Extensive literature and results on Quicksort

Type of result Analysis Techniques Algorithm variants Cost Measures

expected costs

variance

tail inequalities

limit distributions

(semi-)local limit laws

telescoping recurrencessingularity analysis

Euler differential eq.Martingales

contraction methodbranching processes

continuous master theorem

pivot sampling

Insertionsort cutoff

multiway partitioning

Quickselect

constant space

key comparisons

symbol comparisons

swaps

scanned elements

branch misses

but: most results consider random permutations as input!

partly justified: we can (shounot done in libraries . . .

ld!) randomize Quicksort, every input appears randomly ordered

Catch: Elements with equal keys won’t go away!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 1 / 16

Page 30: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Setup

Assumptions:1 Input: (A) Multiset Model:

Random permutation U1, . . . , Un of fixed multisetx1, . . . , xu number of occurrences of values 1, . . . , u

(B) Discrete i. i.d. Model:U1, . . . , Un i. i.d. with Pr[U1 = v] = qv~q = (q1, . . . , qu) a fixed universe distribution

2 fat-pivot partitioning < P P P P > P

recursive call recursive callall duplicates of pivots removed subproblems of same type, (restricted to a sub-universe)

3 Cost: # ternary comparisons

Median-of-(2t+1) Quicksort:median-of-(2t+ 1)

Example:t = 3

P

t t(extension to asymmetric sampling possible)

not today

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 2 / 16

Page 31: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Setup

Assumptions:1 Input: (A) Multiset Model:

Random permutation U1, . . . , Un of fixed multisetx1, . . . , xu number of occurrences of values 1, . . . , u

(B) Discrete i. i.d. Model:U1, . . . , Un i. i.d. with Pr[U1 = v] = qv~q = (q1, . . . , qu) a fixed universe distribution

2 fat-pivot partitioning < P P P P > P

recursive call recursive callall duplicates of pivots removed subproblems of same type, (restricted to a sub-universe)

3 Cost: # ternary comparisons

Median-of-(2t+1) Quicksort:median-of-(2t+ 1)

Example:t = 3

P

t t(extension to asymmetric sampling possible)

not today

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 2 / 16

Page 32: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Setup

Assumptions:1 Input: (A) Multiset Model:

Random permutation U1, . . . , Un of fixed multisetx1, . . . , xu

profile~x of inputnumber of occurrences of values 1, . . . , u

(B) Discrete i. i.d. Model:U1, . . . , Un i. i.d. with Pr[U1 = v] = qv~q = (q1, . . . , qu) a fixed universe distribution

2 fat-pivot partitioning < P P P P > P

recursive call recursive callall duplicates of pivots removed subproblems of same type, (restricted to a sub-universe)

3 Cost: # ternary comparisons

Median-of-(2t+1) Quicksort:median-of-(2t+ 1)

Example:t = 3

P

t t(extension to asymmetric sampling possible)

not today

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 2 / 16

Page 33: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Setup

Assumptions:1 Input: (A) Multiset Model:

Random permutation U1, . . . , Un of fixed multisetx1, . . . , xu

profile~x of inputnumber of occurrences of values 1, . . . , u

(B) Discrete i. i.d. Model:U1, . . . , Un i. i.d. with Pr[U1 = v] = qv~q = (q1, . . . , qu) a fixed universe distribution

2 fat-pivot partitioning < P P P P > P

recursive call recursive callall duplicates of pivots removed subproblems of same type, (restricted to a sub-universe)

3 Cost: # ternary comparisons

Median-of-(2t+1) Quicksort:median-of-(2t+ 1)

Example:t = 3

P

t t(extension to asymmetric sampling possible)

not today

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 2 / 16

Page 34: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Setup

Assumptions:1 Input: (A) Multiset Model:

Random permutation U1, . . . , Un of fixed multisetx1, . . . , xu

profile~x of inputnumber of occurrences of values 1, . . . , u

(B) Discrete i. i.d. Model:U1, . . . , Un i. i.d. with Pr[U1 = v] = qv

random profile ~X D= Mult(n,~q)

~q = (q1, . . . , qu) a fixed universe distribution

2 fat-pivot partitioning < P P P P > P

recursive call recursive callall duplicates of pivots removed subproblems of same type, (restricted to a sub-universe)

3 Cost: # ternary comparisons

Median-of-(2t+1) Quicksort:median-of-(2t+ 1)

Example:t = 3

P

t t(extension to asymmetric sampling possible)

not today

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 2 / 16

Page 35: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Setup

Assumptions:1 Input: (A) Multiset Model:

Random permutation U1, . . . , Un of fixed multisetx1, . . . , xu

profile~x of inputnumber of occurrences of values 1, . . . , u

(B) Discrete i. i.d. Model:U1, . . . , Un i. i.d. with Pr[U1 = v] = qv

random profile ~X D= Mult(n,~q)

~q = (q1, . . . , qu) a fixed universe distribution

2 fat-pivot partitioning < P P P P > P

recursive call recursive callall duplicates of pivots removed subproblems of same type, (restricted to a sub-universe)

3 Cost: # ternary comparisons

Median-of-(2t+1) Quicksort:median-of-(2t+ 1)

Example:t = 3

P

t t(extension to asymmetric sampling possible)

not today

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 2 / 16

Page 36: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Setup

Assumptions:1 Input: (A) Multiset Model:

Random permutation U1, . . . , Un of fixed multisetx1, . . . , xu

profile~x of inputnumber of occurrences of values 1, . . . , u

(B) Discrete i. i.d. Model:U1, . . . , Un i. i.d. with Pr[U1 = v] = qv

random profile ~X D= Mult(n,~q)

~q = (q1, . . . , qu) a fixed universe distribution

2 fat-pivot partitioning < P P P P > P

recursive call recursive callall duplicates of pivots removed subproblems of same type, (restricted to a sub-universe)

3 Cost: # ternary comparisons

Median-of-(2t+1) Quicksort:median-of-(2t+ 1)

Example:t = 3

P

t t(extension to asymmetric sampling possible)

not today

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 2 / 16

Page 37: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Setup

Assumptions:1 Input: (A) Multiset Model:

Random permutation U1, . . . , Un of fixed multisetx1, . . . , xu

profile~x of inputnumber of occurrences of values 1, . . . , u

(B) Discrete i. i.d. Model:U1, . . . , Un i. i.d. with Pr[U1 = v] = qv

random profile ~X D= Mult(n,~q)

~q = (q1, . . . , qu) a fixed universe distribution

2 fat-pivot partitioning < P P P P > P

recursive call recursive callall duplicates of pivots removed subproblems of same type, (restricted to a sub-universe)

3 Cost: # ternary comparisons

Median-of-(2t+1) Quicksort:median-of-(2t+ 1)

Example:t = 3

P

t t(extension to asymmetric sampling possible)

not today

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 2 / 16

Page 38: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Setup

Assumptions:1 Input: (A) Multiset Model:

Random permutation U1, . . . , Un of fixed multisetx1, . . . , xu

profile~x of inputnumber of occurrences of values 1, . . . , u

(B) Discrete i. i.d. Model:U1, . . . , Un i. i.d. with Pr[U1 = v] = qv

random profile ~X D= Mult(n,~q)

~q = (q1, . . . , qu) a fixed universe distribution

2 fat-pivot partitioning < P P P P > P

recursive call recursive callall duplicates of pivots removed subproblems of same type, (restricted to a sub-universe)

3 Cost: # ternary comparisons

Median-of-(2t+1) Quicksort:median-of-(2t+ 1)

Example:t = 3

P

t t(extension to asymmetric sampling possible)

not today

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 2 / 16

Page 39: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Setup

Assumptions:1 Input: (A) Multiset Model:

Random permutation U1, . . . , Un of fixed multisetx1, . . . , xu

profile~x of inputnumber of occurrences of values 1, . . . , u

(B) Discrete i. i.d. Model:U1, . . . , Un i. i.d. with Pr[U1 = v] = qv

random profile ~X D= Mult(n,~q)

~q = (q1, . . . , qu) a fixed universe distribution

2 fat-pivot partitioning < P P P P > P

recursive call recursive callall duplicates of pivots removed subproblems of same type, (restricted to a sub-universe)

3 Cost: # ternary comparisons

Median-of-(2t+1) Quicksort:median-of-(2t+ 1)

Example:t = 3

P

t t(extension to asymmetric sampling possible)

not today

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 2 / 16

Page 40: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Setup

Assumptions:1 Input: (A) Multiset Model:

Random permutation U1, . . . , Un of fixed multisetx1, . . . , xu

profile~x of inputnumber of occurrences of values 1, . . . , u

(B) Discrete i. i.d. Model:U1, . . . , Un i. i.d. with Pr[U1 = v] = qv

random profile ~X D= Mult(n,~q)

~q = (q1, . . . , qu) a fixed universe distribution

2 fat-pivot partitioning < P P P P > P

recursive call recursive callall duplicates of pivots removed subproblems of same type, (restricted to a sub-universe)

3 Cost: # ternary comparisons

Median-of-(2t+1) Quicksort:median-of-(2t+ 1)

Example:t = 3

P

t t(extension to asymmetric sampling possible)

not today

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 2 / 16

Page 41: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Setup

Assumptions:1 Input: (A) Multiset Model:

Random permutation U1, . . . , Un of fixed multisetx1, . . . , xu

profile~x of inputnumber of occurrences of values 1, . . . , u

(B) Discrete i. i.d. Model:U1, . . . , Un i. i.d. with Pr[U1 = v] = qv

random profile ~X D= Mult(n,~q)

~q = (q1, . . . , qu) a fixed universe distribution

2 fat-pivot partitioning < P P P P > P

recursive call recursive callall duplicates of pivots removed subproblems of same type, (restricted to a sub-universe)

3 Cost: # ternary comparisons

Median-of-(2t+1) Quicksort:median-of-(2t+ 1)

Example:t = 3

P

t t(extension to asymmetric sampling possible)

not today

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 2 / 16

Page 42: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Previous work on equal keys

Rather little is known!

Sedgewick 1977: Quicksort on Equal Keys

Sedgewick & Bentley 2002: Quicksort is Optimal (Talk at Knuthfest)

A bit more on BSTs:

Burge 1976: An Analysis of BSTs Formed from Sequences of Nondistinct Keys

Kemp 1996: BSTs constructed from nondistinct keys with/without specified probabilities

Archibald & Clément 2006: Average depth in a BST with repeated keys

This is basically all literature on analysis of Quicksort with equal keys!

SIAM J. COMPUT.Vol. 6, No. 2, June 1977

QUICKSORT WITH EQUAL KEYS*

ROBERT SEDGEWICK?

Abstract. This paper considers the problem of implementing and analyzing a Quicksort programwhen equal keys are likely to be present in the file to be sorted. Upper and lower bounds are derived onthe average number of comparisons needed by any Quicksort programwhen equal keys are present. Itis shown that, of the three strategies which have been suggested for dealing with equal keys, themethod of always stopping the scanning pointers on keys equal to the partitioning element performsbest.

Key words, analysis of algorithms, equal keys, Quicksort, sorting

Introduction. The Quicksort algorithm, which was introduced by C. A. R.Hoare in 1960 [6], [7], has gained wide acceptance as the most efficientgeneral-purpose sorting method suitable for use on computers. The algorithm hasa rich history: many modifications have been suggested to improve its perfor-mance, and exact formulas have been derived describing the time and spacerequirements of the most important variants [7], [9], [14].

Although most files to be sorted contain at least some equal keys and sortingprograms must always deal with them properly, it is generally considered reasona-ble to assume in the analysis that the keys are distinct. This assumption isfundamental to the analysis of nearly all sorting programs, and it is very oftenrealistic. In any situation where the number of possible key values far exceeds thenumber of keys to be sorted, the probability that equal keys are present will bevery small. However, if the number of possible key values is not large, or if there issome other information about the file which indicates that equal keys are likely tobe present, then the performance of many sorting programs, including Quicksort,has not been carefully studied.

The purpose of this paper, then, is to investigate the performance ofQuicksort when equal keys are present. The following section describes thealgorithm and its analysis for distinct keys. Next, lower and upper bounds arederived for the average number of comparisons taken when equal keys arepresent. Following that, we shall consider, from a practical standpoint, theproblem of implementing a version of Quicksort to handle equal keys. Finally weshall compare the various methods and discover which is the most useful inpractical sorting applications.

1. Distinct keys. Suppose that an array of keys A[1],..., A[N] is to berearranged to make

A[1]<A[2]<... <A[N],

where the order relation < is any transitive relation whatever defined on all thekeys.

* Received by the editors September 2, 1975, and in revised form May 3, 1976.

" Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912. Thiswork was supported in part by the National Science Foundation under Grants GJ-28074 andMCS75-23738, and in part by the Fannie and John Hertz Foundation.

240

Now is the time

For all good men

To come to the aid

Of their party

QUICKSORT IS OPTIMAL

Robert SedgewickJon Bentley

An Analysis of Binary Search Trees Formed from Sequences of Nondistinct Keys

WILLIAM H. BURGE

IBM Thomas J. Watson Research Center, Yorktown Heights, New York

ABSTRACT. The expected depth of each key in the set of binary search trees formed from all sequences composed from a multmet pl • 1, p2 ' 2, p3 • 3, . . . , p~ • n is obtained, and hence the expected weight of such trees. The expected number of left-to-right local minima and the expected number of cycles in sequences composed from a multiset are then deduced from these results.

K E Y W O R D S A N D P H R A S E S : binary search trees, multiset

CR CATEGORIES: 5.3

Introduction

The expected depth of the number r in the set of b inary search trees formed from all permutat ions of 1,2,3,..-,n is known [1] to be Hr + H.+l - r -- 2 where Hr = ~ - l 1/k. Also the expected number of rightgoing and leftgoing branches on the pa th from the root to the number r are H,+r - i -- 1 and Hr - 1, respectively. In this paper these results are extended to b inary search trees formed from sequences of nondist inct keys drawn from the multiset pi • 1, p2 • 2, . . . , p~ • n (i.e. the sequences containing p~ l ' s , p2 2's, • • . , p , n 's) .

Both the expected number of comparisons needed to construct the b inary search trees and the expected number of comparisons needed to search for a key in a tree are obtained. The results are applicable to the analysis of the quicksort algori thm [3] when applied to sequences of nondist inct keys.

Binary Search Trees

The binary search trees are constructed by inserting keys one at a t ime in the order in which they appear in the sequence. The entering key is inserted into the left subtree if i t is strictly less than the key at the root, and into the right subtree if i t is greater than or equal to it. As a consequence the binary trees tha t result from sequences of nondist inct keys will have more rightgoing than leftgoing branches. The rightgoing and leftgoing branches will be analyzed separately.

The depth of a key in the tree is the number of branches in the pa th from the root to the key. The number of leftgoing branches in this pa th will be called the "leftgoing depth" of the key, and the number of rightgoing branches will be called its "rightgoing depth ."

Leflgoing Branches

Suppose tha t D(p~,p2,p3,... ,p,) is the expected sum of the leftgoing depths of all the l ' s in the b inary search trees constructed from all sequences drawn from Pl" 1,

Copyr igh t © 1976, Associat ion for Compu t ing Machinery , Inc. General permiss ion to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery. Author's address: Computer Sciences Department, IBM Thomas J. Watson Research Center, P.O Box 218, Yorktown Heights, NY 10598.

Journal of the A~ociation for Comput/ng Machinery, Vol. 23, No 3, July 1976, pp 4Sl...454.

Theoretical Computer Science

ELSEVIER Theoretical Computer Science 156 (1996) 39%70

Binary search trees constructed from nondistinct keys with/without specified probabilities

Rainer Kemp

Johann Wo@ng Goethe-Universitiit, Fachbereich Informarik, D-60054 Fran!+rt am Main, German)

Received May 1994; revised November 1994

Communicated by H. Prodinger

Abstract

We investigate binary search trees formed from sequences of nondistinct keys under two models. In the first model, an input sequence is composed from elements of a given finite multiset and all possible sequences are equally-likely. In the second model, an input sequence is composed from n elements of a (possibly infinite) set, where each key has a specified probability; the n keys are independently chosen from the given set.

Under both models, we shall derive general closed-form expressions for the expected values of the characteristic parameters defined on the corresponding binary search trees. These para- meters include the (left, right) depth of a given key, the level of a given external node and the left (right) side or the internal (external) path length of a search tree. Furthermore, we find some nonobvious relations between these expected values. In some respects, the second model tends to the first model for large n. All results are illustrated by concrete examples sometimes showing unexpected phenomena.

1. Introduction and basic definitions

A binary search tree (BST) is a common choice for a data structure in order to store a set of keys which can be compared by an ordering relation. The keys are inserted into the BST one at a time in the order in which they appear in the input sequence.

A root node is created for the first element; a further entering key is inserted into the left subtree if it is strictly less than the key at the root, and into the right subtree if it is greater than it. The inserting key is subjected recursively to the same treatment, until the key itself or a unique insertion position is found (cf. [S, p. 4241). A Pascal data definition and the insertion algorithm are as follows:’

’ For the sake of simplicity, we assume that the keys are elements of N

0304-3975/96/%15.00 0 1996-Elsevier Science B.V. All rights reserved SSDI 0304-3975(95)00302-5

Fourth Colloquium on Mathematics and Computer Science DMTCS proc. AG, 2006, 309–320

Average depth in a binary search tree withrepeated keys

Margaret Archibald1 and Julien Clement2,3

1 School of Mathematics, University of the Witwatersrand, P. O. Wits, 2050 Johannesburg, South Africa,email: [email protected] UMR 8049, Institut Gaspard-Monge, Laboratoire d’informatique, Universite de Marne-la-Vallee, France3CNRS UMR 6072, GREYC Laboratoire d’informatique, Universite de Caen, France,email: [email protected]

Random sequences from alphabet 1, . . . , r are examined where repeated letters are allowed. Binary search trees areformed from these, and the average left-going depth of the first 1 is found. Next, the right-going depth of the first r isexamined, and finally a merge (or ‘shuffle’) operator is used to obtain the average depth of an arbitrary node, which canbe expressed in terms of the left-going and right-going depths. The variance of each of these parameters is also found.

Keywords: Binary search trees, average case analysis, repeated keys, multiset, shuffle product

1 IntroductionWe examine binary search trees (BSTs) formed from sequences with equal entries. A BST is a planar treewhere each node has a maximum of 2 children, which are either left or right of the parent node. BSTs area commonly used data structure in Computer Science but are usually built from distinct entries. Here weconsider a suitable definition of a BST when duplicated values are allowed: the first element in the sequenceis the root of the tree and thereafter elements which are strictly less than the parent node are placed to theleft (as the left child) and those greater than or equal to the parent node are inserted as the right child (seeFig. 1 (left)).

Fig. 1: The principle for binary search tree with repeated keys (left). The binary search tree of sequence 323123411343when inserting all symbols (middle) or when inserting only the first occurrence of a symbol (right).

We examine various parameters of these trees and give an average case analysis under two standard prob-abilistic models (‘probability’ and ‘multiset’). BSTs built over permutations are a very intensively studieddata structure. One explanation is the close link between the construction of the tree and the Quicksortalgorithm(i). As with many sorting algorithms, most research has been done under the assumption that allkeys are distinct, i.e., that repeats are not allowed. However, given a large tree and a small pool of data fromwhich to choose the keys, it may well happen that equal keys are common. This is a motivation for exam-ining the case of BSTs with equal keys (see Sedgewick (1977)). Previous research on this topic includesBurge (1976), Kemp (1996) and Sedgewick (1977), where the expectation has been discussed.

Our aim in this paper is to apply modern techniques of analysis of algorithms to confirm and revisit someof these results in a somewhat simpler manner. This allows us to find both the expectation and the variance.Related partial results along the same lines can be found in Clement et al. (1998).(i) The Quicksort algorithm runs recursively: A certain key is chosen and, by comparing it to the other keys, is placed in its final

position. Thereafter, the remaining left and right subsequences (whose elements are all either greater than or less than the chosenkey) are treated in the same way. For more details see Sedgewick (1977).

1365–8050 c© 2006 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France

All these works only consider classic Quicksort:No sampling to choose pivots.(No multiway partitioning.)

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 3 / 16

Page 43: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Previous work on equal keys

Rather little is known!

Sedgewick 1977: Quicksort on Equal Keys

Sedgewick & Bentley 2002: Quicksort is Optimal (Talk at Knuthfest)

A bit more on BSTs:

Burge 1976: An Analysis of BSTs Formed from Sequences of Nondistinct Keys

Kemp 1996: BSTs constructed from nondistinct keys with/without specified probabilities

Archibald & Clément 2006: Average depth in a BST with repeated keys

This is basically all literature on analysis of Quicksort with equal keys!

SIAM J. COMPUT.Vol. 6, No. 2, June 1977

QUICKSORT WITH EQUAL KEYS*

ROBERT SEDGEWICK?

Abstract. This paper considers the problem of implementing and analyzing a Quicksort programwhen equal keys are likely to be present in the file to be sorted. Upper and lower bounds are derived onthe average number of comparisons needed by any Quicksort programwhen equal keys are present. Itis shown that, of the three strategies which have been suggested for dealing with equal keys, themethod of always stopping the scanning pointers on keys equal to the partitioning element performsbest.

Key words, analysis of algorithms, equal keys, Quicksort, sorting

Introduction. The Quicksort algorithm, which was introduced by C. A. R.Hoare in 1960 [6], [7], has gained wide acceptance as the most efficientgeneral-purpose sorting method suitable for use on computers. The algorithm hasa rich history: many modifications have been suggested to improve its perfor-mance, and exact formulas have been derived describing the time and spacerequirements of the most important variants [7], [9], [14].

Although most files to be sorted contain at least some equal keys and sortingprograms must always deal with them properly, it is generally considered reasona-ble to assume in the analysis that the keys are distinct. This assumption isfundamental to the analysis of nearly all sorting programs, and it is very oftenrealistic. In any situation where the number of possible key values far exceeds thenumber of keys to be sorted, the probability that equal keys are present will bevery small. However, if the number of possible key values is not large, or if there issome other information about the file which indicates that equal keys are likely tobe present, then the performance of many sorting programs, including Quicksort,has not been carefully studied.

The purpose of this paper, then, is to investigate the performance ofQuicksort when equal keys are present. The following section describes thealgorithm and its analysis for distinct keys. Next, lower and upper bounds arederived for the average number of comparisons taken when equal keys arepresent. Following that, we shall consider, from a practical standpoint, theproblem of implementing a version of Quicksort to handle equal keys. Finally weshall compare the various methods and discover which is the most useful inpractical sorting applications.

1. Distinct keys. Suppose that an array of keys A[1],..., A[N] is to berearranged to make

A[1]<A[2]<... <A[N],

where the order relation < is any transitive relation whatever defined on all thekeys.

* Received by the editors September 2, 1975, and in revised form May 3, 1976.

" Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912. Thiswork was supported in part by the National Science Foundation under Grants GJ-28074 andMCS75-23738, and in part by the Fannie and John Hertz Foundation.

240

Now is the time

For all good men

To come to the aid

Of their party

QUICKSORT IS OPTIMAL

Robert SedgewickJon Bentley

An Analysis of Binary Search Trees Formed from Sequences of Nondistinct Keys

WILLIAM H. BURGE

IBM Thomas J. Watson Research Center, Yorktown Heights, New York

ABSTRACT. The expected depth of each key in the set of binary search trees formed from all sequences composed from a multmet pl • 1, p2 ' 2, p3 • 3, . . . , p~ • n is obtained, and hence the expected weight of such trees. The expected number of left-to-right local minima and the expected number of cycles in sequences composed from a multiset are then deduced from these results.

K E Y W O R D S A N D P H R A S E S : binary search trees, multiset

CR CATEGORIES: 5.3

Introduction

The expected depth of the number r in the set of b inary search trees formed from all permutat ions of 1,2,3,..-,n is known [1] to be Hr + H.+l - r -- 2 where Hr = ~ - l 1/k. Also the expected number of rightgoing and leftgoing branches on the pa th from the root to the number r are H,+r - i -- 1 and Hr - 1, respectively. In this paper these results are extended to b inary search trees formed from sequences of nondist inct keys drawn from the multiset pi • 1, p2 • 2, . . . , p~ • n (i.e. the sequences containing p~ l ' s , p2 2's, • • . , p , n 's) .

Both the expected number of comparisons needed to construct the b inary search trees and the expected number of comparisons needed to search for a key in a tree are obtained. The results are applicable to the analysis of the quicksort algori thm [3] when applied to sequences of nondist inct keys.

Binary Search Trees

The binary search trees are constructed by inserting keys one at a t ime in the order in which they appear in the sequence. The entering key is inserted into the left subtree if i t is strictly less than the key at the root, and into the right subtree if i t is greater than or equal to it. As a consequence the binary trees tha t result from sequences of nondist inct keys will have more rightgoing than leftgoing branches. The rightgoing and leftgoing branches will be analyzed separately.

The depth of a key in the tree is the number of branches in the pa th from the root to the key. The number of leftgoing branches in this pa th will be called the "leftgoing depth" of the key, and the number of rightgoing branches will be called its "rightgoing depth ."

Leflgoing Branches

Suppose tha t D(p~,p2,p3,... ,p,) is the expected sum of the leftgoing depths of all the l ' s in the b inary search trees constructed from all sequences drawn from Pl" 1,

Copyr igh t © 1976, Associat ion for Compu t ing Machinery , Inc. General permiss ion to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery. Author's address: Computer Sciences Department, IBM Thomas J. Watson Research Center, P.O Box 218, Yorktown Heights, NY 10598.

Journal of the A~ociation for Comput/ng Machinery, Vol. 23, No 3, July 1976, pp 4Sl...454.

Theoretical Computer Science

ELSEVIER Theoretical Computer Science 156 (1996) 39%70

Binary search trees constructed from nondistinct keys with/without specified probabilities

Rainer Kemp

Johann Wo@ng Goethe-Universitiit, Fachbereich Informarik, D-60054 Fran!+rt am Main, German)

Received May 1994; revised November 1994

Communicated by H. Prodinger

Abstract

We investigate binary search trees formed from sequences of nondistinct keys under two models. In the first model, an input sequence is composed from elements of a given finite multiset and all possible sequences are equally-likely. In the second model, an input sequence is composed from n elements of a (possibly infinite) set, where each key has a specified probability; the n keys are independently chosen from the given set.

Under both models, we shall derive general closed-form expressions for the expected values of the characteristic parameters defined on the corresponding binary search trees. These para- meters include the (left, right) depth of a given key, the level of a given external node and the left (right) side or the internal (external) path length of a search tree. Furthermore, we find some nonobvious relations between these expected values. In some respects, the second model tends to the first model for large n. All results are illustrated by concrete examples sometimes showing unexpected phenomena.

1. Introduction and basic definitions

A binary search tree (BST) is a common choice for a data structure in order to store a set of keys which can be compared by an ordering relation. The keys are inserted into the BST one at a time in the order in which they appear in the input sequence.

A root node is created for the first element; a further entering key is inserted into the left subtree if it is strictly less than the key at the root, and into the right subtree if it is greater than it. The inserting key is subjected recursively to the same treatment, until the key itself or a unique insertion position is found (cf. [S, p. 4241). A Pascal data definition and the insertion algorithm are as follows:’

’ For the sake of simplicity, we assume that the keys are elements of N

0304-3975/96/%15.00 0 1996-Elsevier Science B.V. All rights reserved SSDI 0304-3975(95)00302-5

Fourth Colloquium on Mathematics and Computer Science DMTCS proc. AG, 2006, 309–320

Average depth in a binary search tree withrepeated keys

Margaret Archibald1 and Julien Clement2,3

1 School of Mathematics, University of the Witwatersrand, P. O. Wits, 2050 Johannesburg, South Africa,email: [email protected] UMR 8049, Institut Gaspard-Monge, Laboratoire d’informatique, Universite de Marne-la-Vallee, France3CNRS UMR 6072, GREYC Laboratoire d’informatique, Universite de Caen, France,email: [email protected]

Random sequences from alphabet 1, . . . , r are examined where repeated letters are allowed. Binary search trees areformed from these, and the average left-going depth of the first 1 is found. Next, the right-going depth of the first r isexamined, and finally a merge (or ‘shuffle’) operator is used to obtain the average depth of an arbitrary node, which canbe expressed in terms of the left-going and right-going depths. The variance of each of these parameters is also found.

Keywords: Binary search trees, average case analysis, repeated keys, multiset, shuffle product

1 IntroductionWe examine binary search trees (BSTs) formed from sequences with equal entries. A BST is a planar treewhere each node has a maximum of 2 children, which are either left or right of the parent node. BSTs area commonly used data structure in Computer Science but are usually built from distinct entries. Here weconsider a suitable definition of a BST when duplicated values are allowed: the first element in the sequenceis the root of the tree and thereafter elements which are strictly less than the parent node are placed to theleft (as the left child) and those greater than or equal to the parent node are inserted as the right child (seeFig. 1 (left)).

Fig. 1: The principle for binary search tree with repeated keys (left). The binary search tree of sequence 323123411343when inserting all symbols (middle) or when inserting only the first occurrence of a symbol (right).

We examine various parameters of these trees and give an average case analysis under two standard prob-abilistic models (‘probability’ and ‘multiset’). BSTs built over permutations are a very intensively studieddata structure. One explanation is the close link between the construction of the tree and the Quicksortalgorithm(i). As with many sorting algorithms, most research has been done under the assumption that allkeys are distinct, i.e., that repeats are not allowed. However, given a large tree and a small pool of data fromwhich to choose the keys, it may well happen that equal keys are common. This is a motivation for exam-ining the case of BSTs with equal keys (see Sedgewick (1977)). Previous research on this topic includesBurge (1976), Kemp (1996) and Sedgewick (1977), where the expectation has been discussed.

Our aim in this paper is to apply modern techniques of analysis of algorithms to confirm and revisit someof these results in a somewhat simpler manner. This allows us to find both the expectation and the variance.Related partial results along the same lines can be found in Clement et al. (1998).(i) The Quicksort algorithm runs recursively: A certain key is chosen and, by comparing it to the other keys, is placed in its final

position. Thereafter, the remaining left and right subsequences (whose elements are all either greater than or less than the chosenkey) are treated in the same way. For more details see Sedgewick (1977).

1365–8050 c© 2006 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France

All these works only consider classic Quicksort:No sampling to choose pivots.(No multiway partitioning.)

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 3 / 16

Page 44: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Previous work on equal keys

Rather little is known!

Sedgewick 1977: Quicksort on Equal Keys

Sedgewick & Bentley 2002: Quicksort is Optimal (Talk at Knuthfest)

A bit more on BSTs:

Burge 1976: An Analysis of BSTs Formed from Sequences of Nondistinct Keys

Kemp 1996: BSTs constructed from nondistinct keys with/without specified probabilities

Archibald & Clément 2006: Average depth in a BST with repeated keys

This is basically all literature on analysis of Quicksort with equal keys!

SIAM J. COMPUT.Vol. 6, No. 2, June 1977

QUICKSORT WITH EQUAL KEYS*

ROBERT SEDGEWICK?

Abstract. This paper considers the problem of implementing and analyzing a Quicksort programwhen equal keys are likely to be present in the file to be sorted. Upper and lower bounds are derived onthe average number of comparisons needed by any Quicksort programwhen equal keys are present. Itis shown that, of the three strategies which have been suggested for dealing with equal keys, themethod of always stopping the scanning pointers on keys equal to the partitioning element performsbest.

Key words, analysis of algorithms, equal keys, Quicksort, sorting

Introduction. The Quicksort algorithm, which was introduced by C. A. R.Hoare in 1960 [6], [7], has gained wide acceptance as the most efficientgeneral-purpose sorting method suitable for use on computers. The algorithm hasa rich history: many modifications have been suggested to improve its perfor-mance, and exact formulas have been derived describing the time and spacerequirements of the most important variants [7], [9], [14].

Although most files to be sorted contain at least some equal keys and sortingprograms must always deal with them properly, it is generally considered reasona-ble to assume in the analysis that the keys are distinct. This assumption isfundamental to the analysis of nearly all sorting programs, and it is very oftenrealistic. In any situation where the number of possible key values far exceeds thenumber of keys to be sorted, the probability that equal keys are present will bevery small. However, if the number of possible key values is not large, or if there issome other information about the file which indicates that equal keys are likely tobe present, then the performance of many sorting programs, including Quicksort,has not been carefully studied.

The purpose of this paper, then, is to investigate the performance ofQuicksort when equal keys are present. The following section describes thealgorithm and its analysis for distinct keys. Next, lower and upper bounds arederived for the average number of comparisons taken when equal keys arepresent. Following that, we shall consider, from a practical standpoint, theproblem of implementing a version of Quicksort to handle equal keys. Finally weshall compare the various methods and discover which is the most useful inpractical sorting applications.

1. Distinct keys. Suppose that an array of keys A[1],..., A[N] is to berearranged to make

A[1]<A[2]<... <A[N],

where the order relation < is any transitive relation whatever defined on all thekeys.

* Received by the editors September 2, 1975, and in revised form May 3, 1976.

" Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912. Thiswork was supported in part by the National Science Foundation under Grants GJ-28074 andMCS75-23738, and in part by the Fannie and John Hertz Foundation.

240

Now is the time

For all good men

To come to the aid

Of their party

QUICKSORT IS OPTIMAL

Robert SedgewickJon Bentley

An Analysis of Binary Search Trees Formed from Sequences of Nondistinct Keys

WILLIAM H. BURGE

IBM Thomas J. Watson Research Center, Yorktown Heights, New York

ABSTRACT. The expected depth of each key in the set of binary search trees formed from all sequences composed from a multmet pl • 1, p2 ' 2, p3 • 3, . . . , p~ • n is obtained, and hence the expected weight of such trees. The expected number of left-to-right local minima and the expected number of cycles in sequences composed from a multiset are then deduced from these results.

K E Y W O R D S A N D P H R A S E S : binary search trees, multiset

CR CATEGORIES: 5.3

Introduction

The expected depth of the number r in the set of b inary search trees formed from all permutat ions of 1,2,3,..-,n is known [1] to be Hr + H.+l - r -- 2 where Hr = ~ - l 1/k. Also the expected number of rightgoing and leftgoing branches on the pa th from the root to the number r are H,+r - i -- 1 and Hr - 1, respectively. In this paper these results are extended to b inary search trees formed from sequences of nondist inct keys drawn from the multiset pi • 1, p2 • 2, . . . , p~ • n (i.e. the sequences containing p~ l ' s , p2 2's, • • . , p , n 's) .

Both the expected number of comparisons needed to construct the b inary search trees and the expected number of comparisons needed to search for a key in a tree are obtained. The results are applicable to the analysis of the quicksort algori thm [3] when applied to sequences of nondist inct keys.

Binary Search Trees

The binary search trees are constructed by inserting keys one at a t ime in the order in which they appear in the sequence. The entering key is inserted into the left subtree if i t is strictly less than the key at the root, and into the right subtree if i t is greater than or equal to it. As a consequence the binary trees tha t result from sequences of nondist inct keys will have more rightgoing than leftgoing branches. The rightgoing and leftgoing branches will be analyzed separately.

The depth of a key in the tree is the number of branches in the pa th from the root to the key. The number of leftgoing branches in this pa th will be called the "leftgoing depth" of the key, and the number of rightgoing branches will be called its "rightgoing depth ."

Leflgoing Branches

Suppose tha t D(p~,p2,p3,... ,p,) is the expected sum of the leftgoing depths of all the l ' s in the b inary search trees constructed from all sequences drawn from Pl" 1,

Copyr igh t © 1976, Associat ion for Compu t ing Machinery , Inc. General permiss ion to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery. Author's address: Computer Sciences Department, IBM Thomas J. Watson Research Center, P.O Box 218, Yorktown Heights, NY 10598.

Journal of the A~ociation for Comput/ng Machinery, Vol. 23, No 3, July 1976, pp 4Sl...454.

Theoretical Computer Science

ELSEVIER Theoretical Computer Science 156 (1996) 39%70

Binary search trees constructed from nondistinct keys with/without specified probabilities

Rainer Kemp

Johann Wo@ng Goethe-Universitiit, Fachbereich Informarik, D-60054 Fran!+rt am Main, German)

Received May 1994; revised November 1994

Communicated by H. Prodinger

Abstract

We investigate binary search trees formed from sequences of nondistinct keys under two models. In the first model, an input sequence is composed from elements of a given finite multiset and all possible sequences are equally-likely. In the second model, an input sequence is composed from n elements of a (possibly infinite) set, where each key has a specified probability; the n keys are independently chosen from the given set.

Under both models, we shall derive general closed-form expressions for the expected values of the characteristic parameters defined on the corresponding binary search trees. These para- meters include the (left, right) depth of a given key, the level of a given external node and the left (right) side or the internal (external) path length of a search tree. Furthermore, we find some nonobvious relations between these expected values. In some respects, the second model tends to the first model for large n. All results are illustrated by concrete examples sometimes showing unexpected phenomena.

1. Introduction and basic definitions

A binary search tree (BST) is a common choice for a data structure in order to store a set of keys which can be compared by an ordering relation. The keys are inserted into the BST one at a time in the order in which they appear in the input sequence.

A root node is created for the first element; a further entering key is inserted into the left subtree if it is strictly less than the key at the root, and into the right subtree if it is greater than it. The inserting key is subjected recursively to the same treatment, until the key itself or a unique insertion position is found (cf. [S, p. 4241). A Pascal data definition and the insertion algorithm are as follows:’

’ For the sake of simplicity, we assume that the keys are elements of N

0304-3975/96/%15.00 0 1996-Elsevier Science B.V. All rights reserved SSDI 0304-3975(95)00302-5

Fourth Colloquium on Mathematics and Computer Science DMTCS proc. AG, 2006, 309–320

Average depth in a binary search tree withrepeated keys

Margaret Archibald1 and Julien Clement2,3

1 School of Mathematics, University of the Witwatersrand, P. O. Wits, 2050 Johannesburg, South Africa,email: [email protected] UMR 8049, Institut Gaspard-Monge, Laboratoire d’informatique, Universite de Marne-la-Vallee, France3CNRS UMR 6072, GREYC Laboratoire d’informatique, Universite de Caen, France,email: [email protected]

Random sequences from alphabet 1, . . . , r are examined where repeated letters are allowed. Binary search trees areformed from these, and the average left-going depth of the first 1 is found. Next, the right-going depth of the first r isexamined, and finally a merge (or ‘shuffle’) operator is used to obtain the average depth of an arbitrary node, which canbe expressed in terms of the left-going and right-going depths. The variance of each of these parameters is also found.

Keywords: Binary search trees, average case analysis, repeated keys, multiset, shuffle product

1 IntroductionWe examine binary search trees (BSTs) formed from sequences with equal entries. A BST is a planar treewhere each node has a maximum of 2 children, which are either left or right of the parent node. BSTs area commonly used data structure in Computer Science but are usually built from distinct entries. Here weconsider a suitable definition of a BST when duplicated values are allowed: the first element in the sequenceis the root of the tree and thereafter elements which are strictly less than the parent node are placed to theleft (as the left child) and those greater than or equal to the parent node are inserted as the right child (seeFig. 1 (left)).

Fig. 1: The principle for binary search tree with repeated keys (left). The binary search tree of sequence 323123411343when inserting all symbols (middle) or when inserting only the first occurrence of a symbol (right).

We examine various parameters of these trees and give an average case analysis under two standard prob-abilistic models (‘probability’ and ‘multiset’). BSTs built over permutations are a very intensively studieddata structure. One explanation is the close link between the construction of the tree and the Quicksortalgorithm(i). As with many sorting algorithms, most research has been done under the assumption that allkeys are distinct, i.e., that repeats are not allowed. However, given a large tree and a small pool of data fromwhich to choose the keys, it may well happen that equal keys are common. This is a motivation for exam-ining the case of BSTs with equal keys (see Sedgewick (1977)). Previous research on this topic includesBurge (1976), Kemp (1996) and Sedgewick (1977), where the expectation has been discussed.

Our aim in this paper is to apply modern techniques of analysis of algorithms to confirm and revisit someof these results in a somewhat simpler manner. This allows us to find both the expectation and the variance.Related partial results along the same lines can be found in Clement et al. (1998).(i) The Quicksort algorithm runs recursively: A certain key is chosen and, by comparing it to the other keys, is placed in its final

position. Thereafter, the remaining left and right subsequences (whose elements are all either greater than or less than the chosenkey) are treated in the same way. For more details see Sedgewick (1977).

1365–8050 c© 2006 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France

All these works only consider classic Quicksort:No sampling to choose pivots.(No multiway partitioning.)

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 3 / 16

Page 45: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Previous work on equal keys

Rather little is known!

Sedgewick 1977: Quicksort on Equal Keys

Sedgewick & Bentley 2002: Quicksort is Optimal (Talk at Knuthfest)

A bit more on BSTs:

Burge 1976: An Analysis of BSTs Formed from Sequences of Nondistinct Keys

Kemp 1996: BSTs constructed from nondistinct keys with/without specified probabilities

Archibald & Clément 2006: Average depth in a BST with repeated keys

This is basically all literature on analysis of Quicksort with equal keys!

SIAM J. COMPUT.Vol. 6, No. 2, June 1977

QUICKSORT WITH EQUAL KEYS*

ROBERT SEDGEWICK?

Abstract. This paper considers the problem of implementing and analyzing a Quicksort programwhen equal keys are likely to be present in the file to be sorted. Upper and lower bounds are derived onthe average number of comparisons needed by any Quicksort programwhen equal keys are present. Itis shown that, of the three strategies which have been suggested for dealing with equal keys, themethod of always stopping the scanning pointers on keys equal to the partitioning element performsbest.

Key words, analysis of algorithms, equal keys, Quicksort, sorting

Introduction. The Quicksort algorithm, which was introduced by C. A. R.Hoare in 1960 [6], [7], has gained wide acceptance as the most efficientgeneral-purpose sorting method suitable for use on computers. The algorithm hasa rich history: many modifications have been suggested to improve its perfor-mance, and exact formulas have been derived describing the time and spacerequirements of the most important variants [7], [9], [14].

Although most files to be sorted contain at least some equal keys and sortingprograms must always deal with them properly, it is generally considered reasona-ble to assume in the analysis that the keys are distinct. This assumption isfundamental to the analysis of nearly all sorting programs, and it is very oftenrealistic. In any situation where the number of possible key values far exceeds thenumber of keys to be sorted, the probability that equal keys are present will bevery small. However, if the number of possible key values is not large, or if there issome other information about the file which indicates that equal keys are likely tobe present, then the performance of many sorting programs, including Quicksort,has not been carefully studied.

The purpose of this paper, then, is to investigate the performance ofQuicksort when equal keys are present. The following section describes thealgorithm and its analysis for distinct keys. Next, lower and upper bounds arederived for the average number of comparisons taken when equal keys arepresent. Following that, we shall consider, from a practical standpoint, theproblem of implementing a version of Quicksort to handle equal keys. Finally weshall compare the various methods and discover which is the most useful inpractical sorting applications.

1. Distinct keys. Suppose that an array of keys A[1],..., A[N] is to berearranged to make

A[1]<A[2]<... <A[N],

where the order relation < is any transitive relation whatever defined on all thekeys.

* Received by the editors September 2, 1975, and in revised form May 3, 1976.

" Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912. Thiswork was supported in part by the National Science Foundation under Grants GJ-28074 andMCS75-23738, and in part by the Fannie and John Hertz Foundation.

240

Now is the time

For all good men

To come to the aid

Of their party

QUICKSORT IS OPTIMAL

Robert SedgewickJon Bentley

An Analysis of Binary Search Trees Formed from Sequences of Nondistinct Keys

WILLIAM H. BURGE

IBM Thomas J. Watson Research Center, Yorktown Heights, New York

ABSTRACT. The expected depth of each key in the set of binary search trees formed from all sequences composed from a multmet pl • 1, p2 ' 2, p3 • 3, . . . , p~ • n is obtained, and hence the expected weight of such trees. The expected number of left-to-right local minima and the expected number of cycles in sequences composed from a multiset are then deduced from these results.

K E Y W O R D S A N D P H R A S E S : binary search trees, multiset

CR CATEGORIES: 5.3

Introduction

The expected depth of the number r in the set of b inary search trees formed from all permutat ions of 1,2,3,..-,n is known [1] to be Hr + H.+l - r -- 2 where Hr = ~ - l 1/k. Also the expected number of rightgoing and leftgoing branches on the pa th from the root to the number r are H,+r - i -- 1 and Hr - 1, respectively. In this paper these results are extended to b inary search trees formed from sequences of nondist inct keys drawn from the multiset pi • 1, p2 • 2, . . . , p~ • n (i.e. the sequences containing p~ l ' s , p2 2's, • • . , p , n 's) .

Both the expected number of comparisons needed to construct the b inary search trees and the expected number of comparisons needed to search for a key in a tree are obtained. The results are applicable to the analysis of the quicksort algori thm [3] when applied to sequences of nondist inct keys.

Binary Search Trees

The binary search trees are constructed by inserting keys one at a t ime in the order in which they appear in the sequence. The entering key is inserted into the left subtree if i t is strictly less than the key at the root, and into the right subtree if i t is greater than or equal to it. As a consequence the binary trees tha t result from sequences of nondist inct keys will have more rightgoing than leftgoing branches. The rightgoing and leftgoing branches will be analyzed separately.

The depth of a key in the tree is the number of branches in the pa th from the root to the key. The number of leftgoing branches in this pa th will be called the "leftgoing depth" of the key, and the number of rightgoing branches will be called its "rightgoing depth ."

Leflgoing Branches

Suppose tha t D(p~,p2,p3,... ,p,) is the expected sum of the leftgoing depths of all the l ' s in the b inary search trees constructed from all sequences drawn from Pl" 1,

Copyr igh t © 1976, Associat ion for Compu t ing Machinery , Inc. General permiss ion to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery. Author's address: Computer Sciences Department, IBM Thomas J. Watson Research Center, P.O Box 218, Yorktown Heights, NY 10598.

Journal of the A~ociation for Comput/ng Machinery, Vol. 23, No 3, July 1976, pp 4Sl...454.

Theoretical Computer Science

ELSEVIER Theoretical Computer Science 156 (1996) 39%70

Binary search trees constructed from nondistinct keys with/without specified probabilities

Rainer Kemp

Johann Wo@ng Goethe-Universitiit, Fachbereich Informarik, D-60054 Fran!+rt am Main, German)

Received May 1994; revised November 1994

Communicated by H. Prodinger

Abstract

We investigate binary search trees formed from sequences of nondistinct keys under two models. In the first model, an input sequence is composed from elements of a given finite multiset and all possible sequences are equally-likely. In the second model, an input sequence is composed from n elements of a (possibly infinite) set, where each key has a specified probability; the n keys are independently chosen from the given set.

Under both models, we shall derive general closed-form expressions for the expected values of the characteristic parameters defined on the corresponding binary search trees. These para- meters include the (left, right) depth of a given key, the level of a given external node and the left (right) side or the internal (external) path length of a search tree. Furthermore, we find some nonobvious relations between these expected values. In some respects, the second model tends to the first model for large n. All results are illustrated by concrete examples sometimes showing unexpected phenomena.

1. Introduction and basic definitions

A binary search tree (BST) is a common choice for a data structure in order to store a set of keys which can be compared by an ordering relation. The keys are inserted into the BST one at a time in the order in which they appear in the input sequence.

A root node is created for the first element; a further entering key is inserted into the left subtree if it is strictly less than the key at the root, and into the right subtree if it is greater than it. The inserting key is subjected recursively to the same treatment, until the key itself or a unique insertion position is found (cf. [S, p. 4241). A Pascal data definition and the insertion algorithm are as follows:’

’ For the sake of simplicity, we assume that the keys are elements of N

0304-3975/96/%15.00 0 1996-Elsevier Science B.V. All rights reserved SSDI 0304-3975(95)00302-5

Fourth Colloquium on Mathematics and Computer Science DMTCS proc. AG, 2006, 309–320

Average depth in a binary search tree withrepeated keys

Margaret Archibald1 and Julien Clement2,3

1 School of Mathematics, University of the Witwatersrand, P. O. Wits, 2050 Johannesburg, South Africa,email: [email protected] UMR 8049, Institut Gaspard-Monge, Laboratoire d’informatique, Universite de Marne-la-Vallee, France3CNRS UMR 6072, GREYC Laboratoire d’informatique, Universite de Caen, France,email: [email protected]

Random sequences from alphabet 1, . . . , r are examined where repeated letters are allowed. Binary search trees areformed from these, and the average left-going depth of the first 1 is found. Next, the right-going depth of the first r isexamined, and finally a merge (or ‘shuffle’) operator is used to obtain the average depth of an arbitrary node, which canbe expressed in terms of the left-going and right-going depths. The variance of each of these parameters is also found.

Keywords: Binary search trees, average case analysis, repeated keys, multiset, shuffle product

1 IntroductionWe examine binary search trees (BSTs) formed from sequences with equal entries. A BST is a planar treewhere each node has a maximum of 2 children, which are either left or right of the parent node. BSTs area commonly used data structure in Computer Science but are usually built from distinct entries. Here weconsider a suitable definition of a BST when duplicated values are allowed: the first element in the sequenceis the root of the tree and thereafter elements which are strictly less than the parent node are placed to theleft (as the left child) and those greater than or equal to the parent node are inserted as the right child (seeFig. 1 (left)).

Fig. 1: The principle for binary search tree with repeated keys (left). The binary search tree of sequence 323123411343when inserting all symbols (middle) or when inserting only the first occurrence of a symbol (right).

We examine various parameters of these trees and give an average case analysis under two standard prob-abilistic models (‘probability’ and ‘multiset’). BSTs built over permutations are a very intensively studieddata structure. One explanation is the close link between the construction of the tree and the Quicksortalgorithm(i). As with many sorting algorithms, most research has been done under the assumption that allkeys are distinct, i.e., that repeats are not allowed. However, given a large tree and a small pool of data fromwhich to choose the keys, it may well happen that equal keys are common. This is a motivation for exam-ining the case of BSTs with equal keys (see Sedgewick (1977)). Previous research on this topic includesBurge (1976), Kemp (1996) and Sedgewick (1977), where the expectation has been discussed.

Our aim in this paper is to apply modern techniques of analysis of algorithms to confirm and revisit someof these results in a somewhat simpler manner. This allows us to find both the expectation and the variance.Related partial results along the same lines can be found in Clement et al. (1998).(i) The Quicksort algorithm runs recursively: A certain key is chosen and, by comparing it to the other keys, is placed in its final

position. Thereafter, the remaining left and right subsequences (whose elements are all either greater than or less than the chosenkey) are treated in the same way. For more details see Sedgewick (1977).

1365–8050 c© 2006 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France

All these works only consider classic Quicksort:No sampling to choose pivots.(No multiway partitioning.)

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 3 / 16

Page 46: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Previous work on equal keys

Rather little is known!

Sedgewick 1977: Quicksort on Equal Keys

Sedgewick & Bentley 2002: Quicksort is Optimal (Talk at Knuthfest)

A bit more on BSTs:

Burge 1976: An Analysis of BSTs Formed from Sequences of Nondistinct Keys

Kemp 1996: BSTs constructed from nondistinct keys with/without specified probabilities

Archibald & Clément 2006: Average depth in a BST with repeated keys

This is basically all literature on analysis of Quicksort with equal keys!

SIAM J. COMPUT.Vol. 6, No. 2, June 1977

QUICKSORT WITH EQUAL KEYS*

ROBERT SEDGEWICK?

Abstract. This paper considers the problem of implementing and analyzing a Quicksort programwhen equal keys are likely to be present in the file to be sorted. Upper and lower bounds are derived onthe average number of comparisons needed by any Quicksort programwhen equal keys are present. Itis shown that, of the three strategies which have been suggested for dealing with equal keys, themethod of always stopping the scanning pointers on keys equal to the partitioning element performsbest.

Key words, analysis of algorithms, equal keys, Quicksort, sorting

Introduction. The Quicksort algorithm, which was introduced by C. A. R.Hoare in 1960 [6], [7], has gained wide acceptance as the most efficientgeneral-purpose sorting method suitable for use on computers. The algorithm hasa rich history: many modifications have been suggested to improve its perfor-mance, and exact formulas have been derived describing the time and spacerequirements of the most important variants [7], [9], [14].

Although most files to be sorted contain at least some equal keys and sortingprograms must always deal with them properly, it is generally considered reasona-ble to assume in the analysis that the keys are distinct. This assumption isfundamental to the analysis of nearly all sorting programs, and it is very oftenrealistic. In any situation where the number of possible key values far exceeds thenumber of keys to be sorted, the probability that equal keys are present will bevery small. However, if the number of possible key values is not large, or if there issome other information about the file which indicates that equal keys are likely tobe present, then the performance of many sorting programs, including Quicksort,has not been carefully studied.

The purpose of this paper, then, is to investigate the performance ofQuicksort when equal keys are present. The following section describes thealgorithm and its analysis for distinct keys. Next, lower and upper bounds arederived for the average number of comparisons taken when equal keys arepresent. Following that, we shall consider, from a practical standpoint, theproblem of implementing a version of Quicksort to handle equal keys. Finally weshall compare the various methods and discover which is the most useful inpractical sorting applications.

1. Distinct keys. Suppose that an array of keys A[1],..., A[N] is to berearranged to make

A[1]<A[2]<... <A[N],

where the order relation < is any transitive relation whatever defined on all thekeys.

* Received by the editors September 2, 1975, and in revised form May 3, 1976.

" Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912. Thiswork was supported in part by the National Science Foundation under Grants GJ-28074 andMCS75-23738, and in part by the Fannie and John Hertz Foundation.

240

Now is the time

For all good men

To come to the aid

Of their party

QUICKSORT IS OPTIMAL

Robert SedgewickJon Bentley

An Analysis of Binary Search Trees Formed from Sequences of Nondistinct Keys

WILLIAM H. BURGE

IBM Thomas J. Watson Research Center, Yorktown Heights, New York

ABSTRACT. The expected depth of each key in the set of binary search trees formed from all sequences composed from a multmet pl • 1, p2 ' 2, p3 • 3, . . . , p~ • n is obtained, and hence the expected weight of such trees. The expected number of left-to-right local minima and the expected number of cycles in sequences composed from a multiset are then deduced from these results.

K E Y W O R D S A N D P H R A S E S : binary search trees, multiset

CR CATEGORIES: 5.3

Introduction

The expected depth of the number r in the set of b inary search trees formed from all permutat ions of 1,2,3,..-,n is known [1] to be Hr + H.+l - r -- 2 where Hr = ~ - l 1/k. Also the expected number of rightgoing and leftgoing branches on the pa th from the root to the number r are H,+r - i -- 1 and Hr - 1, respectively. In this paper these results are extended to b inary search trees formed from sequences of nondist inct keys drawn from the multiset pi • 1, p2 • 2, . . . , p~ • n (i.e. the sequences containing p~ l ' s , p2 2's, • • . , p , n 's) .

Both the expected number of comparisons needed to construct the b inary search trees and the expected number of comparisons needed to search for a key in a tree are obtained. The results are applicable to the analysis of the quicksort algori thm [3] when applied to sequences of nondist inct keys.

Binary Search Trees

The binary search trees are constructed by inserting keys one at a t ime in the order in which they appear in the sequence. The entering key is inserted into the left subtree if i t is strictly less than the key at the root, and into the right subtree if i t is greater than or equal to it. As a consequence the binary trees tha t result from sequences of nondist inct keys will have more rightgoing than leftgoing branches. The rightgoing and leftgoing branches will be analyzed separately.

The depth of a key in the tree is the number of branches in the pa th from the root to the key. The number of leftgoing branches in this pa th will be called the "leftgoing depth" of the key, and the number of rightgoing branches will be called its "rightgoing depth ."

Leflgoing Branches

Suppose tha t D(p~,p2,p3,... ,p,) is the expected sum of the leftgoing depths of all the l ' s in the b inary search trees constructed from all sequences drawn from Pl" 1,

Copyr igh t © 1976, Associat ion for Compu t ing Machinery , Inc. General permiss ion to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery. Author's address: Computer Sciences Department, IBM Thomas J. Watson Research Center, P.O Box 218, Yorktown Heights, NY 10598.

Journal of the A~ociation for Comput/ng Machinery, Vol. 23, No 3, July 1976, pp 4Sl...454.

Theoretical Computer Science

ELSEVIER Theoretical Computer Science 156 (1996) 39%70

Binary search trees constructed from nondistinct keys with/without specified probabilities

Rainer Kemp

Johann Wo@ng Goethe-Universitiit, Fachbereich Informarik, D-60054 Fran!+rt am Main, German)

Received May 1994; revised November 1994

Communicated by H. Prodinger

Abstract

We investigate binary search trees formed from sequences of nondistinct keys under two models. In the first model, an input sequence is composed from elements of a given finite multiset and all possible sequences are equally-likely. In the second model, an input sequence is composed from n elements of a (possibly infinite) set, where each key has a specified probability; the n keys are independently chosen from the given set.

Under both models, we shall derive general closed-form expressions for the expected values of the characteristic parameters defined on the corresponding binary search trees. These para- meters include the (left, right) depth of a given key, the level of a given external node and the left (right) side or the internal (external) path length of a search tree. Furthermore, we find some nonobvious relations between these expected values. In some respects, the second model tends to the first model for large n. All results are illustrated by concrete examples sometimes showing unexpected phenomena.

1. Introduction and basic definitions

A binary search tree (BST) is a common choice for a data structure in order to store a set of keys which can be compared by an ordering relation. The keys are inserted into the BST one at a time in the order in which they appear in the input sequence.

A root node is created for the first element; a further entering key is inserted into the left subtree if it is strictly less than the key at the root, and into the right subtree if it is greater than it. The inserting key is subjected recursively to the same treatment, until the key itself or a unique insertion position is found (cf. [S, p. 4241). A Pascal data definition and the insertion algorithm are as follows:’

’ For the sake of simplicity, we assume that the keys are elements of N

0304-3975/96/%15.00 0 1996-Elsevier Science B.V. All rights reserved SSDI 0304-3975(95)00302-5

Fourth Colloquium on Mathematics and Computer Science DMTCS proc. AG, 2006, 309–320

Average depth in a binary search tree withrepeated keys

Margaret Archibald1 and Julien Clement2,3

1 School of Mathematics, University of the Witwatersrand, P. O. Wits, 2050 Johannesburg, South Africa,email: [email protected] UMR 8049, Institut Gaspard-Monge, Laboratoire d’informatique, Universite de Marne-la-Vallee, France3CNRS UMR 6072, GREYC Laboratoire d’informatique, Universite de Caen, France,email: [email protected]

Random sequences from alphabet 1, . . . , r are examined where repeated letters are allowed. Binary search trees areformed from these, and the average left-going depth of the first 1 is found. Next, the right-going depth of the first r isexamined, and finally a merge (or ‘shuffle’) operator is used to obtain the average depth of an arbitrary node, which canbe expressed in terms of the left-going and right-going depths. The variance of each of these parameters is also found.

Keywords: Binary search trees, average case analysis, repeated keys, multiset, shuffle product

1 IntroductionWe examine binary search trees (BSTs) formed from sequences with equal entries. A BST is a planar treewhere each node has a maximum of 2 children, which are either left or right of the parent node. BSTs area commonly used data structure in Computer Science but are usually built from distinct entries. Here weconsider a suitable definition of a BST when duplicated values are allowed: the first element in the sequenceis the root of the tree and thereafter elements which are strictly less than the parent node are placed to theleft (as the left child) and those greater than or equal to the parent node are inserted as the right child (seeFig. 1 (left)).

Fig. 1: The principle for binary search tree with repeated keys (left). The binary search tree of sequence 323123411343when inserting all symbols (middle) or when inserting only the first occurrence of a symbol (right).

We examine various parameters of these trees and give an average case analysis under two standard prob-abilistic models (‘probability’ and ‘multiset’). BSTs built over permutations are a very intensively studieddata structure. One explanation is the close link between the construction of the tree and the Quicksortalgorithm(i). As with many sorting algorithms, most research has been done under the assumption that allkeys are distinct, i.e., that repeats are not allowed. However, given a large tree and a small pool of data fromwhich to choose the keys, it may well happen that equal keys are common. This is a motivation for exam-ining the case of BSTs with equal keys (see Sedgewick (1977)). Previous research on this topic includesBurge (1976), Kemp (1996) and Sedgewick (1977), where the expectation has been discussed.

Our aim in this paper is to apply modern techniques of analysis of algorithms to confirm and revisit someof these results in a somewhat simpler manner. This allows us to find both the expectation and the variance.Related partial results along the same lines can be found in Clement et al. (1998).(i) The Quicksort algorithm runs recursively: A certain key is chosen and, by comparing it to the other keys, is placed in its final

position. Thereafter, the remaining left and right subsequences (whose elements are all either greater than or less than the chosenkey) are treated in the same way. For more details see Sedgewick (1977).

1365–8050 c© 2006 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France

All these works only consider classic Quicksort:No sampling to choose pivots.(No multiway partitioning.)

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 3 / 16

Page 47: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Previous work on equal keys

Rather little is known!

Sedgewick 1977: Quicksort on Equal Keys

Sedgewick & Bentley 2002: Quicksort is Optimal (Talk at Knuthfest)

A bit more on BSTs:

Burge 1976: An Analysis of BSTs Formed from Sequences of Nondistinct Keys

Kemp 1996: BSTs constructed from nondistinct keys with/without specified probabilities

Archibald & Clément 2006: Average depth in a BST with repeated keys

This is basically all literature on analysis of Quicksort with equal keys!

SIAM J. COMPUT.Vol. 6, No. 2, June 1977

QUICKSORT WITH EQUAL KEYS*

ROBERT SEDGEWICK?

Abstract. This paper considers the problem of implementing and analyzing a Quicksort programwhen equal keys are likely to be present in the file to be sorted. Upper and lower bounds are derived onthe average number of comparisons needed by any Quicksort programwhen equal keys are present. Itis shown that, of the three strategies which have been suggested for dealing with equal keys, themethod of always stopping the scanning pointers on keys equal to the partitioning element performsbest.

Key words, analysis of algorithms, equal keys, Quicksort, sorting

Introduction. The Quicksort algorithm, which was introduced by C. A. R.Hoare in 1960 [6], [7], has gained wide acceptance as the most efficientgeneral-purpose sorting method suitable for use on computers. The algorithm hasa rich history: many modifications have been suggested to improve its perfor-mance, and exact formulas have been derived describing the time and spacerequirements of the most important variants [7], [9], [14].

Although most files to be sorted contain at least some equal keys and sortingprograms must always deal with them properly, it is generally considered reasona-ble to assume in the analysis that the keys are distinct. This assumption isfundamental to the analysis of nearly all sorting programs, and it is very oftenrealistic. In any situation where the number of possible key values far exceeds thenumber of keys to be sorted, the probability that equal keys are present will bevery small. However, if the number of possible key values is not large, or if there issome other information about the file which indicates that equal keys are likely tobe present, then the performance of many sorting programs, including Quicksort,has not been carefully studied.

The purpose of this paper, then, is to investigate the performance ofQuicksort when equal keys are present. The following section describes thealgorithm and its analysis for distinct keys. Next, lower and upper bounds arederived for the average number of comparisons taken when equal keys arepresent. Following that, we shall consider, from a practical standpoint, theproblem of implementing a version of Quicksort to handle equal keys. Finally weshall compare the various methods and discover which is the most useful inpractical sorting applications.

1. Distinct keys. Suppose that an array of keys A[1],..., A[N] is to berearranged to make

A[1]<A[2]<... <A[N],

where the order relation < is any transitive relation whatever defined on all thekeys.

* Received by the editors September 2, 1975, and in revised form May 3, 1976.

" Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912. Thiswork was supported in part by the National Science Foundation under Grants GJ-28074 andMCS75-23738, and in part by the Fannie and John Hertz Foundation.

240

Now is the time

For all good men

To come to the aid

Of their party

QUICKSORT IS OPTIMAL

Robert SedgewickJon Bentley

An Analysis of Binary Search Trees Formed from Sequences of Nondistinct Keys

WILLIAM H. BURGE

IBM Thomas J. Watson Research Center, Yorktown Heights, New York

ABSTRACT. The expected depth of each key in the set of binary search trees formed from all sequences composed from a multmet pl • 1, p2 ' 2, p3 • 3, . . . , p~ • n is obtained, and hence the expected weight of such trees. The expected number of left-to-right local minima and the expected number of cycles in sequences composed from a multiset are then deduced from these results.

K E Y W O R D S A N D P H R A S E S : binary search trees, multiset

CR CATEGORIES: 5.3

Introduction

The expected depth of the number r in the set of b inary search trees formed from all permutat ions of 1,2,3,..-,n is known [1] to be Hr + H.+l - r -- 2 where Hr = ~ - l 1/k. Also the expected number of rightgoing and leftgoing branches on the pa th from the root to the number r are H,+r - i -- 1 and Hr - 1, respectively. In this paper these results are extended to b inary search trees formed from sequences of nondist inct keys drawn from the multiset pi • 1, p2 • 2, . . . , p~ • n (i.e. the sequences containing p~ l ' s , p2 2's, • • . , p , n 's) .

Both the expected number of comparisons needed to construct the b inary search trees and the expected number of comparisons needed to search for a key in a tree are obtained. The results are applicable to the analysis of the quicksort algori thm [3] when applied to sequences of nondist inct keys.

Binary Search Trees

The binary search trees are constructed by inserting keys one at a t ime in the order in which they appear in the sequence. The entering key is inserted into the left subtree if i t is strictly less than the key at the root, and into the right subtree if i t is greater than or equal to it. As a consequence the binary trees tha t result from sequences of nondist inct keys will have more rightgoing than leftgoing branches. The rightgoing and leftgoing branches will be analyzed separately.

The depth of a key in the tree is the number of branches in the pa th from the root to the key. The number of leftgoing branches in this pa th will be called the "leftgoing depth" of the key, and the number of rightgoing branches will be called its "rightgoing depth ."

Leflgoing Branches

Suppose tha t D(p~,p2,p3,... ,p,) is the expected sum of the leftgoing depths of all the l ' s in the b inary search trees constructed from all sequences drawn from Pl" 1,

Copyr igh t © 1976, Associat ion for Compu t ing Machinery , Inc. General permiss ion to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery. Author's address: Computer Sciences Department, IBM Thomas J. Watson Research Center, P.O Box 218, Yorktown Heights, NY 10598.

Journal of the A~ociation for Comput/ng Machinery, Vol. 23, No 3, July 1976, pp 4Sl...454.

Theoretical Computer Science

ELSEVIER Theoretical Computer Science 156 (1996) 39%70

Binary search trees constructed from nondistinct keys with/without specified probabilities

Rainer Kemp

Johann Wo@ng Goethe-Universitiit, Fachbereich Informarik, D-60054 Fran!+rt am Main, German)

Received May 1994; revised November 1994

Communicated by H. Prodinger

Abstract

We investigate binary search trees formed from sequences of nondistinct keys under two models. In the first model, an input sequence is composed from elements of a given finite multiset and all possible sequences are equally-likely. In the second model, an input sequence is composed from n elements of a (possibly infinite) set, where each key has a specified probability; the n keys are independently chosen from the given set.

Under both models, we shall derive general closed-form expressions for the expected values of the characteristic parameters defined on the corresponding binary search trees. These para- meters include the (left, right) depth of a given key, the level of a given external node and the left (right) side or the internal (external) path length of a search tree. Furthermore, we find some nonobvious relations between these expected values. In some respects, the second model tends to the first model for large n. All results are illustrated by concrete examples sometimes showing unexpected phenomena.

1. Introduction and basic definitions

A binary search tree (BST) is a common choice for a data structure in order to store a set of keys which can be compared by an ordering relation. The keys are inserted into the BST one at a time in the order in which they appear in the input sequence.

A root node is created for the first element; a further entering key is inserted into the left subtree if it is strictly less than the key at the root, and into the right subtree if it is greater than it. The inserting key is subjected recursively to the same treatment, until the key itself or a unique insertion position is found (cf. [S, p. 4241). A Pascal data definition and the insertion algorithm are as follows:’

’ For the sake of simplicity, we assume that the keys are elements of N

0304-3975/96/%15.00 0 1996-Elsevier Science B.V. All rights reserved SSDI 0304-3975(95)00302-5

Fourth Colloquium on Mathematics and Computer Science DMTCS proc. AG, 2006, 309–320

Average depth in a binary search tree withrepeated keys

Margaret Archibald1 and Julien Clement2,3

1 School of Mathematics, University of the Witwatersrand, P. O. Wits, 2050 Johannesburg, South Africa,email: [email protected] UMR 8049, Institut Gaspard-Monge, Laboratoire d’informatique, Universite de Marne-la-Vallee, France3CNRS UMR 6072, GREYC Laboratoire d’informatique, Universite de Caen, France,email: [email protected]

Random sequences from alphabet 1, . . . , r are examined where repeated letters are allowed. Binary search trees areformed from these, and the average left-going depth of the first 1 is found. Next, the right-going depth of the first r isexamined, and finally a merge (or ‘shuffle’) operator is used to obtain the average depth of an arbitrary node, which canbe expressed in terms of the left-going and right-going depths. The variance of each of these parameters is also found.

Keywords: Binary search trees, average case analysis, repeated keys, multiset, shuffle product

1 IntroductionWe examine binary search trees (BSTs) formed from sequences with equal entries. A BST is a planar treewhere each node has a maximum of 2 children, which are either left or right of the parent node. BSTs area commonly used data structure in Computer Science but are usually built from distinct entries. Here weconsider a suitable definition of a BST when duplicated values are allowed: the first element in the sequenceis the root of the tree and thereafter elements which are strictly less than the parent node are placed to theleft (as the left child) and those greater than or equal to the parent node are inserted as the right child (seeFig. 1 (left)).

Fig. 1: The principle for binary search tree with repeated keys (left). The binary search tree of sequence 323123411343when inserting all symbols (middle) or when inserting only the first occurrence of a symbol (right).

We examine various parameters of these trees and give an average case analysis under two standard prob-abilistic models (‘probability’ and ‘multiset’). BSTs built over permutations are a very intensively studieddata structure. One explanation is the close link between the construction of the tree and the Quicksortalgorithm(i). As with many sorting algorithms, most research has been done under the assumption that allkeys are distinct, i.e., that repeats are not allowed. However, given a large tree and a small pool of data fromwhich to choose the keys, it may well happen that equal keys are common. This is a motivation for exam-ining the case of BSTs with equal keys (see Sedgewick (1977)). Previous research on this topic includesBurge (1976), Kemp (1996) and Sedgewick (1977), where the expectation has been discussed.

Our aim in this paper is to apply modern techniques of analysis of algorithms to confirm and revisit someof these results in a somewhat simpler manner. This allows us to find both the expectation and the variance.Related partial results along the same lines can be found in Clement et al. (1998).(i) The Quicksort algorithm runs recursively: A certain key is chosen and, by comparing it to the other keys, is placed in its final

position. Thereafter, the remaining left and right subsequences (whose elements are all either greater than or less than the chosenkey) are treated in the same way. For more details see Sedgewick (1977).

1365–8050 c© 2006 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France

All these works only consider classic Quicksort:No sampling to choose pivots.(No multiway partitioning.)

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 3 / 16

Page 48: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Previous work on equal keys

Rather little is known!

Sedgewick 1977: Quicksort on Equal Keys

Sedgewick & Bentley 2002: Quicksort is Optimal (Talk at Knuthfest)

A bit more on BSTs:

Burge 1976: An Analysis of BSTs Formed from Sequences of Nondistinct Keys

Kemp 1996: BSTs constructed from nondistinct keys with/without specified probabilities

Archibald & Clément 2006: Average depth in a BST with repeated keys

This is basically all literature on analysis of Quicksort with equal keys!

SIAM J. COMPUT.Vol. 6, No. 2, June 1977

QUICKSORT WITH EQUAL KEYS*

ROBERT SEDGEWICK?

Abstract. This paper considers the problem of implementing and analyzing a Quicksort programwhen equal keys are likely to be present in the file to be sorted. Upper and lower bounds are derived onthe average number of comparisons needed by any Quicksort programwhen equal keys are present. Itis shown that, of the three strategies which have been suggested for dealing with equal keys, themethod of always stopping the scanning pointers on keys equal to the partitioning element performsbest.

Key words, analysis of algorithms, equal keys, Quicksort, sorting

Introduction. The Quicksort algorithm, which was introduced by C. A. R.Hoare in 1960 [6], [7], has gained wide acceptance as the most efficientgeneral-purpose sorting method suitable for use on computers. The algorithm hasa rich history: many modifications have been suggested to improve its perfor-mance, and exact formulas have been derived describing the time and spacerequirements of the most important variants [7], [9], [14].

Although most files to be sorted contain at least some equal keys and sortingprograms must always deal with them properly, it is generally considered reasona-ble to assume in the analysis that the keys are distinct. This assumption isfundamental to the analysis of nearly all sorting programs, and it is very oftenrealistic. In any situation where the number of possible key values far exceeds thenumber of keys to be sorted, the probability that equal keys are present will bevery small. However, if the number of possible key values is not large, or if there issome other information about the file which indicates that equal keys are likely tobe present, then the performance of many sorting programs, including Quicksort,has not been carefully studied.

The purpose of this paper, then, is to investigate the performance ofQuicksort when equal keys are present. The following section describes thealgorithm and its analysis for distinct keys. Next, lower and upper bounds arederived for the average number of comparisons taken when equal keys arepresent. Following that, we shall consider, from a practical standpoint, theproblem of implementing a version of Quicksort to handle equal keys. Finally weshall compare the various methods and discover which is the most useful inpractical sorting applications.

1. Distinct keys. Suppose that an array of keys A[1],..., A[N] is to berearranged to make

A[1]<A[2]<... <A[N],

where the order relation < is any transitive relation whatever defined on all thekeys.

* Received by the editors September 2, 1975, and in revised form May 3, 1976.

" Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912. Thiswork was supported in part by the National Science Foundation under Grants GJ-28074 andMCS75-23738, and in part by the Fannie and John Hertz Foundation.

240

Now is the time

For all good men

To come to the aid

Of their party

QUICKSORT IS OPTIMAL

Robert SedgewickJon Bentley

An Analysis of Binary Search Trees Formed from Sequences of Nondistinct Keys

WILLIAM H. BURGE

IBM Thomas J. Watson Research Center, Yorktown Heights, New York

ABSTRACT. The expected depth of each key in the set of binary search trees formed from all sequences composed from a multmet pl • 1, p2 ' 2, p3 • 3, . . . , p~ • n is obtained, and hence the expected weight of such trees. The expected number of left-to-right local minima and the expected number of cycles in sequences composed from a multiset are then deduced from these results.

K E Y W O R D S A N D P H R A S E S : binary search trees, multiset

CR CATEGORIES: 5.3

Introduction

The expected depth of the number r in the set of b inary search trees formed from all permutat ions of 1,2,3,..-,n is known [1] to be Hr + H.+l - r -- 2 where Hr = ~ - l 1/k. Also the expected number of rightgoing and leftgoing branches on the pa th from the root to the number r are H,+r - i -- 1 and Hr - 1, respectively. In this paper these results are extended to b inary search trees formed from sequences of nondist inct keys drawn from the multiset pi • 1, p2 • 2, . . . , p~ • n (i.e. the sequences containing p~ l ' s , p2 2's, • • . , p , n 's) .

Both the expected number of comparisons needed to construct the b inary search trees and the expected number of comparisons needed to search for a key in a tree are obtained. The results are applicable to the analysis of the quicksort algori thm [3] when applied to sequences of nondist inct keys.

Binary Search Trees

The binary search trees are constructed by inserting keys one at a t ime in the order in which they appear in the sequence. The entering key is inserted into the left subtree if i t is strictly less than the key at the root, and into the right subtree if i t is greater than or equal to it. As a consequence the binary trees tha t result from sequences of nondist inct keys will have more rightgoing than leftgoing branches. The rightgoing and leftgoing branches will be analyzed separately.

The depth of a key in the tree is the number of branches in the pa th from the root to the key. The number of leftgoing branches in this pa th will be called the "leftgoing depth" of the key, and the number of rightgoing branches will be called its "rightgoing depth ."

Leflgoing Branches

Suppose tha t D(p~,p2,p3,... ,p,) is the expected sum of the leftgoing depths of all the l ' s in the b inary search trees constructed from all sequences drawn from Pl" 1,

Copyr igh t © 1976, Associat ion for Compu t ing Machinery , Inc. General permiss ion to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery. Author's address: Computer Sciences Department, IBM Thomas J. Watson Research Center, P.O Box 218, Yorktown Heights, NY 10598.

Journal of the A~ociation for Comput/ng Machinery, Vol. 23, No 3, July 1976, pp 4Sl...454.

Theoretical Computer Science

ELSEVIER Theoretical Computer Science 156 (1996) 39%70

Binary search trees constructed from nondistinct keys with/without specified probabilities

Rainer Kemp

Johann Wo@ng Goethe-Universitiit, Fachbereich Informarik, D-60054 Fran!+rt am Main, German)

Received May 1994; revised November 1994

Communicated by H. Prodinger

Abstract

We investigate binary search trees formed from sequences of nondistinct keys under two models. In the first model, an input sequence is composed from elements of a given finite multiset and all possible sequences are equally-likely. In the second model, an input sequence is composed from n elements of a (possibly infinite) set, where each key has a specified probability; the n keys are independently chosen from the given set.

Under both models, we shall derive general closed-form expressions for the expected values of the characteristic parameters defined on the corresponding binary search trees. These para- meters include the (left, right) depth of a given key, the level of a given external node and the left (right) side or the internal (external) path length of a search tree. Furthermore, we find some nonobvious relations between these expected values. In some respects, the second model tends to the first model for large n. All results are illustrated by concrete examples sometimes showing unexpected phenomena.

1. Introduction and basic definitions

A binary search tree (BST) is a common choice for a data structure in order to store a set of keys which can be compared by an ordering relation. The keys are inserted into the BST one at a time in the order in which they appear in the input sequence.

A root node is created for the first element; a further entering key is inserted into the left subtree if it is strictly less than the key at the root, and into the right subtree if it is greater than it. The inserting key is subjected recursively to the same treatment, until the key itself or a unique insertion position is found (cf. [S, p. 4241). A Pascal data definition and the insertion algorithm are as follows:’

’ For the sake of simplicity, we assume that the keys are elements of N

0304-3975/96/%15.00 0 1996-Elsevier Science B.V. All rights reserved SSDI 0304-3975(95)00302-5

Fourth Colloquium on Mathematics and Computer Science DMTCS proc. AG, 2006, 309–320

Average depth in a binary search tree withrepeated keys

Margaret Archibald1 and Julien Clement2,3

1 School of Mathematics, University of the Witwatersrand, P. O. Wits, 2050 Johannesburg, South Africa,email: [email protected] UMR 8049, Institut Gaspard-Monge, Laboratoire d’informatique, Universite de Marne-la-Vallee, France3CNRS UMR 6072, GREYC Laboratoire d’informatique, Universite de Caen, France,email: [email protected]

Random sequences from alphabet 1, . . . , r are examined where repeated letters are allowed. Binary search trees areformed from these, and the average left-going depth of the first 1 is found. Next, the right-going depth of the first r isexamined, and finally a merge (or ‘shuffle’) operator is used to obtain the average depth of an arbitrary node, which canbe expressed in terms of the left-going and right-going depths. The variance of each of these parameters is also found.

Keywords: Binary search trees, average case analysis, repeated keys, multiset, shuffle product

1 IntroductionWe examine binary search trees (BSTs) formed from sequences with equal entries. A BST is a planar treewhere each node has a maximum of 2 children, which are either left or right of the parent node. BSTs area commonly used data structure in Computer Science but are usually built from distinct entries. Here weconsider a suitable definition of a BST when duplicated values are allowed: the first element in the sequenceis the root of the tree and thereafter elements which are strictly less than the parent node are placed to theleft (as the left child) and those greater than or equal to the parent node are inserted as the right child (seeFig. 1 (left)).

Fig. 1: The principle for binary search tree with repeated keys (left). The binary search tree of sequence 323123411343when inserting all symbols (middle) or when inserting only the first occurrence of a symbol (right).

We examine various parameters of these trees and give an average case analysis under two standard prob-abilistic models (‘probability’ and ‘multiset’). BSTs built over permutations are a very intensively studieddata structure. One explanation is the close link between the construction of the tree and the Quicksortalgorithm(i). As with many sorting algorithms, most research has been done under the assumption that allkeys are distinct, i.e., that repeats are not allowed. However, given a large tree and a small pool of data fromwhich to choose the keys, it may well happen that equal keys are common. This is a motivation for exam-ining the case of BSTs with equal keys (see Sedgewick (1977)). Previous research on this topic includesBurge (1976), Kemp (1996) and Sedgewick (1977), where the expectation has been discussed.

Our aim in this paper is to apply modern techniques of analysis of algorithms to confirm and revisit someof these results in a somewhat simpler manner. This allows us to find both the expectation and the variance.Related partial results along the same lines can be found in Clement et al. (1998).(i) The Quicksort algorithm runs recursively: A certain key is chosen and, by comparing it to the other keys, is placed in its final

position. Thereafter, the remaining left and right subsequences (whose elements are all either greater than or less than the chosenkey) are treated in the same way. For more details see Sedgewick (1977).

1365–8050 c© 2006 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France

All these works only consider classic Quicksort:No sampling to choose pivots.(No multiway partitioning.)

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 3 / 16

Page 49: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Previous work on equal keys

Rather little is known!

Sedgewick 1977: Quicksort on Equal Keys

Sedgewick & Bentley 2002: Quicksort is Optimal (Talk at Knuthfest)

A bit more on BSTs:

Burge 1976: An Analysis of BSTs Formed from Sequences of Nondistinct Keys

Kemp 1996: BSTs constructed from nondistinct keys with/without specified probabilities

Archibald & Clément 2006: Average depth in a BST with repeated keys

This is basically all literature on analysis of Quicksort with equal keys!

SIAM J. COMPUT.Vol. 6, No. 2, June 1977

QUICKSORT WITH EQUAL KEYS*

ROBERT SEDGEWICK?

Abstract. This paper considers the problem of implementing and analyzing a Quicksort programwhen equal keys are likely to be present in the file to be sorted. Upper and lower bounds are derived onthe average number of comparisons needed by any Quicksort programwhen equal keys are present. Itis shown that, of the three strategies which have been suggested for dealing with equal keys, themethod of always stopping the scanning pointers on keys equal to the partitioning element performsbest.

Key words, analysis of algorithms, equal keys, Quicksort, sorting

Introduction. The Quicksort algorithm, which was introduced by C. A. R.Hoare in 1960 [6], [7], has gained wide acceptance as the most efficientgeneral-purpose sorting method suitable for use on computers. The algorithm hasa rich history: many modifications have been suggested to improve its perfor-mance, and exact formulas have been derived describing the time and spacerequirements of the most important variants [7], [9], [14].

Although most files to be sorted contain at least some equal keys and sortingprograms must always deal with them properly, it is generally considered reasona-ble to assume in the analysis that the keys are distinct. This assumption isfundamental to the analysis of nearly all sorting programs, and it is very oftenrealistic. In any situation where the number of possible key values far exceeds thenumber of keys to be sorted, the probability that equal keys are present will bevery small. However, if the number of possible key values is not large, or if there issome other information about the file which indicates that equal keys are likely tobe present, then the performance of many sorting programs, including Quicksort,has not been carefully studied.

The purpose of this paper, then, is to investigate the performance ofQuicksort when equal keys are present. The following section describes thealgorithm and its analysis for distinct keys. Next, lower and upper bounds arederived for the average number of comparisons taken when equal keys arepresent. Following that, we shall consider, from a practical standpoint, theproblem of implementing a version of Quicksort to handle equal keys. Finally weshall compare the various methods and discover which is the most useful inpractical sorting applications.

1. Distinct keys. Suppose that an array of keys A[1],..., A[N] is to berearranged to make

A[1]<A[2]<... <A[N],

where the order relation < is any transitive relation whatever defined on all thekeys.

* Received by the editors September 2, 1975, and in revised form May 3, 1976.

" Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912. Thiswork was supported in part by the National Science Foundation under Grants GJ-28074 andMCS75-23738, and in part by the Fannie and John Hertz Foundation.

240

Now is the time

For all good men

To come to the aid

Of their party

QUICKSORT IS OPTIMAL

Robert SedgewickJon Bentley

An Analysis of Binary Search Trees Formed from Sequences of Nondistinct Keys

WILLIAM H. BURGE

IBM Thomas J. Watson Research Center, Yorktown Heights, New York

ABSTRACT. The expected depth of each key in the set of binary search trees formed from all sequences composed from a multmet pl • 1, p2 ' 2, p3 • 3, . . . , p~ • n is obtained, and hence the expected weight of such trees. The expected number of left-to-right local minima and the expected number of cycles in sequences composed from a multiset are then deduced from these results.

K E Y W O R D S A N D P H R A S E S : binary search trees, multiset

CR CATEGORIES: 5.3

Introduction

The expected depth of the number r in the set of b inary search trees formed from all permutat ions of 1,2,3,..-,n is known [1] to be Hr + H.+l - r -- 2 where Hr = ~ - l 1/k. Also the expected number of rightgoing and leftgoing branches on the pa th from the root to the number r are H,+r - i -- 1 and Hr - 1, respectively. In this paper these results are extended to b inary search trees formed from sequences of nondist inct keys drawn from the multiset pi • 1, p2 • 2, . . . , p~ • n (i.e. the sequences containing p~ l ' s , p2 2's, • • . , p , n 's) .

Both the expected number of comparisons needed to construct the b inary search trees and the expected number of comparisons needed to search for a key in a tree are obtained. The results are applicable to the analysis of the quicksort algori thm [3] when applied to sequences of nondist inct keys.

Binary Search Trees

The binary search trees are constructed by inserting keys one at a t ime in the order in which they appear in the sequence. The entering key is inserted into the left subtree if i t is strictly less than the key at the root, and into the right subtree if i t is greater than or equal to it. As a consequence the binary trees tha t result from sequences of nondist inct keys will have more rightgoing than leftgoing branches. The rightgoing and leftgoing branches will be analyzed separately.

The depth of a key in the tree is the number of branches in the pa th from the root to the key. The number of leftgoing branches in this pa th will be called the "leftgoing depth" of the key, and the number of rightgoing branches will be called its "rightgoing depth ."

Leflgoing Branches

Suppose tha t D(p~,p2,p3,... ,p,) is the expected sum of the leftgoing depths of all the l ' s in the b inary search trees constructed from all sequences drawn from Pl" 1,

Copyr igh t © 1976, Associat ion for Compu t ing Machinery , Inc. General permiss ion to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery. Author's address: Computer Sciences Department, IBM Thomas J. Watson Research Center, P.O Box 218, Yorktown Heights, NY 10598.

Journal of the A~ociation for Comput/ng Machinery, Vol. 23, No 3, July 1976, pp 4Sl...454.

Theoretical Computer Science

ELSEVIER Theoretical Computer Science 156 (1996) 39%70

Binary search trees constructed from nondistinct keys with/without specified probabilities

Rainer Kemp

Johann Wo@ng Goethe-Universitiit, Fachbereich Informarik, D-60054 Fran!+rt am Main, German)

Received May 1994; revised November 1994

Communicated by H. Prodinger

Abstract

We investigate binary search trees formed from sequences of nondistinct keys under two models. In the first model, an input sequence is composed from elements of a given finite multiset and all possible sequences are equally-likely. In the second model, an input sequence is composed from n elements of a (possibly infinite) set, where each key has a specified probability; the n keys are independently chosen from the given set.

Under both models, we shall derive general closed-form expressions for the expected values of the characteristic parameters defined on the corresponding binary search trees. These para- meters include the (left, right) depth of a given key, the level of a given external node and the left (right) side or the internal (external) path length of a search tree. Furthermore, we find some nonobvious relations between these expected values. In some respects, the second model tends to the first model for large n. All results are illustrated by concrete examples sometimes showing unexpected phenomena.

1. Introduction and basic definitions

A binary search tree (BST) is a common choice for a data structure in order to store a set of keys which can be compared by an ordering relation. The keys are inserted into the BST one at a time in the order in which they appear in the input sequence.

A root node is created for the first element; a further entering key is inserted into the left subtree if it is strictly less than the key at the root, and into the right subtree if it is greater than it. The inserting key is subjected recursively to the same treatment, until the key itself or a unique insertion position is found (cf. [S, p. 4241). A Pascal data definition and the insertion algorithm are as follows:’

’ For the sake of simplicity, we assume that the keys are elements of N

0304-3975/96/%15.00 0 1996-Elsevier Science B.V. All rights reserved SSDI 0304-3975(95)00302-5

Fourth Colloquium on Mathematics and Computer Science DMTCS proc. AG, 2006, 309–320

Average depth in a binary search tree withrepeated keys

Margaret Archibald1 and Julien Clement2,3

1 School of Mathematics, University of the Witwatersrand, P. O. Wits, 2050 Johannesburg, South Africa,email: [email protected] UMR 8049, Institut Gaspard-Monge, Laboratoire d’informatique, Universite de Marne-la-Vallee, France3CNRS UMR 6072, GREYC Laboratoire d’informatique, Universite de Caen, France,email: [email protected]

Random sequences from alphabet 1, . . . , r are examined where repeated letters are allowed. Binary search trees areformed from these, and the average left-going depth of the first 1 is found. Next, the right-going depth of the first r isexamined, and finally a merge (or ‘shuffle’) operator is used to obtain the average depth of an arbitrary node, which canbe expressed in terms of the left-going and right-going depths. The variance of each of these parameters is also found.

Keywords: Binary search trees, average case analysis, repeated keys, multiset, shuffle product

1 IntroductionWe examine binary search trees (BSTs) formed from sequences with equal entries. A BST is a planar treewhere each node has a maximum of 2 children, which are either left or right of the parent node. BSTs area commonly used data structure in Computer Science but are usually built from distinct entries. Here weconsider a suitable definition of a BST when duplicated values are allowed: the first element in the sequenceis the root of the tree and thereafter elements which are strictly less than the parent node are placed to theleft (as the left child) and those greater than or equal to the parent node are inserted as the right child (seeFig. 1 (left)).

Fig. 1: The principle for binary search tree with repeated keys (left). The binary search tree of sequence 323123411343when inserting all symbols (middle) or when inserting only the first occurrence of a symbol (right).

We examine various parameters of these trees and give an average case analysis under two standard prob-abilistic models (‘probability’ and ‘multiset’). BSTs built over permutations are a very intensively studieddata structure. One explanation is the close link between the construction of the tree and the Quicksortalgorithm(i). As with many sorting algorithms, most research has been done under the assumption that allkeys are distinct, i.e., that repeats are not allowed. However, given a large tree and a small pool of data fromwhich to choose the keys, it may well happen that equal keys are common. This is a motivation for exam-ining the case of BSTs with equal keys (see Sedgewick (1977)). Previous research on this topic includesBurge (1976), Kemp (1996) and Sedgewick (1977), where the expectation has been discussed.

Our aim in this paper is to apply modern techniques of analysis of algorithms to confirm and revisit someof these results in a somewhat simpler manner. This allows us to find both the expectation and the variance.Related partial results along the same lines can be found in Clement et al. (1998).(i) The Quicksort algorithm runs recursively: A certain key is chosen and, by comparing it to the other keys, is placed in its final

position. Thereafter, the remaining left and right subsequences (whose elements are all either greater than or less than the chosenkey) are treated in the same way. For more details see Sedgewick (1977).

1365–8050 c© 2006 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France

All these works only consider classic Quicksort:No sampling to choose pivots.(No multiway partitioning.)

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 3 / 16

Page 50: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Sedgewick’s analysis for classic Quicksort

Classic Quicksort:

Expected comparisons expressible exactly.

C(1,n) = N − 1 + 1N

xj(C(1, j− 1) + C(j + 1,n))1≤j≤n∑

Analysis of Quicksort with equal keys

NC(1,n) = N(N − 1) + xjC(1,j− 1) + xj1≤j≤n∑ C(j + 1,n)

1≤j≤n∑

(x1 + ... + xn)D(1,n) = x12 − x1 + 2x1(x2 + ... + xn) + xjD(1,j− 1)

2≤j≤n∑

(x1 + ... + xn)D(1,n) − (x1 + ... + xn−1)D(1,n− 1) = 2x1xn + xnD(1,n − 1)

1. Define

C(x1,...,xn) ≡ C(1,n) to be the mean # compares to sort the file

3. Subtract same equation for

x2,..., xn and let

D(1,n) ≡ C(1,n) − C(2,n)

2. Multiply both sides by

N = x1 + ... + xn

4. Subtract same equation for

x1,...,xn−1

D(1,n) = D(1,n − 1) +2x1xn

x1 + ... + xn

Analysis of Quicksort with equal keys (cont.)

(x1 + ... + xn)D(1,n) − (x1 + ... + xn−1)D(1,n− 1) = 2x1xn + xnD(1,n − 1)

5. Simplify, divide both sides by

N = x1 + ... + xn

6. Telescope (twice)

THEOREM. Quicksort (with 3-way partitioning, randomized) uses

N − n + 2QN compares (where

Q =pkpj

pk + ... + pj1≤k<j≤n∑ , with

pi = xi N)

to sort an

(x1,..., xn) −file, on the average .

C(1,n) = N − n +2xkxj

xk + ... + xj1≤k<j≤n∑

D(1,n) = D(1,n − 1) +2x1xn

x1 + ... + xn

Analysis of Quicksort with equal keys (cont.)

(x1 + ... + xn)D(1,n) − (x1 + ... + xn−1)D(1,n− 1) = 2x1xn + xnD(1,n − 1)

5. Simplify, divide both sides by

N = x1 + ... + xn

6. Telescope (twice)

THEOREM. Quicksort (with 3-way partitioning, randomized) uses

N − n + 2QN compares (where

Q =pkpj

pk + ... + pj1≤k<j≤n∑ , with

pi = xi N)

to sort an

(x1,..., xn) −file, on the average .

C(1,n) = N − n +2xkxj

xk + ... + xj1≤k<j≤n∑

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 4 / 16

Page 51: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Sedgewick’s analysis for classic Quicksort

Classic Quicksort:

Expected comparisons expressible exactly.

C(1,n) = N − 1 + 1N

xj(C(1, j− 1) + C(j + 1,n))1≤j≤n∑

Analysis of Quicksort with equal keys

NC(1,n) = N(N − 1) + xjC(1,j− 1) + xj1≤j≤n∑ C(j + 1,n)

1≤j≤n∑

(x1 + ... + xn)D(1,n) = x12 − x1 + 2x1(x2 + ... + xn) + xjD(1,j− 1)

2≤j≤n∑

(x1 + ... + xn)D(1,n) − (x1 + ... + xn−1)D(1,n− 1) = 2x1xn + xnD(1,n − 1)

1. Define

C(x1,...,xn) ≡ C(1,n) to be the mean # compares to sort the file

3. Subtract same equation for

x2,..., xn and let

D(1,n) ≡ C(1,n) − C(2,n)

2. Multiply both sides by

N = x1 + ... + xn

4. Subtract same equation for

x1,...,xn−1

D(1,n) = D(1,n − 1) +2x1xn

x1 + ... + xn

Analysis of Quicksort with equal keys (cont.)

(x1 + ... + xn)D(1,n) − (x1 + ... + xn−1)D(1,n− 1) = 2x1xn + xnD(1,n − 1)

5. Simplify, divide both sides by

N = x1 + ... + xn

6. Telescope (twice)

THEOREM. Quicksort (with 3-way partitioning, randomized) uses

N − n + 2QN compares (where

Q =pkpj

pk + ... + pj1≤k<j≤n∑ , with

pi = xi N)

to sort an

(x1,..., xn) −file, on the average .

C(1,n) = N − n +2xkxj

xk + ... + xj1≤k<j≤n∑

D(1,n) = D(1,n − 1) +2x1xn

x1 + ... + xn

Analysis of Quicksort with equal keys (cont.)

(x1 + ... + xn)D(1,n) − (x1 + ... + xn−1)D(1,n− 1) = 2x1xn + xnD(1,n − 1)

5. Simplify, divide both sides by

N = x1 + ... + xn

6. Telescope (twice)

THEOREM. Quicksort (with 3-way partitioning, randomized) uses

N − n + 2QN compares (where

Q =pkpj

pk + ... + pj1≤k<j≤n∑ , with

pi = xi N)

to sort an

(x1,..., xn) −file, on the average .

C(1,n) = N − n +2xkxj

xk + ... + xj1≤k<j≤n∑

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 4 / 16

Page 52: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Sedgewick’s analysis for classic Quicksort

Classic Quicksort:

Expected comparisons expressible exactly.

C(1,n) = N − 1 + 1N

xj(C(1, j− 1) + C(j + 1,n))1≤j≤n∑

Analysis of Quicksort with equal keys

NC(1,n) = N(N − 1) + xjC(1,j− 1) + xj1≤j≤n∑ C(j + 1,n)

1≤j≤n∑

(x1 + ... + xn)D(1,n) = x12 − x1 + 2x1(x2 + ... + xn) + xjD(1,j− 1)

2≤j≤n∑

(x1 + ... + xn)D(1,n) − (x1 + ... + xn−1)D(1,n− 1) = 2x1xn + xnD(1,n − 1)

1. Define

C(x1,...,xn) ≡ C(1,n) to be the mean # compares to sort the file

3. Subtract same equation for

x2,..., xn and let

D(1,n) ≡ C(1,n) − C(2,n)

2. Multiply both sides by

N = x1 + ... + xn

4. Subtract same equation for

x1,...,xn−1

D(1,n) = D(1,n − 1) +2x1xn

x1 + ... + xn

Analysis of Quicksort with equal keys (cont.)

(x1 + ... + xn)D(1,n) − (x1 + ... + xn−1)D(1,n− 1) = 2x1xn + xnD(1,n − 1)

5. Simplify, divide both sides by

N = x1 + ... + xn

6. Telescope (twice)

THEOREM. Quicksort (with 3-way partitioning, randomized) uses

N − n + 2QN compares (where

Q =pkpj

pk + ... + pj1≤k<j≤n∑ , with

pi = xi N)

to sort an

(x1,..., xn) −file, on the average .

C(1,n) = N − n +2xkxj

xk + ... + xj1≤k<j≤n∑

D(1,n) = D(1,n − 1) +2x1xn

x1 + ... + xn

Analysis of Quicksort with equal keys (cont.)

(x1 + ... + xn)D(1,n) − (x1 + ... + xn−1)D(1,n− 1) = 2x1xn + xnD(1,n − 1)

5. Simplify, divide both sides by

N = x1 + ... + xn

6. Telescope (twice)

THEOREM. Quicksort (with 3-way partitioning, randomized) uses

N − n + 2QN compares (where

Q =pkpj

pk + ... + pj1≤k<j≤n∑ , with

pi = xi N)

to sort an

(x1,..., xn) −file, on the average .

C(1,n) = N − n +2xkxj

xk + ... + xj1≤k<j≤n∑

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 4 / 16

Page 53: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Sedgewick’s analysis for classic Quicksort

Classic Quicksort: Expected comparisons expressible exactly.

C(1,n) = N − 1 + 1N

xj(C(1, j− 1) + C(j + 1,n))1≤j≤n∑

Analysis of Quicksort with equal keys

NC(1,n) = N(N − 1) + xjC(1,j− 1) + xj1≤j≤n∑ C(j + 1,n)

1≤j≤n∑

(x1 + ... + xn)D(1,n) = x12 − x1 + 2x1(x2 + ... + xn) + xjD(1,j− 1)

2≤j≤n∑

(x1 + ... + xn)D(1,n) − (x1 + ... + xn−1)D(1,n− 1) = 2x1xn + xnD(1,n − 1)

1. Define

C(x1,...,xn) ≡ C(1,n) to be the mean # compares to sort the file

3. Subtract same equation for

x2,..., xn and let

D(1,n) ≡ C(1,n) − C(2,n)

2. Multiply both sides by

N = x1 + ... + xn

4. Subtract same equation for

x1,...,xn−1

D(1,n) = D(1,n − 1) +2x1xn

x1 + ... + xn

Analysis of Quicksort with equal keys (cont.)

(x1 + ... + xn)D(1,n) − (x1 + ... + xn−1)D(1,n− 1) = 2x1xn + xnD(1,n − 1)

5. Simplify, divide both sides by

N = x1 + ... + xn

6. Telescope (twice)

THEOREM. Quicksort (with 3-way partitioning, randomized) uses

N − n + 2QN compares (where

Q =pkpj

pk + ... + pj1≤k<j≤n∑ , with

pi = xi N)

to sort an

(x1,..., xn) −file, on the average .

C(1,n) = N − n +2xkxj

xk + ... + xj1≤k<j≤n∑

D(1,n) = D(1,n − 1) +2x1xn

x1 + ... + xn

Analysis of Quicksort with equal keys (cont.)

(x1 + ... + xn)D(1,n) − (x1 + ... + xn−1)D(1,n− 1) = 2x1xn + xnD(1,n − 1)

5. Simplify, divide both sides by

N = x1 + ... + xn

6. Telescope (twice)

THEOREM. Quicksort (with 3-way partitioning, randomized) uses

N − n + 2QN compares (where

Q =pkpj

pk + ... + pj1≤k<j≤n∑ , with

pi = xi N)

to sort an

(x1,..., xn) −file, on the average .

C(1,n) = N − n +2xkxj

xk + ... + xj1≤k<j≤n∑

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 4 / 16

Page 54: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

The conjecture of Sedgewick and Bentley

Quicksort is optimal

The average number of compares per element C/N is always within a constant factor of the entropy H lower bound:

C > NH−N (information theory) upper bound:

C < 2ln2NH + N (Burge analysis, Melhorn bound)

No comparison-based algorithm can do better.

Conjecture: With sampling,

C / N → H as sample size increases.

X∗

* subject to some assumptions

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 5 / 16

Page 55: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

The conjecture of Sedgewick and Bentley

Quicksort is optimal

The average number of compares per element C/N is always within a constant factor of the entropy H lower bound:

C > NH−N (information theory) upper bound:

C < 2ln2NH + N (Burge analysis, Melhorn bound)

No comparison-based algorithm can do better.

Conjecture: With sampling,

C / N → H as sample size increases.

X∗

* subject to some assumptions

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 5 / 16

Page 56: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

The conjecture of Sedgewick and Bentley

Quicksort is optimal

The average number of compares per element C/N is always within a constant factor of the entropy H lower bound:

C > NH−N (information theory) upper bound:

C < 2ln2NH + N (Burge analysis, Melhorn bound)

No comparison-based algorithm can do better.

Conjecture: With sampling,

C / N → H as sample size increases.X∗

* subject to some assumptions

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 5 / 16

Page 57: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

The conjecture of Sedgewick and Bentley

Quicksort is optimal

The average number of compares per element C/N is always within a constant factor of the entropy H lower bound:

C > NH−N (information theory) upper bound:

C < 2ln2NH + N (Burge analysis, Melhorn bound)

No comparison-based algorithm can do better.

Conjecture: With sampling,

C / N → H as sample size increases.X∗

* subject to some assumptions

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 5 / 16

Page 58: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Outline

0 Intro0 Intro

1 Quicksort and Search Trees1 Quicksort and Search Trees

2 Saturated Fringe-Balanced Trees2 Saturated Fringe-Balanced Trees

3 Back to Multiset Permutations3 Back to Multiset Permutations

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 5 / 16

Page 59: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact:

(without duplicates)

Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 60: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact:

(without duplicates)

Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 61: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact:

(without duplicates)

Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 62: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 63: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot)

4 2 1 3 3 5 4 4 3 5 2

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 64: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot)

4 2 1 3 3 5 4 4 3 5 2

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 65: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot)

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 5

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 66: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot)

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 67: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot)

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 68: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot)

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 69: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot)

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 70: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot)

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 71: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot)

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 72: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot)

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 73: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot) Binary Search Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

4 2 1 3 3 5 4 4 3 5 2

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 74: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot) Binary Search Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

4 2 1 3 3 5 4 4 3 5 2

4

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 75: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot) Binary Search Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

4 2 1 3 3 5 4 4 3 5 2

4

2

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 76: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot) Binary Search Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

4 2 1 3 3 5 4 4 3 5 2

4

2

1

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 77: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot) Binary Search Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

4 2 1 3 3 5 4 4 3 5 2

4

2

1 3

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 78: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot) Binary Search Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

4 2 1 3 3 5 4 4 3 5 2

4

2

1 33

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 79: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot) Binary Search Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

4 2 1 3 3 5 4 4 3 5 2

4

2

1 3

5

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 80: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot) Binary Search Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

4 2 1 3 3 5 4 4 3 5 2

4

2

1 3

5

4

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 81: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot) Binary Search Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

4 2 1 3 3 5 4 4 3 5 2

4

2

1 3

5

1

4

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 82: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot) Binary Search Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

4 2 1 3 3 5 4 4 3 5 2

4

2

1 3

5

13

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 83: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot) Binary Search Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

4 2 1 3 3 5 4 4 3 5 2

4

2

1 3

5

1

5

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 84: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot) Binary Search Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

4 2 1 3 3 5 4 4 3 5 2

4

2

1 3

5

1

2

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 85: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot) Binary Search Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

4 2 1 3 3 5 4 4 3 5 2

4

2

1 3

5

1

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 86: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot) Binary Search Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

4 2 1 3 3 5 4 4 3 5 2

4

2

1 3

5

1

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 87: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Quicksort & search trees

Classic Fact: (without duplicates)Recursion Tree of Quicksort = Naturally grown BST from input Comparisons in Quicksort = Comparisons to built BST

= Comparisons to search input in final BST

How about inputs with duplicates?

Quicksort (Fat-Pivot) Binary Search Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 3 2 5 54 44

1 3 3 32 2 5 5

1 3 33

4 2 1 3 3 5 4 4 3 5 2

4

2

1 3

5

1

Equivalence holds also with duplicates.

This was only basic Quicksort . . . how about pivot sampling?Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 6 / 16

Page 88: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 89: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 90: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2)

4 2 1 3 3 5 4 4 3 5 2

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 91: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2)

4 2 1 3 3 5 4 4 3 5 2

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 92: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2)

4 2 1 3 3 5 4 4 3 5 2

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 93: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2)

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 94: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2)

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

Insertionsort

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 95: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2)

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 96: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2)

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 97: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2)

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 44

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 98: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 44

Insertionsort

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 99: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 100: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 101: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

4

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 102: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

4 2

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 103: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

4 2 1

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 104: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

4 2 1 3

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 105: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

4 2 1 3 3

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 106: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

2 1 3 3 4

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 107: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

2 1 43

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 108: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

3

2 1 4

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 109: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

3

2 1 4 5

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 110: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

3

2 1 4 5 4

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 111: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

3

2 1 4 5 4 4

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 112: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

3

4 5 4 42 1

3

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 113: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

3

2 1 4 5 4 4 5

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 114: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

3

2 1 4 4 4 5 5

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 115: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

3

2 1 5 54

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 116: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

3

2 1 4

5 5

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 117: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

3

4

5 5

2 1 2

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 118: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

3

4

5 5

2 1 2

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 119: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

3

4

5 5

2 1 2

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 120: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

3

4

5 5

2 1 2

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 121: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Fringe-balanced trees

k-Fringe-Balanced Search Trees:Leaves buffer k = 2t+ 1 elements.

If buffer is full, leaf is split new internal node with chosen pivot.

Median-of-5 Quicksort (t = 2) 5-Fringe-Balanced Tree

4 2 1 3 3 5 4 4 3 5 2

2 1 2 4 5 4 4 533 3

5 54 442 1 2

5 5

4 2 1 3 3 5 4 4 3 5 2

3

4

5 5

2 1 2

Correspondence extends toPivot Sampling (any scheme, not only median)(s-way Partitioning s-ary search trees) not today

Analyze search trees instead of Quicksort.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 7 / 16

Page 122: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 123: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 124: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 125: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 126: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 127: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 128: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 129: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 130: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 131: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 132: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 133: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 134: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 135: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 136: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 137: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 138: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 139: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Fringe-balanced: stationary after each valueinserted k = 2t+ 1 times(up to k duplicates in buffer)

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 140: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

Observation: T becomes stationary after each value was inserted!

Fringe-balanced: stationary after each valueinserted k = 2t+ 1 times(up to k duplicates in buffer)

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 141: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

tree-growing part T

Observation: T becomes stationary after each value was inserted!

Fringe-balanced: stationary after each valueinserted k = 2t+ 1 times(up to k duplicates in buffer)

Split input into tree-growing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 142: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

tree-growing part T

Observation: T becomes stationary after each value was inserted!

Fringe-balanced: stationary after each valueinserted k = 2t+ 1 times(up to k duplicates in buffer)

Split input into tree-growhopefully a short prefix!

ing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 143: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

tree-growing part T searching part ~XS

Observation: T becomes stationary after each value was inserted!

Fringe-balanced: stationary after each valueinserted k = 2t+ 1 times(up to k duplicates in buffer)

Split input into tree-growhopefully a short prefix!

ing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements. profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 144: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

tree-growing part T searching part ~XS

Observation: T becomes stationary after each value was inserted!

Fringe-balanced: stationary after each valueinserted k = 2t+ 1 times(up to k duplicates in buffer)

Split input into tree-growhopefully a short prefix!

ing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements.

Two parts of inputalways dependent!(profiles must sum to~x)

Two parts areindependent (i. i.d.!) profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 145: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

tree-growing part T searching part ~XS

Observation: T becomes stationary after each value was inserted!

Fringe-balanced: stationary after each valueinserted k = 2t+ 1 times(up to k duplicates in buffer)

Split input into tree-growhopefully a short prefix!

ing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements.

Two parts of inputalways dependent!(profiles must sum to~x)

Two parts areindependent (i. i.d.!) profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 146: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Tree-growing and searching

Quicksort costs = costs to search input ~U = (U1, . . . , Un) in final tree T.

T fixed search cost depends only on profile ~X = (X1, . . . , Xu)

but: T also depends on ~U (Recall: T is built from ~U!)

direct analysis no simpler than for Quicksort

4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

(k = 1)

tree-growing part T searching part ~XS

Observation: T becomes stationary after each value was inserted!

Fringe-balanced: stationary after each valueinserted k = 2t+ 1 times(up to k duplicates in buffer)

Split input into tree-growhopefully a short prefix!

ing part and searching part:1 We built T until it is stationary, ignoring costs.2 Determine costs of searching remaining elements.

Two parts of inputalways dependent!(profiles must sum to~x)

Two parts areindependent (i. i.d.!) profile ~XS of search part independent of T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 8 / 16

Page 147: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

Allow only first nT elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete

Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:

never more than 6 nT · u

. . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 148: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nT elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete

Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:

never more than 6 nT · u

. . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 149: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete

Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:

never more than 6 nT · u

. . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 150: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete

Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:

never more than 6 nT · u

. . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 151: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete

Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:

never more than 6 nT · u

. . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 152: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:

never more than 6 nT · u

. . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 153: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:

never more than 6 nT · u

. . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 154: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:

never more than 6 nT · u

. . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 155: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε Binomial tail bound

non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:

never more than 6 nT · u

. . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 156: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε Binomial tail bound

non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:

never more than 6 nT · u

. . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 157: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε Binomial tail bound

non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:never more than 6 nT · u

. . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 158: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε Binomial tail bound

non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:never more than 6 n

can’t use large nT

T · u

. . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 159: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε Binomial tail bound

non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:never more than 6 n

can’t use large nT

T · u . . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 160: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε Binomial tail bound

non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:never more than 6 n

can’t use large nT

T · u . . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.

can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 161: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε Binomial tail bound

non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:never more than 6 n

can’t use large nT

T · u . . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 162: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε Binomial tail bound

non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:never more than 6 n

can’t use large nT

T · u . . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 163: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε Binomial tail bound

non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:never more than 6 n

can’t use large nT

T · u . . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 164: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Bounding the tree-growing part

Goal: ignore tree-growing for analysis.4 5 5 5 2 3 5 4 2 2 2 5 5 5 4 2 5 1 5 4 2 3 4 2 2 1 2 4 2 2 5 5 2 3 4 2 3 4 1 3 2 2 1 4 4 4 1 2 3 3

tree-growing part T searching part ~XSnT

Allow only first nto be chosen

T elements for tree growing.

Problem: if a value occurs < k times in first nT elements, T not complete Choose nT large enough to make those degenerate inputs rare.

Require “many duplicates”: E[Xv] = Ω(nε) for ε > 0 Note: implies u =O(n1−ε)

nT = dn1−εe with ε < ε Binomial tail bound

non-degenerate w.h.p. (Pr[degenerate] = o(n−c) for any c)

Costs to grow T:never more than 6 n

can’t use large nT

T · u . . . but that is too coarse! (nT · u can be close to n2)

folklore result: random BSTs have logarithmic height w.h.p.can extend this to fringe-balanced trees nT ·O(logn) = O(n1−ε logn) to build T

Expected Quicksort costs: E[Cn,~q] = α(~q) · n ± O(n1−δ) (for any δ ∈ (0, ε))

α(~q) = expected search cost in random T

error term covers: tree-growing, sorting samples, degenerate inputs, inputs with high T

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 165: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Outline

0 Intro0 Intro

1 Quicksort and Search Trees1 Quicksort and Search Trees

2 Saturated Fringe-Balanced Trees2 Saturated Fringe-Balanced Trees

3 Back to Multiset Permutations3 Back to Multiset Permutations

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 9 / 16

Page 166: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Search Costs in Saturated Trees

Recall: α(~q) =

u∑v=1

qv · depthT(v) T from inserting i. i.d. D(~q) elements until saturation

Warmup: Ordinary BSTs (t = 0)

Old result: Allen & Munro 1978:Self-Organizing Binary Search Trees

α(~q) = 2HQ(~q) + 1 with

HQ(~q) =∑

16i<j6u

qiqj

qi + · · ·+ qj

Proof sketch:Sum prob. that i is ancestor of j over all i, jancestor ⇐⇒ i first inserted key among i, . . . , j

In the same paper: HQ(~q) < Hln(~q)

α(~q) < 2 ln 2 ·Hld

base 2 entropy

(~q) + 1,

only factor 2 ln 2 ≈ 1.386 from optimal!

Fringe-balanced trees (t > 1)

probability of given value in root:

P[P = 3]P[P = 3]

t = 0

0

q1 q2 q3 q4 q5 q6

1

P[P = 3]P[P = 3]

t = 2

0

q1 q2 q3 q4 q5 q6

1

prob. that i inserted first among i, . . . j ??

old approach does not work

Try to generalize this!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 10 / 16

Page 167: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Search Costs in Saturated Trees

Recall: α(~q) =

u∑v=1

qv · depthT(v) T from inserting i. i.d. D(~q) elements until saturation

Warmup: Ordinary BSTs (t = 0)

Old result: Allen & Munro 1978:Self-Organizing Binary Search Trees

α(~q) = 2HQ(~q) + 1 with

HQ(~q) =∑

16i<j6u

qiqj

qi + · · ·+ qj

Proof sketch:Sum prob. that i is ancestor of j over all i, jancestor ⇐⇒ i first inserted key among i, . . . , j

In the same paper: HQ(~q) < Hln(~q)

α(~q) < 2 ln 2 ·Hld

base 2 entropy

(~q) + 1,

only factor 2 ln 2 ≈ 1.386 from optimal!

Fringe-balanced trees (t > 1)

probability of given value in root:

P[P = 3]P[P = 3]

t = 0

0

q1 q2 q3 q4 q5 q6

1

P[P = 3]P[P = 3]

t = 2

0

q1 q2 q3 q4 q5 q6

1

prob. that i inserted first among i, . . . j ??

old approach does not work

Try to generalize this!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 10 / 16

Page 168: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Search Costs in Saturated Trees

Recall: α(~q) =

u∑v=1

qv · depthT(v) T from inserting i. i.d. D(~q)

distribution with prob. weights q1, . . . ,qu

elements until saturation

Warmup: Ordinary BSTs (t = 0)

Old result: Allen & Munro 1978:Self-Organizing Binary Search Trees

α(~q) = 2HQ(~q) + 1 with

HQ(~q) =∑

16i<j6u

qiqj

qi + · · ·+ qj

Proof sketch:Sum prob. that i is ancestor of j over all i, jancestor ⇐⇒ i first inserted key among i, . . . , j

In the same paper: HQ(~q) < Hln(~q)

α(~q) < 2 ln 2 ·Hld

base 2 entropy

(~q) + 1,

only factor 2 ln 2 ≈ 1.386 from optimal!

Fringe-balanced trees (t > 1)

probability of given value in root:

P[P = 3]P[P = 3]

t = 0

0

q1 q2 q3 q4 q5 q6

1

P[P = 3]P[P = 3]

t = 2

0

q1 q2 q3 q4 q5 q6

1

prob. that i inserted first among i, . . . j ??

old approach does not work

Try to generalize this!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 10 / 16

Page 169: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Search Costs in Saturated Trees

Recall: α(~q) =

u∑v=1

qv · depthT(v) T from inserting i. i.d. D(~q)

distribution with prob. weights q1, . . . ,qu

elements until saturation

Warmup: Ordinary BSTs (t = 0)

Old result: Allen & Munro 1978:Self-Organizing Binary Search Trees

α(~q) = 2HQ(~q) + 1 with

HQ(~q) =∑

16i<j6u

qiqj

qi + · · ·+ qj

Proof sketch:Sum prob. that i is ancestor of j over all i, jancestor ⇐⇒ i first inserted key among i, . . . , j

In the same paper: HQ(~q) < Hln(~q)

α(~q) < 2 ln 2 ·Hld

base 2 entropy

(~q) + 1,

only factor 2 ln 2 ≈ 1.386 from optimal!

Fringe-balanced trees (t > 1)

probability of given value in root:

P[P = 3]P[P = 3]

t = 0

0

q1 q2 q3 q4 q5 q6

1

P[P = 3]P[P = 3]

t = 2

0

q1 q2 q3 q4 q5 q6

1

prob. that i inserted first among i, . . . j ??

old approach does not work

Try to generalize this!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 10 / 16

Page 170: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Search Costs in Saturated Trees

Recall: α(~q) =

u∑v=1

qv · depthT(v) T from inserting i. i.d. D(~q)

distribution with prob. weights q1, . . . ,qu

elements until saturation

Warmup: Ordinary BSTs (t = 0)

Old result: Allen & Munro 1978:Self-Organizing Binary Search Trees

α(~q) = 2HQ(~q) + 1 with

HQ(~q) =∑

16i<j6u

qiqj

qi + · · ·+ qj

Proof sketch:Sum prob. that i is ancestor of j over all i, jancestor ⇐⇒ i first inserted key among i, . . . , j

In the same paper: HQ(~q) < Hln(~q)

α(~q) < 2 ln 2 ·Hld

base 2 entropy

(~q) + 1,

only factor 2 ln 2 ≈ 1.386 from optimal!

Fringe-balanced trees (t > 1)

probability of given value in root:

P[P = 3]P[P = 3]

t = 0

0

q1 q2 q3 q4 q5 q6

1

P[P = 3]P[P = 3]

t = 2

0

q1 q2 q3 q4 q5 q6

1

prob. that i inserted first among i, . . . j ??

old approach does not work

Try to generalize this!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 10 / 16

Page 171: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Search Costs in Saturated Trees

Recall: α(~q) =

u∑v=1

qv · depthT(v) T from inserting i. i.d. D(~q)

distribution with prob. weights q1, . . . ,qu

elements until saturation

Warmup: Ordinary BSTs (t = 0)

Old result: Allen & Munro 1978:Self-Organizing Binary Search Trees

α(~q) = 2HQ(~q) + 1 with

HQ(~q) =∑

16i<j6u

qiqj

qi + · · ·+ qj

Proof sketch:Sum prob. that i is ancestor of j over all i, jancestor ⇐⇒ i first inserted key among i, . . . , j

In the same paper: HQ(~q) < Hln(~q)

α(~q) < 2 ln 2 ·Hld

base 2 entropy

(~q) + 1,

only factor 2 ln 2 ≈ 1.386 from optimal!

Fringe-balanced trees (t > 1)

probability of given value in root:

P[P = 3]P[P = 3]

t = 0

0

q1 q2 q3 q4 q5 q6

1

P[P = 3]P[P = 3]

t = 2

0

q1 q2 q3 q4 q5 q6

1

prob. that i inserted first among i, . . . j ??

old approach does not work

Try to generalize this!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 10 / 16

Page 172: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Search Costs in Saturated Trees

Recall: α(~q) =

u∑v=1

qv · depthT(v) T from inserting i. i.d. D(~q)

distribution with prob. weights q1, . . . ,qu

elements until saturation

Warmup: Ordinary BSTs (t = 0)

Old result: Allen & Munro 1978:Self-Organizing Binary Search Trees

α(~q) = 2HQ(~q) + 1 with

HQ(~q) =∑

16i<j6u

qiqj

qi + · · ·+ qj

Proof sketch:Sum prob. that i is ancestor of j over all i, jancestor ⇐⇒ i first inserted key among i, . . . , j

In the same paper: HQ(~q) < Hln(~q)

α(~q) < 2 ln 2 ·Hld

base 2 entropy

(~q) + 1,

only factor 2 ln 2 ≈ 1.386 from optimal!

Fringe-balanced trees (t > 1)

probability of given value in root:

P[P = 3]P[P = 3]

t = 0

0

q1 q2 q3 q4 q5 q6

1

P[P = 3]P[P = 3]

t = 2

0

q1 q2 q3 q4 q5 q6

1

prob. that i inserted first among i, . . . j ??

old approach does not work

Try to generalize this!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 10 / 16

Page 173: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Search Costs in Saturated Trees

Recall: α(~q) =

u∑v=1

qv · depthT(v) T from inserting i. i.d. D(~q)

distribution with prob. weights q1, . . . ,qu

elements until saturation

Warmup: Ordinary BSTs (t = 0)

Old result: Allen & Munro 1978:Self-Organizing Binary Search Trees

α(~q) = 2HQ(~q) + 1 with

HQ(~q) =∑

16i<j6u

qiqj

qi + · · ·+ qj

Proof sketch:Sum prob. that i is ancestor of j over all i, jancestor ⇐⇒ i first inserted key among i, . . . , j

In the same paper: HQ(~q) < Hln(~q)

α(~q) < 2 ln 2 ·Hld

base 2 entropy

(~q) + 1,

only factor 2 ln 2 ≈ 1.386 from optimal!

Fringe-balanced trees (t > 1)

probability of given value in root:

P[P = 3]P[P = 3]

t = 0

0

q1 q2 q3 q4 q5 q6

1

P[P = 3]P[P = 3]

t = 2

0

q1 q2 q3 q4 q5 q6

1

prob. that i inserted first among i, . . . j ??

old approach does not work

Try to generalize this!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 10 / 16

Page 174: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Search Costs in Saturated Trees

Recall: α(~q) =

u∑v=1

qv · depthT(v) T from inserting i. i.d. D(~q)

distribution with prob. weights q1, . . . ,qu

elements until saturation

Warmup: Ordinary BSTs (t = 0)

Old result: Allen & Munro 1978:Self-Organizing Binary Search Trees

α(~q) = 2HQ(~q) + 1 with

HQ(~q) =∑

16i<j6u

qiqj

qi + · · ·+ qj

Proof sketch:Sum prob. that i is ancestor of j over all i, jancestor ⇐⇒ i first inserted key among i, . . . , j

In the same paper: HQ(~q) < Hln(~q)

α(~q) < 2 ln 2 ·Hld

base 2 entropy

(~q) + 1,

only factor 2 ln 2 ≈ 1.386 from optimal!

Fringe-balanced trees (t > 1)

probability of given value in root:

P[P = 3]P[P = 3]

t = 0

0

q1 q2 q3 q4 q5 q6

1

P[P = 3]P[P = 3]

t = 2

0

q1 q2 q3 q4 q5 q6

1

prob. that i inserted first among i, . . . j ??

old approach does not work

Try to generalize this!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 10 / 16

Page 175: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Search Costs in Saturated Trees

Recall: α(~q) =

u∑v=1

qv · depthT(v) T from inserting i. i.d. D(~q)

distribution with prob. weights q1, . . . ,qu

elements until saturation

Warmup: Ordinary BSTs (t = 0)

Old result: Allen & Munro 1978:Self-Organizing Binary Search Trees

α(~q) = 2HQ(~q) + 1 with

HQ(~q) =∑

16i<j6u

qiqj

qi + · · ·+ qj

Proof sketch:Sum prob. that i is ancestor of j over all i, jancestor ⇐⇒ i first inserted key among i, . . . , j

In the same paper: HQ(~q) < Hln(~q)

α(~q) < 2 ln 2 ·Hld

base 2 entropy

(~q) + 1,

only factor 2 ln 2 ≈ 1.386 from optimal!

Fringe-balanced trees (t > 1)

probability of given value in root:

P[P = 3]P[P = 3]

t = 0

0

q1 q2 q3 q4 q5 q6

1

P[P = 3]P[P = 3]

t = 2

0

q1 q2 q3 q4 q5 q6

1

prob. that i inserted first among i, . . . j ??

old approach does not work

Try to generalize this!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 10 / 16

Page 176: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Search Costs in Saturated Trees

Recall: α(~q) =

u∑v=1

qv · depthT(v) T from inserting i. i.d. D(~q)

distribution with prob. weights q1, . . . ,qu

elements until saturation

Warmup: Ordinary BSTs (t = 0)

Old result: Allen & Munro 1978:Self-Organizing Binary Search Trees

α(~q) = 2HQ(~q) + 1 with

HQ(~q) =∑

16i<j6u

qiqj

qi + · · ·+ qj

Proof sketch:Sum prob. that i is ancestor of j over all i, jancestor ⇐⇒ i first inserted key among i, . . . , j

In the same paper: HQ(~q) < Hln(~q)

α(~q) < 2 ln 2 ·Hld

base 2 entropy

(~q) + 1,

only factor 2 ln 2 ≈ 1.386 from optimal!

Fringe-balanced trees (t > 1)

probability of given value in root:

P[P = 3]P[P = 3]

t = 0

0

q1 q2 q3 q4 q5 q6

1

P[P = 3]P[P = 3]

t = 2

0

q1 q2 q3 q4 q5 q6

1

prob. that i inserted first among i, . . . j ??

old approach does not work

Try to generalize this!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 10 / 16

Page 177: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Search Costs in Saturated Trees

Recall: α(~q) =

u∑v=1

qv · depthT(v) T from inserting i. i.d. D(~q)

distribution with prob. weights q1, . . . ,qu

elements until saturation

Warmup: Ordinary BSTs (t = 0)

Old result: Allen & Munro 1978:Self-Organizing Binary Search Trees

α(~q) = 2HQ(~q) + 1 with

HQ(~q) =∑

16i<j6u

qiqj

qi + · · ·+ qj

Proof sketch:Sum prob. that i is ancestor of j over all i, jancestor ⇐⇒ i first inserted key among i, . . . , j

In the same paper: HQ(~q) < Hln

base e Shannon entropy

(~q)

α(~q) < 2 ln 2 ·Hld

base 2 entropy

(~q) + 1,

only factor 2 ln 2 ≈ 1.386 from optimal!

Fringe-balanced trees (t > 1)

probability of given value in root:

P[P = 3]P[P = 3]

t = 0

0

q1 q2 q3 q4 q5 q6

1

P[P = 3]P[P = 3]

t = 2

0

q1 q2 q3 q4 q5 q6

1

prob. that i inserted first among i, . . . j ??

old approach does not work

Try to generalize this!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 10 / 16

Page 178: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Search Costs in Saturated Trees

Recall: α(~q) =

u∑v=1

qv · depthT(v) T from inserting i. i.d. D(~q)

distribution with prob. weights q1, . . . ,qu

elements until saturation

Warmup: Ordinary BSTs (t = 0)

Old result: Allen & Munro 1978:Self-Organizing Binary Search Trees

α(~q) = 2HQ(~q) + 1 with

HQ(~q) =∑

16i<j6u

qiqj

qi + · · ·+ qj

Proof sketch:Sum prob. that i is ancestor of j over all i, jancestor ⇐⇒ i first inserted key among i, . . . , j

In the same paper: HQ(~q) < Hln

base e Shannon entropy

(~q)

α(~q) < 2 ln 2 ·Hld

base 2 entropy

(~q) + 1,

only factor 2 ln 2 ≈ 1.386 from optimal!

Fringe-balanced trees (t > 1)

probability of given value in root:

P[P = 3]P[P = 3]

t = 0

0

q1 q2 q3 q4 q5 q6

1

P[P = 3]P[P = 3]

t = 2

0

q1 q2 q3 q4 q5 q6

1

prob. that i inserted first among i, . . . j ??

old approach does not work

Try to generalize this!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 10 / 16

Page 179: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Search Costs in Saturated Trees

Recall: α(~q) =

u∑v=1

qv · depthT(v) T from inserting i. i.d. D(~q)

distribution with prob. weights q1, . . . ,qu

elements until saturation

Warmup: Ordinary BSTs (t = 0)

Old result: Allen & Munro 1978:Self-Organizing Binary Search Trees

α(~q) = 2HQ(~q) + 1 with

HQ(~q) =∑

16i<j6u

qiqj

qi + · · ·+ qj

Proof sketch:Sum prob. that i is ancestor of j over all i, jancestor ⇐⇒ i first inserted key among i, . . . , j

In the same paper: HQ(~q) < Hln

base e Shannon entropy

(~q)

α(~q) < 2 ln 2 ·Hld

base 2 entropy

(~q) + 1,only factor 2 ln 2 ≈ 1.386 from optimal!

Fringe-balanced trees (t > 1)

probability of given value in root:

P[P = 3]P[P = 3]

t = 0

0

q1 q2 q3 q4 q5 q6

1

P[P = 3]P[P = 3]

t = 2

0

q1 q2 q3 q4 q5 q6

1

prob. that i inserted first among i, . . . j ??

old approach does not work

Try to generalize this!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 10 / 16

Page 180: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Search Costs in Saturated Trees

Recall: α(~q) =

u∑v=1

qv · depthT(v) T from inserting i. i.d. D(~q)

distribution with prob. weights q1, . . . ,qu

elements until saturation

Warmup: Ordinary BSTs (t = 0)

Old result: Allen & Munro 1978:Self-Organizing Binary Search Trees

α(~q) = 2HQ(~q) + 1 with

HQ(~q) =∑

16i<j6u

qiqj

qi + · · ·+ qj

Proof sketch:Sum prob. that i is ancestor of j over all i, jancestor ⇐⇒ i first inserted key among i, . . . , j

In the same paper: HQ(~q) < Hln

base e Shannon entropy

(~q)

α(~q) < 2 ln 2 ·Hld

base 2 entropy

(~q) + 1,only factor 2 ln 2 ≈ 1.386 from optimal!

Fringe-balanced trees (t > 1)

probability of given value in root:

P[P = 3]P[P = 3]

t = 0

0

q1 q2 q3 q4 q5 q6

1

P[P = 3]P[P = 3]

t = 2

0

q1 q2 q3 q4 q5 q6

1

prob. that i inserted first among i, . . . j ??

old approach does not work

Try to generalize this!

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 10 / 16

Page 181: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 ·

0

+ 12 ·

H(23 ,13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 182: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 ·

0

+ 12 ·

H(23 ,13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 183: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 ·

0

+ 12 ·

H(23 ,13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 184: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 ·

0

+ 12 ·

H(23 ,13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 185: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16)

=

12

12

23

13

H(12 ,12) + 1

2 ·

0

+ 12 ·

H(23 ,13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 186: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 ·

0

+ 12 ·

H(23 ,13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 187: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 · 0 + 12 ·H(23 ,

13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 188: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 · 0 + 12 ·H(23 ,

13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)

Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 189: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 · 0 + 12 ·H(23 ,

13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)

Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 190: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 · 0 + 12 ·H(23 ,

13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)

Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 191: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 · 0 + 12 ·H(23 ,

13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

Z1 Z2

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)

Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 192: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 · 0 + 12 ·H(23 ,

13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

Z1 Z2

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 193: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 · 0 + 12 ·H(23 ,

13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

Z1 Z2

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

same shape!

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 194: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 · 0 + 12 ·H(23 ,

13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

Z1 Z2

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

same shape!

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 195: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 · 0 + 12 ·H(23 ,

13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

Z1 Z2

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

same shape!

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 196: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 · 0 + 12 ·H(23 ,

13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

Z1 Z2

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

same shape!

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 197: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 · 0 + 12 ·H(23 ,

13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

Z1 Z2

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

same shape!

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 198: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Aggregation of Entropy

One of the defining propertiesof Shannon entropy: aggregation

12 1

3

16

H(12 ,13 ,16) =

12

12

23

13

H(12 ,12) + 1

2 · 0 + 12 ·H(23 ,

13)

P

T1 T2

HV1 V2

q1 q2 q3 q4 q5 q6

<P =P >P

Z1 Z2

First partitioning step / Root of BST: Split into <P , =P , >P

H(~q) = H(V1, H, V2) +

2∑j=1

Vj ·H(Zj)Z1 =

(q1V1, . . . ,

qP−1

V1

)Z2 =

(qP+1

V2, . . . , qu

V2

)Recurrence for search costs: α(~q) = 1 +

2∑j=1

Vj · α(Zj)

same shape!

H(V1, H, V2) vs. 1?

Technical Issues1 Pivot P is random take expectations over P (and thus V1,2, Z1,2).2 E

[Hln(V1, H, V2)

]≈ E

[Hln(D, 1−D)

]= Hk+1 −Ht+1 whereD D= Beta(t+ 1, t+ 1)

but not an inequality in either direction

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 11 / 16

Page 199: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Entropy Bounds for Search Costs

α(~q) = c ·H(~q) does not seem to hold for any constant c

But we can show

α(~q) 6 c ·H(~q) + d

α~q > c ′ ·H(~q) − d ′

for family of constants (c, d) and (c ′, d ′).

Always have c ′ < αk < c where

αk =ln 2

Hk+1 −Ht+1

1 1.25 1.5 1.75−20

−10

0

10

20

upper bound

lower bound

t = 1

rel. difference of c and αk

±ln(d)

α(~q) 6 1.5 · α3H(~q) + 1777

α(~q) > 11.5· α3H(~q) − 1334

Asymptotically matching values for c and c ′

α(~q) = αk ·Hld(~q) ± O(H(~q)

t+2t+3 log

(H(~q)

))

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 12 / 16

Page 200: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Entropy Bounds for Search Costs

α(~q) = c ·H(~q) does not seem to hold for any constant c

But we can show

α(~q) 6 c ·H(~q) + d

α~q > c ′ ·H(~q) − d ′

for family of constants (c, d) and (c ′, d ′).

Always have c ′ < αk < c where

αk =ln 2

Hk+1 −Ht+1

1 1.25 1.5 1.75−20

−10

0

10

20

upper bound

lower bound

t = 1

rel. difference of c and αk

±ln(d)

α(~q) 6 1.5 · α3H(~q) + 1777

α(~q) > 11.5· α3H(~q) − 1334

Asymptotically matching values for c and c ′

α(~q) = αk ·Hld(~q) ± O(H(~q)

t+2t+3 log

(H(~q)

))

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 12 / 16

Page 201: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Entropy Bounds for Search Costs

α(~q) = c ·H(~q) does not seem to hold for any constant c

But we can show

α(~q) 6 c ·H(~q) + d

α~q > c ′ ·H(~q) − d ′

for family of constants (c, d) and (c ′, d ′).

Always have c ′ < αk < c where

αk =ln 2

Hk+1 −Ht+11 1.25 1.5 1.75

−20

−10

0

10

20

upper bound

lower bound

t = 1

rel. difference of c and αk

±ln(d)

α(~q) 6 1.5 · α3H(~q) + 1777

α(~q) > 11.5· α3H(~q) − 1334

Asymptotically matching values for c and c ′

α(~q) = αk ·Hld(~q) ± O(H(~q)

t+2t+3 log

(H(~q)

))

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 12 / 16

Page 202: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Entropy Bounds for Search Costs

α(~q) = c ·H(~q) does not seem to hold for any constant c

But we can show

α(~q) 6 c ·H(~q) + d

α~q > c ′ ·H(~q) − d ′

for family of constants (c, d) and (c ′, d ′).

Always have c ′ < αk < c where

αk =ln 2

Hk+1 −Ht+11 1.25 1.5 1.75

−20

−10

0

10

20

upper bound

lower bound

t = 1

rel. difference of c and αk

±ln(d)

α(~q) 6 1.5 · α3H(~q) + 1777

α(~q) > 11.5· α3H(~q) − 1334

Asymptotically matching values for c and c ′

α(~q) = αk ·Hld(~q) ± O(H(~q)

t+2t+3 log

(H(~q)

))

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 12 / 16

Page 203: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Entropy Bounds for Search Costs

α(~q) = c ·H(~q) does not seem to hold for any constant c

But we can show

α(~q) 6 c ·H(~q) + d

α~q > c ′ ·H(~q) − d ′

for family of constants (c, d) and (c ′, d ′).

Always have c ′ < αk < c where

αk =ln 2

Hk+1 −Ht+11 1.25 1.5 1.75

−20

−10

0

10

20

upper bound

lower bound

t = 1

rel. difference of c and αk

±ln(d)

α(~q) 6 1.5 · α3H(~q) + 1777

α(~q) > 11.5· α3H(~q) − 1334

Asymptotically matching values for c and c ′

α(~q) = αk ·Hld(~q) ± O(H(~q)

t+2t+3 log

(H(~q)

))

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 12 / 16

Page 204: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Results in the i. i.d. Model

Time to put the pieces together!

Separation Theorem:Quicksort costs

in the i. i.d. modelwith “many duplicates”(Ω(nε) duplicates of each value in expectation)

are given by (as n→∞, for any δ ∈ (0, ε))

E[Cn,~q] = α(~q) · n ± O(n1−δ)

Average search costsin saturated k-fringe-balanced trees:

(as H→∞)

α(~q) = αk ·H ± O(H

t+2t+3 logH

)H = Hld(~q)

αk =ln 2

Hk+1 −Ht+1

Quicksort Costs (i. i.d. model)

Under the assumptions above, we have for any δ ∈(0, 1t+3

)E[Cn,~q] = αkHld(~q) · n ± O

((H(~q)1−δ + 1)n

).

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 13 / 16

Page 205: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Results in the i. i.d. Model

Time to put the pieces together!

Separation Theorem:Quicksort costs

in the i. i.d. modelwith “many duplicates”(Ω(nε) duplicates of each value in expectation)

are given by (as n→∞, for any δ ∈ (0, ε))

E[Cn,~q] = α(~q) · n ± O(n1−δ)

Average search costsin saturated k-fringe-balanced trees:

(as H→∞)

α(~q) = αk ·H ± O(H

t+2t+3 logH

)H = Hld(~q)

αk =ln 2

Hk+1 −Ht+1

Quicksort Costs (i. i.d. model)

Under the assumptions above, we have for any δ ∈(0, 1t+3

)E[Cn,~q] = αkHld(~q) · n ± O

((H(~q)1−δ + 1)n

).

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 13 / 16

Page 206: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Results in the i. i.d. Model

Time to put the pieces together!

Separation Theorem:Quicksort costs

in the i. i.d. modelwith “many duplicates”(Ω(nε) duplicates of each value in expectation)

are given by (as n→∞, for any δ ∈ (0, ε))

E[Cn,~q] = α(~q) · n ± O(n1−δ)

Average search costsin saturated k-fringe-balanced trees:

(as H→∞)

α(~q) = αk ·H ± O(H

t+2t+3 logH

)H = Hld(~q)

αk =ln 2

Hk+1 −Ht+1

Quicksort Costs (i. i.d. model)

Under the assumptions above, we have for any δ ∈(0, 1t+3

)E[Cn,~q] = αkHld(~q) · n ± O

((H(~q)1−δ + 1)n

).

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 13 / 16

Page 207: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Results in the i. i.d. Model

Time to put the pieces together!

Separation Theorem:Quicksort costs

in the i. i.d. modelwith “many duplicates”(Ω(nε) duplicates of each value in expectation)

are given by (as n→∞, for any δ ∈ (0, ε))

E[Cn,~q] = α(~q) · n ± O(n1−δ)

Average search costsin saturated k-fringe-balanced trees:

(as H→∞)

α(~q) = αk ·H ± O(H

t+2t+3 logH

)H = Hld(~q)

αk =ln 2

Hk+1 −Ht+1

Quicksort Costs (i. i.d. model)

Under the assumptions above, we have for any δ ∈(0, 1t+3

)E[Cn,~q] = αkHld(~q) · n ± O

((H(~q)1−δ + 1)n

).

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 13 / 16

Page 208: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Outline

0 Intro0 Intro

1 Quicksort and Search Trees1 Quicksort and Search Trees

2 Saturated Fringe-Balanced Trees2 Saturated Fringe-Balanced Trees

3 Back to Multiset Permutations3 Back to Multiset Permutations

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 13 / 16

Page 209: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Back to the Multiset Model

How about the multiset model?

Many duplicates profile ~X concentrated around E[~X] = ~qn

1 Replace multiset model with profile~x by i. i.d. model with ~q =~x/n

2 Use Chernoff bounds to bound difference between costs. Need Chernoff bound for multinomial variables.

Same result holds for multiset modelwith “many duplicates”, i.e., xv = Ω(nε).

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 14 / 16

Page 210: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Back to the Multiset Model

How about the multiset model?

Many duplicates profile ~X concentrated around E[~X] = ~qn

1 Replace multiset model with profile~x by i. i.d. model with ~q =~x/n

2 Use Chernoff bounds to bound difference between costs. Need Chernoff bound for multinomial variables.

Same result holds for multiset modelwith “many duplicates”, i.e., xv = Ω(nε).

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 14 / 16

Page 211: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Back to the Multiset Model

How about the multiset model?

Many duplicates profile ~X concentrated around E[~X] = ~qn

1 Replace multiset model with profile~x by i. i.d. model with ~q =~x/n

2 Use Chernoff bounds to bound difference between costs. Need Chernoff bound for multinomial variables.

Same result holds for multiset modelwith “many duplicates”, i.e., xv = Ω(nε).

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 14 / 16

Page 212: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Back to the Multiset Model

How about the multiset model?

Many duplicates profile ~X concentrated around E[~X] = ~qn

1 Replace multiset model with profile~x by i. i.d. model with ~q =~x/n

2 Use Chernoff bounds to bound difference between costs. Need Chernoff bound for multinomial variables.

Same result holds for multiset modelwith “many duplicates”, i.e., xv = Ω(nε).

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 14 / 16

Page 213: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Back to the Multiset Model

How about the multiset model?

Many duplicates profile ~X concentrated around E[~X] = ~qn

1 Replace multiset model with profile~x by i. i.d. model with ~q =~x/n

2 Use Chernoff bounds to bound difference between costs. Need Chernoff bound for multinomial variables.

Same result holds for multiset modelwith “many duplicates”, i.e., xv = Ω(nε).

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 14 / 16

Page 214: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Back to the Multiset Model

How about the multiset model?

Many duplicates profile ~X concentrated around E[~X] = ~qn

1 Replace multiset model with profile~x by i. i.d. model with ~q =~x/n

2 Use Chernoff bounds to bound difference between costs. Need Chernoff bound for multinomial variables.

Same result holds for multiset modelwith “many duplicates”, i.e., xv = Ω(nε).

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 14 / 16

Page 215: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Back to the Multiset Model

How about the multiset model?

Many duplicates profile ~X concentrated around E[~X] = ~qn

1 Replace multiset model with profile~x by i. i.d. model with ~q =~x/n

2 Use Chernoff bounds to bound difference between costs. Need Chernoff bound for multinomial variables.

Same result holds for multiset modelwith “many duplicates”, i.e., xv = Ω(nε).

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 14 / 16

Page 216: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Back to the Multiset Model

How about the multiset model?

Many duplicates profile ~X concentrated around E[~X] = ~qn

1 Replace multiset model with profile~x by i. i.d. model with ~q =~x/n

2 Use Chernoff bounds to bound difference between costs. Need Chernoff bound for multinomial variables.

Same result holds for multiset modelwith “many duplicates”, i.e., xv = Ω(nε).

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 14 / 16

Page 217: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Conclusion

FindingsFirst analysis of median-of-k Quicksort on equal keys . . . for “many duplicates”. Same relative speedup as for random permutations.

Partial Answer to conjecture of Sedgewick & Bentley:Median-of-k Quicksort approaches lower bound for k→∞.

Not in this talk: For uniform ~q = ( 1u , . . . ,1u) with u = O(n1−ε)

better error boundsextension for multiway partitioning

Open Problems

Get rid of “many duplicates” restriction; n1−ε seems (to me) best possible so thatinputs are non-degenerate w.h.p.tree-building costs are still negligibledifference between i. i.d. model and multiset model is negligiblethe entropy is a lower bound

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 15 / 16

Page 218: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Conclusion

FindingsFirst analysis of median-of-k Quicksort on equal keys . . . for “many duplicates”. Same relative speedup as for random permutations.

Partial Answer to conjecture of Sedgewick & Bentley:Median-of-k Quicksort approaches lower bound for k→∞.

Not in this talk: For uniform ~q = ( 1u , . . . ,1u) with u = O(n1−ε)

better error boundsextension for multiway partitioning

Open Problems

Get rid of “many duplicates” restriction; n1−ε seems (to me) best possible so thatinputs are non-degenerate w.h.p.tree-building costs are still negligibledifference between i. i.d. model and multiset model is negligiblethe entropy is a lower bound

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 15 / 16

Page 219: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Conclusion

FindingsFirst analysis of median-of-k Quicksort on equal keys . . . for “many duplicates”. Same relative speedup as for random permutations.

Partial Answer to conjecture of Sedgewick & Bentley:Median-of-k Quicksort approaches lower bound for k→∞.

Not in this talk: For uniform ~q = ( 1u , . . . ,1u) with u = O(n1−ε)

better error boundsextension for multiway partitioning

Open Problems

Get rid of “many duplicates” restriction; n1−ε seems (to me) best possible so thatinputs are non-degenerate w.h.p.tree-building costs are still negligibledifference between i. i.d. model and multiset model is negligiblethe entropy is a lower bound

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 15 / 16

Page 220: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Conclusion

FindingsFirst analysis of median-of-k Quicksort on equal keys . . . for “many duplicates”. Same relative speedup as for random permutations.

Partial Answer to conjecture of Sedgewick & Bentley:Median-of-k Quicksort approaches lower bound for k→∞.

Not in this talk: For uniform ~q = ( 1u , . . . ,1u) with u = O(n1−ε)

better error boundsextension for multiway partitioning

Open Problems

Get rid of “many duplicates” restriction; n1−ε seems (to me) best possible so thatinputs are non-degenerate w.h.p.tree-building costs are still negligibledifference between i. i.d. model and multiset model is negligiblethe entropy is a lower bound

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 15 / 16

Page 221: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Conclusion

FindingsFirst analysis of median-of-k Quicksort on equal keys . . . for “many duplicates”. Same relative speedup as for random permutations.

Partial Answer to conjecture of Sedgewick & Bentley:Median-of-k Quicksort approaches lower bound for k→∞.

Not in this talk: For uniform ~q = ( 1u , . . . ,1u) with u = O(n1−ε)

better error boundsextension for multiway partitioning

Open Problems

Get rid of “many duplicates” restriction; n1−ε seems (to me) best possible so thatinputs are non-degenerate w.h.p.tree-building costs are still negligibledifference between i. i.d. model and multiset model is negligiblethe entropy is a lower bound

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 15 / 16

Page 222: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Conclusion

FindingsFirst analysis of median-of-k Quicksort on equal keys . . . for “many duplicates”. Same relative speedup as for random permutations.

Partial Answer to conjecture of Sedgewick & Bentley:Median-of-k Quicksort approaches lower bound for k→∞.

Not in this talk: For uniform ~q = ( 1u , . . . ,1u) with u = O(n1−ε)

better error boundsextension for multiway partitioning

Open Problems

Get rid of “many duplicates” restriction; n1−ε seems (to me) best possible so thatinputs are non-degenerate w.h.p.tree-building costs are still negligibledifference between i. i.d. model and multiset model is negligiblethe entropy is a lower bound

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 15 / 16

Page 223: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Conclusion

FindingsFirst analysis of median-of-k Quicksort on equal keys . . . for “many duplicates”. Same relative speedup as for random permutations.

Partial Answer to conjecture of Sedgewick & Bentley:Median-of-k Quicksort approaches lower bound for k→∞.

Not in this talk: For uniform ~q = ( 1u , . . . ,1u) with u = O(n1−ε)

better error boundsextension for multiway partitioning

Open Problems

Get rid of “many duplicates” restriction; n1−ε seems (to me) best possible so thatinputs are non-degenerate w.h.p.tree-building costs are still negligibledifference between i. i.d. model and multiset model is negligiblethe entropy is a lower bound

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 15 / 16

Page 224: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Conclusion

FindingsFirst analysis of median-of-k Quicksort on equal keys . . . for “many duplicates”. Same relative speedup as for random permutations.

Partial Answer to conjecture of Sedgewick & Bentley:Median-of-k Quicksort approaches lower bound for k→∞.

Not in this talk: For uniform ~q = ( 1u , . . . ,1u) with u = O(n1−ε)

better error boundsextension for multiway partitioning

Open Problems

Get rid of “many duplicates” restriction; n1−ε seems (to me) best possible so thatinputs are non-degenerate w.h.p.tree-building costs are still negligibledifference between i. i.d. model and multiset model is negligiblethe entropy is a lower bound

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 15 / 16

Page 225: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Conclusion

FindingsFirst analysis of median-of-k Quicksort on equal keys . . . for “many duplicates”. Same relative speedup as for random permutations.

Partial Answer to conjecture of Sedgewick & Bentley:Median-of-k Quicksort approaches lower bound for k→∞.

Not in this talk: For uniform ~q = ( 1u , . . . ,1u) with u = O(n1−ε)

better error boundsextension for multiway partitioning

Open Problems

Get rid of “many duplicates” restriction; n1−ε seems (to me) best possible so thatinputs are non-degenerate w.h.p.tree-building costs are still negligibledifference between i. i.d. model and multiset model is negligiblethe entropy is a lower bound

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 15 / 16

Page 226: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Conclusion

FindingsFirst analysis of median-of-k Quicksort on equal keys . . . for “many duplicates”. Same relative speedup as for random permutations.

Partial Answer to conjecture of Sedgewick & Bentley:Median-of-k Quicksort approaches lower bound for k→∞.

Not in this talk: For uniform ~q = ( 1u , . . . ,1u) with u = O(n1−ε)

better error boundsextension for multiway partitioning

Open Problems

Get rid of “many duplicates” restriction; n1−ε seems (to me) best possible so thatinputs are non-degenerate w.h.p.tree-building costs are still negligibledifference between i. i.d. model and multiset model is negligiblethe entropy is a lower bound

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 15 / 16

Page 227: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Conclusion

FindingsFirst analysis of median-of-k Quicksort on equal keys . . . for “many duplicates”. Same relative speedup as for random permutations.

Partial Answer to conjecture of Sedgewick & Bentley:Median-of-k Quicksort approaches lower bound for k→∞.

Not in this talk: For uniform ~q = ( 1u , . . . ,1u) with u = O(n1−ε)

better error boundsextension for multiway partitioning

Open Problems

Get rid of “many duplicates” restriction; n1−ε seems (to me) best possible so thatinputs are non-degenerate w.h.p.tree-building costs are still negligibledifference between i. i.d. model and multiset model is negligiblethe entropy is a lower bound

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 15 / 16

Page 228: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 16 / 16

Page 229: Median-of-k Quicksort Is Optimal For Many Equal Keysaofa2017.cs.princeton.edu/talks/contributed/aofa2017-talk-wild.pdf · Median-of-k Quicksort Is Optimal For Many Equal Keys Sebastian

Icons made by Freepik and Gregor Cresnar from www.flaticon.com.

Sebastian Wild Quicksort Is Optimal For Many Equal Keys 2017-06-19 17 / 16