403: Algorithms and Data Structures Quicksortpetko/classes/FALL16-CSI403-slides/08-Quicksort.pdfAnalyzing Quicksort: Average Case • IntuiDvely, a real-life run of quicksort will

403:AlgorithmsandDataStructures

QuicksortFall2016UAlbany

ComputerScienceSomeslidesborrowedfromDavidLuebke

Sofar:SorDng

•  Inser6on O(n2) in-place•  Merge O(nlogn) 2ndarraytomerge •  Heapsort O(nlogn) in-place•  Quicksort fromO(nlogn)toO(n2)in-place–  verygoodinpracDce(smallconstants)– QuadraDcDmeisrare

Next

Algorithm Time Space

Quicksort

•  Anotherdivide-and-conqueralgorithm– DIVIDE:ThearrayA[p..r]ispar11onedintotwonon-emptysubarraysA[p..q]andA[q+1..r]•  Invariant:AllelementsinA[p..q]arelessthanallelementsinA[q+1..r]

– CONQUER:Thesubarraysarerecursivelysortedbycallstoquicksort

– COMBINE:Unlikemergesort,nocombiningstep:twosubarraysformanalready-sortedarray

QuicksortCode

Quicksort(A, p, r) { if (p < r) { q = Partition(A, p, r); Quicksort(A, p, q); Quicksort(A, q+1, r); } }

ParDDon

•  Clearly,alltheacDontakesplaceinthepartition()funcDon– Rearrangesthesubarrayinplace– Endresult:

•  Twosubarrays•  Allvaluesinfirstsubarray≤allvaluesinsecond

– Returnstheindexofthe“pivot”elementseparaDngthetwosubarrays

•  Howdoyousupposeweimplementthis?

ParDDonInWords

•  ParDDon(A,p,r):–  Selectanelementtoactasthe“pivot”(which?)– Growtworegions,A[p..i]andA[j..r]

•  AllelementsinA[p..i]<=pivot•  AllelementsinA[j..r]>=pivot

–  IncrementiunDlA[i]>=pivot– DecrementjunDlA[j]<=pivot–  SwapA[i]andA[j]–  RepeatunDli>=j–  Returnj Note: slightly different from

book’s partition()

ParDDonCodePartition(A, p, r) x = A[p]; i = p - 1; j = r + 1; while (TRUE) repeat j--; until A[j] <= x; repeat i++; until A[i] >= x; if (i < j) Swap(A, i, j); else return j;

Illustrate on A = {4,5,9,7,2,13,6,3};

i j

Choosepivotx

Scanlookingforelementexceeding

x

Scanlookingforelementatmost

x

Whenwefindsuchelements,Exchangethem

Pivot=4Goal:

4 5 9 7 2 13 6 3

i=0 j=9

3 5 9 7 2 13 6 4

i=0 j=9i=2 j=5

3 2 9 7 5 13 6 4

i=3 j=5i=2j=2

i>j:DONE

<=x>=x

Example

AssumeallelementsaredisDnct

ParDDonCodePartition(A, p, r) x = A[p]; i = p - 1; j = r + 1; while (TRUE) repeat j--; until A[j] <= x; repeat i++; until A[i] >= x; if (i < j) Swap(A, i, j); else return j;

partition() runs in O(n) time •  O(1) at each element: skip or

swap •  Linear in the size of the array

What is the running time of partition()?

BacktoQuicksortQuicksort(A, p, r) if (p < r) q = Partition(A, p, r);

Quicksort(A, p, q); Quicksort(A, q+1, r);

3 9 5 7

Qsort(A,1,4)

A

Part(A,1,4)Returns:1

3 9 5 7

Qsort(A,1,1) Qsort(A,2,4)


3 7 5 9

Qsort(A,2,3)


3 5 7 9

Qsort(A,2,2) Qsort(A,3,3)

Qsort(A,4,4)

AnalyzingQuicksort

•  Whatwillbeabadcaseforthealgorithm?– ParDDonisalwaysunbalanced

•  Whatwillbethebestcaseforthealgorithm?– ParDDonisperfectlybalanced

•  Whichismorelikely?– Thelader,byfar,except...

•  Willanypar1cularinputelicittheworstcase?– Yes:Already-sortedinput

AnalyzingQuicksort:Balancedsplits

•  Inthebalancedsplitcase:T(n)=2T(n/2)+Θ(n)

•  Whatdoesthisworkoutto?T(n)=Θ(nlgn)

Takehome:Agoodbalanceisimportant

AnalyzingQuicksort:Sortedcase

•  Sortedcase:T(1)=Θ(1)T(n)=T(n-1)+Θ(n)bysubsDtuDon…

T(n)=T(1)+nΘ(n)

•  WorksouttoT(n)=Θ(n2)

2 3 6 7 10 13 14 16

Firstcall:jwilldecreaseto1(nsteps)Second:jdecreaseto2(n-1steps)…n+n-1+n-2+…=Θ(n2)

Issortedreallytheworstcase?

•  Argueformallythatthingscannotgetworse•  Aformalargumentwithgeneralsplit•  Assumethateverysplitresultsintwoarrays–  Sizeq–  Sizen-q

•  T(n)=max1<=q<=n-1[T(q)+T(n-q)]+O(n)– whereT(1)=O(1)

•  ShowthatT(n)=O(n2)ITCANNOTGET

WORSE

Averagebehavior:IntuiDon

•  Worstcase:assumes1:n-1split–  rareinpracDce

•  TheO(nlogn)behavioroccursevenifthesplitissay10%:90%

•  Ifallsplitsareequallylikely– 1:n-1,2:n-2…n-1:1–  thenonaverage,wewillnotgetaverytalltree– detailsinextraslideattheend(notrequired)

AvoidingtheO(n2)case

•  TherealliabilityofquicksortisthatitrunsinO(n2)onalready-sortedinput

•  SoluDons– Randomizetheinputarray– Pickarandompivotelement– choose3elementsandtakemedianforpivot

•  Howwillthesesolvetheproblem?– ByensuringthatnoparDcularinputcanbechosentomakequicksortruninO(n2)Dme

OtherImprovements(lowerconstants)

•  Whenasubarrayissmall(saysmallerthan5)switchtoasimplesorDngproceduresayinserDonsortinsteadofQuicksort– whydoesthishelp?

•  Pickmorethanonepivot–  ParDDonsthearrayinmorethan2parts–  Smallernumberofcomparisons(1.9nlognvs2nlogn)andoverallbederperformanceinpracDce

– Details:Kushagraetal.“MulD-PivotQuicksort:TheoryandExperiments”,SIAM,2013

Announcements

•  ReadthroughChapter7•  HW2dueonWednesday

Extraslides*

•  Averagecaserigorousanalysisfollows•  Thisisadvancedmaterial(willnotappearinHWsandexam)

AnalyzingQuicksort:AverageCase

•  Assumingrandominput,average-caserunningDmeismuchclosertoO(nlgn)thanO(n2)

•  First,amoreintuiDveexplanaDon/example:– SupposethatparDDon()alwaysproducesa9-to-1split.Thislooksquiteunbalanced!

– Therecurrenceisthus:T(n)=T(9n/10)+T(n/10)+n–  Howdeepwilltherecursiongo?

Use n instead of O(n) for convenience (how?)


•  IntuiDvely,areal-liferunofquicksortwillproduceamixof“bad”and“good”splits– Randomlydistributedamongtherecursiontree– PretendforintuiDonthattheyalternatebetweenbest-case(n/2:n/2)andworst-case(n-1:1)

– Whathappensifwebad-splitrootnode,thengood-splittheresul1ngsize(n-1)node?


•  IntuiDvely,areal-liferunofquicksortwillproduceamixof“bad”and“good”splits– Randomlydistributedamongtherecursiontree– PretendforintuiDonthattheyalternatebetweenbest-case(n/2:n/2)andworst-case(n-1:1)

– Whathappensifwebad-splitrootnode,thengood-splittheresul1ngsize(n-1)node?• WefailEnglish

AnalyzingQuicksort:AverageCase•  IntuiDvely,areal-liferunofquicksortwillproduceamixof“bad”and“good”splits–  Randomlydistributedamongtherecursiontree–  PretendforintuiDonthattheyalternatebetweenbest-case(n/2:n/2)andworst-case(n-1:1)

– Whathappensifwebad-splitrootnode,thengood-splittheresul1ngsize(n-1)node?•  Weendupwiththreesubarrays,size1,(n-1)/2,(n-1)/2•  Combinedcostofsplits=n+n-1=2n-1=O(n)•  Noworsethanifwehadgood-splittherootnode!


•  IntuiDvely,theO(n)costofabadsplit(or2or3badsplits)canbeabsorbedintotheO(n)costofeachgoodsplit

•  ThusrunningDmeofalternaDngbadandgoodsplitsissDllO(nlgn),withslightlyhigherconstants

•  Howcanwebemorerigorous?


•  Forsimplicity,assume:– AllinputsdisDnct(norepeats)– Slightlydifferentpartition() procedure

•  parDDonaroundarandomelement,whichisnotincludedinsubarrays•  allsplits(0:n-1,1:n-2,2:n-3,…,n-1:0)equallylikely

•  Whatistheprobabilityofapar1cularsplithappening?

•  Answer:1/n

AnalyzingQuicksort:AverageCase•  SoparDDongeneratessplits(0:n-1,1:n-2,2:n-3,…,n-2:1,n-1:0)eachwithprobability1/n

•  IfT(n)istheexpectedrunningDme,

•  Whatiseachtermunderthesumma1onfor?•  WhatistheΘ(n)termfor?

( ) ( ) ( )[ ] ( )∑−

=

Θ+−−+=1

0

11 n

knknTkT

nnT


•  So…

– Note:thisisjustlikethebook’srecurrence(p166),exceptthatthesummaDonstartswithk=0

– We’lltakecareofthatinasecond

( ) ( ) ( )[ ] ( )

( ) ( )∑

∑

−

=

−

=

Θ+=

Θ+−−+=

1

0

1

0

2

11

n

k

n

k

nkTn

nknTkTn

nT

Write it on the board


•  WecansolvethisrecurrenceusingthedreadedsubsDtuDonmethod– Guesstheanswer– AssumethattheinducDvehypothesisholds– SubsDtuteitinforsomevalue<n– Provethatitfollowsforn


•  WecansolvethisrecurrenceusingthedreadedsubsDtuDonmethod– Guesstheanswer

• What’stheanswer?

– AssumethattheinducDvehypothesisholds– SubsDtuteitinforsomevalue<n– Provethatitfollowsforn



•  T(n)=O(nlgn)– AssumethattheinducDvehypothesisholds– SubsDtuteitinforsomevalue<n– Provethatitfollowsforn



•  T(n)=O(nlgn)– AssumethattheinducDvehypothesisholds

• What’stheinduc1vehypothesis?

– SubsDtuteitinforsomevalue<n– Provethatitfollowsforn




•  T(n)≤anlgn+bforsomeconstantsaandb

– SubsDtuteitinforsomevalue<n– Provethatitfollowsforn





– SubsDtuteitinforsomevalue<n• Whatvalue?

– Provethatitfollowsforn





– SubsDtuteitinforsomevalue<n•  Thevaluekintherecurrence

– Provethatitfollowsforn




•  T(n)≤anlgn+bforsomeconstantsaandb– SubsDtuteitinforsomevalue<n

•  Thevaluekintherecurrence– Provethatitfollowsforn

•  Grindthroughit…

Note: leaving the same recurrence as the book

What are we doing here?

AnalyzingQuicksort:AverageCase( ) ( ) ( )

( ) ( )

( ) ( )

( ) ( )

( ) ( )∑

∑

∑

∑

∑

−

=

−

=

−

=

−

=

−

=

Θ++=

Θ+++=

Θ+⎥⎦

⎤⎢⎣

⎡++≤

Θ++≤

Θ+=

1

1

1

1

1

1

1

0

1

0

lg2

2lg2

lg2

lg2

2

n

k

n

k

n

k

n

k

n

k

nbkakn

nnbbkak

n

nbkakbn

nbkakn

nkTn

nT The recurrence to be solved



Plug in inductive hypothesis

Expand out the k=0 case

2b/n is just a constant, so fold it into Θ(n)



Evaluate the summation: b+b+…+b = b (n-1)

The recurrence to be solved

Since n-1<n, 2b(n-1)/n < 2b


( ) ( ) ( )

( )

( )

( )nbkkna

nnnbkk

na

nbn

kakn

nbkakn

nT

n

k

n

k

n

k

n

k

n

k

Θ++≤

Θ+−+=

Θ++=

Θ++=

∑

∑

∑∑

∑

−

=

−

=

−

=

−

=

−

=

2lg2

)1(2lg2

2lg2

lg2

1

1

1

1

1

1

1

1

1

1

What are we doing here? Distribute the summation

This summation gets its own set of slides later

How did we do this? Pick a large enough that an/4 dominates Θ(n)+b

What are we doing here? Remember, our goal is to get T(n) ≤ an lg n + b

What the hell? We’ll prove this later

What are we doing here? Distribute the (2a/n) term

The recurrence to be solved


( ) ( )

( )

( )

( )

bnan

nabnbnan

nbnanan

nbnnnna

nbkknanTn

k

+≤

⎟⎠

⎞⎜⎝

⎛ −+Θ++=

Θ++−=

Θ++⎟⎠

⎞⎜⎝

⎛ −≤

Θ++≤ ∑−

=

lg4

lg

24

lg

281lg

212

2lg2

22

1

1


•  SoT(n)≤anlgn+bforcertainaandb– ThustheinducDonholds– ThusT(n)=O(nlgn)– ThusquicksortrunsinO(nlgn)Dmeonaverage(phew!)

•  Ohyeah,thesummaDon…

What are we doing here? The lg k in the second term is bounded by lg n

TightlyBoundingTheKeySummaDon

⎡ ⎤

⎡ ⎤

⎡ ⎤

⎡ ⎤

⎡ ⎤

⎡ ⎤∑∑

∑∑

∑∑∑

−

=

−

=

−

=

−

=

−

=

−

=

−

=

+=

+≤

+=

1

2

12

1

1

2

12

1

1

2

12

1

1

1

lglg

lglg

lglglg

n

nk

n

k

n

nk

n

k

n

nk

n

k

n

k

knkk

nkkk

kkkkkk

What are we doing here? Move the lg n outside the summation

What are we doing here? Split the summation for a tighter bound

The summation bound so far


⎡ ⎤

⎡ ⎤

( )⎡ ⎤

⎡ ⎤

( )⎡ ⎤

⎡ ⎤

( )⎡ ⎤

⎡ ⎤∑∑

∑∑

∑∑

∑∑∑

−

=

−

=

−

=

−

=

−

=

−

=

−

=

−

=

−

=

+−=

+−=

+≤

+≤

1

2

12

1

1

2

12

1

1

2

12

1

1

2

12

1

1

1

lg1lg

lg1lg

lg2lg

lglglg

n

nk

n

k

n

nk

n

k

n

nk

n

k

n

nk

n

k

n

k

knkn

knnk

knnk

knkkkk

What are we doing here? The lg k in the first term is bounded by lg n/2

What are we doing here? lg n/2 = lg n - 1

What are we doing here? Move (lg n - 1) outside the summation



( )⎡ ⎤

⎡ ⎤

⎡ ⎤ ⎡ ⎤

⎡ ⎤

⎡ ⎤

( ) ⎡ ⎤

∑

∑∑

∑∑∑

∑∑∑

−

=

−

=

−

=

−

=

−

=

−

=

−

=

−

=

−

=

−⎟⎠

⎞⎜⎝

⎛ −=

−=

+−=

+−≤

12

1

12

1

1

1

1

2

12

1

12

1

1

2

12

1

1

1

2)(1lg

lg

lglg

lg1lglg

n

k

n

k

n

k

n

nk

n

k

n

k

n

nk

n

k

n

k

knnn

kkn

knkkn

knknkk

What are we doing here? Distribute the (lg n - 1)

What are we doing here? The summations overlap in range; combine them

What are we doing here? The Guassian series



( ) ⎡ ⎤

( )[ ]

( )[ ]

( )48

1lglg21

1222

1lg121

lg121

lg2

)(1lg

22

12

1

12

1

1

1

nnnnnn

nnnnn

knnn

knnnkk

n

k

n

k

n

k

+−−≤

⎟⎠

⎞⎜⎝

⎛ −⎟⎠

⎞⎜⎝

⎛−−≤

−−≤

−⎟⎠

⎞⎜⎝

⎛ −≤

∑

∑∑−

=

−

=

−

=

What are we doing here? Rearrange first term, place upper bound on second

What are we doing? X Guassian series

What are we doing? Multiply it all out


( )

!!Done!

2when81lg

21

481lglg

21lg

22

221

1

≥−≤

+−−≤∑−

=

nnnn

nnnnnnkkn

k

Documents

403: Algorithms and Data Structures Quicksortpetko/classes/FALL16-CSI403-slides/08-Quicksort.pdfAnalyzing Quicksort: Average Case • IntuiDvely, a real-life run of quicksort will