Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
403:AlgorithmsandDataStructures
QuicksortFall2016UAlbany
ComputerScienceSomeslidesborrowedfromDavidLuebke
Sofar:SorDng
• Inser6on O(n2) in-place• Merge O(nlogn) 2ndarraytomerge • Heapsort O(nlogn) in-place• Quicksort fromO(nlogn)toO(n2)in-place– verygoodinpracDce(smallconstants)– QuadraDcDmeisrare
Next
Algorithm Time Space
Quicksort
• Anotherdivide-and-conqueralgorithm– DIVIDE:ThearrayA[p..r]ispar11onedintotwonon-emptysubarraysA[p..q]andA[q+1..r]• Invariant:AllelementsinA[p..q]arelessthanallelementsinA[q+1..r]
– CONQUER:Thesubarraysarerecursivelysortedbycallstoquicksort
– COMBINE:Unlikemergesort,nocombiningstep:twosubarraysformanalready-sortedarray
QuicksortCode
Quicksort(A, p, r) { if (p < r) { q = Partition(A, p, r); Quicksort(A, p, q); Quicksort(A, q+1, r); } }
ParDDon
• Clearly,alltheacDontakesplaceinthepartition()funcDon– Rearrangesthesubarrayinplace– Endresult:
• Twosubarrays• Allvaluesinfirstsubarray≤allvaluesinsecond
– Returnstheindexofthe“pivot”elementseparaDngthetwosubarrays
• Howdoyousupposeweimplementthis?
ParDDonInWords
• ParDDon(A,p,r):– Selectanelementtoactasthe“pivot”(which?)– Growtworegions,A[p..i]andA[j..r]
• AllelementsinA[p..i]<=pivot• AllelementsinA[j..r]>=pivot
– IncrementiunDlA[i]>=pivot– DecrementjunDlA[j]<=pivot– SwapA[i]andA[j]– RepeatunDli>=j– Returnj Note: slightly different from
book’s partition()
ParDDonCodePartition(A, p, r) x = A[p]; i = p - 1; j = r + 1; while (TRUE) repeat j--; until A[j] <= x; repeat i++; until A[i] >= x; if (i < j) Swap(A, i, j); else return j;
Illustrate on A = {4,5,9,7,2,13,6,3};
i j
Choosepivotx
Scanlookingforelementexceeding
x
Scanlookingforelementatmost
x
Whenwefindsuchelements,Exchangethem
Pivot=4Goal:
4 5 9 7 2 13 6 3
i=0 j=9
3 5 9 7 2 13 6 4
i=0 j=9i=2 j=5
3 2 9 7 5 13 6 4
i=3 j=5i=2j=2
i>j:DONE
<=x>=x
Example
AssumeallelementsaredisDnct
ParDDonCodePartition(A, p, r) x = A[p]; i = p - 1; j = r + 1; while (TRUE) repeat j--; until A[j] <= x; repeat i++; until A[i] >= x; if (i < j) Swap(A, i, j); else return j;
partition() runs in O(n) time • O(1) at each element: skip or
swap • Linear in the size of the array
What is the running time of partition()?
BacktoQuicksortQuicksort(A, p, r) if (p < r) q = Partition(A, p, r);
Quicksort(A, p, q); Quicksort(A, q+1, r);
3 9 5 7
Qsort(A,1,4)
A
Part(A,1,4)Returns:1
3 9 5 7
Qsort(A,1,1) Qsort(A,2,4)
Part(A,2,4)Returns:3
3 7 5 9
Qsort(A,2,3)
Part(A,2,4)Returns:2
3 5 7 9
Qsort(A,2,2) Qsort(A,3,3)
Qsort(A,4,4)
AnalyzingQuicksort
• Whatwillbeabadcaseforthealgorithm?– ParDDonisalwaysunbalanced
• Whatwillbethebestcaseforthealgorithm?– ParDDonisperfectlybalanced
• Whichismorelikely?– Thelader,byfar,except...
• Willanypar1cularinputelicittheworstcase?– Yes:Already-sortedinput
AnalyzingQuicksort:Balancedsplits
• Inthebalancedsplitcase:T(n)=2T(n/2)+Θ(n)
• Whatdoesthisworkoutto?T(n)=Θ(nlgn)
Takehome:Agoodbalanceisimportant
AnalyzingQuicksort:Sortedcase
• Sortedcase:T(1)=Θ(1)T(n)=T(n-1)+Θ(n)bysubsDtuDon…
T(n)=T(1)+nΘ(n)
• WorksouttoT(n)=Θ(n2)
2 3 6 7 10 13 14 16
Firstcall:jwilldecreaseto1(nsteps)Second:jdecreaseto2(n-1steps)…n+n-1+n-2+…=Θ(n2)
Issortedreallytheworstcase?
• Argueformallythatthingscannotgetworse• Aformalargumentwithgeneralsplit• Assumethateverysplitresultsintwoarrays– Sizeq– Sizen-q
• T(n)=max1<=q<=n-1[T(q)+T(n-q)]+O(n)– whereT(1)=O(1)
• ShowthatT(n)=O(n2)ITCANNOTGET
WORSE
Averagebehavior:IntuiDon
• Worstcase:assumes1:n-1split– rareinpracDce
• TheO(nlogn)behavioroccursevenifthesplitissay10%:90%
• Ifallsplitsareequallylikely– 1:n-1,2:n-2…n-1:1– thenonaverage,wewillnotgetaverytalltree– detailsinextraslideattheend(notrequired)
AvoidingtheO(n2)case
• TherealliabilityofquicksortisthatitrunsinO(n2)onalready-sortedinput
• SoluDons– Randomizetheinputarray– Pickarandompivotelement– choose3elementsandtakemedianforpivot
• Howwillthesesolvetheproblem?– ByensuringthatnoparDcularinputcanbechosentomakequicksortruninO(n2)Dme
OtherImprovements(lowerconstants)
• Whenasubarrayissmall(saysmallerthan5)switchtoasimplesorDngproceduresayinserDonsortinsteadofQuicksort– whydoesthishelp?
• Pickmorethanonepivot– ParDDonsthearrayinmorethan2parts– Smallernumberofcomparisons(1.9nlognvs2nlogn)andoverallbederperformanceinpracDce
– Details:Kushagraetal.“MulD-PivotQuicksort:TheoryandExperiments”,SIAM,2013
Announcements
• ReadthroughChapter7• HW2dueonWednesday
Extraslides*
• Averagecaserigorousanalysisfollows• Thisisadvancedmaterial(willnotappearinHWsandexam)
AnalyzingQuicksort:AverageCase
• Assumingrandominput,average-caserunningDmeismuchclosertoO(nlgn)thanO(n2)
• First,amoreintuiDveexplanaDon/example:– SupposethatparDDon()alwaysproducesa9-to-1split.Thislooksquiteunbalanced!
– Therecurrenceisthus:T(n)=T(9n/10)+T(n/10)+n– Howdeepwilltherecursiongo?
Use n instead of O(n) for convenience (how?)
AnalyzingQuicksort:AverageCase
• IntuiDvely,areal-liferunofquicksortwillproduceamixof“bad”and“good”splits– Randomlydistributedamongtherecursiontree– PretendforintuiDonthattheyalternatebetweenbest-case(n/2:n/2)andworst-case(n-1:1)
– Whathappensifwebad-splitrootnode,thengood-splittheresul1ngsize(n-1)node?
AnalyzingQuicksort:AverageCase
• IntuiDvely,areal-liferunofquicksortwillproduceamixof“bad”and“good”splits– Randomlydistributedamongtherecursiontree– PretendforintuiDonthattheyalternatebetweenbest-case(n/2:n/2)andworst-case(n-1:1)
– Whathappensifwebad-splitrootnode,thengood-splittheresul1ngsize(n-1)node?• WefailEnglish
AnalyzingQuicksort:AverageCase• IntuiDvely,areal-liferunofquicksortwillproduceamixof“bad”and“good”splits– Randomlydistributedamongtherecursiontree– PretendforintuiDonthattheyalternatebetweenbest-case(n/2:n/2)andworst-case(n-1:1)
– Whathappensifwebad-splitrootnode,thengood-splittheresul1ngsize(n-1)node?• Weendupwiththreesubarrays,size1,(n-1)/2,(n-1)/2• Combinedcostofsplits=n+n-1=2n-1=O(n)• Noworsethanifwehadgood-splittherootnode!
AnalyzingQuicksort:AverageCase
• IntuiDvely,theO(n)costofabadsplit(or2or3badsplits)canbeabsorbedintotheO(n)costofeachgoodsplit
• ThusrunningDmeofalternaDngbadandgoodsplitsissDllO(nlgn),withslightlyhigherconstants
• Howcanwebemorerigorous?
AnalyzingQuicksort:AverageCase
• Forsimplicity,assume:– AllinputsdisDnct(norepeats)– Slightlydifferentpartition() procedure
• parDDonaroundarandomelement,whichisnotincludedinsubarrays• allsplits(0:n-1,1:n-2,2:n-3,…,n-1:0)equallylikely
• Whatistheprobabilityofapar1cularsplithappening?
• Answer:1/n
AnalyzingQuicksort:AverageCase• SoparDDongeneratessplits(0:n-1,1:n-2,2:n-3,…,n-2:1,n-1:0)eachwithprobability1/n
• IfT(n)istheexpectedrunningDme,
• Whatiseachtermunderthesumma1onfor?• WhatistheΘ(n)termfor?
( ) ( ) ( )[ ] ( )∑−
=
Θ+−−+=1
0
11 n
knknTkT
nnT
AnalyzingQuicksort:AverageCase
• So…
– Note:thisisjustlikethebook’srecurrence(p166),exceptthatthesummaDonstartswithk=0
– We’lltakecareofthatinasecond
( ) ( ) ( )[ ] ( )
( ) ( )∑
∑
−
=
−
=
Θ+=
Θ+−−+=
1
0
1
0
2
11
n
k
n
k
nkTn
nknTkTn
nT
Write it on the board
AnalyzingQuicksort:AverageCase
• WecansolvethisrecurrenceusingthedreadedsubsDtuDonmethod– Guesstheanswer– AssumethattheinducDvehypothesisholds– SubsDtuteitinforsomevalue<n– Provethatitfollowsforn
AnalyzingQuicksort:AverageCase
• WecansolvethisrecurrenceusingthedreadedsubsDtuDonmethod– Guesstheanswer
• What’stheanswer?
– AssumethattheinducDvehypothesisholds– SubsDtuteitinforsomevalue<n– Provethatitfollowsforn
AnalyzingQuicksort:AverageCase
• WecansolvethisrecurrenceusingthedreadedsubsDtuDonmethod– Guesstheanswer
• T(n)=O(nlgn)– AssumethattheinducDvehypothesisholds– SubsDtuteitinforsomevalue<n– Provethatitfollowsforn
AnalyzingQuicksort:AverageCase
• WecansolvethisrecurrenceusingthedreadedsubsDtuDonmethod– Guesstheanswer
• T(n)=O(nlgn)– AssumethattheinducDvehypothesisholds
• What’stheinduc1vehypothesis?
– SubsDtuteitinforsomevalue<n– Provethatitfollowsforn
AnalyzingQuicksort:AverageCase
• WecansolvethisrecurrenceusingthedreadedsubsDtuDonmethod– Guesstheanswer
• T(n)=O(nlgn)– AssumethattheinducDvehypothesisholds
• T(n)≤anlgn+bforsomeconstantsaandb
– SubsDtuteitinforsomevalue<n– Provethatitfollowsforn
AnalyzingQuicksort:AverageCase
• WecansolvethisrecurrenceusingthedreadedsubsDtuDonmethod– Guesstheanswer
• T(n)=O(nlgn)– AssumethattheinducDvehypothesisholds
• T(n)≤anlgn+bforsomeconstantsaandb
– SubsDtuteitinforsomevalue<n• Whatvalue?
– Provethatitfollowsforn
AnalyzingQuicksort:AverageCase
• WecansolvethisrecurrenceusingthedreadedsubsDtuDonmethod– Guesstheanswer
• T(n)=O(nlgn)– AssumethattheinducDvehypothesisholds
• T(n)≤anlgn+bforsomeconstantsaandb
– SubsDtuteitinforsomevalue<n• Thevaluekintherecurrence
– Provethatitfollowsforn
AnalyzingQuicksort:AverageCase
• WecansolvethisrecurrenceusingthedreadedsubsDtuDonmethod– Guesstheanswer
• T(n)=O(nlgn)– AssumethattheinducDvehypothesisholds
• T(n)≤anlgn+bforsomeconstantsaandb– SubsDtuteitinforsomevalue<n
• Thevaluekintherecurrence– Provethatitfollowsforn
• Grindthroughit…
Note: leaving the same recurrence as the book
What are we doing here?
AnalyzingQuicksort:AverageCase( ) ( ) ( )
( ) ( )
( ) ( )
( ) ( )
( ) ( )∑
∑
∑
∑
∑
−
=
−
=
−
=
−
=
−
=
Θ++=
Θ+++=
Θ+⎥⎦
⎤⎢⎣
⎡++≤
Θ++≤
Θ+=
1
1
1
1
1
1
1
0
1
0
lg2
2lg2
lg2
lg2
2
n
k
n
k
n
k
n
k
n
k
nbkakn
nnbbkak
n
nbkakbn
nbkakn
nkTn
nT The recurrence to be solved
What are we doing here?
What are we doing here?
Plug in inductive hypothesis
Expand out the k=0 case
2b/n is just a constant, so fold it into Θ(n)
What are we doing here?
What are we doing here?
Evaluate the summation: b+b+…+b = b (n-1)
The recurrence to be solved
Since n-1<n, 2b(n-1)/n < 2b
AnalyzingQuicksort:AverageCase
( ) ( ) ( )
( )
( )
( )nbkkna
nnnbkk
na
nbn
kakn
nbkakn
nT
n
k
n
k
n
k
n
k
n
k
Θ++≤
Θ+−+=
Θ++=
Θ++=
∑
∑
∑∑
∑
−
=
−
=
−
=
−
=
−
=
2lg2
)1(2lg2
2lg2
lg2
1
1
1
1
1
1
1
1
1
1
What are we doing here? Distribute the summation
This summation gets its own set of slides later
How did we do this? Pick a large enough that an/4 dominates Θ(n)+b
What are we doing here? Remember, our goal is to get T(n) ≤ an lg n + b
What the hell? We’ll prove this later
What are we doing here? Distribute the (2a/n) term
The recurrence to be solved
AnalyzingQuicksort:AverageCase
( ) ( )
( )
( )
( )
bnan
nabnbnan
nbnanan
nbnnnna
nbkknanTn
k
+≤
⎟⎠
⎞⎜⎝
⎛ −+Θ++=
Θ++−=
Θ++⎟⎠
⎞⎜⎝
⎛ −≤
Θ++≤ ∑−
=
lg4
lg
24
lg
281lg
212
2lg2
22
1
1
AnalyzingQuicksort:AverageCase
• SoT(n)≤anlgn+bforcertainaandb– ThustheinducDonholds– ThusT(n)=O(nlgn)– ThusquicksortrunsinO(nlgn)Dmeonaverage(phew!)
• Ohyeah,thesummaDon…
What are we doing here? The lg k in the second term is bounded by lg n
TightlyBoundingTheKeySummaDon
⎡ ⎤
⎡ ⎤
⎡ ⎤
⎡ ⎤
⎡ ⎤
⎡ ⎤∑∑
∑∑
∑∑∑
−
=
−
=
−
=
−
=
−
=
−
=
−
=
+=
+≤
+=
1
2
12
1
1
2
12
1
1
2
12
1
1
1
lglg
lglg
lglglg
n
nk
n
k
n
nk
n
k
n
nk
n
k
n
k
knkk
nkkk
kkkkkk
What are we doing here? Move the lg n outside the summation
What are we doing here? Split the summation for a tighter bound
The summation bound so far
TightlyBoundingTheKeySummaDon
⎡ ⎤
⎡ ⎤
( )⎡ ⎤
⎡ ⎤
( )⎡ ⎤
⎡ ⎤
( )⎡ ⎤
⎡ ⎤∑∑
∑∑
∑∑
∑∑∑
−
=
−
=
−
=
−
=
−
=
−
=
−
=
−
=
−
=
+−=
+−=
+≤
+≤
1
2
12
1
1
2
12
1
1
2
12
1
1
2
12
1
1
1
lg1lg
lg1lg
lg2lg
lglglg
n
nk
n
k
n
nk
n
k
n
nk
n
k
n
nk
n
k
n
k
knkn
knnk
knnk
knkkkk
What are we doing here? The lg k in the first term is bounded by lg n/2
What are we doing here? lg n/2 = lg n - 1
What are we doing here? Move (lg n - 1) outside the summation
The summation bound so far
TightlyBoundingTheKeySummaDon
( )⎡ ⎤
⎡ ⎤
⎡ ⎤ ⎡ ⎤
⎡ ⎤
⎡ ⎤
( ) ⎡ ⎤
∑
∑∑
∑∑∑
∑∑∑
−
=
−
=
−
=
−
=
−
=
−
=
−
=
−
=
−
=
−⎟⎠
⎞⎜⎝
⎛ −=
−=
+−=
+−≤
12
1
12
1
1
1
1
2
12
1
12
1
1
2
12
1
1
1
2)(1lg
lg
lglg
lg1lglg
n
k
n
k
n
k
n
nk
n
k
n
k
n
nk
n
k
n
k
knnn
kkn
knkkn
knknkk
What are we doing here? Distribute the (lg n - 1)
What are we doing here? The summations overlap in range; combine them
What are we doing here? The Guassian series
The summation bound so far
TightlyBoundingTheKeySummaDon
( ) ⎡ ⎤
( )[ ]
( )[ ]
( )48
1lglg21
1222
1lg121
lg121
lg2
)(1lg
22
12
1
12
1
1
1
nnnnnn
nnnnn
knnn
knnnkk
n
k
n
k
n
k
+−−≤
⎟⎠
⎞⎜⎝
⎛ −⎟⎠
⎞⎜⎝
⎛−−≤
−−≤
−⎟⎠
⎞⎜⎝
⎛ −≤
∑
∑∑−
=
−
=
−
=
What are we doing here? Rearrange first term, place upper bound on second
What are we doing? X Guassian series
What are we doing? Multiply it all out
TightlyBoundingTheKeySummaDon
( )
!!Done!
2when81lg
21
481lglg
21lg
22
221
1
≥−≤
+−−≤∑−
=
nnnn
nnnnnnkkn
k