Upload
hadung
View
225
Download
0
Embed Size (px)
Citation preview
14.09.17
1
AdvancedAlgorithmics(6EAP)MTAT.03.238
Linearstructures,sorting,searching,etc
JaakVilo2017Fall
1JaakVilo
Big-Ohnotationclasses
Class Informal Intuition Analogy
f(n)∈ ο (g(n)) fisdominatedbyg Strictlybelow <f(n)∈O(g(n)) Boundedfromabove Upperbound ≤f(n)∈Θ(g(n)) Boundedfrom
aboveand below“equalto” =
f(n)∈Ω(g(n)) Boundedfrombelow Lowerbound ≥f(n)∈ω(g(n)) fdominatesg Strictlyabove >
Conclusions
• Algorithmcomplexitydealswiththebehaviorinthelong-term– worstcase -- typical– averagecase -- quitehard– bestcase -- bogus,“cheating”
• Inpractice,long-termsometimesnotnecessary– E.g.forsorting20elements,youdon’tneedfancyalgorithms…
Linear,sequential,ordered,list…
Memory, disk, tape etc – is an ordered sequentially addressed media.
Physicalorderedlist~array
• Memory/address/– Garbagecollection
• Files(character/bytelist/linesintextfile,…)
• Disk– Diskfragmentation
Lineardatastructures:Arrays• Array• Bidirectionalmap• Bitarray• Bitfield• Bitboard• Bitmap• Circularbuffer• Controltable• Image• Dynamicarray• Gapbuffer
• Hashedarraytree• Heightmap• Lookuptable• Matrix• Parallelarray• Sortedarray• Sparsearray• Sparsematrix• Iliffevector• Variable-lengtharray
14.09.17
2
Lineardatastructures:Lists
• Doublylinkedlist• Arraylist• Linkedlist• Self-organizinglist• Skiplist• Unrolledlinkedlist• VList
• Xorlinkedlist• Zipper• Doublyconnectededgelist
• Differencelist
Lists:Array
3 6 7 5 2
0 1 size MAX_SIZE-1
L = int[MAX_SIZE]
L[2]=7
Lists:Array
3 6 7 5 2
0 1 size MAX_SIZE-1
L = int[MAX_SIZE]
L[2]=7 L[size++] = new
3 6 7 5 2
1 2 size MAX_SIZE
L[3]=7 L[++size] = new
Multiplelists,2-D-arrays,etc…
2Darray
& A[i,j] = A + i*(nr_el_in_row*el_size) + j*el_size
LinearLists
• Operationswhichonemaywanttoperformonalinearlistofn elementsinclude:
– gainaccess tothekthelementofthelisttoexamine and/orchange thecontents
– insert anewelementbeforeorafterthekthelement
–delete thekthelementofthelist
Reference:Knuth,TheArtofComptuerProgramming,Vol1,FundamentalAlgorithms,3rd Ed.,p.238-9
14.09.17
3
AbstractDataType(ADT)• High-leveldefinitionofdatatypes
• AnADTspecifies– Acollection ofdata– Asetofoperations onthedataorsubsetsofthedata
• ADTdoesnotspecifyhowtheoperationsshouldbeimplemented
• Examples– vector,list,stack,queue,deque,priorityqueue,table(map),
associativearray,set,graph,digraph
ADT• Adatatype isasetofvaluesandanassociatedsetof
operations• Adatatypeisabstract iffitiscompletelydescribedbyitsset
ofoperationsregradlessofitsimplementation• Thismeansthatitispossibletochangetheimplementation
ofthedatatypewithoutchangingitsuse• Thedatatypeisthusdescribedbyasetofprocedures• Theseoperationsaretheonlythingthatauserofthe
abstractioncanassume
Primitive&compositetypesPrimitivetypes• Boolean (forboolean values
True/False)• Char (forcharactervalues)• int (forintegralorfixed-precision
values)• Float (forstoringrealnumber
values)• Double (alargersizeoftypefloat)• String (forstringofchars)• Enumerated type
Compositetypes• Array• Record (alsocalledtupleorstruct)
• Union• Taggedunion(alsocalledavariant,variantrecord,discriminatedunion,ordisjointunion)
• Plainolddatastructure
AbstractDataTypes(ADT)• SomecommonADTs,whichhaveprovedusefulinagreatvarietyofapplications,are
• EachoftheseADTsmaybedefinedinmanywaysandvariants,notnecessarilyequivalent.
ContainerListSetMultisetMapMultimap
StackGraphQueuePriority queueDouble-ended queueDouble-ended priority queue
Abstractdatatypes:
• Dictionary (key,value)• Stack(LIFO)• Queue(FIFO)• Queue(double-ended)• Priorityqueue(fetchhighest-priorityobject)• ...
Dictionary
• Containerofkey-element(k,e)pairs• Requiredoperations:
– insert(k,e),– remove(k),– find(k),– isEmpty()
• Mayalsosupport(whenanorderisprovided):– closestKeyBefore(k),– closestElemAfter(k)
• Note:Noduplicatekeys
14.09.17
4
Somedatastructuresfor DictionaryADT• Unordered
– Array– Sequence/list
• Ordered– Array– Sequence(SkipLists)– BinarySearchTree(BST)– AVLtrees,red-blacktrees– (2;4)Trees– B-Trees
• Valued– HashTables– ExtendibleHashing
Lineardatastructures
Arrays• Array• Bidirectional
map• Bitarray• Bitfield• Bitboard• Bitmap• Circularbuffer• Controltable• Image• Dynamicarray• Gapbuffer
• Hashedarraytree
• Heightmap• Lookuptable• Matrix• Parallelarray• Sortedarray• Sparsearray• Sparsematrix• Iliffe vector• Variable-length
array
Lists• Doublylinkedlist• Linkedlist• Self-organizinglist• Skiplist• Unrolledlinkedlist• VList• Xor linkedlist• Zipper• Doublyconnectededgelist
Trees…Binarytrees• AAtree• AVLtree• Binarysearch
tree• Binarytree• Cartesiantree• Pagoda• Randomized
binarysearchtree
• Red-blacktree• Rope• Scapegoattree• Self-balancing
binarysearchtree
• Splaytree• T-tree• Tangotree• Threadedbinary
tree• Toptree
• Treap• Weight-balanced
tree
B-trees• B-tree• B+tree• B*-tree• Bsharptree• Dancingtree• 2-3tree• 2-3-4tree• Queap• Fusiontree• Bx-tree
Heaps• Heap• Binaryheap• Binomialheap• Fibonacciheap• AF-heap• 2-3heap
• Softheap• Pairingheap• Leftistheap• Treap• Beap• Skewheap• Ternaryheap• D-ary heap•• Tries• Trie• Radixtree• Suffixtree• Suffixarray• Compressed
suffixarray• FM-index• Generalised
suffixtree• B-trie• Judyarray• X-fasttrie• Y-fasttrie
• Ctrie
Multiway trees• Ternarysearch
tree• And–ortree• (a,b)-tree• Link/cuttree• SPQR-tree• Spaghettistack• Disjoint-setdata
structure• Fusiontree• Enfilade• Exponentialtree• Fenwicktree• VanEmde Boas
tree
Space-partitioningtrees• Segmenttree
• Intervaltree• Rangetree• Bin• Kd-tree• Implicitkd-tree• Min/maxkd-tree• Adaptivek-dtree• Kdb tree• Quadtree• Octree• Linearoctree• Z-order• UB-tree• R-tree• R+tree• R*tree• HilbertR-tree• X-tree• Metrictree• Covertree• M-tree• VP-tree
• BK-tree• Bounding
intervalhierarchy
• BSPtree• Rapidly-exploring
randomtree
Application-specifictrees• Syntaxtree• Abstractsyntax
tree• Parsetree• Decisiontree• Alternating
decisiontree• Minimax tree• Expectiminimax
tree• Fingertree
Hashes,Graphs,Other• Hashes• Bloomfilter• Distributedhashtable• Hasharraymapped
trie• Hashlist• Hashtable• Hashtree• Hashtrie• Koorde• Prefixhashtree•
Graphs• Adjacencylist• Adjacencymatrix• Graph-structured
stack• Scenegraph• Binarydecision
diagram• Zerosuppressed
decisiondiagram• And-invertergraph• Directedgraph• Directedacyclicgraph
• Propositionaldirectedacyclicgraph
• Multigraph• Hypergraph
Other• Lightmap• Wingededge• Quad-edge• Routingtable• Symboltable
Lists:Array
3 6 7 5 2
0 1 size MAX_SIZE-1
3 6 7 8 2
0 1 size
5 2 Insert 8 after L[2]
3 6 7 8 2
0 1 size
5 2 Delete last
*array (memory address)sizeMAX_SIZE
Lists:Array
3 6 7 8 2
0 1 size
5 2 Insert 8 after L[2]
3 6 7 8 2
0 1 size
5 2 Delete last
• Access i O(1)• Insert to end O(1)• Delete from end O(1)• Insert O(n) • Delete O(n)• Search O(n)
14.09.17
5
LinearLists
• Otheroperationsonalinearlistmayinclude:– determinethenumberofelements– searchthelist– sortalist– combinetwoormorelinearlists– splitalinearlistintotwoormorelists– makeacopyofalist
Stack
• push(x) -- addtoend(addtotop)• pop() -- fetchfromend(top)
• O(1)inallreasonablecasesJ
• LIFO– LastIn,FirstOut
Linkedlistshead tail
head tail
Singly linked
Doubly linked
Linkedlists:addhead tail
head tail
size
Linkedlists:delete(+garbagecollection?)
head tail
head tail
size
Operations
• Arrayindexedfrom0 ton – 1:
• Singly-linkedlistwithheadandtailpointers
1 undertheassumptionwehaveapointertothekthnode,O(n) otherwise
k = 1 1 < k < n k = naccess/change the kth element O(1) O(1) O(1)
insert before or after the kth element O(n) O(n) O(1)delete the kth element O(n) O(n) O(1)
k = 1 1 < k < n k = naccess/change the kth element O(1) O(n) O(1)
insert before or after the kth element O(1) O(n) O(1)1 O(n) O(1)delete the kth element O(1) O(n) O(n)
14.09.17
6
ImprovingRun-TimeEfficiency
• Wecanimprovetherun-timeefficiencyofalinkedlistbyusingadoubly-linkedlist:
Singly-linkedlist:
Doubly-linkedlist:
– Improvementsatoperationsrequiringaccesstothepreviousnode
– Increasesmemoryrequirements...
ImprovingEfficiency
Singly-linkedlist:
Doubly-linkedlist:
1 undertheassumptionwehaveapointertothekthnode,O(n) otherwise
k = 1 1 < k < n k = naccess/change the kth element O(1) O(n) O(1)
insert before or after the kth element O(1) O(1)1 O(1)delete the kth element O(1) O(1)1 O(1)
k = 1 1 < k < n k = naccess/change the kth element O(1) O(n) O(1)
insert before or after the kth element O(1) O(n) O(1)1 O(n) O(1)delete the kth element O(1) O(n) O(n)
• Arrayindexedfrom0 ton – 1:
• Singly-linkedlistwithheadandtailpointers
• Doublylinkedlist
k = 1 1 < k < n k = naccess/change the kth element O(1) O(1) O(1)insert before or after the kth element O(n) O(n) O(1)
delete the kth element O(n) O(n) O(1)
k = 1 1 < k < n k = naccess/change the kth element O(1) O(n) O(1)insert before or after the kth element O(1) O(n) O(1)1 O(n) O(1)delete the kth element O(1) O(n) O(n)
k = 1 1 < k < n k = naccess/change the kth element O(1) O(n) O(1)
insert before or after the kth element O(1) O(1)1 O(1)
delete the kth element O(1) O(1)1 O(1)
Introductiontolinkedlists• Considerthefollowingstructdefinition
struct node {string word;int num;node *next; //pointer for the next node
};node *p = new node;
? ?
num word next
p ?
Introductiontolinkedlists:insertinganode• node *p;
• p = new node;
• p->num = 5; • p->word = "Ali";• p->next = NULL
•5 Ali
num word next
p
Introductiontolinkedlists:addinganewnode
• Howcanyouaddanothernodethatispointedbyp->link?
• node *p;• p = new node;• p->num = 5; • p->word = "Ali";• p->next = NULL;• node *q;
•
5 Ali
num word link
?p
q
14.09.17
7
Introductiontolinkedlistsnode *p;p = new node;p->num = 5; p->word = "Ali";p->next = NULL;
node *q;q = new node;
5 Ali
num word link
? ? ?
num word link
?
q
p
Introductiontolinkedlistsnode *p, *q;p = new node;p->num = 5; p->word = "Ali";p->next = NULL;
q = new node;q->num=8;q->word = "Veli";
5 Ali
num word next
? 8 Veli
num word next
?p
q
Introductiontolinkedlistsnode *p, *q;p = new node;p->num = 5; p->word = "Ali";p->next = NULL;
q = new node;q->num=8;q->word = "Veli";p->next = q;q->next = NULL;
5 Ali
num word link
? 8 Veli
num word link
p
q
PointersinC/C++
p=newnode;deletep;p=newnode[20];
p=malloc(sizeof(node));freep;
p=malloc(sizeof(node)*20);(p+10)->next=NULL;/*11thelements*/
Book-keeping
• malloc,new – “remember” whathasbeencreatedfree(p),delete (C/C++)
• Whenyouneedmanysmallareastobeallocated,reserveabigchunk(array)andmaintainyourownsetoffreeobjects
• Elementsofarrayofobjects canbepointedbythepointertoanobject.
Object
• Object=newobject_type;
• Equalstocreatinganewobjectwithnecessarysizeofallocatedmemory(deletecanfreeit)
14.09.17
8
Somelinks
• http://en.wikipedia.org/wiki/Pointer
• Pointerbasics:http://cslibrary.stanford.edu/106/
• C++MemoryManagement:Whatisthedifferencebetweenmalloc/freeandnew/delete?– http://www.codeguru.com/forum/showthread.php?t=401848
Alternative:arraysandintegers
• Ifyouwanttotestpointersandlinkedlistetc.datastructures,butdonothavepointersfamiliar(yet)
• Usearraysandindexestoarrayelementsinstead…
Replacingpointerswitharrayindex
/
7
5
5
8
/
1
4
3
1 2 3 4 5 6 7head=3
nextkeyprev
8 4 7
head
Maintaininglistoffreeobjects
/
7
5
5
8
/
1
4
3
1 2 3 4 5 6 7head=3
nextkeyprev
8 4 7
head
/
7
5
4 5
8
/
/ 1
4
3
7 21 2 3 4 5 6 7
head=3
nextkeyprev
free=6 free = -1 => array is full
allocate object:new = free;free = next[free] ;
free object xnext[x]= freefree = x
Multiplelists,singlefreelist
/
7
5
4 5
8
/
/ 1
4
3
7
3
/
/
9
6
1 2 3 4 5 6 7nextkeyprev
head1=3 => 8, 4, 7head2=6 => 3, 9free =2 (2)
Hack:allocatemorearrays…
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
AA
AA[ (i-1)/7 ] -> [ (i -1) % 7 ]
LIST(10) = AA[ 1 ][ 2 ]
LIST(19) = AA[ 2 ][ 5 ]
use integer division and mod
14.09.17
9
Queue
• enqueue(x) - addtoend• dequeue() - fetchfrombeginning
• FIFO– FirstInFirstOut
• O(1)inallreasonablecasesJ
Queue(FIFO)
3 6 7 5 2
F L
Queue(basicidea,doesnotcontainallcontrols!)
3 6 7 5 2
F L MAX_SIZE-1
7 5 2
F L MAX_SIZE-1
First = List[F]
Last = List[L-1]
Pop_first : { return List[F++] }
Pop_last : { return List[--L] }
Full: return ( L==MAX_SIZE )Empty: F< 0 or F >= L
Circularbuffer
• Acircularbuffer orringbuffer isadatastructure thatusesasingle,fixed-sizebufferasifitwereconnectedend-to-end.Thisstructurelendsitselfeasilytobufferingdatastreams.
CircularQueue
3 6
FL
First = List[F]
Last = List[ (L-1+MAX_SIZE) % MAX_SIZE ]
Full: return ( (L+1)%MAX_SIZE == F )Empty: F==L
7 5 2
MAX_SIZE-1
Add_to_end( x ) : { List[L]=x ; L= (L+1) % MAX_SIZE ] } // % = modulo
3 6
FL,
7 5 2
MAX_SIZE-1
14.09.17
10
Stack
• push(x) -- addtoend(addtotop)• pop() -- fetchfromend(top)
• O(1)inallreasonablecasesJ
• LIFO– LastIn,FirstOut
Stackbasedlanguages
• Implementapostfixcalculator– ReversePolishnotation
• 543*2- + => 5+((4*3)-2)
• Verysimpletoparse andinterpret
• FORTH,Postscriptarestack-basedlanguages
Arraybasedstack
• Howtoknowhowbigastackshalleverbe?
• Whenfull,allocatebiggertabledynamically,andcopyallpreviousvaluesthere
• O(n)?
3 6 7 5
3 6 7 5 2
• Whenfull,create2xbiggertable,copypreviousnelements:
• Afterevery2k insertions,performO(n)copy
• O(n)individualinsertions+• n/2+n/4+n/8…copy-ing• Total:O(n)effort!
Whataboutdeletions?
• whenn=32->33(copy32,insert1)• delete:33->32
– shouldyoudeleteimmediately?– Deleteonlywhenbecomeslessthan1/4thfull
– Havetodeleteatleastn/2todecrease– Havetoaddatleastntoincreasesize– Mostoperations,O(1)effort– ButfewoperationstakeO(n)tocopy– Foranymoperations,O(m)time
Amortizedanalysis
• Analyzethetimecomplexityovertheentire“lifespan” ofthealgorithm
• Someoperationsthatcostmorewillbe“covered”bymanyotheroperationstakingless
14.09.17
11
ListsanddictionaryADT…
• Howtomaintainadictionaryusing(linked)lists?
• IskinD?– gothroughallelementsdofD,testifd==kO(n)– Ifsorted:d=first(D);while(d<=k)d=next(D);– onaveragen/2tests…
• Add(k,D)=>insert(k,D)=O(1)orO(n)– testforuniqueness
Arraybasedsortedlist
• isdinD?• BinarysearchinD
low highmid
Binarysearch– recursiveBinarySearch(A[0..N-1],value,low,high){
if (high<low)return -1//notfound
mid=(low+high)/2if (A[mid]>value)
return BinarySearch(A,value,low,mid-1)elseif(A[mid]<value)
return BinarySearch(A,value,mid+1,high)else
returnmid//found}
Binarysearch– recursiveBinarySearch(A[0..N-1],value,low,high){
if (high<low)return -1//notfound
mid=low+((high- low)/2) //Note:not(low+high)/2!!if (A[mid]>value)
return BinarySearch(A,value,low,mid-1)elseif(A[mid]<value)
return BinarySearch(A,value,mid+1,high)else
returnmid//found}
Binarysearch– iterativeBinarySearch(A[0..N-1],value){
low=0;high=N- 1;
while (low<=high){mid=low+((high- low)/2)//Note:not(low+high)/2 !!if (A[mid]>value)
high=mid- 1else if (A[mid]<value)
low=mid+1else
returnmid//found}return -1//notfound
}
Workperformed
• x<=>A[18]?<• x<=>A[9]? >• x<=>A[13]?==
• O(lgn)
1 3618
14.09.17
12
Sorting
• givenalist,arrangevaluessothatL[1]<=L[2]<=…<=L[n]
• nelements=>n!possibleorderings• OnetestL[i]<=L[j]candividen!to2
– Makeabinarytreeandcalculatethedepth
• log(n!)=Ω(nlogn)
• Hence,lowerboundforsortingisΩ(nlogn)– usingcomparisons…
Decisiontreemodel
• n!orderings(leaves)• Heightofsuchtree?
Proof:log(n!)=Ω(nlogn)
• log(n!)=logn+log(n-1)+log(n-2)+…log(1)
>=n/2*log(n/2)
=Ω(nlogn)Half of elements are larger than log(n/2)
14.09.17
13
Thedivide-and-conquerdesignparadigm
1. Divide theproblem(instance)intosubproblems.
2. Conquer thesubproblems bysolvingthemrecursively.
3. Combine subproblem solutions.
Mergesort
Merge-Sort(A,p,r)if p<r
then q=(p+r)/2 //floorMerge-Sort( A,p,q)Merge-Sort( A,q+1,r)Merge( A,p,q,r)
It was invented by John von Neumann in 1945.
Example
• Applyingthemergesortalgorithm:
Mergeoftwolists:Θ(n)
A,B– liststobemergedL=newlist;//emptywhile(Anotemptyand Bnotempty)
ifA.first()<=B.first()thenappend(L,A.first());A=rest(A);elseappend(L,B.first());B=rest(B);
append(L,A); //allremainingelementsofAappend(L,B); //allremainingelementsofBreturnL
Wikipedia/viz.
Valu
e
Pos in array
Run-timeAnalysisofMergeSort
• Thus,thetimerequiredtosortanarrayofsizen > 1 is:– thetimerequiredtosortthefirsthalf,– thetimerequiredtosortthesecondhalf,and– thetimerequiredtomergethetwolists
• Thatis:
( )îíì
>+=
=1)(T21)1(
)T(2 nn
nn n Θ
Θ
14.09.17
14
Mergesort
• Worstcase,averagecase,bestcase…Θ(nlogn)
• Commonwisdom:– Requires additionalspaceformerging(incaseofarrays)
• Homework*:developin-placemergeoftwolistsimplementedinarrays/comparespeed/
a b
a bL[a] <= L[b] a b
L[a] > L[b]
Quicksort
• ProposedbyC.A.R.Hoarein1962.• Divide-and-conqueralgorithm.• Sorts“inplace”(likeinsertionsort,butnotlikemergesort).
• Verypractical(withtuning).
Quicksort
14.09.17
15
Partitioningversion2
pivot=A[R];//i=L;j=R-1;while(i<=j)while (A[i]<pivot)i++;//willstopatpivotlatestwhile (i<=jand A[j]>=pivot)j-- ;if (i<j){swap(A[i],A[j]);i++;j-- }
A[R]=A[i];A[i]=pivot;returni;
L R
L R <= pivot
> pivot
i j
L Rij
L Rij
pivot
Wikipedia/“video”
14.09.17
16
ChoiceofpivotinQuicksort
• Selectmedianofthree…
• Selectrandom– opponentcannotchoosethewinningstrategyagainstyou!
http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-046JFall-2005/VideoLectures/detail/embed04.htm
Randompivot
Selectpivot randomlyfromtheregion(blue)andswapwithlastposition
Selectpivotasamedianof3[ormore]randomvaluesfromregion
Applynon-recursivesortforarraylessthan10-20
L R
L R
<= pivot
> pivoti j
L Rij
• 2-pivotversionofQuicksort– (splitin3regions!)
• Handleequalvalues(equaltopivots)
14.09.17
19
Alternativematerials
• Quicksortaveragecaseanalysishttp://eid.ee/10z
• https://secweb.cs.odu.edu/~zeil/cs361/web/website/Lectures/quick/pages/ar01s05.html
• http://eid.ee/10y - MITOpenCourseware-Asymptoticnotation,Recurrences,SubstitutonMasterMethod
•
f(n)
f(n/b) f(n/b) f(n/b)
f(n)
f(n/b) f(n/b) f(n/b)
n vs f(n)log b a
14.09.17
22
WecansortinO(nlogn)
• Isthatthebestwecando?
• Remember:usingcomparisons<,>,<=,>=wecannotdobetterthanO(nlogn)
Howfastcanwesortnintegers?
• Sortpeoplebytheirsex?(F/M,0/1)
• Sortpeoplebyyearofbirth?
14.09.17
24
Radixsort
Radix-Sort(A,d)1. for i=1to d/*leastsignificanttomostsignificant*/
2. useastablesorttosortAondigiti
14.09.17
25
Radixsortusinglists(stable)bbabbb adb aad aac ccc ccb aba cca
Radixsortusinglists(stable)
a
b
c
d
bba aba cca
bbb adb ccb
aac ccc
aad
1.
bbabbb adb aad aac ccc ccb aba cca
Radixsortusinglists(stable)
a
b
c
d
bba aba cca
bbb adb ccb
aac ccc
aad
a
b
c
d
cca
bbb
adb
ccb
aac
ccc
aad
bba aba
a
b
c
d
cca
bbb
adb
ccb
aac
ccc
aad
bba
aba
1. 2.
3.
bbabbb adb aad aac ccc ccb aba cca
Whynotfromlefttoright?
• Swap‘0’ withfirst‘1’• Idea1:recursivelysortfirstandsecondhalf
– Exercise?
01011001001010111100010010010010010100100101010000010000
01011000010010111100010010011001010100100101010000010000
01011000010010010100010010011001010100100111110000010000
01011000010010010100000100001001010100100111110001001001
14.09.17
26
Bitwisesortlefttoright
• Idea2:– swapelementsonlyiftheprefixesmatch…
– Forallbitsfrommostsignificant• advancewhen0• when1->lookfornext0
– ifprefixmatches,swap– otherwisekeepadvancingon0’sandlookfornext1
Bitwiselefttorightsort/*Historicalsorting– wasusedinUniv.ofTartuusingassembler….*//*Cimplementation– JaakVilo,1989*/
voidbitwisesort(SORTTYPE*ARRAY,intsize){
inti,j,tmp,nrbits;
registerSORTTYPEmask,curbit,group;
nrbits=sizeof(SORTTYPE)*8;
curbit=1<<(nrbits-1);/*setmostsignificantbit1*/mask=0; /*maskofthealreadysortedarea */
Jaak Vilo, Univ. of Tartu
do{/*Foreachbit*/i=0;
new_mask:for(;(i<size)&&(!(ARRAY[i]&curbit));i++) ; /*Advancewhilebit==0*/if(i>=size)gotoarray_end;group=ARRAY[i]&mask; /*Savecurrentprefixsnapshot*/
j=i; /*memorize locationof1*/for(;;) {
if(++i>=size)gotoarray_end; /* reachedendofarray*/if((ARRAY[i]&mask)!=group)goto new_mask ;/*newprefix*/
if(!(ARRAY[i]&curbit)) {/*bitis0– needtoswapwithpreviouslocationof1,A[i]ó A[j]*/tmp =ARRAY[i]; ARRAY[i]=ARRAY[j]; ARRAY[j]=tmp;j+=1; /*swapandincreasejtothe
nextpossible1*/}
}array_end:
mask=mask|curbit; /*areaundermaskisnowsorted*/curbit>>=1; /*nextbit */
}while(curbit); /*untilallbitshavebeensorted… */}
Jaak Vilo, Univ. of Tartu
Bitwisefromlefttoright
• Swap‘0’ withfirst‘1’
00100000010010010100001011001001010100100110010011111000
Jaak Vilo, Univ. of Tartu
Bucketsort
• Assumeuniformdistribution
• AllocateO(n)buckets
• Assigneachvaluetopre-assignedbucket
.78
.17
.39
.26
.72
.94
.21
.12
.23
.68
/
/
/
1
0
Sortsmallbucketswithinsertionsort
3
2
5
4
7
6
9
8
.12 .17
.21 .23 .26
.39
.68
.72 .78
.94
14.09.17
27
http://sortbenchmark.org/
• Thesortinputrecordsmustbe100bytesinlength,withthefirst10bytesbeingarandomkey
• Minutesort– maxamountsortedin1minute– 116GBin58.7sec(JimWyllie,IBMResearch)– 40-node80-Itaniumcluster,SANarrayof2,520disks
• 2009,500GBHadoop 1406nodesx(2QuadcoreXeons,8GBmemory,4SATA)OwenO'Malley andArunMurthyYahooInc.
• Performance/PriceSortandPennySort
SortBenchmark• http://sortbenchmark.org/• SortBenchmarkHomePage• Wehaveanewbenchmarkcallednew GraySort,new inmemoryofthefatherofthesort
benchmarks,JimGray.ItreplacesTeraByteSortwhichisnowretired.• Unlike2010,wewillnotbeacceptingearlyentriesforthe2011year.Thedeadlinefor
submittingentriesisApril1,2011.– Allhardwareusedmustbeoff-the-shelfandunmodified.– ForDaytonaclustersortswhereinputsamplingisusedtodeterminetheoutputpartitionboundaries,theinput
samplingmustbedoneevenlyacrossallinputpartitions.
NewrulesforGraySort:• Theinputfilesizeisnowminimum~100TBor1Trecords.Entrieswithlargerinputsizesalso
qualify.• ThewinnerwillhavethefastestSortedRecs/Min.• Wenowprovideanewinputgeneratorthatworksinparallelandgeneratesbinarydata.See
below.• FortheDaytonacategory,wehavetwonewrequirements.(1)Thesortmustrun
continuously(repeatedly)foraminimum1hour.(Thisisaminimumreliabilityrequirement).(2)Thesystemcannotoverwritetheinputfile.
14.09.17
28
Orderstatistics
• Minimum– thesmallestvalue• Maximum– thelargestvalue• Ingenerali’thvalue.• Findthemedianofthevaluesinthearray• MedianinsortedarrayA:
– nisoddA[(n+1)/2]– niseven– A[(n+1)/2]orA[(n+1)/2]
Orderstatistics
• Input:AsetAofnnumbersandi,1≤i≤n• Output:xfromAthatislargerthanexactlyi-1elementsofA
Q:Findi’thvalueinunsorteddata
A. O(n)B. O(nloglogn)C. O(nlogn)D. O(nlog2 n)
Minimum
Minimum(A)1 min=A[1]2 fori=2tolength(A)3 ifmin>A[i]4 thenmin=A[i]5 returnmin
n-1comparisons.
Minandmaxtogether
• compareeverytwoelementsA[i],A[i+1]• Comparelargeragainstcurrentmax• Smalleragainstcurrentmin
• 3n/2
SelectioninexpectedO(n)
Randomised-select(A,p,r,i)if p=rthen return A[p]q=Randomised-Partition(A,p,r)k=q– p+1 //nrofelementsinsubarrif i<=kthen return Randomised-Partition(A,p,q,i)else return Randomised-Partition(A,q+1,r,i-k)
14.09.17
29
Conclusion
• SortingingeneralO(nlogn)• Quicksortisrathergood
• Lineartimesortingisachievablewhenonedoesnotassumeonlydirectcomparisons
• Findi’thvalueinunsorted– expectedO(n)
• Findi’thvalue:worstcaseO(n)– seeCLRS
Ok…
• lists– aversatiledatastructureforvariouspurposes
• Sorting– atypicalalgorithm(manyways)• Whichsortingmethodsforarray/list?
• Array:mostoftheimportant(e.g.update)tasksseemtobeO(n),whichisbad
Q:searchforavalueXinlinkedlist?
A. O(1)
B. O(logn)
C. O(n)
Canwesearchfasterinlinkedlists?
• Whysortlinkedlistsifsearch anywayO(n)?
• Linkedlists:– whatisthe“mid-point” ofanysublist?– Therefore,binarysearchcannotbeused…
• Orcanit?
SkipList Skiplists
• Buildseverallistsatdifferent“skip” steps
• O(n)list• Level1:~n/2• Level2:~n/4• …• Levellogn~2-3elements…
14.09.17
30
9/14/171:33PM SkipLists 175
SkipLists
+¥-¥
S0
S1
S2
S3
+¥-¥ 10 362315
+¥-¥ 15
+¥-¥ 2315
SkipList
typedefstructnodeStructure*node;typedefstructnodeStructure{keyTypekey;valueTypevalue;nodeforward[1];/*variablesizedarrayof forwardpointers*/
};
9/14/171:33PM SkipLists 177
WhatisaSkipList• Askiplist forasetS ofdistinct(key,element)itemsisaseriesoflists
S0, S1 , … , Sh suchthat– EachlistSi containsthespecialkeys+¥ and-¥– ListS0 containsthekeysofS innondecreasingorder– Eachlistisasubsequenceofthepreviousone,i.e.,
S0 ÊS1 Ê … Ê Sh
– ListSh containsonlythetwospecialkeys• WeshowhowtouseaskiplisttoimplementthedictionaryADT
56 64 78 +¥31 34 44-¥ 12 23 26
+¥-¥
+¥31-¥
64 +¥31 34-¥ 23
S0
S1
S2
S3
Illustrationoflists
-inf inf25 30 47 99173
KeyRight[ .. ] - right links in arrayLeft[ .. ] - left links in array, array size – how high is the list
9/14/171:33PM SkipLists 179
Search• Wesearchforakeyx inaaskiplistasfollows:
– Westartatthefirstpositionofthetoplist– Atthecurrentpositionp,wecomparex withy ¬ key(after(p))
x = y:wereturnelement(after(p))x > y:we“scanforward”x < y:we“dropdown”
– Ifwetrytodropdownpastthebottomlist,wereturnNO_SUCH_KEY• Example:searchfor78
+¥-¥
S0
S1
S2
S3
+¥31-¥
64 +¥31 34-¥ 23
56 64 78 +¥31 34 44-¥ 12 23 26
9/14/171:33PM SkipLists 180
RandomizedAlgorithms• Arandomizedalgorithm
performscointosses(i.e.,usesrandombits)tocontrolitsexecution
• Itcontainsstatementsofthetype
b ¬ random()if b = 0
do A …else { b = 1}
do B … • Itsrunningtimedependsonthe
outcomesofthecointosses
• Weanalyzetheexpectedrunningtimeofarandomizedalgorithmunderthefollowingassumptions– thecoinsareunbiased,and– thecointossesare
independent• Theworst-caserunningtimeof
arandomizedalgorithmisoftenlargebuthasverylowprobability(e.g.,itoccurswhenallthecointossesgive“heads”)
• Weusearandomizedalgorithmtoinsertitemsintoaskiplist
14.09.17
31
9/14/171:33PM SkipLists 181
• Toinsertanitem(x, o) intoaskiplist,weusearandomizedalgorithm:– Werepeatedlytossacoinuntilwegettails,andwedenotewithi the
numberoftimesthecoincameupheads– Ifi ³ h,weaddtotheskiplistnewlistsSh+1, … , Si +1,eachcontaining
onlythetwospecialkeys– Wesearchforx intheskiplistandfindthepositionsp0, p1 , …, pi ofthe
itemswithlargestkeylessthanx ineachlistS0, S1, … , Si
– Forj ¬ 0, …, i,weinsertitem(x, o) intolistSj afterpositionpj
• Example:insertkey15,withi = 2
Insertion
+¥-¥ 10 36
+¥-¥
23
23 +¥-¥
S0
S1
S2
+¥-¥
S0
S1
S2
S3
+¥-¥ 10 362315
+¥-¥ 15
+¥-¥ 2315p0
p1
p2
9/14/171:33PM SkipLists 182
Deletion
• Toremoveanitemwithkeyx fromaskiplist,weproceedasfollows:– Wesearchforx intheskiplistandfindthepositionsp0, p1 , …, pi ofthe
itemswithkeyx,wherepositionpj isinlistSj
– Weremovepositionsp0, p1 , …, pi fromthelistsS0, S1, … , Si
– Weremoveallbutonelistcontainingonlythetwospecialkeys• Example:removekey34
-¥ +¥4512
-¥ +¥
23
23-¥ +¥
S0
S1
S2
-¥ +¥
S0
S1
S2
S3
-¥ +¥4512 23 34
-¥ +¥34
-¥ +¥23 34p0
p1
p2
9/14/171:33PM SkipLists 183
Implementationv2
• Wecanimplementaskiplistwithquad-nodes
• Aquad-nodestores:– item– linktothenodebefore– linktothenodeafter– linktothenodebelow– linktothenodeafter
• Also,wedefinespecialkeysPLUS_INFandMINUS_INF,andwemodifythekeycomparatortohandlethem
x
quad-node
9/14/171:33PM SkipLists 184
SpaceUsage
• Thespaceusedbyaskiplistdependsontherandombitsusedbyeachinvocationoftheinsertionalgorithm
• Weusethefollowingtwobasicprobabilisticfacts:Fact1: Theprobabilityofgettingi
consecutiveheadswhenflippingacoinis1/2i
Fact2: Ifeachofn itemsispresentinasetwithprobabilityp,theexpectedsizeofthesetisnp
• Consideraskiplistwithn items– ByFact1,weinsertanitemin
listSi withprobability1/2i
– ByFact2,theexpectedsizeoflistSi isn/2i
• Theexpectednumberofnodesusedbytheskiplistis
nnn h
ii
h
ii 2
21
2 00<= åå
==
Thus, the expected space usage of a skip list with nitems is O(n)
9/14/171:33PM SkipLists 185
Height
• Therunningtimeofthesearchaninsertionalgorithmsisaffectedbytheheighth oftheskiplist
• Weshowthatwithhighprobability,askiplistwithnitemshasheightO(log n)
• Weusethefollowingadditionalprobabilisticfact:Fact3: Ifeachofn eventshas
probabilityp,theprobabilitythatatleastoneeventoccursisatmostnp
• Consideraskiplistwithn items– ByFact1,weinsertaniteminlist
Si withprobability1/2i
– ByFact3,theprobabilitythatlistSi hasatleastoneitemisatmostn/2i
• Bypickingi = 3log n,wehavethattheprobabilitythatS3log nhasatleastoneitemisatmost
n/23log n = n/n3 = 1/n2
• Thusaskiplistwithn itemshasheightatmost3log n withprobabilityatleast1 - 1/n2
9/14/171:33PM SkipLists 186
SearchandUpdateTimes• Thesearchtimeinaskiplistis
proportionalto– thenumberofdrop-downsteps,
plus– thenumberofscan-forward
steps• Thedrop-downstepsare
boundedbytheheightoftheskiplistandthusareO(log n) withhighprobability
• Toanalyzethescan-forwardsteps,weuseyetanotherprobabilisticfact:Fact4:Theexpectednumberof
cointossesrequiredinordertogettailsis2
• Whenwescanforwardinalist,thedestinationkeydoesnotbelongtoahigherlist– Ascan-forwardstepisassociated
withaformercointossthatgavetails
• ByFact4,ineachlisttheexpectednumberofscan-forwardstepsis2
• Thus,theexpectednumberofscan-forwardstepsisO(log n)
• WeconcludethatasearchinaskiplisttakesO(log n) expectedtime
• Theanalysisofinsertionanddeletiongivessimilarresults
14.09.17
32
9/14/171:33PM SkipLists 187
Summary• Askiplistisadatastructure
fordictionariesthatusesarandomizedinsertionalgorithm
• Inaskiplistwithn items– Theexpectedspaceusedis
O(n)– Theexpectedsearch,
insertionanddeletiontimeisO(log n)
• Usingamorecomplexprobabilisticanalysis,onecanshowthattheseperformanceboundsalsoholdwithhighprobability
• Skiplistsarefastandsimpletoimplementinpractice
Conclusions
• Abstractdatatypeshideimplementations• ImportantisthefunctionalityoftheADT• Datastructuresandalgorithms determinethespeedoftheoperationsondata
• Lineardatastructuresprovidegoodversatility• Sorting– amosttypicalneed/algorithm• SortinginO(nlogn)MergeSort,Quicksort• SolvingRecurrences– meanstoanalyse• Skiplists– logn randomised datastructure