180
Advanced Algorithmics (6EAP) MTAT.03.238 Linear structures, sor.ng, searching, etc Jaak Vilo 2013 Fall 1 Jaak Vilo

Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Advanced  Algorithmics  (6EAP)    

MTAT.03.238      Linear  structures,  sor.ng,  

searching,  etc  Jaak  Vilo  2013  Fall  

1  Jaak  Vilo  

Page 2: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Linear,  sequenGal,  ordered,  list  …  

Memory, disk, tape etc – is an ordered sequentially addressed media.

Page 3: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Physical  ordered  list  ~  array  

•  Memory        /address/  –  Garbage  collecGon    

•  Files  (character/byte  list/lines  in  text  file,…)  

•  Disk  – Disk  fragmentaGon  

Page 4: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Lists:  Array  

3   6   7   5   2  

0 1 size MAX_SIZE-1

L = int[MAX_SIZE] L[2]=7

Page 5: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Lists:  Array  

3   6   7   5   2  

0 1 size MAX_SIZE-1

L = int[MAX_SIZE] L[2]=7

3   6   7   5   2  

1 2 size MAX_SIZE

L[3]=7

Page 6: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

2D  array  

& A[i,j] = A + i*(nr_el_in_row*el_size) + j*el_size

Page 7: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

MulGple  lists,  2-­‐D-­‐arrays,  etc…    

Page 8: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Linear  Lists  

•  OperaGons  which  one  may  want  to  perform  on  a  linear  list  of  n  elements  include:  

– gain  access  to  the  kth  element  of  the  list  to  examine  and/or  change  the  contents  

– insert  a  new  element  before  or  aYer  the  kth  element  

– delete  the  kth  element  of  the  list  

Reference:  Knuth,  The  Art  of  Comptuer  Programming,  Vol  1,                                        Fundamental  Algorithms,  3rd  Ed.,  p.238-­‐9  

Page 9: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Abstract  Data  Type  (ADT)  •  High-­‐level  definiGon  of  data  types  

•  An  ADT  specifies  –  A  collec>on  of  data  –  A  set  of  opera>ons  on  the  data  or  subsets  of  the  data  

•  ADT  does  not  specify  how  the  operaGons  should  be  implemented  

•  Examples  –     vector,  list,  stack,  queue,  deque,  priority  queue,  table  (map),  

associaGve  array,  set,  graph,  digraph  

Page 10: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

ADT  •  A  datatype  is  a  set  of  values  and  an  associated  set  of  

opera.ons  •  A  datatype  is  abstract  iff  it  is  completely  described  by  its  set  

of  operaGons  regradless  of    its  implementaGon  •  This  means  that  it  is  possible  to  change  the  implementa.on  

of  the  datatype  without  changing  its  use  •  The  datatype  is  thus  described  by  a  set  of  procedures  •  These  operaGons  are  the  only  thing  that  a  user  of  the  

abstracGon  can  assume  

Page 11: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Abstract  data  types:  

•  DicGonary                  (key,value)  •  Stack  (LIFO)  •  Queue  (FIFO)  •  Queue  (double-­‐ended)  •  Priority  queue  (fetch  highest-­‐priority  object)  •  ...  

Page 12: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

DicGonary  

•  Container  of  key-­‐element  (k,e)  pairs  •  Required  operaGons:  

–  insert(  k,e  ),  –  remove(  k  ),  –  find(  k  ),  –  isEmpty()  

•  May  also  support  (when  an  order  is  provided):  –  closestKeyBefore(  k  ),  –  closestElemAYer(  k  )  

•  Note:  No  duplicate  keys  

Page 13: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Abstract  data  types  

•  Container  •  Stack  •  Queue  •  Deque  •  Priority  queue  

•  Dic.onary/Map/AssociaGve  array  

•  MulGmap  •  Set  •  MulGset  •  String  •  Tree  •  Graph  •  Hash    

Page 14: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Some  data  structures  for  Dic.onary  ADT  •  Unordered    

–  Array  –  Sequence/list  

•  Ordered    –  Array  –  Sequence  (Skip  Lists)  –  Binary  Search  Tree  (BST)  –  AVL  trees,  red-­‐black  trees  –  (2;  4)  Trees  –  B-­‐Trees  

•  Valued  –  Hash  Tables  –  Extendible  Hashing  

Page 15: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

PrimiGve  &  composite  types  Primi.ve  types  •  Boolean  (for  boolean  values  

True/False)  •  Char  (for  character  values)  •  int  (for  integral  or  fixed-­‐precision  

values)  •  Float  (for  storing  real  number  

values)  •  Double  (a  larger  size  of  type  float)  •  String  (for  string  of  chars)  •  Enumerated  type  

Composite  types  •  Array  •  Record  (also  called  tuple  or  struct)  

•  Union  •  Tagged  union  (also  called  a  variant,  variant  record,  discriminated  union,  or  disjoint  union)  

•  Plain  old  data  structure    

Page 16: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Linear  data  structures  

Arrays  •  Array  •  BidirecGonal  

map  •  Bit  array  •  Bit  field  •  Bitboard  •  Bitmap  •  Circular  buffer  •  Control  table  •  Image  •  Dynamic  array  

•  Gap  buffer  •  Hashed  array  

tree  •  Heightmap  •  Lookup  table  •  Matrix  •  Parallel  array  •  Sorted  array  •  Sparse  array  •  Sparse  matrix  •  Iliffe  vector  •  Variable-­‐length  

array    

Lists  •  Doubly  linked  list  •  Linked  list  •  Self-­‐organizing  list  •  Skip  list  •  Unrolled  linked  list  •  VList  •  Xor  linked  list  •  Zipper  •  Doubly  connected  edge  list  

Page 17: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Trees  Binary  trees  •  AA  tree  •  AVL  tree  •  Binary  search  

tree  •  Binary  tree  •  Cartesian  tree  •  Pagoda  •  Randomized  

binary  search  tree  

•  Red-­‐black  tree  •  Rope  •  Scapegoat  tree  •  Self-­‐balancing  

binary  search  tree  

•  Splay  tree  •  T-­‐tree  •  Tango  tree  •  Threaded  binary  

tree  •  Top  tree  

•  Treap  •  Weight-­‐balanced  

tree      B-­‐trees  •  B-­‐tree  •  B+  tree  •  B*-­‐tree  •  B  sharp  tree  •  Dancing  tree  •  2-­‐3  tree  •  2-­‐3-­‐4  tree  •  Queap  •  Fusion  tree  •  Bx-­‐tree  

       Heaps  •  Heap  •  Binary  heap  •  Binomial  heap  •  Fibonacci  heap  •  AF-­‐heap  

•  2-­‐3  heap  •  SoY  heap  •  Pairing  heap  •  LeYist  heap  •  Treap  •  Beap  •  Skew  heap  •  Ternary  heap  •  D-­‐ary  heap  •     •  Tries  •  Trie  •  Radix  tree  •  Suffix  tree  •  Suffix  array  •  Compressed  

suffix  array  •  FM-­‐index  •  Generalised  

suffix  tree  •  B-­‐trie  •  Judy  array  

•  X-­‐fast  trie  •  Y-­‐fast  trie  •  Ctrie    Mul.way  trees  •  Ternary  search  

tree  •  And–or  tree  •  (a,b)-­‐tree  •  Link/cut  tree  •  SPQR-­‐tree  •  Spaghet  stack  •  Disjoint-­‐set  data  

structure  •  Fusion  tree  •  Enfilade  •  ExponenGal  tree  •  Fenwick  tree  •  Van  Emde  Boas  

tree      

Space-­‐par..oning  trees  •  Segment  tree  •  Interval  tree  •  Range  tree  •  Bin  •  Kd-­‐tree  •  Implicit  kd-­‐tree  •  Min/max  kd-­‐tree  •  AdapGve  k-­‐d  tree  •  Kdb  tree  •  Quadtree  •  Octree  •  Linear  octree  •  Z-­‐order  •  UB-­‐tree  •  R-­‐tree  •  R+  tree  •  R*  tree  •  Hilbert  R-­‐tree  •  X-­‐tree  •  Metric  tree  

•  Cover  tree  •  M-­‐tree  •  VP-­‐tree  •  BK-­‐tree  •  Bounding  

interval  hierarchy  •  BSP  tree  •  Rapidly-­‐exploring  

random  tree      Applica.on-­‐specific  trees  •  Syntax  tree  •  Abstract  syntax  

tree  •  Parse  tree  •  Decision  tree  •  AlternaGng  

decision  tree  •  Minimax  tree  •  ExpecGminimax  

tree  •  Finger  tree  

Page 18: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Hashes,  Graphs,  Other  •  Hashes  •  Bloom  filter  •  Distributed  hash  table  •  Hash  array  mapped  

trie  •  Hash  list  •  Hash  table  •  Hash  tree  •  Hash  trie  •  Koorde  •  Prefix  hash  tree  •     

Graphs  •  Adjacency  list  •  Adjacency  matrix  •  Graph-­‐structured  

stack  •  Scene  graph  •  Binary  decision  

diagram  •  Zero  suppressed  

decision  diagram  •  And-­‐inverter  graph  •  Directed  graph  

•  Directed  acyclic  graph  •  ProposiGonal  directed  

acyclic  graph  •  MulGgraph  •  Hypergraph      Other  •  Lightmap  •  Winged  edge  •  Quad-­‐edge  •  RouGng  table  •  Symbol  table  

Page 19: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Lists:  Array  

3   6   7   5   2  

0 1 size MAX_SIZE-1

3   6   7   8   2  

0 1 size

5   2   Insert 8 after L[2]

3   6   7   8   2  

0 1 size

5   2   Delete last

*array (memory address) size MAX_SIZE

Page 20: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Lists:  Array  

3   6   7   8   2  

0 1 size

5   2   Insert 8 after L[2]

3   6   7   8   2  

0 1 size

5   2   Delete last

•  Access i O(1) •  Insert to end O(1) •  Delete from end O(1) •  Insert O(n) •  Delete O(n) •  Search O(n)

Page 21: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Linear  Lists  

•  Other  operaGons  on  a  linear  list  may  include:  – determine  the  number  of  elements  – search  the  list  – sort  a  list  – combine  two  or  more  linear  lists  – split  a  linear  list  into  two  or  more  lists  – make  a  copy  of  a  list  

Page 22: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Stack  

•  push(x)    -­‐-­‐  add  to  end  (add  to  top)  •  pop()    -­‐-­‐  fetch  from  end  (top)  

•  O(1)  in  all  reasonable  cases  J  

•  LIFO  –  Last  In,  First  Out  

Page 23: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Linked  lists  head tail

head tail

Singly linked

Doubly linked

Page 24: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Linked  lists:  add  head tail

head tail

size

Page 25: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Linked  lists:  delete    (+  garbage  collecGon?)  

head tail

head tail

size

Page 26: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

OperaGons  

•  Array  indexed  from  0  to  n – 1:  

•  Singly-­‐linked  list  with  head  and  tail  pointers  

1  under  the  assumpGon  we  have  a  pointer  to  the  kth  node,  O(n)  otherwise  

k = 1 1 < k < n k = n access/change the kth element O(1) O(1) O(1)

insert before or after the kth element O(n) O(n) O(1)

delete the kth element O(n) O(n) O(1)

k = 1 1 < k < n k = n access/change the kth element O(1) O(n) O(1)

insert before or after the kth element O(1) O(n) O(1)1 O(n) O(1)

delete the kth element O(1) O(n) O(n)

Page 27: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Improving  Run-­‐Time  Efficiency  

•  We  can  improve  the  run-­‐Gme  efficiency  of  a  linked  list  by  using  a  doubly-­‐linked  list:  

Singly-­‐linked  list:      Doubly-­‐linked  list:        

–  Improvements  at  operaGons  requiring  access  to  the  previous  node  

–  Increases  memory  requirements...  

Page 28: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Improving  Efficiency    Singly-­‐linked  list:          Doubly-­‐linked  list:    

1  under  the  assumpGon  we  have  a  pointer  to  the  kth  node,  O(n)  otherwise  

k = 1 1 < k < n k = n access/change the kth element O(1) O(n) O(1)

insert before or after the kth element O(1) O(1)1 O(1)

delete the kth element O(1) O(1)1 O(1)

k = 1 1 < k < n k = n access/change the kth element O(1) O(n) O(1)

insert before or after the kth element O(1) O(n) O(1)1 O(n) O(1)

delete the kth element O(1) O(n) O(n)

Page 29: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

•  Array  indexed  from  0  to  n – 1:  

•  Singly-­‐linked  list  with  head  and  tail  pointers  

•  Doubly  linked  list  

k = 1 1 < k < n k = n access/change the kth element O(1) O(1) O(1) insert before or after the kth element O(n) O(n) O(1)

delete the kth element O(n) O(n) O(1)

k = 1 1 < k < n k = n access/change the kth element O(1) O(n) O(1)

insert before or after the kth element O(1) O(n) O(1)1 O(n) O(1)

delete the kth element O(1) O(n) O(n)

k = 1 1 < k < n k = n access/change the kth element O(1) O(n) O(1)

insert before or after the kth element O(1) O(1)1 O(1)

delete the kth element O(1) O(1)1 O(1)

Page 30: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

IntroducGon  to  linked  lists  •  Consider  the  following  struct  definiGon  

struct node { string word; int num; node *next; //pointer for the next node }; node *p = new node;

? ?

num word next

p ?

Page 31: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

IntroducGon  to  linked  lists:  inserGng  a  node  

•  node *p; •  p = new node;

•  p->num = 5; •  p->word = "Ali"; •  p->next = NULL

• 

5 Ali

num word next

p

Page 32: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

IntroducGon  to  linked  lists:  adding  a  new  node  

•  How  can  you  add  another  node  that  is  pointed  by  p->link?  

•  node *p; •  p = new node; •  p->num = 5; •  p->word = "Ali"; •  p->next = NULL; •  node *q;

• 

5 Ali

num word link

?p

q

Page 33: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

IntroducGon  to  linked  lists   node *p; p = new node; p->num = 5; p->word = "Ali"; p->next = NULL;

node *q; q = new node;

5 Ali

num word link

? ? ?

num word link

?

q

p

Page 34: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

IntroducGon  to  linked  lists   node *p, *q; p = new node; p->num = 5; p->word = "Ali"; p->next = NULL;

q = new node; q->num=8; q->word = "Veli";

5 Ali

num word next

? 8 Veli

num word next

? p

q

Page 35: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

IntroducGon  to  linked  lists   node *p, *q; p = new node; p->num = 5; p->word = "Ali"; p->next = NULL;

q = new node; q->num=8; q->word = "Veli"; p->next = q; q->next = NULL;

5 Ali

num word link

? 8 Veli

num word link

p

q

Page 36: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Pointers  in  C/C++  

p  =  new  node  ;    delete  p  ;    p  =  new  node[20]  ;      p  =  malloc(  sizeof(  node  )  )  ;      free  p  ;      p  =  malloc(  sizeof(  node  )  *  20  )  ;    (p+10)-­‐>next  =  NULL  ;    /*  11th  elements  */    

Page 37: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Book-­‐keeping  

•  malloc,  new  –  “remember”  what  has  been  created    free(p),  delete    (C/C++)  

•  When  you  need  many  small  areas  to  be  allocated,  reserve  a  big  chunk  (array)  and  maintain  your  own  set  of  free  objects  

•  Elements  of  array  of  objects  can  be  pointed  by  the  pointer  to  an  object.  

Page 38: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Object  

•  Object  =  new  object_type  ;    

•  Equals  to  creaGng  a  new  object  with  necessary  size  of  allocated  memory  (delete  can  free  it)  

Page 39: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Some  links  

•  hRp://en.wikipedia.org/wiki/Pointer  

•  Pointer  basics:  h{p://cslibrary.stanford.edu/106/  

 •  C++  Memory  Management  :  What  is  the  difference  between  malloc/free  and  new/delete?    – h{p://www.codeguru.com/forum/showthread.php?t=401848  

Page 40: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

AlternaGve:  arrays  and  integers  

•  If  you  want  to  test  pointers  and  linked  list  etc.  data  structures,  but  do  not  have  pointers  familiar  (yet)  

•  Use  arrays  and  indexes  to  array  elements    instead…  

Page 41: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Replacing  pointers  with  array  index  

/  

7  

5  

5  

8  

/  

1  

4  

3  

1 2 3 4 5 6 7 head=3

next

key prev

8   4   7  

head

Page 42: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Maintaining  list  of  free  objects  

/  

7  

5  

5  

8  

/  

1  

4  

3  

1 2 3 4 5 6 7 head=3

next

key prev

8   4   7  

head

/  

7  

5  

4   5  

8  

/  

/   1  

4  

3  

7   2  1 2 3 4 5 6 7

head=3

next

key prev

free=6 free = -1 => array is full allocate object:

new = free; free = next[free] ;

free object x next[x]= free free = x

Page 43: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

MulGple  lists,  single  free  list  

/  

7  

5  

4   5  

8  

/  

/   1  

4  

3  

7  

3  

/  

/  

9  

6  

1 2 3 4 5 6 7 next

key prev

head1=3 => 8, 4, 7 head2=6 => 3, 9 free =2 (2)

Page 44: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Hack:  allocate  more  arrays  …  

1 2 3 4 5 6 7

8 9 10 11 12 13 14

15 16 17 18 19 20 21

AA

AA[ (i-1)/7 ] -> [ (i -1) % 7 ] LIST(10) = AA[ 1 ][ 2 ] LIST(19) = AA[ 2 ][ 5 ]

use integer division and mod

Page 45: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 46: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Queue  (FIFO)  

3   6   7   5   2  

F L

Page 47: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Queue  (basic  idea,  does  not  contain  all  controls!)  

3   6   7   5   2  

F L MAX_SIZE-1

7   5   2  

F L MAX_SIZE-1

First = List[F]

Last = List[L-1]

Pop_first : { return List[F++] }

Pop_last : { return List[--L] }

Full: return ( L==MAX_SIZE ) Empty: F< 0 or F >= L

Page 48: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Circular  buffer  

•  A  circular  buffer  or  ring  buffer  is  a  data  structure  that  uses  a  single,  fixed-­‐size  buffer  as  if  it  were  connected  end-­‐to-­‐end.  This  structure  lends  itself  easily  to  buffering  data  streams.  

Page 49: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Circular  Queue  

3   6  

F L

First = List[F]

Last = List[ (L-1+MAX_SIZE) % MAX_SIZE ]

Full: return ( (L+1)%MAX_SIZE == F ) Empty: F==L

7   5   2  

MAX_SIZE-1

Add_to_end( x ) : { List[L]=x ; L= (L+1) % MAX_SIZE ] } // % = modulo

3   6  

F L,

7   5   2  

MAX_SIZE-1

Page 50: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Queue  

•  enqueue(x)  -­‐  add  to  end  •  dequeue()  -­‐  fetch  from  beginning    •  FIFO  –  First  In  First  Out  

•  O(1)  in  all  reasonable  cases  J  

Page 51: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Stack  

•  push(x)    -­‐-­‐  add  to  end  (add  to  top)  •  pop()    -­‐-­‐  fetch  from  end  (top)  

•  O(1)  in  all  reasonable  cases  J  

•  LIFO  –  Last  In,  First  Out  

Page 52: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Stack  based  languages  

•  Implement  a  posXix  calculator  – Reverse  Polish  nota.on  

•  5  4  3  *  2  -­‐  +    =>    5+((4*3)-­‐2)  

•  Very  simple  to  parse  and  interpret  

•  FORTH,  Postscript  are  stack-­‐based  languages  

Page 53: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Array  based  stack  

•  How  to  know  how  big  a  stack  shall  ever  be?    

•  When  full,  allocate  bigger  table  dynamically,  and  copy  all  previous  values  there  

•  O(n)  ?    

3   6   7   5  

3   6   7   5   2  

Page 54: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

•  When  full,  create  2x  bigger  table,  copy  previous  n  elements:  

•  AYer  every  2k  inserGons,  perform  O(n)  copy  

•  O(n)  individual  inserGons  +    •  n/2  +  n/4  +  n/8  …    copy-­‐ing    •  Total:  O(n)  effort!  

Page 55: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

What  about  dele.ons?    

•  when  n=32  -­‐>  33      (copy  32,  insert  1)  •  delete:  33-­‐>32  

– should  you  delete  immediately?  – Delete  only  when  becomes  less  than  1/4th  full  

– Have  to  delete  at  least  n/2  to  decrease  – Have  to  add  at  least  n  to  increase  size  – Most  operaGons,  O(1)  effort      – But  few  operaGons  take  O(n)  to  copy  – For  any  m  operaGons,  O(m)  Gme  

Page 56: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

AmorGzed  analysis  

•  Analyze  the  Gme  complexity  over  the  enGre  “lifespan”  of  the  algorithm  

•  Some  opera.ons  that  cost  more  will  be  “covered”  by  many  other  opera.ons  taking  less  

Page 57: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Lists  and  dic.onary  ADT…  

•  How  to  maintain  a  dicGonary  using  (linked)  lists?  

•  Is  k  in  D  ?    – go  through  all  elements  d  of  D,  test  if  d==k        O(n)  –  If  sorted:  d=  first(D);  while(  d<=k  )  d=next(D);    – on  average    n/2    tests  …  

•  Add(k,D)  =>  insert(k,D)  =  O(1)      or  O(n)  –  test  for  uniqueness  

Page 58: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Array  based  sorted  list  

•  is  d  in  D  ?    •  Binary  search  in  D  

low high mid

Page 59: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Binary  search  –  recursive  BinarySearch(A[0..N-­‐1],  value,  low,  high)    {      if  (high  <  low)        return  -­‐1  //  not  found      mid  =  low  +  ((high  -­‐  low)  /  2)        //  Note:  not  (low  +  high)  /  2  !!      if  (A[mid]  >  value)        return  BinarySearch(A,  value,  low,  mid-­‐1)      else  if  (A[mid]  <  value)        return  BinarySearch(A,  value,  mid+1,  high)      else    

 return  mid  //  found    }  

Page 60: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Binary  search  –  itera.ve  BinarySearch(A[0..N-­‐1],  value)    {    

 low  =  0  ;    high  =  N  -­‐  1  ;  

 while  (low  <=  high)  {        mid  =  low  +  ((high  -­‐  low)  /  2)  //  Note:  not  (low  +  high)  /  2  !!        if  (A[mid]  >  value)          high  =  mid  -­‐  1        else  if  (A[mid]  <  value)          low  =  mid  +  1        else    

   return  mid  //  found    }    return  -­‐1  //  not  found    

}  

Page 61: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Work  performed  

•  x  <=>  A[18]  ?        <  •  x  <=>  A[9]  ?    >  •  x  <=>  A[13]  ?        ==  

 

• O(lg  n)  

1 36 18

Page 62: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

SorGng  

•  given  a  list,  arrange  values  so  that        L[1]  <=  L[2]  <=  …  <=  L[n]  

•  n  elements  =>    n!    possible  orderings  •  One  test  L[i]  <=  L[j]  can  divide  n!  to  2    

– Make  a  binary  tree  and  calculate  the  depth    

•  log(  n!  )  =  Ω  (  n  log  n  )    •  Hence,  lower  bound  for  sorGng  is  Ω(n  log  n)  

– using  comparisons…      

Page 63: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 64: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 65: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Decision  tree  model  

   •  n!  orderings  (leaves)  •  Height  of  such  tree?    

Page 66: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Proof:  log(n!)  =  Ω  (  n  log  n  )    

•  log(  n!  )  =    log  n  +  log  (n-­‐1)  +  log(n-­‐2)  +  …  log(1)      

 >=    n/2  *  log(  n/2  )    

 =  Ω  (  n  log  n  )        

Half of elements are larger than log(n/2)

Page 67: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 68: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

The  divide-­‐and-­‐conquer  design  paradigm  

1.  Divide  the  problem  (instance)  into  subproblems.  

2.  Conquer  the  subproblems  by  solving  them  recursively.  

3.  Combine  subproblem  solu>ons.  

Page 69: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Merge  sort  

Merge-­‐Sort(A,p,r)    if  p<r  

 then  q  =  (p+r)/2        //  floor      Merge-­‐Sort(  A,  p,  q  )        Merge-­‐Sort(  A,  q+1,r)      Merge(  A,  p,  q,  r  )  

 

It was invented by John von Neumann in 1945.

Page 70: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Example  

•  Applying  the  merge  sort  algorithm:  

Page 71: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Merge  of  two  lists:  Θ(n)  

A,  B  –  lists  to  be  merged  L  =  new  list;      //  empty  while(  A  not  empty      and    B  not  empty  )    

if  A.first()  <=  B.first()  then    append(  L,  A.first()  )  ;  A  =  rest(A)  ;    else      append(  L,  B.first()  )  ;  B  =  rest(B)  ;  

append(  L,  A);  //  all  remaining  elements  of  A  append(  L,  B  );    //  all  remaining  elements  of  B  return  L  

Page 72: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Wikipedia  /  viz.  Va

lue

Pos in array

Page 73: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Run-­‐Gme  Analysis  of  Merge  Sort  

•  Thus,  the  Gme  required  to  sort  an  array  of  size  n > 1  is:  –  the  Gme  required  to  sort  the  first  half,  –  the  Gme  required  to  sort  the  second  half,  and  –  the  Gme  required  to  merge  the  two  lists  

•  That  is:  

( )⎩⎨⎧

>+

==

1)(T21)1(

)T(2 nn

nn n Θ

Θ

Page 74: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 75: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Merge  sort  

•  Worst  case,  average  case,  best  case  …          Θ(  n  log  n  )  

•  Common  wisdom:  – Requires  addiGonal  space  for  merging  (in  case  of  arrays)  

•  Homework*:  develop  in-­‐place  merge  of  two  lists  implemented  in  arrays  /compare  speed/  

a b

a b L[a] <= L[b] a b

L[a] > L[b]

Page 76: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Quicksort  

•  Proposed  by  C.A.R.  Hoare  in  1962.  •  Divide-­‐and-­‐conquer  algorithm.  •  Sorts  “in  place”(like  inserGon  sort,  but  not  like  merge  sort).  

•  Very  pracGcal  (with  tuning).  

Page 77: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 78: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Quicksort  

Page 79: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 80: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

ParGGoning  version  2  

pivot  =  A[R];          //    i=L;  j=R-­‐1;  while(  i<=j  )    while  (  A[i]  <  pivot  )  i++  ;    //  will  stop  at  pivot  latest    while  (  i<=j    and  A[j]  >=      pivot  )  j-­‐-­‐  ;      if  (  i  <  j  )    {    swap(  A[i],  A[j]  );    i++;  j-­‐-­‐  }    

A[R]=A[i];    A[i]=pivot;  return  i;      

 

L R

L R <= pivot

> pivot

i j

L R i j

L R i j

pivot

Page 81: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Wikipedia  /  “video”  

Page 82: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 83: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 84: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 85: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 86: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 87: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Choice  of  pivot  in  Quicksort  

•  Select  median  of  three  …  

•  Select  random  –  opponent  can  not  choose  the  winning  strategy  against  you!  

Page 88: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-046JFall-2005/VideoLectures/detail/embed04.htm

Page 89: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Random  pivot  

 Select  pivot              randomly  from  the  region  (blue)    and  swap  with  last  posiGon  

Select  pivot  as  a  median  of  3  [or  more]  random  values  from  region  

 Apply  non-­‐recursive  sort  for  array  less  than  10-­‐20      

L R

L R

<= pivot

> pivot i j

L R i j

Page 90: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 91: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 92: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 93: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 94: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 95: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 96: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 97: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 98: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 99: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 100: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 101: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

AlternaGve  materials  

•  Quicksort  average  case  analysis  h{p://eid.ee/10z    

•  h{ps://secweb.cs.odu.edu/~zeil/cs361/web/website/Lectures/quick/pages/ar01s05.html      

•  h{p://eid.ee/10y  -­‐  MIT  Open  Courseware  -­‐  AsymptoGc  notaGon,  Recurrences,  Subs.tuton  Master  Method  

•     

Page 102: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 103: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 104: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 105: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 106: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 107: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 108: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 109: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 110: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 111: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 112: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 113: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 114: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 115: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Back  to  sorGng  

Page 116: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

We  can  sort  in  O(n  log  n)  

•  Is  that  the  best  we  can  do  ?  

•  Remember:  using  comparisons  <,  >  ,  <=  ,  >=  we  can  not  do  beRer  than  O(n  log  n)    

Page 117: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

How  fast  can  we  sort  n  integers?  

•  E.g.  sort  people  by  year  of  birth?    

•  Sort  people  by  their  sex?  

Page 118: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 119: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 120: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 121: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 122: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 123: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 124: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 125: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 126: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 127: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 128: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Radix  sort  

 Radix-­‐Sort(A,d)  1.   for  i  =  1  to  d    /*  least  significant  to  most  significant  */  2.               use  a  stable  sort  to  sort  A  on  digit  i  

Page 129: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 130: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 131: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 132: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 133: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 134: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 135: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 136: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Radix  sort  using  lists  (stable)  bba  bbb   adb   aad   aac   ccc   ccb   aba   cca  

Page 137: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Radix  sort  using  lists  (stable)  

a  

b  

c  

d  

bba   aba   cca  

bbb   adb   ccb  

aac   ccc  

aad  

1.

bba  bbb   adb   aad   aac   ccc   ccb   aba   cca  

Page 138: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Radix  sort  using  lists  (stable)  

a  

b  

c  

d  

bba   aba   cca  

bbb   adb   ccb  

aac   ccc  

aad  

a  

b  

c  

d  

cca  

bbb  

adb  

ccb  

aac  

ccc  

aad  

bba   aba  

a  

b  

c  

d  

cca  

bbb  

adb  

ccb  

aac  

ccc  

aad  

bba  

aba  

1. 2.

3.

bba  bbb   adb   aad   aac   ccc   ccb   aba   cca  

Page 139: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Why  not  from  leY  to  right  ?    

•  Swap  ‘0’  with  first  ‘1’  •  Idea  1:  recursively  sort  first  and  second  half  

– Exercise  ?      

0101100 1001010 1111000 1001001 0010010 1001001 0101000 0010000

0101100 0010010 1111000 1001001 1001010 1001001 0101000 0010000

0101100 0010010 0101000 1001001 1001010 1001001 1111000 0010000

0101100 0010010 0101000 0010000 1001010 1001001 1111000 1001001

Page 140: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Bitwise  sort  leY  to  right  

•  Idea2:  – swap  elements  only  if  the  prefixes  match…  

– For  all  bits  from  most  significant  •  advance  when  0  •  when  1  -­‐>  look  for  next  0  

–  if  prefix  matches,  swap  –  otherwise  keep  advancing  on  0’s  and  look  for  next  1  

Page 141: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Bitwise  leY  to  right  sort  /*  Historical  sor.ng  –  was  used  in  Univ.  of  Tartu  using  assembler….  */  /*  C  implementa.on  –  Jaak  Vilo,  1989  */    void  bitwisesort(  SORTTYPE  *ARRAY  ,  int  size  )      {                  int  i,  j,  tmp,  nrbits  ;                    register  SORTTYPE  mask  ,  curbit  ,  group  ;                    nrbits  =  sizeof(  SORTTYPE  )  *  8  ;                    curbit  =  1  <<  (nrbits-­‐1)  ;            /*  set  most  significant  bit  1  */                  mask      =  0;                                                          /*  mask  of  the  already  sorted  area    */          Jaak Vilo, Univ. of Tartu

Page 142: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

               do  {                                                        /*  For  each  bit  */                                    i=0;  new_mask:                                    for(  ;  (  i  <  size  )  &&  (  !  (ARRAY[i]  &  curbit)  )  ;  i++  )    ;                /*  Advance  while  bit  ==  0  */                                  if(  i  >=  size  )  goto  array_end  ;                                    group  =  ARRAY[i]  &    mask  ;                                /*  Save  current  prefix  snapshot  */                                    j=i;                                                                              /*  memorize    locaGon  of  1  */                                  for(  ;  ;  )  {                                                  if  (  ++i  >=  size  )  goto  array_end  ;                                        /*    reached  end  of  array  */                                                  if  (  (ARRAY[i]  &  mask)  !=  group)  goto    new_mask  ;            /*  new  prefix  */                                                                                                                    if  (  !  (ARRAY[i]  &  curbit)  )    {        /*    bit  is  0  –  need  to  swap  with  previous  locaGon  of  1,  A[i]  ó  A[j]      */                                                        tmp    =  ARRAY[i];      ARRAY[i]    =  ARRAY[j];        ARRAY[j]  =  tmp  ;      j  +=  1  ;  /*  swap  and  increase  j  to  the  

next  possible  1  */                                                }                                  }  array_end:                                  mask  =  mask  |  curbit  ;                        /*  area  under  mask  is  now  sorted  */                                  curbit  >>=  1    ;                                                      /*  next  bit                                              */                  }    while(  curbit  );                                                            /*  unGl  all  bits  have  been  sorted…                */  }  

Jaak Vilo, Univ. of Tartu

Page 143: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Bitwise  from  leY  to  right  

•  Swap  ‘0’  with  first  ‘1’  

0010000 0010010 0101000 0101100 1001010 1001001 1001001 1111000

Jaak Vilo, Univ. of Tartu

Page 144: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Bucket  sort  

•  Assume  uniform  distribuGon  

•  Allocate  O(n)  buckets  

•  Assign  each  value  to  pre-­‐assigned  bucket  

Page 145: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

.78  

.17  

.39  

.26  

.72  

.94  

.21  

.12  

.23  

.68  

/  

/  

/  

1

0

Sort  small  buckets  with  inserGon  sort  

3

2

5

4

7

6

9

8

.12   .17  

.21   .23   .26  

.39  

.68  

.72   .78  

.94  

Page 146: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

h{p://sortbenchmark.org/    

•  The  sort  input  records  must  be  100  bytes  in  length,  with  the  first  10  bytes  being  a  random  key    

•  Minutesort  –  max  amount  sorted  in  1  minute  –  116GB  in  58.7  sec  (Jim  Wyllie,  IBM  Research)  –  40-­‐node  80-­‐Itanium  cluster,  SAN    array  of  2,520  disks  

•  2009,  500  GB  Hadoop    1406  nodes  x  (2  Quadcore  Xeons,  8  GB  memory,  4  SATA)    Owen  O'Malley  and  Arun  Murthy    Yahoo  Inc.    

•  Performance  /  Price  Sort  and  PennySort  

Page 147: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 148: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 149: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 150: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Sort  Benchmark  •  h{p://sortbenchmark.org/    •  Sort  Benchmark  Home  Page  •  We  have  a  new  benchmark  called  new      GraySort,  new      in  memory  of  the  father  of  the  sort  

benchmarks,  Jim  Gray.  It  replaces  TeraByteSort  which  is  now  reGred.    •  Unlike  2010,  we  will  not  be  accepGng  early  entries  for  the  2011  year.  The  deadline  for  

submitng  entries  is  April  1,  2011.  –  All  hardware  used  must  be  off-­‐the-­‐shelf  and  unmodified.  –  For  Daytona  cluster  sorts  where  input  sampling  is  used  to  determine  the  output  parGGon  boundaries,  the  input  

sampling  must  be  done  evenly  across  all  input  parGGons.  

New  rules  for  GraySort:  •  The  input  file  size  is  now  minimum  ~100TB  or  1T  records.  Entries  with  larger  input  sizes  also  

qualify.    •  The  winner  will  have  the  fastest  SortedRecs/Min.      •  We  now  provide  a  new  input  generator  that  works  in  parallel  and  generates  binary  data.  See  

below.      •  For  the  Daytona  category,  we  have  two  new  requirements.  (1)  The  sort  must  run  

conGnuously  (repeatedly)  for  a  minimum  1  hour.  (This  is  a  minimum  reliability  requirement).  (2)  The  system  cannot  overwrite  the  input  file.      

Page 151: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Order  sta.s.cs  

•  Minimum  –  the  smallest  value  •  Maximum  –  the  largest  value  •  In  general  i’th  value.  •  Find  the  median  of  the  values  in  the  array  •  Median  in  sorted  array  A  :  

– n  is  odd    A[(n+1)/2]  – n  is  even  –  A[    (n+1)/2    ]  or  A[  (n+1)/2  ]  

Page 152: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Order  staGsGcs  

•  Input:  A  set  A  of  n  numbers  and  i,      1≤i≤n  •  Output:  x  from  A  that  is  larger  than  exactly  i-­‐1  elements  of  A  

Page 153: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Q:  Find  i’th  value  in  unsorted  data    

A.  O(  n  )  B.  O(  n  log  log  n  )    C.  O(n  log  n)    D.  O(  n  log2  n  )  

Page 154: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Minimum  

Minimum(A)  1  min  =  A[1]  2  for  i  =  2  to  length(A)  3    if  min  >  A[i]    4    then  min  =  A[i]  5  return  min    n-­‐1  comparisons.      

Page 155: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Min  and  max  together  

•  compare  every  two  elements  A[i],A[i+1]  •  Compare  larger  against  current  max  •  Smaller  against  current  min  

•  3    n  /  2  

Page 156: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

SelecGon  in  expected  O(n)  

Randomised-­‐select(  A,  p,  r,  i  )  if  p=r  then  return  A[p]  q  =  Randomised-­‐ParGGon(A,p,r)  k=  q  –  p  +  1    //  nr  of  elements  in  subarr  if  i<=  k    then  return  Randomised-­‐ParGGon(A,p,q,i)    else    return  Randomised-­‐ParGGon(A,q+1,r,i-­‐k)  

 

Page 157: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Conclusion  

•  SorGng  in  general  O(  n  log  n  )  •  Quicksort  is  rather  good  

•  Linear  Gme  sorGng  is  achievable  when  one  does  not  assume  only  direct  comparisons  

•  Find  i’th  value  –  expected  O(n)  

•  Find  i’th  value:  worst  case  O(n)  –  see  CLRS  

Page 158: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Lists:  Array  

3   6   7   5   2  

0 1 size MAX_SIZE-1

3   6   7   8   2  

0 1 size

5   2   Insert 8 after L[2]

3   6   7   8   2  

0 1 size

5   2   Delete last

Page 159: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Linked  lists  head tail

head tail

Singly linked

Doubly linked

Page 160: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Ok…  

•  lists  –  a  versaGle  data  structure  for  various  purposes  

•  SorGng  –  a  typical  algorithm  (many  ways)  •  Which  sorGng  methods  for  array/list?  

•  Array:  most  of  the  important  (e.g.  update)  tasks  seem  to  be  O(n),  which  is  bad  

Page 161: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

•  Array  indexed  from  0  to  n – 1:  

•  Singly-­‐linked  list  with  head  and  tail  pointers  

•  Doubly  linked  list  

k = 1 1 < k < n k = n access/change the kth element O(1) O(1) O(1) insert before or after the kth element O(n) O(n) O(1)

delete the kth element O(n) O(n) O(1)

k = 1 1 < k < n k = n access/change the kth element O(1) O(n) O(1)

insert before or after the kth element O(1) O(n) O(1)1 O(n) O(1)

delete the kth element O(1) O(n) O(n)

k = 1 1 < k < n k = n access/change the kth element O(1) O(n) O(1)

insert before or after the kth element O(1) O(1)1 O(1)

delete the kth element O(1) O(1)1 O(1)

Page 162: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Q:  search  for  a  value  X  in  linked  list?  

A.  O(1)  

B.  O(  log  n  )    

C.  O(  n  )    

Page 163: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Can  we  search  faster  in  linked  lists?  

 •  Why  sort  linked  lists  if  search  anyway  O(n)?  

•  Linked  lists:  – what  is  the  “mid-­‐point”  of  any  sublist  ?  – Therefore,  binary  search  can  not  be  used…    

•  Or  can  it  ?    

Page 164: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Skip  List  

Page 165: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Skip  lists  

•  Build  several  lists  at  different  “skip”  steps  

•  O(n)  list  •  Level  1:    ~  n/2    •  Level  2:    ~  n/4    •  …  •  Level  log  n    ~  2-­‐3  elements…  

Page 166: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

12.09.13  21:42   Skip  Lists   166  

Skip  Lists  

+∞ -∞

S0

S1

S2

S3

+∞ -∞ 10 36 23 15

+∞ -∞ 15

+∞ -∞ 23 15

Page 167: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Skip  List  

typedef  struct  nodeStructure  *node;  typedef  struct  nodeStructure{          keyType  key;          valueType  value;          node  forward[1];  /*  variable  sized  array  of  forward  pointers  */  };  

Page 168: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

12.09.13  21:42   Skip  Lists   168  

What  is  a  Skip  List  •  A  skip  list  for  a  set  S  of  disGnct  (key,  element)  items  is  a  series  of  lists  

S0, S1 , … , Sh  such  that  –  Each  list  Si  contains  the  special  keys  +∞  and  -∞    –  List  S0  contains  the  keys  of  S in  nondecreasing  order   –  Each  list  is  a  subsequence  of  the  previous  one,  i.e.,  

     S0 ⊆ S1 ⊆ … ⊆ Sh –  List  Sh contains  only  the  two  special  keys  

•  We  show  how  to  use  a  skip  list  to  implement  the  dicGonary  ADT  

56 64 78 +∞ 31 34 44 -∞ 12 23 26

+∞ -∞

+∞ 31 -∞

64 +∞ 31 34 -∞ 23

S0

S1

S2

S3

Page 169: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

IllustraGon  of  lists  

-­‐inf   inf  25   30   47   99  17  3  

Key Right[ .. ] - right links in array Left[ .. ] - left links in array, array size – how high is the list

Page 170: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

12.09.13  21:42   Skip  Lists   170  

Search  •  We  search  for  a  key  x  in  a  a  skip  list  as  follows:  

–  We  start  at  the  first  posiGon  of  the  top  list    –  At  the  current  posiGon  p,  we  compare  x  with  y ← key(after(p))

x = y:  we  return  element(after(p)) x > y:  we  “scan  forward”     x < y:  we  “drop  down”  

–  If  we  try  to  drop  down  past  the  bo{om  list,  we  return  NO_SUCH_KEY •  Example:  search  for  78  

+∞ -∞

S0

S1

S2

S3

+∞ 31 -∞

64 +∞ 31 34 -∞ 23

56 64 78 +∞ 31 34 44 -∞ 12 23 26

Page 171: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

12.09.13  21:42   Skip  Lists   171  

Randomized  Algorithms  •  A  randomized  algorithm  

performs  coin  tosses  (i.e.,  uses  random  bits)  to  control  its  execuGon  

•  It  contains  statements  of  the  type  

b ← random() if b = 0 do A … else { b = 1} do B …

•  Its  running  Gme  depends  on  the  outcomes  of  the  coin  tosses  

•  We  analyze  the  expected  running  Gme  of  a  randomized  algorithm  under  the  following  assumpGons  –  the  coins  are  unbiased,  and    –  the  coin  tosses  are  

independent  •  The  worst-­‐case  running  Gme  of  

a  randomized  algorithm  is  oYen  large  but  has  very  low  probability  (e.g.,  it  occurs  when  all  the  coin  tosses  give  “heads”)  

•  We  use  a  randomized  algorithm  to  insert  items  into  a  skip  list  

Page 172: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

12.09.13  21:42   Skip  Lists   172  

•  To  insert  an  item  (x, o)  into  a  skip  list,  we  use  a  randomized  algorithm:  –  We  repeatedly  toss  a  coin  unGl  we  get  tails,  and  we  denote  with  i the  

number  of  Gmes  the  coin  came  up  heads  –  If  i ≥ h,  we  add  to  the  skip  list  new  lists  Sh+1, … , Si +1,  each  containing  

only  the  two  special  keys  –  We  search  for  x in  the  skip  list  and  find  the  posiGons  p0, p1 , …, pi of  the  

items  with  largest  key  less  than  x  in  each  list  S0, S1, … , Si  –  For  j ← 0, …, i,  we  insert  item  (x, o)  into  list  Sj  aYer  posiGon  pj

•  Example:  insert  key  15,  with  i = 2

InserGon  

+∞ -∞ 10 36

+∞ -∞

23

23 +∞ -∞

S0

S1

S2

+∞ -∞

S0

S1

S2

S3

+∞ -∞ 10 36 23 15

+∞ -∞ 15

+∞ -∞ 23 15 p0

p1

p2

Page 173: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

12.09.13  21:42   Skip  Lists   173  

DeleGon  

•  To  remove  an  item  with  key  x from  a  skip  list,  we  proceed  as  follows:  –  We  search  for  x in  the  skip  list  and  find  the  posiGons  p0, p1 , …, pi of  the  

items  with  key  x,  where  posiGon  pj  is  in  list  Sj

–  We  remove  posiGons  p0, p1 , …, pi  from  the  lists  S0, S1, … , Si  –  We  remove  all  but  one  list  containing  only  the  two  special  keys  

•  Example:  remove  key  34  

-∞ +∞ 45 12

-∞ +∞

23

23 -∞ +∞

S0

S1

S2

-∞ +∞

S0

S1

S2

S3

-∞ +∞ 45 12 23 34

-∞ +∞ 34

-∞ +∞ 23 34 p0

p1

p2

Page 174: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

12.09.13  21:42   Skip  Lists   174  

ImplementaGon  v2    

•  We  can  implement  a  skip  list  with    quad-­‐nodes  

•  A  quad-­‐node  stores:  –  item  –  link  to  the  node  before  –  link  to  the  node  aYer  –  link  to  the  node  below  –  link  to  the  node  aYer  

•  Also,  we  define  special  keys  PLUS_INF  and  MINUS_INF,  and  we  modify  the  key  comparator  to  handle  them      

x

quad-node

Page 175: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

12.09.13  21:42   Skip  Lists   175  

Space  Usage  

•  The  space  used  by  a  skip  list  depends  on  the  random  bits  used  by  each  invocaGon  of  the  inserGon  algorithm  

•  We  use  the  following  two  basic  probabilisGc  facts:  Fact  1:  The  probability  of  getng  i  

consecuGve  heads  when  flipping  a  coin  is  1/2i

Fact  2:  If  each  of  n  items  is  present  in  a  set  with  probability  p,  the  expected  size  of  the  set  is  np

•  Consider  a  skip  list  with  n  items  –  By  Fact  1,  we  insert  an  item  in  

list  Si  with  probability  1/2i  

–  By  Fact  2,  the  expected  size  of  list  Si  is  n/2i    

•  The  expected  number  of  nodes  used  by  the  skip  list  is  

nnn h

ii

h

ii 2

21

2 00<= ∑∑

==

" Thus, the expected space usage of a skip list with n items is O(n)

Page 176: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

12.09.13  21:42   Skip  Lists   176  

Height  

•  The  running  Gme  of  the  search  an  inserGon  algorithms  is  affected  by  the  height  h  of  the  skip  list  

•  We  show  that  with  high  probability,  a  skip  list  with  n  items  has  height  O(log n)

•  We  use  the  following  addiGonal  probabilisGc  fact:  Fact  3:  If  each  of  n  events  has  

probability  p,  the  probability  that  at  least  one  event  occurs  is  at  most  np

•  Consider  a  skip  list  with  n  items  –  By  Fact  1,  we  insert  an  item  in  list  

Si  with  probability  1/2i  

–  By  Fact  3,  the  probability  that  list  Si  has  at  least  one  item  is  at  most  n/2i  

•  By  picking  i = 3log n,  we  have  that  the  probability  that  S3log n  has  at  least  one  item  is  at  most  

   n/23log n = n/n3 = 1/n2

•  Thus  a  skip  list  with  n  items  has  height  at  most  3log n  with  probability  at  least  1 - 1/n2

Page 177: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

12.09.13  21:42   Skip  Lists   177  

Search  and  Update  Times  •  The  search  Gme  in  a  skip  list  is  

proporGonal  to  –  the  number  of  drop-­‐down  steps,  

plus  –  the  number  of  scan-­‐forward  

steps  •  The  drop-­‐down  steps  are  

bounded  by  the  height  of  the  skip  list  and  thus  are  O(log n) with  high  probability  

•  To  analyze  the  scan-­‐forward  steps,  we  use  yet  another  probabilisGc  fact:  Fact  4:  The  expected  number  of  

coin  tosses  required  in  order  to  get  tails  is  2  

•  When  we  scan  forward  in  a  list,  the  desGnaGon  key  does  not  belong  to  a  higher  list  –  A  scan-­‐forward  step  is  associated  

with  a  former  coin  toss  that  gave  tails

•  By  Fact  4,  in  each  list  the  expected  number  of  scan-­‐forward  steps  is  2  

•  Thus,  the  expected  number  of  scan-­‐forward  steps  is    O(log n)  

•  We  conclude  that  a  search  in  a  skip  list  takes  O(log n) expected  Gme  

•  The  analysis  of  inserGon  and  deleGon  gives  similar  results  

Page 178: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

12.09.13  21:42   Skip  Lists   178  

Summary  •  A  skip  list  is  a  data  structure  

for  dicGonaries  that  uses  a  randomized  inserGon  algorithm  

•  In  a  skip  list  with  n  items    –  The  expected  space  used  is  O

(n) –  The  expected  search,  

inserGon  and  deleGon  Gme  is  O(log n)

•  Using  a  more  complex  probabilisGc  analysis,  one  can  show  that  these  performance  bounds  also  hold  with  high  probability  

•  Skip  lists  are  fast  and  simple  to  implement  in  pracGce  

Page 179: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media
Page 180: Lecture 04 Linear Structures Sort - ut · Linear,(sequenGal,(ordered,(list…(Memory, disk, tape etc – is an ordered sequentially addressed media

Conclusions  

•  Abstract  data  types  hide  implementa.ons  •  Important  is  the  funcGonality  of  the  ADT  •  Data  structures  and  algorithms  determine  the  speed  of  the  operaGons  on  data  

•  Linear  data  structures  provide  good  versaGlity  •  SorGng  –  a  most  typical  need/algorithm  •  SorGng  in  O(n  log  n)      Merge  Sort,  Quicksort  •  Solving  Recurrences  –  means  to  analyse  •  Skip  lists  –  log  n    randomised  data  structure