
Summary – DSA, KE 2008, Year 1

Helpful: Animations of all search algorithms / ADTs

Author Notes: General
In essence, the ADTs given in DSA should not be looked at in a case-by-case comparison. To reach a solution it is vital to understand that ADTs can be combined, and often more complex ADTs are only several smaller ADTs combined. An ADT is nothing more than a description of how an object is supposed to behave, NOT a description of how it should be implemented. (Think of it as Java abstract classes or interfaces; for example, a Java Vector is an implementation of the Sequence ADT & Vector ADT.)


Page 2: MSV Incognito€¦  · Web view3.1.4 Insertion in a Binary Search Tree. To perform operation insertItem(k, o): 1) Search for key k. 2) If k is not already in the tree, let w be the

Table of Contents

Helpful

Chapter 1 – Introduction
    1.1 – Pseudo Code
    1.2 – Asymptotic Notation
    1.3 – Quick Math Review

Chapter 2 – Basic Data Structures
    2.1 – Stacks and Queues
    2.2 – Vectors, Lists and Sequences
    2.3 – Trees
    2.4 – Priority Queues and Heaps
    2.5 – Dictionaries and Hash Tables

Chapter 3 – Search Trees and Skip Lists
    3.1 – Ordered Dictionaries and Binary Search Trees
    3.2 – AVL Trees
    3.4 – Splay Trees

Chapter 4
    4.1 – Merge-Sort
    4.2 – The Set Abstract Data Type
    4.3 – Quick Sort
    4.4 – A Lower Bound on Comparison-Based Sorting
    4.5 – Bucket-Sort and Radix-Sort
    4.6 – Comparison of Sorting Algorithms

Chapter 5 – Fundamental Techniques
    5.2 – Divide-and-Conquer
    5.3 – Dynamic Programming

Chapter 6 – Graphs
    6.1 – The Graph ADT
    6.2 – Data Structures for Graphs
    6.3 – Graph Traversal
    6.4 – Directed Graph

Chapter 9 – Text Processing
    9.1 – Strings and Pattern Matching Algorithms
    9.2 – Tries

Chapter 12 – Computational Geometry
    12.1 – Range Trees
    12.3 – Quadtrees and k-D Trees



Chapter 1 - Introduction

1.1 – Pseudo Code
Pseudo code: code written for a human reader, not a computer.

Structure:

Expressions: Assignment: ← , Comparators: <,>,=, ≤, ≥, ≠

Method Declaration: Algorithm <name> (param1, param2, … )

Decision Structures: if (condition) then [true-action] else [false-action]

While-loops: while (condition) do [action]

Repeat-loops: repeat [action] until (condition)

For-loops: for (variable-increment-definition) do [action]

Array-Indexing: A[i], ith cell of array A

Method Calls: object.method(args)

Method returns: return (value)

Average-case running time: (worst-case running time + best-case running time) / 2
Always count the worst-case running time.

1.2 – Asymptotic Notation
Ways to define the running time of an algorithm:

Big-Oh: “less than or equal to”. f(n) is O(g(n)) if there are a constant c > 0 and an integer n₀ ≥ 1 such that f(n) ≤ c·g(n) for all n ≥ n₀.
“f(n) is order of g(n)”, “f(n) is big-Oh of g(n)”, “f(n) is O(g(n))”

Big-Omega: “greater than or equal to”. f(n) is Ω(g(n)) if there are a constant c > 0 and an integer n₀ ≥ 1 such that f(n) ≥ c·g(n) for all n ≥ n₀.

“f(n) is big-Omega of g(n)”, “ f(n) is Ω (g(n))”

Big-Theta: “equal to”. f(n) is Θ(g(n)) if there are constants c′ > 0 and c″ > 0 and an integer n₀ ≥ 1 such that c′·g(n) ≤ f(n) ≤ c″·g(n) for all n ≥ n₀.
“f(n) is big-Theta of g(n)”, “f(n) is Θ(g(n))”

Difference between Big-Oh and Little-Oh:
Big-Oh: ∃ – there exists a constant c > 0 (and an n₀) for which f(n) ≤ c·g(n) holds for all n ≥ n₀.
Little-Oh: ∀ – for every constant c > 0 there is an n₀ such that f(n) ≤ c·g(n) for all n ≥ n₀.


1.3 - Quick Math Review
Log rules

log_b(a) = c if and only if a = b^c

1. log_b(ac) = log_b(a) + log_b(c)

2. log_b(a/c) = log_b(a) – log_b(c)

3. log_b(a^c) = c · log_b(a)

4. log_b(a) = log_c(a) / log_c(b)

5. b^(log_c(a)) = a^(log_c(b))

6. (b^a)^c = b^(a·c)

7. b^a · b^c = b^(a+c)

8. b^a / b^c = b^(a−c)

⌈x⌉ = smallest integer greater than or equal to x (ceiling)
⌊x⌋ = largest integer less than or equal to x (floor)

Justification techniques:
- Counterexample

- Contrapositive

- Contradiction

- Induction

- Loop invariant

Functions by Growth Rate:

(slowest to fastest)
log n, log² n, √n, n, n log n, n², n³, 2^n


Chapter 2 – Basic Data Structures

2.1 – Stacks and Queues
2.1.1 – Stack
Container of objects that are inserted and removed according to last-in first-out (LIFO).

ADT:

push(o): Insert object o at the top of the stack.
pop(): Remove and return the last object inserted into the stack. Error if the stack is empty.

size(): Return the number of objects in the stack.
isEmpty(): Return a Boolean indicating if the stack is empty.
top(): Return the last object inserted, without removing it. Error if the stack is empty.

Additional Information
The array-based Stack uses a variable t to keep track of the index of the top object within the Stack (the number of objects is t + 1).

Pseudo Code – Array-Based Stack:

Algorithm push(o):
    if size() = N then
        indicate that a stack-full error has occurred
    t ← t + 1
    S[t] ← o

Algorithm pop():
    if isEmpty() then
        indicate that a stack-empty error has occurred
    e ← S[t]
    S[t] ← null
    t ← t - 1
    return e
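A minimal Java sketch of the array-based stack above. The class name ArrayStack and the use of IllegalStateException are illustrative choices of mine, not from the course code:

// Minimal array-based stack sketch (names are illustrative).
public class ArrayStack {
    private Object[] S;   // element array
    private int t = -1;   // index of the top element; stack size is t + 1

    public ArrayStack(int capacity) { S = new Object[capacity]; }

    public int size() { return t + 1; }
    public boolean isEmpty() { return t < 0; }

    public void push(Object o) {
        if (size() == S.length) throw new IllegalStateException("stack is full");
        S[++t] = o;                       // t <- t + 1; S[t] <- o
    }

    public Object top() {
        if (isEmpty()) throw new IllegalStateException("stack is empty");
        return S[t];
    }

    public Object pop() {
        if (isEmpty()) throw new IllegalStateException("stack is empty");
        Object e = S[t];
        S[t--] = null;                    // S[t] <- null; t <- t - 1
        return e;
    }
}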


2.1.2 – Queue
Container of objects that are inserted and removed according to first-in first-out (FIFO). Objects enter the Queue at the rear and are removed from the front.

ADT:

enqueue(o): Insert object o at the rear of the queue.
dequeue(): Remove and return the object at the front of the queue. Error if the queue is empty.

size(): Return the number of objects in the queue.
isEmpty(): Return a Boolean indicating if the queue is empty.
front(): Return the front object, without removing it. Error if the queue is empty.

Additional Information
The Queue uses 2 variables, f and r, to keep track of the cell storing the first object and the first free cell at the rear, respectively. N is the number of cells within the array containing the objects (the size of the array for holding objects). The queue is empty if f = r.

Pseudo Code – Array-Based Queue:

Algorithm enqueue(o):
    if size() = N - 1 then
        throw a QueueFullException
    Q[r] ← o
    r ← (r + 1) mod N

Algorithm dequeue():
    if isEmpty() then
        throw a QueueEmptyException
    temp ← Q[f]
    Q[f] ← null
    f ← (f + 1) mod N
    return temp
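A Java sketch of the same circular-array queue (class name ArrayQueue and the unchecked exceptions are illustrative choices of mine):

// Circular-array queue sketch; with N cells it can hold at most N - 1 objects (empty when f == r).
public class ArrayQueue {
    private Object[] Q;
    private int f = 0;    // index of the front object
    private int r = 0;    // index of the first free cell at the rear

    public ArrayQueue(int N) { Q = new Object[N]; }

    public int size() { return (Q.length - f + r) % Q.length; }
    public boolean isEmpty() { return f == r; }

    public void enqueue(Object o) {
        if (size() == Q.length - 1) throw new IllegalStateException("queue is full");
        Q[r] = o;
        r = (r + 1) % Q.length;
    }

    public Object dequeue() {
        if (isEmpty()) throw new IllegalStateException("queue is empty");
        Object temp = Q[f];
        Q[f] = null;
        f = (f + 1) % Q.length;
        return temp;
    }
}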


2.2 – Vectors, Lists and Sequences
2.2.1 – Vectors
Linear sequence that supports access to its elements according to rank.

ADT:

elemAtRank(r): Return the object at rank r. Error if r < 0 or r > n-1.
replaceAtRank(r, e): Replace the object at rank r with e. Error if r < 0 or r > n-1.
insertAtRank(r, e): Insert e into the Vector and give it rank r. Error if r < 0 or r > n.
removeAtRank(r): Remove the object at rank r. Error if r < 0 or r > n-1.

size(): Return the number of objects in the vector.
isEmpty(): Return a Boolean indicating if the vector is empty.

Additional Information
The Vector contains n elements. [0] = first element, [n-1] = last element.

Running times

Method                Time
size()                O(1)
isEmpty()             O(1)
elemAtRank(r)         O(1)
replaceAtRank(r, e)   O(1)
insertAtRank(r, e)    O(n)
removeAtRank(r)       O(n)

Pseudo Code – Array-Based Vector:

Algorithm insertAtRank(r, e):
    for i = n-1, n-2, ..., r do
        A[i+1] ← A[i]        // make room for the new element
    A[r] ← e
    n ← n + 1

Algorithm removeAtRank(r):
    temp ← A[r]
    for i = r, r+1, ..., n-2 do
        A[i] ← A[i+1]        // fill in for the removed element
    n ← n - 1
    return temp
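A Java sketch of the array-based vector; the array doubling in insertAtRank is an extra convenience of mine, not part of the pseudocode above:

// Array-based vector sketch showing the O(n) shifting in insertAtRank / removeAtRank.
public class ArrayVector {
    private Object[] A = new Object[16];
    private int n = 0;                        // number of elements

    public Object elemAtRank(int r) { checkRank(r, n - 1); return A[r]; }

    public void insertAtRank(int r, Object e) {
        checkRank(r, n);
        if (n == A.length) A = java.util.Arrays.copyOf(A, 2 * A.length); // grow (not in the pseudocode)
        for (int i = n - 1; i >= r; i--) A[i + 1] = A[i];   // make room for the new element
        A[r] = e;
        n++;
    }

    public Object removeAtRank(int r) {
        checkRank(r, n - 1);
        Object temp = A[r];
        for (int i = r; i <= n - 2; i++) A[i] = A[i + 1];   // fill in for the removed element
        A[--n] = null;
        return temp;
    }

    private void checkRank(int r, int max) {
        if (r < 0 || r > max) throw new IndexOutOfBoundsException("invalid rank: " + r);
    }
}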


2.2.2 – Lists
Linear sequence that supports access to its elements according to so-called Nodes. A Node is a container which keeps a reference to the node before and after it. From now on, a Node will be called a Position.

ADT:

first(): Return the position of the first element of the List. Error if the List is empty.
last(): Return the position of the last element of the List. Error if the List is empty.
isFirst(p): Return a Boolean indicating if p is the first position within the List.
isLast(p): Return a Boolean indicating if p is the last position within the List.
before(p): Return the position before position p. Error if p is the first position.
after(p): Return the position after position p. Error if p is the last position.

replaceElement(p, e): Replace the element at p with e. Return the element previously at position p.
swapElements(p, q): Swap the elements: move the element at p to q, and the element at q to p.
insertFirst(e): Insert e into the List as the first element (not a replacement).
insertLast(e): Insert e into the List as the last element (not a replacement).
insertBefore(p, e): Insert e before position p into the List.
insertAfter(p, e): Insert e after position p into the List.
remove(p): Remove the element at position p from the List.

Position ADT: A Position is always defined relative to another position, i.e. “after” or “before” another position. Each position contains the object we want to store at that position.

element(): Return object contained within position

Linked List Implementation
A Linked List is a direct implementation of the List ADT. We also need to update the definition of a Position. In a singly linked list, each position stores a reference to the position coming after it, next(). In a doubly linked list, both the before and after references are stored, prev() and next().

// Note: you can also implement this field-wise instead of method-wise, e.g.:
// position.next ← position.prev      instead of      position.next() ← position.prev()

To simplify matters, special positions called Sentinel positions are stored at the beginning and the end of the List.

Page 9: MSV Incognito€¦  · Web view3.1.4 Insertion in a Binary Search Tree. To perform operation insertItem(k, o): 1) Search for key k. 2) If k is not already in the tree, let w be the

Pseudo Code – Doubly Linked List:

Algorithm insertAfter(p, e):
    Create a new node v
    v.element ← e
    v.prev ← p               // link v to its predecessor
    v.next ← p.next          // link v to its successor
    (p.next).prev ← v        // link p's old successor to v
    p.next ← v               // link p to its new successor, v
    return v                 // the position for the element e

Algorithm remove(p):
    t ← p.element            // temp variable for the element
    (p.prev).next ← p.next   // link out p
    (p.next).prev ← p.prev
    p.prev ← null            // invalidate position p
    p.next ← null
    return t
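A Java sketch of a doubly linked list with header/trailer sentinels implementing insertAfter and remove (class and field names are my own):

// Doubly linked list sketch with header/trailer sentinel nodes.
public class DoublyLinkedList {
    static class Node {                 // the "Position"
        Object element;
        Node prev, next;
    }

    private final Node header = new Node();
    private final Node trailer = new Node();

    public DoublyLinkedList() { header.next = trailer; trailer.prev = header; }

    public Node insertAfter(Node p, Object e) {
        Node v = new Node();
        v.element = e;
        v.prev = p;                     // link v to its predecessor
        v.next = p.next;                // link v to its successor
        p.next.prev = v;                // link p's old successor to v
        p.next = v;                     // link p to its new successor, v
        return v;                       // the position for element e
    }

    public Object remove(Node p) {
        Object t = p.element;
        p.prev.next = p.next;           // link out p
        p.next.prev = p.prev;
        p.prev = null;                  // invalidate position p
        p.next = null;
        return t;
    }

    public Node insertFirst(Object e) { return insertAfter(header, e); }
}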

2.2.3 Sequences
An ADT that supports both the Vector and the List ADT.

ADT:

atRank(r): Return the position of the element with rank r.
rankOf(p): Return the rank of the element at position p.

Additional Information
Can be implemented with either an Array or a Doubly Linked List. An Array implementation takes O(N) space, a DLList takes O(n) space. (The Array is statically sized at the start and always takes N space; a DLList grows.)

Running times

Operations                       Array   List
size, isEmpty                    O(1)    O(1)
atRank, rankOf, elemAtRank       O(1)    O(n)
first, last, before, after       O(1)    O(1)
replaceElement, swapElements     O(1)    O(1)
replaceAtRank                    O(1)    O(n)
insertAtRank, removeAtRank       O(n)    O(n)
insertFirst, insertLast          O(1)    O(1)
insertAfter, insertBefore        O(n)    O(1)
remove                           O(n)    O(1)


Iterator ADT
An Iterator is an object that can go through a collection of elements, one element at a time. An Iterator consists of a Sequence and a current Position (it extends the Position ADT).

hasNext(): Test whether there are elements left in the Iterator.
nextObject(): Return and remove the next element in the Iterator.


2.3 Trees
2.3.1 Tree ADT
A Tree is an ADT that stores elements hierarchically. Elements are stored in a parent-child relationship: every element has zero or more children and one parent (except the root, which is the initial element and has no parent). Children with the same parent are siblings. Nodes are called external, or leaves, when they have no children, and internal when they have one or more. The subtree of a tree at v is the tree consisting of all descendants of v, with v as the root. An ancestor is a parent of the node, or a parent's parent, etc. Descendant is the same, but with children: v is a descendant of p, p is an ancestor of v. An ordered tree is a tree that has a linear ordering for all its children (you know which child is first, second, third). A binary tree is an ordered tree where every node has at most two children: a left child and a right child, which subsequently give a left subtree and a right subtree.

ADT:

// Accessor Methods
root(): Return the root of the tree.
parent(v): Return the parent of node v. Error if v is the root.
children(v): Return an Iterator containing all children of node v.

// Query Methods
isInternal(v): Test whether node v is internal.
isExternal(v): Test whether node v is external.
isRoot(v): Test whether node v is the root.

// Generic Methods
size(): Return the number of nodes.
elements(): Return an Iterator of the elements stored in the nodes.
positions(): Return an Iterator containing all nodes.
swapElements(v, w): Swap the elements stored at nodes v and w.
replaceElement(v, e): Replace the element at node v with e and return the old element.

Additional Information
The depth of a node is the number of its ancestors, excluding the node itself.
The height of a tree is the maximum depth of an external node.
Or: the height of node v is 1 + the maximum height of a child of v. The maximum height is height(T, root).

Algorithm depth(T, v):
    if T.isRoot(v) then
        return 0
    else
        return 1 + depth(T, T.parent(v))

Runs in O(1 + dv), dv: depth of node v in tree T

Algorithm height(T, v):
    if T.isExternal(v) then
        return 0
    else
        h ← 0
        for each w ∊ T.children(v) do
            h ← max(h, height(T, w))
        return 1 + h

Runs in O(n), n: number of nodes within tree T, when called on the root. It goes through all descendants of the node v on which it is called, thus a complete tree traversal when started at the root.

Running times

Method                                      Time
root(), parent()                            O(1)
isInternal(v), isExternal(v), isRoot(v)     O(1)
children(v)                                 O(cv), cv: number of children of v
swapElements(v, w), replaceElement(v, e)    O(1)
elements(), positions()                     O(n), n: nodes in tree

Tree Traversal
Link to tree traversal applet – a very good representation of all traversal methods.

Preorder traversal
Just traverses the children of the starting node, visiting every node as it comes along. Gives a linear order of the nodes where children come after their parent.

Algorithm preorder(T, v):
    perform the “visit” action for node v       // whatever you want + set node as “visited”
    for each child w of v do
        preorder(T, w)                          // recursively traverse the subtree at w

Runs in O(n), same as height(T, v)


Postorder traversal
Will visit a node after it has traversed every descendant of that node. Used if you need to know the information of all children before you can compute the value of a parent. For example: the sizes of files in a directory.

Algorithm postorder(T, v):
    for each child w of v do
        postorder(T, w)                         // recursively traverse the subtree at w
    perform the “visit” action for node v       // whatever you want + set node as “visited”

Runs in O(n), same as height(T, v)


2.3.3 Binary Trees
Ordered tree in which every internal node has exactly two children.

ADT:

leftChild(v): Return the left child of v. Error if v is external.
rightChild(v): Return the right child of v. Error if v is external.
sibling(v): Return the sibling node of v. Error if v is the root.

If the tree is an improper binary tree (not every internal node has 2 children), extra errors may occur. For example, there may not be a sibling.

Additional Information
A node with the same depth d as another node is at the same level. For a binary tree of height h with n nodes:
- The number of external nodes is at least h+1 and at most 2^h
- The number of internal nodes is at least h and at most 2^h − 1
- The total number of nodes is at least 2h+1 and at most 2^(h+1) − 1
- The height is at least log(n+1) − 1 and at most (n−1)/2

Inorder traversal
Can be seen as going through the tree “from left to right”: first the left child, then the parent, then the right child.

Algorithm inorder(T, v):
    if v is an internal node then
        inorder(T, T.leftChild(v))              // go through the left subtree
    perform the “visit” action for node v       // mark node “visited”
    if v is an internal node then
        inorder(T, T.rightChild(v))             // go through the right subtree

Binary tree adapted Preorder and Postorder:

Algorithm binaryPreorder(T, v):
    perform the “visit” action for node v       // mark node “visited”
    if v is an internal node then
        binaryPreorder(T, T.leftChild(v))       // go through the left subtree
        binaryPreorder(T, T.rightChild(v))      // go through the right subtree

Algorithm binaryPostorder(T, v):
    if v is an internal node then
        binaryPostorder(T, T.leftChild(v))      // go through the left subtree
        binaryPostorder(T, T.rightChild(v))     // go through the right subtree
    perform the “visit” action for node v       // mark node “visited”
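A small runnable Java sketch of the three traversals on a plain linked node structure. Here "external" is represented simply by null children rather than explicit external nodes, which is a simplification of mine compared to the ADT above:

// Traversal sketch on a simple linked binary tree node (not the course's Tree ADT).
public class TraversalDemo {
    static class Node {
        int element;
        Node left, right;
        Node(int e) { element = e; }
    }

    static void preorder(Node v) {
        if (v == null) return;
        System.out.print(v.element + " ");   // visit before the subtrees
        preorder(v.left);
        preorder(v.right);
    }

    static void inorder(Node v) {
        if (v == null) return;
        inorder(v.left);                     // left subtree
        System.out.print(v.element + " ");   // visit between the subtrees
        inorder(v.right);                    // right subtree
    }

    static void postorder(Node v) {
        if (v == null) return;
        postorder(v.left);
        postorder(v.right);
        System.out.print(v.element + " ");   // visit after the subtrees
    }

    public static void main(String[] args) {
        Node root = new Node(2);
        root.left = new Node(1);
        root.right = new Node(3);
        preorder(root);    // 2 1 3
        System.out.println();
        inorder(root);     // 1 2 3
        System.out.println();
        postorder(root);   // 1 3 2
        System.out.println();
    }
}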


Euler tour traversal
An algorithm performing a uniform way of traversing a tree; it will encounter every node three times: from the left, from below and from the right.
A Preorder traversal is an Euler tour where the “visit” action is performed when you encounter a node from the left.
An Inorder traversal is an Euler tour where the “visit” action is performed when you encounter a node from below.
A Postorder traversal is an Euler tour where the “visit” action is performed when you encounter a node from the right.

Algorithm eulerTour(T, v):
    perform the left “visit” action
    if v is an internal node then
        eulerTour(T, T.leftChild(v))            // traverse the left subtree
    perform the below “visit” action
    if v is an internal node then
        eulerTour(T, T.rightChild(v))           // traverse the right subtree
    perform the right “visit” action

Runs in O(n) time: n number of nodes in tree T


2.3.4 Data Structures for Representing Trees
Vector-Based Binary Tree Structure
Based on the premise that every node has a number; this is also known as level numbering. p(v) is the function that returns the number of node v. The Vector has size N = pM + 1, pM being the maximum value of p(v); the +1 is needed because the numbers start at 1, not 0.

if v is the root:              p(v) = 1
if v is the left child of u:   p(v) = 2p(u)
if v is the right child of u:  p(v) = 2p(u) + 1
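These three rules translate directly into index arithmetic. A tiny Java sketch (class and method names are my own):

// Level-numbering helpers for a vector-based binary tree (cell 0 unused, root at index 1).
public final class LevelNumbering {
    private LevelNumbering() {}

    static int root()            { return 1; }
    static int leftChild(int p)  { return 2 * p; }       // p(v) = 2 p(u)
    static int rightChild(int p) { return 2 * p + 1; }   // p(v) = 2 p(u) + 1
    static int parent(int p)     { return p / 2; }       // integer division inverts both rules

    public static void main(String[] args) {
        System.out.println(leftChild(root()));    // 2
        System.out.println(rightChild(root()));   // 3
        System.out.println(parent(7));            // 3
    }
}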

Additional Information
Running times

Method                              Time
elements, positions                 O(n), n: nodes in tree
swapElements, replaceElement        O(1)
root, parent, children              O(1)
leftChild, rightChild, sibling      O(1)
isInternal, isExternal, isRoot      O(1)

Linked Structure for Binary Trees
Tree in which every node is represented by a position which contains a reference to its element, and the positions of the left child, right child and the parent. If the node is the root, the parent reference is null. If the node is external, the child references are null. The size is O(n), because there is a position for every node in the tree.

Additional Information
Running times

Method                              Time
size, isEmpty                       O(1)
elements, positions                 O(n), n: nodes in tree
swapElements, replaceElement        O(1)
root, parent                        O(1)
children(v)                         O(cv), cv: number of children of v
isInternal, isExternal, isRoot      O(1)


2.4 Priority Queue and Heaps
2.4.1 Priority Queue ADT
A container of elements which gives a comparable key to an element the moment the element is inserted into the container. Keys must follow these comparison rules, i.e. they must follow a total order relation:
- Reflexive property: k ≤ k
- Antisymmetric property: if k1 ≤ k2 and k2 ≤ k1, then k1 = k2

- Transitive property: if k1 ≤ k2 and k2 ≤ k3, then k1 ≤ k3

ADT:
insertItem(k, e): Insert an element e with key k into the Priority Queue.
removeMin(): Return and remove the element with the smallest key within the PQ.
minElement(): Return the element with the smallest key within the PQ. Error: empty PQ.
minKey(): Return the smallest key within the PQ. Error: empty PQ.

Comparator
An algorithm/ADT which specifies in which way a key is compared, i.e. an object that compares keys.

ADT:
isLess(a, b): True if and only if a is less than b.
isLessOrEqualTo(a, b): True if and only if a is less than or equal to b.
isEqualTo(a, b): True if and only if a and b are equal.
isGreater(a, b): True if and only if a is greater than b.
isGreaterOrEqualTo(a, b): True if and only if a is greater than or equal to b.
isComparable(a): True if and only if a can be compared.

2.4.2 PQ-Sort, Selection-Sort and Insertion-Sort
A sorting problem is a problem in which a container C with n elements needs to be sorted in increasing order, or at least non-decreasing order if there are ties. All elements should be comparable by a total order relation.

PQ-Sort Selection-Sort & Insertion-Sort

A very simple algorithm which accepts an unsorted list and sorts it using a Priority Queue. Its output is a sorted list.

1. All elements are placed in an empty Priority Queue, giving a key to each element.
2. All elements are extracted in non-decreasing order by using removeMin(), putting them back in C.

If this is implemented using an unsorted sequence, phase 1 takes O(n) and phase 2 takes O(n²). This is also known as Selection-Sort, because selection (and thus ordering) is done in the second phase.
If this is implemented using a sorted sequence, phase 1 takes O(n²) and phase 2 takes O(n). This is also known as Insertion-Sort, because insertion (and thus ordering) is done in the first phase.
The difference is that Selection-Sort always takes Ω(n²), while Insertion-Sort in the best circumstances takes O(n) (e.g. if the list is given in reverse sorted order).

Algorithm PQ-Sort(C, P):
    Input: an n-element sequence C, and a PQ P that compares elements using a total order relation
    Output: the sequence C sorted by the total order relation


    while C is not empty do            // Phase 1
        e ← C.removeFirst()            // remove an element from C
        P.insertItem(e, e)             // the key is the element itself
    while P is not empty do            // Phase 2
        e ← P.removeMin()              // remove the smallest element from P
        C.insertLast(e)                // add the element at the end of C
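A runnable Java sketch of PQ-Sort, using java.util.PriorityQueue as the PQ P (so this is the heap-backed variant; the sequence-based selection/insertion-sort variants would only swap the PQ implementation):

import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// PQ-Sort sketch: phase 1 moves everything into the PQ, phase 2 pulls it back out in order.
public class PQSort {
    public static <E extends Comparable<E>> void pqSort(List<E> C) {
        PriorityQueue<E> P = new PriorityQueue<>();
        while (!C.isEmpty()) {            // Phase 1
            P.add(C.remove(0));           // the key is the element itself
        }
        while (!P.isEmpty()) {            // Phase 2
            C.add(P.poll());              // removeMin, appended at the end of C
        }
    }

    public static void main(String[] args) {
        List<Integer> C = new ArrayList<>(List.of(5, 1, 4, 2, 3));
        pqSort(C);
        System.out.println(C);            // [1, 2, 3, 4, 5]
    }
}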

2.4.3 Heap Data Structure
A PQ data structure that is efficient for both insertion and removal (i.e. for both the insertion- and selection-sort phases). It does this by storing all elements and keys at the internal nodes of a binary tree. The last node of the tree is the right-most, deepest node of T.

Heap-Order Property: the key stored at v is ≥ the key stored at v's parent. The minimum key is thus always at the root.
Complete Binary Tree: for efficiency reasons, we want the lowest height possible. Every level, except possibly the last, must have the maximum number of nodes (level i holds 2^i nodes), and all internal nodes on the last level must be to the left of the external nodes, i.e. they will be visited before the external nodes in an inorder traversal.

Additional Information
A heap PQ implementation consists of the following:
- Heap: a complete binary tree, implemented using a Vector.
- Last: a reference to the last node of T.
- Comp: a comparator that defines a total order relation for the keys. It maintains the minimum key at the root.
A heap storing n keys has height h = ⌈log(n+1)⌉.
The number of keys within a heap of height h is at least 2^(h−1) and at most 2^h − 1.
In the vector, the first key is at rank 1 and the last key at rank n; the first empty external node is at rank n+1. The insertion position, the position at which a new node is added, is usually rank n+1. After insertion, the new node becomes the last node of the tree.


Up-Heap Bubbling (after insertion)
Restores the Heap-Order Property. It checks if the parent of the new node has a higher key, and if so, swaps locations with the parent. It will continue to do so until it is either at the root, or the parent has a lower key. This process is called Up-Heap Bubbling. Because at maximum it will need to go up to the root, it takes at most as many steps as the height of the tree, thus O(log n) running time, n being the number of keys in the heap.

[Figure: not a correct binary tree, but correct Up-Heap Bubbling.]


Down-Heap Bubbling (after removal)
When removing a node using removeMin(), the last node in the tree is taken and set at the root. We then need to restore the Heap-Order Property using Down-Heap Bubbling. Down-Heap Bubbling checks if there exists a child of v that has a smaller key than v, and if so, swaps places with it. If both children have smaller keys, the child with the smallest key is swapped with v. It will continue swapping until there is no child of v that has a smaller key. Because at maximum it will need to go down to the last level, it takes at most as many steps as the depth of the last node, thus O(log n) running time, n being the number of keys in the heap.
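A Java sketch of a vector-backed min-heap with up-heap and down-heap bubbling. Note that it uses 0-based indices (children of i at 2i+1 and 2i+2), unlike the 1-based level numbering used earlier, and plain int keys; class and method names are my own:

import java.util.ArrayList;

// Array-backed min-heap sketch showing up-heap and down-heap bubbling.
public class MinHeap {
    private final ArrayList<Integer> a = new ArrayList<>();

    public void insertItem(int key) {
        a.add(key);                          // the new node becomes the last node
        upHeap(a.size() - 1);
    }

    public int removeMin() {
        if (a.isEmpty()) throw new IllegalStateException("heap is empty");
        int min = a.get(0);
        int last = a.remove(a.size() - 1);   // take the last node
        if (!a.isEmpty()) {
            a.set(0, last);                  // put it at the root
            downHeap(0);                     // restore the heap-order property
        }
        return min;
    }

    private void upHeap(int i) {
        while (i > 0 && a.get((i - 1) / 2) > a.get(i)) {   // parent larger: swap up
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void downHeap(int i) {
        while (2 * i + 1 < a.size()) {
            int child = 2 * i + 1;                                          // left child
            if (child + 1 < a.size() && a.get(child + 1) < a.get(child)) child++; // pick smaller child
            if (a.get(i) <= a.get(child)) break;                            // heap order restored
            swap(i, child);
            i = child;
        }
    }

    private void swap(int i, int j) { int t = a.get(i); a.set(i, a.get(j)); a.set(j, t); }
}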

Running Times

Method                 Time
size, isEmpty          O(1)
minElement, minKey     O(1)
insertItem             O(log n)
removeMin              O(log n)

2.4.4 Heap-Sort
If you implement the PQ sorting scheme with a heap, you get an algorithm known as heap-sort.



This gives the following theorem: Heap-Sort sorts a sequence of n elements in O(n log n) time.
Heap Sort animation

Heap-Sort In Place
An algorithm is said to run in place if it only uses a constant amount of memory in addition to the memory required for the objects themselves. This requires that the sequence to be sorted is implemented as an array. We then use the array itself to store the heap, instead of using an external heap. The outline is as follows:

1. Logically divide the array into a portion in the front that contains the growing heap and the rest that contains the elements of the array that have not yet been dealt with.

o Initially the heap part is empty and the not-yet-dealt-with part of the array is the entire array.

o At each insertion we remove the left most entry from the array part and insert it in the heap, growing the heap to include the memory previously used by the newly inserted element. The blue line moves down.

o At the end the heap uses all the space. We make the optimization discussed before that we only store the internal nodes of the heap and do not waste the first (index 0) component of the array used to store the heap.

2. Do the insertions with a normal heap-sort but change the comparison so that a maximum element is in the root (i.e., a parent is no smaller than a child).

3. Now do the removals from the heap, moving the blue line back up.
    o The elements removed come out in order from big to small.
    o This is perfect, since we store them starting at the right of the array: that is exactly the portion of the array that is made available by the shrinking heap.

Bottom-Up Heap Construction
Heap construction runs in O(n log n) time if the n objects are added using insertItem(). If all elements are given in advance, bottom-up construction can be done in O(n) time. This construction builds a complete binary tree with height ⌈log(n+1)⌉. It is called Bottom-Up Heap construction because the algorithm begins with the external nodes and works its way up the tree.


Algorithm BottomUpHeap(S):
    Input: a sequence S storing n keys
    Output: a heap storing the keys in S
    if S is empty then
        return an empty heap          // consisting of a single external node
    Remove the first key, k, from S
    Split S into S1 and S2, each of size (n-1)/2
    T1 ← BottomUpHeap(S1)
    T2 ← BottomUpHeap(S2)
    Create a binary tree T with root r storing k, left subtree T1 and right subtree T2
    Perform down-heap bubbling from the root r of T       // restore heap order
    return T

Bottom and Top construction Test

Locator ADT
In our current setup (the vector-based heap implementation) we have:
- A binary tree represented as a vector: a list of cells, each associated with a number, which contain an element. You call the cell number to retrieve the element.
- Every element is associated with a comparable key, so that it may be sorted according to a total order relation using the key for comparison. If the element itself can be compared, it can be its own key, e.g. elements which are numbers.

The problem with our current implementation is that the element and the key do not know which position/cell they are in. To overcome this limitation, we implement another ADT, the Locator ADT. The purpose of this ADT is to link key, element and location (cell or position) together. The locator “attaches” itself to the element, and therefore the key, and is constantly updated with a reference to the element's cell/position when the element changes cell/position.

ADT:
element(): Return the element associated with this locator.
key(): Return the key associated with this locator.

Locator-Based PQ Methods:
Logically, we can then extend the methods of the PQ to make use of this functionality.

Priority Queue ADT Update:
min(): Return the locator of the element with the smallest key.
insert(k, e): Insert a new item with element e and key k into the PQ and return a locator referencing the new item.
remove(l): Remove the item with locator l from the PQ.
replaceElement(l, e): Replace the element in locator l with e and return the previous element.
replaceKey(l, k): Replace the key in locator l with k and return the previous key.

Additional Information:
Running times of the different PQ implementations:

Operations                            Unsorted Sequence   Sorted Sequence   Heap
size, isEmpty, key, replaceElement    O(1)                O(1)              O(1)
minElement, min, minKey               O(n)                O(1)              O(1)
insertItem, insert                    O(1)                O(n)              O(log n)
removeMin                             O(n)                O(1)              O(log n)
remove                                O(1)                O(1)              O(log n)
replaceKey                            O(1)                O(n)              O(log n)

2.5 Dictionaries and Hash Tables
2.5.1 The Unordered Dictionary ADT
A dictionary is an ADT which stores elements and keys in pairs, in objects called items. In general, dictionaries are allowed to store multiple elements under one key.

ADT:
findElement(k): Return the element associated with key k. Return the NO_SUCH_KEY element if such an element does not exist.
insertItem(k, e): Insert an item with key k and element e into the dictionary.
removeElement(k): Remove the item which has key k from the dictionary and return its element. Return the NO_SUCH_KEY element if the element does not exist.

Additional Information:
The special element NO_SUCH_KEY is called a sentinel.
An implementation of a dictionary with an unsorted sequence is often called a log file, or audit trail. It is used to store small amounts of information which are unlikely to change over time. This implementation is often called an unordered sequence implementation. Space usage: Θ(n).

Chapter 3 – Search Trees and Skip Lists

3.1 – Ordered Dictionaries and Binary Search Trees
Dictionary: a searchable collection of key-element items. For example, an address book.

Operations are as follows (as defined in section 2.5.1):
findElement(k): If the dictionary has an item with key k, return its element, else return the special element NO_SUCH_KEY.
insertItem(k, o): Insert item (k, o) into the dictionary.
removeElement(k): If the dictionary has an item with key k, remove it from the dictionary and return its element, else return the special element NO_SUCH_KEY.

New operations are:
closestKeyBefore(k): Return the key of the item with the largest key less than or equal to k.
closestElemBefore(k): Return the element of the item with the largest key less than or equal to k.
closestKeyAfter(k): Return the key of the item with the smallest key greater than or equal to k.
closestElemAfter(k): Return the element of the item with the smallest key greater than or equal to k.
Each of these methods returns the special NO_SUCH_KEY object if no such item is present in the dictionary.

3.1.1 Sorted Tables
A lookup table is an ordered dictionary implemented with a sorted sequence: store the items of the dictionary in an array-based sequence, sorted by key. It is one way of implementing a dictionary.

Performance:
findElement      O(log n)  (using binary search)
insertItem       O(n)      (shifts)
removeElement    O(n)      (shifts)

3.1.2 Binary Search
As we use a lookup table, which is array-based, we can use binary search as the searching algorithm.

Below is the pseudo-code of the binary search algorithm:

Algorithm BinarySearch(S, k, low, high):
    if low > high then
        return NO_SUCH_KEY
    else
        mid ← ⌊(low + high) / 2⌋
        if k = key(mid) then
            return elem(mid)
        else if k < key(mid) then
            return BinarySearch(S, k, low, mid - 1)
        else
            return BinarySearch(S, k, mid + 1, high)
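An iterative Java sketch of the same binary search on a plain sorted int array, with -1 standing in for NO_SUCH_KEY (array and method names are illustrative):

// Iterative binary search sketch on a sorted int array (returns the index, or -1 for NO_SUCH_KEY).
public class BinarySearchDemo {
    static int binarySearch(int[] S, int k) {
        int low = 0, high = S.length - 1;
        while (low <= high) {
            int mid = (low + high) / 2;       // assumes indices small enough not to overflow
            if (S[mid] == k) return mid;
            else if (k < S[mid]) high = mid - 1;
            else low = mid + 1;
        }
        return -1;                            // NO_SUCH_KEY
    }

    public static void main(String[] args) {
        int[] S = {2, 4, 7, 9, 11, 15};
        System.out.println(binarySearch(S, 9));   // 3
        System.out.println(binarySearch(S, 8));   // -1
    }
}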

Method              Log File   Lookup Table
findElement         O(n)       O(log n)
insertItem          O(1)       O(n)
removeElement       O(n)       O(n)
closestKeyBefore    O(n)       O(log n)

n denotes the number of items in the dictionary at the time a method is executed.
Comparison of Log File and Lookup Table when used to implement an ordered dictionary.

3.1.3 Searching in a Binary Search Tree

T = tree
k = search key
v = node

To search for a key k:

Algorithm findElement(k, v):
    if T.isExternal(v) then
        return NO_SUCH_KEY
    if k < key(v) then
        return findElement(k, T.leftChild(v))
    else if k = key(v) then
        return element(v)
    else  // k > key(v)
        return findElement(k, T.rightChild(v))

3.1.4 Insertion in a Binary Search Tree

To perform operation insertItem(k, o):
1) Search for key k.
2) If k is not already in the tree, let w be the leaf reached by the search.
3) Insert k at node w and expand w into an internal node.
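A small Java sketch of findElement and insertItem on a binary search tree with integer keys. Here external nodes are represented by null references, and inserting an existing key simply replaces the element; both are simplifications of mine, not the course's exact scheme:

// Binary search tree sketch for findElement and insertItem with int keys.
public class BST {
    static class Node {
        int key;
        Object element;
        Node left, right;
        Node(int k, Object o) { key = k; element = o; }
    }

    private Node root;

    public Object findElement(int k) {
        Node v = root;
        while (v != null) {
            if (k < v.key) v = v.left;
            else if (k > v.key) v = v.right;
            else return v.element;
        }
        return null;                      // NO_SUCH_KEY
    }

    public void insertItem(int k, Object o) {
        if (root == null) { root = new Node(k, o); return; }
        Node v = root;
        while (true) {                    // search for key k; stop at the leaf position reached
            if (k == v.key) { v.element = o; return; }   // key already present: replace the element
            if (k < v.key) {
                if (v.left == null) { v.left = new Node(k, o); return; }
                v = v.left;
            } else {
                if (v.right == null) { v.right = new Node(k, o); return; }
                v = v.right;
            }
        }
    }
}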

3.1.5 Removal in a Binary Search Tree

To perform operation removeElement(k):
1) Search for key k.
2) If k is in the tree, let v be the node storing k.
3) If v has a leaf child w, remove v and w.
4) If v has no leaf child (i.e. both children of v are internal), find the internal node w that follows v in an inorder traversal, copy key(w) into v, and remove w and its left child z (this child is a leaf).


3.1.6 Performance in a Binary Search Tree

A binary search tree of height h storing n key-element items uses O(n) space and executes the dictionary ADT operations with the following running times:

h = height of the tree
n = number of items
s = size of the iterators returned

Method                                       Time
size, isEmpty                                O(1)
findElement, insertItem, removeElement       O(h)
findAllElements, removeAllElements           O(h + s)


3.2 – AVL Trees
An AVL Tree is a self-balancing binary search tree. The reason for this tree is that we want to achieve logarithmic time for all the fundamental dictionary operations.

AVL Trees follow the height-balance property: for every internal node v of T, the heights of the children of v can differ by at most 1.

A subtree of an AVL tree is an AVL tree itself. The height of an AVL tree T storing n items is O(log n).

3.2.1 Update Operations

Insertion
The procedure is in principle the same as the insertItem operation in a binary search tree. However, after the insertion, the tree may become unbalanced. Hence we need to apply Trinode Restructuring (explained below).

Removal
The procedure is in principle the same as the removeElement operation in a binary search tree. However, after the removal, the tree may become unbalanced. Hence we need to apply Trinode Restructuring (explained below).

Trinode Restructuring

Algorithm trinodeRestructuring:
(Here z is the first unbalanced node found going up from the inserted/removed node, y is the child of z with the larger height, and x is the child of y with the larger height.)
1) Let (a, b, c) be a left-to-right (inorder) listing of the nodes x, y and z, and let (T0, T1, T2, T3) be a left-to-right (inorder) listing of the four subtrees of x, y and z not rooted at x, y or z.
2) Replace the subtree rooted at z with a new subtree rooted at b.
3) Let a be the left child of b and let T0 and T1 be the left and right subtrees of a.
4) Let c be the right child of b and let T2 and T3 be the left and right subtrees of c.
You might want to look at Figure 3.14 (page 154 of the book) for an example.

3.2.2 Performance

Method                                                                   Time
Single restructure (using a linked-structure binary tree)                O(1)
Find (height of the tree, no restructures needed)                        O(log n)
Insert (initial find + 1 restructure)                                    O(log n)
Remove (initial find + restructuring up the tree, maintaining heights)   O(log n)


3.4 – Splay Trees
A splay tree is a self-balancing binary search tree with the additional property that recently accessed elements are quick to access again. It performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For many non-uniform sequences of operations, splay trees perform better than other search trees, even when the specific pattern of the sequence is unknown. It is conceptually different from AVL trees, as it does not have any explicit rules to enforce its balance.

Two things to remember:
- The tree might get more unbalanced.
- Splaying costs O(h), where h is the height of the tree – which is still O(n) in the worst case (O(h) rotations, each of which is O(1)).

3.4.1 Splaying

Each particular step depends on three factors:
- whether x is the left or right child of its parent node, p,
- whether p is the root or not, and if not,
- whether p is the left or right child of its parent, g (the grandparent of x).

Zig Step:

This step is done when p is the root. The tree is rotated on the edge between x and p. Zig steps exist to deal with the parity issue and will be done only as the last step in a splay operation and only when x has odd depth at the beginning of the operation.

Zig-zig Step:

This step is done when p is not the root and x and p are either both right children or are both left children. The picture below shows the case where x and p are both left children. The tree is rotated on the edge joining p with its parent g, then rotated on the edge joining x with p.


Zig-zag Step:

This step is done when p is not the root and x is a right child and p is a left child or vice versa. The tree is rotated on the edge between x and p, then rotated on the edge between x and its new parent g.

3.4.2 Amortized Analysis of Splaying

Amortization is worst-case analysis over all possible series of operations. The “amortized running time” of the operations is the average worst-case running time of the operations in the series. Amortization gives “average case” analysis without using probabilities.

It is done in an accounting way, by assigning “cyber euros” to operations. The main conclusions are listed below. For an in-depth proof, look at lecture slides 5a.

- Cost of insertion and deletion is also O(log n).- Cost of a series of m operations on a splay tree is O(m log n).- Thus, amortized cost of any splay operation is O(log n).- When items are accessed often, the amortized cost can decrease to O(1) (Theorem 3.11).


Chapter 4

4.1 Merge-Sort
4.1.1 Divide-and-Conquer
Merge-Sort is based on divide and conquer. The 3 steps of divide and conquer:

Divide: if the number of objects > threshold => divide (if n = 0 or 1, return the object immediately).
Recur: Recursively solve the subproblems.
Conquer: “Merge” the sub-solutions into a solution to the original problem.

Ceiling: ⌈x⌉ (the smallest integer ≥ x)

Floor: ⌊x⌋ (the largest integer ≤ x)

Theorem: The merge-sort tree associated with an execution of merge-sort on a sequence of size n has height ⌈log n⌉.

Merge: two sorted sequences, S1 and S2, are merged by iteratively removing the smallest element from one of the two and adding it to the end of the output sequence S, until one of the two sequences is empty, at which point we copy the remainder of the other sequence to the output sequence. (fig. p. 222)
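A Java sketch of this merge step on two sorted lists of integers (the names MergeDemo and merge are my own):

import java.util.ArrayList;
import java.util.List;

// Merge step of merge-sort: repeatedly move the smaller front element to the output,
// then copy whatever remains of the non-empty sequence.
public class MergeDemo {
    static List<Integer> merge(List<Integer> S1, List<Integer> S2) {
        List<Integer> S = new ArrayList<>();
        int i = 0, j = 0;
        while (i < S1.size() && j < S2.size()) {
            if (S1.get(i) <= S2.get(j)) S.add(S1.get(i++));
            else S.add(S2.get(j++));
        }
        while (i < S1.size()) S.add(S1.get(i++));   // remainder of S1
        while (j < S2.size()) S.add(S2.get(j++));   // remainder of S2
        return S;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(1, 4, 6), List.of(2, 3, 7)));  // [1, 2, 3, 4, 6, 7]
    }
}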


Running time
Running time per level = O(n) (the divide part and the conquer part are linear).

Running time per level * Number of levels = total running time. O(n) * O(log n) = O(n log n)

4.1.2 Merge-Sort and Recurrence Equations
Another way to find the running time of the merge-sort algorithm; you can find it at page 224 of the book (I can't write it shorter).

4.2 The Set Abstract Data Type
Here we introduce the ADT set. A set is a container of distinct objects. That is, there are no duplicate elements in a set, and there is no order.

Sets and some of their uses
First we recall:
Union: A ∪ B = {x : x ∈ A or x ∈ B}
Intersection: A ∩ B = {x : x ∈ A and x ∈ B}
Subtraction: A – B = {x : x ∈ A and x ∉ B}

These operations are used when you, for example, enter 2 query words; then the intersection of the two result sets has to be computed.

ADT:
union(B): Replace A with (←) A ∪ B.
intersection(B): Replace A with (←) A ∩ B.
subtract(B): Replace A with (←) A – B.


4.2.1 A Simple Set Implementation
A generic version of the merge algorithm takes two sequences representing the input sets and constructs a sequence representing the output set, be it the union, intersection or subtraction of the input sets.

The generic algorithm iteratively examines and compares the current elements a and b of the input sequences (A and B) and finds out whether a < b, a = b or a > b. Which elements are copied to the output sequence depends on the operation; for a union:
if a < b: a goes to the output sequence, and the next element of A is evaluated
if a = b: a goes to the output sequence, and the next element of both A and B is evaluated
if a > b: b goes to the output sequence, and the next element of B is evaluated

Running Times
At each iteration:
- compare 2 items of the two input sets (A and B) – O(1)
- possibly copy an element to the output sequence – O(1)
- advance to the next element
=> O(nA + nB) = O(n)

Theorem: The set ADT can be implemented with an ordered sequence and a generic merge scheme that supports operations union, intersection and subtraction in O(n) time, where n denotes the sum of sizes of the sets involved.

4.3 Quick Sort
Three steps:
- Divide: if S is larger than 1 element, take a specific element of S (in practice we take the last element), which we call the pivot (x). Make three subsequences:
    - L, all elements < x
    - E, all elements = x
    - G, all elements > x
- Recur: Recursively sort L and G.
- Conquer: Merge (concatenate) L, E and G together, as in the sketch below.
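A compact Java sketch of quick-sort following the L / E / G division with the last element as pivot. This out-of-place version is my own simplification for clarity; in practice quick-sort is usually done in place:

import java.util.ArrayList;
import java.util.List;

// Quick-sort sketch following the L / E / G three-way split with the last element as pivot.
public class QuickSortDemo {
    static List<Integer> quickSort(List<Integer> S) {
        if (S.size() <= 1) return S;                          // divide only if |S| > 1
        int x = S.get(S.size() - 1);                          // pivot: the last element
        List<Integer> L = new ArrayList<>(), E = new ArrayList<>(), G = new ArrayList<>();
        for (int e : S) {
            if (e < x) L.add(e);
            else if (e == x) E.add(e);
            else G.add(e);
        }
        List<Integer> out = new ArrayList<>(quickSort(L));    // recur on L and G
        out.addAll(E);                                        // conquer: concatenate L, E, G
        out.addAll(quickSort(G));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(quickSort(List.of(5, 1, 4, 1, 3)));   // [1, 1, 3, 4, 5]
    }
}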

Like merge-sort, we can draw a binary tree of the recursive calls. But unlike merge-sort, the tree height can be linear (worst case). This happens when the sequence is already sorted (x will then be the biggest number every time).


Running time
- At each level, all elements have to be compared – O(n)
- The height of the tree = n (in the worst case) – O(n)
=> O(n) * O(n) = O(n²)

In the best case we get a merge-sort-like tree. This means that L and G are of roughly equal size, which results in a tree height of log n, which in turn results in O(n log n).

4.3.1 Randomized Quick-Sort
Instead of taking the last element of the sequence, we pick a random element of the sequence as the pivot. Using probability theory, the expected pivot will be close to the median of the whole sequence. This means that the expected height of the tree will be O(log n), which again results in O(n log n) expected running time.

4.4 A Lower Bound on Comparison-Based SortingTheorem: The running time of any comparison based algorithm for sorting an n-element sequence is Ω(n log n) in the worst case.

The running time of a comparison-based algorithm must be greater than or equal to the height of its decision tree. That height cannot be smaller than log(n!), because the tree must have at least n! leaves (one for every possible ordering of the input), and log(n!) is Ω(n log n).

4.5 Bucket-Sort and Radix-Sort
These algorithms are faster than O(n log n), but they require special assumptions about the input sequence to be sorted. Even so, such scenarios often arise in practice. In this section we consider the problem of sorting a sequence of items, each a key-element pair.

4.5.1 Bucket-Sort
The special assumption is that each element has an integer key in the range [0, N-1]. So we have a sequence S with integer keys in [0, N-1]. Now we create a second array, say B (the buckets), which has size N. We then place all the elements from S into B, where each cell B[key] is a bucket (a small sequence) holding the elements with that key. Finally we take all elements one by one from B, in order of the cells, and place them back in S. (This is necessary because the list does not always have as many elements as the largest key suggests: some buckets may be empty.)
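A Java sketch of bucket-sort for key-element items with keys in [0, N-1]; the Item record and all names are illustrative:

import java.util.ArrayList;
import java.util.List;

// Bucket-sort sketch: each cell of B is a bucket (a list), so equal keys keep their
// original order, which makes the sort stable.
public class BucketSortDemo {
    record Item(int key, String element) {}

    static void bucketSort(List<Item> S, int N) {
        List<List<Item>> B = new ArrayList<>();
        for (int i = 0; i < N; i++) B.add(new ArrayList<>());
        for (Item it : S) B.get(it.key()).add(it);    // phase 1: scatter into buckets
        S.clear();
        for (List<Item> bucket : B) S.addAll(bucket); // phase 2: gather buckets in key order
    }

    public static void main(String[] args) {
        List<Item> S = new ArrayList<>(List.of(
            new Item(3, "c"), new Item(0, "a"), new Item(3, "d"), new Item(1, "b")));
        bucketSort(S, 4);
        System.out.println(S);   // sorted by key; the two key-3 items keep their order
    }
}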

Stable sortingSuppose you have 2 items with the same key; they will have a specific order in the original array.Stable sorting means that they will have the same order after sorting.(And after each subsequent sorting. Elements do not move around if you sort the same sequence twice.)


4.5.2 Radix-Sort
Radix-sort is used for items with 2 keys (more generally, d keys).

Example S = ((3,3),(1,5),(2,5)...)

Radix-sort is actually just bucket-sort, but done twice (once per key). For pairs, the total-order relation (k1, l1) < (k2, l2) is defined as:
- k1 < k2, or
- k1 = k2 and l1 < l2

The order of the passes is important: suppose you want a lexicographically* ordered list; if you first sort on the first key and then on the second key, you get the wrong order (you have to sort on the second key first, and then on the first). (Examples at page 243.)
*) lexicographical = dictionary order

Running time O(d(n+N))

4.6 Comparison of Sorting Algorithms
- Insertion-Sort: if implemented well, running time of O(n + k) (k = number of inversions). Good for small sequences (fewer than 50 elements) and also quite effective for almost-ordered sequences, but the O(n²) worst case makes it a poor choice in general.

- Merge-Sort: running time O(n log n) in the worst case (optimal for comparison-based algorithms). Good for large sequences, because you can store parts in different places (if main memory is too small).

- Quick-Sort: a good choice if the sequence fits in main memory, but the O(n²) worst-case running time makes it less attractive in real-time applications where we must make guarantees on the time needed.

-Heap-Sort: So if your memory is big enough, and you need to finish on time, heap sort is an excellent choice. It has a running time of O(n log n) and it can easily be made to execute in-place.

- Bucket-Sort or Radix-Sort: an excellent choice, for it runs in O(d(n + N)), where [0, N-1] is the range of the keys (and d = 1 for bucket-sort). If d(n + N) is “below” n log n (formally, o(n log n)), then this algorithm runs faster than even quick-sort or heap-sort.


Chapter 5: Fundamental Techniques

5.2 Divide-and-Conquer

Divide-and-Conquer is a technique that involves solving a problem by dividing it into smaller subproblems, solving each subproblem, and merging these solutions into one solution.

5.2.1 Divide-and-Conquer Recurrence Equations
With a recurrence equation we determine the running time of an algorithm, with the input size n as the variable. The problem is that in a recurrence equation the original function T still appears on the right-hand side of the equation, and we want an expression that depends only on n. We call that the closed-form equation. There are some general ways of solving such an equation for divide-and-conquer algorithms.

Iterative substitution: Here you try to substitute the function T a couple of times and hope that we will see a pattern so it can be translated into a closed form equation.

Recursion tree: This technique is almost the same as iterative substitution. The only difference is that the recursion tree is more visual, while iterative substitution is more algebraic. In this method you draw a tree, with a node for each substitution. In addition, every node has an overhead, which corresponds to the running time of merging the results of all children of that node.

Guess-and-Test: In this method you make a guess of what the closed form could be and then try to justify that guess by induction ( An example at page 266 of the book).

Master method: This method contains rules for what you should do in some cases. This cannot be summarized. If you want to study this method go to page 268 of the book.

The next two subsections are applications of the master method. They cannot be summarized, because when you want to understand this you have to read the whole text.

5.3 Dynamic Programming

In the book is stated that dynamic programming cannot be explained in a few sentences and they give an example. This will be shortly explained.

5.3.1 Matrix Chain-Product
The matrix chain-product problem is to find a way of parenthesizing the product A of a chain of matrices such that you minimize the number of (scalar) multiplications. One way to do this is just to try every different parenthesization and count the number of multiplications. Of course we want to improve on this, and we start by defining subproblems: for example, you can first find out for every pair of consecutive matrices how many multiplications are needed. Another observation is that every subproblem has an optimal solution; this is called the subproblem optimality condition. We can't simply divide the problem into independent subproblems, because there is sharing of subproblems. This is why we use dynamic programming instead of divide-and-conquer.

5.3.2 The General Technique
Dynamic programming is mostly used for optimization problems. Often the number of ways of solving such a problem is exponential, so brute force isn't possible. When we do dynamic programming there are three components which have to be taken into account:


Simple Subproblems: all subproblems have the same structure and there is a simple way to define them (for example with a few indices).

Subproblem Optimality: an optimal solution to the global problem must be composed of optimal solutions to subproblems; the global solution should not contain any suboptimal subproblem solutions.

Subproblem Overlap: optimal solutions to unrelated subproblems can contain identical subproblems in common. This overlap is what makes it pay off for a dynamic-programming algorithm to store the solutions to subproblems and reuse them.

This subsection also contains another example of dynamic programming, the knapsack problem; a minimal sketch is given below.
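The sketch below covers the 0/1 knapsack dynamic program under the usual assumptions (integer weights, integer capacity W); the names are mine, not the book's.

```java
// Minimal 0/1 knapsack dynamic program: pseudo-polynomial in the capacity W.
public class Knapsack {

    /** Maximum total benefit of a subset of items with total weight <= W. */
    public static int maxBenefit(int[] weight, int[] benefit, int W) {
        // B[w] = best benefit achievable with total weight at most w,
        // using the items considered so far.
        int[] B = new int[W + 1];
        for (int i = 0; i < weight.length; i++) {
            // Go through capacities downwards so each item is used at most once.
            for (int w = W; w >= weight[i]; w--) {
                B[w] = Math.max(B[w], B[w - weight[i]] + benefit[i]);
            }
        }
        return B[W];
    }

    public static void main(String[] args) {
        int[] w = {2, 3, 4};
        int[] b = {3, 4, 6};
        System.out.println(maxBenefit(w, b, 5)); // 7 (take the items of weight 2 and 3)
    }
}
```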


Chapter 6: Graphs

6.1 The Graph ADT

A graph is a way to represent connections between objects. The objects are stored in the vertices (nodes) and the connections are represented by edges (arcs). Edges are either directed or undirected: a directed edge can only be traversed in one direction, an undirected edge in both. A graph with only directed edges is called a directed graph, a graph with only undirected edges an undirected graph, and a graph with both a mixed graph.

The two vertices at the ends of an edge are called its end vertices (or endpoints). If the edge is directed, its start point is called the origin and its endpoint the destination. Two vertices that are endpoints of the same edge are adjacent, and an edge is incident on its endpoints. An edge whose origin is a vertex v is an outgoing edge of v; an edge whose destination is v is an incoming edge of v. The degree of a vertex is the number of its incident edges; the in-degree and out-degree are the numbers of incoming and outgoing edges. The edges of a graph form a collection rather than a set: two or more edges with the same endpoints are parallel (or multiple) edges, and a self-loop is an edge whose two endpoints are the same vertex. A graph without parallel edges and self-loops is said to be simple.

A path is the sequence of vertices and edges visited when travelling from one vertex to another; a cycle is a path that starts and ends at the same vertex. A path or cycle is simple if all of its vertices are distinct. If all edges in a path or cycle are directed, it is a directed path or cycle. A subgraph is a graph whose vertices and edges are subsets of those of another graph; a spanning subgraph uses all vertices of the other graph. A graph is connected when there is a path between any two vertices; if a graph is not connected, its connected components are its maximal connected subgraphs. A forest is a graph without cycles, a tree is a connected forest, and a spanning tree is a spanning subgraph that is a tree.

6.1.1 Graph Methods

Notation: Graph G; Vertices v, w; Edge e; Object o

General methods:
- numVertices(): Return the number of vertices of G.
- numEdges(): Return the number of edges of G.
- vertices(): Return an enumeration of the vertices of G.
- edges(): Return an enumeration of the edges of G.
- aVertex(): Return a vertex of G.
- directedEdges(): Return an enumeration of all directed edges in G.
- undirectedEdges(): Return an enumeration of all undirected edges in G.
- incidentEdges(v): Return an enumeration of all edges incident on v.
- inIncidentEdges(v): Return an enumeration of all the incoming edges to v.
- outIncidentEdges(v): Return an enumeration of all the outgoing edges from v.
- opposite(v, e): Return an endpoint of e distinct from v.
- degree(v): Return the degree of v.
- inDegree(v): Return the in-degree of v.
- outDegree(v): Return the out-degree of v.
- adjacentVertices(v): Return an enumeration of the vertices adjacent to v.
- inAdjacentVertices(v): Return an enumeration of the vertices adjacent to v along incoming edges.
- outAdjacentVertices(v): Return an enumeration of the vertices adjacent to v along outgoing edges.
- areAdjacent(v, w): Return whether vertices v and w are adjacent.
- endVertices(e): Return an array of size 2 storing the end vertices of e.
- origin(e): Return the end vertex from which e leaves.
- destination(e): Return the end vertex at which e arrives.
- isDirected(e): Return true iff e is directed.


Update Methods:
- makeUndirected(e): Set e to be an undirected edge.
- reverseDirection(e): Switch the origin and destination vertices of e.
- setDirectionFrom(e, v): Set the direction of e away from v, one of its end vertices.
- setDirectionTo(e, v): Set the direction of e toward v, one of its end vertices.
- insertEdge(v, w, o): Insert and return an undirected edge between v and w, storing o at this position.
- insertDirectedEdge(v, w, o): Insert and return a directed edge between v and w, storing o at this position.
- insertVertex(o): Insert and return a new (isolated) vertex storing o at this position.
- removeEdge(e): Remove edge e.

6.2 Data Structures for Graphs

The three most popular ways to realize the graph ADT are described below.

6.2.1 The Edge List Structure

The edge list structure uses two different kinds of objects:

Vertex objects:

o A reference to the object stored

o Counters for the number of incident undirected edges, incoming directed edges and outgoing directed edges

o A reference to the position of the vertex object in the container V

Edge objects:

o A reference to the object stored

o A Boolean indicating whether the edge is directed or undirected

o References to the vertex objects in V of its endpoints (or of its origin and destination, if directed)

o A reference to the position of the edge object in the container E.

The edge list is a very simple implementation, but not a very efficient one: it records the edge-vertex incidences only from the point of view of the edges, so vertex-centered operations such as incidentEdges(v) require scanning the whole edge container.


6.2.2 The Adjacency List Structure

Vertex objects:

o All the variables mentioned for the edge list structure

o An incidence container, which stores references to the edges incident on the vertex

Edge objects:

o All the variables mentioned for the edge list structure

o References to the positions of the edge object in the incidence containers of its end vertices

Also a relatively simple implementation, and more efficient than the edge list, because it records the incidences from the point of view of both the edges and the vertices. A minimal sketch is given below.
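A minimal Java sketch of the adjacency-list idea, simplified to an undirected graph whose vertices are the integers 0..n-1. The book's version stores full vertex and edge objects; this is only an illustration, and the names are mine.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified adjacency-list representation of an undirected graph.
public class AdjacencyListGraph {
    private final List<List<Integer>> adj;   // adj.get(v) = vertices adjacent to v

    public AdjacencyListGraph(int numVertices) {
        adj = new ArrayList<>();
        for (int v = 0; v < numVertices; v++) adj.add(new ArrayList<>());
    }

    public void insertEdge(int v, int w) {   // undirected: record both directions
        adj.get(v).add(w);
        adj.get(w).add(v);
    }

    public List<Integer> adjacentVertices(int v) {  // O(deg(v)) to enumerate
        return adj.get(v);
    }

    public boolean areAdjacent(int v, int w) {
        // Scan the shorter adjacency list: O(min(deg(v), deg(w))).
        if (adj.get(v).size() > adj.get(w).size()) { int t = v; v = w; w = t; }
        return adj.get(v).contains(w);
    }
}
```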

6.2.3 The Adjacency Matrix Structure

Extends the edge list structure with a matrix (a 2-dimensional array), which makes it possible to determine adjacencies between pairs of vertices in constant time, but uses more space.

Vertex objects:

o All the variables mentioned for the edge list structure

o A distinct integer, called the index of the vertex

Edge objects:

o All the variables mentioned for the edge list structure

In addition, the structure keeps a 2-dimensional array A such that A[i, j], where i and j are the indices of two vertices, stores a reference to the edge between them if such an edge exists. If the edge is undirected, the same reference is also stored at A[j, i]. If there is no edge, A[i, j] is null. A minimal sketch follows.
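A correspondingly minimal sketch of the adjacency-matrix idea, showing the constant-time areAdjacent at the cost of O(n^2) space (again a simplification, not the book's full ADT):

```java
// Simplified adjacency matrix for an undirected graph with vertices 0..n-1.
public class AdjacencyMatrixGraph {
    private final Object[][] A;               // A[i][j] = edge between i and j, or null

    public AdjacencyMatrixGraph(int n) {
        A = new Object[n][n];                 // O(n^2) space, even for sparse graphs
    }

    public void insertEdge(int i, int j, Object edge) {
        A[i][j] = edge;
        A[j][i] = edge;                        // undirected: mirror the entry
    }

    public boolean areAdjacent(int i, int j) { // constant time
        return A[i][j] != null;
    }
}
```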

6.3 Graph Traversal

6.3.1 Depth-First Search

DFS explores the graph and backtracks when it gets stuck. Edges that lead to a not-yet-visited vertex are called tree edges (or discovery edges); edges that lead to an already visited vertex are called back edges (in a BFS the non-tree edges are called cross edges instead). The tree edges together form a spanning tree of the connected component of the start vertex, called the DFS tree. A recursive sketch follows.
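A recursive DFS sketch over an adjacency-list representation like the one in 6.2.2; edge labelling is omitted and only vertices are marked as visited. The names are mine.

```java
import java.util.List;

// Recursive depth-first search over adjacency lists.
public class DepthFirstSearch {

    /** Visits every vertex reachable from 'start'. */
    public static void dfs(List<List<Integer>> adj, int start, boolean[] visited) {
        visited[start] = true;
        // Each edge to an unexplored neighbour is a discovery (tree) edge;
        // edges to already visited vertices would be back edges.
        for (int next : adj.get(start)) {
            if (!visited[next]) {
                dfs(adj, next, visited);
            }
        }
        // Returning from the recursion is the "backtracking" step.
    }
}
```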


In comparison, BFS is better at finding shortest paths (see 6.3.3).

6.3.2 Biconnected Components

A separation edge (or separation vertex) is an edge (or vertex) whose removal disconnects the graph. A graph is biconnected if for any two vertices there are two disjoint paths between them; such a graph has no separation edges or vertices. A biconnected component of a graph G is one of the following:

A maximal subgraph of G that is biconnected (adding any other vertex or edge of G to it would make it no longer biconnected).

A single edge of G consisting of a separation edge and its endpoints.

6.3.3 Breadth-First Search

BFS subdivides the vertices into levels according to their distance from the start vertex, which is why it is better at finding shortest paths (in number of edges). DFS, in contrast, is better suited for more complex connectivity problems such as finding the biconnected components of the previous subsection. A sketch that records the levels follows.
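A queue-based BFS sketch that records the level of each vertex; the level equals the length of a shortest path from the start vertex, and unreachable vertices keep level -1. The representation is the same simplified adjacency list as before.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Breadth-first search that labels vertices with their level (distance).
public class BreadthFirstSearch {

    public static int[] levels(List<List<Integer>> adj, int start) {
        int[] level = new int[adj.size()];
        Arrays.fill(level, -1);                // -1 = not reached yet
        Queue<Integer> queue = new ArrayDeque<>();
        level[start] = 0;
        queue.add(start);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            for (int w : adj.get(v)) {
                if (level[w] == -1) {          // first time we reach w
                    level[w] = level[v] + 1;   // w belongs to the next level
                    queue.add(w);
                }
            }
        }
        return level;                          // level[w] = shortest-path length to w
    }
}
```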

6.4 Directed Graphs

A digraph is another word for a directed graph. Reachability describes which vertices can be reached from which: a vertex v is reachable from a vertex u if there is a directed path from u to v. If any two vertices are mutually reachable, the digraph is strongly connected. A directed path that starts and ends at the same vertex is a directed cycle; a digraph without directed cycles is acyclic. The transitive closure of a digraph G is the digraph G* such that the vertices of G* are the same as the vertices of G, and G* has an edge (u, v) whenever G has a directed path from u to v. That is, we define G* by starting with the digraph G and adding an extra edge (u, v) for each u and v such that v is reachable from u.


6.4.1 Traversing a Digraph

In a DFS of a digraph we distinguish three kinds of non-tree edges:

Back edges: connect a vertex to an ancestor in the DFS tree.

Forward edges: connect a vertex to a descendant in the DFS tree (these do not occur in a BFS).

Cross edges: connect a vertex to a vertex that is neither its ancestor nor its descendant.

6.4.2 Transitive Closure

The transitive closure can be computed with dynamic programming. The subproblems ask, for every pair of vertices, whether one can be reached from the other using only a restricted set of intermediate vertices; whenever such a path is found and the corresponding edge does not exist yet, the edge is added. A sketch in the style of the Floyd-Warshall algorithm is given below.
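A sketch of this dynamic program on a boolean reachability matrix (my formulation; the book presents it with graph methods rather than a matrix):

```java
// Floyd-Warshall-style transitive closure on a boolean adjacency matrix.
public class TransitiveClosure {

    /** Modifies 'reach' in place so that reach[i][j] becomes true
     *  whenever there is a directed path from i to j. */
    public static void close(boolean[][] reach) {
        int n = reach.length;
        // Allow one extra intermediate vertex k at a time.
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    // A path i -> k together with a path k -> j gives the closure edge (i, j).
                    if (reach[i][k] && reach[k][j]) {
                        reach[i][j] = true;
                    }
                }
            }
        }
    }
}
```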

6.4.3 DFS and Garbage Collection

Once in a while the JVM checks whether there is enough space left in the memory heap. If there is not, the garbage collector starts reclaiming the space used by dead objects. This can be done with a mark-sweep algorithm: the memory heap is viewed as a digraph (the objects are the vertices and the references between them are the edges), and DFS is used to find and mark the objects that are still live (the mark phase). After that, the objects that are not marked are deleted (the sweep phase).

6.4.4 Directed Acyclic Graphs

A topological ordering of a digraph is a numbering v1, ..., vn of its vertices such that for every edge (vi, vj) we have i < j. A digraph has a topological ordering if and only if it is acyclic; a sketch of how to compute one is given below.
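One common way to compute a topological ordering (not necessarily the exact algorithm in the book) is to repeatedly output a vertex with in-degree zero and remove its outgoing edges:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Topological ordering of a DAG given as adjacency lists of outgoing edges.
public class TopologicalSort {

    public static List<Integer> order(List<List<Integer>> out) {
        int n = out.size();
        int[] inDegree = new int[n];
        for (List<Integer> targets : out)
            for (int w : targets) inDegree[w]++;

        Queue<Integer> ready = new ArrayDeque<>();   // vertices with no incoming edges
        for (int v = 0; v < n; v++)
            if (inDegree[v] == 0) ready.add(v);

        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int v = ready.remove();
            order.add(v);
            for (int w : out.get(v))                 // "remove" v's outgoing edges
                if (--inDegree[w] == 0) ready.add(w);
        }
        return order;   // contains all n vertices iff the digraph is acyclic
    }
}
```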


Chapter 9: Text Processing

9.1 Strings and Pattern Matching Algorithms

9.1.1 String Operations

A substring is a consecutive part of a string; a proper substring is a substring that is not equal to the whole string (this rules out a string being a substring of itself). An empty string is called a null string. A substring consisting of the first part of the string is called a prefix, and one consisting of the last part of the string is called a suffix.

9.1.2 Brute Force Pattern Matching

Brute-force pattern matching simply tests all possible placements of the pattern relative to the text. It is very simple and runs with a double loop, giving a running time of O(nm) for a text of length n and a pattern of length m. A sketch is given below.
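A sketch of the brute-force matcher described above (the method names are mine):

```java
// Brute-force pattern matching: try every placement of the pattern in the text.
public class BruteForceMatch {

    /** Returns the index of the first occurrence of 'pattern' in 'text', or -1. */
    public static int indexOf(String text, String pattern) {
        int n = text.length(), m = pattern.length();
        for (int i = 0; i <= n - m; i++) {        // candidate starting position in the text
            int j = 0;
            while (j < m && text.charAt(i + j) == pattern.charAt(j)) {
                j++;                              // characters match so far
            }
            if (j == m) return i;                 // all m characters matched
        }
        return -1;                                // no placement matched
    }
}
```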

9.1.3 The Boyer-Moore Algorithm

The brute-force algorithm can be improved with two time-saving heuristics:

Looking-Glass Heuristic: comparisons begin at the last character of the pattern and move backwards towards the front.

Character-Jump Heuristic: if a mismatch occurs at a text character c, the pattern is shifted so that its last occurrence of c lines up with that text character; if c does not occur in the pattern at all, the pattern is shifted entirely past c.

9.1.4 The Knuth-Morris-Pratt Algorithm (KMP)

The KMP algorithm works with a failure function that pre-examines the pattern and calculates how far the pattern may shift after a mismatch without skipping possible matches. How this function works exactly is not clearly explained in the text; roughly, f(j) is the length of the longest prefix of the pattern that is also a suffix of the pattern read up to position j, and a sketch of its computation is given below.
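A sketch of how the failure function can be computed in O(m) time (my own formulation of the standard computation, using 0-based indices):

```java
// KMP failure function: fail[j] is the length of the longest proper prefix of
// the pattern that is also a suffix of pattern[0..j].
public class KmpFailureFunction {

    public static int[] compute(String pattern) {
        int m = pattern.length();
        int[] fail = new int[m];                  // fail[0] = 0
        int j = 1, k = 0;                         // k = length of the currently matched prefix
        while (j < m) {
            if (pattern.charAt(j) == pattern.charAt(k)) {
                fail[j] = k + 1;                  // the matched prefix grows by one character
                j++; k++;
            } else if (k > 0) {
                k = fail[k - 1];                  // fall back to a shorter matching prefix
            } else {
                fail[j] = 0;                      // no prefix matches at this position
                j++;
            }
        }
        return fail;                              // e.g. "abacab" -> [0, 0, 1, 0, 1, 2]
    }
}
```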

9.2 Tries

A trie is a tree-based data structure for pattern matching and prefix matching. The main idea is that, for a given pattern P, the tree is searched for stored strings that have P as a prefix.

9.2.1 Standard Tries

A standard trie has the following properties (a minimal insertion sketch follows the list):

Each node, except the root, is labelled with a character.

The ordering of the children of a node is determined by a canonical ordering of the alphabet.

The external nodes correspond to the last letters of the stored words: the path from the root to an external node spells the string.
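A minimal standard-trie sketch in Java; for simplicity it marks word-ending nodes with a flag instead of using separate external nodes, so it is a simplification of the book's structure and the names are mine.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified standard trie: each node maps characters to children.
public class Trie {
    private static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isWord;                       // true if a stored word ends at this node
    }

    private final Node root = new Node();

    public void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, x -> new Node());
        }
        node.isWord = true;
    }

    /** Returns true iff some stored word starts with 'prefix'. */
    public boolean hasPrefix(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return false;   // the prefix falls off the trie
        }
        return true;
    }
}
```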


9.2.2 Compressed Tries

A compressed trie is only advantageous if an auxiliary structure is used: the words are stored in an array of strings, and the nodes store references into that array instead of the characters themselves. (The book shows an example of such an auxiliary index: the first number in a node identifies the array entry it refers to, and the second and third numbers delimit the substring within that entry.)

9.2.3 Suffix Tries

A suffix trie represents all suffixes of a string. (The book shows an example: the suffix trie of a string together with its compact representation; the figures are not reproduced here.)

9.2.4 Search Engines

A Web crawler is a program that gathers information from web pages; search engines make it possible to retrieve that information. An inverted file stores the gathered information in a dictionary: the entries are pairs consisting of a key word and references to the web pages containing that word. The key words are called index terms and the lists of references to web pages are called occurrence lists. Of course, a basic task of search engines is also to rank the results, and doing this quickly and accurately is still a major challenge for the companies involved. A toy inverted file is sketched below.
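A toy inverted file, assuming pages are identified by integer ids and represented by arrays of words (purely illustrative; real occurrence lists also store positions and ranking information):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy inverted file: each index term maps to its occurrence list of page ids.
public class InvertedIndex {
    private final Map<String, List<Integer>> occurrences = new HashMap<>();

    public void addPage(int pageId, String[] words) {
        for (String word : words) {
            occurrences.computeIfAbsent(word.toLowerCase(), w -> new ArrayList<>())
                       .add(pageId);
        }
    }

    /** Returns the occurrence list for a term (empty if the term is unknown). */
    public List<Integer> pagesContaining(String term) {
        return occurrences.getOrDefault(term.toLowerCase(), Collections.emptyList());
    }
}
```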


Chapter 12: Computational Geometry

12.1 Range Trees

A range-search query asks to retrieve all points in a multi-dimensional collection whose coordinates fall within given ranges. To keep it simple we discuss 2-dimensional range-search queries. They have a method findAllInRange(x1, x2, y1, y2) that returns all the elements in the given coordinate ranges; this is called the reporting version of the query. There is also a counting version, which only counts the number of such elements.

12.1.1 One-Dimensional Range Searching

This is done, as explained above, with a findAllInRange method, except that only a single range of keys [k1, k2] is given. The search recurs through the range tree from the root; at every node v there are three possibilities (a recursive sketch follows the list):

key(v) < k1: recur to the right child of v.

k1 <= key(v) <= k2: report the element and recur to both children.

key(v) > k2: recur to the left child of v.
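A recursive sketch of the three cases on a plain binary search tree (the class names are mine; the book works with its ordered dictionary ADT instead):

```java
import java.util.List;

// Reporting version of one-dimensional range search on a binary search tree.
public class RangeSearch {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    /** Collects all keys k with k1 <= k <= k2 into 'out'. */
    public static void findAllInRange(Node v, int k1, int k2, List<Integer> out) {
        if (v == null) return;
        if (v.key < k1) {
            findAllInRange(v.right, k1, k2, out);      // everything useful is to the right
        } else if (v.key > k2) {
            findAllInRange(v.left, k1, k2, out);       // everything useful is to the left
        } else {
            out.add(v.key);                            // k1 <= key <= k2: report it
            findAllInRange(v.left, k1, k2, out);       // and recur into both children
            findAllInRange(v.right, k1, k2, out);
        }
    }
}
```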

In the search we recognize three kinds of nodes:

Boundary nodes: nodes on the search paths P1 (towards k1) and P2 (towards k2); their keys do not necessarily belong to the interval.

Inside nodes: All nodes inside the interval.

Outside nodes: nodes that are left children of nodes on path P1 or right children of nodes on path P2; their items fall outside the interval.

12.1.2 Two-Dimensional Range Searching

A two-dimensional range tree consists of a primary structure, which is a tree, and auxiliary structures. The primary structure is organized by the x-coordinates. Every node of it stores:

An item, which consists of coordinates and an element.

A one-dimensional range tree (the auxiliary structure) that stores the same items as the node's subtree, but uses the y-coordinates as keys.


12.3 Quadtrees and k-D Trees

12.3.1 Quadtrees

A main application of quadtrees is storing a set of points in the plane, for example points taken from an image. The bounding square of the point set is divided into four equal quadrants; dividing a square in this way is called a split. A quadtree is defined by recursively performing splits.

12.3.2 k-D Trees

The difference between a k-D tree and a quadtree is that in a k-D tree a split operation is performed with a single line perpendicular to one axis, whereas in a quadtree a single split can draw several lines at once. There are two kinds of k-D trees, region-based and point-based: region-based k-D trees are the variant closest to quadtrees, while point-based k-D trees perform splits based on the distribution of the points.