Applications
• Coding
– Huffman coding, prefix codes
• Parsing/Compiling
– tree is standard internal representation for code
• Information Storage/Retrieval
– binary trees, AA-trees, AVL, Red-Black, Splay
• Game-Playing (Scenario analysis)
– virtual trees
– alpha-beta search
• Decision Trees
– representation of choices
– automatically constructed from data
General Trees
• Tree Definition
– distinguished root node
– all other nodes have a unique, sole parent
• Depth of a node:
– number of edges from root to node
• Height of a node:
– number of edges from node to deepest descendant
• Balanced:
– Goal: O(log n) insert/delete/find
– heights of the sons of any node differ by at most 1 (or by at most some constant k)
• K-arity:
– nodes have at most k sons
Depth of a Node
[Figure: example tree with each node labeled by its depth: root 0, its sons 1, grandsons 2]
Often convenient to add another field to node structure for additional information such as: depth, height, visited, cost, father, number of visits, number of nodes below, etc.
Simple Relationships
• Leaf <=> height is 0
• Height of a node is 1 + maximum height of its sons
• Root <=> depth is 0
• Depth of a node is 1 + depth of its father
• These can be computed recursively.
Three Tree Representations
• List: (variable number of children)
– son representation
– Object value;
– NodeList children;
• Sibling: (variable number of children)
– Sibling representation
– Object value;
– Node child; // the leftmost child
– Node sibling; // each node points to its next sibling
• Array (k is bound on number of children)
– Object value;
– Node[k] children;
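The two variable-arity representations can be written out as runnable Java (a minimal sketch; class and field names are illustrative, not from the slides):

```java
import java.util.ArrayList;
import java.util.List;

// "Son" (list) representation: each node holds a list of children.
class ListNode {
    Object value;
    List<ListNode> children = new ArrayList<>();
    ListNode(Object value) { this.value = value; }
}

// Sibling representation: one pointer to the leftmost child,
// one pointer to the next sibling on the right.
class SiblingNode {
    Object value;
    SiblingNode child;    // the leftmost child
    SiblingNode sibling;  // next sibling
    SiblingNode(Object value) { this.value = value; }
}

public class TreeReps {
    // Count nodes in the sibling representation, to show how traversal works:
    // one recursive call down (child), one across (sibling).
    static int count(SiblingNode n) {
        if (n == null) return 0;
        return 1 + count(n.child) + count(n.sibling);
    }

    public static void main(String[] args) {
        SiblingNode root = new SiblingNode("r");
        root.child = new SiblingNode("a");
        root.child.sibling = new SiblingNode("b");
        System.out.println(count(root)); // prints 3
    }
}
```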
Depth of node (list rep)
• Recall depth(node) is the number of links from node to root.
• Idea:
– depth of sons is 1+ depth of father
• call depth(root, 0)
Define depth(node n,int d)
mark depth at node n = d
for each son of n, call depth(son,d+1) (use iterator)
• Marking can be done in two ways:
– have an addition field (int depth) for each node
– have an array int depth[number of nodes]
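The marking idea can be made runnable; this sketch uses the first marking option, an extra int field per node (class names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class DepthList {
    static class Node {
        List<Node> children = new ArrayList<>();
        int depth; // extra field used for marking
    }

    // Mark every node with its depth: each son is one level deeper than its father.
    static void depth(Node n, int d) {
        n.depth = d;
        for (Node son : n.children) depth(son, d + 1);
    }

    public static void main(String[] args) {
        Node root = new Node(), a = new Node(), b = new Node(), c = new Node();
        root.children.add(a);
        root.children.add(b);
        a.children.add(c);
        depth(root, 0);              // call depth(root, 0)
        System.out.println(c.depth); // prints 2
    }
}
```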
Depth of node (sibling rep)
• Compute the depth of a node
• Recall depth(node) is the number of links from node to root.
• Idea:
– depth of left son is 1+ depth of father
– depth of siblings is same as depth of father
• Call depth(root, 0)
• Define depth(node n, int d)
mark depth at node n as d
call depth(n.leftson,d+1)
call depth(n.sibling, d)
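The same idea in the sibling representation; a null check is added so the recursion terminates (class names are illustrative):

```java
public class DepthSibling {
    static class Node {
        Node child, sibling;
        int depth;
    }

    // Leftmost child is one level deeper; siblings stay at the father's level.
    static void depth(Node n, int d) {
        if (n == null) return;
        n.depth = d;
        depth(n.child, d + 1);
        depth(n.sibling, d);
    }

    public static void main(String[] args) {
        Node root = new Node();
        root.child = new Node();
        root.child.sibling = new Node();
        root.child.child = new Node();
        depth(root, 0);
        System.out.println(root.child.child.depth);   // prints 2
        System.out.println(root.child.sibling.depth); // prints 1
    }
}
```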
Height of Node
• List representation:
– if node is leaf, height = 0;
– else height = 1 +max(height of sons)
• Sibling representation
– if node is leaf, height = 0;
– else height = max (1 + height of leftson, max of heights of siblings)
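For the list representation, both cases collapse into one line if the running maximum starts at -1, so a leaf returns 1 + (-1) = 0 (a minimal sketch; names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class HeightList {
    static class Node {
        List<Node> children = new ArrayList<>();
    }

    // Leaf: no sons, so max stays -1 and height = 1 + (-1) = 0.
    // Otherwise: 1 + maximum height of the sons.
    static int height(Node n) {
        int max = -1;
        for (Node son : n.children) max = Math.max(max, height(son));
        return 1 + max;
    }

    public static void main(String[] args) {
        Node root = new Node(), a = new Node(), b = new Node();
        root.children.add(a);
        a.children.add(b);
        System.out.println(height(root)); // prints 2
        System.out.println(height(b));    // prints 0
    }
}
```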
Virtual Trees
• Trees are often conceptual objects, but take too much room to store. Store only what is needed.
• Representation:
• Node:
– object value
– Node nextSon(): returns null if no more sons, else returns the next son
• In this representation you generate sons on the fly
• E.g., in game-playing searches, typically only about depth-of-tree many nodes (the current path) are stored.
Standard Operations
• Copying
• Traversals
– preorder, inorder, postorder, level-order
– illustrated with printing, but any processing ok
• Find (Object o)
• Insertion(Object o)
• Deletion(Object o)
• Complexity of these operations varies with constraints / structure of tree that must be preserved.
Binary Trees
• Object Representation: node has
– Object value;
– Node left, right;
• Array Representation
– use Object[]
– requires you know size of tree, or use growable arrays
– no pointer overhead
– Trick: if node is stored at i, then
• left son stored at 2*i
• right son stored at 2*i+1
– root stored at 1
– father of node i is at i/2
• Generalizes to k-ary trees naturally.
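The index trick can be demonstrated directly; the array below stores a small complete binary tree level by level, leaving index 0 unused (an illustrative sketch):

```java
public class ArrayTree {
    // 1-based array storage: root at index 1,
    // left son of i at 2*i, right son at 2*i+1, father of i at i/2.
    static int left(int i)   { return 2 * i; }
    static int right(int i)  { return 2 * i + 1; }
    static int father(int i) { return i / 2; }

    public static void main(String[] args) {
        // Complete tree with root 4, sons 2 and 6, grandsons 1, 3, 5, 7,
        // stored level by level; index 0 is unused.
        Integer[] a = { null, 4, 2, 6, 1, 3, 5, 7 };
        System.out.println(a[left(2)]);   // left son of the node at index 2: prints 1
        System.out.println(a[father(7)]); // father of the node at index 7: prints 6
    }
}
```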
Binary Search Trees
• Left < Right
– i.e. any descendant of a node in left is less than any descendant of a node in right.
• Operations: let d be depth of tree
– object find(key k)
• sometimes key and object are the same
– insert(object o) or insert(key k, object o)
– Object findMin()
– removeMin()
– removeElement(object o)
– Cost: all O(d) via separate and conquer
Removing elements is tricky
• How would you remove value at root?
• Plan for remove(object o)
1. Find o, i.e. let n be node in tree with value o
2. Keep a ptr to the father of n
3. If ( n.right == null) ptr.son = n.left // not code
4. Else
a. find min in n.right
b. remove min from n.right
c. ptr.son = new node(min, n.left, n.right)
Assumes appropriate constructor.
Make pictures of the cases.
Support routines
• BinaryNode findMin(BinaryNode n)
• Recursively
– if (n.left == null) return n
– else return findMin(n.left)
– O(d) Time and Space
• BinaryNode findMin(BinaryNode n)
• Iteratively
– while ( n.left !=null) n= n.left
– return n
– O(d) Time, O(1) space
Remove Min
• removeMin(BinaryNode n): idea
– Node n’ = n.findMin()
– father(n’).right = n.right
– // idea ok, code not right
– What if minimum is root?
• BinaryNode removeMin(BinaryNode n)
– if (n.left != null)
• n.left = removeMin(n.left)
– else
• n = n.right
– return n
removeNode
• BinaryNode removeNode(Comparable x, BinaryNode n) // remove x from tree rooted at n
if (x.compareTo(n.data) < 0) n.left = removeNode(x, n.left)
else if (x.compareTo(n.data) > 0) n.right = removeNode(x, n.right)
// Now x equals n.data
else if (n.left != null && n.right != null) // two sons
n.data = findMin(n.right).data
n.right = removeMin(n.right)
else // left or right is empty
n = (n.left != null) ? n.left : n.right
return n
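The support routines and removal logic can be assembled into one runnable sketch (integer keys assumed for simplicity; class and method names are illustrative):

```java
public class Bst {
    static class Node {
        int data;
        Node left, right;
        Node(int d) { data = d; }
    }

    static Node insert(Node n, int x) {
        if (n == null) return new Node(x);
        if (x < n.data) n.left = insert(n.left, x);
        else if (x > n.data) n.right = insert(n.right, x);
        return n;
    }

    static Node findMin(Node n) {            // iterative: O(d) time, O(1) space
        while (n.left != null) n = n.left;
        return n;
    }

    static Node removeMin(Node n) {
        if (n.left != null) n.left = removeMin(n.left);
        else n = n.right;                     // min has no left son
        return n;
    }

    static Node removeNode(Node n, int x) {
        if (n == null) return null;           // x not found
        if (x < n.data) n.left = removeNode(n.left, x);
        else if (x > n.data) n.right = removeNode(n.right, x);
        else if (n.left != null && n.right != null) { // two sons
            n.data = findMin(n.right).data;           // replace with successor
            n.right = removeMin(n.right);
        } else {                              // at most one son
            n = (n.left != null) ? n.left : n.right;
        }
        return n;
    }

    static String inorder(Node n) {           // sorted order, for checking
        return n == null ? "" : inorder(n.left) + n.data + " " + inorder(n.right);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int x : new int[] {6, 2, 7, 1, 4, 3}) root = insert(root, x);
        root = removeNode(root, 2);               // removes a node with two sons
        System.out.println(inorder(root).trim()); // prints 1 3 4 6 7
    }
}
```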
Find a node (three meanings)
• Search tree:
– given a node id, find id in tree.
• Search tree:
– find a node with a specific property, e.g.
– kth largest element (Order Statistic)
– separate and conquer answers these in O(log n) time (given subtree sizes in a balanced tree)
• Arbitrary tree
– find a node with a specific property
– E.g. node is a position in game tree, find win
– E.g. node is particular tour, find node(tour) with least cost
Separate and Conquer
• Finding the kth smallest (Case Analysis)
• Where can it be? Let i = number of nodes in the left subtree (so the right subtree has N-i-1 nodes).
– If i = k-1, the answer is at the root (the left subtree holds the k-1 smaller elements).
– If i >= k, search for the kth smallest in the left subtree.
– If i < k-1, search for the (k-i-1)th smallest in the right subtree.
• Complexity: depth of tree, O(log n) when balanced.
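The case analysis can be sketched in Java, assuming each node carries an extra size field maintained on insert (names are illustrative):

```java
public class OrderStatistic {
    static class Node {
        int data, size = 1;   // size = number of nodes in this subtree
        Node left, right;
        Node(int d) { data = d; }
    }

    static int size(Node n) { return n == null ? 0 : n.size; }

    static Node insert(Node n, int x) {
        if (n == null) return new Node(x);
        if (x < n.data) n.left = insert(n.left, x);
        else n.right = insert(n.right, x);
        n.size = 1 + size(n.left) + size(n.right);
        return n;
    }

    // kth smallest, 1-based: compare k with i = size of the left subtree.
    static int kth(Node n, int k) {
        int i = size(n.left);
        if (k == i + 1) return n.data;       // the root is the kth
        if (k <= i) return kth(n.left, k);   // answer is in the left subtree
        return kth(n.right, k - i - 1);      // skip left subtree and root
    }

    public static void main(String[] args) {
        Node root = null;
        for (int x : new int[] {5, 2, 8, 1, 4, 7, 9}) root = insert(root, x);
        System.out.println(kth(root, 3)); // prints 4
    }
}
```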
Analysis Definitions
• Problem: what is average time to find or insert an element
• Definitions follow from problem
• Internal path length of Binary tree (IPL)
– sum of depth of nodes = ipl
– average cost of successful search = (ipl/N) + 1 (cost = number of nodes you look at)
• External path length of Binary tree (EPL)
– sum of cost of accessing all N+1 null references = epl
– average cost of insertion or failed search = epl/(N+1)
Example of IPL and EPL
[Figure: 5-node tree; root at depth 0, two sons at depth 1, two grandsons at depth 2; null references drawn as external nodes]
IPL = 1+1+2+2 = 6
EPL = 2+2+3+3+3+3 = 16 = IPL+2*5 = IPL+2N
What happens if you remove a leaf?
Picture Proof of IPL related to IPL of subtrees
[Figure: N-node tree split at the root into an i-node left subtree and an (N-i-1)-node right subtree]
• Each non-root node (N-1 of them) has its path length reduced by 1 when measured within its subtree, so ipl(tree) = ipl(left subtree) + ipl(right subtree) + N-1.
Some Theorems
• Average internal path length of a binary search tree is about 1.386 N log2 N
• Proof that it is O(n*log n)
– Let D(N) = average ipl for tree with N nodes
– D(0)=D(1) = 0.
– D(N) = average over all splits of the tree (draw picture)
– D(N) = (1/N)(D(0)+…+D(N-1))  [left splits]
+ (1/N)(D(0)+…+D(N-1))  [right splits]  + N-1
= (2/N)(D(0)+…+D(N-1)) + N-1
– same recurrence as the quicksort analysis (to be done)
– O(NlogN)
• Why does EPL = IPL+2N (induction)
Analysis Goal: f(n) in terms of f(n-1), then expand
• 2/n*(D(0)+…+D(n-1)) + n = D(n)  (using n in place of n-1; the asymptotics are unchanged)
• 2*(D(0) + …+ D(n-1))+ n^2 = n*D(n)
– Goal compare with previous, subtract and hope
• 2*(D(0)+…+D(n-2)) + (n-1)^2 = (n-1)*D(n-1)
• 2*D(n-1) +2n-1 = n*D(n) - (n-1)*D(n-1)
• n*D(n) = (n+1)*D(n-1) + 2n (dropping the -1)
• D(n)/(n+1) = D(n-1)/n + 2/(n+1) EUREKA! Expand.
• Hence: D(n)/(n+1) = 2/(n+1)+ 2/n +….+2/1
= 2*(harmonic series) is O(log n)
• Conclusion: D(n) is O(n*log(n))
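The recurrence can be sanity-checked against brute force. The sketch below evaluates the exact form D(n) = (2/n)(D(0)+…+D(n-1)) + (n-1) from the proof and compares it with the average ipl over all insertion orders of 1..n (helper names are mine, not from the slides):

```java
import java.util.ArrayList;
import java.util.List;

public class AvgIpl {
    // Recurrence from the proof: D(n) = (2/n)(D(0)+...+D(n-1)) + (n-1).
    static double D(int n) {
        double[] d = new double[n + 1];          // D(0) = D(1) = 0
        for (int i = 2; i <= n; i++) {
            double sum = 0;
            for (int j = 0; j < i; j++) sum += d[j];
            d[i] = 2 * sum / i + (i - 1);
        }
        return d[n];
    }

    static class Node { int v; Node l, r; Node(int v) { this.v = v; } }

    // ipl = sum of node depths of the BST built from this insertion order.
    static int ipl(List<Integer> order) {
        Node root = new Node(order.get(0));
        int sum = 0;
        for (int k = 1; k < order.size(); k++) {
            int x = order.get(k), depth = 0;
            Node n = root;
            while (true) {
                depth++;
                if (x < n.v) { if (n.l == null) { n.l = new Node(x); break; } n = n.l; }
                else         { if (n.r == null) { n.r = new Node(x); break; } n = n.r; }
            }
            sum += depth;
        }
        return sum;
    }

    static double total; static int count;

    static void enumerate(List<Integer> rest, List<Integer> prefix) {
        if (rest.isEmpty()) { total += ipl(prefix); count++; return; }
        for (int i = 0; i < rest.size(); i++) {
            List<Integer> r = new ArrayList<>(rest); int v = r.remove(i);
            List<Integer> p = new ArrayList<>(prefix); p.add(v);
            enumerate(r, p);
        }
    }

    // Average ipl over all n! insertion orders of 1..n.
    static double brute(int n) {
        total = 0; count = 0;
        List<Integer> items = new ArrayList<>();
        for (int i = 1; i <= n; i++) items.add(i);
        enumerate(items, new ArrayList<>());
        return total / count;
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 6; n++)
            System.out.println(n + ": " + D(n) + " vs " + brute(n));
    }
}
```

For small n the two agree exactly (e.g. D(3) = 8/3), which is why the slides can treat the recurrence as the average.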
1/1+1/2+…1/n is O(log n)
• General Trick: sum approximates integral and vice versa
• Area under function 1/x is given by log(x).
[Figure: plot of 1/x with unit-width rectangles at x = 1, 2, 3, 4]
Balanced Trees
• Depth of tree controls amount of work for many operations, so…
• Goal: keep depth small
– what does that mean?
– What can be achieved?
– What needs to be achieved?
• AVL: 1962 - very balanced
• Btrees: 1972 (reduce disk accesses)
• Red-Black: 1978
• AA: 1993, a little faster now
• Splay trees: probabilistically balanced (on finds)
• All use rotations
AVL Tree
• Recall height of empty tree = -1
• In AVL tree, For all nodes, height of left and right subtrees differ by at most 1.
• AVL trees have logarithmic height
• Fibonacci numbers: F[1]=1; F[2]= 1; F[3]=2; F[4]=3;
• Let S[h] = size of smallest AVL tree of height h
• S[0] = 1; S[1] = 2; why?
• Induction strikes: Thm: S[h] >= F[h+3]-1
Base: S[0] = F[3]-1 = 1 and S[1] = F[4]-1 = 2
Step: S[h] = S[h-1]+S[h-2]+1 >= F[h+2]-1 + F[h+1]-1 + 1
= F[h+3]-1.
• Hence the number of nodes grows exponentially with height.
Insertion
• After insertion, there are 4 ways tree can be unbalanced. Check it out.
• Outside unbalanced: handled by single rotations
• Inside unbalanced: handled by double rotations.
[Figure: the four unbalance cases, shown with nodes 1, 2 and subtrees a, b, c, and with subtrees p, q, r]
Maintaining Balance
• Rebalancing: single and double rotations
• Left rotation: after insertion
[Figure: single left rotation: node 2 with left son 1 and subtrees a, b, c becomes node 1 with right son 2; shown again in an alternate nested view]
Notice what happens to heights, (LEFT) in general:
a goes up 1, b stays the same, c goes down 1
Single (left) rotation
• Switches parent and child
• In diagram: static Node leftRotate(Node n2) // n2 is the node labeled 2
Node n1 = n2.left
n2.left = n1.right
n1.right = n2
return n1
• Appropriate test question
– do it, i.e. given a sequence such as 6, 2, 7, 1, -1, etc., show the succession of trees after inserts and rotations.
• Similar for right rotation
Double Rotation Code (left-right)
• Idea: rotate left child with its right child
• Then node with new left child
• static BinaryNode doubleLeft( BinaryNode n)
n.left = rotateRight(n.left);
return rotateLeft(n)
• Analogous code for other middle case
• All rotations are O(1) operations
• Out-of-balance checked after insertion and after deletions. All O(1).
• For AVL, d is O(logN) so all operations O(logN).
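The single and double rotations can be exercised on minimal three-node examples; this is a sketch of the rotations only, not a full AVL implementation, and the method names (rotateWithLeft, doubleLeftRight, etc.) are mine:

```java
public class Rotations {
    static class Node {
        int v; Node left, right;
        Node(int v) { this.v = v; }
        Node(int v, Node l, Node r) { this.v = v; left = l; right = r; }
    }

    // Single rotation promoting the left son (fixes the outside left-left case).
    static Node rotateWithLeft(Node k2) {
        Node k1 = k2.left;
        k2.left = k1.right;
        k1.right = k2;
        return k1;
    }

    // Mirror image (fixes the outside right-right case).
    static Node rotateWithRight(Node k1) {
        Node k2 = k1.right;
        k1.right = k2.left;
        k2.left = k1;
        return k2;
    }

    // Double rotations (fix the inside left-right and right-left cases):
    // rotate the child with its inner grandchild, then the node with its new child.
    static Node doubleLeftRight(Node n) {
        n.left = rotateWithRight(n.left);
        return rotateWithLeft(n);
    }
    static Node doubleRightLeft(Node n) {
        n.right = rotateWithLeft(n.right);
        return rotateWithRight(n);
    }

    public static void main(String[] args) {
        // Outside case: chain 3 <- 2 <- 1; single rotation makes 2 the root.
        Node ll = new Node(3, new Node(2, new Node(1), null), null);
        System.out.println(rotateWithLeft(ll).v);  // prints 2

        // Inside case: zig-zag 3 <- 1 -> 2; double rotation makes 2 the root.
        Node lr = new Node(3, new Node(1, null, new Node(2)), null);
        System.out.println(doubleLeftRight(lr).v); // prints 2
    }
}
```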
Red-Black Trees
• Every node red or black
• Root is black
• If node red, children black
• Every path from node to null has same number of black nodes
• Implementation used in the Java collections library (JDK 1.2, e.g. TreeMap/TreeSet) for search trees.
• Single top-down pass means faster than AVL
• Depth typically same as for AVL trees.
• Code has many cases - skipping
• Red-black trees are what you get via TreeSet()
• And you can set/change the comparator
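A small usage example of the last two points: TreeSet keeps its elements in a red-black tree, and the constructor accepts a comparator (here, an illustrative one ordering strings by length, breaking ties alphabetically):

```java
import java.util.TreeSet;

public class TreeSetDemo {
    public static void main(String[] args) {
        // TreeSet is a red-black tree under the hood; the comparator
        // passed to the constructor determines the ordering.
        TreeSet<String> byLength = new TreeSet<>((a, b) ->
            a.length() != b.length() ? a.length() - b.length() : a.compareTo(b));
        byLength.add("pear");
        byLength.add("fig");
        byLength.add("banana");
        System.out.println(byLength.first()); // prints fig
        System.out.println(byLength.last());  // prints banana
    }
}
```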
AA Trees
• Simpler variant of Red-black trees
– simpler = more efficient
• Add two more properties:
5. Left children may not be red
6. Remove colors, use levels:
– leaves are at level 1
– if red, level = level of parent
– if black, level = level of parent - 1
• Code also has many special cases
B-tree of order M
• Goal: reduce the number of disk accesses
• Generalization of binary trees
• Method: keep top of tree in memory and have large branching factor
• Disk access >1000 times slower than memory access
• M-ary tree yields O(log_(M/2) N) accesses
• Data stored only at leaves
• Nonleaves store up to M-1 keys
• Root is leaf or has 2…M children
• All internal nodes have (M+1)/2…M children
• All leaves at same depth and have (L+1)/2…L children
• Often set L = M
• Practical algorithm, but code longish (many cases)
B-Tree Picture: internal node
[Figure: internal node layout with M pointers interleaved with M-1 keys]
• Goal: store as many keys as possible
• Keys are in order
• M-1 keys, M ptrs
• Space = M*ptrSize + (M-1)*keySize
Representation
• Leaf nodes are arrays of size M (or linked lists)
• Internal nodes are:
– array of size M-1 of keys
– array of size M of pointers to nodes
• The keys are in order
• Choice of M depends on machine architecture and problem.
• M is the largest value satisfying:
– keySize*(M-1) + ptrSize*M <= BlockSize
Example Analysis (all on disk)
• Suppose a disk block holds 8,192 bytes.
• Suppose each key is 32 bytes, each branch is 4 bytes, and each data record is 256 bytes.
• L = 32 (8192/256)
• If the B-tree has order M, each interior node holds up to M-1 keys and M branches.
• An interior node holds 32M-32 + M*4 =36M-32 bytes.
• Largest solution for M is 228.
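The arithmetic above can be checked mechanically: solving keySize*(M-1) + ptrSize*M <= blockSize for M gives M <= (blockSize + keySize) / (keySize + ptrSize). A small sketch (function names are illustrative):

```java
public class BTreeSizing {
    // Largest M with keySize*(M-1) + ptrSize*M <= blockSize,
    // i.e. M <= (blockSize + keySize) / (keySize + ptrSize).
    static int order(int blockSize, int keySize, int ptrSize) {
        return (blockSize + keySize) / (keySize + ptrSize);
    }

    // Number of data records that fit in one leaf block.
    static int leafCapacity(int blockSize, int recordSize) {
        return blockSize / recordSize;
    }

    public static void main(String[] args) {
        System.out.println(order(8192, 32, 4));      // prints 228
        System.out.println(leafCapacity(8192, 256)); // prints 32
    }
}
```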
Splay Trees
• Like Splay lists, only probabilistically ordered
• Goal: minimize access time
• Method: no rebalancing on insert
• Splaying (rotating the accessed node toward the root) on finds only (as in splay lists)
• Rotating inserted node up, moves node to root but makes tree unbalanced
• Instead use double rotations zig-zag and zig-zig
• This rebalances tree
• Guarantees O(M log N) cost for any M operations, i.e. amortized O(log N).
Summary
• Depth of tree determines overall costs
• Balancing achieved by rotations
• AVL trees require 2 passes for insertion/deletions
– a pass down to find the point
– a pass up to do the corrections
• Red-Black and AA trees require 1 pass
• B-Trees are used for accessing information that won't fit in memory
• General: CASE ANALYSIS, separate and conquer