10/23/2014
1
15.5 Optimal binary search trees
• We are designing a program to translate text
• Perform lookup operations by building a BST with the words as keys and their equivalents as satellite data
• We can ensure an O(lg n) search time per occurrence by using a red-black tree or any other balanced BST
• A frequently used word may appear far from the root while a rarely used word appears near the root
• We want frequent words to be placed nearer the root
• How do we organize a BST so as to minimize the number of nodes visited in all searches, given that we know how often each word occurs?
23-Oct-14MAT-72006 AA+DS, Fall 2014 516
• What we need is an optimal binary search tree
• Formally, given a sequence K = k_1, k_2, …, k_n of n distinct sorted keys (k_1 < k_2 < … < k_n), we wish to build a BST from these keys
• For each key k_i, we have a probability p_i that a search will be for k_i
• Some searches may be for values not in K, so we also have n + 1 "dummy keys" d_0, d_1, …, d_n representing values not in K
• In particular, d_0 represents all values less than k_1, and d_n represents all values greater than k_n
• For i = 1, 2, …, n − 1, the dummy key d_i represents all values between k_i and k_{i+1}
• For each dummy key d_i, we have a probability q_i that a search will correspond to d_i
• Each key k_i is an internal node, and each dummy key d_i is a leaf
• Every search is either successful (finds a key k_i) or unsuccessful (finds a dummy key d_i), and so we have

  Σ_{i=1}^{n} p_i + Σ_{i=0}^{n} q_i = 1

• Because we have probabilities of searches for each key and each dummy key, we can determine the expected cost of a search in a given BST T
• Let us assume that the actual cost of a search equals the number of nodes examined, i.e., the depth in T of the node found by the search, plus 1
• Then the expected cost of a search in T is

  E[search cost in T] = Σ_{i=1}^{n} (depth_T(k_i) + 1) · p_i + Σ_{i=0}^{n} (depth_T(d_i) + 1) · q_i
                      = 1 + Σ_{i=1}^{n} depth_T(k_i) · p_i + Σ_{i=0}^{n} depth_T(d_i) · q_i,

• where depth_T denotes a node's depth in tree T
  Node   Depth   Probability   Contribution
  k_1      1        0.15           0.30
  k_2      0        0.10           0.10
  k_3      2        0.05           0.15
  k_4      1        0.10           0.20
  k_5      2        0.20           0.60
  d_0      2        0.05           0.15
  d_1      2        0.10           0.30
  d_2      3        0.05           0.20
  d_3      3        0.05           0.20
  d_4      3        0.05           0.20
  d_5      3        0.10           0.40
  Total                            2.80
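The total above can be checked mechanically. A minimal Python sketch, with the depths and probabilities taken from the table (node names k_1..k_5 and d_0..d_5 as in the text):

```python
# Depth and probability of each node in the example tree
# (k1..k5 are keys, d0..d5 are dummy-key leaves).
nodes = {
    "k1": (1, 0.15), "k2": (0, 0.10), "k3": (2, 0.05),
    "k4": (1, 0.10), "k5": (2, 0.20),
    "d0": (2, 0.05), "d1": (2, 0.10), "d2": (3, 0.05),
    "d3": (3, 0.05), "d4": (3, 0.05), "d5": (3, 0.10),
}

# E[search cost] = sum over all nodes of (depth + 1) * probability.
expected_cost = sum((depth + 1) * p for depth, p in nodes.values())
print(f"{expected_cost:.2f}")  # 2.80
```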
• For a given set of probabilities, we wish to construct a BST whose expected search cost is smallest
• We call such a tree an optimal binary search tree
• An optimal BST for the probabilities given has expected cost 2.75
• An optimal BST is not necessarily a tree whose overall height is smallest
• Nor can we necessarily construct an optimal BST by always putting the key with the greatest probability at the root
• The lowest expected cost of any BST with k_5 at the root is 2.85
Step 1: The structure of an optimal BST
• Consider any subtree of a BST
• It must contain keys in a contiguous range k_i, …, k_j, for some 1 ≤ i ≤ j ≤ n
• In addition, a subtree that contains keys k_i, …, k_j must also have as its leaves the dummy keys d_{i−1}, …, d_j
• If an optimal BST T has a subtree T′ containing keys k_i, …, k_j, then this subtree T′ must be optimal as well for the subproblem with keys k_i, …, k_j and dummy keys d_{i−1}, …, d_j
• Given keys k_i, …, k_j, one of them, say k_r (i ≤ r ≤ j), is the root of an optimal subtree containing these keys
• The left subtree of the root k_r contains the keys k_i, …, k_{r−1} (and dummy keys d_{i−1}, …, d_{r−1})
• The right subtree contains the keys k_{r+1}, …, k_j (and dummy keys d_r, …, d_j)
• As long as we
  – examine all candidate roots k_r, where i ≤ r ≤ j,
  – and determine all optimal BSTs containing k_i, …, k_{r−1} and those containing k_{r+1}, …, k_j,
  we are guaranteed to find an optimal BST
• Suppose that in a subtree with keys k_i, …, k_j, we select k_i as the root
• k_i's left subtree contains the keys k_i, …, k_{i−1}
• Interpret this sequence as containing no keys
• Subtrees, however, also contain dummy keys
• Adopt the convention that a subtree containing keys k_i, …, k_{i−1} has no actual keys but does contain the single dummy key d_{i−1}
• Symmetrically, if we select k_j as the root, then k_j's right subtree contains no actual keys, but it does contain the dummy key d_j
Step 2: A recursive solution
• We pick our subproblem domain as finding an optimal BST containing the keys k_i, …, k_j, where i ≥ 1, j ≤ n, and j ≥ i − 1
• Let us define e[i, j] as the expected cost of searching an optimal BST containing the keys k_i, …, k_j
• Ultimately, we wish to compute e[1, n]
• The easy case occurs when j = i − 1
• Then we have just the dummy key d_{i−1}
• The expected search cost is e[i, i − 1] = q_{i−1}
• When j ≥ i, we need to select a root k_r from among k_i, …, k_j and make an optimal BST with keys k_i, …, k_{r−1} as its left subtree and an optimal BST with keys k_{r+1}, …, k_j as its right subtree
• What happens to the expected search cost of a subtree when it becomes a subtree of a node?
  – Depth of each node increases by 1
  – Expected search cost of this subtree increases by the sum of all the probabilities in it
• For a subtree with keys k_i, …, k_j, let us denote this sum of probabilities as

  w(i, j) = Σ_{l=i}^{j} p_l + Σ_{l=i−1}^{j} q_l
• Thus, if k_r is the root of an optimal subtree containing keys k_i, …, k_j, we have

  e[i, j] = p_r + (e[i, r − 1] + w(i, r − 1)) + (e[r + 1, j] + w(r + 1, j))

• Noting that

  w(i, j) = w(i, r − 1) + p_r + w(r + 1, j),

  we rewrite e[i, j] as

  e[i, j] = e[i, r − 1] + e[r + 1, j] + w(i, j)

• We choose the root that gives the lowest expected search cost:

  e[i, j] = q_{i−1}                                                  if j = i − 1
  e[i, j] = min_{i ≤ r ≤ j} { e[i, r − 1] + e[r + 1, j] + w(i, j) }  if i ≤ j
• The e[i, j] values give the expected search costs in optimal BSTs
• To help us keep track of the structure of optimal BSTs, we define root[i, j], for 1 ≤ i ≤ j ≤ n, to be the index r for which k_r is the root of an optimal BST containing keys k_i, …, k_j
• Although we will see how to compute the values of root[i, j], we leave the construction of an optimal binary search tree from these values as an exercise
Step 3: Computing the expected search cost of an optimal BST
• We store the e[i, j] values in a table e[1 .. n + 1, 0 .. n]
• The first index needs to run to n + 1 because to have a subtree containing only the dummy key d_n, we need to compute and store e[n + 1, n]
• The second index needs to start from 0 because to have a subtree containing only the dummy key d_0, we need to compute and store e[1, 0]
• We use only the entries e[i, j] for which j ≥ i − 1
• We also use a table root[i, j], for recording the root of the subtree containing keys k_i, …, k_j
• This table uses only the entries with 1 ≤ i ≤ j ≤ n
• We also store the w(i, j) values in a table w[1 .. n + 1, 0 .. n]
• For the base case, we compute w[i, i − 1] = q_{i−1}
• For j ≥ i, we compute

  w[i, j] = w[i, j − 1] + p_j + q_j

• Thus, we can compute the Θ(n²) values of w[i, j] in Θ(1) time each
OPTIMAL-BST(p, q, n)
1.  let e[1 .. n + 1, 0 .. n], w[1 .. n + 1, 0 .. n], and root[1 .. n, 1 .. n] be new tables
2.  for i = 1 to n + 1
3.      e[i, i − 1] = q_{i−1}
4.      w[i, i − 1] = q_{i−1}
5.  for l = 1 to n
6.      for i = 1 to n − l + 1
7.          j = i + l − 1
8.          e[i, j] = ∞
9.          w[i, j] = w[i, j − 1] + p_j + q_j
10.         for r = i to j
11.             t = e[i, r − 1] + e[r + 1, j] + w[i, j]
12.             if t < e[i, j]
13.                 e[i, j] = t
14.                 root[i, j] = r
15. return e and root
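The procedure transcribes directly into Python; the sketch below uses dicts keyed by (i, j) in place of the 1-indexed tables, with p[0] unused so the indices match the pseudocode:

```python
import math

def optimal_bst(p, q, n):
    """p[1..n]: key probabilities (p[0] ignored); q[0..n]: dummy-key
    probabilities. Returns tables e and root, keyed by (i, j)."""
    e = {}     # e[(i, j)]: expected search cost of an optimal BST on k_i..k_j
    w = {}     # w[(i, j)]: total probability mass of that subtree
    root = {}  # root[(i, j)]: index r of the optimal root k_r
    for i in range(1, n + 2):          # base cases: empty key ranges
        e[(i, i - 1)] = q[i - 1]
        w[(i, i - 1)] = q[i - 1]
    for l in range(1, n + 1):          # l = number of keys in the subtree
        for i in range(1, n - l + 2):
            j = i + l - 1
            e[(i, j)] = math.inf
            w[(i, j)] = w[(i, j - 1)] + p[j] + q[j]
            for r in range(i, j + 1):  # try every candidate root
                t = e[(i, r - 1)] + e[(r + 1, j)] + w[(i, j)]
                if t < e[(i, j)]:
                    e[(i, j)] = t
                    root[(i, j)] = r
    return e, root
```

On the probabilities of the running example, p = (0.15, 0.10, 0.05, 0.10, 0.20) and q = (0.05, 0.10, 0.05, 0.05, 0.05, 0.10), the call yields e[(1, 5)] = 2.75 with root[(1, 5)] = 2, matching the optimal cost quoted earlier.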
• The OPTIMAL-BST procedure takes Θ(n³) time, just like MATRIX-CHAIN-ORDER
• Its running time is O(n³), since its for loops are nested three deep and each loop index takes on at most n values
• The loop indices in OPTIMAL-BST do not have exactly the same bounds as those in MATRIX-CHAIN-ORDER, but they are within 1 in all directions
• Thus, like MATRIX-CHAIN-ORDER, the OPTIMAL-BST procedure also takes Ω(n³) time, and hence Θ(n³) time
16 Greedy Algorithms
• Optimization algorithms typically go through a sequence of steps, with a set of choices at each step
• For many optimization problems, using dynamic programming to determine the best choices is overkill; simpler, more efficient algorithms will do
• A greedy algorithm always makes the choice that looks best at the moment
• That is, it makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution
16.1 An activity-selection problem
• Suppose we have a set S = {a_1, a_2, …, a_n} of n proposed activities that wish to use a resource (e.g., a lecture hall), which can serve only one activity at a time
• Each activity a_i has a start time s_i and a finish time f_i, where 0 ≤ s_i < f_i < ∞
• If selected, activity a_i takes place during the half-open time interval [s_i, f_i)
• Activities a_i and a_j are compatible if the intervals [s_i, f_i) and [s_j, f_j) do not overlap
• I.e., a_i and a_j are compatible if s_i ≥ f_j or s_j ≥ f_i
• We wish to select a maximum-size subset of mutually compatible activities
• We assume that the activities are sorted in monotonically increasing order of finish time: f_1 ≤ f_2 ≤ … ≤ f_n
• Consider, e.g., the following set of activities:
• For this example, the subset {a_3, a_9, a_11} consists of mutually compatible activities
• It is not a maximum subset, however, since the subset {a_1, a_4, a_8, a_11} is larger
• In fact, {a_1, a_4, a_8, a_11} is a largest subset of mutually compatible activities; another largest subset is {a_2, a_4, a_9, a_11}
  i    1  2  3  4  5  6  7   8   9   10  11
  s_i  1  3  0  5  3  5  6   8   8   2   12
  f_i  4  5  6  7  9  9  10  11  12  14  16
The optimal substructure of the activity-selection problem
• Let S_ij be the set of activities that start after activity a_i finishes and that finish before activity a_j starts
• We wish to find a maximum set of mutually compatible activities in S_ij
• Suppose that such a maximum set is A_ij, which includes some activity a_k
• By including a_k in an optimal solution, we are left with two subproblems: finding mutually compatible activities in the set S_ik and finding mutually compatible activities in the set S_kj
• Let A_ik = A_ij ∩ S_ik and A_kj = A_ij ∩ S_kj, so that
  – A_ik contains the activities in A_ij that finish before a_k starts and
  – A_kj contains the activities in A_ij that start after a_k finishes
• Thus, A_ij = A_ik ∪ {a_k} ∪ A_kj, and so the maximum-size set A_ij in S_ij consists of |A_ij| = |A_ik| + |A_kj| + 1 activities
• The usual cut-and-paste argument shows that the optimal solution A_ij must also include optimal solutions for S_ik and S_kj
• This suggests that we might solve the activity-selection problem by dynamic programming
• If we denote the size of an optimal solution for the set S_ij by c[i, j], then we would have the recurrence

  c[i, j] = c[i, k] + c[k, j] + 1

• Of course, if we did not know that an optimal solution for the set S_ij includes activity a_k, we would have to examine all activities in S_ij to find which one to choose, so that

  c[i, j] = 0                                           if S_ij = ∅
  c[i, j] = max_{a_k ∈ S_ij} { c[i, k] + c[k, j] + 1 }  if S_ij ≠ ∅
Making the greedy choice
• For the activity-selection problem, we need consider only the greedy choice
• We choose an activity that leaves the resource available for as many other activities as possible
• Now, of the activities we end up choosing, one of them must be the first one to finish
• Choose the activity in S with the earliest finish time, since that leaves the resource available for as many of the activities that follow it as possible
• Activities are sorted in monotonically increasing order by finish time; the greedy choice is activity a_1
• If we make the greedy choice, we only have to find activities that start after a_1 finishes
• s_1 < f_1 and f_1 is the earliest finish time of any activity ⇒ no activity can have a finish time ≤ s_1
• Thus, all activities that are compatible with activity a_1 must start after a_1 finishes
• Let S_k = { a_i ∈ S : s_i ≥ f_k } be the set of activities that start after activity a_k finishes
• Optimal substructure: if a_1 is in the optimal solution, then an optimal solution to the original problem consists of a_1 and all the activities in an optimal solution to the subproblem S_1
Theorem 16.1 Consider any nonempty subproblem S_k, and let a_m be an activity in S_k with the earliest finish time. Then a_m is included in some maximum-size subset of mutually compatible activities of S_k.

Proof Let A_k be a max-size subset of mutually compatible activities in S_k, and let a_j be the activity in A_k with the earliest finish time. If a_j = a_m, we are done, since a_m is then in a max-size subset of mutually compatible activities of S_k. If a_j ≠ a_m, let the set A′_k = (A_k − {a_j}) ∪ {a_m}. The activities in A′_k are disjoint because the activities in A_k are disjoint, a_j is the first activity in A_k to finish, and f_m ≤ f_j. Since |A′_k| = |A_k|, we conclude that A′_k is a maximum-size subset of mutually compatible activities of S_k, and it includes a_m. ∎
• We can repeatedly choose the activity that finishes first, keep only the activities compatible with this activity, and repeat until no activities remain
• Moreover, because we always choose the activity with the earliest finish time, the finish times of the activities we choose must strictly increase
• We can consider each activity just once overall, in monotonically increasing order of finish times
A recursive greedy algorithm
RECURSIVE-ACTIVITY-SELECTOR(s, f, k, n)
1. m = k + 1
2. while m ≤ n and s[m] < f[k]   // find the first activity in S_k to finish
3.     m = m + 1
4. if m ≤ n
5.     return {a_m} ∪ RECURSIVE-ACTIVITY-SELECTOR(s, f, m, n)
6. else return ∅
An iterative greedy algorithm
GREEDY-ACTIVITY-SELECTOR(s, f)
1. n = s.length
2. A = {a_1}
3. k = 1
4. for m = 2 to n
5.     if s[m] ≥ f[k]
6.         A = A ∪ {a_m}
7.         k = m
8. return A
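The iterative selector is a few lines of Python; a sketch using 1-indexed lists (index 0 unused) so the indices match the pseudocode:

```python
def greedy_activity_selector(s, f):
    """s[m], f[m]: start/finish times of activity m (1-indexed; index 0
    unused). Assumes activities are sorted by finish time. Returns the
    list of selected activity indices."""
    n = len(s) - 1
    A = [1]                 # a_1 always belongs to some optimal solution
    k = 1                   # index of the most recently selected activity
    for m in range(2, n + 1):
        if s[m] >= f[k]:    # a_m is compatible: it starts after a_k finishes
            A.append(m)
            k = m
    return A
```

On the example table (s = 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12 and f = 4, 5, 6, 7, 9, 9, 10, 11, 12, 14, 16) this returns [1, 4, 8, 11], one of the two largest subsets identified earlier.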
• The set A returned by the call GREEDY-ACTIVITY-SELECTOR(s, f) is precisely the set returned by the call RECURSIVE-ACTIVITY-SELECTOR(s, f, 0, n)
• Both the recursive version and the iterative algorithm schedule a set of n activities in Θ(n) time, assuming that the activities were already sorted initially by their finish times
16.3 Huffman codes
• Huffman codes compress data very effectively
  – savings of 20% to 90% are typical, depending on the characteristics of the data being compressed
• We consider the data to be a sequence of characters
• Huffman's greedy algorithm uses a table giving how often each character occurs (i.e., its frequency) to build up an optimal way of representing each character as a binary string
• We have a 100,000-character data file that we wish to store compactly
• We observe that the characters in the file occur with the following frequencies:

                            a    b    c    d    e     f
  Frequency (in thousands)  45   13   12   16   9     5
  Fixed-length codeword     000  001  010  011  100   101
  Variable-length codeword  0    101  100  111  1101  1100

• That is, only 6 different characters appear, and the character a occurs 45,000 times
• Here, we consider the problem of designing a binary character code (or code for short) in which each character is represented by a unique binary string, which we call a codeword
• Using a fixed-length code requires 3 bits to represent 6 characters: 000, 001, …, 101
• We then need 300,000 bits to code the entire file
• A variable-length code gives frequent characters short codewords and infrequent characters long codewords
• Here the 1-bit string 0 represents a, and the 4-bit string 1100 represents f
• This code requires

  (45 · 1 + 13 · 3 + 12 · 3 + 16 · 3 + 9 · 4 + 5 · 4) · 1,000 = 224,000 bits

  (a savings of about 25%)
Prefix codes
• We consider only codes in which no codeword is also a prefix of some other codeword
• A prefix code can always achieve the optimal data compression among any character code, and so we can restrict our attention to prefix codes
• Encoding is always simple for any binary character code; we just concatenate the codewords representing each character of the file
• E.g., with the variable-length prefix code, we code the 3-character file abc as 0 · 101 · 100 = 0101100
• Prefix codes simplify decoding
  – Since no codeword is a prefix of any other, the codeword that begins an encoded file is unambiguous
• We can simply identify the initial codeword, translate it back to the original character, and repeat the decoding process on the remainder of the encoded file
• In our example, the string 001011101 parses uniquely as 0 · 0 · 101 · 1101, which decodes to aabe
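This peel-off-one-codeword loop needs no lookahead. A minimal Python sketch, using the full codeword table of the running example (the text spells out only a = 0, b = 101, c = 100, and f = 1100 explicitly; d = 111 and e = 1101 are the remaining codewords of that standard example):

```python
# Variable-length prefix code from the running example.
code = {"a": "0", "b": "101", "c": "100",
        "d": "111", "e": "1101", "f": "1100"}
decode_table = {cw: ch for ch, cw in code.items()}

def decode(bits):
    """Scan left to right, emitting a character as soon as the bits
    read so far form a codeword; unambiguous for a prefix code."""
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in decode_table:      # a complete codeword
            out.append(decode_table[cur])
            cur = ""
    return "".join(out)

print(decode("001011101"))  # aabe
```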
• The decoding process needs a convenient representation for the prefix code so that we can easily pick off the initial codeword
• A binary tree whose leaves are the given characters provides one such representation
• We interpret the binary codeword for a character as the simple path from the root to that character, where 0 means "go to the left child" and 1 means "go to the right child"
• Note that the trees are not BSTs: the leaves need not appear in sorted order and internal nodes do not contain character keys
The trees corresponding to the fixed-length code a = 000, …, f = 101 and the optimal prefix code a = 0, b = 101, …, f = 1100
• An optimal code for a file is always represented by a full binary tree, in which every nonleaf node has two children
• The fixed-length code in our example is not optimal since its tree is not a full binary tree: it contains codewords beginning 10…, but none beginning 11…
• Since we can now restrict our attention to full binary trees, we can say that if C is the alphabet from which the characters are drawn and
  – all character frequencies are positive, then
  – the tree for an optimal prefix code has exactly |C| leaves, one for each letter of the alphabet, and
  – exactly |C| − 1 internal nodes
• Given a tree T corresponding to a prefix code, we can easily compute the number of bits required to encode a file
• For each character c in the alphabet C, let the attribute c.freq denote the frequency of c and let d_T(c) denote the depth of c's leaf in T
• d_T(c) is also the length of the codeword for character c
• The number of bits required to encode a file is thus

  B(T) = Σ_{c ∈ C} c.freq · d_T(c),

  which we define as the cost of the tree T
Constructing a Huffman code
• Let C be a set of n characters, where each character c ∈ C is an object with an attribute c.freq
• The algorithm builds the tree T corresponding to the optimal code bottom-up
• It begins with |C| leaves and performs |C| − 1 "merging" operations to create the final tree
• We use a min-priority queue Q, keyed on the freq attribute, to identify the two least-frequent objects to merge
• The result of a merge is a new object whose frequency is the sum of the frequencies of the two objects merged
HUFFMAN(C)
1. n = |C|
2. Q = C
3. for i = 1 to n − 1
4.     allocate a new node z
5.     z.left = x = EXTRACT-MIN(Q)
6.     z.right = y = EXTRACT-MIN(Q)
7.     z.freq = x.freq + y.freq
8.     INSERT(Q, z)
9. return EXTRACT-MIN(Q)   // return the root of the tree
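The same greedy merge can be sketched in Python with the standard-library heapq module as the min-priority queue (trees represented as nested pairs; a sketch, with the exact bit patterns depending on how ties break):

```python
import heapq
from itertools import count

def huffman(freqs):
    """freqs: dict mapping character -> frequency.
    Returns a dict mapping character -> codeword string."""
    tiebreak = count()  # unique sequence numbers so equal-frequency
                        # entries never compare their tree payloads
    # Each heap entry: (freq, tiebreaker, tree); a tree is a char or a pair.
    heap = [(f, next(tiebreak), ch) for ch, f in freqs.items()]
    heapq.heapify(heap)
    for _ in range(len(freqs) - 1):       # |C| - 1 merging operations
        fx, _, x = heapq.heappop(heap)    # two least-frequent trees
        fy, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))
    root = heap[0][2]

    codes = {}
    def walk(node, prefix):               # 0 = left child, 1 = right child
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"   # single-character edge case
    walk(root, "")
    return codes
```

On the example frequencies (a: 45, b: 13, c: 12, d: 16, e: 9, f: 5, in thousands) the codeword lengths come out as 1, 3, 3, 3, 4, 4, giving the 224,000-bit total computed earlier; the particular 0/1 labels may differ from the figure, since any consistent tie-breaking yields an optimal code.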
• To analyze the running time of HUFFMAN, let Q be implemented as a binary min-heap
• For a set C of n characters, we can initialize Q (line 2) in O(n) time using the BUILD-MIN-HEAP procedure
• The for loop executes exactly n − 1 times, and since each heap operation requires time O(lg n), the loop contributes O(n lg n) to the running time
• Thus, the total running time of HUFFMAN on a set of n characters is O(n lg n)
• We can reduce the running time to O(n lg lg n) by replacing the binary min-heap with a van Emde Boas tree
Correctness of Huffman’s algorithm
• We show that the problem of determining an optimal prefix code exhibits the greedy-choice and optimal-substructure properties
Lemma 16.2 Let C be an alphabet in which each character c ∈ C has frequency c.freq. Let x and y be two characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit.
Lemma 16.3 Let C, c.freq, x, and y be as in Lemma 16.2. Let C′ = (C − {x, y}) ∪ {z}. Define freq for C′ as for C, except that z.freq = x.freq + y.freq. Let T′ be any tree representing an optimal prefix code for the alphabet C′. Then the tree T, obtained from T′ by replacing the leaf node for z with an internal node having x and y as children, represents an optimal prefix code for the alphabet C.

Theorem 16.4 Procedure HUFFMAN produces an optimal prefix code.