10/23/2014
1
15.5 Optimal binary search trees
• We are designing a program to translate text
• Perform lookup operations by building a BST with the words as keys and their equivalents as satellite data
• We can ensure an O(lg n) search time per occurrence by using a red-black tree or any other balanced BST
• A frequently used word may appear far from the root while a rarely used word appears near the root
• We want frequent words to be placed nearer the root
• How do we organize a BST so as to minimize the number of nodes visited in all searches, given that we know how often each word occurs?
23-Oct-14MAT-72006 AA+DS, Fall 2014 516
• What we need is an optimal binary search tree
• Formally, given a sequence K = k_1, k_2, …, k_n of n distinct sorted keys (k_1 < k_2 < … < k_n), we wish to build a BST from these keys
• For each key k_i, we have a probability p_i that a search will be for k_i
• Some searches may be for values not in K, so we also have n + 1 "dummy keys" d_0, d_1, …, d_n representing values not in K
• In particular, d_0 represents all values less than k_1, and d_n represents all values greater than k_n
• For i = 1, 2, …, n − 1, the dummy key d_i represents all values between k_i and k_{i+1}
• For each dummy key d_i, we have a probability q_i that a search will correspond to d_i
• Each key k_i is an internal node, and each dummy key d_i is a leaf
• Every search is either successful (finds a key k_i) or unsuccessful (finds a dummy key d_i), and so we have

  Σ_{i=1}^{n} p_i + Σ_{i=0}^{n} q_i = 1

• Because we have probabilities of searches for each key and each dummy key, we can determine the expected cost of a search in a given BST T
• Let us assume that the actual cost of a search equals the number of nodes examined, i.e., the depth in T of the node found by the search, plus 1
• Then the expected cost of a search in T is

  E[search cost in T] = Σ_{i=1}^{n} (depth_T(k_i) + 1) · p_i + Σ_{i=0}^{n} (depth_T(d_i) + 1) · q_i
                      = 1 + Σ_{i=1}^{n} depth_T(k_i) · p_i + Σ_{i=0}^{n} depth_T(d_i) · q_i,

• where depth_T denotes a node's depth in tree T
  Node   Depth   Probability   Contribution
  k_1      1        0.15           0.30
  k_2      0        0.10           0.10
  k_3      2        0.05           0.15
  k_4      1        0.10           0.20
  k_5      2        0.20           0.60
  d_0      2        0.05           0.15
  d_1      2        0.10           0.30
  d_2      3        0.05           0.20
  d_3      3        0.05           0.20
  d_4      3        0.05           0.20
  d_5      3        0.10           0.40
  Total                            2.80
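The total above can be checked mechanically. A minimal Python sketch, with the depths and probabilities taken from the table (node names k_1..k_5 and d_0..d_5 as in the text):

```python
# Depth and probability of each node in the example tree
# (k1..k5 are keys, d0..d5 are dummy-key leaves).
nodes = {
    "k1": (1, 0.15), "k2": (0, 0.10), "k3": (2, 0.05),
    "k4": (1, 0.10), "k5": (2, 0.20),
    "d0": (2, 0.05), "d1": (2, 0.10), "d2": (3, 0.05),
    "d3": (3, 0.05), "d4": (3, 0.05), "d5": (3, 0.10),
}

# E[search cost] = sum over all nodes of (depth + 1) * probability.
expected_cost = sum((depth + 1) * p for depth, p in nodes.values())
print(f"{expected_cost:.2f}")  # 2.80
```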
• For a given set of probabilities, we wish to construct a BST whose expected search cost is smallest
• We call such a tree an optimal binary search tree
• An optimal BST for the probabilities given has expected cost 2.75
• An optimal BST is not necessarily a tree whose overall height is smallest
• Nor can we necessarily construct an optimal BST by always putting the key with the greatest probability at the root
• The lowest expected cost of any BST with k_5 at the root is 2.85
Step 1: The structure of an optimal BST
• Consider any subtree of a BST
• It must contain keys in a contiguous range k_i, …, k_j, for some 1 ≤ i ≤ j ≤ n
• In addition, a subtree that contains keys k_i, …, k_j must also have as its leaves the dummy keys d_{i−1}, …, d_j
• If an optimal BST T has a subtree T′ containing keys k_i, …, k_j, then this subtree T′ must be optimal as well for the subproblem with keys k_i, …, k_j and dummy keys d_{i−1}, …, d_j
• Given keys k_i, …, k_j, one of them, say k_r (i ≤ r ≤ j), is the root of an optimal subtree containing these keys
• The left subtree of the root k_r contains the keys k_i, …, k_{r−1} (and dummy keys d_{i−1}, …, d_{r−1})
• The right subtree contains the keys k_{r+1}, …, k_j (and dummy keys d_r, …, d_j)
• As long as we
  – examine all candidate roots k_r, where i ≤ r ≤ j,
  – and determine all optimal BSTs containing k_i, …, k_{r−1} and those containing k_{r+1}, …, k_j,
  we are guaranteed to find an optimal BST
• Suppose that in a subtree with keys k_i, …, k_j, we select k_i as the root
• k_i's left subtree contains the keys k_i, …, k_{i−1}
• Interpret this sequence as containing no keys
• Subtrees, however, also contain dummy keys
• Adopt the convention that a subtree containing keys k_i, …, k_{i−1} has no actual keys but does contain the single dummy key d_{i−1}
• Symmetrically, if we select k_j as the root, then k_j's right subtree contains no actual keys, but it does contain the dummy key d_j
Step 2: A recursive solution
• We pick our subproblem domain as finding an optimal BST containing the keys k_i, …, k_j, where i ≥ 1, j ≤ n, and j ≥ i − 1
• Let us define e[i, j] as the expected cost of searching an optimal BST containing the keys k_i, …, k_j
• Ultimately, we wish to compute e[1, n]
• The easy case occurs when j = i − 1
• Then we have just the dummy key d_{i−1}
• The expected search cost is e[i, i − 1] = q_{i−1}
• When j ≥ i, we need to select a root k_r from among k_i, …, k_j and make an optimal BST with keys k_i, …, k_{r−1} as its left subtree and an optimal BST with keys k_{r+1}, …, k_j as its right subtree
• What happens to the expected search cost of a subtree when it becomes a subtree of a node?
  – Depth of each node increases by 1
  – Expected search cost of this subtree increases by the sum of all the probabilities in it
• For a subtree with keys k_i, …, k_j, let us denote this sum of probabilities as

  w(i, j) = Σ_{l=i}^{j} p_l + Σ_{l=i−1}^{j} q_l
• Thus, if k_r is the root of an optimal subtree containing keys k_i, …, k_j, we have

  e[i, j] = p_r + (e[i, r − 1] + w(i, r − 1)) + (e[r + 1, j] + w(r + 1, j))

• Noting that

  w(i, j) = w(i, r − 1) + p_r + w(r + 1, j),

  we rewrite e[i, j] as

  e[i, j] = e[i, r − 1] + e[r + 1, j] + w(i, j)

• We choose the root that gives the lowest expected search cost:

  e[i, j] = q_{i−1}                                                  if j = i − 1
  e[i, j] = min_{i ≤ r ≤ j} { e[i, r − 1] + e[r + 1, j] + w(i, j) }  if i ≤ j
• The e[i, j] values give the expected search costs in optimal BSTs
• To help us keep track of the structure of optimal BSTs, we define root[i, j], for 1 ≤ i ≤ j ≤ n, to be the index r for which k_r is the root of an optimal BST containing keys k_i, …, k_j
• Although we will see how to compute the values of root[i, j], we leave the construction of an optimal binary search tree from these values as an exercise
Step 3: Computing the expected search cost of an optimal BST
• We store the e[i, j] values in a table e[1 .. n + 1, 0 .. n]
• The first index needs to run to n + 1 because to have a subtree containing only the dummy key d_n, we need to compute and store e[n + 1, n]
• The second index needs to start from 0 because to have a subtree containing only the dummy key d_0, we need to compute and store e[1, 0]
• We use only the entries e[i, j] for which j ≥ i − 1
• We also use a table root[i, j], for recording the root of the subtree containing keys k_i, …, k_j
• This table uses only the entries with 1 ≤ i ≤ j ≤ n
• We also store the w(i, j) values in a table w[1 .. n + 1, 0 .. n]
• For the base case, we compute w[i, i − 1] = q_{i−1}
• For j ≥ i, we compute

  w[i, j] = w[i, j − 1] + p_j + q_j

• Thus, we can compute the Θ(n²) values of w[i, j] in Θ(1) time each
OPTIMAL-BST(p, q, n)
1.  let e[1 .. n + 1, 0 .. n], w[1 .. n + 1, 0 .. n], and root[1 .. n, 1 .. n] be new tables
2.  for i = 1 to n + 1
3.      e[i, i − 1] = q_{i−1}
4.      w[i, i − 1] = q_{i−1}
5.  for l = 1 to n
6.      for i = 1 to n − l + 1
7.          j = i + l − 1
8.          e[i, j] = ∞
9.          w[i, j] = w[i, j − 1] + p_j + q_j
10.         for r = i to j
11.             t = e[i, r − 1] + e[r + 1, j] + w[i, j]
12.             if t < e[i, j]
13.                 e[i, j] = t
14.                 root[i, j] = r
15. return e and root
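The procedure transcribes directly into Python; the sketch below uses dicts keyed by (i, j) in place of the 1-indexed tables, with p[0] unused so the indices match the pseudocode:

```python
import math

def optimal_bst(p, q, n):
    """p[1..n]: key probabilities (p[0] ignored); q[0..n]: dummy-key
    probabilities. Returns tables e and root, keyed by (i, j)."""
    e = {}     # e[(i, j)]: expected search cost of an optimal BST on k_i..k_j
    w = {}     # w[(i, j)]: total probability mass of that subtree
    root = {}  # root[(i, j)]: index r of the optimal root k_r
    for i in range(1, n + 2):          # base cases: empty key ranges
        e[(i, i - 1)] = q[i - 1]
        w[(i, i - 1)] = q[i - 1]
    for l in range(1, n + 1):          # l = number of keys in the subtree
        for i in range(1, n - l + 2):
            j = i + l - 1
            e[(i, j)] = math.inf
            w[(i, j)] = w[(i, j - 1)] + p[j] + q[j]
            for r in range(i, j + 1):  # try every candidate root
                t = e[(i, r - 1)] + e[(r + 1, j)] + w[(i, j)]
                if t < e[(i, j)]:
                    e[(i, j)] = t
                    root[(i, j)] = r
    return e, root
```

On the probabilities of the running example, p = (0.15, 0.10, 0.05, 0.10, 0.20) and q = (0.05, 0.10, 0.05, 0.05, 0.05, 0.10), the call yields e[(1, 5)] = 2.75 with root[(1, 5)] = 2, matching the optimal cost quoted earlier.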
• The OPTIMAL-BST procedure takes Θ(n³) time, just like MATRIX-CHAIN-ORDER
• Its running time is O(n³), since its for loops are nested three deep and each loop index takes on at most n values
• The loop indices in OPTIMAL-BST do not have exactly the same bounds as those in MATRIX-CHAIN-ORDER, but they are within 1 in all directions
• Thus, like MATRIX-CHAIN-ORDER, the OPTIMAL-BST procedure also takes Ω(n³) time, and hence Θ(n³) time
16 Greedy Algorithms
• Optimization algorithms typically go through a sequence of steps, with a set of choices at each step
• For many optimization problems, using dynamic programming to determine the best choices is overkill; simpler, more efficient algorithms will do
• A greedy algorithm always makes the choice that looks best at the moment
• That is, it makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution
16.1 An activity-selection problem
• Suppose we have a set S = {a_1, a_2, …, a_n} of n proposed activities that wish to use a resource (e.g., a lecture hall), which can serve only one activity at a time
• Each activity a_i has a start time s_i and a finish time f_i, where 0 ≤ s_i < f_i < ∞
• If selected, activity a_i takes place during the half-open time interval [s_i, f_i)
• Activities a_i and a_j are compatible if the intervals [s_i, f_i) and [s_j, f_j) do not overlap
• I.e., a_i and a_j are compatible if s_i ≥ f_j or s_j ≥ f_i
• We wish to select a maximum-size subset of mutually compatible activities
• We assume that the activities are sorted in monotonically increasing order of finish time: f_1 ≤ f_2 ≤ … ≤ f_n
• Consider, e.g., the following set of activities:
• For this example, the subset {a_3, a_9, a_11} consists of mutually compatible activities
• It is not a maximum subset, however, since the subset {a_1, a_4, a_8, a_11} is larger
• In fact, {a_1, a_4, a_8, a_11} is a largest subset of mutually compatible activities; another largest subset is {a_2, a_4, a_9, a_11}
  i    1  2  3  4  5  6  7   8   9   10  11
  s_i  1  3  0  5  3  5  6   8   8   2   12
  f_i  4  5  6  7  9  9  10  11  12  14  16
The optimal substructure of the activity-selection problem
• Let S_ij be the set of activities that start after activity a_i finishes and that finish before activity a_j starts
• We wish to find a maximum set of mutually compatible activities in S_ij
• Suppose that such a maximum set is A_ij, which includes some activity a_k
• By including a_k in an optimal solution, we are left with two subproblems: finding mutually compatible activities in the set S_ik and finding mutually compatible activities in the set S_kj
• Let A_ik = A_ij ∩ S_ik and A_kj = A_ij ∩ S_kj, so that
  – A_ik contains the activities in A_ij that finish before a_k starts and
  – A_kj contains the activities in A_ij that start after a_k finishes
• Thus, A_ij = A_ik ∪ {a_k} ∪ A_kj, and so the maximum-size set A_ij in S_ij consists of |A_ij| = |A_ik| + |A_kj| + 1 activities
• The usual cut-and-paste argument shows that the optimal solution A_ij must also include optimal solutions for S_ik and S_kj
• This suggests that we might solve the activity-selection problem by dynamic programming
• If we denote the size of an optimal solution for the set S_ij by c[i, j], then we would have the recurrence

  c[i, j] = c[i, k] + c[k, j] + 1

• Of course, if we did not know that an optimal solution for the set S_ij includes activity a_k, we would have to examine all activities in S_ij to find which one to choose, so that

  c[i, j] = 0                                           if S_ij = ∅
  c[i, j] = max_{a_k ∈ S_ij} { c[i, k] + c[k, j] + 1 }  if S_ij ≠ ∅
Making the greedy choice
• For the activity-selection problem, we need consider only the greedy choice
• We choose an activity that leaves the resource available for as many other activities as possible
• Now, of the activities we end up choosing, one of them must be the first one to finish
• Choose the activity in S with the earliest finish time, since that leaves the resource available for as many of the activities that follow it as possible
• Activities are sorted in monotonically increasing order by finish time; the greedy choice is activity a_1
• If we make the greedy choice, we only have to find activities that start after a_1 finishes
• s_1 < f_1 and f_1 is the earliest finish time of any activity ⇒ no activity can have a finish time ≤ s_1
• Thus, all activities that are compatible with activity a_1 must start after a_1 finishes
• Let S_k = { a_i ∈ S : s_i ≥ f_k } be the set of activities that start after activity a_k finishes
• Optimal substructure: if a_1 is in the optimal solution, then an optimal solution to the original problem consists of a_1 and all the activities in an optimal solution to the subproblem S_1
Theorem 16.1 Consider any nonempty subproblem S_k, and let a_m be an activity in S_k with the earliest finish time. Then a_m is included in some maximum-size subset of mutually compatible activities of S_k.

Proof Let A_k be a max-size subset of mutually compatible activities in S_k, and let a_j be the activity in A_k with the earliest finish time. If a_j = a_m, we are done, since a_m is then in a max-size subset of mutually compatible activities of S_k. If a_j ≠ a_m, let the set A′_k = (A_k − {a_j}) ∪ {a_m}. The activities in A′_k are disjoint because the activities in A_k are disjoint, a_j is the first activity in A_k to finish, and f_m ≤ f_j. Since |A′_k| = |A_k|, we conclude that A′_k is a maximum-size subset of mutually compatible activities of S_k, and it includes a_m. ∎
• We can repeatedly choose the activity that finishes first, keep only the activities compatible with this activity, and repeat until no activities remain
• Moreover, because we always choose the activity with the earliest finish time, the finish times of the activities we choose must strictly increase
• We can consider each activity just once overall, in monotonically increasing order of finish times
A recursive greedy algorithm
RECURSIVE-ACTIVITY-SELECTOR(s, f, k, n)
1. m = k + 1
2. while m ≤ n and s[m] < f[k]   // find the first activity in S_k to finish
3.     m = m + 1
4. if m ≤ n
5.     return {a_m} ∪ RECURSIVE-ACTIVITY-SELECTOR(s, f, m, n)
6. else return ∅
An iterative greedy algorithm
GREEDY-ACTIVITY-SELECTOR(s, f)
1. n = s.length
2. A = {a_1}
3. k = 1
4. for m = 2 to n
5.     if s[m] ≥ f[k]
6.         A = A ∪ {a_m}
7.         k = m
8. return A
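The iterative selector is a few lines of Python; a sketch using 1-indexed lists (index 0 unused) so the indices match the pseudocode:

```python
def greedy_activity_selector(s, f):
    """s[m], f[m]: start/finish times of activity m (1-indexed; index 0
    unused). Assumes activities are sorted by finish time. Returns the
    list of selected activity indices."""
    n = len(s) - 1
    A = [1]                 # a_1 always belongs to some optimal solution
    k = 1                   # index of the most recently selected activity
    for m in range(2, n + 1):
        if s[m] >= f[k]:    # a_m is compatible: it starts after a_k finishes
            A.append(m)
            k = m
    return A
```

On the example table (s = 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12 and f = 4, 5, 6, 7, 9, 9, 10, 11, 12, 14, 16) this returns [1, 4, 8, 11], one of the two largest subsets identified earlier.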
• The set A returned by the call GREEDY-ACTIVITY-SELECTOR(s, f) is precisely the set returned by the call RECURSIVE-ACTIVITY-SELECTOR(s, f, 0, n)
• Both the recursive version and the iterative algorithm schedule a set of n activities in Θ(n) time, assuming that the activities were already sorted initially by their finish times
16.3 Huffman codes
• Huffman codes compress data very effectively
  – savings of 20% to 90% are typical, depending on the characteristics of the data being compressed
• We consider the data to be a sequence of characters
• Huffman's greedy algorithm uses a table giving how often each character occurs (i.e., its frequency) to build up an optimal way of representing each character as a binary string
• We have a 100,000-character data file that we wish to store compactly
• We observe that the characters in the file occur with the following frequencies:

                            a    b    c    d    e     f
  Frequency (in thousands)  45   13   12   16   9     5
  Fixed-length codeword     000  001  010  011  100   101
  Variable-length codeword  0    101  100  111  1101  1100

• That is, only 6 different characters appear, and the character a occurs 45,000 times
• Here, we consider the problem of designing a binary character code (or code for short) in which each character is represented by a unique binary string, which we call a codeword
• Using a fixed-length code requires 3 bits to represent 6 characters: 000, 001, …, 101
• We then need 300,000 bits to code the entire file
• A variable-length code gives frequent characters short codewords and infrequent characters long codewords
• Here the 1-bit string 0 represents a, and the 4-bit string 1100 represents f
• This code requires

  (45 · 1 + 13 · 3 + 12 · 3 + 16 · 3 + 9 · 4 + 5 · 4) · 1,000 = 224,000 bits

  (a savings of about 25%)
Prefix codes
• We consider only codes in which no codeword is also a prefix of some other codeword
• A prefix code can always achieve the optimal data compression among any character code, and so we can restrict our attention to prefix codes
• Encoding is always simple for any binary character code; we just concatenate the codewords representing each character of the file
• E.g., with the variable-length prefix code, we code the 3-character file abc as 0 · 101 · 100 = 0101100
• Prefix codes simplify decoding
  – Since no codeword is a prefix of any other, the codeword that begins an encoded file is unambiguous
• We can simply identify the initial codeword, translate it back to the original character, and repeat the decoding process on the remainder of the encoded file
• In our example, the string 001011101 parses uniquely as 0 · 0 · 101 · 1101, which decodes to aabe
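This peel-off-one-codeword loop needs no lookahead. A minimal Python sketch, using the full codeword table of the running example (the text spells out only a = 0, b = 101, c = 100, and f = 1100 explicitly; d = 111 and e = 1101 are the remaining codewords of that standard example):

```python
# Variable-length prefix code from the running example.
code = {"a": "0", "b": "101", "c": "100",
        "d": "111", "e": "1101", "f": "1100"}
decode_table = {cw: ch for ch, cw in code.items()}

def decode(bits):
    """Scan left to right, emitting a character as soon as the bits
    read so far form a codeword; unambiguous for a prefix code."""
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in decode_table:      # a complete codeword
            out.append(decode_table[cur])
            cur = ""
    return "".join(out)

print(decode("001011101"))  # aabe
```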
• The decoding process needs a convenient representation for the prefix code so that we can easily pick off the initial codeword
• A binary tree whose leaves are the given characters provides one such representation
• We interpret the binary codeword for a character as the simple path from the root to that character, where 0 means "go to the left child" and 1 means "go to the right child"
• Note that the trees are not BSTs: the leaves need not appear in sorted order and internal nodes do not contain character keys
The trees corresponding to the fixed-length code a = 000, …, f = 101 and the optimal prefix code a = 0, b = 101, …, f = 1100
• An optimal code for a file is always represented by a full binary tree, in which every nonleaf node has two children
• The fixed-length code in our example is not optimal since its tree is not a full binary tree: it contains codewords beginning 10…, but none beginning 11…
• Since we can now restrict our attention to full binary trees, we can say that if C is the alphabet from which the characters are drawn and
  – all character frequencies are positive, then
  – the tree for an optimal prefix code has exactly |C| leaves, one for each letter of the alphabet, and
  – exactly |C| − 1 internal nodes
• Given a tree T corresponding to a prefix code, we can easily compute the number of bits required to encode a file
• For each character c in the alphabet C, let the attribute c.freq denote the frequency of c and let d_T(c) denote the depth of c's leaf in T
• d_T(c) is also the length of the codeword for character c
• The number of bits required to encode a file is thus

  B(T) = Σ_{c ∈ C} c.freq · d_T(c),

  which we define as the cost of the tree T
Constructing a Huffman code
• Let C be a set of n characters, where each character c ∈ C is an object with an attribute c.freq
• The algorithm builds the tree T corresponding to the optimal code bottom-up
• It begins with |C| leaves and performs |C| − 1 "merging" operations to create the final tree
• We use a min-priority queue Q, keyed on the freq attribute, to identify the two least-frequent objects to merge
• The result of a merge is a new object whose frequency is the sum of the frequencies of the two objects merged
HUFFMAN(C)
1. n = |C|
2. Q = C
3. for i = 1 to n − 1
4.     allocate a new node z
5.     z.left = x = EXTRACT-MIN(Q)
6.     z.right = y = EXTRACT-MIN(Q)
7.     z.freq = x.freq + y.freq
8.     INSERT(Q, z)
9. return EXTRACT-MIN(Q)   // return the root of the tree
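The same greedy merge can be sketched in Python with the standard-library heapq module as the min-priority queue (trees represented as nested pairs; a sketch, with the exact bit patterns depending on how ties break):

```python
import heapq
from itertools import count

def huffman(freqs):
    """freqs: dict mapping character -> frequency.
    Returns a dict mapping character -> codeword string."""
    tiebreak = count()  # unique sequence numbers so equal-frequency
                        # entries never compare their tree payloads
    # Each heap entry: (freq, tiebreaker, tree); a tree is a char or a pair.
    heap = [(f, next(tiebreak), ch) for ch, f in freqs.items()]
    heapq.heapify(heap)
    for _ in range(len(freqs) - 1):       # |C| - 1 merging operations
        fx, _, x = heapq.heappop(heap)    # two least-frequent trees
        fy, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))
    root = heap[0][2]

    codes = {}
    def walk(node, prefix):               # 0 = left child, 1 = right child
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"   # single-character edge case
    walk(root, "")
    return codes
```

On the example frequencies (a: 45, b: 13, c: 12, d: 16, e: 9, f: 5, in thousands) the codeword lengths come out as 1, 3, 3, 3, 4, 4, giving the 224,000-bit total computed earlier; the particular 0/1 labels may differ from the figure, since any consistent tie-breaking yields an optimal code.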
• To analyze the running time of HUFFMAN, let Q be implemented as a binary min-heap
• For a set C of n characters, we can initialize Q (line 2) in O(n) time using the BUILD-MIN-HEAP procedure
• The for loop executes exactly n − 1 times, and since each heap operation requires time O(lg n), the loop contributes O(n lg n) to the running time
• Thus, the total running time of HUFFMAN on a set of n characters is O(n lg n)
• We can reduce the running time to O(n lg lg n) by replacing the binary min-heap with a van Emde Boas tree
Correctness of Huffman’s algorithm
• We show that the problem of determining an optimal prefix code exhibits the greedy-choice and optimal-substructure properties
Lemma 16.2 Let C be an alphabet in which each character c ∈ C has frequency c.freq. Let x and y be two characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit.
Lemma 16.3 Let C, c.freq, x, and y be as in Lemma 16.2. Let C′ = (C − {x, y}) ∪ {z}. Define freq for C′ as for C, except that z.freq = x.freq + y.freq. Let T′ be any tree representing an optimal prefix code for the alphabet C′. Then the tree T, obtained from T′ by replacing the leaf node for z with an internal node having x and y as children, represents an optimal prefix code for the alphabet C.

Theorem 16.4 Procedure HUFFMAN produces an optimal prefix code.