
Page 1: Other Trees

Other Trees
Applications of the Tree Structure
Copyright © 2004-2011 Curt Hill

Page 2: Parse or Expression Trees

• An expression tree contains:
  – Operators as interior nodes
  – Values as leaves
• The shape of the expression tree captures the precedence
• Consider the following expression: 2+3*4

Page 3: Expression Trees

2+3*4:

      +
     / \
    2   *
       / \
      3   4

3*4+2:

      +
     / \
    *   2
   / \
  3   4
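As a sketch of the idea (the class and function names here are illustrative, not from the slides), the tree for 2+3*4 can be built with operators as interior nodes and values as leaves, then evaluated bottom-up:

```python
# Minimal expression-tree sketch: operators are interior nodes, values are leaves.
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value   # operator string or a number
        self.left = left
        self.right = right

def evaluate(node):
    """Evaluate the tree bottom-up; leaves are numbers."""
    if node.left is None and node.right is None:
        return node.value
    a, b = evaluate(node.left), evaluate(node.right)
    if node.value == '+': return a + b
    if node.value == '*': return a * b
    raise ValueError("unknown operator")

# 2 + 3 * 4: '*' binds tighter, so '+' is the root.
tree = Node('+', Node(2), Node('*', Node(3), Node(4)))
print(evaluate(tree))  # 14
```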

Page 4: Notes

• The plus is always higher in the tree because the precedence of multiplication is higher
• The expression tree is based upon the grammar rules
  – Syntax diagrams will be considered later

Page 5: Traversal

• The names come from the above expression tree
• There are six (3!) ways to traverse a tree, depending on the order of processing:
  – The node
  – The left subtree
  – The right subtree
• Inorder (node between left and right)
• Preorder
• Postorder

Page 6: Inorder

• Visits according to the sorted order of the tree
• Visit lower (left) subtree
• Process node
• Visit upper (right) subtree
• The reverse produces higher to lower
• Left to right: 2 + 3 * 4
• This gives standard algebraic notation

Page 7: Preorder

• Node first, then subtrees
• Process node
• Visit lower (left) subtree
• Visit upper (right) subtree
• Expression: + 2 * 3 4
• Remember this?

Page 8: Postorder

• Subtrees first and then node
• Visit lower (left) subtree
• Visit upper (right) subtree
• Process node
• Expression: 2 3 4 * +
• Reverse Polish notation
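The three traversal orders can be sketched on the 2+3*4 tree from earlier (the names and tuple-free representation are illustrative, not from the slides):

```python
# The three depth-first traversals, applied to the tree for 2 + 3 * 4.
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def inorder(n):    # left subtree, node, right subtree
    return inorder(n.left) + [n.value] + inorder(n.right) if n else []

def preorder(n):   # node, left subtree, right subtree
    return [n.value] + preorder(n.left) + preorder(n.right) if n else []

def postorder(n):  # left subtree, right subtree, node
    return postorder(n.left) + postorder(n.right) + [n.value] if n else []

tree = Node('+', Node(2), Node('*', Node(3), Node(4)))
print(inorder(tree))    # [2, '+', 3, '*', 4]   infix (standard algebraic)
print(preorder(tree))   # ['+', 2, '*', 3, 4]   prefix (Polish)
print(postorder(tree))  # [2, 3, 4, '*', '+']   postfix (reverse Polish)
```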

Page 9: Parse Trees

• A parse tree is a construction that represents the parse of a sentence
• Parse trees are often built by compilers and other programs that scan source code
• A parse tree is one acceptable parse of the source code based on the grammar
• First some definitions and background

Page 10: A Grammar Consists Of

• Terminals
  – Keywords
  – Constants
  – Words
  – Symbols or operators
• Non-terminals
  – Named constructs where the name never actually appears
  – Such as statements and expressions
• Distinguished symbol
  – The starting point
  – Usually represents the whole program
• Productions
  – Rules for going from the distinguished symbol to non-terminals

Page 11: Example

• Simplified grammar for expressions
  – Terminals:
    • Constant
    • Ident
    • + - ( ) * /
  – Productions in EBNF:
    • Expression ::= Term + Term
    • Expression ::= Term - Term
    • Expression ::= Term
    • Term ::= Factor [ { * / } Factor ]
    • Factor ::= Constant
    • Factor ::= Ident
    • Factor ::= ( Expression )
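A sketch of how such a grammar drives a recursive-descent parser, with one function per non-terminal. This is an assumption-laden illustration: the single-operator productions are generalized to allow repeated operators, Ident is omitted, and the tokenizer and names are invented for the example.

```python
# Recursive-descent parser sketch for a grammar like the slide's:
# Expression -> Term (('+'|'-') Term)*, Term -> Factor (('*'|'/') Factor)*,
# Factor -> Constant | '(' Expression ')'.  Idents omitted for brevity.
import re

def tokenize(s):
    return re.findall(r'\d+|[-+*/()]', s)

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0
    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None
    def advance(self):
        tok = self.peek(); self.pos += 1; return tok
    def expression(self):              # Expression ::= Term {(+|-) Term}
        node = self.term()
        while self.peek() in ('+', '-'):
            node = (self.advance(), node, self.term())
        return node
    def term(self):                    # Term ::= Factor {(*|/) Factor}
        node = self.factor()
        while self.peek() in ('*', '/'):
            node = (self.advance(), node, self.factor())
        return node
    def factor(self):                  # Factor ::= Constant | ( Expression )
        if self.peek() == '(':
            self.advance()             # consume '('
            node = self.expression()
            self.advance()             # consume ')'
            return node
        return int(self.advance())     # Constant

tree = Parser(tokenize("2 + 5 * (3 - 4)")).expression()
print(tree)  # ('+', 2, ('*', 5, ('-', 3, 4)))
```

The nesting of the result mirrors the parse tree on the later slide: the parenthesized factor forces 3 - 4 to be reduced first.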

Page 12: Productions Again

• The productions are all replacement rules
• They may also be denoted visually in terms of syntax graphs
• These are usually just as easy to understand, and represent the same information

Page 13: Syntax Graphs

• A circle represents a terminal
  – Reserved word or operator
  – No further definition
• A rectangle represents a non-terminal
  – For a statement or expression
  – Must be defined elsewhere
• An arrow represents the path between one item and another
  – The arrows may branch, indicating alternatives
• Recursion is also allowed

Page 14: Simple Expressions

[Syntax graphs: expression is term followed optionally by + or - and another term; term is factor followed optionally by * or / and another factor; factor is a constant, an ident, or an expression in parentheses]

Page 15: Parse Tree Example

• In a parse tree
  – Leaves are terminals
  – Nodes are non-terminals
• Usually evaluated from bottom to top
• Consider the parse of: 2 + 5 * ( 3 - 4 )

Page 16: Expression: 2 + 5 * (3 - 4)

[Parse tree: the root expression derives term + term; the left term derives a factor for 2; the right term derives factor * factor, with 5 on the left and a parenthesized factor ( expression ) on the right, whose inner expression derives term - term for 3 and 4]

Page 17: Notes

• Parse trees are partially built by the parser of the compiler
• Then used by the code generator
• Once used, the sub-trees are discarded
• The whole tree never exists at once

Page 18: Balance and Search Times

• The time it takes to search a tree is based upon the path length to the desired node
• Assuming equal distribution of searches:
  – The average search length is the sum of the path lengths divided by the number of tree nodes

Page 19: Unbalanced Tree

[Binary search tree: root 12; 12's children are 6 and 19; 6's left child is 2, with children 0 and 4; 19's children are 15 and 36; 36's left child is 24, whose right child is 30, whose left child is 29]

Page 20: Average Search Length

• 12 – 1
• 6 – 2
• 19 – 2
• 2 – 3
• 15 – 3
• 36 – 3
• 24 – 4
• 0 – 4
• 4 – 4
• 30 – 5
• 29 – 6
• Sum of 37 for 11 nodes gives an average search length of 37/11 ≈ 3.4
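That count can be checked mechanically. A minimal sketch, assuming a (value, left, right) tuple encoding of the unbalanced tree above (the encoding itself is not from the slides):

```python
# Average search length of the unbalanced BST: each node contributes its
# depth (root = 1); the average is the depth sum divided by the node count.
tree = (12,
        (6, (2, (0, None, None), (4, None, None)), None),
        (19, (15, None, None),
             (36, (24, None, (30, (29, None, None), None)), None)))

def depth_sum(node, depth=1):
    """Return (sum of depths, node count) for the subtree at `node`."""
    if node is None:
        return (0, 0)
    value, left, right = node
    ls, lc = depth_sum(left, depth + 1)
    rs, rc = depth_sum(right, depth + 1)
    return (depth + ls + rs, 1 + lc + rc)

total, count = depth_sum(tree)
print(total, count, round(total / count, 2))  # 37 11 3.36
```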

Page 21: Perfectly Balanced Tree

[Perfectly balanced tree on 11 keys: 1 node at depth 1, 2 nodes at depth 2, 4 at depth 3, and 4 at depth 4; the per-key depths are listed on the next slide]

Page 22: Average Search Length

• 36 – 1
• 4 – 2
• 24 – 2
• 2 – 3
• 12 – 3
• 15 – 3
• 28 – 3
• 0 – 4
• 6 – 4
• 19 – 4
• 30 – 4
• Sum of 33 for 11 nodes gives an average search length of 3.0
• Balanced does perform better

Page 23: AVL Balanced Tree

[AVL-balanced tree on the same 11 keys: again 1 node at depth 1, 2 nodes at depth 2, 4 at depth 3, and 4 at depth 4; per-key depths on the next slide]

Page 24: Average Search Length

• 36 – 1
• 6 – 2
• 19 – 2
• 2 – 3
• 12 – 3
• 15 – 3
• 28 – 3
• 0 – 4
• 4 – 4
• 24 – 4
• 30 – 4
• Sum of 33 for 11 nodes gives an average search length of 3.0
• AVL balanced has the same performance as perfectly balanced

Page 25: Balanced is Best?

• The idea of balancing a tree is predicated on equal frequencies of keys
  – A reasonable assumption if there is no contrary information
  – However, if we have frequency information we can do better
• C++ keywords are not evenly distributed

Page 26: Path Lengths

• The idea of balance is nice in general, but...
• If we have a reasonable idea of the frequency of entries we can do better than perfectly balanced
• What we want to do is minimize the average path length
• Previously we could make no assumptions concerning frequency
• Now we can generate a more precise formula

Page 27: Average Path Length

Weighted path length = Σ (i = 1 to n) p_i * f_i

• Where
  – n is the number of words
  – p_i is the path length of word i
  – f_i is the frequency of word i

Page 28: Optimal Search Trees

• What we want are high-frequency words close to the root and low-frequency words at the leaves
• You might think that the most common word should be the root, with the next two most common words as its children
• It does not work that way, since we need to maintain the key order as well

Page 29: Example

• For example, the word "the" is the most common word in English text
• The top n are:
  – the (20)
  – and (15)
  – of (13)
  – to (12)
  – you (7)
  – in (7)
  – a (6)
• Because the top two are such extremes it may be better to have "of" as the root
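Combining these frequencies with the Σ p_i * f_i formula from earlier, a minimal sketch; the tree shape (and hence the depths) with "of" at the root is an illustrative assumption, not a tree given on the slides:

```python
# Weighted path length (sum of path length * frequency) for one possible
# BST over the slide's seven words, keyed alphabetically, rooted at 'of'.
freq = {'the': 20, 'and': 15, 'of': 13, 'to': 12, 'you': 7, 'in': 7, 'a': 6}

# Illustrative BST: 'of' at the root keeps the heavy 'the' and 'and'
# near the top (alphabetical order: a < and < in < of < the < to < you).
depth = {'of': 1, 'and': 2, 'the': 2, 'a': 3, 'in': 3, 'to': 3, 'you': 4}

cost = sum(depth[w] * freq[w] for w in freq)
print(cost)  # 186
```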

Page 30: LISP Lists

• LISP is very old
  – Second only to FORTRAN
• Usually encountered in Programming Languages or Artificial Intelligence classes
• It has a ubiquitous data structure called a list
• However, it is not a list in the sense of being purely linear
• Instead it is a tree, but a tree without a key

Page 31: Variables in LISP

• A variable may be:
  – An atom
  – A list
• An atom is any word or number
• A list may be:
  – Empty
  – A variable followed by a list

Page 32: Lists

• A list could be a simple list within parentheses
  – (Three element list)
• It could also have sub-lists
  – (Atom (A sub list) another (list))
  – This is clearly not a linear list such as an STL list
• LISP programs were also lists
  – The programs and data had the same form

Page 33: Implementation

• The LISP language was influenced by the machine on which it was developed
• It had a 36-bit word that was partitioned into two pointers
  – Contents of the Address part of the Register (CAR)
  – Contents of the Decrement part of the Register (CDR)
• An atom used the word for data
• A list used the pointers and atoms
• A list always ended in nil, a special pointer

Page 34: Example

(Three element list)

[Figure: three cons cells chained left to right; their cars point to the atoms Three, element, and List; the last cdr is nil]

Page 35: Second Example

(Atom (sub list) last)

[Figure: the top-level list is three cons cells whose cars are Atom, a pointer to a sub-list, and last, with the final cdr nil; the sub-list is two cells whose cars are sub and list, also ending in nil]

Page 36: List Processing

• There were two functions that were continually used in LISP to process a list
• Car gave the first item of the list
  – Which could itself be a list
• Cdr gave the rest of the list
• With a heavy dose of recursion, LISP could do it all
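A sketch of cons cells in Python (the representation is an illustration; real LISP cells were machine words). It builds a list with a sub-list, like the second example, and walks it with car and cdr:

```python
# LISP cons-cell sketch: each cell is a (car, cdr) pair, nil is None.
def cons(car_val, cdr_val):
    return (car_val, cdr_val)

def car(cell): return cell[0]   # first item of the list
def cdr(cell): return cell[1]   # rest of the list

nil = None

# (Atom (sub list) last): the second car points at a sub-list of cells.
sub_list = cons('sub', cons('list', nil))
lst = cons('Atom', cons(sub_list, cons('last', nil)))

def to_python(cell):
    """Flatten a cons chain into a Python list, recursing into sub-lists."""
    if cell is nil:
        return []
    head = car(cell)
    head = to_python(head) if isinstance(head, tuple) else head
    return [head] + to_python(cdr(cell))

print(to_python(lst))  # ['Atom', ['sub', 'list'], 'last']
```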