
Discussion #6 1/13

Discussion #6

Parsing Recursive Grammars

Discussion #6 2/13

Topics

• Tail recursion

• LL(1) with ε

• Table-driven LL(1) with ε

• Lexical analyzers

Discussion #6 3/13

Motivating Example

• Let’s use integers instead of digits in our prefix language.
• What’s the problem with *2+72100?
  – Syntax error? Or is it simply ambiguous?
  – * 2 + 7 2100 = 2 * (7 + 2100) = 4214?
  – * 2 + 72 100 = 2 * (72 + 100) = 344?
• Solution? Let n mark the beginning of a number,
  e.g. * n2 + n72 n100 = 2 * (72 + 100) = 344
  – This looks strange, but you’ll soon see where we are headed and why.
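To make the ambiguity concrete, here is a brute-force enumerator in Python. It is a sketch, and `parses` and `all_values` are hypothetical helpers, not course code. An unmarked run of digits may end after any digit, while an n-marked number runs to the next non-digit. Besides the two readings above, it also finds * 2 + 721 00 and * 2 + 7210 0, because nothing in the digit rules forbids a number such as 00.

```python
DIGITS = "0123456789"

def parses(s: str, i: int = 0) -> list[tuple[int, int]]:
    """Every (value, next_index) for one prefix expression starting at s[i]."""
    if i >= len(s):
        return []
    c = s[i]
    if c in "+*":
        results = []
        for left, j in parses(s, i + 1):       # every way to read the first operand
            for right, k in parses(s, j):      # every way to read the second operand
                results.append((left + right if c == "+" else left * right, k))
        return results
    if c == "n":
        # 'n' marks where a number starts; its digits run up to the next non-digit.
        j = i + 1
        while j < len(s) and s[j] in DIGITS:
            j += 1
        return [(int(s[i + 1:j]), j)] if j > i + 1 else []
    # Unmarked digits: nothing says where the number ends, so try every length.
    results = []
    j = i
    while j < len(s) and s[j] in DIGITS:
        j += 1
        results.append((int(s[i:j]), j))
    return results

def all_values(s: str) -> list[int]:
    """Values of the parses that consume the whole string."""
    return [v for v, end in parses(s) if end == len(s)]

print(all_values("*2+72100"))     # [4214, 344, 1442, 14420]
print(all_values("*n2+n72n100"))  # [344]
```

With the n markers only one reading survives, because each number now ends where the next marker or operator begins.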

Discussion #6 4/13

E → (1) N | (2) O E E
O → (3) + | (4) *
N → (5) n I
I → (6) D | (7) I D
D → (8) 0 | (9) 1 | … | (17) 9

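[Figure: parse tree for * n2 + n72 n100. The root E ⇒ O E E with O ⇒ *; the first E ⇒ N ⇒ n I gives n2; the second E ⇒ O E E with O ⇒ +, whose two E's give n72 and n100. Each multi-digit number is built left-recursively with I ⇒ I D, e.g. 72 as I ⇒ I D ⇒ D D ⇒ 7 2.]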

Consider: * n2 + n72 n100

Discussion #6 5/13


[Figure: the same parse tree, expanded only as far as the I under the N for n72; the next expansion of that I is marked ??.]

* n2 + n72 n100

Question: which rule do we choose, I → (6) D or I → (7) I D?

We don’t know without looking further ahead. Should we look further ahead, or find another way?
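The question can be checked mechanically. A parser with one symbol of lookahead must pick the rule for I from the next input symbol, and that works only if the FIRST sets of I → D and I → I D are disjoint. The sketch below uses our own encoding of the grammar above (`first_sets` and `first_of` are hypothetical names) and computes FIRST by fixed-point iteration. It relies on the fact that no rule of this grammar derives the empty string.

```python
# Original grammar: rule number -> (lhs, rhs). No rule derives the empty string.
RULES = {
    1: ("E", ["N"]), 2: ("E", ["O", "E", "E"]),
    3: ("O", ["+"]), 4: ("O", ["*"]),
    5: ("N", ["n", "I"]),
    6: ("I", ["D"]), 7: ("I", ["I", "D"]),
    **{8 + d: ("D", [str(d)]) for d in range(10)},   # D -> (8) 0 | ... | (17) 9
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}

def first_of(rhs, first):
    """FIRST of a right side; without epsilon rules it is FIRST of its first symbol."""
    x = rhs[0]
    return first[x] if x in NONTERMINALS else {x}

def first_sets():
    first = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:                      # repeat until no FIRST set grows
        changed = False
        for lhs, rhs in RULES.values():
            new = first_of(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

first = first_sets()
choose_6 = first_of(RULES[6][1], first)   # lookaheads that predict I -> D
choose_7 = first_of(RULES[7][1], first)   # lookaheads that predict I -> I D
print(sorted(choose_6 & choose_7))        # all ten digits: an LL(1) conflict
```

Both alternatives begin with any digit, so one symbol of lookahead cannot choose between them.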

Discussion #6 6/13

LL(1) with ε

• There is another way. Consider the following replacement:
  – replace (6) I → D by (6) I → D T
  – replace (7) I → I D by T → (7) I | (8) ε
• Now, if I is on top of the stack and we see a digit, we choose I → D T.
• If T is on top and we see a digit, we choose T → I; otherwise we choose T → ε.

Example: the 100 in …n100+n21n…

[Figure: parse tree for 100: I ⇒ D T with D ⇒ 1; T ⇒ I ⇒ D T with D ⇒ 0; T ⇒ I ⇒ D T with D ⇒ 0; finally T ⇒ ε.]

Note: T → ε does not “consume” the “+”, which is still on top.
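With the tail rules, one symbol of lookahead is enough. The sketch below is a recursive-descent rendering of rules (6) to (8) together with the digit rules, numbered (9) to (18) as on slide 8. The function names and the input string n100+n21 are our own, chosen for illustration. Each function returns the index of the next unconsumed character, and `out` collects the rule numbers applied.

```python
DIGITS = "0123456789"

def parse_I(s: str, i: int, out: list[int]) -> int:
    """I -> (6) D T. Returns the index just past the number."""
    if i >= len(s) or s[i] not in DIGITS:
        raise SyntaxError(f"expected a digit at position {i}")
    out.append(6)
    i = parse_D(s, i, out)
    return parse_T(s, i, out)

def parse_D(s: str, i: int, out: list[int]) -> int:
    """D -> (9) 0 | (10) 1 | ... | (18) 9: consume exactly one digit."""
    out.append(9 + int(s[i]))
    return i + 1

def parse_T(s: str, i: int, out: list[int]) -> int:
    """T -> (7) I | (8) epsilon, chosen by one symbol of lookahead."""
    if i < len(s) and s[i] in DIGITS:
        out.append(7)
        return parse_I(s, i, out)
    out.append(8)      # T -> epsilon consumes nothing, so s[i] stays next
    return i

out: list[int] = []
s = "n100+n21"
end = parse_I(s, 1, out)       # start just after the 'n'
print(out, repr(s[end]))       # [6, 10, 7, 6, 9, 7, 6, 9, 8] '+'
```

Parsing the 100 records 6 10 7 6 9 7 6 9 8 and stops with the + still unconsumed.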

Discussion #6 7/13

Tails

• We use tails for things that can go on indefinitely:
  – Numbers, e.g. 12, 123456, …
  – Parameter lists, e.g. (parm1, parm2, …, parmn)
  – Variable names, e.g. dog, doggone, …
• Note that T(ail) rules have special constructions:
  – FIRST(I) = FIRST(T) = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
  – FIRST(ε for T) = (VT ∪ {#}) − FIRST(T) = {+, *, n, #}
  – Note: FIRST(T) ∩ FIRST(ε for T) = ∅
  – Note also: FIRST(I) ∪ FIRST(ε for T) = VT ∪ {#}
  – Thus, our tail construction simply iterates until it reaches the end of the number. Further, it leaves the character one beyond the end unconsumed, as the next input symbol.
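The slide obtains the lookahead set for T → ε as a complement. The standard route, swapped in here, is FOLLOW(T): an ε-rule is predicted exactly when the next symbol can follow T. The sketch below is our own encoding of the revised grammar from slide 8, with ε-rules as empty lists. `first_of` and `first_and_follow` are hypothetical helper names. It computes FIRST and FOLLOW by fixed-point iteration.

```python
# Revised grammar: rule number -> (lhs, rhs); [] is the epsilon rule.
RULES = {
    1: ("E", ["N"]), 2: ("E", ["O", "E", "E"]),
    3: ("O", ["+"]), 4: ("O", ["*"]),
    5: ("N", ["n", "I"]),
    6: ("I", ["D", "T"]),
    7: ("T", ["I"]), 8: ("T", []),
    **{9 + d: ("D", [str(d)]) for d in range(10)},
}
NT = {lhs for lhs, _ in RULES.values()}
START, END, EPS = "E", "#", "ε"

def first_of(symbols, first):
    """FIRST of a symbol string; includes EPS if the whole string can vanish."""
    result = set()
    for x in symbols:
        fx = first[x] if x in NT else {x}
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}

def first_and_follow():
    first = {a: set() for a in NT}
    follow = {a: set() for a in NT}
    follow[START].add(END)
    changed = True
    while changed:                    # iterate until no set grows
        changed = False
        for lhs, rhs in RULES.values():
            f = first_of(rhs, first)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            for k, x in enumerate(rhs):
                if x not in NT:
                    continue
                # What can follow x here: FIRST of the rest, plus FOLLOW(lhs) if the rest can vanish.
                rest = first_of(rhs[k + 1:], first)
                add = (rest - {EPS}) | (follow[lhs] if EPS in rest else set())
                if not add <= follow[x]:
                    follow[x] |= add
                    changed = True
    return first, follow

first, follow = first_and_follow()
print(sorted(first_of(["I"], first)))  # the ten digits: lookahead for T -> I
print(sorted(follow["T"]))             # ['#', '*', '+', 'n']: lookahead for T -> epsilon
```

FOLLOW(T) comes out as {+, *, n, #}, matching the slide, and it is disjoint from FIRST(I), which is why one symbol of lookahead suffices for T.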

Discussion #6 8/13

E → (1) N | (2) O E E
O → (3) + | (4) *
N → (5) n I
I → (6) D T
T → (7) I | (8) ε
D → (9) 0 | (10) 1 | … | (18) 9

     +        *        0        1        2…       n        #
E    (OEE,2)  (OEE,2)                             (N,1)
O    (+,3)    (*,4)
N                                                 (nI,5)
I                      (DT,6)   (DT,6)   (DT,6)
T    (ε,8)    (ε,8)    (I,7)    (I,7)    (I,7)    (ε,8)    (ε,8)
D                      (0,9)    (1,10)   (2,11)…
+    pop
*             pop
0                      pop
1                               pop
2…                                       pop
n                                                 pop
#                                                          accept

Discussion #6 9/13


(The top of the stack is on the left; Input shows the part of the input not yet consumed.)

Action                                Stack  Input     Output
Initialize                            E#     +n10n1#
ACTION(E,+) = Replace [E,OEE], Out 2  OEE#   +n10n1#   2
ACTION(O,+) = Replace [O,+], Out 3    +EE#   +n10n1#   2 3
ACTION(+,+) = pop(+,+)                EE#    n10n1#    2 3
ACTION(E,n) = Replace [E,N], Out 1    NE#    n10n1#    2 3 1
ACTION(N,n) = Replace [N,nI], Out 5   nIE#   n10n1#    2 3 1 5
ACTION(n,n) = pop(n,n)                IE#    10n1#     2 3 1 5
ACTION(I,1) = Replace [I,DT], Out 6   DTE#   10n1#     2 3 1 5 6
ACTION(D,1) = Replace [D,1], Out 10   1TE#   10n1#     2 3 1 5 6 10
ACTION(1,1) = pop(1,1)                TE#    0n1#      2 3 1 5 6 10
ACTION(T,0) = Replace [T,I], Out 7    IE#    0n1#      2 3 1 5 6 10 7
ACTION(I,0) = Replace [I,DT], Out 6   DTE#   0n1#      2 3 1 5 6 10 7 6
ACTION(D,0) = Replace [D,0], Out 9    0TE#   0n1#      2 3 1 5 6 10 7 6 9

Discussion #6 10/13


Action                                Stack  Input     Output
Continued…                            0TE#   0n1#      2 3 1 5 6 10 7 6 9
ACTION(0,0) = pop(0,0)                TE#    n1#       2 3 1 5 6 10 7 6 9
ACTION(T,n) = Replace [T,ε], Out 8    E#     n1#       2 3 1 5 6 10 7 6 9 8
ACTION(E,n) = Replace [E,N], Out 1    N#     n1#       2 3 1 5 6 10 7 6 9 8 1
ACTION(N,n) = Replace [N,nI], Out 5   nI#    n1#       2 3 1 5 6 10 7 6 9 8 1 5
ACTION(n,n) = pop(n,n)                I#     1#        2 3 1 5 6 10 7 6 9 8 1 5
ACTION(I,1) = Replace [I,DT], Out 6   DT#    1#        2 3 1 5 6 10 7 6 9 8 1 5 6
ACTION(D,1) = Replace [D,1], Out 10   1T#    1#        2 3 1 5 6 10 7 6 9 8 1 5 6 10
ACTION(1,1) = pop(1,1)                T#     #         2 3 1 5 6 10 7 6 9 8 1 5 6 10
ACTION(T,#) = Replace [T,ε], Out 8    #      #         2 3 1 5 6 10 7 6 9 8 1 5 6 10 8
ACTION(#,#) = Accept!                 #      #         2 3 1 5 6 10 7 6 9 8 1 5 6 10 8

When T is replaced by ε, the ε is simply popped: ACTION(ε,n) = pop leaves E# after ACTION(T,n), and ACTION(ε,#) = pop leaves # after ACTION(T,#). Neither consumes the lookahead.
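The trace can be reproduced with a short table-driven driver. This is a sketch: the dictionary `TABLE` is our own encoding of the slide-8 table, `ll1_parse` is a hypothetical name, the stack is a Python list whose end is the top, and the input is a string ending in #.

```python
DIGITS = "0123456789"
NONTERMINALS = {"E", "O", "N", "I", "T", "D"}

# Slide-8 parse table: (nonterminal on top, lookahead) -> (right side, rule number).
TABLE = {("E", "+"): ("OEE", 2), ("E", "*"): ("OEE", 2), ("E", "n"): ("N", 1),
         ("O", "+"): ("+", 3), ("O", "*"): ("*", 4),
         ("N", "n"): ("nI", 5)}
for d in DIGITS:
    TABLE[("I", d)] = ("DT", 6)
    TABLE[("T", d)] = ("I", 7)
    TABLE[("D", d)] = (d, 9 + int(d))
for a in "+*n#":
    TABLE[("T", a)] = ("", 8)            # T -> epsilon: nothing is pushed

def ll1_parse(text: str) -> list[int]:
    """Return the rule numbers applied, in order, or raise SyntaxError."""
    if not text.endswith("#"):
        raise ValueError("input must end with the end marker #")
    stack = ["#", "E"]                   # the end of the list is the top of the stack
    pos, out = 0, []
    while True:
        top, look = stack[-1], text[pos]
        if top == "#" and look == "#":
            return out                   # ACTION(#,#) = accept
        if top in NONTERMINALS:
            if (top, look) not in TABLE:
                raise SyntaxError(f"no table entry for ({top}, {look}) at position {pos}")
            rhs, rule = TABLE[(top, look)]
            stack.pop()                  # Replace: pop the nonterminal...
            stack.extend(reversed(rhs))  # ...and push its right side, leftmost symbol on top
            out.append(rule)
        elif top == look:
            stack.pop()                  # ACTION(a,a) = pop: match the terminal
            pos += 1                     # and consume it from the input
        else:
            raise SyntaxError(f"expected {top!r} but saw {look!r} at position {pos}")

print(ll1_parse("+n10n1#"))  # [2, 3, 1, 5, 6, 10, 7, 6, 9, 8, 1, 5, 6, 10, 8]
```

Running it on +n10n1# yields the same fifteen rule numbers as the Output column. An input such as +n1#, with one operand missing, has no entry for (E, #) in this encoding and raises SyntaxError.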

Discussion #6 11/13


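2 3 1 5 6 10 7 6 9 8 1 5 6 10 8 is the parse for + n 1 0 n 1.

[Figure: parse tree. E ⇒ O E E (2); O ⇒ + (3). First E ⇒ N (1) ⇒ n I (5); I ⇒ D T (6), D ⇒ 1 (10), T ⇒ I (7), I ⇒ D T (6), D ⇒ 0 (9), T ⇒ ε (8). Second E ⇒ N (1) ⇒ n I (5); I ⇒ D T (6), D ⇒ 1 (10), T ⇒ ε (8).]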
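Because the driver always expands the leftmost nonterminal, the output sequence is a leftmost derivation, and the tree can be rebuilt from it alone. A small replay makes this concrete. It is a sketch under our own string encoding of sentential forms, and `replay` is a hypothetical helper.

```python
# Revised grammar: rule number -> (lhs, rhs); "" is the epsilon rule.
RULES = {1: ("E", "N"), 2: ("E", "OEE"), 3: ("O", "+"), 4: ("O", "*"),
         5: ("N", "nI"), 6: ("I", "DT"), 7: ("T", "I"), 8: ("T", "")}
RULES.update({9 + d: ("D", str(d)) for d in range(10)})
NONTERMINALS = set("EONITD")

def replay(rules: list[int], start: str = "E") -> list[str]:
    """Apply each rule to the leftmost nonterminal; return every sentential form."""
    form, forms = start, [start]
    for r in rules:
        lhs, rhs = RULES[r]
        i = next((k for k, x in enumerate(form) if x in NONTERMINALS), None)
        if i is None or form[i] != lhs:
            raise ValueError(f"rule {r} cannot expand the leftmost nonterminal of {form!r}")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms

forms = replay([2, 3, 1, 5, 6, 10, 7, 6, 9, 8, 1, 5, 6, 10, 8])
print(" => ".join(forms))
print(forms[-1])  # +n10n1
```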

Discussion #6 12/13


Lexical Analyzer Motivation

[Figure: the parse tree for + n 1 0 n 1 (rules 2 3 1 5 6 10 7 6 9 8 1 5 6 10 8), repeated from the previous slide.]

Discussion #6 13/13

Tokenization Simplifies Grammars

E → (1) N | (2) O E E
O → (3) + | (4) *
N → (5) <number>

2 3 1 5 1 5 becomes the parse for +n10n1, where the tokens are +, n10, and n1.

[Figure: parse tree. E ⇒ O E E (2); O ⇒ + (3); each E ⇒ N (1) ⇒ <number> (5), with leaves n10 and n1.]
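The same split can be sketched in code. This is a hypothetical pairing, not the course’s lexer: a tokenizer built on Python’s standard `re` module that recognizes +, *, and an n followed by digits, and a recursive-descent parser for the five-rule grammar above that emits rule numbers in the same leftmost order as before.

```python
import re

# Hypothetical token pattern: an operator, or 'n' followed by one or more digits.
TOKEN = re.compile(r"[+*]|n[0-9]+")

def tokenize(text: str) -> list[str]:
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)       # anchored match starting at pos
        if m is None:
            raise SyntaxError(f"bad character {text[pos]!r} at position {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens

def parse_E(tokens: list[str], i: int, out: list[int]) -> int:
    """E -> (1) N | (2) O E E; O -> (3) + | (4) *; N -> (5) <number>."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[i]
    if tok in ("+", "*"):
        out.extend([2, 3 if tok == "+" else 4])
        i = parse_E(tokens, i + 1, out)
        return parse_E(tokens, i, out)
    out.extend([1, 5])    # a whole number is one token, so no digit rules are needed
    return i + 1

tokens = tokenize("+n10n1")
out: list[int] = []
end = parse_E(tokens, 0, out)
if end != len(tokens):
    raise SyntaxError("trailing tokens")
print(tokens, out)   # ['+', 'n10', 'n1'] [2, 3, 1, 5, 1, 5]
```

The tokenizer absorbs all the digit-level work, so the parse for +n10n1 shrinks from fifteen rule applications to six.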
