103
1 Syntax Analysis

Syntax Analysis - ecourse2.ccu.edu.tw

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Syntax Analysis - ecourse2.ccu.edu.tw

1

Syntax Analysis

Page 2: Syntax Analysis - ecourse2.ccu.edu.tw

2

Syntax Analysis

Introduction to parsers

Context-free grammars

Push-down automata

Top-down parsing

Buttom-up parsing

Bison - a parser generator

Page 3: Syntax Analysis - ecourse2.ccu.edu.tw

3

Introduction to parsers

Lexical

AnalyzerParser

Symbol

Table

token

next token

source Semantic

Analyzersyntax

treecode

Page 4: Syntax Analysis - ecourse2.ccu.edu.tw

4

Context-Free Grammars

A set of terminals: basic symbols from

which sentences are formed

A set of nonterminals: syntactic

categories denoting sets of sentences

A set of productions: rules specifying

how the terminals and nonterminals can

be combined to form sentences

The start symbol: a distinguished

nonterminal denoting the language

Page 5: Syntax Analysis - ecourse2.ccu.edu.tw

5

An Example

Terminals: id, ‘+’, ‘-’, ‘*’, ‘/’, ‘(’, ‘)’

Nonterminals: expr, op

Productions:

expr expr op expr

expr ‘(’ expr ‘)’

expr ‘-’ expr

expr id

op ‘+’ | ‘-’ | ‘*’ | ‘/’

The start symbol: expr

Page 6: Syntax Analysis - ecourse2.ccu.edu.tw

6

Derivations

A derivation step is an application of a

production as a rewriting rule

E - E

A sequence of derivation steps

E - E - ( E ) - ( id )

is called a derivation of “- ( id )” from E

The symbol * denotes “derives in zero or

more steps”; the symbol + denotes “derives

in one or more steps

E * - ( id ) E + - ( id )

Page 7: Syntax Analysis - ecourse2.ccu.edu.tw

7

Context-Free Languages

A context-free language L(G) is the language

defined by a context-free grammar G

A string of terminals is in L(G) if and only

if S + , is called a sentence of G

If S * , where may contain nonterminals,

then we call a sentential form of G

E - E - ( E ) - ( id )

G1 is equivalent to G2 if L(G1) = L(G2)

Page 8: Syntax Analysis - ecourse2.ccu.edu.tw

8

Left- & Right-most Derivations

Each derivation step needs to choose

– a nonterminal to rewrite

– a production to apply

A leftmost derivation always chooses the

leftmost nonterminal to rewrite

E lm - E lm - ( E ) lm - ( E + E )

lm - ( id + E ) lm - ( id + id )

A rightmost derivation always chooses the

rightmost nonterminal to rewrite

E rm - E rm - ( E ) rm - ( E + E )

rm - (E + id ) rm - ( id + id )

Page 9: Syntax Analysis - ecourse2.ccu.edu.tw

9

Parse Trees

A parse tree is a graphical representation

for a derivation that filters out the order of

choosing nonterminals for rewriting

Many derivations may correspond to the

same parse tree, but every parse tree has

associated with it a unique leftmost and a

unique rightmost derivation

Page 10: Syntax Analysis - ecourse2.ccu.edu.tw

10

An Example

E

-

( )

+

id id

E

E E

EE

lm - E

lm - ( E )

lm - ( E + E )

lm - ( id + E )

lm - ( id + id )

E

rm - E

rm - ( E )

rm - ( E + E )

rm - ( E + id )

rm - ( id + id )

Page 11: Syntax Analysis - ecourse2.ccu.edu.tw

11

Ambiguous Grammar

A grammar is ambiguous if it produces

more than one parse tree for some sentence

E

E + E

id + E

id + E * E

id + id * E

id + id * id

E

E * E

E + E * E

id + E * E

id + id * E

id + id * id

Page 12: Syntax Analysis - ecourse2.ccu.edu.tw

12

Ambiguous Grammar

E

+E E

id

id

*E E

id

E

*E E

id

id

+E E

id

Page 13: Syntax Analysis - ecourse2.ccu.edu.tw

13

Resolving Ambiguity

Use disambiguiting rules to throw away

undesirable parse trees

Rewrite grammars by incorporating

disambiguiting rules into grammars

Page 14: Syntax Analysis - ecourse2.ccu.edu.tw

14

An Example

The dangling-else grammar

stmt if expr then stmt

| if expr then stmt else stmt

| other

Two parse trees for

if E1 then if E2 then S1 else S2

Page 15: Syntax Analysis - ecourse2.ccu.edu.tw

15

An Example

S

elseE S Sif then

if E then S

elseE

S

S Sif then

if E then S

Page 16: Syntax Analysis - ecourse2.ccu.edu.tw

16

Disambiguiting Rules

Rule: match each else with the closest

previous unmatched then

Remove undesired state transitions in the

pushdown automaton

Page 17: Syntax Analysis - ecourse2.ccu.edu.tw

17

Grammar Rewriting

stmt m_stmt

| unm_stmt

m_stmt if expr then m_stmt else m_stmt

| other

unm_stmt if expr then stmt

| if expr then m_stmt else unm_stmt

Page 18: Syntax Analysis - ecourse2.ccu.edu.tw

18

RE vs. CFG

Every language described by a RE can also

be described by a CFG

Why use REs for lexical syntax?

– do not need a notation as powerful as CFGs

– are more concise and easier to understand than

CFGs

– More efficient lexical analyzers can be

constructed from REs than from CFGs

– Provide a way for modularizing the front end

into two manageable-sized components

Page 19: Syntax Analysis - ecourse2.ccu.edu.tw

19

Push-Down Automata

Finite Automaton

Input

OutputStack

$

$

Page 20: Syntax Analysis - ecourse2.ccu.edu.tw

20

An Example

S’ S $

S a S b

S

1 2 3start (a, $)

a

(b, a)

a

($, $)

(a, a)

a

(b, a)

a

0

($, $)

Page 21: Syntax Analysis - ecourse2.ccu.edu.tw

21

Nonregular Constructs

REs can denote only a fixed number of

repetitions or an unspecified number of

repetitions of one given construct:

an, a*

A nonregular construct:

– L = {anbn | n 0}

Page 22: Syntax Analysis - ecourse2.ccu.edu.tw

22

Non-Context-Free Constructs

CFGs can denote only a fixed number of

repetitions or an unspecified number of

repetitions of one or two given constructs

Some non-context-free constructs:

– L1 = {wcw | w is in (a | b)*}

– L2 = {anbmcndm | n 1 and m 1}

– L3 = {anbncn | n 0}

Page 23: Syntax Analysis - ecourse2.ccu.edu.tw

共勉

子貢曰︰貧而無諂,富而無驕,何如。

子曰︰可也,未若貧而樂,富而好禮者也。

子貢曰︰詩云︰「如切如磋,如琢如磨。」

其斯之謂與。

子曰︰賜也,始可與言詩已矣;

告諸往而知來者。

--論語

Page 24: Syntax Analysis - ecourse2.ccu.edu.tw

24

Top-Down Parsing

Construct a parse tree from the root to the

leaves using leftmost derivation

1. S c A B input: cad

2. A a b

3. A a

4. B d

S S

c A B

1S

c A B

a b

2S

c A B

a d

4S

c A B

a

3

backtrack

Page 25: Syntax Analysis - ecourse2.ccu.edu.tw

25

Predictive Parsing

A top-down parsing without backtracking

– there is only one alternative production to

choose at each derivation step

stmt if expr then stmt else stmt

| while expr do stmt

| begin stmt_list end

Page 26: Syntax Analysis - ecourse2.ccu.edu.tw

26

LL(k) Parsing

The first L stands for scanning the input

from left to right

The second L stands for producing a

leftmost derivation

The k stands for the number of lookahead

input symbols used to choose alternative

productions at each derivation step

Page 27: Syntax Analysis - ecourse2.ccu.edu.tw

27

LL(1) Parsing

Use one input symbol of lookahead

LL(1): S a b e | c d e

LL(2): S a b e | a d e

Page 28: Syntax Analysis - ecourse2.ccu.edu.tw

28

Bottom-Up Parsing

Construct a parse tree from the leaves to the

root using rightmost derivation in reverse

S a A B e input: abbcde

A A b c | b

B d

ca d eb

A

b

A

ca d eb

A

b

BA

ca d eb

A

b

S

BA

ca d eb

A

bca d ebb

abbcde rm aAbcde rm aAde rm aABe rm S

Page 29: Syntax Analysis - ecourse2.ccu.edu.tw

29

Handles

A handle of a right-sentential form consists of

– a production A

– a position of where can be replaced by A to

produce the previous right-sentential form in a

rightmost derivation of

abbcde rm aAbcde rm aAde rm aABe rm S

A b A A b c B d S a A B e

Page 30: Syntax Analysis - ecourse2.ccu.edu.tw

30

Handle Pruning

rm A rm S

S

A

The string to the right of the handle contains only

terminals

A is the bottommost leftmost interior node with all

its children in the tree

Page 31: Syntax Analysis - ecourse2.ccu.edu.tw

31

An Example

S

S

BA

ca d eb

A

b

S

BA

ca d eb

A

S

BA

a d e

S

BA

a e

Page 32: Syntax Analysis - ecourse2.ccu.edu.tw

32

Shift-Reduce Parsing

Parsing driver

Parsing table

Input

Output

Stack

Handle

$

$

Page 33: Syntax Analysis - ecourse2.ccu.edu.tw

33

Stack Operations

Shift: shift the next input symbol onto the

top of the stack

Reduce: replace the handle at the top of the

stack with the corresponding nonterminal

Accept: announce successful completion of

the parsing

Error: call an error recovery routine

Page 34: Syntax Analysis - ecourse2.ccu.edu.tw

34

An Example

Action Stack Input

S $ a b b c d e $

S $ a b b c d e $

R $ a b b c d e $

S $ a A b c d e $

S $ a A b c d e $

R $ a A b c d e $

S $ a A d e $

R $ a A d e $

S $ a A B e $

R $ a A B e $

A $ S $

Page 35: Syntax Analysis - ecourse2.ccu.edu.tw

35

Shift/Reduce Conflict

stmt if expr then stmt

| if expr then stmt else stmt

| other

Stack Input

$ - - - if expr then stmt else - - - $

Shift if expr then stmt else stmt

Reduce if expr then stmt

Page 36: Syntax Analysis - ecourse2.ccu.edu.tw

36

Reduce/Reduce Conflict

stmt id ( para_list )

stmt expr := expr

para_list para_list , para

para_list para

para id

expr id ( expr_list )

expr id

expr_list expr_list , expr

expr_list expr

Stack Input

$ - - - id ( id , id ) - - - $

$- - - procid ( id , id ) - - - $

Page 37: Syntax Analysis - ecourse2.ccu.edu.tw

37

LR(k) Parsing

The L stands for scanning the input from

left to right

The R stands for constructing a rightmost

derivation in reverse

The k stands for the number of lookahead

input symbols used to make parsing

decisions

Page 38: Syntax Analysis - ecourse2.ccu.edu.tw

38

LR Parsing

The LR parsing algorithm

Constructing SLR(1) parsing tables

Constructing LR(1) parsing tables

Constructing LALR(1) parsing tables

Page 39: Syntax Analysis - ecourse2.ccu.edu.tw

39

Model of an LR Parser

Parsing driver

Input

Output

Stack

Action Goto

Sm

Sm-1

Xm-1

Xm

S0Parsing table

$

$

Page 40: Syntax Analysis - ecourse2.ccu.edu.tw

40

An Example

(1) E E + T

(2) E T

(3) T T * F

(4) T F

(5) F ( E )

(6) F id

State Action Goto

id + * ( ) $ E T F

0 s5 s4 1 2 3

1 s6 acc

2 r2 s7 r2 r2

3 r4 r4 r4 r4

4 s5 s4 8 2 3

5 r6 r6 r6 r6

6 s5 s4 9 3

7 s5 s4 10

8 s6 s11

9 r1 s7 r1 r1

10 r3 r3 r3 r3

11 r5 r5 r5 r5

Page 41: Syntax Analysis - ecourse2.ccu.edu.tw

41

An Example

Action Stack Input

s5 $0 id + id * id $

r6 $0 id5 + id * id $

r4 $0 F3 + id * id $

r2 $0 T2 + id * id $

s6 $0 E1 + id * id $

s5 $0 E1 +6 id * id $

r6 $0 E1 +6 id5 * id $

r4 $0 E1 +6 F3 * id $

s7 $0 E1 +6 T9 * id $

s5 $0 E1 +6 T9 *7 id $

r6 $0 E1 +6 T9 *7 id5 $

r3 $0 E1 +6 T9 *7 F10 $

r1 $0 E1 +6 T9 $

acc $0 E1 $

Page 42: Syntax Analysis - ecourse2.ccu.edu.tw

42

LR Parsing Driver

push $s0 onto the stack, where s0 is the initial state

set ip to point to the first symbol of w$;

repeat

let s be the top state on the stack and a the symbol pointed to by ip;

if action[s, a] == shift s’ then

push a and s’ onto the stack and advance ip

else if action[s, a] == reduce A then

pop 2 * | | symbols off the stack; s’ = goto[top(), A];

push A and s’ onto the stack

else if action[s, a] == accept then

return

else error

until false

Page 43: Syntax Analysis - ecourse2.ccu.edu.tw

43

First Sets

The first set of a string is the set of

terminals that begin the strings derived from

. If * , then is also in the first set of

.

Page 44: Syntax Analysis - ecourse2.ccu.edu.tw

44

First Sets

If X is terminal, then FIRST(X) is {X}

If X is nonterminal and X is a

production, then add to FIRST(X)

If X is nonterminal and X Y1 Y2 ... Yk is

a production, then add a to FIRST(X) if

for some i, a is in FIRST(Yi) and is in all

of FIRST(Y1), ..., FIRST(Yi-1). If is in

FIRST(Yj) for all j, then add to FIRST(X)

Page 45: Syntax Analysis - ecourse2.ccu.edu.tw

45

An Example

E T E'

E' + T E' |

T F T'

T' * F T' |

F ( E ) | id

FIRST(F) = { (, id }

FIRST(T') = { *, }, FIRST(T) = { (, id }

FIRST(E') = { +, }, FIRST(E) = { (, id }

Page 46: Syntax Analysis - ecourse2.ccu.edu.tw

46

Follow Sets

The follow set of a nonterminal A is the set of

terminals that can appear immediately to the

right of A in some sentential form, namely,

S * A a

a is in the follow set of A.

Page 47: Syntax Analysis - ecourse2.ccu.edu.tw

47

Follow Sets

Place $ in FOLLOW(S), where S is the start

symbol and $ is the input right endmarker

If there is a production A B , then

everything in FIRST() except for is placed

in FOLLOW(B)

If there is a production A B or A B

where FIRST() contains , then everything

in FOLLOW(A) is in FOLLOW(B)

Page 48: Syntax Analysis - ecourse2.ccu.edu.tw

48

An Example

E T E'

E' + T E' |

T F T'

T' * F T' |

F ( E ) | id

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }

FIRST(E') = { +, }, FIRST(T') = { *, }

FOLLOW(E) = { ), $ }, FOLLOW(E') = { ), $ }

FOLLOW(T) = { +, ), $ }, FOLLOW(T') = { +, ), $ }

FOLLOW(F) = { *, +, ), $ }

Page 49: Syntax Analysis - ecourse2.ccu.edu.tw

49

SLR(1) Parsing Table

LR(0) Items

• An LR(0) item of a grammar in G is a production of G with a dot at some position of the right-hand side, A

• An LR(0) item represents a state in an NPDA indicating how much of a production we have seen at a given point in the parsing process

• The production A X Y Z yields the following four LR(0) items

A • X Y Z, A X • Y Z, A X Y • Z, A X Y Z •

Page 50: Syntax Analysis - ecourse2.ccu.edu.tw

50

From CFG to NPDA

• The state A a will go to the state A

a via an edge of terminal a (a shifting)

• The state A B will go to the state B

via an edge of the empty string

• The state A will cause a reduction on

seeing a terminal in FOLLOW(A)

• The state A B will go to the state A

B via an edge of nonterminal B (after a

reduction)

Page 51: Syntax Analysis - ecourse2.ccu.edu.tw

51

An Example

1. E’ E

2. E E + T

3. E T

4. T T * F

5. T F

6. F ( E )

7. F id

Augmented grammar:Easier to identify

the accepting state

Page 52: Syntax Analysis - ecourse2.ccu.edu.tw

52

An Example

E’•E

0

E

E’E•

7

T

ET•

9

F

TF•

11

E•E+T

1

E•T

2

T•T*F

3

T•F

4

EE•+T

8E

EE+•T

14+

17

EE+T•T

F•(E)

5

F•id

6 Fid•

13id

F(•E)

12(

TT•*F

10

TTT*•F*

15

TT*F•F

18

E F(E•)

16)

F(E)•

19

5

6

1

2

3

4

Page 53: Syntax Analysis - ecourse2.ccu.edu.tw

53

From NPDA to DPDA

• There are two functions performed on sets

of LR(0) items (DPDA states)

• The function closure(I) adds more items to I

when there is a dot to the left of a

nonterminal (corresponding to edges)

• The function goto(I, X) moves the dot past

the symbol X in all items in I that contain X

(corresponding to non- edges)

Page 54: Syntax Analysis - ecourse2.ccu.edu.tw

54

The Closure Function

function closure(I);

begin

J := I;

repeat

for each item A B in J and

each production B of G such that

B is not in J do

J = J { B }

until no more items can be added to J;

return J

end

Page 55: Syntax Analysis - ecourse2.ccu.edu.tw

55

An Example

1. E’ E

2. E E + T

3. E T

4. T T * F

5. T F

6. F ( E )

7. F id

s0 = E’ E,

I0 = closure({s0 }) =

{ E’ E,

E E + T,

E T,

T T * F,

T F,

F ( E ),

F id }

Page 56: Syntax Analysis - ecourse2.ccu.edu.tw

56

The Goto Function

function goto(I, X);

begin

set J to the empty set

for any item A X in I do

add A X to J

return closure(J)

end

Page 57: Syntax Analysis - ecourse2.ccu.edu.tw

57

An Example

I0 = {E’ E, E E + T, E T,

T T * F, T F, F ( E ),

F id }

goto(I0 , E)

= closure({E’ E , E E + T })

= {E’ E , E E + T }

Page 58: Syntax Analysis - ecourse2.ccu.edu.tw

58

Subset Construction

function items(G’);

begin

C := {closure({S’ S})}

repeat

for each set of items I in C and each symbol X do

J := goto(I, X)

if J is not empty and not in C then

C = C { J }

until no more sets of items can be added to C

return C

end

Page 59: Syntax Analysis - ecourse2.ccu.edu.tw

59

An Example

1. E’ E

2. E E + T

3. E T

4. T T * F

5. T F

6. F ( E )

7. F id

Page 60: Syntax Analysis - ecourse2.ccu.edu.tw

60

I0 : E’ E

E E + T

E T

T T * F

T F

F ( E )

F id

goto(I0, E) =

I1 : E’ E

E E + T

goto(I0, T) =

I2 : E T

T T * F

goto(I0, F) =

I3 : T F

goto(I0, ‘(’) =

I4 : F ( E )

E E + T

E T

T T * F

T F

F ( E )

F id

goto(I0, id) =

I5 : F id

goto(I1, ‘+’) =

I6 : E E + T

T T * F

T F

F ( E )

F id

goto(I2, ‘*’) =

I7 : T T * F

F ( E )

F id

goto(I4, E) =

I8 : F ( E )

E E + T

goto(I6, T) =

I9 : E E + T

T T * F

goto(I7, F) =

I10 : T T * F

goto(I8, ‘)’) =

I11 : F ( E )

Page 61: Syntax Analysis - ecourse2.ccu.edu.tw

61

An Example

E’ • E

E • E + T

E • T

T • T * F

T • F

F • ( E )

F • id

E’ E •

E E • + T

E T •

T T • * F

E T

T F •

F

F ( • E )

E • E + T

E • T

T • T * F

T • F

F • ( E )

F • id

F id •id

(

T T * • F

F • ( E )

F • id*

E E + • T

T • T * F

T • F

F • ( E )

F • id

+

F ( E • )

E E • + T

F T T * F •

E E + T •

T T • * FT

F ( E ) •

)

0

1

2

3

4

5

6

7

8

9

10

11

(id *

+

id

ET

F

F(

(id

Page 62: Syntax Analysis - ecourse2.ccu.edu.tw

62

SLR(1) Parsing Table Generation

procedure SLR(G’);

begin

for each state I in items(G’) do begin

if A a in I and goto(I, a) = J for a terminal a then

action[I, a] = “shift J”

if A in I and A S’ then

action[I, a] = “reduce A ” for all a in Follow(A)

if S’ S in I then action[I, $] = “accept”

if A X in I and goto(I, X) = J for a nonterminal X

then goto[I, X] = J

end

all other entries in action and goto are made error

end

Page 63: Syntax Analysis - ecourse2.ccu.edu.tw

63

An Example

+ * ( ) id $ E T F

0 s4 s5 1 2 3

1 s6 a

2 r3 s7 r3 r3

3 r5 r5 r5 r5

4 s4 s5 8 2 3

5 r7 r7 r7 r7

6 s4 s5 9 3

7 s4 s5 10

8 s6 s11

9 r2 s7 r2 r2

10 r4 r4 r4 r4

11 r6 r6 r6 r6

Page 64: Syntax Analysis - ecourse2.ccu.edu.tw

64

共勉

大學之道︰

在明明德,在親民,在止於至善。

--大學

Page 65: Syntax Analysis - ecourse2.ccu.edu.tw

65

LR(1) Parsing Table

LR(1) Items

• An LR(1) item of a grammar in G is a pair,

( A , a ), of an LR(0) item A and a lookahead symbol a

• The lookahead has no effect in an LR(1) item

of the form ( A , a ), where is not

• An LR(1) item of the form ( A , a )

calls for a reduction by A only if the next

input symbol is a

Page 66: Syntax Analysis - ecourse2.ccu.edu.tw

66

The Closure Function

function closure(I);

begin

J := I;

repeat

for each item (A B , a) in J and

each production B of G and

each b FIRST( a) such that

(B , b) is not in J do

J = J { (B , b) }

until no more items can be added to J;

return J

end

Page 67: Syntax Analysis - ecourse2.ccu.edu.tw

67

The Goto Function

function goto(I, X);

begin

set J to the empty set

for any item (A X , a) in I do

add (A X , a) to J

return closure(J)

end

Page 68: Syntax Analysis - ecourse2.ccu.edu.tw

68

Subset Construction

function items(G’);

begin

C := {closure({S’ S, $})}

repeat

for each set of items I in C and each symbol X do

J := goto(I, X)

if J is not empty and not in C then

C = C { J }

until no more sets of items can be added to C

return C

end

Page 69: Syntax Analysis - ecourse2.ccu.edu.tw

69

An Example

1. S’ S

2. S C C

3. C c C

4. C d

Page 70: Syntax Analysis - ecourse2.ccu.edu.tw

70

An Example

I0: closure({(S’ S, $)}) =

(S’ S, $)

(S C C, $)

(C c C, c/d)

(C d, c/d)

I1: goto(I0, S) = (S’ S , $)

I2: goto(I0, C) =

(S C C, $)

(C c C, $)

(C d, $)

I3: goto(I0, c) =

(C c C, c/d)

(C c C, c/d)

(C d, c/d)

I4: goto(I0, d) =

(C d , c/d)

I5: goto(I2, C) =

(S C C , $)

Page 71: Syntax Analysis - ecourse2.ccu.edu.tw

71

An Example

I6: goto(I2, c) =

(C c C, $)

(C c C, $)

(C d, $)

I7: goto(I2, d) =

(C d , $)

I8: goto(I3, C) =

(C c C , c/d)

: goto(I3, c) = I3

: goto(I3, d) = I4

I9: goto(I6, C) =

(C c C , $)

: goto(I6, c) = I6

: goto(I6, d) = I7

Page 72: Syntax Analysis - ecourse2.ccu.edu.tw

72

LR(1) Parsing Table Generation

procedure LR(G’);

begin

for each state I in items(G’) do begin

if (A a , b) in I and goto(I, a) = J for a terminal a then

action[I, a] = “shift J”

if (A , a) in I and A S’ then

action[I, a] = “reduce A ”

if (S’ S , $) in I then action[I, $] = “accept”

if (A X , a) in I and goto(I, X) = J for a nonterminal X

then goto[I, X] = J

end

all other entries in action and goto are made error

end

Page 73: Syntax Analysis - ecourse2.ccu.edu.tw

73

An Example

c d $ S C

0 s3 s4 1 2

1 a

2 s6 s7 5

3 s3 s4 8

4 r4 r4

5 r2

6 s6 s7 9

7 r4

8 r3 r3

9 r3

Page 74: Syntax Analysis - ecourse2.ccu.edu.tw

74

LALR(1) Parsing Table

The Core of LR(1) Items

• The core of a set of LR(1) items is the set of

their first components (i.e., LR(0) items)

• The core of the set of LR(1) items

{ (C c C, c/d),

(C c C, c/d),

(C d, c/d) }

is {C c C,

C c C,

C d }

Page 75: Syntax Analysis - ecourse2.ccu.edu.tw

75

Merging Cores

I3: { (C c C, c/d)

(C c C, c/d)

(C d, c/d) }

I4: { (C d , c/d) }

I8: { (C c C , c/d) }

I6: { (C c C, $)

(C c C, $)

(C d, $) }

I7: { (C d , $) }

I9: { (C c C , $) }

Page 76: Syntax Analysis - ecourse2.ccu.edu.tw

76

LALR(1) Parsing Table Generation

procedure LALR(G’);

begin

for each state I in mergeCore(items(G’)) do begin

if (A a , b) in I and goto(I, a) = J for a terminal a then

action[I, a] = “shift J”

if (A , a) in I and A S’ then

action[I, a] = “reduce A ”

if (S’ S , $) in I then action[I, $] = “accept”

if (A X , a) in I and goto(I, X) = J for a nonterminal X

then goto[I, X] = J

end

all other entries in action and goto are made error

end

Page 77: Syntax Analysis - ecourse2.ccu.edu.tw

77

An Example

c d $ S C

0 s36 s47 1 2

1 a

2 s36 s47 5

36 s36 s47 89

47 r4 r4 r4

5 r2

89 r3 r3 r3

Page 78: Syntax Analysis - ecourse2.ccu.edu.tw

78

LR Grammars

• A grammar is SLR(1) iff its SLR(1) parsing

table has no multiply-defined entries

• A grammar is LR(1) iff its LR(1) parsing

table has no multiply-defined entries

• A grammar is LALR(1) iff its LALR(1)

parsing table has no multiply-defined

entries

Page 79: Syntax Analysis - ecourse2.ccu.edu.tw

79

Hierarchy of Grammar Classes

Unambiguous Grammars Ambiguous Grammars

LL(k) LR(k)

LR(1)

LALR(1)

LL(1) SLR(1)

Page 80: Syntax Analysis - ecourse2.ccu.edu.tw

80

Hierarchy of Grammar Classes

• Why LL(k) LR(k)?

• Why SLR(k) LALR(k) LR(k)?

Page 81: Syntax Analysis - ecourse2.ccu.edu.tw

81

LL(k) vs. LR(k)

• For a grammar to be LL(k), we must be able

to recognize the use of a production by

seeing only the first k symbols of what its

right-hand side derives

• For a grammar to be LR(k), we must be able

to recognize the use of a production by

having seen all of what is derived from its

right-hand side with k more symbols of

lookahead

Page 82: Syntax Analysis - ecourse2.ccu.edu.tw

82

LALR(k) vs. LR(k)

• The merge of the sets of LR(1) items having the

same core does not introduce shift/reduce conflicts

• Suppose there is a shift-reduce conflict on

lookahead a in the merged set because of

1. (A , a) 2. (B a , b)

• Then some set of items has item (A , a) , and

since the cores of all sets merged are the same, it

must have an item (B a , c) for some c

• But then this set has the same shift/reduce conflict

on a

Page 83: Syntax Analysis - ecourse2.ccu.edu.tw

83

LALR(k) vs. LR(k)

• The merge of the sets of LR(1) items having the same core may introduce reduce/reduce conflicts

• As an example, consider the grammar1. S’ S2. S a A d | a B e | b A e | b B d3. A c4. B c

that generates acd, ace, bce, bcd

• The set {(A c , d), (B c , e)} is valid for acx

• The set {(A c , e), (B c , d)} is valid for bcx

• But the union {(A c , d/e), (B c , d/e)} generates a reduce/reduce conflict

Page 84: Syntax Analysis - ecourse2.ccu.edu.tw

84

SLR(k) vs. LALR(k)

1. S’ S

2. S L = R

3. S R

4. L * R

5. L id

6. R L

Page 85: Syntax Analysis - ecourse2.ccu.edu.tw

85

SLR(k) vs. LALR(k)

I0: closure({S’ S}) =

S’ S

S L = R

S R

L * R

L id

R L

I1: goto(I0, S) = S’ S

I2: goto(I0, L) =

S L = R

R L

I3: goto(I0, R) =

S R

I4: goto(I0, *) =

L * R

R L

L * R

L id

I5: goto(I0, id) =

L id

FOLLOW(R) = {=, $}

Page 86: Syntax Analysis - ecourse2.ccu.edu.tw

86

SLR(k) vs. LALR(k)

I6: goto(I2, =) =

S L = R

R L

L * R

L id

I7: goto(I4, R) =

L * R

I8: goto(I4, L) =

R L

I9: goto(I6, R) =

S L = R

Page 87: Syntax Analysis - ecourse2.ccu.edu.tw

87

SLR(k) vs. LALR(k)

I0: closure({(S’ S, $)}) =

(S’ S, $)

(S L = R, $)

(S R, $)

(L * R, =/$)

(L id, =/$)

(R L, $)

I1: goto(I0, S) = (S’ S , $)

I2: goto(I0, L) =

(S L = R, $)

(R L , $)

I3: goto(I0, R) =

(S R , $)

I4: goto(I0, *) =

(L * R, =/$)

(R L, =/$)

(L * R, =/$)

(L id, =/$)

I5: goto(I0, id) =

(L id , =/$)

Page 88: Syntax Analysis - ecourse2.ccu.edu.tw

88

SLR(k) vs. LALR(k)

I6: goto(I2, =) =

(S L = R, $)

(R L, $)

(L * R, $)

(L id, $)

I7: goto(I4, R) =

(L * R , =/$)

I8: goto(I4, L) =

(R L , =/$)

I9: goto(I6, R) =

(S L = R , $)

I10: goto(I6, L) =

(R L , $)

I11: goto(I6, *) =

(L * R, $)

(R L, $)

(L * R, $)

(L id, $)

I12: goto(I6, id) =

(L id , $)

I13: goto(I11, R) =

(L * R , $)

I4

I5

Page 89: Syntax Analysis - ecourse2.ccu.edu.tw

89

Bison – A Parser Generator

Bison compiler

C compiler

a.out

lang.ylang.tab.c

lang.tab.h (-d option)

lang.tab.c a.out

tokens syntax tree

A langauge for specifying parsers and semantic analyzers

Page 90: Syntax Analysis - ecourse2.ccu.edu.tw

90

Bison Programs

%{

C declarations

%}

Bison declarations

%%

Grammar rules

%%

Additional C code

Page 91: Syntax Analysis - ecourse2.ccu.edu.tw

91

An Example

line expr ‘\n’

expr expr ‘+’ term | term

term term ‘*’ factor | factor

factor ‘(’ expr ‘)’ | DIGIT

Page 92: Syntax Analysis - ecourse2.ccu.edu.tw

92

An Example

%token DIGIT

%start line

%%

line : expr ‘\n’ {printf(“line: expr \\n\n”);}

;

expr: expr ‘+’ term {printf(“expr: expr + term\n”);}

| term {printf(“expr: term\n”}

;

term: term ‘*’ factor {printf(“term: term * factor\n”;}

| factor {printf(“term: factor\n”);}

;

factor: ‘(’ expr ‘)’ {printf(“factor: ( expr )\n”);}

| DIGIT {printf(“factor: DIGIT\n”);}

;

Page 93: Syntax Analysis - ecourse2.ccu.edu.tw

93

Functions and Variables

• yyparse(): the parser function

• yylex(): the lexical analyzer function. Bison recognizes any non-positive value as indicating the end of the input

• yylval: the attribute value of a token. Its default type is int, and can be declared to be multiple types in the first section using

%union {int ival;double dval;

}

Page 94: Syntax Analysis - ecourse2.ccu.edu.tw

94

Conflict Resolutions

• A reduce/reduce conflict is resolved by

choosing the production listed first

• A shift/reduce conflict is resolved in favor

of shift

• A mechanism for assigning precedences and

assocoativities to terminals

Page 95: Syntax Analysis - ecourse2.ccu.edu.tw

95

Precedence and Associativity

• The precedence and associativity of operators are

declared simultaneously

%nonassoc ‘<’ /* lowest */

%left ‘+’ ‘-’

%right ‘^’ /* highest */

• The precedence of a rule is determined by the

precedence of its rightmost terminal

• The precedence of a rule can be modified by

adding %prec <terminal> to its right end

Page 96: Syntax Analysis - ecourse2.ccu.edu.tw

96

An Example

%{

#include <stdio.h>

%}

%token NUMBER

%left ‘+’ ‘-’

%left ‘*’ ‘/’

%right UMINUS

%%

Page 97: Syntax Analysis - ecourse2.ccu.edu.tw

97

An Example

line : expr ‘\n’

;

expr: expr ‘+’ expr

| expr ‘-’ expr

| expr ‘*’ expr

| expr ‘/’ expr

| ‘-’ expr %prec UMINUS

| ‘(’ expr ‘)’

| NUMBER

;

Page 98: Syntax Analysis - ecourse2.ccu.edu.tw

98

Error Recovery

• Error recovery is performed via error productions

• An error production is a production containing

the predefined terminal error

• After adding an error production,

A B | error

on encountering an error in the middle of B, the

parser pops symbols from its stack until , shifts

error, and skips input tokens until a token in

FIRST()

Page 99: Syntax Analysis - ecourse2.ccu.edu.tw

99

Error Recovery

• The parser can report a syntax error by calling

the user provided function yyerror(char *)

• The parser will suppress the report of another

error message for 3 tokens

• You can resume error report immediately by

using the macro yyerrok

• Error productions are used for major

nonterminals

Page 100: Syntax Analysis - ecourse2.ccu.edu.tw

100

An Example

line : expr ‘\n’

| error ‘\n’ {yyerror("reenter last line:");

yyerrok;}

;

expr: expr ‘+’ expr

| expr ‘*’ expr

| ‘-’ expr %prec UMINUS

| ‘(’ expr ‘)’

| NUMBER

;

Page 101: Syntax Analysis - ecourse2.ccu.edu.tw

101

共勉

樊遲問仁。

子曰︰「愛人。」

子曰︰「人之過也,各於其黨,

觀過,斯知仁矣。」

--論語

Page 102: Syntax Analysis - ecourse2.ccu.edu.tw

102

共勉

子曰︰唯仁者,能好人,能惡人。

子曰︰茍志於仁矣,無惡也。

--論語

子曰︰志士仁人,無求生以害仁,

有殺身以成仁。

Page 103: Syntax Analysis - ecourse2.ccu.edu.tw

103

共勉

子曰︰里仁為美。擇不處仁,焉得知?

子曰︰朝聞道,夕死可矣!

--論語

子曰︰不仁者不可以久處約,

不可以長處樂。仁者安仁,知者利仁。