Syntax Analysis - ecourse2.ccu.edu.tw

1

Syntax Analysis

2

Syntax Analysis

Introduction to parsers

Context-free grammars

Push-down automata

Top-down parsing

Buttom-up parsing

Bison - a parser generator

3

Introduction to parsers

Lexical

AnalyzerParser

Symbol

Table

token

next token

source Semantic

Analyzersyntax

treecode

4

Context-Free Grammars

A set of terminals: basic symbols from

which sentences are formed

A set of nonterminals: syntactic

categories denoting sets of sentences

A set of productions: rules specifying

how the terminals and nonterminals can

be combined to form sentences

The start symbol: a distinguished

nonterminal denoting the language

5

An Example

Terminals: id, ‘+’, ‘-’, ‘*’, ‘/’, ‘(’, ‘)’

Nonterminals: expr, op

Productions:

expr expr op expr

expr ‘(’ expr ‘)’

expr ‘-’ expr

expr id

op ‘+’ | ‘-’ | ‘*’ | ‘/’

The start symbol: expr

6

Derivations

A derivation step is an application of a

production as a rewriting rule

E - E

A sequence of derivation steps

E - E - ( E ) - ( id )

is called a derivation of “- ( id )” from E

The symbol * denotes “derives in zero or

more steps”; the symbol + denotes “derives

in one or more steps

E * - ( id ) E + - ( id )

7

Context-Free Languages

A context-free language L(G) is the language

defined by a context-free grammar G

A string of terminals is in L(G) if and only

if S + , is called a sentence of G

If S * , where may contain nonterminals,

then we call a sentential form of G

E - E - ( E ) - ( id )

G1 is equivalent to G2 if L(G1) = L(G2)

8

Left- & Right-most Derivations

Each derivation step needs to choose

– a nonterminal to rewrite

– a production to apply

A leftmost derivation always chooses the

leftmost nonterminal to rewrite

E lm - E lm - ( E ) lm - ( E + E )

lm - ( id + E ) lm - ( id + id )

A rightmost derivation always chooses the

rightmost nonterminal to rewrite

E rm - E rm - ( E ) rm - ( E + E )

rm - (E + id ) rm - ( id + id )

9

Parse Trees

A parse tree is a graphical representation

for a derivation that filters out the order of

choosing nonterminals for rewriting

Many derivations may correspond to the

same parse tree, but every parse tree has

associated with it a unique leftmost and a

unique rightmost derivation

10

An Example

E

-

( )

+

id id

E

E E

EE

lm - E

lm - ( E )

lm - ( E + E )

lm - ( id + E )

lm - ( id + id )

E

rm - E

rm - ( E )

rm - ( E + E )

rm - ( E + id )

rm - ( id + id )

11

Ambiguous Grammar

A grammar is ambiguous if it produces

more than one parse tree for some sentence

E

E + E

id + E

id + E * E

id + id * E

id + id * id

E

E * E

E + E * E

id + E * E

id + id * E

id + id * id

12

Ambiguous Grammar

E

+E E

id

id

*E E

id

E

*E E

id

id

+E E

id

13

Resolving Ambiguity

Use disambiguiting rules to throw away

undesirable parse trees

Rewrite grammars by incorporating

disambiguiting rules into grammars

14

An Example

The dangling-else grammar

stmt if expr then stmt

| if expr then stmt else stmt

| other

Two parse trees for

if E1 then if E2 then S1 else S2

15

An Example

S

elseE S Sif then

if E then S

elseE

S

S Sif then

if E then S

16

Disambiguiting Rules

Rule: match each else with the closest

previous unmatched then

Remove undesired state transitions in the

pushdown automaton

17

Grammar Rewriting

stmt m_stmt

| unm_stmt

m_stmt if expr then m_stmt else m_stmt

| other

unm_stmt if expr then stmt

| if expr then m_stmt else unm_stmt

18

RE vs. CFG

Every language described by a RE can also

be described by a CFG

Why use REs for lexical syntax?

– do not need a notation as powerful as CFGs

– are more concise and easier to understand than

CFGs

– More efficient lexical analyzers can be

constructed from REs than from CFGs

– Provide a way for modularizing the front end

into two manageable-sized components

19

Push-Down Automata

Finite Automaton

Input

OutputStack

$

$

20

An Example

S’ S $

S a S b

S

1 2 3start (a, $)

a

(b, a)

a

($, $)

(a, a)

a

(b, a)

a

0

($, $)

21

Nonregular Constructs

REs can denote only a fixed number of

repetitions or an unspecified number of

repetitions of one given construct:

an, a*

A nonregular construct:

– L = {anbn | n 0}

22

Non-Context-Free Constructs

CFGs can denote only a fixed number of

repetitions or an unspecified number of

repetitions of one or two given constructs

Some non-context-free constructs:

– L1 = {wcw | w is in (a | b)*}

– L2 = {anbmcndm | n 1 and m 1}

– L3 = {anbncn | n 0}

共勉

子貢曰︰貧而無諂，富而無驕，何如。

子曰︰可也，未若貧而樂，富而好禮者也。

子貢曰︰詩云︰「如切如磋，如琢如磨。」

其斯之謂與。

子曰︰賜也，始可與言詩已矣；

告諸往而知來者。

--論語

24

Top-Down Parsing

Construct a parse tree from the root to the

leaves using leftmost derivation

1. S c A B input: cad

2. A a b

3. A a

4. B d

S S

c A B

1S

c A B

a b

2S

c A B

a d

4S

c A B

a

3

backtrack

25

Predictive Parsing

A top-down parsing without backtracking

– there is only one alternative production to

choose at each derivation step

stmt if expr then stmt else stmt

| while expr do stmt

| begin stmt_list end

26

LL(k) Parsing

The first L stands for scanning the input

from left to right

The second L stands for producing a

leftmost derivation

The k stands for the number of lookahead

input symbols used to choose alternative

productions at each derivation step

27

LL(1) Parsing

Use one input symbol of lookahead

LL(1): S a b e | c d e

LL(2): S a b e | a d e

28

Bottom-Up Parsing

Construct a parse tree from the leaves to the

root using rightmost derivation in reverse

S a A B e input: abbcde

A A b c | b

B d

ca d eb

A

b

A

ca d eb

A

b

BA

ca d eb

A

b

S

BA

ca d eb

A

bca d ebb

abbcde rm aAbcde rm aAde rm aABe rm S

29

Handles

A handle of a right-sentential form consists of

– a production A

– a position of where can be replaced by A to

produce the previous right-sentential form in a

rightmost derivation of

abbcde rm aAbcde rm aAde rm aABe rm S

A b A A b c B d S a A B e

30

Handle Pruning

rm A rm S

S

A

The string to the right of the handle contains only

terminals

A is the bottommost leftmost interior node with all

its children in the tree

31

An Example

S

S

BA

ca d eb

A

b

S

BA

ca d eb

A

S

BA

a d e

S

BA

a e

32

Shift-Reduce Parsing

Parsing driver

Parsing table

Input

Output

Stack

Handle

$

$

33

Stack Operations

Shift: shift the next input symbol onto the

top of the stack

Reduce: replace the handle at the top of the

stack with the corresponding nonterminal

Accept: announce successful completion of

the parsing

Error: call an error recovery routine

34

An Example

Action Stack Input

S $ a b b c d e $

S $ a b b c d e $

R $ a b b c d e $

S $ a A b c d e $

S $ a A b c d e $

R $ a A b c d e $

S $ a A d e $

R $ a A d e $

S $ a A B e $

R $ a A B e $

A $ S $

35

Shift/Reduce Conflict

stmt if expr then stmt

| if expr then stmt else stmt

| other

Stack Input

$ - - - if expr then stmt else - - - $

Shift if expr then stmt else stmt

Reduce if expr then stmt

36

Reduce/Reduce Conflict

stmt id ( para_list )

stmt expr := expr

para_list para_list , para

para_list para

para id

expr id ( expr_list )

expr id

expr_list expr_list , expr

expr_list expr

Stack Input

$ - - - id ( id , id ) - - - $

$- - - procid ( id , id ) - - - $

37

LR(k) Parsing

The L stands for scanning the input from

left to right

The R stands for constructing a rightmost

derivation in reverse

The k stands for the number of lookahead

input symbols used to make parsing

decisions

38

LR Parsing

The LR parsing algorithm

Constructing SLR(1) parsing tables

Constructing LR(1) parsing tables

Constructing LALR(1) parsing tables

39

Model of an LR Parser

Parsing driver

Input

Output

Stack

Action Goto

Sm

Sm-1

Xm-1

Xm

S0Parsing table

$

$

40

An Example

(1) E E + T

(2) E T

(3) T T * F

(4) T F

(5) F ( E )

(6) F id

State Action Goto

id + * ( ) $ E T F

0 s5 s4 1 2 3

1 s6 acc

2 r2 s7 r2 r2

3 r4 r4 r4 r4

4 s5 s4 8 2 3

5 r6 r6 r6 r6

6 s5 s4 9 3

7 s5 s4 10

8 s6 s11

9 r1 s7 r1 r1

10 r3 r3 r3 r3

11 r5 r5 r5 r5

41

An Example

Action Stack Input

s5 $0 id + id * id $

r6 $0 id5 + id * id $

r4 $0 F3 + id * id $

r2 $0 T2 + id * id $

s6 $0 E1 + id * id $

s5 $0 E1 +6 id * id $

r6 $0 E1 +6 id5 * id $

r4 $0 E1 +6 F3 * id $

s7 $0 E1 +6 T9 * id $

s5 $0 E1 +6 T9 *7 id $

r6 $0 E1 +6 T9 *7 id5 $

r3 $0 E1 +6 T9 *7 F10 $

r1 $0 E1 +6 T9 $

acc $0 E1 $

42

LR Parsing Driver

push $s0 onto the stack, where s0 is the initial state

set ip to point to the first symbol of w$;

repeat

let s be the top state on the stack and a the symbol pointed to by ip;

if action[s, a] == shift s’ then

push a and s’ onto the stack and advance ip

else if action[s, a] == reduce A then

pop 2 * | | symbols off the stack; s’ = goto[top(), A];

push A and s’ onto the stack

else if action[s, a] == accept then

return

else error

until false

43

First Sets

The first set of a string is the set of

terminals that begin the strings derived from

. If * , then is also in the first set of

.

44

First Sets

If X is terminal, then FIRST(X) is {X}

If X is nonterminal and X is a

production, then add to FIRST(X)

If X is nonterminal and X Y1 Y2 ... Yk is

a production, then add a to FIRST(X) if

for some i, a is in FIRST(Yi) and is in all

of FIRST(Y1), ..., FIRST(Yi-1). If is in

FIRST(Yj) for all j, then add to FIRST(X)

45

An Example

E T E'

E' + T E' |

T F T'

T' * F T' |

F ( E ) | id

FIRST(F) = { (, id }

FIRST(T') = { *, }, FIRST(T) = { (, id }

FIRST(E') = { +, }, FIRST(E) = { (, id }

46

Follow Sets

The follow set of a nonterminal A is the set of

terminals that can appear immediately to the

right of A in some sentential form, namely,

S * A a

a is in the follow set of A.

47

Follow Sets

Place $ in FOLLOW(S), where S is the start

symbol and $ is the input right endmarker

If there is a production A B , then

everything in FIRST() except for is placed

in FOLLOW(B)

If there is a production A B or A B

where FIRST() contains , then everything

in FOLLOW(A) is in FOLLOW(B)

48

An Example

E T E'

E' + T E' |

T F T'

T' * F T' |

F ( E ) | id

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }

FIRST(E') = { +, }, FIRST(T') = { *, }

FOLLOW(E) = { ), $ }, FOLLOW(E') = { ), $ }

FOLLOW(T) = { +, ), $ }, FOLLOW(T') = { +, ), $ }

FOLLOW(F) = { *, +, ), $ }

49

SLR(1) Parsing Table

LR(0) Items

• An LR(0) item of a grammar in G is a production of G with a dot at some position of the right-hand side, A

• An LR(0) item represents a state in an NPDA indicating how much of a production we have seen at a given point in the parsing process

• The production A X Y Z yields the following four LR(0) items

A • X Y Z, A X • Y Z, A X Y • Z, A X Y Z •

50

From CFG to NPDA

• The state A a will go to the state A

a via an edge of terminal a (a shifting)

• The state A B will go to the state B

via an edge of the empty string

• The state A will cause a reduction on

seeing a terminal in FOLLOW(A)

• The state A B will go to the state A

B via an edge of nonterminal B (after a

reduction)

51

An Example

1. E’ E

2. E E + T

3. E T

4. T T * F

5. T F

6. F ( E )

7. F id

Augmented grammar:Easier to identify

the accepting state

52

An Example

E’•E

0

E

E’E•

7

T

ET•

9

F

TF•

11

E•E+T

1

E•T

2

T•T*F

3

T•F

4

EE•+T

8E

EE+•T

14+

17

EE+T•T

F•(E)

5

F•id

6 Fid•

13id

F(•E)

12(

TT•*F

10

TTT*•F*

15

TT*F•F

18

E F(E•)

16)

F(E)•

19

5

6

1

2

3

4

53

From NPDA to DPDA

• There are two functions performed on sets

of LR(0) items (DPDA states)

• The function closure(I) adds more items to I

when there is a dot to the left of a

nonterminal (corresponding to edges)

• The function goto(I, X) moves the dot past

the symbol X in all items in I that contain X

(corresponding to non- edges)

54

The Closure Function

function closure(I);

begin

J := I;

repeat

for each item A B in J and

each production B of G such that

B is not in J do

J = J { B }

until no more items can be added to J;

return J

end

55

An Example

1. E’ E

2. E E + T

3. E T

4. T T * F

5. T F

6. F ( E )

7. F id

s0 = E’ E,

I0 = closure({s0 }) =

{ E’ E,

E E + T,

E T,

T T * F,

T F,

F ( E ),

F id }

56

The Goto Function

function goto(I, X);

begin

set J to the empty set

for any item A X in I do

add A X to J

return closure(J)

end

57

An Example

I0 = {E’ E, E E + T, E T,

T T * F, T F, F ( E ),

F id }

goto(I0 , E)

= closure({E’ E , E E + T })

= {E’ E , E E + T }

58

Subset Construction

function items(G’);

begin

C := {closure({S’ S})}

repeat

for each set of items I in C and each symbol X do

J := goto(I, X)

if J is not empty and not in C then

C = C { J }

until no more sets of items can be added to C

return C

end

59

An Example

1. E’ E

2. E E + T

3. E T

4. T T * F

5. T F

6. F ( E )

7. F id

60

I0 : E’ E

E E + T

E T

T T * F

T F

F ( E )

F id

goto(I0, E) =

I1 : E’ E

E E + T

goto(I0, T) =

I2 : E T

T T * F

goto(I0, F) =

I3 : T F

goto(I0, ‘(’) =

I4 : F ( E )

E E + T

E T

T T * F

T F

F ( E )

F id

goto(I0, id) =

I5 : F id

goto(I1, ‘+’) =

I6 : E E + T

T T * F

T F

F ( E )

F id

goto(I2, ‘*’) =

I7 : T T * F

F ( E )

F id

goto(I4, E) =

I8 : F ( E )

E E + T

goto(I6, T) =

I9 : E E + T

T T * F

goto(I7, F) =

I10 : T T * F

goto(I8, ‘)’) =

I11 : F ( E )

61

An Example

E’ • E

E • E + T

E • T

T • T * F

T • F

F • ( E )

F • id

E’ E •

E E • + T

E T •

T T • * F

E T

T F •

F

F ( • E )

E • E + T

E • T

T • T * F

T • F

F • ( E )

F • id

F id •id

(

T T * • F

F • ( E )

F • id*

E E + • T

T • T * F

T • F

F • ( E )

F • id

+

F ( E • )

E E • + T

F T T * F •

E E + T •

T T • * FT

F ( E ) •

)

0

1

2

3

4

5

6

7

8

9

10

11

(id *

+

id

ET

F

F(

(id

62

SLR(1) Parsing Table Generation

procedure SLR(G’);

begin

for each state I in items(G’) do begin

if A a in I and goto(I, a) = J for a terminal a then

action[I, a] = “shift J”

if A in I and A S’ then

action[I, a] = “reduce A ” for all a in Follow(A)

if S’ S in I then action[I, $] = “accept”

if A X in I and goto(I, X) = J for a nonterminal X

then goto[I, X] = J

end

all other entries in action and goto are made error

end

63

An Example

+ * ( ) id $ E T F

0 s4 s5 1 2 3

1 s6 a

2 r3 s7 r3 r3

3 r5 r5 r5 r5

4 s4 s5 8 2 3

5 r7 r7 r7 r7

6 s4 s5 9 3

7 s4 s5 10

8 s6 s11

9 r2 s7 r2 r2

10 r4 r4 r4 r4

11 r6 r6 r6 r6

64

共勉

大學之道︰

在明明德，在親民，在止於至善。

--大學

65

LR(1) Parsing Table

LR(1) Items

• An LR(1) item of a grammar in G is a pair,

( A , a ), of an LR(0) item A and a lookahead symbol a

• The lookahead has no effect in an LR(1) item

of the form ( A , a ), where is not

• An LR(1) item of the form ( A , a )

calls for a reduction by A only if the next

input symbol is a

66

The Closure Function

function closure(I);

begin

J := I;

repeat

for each item (A B , a) in J and

each production B of G and

each b FIRST( a) such that

(B , b) is not in J do

J = J { (B , b) }

until no more items can be added to J;

return J

end

67

The Goto Function

function goto(I, X);

begin

set J to the empty set

for any item (A X , a) in I do

add (A X , a) to J

return closure(J)

end

68

Subset Construction

function items(G’);

begin

C := {closure({S’ S, $})}

repeat

for each set of items I in C and each symbol X do

J := goto(I, X)

if J is not empty and not in C then

C = C { J }

until no more sets of items can be added to C

return C

end

69

An Example

1. S’ S

2. S C C

3. C c C

4. C d

70

An Example

I0: closure({(S’ S, $)}) =

(S’ S, $)

(S C C, $)

(C c C, c/d)

(C d, c/d)

I1: goto(I0, S) = (S’ S , $)

I2: goto(I0, C) =

(S C C, $)

(C c C, $)

(C d, $)

I3: goto(I0, c) =

(C c C, c/d)

(C c C, c/d)

(C d, c/d)

I4: goto(I0, d) =

(C d , c/d)

I5: goto(I2, C) =

(S C C , $)

71

An Example

I6: goto(I2, c) =

(C c C, $)

(C c C, $)

(C d, $)

I7: goto(I2, d) =

(C d , $)

I8: goto(I3, C) =

(C c C , c/d)

: goto(I3, c) = I3

: goto(I3, d) = I4

I9: goto(I6, C) =

(C c C , $)

: goto(I6, c) = I6

: goto(I6, d) = I7

72

LR(1) Parsing Table Generation

procedure LR(G’);

begin

for each state I in items(G’) do begin

if (A a , b) in I and goto(I, a) = J for a terminal a then


if (A , a) in I and A S’ then

action[I, a] = “reduce A ”

if (S’ S , $) in I then action[I, $] = “accept”

if (A X , a) in I and goto(I, X) = J for a nonterminal X

then goto[I, X] = J

end


end

73

An Example

c d $ S C

0 s3 s4 1 2

1 a

2 s6 s7 5

3 s3 s4 8

4 r4 r4

5 r2

6 s6 s7 9

7 r4

8 r3 r3

9 r3

74

LALR(1) Parsing Table

The Core of LR(1) Items

• The core of a set of LR(1) items is the set of

their first components (i.e., LR(0) items)

• The core of the set of LR(1) items

{ (C c C, c/d),

(C c C, c/d),

(C d, c/d) }

is {C c C,

C c C,

C d }

75

Merging Cores

I3: { (C c C, c/d)

(C c C, c/d)

(C d, c/d) }

I4: { (C d , c/d) }

I8: { (C c C , c/d) }

I6: { (C c C, $)

(C c C, $)

(C d, $) }

I7: { (C d , $) }

I9: { (C c C , $) }

76

LALR(1) Parsing Table Generation

procedure LALR(G’);

begin

for each state I in mergeCore(items(G’)) do begin

if (A a , b) in I and goto(I, a) = J for a terminal a then


if (A , a) in I and A S’ then

action[I, a] = “reduce A ”

if (S’ S , $) in I then action[I, $] = “accept”

if (A X , a) in I and goto(I, X) = J for a nonterminal X

then goto[I, X] = J

end


end

77

An Example

c d $ S C

0 s36 s47 1 2

1 a

2 s36 s47 5

36 s36 s47 89

47 r4 r4 r4

5 r2

89 r3 r3 r3

78

LR Grammars

• A grammar is SLR(1) iff its SLR(1) parsing

table has no multiply-defined entries

• A grammar is LR(1) iff its LR(1) parsing

table has no multiply-defined entries

• A grammar is LALR(1) iff its LALR(1)

parsing table has no multiply-defined

entries

79

Hierarchy of Grammar Classes

Unambiguous Grammars Ambiguous Grammars

LL(k) LR(k)

LR(1)

LALR(1)

LL(1) SLR(1)

80

Hierarchy of Grammar Classes

• Why LL(k) LR(k)?

• Why SLR(k) LALR(k) LR(k)?

81

LL(k) vs. LR(k)

• For a grammar to be LL(k), we must be able

to recognize the use of a production by

seeing only the first k symbols of what its

right-hand side derives

• For a grammar to be LR(k), we must be able

to recognize the use of a production by

having seen all of what is derived from its

right-hand side with k more symbols of

lookahead

82

LALR(k) vs. LR(k)

• The merge of the sets of LR(1) items having the

same core does not introduce shift/reduce conflicts

• Suppose there is a shift-reduce conflict on

lookahead a in the merged set because of

1. (A , a) 2. (B a , b)

• Then some set of items has item (A , a) , and

since the cores of all sets merged are the same, it

must have an item (B a , c) for some c

• But then this set has the same shift/reduce conflict

on a

83

LALR(k) vs. LR(k)

• The merge of the sets of LR(1) items having the same core may introduce reduce/reduce conflicts

• As an example, consider the grammar1. S’ S2. S a A d | a B e | b A e | b B d3. A c4. B c

that generates acd, ace, bce, bcd

• The set {(A c , d), (B c , e)} is valid for acx

• The set {(A c , e), (B c , d)} is valid for bcx

• But the union {(A c , d/e), (B c , d/e)} generates a reduce/reduce conflict

84

SLR(k) vs. LALR(k)

1. S’ S

2. S L = R

3. S R

4. L * R

5. L id

6. R L

85

SLR(k) vs. LALR(k)

I0: closure({S’ S}) =

S’ S

S L = R

S R

L * R

L id

R L

I1: goto(I0, S) = S’ S

I2: goto(I0, L) =

S L = R

R L

I3: goto(I0, R) =

S R

I4: goto(I0, *) =

L * R

R L

L * R

L id

I5: goto(I0, id) =

L id

FOLLOW(R) = {=, $}

86

SLR(k) vs. LALR(k)

I6: goto(I2, =) =

S L = R

R L

L * R

L id

I7: goto(I4, R) =

L * R

I8: goto(I4, L) =

R L

I9: goto(I6, R) =

S L = R

87

SLR(k) vs. LALR(k)

I0: closure({(S’ S, $)}) =

(S’ S, $)

(S L = R, $)

(S R, $)

(L * R, =/$)

(L id, =/$)

(R L, $)

I1: goto(I0, S) = (S’ S , $)

I2: goto(I0, L) =

(S L = R, $)

(R L , $)

I3: goto(I0, R) =

(S R , $)

I4: goto(I0, *) =

(L * R, =/$)

(R L, =/$)

(L * R, =/$)

(L id, =/$)

I5: goto(I0, id) =

(L id , =/$)

88

SLR(k) vs. LALR(k)

I6: goto(I2, =) =

(S L = R, $)

(R L, $)

(L * R, $)

(L id, $)

I7: goto(I4, R) =

(L * R , =/$)

I8: goto(I4, L) =

(R L , =/$)

I9: goto(I6, R) =

(S L = R , $)

I10: goto(I6, L) =

(R L , $)

I11: goto(I6, *) =

(L * R, $)

(R L, $)

(L * R, $)

(L id, $)

I12: goto(I6, id) =

(L id , $)

I13: goto(I11, R) =

(L * R , $)

I4

I5

89

Bison – A Parser Generator

Bison compiler

C compiler

a.out

lang.ylang.tab.c

lang.tab.h (-d option)

lang.tab.c a.out

tokens syntax tree

A langauge for specifying parsers and semantic analyzers

90

Bison Programs

%{

C declarations

%}

Bison declarations

%%

Grammar rules

%%

Additional C code

91

An Example

line expr ‘\n’

expr expr ‘+’ term | term

term term ‘*’ factor | factor

factor ‘(’ expr ‘)’ | DIGIT

92

An Example

%token DIGIT

%start line

%%

line : expr ‘\n’ {printf(“line: expr \\n\n”);}

;

expr: expr ‘+’ term {printf(“expr: expr + term\n”);}

| term {printf(“expr: term\n”}

;

term: term ‘*’ factor {printf(“term: term * factor\n”;}

| factor {printf(“term: factor\n”);}

;

factor: ‘(’ expr ‘)’ {printf(“factor: ( expr )\n”);}

| DIGIT {printf(“factor: DIGIT\n”);}

;

93

Functions and Variables

• yyparse(): the parser function

• yylex(): the lexical analyzer function. Bison recognizes any non-positive value as indicating the end of the input

• yylval: the attribute value of a token. Its default type is int, and can be declared to be multiple types in the first section using

%union {int ival;double dval;

}

94

Conflict Resolutions

• A reduce/reduce conflict is resolved by

choosing the production listed first

• A shift/reduce conflict is resolved in favor

of shift

• A mechanism for assigning precedences and

assocoativities to terminals

95

Precedence and Associativity

• The precedence and associativity of operators are

declared simultaneously

%nonassoc ‘<’ /* lowest */

%left ‘+’ ‘-’

%right ‘^’ /* highest */

• The precedence of a rule is determined by the

precedence of its rightmost terminal

• The precedence of a rule can be modified by

adding %prec <terminal> to its right end

96

An Example

%{

#include <stdio.h>

%}

%token NUMBER

%left ‘+’ ‘-’

%left ‘*’ ‘/’

%right UMINUS

%%

97

An Example

line : expr ‘\n’

;

expr: expr ‘+’ expr

| expr ‘-’ expr

| expr ‘*’ expr

| expr ‘/’ expr

| ‘-’ expr %prec UMINUS

| ‘(’ expr ‘)’

| NUMBER

;

98

Error Recovery

• Error recovery is performed via error productions

• An error production is a production containing

the predefined terminal error

• After adding an error production,

A B | error

on encountering an error in the middle of B, the

parser pops symbols from its stack until , shifts

error, and skips input tokens until a token in

FIRST()

99

Error Recovery

• The parser can report a syntax error by calling

the user provided function yyerror(char *)

• The parser will suppress the report of another

error message for 3 tokens

• You can resume error report immediately by

using the macro yyerrok

• Error productions are used for major

nonterminals

100

An Example

line : expr ‘\n’

| error ‘\n’ {yyerror("reenter last line:");

yyerrok;}

;

expr: expr ‘+’ expr

| expr ‘*’ expr

| ‘-’ expr %prec UMINUS

| ‘(’ expr ‘)’

| NUMBER

;

101

共勉

樊遲問仁。

子曰︰「愛人。」

子曰︰「人之過也，各於其黨，

觀過，斯知仁矣。」

--論語

102

共勉

子曰︰唯仁者，能好人，能惡人。

子曰︰茍志於仁矣，無惡也。

--論語

子曰︰志士仁人，無求生以害仁，

有殺身以成仁。

103

共勉

子曰︰里仁為美。擇不處仁，焉得知？

子曰︰朝聞道，夕死可矣！

--論語

子曰︰不仁者不可以久處約，

不可以長處樂。仁者安仁，知者利仁。

Documents

Syntax Analysis - ecourse2.ccu.edu.tw