Winter 2012-2013Compiler Principles
Syntax Analysis (Parsing) – Part 3Mayer Goldberg and Roman Manevich
Ben-Gurion University
2
Today
• Shift-reduce parsing– How does a shift-reduce parser work?– How do we construct a shift-reduce parse table?– LR(1), SLR(1), LALR(1)– Meaning of shift-reduce / reduce-reduce conflicts– Behavior on left-recursion/right-recursion
• Automatic parser generation (CUP)– Handling ambiguity
3
Model of an LR parser
LRParsing program
0
T
2
+
7
id
5
Stack
$ id + id + id
Outputstate
symbol
goto action
Input
4
LR parser stack
• Sequence made of state, symbol pairs• For instance a possible stack for the grammar
S E $E TE E + TT id T ( E )could be: 0 T 2 + 7 id 5
Form of LR parsing table
5
state terminals non-terminals
shift/reduceactions
gotopart
01...
sn
rk
shift state n reduce by rule k
gm
goto state m
acc
accept
error
6
LR parser table example
(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
goto action STATE
T E $ ) ( + id
g6 g1 s7 s5 0
acc s3 1
2
g4 s7 s5 3
r3 r3 r3 r3 r3 4
r4 r4 r4 r4 r4 5
r2 r2 r2 r2 r2 6
g6 g8 s7 s5 7
s9 s3 8
r5 r5 r5 r5 r5 9
7
Shift move
LRParsing program
q...
Stack
$ … a …
Output
goto action
Input
If action[q, a] = sn
8
Result of shift
If action[q, a] = sn
LRParsing program
naq...
Stack
$ … a …
Output
goto action
Input
9
Reduce move
• If action[qn, a] = rk• Production: (k) A β• If β= σ1… σn
Top of stack looks like q1 σ1… qn σn
• goto[q, A] = qm
LRParsing program
qn
…
q…
Stack
$ … a …
Output
goto action
Input
2*|β|
10
Result of reduce move
• If action[qn, a] = rk• Production: (k) A β• If β= σ1… σn
Top of stack looks like q1 σ1… qn σn
• goto[q, A] = qm
LRParsing program
Stack
Output
goto action
2*|β| qm
A
q
…
$ … a …Input
11
Accept move
LRParsing program
q...
Stack
$ a …
Output
goto action
Input
If action[q, a] = acceptparsing completed
12
Error move
LRParsing program
q...
Stack
$ … a …
Output
goto action
Input
If action[q, a] = errorparsing discovered a syntactic error
13
Parsing id+id$
goto action ST E $ ) ( + id
g6 g1 s7 s5 0acc s3 1
2g4 s7 s5 3
r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6
g6 g8 s7 s5 7s9 s3 8
r5 r5 r5 r5 r5 9
(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
Stack Input Action0 id + id $ s5
Initialize with state 0
14
Parsing id+id$
goto action ST E $ ) ( + id
g6 g1 s7 s5 0acc s3 1
2g4 s7 s5 3
r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6
g6 g8 s7 s5 7s9 s3 8
r5 r5 r5 r5 r5 9
(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
Stack Input Action0 id + id $ s5
Initialize with state 0
15
Parsing id+id$
Stack Input Action0 id + id $ s50 id 5 + id $ r4
goto action ST E $ ) ( + id
g6 g1 s7 s5 0acc s3 1
2g4 s7 s5 3
r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6
g6 g8 s7 s5 7s9 s3 8
r5 r5 r5 r5 r5 9
(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
16
Parsing id+id$
Stack Input Action0 id + id $ s50 id 5 + id $ r4
goto action ST E $ ) ( + id
g6 g1 s7 s5 0acc s3 1
2g4 s7 s5 3
r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6
g6 g8 s7 s5 7s9 s3 8
r5 r5 r5 r5 r5 9
(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
pop id 5
17
Parsing id+id$
Stack Input Action0 id + id $ s50 id 5 + id $ r4
goto action ST E $ ) ( + id
g6 g1 s7 s5 0acc s3 1
2g4 s7 s5 3
r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6
g6 g8 s7 s5 7s9 s3 8
r5 r5 r5 r5 r5 9
(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
push T 6
18
Parsing id+id$
Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r2
goto action ST E $ ) ( + id
g6 g1 s7 s5 0acc s3 1
2g4 s7 s5 3
r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6
g6 g8 s7 s5 7s9 s3 8
r5 r5 r5 r5 r5 9
(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
19
Parsing id+id$
Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r20 E 1 + id $ s3
goto action ST E $ ) ( + id
g6 g1 s7 s5 0acc s3 1
2g4 s7 s5 3
r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6
g6 g8 s7 s5 7s9 s3 8
r5 r5 r5 r5 r5 9
(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
20
Parsing id+id$
Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r20 E 1 + id $ s30 E 1 + 3 id $ s5
goto action ST E $ ) ( + id
g6 g1 s7 s5 0acc s3 1
2g4 s7 s5 3
r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6
g6 g8 s7 s5 7s9 s3 8
r5 r5 r5 r5 r5 9
(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
21
Parsing id+id$
Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r20 E 1 + id $ s30 E 1 + 3 id $ s50 E 1 + 3 id 5 $ r4
goto action ST E $ ) ( + id
g6 g1 s7 s5 0acc s3 1
2g4 s7 s5 3
r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6
g6 g8 s7 s5 7s9 s3 8
r5 r5 r5 r5 r5 9
(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
22
Parsing id+id$
Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r20 E 1 + id $ s30 E 1 + 3 id $ s50 E 1 + 3 id 5 $ r40 E 1 + 3 T 4 $ r3
goto action ST E $ ) ( + id
g6 g1 s7 s5 0acc s3 1
2g4 s7 s5 3
r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6
g6 g8 s7 s5 7s9 s3 8
r5 r5 r5 r5 r5 9
(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
23
Parsing id+id$
Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r20 E 1 + id $ s30 E 1 + 3 id $ s50 E 1 + 3 id 5 $ r40 E 1 + 3 T 4 $ r30 E 1 $ s2
goto action ST E $ ) ( + id
g6 g1 s7 s5 0acc s3 1
2g4 s7 s5 3
r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6
g6 g8 s7 s5 7s9 s3 8
r5 r5 r5 r5 r5 9
(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
24
Constructing an LR parsing table
1. Construct a (determinized) transition diagram from LR items
2. If there are conflicts – stop3. Fill table entries from diagram
Form of LR(0) items
N αβ
Already matched To be matched
Input
Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β
25
Types of LR(0) items
N αβ Shift Item
N αβ Reduce Item
26
27
LR(0) items enumeration example
• All items can be obtained by placing a dot at every position for every production:
(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
1: S E$2: S E $3: S E $ 4: E T5: E T 6: E E + T7: E E + T8: E E + T9: E E + T 10: T i11: T i 12: T (E)13: T ( E)14: T (E )15: T (E)
Grammar LR(0) items
28
Operations for transition diagram construction
• Initial = {S’S$}• For an item set I
Closure(I) = Closure(I) + {Xµ is in grammar| NαXβ in I}
• Goto(I, X) = { NαXβ | NαXβ in I}
29
Initial example
• Initial = {S E $}(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
Grammar
30
Closure example
• Initial = {S E $}• Closure({S E $}) =
S E $E TE E + TT id T ( E )
(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
Grammar
31
Goto example
• Initial = {S E $}• Closure({S E $}) =
S E $E TE E + TT id T ( E )
• Goto({S E $ , E E + T, T id}, E) = {S E $, E E + T}
(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
Grammar
32
Constructing the transition diagram
1. Start with state 0 containing itemClosure({S E $})
2. Repeat until no new states are discovered– For every state p containing item set Ip, and
symbol N, compute state q containing item setIq = Closure(goto(Ip, N))
33
Automaton construction example(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
S E$E TE E + TT iT (E)
T (E)E TE E + TT iT (E)
E E + T
T (E) S E$
S E$E E+ T E E+T
T iT (E)
T i
T (E)E E+T
E Tq0
q1
q2
q3
q4
q5
q6
q7
q8
q9
T
(
i
E
+
$
T
)
+
E
i
T
(i
(
34
Automaton construction example(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
S E$
q0
Initialize
35
Automaton construction example(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
S E$E TE E + TT iT (E)
q0
applyClosure
36
Automaton construction example(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
S E$E TE E + TT iT (E)
q0E T
q6
T
T (E)E TE E + TT iT (E)
(
T i
q5i
S E$E E+ T
q1
E
37
Automaton construction example(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )
S E$E TE E + TT iT (E)
T (E)E TE E + TT iT (E)
E E + T
T (E) S E$
Z E$E E+ T E E+T
T iT (E)
T i
T (E)E E+T
E Tq0
q1
q2
q3
q4
q5
q6
q7
q8
q9
T
(
i
E
+
$
T
)
+
E
i
T
(i
(
terminal transition corresponds to shift action in parse table
non-terminal transition corresponds to goto action in parse table
a single reduce item corresponds to reduce action
38
Conflicts
• Can construct a diagram for every grammar but some may introduce conflicts
• shift-reduce conflict: an item set contains at least one shift item and one reduce item
• reduce-reduce conflict: an item set contains two reduce items
39
LR(0) conflicts
S E $E T E E + TT i T ( E )T i[E]
S E$E TE E + TT iT (E)T i[E] T i
T i[E]
q0
q5
T
(
i
E Shift/reduce conflict
…
…
…
40
LR(0) conflicts
S E $E T E E + TT i V iT ( E )
S E$E TE E + TT iT (E)T i[E] T i
V i
q0
q5
T
(
i
E reduce/reduce conflict
…
…
…
41
LR(0) conflicts
• Any grammar with an -rule cannot be LR(0)• Inherent shift/reduce conflict– A – reduce item– P αAβ – shift item– A can always be predicted from P αAβ
LR variants
• LR(0) – what we’ve seen so far• SLR(0)– Removes infeasible reduce actions via FOLLOW set
reasoning• LR(1)– LR(0) with one lookahead token in items
• LALR(0)– LR(1) with merging of states with same LR(0)
component
42
SRL parsing• A handle should not be reduced to a non-terminal
N if the lookahead is a token that cannot follow N• A reduce item N α is applicable only when the
lookahead is in FOLLOW(N)– If b is not in FOLLOW(N) we just proved there is no
derivation S =>* βNαb and thus it is safe to remove the reduce item from the conflicted state
• Differs from LR(0) only on the ACTION table– Now a row in the parsing table may contain both shift
actions and reduce actions and we need to consult the current token to decide which one to take
43
SLR action tableState i + ( ) [ ] $
0 shift shift
1 shift accept
2
3 shift shift
4 EE+T EE+T EE+T5 Ti Ti shift Ti6 ET ET ET7 shift shift
8 shift shift
9 T(E) T(E) T(E)
vs.
state action
q0 shift
q1 shift
q2
q3 shift
q4 EE+Tq5 Tiq6 ETq7 shift
q8 shift
q9 TE
SLR – use 1 token look-ahead LR(0) – no look-ahead
… as before…T i T i[E]
Lookahead token from the
input
44
Going beyond SLR(0)
• Some common language constructs introduce conflicts even for SLR
(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L
45
S’ → SS → L = RS → RL → * RL → idR → L
S’ → S
S → L = RR → L
S → R
L → * RR → LL → * RL → id
L → id
S → L = RR → LL → * RL → id
L → * R
R → L
S → L = R
S
L
R
id
*
=
R
*
id
R
L*
L
id
q0
q4
q7
q1
q3
q9
q6
q8
q2
q5
46
47
shift/reduce conflict
• S → L = R vs. R → L • FOLLOW(R) contains =
– S L ⇒ = R * R ⇒ = R
• SLR cannot resolve conflict
S → L = RR → L
S → L = RR → LL → * RL → id
=q6
q2
(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L
48
Inputs requiring shift/reduce
• For the input id the rightmost derivationS’ => S => R => L => id requires reducing in q2
• For the input id = idS’ => S => L = R => L = L => L = id => id = idrequires shifting
S → L = RR → L
S → L = RR → LL → * RL → id
=q6
q2(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L
LR(1) grammars
• In SLR: a reduce item N α is applicable only when the lookahead is in FOLLOW(N)
• But FOLLOW(N) merges lookahead for all alternatives for N– Insensitive to the context of a given production
• LR(1) keeps lookahead with each LR item• Idea: a more refined notion of follows
computed per item
49
50
LR(1) items• LR(1) item is a pair
– LR(0) item– Lookahead token
• Meaning– We matched the part left of the dot, looking to match the part on the right of
the dot, followed by the lookahead token• Example
– The production L id yields the following LR(1) items
[L → ● id, *][L → ● id, =][L → ● id, id][L → ● id, $][L → id ●, *][L → id ●, =][L → id ●, id][L → id ●, $]
(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L
[L → ● id][L → id ●]
LR(0) items
LR(1) items
-closure for LR(1)
• For every [A → α ● Bβ , c] in S – for every production B→δ and every token b in
the grammar such that b FIRST(βc) – Add [B → ● δ , b] to S
51
(S’ → ∙ S , $)(S → ∙ L = R , $)(S → ∙ R , $)(L → ∙ * R , = )(L → ∙ id , = )(R → ∙ L , $ )(L → ∙ id , $ )(L → ∙ * R , $ )
(S’ → S ∙ , $)
(S → L ∙ = R , $)(R → L ∙ , $)
(S → R ∙ , $)
(L → * ∙ R , =)(R → ∙ L , =)(L → ∙ * R , =)(L → ∙ id , =)(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)
(L → id ∙ , $)(L → id ∙ , =)
(S → L = ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)
(L → * R ∙ , =)(L → * R ∙ , $)
(R → L ∙ , =)(R → L ∙ , $)
(S → L = R ∙ , $)
S
L
R
id*
=
R
*id
R
L
*
L
id
q0
q4 q5
q7
q6
q9
q3
q1
q2
q8
(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)
(L → id ∙ , $)
(R → L ∙ , $)
(L → * R ∙ , $)
q11
q12
q10
Rq13
id
52
Back to the conflict
• Is there a conflict now?
53
(S → L ∙ = R , $)(R → L ∙ , $)
(S → L = ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)
=
q6
q2
LALR(1)• LR(1) tables have huge number of entries• Often don’t need such refined observation (and
cost)• Idea: find states with the same LR(0) component
and merge their lookaheads component as long as there are no conflicts
• LALR(1) not as powerful as LR(1) in theory but works quite well in practice– Merging may not introduce new shift-reduce conflicts,
only reduce-reduce, which is unlikely in practice
54
(S’ → ∙ S , $)(S → ∙ L = R , $)(S → ∙ R , $)(L → ∙ * R , = )(L → ∙ id , = )(R → ∙ L , $ )(L → ∙ id , $ )(L → ∙ * R , $ )
(S’ → S ∙ , $)
(S → L ∙ = R , $)(R → L ∙ , $)
(S → R ∙ , $)
(L → * ∙ R , =)(R → ∙ L , =)(L → ∙ * R , =)(L → ∙ id , =)(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)
(L → id ∙ , $)(L → id ∙ , =)
(S → L = ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)
(L → * R ∙ , =)(L → * R ∙ , $)
(R → L ∙ , =)(R → L ∙ , $)
(S → L = R ∙ , $)
S
L
R
id*
=
R
*id
R
L
*
L
id
q0
q4 q5
q7
q6
q9
q3
q1
q2
q8
(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)
(L → id ∙ , $)
(R → L ∙ , $)
(L → * R ∙ , $)
q11
q12
q10
Rq13
id
55
(S’ → ∙ S , $)(S → ∙ L = R , $)(S → ∙ R , $)(L → ∙ * R , = )(L → ∙ id , = )(R → ∙ L , $ )(L → ∙ id , $ )(L → ∙ * R , $ )
(S’ → S ∙ , $)
(S → L ∙ = R , $)(R → L ∙ , $)
(S → R ∙ , $)
(L → * ∙ R , =)(R → ∙ L , =)(L → ∙ * R , =)(L → ∙ id , =)(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)
(L → id ∙ , $)(L → id ∙ , =)
(S → L = ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)
(L → * R ∙ , =)(L → * R ∙ , $)
(R → L ∙ , =)(R → L ∙ , $)
(S → L = R ∙ , $)
S
L
R
id*
=
R
*id
R
L
*
L
id
q0
q4 q5
q7
q6
q9
q3
q1
q2
q8
(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)
(L → * R ∙ , $)
q10
Rq13
id
56
57
Left/Right- recursion
• At home: create a simple grammar with left-recursion and one with right-recursion
• Construct corresponding LR(0) parser– Any conflicts?
• Run on simple input and observe behavior– Attempt to generalize observation for long inputs
High-level structure
JFlex javacLexerspec
Lexical analyzer
text
tokens
.java
CUP javacParserspec .java Parser
AST
TPL.cup
TPL.lex
sym.javaParser.java
Lexer.java
(Token.java)
59
Expression calculator
expr expr + expr| expr - expr| expr * expr| expr / expr| - expr| ( expr )| number
Goals of expression calculator parser:• Is 2+3+4+5 a valid expression?• What is the meaning (value) of this expression?
60
Syntax analysis with CUP
CUP javacParserspec .java Parser
AST
CUP – parser generator Generates an LALR(1) Parser Input: spec file Output: a syntax analyzer
tokens
61
CUP spec file
• Package and import specifications• User code components• Symbol (terminal and non-terminal) lists– Terminals go to sym.java– Types of AST nodes
• Precedence declarations• The grammar– Semantic actions to construct AST
62
Expression Calculator – 1st Attempt
terminal Integer NUMBER;terminal PLUS, MINUS, MULT, DIV;terminal LPAREN, RPAREN;
non terminal Integer expr;
expr ::= expr PLUS expr| expr MINUS expr| expr MULT expr| expr DIV expr| MINUS expr| LPAREN expr RPAREN| NUMBER
;
Symbol typeexplained later
63
Ambiguities
a * b + c
a b c
+
*
a b c
*
+
a + b + ca b c
+
+
a b c
+
+
64
terminal Integer NUMBER;terminal PLUS,MINUS,MULT,DIV;terminal LPAREN, RPAREN;terminal UMINUS;non terminal Integer expr;
precedence left PLUS, MINUS;precedence left DIV, MULT;precedence left UMINUS;
expr ::= expr PLUS expr| expr MINUS expr| expr MULT expr| expr DIV expr| MINUS expr %prec UMINUS| LPAREN expr RPAREN| NUMBER
;
Expression Calculator – 2nd Attempt
Increasing precedence
Contextual precedence
65
Parsing ambiguous grammars using precedence declarations
• Each terminal assigned with precedence– By default all terminals have lowest precedence– User can assign his own precedence– CUP assigns each production a precedence
• Precedence of last terminal in production• or user-specified contextual precedence
• On shift/reduce conflict resolve ambiguity by comparing precedence of terminal and production and decides whether to shift or reduce
• In case of equal precedences left/right help resolve conflicts– left means reduce– right means shift
• More information on precedence declarations in CUP’s manual
66
Resolving ambiguity
a + b + c
a b c
+
+
a b c
+
+
precedence left PLUS
67
Resolving ambiguity
a * b + c
a b c
+
*
a b c
*
+
precedence left PLUSprecedence left MULT
68
Resolving ambiguity
- a - b
a b
-
-
MINUS expr %prec UMINUS
a
-b
-
69
Resolving ambiguityterminal Integer NUMBER;terminal PLUS,MINUS,MULT,DIV;terminal LPAREN, RPAREN;terminal UMINUS;
precedence left PLUS, MINUS;precedence left DIV, MULT;precedence left UMINUS;
expr ::= expr PLUS expr| expr MINUS expr| expr MULT expr| expr DIV expr| MINUS expr %prec
UMINUS| LPAREN expr RPAREN| NUMBER
;
Rule has precedence of UMINUS
UMINUS never returnedby scanner
(used only to define precedence)
70
More CUP directives• precedence nonassoc NEQ– Non-associative operators: < > == != etc.– 1<2<3 identified as an error (semantic error?)
• start non-terminal– Specifies start non-terminal other than first non-terminal– Can change to test parts of grammar
• Getting internal representation– Command line options:
• -dump_grammar• -dump_states • -dump_tables• -dump
71
import java_cup.runtime.*;%%%cup%eofval{ return new Symbol(sym.EOF);%eofval}NUMBER=[0-9]+%%<YYINITIAL>”+” { return new Symbol(sym.PLUS); }<YYINITIAL>”-” { return new Symbol(sym.MINUS); }<YYINITIAL>”*” { return new Symbol(sym.MULT); }<YYINITIAL>”/” { return new Symbol(sym.DIV); }<YYINITIAL>”(” { return new Symbol(sym.LPAREN); }<YYINITIAL>”)” { return new Symbol(sym.RPAREN); }<YYINITIAL>{NUMBER} {
return new Symbol(sym.NUMBER, new Integer(yytext()));}<YYINITIAL>\n { }<YYINITIAL>. { }
Parser gets terminals from the scanner
Scanner integration
Generated from tokendeclarations in .cup file
72
Recap
• Package and import specifications and user code components
• Symbol (terminal and non-terminal) lists– Define building-blocks of the grammar
• Precedence declarations– May help resolve conflicts
• The grammar– May introduce conflicts that have to be resolved
73
Assigning meaning
• So far, only validation• Add Java code implementing semantic actions
expr ::= expr PLUS expr| expr MINUS expr| expr MULT expr| expr DIV expr| MINUS expr %prec UMINUS| LPAREN expr RPAREN| NUMBER
;
74
• Symbol labels used to name variables• RESULT names the left-hand side symbol
expr ::= expr:e1 PLUS expr:e2{: RESULT = new Integer(e1.intValue() + e2.intValue()); :}| expr:e1 MINUS expr:e2{: RESULT = new Integer(e1.intValue() - e2.intValue()); :}| expr:e1 MULT expr:e2{: RESULT = new Integer(e1.intValue() * e2.intValue()); :}| expr:e1 DIV expr:e2{: RESULT = new Integer(e1.intValue() / e2.intValue()); :}| MINUS expr:e1{: RESULT = new Integer(0 - e1.intValue(); :} %prec UMINUS| LPAREN expr:e1 RPAREN{: RESULT = e1; :}| NUMBER:n {: RESULT = n; :};
Assigning meaning
75
Building an AST
• More useful representation of syntax tree– Less clutter– Actual level of detail depends on your design
• Basis for semantic analysis• Later annotated with various information– Type information– Computed values
76
Parse tree vs. AST
+
expr
1 2 + 3
expr
expr
( ) ( )
expr
expr
1 2
+
3
+
77
AST construction• AST Nodes constructed during parsing– Stored in push-down stack
• Bottom-up parser– Grammar rules annotated with actions for AST
construction– When node is constructed all children available
(already constructed)– Node (RESULT) pushed on stack
• Top-down parser– More complicated
78
1 + (2) + (3)
expr + (expr) + (3)
+
expr
1 2 + 3
expr
expr + (3)
expr
( ) ( )
expr + (expr)
expr
expr
expr
expr + (2) + (3)
int_constval = 1
pluse1 e2
int_constval = 2
int_constval = 3
pluse1 e2
expr ::= expr:e1 PLUS expr:e2 {: RESULT = new plus(e1,e2); :} | LPAREN expr:e RPAREN {: RESULT = e; :} | INT_CONST:i {: RESULT = new int_const(…, i); :}
AST construction
79
terminal Integer NUMBER;terminal PLUS,MINUS,MULT,DIV,LPAREN,RPAREN,SEMI;terminal UMINUS;non terminal Integer expr;non terminal expr_list, expr_part; precedence left PLUS, MINUS;precedence left DIV, MULT;precedence left UMINUS;
expr_list ::= expr_list expr_part | expr_part
; expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI
; expr ::= expr PLUS expr
| expr MINUS expr| expr MULT expr| expr DIV expr| MINUS expr %prec UMINUS| LPAREN expr RPAREN| NUMBER
;
Designing an AST
80
See you next time