MELJUN CORTES automata14

CSC 3130: Automata theory and formal languages

LR(k) grammars

Fall 2008, MELJUN P. CORTES, MBA, MPA, BSCS, ACS


LR(0) example from last time

A → aAb | ab

LR(0) DFA:
1: A → •aAb, A → •ab
2: A → a•Ab, A → a•b, A → •aAb, A → •ab
3: A → ab•
4: A → aA•b
5: A → aAb•
transitions: 1 --a--> 2, 2 --a--> 2, 2 --b--> 3, 2 --A--> 4, 4 --b--> 5
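The DFA above can also be built mechanically. Below is a minimal Python sketch of the usual subset construction over the item εNFA. It assumes single-character grammar symbols and encodes items as (lhs, rhs, dot) triples. The helper names `closure`, `goto` and `build_dfa` are our own. State indices follow exploration order (starting at 0), so they need not match the slide's numbering.

```python
# Sketch: LR(0) DFA for A → aAb | ab by closure + goto (subset construction
# over the item εNFA). Items are (lhs, rhs, dot); symbols are single characters.
GRAMMAR = [("A", "aAb"), ("A", "ab")]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = sorted({s for _, rhs in GRAMMAR for s in rhs})


def closure(items):
    """Add C → •δ for every item whose dot sits in front of a nonterminal C."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for c, delta in GRAMMAR:
                if c == rhs[dot] and (c, delta, 0) not in result:
                    result.add((c, delta, 0))
                    work.append((c, delta, 0))
    return frozenset(result)


def goto(state, sym):
    """Move the dot past sym in every item that allows it, then close."""
    return closure({(l, r, d + 1) for l, r, d in state if d < len(r) and r[d] == sym})


def build_dfa(start_symbol):
    start = closure({(l, r, 0) for l, r in GRAMMAR if l == start_symbol})
    states, edges, work = [start], {}, [start]
    while work:
        state = work.pop()
        for sym in SYMBOLS:
            target = goto(state, sym)
            if not target:              # no item can move past sym
                continue
            if target not in states:
                states.append(target)
                work.append(target)
            edges[(states.index(state), sym)] = states.index(target)
    return states, edges


states, edges = build_dfa("A")
for i, state in enumerate(states):
    print(i, sorted(f"{l} → {r[:d]}•{r[d:]}" for l, r, d in state))
print(edges)
```

The construction finds five states and five edges, one state per box in the figure, including the self-loop on a at the four-item state.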

LR(0) parsing example revisited

Stack      Input   Action
1          aabb    S
1a2        abb     S
1a2a2      bb      S
1a2a2b3    b       R
1a2A4      b       S
1a2A4b5    ε       R
1A         ε

A → aAb | ab        A ⇒ aAb ⇒ aabb

[Figure: the LR(0) DFA (states 1 to 5; state 3 = A → ab•, state 4 = A → aA•b, state 5 = A → aAb•) and the parse tree of aabb]
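Here is the same trace as a small table-driven sketch. It assumes the DFA numbering from the figure. `GOTO`, `REDUCE` and `parse` are illustrative names, not from the lecture. A reduce state pops one (symbol, state) pair per right-hand-side symbol and then follows the edge labelled A. Reducing to A at the bottom of the stack with no input left is treated as acceptance.

```python
# Hand-encoded LR(0) DFA for A → aAb | ab, numbered as in the figure.
GOTO = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 3, (2, "A"): 4, (4, "b"): 5}
REDUCE = {3: ("A", 2), 5: ("A", 3)}   # complete state -> (lhs, length of rhs)


def snapshot(stack, rest, action):
    return ("".join(map(str, stack)), "".join(rest) or "ε", action)


def parse(word):
    stack, rest, trace = [1], list(word), []
    while True:
        state = stack[-1]
        if state in REDUCE:                          # reduce state
            trace.append(snapshot(stack, rest, "R"))
            lhs, n = REDUCE[state]
            del stack[len(stack) - 2 * n:]           # pop n (symbol, state) pairs
            top = stack[-1]
            if (top, lhs) in GOTO:
                stack += [lhs, GOTO[(top, lhs)]]
            elif top == 1 and not rest:              # A derived the whole input
                stack.append(lhs)
                trace.append(snapshot(stack, rest, "accept"))
                return trace
            else:
                raise SyntaxError(f"cannot continue after reducing to {lhs}")
        elif rest and (state, rest[0]) in GOTO:      # shift state
            trace.append(snapshot(stack, rest, "S"))
            sym = rest.pop(0)
            stack += [sym, GOTO[(state, sym)]]
        else:
            raise SyntaxError(f"no move from state {state}")


for row in parse("aabb"):
    print(*row)
```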

Meaning of LR(0) items

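[Figure: parse tree whose root A has children α X β; α has been discovered, Xβ is the undiscovered part, and the focus • sits just after α]

A → α•Xβ
εNFA transitions to:
• X → •γ: shift focus to the subtree rooted at X (if X is a nonterminal)
• A → αX•β: move past the subtree rooted at X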

Outline of LR(0) parsing algorithm

• Algorithm can perform two actions:
  – shift (S): if no complete item is valid
  – reduce (R): if there is one valid item, and it is complete
• What if:
  – some valid items are complete, some not: S/R conflict
  – more than one valid complete item: R/R conflict
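The decision rule above fits in a few lines. In this hypothetical helper `lr0_action`, items are (lhs, rhs, dot) triples with one-character symbols. The function returns either the chosen move or the list of conflicts it detects.

```python
def lr0_action(valid_items):
    """Pick the LR(0) move for a set of valid items.

    Items are (lhs, rhs, dot) triples with rhs a string of one-character
    symbols; an item is complete when dot == len(rhs).
    """
    items = list(valid_items)
    if not items:
        return ["error"]                    # no valid item: input rejected
    complete = [it for it in items if it[2] == len(it[1])]
    problems = []
    if complete and len(complete) < len(items):
        problems.append("S/R conflict")     # some complete, some not
    if len(complete) > 1:
        problems.append("R/R conflict")     # several complete items
    if problems:
        return problems
    if complete:                            # exactly one item, and it is complete
        lhs, rhs, _ = complete[0]
        return [f"reduce {lhs} → {rhs}"]
    return ["shift"]                        # no complete item is valid


print(lr0_action([("A", "aAb", 1), ("A", "ab", 1), ("A", "aAb", 0), ("A", "ab", 0)]))
print(lr0_action([("A", "ab", 2)]))
```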

Definition of LR(0) grammar

• A grammar is LR(0) if S/R, R/R conflicts never occur
  – LR means parsing happens left to right and produces a rightmost derivation (in reverse)

• LR(0) grammars are unambiguous and have a fast parsing algorithm

• Unfortunately, they are not “expressive” enough to describe programming languages

Hierarchy of context-free grammars

[Figure: nested classes of grammars, from largest to smallest: context-free grammars (parse using CYK algorithm, slow) ⊃ LR(∞) grammars ⊃ … ⊃ LR(1) grammars (java, perl, python, …) ⊃ LR(0) grammars (parse using LR(0) algorithm)]


A grammar that is not LR(0)

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a

[Figure: four partial parse trees for an input beginning with a, one for each of productions (3), (4), (5), (6)]

possibilities: shift (3), reduce (4), reduce (5), shift (6)

valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a

S/R, R/R conflicts!
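This conflicting item set can be reproduced with a closure/goto sketch. The helper names are our own. Productions are written as strings of single-character symbols and, matching the slides, there is no augmented start production. The sketch computes the items valid after reading a and reports which kinds of conflict they contain.

```python
GRAMMAR = [("S", "A"), ("S", "Bc"), ("A", "aA"), ("A", "a"), ("B", "a"), ("B", "ab")]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Close a set of LR(0) items (lhs, rhs, dot) under the ε-transitions."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for c, delta in GRAMMAR:
                if c == rhs[dot] and (c, delta, 0) not in result:
                    result.add((c, delta, 0))
                    work.append((c, delta, 0))
    return frozenset(result)


def goto(state, sym):
    return closure({(l, r, d + 1) for l, r, d in state if d < len(r) and r[d] == sym})


def conflicts(state):
    complete = [it for it in state if it[2] == len(it[1])]
    found = set()
    if complete and len(complete) < len(state):
        found.add("S/R")
    if len(complete) > 1:
        found.add("R/R")
    return found


def show(item):
    lhs, rhs, dot = item
    return f"{lhs} → {rhs[:dot]}•{rhs[dot:]}"


start = closure({("S", rhs, 0) for lhs, rhs in GRAMMAR if lhs == "S"})
after_a = goto(start, "a")
print(sorted(show(it) for it in after_a))
print(conflicts(after_a))
```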

Lookahead

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a (peek inside!)

[Figure: the four partial parse trees from the previous slide]


Lookahead

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a a (peek inside!)

[Figure: the parse tree must look like this: S → A, A → aA, with the inner A starting with the next a]

action: shift

Lookahead

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a a a (peek inside!)

valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a

[Figure: partial parse tree for the input read so far]

action: shift

Lookahead

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a a a

valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a

action: reduce (no input symbol follows, so only the complete item A → a• applies)

[Figure: parse tree for a a a]

LR(0) items vs. LR(1) items

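A → aAb | ab

LR(0) item: A → a•Ab
LR(1) item: [A → a•Ab, b]

[Figure: the parse tree A ⇒ aAb ⇒ aabb drawn for each item; the LR(0) item only marks the position •, while the LR(1) item also records the terminal b that follows the subtree being built]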

LR(1) items

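• LR(1) items are of the form [A → α•β, x] or [A → α•β, ε] to represent this state in the parsing:
  – [A → α•β, x]: the subtree rooted at A has children αβ, the focus • lies between α and β, and the terminal x comes right after the subtree
  – [A → α•β, ε]: the same, but nothing follows the subtree (end of input)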

Outline of LR(1) parsing algorithm

• Step 1: Build εNFA that describes valid item updates

• Step 2: Convert εNFA to DFA
  – As in LR(0), DFA will have shift and reduce states

• Step 3: Run DFA on input, using stack to remember sequence of states
  – Use lookahead to eliminate wrong reduce items

Recall εNFA transitions for LR(0)

• States of εNFA will be items (plus a start state q0)

• For every item S → •α we have a transition q0 --ε--> S → •α

• For every item A → α•Xβ we have a transition A → α•Xβ --X--> A → αX•β

• For every item A → α•Cβ and production C → δ we have a transition A → α•Cβ --ε--> C → •δ
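The three rules translate directly into an edge list. This sketch makes three assumptions: it uses the earlier grammar A → aAb | ab, it encodes items as (lhs, rhs, dot) triples, and it uses the string "q0" for the start state. `lr0_enfa_edges` is an illustrative name.

```python
# Sketch: the εNFA edges for LR(0) items of A → aAb | ab.
# Items are (lhs, rhs, dot) triples; "q0" is the extra start state.
GRAMMAR = [("A", "aAb"), ("A", "ab")]
START = "A"
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def lr0_enfa_edges():
    """Return a list of (source, label, target) transitions."""
    # q0 --ε--> S → •α for every production of the start symbol
    edges = [("q0", "ε", (lhs, rhs, 0)) for lhs, rhs in GRAMMAR if lhs == START]
    for lhs, rhs in GRAMMAR:
        for dot in range(len(rhs)):            # complete items have no out-edges
            item, x = (lhs, rhs, dot), rhs[dot]
            # A → α•Xβ --X--> A → αX•β
            edges.append((item, x, (lhs, rhs, dot + 1)))
            # A → α•Cβ --ε--> C → •δ for every production C → δ
            if x in NONTERMINALS:
                edges += [(item, "ε", (c, delta, 0)) for c, delta in GRAMMAR if c == x]
    return edges


def show(node):
    if node == "q0":
        return node
    lhs, rhs, dot = node
    return f"{lhs} → {rhs[:dot]}•{rhs[dot:]}"


for src, label, dst in lr0_enfa_edges():
    print(f"{show(src)}  --{label}-->  {show(dst)}")
```

Taking ε-closures of sets of these states and following the labelled edges is exactly the subset construction that yields the five-state DFA from the start of the lecture.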

εNFA transitions for LR(1)

• For every item [S → •α, ε] we have a transition q0 --ε--> [S → •α, ε]

• For every item [A → α•Xβ, x] we have a transition [A → α•Xβ, x] --X--> [A → αX•β, x]

• For every item [A → α•Cβ, x] and production C → δ, and for every y in FIRST(βx), we have a transition [A → α•Cβ, x] --ε--> [C → •δ, y] (if β and x are both ε, take y = ε)
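Here is a sketch of these LR(1) transitions for the running grammar S → A | Bc, A → aA | a, B → a | ab. It rests on three assumptions, which are also noted in the code. First, the lookahead ε is the string "ε". Second, the grammar has no ε-productions, so FIRST(βx) depends only on the first symbol of βx. Third, when βx is empty the ε lookahead is carried over, as the worked example later in the lecture does.

```python
EPS = "ε"  # assumed marker for the "end of input" lookahead
GRAMMAR = [("S", "A"), ("S", "Bc"), ("A", "aA"), ("A", "a"), ("B", "a"), ("B", "ab")]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

# FIRST of each nonterminal by fixed point; valid only because the grammar
# has no ε-productions (assumption), so a rule's first symbol decides.
FIRST = {n: set() for n in NONTERMINALS}
changed = True
while changed:
    changed = False
    for lhs, rhs in GRAMMAR:
        new = FIRST[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
        if not new <= FIRST[lhs]:
            FIRST[lhs] |= new
            changed = True


def first_of(beta, x):
    """FIRST(βx) for a lookahead x; an empty β yields {x} (possibly {ε})."""
    if beta:
        return FIRST[beta[0]] if beta[0] in NONTERMINALS else {beta[0]}
    return {x}


def x_successor(item, sym):
    """[A → α•Xβ, x] --X--> [A → αX•β, x]; None if the dot is not before sym."""
    lhs, rhs, dot, x = item
    if dot < len(rhs) and rhs[dot] == sym:
        return (lhs, rhs, dot + 1, x)
    return None


def eps_successors(item):
    """[A → α•Cβ, x] --ε--> [C → •δ, y] for each C → δ and y in FIRST(βx)."""
    lhs, rhs, dot, x = item
    if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
        return set()
    c, beta = rhs[dot], rhs[dot + 1:]
    return {(c, delta, 0, y)
            for head, delta in GRAMMAR if head == c
            for y in first_of(beta, x)}


print(eps_successors(("S", "Bc", 0, EPS)))
```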

FIRST sets

• FIRST(α) is the set of terminals that can occur first (leftmost) in some string derived from α

• Example

S → A (1) | cB (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

FIRST(a) = {a}, FIRST(A) = {a}, FIRST(S) = {a, c}, FIRST(bAc) = {b}, FIRST(BA) = {a}, FIRST(ε) = ∅
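FIRST sets can be computed by a fixed-point iteration. The sketch below follows the standard textbook approach and also tracks nullable nonterminals, so it works for grammars with ε-productions (the example grammar has none). The function names are our own. As in the slide's definition, FIRST contains only terminals, so FIRST(ε) = ∅.

```python
def nullable_and_first(grammar):
    """Fixed-point computation of nullable nonterminals and FIRST sets.

    grammar: list of (lhs, rhs) with rhs a string of one-character symbols;
    the nonterminals are exactly the symbols that appear as some lhs.
    """
    nonterminals = {lhs for lhs, _ in grammar}
    nullable = set()
    first = {n: set() for n in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
            for s in rhs:                       # scan while the prefix is nullable
                new = first[s] if s in nonterminals else {s}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
                if s not in nullable:
                    break
    return nullable, first


def first_of(alpha, grammar):
    """FIRST of a string of symbols (terminals only; FIRST of "" is empty)."""
    nullable, first = nullable_and_first(grammar)
    result = set()
    for s in alpha:
        result |= first[s] if s in first else {s}
        if s not in nullable:
            break
    return result


GRAMMAR = [("S", "A"), ("S", "cB"), ("A", "aA"), ("A", "a"), ("B", "a"), ("B", "ab")]
for alpha in ["a", "A", "S", "bAc", "BA", ""]:
    print(repr(alpha), sorted(first_of(alpha, GRAMMAR)))
```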

Explaining the transitions

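[A → α•Xβ, x] --X--> [A → αX•β, x]: in the parse tree, A has children α X β and is followed by x; the focus moves past the subtree rooted at X.

[A → α•Cβ, x] --ε--> [C → •δ, y] for y ∈ FIRST(βx): the focus moves into the subtree rooted at C, which has children δ; the terminal y that follows this subtree must be one that can begin βx.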

Example

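S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

q0 --ε--> [S → •A, ε]
q0 --ε--> [S → •Bc, ε]
[S → •A, ε] --A--> [S → A•, ε]
[S → •A, ε] --ε--> [A → •aA, ε]
[S → •A, ε] --ε--> [A → •a, ε]
[S → •Bc, ε] --B--> [S → B•c, ε]
[S → •Bc, ε] --ε--> [B → •a, c]
[S → •Bc, ε] --ε--> [B → •ab, c]
. . .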

Convert NFA to DFA

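• Each DFA state is a subset of LR(1) items, e.g.
  [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]

• States can contain S/R, R/R conflicts

• But lookahead can resolve such conflicts (for an LR(1) grammar it always can): in the state above, lookahead a or b means shift, c means reduce B → a, and ε (end of input) means reduce A → a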
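Here is a sketch of the NFA-to-DFA step for LR(1) items, using closure and goto over (lhs, rhs, dot, lookahead) tuples. Several pieces are our own assumptions: the helper names, the "ε" end-of-input marker, and the no-ε-productions shortcut in `lookaheads`. The `action` helper shows how the lookahead picks a single move in the six-item state above. The final check confirms that no state of this grammar's LR(1) DFA has a conflict.

```python
EPS = "ε"  # end-of-input lookahead (assumed marker)
GRAMMAR = [("S", "A"), ("S", "Bc"), ("A", "aA"), ("A", "a"), ("B", "a"), ("B", "ab")]
START = "S"
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
TERMINALS = {s for _, rhs in GRAMMAR for s in rhs} - NONTERMINALS

# FIRST of each nonterminal; assumes no ε-productions (true here).
FIRST = {n: set() for n in NONTERMINALS}
changed = True
while changed:
    changed = False
    for lhs, rhs in GRAMMAR:
        new = FIRST[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
        if not new <= FIRST[lhs]:
            FIRST[lhs] |= new
            changed = True


def lookaheads(beta, x):
    """FIRST(βx) under the no-ε-productions assumption."""
    if beta:
        return FIRST[beta[0]] if beta[0] in NONTERMINALS else {beta[0]}
    return {x}


def closure(items):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, x = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for c, delta in GRAMMAR:
                if c != rhs[dot]:
                    continue
                for y in lookaheads(rhs[dot + 1:], x):
                    item = (c, delta, 0, y)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(state, sym):
    moved = {(l, r, d + 1, x) for l, r, d, x in state if d < len(r) and r[d] == sym}
    return closure(moved) if moved else None


def action(state, la):
    """Use lookahead la (a terminal or EPS) to pick one move in state."""
    reduces = {(l, r) for l, r, d, x in state if d == len(r) and x == la}
    shift = any(d < len(r) and r[d] == la for l, r, d, x in state)
    if len(reduces) + shift > 1:
        return ("conflict",)
    if shift:
        return ("shift",)
    if reduces:
        return ("reduce", reduces.pop())
    return ("error",)


def build_dfa():
    start = closure({(START, rhs, 0, EPS) for lhs, rhs in GRAMMAR if lhs == START})
    states, work = [start], [start]
    while work:
        state = work.pop()
        for sym in sorted(TERMINALS | NONTERMINALS):
            target = goto(state, sym)
            if target is not None and target not in states:
                states.append(target)
                work.append(target)
    return states


states = build_dfa()
state_a = goto(states[0], "a")
for la in ["a", "b", "c", EPS]:
    print(la, action(state_a, la))
```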

Example

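S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

stack   input   valid items                                                                   action
ε       abc     [S → •A, ε] [S → •Bc, ε] [A → •aA, ε] [A → •a, ε] [B → •a, c] [B → •ab, c]    S
a       bc      [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]   S
ab      c       [B → ab•, c]                                                                  R
B       c       [S → B•c, ε]                                                                  S
Bc      ε       [S → Bc•, ε]                                                                  R
S       ε

look ahead! In the second row the next input symbol is b: the reduce items [A → a•, ε] and [B → a•, c] are ruled out, and [B → a•b, c] says shift.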
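Here is the same run driven by a hand-built LR(1) table. The state numbers were worked out by hand from the LR(1) item sets of this grammar and do not appear on the slides. `ACTION`, `GOTO` and `parse` are illustrative names.

```python
EPS = "ε"  # end-of-input lookahead, as on the slides

# Hand-derived LR(1) table for S → A | Bc, A → aA | a, B → a | ab.
# ("s", n): shift and go to state n; ("r", lhs, rhs): reduce by lhs → rhs.
ACTION = {
    0: {"a": ("s", 1)},
    1: {"a": ("s", 6), "b": ("s", 2), "c": ("r", "B", "a"), EPS: ("r", "A", "a")},
    2: {"c": ("r", "B", "ab")},
    3: {"c": ("s", 4)},
    4: {EPS: ("r", "S", "Bc")},
    5: {EPS: ("r", "S", "A")},
    6: {"a": ("s", 6), EPS: ("r", "A", "a")},
    7: {EPS: ("r", "A", "aA")},
}
GOTO = {(0, "A"): 5, (0, "B"): 3, (1, "A"): 7, (6, "A"): 7}


def parse(word):
    """Return (stack symbols, remaining input, action) rows like the slide."""
    states, symbols, rest, trace = [0], [], list(word), []
    while True:
        la = rest[0] if rest else EPS
        act = ACTION[states[-1]].get(la)
        if act is None:
            raise SyntaxError(f"unexpected {la!r} in state {states[-1]}")
        row = ("".join(symbols) or EPS, "".join(rest) or EPS)
        if act[0] == "s":
            trace.append(row + ("S",))
            symbols.append(rest.pop(0))
            states.append(act[1])
        else:
            trace.append(row + ("R",))
            _, lhs, rhs = act
            del symbols[len(symbols) - len(rhs):]
            del states[len(states) - len(rhs):]
            if lhs == "S" and len(states) == 1:   # start symbol over whole input
                trace.append((lhs, EPS, "accept"))
                return trace
            symbols.append(lhs)
            states.append(GOTO[(states[-1], lhs)])


for row in parse("abc"):
    print(*row)
```

On abc this reproduces the stack, input and action columns above. On ab it raises SyntaxError in the state holding [B → ab•, c], because end of input is not a valid lookahead there.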

LR(k) grammars

• A context-free grammar is LR(1) if all S/R, R/R conflicts can be resolved with one symbol of lookahead

• More generally, LR(k) grammars can resolve all conflicts with k lookahead symbols
  – Items have the form [A → α•β, x1...xk]

• LR(1) grammars describe the syntax of most programming languages
