CSC 3130: Automata theory and formal languages

LR(k) grammars

Fall 2008

MELJUN P. CORTES, MBA, MPA, BSCS, ACS

LR(0) example from last time

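A → aAb | ab

State 1: A → •aAb, A → •ab

State 2: A → a•Ab, A → a•b, A → •aAb, A → •ab

State 3: A → ab•

State 4: A → aA•b

State 5: A → aAb•

Transitions: 1 --a--> 2, 2 --a--> 2, 2 --b--> 3, 2 --A--> 4, 4 --b--> 5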

LR(0) parsing example revisited

A → aAb | ab        A ⇒ aAb ⇒ aabb

Stack        Input    Action
1            aabb     S
1a2          abb      S
1a2a2        bb       S
1a2a2b3      b        R
1a2A4        b        S
1a2A4b5      ε        R
1A           ε

(Figure: the five-state LR(0) DFA for this grammar, shown next to the parse tree of aabb as it is built bottom-up.)
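The trace above can be reproduced with a small table-driven sketch. The dictionaries `GOTO` and `REDUCE` and the function `lr0_parse` are hypothetical names chosen for this illustration; they encode the five-state DFA for A → aAb | ab, and acceptance is modelled as "the stack is back to state 1 after reducing to A with no input left", which is an assumption about how the slides end the run.

```python
# Hypothetical encoding of the five-state LR(0) DFA for A -> aAb | ab.
GOTO = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 3, (2, "A"): 4, (4, "b"): 5}
# Reduce states map to (left-hand side, length of right-hand side).
REDUCE = {3: ("A", 2), 5: ("A", 3)}


def lr0_parse(text):
    """Return rows (stack, remaining input, action), or None if text is rejected."""
    stack = [1]  # alternating states and symbols, starting in state 1
    pos = 0
    trace = []
    while True:
        state = stack[-1]
        shown = "".join(map(str, stack))
        rest = text[pos:] or "ε"
        if state in REDUCE:
            lhs, size = REDUCE[state]
            trace.append((shown, rest, "R"))
            del stack[-2 * size:]  # pop `size` symbol/state pairs
            if stack == [1] and pos == len(text):
                trace.append(("1" + lhs, "ε", "accept"))
                return trace
            nxt = GOTO.get((stack[-1], lhs))
            if nxt is None:
                return None
            stack += [lhs, nxt]
        else:
            if pos == len(text):
                return None  # shift state but no input left
            nxt = GOTO.get((state, text[pos]))
            if nxt is None:
                return None  # no transition on this symbol
            trace.append((shown, rest, "S"))
            stack += [text[pos], nxt]
            pos += 1


for row in lr0_parse("aabb"):
    print("%-10s %-6s %s" % row)
```

Because every state of this DFA is either a pure shift state or a pure reduce state, the loop never has to choose between actions; that is exactly the LR(0) property.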

Meaning of LR(0) items

A → α•Xβ: the focus (•) sits after α; Xβ is the undiscovered part of the subtree rooted at A

εNFA transitions to:

X → •γ: shift focus to subtree rooted at X (if X is nonterminal)

A → αX•β: move past subtree rooted at X

Outline of LR(0) parsing algorithm

• Algorithm can perform two actions:
– shift (S): no complete item is valid
– reduce (R): there is one valid item, and it is complete

• What if:
– some valid items complete, some not: S / R conflict
– more than one valid complete item: R / R conflict
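The four cases above amount to a check on one set of valid items. The sketch below is only an illustration: the function name `classify` and the encoding of an item as a `(lhs, rhs, dot)` triple are assumptions of this example, not notation from the slides.

```python
def classify(items):
    """Classify a non-empty set of valid LR(0) items.

    Each item is a (lhs, rhs, dot) triple; it is complete when the dot
    has reached the end of rhs.
    """
    items = list(items)
    complete = [it for it in items if it[2] == len(it[1])]
    conflicts = []
    if complete and len(complete) < len(items):
        conflicts.append("S/R")  # some valid items complete, some not
    if len(complete) > 1:
        conflicts.append("R/R")  # more than one valid complete item
    if conflicts:
        return " and ".join(conflicts) + " conflict"
    return "reduce" if complete else "shift"


# Items valid after reading "a" in the grammar A -> aAb | ab: no complete item.
print(classify([("A", "aAb", 1), ("A", "ab", 1), ("A", "aAb", 0), ("A", "ab", 0)]))  # shift
# A single complete item.
print(classify([("A", "ab", 2)]))  # reduce
```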

Definition of LR(0) grammar

• A grammar is LR(0) if S/R, R/R conflicts never occur
– LR means parsing happens left to right and produces a rightmost derivation (in reverse)

• LR(0) grammars are unambiguous and have a fast parsing algorithm

• Unfortunately, they are not “expressive” enough to describe programming languages

Hierarchy of context-free grammars

(Figure: nested classes of grammars, from largest to smallest)

context-free grammars: parse using CYK algorithm (slow)

⊃ LR(∞) grammars

⊃ LR(1) grammars: java, perl, python, …

⊃ LR(0) grammars: parse using LR(0) algorithm

A grammar that is not LR(0)

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a • • •   (only the first a has been read)

(Figure: the candidate partial parse trees, S → A with A → aA or A → a, and S → Bc with B → a or B → ab.)

possibilities: shift (3), reduce (4), reduce (5), shift (6)

valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a

S/R, R/R conflicts!
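The list of valid items can be checked mechanically. The following sketch computes it with the usual closure and goto operations on sets of LR(0) items; the names `GRAMMAR`, `closure`, `goto` and `show`, and the single-character encoding of symbols, are assumptions made for this illustration.

```python
# Grammar from the slide; each right-hand side is a string of one-character symbols.
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}


def closure(items):
    """Add X -> •γ for every nonterminal X that appears right after a dot."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], prod, 0)
                if new not in items:
                    items.add(new)
                    work.append(new)
    return frozenset(items)


def goto(items, symbol):
    """Move the dot past `symbol` wherever possible, then take the closure."""
    moved = {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == symbol}
    return closure(moved)


def show(item):
    lhs, rhs, dot = item
    return f"{lhs} → {rhs[:dot]}•{rhs[dot:]}"


start = closure({("S", rhs, 0) for rhs in GRAMMAR["S"]})
after_a = goto(start, "a")
print(sorted(show(item) for item in after_a))
```

The resulting set contains two complete items (A → a• and B → a•) next to several incomplete ones, which is where the S/R and R/R conflicts come from.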

Lookahead

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

To resolve the conflicts, peek inside the unread input.

read: a        next: a (peek inside!)
valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
parse tree must look like this: (figure: S → A, A → aA, with the inner A starting at the next a)
action: shift

read: a a      next: a (peek inside!)
valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
parse tree must look like this: (figure: one more level of A → aA)
action: shift

read: a a a    next: ε (end of input)
valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
parse tree must look like this: (figure: the innermost A is A → a)
action: reduce

LR(0) items vs. LR(1) items

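A → aAb | ab

LR(0): A → a•Ab

LR(1): [A → a•Ab, b]

(Figure: the same partial parse tree in both cases; the LR(1) item additionally records the terminal b that follows the subtree rooted at A.)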

LR(1) items

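• LR(1) items are of the form [A → α•β, x] or [A → α•β, ε] to represent this state in the parsing:
– [A → α•β, x]: α has been read, β is still expected, and the terminal x follows the subtree rooted at A
– [A → α•β, ε]: the same, but the input ends after the subtree rooted at A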

Outline of LR(1) parsing algorithm

• Step 1: Build εNFA that describes valid item updates

• Step 2: Convert εNFA to DFA
– As in LR(0), DFA will have shift and reduce states

• Step 3: Run DFA on input, using stack to remember sequence of states
– Use lookahead to eliminate wrong reduce items

Recall εNFA transitions for LR(0)

• States of εNFA will be items (plus a start state q0)

• For every item S → •α we have a transition q0 --ε--> S → •α

• For every item A → α•Xβ we have a transition A → α•Xβ --X--> A → αX•β

• For every item A → α•Cβ and production C → δ we have a transition A → α•Cβ --ε--> C → •δ

εNFA transitions for LR(1)

• For every item [S → •α, ε] we have a transition q0 --ε--> [S → •α, ε]

• For every item [A → α•Xβ, x] we have a transition [A → α•Xβ, x] --X--> [A → αX•β, x]

• For every item [A → α•Cβ, x] and production C → δ we have a transition [A → α•Cβ, x] --ε--> [C → •δ, y] for every y in FIRST(βx)

FIRST sets

FIRST(α) is the set of terminals that occur on the left in some derivation starting from α

• Example

S → A (1) | cB (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

FIRST(a) = {a}
FIRST(A) = {a}
FIRST(S) = {a, c}
FIRST(bAc) = {b}
FIRST(BA) = {a}
FIRST(ε) = ∅
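The FIRST sets in this example can be recomputed with a simple fixed-point iteration. This is a minimal sketch under a stated assumption: the grammar has no ε-productions (true for the example), so FIRST of a string is determined by its first symbol alone. The names `GRAMMAR`, `first_sets` and `first_of` are hypothetical.

```python
# Grammar from the FIRST example (note S -> cB here); symbols are single characters.
GRAMMAR = {"S": ["A", "cB"], "A": ["aA", "a"], "B": ["a", "ab"]}


def first_of(symbols, first):
    """FIRST of a string of symbols, assuming no ε-productions."""
    if not symbols:
        return set()  # FIRST(ε) = ∅, as on the slide
    head = symbols[0]
    return set(first[head]) if head in first else {head}


def first_sets(grammar):
    """Grow FIRST(X) for every nonterminal X until nothing changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)
for alpha in ["a", "A", "S", "bAc", "BA", ""]:
    print(f"FIRST({alpha or 'ε'}) =", sorted(first_of(alpha, FIRST)))
```

A grammar with ε-productions would need the extra rule that FIRST looks past nullable symbols; that case is deliberately left out here.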

Explaining the transitions

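[A → α•Xβ, x] --X--> [A → αX•β, x]

(Figure: subtree rooted at A with children α, X, β, followed by x.) Moving the focus past X does not change what follows the subtree rooted at A, so the lookahead x stays the same.

[A → α•Cβ, x] --ε--> [C → •δ, y], y ∈ FIRST(βx)

(Figure: subtree rooted at A with children α, C, β, followed by x; the subtree rooted at C expands to δ.) The terminal y that follows the subtree rooted at C is the first terminal derived from β, or x itself when β is empty.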

Example

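S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

q0 --ε--> [S → •A, ε]
q0 --ε--> [S → •Bc, ε]

[S → •A, ε] --A--> [S → A•, ε]
[S → •A, ε] --ε--> [A → •aA, ε]
[S → •A, ε] --ε--> [A → •a, ε]

[S → •Bc, ε] --B--> [S → B•c, ε]
[S → •Bc, ε] --ε--> [B → •a, c]
[S → •Bc, ε] --ε--> [B → •ab, c]

. . .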
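The ε-transitions in this example can be generated by following the rule "for every y in FIRST(βx)". The sketch below computes the items reachable from q0 by ε-moves only. It rests on assumptions that are not in the slides: items are `(lhs, rhs, dot, lookahead)` tuples, the string "ε" stands for the end-of-input lookahead, and the grammar has no ε-productions, so FIRST(βx) is FIRST of the first symbol of β, or {x} when β is empty.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
END = "ε"  # lookahead marker for "end of input"


def first(symbol, seen=()):
    """Terminals that can start a string derived from `symbol` (no ε-productions)."""
    if symbol not in GRAMMAR:
        return {symbol}
    out = set()
    for rhs in GRAMMAR[symbol]:
        if rhs[0] not in seen + (symbol,):  # guard against left recursion
            out |= first(rhs[0], seen + (symbol,))
    return out


def closure1(items):
    """Follow ε-transitions [A → α•Cβ, x] --ε--> [C → •δ, y], y in FIRST(βx)."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, look = work.pop()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue
        beta = rhs[dot + 1:]
        lookaheads = first(beta[0]) if beta else {look}
        for prod in GRAMMAR[rhs[dot]]:
            for y in lookaheads:
                new = (rhs[dot], prod, 0, y)
                if new not in items:
                    items.add(new)
                    work.append(new)
    return frozenset(items)


def show(item):
    lhs, rhs, dot, look = item
    return f"[{lhs} → {rhs[:dot]}•{rhs[dot:]}, {look}]"


start = closure1({("S", rhs, 0, END) for rhs in GRAMMAR["S"]})
print(sorted(show(item) for item in start))
```

The B-items receive lookahead c because in S → •Bc the symbol after B is c, while the A-items inherit ε from [S → •A, ε].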

Convert NFA to DFA

• Each DFA state is a subset of LR(1) items, e.g.

[A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]

• States can contain S/R, R/R conflicts

• But in an LR(1) grammar, lookahead can always resolve such conflicts

Example

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

stack   input   action   valid items
ε       abc     S        [S → •A, ε] [S → •Bc, ε] [A → •aA, ε] [A → •a, ε] [B → •a, c] [B → •ab, c]
a       bc      S        [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]   (look ahead!)
ab      c       R        [B → ab•, c]
B       c       S        [S → B•c, ε]
Bc      ε       R        [S → Bc•, ε]
S       ε
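The run on abc can be simulated directly on sets of LR(1) items, choosing the action by comparing the next input symbol with the lookahead of each complete item. This is a sketch under assumptions that do not come from the slides: items are `(lhs, rhs, dot, lookahead)` tuples, "ε" marks end of input, the grammar has no ε-productions, and DFA states are computed on the fly instead of being tabulated. All function names are hypothetical.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
END = "ε"  # lookahead marker for "end of input"


def first(symbol, seen=()):
    """Terminals that can start a string derived from `symbol` (no ε-productions)."""
    if symbol not in GRAMMAR:
        return {symbol}
    out = set()
    for rhs in GRAMMAR[symbol]:
        if rhs[0] not in seen + (symbol,):
            out |= first(rhs[0], seen + (symbol,))
    return out


def closure1(items):
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, look = work.pop()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue
        beta = rhs[dot + 1:]
        for y in (first(beta[0]) if beta else {look}):
            for prod in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], prod, 0, y)
                if new not in items:
                    items.add(new)
                    work.append(new)
    return frozenset(items)


def goto1(items, symbol):
    return closure1({(l, r, d + 1, x) for (l, r, d, x) in items
                     if d < len(r) and r[d] == symbol})


def lr1_parse(text):
    """Return the list of actions ("S"/"R"), or None on error or unresolved conflict."""
    states = [closure1({("S", rhs, 0, END) for rhs in GRAMMAR["S"]})]
    pos, actions = 0, []
    while True:
        look = text[pos] if pos < len(text) else END
        state = states[-1]
        # Complete items count only if their lookahead matches the next symbol.
        reduces = [(l, r) for (l, r, d, x) in state if d == len(r) and x == look]
        can_shift = any(d < len(r) and r[d] == look for (l, r, d, x) in state)
        if len(reduces) + can_shift != 1:
            return None
        if can_shift:
            actions.append("S")
            states.append(goto1(state, look))
            pos += 1
        else:
            lhs, rhs = reduces[0]
            actions.append("R")
            del states[-len(rhs):]  # pop one state per symbol of the right-hand side
            if lhs == "S":
                return actions  # reduced to the start symbol at end of input
            states.append(goto1(states[-1], lhs))


print(lr1_parse("abc"))  # ['S', 'S', 'R', 'S', 'R']
```

After reading a, the state holds both [A → a•, ε] and [B → a•, c]; neither lookahead equals the next symbol b, so the only remaining option is the shift from [B → a•b, c].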

LR(k) grammars

• A context-free grammar is LR(1) if all S/R, R/R conflicts can be resolved with one lookahead

• More generally, LR(k) grammars can resolve all conflicts with k lookahead symbols
– Items have the form [A → α•β, x1...xk]

• LR(1) grammars describe the syntax of most programming languages