Syntax Analysis: LR(1) and LALR Parsing Canonical LR(1) Parsing LALR Parsing Objectives By the end of this lecture you should be able to: 1 Identify LR(1) items. 2 Construct an LR(1)

Canonical LR(1) ParsingLALR Parsing

Syntax Analysis: LR(1) and LALR Parsing

Lecture 7

c©Haythem O. Ismail LR(1) and LALR 1 / 29


Objectives

By the end of this lecture you should be able to:1 Identify LR(1) items.2 Construct an LR(1) automaton for a CFG.3 Construct the LR(1) parsing table for a CFG.4 Trace the operation of an LR(1) parser.5 Construct an LALR automaton for a CFG.6 Construct the LALR parsing table for a CFG.7 Trace the operation of an LALR parser.



Outline

1 Canonical LR(1) Parsing

2 LALR Parsing



Outline


2 LALR Parsing



Grammars Not SLR

ExampleConsider the following grammar G7:

S −→ L=R | RL −→ ∗R | idR −→ L



Grammars Not SLR: States

Example

c© Aho et al. (2007)c©Haythem O. Ismail LR(1) and LALR 6 / 29


Problems with SLR Parsing

One problem with SLR parsers is that it is always possible to reduceA→ α if the next input is in Follow(A).

ExampleWith G7 and input id = id, an SLR parser may shift id and thenreduce L→ id.

Since = ∈ Follow(R), the parser may decide to reduce R→ L.

Clearly, this is not correct; a sentential form starting with R= cannever reduce to S.

States should carry more information about when reduction isappropriate.



















LR(0) Items: Reprise

DefinitionAn LR(0) item of CFG G = 〈V,Σ,R, S〉 is a pair 〈A→ α, i〉, where(A→ α) ∈ R and 0 ≤ i ≤ |α|.

Intuitively, an LR(0) item is a rule and a position in the right sideof the rule.

Thus, 〈A→ aBb, 2〉 ≡ A→ aB.b

An LR(0) item represents a state of the parser: We have alreadyfound the prefix of α before the dot and, if we find the suffixfollowing the dot, we may reduce α to A.



LR(1) Items

DefinitionAn LR(1) item of CFG G = 〈V,Σ,R, S〉 is a pair 〈c, a〉, where c is anLR(0) item and a ∈ Σ ∪ {$}.

c is called the core of the item.a is the lookahead of the item.

Note that a is a single symbol, hence the “1” in “LR(1) item.”

An LR(1) item [A→ α.β, a] represents a state of the parser: Wehave already found α in the input and, if we find β, we mayreduce αβ to A if the next symbol is a.



LR(1) Items







LR(1) Items







LR(1) Items







The Set of LR(1) Items

The smallest set I(G) satisfying the following

[S′ → .S, $] ∈ I.

[A→ αX.β, a] ∈ I if [A→ α.Xβ, a] ∈ I, for X ∈ Σ ∪ V .

[B→ .γ, b] ∈ I if [A→ α.Bβ, a] ∈ I where (B→ γ) ∈ R andb ∈ First(βa).



The LR(1) NFA

DefinitionFor the CFG G = 〈V,Σ,R, S〉, the LR(1) NFA is an NFAN1

G = 〈I(G),V ∪ Σ, δ, [S′ → .S, $], I(G)〉, where

S′ /∈ V ∪ Σ;

δ([A→ α.Xβ, a],X) = {[A→ αX.β, a]};δ([A→ α.Bβ, a], ε) = {[B→ .γ, b] | (B→ γ) ∈ R andb ∈ First(βa)}.



The LR(1) Automaton

Definition

The LR(1) automaton for a CFG G is the DFA M1G which is equivalent

to N1G and constructed using the standard subset construction.



Grammar G8

Example (The grammar)

1. S −→ CC2. C −→ cC3. C −→ d



Grammar G8

Example (The automaton)

c©Aho et al. (2007)c©Haythem O. Ismail LR(1) and LALR 14 / 29


The Canonical LR(1) Parsing Table

We are given a CFG G = 〈V,Σ,R, S〉.1 Construct M1

G.2 For all states q of M1

G1 GOTO(q,A) = δ(q,A), for every A ∈ V .2 If a ∈ Σ and [A→ α.aβ, b] ∈ q, then ACTION(q, a) = “shiftδ(q, a)”.

3 Else if A 6= S′ and [A→ α., a] ∈ q, then ACTION(q, a) =“reduce A→ α”.

4 Else if [S′ → S., $] ∈ q, then ACTION(q, $) = “accept”.5 Else ACTION(q, a) = “error”.

If any conflicting actions result from the above construction, we saythat G is not LR(1).



The Canonical LR(1) Parsing Table

We are given a CFG G = 〈V,Σ,R, S〉.1 Construct M1

G.2 For all states q of M1

G1 GOTO(q,A) = δ(q,A), for every A ∈ V .2 If a ∈ Σ and [A→ α.aβ, b] ∈ q, then ACTION(q, a) = “shiftδ(q, a)”.

3 Else if A 6= S′ and [A→ α., a] ∈ q, then ACTION(q, a) =“reduce A→ α”.

4 Else if [S′ → S., $] ∈ q, then ACTION(q, $) = “accept”.5 Else ACTION(q, a) = “error”.

If any conflicting actions result from the above construction, we saythat G is not LR(1).



Grammar G8

Example (Table Automaton )



Grammar G8

Example (Trace Automaton )

Stack Input0 ccdd$03 cdd$033 dd$0334 d$0338 d$038 d$02 d$027 $025 $01 $



Canonical LR(1) and SLR Parsing

Note that a canonical LR(1) table can include conflicts only ifthe corresponding SLR table does.

Thus, every SLR grammar is a canonical LR(1) grammar, but notvice versa.

The price, however, is the increased table size due to theincreased number of automaton states.



Outline


2 LALR Parsing



Lookahead LR Parsing

Lookahead LR parsers (LALR parsers) are often used in practice.

Most common syntactic constructs in programming languagesare representable by LALR grammars.LALR tables are considerably smaller than canonical LR(1)tables.

SLR and LALR tables always have the same number of states.Such number is typically several hundred states.A canonical LR(1) table typically has several thousand states!

































The Core

DefinitionThe core of an LR(1) state q is the set

core(q) = {c|[c, l] ∈ q}

Two LR(1) states q1 and q2 are core-equivalent ifcore(q1) = core(q2).

Example (Automaton of G8 )The following states are core-equivalent.

I4 and I7.

I8 and I9.

I3 and I6.

How are they different?



The Core

DefinitionThe core of an LR(1) state q is the set

core(q) = {c|[c, l] ∈ q}

Two LR(1) states q1 and q2 are core-equivalent ifcore(q1) = core(q2).

Example (Automaton of G8 )The following states are core-equivalent.

I4 and I7.

I8 and I9.

I3 and I6.

How are they different?



Core-Equivalence and the Automaton

Observation

Let M1G be the LR(1) DFA for a CFG G, and let q1, q2 be states of M1

Gand X ∈ V ∪ Σ. If q1 and q2 are core-equivalent, then δ(q1,X) andδ(q2,X) are core-equivalent.

The number of states of the automaton may be reducedconsiderably if we use core-equivalence-classes as states.

In particular, if {q1, . . . , qn} is an equivalence class ofcore-equivalence, then we may replace the states q1, . . . , qn bythe single state Q = q1 ∪ · · · ∪ qn.

Transitions are adjusted appropriately.




Observation









Observation









Observation








The LALR Automaton

DefinitionLet 〈Q(G),V ∪ Σ, δ, q0,F〉 be an LR(1) automaton. The LALRautomaton MA

G = 〈Q′(G),V ∪ Σ, δ′, q′0,F′〉 is such that

Q′(G) = {⋃

qi∈[q]

qi|[q] is an equivalence class of core-equivalence

}q0 and q′0 are core-equivalent.

q′ ∈ F′ if and only if there is some q ∈ F such that q and q′ arecore-equivalent.

δ′(q′,X) is core-equivalent to δ(q,X), where q and q′ arecore-equivalent.



The LALR Automaton



Q′(G) = {⋃

qi∈[q]







The LALR Automaton



Q′(G) = {⋃

qi∈[q]







The LALR Automaton



Q′(G) = {⋃

qi∈[q]







Grammar G8

Example (LALR automaton)

Draw the state diagram of the LALR automaton for G8 .



The LALR Parsing Table

The LALR parsing table is constructed in exactly the same waythe canonical LR(1) table is constructed, but with MA

G rather thanM1

G.

The size of the LALR table is the same as the size of the SLRtable.

If the table has no conflicts, the grammar is said to be an LALRgrammar.



Grammar G8

Example (LALR parsing table)



Grammar G8

Example (Trace)Trace the operation of both the canonical LR(1) and the LALRparsers on input ccd. What do you notice?



LALR Conflicts (I)

ObservationIf an LR(1) table does not have shift/reduce conflicts, then thecorresponding LALR table does not have shift/reduce conflicts.

Proof. A shift/reduce conflict occurs if an LALR state has an item[A→ α., a], calling for reducing by A→ α, and an item[B→ β.aγ, b] calling for a shift. But, then, there must be some LR(1)state with items [A→ α., a] and [B→ β.aγ, c]. This state would alsohave a shift/reduce conflict, contradicting our assumption.



LALR Conflicts (I)

ObservationIf an LR(1) table does not have shift/reduce conflicts, then thecorresponding LALR table does not have shift/reduce conflicts.

Proof. A shift/reduce conflict occurs if an LALR state has an item[A→ α., a], calling for reducing by A→ α, and an item[B→ β.aγ, b] calling for a shift. But, then, there must be some LR(1)state with items [A→ α., a] and [B→ β.aγ, c]. This state would alsohave a shift/reduce conflict, contradicting our assumption.



LALR Conflicts (II)

Observation

If an LR(1) table does not have reduce/reduce conflicts, the corresponding LALR table mayhave reduce/reduce conflicts.

Proof. Consider the grammar

S −→ aAd | aBe | bBd | bAeA −→ cB −→ c

This grammar generates the strings acd,ace,bcd,bce.

The LR(1) state {[A→ c., d], [B→ c., e]} is reachable from the start state after readingac.

Similarly, the LR(1) state {[A→ c., e], [B→ c., d]} is reachable from the start stateafter reading bc.

The LALR automaton will include the state {[A→ c., d/e], [B→ c., d/e]}, whichresults in a reduce/reduce conflict.



LALR Conflicts (II)

Observation










LALR Conflicts (II)

Observation










LALR Conflicts (II)

Observation










LALR Conflicts (II)

Observation









Documents

Syntax Analysis: LR(1) and LALR Parsing Canonical LR(1) Parsing LALR Parsing Objectives By the end of this lecture you should be able to: 1 Identify LR(1) items. 2 Construct an LR(1)