
Abstract stack machines for LL and LR parsing

Hayo Thielecke

August 13, 2015


Contents

Introduction

Background and preliminaries

Parsing machines

LL machine

LL(1) machine

LR machine


Parsing and (non-)deterministic stack machines

We decompose the parsing problem into two parts:

What the parser does: a stack machine, possibly nondeterministic. The “what” is very simple and elegant and has not changed in 50 years.

How the parser knows which step to take, making the machine deterministic. The “how” can be very complex, e.g. LALR(1) item or LL(*) constructions; there is still ongoing research, even controversy.

There are tools for computing the “how”, e.g. yacc and ANTLR. You need to understand some of the theory to really use these tools, e.g. what it means when yacc complains about a reduce/reduce conflict.


Exercise

Consider the language of matching round and square brackets. For example, these strings are in the language:

[()]

and

[()]()()([])

but this is not:

[(])

How would you write a program that recognizes this language, so that a string is accepted if and only if all the brackets match? It is not terribly hard. But are you sure your solution is correct? You will see how the LL and LR machines solve this problem, and are correct by construction.


Parsing stack and function call stack

A useful analogy: a grammar rule

A → B C

is like a function definition

void A()
{
    B();
    C();
}

Function calls also use a stack.
ANTLR uses the function call stack as its parsing stack.
yacc maintains its own parsing stack.
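To make the analogy concrete, here is a minimal sketch in C of a recognizer that uses the function call stack as its parsing stack. It assumes the bracket grammar D → [ D ] D, D → ( D ) D, D → ε that these slides use for the language of matching brackets. The names parse_D, brackets_match and cursor are made up for this illustration and do not come from ANTLR or any other tool.

```c
#include <stdbool.h>

/* Hypothetical global input position, used only by this sketch. */
static const char *cursor;

/* One function per non-terminal: parse_D plays the role of D.
 * Each rule D -> [ D ] D and D -> ( D ) D becomes a sequence of
 * checks and recursive calls; D -> ε is the fall-through case. */
static bool parse_D(void)
{
    if (*cursor == '[') {
        cursor++;                       /* consume '[' */
        if (!parse_D()) return false;   /* inner D */
        if (*cursor != ']') return false;
        cursor++;                       /* consume ']' */
        return parse_D();               /* trailing D */
    }
    if (*cursor == '(') {
        cursor++;                       /* consume '(' */
        if (!parse_D()) return false;
        if (*cursor != ')') return false;
        cursor++;                       /* consume ')' */
        return parse_D();
    }
    return true;                        /* D -> ε: consume nothing */
}

/* Accepts exactly when the whole input is derived from D. */
bool brackets_match(const char *input)
{
    cursor = input;
    return parse_D() && *cursor == '\0';
}
```

Each pending call to parse_D remembers which closing bracket is still expected, so the C call stack stores the same information that an explicit parsing stack would.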


Grammars: formal definition

A context-free grammar consists of

• some terminal symbols a, b, …, +, ), …

• some non-terminal symbols A, B, S, …

• a distinguished non-terminal start symbol S

• some rules of the form

A → X₁ … Xₙ

where n ≥ 0, A is a non-terminal, and the Xᵢ are symbols.


Notation: Greek letters

Mathematicians and computer scientists are inordinately fond of Greek letters.

α alpha

β beta

γ gamma

ε epsilon

σ sigma


Notational conventions for grammars

• We will use Greek letters α, β, γ, σ, … to stand for strings of symbols that may contain both terminals and non-terminals.

• In particular, ε is used for the empty string (of length 0).

• We will write A, B, … for non-terminals.

• We will write S for the start symbol.

• Terminal symbols are usually written as lower case letters a, b, c, …

• These conventions are handy once you get used to them and are found in most books, e.g. the “Dragon Book”.


Derivations

• If A → α is a rule, we can replace A by α for any strings β and γ on the left and right:

β A γ ⇒ β α γ

This is one derivation step.

• A string w consisting only of terminal symbols is generated by the grammar if there is a sequence of derivation steps leading to it from the start symbol S:

S ⇒ ⋯ ⇒ w


An example derivation

Consider this grammar:

D → [ D ] D    (1)
D → ( D ) D    (2)
D → ε          (3)

There is a unique leftmost derivation for each string in the language. For example, we derive [ ] [ ] as follows:

D
⇒ [ D ] D
⇒ [ ] D
⇒ [ ] [ D ] D
⇒ [ ] [ ] D
⇒ [ ] [ ]


Lexer and parser

The raw input is processed by the lexer before the parser sees it. For instance, while or 4223666 count as a single symbol for the purpose of parsing.
Lexers can be automagically generated (just like parsers by parser generators).
Example: lex

regular expression → deterministic finite automaton

We’ll skip this phase of the compiler. Parsing is much harder and has more interesting ideas.


Parsing stack machines

The states of the machines are of the form

〈σ , w 〉

where

• σ is the stack, a string of symbols which may include non-terminals

• w is the remaining input, a string of input symbols; no non-terminal symbols may appear in the input

Transitions or steps are of the form

〈σ₁ , w₁〉 −→ 〈σ₂ , w₂〉

where 〈σ₁ , w₁〉 is the old state and 〈σ₂ , w₂〉 is the new state:

• pushing or popping the stack changes σ₁ to σ₂

• consuming input changes w₁ to w₂


LL vs LR idea

There are two main classes of parsers: LL and LR. Both use a parsing stack, but in different ways.

LL: the stack contains a prediction of what the parser expects to see in the input

LR: the stack contains a reduction of what the parser has already seen in the input

Which is more powerful, LL or LR?

LL: Never make predictions, especially about the future.
LR: Benefit of hindsight.
Theoretically, LR is much more powerful than LL.
But LL is much easier to understand.


Abstract and less abstract machines

You could easily implement these parsing stack machines when they are deterministic.

• In OCaml, Haskell, Agda: state = two lists of symbols; transitions by pattern matching

• In C: state = stack pointer + input pointer; yacc does this, plus an LALR(1) automaton
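As a rough illustration of the C view, the machine state 〈σ , w〉 could be held in a struct like the one below. This is only a sketch: the type machine, the bound STACK_MAX and the helper names are assumptions made for this example, and it is not how yacc lays out its stack.

```c
#include <stdbool.h>

#define STACK_MAX 64   /* arbitrary bound chosen for this sketch */

/* The state <σ, w>: a symbol stack plus a pointer into the input. */
struct machine {
    char stack[STACK_MAX];  /* stack[sp - 1] is the top of the stack */
    int sp;                 /* stack pointer: number of symbols stored */
    const char *input;      /* input pointer: the remaining input w */
};

void machine_init(struct machine *m, const char *w)
{
    m->sp = 0;
    m->input = w;
}

/* Pushing or popping changes σ. */
bool machine_push(struct machine *m, char sym)
{
    if (m->sp >= STACK_MAX) return false;
    m->stack[m->sp++] = sym;
    return true;
}

bool machine_pop(struct machine *m, char *sym)
{
    if (m->sp == 0) return false;
    *sym = m->stack[--m->sp];
    return true;
}

/* Consuming input changes w. */
bool machine_consume(struct machine *m, char *sym)
{
    if (*m->input == '\0') return false;
    *sym = *m->input++;
    return true;
}
```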


Deterministic and nondeterministic machines

The machine is deterministic if for every 〈σ₁ , w₁〉, there is at most one state 〈σ₂ , w₂〉 such that

〈σ₁ , w₁〉 −→ 〈σ₂ , w₂〉

In theory, one uses non-deterministic parsers (see PDA in Models of Computation).
In compilers, we want a deterministic parser for efficiency (linear time).
Some real parsers (ANTLR) tolerate some non-determinism and so some backtracking.


Parser generators and the LL and LR machines

The LL and LR machines:

• encapsulate the main ideas (stack = prediction vs reduction)

• can be used for abstract reasoning, like partial correctness

• cannot be used off the shelf, since they are nondeterministic

A parser generator:

• computes information that makes these machines deterministic

• does not work on all grammars

• some grammars are not suitable for some (or all) deterministic parsing techniques

• parser generators produce errors such as reduce/reduce conflicts

• we may redesign our grammar to make the parser generator work


Examples of parser generators and their parsing principles

ANTLR: LL(k) for any k, also called LL(*)
Menhir for OCaml: LR(1)
yacc/bison, ocamlyacc: LALR(1)

The numbers refer to the amount of lookahead.
LALR(1) is a version of LR(1) that uses less main memory, which was useful in the 1970s. Now, not so much.


LL parsing stack machine

Assume a fixed context-free grammar. We construct the LL machine for that grammar.
The top of stack is on the left.

〈A σ , w〉 −predict→ 〈α σ , w〉 if there is a rule A → α

〈a σ , a w〉 −match→ 〈σ , w〉

〈S , w〉 is the initial state for input w

〈ε , ε〉 is the accepting state
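The two rules can be tried out with a small C sketch that simulates the nondeterministic choice by backtracking: it accepts if some sequence of predict and match steps reaches 〈ε , ε〉. The grammar S → L b, L → a L, L → ε is the one used in the LL examples of these slides. Several things are assumptions of this sketch only: the names ll_run and ll_accepts, the convention that upper-case characters are non-terminals, and the bound MAXSTACK. The search terminates for this grammar, but it would loop on a left-recursive one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct rule { char lhs; const char *rhs; };

/* Grammar: S -> L b, L -> a L, L -> ε (empty right-hand side). */
static const struct rule rules[] = {
    { 'S', "Lb" },
    { 'L', "aL" },
    { 'L', ""   },
};
#define NRULES (sizeof rules / sizeof rules[0])
#define MAXSTACK 64

static bool is_nonterminal(char c) { return c >= 'A' && c <= 'Z'; }

/* Explores every run from the state <stack, input>; the top of the
 * stack is the leftmost character. True if some run accepts. */
static bool ll_run(const char *stack, const char *input)
{
    if (*stack == '\0')
        return *input == '\0';   /* <ε, ε> accepts; <ε, w> is stuck */

    if (is_nonterminal(*stack)) {
        /* predict: replace the top A by α, for each rule A -> α */
        for (size_t i = 0; i < NRULES; i++) {
            char next[MAXSTACK];
            if (rules[i].lhs != *stack)
                continue;
            if (strlen(rules[i].rhs) + strlen(stack + 1) >= MAXSTACK)
                continue;        /* would not fit; skip in this sketch */
            strcpy(next, rules[i].rhs);
            strcat(next, stack + 1);
            if (ll_run(next, input))
                return true;
        }
        return false;            /* every choice got stuck */
    }

    /* match: the top terminal must equal the next input symbol */
    if (*input != '\0' && *stack == *input)
        return ll_run(stack + 1, input + 1);
    return false;
}

bool ll_accepts(const char *input)
{
    return ll_run("S", input);   /* initial state <S, w> */
}
```

Backtracking makes the sketch easy to check against the rules, but it is not how a real parser works: the point of the LL(1) construction is to pick the right predict step without searching.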


Accepting a given input in the LL machine

Definition: An input string w is accepted if and only if there is a sequence of machine steps leading to the accepting state:

〈S , w〉 −→ ⋯ −→ 〈ε , ε〉

Theorem: an input string is accepted if and only if it can be derived by the grammar.
More precisely: LL machine run ≅ leftmost derivation in the grammar
Stretch exercise: prove this in Agda


LL example

Consider this grammar

S → L b

L → a L

L →

Show how the LL machine for this grammar can accept the input a a a b.


LL machine run example

S → L b

L → a L

L →

〈S , a a b〉
−predict→ 〈L b , a a b〉
−predict→ 〈a L b , a a b〉
−match→ 〈L b , a b〉
−predict→ 〈a L b , a b〉
−match→ 〈L b , b〉
−predict→ 〈b , b〉
−match→ 〈ε , ε〉 ✓


LL machine run example: what should not happen

S → L b

L → a L

L →

〈S , a a b〉
−predict→ 〈L b , a a b〉
−predict→ 〈b , a a b〉

When it makes a bad nondeterministic choice, the LL machine gets stuck.


Making the LL machine deterministic

• Idea: we use one symbol of lookahead to guide the predict moves. ⇒ LL(1) machine.

• Formally: FIRST and FOLLOW construction.

• Can be done by hand, though tedious.

• The construction does not work for all grammars!

• Real-world: ANTLR does a more powerful version, LL(k) for any k.


FIRST and FOLLOW

We define FIRST, FOLLOW and nullable:

• A terminal symbol b is in FIRST(α) if there exists a β such that

α ⇒* b β

that is, b is the first symbol in something derivable from α

• A terminal symbol b is in FOLLOW(X) if there exist α and γ such that

S ⇒* α X b γ

that is, b follows X in some derivation

• X is nullable if

X ⇒* ε

that is, we can derive the empty string from it


FIRST and FOLLOW examples

Consider

S → L b

L → a L

L →

Then

a ∈ FIRST(L)

b ∈ FOLLOW(L)

L is nullable


LL(1) machine

This is a predictive parser with 1 symbol of lookahead.

〈A σ , b w〉 −predict→ 〈α σ , b w〉 if there is a rule A → α and b ∈ FIRST(α)

〈A σ , b w〉 −predict→ 〈σ , b w〉 if there is a rule A → ε and b ∈ FOLLOW(A)

〈a σ , a w〉 −match→ 〈σ , w〉

〈S , w〉 is the initial state for input w

〈ε , ε〉 is the accepting state


Parsing errors in the LL(1) machine

Suppose the LL(1) machine reaches a state of the form

〈a σ , b w〉

where a ≠ b. Then the machine can report an error, like “expecting a; found b in the input instead”.
Similarly, if it reaches a state of the form

〈ε , w〉

where w ≠ ε, the machine can report unexpected input w at the end.
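A deterministic LL(1) machine with this kind of error reporting might look as follows in C. This is a sketch under several assumptions. It is hard-wired to the grammar S → L b, L → a L, L → ε from the LL examples, with FIRST(L b) = {a, b}, FIRST(a L) = {a} and FOLLOW(L) = {b} worked out by hand. The function name ll1_parse and the exact message texts are invented for this illustration. The third message, for a non-terminal with no applicable rule, is an addition of the sketch that the slide does not mention.

```c
#include <stdio.h>

#define MAXSTACK 16   /* this grammar never needs more than 3 entries */

/* Text for a lookahead symbol; '\0' stands for the end of the input. */
static const char *show(char c, char buf[2])
{
    if (c == '\0')
        return "end of input";
    buf[0] = c;
    buf[1] = '\0';
    return buf;
}

/* Runs the LL(1) machine for S -> L b, L -> a L, L -> ε.
 * Returns 0 on acceptance; otherwise -1 with a message in msg. */
int ll1_parse(const char *input, char *msg, size_t msglen)
{
    char stack[MAXSTACK];       /* stack[top - 1] is the top symbol */
    int top = 0;
    char buf[2];
    const char *w = input;      /* remaining input */

    stack[top++] = 'S';         /* initial state <S, w> */
    while (top > 0) {
        char x = stack[--top];  /* pop the top symbol */
        char b = *w;            /* one symbol of lookahead */

        if (x == 'S' && (b == 'a' || b == 'b')) {
            /* predict S -> L b, as FIRST(L b) = {a, b} */
            stack[top++] = 'b';
            stack[top++] = 'L';
        } else if (x == 'L' && b == 'a') {
            /* predict L -> a L, as a is in FIRST(a L) */
            stack[top++] = 'L';
            stack[top++] = 'a';
        } else if (x == 'L' && b == 'b') {
            /* predict L -> ε, as b is in FOLLOW(L): push nothing */
        } else if (x == 'S' || x == 'L') {
            snprintf(msg, msglen, "no rule for %c on lookahead %s",
                     x, show(b, buf));
            return -1;
        } else if (x == b) {
            w++;                /* match: consume one input symbol */
        } else {
            /* state <a σ, b w> with a != b */
            snprintf(msg, msglen, "expecting %c, found %s",
                     x, show(b, buf));
            return -1;
        }
    }
    if (*w != '\0') {           /* state <ε, w> with w != ε */
        snprintf(msg, msglen, "unexpected input %s at the end", w);
        return -1;
    }
    return 0;
}
```

For this small grammar the predict checks catch every error before a terminal mismatch can occur, so the "expecting" branch is kept only to mirror the 〈a σ , b w〉 case above.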


LL(1) machine run example

Just like LL machine, but now deterministic

S → L b

L → a L

L →

〈S , a a b〉
−predict→ 〈L b , a a b〉    as a ∈ FIRST(L)
−predict→ 〈a L b , a a b〉    as a ∈ FIRST(L)
−match→ 〈L b , a b〉
−predict→ 〈a L b , a b〉    as a ∈ FIRST(L)
−match→ 〈L b , b〉
−predict→ 〈b , b〉    as b ∈ FOLLOW(L)
−match→ 〈ε , ε〉 ✓


Is the LL(1) machine deterministic?

〈A σ , b w〉 −predict→ 〈α σ , b w〉 if there is a rule A → α and b ∈ FIRST(α)

〈A σ , b w〉 −predict→ 〈σ , b w〉 if there is a rule A → ε and b ∈ FOLLOW(A)

For some grammars, there may be:

• FIRST/FIRST conflicts

A → α₁, A → α₂, FIRST(α₁) ∩ FIRST(α₂) ≠ ∅

• FIRST/FOLLOW conflicts

A → α, FIRST(α) ∩ FOLLOW(A) ≠ ∅

NB: FIRST/FIRST conflicts do not mean that the grammar is ambiguous. Ambiguous means different parse trees for the same string. For example, the rules A → a b and A → a c have a FIRST/FIRST conflict on a, but each string still has only one parse tree.


Computing FIRST and FOLLOW

for each symbol X, nullable[X] is initialised to false
for each symbol X, follow[X] is initialised to the empty set
for each terminal symbol a, first[a] is initialised to {a}
for each non-terminal symbol A, first[A] is initialised to the empty set
repeat
    for each production X → Y₁ … Yₖ
        if all the Yᵢ are nullable
            then set nullable[X] to true
        for each i from 1 to k, and j from i + 1 to k
            if Y₁, …, Yᵢ₋₁ are all nullable
                then add all symbols in first[Yᵢ] to first[X]
            if Yᵢ₊₁, …, Yₖ are all nullable
                then add all symbols in follow[X] to follow[Yᵢ]
            if Yᵢ₊₁, …, Yⱼ₋₁ are all nullable
                then add all symbols in first[Yⱼ] to follow[Yᵢ]
until first, follow and nullable did not change in this iteration
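The fixed-point iteration above can be sketched in C for the small grammar S → L b, L → a L, L → ε from the FIRST and FOLLOW examples. The representation is an assumption of this sketch: symbols are single characters, terminals are the lower-case letters, and each set is a bit mask with one bit per terminal. The names compute_sets, in_first, in_follow and is_nullable are made up for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct rule { char lhs; const char *rhs; };

static const struct rule rules[] = {
    { 'S', "Lb" }, { 'L', "aL" }, { 'L', "" },
};
#define NRULES (sizeof rules / sizeof rules[0])

static bool nullable[128];
static unsigned first[128], follow[128];  /* bit (c - 'a') = terminal c */

static unsigned bit(char c) { return 1u << (c - 'a'); }

/* Adds extra to *set; reports whether the set grew. */
static bool add(unsigned *set, unsigned extra)
{
    unsigned old = *set;
    *set |= extra;
    return *set != old;
}

/* True if every symbol in rhs[from .. to-1] is nullable. */
static bool all_nullable(const char *rhs, size_t from, size_t to)
{
    for (size_t i = from; i < to; i++)
        if (!nullable[(unsigned char)rhs[i]])
            return false;
    return true;
}

void compute_sets(void)
{
    memset(nullable, 0, sizeof nullable);
    memset(first, 0, sizeof first);
    memset(follow, 0, sizeof follow);
    for (char a = 'a'; a <= 'z'; a++)
        first[(unsigned char)a] = bit(a);     /* first[a] = {a} */

    bool changed = true;
    while (changed) {                         /* repeat ... until stable */
        changed = false;
        for (size_t r = 0; r < NRULES; r++) {
            unsigned char X = (unsigned char)rules[r].lhs;
            const char *Y = rules[r].rhs;
            size_t k = strlen(Y);
            if (all_nullable(Y, 0, k) && !nullable[X]) {
                nullable[X] = true;
                changed = true;
            }
            for (size_t i = 0; i < k; i++) {
                unsigned char Yi = (unsigned char)Y[i];
                if (all_nullable(Y, 0, i))
                    changed |= add(&first[X], first[Yi]);
                if (all_nullable(Y, i + 1, k))
                    changed |= add(&follow[Yi], follow[X]);
                for (size_t j = i + 1; j < k; j++)
                    if (all_nullable(Y, i + 1, j))
                        changed |= add(&follow[Yi],
                                       first[(unsigned char)Y[j]]);
            }
        }
    }
}

bool in_first(char X, char t)  { return (first[(unsigned char)X] & bit(t)) != 0; }
bool in_follow(char X, char t) { return (follow[(unsigned char)X] & bit(t)) != 0; }
bool is_nullable(char X)       { return nullable[(unsigned char)X]; }
```

After compute_sets() has run, the queries agree with the examples given earlier: a is in FIRST(L), b is in FOLLOW(L), and L is nullable.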


LL(1) machine exercise

Consider the grammar

D → [ D ] D    (1)
D → ( D ) D    (2)
D → ε          (3)

Implement the LL(1) machine for this grammar in a language of your choice, preferably C.
Bonus for writing the shortest possible implementation in C.


LR machine

Assume a fixed context-free grammar. We construct the LR machine for it.
The top of stack is on the right.

〈σ , a w〉 −shift→ 〈σ a , w〉

〈σ α , w〉 −reduce→ 〈σ A , w〉 if there is a rule A → α

〈ε , w〉 is the initial state for input w

〈S , ε〉 is the accepting state


Accepting a given input in the LR machine

Definition: An input string w is accepted if and only if there is a sequence of machine steps leading to the accepting state:

$\langle \varepsilon,\ w \rangle \longrightarrow \cdots \longrightarrow \langle S,\ \varepsilon \rangle$

Theorem: an input string is accepted if and only if it can be derived by the grammar.
More precisely: LR machine run ≅ rightmost derivation in the grammar (the reduce steps, read backwards, are the derivation steps).
Stretch exercise: prove this in Agda
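Because the definition only asks for some sequence of steps, acceptance can be checked by searching through all possible runs. The following Python sketch does that; the function name `lr_accepts` and the rule encoding are made up for illustration, and it assumes that no rule has an empty right-hand side, so that only finitely many configurations are reachable.

```python
def lr_accepts(rules, start, tokens):
    """Search all runs of the nondeterministic LR machine.

    rules:  list of (lhs, rhs) pairs with rhs a NON-EMPTY tuple (assumption).
    A configuration is (stack, rest); the top of the stack is on the right.
    """
    initial = ((), tuple(tokens))
    accepting = ((start,), ())
    seen = {initial}
    todo = [initial]
    while todo:
        stack, rest = todo.pop()
        if (stack, rest) == accepting:
            return True
        successors = []
        if rest:
            # shift: move the next input symbol onto the stack
            successors.append((stack + rest[:1], rest[1:]))
        for lhs, rhs in rules:
            n = len(rhs)
            # reduce: the right-hand side must be on top of the stack
            if n > 0 and stack[-n:] == rhs:
                successors.append((stack[:-n] + (lhs,), rest))
        for conf in successors:
            if conf not in seen:        # never explore a configuration twice
                seen.add(conf)
                todo.append(conf)
    return False

# A small example grammar: S -> A B, A -> c, B -> d.
RULES = [("S", ("A", "B")), ("A", ("c",)), ("B", ("d",))]
```

With these rules, `lr_accepts(RULES, "S", "cd")` finds an accepting run, and `lr_accepts(RULES, "S", "dc")` exhausts the search without finding one.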

LL vs LR example

Consider this grammar

S → AB

A → c

B → d

Show how the LL and LR machines can accept the input c d.
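One possible pair of accepting runs is sketched below. It assumes that the LL machine starts in $\langle S, w \rangle$, accepts in $\langle \varepsilon, \varepsilon \rangle$ and keeps the top of its stack on the left, while the LR machine keeps it on the right. The snippet needs the amsmath package.

```latex
% LL machine: top of stack on the left
\begin{align*}
\langle S,\ c\,d \rangle
  &\xrightarrow{\text{predict}} \langle A\,B,\ c\,d \rangle
   \xrightarrow{\text{predict}} \langle c\,B,\ c\,d \rangle
   \xrightarrow{\text{match}} \langle B,\ d \rangle \\
  &\xrightarrow{\text{predict}} \langle d,\ d \rangle
   \xrightarrow{\text{match}} \langle \varepsilon,\ \varepsilon \rangle
\end{align*}

% LR machine: top of stack on the right
\begin{align*}
\langle \varepsilon,\ c\,d \rangle
  &\xrightarrow{\text{shift}} \langle c,\ d \rangle
   \xrightarrow{\text{reduce}} \langle A,\ d \rangle
   \xrightarrow{\text{shift}} \langle A\,d,\ \varepsilon \rangle \\
  &\xrightarrow{\text{reduce}} \langle A\,B,\ \varepsilon \rangle
   \xrightarrow{\text{reduce}} \langle S,\ \varepsilon \rangle
\end{align*}
```

The LL run uses the rule for S first and the rule for B last; the LR run uses the rule for A first and the rule for S last.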

Is the LR machine deterministic?

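$\langle \sigma,\ a\,w \rangle \xrightarrow{\text{shift}} \langle \sigma\,a,\ w \rangle$

$\langle \sigma\,\alpha,\ w \rangle \xrightarrow{\text{reduce}} \langle \sigma\,A,\ w \rangle$ if there is a rule $A \to \alpha$

For some grammars, there may be:

• shift/reduce conflicts: both a shift and a reduce step are possible

• reduce/reduce conflicts: more than one rule can be used for a reduce step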
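To see both kinds of conflict concretely, one can list the steps that are enabled in a given configuration. The helper `enabled_steps` below is a made-up name for this sketch. The first grammar is E → E - E, E → 1; the second grammar (A → a, B → a) is a hypothetical example chosen only to provoke a reduce/reduce conflict.

```python
def enabled_steps(rules, stack, rest):
    """List every machine step that is possible in configuration (stack, rest)."""
    steps = []
    if rest:
        steps.append(("shift", rest[0]))
    for lhs, rhs in rules:
        # A reduce is enabled if rhs (non-empty) is on top of the stack.
        if rhs and tuple(stack[-len(rhs):]) == tuple(rhs):
            steps.append(("reduce", lhs, tuple(rhs)))
    return steps

# Grammar E -> E - E | 1 in the middle of reading 1 - 1 - 1:
EXPR = [("E", ("E", "-", "E")), ("E", ("1",))]
shift_reduce = enabled_steps(EXPR, ("E", "-", "E"), ("-", "1"))

# Hypothetical grammar with two rules sharing a right-hand side:
SAME_RHS = [("A", ("a",)), ("B", ("a",))]
reduce_reduce = enabled_steps(SAME_RHS, ("a",), ())
```

In the first configuration the machine may either shift the second minus sign or reduce with E → E - E; in the second it may reduce with either rule.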

LL vs LR in more detail

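LL

$\langle A\,\sigma,\ w \rangle \xrightarrow{\text{predict}} \langle \alpha\,\sigma,\ w \rangle$ if there is a rule $A \to \alpha$

$\langle a\,\sigma,\ a\,w \rangle \xrightarrow{\text{match}} \langle \sigma,\ w \rangle$

LR

$\langle \sigma,\ a\,w \rangle \xrightarrow{\text{shift}} \langle \sigma\,a,\ w \rangle$

$\langle \sigma\,\alpha,\ w \rangle \xrightarrow{\text{reduce}} \langle \sigma\,A,\ w \rangle$ if there is a rule $A \to \alpha$

The LR machine can make its choices after it has seen the right-hand side of a rule. The LL machine has to choose a rule before it has matched any of that rule's right-hand side.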

LL vs LR example

Here is a simple grammar:

S → A

S → B

A → a b

B → a c

One symbol of lookahead is not enough for the LL machine: the right-hand sides of A and B both start with a.
An LR machine can look at the top of its stack and base its choice on that.
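The LL(1) problem can be checked mechanically by comparing the FIRST sets of the alternatives for S. The sketch below uses made-up names (`GRAMMAR`, `first_of`, `ll1_conflicts`) and takes a shortcut that is only valid under the stated assumption that the grammar has no ε rules and no left recursion.

```python
# Grammar from the slide; a symbol without rules is a terminal.
GRAMMAR = {
    "S": [("A",), ("B",)],
    "A": [("a", "b")],
    "B": [("a", "c")],
}

def first_of(symbol):
    """FIRST set of a symbol, assuming no epsilon rules and no left recursion."""
    if symbol not in GRAMMAR:
        return {symbol}
    result = set()
    for rhs in GRAMMAR[symbol]:
        result |= first_of(rhs[0])   # only the first symbol matters here
    return result

def ll1_conflicts(nonterminal):
    """Lookahead symbols on which more than one rule could be predicted."""
    seen, clashes = set(), set()
    for rhs in GRAMMAR[nonterminal]:
        f = first_of(rhs[0])
        clashes |= seen & f
        seen |= f
    return clashes
```

Here `ll1_conflicts("S")` reports the lookahead `a`, on which the LL(1) machine cannot decide between S → A and S → B.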

LR machine run example

S → A

S → B

A → a b

B → a c

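$\langle \varepsilon,\ a\,b \rangle$
$\xrightarrow{\text{shift}} \langle a,\ b \rangle$
$\xrightarrow{\text{shift}} \langle a\,b,\ \varepsilon \rangle$
$\xrightarrow{\text{reduce}} \langle A,\ \varepsilon \rangle$
$\xrightarrow{\text{reduce}} \langle S,\ \varepsilon \rangle$ ✓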
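A run like this one can be reproduced by a machine that decides its next step by looking only at the top of its stack. The sketch below uses a naive policy, chosen for illustration: reduce whenever some right-hand side is on top of the stack, otherwise shift. This policy happens to work for this small grammar; it is not the general LR item construction, and the name `lr_run` is made up.

```python
RULES = [
    ("S", ("A",)),
    ("S", ("B",)),
    ("A", ("a", "b")),
    ("B", ("a", "c")),
]

def lr_run(tokens):
    """Return (trace, accepted) for the greedy reduce-first policy."""
    stack, rest = (), tuple(tokens)     # top of stack is on the right
    trace = [(stack, rest)]
    while True:
        # Look for a rule whose right-hand side is on top of the stack.
        handle = next(
            ((lhs, rhs) for lhs, rhs in RULES if stack[-len(rhs):] == rhs),
            None,
        )
        if handle is not None:
            lhs, rhs = handle
            stack = stack[:-len(rhs)] + (lhs,)               # reduce
        elif rest:
            stack, rest = stack + rest[:1], rest[1:]         # shift
        else:
            break                                            # no step possible
        trace.append((stack, rest))
    return trace, (stack, rest) == (("S",), ())
```

For the input a b the trace has the same five configurations as the run above; for a c the machine reduces with B → a c instead.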

Experimenting with ANTLR and Menhir errors

Construct some grammar rules that are not:

• LL(k) for any k, and feed them to ANTLR

• LR(1), and feed them to Menhir

and observe the error messages.
The error messages are allegedly human-readable.
It helps if you understand LL and LR.

How to make the LR machine deterministic

• Construction of LR items

• Much more complex than FIRST/FOLLOW construction

• Even more complex: LALR(1) items, to consume less memory

• You really want a tool to compute it for you

• Real world: yacc performs LALR(1) construction.

• Generations of CS students had to simulate the LALR(1) automaton in the exam

• Hardcore: compute LALR(1) items by hand in the exam. Fun: it does not fit on a sheet; if you make a mistake it never stops

• But I don’t think that teaches you anything

Problem: ambiguous grammars

A grammar is ambiguous if there is a string that has more than one parse tree.
Standard example:

E → E − E

E → 1

One such string is 1-1-1. It could mean (1-1)-1 or 1-(1-1), depending on how you parse it.
Ambiguous grammars are a problem for parsing, as we do not know which tree is intended.
Note: do not confuse ambiguity with a FIRST/FIRST conflict.
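The ambiguity can be made visible by enumerating all parse trees of a string and evaluating each one. The function `values` below is a made-up helper for this sketch: it tries every minus sign as the top-level operator of E → E - E and returns one value per parse tree.

```python
def values(tokens):
    """One result per parse tree of tokens under E -> E - E | 1."""
    tokens = tuple(tokens)
    if tokens == ("1",):
        return [1]                       # rule E -> 1
    results = []
    for i, tok in enumerate(tokens):
        if tok == "-":                   # rule E -> E - E, split at this minus
            for left in values(tokens[:i]):
                for right in values(tokens[i + 1:]):
                    results.append(left - right)
    return results

trees = values("1-1-1")
```

For 1-1-1 the list has two entries, one for each parse tree: 1 for 1-(1-1) and -1 for (1-1)-1.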

Left recursion

In fact, this grammar also has a FIRST/FIRST conflict.

E → E − E

E → 1

1 is in FIRST of both rules ⇒ predictive parser construction fails
Standard solution: left recursion elimination

Left recursion elimination example

E → E - E

E → 1

We observe that E ⇒∗ 1 - 1 - . . . - 1

Idea: 1 followed by 0 or more “ - 1”

E → 1 F

F → - 1 F

F → ε

This refactored grammar also eliminates the ambiguity. Yay.
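With the refactored grammar, every combination of non-terminal and lookahead selects at most one rule, so a table-driven LL(1) machine becomes possible. In the sketch below the table `TABLE` was filled in by hand for this grammar, `None` stands for end of input, and all names are made up for illustration.

```python
# Hand-derived LL(1) table for E -> 1 F, F -> - 1 F | epsilon.
# Key: (non-terminal, lookahead); None as lookahead means end of input.
TABLE = {
    ("E", "1"): ("1", "F"),
    ("F", "-"): ("-", "1", "F"),
    ("F", None): (),                 # F -> epsilon
}
NONTERMINALS = {"E", "F"}

def accepts(tokens):
    """Table-driven LL(1) machine; top of stack is the end of the list."""
    stack = ["E"]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos] if pos < len(tokens) else None
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                return False         # no rule for this lookahead
            stack.extend(reversed(rhs))   # predict
        elif top == look:
            pos += 1                 # match
        else:
            return False
    return pos == len(tokens)
```

For example, `accepts("1-1-1")` succeeds, while `accepts("1-")` fails because the stack still expects a 1 at the end of the input.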

FIRST/FIRST ≠ ambiguity

This grammar has a FIRST/FIRST conflict

A → a b

A → a c

No left recursion.
No ambiguity.
