33

op-doTwn Syntax Analysis - rw.cdl.uni-saarland.de Syntax Analysis op-doTwn Syntax Analysis Wilhelm/Maurer: Compiler Design, Chapter 8 Reinhard Wilhelm Universität des Saarlandes [email protected]

Embed Size (px)

Citation preview

Top-down Syntax Analysis

Top-down Syntax Analysis

� Wilhelm/Maurer: Compiler Design, Chapter 8 �

Reinhard WilhelmUniversität des [email protected]

andMooly Sagiv

Tel Aviv [email protected]

22. November 2007

Top-down Syntax Analysis

Subjects

I Functionality and Method

I Recursive Descent Parsing

I Using parsing tables

I Explicit stacks

I Creating the table

I LL(k)�grammars

I Other properties

I Handling Limitations

Top-down Syntax Analysis

Top-Down Syntax Analysis

input: A sequence of symbols (symbols)

output: A syntax tree or an error message

method

I Read input from left to rightI Construct the syntax tree in a top-down manner

starting with a node labeled with the startsymbol

I until input accepted (or error) doI Predict expansion for the actual leftmost

nonterminal (maybe using some lookahead intothe remaining input) or

I Verify predicted terminal symbol against nextsymbol of the remaining input

Finds leftmost derivations.

Top-down Syntax Analysis

Grammar for Arithmetic Expressions

Left factored grammar G2, i.e. left recursion removed.

S → E

E → TE ′ E generates T with a continuation E ′

E ′ → +E |ε E ′ generates possibly empty sequence of +T sT → FT ′ T generates F with a continuation T ′

T ′ → ∗T |ε T ′ generates possibly empty sequence of ∗F sF → id|(E )

Top-down Syntax Analysis

Recursive Descent Parsing

I parser is a program,

I a procedure X for each non-terminal X ,I parses words for non-terminal X ,I starts with the �rst symbol read (into variable nextsym),I ends with the following symbol read (into variable nextsym).

I uses one symbol lookahead into the remaining input.

I uses the FiFo sets to make the expansion transitionsdeterministic

FiFo(N → α) = FIRST1(α)⊕1FOLLOW1(N) ={FIRST1(α) ∪ FOLLOW1(N) α

∗=⇒ ε

FIRST1(α) otherwise

Top-down Syntax Analysis

Parser for G2

program parser;var nextsym: string;proc scan;{reads next input symbol into nextsym}proc error (message: string);{issues error message and stops parser}proc accept; {terminates successfully}

proc S;begin Eend ;

proc E;begin T; E'end ;

Top-down Syntax Analysis

proc E';begincase nextsym in{”+”}: if nextsym = "+"

then scanelse error( "+ expected") � ; E;

otherwise ;endcase

end ;

proc T;begin F; T' end ;proc T';begincase nextsym in{” ∗ ”}: if nextsym = "*"

then scanelse error( "* expected") � ; T;

otherwise ;endcase

end ;

Top-down Syntax Analysis

proc F;begin

case nextsym in

{”(”}: if nextsym = "("then scanelse error( "( expected") � ; E;if nextsym = ”)”then scan else error(" ) expected") �;

otherwise if nextsym =�id�then scan else error("id expected") �;

endcase

end ;begin

scan; S;if nextsym = ”#” then accept else error("# expected") �end .

Top-down Syntax Analysis

How to Construct such a Parser Program

Observation: Much redundant code generated. Why this?Code was automatically generated from the grammar and the FiFosets.Nice application for a functional programming language!

Top-down Syntax Analysis

Parser Schema

program parser;var nextsym: symbol;proc scan;

(∗ reads next input symbol into nextsym ∗)proc error (message: string);

(∗ issues error message and stops the parser ∗)proc accept;

(∗ terminates parser successfully ∗)

N_prog(X0); (* X0 start symbol *)N_prog(X1);

...N_prog(Xn);

beginscan;X0;if nextsym = �#�

then acceptelse error(". . . ")

�end

Top-down Syntax Analysis

The Non-terminal ProceduresN = Non-terminal, C = Concatenation, S = Symbol

N_prog(X ) = (* X → α1|α2| · · · |αk−1|αk *)proc X;begincase nextsym inFiFo(X → α1) : C_progr(α1);FiFo(X → α2) : C_progr(α2);

...FiFo(X → αk−1) : C_progr(αk−1);otherwise C_progr(αk);endcaseend ;

Top-down Syntax Analysis

C_progr(α1α2 · · ·αk) =S_progr(α1); S_progr(α2); . . . S_progr(αk);

S_progr(a) =if nextsym = a then scanelse error ( "a expected")�

S_progr(Y ) = Y

FiFo�sets should be disjoint (LL(1)�grammar)

Top-down Syntax Analysis

A Generative Solution

Generate the control of a deterministic PDA from the grammar andthe FiFo sets.

I At compiler�generation time construct a table MM : VN × VT → P

M[N, a] is the production used to expand nonterminal N whenthe current symbol is a

I For some grammars report that the table cannot beconstructedThe compiler writer can then decide to:

I change the grammar (but not the language)I use a more general parser-generatorI �Patch� the table (manually or using some rules)

Top-down Syntax Analysis

Creating the table

Input: cfg G , FIRST1 und FOLLOW1 for G .

Output: The parsing table M or an indication that such atable cannot be constructed

Method: M is constructed as follows:For all X → α ∈ P and a ∈ FIRST1(α), setM[X , a] = (X → α).If ε ∈ FIRST1(α), for all b ∈ FOLLOW1(X ), setM[X , b] = (X → α).Set all other entries of M to error .

Parser table cannot be constructed if at least one entry is set twice.G is not LL(1)

Top-down Syntax Analysis

Example � arithmetic expressions

nonterminal symbol Production

S (, id S → E

S +, ∗, ), # error

E (, id E → TE ′

E +, ∗, ), # error

E ′ + E ′ → +E

E ′ ), # E ′ → εE ′ (, ∗, id error

T (, id T → FT ′

T +, ∗, ), # error

T ′ ∗ T ′ → ∗TT ′ +, ), # T ′ → εT ′ (, id error

F id F → id

F ( F → (E)F +, ∗, ) error

Top-down Syntax Analysis

LL-Parser Driver (interprets the table M)

program parser;var nextsym: symbol;var st: stack of item;proc scan;

(∗ reads next input symbol into nextsym ∗)proc error (message: string);

(∗ issues error message and stops the parser ∗)proc accept;

(∗ terminates parser successfully ∗)proc reduce;

(∗ replaces [X → β.Y γ][Y → α.] by [X → βY .γ] ∗)proc pop;

(∗ removes topmost item from st ∗)proc push ( i : item);

(∗ pushes i onto st ∗)proc replaceby ( i: item);

(∗ replaces topmost item of st by i ∗)

Top-down Syntax Analysis

beginscan; push( [S ′ → .S ] );while nextsym 6= "#"docase top in[X → β.aγ]: if nextsym = a

then scan; replaceby([X → βa.γ]else error � ;

[X → β.Y γ] : if M[Y , nextsym] = (Y → α)then push([Y → .α])else error � ;

[X → α.]: reduce;[S ′ → S .] : if nextsym = "#"then accept

else error �endcaseod

end .

Top-down Syntax Analysis

Explicit StackDeterministic Pushdown Automaton

6

?

ρ

tree

M

vaw

[X → α.Y β]

#

Parser�Table

Control

Stack

Output

Input

Top-down Syntax Analysis

LL(k)-grammar

Let G = (VN ,VT ,P, S) be a cfg and k be a natural number.G is an LL(k)-grammar i� the following holds:if there exist two leftmost derivations

S∗

=⇒lm

uYα =⇒lm

uβα∗

=⇒lm

ux and

S∗

=⇒lm

uYα =⇒lm

uγα∗

=⇒lm

uy , and if k : x = k : y ,

then β = γ.The expansion of the leftmost non-terminal is always uniquelydetermined by

I the consumed part of the input and

I the next k symbols of the remaining input

Top-down Syntax Analysis

Example 1Let G1 the cfg with the productions

STAT → if id then STAT else STAT � |while id do STAT od |begin STAT end |id := id

G1 is an LL(1)-grammar.

STAT∗

=⇒lm

w STAT α =⇒lm

w β α∗

=⇒lm

w x

STAT∗

=⇒lm

w STAT α =⇒lm

w γ α∗

=⇒lm

w y

From 1 : x = 1 : y follows β = γ,e.g., from 1 : x = 1 : y = if followsβ = γ = if id then STAT else STAT �

Top-down Syntax Analysis

Example 2

Let G2 be the cfg with the productions

STAT → if id then STAT else STAT � |while id do STAT od |begin STAT end |id := id |id: STAT | (∗ labeled statem. ∗)id (id ) (∗ procedure call ∗)

Top-down Syntax Analysis

Example 2 (cont'd)

G2 is not an LL(1)�grammar.

STAT∗

=⇒lm

w STAT α =⇒lm

w

β︷ ︸︸ ︷id := id α

∗=⇒lm

w x

STAT∗

=⇒lm

w STAT α =⇒lm

w

γ︷ ︸︸ ︷id : STAT α

∗=⇒lm

w y

STAT∗

=⇒lm

w STAT α =⇒lm

w

δ︷ ︸︸ ︷id(id) α

∗=⇒lm

w z

and 1 : x = 1 : y = 1 : z = � id�,and β, γ, δ are pairwise di�erent.G2 is an LL(2)�grammar.2 : x = � id :=�, 2 : y = � id :�, 2 : z = � id(� are pairwise di�erent.

Top-down Syntax Analysis

Example 3

Let G3 have the productions

STAT → if id then STAT else STAT � |while id do STAT od |begin STAT end |VAR := VAR |id( IDLIST ) (∗ procedure call ∗)

VAR → id | id (IDLIST ) (∗ indexed variable ∗)IDLIST → id | id, IDLIST

G3 is not an LL(k)�grammar for any k .

Top-down Syntax Analysis

Proof:

Assume G3 to be LL(k) for a k > 0.

Let STAT⇒ β∗

=⇒lm

x and STAT⇒ γ∗

=⇒lm

y with

x = id (id, id, . . . , id︸ ︷︷ ︸d k2e times

) := id and y = id (id, id, . . . , id︸ ︷︷ ︸d k2e times

)

Then k : x = k : y ,but β = VAR := VAR 6= γ = id (IDLIST).

Top-down Syntax Analysis

Transforming to LL(k)

Factorization creates an LL(2)�grammar, equivalent to G3.

The productionsSTAT → VAR := VAR | id(IDLIST)

are replaced bySTAT → ASSPROC | id := VAR

ASSPROC → id(IDLIST) APRESTAPREST → := VAR | ε

Top-down Syntax Analysis

A non�LL(k)�language

Let G4 = ({S ,A,B}, {0, 1, a, b},P4,S)

P4 =

S → A | BA → aAb | 0B → aBbb | 1

L(G4) = {an0bn | n ≥ 0} ∪ {an1b2n | n ≥ 0}.G4 is not LL(k) for any k .

Consider the two leftmost derivations

S0

=⇒lm

S =⇒lm

A∗

=⇒lm

ak0bk

S0

=⇒lm

S =⇒lm

B∗

=⇒lm

ak1b2k

With u = α = ε, β = A, γ = B, x = ak0bk , y = ak1b2k it holdsk : x = k : y , but β 6= γ.Since k can be chosen arbitrarily, we have G4 is not LL(k) for any k .

There even is no LL(k)-grammar for L(G4) for any k .

Top-down Syntax Analysis

Towards Checkable LL(k)�conditions

TheoremG is LL(k)�grammar i� the following condition holds:

Are A → β and A → γ di�erent productions in P, then

FIRSTk(βα) ∩ FIRSTk(γα) = ∅ for all α with S∗

=⇒lm

wAα

TheoremLet G be a cfg without productions of the form X → ε.G is an LL(1)�grammar i�

for each non-terminal X with the alternatives X → α1| . . . |αn

the sets FIRST1(α1), . . . , FIRST1(αn) are pairwise disjoint.

Top-down Syntax Analysis

TheoremG is LL(1) i�For di�erent productions A → β and A → γFIRST1(β)⊕1FOLLOW1(A) ∩ FIRST1(γ)⊕1FOLLOW1(A) = ∅ .Corollary:G is LL(1) i� for all alternatives A → α1| . . . |αn:

1. FIRST1(α1), . . . ,FIRST1(αn) are pairwise disjoint; in

particular, at most one of them may contain ε

2. αi∗

=⇒ ε implies:

FIRST1(αj) ∩ FOLLOW1(A) = ∅ for 1 ≤ j ≤ n, j 6= i .

The condition of the Theorem was used in the parser

construction!

Top-down Syntax Analysis

Further De�nitions and Theorems

I G is called a strong LL(k)-grammar if for each two di�erentproductions A → β and A → γ

FIRSTk(β)⊕kFOLLOWk(A) ∩ FIRSTk(γ)⊕kFOLLOWk(A) = ∅,

I A production is called directly left recursive, if it has theform A → Aα

I A non-terminal A is called left recursive if it has a derivationA

+=⇒ Aα.

I A cfg G is called left recursive, if G contains at least one leftrecursive non-terminal

Top-down Syntax Analysis

Theorem

(a) G is not LL(k) for any k if G is left recursive.

(b) G is not ambiguous if G is LL(k)-grammar.

Top-down Syntax Analysis

Recursive Descent Parsing

program parser;var nextsym: symbol;proc scan;

(∗ reads next input symbol into nextsym ∗)proc error (message: string);

(∗ issues error message and stops the parser ∗)proc accept;

(∗ terminates parser successfully ∗)N_prog(X0 → α0);N_prog(X1 → α1);

...N_prog(Xn → αn); beginscan;X0;if nextsym = �#�

then acceptelse error(". . . ")

�end

Top-down Syntax Analysis

N_prog(X → α) =proc X;begin

progr([X → .α])end ;

progr([X → · · · .(α1|α2| · · · |αk−1|αk) · · · ]) =case nextsym inFiFo([X → · · · (.α1|α2| · · · |αk−1|αk) · · · ]) :

progr([X → · · · (.α1|α2| · · · |αk−1|αk) · · · ]);FiFo([X → · · · (α1|.α2| · · · |αk−1|αk) · · · ]) :

progr([X → · · · (α1|.α2| · · · |αk−1|αk) · · · ]);...

FiFo([X → · · · (α1|α2| · · · |.αk−1|αk) · · · ]) :progr([X → · · · (α1|α2| · · · |.αk−1|αk) · · · ]);

otherwise progr([X → · · · (α1|α2| · · · |αk−1|.αk) · · · ]);endcase

progr([X → · · · .(α1α2 · · ·αk) · · · ]) =progr([X → · · · (.α1α2 · · ·αk) · · · ]);progr([X → · · · (α1.α2 · · ·αk) · · · ]);

...progr([X → · · · (α1α2 · · · .αk) · · · ]);

progr([X → · · · .(α)∗ · · · ]) =while nextsym in FIRST1(α) do

progr([X → · · · .α · · · ])od

Top-down Syntax Analysis

where a ∈ VT is

progr([X → · · · .a · · · ]) =if nextsym = a then scanelse error�

where Y ∈ VN is

progr([X → · · · .Y · · · ]) = Y