48
Compiler construction in4020 – lecture 3 Koen Langendoen Delft University of Technology The Netherlands

Compiler construction in4020 – lecture 3 Koen Langendoen Delft University of Technology The Netherlands

  • View
    215

  • Download
    1

Embed Size (px)

Citation preview

Compiler constructionin4020 – lecture 3

Koen Langendoen

Delft University of TechnologyThe Netherlands

Summary of lecture 2

• lexical analyzer generator• description FSA

• FSA construction• dotted items

• character moves

• moves

program text

lexical analysis

syntax analysis

context handling

annotated AST

tokens

AST

scanner

generator

token

description

Quiz

2.24 Will the function identify() (to access a symbol table) still work if the hash function maps all identifiers onto the same number?

2.26 Tutor X insists that macro processing must be implemented as a separate phase between reading the program and the lexical analysis. Show mr. X wrong.

• syntax analysis: tokens AST

• AST construction• by hand: recursive descent

• automatic: top-down (LLgen), bottom-up (yacc)

Overview

program text

lexical analysis

syntax analysis

context handling

annotated AST

tokens

AST

parser

generator

language

grammar

Syntax analysis

• parsing: given a context-free grammar and a stream of tokens, find the derivation that ties them together.

• result: parse tree

‘b’

identifier

expression

term

factor

term

‘b’

factor

identifier

‘4’

constant

term

factor

term

‘a’

factor

identifier

term

factor

identifier

expression

‘c’‘*’

‘-’

‘*’

‘*’

syn-tax: the way in which words are put together to form phrases, clauses, or sentences.

Webster’s Dictionary

Syntax analysis

• parsing: given a context-free grammar and a stream of tokens, find the derivation that ties them together.

• result: parse tree

‘b’

identifier

expression

term

factor

term

‘b’

factor

identifier

‘4’

constant

term

factor

term

‘a’

factor

identifier

term

factor

identifier

expression

‘c’‘*’

‘-’

‘*’

‘*’

Context free grammar

• G = (VN, VT, S, P)

• VN : set of Non-terminal symbols

• VT : set of Terminal symbols

• S : Start symbol (S VN)

• P : set of Production rules {N }

• VN VT =

• P = {N | N VN (VN VT)*}

Top-down parsing

• process tokens left to right

• expression grammarinput expression EOFexpression term rest_expressionterm IDENTIFIER | ‘(’ expression ‘)’rest_expression ‘+’ expression |

• example expression

input

expression EOF

term rest_expression

IDENTIFIER

aap + ( noot + mies )

rest_expression

expression

rest_exprterm

Bottom-up parsing

• process tokens left to right

• expression grammarinput expression EOFexpression term rest_expressionterm IDENTIFIER | ‘(’ expression ‘)’rest_expression ‘+’ expression |

IDENT

• • •

IDENT

• •

IDENT

• •aap + ( noot + mies )

Comparison

top-down bottom-up

node creation pre-order post-order

alternative selection first token last token

grammar type restricted

LL(1)

relaxed

LR(1)

implementation manual + automatic

automatic

5

1 4

2 3

1

2 3

4 5

Recursive descent parsing

• each rule N translates to a boolean function• return true if a terminal production of N was matched

• return false otherwise (without consuming any token)

• try alternatives of N in turn

• a terminal symbol must match the current token

• a non-terminal is matched by calling its routine

input expression EOF

int input(void) {

return expression() && require(token(EOF));

}

Recursive descent parsing

expression term rest_expression

int expression(void) {

return term() && require(rest_expression());

}

term IDENTIFIER | ‘(’ expression ‘)’

int term(void) {

return token(IDENTIFIER) ||

token('(') && require(expression()) && require(token(')'));

}

Recursive descent parsing

auxiliary functions• consume matched tokens

• report syntax errors

int token(int tk) {

if (Token.class != tk) return 0;

get_next_token(); return 1;

}

int require(int found) {

if (!found) error();

return 1;

}

rest_expression ‘+’ expression |

int rest_expression(void) {

return token('+') && require(expression()) || 1;

}

Automatic top-down parsing

• follow recursive descent scheme, but avoid interpretation overhead

• for each rule and alternative determine the tokens it can start with: FIRST set

• parsing scheme for rule N A1 | A2 | …• if token FIRST(N) then ERROR

• if token FIRST(A1) then parse A1

• if token FIRST(A2) then parse A2

• …

Exercise (7 min.)

• design an algorithm to compute the FIRST sets of all non-terminals in a context free grammar.

• hint: consider the types of rules• alternatives

• composition

• empty productions

input expression EOFexpression term rest_expressionterm IDENTIFIER | ‘(’ expression ‘)’rest_expression ‘+’ expression |

Answers

Answers (Fig 2.58, page 122)

• N w (w VT )

FIRST(N) = {w}

• N A1 | A2 | …

FIRST(N) = FIRST(Ai )

• N FIRST(N) = {}

• N A FIRST(N) = FIRST(A ) , FIRST(A)

FIRST(N) = FIRST(A ) \ {} FIRST() , otherwise

closure algorithm

Break

Predictive parsing

• similar to recursive descent, but no back-tracking

• functions “know” what they are doing

input expression EOF FIRST(expression) = {IDENT, ‘(‘}

void input(void) { switch (Token.class) { case IDENT: case '(': expression(); token(EOF); break; default: error();

}

}

void token(int tk) {

if (Token.class != tk) error();

get_next_token();

}

Predictive parsing

expression term rest_expression FIRST(term) = {IDENT, ‘(‘}

void expression(void) { switch (Token.class) { case IDENT: case '(': term(); rest_expression(); break; default: error();

}

}

term IDENTIFIER | ‘(’ expression ‘)’void term(void) { switch (Token.class) { case IDENT: token(IDENT); break; case '(': token('('); expression(); token(')'); break; default: error();

}

}

Predictive parsing

• FIRST() = {}• check nothing?

• NO: token FOLLOW(rest_expr)

rest_expression ‘+’ expression | FIRST(rest_expr) = {‘+’, }void rest_expression(void) { switch (Token.class) { case '+': token('+'); expression(); break; case EOF: case ')': break; default: error();

}

}

FOLLOW(rest_expr) = {EOF, ‘)’}

Limitations of LL(1) parsers

• FIRST/FIRST conflictterm IDENTIFIER | IDENTIFIER ‘[‘ expression ‘]’ | ‘(’ expression ‘)’

• FIRST/FOLLOW conflictS A ‘a’ ‘b’

A ‘a’ |

• left recursionexpression expression ‘-’ term | term

FIRST(A) = { ‘a’ } = FOLLOW(A)

Making grammars LL(1)

• manual labour• rewrite grammar

• adjust semantic actions

• three rewrite methods• left factoring

• substitution

• left-recursion removal

Left factoring

term IDENTIFIER

| IDENTIFIER ‘[‘ expression ‘]’

• factor out common prefix

term IDENTIFIER after_identifier

after_identifier | ‘[‘ expression ‘]’

‘[’ FOLLOW(after_identifier)

Substitution

S A ‘a’ ‘b’

A ‘a’ |

• replace non-terminal by its alternative

S ‘a’ ‘a’ ‘b’ | ‘a’ ‘b’

Left-recursion removal

N N |

• replace by

N M

M M |

• example

expression expression ‘-’ term | term

...

expression term tail

tail ‘-’ term tail |

N

Exercise (7 min.)

• make the following grammar LL(1)

expression expression ‘+’ term | expression ‘-’ term | termterm term ‘*’ factor | term ‘/’ factor | factorfactor ‘(‘ expression ‘)’ | func-call | identifier | constantfunc-call identifier ‘(‘ expr-list? ‘)’expr-list expression (‘,’ expression)*

• and what about

S if E then S (else S)?

Answers

Answers

• substitutionF ‘(‘ E ‘)’ | ID ‘(‘ expr-list? ‘)’ | ID | constant

• left factoringE E ( ‘+’ | ‘-’ ) T | TT T ( ‘*’ | ‘/’ ) F | FF ‘(‘ E ‘)’ | ID ( ‘(‘ expr-list? ‘)’ )? | constant

• left recursion removalE T (( ‘+’ | ‘-’ ) T )*T F (( ‘*’ | ‘/’ ) F )*

• if-then-else grammar is ambiguous

program text

lexical analysis

syntax analysis

context handling

annotated AST

tokens

AST

parser

generator

language

grammar

automatic generation

LL(1) push-down

automaton

LL(1) push-down automaton

• stack right-hand side of production

transition tablestate

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

LL(1) push-down automaton

aap + ( noot + mies ) EOF

input

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

LL(1) push-down automaton

aap + ( noot + mies ) EOF

input

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

replace non-terminal by transition entry

LL(1) push-down automaton

aap + ( noot + mies ) EOF

expression EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

LL(1) push-down automaton

aap + ( noot + mies ) EOF

expression EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

replace non-terminal by transition entry

LL(1) push-down automaton

aap + ( noot + mies ) EOF

term rest-expr EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

LL(1) push-down automaton

aap + ( noot + mies ) EOF

term rest-expr EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

replace non-terminal by transition entry

LL(1) push-down automaton

aap + ( noot + mies ) EOF

IDENT rest-expr EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

LL(1) push-down automaton

aap + ( noot + mies ) EOF

IDENT rest-expr EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

pop matching token

LL(1) push-down automaton

+ ( noot + mies ) EOF

rest-expr EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

LL(1) push-down automaton

+ ( noot + mies ) EOF

rest-expr EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

LL(1) push-down automaton

+ ( noot + mies ) EOF

+ expression EOF

input

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

LL(1) push-down automaton

+ ( noot + mies ) EOFinput

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

+ expression EOF

LL(1) push-down automaton

( noot + mies ) EOFinput

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

expression EOF

LL(1) push-down automaton

( noot + mies ) EOFinput

prediction stack

state

(top of stack)

look-ahead token

IDENT + ( ) EOF

input expression EOF expression EOF

expression term rest-expr term rest-expr

term IDENT ( expression )

rest-expr + expression

expression EOF

LLgen

• top-down parser generator

• to be used in assignment #1

• discussed in lecture 5

• syntax analysis: tokens AST

• top-down parsing• recursive descent• push-down automaton• making grammars LL(1)

Summary

program text

lexical analysis

syntax analysis

context handling

annotated AST

tokens

AST

parser

generator

language

grammar

Homework

• study sections:• 1.10 closure algorithm

• 2.2.4.6 error handling in LL(1) parsers

• print handout for next week [blackboard]

• find a partner for the “practicum”

• register your group• send e-mail to [email protected]