Compiler construction in4020 – lecture 2 Koen Langendoen Delft University of Technology The Netherlands

Page 1: Compiler construction in4020 –  lecture 2

Compiler construction
in4020 – lecture 2

Koen Langendoen

Delft University of Technology
The Netherlands

Page 2: Compiler construction in4020 –  lecture 2

Summary of lecture 1

• compiler is a structured toolbox

• front-end: program text → annotated AST

• back-end: annotated AST → executable code

• lexical analysis: program text → tokens
  • token specifications
  • implementation by hand

[Figure: compiler structure. program in some source language → front-end (analysis) → semantic representation → back-end (synthesis) → executable code for target machine]

Page 3: Compiler construction in4020 –  lecture 2

Quiz

2.7 What does the regular expression a?* mean? And a** ?

Are these expressions erroneous?

Are they ambiguous?

Page 4: Compiler construction in4020 –  lecture 2

Overview

• Generating a lexical analyzer
  • generic methods
  • specific tool lex

[Figure: front-end pipeline. program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a scanner generator produces the lexical analyzer from a token description]

Page 5: Compiler construction in4020 –  lecture 2

Token description

• (f)lex: scanner generator for UNIX
  • token description → C code

• format of the lex input file:

    definitions      (regular descriptions)
    %%
    rules            (regular expressions + actions)
    %%
    user code        (auxiliary C code)

Page 6: Compiler construction in4020 –  lecture 2

Lex description to recognize integers

• an integer is a non-empty sequence of digits, optionally followed by a letter denoting the base class (b for binary and o for octal).
  • base → [bo]
  • integer → digit+ base?

• rule = expr + action

• {} signal application of a description

• white space separates the expression from its action, so the expression itself contains none

%{
#include "lex.h"
%}
base    [bo]
digit   [0-9]
%%
{digit}+{base}?    {return INTEGER;}
%%
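For comparison with the generated scanner, the same token can be recognized by hand. The following is only a sketch: the function name match_integer and its return convention (length of the match, 0 for no match) are assumptions made for illustration, not part of lex.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the longest prefix of s that matches
 * digit+ base?  (base is 'b' or 'o'), or 0 if s does not start
 * with a digit. Hypothetical helper, not generated by lex. */
size_t match_integer(const char *s)
{
    size_t n = 0;

    while (isdigit((unsigned char)s[n]))   /* digit+ */
        n++;
    if (n == 0)
        return 0;                          /* at least one digit is required */
    if (s[n] == 'b' || s[n] == 'o')        /* base? */
        n++;
    return n;
}
```

Calling match_integer("101b") consumes all four characters, while match_integer("b1") reports no match because the base letter alone is not an integer.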

Page 7: Compiler construction in4020 –  lecture 2

Lex: resulting C code

• char yytext[]; /* token representation */

• int yylex(void); /* returns type of next token */

• wrapper function to add token attributes

%%
\n      {line_number++;}
%%
#include <string.h>                      /* strdup */

void get_next_token(void) {
    Token.class = yylex();
    if (Token.class == 0) {              /* yylex() returns 0 at end of input */
        Token.class = EOF;
        Token.repr = "<EOF>";
        return;
    }
    Token.pos.line_number = line_number;
    Token.repr = strdup(yytext);         /* copy, because yytext is reused for the next token */
}
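The wrapper relies on declarations that the slide leaves to lex.h. A possible shape for them is sketched below. The type name Token_type, the token code INTEGER = 257 and the stub scanner are all assumptions, added only so that the wrapper can be tried without running lex; copy_string is a stand-in for strdup.

```c
#include <stdio.h>    /* EOF */
#include <stdlib.h>   /* malloc */
#include <string.h>   /* strlen, memcpy, strcpy */

/* Hypothetical token record; only the field names come from the slide. */
typedef struct {
    int class;                        /* token type returned by yylex() */
    const char *repr;                 /* text of the token */
    struct { int line_number; } pos;  /* position of the token */
} Token_type;

Token_type Token;
int line_number = 1;

/* Stub scanner standing in for the lex-generated yylex() and yytext. */
#define INTEGER 257                   /* hypothetical token code */
char yytext[32];
static const char *stub_tokens[] = { "42", "7b", NULL };
static int stub_next = 0;

int yylex(void)
{
    if (stub_tokens[stub_next] == NULL)
        return 0;                     /* 0 signals end of input */
    strcpy(yytext, stub_tokens[stub_next++]);
    return INTEGER;
}

/* Stand-in for strdup(), which older ISO C standards do not define. */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

void get_next_token(void)
{
    Token.class = yylex();
    if (Token.class == 0) {
        Token.class = EOF;
        Token.repr = "<EOF>";
        return;
    }
    Token.pos.line_number = line_number;
    Token.repr = copy_string(yytext);
}
```

Each call to get_next_token() fills the global Token; after the two stub tokens are consumed, Token.class becomes EOF.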

Page 8: Compiler construction in4020 –  lecture 2

Automatic generation

[Figure: front-end pipeline (program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST); the scanner generator turns the token description into a finite state automaton with states S0–S3 and transitions labelled digit and ‘.’]

Page 9: Compiler construction in4020 –  lecture 2

Finite-state automaton

• Recognize input character by character

• Transfer between states

• FSA
  • initial state S0
  • set of accepting states
  • transition function: State × Char → State

[Figure: S0 –‘i’→ S1 –‘f’→ S2]
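The three ingredients of an FSA map directly onto code. The sketch below assumes the pictured automaton recognizes the two-character string "if" with S2 as its only accepting state; the names next_state, accepts_if and ERR are illustrative.

```c
enum { S0, S1, S2, ERR = -1 };

/* transition function: State x Char -> State (ERR if no transition) */
static int next_state(int state, char ch)
{
    if (state == S0 && ch == 'i') return S1;
    if (state == S1 && ch == 'f') return S2;
    return ERR;
}

/* Feeds s to the FSA character by character, starting in S0.
 * Returns 1 if the FSA ends in the accepting state S2, else 0. */
int accepts_if(const char *s)
{
    int state = S0;

    for (; *s != '\0' && state != ERR; s++)
        state = next_state(state, *s);
    return state == S2;
}
```

Only the exact input "if" is accepted; a shorter or longer string ends in a non-accepting state or in ERR.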

Page 10: Compiler construction in4020 –  lecture 2

FSA examples

• integral_number → [0-9]+

[Figure: S0 –digit→ S1, S1 –digit→ S1]

• fixed_point_number → [0-9]* ‘.’ [0-9]+

[Figure: S0 –digit→ S0, S0 –‘.’→ S2, S2 –digit→ S3, S3 –digit→ S3]

Page 11: Compiler construction in4020 –  lecture 2

Concurrent recognition

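• integral_number → [0-9]+

• fixed_point_number → [0-9]* ‘.’ [0-9]+

• recognize both tokens in one pass

[Figure: the two separate FSAs. Integral number: S0 –digit→ S1, S1 –digit→ S1. Fixed-point number: S0 –digit→ S0, S0 –‘.’→ S2, S2 –digit→ S3, S3 –digit→ S3]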

Page 12: Compiler construction in4020 –  lecture 2

Concurrent recognition

• integral_number → [0-9]+

• fixed_point_number → [0-9]* ‘.’ [0-9]+

• naïve approach: merge initial states

[Figure: merged FSA: S0 –digit→ S0, S0 –digit→ S1, S1 –digit→ S1, S0 –‘.’→ S2, S2 –digit→ S3, S3 –digit→ S3]

Page 13: Compiler construction in4020 –  lecture 2

Concurrent recognition

• integral_number → [0-9]+

• fixed_point_number → [0-9]* ‘.’ [0-9]+

• correct approach: share common prefix transitions

[Figure: combined FSA: S0 –digit→ S1, S1 –digit→ S1, S0 –‘.’→ S2, S1 –‘.’→ S2, S2 –digit→ S3, S3 –digit→ S3]

Page 14: Compiler construction in4020 –  lecture 2

FSA implementation: transition table

• concurrent recognition of integers and fixed point numbers

        character
state   digit   dot   other   recognized token
S0      S1      S2    -
S1      S1      S2    -       integer
S2      S3      -     -
S3      S3      -     -       fixed point

[Figure: combined FSA: S0 –digit→ S1, S1 –digit→ S1, S0 –‘.’→ S2, S1 –‘.’→ S2, S2 –digit→ S3, S3 –digit→ S3]
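A table like this can drive a scanner directly. The following sketch encodes the table as a C array and keeps scanning while a transition exists, remembering the last accepting state (longest match). The function name scan, the token codes and the longest-match policy are assumptions for illustration; the slide itself only gives the table.

```c
#include <ctype.h>
#include <stddef.h>

enum { S0, S1, S2, S3, NSTATES, NONE = -1 };
enum { DIGIT, DOT, OTHER, NCLASSES };
enum { NO_TOKEN, INTEGER, FIXED_POINT };

/* The transition table of the slide; NONE stands for "-". */
static const int next_state[NSTATES][NCLASSES] = {
    /*         digit  dot   other */
    /* S0 */ { S1,    S2,   NONE },
    /* S1 */ { S1,    S2,   NONE },
    /* S2 */ { S3,    NONE, NONE },
    /* S3 */ { S3,    NONE, NONE },
};

/* The "recognized token" column. */
static const int recognized[NSTATES] = {
    NO_TOKEN, INTEGER, NO_TOKEN, FIXED_POINT
};

static int char_class(char ch)
{
    if (isdigit((unsigned char)ch)) return DIGIT;
    if (ch == '.') return DOT;
    return OTHER;
}

/* Scans the longest token at the start of s. Returns its type and
 * stores its length in *len (NO_TOKEN and 0 if nothing matches). */
int scan(const char *s, size_t *len)
{
    int state = S0, token = NO_TOKEN;
    size_t i = 0;

    *len = 0;
    while (s[i] != '\0') {
        state = next_state[state][char_class(s[i])];
        if (state == NONE)
            break;                      /* no transition: stop */
        i++;
        if (recognized[state] != NO_TOKEN) {
            token = recognized[state];  /* remember last accepting state */
            *len = i;
        }
    }
    return token;
}
```

For the input "12." the scanner passes through S2, which is not accepting, so it falls back to the integer "12" of length 2.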

Page 15: Compiler construction in4020 –  lecture 2

FSA exercise (6 min.)

• draw an FSA to recognize integers

  base → [bo]
  integer → digit+ base?

• draw an FSA to recognize the regular expression (a|b)*bab

Page 16: Compiler construction in4020 –  lecture 2

Answers

Page 17: Compiler construction in4020 –  lecture 2

Answers

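• integer

[Figure: S0 –digit→ S1, S1 –digit→ S1, S1 –[bo]→ S2]

• (a|b)*bab

[Figure: FSA with four states S0–S3, each with one outgoing a transition and one outgoing b transition]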
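One deterministic four-state automaton for (a|b)*bab can be written down as a table. This is a sketch of a standard construction and is not necessarily state-for-state identical to the drawing: each state records how much of "bab" the most recent input characters match. The names delta and matches_bab are illustrative.

```c
/* State k means: the longest suffix of the input read so far that is
 * also a prefix of "bab" has length k. State 3 ("bab") is accepting. */
static const int delta[4][2] = {
    /*         a  b */
    /* S0 */ { 0, 1 },
    /* S1 */ { 2, 1 },
    /* S2 */ { 0, 3 },
    /* S3 */ { 2, 1 },
};

/* Returns 1 if s consists of a's and b's only and ends in "bab". */
int matches_bab(const char *s)
{
    int state = 0;

    for (; *s != '\0'; s++) {
        if (*s != 'a' && *s != 'b')
            return 0;                    /* character outside the alphabet */
        state = delta[state][*s == 'b']; /* column 0 for a, column 1 for b */
    }
    return state == 3;
}
```

The input "babab" is accepted: after the first "bab" the automaton is in S3, the following a leads to S2 and the final b back to S3.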

Page 18: Compiler construction in4020 –  lecture 2

Break

Page 19: Compiler construction in4020 –  lecture 2

Automatic generation: description → FSA

• start with initial set (S0) of all token descriptions to be recognized

• for each character (ch)
  • find the set (Sch) of descriptions that can start with ch
  • extend the FSA with transition (S0, ch, Sch)

• repeat adding transitions (to Sch) until no new set is generated

Page 20: Compiler construction in4020 –  lecture 2

Dotted items

• keeping track of matched characters in a token description T → R, where R is a regular expression:

    T → α • β

  α: the part of the regular expression already matched against the input
  β: the part still to be matched

Page 21: Compiler construction in4020 –  lecture 2

Types of dotted items

• shift item: dot in front of a basic pattern
  • if → • ‘i’ ‘f’
  • if → ‘i’ • ‘f’
  • identifier → • [a-z] [a-z0-9]*

• reduce item: dot at the end
  • if → ‘i’ ‘f’ •
  • identifier → [a-z] [a-z0-9]* •

• non-basic item: dot in front of repeated pattern or parenthesis
  • identifier → [a-z] • [a-z0-9]*

Page 22: Compiler construction in4020 –  lecture 2

Character moves

before reading input character c:   T → α • c β

after reading input character c:    T → α c • β

Page 23: Compiler construction in4020 –  lecture 2

Character moves

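• T → α • c β
• T → α • [class] β
• T → α • . β

On input character c:

  T → α • c β          becomes   T → α c • β
  T → α • [class] β    becomes   T → α [class] • β    (if c is in class)
  T → α • . β          becomes   T → α . • β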

Page 24: Compiler construction in4020 –  lecture 2

ε moves

T → α • (R)? β     ⇒   T → α (R)? • β
                       T → α ( • R)? β

T → α • (R)* β     ⇒   T → α (R)* • β
                       T → α ( • R)* β

T → α (R • )* β    ⇒   T → α (R)* • β
                       T → α ( • R)* β

T → α (R • )? β    ⇒   T → α (R)? • β

Page 25: Compiler construction in4020 –  lecture 2

ε moves

T → α • (R)+ β           ⇒   T → α ( • R)+ β

T → α (R • )+ β          ⇒   T → α (R)+ • β
                             T → α ( • R)+ β

T → α • (R1|R2|…) β      ⇒   T → α ( • R1|R2|…) β
                             T → α (R1| • R2|…) β
                             …

T → α (R1 • |R2|…) β     ⇒   T → α (R1|R2|…) • β
…                            …

Page 26: Compiler construction in4020 –  lecture 2

FSA construction

• a state corresponds to a set of basic items

• a character move yields a new set

• expand non-basic items into basic items using ε moves

• see if the resulting set was produced before; if not, introduce a new state

• add transition

Page 27: Compiler construction in4020 –  lecture 2

Example FSA construction

• tokens
  • integer: I → (D)+
  • fixed-point: F → (D)* ‘.’ (D)+

• initial state

    I → • (D)+
    F → • (D)* ‘.’ (D)+

  ε moves yield state S0:

    I → ( • D)+
    F → ( • D)* ‘.’ (D)+
    F → (D)* • ‘.’ (D)+

Page 28: Compiler construction in4020 –  lecture 2

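Example FSA construction

• character moves

S0:
    I → ( • D)+
    F → ( • D)* ‘.’ (D)+
    F → (D)* • ‘.’ (D)+

S0 –D→ S1: the character move gives I → (D • )+ and F → (D • )* ‘.’ (D)+; after ε moves:
    I → (D)+ •
    I → ( • D)+
    F → ( • D)* ‘.’ (D)+
    F → (D)* • ‘.’ (D)+

S0 –‘.’→ S2: the character move gives F → (D)* ‘.’ • (D)+; after ε moves:
    F → (D)* ‘.’ ( • D)+

S2 –D→ S3: the character move gives F → (D)* ‘.’ (D • )+; after ε moves:
    F → (D)* ‘.’ (D)+ •
    F → (D)* ‘.’ ( • D)+

Remaining transitions: S1 –D→ S1, S1 –‘.’→ S2, S3 –D→ S3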
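The construction loop for this example can be sketched in C. To keep it short, the six basic items are numbered by hand and the ε moves that follow each character move are precomputed in move_item, so the program only performs the set bookkeeping (new set, seen before, add transition). All identifiers are illustrative assumptions, and a real generator would derive the items from the regular expressions instead.

```c
/* Basic items of I → (D)+ and F → (D)* '.' (D)+, one bit each. */
enum {
    I_SHIFT_D   = 1 << 0,   /* I → ( • D)+            */
    I_REDUCE    = 1 << 1,   /* I → (D)+ •             */
    F_SHIFT_D1  = 1 << 2,   /* F → ( • D)* '.' (D)+   */
    F_SHIFT_DOT = 1 << 3,   /* F → (D)* • '.' (D)+    */
    F_SHIFT_D2  = 1 << 4,   /* F → (D)* '.' ( • D)+   */
    F_REDUCE    = 1 << 5    /* F → (D)* '.' (D)+ •    */
};
enum { DIGIT, DOT, NCLASSES };
#define NITEMS    6
#define MAXSTATES 64        /* at most 2^6 different item sets */

/* Character move of one basic item over a character of class cls,
 * followed by its ε moves (worked out by hand for this example). */
static unsigned move_item(unsigned item, int cls)
{
    if (cls == DIGIT) {
        if (item == I_SHIFT_D)  return I_REDUCE | I_SHIFT_D;
        if (item == F_SHIFT_D1) return F_SHIFT_DOT | F_SHIFT_D1;
        if (item == F_SHIFT_D2) return F_REDUCE | F_SHIFT_D2;
    } else if (item == F_SHIFT_DOT) {
        return F_SHIFT_D2;
    }
    return 0;               /* the item cannot move over this character */
}

unsigned states[MAXSTATES];       /* item set of each state */
int trans[MAXSTATES][NCLASSES];   /* transition table, -1 = no transition */

/* Builds the FSA; returns the number of states generated. */
int build_fsa(void)
{
    int nstates = 0, s, c, i, t;

    states[nstates++] = I_SHIFT_D | F_SHIFT_D1 | F_SHIFT_DOT;   /* S0 */
    for (s = 0; s < nstates; s++) {          /* states still to process */
        for (c = 0; c < NCLASSES; c++) {
            unsigned set = 0;
            for (i = 0; i < NITEMS; i++)     /* move every item in the set */
                if (states[s] & (1u << i))
                    set |= move_item(1u << i, c);
            trans[s][c] = -1;
            if (set == 0)
                continue;
            for (t = 0; t < nstates && states[t] != set; t++)
                ;                            /* was this set produced before? */
            if (t == nstates)
                states[nstates++] = set;     /* no: introduce a new state */
            trans[s][c] = t;                 /* add transition */
        }
    }
    return nstates;
}
```

Running build_fsa() yields four states, numbered in the order S0, S1, S2, S3 of the example, with the same transitions as in the transition table shown earlier.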

Page 29: Compiler construction in4020 –  lecture 2

Exercise FSA construction (7 min.)

• draw the FSA (with item sets) for recognizing an identifier:

  identifier → letter (letter_or_digit_or_und* letter_or_digit+)?

• extend the above FSA to recognize the keyword ‘if’ as well.

  if → ‘i’ ‘f’

Page 30: Compiler construction in4020 –  lecture 2

Answers

Page 31: Compiler construction in4020 –  lecture 2

Answers

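S0:
    ID → • L ((LDU)* (LD)+)?

S1:
    ID → L ((LDU)* (LD)+)? •
    ID → L (( • LDU)* (LD)+)?
    ID → L ((LDU)* ( • LD)+)?

S2:
    ID → L (( • LDU)* (LD)+)?
    ID → L ((LDU)* ( • LD)+)?

[Figure: FSA with states S0–S4; transitions are labelled L, LD, LDU, U, ‘i’ and ‘f’; S3 and S4 are the states added for the keyword if]

accepting states S1 and S4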

Page 32: Compiler construction in4020 –  lecture 2

Transition table compression

        character
state   ‘i’   ‘f’   L    D    U    recognized token
S0      S3    S1    S1   -    -
S1      S1    S1    S1   S1   S2   identifier
S2      S1    S1    S1   S1   S2
S3      S1    S4    S1   S1   S2
S4      S1    S1    S1   S1   S2   keyword if

Page 33: Compiler construction in4020 –  lecture 2

Transition table compression

• redundant rows

• empty transitions

        character
state   ‘i’   ‘f’   L    D    U    recognized token
S0      S3    S1    S1   -    -
S1      S1    S1    S1   S1   S2   identifier
S2      S1    S1    S1   S1   S2
S3      S1    S4    S1   S1   S2
S4      S1    S1    S1   S1   S2   keyword if

[Figure: row displacement; the rows of S0, S1, S2, S4 and S3 are shifted and overlaid in a single one-dimensional array]
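Both observations can be exploited in code. The sketch below packs the table above into one array using a simple first-fit row displacement scheme with a check array, and lets identical rows share one copy. The first-fit strategy, the check array and all identifiers are assumptions for illustration; the packing it produces need not match the layout drawn on the slide.

```c
#include <string.h>

#define NSTATES 5
#define NCHARS  5                    /* columns: 'i', 'f', L, D, U */
#define MAXLEN  (NSTATES * NCHARS)
#define EMPTY   (-1)

/* Transition table of the slide; EMPTY stands for "-". */
static const int table[NSTATES][NCHARS] = {
    /* S0 */ { 3, 1, 1, EMPTY, EMPTY },
    /* S1 */ { 1, 1, 1, 1, 2 },
    /* S2 */ { 1, 1, 1, 1, 2 },
    /* S3 */ { 1, 4, 1, 1, 2 },
    /* S4 */ { 1, 1, 1, 1, 2 },
};

int entry[MAXLEN], check[MAXLEN];    /* packed transitions and the row owning each slot */
int disp[NSTATES], owner[NSTATES];   /* displacement and owning row per state */

/* Can row s be placed at displacement d without hitting a used slot? */
static int fits(int s, int d)
{
    int c;
    for (c = 0; c < NCHARS; c++)
        if (table[s][c] != EMPTY &&
            (d + c >= MAXLEN || check[d + c] != EMPTY))
            return 0;
    return 1;
}

/* Packs all rows; returns the number of array slots in use. */
int compress(void)
{
    int s, t, c, d, used = 0;

    for (c = 0; c < MAXLEN; c++)
        entry[c] = check[c] = EMPTY;
    for (s = 0; s < NSTATES; s++) {
        for (t = 0; t < s; t++)      /* redundant row: look for an identical earlier one */
            if (memcmp(table[s], table[t], sizeof table[s]) == 0)
                break;
        if (t < s) {                 /* share the earlier row's slots */
            disp[s] = disp[t];
            owner[s] = owner[t];
            continue;
        }
        for (d = 0; !fits(s, d); d++)
            ;                        /* first displacement without a collision */
        disp[s] = d;
        owner[s] = s;
        for (c = 0; c < NCHARS; c++)
            if (table[s][c] != EMPTY) {
                entry[d + c] = table[s][c];
                check[d + c] = s;
                if (d + c + 1 > used)
                    used = d + c + 1;
            }
    }
    return used;
}

/* Transition of state s on character column c, or EMPTY. */
int lookup(int s, int c)
{
    int i = disp[s] + c;
    return (i < MAXLEN && check[i] == owner[s]) ? entry[i] : EMPTY;
}
```

With this table, compress() uses 13 slots instead of the 25 of the full table: the row of S0 fills slots 0 to 2, the shared row of S1, S2 and S4 fills slots 3 to 7, and the row of S3 fills slots 8 to 12. The check array is what lets lookup() report the empty transitions of S0 even though other rows occupy those slots.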

Page 34: Compiler construction in4020 –  lecture 2

Summary: generating a lexical analyzer

• tool: lex
  • token descriptions + actions
  • wrapper interface

• FSA construction
  • dotted items
  • character moves
  • ε moves

[Figure: front-end pipeline. program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a scanner generator produces the lexical analyzer from a token description]

Page 35: Compiler construction in4020 –  lecture 2

Homework

• study sections 2.1.10 – 2.1.12
  • lexical identification of tokens
  • symbol tables
  • macro processing

• print handout lecture 3 [blackboard]

• find a partner for the “practicum”
  • register your group
  • send e-mail to [email protected]