UNIT - III INTRODUCTION TO COMPILERS

Phase structure of Compiler and entire compilation process.

Lexical Analyzer: The Role of the Lexical Analyzer, Input Buffering, Specification of Tokens, Recognition of Tokens, Design of Lexical Analyzer using Uniform Symbol Table, Lexical Errors.

LEX: LEX Specification, Generation of Lexical Analyzer by LEX.


What is a Compiler?

• A compiler is software that translates a program written in a high-level language (the source language) into a low-level language (the object, target, or machine language).


Symbol Table

• Symbol Table – A data structure built and maintained by the compiler that holds the names of all identifiers along with their types. It helps the compiler work efficiently by letting it find identifiers quickly.
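A symbol table is often implemented as a hash table keyed on the identifier's name. The sketch below assumes a fixed number of buckets with separate chaining and stores only a name and a type string; the names `sym_insert` and `sym_lookup` are hypothetical, and a real compiler would store more attributes (scope, storage location, size).

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211              /* a prime number of buckets */

/* One entry: an identifier's name and its type. */
typedef struct Symbol {
    char *name;
    char *type;
    struct Symbol *next;            /* next entry in the same bucket */
} Symbol;

static Symbol *table[TABLE_SIZE];

/* Simple multiplicative string hash (djb2 style). */
static unsigned hash(const char *s) {
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

static char *copy_string(const char *s) {
    char *c = malloc(strlen(s) + 1);
    if (c != NULL)
        strcpy(c, s);
    return c;
}

/* Return the entry for name, or NULL if it is not in the table. */
Symbol *sym_lookup(const char *name) {
    for (Symbol *p = table[hash(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Insert name with the given type. If name is already present, the
   existing entry is returned unchanged. Returns NULL if out of memory. */
Symbol *sym_insert(const char *name, const char *type) {
    Symbol *p = sym_lookup(name);
    if (p != NULL)
        return p;
    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->name = copy_string(name);
    p->type = copy_string(type);
    if (p->name == NULL || p->type == NULL) {
        free(p->name);
        free(p->type);
        free(p);
        return NULL;
    }
    unsigned h = hash(name);
    p->next = table[h];             /* push onto the front of the bucket */
    table[h] = p;
    return p;
}
```

Lookups and insertions take expected constant time when the identifiers spread evenly over the buckets, which is why hashing is a common choice here.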


Phases of a Compiler

• The structure of a compiler has two parts:

1. Analysis phase (front end)

2. Synthesis phase (back end)

• The front end consists of the lexical analyzer, syntax analyzer, semantic analyzer, and intermediate code generator. The remaining phases (code optimizer and target code generator) form the back end.


1. Lexical Analyzer

• Lexical Analyzer – It reads the source program as a stream of characters, groups the characters into lexemes, and produces a token for each lexeme. The patterns for tokens are described by regular expressions. It also removes white space and comments.

• Example: x := z + y

• This statement contains 5 tokens: x, :=, z, +, and y. After this stage it becomes id1 assign id2 binop id3.
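The example above can be reproduced with a small hand-written scanner. This sketch assumes a toy language containing only identifiers, the assignment operator `:=`, and `+` (reported as `binop`). Identifiers are numbered in order of first appearance, which stands in for their symbol-table index; the function name `tokenize` is hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_IDS 32
#define MAX_LEN 32

/* Distinct identifier names seen so far; index + 1 is the id number. */
static char ids[MAX_IDS][MAX_LEN];
static int id_count = 0;

/* Return the 1-based number of identifier name, adding it if new;
   -1 if the table is full. */
static int id_number(const char *name) {
    for (int i = 0; i < id_count; i++)
        if (strcmp(ids[i], name) == 0)
            return i + 1;
    if (id_count == MAX_IDS)
        return -1;
    strcpy(ids[id_count], name);    /* name is shorter than MAX_LEN */
    return ++id_count;
}

/* Write the token stream for src into out, separated by spaces.
   Returns 0 on success, -1 on an unrecognized character or overflow. */
int tokenize(const char *src, char *out, size_t outsize) {
    size_t used = 0;
    out[0] = '\0';
    while (*src != '\0') {
        char tok[16];
        if (isspace((unsigned char)*src)) {          /* skip white space */
            src++;
            continue;
        }
        if (isalpha((unsigned char)*src)) {          /* identifier */
            char name[MAX_LEN];
            int n = 0;
            while (isalnum((unsigned char)*src)) {
                if (n < MAX_LEN - 1)
                    name[n++] = *src;                /* extra characters are dropped */
                src++;
            }
            name[n] = '\0';
            int k = id_number(name);
            if (k < 0)
                return -1;
            snprintf(tok, sizeof tok, "id%d", k);
        } else if (src[0] == ':' && src[1] == '=') { /* assignment */
            strcpy(tok, "assign");
            src += 2;
        } else if (*src == '+') {                    /* binary operator */
            strcpy(tok, "binop");
            src++;
        } else {
            return -1;                               /* matches no token */
        }
        size_t len = strlen(tok);
        if (used + (used ? 1 : 0) + len + 1 > outsize)
            return -1;                               /* output buffer too small */
        if (used)
            out[used++] = ' ';
        memcpy(out + used, tok, len + 1);
        used += len;
    }
    return 0;
}
```

Calling `tokenize("x:=z+y", buf, sizeof buf)` leaves `id1 assign id2 binop id3` in `buf`.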


Tokens, Patterns, and Lexemes

Token      Sample Lexemes            Informal Description of Pattern
const      const                     const
if         if                        if
relation   <, <=, =, <>, >, >=       < or <= or = or <> or >= or >
id         pi, count, D2             letter followed by letters and digits
num        3.1416, 0, 6.02E23        any numeric constant
literal    "core dumped"             any characters between " and " except "

• The token classifies the pattern.

• The actual lexeme values are critical. This information is:

1. Stored in the symbol table

2. Returned to the parser
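The `relation` token groups six lexemes. A common way to recognize them is a small transition diagram: read one character, then look ahead one more to decide between, for example, `<`, `<=`, and `<>`. The sketch below hand-codes such a diagram; the enum names and the function `match_relop` are hypothetical.

```c
/* Attribute values for the relation token. */
typedef enum { LT, LE, EQ, NE, GT, GE } RelOp;

/* Try to match a relational operator at the start of s.
   On success, store the operator in *op and return the number of
   characters consumed (1 or 2); return 0 if s does not start with one. */
int match_relop(const char *s, RelOp *op) {
    switch (s[0]) {
    case '<':
        if (s[1] == '=') { *op = LE; return 2; }
        if (s[1] == '>') { *op = NE; return 2; }
        *op = LT;
        return 1;
    case '=':
        *op = EQ;
        return 1;
    case '>':
        if (s[1] == '=') { *op = GE; return 2; }
        *op = GT;
        return 1;
    default:
        return 0;                   /* not a relational operator */
    }
}
```

Returning the length lets the caller advance past the lexeme, while the `RelOp` value plays the role of the token's attribute that is passed on to the parser.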


2. Syntax Analyzer

• It is also called the parser. It constructs the parse tree.

• It takes the tokens one by one and uses a context-free grammar (CFG) to construct the parse tree.

• Why a grammar? The syntax rules of a programming language can be represented completely by a small number of productions. Using these productions we can describe what a valid program looks like, and so check whether the input is in the required form.

• A syntax error is detected at this stage if the input does not conform to the grammar.
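To see how a grammar lets the parser detect syntax errors, consider a tiny expression grammar, assumed here for illustration:

E -> T { + T }
T -> F { * F }
F -> id | ( E )

Below is a recursive-descent recognizer for it, with one function per nonterminal. To keep it short it works on single characters and treats any letter as an `id` token; it only reports whether the input is syntactically valid and does not build the parse tree. The function names are hypothetical.

```c
#include <ctype.h>

/* Current position in the input (one character = one token here). */
static const char *pos;

static int parse_E(void);

/* F -> id | ( E ) */
static int parse_F(void) {
    if (isalpha((unsigned char)*pos)) {
        pos++;
        return 1;
    }
    if (*pos == '(') {
        pos++;
        if (!parse_E() || *pos != ')')
            return 0;               /* missing expression or ')' */
        pos++;
        return 1;
    }
    return 0;                       /* expected id or '(' */
}

/* T -> F { * F } */
static int parse_T(void) {
    if (!parse_F())
        return 0;
    while (*pos == '*') {
        pos++;
        if (!parse_F())
            return 0;
    }
    return 1;
}

/* E -> T { + T } */
static int parse_E(void) {
    if (!parse_T())
        return 0;
    while (*pos == '+') {
        pos++;
        if (!parse_T())
            return 0;
    }
    return 1;
}

/* Return 1 if the whole input is a valid E, 0 on a syntax error. */
int is_valid_expression(const char *input) {
    pos = input;
    return parse_E() && *pos == '\0';
}
```

For example, `(a+b)*c` is accepted, while `a+*b` and `(a+b` are rejected because at some point the next token does not fit any production.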


3. Semantic Analyzer

• Semantic Analyzer – It checks whether the parse tree is meaningful, and produces a verified parse tree.

• Semantic analysis deals with type checking and other constraints, using the language's semantic rules.
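As a small illustration of type checking, the sketch below computes the type of a binary arithmetic expression from the types of its operands. The rules are assumptions chosen for illustration, not those of any particular language: int with int gives int, mixing int and float gives float (an implicit conversion), and arithmetic involving a string is a type error. The type names and `check_arith` are hypothetical.

```c
/* Types known to this toy checker. */
typedef enum { T_INT, T_FLOAT, T_STRING, T_ERROR } Type;

/* Result type of an arithmetic operator (+, -, *, /) applied to
   operands of types a and b, or T_ERROR if they are not allowed. */
Type check_arith(Type a, Type b) {
    if (a == T_ERROR || b == T_ERROR)
        return T_ERROR;             /* propagate an earlier error */
    if (a == T_STRING || b == T_STRING)
        return T_ERROR;             /* arithmetic on strings is rejected */
    if (a == T_FLOAT || b == T_FLOAT)
        return T_FLOAT;             /* int operand is widened to float */
    return T_INT;
}
```

A semantic analyzer would call such a function at each operator node of the parse tree, bottom-up, and report an error where the result is `T_ERROR`.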


4. Intermediate Code Generator

• It acts as a bridge between the analysis phase and the synthesis phase of the compilation process.

• It generates intermediate code, a machine-independent form that can be easily translated into target machine code. There are many popular intermediate code forms.

• Examples: three-address code (TAC), quadruples, triples, postfix notation, etc.

• Intermediate code is converted to machine language by the last two phases, which are platform dependent. Up to the intermediate code, the process is the same for every target platform; after that, it depends on the platform. So, to build a compiler for a new platform, we don't need to build it from scratch: we can reuse the phases up to intermediate code from an existing compiler and build only the last two parts.
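For example, three-address code for the statement a = b + c * d can be produced by walking the expression tree bottom-up and giving each operator node a new temporary. The sketch below assumes the tree has already been built by the parser and that names are shorter than 16 characters; the node layout, `gen_assign`, and the temporary names t1, t2, ... are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Expression tree node: a leaf holds a name, an interior node an operator. */
typedef struct Node {
    char op;                        /* '+', '*', ... or 0 for a leaf */
    const char *name;               /* variable name for leaves */
    struct Node *left, *right;
} Node;

static int temp_count = 0;

/* Append three-address code for n to out; write the name that holds
   n's value (a variable or a new temporary) into result. */
static void gen_expr(const Node *n, char *out, size_t outsize, char *result) {
    if (n->op == 0) {
        strcpy(result, n->name);
        return;
    }
    char l[16], r[16];
    gen_expr(n->left, out, outsize, l);
    gen_expr(n->right, out, outsize, r);
    sprintf(result, "t%d", ++temp_count);
    size_t used = strlen(out);
    snprintf(out + used, outsize - used, "%s = %s %c %s\n", result, l, n->op, r);
}

/* Emit three-address code for the assignment target = expr into out. */
void gen_assign(const char *target, const Node *expr, char *out, size_t outsize) {
    char r[16];
    out[0] = '\0';
    temp_count = 0;
    gen_expr(expr, out, outsize, r);
    size_t used = strlen(out);
    snprintf(out + used, outsize - used, "%s = %s\n", target, r);
}
```

For the tree of b + (c * d) assigned to a, the output is the three lines t1 = c * d, t2 = b + t1, and a = t2.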


5. Code Optimizer

• Code Optimizer – It transforms the code so that it consumes fewer resources and runs faster.

• The meaning of the code being transformed is not altered.

• Optimization can be categorized into two types: machine dependent and machine independent.

• Optimization techniques:

1. Removing redundant identifiers

2. Removing unreachable sections of code

3. Identifying common subexpressions

4. Unfolding loops

5. Eliminating procedures
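Identifying common subexpressions can be sketched on a list of three-address instructions (quadruples). The sketch assumes a single basic block in which each name is assigned at most once, so an earlier result stays valid; when an instruction recomputes the same operator on the same operands, it is replaced by a copy of the earlier result. It does not treat b*c and c*b as equal. The `Quad` layout and `eliminate_cse` are hypothetical.

```c
#include <string.h>

/* Quadruple: result = arg1 op arg2; op == '=' means a copy result = arg1. */
typedef struct {
    char op;
    char arg1[8], arg2[8], result[8];
} Quad;

/* Replace recomputations of an earlier expression by copies.
   Assumes no name is assigned more than once in the block.
   Returns the number of instructions rewritten. */
int eliminate_cse(Quad *code, int n) {
    int changed = 0;
    for (int i = 0; i < n; i++) {
        if (code[i].op == '=')
            continue;               /* copies are left alone */
        for (int j = 0; j < i; j++) {
            if (code[j].op == code[i].op &&
                strcmp(code[j].arg1, code[i].arg1) == 0 &&
                strcmp(code[j].arg2, code[i].arg2) == 0) {
                /* Same computation already done: reuse its result. */
                code[i].op = '=';
                strcpy(code[i].arg1, code[j].result);
                code[i].arg2[0] = '\0';
                changed++;
                break;
            }
        }
    }
    return changed;
}
```

For the block t1 = b * c, t2 = b * c, t3 = t1 + t2, the second instruction becomes the copy t2 = t1, so b * c is computed only once.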


6. Target Code Generator

• Target Code Generator – Its main purpose is to translate the intermediate code into code that the machine can understand.

• The output depends on the type of assembler.

• This is the final stage of compilation.
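To make this step concrete, here is a sketch that translates one three-address instruction of the form result = arg1 op arg2 into code for a hypothetical single-register machine with LOAD, ADD, SUB, MUL, DIV, and STORE instructions. The instruction set and `gen_target` are assumptions for illustration; a real code generator must also allocate registers and choose instructions for a specific processor.

```c
#include <stdio.h>
#include <string.h>

/* Operators this hypothetical machine supports and their instruction names. */
static const char ops[] = "+-*/";
static const char *const instr[] = { "ADD", "SUB", "MUL", "DIV" };

/* Translate "result = arg1 op arg2" into instructions that use register R1.
   Returns the number of characters written, or -1 for an unsupported op. */
int gen_target(char op, const char *arg1, const char *arg2,
               const char *result, char *out, size_t outsize) {
    const char *p = (op != '\0') ? strchr(ops, op) : NULL;
    if (p == NULL)
        return -1;                  /* unsupported operator */
    return snprintf(out, outsize, "LOAD R1, %s\n%s R1, %s\nSTORE R1, %s\n",
                    arg1, instr[p - ops], arg2, result);
}
```

For t1 = c * d this produces LOAD R1, c / MUL R1, d / STORE R1, t1 (one instruction per line).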


1. Lexical Analysis

• Lexical analysis is the first phase of the compiler; the lexical analyzer is also known as the scanner. It converts the input program into a sequence of tokens.

• Lexical analysis can be implemented with deterministic finite automata (DFA).

• What is a token? A lexical token is a sequence of characters that can be treated as a single logical entity.
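As an example of implementing a token pattern with a DFA, the sketch below encodes the pattern letter (letter | digit)* as a transition table with three states: START, IN_ID (accepting), and DEAD. The state names, the character classes, and `is_identifier` are assumptions made for illustration.

```c
#include <ctype.h>

enum { START, IN_ID, DEAD };        /* DFA states */
enum { LETTER, DIGIT, OTHER };      /* input character classes */

/* transition[state][class] gives the next state. */
static const int transition[3][3] = {
    /*           LETTER  DIGIT  OTHER */
    /* START */ { IN_ID, DEAD,  DEAD },
    /* IN_ID */ { IN_ID, IN_ID, DEAD },
    /* DEAD  */ { DEAD,  DEAD,  DEAD },
};

static int char_class(char c) {
    if (isalpha((unsigned char)c)) return LETTER;
    if (isdigit((unsigned char)c)) return DIGIT;
    return OTHER;
}

/* Run the DFA over the whole string; accept if it ends in IN_ID. */
int is_identifier(const char *s) {
    int state = START;
    for (; *s != '\0'; s++)
        state = transition[state][char_class(*s)];
    return state == IN_ID;
}
```

Keeping the transitions in a table rather than in nested `if` statements is what generator tools such as Lex typically do, since the same driver loop then works for any pattern.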


1. Lexical Analysis (cont'd)

• Examples of tokens: keywords, operators, constants, identifiers, special symbols.

• Keywords: for, while, if, etc.

• Identifiers: variable names, function names, etc.

• Operators: '+', '++', '-', etc.

• Separators: ',', ';', etc.
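Keywords such as for, while, and if match the same pattern as identifiers. One common approach is to recognize the identifier-shaped lexeme first and then look it up in a table of reserved words. The keyword list below is a small assumed subset, and `classify_word` is a hypothetical name.

```c
#include <string.h>

/* Reserved words (an assumed subset for illustration). */
static const char *const keywords[] = { "for", "while", "if", "else", "int", "return" };

/* Return "keyword" if word is a reserved word, otherwise "identifier". */
const char *classify_word(const char *word) {
    size_t n = sizeof keywords / sizeof keywords[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(word, keywords[i]) == 0)
            return "keyword";
    return "identifier";
}
```

The lookup is case-sensitive, so `If` is treated as an identifier, as in C.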


Tokens, Patterns, Lexemes

• Pattern: the rule that describes the set of strings in the input for which the same token is produced as output, e.g., id => "letter followed by letters and digits".

• Lexeme: a sequence of characters in the source program that is matched by the pattern for a token.

• Example: int a;

First string: int    pattern: int                     lexeme: int    token: keyword

Second string: a     pattern: [a-zA-Z][a-zA-Z0-9]*    lexeme: a      token: identifier


The Role of the Lexical Analyzer

It is the first phase of the compiler.

It reads the input characters and produces as output a sequence of tokens that the parser uses for syntax analysis.

It strips comments and white space (blank, tab, and newline characters) out of the source program.

It also correlates error messages from the compiler with the source program, because it keeps track of line numbers.
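The sketch below shows two of these duties together: it skips blanks, tabs, newlines, and comments, and it counts newlines so that a line number is available for error messages. It assumes C-style /* ... */ comments; the function name `skip_blanks_and_comments` and the convention of returning -1 for an unterminated comment are hypothetical.

```c
/* Advance *p past blanks, tabs, newlines, and block comments,
   incrementing *line for every newline seen (including those
   inside comments). Returns 0 normally, or -1 if a comment is
   never closed. */
int skip_blanks_and_comments(const char **p, int *line) {
    const char *s = *p;
    for (;;) {
        if (*s == ' ' || *s == '\t') {
            s++;
        } else if (*s == '\n') {
            (*line)++;
            s++;
        } else if (s[0] == '/' && s[1] == '*') {
            s += 2;                         /* skip the opening delimiter */
            while (!(s[0] == '*' && s[1] == '/')) {
                if (*s == '\0') {           /* unterminated comment */
                    *p = s;
                    return -1;
                }
                if (*s == '\n')
                    (*line)++;
                s++;
            }
            s += 2;                         /* skip the closing delimiter */
        } else {
            *p = s;                         /* start of the next lexeme */
            return 0;
        }
    }
}
```

A scanner would call this before recognizing each token and attach the current value of `line` to any error it reports.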


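Interaction of the Lexical Analyzer with the Parser

[Figure: the lexical analyzer reads the source program and, on each "get next token" request from the parser, returns a token and its attribute value (token, tokenval). Both the lexical analyzer and the parser use the symbol table, and both can report errors.]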


Recognition of Tokens

• Data structures used in the lexical analyzer:

– Terminal Table (TRM)

– Identifier Table (IDN)

– Uniform Symbol Table

– Literal Table
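A common way to organize these tables, assumed here, is to record each token in the Uniform Symbol Table as a pair (table class, index): TRM for terminals (keywords, operators, separators), IDN for identifiers, and LIT for literals, with the index pointing into the corresponding table. The sketch below builds such entries for already-separated lexemes. The fixed terminal list, the table sizes, the simplified classification (anything starting with a digit is a literal, any other non-terminal is an identifier), and all function names are assumptions for illustration.

```c
#include <ctype.h>
#include <string.h>

#define MAX 64

typedef enum { TRM, IDN, LIT } TableClass;

/* One Uniform Symbol Table entry: which table, and the index into it. */
typedef struct { TableClass cls; int index; } UstEntry;

/* Terminal table: fixed in advance (assumed subset). */
static const char *const terminals[] = { "int", "if", "=", "+", ";", "," };

static char identifiers[MAX][32];
static int n_ids = 0;
static char literals[MAX][32];
static int n_lits = 0;

/* Find name in table[0..*count-1]; append it if absent.
   Returns its index, or -1 if the table is full. */
static int find_or_add(char table[][32], int *count, const char *name) {
    for (int i = 0; i < *count; i++)
        if (strcmp(table[i], name) == 0)
            return i;
    if (*count == MAX)
        return -1;
    strncpy(table[*count], name, 31);
    table[*count][31] = '\0';
    return (*count)++;
}

/* Classify one lexeme and return its Uniform Symbol Table entry. */
UstEntry classify_lexeme(const char *lexeme) {
    UstEntry e;
    size_t n = sizeof terminals / sizeof terminals[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(lexeme, terminals[i]) == 0) {
            e.cls = TRM;
            e.index = (int)i;
            return e;
        }
    }
    if (isdigit((unsigned char)lexeme[0])) {
        e.cls = LIT;
        e.index = find_or_add(literals, &n_lits, lexeme);
    } else {
        e.cls = IDN;
        e.index = find_or_add(identifiers, &n_ids, lexeme);
    }
    return e;
}
```

For the lexemes of `int x = 10 ;` this gives (TRM, 0), (IDN, 0), (TRM, 2), (LIT, 0), (TRM, 4); a later occurrence of x reuses (IDN, 0).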


Lexical Errors

• A lexical error is a sequence of characters that does not match the pattern of any token. Lexical errors are detected by the lexical analyzer during compilation, while the source program is being scanned, not during execution of the program.

• Lexical phase errors can be:

1. Spelling errors.

2. Identifiers or numeric constants that exceed the maximum length.

3. Appearance of illegal characters.

4. Omission of a character that should be present.

5. Replacement of a character with an incorrect character.

6. Transposition of two characters.


Example:

int main() {
    int x = 10, y = 20;
    int *a;
    a = &x;
    x = 1xab;
}

In this code, 1xab is neither a valid number nor a valid identifier (an identifier cannot begin with a digit), so the compiler reports a lexical error.
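One way a scanner can report 1xab as an error is to check what follows a run of digits. In the sketch below, a digit sequence immediately followed by a letter or underscore is treated as a malformed token instead of being split into a number and an identifier. This rule and the name `check_number` are assumptions for illustration; production scanners differ in the details.

```c
#include <ctype.h>

/* s points at a digit. Scan the number and return its length, or
   -1 if the digits run straight into a letter or underscore (as in 1xab). */
int check_number(const char *s) {
    int n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    if (isalpha((unsigned char)s[n]) || s[n] == '_')
        return -1;                  /* lexical error: malformed token */
    return n;
}
```

With this check, `10;` yields a number of length 2, while `1xab;` is reported as a lexical error.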


Lexical Errors (cont'd)

A lexical error is any input that can be rejected by the lexer. This generally results from token recognition falling off the end of the rules you have defined.

For example (in no particular syntax):

[0-9]+          ===> NUMBER token
[a-zA-Z]        ===> LETTERS token
anything else   ===> error!

If you think of a lexer as a finite state machine that accepts valid input strings, then any input the machine does not accept generates an ERROR.
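The three rules above can be hand-coded by trying each rule in turn at the current position and reporting an error if none applies. This sketch follows the rules as written, so [a-zA-Z] matches a single letter; the token names and `next_token` are hypothetical.

```c
#include <ctype.h>

typedef enum { NUMBER, LETTERS, END, ERROR } Token;

/* Recognize one token at *p and advance past it. */
Token next_token(const char **p) {
    const char *s = *p;
    if (*s == '\0')
        return END;
    if (isdigit((unsigned char)*s)) {           /* [0-9]+ */
        while (isdigit((unsigned char)*s))
            s++;
        *p = s;
        return NUMBER;
    }
    if (isalpha((unsigned char)*s)) {           /* [a-zA-Z] */
        *p = s + 1;
        return LETTERS;
    }
    *p = s + 1;                                 /* anything else */
    return ERROR;
}
```

On the input `42ab!`, successive calls return NUMBER, LETTERS, LETTERS, ERROR, and then END.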


Input Buffering

The lexical analyzer scans the input from left to right, one character at a time.

The input characters are read from secondary storage, and reading them one at a time in this way is costly. Hence a buffering technique is used: a block of data is first read into a buffer and then scanned by the lexical analyzer.

Two pointers are used:

1. begin_ptr (bp)

2. forward_ptr (fp)

forward_ptr moves ahead to search for the end of the lexeme; when a blank space is encountered, it indicates the end of the lexeme.

begin_ptr points to the beginning of the current lexeme (the lexeme pointer).

An eof marker at the end of the buffer indicates that the input is at an end.
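The two-pointer scheme can be sketched as follows. The buffer is assumed to have been filled with one block of input and terminated by a sentinel, with '\0' playing the role of eof; the begin pointer marks the start of the current lexeme and the forward pointer moves ahead until a blank or the sentinel marks its end. Refilling the buffer when the sentinel is reached in the middle of the input is left out, and `next_lexeme` is a hypothetical name.

```c
#include <string.h>

/* *bp is the begin pointer. Copy the next blank-separated lexeme into
   out (at most outsize-1 characters, outsize >= 1) and move *bp past it.
   Returns the lexeme length, or 0 when the sentinel is reached. */
size_t next_lexeme(const char **bp, char *out, size_t outsize) {
    const char *begin = *bp;
    while (*begin == ' ')                   /* skip blanks before the lexeme */
        begin++;
    const char *fp = begin;                 /* forward pointer */
    while (*fp != ' ' && *fp != '\0')       /* stop at a blank or the sentinel */
        fp++;
    size_t len = (size_t)(fp - begin);
    size_t copy = len < outsize - 1 ? len : outsize - 1;
    memcpy(out, begin, copy);
    out[copy] = '\0';
    *bp = fp;                               /* the next lexeme starts here */
    return len;
}
```

Because the sentinel sits at the end of the buffer, the forward loop needs only one test per character to detect both the end of a lexeme and the end of the buffer.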


Introduction to Lex


What is Lex?

• The main job of a lexical analyzer (scanner) is to break up an input stream into more usable elements (tokens). For example,

a = b + c * d;

becomes

ID ASSIGN ID PLUS ID MULT ID SEMI

• Lex is a utility that helps you rapidly generate scanners.


Lex – Lexical Analyzer

• Lexical analyzers tokenize input streams.

• Tokens are the terminals of a language:

– English: words, punctuation marks, …

– Programming language: identifiers, operators, keywords, …

• Regular expressions define terminals/tokens.

An Overview of Lex

[Figure: Lex source program → Lex → lex.yy.c; lex.yy.c → C compiler → a.out; input → a.out → tokens.]

Lex Source

• Lex source is separated into three sections by %% delimiters.

• The general format of Lex source is:

{definitions}
%%
{transition rules}
%%
{user subroutines}

• The definitions and user subroutines sections are optional, and so is the second %%; the first %% is required. The absolute minimum Lex program is thus:

%%

With no definitions and no rules, this program simply copies its input to its output unchanged.

Lex Source Program

• Lex source is a table of

– regular expressions and

– corresponding program fragments

digit     [0-9]
letter    [a-zA-Z]
%%
{letter}({letter}|{digit})*    printf("id: %s\n", yytext);
\n                             printf("new line\n");
%%
int main(void) {
    yylex();
    return 0;
}
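To show roughly what the generated scanner does, here is a hand-written C approximation of the Lex program above, written as a function over a string instead of reading yyin. Like Lex's default rule, it copies characters that match no pattern to the output unchanged. It ignores details of real Lex such as choosing between competing rules by longest match and reading input in blocks; `scan` is a hypothetical name.

```c
#include <ctype.h>
#include <stdio.h>

/* Approximate the rules
     {letter}({letter}|{digit})*   print "id: <lexeme>"
     \n                            print "new line"
   and echo any other character, as Lex's default action does. */
void scan(const char *s, FILE *out) {
    while (*s != '\0') {
        if (isalpha((unsigned char)*s)) {
            const char *start = s;
            while (isalnum((unsigned char)*s))
                s++;
            fprintf(out, "id: %.*s\n", (int)(s - start), start);
        } else if (*s == '\n') {
            fputs("new line\n", out);
            s++;
        } else {
            fputc(*s, out);         /* default action: ECHO */
            s++;
        }
    }
}
```

For the input `ab1 c` followed by a newline, the output is `id: ab1`, a line starting with the echoed blank and `id: c`, and then `new line`.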


Regular Expressions


Lex Regular Expressions (Extended Regular Expressions)

• A regular expression matches a set of strings.

• Regular expression features:

– Operators

– Character classes

– Arbitrary character

– Optional expressions

– Alternation and grouping

– Context sensitivity

– Repetitions and definitions

Character Classes []

• [abc] matches a single character, which may be a, b, or c.

• Inside brackets, every operator's meaning is ignored except \, -, and ^.

• e.g.

[ab] => a or b

[a-z] => a or b or c or … or z

[-+0-9] => all the digits and the two signs

[^a-zA-Z] => any character that is not a letter

Pattern Matching Primitives

Metacharacter   Matches
.               any character except newline
\n              newline
*               zero or more copies of the preceding expression
+               one or more copies of the preceding expression
?               zero or one copy of the preceding expression
^               beginning of line / complement (inside [])
$               end of line
a|b             a or b
(ab)+           one or more copies of ab (grouping)
[ab]            a or b
a{3}            3 instances of a
"a+b"           literal "a+b" (C escapes still work)
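Most of these operators have the same meaning in POSIX extended regular expressions, so they can be tried out in C with the POSIX `<regex.h>` functions `regcomp` and `regexec` (available on Unix-like systems, not part of standard C). Lex syntax is not identical; for example, Lex's quoted strings such as "a+b" have no direct equivalent there, so the sketch below sticks to the common subset. The helper `full_match` is hypothetical; it wraps the pattern in ^( and )$ so that the whole string must match, and it assumes patterns shorter than about 250 characters.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdio.h>

/* Return 1 if the whole string s matches the extended regular
   expression pattern, 0 if not, -1 if the pattern does not compile. */
int full_match(const char *pattern, const char *s) {
    char anchored[256];
    regex_t re;
    snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

For instance, `full_match("a{3}", "aaa")` and `full_match("(ab)+", "ababab")` return 1, while `full_match("[^a-zA-Z]", "q")` returns 0.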

Transition Rules

• regexp <one or more blanks> action (C code);

• regexp <one or more blanks> { actions (C code) }

• A null statement ; ignores the input (no action is taken):

[ \t\n] ;

Transition Rules (cont'd)

• | indicates that the action for this rule is the same as the action for the next rule. For example, [ \t\n] ; can also be written as:

" "     |
"\t"    |
"\n"    ;

Lex Predefined Variables

• yytext -- a string containing the lexeme

• yyleng -- the length of the lexeme

• yyin -- the input stream pointer; the default input of the default main() is stdin

• yyout -- the output stream pointer; the default output of the default main() is stdout

• e.g. % ./a.out < inputfile > outfile

• E.g.

[a-z]+       printf("%s", yytext);
[a-z]+       ECHO;
[a-zA-Z]+    {words++; chars += yyleng;}
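As an illustration of what yytext and yyleng hold, the hand-written sketch below does the same counting as the rule [a-zA-Z]+ {words++; chars += yyleng;}, but over a string: for each maximal run of letters, `start` plays the role of yytext and the run length plays the role of yyleng. The function name `count_words` is hypothetical.

```c
#include <ctype.h>

/* Count maximal runs of letters (words) and the letters in them. */
void count_words(const char *s, int *words, int *chars) {
    *words = 0;
    *chars = 0;
    while (*s != '\0') {
        if (isalpha((unsigned char)*s)) {
            const char *start = s;          /* like yytext */
            while (isalpha((unsigned char)*s))
                s++;
            (*words)++;
            *chars += (int)(s - start);     /* like yyleng */
        } else {
            s++;                            /* not part of any word */
        }
    }
}
```

For the input `Lex is fun, 2 times!` it reports 4 words and 13 letters.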

Lex Library Routines

• yylex() – the default main() contains a call to yylex()

• yymore() – append the next matched string to the current contents of yytext instead of replacing them

• yyless(n) – retain the first n characters in yytext and return the rest to the input

• yywrap() – is called whenever Lex reaches an end-of-file

– The default yywrap() always returns 1

Review of Lex Predefined Variables

Name                 Function
char *yytext         pointer to matched string
int yyleng           length of matched string
FILE *yyin           input stream pointer
FILE *yyout          output stream pointer
int yylex(void)      call to invoke lexer, returns token
yymore()             append the next match to the current yytext
yyless(int n)        retain the first n characters in yytext
int yywrap(void)     wrapup, return 1 if done, 0 if not done
ECHO                 write matched string
REJECT               go to the next alternative rule
INITIAL              initial start condition
BEGIN condition      switch start condition

User Subroutines Section

• You can use your Lex routines in the same ways you use routines in other programming languages.

%{
void foo();
%}
letter [a-zA-Z]
%%
{letter}+    foo();
%%
void foo() {
}

User Subroutines Section (cont'd)

• The section where main() is placed

%{
int counter = 0;
%}
letter [a-zA-Z]
%%
{letter}+    {printf("a word\n"); counter++;}
%%
int main(void) {
    yylex();
    printf("There are %d words in total\n", counter);
    return 0;
}

Usage

• To run Lex on a source file, type: lex scanner.l

• It produces a file named lex.yy.c, which is a C program for the lexical analyzer.

• To compile lex.yy.c, type: cc lex.yy.c -ll

• To run the lexical analyzer program, type: ./a.out < inputfile

Versions of Lex

• AT&T -- lex: http://www.combo.org/lex_yacc_page/lex.html

• GNU -- flex: http://www.gnu.org/manual/flex-2.5.4/flex.html

• A Win32 version of flex: http://www.monmouth.com/~wstreett/lex-yacc/lex-yacc.html

• Lex implementations on different machines are not all the same.