Compiler Construction
Sohail Aslam, Lecture 5




Lexical Analysis


Recall: Front-End

The output of lexical analysis is a stream of tokens.

[Figure: source code → scanner → tokens → parser → IR, with errors reported]


Tokens

Example:

    if( i == j )
        z = 0;
    else
        z = 1;


Tokens

The input is just a sequence of characters (here \b stands for a blank, \n for a newline and \t for a tab):

    if ( \b i \b = = \b j \n \t ....
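To make the scanner's view concrete, here is a small Java sketch that prints a source fragment one character at a time, using the same \b, \n and \t notation. The class ShowChars and its method render are illustrative names, not part of the lecture.

```java
// Illustrative helper (not from the lecture): shows the raw character sequence
// a scanner receives, writing a blank as \b, a newline as \n and a tab as \t.
class ShowChars {
    static String render(String src) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            String shown;
            if (c == ' ') shown = "\\b";
            else if (c == '\n') shown = "\\n";
            else if (c == '\t') shown = "\\t";
            else shown = String.valueOf(c);
            if (out.length() > 0) out.append(' ');   // space out characters for readability
            out.append(shown);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // No words here: just one character after another.
        System.out.println(render("if( i == j )\n\tz = 0;"));
        // prints: i f ( \b i \b = = \b j \b ) \n \t z \b = \b 0 ;
    }
}
```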


Tokens

Goal: partition the input string into substrings and classify them according to their role.
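A minimal sketch of this goal in Java, assuming the regular-expression classes in java.util.regex: each match is one substring, and the named group that matched gives its role. The category names (KEYWORD, ID, NUM, OP, PUNCT) and the patterns are assumptions chosen to cover the if/else example; a real language defines its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: partition the input into substrings and classify each by its role.
// Categories and patterns are illustrative assumptions.
class Partition {
    static final Pattern TOKEN = Pattern.compile(
        "(?<ws>\\s+)"                          // blanks, tabs, newlines
        + "|(?<id>[A-Za-z_][A-Za-z0-9_]*)"     // identifiers and keywords
        + "|(?<num>[0-9]+)"                    // integers
        + "|(?<op>==|=)"                       // try "==" before "="
        + "|(?<punct>[();])");
    static final Set<String> KEYWORDS = Set.of("if", "else");

    static List<String> partition(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            m.region(pos, src.length());       // the next match must start at pos
            if (!m.lookingAt())
                throw new IllegalArgumentException("unexpected character at " + pos);
            String text = m.group();
            if (m.group("id") != null)
                out.add((KEYWORDS.contains(text) ? "KEYWORD:" : "ID:") + text);
            else if (m.group("num") != null) out.add("NUM:" + text);
            else if (m.group("op") != null) out.add("OP:" + text);
            else if (m.group("punct") != null) out.add("PUNCT:" + text);
            // whitespace separates tokens but is not itself a token
            pos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(partition("if( i == j ) z = 0; else z = 1;"));
    }
}
```

Every substring is assigned exactly one role, and input that fits no pattern is reported rather than silently skipped.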


Tokens

A token is a syntactic category.

Natural language: “He wrote the program”

Words: “He”, “wrote”, “the”, “program”


Tokens

Programming language: “if(b == 0) a = b”

Words: “if”, “(”, “b”, “==”, “0”, “)”, “a”, “=”, “b”
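The analogy with natural language breaks down at one point: splitting on blanks, which works for “He wrote the program”, does not recover the words of a program. The Java sketch below compares a blank-based split with a scan by character class; the class Words and its lexeme pattern are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Words {
    // Illustrative lexeme pattern (an assumption): identifiers/keywords,
    // integers, "==" tried before "=", and parentheses.
    static final Pattern LEXEME =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|[0-9]+|==|[=()]");

    // Splitting on blanks, as one might for English, gives the wrong words.
    static List<String> byBlanks(String src) {
        return Arrays.asList(src.trim().split("\\s+"));
    }

    // Scanning by character class recovers the word list above.
    static List<String> byLexemes(String src) {
        List<String> words = new ArrayList<>();
        Matcher m = LEXEME.matcher(src);
        while (m.find()) words.add(m.group());
        return words;
    }

    public static void main(String[] args) {
        String src = "if(b == 0) a = b";
        System.out.println(byBlanks(src));    // [if(b, ==, 0), a, =, b]
        System.out.println(byLexemes(src));   // [if, (, b, ==, 0, ), a, =, b]
    }
}
```

Matcher.find() silently skips characters that match no pattern (here the blanks); a real lexer would report an unexpected character instead.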


Tokens

Identifiers:  x  y11  maxsize
Keywords:     if  else  while  for
Integers:     2  1000  -44  5L
Floats:       2.0  0.0034  1e5
Symbols:      (  )  +  *  /  {  }  <  >  ==
Strings:      “enter x”  “error”
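As a sketch, each token class in this list can be described by its own pattern. The Java code below classifies a single, already separated lexeme; the patterns, the keyword and symbol sets, and the names Classify and classify are illustrative assumptions, not definitions from any particular language.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Sketch: one test per token class in the list. Patterns are assumptions.
class Classify {
    static final Set<String> KEYWORDS = Set.of("if", "else", "while", "for");
    static final Set<String> SYMBOLS =
        Set.of("(", ")", "+", "*", "/", "{", "}", "<", ">", "==");
    static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    static final Pattern INTEGER = Pattern.compile("-?[0-9]+L?");     // 2, -44, 5L
    static final Pattern FLOAT = Pattern.compile(
        "[0-9]+\\.[0-9]+([eE][+-]?[0-9]+)?|[0-9]+[eE][+-]?[0-9]+");    // 0.0034, 1e5
    static final Pattern STRING = Pattern.compile("\"[^\"]*\"");      // no escapes

    static String classify(String lexeme) {
        if (KEYWORDS.contains(lexeme)) return "Keyword";   // must precede the identifier test
        if (IDENT.matcher(lexeme).matches()) return "Identifier";
        if (INTEGER.matcher(lexeme).matches()) return "Integer";
        if (FLOAT.matcher(lexeme).matches()) return "Float";
        if (STRING.matcher(lexeme).matches()) return "String";
        if (SYMBOLS.contains(lexeme)) return "Symbol";
        return "Unknown";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"y11", "while", "-44", "5L", "2.0", "1e5", "==", "\"error\""})
            System.out.println(s + " -> " + classify(s));
    }
}
```

The keyword test comes before the identifier test because every keyword also matches the identifier pattern. Whether -44 is one token or a minus sign followed by 44 varies between languages; this pattern follows the list and accepts a leading minus.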


Ad-hoc Lexer

Hand-write code to generate tokens.

Partition the input string by reading left-to-right, recognizing one token at a time.
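A minimal ad-hoc sketch of this left-to-right partitioning in Java, scanning a String by index. The class name LeftToRight is illustrative, and treating every other non-blank character as a one-character token is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Ad-hoc sketch: read left to right and cut off one token at a time.
// Simplification (an assumption): identifiers are letters/digits only, and any
// other non-blank character is a one-character token.
class LeftToRight {
    static List<String> tokens(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }   // blanks end tokens
            int start = i;
            if (Character.isLetter(c)) {
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
            } else {
                i++;
            }
            out.add(src.substring(start, i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("x1 = 42;"));   // [x1, =, 42, ;]
        System.out.println(tokens("a==b"));       // [a, =, =, b]
    }
}
```

With this rule, == comes out as two separate = tokens, which is why the lexer needs look-ahead.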


Ad-hoc Lexer

Look-ahead is required to decide where one token ends and the next token begins.
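One way to get one character of look-ahead in Java is java.io.PushbackReader, whose unread(int) returns a character to the stream. The sketch below uses a hypothetical readEquals method that decides between = and == by peeking at the character after the first =.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LookAhead {
    // Hypothetical helper: the caller has seen that the next character is '='.
    // One character of look-ahead decides between "=" and "==".
    static String readEquals(PushbackReader in) {
        try {
            in.read();                        // consume the first '='
            int la = in.read();               // peek at the following character
            if (la == '=') return "==";       // it belongs to this token
            if (la != -1) in.unread(la);      // it starts the next token: push it back
            return "=";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader("=x"));
        System.out.println(readEquals(in));     // =
        System.out.println((char) in.read());   // x (the look-ahead was pushed back)
    }
}
```

The pushed-back character becomes the first character of the next token, so no input is lost at token boundaries.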


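Ad-hoc Lexer

    import java.io.IOException;
    import java.io.InputStream;

    class Lexer {
        InputStream s;
        char next;   // look ahead: the next unread character

        Lexer(InputStream _s) throws IOException {
            s = _s;
            next = (char) s.read();   // read() returns an int
        }
        // ... nextToken(), readId(), readNumber() follow
    }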



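Ad-hoc Lexer

    Token nextToken() throws IOException {
        // test for digits first: idChar() also accepts digits,
        // so a number would otherwise be read as an identifier
        if( Character.isDigit(next) ) return readNumber();
        if( idChar(next) ) return readId();
        if( next == '"' ) return readString();
        ...
    }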



Ad-hoc Lexer

    Token readId() throws IOException {
        String id = "";
        while(true) {
            // next already holds the first unconsumed character
            if( !idChar(next) )
                return new Token(TID, id);   // leave next for the following token
            id = id + next;
            next = (char) s.read();
        }
    }



Ad-hoc Lexer

    boolean idChar(char c) {
        if( Character.isLetter(c) ) return true;
        if( Character.isDigit(c) ) return true;
        if( c == '_' ) return true;
        return false;
    }


Ad-hoc Lexer

    Token readNumber() throws IOException {
        String num = "";
        while(true) {
            if( !Character.isDigit(next) )
                return new Token(TNUM, num);
            num = num + next;          // keep the current digit before reading on
            next = (char) s.read();
        }
    }
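Putting the pieces together, here is a minimal runnable Java sketch of the ad-hoc lexer. It follows the slides' structure (a one-character look-ahead in next, nextToken, readId, readNumber, idChar, and the token kinds TID and TNUM) but makes assumptions of its own: it reads from a java.io.Reader rather than an input stream, stores next as an int so that -1 can mark end of input, skips whitespace, and adds the hypothetical token kinds TSTR, TSYM and TEOF, a hypothetical advance() helper, and a simple readString without escape sequences.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// TID and TNUM come from the slides; TSTR, TSYM and TEOF are hypothetical additions.
enum Kind { TID, TNUM, TSTR, TSYM, TEOF }

class Token {
    final Kind kind;
    final String text;
    Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    @Override public String toString() { return kind + ":" + text; }
}

class Lexer {
    private static final int EOF = -1;
    private final Reader s;   // character source
    private int next;         // one-character look-ahead; EOF at end of input

    Lexer(Reader s) {
        this.s = s;
        advance();            // prime the look-ahead
    }

    private void advance() {  // hypothetical helper: fetch the next look-ahead character
        try {
            next = s.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean idChar(int c) {
        return c != EOF && (Character.isLetterOrDigit(c) || c == '_');
    }

    Token nextToken() {
        while (next != EOF && Character.isWhitespace(next)) advance();   // skip blanks
        if (next == EOF) return new Token(Kind.TEOF, "");
        if (Character.isDigit(next)) return readNumber();   // before idChar(), which accepts digits
        if (idChar(next)) return readId();
        if (next == '"') return readString();
        String sym = String.valueOf((char) next);           // anything else: one-char symbol
        advance();
        return new Token(Kind.TSYM, sym);
    }

    Token readId() {
        StringBuilder id = new StringBuilder();
        while (idChar(next)) {          // next is still part of the identifier
            id.append((char) next);
            advance();
        }
        return new Token(Kind.TID, id.toString());   // next now starts the following token
    }

    Token readNumber() {
        StringBuilder num = new StringBuilder();
        while (next != EOF && Character.isDigit(next)) {
            num.append((char) next);
            advance();
        }
        return new Token(Kind.TNUM, num.toString());
    }

    Token readString() {                // no escape sequences in this sketch
        StringBuilder str = new StringBuilder();
        advance();                      // skip the opening quote
        while (next != EOF && next != '"') {
            str.append((char) next);
            advance();
        }
        advance();                      // skip the closing quote (read() keeps returning -1 at EOF)
        return new Token(Kind.TSTR, str.toString());
    }

    static List<String> tokenize(String src) {
        Lexer lexer = new Lexer(new StringReader(src));
        List<String> out = new ArrayList<>();
        for (Token t = lexer.nextToken(); t.kind != Kind.TEOF; t = lexer.nextToken())
            out.add(t.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if(b == 0) a = b"));
    }
}
```

For if(b == 0) a = b this yields TID:if, TSYM:(, TID:b, TSYM:=, TSYM:=, TNUM:0, TSYM:), TID:a, TSYM:=, TID:b. The output shows both of the problems that follow: the keyword if comes back as an identifier, and == is split into two = symbols.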



Ad-hoc Lexer

Problems: we do not know what kind of token we are going to read just from seeing its first character.


Ad-hoc Lexer

Problems:

If a token begins with “i”, is it the identifier “i” or the keyword “if”?

If a token begins with “=”, is it “=” or “==”?
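Two common answers, sketched in Java under the assumption of a fixed keyword set: read the longest identifier possible and only then look it up in a keyword table (so “if” is reclassified as a keyword while “ifx” stays an identifier), and for = peek at the following character before committing. Preferring the longest possible token is often called the “maximal munch” rule. The class name Munch and the output format are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch (assumed keyword set): longest match plus a keyword table.
class Munch {
    static final Set<String> KEYWORDS = Set.of("if", "else", "while", "for");

    static boolean idChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    static List<String> scan(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < src.length() && idChar(src.charAt(i))) i++;   // longest identifier
                String word = src.substring(start, i);
                // decide keyword vs identifier only after the whole word is read
                out.add((KEYWORDS.contains(word) ? "KEYWORD " : "ID ") + word);
            } else if (c == '=') {
                boolean twoChars = i + 1 < src.length() && src.charAt(i + 1) == '=';
                out.add(twoChars ? "OP ==" : "OP =");   // prefer the longer operator
                i += twoChars ? 2 : 1;
            } else {
                out.add("OTHER " + c);   // other characters are out of scope here
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(scan("if ifx i == j"));   // [KEYWORD if, ID ifx, ID i, OP ==, ID j]
    }
}
```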


Ad-hoc Lexer

We need a more principled approach: use a lexer generator that produces an efficient tokenizer automatically.
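Generators such as lex and flex turn regular expressions into a finite automaton and emit a table-driven scanner. The hand-written Java miniature below only illustrates that idea and is not the output of any particular tool. Its transition table (an assumption for illustration) recognizes identifiers and integers, and it remembers the last accepting position so that it returns the longest match.

```java
// Miniature table-driven scanner (illustrative, not generated by any tool).
class TableLexer {
    static final int LETTER = 0, DIGIT = 1, OTHER = 2;   // character classes
    static final int DEAD = -1;                          // no transition

    // DELTA[state][class] = next state; state 0 is the start state.
    static final int[][] DELTA = {
        {1, 2, DEAD},      // 0: start
        {1, 1, DEAD},      // 1: inside an identifier
        {DEAD, 2, DEAD},   // 2: inside an integer
    };
    // Token kind reported when the automaton is in an accepting state.
    static final String[] ACCEPT = {null, "ID", "NUM"};

    static int charClass(char c) {
        if (Character.isLetter(c) || c == '_') return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    // Runs the automaton from position start, remembering the last accepting
    // position, and returns the longest match as "KIND:lexeme" (or null).
    static String longestMatch(String src, int start) {
        int state = 0;
        int lastEnd = -1;
        String lastKind = null;
        for (int i = start; i < src.length(); i++) {
            state = DELTA[state][charClass(src.charAt(i))];
            if (state == DEAD) break;
            if (ACCEPT[state] != null) {
                lastKind = ACCEPT[state];
                lastEnd = i + 1;
            }
        }
        return lastKind == null ? null : lastKind + ":" + src.substring(start, lastEnd);
    }

    public static void main(String[] args) {
        System.out.println(longestMatch("abc1+2", 0));   // ID:abc1
        System.out.println(longestMatch("42x", 0));      // NUM:42
    }
}
```

In this tiny table every non-start state accepts. In larger tables some states do not (for example, after reading “1.” of a float), which is why the scanner tracks the last accepting position rather than the state it stopped in.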