Compiler Construction
Sohail Aslam
Lecture 5
Lexical Analysis

Recall: Front-End
Output of lexical analysis is a stream of tokens
[Figure: source code -> scanner -> tokens -> parser -> IR; errors are reported by the scanner and the parser]

Tokens
Example:
    if( i == j )
        z = 0;
    else
        z = 1;

Input is just a sequence of characters (here \b appears to stand for a blank, \n for a newline and \t for a tab):
if ( \b i \b = = \b j \n \t ....

Goal: partition the input string into substrings and classify them according to their role.

A token is a syntactic category.
Natural language: “He wrote the program”
Words: “He”, “wrote”, “the”, “program”

Programming language: “if(b == 0) a = b”
Words: “if”, “(”, “b”, “==”, “0”, “)”, “a”, “=”, “b”

Identifiers: x y11 maxsize
Keywords: if else while for
Integers: 2 1000 -44 5L
Floats: 2.0 0.0034 1e5
Symbols: ( ) + * / { } < > ==
Strings: “enter x” “error”
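
The token classes listed above can be written down as patterns. The following is a minimal sketch, not part of the lecture: the names TokenKind and TokenClassifier are hypothetical, the regular expressions are one plausible reading of the examples on the slide, and strings use straight double quotes.

```java
import java.util.regex.Pattern;

// Hypothetical names for illustration; the slides do not define these types.
enum TokenKind { KEYWORD, IDENTIFIER, INTEGER, FLOAT, SYMBOL, STRING, UNKNOWN }

class TokenClassifier {
    // Patterns are assumptions inferred from the slide's examples.
    private static final Pattern KEYWORD = Pattern.compile("if|else|while|for");
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    private static final Pattern FLOAT = Pattern.compile(
            "-?[0-9]+\\.[0-9]+([eE]-?[0-9]+)?|-?[0-9]+[eE]-?[0-9]+");
    private static final Pattern INTEGER = Pattern.compile("-?[0-9]+L?");
    private static final Pattern SYMBOL = Pattern.compile("==|[()+*/{}<>]");
    private static final Pattern STRING = Pattern.compile("\"[^\"]*\"");

    // Classifies one complete lexeme; keywords are tested before identifiers
    // because every keyword also matches the identifier pattern.
    static TokenKind classify(String lexeme) {
        if (KEYWORD.matcher(lexeme).matches()) return TokenKind.KEYWORD;
        if (IDENTIFIER.matcher(lexeme).matches()) return TokenKind.IDENTIFIER;
        if (FLOAT.matcher(lexeme).matches()) return TokenKind.FLOAT;
        if (INTEGER.matcher(lexeme).matches()) return TokenKind.INTEGER;
        if (SYMBOL.matcher(lexeme).matches()) return TokenKind.SYMBOL;
        if (STRING.matcher(lexeme).matches()) return TokenKind.STRING;
        return TokenKind.UNKNOWN;
    }
}
```

For instance, TokenClassifier.classify("maxsize") yields IDENTIFIER and TokenClassifier.classify("1e5") yields FLOAT. The sketch only classifies substrings that have already been separated; finding where each substring ends is the harder part.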

Ad-hoc Lexer
Hand-write code to generate tokens.
Partition the input string by reading left-to-right, recognizing one token at a time.

Look-ahead is required to decide where one token ends and the next token begins.
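
As a small illustration of look-ahead, the sketch below scans a run of digits and stops at the first character that cannot belong to the number. The class and method names (LookAhead, scanDigits) are hypothetical, and the input is assumed to be held in a String rather than read from a stream.

```java
class LookAhead {
    // Returns the index just past the run of digits that begins at `start`.
    // input.charAt(pos) is inspected before it is consumed: that inspection
    // is the look-ahead that tells us where the number token ends.
    static int scanDigits(String input, int start) {
        int pos = start;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            pos++;
        }
        return pos;
    }
}
```

With the input "123+4", scanDigits(input, 0) stops at index 3: the character '+' is looked at, found not to be a digit, and left in place for the next token.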

    class Lexer {
        InputStream s;
        char next; // look-ahead character

        Lexer(InputStream _s) {
            s = _s;
            next = s.read();
        }

    Token nextToken() {
        if( idChar(next) ) return readId();
        if( isNumber(next) ) return readNumber();
        if( next == '"' ) return readString();
        ...
        ...

    Token readId() {
        String id = "";
        while (true) {
            // next holds the look-ahead: the first character not yet consumed
            if (idChar(next) == false)
                return new Token(TID, id);
            id = id + next;
            next = s.read(); // advance the look-ahead
        }
    }

    boolean idChar(char c) {
        if( isAlpha(c) ) return true;
        if( isDigit(c) ) return true;
        if( c == '_' ) return true;
        return false;
    }

    Token readNumber() {
        String num = "";
        while (true) {
            // test the look-ahead before consuming it, so the first digit is kept
            if (!isNumber(next))
                return new Token(TNUM, num);
            num = num + next;
            next = s.read(); // advance the look-ahead
        }
    }
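
The fragments above can be assembled into a small self-contained program. This is only a sketch under several assumptions that are not in the slides: the input is a String instead of a stream, the names AdHocToken and AdHocLexer are hypothetical, whitespace is skipped between tokens, digits are tested before identifier characters (because idChar also accepts digits), and any other character becomes a one-character symbol.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token type: a kind tag ("ID", "NUM", "STRING", "SYM") plus the matched text.
class AdHocToken {
    final String kind;
    final String text;
    AdHocToken(String kind, String text) { this.kind = kind; this.text = text; }
    @Override public String toString() { return kind + "(" + text + ")"; }
}

class AdHocLexer {
    private final String input;
    private int pos = 0; // index of the look-ahead character

    AdHocLexer(String input) { this.input = input; }

    // The look-ahead: next unread character, or '\0' at end of input.
    private char next() { return pos < input.length() ? input.charAt(pos) : '\0'; }

    private static boolean idChar(char c) { return Character.isLetterOrDigit(c) || c == '_'; }

    // Returns the next token, or null when the input is exhausted.
    AdHocToken nextToken() {
        while (Character.isWhitespace(next())) pos++;
        if (pos >= input.length()) return null;
        int start = pos;
        char c = next();
        if (Character.isDigit(c)) {              // number: digits only
            while (Character.isDigit(next())) pos++;
            return new AdHocToken("NUM", input.substring(start, pos));
        }
        if (idChar(c)) {                         // identifier: letters, digits, '_'
            while (idChar(next())) pos++;
            return new AdHocToken("ID", input.substring(start, pos));
        }
        if (c == '"') {                          // string: up to the closing quote
            pos++;
            while (pos < input.length() && next() != '"') pos++;
            if (pos < input.length()) pos++;     // consume the closing quote
            return new AdHocToken("STRING", input.substring(start, pos));
        }
        pos++;                                   // anything else: one-character symbol
        return new AdHocToken("SYM", String.valueOf(c));
    }

    // Convenience helper: tokenizes the whole input into printable strings.
    static List<String> tokenize(String src) {
        AdHocLexer lexer = new AdHocLexer(src);
        List<String> out = new ArrayList<>();
        for (AdHocToken t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            out.add(t.toString());
        }
        return out;
    }
}
```

Calling AdHocLexer.tokenize("z = 10;") gives the tokens ID(z), SYM(=), NUM(10), SYM(;). Note that this sketch emits "==" as two separate "=" symbols and reports keywords such as "if" as identifiers.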

Problems:
We do not know what kind of token we are going to read from seeing only the first character.
If a token begins with “i”, is it an identifier “i” or the keyword “if”?
If a token begins with “=”, is it “=” or “==”?
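
A hand-written lexer usually handles these two cases by reading further before deciding. The sketch below shows one common way to do so and is not taken from the lecture: the whole word is read and then looked up in a keyword table, and one extra character is peeked at after "=". The class and method names are hypothetical; the keyword list is the one from the token-class slide.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class KeywordAndOperatorDemo {
    private static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("if", "else", "while", "for"));

    // Reads the longest identifier-like word starting at `start`,
    // then decides keyword vs identifier by table lookup.
    static String classifyWord(String input, int start) {
        int pos = start;
        while (pos < input.length()
                && (Character.isLetterOrDigit(input.charAt(pos)) || input.charAt(pos) == '_')) {
            pos++;
        }
        String word = input.substring(start, pos);
        return (KEYWORDS.contains(word) ? "KEYWORD" : "ID") + "(" + word + ")";
    }

    // Assumes input.charAt(start) is '='; peeks one character ahead
    // to choose between "=" and "==".
    static String readEquals(String input, int start) {
        boolean doubled = start + 1 < input.length() && input.charAt(start + 1) == '=';
        return doubled ? "SYM(==)" : "SYM(=)";
    }
}
```

Here classifyWord("if(b", 0) returns KEYWORD(if) while classifyWord("i == j", 0) returns ID(i), and readEquals("== 0", 0) returns SYM(==). Each new token class tends to need another special case of this kind.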

Need a more principled approach.
Use a lexer generator that generates an efficient tokenizer automatically.
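
To give a flavour of the specification-driven style, the sketch below drives a tokenizer from a single table of regular expressions using java.util.regex. It is not a lexer generator and not the tool the lecture has in mind; it only illustrates the idea that the token classes are declared as patterns and the scanning loop is generic. The class name RegexTokenizer, the group names and the patterns are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTokenizer {
    // The token specification: one named group per token class.
    // Alternatives are tried left to right, so keywords come before identifiers
    // and "==" comes before "=".
    private static final Pattern SPEC = Pattern.compile(
            "(?<WS>\\s+)"
          + "|(?<KEYWORD>(?:if|else|while|for)\\b)"
          + "|(?<ID>[A-Za-z_][A-Za-z0-9_]*)"
          + "|(?<NUM>[0-9]+)"
          + "|(?<SYM>==|[=;(){}<>+*/])");

    // Whitespace is matched but not reported.
    private static final String[] KINDS = { "KEYWORD", "ID", "NUM", "SYM" };

    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = SPEC.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            m.region(pos, src.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at index " + pos);
            }
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    out.add(kind + "(" + m.group(kind) + ")");
                }
            }
            pos = m.end(); // every alternative consumes at least one character
        }
        return out;
    }
}
```

For the input "if(i==j) z=0;" this produces ten tokens, beginning with KEYWORD(if) and including SYM(==) as a single token. A real lexer generator would instead compile such a specification into an automaton ahead of time.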