164
Lexical Analysis April 3, 2013 Wednesday, April 3, 13

Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

  • Upload
    others

  • View
    26

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Lexical AnalysisApril 3, 2013

Wednesday, April 3, 13

Page 2: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Previously on CSE 131b...

The Structure of a Modern Compiler

Lexical Analysis

Syntax Analysis

Semantic Analysis

IR Generation

IR Optimization

Code Generation

Optimization

SourceCode

Machine

Code

Structure of a modern compiler

Wednesday, April 3, 13

Page 3: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Where are we?

Where We Are

Lexical Analysis

Syntax Analysis

Semantic Analysis

IR Generation

IR Optimization

Code Generation

Optimization

SourceCode

MachineCode

Wednesday, April 3, 13

Page 4: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

w h i l e ( i < z ) \n \t + i p ;

while (ip < z) ++ip;

p + +

Input: code (character stream)

Goal of Lexical AnalysisBreaking the program down into words or “tokens”

Wednesday, April 3, 13

Page 5: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

w h i l e ( i < z ) \n \t + i p ;

while (ip < z) ++ip;

p + +

T_While ( T_Ident < T_Ident ) ++ T_Ident

ip z ip

Goal of Lexical AnalysisOutput: Token Stream

Wednesday, April 3, 13

Page 6: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

w h i l e ( i < z ) \n \t + i p ;

while (ip < z) ++ip;

p + +

T_While ( T_Ident < T_Ident ) ++ T_Ident

ip z ip

While

++

Ident

<

Ident Ident

ip z ip

The Token Stream is then used as input for Parser (syntax analysis)

Wednesday, April 3, 13

Page 7: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

What’s a token?

• What’s a lexical unit of code?

Wednesday, April 3, 13

Page 8: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Wednesday, April 3, 13

Page 9: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

What is my name ?

Wednesday, April 3, 13

Page 10: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

What is my name ?

Wednesday, April 3, 13

Page 11: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

What is my name ?

Wednesday, April 3, 13

Page 12: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

What is my name ?

Wednesday, April 3, 13

Page 13: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

What is my name ?

Wednesday, April 3, 13

Page 14: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

What is my name ?

Wednesday, April 3, 13

Page 15: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

What is my name ?

Wednesday, April 3, 13

Page 16: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

What is my name ?

Wednesday, April 3, 13

Page 17: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

What is my name ?

Wednesday, April 3, 13

Page 18: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

What is my name ?

Wednesday, April 3, 13

Page 19: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

What is my name ?

Wednesday, April 3, 13

Page 20: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

What is my name ?

Wednesday, April 3, 13

Page 21: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

What is my name ?

Wednesday, April 3, 13

Page 22: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

What is my name ?

Wednesday, April 3, 13

Page 23: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

What is my name ?

Wednesday, April 3, 13

Page 24: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Token Type

Wednesday, April 3, 13

Page 25: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Token Type

• Keyword: for int if else while

Wednesday, April 3, 13

Page 26: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Token Type

• Keyword: for int if else while

• Punctuation: ( ) { } ;

Wednesday, April 3, 13

Page 27: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Token Type

• Keyword: for int if else while

• Punctuation: ( ) { } ;

• Operand: + - ++

Wednesday, April 3, 13

Page 28: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Token Type

• Keyword: for int if else while

• Punctuation: ( ) { } ;

• Operand: + - ++

• Relation: < > =

Wednesday, April 3, 13

Page 29: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Token Type

• Keyword: for int if else while

• Punctuation: ( ) { } ;

• Operand: + - ++

• Relation: < > =

• Identifier: (variable name,function name) foo foo_2

Wednesday, April 3, 13

Page 30: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Token Type

• Keyword: for int if else while

• Punctuation: ( ) { } ;

• Operand: + - ++

• Relation: < > =

• Identifier: (variable name,function name) foo foo_2

• Integer, float point, string: 2345 2.0 “hello world”

Wednesday, April 3, 13

Page 31: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Token Type

• Keyword: for int if else while

• Punctuation: ( ) { } ;

• Operand: + - ++

• Relation: < > =

• Identifier: (variable name,function name) foo foo_2

• Integer, float point, string: 2345 2.0 “hello world”

• Whitespace, comment /* this code is awesome */Wednesday, April 3, 13

Page 32: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Wednesday, April 3, 13

Page 33: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Wednesday, April 3, 13

Page 34: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Wednesday, April 3, 13

Page 35: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Wednesday, April 3, 13

Page 36: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Wednesday, April 3, 13

Page 37: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Wednesday, April 3, 13

Page 38: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7

Wednesday, April 3, 13

Page 39: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( 1 < i ) \n \t + i ;3 + +

T_While

7

Wednesday, April 3, 13

Page 40: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( 1 < i ) \n \t + i ;3 + +

T_While

7

Token

Wednesday, April 3, 13

Page 41: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( 1 < i ) \n \t + i ;3 + +

T_While

7

Token

Lexeme: the piece of the original program from which we made the token

Wednesday, April 3, 13

Page 42: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( 1 < i ) \n \t + i ;3 + +

T_While

7

( T_IntConst

137

Wednesday, April 3, 13

Page 43: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning a Source File

w h i l e ( 1 < i ) \n \t + i ;3 + +

T_While

7

( T_IntConst

137

Some tokens can have

attributes that store

extra information about

the token. Here we

store which integer is

represented.

Some tokens can have

attributes that store

extra information about

the token. Here we

store which integer is

represented.

Wednesday, April 3, 13

Page 44: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Lexical Analyzer

• Recognize substrings that correspond to tokens: lexemes

• Lexeme: actual text of the token

• For each lexeme, identify token type

• < Token type, attribute>

• attribute: optional, extra information, often numeric value

Wednesday, April 3, 13

Page 45: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Why is this process hard?

Wednesday, April 3, 13

Page 46: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning is Hard

● FORTRAN: Whitespace is irrelevant

DO 5 I = 1,25

DO5I = 1.25

Thanks to Prof. Alex Aiken

Wednesday, April 3, 13

Page 47: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning is Hard

● C++: Nested template declarations

vector<vector<int>> myVector

Thanks to Prof. Alex Aiken

Wednesday, April 3, 13

Page 48: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning is Hard

● C++: Nested template declarations

vector < vector < int >> myVector

Thanks to Prof. Alex Aiken

Wednesday, April 3, 13

Page 49: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Scanning is Hard

● C++: Nested template declarations

(vector < (vector < (int >> myVector)))

● Again, can be difficult to determine where to split.

Thanks to Prof. Alex Aiken

Wednesday, April 3, 13

Page 50: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Challenges for Lexical Analyzer

• How do we determine which lexemes are associated with each token?

• When there are multiple ways we could scan the input, how do we know which one to pick?

• if if1

• How do we address these concerns efficiently?

Wednesday, April 3, 13

Page 51: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Associate Lexemes to Tokens

• Tokens: categorize lexemes by what information they provide.

• Associate lexemes to token: Pattern matching

• How to describe patterns??

Wednesday, April 3, 13

Page 52: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Token: Lexemes

• Keyword: for int if else while

• Punctuation: ( ) { } ;

• Operand: + - ++

• Relation: < > =

• Identifier: (variable name,function name) foo foo_2

• Integer, float point, string: 2345 2.0 “hello world”

• Whitespace, comment /* this code is awesome */Wednesday, April 3, 13

Page 53: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Token: Lexemes

• Keyword: for int if else while

• Punctuation: ( ) { } ;

• Operand: + - ++

• Relation: < > =

• Identifier: (variable name,function name) foo foo_2

• Integer, float point, string: 2345 2.0 “hello world”

• Whitespace, comment /* this code is awesome */

Finite possible lexemes

Wednesday, April 3, 13

Page 54: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Token: Lexemes

• Keyword: for int if else while

• Punctuation: ( ) { } ;

• Operand: + - ++

• Relation: < > =

• Identifier: (variable name,function name) foo foo_2

• Integer, float point, string: 2345 2.0 “hello world”

• Whitespace, comment /* this code is awesome */

Finite possible lexemes

Infinite possible lexemes

Wednesday, April 3, 13

Page 55: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

• How do we describe which (potentially infinite) set of lexemes is associated with each token type?

Wednesday, April 3, 13

Page 56: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Formal Languages

● A formal language is a set of strings.

● Many infinite languages have finite descriptions:

● Define the language using an automaton.

● Define the language using a grammar.

● Define the language using a regular expression.

● We can use these compact descriptions of the language to define sets of strings.

● Over the course of this class, we will use all of these approaches.

Wednesday, April 3, 13

Page 57: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

• What type of formal language should we use to describe tokens?

Wednesday, April 3, 13

Page 58: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Regular Expressions

● Regular expressions are a family of descriptions that can be used to capture certain languages (the regular languages).

● Often provide a compact and human-readable description of the language.

● Used as the basis for numerous software systems, including the flex tool we will use in this course.

Wednesday, April 3, 13

Page 59: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Atomic Regular Expressions

● The regular expressions we will use in this course begin with two simple building blocks.

● The symbol ε is a regular expression matches the empty string.

● For any symbol a, the symbol a is a regular expression that just matches a.

Wednesday, April 3, 13

Page 60: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Compound Regular Expressions

● If R1 and R

2 are regular expressions, R

1R

2 is a regular

expression represents the concatenation of the languages of R

1 and R

2.

● If R1 and R

2 are regular expressions, R

1 | R

2 is a regular

expression representing the union of R1 and R

2.

● If R is a regular expression, R* is a regular expression for the Kleene closure of R.

● If R is a regular expression, (R) is a regular expression with the same meaning as R.

Wednesday, April 3, 13

Page 61: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Simple Regular Expressions

● Suppose the only characters are 0 and 1.

● Here is a regular expression for strings containing 00 as a substring:

(0 | 1)*00(0 | 1)*

Wednesday, April 3, 13

Page 62: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Simple Regular Expressions

● Suppose the only characters are 0 and 1.

● Here is a regular expression for strings containing 00 as a substring:

(0 | 1)*00(0 | 1)*

Wednesday, April 3, 13

Page 63: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Simple Regular Expressions

● Suppose the only characters are 0 and 1.

● Here is a regular expression for strings containing 00 as a substring:

(0 | 1)*00(0 | 1)*

110111001010000

11111011110011111

Wednesday, April 3, 13

Page 64: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Simple Regular Expressions

● Suppose the only characters are 0 and 1.

● Here is a regular expression for strings containing 00 as a substring:

(0 | 1)*00(0 | 1)*

110111001010000

11111011110011111

Wednesday, April 3, 13

Page 65: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Applied Regular Expressions

● Suppose that our alphabet is all ASCII characters.

● A regular expression for even numbers is

(+|-)?(0|1|2|3|4|5|6|7|8|9)*(0|2|4|6|8)?

Wednesday, April 3, 13

Page 66: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Applied Regular Expressions

● Suppose that our alphabet is all ASCII characters.

● A regular expression for even numbers is

(+|-)?(0|1|2|3|4|5|6|7|8|9)*(0|2|4|6|8)

Wednesday, April 3, 13

Page 67: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Applied Regular Expressions

● Suppose that our alphabet is all ASCII characters.

● A regular expression for even numbers is

42+1370-3248

-9999912

(+|-)?(0|1|2|3|4|5|6|7|8|9)*(0|2|4|6|8)

Wednesday, April 3, 13

Page 68: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Wednesday, April 3, 13

Page 69: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

• More examples

• Whitespace: [ \t\n]+

• Integers: [+\-]?[0-9]+

• Hex numbers: 0x[0-9a-f]+

• identifier

Wednesday, April 3, 13

Page 70: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

• More examples

• Whitespace: [ \t\n]+

• Integers: [+\-]?[0-9]+

• Hex numbers: 0x[0-9a-f]+

• identifier

• [A-Za-z]([A-Za-z]|[0-9])*

Wednesday, April 3, 13

Page 71: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

• Use regular expressions to describe token types

• How do we match regular expressions?

Wednesday, April 3, 13

Page 72: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Recognizing Regular Language

What is the machine that recognize regular language??

Wednesday, April 3, 13

Page 73: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Recognizing Regular Language

• Finite Automata

• DFA (Deterministic Finite Automata)

• NFA (Non-deterministic Finite Automata)

What is the machine that recognize regular language??

Wednesday, April 3, 13

Page 74: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

A Simple Automaton

Wednesday, April 3, 13

Page 75: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

Each circle is a state of the

automaton. The automaton's

configuration is determined

by what state(s) it is in.

Each circle is a state of the

automaton. The automaton's

configuration is determined

by what state(s) it is in.

A Simple Automaton

Wednesday, April 3, 13

Page 76: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

These arrows are called

transitions. The automaton

changes which state(s) it is in

by following transitions.

These arrows are called

transitions. The automaton

changes which state(s) it is in

by following transitions.

A Simple Automaton

Wednesday, April 3, 13

Page 77: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

A Simple Automaton

" H E Y A "

Finite Automata: Takes an input string and determines whether it’s a valid sentence of a language

accept or reject

Wednesday, April 3, 13

Page 78: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

A Simple Automaton

" H E Y A "

The automaton takes a string

as input and decides whether

to accept or reject the string.

The automaton takes a string

as input and decides whether

to accept or reject the string.

Wednesday, April 3, 13

Page 79: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

A Simple Automaton

" H E Y A "

Wednesday, April 3, 13

Page 80: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

A Simple Automaton

" H E Y A "

Wednesday, April 3, 13

Page 81: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

A Simple Automaton

" H E Y A "

Wednesday, April 3, 13

Page 82: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

A Simple Automaton

" H E Y A "

Wednesday, April 3, 13

Page 83: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

A Simple Automaton

" H E Y A "

Wednesday, April 3, 13

Page 84: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

A Simple Automaton

" H E Y A "

Wednesday, April 3, 13

Page 85: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

A Simple Automaton

" H E Y A "

Wednesday, April 3, 13

Page 86: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

A Simple Automaton

" H E Y A "

Wednesday, April 3, 13

Page 87: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

A Simple Automaton

" H E Y A "

Wednesday, April 3, 13

Page 88: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

A Simple Automaton

" H E Y A "

Wednesday, April 3, 13

Page 89: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

" "start

A,B,C,...,Z

A Simple Automaton

" H E Y A "

The double circle indicates that this

state is an accepting state. The

automaton accepts the string if it

ends in an accepting state.

The double circle indicates that this

state is an accepting state. The

automaton accepts the string if it

ends in an accepting state.

Wednesday, April 3, 13

Page 90: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

An Even More Complex Automatona, b

a, c

b, c

start

ε

ε

ε

c

b

a

Wednesday, April 3, 13

Page 91: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

An Even More Complex Automatona, b

a, c

b, c

start

ε

ε

ε

c

b

a

These are called -transitionsε . These

transitions are followed automatically and

without consuming any input.

These are called -transitionsε . These

transitions are followed automatically and

without consuming any input.

Wednesday, April 3, 13

Page 92: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

An Even More Complex Automatona, b

a, c

b, c

start

ε

ε

ε

c

b

a

b c b a

Wednesday, April 3, 13

Page 93: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

An Even More Complex Automatona, b

a, c

b, c

start

ε

ε

ε

c

b

a

b c b a

Wednesday, April 3, 13

Page 94: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

An Even More Complex Automatona, b

a, c

b, c

start

ε

ε

ε

c

b

a

b c b a

Wednesday, April 3, 13

Page 95: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

An Even More Complex Automatona, b

a, c

b, c

start

ε

ε

ε

c

b

a

b c b a

Wednesday, April 3, 13

Page 96: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

An Even More Complex Automatona, b

a, c

b, c

start

ε

ε

ε

c

b

a

b c b a

Wednesday, April 3, 13

Page 97: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

An Even More Complex Automatona, b

a, c

b, c

start

ε

ε

ε

c

b

a

b c b a

Wednesday, April 3, 13

Page 98: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

An Even More Complex Automatona, b

a, c

b, c

start

ε

ε

ε

c

b

a

b c b a

Wednesday, April 3, 13

Page 99: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

An Even More Complex Automatona, b

a, c

b, c

start

ε

ε

ε

c

b

a

b c b a

Wednesday, April 3, 13

Page 100: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

An Even More Complex Automatona, b

a, c

b, c

start

ε

ε

ε

c

b

a

b c b a

Wednesday, April 3, 13

Page 101: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

An Even More Complex Automatona, b

a, c

b, c

start

ε

ε

ε

c

b

a

b c b a

Wednesday, April 3, 13

Page 102: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Lexer Generator

• Given regular expressions to describe the language (token types),

• Generates NFA that can recognize the regular language defined

• existing algorithms

• Transforms NFA to DFA

• existing algorithms

• Tools: lex, flex

Wednesday, April 3, 13

Page 103: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Challenges for Lexical Analyzer

• How do we determine which lexemes are associated with each token?

• Regular expression to describe token type

• When there are multiple ways we could scan the input, how do we know which one to pick?

• How do we address these concerns efficiently?

Wednesday, April 3, 13

Page 104: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Lexing Ambiguities

T_For forT_Identifier [A-Za-z_][A-Za-z0-9_]*

Wednesday, April 3, 13

Page 105: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Lexing Ambiguities

T_For forT_Identifier [A-Za-z_][A-Za-z0-9_]*

f o tr

Wednesday, April 3, 13

Page 106: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Lexing Ambiguities

T_For forT_Identifier [A-Za-z_][A-Za-z0-9_]*

f o tr

f o tr

f o tr

f o tr

f o tr

f o tr

f o tr

f o tr

f o tr

f o tr

Wednesday, April 3, 13

Page 107: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Conflict Resolution

● Assume all tokens are specified as regular expressions.

● Algorithm: Left-to-right scan.

● Tiebreaking rule one: Maximal munch.

● Always match the longest possible prefix of the remaining text.

Wednesday, April 3, 13

Page 108: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Lexing Ambiguities

T_For forT_Identifier [A-Za-z_][A-Za-z0-9_]*

f o tr

f o tr

Wednesday, April 3, 13

Page 109: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Implementing Maximal Munch

● Given a set of regular expressions, how can we use them to implement maximum munch?

● Idea:

● Convert expressions to NFAs.

● Run all NFAs in parallel, keeping track of the last match.

● When all automata get stuck, report the last match and restart the search at that point.

Wednesday, April 3, 13

Page 110: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Implementing Maximal Munch

● Given a set of regular expressions, how can we use them to implement maximum munch?

● Idea:

● Convert expressions to NFAs.

● Run all NFAs in parallel, keeping track of the last match.

● When all automata get stuck, report the last match and restart the search at that point.

Wednesday, April 3, 13

Page 111: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

• Example

Wednesday, April 3, 13

Page 112: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

Wednesday, April 3, 13

Page 113: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

Wednesday, April 3, 13

Page 114: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 115: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 116: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 117: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 118: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 119: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 120: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 121: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 122: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 123: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 124: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 125: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 126: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 127: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 128: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 129: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 130: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 131: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 132: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 133: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 134: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 135: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 136: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 137: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 138: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 139: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 140: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 141: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 142: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 143: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 144: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 145: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 146: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 147: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 148: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 149: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 150: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 151: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 152: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 153: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 154: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 155: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 156: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 157: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

T_Do doT_Double doubleT_Mystery [A-Za-z]

Implementing Maximal Munch

start d o

start d o u b l e

start Σ

D O U B L ED O U B

Wednesday, April 3, 13

Page 158: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

A Minor Simplification

d o

d o u b l e

Σ

ε

ε

εstart

Wednesday, April 3, 13

Page 159: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Other Conflicts

T_Do doT_Double doubleT_Identifier [A-Za-z_][A-Za-z0-9_]*

d o bu el

Wednesday, April 3, 13

Page 160: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

More Tiebreaking

● When two regular expressions apply, choose the one with the greater “priority.”

● Simple priority system: pick the rule that was defined first.

Wednesday, April 3, 13

Page 161: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Other Conflicts

T_Do doT_Double doubleT_Identifier [A-Za-z_][A-Za-z0-9_]*

d o bu el

d o bu el

d o bu el

Wednesday, April 3, 13

Page 162: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Other Conflicts

T_Do doT_Double doubleT_Identifier [A-Za-z_][A-Za-z0-9_]*

d o bu el

d o bu el

Wednesday, April 3, 13

Page 163: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Implement a lexical analyzer

• Use regular expressions to describe token types (keyword, identifier, integer constant..)

• Use DFA/NFA to recognize the regular language

• But...good news. you don’t need to implement the algorithms to transform your regular expressions to DFA/NFA to recognize it

• flex: given regular expressions -> output c code that does lexical analysis (it internally

Wednesday, April 3, 13

Page 164: Lexical Analysiscseweb.ucsd.edu/classes/sp13/cse131-b/slides/Lecture2-Lexical.pdfLexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization

Summary

• Lexical Analysis

• Tokens

• Lexemes

• Regular expressions

• DFA

• NFA

• Flex: a tool for you to build a lexical analyzer

Wednesday, April 3, 13