– 2 – Lexical Analysis

– 2 – Lexical Analysis. Objectives To Understand 1.The Role of a Lexical Analyzer 2.Lexical Analysis using formal Language definitions with Finite Automata

Objectives

• To Understand:

1. The Role of a Lexical Analyzer

2. Lexical Analysis using formal Language definitions with Finite Automata

3. Specifications & Recognition of Tokens

4. A Language for Specifying Lexical Analyzers

www.Bookspar.com | Website for Students | VTU - Notes - Question Papers

Programming Language Structure

Recall that a programming language is defined by its:

1. SYNTAX – decides whether a sentence in the language is well-formed

2. SEMANTICS – determines the meaning, if any, of a syntactically well-formed sentence

3. GRAMMAR – a formal system that provides a finite, generative description of the language

Syntax of a Programming Language

• Describes the structure of programs without any consideration of their meaning

• The syntactic elements of a programming language are determined by the computation model and by pragmatic concerns

• Well-developed tools (regular, context-free and attribute grammars) are available for describing the syntax of programming languages

• The lexical analyzer and the parser of a compiler handle the syntax of the programming language

Some Basic Definitions

• lex-i-cal: Of or relating to words or the vocabulary of a language, as distinguished from its grammar and construction

• lexical analysis: The task concerned with breaking an input into its smallest meaningful units, called tokens

• syntax analysis: The task concerned with fitting a sequence of tokens into a specified syntax

• parsing: To break a sentence down into its component parts of speech, with an explanation of the form, function, and syntactical relationship of each part

Lexical Analyzer (a.k.a. Scanner)

• The only part of a compiler that looks at each character of the source text; it performs a linear (left-to-right) analysis

• Reads the source text and produces TOKENS

• Also keeps track of the source coordinates of each token (file name, line number and position), which is useful for debugging and error reporting

• Advantages of a separate lexical analyzer:

  – Keeps compiler design simple

  – Improves efficiency

  – Increases portability

The Role of a Lexical Analyzer

[Figure: The lexical analyzer reads the source program character by character ("get next char" / "next char") and passes tokens to the syntax analyzer on request ("get next token" / "next token"). A symbol table, containing a record for each identifier, is also shown.]
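A minimal Python sketch of the "get next token" interaction in the figure, assuming a toy language with identifiers, unsigned integers and single-character punctuation. The names Token, Lexer, next_token, parse_all and the record stored in symbol_table are hypothetical, chosen only for illustration. The syntax analyzer (here just a loop) pulls one token at a time; the lexer reads characters on demand, skips white space, records each identifier in the symbol table, and attaches source coordinates (line, column) to every token.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    type: str      # token type: "ID", "NUM", "PUNCT" or "EOF"
    attr: object   # attribute: identifier name, integer value, or the symbol
    line: int      # source coordinates, kept for error reporting
    col: int

class Lexer:
    """Hypothetical scanner: the parser calls next_token() on demand."""

    def __init__(self, text: str):
        self.text, self.pos = text, 0
        self.line, self.col = 1, 1
        self.symbol_table = {}  # identifier -> record (here: where it was first seen)

    def _peek(self) -> Optional[str]:
        return self.text[self.pos] if self.pos < len(self.text) else None

    def _next_char(self) -> str:
        ch = self.text[self.pos]
        self.pos += 1
        if ch == "\n":
            self.line, self.col = self.line + 1, 1
        else:
            self.col += 1
        return ch

    def next_token(self) -> Token:
        while self._peek() is not None and self._peek().isspace():
            self._next_char()                       # skip white space
        line, col = self.line, self.col
        ch = self._peek()
        if ch is None:
            return Token("EOF", None, line, col)
        if ch.isalpha():
            lexeme = ""
            while self._peek() is not None and self._peek().isalnum():
                lexeme += self._next_char()
            self.symbol_table.setdefault(lexeme, {"first_seen": (line, col)})
            return Token("ID", lexeme, line, col)
        if ch.isdigit():
            lexeme = ""
            while self._peek() is not None and self._peek().isdigit():
                lexeme += self._next_char()
            return Token("NUM", int(lexeme), line, col)
        return Token("PUNCT", self._next_char(), line, col)

def parse_all(text: str):
    """Stand-in for the syntax analyzer: keeps asking for the next token."""
    lexer = Lexer(text)
    tokens = []
    while (tok := lexer.next_token()).type != "EOF":
        tokens.append(tok)
    return tokens, lexer.symbol_table
```

For example, parse_all("x = 42;\ny = x") returns seven tokens, with y reported at line 2, column 1, and a symbol table holding x and y.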

Tokens, Patterns and Lexemes

• What are tokens?

  – The basic lexical units of the language

  – A sequence of abstract characters that can be treated as a unit in the grammar of the language

  – A programming language classifies its tokens into a finite set of token types

• A note on terminology: some texts refer to

  – token types as tokens, and

  – tokens as lexemes

  We will stick to the terms Tokens and Token Types.

• Some tokens have attributes:

  – an integer constant token has the actual integer (17, 42) as an attribute

  – an identifier has a string with the actual id

Tokens Example

• Let us consider the program segment:

void main() { printf("Hello World\n"); }

• The tokens of this program segment are:

1. void, 2. main, 3. (, 4. ), 5. {, 6. printf, 7. (, 8. "Hello World\n", 9. ), 10. ; and 11. }
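The eleven tokens above can be reproduced with a small regular-expression-driven scanner. This is a sketch assuming only the patterns needed for this one segment (identifiers, a string literal with backslash escapes, single-character punctuation, and white space); TOKEN_SPEC and tokenize are hypothetical names, and keywords such as void are not distinguished from identifiers here.

```python
import re

# Hypothetical token patterns for this one example (not a full C lexer).
TOKEN_SPEC = [
    ("STRING", r'"(?:\\.|[^"\\])*"'),    # string literal, allowing escapes like \n
    ("ID",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("PUNCT",  r"[(){};]"),
    ("SKIP",   r"\s+"),                   # white space only separates tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(src):
    pos, tokens = 0, []
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":          # keep every lexeme except white space
            tokens.append(m.group())
        pos = m.end()
    return tokens

program = r'void main() { printf("Hello World\n"); }'
```

Calling tokenize(program) returns the same eleven lexemes listed on the slide, with the string literal kept as a single token.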

Specifications of Tokens: Strings, Words and Sentences

1. Prefix of s: a string obtained by deleting zero or more trailing symbols of s

2. Suffix of s: a string obtained by deleting zero or more leading symbols of s

3. Substring of s: a string obtained by deleting a prefix and a suffix from s

4. Proper prefix, suffix or substring of s: a prefix, suffix or substring x of s that is nonempty and such that s ≠ x

5. Subsequence of s: a string obtained by deleting zero or more symbols of s, not necessarily contiguous
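These definitions can be checked mechanically. A small Python sketch (the function names are our own) that enumerates each kind of derived string for a given s:

```python
from itertools import combinations

def prefixes(s):
    # delete zero or more trailing symbols
    return [s[:i] for i in range(len(s) + 1)]

def suffixes(s):
    # delete zero or more leading symbols
    return [s[i:] for i in range(len(s) + 1)]

def substrings(s):
    # delete a prefix and a suffix: every s[i:j] with i <= j
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}

def proper(strings, s):
    # proper: nonempty and not equal to s itself
    return {x for x in strings if x and x != s}

def subsequences(s):
    # delete any symbols, not necessarily contiguous (order is preserved)
    return {"".join(c) for k in range(len(s) + 1) for c in combinations(s, k)}
```

For s = "abc", "ac" is a subsequence but not a substring, because the deleted symbol b is not at either end.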

The Principle of Longest Match

• In most languages, the scanner should pick the longest possible string to make up the next token if there is a choice

• Example: return foobar != hohum;

  should be recognized as 5 tokens, not more (i.e., not parts of words or identifiers, and not ! and = as separate tokens):

  RETURN ID(foobar) NEQ ID(hohum) SCOLON
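A minimal hand-written sketch of the longest-match rule, assuming a toy token set with only identifiers, the keyword return, and the operators !, =, != and ;. The token names RETURN, ID, NEQ and SCOLON follow the slide; NOT and ASSIGN are hypothetical names we introduce. Identifiers consume the longest run of letters and digits, and the two-character operator is tried before its one-character prefix.

```python
KEYWORDS = {"return": "RETURN"}
TWO_CHAR = {"!=": "NEQ"}
ONE_CHAR = {"!": "NOT", "=": "ASSIGN", ";": "SCOLON"}

def scan(src):
    tokens, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha():
            j = i
            while j < len(src) and src[j].isalnum():   # take the longest run
                j += 1
            word = src[i:j]
            tokens.append(KEYWORDS.get(word, f"ID({word})"))
            i = j
        elif src[i:i + 2] in TWO_CHAR:                  # try the longer operator first
            tokens.append(TWO_CHAR[src[i:i + 2]])
            i += 2
        elif ch in ONE_CHAR:
            tokens.append(ONE_CHAR[ch])
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens
```

Because the identifier loop is greedy, an input such as returnx becomes a single ID rather than RETURN followed by ID(x).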

Typical Tokens in Programming Languages

• Operators & Punctuation

  – + - * / ( ) { } [ ] ; : :: < <= == = != ! …

  – Each of these is a distinct lexical class (or token type)

• Keywords

  – if while for goto return switch void …

  – Each of these is also a distinct lexical class (not a string)

• Identifiers

  – A single ID lexical class, but parameterized by the actual id

• Integer constants

  – A single INT lexical class, but parameterized by the int value

• Other constants, etc.

Tokens of a Typical Language

TYPE        EXAMPLES
ID          foo, n14, a, temp, …
NUM         73, 0, 00, 515, +2, …
REAL        66.1, .5, 10., 1e67, 5.5e-10, …
KEYWORDS    IF, DO, WHILE, INT, …
SYMBOLS     , (Comma)   != (Noteq)   ( (Lparen)   …

Question: How are tokens formally defined and recognized?

Answer: By using regular expressions to define a token as a formal regular language.

Formal Theory of Languages

• A language in real life is made up of

  1. words made up of symbols from an alphabet, and

  2. sentences made up of words arranged according to the grammar of that language

• Natural languages display an amazing variety of expressions, with explicit and implicit meanings, and variations in meaning as well as in grammar

• Computer languages, on the contrary, focus on a limited set of tasks to be performed; hence mathematical precision is essential in defining their structure and grammar

Formal Definition of Languages

• Alphabet: a finite (non-empty) set of symbols, denoted by Σ

• String: a finite sequence of symbols from an alphabet, including the empty sequence (denoted by λ)

• Language: a set (often infinite) of finite strings

The set of all possible finite strings of elements of the alphabet Σ (including λ) is denoted by Σ*.

Finite specifications of (possibly infinite) languages are possible with:

1. Automaton – a recognizer; a machine that accepts all strings in a language (and rejects all other strings)

2. Grammar – a generator; a system for producing all strings in the language (and no other strings)

A language may be specified by many different grammars & automata

BUT

A grammar or automaton specifies only one language

Formal Language Definition (Contd.)

• As already defined, a language L over an alphabet Σ is a collection of strings of elements of Σ

  – The PASCAL language is the set of all strings that constitute legal PASCAL programs (infinite set)

  – The language of primes is the set of all decimal digit strings that constitute prime numbers (infinite set)

  – The language of C reserved words is the set of all alphabetic strings that cannot be used as identifiers in the C programming language (finite set)

• To specify some of these (possibly infinite) languages with a finite description, we use the notation of regular expressions

Regular Expressions

• A regular expression is always defined over some alphabet Σ (for programming languages, commonly ASCII or Unicode)

• If E is a regular expression, L(E) is the “language” (set of strings) generated by E

• For example, for each symbol a in the alphabet of the language, the regular expression a denotes the language {a}, containing just the string a (known as a symbol)

• The regular expression ε denotes the language containing only the empty string λ

Operations with Regular Expressions

• Given two regular expressions M and N:

• Alternation (denoted by |) makes a new regular expression M | N denoting the UNION of the languages L(M) and L(N), i.e., L(M) ∪ L(N)

• Concatenation (denoted by . or by juxtaposition) makes a new regular expression MN denoting the language of strings from L(M) followed by strings from L(N)

• Repetition (denoted by *) makes a new regular expression M* denoting the language of zero or more concatenated occurrences of strings from L(M) (Kleene closure)

Regular Expression Example

Expression          Language               Example words
a | b               {a, b}                 a, b
ab*a                {a}{b}*{a}             aa, aba, abba, abbba, …
(ab)*               {ab}*                  ε, ab, abab, ababab, …
abba                {abba}                 abba
(0 | 1)*0           ({0} ∪ {1})*{0}        0, 00, 10, 010, 110, … (all even binary numbers)
b*(abb*)*(a | ε)    Strings of a's and b's with no consecutive a's

Similarly, using the symbols |, ., * and ε, we can specify the regular expressions corresponding to the lexical tokens of a programming language using rules (a.k.a. productions).
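The example table can be spot-checked with Python's re module, used here only as a convenient stand-in for the textbook notation: | and * mean the same thing and concatenation is juxtaposition, but re has no ε symbol, so (a | ε) is written (a|). The sample strings are our own choices.

```python
import re

# Each row: (expression, strings it should match, strings it should not match).
EXAMPLES = [
    ("a|b",           ["a", "b"],               ["ab", ""]),
    ("ab*a",          ["aa", "aba", "abba"],    ["a", "ab"]),
    ("(ab)*",         ["", "ab", "abab"],       ["a", "aba"]),
    ("abba",          ["abba"],                 ["abb"]),
    ("(0|1)*0",       ["0", "10", "110"],       ["1", "01", ""]),   # even binary numbers
    ("b*(abb*)*(a|)", ["", "ab", "bab", "aba"], ["aa", "baab"]),    # no consecutive a's
]

def check(examples):
    for pattern, good, bad in examples:
        for s in good:
            assert re.fullmatch(pattern, s), (pattern, s)
        for s in bad:
            assert not re.fullmatch(pattern, s), (pattern, s)
    return True
```

re.fullmatch is used rather than re.match so that the whole string, not just a prefix, must belong to the language.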

Table of Operators & Abbreviations

Notation     Description
a            An ordinary character that stands for itself
ε            The empty string
M | N        Alternation; choosing from M or N
MN           Concatenation; an M followed by an N
M*           Repetition (zero or more times)
M+           Repetition (one or more times)
M?           Optional (zero or one occurrence of M)
[a–zA–Z]     Character set alternation
[abxyz]      One of the given characters (a|b|x|y|z)
.            Stands for a single character (except newline)
‘a.+*’       Quotation: a string in quotes stands for itself literally
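Each abbreviation in the table can be rewritten with only the primitive operators (for example, M+ = MM* and M? = M | ε). The sketch below, again using Python's re as a stand-in notation, compares an abbreviation with its expansion on every string over {a, b} up to length 4. This is a sanity check on finitely many strings, not a proof; the helper names are our own.

```python
import re
from itertools import product

def strings_up_to(alphabet, n):
    # every string over the alphabet of length 0..n
    for k in range(n + 1):
        for chars in product(alphabet, repeat=k):
            yield "".join(chars)

def same_language(p, q, alphabet="ab", n=4):
    """True if patterns p and q accept exactly the same strings of length <= n."""
    return all(bool(re.fullmatch(p, s)) == bool(re.fullmatch(q, s))
               for s in strings_up_to(alphabet, n))

# Abbreviation, followed by its expansion into |, concatenation, * and ε
# (ε is written as an empty alternative, since re has no ε symbol).
ABBREVIATIONS = [
    ("a+",   "aa*"),    # M+   =  M M*
    ("a?",   "(a|)"),   # M?   =  M | ε
    ("[ab]", "(a|b)"),  # [ab] =  a | b
]
```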

Regular Expression Construction

• Problem: Specify the set of unsigned numbers as a regular expression (examples: 1997, 19.97)

• Observations on numbers:

  1. A number could be made up of one or more digits from the set (0 – 9): (0 – 9)+

  2. Optionally, it can have a decimal point at the end, followed by 0 or more digits: “.”(0 – 9)*

  3. A number can also start with a point followed by one or more digits: “.”(0 – 9)+

• Resulting regular expression:

  [ (0 – 9)+ [ “.”(0 – 9)* ]? ] | [ “.”(0 – 9)+ ]
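A direct transcription of this expression into Python's re syntax, as a sketch. The name UNSIGNED is our own, and \. is needed because an unescaped . means "any character" in re.

```python
import re

# [ (0-9)+ [ "." (0-9)* ]? ] | [ "." (0-9)+ ]
UNSIGNED = re.compile(r"[0-9]+(\.[0-9]*)?|\.[0-9]+")

def is_unsigned_number(s):
    # fullmatch: the whole string must be a number, not just a prefix of it
    return UNSIGNED.fullmatch(s) is not None
```

Both 1997 and 19.97 are accepted, as are 10. and .5, while a lone point, a signed number, or 1.2.3 is rejected.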

Regular Expressions for Some Tokens of a Programming Language

Regular Expression                                   Token Type
if                                                   return IF;
[ a – z ] [ a – z 0 – 9 ]*                           return ID;
[ 0 – 9 ]+                                           return NUM;
( [ 0 – 9 ]+ ‘.’ [ 0 – 9 ]* ) | ( ‘.’ [ 0 – 9 ]+ )   return REAL;
( ‘\*’ [ a – z ]* ‘\n’ ) | ( ‘ ’ | ‘\n’ | ‘*/’ )+     return Comment;
.                                                    return ERROR;
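The table can be turned into a scanner that applies the two usual disambiguation rules: longest match, and, for equally long matches, the rule listed first (so "if" is IF, while "iffy" is an ID). This is a sketch using Python's re as the pattern engine; the names RULES and scan are hypothetical, and the comment row is simplified to plain white space, which produces no token.

```python
import re

# Rules in priority order, transcribed from the table above.
RULES = [
    ("IF",    r"if"),
    ("ID",    r"[a-z][a-z0-9]*"),
    ("NUM",   r"[0-9]+"),
    ("REAL",  r"[0-9]+\.[0-9]*|\.[0-9]+"),
    (None,    r"[ \n\t]+"),   # white space: skipped, produces no token
    ("ERROR", r"."),          # any other single character
]
COMPILED = [(name, re.compile(pat)) for name, pat in RULES]

def scan(src):
    tokens, pos = [], 0
    while pos < len(src):
        best_name, best_end = None, pos
        for name, regex in COMPILED:
            m = regex.match(src, pos)
            # longest match wins; on a tie the earlier rule keeps it (strict >)
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is not None:
            tokens.append((best_name, src[pos:best_end]))
        pos = best_end  # white space and "." cover every character, so pos advances
    return tokens
```

For example, in "3.14" the NUM rule matches only "3", but REAL matches all four characters and therefore wins.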

A Regular Expression Recognizer

• Given an input string, the function of a “regular expression recognizer” is to say either:

  – “YES, the input is part of the language generated from the regular expression”, or

  – “NO, the input isn’t part of the language generated from the regular expression”

• Using results from finite automata theory and the theory of algorithms, we can automate the construction of such recognizers from regular expressions

Finite Automata

• A finite automaton is a transition graph that has:

  – A finite set of states S (represented by nodes), with edges leading from one state to another

  – Each edge labeled with the symbol (from the set Σ) that causes the transition (this could also be ε!)

  – One state denoted as the start state s0, and certain states distinguished as final states (normally drawn as two concentric circles)

• Mathematically, it can be represented as:

  A = (S, Σ, s0, F, move)

  where F is the set of final states and move is the transition function
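The 5-tuple maps naturally onto a small data structure. A sketch, assuming the transition function move is stored as a dictionary from (state, symbol) pairs to states, with a missing entry meaning "no transition" (reject). The class name DFA and the example automaton for [a–z][a–z0–9]* are our own illustration.

```python
import string
from dataclasses import dataclass

@dataclass
class DFA:
    states: frozenset    # S
    alphabet: frozenset  # Σ
    start: int           # s0
    finals: frozenset    # F
    move: dict           # (state, symbol) -> next state

    def accepts(self, text):
        state = self.start
        for ch in text:
            if (state, ch) not in self.move:
                return False            # no transition possible: reject
            state = self.move[(state, ch)]
        return state in self.finals    # input used up: accept only in a final state

LETTERS, DIGITS = string.ascii_lowercase, string.digits
id_move = {(1, c): 2 for c in LETTERS}
id_move.update({(2, c): 2 for c in LETTERS + DIGITS})
ID_DFA = DFA(states=frozenset({1, 2}), alphabet=frozenset(LETTERS + DIGITS),
             start=1, finals=frozenset({2}), move=id_move)
```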

Recognizing Expressions as Tokens with Finite State Automata

• Operate by reading input symbols (usually characters)

  – A transition can be taken if it is labeled with the current symbol

  – An ε-transition can be taken at any time

• Accept when a final state is reached and there is no more input

  – A scanner is slightly different: it accepts the longest match even if there is more input

• Reject if no transition is possible, or if there is no more input and the automaton is not in a final state (DFA)

Finite Automata Examples

• if (return IF): state 1 (start) on i → state 2; state 2 on f → state 3 (final)

• [ a – z ] [ a – z 0 – 9 ]* (return ID): state 1 (start) on a – z → state 2 (final); state 2 loops on a – z and on 0 – 9

• [ 0 – 9 ]+ (return NUM): state 1 (start) on 0 – 9 → state 2 (final); state 2 loops on 0 – 9

Finite Automata Examples (Contd.)

• ( [0 – 9]+ ‘.’ [0 – 9]* ) | ( ‘.’ [0 – 9]+ ) (return REAL):

  – state 1 (start) on 0 – 9 → state 2; state 2 loops on 0 – 9; state 2 on ‘.’ → state 3 (final); state 3 loops on 0 – 9

  – state 1 on ‘.’ → state 4; state 4 on 0 – 9 → state 5 (final); state 5 loops on 0 – 9
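The REAL automaton can also be written as a table-driven recognizer, which is the usual shape of scanner code generated from a DFA. In this sketch the character classes "digit", "dot" and "other" and the names TABLE and is_real are our own; missing table entries play the role of a dead state.

```python
def char_class(ch):
    # group input characters into the edge labels used in the diagram
    if ch in "0123456789":
        return "digit"
    if ch == ".":
        return "dot"
    return "other"

# Transition table for the REAL automaton: states 1-5, start 1, finals {3, 5}.
TABLE = {
    (1, "digit"): 2, (1, "dot"): 4,
    (2, "digit"): 2, (2, "dot"): 3,
    (3, "digit"): 3,
    (4, "digit"): 5,
    (5, "digit"): 5,
}
FINALS = {3, 5}

def is_real(text):
    state = 1
    for ch in text:
        state = TABLE.get((state, char_class(ch)))
        if state is None:
            return False  # missing entry: the implicit dead state
    return state in FINALS
```

Note that 10 on its own ends in state 2, which is not final, so it is a NUM rather than a REAL.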

Deterministic Finite Automata (DFA)

• A finite automaton is deterministic if:

  1. It has no edges/transitions labeled with ε.

  2. For each state and for each symbol in the alphabet, there is exactly one edge labeled with that symbol.

• Such a transition graph is called a state graph.

A Deterministic Finite Automaton (DFA) for b*abb: state 0 (start) loops on b; state 0 on a → state 1; state 1 on b → state 2; state 2 on b → state 3 (final). (As is common, edges that are not drawn are assumed to lead to a dead, non-accepting state.)

Non-deterministic Finite Automata (NFA)

• In a non-deterministic finite automaton:

  1. From a state (node), there may be more than one edge labeled with the same symbol, and there may be no edge from a node labeled with a given input symbol

  2. An edge can be labeled with the empty symbol ε too

A Non-deterministic Finite Automaton (NFA) for (a|b)*abb: state 0 (start) loops on a and on b; state 0 on a → state 1; state 1 on b → state 2; state 2 on b → state 3 (final).
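An NFA can be simulated directly by tracking the set of states it could be in after each input symbol, which is also the idea behind simulating an NFA with a DFA. A sketch for the (a|b)*abb automaton above (it has no ε-edges, so no ε-closure is needed); NFA_MOVE and nfa_accepts are hypothetical names.

```python
# move maps (state, symbol) to the SET of possible next states.
NFA_MOVE = {
    (0, "a"): {0, 1},  # on a: stay in the loop, or guess that the final abb starts here
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, FINALS = 0, {3}

def nfa_accepts(text):
    current = {START}                    # every state the NFA might be in
    for ch in text:
        current = set().union(*(NFA_MOVE.get((s, ch), set()) for s in current))
        if not current:
            return False                 # all paths have died
    return bool(current & FINALS)
```

On input abb the state sets are {0}, {0, 1}, {0, 2} and finally {0, 3}, which contains the final state 3.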

Another NFA

[Figure: from the start state, ε-transitions lead into two branches; one branch reads a and then loops on a, the other reads b and then loops on b.]

An ε-transition is taken without consuming any character from the input.

What does the above NFA accept? aa* | bb*

NFA and DFA – A Comparison

• DFA

  – No edges/transitions labeled with ε

  – For each state and for each symbol in the alphabet, there is exactly one edge labeled with that symbol

  – Slower to build but quicker to simulate

• NFA

  – Has edges/transitions labeled with ε

  – From a state (node), there may be more than one edge labeled with the same symbol, and there may be no edge from a node labeled with a given input symbol

  – Quicker to build but slower to simulate

Relationship between DFA & NFA

• It is obvious that a DFA can be simulated with an NFA (every DFA is already an NFA)

• What is not so obvious is that an NFA can be simulated with a DFA!

• How?

  – Simulate sets of possible NFA states

  – Possible exponential blowup in the state space

  – Still, the DFA makes only one transition per character in the input stream

Automating a RE Recognizer Construction

• To convert a specification into code:

  1. Write down the RE for the input language

  2. Build a big NFA

  3. Build the DFA that simulates the NFA

  4. Systematically shrink the DFA

  5. Turn it into code

Note: The DFA construction is done automatically by a tool such as lex (more on this later).

Building NFA From Regular Expression

• Remember that a regular expression is formed by the use of:

  – basic symbols, and their

  – alternation,

  – concatenation, and

  – repetition.

• Hence, all we need to know is:

  – how to build the NFA for each of the above (symbols and operations), and

  – how to assemble those NFAs into a composite NFA for the whole expression

Building NFA for Symbols & Operations

1. Building an NFA for a basic symbol a:

   1. Start with an initial state i,

   2. draw an edge/transition labeled with the symbol (this could be ε too!)

   3. to the final state f.

[Figure: start state i on a → final state f; and, for ε, start state i on ε → final state f.]

Building NFA for Symbols & Operations

2. Building an NFA for alternation, N(s | t):

   – Given two NFAs N(s) and N(t):

     1. Construct a new start state i and a new final state f.

     2. Add transitions labeled ε from the start state i to the start states of N(s) and N(t).

     3. Add transitions labeled ε from the final states of N(s) and N(t) to the final state f.

[Figure: start state i with ε-edges into N(s) and N(t), whose final states have ε-edges into f.]

Building NFA for Symbols & Operations

3. Building an NFA for concatenation, N(s.t) or N(st):

   – Given two NFAs N(s) and N(t):

     1. The start state of N(s) becomes the start state i of the new NFA.

     2. Overlap the final state of the former, N(s), with the start state of the latter, N(t) (equivalently, connect them with an ε-transition).

     3. The final state of N(t) becomes the final state f of the new NFA.

[Figure: start state i, then N(s) followed by N(t), ending in final state f.]

Building NFA for Symbols & Operations

4. Building an NFA for repetition, N(s*):

   1. Construct a new start state i and a new final state f.

   2. Add an ε-transition from the new start state i to the new final state f.

   3. Add an ε-transition from the new final state f to the start state of N(s).

   4. Add another ε-transition from the final state of N(s) to the new final state f.

[Figure: start state i, final state f, and the sub-automaton N(s), connected by the ε-transitions above.]
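The four construction rules can be sketched in Python (this style of construction is commonly called Thompson's construction). Assumptions: states are integers handed out by a hypothetical Builder class, ε is represented by None, concatenation links N(s) to N(t) with an ε-edge instead of merging the two states, and repetition follows the four steps on this slide. A small simulator is included so the result can be tested.

```python
from dataclasses import dataclass, field

EPS = None  # label used for ε-transitions

@dataclass
class NFA:
    start: int
    final: int
    edges: list = field(default_factory=list)  # (from_state, label, to_state)

class Builder:
    """Hypothetical helper that hands out fresh state numbers."""

    def __init__(self):
        self.count = 0

    def new_state(self):
        self.count += 1
        return self.count

    def symbol(self, a):
        # Rule 1: i --a--> f   (pass EPS for an ε-edge)
        i, f = self.new_state(), self.new_state()
        return NFA(i, f, [(i, a, f)])

    def alt(self, s, t):
        # Rule 2: new i and f; ε-edges into N(s) and N(t) and out of their finals
        i, f = self.new_state(), self.new_state()
        return NFA(i, f, s.edges + t.edges + [
            (i, EPS, s.start), (i, EPS, t.start),
            (s.final, EPS, f), (t.final, EPS, f)])

    def concat(self, s, t):
        # Rule 3: ε-edge from the final state of N(s) to the start state of N(t)
        return NFA(s.start, t.final, s.edges + t.edges + [(s.final, EPS, t.start)])

    def star(self, s):
        # Rule 4: i -ε-> f, f -ε-> start of N(s), final of N(s) -ε-> f
        i, f = self.new_state(), self.new_state()
        return NFA(i, f, s.edges + [(i, EPS, f), (f, EPS, s.start), (s.final, EPS, f)])

def eps_closure(nfa, states):
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for src, label, dst in nfa.edges:
            if src == s and label is EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure

def accepts(nfa, text):
    current = eps_closure(nfa, {nfa.start})
    for ch in text:
        current = eps_closure(nfa, {dst for src, label, dst in nfa.edges
                                    if src in current and label == ch})
    return nfa.final in current

# Build (a|b)*abb; every use of a symbol needs its own fresh sub-NFA.
bld = Builder()
def a_or_b():
    return bld.alt(bld.symbol("a"), bld.symbol("b"))
nfa = bld.concat(bld.concat(bld.concat(bld.star(a_or_b()), bld.symbol("a")),
                            bld.symbol("b")), bld.symbol("b"))
```

Sub-NFAs are rebuilt rather than reused because sharing state numbers between two copies would wire them together.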

Construction of NFA – Examples: (a|b).(a|b)

[Figure: NFAs built step by step for (a) and (b), then (a|b), then (a|b).(a|b).]

Construction of NFA – Examples (Contd.)

• [ a – z ] [ a – z 0 – 9 ]* (return ID): the symbol NFA for a – z, followed by the repetition NFA for a – z | 0 – 9

• [ 0 – 9 ]+ = [ 0 – 9 ] [ 0 – 9 ]* (return NUM): the symbol NFA for 0 – 9, followed by the repetition NFA for 0 – 9

[Figure: the two NFAs, with their symbol and repetition parts labeled.]

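Combining Several NFA’s

[Figure: the NFAs for IF, ID, NUM and ERROR combined into one NFA with start state 1. From state 1, the IF path reads i (to state 2) and then f (to state 3), and ε-transitions lead to the ID NFA (states 4–8, edges labeled a-z and 0-9), the NUM NFA (states 9–13, edges labeled 0-9) and the ERROR NFA (state 14, on any character, to state 15).]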

Conversion of NFA to DFA

• A DFA can be constructed from the NFA, where each DFA state represents a set of states of the NFA

• Key idea: the state of the DFA after reading some input is the set of all states the NFA could have reached after reading the same input

• If the NFA has n states, the DFA will have at most 2^n states

• The resulting DFA may have more states than needed

• Let us study the conversion with an example

Converting NFA to DFA

[Figure: the combined NFA for IF, ID, NUM and ERROR, with start state 1.]

Q: What states can be reached from state 1 without consuming a character?

A: {1, 4, 9, 14} form the ε-closure of state 1.

Defn: Given a set of NFA states T, ε-closure(T) is the set of states that are reachable through zero or more ε-transitions from any state s ∈ T.

Converting NFA to DFA

What are ALL the state closures in this NFA?

closure(1) = {1, 4, 9, 14}

closure(5) = {5, 6, 8}

closure(8) = {6, 8}

closure(7) = {7, 8, 6}

closure(10) = {10, 11, 13}

closure(13) = {11, 13}

closure(12) = {11, 12, 13}

Converting NFA to DFA

• We already know that, given a set of NFA states T, ε-closure(T) is the set of states that are reachable through zero or more ε-transitions from any state s ∈ T.

• We now define: given a set of NFA states T, move(T, a) is the set of states that are reachable on input a from any state s ∈ T.

• Now the problem definition: given an NFA, find the DFA with the minimum number of states that has the same behavior as the NFA for all inputs.

Converting NFA to DFA

1. Start with the initial state of the NFA (s0), and work out the set of DFA states, Dstates, initialized with a state representing ε-closure(s0).

Dstates = {1-4-9-14}

Converting NFA to DFA

Dstates = {1-4-9-14}

For DFA state 1-4-9-14, we now need to compute:

move(1-4-9-14, a-h) = {5, 15}

Then, ε-closure({5, 15}) = {5, 6, 8, 15}

New DFA transition: 1-4-9-14 on a-h → 5-6-8-15

Converting NFA to DFA

Next we need to compute:

move(1-4-9-14, i) = {2, 5, 15}

Then, ε-closure({2, 5, 15}) = {2, 5, 6, 8, 15}

DFA transitions so far: 1-4-9-14 on a-h → 5-6-8-15; on i → 2-5-6-8-15

Converting NFA to DFA

Next we need to compute:

move(1-4-9-14, j-z) = {5, 15}

Then, ε-closure({5, 15}) = {5, 6, 8, 15}

DFA transitions so far: 1-4-9-14 on a-h or j-z → 5-6-8-15; on i → 2-5-6-8-15

Converting NFA to DFA

Next we need to compute:

move(1-4-9-14, 0-9) = {10, 15}

Then, ε-closure({10, 15}) = {10, 11, 13, 15}

DFA transitions so far: 1-4-9-14 on a-h or j-z → 5-6-8-15; on i → 2-5-6-8-15; on 0-9 → 10-11-13-15

Converting NFA to DFA


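Next we need to compute:

Move(1-4-9-14, other) = {15}

Then, ε-closure({15}) = {15}

DFA transitions so far: 1-4-9-14 on a-h or j-z goes to 5-6-8-15; on i it goes to 2-5-6-8-15; on 0-9 it goes to 10-11-13-15; on any other character it goes to 15.

Dstates = {1-4-9-14}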


Converting NFA to DFA


DFA transitions so far: 1-4-9-14 on a-h or j-z goes to 5-6-8-15; on i it goes to 2-5-6-8-15; on 0-9 it goes to 10-11-13-15; on any other character it goes to 15.

Dstates = {1-4-9-14}

The analysis for 1-4-9-14 is complete. We mark it and pick another unmarked state of the DFA to analyze.
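The whole procedure (mark a DFA state, compute Move and ε-closure for each input class, add any new set to Dstates, repeat until every state is marked) can be sketched in C. This is a minimal sketch under stated assumptions. NFA state sets are stored as 32-bit masks. The ε-edges (5→6, 5→8, 7→6, 7→8, 10→11, 10→13, 12→11, 12→13) are inferred from the closures computed above rather than read off the original figure. {1, 4, 9, 14} is taken as the start set. The names eps_closure, move and subset_construction are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* A set of NFA states 1..15 as a bit mask: bit s is set when state s is in the set. */
typedef unsigned int StateSet;
#define BIT(s) (1u << (s))

/* Assumed epsilon edges, inferred from the closures on the slides. */
static const int eps_from[] = {5, 5, 7, 7, 10, 10, 12, 12};
static const int eps_to[]   = {6, 8, 6, 8, 11, 13, 11, 13};
#define N_EPS (sizeof eps_from / sizeof eps_from[0])

/* epsilon-closure(T): keep adding epsilon successors until nothing changes. */
StateSet eps_closure(StateSet t) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < N_EPS; i++)
            if ((t & BIT(eps_from[i])) && !(t & BIT(eps_to[i]))) {
                t |= BIT(eps_to[i]);
                changed = true;
            }
    }
    return t;
}

/* Move(T, c): NFA states reachable from T on the single character c. */
StateSet move(StateSet t, char c) {
    bool lower = c >= 'a' && c <= 'z', digit = c >= '0' && c <= '9';
    StateSet u = 0;
    if ((t & BIT(1)) && c == 'i') u |= BIT(2);            /* IF: 1 -i-> 2 -f-> 3 */
    if ((t & BIT(2)) && c == 'f') u |= BIT(3);
    if ((t & BIT(4)) && lower) u |= BIT(5);               /* ID */
    if ((t & BIT(6)) && (lower || digit)) u |= BIT(7);
    if ((t & BIT(9)) && digit) u |= BIT(10);              /* NUM */
    if ((t & BIT(11)) && digit) u |= BIT(12);
    if (t & BIT(14)) u |= BIT(15);                        /* ERROR: any character */
    return u;
}

#define MAX_DSTATES 32   /* ample for this NFA */

/* Subset construction: fills dstates and returns the number of DFA states. */
int subset_construction(StateSet dstates[MAX_DSTATES]) {
    /* Every character class on the slides, with '+' standing for "other". */
    const char reps[] = "abcdefghijklmnopqrstuvwxyz0123456789+";
    int n = 0, unmarked = 0;
    dstates[n++] = BIT(1) | BIT(4) | BIT(9) | BIT(14);   /* start: 1-4-9-14 */
    while (unmarked < n) {                              /* states before 'unmarked' are marked */
        StateSet t = dstates[unmarked++];                /* mark T */
        for (const char *c = reps; *c; c++) {
            StateSet u = eps_closure(move(t, *c));
            if (u == 0)
                continue;                                /* no transition on *c */
            int k = 0;
            while (k < n && dstates[k] != u)
                k++;
            if (k == n && n < MAX_DSTATES)
                dstates[n++] = u;                        /* new, unmarked DFA state */
        }
    }
    return n;
}
```

On this NFA the loop finds eight DFA states, the same sets that appear in the converted DFA. Trying every character is wasteful but keeps the sketch short; a generator would work with character classes instead.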


Converted DFA

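Start state: 1-4-9-14

Transitions:
1-4-9-14: i → 2-5-6-8-15; a-h, j-z → 5-6-8-15; 0-9 → 10-11-13-15; other → 15
2-5-6-8-15: f → 3-6-7-8; a-e, g-z, 0-9 → 6-7-8
5-6-8-15: a-z, 0-9 → 6-7-8
3-6-7-8: a-z, 0-9 → 6-7-8
6-7-8: a-z, 0-9 → 6-7-8
10-11-13-15: 0-9 → 11-12-13
11-12-13: 0-9 → 11-12-13

Accepting states: 5-6-8-15, 2-5-6-8-15 and 6-7-8 (ID); 3-6-7-8 (IF); 10-11-13-15 and 11-12-13 (NUM); 15 (error)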
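Turning this DFA into code can be sketched as a direct-coded scanner, with one branch per state or chain of states. This is a minimal sketch under several assumptions. Input is a NUL-terminated string using lowercase a-z as on the slides. Each call scans one token by the longest-match rule. The enum names and next_token are hypothetical. Because every state after the start state is accepting, the scanner can simply stop at the first character that has no transition.

```c
/* Token kinds reported by the accepting states (names are illustrative). */
enum token { T_EOF, T_ID, T_NUM, T_IF, T_ERROR };

static int is_az(int c) { return c >= 'a' && c <= 'z'; }
static int is_09(int c) { return c >= '0' && c <= '9'; }

/* Scans one token starting at *pp, advances *pp past the lexeme, and
   returns the label of the DFA state in which scanning stopped. */
enum token next_token(const char **pp) {
    const char *p = *pp;
    int c = (unsigned char)*p;
    enum token tok;

    if (c == '\0')
        return T_EOF;
    p++;                                     /* leave the start state 1-4-9-14 */
    if (is_az(c)) {                          /* to 5-6-8-15, or 2-5-6-8-15 on 'i' */
        tok = T_ID;
        if (c == 'i' && *p == 'f') {         /* 2-5-6-8-15 --f--> 3-6-7-8 */
            p++;
            tok = T_IF;
        }
        while (is_az(*p) || is_09(*p)) {     /* --a-z,0-9--> 6-7-8, which loops */
            p++;
            tok = T_ID;
        }
    } else if (is_09(c)) {                   /* to 10-11-13-15 */
        tok = T_NUM;
        while (is_09(*p))                    /* --0-9--> 11-12-13, which loops */
            p++;
    } else {
        tok = T_ERROR;                       /* any other character: state 15 */
    }
    *pp = p;
    return tok;
}
```

For example, "ifx" yields a single ID rather than IF followed by ID. A blank is reported as an error token here because the NFA above has no rule for white space.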


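Another Example of Conversion

[Figure: an NFA over the alphabet {a, b} with states S0 to S11]

The above NFA would result in the DFA below, whose states are:

{s0, s1, s2} (start)
{s3, s5, s6, s7, s8, s9, s11}
{s4, s5, s6, s7, s8, s10, s11}

[Figure: the resulting DFA, with transitions on a and b between these three states]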


Automating a RE Recognizer Construction

• To convert a specification into code:

1. Write down the RE for the input language

2. Build a big NFA

3. Build the DFA that simulates the NFA

4. Systematically shrink the DFA

5. Turn it into code

Note: The DFA construction is done automatically by a tool such as Lex (more on this later).


Systematically shrink the DFA

• The big picture
– Discover sets of equivalent states
– Represent each such set with just one state

• Two states are equivalent if and only if:
– The sets of paths leading from them to accepting states are equivalent
– ∀ α ∈ Σ, transitions on α lead to equivalent states (DFA)
– α-transitions to distinct sets ⇒ the states must be in distinct sets

• A partition P of S
– A collection of sets P such that each s ∈ S is in exactly one p_i ∈ P
– The algorithm iteratively partitions the DFA's states


Minimization

Group all the states together.

Separate states according to the exit transitions available to them.

Separate a set into two if, on some symbol, some of its states can reach another set and others cannot.

Repeat until no set can be separated.

[Figure: example DFA with states p0 to p4 and transitions on a and b]

Initial partition: {p0, p1, p2, p3, p4}.
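The splitting idea can be sketched as iterative partition refinement (Moore's algorithm), applied here to the converted IF/ID/NUM DFA from earlier. This is a sketch under assumptions, not necessarily the exact splitting order of the slides. Groups start out by token label, so that accepting states reporting different tokens never merge, which is the usual starting point for scanners. A missing transition is treated as a move to an implicit dead state. The character classes, state numbering and function names are illustrative.

```c
#include <string.h>

enum { N_STATES = 8, N_CLASSES = 5, DEAD = -1 };

/* Input classes: 0 = 'f', 1 = 'i', 2 = other letter a-z, 3 = digit, 4 = other. */
/* Token label of each DFA state: 0 = not accepting, 1 = ID, 2 = NUM, 3 = IF, 4 = error. */
static const int label[N_STATES] = {0, 1, 1, 2, 4, 1, 3, 2};

static const int delta[N_STATES][N_CLASSES] = {
    /*   f     i   letter digit other */
    {    1,    2,    1,    3,    4 },   /* 0: 1-4-9-14 (start) */
    {    5,    5,    5,    5, DEAD },   /* 1: 5-6-8-15 */
    {    6,    5,    5,    5, DEAD },   /* 2: 2-5-6-8-15 */
    { DEAD, DEAD, DEAD,    7, DEAD },   /* 3: 10-11-13-15 */
    { DEAD, DEAD, DEAD, DEAD, DEAD },   /* 4: 15 */
    {    5,    5,    5,    5, DEAD },   /* 5: 6-7-8 */
    {    5,    5,    5,    5, DEAD },   /* 6: 3-6-7-8 */
    { DEAD, DEAD, DEAD,    7, DEAD },   /* 7: 11-12-13 */
};

/* Gives states with identical signatures the same group number; returns the group count. */
static int regroup(int sig[N_STATES][N_CLASSES + 1], int group[N_STATES]) {
    int count = 0;
    for (int s = 0; s < N_STATES; s++) {
        group[s] = -1;
        for (int t = 0; t < s; t++)
            if (memcmp(sig[s], sig[t], sizeof sig[s]) == 0) {
                group[s] = group[t];
                break;
            }
        if (group[s] < 0)
            group[s] = count++;
    }
    return count;
}

/* Refines the partition until no group splits; returns the minimal DFA's state count. */
int minimize(int group[N_STATES]) {
    int sig[N_STATES][N_CLASSES + 1];
    memset(sig, 0, sizeof sig);
    for (int s = 0; s < N_STATES; s++)
        sig[s][0] = label[s];              /* initial partition: by token label */
    int n = regroup(sig, group);
    for (;;) {
        for (int s = 0; s < N_STATES; s++) {
            sig[s][0] = group[s];          /* keep the current split ... */
            for (int c = 0; c < N_CLASSES; c++) {
                int t = delta[s][c];       /* ... and add each successor's group */
                sig[s][c + 1] = (t == DEAD) ? DEAD : group[t];
            }
        }
        int m = regroup(sig, group);
        if (m == n)
            return n;                      /* no group was split: stable */
        n = m;
    }
}
```

On this DFA the eight states shrink to six. 5-6-8-15 merges with 6-7-8, and 10-11-13-15 merges with 11-12-13. 2-5-6-8-15 stays separate because its f-transition leads to the IF state.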


Minimization

The above DFA can now be minimized as:

[Figure: the minimized version of the p0 to p4 DFA]


Pseudo Code for the Lexical Analyzer

function lexan: integer;
var lexbuf : array [0..100] of char;
    c : char;
begin
  loop begin
    read a character into c;
    if c is a blank or a tab then
      do nothing
    else if c is a newline then
      lineno := lineno + 1
    else if c is a digit then begin
      set tokenval to the value of this and following digits;
      return NUM
    end
    else if c is a letter then begin
      place c and successive letters and digits into lexbuf;
      p := lookup(lexbuf);
      if p = 0 then p := insert(lexbuf, ID);
      tokenval := p;
      return the token field of table entry p
    end
    else begin /* token is a single character */
      set tokenval to NONE; /* there is no attribute */
      return integer encoding of character c
    end
  end
end
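The pseudocode translates fairly directly into C. In this sketch lexan, lookup, insert, lineno, tokenval, NUM, ID and NONE follow the pseudocode. Other details are assumptions made so the sketch is runnable and testable. Input is read from a string (src) instead of stdin. DONE marks the end of input. The token codes have the numeric values shown. The symbol table is a deliberately simple linear array. Keywords could be pre-inserted into the table with their own token codes, which is why lexan returns the table entry's token field.

```c
#include <ctype.h>
#include <string.h>

/* Token codes (assumed values, above the range of single characters). */
enum { NUM = 256, ID = 257, DONE = 258, NONE = -1 };
enum { BSIZE = 100, SYMMAX = 100 };

struct entry { char lexeme[BSIZE + 1]; int token; };

static struct entry symtable[SYMMAX];   /* entry 0 unused: 0 means "not found" */
static int lastentry = 0;

int lineno = 1;
int tokenval = NONE;
const char *src = "";                   /* text being scanned (stands in for stdin) */

/* Returns the index of lexeme s in the symbol table, or 0 if absent. */
int lookup(const char *s) {
    for (int p = lastentry; p > 0; p--)
        if (strcmp(symtable[p].lexeme, s) == 0)
            return p;
    return 0;
}

/* Adds s with token code tok; returns its index, or 0 if the table is full. */
int insert(const char *s, int tok) {
    if (lastentry + 1 >= SYMMAX)
        return 0;
    lastentry++;
    strcpy(symtable[lastentry].lexeme, s);
    symtable[lastentry].token = tok;
    return lastentry;
}

int lexan(void) {
    char lexbuf[BSIZE + 1];
    for (;;) {
        int c = (unsigned char)*src;
        if (c == '\0')
            return DONE;                         /* end of input */
        src++;
        if (c == ' ' || c == '\t')
            continue;                            /* strip blanks and tabs */
        if (c == '\n') {
            lineno++;
            continue;
        }
        if (isdigit(c)) {                        /* value of this and following digits */
            tokenval = c - '0';                  /* assumes the value fits in an int */
            while (isdigit((unsigned char)*src))
                tokenval = tokenval * 10 + (*src++ - '0');
            return NUM;
        }
        if (isalpha(c)) {                        /* a letter, then letters and digits */
            int b = 0;
            lexbuf[b++] = (char)c;
            while (isalnum((unsigned char)*src)) {
                if (b < BSIZE)
                    lexbuf[b++] = *src;          /* longer lexemes are truncated */
                src++;
            }
            lexbuf[b] = '\0';
            int p = lookup(lexbuf);
            if (p == 0)
                p = insert(lexbuf, ID);
            tokenval = p;                        /* 0 only if the table was full */
            return p ? symtable[p].token : ID;
        }
        tokenval = NONE;                         /* single-character token */
        return c;
    }
}
```

For example, scanning "count = 42;" returns ID (with tokenval pointing at the table entry for count), then '=', then NUM with tokenval 42, then ';'.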


Automating a RE Recognizer Construction

• To convert a specification into code:

1. Write down the RE for the input language

2. Build a big NFA

3. Build the DFA that simulates the NFA

4. Systematically shrink the DFA

5. Turn it into code

Note: The DFA construction is done automatically by a tool such as Lex.


Building Lexical Analyzers Automatically

• The point to note is that the process studied so far is well suited to automation:

1. The implementer writes down the regular expressions.
2. A scanner generator builds the NFA, the DFA and the minimal DFA, and then writes out the (table-driven or direct-coded) code.
3. This process reliably produces fast, robust lexical analyzers.

• One such tool is Lex.


Lex – A Tool for Generating Scanners

• A widely used tool for specifying lexical analyzers for a wide variety of languages.

1. The specification of a lexical analyzer is prepared by creating a program lex.l (containing regular expressions) in the Lex language.
2. lex.l is then run through the Lex compiler to produce a C program, lex.yy.c (containing a tabular representation of the state transition diagram).
3. lex.yy.c is run through the C compiler to produce a.out, the object code of the lexical analyzer.

Lex source program lex.l → Lex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
Input stream → a.out → sequence of tokens

How does it work?


Lex Functions

1. Translates the definitions into an automaton.
2. The automaton looks for the longest matching string.
3. It either returns some value to the reading program (the parser) or looks for the next token.
4. Look-ahead operator: x/y allows the token x only if y follows it (but y is not part of the token).


Lex Program Structure

• A Lex program (nothing but the specifications in lex.l) consists of three parts:

1. Declarations: declarations of variables and manifest constants.
2. Translation rules: patterns (regular expressions) and the corresponding action to be taken.
3. Auxiliary procedures: whatever auxiliary procedures are needed.

The three sections are separated by lines beginning with %%:

declarations
%%
translation rules
%%
auxiliary procedures


A Sample Lex Program

1)
%{
/* Remove uppercase letters. Commands to execute are
   lex test.l and gcc lex.yy.c -ll -o test */
%}
%%
[A-Z]+ ;

2)
%{
/* Line numbering (with flex, also add %option yylineno
   so that yylineno is maintained) */
%}
%%
^.*\n printf("%d\t%s", yylineno-1, yytext);


Any Questions?

Thank you


Regular Expression Construction

• Problem: Specify the set of unsigned numbers as a regular expression (examples: 1997, 19.97).
• Solution: Start with the basic symbols and keep defining regular sub-expressions until the final expression is achieved.

RULE 1. digit → 0 | 1 | 2 | 3 | … | 9
RULE 2. digits → digit digit* (or digit+). The Kleene star closure digit* means zero or more digits, so digit digit* means one or more digits.
RULE 3. optional_fraction → '.' digits | ε
RULE 4. num → digits optional_fraction
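Rules 1 to 4 combine into a single regular expression, digit+ ( '.' digit+ )?, where ? means "optional". The sketch below checks strings against it using the POSIX regex interface (regcomp, regexec, regfree from <regex.h>). That interface is available on POSIX systems but is not part of ISO C. The ERE spelling of the pattern and the function name is_unsigned_num are our own choices.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* num = digits optional_fraction, where digits = digit digit* (one or more)
   and optional_fraction = '.' digits | epsilon, written as a POSIX ERE. */
static const char NUM_PATTERN[] = "^[0-9]+(\\.[0-9]+)?$";

bool is_unsigned_num(const char *s) {
    regex_t re;
    if (regcomp(&re, NUM_PATTERN, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                            /* pattern failed to compile */
    bool match = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return match;
}
```

With this pattern, 1997 and 19.97 are accepted, while "19." and ".97" are rejected because Rule 3 requires digits on both sides of the point.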


Regular Expression Construction

Note that Rules 1 to 4 (digit, digits, optional_fraction, num) have used ALL the definitions of a regular expression: ε, single symbols, alternation (|), concatenation and Kleene closure (*).


Unsigned Number Validation Using Rules

• Let us derive numbers from Rules 1 to 4 (digit, digits, optional_fraction, num).
• For example, 1997 is derived as num ⇒ digits optional_fraction ⇒ digit digit digit digit ε ⇒ 1997, and 19.97 as num ⇒ digits optional_fraction ⇒ digits '.' digits ⇒ 19.97.

[Figure: sample inputs, starting with 1997, split into characters and checked against the rules]


Regular Expression Construction

• Question: How do we write a regular expression for identifiers? (An identifier is a letter followed by zero or more letters or digits.)

• Answer:

1. letter → a | A | b | B | … | z | Z
2. digit → 0 | 1 | 2 | 3 | … | 9
3. letter_or_digit → letter | digit
4. identifier → letter letter_or_digit*

• One can define similar regular expressions for comments, strings, operators and delimiters (the different tokens of a language).
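The identifier definition can also be checked by hand, without a regex library. This is a minimal sketch assuming ASCII letters and digits exactly as in the definitions for letter and digit (so no underscore). is_identifier is a hypothetical name.

```c
#include <stdbool.h>

/* letter = a | A | b | B | ... | z | Z;  digit = 0 | 1 | ... | 9 */
static bool is_letter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
static bool is_digit(char c) { return c >= '0' && c <= '9'; }

/* identifier = letter letter_or_digit*, where letter_or_digit = letter | digit */
bool is_identifier(const char *s) {
    if (!is_letter(*s))
        return false;                /* must start with a letter */
    for (s++; *s != '\0'; s++)
        if (!is_letter(*s) && !is_digit(*s))
            return false;            /* every later character is a letter or a digit */
    return true;
}
```

The loop is exactly the Kleene star in letter_or_digit*: it accepts zero or more further characters, so a single letter such as A is already an identifier.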


Grammar for a Tiny Language

• program ::= statement | program statement
• statement ::= assignStmt | ifStmt
• assignStmt ::= id = expr ;
• ifStmt ::= if ( expr ) statement
• expr ::= id | int | expr + expr
• id ::= a | b | c | i | j | k | n | x | y | z
• int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The rules of a grammar are also known as productions.
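To show how tokens of this tiny language are consumed by a parser, the grammar can be turned into a small recursive-descent recognizer. This is a sketch under several assumptions. ids and ints are the single characters listed in the grammar. Blanks may separate tokens. The body of an ifStmt is a statement. The recursive rules are read as repetition: program is one or more statements, and expr is an id or int followed by zero or more "+ id-or-int", which generates the same strings as expr + expr. All function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

static const char *p;                        /* current position in the input */

static void skip_blanks(void) {
    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;
}

/* Consumes the next non-blank character if it occurs in set. */
static bool match_in(const char *set) {
    skip_blanks();
    if (*p != '\0' && strchr(set, *p) != NULL) {
        p++;
        return true;
    }
    return false;
}

static bool id(void)  { return match_in("abcijknxyz"); }   /* id ::= a | b | c | i | ... */
static bool num(void) { return match_in("0123456789"); }   /* int ::= 0 | 1 | ... | 9 */

/* expr ::= id | int | expr + expr, recognized as (id | int) { + (id | int) } */
static bool expr(void) {
    if (!id() && !num())
        return false;
    while (match_in("+"))
        if (!id() && !num())
            return false;
    return true;
}

/* statement ::= assignStmt | ifStmt */
static bool statement(void) {
    skip_blanks();
    if (p[0] == 'i' && p[1] == 'f') {        /* ifStmt ::= if ( expr ) statement */
        p += 2;
        return match_in("(") && expr() && match_in(")") && statement();
    }
    return id() && match_in("=") && expr() && match_in(";");   /* assignStmt ::= id = expr ; */
}

/* program ::= statement | program statement, i.e. one or more statements */
bool parse_program(const char *src) {
    p = src;
    do {
        if (!statement())
            return false;
        skip_blanks();
    } while (*p != '\0');
    return true;
}
```

For example, "a = 1; if (a + 1) if (b) c = 3;" is accepted, while "x = 1" (missing ;) and "q = 1;" (q is not an id) are rejected.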