123
Compila(on 03683133 Lecture 2: Lexical Analysis Syntax Analysis (1): CFLs, CFGs, PDAs Noam Rinetzky 1

Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

  • Upload
    others

  • View
    16

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Compila(on      0368-­‐3133  

Lecture  2:  Lexical    Analysis  

Syntax  Analysis  (1):  CFLs,  CFGs,  PDAs          

Noam  Rinetzky  1

Page 2: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

2

Page 3: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

3

Page 4: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

What  is  a  Compiler?  source language target language

Compiler

Executable

code

exe Source

text

txt

4

Page 5: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Compiler  vs.  Interpreter    

5

Page 6: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Toy  compiler  

6

Page 7: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

The  Real  Anatomy  of  a  Compiler  

7

Page 8: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

The  Real  Anatomy  of  a  Compiler  

Executable

code

exe

Source

text

txt Lexical Analysis

Sem. Analysis

Process text input

characters Syntax Analysis tokens AST

Intermediate code

generation

Annotated AST

Intermediate code

optimization

IR Code generation IR

Target code optimization

Symbolic Instructions

SI Machine code generation

Write executable

output

MI

8

Lexical Analysis

Syntax Analysis

Page 9: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Lexical  Analysis  

9 Lexical  analyzers  are  also  known  as  “scanners’’  

Page 10: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Lexical  Analysis:  from  Text  to  Tokens  

Lexical Analysis

Syntax Analysis

Sem. Analysis

Inter. Rep.

Code Gen.

x = b*b – 4*a*c

txt

<ID,”x”> <EQ> <ID,”b”> <MULT> <ID,”b”> <MINUS> <INT,4> <MULT> <ID,”a”> <MULT> <ID,”c”>

Token Stream

10

Page 11: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

•  Scan  the  input    •  Par((ons  the  text  into  stream  of  tokens  

– Numbers  –  Iden(fiers  –  Keywords  –  Punctua(on    

•  Tokens  usually  represented  as  (kind,  value)  •  Defined  using  regular  expressions*  

•     “word”  in  the  source  language  •     “meaningful”  to  the  syntac(cal  analysis  

What  does  Lexical  Analysis  do?    

11

Page 12: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

What  does  Lexical  Analysis  do?    

•  Language:  fully  parenthesized  expressions  Context  free  language  

Regular  languages  

(      (      23      +        7      )      *      19      )  

Expr  →  Num  |  LP  Expr  Op  Expr  RP  Num  →  Dig  |  Dig  Num  Dig  →  ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’  LP  →  ‘(’  RP  →  ‘)’  Op  →  ‘+’  |  ‘*’  

12

Page 13: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

What  does  Lexical  Analysis  do?    

•  Language:  fully  parenthesized  expressions  Context  free  language  

(      (      23      +        7      )      *      19      )  

13

Regular  languages  

Expr  →  Num  |  LP  Expr  Op  Expr  RP  Num  →  Dig  |  Dig  Num  Dig  →  ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’  LP  →  ‘(’  RP  →  ‘)’  Op  →  ‘+’  |  ‘*’  

LP    LP    Num    Op    Num    RP    Op    Num    RP  

Page 14: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

From  scanning  to  parsing((23  +  7)  *  x)  

) ? * ) 7 + 23 ( (

RP Id OP RP Num OP Num LP LP

Lexical    Analyzer  

program  text  

token  stream  

Parser  Grammar:      Expr  →  ...  |  Id      Id  →  ‘a’  |  ...  |  ‘z’    

Op(*)  

Id(?)  

Num(23)   Num(7)  

Op(+)  

Abstract  Syntax  Tree  

valid  syntax  error  

14

Page 15: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Some  basic  terminology  

•  Lexeme  (aka  symbol)  -­‐  a  series  of  lecers  separated  from  the  rest  of  the  program  according  to  a  conven(on  (space,  semi-­‐column,  comma,  etc.)    

•  Pacern  -­‐  a  rule  specifying  a  set  of  strings.  Example:  “an  iden(fier  is  a  string  that  starts  with  a  lecer  and  con(nues  with  lecers  and  digits”  –  (Usually)  a  regular  expression      

•  Token  -­‐  a  pair  of  (pacern,  acributes)    

15

Page 16: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Example  void  match0(char  *s)  /*  find  a  zero  */  

{  

   if  (!strncmp(s,  “0.0”,  3))  

       return  0.  ;  

}  

VOID  ID(match0)  LPAREN  CHAR  DEREF  ID(s)  RPAREN    

LBRACE    

IF  LPAREN  NOT  ID(strncmp)  LPAREN  ID(s)  COMMA  STRING(0.0)  COMMA  NUM(3)  RPAREN  RPAREN    

RETURN  REAL(0.0)  SEMI    

RBRACE    

EOF   16

Page 17: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Regular  languages

•  Formal  languages  –  Σ            =  finite  set  of  lecers  – Word                =  sequence  of  lecer  –  Language  =  set  of  words  

•  Regular  languages  defined  equivalently  by  –  Regular  expressions  –  Finite-­‐state  automata  

17

Page 18: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Example:  Integers  w/o  Leading  Zeros

Digit  =  1|2|…|9  

Digit0          =  0|Digit  Pos  =  Digit  Digit0*  Integer  =  0  |  Pos|  -­‐Pos  

18

Page 19: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Challenge:  Ambiguity  

•  If  =  if•  Id  =  Lecer  (Lecer  |  Digit)*  

•  “if”  is  a  valid  iden(fiers…  what  should  it  be?  •  ‘’iffy”  is  also  a  valid  iden(fier  

•  Solu(on  –  Longest  matching  token  –  Break  (es  using  order  of  defini(ons…    

•  Keywords  should  appear  before  iden(fiers  19

Page 20: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Building  a  Scanner  –  Take  I  

•  Input:  String  

•  Output:  Sequence  of  tokens  

20

Page 21: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Building  a  Scanner  –  Take  I  Token  nextToken()  {  char  c  ;  loop:  c  =  getchar();  switch  (c){      case  `  `:  goto  loop  ;      case  `;`:  return  SemiColumn;      case  `+`:              c  =  getchar()  ;          switch  (c)  {              case  `+':  return  PlusPlus  ;              case  '=’    return  PlusEqual;              default:    ungetc(c);  return  Plus;                                                                        };      case  `<`:  …      case  `w`:  …    }   21

Page 22: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

There  must  be  a  becer  way!  

22

Page 23: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

A  becer  way  

•  AutomaGcally  generate  a  scanner  

•  Define  tokens  using  regular  expressions  

•  Use  finite-­‐state  automata  for  detec(on  

23

Page 24: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Reg-­‐exp  vs.  automata

•  Regular  expressions  are  declara(ve  – Good  for  humans      – Not  “executable”  

•  Automata  are  opera(ve  – Define  an  algorithm  for  deciding  whether  a  given  word  is  in  a  regular  language  

– Not  a  natural  nota(on  for  humans 24

Page 25: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Overview  

•  Define  tokens  using  regular  expressions  

•  Construct  a  nondeterminis(c  finite-­‐state  automaton  (NFA)  from  regular  expression  

•  Determinize  the  NFA  into  a  determinis(c  finite-­‐state  automaton  (DFA)  

•  DFA  can  be  directly  used  to  iden(fy  tokens    25

Page 26: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Automata  theory:  a  bird’s-­‐eye  view  

26

Page 27: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Determinis(c  Automata  (DFA)  

•  M  =  (Σ,  Q,  δ,  q0,  F)  –  Σ  -­‐  alphabet  –  Q  –  finite  set  of  state  –  q0  ∈  Q  –  ini(al  state  –  F    ⊆  Q  –  final  states  –  δ  :  Q  ×  Σ  à  Q  -­‐  transi(on  func(on  

•  For  a  word  w,  M  reach  some  state  x  – M  accepts  w  if  x  ∈  F    

27

Page 28: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

DFA  in  pictures

start  

a  

b,c  

a,b  

c  

accepting state

start state

transition

•  An  automaton  is  defined  by  states  and  transi(ons

28

a,b,c   a,b,c  

Page 29: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Accep(ng  Words

• Words  are  read  leq-­‐to-­‐rightcba

start  

a  

b  

c  

29

• Missing  transi(on  =  non-­‐acceptance    –  “Stuck  state”

Page 30: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

• Words  are  read  leq-­‐to-­‐right

Accep(ng  Words

cba

start  

a  

b  

c  

30

Page 31: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

• Words  are  read  leq-­‐to-­‐right

Accep(ng  Words

cba

start  

a  

b  

c  

31

Page 32: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

• Words  are  read  leq-­‐to-­‐right

Accep(ng  Words

cba

start  

a  

b  

c  

32

Page 33: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Rejec(ng  Words

cbb

start  

a  

b  

c  

33

• Words  are  read  leq-­‐to-­‐right

Page 34: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

start  

Rejec(ng  Words

• Missing  transi(on  means  non-­‐acceptancecbb

a  

b  

c  

34

Page 35: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Non-­‐determinis(c  Automata  (NFA)  

•  M  =  (Σ,  Q,  δ,  q0,  F)  –  Σ  -­‐  alphabet  –  Q  –  finite  set  of  state  –  q0  ∈  Q  –  ini(al  state  

–  F  ⊆  Q  –  final  states  –  δ  :  Q  ×  (Σ  ∪  {ε})  →  2Q -­‐  transi(on  func(on  

•  DFA:  δ  :  Q  ×  Σ  à  Q  

•  For  a  word  w,  M  can  reach  a  number  of  states  X  –  M  accepts  w  if  X  ∩  M  ≠  {}  

•  Possible:  X  =  {}  

•  Possible  ε-­‐transi(ons    

 

35

Page 36: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

NFA

•  Allow  mul(ple  transi(ons  from  given  state  labeled  by  same  lecer

start  

a  

a  

b  

c  

c  

b  

36

Page 37: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Accep(ng  words

cba

start  

a  

a  

b  

c  

c  

b  

37

Page 38: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Accep(ng  words

• Maintain  set  of  states

cba

start  

a  

a  

b  

c  

c  

b  

38

Page 39: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Accep(ng  words

cba

start  

a  

a  

b  

c  

c  

b  

39

Page 40: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Accep(ng  words•  Accept  word  if  reached  an  accep(ng  state

cba

start  

a  

a  

b  

c  

c  

b  

40

Page 41: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

NFA+Є  automata

•  Є  transi(ons  can  “fire”  without  reading  the  input

Є

start  a  

b  

c  

41

Page 42: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

NFA+Є  run  example

cba

Є

start  a  

b  

c  

42

Page 43: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

NFA+Є  run  example•  Now  Є  transi(on  can  non-­‐determinis(cally  take  place

cba

Є

start  a  

b  

c  

43

Page 44: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

NFA+Є  run  example

cba

Є

start  a  

b  

c  

44

Page 45: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

NFA+Є  run  example

cba

Є

start  a  

b  

c  

45

Page 46: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

NFA+Є  run  example

cba

Є

start  a  

b  

c  

46

•  Є  transi(ons  can  “fire”  without  reading  the  input

Page 47: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

NFA+Є  run  example

cba

• Word  accepted

Є

start  a  

b  

c  

47

Page 48: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

From  regular  expressions  to  NFA  

•  Step  1:  assign  expression  names  and  obtain  pure  regular  expressions  R1…Rm  

•  Step  2:  construct  an  NFA  Mi  for  each  regular  expression  Ri  

•  Step  3:  combine  all  Mi  into  a  single  NFA  

•  Ambiguity  resolu8on:  prefer  longest  accep8ng  word   48

Page 49: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

From  reg.  exp.  to  automata•  Theorem:  there  is  an  algorithm  to  build  an  NFA+Є  automaton  for  any  regular  expression  

•  Proof:  by  induc8on  on  the  structure  of  the  regular  expression  

start

49

Page 50: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

R = ε

R = φ

R = a a

Basic  constructs  

50

Page 51: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Composi(on  R = R1 | R2 ε M1  

M2  ε

ε

ε

R = R1R2

ε M1   M2  

ε ε

51

Page 52: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Repe((on  

R = R1*

ε M1  

ε

ε

ε

52

Page 53: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

53

Page 54: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Naïve  approach  

•  Try  each  automaton  separately  

•  Given  a  word  w:  –  Try  M1(w)  –  Try  M2(w)  – …  –  Try  Mn(w)  

•  Requires  rese{ng  aqer  every  acempt  54

Page 55: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Actually,  we  combine  automata  

1   2  a  a  

3  a  

4   b   5   b   6  

abb  

7   8  b   a*b+  b  a  

9  a  

10   b   11   a   12   b   13  

abab  

0  

ε  

ε  

ε  

ε  

a  abb  a*b+  abab  

combines  

55

Page 56: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Corresponding  DFA  

0  1  3  7  9    

8  

7  

b  

a  

a  2  4  7  10  

a  

b  b  

6  8  

5  8  11  b  

12   13  a   b  

b  

abb  a*b+  a*b+  

a*b+  

abab  

a  

Combine automata: an example.

Combine a, abb, a*b+, abab.

75#

1# 2#a#

a#

3#a#

4#b#

5#b#

6#

abb#

7# 8#b#

a*b+#b#a#

9#a#

10#b#

11#a#

12#b#

13#

abab#

0#

ε#

ε#

ε#

ε#

b  

56

Page 57: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Scanning  with  DFA  

•  Run  un(l  stuck  – Remember  last  accepGng  state    

•  Go  back  to  accep(ng  state  •  Return  token  

57

Page 58: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Ambiguity  resolu(on  

•  Longest  word  •  Tie-­‐breaker  based  on  order  of  rules  when  words  have  same  length  

 58

Combine automata: an example.

Combine a, abb, a*b+, abab.

75#

1# 2#a#

a#

3#a#

4#b#

5#b#

6#

abb#

7# 8#b#

a*b+#b#a#

9#a#

10#b#

11#a#

12#b#

13#

abab#

0#

ε#

ε#

ε#

ε#

Page 59: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Examples  

0  1  3  7  9    

8  

7  

b  

a  

a  2  4  7  10  

a  

b  b  

6  8  

5  8  11  b  

12   13  a   b  

b  

abb  a*b+  a*b+  

a*b+  

abab  

a  

Combine automata: an example.

Combine a, abb, a*b+, abab.

75#

1# 2#a#

a#

3#a#

4#b#

5#b#

6#

abb#

7# 8#b#

a*b+#b#a#

9#a#

10#b#

11#a#

12#b#

13#

abab#

0#

ε#

ε#

ε#

ε#b  

abaa:  gets  stuck  aqer  aba  in  state  12,  backs  up  to  state  (5  8  11)  pacern  is  a*b+,  token  is  ab  Tokens:  <a*b+,  ab>  <a,a><a,a>   59

Page 60: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Examples  

0  1  3  7  9    

8  

7  

b  

a  

a  2  4  7  10  

a  

b  b  

6  8  

5  8  11  b  

12   13  a   b  

b  

abb  a*b+  a*b+  

a*b+  

abab  

a  

b  

abba:  stops  aqer  second  b  in  (6  8),  token  is  abb  because  it  comes  first  in  spec  60 Tokens:  <abb,  abb>  <a,a>  

Combine automata: an example.

Combine a, abb, a*b+, abab.

75#

1# 2#a#

a#

3#a#

4#b#

5#b#

6#

abb#

7# 8#b#

a*b+#b#a#

9#a#

10#b#

11#a#

12#b#

13#

abab#

0#

ε#

ε#

ε#

ε#

Page 61: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Summary  of  Construc(on  

•  Describe  tokens  as  regular  expressions    –  Decide  acributes  (values)  to  save  for  each  token  

•  Regular  expressions  turned  into  a  DFA    –  Also,  records  which  acributes  (values)  to  keep    

•  Lexical  analyzer  simulates  the  run  of  an  automata  with  the  given  transi(on  table  on  any  input  string    

61

Page 62: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

A  Few  Remarks  

•  Turning  an  NFA  to  a  DFA  is  expensive,  but  –  Exponen(al  in  the  worst  case  –   In  prac(ce,  works  fine  

•  The  construc(on  is  done  once  per-­‐language  – At  Compiler  construc(on  (me  – Not  at  compila(on  (me  

62

Page 63: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Implementa(on  

63

Page 64: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Implementa(on  by  Example  if          {  return  IF;  }  [a-­‐z][a-­‐z0-­‐9]*        {  return  ID;  }  [0-­‐9]+            {  return  NUM;  }  [0-­‐9]”.”[0-­‐9]+|[0-­‐9]*”.”[0-­‐9]+  {  return  REAL;  }  (\-­‐\-­‐[a-­‐z]*\n)|(“    “|\n|\t)    {  ;  }  .          {  error();  }  

64

if  

xy,  i,  zs98  

3,32,  032  

0.55,  33.1  

-­‐-­‐comm\n  \n,  \t,  “  “   ID  

IF  

ID   error   REAL  

NUM   REAL  

error   w.s.  error  w.s.  

01

2 3

9 10 11 12

Page 65: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

int  edges[][256]=  {            /*  …,  0,  1,  2,  3,  ...,  -­‐,  e,  f,  g,  h,  i,  j,    ...  */  

/*  state  0    */    {0,  …,  0,  0,  …,    0,  0,  0,  0,  0,  ...,  0,  0,  0,  0,  0,  0},  /*  state  1    */    {13,  …  ,  7,  7,  7,  7,    …,    9,    4,  4,  4,  4,  2,  4,  …,  13,  13},  /*  state  2    */    {0,    …,  4,  4,  4,  4,    ...,  0,    4,  3,  4,  4,  4,  4,  …,  0,  0},  /*  state  3    */    {0,  …,    4,  4,  4,  4,    …,  0,    4,  4,  4,  4,  4,  4,  ,  0,  0},  /*  state  4    */  {0,  …,    4,  4,  4,  4,      …,  0,    4,  4,  4,  4,  4,  4,  …,  0,  0},    /*  state  5    */  {0,  …,    6,  6,  6,  6,      …,  0,    0,  0,  0,  0,  0,  0,  …,    0,  0},  /*  state  6    */    {0,    …,    6,  6,  6,  6,    …,  0,    0,  0,  0,  0,  0,  0,  ...,  0,  0},  /*  state  7    */  /*  state  …    */      ...  /*  state  13  */  {0,    …,    0,  0,  0,  0,    …,  0,    0,  0,  0,  0,  0,  0,  …,    0,  0}  };   65

ID  

IF  

ID   error   REAL  

NUM   REAL  

error   w.s.  error  w.s.  

01

2 3

9 10 11 12

Page 66: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Pseudo  Code  for  Scanner  char*  input  =  …  ;    Token  nextToken()  {          lastFinal  =  0;            currentState  =  1  ;          inputPositionAtLastFinal  =  input;            currentPosition  =  input;            while  (not(isDead(currentState)))    {    

 nextState  =  edges[currentState][*currentPosition];        if    (isFinal(nextState))  {                lastFinal  =  nextState  ;                  inputPositionAtLastFinal  =  currentPosition;                  }      currentState  =  nextState;          advance  currentPosition;            }          input  =  inputPositionAtLastFinal  +  1;          return  action[lastFinal];    }  

66

Page 67: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Example  

Input:  “if      -­‐-­‐not-­‐a-­‐com”  

67

2  blanks  

ID  

IF  

ID   error   REAL  

NUM   REAL  

error   w.s.  error  w.s.  

01

2 3

9 10 11 12

Page 68: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

final   state   input  

0   1   if    -­‐-­‐not-­‐a-­‐com  

2   2   if  -­‐-­‐not-­‐a-­‐com  

3   3   if    -­‐-­‐not-­‐a-­‐com  

3   0   if    -­‐-­‐not-­‐a-­‐com    

return  IF  

68

ID  

IF  

ID   error   REAL  

NUM   REAL  

error   w.s.  error  w.s.  

01

2 3

9 10 11 12

Page 69: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

found  whitespace  

final state input

0 1 --not-a-com

12 12 --not-a-com

12 12 --not-a-com

12 0 --not-a-com

69

Page 70: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

final   state   input  

0   1   -­‐-­‐not-­‐a-­‐com  

9   9   -­‐-­‐not-­‐a-­‐com  

9   10   -­‐-­‐not-­‐a-­‐com  

9   10   -­‐-­‐not-­‐a-­‐com  

9   10   -­‐-­‐not-­‐a-­‐com  

9   0   -­‐-­‐not-­‐a-­‐com  error      

70

ID  

IF  

ID   error   REAL  

NUM   REAL  

error   w.s.  error  w.s.  

01

2 3

9 10 11 12

Page 71: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

final   state   input  

0   1   -­‐not-­‐a-­‐com  

9   9   -­‐not-­‐a-­‐com  

9   0   -­‐not-­‐a-­‐com  

9   0   -­‐not-­‐a-­‐com  

9   0   -­‐not-­‐a-­‐com  

error      

71

ID  

IF  

ID   error   REAL  

NUM   REAL  

error   w.s.  error  w.s.  

01

2 3

9 10 11 12

Page 72: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Concluding  remarks  

•  Efficient  scanner  • Minimiza(on  •  Error  handling  •  Automa(c  crea(on  of  lexical  analyzers  

72

Page 73: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Efficient  Scanners  

•  Efficient  state  representa(on  •  Input  buffering  •  Using  switch  and  gotos  instead  of  tables  

73

Page 74: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Minimiza(on  

•  Create  a  non-­‐determinis(c  automaton  (NDFA)  from  every  regular  expression  

•  Merge  all  the  automata  using  epsilon  moves  (like  the  |  construc(on)  

•  Construct  a  determinis(c  finite  automaton  (DFA)  –  State  priority  

•  Minimize  the  automaton  –  separate  accep(ng  states  by  token  kinds  

74

Page 75: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Example  if        {  return  IF;  }  [a-­‐z][a-­‐z0-­‐9]*    {  return  ID;  }  [0-­‐9]+        {  return  NUM;  }  

75 Modern  compiler  implementa(on  in  ML,  Andrew  Appel,  (c)1998,  Figures  2.7,2.8  

ID  IF  

error  NUM  

Page 76: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Example  if        {  return  IF;  }  [a-­‐z][a-­‐z0-­‐9]*    {  return  ID;  }  [0-­‐9]+        {  return  NUM;  }  

76 Modern  compiler  implementa(on  in  ML,  Andrew  Appel,  (c)1998,  Figures  2.7,2.8  

ID  IF  

error  

NUM  

ID  

NUM  

ID  

ID  IF  

error  NUM  

Page 77: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Example  

77

ID  IF  

error  NUM  

ID  IF  

error  

NUM  

ID  

NUM  

ID  

if        {  return  IF;  }  [a-­‐z][a-­‐z0-­‐9]*    {  return  ID;  }  [0-­‐9]+        {  return  NUM;  }  

ID  IF  

error  NUM  

ID  

ID  

ID  

IF  

NUM   NUM  

error  

Modern  compiler  implementa(on  in  ML,  Andrew  Appel,  (c)1998,  Figures  2.7,2.8  

Page 78: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Example  

78

if        {  return  IF;  }  [a-­‐z][a-­‐z0-­‐9]*    {  return  ID;  }  [0-­‐9]+        {  return  NUM;  }  

ID  IF  

error  NUM  

ID  

ID  

ID  

IF  

NUM   NUM  

error  

Modern  compiler  implementa(on  in  ML,  Andrew  Appel,  (c)1998,  Figures  2.7,2.8  

Page 79: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Error  Handling  •  Many  errors  cannot  be  iden(fied  at  this  stage  •  Example:  “fi  (a==f(x))”.  Should  “fi”  be  “if”?  Or  is  it  a  rou(ne  name?    

–  We  will  discover  this  later  in  the  analysis  –  At  this  point,  we  just  create  an  iden(fier  token  

•  Some(mes  the  lexeme  does  not  match  any  pacern    –  Easiest:  eliminate  lecers  un(l  the  beginning  of  a  legi(mate  lexeme    –  Alterna(ves:  eliminate/add/replace  one  lecer,  replace  order  of  two  adjacent  

lecers,  etc.    

•  Goal:  allow  the  compila(on  to  con(nue  •  Problem:  errors  that  spread  all  over  

79

Page 80: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Automa(cally  generated  scanners  

•  Use  of  Program-­‐Genera(ng  Tools  –  Specifica(on  è  Part  of  compiler    –  Compiler-­‐Compiler  

Stream  of  tokens  

JFlex  regular  expressions  

input  program   scanner  80

Page 81: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Use  of  Program-­‐Genera(ng  Tools  

•  Input:    regular  expressions  and  ac(ons    •  Ac(on  =  Java  code  

•  Output:  a  scanner  program  that  •  Produces  a  stream  of  tokens  •  Invoke  ac(ons  when  pacern  is  matched  

Stream  of  tokens  

JFlex  regular  expressions  

input  program   scanner  81

Page 82: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Line  Coun(ng  Example  

•  Create  a  program  that  counts  the  number  of  lines  in  a  given  input  text  file  

82

Page 83: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Crea(ng  a  Scanner  using  Flex  

int  num_lines  =  0;  %%  \n            ++num_lines;  .              ;  %%  main()  {      yylex();      printf(  "#  of  lines  =  %d\n",  num_lines);  }      

83

Page 84: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Crea(ng  a  Scanner  using  Flex  

ini(al  

other  

newline  \n  

^\n  

int  num_lines  =  0;  %%  \n            ++num_lines;  .              ;  %%  main()  {      yylex();      printf(  "#  of  lines  =  %d\n",  num_lines);  }      

84

Page 85: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

JFLex  Spec  File              User  code:  Copied  directly  to  Java  file  %%              JFlex  direc(ves:  macros,  state  names  %%              Lexical  analysis  rules:  

–  Op(onal  state,  regular  expression,  ac(on  –  How  to  break  input  to  tokens  –  Ac(on  when  token  matched    

       

Possible  source  of  javac  errors  down  

the  road  

DIGIT=  [0-­‐9]  LETTER=  [a-­‐zA-­‐Z]  

 YYINITIAL    

{LETTER}  ({LETTER}|{DIGIT})*  

85

Page 86: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Crea(ng  a  Scanner  using  JFlex    import  java_cup.runtime.*;  %%  %cup  %{      private  int  lineCounter  =  0;  %}    %eofval{      System.out.println("line  number="  +  lineCounter);      return  new  Symbol(sym.EOF);  %eofval}    NEWLINE=\n  %%  {NEWLINE}    {  lineCounter++;  }    [^{NEWLINE}]    {  }  

86

Page 87: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Catching  errors

• What  if  input  doesn’t  match  any  token  defini(on?  

•  Trick:  Add  a  “catch-­‐all”  rule  that  matches  any  character  and  reports  an  error  – Add  aqer  all  other  rules

87

Page 88: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

A  JFlex  specifica(on  of  C  Scanner  import  java_cup.runtime.*;  %%  %cup  %{      private  int  lineCounter  =  0;  %}  Letter=  [a-­‐zA-­‐Z_]  Digit=  [0-­‐9]  %%  ”\t”  {  }  ”\n”          {  lineCounter++;  }  “;”          {  return  new    Symbol(sym.SemiColumn);}  “++”        {  return  new    Symbol(sym.PlusPlus);  }  “+=”      {  return  new    Symbol(sym.PlusEq);  }  “+”          {  return  new    Symbol(sym.Plus);  }  “while”  {  return  new    Symbol(sym.While);  }  {Letter}({Letter}|{Digit})*      

 {  return  new    Symbol(sym.Id,  yytext()  );  }  “<=”        {  return  new    Symbol(sym.LessOrEqual);  }  “<”          {  return  new    Symbol(sym.LessThan);  }  

88

Page 89: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Missing  

•  Crea(ng  a  lexical  analysis  by  hand  •  Table  compression  •  Symbol  Tables  •  Nested  Comments  •  Handling  Macros  

89

Page 90: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Lexical  Analysis:  What  

•  Input:  program  text  (file)  •  Output:  sequence  of  tokens  

90

Page 91: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Lexical  Analysis:  How  

•  Define  tokens  using  regular  expressions  

•  Construct  a  nondeterminis(c  finite-­‐state  automaton  (NFA)  from  regular  expression  

•  Determinize  the  NFA  into  a  determinis(c  finite-­‐state  automaton  (DFA)  

•  DFA  can  be  directly  used  to  iden(fy  tokens    91

Page 92: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Lexical  Analysis:  Why  

•  Read  input  file  •  Iden(fy  language  keywords  and  standard  iden(fiers  •  Handle  include  files  and  macros  •  Count  line  numbers  •  Remove  whitespaces  •  Report  illegal  symbols    

•  [Produce  symbol  table]  

92

Page 93: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Syntax  Analysis  (1)  

Context  Free  Languages  Context  Free  Grammars  Pushdown  Automata    

93

Page 94: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

The  Real  Anatomy  of  a  Compiler  

Executable

code

exe

Source

text

txt Lexical Analysis

Sem. Analysis

Process text input

characters Syntax Analysis tokens AST

Intermediate code

generation

Annotated AST

Intermediate code

optimization

IR Code generation IR

Target code optimization

Symbolic Instructions

SI Machine code generation

Write executable

output

MI

94

Lexical Analysis

Syntax Analysis

Page 95: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Frontend:  Scanning  &  Parsing((23  +  7)  *  x)  

) x * ) 7 + 23 ( (

RP Id OP RP Num OP Num LP LP

Lexical    Analyzer  

program  text  

token  stream  

Parser  Grammar:      E  →  ...  |  Id      Id  →  ‘a’  |  ...  |  ‘z’    

Op(*)  

Id(b)  

Num(23)   Num(7)  

Op(+)  

Abstract  Syntax  Tree  

valid  syntax  error  

95

Page 96: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

From  scanning  to  parsing((23  +  7)  *  x)  

) x * ) 7 + 23 ( (

RP Id OP RP Num OP Num LP LP

Lexical    Analyzer  

program  text  

token  stream  

Parser  Grammar:      E  →  ...  |  Id      Id  →  ‘a’  |  ...  |  ‘z’    

Op(*)  

Id(b)  

Num(23)   Num(7)  

Op(+)  

Abstract  Syntax  Tree  

valid  syntax  error  

96

Page 97: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Parsing  •  Construct  a  structured  representa(on  of  the  input  

•  Challenges  –  How  do  you  describe  the  programming  language?  –  How  do  you  check  validity  of  an  input?  

•  Is  a  sequence  of  tokens  a  valid  program  in  the  language?  

–  How  do  you  construct  the  structured  representa(on?    – Where  do  you  report  an  error?  

97

Page 98: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Some  founda(ons  

98

Page 99: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Context  free  languages  (CFLs)  

•  L01  =  {  0n1n  |  n  >  0  }    

•  Lpolyndrom  =  {pp’  |  p  ∊  Σ*  ,  p’=reverse(p)}  

•  Lpolyndrom#  =  {p#p’  |  p  ∊  Σ*  ,  p’=reverse(p),  #  ∉  Σ}  

   

99

Page 100: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Context  free  grammars  (CFG)  

•  V  –  non  terminals  (syntac(c  variables)  •  T  –  terminals  (tokens)  •  P  –  deriva(on  rules  

–  Each  rule  of  the  form  V  t  (T  4  V)*  

•  S  –  start  symbol    

G  =  (V,T,P,S)  

100

Page 101: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

What  can  CFGs  do?  

•  Recognize  CFLs  

•  S  t  0T1  •  T  t  0T1  |  ℇ  

101

Page 102: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

~ language-­‐defining  power  

Recognizing  CFLs  

•  Context  Free  Grammars  (CFG)  

•  Nondeterminis(c  push  down  automata  (PDA)  

102

Page 103: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Pushdown  Automata  (PDA)  

•  Nondeterminis(c  PDAs  define  all  CFLs    

•  Determinis(c  PDAs  model  parsers.  – Most  programming  languages  have  a  determinis(c  PDA  

–  Efficient  implementa(on  

103

Page 104: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Intui(on:  PDA  

•  An  ε-­‐NFA  with  the  addi(onal  power  to    manipulate  one  stack  

104

stack  

X

Y

IF

$

Top  

control  (ε-­‐NFA)    

Page 105: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Intui(on:  PDA  

•  Think  of  an  ε-­‐NFA  with  the  addi(onal  power  that  it  can  manipulate  a  stack  

•  PDA  moves  are  determined  by:  –  The  current  state  (of  its  “ε-­‐NFA”)  –  The  current  input  symbol  (or  ε)  –  The  current  symbol  on  top  of  its  stack  

105

Page 106: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Intui(on:  PDA  

input  

stack  

if  (oops)  then  stat:=  blah  else  abort

X

Y

IF

$

Top  

Current  

control  (ε-­‐NFA)    

106

Page 107: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Intui(on:  PDA  

• Moves:  –  Change  state  –  Replace  the  top  symbol  by  0…n  symbols  

• 0  symbols  =  “pop”  (“reduce”)  • 0  <  symbols  =  sequence  of  “pushes”  (“shiq”)  

•  Nondeterminis(c  choice  of  next  move    

107

Page 108: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

PDA  Formalism  

•  PDA  =  (Q,  Σ,  Γ,  δ,  q0,  $,  F):  – Q:  finite  set  of  states  –  Σ:  Input  symbols  alphabet      –  Γ:  stack  symbols  alphabet  –  δ:  transi(on  func(on  –  q0:  start  state      –  $:  start  symbol  –  F:  set  of  final  states  

108

Page 109: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

The  Transi(on  Func(on  

•  δ(q,  a,  X)    =  {  (p1,  σ1),  …  ,(pn,  σn)}  –  Input:  triplet  

• A  state  q  ∊  Q  • An  input  symbol  a  ∊  Σ  or  ε  • A  stack  symbol  X    ∊    Γ  

– Output:  set  of  0  …  k  ac(ons  of  the  form  (p,  σ)  • A  state  p  ∊  Q  • σ  a  sequence  X1⋯Xn  ∊  Γ*  of  stack  symbols  

109

Page 110: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Ac(ons  of  the  PDA  

•  Say  (p,  σ)  ∊  δ(q,  a,  X)    –  If  the  PDA  is  in  state  q  and  X  is  the  top  symbol  and  a    is  at  the  front  of  the  input  

–  Then  it  can  • Change  the  state  to  p.  • Remove  a    from  the  front  of  the  input    

–  (but  a    may  be  ε).  

• Replace  X  on  the  top  of  the  stack  by  σ.  

110

Page 111: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Example:  Determinis(c  PDA  

•  Design  a  PDA  to  accept  {0n1n  |  n  >  1}.  •  The  states:  

–  q  =  We  have  not  seen  1  so  far    • start  state      

–  p  =  we  have  seen  at  least  one  1  and  no  0s  since  –  f  =  final  state;  accept.  

111

Page 112: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Example:  Stack  Symbols  

•  $  =  start  symbol.      – Also  marks  the  bocom  of  the  stack,    –  Indicates  when  we  have  counted  the  same  number  of  1’s  as  0’s.  

•  X  =  “counter”    –  used  to  count  the  number  of  0s  we  saw  

112

Page 113: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Example:  Transi(ons  

•  δ(q,  0,  $)  =  {(q,  X$)}.  •  δ(q,  0,  X)  =  {(q,  XX)}.      

–  These  two  rules  cause  one  X  to  be  pushed  onto  the  stack  for  each  0  read  from  the  input.  

•  δ(q,  1,  X)  =  {(p,  ε)}.      –  When  we  see  a  1,  go  to  state  p  and  pop  one  X.  

•  δ(p,  1,  X)  =  {(p,  ε)}.    –  Pop  one  X  per  1.  

•  δ(p,  ε,  $)  =  {(f,  $)}.    –  Accept  at  bocom.   113

Page 114: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Ac(ons  of  the  Example  PDA  

q

0 0 0 1 1 1

$

114

Page 115: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Ac(ons  of  the  Example  PDA  

q

X $

0 0 0 1 1 1

115

Page 116: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Ac(ons  of  the  Example  PDA  

q

X X $

0 0 0 1 1 1

116

Page 117: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Ac(ons  of  the  Example  PDA  

q

X X X $

0 0 0 1 1 1

117

Page 118: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Ac(ons  of  the  Example  PDA  

p

X X $

0 0 0 1 1 1

118

Page 119: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Ac(ons  of  the  Example  PDA  

p

X $

0 0 0 1 1 1

119

Page 120: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Ac(ons  of  the  Example  PDA  

p

$

0 0 0 1 1 1

120

Page 121: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Ac(ons  of  the  Example  PDA  

f

$

0 0 0 1 1 1

121

Page 122: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

Example:  Non  Determinis(c  PDA  

•  A  PDA  that  accepts  palindromes    –  L  {pp’  ∊  Σ*  |  p’=reverse(p)}  

122

Page 123: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · code exe Source text txt Lexical Analysis Sem . Analysis Process text input characters Syntax Analysis tokens AST

123