Parsing
Discrete Mathematics and Its Applications
Baojian Hua [email protected]
Derivations
A string is valid in a language if and only if there exists a derivation from the start symbol that produces it.
Begin with the start symbol and apply grammar rules until you produce the string.
Note that the final string (sentence) consists of only terminals.
Question
Given a formal grammar G and a sentence (program) p, is p derivable from grammar G ?
Or equivalently, is a given program p valid according to some language’s syntax (say C)?
Example: Context-Free Grammar
S ::= x A
    | y B
A ::= u C
    | v C
B ::= t
C ::= w
    | z

// derivable?
xum
xuwz
xwu
xuz
Lexical Analyzer
The lexical analyzer translates the source program into a stream of lexical tokens.
Source program: a stream of (ASCII or Unicode) characters.
Lexical token: a compiler data structure that represents the occurrence of a terminal symbol.
A valid sentence consists of only allowable terminals.
Example: Context-Free Grammar
// all terminals
T={x, y, u, v, t, w, z}
// allowable strings: T*
S ::= x A
| y B
A ::= u C
| v C
B ::= t
C ::= w
| z
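As a rough illustration of what a lexical analyzer for this terminal set T might look like, the sketch below turns a character stream into tokens, one token per terminal. The names TokenKind, Token, and next_token are hypothetical and do not come from the slides, and skipping whitespace between tokens is an assumption of this sketch.

```c
#include <ctype.h>
#include <stddef.h>

/* One token kind per terminal in T, plus end-of-input and error markers.
   All names here are illustrative assumptions. */
typedef enum {
    TOK_X, TOK_Y, TOK_U, TOK_V, TOK_T, TOK_W, TOK_Z,
    TOK_EOF, TOK_ERROR
} TokenKind;

typedef struct {
    TokenKind kind;
    size_t offset;   /* position of the token in the source string */
} Token;

/* Scan one token starting at src[*pos] and advance *pos past it. */
Token next_token(const char *src, size_t *pos)
{
    /* Assumption: whitespace separates tokens and is not itself a token. */
    while (isspace((unsigned char)src[*pos]))
        (*pos)++;

    Token tok = { TOK_ERROR, *pos };
    switch (src[*pos]) {
    case '\0': tok.kind = TOK_EOF; return tok;  /* never advance past the end */
    case 'x': tok.kind = TOK_X; break;
    case 'y': tok.kind = TOK_Y; break;
    case 'u': tok.kind = TOK_U; break;
    case 'v': tok.kind = TOK_V; break;
    case 't': tok.kind = TOK_T; break;
    case 'w': tok.kind = TOK_W; break;
    case 'z': tok.kind = TOK_Z; break;
    default:  break;                            /* not in T: stays TOK_ERROR */
    }
    (*pos)++;
    return tok;
}
```

A character outside T, such as the m in xum, is reported as TOK_ERROR before the parser ever sees it.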
Predictive Parsing
Parsing: recognizing a string and doing something useful with it.
The most naïve approach to implementing a parser is recursive descent.
It is a form of top-down parsing.
It is not as powerful as other methods, but it is easy enough to implement by hand.
Predictive Parsing
// Valid?
xum
xuwz
xwu
xuz
S ::= x A
| y B
A ::= u C
| v C
B ::= t
C ::= w
| z
A Predictive Parser in C (Sketch)
tokenTy token;

void parseS ()
{
  switch (token.kind) {
  case x:
    token = nextToken ();
    parseA ();
    break;
  case y:
    token = nextToken ();
    parseB ();
    break;
  default:
    error (…);
  }
}
// other functions are similar
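To make "other functions are similar" concrete, here is a minimal, self-contained recognizer for the same grammar. It is a sketch under simplifying assumptions: each input character is treated as one token, and errors are reported by returning 0 rather than by calling an error routine. The helper names (input, pos, peek, advance, derivable) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical parser state: the input string and a cursor into it. */
static const char *input;
static size_t pos;

static char peek(void)    { return input[pos]; }
static void advance(void) { pos++; }

/* C ::= w | z */
static int parseC(void)
{
    switch (peek()) {
    case 'w': case 'z': advance(); return 1;
    default:            return 0;
    }
}

/* A ::= u C | v C */
static int parseA(void)
{
    switch (peek()) {
    case 'u': case 'v': advance(); return parseC();
    default:            return 0;
    }
}

/* B ::= t */
static int parseB(void)
{
    if (peek() == 't') { advance(); return 1; }
    return 0;
}

/* S ::= x A | y B */
static int parseS(void)
{
    switch (peek()) {
    case 'x': advance(); return parseA();
    case 'y': advance(); return parseB();
    default:  return 0;
    }
}

/* Returns 1 if s is derivable from S, 0 otherwise.
   The whole input must be consumed for the string to count as valid. */
int derivable(const char *s)
{
    input = s;
    pos = 0;
    return parseS() && peek() == '\0';
}
```

With this sketch, derivable("xuz") returns 1, while derivable("xum"), derivable("xuwz"), and derivable("xwu") all return 0. Each function needs only the current token to pick its alternative, which is what makes the parser predictive.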
Output: Abstract Syntax Tree
xuz

  S
 / \
x   A
   / \
  u   C
      |
      z
A Predictive Parser Emitting AST in C (Sketch)
tokenTy token;

S parseS ()
{
  switch (token.kind) {
  case x:
    token = nextToken ();
    a = parseA ();
    return newS1 (x, a);
  case y:
    token = nextToken ();
    b = parseB ();
    return newS2 (y, b);
  default:
    error (…);
  }
}
// other functions are similar
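A self-contained variant of the AST-emitting parser might look like the following. It is only a sketch: the Node layout, new_node, free_tree, and parse are hypothetical names, each character is treated as one token, and a single node type replaces the per-production constructors newS1 and newS2 used in the slide. This works here because every production consists of one terminal followed by at most one nonterminal.

```c
#include <stdlib.h>

/* Hypothetical AST node: one production application. */
typedef struct Node {
    char nonterminal;     /* 'S', 'A', 'B' or 'C' */
    char terminal;        /* the terminal consumed by this production */
    struct Node *child;   /* subtree for the trailing nonterminal, or NULL */
} Node;

static const char *cursor;   /* next unread character of the input */

static Node *new_node(char nonterminal, char terminal, Node *child)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->nonterminal = nonterminal;
    n->terminal = terminal;
    n->child = child;
    return n;
}

void free_tree(Node *n)
{
    while (n != NULL) {
        Node *next = n->child;
        free(n);
        n = next;
    }
}

/* C ::= w | z */
static Node *parse_C(void)
{
    char t = *cursor;
    if (t != 'w' && t != 'z')
        return NULL;
    cursor++;
    return new_node('C', t, NULL);
}

/* A ::= u C | v C */
static Node *parse_A(void)
{
    char t = *cursor;
    if (t != 'u' && t != 'v')
        return NULL;
    cursor++;
    Node *c = parse_C();
    return c ? new_node('A', t, c) : NULL;
}

/* B ::= t */
static Node *parse_B(void)
{
    if (*cursor != 't')
        return NULL;
    cursor++;
    return new_node('B', 't', NULL);
}

/* S ::= x A | y B */
static Node *parse_S(void)
{
    char t = *cursor;
    Node *child;
    if (t == 'x')      { cursor++; child = parse_A(); }
    else if (t == 'y') { cursor++; child = parse_B(); }
    else               return NULL;
    return child ? new_node('S', t, child) : NULL;
}

/* Returns the tree for s, or NULL if s is not a sentence of the grammar. */
Node *parse(const char *s)
{
    cursor = s;
    Node *tree = parse_S();
    if (tree != NULL && *cursor != '\0') {   /* trailing input: reject */
        free_tree(tree);
        return NULL;
    }
    return tree;
}
```

For the input xuz, parse returns a chain of three nodes, S(x, A(u, C(z))), which matches the tree shown for that sentence.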
Predictive Parsing Difficulties
// derivable?
xuz
S ::= x A
| x B
A ::= u C
| v C
B ::= t
C ::= w
| z
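The difficulty is that both alternatives for S now begin with x, so one token of lookahead cannot choose between them. One common workaround, which the slides do not cover and which is named here only as an assumption, is left factoring: rewrite S as S ::= x S' with S' ::= A | B, and decide between A and B after the x has been consumed. A minimal sketch, with hypothetical names and one character per token:

```c
static const char *p;   /* next unread character of the input */

/* C ::= w | z */
static int parse_C(void)
{
    if (*p == 'w' || *p == 'z') { p++; return 1; }
    return 0;
}

/* A ::= u C | v C */
static int parse_A(void)
{
    if (*p == 'u' || *p == 'v') { p++; return parse_C(); }
    return 0;
}

/* B ::= t */
static int parse_B(void)
{
    if (*p == 't') { p++; return 1; }
    return 0;
}

/* Left-factored form (an assumption, not from the slides):
   S  ::= x S'
   S' ::= A | B
   A can only start with u or v, and B only with t, so the token
   after x is enough to pick the alternative. */
static int parse_S(void)
{
    if (*p != 'x')
        return 0;
    p++;
    switch (*p) {
    case 'u': case 'v': return parse_A();
    case 't':           return parse_B();
    default:            return 0;
    }
}

/* Returns 1 if s is derivable from S in the grammar above, 0 otherwise. */
int derivable_factored(const char *s)
{
    p = s;
    return parse_S() && *p == '\0';
}
```

Here derivable_factored("xuz") and derivable_factored("xt") both return 1, even though the original two productions for S could not be told apart by their first token.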
Or Even Worse
1 E ::= id
2     | num
3     | E + E
4     | E * E
5     | ( E )

15*(3+4)

E
=> E * E            (by 4)
=> E * (E + E)      (by 5, then 3)
=> E * (E + 4)      (by 2)
=> E * (3 + 4)      (by 2)
=> 15 * (3 + 4)     (by 2)
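Or Even Worse
15*(3+4)

E
=> E * E
=> 15 * E
=> 15 * (E + E)
=> 15 * (3 + E)
=> 15 * (3 + 4)

A derivation that always rewrites the rightmost nonterminal first (E * E, then E * (E + E), then E * (E + 4), ...) is a rightmost derivation. The one shown here always rewrites the leftmost nonterminal first, so it is a leftmost derivation.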
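Ambiguous grammars
A grammar is ambiguous if there is a sentence with more than one parse tree.

15 * 3 + 4

Parse tree 1, read as 15 * (3 + 4):

      E
    / | \
   E  *  E
   |   / | \
  15  E  +  E
      |     |
      3     4

Parse tree 2, read as (15 * 3) + 4:

         E
       / | \
      E  +  E
    / | \   |
   E  *  E  4
   |     |
  15     3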
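Why the choice of parse tree matters can be seen by evaluating both trees for 15 * 3 + 4. The sketch below builds the two trees by hand and evaluates them. The Expr type and all variable names are hypothetical, introduced only for this illustration.

```c
#include <stddef.h>

/* Hypothetical expression tree node. */
typedef struct Expr {
    char op;                          /* '+', '*', or 'n' for a number leaf */
    long value;                       /* used only when op == 'n' */
    const struct Expr *left, *right;  /* NULL for leaves */
} Expr;

/* Evaluate a tree bottom-up. */
long eval(const Expr *e)
{
    switch (e->op) {
    case '+': return eval(e->left) + eval(e->right);
    case '*': return eval(e->left) * eval(e->right);
    default:  return e->value;
    }
}

static const Expr n15 = { 'n', 15, NULL, NULL };
static const Expr n3  = { 'n', 3,  NULL, NULL };
static const Expr n4  = { 'n', 4,  NULL, NULL };

/* Tree 1: 15 * (3 + 4), where + sits below * */
static const Expr sum_3_4 = { '+', 0, &n3, &n4 };
const Expr tree1 = { '*', 0, &n15, &sum_3_4 };

/* Tree 2: (15 * 3) + 4, where * sits below + */
static const Expr prod_15_3 = { '*', 0, &n15, &n3 };
const Expr tree2 = { '+', 0, &prod_15_3, &n4 };
```

eval(&tree1) yields 105 and eval(&tree2) yields 49, so the same sentence gets two different meanings depending on which tree the parser happens to build.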
Eliminating ambiguity
In programming language syntax, ambiguity often arises from missing operator precedence or associativity.
Does * have higher precedence than +?
Are * and + left associative?
One can sometimes rewrite the grammar to remove the ambiguity.
How to do so in general is beyond the scope of this course.
Unambiguous Grammar

Ambiguous:
E ::= id
    | num
    | E + E
    | E * E
    | ( E )

Unambiguous:
E ::= E + T
    | T
T ::= T * F
    | F
F ::= id
    | num
    | ( E )

Accepts the same language, but parses unambiguously.
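The effect of the layered E/T/F grammar can be sketched as a small recursive descent evaluator. Two assumptions are worth stating loudly: a recursive descent function cannot call itself on its first symbol, so the left-recursive rules E ::= E + T and T ::= T * F are written here as the loops T { + T } and F { * F }, which accept the same strings and keep left associativity. Also, id is left out, since evaluating identifiers would need an environment. All names are hypothetical, and overflow is not checked.

```c
#include <ctype.h>

static const char *s;   /* next unread character */
static int ok;          /* cleared on the first syntax error */

static void skip_spaces(void)
{
    while (isspace((unsigned char)*s))
        s++;
}

static long parse_E(void);

/* F ::= num | ( E )      (id omitted in this sketch) */
static long parse_F(void)
{
    skip_spaces();
    if (isdigit((unsigned char)*s)) {
        long v = 0;
        while (isdigit((unsigned char)*s))
            v = v * 10 + (*s++ - '0');
        return v;
    }
    if (*s == '(') {
        s++;
        long v = parse_E();
        skip_spaces();
        if (*s == ')') s++; else ok = 0;
        return v;
    }
    ok = 0;
    return 0;
}

/* T ::= T * F | F, written as F { * F } */
static long parse_T(void)
{
    long v = parse_F();
    skip_spaces();
    while (ok && *s == '*') {
        s++;
        v *= parse_F();
        skip_spaces();
    }
    return v;
}

/* E ::= E + T | T, written as T { + T } */
static long parse_E(void)
{
    long v = parse_T();
    skip_spaces();
    while (ok && *s == '+') {
        s++;
        v += parse_T();
        skip_spaces();
    }
    return v;
}

/* Returns 1 and stores the value in *out if text is a well-formed
   expression, otherwise returns 0 and leaves *out untouched. */
int evaluate(const char *text, long *out)
{
    s = text;
    ok = 1;
    long v = parse_E();
    skip_spaces();
    if (!ok || *s != '\0')
        return 0;
    *out = v;
    return 1;
}
```

Because * is handled one level below +, evaluate("15 * 3 + 4", &v) stores 49 and evaluate("15*(3+4)", &v) stores 105. There is no longer a second way to read 15 * 3 + 4.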
Limitations with Predictive Parsing
The grammar must be rewritten to resolve ambiguity.
The resulting grammars and trees are ugly.
But the code is easy to write by hand, and the approach is very good for error reporting.
Doing better
We can do better.
We can use a parsing algorithm that can handle all context-free languages (though not all context-free grammars).
Remember: a context-free language might have many different context-free grammars.
The Yacc Tool
semantic analyzer specification -> Yacc -> parser
Originally developed for C, and now almost every mainstream language has its own Yacc tool:
bison (C), ml-yacc (SML), Cup (Java), GPPG (C#), …
Whole Structure
source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> other part -> Pentium