Page 1: Parsing

Parsing

Discrete Mathematics and Its Applications

Baojian Hua

[email protected]

Page 2: Parsing

Derivations

A string is valid in a language if and only if there exists a derivation from the start symbol which produces it

Begin with the start symbol, and apply grammar rules until you produce the string

Note that the final string (sentence) consists of only terminals

Page 3: Parsing

Question

Given a formal grammar G and a sentence (program) p, is p derivable from G?

Or equivalently, is a given program p valid according to some language’s syntax (say C)?

Page 4: Parsing

Example: Context-Free Grammar

S ::= x A
    | y B
A ::= u C
    | v C
B ::= t
C ::= w
    | z

// derivable?

xum

Page 5: Parsing

Example: Context-Free Grammar

// derivable?

xum

xuwz

S ::= x A
    | y B
A ::= u C
    | v C
B ::= t
C ::= w
    | z

Page 6: Parsing

Example: Context-Free Grammar

// derivable?

xum

xuwz

xwu

S ::= x A
    | y B
A ::= u C
    | v C
B ::= t
C ::= w
    | z

Page 7: Parsing

Example: Context-Free Grammar

// derivable?

xum

xuwz

xwu

xuz

S ::= x A
    | y B
A ::= u C
    | v C
B ::= t
C ::= w
    | z
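As a worked check of the four candidate strings, assuming S is the start symbol: only xuz is derivable. A derivation for it can be written out step by step, with the rule used at each step noted on the right.

```latex
\begin{align*}
S &\Rightarrow x\,A     && \text{by } S ::= x\,A \\
  &\Rightarrow x\,u\,C  && \text{by } A ::= u\,C \\
  &\Rightarrow x\,u\,z  && \text{by } C ::= z
\end{align*}
```

The other three fail for different reasons: xum contains m, which is not a terminal of the grammar; xuwz has a trailing z, but C produces exactly one terminal; and in xwu the symbol after x is w, while A can only begin with u or v.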

Page 8: Parsing

Lexical Analyzer

The lexical analyzer translates the source program into a stream of lexical tokens

Source program: a stream of (ASCII or Unicode) characters

Lexical token: a compiler data structure that represents the occurrence of a terminal symbol

A valid sentence consists of only allowable terminals

Page 9: Parsing

Example: Context-Free Grammar

// all terminals

T={x, y, u, v, t, w, z}

S ::= x A
    | y B
A ::= u C
    | v C
B ::= t
C ::= w
    | z

Page 10: Parsing

Example: Context-Free Grammar

// all terminals

T={x, y, u, v, t, w, z}

// allowable strings: T*

S ::= x A
    | y B
A ::= u C
    | v C
B ::= t
C ::= w
    | z

Page 11: Parsing

Predictive Parsing

Parsing: recognizing a string and doing something useful with it

The most naïve approach to implementing a parser is recursive descent

A form of top-down parsing

Not as powerful as other methods, but easy enough to implement by hand

Page 12: Parsing

Predictive Parsing

// Valid?

xum

xuwz

xwu

xuz

S ::= x A
    | y B
A ::= u C
    | v C
B ::= t
C ::= w
    | z

Page 13: Parsing

A Predictive Parser in C (Sketch)

tokenTy token;

void parseS ()
{
  switch (token.kind) {
  case x:
    token = nextToken ();
    parseA ();
    break;
  case y:
    token = nextToken ();
    parseB ();
    break;
  default:
    error (…);
  }
}
// other functions are similar

Page 14: Parsing

Output: Abstract Syntax Tree

xuz

      S
     / \
    x   A
       / \
      u   C
          |
          z

Page 15: Parsing

A Predictive Parser Emitting AST in C (Sketch)

tokenTy token;

S parseS ()
{
  switch (token.kind) {
  case x:
    token = nextToken ();
    a = parseA ();
    return newS1 (x, a);
  case y:
    token = nextToken ();
    b = parseB ();
    return newS2 (y, b);
  default:
    error (…);
  }
}
// other functions are similar

Page 16: Parsing

Predictive Parsing Difficulties

// derivable?

xuz

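S ::= x A
    | x B
A ::= u C
    | v C
B ::= t
C ::= w
    | z

// both alternatives for S begin with x, so the current token alone cannot tell the parser which one to choose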

Page 17: Parsing

Or Even Worse

1 E ::= id
2     | num
3     | E + E
4     | E * E
5     | ( E )

15*(3+4)

E

By 4 => E * E

By 5 => E * (E)

By 3 => E * (E + E)

By 2 => E * (E + 4)

By 2 => E * (3 + 4)

By 2 => 15 * (3 + 4)

Page 18: Parsing

Or Even Worse

15*(3+4)

Rightmost derivation:

E
E * E
E * (E)
E * (E + E)
E * (E + 4)
E * (3 + 4)
15 * (3 + 4)

Leftmost derivation:

E
E * E
15 * E
15 * (E)
15 * (E + E)
15 * (3 + E)
15 * (3 + 4)

Page 19: Parsing

Ambiguous grammars

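A grammar is ambiguous if there is a sentence with more than one parse tree

15 * 3 + 4

Parse tree 1, read as 15 * (3 + 4):

        E
      / | \
     E  *  E
     |   / | \
    15  E  +  E
        |     |
        3     4

Parse tree 2, read as (15 * 3) + 4:

          E
        / | \
       E  +  E
     / | \   |
    E  *  E  4
    |     |
   15     3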

Page 20: Parsing

Eliminating ambiguity

In programming language syntax, ambiguity often arises from missing operator precedence or associativity

  Does * have higher precedence than +?

  Are * and + left associative?

Can sometimes rewrite the grammar to disambiguate this

  Beyond the scope of this course

Page 21: Parsing

Unambiguous Grammar

// ambiguous

E ::= id
    | num
    | E + E
    | E * E
    | ( E )

// unambiguous

E ::= E + T
    | T
T ::= T * F
    | F
F ::= id
    | num
    | ( E )

Accepts the same language, but parses unambiguously

Page 22: Parsing

Limitations with Predictive Parsing

Rewriting grammar: to resolve ambiguity

Grammars/trees are ugly

But… easy to write code by hand, and very good for error reporting

Page 23: Parsing

Doing better

We can do better

We can use a parsing algorithm that can handle all context-free languages (though not all context-free grammars)

Remember: a context-free language might have many different context-free grammars

Page 24: Parsing

The Yacc Tool

semantic analyzer specification -> Yacc -> parser

Originally developed for C, and now almost every mainstream language has its own Yacc-tool:

bison (C), ml-yacc (SML), Cup (Java), GPPG (C#), …

Page 25: Parsing

Whole Structure

source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> other part -> Pentium