
CSC 3130: Automata theory and formal languages

Tutorial 4

KN Hung

Office: SHB 1026

Department of Computer Science & Engineering


Agenda

• Context-Free Grammar (CFG)
  – Design
  – Parse Tree

• Cocke-Younger-Kasami (CYK) algorithm
  – Parsing a CFG in normal form

• Pushdown Automata (PDA)
  – Design


Context-Free Grammar (Recap)

• A context-free grammar consists of:
  1) Variables (here S, A, B)
  2) Terminals (here a, b)
  3) Production rules (each line below is a production rule)
  4) A start variable (here S)

S → AB | ba

A → aA | a

B → b


Context-Free Grammar (Recap)

• A string is said to belong to the language (of the CFG) if it can be derived from the start variable

CFG Example:

S → AB | ba

A → aA | a

B → b

Derivation (⇒ = apply a production rule):

S ⇒ AB ⇒ aAB ⇒ aaB ⇒ aab

Therefore, aab belongs to the language.
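To make “can be derived from the start variable” concrete, here is a minimal sketch that searches breadth-first over leftmost derivations of the example grammar. It assumes single-character symbols (uppercase = variable, lowercase = terminal) and no ε-productions, so any sentential form longer than the target can be discarded. The function name `derive` is hypothetical, chosen for illustration.

```python
from collections import deque

# Grammar from the slide: uppercase letters are variables, lowercase are terminals.
GRAMMAR = {
    "S": ["AB", "ba"],
    "A": ["aA", "a"],
    "B": ["b"],
}


def derive(grammar, start, target):
    """Return a leftmost derivation of target as a list of sentential forms, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == target:
            return path
        # Position of the leftmost variable; None means the form is all terminals.
        idx = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if idx is None:
            continue
        for body in grammar[form[idx]]:
            new_form = form[:idx] + body + form[idx + 1:]
            # No rule shortens a form (there are no ε-rules), so longer forms are dropped.
            if len(new_form) <= len(target) and new_form not in seen:
                seen.add(new_form)
                queue.append(path + [new_form])
    return None


print(" => ".join(derive(GRAMMAR, "S", "aab")))  # S => AB => aAB => aaB => aab
```

A string such as "abb" has no derivation, so `derive(GRAMMAR, "S", "abb")` returns None.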


Why CFG?

• L = {0^n 1^n : n is a positive integer}

• L is not a regular language
  – Proved by the Pumping Lemma

• A context-free grammar can describe it:

  S → 0S1

  S → 01

• Thus, CFGs are more general than regular expressions
  – and hence more general than NFAs and DFAs, which are equivalent to regular expressions
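The grammar S → 0S1 | 01 translates almost line for line into a recursive recognizer. This is only a sketch for intuition; the name `in_L` is hypothetical. Each recursive call undoes one application of S → 0S1.

```python
def in_L(w: str) -> bool:
    """Return True iff S =>* w under S -> 0S1 | 01, i.e. w = 0^n 1^n with n >= 1."""
    if w == "01":  # rule S -> 01 (the base case)
        return True
    if len(w) >= 4 and w[0] == "0" and w[-1] == "1":
        return in_L(w[1:-1])  # rule S -> 0S1: strip the outer 0 and 1
    return False


for w in ["01", "0011", "000111", "0101", "001", ""]:
    print(repr(w), in_L(w))
```

The recursion depth grows with n. A DFA has no way to do this kind of unbounded matching, which is exactly what the Pumping Lemma argument captures.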


CFG Design

• Given a context-free language, design a CFG for it

• L = { w ∈ {a, b}* : number of a’s in w < number of b’s in w }

• Take 1 minute to think about it…

S → ?


CFG Design (Con’t)

• Trial: bottom-up
  – Shortest string in L: “b”
  – Given a string in L, expand it so that it is still in L
  – i.e., add terminals without violating the constraint


CFG Design (Con’t)

One Wrong Trial:

S → b

S → bS | Sb
  – After adding one “b”, the number of b’s is still greater than the number of a’s

S → abS | baS | bSa | aSb
  – Adding one “a” and one “b” keeps the difference between the numbers of a’s and b’s constant

However, this grammar cannot generate strings such as “aabbbbbaa”.


CFG Design (Con’t)

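Approach 1:

S → b  (base case)

S → SS  (#b is still > #a)

S → SaS | aSS | SSa
  For S → SaS:
  – 1st S: #b ≥ #a + 1
  – 2nd S: #b ≥ #a + 1
  – That a: #a = 1
  – Overall: #b ≥ #a + 2 - 1 = #a + 1, so #b > #a still holds
  (aSS and SSa are handled in the same way.)

But is this enough to say that the grammar is correct?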
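Showing that each rule preserves #b > #a only proves L(G) ⊆ L. It does not show that every string of L can be generated. A brute-force comparison of all strings up to a fixed length provides evidence for the other direction, though it is not a proof. The sketch below assumes single-character symbols and no ε-rules. The helper names are hypothetical. It also flags “aabbbbbaa” as a string the wrong trial misses.

```python
from itertools import product


def language_up_to(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start.

    Assumes uppercase = variable, lowercase = terminal, and no ε-rules,
    so sentential forms never shrink and can be pruned past max_len.
    """
    seen, stack, result = {start}, [start], set()
    while stack:
        form = stack.pop()
        idx = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if idx is None:  # no variables left: a terminal string
            result.add(form)
            continue
        for body in grammar[form[idx]]:  # expand the leftmost variable
            new_form = form[:idx] + body + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                stack.append(new_form)
    return result


def target_up_to(max_len):
    """Strings over {a, b} of length <= max_len with #a < #b."""
    return {
        "".join(p)
        for n in range(1, max_len + 1)
        for p in product("ab", repeat=n)
        if p.count("a") < p.count("b")
    }


APPROACH_1 = {"S": ["b", "SS", "SaS", "aSS", "SSa"]}
WRONG_TRIAL = {"S": ["b", "bS", "Sb", "abS", "baS", "bSa", "aSb"]}

N = 9
expected = target_up_to(N)
print(language_up_to(APPROACH_1, "S", N) == expected)                  # True
print("aabbbbbaa" in expected - language_up_to(WRONG_TRIAL, "S", N))  # True
```

Agreement up to N = 9 is consistent with Approach 1 being correct. A real proof still has to cover strings of every length.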


CFG Design (Con’t)

Approach 2:

• Start with the grammar for ab-strings with the same number of a’s and b’s

• Call the start symbol of this grammar E

• Now, we generate all strings of the form EbE | EbEbE | EbEbEbE | …

• Thus, we have the grammar…


CFG Design (Con’t)

Approach 2 (Con’t):

S → EbET

T → bET | ε

E → …

• S and T together generate the pattern EbE | EbEbE | …

• E generates ab-strings with the same number of a’s and b’s (cf. “09L7.pdf”, Slide #32)
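The idea behind Approach 2 is that every string with #a < #b can be cut into balanced pieces separated by single b’s, in the shape E b E b … b E. The sketch below finds such a cut greedily by tracking #b - #a over prefixes. At the first point where the difference reaches 1, 2, …, k (with k = #b - #a), the character must be a “b” and the text since the previous cut must be balanced. It assumes E can derive ε, which is needed for w = “b”. It does not fill in the elided rules for E. The function name `split_ebe` is hypothetical.

```python
def split_ebe(w: str):
    """Split w (with #a < #b) into balanced pieces joined by single b's.

    Returns (pieces, k) where k = #b - #a, len(pieces) == k + 1,
    "b".join(pieces) == w, and each piece has equally many a's and b's.
    """
    k = w.count("b") - w.count("a")
    if k < 1:
        raise ValueError("w is not in L")
    pieces, start, diff, record = [], 0, 0, 0
    for i, ch in enumerate(w):
        diff += 1 if ch == "b" else -1
        # First time the prefix difference reaches a new level (up to k):
        # this character is a 'b' and w[start:i] is balanced.
        if diff > record and record < k:
            record = diff
            pieces.append(w[start:i])
            start = i + 1
    pieces.append(w[start:])  # remaining suffix has difference k - k = 0
    return pieces, k


print(split_ebe("aabbbbbaa"))  # (['aabb', 'bbaa'], 1)
```

This construction is one way to argue L ⊆ L(G) for Approach 2, provided E generates every balanced string.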


CFG Design (Con’t)

• After designing the grammar G, you may be required to prove that the language of this grammar equals the given language

• i.e., prove that L(G) = L

• Proof:
  Part 1) L(G) ⊆ L
  Part 2) L ⊆ L(G)

• Due to the time limit, I will not do this part


Parse Tree

• How to parse “aab” in this grammar? (Previous example)

CFG Example:

S → AB | ba

A → aA | a

B → b

Derivation:

S ⇒ AB ⇒ aAB ⇒ aaB ⇒ aab


Parse Tree (Con’t)

• Idea: each production rule applied = a node (the variable on the left) + its children (the symbols on the right)

• This should be quite intuitive

Derivation: S ⇒ AB ⇒ aAB ⇒ aaB ⇒ aab

Parse tree:

        S
       / \
      A   B
     / \  |
    a   A b
        |
        a
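The “node + children” idea maps directly onto a small tree data structure. Here is a sketch, assuming a hypothetical `Node` class: reading the leaves from left to right recovers the derived string, and printing with indentation shows the same tree as the figure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A parse-tree node: a symbol plus the children from one production."""
    symbol: str
    children: List["Node"] = field(default_factory=list)

    def derived_string(self) -> str:
        """Concatenate the leaves from left to right."""
        if not self.children:
            return self.symbol
        return "".join(child.derived_string() for child in self.children)

    def show(self, depth: int = 0) -> None:
        """Print the tree, one node per line, indented by depth."""
        print("  " * depth + self.symbol)
        for child in self.children:
            child.show(depth + 1)


# Productions used: S -> AB, A -> aA, A -> a, B -> b
tree = Node("S", [
    Node("A", [Node("a"), Node("A", [Node("a")])]),
    Node("B", [Node("b")]),
])
tree.show()
print(tree.derived_string())  # aab
```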


Parse Tree (Con’t)

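• Ambiguity: the same string can have more than one parse tree

CFG: S → S - S | 1 | 2 | 3

String: 3 - 1 - 2

Parse tree 1, read as (3 - 1) - 2:

          S
        / | \
       S  -  S
     / | \   |
    S  -  S  2
    |     |
    3     1

Parse tree 2, read as 3 - (1 - 2):

       S
     / | \
    S  -  S
    |   / | \
    3  S  -  S
       |     |
       1     2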
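The two trees do not just look different. They give different values. The sketch below enumerates every parse tree of 3 - 1 - 2 under S → S - S | 1 | 2 | 3 and evaluates each one. It represents a leaf as a digit string and S → S - S as a (left, right) tuple, an encoding chosen here only for brevity.

```python
from functools import lru_cache

TOKENS = ("3", "-", "1", "-", "2")  # the string 3 - 1 - 2


@lru_cache(maxsize=None)
def trees(i: int, j: int):
    """All parse trees for TOKENS[i:j] under S -> S - S | 1 | 2 | 3."""
    if j - i == 1:  # S -> digit
        return [TOKENS[i]] if TOKENS[i].isdigit() else []
    result = []
    for m in range(i + 1, j - 1):  # m = position of the top-level "-"
        if TOKENS[m] == "-":
            for left in trees(i, m):
                for right in trees(m + 1, j):
                    result.append((left, right))  # S -> S - S
    return result


def show(t):
    return t if isinstance(t, str) else f"({show(t[0])} - {show(t[1])})"


def value(t):
    return int(t) if isinstance(t, str) else value(t[0]) - value(t[1])


for t in trees(0, len(TOKENS)):
    print(show(t), "=", value(t))
# (3 - (1 - 2)) = 4
# ((3 - 1) - 2) = 0
```

This is why programming-language grammars are usually written so that each expression has exactly one parse tree.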


Parse Tree (Con’t)

• Useful in programming languages (CSC3180)

• Useful in compilers (CSC3120)


Cocke-Younger-Kasami Algorithm

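• Used to parse strings with a context-free grammar in Chomsky normal form (or simply, normal form)

Normal form: every production is of one of these types
  1) X → YZ
  2) X → a
  3) S → ε
(X, Y, Z are variables, a is a terminal, S is the start variable)

Example:

S → AB | BC

A → BA | a

B → CC | b

C → AB | a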
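Checking whether a grammar already matches the three production types above is mechanical. Here is a minimal sketch, assuming single-character symbols (uppercase = variable, lowercase = terminal) and "" for ε. The function name is hypothetical. Some textbook definitions also forbid the start variable on a right-hand side. That extra condition is not checked here.

```python
def is_normal_form(grammar, start="S"):
    """True if every production is X -> YZ, X -> a, or start -> ε."""
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 2 and body.isupper():
                continue  # X -> YZ (two variables)
            if len(body) == 1 and body.islower():
                continue  # X -> a (one terminal)
            if body == "" and head == start:
                continue  # S -> ε
            return False
    return True


EXAMPLE = {"S": ["AB", "BC"], "A": ["BA", "a"], "B": ["CC", "b"], "C": ["AB", "a"]}
print(is_normal_form(EXAMPLE))  # True
print(is_normal_form({"S": ["AB", "ba"], "A": ["aA", "a"], "B": ["b"]}))  # False
```

The grammar from the earlier parse-tree example fails the check because of S → ba and A → aA.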


CYK Algorithm - Idea

• Same as Algorithm 2 in the lecture notes (09L8.pdf)

• Idea: bottom-up parsing

• Algorithm: given a string s of length N
    For k = 1 to N
      For every substring of length k
        Determine which variable(s) can derive it

• sub(x, y): the substring that starts at index x and ends at index y (indices start at 1)
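The loop above can be sketched in Python as follows. This is a minimal illustration, not the lecture notes’ exact Algorithm 2. It assumes the example normal-form grammar, single-character symbols, and 1-based indices to match sub(x, y). The names `cyk_table` and `accepts` are hypothetical.

```python
from itertools import product

# Example grammar from the slides (already in normal form).
RULES = {
    "S": ["AB", "BC"],
    "A": ["BA", "a"],
    "B": ["CC", "b"],
    "C": ["AB", "a"],
}


def cyk_table(rules, s):
    """table[k][x] = variables deriving sub(x, x + k - 1); row/column 0 unused."""
    n = len(s)
    table = [[set() for _ in range(n + 2)] for _ in range(n + 1)]
    for x in range(1, n + 1):  # base case k = 1: rules X -> a
        table[1][x] = {v for v, bodies in rules.items() if s[x - 1] in bodies}
    for k in range(2, n + 1):  # substring length
        for x in range(1, n - k + 2):  # start index
            for left_len in range(1, k):  # split into left part + right part
                left = table[left_len][x]
                right = table[k - left_len][x + left_len]
                for y, z in product(left, right):
                    for v, bodies in rules.items():
                        if y + z in bodies:  # rule v -> yz
                            table[k][x].add(v)
    return table


def accepts(rules, s, start="S"):
    if not s:  # the empty string is only derivable through S -> ε
        return "" in rules.get(start, [])
    return start in cyk_table(rules, s)[len(s)][1]


s = "baaba"
table = cyk_table(RULES, s)
for k in range(1, len(s) + 1):
    cells = [",".join(sorted(table[k][x])) or "-" for x in range(1, len(s) - k + 2)]
    print(f"length {k}:", " | ".join(cells))
print(accepts(RULES, s))
# length 1: B | A,C | A,C | B | A,C
# length 2: A,S | B | C,S | A,S
# length 3: - | B | B
# length 4: - | A,C,S
# length 5: A,C,S
# True
```

The three nested loops over k, x and the split point give the familiar O(N³) running time, multiplied by a factor depending on the grammar size. Variables are printed in sorted order, so “S,A” on the slides appears as “A,S” here.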


CYK Algorithm - Init

• Base case: k = 1
  – The possible variable(s) can be found by scanning through each production

S → AB | BC

A → BA | a

B → CC | b

C → AB | a

We want to parse this string:

Start index: | 1     | 2     | 3     | 4     | 5
String:      | b     | a     | a     | b     | a
Length 1:    | B     | A,C   | A,C   | B     | A,C


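CYK Algorithm – Table

• Each cell: the variables deriving the corresponding substring
  – Row: length of the substring; column: start index of the substring
  – e.g., the cell in row “Length 3”, column 2 is for the substring of length 3 starting at index 2, i.e., “aab” = sub(2,4)

Start index: | 1     | 2     | 3     | 4     | 5
String:      | b     | a     | a     | b     | a
Length 1:    | B     | A,C   | A,C   | B     | A,C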


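CYK Algorithm – Loop (k > 1)

• When k = 2

• Example
  – sub(1,2) = “ba”
  – “ba” = “b” + “a” = sub(1,1) + sub(2,2)

• Possible: BA | BC
  – Since A → BA and S → BC, the variables are A and S

S → AB | BC

A → BA | a

B → CC | b

C → AB | a

Start index: | 1     | 2     | 3     | 4     | 5
String:      | b     | a     | a     | b     | a
Length 1:    | B     | A,C   | A,C   | B     | A,C
Length 2:    | S,A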


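CYK Algorithm – Loop (k > 1)

• For each substring
  – Decompose it into two substrings, in every possible way

• Example: sub(2,4) = “aab”
  = sub(2,2) + sub(3,4), i.e., {A,C} followed by {S,C}
  = sub(2,3) + sub(4,4), i.e., {B} followed by {B}

• Possible: AS, AC, CS, CC, BB
  – Only B → CC matches

• Therefore, B is put into the cell

S → AB | BC

A → BA | a

B → CC | b

C → AB | a

Start index: | 1     | 2     | 3     | 4     | 5
String:      | b     | a     | a     | b     | a
Length 1:    | B     | A,C   | A,C   | B     | A,C
Length 2:    | S,A   | B     | S,C   | S,A
Length 3:    |       | B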
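The decomposition step for one cell can be written as a small function. It takes the already-filled smaller cells and tries every split. Here is a sketch, assuming the cell values copied from the table above and a hypothetical `fill_cell` helper. It reproduces the candidate pairs listed for sub(2,4).

```python
RULES = {"S": ["AB", "BC"], "A": ["BA", "a"], "B": ["CC", "b"], "C": ["AB", "a"]}

# Cells already filled in on the slide, keyed by (start, end), 1-based.
KNOWN = {
    (2, 2): {"A", "C"},
    (3, 4): {"S", "C"},
    (2, 3): {"B"},
    (4, 4): {"B"},
}


def fill_cell(x, y, known, rules):
    """Return (variables deriving sub(x, y), candidate pairs tried)."""
    variables, pairs = set(), []
    for m in range(x, y):  # split into sub(x, m) + sub(m + 1, y)
        for left in sorted(known[(x, m)]):
            for right in sorted(known[(m + 1, y)]):
                pairs.append(left + right)
                for v, bodies in rules.items():
                    if left + right in bodies:  # rule v -> left right
                        variables.add(v)
    return variables, pairs


variables, pairs = fill_cell(2, 4, KNOWN, RULES)
print(pairs)      # ['AC', 'AS', 'CC', 'CS', 'BB']
print(variables)  # {'B'}
```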


CYK Algorithm – Loop (k>1)

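• How about sub(3,5)?

• Take 1 minute to work it out

Start index: | 1     | 2     | 3     | 4     | 5
String:      | b     | a     | a     | b     | a
Length 1:    | B     | A,C   | A,C   | B     | A,C
Length 2:    | S,A   | B     | S,C   | S,A

S → AB | BC

A → BA | a

B → CC | b

C → AB | a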


CYK Algorithm – Parse Tree

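• The parse tree can be recovered from the table

• See “09L8.pdf”, Slide #21

Start index: | 1     | 2     | 3     | 4     | 5
String:      | b     | a     | a     | b     | a
Length 1:    | B     | A,C   | A,C   | B     | A,C
Length 2:    | S,A   | B     | S,C   | S,A
Length 3:    | -     | B     | B
Length 4:    | -     | S,A,C
Length 5:    | S,A,C

(“-” means no variable derives that substring.) Since S appears in the cell for the whole string (length 5), “baaba” can be derived from S.

S → AB | BC

A → BA | a

B → CC | b

C → AB | a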
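One common way to recover a parse tree is to record, in each cell, how each variable was obtained (which split and which pair of variables). Then rebuild the tree top-down from S. This is a sketch of that back-pointer approach, not necessarily the method on the referenced lecture slide. It keeps only the first split found for each variable, and the function names are hypothetical.

```python
RULES = {"S": ["AB", "BC"], "A": ["BA", "a"], "B": ["CC", "b"], "C": ["AB", "a"]}


def cyk_with_back(rules, s):
    """back[(x, k)][v] records how v derives sub(x, x + k - 1) (1-based)."""
    n = len(s)
    back = {}
    for x in range(1, n + 1):  # k = 1: v -> terminal
        back[(x, 1)] = {v: s[x - 1] for v, bodies in rules.items() if s[x - 1] in bodies}
    for k in range(2, n + 1):
        for x in range(1, n - k + 2):
            cell = {}
            for left_len in range(1, k):
                for y in back[(x, left_len)]:
                    for z in back[(x + left_len, k - left_len)]:
                        for v, bodies in rules.items():
                            if y + z in bodies and v not in cell:
                                cell[v] = (left_len, y, z)  # keep first split found
            back[(x, k)] = cell
    return back


def build_tree(back, v, x, k):
    """Rebuild a parse tree as nested tuples for v over sub(x, x + k - 1)."""
    entry = back[(x, k)][v]
    if k == 1:
        return (v, entry)  # leaf rule v -> a
    left_len, y, z = entry
    return (v, build_tree(back, y, x, left_len),
            build_tree(back, z, x + left_len, k - left_len))


s = "baaba"
back = cyk_with_back(RULES, s)
if "S" in back[(1, len(s))]:
    print(build_tree(back, "S", 1, len(s)))
# ('S', ('B', 'b'), ('C', ('A', 'a'), ('B', ('C', ('A', 'a'), ('B', 'b')), ('C', 'a'))))
```

Because only one split is stored per variable, this returns a single parse tree even when the grammar allows several.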


CYK Algorithm (Conclusion)

• Work from the shortest substrings to the longest
  – i.e., from single-character strings to the whole string

• For a context-free grammar G:
  1) Convert G into normal form
     • Remove ε-productions
     • Remove unit productions
  2) Apply the CYK algorithm

• Con: loss of intuition


End

• Thanks for coming! =]

• Any questions?