CSC 3130: Automata theory and formal languages
Tutorial 4
KN Hung
Office: SHB 1026
Department of Computer Science & Engineering
Agenda
• Context-Free Grammar (CFG)
  – Design
  – Parse Tree
• Cocke-Younger-Kasami (CYK) algorithm
  – Parsing a CFG in normal form
• Pushdown Automata (PDA)
  – Design
Context-Free Grammar (Recap)
• A context-free grammar consists of
  1) Variables (here S, A, B)
  2) Terminals (here a, b)
  3) Production rules (each alternative separated by | is another production rule)
  4) A start variable (here S)
S → AB | ba
A → aA | a
B → b
Context-Free Grammar (Recap)
• A string is said to belong to the language (of the CFG) if it can be derived from the start variable
CFG Example
S → AB | ba
A → aA | a
B → b
Derivation (⇒ = apply a production rule)
S ⇒ AB ⇒ aAB ⇒ aaB ⇒ aab
Therefore, aab belongs to the language
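The derivation of aab can be replayed mechanically. The sketch below is only an illustration: the dictionary encoding of the grammar and the helper names apply_leftmost and derive are assumptions made for this example, and it always expands the leftmost variable.

```python
# Grammar from the slide; uppercase letters are variables, lowercase letters are terminals.
GRAMMAR = {
    "S": ["AB", "ba"],
    "A": ["aA", "a"],
    "B": ["b"],
}

def apply_leftmost(sentential_form, rhs):
    """Replace the leftmost variable by rhs, provided rhs is one of its alternatives."""
    for i, symbol in enumerate(sentential_form):
        if symbol in GRAMMAR:
            if rhs not in GRAMMAR[symbol]:
                raise ValueError(f"{symbol} -> {rhs} is not a production")
            return sentential_form[:i] + rhs + sentential_form[i + 1:]
    raise ValueError("no variable left to expand")

def derive(choices, start="S"):
    """Apply the chosen right-hand sides one by one and record every step."""
    steps = [start]
    for rhs in choices:
        steps.append(apply_leftmost(steps[-1], rhs))
    return steps

steps = derive(["AB", "aA", "a", "b"])
print(" => ".join(steps))  # S => AB => aAB => aaB => aab
```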
Why CFG?
• L = { 0ⁿ1ⁿ : n is a positive integer }
• L is not a regular language
  – Proved by the “Pumping Lemma”
• A context-free grammar can describe it
• Thus, CFGs are more general than regular expressions
  – NFA ≡ Regular Expression ≡ DFA (all three describe exactly the regular languages)
S → 0S1
S → 01
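The two productions translate almost directly into a recursive membership test. This is a minimal sketch, not a general parser; the function names derives_S and in_L are made up for the example, and in_L is the direct definition of L used as a cross-check.

```python
from itertools import product

def derives_S(w):
    """True if S =>* w under S -> 0S1 | 01."""
    if w == "01":                      # S -> 01
        return True
    # S -> 0S1: strip one 0 from the front and one 1 from the back, then recurse
    return len(w) > 2 and w[0] == "0" and w[-1] == "1" and derives_S(w[1:-1])

def in_L(w):
    """Direct definition of L = { 0^n 1^n : n >= 1 }."""
    n = len(w) // 2
    return n >= 1 and w == "0" * n + "1" * n

# Compare the grammar with the definition on every binary string up to length 8.
all_strings = ["".join(p) for k in range(9) for p in product("01", repeat=k)]
agree = all(derives_S(w) == in_L(w) for w in all_strings)
```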
CFG Design
• Given a context-free language, design the CFG
• L = { w : w is an ab-string with fewer a’s than b’s }
• Take 1 minute to think about it
S → ?
CFG Design (Cont’d)
• Trial: Bottom-up
  – Shortest string in L: “b”
  – Given a string in L, we can expand it such that it is still in L
  – i.e., add terminals without violating the constraint
CFG Design (Cont’d)
One Wrong Trial:
S → b
S → bS | Sb
S → abS | baS | bSa | aSb
• S → bS | Sb: after adding one “b”, the number of b’s is still greater than the number of a’s
• S → abS | baS | bSa | aSb: adding one “a” and one “b” keeps the difference between the numbers of a’s and b’s constant
• However, the grammar cannot derive strings like “aabbbbbaa”, which has 4 a’s and 5 b’s and so belongs to L
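The claim about “aabbbbbaa” can be checked with a small recognizer written specifically for the wrong grammar. This is a sketch for this one grammar only; the function name derives is an assumption of the example.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def derives(w):
    """True if S =>* w under S -> b | bS | Sb | abS | baS | bSa | aSb."""
    if w == "b":                                                 # S -> b
        return True
    if len(w) < 2:
        return False
    return (
        (w[0] == "b" and derives(w[1:]))                         # S -> bS
        or (w[-1] == "b" and derives(w[:-1]))                    # S -> Sb
        or (w[:2] == "ab" and derives(w[2:]))                    # S -> abS
        or (w[:2] == "ba" and derives(w[2:]))                    # S -> baS
        or (w[0] == "b" and w[-1] == "a" and derives(w[1:-1]))   # S -> bSa
        or (w[0] == "a" and w[-1] == "b" and derives(w[1:-1]))   # S -> aSb
    )

counterexample = "aabbbbbaa"
more_bs = counterexample.count("b") > counterexample.count("a")
print(more_bs, derives(counterexample))  # True False
```

The string starts with “aa” and ends with “a”, so none of the productions can produce its first and last symbols, which is why the recognizer rejects it.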
CFG Design (Cont’d)
Approach 1:
S → b                  (base case)
S → SS                 (#b still > #a)
S → SaS | aSS | SSa
  1st S : #b ≥ #a + 1
  2nd S : #b ≥ #a + 1
  That a : #a = 1
  Whole string : #b ≥ #a + 2 - 1 = #a + 1
But is this sufficient to say the grammar is correct?
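The counting argument only shows that every derived string has more b’s than a’s. One way to probe the other direction is a brute-force comparison on short strings; this gives evidence, not a proof. The sketch below assumes nothing beyond the five productions of Approach 1; the names derives, splits_into_two_S and mismatches are made up for the example.

```python
from functools import lru_cache
from itertools import product

def splits_into_two_S(u):
    """True if u = xy with x and y non-empty, S =>* x and S =>* y."""
    return any(derives(u[:i]) and derives(u[i:]) for i in range(1, len(u)))

@lru_cache(maxsize=None)
def derives(w):
    """True if S =>* w under S -> b | SS | SaS | aSS | SSa."""
    if w == "b":                                        # S -> b
        return True
    if splits_into_two_S(w):                            # S -> SS
        return True
    if w[:1] == "a" and splits_into_two_S(w[1:]):       # S -> aSS
        return True
    if w[-1:] == "a" and splits_into_two_S(w[:-1]):     # S -> SSa
        return True
    return any(                                         # S -> SaS
        w[i] == "a" and derives(w[:i]) and derives(w[i + 1:])
        for i in range(1, len(w) - 1)
    )

def mismatches(max_len):
    """Strings up to max_len on which the grammar and the definition of L disagree."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if derives(w) != (w.count("b") > w.count("a")):
                bad.append(w)
    return bad
```

An empty result from mismatches(8) means the grammar and the language agree on all strings of length at most 8; a full argument still needs the two-part proof of L(G) = L.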
CFG Design (Cont’d)
Approach 2:
• Start with the grammar for ab-strings with the same number of a’s and b’s
• Call the start symbol of this grammar E
• Now, we generate all strings of type EbE | EbEbE | EbEbEbE | …
• Thus, we have the grammar…
CFG Design (Cont’d)
Approach 2 (Cont’d):
S → EbET
T → bET | ε
E → …
• S and T generate the pattern EbE | EbEbE | …
• E generates ab-strings with the same number of a’s and b’s (cf. “09L7.pdf”, Slide #32)
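The structure of S and T can be checked without writing out the productions of E. In the sketch below, E is deliberately replaced by a counting predicate (an assumption of this example; the actual productions for E are in the lecture slides and are not reproduced here). The names in_E, in_T, in_S and mismatches are hypothetical.

```python
from functools import lru_cache
from itertools import product

def in_E(w):
    """Stand-in for E: ab-strings with equally many a's and b's (not the real productions)."""
    return w.count("a") == w.count("b")

@lru_cache(maxsize=None)
def in_T(w):
    """True if T =>* w under T -> bET | epsilon."""
    if w == "":                        # T -> epsilon
        return True
    if w[0] != "b":                    # T -> bET must start with b
        return False
    rest = w[1:]
    return any(in_E(rest[:i]) and in_T(rest[i:]) for i in range(len(rest) + 1))

def in_S(w):
    """True if S =>* w under S -> EbET; 'bET' is the non-empty alternative of T."""
    return any(
        in_E(w[:i]) and w[i:] != "" and in_T(w[i:])
        for i in range(len(w) + 1)
    )

def mismatches(max_len):
    """Strings up to max_len where S disagrees with 'more b's than a's'."""
    words = ("".join(p) for n in range(max_len + 1) for p in product("ab", repeat=n))
    return [w for w in words if in_S(w) != (w.count("b") > w.count("a"))]
```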
CFG Design (Cont’d)
• After designing the grammar G, you may have to prove (if required) that the language of this grammar is equal to the given language
• i.e., prove that L(G) = L
• Proof
  Part 1) L(G) ⊆ L
  Part 2) L ⊆ L(G)
• Due to limited time, I will not cover this part
Parse Tree
• How to parse “aab” in this grammar? (Previous example)
CFG Example
S → AB | ba
A → aA | a
B → b
Derivation
S ⇒ AB ⇒ aAB ⇒ aaB ⇒ aab
Parse Tree (Cont’d)
• Idea: Production Rule = Node + Children
• Should be very intuitive to understand
Derivation
S ⇒ AB ⇒ aAB ⇒ aaB ⇒ aab
Parse tree
      S
     / \
    A   B
   / \  |
  a   A b
      |
      a
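The “production rule = node + children” idea maps naturally onto a nested data structure. The sketch below is one possible encoding, chosen for this example only: a node is a (variable, children) pair and a leaf is a terminal string. The helper names tree_yield and productions are hypothetical.

```python
# Parse tree for "aab": a node is (variable, children); a leaf is a terminal string.
tree = ("S", [
    ("A", ["a", ("A", ["a"])]),
    ("B", ["b"]),
])

def tree_yield(node):
    """Concatenate the leaves from left to right; this is the derived string."""
    if isinstance(node, str):
        return node
    _, children = node
    return "".join(tree_yield(child) for child in children)

def productions(node):
    """List the production used at every internal node, in pre-order."""
    if isinstance(node, str):
        return []
    variable, children = node
    rhs = "".join(c if isinstance(c, str) else c[0] for c in children)
    used = [f"{variable} -> {rhs}"]
    for child in children:
        used.extend(productions(child))
    return used

print(tree_yield(tree))   # aab
print(productions(tree))  # ['S -> AB', 'A -> aA', 'A -> a', 'B -> b']
```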
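Parse Tree (Cont’d)
• Ambiguity: one string can have more than one parse tree
CFG: S → S - S | 1 | 2 | 3
String: 3 - 1 - 2
Parse tree 1: S[ S[ S[3] - S[1] ] - S[2] ], read as (3 - 1) - 2
Parse tree 2: S[ S[3] - S[ S[1] - S[2] ] ], read as 3 - (1 - 2)
• The two trees give different values: (3 - 1) - 2 = 0, but 3 - (1 - 2) = 4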
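The two readings of 3 - 1 - 2 can be enumerated by trying every “-” as the top-level production S → S - S. This is a small sketch for this one grammar; the function name parses and the bracketed output format are choices made for the example.

```python
def parses(tokens):
    """Return every parse of tokens under S -> S - S | 1 | 2 | 3, as bracketed strings."""
    if len(tokens) == 1 and tokens[0] in {"1", "2", "3"}:   # S -> 1 | 2 | 3
        return [tokens[0]]
    results = []
    for i, token in enumerate(tokens):
        if token == "-":                                    # S -> S - S, split at this "-"
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    results.append(f"({left} - {right})")
    return results

trees = parses(("3", "-", "1", "-", "2"))
print(trees)  # ['(3 - (1 - 2))', '((3 - 1) - 2)']
```

Two different results for the same string is exactly what makes the grammar ambiguous.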
Cocke-Younger-Kasami Algorithm
• Used to parse strings of a context-free grammar in Chomsky normal form (or simply normal form)
Normal Form
Every production is of type
1) X → YZ
2) X → a
3) S → ε
Example
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
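The three allowed production types can be checked mechanically. The sketch below assumes a grammar is stored as a dictionary from single-letter variables to lists of right-hand-side strings (an encoding chosen for this example); is_normal_form is a hypothetical helper and tests only the three types listed above.

```python
CYK_GRAMMAR = {
    "S": ["AB", "BC"],
    "A": ["BA", "a"],
    "B": ["CC", "b"],
    "C": ["AB", "a"],
}

# The grammar from the recap slides, for comparison.
RECAP_GRAMMAR = {
    "S": ["AB", "ba"],
    "A": ["aA", "a"],
    "B": ["b"],
}

def is_normal_form(grammar, start="S"):
    """True if every production is X -> YZ, X -> a, or start -> epsilon."""
    for variable, alternatives in grammar.items():
        for rhs in alternatives:
            two_variables = len(rhs) == 2 and all(s in grammar for s in rhs)
            one_terminal = len(rhs) == 1 and rhs not in grammar
            start_to_empty = rhs == "" and variable == start
            if not (two_variables or one_terminal or start_to_empty):
                return False
    return True

print(is_normal_form(CYK_GRAMMAR))    # True
print(is_normal_form(RECAP_GRAMMAR))  # False: S -> ba and A -> aA are not allowed
```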
CYK Algorithm - Idea
• Same as Algorithm 2 in the lecture notes (09L8.pdf)
• Idea: Bottom-Up Parsing
• Algorithm: given a string s of length N
    For k = 1 to N
      For every substring of length k
        Determine which variable(s) can derive it
• sub(x,y): the substring that starts at index x and ends at index y (indices start at 1, both ends included)
CYK Algorithm - Init
• Base Case: k = 1
  – The possible variable(s) are found by scanning through each production
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
We want to parse this string: baaba
b    a    a    b    a
B    A,C  A,C  B    A,C
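The base case amounts to one scan over the productions per character. A minimal sketch, assuming the grammar is stored as a dictionary of right-hand-side strings; the name variables_for_terminal is made up for the example.

```python
GRAMMAR = {
    "S": ["AB", "BC"],
    "A": ["BA", "a"],
    "B": ["CC", "b"],
    "C": ["AB", "a"],
}

def variables_for_terminal(ch):
    """All variables X that have a production X -> ch."""
    return {variable for variable, alternatives in GRAMMAR.items() if ch in alternatives}

# One cell per character of the input string (k = 1).
row_1 = [variables_for_terminal(ch) for ch in "baaba"]
```

Here row_1 holds {B}, {A, C}, {A, C}, {B}, {A, C}, matching the row under the string above.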
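CYK Algorithm – Table
• Each cell: variables deriving the substring
• Row = length of substring, column = start index of substring
Length | Start 1 | Start 2 | Start 3 | Start 4 | Start 5
       | b       | a       | a       | b       | a
1      | B       | A,C     | A,C     | B       | A,C
2      |         |         |         |         |
3      |         | ?       |         |         |
• The cell marked ? is for the substring of length 3 starting at index 2, i.e., “aab” = sub(2,4)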
CYK Algorithm – Loop (k>1)
• When k = 2
• Example
  – sub(1,2) = “ba”
  – “ba” = “b” + “a” = sub(1,1) + sub(2,2)
• Possible: BA | BC
  – Variables: A, S
  – Since A → BA, S → BC
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
b    a    a    b    a
B    A,C  A,C  B    A,C
S,A
CYK Algorithm – Loop (k>1)
• For each substring
  – Decompose into two substrings
• Example
  sub(2,4) = “aab”
  = sub(2,2) + sub(3,4)
  = sub(2,3) + sub(4,4)
• Possible: AS, AC, CS, CC, BB
  – Only CC is the right-hand side of a production (B → CC)
• Therefore, B is put into the cell
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
b    a    a    b    a
B    A,C  A,C  B    A,C
S,A  B    S,C  S,A
     B
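Filling one cell is a matter of pairing up the variables of the two halves for each split and looking the pairs up among the productions. The sketch below reproduces the sub(2,4) example; the helper name combine is hypothetical, and the input sets are copied from the cells already filled in the table.

```python
GRAMMAR = {
    "S": ["AB", "BC"],
    "A": ["BA", "a"],
    "B": ["CC", "b"],
    "C": ["AB", "a"],
}

def combine(left, right):
    """Variables X with a production X -> YZ where Y is in left and Z is in right."""
    return {
        variable
        for variable, alternatives in GRAMMAR.items()
        for y in left
        for z in right
        if y + z in alternatives
    }

# sub(2,4) = "aab" has two decompositions.
split_1 = combine({"A", "C"}, {"S", "C"})   # sub(2,2) + sub(3,4): AS, AC, CS, CC
split_2 = combine({"B"}, {"B"})             # sub(2,3) + sub(4,4): BB
cell = split_1 | split_2
print(cell)  # {'B'}
```

The same helper applies to any other cell once the cells for its shorter substrings are known.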
CYK Algorithm – Loop (k>1)
• How about sub(3,5)?
• Take 1 minute to work it out
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
b    a    a    b    a
B    A,C  A,C  B    A,C
S,A  B    S,C  S,A
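CYK Algorithm – Parse Tree
• Parse Tree is known from the table
• See “09L8.pdf” - Slide #21
• Row = length of substring, column = start index of substring (∅ = no variable derives the substring)
Length | Start 1 | Start 2 | Start 3 | Start 4 | Start 5
       | b       | a       | a       | b       | a
1      | B       | A,C     | A,C     | B       | A,C
2      | S,A     | B       | S,C     | S,A     |
3      | ∅       | B       | B       |         |
4      | ∅       | S,A,C   |         |         |
5      | S,A,C   |         |         |         |
• S appears in the cell for the whole string, so baaba belongs to the language
S → AB | BC
A → BA | a
B → CC | b
C → AB | a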
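The whole table can be filled with three nested loops over substring length, start index and split point. The following is a minimal sketch rather than the lecture's Algorithm 2 verbatim; the dictionary-of-dictionaries table layout and the names cyk_table and accepts are choices made for this example, with 1-based start indices to match sub(x,y).

```python
GRAMMAR = {
    "S": ["AB", "BC"],
    "A": ["BA", "a"],
    "B": ["CC", "b"],
    "C": ["AB", "a"],
}

def cyk_table(s):
    """table[k][i] = variables deriving the substring of length k starting at index i (1-based)."""
    n = len(s)
    table = {k: {i: set() for i in range(1, n - k + 2)} for k in range(1, n + 1)}
    # Base case k = 1: productions of type X -> a.
    for i, ch in enumerate(s, start=1):
        table[1][i] = {v for v, alternatives in GRAMMAR.items() if ch in alternatives}
    # k > 1: try every split into a left part of length j and a right part of length k - j.
    for k in range(2, n + 1):
        for i in range(1, n - k + 2):
            for j in range(1, k):
                for v, alternatives in GRAMMAR.items():
                    for rhs in alternatives:
                        if (len(rhs) == 2
                                and rhs[0] in table[j][i]
                                and rhs[1] in table[k - j][i + j]):
                            table[k][i].add(v)
    return table

def accepts(s, start="S"):
    """True if the start variable derives the whole (non-empty) string."""
    return len(s) > 0 and start in cyk_table(s)[len(s)][1]

table = cyk_table("baaba")
print(accepts("baaba"))  # True
```

A parse tree can be recovered by also recording, for each variable added to a cell, which split and which production produced it.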
CYK Algorithm (Conclusion)
• Start from the shortest substrings and work up to the longest
  – i.e., from single-character strings to the whole string
• For a context-free grammar G
  1) Convert G into normal form
     • Remove ε-productions
     • Remove unit-productions
  2) Apply the CYK algorithm
• Con: Loss of intuition