CSCI 3130: Formal Languages and Automata Theory
Tutorial 5
Hung Chun Ho
Office: SHB 1026
Department of Computer Science & Engineering
Agenda
• Cocke-Younger-Kasami (CYK) algorithm
  – Parsing CFG in normal form
• Pushdown Automata (PDA)
  – Design
CYK Algorithm
Bottom-up Parsing for normal form
Cocke-Younger-Kasami Algorithm
• Used to parse context-free grammar in Chomsky normal form (or simply normal form)
  Every production is of type
  1) X → YZ
  2) X → a
  3) S → ε
  Normal Form Example
  S → AB
  A → CC | a | c
  B → BC | b
  C → CB | BA | c
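The three production types can be checked mechanically. The sketch below assumes a hypothetical encoding (not from the slides) in which a grammar is a dict from each variable to a list of right-hand-side strings, variables are the dict keys, and ε is the empty string.

```python
# Example grammar from the slide, in the assumed dict encoding.
GRAMMAR = {
    "S": ["AB"],
    "A": ["CC", "a", "c"],
    "B": ["BC", "b"],
    "C": ["CB", "BA", "c"],
}

def is_normal_form(grammar, start="S"):
    """Return True if every production is X -> YZ, X -> a, or start -> ε."""
    variables = set(grammar)
    for head, bodies in grammar.items():
        for body in bodies:
            if body == "":
                # ε is only allowed on the start variable
                if head != start:
                    return False
            elif len(body) == 1:
                # must be a terminal; a single variable is a unit production
                if body in variables:
                    return False
            elif len(body) == 2:
                # both symbols must be variables
                if not all(sym in variables for sym in body):
                    return False
            else:
                return False
    return True
```

For the example grammar, `is_normal_form(GRAMMAR)` returns True.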
CYK Algorithm - Idea
• = Algorithm 2 in Lecture Note (10L8.pdf)
• Idea: Bottom Up Parsing
• Algorithm:
  Given a string s of length N
  For k = 1 to N
    For every substring of length k
      Determine what variable(s) can derive it
CYK Algorithm - Example
• CFG
  S → AB
  A → CC | a | c
  B → BC | b
  C → CB | BA | c
• Parse abbc
CYK Algorithm – Idea (1)
• Idea: We parse the substrings in this order:
• Length-1 substrings: a, b, b, c
• Length-2 substrings: ab, bb, bc
• Length-3 substrings: abb, bbc
• Length-4 substring: abbc
• Done!
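The visiting order above can be generated with two nested loops. This is a minimal sketch; the function name `substrings_by_length` is hypothetical, and start/end indices are 1-based and inclusive.

```python
def substrings_by_length(s):
    """Yield (length, start, end, substring), shortest substrings first."""
    n = len(s)
    for length in range(1, n + 1):
        # a substring of this length can start at 1 .. n - length + 1
        for start in range(1, n - length + 2):
            end = start + length - 1
            yield length, start, end, s[start - 1:end]

for length, start, end, piece in substrings_by_length("abbc"):
    print(length, start, end, piece)
```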
CYK Algorithm – Idea (2)
• Idea: Parsing of longer substrings depends on parsing of shorter substrings
• Example: abb may be decomposed as
  – ab + b
  – a + bb
• If we know how to parse ab and b (or, a and bb) then we know how to parse abb
CYK Algorithm – Substring
• Denote sub(i, j) := substring with start index = i and end index = j
• Example: For abbc, sub(2,4) = bbc
• This notation is not meant to complicate things; it is just for convenience in the following discussion…
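The sub(i, j) notation and the decomposition into two shorter substrings can be written as two small helpers. This is only a sketch: the names `sub` and `splits` are chosen for illustration, and indices are 1-based with both ends inclusive, as on the slide.

```python
def sub(s, i, j):
    """Substring of s with start index i and end index j (1-based, inclusive)."""
    return s[i - 1:j]

def splits(i, j):
    """All ways to cut sub(i, j) into sub(i, k) + sub(k+1, j)."""
    return [((i, k), (k + 1, j)) for k in range(i, j)]

s = "abbc"
print(sub(s, 2, 4))
for left, right in splits(1, 3):
    print(sub(s, *left), "+", sub(s, *right))
```

For abb = sub(1,3) this lists the two decompositions a + bb and ab + b.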
CYK Algorithm – Table
• Each cell corresponds to a substring
• Store variables deriving the substring
• Rows: length of substring; columns: start index of substring
• Example: the cell for the substring of length = 3 starting with index = 2, i.e., sub(2,4) = bbc
CYK Algorithm – Simulation
• Base Case: length = 1
  – The possible choices of variable(s) can be found by scanning through each production

  Length 1:   A     B     B     A, C
  String:     a     b     b     c
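A minimal sketch of the base case, assuming the grammar is encoded as a dict from variable to a list of right-hand sides (an encoding chosen here for illustration): each length-1 cell collects every variable that has the terminal as one of its productions.

```python
GRAMMAR = {
    "S": ["AB"],
    "A": ["CC", "a", "c"],
    "B": ["BC", "b"],
    "C": ["CB", "BA", "c"],
}

def base_row(grammar, s):
    """For each character of s, the set of variables X with a rule X -> character."""
    return [
        {head for head, bodies in grammar.items() if ch in bodies}
        for ch in s
    ]

print(base_row(GRAMMAR, "abbc"))
```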
CYK Algorithm – Simulation
• Loop: length = 2
  – For each substring of length 2
    • Decompose into shorter substrings
    • Check cells below it
  – Let’s parse the substring ab first
CYK Algorithm – Simulation
• For sub(1,2) = ab, it can be decomposed:
  – ab = a + b = sub(1,1) + sub(2,2)
  – Possible choices: AB
  – Scan rules: S

  Length 2:   S
  Length 1:   A     B     B     A, C
  String:     a     b     b     c
CYK Algorithm – Simulation
• For sub(2,3) = bb, it can be decomposed:
  – bb = b + b = sub(2,2) + sub(3,3)
  – Possible choices: BB
  – Scan rules: ∅
• No suitable rules are found, so the CFG cannot derive this substring

  Length 2:   S     ∅
  Length 1:   A     B     B     A, C
  String:     a     b     b     c
CYK Algorithm – Simulation
• For sub(3,4) = bc, it can be decomposed:
  – bc = b + c = sub(3,3) + sub(4,4)
  – Possible choices: BA, BC
  – Scan rules: B, C

  Length 2:   S     ∅     B, C
  Length 1:   A     B     B     A, C
  String:     a     b     b     c
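The "possible choices, then scan rules" step for one split can be sketched as a single function. The dict encoding of the grammar and the name `combine` are assumptions made for this example; the length-1 row is the one found in the base case.

```python
GRAMMAR = {
    "S": ["AB"],
    "A": ["CC", "a", "c"],
    "B": ["BC", "b"],
    "C": ["CB", "BA", "c"],
}

def combine(grammar, left, right):
    """Variables X with a rule X -> YZ where Y is in left and Z is in right."""
    return {
        head
        for head, bodies in grammar.items()
        for body in bodies
        if len(body) == 2 and body[0] in left and body[1] in right
    }

# Length-1 row for abbc, taken from the base case.
row1 = [{"A"}, {"B"}, {"B"}, {"A", "C"}]

# A length-2 substring has only one split, so each cell needs one combine call.
row2 = [combine(GRAMMAR, row1[i], row1[i + 1]) for i in range(3)]
print(row2)
```

An empty set on either side yields an empty result, which matches the remark that there is no need to scan rules when a smaller substring cannot be parsed.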
CYK Algorithm – Simulation
• For sub(1,3) = abb:
  – abb = ab + b = sub(1,2) + sub(3,3)
  – Possible choices: SB
  – Scan rules: ∅
• No suitable variables found yet
• But there is another way to decompose the string
CYK Algorithm – Simulation
• For sub(1,3) = abb:
  – abb = a + bb = sub(1,1) + sub(2,3)
  – Possible choices: ∅
• Can’t parse the smaller substring bb, so this split can’t parse the string
• No need to scan rules
CYK Algorithm – Simulation
• For sub(1,3) = abb:
  – abb = sub(1,1) + sub(2,3) gives no valid parsing
  – abb = sub(1,2) + sub(3,3) gives no valid parsing
• Cannot parse: ∅

  Length 3:   ∅
  Length 2:   S     ∅     B, C
  Length 1:   A     B     B     A, C
  String:     a     b     b     c
CYK Algorithm – Simulation
• For sub(2,4) = bbc:
  – bbc = sub(2,2) + sub(3,4)
    • Possible choices: BB, BC
  – bbc = sub(2,3) + sub(4,4)
    • Possible choices: ∅
• Variable: B

  Length 3:   ∅     B
  Length 2:   S     ∅     B, C
  Length 1:   A     B     B     A, C
  String:     a     b     b     c
CYK Algorithm – Simulation
• Finally, for sub(1,4) = abbc:
  – Possible choices:
    • AB, SB, SC
  – Variables:
    • S
• This cell represents the original string, and it contains S, so abbc is in the language

  Length 4:   S
  Length 3:   ∅     B
  Length 2:   S     ∅     B, C
  Length 1:   A     B     B     A, C
  String:     a     b     b     c
CYK Algorithm – Parse Tree
• abbc is in the language!
• How to obtain the parse tree?
  – Tracing back the derivations:
    • sub(1,4) is derived using S → AB from sub(1,1) and sub(2,4)
    • sub(1,1) is derived using A → a
    • sub(2,4) is derived using B → BC from sub(2,2) and sub(3,4)
    • …
• So, record also the used derivations!
CYK Algorithm – Parse Tree
• Obtained from the table

  Length 4:   S
  Length 3:   ∅     B
  Length 2:   S     ∅     B, C
  Length 1:   A     B     B     A, C
  String:     a     b     b     c
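One way to "record the used derivations" is to store, for every variable in a cell, the split point and the two variables that produced it. The sketch below is an illustration under assumptions, not the lecture's own code: the dict encoding of the grammar and the names `cyk_with_back` and `build_tree` are hypothetical, and only the first derivation found for each variable is kept.

```python
GRAMMAR = {
    "S": ["AB"],
    "A": ["CC", "a", "c"],
    "B": ["BC", "b"],
    "C": ["CB", "BA", "c"],
}

def cyk_with_back(grammar, s):
    """Fill the table; back[(i, j)][X] says how X derives sub(i, j)."""
    n = len(s)
    back = {}
    for i in range(1, n + 1):
        # length-1 cells record the terminal itself
        back[(i, i)] = {h: s[i - 1] for h, bodies in grammar.items()
                        if s[i - 1] in bodies}
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = {}
            for k in range(i, j):
                for head, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2 and body[0] in back[(i, k)]
                                and body[1] in back[(k + 1, j)]):
                            # keep the first derivation found for this variable
                            cell.setdefault(head, (k, body[0], body[1]))
            back[(i, j)] = cell
    return back

def build_tree(back, var, i, j):
    """Trace back from var in cell (i, j); raises KeyError if var is absent."""
    entry = back[(i, j)][var]
    if isinstance(entry, str):          # leaf: var -> terminal
        return (var, entry)
    k, left, right = entry
    return (var, build_tree(back, left, i, k), build_tree(back, right, k + 1, j))

back = cyk_with_back(GRAMMAR, "abbc")
print(build_tree(back, "S", 1, 4))
# ('S', ('A', 'a'), ('B', ('B', 'b'), ('C', ('B', 'b'), ('A', 'c'))))
```

The printed tuple corresponds to the derivation S → AB, A → a, B → BC, B → b, C → BA, B → b, A → c.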
CYK Algorithm – Conclusion
• A bottom up parsing algorithm
  – Dynamic Programming
  – Solution of a subproblem (parsing of a substring) depends on that of smaller subproblems
• Before employing CYK Algorithm, convert the grammar into normal form
  – Remove ε-productions
  – Remove unit-productions
CYK Algorithm – Detailed
D = “On input w = w1w2…wn:
  If w = ε and S → ε is a rule, accept.
  For i = 1 to n:
    For each variable A:
      Test whether A → b is a rule, where b = wi.
      If so, place A in table(i, i).
  For l = 2 to n:
    For i = 1 to n – l + 1:
      Let j = i + l – 1.
      For k = i to j – 1:
        For each rule A → BC:
          If table(i, k) contains B and table(k+1, j) contains C,
            put A in table(i, j).
  If S is in table(1, n), accept. Otherwise, reject.”
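The procedure D translates almost line by line into Python. The following is a sketch under the assumption that the grammar is given as a dict from variable to a list of right-hand sides, with ε written as the empty string; the function name `cyk` and the loop variable `span` (the l of the procedure) are chosen here for illustration.

```python
GRAMMAR = {
    "S": ["AB"],
    "A": ["CC", "a", "c"],
    "B": ["BC", "b"],
    "C": ["CB", "BA", "c"],
}

def cyk(grammar, w, start="S"):
    """Return True if the normal-form grammar derives w."""
    n = len(w)
    if n == 0:
        # w = ε is accepted only if start -> ε is a rule
        return "" in grammar.get(start, [])
    table = {}
    for i in range(1, n + 1):
        # place A in table(i, i) whenever A -> w_i is a rule
        table[(i, i)] = {A for A, bodies in grammar.items() if w[i - 1] in bodies}
    for span in range(2, n + 1):
        for i in range(1, n - span + 2):
            j = i + span - 1
            table[(i, j)] = set()
            for k in range(i, j):
                for A, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2
                                and body[0] in table[(i, k)]
                                and body[1] in table[(k + 1, j)]):
                            table[(i, j)].add(A)
    return start in table[(1, n)]

print(cyk(GRAMMAR, "abbc"))  # True
print(cyk(GRAMMAR, "abb"))   # False
```

The three nested loops over the length, the start index and the split point give a running time cubic in the length of w, times the number of rules.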
Pushdown Automata
NFA with unbounded memory (a stack)
Pushdown Automata
• PDA ~= NFA, with a stack of memory
• Transition:
  – NFA – Depends on input
  – PDA – Depends on input and top of stack
    • Push a symbol to stack (possibly ε)
    • Pop a symbol from stack (possibly ε)
    • Read a terminal on string (possibly ε)
• Transitions are non-deterministic
Pushdown Automata and NFA
• Accept:
  – NFA – Go to an Accept state
  – PDA – Go to an Accept state
PDA – Example 1
• Given the following language:
  L = {0^i 1^j : i ≤ j ≤ 2i, i = 0, 1, …}, Σ = {0, 1}
• Design a PDA for it
PDA – Example 1 - Idea
• Idea: The input has two sections
  – First half
    • All ‘0’s
  – Second half
    • All ‘1’s
    • #‘1’ depends on #‘0’
      – #‘0’ ≤ #‘1’ ≤ #‘0’ × 2
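Before designing the PDA, it can help to have a direct membership test to compare against. This sketch is not a PDA; it simply counts symbols with a regular expression, and the function name `in_language` is hypothetical.

```python
import re

def in_language(w):
    """True if w = 0^i 1^j with i <= j <= 2i."""
    match = re.fullmatch(r"(0*)(1*)", w)
    if match is None:
        return False            # not of the form 0...01...1
    i, j = len(match.group(1)), len(match.group(2))
    return i <= j <= 2 * i

print(in_language("00111"))  # True: i = 2, j = 3
print(in_language("0111"))   # False: j = 3 > 2i = 2
```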
PDA – Example 1 – Solution
• Solution:
  [Figure: PDA state diagram with states q0, q1, q2, q3. Transition labels (input, pop / push): ε,ε/$; 0,ε/X; ε,ε/ε; 1,X/ε; 1,X/X; 1,X/ε; ε,$/ε]
  L = {0^i 1^j : i ≤ j ≤ 2i, i = 0, 1, …}, Σ = {0, 1}
PDA – Example 1 – Explain
• Let’s try some string… w = 00111
  – See white board for simulation…
• Reading the solution part by part:
  – Pushing $ indicates the start of parsing
  – The next part saves information about #‘0’
    • #‘X’ in stack = #‘0’
  – The next part accounts for #‘1’
    • #‘0’ ≤ #‘1’ ≤ #‘0’ × 2
    • Consume one ‘X’ and eat one ‘1’
    • Consume one ‘X’ and eat two ‘1’s
    • That is: consume one ‘X’, and then eat one ‘1’ or eat two ‘1’s
  – Popping $ indicates the end of parsing
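The behaviour described above can be tried out with a small nondeterministic simulator. This is a sketch under assumptions: the state names start, zeros, ones, second and accept are hypothetical, and the wiring of the transitions is one plausible reading of the transition labels rather than a copy of the diagram. Each transition is (state, input, pop, push, next state), with the empty string standing for ε.

```python
from collections import deque

EPS = ""

TRANSITIONS = [
    ("start",  EPS, EPS, "$", "zeros"),   # ε,ε/$  mark the bottom of the stack
    ("zeros",  "0", EPS, "X", "zeros"),   # 0,ε/X  one X per 0
    ("zeros",  EPS, EPS, EPS, "ones"),    # ε,ε/ε  guess that the 0s are over
    ("ones",   "1", "X", EPS, "ones"),    # 1,X/ε  one X pays for one 1
    ("ones",   "1", "X", "X", "second"),  # 1,X/X  first of two 1s, X stays
    ("second", "1", "X", EPS, "ones"),    # 1,X/ε  second of two 1s, X is gone
    ("ones",   EPS, "$", EPS, "accept"),  # ε,$/ε  stack is back to empty
]

def accepts(transitions, start, accepting, w):
    """Breadth-first search over configurations (state, position, stack)."""
    seen = set()
    queue = deque([(start, 0, "")])       # stack is a string, top at the end
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, pos, stack = config
        if state in accepting and pos == len(w):
            return True
        for src, read, pop, push, dst in transitions:
            if src != state:
                continue
            if read != EPS and (pos >= len(w) or w[pos] != read):
                continue
            if pop != EPS and not stack.endswith(pop):
                continue
            new_stack = stack[:len(stack) - len(pop)] + push
            queue.append((dst, pos + len(read), new_stack))
    return False

print(accepts(TRANSITIONS, "start", {"accept"}, "00111"))  # True
print(accepts(TRANSITIONS, "start", {"accept"}, "0111"))   # False
```

The search terminates here because X is pushed only while reading input and no ε-transitions form a cycle, so only finitely many configurations are reachable.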
PDA – Example 2
• Given the following language:
  L = { a^i b^j c^k d^l : i, j, k, l = 0, 1, …; i + k = j + l }, where the alphabet Σ = {a, b, c, d}
• Design a PDA for it
PDA – Example 2 – Idea
• Idea:
  – Sequentially read (multiple) ‘a’, ‘b’, ‘c’ and ‘d’
  – Maintain:
    • #‘a’ + #‘c’
    • #‘b’ + #‘d’
  – If these numbers are equal
    • Accept
PDA – Example 2 – Solution
• Solution:
  [Figure: PDA state diagram with states q1 to q5. Transition labels (input, pop / push):
   ε,ε/$; a,ε/X; ε,ε/ε (three times);
   b,X/ε; b,$/$Y; b,Y/YY;
   c,X/XX; c,$/$X; c,Y/ε;
   d,X/ε; d,$/$Y; d,Y/YY;
   ε,$/ε]
PDA – Example 2 – Explain
• The diagram labels its parts, in order: start, a, b, c, d, end
• Each X in stack = An extra a or c
• Each Y in stack = An extra b or d
• X and Y ‘cancel’ each other
• The stack contains only X’s or only Y’s
• No X’s and no Y’s means
  – #a + #c = #b + #d ⇒ Accept
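The cancelling rule can be sketched with an explicit stack. This is a simplified, deterministic illustration rather than a full PDA simulation: the order a, b, c, d is checked up front with a regular expression instead of by states, and the function name `accepts_example2` is hypothetical.

```python
import re

def accepts_example2(w):
    """True if w = a^i b^j c^k d^l with i + k = j + l."""
    if not re.fullmatch(r"a*b*c*d*", w):
        return False                      # symbols are out of order
    stack = ["$"]                         # $ marks the bottom of the stack
    for ch in w:
        mark = "X" if ch in "ac" else "Y"
        other = "Y" if mark == "X" else "X"
        if stack[-1] == other:
            stack.pop()                   # cancel, e.g. b,X/ε or c,Y/ε
        else:
            stack.append(mark)            # e.g. a,ε/X, b,$/$Y, c,X/XX
    return stack == ["$"]                 # no X's and no Y's left

print(accepts_example2("abbc"))  # True: 1 + 1 = 2 + 0
print(accepts_example2("aab"))   # False: 2 + 0 != 1 + 0
```

Because a symbol is pushed only when the top of the stack is not its opposite, the stack never holds X's and Y's at the same time.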