
Appendix F. CYK Algorithm for the Membership Test for CFL

The membership test problem for context-free languages is, given an arbitrary CFG $G$ and a string $w$, to decide whether $w$ is in the language $L(G)$. If it is, the problem commonly also asks for a sequence of rules that derives $w$. A brute-force technique is to generate all possible parse trees yielding a string of length $|w|$ and check whether any of them yields $w$. Because the number of such trees can grow exponentially with $|w|$, this approach takes too much time to be practical.

Here we present the well-known CYK algorithm (named after Cocke, Younger and Kasami, who first developed it). This algorithm, which takes $O(n^3)$ time for an input of length $n = |w|$, is based on the dynamic programming technique. The algorithm assumes that the given CFG is in Chomsky normal form (CNF).

Let $w = a_1 a_2 \cdots a_n$, $w_{ij} = a_i a_{i+1} \cdots a_j$ and $w_{ii} = a_i$. Let $V_{ij}$ be the set of nonterminal symbols that can derive the string $w_{ij}$, i.e.,

$$V_{ij} = \{ A \mid A \Rightarrow^{*} w_{ij},\ A \text{ is a nonterminal symbol of } G \}$$


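Construct an upper triangular matrix whose entries are the sets $V_{ij}$, as shown below for $n = 6$. In the matrix, $j$ corresponds to the position of the input symbol $a_j$ (the column), and $i$ corresponds to the diagonal number.

$$
\begin{array}{cccccc}
a_1 & a_2 & a_3 & a_4 & a_5 & a_6 \\
\hline
V_{11} & V_{22} & V_{33} & V_{44} & V_{55} & V_{66} \\
 & V_{12} & V_{23} & V_{34} & V_{45} & V_{56} \\
 & & V_{13} & V_{24} & V_{35} & V_{46} \\
 & & & V_{14} & V_{25} & V_{36} \\
 & & & & V_{15} & V_{26} \\
 & & & & & V_{16}
\end{array}
$$

Clearly, by definition, if $S \in V_{16}$, where $S$ is the start symbol of $G$, then $w \in L(G)$.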


The entries $V_{ij}$ can be computed from the entries in the $i$-th diagonal and those in the $j$-th column, going along the direction indicated by the two arrows in the following figure. If $A \in V_{ii}$ (which implies $A$ can derive $a_i$), $B \in V_{(i+1)j}$ (implying $B$ can derive $a_{i+1} \cdots a_j$) and $C \to AB$, then put $C$ in the set $V_{ij}$. If $D \in V_{i(i+1)}$ (which implies $D$ can derive $a_i a_{i+1}$), $E \in V_{(i+2)j}$ (implying $E$ can derive $a_{i+2} \cdots a_j$) and $F \to DE$, then put $F$ in the set $V_{ij}$, and so on.

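[Figure: to compute $V_{ij}$ for the substring $w_{ij} = a_i a_{i+1} a_{i+2} \cdots a_j$, the entries $V_{ii}, V_{i(i+1)}, \ldots, V_{i(j-1)}$ on the $i$-th diagonal are paired, in order, with the entries $V_{(i+1)j}, V_{(i+2)j}, \ldots, V_{jj}$ in the $j$-th column.]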


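For example, the set $V_{25}$ is computed as follows. Let $A$, $B$ and $C$ be nonterminals of $G$.

$$
\begin{aligned}
V_{25} ={} & \{ A \mid B \in V_{22},\ C \in V_{35},\ \text{and } A \to BC \} \\
& \cup \{ B \mid C \in V_{23},\ A \in V_{45},\ \text{and } B \to CA \} \\
& \cup \{ C \mid B \in V_{24},\ A \in V_{55},\ \text{and } C \to BA \}
\end{aligned}
$$

(Recall that $G$ is in CNF.)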


In general,

$$V_{ij} = \bigcup_{i \le k \le j-1} \{ A \mid B \in V_{ik},\ C \in V_{(k+1)j} \text{ and } A \to BC \}$$
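The recurrence can be sketched in Python as follows. This is a minimal illustration rather than part of the original appendix: the function name `compute_cell`, the dictionary `V` keyed by `(i, j)`, and the two sample rules are assumptions made only for this example.

```python
def compute_cell(V, i, j, binary_rules):
    """Return the set V_ij, given all entries V_ik and V_(k+1)j for i <= k <= j-1."""
    result = set()
    for k in range(i, j):  # k = i, ..., j-1
        for head, (left, right) in binary_rules:
            # head -> left right contributes if left derives w_ik
            # and right derives w_(k+1)j
            if left in V[(i, k)] and right in V[(k + 1, j)]:
                result.add(head)
    return result


# Hypothetical CNF rules chosen only for illustration: S -> AB, B -> SB.
binary_rules = [("S", ("A", "B")), ("B", ("S", "B"))]

# Entries for w = abb (1-indexed as in the text); the V_ii are assumed given.
V = {(1, 1): {"A"}, (2, 2): {"B"}, (3, 3): {"B"}}
V[(1, 2)] = compute_cell(V, 1, 2, binary_rules)
V[(2, 3)] = compute_cell(V, 2, 3, binary_rules)
V[(1, 3)] = compute_cell(V, 1, 3, binary_rules)
```

Each call only reads entries for shorter substrings, which is why the cells must be filled in order of increasing substring length (or column by column, bottom-up within a column).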


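Example:

CFG $G$: $S \to aSb \mid aDb$, $D \to aD \mid a$

CNF CFG: $S \to AB \mid AC$, $A \to a$, $B \to SB$, $B \to b$, $C \to DB$, $D \to AD \mid a$

Input: $w = aaaabb$

The sets $V_{ij}$, listed by substring length, with the rules that produce the entries:

$V_{11} = V_{22} = V_{33} = V_{44} = \{A, D\}$; $V_{55} = V_{66} = \{B\}$

$V_{12} = V_{23} = V_{34} = \{D\}$ by $D \to AD$; $V_{45} = \{S, C\}$ by $S \to AB$, $C \to DB$; $V_{56} = \{\,\}$

$V_{13} = V_{24} = \{D\}$; $V_{35} = \{S, C\}$ by $S \to AC$, $C \to DB$; $V_{46} = \{B\}$ by $B \to SB$

$V_{14} = \{D\}$; $V_{25} = \{S, C\}$; $V_{36} = \{S, B, C\}$ by $S \to AB$, $C \to DB$, $B \to SB$

$V_{15} = \{S, C\}$; $V_{26} = \{S, B, C\}$ by $S \to AC$, $C \to DB$, $B \to SB$

$V_{16} = \{S, B, C\}$ by $S \to AB$, $S \to AC$, $C \to DB$, $B \to SB$

Since $S \in V_{16}$, we have $w \in L(G)$.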
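The introduction notes that the problem commonly also asks for the rules applied to derive $w$. One possible way to obtain them, sketched below under assumptions, is to store with every nonterminal in $V_{ij}$ one back-pointer (the split position $k$ and the two right-hand-side nonterminals) and then walk the pointers down from $S$. The names `parse`, `tree_yield` and the tuple encoding of trees are hypothetical; the rules are the CNF rules of the example above.

```python
# CNF rules of the example: (head, terminal) and (head, left, right).
TERMINAL_RULES = [("A", "a"), ("B", "b"), ("D", "a")]
BINARY_RULES = [("S", "A", "B"), ("S", "A", "C"), ("B", "S", "B"),
                ("C", "D", "B"), ("D", "A", "D")]


def parse(w, start="S"):
    """Return one parse tree for w as nested tuples, or None if w is rejected."""
    n = len(w)
    # back[(i, j)][X] records one way to derive w_ij from X: the terminal
    # itself if i == j, otherwise (k, Y, Z) meaning X -> YZ, with Y deriving
    # w_ik and Z deriving w_(k+1)j.
    back = {}
    for i in range(1, n + 1):
        back[(i, i)] = {h: w[i - 1] for h, a in TERMINAL_RULES if a == w[i - 1]}
    for j in range(2, n + 1):
        for i in range(j - 1, 0, -1):
            cell = {}
            for k in range(i, j):
                for head, left, right in BINARY_RULES:
                    if left in back[(i, k)] and right in back[(k + 1, j)]:
                        cell.setdefault(head, (k, left, right))  # keep first found
            back[(i, j)] = cell

    def build(x, i, j):
        if i == j:
            return (x, back[(i, i)][x])
        k, left, right = back[(i, j)][x]
        return (x, build(left, i, k), build(right, k + 1, j))

    if n == 0 or start not in back[(1, n)]:
        return None
    return build(start, 1, n)


def tree_yield(tree):
    """Concatenate the leaves of a tree produced by parse()."""
    if isinstance(tree[1], str):
        return tree[1]
    return tree_yield(tree[1]) + tree_yield(tree[2])


tree = parse("aaaabb")
```

Only one back-pointer per nonterminal is kept here, so the sketch returns a single parse tree even when several exist.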


Here is pseudocode for the algorithm.

// Input: x = a_1 a_2 ... a_n
// Initially all sets V_ij are empty.
for ( i = 1; i <= n; i++ )
    V_ii = { A | A → a_i };
for ( j = 2; j <= n; j++ )
    for ( i = j-1; i >= 1; i-- )
        for ( k = i; k <= j-1; k++ )
            V_ij = V_ij ∪ { A | B ∈ V_ik, C ∈ V_(k+1)j and A → BC };
if ( S ∈ V_1n ) output "yes"; else output "no";
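The pseudocode translates almost line by line into Python. The following is a sketch, not a definitive implementation: the names `cyk_table` and `accepts` and the tuple encoding of rules are assumptions, and the empty string is simply rejected, which presumes the CNF grammar has no rule deriving it. The rules listed are the CNF rules of the example in this appendix.

```python
def cyk_table(w, terminal_rules, binary_rules):
    """Fill the sets V_ij (1-indexed, i <= j) for the string w."""
    n = len(w)
    V = {}
    for i in range(1, n + 1):
        # V_ii = { A | A -> a_i }
        V[(i, i)] = {head for head, a in terminal_rules if a == w[i - 1]}
    for j in range(2, n + 1):
        for i in range(j - 1, 0, -1):      # i = j-1 down to 1
            cell = set()
            for k in range(i, j):          # k = i, ..., j-1
                for head, left, right in binary_rules:
                    if left in V[(i, k)] and right in V[(k + 1, j)]:
                        cell.add(head)
            V[(i, j)] = cell
    return V


def accepts(w, terminal_rules, binary_rules, start="S"):
    """Return True if the start symbol is in V_1n."""
    if not w:
        return False  # assumption: the grammar does not derive the empty string
    return start in cyk_table(w, terminal_rules, binary_rules)[(1, len(w))]


# CNF rules of the example: (head, terminal) and (head, left, right).
terminal_rules = [("A", "a"), ("B", "b"), ("D", "a")]
binary_rules = [("S", "A", "B"), ("S", "A", "C"), ("B", "S", "B"),
                ("C", "D", "B"), ("D", "A", "D")]

V = cyk_table("aaaabb", terminal_rules, binary_rules)
```

With these rules, `V[(1, 6)]` contains `S`, in agreement with the example table, so `accepts("aaaabb", terminal_rules, binary_rules)` returns `True`.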


The number of sets $V_{ij}$ is $O(n^2)$, and for a fixed grammar it takes $O(n)$ steps to compute each $V_{ij}$, one for each split position $k$. Thus the time complexity of the algorithm is $O(n^3)$.
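As a sanity check on this bound, the innermost loop body of the pseudocode can simply be counted. The sketch below (the function name `innermost_iterations` is hypothetical) mirrors the three nested loops and compares the count with the closed form $(n^3 - n)/6$, which follows from summing $(n - m)\,m$ over $m = 1, \ldots, n-1$.

```python
def innermost_iterations(n):
    """Count how often the body of the k-loop runs for an input of length n."""
    count = 0
    for j in range(2, n + 1):
        for i in range(j - 1, 0, -1):
            for k in range(i, j):
                count += 1
    return count


# Compare with the closed form (n^3 - n) / 6 for small n.
matches = all(innermost_iterations(n) == (n**3 - n) // 6 for n in range(1, 30))
```

The count grows as $n^3/6$; the work done per iteration depends only on the number of rules of the grammar, not on $n$.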