

Journal of Combinatorial Optimization, 7, 389–394, 2003
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

DNA Screening, Pooling Design and Simplicial Complex

HAESUN PARK∗
Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA

WEILI WU†
… LIU†
Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, USA

XIAOYU WU
… G. ZHAO
Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA

Received September 20, 2003; Revised October 10, 2003; Accepted October 25, 2003

Abstract. Pooling designs are used for DNA library screening. In this paper, we present a construction of pooling designs from simplicial complexes and establish some general results on the construction.

Keywords: DNA screening, pooling design, clones, simplicial complex

1. Introduction

A clone is a DNA segment. A DNA library is a set of clones. The job of DNA screening is to determine, for each clone in a DNA library, whether it contains a probe from a given set of probes. This job can be done through a collection of tests, called a pooling design. Each test is performed on a subset of clones, called a pool. A clone is positive if it contains a given probe; otherwise, it is negative. A pool is positive if it contains a positive clone; otherwise, it is negative. Each test determines whether a pool is positive or negative.

For practical reasons, the pooling design is usually nonadaptive and hence can be represented by a binary matrix with rows indexed by pools and columns indexed by clones. The cell (i, j) contains a 1-entry if and only if the ith pool contains the jth clone. Each column can also be identified with the subset of pools containing the corresponding clone; therefore, we may speak of the union of columns to mean their Boolean sum. A binary matrix is d-disjunct if no column is contained in the union of any d other columns. Every d-disjunct t × n binary matrix can be used to identify, with t tests, the positive clones in a DNA library with n clones in total and at most d positive clones (see Du and Hwang (1999) for details).
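The definitions above can be illustrated with a short Python sketch, assuming the matrix is stored as a list of rows (pools) of 0/1 entries indexed by clones. The helper names `is_d_disjunct`, `simulate_tests`, and `decode` are hypothetical and chosen for illustration only. The disjunctness check is plain brute force, and the decoder uses the simple rule "a clone is positive if and only if every pool containing it tests positive", which recovers at most d positive clones when the matrix is d-disjunct.

```python
from itertools import combinations
from typing import List, Set

Matrix = List[List[int]]  # rows are pools, columns are clones


def support(matrix: Matrix, j: int) -> Set[int]:
    """Pools (row indices) that contain clone j."""
    return {i for i, row in enumerate(matrix) if row[j] == 1}


def is_d_disjunct(matrix: Matrix, d: int) -> bool:
    """Brute force: no column is contained in the union of any d other columns."""
    n = len(matrix[0])
    supports = [support(matrix, j) for j in range(n)]
    for j in range(n):
        others = [c for c in range(n) if c != j]
        for group in combinations(others, min(d, len(others))):
            union = set().union(*(supports[c] for c in group))
            if supports[j] <= union:
                return False
    return True


def simulate_tests(matrix: Matrix, positives: Set[int]) -> List[int]:
    """A pool tests positive (1) iff it contains at least one positive clone."""
    return [int(any(row[j] for j in positives)) for row in matrix]


def decode(matrix: Matrix, outcomes: List[int]) -> Set[int]:
    """Declare clone j positive iff every pool containing it tested positive."""
    n = len(matrix[0])
    return {j for j in range(n) if all(outcomes[i] for i in support(matrix, j))}


# Usage: the 3 x 3 identity matrix (one pool per clone) is 2-disjunct.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(is_d_disjunct(identity, 2))                         # True
print(decode(identity, simulate_tests(identity, {0, 2})))  # {0, 2}
```

The identity matrix is of course the trivial design with one test per clone; the point of pooling designs is to obtain d-disjunctness with far fewer rows than columns.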

∗Supported in part by NSF under grants CCR-0204109 and ACI-0305543.
†Corresponding authors. Supported in part by NSF under grant ACI-0305567.


A d-disjunct matrix is (d, e)-disjunct if each column has at least e + 1 1-entries in rows that are not covered by the union of any d other columns.
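As a sketch under the same list-of-rows assumption, the largest e for which a matrix is (d, e)-disjunct can be computed by brute force: for every column and every choice of d other columns, count the rows where the column has a 1 and all chosen columns have 0, then take the minimum. The function name `error_tolerance` is hypothetical.

```python
from itertools import combinations
from typing import List


def error_tolerance(matrix: List[List[int]], d: int) -> int:
    """Largest e for which the matrix is (d, e)-disjunct (brute force).

    For every column j and every choice of d other columns, count the rows
    where column j has a 1 and all d chosen columns have 0; e + 1 is the
    minimum of these counts. A return value of -1 means the matrix is not
    even d-disjunct.
    """
    t, n = len(matrix), len(matrix[0])
    supports = [{i for i in range(t) if matrix[i][j] == 1} for j in range(n)]
    fewest_private_rows = min(
        len(supports[j] - set().union(*(supports[c] for c in group)))
        for j in range(n)
        for group in combinations([c for c in range(n) if c != j], min(d, n - 1))
    )
    return fewest_private_rows - 1


# Usage: duplicating every row of the identity matrix tolerates one error.
doubled = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(error_tolerance(doubled, 1))  # 1
```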

There are many pooling designs in the literature. For surveys, the reader may see (Du and Hwang, 1999; Balding et al., 1996; Ngo and Du, 2000). In this paper, we are interested in one of them, the containment design introduced by Macula (1996, 1999). Ngo and Du (2002) generalized Macula's idea using matchings in a graph and linear spaces over a finite field. In this paper, we present a general setting for containment designs based on simplicial complexes, together with results on d-disjunctness and (d, e)-disjunctness, which provide a tool for constructing pooling designs from graph properties.

2. Preliminaries

A simplicial complex $\Delta$ is a family of subsets of a base set X satisfying the condition that if A belongs to $\Delta$, then every subset of A also belongs to $\Delta$. Every element of X is called a vertex. Every subset in $\Delta$ is called a face, and more specifically a k-face if it contains k vertices. For example, every vertex is a 1-face.
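The definition can be checked mechanically. The Python sketch below, with hypothetical helper names `downward_closure`, `is_simplicial_complex`, and `k_faces`, represents faces as frozensets. It builds the smallest simplicial complex containing some given sets and tests closure under subsets; removing one element at a time suffices, since every subset is reached by repeated single removals.

```python
from itertools import combinations
from typing import FrozenSet, Hashable, Iterable, List, Set, Tuple

Face = FrozenSet[Hashable]


def downward_closure(generators: Iterable[Iterable[Hashable]]) -> Set[Face]:
    """Smallest simplicial complex containing the given sets (all their subsets)."""
    faces: Set[Face] = set()
    for g in generators:
        items = tuple(g)
        for r in range(len(items) + 1):
            faces.update(frozenset(sub) for sub in combinations(items, r))
    return faces


def is_simplicial_complex(family: Iterable[Iterable[Hashable]]) -> bool:
    """Check closure under subsets by removing one element at a time."""
    fam = {frozenset(f) for f in family}
    return all(f - {x} in fam for f in fam for x in f)


def k_faces(faces: Iterable[Face], k: int) -> List[Tuple]:
    """All faces with exactly k vertices, as sorted tuples."""
    return sorted(tuple(sorted(f)) for f in faces if len(f) == k)


delta = downward_closure([{1, 2, 3}, {3, 4}])
print(is_simplicial_complex(delta))  # True
print(k_faces(delta, 2))             # [(1, 2), (1, 3), (2, 3), (3, 4)]
```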

Monotone graph properties provide an important family of examples. Any graph property can be described by the set of all graphs having the property. Note that throughout this paper, the term “subgraph” applies only to graphs with the same vertex set as the original graph and a subset of its edges. A k-subgraph is a subgraph with k edges.

A graph property is monotone increasing if every graph containing a subgraph with the property also has the property. A graph property is monotone decreasing if every subgraph of any graph with the property also has the property. A graph property is monotone if it is either monotone increasing or monotone decreasing.

Let G be a graph and P a monotone graph property. Define

$$\Delta_G^P = \{E(H) \mid H \text{ is a subgraph of } G \text{ satisfying } P\}$$

if P is decreasing, and

$$\Delta_G^P = \{E(H) \mid H \text{ is a subgraph of } G \text{ not satisfying } P\}$$

if P is increasing. Clearly, $\Delta_G^P$ is a simplicial complex when the edge set of G is considered as the base set.
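For small graphs, $\Delta_G^P$ can be enumerated directly. The following Python sketch is an illustration under stated assumptions: the helper names `is_matching` and `property_complex` are hypothetical, graph properties are passed as predicates on edge subsets, and the enumeration is exponential in the number of edges, so it is only meant for toy examples such as $K_4$.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Sequence, Set, Tuple

Edge = Tuple[int, int]


def is_matching(edges: Iterable[Edge]) -> bool:
    """True if no two edges share an endpoint (a monotone decreasing property)."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def property_complex(graph_edges: Sequence[Edge],
                     has_property: Callable[[Tuple[Edge, ...]], bool],
                     decreasing: bool = True) -> Set[FrozenSet[Edge]]:
    """Edge sets of subgraphs of G that satisfy P (P decreasing) or fail P (P increasing)."""
    edges = list(graph_edges)
    faces = set()
    for r in range(len(edges) + 1):
        for subset in combinations(edges, r):
            satisfied = has_property(subset)
            keep = satisfied if decreasing else not satisfied
            if keep:
                faces.add(frozenset(subset))
    return faces


k4_edges = list(combinations(range(4), 2))
matching_complex = property_complex(k4_edges, is_matching)
print(len(matching_complex))  # 10: empty set, 6 single edges, 3 perfect matchings
```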

3. Main results

We first extend Macula's construction to simplicial complexes. Assume k > d ≥ 1. Let $\Delta$ be a simplicial complex with d-faces $A_1, \ldots, A_t$ and k-faces $B_1, \ldots, B_n$. Define the matrix $M(\Delta, d, k) = (a_{ij})$ by setting

$$a_{ij} = \begin{cases} 1 & \text{if } A_i \subset B_j, \\ 0 & \text{otherwise.} \end{cases}$$
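A minimal Python sketch of this matrix, assuming faces are given as frozensets of sortable vertices; the function name `containment_matrix` is hypothetical. Since every $A_i$ has d < k elements, the test `a <= b` (subset) coincides with proper containment. The usage takes $\Delta$ to be all subsets of a 5-element set, which is the classical setting that the text extends.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple


def containment_matrix(faces: Iterable[FrozenSet[int]], d: int, k: int
                       ) -> Tuple[List[FrozenSet[int]], List[FrozenSet[int]], List[List[int]]]:
    """Rows: d-faces A_i; columns: k-faces B_j; entry 1 iff A_i is a subset of B_j."""
    faces = list(faces)
    rows = sorted((f for f in faces if len(f) == d), key=sorted)
    cols = sorted((f for f in faces if len(f) == k), key=sorted)
    matrix = [[1 if a <= b else 0 for b in cols] for a in rows]
    return rows, cols, matrix


full_simplex = [frozenset(s) for r in range(6) for s in combinations(range(1, 6), r)]
rows, cols, M = containment_matrix(full_simplex, d=2, k=3)
print(len(rows), len(cols))           # 10 10
print({sum(col) for col in zip(*M)})  # {3}: each 3-set contains three 2-sets
```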

Theorem 1. Let 1 ≤ d < k. Then $M(\Delta, d, k)$ is a d-disjunct matrix.


Proof: Consider any d + 1 distinct columns $C_0, C_1, \ldots, C_d$ corresponding, respectively, to d + 1 distinct k-faces $B_{j_0}, B_{j_1}, \ldots, B_{j_d}$. Since distinct k-faces have the same size, each difference $B_{j_0} \setminus B_{j_i}$ is nonempty; choose $a_1 \in B_{j_0} \setminus B_{j_1}, \ldots, a_d \in B_{j_0} \setminus B_{j_d}$ and set $I = \{a_1, \ldots, a_d\}$. Because every subset of $B_{j_0}$ belongs to $\Delta$, if |I| = d, then I is a d-face contained in $B_{j_0}$ but in none of $B_{j_1}, \ldots, B_{j_d}$. If |I| < d, then choose d − |I| elements from $B_{j_0} \setminus I$ (possible since k > d) and add them to I to form a d-face $I' \supset I$. Clearly, I′ is contained in $B_{j_0}$ but in none of $B_{j_1}, \ldots, B_{j_d}$. Thus, in either case, we can find a d-face whose row has a 1-entry in column $C_0$ and 0-entries in all of the columns $C_1, \ldots, C_d$. Therefore, $C_0$ is not contained in the union of $C_1, \ldots, C_d$. Hence, $M(\Delta, d, k)$ is d-disjunct.
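The proof of Theorem 1 is constructive, and can be followed step by step in a short Python sketch. The function name `witness_face` is hypothetical; it assumes the vertices are comparable (for example integers) so that `min` and `sorted` give a deterministic choice, although any choice of $a_i$ and of the padding elements works in the proof.

```python
from typing import FrozenSet, Iterable, Sequence


def witness_face(b0: Iterable[int], others: Sequence[Iterable[int]], d: int) -> FrozenSet[int]:
    """Pick a_i in B0 \\ B_i for each of the d other k-faces, then pad with
    further elements of B0 until d are chosen. The result is a d-subset of B0
    (hence a d-face) that is contained in none of the other faces."""
    b0 = set(b0)
    if len(others) != d or len(b0) <= d:
        raise ValueError("need exactly d other faces and k > d")
    chosen = set()
    for b in others:
        diff = b0 - set(b)
        if not diff:
            raise ValueError("faces must be distinct and of the same size k")
        chosen.add(min(diff))          # a_i: any element of B0 \ B_i works
    for x in sorted(b0 - chosen):      # pad I up to a d-face I' inside B0
        if len(chosen) == d:
            break
        chosen.add(x)
    return frozenset(chosen)


face = witness_face({1, 2, 3, 4}, [{1, 2, 3, 5}, {1, 2, 3, 6}], d=2)
print(sorted(face))  # [1, 4]
```

In the usage example both $a_1$ and $a_2$ equal 4, so |I| = 1 < d and one padding element is added, which is exactly the second case of the proof.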

Next, we generalize a result of Ngo and Du (2002) on graphs. To do so, we first prove some lemmas.

The following lemma can be found in Ngo and Du (2002). However, for the convenience of the reader, we give a different proof, which will also be used to prove the other lemmas.

Lemma 2. Suppose a graph G has at most d edges and more than d vertices. Then G has at least d + 1 vertex covers of size d.

Proof: Let U be the set of vertices of degree at least one. If |U| ≤ d + 1, then add d + 1 − |U| isolated vertices to U to form a set U′ of d + 1 vertices; this is possible because G has more than d vertices. Since both endpoints of every edge lie in U, U′ − {v} is a vertex cover of size d for any v ∈ U′. Therefore, d + 1 vertex covers of size d are found.

Now, consider |U| ≥ d + 2. Suppose G has c nonempty connected components $C_1, \ldots, C_c$. Here, a connected component is said to be nonempty if it contains at least one edge; thus, a nonempty connected component has at least two vertices. Note that the graph on U formed by the edges of G has |U| vertices and at most d edges, and hence at least |U| − d connected components, all of them nonempty. Therefore, setting k = |U| − d, we have c ≥ k. Let $V(C_i)$ denote the vertex set of $C_i$. For any $1 \le i_1 < \cdots < i_k \le c$ and any $v_{i_1} \in V(C_{i_1}), \ldots, v_{i_k} \in V(C_{i_k})$, the set $U - \{v_{i_1}, \ldots, v_{i_k}\}$ is a vertex cover of size d. Therefore, the number of vertex covers of size d for G is at least

$$\sum_{1 \le i_1 < \cdots < i_k \le c} |V(C_{i_1})| \cdots |V(C_{i_k})| \ \ge\ \sum_{i=1}^{c} |V(C_i)| = |U| \ge d + 2.$$

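Here, we use the following inequality: suppose $x_i \ge 2$ for i = 1, …, c and 1 ≤ k ≤ c. Then

$$\sum_{1 \le i_1 < \cdots < i_k \le c} x_{i_1} \cdots x_{i_k} \ \ge\ \sum_{i=1}^{c} x_i.$$

This inequality can be proved easily by induction on k. For k = 1, it is trivial. Consider k ≥ 2 (so that c ≥ 2). Keeping only the terms with $i_k = c$ and applying the induction hypothesis to $x_1, \ldots, x_{c-1}$, we have

$$\sum_{1 \le i_1 < \cdots < i_k \le c} x_{i_1} \cdots x_{i_k} \ \ge\ x_c \sum_{1 \le i_1 < \cdots < i_{k-1} \le c-1} x_{i_1} \cdots x_{i_{k-1}} \ \ge\ x_c \sum_{i=1}^{c-1} x_i \ \ge\ \sum_{i=1}^{c} x_i,$$

where the last step holds because, with $S = \sum_{i=1}^{c-1} x_i \ge 2$ and $x_c \ge 2$, we have $x_c S - (S + x_c) = (x_c - 1)(S - 1) - 1 \ge 0$.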
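The left-hand side of this inequality is the k-th elementary symmetric polynomial of $x_1, \ldots, x_c$. As a numerical sanity check (not a proof), the following Python sketch computes it with the standard dynamic program and verifies the inequality exhaustively for c ≤ 4 and entries in {2, 3, 4}, for 1 ≤ k ≤ c; for k > c the left side is an empty sum. The function name `elementary_symmetric` is hypothetical.

```python
from itertools import product
from typing import Sequence


def elementary_symmetric(xs: Sequence[int], k: int) -> int:
    """Sum of x_{i1} * ... * x_{ik} over all i1 < ... < ik, by dynamic programming."""
    e = [1] + [0] * k              # e[j] = degree-j elementary sum of the prefix seen so far
    for x in xs:
        for j in range(k, 0, -1):  # descend so each x is used at most once per term
            e[j] += x * e[j - 1]
    return e[k]


# Exhaustive check of the inequality for small c and entries in {2, 3, 4}.
for c in range(1, 5):
    for xs in product(range(2, 5), repeat=c):
        for k in range(1, c + 1):
            assert elementary_symmetric(xs, k) >= sum(xs)

print(elementary_symmetric([2, 3, 4], 2))  # 26 = 2*3 + 2*4 + 3*4
```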


Lemma 3. Let d ≥ 2 and k ≥ d + 2. Suppose a graph G has d edges and k vertices. Then G has at least d + 2 vertex covers of size d.

Proof: Let U be the set of vertices of degree at least one. If |U| ≤ d, then add d − |U| isolated vertices to U to form a vertex cover U′ of size d. Since both endpoints of every edge lie in U ⊆ U′, exchanging any vertex in U′ with any vertex not in U′ still results in a vertex cover of size d. Therefore, there exist at least 1 + d(k − d) vertex covers of size d. Since k − d ≥ 2 and d ≥ 2, we have d(k − d) − k = (d − 1)(k − d) − d ≥ d − 2 ≥ 0, and hence 1 + d(k − d) ≥ 1 + k > d + 2.

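If |U| = d + 1, we first find a vertex cover W ⊂ U of size d − 1. If the edges of G form a connected graph on U, then the d edges form a tree T; since d ≥ 2, T has a vertex v of degree at least two, and for two vertices u and w adjacent to v (which are not adjacent to each other, since T is acyclic) we set W = U − {u, w}. Otherwise, G has at least two nonempty connected components, and we set W = U − {u, w} for vertices u and w in two different components. In either case, W together with any isolated vertex forms a vertex cover of size d, which is different from the d + 1 vertex covers U − {x} for x ∈ U. Since there are k − d − 1 ≥ 1 isolated vertices, at least (d + 1) + (k − d − 1) = k ≥ d + 2 vertex covers of size d are found.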

If |U| ≥ d + 2, then the argument of the second part of the proof of Lemma 2 applies and yields at least |U| ≥ d + 2 vertex covers of size d.

Lemma 4. Let 2 ≤ d < k ≤ 2d. Suppose a graph G has d edges and k vertices. Then G has at least k vertex covers of size d.

Proof: Let U be the set of vertices of degree at least one. If |U| ≤ d + 1, then we can find at least k vertex covers of size d by the same argument as that in the proof of Lemma 3.

Now, consider |U| ≥ d + 2. If G has no connected component containing more than one edge, then G is a matching, and choosing one endpoint of each edge gives $2^d \ge 2d$ vertex covers of size d. Since 2d ≥ k, this case is done.

If G has a connected component containing two or more edges, then it contains a vertex v of degree at least two. Let u and w be two distinct vertices adjacent to v. Note that the graph on U formed by the d edges of G has |U| vertices and hence at least |U| − d connected components, all of them nonempty. Let $C_1, \ldots, C_{|U|-d-1}$ be |U| − d − 1 distinct nonempty connected components not containing v, and choose a vertex $v_i$ from each $C_i$ for i = 1, …, |U| − d − 1. Set $W = U - \{u, w, v_1, \ldots, v_{|U|-d-1}\}$ and $V = U - \{v_0, v_1, \ldots, v_{|U|-d-1}\}$ for a vertex $v_0$ in the connected component containing v. Then W is a vertex cover of size d − 1 and V is a vertex cover of size d. By the same argument as that in the proof of Lemma 2, there are at least |U| vertex covers of size d of the same form as V. Moreover, W together with any vertex not in U forms one of k − |U| further vertex covers of size d. In total, G has at least |U| + (k − |U|) = k vertex covers of size d.
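Lemmas 2–4 can be sanity-checked, though of course not proved, by exhaustive search on small graphs. The Python sketch below uses hypothetical helpers `count_vertex_covers` and `min_cover_count`; it enumerates every graph on k ≤ 6 labelled vertices with the required number of edges (the bound of six vertices is an arbitrary choice to keep the search fast) and asserts each lemma's lower bound on the number of vertex covers of size d.

```python
from itertools import combinations
from typing import Iterable, Sequence, Tuple

Edge = Tuple[int, int]


def count_vertex_covers(k: int, edges: Sequence[Edge], size: int) -> int:
    """Number of vertex subsets of the given size that meet every edge."""
    return sum(1 for s in combinations(range(k), size)
               if all(u in s or v in s for u, v in edges))


def min_cover_count(k: int, edge_counts: Iterable[int], size: int) -> int:
    """Minimum of count_vertex_covers over all graphs on k labelled vertices
    whose number of edges lies in edge_counts."""
    pairs = list(combinations(range(k), 2))
    return min(count_vertex_covers(k, edges, size)
               for m in edge_counts
               for edges in combinations(pairs, m))


# Lemma 2: at most d edges and more than d vertices -> at least d + 1 covers.
for k in range(2, 7):
    for d in range(1, min(k, 5)):
        assert min_cover_count(k, range(d + 1), d) >= d + 1
# Lemma 3: d >= 2, k >= d + 2, exactly d edges -> at least d + 2 covers.
for d in range(2, 5):
    for k in range(d + 2, 7):
        assert min_cover_count(k, [d], d) >= d + 2
# Lemma 4: 2 <= d < k <= 2d, exactly d edges -> at least k covers.
for d in range(2, 5):
    for k in range(d + 1, min(2 * d, 6) + 1):
        assert min_cover_count(k, [d], d) >= k
print("Lemmas 2-4 hold for every graph checked (at most 6 vertices)")
```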

The following theorem follows from the above three lemmas.

Theorem 5. Let $\Delta$ be a simplicial complex satisfying the condition that, for any two distinct k-faces B and B′, |B \ B′| ≥ 2. Then the following hold:
(a) If 1 ≤ d < k, then the d-disjunct matrix $M(\Delta, d, k)$ is (d, d)-disjunct.
(b) If 2 ≤ d ≤ k − 2, then the d-disjunct matrix $M(\Delta, d, k)$ is (d, d + 1)-disjunct.
(c) If 2 ≤ d and d + 2 ≤ k ≤ 2d, then the d-disjunct matrix $M(\Delta, d, k)$ is (d, k − 1)-disjunct.

Proof: (a) It suffices to show that for any column $C_0$ and any d other distinct columns $C_1, \ldots, C_d$, there exist d + 1 rows in each of which $C_0$ has a 1-entry and all of $C_1, \ldots, C_d$ have 0-entries. To this end, one needs to show that for any d + 1 distinct k-faces $B_0, B_1, \ldots, B_d$, there exist d + 1 distinct d-faces contained in $B_0$ but in none of $B_1, \ldots, B_d$. To do so, choose two distinct elements $u_i$ and $v_i$ from each $B_0 \setminus B_i$ for i = 1, …, d, and construct a graph G with vertex set $B_0$ and edges $(u_i, v_i)$ for i = 1, …, d. Since G has at most d edges and k > d vertices, Lemma 2 shows that G has at least d + 1 vertex covers of size d. Each of those vertex covers is a d-face contained in $B_0$; moreover, for each i it contains $u_i$ or $v_i$, neither of which lies in $B_i$, so it is contained in none of $B_1, \ldots, B_d$.

For (b) and (c), the argument is similar to (a), except that Lemma 2 is replaced by Lemmas 3 and 4, respectively; moreover, if the pairs $(u_i, v_i)$, i = 1, …, d, are not all distinct, we add further edges on $B_0$ so that G has exactly d edges.
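The construction in the proof of Theorem 5(a) can be run as a procedure. In the Python sketch below, the function name `private_d_faces` and the example faces are hypothetical; the faces satisfy the hypothesis |B \ B′| ≥ 2. With k = 4 and d = 2, the procedure returns four d-faces of $B_0$ that lie in neither of the other faces, which meets the bound d + 1 = 3 from Lemma 2 and also the bound k = 4 used for part (c).

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Sequence


def private_d_faces(b0: Iterable[int], others: Sequence[Iterable[int]],
                    d: int) -> List[FrozenSet[int]]:
    """Build one edge (u_i, v_i) with u_i, v_i in B0 \\ B_i for each other face,
    then return every size-d vertex cover of the resulting graph on B0.
    Each such cover is a d-subset of B0 contained in none of the B_i."""
    b0 = sorted(b0)
    edges = []
    for b in others:
        b_set = set(b)
        diff = [x for x in b0 if x not in b_set]
        if len(diff) < 2:
            raise ValueError("requires |B0 \\ B_i| >= 2 for every i")
        edges.append((diff[0], diff[1]))
    return [frozenset(s) for s in combinations(b0, d)
            if all(u in s or v in s for u, v in edges)]


B0 = {1, 2, 3, 4}
others = [{1, 2, 5, 6}, {3, 4, 5, 6}]
faces = private_d_faces(B0, others, d=2)
print(sorted(sorted(f) for f in faces))  # [[1, 3], [1, 4], [2, 3], [2, 4]]
```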

The following corollaries follow directly from Theorems 1 and 5.

Corollary 6. Let P be a monotone decreasing graph property and G a graph with m edges. If 1 ≤ d < k ≤ m, then $M(\Delta_G^P, d, k)$ is a d-disjunct matrix.

Corollary 7. Let P be a monotone decreasing graph property and G a graph with m edges. Suppose that for any two distinct k-subgraphs A and A′ of G satisfying P, A contains at least two edges not in A′.
(a) If 1 ≤ d < k ≤ m, then the d-disjunct matrix $M(\Delta_G^P, d, k)$ is (d, d)-disjunct.
(b) If 2 ≤ d and d + 2 ≤ k ≤ m, then the d-disjunct matrix $M(\Delta_G^P, d, k)$ is (d, d + 1)-disjunct.
(c) If 2 ≤ d and d + 2 ≤ k ≤ min(2d, m), then the d-disjunct matrix $M(\Delta_G^P, d, k)$ is (d, k − 1)-disjunct.

For example, the property M of being a matching is a monotone decreasing graph property, since every subgraph of a matching is again a matching. By Corollaries 6 and 7, we obtain the following.

Corollary 8. Let $K_{2m}$ be a complete graph on 2m vertices. Then $M(\Delta_{K_{2m}}^{M}, d, k)$ is (d, d)-disjunct. Moreover, it is (d, d + 1)-disjunct if 2 ≤ d and d + 2 ≤ k ≤ m, and (d, k − 1)-disjunct if 2 ≤ d and d + 2 ≤ k ≤ min(m, 2d).
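One small case can be checked numerically. The Python sketch below is an illustration under stated assumptions: it takes $K_6$ (m = 3) and d = 2, and restricts the columns to perfect matchings (k = m), a case in which any two distinct k-faces share at most one edge and therefore differ in at least two. The helper names `matchings_of_size` and `error_tolerance` are hypothetical, and the search is brute force. A result of e = 2 means the matrix is (2, 2)-disjunct, in line with the first part of the corollary; since d = 2 > k − 2 = 1 here, the other two parts do not apply.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Matching = FrozenSet[Tuple[int, int]]


def matchings_of_size(n: int, size: int) -> List[Matching]:
    """All matchings with `size` edges in the complete graph K_n."""
    edges = list(combinations(range(n), 2))
    result = []
    for subset in combinations(edges, size):
        endpoints = [v for e in subset for v in e]
        if len(set(endpoints)) == len(endpoints):  # no shared endpoint
            result.append(frozenset(subset))
    return result


def error_tolerance(rows: List[Matching], cols: List[Matching], d: int) -> int:
    """Largest e such that the containment matrix (rows vs. cols) is (d, e)-disjunct."""
    supports = [{i for i, a in enumerate(rows) if a <= b} for b in cols]
    fewest = min(
        len(supports[j] - set().union(*(supports[c] for c in group)))
        for j in range(len(cols))
        for group in combinations([c for c in range(len(cols)) if c != j], d)
    )
    return fewest - 1


m, d = 3, 2
rows = matchings_of_size(2 * m, d)  # d-faces: 2-matchings of K_6
cols = matchings_of_size(2 * m, m)  # k-faces with k = m: perfect matchings of K_6
print(len(rows), len(cols), error_tolerance(rows, cols, d))  # 45 15 2
```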

The first part of Corollary 8 was already obtained by Ngo and Du (2002).

Similarly, Hamiltonian cycles together with all their subgraphs form a monotone decreasing graph property H. Note that any two distinct Hamiltonian cycles differ in at least two edges. By Corollaries 6 and 7, we obtain the following.

Corollary 9. Let $K_m$ be a complete graph on m vertices. Then $M(\Delta_{K_m}^{H}, d, k)$ is (d, d)-disjunct. Moreover, it is (d, d + 1)-disjunct for 2 ≤ d and d + 2 ≤ k ≤ m, and (d, k − 1)-disjunct for 2 ≤ d and d + 2 ≤ k ≤ min(m, 2d).

4. Discussion

Consider the Boolean lattice. Each set packing design (Du and Hwang, 1999) uses all points at rank 1 to index its rows and sampled points at rank k to index its columns. Our design with a simplicial complex uses points at rank d to index its rows and points at rank k to index its columns. Could some set packing designs be generalized to simplicial complexes? There are many topics for further research in this direction. This paper is just one successful example, which partially answers a question posed by Ngo and Du (2002): “Which conditions must hold to pick up some two levels of the lattice to construct d-disjunct matrices? To avoid being too vague and for the ease of analysis, we could restraint ourselves to the lattices with some regularity constraint. An example would be to work on lattices where the number of points covered by p depend only on k.”

Recently, Huang and Weng (2002) presented a more general setting with atomic posets, which successfully generalizes Theorem 1. How to generalize Theorem 5 to atomic posets remains an open problem.

References

D.J. Balding, W.J. Bruno, E. Knill, and D.C. Torney, “A comparative survey of non-adaptive pooling designs,” in Genetic Mapping and DNA Sequencing, Springer: New York, 1996, pp. 133–154.

D.-Z. Du and F.K. Hwang, Combinatorial Group Testing and Its Applications, World Scientific, 1999.

T. Huang and C.-W. Weng, “Pooling spaces and non-adaptive pooling designs,” preprint, 2002.

A.J. Macula, “A simple construction of d-disjunct matrices,” Discrete Mathematics, vol. 162, pp. 311–312, 1996.

A.J. Macula, “Probabilistic nonadaptive group testing in the presence of errors and DNA library screening,” Annals of Combinatorics, vol. 2, pp. 61–69, 1999.

H.Q. Ngo and D.-Z. Du, “New constructions of non-adaptive and error-tolerance pooling designs,” Discrete Mathematics, vol. 243, pp. 161–170, 2002.

H.Q. Ngo and D.-Z. Du, “A survey on combinatorial group testing algorithms with applications to DNA library screening,” in Discrete Mathematical Problems with Medical Applications, DIMACS Ser. Discrete Math. Theoret. Comput. Sci., vol. 55, Amer. Math. Soc., Providence, RI, 2000, pp. 171–182.