15
Probabilistic Strategies for the Partition and Plurality Problems Zdenˇ ek Dvoˇ rák, 1 Vít Jelínek, 1 Daniel Král’, 1 Jan Kynˇ cl, 1 Michael Saks 2, 1 Department of Applied Mathematics and Institute for Theoretical Computer Science (ITI),* Charles University, Malostranské námˇ estí 25, 118 00 Prague, Czech Republic; e-mail: {rakdver,jelinek,kral,kyncl}@kam.mff.cuni.cz 2 Department of Mathematics, Rutgers, State University of New Jersey, 110 Frelinghuysen Road, Piscataway, New Jersey 08854-8019; e-mail: [email protected] Received 29 September 2005; accepted 22 April 2006 Published online 1 December 2006 in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/rsa.20148 ABSTRACT: We consider a game played by two players, Paul and Carol. Carol fixes a coloring of n balls with three colors. At each step, Paul chooses a pair of balls and asks Carol whether the balls have the same color. Carol truthfully answers yes or no. In the Plurality problem, Paul wants to find a ball with the most common color. In the Partition problem, Paul wants to partition the balls according to their colors. Paul’s goal is to ask Carol the fewest number of questions to reach his goal. We find optimal probabilistic strategies for the Partition problem and the Plurality problem in the considered setting. © 2006 Wiley Periodicals, Inc. Random Struct. Alg., 30, 63–77, 2007 1. INTRODUCTION In this paper, we consider a two-player game in which one of the players, Paul, wants to determine a certain property of the input based on the opponent’s answers. The opponent, Correspondence to: D. Král’ The REU program, during which the research was conducted, is supported by Grant KONTAKT ME 521. *Institute for Theoretical Computer Science is supported by Ministry of Education of Czech Republic as Projects LN00A056 and 1M0021620808. †Research supported in part by NSF Grant CCR-9988526. © 2006 Wiley Periodicals, Inc. 63

Probabilistic strategies for the partition and plurality problems

Embed Size (px)

Citation preview

Page 1: Probabilistic strategies for the partition and plurality problems

Probabilistic Strategies for the Partitionand Plurality Problems

Zdenek Dvorák,1 Vít Jelínek,1 Daniel Král’,1 Jan Kyncl,1 Michael Saks2,†1Department of Applied Mathematics and Institute for Theoretical Computer

Science (ITI),* Charles University, Malostranské námestí 25, 118 00 Prague,Czech Republic; e-mail: {rakdver,jelinek,kral,kyncl}@kam.mff.cuni.cz

2Department of Mathematics, Rutgers, State University of New Jersey,110 Frelinghuysen Road, Piscataway, New Jersey 08854-8019;e-mail: [email protected]

Received 29 September 2005; accepted 22 April 2006Published online 1 December 2006 in Wiley InterScience (www.interscience.wiley.com).DOI 10.1002/rsa.20148

ABSTRACT: We consider a game played by two players, Paul and Carol. Carol fixes a coloring ofn balls with three colors. At each step, Paul chooses a pair of balls and asks Carol whether the ballshave the same color. Carol truthfully answers yes or no. In the Plurality problem, Paul wants to find aball with the most common color. In the Partition problem, Paul wants to partition the balls accordingto their colors. Paul’s goal is to ask Carol the fewest number of questions to reach his goal. We findoptimal probabilistic strategies for the Partition problem and the Plurality problem in the consideredsetting. © 2006 Wiley Periodicals, Inc. Random Struct. Alg., 30, 63–77, 2007

1. INTRODUCTION

In this paper, we consider a two-player game in which one of the players, Paul, wants todetermine a certain property of the input based on the opponent’s answers. The opponent,

Correspondence to: D. Král’The REU program, during which the research was conducted, is supported by Grant KONTAKT ME 521.*Institute for Theoretical Computer Science is supported by Ministry of Education of Czech Republic as ProjectsLN00A056 and 1M0021620808.†Research supported in part by NSF Grant CCR-9988526.© 2006 Wiley Periodicals, Inc.

63

Page 2: Probabilistic strategies for the partition and plurality problems

64 DVORÁK ET AL.

Carol, fixes a coloring of n balls by k colors. Paul does not know the coloring of the balls.At each step, he chooses two balls and asks Carol whether they have the same color. Caroltruthfully answers YES or NO. Paul wants to ask the fewest number of questions in the worstcase to determine the desired property of the coloring.

One of the problems of this kind is the Majority problem, in which Paul wants to finda ball b such that the number of balls colored with the same color as b is greater thann/2 or to declare that there is no such ball. Saks and Werman [14], and later Alonso,Reingold, and Schott [4], showed that n − ν(n) questions are necessary and sufficientfor Paul to resolve the Majority problem with n balls of two colors, where ν(n) is thenumber of 1’s in the binary representation of n. In the average [5] and the probabilisticcase [10], the number of questions drops down to 2n/3 + o(n). If the number of colorsof the balls is unrestricted, Fisher and Salzberg [9] showed that �3n/2� − 2 questions arenecessary and sufficient to solve the Majority problem. A linear bound on the number ofquestions in the non-adaptive setting (when Paul’s questions are fixed in advance) wasestablished by Chung et al. [6]. Other variants of the Majority problem were studied, e.g.,in [1, 11].

A similar problem is the Plurality problem, which was introduced by Aigner, De Marco,and Montangero [2]. In the Plurality problem, Paul seeks a ball such that the number ofballs with the same color exceeds the number of balls of any other color (or he finds out thatthere is a tie between two or more different colors). Aigner, De Marco, and Montangero [2]found a deterministic strategy to solve the Plurality problem with n balls of three colorssuch that Paul asks at most � 5n

3 � − 2 questions. On the other hand, Carol can force Paul toask at least 3� n

2� − 2 questions. In addition, if the number k of colors of the balls containedin the input is not fixed to be three, Aigner, De Marco, and Montangero [3] provide lowerand upper bounds of order �(kn). In the non-adaptive setting, Chung et al. [6] obtain lowerand upper bounds, of order �(n2). A problem similar to the Plurality problem was alsostudied by Srivastava [15].

In addition to the Plurality problem, we also study Partition problem in which Paulwants to partition the balls according to their colors. We focus on the case when the ballsare colored with three colors and present optimal probabilistic strategies for the Pluralityproblem and the Partition problem. In the probabilistic setting, Paul may flip coins anduse the outcome of the flips to choose his questions. The quality of such a strategy ismeasured as the maximum of the expected number of Paul’s questions, i.e., we assumethat Carol knows the strategy and chooses the worst coloring in advance. Contrary to thedeterministic strategies, Carol cannot choose the coloring on-line based on Paul’s answersto the questions.

In the case of the Partition problem, we present a strategy such that Paul asks 53 n − 8

3questions on average for the worst-case coloring of n balls with three colors and show that forany probabilistic strategy there exists a coloring such that Paul asks at least 5

3 n − 83 − o(1)

questions on average. Let us remark that in the deterministic setting, the necessary andsufficient number of questions to resolve this problem is 2n − 3 [8]. In the case of thePlurality problem, we find a strategy where the expected number of Paul’s questions for theworst-case coloring (with three colors) does not exceed 3n/2 + O(1) and provide a lowerbound of 3n/2 − O(

√n log n). In the deterministic case, the best known lower bound [2]

is 3� n2� − 2 and the best known strategy guarantees that the number of questions does not

exceed � 5n3 � − 2. In particular, it might be the case that the numbers of questions needed

in the deterministic and the probabilistic settings for resolving the Plurality problem withballs of three colors are the same.

Random Structures and Algorithms DOI 10.1002/rsa

Page 3: Probabilistic strategies for the partition and plurality problems

PARTITION AND PLURALITY PROBLEMS 65

2. NOTATION

In this section, we introduce a compact way of representing a deterministic strategy for Pauland the state of his information about the colors of the balls as the strategy proceeds.

The game of Paul and Carol can be viewed as a game on a graph whose vertices are theballs. Initially the graph is empty. At each turn, Paul chooses a pair of nonadjacent verticesand adds that edge to the graph. Carol then colors the edge red if the two vertices havethe same color or blue if the two vertices have a different color. This edge-colored graphrepresents the state of Paul’s knowledge and is referred to as Paul’s graph. In the Partitionproblem with k colors, the game ends when Paul’s graph is uniquely vertex k-colorable(up to a permutation of the colors) with the constraints that the end vertices of red edgesget the same color and the end vertices of blue edges get different colors. In the Pluralityproblem with k colors, the game ends when there is a vertex v with the property that inevery vertex k-coloring, v belongs to the largest color class.

A deterministic strategy for Paul can be represented by a rooted binary tree in whichthe left edge from each internal vertex is colored red and the right edge blue. The rootis associated with Paul’s first question. The left subtree represents Paul’s strategy for thecase when Carol answers that the colors of the balls are the same and the right one forthe case when she answers that they are different. At each node, Paul’s information canbe represented by a graph as described above. For a given coloring of the balls, there is aunique path from the root to a leaf in the tree. This path in the tree is called the computationpath.

3. YAO’S PRINCIPLE

Yao [16] proposed a technique for proving lower bounds on probabilistic algorithms that isbased on the minimax principle from game theory. Informally, to prove such a lower bound,instead of constructing a hard coloring of the balls for every probabilistic algorithm, it isenough to find a probability distribution on colorings that is hard for every deterministicalgorithm. Yao’s technique applies to our setting, too. We formulate the principle formallyusing our notation as a proposition. Since the proof follows the same line as in the originalsetting, we do not include it and refer the reader, e.g., to [13, Subsection 2.2.2] if necessary.

Proposition 1. Consider the Plurality or Partition problem with n balls of k colors.Assume that there exists a probability distribution on colorings of the balls such that theexpected number of Paul’s questions is at least K for each deterministic strategy. Then,for each probabilistic strategy, there exists a coloring I of the balls such that the expectednumber of Paul’s questions for the coloring I is at least K.

4. STRATEGY FOR THE PLURALITY PROBLEM

We are now ready to present the first of our results, an asymptotically optimal probabilisticstrategy for the Plurality problem:

Theorem 2. There is a probabilistic strategy for the Plurality problem with n balls ofthree colors such that the expected number of Paul’s questions does not exceed 3

2 n + O(1)

for any coloring of the balls.

Random Structures and Algorithms DOI 10.1002/rsa

Page 4: Probabilistic strategies for the partition and plurality problems

66 DVORÁK ET AL.

Proof. Fix a coloring of the balls and choose any subset B0 of 3n′ balls from the input,where n′ = � n

3�. Partition randomly the set B0 into n′ ordered triples (ai, bi, ci), 1 ≤ i ≤ n′.For each i, 1 ≤ i ≤ n′, Paul asks Carol whether the balls ai and bi have the same color andwhether the balls bi and ci have the same color. If Carol answers in both cases that the ballshave different colors, Paul asks, in addition, whether the colors of balls ai and ci are the same.

Based on Carol’s answers, Paul is able to classify the triples into three types:

Type A: All three balls of the triple have the same color. This is the case when Carolanswers both the initial questions positively.

Type B: Two balls of the triple have the same color, but the remaining one has a dif-ferent color. This is the case when Carol answers one of the initial questionspositively and the other one negatively or both the initial questions negativelyand the additional question positively.

Type C: All three balls have different colors. This is the case when Carol answers theinitial questions and the additional question negatively.

Paul now chooses randomly and independently a representative ball from each triple oftype A or B. Let B be the set of (at most n′) chosen balls. In addition, he chooses randomlya ball d from B0.

For each ball from the set B, Paul asks Carol whether its color is the same as the colorof d. Let B′ ⊆ B be the set of the balls whose colors are different. Paul chooses arbitrarilya ball d ′ ∈ B′ and compares ball d ′ with the remaining balls of B′. Paul is able to determinethe partition of the balls of B according to their colors: the balls of B\B′, the balls of B′ thathave the same color as ball d ′, and the balls of B′ whose color is different from the colorof d ′.

Finally, Paul determines the partition of all the balls from the triples of type A or B. Theballs contained in a triple of type A have the same color as its representative. In the case ofa triple of type B, Paul asks Carol whether the colors of balls d1 and d2 are the same, whered1 is a ball of the triple whose color is different from the color of the representative and d2

is a ball of B whose color is different from the color of the representative. In this way, Paulobtains the partition of all the balls from triples of type A or B according to their colors.

After at most 2(n − 3n′) ≤ 4 additional questions, Paul knows the partition of the ballsof B according to their colors, where B is the set of all n balls except for the balls of B0,which are contained in triples of type C. Since each triple of type C contains one ball ofeach of the three colors, the plurality color of the balls of B is also the plurality color of allthe balls. If there is no plurality color in B, then there is no plurality color in the originalproblem either.

Before we formally analyze the described strategy, we explain some ideas behind it. Letα, β, and γ be the fractions of the balls of each of the colors among the balls of B0. If theratios α, β, and γ are close to 1/3, then a lot of the balls belong to the triples of type C.Clearly, such balls can be removed from the problem and we solve the Plurality problem forthe remaining balls (this reduces the size of the problem). However, if the ratios α, β, and γ

are unbalanced, then this approach fails to work. But in this case, with high probability, theball d has the same color as a lot of the balls of B and Paul does not need to compare toomany balls of B with the ball d ′.

We are now ready to start estimating the expected number of Paul’s questions. Theexpected numbers of triples of each type are the following:

•(α3 + β3 + γ 3 + O

(1n′))

n′ triples of type A,

Random Structures and Algorithms DOI 10.1002/rsa

Page 5: Probabilistic strategies for the partition and plurality problems

PARTITION AND PLURALITY PROBLEMS 67

•(3(α2β + α2γ + β2α + β2γ + γ 2α + γ 2β

) + O(

1n′))

n′ triples of type B, and•

(6αβγ + O

(1n′))

n′ triples of type C.

The expected numbers of Paul’s questions to determine the type of the triple are 2, 73 , and 3

for types A, B, and C, respectively. Therefore, the expected number of Paul’s questions todetermine the types of all the triples is

[2(α3 + β3 + γ 3) + 7(α2β + α2γ + β2α + β2γ + γ 2α + γ 2β) + 18αβγ

]n′ + O(1).

(1)

Next, we compute the expected number of Paul’s questions to determine the partition ofthe balls of B. Fix a single ball b0 out of the 3n′ balls. Assume that the color of b0 is the colorwith the fraction α. The probability that the ball b0 is in a triple of type C is 2βγ + O( 1

n′ )and, otherwise, it is chosen to B with probability 1/3. Hence, ball b0 is contained in theset B with probability 1

3 − 23βγ + O

(1n′). If b0 ∈ B and the colors of balls b0 and d are

the same, Paul asks Carol a single question; if b0 ∈ B, the colors of b0 and d are different,and if b0 = d ′, he asks two questions. The former is the case with probability α. Hence, ifb0 ∈ B, Paul asks at most 2 − α questions on average. We may conclude that in this stage,the expected number of Paul’s questions involving balls of the color with the fraction α

does not exceed

3αn′(

1

3− 2

3βγ + O

(1

n′

))(2 − α).

Hence, the expected number of the questions to determine the partition of B is

3αn′(

1

3− 2

3βγ + O

(1

n′

))(2 − α) + 3βn′

(1

3− 2

3αγ + O

(1

n′

))(2 − β)

+ 3γ n′(

1

3− 2

3αβ + O

(1

n′

))(2 − γ )

= (2 − α2 − β2 − γ 2)n′ − 2αβγ n′(6 − α − β − γ ) + O(1)

= (2 − α2 − β2 − γ 2)n′ − 10αβγ n′ + O(1). (2)

Next, Paul asks Carol a single question for each triple of type B. The expected numberof such questions is

3(α2β + α2γ + β2α + β2γ + γ 2α + γ 2β)n′ + O(1). (3)

Finally, Paul asks at most 2(n − 3n′) ≤ 4 questions to find the partition of B.The expected number of all questions asked by Paul is given by the sum of (1), (2),

and (3), which is equal to the expression

[2 + 2(α3 + β3 + γ 3) + 10(α2β + α2γ + β2α + β2γ + γ 2α + γ 2β)

+ 8αβγ − α2 − β2 − γ 2]n′ + O(1). (4)

It can be verified, (see Proposition 3 following the proof of this theorem) that the maximumof (4) for α, β, γ ∈ [0, 1] with α + β + γ = 1 is attained for α = β = 1/2 and γ = 0

Random Structures and Algorithms DOI 10.1002/rsa

Page 6: Probabilistic strategies for the partition and plurality problems

68 DVORÁK ET AL.

(and the two other symmetric permutations of the values of the variables α, β, and γ ).Therefore, the expected number of Paul’s questions does not exceed

(2 + 2 · 2 + 10 · 2 + 8 · 0

8− 2

4

)n′ + O(1) = 9

2n′ + O(1) = 3

2n + O(1).

We now prove technical Proposition 3 that is referred in the proof of Theorem 2:

Proposition 3. The maximum, which is equal to 92 , of the function

2+2(α3 +β3 +γ 3)+10(α2β +α2γ +β2α +β2γ +γ 2α +γ 2β)+8αβγ −α2 −β2 −γ 2

with the variables α, β, γ ∈ [0, 1] and with the additional constraint α + β + γ = 1 isattained for the following combinations of values of α, β, and γ :

• α = β = 12 and γ = 0,

• α = γ = 12 and β = 0, or

• β = γ = 12 and α = 0.

Proof. First, we apply several substitutions to the function. Since α +β +γ = 1, we havethe following:

(α2β + α2γ ) + (β2α + β2γ ) + (γ 2α + γ 2β) = α2(1 − α) + β2(1 − β) + γ 2(1 − γ )

= α2 + β2 + γ 2 − α3 − β3 − γ 3. (5)

Plug the equality (5) into the examined function:

2 − 8(α3 + β3 + γ 3) + 8αβγ + 9(α2 + β2 + γ 2). (6)

Similarly, we establish the following equality using the constraint α + β + γ = 1 and (5):

1 = (α + β + γ )3

1 = α3 + β3 + γ 3 + 3(α2β + α2γ + β2α + β2γ + γ 2α + γ 2β) + 6αβγ

1 = −2(α3 + β3 + γ 3) + 3(α2 + β2 + γ 2) + 6αβγ

−6αβγ = −1 − 2(α3 + β3 + γ 3) + 3(α2 + β2 + γ 2)

8αβγ = 4

3+ 8

3(α3 + β3 + γ 3) − 4(α2 + β2 + γ 2). (7)

We combine (6) and (7) and obtain that the examined function is equal to

10

3− 16

3(α3 + β3 + γ 3) + 5(α2 + β2 + γ 2) (8)

Let S ⊂ R3 be the set of α, β, γ ∈ [0, 1] satisfying α + β + γ = 1. Clearly, the set S

is an equilateral triangle in the 3-dimensional space that is contained in the plane with thenorm vector equal to (1, 1, 1). The maximum of the function is attained on the boundary of

Random Structures and Algorithms DOI 10.1002/rsa

Page 7: Probabilistic strategies for the partition and plurality problems

PARTITION AND PLURALITY PROBLEMS 69

the set S or in an internal point of S where the gradient of the function is a multiple of thenorm vector of the support plane.

We first examine the function on the boundary of the set S. In such a case, one of thevariables α, β, and γ is equal to zero. We can assume γ = 0 by symmetry. We substituteβ = 1 − α and γ = 0 to (8):

10

3− 16

3(1 − 3α + 3α2) + 5(1 − 2α + 2α2) = 3 + 6α − 6α2. (9)

Since the first derivative of (9) is 6 − 12α and it is equal to zero only for α = 1/2, theextremes of (9) for α ∈ [0, 1] can be attained only for α = 0, α = 1/2, and α = 1. Forthese values of α, the expression (9) is equal to 3, 9/2, and 3, respectively.

We now examine the possibility that the extreme of (8) is attained at an internal pointof S. In such case, the gradient determined by partial derivatives of (8) is a multiple of thevector (1, 1, 1). The partial derivatives of (8) are the following:

∂α

(10

3− 16

3(α3 + β3 + γ 3) + 5(α2 + β2 + γ 2)

)= −16α2 + 10α,

∂β

(10

3− 16

3(α3 + β3 + γ 3) + 5(α2 + β2 + γ 2)

)= −16β2 + 10β, and

∂γ

(10

3− 16

3(α3 + β3 + γ 3) + 5(α2 + β2 + γ 2)

)= −16γ 2 + 10γ .

Since the gradient (−16α2 + 10α, −16β2 + 10β, −16γ 2 + 10γ ) is a multiple of the vector(1, 1, 1) only if all the right-hand sides are equal, it holds that α = β or α = 5

8 − β.Similarly, α = γ or α = 5

8 − γ , and β = γ or β = 58 − γ . Hence, α = β = γ = 1/3,

or two of the variables are equal to 3/8 and the remaining one to 1/4. The value of (8) forα = β = γ = 1/3 is equal to 119

27 < 92 . The value for α = β = 3/8 and γ = 1/4 is equal

to 14132 < 9

2 .We conclude that the maximum of the function from the statement of the proposition is

equal to 92 and it is attained for α = β = 1/2 and γ = 0 and the other two permutation of

the values of the variables α, β, and γ .

5. LOWER BOUND FOR THE PLURALITY PROBLEM

We first survey some basic results from information theory that we use in the proof ofLemma 6. The reader is welcome to see [7] for further details. If X is a random variable takingvalues in the finite set S, the entropy H(X) of X is

∑s∈S ps log2 1/ps where ps is the probability

that X is equal to s ∈ S. In particular, if X is a zero-one random variable such that X = 1with probability p, then the entropy of X is equal to h(p) := p log2

1p + (1 − p) log2

11−p .

We need the following elementary result on entropies of random variables taking values infinite sets:

Proposition 4. Let X1, . . . , Xk be (not necessarily independent) random variables andlet X = (X1, . . . , Xk) be the vector random variable whose entries are equal to random

Random Structures and Algorithms DOI 10.1002/rsa

Page 8: Probabilistic strategies for the partition and plurality problems

70 DVORÁK ET AL.

variables X1, . . . , Xk. The following holds:

H(X) ≤k∑

i=1

H(Xi).

In addition to the previous proposition, we state the following well-known combinatorialestimate on the middle binomial coefficient which can be found, e.g., in [12]:

Proposition 5. Let n be a positive even integer. The following bounds hold:

2n

√2n

≤(

n

n/2

)≤ 2n

√n

.

We are now ready to prove the following lemma on binary trees of bounded depths:

Lemma 6. Let n be an even positive integer, and let T be a rooted binary tree of depth atmost n − 1 with

( nn/2

)/2 leaves. Assume that for each inner node w of T, the edge that leads

to its left child is colored red and the edge that leads to its right child blue. The averagenumber of blue edges on the paths from the root to the leaves in T is at least

n

2− O(

√n log n).

Proof. We first assign each leaf w of T a string σ(w) of R’s and B’s of length n. If v0, . . . , vk

is the path in T from the root to a leaf w = vk , then the ith letter of the sequence σ(w) is Rif the edge vi−1vi is red and B otherwise. If i > k, the ith letter of σ(w) is R. Observe thatthe strings assigned to the leaves of T are mutually distinct. Moreover, the average numberµ of blue edges on the paths from the root to the leaves is equal to the average number ofletters B contained in the strings σ(w).

Choose uniformly at random a leaf w of the tree T and let Xi be the random variableequal to the ith letter of the string σ(w) for 1 ≤ i ≤ n. Let pi be the probability that Xi

is equal to B. Clearly, µ = ∑ni=1 pi. Consider the random variable X equal to the vector

(X1, . . . , Xn). For every string σ(w), X is equal to σ(w) with probability 2( n

n/2

)−1. Hence,

we have the following by Proposition 5:

H(X) = log

(n

n/2

)/2 ≥ log

2n

2√

2n. (10)

On the other hand, we can infer from Proposition 4

H(X) ≤n∑

i=1

h(pi). (11)

We combine inequalities (10) and (11), and using the fact that h(p) is a concave functionon the interval (0, 1), together with the estimate h(1/2 − x) ≤ 1 − x2 for x ∈ (−1/2, 1/2),

Random Structures and Algorithms DOI 10.1002/rsa

Page 9: Probabilistic strategies for the partition and plurality problems

PARTITION AND PLURALITY PROBLEMS 71

we obtain the required bound (recall that µ = ∑ni=1 pi):

log2n

2√

2n≤ H(X) ≤

n∑i=1

h(pi)

n − log 8n

2≤ n · h

n

)

1 − log 8n

2n≤ 1 −

(1

2− µ

n

)2

∣∣∣∣1

2− µ

n

∣∣∣∣ ≤√

log 8n

2n

1

2−

√log 8n

2n≤ µ

nn

2− O(

√n log n) ≤ µ.

The statement of the lemma now follows.

We are now ready to prove the desired lower bound:

Theorem 7. For any probabilistic strategy for the Plurality problem with n balls of threecolors, where n is an even number, there is a coloring of the balls such that Paul asks atleast 3

2 n − O(√

n log n) questions on average.

Proof. By Yao’s principle (Proposition 1), it is enough to find a probability distribution oncolorings of the balls such that if Paul uses any deterministic strategy, then he asks at least32 n −O(

√n log n) questions on average. The desired distribution is the uniform distribution

on colorings in which half of balls have the first color, the other half have the second color,and there are no balls of the third color. Clearly, there is no plurality color for any of thesecolorings. Let I be the set of all

( nn/2

)such colorings.

Fix a deterministic strategy of Paul. Let G be Paul’s final graph for one of the coloringsfrom I. We claim that each of the two subgraphs of G induced by the balls of the first or thesecond color is connected. Otherwise, let V1 and V2 be the sets of the vertices correspondingto the balls of these colors, and assume that G[V1] contains a component with a vertex setW ⊂ V1. Based on G, Paul is unable to determine whether the balls of W have the samecolor as the balls of V1\W , and thus he is unable to decide whether there is a plurality color(which would be the case if the balls of W had the third color). We conclude that Paul’sfinal graph contains at least n − 2 red edges.

Let T be the rooted binary tree corresponding to Paul’s strategy and let Vlf be the setof the leaves of T to which there is a valid computation path for a coloring of I (notethat T contains additional leaves corresponding to colorings not contained in I). Since foreach of the colorings of I the two subgraphs induced by the balls of the first two colors inPaul’s final graph are connected, each leaf of T corresponds to two such colorings (whichdiffer just by a permutation of the first two colors). Therefore, Vlf consists of

( nn/2

)/2 leaves

of T .We now modify T to a tree T ′ and eventually to a tree T ′′. The tree T ′ is a subtree of T

formed by the union of the paths from the root to the leaves contained in Vlf . In particular,T ′ has exactly

( nn/2

)/2 leaves. Color the edges of T ′ red and blue according to whether they

Random Structures and Algorithms DOI 10.1002/rsa

Page 10: Probabilistic strategies for the partition and plurality problems

72 DVORÁK ET AL.

join an inner vertex with its left or right child. For each inner vertex w with a single childin T ′, contract the edge leading from w to its only child. Let T ′′ be the obtained binary treewith

( nn/2

)/2 leaves. The colors of the edges that have not been contracted are preserved.

The tree T ′′ corresponds to a deterministic strategy for distinguishing the colorings fromthe set I. At each inner node w of T ′′, the edge corresponding to Paul’s question at node wjoins two different components of Paul’s graph. If this were not the case, the answer wouldbe uniquely determined by the graph. Consequently, node w would have a single child (thereare no colorings of I consistent with the other answer) and the edge leading from w to itschild should have been contracted.

Since all Paul’s questions correspond to edges between different components, Paul’s finalgraph (for his strategy determined by T ′′) is a forest for each coloring of I. In particular,Paul’s final graph contains at most n − 1 edges. Therefore, the depth of T ′′ does not exceedn − 1. By Lemma 6, the average number of blue edges on the path from the root to a leafof T ′′ is at least n

2 − O(√

n log n). Since the number of blue edges on such a path is equal tothe number of blue edges in Paul’s final graph G′′ if Paul follows the strategy determinedby T ′′, the average number of blue edges in Paul’s graphs is at least n

2 − O(√

n log n).Observe that for each coloring of I, the edges of the computation path in T ′′ form a

subset of the edges of the computation path in T . Therefore, the average number of blueedges in Paul’s final graphs with respect to the strategy corresponding to T is also at leastn2 −O(

√n log n). Since each final graph contains also (at least) n −2 red edges, the average

number of Paul’s questions, which is equal to the average number of edges in the final graph,is at least 3

2 n − O(√

n log n).

Let us make a few comments on the possibility of extending Theorem 7 to odd n’s. If nis odd, it seems natural to consider the probability distribution in which one ball is assignedthe third color and the remaining n − 1 balls are divided equally between the first and thesecond colors. In the proof of Theorem 7, the absence balls of the third color allowed us toconclude that G[V1] must be connected since otherwise the algorithm could not infer thatall the balls in V1 have the same color. However, for the proposed distribution for odd n,there are balls of all three colors, and it may be possible to infer that all of the balls of V1

have the same color although G[V1] is disconnected (indeed, even when G[V1] is empty).Hence, we cannot establish that the final graph G contains at least n − 2 red edges. We donot know how to circumvent this obstacle and show that the bound of 3

2 n − O(√

n log n)

holds for odd n’s (although we strongly believe that it is so).

6. STRATEGY FOR THE PARTITION PROBLEM

We first state the following auxiliary lemma:

Lemma 8. Consider a random ordering of n balls such that ξn balls are white and(1 − ξ)n are black. The expected length of the initial segment composed entirely of whiteballs in the random ordering is at least

min

{9,

ξ

1 − ξ− O

(log n√

n

)}.

Proof. Because of the O-term, it is enough to prove the lemma for sufficiently large n,say n ≥ 1000. We distinguish several cases according to whether ξ is close to one, to zero,

Random Structures and Algorithms DOI 10.1002/rsa

Page 11: Probabilistic strategies for the partition and plurality problems

PARTITION AND PLURALITY PROBLEMS 73

or to neither of them. Assume first that ξ > 99100 . The probability that the initial segment of

length 10 is composed entirely of white balls is the following:

ξn

n· ξn − 1

n − 1· · · ξn − 9

n − 9>

(ξn − 9

n − 9

)10

>

(98

99

)10

>9

10.

Hence, the expected length of the initial segment composed entirely of white balls is at least9

10 · 10 = 9.In the rest, we assume that ξ ≤ 99

100 . Let m = �√n log n�. If ξ < 2mn , the expected length

of the initial segment of white balls is at least ξ (the probability that the first ball is white).Since the difference between ξ

1−ξand the lower bound ξ is small,

ξ

1 − ξ− ξ = ξ 2

1 − ξ≤ O(ξ 2) = O

(log2 n

n

)≤ O

(log n√

n

),

the expected length of the initial segment of white balls is at least ξ

1−ξ−O

(log n√

n

)as claimed

in the statement of the lemma.Therefore, we assume that 2m

n ≤ ξ ≤ 99100 in what follows. The expected length of the

initial segment composed of white balls is at least

m∑i=1

P(the first i balls are white) =m∑

i=1

i−1∏j=0

ξn − j

n − j≥

m∑i=1

(ξn − m

n − m

)i

≥m∑

i=1

(ξn − m

n

)i

≥m∑

i=1

(ξ − m

n

)i =(ξ − m

n

) − (ξ − m

n

)m+1

1 − (ξ − m

n

)

≥ ξ

1 − (ξ − m

n

) − O(m

n

)= ξ

1 − ξ− O

(m

n

)= ξ

1 − ξ− O

(log n√

n

).

We can now describe Paul’s strategy:

Theorem 9. There is a probabilistic strategy for the Partition problem with n balls of threecolors such that the expected number of Paul’s questions does not exceed 5

3 n − 83 + O

( log n√n

)for any coloring of the balls.

Proof. Fix a coloring of the n balls. Let α, β, and γ be fractions of the balls of each of thethree colors in the coloring. Paul first chooses a random ordering of the balls. His strategyis divided into n steps, and in the ith step Paul determines the color of the ith ball (in therandom ordering).

If the first i − 1 balls have the same color, then Paul just compares the ith ball with anyof the first i − 1 balls. If the first i − 1 balls have two distinct colors, then Paul randomlychooses one of the two colors and compares the ith ball with a representative of this color.If Carol answers that the balls have different colors, then he compares the ith ball with arepresentative of the other color. Finally, if the first i − 1 balls have three distinct colors,then he randomly chooses one of the colors and compares the ith ball with a representativeof this color. If Carol answers that the balls have different colors, Paul chooses one of thetwo remaining colors and compares the ith ball with its representative.

Random Structures and Algorithms DOI 10.1002/rsa

Page 12: Probabilistic strategies for the partition and plurality problems

74 DVORÁK ET AL.

If the input does not contain balls of all three colors, then the expected number of Paul’squestions in each step is at most 3/2. Consequently, the expected number of all the questionsasked by Paul does not exceed 3

2 n, and the statement of the lemma readily follows. Therefore,we assume in the rest that there is a ball of each of the three colors, i.e., α, β, γ > 0.

Fix an ordering of the balls. Let j be the largest integer such that the first j balls have thesame color and j′ be the largest integer such that the first j′ balls have at most two differentcolors. We compute the expected number of Paul’s questions for the fixed ordering. In thefirst j + 1 steps (except for the first step), Paul always asks a single question. At each ofthe next j′ − j − 1 steps, Paul asks 3/2 questions on average. In the ( j′ + 1)th step, Paulasks two questions. At each of the n − j′ − 1 remaining steps, Paul asks 5/3 questions onaverage. Hence, the expected number of Paul’s questions for the fixed ordering is

j + 3

2( j′ − j − 1) + 2 + 5

3(n − j′ − 1) = 5

3n − 1

2j − 1

6j′ − 7

6.

The expected number of the questions (averaged through all the orderings) is

5

3n − 1

2j − 1

6j′ − 7

6, (12)

where j and j′ are the expected lengths of the initial segments in the random orderingcomposed of balls of one color and two colors, respectively.

Let jα , jβ , and jγ be the expected lengths of initial segments in the ordering formedentirely by balls of the color with the fractions α, β, and γ , respectively. Similarly, jαβ ,jαγ , and jβγ are the expected lengths of initial segments formed by balls of two indexedcolors. Clearly, the following holds:

j = jα + jβ + jγj′ = jαβ + jαγ + jβγ − jα − jβ − jγ .

If one of the numbers jα , jβ , jγ , jαβ , jαγ , and jβγ is at least 9, then the sum j + j′ is at least 9,and the expected number of Paul’s questions is at most 5

3 n − 96 − 7

6 = 53 n − 8

3 . If none ofthe numbers exceed 9, we have by Lemma 8

j = α

1 − α+ β

1 − β+ γ

1 − γ+ O

(log n√

n

)and

j + j′ = α + β

1 − α − β+ α + γ

1 − α − γ+ β + γ

1 − β − γ+ O

(log n√

n

).

Since the function ξ

1−ξis convex and α + β + γ = 1, the minimum of both the expressions

on the right-hand side in the above equations is attained for α = β = γ = 1/3. Therefore,we have the following:

j ≥ 3 · 1/3

1 − 1/3− O

(log n√

n

)= 3

2− O

(log n√

n

)(13)

j + j′ ≥ 3 · 2/3

1 − 2/3− O

(log n√

n

)= 6 − O

(log n√

n

). (14)

Random Structures and Algorithms DOI 10.1002/rsa

Page 13: Probabilistic strategies for the partition and plurality problems

PARTITION AND PLURALITY PROBLEMS 75

Let us plug estimates (13) and (14) into expression (12) for the expected number of Paul’squestions:

5

3n − 1

3j − 1

6

(j + j′

) − 7

6≤ 5

3n − 1

2− 1 − 7

6+ O

(log n√

n

)= 5

3n − 8

3+ O

(log n√

n

).

As the first step toward our lower bound, we state the following lemma on the averagedepths of leaves in binary trees with unbalanced inner vertices:

Lemma 10. Let T be a rooted binary tree with N leaves. Assume that on each path fromthe root of T to a leaf, at least inner nodes have the following property (): the number ofthe leaves in the left subtree is precisely half of the number of the leaves in the right subtree.The average length of the path from the root to a leaf of T is at least the following:

log N

log 2+

(5

3− log 3

log 2

).

Proof. The proof proceeds by induction on N . If N = 1, the tree consists just of its root, must be equal to zero, and the average length of the path from the root to a leaf is also zero.Assume that N > 1. In particular, the root has two children. Let N1 and N2 be the numbersof leaves in each of the two subtrees. Clearly, N = N1 + N2.

If the root does not have the property (), then there are at least inner nodes with theproperty () on every path from the root to a leaf in each of the two subtrees. Using theinduction, we infer that the average length of the path from the root to each of the leaves inthe tree T is at least the following:(

1 + log N1log 2 +

(53 − log 3

log 2

))

N1 +(

1 + log N2log 2 +

(53 − log 3

log 2

))

N2

N

= 1 + N1 log N1 + N2 log N2

N log 2+

(5

3− log 3

log 2

)

≥ 1 + log N2

log 2+

(5

3− log 3

log 2

) = log N

log 2+

(5

3− log 3

log 2

).

On the other hand, if the root has the property (), then N1 = N/3, N2 = 2N/3, andthere are at least − 1 inner nodes with the property () on every path from the root to aleaf in each of the two subtrees. We again use the induction and derive the following lowerbound on the average length of the paths:(

1 + log N1log 2 +

(53 − log 3

log 2

)( − 1)

)N1 +

(1 + log N2

log 2 +(

53 − log 3

log 2

)( − 1)

)N2

N

= 1 +N3 log N

3 + 2N3 log 2N

3

N log 2+

(5

3− log 3

log 2

)( − 1)

= 1 + log N − log 3 + 23 log 2

log 2+

(5

3− log 3

log 2

)( − 1)

= log N

log 2+ 5

3− log 3

log 2+

(5

3− log 3

log 2

)( − 1) = log N

log 2+

(5

3− log 3

log 2

).

Random Structures and Algorithms DOI 10.1002/rsa

Page 14: Probabilistic strategies for the partition and plurality problems

76 DVORÁK ET AL.

We are now able to show that the number of Paul’s questions in Theorem 9 is optimal:

Theorem 11. For any probabilistic strategy for the Partition problem with n balls ofthree colors, there is a coloring of the balls that forces Paul to ask at least 5

3 n − 83 questions

on average.

Proof. By Yao’s principle (Proposition 1), it is enough to find a probability distributionon colorings such that if Paul uses any deterministic strategy, then he asks at least 5

3 n − 83

questions on average. We claim that the uniform distribution on all 3n possible coloringshas this property.

Fix a deterministic strategy and let T be the corresponding binary tree. Since Paul isable to solve the Partition problem using this strategy, the computation paths may end inthe same leaf for at most six different colorings (they can differ only by a permutation ofthe colors). Hence, T has at least N = 3n/6 leaves.

Consider an inner node w of T at which Paul’s question corresponds to an edge betweentwo different components of Paul’s graph. Let I be the set of colorings whose computationpath reaches the node w. Since the edge corresponding to the question joins two differentcomponents, for exactly one third of the input colorings Carol answers that the balls have thesame colors, and for exactly two thirds she answers that their colors are different: for eachcoloring X ∈ I, the set I contains all five other colorings obtained from X by permutationof the colors in one of the components. We conclude that if Paul’s question at the node wcorresponds to an edge between two different components of his graph, then the node whas the property () from the statement of Lemma 10.

Observe now that Paul’s final graph is connected: otherwise, we could permute the colorsof the balls in one of the components while keeping the colors of the remaining balls thesame, which would yield a different partition consistent with Paul’s final graph, and Paulwould be unable to uniquely determine the partition of the balls. Since Paul’s final graph isconnected, on each path from the root to a leaf, there are at least n − 1 nodes in which Paulasked a question that corresponds to an edge between two different components. Therefore,on each such path, at least n − 1 nodes have the property () from Lemma 10.

By Lemma 10, the average length of the path from the root to a leaf of T , which is equalto the expected number of Paul’s questions, is at least

log 3n

6

log 2+

(5

3− log 3

log 2

)(n − 1) = log 3

log 2n − log 3

log 2− 1 +

(5

3− log 3

log 2

)(n − 1)

= 5

3n − 1 − 5

3= 5

3n − 8

3.

The arguments of this section can be generalized to inputs with balls of more colors.However, the obtained lower and upper bounds do not match: if Paul uses a probabilisticstrategy similar to that in Proposition 9, he asks k2+k−2

2k n+O(1) questions on average for thePartition problem with n balls of k colors. On the other hand, the tree of any deterministicstrategy must contain at least n − 1 nodes w on any path from the root to a leaf such that thesubtree of the left child of w contains exactly the fraction of 1/k of the leaves of the wholesubtree of w. Based on this, one can establish the lower bound

(k−1

klog(k−1)

log 2 +1)n−�(k log k)

on the expected number of questions asked by Paul (in the worst case).

Random Structures and Algorithms DOI 10.1002/rsa

Page 15: Probabilistic strategies for the partition and plurality problems

PARTITION AND PLURALITY PROBLEMS 77

ACKNOWLEDGMENTS

The paper is based on the results obtained during the DIMACS/DIMATIA Research Experi-ence for Undergraduates program in 2004. The REU program is a joint project of DIMACSat Rutgers University and DIMATIA at Charles University. The authors are also grateful toJános Komlós, the director of the REU program at Rutgers, for his helpful comments onthe considered problems.

REFERENCES

[1] M. Aigner, Variants of the majority problem, Discrete Appl Math 137 (2004), 3–25.

[2] M. Aigner, G. De Marco, and M. Montangero, “The plurality problem with three colors,”STACS 2004, LNCS 2296, V. Diekert and M. Habib (Editors), Springer-Verlag, Berlin, 2004,pp. 513–521.

[3] M. Aigner, G. De Marco, and M. Montangero, The plurality problem with three colors andmore, Theor Comput Sci 337 (2005), 319–330.

[4] L. Alonso, E. M. Reingold, and R. Schott, Determining the majority, Inform Process Lett 47(1993), 253–255.

[5] L. Alonso, E. M. Reingold, and R. Schott, The average-case complexity of determining themajority, SIAM J Comput 26(1) (1997), 1–14.

[6] F. R. K. Chung, R. L. Graham, J. Mao, and A. C.-C. Yao, “Oblivious and adaptive strategiesfor the majority and plurality problems,” COCOON 2005, LNCS 3595, L. Wang (Editor),Springer-Verlag, Berlin, 2005, pp. 329–338.

[7] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.

[8] Z. Dvorák, V. Jelínek, D. Král’, J. Kyncl, and M. Saks, “Three optimal algorithms for balls ofthree colors,” STACS 2005, LNCS 3404, V. Diekert and B. Durand (Editors), Springer-Verlag,Berlin, 2005, pp. 206–217.

[9] M. Fisher and S. Salzberg, Finding a majority among n votes, J Algor 3 (1982), 375–379.

[10] D. Král’, J. Sgall, and T. Tichý, Randomized strategies for the plurality problem, Discrete ApplMath, to appear.

[11] G. De Marco and A. Pelc, “Randomized algorithms for determining the majority on graphs,”MFCS 2003, LNCS 2747, B. Rovan and P. Vojtás (Editors), Springer-Verlag, Berlin, 2003,pp. 368–377.

[12] J. Matoušek and J. Nešetril, Invitation to Discrete Mathematics, Oxford University Press, 1998.

[13] R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995.

[14] M. Saks and M. Werman, On computing majority by comparisons, Combinatorica 11 (1991),383–387.

[15] N. Srivastava, Tight bounds on plurality using equality tests, unpublished manuscript, 2003.

[16] A. C.-C. Yao, “Probabilistic computations: Towards a unified measure of complexity,” Pro-ceedings of the 17th Annual Symposium on Foundations of Computer Science (FOCS), 1977,pp. 222–227.

Random Structures and Algorithms DOI 10.1002/rsa