3
9/16/2014 www.feynmanlectures.info/solutions/restaurant_problem_sol_1.html http://www.feynmanlectures.info/solutions/restaurant_problem_sol_1.html 1/3 Feynman's restaurant problem Richard Feynman and Ralph Leighton had a hard time at restaurants deciding whether to order the best dish they had tried so far or something new, which - who knows - might be even better. So, naturally, they decided to formalize this as a mathematical problem: Assume that a restaurant has N dishes on its menu that are rated from worst to best, 1 to N (according to your personal preferences). You, however, don't know the ratings of the dishes, and when you try a new dish, all you learn is whether it is the best (highest rated) dish you have tried so far, or not. Each time you eat a meal at the restaurant you either order a new dish or you order the best dish you have tried so far. Your goal is to maximize the average total ratings of the dishes you eat in M meals (where M is less than or equal to N). The average total ratings in a sequence of meals that includes n "new" dishes and b "best so far" dishes can be no higher than the average total ratings in the sequence having all n "new" dishes followed by all b "best so far" dishes. Thus a successful strategy requires you to order some number of new dishes and thereafter only order the best dish so far. The problem then reduces to the following: Given N (dishes on the menu) and M <= N (meals to be eaten at the restaurant), how many new dishes D should you try before switching to ordering the best of them for all the remaining (M–D) meals, in order to maximize the average total ratings of the dishes consumed? Solution by Michael A. Gottlieb (based on a letter to Chris Siegert) If you randomly select an integer from the range [1...N], with all integers in that range being equiprobable, the expected value of the integer is just the average value of the range, (N+1)/2. Similarly if you select D (unique) integers from the range, the expected value of each one is (N+1)/2. [So, for example if D=1 (you try only one new dish) your average score for that day is (N+1)/2, while if D=N (ie you try ALL the dishes on the menu) then your average score for those N days is N* (N+1)/2 = 1+2+...+N, which, of course, is your score on every N-day trial, because you have tried each dish once.] Thus your score for the first D days averages D*(N+1)/2. Your score for the remaining M-D days, when you switch to eating the best dish you have tried in the first D days, is (M-D)*E[N,D], where E[N,D] is the expected (average) maximum integer from amongst D (unique) integers selected from the range [1...N]. Your total average score is equal to, D*(N+1)/2 + (M-D)*E[N,D]. To solve the problem you take the derivative of this expression with respect to D, set it to zero, and solve for D. That gives the answer posted with the problem on The Feynman Lectures website Thus the problem really reduces to finding an explicit expression for E[N,D]. It is not too hard to guess what E[N,D] has to be, without making many computations. We know that E[N,k] has to be strictly increasing with k (because the average highest will always be higher when you select more). We know that E[N,1] = (N+1)/2, because if you only sample one dish, it's most likely average. We also know that E[N,N] = N. That is to say, if you sample all the dishes, N is always the highest rating, so that is also the average highest rating. Now consider E[N,N-1], the average largest integer selected when you select all but

Document

Embed Size (px)

Citation preview

Page 1: Document

9/16/2014 www.feynmanlectures.info/solutions/restaurant_problem_sol_1.html

http://www.feynmanlectures.info/solutions/restaurant_problem_sol_1.html 1/3

Feynman's restaurant problemRichard Feynman and Ralph Leighton had a hard time at restaurants deciding whether to order thebest dish they had tried so far or something new, which - who knows - might be even better. So,naturally, they decided to formalize this as a mathematical problem:

Assume that a restaurant has N dishes on its menu that are rated from worst to best, 1 to N (accordingto your personal preferences). You, however, don't know the ratings of the dishes, and when you try anew dish, all you learn is whether it is the best (highest rated) dish you have tried so far, or not. Eachtime you eat a meal at the restaurant you either order a new dish or you order the best dish you havetried so far. Your goal is to maximize the average total ratings of the dishes you eat in M meals(where M is less than or equal to N).

The average total ratings in a sequence of meals that includes n "new" dishes and b "best so far"dishes can be no higher than the average total ratings in the sequence having all n "new" dishesfollowed by all b "best so far" dishes. Thus a successful strategy requires you to order some number ofnew dishes and thereafter only order the best dish so far. The problem then reduces to the following: Given N (dishes on the menu) and M <= N (meals to be eaten at the restaurant), how many new dishesD should you try before switching to ordering the best of them for all the remaining (M–D) meals, inorder to maximize the average total ratings of the dishes consumed?

Solution by Michael A. Gottlieb (based on a letter to Chris Siegert)

If you randomly select an integer from the range [1...N], with all integers in that range beingequiprobable, the expected value of the integer is just the average value of the range, (N+1)/2.Similarly if you select D (unique) integers from the range, the expected value of each one is (N+1)/2.[So, for example if D=1 (you try only one new dish) your average score for that day is (N+1)/2, whileif D=N (ie you try ALL the dishes on the menu) then your average score for those N days is N*(N+1)/2 = 1+2+...+N, which, of course, is your score on every N-day trial, because you have triedeach dish once.] Thus your score for the first D days averages D*(N+1)/2. Your score for theremaining M-D days, when you switch to eating the best dish you have tried in the first D days, is(M-D)*E[N,D], where E[N,D] is the expected (average) maximum integer from amongst D (unique)integers selected from the range [1...N]. Your total average score is equal to,

D*(N+1)/2 + (M-D)*E[N,D].

To solve the problem you take the derivative of this expression with respect to D, set it to zero, andsolve for D. That gives the answer posted with the problem on The Feynman Lectures website Thusthe problem really reduces to finding an explicit expression for E[N,D].

It is not too hard to guess what E[N,D] has to be, without making many computations. We know thatE[N,k] has to be strictly increasing with k (because the average highest will always be higher whenyou select more). We know that

E[N,1] = (N+1)/2,

because if you only sample one dish, it's most likely average. We also know that

E[N,N] = N.

That is to say, if you sample all the dishes, N is always the highest rating, so that is also the averagehighest rating. Now consider E[N,N-1], the average largest integer selected when you select all but

Page 2: Document

9/16/2014 www.feynmanlectures.info/solutions/restaurant_problem_sol_1.html

http://www.feynmanlectures.info/solutions/restaurant_problem_sol_1.html 2/3

one integer in the range [1...N]. That is easy to compute because for all cases where the unselectednumber is not N, the largest integer selected is N, there are N-1 such cases, and in the one remainingcase, the largest integer selected is N-1, so the average is ((N-1)*N + (N-1))/N = ((N-1)/N) * (N+1).So,

E[N,N-1] = ((N-1)/N) * (N+1).

In a similar manner you can easily deduce that

E[N,N-2] = ((N-2)/(N-1)) * (N+1).

It is also not very difficult to compute E[N,2], since there's 1 possible pair where 2 is maximum, 2where 3 is maximum, 3 where 4 is maximum, etc., and all together there are N*(N-1)/2 possible pairs,so

E[N,2] = (Sum m=2 to N of (m-1)*m) / (N*(N-1)/2)

= ((N^3- N)/3) / (N*(N-1)/2)

= (2/3)(N+1).

From all these clues you might guess that

E[N,k] = (k/(k+1)) * (N+1),

and you would be right!

But here is a more formal derivation of E[N,k] :

Define P[N,k,m] = probability that the largest of k (unique) integers selected in the range [1...N] equals m. Then

E[N,k] = Sum m=1 to N of m*P[N,k,m] .

P[N,k,m] equals the number of ways you can select k unique integers from the range [1...N] with mbeing the maximum, divided by the total number of ways you can select k unique integers from therange [1...N]. If you have k unique integers with m being the maximum, then the remaining k-1integers must be selected from the range [1...m-1], so

P[N,k,m] = Binomial[m-1,k-1]/Binomial[N,k]

with m>=k. P[N,k,m] = 0 if m<k, since amongst k unique integers selected from [1...N], the smallestpossible maximum is k. Thus,

E[N,k] = Sum m=k to N of m*P[N,k,m]

We have

P[N,k,m] = ((m-1)!/((m-k)!(k-1)!)) / (N!/((N-k)!k!)),

and

E[N,k] = Sum m=k to N of m*((m-1)!/((m-k)!(k-1)!)) / (N!/((N-k)!k!))

Page 3: Document

9/16/2014 www.feynmanlectures.info/solutions/restaurant_problem_sol_1.html

http://www.feynmanlectures.info/solutions/restaurant_problem_sol_1.html 3/3

= Sum m=k to N of (m!/((m-k)!(k-1)!)) / (N!/((N-k)!k!)). = ((N-k)!k!)/(N!(k-1)!) * Sum m=k to N of m!/(m-k)!.

Now,

Sum m=k to N of m!/(m-k)! = k! * Sum m=k to N of m!/(k!(m-k))

= k! * Sum m=k to N Binomial[m,k] = k! * Binomial[N+1,k+1].

= k! (N+1)!/((N-k)!(k+1)!)

So,

E[N,k] = ((N-k)!k!)/(N!(k-1)!) * k! (N+1)!/((N-k)!(k+1)!)

= (k/(k+1)) * (N+1).

Finally, returning to the first formula above, we find for the expected average rating

D*(N+1)/2 + (M-D)*E[N,D]

= D*(N+1)/2 + (M-D)*(D/(D+1))*(N+1).

Taking the derivative with respect to D and setting it to 0,

(D^2 + 2D - (2M+1))*(N+1)/2(D+1)^2 = 0,

and noting that (N+1)/2(D+1)^2 > 0, we end up with the quadratic equation,

D^2 + 2D - (2M+1) = 0,

having positive solution,

D = Sqrt[2(M+1)] - 1.