Economics and Search
Hal Varian, SIGIR, August 16, 1999
http://www.sims.berkeley.edu/~hal


Three points of contact

1. Value of information
2. Estimating degree of relevance
3. Optimal search behavior


1. Value of information

Economic value of information
  • More information helps us make better decisions
  • Economic value of information = value of best decision with the information - value of best decision without the information
  • Increase in expected utility due to the better decision, or decrease in expected cost
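A toy illustration of the definition, assuming perfect information (the decision-maker learns the true state before acting). The states, actions, prior and payoffs are hypothetical; only the "with minus without" definition comes from the slide.

```python
# Hypothetical prior over states of the world.
prior = {"rain": 0.3, "dry": 0.7}

# Hypothetical payoff of each action in each state.
payoff = {
    "umbrella": {"rain": 5.0, "dry": 3.0},
    "no_umbrella": {"rain": 0.0, "dry": 6.0},
}


def expected_payoff(action):
    """Expected payoff of committing to one action before the state is known."""
    return sum(prior[s] * payoff[action][s] for s in prior)


# Best decision without the information: one action for all states.
best_without = max(expected_payoff(a) for a in payoff)

# Best decision with perfect information: the best action for each state.
best_with = sum(prior[s] * max(payoff[a][s] for a in payoff) for s in prior)

value_of_information = best_with - best_without
print(f"without: {best_without:.2f}  with: {best_with:.2f}  VOI: {value_of_information:.2f}")
```

Because the informed chooser could always ignore what it learns and take the uninformed best action, `best_with` can never fall below `best_without`; that is the non-negativity property of information value.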


Properties

Information has non-negative private value (because it can be ignored)

Information is valuable only when it is “new” -- when it changes a decision

Example: financial information is quickly incorporated into stock prices
  • subsequent “news” may not move prices
  • “buy on the rumor, sell on the news”


Relevance to search?

Information is valuable when it is “new”

“Relevance” captures only part of information value since a document may be relevant but not “new”

Example: repeated occurrences of the same document; many similar documents


How to handle?

Post-retrieval clustering: an often-proposed strategy
  • for disambiguation
  • for organization

Possible additional motivation
  • maximize the “information content” in each new document cluster
  • may allow for more effective search
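The slide does not specify a clustering method. As one hypothetical illustration, the sketch below makes a single pass over a ranked result list, compares documents as sets of terms using Jaccard overlap (an assumption, as is the 0.5 threshold), and keeps the highest-ranked member of each cluster as its representative. Showing only representatives makes each document the user examines more likely to be “new”.

```python
def jaccard(a, b):
    """Share of terms two term sets have in common (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_results(ranked_docs, threshold=0.5):
    """Single pass over (doc_id, text) pairs in rank order.

    Each document joins the first cluster whose representative (its
    highest-ranked member) is at least `threshold`-similar; otherwise it
    starts a new cluster.
    """
    clusters = []  # each cluster is a list of (doc_id, term_set)
    for doc_id, text in ranked_docs:
        terms = set(text.lower().split())
        for cluster in clusters:
            if jaccard(terms, cluster[0][1]) >= threshold:
                cluster.append((doc_id, terms))
                break
        else:  # no existing cluster was similar enough
            clusters.append([(doc_id, terms)])
    return clusters


# Hypothetical ranked results: d2 repeats the terms of d1.
results = [
    ("d1", "jaguar speed top speed of the jaguar cat"),
    ("d2", "top speed of the jaguar cat jaguar speed"),
    ("d3", "jaguar car dealer prices"),
    ("d4", "jaguar cat habitat rainforest"),
]
clusters = cluster_results(results)
representatives = [cluster[0][0] for cluster in clusters]
print(representatives)  # ['d1', 'd3', 'd4']
```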


2. Estimating relevance

Estimate probability of relevance as function of characteristics of document and query

E.g., logistic regression à la Bill Cooper

Why logistic form?
  • Formerly a data-poor environment: had to assume a functional form
  • Now that we have a data-rich environment, can use nonparametric methods


Example with TREC data
  • 100,102 WSJ doc-query pairs for fitting
  • 173,330 WSJ doc-query pairs for extrapolation
  • One explanatory variable: x = terms in common (after stemming, etc.)

(Thanks to Aito Chen and Fred Gey for data)


Outline of estimation

Maximum likelihood (classical procedure)

Calculate frequencies of relevance as a function of terms in common, then
  • fit by logistic transformation
  • fit by nonparametric regression

Compare shapes of fitted functions
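The maximum-likelihood step can be sketched in a few lines. This is an illustration, not the procedure used for the TREC fits: it assumes an intercept a in addition to the slope b, maximizes the log-likelihood by Newton-Raphson, and runs on a tiny hypothetical dataset of (terms in common, relevant?) pairs.

```python
import math


def fit_logit(xs, ys, iterations=25):
    """Maximum-likelihood fit of p(x) = exp(a + b*x) / (1 + exp(a + b*x)).

    xs: terms in common for each doc-query pair; ys: 1 if relevant, else 0.
    Assumes the data are not perfectly separable (otherwise no MLE exists).
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        ga = gb = 0.0           # gradient of the log-likelihood
        haa = hab = hbb = 0.0   # negative Hessian (information matrix)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            ga += y - p
            gb += (y - p) * x
            haa += w
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        # Newton step: solve [[haa, hab], [hab, hbb]] @ step = gradient.
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b


# Hypothetical data: relevance becomes more common as x grows.
xs = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4
ys = [0, 0, 0, 0,  0, 0, 0, 1,  0, 0, 1, 1,  0, 1, 1, 1,  1, 1, 1, 1]
a, b = fit_logit(xs, ys)
for x in range(1, 6):
    print(x, round(1.0 / (1.0 + math.exp(-(a + b * x))), 3))
```

The fitted probabilities can then be set against the raw relevance frequencies at each x to compare the shapes.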


Frequency of relevance

Look at all document-query pairs with 1 word-in-common

See what fraction of these are relevant

Repeat for 2, 3, 4 … words in common

This generates a histogram with words in common on the horizontal axis and frequency of relevance on the vertical axis
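Tabulating the frequencies is a simple group-by. A minimal sketch, assuming each doc-query pair arrives as a hypothetical `(terms_in_common, is_relevant)` tuple; the pair counts are kept next to the fractions because bins with few pairs give noisy frequencies.

```python
from collections import defaultdict


def relevance_frequencies(pairs):
    """pairs: iterable of (terms_in_common, is_relevant) doc-query pairs.

    Returns {x: (fraction relevant, number of pairs)}, ordered by x.
    """
    counts = defaultdict(lambda: [0, 0])  # x -> [relevant count, total count]
    for x, relevant in pairs:
        counts[x][0] += int(relevant)
        counts[x][1] += 1
    return {x: (rel / total, total) for x, (rel, total) in sorted(counts.items())}


# Hypothetical judged pairs.
pairs = [(1, False), (1, False), (1, True), (2, False), (2, True), (3, True)]
for x, (f, n) in relevance_frequencies(pairs).items():
    print(f"{x} terms in common: {f:.2f} relevant ({n} pairs)")
```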


ML-fitted logit and freqs

[Plot: terms in common on the horizontal axis (10 to 70), relevance on the vertical axis (0 to 1)]


Direct estimate of logit

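Logit: $p(x) = \frac{e^{xb}}{1 + e^{xb}}$, so $\frac{p(x)}{1 - p(x)} = e^{xb}$

Regression: $\log\left[\frac{f_i}{1 - f_i}\right] = x_i b$

Note: have to censor observations with $f_i = 0$ or $1$, where the log-odds are infinite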
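The direct estimate is ordinary least squares on the empirical log-odds. This sketch adds an intercept a (an assumption; the slide writes only xb) and handles the censoring note by dropping bins whose frequency is exactly 0 or 1. Weighting bins by their pair counts would be a natural refinement, not shown here.

```python
import math


def direct_logit(freqs):
    """Least-squares fit of log(f / (1 - f)) = a + b*x.

    freqs: {x: f}, with f the observed fraction of relevant pairs at x.
    Bins with f == 0 or f == 1 are dropped (infinite log-odds).
    Needs at least two distinct x values left after dropping.
    """
    pts = [(x, math.log(f / (1.0 - f))) for x, f in freqs.items() if 0.0 < f < 1.0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical frequencies; the bins at x = 1 and x = 5 are censored.
freqs = {1: 0.0, 2: 0.25, 3: 0.5, 4: 0.75, 5: 1.0}
a, b = direct_logit(freqs)
print(round(a, 4), round(b, 4))
```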


Results

[Plot: terms in common on the horizontal axis (10 to 70), relevance on the vertical axis (0 to 1)]


Nonparametric regression

Find the monotone (non-decreasing) function that minimizes the sum of squared residuals between the observations and the fitted values

PAV = “pool adjacent violators” algorithm: doesn’t require solving the minimization problem directly
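A minimal weighted PAV sketch. It assumes the observations are relevance frequencies already sorted by terms in common, with optional weights such as the number of doc-query pairs behind each frequency (the weighting is an assumption, not from the slide). Whenever a new value breaks monotonicity it is pooled with its predecessor into their weighted mean, and pooling repeats backwards until the sequence is non-decreasing again.

```python
def pav(values, weights=None):
    """Pool Adjacent Violators.

    Returns the non-decreasing sequence g minimizing
    sum(w_i * (values_i - g_i) ** 2); values must be sorted by x.
    """
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block: [weighted mean, total weight, points pooled]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


print(pav([0.1, 0.3, 0.2, 0.5, 0.4, 0.9]))  # [0.1, 0.25, 0.25, 0.45, 0.45, 0.9]
```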


Nonparametric results

[Plot: terms in common on the horizontal axis (10 to 70), relevance on the vertical axis (0 to 1)]


Further smoothing

[Plot: terms in common on the horizontal axis (10 to 70), relevance on the vertical axis (0 to 1)]


Extrapolation to other data

[Plot: terms in common on the horizontal axis (10 to 70), relevance on the vertical axis (0 to 1)]


Further work

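Add another variable, e.g., query length / document length, or “inverse document frequency”

Look at other collections

Note: since there is only one variable and every fitted function is monotone in it, all estimators rank documents in the same order, so recall-precision is the same for all estimators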


3. Search behavior

Economic model: search for lowest price or highest wage
  • with or without “recall” (revisiting stores)
  • results do not cumulate: care only about the max
      • may or may not be natural in IR context
      • of course, can generalize to k-best choices


Example

Marty Weitzman’s “Pandora problem”: “Optimal Search for the Best Alternative”, Econometrica, May 1979
  • n boxes
  • reward in box i is random with cdf F_i(x)
  • costs c_i to open box i; time discount factor d < 1
  • payoff is the maximum value found up to the point when you stop opening


IR story

You work at an airport bookstore:
  • people are in a hurry (d < 1)
  • examining books takes mental effort (c > 0)
  • a customer will take only one book
  • you have an idea of how likely it is that the person will like each book (F_i(x))

Problem: in what order to show them books?


Analysis

State is summarized by the maximum reward so far (together with which boxes are still closed)

Question is whether to open next box

Can be solved by dynamic programming
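For a handful of boxes the dynamic program can be solved by brute force. A sketch under assumptions not on the slides: rewards are discrete `(value, probability)` lists, each opening costs c_i and discounts everything after it by d, stopping before opening anything pays 0, and the three boxes are hypothetical. The state is the best reward so far plus the set of boxes still closed; memoization keeps small cases cheap, but the state space grows exponentially in n.

```python
from functools import lru_cache

D = 0.9  # hypothetical discount factor per opening

# Hypothetical boxes: (opening cost c_i, [(reward, probability), ...])
BOXES = [
    (0.5, [(6.0, 1.0)]),               # safe: 6 for sure
    (0.5, [(10.0, 0.5), (0.0, 0.5)]),  # risky: 10 or 0
    (1.0, [(8.0, 0.3), (2.0, 0.7)]),
]


def open_value(i, best, closed):
    """Expected payoff of opening box i now, then continuing optimally."""
    cost, dist = BOXES[i]
    rest = closed - {i}
    return -cost + D * sum(p * value(max(best, x), rest) for x, p in dist)


@lru_cache(maxsize=None)
def value(best, closed):
    """Optimal expected payoff given the best reward so far and the
    frozenset of indices of boxes still closed (stopping pays `best`)."""
    return max([best] + [open_value(i, best, closed) for i in closed])


def first_move(best, closed):
    """Index of the box the optimal policy opens next, or None to stop."""
    if not closed:
        return None
    i = max(closed, key=lambda j: open_value(j, best, closed))
    return i if open_value(i, best, closed) > best else None


start = frozenset(range(len(BOXES)))
print("optimal expected payoff:", round(value(0.0, start), 4))
print("open first:", first_move(0.0, start))
```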


Nature of solution

Assign a “score” to each box
  • depends only on that box
  • can be computed “easily”

Selection rule: if you open a box, open the closed box with the highest score

Stopping rule: stop searching when the maximum sampled reward exceeds the score of every closed box
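In Weitzman's paper the score is a reservation value: in the discounted formulation assumed here, the sure amount z_i that leaves you indifferent about opening box i, defined by z_i = -c_i + d·E[max(z_i, X_i)]. The sketch below solves that equation by bisection for discrete rewards and then applies the selection and stopping rules. The box names, the 0 fallback when nothing has been found, and the `reveal` callback that supplies realized rewards are hypothetical.

```python
def score(cost, dist, d):
    """Reservation value z solving z = -cost + d * E[max(z, X)].

    dist: list of (reward, probability). For d < 1 the gap below is
    strictly decreasing in z, so bisection finds the unique root.
    """
    def gap(z):
        return -cost + d * sum(p * max(z, x) for x, p in dist) - z

    lo, hi = -1.0, 1.0
    while gap(lo) <= 0:  # widen the bracket until gap(lo) > 0
        lo *= 2
    while gap(hi) >= 0:  # ... and gap(hi) < 0
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def pandora_search(boxes, d, reveal):
    """Open boxes in decreasing score order; stop once the best reward
    found is at least the score of every box still closed.

    boxes: {name: (cost, dist)}; reveal(name) returns the realized reward.
    """
    scores = {name: score(c, dist, d) for name, (c, dist) in boxes.items()}
    best, opened = 0.0, []  # assumption: stopping empty-handed pays 0
    for name in sorted(scores, key=scores.get, reverse=True):
        if best >= scores[name]:  # highest remaining score is too low: stop
            break
        opened.append(name)
        best = max(best, reveal(name))
    return opened, best, scores


boxes = {"S": (1.0, [(6.0, 1.0)]), "R": (1.0, [(10.0, 0.5), (0.0, 0.5)])}
opened, best, scores = pandora_search(boxes, 0.9, {"R": 0.0, "S": 6.0}.get)
print(scores)        # R scores higher than S despite its lower mean
print(opened, best)  # ['R', 'S'] 6.0
```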


Riskiness and search order

Score is not expected value
  • “Other things being equal, it is optimal to sample first from distributions that are more spread out or riskier in hopes of striking it rich early and ending the search.”
  • “Low-probability, high-payoff situations should be prime candidates for early investigation…”
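For a two-point box the score can be worked out by hand. A sketch, assuming Weitzman's reservation-value definition of the score (the sure amount z that leaves you indifferent between stopping with z and paying c, waiting one period, and keeping the better of z and the box's reward X), and a hypothetical box paying m + s or m - s with equal probability; the closed form holds provided the solution falls in the interval (m - s, m + s).

```latex
% Score z of a box with reward X, opening cost c, discount factor d
z = -c + d\,\mathbb{E}\left[\max(z, X)\right]

% Two-point box: X = m + s or m - s, each with probability 1/2.
% For m - s < z < m + s:
z = -c + d\left(\frac{z}{2} + \frac{m + s}{2}\right)
\;\Longrightarrow\;
z = \frac{d\,(m + s) - 2c}{2 - d}

% Box R (m = s = 5) against box S (6 for sure, so z_S = 6d - c):
z_R > z_S
\iff 10d - 2c > (6d - c)(2 - d)
\iff d\,(6d - c - 2) > 0
\iff 6d - c > 2
```

Holding the mean m fixed, z rises with the spread s (by d/(2 - d) per unit of s), which is the precise sense in which the riskier box earns the higher score. The last lines recover the condition 6d - c > 2 obtained for the simple example.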


Simple example

Box S: gives 6 for sure
Box R: equally likely to give 10 or 0

Note: expected value of S > expected value of R


Open box S first

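Have 6 for sure; should you continue?
  • 1/2 of the time get 10: payoff 10d - c
  • 1/2 of the time get 0 and keep the 6 already found: payoff 6d - c
  • Expected payoff from continuing is 8d - c; when 6d - c < 2 this is below 4 (8d - c = (6d - c) + 2d), so less than 6

Conclusion: if 6d - c < 2 and you open box S first, you get a payoff of 6 and will not continue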


Open box R first

  • 1/2 of the time get 10: can’t do any better, so stop
  • 1/2 of the time get 0: continue if 6d - c > 0   (1)

Expected payoff = 5 + 3d - c/2 (when (1) holds)

Opening R first is the best strategy if 5 + 3d - c/2 > 6, or 6d - c > 2 [if this is true, (1) is true]
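The summary rule can be checked by brute force over a grid of discount factors and search costs. A sketch under the slides' conventions: the first box is opened free and undiscounted, each later opening costs c and discounts by d, and, since the payoff is the maximum found, continuing after S keeps the 6 in hand (d·E[max(6, R)] - c). The grid itself is an arbitrary choice.

```python
def value_S_first(d, c):
    """Holding 6: stop, or open R and keep max(6, R), which is 10 or 6."""
    return max(6.0, d * (0.5 * 10.0 + 0.5 * 6.0) - c)


def value_R_first(d, c):
    """R pays 10 (stop; nothing beats it) or 0 (open S if 6d - c > 0)."""
    return 0.5 * 10.0 + 0.5 * max(0.0, 6.0 * d - c)


mismatches = []
for d in [i / 100 for i in range(50, 100)]:
    for c in [j / 10 for j in range(0, 51)]:
        margin = 6 * d - c - 2
        if abs(margin) < 1e-9:
            continue  # exact tie: both orders give the same payoff
        brute_force_says_R = value_R_first(d, c) > value_S_first(d, c)
        if brute_force_says_R != (margin > 0):
            mismatches.append((d, c))
print("mismatches:", mismatches)  # mismatches: []
```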


Summary

If 6d - 2 < c, open S first and stop
If 6d - 2 > c, open R first
  • if you get 10, stop
  • if you get 0, open S

Small search cost c and little impatience (d close to 1) imply: open the risky box first


Airport bookstore

Customer runs in says “I want a travel guide to Borneo.”

S = Fodors, R = Lonely Planet. Which do you show first?
  • If only time for one book, show Fodors
  • If time for two books, show Lonely Planet

Why: may be able to stop search early and get higher payoff


Risk and search

Don’t necessarily want to order search by expected payoff

Want some high-variance choices early to reduce search costs/time

Generalization: want to sample from high-variance populations (if they have similar means)

Result depends on time value, search cost, and utility being the maximum of the choices


Estimation of value?

From a Bayesian perspective, forecast relevance (or value) is a random variable, as in the regressions described earlier

Can apply a Weitzman-type rule to determine the optimal order

Is it worth the effort? Depends on how good our estimates of value, the discount factor, and the search cost are...


Summary

Information has economic value since it helps make better decisions

Nonlinear estimation (which requires lots of data) may be useful in prediction

Risk and search cost are important factors for determining optimal search order and stopping rule