
Page 1

Risk-sensitive Information Retrieval

Kevyn Collins-Thompson
Associate Professor, University of Michigan

FIRE Invited talk, Friday Dec. 6, 2013

Page 2

2

We tend to remember that 1 failure, rather than the previous 200 successes

Page 3

Current retrieval algorithms work well on average across queries…

[Histogram: number of queries vs. percent MAP gain for query expansion (current state-of-the-art method). Left of zero: queries hurt (Model ≤ Baseline); right of zero: queries helped (Model > Baseline). Mean Average Precision gain: +30%.]

Page 4

…but they are high risk: there is a significant expectation of failure, due to high variance across individual queries.

[Same histogram as the previous slide: number of queries vs. percent MAP gain for query expansion; queries hurt (Model ≤ Baseline) vs. queries helped (Model > Baseline).]

This is one of the reasons that even state-of-the-art algorithms are impractical for many real-world scenarios.

Failure = Your algorithm makes results worse than if it had not been applied.

Page 5

We want more robust IR algorithms with two objectives:
1. Maximize average effectiveness
2. Minimize risk of significant failures

[Two histograms of number of queries vs. percent MAP gain: the current state-of-the-art expansion method vs. a robust version. Both have average gain +30%; annotations mark queries hurt vs. queries helped.]

5

Page 6

Defining risk and reward in IR

1. Reward = effectiveness measure (NDCG, ERR, MAP, …)

2. Define failure for a single query
   – Typically relative to a baseline
   – e.g. 25% loss in MAP
   – e.g. query hurt (ΔMAP < 0)

3. Risk = aggregate failure across queries (see the sketch below)
   – e.g. P(> 25% MAP loss)
   – e.g. average NDCG loss > 10%
   – e.g. # of queries hurt
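To make these definitions concrete, here is a minimal sketch (not from the talk) of how such risk statistics could be computed from per-query scores; the function name risk_report and the input dictionaries are hypothetical:

import statistics

# Minimal sketch: per-query gains/losses of a new model vs. a baseline.
# `baseline` and `model` map query id -> effectiveness (e.g. MAP, NDCG, ERR).
def risk_report(baseline, model, loss_threshold=0.25):
    deltas = {q: model[q] - baseline[q] for q in baseline}
    hurt = [q for q, d in deltas.items() if d < 0]
    # Relative loss: "25% loss in MAP" means model[q] < 0.75 * baseline[q]
    big_losses = [q for q in hurt
                  if baseline[q] > 0 and (baseline[q] - model[q]) / baseline[q] > loss_threshold]
    n = len(deltas)
    return {
        "avg_gain": statistics.mean(deltas.values()),   # reward: average effectiveness gain
        "num_queries_hurt": len(hurt),                  # risk: number of queries hurt
        "p_big_loss": len(big_losses) / n,              # risk: P(loss > threshold)
    }

# Example with hypothetical numbers:
# risk_report({"q1": 0.40, "q2": 0.30}, {"q1": 0.50, "q2": 0.20})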

6

[Histogram of number of queries vs. percent MAP gain, as on the previous slides.]

Page 7

Some examples of risky IR operations

• Query rewriting and expansion
  – Spelling correction, common word variants, synonyms and related words, acronym normalization, …
  – Baseline: the unmodified query
• Personalized search
  – Trying to disambiguate queries, given unknown user intent
  – Personalized, groupized, and contextual re-ranking
  – Baseline: the original, non-adjusted ranking; or the ranking from the previous version of the ranking algorithm
• Resource allocation
  – Choice of index tiering, collection selection

7

Page 8


Example: Gain/loss distribution of topic-based personalization across queries

[Sontag et al. WSDM 2012]

[Chart: "Reliability of Personalization Models" — proportion of queries vs. change in rank position of last satisfied click, relative to the Bing production ranking.]

Page 9


Another example: Gain/loss distribution of location-based personalization across queries

[Bennett et al., SIGIR 2011]

[Histogram of number of queries per percent-gain bin, from [-100,-90) to [90,100). P(Loss > 20%) = 8% when ranking is affected.]

Page 10

The three key points of this talk

1. Many key IR operations are risky to apply.
2. This risk can often be reduced by better algorithm design.
3. Evaluation should include risk analysis.
   – Look at the nature of the gain and loss distribution, not just averages.

10

Page 11

This risk-reward tradeoff occurs again and again in search… but is often ignored

• A search engine provider must choose between two personalization algorithms:
  – Algorithm A has expected NDCG gain = +2.5 points
    • But P(Loss > 20%) = 60%
  – Algorithm B has NDCG gain = +2.1 points
    • But P(Loss > 20%) = 10%
• Which one will be deployed?

Page 12

Algorithm deployment typically driven by focus on average NDCG/ERR/MAP/… gain

• Little or no consideration of downside risk.
• Benefits of reducing risk:
  – User perception: failures are memorable
  – Desire to avoid churn: predictability, stability
  – Increased statistical power of experiments

• Goal: Understand, optimize, and control risk-reward tradeoffs in search algorithms

Page 13

Motivating questions

• How can effectiveness and robustness be jointly optimized for key IR tasks?

• What tradeoffs are possible?
• What are effective definitions of “risk” for different IR tasks?
• When and how can search engines effectively “hedge” their bets for uncertain choices?
• How can we improve our valuation models for more complex needs, multiple queries, or sessions?

Page 14

14

Scenario 1: Query expansion
[Collins-Thompson, NIPS 2008; CIKM 2009]

Page 15

Example: Ignoring aspect balance increases algorithm risk

court 0.026, appeals 0.018, federal 0.012, employees 0.010, case 0.010, education 0.009, school 0.008, union 0.007, seniority 0.007, salary 0.006

Hypothetical query: ‘merit pay law for teachers’

The legal aspect is modeled… BUT the education & pay aspects are thrown away.

Page 16

A better approach is to optimize the selection of terms as a set

court 0.026, appeals 0.018, federal 0.012, employees 0.010, case 0.010, education 0.009, school 0.008, union 0.007, seniority 0.007, salary 0.006

Hypothetical query: ‘merit pay law for teachers’

More balanced query model

16

Empirical evidence: Udupa, Bhole and Bhattacharya. ICTIR 2009

Page 17

Using financial optimization based on portfolio theory to mitigate risk in query expansion [Collins-Thompson, NIPS 2008]

• Reward:
  – Baseline provides an initial weight vector c
  – Prefer words with higher c_i values: R(x) = cᵀx
• Risk:
  – Model uncertainty in c using a covariance matrix Σ
  – Model uncertainty in Σ using the regularized Σγ = Σ + γD
  – Diagonal: captures individual term variance (term centrality)
  – Off-diagonal: term covariance (co-occurrence / term association)

• Combined objective:

• Markowitz-type model

17

U(x) = R(x) − (κ/2)·V(x) = cᵀx − (κ/2)·xᵀΣγx,  where κ is the risk-aversion weight
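As a concrete illustration (not code from the talk), a minimal numeric sketch of this objective; the term weights, covariance, regularizer, and parameter values below are invented for illustration:

import numpy as np

# Sketch of the Markowitz-type objective above.
# c: baseline expansion weights; Sigma: term-association covariance.
c = np.array([0.026, 0.018, 0.012])           # hypothetical weights (court, appeals, federal)
Sigma = np.array([[0.04, 0.02, 0.01],
                  [0.02, 0.05, 0.00],
                  [0.01, 0.00, 0.03]])
gamma_reg, kappa = 0.5, 2.0                   # regularization and risk-aversion weights (assumed)
D = np.diag(np.diag(Sigma))                   # assumed diagonal regularizer
Sigma_reg = Sigma + gamma_reg * D             # Sigma_gamma = Sigma + gamma * D

def utility(x):
    """U(x) = c^T x - (kappa/2) x^T Sigma_gamma x : reward minus risk."""
    return c @ x - 0.5 * kappa * x @ Sigma_reg @ x

print(utility(np.array([1.0, 0.5, 0.5])))     # compare candidate term-weight vectors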

Page 18

Black-box approach works with any expansion algorithm via post-process optimizer

[Collins-Thompson, NIPS 2008]

18

[Diagram: Query → baseline expansion algorithm → convex optimizer → robust query model. The optimizer also takes top-ranked documents (or another source of term associations), constraints on word sets, and a word graph (Σ) with individual term risk on the diagonal and conditional term risk off the diagonal. We don’t assume the baseline is good or reliable!]

Page 19

Controlling the risk of using query expansion terms

Constrained convex program:

minimize_x   xᵀΣx − κ·cᵀx + budget term      [risk & reward; budget]
subject to
   Ax …                          [aspect balance]
   gᵀx ≥ …                        [aspect coverage]
   l_i ≤ x_i ≤ u_i, w_i ∈ Q        [query term support]
   wᵀx ≤ y                        [budget / sparsity]
   0 ≤ x ≤ 1
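To show the shape of such a program, here is a hedged sketch in cvxpy; the specific constraint forms, thresholds, and random data are assumptions for illustration, not the exact QMOD formulation:

import cvxpy as cp
import numpy as np

n = 5
c = np.abs(np.random.rand(n))                 # baseline term weights (reward vector)
Sigma = np.random.rand(n, n); Sigma = Sigma @ Sigma.T   # term-association covariance (PSD)
A = np.random.rand(2, n)                      # rows = query aspects, entries = term-aspect relatedness
kappa = 2.0                                   # risk-aversion weight (assumed)

x = cp.Variable(n)
objective = cp.Minimize(0.5 * kappa * cp.quad_form(x, Sigma) - c @ x)   # risk & reward
constraints = [
    x >= 0, x <= 1,                           # term support bounds
    A @ x >= 0.2,                             # every aspect covered (aspect coverage)
    cp.abs(A[0] @ x - A[1] @ x) <= 0.1,       # aspects weighted evenly (aspect balance)
    cp.sum(x) <= 3,                           # budget / sparsity
]
prob = cp.Problem(objective, constraints)
prob.solve()
print(x.value)   # robust expansion weights; if infeasible, fall back to the original query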

[Diagram: how the REXP algorithm's constraints behave — aspect balance, term centrality, and aspect coverage, each illustrated with a bad vs. good configuration (e.g. variable vs. centered, low vs. high).]

Page 20

Example solution output

Query: parkinson’s disease

Baseline expansion: parkinson 0.996, disease 0.848, syndrome 0.495, disorders 0.492, parkinsons 0.491, patient 0.483, brain 0.360, patients 0.313, treatment 0.289, diseases 0.153, alzheimers 0.114, ...and 90 more...

Post-processed robust version: parkinson 0.9900, disease 0.9900, syndrome 0.2077, parkinsons 0.1350, patients 0.0918, brain 0.0256 (all other terms zero)

Page 21

Evaluating Risk-Reward Tradeoffs: Introducing Risk-Reward Curves

21

[Risk-reward curve diagram: average effectiveness (over baseline) vs. risk (probability of failure). A robust algorithm gives higher effectiveness for any given level of risk.]

Given a baseline Mb, can we improve average effectiveness over Mb without hurting too many queries?

[Gain/loss histograms (number of queries vs. percent MAP gain) for a gain-only model and a risk-averse model.]

Page 22

Risk-reward curves as a function of algorithm risk-aversion parameter

Page 23

Risk-Reward Tradeoff Curves

[Risk-reward tradeoff curves: percent MAP gain vs. R-Loss (risk increase) for Algorithm A and Algorithm B. Algorithm A dominates Algorithm B with a consistently superior tradeoff.]

23

Curves UP and to the LEFT are better
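As an illustration (not from the talk), a minimal sketch of tracing such a curve by sweeping a risk-aversion parameter; run_expansion is a hypothetical function returning per-query percent MAP gains, and "risk" is simplified here to the summed loss over hurt queries rather than the exact R-Loss definition:

import matplotlib.pyplot as plt

def risk_reward_point(gains):
    reward = sum(gains) / len(gains)              # average percent MAP gain
    risk = sum(-g for g in gains if g < 0)        # aggregate loss on hurt queries (stand-in for R-Loss)
    return risk, reward

def trace_curve(queries, run_expansion, gammas=(0.0, 0.5, 1.0, 2.0, 5.0)):
    points = [risk_reward_point(run_expansion(queries, g)) for g in gammas]
    xs, ys = zip(*points)
    plt.plot(xs, ys, marker="o")
    plt.xlabel("Risk (aggregate loss)")
    plt.ylabel("Percent MAP gain")
    plt.show()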

Page 24

Risk-aversion parameter in query expansion: weight given to original vs expansion query

[Risk-reward plots (percent MAP gain vs. R-Loss) as the risk-aversion parameter varies, on six collections — trec7a, trec8a, trec12, Robust2004, gov2, and wt10g — with the baseline marked in each panel.]

24

Page 25

Robust version dominates baseline version (MAP)

[Risk-reward plots (percent MAP gain vs. R-Loss) comparing QMOD against the baseline expansion on trec7a, trec8a, trec12, Robust2004, gov2, and wt10g.]

25

Page 26

Robust version significantly reduces the worst expansion failures

[Gain/loss histograms (number of queries vs. percent MAP gain) for QMOD and the baseline expansion on trec12, trec7a, trec8a, wt10g, robust2004, and gov2.]

26

Page 27

Robust version significantly reduces the worst expansion failures

[The same gain/loss histograms, overlaying QMOD and the baseline for each of the six collections.]

27

Page 28

Aspect constraints are well-calibrated to actual expansion benefit

• About 15% of queries have infeasible programs (constraints can’t be satisfied)

• Infeasible → No expansion

28

[Plot: log-odds of reverting to the original query vs. percent MAP gain using baseline expansion, binned from [-100%, -75%) up to >= 100%.]

Page 29

29

Scenario 2: Risk-sensitive objectives in learning to rank

[Wang, Bennett, Collins-Thompson SIGIR 2012]

Page 30

What Causes Risk in Ranking?

30

Significant differences exist between queries

- Click entropies, clarity, length

- Transactional, informational, navigational

Many ways to rank / re-rank

- What features to use?

- What learning algorithm to use?

- How much personalization?

“Risk”: one intuitive definition is the probability that this is the wrong technique for a particular query (i.e., it hurts performance relative to the baseline)

Page 31

Framing the Learning Problem

31

[Diagram: the learning-to-rank pipeline — training data, model class, and objective feed learning, which produces a ranking model used for ranked top-K retrieval over documents for a query. Challenges: a ranking model that is low-risk and effective relative to a baseline model, an optimization objective that captures risk & reward, and a learning procedure that optimally balances risk & reward.]

Page 32

A Combined Risk-Reward Optimization Objective

32

Queries hurt Queries helped

Reward: average positive gain (over all queries)
    Reward = (1/N) · Σ_{Q ∈ T_Q} max[0, M_m(Q) − M_b(Q)]

Risk: average negative gain (over all queries)
    Risk = (1/N) · Σ_{Q ∈ T_Q} max[0, M_b(Q) − M_m(Q)]

where M_m(Q) is the new model's effectiveness on query Q, M_b(Q) is the baseline's, T_Q is the query set, and N = # queries.

Objective: T(α) = Reward − (1+α)·Risk
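A minimal sketch (not the paper's code) of this objective, assuming per-query effectiveness scores for the baseline and the new model:

# `baseline` and `model` map query id -> effectiveness (e.g. NDCG or MAP).
def tradeoff(baseline, model, alpha):
    n = len(baseline)
    reward = sum(max(0.0, model[q] - baseline[q]) for q in baseline) / n
    risk   = sum(max(0.0, baseline[q] - model[q]) for q in baseline) / n
    return reward - (1 + alpha) * risk

# alpha = 0 recovers the standard average-gain objective; larger alpha
# penalizes losses against the baseline more heavily.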

Page 33

33

A General Family of Risk-Sensitive Objectives

Objective: T(α) = Reward – (1+α) Risk

Gives a family of tradeoff objectives that captures a spectrum of risk/reward tradeoffs

Some special cases:
– α = 0: standard average performance optimization (high reward, high risk)
– α very large: low risk, low reward

Robustness of the model increases with larger α.
The optimal value of α can be chosen based on the application.
Any effectiveness measure can be substituted in.

Page 34

Integrating Risk-Sensitive Objective into LambdaMART

• Extension of LambdaMART (MART + LambdaRank)

• Each tree models gradient of tradeoff wrt doc scores

34

[Diagram: a boosted ensemble of trees, each modeling the gradient of the tradeoff with respect to document scores. For documents i and j sorted by score, the pairwise λ_ij multiplies the derivative of the cross-entropy loss by the change in the tradeoff due to swapping i and j.]
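A hedged sketch of that pairwise gradient in the standard LambdaRank form, where the usual |Δmetric| factor is replaced by the change in T(α); delta_tradeoff_ij stands in for a hypothetical helper that recomputes the tradeoff with documents i and j swapped:

import math

def lambda_ij(score_i, score_j, delta_tradeoff_ij, sigma=1.0):
    # Derivative of the pairwise cross-entropy (RankNet-style) loss ...
    rho = 1.0 / (1.0 + math.exp(sigma * (score_i - score_j)))
    # ... weighted by the change in the risk-sensitive tradeoff from swapping i and j.
    return -sigma * rho * abs(delta_tradeoff_ij)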

[Gain/loss histogram (number of queries vs. percent MAP gain): queries hurt vs. queries helped, with annotations “heavily penalize” and “heavily promote”.]

Page 35

Experiment Setup

• Task: personalization
  – Dataset: location (Bennett et al., 2011)
  – Selective per-query strategy: minimum location entropy
    • Low location entropy predicts likely local intent
  – Baseline: re-ranking model learned on all personalization features

35

Page 36

Risk-sensitive re-ranking for location personalization (α = 0, no risk-aversion)

[Histogram of number of queries per percent-gain bin, from [-100,-90) to [90,100), for alpha = 0.]

Page 37

Risk-sensitive re-ranking for location personalization (α = 1, mild risk-aversion)

[Histograms as above for alpha = 0 and alpha = 1.]

Page 38

Risk-sensitive re-ranking for location personalization (α = 5, medium risk-aversion)

[Histograms as above for alpha = 0, 1, and 5.]

Page 39


[Histograms as above for alpha = 0, 1, 5, and 10.]

Risk-sensitive re-ranking for location personalization (α = 10, highly risk-averse)

P(Loss > 20%) → 0 while maintaining significant gains

Page 40

41

Gain vs Risk

Page 41

TREC Web Track 2013: Promoting research on risk-sensitive retrieval

• New collection: ClueWeb12
• New task: risk-sensitive retrieval
• New topics: single + multi-faceted topics

Page 42

Participating groups

TU Delft (CWI), TU Delft (wistud), Univ. Montreal, OmarTech (Beijing), Chinese Acad. Sciences, MSR/CMU, RMIT, Technion, Univ. Delaware (Fang), Univ. Delaware (udel), Jiangsu Univ., Univ. Glasgow, Univ. Twente, Univ. Waterloo, Univ. Weimar

TREC 2013: 15 groups, 61 runs (TREC 2012: 12 groups, 48 runs)

Automatic runs: 53; manual runs: 8

Category A runs: 52; Category B runs: 9

Page 43

Topic development

• Multi-faceted vs. single-faceted topics
• Faceted type and structure were not revealed until after run submission
• Initial topic release provided the query only

201: raspberry pi
202: uss carl vinson
203: reviews of les miserable
204: rules of golf
205: average charitable donation

Page 44

Example multi-faceted topic showing informational and navigational subtopics

<topic number="235" type="faceted"><query>ham radio</query><description> How do you get a ham radio license? </description>

<subtopic number="1" type="inf">How do you get a ham radio license?</subtopic><subtopic number="2" type="nav">What are the ham radio license classes?</subtopic><subtopic number="3" type="inf">How do you build a ham radio station?</subtopic><subtopic number="4" type="inf">Find information on ham radio antennas.</subtopic><subtopic number="5" type="nav">What are the ham radio call signs?</subtopic><subtopic number="6" type="nav">Find the web site of Ham Radio Outlet.</subtopic>

</topic>

Page 45

Example single-facet topics

<topic number="227" type="single">
  <query>i will survive lyrics</query>
  <description>Find the lyrics to the song "I Will Survive".</description>
</topic>

<topic number="229" type="single">
  <query>beef stroganoff recipe</query>
  <description>Find complete (not partial) recipes for beef stroganoff.</description>
</topic>

Page 46

Track instructions

• Via GitHub, participants were provided:
  – Baseline runs (ClueWeb09 and ClueWeb12)
  – Risk-sensitive versions of standard evaluation tools
    • Compute risk-sensitive versions of ERR-IA, NDCG, etc.
    • gdeval, ndeval: new alpha parameter
• Ad-hoc task
  – Submit up to 3 runs, each with top 10k results, etc.
• Risk-sensitive task
  – Submit up to 3 runs: alpha = 1, 5, 10
  – Could perform new retrieval, not just re-ranking
  – Participants asked to self-identify the alpha level for each run

Page 47

Ad-hoc run rank (ERR@10)


Page 48

Visualization of ad-hoc runs: ERR@10 vs. nDCG@10

Page 49

Baseline run for risk evaluation

• Goals:
  – Good ad-hoc effectiveness (ERR and NDCG)
  – Standard, easily reproducible algorithm
• Approach:
  – Selected based on ClueWeb09 performance
  – RM3 pseudo-relevance feedback from the Indri retrieval engine
  – For each query:
    • 10 feedback documents, 20 feedback terms
    • Linear interpolation weight of 0.60 with the original query
  – Waterloo spam classifier filtered out all documents with percentile score < 70

Page 50

Ad-hoc run performance (ERR@10) by topic

[Per-topic ERR@10 charts for topics 201-225 and topics 226-250; baseline shown in red.]

Page 51

Two systems with strong average performance but different per-query variability profiles

[Per-topic performance profiles, topics 201-225: Technion run clustmrfaf vs. Glasgow run uogTrAIwLmb.]

Page 52

Two systems with strong average performance but different per-query variability profiles

[Per-topic performance profiles, topics 226-250: Technion run clustmrfaf vs. Glasgow run uogTrAIwLmb.]

Page 53

Risk-sensitive evaluation measures

Losses are weighted (1 + α) times as heavily as successes.
When α = 0 the system can ignore the baseline; when α is large the system will try to avoid large losses w.r.t. the baseline.
The ad-hoc task corresponds to the α = 0 case.
The measure aggregates separately over the set of queries that gain over the baseline and the set of queries that lose relative to the baseline.

Page 54

Risk-sensitive results summary (ordered by alpha = 1)

Page 55

Relative ad-hoc vs. risk-sensitive ERR@20 (alpha = 1)

Ad-hoc vs risk-averse ERR@10

Page 56

Relative ad-hoc vs. risk-sensitive ERR@20 (alpha = 5)

Ad-hoc vs risk-averse ERR@10

Page 57

Relative ad-hoc vs. risk-sensitive ERR@20 (alpha = 10)

Ad-hoc vs risk-averse ERR@10

Page 58

Change in relative ranking of the top 10 systems as risk-aversion (alpha) increases (ERR@10)

Page 59

Did runs self-identified as risk-sensitive do better under the corresponding risk-sensitive measure?

[Chart: risk-sensitive ERR@10 delta from each team's own ad-hoc run, as a function of alpha, for diro_web_13, ICTNET, MSR_Redmond, udel, UJS, uogTr, webis, and wistud; inset shows each team's ad-hoc ERR@10.]

Page 60

Conclusions from TREC 2013 Risk-sensitive Task

• Evidence of some success in building robust systems that avoid baseline failures

• Less evidence of systems that are good at making explicit risk-reward tradeoffs

• Error (failure) profiles are still very different across systems, suggesting room for further improvements:
  – Query performance/failure prediction
  – Robust ranking objectives
  – Combining or selecting from multiple systems

Page 61

Research directions in risk-aware retrieval

• Measuring user-perceived impact of risky systems
  – Some limited user studies, for recommender systems
  – No large-scale studies of Web search
• Whole-page relevance as investment
  – Objective: diversify across different user-intent hypotheses…
    • While also enforcing consistency constraints
  – When and how to modify the UI based on task/intent?
• Federated search
  – Handle a growing number of diverse information resources
  – Integrate latency and cost with retrieval risk

Page 62

The three key points of this talk

1. Many key IR operations are risky to apply.
   • e.g. query expansion, personalized ranking
2. This risk can often be reduced by better algorithm design and feature choices.
   • Convex optimization, confidence-oriented features
3. Evaluation should include risk analysis.
   – Robustness gain/loss histograms
   – Risk-reward curves
   – Risk-averse effectiveness measures

63

Consider participating in TREC Web Track 2014!

Page 63

64

Thanks! Questions?

• Now admitting new PhD students to my lab for Fall 2014

• Application deadline: December 15, 2013

Contact Kevyn Collins-Thompson [email protected]