Task-aware query recommendation Henry Feild James Allan Center for Intelligent Information Retrieval University of Massachusetts Amherst July 29, 2013

1

Task-aware query recommendation

Henry Feild James Allan

Center for Intelligent Information RetrievalUniversity of Massachusetts Amherst

July 29, 2013Dublin, Ireland

2

Delaware Solar Energy CoalitionThe Delaware Solar Energy Coalition (DSEC) was founded…http://delsec.org

Deaf Smith Electric Cooperative – HomeThe Deaf Smith Electric Cooperative (DSEC).http://dsec.org

when was the dsec created

Related searches: dsec infodsecdelaware solar energy coalition createddeaf smith electric cooperative createddayton service engineers collaborative created

DuPont ChallengeThe DuPont Challenge calls on students to rsearch, think critically…http://thechallenge.dupont.com

DuPontFor more than 200 years, DuPont has brought world-class science and engineering…http://www.dupont.com

Related searches: dupont challenge informationdupont essay submissiondupont challenge historydeaf smith electric cooperative createddayton service engineers collaborative created

what is the dupont science essay contestwhen was the dsec created

3

object management groupandrew watson

watson omg

Disambiguation

elliptical trainerelliptical trainer benefits

Find key terms/phrases

crisis intervention professional servicescrisis intervention services

Potential benefits of context

michworksmichigan unemployed

Term expansion

4

Likelihood of x-length tasks

57% of AOL tasks consist of 2 or more queries

Based on a sample of 503 AOL users; tasks were manually labeled by UMass students.

5

A couple of issuesInterleaved taskspampered chef

elliptical trainerhosting pampered chef

elliptical trainer benefits

Interleaved17%

Not interleaved83%

Interleaved22%

Not interleaved78%

3 days of Yahoo! data

3 months of AOL data

Jones & Klinkner (CIKM 2009)

Aggregated over 503 AOL users

Multi-tasking within a fixed

windowYou are more likely than not to encounter off-task queries even going back one query!

6

Ideal system

pampered chefelliptical trainer

hosting pampered chefelliptical trainer benefits



pampered chef

hosting pampered chefelliptical trainer

elliptical trainer benefits

Identify on-task queries

Ignore off-task queries

7

Research questions





Does on-task context help?1

How much does off-task context hurt?2

How well does current technology deal with mixed contexts?

3

Basic models



Reference query only

0.6 the benefits of an elliptical trainer0.5 image 8.0 elliptical trainer0.4 total trainer pro0.4 total trainer pro reviews

Term-Query GraphQuery Recommender(Bonchi et al. SIGIR’12)

9

0.7 pampered chef merchandise0.4 pampered chef parties0.3 cooking supplies0.3 pampered chef stuff

0.6 orbitrek elliptical trainer0.4 image 8.0 elliptical trainer0.4 total trainer pro0.2 total trainer pro reviews

0.6 pampered chef parties0.3 pampered chef parties0.2 cooking supplies0.2 pampered chef stuff

0.6 the benefits of an elliptical trainer0.5 image 8.0 elliptical trainer0.4 total trainer pro0.4 elliptical trainer vs treadmill

Basic models



Decay model





λ: 1.0λ: 0.8

λ: 0.6

λ: 0.4 0.2 elliptical trainer vs treadmill0.2 the benefits of an elliptical trainer0.1 total trainer pro0.1 orbitrek elliptical trainer

10

0.7 pampered chef merchandise0.4 pampered chef parties0.3 cooking supplies0.3 pampered chef stuff

0.6 orbitrek elliptical trainer0.4 image 8.0 elliptical trainer0.4 total trainer pro0.2 total trainer pro reviews

0.6 pampered chef parties0.3 pampered chef parties0.2 cooking supplies0.2 pampered chef stuff


Basic models



Decay model





λ: 1.0λ: 0.8

λ: 0.6

λ: 0.4 0.2 elliptical trainer vs treadmill0.2 the benefits of an elliptical trainer0.1 total trainer pro0.1 orbitrek elliptical trainer

Model Uses context?


Decay ✔

11

Task-aware models



Hard task threshold model

pampered chef


elliptical trainer

λ: 1.0

λ: 0.8



Soft task threshold model

1.00.02

0.900.01

λ: 1.0λ: 0.1

λ: 0.5λ: 0.1

weight = task_decay (decay over on-task queries only)

weight = decay × same_task_score

Same task?(Lucchese et al. WSDM’11)

0.010.90

0.021.0

0.010.90

0.021.0

0.0

1.00.0

12

Task-aware models



Hard task threshold model

pampered chef


elliptical trainer

λ: 1.0

λ: 0.8



Soft task threshold model

1.00.02

0.900.01

λ: 1.0λ: 0.1

λ: 0.5λ: 0.1

weight = task_decay (decay over on-task queries only)

weight = decay × same_task_score

Same task?(Lucchese et al. WSDM’11)

0.010.90

0.021.0

0.010.90

0.021.0

0.0

1.00.0

Model Uses context?

Uses task info?

Hard threshold?

Task decay?

Uses same-task score as weight?


Decay ✔

Hard task threshold ✔ ✔ ✔ ✔

Soft task threshold ✔ ✔ ✔

14

Task-aware models (cont’d)

Firm task threshold model 2

pampered chef


elliptical trainer

1.00.02

0.900.01

1.0

0.7

weight = decay × same_task_score if on-task; 0 otherwise(soft task score for on-task queries; 0 for off-task queries)

weight = task_decay × same_task_score



Firm task threshold model 1

1.00.02

0.900.01

1.00.0

0.50.0

0.0

0.0

0.0

0.0Model Uses

context?Uses task

info?Hard

threshold?Task

decay?Uses same-task score as weight?


Decay ✔

Hard task threshold ✔ ✔ ✔ ✔

Soft task threshold ✔ ✔ ✔

Firm task threshold 1 ✔ ✔ ✔ ✔

Firm task threshold 2 ✔ ✔ ✔ ✔ ✔

Data and evaluationQuery recommendations: 2006 AOL search log

Tasks: 2010—2011 TREC Session Track

used to bulid a Term-Query Graph

On-task contexts- 212 judged sessions- 2+ queries/session- task = session

Off-task contexts

Mixed contexts

Evaluation- When’s the soonest a user can find a novel document?- MRR over documents retrieved with the top scoring recommendation (ClueWeb ‘09)- removed documents retrieved in top 10 of context queries

15


ClueWebelliptical-trainer.comsportsequipment.comtotaltrainer.com…

16

Effect of on-/off-task context

How far back in the user’s history we look, including the reference query

Yes, on average

Does on-task context help?1

Off-task context hurts, on average

How much does off-task context hurt?2

on-task context

reference-query only

off-task context

17

Effect of noise (mixed contexts)Firm 1 & 2 and Hard Task Threshold models are robust to noise

Decay model can’t handle even 1 noisy query

Soft task threshold model is easily distracted by noise

Lucchese et al. (WSDM 2011)

How well does current technology deal with mixed contexts? — Pretty well

3

decay

soft-task

reference-query only

Firm1-, Firm2-, and hard-task

18

Variance across task set

Some really high highs

Some really low lows

Several moderate lows

One tiny little gainMRR

Diff

eren

ce

Tasks

Tasks

19

Conclusions

• on-task context can be very helpful– it can also hurt

• off-task context is bad• current state-of-the-art task segmentation

works well

20

Future work

• do these results hold for other query recommendation algorithms?

• are our findings consistent across additional tasks?

• can we predict when context will help?

21

Thanks!

22

23

Example

Context:5. (1.00) satellite internet providers4. (0.44) hughes internet3. (0.02) ocd beckham2. (0.04) reviews xc901. (0.04) buy volvo semi trucks

same-task score

Reference-query only recommendations: 0.08 alabama satellite internet providers 0.25 sattelite internet 1.00 satellite internet providers 0.02 satelite internet for 30.00 0.06 satellite internet providers northern california

Decay recommendations: 0.00 2006 volvo xc90 reviews 0.00 volvo xc90 reviews 1.00 hughes internet.com 1.00 hughes satelite internet 0.08 alabama satellite internet providers

Hard task recommendations: 1.00 hughes internet.com 1.00 hughes satelite internet 0.08 alabama satellite internet providers 0.17 satellite high speed internet 0.25 sattelite internet

MRR

MRR

MRR

Data and evaluationQuery recommendations: 2006 AOL search log

Tasks: 2010—2011 TREC Session Track

used to bulid a Term-Query Graph

On-task contexts- 212 judged sessions- 2+ queries/session- task = session

Off-task contexts- 50 × 212 sessions- all queries but the last are replaced by queries from other tasks- 50 samples per TREC session

Mixed contexts- 10 × 50 × 212 sessions- added up to 10 off-task queries per session- 50 samples per TREC session and noise level

Evaluation- MRR over documents retrieved with the top scoring recommendation (ClueWeb ‘09)- removed documents retrieved in top 10 of context queries

24

Documents

Task-aware query recommendation Henry Feild James Allan Center for Intelligent Information Retrieval University of Massachusetts Amherst July 29, 2013