Using Web Snippets and Query-Logs to Measure Implicit Temporal Intents in Queries

Ricardo Campos (1, 2, 4), Alípio Jorge (3, 4), Gaël Dias (2)

1 Tomar Polytechnic Institute, Tomar, Portugal

2 Centre of Human Language Technology and Bioinformatics, University of Beira Interior, Covilhã, Portugal

3 Faculty of Sciences, University of Oporto, Oporto, Portugal

4 LIAAD-INESC Porto L.A., Oporto, Portugal

QRU 2011 – 2nd International Query Representation and Understanding Workshop in association with SIGIR 2011, Beijing, China, July 28, 2011

[www.ipt.pt] [www.liaad.up.pt] [hultig.di.ubi.pt]

[www.linkedin.com/in/camposricardo] [www.ccc.ipt.pt/~ricardo]

INTRODUCTION

MOTIVATIONS

Query: Lady Gaga. Official Website

Official web site

This is a particularly hard task that can become even more difficult if the user is not clear about the purpose of the query.

Query: Lady Gaga. Informative and Rumor texts

Informative texts:

Rihanna passes Gaga as Facebook's most popular lady…

Rumor texts:

Lady Gaga, queen of extravagant fashion, is planning to intern for ... the milliner confirmed the rumors that the 'Born This Way' singer and he were ...

Query: Lady Gaga. Biography and Discography

Biography; Discography release

Query: Lady Gaga. Tour Dates

Understanding the temporal nature of a query, particularly an implicit one, is one of the most interesting challenges in Temporal Information Retrieval (T-IR) (Berberich et al., 2010). Meeting it would make it possible to apply specific strategies that improve the retrieval of web search results.

DIFFICULTIES

Dealing with Implicit Temporal Queries is Difficult

However, this may prove to be a particularly difficult task:

1. Different semantic concepts can be related to a query;

2. It is difficult to define the boundary between what is temporal and what is not, and therefore to define temporal ambiguity;

3. Even if temporal intents can be inferred by human annotators, the question is how to transfer this to an automatic process.

OBJECTIVES

Understand the Temporal Nature of Implicit Temporal Queries

In our work we aim to understand whether temporal information can be used to automatically disambiguate queries, in particular implicit temporal queries.

DIFFERENT APPROACHES IN THE EXTRACTION OF T-I

Metadata-Based Approach

Usually, temporal information is extracted with a metadata-based approach over time-tagged controlled collections, such as news articles, using the timestamp of the document.

This information is particularly useful for resolving relative temporal expressions found in a document (e.g., today) to a concrete date (e.g., the document creation time):

Jun 16, 2009 – The city of São Paulo shall have to make use of the Credicard Hall as the venue for the 2011 Miss Universe. Today was also announced that Miss Morumbi show is going to be on July 27, 2009. From Miss Universe.Com

However, it can be a tricky process when used to date implicit temporal queries, as the timestamp of the document can differ significantly from the time its content actually refers to.
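The metadata-based idea described on this slide, anchoring a relative expression such as "today" to the document timestamp, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `RELATIVE_OFFSETS` and `resolve_relative_dates` are hypothetical, and the three-word vocabulary is a deliberately tiny stand-in for a real temporal tagger.

```python
import re
from datetime import date, timedelta

# Assumption: a tiny, hand-picked set of relative expressions and their day offsets.
RELATIVE_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}
RELATIVE_PATTERN = re.compile(
    r"\b(" + "|".join(RELATIVE_OFFSETS) + r")\b", re.IGNORECASE
)


def resolve_relative_dates(text: str, document_date: date) -> list[tuple[str, date]]:
    """Anchor each relative expression found in `text` to the document timestamp."""
    resolved = []
    for match in RELATIVE_PATTERN.finditer(text):
        expression = match.group(1).lower()
        offset = timedelta(days=RELATIVE_OFFSETS[expression])
        resolved.append((expression, document_date + offset))
    return resolved


# Sentence taken from the snippet quoted on the slide, time-stamped Jun 16, 2009.
snippet = "Today was also announced that Miss Morumbi show is going to be on July 27, 2009."
print(resolve_relative_dates(snippet, date(2009, 6, 16)))
```

Here "Today" resolves to the document date, June 16, 2009. The explicit dates in the same snippet (July 27, 2009, and the 2011 event) are left untouched, which is exactly why the timestamp alone is a weak signal for dating the query itself.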

Content Approach. Query-Logs. Query-Dependency

One possible solution is to look for related temporal references in complementary web resources:

Content-Related Resources, based on a web content approach: simply require the set of web search results.

Query-Log Resources, based on similar year-qualified queries: imply that some versions of the query have already been issued.

Content-Related Resources

Query-Log Resources

Conclusions

WEB SNIPPETS

TEMPORAL INFORMATION

Temporal Evidence within Web Pages

One of the most interesting approaches to dating implicit temporal queries is to exploit the temporal evidence found within web pages.

DIFFICULTIES

Correlation between the Dates and Query Concepts

However, using web documents to date queries that do not contain any temporal information can be a tricky process.

The main problem lies in the difficulty of associating the year found in the document with the query.

TEMPORAL VALUE

Measures

450 queries, e.g.: Oil Spill; BP Oil Spill; Waka Waka;

In this work we aim to determine the temporal value of web snippets, using the measures TSnippets(.), TTitle(.) and TUrl(.):

$$\mathrm{TSnippets}(.) = \frac{\#\ \text{Snippets Retrieved with Dates}}{\#\ \text{Snippets Retrieved}}$$
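As a rough illustration of the measures on this slide, the sketch below computes the share of titles, snippets and URLs that contain a date for one query's result list. It rests on assumptions that are not in the slides: a "date" is detected as a standalone four-digit year, the `SearchResult` container and the function names are hypothetical, and the example results are made up.

```python
import re
from dataclasses import dataclass

# Assumption: a "date" is a standalone four-digit year (1000-2999).
YEAR_PATTERN = re.compile(r"(?<!\d)[12]\d{3}(?!\d)")


@dataclass
class SearchResult:
    title: str
    snippet: str
    url: str


def temporal_value(texts: list[str]) -> float:
    """Share of texts that contain at least one year-like date."""
    if not texts:
        return 0.0
    with_dates = sum(1 for text in texts if YEAR_PATTERN.search(text))
    return with_dates / len(texts)


def temporal_values(results: list[SearchResult]) -> dict[str, float]:
    """Compute the three measures over one query's result list."""
    return {
        "TTitle": temporal_value([r.title for r in results]),
        "TSnippets": temporal_value([r.snippet for r in results]),
        "TUrl": temporal_value([r.url for r in results]),
    }


# Made-up results, for illustration only.
results = [
    SearchResult("Oil spill timeline 2010", "A timeline of the 2010 oil spill.",
                 "http://example.org/oil-spill/2010"),
    SearchResult("Oil spill", "What is an oil spill and how is it cleaned up?",
                 "http://example.org/oil-spill"),
    SearchResult("Oil spill news", "Updated in 2011 with new figures.",
                 "http://example.org/news"),
    SearchResult("Cleanup methods", "Booms and skimmers are common tools.",
                 "http://example.org/cleanup"),
]
values = temporal_values(results)
print(values)
```

With these four invented results, one title, two snippets and one URL carry a year, so the three values come out as 0.25, 0.5 and 0.25. A year-only detector is the simplest choice; fuller date expressions would need a proper temporal tagger.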

TEMPORAL CLASSIFICATION

Temporal Ambiguity Value

Conceptual Classification    Number of Queries
Ambiguous                    220
Clear                        176
Broad                        54

Each query was classified on the basis of a temporal ambiguity value, TA(q):

If (TA(q) < 10%) then
    Query is ATemporal
Else
    Query is Temporal

Temporal Classification    Number of Queries    %
ATemporal                  132                  75%
Temporal                   44                   25%
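The threshold rule on this slide can be written as a short Python sketch. The slides do not say how TA(q) is computed, so it is treated here as a given input between 0 and 1; the function names and the example values are hypothetical.

```python
TEMPORAL_THRESHOLD = 0.10  # the 10% cut-off stated in the rule


def classify_query(ta: float) -> str:
    """Return "ATemporal" when TA(q) is below the threshold, else "Temporal"."""
    return "ATemporal" if ta < TEMPORAL_THRESHOLD else "Temporal"


def class_shares(ta_values: list[float]) -> dict[str, float]:
    """Share of queries assigned to each class."""
    if not ta_values:
        raise ValueError("at least one TA(q) value is required")
    labels = [classify_query(ta) for ta in ta_values]
    return {
        label: labels.count(label) / len(labels)
        for label in ("ATemporal", "Temporal")
    }


# Hypothetical TA(q) values, not taken from the dataset.
example_ta = [0.00, 0.04, 0.09, 0.25]
print(class_shares(example_ta))
```

Note that a query with TA(q) exactly equal to 10% falls on the Temporal side, because the rule uses a strict "less than". The reported split of 132 to 44 corresponds to 132/176 = 75% and 44/176 = 25%.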

Evaluation

In order to evaluate our simple classification model, we conducted a user study;

Human annotators were asked to consider each of the 176 queries, look at the web search results, and classify the query as ATemporal or Temporal;

Overall, human annotators judged 35% of the queries to be implicit temporal queries, while our methodology identified only 25%.

WEB QUERY LOGS

TEMPORAL INFORMATION

Completion Search-Engine Features

Another approach to dating implicit temporal queries is to use web query logs, relying on similar year-qualified queries:

Bp oil spill

Bp oil spill live feed

Bp oil spill 2010

Bp oil spill map

Bp oil spill claims

DIFFICULTIES

Web Query Logs Drawbacks

Extremely hard to access outside the big industrial labs;

Queries that have never been typed do not exist in the web search log, e.g., Blaise Pascal 1623 (his year of birth);

Highly dependent on the users' own intents, e.g., for the query Euro: Euro 2008; Euro 2012;

Not adapted to concept disambiguation.

TEMPORAL VALUE

Measures

Explicit temporal queries represent only 1.21% of the overall set [5];

Furthermore, the mere fact that a query is year-qualified does not necessarily mean that it has a temporal intent;

Similarly to TTitle(.), TSnippets(.) and TUrl(.), two measures are defined for the query logs, TLogYahoo(.) and TLogGoogle(.):

$$\mathrm{TLogGoogle}(.) = \frac{\#\ \text{Suggested Queries Retrieved with Dates}}{\#\ \text{Suggested Queries Retrieved}}$$
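The query-log measure can be illustrated the same way as the snippet measures. The sketch below applies it to the five completions listed earlier in the talk for "Bp oil spill". It is a sketch under assumptions: dates are detected as standalone four-digit years, the suggestions are passed in as a plain list rather than fetched from any search-engine service, and `tlog` is a hypothetical name.

```python
import re

# Assumption: a "date" is a standalone four-digit year (1000-2999).
YEAR_PATTERN = re.compile(r"(?<!\d)[12]\d{3}(?!\d)")


def tlog(suggestions: list[str]) -> float:
    """Share of suggested queries that contain at least one year-like date."""
    if not suggestions:
        return 0.0
    with_dates = sum(1 for s in suggestions if YEAR_PATTERN.search(s))
    return with_dates / len(suggestions)


# Completion list shown earlier in the talk.
suggestions = [
    "Bp oil spill",
    "Bp oil spill live feed",
    "Bp oil spill 2010",
    "Bp oil spill map",
    "Bp oil spill claims",
]
print(tlog(suggestions))  # 0.2
```

Only "Bp oil spill 2010" is year-qualified, so one suggestion in five carries a date.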

Results

Pearson correlation coefficient between each pair of dimensions, TSnippets(.), TTitle(.), TUrl(.), TLogGoogle(.) and TLogYahoo(.):

              TLogGoogle    TTitle    TSnippets    TUrl
TLogYahoo     0.63          0.61      0.52         0.48
TLogGoogle    -             0.69      0.63         0.44

Results show that as dates appear in the titles and snippets, they also tend to appear, albeit to a lesser extent, in the auto-complete query suggestions of Google.
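For readers who want to reproduce this kind of analysis, the Pearson coefficient between two per-query measures can be computed as sketched below. The function name and the example values are hypothetical; the input vectors would hold one value per query, for instance TTitle(.) and TLogGoogle(.).

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / sqrt(var_x * var_y)


# Hypothetical per-query values, for illustration only.
ttitle = [0.10, 0.30, 0.00, 0.50, 0.20]
tlog_google = [0.00, 0.10, 0.00, 0.20, 0.10]
print(round(pearson(ttitle, tlog_google), 2))
```

A value close to 1 means that queries with many dated titles also tend to have many dated suggestions; it says nothing about the absolute level of either measure.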

An additional analysis led us to conclude that temporal information is more frequent in web snippets than in the query logs of either Google or Yahoo!;

Overall, while most of the queries have a TSnippets(.) value around 20%, TLogYahoo(.) and TLogGoogle(.) are mostly close to 0%.

Finally, we studied how strongly a given query is associated with a set of different dates, both in web snippets and in web query logs.

For this, we built a confidence interval for the difference of means, for paired samples, between the number of times that dates appear in the web snippets and in the web query logs:

TLogGoogle(.)

TLogYahoo(.) [5.10; 6.38]

[5.12; 6.43]

Both intervals lie entirely above zero, so the results show that the number of different dates that appear in web snippets is significantly higher than in either of the two web query logs.
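A confidence interval for the difference of means of paired samples can be sketched as follows. The slides do not state the confidence level or the exact procedure, so this version makes two explicit assumptions: a 95% level and a normal approximation, which is reasonable only for large samples. The function name and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_mean_difference_ci(
    first: list[float], second: list[float], confidence: float = 0.95
) -> tuple[float, float]:
    """Normal-approximation interval for the mean of the paired differences."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two paired samples with at least 2 observations")
    differences = [a - b for a, b in zip(first, second)]
    centre = mean(differences)
    standard_error = stdev(differences) / sqrt(len(differences))
    # Assumption: large-sample z quantile instead of a Student t quantile.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * standard_error, centre + z * standard_error


# Hypothetical counts of different dates per query, for illustration only.
dates_in_snippets = [6, 7, 8, 5, 9]
dates_in_query_log = [1, 0, 2, 0, 1]
low, high = paired_mean_difference_ci(dates_in_snippets, dates_in_query_log)
print(f"[{low:.2f}; {high:.2f}]")
```

An interval that excludes zero indicates a significant difference between the two sources at the chosen level. With only a handful of pairs, a Student t quantile would be the more appropriate choice than the z quantile used here.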

CONCLUSIONS

Temporal Value of Web Snippets and Web Query Logs

In this paper, we showed that web snippets are a very rich source of temporal information, especially years. Dates in snippets and in titles are often correlated;

Results show that future dates are very common in web snippets, but seldom used in queries;

Dates mostly appear with the categories automotive, sports and politics, both in web snippets and in web query logs;

Some of the items even have more than one date;

Contrary to web snippets, web query logs have a very small temporal value (about 1.2%), which is statistically significantly smaller than that of web snippets.

Query Understanding based on Web Snippets

Our experiments also showed that web snippets can be used for query understanding;

We introduced a simple model for the temporal classification of queries, based on the temporal value of web snippets, which showed that 25% of the queries have a temporal nature. This value contrasts with the 35% that resulted from our user study;

So, the use of complementary information, such as the number of instances or the number of different dates, should be considered in future approaches.

Thanks for your attention!

Both experimental datasets are available for download at www.ccc.ipt.pt/~ricardo/software

VipAccess is online at http://hultig.di.ubi.pt/vipaccess

HULTIG is online at http://hultig.di.ubi.pt

LIAAD is online at http://liaad.up.pt

Polytechnic Institute of Tomar is online at http://www.ipt.pt

Gaël Dias is online at http://www.di.ubi.pt/~ddg

Alípio Jorge is online at http://liaad.up.pt/~amjorge
