A Language Modeling Approach for Temporal Information Needs
Klaus Berberich, Srikanta Bedathur, Omar Alonso, Gerhard Weikum
March 29th, 2010. ECIR 2010, Milton Keynes

Motivation

Information needs having a temporal dimension, e.g.:
  • FIFA World Cup tournaments of the 1990’s
  • Movies that won an Academy Award in 2007
  • Crusades of the 12th century

Queries containing a temporal expression (e.g., in 1998)
  • indicate an underlying temporal information need
  • make up 1.5% of general web queries (Nunes et al. ’08)
  • are more common for specific domains (e.g., News or Sports) and/or specific user groups (e.g., journalists or historians)

But: Not well-supported by existing retrieval models!

Outline

  • Motivation
  • Temporal Expressions
  • Language Models for Temporal Information Needs
  • Experimental Evaluation
  • Conclusion

Temporal Expressions

Temporal expressions are frequent across many classes of documents and can be categorized as:
  • explicit (e.g., March 29th, 2010 or September 1872)
  • implicit (e.g., Christmas 2009 or New Year’s Eve 2000)
  • relative (e.g., yesterday, last month, or in January)

Off-the-shelf tools available to identify and interpret temporal expressions (e.g., TARSQI and TimexTag)

TimeML mark-up language specification to annotate temporal expressions found in a document

Challenges

Existing retrieval models ignore temporal expressions and their meaning and therefore fail to match, e.g.:
  Query: fifa world cup 1990’s
  Document: During the 90’s the FIFA World Cup was won by Germany (in 1990), Brazil (in 1994), and France (in 1998)…

Meaning of a temporal expression is often uncertain, e.g.:
  • France won the FIFA World Cup in 1998
  • In 1998 Bill Clinton was President of the U.S.
  • Nagano hosted the Winter Olympics in 1998

Temporal Expression Model

We formally model temporal expressions as 4-tuples

$$T = (tb_l,\; tb_u,\; te_l,\; te_u)$$

that record the earliest/latest begin/end of time intervals that $T$ may refer to

The temporal expression $T$ may thus refer to any time interval $[b, e]$ such that

$$b \le e \;\wedge\; tb_l \le b \le tb_u \;\wedge\; te_l \le e \le te_u$$

In 1998, e.g., is represented as

$$(\,1998/01/01,\; 1998/12/31,\; 1998/01/01,\; 1998/12/31\,)$$
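
The 4-tuple model can be sketched in a few lines of Python. This is only an illustration: the class name `TemporalExpression`, the method `may_refer_to`, and the choice of day granularity via `datetime.date` are assumptions made here, not part of the talk.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TemporalExpression:
    """4-tuple T = (tb_l, tb_u, te_l, te_u); day granularity is an assumption."""
    tb_l: date  # earliest possible begin
    tb_u: date  # latest possible begin
    te_l: date  # earliest possible end
    te_u: date  # latest possible end

    def may_refer_to(self, b: date, e: date) -> bool:
        """True if [b, e] satisfies b <= e, tb_l <= b <= tb_u, te_l <= e <= te_u."""
        return (b <= e
                and self.tb_l <= b <= self.tb_u
                and self.te_l <= e <= self.te_u)


# "in 1998", using the bounds given on the slide
in_1998 = TemporalExpression(date(1998, 1, 1), date(1998, 12, 31),
                             date(1998, 1, 1), date(1998, 12, 31))

print(in_1998.may_refer_to(date(1998, 6, 10), date(1998, 7, 12)))   # interval inside 1998
print(in_1998.may_refer_to(date(1998, 7, 12), date(1998, 6, 10)))   # begin after end
print(in_1998.may_refer_to(date(1997, 12, 31), date(1998, 1, 1)))   # begin before tb_l
```

Keeping separate bounds for begin and end is what lets one tuple stand for many candidate intervals, from a single day in 1998 up to the whole year.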

Document & Query Model

We distinguish between the textual part and temporal part of a document
  • $d_{text}$ is a bag of textual terms
  • $d_{time}$ is a bag of temporal expressions

We distinguish between the textual part and temporal part of a query
  • $q_{text}$ is a bag of textual terms
  • $q_{time}$ is a bag of temporal expressions

Two modes of deriving a query from the user’s input
  • inclusive mode: $q_{text}$ includes all input terms
  • exclusive mode: $q_{text}$ excludes input terms that are part of a temporal expression
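
The two query modes could look roughly as follows. The function `derive_query` and its `temporal_idx` argument are hypothetical: the sketch assumes that an external tagger has already marked which input tokens belong to a temporal expression, and it returns those tokens as raw strings rather than as interpreted 4-tuples.

```python
from typing import List, Set, Tuple


def derive_query(tokens: List[str], temporal_idx: Set[int],
                 mode: str = "inclusive") -> Tuple[List[str], List[str]]:
    """Split user input into (q_text, temporal tokens).

    tokens       -- whitespace-split user input
    temporal_idx -- positions of tokens that are part of a temporal expression
                    (assumed to come from an external tagger)
    mode         -- "inclusive" keeps all tokens in q_text,
                    "exclusive" drops the temporal ones
    """
    if mode not in ("inclusive", "exclusive"):
        raise ValueError(f"unknown mode: {mode}")
    temporal = [t for i, t in enumerate(tokens) if i in temporal_idx]
    if mode == "inclusive":
        q_text = list(tokens)
    else:
        q_text = [t for i, t in enumerate(tokens) if i not in temporal_idx]
    return q_text, temporal


tokens = "fifa world cup 1990's".split()
print(derive_query(tokens, {3}, "inclusive"))
print(derive_query(tokens, {3}, "exclusive"))
```

In inclusive mode the token `1990's` is still matched as plain text, while in exclusive mode it only contributes through the temporal part of the query.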

Language Model Framework

Query-likelihood approach assuming independent generation of textual and temporal query part

$$P(q \mid d) = P(q_{text} \mid d_{text}) \times P(q_{time} \mid d_{time})$$

$P(q_{text} \mid d_{text})$ estimated using a unigram language model with Jelinek-Mercer smoothing as

$$P(q_{text} \mid d_{text}) = \prod_{q \in q_{text}} \big( \lambda \cdot P(q \mid d_{text}) + (1 - \lambda) \cdot P(q \mid C) \big)$$
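
A minimal sketch of the textual part with Jelinek-Mercer smoothing, computed in log space. It assumes maximum-likelihood estimates for the document model and for the collection model $C$; the function name `log_p_text` and the default `lam=0.75` are illustrative choices, not values from the talk.

```python
import math
from collections import Counter
from typing import List


def log_p_text(q_text: List[str], d_text: List[str],
               collection: List[str], lam: float = 0.75) -> float:
    """log P(q_text | d_text) under a unigram model with Jelinek-Mercer smoothing.

    lam weights the document model, (1 - lam) the collection model C.
    The default 0.75 is an arbitrary illustrative value.
    """
    d_counts, c_counts = Counter(d_text), Counter(collection)
    d_len, c_len = len(d_text), len(collection)
    log_p = 0.0
    for term in q_text:
        p_doc = d_counts[term] / d_len if d_len else 0.0
        p_col = c_counts[term] / c_len if c_len else 0.0
        p = lam * p_doc + (1.0 - lam) * p_col
        if p == 0.0:  # term unseen even in the collection
            return float("-inf")
        log_p += math.log(p)
    return log_p


doc = ["france", "won", "the", "world", "cup"]
coll = doc + ["brazil", "won", "the", "world", "cup"]
print(log_p_text(["world", "cup"], doc, coll))
print(log_p_text(["brazil"], doc, coll))  # only the collection model contributes
```

Summing logarithms avoids numerical underflow for longer queries and leaves the ranking unchanged.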

Language Model Framework

We assume that query temporal expressions are generated independently from each other

$$P(q_{time} \mid d_{time}) = \prod_{Q \in q_{time}} P(Q \mid d_{time})$$

Two-step generation of query temporal expression $Q$
  (I) Draw a temporal expression $T$ uniformly at random from $d_{time}$
  (II) Generate $Q$ from $T$

$$P(Q \mid d_{time}) = \frac{1}{|d_{time}|} \sum_{T \in d_{time}} P(Q \mid T)$$
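
The two-step generation translates directly into a product over query expressions of an average over document expressions. In the sketch below the generation probability $P(Q \mid T)$ is passed in as a callable; `p_q_time`, `toy_p`, and the handling of documents without temporal expressions are assumptions of this example.

```python
from typing import Callable, Hashable, Sequence


def p_q_time(q_time: Sequence[Hashable], d_time: Sequence[Hashable],
             p_q_given_t: Callable[[Hashable, Hashable], float]) -> float:
    """P(q_time | d_time) = prod over Q of (1 / |d_time|) * sum over T of P(Q | T)."""
    if not d_time:
        # Assumption: a document without temporal expressions cannot generate
        # a non-empty temporal query part; the slides do not cover this case.
        return 1.0 if not q_time else 0.0
    prob = 1.0
    for Q in q_time:
        prob *= sum(p_q_given_t(Q, T) for T in d_time) / len(d_time)
    return prob


def toy_p(Q: Hashable, T: Hashable) -> float:
    """Placeholder for P(Q | T); the values are made up for illustration."""
    return 0.5 if Q == T else 0.1


print(p_q_time(["1998"], ["1998", "1990", "1994"], toy_p))
```

Any concrete definition of $P(Q \mid T)$ can be plugged in for `toy_p` without changing the surrounding framework.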

Uncertainty-ignorant Approach

The temporal expression $T$ only generates itself

$$P(Q \mid T) = \begin{cases} 1 & : Q = T \\ 0 & : \text{otherwise} \end{cases}$$

  • Ignores temporal expressions’ inherent uncertainty
  • Profits from the extraction of temporal expressions

Example:
  Query: fifa world cup 1990’s
  Document: During the 90’s the FIFA World Cup was won by Germany (in 1990), Brazil (in 1994), and France (in 1998)…
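
As a rough illustration of the exact-match rule, the sketch below assumes that an upstream tagger has already normalized every expression to a 4-tuple, here written as ISO date strings. The tuple used for the 1990s is an assumption by analogy with the "in 1998" example; the talk does not spell it out.

```python
from typing import Tuple

# Assumption: expressions are already normalized to (tb_l, tb_u, te_l, te_u)
# tuples of ISO date strings by an upstream tagger.
Expr = Tuple[str, str, str, str]


def p_exact(Q: Expr, T: Expr) -> float:
    """Uncertainty-ignorant P(Q | T): 1 if the expressions are identical, else 0."""
    return 1.0 if Q == T else 0.0


nineties = ("1990-01-01", "1999-12-31", "1990-01-01", "1999-12-31")
in_1998 = ("1998-01-01", "1998-12-31", "1998-01-01", "1998-12-31")

print(p_exact(nineties, nineties))  # "1990's" vs. "the 90's": same tuple
print(p_exact(nineties, in_1998))   # "1990's" vs. "in 1998": different tuple
```

Because matching happens on normalized tuples, "1990's" and "the 90's" agree although their surface forms differ, while "in 1998" contributes nothing even though it lies within the decade.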

Uncertainty-aware Approach

We assume that any time interval $[q_b, q_e]$ that the query temporal expression $Q$ may refer to is equally likely

$$P(Q \mid T) = \frac{1}{|Q|} \sum_{[q_b, q_e] \in Q} P(\,[q_b, q_e] \mid T\,)$$

The temporal expression $T$ generates any time interval $[q_b, q_e]$ that it may refer to with equal probability

$$P(\,[q_b, q_e] \mid T\,) = \begin{cases} 1/|T| & : [q_b, q_e] \in T \\ 0 & : \text{otherwise} \end{cases}$$

Uncertainty-aware Approach

Intuitively, $P(Q \mid T)$ reflects the probability that the user issuing the query and the author writing the document had the same time interval in mind

The definition can be simplified as

$$P(Q \mid T) = \frac{|T \cap Q|}{|T| \cdot |Q|}$$

treating $Q$ and $T$ as sets of time intervals

$P(Q \mid T)$ can be computed efficiently without “materializing” the huge but finite sets of time intervals
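
One way to avoid materializing the interval sets is to count them in closed form. The sketch below assumes day granularity, with each bound stored as an integer day number, and derives $T \cap Q$ by intersecting the four bounds. The helper names `size`, `intersect`, `p_q_given_t`, and `year_span` are hypothetical, and the counting formula is a derivation made for this example rather than the authors' implementation.

```python
from datetime import date
from typing import Tuple

Expr = Tuple[int, int, int, int]  # (tb_l, tb_u, te_l, te_u) as day numbers


def size(t: Expr) -> int:
    """Number of day-granularity intervals [b, e] with
    tb_l <= b <= tb_u, te_l <= e <= te_u and b <= e, in closed form."""
    bl, bu, el, eu = t
    bu = min(bu, eu)  # a begin can never lie after the latest end
    if bl > bu or el > eu:
        return 0
    # Begins at or before el: every end in [el, eu] is allowed.
    n1 = max(0, min(bu, el) - bl + 1)
    total = n1 * (eu - el + 1)
    # Begins after el: ends are restricted to [b, eu].
    lo = max(bl, el + 1)
    if lo <= bu:
        n2 = bu - lo + 1
        total += n2 * (eu + 1) - (lo + bu) * n2 // 2
    return total


def intersect(q: Expr, t: Expr) -> Expr:
    """4-tuple describing the intervals that both q and t may refer to."""
    return (max(q[0], t[0]), min(q[1], t[1]), max(q[2], t[2]), min(q[3], t[3]))


def p_q_given_t(q: Expr, t: Expr) -> float:
    """P(Q | T) = |T ∩ Q| / (|T| * |Q|)."""
    denom = size(t) * size(q)
    return size(intersect(q, t)) / denom if denom else 0.0


def year_span(first: int, last: int) -> Expr:
    """Expression whose begin and end may fall anywhere in the given years."""
    lo, hi = date(first, 1, 1).toordinal(), date(last, 12, 31).toordinal()
    return (lo, hi, lo, hi)


nineties = year_span(1990, 1999)
in_1998 = year_span(1998, 1998)
in_2004 = year_span(2004, 2004)
print(p_q_given_t(nineties, in_1998) > 0)  # overlapping expressions
print(p_q_given_t(nineties, in_2004))      # disjoint expressions
```

Under these assumptions each probability costs a constant number of arithmetic operations, regardless of how many intervals an expression covers.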

Uncertainty-aware Approach

  • Considers temporal expressions’ inherent uncertainty
  • Profits from the extraction of temporal expressions

Example:
  Query: fifa world cup 1990’s
  Document: During the 90’s the FIFA World Cup was won by Germany (in 1990), Brazil (in 1994), and France (in 1998)…

Datasets & Methods

Document collections:
  • New York Times Annotated Corpus
  • English Wikipedia (as of July ’09)

Temporal expressions annotated using TARSQI

40 queries collected using Amazon Mechanical Turk
  • Five temporal granularities (Day, Month, Year, Decade, Century)
  • Four topics (Sports, Culture, Technology, World Affairs)

Methods under comparison:
  • LM: Unigram LM
  • LMT-IN / LMT-EX: Uncertainty-ignorant LM
  • LMTU-IN / LMTU-EX: Uncertainty-aware LM

Example queries:
  • boston red sox [october 27, 2004]
  • kurt cobain [april 5, 1994]
  • pink floyd [march 1973]
  • babe ruth [1921]
  • michael jordan [1990s]
  • mickey mouse [1930s]
  • soccer [21st century]
  • jazz music [21st century]
  • voyager [september 5, 1977]
  • berlin [october 27, 1961]
  • poland [december 1970]
  • wright brothers [1905]
  • sewing machine [1850s]
  • siemens [19th century]
  • …

Relevance Assessments & Measures

Relevance assessments on pooled query results collected using Amazon Mechanical Turk
  • binary relevance assessments with “I don’t know” option
  • mandatory justification of relevance assessment
  • three assessors per query-document pair

Fleiss’ Kappa statistic indicates fair level of agreement
  • New York Times Annotated Corpus (0.36)
  • English Wikipedia (0.40)

We measure retrieval effectiveness using
  • Precision @ 10 (P@10)
  • Normalized Discounted Cumulative Gain @ 10 (nDCG@10)
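
For reference, the two measures can be computed as below for a ranked list of binary relevance labels. This is one common formulation (gain equal to the label, discount $1/\log_2(i+1)$ at rank $i$); the talk does not state which nDCG variant was used or how the three assessments per pair were combined, so both are left as assumptions.

```python
import math
from typing import Sequence


def precision_at_k(rels: Sequence[int], k: int = 10) -> float:
    """Fraction of the top-k results judged relevant (rels are 0/1 in rank order)."""
    return sum(rels[:k]) / k


def ndcg_at_k(rels: Sequence[int], k: int = 10) -> float:
    """nDCG@k with gain rel_i and discount 1 / log2(i + 1) for rank i = 1..k.

    The ideal ranking is taken over the judged list itself, which is a
    simplification when relevant documents exist outside of it.
    """
    def dcg(xs: Sequence[int]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(xs[:k]))

    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0


ranking = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]  # made-up labels for one query
print(precision_at_k(ranking))
print(ndcg_at_k(ranking))
```

P@10 ignores the order within the top ten, whereas nDCG@10 rewards rankings that place relevant documents earlier.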

Retrieval Effectiveness Overall: New York Times

[Figure: bar chart of P@10 and nDCG@10 (y-axis from 0 to 0.6) for LM, LMT-IN, LMT-EX, LMTU-IN, and LMTU-EX. Bar labels in extraction order: 0.49, 0.51, 0.4, 0.39, 0.33, 0.32, 0.28, 0.25, 0.38, 0.37.]

Retrieval Effectiveness Overall: Wikipedia

[Figure: bar chart of P@10 and nDCG@10 (y-axis from 0 to 0.6) for LM, LMT-IN, LMT-EX, LMTU-IN, and LMTU-EX. Bar labels in extraction order: 0.51, 0.56, 0.46, 0.48, 0.36, 0.36, 0.34, 0.33, 0.48, 0.51.]

Retrieval Effectiveness by Topic: New York Times

[Figure: bar chart of P@10 and nDCG@10 (y-axis from 0 to 0.6) per topic (Sports, Culture, Technology, World Affairs) for LM, LMT-IN, LMT-EX, LMTU-IN, and LMTU-EX.]

Retrieval Effectiveness by Topic: Wikipedia

[Figure: bar chart of P@10 and nDCG@10 (y-axis from 0 to 0.6) per topic (Sports, Culture, Technology, World Affairs) for LM, LMT-IN, LMT-EX, LMTU-IN, and LMTU-EX.]

Retrieval Effectiveness by Granularity: New York Times

[Figure: bar chart of P@10 and nDCG@10 (y-axis from 0 to 0.8) per temporal granularity (Day, Month, Year, Decade, Century) for LM, LMT-IN, LMT-EX, LMTU-IN, and LMTU-EX.]

Retrieval Effectiveness by Granularity: Wikipedia

[Figure: bar chart of P@10 and nDCG@10 (y-axis from 0 to 0.8) per temporal granularity (Day, Month, Year, Decade, Century) for LM, LMT-IN, LMT-EX, LMTU-IN, and LMTU-EX.]

Conclusion

Our approach integrates temporal expressions seamlessly into a language modeling framework

Experiments show that temporal expressions are helpful to better satisfy temporal information needs if their inherent uncertainty is taken into account

More (details on) experiments and links to download extracted temporal expressions and relevance assessments available in our technical report!

Thanks! Questions?

http://www.mpi-inf.mpg.de/~kberberi/ecir2010/