Exploiting Time-based Synonyms in Searching Document Archives

Nattiya Kanhabua and Kjetil Nørvåg

Database System Group, Norwegian University of Science and Technology
Trondheim, Norway

JCDL'2010, June 21 - 25, Gold Coast, Australia

Outline

1 Introduction
    Problem Statement
    Contributions
2 Synonym Detection
    Entity Recognition and Synonym Extraction
    Improving the Accuracy of Time
3 Query Expansion
    Time-based Synonyms
    Ranking Time-independent Synonyms
    Ranking Time-dependent Synonyms
4 Evaluation
    Experiment Setting
    Experimental Results
5 Conclusions
    Conclusions and Future Work


Problem statement

In recent years, document archives have become publicly available
    E.g., the Internet Archive, digital libraries, and news archives
Searching in such resources is not straightforward
    Contents in these resources are strongly time-dependent
Query "Pope Benedict XVI" with dates "before 2005"
    Unable to retrieve documents about "Joseph Alois Ratzinger"
    To improve retrieval effectiveness, query expansion using synonyms with respect to time can be employed


Observation

Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR'2008]
    Very dynamic in appearance, i.e., relationships between terms change over time
    E.g., changes of roles, name alterations, or semantic shift
Synonyms are different words with similar meanings
    In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity
    E.g., "Cardinal Joseph Ratzinger" is a synonym of "Pope Benedict XVI" before 2005


What are time-based synonyms?

Time-independent synonyms are invariant to time
Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time


Application

News archive search
    Search terms are named entities
    Publication dates of documents are temporal criteria

Scenario 1
    Query "Pope Benedict XVI", written before 2005
    Documents about "Joseph Alois Ratzinger" are relevant

Scenario 2
    Query "Hillary R. Clinton", written from 1997 to 2002
    Documents about "New York Senator" and "First Lady of the United States" are relevant

Challenge
    Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time


Contributions

1 Formal models
    Wikipedia viewed as a temporal resource
2 Proposed approaches
    Discover time-based synonyms over time
    Improve the accuracy of time of synonyms
    Expand a query using time-based synonyms
3 Experiments
    Evaluate extracting and improving time of synonyms
    Evaluate query expansion using time-based synonyms


Recognizing named entities

Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$
Step 2: For each snapshot $W_{t_k} \in W$, identify named-entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Figure: A snapshot of Wikipedia and current revisions at time $t_k$


Example: rules for recognizing named-entity pages [Bunescu and Pasca, EACL'2006]

1) Multi-word titles in which all words are capitalized
    President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters
    UNICEF and WHO are named entities
3) At least 75% of the title's occurrences in the article text itself are capitalized
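The three title rules can be sketched as a small Python check. This is only an illustration under stated assumptions, not the authors' implementation: the function name `is_named_entity`, the `FUNCTION_WORDS` list (added so that the slide's own example President_of_the_United_States passes rule 1), and the simplification of counting every occurrence in the text, including sentence-initial ones, are all hypothetical choices.

```python
import re

# Assumed list of function words skipped by rule 1 (not given on the slide).
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "on", "at", "to", "a", "an"}


def is_named_entity(title, article_text="", threshold=0.75):
    """Apply the three title heuristics to one Wikipedia page title."""
    words = title.replace("_", " ").split()
    if not words:
        return False

    # Rule 1: multi-word title whose content words are all capitalized.
    if len(words) > 1:
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)

    # Rule 2: single-word title with at least two capital letters.
    word = words[0]
    if sum(ch.isupper() for ch in word) >= 2:
        return True

    # Rule 3: share of capitalized occurrences in the article text.
    pattern = r"\b" + re.escape(word) + r"\b"
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for o in occurrences if o[0].isupper())
    return capitalized / len(occurrences) >= threshold
```

A title that fails rule 1, such as History_of_art, is rejected without looking at the article text; single-word titles fall through to rules 2 and 3.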


Example

    $e_i$: President_of_the_United_States
    $t_k$: 11/2001
    $s_j$: "George W. Bush"
    $\xi_{i,j}$: $(e_i, s_j)$, or (President_of_the_United_States, "George W. Bush")


Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links
Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Example
    [[President_of_the_United_States|Barack Obama]]: "Barack Obama" is the anchor text linking to the article President_of_the_United_States
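Step 1 can be illustrated with a short Python sketch that pulls anchor texts out of piped wiki links of the form [[Target|anchor]]. The regular expression, the function name `extract_synonyms`, and the normalization of spaces to underscores in link targets are assumptions made for this example; real wikitext has more link variants (templates, nested markup) than this pattern covers.

```python
import re
from collections import defaultdict

# Matches [[Target|anchor]] and [[Target#Section|anchor]]; unpiped links
# such as [[WHO]] carry no separate anchor text and are skipped.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]|]+)\]\]")


def extract_synonyms(wikitext, entities):
    """Collect anchor texts that link to known named-entity pages.

    wikitext: raw markup of one article revision
    entities: set of named-entity page titles (underscored form)
    Returns a dict mapping entity title -> set of anchor texts.
    """
    synonyms = defaultdict(set)
    for target, anchor in WIKILINK.findall(wikitext):
        entity = target.strip().replace(" ", "_")
        anchor = anchor.strip()
        if entity in entities and anchor:
            synonyms[entity].add(anchor)
    return dict(synonyms)
```

Running this over every article of a snapshot and merging the returned dictionaries would give the synonym snapshot of Step 2.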


Output: entity-synonym relationships and time periods

Named Entity             Synonym                      Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger    05/2005 - 03/2009
                         Joseph Ratzinger             05/2005 - 03/2009
                         Pope Benedict XVI            05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II      02/2007 - 03/2009
                         Sen. Barack Obama            07/2007 - 03/2009
                         Senator Barack Obama         05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton              08/2003 - 03/2009
                         Sen. Hillary Clinton         03/2007 - 03/2009
                         Senator Clinton              11/2007 - 03/2009

The times of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents
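One way to arrive at time periods like those in the output table is to scan the synonym snapshots in time order and record the first and last snapshot in which each entity-synonym pair appears. The sketch below assumes that a period is a single span from first to last appearance; the slides do not say how gaps between appearances are handled, and the function name and data layout are hypothetical.

```python
def synonym_time_periods(snapshots):
    """Derive a (first, last) time period for every entity-synonym pair.

    snapshots: dict mapping a sortable time key, e.g. (year, month),
               to a set of (entity, synonym) pairs seen in that snapshot.
    Returns a dict mapping (entity, synonym) -> (first_time, last_time).
    """
    periods = {}
    for t in sorted(snapshots):
        for pair in snapshots[t]:
            # Keep the earliest time seen so far; extend the end to t.
            first, _ = periods.get(pair, (t, t))
            periods[pair] = (first, t)
    return periods
```

Using (year, month) tuples as keys keeps the snapshots sortable without any date parsing.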


Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time
    1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD'2002]
    Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
    Output bursty intervals and bursty weight, i.e., periods of occurrence and intensity
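To make the idea of bursty intervals and weights concrete, here is a simplified two-state, batched variant of burst detection in the spirit of Kleinberg's algorithm. It is a sketch and not the implementation used in the experiments: the function name, the per-batch input format, and the use of the log-likelihood gain as the burst weight are assumptions. The defaults `s=2` and `gamma=1` mirror the parameter values listed in the experiment setting.

```python
import math


def detect_bursts(relevant, total, s=2.0, gamma=1.0):
    """Two-state batched burst detection solved with Viterbi.

    relevant[t]: documents in batch t mentioning the entity-synonym pair
    total[t]:    all documents in batch t
    Returns a list of (start, end, weight) with inclusive batch indices.
    """
    n = len(relevant)
    if n == 0 or sum(relevant) == 0:
        return []
    p0 = sum(relevant) / sum(total)      # base rate of occurrence
    p1 = min(s * p0, 0.9999)             # elevated rate of the burst state
    rates = (p0, p1)

    def cost(state, t):
        # Negative log-likelihood of batch t under the state's rate; the
        # binomial coefficient is equal for both states and is dropped.
        p = rates[state]
        r, d = relevant[t], total[t]
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    up = gamma * math.log(n)             # cost of entering the burst state
    best = [0.0, math.inf]               # the automaton starts in state 0
    back = []
    for t in range(n):
        stay0, drop = best[0], best[1]   # leaving the burst state is free
        rise, stay1 = best[0] + up, best[1]
        back.append((0 if stay0 <= drop else 1, 0 if rise <= stay1 else 1))
        best = [min(stay0, drop) + cost(0, t), min(rise, stay1) + cost(1, t)]

    # Trace the cheapest state sequence backwards.
    state = 0 if best[0] <= best[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    # Group consecutive burst batches into weighted intervals.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start = t
            while t + 1 < n and states[t + 1] == 1:
                t += 1
            weight = sum(cost(0, u) - cost(1, u) for u in range(start, t + 1))
            bursts.append((start, t, weight))
        t += 1
    return bursts
```

With monthly batches, the returned indices map directly to the start and end months of a bursty period, and a larger weight indicates a more intense burst.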


Output: results from the burst-detection algorithm

Synonym            Named Entity             Burst Weight   Start     End
President Reagan   Ronald Reagan            5506858        01/1987   02/1989
President Ronald   Ronald Reagan            100401         01/1989   03/1990
President Ronald   Ronald Reagan            67208          07/1990   02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001   10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002   01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003   11/2004


Classifying synonyms into two types

Definition

Class A: time-independent
    Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)
    E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"

Class B: time-dependent
    Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered
    E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005


Ranking time-independent synonyms

Definition
Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

    $TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$

    $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs
    $\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
    $\mu$ weights the importance of the temporal feature against the frequency feature
    $\mu = 0.5$ yields the best performance in the experiments

Intuition
The model measures the popularity of synonyms based on two factors:
    Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
    High usage over time, i.e., a high value of averaged frequencies over time
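The TIDP score is easy to compute once the term frequency of a synonym is known for each time partition. The sketch below follows the formula literally, with raw (unnormalized) pf and averaged tf; whether the two features are normalized before mixing is not stated on the slide, so that is an open assumption here. Function names and the example frequencies are made up for illustration.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * averaged tf(s_j).

    tf_by_partition: term frequencies of one synonym, one per time partition
    """
    occurring = [tf for tf in tf_by_partition if tf > 0]
    pf = len(occurring)                  # partitions in which s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(occurring) / pf         # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1.0 - mu) * avg_tf


def rank_time_independent(candidates, mu=0.5):
    """Sort synonyms by descending TIDP score (ties broken by name)."""
    scored = [(s, tidp(tfs, mu)) for s, tfs in candidates.items()]
    return sorted(scored, key=lambda x: (-x[1], x[0]))
```

For instance, a synonym with frequencies [3, 3, 3, 3] occurs in four partitions with an average tf of 3, so its score at mu = 0.5 is 0.5 * 4 + 0.5 * 3 = 3.5, while one with [0, 4, 0, 0] scores 0.5 * 1 + 0.5 * 4 = 2.5.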


Ranking time-dependent synonyms

Definition
Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by:

    $TDP(s_j, t_k) = tf(s_j, t_k)$

    $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition
    Only term frequencies are used to measure the importance of synonyms
    Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
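Ranking by TDP amounts to looking up each synonym's term frequency at the query time and sorting. A minimal sketch, assuming the frequencies are stored per synonym and per time key; the data layout, the function name, and the numbers in the usage note are hypothetical.

```python
def rank_time_dependent(tf_at_time, t_k, top_k=None):
    """Rank synonyms by TDP(s_j, t_k) = tf(s_j, t_k).

    tf_at_time: dict mapping synonym -> dict mapping time key -> tf
    t_k:        the time period of interest
    top_k:      keep only the best top_k synonyms (None keeps all)
    """
    scored = [(s, by_time.get(t_k, 0)) for s, by_time in tf_at_time.items()]
    # Synonyms that do not occur at t_k are not candidates at all.
    scored = [(s, tf) for s, tf in scored if tf > 0]
    scored.sort(key=lambda x: (-x[1], x[0]))
    return scored[:top_k]
```

With made-up frequencies such as {"Cardinal Ratzinger": {2004: 12}, "Pope Benedict XVI": {2006: 30}}, a query at 2004 would return only "Cardinal Ratzinger", since the other synonym has no occurrences in that period.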


Overview of experiments

Our experimental evaluation is divided into three main parts

1 Extracting and improving the accuracy of time of synonyms

2 Query expansion using time-independent synonyms

3 Query expansion using time-dependent synonyms


Extracting and improving time of synonyms

Data collection
    The whole history of English Wikipedia
        All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, ..., 01/03/2008), about 2.8 Terabytes
        4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
    The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools
    MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
    Oracle Berkeley DB version 4.7.25
    Burst detection algorithm implemented by Kleinberg
        Number of states: 2
        Ratio of rate of second state to base state: 2
        Ratio of rate of each subsequent state to previous state: 2
        Gamma parameter of the HMM: 1

Measurement: Accuracy


Query expansion using time-independent synonyms

Data collection
    TREC Robust Track (2004)
    250 topics (topics 301-450 and topics 601-700)

Tools
    Terrier, an open source search engine developed by the University of Glasgow
    BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
    Expand the query with the top-k synonyms $\{s_1, \ldots, s_k\}$, using their TIDP scores as boosting weights:

        q_exp = q_org s_1^w_1 s_2^w_2 ... s_k^w_k

      where ^ attaches the boosting weight w_i to the synonym s_i

Measurement: Mean Average Precision (MAP), R-precision, and Recall
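The expanded query form can be assembled with a few lines of Python. This is a sketch of the string construction only: quoting multi-word synonyms as phrases and printing weights with two decimals are assumptions of this example, and the exact syntax accepted by a given search engine's query language should be checked before use.

```python
def expand_query(original, weighted_synonyms, k=3):
    """Build q_exp = q_org s_1^w_1 ... s_k^w_k from scored synonyms.

    original:          the original query string q_org
    weighted_synonyms: iterable of (synonym, weight) pairs, e.g. TIDP scores
    k:                 number of top-weighted synonyms to add
    """
    top = sorted(weighted_synonyms, key=lambda sw: (-sw[1], sw[0]))[:k]
    parts = [original]
    for synonym, weight in top:
        # Assumed convention: multi-word synonyms are quoted as a phrase.
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)
```

Only the k best-scored synonyms are appended, so low-weight name variants do not dilute the original query.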


Query expansion using time-dependent synonyms

Data collection
    NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
    Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries

Temporal Query                                  Synonym
Named Entity                    Time Period
American Broadcasting Company   1995-2000       Disney/ABC
Barack Obama                    2005-2007       Senator Obama
Eminem                          1999-2004       Slim Shady
George H. W. Bush               1988-1992       President George H.W. Bush
George W. Bush                  2000-2007       President George W. Bush
Hillary Rodham Clinton          2001-2007       Senator Clinton
Kmart                           1987-1987       Kresge
Pope Benedict XVI               1988-2005       Cardinal Ratzinger
Ronald Reagan                   1987-1989       Reagan Revolution
Virgin Media                    1999-2002       Telewest Communications


Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE          NE-Syn      Avg. Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2               51
BPCF-NERW    473,829     488,383     1.0               73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization", or "company"

Note: 500 randomly selected entity-synonym relationships were used for assessing the accuracy of time periods


Robust2004 query statistics

Two methods for recognizing named entities in queries
    1 Exactly matched Wikipedia page (MW-NERQ)
    2 Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
        k = 2; k > 2 brings noise into the NERQ process

Number of queries using the two different NER methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250


Query expansion using time-independent synonyms

MAP, R-precision, and Recall (statistical significance tested at p < 0.05)

Method     MW-NERQ                           MRW-NERQ
           MAP      R-precision   Recall     MAP      R-precision   Recall
PM         0.2889   0.3309        0.6185     0.2455   0.2904        0.5629
PRF        0.3469   0.3711        0.6944     0.3002   0.3227        0.6761
SQE-PRF    0.3608   0.3652        0.7405     0.2507   0.2665        0.5932
SWQE-PRF   0.3653   0.3861        0.7388     0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1


Query expansion using time-dependent synonyms

P@10, P@20, and P@30 (statistical significance tested at p < 0.05)

Method   P@10     P@20     P@30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search a Temporal Query, i.e., a keyword $w_q$ and time $t_q$
TSQ: search a Temporal Query and expand it with Synonyms with respect to time $t_q$


QUEST: Query Expansion using Synonyms over Time


Conclusions and future work

Extract time-based synonyms from Wikipedia
Improve the time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work
    Combine time-dependent synonyms and temporal language models to determine the time of queries
    Exploit temporal information extraction techniques to discover synonyms at particular time points
    Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
    • Synonym Detection
      • Entity Recognition and Synonym Extraction
      • Improving the Accuracy of Time
    • Query Expansion
      • Time-based Synonyms
      • Ranking Time-independent Synonyms
      • Ranking Time-dependent Synonyms
    • Evaluation
      • Experiment Setting
      • Experimental Results
    • Conclusions
      • Conclusions and Future Work

Problem statement

  • In recent years, document archives have become publicly available
    • E.g., Internet Archive, digital libraries, and news archives
  • Searching in such resources is not straightforward
    • Contents in these resources are strongly time-dependent
  • Query “Pope Benedict XVI” and dates “before 2005”
    • Unable to retrieve documents about “Joseph Alois Ratzinger”
  • To improve the retrieval effectiveness, query expansion using synonyms w.r.t. time can be employed

Observation

  • Named entities (people, organization, location, etc.) constitute a major fraction of queries [Sanderson, SIGIR’2008]
  • Very dynamic in appearance, i.e., relationships between terms change over time
    • E.g., changes of roles, name alterations, or semantic shift
  • Synonyms are different words with similar meanings
  • In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity
    • E.g., “Cardinal Joseph Ratzinger” is a synonym of “Pope Benedict XVI” before 2005

What are time-based synonyms?

  • Time-independent synonyms are invariant to time
  • Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time

Application

News archive search
  • Search terms are named entities
  • Publication dates of documents are temporal criteria

Scenario 1
  • Query “Pope Benedict XVI”, written before 2005
  • Documents about “Joseph Alois Ratzinger” are relevant

Scenario 2
  • Query “Hillary R. Clinton”, written from 1997 to 2002
  • Documents about “New York Senator” and “First Lady of the United States” are relevant

Challenge

Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time

Contributions

1. Formal models
    • Wikipedia viewed as a temporal resource
2. Proposed approaches
    • Discover time-based synonyms over time
    • Improve the accuracy of time of synonyms
    • Expand a query using time-based synonyms
3. Experiments
    • Evaluate extracting and improving time of synonyms
    • Evaluate query expansion using time-based synonyms

Recognizing named entities

Step 1: Partition Wikipedia with respect to the time granularity $g$ = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Figure: A snapshot of Wikipedia and current revisions at time $t_k$
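Step 1 can be sketched as follows, assuming the revision history is available as (page title, timestamp, text) tuples and that a snapshot at time t_k holds, for every page, the latest revision made on or before t_k. The function name `monthly_snapshots` and the input layout are hypothetical and only serve the illustration.

```python
from datetime import date


def monthly_snapshots(revisions, snapshot_dates):
    """Build one snapshot per date: page title -> text of its current revision.

    revisions: iterable of (title, timestamp, text) tuples (assumed layout)
    snapshot_dates: the time points t_1 ... t_z, e.g. the first day of each month
    """
    ordered = sorted(revisions, key=lambda rev: rev[1])
    snapshots = {}
    for t_k in snapshot_dates:
        current = {}
        for title, timestamp, text in ordered:
            if timestamp <= t_k:
                # Later revisions overwrite earlier ones, leaving the current one.
                current[title] = text
        snapshots[t_k] = current
    return snapshots


revisions = [
    ("Pope_Benedict_XVI", date(2005, 4, 20), "v1"),
    ("Pope_Benedict_XVI", date(2005, 5, 15), "v2"),
    ("UNICEF", date(2005, 5, 2), "u1"),
]
snaps = monthly_snapshots(revisions, [date(2005, 5, 1), date(2005, 6, 1)])
print(snaps[date(2005, 6, 1)])
```

The nested loop is quadratic and kept only for readability; a real pipeline over the full revision history would stream revisions once in timestamp order.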

Example [Bunescu and Pasca, EACL’2006]

1) Multi-word titles where all words are capitalized
    • President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters
    • UNICEF and WHO are named entities
3) 75% of occurrences in the article text itself are capitalized
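The three title heuristics above could be sketched as a single predicate. This is an illustration under assumptions, not the authors' implementation: short function words such as "of" and "the" are skipped in rule 1 (so that President_of_the_United_States qualifies), rule 3 is applied to single-word titles only, and the names `is_named_entity_title` and `FUNCTION_WORDS` are hypothetical.

```python
import re

# Assumption: these function words are ignored when checking capitalization.
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "on", "at", "to", "a", "an"}


def is_named_entity_title(title, article_text="", threshold=0.75):
    """Apply the three capitalization heuristics to a Wikipedia page title."""
    words = [w for w in title.split("_") if w]
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose (content) words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with more than one capital letter.
    if sum(ch.isupper() for ch in word) > 1:
        return True
    # Rule 3: share of capitalized occurrences in the article text.
    pattern = r"\b" + re.escape(word) + r"\b"
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= threshold


print(is_named_entity_title("President_of_the_United_States"))
print(is_named_entity_title("UNICEF"))
```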

Example

  • $e_i$: President_of_the_United_States
  • $t_k$: 11/2001
  • $s_j$: “George W. Bush”
  • $\xi_{i,j}$: $(e_i, s_j)$, or (President_of_the_United_States, “George W. Bush”)

Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Example: [[President_of_the_United_States|Barack Obama]]. “Barack Obama” is the anchor text linking to the article President_of_the_United_States
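Anchor-text extraction can be sketched with a regular expression over wikitext. The sketch assumes piped links of the form [[Target|anchor]] and keeps only links whose target is in the set of recognized entities; the names `WIKILINK` and `extract_synonyms` are hypothetical.

```python
import re
from collections import defaultdict

# Matches piped wiki links: [[Target|anchor text]] (an optional #section is dropped).
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]|]+)\]\]")


def extract_synonyms(wikitext, entities):
    """Collect anchor texts per linked entity, i.e. (e_i, s_j) relationships."""
    synonyms = defaultdict(set)
    for target, anchor in WIKILINK.findall(wikitext):
        # Wiki titles treat spaces and underscores interchangeably.
        title = target.strip().replace(" ", "_")
        if title in entities:
            synonyms[title].add(anchor.strip())
    return dict(synonyms)


text = (
    "In 2009, [[President_of_the_United_States|Barack Obama]] met [[UNICEF]] staff "
    "and [[President of the United States|the President]] spoke."
)
print(extract_synonyms(text, {"President_of_the_United_States"}))
```

Unpiped links such as [[UNICEF]] are skipped here because their anchor text equals the title and adds no name variant.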

Output: Entity-synonym relationships and time periods

Named Entity             Synonym                     Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger   05/2005 - 03/2009
                         Joseph Ratzinger            05/2005 - 03/2009
                         Pope Benedict XVI           05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II     02/2007 - 03/2009
                         Sen. Barack Obama           07/2007 - 03/2009
                         Senator Barack Obama        05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton             08/2003 - 03/2009
                         Sen. Hillary Clinton        03/2007 - 03/2009
                         Senator Clinton             11/2007 - 03/2009

The times of synonyms are timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents

Improving the accuracy of time using burst detection

  • Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time
    • 1.8M articles from January 1987 to June 2007 (20 years)
  • Use the burst detection algorithm [Kleinberg, KDD’2002]
    • Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
    • Output bursty intervals and bursty weight, i.e., periods of occurrence and intensity
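To make the idea of bursty intervals and weights concrete, here is a minimal sketch of a two-state, batched burst detector in the spirit of Kleinberg's algorithm. It is an illustration under assumptions rather than the implementation used in the talk: documents are grouped into time batches, r[t] counts documents mentioning the synonym out of d[t] documents in batch t, the cost of entering the bursty state is taken as gamma * ln(n), and the function name `detect_bursts` is hypothetical. The defaults s = 2 and gamma = 1 mirror the parameter values reported in the experiment setting.

```python
import math


def _cost(p, r, d):
    """Negative log-likelihood of r hits among d documents at rate p (binomial)."""
    log_binom = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
    return -(log_binom + r * math.log(p) + (d - r) * math.log(1.0 - p))


def detect_bursts(r, d, s=2.0, gamma=1.0):
    """Return a list of (start_index, end_index, weight) for bursty intervals."""
    n = len(r)
    total_r, total_d = sum(r), sum(d)
    if n == 0 or total_r == 0 or total_r >= total_d:
        return []
    p0 = total_r / total_d           # base rate
    p1 = min(p0 * s, 0.9999)         # elevated rate of the bursty state
    rates = (p0, p1)
    up_cost = gamma * math.log(n)    # assumed cost of moving from state 0 to 1

    # Viterbi search for the cheapest state sequence, starting in state 0.
    best = [0.0, float("inf")]
    back = []
    for t in range(n):
        new_best, pointers = [], []
        for j in (0, 1):
            options = [best[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            i_min = 0 if options[0] <= options[1] else 1
            new_best.append(options[i_min] + _cost(rates[j], r[t], d[t]))
            pointers.append(i_min)
        best = new_best
        back.append(pointers)

    # Trace the optimal sequence backwards.
    states = [0] * n
    state = 0 if best[0] <= best[1] else 1
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    # Group consecutive bursty batches; weight = cost saved by the bursty state.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start, weight = t, 0.0
            while t < n and states[t] == 1:
                weight += _cost(p0, r[t], d[t]) - _cost(p1, r[t], d[t])
                t += 1
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts


bursts = detect_bursts([1, 1, 1, 10, 12, 11, 1, 1, 1, 1], [100] * 10)
print(bursts)
```

In this toy stream the synonym is mentioned far more often in batches 3 to 5 than elsewhere, so those batches form a single burst with a positive weight.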

Output: Results from burst-detection algorithm

Synonym            Entity                   Burst Weight   Time Start   Time End
President Reagan   Ronald Reagan            5506858        01/1987      02/1989
President Ronald   Ronald Reagan            100401         01/1989      03/1990
President Ronald   Ronald Reagan            67208          07/1990      02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001      10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002      01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003      11/2004

Classifying synonyms into two types

Definition

Class A: time-independent
  • Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)
  • E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”

Class B: time-dependent
  • Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered
  • E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005

Ranking time-independent synonyms

Definition

Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$

  • $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs
  • $tf(s_j)$ is an averaged tf of $s_j$ in all time partitions: $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
  • $\mu$ underlines the importance of a temporal feature and a frequency feature
  • $\mu = 0.5$ yields the best performance in the experiments

Intuition

The model measures popularity of synonyms based on two factors:
  • Robustness to change over time, i.e., the more partitions synonyms occur in, the more robust to time they are
  • High usage over time, i.e., a high value of averaged frequencies over time
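The TIDP score can be computed directly from per-partition term frequencies. The sketch below follows the formula as written on the slide and uses raw, unnormalized values; whether the original work normalizes pf and tf before mixing is not stated here, so treat that as an open assumption. The function name `tidp` is hypothetical.

```python
def tidp(partition_tfs, mu=0.5):
    """Score a synonym from its term frequency in each time partition.

    partition_tfs: list with tf(s_j, p_i) for every partition p_i (0 if absent)
    mu: mixture weight between partition frequency and averaged tf
    """
    occurring = [tf for tf in partition_tfs if tf > 0]
    pf = len(occurring)              # number of partitions in which s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(occurring) / pf     # averaged tf over those partitions
    return mu * pf + (1 - mu) * avg_tf


# Illustrative counts: the synonym occurs in 3 of 4 partitions.
print(tidp([3, 0, 5, 4]))  # 0.5 * 3 + 0.5 * 4 = 3.5
```

With mu = 1 the score reduces to the partition frequency alone, and with mu = 0 to the averaged term frequency alone.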

Ranking time-dependent synonyms

Definition

Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$

  • $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition

  • Only term frequencies are used to measure the importance of synonyms
  • Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
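Ranking by TDP then amounts to sorting the synonyms by their term frequency at the requested time. A minimal sketch, assuming frequencies are stored per synonym as a mapping from a time key to a count; the name `rank_time_dependent`, the storage layout, and the counts are made up for illustration.

```python
def rank_time_dependent(tf_by_time, t_k, top_k=3):
    """Return up to top_k (synonym, tf) pairs with nonzero tf at time t_k."""
    scored = [(syn, counts.get(t_k, 0)) for syn, counts in tf_by_time.items()]
    scored = [(syn, tf) for syn, tf in scored if tf > 0]
    # Highest frequency first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical counts keyed by (year, month).
tf_by_time = {
    "Cardinal Ratzinger": {(2004, 6): 12},
    "Joseph Ratzinger": {(2004, 6): 7, (2006, 1): 2},
    "Pope Benedict XVI": {(2006, 1): 30},
}
print(rank_time_dependent(tf_by_time, (2004, 6)))
```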

Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection
  • The whole history of English Wikipedia
    • All pages and revisions, 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
    • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
  • New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools
  • MWDumper: http://www.mediawiki.org/wiki/Mwdumper
  • Oracle Berkeley DB version 4.7.25
  • Burst detection algorithm implemented by Kleinberg
    • Number of states: 2
    • Ratio of rate of second state to base state: 2
    • Ratio of rate of each subsequent state to previous state: 2
    • Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection
  • TREC Robust Track (2004)
  • 250 topics (topics 301-450 and topics 601-700)

Tools
  • Terrier, an open source search engine developed by the University of Glasgow
  • BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
  • Expand the top-k synonyms $\{s_1, \ldots, s_k\}$ plus TIDP scores as boosting weights, where term^weight boosts a term by its weight:

    q_exp = q_org s_1^w_1 s_2^w_2 ... s_k^w_k

Measurement: Mean Average Precision (MAP), R-precision, and Recall
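Building the expanded query string is mostly string assembly. The sketch below assumes the `term^weight` boosting notation shown above and, as a simplification, attaches the synonym's weight to each of its words; how multi-word synonyms were actually handled is not stated in the slides. The function name `build_expanded_query` and the weights are hypothetical.

```python
def build_expanded_query(original_query, weighted_synonyms, k=3):
    """Append the top-k synonyms to the query, each word boosted by its weight.

    weighted_synonyms: mapping synonym -> weight (e.g. a TIDP score)
    """
    top = sorted(weighted_synonyms.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [original_query]
    for synonym, weight in top:
        for term in synonym.split():
            parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)


# Hypothetical weights for two synonyms of "barack obama".
weights = {"Senator Obama": 2.0, "Barack Hussein Obama II": 3.5}
print(build_expanded_query("barack obama", weights, k=1))
```

Sorting with a secondary alphabetical key keeps the output stable when two synonyms share the same weight.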

Query expansion using time-dependent synonyms

Data collection
  • NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
  • Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries

Temporal Query                                Synonym
Named Entity                    Time Period
American Broadcasting Company   1995-2000     Disney/ABC
Barack Obama                    2005-2007     Senator Obama
Eminem                          1999-2004     Slim Shady
George H. W. Bush               1988-1992     President George H.W. Bush
George W. Bush                  2000-2007     President George W. Bush
Hillary Rodham Clinton          2001-2007     Senator Clinton
Kmart                           1987-1987     Kresge
Pope Benedict XVI               1988-2005     Cardinal Ratzinger
Ronald Reagan                   1987-1989     Reagan Revolution
Virgin Media                    1999-2002     Telewest Communications
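The measurement used for these temporal queries, precision at k, is the fraction of the top-k retrieved documents that are relevant. A minimal sketch with made-up document identifiers; the function name `precision_at_k` is hypothetical.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the first k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant_ids)
    # Dividing by k (not len(top)) penalizes result lists shorter than k.
    return hits / k


ranked = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant = {"d1", "d4"}
print(precision_at_k(ranked, relevant, 10))  # 2 relevant among 10 -> 0.2
```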

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE          NE-Syn      Avg Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2              51
BPCF-NERW    473,829     488,383     1.0              73

BPF-NERW: Bunescu and Paşca’s named entity recognition of Wikipedia titles, with filtering criteria 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW restricted to the categories “people”, “organization”, or “company”

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods


Robust2004 query statistics

Two methods for recognizing named entities in queries

1 Exactly matched Wikipedia page (MW-NERQ)

2 Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)

k = 2; k > 2 brings noise into the NERQ process

Number of queries using the two NERQ methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250


Query expansion using time-independent synonyms

MAP, R-precision, and Recall (* indicates statistically significant at p < 0.05)

           MW-NERQ                          MRW-NERQ
Method     MAP      R-precision   Recall    MAP      R-precision   Recall
PM         0.2889   0.3309        0.6185    0.2455   0.2904        0.5629
PRF        0.3469   0.3711        0.6944    0.3002   0.3227        0.6761
SQE-PRF    0.3608   0.3652        0.7405    0.2507   0.2665        0.5932
SWQE-PRF   0.3653   0.3861        0.7388    0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1


Query expansion using time-dependent synonyms

P@10, P@20, and P@30 (* indicates statistically significant at p < 0.05)

Method   P@10     P@20     P@30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search with a Temporal Query, i.e., a keyword $w_q$ and time $t_q$

TSQ: search with a Temporal Query and expand it with Synonyms with respect to time $t_q$
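The difference between TQ and TSQ can be illustrated with a small lookup. This is a sketch under assumptions: the dictionary layout and the function names are hypothetical, time is simplified to whole years, and the two periods are copied from the example table of temporal queries.

```python
def synonyms_at(time_periods, entity, year):
    """Synonyms of `entity` whose time period covers `year`."""
    return sorted(
        synonym
        for (ent, synonym), (start, end) in time_periods.items()
        if ent == entity and start <= year <= end
    )


def expand_temporal_query(keyword, year, time_periods):
    """TSQ: the keyword plus its time-dependent synonyms at `year`."""
    return [keyword] + synonyms_at(time_periods, keyword, year)


# (entity, synonym) -> (first year, last year), from the example table.
periods = {
    ("Pope Benedict XVI", "Cardinal Ratzinger"): (1988, 2005),
    ("Eminem", "Slim Shady"): (1999, 2004),
}
print(expand_temporal_query("Pope Benedict XVI", 2003, periods))
print(expand_temporal_query("Pope Benedict XVI", 2007, periods))
```

For a query time outside every stored period, the expansion falls back to the plain temporal query (TQ).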


QUEST: Query Expansion using Synonyms over Time


Conclusions and future work

Extract time-based synonyms from Wikipedia

Improve time of synonyms using NYT

Perform query expansion using the time-based synonyms

Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work

Combine time-dependent synonyms and temporal language models to determine time of queries

Exploit temporal information extraction techniques to discover synonyms at particular time points

Improve temporal text mining/clustering using time-based synonyms


QUEST: Query Expansion using Synonyms over Time

http://research.idi.ntnu.no/wislab/quest

Thank you


  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
    • Synonym Detection
      • Entity Recognition and Synonym Extraction
      • Improving the Accuracy of Time
    • Query Expansion
      • Time-based Synonyms
      • Ranking Time-independent Synonyms
      • Ranking Time-dependent Synonyms
    • Evaluation
      • Experiment Setting
      • Experimental Results
    • Conclusions
      • Conclusions and Future Work


Problem statement

In recent years, document archives have become publicly available, e.g., the Internet Archive, digital libraries, and news archives

Searching in such resources is not straightforward: their contents are strongly time-dependent

Query: “Pope Benedict XVI” with dates “before 2005”

Unable to retrieve documents about “Joseph Alois Ratzinger”

To improve retrieval effectiveness, query expansion using synonyms with respect to time can be employed


Observation

Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR’2008]

Very dynamic in appearance, i.e., relationships between terms change over time, e.g., changes of roles, name alterations, or semantic shift

Synonyms are different words with similar meanings

In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity

E.g., “Cardinal Joseph Ratzinger” is a synonym of “Pope Benedict XVI” before 2005


What are time-based synonyms?

Time-independent synonyms are invariant to time

Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time


Application

News archive search

Search terms are named entities

Publication dates of documents are temporal criteria


Scenario 1

Query: “Pope Benedict XVI”, written before 2005

Documents about “Joseph Alois Ratzinger” are relevant


Scenario 2

Query: “Hillary R. Clinton”, written from 1997 to 2002

Documents about “New York Senator” and “First Lady of the United States” are relevant


Challenge

Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time


Contributions

1 Formal models

Wikipedia viewed as a temporal resource

2 Proposed approaches

Discover time-based synonyms over time

Improve the accuracy of time of synonyms

Expand a query using time-based synonyms

3 Experiments

Evaluate extracting and improving time of synonyms

Evaluate query expansion using time-based synonyms


Recognizing named entities

Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Figure: A snapshot of Wikipedia and current revisions at time $t_k$
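Step 1 can be sketched as follows, assuming that a snapshot at time $t_k$ holds, for every page, the latest revision made on or before $t_k$. The tuple layout `(title, revision_date, text)` and both function names are hypothetical; the real pipeline works on the full Wikipedia history dump rather than an in-memory list.

```python
from datetime import date


def snapshot_at(revisions, t_k):
    """Latest revision text of every page on or before t_k."""
    latest = {}
    for title, rev_date, text in revisions:
        if rev_date <= t_k and (title not in latest or rev_date > latest[title][0]):
            latest[title] = (rev_date, text)
    return {title: text for title, (_, text) in latest.items()}


def monthly_snapshots(revisions, start, end):
    """One snapshot per month (granularity g = month) from start to end."""
    revisions = list(revisions)
    snapshots = {}
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        t_k = date(year, month, 1)
        snapshots[t_k] = snapshot_at(revisions, t_k)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return snapshots


# Hypothetical revision history.
history = [
    ("A", date(2001, 2, 15), "v1"),
    ("A", date(2001, 3, 20), "v2"),
    ("B", date(2001, 4, 1), "b1"),
]
snaps = monthly_snapshots(history, date(2001, 3, 1), date(2001, 5, 1))
print(snaps[date(2001, 4, 1)])
```

Re-scanning all revisions for every month keeps the sketch short; a single pass over revisions sorted by date would be the natural choice at Wikipedia scale.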


Example [Bunescu and Paşca, EACL’2006]

1) Multi-word titles in which all content words are capitalized

President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters

UNICEF and WHO are named entities

3) Single-word titles for which at least 75% of occurrences in the article text itself are capitalized
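The three title heuristics can be approximated as below. This is a rough sketch, not the original recognizer: the `FUNCTION_WORDS` list is an assumed stand-in for "non-content words", and sentence-initial capitalization is counted like any other occurrence.

```python
import re

# Assumed list of function words that may stay lowercase inside a title.
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "on", "at", "de", "von"}


def is_named_entity(title, article_text=""):
    """Apply the three capitalization heuristics to a Wikipedia title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: every content word of a multi-word title is capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: a single-word title with at least two capital letters.
    if sum(ch.isupper() for ch in word) >= 2:
        return True
    # Rule 3: at least 75% of occurrences in the article are capitalized.
    pattern = r"\b%s\b" % re.escape(word)
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for o in occurrences if o[0].isupper())
    return capitalized / len(occurrences) >= 0.75


print(is_named_entity("President_of_the_United_States"))
print(is_named_entity("UNICEF"))
```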


Example

$e_i$: President_of_the_United_States

$t_k$: 11/2001

$s_j$: “George W. Bush”

$\xi_{i,j}$: $(e_i, s_j)$, or (President_of_the_United_States, “George W. Bush”)


Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Example: in [[President_of_the_United_States|Barack Obama]], “Barack Obama” is the anchor text linking to the article President_of_the_United_States
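A minimal sketch of the anchor-text extraction, assuming piped wiki links of the form `[[Target|anchor text]]`. The regular expression and the function name are illustrative; they ignore templates, nested markup, and unpiped links, which a real wikitext parser would have to handle.

```python
import re
from collections import defaultdict

# Piped link: [[Target|anchor]] or [[Target#Section|anchor]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]|]+)\]\]")


def extract_synonyms(wikitext, entities):
    """Collect anchor texts that link to pages of known named entities."""
    synonyms = defaultdict(set)
    for target, anchor in WIKILINK.findall(wikitext):
        target = target.strip().replace(" ", "_")
        if target in entities:
            synonyms[target].add(anchor.strip())
    return synonyms


text = (
    "[[President_of_the_United_States|Barack Obama]] met "
    "[[Pope Benedict XVI|Joseph Ratzinger]] in [[Norway]]."
)
entities = {"President_of_the_United_States", "Pope_Benedict_XVI"}
print(dict(extract_synonyms(text, entities)))
```

Running this over every page of a snapshot and merging the resulting sets would give the synonym snapshot for that time point.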


Output: entity-synonym relationships and time periods

Named Entity             Synonym                     Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger   05/2005 - 03/2009
Pope Benedict XVI        Joseph Ratzinger            05/2005 - 03/2009
Pope Benedict XVI        Pope Benedict XVI           05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II     02/2007 - 03/2009
Barack Obama             Sen. Barack Obama           07/2007 - 03/2009
Barack Obama             Senator Barack Obama        05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton             08/2003 - 03/2009
Hillary Rodham Clinton   Sen. Hillary Clinton        03/2007 - 03/2009
Hillary Rodham Clinton   Senator Clinton             11/2007 - 03/2009

The times of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents


Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time

1.8M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg, KDD’2002]

Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams

Output: bursty intervals and bursty weights, i.e., periods of occurrence and intensity
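To make the idea of bursty intervals and weights concrete, here is a simplified two-state sketch in the spirit of Kleinberg's batched model. It is not Kleinberg's implementation: the flat transition cost `gamma`, the rate ratio `s`, and the function names are assumptions. Each time batch has `hits` documents mentioning the synonym out of `totals` documents.

```python
import math


def _cost(p, hits, total):
    """Negative log-likelihood of `hits` in `total` documents at rate p.
    The binomial coefficient is omitted; it is equal for both states."""
    return -(hits * math.log(p) + (total - hits) * math.log(1.0 - p))


def detect_bursts(hits, totals, s=2.0, gamma=1.0):
    """Return (start_index, end_index, weight) for each run in the bursty state."""
    p0 = sum(hits) / sum(totals)              # base rate
    if p0 <= 0.0:
        return []
    p0 = min(p0, 0.9998)
    p1 = min(s * p0, 0.9999)                  # bursty rate
    cost = [0.0, float("inf")]                # the sequence starts in state 0
    back = []
    for r, d in zip(hits, totals):
        c0, c1 = _cost(p0, r, d), _cost(p1, r, d)
        # Moving up to the bursty state costs gamma; moving down is free.
        to0 = min((cost[0], 0), (cost[1], 1))
        to1 = min((cost[0] + gamma, 0), (cost[1], 1))
        back.append((to0[1], to1[1]))
        cost = [to0[0] + c0, to1[0] + c1]
    # Backtrack the cheapest state sequence (Viterbi).
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for ptr in reversed(back):
        states.append(state)
        state = ptr[state]
    states.reverse()
    bursts, start, weight = [], None, 0.0
    for t, (r, d) in enumerate(zip(hits, totals)):
        if states[t] == 1:
            if start is None:
                start, weight = t, 0.0
            weight += _cost(p0, r, d) - _cost(p1, r, d)
        elif start is not None:
            bursts.append((start, t - 1, weight))
            start = None
    if start is not None:
        bursts.append((start, len(hits) - 1, weight))
    return bursts


# Hypothetical monthly counts: a clear burst in batches 3 to 5.
print(detect_bursts([1, 1, 1, 10, 12, 11, 1, 1], [100] * 8))
```

The weight of a burst is the total cost saved by explaining its batches with the bursty rate instead of the base rate, so stronger and longer bursts get larger weights.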


Output: results from the burst-detection algorithm

Synonym            Entity                   Burst Weight   Start     End
President Reagan   Ronald Reagan            5506858        01/1987   02/1989
President Ronald   Ronald Reagan            100401         01/1989   03/1990
President Ronald   Ronald Reagan            67208          07/1990   02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001   10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002   01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003   11/2004


Classifying synonyms into two types

Definition

Class A: time-independent

Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)

E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”

Class B: time-dependent

Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered

E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005


Ranking time-independent synonyms

Definition

Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$

$pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs

$\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$

$\mu$ underlines the relative importance of the temporal feature and the frequency feature

$\mu = 0.5$ yields the best performance in the experiments

Intuition

The model measures the popularity of synonyms based on two factors:

Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is

High usage over time, i.e., a high value of averaged frequencies over time
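A small worked sketch of the TIDP score as defined above. The input layout (a mapping from time partition to term frequency) and the function name are assumptions, and the raw counts are used directly, without any normalization the full method may apply.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * average tf(s_j)."""
    occurring = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(occurring)                 # partitions in which s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(occurring) / pf        # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1 - mu) * avg_tf


# Hypothetical frequencies of one synonym in three monthly partitions.
frequencies = {"05/2005": 4, "06/2005": 2, "07/2005": 0}
print(tidp(frequencies))  # pf = 2, average tf = 3, so 0.5 * 2 + 0.5 * 3 = 2.5
```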


Ranking time-dependent synonyms

Definition

Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$TDP(s_j, t_k) = tf(s_j, t_k)$

$tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition

Only term frequencies are used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
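Ranking by TDP then reduces to sorting the synonyms of one period by term frequency. The sketch below assumes a dictionary keyed by `(synonym, period)`; the function name and the counts are hypothetical.

```python
def rank_time_dependent(tf, t_k, top_k=3):
    """Rank synonyms at period t_k by TDP(s_j, t_k) = tf(s_j, t_k)."""
    scored = [(s, f) for (s, t), f in tf.items() if t == t_k and f > 0]
    # Highest frequency first; ties broken alphabetically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top_k]


# Hypothetical term frequencies per (synonym, period).
counts = {
    ("Senator Clinton", "2003"): 12,
    ("First Lady", "2003"): 3,
    ("Senator Clinton", "1998"): 0,
    ("First Lady", "1998"): 9,
}
print(rank_time_dependent(counts, "2003"))
```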


Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting synonyms and improving the accuracy of their time periods

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms


Extracting and improving time of synonyms

Data collection

The whole history of English Wikipedia
  All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
  4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)

The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools
  MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
  Oracle Berkeley DB version 4.7.25

Burst detection algorithm implemented by Kleinberg
  Number of states: 2
  Ratio of rate of second state to base state: 2
  Ratio of rate of each subsequent state to previous state: 2
  Gamma parameter of the HMM: 1

Measurement: Accuracy


Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier: an open source search engine developed by the University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights

$$q_{exp} = q_{org} \;\; s_1 \wedge w_1 \;\; s_2 \wedge w_2 \;\; \ldots \;\; s_k \wedge w_k$$

Measurement: Mean Average Precision (MAP), R-precision and Recall
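The expansion formula can be sketched as a small string-building helper in Python. This is a sketch under stated assumptions: the name `build_expanded_query`, the caret notation for the boost, and the quoting of multi-word synonyms are illustrative choices, not a confirmed description of how the query was passed to Terrier. The weights in the usage line are invented.

```python
def build_expanded_query(q_org, weighted_synonyms, k):
    """Append the top-k synonyms to the original query, each boosted by its weight.

    weighted_synonyms is a list of (synonym, weight) pairs, e.g. TIDP scores.
    """
    # Keep the k synonyms with the highest weights.
    top = sorted(weighted_synonyms, key=lambda sw: sw[1], reverse=True)[:k]
    parts = [q_org]
    for synonym, weight in top:
        # Assumption: multi-word synonyms are quoted so the boost applies to the phrase.
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:g}")
    return " ".join(parts)


print(build_expanded_query(
    "pope benedict xvi",
    [("Joseph Ratzinger", 0.5), ("Cardinal Ratzinger", 0.8), ("Ratzinger", 0.3)],
    k=2,
))
```

Sorting before truncation ensures that only the k highest-weighted synonyms reach the expanded query.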


Query expansion using time-dependent synonyms

Data collection

NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications

Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries

Named Entity                   Time Period   Synonym
American Broadcasting Company  1995-2000     Disney/ABC
Barack Obama                   2005-2007     Senator Obama
Eminem                         1999-2004     Slim Shady
George H. W. Bush              1988-1992     President George H.W. Bush
George W. Bush                 2000-2007     President George W. Bush
Hillary Rodham Clinton         2001-2007     Senator Clinton
Kmart                          1987-1987     Kresge
Pope Benedict XVI              1988-2005     Cardinal Ratzinger
Ronald Reagan                  1987-1989     Reagan Revolution
Virgin Media                   1999-2002     Telewest Communications

(A temporal query consists of the named entity and the time period.)


Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE          NE-Syn      Avg Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2              51
BPCF-NERW    473,829     488,383     1.0              73

BPF-NERW: Bunescu and Paşca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW restricted to the Categories “people”, “organization” or “company”

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods


Robust2004 query statistics

Two methods for recognizing named entities in queries

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)

k = 2, because k > 2 brings noise to the NERQ process

Number of queries using the two different NER methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250


Query expansion using time-independent synonyms

MAP, R-precision and Recall (* indicates statistically significant at p < 0.05)

             MW-NERQ                          MRW-NERQ
Method       MAP      R-precision   Recall    MAP      R-precision   Recall
PM           0.2889   0.3309        0.6185    0.2455   0.2904        0.5629
PRF          0.3469   0.3711        0.6944    0.3002   0.3227        0.6761
SQE-PRF      0.3608   0.3652        0.7405    0.2507   0.2665        0.5932
SWQE-PRF     0.3653   0.3861        0.7388    0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e. Bose-Einstein 1


Query expansion using time-dependent synonyms

P@10, P@20 and P@30 (* indicates statistically significant at p < 0.05)

Method   P@10     P@20     P@30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search with a Temporal Query, i.e. a keyword $w_q$ and time $t_q$

TSQ: search with a Temporal Query expanded with Synonyms with respect to time $t_q$
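The precision-at-k measure used here can be sketched in a few lines of Python. The helper name `precision_at_k` and the relevance list are illustrative assumptions; the judgments shown are invented and are not the talk's data.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k retrieved documents that are relevant.

    relevance is a ranked list of booleans (True = relevant), best rank first.
    If fewer than k documents were retrieved, the missing ranks count as
    non-relevant, so the denominator stays k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for is_relevant in relevance[:k] if is_relevant) / k


# Illustrative judgments for one query: 2 relevant documents in the top 10.
judged = [True, False, True] + [False] * 7
print(precision_at_k(judged, 10))
```

A score reported over a query set, as in the table, would be the mean of this value over all queries.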


QUEST: Query Expansion using Synonyms over Time


Conclusions and future work

Conclusions
  Extract time-based synonyms from Wikipedia
  Improve the time of synonyms using the NYT corpus
  Perform query expansion using the time-based synonyms
  Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work
  Combine time-dependent synonyms and temporal language models to determine the time of queries
  Exploit temporal information extraction techniques to discover synonyms at particular time points
  Improve temporal text mining/clustering using time-based synonyms


QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you


Problem statement

In recent years, document archives have become publicly available
  E.g. the Internet Archive, digital libraries and news archives

Searching in such resources is not straightforward
  Contents in these resources are strongly time-dependent

Query “Pope Benedict XVI” with dates “before 2005”
  Unable to retrieve documents about “Joseph Alois Ratzinger”
  To improve retrieval effectiveness, query expansion using synonyms with respect to time can be employed


Observation

Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR'2008]

Very dynamic in appearance, i.e. relationships between terms change over time
  E.g. changes of roles, name alterations or semantic shift

Synonyms are different words with similar meanings

In our context, synonyms are terms used as name variants (other names, titles or roles) of a named entity

E.g. “Cardinal Joseph Ratzinger” is a synonym of “Pope Benedict XVI” before 2005


What are time-based synonyms?

Time-independent synonyms are invariant to time

Time-dependent synonyms are relevant to a particular time period, i.e. entity-synonym relationships change over time


Application

News archive search
  Search terms are named entities
  Publication dates of documents are temporal criteria


Scenario 1
  Query “Pope Benedict XVI”, restricted to documents written before 2005
  Documents about “Joseph Alois Ratzinger” are relevant


Scenario 2
  Query “Hillary R. Clinton”, restricted to documents written from 1997 to 2002
  Documents about “New York Senator” and “First Lady of the United States” are relevant


Challenge

Semantic gaps in searching archives, i.e. a lack of knowledge about a query and its synonyms at a particular time


Contributions

1. Formal models
  Wikipedia viewed as a temporal resource

2. Proposed approaches
  Discover time-based synonyms over time
  Improve the accuracy of time of synonyms
  Expand a query using time-based synonyms

3. Experiments
  Evaluate extracting and improving time of synonyms
  Evaluate query expansion using time-based synonyms


Recognizing named entities

Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Figure: A snapshot of Wikipedia and current revisions at time $t_k$


Example: identifying named entity pages [Bunescu and Paşca, EACL'2006]

1) Multi-word titles in which all words are capitalized
  President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters
  UNICEF and WHO are named entities

3) 75% of occurrences in the article text itself are capitalized
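The three title heuristics can be sketched as a single Python predicate. This is a simplified reading of the slide, not the authors' implementation: the name `is_named_entity`, the small `FUNCTION_WORDS` list (added so that titles such as President_of_the_United_States pass rule 1 despite lower-case "of" and "the"), and the way occurrences are counted are all assumptions.

```python
import re

# Assumption: short function words are ignored when checking capitalization.
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "on", "at", "to", "a", "an"}


def is_named_entity(title, article_text=""):
    """Apply the three capitalization heuristics to a Wikipedia page title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    # Rule 1: multi-word title whose content words are all capitalized.
    if len(words) > 1:
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with multiple capital letters.
    if sum(ch.isupper() for ch in word) >= 2:
        return True
    # Rule 3: at least 75% of the occurrences in the article text are capitalized.
    occurrences = re.findall(
        r"\b" + re.escape(word) + r"\b", article_text, flags=re.IGNORECASE
    )
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= 0.75


print(is_named_entity("President_of_the_United_States"))
print(is_named_entity("UNICEF"))
```

Rule 3 needs the article body, so single-word titles without multiple capitals are rejected when no text is supplied.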


Example: an entity-synonym relationship

$e_i$: President_of_the_United_States

$t_k$: 11/2001

$s_j$: “George W. Bush”

$\xi_{i,j} = (e_i, s_j)$, i.e. (President_of_the_United_States, “George W. Bush”)


Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e. a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Example: in the link [[President_of_the_United_States|Barack Obama]], “Barack Obama” is the anchor text linking to the article President_of_the_United_States
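Anchor-text extraction from piped wiki links can be sketched with a regular expression in Python. The function name `extract_entity_synonyms`, the underscore normalization of link targets, and the sample wikitext are assumptions for illustration; real Wikipedia markup has further cases (section links, templates, nested links) that this sketch ignores.

```python
import re

# Matches piped links of the form [[Target|anchor text]].
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def extract_entity_synonyms(wikitext, entities):
    """Return (entity, synonym) pairs for piped links whose target is a known entity.

    entities is a set of page titles written with underscores.
    """
    pairs = []
    for target, anchor in PIPED_LINK.findall(wikitext):
        # Assumption: titles are compared after replacing spaces with underscores.
        entity = target.strip().replace(" ", "_")
        if entity in entities:
            pairs.append((entity, anchor.strip()))
    return pairs


sample = (
    "The [[President_of_the_United_States|Barack Obama]] met [[UNICEF]] staff "
    "and [[Pope Benedict XVI|Cardinal Ratzinger]]."
)
print(extract_entity_synonyms(
    sample, {"President_of_the_United_States", "Pope_Benedict_XVI"}
))
```

Running this over every page of a snapshot and collecting the pairs would give one synonym snapshot for that time point.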


Output: Entity-synonym relationships and time periods

Named Entity             Synonym                     Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger   05/2005 - 03/2009
                         Joseph Ratzinger            05/2005 - 03/2009
                         Pope Benedict XVI           05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II     02/2007 - 03/2009
                         Sen. Barack Obama           07/2007 - 03/2009
                         Senator Barack Obama        05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton             08/2003 - 03/2009
                         Sen. Hillary Clinton        03/2007 - 03/2009
                         Senator Clinton             11/2007 - 03/2009

The times of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents


Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods

1.8M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg, KDD'2002]

Generate bursty periods of $\xi_{i,j}$ by computing the rate of occurrence in document streams

Output: bursty intervals and bursty weights, i.e. periods of occurrence and their intensity


Output: Results from the burst detection algorithm

Synonym            Entity                   Burst Weight   Time Start   Time End
President Reagan   Ronald Reagan            5506858        01/1987      02/1989
President Ronald   Ronald Reagan            100401         01/1989      03/1990
President Ronald   Ronald Reagan            67208          07/1990      02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001      10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002      01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003      11/2004


Classifying synonyms into two types

Definition

Class A: time-independent

Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)

E.g. “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”

Class B: time-dependent

Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered

E.g. “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005


Ranking time-independent synonyms

Definition

Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$

$pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs

$\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$

$\mu$ weights the importance of the temporal feature against the frequency feature

$\mu = 0.5$ yields the best performance in the experiments

Intuition

The model measures the popularity of synonyms based on two factors:

Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is

High usage over time, i.e., a high value of averaged frequencies over time
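
The TIDP mixture can be illustrated with a few lines of Python. This is only a sketch: the function name `tidp`, the partition labels and the counts are hypothetical, and the slide does not say whether pf and the averaged tf are normalized, so raw values are used here as an assumption.

```python
def tidp(tf_by_partition, mu=0.5):
    """Score one synonym from its term frequency in each time partition.

    tf_by_partition maps a partition label to tf(s_j, p_i).
    Assumption: pf and the averaged tf are used unnormalized.
    """
    # Partitions in which the synonym actually occurs.
    occupied = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(occupied)                 # time partition frequency
    if pf == 0:
        return 0.0
    avg_tf = sum(occupied) / pf        # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1 - mu) * avg_tf


# Hypothetical counts: the synonym occurs in 2 of 3 monthly partitions.
counts = {"2005-01": 4, "2005-02": 0, "2005-03": 2}
print(tidp(counts))  # 0.5 * 2 + 0.5 * 3.0 = 2.5
```

With mu = 0.5, a synonym that appears in many partitions and a synonym that appears often in few partitions can reach similar scores, which matches the two factors named in the intuition.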

Ranking time-dependent synonyms

Definition

Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by:

$$TDP(s_j, t_k) = tf(s_j, t_k)$$

$tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition

Only term frequencies are used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
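
A minimal sketch of the TDP ranking follows, assuming term frequencies are already available per synonym and per time period. The function name `rank_time_dependent`, the year labels and all counts are invented for illustration.

```python
def rank_time_dependent(tf_by_synonym, tk, k=3):
    """Return the top-k (synonym, TDP) pairs for time period tk.

    tf_by_synonym maps a synonym to a dict {time period: term frequency},
    so TDP(s_j, t_k) is simply the lookup tf_by_synonym[s_j][t_k].
    """
    scored = [(syn, per_time.get(tk, 0)) for syn, per_time in tf_by_synonym.items()]
    # Synonyms that never occur at tk are not candidates for that period.
    scored = [(syn, weight) for syn, weight in scored if weight > 0]
    # Highest frequency first; ties broken alphabetically for determinism.
    scored.sort(key=lambda sw: (-sw[1], sw[0]))
    return scored[:k]


# Hypothetical frequencies for synonyms of one entity.
tf = {
    "Cardinal Ratzinger": {"2004": 12, "2006": 1},
    "Joseph Ratzinger": {"2004": 7},
    "Pope Benedict XVI": {"2006": 30},
}
print(rank_time_dependent(tf, "2004"))
# [('Cardinal Ratzinger', 12), ('Joseph Ratzinger', 7)]
```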

Overview of experiments

Our experimental evaluation is divided into three main parts:

1 Extracting and improving the accuracy of time of synonyms

2 Query expansion using time-independent synonyms

3 Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection

The whole history of English Wikipedia
All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)

The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools

MWDumper: http://www.mediawiki.org/wiki/Mwdumper
Oracle Berkeley DB version 4.7.25

Burst detection algorithm implemented by Kleinberg
Number of states: 2
Ratio of rate of second state to base state: 2
Ratio of rate of each subsequent state to previous state: 2
Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier: an open source search engine developed by the University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights (each synonym $s_i$ carries its weight $w_i$):

$$q_{exp} = q_{org} \;\; s_1 \wedge w_1 \;\; s_2 \wedge w_2 \;\; \ldots \;\; s_k \wedge w_k$$

Measurement: Mean Average Precision (MAP), R-precision, and Recall
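
The expansion step can be sketched as simple string building. The helper name `expand_query`, the caret rendering of the boost and the example scores are assumptions for illustration. Whether a given search engine accepts a boost on a quoted phrase should be checked against that engine's query language.

```python
def expand_query(q_org, scored_synonyms, k=2):
    """Append the top-k synonyms to the original query, each with its weight.

    scored_synonyms is a list of (synonym, weight) pairs, where the weight
    stands in for the TIDP score of the synonym.
    """
    top = sorted(scored_synonyms, key=lambda sw: -sw[1])[:k]
    # Each synonym is written as a quoted phrase followed by ^weight.
    parts = [q_org] + ['"%s"^%.2f' % (syn, weight) for syn, weight in top]
    return " ".join(parts)


# Hypothetical TIDP scores for three synonyms of one entity.
synonyms = [
    ("Senator Obama", 1.5),
    ("Barack Hussein Obama II", 2.5),
    ("Obama", 0.5),
]
print(expand_query("barack obama", synonyms, k=2))
# barack obama "Barack Hussein Obama II"^2.50 "Senator Obama"^1.50
```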

Query expansion using time-dependent synonyms

Data collection

NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications

Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries

Temporal Query: Named Entity     Temporal Query: Time Period   Synonym
American Broadcasting Company    1995-2000                     Disney/ABC
Barack Obama                     2005-2007                     Senator Obama
Eminem                           1999-2004                     Slim Shady
George H. W. Bush                1988-1992                     President George H.W. Bush
George W. Bush                   2000-2007                     President George W. Bush
Hillary Rodham Clinton           2001-2007                     Senator Clinton
Kmart                            1987-1987                     Kresge
Pope Benedict XVI                1988-2005                     Cardinal Ratzinger
Ronald Reagan                    1987-1989                     Reagan Revolution
Virgin Media                     1999-2002                     Telewest Communications
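
The measurement used for the temporal queries, precision at k retrieved documents, can be sketched as follows. The function name and the document identifiers are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    # Dividing by k (not by the number retrieved) penalizes short result lists.
    return hits / k


# Hypothetical ranking of ten documents, four of them relevant.
ranking = ["d3", "d7", "d1", "d9", "d4", "d8", "d2", "d6", "d5", "d0"]
relevant = {"d3", "d1", "d4", "d5"}
print(precision_at_k(ranking, relevant, 10))  # 0.4
```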

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE          NE-Syn      Avg. Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2               51
BPCF-NERW    473,829     488,383     1.0               73

BPF-NERW: Bunescu and Pasca’s Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of “people”, “organization”, or “company”

Note: 500 randomly selected entity-synonym relationships were used for assessing the accuracy of time periods

Robust2004 query statistics

Two methods for recognizing named entities in queries:

1 Exactly matched Wikipedia page (MW-NERQ)

2 Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)

k = 2; k > 2 brings noise into the NERQ process

Number of queries using the two different NER methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250

Query expansion using time-independent synonyms

MAP, R-precision, and Recall (statistical significance assessed at p < 0.05)

Method     MW-NERQ                          MRW-NERQ
           MAP      R-precision   Recall    MAP      R-precision   Recall
PM         0.2889   0.3309        0.6185    0.2455   0.2904        0.5629
PRF        0.3469   0.3711        0.6944    0.3002   0.3227        0.6761
SQE-PRF    0.3608   0.3652        0.7405    0.2507   0.2665        0.5932
SWQE-PRF   0.3653   0.3861        0.7388    0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1

Query expansion using time-dependent synonyms

P10, P20, and P30 (statistical significance assessed at p < 0.05)

Method   P10      P20      P30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search a Temporal Query, i.e., a keyword $w_q$ and time $t_q$
TSQ: search a Temporal Query and expand with Synonyms w.r.t. time $t_q$

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

Extract time-based synonyms from Wikipedia
Improve the time of synonyms using the NYT corpus
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work

Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

Problem statement

In recent years, document archives have become publicly available
E.g., the Internet Archive, digital libraries, and news archives

Searching in such resources is not straightforward
Contents in these resources are strongly time-dependent

Query “Pope Benedict XVI” with dates “before 2005”
Unable to retrieve documents about “Joseph Alois Ratzinger”
To improve the retrieval effectiveness, query expansion using synonyms w.r.t. time can be employed

Observation

Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR’2008]

Very dynamic in appearance, i.e., relationships between terms change over time
E.g., changes of roles, name alterations, or semantic shift

Synonyms are different words with similar meanings

In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity

E.g., “Cardinal Joseph Ratzinger” is a synonym of “Pope Benedict XVI” before 2005

What are time-based synonyms?

Time-independent synonyms are invariant to time

Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time

Application

News archive search
Search terms are named entities
Publication dates of documents are temporal criteria

Scenario 1
Query “Pope Benedict XVI”, written before 2005
Documents about “Joseph Alois Ratzinger” are relevant

Scenario 2
Query “Hillary R. Clinton”, written from 1997 to 2002
Documents about “New York Senator” and “First Lady of the United States” are relevant

Challenge

Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time

Contributions

1 Formal models
Wikipedia viewed as a temporal resource

2 Proposed approaches
Discover time-based synonyms over time
Improve the accuracy of time of synonyms
Expand a query using time-based synonyms

3 Experiments
Evaluate extracting and improving time of synonyms
Evaluate query expansion using time-based synonyms

Recognizing named entities

Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Figure: A snapshot of Wikipedia and current revisions at time $t_k$

Example [Bunescu and Pasca, EACL’2006]

1) Multi-word titles in which all words are capitalized
President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters
UNICEF and WHO are named entities

3) 75% of occurrences in the article text itself are capitalized

Example

$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: “George W. Bush”
$\xi_{i,j}$: $(e_i, s_j)$ or (President_of_the_United_States, “George W. Bush”)
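
The three title heuristics attributed to Bunescu and Pasca on this slide can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: the function name, the small function-word list (needed so that President_of_the_United_States passes the multi-word rule) and the reading of the third rule as "at least 75%" are choices made here.

```python
import re

# Assumption: function words are ignored when checking capitalization.
FUNCTION_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to"}


def is_named_entity(title, article_text=""):
    """Apply the three title heuristics to one Wikipedia page title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with multiple capital letters.
    if sum(ch.isupper() for ch in word) >= 2:
        return True
    # Rule 3: at least 75% of occurrences in the article text are capitalized.
    pattern = r"\b" + re.escape(word) + r"\b"
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(occ[0].isupper() for occ in occurrences)
    return capitalized / len(occurrences) >= 0.75


print(is_named_entity("President_of_the_United_States"))  # True
print(is_named_entity("UNICEF"))                           # True
print(is_named_entity("Water", "water is wet. Water boils."))  # False
```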

Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Example

[[President_of_the_United_States|Barack Obama]]: “Barack Obama” is the anchor text linking to the article President_of_the_United_States
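
Anchor-text extraction from piped wiki links can be sketched with a regular expression. The helper name `extract_synonyms` and the sample sentence are hypothetical, and the pattern only covers the simple `[[target|anchor]]` form shown in the example, not templates or nested markup.

```python
import re
from collections import defaultdict

# Matches [[Target|anchor text]]; links without a pipe are ignored.
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def extract_synonyms(wikitext, entities):
    """Collect anchor texts that link to known named-entity pages."""
    synonyms = defaultdict(set)
    for target, anchor in PIPED_LINK.findall(wikitext):
        target = target.strip().replace(" ", "_")
        if target in entities:
            synonyms[target].add(anchor.strip())
    return dict(synonyms)


text = "In 2009 [[President_of_the_United_States|Barack Obama]] met [[UNICEF]] staff."
print(extract_synonyms(text, {"President_of_the_United_States"}))
# {'President_of_the_United_States': {'Barack Obama'}}
```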

Output: Entity-synonym relationships and time periods

Named Entity             Synonym                     Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger   05/2005 - 03/2009
Pope Benedict XVI        Joseph Ratzinger            05/2005 - 03/2009
Pope Benedict XVI        Pope Benedict XVI           05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II     02/2007 - 03/2009
Barack Obama             Sen. Barack Obama           07/2007 - 03/2009
Barack Obama             Senator Barack Obama        05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton             08/2003 - 03/2009
Hillary Rodham Clinton   Sen. Hillary Clinton        03/2007 - 03/2009
Hillary Rodham Clinton   Senator Clinton             11/2007 - 03/2009

The times of synonyms are the timestamps of the Wikipedia articles (spanning 8 years) in which they appear, not temporal expressions extracted from the contents
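
One way to picture how snapshot timestamps turn into time periods is the sketch below. It assumes, purely for illustration, that a period runs from the first to the last monthly snapshot containing the relationship and that gaps in between are ignored. The function name and the sample data are hypothetical.

```python
def synonym_periods(snapshots):
    """Map each (entity, synonym) pair to its (first, last) snapshot month.

    snapshots maps a month label 'YYYY-MM' to the set of entity-synonym
    pairs found in that snapshot; the label format sorts chronologically.
    """
    periods = {}
    for month in sorted(snapshots):
        for pair in snapshots[month]:
            # Keep the first month seen; extend the end to the current month.
            first, _ = periods.get(pair, (month, month))
            periods[pair] = (first, month)
    return periods


entity = "Pope Benedict XVI"
snaps = {
    "2005-06": {(entity, "Joseph Ratzinger")},
    "2005-05": {(entity, "Joseph Ratzinger"), (entity, "Cardinal Joseph Ratzinger")},
    "2005-07": {(entity, "Joseph Ratzinger")},
}
print(synonym_periods(snaps)[(entity, "Joseph Ratzinger")])
# ('2005-05', '2005-07')
```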

Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time

1.8M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg, KDD’2002]

Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams

Output bursty intervals and bursty weight, i.e., periods of occurrence and intensity
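
To give a feel for burst detection, here is a simplified two-state sketch over batched monthly counts. It is a reduced variant in the spirit of Kleinberg's method, not the implementation used in the talk: the function name, the binomial cost and the sample counts are assumptions, and it returns only a state per time step, not burst weights.

```python
import math


def two_state_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Label each time step 0 (base) or 1 (burst) with a Viterbi search.

    relevant[t] is the number of documents mentioning the synonym at step t,
    totals[t] the number of all documents. Assumes sum(relevant) > 0.
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)      # base rate
    p1 = min(s * p0, 0.9999)              # burst rate, s times higher
    enter_burst = gamma * math.log(n)     # cost of moving from state 0 to 1

    def cost(p, r, d):
        # Negative log-likelihood of r hits in d documents; the binomial
        # coefficient is dropped because it is equal for both states.
        return -(r * math.log(p) + (d - r) * math.log(1 - p))

    best = [0.0, float("inf")]            # start in the base state
    back = []
    for r, d in zip(relevant, totals):
        new_best, pointers = [], []
        for state, p in enumerate((p0, p1)):
            candidates = [best[q] + (enter_burst if state > q else 0.0) for q in (0, 1)]
            q = 0 if candidates[0] <= candidates[1] else 1
            new_best.append(candidates[q] + cost(p, r, d))
            pointers.append(q)
        best = new_best
        back.append(pointers)

    # Trace the cheapest state sequence backwards.
    state = 0 if best[0] <= best[1] else 1
    states = []
    for pointers in reversed(back):
        states.append(state)
        state = pointers[state]
    states.reverse()
    return states


# Hypothetical stream: a synonym spikes in months 4 to 6.
hits = [1, 1, 1, 10, 12, 11, 1, 1, 1]
docs = [100] * 9
print(two_state_bursts(hits, docs))  # [0, 0, 0, 1, 1, 1, 0, 0, 0]
```

Consecutive steps labeled 1 form a bursty interval, which is the kind of start and end period the output of this stage reports.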

Improving the accuracy of time using burst detection

Output: results from the burst-detection algorithm

Synonym           Entity                  Burst Weight  Time Start  Time End
President Reagan  Ronald Reagan           5506858       01/1987     02/1989
President Ronald  Ronald Reagan           100401        01/1989     03/1990
President Ronald  Ronald Reagan           67208         07/1990     02/1993
Senator Clinton   Hillary Rodham Clinton  18214         01/2001     10/2001
Senator Clinton   Hillary Rodham Clinton  17732         05/2002     01/2003
Senator Clinton   Hillary Rodham Clinton  172356        06/2003     11/2004

Classifying synonyms into two types

Definition

Class A: time-independent

• Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)

• E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”

Class B: time-dependent

• Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered

• E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005

Ranking time-independent synonyms

Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$

• $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs

• $\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$

• $\mu$ weights the relative importance of the temporal feature and the frequency feature

• $\mu = 0.5$ yields the best performance in the experiments

Intuition: The model measures the popularity of synonyms based on two factors:

• Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is

• High usage over time, i.e., a high value of averaged frequencies over time
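The TIDP mixture can be sketched in a few lines of Python. The sketch assumes raw, unnormalized values for both features (the slides do not say whether they are normalized), and the function name `tidp_scores` and the toy frequencies are hypothetical.

```python
def tidp_scores(tf_by_partition, mu=0.5):
    """Score synonyms with TIDP = mu * pf + (1 - mu) * average tf.

    tf_by_partition: dict from synonym to a list of term frequencies,
    one entry per time partition (0 where the synonym does not occur).
    """
    scores = {}
    for synonym, tfs in tf_by_partition.items():
        pf = sum(1 for tf in tfs if tf > 0)   # partitions in which it occurs
        if pf == 0:
            continue                          # never observed, nothing to score
        avg_tf = sum(tfs) / pf                # averaged tf over those partitions
        scores[synonym] = mu * pf + (1 - mu) * avg_tf
    return scores


# Hypothetical term frequencies over four time partitions.
scores = tidp_scores({
    "Barack Hussein Obama II": [3, 4, 5, 4],   # present in every partition
    "Senator Obama": [0, 6, 0, 0],             # present in one partition only
})
```

With these toy numbers the synonym that appears in all four partitions scores 4.0, while the one confined to a single partition scores 3.5, which matches the intuition that robustness over time is rewarded.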

Ranking time-dependent synonyms

Definition: Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$

• $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition: Only term frequencies are used to measure the importance of synonyms

• Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
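Ranking by TDP then reduces to sorting the synonyms observed at $t_k$ by their term frequency. The following Python sketch illustrates this; the function name `rank_time_dependent`, the dictionary layout and the toy counts are assumptions made for the example.

```python
def rank_time_dependent(tf_at_time, t_k, top_k=3):
    """Rank synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k).

    tf_at_time: dict from (synonym, time period) to term frequency.
    """
    scored = [(syn, tf) for (syn, t), tf in tf_at_time.items()
              if t == t_k and tf > 0]
    # Highest term frequency first; ties are broken alphabetically.
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top_k]


# Hypothetical term frequencies for synonyms of "Hillary Rodham Clinton".
tf_at_time = {
    ("First Lady", "1997"): 40,
    ("Senator Clinton", "1997"): 0,
    ("First Lady", "2002"): 5,
    ("Senator Clinton", "2002"): 60,
}
ranked_2002 = rank_time_dependent(tf_at_time, "2002")
```

The same synonym can therefore rank first for one period and drop out entirely for another, which is exactly what makes it time-dependent.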

Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection

• The whole history of English Wikipedia
  • All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
  • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)

• The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools

• MWDumper: http://www.mediawiki.org/wiki/Mwdumper
• Oracle Berkeley DB version 4.7.25
• Burst detection algorithm implemented by Kleinberg
  • Number of states: 2
  • Ratio of rate of second state to base state: 2
  • Ratio of rate of each subsequent state to previous state: 2
  • Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection

• TREC Robust Track (2004)

• 250 topics (topics 301-450 and topics 601-700)

Tools

• Terrier: an open source search engine developed by the University of Glasgow

• BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

• Expand with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights:

$$q_{exp} = q_{org} \;\; s_1 \wedge w_1 \;\; s_2 \wedge w_2 \;\; \ldots \;\; s_k \wedge w_k$$

Measurement: Mean Average Precision (MAP), R-precision and Recall
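The construction of the expanded query can be illustrated with a short Python sketch. It assumes a generic `term^weight` boost notation that mirrors the formula on the slide; the exact syntax accepted by a particular search engine may differ, and the function name `expand_query` and the scores are hypothetical.

```python
def expand_query(q_org, weighted_synonyms, k=2):
    """Append the top-k synonyms to the original query with boost weights.

    weighted_synonyms: dict from synonym to its TIDP score.
    The '"phrase"^weight' form is an assumed, engine-neutral notation.
    """
    top = sorted(weighted_synonyms.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [q_org] + ['"%s"^%.2f' % (syn, w) for syn, w in top]
    return " ".join(parts)


# Hypothetical TIDP scores for synonyms of "Barack Obama".
q_exp = expand_query("Barack Obama", {
    "Senator Obama": 3.5,
    "Barack Hussein Obama II": 4.0,
    "Obama": 1.0,
})
```

Keeping the original query terms unweighted and boosting only the synonyms lets the score decide how much each name variant contributes to the ranking.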

Query expansion using time-dependent synonyms

Data collection

• NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications

• Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries

Named Entity                   Time Period  Synonym
American Broadcasting Company  1995-2000    Disney/ABC
Barack Obama                   2005-2007    Senator Obama
Eminem                         1999-2004    Slim Shady
George H. W. Bush              1988-1992    President George H.W. Bush
George W. Bush                 2000-2007    President George W. Bush
Hillary Rodham Clinton         2001-2007    Senator Clinton
Kmart                          1987-1987    Kresge
Pope Benedict XVI              1988-2005    Cardinal Ratzinger
Ronald Reagan                  1987-1989    Reagan Revolution
Virgin Media                   1999-2002    Telewest Communications

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method  NE         NE-Syn     Avg. Syn per NE  Accuracy (%)
BPF-NERW    2,574,319  3,199,115  1.2              51
BPCF-NERW   473,829    488,383    1.0              73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW restricted to the Categories “people”, “organization” or “company”

Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods

Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)

• k = 2; k > 2 brings noise into the NERQ process

Number of queries using the two different NER methods

Type              MW-NERQ  MRW-NERQ
Named entity      42       149
Not named entity  208      101
Total             250      250

Query expansion using time-independent synonyms

MAP, R-precision and Recall (statistical significance assessed at p < 0.05)

            MW-NERQ                        MRW-NERQ
Method      MAP     R-precision  Recall    MAP     R-precision  Recall
PM          0.2889  0.3309       0.6185    0.2455  0.2904       0.5629
PRF         0.3469  0.3711       0.6944    0.3002  0.3227       0.6761
SQE-PRF     0.3608  0.3652       0.7405    0.2507  0.2665       0.5932
SWQE-PRF    0.3653  0.3861       0.7388    0.2885  0.3080       0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1

Query expansion using time-dependent synonyms

P10, P20 and P30 (statistical significance assessed at p < 0.05)

Method  P10     P20     P30
TQ      0.1000  0.0500  0.0333
TSQ     0.5200  0.3800  0.2800

TQ: search with a Temporal Query, i.e., a keyword $w_q$ and time $t_q$

TSQ: search with a Temporal Query expanded with Synonyms with respect to time $t_q$

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

• Extract time-based synonyms from Wikipedia
• Improve the time of synonyms using NYT
• Perform query expansion using the time-based synonyms
• Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work

• Combine time-dependent synonyms and temporal language models to determine the time of queries
• Exploit temporal information extraction techniques to discover synonyms at particular time points
• Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
    • Synonym Detection
      • Entity Recognition and Synonym Extraction
      • Improving the Accuracy of Time
    • Query Expansion
      • Time-based Synonyms
      • Ranking Time-independent Synonyms
      • Ranking Time-dependent Synonyms
    • Evaluation
      • Experiment Setting
      • Experimental Results
    • Conclusions
      • Conclusions and Future Work

Problem statement

In recent years, document archives have become publicly available

• E.g., the Internet Archive, digital libraries and news archives

Searching in such resources is not straightforward

• Contents in these resources are strongly time-dependent

Query “Pope Benedict XVI” with dates “before 2005”

• Unable to retrieve documents about “Joseph Alois Ratzinger”
• To improve retrieval effectiveness, query expansion using synonyms with respect to time can be employed

Observation

Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson SIGIR'2008]

Very dynamic in appearance, i.e., relationships between terms change over time

• E.g., changes of roles, name alterations or semantic shift

Synonyms are different words with similar meanings

In our context, synonyms are terms used as name variants (other names, titles or roles) of a named entity

E.g., “Cardinal Joseph Ratzinger” is a synonym of “Pope Benedict XVI” before 2005

What are time-based synonyms?

Time-independent synonyms are invariant to time

Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time

Application

News archive search

• Search terms are named entities
• Publication dates of documents are temporal criteria

Scenario 1

• Query “Pope Benedict XVI” and written before 2005
• Documents about “Joseph Alois Ratzinger” are relevant

Scenario 2

• Query “Hillary R. Clinton” and written from 1997 to 2002
• Documents about “New York Senator” and “First Lady of the United States” are relevant

Challenge

Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time

Contributions

1. Formal models
• Wikipedia viewed as a temporal resource

2. Proposed approaches
• Discover time-based synonyms over time
• Improve the accuracy of time of synonyms
• Expand a query using time-based synonyms

3. Experiments
• Evaluate extracting and improving time of synonyms
• Evaluate query expansion using time-based synonyms

Recognizing named entities

Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Figure: A snapshot of Wikipedia and current revisions at time $t_k$

Example [Bunescu and Pasca EACL'2006]

1) Multi-word titles in which all words are capitalized
• President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters
• UNICEF and WHO are named entities

3) At least 75% of occurrences in the article text itself are capitalized
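The three title heuristics above can be sketched in Python as follows. This is an illustrative approximation, not the original implementation: the function name `is_named_entity` is made up, and the small `FUNCTION_WORDS` list is an assumption added so that a title such as President_of_the_United_States passes rule 1 even though "of" and "the" are lowercase.

```python
import re

# Assumption of this sketch: lowercase function words are ignored in rule 1.
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "on", "at", "to", "a", "an"}


def is_named_entity(title, article_text=""):
    """Apply the three capitalization heuristics to a Wikipedia page title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose (content) words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with multiple capital letters.
    if sum(ch.isupper() for ch in word) > 1:
        return True
    # Rule 3: at least 75% of the occurrences in the article text are capitalized.
    pattern = r"\b%s\b" % re.escape(word)
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= 0.75
```

For example, `is_named_entity("UNICEF")` succeeds through rule 2 without looking at the article text, while a single-word title such as "Water" depends entirely on how it is written in the body of its article.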

Example

• $e_i$: President_of_the_United_States

• $t_k$: 11/2001

• $s_j$: “George W. Bush”

• $\xi_{i,j}$: $(e_i, s_j)$, or (President_of_the_United_States, “George W. Bush”)

Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Example: In [[President_of_the_United_States|Barack Obama]], “Barack Obama” is the anchor text linking to the article President_of_the_United_States
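The anchor-text extraction step can be sketched with a regular expression over wiki markup. The sketch below is a minimal illustration that only handles piped links of the form `[[Target|anchor]]`; the function name `extract_synonyms` and the sample text are hypothetical, and real wiki markup has more cases (templates, nested links) than this pattern covers.

```python
import re
from collections import defaultdict

# Matches [[Target|anchor text]], optionally with a #section after the target.
LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]|]+)\]\]")


def extract_synonyms(wikitext, entities):
    """Collect anchor texts of links that point to known named-entity pages.

    entities: set of page titles (with underscores) recognized as named entities.
    Returns a dict from entity title to the set of anchor texts found.
    """
    synonyms = defaultdict(set)
    for target, anchor in LINK.findall(wikitext):
        target = target.strip().replace(" ", "_")
        if target in entities:
            synonyms[target].add(anchor.strip())
    return synonyms


# Hypothetical article text for one snapshot.
text = ("[[President_of_the_United_States|Barack Obama]] met "
        "[[Pope Benedict XVI|Cardinal Ratzinger]] in [[France]].")
found = extract_synonyms(
    text, {"President_of_the_United_States", "Pope_Benedict_XVI"})
```

Running this over every article of a snapshot and merging the results would give one synonym snapshot per time point.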

Extracting synonyms

Output: entity-synonym relationships and time periods

| Named Entity | Synonym | Time Period |
|---|---|---|
| Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009 |
| Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009 |
| Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009 |
| Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009 |
| Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009 |
| Barack Obama | Senator Barack Obama | 05/2006 - 03/2009 |
| Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009 |
| Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009 |
| Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009 |

The time periods of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents.
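One way to turn per-snapshot synonym sets into time periods like those in the table is sketched below. It is an assumption-laden illustration, not the authors' code: the function name `synonym_periods`, the "YYYY-MM" snapshot keys, and the choice to record only the first and last snapshot (ignoring gaps in between) are all made up for this example.

```python
def synonym_periods(snapshots):
    """Map each (entity, synonym) pair to (first_month, last_month).

    snapshots: dict from a 'YYYY-MM' snapshot key to the set of
    (entity, synonym) pairs observed in that snapshot.
    """
    periods = {}
    # 'YYYY-MM' strings sort chronologically, so sorted() gives time order.
    for month in sorted(snapshots):
        for pair in snapshots[month]:
            first, _ = periods.get(pair, (month, month))
            periods[pair] = (first, month)
    return periods

# Toy snapshots, for illustration only.
toy = {
    "2005-05": {("Pope Benedict XVI", "Joseph Ratzinger")},
    "2005-06": {("Pope Benedict XVI", "Joseph Ratzinger"),
                ("Pope Benedict XVI", "Cardinal Joseph Ratzinger")},
}
for pair, span in sorted(synonym_periods(toy).items()):
    print(pair, span)
```

Because the periods come from article timestamps, a synonym that was in use long before Wikipedia existed still starts at its first snapshot, which is the inaccuracy the burst-detection step is meant to reduce.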

Improving the accuracy of time using burst detection

- Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
- 1.8M articles from January 1987 to June 2007 (20 years)
- Use the burst detection algorithm [Kleinberg, KDD'2002]
- Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
- Output: bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
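To make the idea concrete, here is a simplified two-state burst detector over batched counts (for example, monthly counts of NYT articles mentioning a synonym). It is a sketch in the spirit of Kleinberg's batched automaton, not the implementation used in the talk: the function name `two_state_bursts`, the binomial cost, and the toy counts are assumptions, while `s=2` and `gamma=1` mirror the parameter values listed in the experiment setting.

```python
import math

def two_state_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return a list of (start, end, weight) bursts over batched counts.

    relevant[t] = documents mentioning the synonym in batch t
    totals[t]   = all documents in batch t
    """
    n = len(relevant)
    hits, docs = sum(relevant), sum(totals)
    if n == 0 or hits == 0 or hits >= docs:
        return []
    p0 = hits / docs                   # base rate of occurrence
    p1 = min(0.9999, s * p0)           # elevated ("bursty") rate

    def cost(p, r, d):
        # Negative log-likelihood of r hits among d documents at rate p.
        # The binomial coefficient is identical for both states, so it is omitted.
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    enter = gamma * math.log(n)        # price of moving from state 0 up to state 1
    best = [0.0, math.inf]             # the automaton starts in the base state
    back = []
    for r, d in zip(relevant, totals):
        step, ptr = [], []
        for j, p in enumerate((p0, p1)):
            via0 = best[0] + (enter if j == 1 else 0.0)
            via1 = best[1]             # staying in or leaving state 1 is free
            ptr.append(0 if via0 <= via1 else 1)
            step.append(min(via0, via1) + cost(p, r, d))
        back.append(ptr)
        best = step

    # Backtrack the cheapest state sequence.
    state = 0 if best[0] <= best[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    # Collect maximal runs of state 1; the weight is the cost saved by bursting.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start, weight = t, 0.0
            while t < n and states[t] == 1:
                weight += cost(p0, relevant[t], totals[t]) - cost(p1, relevant[t], totals[t])
                t += 1
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts

# Toy monthly counts, for illustration only.
print(two_state_bursts([1, 1, 1, 10, 12, 11, 1, 1], [100] * 8))
```

The returned batch indices would then be mapped back to calendar months to give the start and end of each bursty period, and the weight plays the role of the burst intensity.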

Improving the accuracy of time using burst detection

Output: results from the burst-detection algorithm

| Synonym | Entity | Burst Weight | Start | End |
|---|---|---|---|---|
| President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989 |
| President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990 |
| President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993 |
| Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001 |
| Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003 |
| Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004 |

Classifying synonyms into two types

Definition

Class A: time-independent
- Robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided)
- E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"

Class B: time-dependent
- Related to a particular time in the past, and good synonym candidates for a temporal search, where changes in semantics must be considered
- E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005

Ranking time-independent synonyms

Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$

- $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs
- $\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
- $\mu$ weights the relative importance of the temporal feature and the frequency feature
- $\mu = 0.5$ yields the best performance in the experiments

Intuition: The model measures the popularity of synonyms based on two factors:
- Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
- High usage over time, i.e., a high value of averaged frequencies over time
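The TIDP score can be computed directly from per-partition term frequencies, as in the following sketch. The function name `tidp` and the toy frequencies are assumptions, and the sketch uses raw counts exactly as the formula is written; whether the two features were normalized before mixing is not stated on the slide.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * avg_tf(s_j).

    tf_by_partition: term frequency of the synonym in each time partition,
    with 0 for partitions in which it does not occur.
    """
    occurring = [tf for tf in tf_by_partition if tf > 0]
    pf = len(occurring)                 # number of partitions containing s_j
    if pf == 0:
        return 0.0
    avg_tf = sum(occurring) / pf        # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1 - mu) * avg_tf

# Toy frequencies over four partitions: pf = 3, avg_tf = (2 + 4 + 6) / 3 = 4.
print(tidp([2, 0, 4, 6]))  # 0.5 * 3 + 0.5 * 4 = 3.5
```

A synonym that appears in many partitions with steady usage scores higher than one with a single large spike, which matches the stated intuition.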

Ranking time-dependent synonyms

Definition: Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$

where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.

Intuition:
- Only term frequencies are used to measure the importance of synonyms
- Time partitions are not considered, because only synonyms in a particular time period $t_k$ are of interest
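Ranking by TDP amounts to sorting the synonyms of an entity by their term frequency at the requested time. A minimal sketch follows; the function name `rank_time_dependent`, the nested-dictionary layout, and the toy frequencies are assumptions for illustration.

```python
def rank_time_dependent(tf_at_time, t_k, top_k=3):
    """Rank synonyms by TDP(s_j, t_k) = tf(s_j, t_k), highest first.

    tf_at_time: dict synonym -> dict time key -> term frequency.
    Synonyms that do not occur at t_k are left out.
    """
    scores = {s: by_time.get(t_k, 0) for s, by_time in tf_at_time.items()}
    ranked = sorted((s for s in scores if scores[s] > 0),
                    key=lambda s: (-scores[s], s))  # ties broken alphabetically
    return ranked[:top_k]

# Toy frequencies, for illustration only.
toy = {
    "Cardinal Ratzinger": {"2004": 12, "2006": 1},
    "Pope Benedict XVI": {"2006": 40},
    "Joseph Ratzinger": {"2004": 5, "2006": 3},
}
print(rank_time_dependent(toy, "2004"))
```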

Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection
- The whole history of English Wikipedia
  - All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
  - 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
- The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools
- MWDumper, http://www.mediawiki.org/wiki/Mwdumper
- Oracle Berkeley DB version 4.7.25
- Burst detection algorithm implemented by Kleinberg
  - Number of states: 2
  - Ratio of rate of second state to base state: 2
  - Ratio of rate of each subsequent state to previous state: 2
  - Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection
- TREC Robust Track (2004)
- 250 topics (topics 301-450 and topics 601-700)

Tools
- Terrier, an open source search engine developed by the University of Glasgow
- BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
- Expand with the top-k synonyms $\{s_1, \ldots, s_k\}$, using their TIDP scores as boosting weights (each synonym $s_i$ is boosted by its weight $w_i$):

  q_exp = q_org s1^w1 s2^w2 ... sk^wk

Measurement: Mean Average Precision (MAP), R-precision, and Recall
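Building the expanded query string from TIDP-weighted synonyms could look like the sketch below. The function name `expand_query` and the toy weights are assumptions; the output follows the slide's `s^w` notation, and quoting multi-word synonyms is a choice made for this example that has not been checked against any particular search engine's query syntax.

```python
def expand_query(q_org, weighted_synonyms, k=2):
    """Append the top-k synonyms, each boosted by its weight, to the query.

    weighted_synonyms: dict synonym -> weight (e.g. a TIDP score).
    """
    top = sorted(weighted_synonyms.items(),
                 key=lambda kv: (-kv[1], kv[0]))[:k]   # highest weight first
    parts = [q_org] + ['"%s"^%.2f' % (syn, w) for syn, w in top]
    return " ".join(parts)

# Toy weights, for illustration only.
print(expand_query("Barack Obama",
                   {"Barack Hussein Obama II": 0.9,
                    "Senator Obama": 0.5,
                    "Obama": 0.1}))
```

Limiting the expansion to the top-k synonyms keeps low-weight name variants from drifting the query away from the original entity.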

Query expansion using time-dependent synonyms

Data collection
- NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
- Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries

| Temporal Query: Named Entity | Temporal Query: Time Period | Synonym |
|---|---|---|
| American Broadcasting Company | 1995-2000 | Disney/ABC |
| Barack Obama | 2005-2007 | Senator Obama |
| Eminem | 1999-2004 | Slim Shady |
| George H. W. Bush | 1988-1992 | President George H.W. Bush |
| George W. Bush | 2000-2007 | President George W. Bush |
| Hillary Rodham Clinton | 2001-2007 | Senator Clinton |
| Kmart | 1987-1987 | Kresge |
| Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger |
| Ronald Reagan | 1987-1989 | Reagan Revolution |
| Virgin Media | 1999-2002 | Telewest Communications |

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

| NER Method | NE | NE-Syn | Avg. Syn per NE | Accuracy (%) |
|---|---|---|---|---|
| BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51 |
| BPCF-NERW | 473,829 | 488,383 | 1.0 | 73 |

- BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
- BPCF-NERW: BPF-NERW with only the Categories "people", "organization", or "company"

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods.

Robust2004 query statistics

Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
   - k = 2; k > 2 brings noise into the NERQ process

Number of queries using the two different NER methods

| Type | MW-NERQ | MRW-NERQ |
|---|---|---|
| Named entity | 42 | 149 |
| Not named entity | 208 | 101 |
| Total | 250 | 250 |

Query expansion using time-independent synonyms

MAP, R-precision, and Recall (* indicates statistically significant at p < 0.05)

| Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall |
|---|---|---|---|---|---|---|
| PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629 |
| PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761 |
| SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932 |
| SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504 |

- PM: Probabilistic Model without query expansion
- PRF: Pseudo Relevance Feedback using the Rocchio algorithm
- SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
- SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1

Query expansion using time-dependent synonyms

P@10, P@20, and P@30 (* indicates statistically significant at p < 0.05)

| Method | P@10 | P@20 | P@30 |
|---|---|---|---|
| TQ | 0.1000 | 0.0500 | 0.0333 |
| TSQ | 0.5200 | 0.3800 | 0.2800 |

- TQ: search with a Temporal Query, i.e., a keyword $w_q$ and time $t_q$
- TSQ: search with a Temporal Query and expand with Synonyms w.r.t. time $t_q$
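For reference, precision at k is the fraction of the top-k retrieved documents that are relevant. The sketch below uses a hypothetical function name `precision_at_k` and made-up document ids; it only illustrates how the measure is computed, not how the reported numbers were obtained.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the first k retrieved documents that are relevant."""
    top = ranked_doc_ids[:k]
    return sum(1 for doc in top if doc in relevant_ids) / k

# Made-up ranking in which only one document (id 3) is relevant.
ranking = list(range(30))
for k in (10, 20, 30):
    print(k, round(precision_at_k(ranking, {3}, k), 4))
```

With a single relevant document near the top, the measure falls as 1/10, 1/20, 1/30 when k grows, which is the same pattern as the TQ row above.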

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

- Extract time-based synonyms from Wikipedia
- Improve the time of synonyms using NYT
- Perform query expansion using the time-based synonyms
- Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work
- Combine time-dependent synonyms and temporal language models to determine the time of queries
- Exploit temporal information extraction techniques to discover synonyms at particular time points
- Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

Problem statement

- In recent years, document archives have become publicly available
  - E.g., Internet Archive, digital libraries, and news archives
- Searching in such resources is not straightforward
  - Contents in these resources are strongly time-dependent
- Query: "Pope Benedict XVI" and dates "before 2005"
  - Unable to retrieve documents about "Joseph Alois Ratzinger"
- To improve the retrieval effectiveness, query expansion using synonyms w.r.t. time can be employed

Observation

- Named entities (people, organization, location, etc.) constitute a major fraction of queries [Sanderson, SIGIR'2008]
- Very dynamic in appearance, i.e., relationships between terms change over time
  - E.g., changes of roles, name alterations, or semantic shift
- Synonyms are different words with similar meanings
- In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity
- E.g., "Cardinal Joseph Ratzinger" is a synonym of "Pope Benedict XVI" before 2005

What are time-based synonyms?

- Time-independent synonyms are invariant to time
- Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time

Application

News archive search
- Search terms are named entities
- Publication dates of documents are temporal criteria

Scenario 1
- Query: "Pope Benedict XVI", written before 2005
- Documents about "Joseph Alois Ratzinger" are relevant

Scenario 2
- Query: "Hillary R. Clinton", written from 1997 to 2002
- Documents about "New York Senator" and "First Lady of the United States" are relevant

Challenge

Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time

Contributions

1. Formal models
   - Wikipedia viewed as a temporal resource

2. Proposed approaches
   - Discover time-based synonyms over time
   - Improve the accuracy of time of synonyms
   - Expand a query using time-based synonyms

3. Experiments
   - Evaluate extracting and improving time of synonyms
   - Evaluate query expansion using time-based synonyms

Recognizing named entities

Step 1: Partition Wikipedia according to the time granularity $g$ = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Figure: A snapshot of Wikipedia and current revisions at time $t_k$

Example [Bunescu and Pasca, EACL'2006]

1) Multi-word titles in which all words are capitalized
   - President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters
   - UNICEF and WHO are named entities

3) At least 75% of the title's occurrences in the article text itself are capitalized
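The three title rules above can be sketched as a small predicate. This is only an illustration, not the authors' implementation: the function name, the `FUNCTION_WORDS` list (used so that lowercase words such as "of" and "the" do not block rule 1), and the order in which the rules are tried are assumptions.

```python
import re

# Assumption: short function words are ignored when checking capitalization,
# so that "President_of_the_United_States" passes rule 1.
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "on", "at", "to", "a", "an"}


def is_named_entity_title(title: str, article_text: str = "") -> bool:
    """Apply the three capitalization heuristics to a Wikipedia page title."""
    words = title.replace("_", " ").split()
    if not words:
        return False

    if len(words) > 1:
        # Rule 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        if content and all(w[0].isupper() for w in content):
            return True
    elif sum(ch.isupper() for ch in words[0]) >= 2:
        # Rule 2: single-word title with at least two capital letters.
        return True

    # Rule 3: at least 75% of the title's occurrences in the text are capitalized.
    phrase = " ".join(words)
    pattern = r"\b" + re.escape(phrase) + r"\b"
    hits = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not hits:
        return False
    capitalized = sum(1 for hit in hits if hit[0].isupper())
    return capitalized / len(hits) >= 0.75
```

For instance, `is_named_entity_title("UNICEF")` is accepted by rule 2 without looking at any article text, while a title such as "Water" is decided by how it is written inside its own article.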


Example

- $e_i$: President_of_the_United_States
- $t_k$: 11/2001
- $s_j$: "George W. Bush"
- $\xi_{i,j} = (e_i, s_j)$, or (President_of_the_United_States, "George W. Bush")


Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Example: In [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States.
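Anchor-text extraction as described in the two steps above can be sketched with a regular expression over wikitext. This is a minimal sketch, assuming piped links of the form `[[Target|anchor]]`; the function name and the choice to skip unpiped links (whose anchor equals the title) are assumptions for illustration.

```python
import re
from collections import defaultdict

# Matches piped wiki links [[Target|anchor text]]; an optional #section
# after the target is skipped. Links without a pipe are not matched.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]|]+)\]\]")


def extract_synonyms(wikitext: str, entities: set) -> dict:
    """Collect anchor texts that link to pages in the given entity set."""
    synonyms = defaultdict(set)
    for target, anchor in WIKILINK.findall(wikitext):
        entity = target.strip().replace(" ", "_")
        if entity in entities:
            synonyms[entity].add(anchor.strip())
    return dict(synonyms)
```

Running this over every article of one snapshot and merging the results would give a synonym snapshot for that time point.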


Output: Entity-synonym relationships and time periods

| Named Entity | Synonym | Time Period |
|---|---|---|
| Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009 |
| Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009 |
| Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009 |
| Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009 |
| Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009 |
| Barack Obama | Senator Barack Obama | 05/2006 - 03/2009 |
| Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009 |
| Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009 |
| Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009 |

Note: The time periods of synonyms are the timestamps of the Wikipedia articles (covering 8 years) in which they appear, not temporal expressions extracted from the contents.
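One way to derive such time periods from monthly synonym snapshots is to record the first and last snapshot in which each entity-synonym pair appears. The sketch below assumes snapshots keyed by sortable "YYYY-MM" strings and ignores gaps inside a period; both the data layout and the function name are hypothetical.

```python
def synonym_time_periods(snapshots: dict) -> dict:
    """Map each (entity, synonym) pair to (first_month, last_month).

    `snapshots` maps a "YYYY-MM" string to the set of (entity, synonym)
    pairs found in the Wikipedia snapshot of that month.
    """
    periods = {}
    for month in sorted(snapshots):
        for pair in snapshots[month]:
            # Keep the earliest month seen so far, extend the end to this month.
            first, _ = periods.get(pair, (month, month))
            periods[pair] = (first, month)
    return periods
```

Because the months are visited in sorted order, the second element of each tuple always ends up as the latest snapshot containing the pair.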


Improving the accuracy of time using burst detection

- Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
  - 1.8M articles from January 1987 to June 2007 (20 years)
- Use the burst detection algorithm [Kleinberg, KDD'2002]
  - Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
  - Output: bursty intervals and bursty weights, i.e., periods of occurrence and intensity
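To make the idea of bursty intervals and weights concrete, here is a simplified two-state sketch in the spirit of the batched variant of Kleinberg's burst model. It is not the implementation used for the results in these slides: the function name, the per-month batching, and the input counts are assumptions, and it requires at least one relevant document overall.

```python
import math


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Two-state burst detection over batches (e.g., months).

    relevant[t]: documents in batch t mentioning the synonym
    totals[t]:   all documents in batch t
    Returns (states, bursts); bursts are (start, end, weight) tuples.
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)      # base rate (must be > 0)
    p1 = min(s * p0, 0.9999)              # elevated rate in the burst state
    probs = (p0, p1)

    def sigma(state, r, d):
        # Negative log-likelihood of r hits in d documents; the binomial
        # coefficient is omitted because it is identical for both states.
        p = probs[state]
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    up_cost = gamma * math.log(n)         # cost of entering the burst state
    best = [0.0, math.inf]                # the sequence starts in state 0
    back = []
    for r, d in zip(relevant, totals):
        new_best, pointers = [], []
        for state in (0, 1):
            cands = [best[prev] + (up_cost if prev == 0 and state == 1 else 0.0)
                     for prev in (0, 1)]
            prev = 0 if cands[0] <= cands[1] else 1
            new_best.append(cands[prev] + sigma(state, r, d))
            pointers.append(prev)
        best = new_best
        back.append(pointers)

    # Trace the cheapest state sequence backwards.
    state = 0 if best[0] <= best[1] else 1
    states = []
    for pointers in reversed(back):
        states.append(state)
        state = pointers[state]
    states.reverse()

    # Group consecutive burst batches; weight = cost saved versus state 0.
    bursts, start = [], None
    for t, st in enumerate(states + [0]):
        if st == 1 and start is None:
            start = t
        elif st == 0 and start is not None:
            weight = sum(sigma(0, relevant[i], totals[i]) - sigma(1, relevant[i], totals[i])
                         for i in range(start, t))
            bursts.append((start, t - 1, weight))
            start = None
    return states, bursts
```

Each returned burst gives the indexes of its first and last batch, which map back to calendar months, plus a weight that grows with how strongly the synonym's rate exceeds the base rate.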


Output: Results from the burst-detection algorithm

| Synonym | Named Entity | Burst Weight | Start | End |
|---|---|---|---|---|
| President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989 |
| President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990 |
| President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993 |
| Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001 |
| Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003 |
| Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004 |


Classifying synonyms into two types

Definition

Class A: time-independent
- Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)
- E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"

Class B: time-dependent
- Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered
- E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005


Ranking time-independent synonyms

Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$

- $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs.
- $\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$.
- $\mu$ controls the relative importance of the temporal feature and the frequency feature.
- $\mu = 0.5$ yields the best performance in the experiments.

Intuition: The model measures the popularity of synonyms based on two factors:
- Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is.
- High usage over time, i.e., a high value of averaged frequencies over time.
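The mixture score can be computed directly from a synonym's term frequency in each time partition. A minimal sketch, assuming `pf` is the count of partitions with a non-zero frequency and that no further normalization is applied (the slide does not specify any); the function name is hypothetical.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP score from a list of per-partition term frequencies."""
    occurring = [tf for tf in tf_by_partition if tf > 0]
    pf = len(occurring)                 # partitions in which the synonym occurs
    if pf == 0:
        return 0.0                      # synonym never occurs
    avg_tf = sum(occurring) / pf        # averaged tf over those partitions
    return mu * pf + (1 - mu) * avg_tf
```

As a worked case, frequencies of 4, 2 and 0 over three partitions give pf = 2 and an averaged tf of 3, so with mu = 0.5 the score is 0.5 * 2 + 0.5 * 3 = 2.5.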


Ranking time-dependent synonyms

Definition: Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$

where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.

Intuition:
- Only term frequencies are used to measure the importance of synonyms.
- Time partitions are not considered, because only synonyms in a particular time period $t_k$ are of interest.
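Ranking by TDP then amounts to sorting the synonyms by their term frequency at the query time. The sketch below assumes a nested dictionary of frequencies keyed by synonym and time period; the function name and the sample numbers in the usage note are made up for illustration.

```python
def rank_time_dependent(tf_at_time, t_k, top_k=None):
    """Rank synonyms by TDP(s_j, t_k) = tf(s_j, t_k), highest first.

    tf_at_time maps synonym -> {time period -> term frequency}.
    Synonyms that do not occur at t_k are dropped.
    """
    scored = [(syn, by_time.get(t_k, 0)) for syn, by_time in tf_at_time.items()]
    scored = [(syn, tf) for syn, tf in scored if tf > 0]
    # Sort by descending frequency, then alphabetically for stable ties.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

With hypothetical counts such as `{"Cardinal Ratzinger": {"2004": 12}, "Pope Benedict XVI": {"2006": 40}}`, a query for "2004" would return only "Cardinal Ratzinger".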


Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms


Extracting and improving time of synonyms

Data collection
- The whole history of English Wikipedia
  - All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
  - 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
- New York Times Annotated Corpus, containing over 1.8 million articles from January 1987 to June 2007

Tools
- MWDumper: http://www.mediawiki.org/wiki/Mwdumper
- Oracle Berkeley DB version 4.7.25
- Burst detection algorithm implemented by Kleinberg
  - Number of states: 2
  - Ratio of rate of second state to base state: 2
  - Ratio of rate of each subsequent state to previous state: 2
  - Gamma parameter of the HMM: 1

Measurement: Accuracy


Query expansion using time-independent synonyms

Data collection
- TREC Robust Track (2004)
- 250 topics (topics 301-450 and topics 601-700)

Tools
- Terrier: an open source search engine developed by the University of Glasgow
- BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
- Expand with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:

  q_exp = q_org s_1^w_1 s_2^w_2 ... s_k^w_k

  where s^w boosts term s by weight w.

Measurement: Mean Average Precision (MAP), R-precision and Recall
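The expanded query can be assembled as a plain string in which each synonym term carries its weight after a caret. This is a sketch only: the function name is hypothetical, and splitting multi-word synonyms into individually boosted terms and rounding weights to two decimals are assumptions, not something the slides specify.

```python
def expand_query(original_query, weighted_synonyms, k=3):
    """Append the top-k synonyms, each term boosted as term^weight.

    weighted_synonyms maps synonym -> score (e.g., a TIDP score).
    """
    top = sorted(weighted_synonyms.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [original_query]
    for synonym, weight in top:
        for term in synonym.split():
            parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)
```

Keeping the original query terms unboosted means the synonyms only add evidence; how the weights should be scaled relative to the original terms is left open here.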


Query expansion using time-dependent synonyms

Data collection
- NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
- Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries (a temporal query consists of a named entity and a time period)

| Named Entity | Time Period | Synonym |
|---|---|---|
| American Broadcasting Company | 1995-2000 | Disney/ABC |
| Barack Obama | 2005-2007 | Senator Obama |
| Eminem | 1999-2004 | Slim Shady |
| George H. W. Bush | 1988-1992 | President George H.W. Bush |
| George W. Bush | 2000-2007 | President George W. Bush |
| Hillary Rodham Clinton | 2001-2007 | Senator Clinton |
| Kmart | 1987-1987 | Kresge |
| Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger |
| Ronald Reagan | 1987-1989 | Reagan Revolution |
| Virgin Media | 1999-2002 | Telewest Communications |


Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

| NER Method | NE | NE-Syn | Avg. Syn per NE | Accuracy (%) |
|---|---|---|---|---|
| BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51 |
| BPCF-NERW | 473,829 | 488,383 | 1.0 | 73 |

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2.

BPCF-NERW: BPF-NERW with only the Categories of "people", "organization" or "company".

Note: 500 randomly selected entity-synonym relationships were used for assessing the accuracy of time periods.


Robust2004 query statistics

Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ), with k = 2; k > 2 brings noise to the NERQ process

Number of queries using the two different NER methods

| Type | MW-NERQ | MRW-NERQ |
|---|---|---|
| Named entity | 42 | 149 |
| Not named entity | 208 | 101 |
| Total | 250 | 250 |


Query expansion using time-independent synonyms

MAP, R-precision and Recall (statistical significance tested at p < 0.05)

| Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall |
|---|---|---|---|---|---|---|
| PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629 |
| PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761 |
| SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932 |
| SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504 |

- PM: Probabilistic Model without query expansion
- PRF: Pseudo Relevance Feedback using the Rocchio algorithm
- SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
- SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1.


Query expansion using time-dependent synonyms

P@10, P@20 and P@30 (statistical significance tested at p < 0.05)

| Method | P@10 | P@20 | P@30 |
|---|---|---|---|
| TQ | 0.1000 | 0.0500 | 0.0333 |
| TSQ | 0.5200 | 0.3800 | 0.2800 |

- TQ: search a Temporal Query, i.e., a keyword $w_q$ and time $t_q$
- TSQ: search a Temporal Query and expand it with Synonyms with respect to time $t_q$
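Precision at a cutoff k, the measure reported above, is the fraction of the first k retrieved documents that are relevant. A minimal sketch with hypothetical document identifiers; note that it divides by k even when fewer than k documents were retrieved.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    top = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant_ids)
    return hits / k
```

For a single query with exactly one relevant document inside the top 10, this gives 1/10, 1/20 and 1/30 at the three cutoffs; the reported figures would be averages of such values over all queries.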


QUEST Query Expansion using Synonyms over Time


Conclusions and future work

- Extract time-based synonyms from Wikipedia
- Improve time of synonyms using NYT
- Perform query expansion using the time-based synonyms
- Conduct extensive experiments showing a significant increase in retrieval effectiveness
- Future work
  - Combine time-dependent synonyms and temporal language models to determine time of queries
  - Exploit temporal information extraction techniques to discover synonyms at particular time points
  - Improve temporal text mining/clustering using time-based synonyms


QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you


Problem statement

- In recent years, document archives have become publicly available, e.g., the Internet Archive, digital libraries, and news archives.
- Searching in such resources is not straightforward: contents in these resources are strongly time-dependent.
- Example: the query "Pope Benedict XVI" with dates "before 2005" is unable to retrieve documents about "Joseph Alois Ratzinger".
- To improve retrieval effectiveness, query expansion using synonyms with respect to time can be employed.


Observation

- Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR'2008].
- They are very dynamic in appearance, i.e., relationships between terms change over time, e.g., changes of roles, name alterations, or semantic shift.
- Synonyms are different words with similar meanings.
- In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity.
- E.g., "Cardinal Joseph Ratzinger" is a synonym of "Pope Benedict XVI" before 2005.


What are time-based synonyms?

- Time-independent synonyms are invariant to time.
- Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time.


Application

News archive search
- Search terms are named entities
- Publication dates of documents are temporal criteria


Scenario 1
- Query: "Pope Benedict XVI", written before 2005
- Documents about "Joseph Alois Ratzinger" are relevant


Scenario 2
- Query: "Hillary R. Clinton", written from 1997 to 2002
- Documents about "New York Senator" and "First Lady of the United States" are relevant


Challenge

Semantic gaps in searching archives, i.e., a lack of knowledge about a query and its synonyms at a particular time


Contributions

1. Formal models
   - Wikipedia viewed as a temporal resource

2. Proposed approaches
   - Discover time-based synonyms over time
   - Improve the accuracy of time of synonyms
   - Expand a query using time-based synonyms

3. Experiments
   - Evaluate extracting and improving time of synonyms
   - Evaluate query expansion using time-based synonyms

Recognizing named entities

Step 1: Partition Wikipedia according to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Figure: A snapshot of Wikipedia and current revisions at time $t_k$.
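Step 1 can be pictured as picking, for each page, the revision that was current at the snapshot time. The following is a minimal sketch under assumptions: the `(title, timestamp, text)` tuple layout and the function name `snapshot_at` are hypothetical and are not taken from the slides.

```python
from datetime import datetime

def snapshot_at(revisions, tk):
    """Return {page title: text} of the revision current at time tk.

    `revisions` is an iterable of (title, timestamp, text) tuples; this
    layout is an assumption made for illustration only.
    """
    current = {}
    for title, ts, text in revisions:
        # Keep the latest revision that is not newer than the snapshot time.
        if ts <= tk and (title not in current or ts > current[title][0]):
            current[title] = (ts, text)
    return {title: text for title, (ts, text) in current.items()}

revisions = [
    ("A", datetime(2001, 3, 5), "a1"),
    ("A", datetime(2001, 4, 20), "a2"),
    ("B", datetime(2001, 5, 2), "b1"),
]
april = snapshot_at(revisions, datetime(2001, 4, 1))
may = snapshot_at(revisions, datetime(2001, 5, 1))
```

Calling the function once per month boundary would give the sequence of snapshots described in Step 1.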

Example [Bunescu and Pasca, EACL'2006]

1) Multi-word titles in which all words are capitalized
  • President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters
  • UNICEF and WHO are named entities

3) 75% of the title's occurrences in the article text itself are capitalized
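The three title heuristics above can be sketched as a single predicate. This is an illustrative sketch, not the authors' implementation. Two assumptions are made: a small `FUNCTION_WORDS` list lets "of" and "the" stay lowercase in rule 1, as the President_of_the_United_States example requires, and rule 3 is read as "at least 75%".

```python
# Hypothetical stop list: function words are ignored when checking rule 1.
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "on", "a", "an", "to"}

def is_named_entity(title, occurrences=()):
    """Apply the three title heuristics to a Wikipedia page title.

    `occurrences` holds the surface forms of the title found in the
    article text; it is only needed for rule 3.
    """
    words = title.replace("_", " ").split()
    if len(words) > 1:
        # Rule 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        if content and all(w[0].isupper() for w in content):
            return True
    elif len(words) == 1 and sum(c.isupper() for c in words[0]) >= 2:
        # Rule 2: single-word title with multiple capital letters.
        return True
    if occurrences:
        # Rule 3: at least 75% of in-text occurrences are capitalized.
        capitalized = sum(1 for o in occurrences if o[:1].isupper())
        return capitalized / len(occurrences) >= 0.75
    return False
```

For instance, `is_named_entity("UNICEF")` is accepted by rule 2, while a title such as `List_of_rivers` fails rule 1 and is rejected unless its in-text occurrences satisfy rule 3.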

Example

$e_i$: President_of_the_United_States

$t_k$: 11/2001

$s_j$: "George W. Bush"

$\xi_{i,j} = (e_i, s_j)$, or (President_of_the_United_States, "George W. Bush")

Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.

Step 2: Accumulate the entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States.
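The anchor-text step can be sketched with a regular expression over wikitext. This is a simplified illustration under assumptions: only piped links of the form `[[Target|anchor]]` are handled, and the function name `extract_synonyms` and its return layout are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Matches piped wiki links: [[Target_page|anchor text]]
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")

def extract_synonyms(wikitext, entities):
    """Count anchor texts that link to pages recognized as named entities."""
    synonyms = defaultdict(Counter)
    for target, anchor in PIPED_LINK.findall(wikitext):
        target = target.strip().replace(" ", "_")
        if target in entities:
            synonyms[target][anchor.strip()] += 1
    return synonyms

text = (
    "In 2009 [[President_of_the_United_States|Barack Obama]] met "
    "[[UNICEF|the UN children's fund]] in [[Norway]]."
)
found = extract_synonyms(text, {"President_of_the_United_States"})
```

Running this over every article in a snapshot and merging the counters would yield one synonym snapshot per time point.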

Extracting synonyms

Output: entity-synonym relationships and time periods

Named Entity             Synonym                     Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger   05/2005 - 03/2009
                         Joseph Ratzinger            05/2005 - 03/2009
                         Pope Benedict XVI           05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II     02/2007 - 03/2009
                         Sen. Barack Obama           07/2007 - 03/2009
                         Senator Barack Obama        05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton             08/2003 - 03/2009
                         Sen. Hillary Clinton        03/2007 - 03/2009
                         Senator Clinton             11/2007 - 03/2009

The time periods of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents.

Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods

1.8M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg, KDD'2002]

Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams

Output: bursty intervals and bursty weights, i.e., periods of occurrence and intensity
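To make the idea concrete, here is a minimal two-state sketch in the spirit of Kleinberg's batched burst model. It is not the implementation used in the talk. It assumes per-period counts of documents mentioning the synonym (`relevant`) and of all documents (`total`), an overall mention rate strictly between 0 and 1, and hypothetical parameter names `s` (rate ratio) and `gamma` (transition cost factor).

```python
import math

def detect_bursts(relevant, total, s=2.0, gamma=1.0):
    """Return (start, end, weight) tuples for periods in the burst state."""
    n = len(relevant)
    p0 = sum(relevant) / sum(total)   # base rate of mentions
    p1 = min(p0 * s, 0.9999)          # elevated (bursty) rate
    rates = (p0, p1)

    def cost(state, r, d):
        # Negative log binomial likelihood; the binomial coefficient is
        # the same for both states and is therefore dropped.
        p = rates[state]
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    enter = gamma * math.log(n)       # cost of moving up into the burst state
    best = [0.0, math.inf]            # the stream starts in the base state
    back = []
    for r, d in zip(relevant, total):
        new_best, pointers = [], []
        for j in (0, 1):
            options = [best[i] + (enter if j > i else 0.0) for i in (0, 1)]
            i = 0 if options[0] <= options[1] else 1
            new_best.append(options[i] + cost(j, r, d))
            pointers.append(i)
        best = new_best
        back.append(pointers)

    # Trace back the cheapest state sequence (Viterbi decoding).
    state = 0 if best[0] <= best[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    bursts, start, weight = [], None, 0.0
    for t, st in enumerate(states):
        if st == 1:
            if start is None:
                start, weight = t, 0.0
            # Weight: how much cheaper the burst state explains the data.
            weight += cost(0, relevant[t], total[t]) - cost(1, relevant[t], total[t])
        elif start is not None:
            bursts.append((start, t - 1, weight))
            start = None
    if start is not None:
        bursts.append((start, n - 1, weight))
    return bursts

bursts = detect_bursts([1, 1, 1, 10, 12, 11, 1, 1], [100] * 8)
```

In this toy stream, the three periods with elevated counts are grouped into a single bursty interval whose weight reflects the intensity of the burst.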

Improving the accuracy of time using burst detection

Output: results from the burst-detection algorithm

Synonym            Entity                   Burst Weight   Start     End
President Reagan   Ronald Reagan            5506858        01/1987   02/1989
President Ronald   Ronald Reagan            100401         01/1989   03/1990
President Ronald   Ronald Reagan            67208          07/1990   02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001   10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002   01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003   11/2004

Classifying synonyms into two types

Definition

Class A: time-independent
  • Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)
  • E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"

Class B: time-dependent
  • Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered
  • E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005

Ranking time-independent synonyms

Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$

  • $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs
  • $\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
  • $\mu$ balances the importance of the temporal feature against the frequency feature
  • $\mu = 0.5$ yields the best performance in the experiments

Intuition: The model measures the popularity of synonyms based on two factors:

  • Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
  • High usage over time, i.e., a high value of averaged frequencies over time
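The TIDP mixture can be sketched directly from the definition. The input layout (`{synonym: {partition: term frequency}}`) and the function name are assumptions for illustration. The slide does not say whether pf and tf are normalized before mixing, so this sketch uses raw values.

```python
def tidp_scores(tf_by_partition, mu=0.5):
    """Score each synonym by mu * pf + (1 - mu) * averaged tf.

    `tf_by_partition` maps a synonym to {partition id: term frequency};
    this layout is hypothetical. Raw, unnormalized values are used.
    """
    scores = {}
    for synonym, tfs in tf_by_partition.items():
        occurring = [f for f in tfs.values() if f > 0]
        pf = len(occurring)                       # partitions where it occurs
        avg_tf = sum(occurring) / pf if pf else 0.0
        scores[synonym] = mu * pf + (1 - mu) * avg_tf
    return scores

scores = tidp_scores({
    "Barack Hussein Obama II": {"2007-02": 2, "2007-03": 4},
    "Senator Obama": {"2007-02": 10},
})
```

A synonym that appears in many partitions gains through the pf term, while a synonym that is used heavily in the partitions where it appears gains through the averaged tf term.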

Ranking time-dependent synonyms

Definition: Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$TDP(s_j, t_k) = tf(s_j, t_k)$$

  • $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition: Only term frequencies are used to measure the importance of synonyms.

  • Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
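Since TDP reduces to the term frequency at the requested time, ranking amounts to a filter and a sort. A small sketch, assuming a hypothetical `{synonym: {partition: term frequency}}` layout and the function name `rank_time_dependent`:

```python
def rank_time_dependent(tf_by_partition, tk, top_k=None):
    """Rank the synonyms observed at time partition tk by TDP = tf(s_j, t_k)."""
    scored = [(syn, tfs.get(tk, 0)) for syn, tfs in tf_by_partition.items()]
    scored = [(syn, tf) for syn, tf in scored if tf > 0]
    # Highest frequency first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]

ranking = rank_time_dependent(
    {
        "Senator Clinton": {"2001-10": 7, "2007-01": 3},
        "First Lady": {"1997-05": 9},
        "Hillary Clinton": {"2001-10": 12},
    },
    "2001-10",
)
```

Synonyms that do not occur at the requested time are dropped, which mirrors the intuition that only the period $t_k$ matters.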

Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection

  • The whole history of English Wikipedia
    • All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
    • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
  • The New York Times Annotated Corpus, which contains over 1.8 million articles from January 1987 to June 2007

Tools

  • MWDumper: http://www.mediawiki.org/wiki/Mwdumper
  • Oracle Berkeley DB version 4.7.25
  • Burst detection algorithm implemented by Kleinberg
    • Number of states: 2
    • Ratio of rate of second state to base state: 2
    • Ratio of rate of each subsequent state to previous state: 2
    • Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection

  • TREC Robust Track (2004)
  • 250 topics (topics 301-450 and topics 601-700)

Tools

  • Terrier: an open source search engine developed by the University of Glasgow
  • BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
  • Expand with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:

$$q_{exp} = q_{org}\; s_1 \wedge w_1\; s_2 \wedge w_2\; \ldots\; s_k \wedge w_k$$

where $s_i \wedge w_i$ denotes synonym $s_i$ boosted with weight $w_i$.

Measurement: Mean Average Precision (MAP), R-precision, and Recall
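The expansion step can be sketched as simple string assembly. The quoting of multi-word synonyms, the two-decimal weight format, and the function name `expand_query` are assumptions for illustration; the exact query syntax accepted by the search engine should be checked against its documentation.

```python
def expand_query(q_org, weighted_synonyms, k=10):
    """Append the top-k synonyms to the original query as boosted terms.

    `weighted_synonyms` maps a synonym to its weight (e.g. a TIDP score).
    Each synonym is emitted as "synonym"^weight, a hypothetical rendering
    of the boost notation shown on the slide.
    """
    top = sorted(weighted_synonyms.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    boosted = [f'"{syn}"^{weight:.2f}' for syn, weight in top]
    return " ".join([q_org] + boosted)

q_exp = expand_query(
    "Barack Obama",
    {"Senator Obama": 2.5, "Barack Hussein Obama II": 5.5},
    k=1,
)
```

With `k=1`, only the highest-weighted synonym is appended, so the expanded query stays close to the original while still covering the most popular name variant.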

Query expansion using time-dependent synonyms

Data collection

  • NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
  • Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries

Temporal Query                                  Synonym
Named Entity                    Time Period
American Broadcasting Company   1995-2000       Disney/ABC
Barack Obama                    2005-2007       Senator Obama
Eminem                          1999-2004       Slim Shady
George H. W. Bush               1988-1992       President George H.W. Bush
George W. Bush                  2000-2007       President George W. Bush
Hillary Rodham Clinton          2001-2007       Senator Clinton
Kmart                           1987-1987       Kresge
Pope Benedict XVI               1988-2005       Cardinal Ratzinger
Ronald Reagan                   1987-1989       Reagan Revolution
Virgin Media                    1999-2002       Telewest Communications

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE          NE-Syn      Avg. Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2               51
BPCF-NERW    473,829     488,383     1.0               73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization", or "company"

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods.

Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
  • k = 2; values of k > 2 bring noise into the NERQ process

Number of queries using the two different NER methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250

Query expansion using time-independent synonyms

MAP, R-precision, and Recall (statistical significance is tested at p < 0.05)

Method      MW-NERQ                           MRW-NERQ
            MAP      R-precision   Recall     MAP      R-precision   Recall
PM          0.2889   0.3309        0.6185     0.2455   0.2904        0.5629
PRF         0.3469   0.3711        0.6944     0.3002   0.3227        0.6761
SQE-PRF     0.3608   0.3652        0.7405     0.2507   0.2665        0.5932
SWQE-PRF    0.3653   0.3861        0.7388     0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1

Query expansion using time-dependent synonyms

P10, P20, and P30 (statistical significance is tested at p < 0.05)

Method   P10      P20      P30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search with a Temporal Query, i.e., a keyword $w_q$ and time $t_q$

TSQ: search with a Temporal Query expanded with Synonyms with respect to time $t_q$

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

  • Extract time-based synonyms from Wikipedia
  • Improve the time of synonyms using NYT
  • Perform query expansion using the time-based synonyms
  • Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work

  • Combine time-dependent synonyms and temporal language models to determine the time of queries
  • Exploit temporal information extraction techniques to discover synonyms at particular time points
  • Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

Problem statement

  • In recent years, document archives have become publicly available
    • E.g., Internet Archive, digital libraries, and news archives
  • Searching in such resources is not straightforward
    • Contents in these resources are strongly time-dependent
  • Query: "Pope Benedict XVI" and dates "before 2005"
    • Unable to retrieve documents about "Joseph Alois Ratzinger"
    • To improve the retrieval effectiveness, query expansion using synonyms with respect to time can be employed

Observation

  • Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR'2008]
  • Very dynamic in appearance, i.e., relationships between terms change over time
    • E.g., changes of roles, name alterations, or semantic shift
  • Synonyms are different words with similar meanings
  • In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity
    • E.g., "Cardinal Joseph Ratzinger" is a synonym of "Pope Benedict XVI" before 2005

What are time-based synonyms?

  • Time-independent synonyms are invariant to time
  • Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time

Application

News archive search
  • Search terms are named entities
  • Publication dates of documents are temporal criteria

Scenario 1
  • Query: "Pope Benedict XVI", written before 2005
  • Documents about "Joseph Alois Ratzinger" are relevant

Scenario 2
  • Query: "Hillary R. Clinton", written from 1997 to 2002
  • Documents about "New York Senator" and "First Lady of the United States" are relevant

Challenge

Semantic gaps in searching archives, i.e., a lack of knowledge about a query and its synonyms at a particular time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Contributions

1 Formal modelsWikipedia viewed as a temporal resource

2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms

3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms

Recognizing named entities

Step 1: Partition Wikipedia according to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Figure: A snapshot of Wikipedia and current revisions at time $t_k$.
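Step 1 can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the revision record `(page_title, revision_date, text)` and the function names are hypothetical, and a snapshot at $t_k$ is assumed to hold each page's most recent revision dated on or before $t_k$.

```python
from datetime import date

# Hypothetical revision record: (page_title, revision_date, text).
Revision = tuple[str, date, str]


def snapshot_at(revisions: list[Revision], t_k: date) -> dict[str, str]:
    """Return {page_title: text} using each page's latest revision dated on or before t_k."""
    latest: dict[str, tuple[date, str]] = {}
    for title, rev_date, text in revisions:
        if rev_date > t_k:
            continue  # this revision did not exist yet at time t_k
        if title not in latest or rev_date > latest[title][0]:
            latest[title] = (rev_date, text)
    return {title: text for title, (_, text) in latest.items()}


def monthly_snapshots(revisions: list[Revision], start: date, end: date):
    """Yield (t_k, snapshot) for the first day of every month from start to end."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        t_k = date(year, month, 1)
        yield t_k, snapshot_at(revisions, t_k)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
```

Calling `dict(monthly_snapshots(revisions, date(2001, 3, 1), date(2008, 3, 1)))` would give one snapshot per month, which Steps 2 and 3 then process independently.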

Example [Bunescu and Pasca, EACL’2006]

1) Multi-word titles in which all words are capitalized
  • President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters
  • UNICEF and WHO are named entities
3) 75% of occurrences in the article text itself are capitalized
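The three rules above could be approximated as follows. This is a hedged sketch rather than the cited method: the function name and the `FUNCTION_WORDS` list are assumptions (the list lets a title such as President_of_the_United_States pass rule 1 despite its lowercase "of" and "the"), and rule 3 is read here as "at least 75%".

```python
import re

# Assumption: function words ignored by rule 1 (this list is not given on the slide).
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "a", "an", "on", "to"}


def is_named_entity(title: str, article_text: str = "") -> bool:
    """Apply the three capitalization rules to a Wikipedia page title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with multiple capital letters.
    if sum(ch.isupper() for ch in word) >= 2:
        return True
    # Rule 3: 75% of the title's occurrences in the article text are capitalized.
    occurrences = re.findall(r"\b%s\b" % re.escape(word), article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= 0.75
```

For example, `is_named_entity("UNICEF")` is accepted by rule 2 without looking at the article text, while a single-word title such as "Apple" is decided by rule 3.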

Example

  • $e_i$: President_of_the_United_States
  • $t_k$: 11/2001
  • $s_j$: “George W. Bush”
  • $\xi_{i,j}$: $(e_i, s_j)$, or (President_of_the_United_States, “George W. Bush”)

Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Example: In [[President_of_the_United_States|Barack Obama]], “Barack Obama” is the anchor text linking to the article President_of_the_United_States.
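Anchor-text extraction from piped wiki links can be illustrated with a regular expression. This is a minimal sketch under stated assumptions: the function name is hypothetical, only links of the form `[[Target|anchor]]` are considered, and templates, redirects and nested markup are ignored.

```python
import re
from collections import defaultdict

# Matches piped wiki links of the form [[Target|anchor text]].
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def extract_synonyms(wikitext: str, entities: set[str]) -> dict[str, set[str]]:
    """Collect anchor texts that link to one of the known named-entity pages."""
    synonyms: dict[str, set[str]] = defaultdict(set)
    for target, anchor in PIPED_LINK.findall(wikitext):
        entity = target.strip().replace(" ", "_")
        if entity in entities:
            synonyms[entity].add(anchor.strip())
    return dict(synonyms)
```

Running this over every article in a snapshot and merging the results would yield the synonym snapshot $S_{t_k}$ of Step 2.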

Output: entity-synonym relationships and time periods

Named Entity             Synonym                     Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger   05/2005 - 03/2009
                         Joseph Ratzinger            05/2005 - 03/2009
                         Pope Benedict XVI           05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II     02/2007 - 03/2009
                         Sen. Barack Obama           07/2007 - 03/2009
                         Senator Barack Obama        05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton             08/2003 - 03/2009
                         Sen. Hillary Clinton        03/2007 - 03/2009
                         Senator Clinton             11/2007 - 03/2009

Note: The time periods of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents.
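One way to derive such time periods from the monthly synonym snapshots is to record the first and last snapshot in which each relationship appears. The sketch below assumes exactly that (a single interval per relationship, keyed by hypothetical "YYYY-MM" strings); the slides do not state how gaps between occurrences are handled.

```python
def time_periods(
    snapshots: dict[str, set[tuple[str, str]]]
) -> dict[tuple[str, str], tuple[str, str]]:
    """Map each (entity, synonym) pair to its (first, last) snapshot key.

    `snapshots` maps a sortable time key such as "2005-05" to the set of
    (entity, synonym) pairs found in that snapshot.
    """
    periods: dict[tuple[str, str], tuple[str, str]] = {}
    for t_k in sorted(snapshots):
        for pair in snapshots[t_k]:
            first, _ = periods.get(pair, (t_k, t_k))
            periods[pair] = (first, t_k)  # extend the interval to the current snapshot
    return periods
```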

Improving the accuracy of time using burst detection

  • Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
    • 1.8M articles from January 1987 to June 2007 (20 years)
  • Use the burst detection algorithm [Kleinberg, KDD’2002]
    • Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
    • Output bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
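To make the idea of bursty periods concrete, here is a simplified two-state sketch in the spirit of Kleinberg's batched model. It is not the implementation used in the talk: the function name, the input layout (per-period counts of documents mentioning the synonym and of all documents) and the defaults `s = 2`, `gamma = 1` are illustrative assumptions. Each period is labelled base (0) or bursty (1) by a Viterbi search that trades goodness of fit against the cost of entering the bursty state.

```python
import math


def two_state_bursts(relevant: list[int], total: list[int],
                     s: float = 2.0, gamma: float = 1.0) -> list[int]:
    """Label each time step 0 (base) or 1 (bursty) by minimizing total cost."""
    n = len(relevant)
    if n == 0 or sum(relevant) == 0:
        return [0] * n                       # nothing occurs, so nothing can burst
    p0 = min(sum(relevant) / sum(total), 0.4999)  # base rate of occurrence
    p1 = min(s * p0, 0.9999)                 # elevated rate in the bursty state
    enter_cost = gamma * math.log(n)         # cost of moving from state 0 to state 1

    def fit_cost(p: float, r: int, d: int) -> float:
        # Negative binomial log-likelihood; the binomial coefficient is the
        # same for both states and is therefore omitted.
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    cost = [0.0, math.inf]                   # the sequence starts in the base state
    back = []                                # best predecessor of each state per step
    for r, d in zip(relevant, total):
        stay0, drop = cost[0], cost[1]                # ways to reach state 0 (free)
        rise, stay1 = cost[0] + enter_cost, cost[1]   # ways to reach state 1
        back.append((0 if stay0 <= drop else 1, 0 if rise <= stay1 else 1))
        cost = [min(stay0, drop) + fit_cost(p0, r, d),
                min(rise, stay1) + fit_cost(p1, r, d)]

    # Backtrack from the cheaper final state.
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for prev in reversed(back):
        states.append(state)
        state = prev[state]
    return states[::-1]
```

Maximal runs of 1s correspond to bursty intervals; a burst weight could then be taken as the cost saved by the bursty state over that run, which is one common choice and again an assumption here.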

Output: results from the burst-detection algorithm

Synonym            Entity                   Burst Weight   Start     End
President Reagan   Ronald Reagan            5506858        01/1987   02/1989
President Ronald   Ronald Reagan            100401         01/1989   03/1990
President Ronald   Ronald Reagan            67208          07/1990   02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001   10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002   01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003   11/2004

Classifying synonyms into two types

Definition

Class A: time-independent
  • Robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided)
  • E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”

Class B: time-dependent
  • Related to a particular time in the past, and good synonym candidates for a temporal search where changes in semantics must be considered
  • E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005

Ranking time-independent synonyms

Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$

  • $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs
  • $tf(s_j)$ is the tf of $s_j$ averaged over all time partitions: $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
  • $\mu$ weights the relative importance of the temporal feature and the frequency feature
  • $\mu = 0.5$ yields the best performance in the experiments

Intuition: The model measures the popularity of synonyms based on two factors:
  • Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
  • High usage over time, i.e., a high value of averaged frequency over time
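The mixture can be worked through on a small example. The sketch below uses raw, unnormalized counts, which is an assumption (the slides do not say whether $pf$ and $tf$ are normalized), and the function name and input layout are hypothetical.

```python
def tidp(tf_by_partition: dict[str, int], mu: float = 0.5) -> float:
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * averaged tf(s_j).

    `tf_by_partition` maps a time partition p_i to tf(s_j, p_i).
    """
    occurring = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(occurring)                  # number of partitions in which s_j occurs
    if pf == 0:
        return 0.0                       # the synonym never occurs
    avg_tf = sum(occurring) / pf         # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1 - mu) * avg_tf
```

With made-up counts of 4, 2 and 0 over three partitions, $pf = 2$ and the averaged tf is 3, so $\mu = 0.5$ gives $0.5 \cdot 2 + 0.5 \cdot 3 = 2.5$.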

Ranking time-dependent synonyms

Definition: Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$

  • $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition: Only term frequencies are used to measure the importance of synonyms
  • Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
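Ranking by TDP then reduces to sorting the synonyms observed at $t_k$ by term frequency. A minimal sketch, with a hypothetical function name and input layout:

```python
def rank_time_dependent(tf: dict[tuple[str, str], int], t_k: str,
                        top_k: int = 3) -> list[tuple[str, int]]:
    """Return the top_k synonyms at time t_k ordered by TDP(s_j, t_k) = tf(s_j, t_k).

    `tf` maps (synonym, time period) to a term frequency.
    """
    scored = [(syn, freq) for (syn, t), freq in tf.items() if t == t_k and freq > 0]
    # Highest frequency first; ties are broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top_k]
```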

Overview of experiments

Our experimental evaluation is divided into three main parts:

1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection
  • The whole history of English Wikipedia
    • All pages and revisions from 03/2001 to 03/2008: 85 monthly snapshots (01/03/2001, ..., 01/03/2008), about 2.8 Terabytes
    • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
  • The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools
  • MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
  • Oracle Berkeley DB version 4.7.25
  • Burst detection algorithm implemented by Kleinberg
    • Number of states: 2
    • Ratio of rate of second state to base state: 2
    • Ratio of rate of each subsequent state to previous state: 2
    • Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection
  • TREC Robust Track (2004)
  • 250 topics (topics 301-450 and topics 601-700)

Tools
  • Terrier, an open source search engine developed by the University of Glasgow
  • BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
  • Expand with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:

$$q_{exp} = q_{org} \;\; s_1{\wedge}w_1 \;\; s_2{\wedge}w_2 \;\ldots\; s_k{\wedge}w_k$$

    where $s_i{\wedge}w_i$ denotes synonym $s_i$ boosted by weight $w_i$

Measurement: Mean Average Precision (MAP), R-precision, and Recall
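Building the expanded query string can be sketched as below. The `term^weight` boost notation follows the formula above; quoting multi-word synonyms and the two-decimal weight format are assumptions, and the exact syntax accepted by a particular search engine should be checked against its documentation.

```python
def expand_query(q_org: str, weighted_synonyms: dict[str, float], k: int = 2) -> str:
    """Append the top-k synonyms to the original query, each boosted by its weight."""
    top = sorted(weighted_synonyms.items(), key=lambda item: (-item[1], item[0]))[:k]
    boosted = []
    for synonym, weight in top:
        # Assumption: multi-word synonyms are quoted so the boost applies to the phrase.
        term = '"%s"' % synonym if " " in synonym else synonym
        boosted.append("%s^%.2f" % (term, weight))
    return " ".join([q_org] + boosted)
```

For instance, with made-up weights, `expand_query("Hillary Rodham Clinton", {"Senator Clinton": 2.5, "Hillary Clinton": 4.0, "HRC": 1.0})` keeps the two highest-weighted synonyms and drops "HRC".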

Query expansion using time-dependent synonyms

Data collection
  • NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
  • Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries

Temporal Query                                Synonym
Named Entity                    Time Period
American Broadcasting Company   1995-2000     DisneyABC
Barack Obama                    2005-2007     Senator Obama
Eminem                          1999-2004     Slim Shady
George H. W. Bush               1988-1992     President George H. W. Bush
George W. Bush                  2000-2007     President George W. Bush
Hillary Rodham Clinton          2001-2007     Senator Clinton
Kmart                           1987-1987     Kresge
Pope Benedict XVI               1988-2005     Cardinal Ratzinger
Ronald Reagan                   1987-1989     Reagan Revolution
Virgin Media                    1999-2002     Telewest Communications

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE        NE-Syn    Avg. Syn per NE   Accuracy (%)
BPF-NERW     2574319   3199115   1.2               51
BPCF-NERW    473829    488383    1.0               73

BPF-NERW: Bunescu and Pasca’s Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of “people”, “organization”, or “company”

Note: 500 randomly selected entity-synonym relationships were used for assessing the accuracy of time periods

Robust2004 query statistics

Two methods for recognizing named entities in queries:
1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
  • k = 2, since k > 2 brings noise into the NERQ process

Number of queries using the two different NER methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250

Query expansion using time-independent synonyms

MAP, R-precision, and Recall (statistical significance tested at p < 0.05)

            MW-NERQ                           MRW-NERQ
Method      MAP      R-precision   Recall     MAP      R-precision   Recall
PM          0.2889   0.3309        0.6185     0.2455   0.2904        0.5629
PRF         0.3469   0.3711        0.6944     0.3002   0.3227        0.6761
SQE-PRF     0.3608   0.3652        0.7405     0.2507   0.2665        0.5932
SWQE-PRF    0.3653   0.3861        0.7388     0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1

Query expansion using time-dependent synonyms

P10, P20, and P30 (statistical significance tested at p < 0.05)

Method   P10      P20      P30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search with a Temporal Query, i.e., a keyword $w_q$ and time $t_q$
TSQ: search with a Temporal Query expanded with Synonyms w.r.t. time $t_q$
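Precision at k, the measure reported above, is the fraction of the top-k retrieved documents that are relevant. A minimal sketch with hypothetical document identifiers:

```python
def precision_at_k(ranked_doc_ids: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the first k retrieved documents that are relevant."""
    top = ranked_doc_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant) / k
```

A ranking with a single relevant document in the top 10 and no others gives 1/10, 1/20 and 1/30 at the three cut-offs, the same pattern as the TQ row.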

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

  • Extract time-based synonyms from Wikipedia
  • Improve the time of synonyms using the NYT corpus
  • Perform query expansion using the time-based synonyms
  • Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work
  • Combine time-dependent synonyms and temporal language models to determine the time of queries
  • Exploit temporal information extraction techniques to discover synonyms at particular time points
  • Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

Problem statement

  • In recent years, document archives have become publicly available
    • E.g., the Internet Archive, digital libraries, and news archives
  • Searching in such resources is not straightforward
    • Contents in these resources are strongly time-dependent
  • Query “Pope Benedict XVI” with dates “before 2005”
    • Unable to retrieve documents about “Joseph Alois Ratzinger”
  • To improve retrieval effectiveness, query expansion using synonyms w.r.t. time can be employed

Observation

  • Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR’2008]
  • They are very dynamic in appearance, i.e., relationships between terms change over time
    • E.g., changes of roles, name alterations, or semantic shift
  • Synonyms are different words with similar meanings
  • In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity
    • E.g., “Cardinal Joseph Ratzinger” is a synonym of “Pope Benedict XVI” before 2005

What are time-based synonyms?

  • Time-independent synonyms are invariant to time
  • Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time

Application

News archive search
  • Search terms are named entities
  • Publication dates of documents are temporal criteria

Scenario 1
  • Query: “Pope Benedict XVI”, documents written before 2005
  • Documents about “Joseph Alois Ratzinger” are relevant

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Application

News archive searchSearch terms are named entitiesPublication dates of documents are temporal criteria

Scenario 2Query ldquoHillary R Clintonrdquo and written from 1997 to 2002Documents about ldquoNew York Senatorrdquo and ldquoFirst Ladyof the United Statesrdquo are relevant

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Application

News archive searchSearch terms are named entitiesPublication dates of documents are temporal criteria

Challenge

Semantic gaps in searching archives or a lack of knowledgeabout a query and synonyms at particular time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Contributions

1 Formal models
  Wikipedia viewed as a temporal resource

2 Proposed approaches
  Discover time-based synonyms over time
  Improve the accuracy of time of synonyms
  Expand a query using time-based synonyms

3 Experiments
  Evaluate extracting and improving time of synonyms
  Evaluate query expansion using time-based synonyms

Recognizing named entities

Step 1: Partition Wikipedia according to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$

Figure: A snapshot of Wikipedia and current revisions at time $t_k$
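Step 1 above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes revisions are available as hypothetical `(title, date, text)` tuples and that a snapshot holds the most recent revision of every page at the end of each month.

```python
from datetime import date


def month_index(d):
    """Map a date to a running month number (year * 12 + zero-based month)."""
    return d.year * 12 + (d.month - 1)


def month_label(idx):
    """Inverse of month_index, rendered as 'YYYY-MM'."""
    return f"{idx // 12:04d}-{idx % 12 + 1:02d}"


def build_snapshots(revisions):
    """Build monthly snapshots from (title, date, text) revision tuples.

    Each snapshot maps page title -> text of the latest revision made on or
    before the end of that month (the 'current revision' at that time).
    """
    revs = sorted(revisions, key=lambda r: r[1])
    if not revs:
        return {}
    first, last = month_index(revs[0][1]), month_index(revs[-1][1])
    snapshots, current, pos = {}, {}, 0
    for idx in range(first, last + 1):
        # Apply every revision dated up to and including this month.
        while pos < len(revs) and month_index(revs[pos][1]) <= idx:
            title, _, text = revs[pos]
            current[title] = text
            pos += 1
        snapshots[month_label(idx)] = dict(current)
    return snapshots


revisions = [
    ("A", date(2001, 3, 5), "a1"),
    ("B", date(2001, 3, 20), "b1"),
    ("A", date(2001, 5, 1), "a2"),
]
snapshots = build_snapshots(revisions)
```

A page that is not edited in a given month still appears in that month's snapshot with its last known revision, which matches the idea of "current revisions at time $t_k$".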

Example [Bunescu and Pasca, EACL'2006]
1) Multi-word titles in which all words are capitalized
   President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters
   UNICEF and WHO are named entities
3) At least 75% of occurrences in the article text itself are capitalized
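The three rules above can be approximated in a few lines. The sketch below is illustrative only: the function name, the small `FUNCTION_WORDS` list (needed so that "of" and "the" in President_of_the_United_States do not fail rule 1), and the regular-expression matching are assumptions, not part of the cited method's published code.

```python
import re

# Assumption: lowercase function words are ignored when checking rule 1.
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "a", "an", "on", "at", "to"}


def is_named_entity(title, article_text="", threshold=0.75):
    """Heuristic named-entity test on a Wikipedia page title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: every content word of a multi-word title is capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: a single-word title with more than one capital letter.
    if sum(ch.isupper() for ch in word) > 1:
        return True
    # Rule 3: share of capitalized occurrences in the article text.
    pattern = r"\b" + re.escape(word) + r"\b"
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for o in occurrences if o[0].isupper())
    return capitalized / len(occurrences) >= threshold


examples = {
    "President_of_the_United_States": is_named_entity("President_of_the_United_States"),
    "UNICEF": is_named_entity("UNICEF"),
    "apple": is_named_entity("apple", "An apple is a fruit. Apple trees bear the apple."),
}
```

Sentence-initial capitalization inflates the count in rule 3, so a real system would likely need to discount occurrences at the start of a sentence.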

Example

$e_i$: President_of_the_United_States

$t_k$: 11/2001

$s_j$: “George W. Bush”

$\xi_{ij}$: $(e_i, s_j)$, or (President_of_the_United_States, “George W. Bush”)

Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$

Example
[[President_of_the_United_States|Barack Obama]]: “Barack Obama” is the anchor text linking to the article President_of_the_United_States
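As a rough illustration of the anchor-text step, the sketch below scans wiki markup for piped links of the form `[[Target|anchor]]` and counts each (entity, synonym) pair. The regular expression and the function name are assumptions made for this example; real wiki markup has more link variants than are handled here.

```python
import re
from collections import Counter

# Piped wiki links only: [[Target_page|anchor text]]
LINK_RE = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def extract_synonyms(wikitext, entities):
    """Count (entity, anchor text) pairs for links pointing at known entities."""
    counts = Counter()
    for target, anchor in LINK_RE.findall(wikitext):
        entity = target.strip().replace(" ", "_")
        if entity in entities:
            counts[(entity, anchor.strip())] += 1
    return counts


text = (
    "The [[President_of_the_United_States|Barack Obama]] met the head of "
    "[[UNICEF|the UN children's fund]] in [[Paris]]."
)
pairs = extract_synonyms(text, {"President_of_the_United_States"})
```

The counts double as the term frequencies that the ranking functions later in the talk rely on.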

Output: Entity-synonym relationships and time periods

Named Entity            Synonym                    Time Period
Pope Benedict XVI       Cardinal Joseph Ratzinger  05/2005 - 03/2009
                        Joseph Ratzinger           05/2005 - 03/2009
                        Pope Benedict XVI          05/2005 - 03/2009
Barack Obama            Barack Hussein Obama II    02/2007 - 03/2009
                        Sen. Barack Obama          07/2007 - 03/2009
                        Senator Barack Obama       05/2006 - 03/2009
Hillary Rodham Clinton  Hillary Clinton            08/2003 - 03/2009
                        Sen. Hillary Clinton       03/2007 - 03/2009
                        Senator Clinton            11/2007 - 03/2009

The times of synonyms are timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents
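One simple way to derive such time periods from the monthly synonym snapshots is to record the first and last snapshot in which each pair appears. The sketch below assumes a hypothetical mapping from `"YYYY-MM"` labels to sets of (entity, synonym) pairs, and it assumes a single contiguous period per pair, which the slides do not state.

```python
def synonym_time_periods(synonym_snapshots):
    """Return {(entity, synonym): (first_month, last_month)}.

    synonym_snapshots maps 'YYYY-MM' labels to sets of (entity, synonym)
    pairs; 'YYYY-MM' strings sort chronologically as plain strings.
    """
    periods = {}
    for month in sorted(synonym_snapshots):
        for pair in synonym_snapshots[month]:
            start, _ = periods.get(pair, (month, month))
            periods[pair] = (start, month)
    return periods


snapshots = {
    "2005-05": {("Pope Benedict XVI", "Joseph Ratzinger")},
    "2005-06": {("Pope Benedict XVI", "Joseph Ratzinger"),
                ("Pope Benedict XVI", "Cardinal Joseph Ratzinger")},
    "2005-07": {("Pope Benedict XVI", "Joseph Ratzinger")},
}
periods = synonym_time_periods(snapshots)
```

Because the periods come from article timestamps, they reflect when Wikipedia editors used a synonym rather than when the name was actually in use, which motivates the burst-detection step that follows.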

Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time

1.8M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg, KDD'2002]

Generate bursty periods of $\xi_{ij}$ by computing a rate of occurrence from document streams

Output: bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
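To make the idea concrete, here is a compact two-state sketch in the spirit of Kleinberg's batched burst model. It is not the implementation used in the talk: the batching into per-period counts, the binomial fit cost, and the function name are assumptions chosen for illustration. The defaults `s=2` and `gamma=1` mirror the rate ratio and gamma parameter reported in the experiment settings.

```python
import math


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Two-state burst detection over batched counts.

    relevant[t] = documents mentioning the synonym in period t,
    totals[t]   = all documents in period t.
    Returns a list of (start, end, weight) with inclusive period indices.
    """
    n = len(relevant)
    R, D = sum(relevant), sum(totals)
    if n == 0 or R == 0 or D == 0:
        return []
    p0 = min(R / D, 0.9999)           # base rate
    probs = (p0, min(s * p0, 0.9999))  # burst rate is s times the base rate

    def fit_cost(state, r, d):
        # Negative log-likelihood of r hits out of d under the state's rate.
        p = probs[state]
        log_binom = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
        return -(log_binom + r * math.log(p) + (d - r) * math.log(1 - p))

    up = gamma * math.log(n)  # cost of entering the burst state; leaving is free
    cost, back = [0.0, math.inf], []
    for r, d in zip(relevant, totals):
        new_cost, ptr = [], []
        for state in (0, 1):
            options = [cost[prev] + (up if prev == 0 and state == 1 else 0.0)
                       for prev in (0, 1)]
            best = 0 if options[0] <= options[1] else 1
            new_cost.append(options[best] + fit_cost(state, r, d))
            ptr.append(best)
        cost = new_cost
        back.append(ptr)

    # Backtrack the cheapest state sequence.
    state = 0 if cost[0] <= cost[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    # Collect maximal runs of the burst state and their weights.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start, weight = t, 0.0
            while t < n and states[t] == 1:
                weight += fit_cost(0, relevant[t], totals[t]) - fit_cost(1, relevant[t], totals[t])
                t += 1
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts


bursts = detect_bursts([1, 1, 1, 20, 25, 22, 1, 1, 1, 1], [100] * 10)
```

The weight of a burst is the total reduction in fit cost gained by explaining those periods with the higher rate, so larger weights indicate more intense bursts.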

Output: Results from the burst-detection algorithm

Synonym           Entity                  Burst Weight  Start    End
President Reagan  Ronald Reagan           5506858       01/1987  02/1989
President Ronald  Ronald Reagan           100401        01/1989  03/1990
President Ronald  Ronald Reagan           67208         07/1990  02/1993
Senator Clinton   Hillary Rodham Clinton  18214         01/2001  10/2001
Senator Clinton   Hillary Rodham Clinton  17732         05/2002  01/2003
Senator Clinton   Hillary Rodham Clinton  172356        06/2003  11/2004

Classifying synonyms into two types

Definition

Class A: time-independent
  Robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided)
  E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”

Class B: time-dependent
  Related to a particular time in the past, and good synonym candidates for a temporal search where changes in semantics must be considered
  E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005

Ranking time-independent synonyms

Definition
Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$

$pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs

$tf(s_j)$ is the averaged tf of $s_j$ over all time partitions: $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$

$\mu$ balances the importance of the temporal feature against the frequency feature

$\mu = 0.5$ yields the best performance in the experiments

Intuition
The model measures the popularity of synonyms based on two factors:

Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is

High usage over time, i.e., a high value of averaged frequencies over time
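A direct reading of the TIDP formula can be sketched as below. The input layout (a hypothetical dictionary of per-partition term frequencies) and the function name are assumptions for this example, and the raw values are used without any normalization, since the slide does not specify one.

```python
def tidp_scores(tf_by_partition, mu=0.5):
    """Score synonyms with TIDP = mu * pf + (1 - mu) * averaged tf.

    tf_by_partition maps synonym -> {partition label: term frequency}.
    pf is the number of partitions with a non-zero frequency, and the
    averaged tf is the sum of frequencies divided by pf.
    """
    scores = {}
    for synonym, tfs in tf_by_partition.items():
        occurring = [tf for tf in tfs.values() if tf > 0]
        pf = len(occurring)
        if pf == 0:
            scores[synonym] = 0.0
            continue
        avg_tf = sum(occurring) / pf
        scores[synonym] = mu * pf + (1 - mu) * avg_tf
    return scores


scores = tidp_scores({
    "Barack Hussein Obama II": {"2007-02": 4, "2007-03": 2},
    "Sen. Barack Obama": {"2007-07": 10},
})
```

Without normalization the two features live on different scales, so in practice one would probably rescale pf and the averaged tf to comparable ranges before mixing them.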

Ranking time-dependent synonyms

Definition
Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$TDP(s_j, t_k) = tf(s_j, t_k)$$

$tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition
Only term frequencies are used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
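Since TDP is simply the term frequency at the requested time, ranking reduces to a sort. The sketch below assumes a hypothetical nested dictionary keyed by time period and then by synonym; ties are broken alphabetically only to keep the output deterministic.

```python
def rank_time_dependent(tf_at_time, t_k, top_k=3):
    """Return the top_k (synonym, TDP) pairs for period t_k, highest first."""
    candidates = tf_at_time.get(t_k, {})
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


tf_at_time = {
    "2001": {"Senator Clinton": 18, "First Lady": 7, "Hillary Clinton": 18},
    "1995": {"First Lady": 30},
}
top = rank_time_dependent(tf_at_time, "2001", top_k=2)
```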

Overview of experiments

Our experimental evaluation is divided into three main parts

1 Extracting and improving the accuracy of time of synonyms

2 Query expansion using time-independent synonyms

3 Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection

The whole history of English Wikipedia
  All pages and revisions, 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
  4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)

New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools
  MWDumper: http://www.mediawiki.org/wiki/Mwdumper
  Oracle Berkeley DB version 4.7.25

Burst detection algorithm implemented by Kleinberg
  Number of states: 2
  Ratio of rate of second state to base state: 2
  Ratio of rate of each subsequent state to previous state: 2
  Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier: an open source search engine developed by University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

Expand the top-k synonyms $s_1, \ldots, s_k$ plus TIDP scores as boosting weights:

$$q_{exp} = q_{org} \;\; s_1 \wedge w_1 \;\; s_2 \wedge w_2 \;\; \ldots \;\; s_k \wedge w_k$$

where $s_i \wedge w_i$ denotes synonym $s_i$ boosted by weight $w_i$

Measurement: Mean Average Precision (MAP), R-precision and Recall
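The expansion step can be illustrated with a small string-building helper. The `"phrase"^weight` notation below is only an illustrative rendering of "synonym boosted by weight"; it is an assumption and has not been checked against the query syntax of any particular Terrier version.

```python
def expand_query(original_query, scored_synonyms, k=3):
    """Append the k highest-scoring synonyms, each with its boosting weight."""
    top = sorted(scored_synonyms.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [original_query]
    parts += [f'"{synonym}"^{weight:.2f}' for synonym, weight in top]
    return " ".join(parts)


expanded = expand_query(
    "pope benedict xvi",
    {"Joseph Ratzinger": 2.5, "Cardinal Ratzinger": 1.0},
    k=1,
)
```

Keeping the original query terms unweighted and attaching weights only to the synonyms means a low-scoring synonym contributes little to the final ranking.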

Query expansion using time-dependent synonyms

Data collection

NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications

Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries

Temporal Query                                Synonym
Named Entity                   Time Period
American Broadcasting Company  1995-2000     Disney/ABC
Barack Obama                   2005-2007     Senator Obama
Eminem                         1999-2004     Slim Shady
George H. W. Bush              1988-1992     President George H.W. Bush
George W. Bush                 2000-2007     President George W. Bush
Hillary Rodham Clinton         2001-2007     Senator Clinton
Kmart                          1987-1987     Kresge
Pope Benedict XVI              1988-2005     Cardinal Ratzinger
Ronald Reagan                  1987-1989     Reagan Revolution
Virgin Media                   1999-2002     Telewest Communications

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method  NE       NE-Syn   Avg Syn per NE  Accuracy (%)
BPF-NERW    2574319  3199115  1.2             51
BPCF-NERW   473829   488383   1.0             73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of “people”, “organization” or “company”

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods

Robust2004 query statistics

Two methods for recognizing named entities in queries
1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
  k = 2; values of k > 2 bring noise to the NERQ process

Number of queries using two different NER methods

Type              MW-NERQ  MRW-NERQ
Named entity      42       149
Not named entity  208      101
Total             250      250

Query expansion using time-independent synonyms

MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ

MAP R-precision Recall MAP R-precision Recall

PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504

PM Probabilistic Model without query expansion

PRF Pseudo Relevance Feedback using Rocchio algorithm

SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback

SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback

Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1
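For readers less familiar with the three measures in the table, the sketch below computes them for a single query using their standard textbook definitions; MAP is the mean of the average precision over all queries. The function names and the toy ranking are made up for illustration and are unrelated to the Robust2004 data.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision within the top R results, where R = number of relevant docs."""
    r = len(relevant)
    return sum(1 for doc in ranked[:r] if doc in relevant) / r if r else 0.0


def recall(ranked, relevant):
    """Fraction of relevant documents that were retrieved at all."""
    if not relevant:
        return 0.0
    return sum(1 for doc in set(ranked) if doc in relevant) / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked list, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs) if runs else 0.0


ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
ap = average_precision(ranked, relevant)
```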

Query expansion using time-dependent synonyms

P10, P20 and P30 (statistical significance assessed at p < 0.05)

Method  P10     P20     P30
TQ      0.1000  0.0500  0.0333
TSQ     0.5200  0.3800  0.2800

TQ: search a Temporal Query, i.e., a keyword $w_q$ and time $t_q$
TSQ: search a Temporal Query and expand with Synonyms w.r.t. time $t_q$
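Precision at k, the measure reported above, is the share of relevant documents among the first k results. A minimal sketch with made-up document identifiers:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


ranked = [f"d{i}" for i in range(1, 31)]
relevant = {"d7"}
p10 = precision_at_k(ranked, relevant, 10)
p20 = precision_at_k(ranked, relevant, 20)
p30 = precision_at_k(ranked, relevant, 30)
```

With a single relevant document in the top 10, the values fall as k grows because the denominator increases while the number of hits stays the same.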

QUEST Query Expansion using Synonyms over Time

Conclusions and future work

Extract time-based synonyms from Wikipedia
Improve time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work
  Combine time-dependent synonyms and temporal language models to determine time of queries
  Exploit temporal information extraction techniques to discover synonyms at particular time points
  Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

Observation

Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR'2008]

Very dynamic in appearance, i.e., relationships between terms change over time
  E.g., changes of roles, name alterations, or semantic shift

Synonyms are different words with similar meanings

In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity

E.g., “Cardinal Joseph Ratzinger” is a synonym of “Pope Benedict XVI” before 2005

What are time-based synonyms?

Time-independent synonyms are invariant to time

Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Application

News archive searchSearch terms are named entitiesPublication dates of documents are temporal criteria

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Application

News archive searchSearch terms are named entitiesPublication dates of documents are temporal criteria

Scenario 1Query ldquoPope Benedict XVIrdquo and written before 2005Documents about ldquoJoseph Alois Ratzingerrdquo are relevant

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Application

News archive searchSearch terms are named entitiesPublication dates of documents are temporal criteria

Scenario 2Query ldquoHillary R Clintonrdquo and written from 1997 to 2002Documents about ldquoNew York Senatorrdquo and ldquoFirst Ladyof the United Statesrdquo are relevant

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Application

News archive searchSearch terms are named entitiesPublication dates of documents are temporal criteria

Challenge

Semantic gaps in searching archives or a lack of knowledgeabout a query and synonyms at particular time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Contributions

1 Formal modelsWikipedia viewed as a temporal resource

2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms

3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms

Recognizing named entities

Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Figure: A snapshot of Wikipedia and current revisions at time $t_k$

Example [Bunescu and Pasca, EACL’2006]

1) Multi-word titles in which all content words are capitalized
  • President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters
  • UNICEF and WHO are named entities

3) At least 75% of occurrences in the article text itself are capitalized
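
The three title heuristics above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `is_named_entity`, the small `FUNCTION_WORDS` list, and the choice to apply rule 3 only to single-word titles are assumptions made for the example.

```python
import re

# Hypothetical, deliberately small list of function words that rule 1 ignores.
FUNCTION_WORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "the", "to"}


def is_named_entity(title, article_text=""):
    """Apply the three capitalization heuristics to a Wikipedia page title."""
    words = title.replace("_", " ").split()
    if not words:
        return False

    # Rule 1: multi-word title whose content words are all capitalized.
    if len(words) > 1:
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)

    # Rule 2: single-word title with at least two capital letters.
    word = words[0]
    if sum(ch.isupper() for ch in word) >= 2:
        return True

    # Rule 3: at least 75% of the title's occurrences in the text are capitalized.
    pattern = r"\b%s\b" % re.escape(word)
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= 0.75
```

Note that this sketch counts sentence-initial capitals in rule 3 like any other occurrence, which a production recognizer would probably want to treat separately.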

Example

  • $e_i$: President_of_the_United_States
  • $t_k$: 11/2001
  • $s_j$: “George W. Bush”
  • $\xi_{i,j} = (e_i, s_j)$, i.e. (President_of_the_United_States, “George W. Bush”)

Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e. a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Example: in [[President_of_the_United_States|Barack Obama]], “Barack Obama” is the anchor text linking to the article President_of_the_United_States
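
As a rough illustration of Step 1, anchor texts can be pulled out of wiki markup with a regular expression over piped links. The sketch below is an assumption-laden example rather than the pipeline used in the talk: the `WIKILINK` pattern, the function name `extract_synonyms`, and the decision to ignore unpiped links such as `[[UNICEF]]` are all choices made here for brevity.

```python
import re
from collections import Counter, defaultdict

# Matches piped links of the form [[Target|anchor text]]; an optional
# "#section" part after the target is skipped. Unpiped links do not match.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]]+)\]\]")


def extract_synonyms(wikitext, entities):
    """Count anchor texts that link to pages in the given named entity set."""
    synonyms = defaultdict(Counter)
    for target, anchor in WIKILINK.findall(wikitext):
        title = target.strip().replace(" ", "_")
        if title in entities:
            synonyms[title][anchor.strip()] += 1
    return dict(synonyms)


snapshot_text = (
    "In 2009 [[President_of_the_United_States|Barack Obama]] visited "
    "[[Chicago|the city]] and met [[UNICEF]] staff."
)
print(extract_synonyms(snapshot_text, {"President_of_the_United_States", "UNICEF"}))
```

Running this per snapshot and accumulating the counts would give one synonym snapshot per time point, with the counts usable later as term frequencies.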

Output: entity-synonym relationships and time periods

Named Entity             Synonym                      Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger    05/2005 - 03/2009
                         Joseph Ratzinger             05/2005 - 03/2009
                         Pope Benedict XVI            05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II      02/2007 - 03/2009
                         Sen. Barack Obama            07/2007 - 03/2009
                         Senator Barack Obama         05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton              08/2003 - 03/2009
                         Sen. Hillary Clinton         03/2007 - 03/2009
                         Senator Clinton              11/2007 - 03/2009

Note: The times of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents.

Improving the accuracy of time using burst detection

  • Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time
      • 1.8M articles from January 1987 to June 2007 (20 years)
  • Use the burst detection algorithm [Kleinberg, KDD’2002]
      • Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
      • Output bursty intervals and bursty weight, i.e. periods of occurrence and intensity
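
To make the idea of bursty intervals and weights concrete, here is a simplified two-state sketch in the spirit of Kleinberg's batched burst model. It is not the implementation used in the talk: the function name `detect_bursts`, the monthly-count input format, and the use of `gamma * ln(n)` as the cost of entering the burst state are assumptions for illustration. The binomial coefficient is omitted from the state cost because it is identical for both states.

```python
import math


def detect_bursts(r, d, s=2.0, gamma=1.0):
    """Return (start, end, weight) bursts from per-period counts.

    r[t]: documents mentioning the synonym in period t
    d[t]: all documents in period t
    s:    ratio of the burst-state rate to the base rate
    """
    n, total_r, total_d = len(r), sum(r), sum(d)
    if n == 0 or total_r == 0:
        return []
    p0 = total_r / total_d
    if not 0.0 < p0 < 1.0:
        raise ValueError("base rate must lie strictly between 0 and 1")
    p1 = min(s * p0, 0.9999)

    def sigma(p, rt, dt):
        # Negative log-likelihood of rt hits among dt documents at rate p.
        return -(rt * math.log(p) + (dt - rt) * math.log(1.0 - p))

    up_cost = gamma * math.log(n)  # assumed cost of moving from state 0 to 1
    cost, back = [0.0, float("inf")], []
    for rt, dt in zip(r, d):
        new_cost, pointers = [], []
        for state, p in enumerate((p0, p1)):
            options = [cost[prev] + (up_cost if state > prev else 0.0)
                       for prev in (0, 1)]
            prev = 0 if options[0] <= options[1] else 1
            new_cost.append(options[prev] + sigma(p, rt, dt))
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)

    # Backtrack the cheapest state sequence (Viterbi).
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        states.append(state)
    states.reverse()

    bursts, start = [], None
    for t, st in enumerate(states + [0]):
        if st == 1 and start is None:
            start = t
        elif st == 0 and start is not None:
            weight = sum(sigma(p0, r[i], d[i]) - sigma(p1, r[i], d[i])
                         for i in range(start, t))
            bursts.append((start, t - 1, weight))
            start = None
    return bursts
```

The defaults `s=2.0` and `gamma=1.0` mirror the rate ratio and gamma values listed in the experiment settings; each returned tuple gives the first and last period index of a burst and its weight.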

Output: results from the burst detection algorithm

Synonym            Entity                   Burst Weight   Start     End
President Reagan   Ronald Reagan            5506858        01/1987   02/1989
President Ronald   Ronald Reagan            100401         01/1989   03/1990
President Ronald   Ronald Reagan            67208          07/1990   02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001   10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002   01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003   11/2004

Classifying synonyms into two types

Definition

Class A: time-independent
  • Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)
  • E.g. “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”

Class B: time-dependent
  • Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered
  • E.g. “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005

Ranking time-independent synonyms

Definition
Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$

  • $pf(s_j)$ is the time partition frequency, i.e. the number of time partitions in which $s_j$ occurs
  • $tf(s_j)$ is the averaged tf of $s_j$ over all time partitions: $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
  • $\mu$ balances the importance of the temporal feature against the frequency feature
  • $\mu = 0.5$ yields the best performance in the experiments

Intuition
The model measures the popularity of synonyms based on two factors:
  • Robustness to change over time, i.e. the more partitions a synonym occurs in, the more robust to time it is
  • High usage over time, i.e. a high value of averaged frequencies over time
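
The mixture above can be computed directly from per-partition term frequencies. The following is a minimal sketch under stated assumptions: the function name `tidp_scores` and the nested-dictionary input are hypothetical, the counts are made up, and both features are used as raw values because the slide does not say whether they are normalized.

```python
def tidp_scores(tf_by_partition, mu=0.5):
    """Score synonyms with TIDP = mu * pf + (1 - mu) * averaged tf.

    tf_by_partition maps synonym -> {partition label: term frequency}.
    """
    scores = {}
    for synonym, tfs in tf_by_partition.items():
        occurring = [tf for tf in tfs.values() if tf > 0]
        pf = len(occurring)                      # partitions where it occurs
        avg_tf = sum(occurring) / pf if pf else 0.0
        scores[synonym] = mu * pf + (1 - mu) * avg_tf
    return scores


# Hypothetical counts, for illustration only.
example = {
    "Hillary Clinton": {"2003-08": 4, "2003-09": 6, "2003-10": 2},
    "Senator Clinton": {"2007-11": 2},
}
print(tidp_scores(example))
```

With raw values, a synonym seen in three partitions with an average frequency of 4 scores 0.5 * 3 + 0.5 * 4 = 3.5, while one seen in a single partition with frequency 2 scores 1.5.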

Ranking time-dependent synonyms

Definition
Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$TDP(s_j, t_k) = tf(s_j, t_k)$$

  • $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition
  • Only term frequencies are used to measure the importance of synonyms
  • Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
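
Since TDP is just the term frequency at the requested time, ranking reduces to a lookup and a sort. A small sketch, assuming a hypothetical `tf_at_time` mapping from time labels to synonym counts (the labels and counts below are invented for the example):

```python
def rank_time_dependent(tf_at_time, t_k, top_k=3):
    """Rank synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k)."""
    counts = tf_at_time.get(t_k, {})
    # Sort by descending frequency; break ties alphabetically for stability.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical frequencies, for illustration only.
tf_at_time = {
    "2004-01": {"Cardinal Ratzinger": 12, "Joseph Ratzinger": 7},
    "2006-01": {"Pope Benedict XVI": 20},
}
print(rank_time_dependent(tf_at_time, "2004-01"))
```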

Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection
  • The whole history of English Wikipedia
      • All pages and revisions from 03/2001 to 03/2008: 85 monthly snapshots (01/03/2001 to 01/03/2008), about 2.8 Terabytes
      • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
  • The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools
  • MWDumper: http://www.mediawiki.org/wiki/Mwdumper
  • Oracle Berkeley DB version 4.7.25
  • Burst detection algorithm implemented by Kleinberg
      • Number of states: 2
      • Ratio of rate of second state to base state: 2
      • Ratio of rate of each subsequent state to previous state: 2
      • Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection
  • TREC Robust Track (2004)
  • 250 topics (topics 301-450 and topics 601-700)

Tools
  • Terrier, an open-source search engine developed by the University of Glasgow
  • BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
  • Expand with the top-k synonyms s_1, ..., s_k, using their TIDP scores w_1, ..., w_k as boosting weights (^ is the boost operator):

    q_exp = q_org s_1^w_1 s_2^w_2 ... s_k^w_k

Measurement: Mean Average Precision (MAP), R-precision and Recall
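
The expansion step described on this slide, appending the top-k synonyms with their TIDP scores as boosts, might look like the sketch below. The helper name `expand_query`, the quoting of multi-word synonyms, and the two-decimal weight format are assumptions; the resulting string is illustrative and is not claimed to be the exact query syntax Terrier accepts.

```python
def expand_query(original_query, scored_synonyms, k=2):
    """Build q_exp = q_org s_1^w_1 ... s_k^w_k from TIDP-scored synonyms."""
    # Highest score first; ties broken alphabetically for determinism.
    top = sorted(scored_synonyms.items(), key=lambda item: (-item[1], item[0]))[:k]
    parts = [original_query]
    for synonym, weight in top:
        parts.append('"%s"^%.2f' % (synonym, weight))
    return " ".join(parts)


# Hypothetical TIDP scores, for illustration only.
print(expand_query(
    "Hillary Rodham Clinton",
    {"Hillary Clinton": 3.5, "Senator Clinton": 1.5, "Sen. Hillary Clinton": 0.9},
))
```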

Query expansion using time-dependent synonyms

Data collection
  • NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
  • Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries

Temporal Query                                  Synonym
Named Entity                    Time Period
American Broadcasting Company   1995-2000       Disney/ABC
Barack Obama                    2005-2007       Senator Obama
Eminem                          1999-2004       Slim Shady
George H. W. Bush               1988-1992       President George H.W. Bush
George W. Bush                  2000-2007       President George W. Bush
Hillary Rodham Clinton          2001-2007       Senator Clinton
Kmart                           1987-1987       Kresge
Pope Benedict XVI               1988-2005       Cardinal Ratzinger
Ronald Reagan                   1987-1989       Reagan Revolution
Virgin Media                    1999-2002       Telewest Communications
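
For reference, precision at a cutoff (the measure used for the temporal queries above) can be computed as in this small sketch. The function name and the document identifiers are hypothetical.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_doc_ids[:k]
    # Divide by k even when fewer than k documents were retrieved.
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k


# Hypothetical ranking with a single relevant document in the top 10.
ranking = ["d%d" % i for i in range(1, 31)]
print(precision_at_k(ranking, {"d3"}, 10))
```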

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE          NE-Syn      Avg. Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2               51
BPCF-NERW    473,829     488,383     1.0               73

  • BPF-NERW: Bunescu and Pasca’s Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
  • BPCF-NERW: BPF-NERW restricted to the Categories “people”, “organization” or “company”
  • Note: 500 randomly selected entity-synonym relationships were used for assessing the accuracy of time periods

Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
  • k = 2; k > 2 brings noise into the NERQ process

Number of queries using the two different NER methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250

Query expansion using time-independent synonyms

MAP, R-precision and Recall (* indicates statistically significant at p < 0.05)

             MW-NERQ                            MRW-NERQ
Method       MAP      R-precision   Recall      MAP      R-precision   Recall
PM           0.2889   0.3309        0.6185      0.2455   0.2904        0.5629
PRF          0.3469   0.3711        0.6944      0.3002   0.3227        0.6761
SQE-PRF      0.3608   0.3652        0.7405      0.2507   0.2665        0.5932
SWQE-PRF     0.3653   0.3861        0.7388      0.2885   0.3080        0.6504

  • PM: Probabilistic Model without query expansion
  • PRF: Pseudo Relevance Feedback using the Rocchio algorithm
  • SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
  • SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
  • Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model (i.e. Bose-Einstein 1)
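
The three measures reported in this table can be sketched per query as follows; MAP is then the mean of the average precision values over all queries. The function names and the toy ranking are assumptions for illustration and say nothing about how the evaluation in the talk was actually run.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision at cutoff R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc_id in ranked[:r] if doc_id in relevant) / r


def recall(ranked, relevant):
    """Fraction of the relevant documents that were retrieved at all."""
    if not relevant:
        return 0.0
    return len(set(ranked) & set(relevant)) / len(relevant)


def mean_average_precision(runs):
    """runs is a list of (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For a toy ranking `["a", "b", "c", "d"]` with relevant set `{"a", "c", "e"}`, the relevant documents appear at ranks 1 and 3, so average precision is (1/1 + 2/3) / 3, and both R-precision and recall are 2/3.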

Query expansion using time-dependent synonyms

P10, P20 and P30 (* indicates statistically significant at p < 0.05)

Method   P10      P20      P30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

  • TQ: search with a Temporal Query, i.e. a keyword $w_q$ and time $t_q$
  • TSQ: search with a Temporal Query and expand with Synonyms w.r.t. time $t_q$

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

Conclusions
  • Extract time-based synonyms from Wikipedia
  • Improve time of synonyms using NYT
  • Perform query expansion using the time-based synonyms
  • Conduct extensive experiments showing significant increase in retrieval effectiveness

Future work
  • Combine time-dependent synonyms and temporal language models to determine time of queries
  • Exploit temporal information extraction techniques to discover synonyms at particular time points
  • Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

Observation

  • Named entities (people, organization, location, etc.) constitute a major fraction of queries [Sanderson, SIGIR’2008]
  • Very dynamic in appearance, i.e. relationships between terms change over time
      • E.g. changes of roles, name alterations or semantic shift
  • Synonyms are different words with similar meanings
  • In our context, synonyms are terms used as name variants (other names, titles or roles) of a named entity
      • E.g. “Cardinal Joseph Ratzinger” is a synonym of “Pope Benedict XVI” before 2005

What are time-based synonyms?

  • Time-independent synonyms are invariant to time
  • Time-dependent synonyms are relevant to a particular time period, i.e. entity-synonym relationships change over time

Application

News archive search
  • Search terms are named entities
  • Publication dates of documents are temporal criteria

Scenario 1
  • Query: “Pope Benedict XVI”, restricted to documents written before 2005
  • Documents about “Joseph Alois Ratzinger” are relevant

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Application

News archive searchSearch terms are named entitiesPublication dates of documents are temporal criteria

Scenario 2Query ldquoHillary R Clintonrdquo and written from 1997 to 2002Documents about ldquoNew York Senatorrdquo and ldquoFirst Ladyof the United Statesrdquo are relevant

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Application

News archive searchSearch terms are named entitiesPublication dates of documents are temporal criteria

Challenge

Semantic gaps in searching archives or a lack of knowledgeabout a query and synonyms at particular time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Contributions

1 Formal modelsWikipedia viewed as a temporal resource

2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms

3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms


Recognizing named entities

Step 1: Partition Wikipedia according to the time granularity $g = \text{month}$ to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.

Step 2: For each snapshot $W_{t_k} \in W$, identify named-entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Figure: A snapshot of Wikipedia and current revisions at time $t_k$
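Step 1 can be illustrated with a small sketch. It assumes that a snapshot at month $t_k$ consists of the latest revision of each page written in or before that month; the function name `monthly_snapshots` and the `(page, date, text)` revision tuples are hypothetical and not taken from the talk.

```python
from datetime import date


def monthly_snapshots(revisions, months):
    """Build one snapshot per month from a revision history.

    revisions: iterable of (page_title, revision_date, text) tuples (assumed layout).
    months:    list of (year, month) pairs, one per desired snapshot.
    Returns a dict mapping (year, month) -> {page_title: text of the latest
    revision written in or before that month}.
    """
    ordered = sorted(revisions, key=lambda rev: rev[1])  # oldest revision first
    snapshots = {}
    for year, month in months:
        current = {}
        for page, day, text in ordered:
            if (day.year, day.month) <= (year, month):
                current[page] = text  # later revisions overwrite earlier ones
        snapshots[(year, month)] = current
    return snapshots


revs = [
    ("A", date(2001, 3, 5), "v1"),
    ("A", date(2001, 5, 1), "v2"),
    ("B", date(2001, 4, 10), "b1"),
]
snaps = monthly_snapshots(revs, [(2001, 3), (2001, 4), (2001, 5)])
print(snaps[(2001, 5)])
```

Rescanning the full history for every month keeps the sketch short; a real 85-snapshot run over the whole of Wikipedia would need an incremental pass instead.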


Example [Bunescu and Pasca, EACL’2006]

1) Multi-word titles in which all words are capitalized
   • President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters
   • UNICEF and WHO are named entities

3) At least 75% of the title's occurrences in the article text itself are capitalized
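The three title heuristics above can be sketched as a single predicate. This is only an illustration under stated assumptions: the stop-list `FUNCTION_WORDS` (which lets words such as "of" and "the" stay lowercase, as in President_of_the_United_States) and the function name `is_named_entity` are hypothetical, and rule 3 is applied only to single-word titles here.

```python
import re

# Assumed stop-list: function words that may remain lowercase inside a title.
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "on", "at", "to", "a", "an"}


def is_named_entity(title, article_text=""):
    """Apply the three capitalization heuristics to a Wikipedia page title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with at least two capital letters.
    if sum(ch.isupper() for ch in word) >= 2:
        return True
    # Rule 3: at least 75% of the occurrences in the article text are capitalized.
    pattern = r"\b" + re.escape(word) + r"\b"
    hits = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not hits:
        return False
    capitalized = sum(1 for h in hits if h[0].isupper())
    return capitalized / len(hits) >= 0.75


print(is_named_entity("President_of_the_United_States"))  # True
print(is_named_entity("UNICEF"))                           # True
```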


Example

• $e_i$: President_of_the_United_States
• $t_k$: 11/2001
• $s_j$: “George W. Bush”
• $\xi_{i,j}$: $(e_i, s_j)$, i.e. (President_of_the_United_States, “George W. Bush”)


Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e. a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Example: in [[President_of_the_United_States|Barack Obama]], “Barack Obama” is the anchor text linking to the article President_of_the_United_States.
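A minimal sketch of the anchor-text step, assuming links are written in the piped wikitext form `[[Target|anchor]]`. The regular expression, the function name `extract_synonyms`, and the sample text are illustrative assumptions; the talk does not describe the actual parser.

```python
import re
from collections import defaultdict

# Piped wiki links of the form [[Target|anchor text]]; an optional #section is skipped.
LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]]+)\]\]")


def extract_synonyms(wikitext, entities):
    """Collect anchor texts that link to known named-entity pages."""
    synonyms = defaultdict(set)
    for target, anchor in LINK.findall(wikitext):
        target = target.strip().replace(" ", "_")  # normalize to title form
        if target in entities:
            synonyms[target].add(anchor.strip())
    return dict(synonyms)


text = (
    "Today [[President_of_the_United_States|Barack Obama]] met "
    "[[United Nations]] staff and [[Pope Benedict XVI|Joseph Ratzinger]]."
)
entities = {"President_of_the_United_States", "Pope_Benedict_XVI"}
result = extract_synonyms(text, entities)
print(result)
```

Unpiped links such as `[[United Nations]]` are ignored in this sketch because their anchor text equals the title and therefore adds no name variant.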


Output: entity-synonym relationships and time periods

Named Entity             Synonym                     Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger   05/2005 - 03/2009
                         Joseph Ratzinger            05/2005 - 03/2009
                         Pope Benedict XVI           05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II     02/2007 - 03/2009
                         Sen. Barack Obama           07/2007 - 03/2009
                         Senator Barack Obama        05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton             08/2003 - 03/2009
                         Sen. Hillary Clinton        03/2007 - 03/2009
                         Senator Clinton             11/2007 - 03/2009

Note: The time periods of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents.
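One way to derive such time periods from the monthly synonym snapshots is sketched below. It assumes a single period per relationship, running from the first to the last snapshot in which the pair is seen; whether the authors handled gaps differently is not stated. The function name `synonym_periods` and the `'YYYY-MM'` keys are hypothetical.

```python
def synonym_periods(snapshots):
    """Map each (entity, synonym) pair to its (first_month, last_month).

    snapshots: dict mapping 'YYYY-MM' -> set of (entity, synonym) pairs.
    """
    periods = {}
    for month in sorted(snapshots):  # 'YYYY-MM' strings sort chronologically
        for pair in snapshots[month]:
            first, _ = periods.get(pair, (month, month))
            periods[pair] = (first, month)
    return periods


snapshots = {
    "2005-05": {("Pope Benedict XVI", "Joseph Ratzinger")},
    "2005-06": {("Pope Benedict XVI", "Joseph Ratzinger")},
    "2007-07": {("Pope Benedict XVI", "Joseph Ratzinger"),
                ("Barack Obama", "Sen. Barack Obama")},
}
periods = synonym_periods(snapshots)
print(periods)
```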


Improving the accuracy of time using burst detection

• Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
  • 1.8M articles from January 1987 to June 2007 (20 years)
• Use the burst detection algorithm [Kleinberg, KDD’2002]
  • Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
  • Output bursty intervals and bursty weights, i.e. periods of occurrence and their intensity
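The talk uses Kleinberg's own implementation. The following is only a rough two-state sketch of the batched idea, written to show what "bursty intervals" and "bursty weight" mean in practice. Assumptions: the input is a monthly count of documents mentioning a synonym (`relevant`) against the total number of documents (`totals`), the burst state has `s` times the base rate, and entering the burst state costs `gamma * ln(n)`. The function name `detect_bursts` is hypothetical.

```python
import math


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return a list of (start_index, end_index, weight) burst intervals."""
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)   # base rate of occurrence
    if p0 <= 0.0 or p0 >= 1.0:
        return []
    p1 = min(s * p0, 0.9999)           # elevated rate of the burst state

    def cost(p, r, d):
        # Negative binomial log-likelihood; the binomial coefficient is
        # omitted because it is identical for both states.
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    up = gamma * math.log(n)           # assumed cost of entering the burst state
    best = [0.0, float("inf")]         # the sequence starts in the base state
    back = []
    for r, d in zip(relevant, totals):
        new, step = [], []
        for state, p in enumerate((p0, p1)):
            from_base = best[0] + (up if state == 1 else 0.0)
            from_burst = best[1]       # leaving or staying in a burst is free
            step.append(0 if from_base <= from_burst else 1)
            new.append(min(from_base, from_burst) + cost(p, r, d))
        back.append(step)
        best = new

    # Backtrack the cheapest state sequence (Viterbi).
    states = [0] * n
    state = 0 if best[0] <= best[1] else 1
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    # Merge consecutive burst states into intervals and sum the cost savings.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start, weight = t, 0.0
            while t < n and states[t] == 1:
                weight += cost(p0, relevant[t], totals[t]) - cost(p1, relevant[t], totals[t])
                t += 1
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts


bursts = detect_bursts([1, 1, 1, 10, 12, 11, 1, 1], [100] * 8)
print(bursts)
```

The weight of an interval is the total cost saved by explaining it with the burst state rather than the base state, so larger values indicate more intense bursts.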


Output: results from the burst-detection algorithm

Synonym            Entity                   Burst Weight   Start     End
President Reagan   Ronald Reagan            5506858        01/1987   02/1989
President Ronald   Ronald Reagan            100401         01/1989   03/1990
President Ronald   Ronald Reagan            67208          07/1990   02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001   10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002   01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003   11/2004


Classifying synonyms into two types

Definition

Class A: time-independent
• Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)
• E.g. “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”

Class B: time-dependent
• Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered
• E.g. “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005


Ranking time-independent synonyms

Definition
Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$

• $pf(s_j)$ is the time partition frequency, i.e. the number of time partitions in which $s_j$ occurs
• $\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
• $\mu$ weights the relative importance of the temporal feature and the frequency feature
• $\mu = 0.5$ yields the best performance in the experiments

Intuition
The model measures the popularity of synonyms based on two factors:
• Robustness to change over time, i.e. the more partitions a synonym occurs in, the more robust to time it is
• High usage over time, i.e. a high value of averaged frequencies over time
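The TIDP score can be computed directly from per-partition term frequencies. The sketch below follows the formula as stated on the slide and assumes raw, unnormalized values for both features; whether the authors normalize them is not stated. The function name `tidp` is hypothetical.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP score of one synonym.

    tf_by_partition: term frequencies of the synonym, one entry per time
    partition (0 where the synonym does not occur).
    """
    occurring = [tf for tf in tf_by_partition if tf > 0]
    pf = len(occurring)               # partitions in which the synonym occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(occurring) / pf      # averaged tf over those partitions
    return mu * pf + (1 - mu) * avg_tf


print(tidp([3, 0, 5, 4]))  # pf = 3, averaged tf = 4.0 -> 3.5
```

With raw counts the two features live on different scales, so in practice a normalization step before mixing may be worth considering.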


Ranking time-dependent synonyms

Definition
Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$

• $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition
• Only term frequencies are used to measure the importance of synonyms
• Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
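A short sketch of TDP-based ranking: restrict the candidates to the requested time and sort by term frequency. The data layout, the function name `rank_time_dependent`, and the counts in the example are made up for illustration.

```python
def rank_time_dependent(tf, t_k, k=3):
    """Return the top-k (synonym, TDP) pairs at time t_k.

    tf: dict mapping (synonym, time) -> term frequency.
    """
    scored = [(syn, freq) for (syn, t), freq in tf.items() if t == t_k and freq > 0]
    scored.sort(key=lambda item: (-item[1], item[0]))  # highest frequency first
    return scored[:k]


# Hypothetical counts, for illustration only.
tf = {
    ("Cardinal Ratzinger", "2004"): 12,
    ("Joseph Ratzinger", "2004"): 7,
    ("Pope Benedict XVI", "2004"): 0,
    ("Pope Benedict XVI", "2006"): 30,
}
ranked = rank_time_dependent(tf, "2004", k=2)
print(ranked)
```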


Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms


Extracting and improving time of synonyms

Data collection
• The whole history of English Wikipedia
  • All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 terabytes
  • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
• New York Times Annotated Corpus, containing over 1.8 million articles from January 1987 to June 2007

Tools
• MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
• Oracle Berkeley DB version 4.7.25
• Burst detection algorithm implemented by Kleinberg
  • Number of states: 2
  • Ratio of rate of second state to base state: 2
  • Ratio of rate of each subsequent state to previous state: 2
  • Gamma parameter of the HMM: 1

Measurement: Accuracy


Query expansion using time-independent synonyms

Data collection
• TREC Robust Track (2004)
• 250 topics (topics 301-450 and topics 601-700)

Tools
• Terrier, an open-source search engine developed by the University of Glasgow
• BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
• Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:

$$q_{exp} = q_{org} \;\; s_1 \wedge w_1 \;\; s_2 \wedge w_2 \;\; \ldots \;\; s_k \wedge w_k$$

  where $s_i \wedge w_i$ denotes synonym $s_i$ boosted by weight $w_i$

Measurement: Mean Average Precision (MAP), R-precision, and Recall
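Building the expanded query can be sketched as simple string assembly. The `term^weight` output format, the quoting of multi-word synonyms, and the function name `expand_query` are assumptions made for illustration; the exact syntax accepted by a given search engine should be checked against its documentation.

```python
def expand_query(query, scored_synonyms, k=2):
    """Append the top-k synonyms to the query, each boosted by its score.

    scored_synonyms: list of (synonym, weight) pairs, e.g. TIDP scores.
    """
    top = sorted(scored_synonyms, key=lambda item: -item[1])[:k]
    parts = [query]
    for synonym, weight in top:
        parts.append(f'"{synonym}"^{weight:.2f}')  # quoted phrase with boost
    return " ".join(parts)


expanded = expand_query(
    "pope benedict xvi",
    [("Joseph Ratzinger", 3.5), ("Cardinal Ratzinger", 2.0), ("Benedict", 1.0)],
    k=2,
)
print(expanded)
```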


Query expansion using time-dependent synonyms

Data collection
• NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
• Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries

Temporal Query                                 Synonym
Named Entity                    Time Period
American Broadcasting Company   1995-2000      Disney/ABC
Barack Obama                    2005-2007      Senator Obama
Eminem                          1999-2004      Slim Shady
George H. W. Bush               1988-1992      President George H.W. Bush
George W. Bush                  2000-2007      President George W. Bush
Hillary Rodham Clinton          2001-2007      Senator Clinton
Kmart                           1987-1987      Kresge
Pope Benedict XVI               1988-2005      Cardinal Ratzinger
Ronald Reagan                   1987-1989      Reagan Revolution
Virgin Media                    1999-2002      Telewest Communications


Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE        NE-Syn    Avg. Syn per NE   Accuracy (%)
BPF-NERW     2574319   3199115   1.2               51
BPCF-NERW    473829    488383    1.0               73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of “people”, “organization” or “company”

Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods


Robust2004 query statistics

Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
   • k = 2; k > 2 brings noise to the NERQ process

Number of queries using the two different NER methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250


Query expansion using time-independent synonyms

MAP, R-precision and Recall (statistical significance assessed at p < 0.05)

            MW-NERQ                           MRW-NERQ
Method      MAP      R-precision   Recall     MAP      R-precision   Recall
PM          0.2889   0.3309        0.6185     0.2455   0.2904        0.5629
PRF         0.3469   0.3711        0.6944     0.3002   0.3227        0.6761
SQE-PRF     0.3608   0.3652        0.7405     0.2507   0.2665        0.5932
SWQE-PRF    0.3653   0.3861        0.7388     0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e. Bose-Einstein 1


Query expansion using time-dependent synonyms

P@10, P@20 and P@30 (statistical significance assessed at p < 0.05)

Method   P@10     P@20     P@30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search with a Temporal Query, i.e. a keyword $w_q$ and time $t_q$
TSQ: search with a Temporal Query and expand it with Synonyms with respect to time $t_q$
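For reference, precision at k is the fraction of the top-k retrieved documents that are relevant. A minimal sketch follows; the function name `precision_at_k` and the document identifiers are hypothetical.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the first k retrieved documents that are relevant."""
    top = ranked_doc_ids[:k]
    return sum(1 for doc in top if doc in relevant_ids) / k


ranking = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d4"}
print(precision_at_k(ranking, relevant, 2))  # 0.5
```

Dividing by k rather than by the number of documents actually returned means that a short result list is penalized, which is the usual convention for P@k.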


QUEST: Query Expansion using Synonyms over Time


Conclusions and future work

• Extract time-based synonyms from Wikipedia
• Improve time of synonyms using NYT
• Perform query expansion using the time-based synonyms
• Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work
• Combine time-dependent synonyms and temporal language models to determine time of queries
• Exploit temporal information extraction techniques to discover synonyms at particular time points
• Improve temporal text mining/clustering using time-based synonyms


QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you


Observation

• Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR’2008]
• They are very dynamic in appearance, i.e. relationships between terms change over time
  • E.g. changes of roles, name alterations, or semantic shift
• Synonyms are different words with similar meanings
• In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity
  • E.g. “Cardinal Joseph Ratzinger” is a synonym of “Pope Benedict XVI” before 2005


What are time-based synonyms?

• Time-independent synonyms are invariant to time
• Time-dependent synonyms are relevant to a particular time period, i.e. entity-synonym relationships change over time


Application

News archive search
• Search terms are named entities
• Publication dates of documents are temporal criteria


Scenario 1
• Query: “Pope Benedict XVI” and written before 2005
• Documents about “Joseph Alois Ratzinger” are relevant


Scenario 2
• Query: “Hillary R. Clinton” and written from 1997 to 2002
• Documents about “New York Senator” and “First Lady of the United States” are relevant


Challenge

Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Contributions

1 Formal modelsWikipedia viewed as a temporal resource

2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms

3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Contributions

1 Formal modelsWikipedia viewed as a temporal resource

2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms

3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Contributions

1 Formal modelsWikipedia viewed as a temporal resource

2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms

3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Recognizing named entities

Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz

Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej

Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm

Figure A snapshot of Wikipediaand current revisions at time tk

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Recognizing named entities

Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz

Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej

Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm

Example [Bunescu and Pasca, EACL'2006]

1) Multi-word titles in which all words are capitalized
   • President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters
   • UNICEF and WHO are named entities

3) 75% of occurrences in the article text itself are capitalized
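The three rules above could be approximated as follows. This is a rough sketch rather than the authors' implementation. The `FUNCTION_WORDS` list is an assumption, introduced because the example title contains the lowercase words "of" and "the". Sentence-initial capitalization is not treated specially.

```python
import re

# Assumed (hypothetical) list of function words that rule 1 ignores.
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "on", "at", "to", "a", "an"}

def is_named_entity(title, article_text=""):
    """Apply the three title heuristics to a Wikipedia page title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: every content word of a multi-word title is capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: a single-word title with at least two capital letters.
    if sum(ch.isupper() for ch in word) >= 2:
        return True
    # Rule 3: at least 75% of the occurrences in the text are capitalized.
    pattern = r"\b%s\b" % re.escape(word)
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= 0.75
```

For instance, `is_named_entity("President_of_the_United_States")` and `is_named_entity("UNICEF")` both hold, while a title such as `List_of_countries` is rejected by rule 1.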


Example

   • $e_i$: President_of_the_United_States
   • $t_k$: 11/2001
   • $s_j$: "George W. Bush"
   • $\xi_{i,j}$: $(e_i, s_j)$, or (President_of_the_United_States, "George W. Bush")


Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States.
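A minimal sketch of the anchor-text extraction, assuming the snapshot text is raw wiki markup and that piped links have the form `[[Target|anchor]]`. The regular expression and the function name `extract_synonyms` are illustrative. Title normalization beyond replacing spaces with underscores (for example first-letter case folding or redirects) is deliberately left out.

```python
import re
from collections import defaultdict

# Piped wiki link: [[Target|anchor text]]; an optional #section is skipped.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]]+)\]\]")

def extract_synonyms(wikitext, entities):
    """Collect anchor texts of links that point to known named entities."""
    synonyms = defaultdict(set)
    for target, anchor in WIKILINK.findall(wikitext):
        target = target.strip().replace(" ", "_")
        if target in entities:
            synonyms[target].add(anchor.strip())
    return dict(synonyms)

text = ("Then [[President_of_the_United_States|Barack Obama]] met "
        "[[UNICEF]] and [[France|the French]] delegation.")
found = extract_synonyms(text, {"President_of_the_United_States"})
```

Links without a pipe, such as `[[UNICEF]]`, carry no separate anchor text and are ignored here. Links to pages outside the entity set are dropped.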


Output: Entity-synonym relationships and time periods

Named Entity           | Synonym                   | Time Period
Pope Benedict XVI      | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI      | Joseph Ratzinger          | 05/2005 - 03/2009
Pope Benedict XVI      | Pope Benedict XVI         | 05/2005 - 03/2009
Barack Obama           | Barack Hussein Obama II   | 02/2007 - 03/2009
Barack Obama           | Sen. Barack Obama         | 07/2007 - 03/2009
Barack Obama           | Senator Barack Obama      | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton           | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton      | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton           | 11/2007 - 03/2009

The times of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents.
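One way to derive such time periods from the monthly synonym snapshots is sketched below. It assumes, as a simplification not stated in the talk, that a period runs from the first to the last snapshot in which the relationship appears. The `snapshots` structure is hypothetical.

```python
def time_periods(snapshots):
    """snapshots maps (year, month) -> set of (entity, synonym) pairs.
    Returns {(entity, synonym): (first_month, last_month)}."""
    periods = {}
    for t in sorted(snapshots):
        for rel in snapshots[t]:
            # Keep the earliest month seen so far, extend the end to t.
            first, _ = periods.get(rel, (t, t))
            periods[rel] = (first, t)
    return periods

snapshots = {
    (2005, 5): {("Pope_Benedict_XVI", "Joseph Ratzinger")},
    (2005, 6): {("Pope_Benedict_XVI", "Joseph Ratzinger"),
                ("Pope_Benedict_XVI", "Benedict XVI")},
    (2005, 7): {("Pope_Benedict_XVI", "Joseph Ratzinger")},
}
periods = time_periods(snapshots)
```

Because these periods come from article timestamps only, they reflect when a synonym was present in Wikipedia, which is why the talk goes on to refine them with a news corpus.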


Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time
   • 1.8M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg, KDD'2002]
   • Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
   • Output bursty intervals and bursty weight, i.e., periods of occurrence and intensity
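The idea can be illustrated with a simplified two-state variant of burst detection over monthly batches. This is a sketch and not Kleinberg's implementation used in the talk. The function name, the binomial cost model over batches and the toy counts are assumptions made for illustration. Here `s` is the ratio between the bursty rate and the base rate, and `gamma` scales the cost of entering the bursty state.

```python
import math

def detect_bursts(relevant, total, s=2.0, gamma=1.0):
    """relevant[t]: documents in batch t mentioning the synonym,
    total[t]: all documents in batch t.
    Returns a list of (start, end, weight) with inclusive batch indices."""
    n = len(relevant)
    p0 = sum(relevant) / sum(total)   # base rate of occurrence
    p1 = s * p0                       # elevated rate in the bursty state
    if not (0.0 < p0 and p1 < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    rates = (p0, p1)

    def fit_cost(state, r, d):
        # Negative binomial log-likelihood of r hits among d documents.
        p = rates[state]
        log_choose = (math.lgamma(d + 1) - math.lgamma(r + 1)
                      - math.lgamma(d - r + 1))
        return -(log_choose + r * math.log(p) + (d - r) * math.log(1.0 - p))

    enter_cost = gamma * math.log(n)  # paid when moving from state 0 to 1

    # Viterbi search for the cheapest state sequence, starting in state 0.
    cost, back = [0.0, math.inf], []
    for r, d in zip(relevant, total):
        new_cost, pointers = [], []
        for state in (0, 1):
            via = [cost[0] + (enter_cost if state == 1 else 0.0), cost[1]]
            prev = 0 if via[0] <= via[1] else 1
            new_cost.append(via[prev] + fit_cost(state, r, d))
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)

    # Trace the best path backwards from the cheapest final state.
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        states.append(state)
    states.reverse()

    # Collect maximal runs of state 1; the weight sums the cost saved.
    bursts, start, weight = [], None, 0.0
    for t, q in enumerate(states):
        if q == 1:
            if start is None:
                start, weight = t, 0.0
            weight += (fit_cost(0, relevant[t], total[t])
                       - fit_cost(1, relevant[t], total[t]))
        elif start is not None:
            bursts.append((start, t - 1, weight))
            start = None
    if start is not None:
        bursts.append((start, n - 1, weight))
    return bursts

monthly_hits = [2, 2, 2, 2, 20, 25, 22, 2, 2, 2, 2, 2]
monthly_docs = [100] * 12
bursts = detect_bursts(monthly_hits, monthly_docs)
```

On this toy stream the sketch reports a single bursty interval covering batches 4 to 6, where the synonym occurs far more often than its base rate. Mapping batch indices back to months gives the refined time period.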


Output: Results from the burst-detection algorithm

Synonym          | Entity                 | Burst Weight | Start   | End
President Reagan | Ronald Reagan          | 5506858      | 01/1987 | 02/1989
President Ronald | Ronald Reagan          | 100401       | 01/1989 | 03/1990
President Ronald | Ronald Reagan          | 67208        | 07/1990 | 02/1993
Senator Clinton  | Hillary Rodham Clinton | 18214        | 01/2001 | 10/2001
Senator Clinton  | Hillary Rodham Clinton | 17732        | 05/2002 | 01/2003
Senator Clinton  | Hillary Rodham Clinton | 172356       | 06/2003 | 11/2004


Classifying synonyms into two types

Definition

Class A: time-independent
   • Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)
   • E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"

Class B: time-dependent
   • Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered
   • E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005


Ranking time-independent synonyms

Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$

   • $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs
   • $\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
   • $\mu$ weights the relative importance of the temporal feature and the frequency feature
   • $\mu = 0.5$ yields the best performance in the experiments

Intuition: The model measures the popularity of synonyms based on two factors:
   • Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
   • High usage over time, i.e., a high value of averaged frequencies over time
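The mixture can be computed directly from per-partition term frequencies. The sketch below applies the formula literally. Any normalization of the two features to a comparable scale is not specified on the slide and is therefore omitted. The function name and input layout are hypothetical.

```python
def tidp(tf_per_partition, mu=0.5):
    """tf_per_partition[i] is tf(s_j, p_i), the frequency of synonym s_j
    in time partition p_i (0 when it does not occur there)."""
    occurring = [tf for tf in tf_per_partition if tf > 0]
    pf = len(occurring)              # partitions in which s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(occurring) / pf     # averaged tf over those partitions
    return mu * pf + (1.0 - mu) * avg_tf

# A synonym seen in three of four partitions with frequencies 3, 5 and 4:
score = tidp([3, 0, 5, 4])   # pf = 3, averaged tf = 4
```

With $\mu = 0.5$ this example yields 0.5 · 3 + 0.5 · 4 = 3.5. Setting `mu=1.0` ranks purely by the number of partitions, and `mu=0.0` purely by average frequency.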


Ranking time-dependent synonyms

Definition: Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by:

$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$

   • $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition: Only term frequencies are used to measure the importance of synonyms
   • Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
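Ranking by TDP then reduces to sorting the synonyms observed at $t_k$ by their term frequency. A small sketch, assuming a hypothetical `tf` table keyed by (synonym, time) pairs, with invented counts:

```python
def rank_time_dependent(tf, t_k, top_k=None):
    """tf maps (synonym, time) -> term frequency.
    Returns [(synonym, TDP score)] for time t_k, highest score first."""
    scored = [(syn, freq) for (syn, t), freq in tf.items()
              if t == t_k and freq > 0]
    # Sort by descending frequency, breaking ties alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]

tf = {
    ("Cardinal Ratzinger", 2004): 12,
    ("Joseph Ratzinger", 2004): 30,
    ("Pope Benedict XVI", 2006): 50,
}
ranking = rank_time_dependent(tf, 2004)
```

Only the two synonyms recorded for 2004 are returned here, with "Joseph Ratzinger" ranked first.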


Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms


Extracting and improving time of synonyms

Data collection
   • The whole history of English Wikipedia
      - All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
      - 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
   • The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools
   • MWDumper: http://www.mediawiki.org/wiki/Mwdumper
   • Oracle Berkeley DB version 4.7.25
   • Burst detection algorithm implemented by Kleinberg
      - Number of states: 2
      - Ratio of rate of second state to base state: 2
      - Ratio of rate of each subsequent state to previous state: 2
      - Gamma parameter of the HMM: 1

Measurement: Accuracy


Query expansion using time-independent synonyms

Data collection
   • TREC Robust Track (2004)
   • 250 topics (topics 301-450 and topics 601-700)

Tools
   • Terrier, an open source search engine developed by the University of Glasgow
   • BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
   • Expand the query with the top-k synonyms $\{s_1, \ldots, s_k\}$, using their TIDP scores as boosting weights:

$$q_{exp} = q_{org} \;\; s_1{\wedge}w_1 \;\; s_2{\wedge}w_2 \;\ldots\; s_k{\wedge}w_k$$

     where $s_i{\wedge}w_i$ denotes synonym $s_i$ boosted by weight $w_i$

Measurement: Mean Average Precision (MAP), R-precision and Recall
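Building the expanded query string can be sketched as follows. The `synonym^weight` rendering mirrors the boost notation in the formula and is meant as an illustration only. The exact query syntax accepted by a given search engine should be checked against its documentation. The function name, weights and quoting convention are assumptions.

```python
def expand_query(original, weighted_synonyms, k=3):
    """Append the top-k synonyms, each boosted by its weight, to the query.

    weighted_synonyms maps synonym -> weight (e.g. a TIDP score)."""
    top = sorted(weighted_synonyms.items(),
                 key=lambda item: (-item[1], item[0]))[:k]
    # Quote each synonym so multi-word names stay together.
    boosted = ['"%s"^%.2f' % (synonym, weight) for synonym, weight in top]
    return " ".join([original] + boosted)

expanded = expand_query(
    "pope benedict",
    {"Joseph Ratzinger": 0.8, "Cardinal Ratzinger": 0.5, "Benedict": 0.2},
    k=2,
)
```

Here the two highest-weighted synonyms are appended, giving `pope benedict "Joseph Ratzinger"^0.80 "Cardinal Ratzinger"^0.50`.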


Query expansion using time-dependent synonyms

Data collection
   • NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
   • Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries

Temporal Query: Named Entity  | Temporal Query: Time Period | Synonym
American Broadcasting Company | 1995-2000                   | Disney/ABC
Barack Obama                  | 2005-2007                   | Senator Obama
Eminem                        | 1999-2004                   | Slim Shady
George H. W. Bush             | 1988-1992                   | President George H.W. Bush
George W. Bush                | 2000-2007                   | President George W. Bush
Hillary Rodham Clinton        | 2001-2007                   | Senator Clinton
Kmart                         | 1987-1987                   | Kresge
Pope Benedict XVI             | 1988-2005                   | Cardinal Ratzinger
Ronald Reagan                 | 1987-1989                   | Reagan Revolution
Virgin Media                  | 1999-2002                   | Telewest Communications
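Precision at k, the measure used for the temporal queries, is the fraction of the first k retrieved documents that are relevant. A minimal sketch with hypothetical document identifiers:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant.
    Fewer than k retrieved documents still count against a cutoff of k."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k

ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d8"}
p_at_2 = precision_at_k(ranked, relevant, 2)   # d3 relevant, d7 not
p_at_5 = precision_at_k(ranked, relevant, 5)   # d3 and d1 relevant
```

In this example P@2 is 0.5 and P@5 is 0.4. Averaging such values over all temporal queries gives the P@10, P@20 and P@30 figures reported for each method.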


Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method | NE      | NE-Syn  | Avg. Syn per NE | Accuracy (%)
BPF-NERW   | 2574319 | 3199115 | 1.2             | 51
BPCF-NERW  | 473829  | 488383  | 1.0             | 73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization" or "company"

Note: 500 randomly selected entity-synonym relationships were used for assessing the accuracy of time periods


Robust2004 query statistics

Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
   • k = 2; k > 2 brings noise to the NERQ process

Number of queries using the two different NER methods

Type             | MW-NERQ | MRW-NERQ
Named entity     | 42      | 149
Not named entity | 208     | 101
Total            | 250     | 250


Query expansion using time-independent synonyms

MAP, R-precision and Recall (significance tested at p < 0.05)

Method   | MW-NERQ: MAP | R-precision | Recall | MRW-NERQ: MAP | R-precision | Recall
PM       | 0.2889       | 0.3309      | 0.6185 | 0.2455        | 0.2904      | 0.5629
PRF      | 0.3469       | 0.3711      | 0.6944 | 0.3002        | 0.3227      | 0.6761
SQE-PRF  | 0.3608       | 0.3652      | 0.7405 | 0.2507        | 0.2665      | 0.5932
SWQE-PRF | 0.3653       | 0.3861      | 0.7388 | 0.2885        | 0.3080      | 0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
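For reference, the three measures reported above can be computed per query as in the sketch below. These are the standard textbook definitions, not code from the talk, and the document identifiers are hypothetical. MAP is the mean of the average precision over all queries.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant
    document is retrieved, divided by the number of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r

def recall(ranked, relevant):
    """Fraction of the relevant documents that were retrieved at all."""
    if not relevant:
        return 0.0
    return len(set(ranked) & set(relevant)) / len(relevant)

def mean_average_precision(runs):
    """runs is a list of (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d9"}
ap = average_precision(ranked, relevant)   # (1/1 + 2/3) / 3
```

For this toy query, two of the three relevant documents are retrieved, at ranks 1 and 3, so both R-precision and recall equal 2/3.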


Query expansion using time-dependent synonyms

P@10, P@20 and P@30 (significance tested at p < 0.05)

Method | P@10   | P@20   | P@30
TQ     | 0.1000 | 0.0500 | 0.0333
TSQ    | 0.5200 | 0.3800 | 0.2800

TQ: search a Temporal Query, i.e., a keyword $w_q$ and time $t_q$
TSQ: search a Temporal Query and expand with Synonyms w.r.t. time $t_q$


QUEST: Query Expansion using Synonyms over Time


Conclusions and future work

   • Extract time-based synonyms from Wikipedia
   • Improve time of synonyms using NYT
   • Perform query expansion using the time-based synonyms
   • Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work
   • Combine time-dependent synonyms and temporal language models to determine time of queries
   • Exploit temporal information extraction techniques to discover synonyms at particular time points
   • Improve temporal text mining/clustering using time-based synonyms


QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you


What are time-based synonyms?

   • Time-independent synonyms are invariant to time
   • Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time


Application

News archive search
   • Search terms are named entities
   • Publication dates of documents are temporal criteria


Scenario 1
   • Query: "Pope Benedict XVI" and written before 2005
   • Documents about "Joseph Alois Ratzinger" are relevant


Scenario 2
   • Query: "Hillary R. Clinton" and written from 1997 to 2002
   • Documents about "New York Senator" and "First Lady of the United States" are relevant


Challenge

Semantic gaps in searching archives, i.e., a lack of knowledge about a query and its synonyms at a particular time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Contributions

1 Formal modelsWikipedia viewed as a temporal resource

2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms

3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Contributions

1 Formal modelsWikipedia viewed as a temporal resource

2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms

3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Contributions

1 Formal modelsWikipedia viewed as a temporal resource

2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms

3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Recognizing named entities

Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz

Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej

Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm

Figure A snapshot of Wikipediaand current revisions at time tk

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Recognizing named entities

Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz

Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej

Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm

Example[Bunescu and Pasca EACLrsquo2006]1) Multi-word titles and all words arecapitalized

President_of_the_United_StatesrArr named entity

2) Single-word titles with multiple capitalletters

UNICEF and WHO are namedentities

3) 75 of occurrences in the article textitself are capitalized

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Recognizing named entities

Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz

Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej

Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Recognizing named entities

Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz

Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej

Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm

Example

ei President_of_the_United_States

tk 112001

sj ldquoGeorge W Bushrdquo

ξi (ei sj ) or(President_of_the_United_StatesldquoGeorge W Bushrdquo)

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links

Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm

Example[[President_of_the_United_

States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_

States

Extracting synonyms

Output: Entity-synonym relationships and time periods

| Named Entity | Synonym | Time Period |
|---|---|---|
| Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009 |
| Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009 |
| Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009 |
| Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009 |
| Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009 |
| Barack Obama | Senator Barack Obama | 05/2006 - 03/2009 |
| Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009 |
| Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009 |
| Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009 |

The time of a synonym is taken from the timestamps of the Wikipedia articles (8 years) in which it appears, not from temporal expressions extracted from the contents.
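
One way to derive such time periods from per-month synonym snapshots is sketched below. The data layout (a dict keyed by `(year, month)`), the function name `synonym_periods` and the toy snapshots are assumptions made for illustration. The sketch records only the first and last snapshot in which a pair is seen and ignores gaps in between.

```python
def synonym_periods(snapshots):
    """snapshots: {(year, month): set of (entity, synonym) pairs seen in that snapshot}.
    Returns {(entity, synonym): (first_key, last_key)}."""
    periods = {}
    for key in sorted(snapshots):
        for pair in snapshots[key]:
            # Keep the earliest key already recorded and extend the end to this key.
            first, _ = periods.get(pair, (key, key))
            periods[pair] = (first, key)
    return periods

# Hypothetical toy snapshots, not data from the talk.
snapshots = {
    (2007, 11): {("Hillary Rodham Clinton", "Senator Clinton")},
    (2007, 12): {("Hillary Rodham Clinton", "Senator Clinton"),
                 ("Barack Obama", "Sen. Barack Obama")},
    (2008, 1): {("Hillary Rodham Clinton", "Senator Clinton")},
}
periods = synonym_periods(snapshots)
```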

Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods

1.8M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg, KDD'2002]

Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams

Output: bursty intervals and bursty weights, i.e. periods of occurrence and intensity
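
To make the idea concrete, here is a simplified two-state burst detector over batched counts, written in the spirit of Kleinberg's model rather than as a reproduction of his implementation. The function names, the toy counts, and the transition cost of `gamma * ln(n)` for entering the burst state are assumptions. Each batch holds `relevant[t]` documents mentioning the entity-synonym pair out of `total[t]` documents.

```python
import math

def log_binom_cost(r, d, p):
    """Negative log-likelihood of r relevant documents out of d under rate p."""
    log_choose = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
    return -(log_choose + r * math.log(p) + (d - r) * math.log(1.0 - p))

def two_state_bursts(relevant, total, s=2.0, gamma=1.0):
    """Return (states, bursts); bursts is a list of (start, end, weight)."""
    n = len(relevant)
    if n == 0 or sum(relevant) == 0:
        return [0] * n, []
    p0 = sum(relevant) / sum(total)      # base rate
    p1 = min(p0 * s, 0.9999)             # elevated rate of the burst state
    up_cost = gamma * math.log(n)        # assumed cost of moving from state 0 to 1
    rates = (p0, p1)
    best = [0.0, math.inf]               # the automaton starts in state 0
    back = []
    for r, d in zip(relevant, total):    # Viterbi pass minimising total cost
        new_best, pointers = [], []
        for q in (0, 1):
            cands = [best[prev] + (up_cost if (prev == 0 and q == 1) else 0.0)
                     for prev in (0, 1)]
            prev = 0 if cands[0] <= cands[1] else 1
            pointers.append(prev)
            new_best.append(cands[prev] + log_binom_cost(r, d, rates[q]))
        best = new_best
        back.append(pointers)
    q = 0 if best[0] <= best[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):       # backtrack the cheapest state sequence
        states[t] = q
        q = back[t][q]
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start, weight = t, 0.0
            while t < n and states[t] == 1:
                # Weight: how much cheaper the burst state explains each batch.
                weight += (log_binom_cost(relevant[t], total[t], p0)
                           - log_binom_cost(relevant[t], total[t], p1))
                t += 1
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return states, bursts

# Hypothetical monthly counts: a clear burst in batches 3 to 5.
states, bursts = two_state_bursts([1, 1, 1, 10, 12, 11, 1, 1], [100] * 8)
```

With these toy counts the detector reports a single bursty interval covering batches 3 to 5, and its weight plays the role of the burst intensity.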

Improving the accuracy of time using burst detection

Output: Results from the burst-detection algorithm

| Synonym | Entity | Burst Weight | Start | End |
|---|---|---|---|---|
| President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989 |
| President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990 |
| President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993 |
| Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001 |
| Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003 |
| Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004 |

Classifying synonyms into two types

Definition

Class A: time-independent

Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)

E.g. "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"

Class B: time-dependent

Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered

E.g. "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005

Ranking time-independent synonyms

Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature

$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$

$pf(s_j)$ is the time partition frequency, i.e. the number of time partitions in which $s_j$ occurs

$\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$

$\mu$ balances the importance of the temporal feature against the frequency feature

$\mu = 0.5$ yields the best performance in the experiments

Intuition: The model measures the popularity of synonyms based on two factors

Robustness to change over time, i.e. the more partitions a synonym occurs in, the more robust to time it is

High usage over time, i.e. a high value of averaged frequencies over time
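
A direct transcription of the TIDP mixture into Python might look as follows. It is a minimal sketch: the function name `tidp` and the example frequencies are assumptions, and the raw values are used without any normalization, which the slide does not specify.

```python
def tidp(tf_by_partition, mu=0.5):
    """Score one synonym from its term frequencies, one entry per time partition."""
    # pf: number of partitions in which the synonym occurs at all.
    pf = sum(1 for tf in tf_by_partition if tf > 0)
    if pf == 0:
        return 0.0
    # Averaged tf over the partitions in which the synonym occurs.
    avg_tf = sum(tf_by_partition) / pf
    return mu * pf + (1 - mu) * avg_tf

# Hypothetical frequencies over four partitions: pf = 3, averaged tf = 4.0.
score = tidp([3, 0, 5, 4])
```

Sorting the candidate synonyms of an entity by this score in descending order gives the ranking from which the top-k synonyms are taken.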

Ranking time-dependent synonyms

Definition: Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$

$tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition: Only term frequencies are used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
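
Because TDP is just the term frequency at $t_k$, ranking reduces to a lookup and a sort. The sketch below assumes a nested dict of per-period counts. The function name `rank_time_dependent` and the counts are hypothetical.

```python
def rank_time_dependent(tf_at_time, tk, k=3):
    """tf_at_time: {synonym: {time_key: term frequency}}.
    Returns the top-k (synonym, tf) pairs at time tk, highest frequency first."""
    scored = [(counts.get(tk, 0), syn) for syn, counts in tf_at_time.items()]
    # Synonyms that do not occur at tk are not candidates for that period.
    scored = [(tf, syn) for tf, syn in scored if tf > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(syn, tf) for tf, syn in scored[:k]]

# Hypothetical yearly counts, for illustration only.
counts = {
    "Cardinal Ratzinger": {2004: 12, 2006: 1},
    "Joseph Ratzinger": {2004: 7},
    "Pope Benedict XVI": {2006: 30},
}
ranking = rank_time_dependent(counts, 2004)
```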

Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting synonyms and improving the accuracy of their time

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection

The whole history of English Wikipedia
All pages and revisions from 03/2001 to 03/2008, 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 terabytes
4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)

New York Times Annotated Corpus, containing over 1.8 million articles from January 1987 to June 2007

Tools

MWDumper: http://www.mediawiki.org/wiki/Mwdumper
Oracle Berkeley DB version 4.7.25

Burst detection algorithm implemented by Kleinberg
Number of states: 2
Ratio of rate of second state to base state: 2
Ratio of rate of each subsequent state to previous state: 2
Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier, an open-source search engine developed by the University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

Expand the query with the top-k synonyms $\{s_1, \ldots, s_k\}$, using their TIDP scores as boosting weights

$$q_{exp} = q_{org} \;\; s_1 \wedge w_1 \;\; s_2 \wedge w_2 \;\; \ldots \;\; s_k \wedge w_k$$

where $s_i \wedge w_i$ denotes synonym $s_i$ boosted by weight $w_i$

Measurement: Mean Average Precision (MAP), R-precision and Recall
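
Building the expanded query string can be sketched as below. The function name `expand_query`, the example weights and the quoted-phrase `"synonym"^weight` notation are assumptions chosen to mirror the formula. The exact boost syntax accepted by a particular search engine should be checked against its documentation.

```python
def expand_query(original, weighted_synonyms, k=2):
    """Append the k highest-weighted synonyms to the original query,
    each followed by ^weight as its boosting factor."""
    top = sorted(weighted_synonyms.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [original]
    for synonym, weight in top:
        parts.append('"{}"^{:.2f}'.format(synonym, weight))
    return " ".join(parts)

# Hypothetical TIDP-style weights, for illustration only.
weights = {"Senator Obama": 0.8, "Barack Hussein Obama II": 0.95, "Obama": 0.3}
expanded = expand_query("Barack Obama", weights, k=2)
```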

Query expansion using time-dependent synonyms

Data collection

NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications

Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries

| Temporal Query: Named Entity | Temporal Query: Time Period | Synonym |
|---|---|---|
| American Broadcasting Company | 1995-2000 | Disney/ABC |
| Barack Obama | 2005-2007 | Senator Obama |
| Eminem | 1999-2004 | Slim Shady |
| George H. W. Bush | 1988-1992 | President George H.W. Bush |
| George W. Bush | 2000-2007 | President George W. Bush |
| Hillary Rodham Clinton | 2001-2007 | Senator Clinton |
| Kmart | 1987-1987 | Kresge |
| Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger |
| Ronald Reagan | 1987-1989 | Reagan Revolution |
| Virgin Media | 1999-2002 | Telewest Communications |

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

| NER Method | NE | NE-Syn | Avg. Syn per NE | Accuracy (%) |
|---|---|---|---|---|
| BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51 |
| BPCF-NERW | 473,829 | 488,383 | 1.0 | 73 |

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories "people", "organization" or "company"

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods

Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)

k = 2 is used; k > 2 brings noise to the NERQ process

Number of queries using the two different NER methods

| Type | MW-NERQ | MRW-NERQ |
|---|---|---|
| Named entity | 42 | 149 |
| Not named entity | 208 | 101 |
| Total | 250 | 250 |

Query expansion using time-independent synonyms

MAP, R-precision and Recall (significance level: p < 0.05)

| Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall |
|---|---|---|---|---|---|---|
| PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629 |
| PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761 |
| SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932 |
| SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504 |

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e. Bose-Einstein 1
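
For readers unfamiliar with the measures reported here, the following sketch computes precision at k, average precision (whose mean over all queries is MAP), R-precision and recall for a single query. The function names and the toy ranking are assumptions, and standard evaluation tools may differ in edge-case handling.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant document found."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0

def recall(ranked, relevant):
    """Fraction of the relevant documents that were retrieved."""
    found = sum(1 for doc in ranked if doc in relevant)
    return found / len(relevant) if relevant else 0.0

# Hypothetical ranking and relevance judgments for one query.
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}
ap = average_precision(ranked, relevant)
```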

Query expansion using time-dependent synonyms

P@10, P@20 and P@30 (significance level: p < 0.05)

| Method | P@10 | P@20 | P@30 |
|---|---|---|---|
| TQ | 0.1000 | 0.0500 | 0.0333 |
| TSQ | 0.5200 | 0.3800 | 0.2800 |

TQ: search with a Temporal Query, i.e. a keyword $w_q$ and time $t_q$

TSQ: search with a Temporal Query and expand with Synonyms w.r.t. time $t_q$

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

Extract time-based synonyms from Wikipedia

Improve the time of synonyms using NYT

Perform query expansion using the time-based synonyms

Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work

Combine time-dependent synonyms and temporal language models to determine the time of queries

Exploit temporal information extraction techniques to discover synonyms at particular time points

Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

Application

News archive search

Search terms are named entities

Publication dates of documents are temporal criteria

Scenario 1

Query: "Pope Benedict XVI", restricted to documents written before 2005

Documents about "Joseph Alois Ratzinger" are relevant

Scenario 2

Query: "Hillary R. Clinton", restricted to documents written from 1997 to 2002

Documents about "New York Senator" and "First Lady of the United States" are relevant

Challenge

Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time

Contributions

1. Formal models

Wikipedia viewed as a temporal resource

2. Proposed approaches

Discover time-based synonyms over time

Improve the accuracy of time of synonyms

Expand a query using time-based synonyms

3. Experiments

Evaluate extracting and improving time of synonyms

Evaluate query expansion using time-based synonyms

Recognizing named entities

Step 1: Partition Wikipedia according to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.

Step 2: For each snapshot $W_{t_k} \in W$, identify named-entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Figure: A snapshot of Wikipedia and current revisions at time $t_k$
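
The notion of a snapshot, i.e. the revision of each page that is current at time $t_k$, can be sketched as follows. The data layout (per-page revision lists sorted by ISO date strings), the function name `snapshot_at` and the sample revisions are assumptions made for illustration.

```python
from bisect import bisect_right

def snapshot_at(revisions, tk):
    """revisions: {title: list of (timestamp, text) sorted by timestamp}.
    Returns {title: text of the latest revision with timestamp <= tk}."""
    snapshot = {}
    for title, history in revisions.items():
        times = [ts for ts, _ in history]
        # Index of the first revision strictly after tk.
        idx = bisect_right(times, tk)
        if idx > 0:  # the page already existed at tk
            snapshot[title] = history[idx - 1][1]
    return snapshot

# Hypothetical revision histories; ISO date strings compare chronologically.
revisions = {
    "A": [("2001-03-15", "a1"), ("2001-10-20", "a2"), ("2002-01-05", "a3")],
    "B": [("2001-12-01", "b1")],
}
snap = snapshot_at(revisions, "2001-11-01")
```

Calling the function once per month boundary would produce the monthly snapshots of Step 1.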

Example [Bunescu and Pasca, EACL'2006]

1) Multi-word titles in which all words are capitalized

President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters

UNICEF and WHO are named entities

3) At least 75% of occurrences in the article text itself are capitalized
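
The three title heuristics can be approximated in code as below. This is a hedged sketch, not Bunescu and Pasca's implementation: the function name `is_named_entity` is hypothetical, and the small `FUNCTION_WORDS` list is an assumption added so that lower-case words such as "of" and "the" do not disqualify a title like President_of_the_United_States.

```python
import re

# Assumed list of function words ignored by the capitalization check.
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "on", "at", "a", "an"}

def is_named_entity(title, article_text=""):
    words = [w for w in title.split("_") if w]
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose (content) words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with at least two capital letters.
    if sum(1 for ch in word if ch.isupper()) >= 2:
        return True
    # Rule 3: at least 75% of the occurrences in the article text are capitalized.
    pattern = r"\b{}\b".format(re.escape(word))
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= 0.75

# Hypothetical article text for the third rule.
apple_text = "Apple makes phones. Apple is big. An apple a day. Apple Inc."
```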

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Recognizing named entities

Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz

Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej

Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Recognizing named entities

Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz

Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej

Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm

Example

ei President_of_the_United_States

tk 112001

sj ldquoGeorge W Bushrdquo

ξi (ei sj ) or(President_of_the_United_StatesldquoGeorge W Bushrdquo)

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links

Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm

Example[[President_of_the_United_

States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_

States

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links

Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm

Example[[President_of_the_United_

States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_

States

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

OutputEntity-synonym relationships and time periods

Named Entity Synonym Time Period

Pope Benedict XVICardinal Joseph Ratzinger 052005 - 032009Joseph Ratzinger 052005 - 032009Pope Benedict XVI 052005 - 032009

Barack ObamaBarack Hussein Obama II 022007 - 032009Sen Barack Obama 072007 - 032009Senator Barack Obama 052006 - 032009

Hillary Rodham ClintonHillary Clinton 082003 - 032009Sen Hillary Clinton 032007 - 032009Senator Clinton 112007 - 032009

The time of synonyms are timestamps of Wikipedia articles (8 years) in which they

appear not temporal expression extracted from the contents

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Improving the accuracy of time using burst detection

Analyze the New York Time Annotated Corpus (NYT) to discovermore accurate time

18M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg in KDDrsquo2002]

Generate bursty periods of ξij by computing a rate of occurrencefrom document streams

Output bursty intervals and bursty weight ie periods ofoccurrence and intensity

Output: results from the burst-detection algorithm

Synonym            Entity                   Burst Weight   Start     End

President Reagan   Ronald Reagan            5506858        01/1987   02/1989
President Ronald   Ronald Reagan            100401         01/1989   03/1990
President Ronald   Ronald Reagan            67208          07/1990   02/1993

Senator Clinton    Hillary Rodham Clinton   18214          01/2001   10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002   01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003   11/2004

Classifying synonyms into two types

Definition

Class A: time-independent

Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)

E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”

Class B: time-dependent

Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered

E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005

Ranking time-independent synonyms

Definition

Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature

$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$

$pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs

$\overline{tf}(s_j)$ is the tf of $s_j$ averaged over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$

$\mu$ weights the importance of the temporal feature against the frequency feature

$\mu = 0.5$ yields the best performance in the experiments

Intuition

The model measures the popularity of synonyms based on two factors

Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is

High usage over time, i.e., a high value of averaged frequencies over time
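The mixture above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout (time partition label mapped to term frequency) and the use of raw, unnormalized counts for both features are hypothetical choices, since the slides do not say whether the features are normalized.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP score of one synonym from its per-partition term frequencies.

    tf_by_partition: dict mapping a partition label to tf(s_j, p_i)
    (an assumed input format).
    """
    occupied = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(occupied)              # partitions in which the synonym occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(occupied) / pf     # averaged tf over those partitions
    return mu * pf + (1 - mu) * avg_tf


def rank_time_independent(tf_table, mu=0.5):
    """Sort synonyms by descending TIDP score (ties broken by name)."""
    scored = [(syn, tidp(per_partition, mu))
              for syn, per_partition in tf_table.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy counts, invented purely for illustration.
toy = {
    "Barack Hussein Obama II": {"2007-02": 4, "2007-03": 2, "2007-04": 3},
    "Sen. Barack Obama": {"2007-07": 9},
}
ranking = rank_time_independent(toy)
```

In the toy data the first synonym occurs in three partitions with an average tf of 3, while the second occurs in one partition with a tf of 9, so at `mu = 0.5` the two features pull in opposite directions.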

Ranking time-dependent synonyms

Definition

Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$TDP(s_j, t_k) = tf(s_j, t_k)$

$tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition

Only term frequencies are used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
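A minimal sketch of this ranking, assuming term frequencies are already available per synonym and per time period. The function name, the table layout and the counts are hypothetical and serve only to show that the score at $t_k$ is the plain term frequency at $t_k$.

```python
def rank_time_dependent(tf_table, tk, top_k=None):
    """Rank synonyms by TDP(s_j, t_k) = tf(s_j, t_k).

    tf_table: dict mapping synonym -> {time period: term frequency}
    (an assumed input format). Synonyms unseen at tk are dropped.
    """
    scored = [(syn, per_time.get(tk, 0)) for syn, per_time in tf_table.items()]
    scored = [(syn, tf) for syn, tf in scored if tf > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored if top_k is None else scored[:top_k]


# Toy counts, invented purely for illustration.
toy = {
    "Cardinal Ratzinger": {"2004": 12, "2006": 1},
    "Joseph Ratzinger": {"2004": 7},
    "Benedict XVI": {"2006": 30},
}
before_election = rank_time_dependent(toy, "2004")
```

Because the lookup is restricted to one period, a synonym that dominates a later period has no influence on the ranking for an earlier one.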

Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection

The whole history of English Wikipedia
  All pages and revisions, 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
  4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)

New York Times Annotated Corpus: contains over 1.8 million articles from January 1987 to June 2007

Tools
  MWDumper: http://www.mediawiki.org/wiki/Mwdumper
  Oracle Berkeley DB version 4.7.25

Burst detection algorithm implemented by Kleinberg
  Number of states: 2
  Ratio of rate of second state to base state: 2
  Ratio of rate of each subsequent state to previous state: 2
  Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier: an open source search engine developed by the University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

Expand with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights

$q_{exp} = q_{org} \; s_1 \wedge w_1 \; s_2 \wedge w_2 \; \ldots \; s_k \wedge w_k$

Measurement: Mean Average Precision (MAP), R-precision and Recall
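The expansion step can be illustrated with a small helper that appends the top-k synonyms and their weights to the original query. This is a sketch only: the function name is hypothetical, and the output format (quoted phrase followed by `^` and the weight) is an assumed, engine-neutral rendering of a boosted term, not a claim about the exact query syntax used in the experiments.

```python
def expand_query(original, scored_synonyms, k=2):
    """Append the k highest-scoring synonyms, each with its boost weight.

    scored_synonyms: dict mapping synonym -> score (e.g. a TIDP score).
    """
    top = sorted(scored_synonyms.items(),
                 key=lambda pair: (-pair[1], pair[0]))[:k]
    parts = [original]
    for synonym, weight in top:
        # Assumed notation: "phrase"^weight marks a boosted expansion term.
        parts.append(f'"{synonym}"^{weight:.2f}')
    return " ".join(parts)


# Toy scores, invented purely for illustration.
expanded = expand_query(
    "barack obama",
    {"senator obama": 2.5, "barack hussein obama ii": 4.0, "obama": 1.0},
    k=2,
)
```

Keeping the weights in the query lets the retrieval model treat strong synonyms almost like original terms while weak ones contribute only marginally.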

Query expansion using time-dependent synonyms

Data collection

NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications

Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries

Temporal Query                                  Synonym
Named Entity                    Time Period

American Broadcasting Company   1995-2000       Disney/ABC
Barack Obama                    2005-2007       Senator Obama
Eminem                          1999-2004       Slim Shady
George H. W. Bush               1988-1992       President George H.W. Bush
George W. Bush                  2000-2007       President George W. Bush
Hillary Rodham Clinton          2001-2007       Senator Clinton
Kmart                           1987-1987       Kresge
Pope Benedict XVI               1988-2005       Cardinal Ratzinger
Ronald Reagan                   1987-1989       Reagan Revolution
Virgin Media                    1999-2002       Telewest Communications

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE          NE-Syn      Avg. Syn per NE   Accuracy (%)

BPF-NERW     2,574,319   3,199,115   1.2               51
BPCF-NERW    473,829     488,383     1.0               73

BPF-NERW: Bunescu and Pasca’s Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of “people”, “organization” or “company”

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods

Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)

k = 2; k > 2 brings noise into the NERQ process

Number of queries using the two different NER methods

Type               MW-NERQ   MRW-NERQ

Named entity       42        149
Not named entity   208       101
Total              250       250

Query expansion using time-independent synonyms

MAP, R-precision and Recall (statistical significance assessed at p < 0.05)

Method      MW-NERQ                           MRW-NERQ
            MAP      R-precision   Recall     MAP      R-precision   Recall

PM          0.2889   0.3309        0.6185     0.2455   0.2904        0.5629
PRF         0.3469   0.3711        0.6944     0.3002   0.3227        0.6761
SQE-PRF     0.3608   0.3652        0.7405     0.2507   0.2665        0.5932
SWQE-PRF    0.3653   0.3861        0.7388     0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1

Query expansion using time-dependent synonyms

P@10, P@20 and P@30 (statistical significance assessed at p < 0.05)

Method   P@10     P@20     P@30

TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search with a Temporal Query, i.e., a keyword $w_q$ and a time $t_q$

TSQ: search with a Temporal Query, expanded with Synonyms with respect to time $t_q$
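For readers unfamiliar with the measure, precision at k is the fraction of the first k retrieved documents that are relevant, averaged over all queries. The sketch below shows that computation; the function names and document identifiers are hypothetical and the data is a toy example, not the evaluation data from the talk.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant (k > 0)."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(runs, k):
    """Average P@k over queries; runs is a list of (ranking, relevant set)."""
    return sum(precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


# Toy rankings, invented purely for illustration.
runs = [
    (["d3", "d7", "d1", "d9", "d4"], {"d1", "d3", "d4"}),
    (["d2", "d8", "d5", "d6", "d0"], {"d8"}),
]
average = mean_precision_at_k(runs, 5)
```

Note that the denominator is always k, so a query returning fewer than k documents is penalized for the missing positions.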

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

Extract time-based synonyms from Wikipedia
Improve time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work

Combine time-dependent synonyms and temporal language models to determine time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
    • Synonym Detection
      • Entity Recognition and Synonym Extraction
      • Improving the Accuracy of Time
    • Query Expansion
      • Time-based Synonyms
      • Ranking Time-independent Synonyms
      • Ranking Time-dependent Synonyms
    • Evaluation
      • Experiment Setting
      • Experimental Results
    • Conclusions
      • Conclusions and Future Work

Application

News archive search
Search terms are named entities
Publication dates of documents are temporal criteria

Scenario 1
Query “Pope Benedict XVI” and documents written before 2005
Documents about “Joseph Alois Ratzinger” are relevant

Scenario 2
Query “Hillary R. Clinton” and documents written from 1997 to 2002
Documents about “New York Senator” and “First Lady of the United States” are relevant

Challenge

Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time

Contributions

1. Formal models
   Wikipedia viewed as a temporal resource

2. Proposed approaches
   Discover time-based synonyms over time
   Improve the accuracy of time of synonyms
   Expand a query using time-based synonyms

3. Experiments
   Evaluate extracting and improving time of synonyms
   Evaluate query expansion using time-based synonyms

Recognizing named entities

Step 1: Partition Wikipedia according to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Figure: A snapshot of Wikipedia and current revisions at time $t_k$

Example [Bunescu and Pasca, EACL’2006]

1) Multi-word titles in which all words are capitalized

President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters

UNICEF and WHO are named entities

3) 75% of occurrences in the article text itself are capitalized
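The three title heuristics can be sketched as a small classifier. This is an illustrative reading, not the authors' code: the function name and the tiny list of function words are assumptions (the slide's own example, President_of_the_United_States, only passes rule 1 if words such as "of" and "the" are ignored), and sentence-initial capitalization is not corrected for in rule 3.

```python
import re

# Assumed list of function words that rule 1 skips (illustrative only).
FUNCTION_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to"}

def is_named_entity(title, article_text, threshold=0.75):
    """Apply the three capitalization heuristics to a Wikipedia title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: every content word of a multi-word title is capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: a single-word title with at least two capital letters.
    if sum(1 for ch in word if ch.isupper()) >= 2:
        return True
    # Rule 3: at least 75% of the title's occurrences in the article text
    # are capitalized.
    pattern = r"\b" + re.escape(word) + r"\b"
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= threshold
```

Running such a check on every page title of a monthly snapshot yields the set of named entities for that snapshot.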

Example

$e_i$: President_of_the_United_States

$t_k$: 11/2001

$s_j$: “George W. Bush”

$\xi_{i,j}$: $(e_i, s_j)$, or (President_of_the_United_States, “George W. Bush”)

Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Example

[[President_of_the_United_States|Barack Obama]]: “Barack Obama” is the anchor text linking to the article President_of_the_United_States
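The anchor-text step can be sketched with a regular expression over raw wikitext. The sketch makes simplifying assumptions that are not part of the talk: the function name is hypothetical, only piped links of the form `[[Target|anchor]]` are considered, section fragments after `#` are ignored, and MediaWiki's title normalization is reduced to replacing spaces with underscores.

```python
import re
from collections import defaultdict

# Piped internal link: [[Target|anchor text]], optionally [[Target#Section|...]].
PIPED_LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]|]+)\]\]")

def extract_synonyms(wikitext, entities):
    """Collect anchor texts that link to known named-entity pages.

    entities: set of entity page titles written with underscores.
    Returns a dict mapping entity title -> set of anchor texts (synonyms).
    """
    synonyms = defaultdict(set)
    for target, anchor in PIPED_LINK.findall(wikitext):
        title = target.strip().replace(" ", "_")
        if title in entities:
            synonyms[title].add(anchor.strip())
    return dict(synonyms)


sample = ("[[President_of_the_United_States|Barack Obama]] visited [[Norway]]; "
          "[[President of the United States|the President]] spoke.")
found = extract_synonyms(sample, {"President_of_the_United_States"})
```

Applying this to every article of a snapshot and taking the union of the results gives the synonym snapshot for that month.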

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links

Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm

Example[[President_of_the_United_

States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_

States

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

OutputEntity-synonym relationships and time periods

Named Entity Synonym Time Period

Pope Benedict XVICardinal Joseph Ratzinger 052005 - 032009Joseph Ratzinger 052005 - 032009Pope Benedict XVI 052005 - 032009

Barack ObamaBarack Hussein Obama II 022007 - 032009Sen Barack Obama 072007 - 032009Senator Barack Obama 052006 - 032009

Hillary Rodham ClintonHillary Clinton 082003 - 032009Sen Hillary Clinton 032007 - 032009Senator Clinton 112007 - 032009

The time of synonyms are timestamps of Wikipedia articles (8 years) in which they

appear not temporal expression extracted from the contents

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Improving the accuracy of time using burst detection

Analyze the New York Time Annotated Corpus (NYT) to discovermore accurate time

18M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg in KDDrsquo2002]

Generate bursty periods of ξij by computing a rate of occurrencefrom document streams

Output bursty intervals and bursty weight ie periods ofoccurrence and intensity

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Improving the accuracy of time using burst detection

OutputResults from burst-detection algorithm

Synonym Entity Burst Weight TimeStart End

President Reagan Ronald Reagan 5506858 011987 021989President Ronald Ronald Reagan 100401 011989 031990President Ronald Ronald Reagan 67208 071990 021993

Senator Clinton Hillary Rodham Clinton 18214 012001 102001Senator Clinton Hillary Rodham Clinton 17732 052002 012003Senator Clinton Hillary Rodham Clinton 172356 062003 112004

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Classifying synonyms into two types

DefinitionClass A time-independent

Robust to change over time and good synonym candidates for an ordinarysearch (no temporal criteria provided)

Eg ldquoBarack Hussein Obama IIrdquo is a time-independent synonym of ldquoBarackObamardquo

Class B time-dependent

Related to particular time in the past and good synonym candidates for atemporal search where changes in semantics must be considered

Eg ldquoCardinal Joseph Ratzingerrdquo is a time-dependent synonym of ldquoPopeBenedict XVIrdquo before 2005

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-independent synonyms

Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$

  • $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs

  • $\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$

  • $\mu$ weights the relative importance of the temporal feature and the frequency feature

  • $\mu = 0.5$ yields the best performance in the experiments

Intuition: The model measures the popularity of synonyms based on two factors:

  • Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is

  • High usage over time, i.e., a high value of averaged frequencies over time
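The mixture can be illustrated with a small Python sketch that scores one synonym from its per-partition term frequencies. The function name `tidp`, the dictionary layout, and the use of raw (unnormalized) counts are assumptions made for this example; the slides do not say how pf and tf are normalized before they are mixed.

```python
def tidp(tf_by_partition, mu=0.5):
    """Score a time-independent synonym from its per-partition term frequencies.

    tf_by_partition maps a time partition label to the term frequency of the
    synonym in that partition (hypothetical layout, raw counts).
    """
    # pf: number of time partitions in which the synonym occurs at least once.
    pf = sum(1 for tf in tf_by_partition.values() if tf > 0)
    if pf == 0:
        return 0.0
    # Averaged tf: total frequency divided by the number of partitions with occurrences.
    avg_tf = sum(tf_by_partition.values()) / pf
    return mu * pf + (1 - mu) * avg_tf


example = {"2001-03": 3, "2001-04": 0, "2001-05": 5}
print(tidp(example))  # pf = 2, averaged tf = 4.0, so 0.5 * 2 + 0.5 * 4.0 = 3.0
```

With raw counts the two features live on different scales, so a real system would probably normalize both before mixing them; that step is left out here because the slides do not describe it.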


Ranking time-dependent synonyms

Definition: Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by:

$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$

  • $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition: Only term frequencies are used to measure the importance of synonyms

  • Time partitions are not considered, because only synonyms in a particular time period $t_k$ are of interest
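Ranking by TDP amounts to sorting the synonyms observed at $t_k$ by their term frequency. The sketch below assumes a hypothetical dictionary keyed by (synonym, time); the counts in the usage example are made up for illustration.

```python
def rank_time_dependent(tf_at_time, t_k, top_k=None):
    """Rank synonyms for time t_k by TDP(s_j, t_k) = tf(s_j, t_k).

    tf_at_time maps (synonym, time) -> term frequency (hypothetical layout).
    """
    # Keep only synonyms that actually occur at t_k.
    scores = {s: tf for (s, t), tf in tf_at_time.items() if t == t_k and tf > 0}
    # Highest frequency first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_k is None else ranked[:top_k]


# Illustrative counts only, not taken from the experiments.
counts = {
    ("Cardinal Ratzinger", 2003): 40,
    ("Joseph Ratzinger", 2003): 25,
    ("Pope Benedict XVI", 2003): 0,
    ("Pope Benedict XVI", 2006): 90,
}
print(rank_time_dependent(counts, 2003))
```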


Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms


Extracting and improving time of synonyms

Data collection

  • The whole history of English Wikipedia
    • All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
    • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)

  • The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools

  • MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
  • Oracle Berkeley DB version 4.7.25

  • Burst detection algorithm implemented by Kleinberg
    • Number of states: 2
    • Ratio of rate of second state to base state: 2
    • Ratio of rate of each subsequent state to previous state: 2
    • Gamma parameter of the HMM: 1

Measurement: Accuracy


Query expansion using time-independent synonyms

Data collection

  • TREC Robust Track (2004)

  • 250 topics (topics 301-450 and topics 601-700)

Tools

  • Terrier: an open source search engine developed by the University of Glasgow

  • BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

  • Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights:

    q_exp = q_org s_1^w_1 s_2^w_2 ... s_k^w_k

Measurement: Mean Average Precision (MAP), R-precision, and Recall
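The expansion step can be sketched as plain string construction. The helper name `build_expanded_query` and the example scores are hypothetical, and quoting multi-word synonyms before applying the `^weight` boost is an assumption of this sketch; the exact syntax accepted by a given Terrier version should be checked against its query language documentation.

```python
def build_expanded_query(original_query, scored_synonyms, k=2):
    """Append the top-k synonyms to the query, each boosted by its score.

    scored_synonyms is a list of (synonym, weight) pairs, e.g. TIDP scores.
    """
    top = sorted(scored_synonyms, key=lambda pair: pair[1], reverse=True)[:k]
    parts = [original_query]
    for synonym, weight in top:
        # Quote multi-word synonyms so the boost applies to the phrase (assumption).
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)


# Illustrative weights only.
print(build_expanded_query(
    "benedict",
    [("ratzinger", 0.8), ("joseph ratzinger", 0.5), ("pontiff", 0.1)],
    k=2,
))
```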


Query expansion using time-dependent synonyms

Data collection

  • NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications

  • Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries (a temporal query is a named entity plus a time period) and their synonyms:

Named Entity                   Time Period  Synonym
American Broadcasting Company  1995-2000    Disney/ABC
Barack Obama                   2005-2007    Senator Obama
Eminem                         1999-2004    Slim Shady
George H. W. Bush              1988-1992    President George H.W. Bush
George W. Bush                 2000-2007    President George W. Bush
Hillary Rodham Clinton         2001-2007    Senator Clinton
Kmart                          1987-1987    Kresge
Pope Benedict XVI              1988-2005    Cardinal Ratzinger
Ronald Reagan                  1987-1989    Reagan Revolution
Virgin Media                   1999-2002    Telewest Communications
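Precision at a cutoff is the fraction of the top-k retrieved documents that are relevant. A minimal sketch, with hypothetical document identifiers:

```python
def precision_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """Fraction of the top-k retrieved documents that are relevant.

    The denominator is always k, so a result list shorter than k is penalized.
    """
    hits = sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant_doc_ids)
    return hits / k


ranking = ["d1", "d2", "d3", "d4"]   # hypothetical result list, best first
relevant = {"d1", "d4"}              # hypothetical relevance judgments
print(precision_at_k(ranking, relevant, 2))   # 0.5
```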


Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia:

NER Method  NE         NE-Syn     Avg. Syn per NE  Accuracy (%)
BPF-NERW    2,574,319  3,199,115  1.2              51
BPCF-NERW   473,829    488,383    1.0              73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization", or "company"

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods


Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)

  • k = 2 is used; k > 2 brings noise into the NERQ process

Number of queries using the two NER methods:

Type              MW-NERQ  MRW-NERQ
Named entity      42       149
Not named entity  208      101
Total             250      250


Query expansion using time-independent synonyms

MAP, R-precision, and Recall (* indicates statistically significant at p < 0.05):

          MW-NERQ                      MRW-NERQ
Method    MAP     R-precision  Recall  MAP     R-precision  Recall
PM        0.2889  0.3309       0.6185  0.2455  0.2904       0.5629
PRF       0.3469  0.3711       0.6944  0.3002  0.3227       0.6761
SQE-PRF   0.3608  0.3652       0.7405  0.2507  0.2665       0.5932
SWQE-PRF  0.3653  0.3861       0.7388  0.2885  0.3080       0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
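For readers unfamiliar with the three measures, the sketch below gives their textbook definitions in Python. It is not the evaluation tooling used in the experiments, and the document identifiers are hypothetical.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks where relevant documents appear."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return sum(1 for doc_id in ranked[:r] if doc_id in relevant) / r if r else 0.0


def recall(ranked, relevant):
    """Fraction of the relevant documents that were retrieved at all."""
    return len(set(ranked) & set(relevant)) / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over a list of (ranked, relevant) pairs, one pair per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranked, relevant = ["a", "x", "b", "y"], {"a", "b", "c"}
print(average_precision(ranked, relevant), r_precision(ranked, relevant), recall(ranked, relevant))
```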


Query expansion using time-dependent synonyms

P10, P20, and P30 (* indicates statistically significant at p < 0.05):

Method  P10     P20     P30
TQ      0.1000  0.0500  0.0333
TSQ     0.5200  0.3800  0.2800

TQ: search with a Temporal Query, i.e., a keyword $w_q$ and time $t_q$

TSQ: search with a Temporal Query expanded with Synonyms with respect to time $t_q$


QUEST: Query Expansion using Synonyms over Time


Conclusions and future work

  • Extract time-based synonyms from Wikipedia
  • Improve the time of synonyms using NYT
  • Perform query expansion using the time-based synonyms
  • Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work

  • Combine time-dependent synonyms and temporal language models to determine the time of queries
  • Exploit temporal information extraction techniques to discover synonyms at particular time points
  • Improve temporal text mining/clustering using time-based synonyms


QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you


  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
    • Synonym Detection
      • Entity Recognition and Synonym Extraction
      • Improving the Accuracy of Time
    • Query Expansion
      • Time-based Synonyms
      • Ranking Time-independent Synonyms
      • Ranking Time-dependent Synonyms
    • Evaluation
      • Experiment Setting
      • Experimental Results
    • Conclusions
      • Conclusions and Future Work


Application

  • News archive search
  • Search terms are named entities
  • Publication dates of documents are temporal criteria

Scenario 2

  • Query: "Hillary R. Clinton", with documents written from 1997 to 2002
  • Documents about "New York Senator" and "First Lady of the United States" are relevant


Challenge

Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time


Contributions

1. Formal models
  • Wikipedia viewed as a temporal resource

2. Proposed approaches
  • Discover time-based synonyms over time
  • Improve the accuracy of time of synonyms
  • Expand a query using time-based synonyms

3. Experiments
  • Evaluate extracting and improving time of synonyms
  • Evaluate query expansion using time-based synonyms


Recognizing named entities

Step 1: Partition Wikipedia with respect to the time granularity $g$ = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Figure: A snapshot of Wikipedia and current revisions at time $t_k$
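Step 1 can be pictured as choosing, for every page, the revision that was current at each monthly time point. The sketch below is a minimal in-memory illustration; the names `month_starts` and `snapshot` and the `{title: [(date, text), ...]}` layout are assumptions for this example, whereas the actual pipeline processes full-history dumps.

```python
from bisect import bisect_right
from datetime import date


def month_starts(first, last):
    """Yield the first day of every month from first to last (inclusive)."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def snapshot(revisions_by_page, t_k):
    """Return {title: text} holding the revision of each page current at t_k.

    revisions_by_page maps a title to a list of (date, text) sorted by date.
    Pages created after t_k are absent from the snapshot.
    """
    current = {}
    for title, revisions in revisions_by_page.items():
        stamps = [stamp for stamp, _ in revisions]
        idx = bisect_right(stamps, t_k) - 1   # last revision made on or before t_k
        if idx >= 0:
            current[title] = revisions[idx][1]
    return current


# Monthly granularity from 03/2001 to 03/2008 gives 85 time points.
print(len(list(month_starts(date(2001, 3, 1), date(2008, 3, 1)))))
```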


Example [Bunescu and Pasca, EACL'2006]

1) Multi-word titles in which all content words are capitalized
  • President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters
  • UNICEF and WHO are named entities

3) At least 75% of the title's occurrences in the article text itself are capitalized
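The three capitalization rules can be sketched as follows. This is a simplified illustration that tries the rules in order; the `FUNCTION_WORDS` list, the function name, and the treatment of sentence-initial occurrences are assumptions of this example, not details taken from the slides.

```python
import re

# Small illustrative list of words allowed to stay lowercase in a title (assumption).
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "on", "at", "to", "a", "an"}


def is_named_entity(title, article_text=""):
    """Apply the three capitalization heuristics to a Wikipedia page title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        if content and all(w[0].isupper() for w in content):
            return True
    elif sum(ch.isupper() for ch in words[0]) >= 2:
        # Rule 2: single-word title with at least two capital letters.
        return True
    # Rule 3: at least 75% of the title's occurrences in the text are capitalized.
    pattern = r"\b" + re.escape(" ".join(words)) + r"\b"
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occurrence in occurrences if occurrence[0].isupper())
    return capitalized / len(occurrences) >= 0.75


print(is_named_entity("President_of_the_United_States"))  # True (rule 1)
print(is_named_entity("UNICEF"))                           # True (rule 2)
```

Rule 3 as written counts a sentence-initial occurrence as capitalized, which inflates the ratio for common nouns; a more careful version would skip those positions.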


Example

  • $e_i$: President_of_the_United_States

  • $t_k$: 11/2001

  • $s_j$: "George W. Bush"

  • $\xi_{i,j}$: $(e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")


Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States
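Anchor-text extraction from piped wiki links can be sketched with a regular expression. The pattern below handles only the simple `[[Target|anchor]]` form and ignores templates, section links, and nested markup; the function name and the underscore normalization of titles are assumptions made for this illustration.

```python
import re
from collections import defaultdict

# [[Target|anchor text]]; links without a pipe are skipped because their
# anchor text is the title itself.
PIPED_LINK = re.compile(r"\[\[([^\[\]|#]+)\|([^\[\]|]+)\]\]")


def extract_synonyms(wikitext, entities):
    """Collect anchor texts that link to pages recognized as named entities."""
    synonyms = defaultdict(set)
    for target, anchor in PIPED_LINK.findall(wikitext):
        title = target.strip().replace(" ", "_")
        if title in entities:
            synonyms[title].add(anchor.strip())
    return dict(synonyms)


text = ("[[President_of_the_United_States|Barack Obama]] met "
        "[[Pope Benedict XVI|Cardinal Ratzinger]] in [[Norway]].")
print(extract_synonyms(text, {"President_of_the_United_States", "Pope_Benedict_XVI"}))
```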


Output: Entity-synonym relationships and time periods

Named Entity            Synonym                    Time Period
Pope Benedict XVI       Cardinal Joseph Ratzinger  05/2005 - 03/2009
                        Joseph Ratzinger           05/2005 - 03/2009
                        Pope Benedict XVI          05/2005 - 03/2009
Barack Obama            Barack Hussein Obama II    02/2007 - 03/2009
                        Sen. Barack Obama          07/2007 - 03/2009
                        Senator Barack Obama       05/2006 - 03/2009
Hillary Rodham Clinton  Hillary Clinton            08/2003 - 03/2009
                        Sen. Hillary Clinton       03/2007 - 03/2009
                        Senator Clinton            11/2007 - 03/2009

The times of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents
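One simple way to turn synonym snapshots into time periods like those in the table is to record the first and last snapshot in which each entity-synonym relationship appears. The sketch below assumes sortable snapshot labels such as "2005-05" and ignores gaps between occurrences; both choices are assumptions of this example rather than documented details of the method.

```python
def synonym_time_periods(synonym_snapshots):
    """Map each (entity, synonym) pair to its (first, last) snapshot label.

    synonym_snapshots maps a sortable time label to the set of
    (entity, synonym) relationships observed in that snapshot.
    """
    periods = {}
    for t_k in sorted(synonym_snapshots):
        for relation in synonym_snapshots[t_k]:
            first, _ = periods.get(relation, (t_k, t_k))
            periods[relation] = (first, t_k)
    return periods


pope = ("Pope Benedict XVI", "Cardinal Joseph Ratzinger")
obama = ("Barack Obama", "Barack Hussein Obama II")
snapshots = {"2005-05": {pope}, "2007-02": {pope, obama}, "2009-03": {pope, obama}}
print(synonym_time_periods(snapshots))
```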


Improving the accuracy of time using burst detection

  • Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time
    • 1.8M articles from January 1987 to June 2007 (20 years)

  • Use the burst detection algorithm [Kleinberg, KDD'2002]
    • Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
    • Output bursty intervals and bursty weight, i.e., periods of occurrence and intensity
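To give a feel for how bursty intervals and weights arise, here is a minimal sketch of a two-state, batched burst model in the spirit of Kleinberg's algorithm, solved with a Viterbi pass. It is not the implementation used in the experiments: the function name, the binomial fit cost, and the per-batch input format (documents mentioning the synonym versus all documents per time step) are assumptions of this example, with `s=2` and `gamma=1` chosen to mirror the "ratio 2" and "gamma 1" settings listed in the experiment setup.

```python
import math


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return (start, end, weight) for each interval spent in the bursty state.

    relevant[t] / totals[t] is the share of documents at step t that mention
    the synonym; the bursty state expects a rate s times the overall rate.
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)
    if n == 0 or p0 <= 0 or p0 >= 1:
        return []
    p = [p0, min(s * p0, 0.9999)]
    up = gamma * math.log(n)          # cost of entering the bursty state

    def fit(state, r, d):
        # Negative log-likelihood of r hits in d documents (binomial term dropped).
        return -(r * math.log(p[state]) + (d - r) * math.log(1 - p[state]))

    best, back = [0.0, math.inf], []  # the automaton starts in the base state
    for r, d in zip(relevant, totals):
        step, choice = [], []
        for q in (0, 1):
            cands = [best[prev] + (up if q > prev else 0.0) for prev in (0, 1)]
            prev = 0 if cands[0] <= cands[1] else 1
            choice.append(prev)
            step.append(cands[prev] + fit(q, r, d))
        best = step
        back.append(choice)

    # Backtrack the cheapest state sequence.
    q, states = (0 if best[0] <= best[1] else 1), []
    for choice in reversed(back):
        states.append(q)
        q = choice[q]
    states.reverse()

    bursts, start = [], None
    for t, q in enumerate(states + [0]):
        if q == 1 and start is None:
            start = t
        elif q == 0 and start is not None:
            weight = sum(fit(0, relevant[i], totals[i]) - fit(1, relevant[i], totals[i])
                         for i in range(start, t))
            bursts.append((start, t - 1, weight))
            start = None
    return bursts


# Made-up monthly counts: a clear burst in steps 3 to 5.
print(detect_bursts([1, 1, 1, 10, 12, 11, 1, 1], [100] * 8))
```

The weight of an interval is the total improvement in fit gained by explaining those steps with the bursty state instead of the base state, so larger weights indicate more intense bursts.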


Output: Results from the burst-detection algorithm

Synonym           Entity                  Burst Weight  Start    End
President Reagan  Ronald Reagan           5506858       01/1987  02/1989
President Ronald  Ronald Reagan           100401        01/1989  03/1990
President Ronald  Ronald Reagan           67208         07/1990  02/1993
Senator Clinton   Hillary Rodham Clinton  18214         01/2001  10/2001
Senator Clinton   Hillary Rodham Clinton  17732         05/2002  01/2003
Senator Clinton   Hillary Rodham Clinton  172356        06/2003  11/2004

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Classifying synonyms into two types

DefinitionClass A time-independent

Robust to change over time and good synonym candidates for an ordinarysearch (no temporal criteria provided)

Eg ldquoBarack Hussein Obama IIrdquo is a time-independent synonym of ldquoBarackObamardquo

Class B time-dependent

Related to particular time in the past and good synonym candidates for atemporal search where changes in semantics must be considered

Eg ldquoCardinal Joseph Ratzingerrdquo is a time-dependent synonym of ldquoPopeBenedict XVIrdquo before 2005

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-independent synonyms

DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature

TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )

pf (sj ) is the time partition frequency in which sj occurs

tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum

i tf (sj pi )

pf (sj )

micro underlines the importance of a temporal feature and a frequency feature

micro = 05 yields the best performance in the experiments

IntuitionThe model measures popularity of synonyms based on two factors

Robustness to change over time ie the more partitions synonyms occur themore robust to time they are

High usages over time ie a high value of averaged frequencies over time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-independent synonyms

DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature

TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )

pf (sj ) is the time partition frequency in which sj occurs

tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum

i tf (sj pi )

pf (sj )

micro underlines the importance of a temporal feature and a frequency feature

micro = 05 yields the best performance in the experiments

IntuitionThe model measures popularity of synonyms based on two factors

Robustness to change over time ie the more partitions synonyms occur themore robust to time they are

High usages over time ie a high value of averaged frequencies over time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-dependent synonyms

DefinitionGiven time tk time-dependent synonyms at tk are weighted by

TDP(sj tk ) = tf (sj tk )

tf (sj tk ) is a term frequency of sj at tk

IntuitionOnly term frequencies will be used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting


Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting synonyms and improving the accuracy of their time periods
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms


Extracting and improving time of synonyms

Data collection:
• The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008
  • 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
  • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
• New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007

Tools:
• MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
• Oracle Berkeley DB version 4.7.25
• Burst detection algorithm implemented by Kleinberg
  • Number of states: 2
  • Ratio of rate of second state to base state: 2
  • Ratio of rate of each subsequent state to previous state: 2
  • Gamma parameter of the HMM: 1

Measurement: Accuracy


Query expansion using time-independent synonyms

Data collection:
• TREC Robust Track (2004)
• 250 topics (topics 301-450 and topics 601-700)

Tools:
• Terrier, an open source search engine developed by the University of Glasgow
• BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
• Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights:

$$q_{exp} = q_{org} \;\; s_1 \wedge w_1 \;\; s_2 \wedge w_2 \;\; \ldots \;\; s_k \wedge w_k$$

Measurement: Mean Average Precision (MAP), R-precision, and Recall
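As a rough sketch of the expansion step, the snippet below appends the top-k synonyms to the original query, each followed by a caret and its weight. The function name, the two-decimal weight format, and the quoting of multi-word synonyms are assumptions of this sketch; the slides do not show the exact query syntax passed to the search engine.

```python
def expand_query(q_org, weighted_synonyms, k=2):
    """Build q_exp = q_org s_1^w_1 ... s_k^w_k from a synonym -> weight map."""
    # Keep the k synonyms with the largest weights (ties broken alphabetically).
    top = sorted(weighted_synonyms.items(), key=lambda item: (-item[1], item[0]))[:k]
    parts = [q_org]
    for synonym, weight in top:
        # Assumption: multi-word synonyms are quoted so they stay together.
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)


# Hypothetical weights, for illustration only.
weights = {"Joseph Ratzinger": 0.8, "Cardinal Ratzinger": 0.6, "Ratzinger": 0.3}
q_exp = expand_query("pope benedict", weights, k=2)
```

Here `q_exp` keeps the original query terms untouched and adds only the two highest-weighted synonyms, so lower-ranked synonyms cannot dilute the query.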


Query expansion using time-dependent synonyms

Data collection:
• NewsLibrary.com, which contains more than 182 million newspaper articles from thousands of credible US publications
• 20 strongly time-dependent queries selected

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries (a temporal query consists of a named entity and a time period):

Named Entity                    Time Period   Synonym
American Broadcasting Company   1995-2000     Disney/ABC
Barack Obama                    2005-2007     Senator Obama
Eminem                          1999-2004     Slim Shady
George H. W. Bush               1988-1992     President George H.W. Bush
George W. Bush                  2000-2007     President George W. Bush
Hillary Rodham Clinton          2001-2007     Senator Clinton
Kmart                           1987-1987     Kresge
Pope Benedict XVI               1988-2005     Cardinal Ratzinger
Ronald Reagan                   1987-1989     Reagan Revolution
Virgin Media                    1999-2002     Telewest Communications
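Precision at a cutoff k, the measure used for the temporal queries, can be sketched as follows. The document identifiers and the relevance set are made up for illustration.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant_ids)
    # Dividing by k (not by the number retrieved) penalizes short result lists.
    return hits / k


# Hypothetical ranking and judgments.
ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
p_at_2 = precision_at_k(ranking, relevant, 2)
p_at_4 = precision_at_k(ranking, relevant, 4)
```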


Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia:

NER Method   NE          NE-Syn      Avg. Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2               51
BPCF-NERW    473,829     488,383     1.0               73

• BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
• BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization", or "company"

Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods.


Robust2004 query statistics

Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ), with k = 2 because k > 2 brings noise into the NERQ process

Number of queries under the two NERQ methods:

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250


Query expansion using time-independent synonyms

MAP, R-precision, and Recall (* indicates statistically significant at p < 0.05):

            MW-NERQ                          MRW-NERQ
Method      MAP      R-precision   Recall    MAP      R-precision   Recall
PM          0.2889   0.3309        0.6185    0.2455   0.2904        0.5629
PRF         0.3469   0.3711        0.6944    0.3002   0.3227        0.6761
SQE-PRF     0.3608   0.3652        0.7405    0.2507   0.2665        0.5932
SWQE-PRF    0.3653   0.3861        0.7388    0.2885   0.3080        0.6504

• PM: Probabilistic Model without query expansion
• PRF: Pseudo Relevance Feedback using the Rocchio algorithm
• SQE-PRF: top-k Synonyms Query Expansion with Pseudo Relevance Feedback
• SWQE-PRF: top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model (Bose-Einstein 1).
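For readers unfamiliar with the three measures in the table, the following sketch computes them for a single ranked list and averages AP over topics to obtain MAP. These are the standard textbook definitions; the slides do not say which evaluation tool produced the reported figures, and the toy ranking is invented.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision at cutoff R, where R is the number of relevant documents."""
    r = len(relevant)
    return sum(1 for d in ranked[:r] if d in relevant) / r if r else 0.0


def recall(ranked, relevant):
    """Fraction of relevant documents that were retrieved at all."""
    return len(set(ranked) & set(relevant)) / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """`runs` is a list of (ranked, relevant) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy topic: relevant document "e" is never retrieved.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
ap = average_precision(ranked, relevant)
```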


Query expansion using time-dependent synonyms

P@10, P@20, and P@30 (* indicates statistically significant at p < 0.05):

Method   P@10     P@20     P@30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

• TQ: search with a Temporal Query, i.e., a keyword $w_q$ and a time $t_q$
• TSQ: search with a Temporal Query expanded with Synonyms with respect to time $t_q$


QUEST: Query Expansion using Synonyms over Time


Conclusions and future work

Conclusions:
• Extract time-based synonyms from Wikipedia
• Improve the time of synonyms using the NYT corpus
• Perform query expansion using the time-based synonyms
• Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work:
• Combine time-dependent synonyms and temporal language models to determine the time of queries
• Exploit temporal information extraction techniques to discover synonyms at particular time points
• Improve temporal text mining/clustering using time-based synonyms


QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you


  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
    • Synonym Detection
      • Entity Recognition and Synonym Extraction
      • Improving the Accuracy of Time
    • Query Expansion
      • Time-based Synonyms
      • Ranking Time-independent Synonyms
      • Ranking Time-dependent Synonyms
    • Evaluation
      • Experiment Setting
      • Experimental Results
    • Conclusions
      • Conclusions and Future Work


Application:
• News archive search
• Search terms are named entities
• Publication dates of documents are temporal criteria

Challenge:
• Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time


Contributions

1. Formal models
   • Wikipedia viewed as a temporal resource
2. Proposed approaches
   • Discover time-based synonyms over time
   • Improve the accuracy of time of synonyms
   • Expand a query using time-based synonyms
3. Experiments
   • Evaluate extracting and improving time of synonyms
   • Evaluate query expansion using time-based synonyms


Recognizing named entities

Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Figure: A snapshot of Wikipedia and current revisions at time $t_k$.


Example [Bunescu and Pasca, EACL'2006]:
1) Multi-word titles in which all words are capitalized, e.g., President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters, e.g., UNICEF and WHO are named entities
3) 75% of occurrences of the title in the article text itself are capitalized
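The three title heuristics can be sketched as a single predicate. Several details are assumptions of this sketch rather than statements from the slides: the small `FUNCTION_WORDS` list (needed because the slide's own example contains lowercase "of" and "the"), treating 75% as a lower bound, and counting every occurrence in the text, including sentence-initial ones.

```python
import re

# Assumption: a small hand-picked list of function words that are ignored
# when checking capitalization of multi-word titles.
FUNCTION_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def is_named_entity(title, article_text=""):
    """Apply the three capitalization heuristics to a Wikipedia page title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose (content) words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with more than one capital letter.
    if sum(ch.isupper() for ch in word) > 1:
        return True
    # Rule 3: at least 75% of the occurrences in the article are capitalized.
    pattern = r"\b" + re.escape(word) + r"\b"
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= 0.75
```

For example, `is_named_entity("UNICEF")` succeeds through the second rule without looking at any article text, while a title such as `Water` is decided by how it is written inside its own article.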


Example:
• $e_i$: President_of_the_United_States
• $t_k$: 11/2001
• $s_j$: "George W. Bush"
• $\xi_{i,j}$: $(e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")


Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Example: In [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States.
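A minimal sketch of the anchor-text step is given below. It handles only piped links of the form `[[Target|anchor]]` with a regular expression; the function name and the sample wikitext are assumptions, and a production extractor would need a real wikitext parser to cope with templates, nested markup, and redirects.

```python
import re
from collections import defaultdict

# Matches piped wiki links [[Target|anchor text]]; unpiped links are ignored.
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def extract_synonyms(wikitext, entities):
    """Collect anchor texts that link to pages in the named-entity set."""
    synonyms = defaultdict(set)
    for target, anchor in PIPED_LINK.findall(wikitext):
        # Normalize spaces to underscores so titles compare consistently.
        target = target.strip().replace(" ", "_")
        if target in entities:
            synonyms[target].add(anchor.strip())
    return dict(synonyms)


sample = (
    "Then [[President_of_the_United_States|Barack Obama]] met "
    "[[UNICEF]] staff in [[Chicago|the city]]."
)
found = extract_synonyms(sample, {"President_of_the_United_States", "UNICEF"})
```

Running the same function over every monthly snapshot yields one synonym snapshot per time point.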


Output: entity-synonym relationships and time periods

Named Entity             Synonym                     Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger   05/2005 - 03/2009
Pope Benedict XVI        Joseph Ratzinger            05/2005 - 03/2009
Pope Benedict XVI        Pope Benedict XVI           05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II     02/2007 - 03/2009
Barack Obama             Sen. Barack Obama           07/2007 - 03/2009
Barack Obama             Senator Barack Obama        05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton             08/2003 - 03/2009
Hillary Rodham Clinton   Sen. Hillary Clinton        03/2007 - 03/2009
Hillary Rodham Clinton   Senator Clinton             11/2007 - 03/2009

The time periods of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents.
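One plausible way to derive such time periods from the monthly synonym snapshots is sketched below. It assumes that the period of a relationship runs from the first to the last snapshot in which it appears and that gaps in between are ignored; the slides do not spell out this rule, and the `YYYY-MM` keys and data are hypothetical.

```python
def synonym_time_periods(snapshots):
    """Map each (entity, synonym) pair to its (first, last) snapshot month.

    `snapshots` maps a 'YYYY-MM' string to the set of (entity, synonym)
    pairs observed in that month's synonym snapshot.
    """
    periods = {}
    # 'YYYY-MM' strings sort chronologically when sorted lexicographically.
    for month in sorted(snapshots):
        for pair in snapshots[month]:
            first, _ = periods.get(pair, (month, month))
            periods[pair] = (first, month)
    return periods


entity = "Pope Benedict XVI"
snapshots = {
    "2005-06": {(entity, "Joseph Ratzinger"), (entity, "Pope Benedict XVI")},
    "2005-04": set(),
    "2005-05": {(entity, "Joseph Ratzinger")},
}
periods = synonym_time_periods(snapshots)
```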


Improving the accuracy of time using burst detection

• Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods: 1.8M articles from January 1987 to June 2007 (20 years)
• Use the burst detection algorithm [Kleinberg, KDD'2002]
• Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
• Output bursty intervals and bursty weights, i.e., periods of occurrence and intensity
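To make the idea of bursty intervals and weights concrete, here is a simplified two-state sketch in the spirit of Kleinberg's batched formulation. It is not the implementation used in the experiments: the binomial cost, the upward transition cost of gamma times ln(n), and the burst weight as the summed cost difference between the two states are this sketch's reading of the algorithm, and the counts are invented. The defaults `s=2.0` and `gamma=1.0` mirror the parameter values listed in the experiment setting.

```python
import math


def _cost(p, r, d):
    # Negative log-likelihood of r relevant documents out of d at rate p.
    # The binomial coefficient is omitted: it is identical in both states.
    return -(r * math.log(p) + (d - r) * math.log(1.0 - p))


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return (start_index, end_index, weight) for each bursty interval."""
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)   # base rate
    if n == 0 or p0 <= 0.0 or p0 >= 1.0:
        return []
    p1 = min(s * p0, 0.9999)           # elevated rate of the burst state
    probs = (p0, p1)
    up = gamma * math.log(n)           # cost of entering the burst state
    cost = [0.0, math.inf]             # the automaton starts in state 0
    back = []
    for r, d in zip(relevant, totals):
        new_cost, choice = [], []
        for j in (0, 1):
            cands = [cost[i] + (up if (i == 0 and j == 1) else 0.0) for i in (0, 1)]
            best = 0 if cands[0] <= cands[1] else 1
            new_cost.append(cands[best] + _cost(probs[j], r, d))
            choice.append(best)
        cost = new_cost
        back.append(choice)
    # Backtrack the cheapest state sequence (Viterbi).
    state = 0 if cost[0] <= cost[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]
    # Group consecutive burst-state batches into weighted intervals.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start, weight = t, 0.0
            while t < n and states[t] == 1:
                weight += _cost(p0, relevant[t], totals[t]) - _cost(p1, relevant[t], totals[t])
                t += 1
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts


# Invented monthly counts: mentions of a synonym out of 100 articles per month.
monthly_mentions = [2, 2, 2, 2, 20, 22, 21, 2, 2, 2]
monthly_totals = [100] * 10
bursts = detect_bursts(monthly_mentions, monthly_totals)
```

With these counts the sketch reports a single burst covering the three high-volume months; a flat series produces no burst at all.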


Output: results from the burst-detection algorithm

Synonym            Entity                   Burst Weight   Start     End
President Reagan   Ronald Reagan            5506858        01/1987   02/1989
President Ronald   Ronald Reagan            100401         01/1989   03/1990
President Ronald   Ronald Reagan            67208          07/1990   02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001   10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002   01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003   11/2004


Classifying synonyms into two types

Definition:

Class A: time-independent
• Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)
• E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"

Class B: time-dependent
• Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered
• E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005


Ranking time-independent synonyms

Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$

• $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs
• $tf(s_j)$ is the averaged term frequency of $s_j$ over all time partitions: $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
• $\mu$ balances the importance of the temporal feature against the frequency feature; $\mu = 0.5$ yields the best performance in the experiments

Intuition: The model measures the popularity of synonyms based on two factors:
• Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
• High usage over time, i.e., a high value of averaged frequencies over time
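The mixture can be computed directly from per-partition term frequencies, as in the sketch below. The nested-dictionary input and the toy counts are assumptions; the sketch also applies the formula literally, without rescaling the partition count and the averaged frequency to a common range, which a real system may well need.

```python
def tidp_scores(tf_by_partition, mu=0.5):
    """Compute TIDP(s_j) = mu * pf(s_j) + (1 - mu) * averaged tf(s_j).

    `tf_by_partition` maps each synonym to a dict of
    {time_partition: term_frequency} (a hypothetical input format).
    """
    scores = {}
    for synonym, tfs in tf_by_partition.items():
        occupied = [freq for freq in tfs.values() if freq > 0]
        pf = len(occupied)                         # partitions containing s_j
        avg_tf = sum(occupied) / pf if pf else 0.0  # averaged term frequency
        scores[synonym] = mu * pf + (1.0 - mu) * avg_tf
    return scores


# Invented counts for illustration.
counts = {
    "Barack Hussein Obama II": {"2007-02": 3, "2007-03": 5, "2007-04": 4},
    "Sen. Barack Obama": {"2007-07": 2},
}
scores = tidp_scores(counts, mu=0.5)
```

In this toy input the first synonym scores higher because it appears in more partitions and with a higher averaged frequency, matching the two factors named in the intuition.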

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-independent synonyms

DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature

TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )

pf (sj ) is the time partition frequency in which sj occurs

tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum

i tf (sj pi )

pf (sj )

micro underlines the importance of a temporal feature and a frequency feature

micro = 05 yields the best performance in the experiments

IntuitionThe model measures popularity of synonyms based on two factors

Robustness to change over time ie the more partitions synonyms occur themore robust to time they are

High usages over time ie a high value of averaged frequencies over time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-dependent synonyms

DefinitionGiven time tk time-dependent synonyms at tk are weighted by

TDP(sj tk ) = tf (sj tk )

tf (sj tk ) is a term frequency of sj at tk

IntuitionOnly term frequencies will be used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-dependent synonyms

DefinitionGiven time tk time-dependent synonyms at tk are weighted by

TDP(sj tk ) = tf (sj tk )

tf (sj tk ) is a term frequency of sj at tk

IntuitionOnly term frequencies will be used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Overview of experiments

Our experimental evaluation is divided into three main parts

1 Extracting and improving the accuracy of time of synonyms

2 Query expansion using time-independent synonyms

3 Query expansion using time-dependent synonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonymsData collection

The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)

New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007

ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725

Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1

Measurement Accuracy

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier ndash an open source search engine developed by University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting

Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight

qexp = qorg s1andw1 s2

andw2 skandwk

Measurement Mean Average Precision (MAP) R-precision and Recall

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-dependent synonymsData collection

NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications

Select 20 strongly time-dependent queries

Measurement Precision at 10 20 and 30 retrieved documents

Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period

American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method    NE          NE-Syn      Avg. Syn per NE   Accuracy (%)
BPF-NERW      2,574,319   3,199,115   1.2               51
BPCF-NERW     473,829     488,383     1.0               73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of "people", "organization", or "company"

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods
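
The two filtering criteria can be illustrated with a small sketch. It assumes, since the slide does not say, that a relationship is discarded when either criterion holds, and that average frequency means total anchor-text occurrences divided by the number of months observed. The `Relationship` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    entity: str
    synonym: str
    months_observed: int   # length of the time interval, in months (hypothetical field)
    total_frequency: int   # occurrences summed over the interval (hypothetical field)


def keep(rel, min_months=6, min_avg_freq=2.0):
    """Return True if the relationship survives both filters.

    Assumption: a relationship is dropped when its time interval is shorter
    than min_months OR its average monthly frequency is below min_avg_freq.
    """
    if rel.months_observed <= 0:
        return False
    avg_freq = rel.total_frequency / rel.months_observed
    return rel.months_observed >= min_months and avg_freq >= min_avg_freq


candidates = [
    Relationship("Barack Obama", "Senator Barack Obama", 34, 120),
    Relationship("Barack Obama", "Short-lived alias", 3, 30),    # too short
    Relationship("Barack Obama", "Rare alias", 12, 12),          # too infrequent
]
filtered = [r for r in candidates if keep(r)]
```

The counts in `candidates` are made up for illustration only.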

Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)

k = 2; values of k > 2 bring noise to the NERQ process

Number of queries using the two different NERQ methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250

Query expansion using time-independent synonyms

MAP, R-precision, and Recall (statistical significance assessed at p < 0.05)

Method      MW-NERQ                           MRW-NERQ
            MAP      R-precision   Recall     MAP      R-precision   Recall
PM          0.2889   0.3309        0.6185     0.2455   0.2904        0.5629
PRF         0.3469   0.3711        0.6944     0.3002   0.3227        0.6761
SQE-PRF     0.3608   0.3652        0.7405     0.2507   0.2665        0.5932
SWQE-PRF    0.3653   0.3861        0.7388     0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model (i.e., Bose-Einstein 1)
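
For readers unfamiliar with the three measures in the table, the following is a minimal sketch of their textbook definitions. It is not the evaluation code used in the experiments; the function names and the toy ranking are illustrative.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def recall(ranked, relevant):
    """Fraction of the relevant documents that were retrieved at all."""
    if not relevant:
        return 0.0
    return len(set(ranked) & set(relevant)) / len(relevant)


def mean_average_precision(runs):
    """MAP over a list of (ranked, relevant) pairs, one pair per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d9"}
scores = (average_precision(ranked, relevant),
          r_precision(ranked, relevant),
          recall(ranked, relevant))
```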

Query expansion using time-dependent synonyms

P@10, P@20, and P@30 (statistical significance assessed at p < 0.05)

Method   P@10     P@20     P@30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search a Temporal Query, i.e., a keyword $w_q$ and time $t_q$

TSQ: search a Temporal Query and expand it with Synonyms with respect to time $t_q$
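
Precision at k, the measure reported above, can be sketched as follows. This is the standard definition (relevant documents among the top k, divided by k), not the authors' evaluation script; the document identifiers are made up.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


ranked = ["d%d" % i for i in range(10)]   # d0 ... d9, in retrieval order
relevant = {"d0", "d3"}
p_at_5 = precision_at_k(ranked, relevant, 5)
p_at_10 = precision_at_k(ranked, relevant, 10)
```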

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

Conclusions
  • Extract time-based synonyms from Wikipedia
  • Improve the time of synonyms using NYT
  • Perform query expansion using the time-based synonyms
  • Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work
  • Combine time-dependent synonyms and temporal language models to determine the time of queries
  • Exploit temporal information extraction techniques to discover synonyms at particular time points
  • Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time

http://research.idi.ntnu.no/wislab/quest

Thank you

  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
    • Synonym Detection
      • Entity Recognition and Synonym Extraction
      • Improving the Accuracy of Time
    • Query Expansion
      • Time-based Synonyms
      • Ranking Time-independent Synonyms
      • Ranking Time-dependent Synonyms
    • Evaluation
      • Experiment Setting
      • Experimental Results
    • Conclusions
      • Conclusions and Future Work

Contributions

1. Formal models
  • Wikipedia viewed as a temporal resource

2. Proposed approaches
  • Discover time-based synonyms over time
  • Improve the accuracy of time of synonyms
  • Expand a query using time-based synonyms

3. Experiments
  • Evaluate extracting and improving time of synonyms
  • Evaluate query expansion using time-based synonyms

Recognizing named entities

Step 1: Partition Wikipedia according to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Figure: A snapshot of Wikipedia and current revisions at time $t_k$

Example [Bunescu and Pasca, EACL'2006]

1) Multi-word titles in which all words are capitalized
  • President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters
  • UNICEF and WHO are named entities

3) At least 75% of the title's occurrences in the article text itself are capitalized
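
The three title heuristics can be sketched roughly as below. This is an illustration, not the authors' implementation. Two assumptions are made: function words such as "of" and "the" are ignored in rule 1 (otherwise President_of_the_United_States would fail), with `FUNCTION_WORDS` being a small hypothetical list, and rule 3 is applied to single-word titles only.

```python
import re

# Hypothetical, deliberately small list of words ignored by rule 1.
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "a", "an", "on", "at", "to"}


def is_named_entity(title, article_text=""):
    """Apply the three capitalization heuristics to a Wikipedia page title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title, every content word capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with at least two capital letters.
    if sum(ch.isupper() for ch in word) >= 2:
        return True
    # Rule 3: at least 75% of the occurrences in the text are capitalized.
    pattern = r"\b%s\b" % re.escape(word)
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= 0.75


examples = {
    "President_of_the_United_States": is_named_entity("President_of_the_United_States"),
    "UNICEF": is_named_entity("UNICEF"),
    "History_of_science": is_named_entity("History_of_science"),
}
```

Note that rule 3 as sketched counts sentence-initial occurrences as capitalized, which a more careful implementation would probably exclude.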

Example

$e_i$: President_of_the_United_States

$t_k$: 11/2001

$s_j$: "George W. Bush"

$\xi_{i,j}$: $(e_i, s_j)$, or (President_of_the_United_States, "George W. Bush")

Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States
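
A minimal sketch of the anchor-text extraction in Step 1, assuming the snapshot is available as raw wikitext and that piped links of the form `[[Target|anchor]]` are the only source of synonyms. The regular expression and the function name are illustrative, not taken from the authors' tooling.

```python
import re
from collections import Counter, defaultdict

# Piped wiki link: [[Target|anchor text]]; an optional #section is ignored.
PIPED_LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]|]+)\]\]")


def extract_synonyms(wikitext, entities):
    """Count anchor texts that link to any of the given named-entity pages."""
    synonyms = defaultdict(Counter)   # entity title -> Counter of anchor texts
    for target, anchor in PIPED_LINK.findall(wikitext):
        title = target.strip().replace(" ", "_")
        if title in entities:
            synonyms[title][anchor.strip()] += 1
    return synonyms


text = ("In 2009 [[President_of_the_United_States|Barack Obama]] visited "
        "[[Norway]]; earlier, [[President of the United States|George W. Bush]] "
        "held the office.")
found = extract_synonyms(text, {"President_of_the_United_States"})
```

Running this over every article of a snapshot $W_{t_k}$ and merging the counters would give one synonym snapshot $S_{t_k}$, with the counts serving as term frequencies.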

Output: entity-synonym relationships and time periods

Named Entity             Synonym                     Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger   05/2005 - 03/2009
                         Joseph Ratzinger            05/2005 - 03/2009
                         Pope Benedict XVI           05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II     02/2007 - 03/2009
                         Sen. Barack Obama           07/2007 - 03/2009
                         Senator Barack Obama        05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton             08/2003 - 03/2009
                         Sen. Hillary Clinton        03/2007 - 03/2009
                         Senator Clinton             11/2007 - 03/2009

The times of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents
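
One way to derive such time periods from monthly synonym snapshots is sketched below. It assumes each snapshot is a set of (entity, synonym) pairs keyed by a sortable `YYYY-MM` string, and it reports a single interval from the first to the last snapshot containing the pair, ignoring any gaps in between. Both the data layout and the gap handling are assumptions.

```python
def time_periods(snapshots):
    """Map each (entity, synonym) pair to (first_month, last_month).

    snapshots: dict mapping 'YYYY-MM' -> set of (entity, synonym) pairs.
    """
    periods = {}
    for month in sorted(snapshots):
        for relationship in snapshots[month]:
            start, _ = periods.get(relationship, (month, month))
            periods[relationship] = (start, month)
    return periods


snapshots = {
    "2005-05": {("Pope Benedict XVI", "Cardinal Joseph Ratzinger")},
    "2005-06": {("Pope Benedict XVI", "Cardinal Joseph Ratzinger"),
                ("Barack Obama", "Senator Barack Obama")},
    "2005-07": {("Barack Obama", "Senator Barack Obama")},
}
periods = time_periods(snapshots)
```

The snapshot contents here are toy data and do not reproduce the periods in the table.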

Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate times

1.8M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg, KDD'2002]

Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams

Output: bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
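
The idea behind burst detection can be illustrated with a simplified two-state version of Kleinberg's batched model, decoded with Viterbi. This is a sketch under stated assumptions, not the implementation used in the experiments: each time partition is a batch with `totals[t]` documents of which `relevant[t]` mention the synonym, the bursty state has `s` times the base rate, and moving into the bursty state costs `gamma * ln(n)` (one common convention). The binomial coefficient is omitted because it is identical for both states.

```python
import math


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return the most likely state (0 = base, 1 = bursty) for each batch."""
    n = len(relevant)
    if n == 0 or sum(relevant) == 0:
        return [0] * n
    p0 = min(sum(relevant) / sum(totals), 0.9999)
    rates = [p0, min(s * p0, 0.9999)]
    up_cost = gamma * math.log(n)          # cost of entering the bursty state

    def fit_cost(state, r, d):
        p = rates[state]
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    cost = [0.0, float("inf")]             # the automaton starts in the base state
    back = []
    for r, d in zip(relevant, totals):
        new_cost, choice = [], []
        for j in (0, 1):
            candidates = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best] + fit_cost(j, r, d))
            choice.append(best)
        back.append(choice)
        cost = new_cost

    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for choice in reversed(back[1:]):      # walk the back-pointers to the first batch
        state = choice[state]
        states.append(state)
    states.reverse()
    return states


def burst_intervals(states):
    """Convert a state sequence into (start, end) index pairs of bursty runs."""
    intervals, start = [], None
    for t, state in enumerate(states):
        if state == 1 and start is None:
            start = t
        elif state == 0 and start is not None:
            intervals.append((start, t - 1))
            start = None
    if start is not None:
        intervals.append((start, len(states) - 1))
    return intervals


monthly_mentions = [1, 1, 1, 1, 20, 22, 25, 1, 1, 1]   # toy counts
monthly_articles = [100] * 10
states = detect_bursts(monthly_mentions, monthly_articles)
intervals = burst_intervals(states)
```

A burst weight could be obtained by summing, over each interval, the difference between the base-state and bursty-state fit costs.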

Output: results from the burst-detection algorithm

Synonym            Entity                   Burst Weight   Start     End
President Reagan   Ronald Reagan            5506858        01/1987   02/1989
President Ronald   Ronald Reagan            100401         01/1989   03/1990
President Ronald   Ronald Reagan            67208          07/1990   02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001   10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002   01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003   11/2004

Classifying synonyms into two types

Definition

Class A: time-independent
  • Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)
  • E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"

Class B: time-dependent
  • Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered
  • E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005
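
For a temporal search, the time-dependent synonyms that apply are those whose time period covers the query time. A minimal sketch, assuming synonyms are stored as (entity, synonym, start year, end year) tuples; the two records reuse example periods from the temporal-query table of this talk, while the storage layout and function name are hypothetical.

```python
def synonyms_at(records, entity, year):
    """Return the synonyms of an entity whose time period contains the given year."""
    return [synonym
            for ent, synonym, start, end in records
            if ent == entity and start <= year <= end]


records = [
    ("Pope Benedict XVI", "Cardinal Ratzinger", 1988, 2005),
    ("Kmart", "Kresge", 1987, 1987),
]
in_1995 = synonyms_at(records, "Pope Benedict XVI", 1995)
in_2008 = synonyms_at(records, "Pope Benedict XVI", 2008)
```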

Ranking time-independent synonyms

Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature

$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$

  • $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs
  • $\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
  • $\mu$ weights the importance of the temporal feature against the frequency feature
  • $\mu = 0.5$ yields the best performance in the experiments

Intuition: The model measures the popularity of synonyms based on two factors

  • Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
  • High usage over time, i.e., a high value of averaged frequencies over time
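
A small worked sketch of the TIDP score, assuming the input is the list of term frequencies of one synonym across all time partitions (0 where it does not occur) and that raw, unnormalized values are combined. Whether the features are normalized before mixing is not stated on the slide, so that part is an assumption.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * averaged tf(s_j)."""
    occurring = [tf for tf in tf_by_partition if tf > 0]
    pf = len(occurring)                    # number of partitions containing s_j
    if pf == 0:
        return 0.0
    avg_tf = sum(occurring) / pf           # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1.0 - mu) * avg_tf


# Toy frequencies: the synonym occurs in 3 of 4 partitions, 4 times on average.
score = tidp([3, 0, 5, 4])
```

With these toy numbers, pf = 3 and the averaged tf is 4, so the score is 0.5 · 3 + 0.5 · 4 = 3.5.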

Ranking time-dependent synonyms

Definition: Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$TDP(s_j, t_k) = tf(s_j, t_k)$

  • $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition: Only term frequencies are used to measure the importance of synonyms

  • Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
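
Since TDP is just the term frequency at the query time, ranking reduces to a sort. The sketch below assumes the frequencies at $t_k$ are already available as a dictionary; the counts are invented for illustration, and ties are broken alphabetically, which is an arbitrary choice.

```python
def rank_time_dependent(tf_at_time, k=2):
    """Return the top-k (synonym, TDP) pairs, where TDP(s_j, t_k) = tf(s_j, t_k)."""
    ranked = sorted(tf_at_time.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical term frequencies of synonyms of one entity at time t_k.
tf_at_tk = {"Senator Clinton": 40, "Hillary Clinton": 75, "Sen. Hillary Clinton": 12}
top = rank_time_dependent(tf_at_tk, k=2)
```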

Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection

The whole history of English Wikipedia
  • All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 terabytes
  • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)

The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools
  • MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
  • Oracle Berkeley DB version 4.7.25

Burst detection algorithm implemented by Kleinberg
  • Number of states: 2
  • Ratio of rate of second state to base state: 2
  • Ratio of rate of each subsequent state to previous state: 2
  • Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection
  • TREC Robust Track (2004)
  • 250 topics (topics 301-450 and topics 601-700)

Tools
  • Terrier, an open source search engine developed by the University of Glasgow
  • BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

Expand the query with the top-k synonyms $\{s_1, \ldots, s_k\}$, using their TIDP scores as boosting weights:

$q_{exp} = q_{org}\ s_1{\wedge}w_1\ s_2{\wedge}w_2\ \ldots\ s_k{\wedge}w_k$

Measurement: Mean Average Precision (MAP), R-precision, and Recall
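
The expansion step can be sketched as building a query string in which each of the top-k synonyms carries its TIDP score as a boost. The quoted-phrase-plus-caret syntax below is only an illustration of the boosting operator in the formula; the exact query syntax accepted by the search engine should be checked against its documentation. The function name and the scores are hypothetical.

```python
def expand_query(original, scored_synonyms, k=2):
    """Append the k highest-scoring synonyms to the query, each with a boost weight."""
    top = sorted(scored_synonyms.items(), key=lambda item: (-item[1], item[0]))[:k]
    boosted = ['"%s"^%.2f' % (synonym, weight) for synonym, weight in top]
    return " ".join([original] + boosted)


# Hypothetical TIDP scores for synonyms of the query entity.
tidp_scores = {"Senator Obama": 3.5, "Barack Hussein Obama II": 2.0, "Obama": 1.0}
expanded = expand_query("Barack Obama", tidp_scores, k=2)
```

Keeping the original query terms unweighted and boosting only the synonyms mirrors the form of $q_{exp}$ above, where $q_{org}$ appears without a weight.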

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-dependent synonymsData collection

NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications

Select 20 strongly time-dependent queries

Measurement Precision at 10 20 and 30 retrieved documents

Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period

American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method NE NE-Syn Avg Syn Accuracyper NE ()

BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73

BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2

BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo

Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Robust2004 query statistics

Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)

2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)

k = 2 if k gt 2 bring noise to the NERQ process

Number of queries using two different NERType MW-NERQ MRW-NERQ

Named entity 42 149Not named entity 208 101Total 250 250

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-independent synonyms

MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ

MAP R-precision Recall MAP R-precision Recall

PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504

PM Probabilistic Model without query expansion

PRF Pseudo Relevance Feedback using Rocchio algorithm

SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback

SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback

Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-dependent synonyms

P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800

TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

QUEST Query Expansion using Synonyms over Time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

Contributions

1 Formal models
    Wikipedia viewed as a temporal resource
2 Proposed approaches
    Discover time-based synonyms over time
    Improve the accuracy of time of synonyms
    Expand a query using time-based synonyms
3 Experiments
    Evaluate extracting and improving time of synonyms
    Evaluate query expansion using time-based synonyms

Recognizing named entities

Step 1: Partition Wikipedia with respect to the time granularity $g$ = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Figure: A snapshot of Wikipedia and current revisions at time $t_k$

Example [Bunescu and Pasca, EACL'2006]

1) Multi-word titles in which all words are capitalized
       President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters
       UNICEF and WHO are named entities
3) 75% of occurrences in the article text itself are capitalized
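The three title rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `is_named_entity`, the small `FUNCTION_WORDS` stop list (needed so that "of" and "the" in President_of_the_United_States do not fail rule 1) and the whole-word matching used for rule 3 are all assumptions.

```python
import re

# Assumption: a small stop list so that lower-case function words
# such as "of" and "the" do not block rule 1.
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "on", "at", "to", "a", "an"}

def is_named_entity(title, article_text="", threshold=0.75):
    """Apply the three capitalization rules to a Wikipedia page title."""
    words = [w for w in title.split("_") if w]
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with at least two capital letters.
    if sum(ch.isupper() for ch in word) >= 2:
        return True
    # Rule 3: share of capitalized occurrences in the article text.
    pattern = r"\b%s\b" % re.escape(word)
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= threshold

print(is_named_entity("President_of_the_United_States"))
print(is_named_entity("UNICEF"))
```

Rule 3 is the only rule that needs the article body, so the text argument is optional and single-word titles without any occurrence in the text are rejected.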

Example

$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_{i,j}$: $(e_i, s_j)$, or (President_of_the_United_States, "George W. Bush")

Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Example
[[President_of_the_United_States|Barack Obama]]: "Barack Obama" is the anchor text linking to the article President_of_the_United_States
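A rough sketch of the anchor-text step, assuming the snapshot is available as raw wikitext and that only piped links of the form `[[Target|anchor]]` are of interest. The regular expression and the helper name `extract_synonyms` are illustrative assumptions; real wikitext has more link variants (templates, namespace prefixes, first-letter normalization) than this handles.

```python
import re
from collections import defaultdict

# Piped internal links: [[Target page|anchor text]]; an optional #section is ignored.
LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]|]+)\]\]")

def extract_synonyms(wikitext, entities):
    """Map each known entity to the set of anchor texts that link to it."""
    synonyms = defaultdict(set)
    for target, anchor in LINK.findall(wikitext):
        entity = target.strip().replace(" ", "_")
        if entity in entities:
            synonyms[entity].add(anchor.strip())
    return dict(synonyms)

text = ("In 2009 [[President_of_the_United_States|Barack Obama]] met "
        "[[UNICEF]] and the [[World Health Organization|WHO]].")
known = {"President_of_the_United_States", "World_Health_Organization"}
print(extract_synonyms(text, known))
```

Running the same function over every monthly snapshot and recording the snapshot time with each pair would give the synonym snapshots described in Step 2.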

Output: entity-synonym relationships and time periods

Named Entity             Synonym                     Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger   05/2005 - 03/2009
                         Joseph Ratzinger            05/2005 - 03/2009
                         Pope Benedict XVI           05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II     02/2007 - 03/2009
                         Sen. Barack Obama           07/2007 - 03/2009
                         Senator Barack Obama        05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton             08/2003 - 03/2009
                         Sen. Hillary Clinton        03/2007 - 03/2009
                         Senator Clinton             11/2007 - 03/2009

The times of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents

Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time
    1.8M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg, KDD'2002]
    Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
    Output bursty intervals and bursty weight, i.e., periods of occurrence and intensity
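To make the idea of bursty intervals and bursty weights concrete, here is a simplified sketch of a two-state burst detector over batched document counts, in the spirit of Kleinberg's model. It is not the implementation used in the talk: the function name `detect_bursts`, the monthly batching and the toy counts are assumptions, while the defaults `s=2` and `gamma=1` follow the parameter values listed under the experiment setting.

```python
import math

def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return (start, end, weight) triples for batches labelled as bursty.

    relevant[t]: documents in batch t that mention the entity-synonym pair.
    totals[t]:   all documents in batch t (for example one batch per month).
    """
    n = len(relevant)
    if n == 0 or sum(relevant) == 0:
        return []
    eps = 1e-9
    p0 = min(max(sum(relevant) / sum(totals), eps), 1 - eps)  # base rate
    p1 = min(s * p0, 1 - eps)                                  # elevated rate
    rates = (p0, p1)

    def fit_cost(state, r, d):
        # Negative log-likelihood of r hits in d documents; the binomial
        # coefficient is dropped because it is the same for both states.
        p = rates[state]
        return -(r * math.log(p) + (d - r) * math.log(1 - p))

    enter_cost = gamma * math.log(n)  # paid only when moving from state 0 to 1

    # Viterbi search for the cheapest state sequence, starting in state 0.
    cost = [0.0, math.inf]
    back = []
    for r, d in zip(relevant, totals):
        step_cost, step_back = [], []
        for state in (0, 1):
            via0 = cost[0] + (enter_cost if state == 1 else 0.0)
            via1 = cost[1]
            step_back.append(0 if via0 <= via1 else 1)
            step_cost.append(min(via0, via1) + fit_cost(state, r, d))
        cost = step_cost
        back.append(step_back)

    state = 0 if cost[0] <= cost[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    # Group consecutive bursty batches; the weight is the cost saved
    # by explaining the interval with the elevated rate.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start, weight = t, 0.0
            while t < n and states[t] == 1:
                weight += (fit_cost(0, relevant[t], totals[t])
                           - fit_cost(1, relevant[t], totals[t]))
                t += 1
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts

# Toy stream: eight monthly batches of 100 articles each.
print(detect_bursts([1, 1, 1, 20, 22, 19, 1, 1], [100] * 8))
```

Each returned interval corresponds to a candidate time period for the synonym, and the weight gives a rough measure of how strongly the stream departs from its base rate in that interval.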

Output: results from the burst-detection algorithm

Synonym            Entity                   Burst Weight   Time Start   Time End
President Reagan   Ronald Reagan            5506858        01/1987      02/1989
President Ronald   Ronald Reagan            100401         01/1989      03/1990
President Ronald   Ronald Reagan            67208          07/1990      02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001      10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002      01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003      11/2004

Classifying synonyms into two types

Definition

Class A: time-independent
    Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)
    E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"

Class B: time-dependent
    Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered
    E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005

Ranking time-independent synonyms

Definition
Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$

    $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs
    $\overline{tf}(s_j)$ is the averaged tf of $s_j$ in all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
    $\mu$ weights the importance of the temporal feature against the frequency feature
    $\mu = 0.5$ yields the best performance in the experiments

Intuition
The model measures the popularity of synonyms based on two factors:
    Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
    High usage over time, i.e., a high value of averaged frequencies over time
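The TIDP mixture can be computed directly from per-partition term frequencies. The sketch below is an illustration under assumptions: the function name `tidp_scores`, the dictionary layout and the toy counts are made up, and the raw (unnormalized) values of pf and averaged tf are used because the slide does not specify any normalization.

```python
def tidp_scores(tf_by_partition, mu=0.5):
    """Score synonyms by TIDP = mu * pf + (1 - mu) * averaged tf.

    tf_by_partition maps a synonym to its term frequency in each
    time partition p_1 .. p_n (0 where the synonym does not occur).
    """
    scores = {}
    for synonym, tfs in tf_by_partition.items():
        pf = sum(1 for tf in tfs if tf > 0)        # partitions containing the synonym
        avg_tf = sum(tfs) / pf if pf else 0.0      # averaged over those partitions
        scores[synonym] = mu * pf + (1 - mu) * avg_tf
    return scores

# Toy counts over four partitions (hypothetical numbers).
toy = {
    "Barack Hussein Obama II": [3, 0, 5, 4],
    "Senator Obama": [0, 0, 2, 0],
}
for synonym, score in sorted(tidp_scores(toy).items(), key=lambda kv: -kv[1]):
    print(synonym, score)
```

With these toy counts the synonym that appears in three of the four partitions ranks above the one that appears in a single partition, which is the robustness effect described in the intuition.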

Ranking time-dependent synonyms

Definition
Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$

    $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition
    Only term frequencies are used to measure the importance of synonyms
    Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
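Because TDP is just the term frequency at the query time, ranking reduces to a filter and a sort. A minimal sketch follows; the function name `rank_time_dependent`, the `(synonym, time)` key layout and the counts are hypothetical and serve only to illustrate the definition.

```python
def rank_time_dependent(tf_at_time, t_k, top_k=3):
    """Rank synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k).

    tf_at_time maps (synonym, time) pairs to term frequencies.
    """
    scored = [(synonym, tf)
              for (synonym, time), tf in tf_at_time.items()
              if time == t_k and tf > 0]
    # Highest frequency first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]

# Toy frequencies (hypothetical numbers).
toy = {
    ("Cardinal Ratzinger", "2004"): 7,
    ("Joseph Ratzinger", "2004"): 3,
    ("Pope Benedict XVI", "2004"): 0,
    ("Pope Benedict XVI", "2006"): 12,
}
print(rank_time_dependent(toy, "2004"))
```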

Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection
    The whole history of English Wikipedia
        All pages and revisions, 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
        4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
    New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools
    MWDumper: http://www.mediawiki.org/wiki/Mwdumper
    Oracle Berkeley DB version 4.7.25
    Burst detection algorithm implemented by Kleinberg
        Number of states: 2
        Ratio of rate of second state to base state: 2
        Ratio of rate of each subsequent state to previous state: 2
        Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection
    TREC Robust Track (2004)
    250 topics (topics 301-450 and topics 601-700)

Tools
    Terrier, an open source search engine developed by University of Glasgow
    BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
    Expand the top-k synonyms $\{s_1, \ldots, s_k\}$ plus TIDP scores as boosting weights:

        q_exp = q_org s_1^w_1 s_2^w_2 ... s_k^w_k

Measurement: Mean Average Precision (MAP), R-precision and Recall
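The expanded query above is the original query followed by the top-k synonyms, each carrying its score as a boost. A small sketch of building such a string is shown below. The function name `expand_query`, the two-decimal weight format and the quoting of multi-word synonyms are assumptions; whether a particular engine accepts `^weight` on quoted phrases should be checked against its query language documentation.

```python
def expand_query(original, weighted_synonyms, k=3):
    """Append the k highest-weighted synonyms as boosted terms: term^weight."""
    top = sorted(weighted_synonyms.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [original]
    for synonym, weight in top:
        # Assumption: multi-word synonyms are wrapped in quotes as phrases.
        term = '"%s"' % synonym if " " in synonym else synonym
        parts.append("%s^%.2f" % (term, weight))
    return " ".join(parts)

# Hypothetical TIDP-style weights.
weights = {"Barack Hussein Obama II": 3.5, "Senator Obama": 1.5, "Obama": 0.2}
print(expand_query("barack obama", weights, k=2))
```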

Query expansion using time-dependent synonyms

Data collection
    NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
    Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries

Temporal Query: Named Entity    Time Period   Synonym
American Broadcasting Company   1995-2000     Disney/ABC
Barack Obama                    2005-2007     Senator Obama
Eminem                          1999-2004     Slim Shady
George H. W. Bush               1988-1992     President George H.W. Bush
George W. Bush                  2000-2007     President George W. Bush
Hillary Rodham Clinton          2001-2007     Senator Clinton
Kmart                           1987-1987     Kresge
Pope Benedict XVI               1988-2005     Cardinal Ratzinger
Ronald Reagan                   1987-1989     Reagan Revolution
Virgin Media                    1999-2002     Telewest Communications

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE          NE-Syn      Avg. Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2               51
BPCF-NERW    473,829     488,383     1.0               73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of "people", "organization" or "company"

Note: Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods

Robust2004 query statistics

Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
       k = 2; k > 2 brings noise into the NERQ process

Number of queries using two different NER methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250

Query expansion using time-independent synonyms

MAP, R-precision and Recall (* indicates statistically significant at p < 0.05)

             MW-NERQ                          MRW-NERQ
Method       MAP      R-precision   Recall    MAP      R-precision   Recall
PM           0.2889   0.3309        0.6185    0.2455   0.2904        0.5629
PRF          0.3469   0.3711        0.6944    0.3002   0.3227        0.6761
SQE-PRF      0.3608   0.3652        0.7405    0.2507   0.2665        0.5932
SWQE-PRF     0.3653   0.3861        0.7388    0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1

Query expansion using time-dependent synonyms

P@10, P@20 and P@30 (* indicates statistically significant at p < 0.05)

Method   P@10     P@20     P@30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search a Temporal Query, i.e., a keyword $w_q$ and time $t_q$
TSQ: search a Temporal Query and expand with Synonyms w.r.t. time $t_q$
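For reference, precision at a cutoff k is the fraction of the top-k retrieved documents that are relevant. The short sketch below shows the computation; the function name `precision_at_k` and the document identifiers are hypothetical and are not taken from the evaluation data.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the first k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant_ids)
    # Dividing by k (not len(top)) penalizes result lists shorter than k.
    return hits / k

ranked = ["d%d" % i for i in range(1, 11)]   # d1 .. d10 in rank order
relevant = {"d2", "d5", "d11"}
print(precision_at_k(ranked, relevant, 5), precision_at_k(ranked, relevant, 10))
```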

  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
        • Synonym Detection
          • Entity Recognition and Synonym Extraction
          • Improving the Accuracy of Time
            • Query Expansion
              • Time-based Synonyms
              • Ranking Time-independent Synonyms
              • Ranking Time-dependent Synonyms
                • Evaluation
                  • Experiment Setting
                  • Experimental Results
                    • Conclusions
                      • Conclusions and Future Work

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Contributions

1 Formal modelsWikipedia viewed as a temporal resource

2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms

3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms


Recognizing named entities

Step 1: Partition Wikipedia according to the time granularity $g = \text{month}$ to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Figure: A snapshot of Wikipedia and current revisions at time $t_k$.
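Step 1 can be illustrated with a minimal Python sketch. It assumes the revision history is already available in memory as a dictionary from page title to a date-sorted list of (date, text) pairs; that layout and the function names `monthly_time_points` and `snapshot_at` are hypothetical and not taken from the talk.

```python
from bisect import bisect_right
from datetime import date

def monthly_time_points(start, end):
    """Return the first day of every month from start to end, inclusive."""
    points = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        points.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return points

def snapshot_at(revisions, t_k):
    """Build W_tk: for every page, the latest revision dated on or before t_k."""
    snapshot = {}
    for title, history in revisions.items():
        dates = [d for d, _ in history]   # history is assumed sorted by date
        idx = bisect_right(dates, t_k)    # number of revisions up to t_k
        if idx > 0:
            snapshot[title] = history[idx - 1][1]
    return snapshot

# Toy data (invented for illustration only).
revisions = {
    "Pope_Benedict_XVI": [(date(2005, 4, 20), "rev A"), (date(2005, 6, 2), "rev B")],
    "Barack_Obama": [(date(2004, 3, 18), "rev C")],
}
points = monthly_time_points(date(2001, 3, 1), date(2008, 3, 1))
snap = snapshot_at(revisions, date(2005, 5, 1))
```

With a monthly granularity, the range 03/2001 to 03/2008 yields 85 time points, and each snapshot holds the revision that was current at that point.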


Example [Bunescu and Pasca, EACL'2006]

1) Multi-word titles in which all words are capitalized
  • President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters
  • UNICEF and WHO are named entities

3) 75% of occurrences in the article text itself are capitalized
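The three title heuristics can be sketched in Python as below. This is a simplified illustration, not the implementation of Bunescu and Pasca: the function name `is_named_entity`, the small `FUNCTION_WORDS` list (so that "of" and "the" may stay lowercase) and counting every occurrence in the text are assumptions made for the example.

```python
import re

# Assumption: a small hand-picked list of function words that may stay lowercase.
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "on", "at", "to", "a", "an"}

def is_named_entity(title, article_text="", threshold=0.75):
    """Apply the three capitalization heuristics to a Wikipedia page title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    # Rule 1: multi-word title whose content words are all capitalized.
    if len(words) > 1:
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with at least two capital letters.
    if sum(1 for c in word if c.isupper()) >= 2:
        return True
    # Rule 3: share of capitalized occurrences in the article text.
    pattern = r"\b" + re.escape(word) + r"\b"
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for o in occurrences if o[0].isupper())
    return capitalized / len(occurrences) >= threshold
```

For instance, `is_named_entity("President_of_the_United_States")` and `is_named_entity("UNICEF")` both hold, while a title such as `List_of_countries` is rejected by rule 1.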


Example

  • $e_i$: President_of_the_United_States
  • $t_k$: 11/2001
  • $s_j$: "George W. Bush"
  • $\xi_{i,j} = (e_i, s_j)$, i.e. (President_of_the_United_States, "George W. Bush")


Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e. a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Example: in the link [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States.
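The anchor-text extraction of Step 1 can be sketched with a regular expression over raw wikitext. The sketch below only handles piped links of the form `[[Target|anchor]]`; the function name `extract_synonyms` and the normalization of spaces to underscores are assumptions for illustration.

```python
import re
from collections import defaultdict

# Matches piped wiki links [[Target|anchor]]; an optional #section is ignored.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]|]+)\]\]")

def extract_synonyms(wikitext, entities):
    """Collect anchor texts that link to any of the given named-entity titles."""
    synonyms = defaultdict(set)
    for target, anchor in WIKILINK.findall(wikitext):
        title = target.strip().replace(" ", "_")
        if title in entities:
            synonyms[title].add(anchor.strip())
    return dict(synonyms)

text = ("The [[President_of_the_United_States|Barack Obama]] met [[UNICEF]] "
        "staff and the [[World Health Organization|WHO]] director.")
entities = {"President_of_the_United_States", "World_Health_Organization"}
pairs = extract_synonyms(text, entities)
```

Running this over every article in a snapshot and merging the results would give the synonym snapshot for that time point. Unpiped links such as `[[UNICEF]]` carry no separate anchor text and are skipped here.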


Output: entity-synonym relationships and time periods

Named Entity             Synonym                     Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger   05/2005 - 03/2009
                         Joseph Ratzinger            05/2005 - 03/2009
                         Pope Benedict XVI           05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II     02/2007 - 03/2009
                         Sen. Barack Obama           07/2007 - 03/2009
                         Senator Barack Obama        05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton             08/2003 - 03/2009
                         Sen. Hillary Clinton        03/2007 - 03/2009
                         Senator Clinton             11/2007 - 03/2009

Note: The times of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents.
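One way to derive such time periods from the monthly synonym snapshots is sketched below. It assumes each snapshot is a set of (entity, synonym) pairs keyed by a (year, month) tuple and simply records the first and last snapshot in which a pair occurs; the function name and data layout are hypothetical, and gaps inside a period are ignored.

```python
def synonym_time_periods(synonym_snapshots):
    """Map each (entity, synonym) pair to its (first, last) snapshot time."""
    periods = {}
    for t_k in sorted(synonym_snapshots):
        for pair in synonym_snapshots[t_k]:
            first, _ = periods.get(pair, (t_k, t_k))
            periods[pair] = (first, t_k)
    return periods

# Toy snapshots keyed by (year, month); the contents are illustrative only.
snapshots = {
    (2007, 2): {("Barack Obama", "Barack Hussein Obama II")},
    (2007, 7): {("Barack Obama", "Barack Hussein Obama II"),
                ("Barack Obama", "Sen. Barack Obama")},
    (2009, 3): {("Barack Obama", "Barack Hussein Obama II"),
                ("Barack Obama", "Sen. Barack Obama")},
}
periods = synonym_time_periods(snapshots)
```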


Improving the accuracy of time using burst detection

  • Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
  • 1.8M articles from January 1987 to June 2007 (20 years)
  • Use the burst detection algorithm [Kleinberg, KDD'2002]
  • Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
  • Output: bursty intervals and bursty weight, i.e. periods of occurrence and intensity
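The idea behind two-state burst detection over a batched document stream can be sketched as follows. This is a simplified illustration in the spirit of Kleinberg's automaton, not the implementation used in the talk: the function name `detect_bursts`, the transition cost `gamma * ln(n)` and the clamping of probabilities are assumptions. `relevant[t]` is the number of documents mentioning the synonym in time step t and `totals[t]` is the number of all documents in that step.

```python
import math

def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return (start, end, weight) for each interval spent in the bursty state."""
    n = len(relevant)
    if n == 0 or sum(relevant) == 0:
        return []
    p0 = min(sum(relevant) / sum(totals), 0.9999 / s)  # base rate
    p1 = s * p0                                        # bursty rate
    probs = (p0, p1)
    up_cost = gamma * math.log(n)  # cost of moving from base to bursty state

    def emit_cost(p, r, d):
        # Negative log binomial likelihood; the binomial coefficient is the
        # same for both states and therefore omitted.
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    # Viterbi search for the cheapest state sequence, starting in the base state.
    cost = [0.0, math.inf]
    back = []
    for r, d in zip(relevant, totals):
        new_cost, pointers = [], []
        for state in (0, 1):
            cands = [cost[prev] + (up_cost if prev == 0 and state == 1 else 0.0)
                     for prev in (0, 1)]
            best = 0 if cands[0] <= cands[1] else 1
            new_cost.append(cands[best] + emit_cost(probs[state], r, d))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Backtrack to recover the state at every time step.
    state = 0 if cost[0] <= cost[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    # Group consecutive bursty steps; weight = cost saved versus the base state.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start, weight = t, 0.0
            while t < n and states[t] == 1:
                weight += (emit_cost(p0, relevant[t], totals[t])
                           - emit_cost(p1, relevant[t], totals[t]))
                t += 1
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts

bursts = detect_bursts([1, 1, 1, 10, 12, 11, 1, 1], [100] * 8)
```

On this toy stream the three high-count steps form a single bursty interval, and the weight grows with the intensity and the length of the burst.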


Output: results from the burst-detection algorithm

Synonym            Entity                   Burst Weight   Time Start   Time End
President Reagan   Ronald Reagan            5506858        01/1987      02/1989
President Ronald   Ronald Reagan            100401         01/1989      03/1990
President Ronald   Ronald Reagan            67208          07/1990      02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001      10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002      01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003      11/2004


Classifying synonyms into two types

Definition

Class A: time-independent
  • Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)
  • E.g. "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"

Class B: time-dependent
  • Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered
  • E.g. "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005


Ranking time-independent synonyms

Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$

  • $pf(s_j)$ is the time partition frequency, i.e. the number of time partitions in which $s_j$ occurs
  • $\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
  • $\mu$ controls the relative importance of the temporal feature and the frequency feature
  • $\mu = 0.5$ yields the best performance in the experiments

Intuition: The model measures the popularity of synonyms based on two factors:
  • Robustness to change over time, i.e. the more partitions a synonym occurs in, the more robust to time it is
  • High usage over time, i.e. a high value of averaged frequencies over time
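The TIDP score can be computed directly from per-partition term frequencies, as in the minimal sketch below. The function name `tidp` and the list-of-frequencies input are assumptions, and the slide does not say whether pf and tf are normalized before mixing, so raw values are used here.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * average tf over occupied partitions."""
    occupied = [tf for tf in tf_by_partition if tf > 0]
    pf = len(occupied)               # number of partitions in which s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(occupied) / pf      # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1 - mu) * avg_tf

# A synonym seen in 3 of 4 partitions with frequencies 3, 5 and 4.
score = tidp([3, 0, 5, 4])
```

Here pf = 3 and the averaged tf is 4, so with mu = 0.5 the score is 3.5.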


Ranking time-dependent synonyms

Definition: Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$

  • $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition: Only term frequencies are used to measure the importance of synonyms.
  • Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
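Ranking by TDP then amounts to sorting the synonyms by their term frequency at the query time. The sketch below assumes a nested dictionary from synonym to per-period frequencies; the function name `rank_time_dependent` and the toy counts are hypothetical.

```python
def rank_time_dependent(tf, t_k, top_k=3):
    """Rank synonyms by TDP(s_j, t_k) = tf(s_j, t_k), highest first."""
    scored = [(syn, by_time.get(t_k, 0)) for syn, by_time in tf.items()]
    scored = [(syn, w) for syn, w in scored if w > 0]   # drop synonyms unused at t_k
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top_k]

# Toy frequencies per year (invented for illustration).
tf = {
    "Cardinal Ratzinger": {2004: 12, 2006: 1},
    "Joseph Ratzinger": {2004: 7, 2006: 3},
    "Pope Benedict": {2006: 20},
}
ranking = rank_time_dependent(tf, 2004)
```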


Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms


Extracting and improving time of synonyms

Data collection
  • The whole history of English Wikipedia
      • All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
      • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
  • The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools
  • MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
  • Oracle Berkeley DB version 4.7.25
  • Burst detection algorithm implemented by Kleinberg
      • Number of states: 2
      • Ratio of rate of second state to base state: 2
      • Ratio of rate of each subsequent state to previous state: 2
      • Gamma parameter of the HMM: 1

Measurement: Accuracy


Query expansion using time-independent synonyms

Data collection
  • TREC Robust Track (2004)
  • 250 topics (topics 301-450 and topics 601-700)

Tools
  • Terrier, an open source search engine developed by the University of Glasgow
  • BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
  • Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:

$$q_{exp} = q_{org}\;\; s_1{}^{\wedge}w_1\;\; s_2{}^{\wedge}w_2\; \ldots\; s_k{}^{\wedge}w_k$$

Measurement: Mean Average Precision (MAP), R-precision and Recall
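Building the expanded query string from scored synonyms can be sketched as below. The quoting of multi-word synonyms and the two-decimal `^weight` suffix are assumptions about a generic boost syntax, not a confirmed description of the query format used with Terrier in the experiments; `expand_query` is a hypothetical helper.

```python
def expand_query(q_org, scored_synonyms, k=2):
    """Append the top-k synonyms to the original query, boosted by their scores."""
    top = sorted(scored_synonyms.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [q_org] + [f'"{syn}"^{weight:.2f}' for syn, weight in top]
    return " ".join(parts)

# Toy TIDP scores (invented for illustration).
scores = {"Senator Obama": 3.5, "Barack Hussein Obama II": 2.0, "Obama": 1.0}
q_exp = expand_query("barack obama", scores, k=2)
```

The original query terms keep their default weight, while each added synonym contributes in proportion to its score.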


Query expansion using time-dependent synonyms

Data collection
  • NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
  • Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries

Temporal Query                                  Synonym
Named Entity                    Time Period
American Broadcasting Company   1995-2000       DisneyABC
Barack Obama                    2005-2007       Senator Obama
Eminem                          1999-2004       Slim Shady
George H. W. Bush               1988-1992       President George H.W. Bush
George W. Bush                  2000-2007       President George W. Bush
Hillary Rodham Clinton          2001-2007       Senator Clinton
Kmart                           1987-1987       Kresge
Pope Benedict XVI               1988-2005       Cardinal Ratzinger
Ronald Reagan                   1987-1989       Reagan Revolution
Virgin Media                    1999-2002       Telewest Communications
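The measurement named on this slide, precision at k retrieved documents, can be computed as in the short sketch below. The document identifiers and relevance judgments are invented for illustration, and `precision_at_k` is a hypothetical helper name.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant_ids)
    return hits / k

ranked = [f"d{i}" for i in range(1, 31)]   # d1 ... d30 in retrieval order
relevant = {"d1", "d4", "d9", "d15"}
p10 = precision_at_k(ranked, relevant, 10)
p20 = precision_at_k(ranked, relevant, 20)
p30 = precision_at_k(ranked, relevant, 30)
```

Averaging these values over the 20 temporal queries would give the P@10, P@20 and P@30 figures for one retrieval method.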


Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE        NE-Syn    Avg Syn per NE   Accuracy (%)
BPF-NERW     2574319   3199115   1.2              51
BPCF-NERW    473829    488383    1.0              73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization" or "company"

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods


Robust2004 query statistics

Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
  • k = 2, since k > 2 brings noise into the NERQ process

Number of queries using the two different NER methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250


Query expansion using time-independent synonyms

MAP, R-precision and Recall (statistical significance assessed at p < 0.05)

           MW-NERQ                          MRW-NERQ
Method     MAP      R-precision   Recall    MAP      R-precision   Recall
PM         0.2889   0.3309        0.6185    0.2455   0.2904        0.5629
PRF        0.3469   0.3711        0.6944    0.3002   0.3227        0.6761
SQE-PRF    0.3608   0.3652        0.7405    0.2507   0.2665        0.5932
SWQE-PRF   0.3653   0.3861        0.7388    0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e. Bose-Einstein 1


Query expansion using time-dependent synonyms

P@10, P@20 and P@30 (statistical significance assessed at p < 0.05)

Method   P@10     P@20     P@30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search with a Temporal Query, i.e. a keyword $w_q$ and time $t_q$
TSQ: search with a Temporal Query and expand it with Synonyms with respect to time $t_q$



IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Problem StatementContributions

Contributions

1 Formal modelsWikipedia viewed as a temporal resource

2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms

3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Recognizing named entities

Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz

Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej

Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm

Figure A snapshot of Wikipediaand current revisions at time tk

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Recognizing named entities

Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz

Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej

Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm

Example[Bunescu and Pasca EACLrsquo2006]1) Multi-word titles and all words arecapitalized

President_of_the_United_StatesrArr named entity

2) Single-word titles with multiple capitalletters

UNICEF and WHO are namedentities

3) 75 of occurrences in the article textitself are capitalized

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Recognizing named entities

Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz

Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej

Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Recognizing named entities

Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz

Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej

Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm

Example

ei President_of_the_United_States

tk 112001

sj ldquoGeorge W Bushrdquo

ξi (ei sj ) or(President_of_the_United_StatesldquoGeorge W Bushrdquo)

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links

Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm

Example[[President_of_the_United_

States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_

States

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links

Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm

Example[[President_of_the_United_

States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_

States

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

OutputEntity-synonym relationships and time periods

Named Entity Synonym Time Period

Pope Benedict XVICardinal Joseph Ratzinger 052005 - 032009Joseph Ratzinger 052005 - 032009Pope Benedict XVI 052005 - 032009

Barack ObamaBarack Hussein Obama II 022007 - 032009Sen Barack Obama 072007 - 032009Senator Barack Obama 052006 - 032009

Hillary Rodham ClintonHillary Clinton 082003 - 032009Sen Hillary Clinton 032007 - 032009Senator Clinton 112007 - 032009

The time of synonyms are timestamps of Wikipedia articles (8 years) in which they

appear not temporal expression extracted from the contents


Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time:

1.8M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg, KDD'2002]

Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams

Output: bursty intervals and bursty weight, i.e., periods of occurrence and intensity
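To make the idea of bursty intervals and weights concrete, here is a simplified two-state sketch in the spirit of Kleinberg's batched burst model. It is not the implementation used in the talk: the function name `two_state_bursts`, the input layout (per-period counts), and the choice to drop the binomial coefficient (which is identical for both states) are assumptions of this example. It also assumes the synonym occurs at least once.

```python
import math


def two_state_bursts(relevant, total, s=2.0, gamma=1.0):
    """Return a list of (start, end, weight) bursts over time periods.

    relevant[t]: documents mentioning the synonym in period t
    total[t]:    all documents in period t
    s:           ratio of the burst-state rate to the base rate
    gamma:       scales the cost of entering the burst state
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(total)        # base rate of occurrence
    p1 = min(0.9999, s * p0)               # elevated rate in the burst state

    def cost(p, r, d):
        # Negative log-likelihood of r hits among d documents at rate p.
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    up = gamma * math.log(n)               # cost of moving from state 0 to 1
    best = [0.0, float("inf")]             # the sequence starts in the base state
    back = []
    for r, d in zip(relevant, total):
        from0 = 0 if best[0] <= best[1] else 1       # moving down is free
        from1 = 0 if best[0] + up < best[1] else 1
        new0 = min(best[0], best[1]) + cost(p0, r, d)
        new1 = min(best[0] + up, best[1]) + cost(p1, r, d)
        back.append((from0, from1))
        best = [new0, new1]

    # Backtrack the cheapest state sequence.
    state = 0 if best[0] <= best[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    # Group consecutive burst-state periods; weight = total cost saved.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start, weight = t, 0.0
            while t < n and states[t] == 1:
                weight += cost(p0, relevant[t], total[t]) - cost(p1, relevant[t], total[t])
                t += 1
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts


# Hypothetical monthly counts with a clear spike in periods 3 to 5.
bursts = two_state_bursts([1, 1, 1, 9, 10, 8, 1, 1], [100] * 8)
```

The defaults `s=2.0` and `gamma=1.0` mirror the two-state setting with rate ratio 2 and gamma 1; everything else is a sketch under the stated assumptions.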


Output: results from the burst-detection algorithm

Synonym            Entity                   Burst Weight   Time Start   Time End
President Reagan   Ronald Reagan            5506858        01/1987      02/1989
President Ronald   Ronald Reagan            100401         01/1989      03/1990
President Ronald   Ronald Reagan            67208          07/1990      02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001      10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002      01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003      11/2004


Classifying synonyms into two types

Definition

Class A: time-independent

Robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided).

E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama".

Class B: time-dependent

Related to a particular time in the past, and good synonym candidates for a temporal search, where changes in semantics must be considered.

E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005.


Ranking time-independent synonyms

Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$

$pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs.

$tf(s_j)$ is the averaged tf of $s_j$ over all time partitions: $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$

$\mu$ balances the importance of the temporal feature and the frequency feature.

$\mu = 0.5$ yields the best performance in the experiments.

Intuition: The model measures the popularity of synonyms based on two factors:

Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is.

High usage over time, i.e., a high value of averaged frequencies over time.
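The mixture can be computed directly from per-partition term frequencies. The Python sketch below applies the formula literally to raw counts; any normalisation of pf and tf to a common scale is left out and would be an additional assumption. The function name `tidp` and the example frequencies are invented for illustration.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP score of one synonym from its term frequency in each time partition."""
    pf = sum(1 for tf in tf_by_partition if tf > 0)   # partitions where it occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(tf_by_partition) / pf                # averaged term frequency
    return mu * pf + (1 - mu) * avg_tf


# Hypothetical frequencies over four time partitions.
candidates = {
    "Barack Hussein Obama II": [3, 4, 5, 4],   # present in every partition
    "Senator Obama": [0, 0, 2, 0],             # present in a single partition
}
ranked = sorted(candidates, key=lambda s: tidp(candidates[s]), reverse=True)
```

With these made-up counts, the synonym that appears in every partition ranks first, which matches the stated intuition about robustness over time.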


Ranking time-dependent synonyms

Definition: Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$TDP(s_j, t_k) = tf(s_j, t_k)$$

$tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.

Intuition: Only term frequencies are used to measure the importance of synonyms.

Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest.
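Ranking by TDP reduces to sorting candidate synonyms by their term frequency at the query time. A minimal Python sketch follows; the nested-dictionary layout, the function name `rank_time_dependent`, and the frequencies are assumptions for illustration.

```python
def rank_time_dependent(tf, t_k, top_k=3):
    """Rank synonyms by TDP(s_j, t_k) = tf(s_j, t_k).

    tf: {synonym: {time period: term frequency}}
    Synonyms that do not occur at t_k are dropped.
    """
    scored = [(freqs.get(t_k, 0), syn) for syn, freqs in tf.items()]
    scored = [(score, syn) for score, syn in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))   # highest tf first
    return [syn for _, syn in scored[:top_k]]


# Hypothetical yearly term frequencies.
tf = {
    "Cardinal Ratzinger": {"2004": 12, "2006": 1},
    "Pope Benedict XVI": {"2006": 30},
    "Joseph Ratzinger": {"2004": 5},
}
ranking_2004 = rank_time_dependent(tf, "2004")
```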


Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms


Extracting and improving time of synonyms

Data collection

The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes; 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)

New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007

Tools

MWDumper (http://www.mediawiki.org/wiki/Mwdumper)

Oracle Berkeley DB version 4.7.25

Burst detection algorithm implemented by Kleinberg:
Number of states: 2
Ratio of rate of second state to base state: 2
Ratio of rate of each subsequent state to previous state: 2
Gamma parameter of the HMM: 1

Measurement: Accuracy


Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier, an open source search engine developed by the University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

Expand with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights ("^" is the boost operator of the query language):

q_exp = q_org s_1^w_1 s_2^w_2 ... s_k^w_k

Measurement: Mean Average Precision (MAP), R-precision and Recall
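The expansion step can be sketched as simple string construction: the original query is followed by the top-k synonyms, each boosted by its weight with the caret operator. This is an illustrative sketch only. The function name `expand_query`, the quoting of multi-word synonyms as phrases, and the two-decimal formatting are assumptions, and the exact syntax a given engine accepts should be checked against its documentation.

```python
def expand_query(original, weighted_synonyms, k=2):
    """Append the k highest-weighted synonyms to the query as boosted phrases."""
    top = sorted(weighted_synonyms.items(), key=lambda kv: kv[1], reverse=True)[:k]
    parts = [original]
    for synonym, weight in top:
        parts.append(f'"{synonym}"^{weight:.2f}')   # phrase^boost
    return " ".join(parts)


# Hypothetical TIDP scores for three candidate synonyms.
scores = {"Barack Hussein Obama II": 4.0, "Senator Obama": 1.5, "Obama": 0.2}
expanded = expand_query("obama policy", scores, k=2)
```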


Query expansion using time-dependent synonyms

Data collection

NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications

Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries

Temporal Query: Named Entity    Time Period   Synonym
American Broadcasting Company   1995-2000     Disney/ABC
Barack Obama                    2005-2007     Senator Obama
Eminem                          1999-2004     Slim Shady
George H. W. Bush               1988-1992     President George H.W. Bush
George W. Bush                  2000-2007     President George W. Bush
Hillary Rodham Clinton          2001-2007     Senator Clinton
Kmart                           1987-1987     Kresge
Pope Benedict XVI               1988-2005     Cardinal Ratzinger
Ronald Reagan                   1987-1989     Reagan Revolution
Virgin Media                    1999-2002     Telewest Communications
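Precision at k, the measure used for the temporal queries, is the fraction of the top k retrieved documents that are relevant. A minimal Python sketch, with hypothetical document identifiers and an illustrative function name:

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the first k retrieved documents that are judged relevant."""
    top = ranked_doc_ids[:k]
    return sum(1 for doc in top if doc in relevant_ids) / k


# Hypothetical ranking of ten documents, two of which are relevant.
ranking = ["d%d" % i for i in range(1, 11)]
relevant = {"d1", "d4", "d20"}
p10 = precision_at_k(ranking, relevant, 10)
```

Dividing by k rather than by the number of returned documents means a short result list is penalised, which is the usual convention.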


Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   #NE         #NE-Syn     Avg. Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2               51
BPCF-NERW    473,829     488,383     1.0               73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of "people", "organization" or "company"

Note: 500 randomly selected entity-synonym relationships were used for assessing the accuracy of time periods


Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)

k = 2; k > 2 brings noise into the NERQ process

Number of queries using the two different NERQ methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250


Query expansion using time-independent synonyms

MAP, R-precision and Recall (statistical significance tested at p < 0.05)

Method     MW-NERQ                           MRW-NERQ
           MAP      R-precision   Recall     MAP      R-precision   Recall
PM         0.2889   0.3309        0.6185     0.2455   0.2904        0.5629
PRF        0.3469   0.3711        0.6944     0.3002   0.3227        0.6761
SQE-PRF    0.3608   0.3652        0.7405     0.2507   0.2665        0.5932
SWQE-PRF   0.3653   0.3861        0.7388     0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
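For readers unfamiliar with the three measures reported here, the sketch below computes them for a single query using their standard definitions; MAP is the mean of the average precision over all queries. The function names and the toy ranking are illustrative and not taken from the evaluation setup.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks where relevant documents appear."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return sum(1 for doc in ranked[:r] if doc in relevant) / r if r else 0.0


def recall(ranked, relevant):
    """Fraction of the relevant documents that were retrieved at all."""
    return sum(1 for doc in relevant if doc in ranked) / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical single-query example.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
ap = average_precision(ranked, relevant)
```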


Query expansion using time-dependent synonyms

P@10, P@20 and P@30 (statistical significance tested at p < 0.05)

Method   P@10     P@20     P@30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search with a Temporal Query, i.e., a keyword $w_q$ and time $t_q$

TSQ: search with a Temporal Query and expand with Synonyms with respect to time $t_q$


QUEST Query Expansion using Synonyms over Time


Conclusions and future work

Extract time-based synonyms from Wikipedia

Improve time of synonyms using NYT

Perform query expansion using the time-based synonyms

Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work:

Combine time-dependent synonyms and temporal language models to determine the time of queries

Exploit temporal information extraction techniques to discover synonyms at particular time points

Improve temporal text mining/clustering using time-based synonyms


QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you


Recognizing named entities

Step 1: Partition Wikipedia according to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Figure: A snapshot of Wikipedia and current revisions at time $t_k$
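One way to picture a snapshot is as the latest revision of every page at or before $t_k$. The Python sketch below builds such a snapshot from a revision history; the data layout (ISO date strings, per-page lists sorted by time) and the function name `snapshot_at` are assumptions of this example, not a description of the authors' tooling.

```python
from bisect import bisect_right


def snapshot_at(revisions, t_k):
    """Return {title: text of the current revision at time t_k}.

    revisions: {title: list of (timestamp, text) sorted by timestamp},
    with timestamps as "YYYY-MM-DD" strings so they compare chronologically.
    Pages created after t_k are absent from the snapshot.
    """
    snapshot = {}
    for title, history in revisions.items():
        timestamps = [ts for ts, _ in history]
        i = bisect_right(timestamps, t_k)       # revisions at or before t_k
        if i > 0:
            snapshot[title] = history[i - 1][1]
    return snapshot


# Hypothetical revision histories for two pages.
revisions = {
    "A": [("2001-03-05", "v1"), ("2001-05-01", "v2")],
    "B": [("2002-01-01", "b1")],
}
snap_2001_04 = snapshot_at(revisions, "2001-04-01")
```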


Example [Bunescu and Pasca, EACL'2006]:

1) Multi-word titles in which all words are capitalized
President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters
UNICEF and WHO are named entities

3) 75% of occurrences in the article text itself are capitalized
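The three title heuristics can be sketched as a small Python predicate. Several details are assumptions of this example rather than part of the slide: the list of function words that may stay lower-case in multi-word titles (needed for titles such as President_of_the_United_States), the treatment of rule 3 as applying to single-word titles, and the naive counting of sentence-initial occurrences as capitalized.

```python
import re

# Assumed list of function words that may be lower-case inside a title.
FUNCTION_WORDS = {"of", "the", "and", "in", "on", "for", "to", "a", "an"}


def is_named_entity(title, article_text=""):
    """Apply the three capitalization heuristics to a Wikipedia page title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with multiple capital letters.
    if sum(1 for ch in word if ch.isupper()) >= 2:
        return True
    # Rule 3: at least 75% of the occurrences in the article are capitalized.
    pattern = r"\b%s\b" % re.escape(word)
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for o in occurrences if o[0].isupper())
    return capitalized / len(occurrences) >= 0.75


# Examples taken from the slide, plus hypothetical article snippets.
checks = {
    "President_of_the_United_States": is_named_entity("President_of_the_United_States"),
    "UNICEF": is_named_entity("UNICEF"),
    "Apple": is_named_entity("Apple", "Apple sells phones. An apple a day."),
}
```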


Example

$e_i$: President_of_the_United_States

$t_k$: 11/2001

$s_j$: "George W. Bush"

$\xi_{i,j}$: $(e_i, s_j)$, or (President_of_the_United_States, "George W. Bush")

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links

Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm

Example[[President_of_the_United_

States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_

States

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links

Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm

Example[[President_of_the_United_

States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_

States

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

OutputEntity-synonym relationships and time periods

Named Entity Synonym Time Period

Pope Benedict XVICardinal Joseph Ratzinger 052005 - 032009Joseph Ratzinger 052005 - 032009Pope Benedict XVI 052005 - 032009

Barack ObamaBarack Hussein Obama II 022007 - 032009Sen Barack Obama 072007 - 032009Senator Barack Obama 052006 - 032009

Hillary Rodham ClintonHillary Clinton 082003 - 032009Sen Hillary Clinton 032007 - 032009Senator Clinton 112007 - 032009

The time of synonyms are timestamps of Wikipedia articles (8 years) in which they

appear not temporal expression extracted from the contents

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Improving the accuracy of time using burst detection

Analyze the New York Time Annotated Corpus (NYT) to discovermore accurate time

18M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg in KDDrsquo2002]

Generate bursty periods of ξij by computing a rate of occurrencefrom document streams

Output bursty intervals and bursty weight ie periods ofoccurrence and intensity

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Improving the accuracy of time using burst detection

OutputResults from burst-detection algorithm

Synonym Entity Burst Weight TimeStart End

President Reagan Ronald Reagan 5506858 011987 021989President Ronald Ronald Reagan 100401 011989 031990President Ronald Ronald Reagan 67208 071990 021993

Senator Clinton Hillary Rodham Clinton 18214 012001 102001Senator Clinton Hillary Rodham Clinton 17732 052002 012003Senator Clinton Hillary Rodham Clinton 172356 062003 112004

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Classifying synonyms into two types

DefinitionClass A time-independent

Robust to change over time and good synonym candidates for an ordinarysearch (no temporal criteria provided)

Eg ldquoBarack Hussein Obama IIrdquo is a time-independent synonym of ldquoBarackObamardquo

Class B time-dependent

Related to particular time in the past and good synonym candidates for atemporal search where changes in semantics must be considered

Eg ldquoCardinal Joseph Ratzingerrdquo is a time-dependent synonym of ldquoPopeBenedict XVIrdquo before 2005

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-independent synonyms

DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature

TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )

pf (sj ) is the time partition frequency in which sj occurs

tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum

i tf (sj pi )

pf (sj )

micro underlines the importance of a temporal feature and a frequency feature

micro = 05 yields the best performance in the experiments

IntuitionThe model measures popularity of synonyms based on two factors

Robustness to change over time ie the more partitions synonyms occur themore robust to time they are

High usages over time ie a high value of averaged frequencies over time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-independent synonyms

DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature

TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )

pf (sj ) is the time partition frequency in which sj occurs

tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum

i tf (sj pi )

pf (sj )

micro underlines the importance of a temporal feature and a frequency feature

micro = 05 yields the best performance in the experiments

IntuitionThe model measures popularity of synonyms based on two factors

Robustness to change over time ie the more partitions synonyms occur themore robust to time they are

High usages over time ie a high value of averaged frequencies over time

Ranking time-dependent synonyms

Definition: Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$

where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition: Only term frequencies are used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
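As a rough sketch of this weighting, the snippet below ranks the synonyms observed at one time period by their term frequency. The function name, the `(synonym, period)` key layout, and the counts are hypothetical and serve only to illustrate the definition.

```python
def rank_time_dependent(tf_at_time, t_k, top_k=3):
    """Rank synonyms by TDP(s_j, t_k) = tf(s_j, t_k).

    tf_at_time maps (synonym, time period) pairs to term frequencies
    (hypothetical layout for this example).
    """
    scores = {syn: tf for (syn, t), tf in tf_at_time.items() if t == t_k and tf > 0}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Illustrative counts only.
tf_at_time = {
    ("Cardinal Ratzinger", "2004"): 12,
    ("Joseph Ratzinger", "2004"): 7,
    ("Pope Benedict XVI", "2006"): 30,
}
ranking = rank_time_dependent(tf_at_time, "2004")
```

Synonyms that only occur in other periods never enter the ranking, which mirrors the point that time partitions play no role once $t_k$ is fixed.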

Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting synonyms and improving the accuracy of their time periods

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection

  • The whole history of English Wikipedia
      • All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, ..., 01/03/2008), about 2.8 Terabytes
      • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
  • The New York Times Annotated Corpus, which contains over 1.8 million articles from January 1987 to June 2007

Tools

  • MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
  • Oracle Berkeley DB version 4.7.25
  • Burst detection algorithm implemented by Kleinberg
      • Number of states: 2
      • Ratio of rate of second state to base state: 2
      • Ratio of rate of each subsequent state to previous state: 2
      • Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection

  • TREC Robust Track (2004)
  • 250 topics (topics 301-450 and topics 601-700)

Tools

  • Terrier, an open-source search engine developed by the University of Glasgow
  • BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
  • Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:

$$q_{exp} = q_{org}\ \ s_1{\wedge}w_1\ \ s_2{\wedge}w_2\ \ \ldots\ \ s_k{\wedge}w_k$$

Measurement: Mean Average Precision (MAP), R-precision, and Recall
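The expansion step can be illustrated with a small helper that appends the top-k synonyms to the original query as boosted terms. This is a sketch under stated assumptions: `expand_query` is a hypothetical name, the weights are made up, and the `"phrase"^weight` form is a Lucene-style boost notation; whether the retrieval engine accepts quoted phrases with boosts in exactly this form is not confirmed here.

```python
def expand_query(original_query, weighted_synonyms, k=2):
    """Append the top-k synonyms to the query as quoted, boosted terms."""
    top = sorted(weighted_synonyms.items(), key=lambda item: item[1], reverse=True)[:k]
    boosted = ['"{}"^{:.2f}'.format(syn, weight) for syn, weight in top]
    return " ".join([original_query] + boosted)


# Illustrative weights only; in the talk the weights are TIDP scores.
q_exp = expand_query(
    "Barack Obama",
    {"Senator Obama": 3.5, "Barack Hussein Obama II": 4.0, "Obama": 1.0},
)
```

Here `q_exp` keeps the original query terms unweighted and adds only the two highest-scoring synonyms, so low-weight synonyms cannot dilute the query.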

Query expansion using time-dependent synonyms

Data collection

  • NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
  • Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries

Temporal Query                                  Synonym
Named Entity                    Time Period
American Broadcasting Company   1995-2000       Disney/ABC
Barack Obama                    2005-2007       Senator Obama
Eminem                          1999-2004       Slim Shady
George H. W. Bush               1988-1992       President George H.W. Bush
George W. Bush                  2000-2007       President George W. Bush
Hillary Rodham Clinton          2001-2007       Senator Clinton
Kmart                           1987-1987       Kresge
Pope Benedict XVI               1988-2005       Cardinal Ratzinger
Ronald Reagan                   1987-1989       Reagan Revolution
Virgin Media                    1999-2002       Telewest Communications
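Precision at a cutoff, the measure used for these temporal queries, is straightforward to compute. The sketch below uses a hypothetical ranking and hypothetical relevance judgments; only the definition of the measure comes from the slide.

```python
def precision_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant_doc_ids)
    return hits / k


ranked = ["d%d" % i for i in range(1, 31)]   # hypothetical ranking of 30 documents
relevant = {"d1", "d4", "d12", "d25"}         # hypothetical relevance judgments
scores = {k: precision_at_k(ranked, relevant, k) for k in (10, 20, 30)}
```

Dividing by `k` rather than by the number of documents actually returned penalizes a query that retrieves fewer than `k` documents, which is the usual convention for this measure.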

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   #NE         #NE-Syn     Avg. #Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2                51
BPCF-NERW    473,829     488,383     1.0                73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization", or "company"

Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods
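The average number of synonyms per named entity appears to be the ratio of the two count columns. This reading is an assumption, but it reproduces the reported values after rounding to one decimal place:

```latex
\[
\frac{3{,}199{,}115}{2{,}574{,}319} \approx 1.24 \approx 1.2,
\qquad
\frac{488{,}383}{473{,}829} \approx 1.03 \approx 1.0
\]
```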

Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)

k = 2; k > 2 brings noise into the NERQ process

Number of queries using the two different NER methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250

Query expansion using time-independent synonyms

MAP, R-precision, and Recall (results were tested for statistical significance at p < 0.05)

            MW-NERQ                          MRW-NERQ
Method      MAP      R-precision   Recall    MAP      R-precision   Recall
PM          0.2889   0.3309        0.6185    0.2455   0.2904        0.5629
PRF         0.3469   0.3711        0.6944    0.3002   0.3227        0.6761
SQE-PRF     0.3608   0.3652        0.7405    0.2507   0.2665        0.5932
SWQE-PRF    0.3653   0.3861        0.7388    0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo-Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo-Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo-Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
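For readers less familiar with the measures in this table, the sketch below shows how average precision, R-precision, and MAP are commonly computed from a ranking and a set of relevance judgments. The rankings and judgments are hypothetical; the code follows the textbook definitions and is not the evaluation tool used in the experiments.

```python
def average_precision(ranked_doc_ids, relevant_doc_ids):
    """Mean of the precision values at the rank of each retrieved relevant document."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_doc_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_doc_ids) if relevant_doc_ids else 0.0


def r_precision(ranked_doc_ids, relevant_doc_ids):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant_doc_ids)
    if r == 0:
        return 0.0
    return sum(1 for d in ranked_doc_ids[:r] if d in relevant_doc_ids) / r


def mean_average_precision(runs):
    """runs is a list of (ranking, relevant set) pairs, one per topic."""
    return sum(average_precision(rank, rel) for rank, rel in runs) / len(runs)


# Two hypothetical topics.
runs = [(["a", "b", "c", "d"], {"a", "c"}), (["x", "y"], {"y"})]
map_score = mean_average_precision(runs)
```

For the first topic the relevant documents sit at ranks 1 and 3, so its average precision is (1/1 + 2/3) / 2, while its R-precision only looks at the top 2 documents.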

Query expansion using time-dependent synonyms

P@10, P@20, and P@30 (results were tested for statistical significance at p < 0.05)

Method   P@10     P@20     P@30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search with a Temporal Query, i.e., a keyword $w_q$ and a time $t_q$

TSQ: search with a Temporal Query and expand it with Synonyms with respect to time $t_q$

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

  • Extract time-based synonyms from Wikipedia
  • Improve the time of synonyms using the NYT corpus
  • Perform query expansion using the time-based synonyms
  • Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work

  • Combine time-dependent synonyms and temporal language models to determine the time of queries
  • Exploit temporal information extraction techniques to discover synonyms at particular time points
  • Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

Recognizing named entities

Step 1: Partition Wikipedia according to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Figure: A snapshot of Wikipedia and current revisions at time $t_k$

Example [Bunescu and Pasca, EACL'2006]: a Wikipedia page is treated as a named entity if

1) It has a multi-word title and all content words are capitalized, e.g., President_of_the_United_States ⇒ named entity

2) It has a single-word title with multiple capital letters, e.g., UNICEF and WHO are named entities

3) At least 75% of the occurrences of the title in the article text itself are capitalized
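The three title heuristics can be sketched as a single predicate. Several details below are assumptions made for illustration rather than part of the talk: the name `is_named_entity`, the small `FUNCTION_WORDS` list that lets titles such as President_of_the_United_States pass rule 1, the reading of "75" as an at-least-75% threshold, and the use of rule 3 only as a fallback for single-word titles.

```python
import re

# Hypothetical list of function words skipped by rule 1 (an assumption).
FUNCTION_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the"}


def is_named_entity(title, article_text=""):
    """Apply the three title heuristics to a Wikipedia page title."""
    words = [w for w in title.split("_") if w]
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with multiple capital letters.
    if sum(1 for ch in word if ch.isupper()) >= 2:
        return True
    # Rule 3: at least 75% of the occurrences in the article text are capitalized.
    pattern = r"\b%s\b" % re.escape(word)
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= 0.75
```

Note that this sketch counts sentence-initial occurrences as capitalized, which inflates the ratio for common nouns; a more careful version would need to skip them.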

Example:

$e_i$: President_of_the_United_States

$t_k$: 11/2001

$s_j$: "George W. Bush"

$\xi_{i,j}$: $(e_i, s_j)$, or (President_of_the_United_States, "George W. Bush")

Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States
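Anchor-text extraction from piped wiki links can be sketched with a regular expression. The pattern below is a simplification that handles only `[[Target|anchor]]` links (optionally with a `#section` part) and ignores templates, nested markup, and unpiped links; the function name and the sample text are hypothetical.

```python
import re

# Matches [[Target|anchor text]] and [[Target#Section|anchor text]].
PIPED_LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]|]+)\]\]")


def extract_synonyms(wikitext, entities):
    """Collect anchor texts of links that point to known named-entity pages."""
    synonyms = {}
    for target, anchor in PIPED_LINK.findall(wikitext):
        entity = target.strip().replace(" ", "_")   # normalize to title form
        if entity in entities:
            synonyms.setdefault(entity, set()).add(anchor.strip())
    return synonyms


text = (
    "In 2009 [[President_of_the_United_States|Barack Obama]] met [[UNICEF]] staff "
    "and the [[President of the United States|President]] gave a speech."
)
found = extract_synonyms(text, {"President_of_the_United_States", "UNICEF"})
```

Links without a pipe, such as `[[UNICEF]]`, contribute nothing here because their anchor text is just the title itself.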

Output: Entity-synonym relationships and time periods

Named Entity             Synonym                     Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger   05/2005 - 03/2009
                         Joseph Ratzinger            05/2005 - 03/2009
                         Pope Benedict XVI           05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II     02/2007 - 03/2009
                         Sen. Barack Obama           07/2007 - 03/2009
                         Senator Barack Obama        05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton             08/2003 - 03/2009
                         Sen. Hillary Clinton        03/2007 - 03/2009
                         Senator Clinton             11/2007 - 03/2009

The time periods of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents
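One simple way to turn per-snapshot relationships into time periods like those in the table is to record the first and last snapshot in which each entity-synonym pair appears. This is an assumed reading of the accumulation step, not a confirmed description of the authors' procedure; the `YYYY-MM` keys and the sample data are hypothetical.

```python
def synonym_time_periods(snapshots):
    """Map each (entity, synonym) pair to its (first, last) snapshot label.

    snapshots maps a 'YYYY-MM' label to the set of pairs seen in that
    snapshot; 'YYYY-MM' strings sort chronologically.
    """
    periods = {}
    for month in sorted(snapshots):
        for pair in snapshots[month]:
            first, _ = periods.get(pair, (month, month))
            periods[pair] = (first, month)
    return periods


pope = "Pope Benedict XVI"
snapshots = {
    "2005-05": {(pope, "Cardinal Joseph Ratzinger")},
    "2005-06": {(pope, "Cardinal Joseph Ratzinger"), (pope, "Joseph Ratzinger")},
    "2009-03": {(pope, "Cardinal Joseph Ratzinger")},
}
periods = synonym_time_periods(snapshots)
```

Because the periods come from snapshot timestamps, a synonym that was in use long before Wikipedia existed still gets a start date no earlier than the first snapshot, which is the limitation the slide points out.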

Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods

  • 1.8M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg, KDD'2002]

  • Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
  • Output bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
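To give a feel for how bursty intervals and weights can be obtained, here is a simplified two-state sketch in the spirit of Kleinberg's batched burst model. It is not the implementation used in the talk: the function name, the per-batch input format (`relevant` and `total` document counts), and the sample data are assumptions, and the binomial coefficient is dropped from the cost because it is the same for both states.

```python
import math


def detect_bursts(relevant, total, s=2.0, gamma=1.0):
    """Return (start, end, weight) for each bursty interval of batches."""
    n = len(relevant)
    if n == 0 or sum(relevant) == 0:
        return []
    p0 = sum(relevant) / sum(total)          # base rate of occurrence
    probs = (p0, min(s * p0, 0.9999))        # bursty rate is s times the base rate

    def cost(state, r, d):
        p = probs[state]
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    enter = gamma * math.log(n)              # cost of moving up into the bursty state
    best, back = [0.0, math.inf], []         # the automaton starts in the base state
    for r, d in zip(relevant, total):        # Viterbi pass over the batches
        new, ptr = [], []
        for state in (0, 1):
            cands = [best[prev] + (enter if state > prev else 0.0) for prev in (0, 1)]
            prev = 0 if cands[0] <= cands[1] else 1
            new.append(cands[prev] + cost(state, r, d))
            ptr.append(prev)
        best = new
        back.append(ptr)
    state = 0 if best[0] <= best[1] else 1
    states = []
    for ptr in reversed(back):               # trace the cheapest state sequence
        states.append(state)
        state = ptr[state]
    states.reverse()

    bursts, start, weight = [], None, 0.0
    for t, st in enumerate(states):
        if st == 1:
            if start is None:
                start, weight = t, 0.0
            weight += cost(0, relevant[t], total[t]) - cost(1, relevant[t], total[t])
        elif start is not None:
            bursts.append((start, t - 1, weight))
            start = None
    if start is not None:
        bursts.append((start, n - 1, weight))
    return bursts


# Hypothetical monthly counts: documents mentioning a synonym vs. all documents.
bursts = detect_bursts([1, 1, 1, 10, 12, 11, 1, 1], [100] * 8)
```

The weight of an interval is the total cost saved by explaining its batches with the bursty state instead of the base state, so larger weights indicate more intense bursts.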

Output: Results from the burst-detection algorithm

Synonym            Entity                   Burst Weight   Start     End
President Reagan   Ronald Reagan            5506.858       01/1987   02/1989
President Ronald   Ronald Reagan            100.401        01/1989   03/1990
President Ronald   Ronald Reagan            67.208         07/1990   02/1993
Senator Clinton    Hillary Rodham Clinton   18.214         01/2001   10/2001
Senator Clinton    Hillary Rodham Clinton   17.732         05/2002   01/2003
Senator Clinton    Hillary Rodham Clinton   172.356        06/2003   11/2004

Classifying synonyms into two types

Definition

Class A: time-independent

  • Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)
  • E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"

Class B: time-dependent

  • Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered
  • E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Overview of experiments

Our experimental evaluation is divided into three main parts

1 Extracting and improving the accuracy of time of synonyms

2 Query expansion using time-independent synonyms

3 Query expansion using time-dependent synonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonymsData collection

The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)

New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007

ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725

Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1

Measurement Accuracy

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier ndash an open source search engine developed by University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting

Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight

qexp = qorg s1andw1 s2

andw2 skandwk

Measurement Mean Average Precision (MAP) R-precision and Recall

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-dependent synonymsData collection

NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications

Select 20 strongly time-dependent queries

Measurement Precision at 10 20 and 30 retrieved documents

Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period

American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method NE NE-Syn Avg Syn Accuracyper NE ()

BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73

BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2

BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo

Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Robust2004 query statistics

Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)

2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)

k = 2 if k gt 2 bring noise to the NERQ process

Number of queries using two different NERType MW-NERQ MRW-NERQ

Named entity 42 149Not named entity 208 101Total 250 250

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-independent synonyms

MAP, R-precision, and Recall (significance was marked at p < 0.05; the markers are not preserved here)

            MW-NERQ                        MRW-NERQ
Method      MAP     R-precision  Recall    MAP     R-precision  Recall
PM          0.2889  0.3309       0.6185    0.2455  0.2904       0.5629
PRF         0.3469  0.3711       0.6944    0.3002  0.3227       0.6761
SQE-PRF     0.3608  0.3652       0.7405    0.2507  0.2665       0.5932
SWQE-PRF    0.3653  0.3861       0.7388    0.2885  0.3080       0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
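
The three measures reported in the table can be illustrated for a single ranked list with the Python sketch below. It shows the usual textbook definitions of average precision (the per-query component of MAP), R-precision, and recall. The function names and the toy document identifiers are assumptions made for illustration; the slides do not say which evaluation tool produced the reported numbers.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks where relevant documents occur."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def recall(ranked, relevant):
    """Fraction of the relevant documents that appear in the ranked list."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked if doc in relevant) / len(relevant)


# Toy data (hypothetical): four retrieved documents, three relevant ones.
ranked_list = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d1", "d3", "d5"}
print(average_precision(ranked_list, relevant_docs))
print(r_precision(ranked_list, relevant_docs))
print(recall(ranked_list, relevant_docs))
```

MAP is then the mean of `average_precision` over all queries in the topic set.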

Query expansion using time-dependent synonyms

P@10, P@20, and P@30 (significance was marked at p < 0.05; the markers are not preserved here)

Method  P@10    P@20    P@30
TQ      0.1000  0.0500  0.0333
TSQ     0.5200  0.3800  0.2800

TQ: search with a Temporal Query, i.e., a keyword $w_q$ and time $t_q$

TSQ: search with a Temporal Query and expand with Synonyms with respect to time $t_q$
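
Precision at k is simple enough to state as code. The sketch below is a generic definition, not the authors' evaluation script, and the names `precision_at_k`, `ranked`, and `relevant` are illustrative assumptions. With a single relevant document at the top of a 30-document ranking it yields 0.1, 0.05, and about 0.0333 at cutoffs 10, 20, and 30.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_doc_ids[:k]
    return sum(1 for doc in top if doc in relevant_ids) / k


# Toy ranking (hypothetical): 30 documents, only the first one is relevant.
ranked = list(range(30))
relevant = {0}
for cutoff in (10, 20, 30):
    print(cutoff, precision_at_k(ranked, relevant, cutoff))
```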

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

  • Extract time-based synonyms from Wikipedia
  • Improve the time of synonyms using NYT
  • Perform query expansion using the time-based synonyms
  • Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work

  • Combine time-dependent synonyms and temporal language models to determine the time of queries
  • Exploit temporal information extraction techniques to discover synonyms at particular time points
  • Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

Recognizing named entities

Step 1: Partition Wikipedia according to the time granularity $g$ = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Example [Bunescu and Pasca, EACL'2006]

1) Multi-word titles in which all words are capitalized, e.g., President_of_the_United_States ⇒ named entity

2) Single-word titles with multiple capital letters, e.g., UNICEF and WHO are named entities

3) 75% of occurrences in the article text itself are capitalized
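
The three title heuristics can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation. The function name `is_named_entity_title`, the underscore-separated title format, the list of ignored function words (needed so that President_of_the_United_States passes rule 1), and reading 75% as a lower bound are all assumptions.

```python
import re

# Assumption: short function words are skipped when checking capitalization.
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "on", "at", "to", "a", "an"}


def is_named_entity_title(title, article_text=""):
    """Apply the three title heuristics to an underscore-separated title."""
    words = [w for w in title.split("_") if w]
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with more than one capital letter.
    if sum(1 for ch in word if ch.isupper()) > 1:
        return True
    # Rule 3: at least 75% of the occurrences in the article are capitalized.
    pattern = r"\b%s\b" % re.escape(word)
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= 0.75


print(is_named_entity_title("President_of_the_United_States"))
print(is_named_entity_title("UNICEF"))
print(is_named_entity_title("Norway", "Norway is a country. norway Norway Norway"))
```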

Example

$e_i$: President_of_the_United_States

$t_k$: 11/2001

$s_j$: "George W. Bush"

$\xi_{i,j} = (e_i, s_j)$, or (President_of_the_United_States, "George W. Bush")

Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$

Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States
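
Anchor-text extraction from piped wiki links can be sketched with a regular expression. The snippet below is an illustrative assumption about how Step 1 might be done on raw wikitext, not the authors' code. `PIPED_LINK` and `extract_synonyms` are hypothetical names, and real wikitext has more link forms (templates, nested markup) than this pattern handles.

```python
import re
from collections import defaultdict

# Matches piped wiki links of the form [[Target_page|anchor text]],
# optionally with a #section part after the target.
PIPED_LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]|]+)\]\]")


def extract_synonyms(wikitext, entities):
    """Collect anchor texts that link to any page named in `entities`."""
    synonyms = defaultdict(set)
    for target, anchor in PIPED_LINK.findall(wikitext):
        page = target.strip().replace(" ", "_")
        if page in entities:
            synonyms[page].add(anchor.strip())
    return dict(synonyms)


text = (
    "[[President_of_the_United_States|Barack Obama]] visited [[Norway]] "
    "and met [[UNICEF|the UN children's fund]]."
)
print(extract_synonyms(text, {"President_of_the_United_States"}))
```

Running the same function over every article in a snapshot and merging the results would give one synonym snapshot for that point in time.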

Output: entity-synonym relationships and time periods

Named Entity            Synonym                    Time Period
Pope Benedict XVI       Cardinal Joseph Ratzinger  05/2005 - 03/2009
Pope Benedict XVI       Joseph Ratzinger           05/2005 - 03/2009
Pope Benedict XVI       Pope Benedict XVI          05/2005 - 03/2009
Barack Obama            Barack Hussein Obama II    02/2007 - 03/2009
Barack Obama            Sen. Barack Obama          07/2007 - 03/2009
Barack Obama            Senator Barack Obama       05/2006 - 03/2009
Hillary Rodham Clinton  Hillary Clinton            08/2003 - 03/2009
Hillary Rodham Clinton  Sen. Hillary Clinton       03/2007 - 03/2009
Hillary Rodham Clinton  Senator Clinton            11/2007 - 03/2009

The times of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents
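
One way to obtain such time periods from monthly synonym snapshots is to record the first and last snapshot in which each pair appears. The sketch below assumes exactly that; the slides do not say how gaps between snapshots are treated, so the first-to-last rule, the function name `synonym_time_periods`, and the "YYYY-MM" keys are assumptions.

```python
def synonym_time_periods(snapshots):
    """Map each (entity, synonym) pair to its (first, last) snapshot time.

    `snapshots` maps a sortable time key such as "2005-05" to the set of
    (entity, synonym) pairs observed in that snapshot.
    """
    periods = {}
    for time_key in sorted(snapshots):
        for pair in snapshots[time_key]:
            first, _ = periods.get(pair, (time_key, time_key))
            periods[pair] = (first, time_key)
    return periods


# Toy snapshots (hypothetical, only three months shown).
pope = ("Pope Benedict XVI", "Joseph Ratzinger")
obama = ("Barack Obama", "Senator Barack Obama")
toy_snapshots = {
    "2005-05": {pope},
    "2006-05": {pope, obama},
    "2009-03": {pope, obama},
}
for pair, period in sorted(synonym_time_periods(toy_snapshots).items()):
    print(pair, period)
```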

Improving the accuracy of time using burst detection

  • Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time
  • 1.8M articles from January 1987 to June 2007 (20 years)
  • Use the burst detection algorithm [Kleinberg, KDD'2002]
  • Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
  • Output: bursty intervals and bursty weight, i.e., periods of occurrence and intensity
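
To make the idea of bursty intervals and weights concrete, here is a simplified two-state sketch in the spirit of Kleinberg's batched burst model. It is not the implementation used in the experiments. It assumes monthly batches with `relevant[t]` documents mentioning the synonym out of `totals[t]` documents, a bursty rate of `s` times the base rate, a cost of `gamma * ln(n)` for entering the bursty state, and a burst weight equal to the summed cost saving over the burst; all of these names and choices are assumptions of this sketch.

```python
import math


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return (start, end, weight) bursts from a two-state batched model."""
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)   # base rate of occurrence
    if not 0.0 < p0 < 1.0:
        return []
    p1 = min(s * p0, 0.9999)           # bursty rate, capped below 1

    def cost(p, r, d):
        # Negative log-likelihood of r hits among d documents at rate p.
        log_binom = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
        return -(log_binom + r * math.log(p) + (d - r) * math.log(1 - p))

    up = gamma * math.log(n)           # cost of moving into the bursty state
    rates = (p0, p1)
    best = [0.0, float("inf")]         # the sequence starts in the base state
    back = []
    for r, d in zip(relevant, totals):
        step, new_best = [], []
        for j in (0, 1):
            cands = [best[i] + (up if j > i else 0.0) for i in (0, 1)]
            i_min = 0 if cands[0] <= cands[1] else 1
            step.append(i_min)
            new_best.append(cands[i_min] + cost(rates[j], r, d))
        back.append(step)
        best = new_best

    # Backtrack the cheapest state sequence (Viterbi).
    state = 0 if best[0] <= best[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    # Group consecutive bursty batches; weight is the summed cost saving.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start, weight = t, 0.0
            while t < n and states[t] == 1:
                weight += cost(p0, relevant[t], totals[t]) - cost(p1, relevant[t], totals[t])
                t += 1
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts


# Toy stream (hypothetical): a spike in batches 3 to 5.
print(detect_bursts([1, 1, 1, 10, 12, 11, 1, 1, 1, 1], [100] * 10))
```

Each returned tuple gives the first and last batch index of a burst together with its weight, which plays the role of the "bursty interval" and "bursty weight" named above.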

Output: results from the burst-detection algorithm

Synonym           Entity                  Burst Weight  Time Start  Time End
President Reagan  Ronald Reagan           5506858       01/1987     02/1989
President Ronald  Ronald Reagan           100401        01/1989     03/1990
President Ronald  Ronald Reagan           67208         07/1990     02/1993
Senator Clinton   Hillary Rodham Clinton  18214         01/2001     10/2001
Senator Clinton   Hillary Rodham Clinton  17732         05/2002     01/2003
Senator Clinton   Hillary Rodham Clinton  172356        06/2003     11/2004

Burst weights are shown as extracted; their decimal points are not preserved

Classifying synonyms into two types

Definition

Class A: time-independent

  • Robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided)
  • E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"

Class B: time-dependent

  • Related to a particular time in the past, and good synonym candidates for a temporal search where changes in semantics must be considered
  • E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005

Ranking time-independent synonyms

Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$

  • $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs
  • $tf(s_j)$ is the averaged tf of $s_j$ over all time partitions $p_i$: $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
  • $\mu$ weights the importance of the temporal feature against the frequency feature
  • $\mu = 0.5$ yields the best performance in the experiments

Intuition: The model measures the popularity of synonyms based on two factors:

  • Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
  • High usage over time, i.e., a high value of averaged frequencies over time
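
The TIDP mixture can be computed directly from per-partition term frequencies. The sketch below works on raw counts; whether the original features were normalized is not stated on the slide, so the unnormalized form, the function name `tidp_scores`, and the toy counts are assumptions.

```python
def tidp_scores(tf_by_partition, mu=0.5):
    """Score synonyms by mu * pf + (1 - mu) * averaged tf.

    `tf_by_partition[s]` lists the term frequency of synonym s in each time
    partition where it occurs (partitions without s are simply absent).
    """
    scores = {}
    for synonym, tfs in tf_by_partition.items():
        pf = len(tfs)                          # partitions in which s occurs
        avg_tf = sum(tfs) / pf if pf else 0.0  # averaged tf over those partitions
        scores[synonym] = mu * pf + (1 - mu) * avg_tf
    return scores


# Toy counts (hypothetical): one steady synonym, one short-lived one.
toy = {"steady synonym": [2, 4, 6], "short-lived synonym": [10]}
for name, score in sorted(tidp_scores(toy).items()):
    print(name, score)
```

Setting `mu` closer to 1 favors synonyms that persist across many partitions, while a value closer to 0 favors synonyms that are used heavily wherever they occur.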

Ranking time-dependent synonyms

Definition: Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$TDP(s_j, t_k) = tf(s_j, t_k)$$

where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition

  • Only term frequencies are used to measure the importance of synonyms
  • Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
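
Ranking by TDP then reduces to sorting the synonyms observed at $t_k$ by their term frequency. The following sketch assumes a nested mapping from time key to synonym counts; the function name `top_k_time_dependent`, the data layout, and the toy counts are illustrative assumptions.

```python
def top_k_time_dependent(tf_at_time, t_k, k=3):
    """Rank synonyms by TDP(s, t_k) = tf(s, t_k) and return the top k."""
    freqs = tf_at_time.get(t_k, {})
    # Sort by descending frequency, breaking ties alphabetically.
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return [synonym for synonym, tf in ranked[:k] if tf > 0]


# Toy counts (hypothetical) for one time period.
toy_tf = {
    "2004": {"Cardinal Ratzinger": 12, "Joseph Ratzinger": 7, "Pope Benedict XVI": 0},
}
print(top_k_time_dependent(toy_tf, "2004", k=2))
```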

Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection

  • The whole history of English Wikipedia
    • All pages and revisions, 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
    • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
  • New York Times Annotated Corpus, containing over 1.8 million articles from January 1987 to June 2007

Tools

  • MWDumper: http://www.mediawiki.org/wiki/Mwdumper
  • Oracle Berkeley DB version 4.7.25
  • Burst detection algorithm implemented by Kleinberg
    • Number of states: 2
    • Ratio of rate of second state to base state: 2
    • Ratio of rate of each subsequent state to previous state: 2
    • Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection

  • TREC Robust Track (2004)
  • 250 topics (topics 301-450 and topics 601-700)

Tools

  • Terrier, an open source search engine developed by the University of Glasgow
  • BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
  • Expand with the top-k synonyms $\{s_1, \ldots, s_k\}$, plus TIDP scores as boosting weights:

$$q_{exp} = q_{org} \;\; s_1 \wedge w_1 \;\; s_2 \wedge w_2 \;\; \ldots \;\; s_k \wedge w_k$$

where $s_i \wedge w_i$ attaches the boosting weight $w_i$ to synonym $s_i$

Measurement: Mean Average Precision (MAP), R-precision, and Recall
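
Building the expanded query from scored synonyms can be sketched as a small string-assembly step. The `"phrase"^weight` output syntax used below is an illustrative assumption modeled on common boosted-query notations; it is not taken from the slides and should be checked against the query language of the search engine actually used. The function name `expand_query` and the toy scores are likewise hypothetical.

```python
def expand_query(original, scored_synonyms, k=2):
    """Append the k highest-scoring synonyms to the query with boost weights."""
    top = sorted(scored_synonyms.items(), key=lambda item: (-item[1], item[0]))[:k]
    boosted = ['"%s"^%.2f' % (synonym, weight) for synonym, weight in top]
    return " ".join([original] + boosted)


# Toy scores (hypothetical), e.g. produced by a TIDP-style ranking.
toy_scores = {
    "Senator Obama": 3.5,
    "Barack Hussein Obama II": 5.5,
    "Sen Obama": 1.0,
}
print(expand_query("barack obama", toy_scores, k=2))
```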

Query expansion using time-dependent synonyms

Data collection

  • NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
  • Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries

Temporal Query: Named Entity    Temporal Query: Time Period  Synonym
American Broadcasting Company   1995-2000                    Disney/ABC
Barack Obama                    2005-2007                    Senator Obama
Eminem                          1999-2004                    Slim Shady
George H. W. Bush               1988-1992                    President George H.W. Bush
George W. Bush                  2000-2007                    President George W. Bush
Hillary Rodham Clinton          2001-2007                    Senator Clinton
Kmart                           1987-1987                    Kresge
Pope Benedict XVI               1988-2005                    Cardinal Ratzinger
Ronald Reagan                   1987-1989                    Reagan Revolution
Virgin Media                    1999-2002                    Telewest Communications

Recognizing named entities

Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz

Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej

Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Recognizing named entities

Step 1: Partition Wikipedia according to the time granularity $g$ = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.

Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.

Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Example

$e_i$: President_of_the_United_States

$t_k$: 11/2001

$s_j$: "George W. Bush"

$\xi_{i,j}$: $(e_i, s_j)$, or (President_of_the_United_States, "George W. Bush")


Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States.
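The two extraction steps can be sketched in a few lines of Python. This is only an illustration, assuming that a snapshot is available as raw wikitext and that the set of named-entity titles is already known; the function name `extract_synonyms` and the regular expression are hypothetical and not taken from the talk.

```python
import re
from collections import defaultdict

# Matches piped wiki links of the form [[Target|anchor text]].
WIKI_LINK = re.compile(r"\[\[([^\[\]|#]+)\|([^\[\]|]+)\]\]")

def extract_synonyms(wikitext, entities):
    """Return {entity: set of anchor texts} for links that point to a known entity."""
    synonyms = defaultdict(set)
    for target, anchor in WIKI_LINK.findall(wikitext):
        # Wikipedia titles treat spaces and underscores as equivalent.
        entity = target.strip().replace(" ", "_")
        if entity in entities:
            synonyms[entity].add(anchor.strip())
    return dict(synonyms)

# Hypothetical snapshot text with one link to a known entity and one to an unknown one.
entities = {"President_of_the_United_States"}
text = ("In 2009 [[President_of_the_United_States|Barack Obama]] "
        "met [[Pope Benedict XVI|the Pope]].")
snapshot = extract_synonyms(text, entities)
```

Running the same function over every monthly snapshot would give one synonym snapshot per time point.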


Extracting synonyms

Output: Entity-synonym relationships and time periods

Named Entity              Synonym                      Time Period
Pope Benedict XVI         Cardinal Joseph Ratzinger    05/2005 - 03/2009
Pope Benedict XVI         Joseph Ratzinger             05/2005 - 03/2009
Pope Benedict XVI         Pope Benedict XVI            05/2005 - 03/2009
Barack Obama              Barack Hussein Obama II      02/2007 - 03/2009
Barack Obama              Sen. Barack Obama            07/2007 - 03/2009
Barack Obama              Senator Barack Obama         05/2006 - 03/2009
Hillary Rodham Clinton    Hillary Clinton              08/2003 - 03/2009
Hillary Rodham Clinton    Sen. Hillary Clinton         03/2007 - 03/2009
Hillary Rodham Clinton    Senator Clinton              11/2007 - 03/2009

The times of synonyms are the timestamps of the Wikipedia articles (spanning 8 years) in which they appear, not temporal expressions extracted from the contents.
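A time period like the ones in the table can be derived by recording the first and last snapshot in which an entity-synonym pair appears. The sketch below assumes snapshots keyed by (year, month) tuples; the data structure, the function name `synonym_periods` and the toy snapshot contents are illustrative assumptions, with only the period endpoints chosen to agree with the table.

```python
def synonym_periods(snapshots):
    """Map each (entity, synonym) pair to its (first, last) snapshot timestamp.

    snapshots: {(year, month): set of (entity, synonym) pairs}
    """
    periods = {}
    for stamp in sorted(snapshots):
        for pair in snapshots[stamp]:
            first, _ = periods.get(pair, (stamp, stamp))
            periods[pair] = (first, stamp)
    return periods

ratzinger = ("Pope Benedict XVI", "Joseph Ratzinger")
obama = ("Barack Obama", "Barack Hussein Obama II")

# Toy snapshots: only three time points instead of one per month.
snapshots = {
    (2005, 5): {ratzinger},
    (2007, 2): {ratzinger, obama},
    (2009, 3): {ratzinger, obama},
}
periods = synonym_periods(snapshots)
```

Because the periods come from article timestamps only, they cannot start before the first Wikipedia snapshot, which is the limitation the burst detection step addresses.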


Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods

1.8M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg in KDD'2002]

Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams

Output: bursty intervals and bursty weights, i.e., periods of occurrence and intensity
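To make the idea concrete, here is a simplified two-state, batched burst detector in the spirit of Kleinberg's algorithm. It is a sketch under stated assumptions, not the implementation used in the talk: documents are assumed to be pre-aggregated into time batches, the cost of entering the burst state is assumed to be gamma times ln(n), and the function name `detect_bursts` and the toy counts are hypothetical.

```python
import math

def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return (start, end, weight) bursts with inclusive batch indices.

    relevant[t]: documents mentioning the synonym in batch t
    totals[t]:   all documents in batch t
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)      # base rate of occurrence
    if p0 <= 0.0 or p0 >= 1.0:
        return []
    p1 = min(s * p0, 0.9999)              # elevated rate of the burst state
    up_cost = gamma * math.log(n)         # assumed cost of entering the burst state

    def fit_cost(p, r, d):
        # Negative log-likelihood of r hits in d documents at rate p; the
        # binomial coefficient is the same for both states and is omitted.
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    # Viterbi over two states (0 = normal, 1 = burst), starting in state 0.
    cost, back = [0.0, math.inf], []
    for r, d in zip(relevant, totals):
        new_cost, choice = [], []
        for state, p in enumerate((p0, p1)):
            moves = [cost[prev] + (up_cost if state > prev else 0.0) for prev in (0, 1)]
            best_prev = 0 if moves[0] <= moves[1] else 1
            new_cost.append(moves[best_prev] + fit_cost(p, r, d))
            choice.append(best_prev)
        cost = new_cost
        back.append(choice)

    # Trace back the cheapest state sequence.
    state, states = (0 if cost[0] <= cost[1] else 1), []
    for choice in reversed(back):
        states.append(state)
        state = choice[state]
    states.reverse()

    # Collect maximal runs of the burst state; weight = total cost saved.
    bursts, start, weight = [], None, 0.0
    for t, st in enumerate(states):
        if st == 1:
            if start is None:
                start, weight = t, 0.0
            weight += fit_cost(p0, relevant[t], totals[t]) - fit_cost(p1, relevant[t], totals[t])
        elif start is not None:
            bursts.append((start, t - 1, weight))
            start = None
    if start is not None:
        bursts.append((start, n - 1, weight))
    return bursts

# Toy stream: 8 monthly batches of 100 articles with a spike in batches 3 to 5.
bursts = detect_bursts([1, 1, 1, 10, 12, 11, 1, 1], [100] * 8)
```

The defaults `s=2.0` and `gamma=1.0` mirror the ratio and gamma settings reported for the experiments; each returned interval corresponds to one row of burst output (start, end, weight).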


Improving the accuracy of time using burst detection

Output: Results from the burst-detection algorithm

Synonym             Entity                    Burst Weight    Start      End
President Reagan    Ronald Reagan             5506858         01/1987    02/1989
President Ronald    Ronald Reagan             100401          01/1989    03/1990
President Ronald    Ronald Reagan             67208           07/1990    02/1993
Senator Clinton     Hillary Rodham Clinton    18214           01/2001    10/2001
Senator Clinton     Hillary Rodham Clinton    17732           05/2002    01/2003
Senator Clinton     Hillary Rodham Clinton    172356          06/2003    11/2004


Classifying synonyms into two types

Definition

Class A: time-independent

Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)

E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"

Class B: time-dependent

Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered

E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005


Ranking time-independent synonyms

Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$

$pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs.

$\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$.

$\mu$ weights the relative importance of the temporal feature and the frequency feature.

$\mu = 0.5$ yields the best performance in the experiments.

Intuition: The model measures the popularity of synonyms based on two factors:

Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is.

High usage over time, i.e., a high value of averaged frequencies over time.
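The TIDP score is straightforward to compute once per-partition term frequencies are available. The sketch below follows the formula literally; the input layout, the function name `tidp` and the toy frequencies are assumptions, and in practice the two features would probably need to be normalized to comparable ranges before mixing.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP score of one synonym from its term frequency in each time partition."""
    occurring = [tf for tf in tf_by_partition if tf > 0]
    pf = len(occurring)                  # number of partitions in which the synonym occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(occurring) / pf         # averaged tf over those partitions
    return mu * pf + (1 - mu) * avg_tf

# Hypothetical frequencies over four time partitions.
scores = {
    "Barack Hussein Obama II": tidp([3, 4, 5, 4]),   # occurs in every partition
    "Senator Obama": tidp([0, 0, 2, 0]),             # occurs in a single partition
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

With `mu=0.5` both factors count equally, so a synonym that appears in all partitions outranks one that appears only briefly.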


Ranking time-dependent synonyms

Definition: Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$TDP(s_j, t_k) = tf(s_j, t_k)$$

$tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.

Intuition: Only term frequencies are used to measure the importance of synonyms.

Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest.
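Ranking by TDP amounts to sorting the synonyms observed at $t_k$ by their term frequency at that time. A minimal sketch, assuming frequencies are stored per (synonym, year) pair; the function name `rank_time_dependent` and the numbers are made up for illustration.

```python
def rank_time_dependent(tf, t_k, top_k=3):
    """Rank synonyms by TDP(s_j, t_k) = tf(s_j, t_k).

    tf: {(synonym, time): term frequency}
    """
    at_tk = {syn: freq for (syn, t), freq in tf.items() if t == t_k and freq > 0}
    return sorted(at_tk, key=at_tk.get, reverse=True)[:top_k]

# Hypothetical term frequencies for synonyms of one entity.
tf = {
    ("Cardinal Ratzinger", 2004): 12,
    ("Joseph Ratzinger", 2004): 7,
    ("Pope Benedict XVI", 2004): 0,
    ("Pope Benedict XVI", 2006): 30,
}
ranking_2004 = rank_time_dependent(tf, 2004)
```

Synonyms that are frequent only at other times, such as "Pope Benedict XVI" in 2006 here, do not influence the ranking for 2004.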


Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms


Extracting and improving time of synonyms

Data collection

  • The whole history of English Wikipedia
    • All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
    • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
  • New York Times Annotated Corpus, containing over 1.8 million articles from January 1987 to June 2007

Tools

  • MWDumper: http://www.mediawiki.org/wiki/Mwdumper
  • Oracle Berkeley DB version 4.7.25
  • Burst detection algorithm implemented by Kleinberg
    • Number of states: 2
    • Ratio of rate of second state to base state: 2
    • Ratio of rate of each subsequent state to previous state: 2
    • Gamma parameter of the HMM: 1

Measurement: Accuracy


Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier, an open source search engine developed by the University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

Expand the query with the top-k synonyms $\{s_1, \ldots, s_k\}$, using their TIDP scores as boosting weights:

$q_{exp} = q_{org}\ s_1^{\wedge}w_1\ s_2^{\wedge}w_2 \ldots s_k^{\wedge}w_k$

Measurement: Mean Average Precision (MAP), R-precision and Recall
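The expanded query can be assembled as a plain string in which every synonym term carries its weight after a caret, mirroring the notation above. This is a sketch only: the function name `expand_query`, the scores and the choice to boost each word of a multi-word synonym separately are assumptions, and the exact boosting syntax accepted by a given search engine should be checked against its documentation.

```python
def expand_query(q_org, tidp_scores, k=2):
    """Append the top-k synonyms to the original query as term^weight tokens."""
    top = sorted(tidp_scores.items(), key=lambda item: item[1], reverse=True)[:k]
    boosted = [
        f"{term}^{weight:.2f}"
        for synonym, weight in top
        for term in synonym.split()      # boost each word of a multi-word synonym
    ]
    return " ".join([q_org] + boosted)

# Hypothetical TIDP scores for synonyms of the query entity.
q_exp = expand_query(
    "hillary clinton",
    {"Senator Clinton": 0.8, "Hillary Rodham Clinton": 0.9, "HRC": 0.1},
)
```

With `k=2` the lowest-scoring candidate is dropped, so only the two strongest synonyms contribute boosted terms.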


Query expansion using time-dependent synonyms

Data collection

NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications

Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries

Named Entity                     Time Period    Synonym
American Broadcasting Company    1995-2000      Disney/ABC
Barack Obama                     2005-2007      Senator Obama
Eminem                           1999-2004      Slim Shady
George H. W. Bush                1988-1992      President George H.W. Bush
George W. Bush                   2000-2007      President George W. Bush
Hillary Rodham Clinton           2001-2007      Senator Clinton
Kmart                            1987-1987      Kresge
Pope Benedict XVI                1988-2005      Cardinal Ratzinger
Ronald Reagan                    1987-1989      Reagan Revolution
Virgin Media                     1999-2002      Telewest Communications


Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method    #NE          #NE-Syn      Avg. #Syn per NE    Accuracy (%)
BPF-NERW      2,574,319    3,199,115    1.2                 51
BPCF-NERW     473,829      488,383      1.0                 73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of "people", "organization" or "company"

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods


Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)

k = 2 is used, because k > 2 brings noise to the NERQ process

Number of queries using the two different NER methods

Type                MW-NERQ    MRW-NERQ
Named entity        42         149
Not named entity    208        101
Total               250        250


Query expansion using time-independent synonyms

MAP, R-precision and Recall (statistical significance tested at p < 0.05)

            MW-NERQ                           MRW-NERQ
Method      MAP       R-precision  Recall     MAP       R-precision  Recall
PM          0.2889    0.3309       0.6185     0.2455    0.2904       0.5629
PRF         0.3469    0.3711       0.6944     0.3002    0.3227       0.6761
SQE-PRF     0.3608    0.3652       0.7405     0.2507    0.2665       0.5932
SWQE-PRF    0.3653    0.3861       0.7388     0.2885    0.3080       0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1


Query expansion using time-dependent synonyms

P@10, P@20 and P@30 (statistical significance tested at p < 0.05)

Method    P@10      P@20      P@30
TQ        0.1000    0.0500    0.0333
TSQ       0.5200    0.3800    0.2800

TQ: search a Temporal Query, i.e., a keyword $w_q$ and time $t_q$

TSQ: search a Temporal Query and expand with Synonyms w.r.t. time $t_q$
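For reference, precision at k is the fraction of the top-k retrieved documents that are relevant. The helper below is a generic sketch with hypothetical document identifiers, not the evaluation script behind the reported numbers.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    top = ranked_doc_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k

# Hypothetical ranking and relevance judgments for one temporal query.
ranked = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5", "d6", "d0"]
relevant = {"d3", "d1", "d4", "d5", "d6"}
p_at_5 = precision_at_k(ranked, relevant, 5)
p_at_10 = precision_at_k(ranked, relevant, 10)
```

Averaging this value over all 20 temporal queries gives figures comparable to the P@10, P@20 and P@30 columns.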


QUEST Query Expansion using Synonyms over Time




IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Recognizing named entities

Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz

Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej

Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm

Example

ei President_of_the_United_States

tk 112001

sj ldquoGeorge W Bushrdquo

ξi (ei sj ) or(President_of_the_United_StatesldquoGeorge W Bushrdquo)

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links

Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm

Example[[President_of_the_United_

States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_

States

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links

Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm

Example[[President_of_the_United_

States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_

States

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

OutputEntity-synonym relationships and time periods

Named Entity Synonym Time Period

Pope Benedict XVICardinal Joseph Ratzinger 052005 - 032009Joseph Ratzinger 052005 - 032009Pope Benedict XVI 052005 - 032009

Barack ObamaBarack Hussein Obama II 022007 - 032009Sen Barack Obama 072007 - 032009Senator Barack Obama 052006 - 032009

Hillary Rodham ClintonHillary Clinton 082003 - 032009Sen Hillary Clinton 032007 - 032009Senator Clinton 112007 - 032009

The time of synonyms are timestamps of Wikipedia articles (8 years) in which they

appear not temporal expression extracted from the contents

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Improving the accuracy of time using burst detection

Analyze the New York Time Annotated Corpus (NYT) to discovermore accurate time

18M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg in KDDrsquo2002]

Generate bursty periods of ξij by computing a rate of occurrencefrom document streams

Output bursty intervals and bursty weight ie periods ofoccurrence and intensity

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Improving the accuracy of time using burst detection

OutputResults from burst-detection algorithm

Synonym Entity Burst Weight TimeStart End

President Reagan Ronald Reagan 5506858 011987 021989President Ronald Ronald Reagan 100401 011989 031990President Ronald Ronald Reagan 67208 071990 021993

Senator Clinton Hillary Rodham Clinton 18214 012001 102001Senator Clinton Hillary Rodham Clinton 17732 052002 012003Senator Clinton Hillary Rodham Clinton 172356 062003 112004

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Classifying synonyms into two types

DefinitionClass A time-independent

Robust to change over time and good synonym candidates for an ordinarysearch (no temporal criteria provided)

Eg ldquoBarack Hussein Obama IIrdquo is a time-independent synonym of ldquoBarackObamardquo

Class B time-dependent

Related to particular time in the past and good synonym candidates for atemporal search where changes in semantics must be considered

Eg ldquoCardinal Joseph Ratzingerrdquo is a time-dependent synonym of ldquoPopeBenedict XVIrdquo before 2005

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-independent synonyms

DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature

TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )

pf (sj ) is the time partition frequency in which sj occurs

tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum

i tf (sj pi )

pf (sj )

micro underlines the importance of a temporal feature and a frequency feature

micro = 05 yields the best performance in the experiments

IntuitionThe model measures popularity of synonyms based on two factors

Robustness to change over time ie the more partitions synonyms occur themore robust to time they are

High usages over time ie a high value of averaged frequencies over time

Ranking time-dependent synonyms

Definition
Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$TDP(s_j, t_k) = tf(s_j, t_k)$

$tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.

Intuition
Only term frequencies are used to measure the importance of synonyms.

Time partitions are not considered because only synonyms in the particular time period $t_k$ are of interest.
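
A minimal sketch of ranking by TDP, assuming term frequencies are stored per synonym and per time period. The function name `rank_time_dependent`, the nested-dictionary layout and the counts are hypothetical and serve only to illustrate the definition above.

```python
def rank_time_dependent(tf_by_synonym_and_time, t_k, top_k=None):
    """Rank synonyms for period t_k by TDP(s_j, t_k) = tf(s_j, t_k)."""
    scored = [
        (synonym, tf_by_time.get(t_k, 0))
        for synonym, tf_by_time in tf_by_synonym_and_time.items()
    ]
    # Keep only synonyms that actually occur in t_k
    scored = [(s, tf) for s, tf in scored if tf > 0]
    # Highest term frequency first; ties broken alphabetically
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored if top_k is None else scored[:top_k]


# Made-up counts, for illustration only
frequencies = {
    "Cardinal Ratzinger": {"2004": 30, "2006": 2},
    "Joseph Ratzinger": {"2004": 12, "2006": 5},
    "Pope Benedict XVI": {"2006": 40},
}
ranking_2004 = rank_time_dependent(frequencies, "2004")
```

Because only the period of interest is consulted, a synonym that dominates other periods has no influence on the ranking for `t_k`.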

Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting synonyms and improving the accuracy of their time periods

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection

The whole history of English Wikipedia
  All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
  4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)

New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007

Tools
  MWDumper: http://www.mediawiki.org/wiki/Mwdumper
  Oracle Berkeley DB version 4.7.25
  Burst detection algorithm as implemented by Kleinberg
    Number of states: 2
    Ratio of rate of second state to base state: 2
    Ratio of rate of each subsequent state to previous state: 2
    Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection
  TREC Robust Track (2004)
  250 topics (topics 301-450 and topics 601-700)

Tools
  Terrier, an open source search engine developed by the University of Glasgow
  BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

Expand the query with the top-k synonyms s_1, ..., s_k, using their TIDP scores as boosting weights:

q_exp = q_org s_1^w_1 s_2^w_2 ... s_k^w_k

Measurement: Mean Average Precision (MAP), R-precision and Recall
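
The expansion step can be illustrated with a small query builder. This is a sketch under assumptions: the function name `expand_query` is hypothetical, the `term^weight` form mirrors the boosting notation on the slide, and quoting multi-word synonyms as phrases is a guess that should be checked against the query syntax of the search engine in use.

```python
def expand_query(original_query, weighted_synonyms, k=2):
    """Append the top-k synonyms, each boosted by its weight."""
    # Highest weight first; ties broken alphabetically for determinism
    top = sorted(weighted_synonyms.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [original_query]
    for synonym, weight in top:
        # Assumption: multi-word synonyms are quoted so the boost
        # applies to the whole phrase
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)


# Made-up TIDP scores, for illustration only
weights = {
    "Barack Hussein Obama II": 3.5,
    "Senator Obama": 2.0,
    "Obama": 1.0,
}
expanded = expand_query("barack obama", weights, k=2)
```

Limiting the expansion to the top-k synonyms keeps low-scoring variants from diluting the original query.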

Query expansion using time-dependent synonyms

Data collection
  NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
  Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries

Named Entity                   Time Period  Synonym
American Broadcasting Company  1995-2000    Disney/ABC
Barack Obama                   2005-2007    Senator Obama
Eminem                         1999-2004    Slim Shady
George H. W. Bush              1988-1992    President George H.W. Bush
George W. Bush                 2000-2007    President George W. Bush
Hillary Rodham Clinton         2001-2007    Senator Clinton
Kmart                          1987-1987    Kresge
Pope Benedict XVI              1988-2005    Cardinal Ratzinger
Ronald Reagan                  1987-1989    Reagan Revolution
Virgin Media                   1999-2002    Telewest Communications

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method  NE         NE-Syn     Avg. Syn per NE  Accuracy (%)
BPF-NERW    2,574,319  3,199,115  1.2              51
BPCF-NERW   473,829    488,383    1.0              73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of "people", "organization" or "company"

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods

Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
   k = 2, because k > 2 brings noise into the NERQ process

Number of queries using the two different NER methods

Type              MW-NERQ  MRW-NERQ
Named entity      42       149
Not named entity  208      101
Total             250      250

Query expansion using time-independent synonyms

MAP, R-precision and Recall (statistical significance tested at p < 0.05)

Method     MW-NERQ                        MRW-NERQ
           MAP     R-precision  Recall    MAP     R-precision  Recall
PM         0.2889  0.3309       0.6185    0.2455  0.2904       0.5629
PRF        0.3469  0.3711       0.6944    0.3002  0.3227       0.6761
SQE-PRF    0.3608  0.3652       0.7405    0.2507  0.2665       0.5932
SWQE-PRF   0.3653  0.3861       0.7388    0.2885  0.3080       0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e. Bose-Einstein 1

Query expansion using time-dependent synonyms

P@10, P@20 and P@30 (statistical significance tested at p < 0.05)

Method  P@10    P@20    P@30
TQ      0.1000  0.0500  0.0333
TSQ     0.5200  0.3800  0.2800

TQ: search with a Temporal Query, i.e. a keyword w_q and a time t_q
TSQ: search with a Temporal Query, expanded with Synonyms with respect to time t_q
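
For reference, precision at k is the fraction of the top-k retrieved documents that are relevant. The sketch below uses a hypothetical function name and made-up document identifiers; it is not the evaluation code used in the experiments.

```python
def precision_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """Fraction of the first k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant_doc_ids)
    # Divide by k even if fewer than k documents were retrieved
    return hits / k


# Made-up ranking: 2 of the first 10 documents are relevant
ranked = [f"d{i}" for i in range(1, 31)]
relevant = {"d2", "d5", "d11"}
p10 = precision_at_k(ranked, relevant, 10)
p20 = precision_at_k(ranked, relevant, 20)
```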

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

Extract time-based synonyms from Wikipedia
Improve time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing significant increase in retrieval effectiveness

Future work
  Combine time-dependent synonyms and temporal language models to determine time of queries
  Exploit temporal information extraction techniques to discover synonyms at particular time points
  Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e. a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Example
[[President_of_the_United_States|Barack Obama]]: "Barack Obama" is the anchor text linking to the article President_of_the_United_States.
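
Step 1 can be sketched with a regular expression over wiki markup. This is a simplified illustration, not the extraction pipeline used in the talk: the function name `extract_synonyms`, the pattern and the sample text are assumptions, and real wikitext has cases (templates, nested links, namespaces) that this pattern ignores.

```python
import re
from collections import defaultdict

# Matches [[Target|anchor text]]; an optional #section in the target is
# dropped. Links without a pipe are skipped because their anchor text
# equals the title and adds no new synonym.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]|]+)\]\]")


def extract_synonyms(wikitext, entities):
    """Collect anchor texts that link to any of the given entity titles."""
    synonyms = defaultdict(set)
    for target, anchor in WIKILINK.findall(wikitext):
        # Normalize the title the way the slide example writes it
        title = target.strip().replace(" ", "_")
        if title in entities:
            synonyms[title].add(anchor.strip())
    return dict(synonyms)


sample = (
    "[[President_of_the_United_States|Barack Obama]] met "
    "[[Pope Benedict XVI|Cardinal Ratzinger]] in [[Norway]]."
)
found = extract_synonyms(
    sample, {"President_of_the_United_States", "Pope_Benedict_XVI"}
)
```

Running this over every article in a snapshot and merging the results would give one synonym snapshot per time point, as described in Step 2.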

Extracting synonyms

Output
Entity-synonym relationships and time periods

Named Entity            Synonym                    Time Period
Pope Benedict XVI       Cardinal Joseph Ratzinger  05/2005 - 03/2009
                        Joseph Ratzinger           05/2005 - 03/2009
                        Pope Benedict XVI          05/2005 - 03/2009
Barack Obama            Barack Hussein Obama II    02/2007 - 03/2009
                        Sen. Barack Obama          07/2007 - 03/2009
                        Senator Barack Obama       05/2006 - 03/2009
Hillary Rodham Clinton  Hillary Clinton            08/2003 - 03/2009
                        Sen. Hillary Clinton       03/2007 - 03/2009
                        Senator Clinton            11/2007 - 03/2009

The times of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents.
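
One way to turn per-snapshot relationships into time periods like those in the table is to record the first and last snapshot in which each entity-synonym pair appears. The slides do not spell out this step, so the sketch below is an assumption; the function name `synonym_periods`, the timestamp format and the data are hypothetical, and gaps between snapshots are ignored.

```python
def synonym_periods(snapshots):
    """Map each (entity, synonym) pair to its (first, last) snapshot.

    snapshots maps a sortable timestamp string (e.g. "2006-05") to the
    set of (entity, synonym) pairs found in that snapshot.
    """
    periods = {}
    for timestamp in sorted(snapshots):
        for pair in snapshots[timestamp]:
            # Keep the earliest timestamp seen, extend the latest
            first, _ = periods.get(pair, (timestamp, timestamp))
            periods[pair] = (first, timestamp)
    return periods


# Made-up snapshots, for illustration only
senator = ("Barack Obama", "Senator Barack Obama")
sen = ("Barack Obama", "Sen. Barack Obama")
periods = synonym_periods({
    "2007-07": {senator, sen},
    "2006-05": {senator},
    "2009-03": {senator, sen},
})
```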

Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
  1.8M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg in KDD'2002]
  Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
  Output bursty intervals and bursty weights, i.e. periods of occurrence and their intensity

Improving the accuracy of time using burst detection

Output
Results from the burst-detection algorithm

Synonym           Entity                  Burst Weight  Start    End
President Reagan  Ronald Reagan           5506858       01/1987  02/1989
President Ronald  Ronald Reagan           100401        01/1989  03/1990
President Ronald  Ronald Reagan           67208         07/1990  02/1993
Senator Clinton   Hillary Rodham Clinton  18214         01/2001  10/2001
Senator Clinton   Hillary Rodham Clinton  17732         05/2002  01/2003
Senator Clinton   Hillary Rodham Clinton  172356        06/2003  11/2004

Classifying synonyms into two types

Definition

Class A: time-independent
  Robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided)
  E.g. "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"

Class B: time-dependent
  Related to a particular time in the past, and good synonym candidates for a temporal search where changes in semantics must be considered
  E.g. "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-independent synonyms

DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature

TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )

pf (sj ) is the time partition frequency in which sj occurs

tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum

i tf (sj pi )

pf (sj )

micro underlines the importance of a temporal feature and a frequency feature

micro = 05 yields the best performance in the experiments

IntuitionThe model measures popularity of synonyms based on two factors

Robustness to change over time ie the more partitions synonyms occur themore robust to time they are

High usages over time ie a high value of averaged frequencies over time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-independent synonyms

DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature

TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )

pf (sj ) is the time partition frequency in which sj occurs

tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum

i tf (sj pi )

pf (sj )

micro underlines the importance of a temporal feature and a frequency feature

micro = 05 yields the best performance in the experiments

IntuitionThe model measures popularity of synonyms based on two factors

Robustness to change over time ie the more partitions synonyms occur themore robust to time they are

High usages over time ie a high value of averaged frequencies over time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-dependent synonyms

DefinitionGiven time tk time-dependent synonyms at tk are weighted by

TDP(sj tk ) = tf (sj tk )

tf (sj tk ) is a term frequency of sj at tk

IntuitionOnly term frequencies will be used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-dependent synonyms

DefinitionGiven time tk time-dependent synonyms at tk are weighted by

TDP(sj tk ) = tf (sj tk )

tf (sj tk ) is a term frequency of sj at tk

IntuitionOnly term frequencies will be used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Overview of experiments

Our experimental evaluation is divided into three main parts

1 Extracting and improving the accuracy of time of synonyms

2 Query expansion using time-independent synonyms

3 Query expansion using time-dependent synonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonymsData collection

The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)

New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007

ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725

Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1

Measurement Accuracy

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier ndash an open source search engine developed by University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting

Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight

qexp = qorg s1andw1 s2

andw2 skandwk

Measurement Mean Average Precision (MAP) R-precision and Recall

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-dependent synonymsData collection

NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications

Select 20 strongly time-dependent queries

Measurement Precision at 10 20 and 30 retrieved documents

Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period

American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method NE NE-Syn Avg Syn Accuracyper NE ()

BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73

BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2

BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo

Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Robust2004 query statistics

Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)

2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)

k = 2 if k gt 2 bring noise to the NERQ process

Number of queries using two different NERType MW-NERQ MRW-NERQ

Named entity 42 149Not named entity 208 101Total 250 250

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-independent synonyms

MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ

MAP R-precision Recall MAP R-precision Recall

PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504

PM Probabilistic Model without query expansion

PRF Pseudo Relevance Feedback using Rocchio algorithm

SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback

SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback

Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-dependent synonyms

P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800

TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

QUEST Query Expansion using Synonyms over Time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms


QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you




Extracting synonyms

Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.

Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.

Example: [[President_of_the_United_States|Barack Obama]]
"Barack Obama" is the anchor text linking to the article President_of_the_United_States.
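The two steps above can be sketched roughly as follows. This is only an illustration, not the authors' extraction pipeline: the regular expression, the function names `extract_synonyms` and `build_snapshot`, and the sample sentence are assumptions, and only piped links of the form `[[Target|anchor]]` are handled.

```python
import re
from collections import defaultdict

# Matches piped wiki links of the form [[Target|anchor text]].
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")

def extract_synonyms(wikitext):
    """Step 1: map each link target (entity) to the anchor texts pointing at it."""
    synonyms = defaultdict(set)
    for target, anchor in PIPED_LINK.findall(wikitext):
        entity = target.strip().replace("_", " ")
        synonyms[entity].add(anchor.strip())
    return dict(synonyms)

def build_snapshot(pages):
    """Step 2: accumulate entity-synonym relationships over all pages of one snapshot."""
    snapshot = defaultdict(set)
    for text in pages:
        for entity, anchors in extract_synonyms(text).items():
            snapshot[entity] |= anchors
    return dict(snapshot)

# Hypothetical page text containing the link from the example above.
sample = "In 2009, [[President_of_the_United_States|Barack Obama]] took office."
print(extract_synonyms(sample))
```

Running `build_snapshot` once per Wikipedia snapshot would give one synonym set per time point, which is the input the later time-period assignment needs.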


Extracting synonyms

Output: entity-synonym relationships and time periods

Named Entity             Synonym                     Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger   05/2005 - 03/2009
                         Joseph Ratzinger            05/2005 - 03/2009
                         Pope Benedict XVI           05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II     02/2007 - 03/2009
                         Sen. Barack Obama           07/2007 - 03/2009
                         Senator Barack Obama        05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton             08/2003 - 03/2009
                         Sen. Hillary Clinton        03/2007 - 03/2009
                         Senator Clinton             11/2007 - 03/2009

The times of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents.
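A table of this shape lends itself to a simple time-scoped lookup. The sketch below is an assumption about how such rows might be queried, not something shown in the talk: the row subset, the name `synonyms_at`, and the choice to represent each month by its first day are all illustrative.

```python
from datetime import date

# (entity, synonym, start, end) rows copied from the table above;
# each month is represented by its first day (an assumption of this sketch).
RELATIONSHIPS = [
    ("Pope Benedict XVI", "Cardinal Joseph Ratzinger", date(2005, 5, 1), date(2009, 3, 1)),
    ("Barack Obama", "Senator Barack Obama", date(2006, 5, 1), date(2009, 3, 1)),
    ("Barack Obama", "Barack Hussein Obama II", date(2007, 2, 1), date(2009, 3, 1)),
    ("Hillary Rodham Clinton", "Senator Clinton", date(2007, 11, 1), date(2009, 3, 1)),
]

def synonyms_at(entity, when, relationships=RELATIONSHIPS):
    """Return the synonyms of `entity` whose time period contains `when`."""
    return sorted(
        synonym
        for name, synonym, start, end in relationships
        if name == entity and start <= when <= end
    )

print(synonyms_at("Barack Obama", date(2006, 12, 1)))
print(synonyms_at("Barack Obama", date(2008, 1, 1)))
```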


Improving the accuracy of time using burst detection

  • Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time
      • 1.8M articles from January 1987 to June 2007 (20 years)
  • Use the burst detection algorithm [Kleinberg, KDD'2002]
      • Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
      • Output: bursty intervals and bursty weight, i.e., periods of occurrence and intensity
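To make the idea of bursty intervals and weights concrete, here is a simplified two-state sketch in the spirit of Kleinberg's batched model. It is not the implementation used in the talk. The function names, the monthly counts, and the transition cost `gamma * ln(n)` are assumptions of this sketch; the defaults `s=2` and `gamma=1` are chosen for illustration.

```python
import math

def fit_cost(p, r, d):
    """Negative log-likelihood of r relevant documents among d (binomial)."""
    return -(math.log(math.comb(d, r)) + r * math.log(p) + (d - r) * math.log(1 - p))

def burst_states(relevant, totals, s=2.0, gamma=1.0):
    """Label each batch 0 (base rate) or 1 (bursty) with a two-state Viterbi pass."""
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)   # base rate; assumed to lie strictly in (0, 1)
    p1 = min(0.9999, s * p0)           # bursty rate: s times the base rate
    up_cost = gamma * math.log(n)      # assumed price for entering the bursty state
    rates = (p0, p1)
    cost = [0.0, math.inf]             # the automaton starts in the base state
    back = []
    for r, d in zip(relevant, totals):
        step, prev = [], []
        for j in (0, 1):
            # Moving up (0 -> 1) is charged; staying or moving down is free.
            options = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if options[0] <= options[1] else 1
            prev.append(best)
            step.append(options[best] + fit_cost(rates[j], r, d))
        cost = step
        back.append(prev)
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for prev in reversed(back[1:]):    # walk the back-pointers to recover the path
        state = prev[state]
        states.append(state)
    return states[::-1], rates

def burst_intervals(relevant, totals, s=2.0, gamma=1.0):
    """Return (start_index, end_index, weight) for each maximal run of state 1."""
    states, (p0, p1) = burst_states(relevant, totals, s, gamma)
    intervals, start, weight = [], None, 0.0
    for t, state in enumerate(states):
        if state == 1:
            start = t if start is None else start
            # Weight: how much better the bursty state fits than the base state.
            weight += fit_cost(p0, relevant[t], totals[t]) - fit_cost(p1, relevant[t], totals[t])
        elif start is not None:
            intervals.append((start, t - 1, weight))
            start, weight = None, 0.0
    if start is not None:
        intervals.append((start, len(states) - 1, weight))
    return intervals

# Hypothetical monthly counts: documents mentioning a synonym vs. all documents.
mentions = [1, 1, 1, 8, 9, 8, 1, 1]
all_docs = [100] * 8
print(burst_intervals(mentions, all_docs))
```

Each returned interval pairs a period of elevated occurrence with a weight, which corresponds to the "periods of occurrence and intensity" named on the slide.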


Improving the accuracy of time using burst detection

Output: results from the burst-detection algorithm

Synonym            Entity                   Burst Weight   Start     End
President Reagan   Ronald Reagan            5506858        01/1987   02/1989
President Ronald   Ronald Reagan            100401         01/1989   03/1990
President Ronald   Ronald Reagan            67208          07/1990   02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001   10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002   01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003   11/2004


Classifying synonyms into two types

Definition

Class A: time-independent
  • Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)
  • E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"

Class B: time-dependent
  • Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered
  • E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005


Ranking time-independent synonyms

Definition
Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$

  • $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs
  • $tf(s_j)$ is the averaged tf of $s_j$ in all time partitions: $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
  • $\mu$ sets the relative importance of the temporal feature and the frequency feature
  • $\mu = 0.5$ yields the best performance in the experiments

Intuition
The model measures the popularity of synonyms based on two factors:
  • Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
  • High usage over time, i.e., a high value of averaged frequencies over time
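As a worked illustration of the mixture above, the following sketch computes TIDP from per-partition term frequencies. The counts are invented, the function name `tidp` is hypothetical, and the raw values are used as-is; whether the original system normalizes $pf$ and $tf$ before mixing is not stated on the slide.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * tf(s_j) for a single synonym.

    tf_by_partition holds tf(s_j, p_i) for every time partition p_i.
    """
    occurring = [tf for tf in tf_by_partition if tf > 0]
    pf = len(occurring)              # partitions in which the synonym occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(occurring) / pf     # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1 - mu) * avg_tf

# Hypothetical frequencies over four time partitions.
counts = {
    "Barack Hussein Obama II": [3, 4, 5, 4],  # steady usage in every partition
    "Senator Obama": [0, 6, 0, 0],            # used in a single partition only
}
ranked = sorted(counts, key=lambda s: tidp(counts[s]), reverse=True)
print([(s, tidp(counts[s])) for s in ranked])
```

With these made-up numbers the steadily used synonym scores 4.0 and the single-partition one 3.5, which matches the intuition that robustness over time is rewarded alongside frequency.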


Ranking time-dependent synonyms

Definition
Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$

  • $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition
  • Only term frequencies are used to measure the importance of synonyms
  • Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
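Because TDP reduces to a term frequency at one time point, ranking amounts to a sort. The sketch below assumes the frequencies for the chosen time $t_k$ are already available as a dictionary; the counts and the function name `rank_time_dependent` are hypothetical.

```python
def rank_time_dependent(tf_at_time, k=2):
    """Return the top-k synonyms by TDP(s_j, t_k) = tf(s_j, t_k).

    Ties are broken alphabetically so the ordering is deterministic.
    """
    ordered = sorted(tf_at_time.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:k]

# Hypothetical term frequencies of synonyms of "Pope Benedict XVI" at some t_k.
tf_at_tk = {
    "Cardinal Ratzinger": 12,
    "Joseph Ratzinger": 7,
    "Cardinal Joseph Ratzinger": 3,
}
print(rank_time_dependent(tf_at_tk))
```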


Overview of experiments

Our experimental evaluation is divided into three main parts:

  1. Extracting and improving the accuracy of time of synonyms
  2. Query expansion using time-independent synonyms
  3. Query expansion using time-dependent synonyms


Extracting and improving time of synonyms

Data collection
  • The whole history of English Wikipedia
      • All pages and revisions, 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 28 Terabytes
      • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
  • The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools
  • MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
  • Oracle Berkeley DB version 4.7.25
  • Burst detection algorithm implemented by Kleinberg
      • Number of states: 2
      • Ratio of rate of second state to base state: 2
      • Ratio of rate of each subsequent state to previous state: 2
      • Gamma parameter of the HMM: 1

Measurement: Accuracy


Query expansion using time-independent synonyms

Data collection
  • TREC Robust Track (2004)
  • 250 topics (topics 301-450 and topics 601-700)

Tools
  • Terrier, an open source search engine developed by the University of Glasgow
  • BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
  • Expand the top-k synonyms s1, ..., sk, plus TIDP scores as boosting weights:

      q_exp = q_org s1^w1 s2^w2 ... sk^wk

Measurement: Mean Average Precision (MAP), R-precision, and Recall
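The weighted expansion can be sketched as plain string building. The code below is an assumption-laden illustration rather than the system's actual query generator: `expand_query` is a hypothetical helper, the weights are invented, and applying the caret weight to each word of a multiword synonym is just one possible choice, since the slide does not say how such synonyms are handled.

```python
def expand_query(original, weighted_synonyms, k=2):
    """Append the top-k synonyms to the query, each term boosted as term^weight."""
    top = sorted(weighted_synonyms.items(), key=lambda item: (-item[1], item[0]))[:k]
    parts = [original]
    for synonym, weight in top:
        # Assumption: every word of a multiword synonym carries the synonym's weight.
        parts.extend(f"{term}^{weight:.2f}" for term in synonym.split())
    return " ".join(parts)

# Hypothetical TIDP scores used as boosting weights.
scores = {"Barack Hussein Obama II": 4.0, "Senator Obama": 3.5, "Obama": 1.0}
print(expand_query("barack obama", scores, k=2))
```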


Query expansion using time-dependent synonyms

Data collection
  • NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
  • Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries

Temporal Query                                 Synonym
Named Entity                    Time Period
American Broadcasting Company   1995-2000      Disney/ABC
Barack Obama                    2005-2007      Senator Obama
Eminem                          1999-2004      Slim Shady
George H. W. Bush               1988-1992      President George H.W. Bush
George W. Bush                  2000-2007      President George W. Bush
Hillary Rodham Clinton          2001-2007      Senator Clinton
Kmart                           1987-1987      Kresge
Pope Benedict XVI               1988-2005      Cardinal Ratzinger
Ronald Reagan                   1987-1989      Reagan Revolution
Virgin Media                    1999-2002      Telewest Communications
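For reference, precision at a cutoff k is the fraction of the first k retrieved documents that are relevant. A minimal sketch, with made-up document identifiers and relevance judgments and a hypothetical function name:

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are judged relevant."""
    top = ranked_doc_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k

# Hypothetical ranking of 30 documents and a set of relevant ones.
ranking = [f"d{i}" for i in range(1, 31)]
relevant = {"d1", "d4", "d7", "d9", "d10", "d15", "d25"}
for cutoff in (10, 20, 30):
    print(cutoff, precision_at_k(ranking, relevant, cutoff))
```

Dividing by k rather than by the number of documents actually returned means a short result list is penalized, which is the usual convention for P@k.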


Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE          NE-Syn      Avg. Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2               51
BPCF-NERW    473,829     488,383     1.0               73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of "people", "organization", or "company"

Note: Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods


Robust2004 query statistics

Two methods for recognizing named entities in queries:
  1. Exactly matched Wikipedia page (MW-NERQ)
  2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
     k = 2, since k > 2 brings noise into the NERQ process

Number of queries under the two NERQ methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250


Query expansion using time-independent synonyms

MAP, R-precision, and Recall (* indicates statistically significant at p < 0.05)

Method     MW-NERQ                            MRW-NERQ
           MAP      R-precision   Recall      MAP      R-precision   Recall
PM         0.2889   0.3309        0.6185      0.2455   0.2904        0.5629
PRF        0.3469   0.3711        0.6944      0.3002   0.3227        0.6761
SQE-PRF    0.3608   0.3652        0.7405      0.2507   0.2665        0.5932
SWQE-PRF   0.3653   0.3861        0.7388      0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms, TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1



IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Extracting synonyms

OutputEntity-synonym relationships and time periods

Named Entity Synonym Time Period

Pope Benedict XVICardinal Joseph Ratzinger 052005 - 032009Joseph Ratzinger 052005 - 032009Pope Benedict XVI 052005 - 032009

Barack ObamaBarack Hussein Obama II 022007 - 032009Sen Barack Obama 072007 - 032009Senator Barack Obama 052006 - 032009

Hillary Rodham ClintonHillary Clinton 082003 - 032009Sen Hillary Clinton 032007 - 032009Senator Clinton 112007 - 032009

The time of synonyms are timestamps of Wikipedia articles (8 years) in which they

appear not temporal expression extracted from the contents

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Improving the accuracy of time using burst detection

Analyze the New York Time Annotated Corpus (NYT) to discovermore accurate time

18M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg in KDDrsquo2002]

Generate bursty periods of ξij by computing a rate of occurrencefrom document streams

Output bursty intervals and bursty weight ie periods ofoccurrence and intensity

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Improving the accuracy of time using burst detection

OutputResults from burst-detection algorithm

Synonym Entity Burst Weight TimeStart End

President Reagan Ronald Reagan 5506858 011987 021989President Ronald Ronald Reagan 100401 011989 031990President Ronald Ronald Reagan 67208 071990 021993

Senator Clinton Hillary Rodham Clinton 18214 012001 102001Senator Clinton Hillary Rodham Clinton 17732 052002 012003Senator Clinton Hillary Rodham Clinton 172356 062003 112004

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Classifying synonyms into two types

DefinitionClass A time-independent

Robust to change over time and good synonym candidates for an ordinarysearch (no temporal criteria provided)

Eg ldquoBarack Hussein Obama IIrdquo is a time-independent synonym of ldquoBarackObamardquo

Class B time-dependent

Related to particular time in the past and good synonym candidates for atemporal search where changes in semantics must be considered

Eg ldquoCardinal Joseph Ratzingerrdquo is a time-dependent synonym of ldquoPopeBenedict XVIrdquo before 2005

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-independent synonyms

Definition

Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$

$pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs

$\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$

$\mu$ weights the relative importance of the temporal feature and the frequency feature

$\mu = 0.5$ yields the best performance in the experiments

Intuition

The model measures the popularity of synonyms based on two factors:

Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is

High usage over time, i.e., a high value of averaged frequency over time
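A minimal Python sketch of the TIDP weighting above, assuming each synonym comes with a list of raw term frequencies, one per time partition. The function names and the example counts are hypothetical, and the raw values of pf and the averaged tf are combined directly; any normalization of the two features is left out here and would be an additional assumption.

```python
from typing import Dict, List, Tuple


def tidp_score(tf_per_partition: List[int], mu: float = 0.5) -> float:
    """Score one synonym from its term frequencies in each time partition."""
    # pf: number of time partitions in which the synonym occurs at least once.
    pf = sum(1 for tf in tf_per_partition if tf > 0)
    if pf == 0:
        return 0.0
    # Averaged tf: total frequency divided by the partition frequency.
    avg_tf = sum(tf_per_partition) / pf
    return mu * pf + (1.0 - mu) * avg_tf


def rank_time_independent(
    synonyms: Dict[str, List[int]], mu: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (synonym, TIDP score) pairs, highest score first."""
    scored = [(syn, tidp_score(tfs, mu)) for syn, tfs in synonyms.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts over four time partitions (illustrative only).
example = {
    "Barack Hussein Obama II": [3, 4, 5, 4],
    "Senator Obama": [0, 0, 5, 0],
}
ranking = rank_time_independent(example)
```

With these illustrative counts, the synonym that occurs in every partition is ranked above the one that occurs in a single partition, which matches the robustness intuition on the slide.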

Ranking time-dependent synonyms

Definition

Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$TDP(s_j, t_k) = tf(s_j, t_k)$$

$tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition

Only term frequencies are used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
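The TDP weighting can be sketched in a few lines of Python. This is only an illustration under assumptions: the time periods are represented by hypothetical string labels, the frequencies are invented, and synonyms that do not occur at $t_k$ are dropped from the ranking.

```python
from typing import Dict, List, Tuple


def rank_time_dependent(
    tf_by_time: Dict[str, Dict[str, int]], t_k: str
) -> List[Tuple[str, int]]:
    """Rank synonyms by TDP(s_j, t_k) = tf(s_j, t_k), highest first."""
    scored = [(syn, tfs.get(t_k, 0)) for syn, tfs in tf_by_time.items()]
    # Synonyms that never occur at t_k are not candidates for that period.
    scored = [(syn, tf) for syn, tf in scored if tf > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical frequencies keyed by a time period label (illustrative only).
counts = {
    "Cardinal Joseph Ratzinger": {"2003": 40, "2004": 55},
    "Cardinal Ratzinger": {"2003": 90, "2004": 120, "2006": 3},
    "Benedict XVI": {"2006": 300},
}
top_2004 = rank_time_dependent(counts, "2004")
```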

Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection

The whole history of English Wikipedia

All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes

4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)

New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007

Tools

MWDumper: http://www.mediawiki.org/wiki/Mwdumper

Oracle Berkeley DB version 4.7.25

Burst detection algorithm implemented by Kleinberg

Number of states: 2

Ratio of rate of second state to base state: 2

Ratio of rate of each subsequent state to previous state: 2

Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier, an open source search engine developed by the University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

Expand with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights, where $s_i \wedge w_i$ denotes synonym $s_i$ boosted by weight $w_i$:

$$q_{exp} = q_{org} \;\; s_1 \wedge w_1 \;\; s_2 \wedge w_2 \;\; \ldots \;\; s_k \wedge w_k$$

Measurement: Mean Average Precision (MAP), R-precision, and Recall
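As a rough illustration of the expansion step, the following Python sketch appends the top-k synonyms to the original query and attaches each weight with a `^` boost. The function name, the example synonyms and scores, the two-decimal weight format, and the quoting of multi-word synonyms as phrases are all assumptions made for this sketch; the exact query syntax accepted by a given search engine should be checked against its documentation.

```python
from typing import List, Tuple


def expand_query(q_org: str, scored: List[Tuple[str, float]], k: int) -> str:
    """Build q_exp = q_org s1^w1 s2^w2 ... sk^wk from (synonym, weight) pairs."""
    # Keep only the k synonyms with the highest weights.
    top_k = sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
    parts = [q_org]
    for synonym, weight in top_k:
        # Assumption: multi-word synonyms are quoted so they act as a phrase.
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)


# Hypothetical synonyms and TIDP scores (illustrative only).
q_exp = expand_query(
    "Obama",
    [("Barack Hussein Obama II", 4.0), ("Barack", 1.5), ("BHO", 3.0)],
    k=2,
)
```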

Query expansion using time-dependent synonyms

Data collection

NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications

Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries (a temporal query consists of a named entity and a time period)

Named Entity                     Time Period    Synonym
American Broadcasting Company    1995-2000      Disney/ABC
Barack Obama                     2005-2007      Senator Obama
Eminem                           1999-2004      Slim Shady
George H. W. Bush                1988-1992      President George H.W. Bush
George W. Bush                   2000-2007      President George W. Bush
Hillary Rodham Clinton           2001-2007      Senator Clinton
Kmart                            1987-1987      Kresge
Pope Benedict XVI                1988-2005      Cardinal Ratzinger
Ronald Reagan                    1987-1989      Reagan Revolution
Virgin Media                     1999-2002      Telewest Communications
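The measures named in the experiment settings (precision at k, R-precision, and average precision, whose mean over all queries gives MAP) follow standard definitions. A minimal Python sketch, with hypothetical document identifiers and relevance judgments:

```python
from typing import List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def r_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: List[List[str]], judgments: List[Set[str]]) -> float:
    """MAP: average precision averaged over all queries."""
    scores = [average_precision(r, j) for r, j in zip(runs, judgments)]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical ranking and judgments for one query (illustrative only).
ranked_docs = ["d1", "d2", "d3", "d4", "d5"]
relevant_docs = {"d1", "d3", "d6"}
p_at_5 = precision_at_k(ranked_docs, relevant_docs, 5)
```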

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method    NE           NE-Syn       Avg. Syn per NE    Accuracy (%)
BPF-NERW      2,574,319    3,199,115    1.2                51
BPCF-NERW     473,829      488,383      1.0                73

BPF-NERW: Bunescu and Pasca’s Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of “people”, “organization”, or “company”

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods

Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)

k = 2; k > 2 brings noise to the NERQ process

Number of queries using two different NER methods

Type                MW-NERQ    MRW-NERQ
Named entity        42         149
Not named entity    208        101
Total               250        250

Query expansion using time-independent synonyms

MAP, R-precision, and Recall (* indicates statistically significant at p < 0.05)

            MW-NERQ                           MRW-NERQ
Method      MAP       R-precision  Recall    MAP       R-precision  Recall
PM          0.2889    0.3309       0.6185    0.2455    0.2904       0.5629
PRF         0.3469    0.3711       0.6944    0.3002    0.3227       0.6761
SQE-PRF     0.3608    0.3652       0.7405    0.2507    0.2665       0.5932
SWQE-PRF    0.3653    0.3861       0.7388    0.2885    0.3080       0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1

Query expansion using time-dependent synonyms

P@10, P@20, and P@30 (* indicates statistically significant at p < 0.05)

Method    P@10      P@20      P@30
TQ        0.1000    0.0500    0.0333
TSQ       0.5200    0.3800    0.2800

TQ: search with a Temporal Query, i.e., a keyword $w_q$ and a time $t_q$

TSQ: search with a Temporal Query and expand with Synonyms with respect to time $t_q$
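The difference between TQ and TSQ can be illustrated with a small Python sketch that expands a temporal query with the synonyms whose time period overlaps the query time. The three rows are taken from the examples of temporal queries in the experiment setting; treating each time period as a validity interval for the synonym, as well as the type and function names, are assumptions of this sketch.

```python
from typing import List, NamedTuple


class TimedSynonym(NamedTuple):
    entity: str
    start_year: int
    end_year: int
    synonym: str


# Rows from the examples of temporal queries; the interval semantics is assumed.
TABLE = [
    TimedSynonym("Pope Benedict XVI", 1988, 2005, "Cardinal Ratzinger"),
    TimedSynonym("Eminem", 1999, 2004, "Slim Shady"),
    TimedSynonym("Kmart", 1987, 1987, "Kresge"),
]


def expand_temporal_query(
    w_q: str, t_start: int, t_end: int, table: List[TimedSynonym]
) -> List[str]:
    """Return the query keyword plus synonyms valid during [t_start, t_end]."""
    terms = [w_q]
    for row in table:
        # Two closed intervals overlap when each starts before the other ends.
        overlaps = row.start_year <= t_end and t_start <= row.end_year
        if row.entity == w_q and overlaps and row.synonym not in terms:
            terms.append(row.synonym)
    return terms


tsq_terms = expand_temporal_query("Pope Benedict XVI", 1990, 1995, TABLE)
tq_only = expand_temporal_query("Eminem", 2006, 2007, TABLE)
```

In the first call the query period overlaps the synonym's period, so the synonym is added; in the second call it does not, so the query stays a plain TQ.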

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

Extract time-based synonyms from Wikipedia

Improve time of synonyms using NYT

Perform query expansion using the time-based synonyms

Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work

Combine time-dependent synonyms and temporal language models to determine time of queries

Exploit temporal information extraction techniques to discover synonyms at particular time points

Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time

http://research.idi.ntnu.no/wislab/quest

Thank you

Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time

1.8M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg in KDD’2002]

Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams

Output bursty intervals and bursty weight, i.e., periods of occurrence and intensity

Output

Results from burst-detection algorithm

Synonym             Entity                    Burst Weight    Start      End
President Reagan    Ronald Reagan             5506858         01/1987    02/1989
President Ronald    Ronald Reagan             100401          01/1989    03/1990
President Ronald    Ronald Reagan             67208           07/1990    02/1993
Senator Clinton     Hillary Rodham Clinton    18214           01/2001    10/2001
Senator Clinton     Hillary Rodham Clinton    17732           05/2002    01/2003
Senator Clinton     Hillary Rodham Clinton    172356          06/2003    11/2004


Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest

Thank you

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
    • Synonym Detection
      • Entity Recognition and Synonym Extraction
      • Improving the Accuracy of Time
    • Query Expansion
      • Time-based Synonyms
      • Ranking Time-independent Synonyms
      • Ranking Time-dependent Synonyms
    • Evaluation
      • Experiment Setting
      • Experimental Results
    • Conclusions
      • Conclusions and Future Work

Improving the accuracy of time using burst detection

Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods for synonyms

1.8M articles from January 1987 to June 2007 (20 years)

Use the burst detection algorithm [Kleinberg in KDD'2002]

Generate bursty periods of ξij by computing a rate of occurrence from document streams

Output bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
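
As a rough illustration of how bursty intervals and weights could be derived from a document stream, the sketch below implements a simplified two-state burst detector over batched counts, in the spirit of Kleinberg's algorithm. It is not the implementation used in the talk: the function name `detect_bursts`, the batching, the transition cost `gamma * ln(n)`, and the toy counts are assumptions made for illustration. It also assumes the overall rate of matching documents lies strictly between 0 and 1.

```python
import math


def log_binom(d, r):
    """Natural log of the binomial coefficient C(d, r)."""
    return math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Simplified two-state burst detection over batches (e.g. months).

    relevant[t]: documents in batch t that mention the synonym
    totals[t]:   all documents in batch t
    Returns (start, end, weight) tuples with inclusive batch indices.
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)  # base rate of occurrence
    p1 = min(0.9999, s * p0)          # elevated ("bursty") rate
    rates = (p0, p1)

    def cost(state, t):
        # Negative log-likelihood of relevant[t] out of totals[t] at this rate.
        r, d, p = relevant[t], totals[t], rates[state]
        return -(log_binom(d, r) + r * math.log(p) + (d - r) * math.log(1 - p))

    up = gamma * math.log(n)          # assumed cost of entering the burst state

    # Viterbi search for the cheapest state sequence, starting in state 0.
    best = [0.0, float("inf")]
    back = []
    for t in range(n):
        new_best, pointers = [], []
        for state in (0, 1):
            from_0 = best[0] + (up if state == 1 else 0.0)
            from_1 = best[1]          # moving down or staying is free
            prev = 0 if from_0 <= from_1 else 1
            new_best.append(min(from_0, from_1) + cost(state, t))
            pointers.append(prev)
        best = new_best
        back.append(pointers)

    # Backtrack to recover the state assigned to each batch.
    states = [0] * n
    state = 0 if best[0] <= best[1] else 1
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    # Merge consecutive bursty batches into intervals and sum their weights.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start, weight = t, 0.0
            while t < n and states[t] == 1:
                weight += cost(0, t) - cost(1, t)
                t += 1
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts
```

With the toy counts `[1, 1, 1, 10, 12, 11, 1, 1]` out of 100 documents per batch, the sketch reports a single burst covering batches 3 to 5. The defaults `s=2.0` and `gamma=1.0` mirror the two-state, ratio-2, gamma-1 setting listed in the experiment setup.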

Output: results from the burst-detection algorithm

| Synonym | Entity | Burst Weight | Time Start | Time End |
|---|---|---|---|---|
| President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989 |
| President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990 |
| President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993 |
| Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001 |
| Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003 |
| Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004 |

Classifying synonyms into two types

Definition

Class A: time-independent

Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)

E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”

Class B: time-dependent

Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered

E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005
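
One way to picture the two classes is as synonym records with an optional validity interval. The sketch below is only an illustration: the `Synonym` class, the `candidates` helper, the year granularity, and reading "before 2005" as an inclusive upper bound of 2005 are all assumptions, not the data model from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Synonym:
    term: str
    entity: str
    start_year: Optional[int] = None  # None means no lower bound
    end_year: Optional[int] = None    # None means no upper bound

    @property
    def time_dependent(self) -> bool:
        # Class B if any time bound is set, Class A otherwise.
        return self.start_year is not None or self.end_year is not None

    def valid_in(self, year: int) -> bool:
        after_start = self.start_year is None or year >= self.start_year
        before_end = self.end_year is None or year <= self.end_year
        return after_start and before_end


def candidates(synonyms: List[Synonym], entity: str,
               year: Optional[int] = None) -> List[str]:
    """Class A terms for an ordinary search (year is None),
    Class B terms valid in `year` for a temporal search."""
    result = []
    for s in synonyms:
        if s.entity != entity:
            continue
        if year is None and not s.time_dependent:
            result.append(s.term)
        elif year is not None and s.time_dependent and s.valid_in(year):
            result.append(s.term)
    return result


# The two examples from the slide; the 2005 bound is taken as inclusive (assumption).
SYNONYMS = [
    Synonym("Barack Hussein Obama II", "Barack Obama"),
    Synonym("Cardinal Joseph Ratzinger", "Pope Benedict XVI", end_year=2005),
]
```

Here `candidates(SYNONYMS, "Pope Benedict XVI", 1995)` returns the cardinal's name, while the same call with year 2008 returns nothing. Whether a temporal search should also include Class A synonyms is a design choice this sketch leaves out.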

Ranking time-independent synonyms

Definition

Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

\[ \mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j) \]

pf(s_j) is the time partition frequency, i.e., the number of time partitions in which s_j occurs

\overline{tf}(s_j) is the averaged tf of s_j over all time partitions:

\[ \overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)} \]

μ controls the relative importance of the temporal feature and the frequency feature

μ = 0.5 yields the best performance in the experiments

Intuition

The model measures the popularity of synonyms based on two factors:

Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is

High usage over time, i.e., a high value of averaged frequencies over time
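
The mixture can be made concrete with a few lines of code. This is a minimal sketch that applies the formula to raw, unnormalized counts; the function name `tidp` and the example frequencies are hypothetical, and any normalization of pf and tf that the actual system may apply is not shown.

```python
from typing import Sequence


def tidp(tf_by_partition: Sequence[float], mu: float = 0.5) -> float:
    """TIDP score from per-partition term frequencies of one synonym.

    pf is the number of partitions in which the synonym occurs, and the
    averaged tf is the summed frequency divided by pf.
    """
    occurring = [tf for tf in tf_by_partition if tf > 0]
    pf = len(occurring)
    if pf == 0:
        return 0.0  # the synonym never occurs in any partition
    avg_tf = sum(occurring) / pf
    return mu * pf + (1 - mu) * avg_tf


# Hypothetical counts over four time partitions: pf = 3, averaged tf = 4.
score = tidp([3, 0, 5, 4])  # 0.5 * 3 + 0.5 * 4 = 3.5
```

A synonym that appears in many partitions scores well through pf even with modest frequencies, while one that is heavily used in a single partition relies on the averaged tf term alone.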

Ranking time-dependent synonyms

Definition

Given time t_k, time-dependent synonyms at t_k are weighted by

\[ \mathrm{TDP}(s_j, t_k) = tf(s_j, t_k) \]

tf(s_j, t_k) is the term frequency of s_j at t_k

Intuition

Only term frequencies are used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular time period t_k are of interest
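
Because the score is simply the term frequency at t_k, ranking reduces to sorting. The snippet below is a small sketch with hypothetical counts and a hypothetical helper name `rank_time_dependent`.

```python
from typing import Dict, List, Tuple


def rank_time_dependent(tf_at_tk: Dict[str, int], k: int = 3) -> List[Tuple[str, int]]:
    """Return the k synonyms with the highest term frequency at time t_k."""
    return sorted(tf_at_tk.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical term frequencies of synonyms of one entity in a single period.
counts = {"Senator Clinton": 120, "Hillary Clinton": 300, "Mrs. Clinton": 45}
top_two = rank_time_dependent(counts, k=2)
```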

Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection

The whole history of English Wikipedia

  • All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
  • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)

New York Times Annotated Corpus, containing over 1.8 million articles from January 1987 to June 2007

Tools

  • MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
  • Oracle Berkeley DB version 4.7.25

Burst detection algorithm implemented by Kleinberg

  • Number of states: 2
  • Ratio of rate of second state to base state: 2
  • Ratio of rate of each subsequent state to previous state: 2
  • Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier, an open source search engine developed by the University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

Expand the query with the top-k synonyms s_1, ..., s_k, using their TIDP scores w_1, ..., w_k as boosting weights:

q_exp = q_org s_1^w_1 s_2^w_2 ... s_k^w_k

Measurement: Mean Average Precision (MAP), R-precision and Recall
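
The expansion step can be sketched as plain string assembly. This is an illustrative helper rather than the system's code: `expand_query`, the two-decimal weight formatting, and the example scores are assumptions, and how multi-word synonyms must be quoted or grouped depends on the search engine's query syntax.

```python
from typing import Dict


def expand_query(original: str, scores: Dict[str, float], k: int = 2) -> str:
    """Append the k highest-scoring synonyms to the query as `synonym^weight`."""
    top = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    boosted = [f"{synonym}^{weight:.2f}" for synonym, weight in top]
    return " ".join([original] + boosted)


# Hypothetical TIDP scores for synonyms of the query entity.
expanded = expand_query(
    "obama",
    {"Barack Hussein Obama II": 3.5, "Senator Obama": 2.0, "Obama": 1.0},
    k=2,
)
```

The lowest-scoring synonym is dropped because only the top k are appended; setting all weights to the same value would correspond to unweighted expansion.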

Query expansion using time-dependent synonyms

Data collection

NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications

Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries

| Temporal Query: Named Entity | Temporal Query: Time Period | Synonym |
|---|---|---|
| American Broadcasting Company | 1995-2000 | Disney/ABC |
| Barack Obama | 2005-2007 | Senator Obama |
| Eminem | 1999-2004 | Slim Shady |
| George H. W. Bush | 1988-1992 | President George H.W. Bush |
| George W. Bush | 2000-2007 | President George W. Bush |
| Hillary Rodham Clinton | 2001-2007 | Senator Clinton |
| Kmart | 1987-1987 | Kresge |
| Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger |
| Ronald Reagan | 1987-1989 | Reagan Revolution |
| Virgin Media | 1999-2002 | Telewest Communications |

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

| NER Method | NE | NE-Syn | Avg Syn per NE | Accuracy (%) |
|---|---|---|---|---|
| BPF-NERW | 2574319 | 3199115 | 1.2 | 51 |
| BPCF-NERW | 473829 | 488383 | 1.0 | 73 |

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW restricted to the Categories “people”, “organization” or “company”

Note: 500 randomly selected entity-synonym relationships were used for assessing the accuracy of time periods
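
The average number of synonyms per named entity follows directly from the two count columns. As a quick check, with an extra decimal digit that is computed here rather than reported in the slides:

```latex
\[
\frac{3199115}{2574319} \approx 1.24,
\qquad
\frac{488383}{473829} \approx 1.03
\]
```

Both ratios are consistent with averages of about 1.2 synonyms per entity for BPF-NERW and about 1.0 for BPCF-NERW.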

Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)

k = 2; k > 2 brings noise into the NERQ process

Number of queries using two different NER methods

| Type | MW-NERQ | MRW-NERQ |
|---|---|---|
| Named entity | 42 | 149 |
| Not named entity | 208 | 101 |
| Total | 250 | 250 |

Query expansion using time-independent synonyms

MAP, R-precision and Recall (statistical significance was tested at p < 0.05)

| Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall |
|---|---|---|---|---|---|---|
| PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629 |
| PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761 |
| SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932 |
| SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504 |

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
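
For readers less familiar with the three measures, the sketch below computes them for a single query using their standard definitions; MAP is then the mean of the average precision over all queries. The function names and the toy ranking are illustrative and are not taken from the evaluation tooling used in the talk.

```python
from typing import Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def recall(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of the relevant documents that were retrieved at all."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked if doc in relevant) / len(relevant)


# Toy example: three relevant documents, two of them retrieved at ranks 1 and 3.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d9"}
ap = average_precision(ranked, relevant)  # (1/1 + 2/3) / 3
```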

Query expansion using time-dependent synonyms

P10, P20 and P30 (statistical significance was tested at p < 0.05)

| Method | P10 | P20 | P30 |
|---|---|---|---|
| TQ | 0.1000 | 0.0500 | 0.0333 |
| TSQ | 0.5200 | 0.3800 | 0.2800 |

TQ: search with a Temporal Query, i.e., a keyword w_q and a time t_q

TSQ: search with a Temporal Query and expand it with Synonyms with respect to time t_q
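
Precision at k is the fraction of the top k retrieved documents that are relevant. The sketch below uses the standard definition with a hypothetical ranking; the helper name `precision_at_k` and the document identifiers are made up for illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the first k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


# Hypothetical run: 30 retrieved documents, only one relevant, found at rank 4.
ranked = [f"d{i}" for i in range(1, 31)]
relevant = {"d4"}
p10 = precision_at_k(ranked, relevant, 10)  # 1 / 10
p20 = precision_at_k(ranked, relevant, 20)  # 1 / 20
p30 = precision_at_k(ranked, relevant, 30)  # 1 / 30
```

A single relevant document in the top 10 and none afterwards gives 1/10, 1/20 and 1/30, which shows how precision at deeper cut-offs falls when no further relevant documents are retrieved.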

Conclusions and future work

Extract time-based synonyms from Wikipedia

Improve the time accuracy of synonyms using the NYT corpus

Perform query expansion using the time-based synonyms

Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work

Combine time-dependent synonyms and temporal language models to determine the time of queries

Exploit temporal information extraction techniques to discover synonyms at particular time points

Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time

http://research.idi.ntnu.no/wislab/quest

Thank you

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
        • Synonym Detection
          • Entity Recognition and Synonym Extraction
          • Improving the Accuracy of Time
            • Query Expansion
              • Time-based Synonyms
              • Ranking Time-independent Synonyms
              • Ranking Time-dependent Synonyms
                • Evaluation
                  • Experiment Setting
                  • Experimental Results
                    • Conclusions
                      • Conclusions and Future Work

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Entity Recognition and Synonym ExtractionImproving the Accuracy of Time

Improving the accuracy of time using burst detection

OutputResults from burst-detection algorithm

Synonym Entity Burst Weight TimeStart End

President Reagan Ronald Reagan 5506858 011987 021989President Ronald Ronald Reagan 100401 011989 031990President Ronald Ronald Reagan 67208 071990 021993

Senator Clinton Hillary Rodham Clinton 18214 012001 102001Senator Clinton Hillary Rodham Clinton 17732 052002 012003Senator Clinton Hillary Rodham Clinton 172356 062003 112004

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Classifying synonyms into two types

DefinitionClass A time-independent

Robust to change over time and good synonym candidates for an ordinarysearch (no temporal criteria provided)

Eg ldquoBarack Hussein Obama IIrdquo is a time-independent synonym of ldquoBarackObamardquo

Class B time-dependent

Related to particular time in the past and good synonym candidates for atemporal search where changes in semantics must be considered

Eg ldquoCardinal Joseph Ratzingerrdquo is a time-dependent synonym of ldquoPopeBenedict XVIrdquo before 2005

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-independent synonyms

DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature

TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )

pf (sj ) is the time partition frequency in which sj occurs

tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum

i tf (sj pi )

pf (sj )

micro underlines the importance of a temporal feature and a frequency feature

micro = 05 yields the best performance in the experiments

IntuitionThe model measures popularity of synonyms based on two factors

Robustness to change over time ie the more partitions synonyms occur themore robust to time they are

High usages over time ie a high value of averaged frequencies over time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-independent synonyms

DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature

TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )

pf (sj ) is the time partition frequency in which sj occurs

tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum

i tf (sj pi )

pf (sj )

micro underlines the importance of a temporal feature and a frequency feature

micro = 05 yields the best performance in the experiments

IntuitionThe model measures popularity of synonyms based on two factors

Robustness to change over time ie the more partitions synonyms occur themore robust to time they are

High usages over time ie a high value of averaged frequencies over time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-dependent synonyms

DefinitionGiven time tk time-dependent synonyms at tk are weighted by

TDP(sj tk ) = tf (sj tk )

tf (sj tk ) is a term frequency of sj at tk

IntuitionOnly term frequencies will be used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-dependent synonyms

DefinitionGiven time tk time-dependent synonyms at tk are weighted by

TDP(sj tk ) = tf (sj tk )

tf (sj tk ) is a term frequency of sj at tk

IntuitionOnly term frequencies will be used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Overview of experiments

Our experimental evaluation is divided into three main parts

1 Extracting and improving the accuracy of time of synonyms

2 Query expansion using time-independent synonyms

3 Query expansion using time-dependent synonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonyms

Data collection

• The whole history of English Wikipedia
  • All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
  • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)

• New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007

Tools

• MWDumper: http://www.mediawiki.org/wiki/Mwdumper
• Oracle Berkeley DB version 4.7.25

• Burst detection algorithm implemented by Kleinberg
  • Number of states: 2
  • Ratio of rate of second state to base state: 2
  • Ratio of rate of each subsequent state to previous state: 2
  • Gamma parameter of the HMM: 1

Measurement: Accuracy

Query expansion using time-independent synonyms

Data collection

• TREC Robust Track (2004)

• 250 topics (topics 301-450 and topics 601-700)

Tools

• Terrier: an open-source search engine developed by the University of Glasgow

• BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

• Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:

$$q_{exp} = q_{org} \;\; s_1 \wedge w_1 \;\; s_2 \wedge w_2 \;\; \ldots \;\; s_k \wedge w_k$$

where $s_i \wedge w_i$ denotes synonym $s_i$ boosted by weight $w_i$

Measurement: Mean Average Precision (MAP), R-precision, and Recall
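The expansion step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `expand_query`, the `term^weight` boost syntax, the quoting of multi-word synonyms, and the example scores are all assumptions made for the sketch.

```python
def expand_query(original_query, scored_synonyms, k=2):
    """Append the top-k synonyms to a query, each boosted by its score.

    scored_synonyms is a list of (synonym, weight) pairs, for example
    synonyms with their TIDP scores. The "term^weight" boost syntax is
    an assumption for illustration only.
    """
    # Keep the k highest-scoring synonyms.
    top_k = sorted(scored_synonyms, key=lambda pair: pair[1], reverse=True)[:k]
    parts = [original_query]
    for synonym, weight in top_k:
        # Quote multi-word synonyms so the boost applies to the whole phrase.
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)


# Hypothetical scores, chosen only to show the output shape.
synonyms = [("Barack Hussein Obama II", 0.8), ("Senator Obama", 0.6), ("BHO", 0.1)]
expanded = expand_query("obama", synonyms, k=2)
```

Here `expanded` holds the original query followed by the two highest-scoring synonyms; the lowest-scoring candidate is dropped because k = 2.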

Query expansion using time-dependent synonyms

Data collection

• NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications

• Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries

| Temporal query: named entity | Temporal query: time period | Synonym |
|---|---|---|
| American Broadcasting Company | 1995-2000 | Disney/ABC |
| Barack Obama | 2005-2007 | Senator Obama |
| Eminem | 1999-2004 | Slim Shady |
| George H. W. Bush | 1988-1992 | President George H.W. Bush |
| George W. Bush | 2000-2007 | President George W. Bush |
| Hillary Rodham Clinton | 2001-2007 | Senator Clinton |
| Kmart | 1987-1987 | Kresge |
| Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger |
| Ronald Reagan | 1987-1989 | Reagan Revolution |
| Virgin Media | 1999-2002 | Telewest Communications |

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

| NER Method | NE | NE-Syn | Avg. Syn per NE | Accuracy (%) |
|---|---|---|---|---|
| BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51 |
| BPCF-NERW | 473,829 | 488,383 | 1.0 | 73 |

BPF-NERW: Bunescu and Paşca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW restricted to the Categories “people”, “organization”, or “company”

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods

Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)

k = 2 is used; k > 2 brings noise to the NERQ process

Number of queries using two different NER methods

| Type | MW-NERQ | MRW-NERQ |
|---|---|---|
| Named entity | 42 | 149 |
| Not named entity | 208 | 101 |
| Total | 250 | 250 |

Query expansion using time-independent synonyms

MAP, R-precision, and Recall (statistical significance assessed at p < 0.05)

| Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall |
|---|---|---|---|---|---|---|
| PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629 |
| PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761 |
| SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932 |
| SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504 |

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
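For readers less familiar with the three measures reported above, the following sketch computes them for a single ranked list using the standard textbook definitions. The function names and the toy document IDs are assumptions for illustration; the evaluation tooling actually used in the experiments is not described on the slides.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant doc appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def r_precision(ranked_ids, relevant_ids):
    """Precision at rank R, where R is the number of relevant documents."""
    relevant = set(relevant_ids)
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc_id in ranked_ids[:r] if doc_id in relevant) / r


def recall(ranked_ids, relevant_ids):
    """Fraction of the relevant documents that appear in the ranked list."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranked_ids)) / len(relevant)


def mean_average_precision(runs):
    """MAP over a list of (ranked_ids, relevant_ids) pairs, one per topic."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example: three relevant documents, two of them retrieved.
ranking = ["d1", "d2", "d3", "d4"]
judged_relevant = {"d1", "d3", "d5"}
ap = average_precision(ranking, judged_relevant)
rp = r_precision(ranking, judged_relevant)
rc = recall(ranking, judged_relevant)
```

In the toy example the relevant documents are found at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 3, while R-precision and recall are both 2/3.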

Query expansion using time-dependent synonyms

P@10, P@20, and P@30 (statistical significance assessed at p < 0.05)

| Method | P@10 | P@20 | P@30 |
|---|---|---|---|
| TQ | 0.1000 | 0.0500 | 0.0333 |
| TSQ | 0.5200 | 0.3800 | 0.2800 |

TQ: search with a Temporal Query, i.e., a keyword $w_q$ and time $t_q$

TSQ: search with a Temporal Query and expand it with Synonyms with respect to time $t_q$
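Precision at k, the measure used in the table above, is the fraction of the top-k retrieved documents that are relevant. A minimal sketch follows; the function name and the toy ranking are assumptions for illustration.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    # Positions beyond the end of a short ranking count as non-relevant.
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant) / k


# Toy ranking of 30 documents in which only the first one is relevant.
toy_ranking = [f"d{i}" for i in range(1, 31)]
toy_relevant = {"d1"}
p10 = precision_at_k(toy_ranking, toy_relevant, 10)
p20 = precision_at_k(toy_ranking, toy_relevant, 20)
p30 = precision_at_k(toy_ranking, toy_relevant, 30)
```

With a single relevant document at rank 1, the three values are 1/10, 1/20 and 1/30, which shows how precision falls as the cutoff grows while the number of relevant hits stays fixed.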

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

• Extract time-based synonyms from Wikipedia
• Improve the time of synonyms using the NYT corpus
• Perform query expansion using the time-based synonyms
• Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work

• Combine time-dependent synonyms and temporal language models to determine the time of queries
• Exploit temporal information extraction techniques to discover synonyms at particular time points
• Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

Classifying synonyms into two types

Definition

Class A: time-independent

• Robust to change over time; good synonym candidates for an ordinary search (no temporal criteria provided)

• E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”

Class B: time-dependent

• Related to a particular time in the past; good synonym candidates for a temporal search, where changes in semantics must be considered

• E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005

Ranking time-independent synonyms

Definition

Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$

• $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs

• $\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$

• $\mu$ weights the relative importance of the temporal feature and the frequency feature

• $\mu = 0.5$ yields the best performance in the experiments

Intuition

The model measures the popularity of synonyms based on two factors:

• Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is

• High usage over time, i.e., a high value of averaged frequencies over time
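The TIDP score can be computed directly from per-partition term frequencies. The sketch below follows the formula as stated on the slide and uses raw counts; whether the two features are normalized before mixing is not stated here, so the absence of normalization is an assumption, as are the function names and the example counts.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * averaged tf(s_j).

    tf_by_partition maps a time partition label to the term frequency
    of the synonym in that partition. Raw, unnormalized values are used,
    which is an assumption of this sketch.
    """
    # Only partitions in which the synonym actually occurs count towards pf.
    occurrences = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(occurrences)
    if pf == 0:
        return 0.0
    avg_tf = sum(occurrences) / pf
    return mu * pf + (1 - mu) * avg_tf


def rank_time_independent(synonym_tfs, mu=0.5):
    """Return (synonym, score) pairs sorted by descending TIDP score."""
    scored = [(syn, tidp(tfs, mu)) for syn, tfs in synonym_tfs.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical counts: the first synonym occurs steadily in every partition,
# the second occurs in one partition only.
example = {
    "Barack Hussein Obama II": {"2006": 4, "2007": 6, "2008": 5},
    "Senator Obama": {"2006": 0, "2007": 8, "2008": 0},
}
ranked_tidp = rank_time_independent(example)
```

With $\mu = 0.5$ the steadily used synonym scores 0.5 · 3 + 0.5 · 5 = 4.0 and the one-partition synonym scores 0.5 · 1 + 0.5 · 8 = 4.5. With raw counts a single high-frequency partition can therefore outweigh robustness over time, which is one reason a normalized variant may be preferable in practice.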

Ranking time-dependent synonyms

Definition

Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$TDP(s_j, t_k) = tf(s_j, t_k)$$

• $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition

• Only term frequencies are used to measure the importance of synonyms

• Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
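Because TDP is just the term frequency at the query time, ranking time-dependent synonyms reduces to a lookup and a sort. The sketch below illustrates this; the nested-dictionary layout, the function name, and the counts are assumptions made for the example.

```python
def rank_time_dependent(tf_by_synonym, t_k):
    """Rank synonyms by TDP(s_j, t_k) = tf(s_j, t_k).

    tf_by_synonym maps each synonym to a dict of {time: term frequency}.
    Synonyms that do not occur at t_k are left out of the ranking.
    """
    scored = []
    for synonym, tf_by_time in tf_by_synonym.items():
        score = tf_by_time.get(t_k, 0)
        if score > 0:
            scored.append((synonym, score))
    # Highest term frequency first; ties broken alphabetically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical yearly counts for synonyms of "Pope Benedict XVI".
counts = {
    "Cardinal Ratzinger": {2004: 12, 2006: 1},
    "Joseph Ratzinger": {2004: 7},
    "Benedict XVI": {2006: 30},
}
ranked_2004 = rank_time_dependent(counts, 2004)
ranked_2006 = rank_time_dependent(counts, 2006)
```

For the query time 2004 only the two synonyms in use that year are returned, in order of frequency, while for 2006 the ranking changes to reflect the newer name.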

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Classifying synonyms into two types

DefinitionClass A time-independent

Robust to change over time and good synonym candidates for an ordinarysearch (no temporal criteria provided)

Eg ldquoBarack Hussein Obama IIrdquo is a time-independent synonym of ldquoBarackObamardquo

Class B time-dependent

Related to particular time in the past and good synonym candidates for atemporal search where changes in semantics must be considered

Eg ldquoCardinal Joseph Ratzingerrdquo is a time-dependent synonym of ldquoPopeBenedict XVIrdquo before 2005

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-independent synonyms

DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature

TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )

pf (sj ) is the time partition frequency in which sj occurs

tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum

i tf (sj pi )

pf (sj )

micro underlines the importance of a temporal feature and a frequency feature

micro = 05 yields the best performance in the experiments

IntuitionThe model measures popularity of synonyms based on two factors

Robustness to change over time ie the more partitions synonyms occur themore robust to time they are

High usages over time ie a high value of averaged frequencies over time


Ranking time-dependent synonyms

Definition
Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$TDP(s_j, t_k) = tf(s_j, t_k)$$

$tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$

Intuition
Only term frequencies are used to measure the importance of synonyms

Time partitions are not considered because only synonyms in the particular time period $t_k$ are of interest
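Because TDP reduces to a term frequency lookup at the query time, ranking time-dependent synonyms can be sketched as a filter followed by a sort. The function name `rank_time_dependent`, the `(synonym, time)` key layout, and the counts are assumptions made for this example.

```python
def rank_time_dependent(tf_at_time, t_k, top_k=None):
    """Rank synonyms by TDP(s, t_k) = tf(s, t_k).

    tf_at_time maps (synonym, time) -> term frequency.
    Only entries for the requested time t_k are considered.
    """
    scored = [
        (synonym, freq)
        for (synonym, time), freq in tf_at_time.items()
        if time == t_k and freq > 0
    ]
    # Highest frequency first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Made-up frequencies, for illustration only.
observations = {
    ("Cardinal Joseph Ratzinger", 2004): 12,
    ("Cardinal Ratzinger", 2004): 7,
    ("Cardinal Ratzinger", 2006): 1,
}
ranked_2004 = rank_time_dependent(observations, 2004)
```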


Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms

2. Query expansion using time-independent synonyms

3. Query expansion using time-dependent synonyms


Extracting and improving time of synonyms

Data collection

The whole history of English Wikipedia
  All pages and revisions from 03/2001 to 03/2008 – 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
  4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)

New York Times Annotated Corpus: contains over 1.8 million articles from January 1987 to June 2007

Tools
  MWDumper: http://www.mediawiki.org/wiki/Mwdumper
  Oracle Berkeley DB version 4.7.25

Burst detection algorithm implemented by Kleinberg
  Number of states: 2
  Ratio of rate of second state to base state: 2
  Ratio of rate of each subsequent state to previous state: 2
  Gamma parameter of the HMM: 1

Measurement: Accuracy


Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier – an open source search engine developed by the University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

Expand with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:

$$q_{exp} = q_{org} \;\; s_1 \wedge w_1 \;\; s_2 \wedge w_2 \;\; \ldots \;\; s_k \wedge w_k$$

Measurement: Mean Average Precision (MAP), R-precision, and Recall
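The expansion step on this slide (original query plus the top-k synonyms, each boosted by its TIDP score) can be sketched as simple string construction. The caret notation below mirrors the slide's weighted-term form; whether a given engine accepts a boost on a quoted phrase is an assumption, and `expand_query` and the example weights are hypothetical.

```python
def expand_query(original_query, weighted_synonyms, k=2):
    """Append the k highest-weighted synonyms to the query as boosted phrases.

    weighted_synonyms maps synonym -> boosting weight (e.g. a TIDP score).
    Output form: q_org "s_1"^w_1 "s_2"^w_2 ... (assumed syntax).
    """
    top = sorted(weighted_synonyms.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [original_query]
    for synonym, weight in top:
        parts.append(f'"{synonym}"^{weight:.2f}')
    return " ".join(parts)


# Made-up weights, for illustration only.
weights = {
    "Barack Hussein Obama II": 4.5,
    "Senator Obama": 2.0,
    "Obama": 1.0,
}
expanded = expand_query("Barack Obama", weights, k=2)
```

The slide later notes that k = 2 was used when selecting related pages because larger values introduced noise; the default `k=2` here is simply a convenient illustrative choice and is not taken from the expansion experiments.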


Query expansion using time-dependent synonyms

Data collection

NewsLibrary.com: contains more than 182 million newspaper articles from thousands of credible US publications

Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries

Temporal Query: Named Entity     Time Period   Synonym
American Broadcasting Company    1995-2000     Disney/ABC
Barack Obama                     2005-2007     Senator Obama
Eminem                           1999-2004     Slim Shady
George H. W. Bush                1988-1992     President George H.W. Bush
George W. Bush                   2000-2007     President George W. Bush
Hillary Rodham Clinton           2001-2007     Senator Clinton
Kmart                            1987-1987     Kresge
Pope Benedict XVI                1988-2005     Cardinal Ratzinger
Ronald Reagan                    1987-1989     Reagan Revolution
Virgin Media                     1999-2002     Telewest Communications


Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE          NE-Syn      Avg. Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2               51
BPCF-NERW    473,829     488,383     1.0               73

BPF-NERW: Bunescu and Pasca’s Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of “people”, “organization”, or “company”

Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods


Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
   k = 2; k > 2 brings noise into the NERQ process

Number of queries using the two NER methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250


Query expansion using time-independent synonyms

MAP, R-precision, and Recall (* indicates statistically significant at p < 0.05)

Method      MW-NERQ                            MRW-NERQ
            MAP      R-precision   Recall      MAP      R-precision   Recall
PM          0.2889   0.3309        0.6185      0.2455   0.2904        0.5629
PRF         0.3469   0.3711        0.6944      0.3002   0.3227        0.6761
SQE-PRF     0.3608   0.3652        0.7405      0.2507   0.2665        0.5932
SWQE-PRF    0.3653   0.3861        0.7388      0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
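For readers unfamiliar with the measures reported here, the following sketch computes average precision, R-precision, and recall for a single ranked list, using their standard textbook definitions. It is not the evaluation code used in the experiments; the function names and the toy ranking are assumptions.

```python
def average_precision(ranked, relevant):
    """Sum of precision at each rank holding a relevant document,
    divided by the total number of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def recall(ranked, relevant):
    """Fraction of the relevant documents that were retrieved at all."""
    if not relevant:
        return 0.0
    return len(set(ranked) & set(relevant)) / len(relevant)


def mean_average_precision(runs):
    """MAP over a non-empty list of (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy ranking, for illustration only.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
ap = average_precision(ranked, relevant)   # (1/1 + 2/3) / 2
rp = r_precision(ranked, relevant)         # 1 of the top 2 is relevant
rc = recall(ranked, relevant)              # both relevant documents retrieved
```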


Query expansion using time-dependent synonyms

P@10, P@20, and P@30 (* indicates statistically significant at p < 0.05)

Method   P@10     P@20     P@30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search with a Temporal Query, i.e., a keyword $w_q$ and a time $t_q$

TSQ: search with a Temporal Query and expand with Synonyms with respect to time $t_q$
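Precision at k, the measure used in this table, is the fraction of the top k retrieved documents that are relevant. A minimal sketch follows, with a hypothetical function name and a toy ranking that are not taken from the experiments.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the first k retrieved documents that are relevant.

    The denominator is always k, even if fewer than k documents were retrieved.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked_doc_ids[:k] if doc in relevant_ids)
    return hits / k


# Toy ranking of 30 documents with a single relevant one near the top.
ranking = [f"d{i}" for i in range(1, 31)]
relevant = {"d3"}
p10 = precision_at_k(ranking, relevant, 10)
p20 = precision_at_k(ranking, relevant, 20)
p30 = precision_at_k(ranking, relevant, 30)
```

In this toy case the single relevant document yields 1/10, 1/20, and 1/30 at the three cutoffs, which shows how quickly the measure falls when few relevant documents are retrieved.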


QUEST: Query Expansion using Synonyms over Time


Conclusions and future work

Extract time-based synonyms from Wikipedia
Improve time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work
  Combine time-dependent synonyms and temporal language models to determine the time of queries
  Exploit temporal information extraction techniques to discover synonyms at particular time points
  Improve temporal text mining/clustering using time-based synonyms


QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
        • Synonym Detection
          • Entity Recognition and Synonym Extraction
          • Improving the Accuracy of Time
            • Query Expansion
              • Time-based Synonyms
              • Ranking Time-independent Synonyms
              • Ranking Time-dependent Synonyms
                • Evaluation
                  • Experiment Setting
                  • Experimental Results
                    • Conclusions
                      • Conclusions and Future Work

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-independent synonyms

DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature

TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )

pf (sj ) is the time partition frequency in which sj occurs

tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum

i tf (sj pi )

pf (sj )

micro underlines the importance of a temporal feature and a frequency feature

micro = 05 yields the best performance in the experiments

IntuitionThe model measures popularity of synonyms based on two factors

Robustness to change over time ie the more partitions synonyms occur themore robust to time they are

High usages over time ie a high value of averaged frequencies over time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-independent synonyms

DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature

TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )

pf (sj ) is the time partition frequency in which sj occurs

tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum

i tf (sj pi )

pf (sj )

micro underlines the importance of a temporal feature and a frequency feature

micro = 05 yields the best performance in the experiments

IntuitionThe model measures popularity of synonyms based on two factors

Robustness to change over time ie the more partitions synonyms occur themore robust to time they are

High usages over time ie a high value of averaged frequencies over time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-dependent synonyms

DefinitionGiven time tk time-dependent synonyms at tk are weighted by

TDP(sj tk ) = tf (sj tk )

tf (sj tk ) is a term frequency of sj at tk

IntuitionOnly term frequencies will be used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-dependent synonyms

DefinitionGiven time tk time-dependent synonyms at tk are weighted by

TDP(sj tk ) = tf (sj tk )

tf (sj tk ) is a term frequency of sj at tk

IntuitionOnly term frequencies will be used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Overview of experiments

Our experimental evaluation is divided into three main parts

1 Extracting and improving the accuracy of time of synonyms

2 Query expansion using time-independent synonyms

3 Query expansion using time-dependent synonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonymsData collection

The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)

New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007

ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725

Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1

Measurement Accuracy

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier ndash an open source search engine developed by University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting

Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight

qexp = qorg s1andw1 s2

andw2 skandwk

Measurement Mean Average Precision (MAP) R-precision and Recall

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-dependent synonymsData collection

NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications

Select 20 strongly time-dependent queries

Measurement Precision at 10 20 and 30 retrieved documents

Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period

American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method NE NE-Syn Avg Syn Accuracyper NE ()

BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73

BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2

BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo

Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Robust2004 query statistics

Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)

2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)

k = 2 if k gt 2 bring noise to the NERQ process

Number of queries using two different NERType MW-NERQ MRW-NERQ

Named entity 42 149Not named entity 208 101Total 250 250

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-independent synonyms

MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ

MAP R-precision Recall MAP R-precision Recall

PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504

PM Probabilistic Model without query expansion

PRF Pseudo Relevance Feedback using Rocchio algorithm

SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback

SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback

Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-dependent synonyms

P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800

TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

QUEST Query Expansion using Synonyms over Time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms


QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you


Ranking time-independent synonyms

Definition
Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:

$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$

  • $pf(s_j)$ is the time partition frequency, i.e. the number of time partitions in which $s_j$ occurs
  • $tf(s_j)$ is the averaged tf of $s_j$ over all time partitions: $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
  • $\mu$ controls the relative importance of the temporal feature and the frequency feature
  • $\mu = 0.5$ yields the best performance in the experiments

Intuition
The model measures the popularity of synonyms based on two factors:
  1. Robustness to change over time, i.e. the more partitions a synonym occurs in, the more robust to time it is
  2. High usage over time, i.e. a high value of averaged frequencies over time
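The TIDP definition can be illustrated with a minimal sketch. It assumes that each synonym comes with one raw term frequency per time partition and that the two features are combined without normalization; the slide does not say whether the features are normalized, and the function name `tidp` and the input layout are hypothetical.

```python
def tidp(partition_tfs, mu=0.5):
    """Score one synonym from its per-partition term frequencies.

    partition_tfs: list of raw term frequencies, one entry per time
    partition (hypothetical input layout, 0 means "does not occur").
    mu: mixture weight between partition frequency and averaged tf.
    """
    # pf: number of time partitions in which the synonym occurs
    occupied = [tf for tf in partition_tfs if tf > 0]
    pf = len(occupied)
    if pf == 0:
        return 0.0
    # averaged tf: sum of tf over partitions, divided by pf
    avg_tf = sum(occupied) / pf
    return mu * pf + (1 - mu) * avg_tf


# A synonym seen in 3 of 4 partitions with frequencies 2, 4 and 6
print(tidp([2, 0, 4, 6]))
```

With `mu = 0.5`, a synonym that appears in many partitions and one that appears rarely but very frequently can receive similar scores, which is the trade-off the mixture is meant to control.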


Ranking time-dependent synonyms

Definition
Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$TDP(s_j, t_k) = tf(s_j, t_k)$$

where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.

Intuition
  • Only term frequencies are used to measure the importance of synonyms
  • Time partitions are not considered, because only synonyms in a particular time period $t_k$ are of interest
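A small sketch of ranking by TDP follows. The nested dictionary layout, the function name `rank_time_dependent` and all frequencies are made up for illustration; only the rule "score equals the term frequency at $t_k$" comes from the slide.

```python
def rank_time_dependent(tf_by_period, period, k=3):
    """Return the top-k (synonym, TDP score) pairs for one time period.

    tf_by_period: {synonym: {period: term frequency}} (hypothetical layout).
    TDP(s_j, t_k) is simply the term frequency of s_j at t_k.
    """
    scored = [(syn, tfs.get(period, 0)) for syn, tfs in tf_by_period.items()]
    # Keep only synonyms that actually occur in the period
    scored = [(syn, tf) for syn, tf in scored if tf > 0]
    # Highest frequency first; break ties alphabetically for determinism
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Illustrative, invented frequencies
tf_by_period = {
    "Senator Obama": {2005: 40, 2006: 55},
    "President Obama": {2009: 90},
    "Obama": {2005: 10},
}
print(rank_time_dependent(tf_by_period, 2005))
```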


Overview of experiments

Our experimental evaluation is divided into three main parts:

  1. Extracting and improving the accuracy of time of synonyms
  2. Query expansion using time-independent synonyms
  3. Query expansion using time-dependent synonyms


Extracting and improving time of synonyms

Data collection
  • The whole history of English Wikipedia
      • All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
      • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
  • The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools
  • MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
  • Oracle Berkeley DB version 4.7.25
  • Burst detection algorithm implemented by Kleinberg
      • Number of states: 2
      • Ratio of rate of second state to base state: 2
      • Ratio of rate of each subsequent state to previous state: 2
      • Gamma parameter of the HMM: 1

Measurement: Accuracy


Query expansion using time-independent synonyms

Data collection
  • TREC Robust Track (2004)
  • 250 topics (topics 301-450 and topics 601-700)

Tools
  • Terrier, an open source search engine developed by the University of Glasgow
  • BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
  • Expand with the top-k synonyms $\{s_1, \ldots, s_k\}$, using their TIDP scores as boosting weights:

$$q_{exp} = q_{org}\ \ s_1^{\wedge}w_1\ \ s_2^{\wedge}w_2\ \ \ldots\ \ s_k^{\wedge}w_k$$

Measurement: Mean Average Precision (MAP), R-precision and Recall
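The expansion formula can be sketched as plain string construction. This is an illustration only: the function name `expand_query`, the example scores, and the choice to quote multi-word synonyms as phrases are assumptions, and whether a given engine accepts a boosted phrase in this form should be checked against its query language documentation.

```python
def expand_query(q_org, scored_synonyms, k=2):
    """Append the top-k synonyms to the original query as term^weight.

    scored_synonyms: {synonym: weight}, e.g. TIDP scores (hypothetical input).
    Multi-word synonyms are wrapped in double quotes, which is an
    assumption about the target engine's phrase syntax.
    """
    # Sort by descending weight, then alphabetically for stable output
    top = sorted(scored_synonyms.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [q_org]
    for synonym, weight in top:
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)


# Invented weights for illustration
print(expand_query("eminem", {"slim shady": 3.5, "shady": 1.25, "em": 0.5}))
```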


Query expansion using time-dependent synonyms

Data collection
  • NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
  • Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries (a temporal query consists of a named entity and a time period)

| Named Entity | Time Period | Synonym |
|---|---|---|
| American Broadcasting Company | 1995-2000 | Disney/ABC |
| Barack Obama | 2005-2007 | Senator Obama |
| Eminem | 1999-2004 | Slim Shady |
| George H. W. Bush | 1988-1992 | President George H.W. Bush |
| George W. Bush | 2000-2007 | President George W. Bush |
| Hillary Rodham Clinton | 2001-2007 | Senator Clinton |
| Kmart | 1987-1987 | Kresge |
| Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger |
| Ronald Reagan | 1987-1989 | Reagan Revolution |
| Virgin Media | 1999-2002 | Telewest Communications |


Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

| NER Method | NE | NE-Syn | Avg. Syn per NE | Accuracy (%) |
|---|---|---|---|---|
| BPF-NERW | 2574319 | 3199115 | 1.2 | 51 |
| BPCF-NERW | 473829 | 488383 | 1.0 | 73 |

  • BPF-NERW: Bunescu and Paşca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
  • BPCF-NERW: BPF-NERW with only the Categories of "people", "organization" or "company"

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods.


Robust2004 query statistics

Two methods for recognizing named entities in queries:
  1. Exactly matched Wikipedia page (MW-NERQ)
  2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
     k = 2; values of k > 2 bring noise to the NERQ process

Number of queries using the two different NER methods

| Type | MW-NERQ | MRW-NERQ |
|---|---|---|
| Named entity | 42 | 149 |
| Not named entity | 208 | 101 |
| Total | 250 | 250 |


Query expansion using time-independent synonyms

MAP, R-precision and Recall (statistical significance at p < 0.05 was marked on the slide; the markers are not preserved in this transcript)

| Method | MAP (MW-NERQ) | R-precision (MW-NERQ) | Recall (MW-NERQ) | MAP (MRW-NERQ) | R-precision (MRW-NERQ) | Recall (MRW-NERQ) |
|---|---|---|---|---|---|---|
| PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629 |
| PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761 |
| SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932 |
| SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504 |

  • PM: Probabilistic Model without query expansion
  • PRF: Pseudo Relevance Feedback using the Rocchio algorithm
  • SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
  • SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e. Bose-Einstein 1
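For readers unfamiliar with the three measures, the following sketch shows the textbook definitions on a single ranked list. It is not the evaluation tooling used in the experiments; the function names and document identifiers are hypothetical.

```python
def average_precision(ranked, relevant):
    """Mean of precision values at the ranks where relevant documents appear,
    divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranked, relevant):
    """Precision within the top R results, where R = number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def recall(ranked, relevant):
    """Fraction of relevant documents that were retrieved at all."""
    if not relevant:
        return 0.0
    return len(set(ranked) & set(relevant)) / len(relevant)


def mean_average_precision(runs):
    """MAP over a list of (ranked, relevant) pairs, one pair per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d9"}
print(average_precision(ranked, relevant), r_precision(ranked, relevant), recall(ranked, relevant))
```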


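Query expansion using time-dependent synonyms

P@10, P@20 and P@30 (statistical significance at p < 0.05 was marked on the slide; the markers are not preserved in this transcript)

| Method | P@10 | P@20 | P@30 |
|---|---|---|---|
| TQ | 0.1000 | 0.0500 | 0.0333 |
| TSQ | 0.5200 | 0.3800 | 0.2800 |

  • TQ: search with a Temporal Query, i.e. a keyword $w_q$ and time $t_q$
  • TSQ: search with a Temporal Query and expand it with Synonyms w.r.t. time $t_q$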
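Precision at k is the fraction of the first k retrieved documents that are relevant. The sketch below uses the standard definition with hypothetical document identifiers; it also shows that a single relevant document in the top 10 gives 0.1, 0.05 and about 0.0333 at cut-offs 10, 20 and 30, the same values as the TQ row.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant.

    Divides by k even if fewer than k documents were retrieved,
    which is the usual convention.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


# 30 retrieved documents, only the first one is relevant (made-up run)
ranked = [f"doc{i}" for i in range(1, 31)]
relevant = {"doc1"}
for k in (10, 20, 30):
    print(k, round(precision_at_k(ranked, relevant, k), 4))
```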




Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

Ranking time-dependent synonyms

Definition: Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by

$$\mathrm{TDP}(s_j, t_k) = \mathit{tf}(s_j, t_k)$$

where $\mathit{tf}(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.

Intuition: Only term frequencies are used to measure the importance of synonyms. Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest.
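The weighting above can be sketched in a few lines of Python. This is only an illustration: it assumes that tf(s_j, t_k) is obtained by counting case-insensitive phrase occurrences in the documents dated within t_k. The function name `rank_time_dependent_synonyms`, the counting method and the sample sentences are assumptions, not taken from the talk.

```python
def rank_time_dependent_synonyms(synonyms, documents_at_tk, top_k=None):
    """Rank synonyms by TDP(s_j, t_k) = tf(s_j, t_k).

    documents_at_tk holds the texts of documents dated within period t_k.
    Term frequency is approximated by substring counting (an assumption).
    """
    text = " ".join(doc.lower() for doc in documents_at_tk)
    scores = {s: text.count(s.lower()) for s in synonyms}
    # Highest frequency first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_k is None else ranked[:top_k]


# Made-up documents for a single time period.
docs = [
    "Senator Obama spoke today. Senator Obama left.",
    "Barack Obama met Senator Clinton.",
]
candidates = ["Senator Obama", "Senator Clinton", "President Obama"]
print(rank_time_dependent_synonyms(candidates, docs))
# [('Senator Obama', 2), ('Senator Clinton', 1), ('President Obama', 0)]
```

Because the score depends only on counts inside one period, no normalisation across time partitions is needed, which matches the intuition stated above.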

Overview of experiments

Our experimental evaluation is divided into three main parts:

1. Extracting and improving the accuracy of time of synonyms
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms

Extracting and improving time of synonyms

Data collection

  • The whole history of English Wikipedia
    • All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
    • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
  • New York Times Annotated Corpus, which contains over 1.8 million articles from January 1987 to June 2007

Tools

  • MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
  • Oracle Berkeley DB version 4.7.25
  • Burst detection algorithm implemented by Kleinberg
    • Number of states: 2
    • Ratio of rate of second state to base state: 2
    • Ratio of rate of each subsequent state to previous state: 2
    • Gamma parameter of the HMM: 1

Measurement: Accuracy
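To make the burst-detection parameters listed above concrete, here is a minimal sketch of a two-state burst automaton in the spirit of Kleinberg's algorithm, solved with dynamic programming. It assumes a batched setting (per time step, the number of documents mentioning a synonym out of all documents); the function name `detect_bursts`, the batched formulation and the sample counts are assumptions and not the implementation used in the experiments.

```python
import math


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Label each time step 0 (base) or 1 (burst) with a two-state automaton.

    relevant[t] -- number of documents at step t that mention the synonym
    totals[t]   -- number of all documents at step t
    s           -- ratio of the burst-state rate to the base-state rate
    gamma       -- scales the cost of moving from the base to the burst state
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)          # base-state rate
    if not 0.0 < p0 < 1.0:
        raise ValueError("the overall rate must lie strictly between 0 and 1")
    p1 = min(s * p0, 0.9999)                  # burst-state rate, kept below 1
    rates = (p0, p1)

    def fit_cost(state, r, d):
        # Negative log-likelihood of r hits in d documents; the binomial
        # coefficient is identical for both states and is therefore omitted.
        p = rates[state]
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    enter_cost = gamma * math.log(n)          # paid only when moving 0 -> 1

    cost = [0.0, math.inf]                    # the automaton starts in the base state
    back = []                                 # back[t][j]: best predecessor of state j
    for r, d in zip(relevant, totals):
        new_cost, pointers = [], []
        for j in (0, 1):
            via = [cost[i] + (enter_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if via[0] <= via[1] else 1
            new_cost.append(via[best] + fit_cost(j, r, d))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest state sequence backwards from the last time step.
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for pointers in reversed(back):
        states.append(state)
        state = pointers[state]
    return states[::-1]


# Made-up counts: the synonym is mentioned heavily in steps 3 and 4.
hits = [1, 1, 1, 9, 9, 1, 1]
docs = [10] * 7
print(detect_bursts(hits, docs))  # [0, 0, 0, 1, 1, 0, 0]
```

With two states, `s=2.0` and `gamma=1.0` mirror the settings on the slide. The steps labelled 1 would be the candidate periods in which a synonym is actively used.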

Query expansion using time-independent synonyms

Data collection

  • TREC Robust Track (2004)
  • 250 topics (topics 301-450 and topics 601-700)

Tools

  • Terrier, an open source search engine developed by the University of Glasgow
  • BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
  • Expand the top-k synonyms $s_1, \ldots, s_k$, plus TIDP scores as boosting weights:

$$q_{exp} = q_{org}\;\; s_1 \wedge w_1 \;\; s_2 \wedge w_2 \;\; \ldots \;\; s_k \wedge w_k$$

Measurement: Mean Average Precision (MAP), R-precision, and Recall
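The expansion formula above can be illustrated with a small query-string builder. The sketch assumes that the boost operator is written `term^weight`, as the formula suggests, and that multi-word synonyms are quoted as phrases; both conventions, the function name `expand_query` and the scores in the example are assumptions, so the exact syntax should be checked against the search engine in use.

```python
def expand_query(original_query, synonym_scores, k=2):
    """Append the k best-scoring synonyms to the query as 'term^weight'.

    synonym_scores maps each synonym to its TIDP score (hypothetical values).
    """
    top_k = sorted(synonym_scores.items(), key=lambda item: (-item[1], item[0]))[:k]
    parts = [original_query]
    for synonym, weight in top_k:
        # Quote multi-word synonyms so that they stay together as a phrase.
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)


# Made-up scores, for illustration only.
scores = {"Kresge": 0.80, "S. S. Kresge Company": 0.50, "Kmart Corp": 0.10}
print(expand_query("kmart", scores, k=2))
# kmart Kresge^0.80 "S. S. Kresge Company"^0.50
```

Keeping the original query unweighted and boosting only the added synonyms lets the score of each synonym control how strongly it influences the ranking.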

Query expansion using time-dependent synonyms

Data collection

  • NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
  • Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries

| Temporal query: named entity | Temporal query: time period | Synonym |
|---|---|---|
| American Broadcasting Company | 1995-2000 | Disney/ABC |
| Barack Obama | 2005-2007 | Senator Obama |
| Eminem | 1999-2004 | Slim Shady |
| George H. W. Bush | 1988-1992 | President George H. W. Bush |
| George W. Bush | 2000-2007 | President George W. Bush |
| Hillary Rodham Clinton | 2001-2007 | Senator Clinton |
| Kmart | 1987-1987 | Kresge |
| Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger |
| Ronald Reagan | 1987-1989 | Reagan Revolution |
| Virgin Media | 1999-2002 | Telewest Communications |
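The way a temporal query picks up a synonym can be sketched as an interval-overlap lookup. The example below assumes that each synonym carries a validity period at year granularity and that it is used for expansion when that period overlaps the query period. The record layout, the function names and the reuse of the table's periods as validity periods are illustrative assumptions.

```python
from typing import NamedTuple


class TimedSynonym(NamedTuple):
    entity: str
    synonym: str
    start_year: int   # first year in which the synonym is assumed valid
    end_year: int     # last year in which the synonym is assumed valid


# Three rows of the table, with the listed period reused as validity period.
SYNONYMS = [
    TimedSynonym("Pope Benedict XVI", "Cardinal Ratzinger", 1988, 2005),
    TimedSynonym("Kmart", "Kresge", 1987, 1987),
    TimedSynonym("Eminem", "Slim Shady", 1999, 2004),
]


def synonyms_for(entity, query_start, query_end, table=SYNONYMS):
    """Return synonyms of entity whose period overlaps [query_start, query_end]."""
    return [
        row.synonym
        for row in table
        if row.entity == entity
        and row.start_year <= query_end
        and query_start <= row.end_year
    ]


def expand_temporal_query(entity, query_start, query_end, table=SYNONYMS):
    """Build the list of search terms: the entity plus its time-matching synonyms."""
    return [entity] + synonyms_for(entity, query_start, query_end, table)


print(expand_temporal_query("Pope Benedict XVI", 1990, 2000))
# ['Pope Benedict XVI', 'Cardinal Ratzinger']
print(expand_temporal_query("Pope Benedict XVI", 2006, 2007))
# ['Pope Benedict XVI']
```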

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

| NER Method | NE | NE-Syn | Avg. Syn per NE | Accuracy (%) |
|---|---|---|---|---|
| BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51 |
| BPCF-NERW | 473,829 | 488,383 | 1.0 | 73 |

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of "people", "organization" or "company"

Note: Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods
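The two filtering criteria can be expressed as a small predicate. The slide does not say whether a relationship is dropped when either criterion holds or only when both hold, so the sketch exposes that choice as a parameter. The record fields, the names and the default reading (`drop_if_either=True`) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SynonymRecord:
    entity: str
    synonym: str
    interval_months: int   # length of the period in which the synonym is observed
    avg_frequency: float   # average number of occurrences in that period


def passes_filter(record, min_months=6, min_avg_freq=2.0, drop_if_either=True):
    """Return True if the entity-synonym relationship survives the filter."""
    too_short = record.interval_months < min_months
    too_rare = record.avg_frequency < min_avg_freq
    if drop_if_either:
        dropped = too_short or too_rare
    else:
        dropped = too_short and too_rare
    return not dropped


# Made-up records, for illustration only.
records = [
    SynonymRecord("Kmart", "Kresge", interval_months=24, avg_frequency=3.5),
    SynonymRecord("Kmart", "K-mrt", interval_months=1, avg_frequency=1.0),
]
kept = [r.synonym for r in records if passes_filter(r)]
print(kept)  # ['Kresge']
```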

Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ), with k = 2; k > 2 brings noise into the NERQ process

Number of queries recognized by the two NER methods

| Type | MW-NERQ | MRW-NERQ |
|---|---|---|
| Named entity | 42 | 149 |
| Not named entity | 208 | 101 |
| Total | 250 | 250 |

Query expansion using time-independent synonyms

MAP, R-precision and Recall (statistical significance tested at p < 0.05)

| Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall |
|---|---|---|---|---|---|---|
| PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629 |
| PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761 |
| SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932 |
| SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504 |

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
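For readers unfamiliar with the three measures in the table, the following sketch shows their standard definitions for a single query, plus MAP as the mean over queries. It assumes that a ranking contains no duplicate document identifiers; the function names and the toy data are illustrative and unrelated to the reported numbers.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks where relevant documents occur."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision after R documents, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def recall(ranked, relevant):
    """Fraction of the relevant documents that appear anywhere in the ranking."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked if doc in relevant) / len(relevant)


def mean_average_precision(runs):
    """runs is a list of (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example: three relevant documents, two of them retrieved.
ranking = ["d1", "d2", "d3", "d4"]
judged_relevant = {"d1", "d3", "d9"}
print(round(average_precision(ranking, judged_relevant), 4))  # 0.5556
print(round(r_precision(ranking, judged_relevant), 4))        # 0.6667
print(round(recall(ranking, judged_relevant), 4))             # 0.6667
```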

Query expansion using time-dependent synonyms

P@10, P@20 and P@30 (statistical significance tested at p < 0.05)

| Method | P@10 | P@20 | P@30 |
|---|---|---|---|
| TQ | 0.1000 | 0.0500 | 0.0333 |
| TSQ | 0.5200 | 0.3800 | 0.2800 |

TQ: search with a Temporal Query, i.e., a keyword $w_q$ and time $t_q$

TSQ: search with a Temporal Query and expand with Synonyms w.r.t. time $t_q$
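Precision at a cutoff k, the measure reported above, is the fraction of the first k retrieved documents that are relevant. A minimal sketch follows; the function name and the toy ranking are illustrative and not tied to the reported results.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


# Toy ranking of ten documents, of which only "d4" is relevant.
ranking = [f"d{i}" for i in range(1, 11)]
print(precision_at_k(ranking, {"d4"}, 10))  # 0.1
```

A value of 0.1 at k = 10 therefore means that, on average, one of the first ten results is relevant.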

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

  • Extract time-based synonyms from Wikipedia
  • Improve the time of synonyms using NYT
  • Perform query expansion using the time-based synonyms
  • Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work

  • Combine time-dependent synonyms and temporal language models to determine the time of queries
  • Exploit temporal information extraction techniques to discover synonyms at particular time points
  • Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
    • Synonym Detection
      • Entity Recognition and Synonym Extraction
      • Improving the Accuracy of Time
    • Query Expansion
      • Time-based Synonyms
      • Ranking Time-independent Synonyms
      • Ranking Time-dependent Synonyms
    • Evaluation
      • Experiment Setting
      • Experimental Results
    • Conclusions
      • Conclusions and Future Work

  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
        • Synonym Detection
          • Entity Recognition and Synonym Extraction
          • Improving the Accuracy of Time
            • Query Expansion
              • Time-based Synonyms
              • Ranking Time-independent Synonyms
              • Ranking Time-dependent Synonyms
                • Evaluation
                  • Experiment Setting
                  • Experimental Results
                    • Conclusions
                      • Conclusions and Future Work

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

Ranking time-dependent synonyms

DefinitionGiven time tk time-dependent synonyms at tk are weighted by

TDP(sj tk ) = tf (sj tk )

tf (sj tk ) is a term frequency of sj at tk

IntuitionOnly term frequencies will be used to measure the importance of synonyms

Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Overview of experiments

Our experimental evaluation is divided into three main parts

1 Extracting and improving the accuracy of time of synonyms

2 Query expansion using time-independent synonyms

3 Query expansion using time-dependent synonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonyms

Data collection

The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008
• 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
• 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)

The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007

Tools

MWDumper: http://www.mediawiki.org/wiki/Mwdumper
Oracle Berkeley DB version 4.7.25

Burst detection algorithm implemented by Kleinberg
• Number of states: 2
• Ratio of rate of second state to base state: 2
• Ratio of rate of each subsequent state to previous state: 2
• Gamma parameter of the HMM: 1

Measurement: Accuracy
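The burst-detection parameters above correspond to the knobs of Kleinberg's state-automaton model. The following is a minimal sketch of the batched, two-state form of that model, solved with a Viterbi-style dynamic program. It is an illustration under assumptions, not the implementation used in the experiments: the function name `detect_bursts`, the per-period counts, and the choice of the batched variant are all hypothetical.

```python
import math

def detect_bursts(relevant, totals, s=2.0, gamma=1.0, n_states=2):
    """Return the lowest-cost state sequence (0 = base, 1 = burst).

    relevant[t]: documents mentioning the term in period t (assumed input).
    totals[t]:   all documents in period t.
    s:           ratio of each state's rate to the previous state's rate.
    gamma:       scales the cost of moving to a higher state.
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)
    if p0 == 0:
        return [0] * n
    # Expected fraction of relevant documents in each state (capped below 1).
    probs = [min(p0 * s ** i, 0.9999) for i in range(n_states)]

    def fit_cost(i, r, d):
        # Negative binomial log-likelihood; the binomial coefficient is
        # omitted because it is identical for every state.
        p = probs[i]
        return -(r * math.log(p) + (d - r) * math.log(1 - p))

    def move_cost(i, j):
        # Moving up costs (j - i) * gamma * ln(n); moving down is free.
        return (j - i) * gamma * math.log(n) if j > i else 0.0

    cost = [0.0] + [float("inf")] * (n_states - 1)  # start in the base state
    back = []
    for r, d in zip(relevant, totals):
        new_cost, choices = [], []
        for j in range(n_states):
            best = min(range(n_states), key=lambda i: cost[i] + move_cost(i, j))
            new_cost.append(cost[best] + move_cost(best, j) + fit_cost(j, r, d))
            choices.append(best)
        cost = new_cost
        back.append(choices)

    # Trace the cheapest path backwards from the final period.
    state = min(range(n_states), key=lambda j: cost[j])
    path = [state]
    for choices in reversed(back[1:]):
        state = choices[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical counts: the term spikes in the third and fourth periods.
states = detect_bursts([2, 2, 20, 20, 2, 2], [100] * 6)
```

Periods labelled 1 form the burst interval; in this toy input only the two spiking periods are assigned to the burst state.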

Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier: an open-source search engine developed by the University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:

$q_{exp} = q_{org}\; s_1 \wedge w_1\; s_2 \wedge w_2\; \ldots\; s_k \wedge w_k$

where $s_i \wedge w_i$ denotes synonym $s_i$ boosted by its weight $w_i$.

Measurement: Mean Average Precision (MAP), R-precision, and Recall
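The weighted expansion can be sketched in a few lines of Python. The sketch assumes a caret-style boost syntax (`term^weight`) and quoted phrases, as found in several search engines' query languages; the function name `expand_query` and the scores are hypothetical.

```python
def expand_query(original, scored_synonyms, k=2):
    """Append the top-k synonyms to the query, each boosted by its score.

    scored_synonyms: dict mapping synonym -> weight (e.g. a TIDP score).
    """
    # Keep the k highest-scoring synonyms.
    top = sorted(scored_synonyms.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Quote multi-word synonyms and attach the weight as a boost (assumed syntax).
    boosted = [f'"{syn}"^{weight:.2f}' for syn, weight in top]
    return " ".join([original] + boosted)

# Hypothetical scores, for illustration only.
q_exp = expand_query(
    "pope benedict",
    {"cardinal ratzinger": 0.8, "joseph ratzinger": 0.5, "b16": 0.1},
    k=2,
)
```

Setting every weight to the same constant reduces this to unweighted synonym expansion.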

Query expansion using time-dependent synonyms

Data collection

NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications

Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20, and 30 retrieved documents

Examples of temporal queries

Temporal Query                                  Synonym
Named Entity                     Time Period
American Broadcasting Company    1995-2000      Disney/ABC
Barack Obama                     2005-2007      Senator Obama
Eminem                           1999-2004      Slim Shady
George H. W. Bush                1988-1992      President George H.W. Bush
George W. Bush                   2000-2007      President George W. Bush
Hillary Rodham Clinton           2001-2007      Senator Clinton
Kmart                            1987-1987      Kresge
Pope Benedict XVI                1988-2005      Cardinal Ratzinger
Ronald Reagan                    1987-1989      Reagan Revolution
Virgin Media                     1999-2002      Telewest Communications
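Precision at k, the measure used for the temporal queries, can be sketched as follows. The function name `precision_at_k` and the document identifiers are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top = ranked_ids[:k]
    # Divide by k even when fewer than k documents were retrieved.
    return sum(1 for doc in top if doc in relevant_ids) / k

# Hypothetical ranking and relevance judgments.
p_at_5 = precision_at_k(["d3", "d7", "d1", "d9", "d2"], {"d1", "d2", "d5"}, 5)
```

Here two of the five retrieved documents are relevant, so the value is 0.4.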

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE          NE-Syn      Avg. Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2               51
BPCF-NERW    473,829     488,383     1.0               73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of "people", "organization", or "company"

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods

Robust2004 query statistics

Two methods for recognizing named entities in queries:

1. Exactly matched Wikipedia page (MW-NERQ)

2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)

k = 2; k > 2 brings noise to the NERQ process

Number of queries using the two different NERQ methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250

Query expansion using time-independent synonyms

MAP, R-precision, and Recall (* indicates statistically significant at p < 0.05)

            MW-NERQ                         MRW-NERQ
Method      MAP      R-precision  Recall    MAP      R-precision  Recall
PM          0.2889   0.3309       0.6185    0.2455   0.2904       0.5629
PRF         0.3469   0.3711       0.6944    0.3002   0.3227       0.6761
SQE-PRF     0.3608   0.3652       0.7405    0.2507   0.2665       0.5932
SWQE-PRF    0.3653   0.3861       0.7388    0.2885   0.3080       0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
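For reference, the three measures reported in this table can be sketched in Python using their standard definitions. This is a minimal illustration, not the evaluation tooling used in the experiments; the function names and the toy ranking are hypothetical.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def r_precision(ranked, relevant):
    """Precision after R documents, where R is the number of relevant documents."""
    r = len(relevant)
    return sum(1 for doc in ranked[:r] if doc in relevant) / r if r else 0.0

def recall(ranked, relevant):
    """Fraction of all relevant documents that were retrieved."""
    return len(set(ranked) & set(relevant)) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over a list of (ranked, relevant) pairs, one pair per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical single-topic example.
ranked = ["a", "x", "b", "y", "c"]
relevant = {"a", "b", "c"}
ap = average_precision(ranked, relevant)
rp = r_precision(ranked, relevant)
rec = recall(ranked, relevant)
```

In the toy example the relevant documents appear at ranks 1, 3, and 5, so average precision is (1 + 2/3 + 3/5) / 3.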

Query expansion using time-dependent synonyms

P@10, P@20, and P@30 (* indicates statistically significant at p < 0.05)

Method   P@10     P@20     P@30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search with a Temporal Query, i.e., a keyword $w_q$ and time $t_q$

TSQ: search with a Temporal Query and expand with Synonyms w.r.t. time $t_q$

QUEST: Query Expansion using Synonyms over Time

Conclusions and future work

• Extract time-based synonyms from Wikipedia
• Improve time of synonyms using NYT
• Perform query expansion using the time-based synonyms
• Conduct extensive experiments showing a significant increase in retrieval effectiveness

Future work

• Combine time-dependent synonyms and temporal language models to determine time of queries
• Exploit temporal information extraction techniques to discover synonyms at particular time points
• Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you


IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Overview of experiments

Our experimental evaluation is divided into three main parts

1 Extracting and improving the accuracy of time of synonyms

2 Query expansion using time-independent synonyms

3 Query expansion using time-dependent synonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonymsData collection

The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)

New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007

ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725

Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1

Measurement Accuracy

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier ndash an open source search engine developed by University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting

Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight

qexp = qorg s1andw1 s2

andw2 skandwk

Measurement Mean Average Precision (MAP) R-precision and Recall

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-dependent synonymsData collection

NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications

Select 20 strongly time-dependent queries

Measurement Precision at 10 20 and 30 retrieved documents

Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period

American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method NE NE-Syn Avg Syn Accuracyper NE ()

BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73

BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2

BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo

Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Robust2004 query statistics

Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)

2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)

k = 2 if k gt 2 bring noise to the NERQ process

Number of queries using two different NERType MW-NERQ MRW-NERQ

Named entity 42 149Not named entity 208 101Total 250 250

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-independent synonyms

MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ

MAP R-precision Recall MAP R-precision Recall

PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504

PM Probabilistic Model without query expansion

PRF Pseudo Relevance Feedback using Rocchio algorithm

SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback

SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback

Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-dependent synonyms

P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800

TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

QUEST Query Expansion using Synonyms over Time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest

Thank you

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
        • Synonym Detection
          • Entity Recognition and Synonym Extraction
          • Improving the Accuracy of Time
            • Query Expansion
              • Time-based Synonyms
              • Ranking Time-independent Synonyms
              • Ranking Time-dependent Synonyms
                • Evaluation
                  • Experiment Setting
                  • Experimental Results
                    • Conclusions
                      • Conclusions and Future Work

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Overview of experiments

Our experimental evaluation is divided into three main parts

1 Extracting and improving the accuracy of time of synonyms

2 Query expansion using time-independent synonyms

3 Query expansion using time-dependent synonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonymsData collection

The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)

New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007

ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725

Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1

Measurement Accuracy

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-independent synonyms

Data collection

TREC Robust Track (2004)

250 topics (topics 301-450 and topics 601-700)

Tools

Terrier ndash an open source search engine developed by University of Glasgow

BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting

Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight

qexp = qorg s1andw1 s2

andw2 skandwk

Measurement Mean Average Precision (MAP) R-precision and Recall

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-dependent synonymsData collection

NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications

Select 20 strongly time-dependent queries

Measurement Precision at 10 20 and 30 retrieved documents

Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period

American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method NE NE-Syn Avg Syn Accuracyper NE ()

BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73

BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2

BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo

Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Robust2004 query statistics

Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)

2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)

k = 2 if k gt 2 bring noise to the NERQ process

Number of queries using two different NERType MW-NERQ MRW-NERQ

Named entity 42 149Not named entity 208 101Total 250 250

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-independent synonyms

MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ

MAP R-precision Recall MAP R-precision Recall

PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504

PM Probabilistic Model without query expansion

PRF Pseudo Relevance Feedback using Rocchio algorithm

SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback

SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback

Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-dependent synonyms

P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800

TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

QUEST Query Expansion using Synonyms over Time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms


QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you


Extracting and improving time of synonyms

Data collection
  • The whole history of English Wikipedia
      • All pages and revisions, 03/2001 to 03/2008: 85 snapshots (01/03/2001, ..., 01/03/2008), about 2.8 Terabytes
      • 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
  • New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007

Tools
  • MWDumper: http://www.mediawiki.org/wiki/Mwdumper
  • Oracle Berkeley DB version 4.7.25
  • Burst detection algorithm implemented by Kleinberg
      • Number of states: 2
      • Ratio of rate of second state to base state: 2
      • Ratio of rate of each subsequent state to previous state: 2
      • Gamma parameter of the HMM: 1

Measurement: Accuracy
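The burst detection settings listed on this slide (2 states, rate ratio 2, gamma 1) correspond to the parameters of Kleinberg's burst automaton. The following is a minimal Python sketch of that automaton, not the implementation used in the experiments. It assumes the input is a list of positive gaps between consecutive occurrences of a term; how such gaps would be derived from the NYT corpus is not specified here. The function name `burst_states` and the sample gaps are hypothetical.

```python
import math


def burst_states(gaps, n_states=2, s=2.0, gamma=1.0):
    """Label each inter-arrival gap with a state (0 = base, higher = burstier).

    State i emits gaps from an exponential distribution with rate
    (n / T) * s**i; moving up from i to j costs (j - i) * gamma * ln(n),
    moving down is free. The cheapest state sequence is found with Viterbi.
    """
    n = len(gaps)
    if n == 0:
        return []
    if any(x <= 0 for x in gaps):
        raise ValueError("gaps must be positive")

    base_rate = n / sum(gaps)
    rates = [base_rate * s ** i for i in range(n_states)]

    def emit_cost(i, x):
        # Negative log-density of an exponential with rate rates[i].
        return rates[i] * x - math.log(rates[i])

    def trans_cost(i, j):
        return (j - i) * gamma * math.log(n) if j > i else 0.0

    # The automaton starts in the base state 0.
    cost = [trans_cost(0, j) + emit_cost(j, gaps[0]) for j in range(n_states)]
    back_pointers = []
    for x in gaps[1:]:
        new_cost, pointers = [], []
        for j in range(n_states):
            best_i = min(range(n_states), key=lambda i: cost[i] + trans_cost(i, j))
            new_cost.append(cost[best_i] + trans_cost(best_i, j) + emit_cost(j, x))
            pointers.append(best_i)
        cost = new_cost
        back_pointers.append(pointers)

    # Trace the cheapest path backwards from the best final state.
    state = min(range(n_states), key=lambda j: cost[j])
    states = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        states.append(state)
    states.reverse()
    return states


# Hypothetical gaps (e.g. days between mentions): sparse, dense, sparse.
gaps = [10] * 5 + [1] * 10 + [10] * 5
print(burst_states(gaps))
```

A run of consecutive gaps labelled 1 marks a candidate burst period. With a rate ratio of only 2, a burst needs a fairly long run of short gaps before it outweighs the transition cost.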


Query expansion using time-independent synonyms

Data collection
  • TREC Robust Track (2004)
  • 250 topics (topics 301-450 and topics 601-700)

Tools
  • Terrier, an open source search engine developed by the University of Glasgow
  • BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting

Expand the query with the top-k synonyms s_1, ..., s_k, using their TIDP scores w_1, ..., w_k as boosting weights (s_i^w_i denotes synonym s_i boosted by w_i):

    q_exp = q_org s_1^w_1 s_2^w_2 ... s_k^w_k

Measurement: Mean Average Precision (MAP), R-precision and Recall
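The expansion rule on this slide (original query followed by the top-k synonyms, each boosted by its TIDP score) can be sketched as simple string building. This is an illustration under stated assumptions, not the authors' code: the function name `expand_query`, the sample scores, and the quoting of multi-word synonyms are all hypothetical, and the exact boost syntax accepted should be checked against the search engine's query language.

```python
def expand_query(original_query, synonym_scores, k):
    """Append the k highest-scoring synonyms, each boosted by its score."""
    top = sorted(synonym_scores.items(), key=lambda item: item[1], reverse=True)[:k]
    boosted = []
    for synonym, weight in top:
        # Assumption: multi-word synonyms are quoted so the boost covers the phrase.
        term = f'"{synonym}"' if " " in synonym else synonym
        boosted.append(f"{term}^{weight:.2f}")
    return " ".join([original_query] + boosted)


# Hypothetical synonym scores for illustration only.
scores = {"senator clinton": 0.8, "hillary rodham": 0.5, "first lady": 0.2}
print(expand_query("hillary clinton", scores, k=2))
```

Setting every weight to the same value would correspond to unweighted synonym expansion, which makes the effect of the scores easy to compare.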


Query expansion using time-dependent synonyms

Data collection
  • NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
  • Select 20 strongly time-dependent queries

Measurement: Precision at 10, 20 and 30 retrieved documents

Examples of temporal queries (a temporal query is a named entity plus a time period)

Named Entity                    Time Period   Synonym
American Broadcasting Company   1995-2000     Disney/ABC
Barack Obama                    2005-2007     Senator Obama
Eminem                          1999-2004     Slim Shady
George H. W. Bush               1988-1992     President George H.W. Bush
George W. Bush                  2000-2007     President George W. Bush
Hillary Rodham Clinton          2001-2007     Senator Clinton
Kmart                           1987-1987     Kresge
Pope Benedict XVI               1988-2005     Cardinal Ratzinger
Ronald Reagan                   1987-1989     Reagan Revolution
Virgin Media                    1999-2002     Telewest Communications
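Expanding a temporal query amounts to looking up the synonyms of the named entity that apply at the query time. A minimal Python sketch of that lookup, assuming each synonym is stored with a year interval and reading the listed time periods as those intervals (an assumption about how the table should be read); the class and function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedSynonym:
    entity: str
    synonym: str
    start_year: int
    end_year: int


def synonyms_at(entity, year, table):
    """Return the synonyms of `entity` whose interval contains `year`."""
    return [
        row.synonym
        for row in table
        if row.entity == entity and row.start_year <= year <= row.end_year
    ]


def expand_temporal_query(keyword, year, table):
    """Return the query keyword followed by its synonyms for that year."""
    return [keyword] + synonyms_at(keyword, year, table)


# Sample rows copied from the examples of temporal queries.
table = [
    TimedSynonym("Pope Benedict XVI", "Cardinal Ratzinger", 1988, 2005),
    TimedSynonym("Barack Obama", "Senator Obama", 2005, 2007),
    TimedSynonym("Kmart", "Kresge", 1987, 1987),
]
print(expand_temporal_query("Pope Benedict XVI", 1995, table))
print(expand_temporal_query("Pope Benedict XVI", 2007, table))
```

A query time outside every interval leaves the query unexpanded, so the lookup degrades to a plain keyword search.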


Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE          NE-Syn      Avg. Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2               51
BPCF-NERW    473,829     488,383     1.0               73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of "people", "organization" or "company"

Note: 500 entity-synonym relationships were randomly selected for assessing the accuracy of time periods


Robust2004 query statistics

Two methods for recognizing named entities in queries:
  1. Exactly matched Wikipedia page (MW-NERQ)
  2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
     k = 2; k > 2 brings noise into the NERQ process

Number of queries using the two different NER methods

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250


Query expansion using time-independent synonyms

MAP, R-precision and Recall (statistical significance tested at p < 0.05)

                      MW-NERQ                           MRW-NERQ
Method     MAP      R-precision   Recall     MAP      R-precision   Recall
PM         0.2889   0.3309        0.6185     0.2455   0.2904        0.5629
PRF        0.3469   0.3711        0.6944     0.3002   0.3227        0.6761
SQE-PRF    0.3608   0.3652        0.7405     0.2507   0.2665        0.5932
SWQE-PRF   0.3653   0.3861        0.7388     0.2885   0.3080        0.6504

PM: Probabilistic Model without query expansion

PRF: Pseudo Relevance Feedback using the Rocchio algorithm

SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback

SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
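For reference, the three measures reported here can be computed per query from a ranked result list and a set of relevant documents. The sketch below uses the standard textbook definitions in Python; it is not the evaluation tooling used in the experiments, it assumes each document appears at most once in the ranking, and the function names and sample data are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant document occurs."""
    if not relevant_ids:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant_ids)


def r_precision(ranked_ids, relevant_ids):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant_ids)
    if r == 0:
        return 0.0
    return sum(1 for doc_id in ranked_ids[:r] if doc_id in relevant_ids) / r


def recall(ranked_ids, relevant_ids):
    """Fraction of the relevant documents that were retrieved at all."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids) & set(relevant_ids)) / len(relevant_ids)


def mean_average_precision(runs):
    """Average of average_precision over (ranking, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical single-query example.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
print(average_precision(ranked, relevant))
print(r_precision(ranked, relevant))
print(recall(ranked, relevant))
```

MAP is the mean of the per-query average precision over all topics, which is why one hard topic can pull it down noticeably.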



Query expansion using time-dependent synonymsData collection

NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications

Select 20 strongly time-dependent queries

Measurement Precision at 10 20 and 30 retrieved documents

Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period

American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method NE NE-Syn Avg Syn Accuracyper NE ()

BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73

BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2

BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo

Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Robust2004 query statistics

Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)

2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)

k = 2 if k gt 2 bring noise to the NERQ process

Number of queries using two different NERType MW-NERQ MRW-NERQ

Named entity 42 149Not named entity 208 101Total 250 250

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-independent synonyms

MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ

MAP R-precision Recall MAP R-precision Recall

PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504

PM Probabilistic Model without query expansion

PRF Pseudo Relevance Feedback using Rocchio algorithm

SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback

SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback

Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-dependent synonyms

P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800

TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

QUEST Query Expansion using Synonyms over Time

Conclusions and future work

  • Extract time-based synonyms from Wikipedia
  • Improve time of synonyms using NYT
  • Perform query expansion using the time-based synonyms
  • Conduct extensive experiments showing significant increase in retrieval effectiveness

Future work

  • Combine time-dependent synonyms and temporal language models to determine time of queries
  • Exploit temporal information extraction techniques to discover synonyms at particular time points
  • Improve temporal text mining/clustering using time-based synonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

Extracting and improving time of synonyms

Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   NE          NE-Syn      Avg. Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2               51
BPCF-NERW    473,829     488,383     1.0               73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2

BPCF-NERW: BPF-NERW with only the Categories of "people", "organization", or "company"

Note: Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods
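One possible reading of the two filtering criteria is sketched below. The slide does not say whether a candidate is dropped when both conditions hold or when either holds; this sketch assumes both, and the record layout, field names, and toy values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynonymCandidate:
    entity: str
    synonym: str
    interval_months: int   # length of the synonym's time interval, in months
    avg_frequency: float   # average frequency of the synonym over that interval


def is_noisy(cand, min_months=6, min_avg_frequency=2.0):
    """Assumed reading: a candidate is noise if its time interval is shorter
    than 6 months AND its average frequency is below 2."""
    return cand.interval_months < min_months and cand.avg_frequency < min_avg_frequency


def filter_candidates(candidates):
    """Keep only the entity-synonym candidates that are not flagged as noise."""
    return [c for c in candidates if not is_noisy(c)]


# Toy candidates with made-up values, for illustration only.
candidates = [
    SynonymCandidate("entity a", "short-lived rare alias", 2, 1.0),
    SynonymCandidate("entity a", "long-lived alias", 48, 1.5),
    SynonymCandidate("entity b", "brief but frequent alias", 3, 9.0),
]
kept = filter_candidates(candidates)
```

Switching `and` to `or` in `is_noisy` gives the stricter reading, under which only the second candidate would need a higher frequency and the third a longer interval to survive.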

  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
        • Synonym Detection
          • Entity Recognition and Synonym Extraction
          • Improving the Accuracy of Time
            • Query Expansion
              • Time-based Synonyms
              • Ranking Time-independent Synonyms
              • Ranking Time-dependent Synonyms
                • Evaluation
                  • Experiment Setting
                  • Experimental Results
                    • Conclusions
                      • Conclusions and Future Work

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Robust2004 query statistics

Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)

2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)

k = 2 if k gt 2 bring noise to the NERQ process

Number of queries using two different NERType MW-NERQ MRW-NERQ

Named entity 42 149Not named entity 208 101Total 250 250

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-independent synonyms

MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ

MAP R-precision Recall MAP R-precision Recall

PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504

PM Probabilistic Model without query expansion

PRF Pseudo Relevance Feedback using Rocchio algorithm

SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback

SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback

Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

Query expansion using time-dependent synonyms

P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800

TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Experiment SettingExperimental Results

QUEST Query Expansion using Synonyms over Time

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Outline1 Introduction

Problem StatementContributions

2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time

3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms

4 EvaluationExperiment SettingExperimental Results

5 ConclusionsConclusions and Future Work

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you

  • Outline
  • Main Talk
    • Introduction
      • Problem Statement
      • Contributions
    • Synonym Detection
      • Entity Recognition and Synonym Extraction
      • Improving the Accuracy of Time
    • Query Expansion
      • Time-based Synonyms
      • Ranking Time-independent Synonyms
      • Ranking Time-dependent Synonyms
    • Evaluation
      • Experiment Setting
      • Experimental Results
    • Conclusions
      • Conclusions and Future Work

Query expansion using time-independent synonyms

MAP, R-precision, and Recall (* indicates statistically significant at p < 0.05)

Method      MW-NERQ                          MRW-NERQ
            MAP     R-precision  Recall      MAP     R-precision  Recall
PM          0.2889  0.3309       0.6185      0.2455  0.2904       0.5629
PRF         0.3469  0.3711       0.6944      0.3002  0.3227       0.6761
SQE-PRF     0.3608  0.3652       0.7405      0.2507  0.2665       0.5932
SWQE-PRF    0.3653  0.3861       0.7388      0.2885  0.3080       0.6504

PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback

Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
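
The PRF baseline above can be illustrated with a minimal sketch of Rocchio-style pseudo relevance feedback. This is not the authors' implementation: the function name `rocchio_expand`, the `alpha` and `beta` values, and the use of length-normalized term frequency (instead of the DFR Bose-Einstein 1 weighting named in the note) are all assumptions made for illustration. Only the defaults of 10 feedback documents and 40 expansion terms come from the slide.

```python
from collections import Counter


def rocchio_expand(query_terms, ranked_docs, top_docs=10, num_terms=40,
                   alpha=1.0, beta=0.75):
    """Expand a query with terms from the top-ranked (pseudo-relevant) documents.

    query_terms: list of query term strings.
    ranked_docs: list of documents (each a list of tokens), best first.
    Returns a dict mapping each term of the expanded query to its weight.
    alpha and beta are hypothetical values, not taken from the talk.
    """
    feedback = ranked_docs[:top_docs]
    weights = {term: alpha for term in query_terms}
    if not feedback:
        return weights

    # Centroid of the feedback documents (length-normalized term frequency;
    # an assumption standing in for the DFR weighting used in the talk).
    centroid = Counter()
    for doc in feedback:
        if not doc:
            continue
        for term, count in Counter(doc).items():
            centroid[term] += count / len(doc) / len(feedback)

    # Original query terms keep their weight plus the feedback contribution.
    for term in weights:
        weights[term] += beta * centroid.get(term, 0.0)

    # Add the strongest non-query terms as expansion terms.
    candidates = sorted(
        ((term, w) for term, w in centroid.items() if term not in weights),
        key=lambda pair: (-pair[1], pair[0]),
    )
    for term, w in candidates[:num_terms]:
        weights[term] = beta * w
    return weights
```

Calling `rocchio_expand(["clinton"], docs)` on a ranked list of tokenized documents returns the reweighted query; a synonym-based variant such as SQE-PRF would presumably add the top-k synonyms to `query_terms` before this step.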

Query expansion using time-dependent synonyms

P@10, P@20, and P@30 (* indicates statistically significant at p < 0.05)

Method  P@10    P@20    P@30
TQ      0.1000  0.0500  0.0333
TSQ     0.5200  0.3800  0.2800

TQ: search a Temporal Query, i.e., a keyword w_q and time t_q
TSQ: search a Temporal Query and expand with Synonyms w.r.t. time t_q
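
The difference between TQ and TSQ, and the precision-at-k measure used to compare them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the QUEST implementation: the synonym table layout, the function names, and the example validity intervals in `EXAMPLE_SYNONYMS` are all hypothetical.

```python
# Hypothetical table: entity -> list of (synonym, first_year, last_year).
# The intervals below are illustrative only, not data from the talk.
EXAMPLE_SYNONYMS = {
    "Pope Benedict XVI": [
        ("Joseph Ratzinger", 1987, 2005),
        ("Benedict XVI", 2005, 2007),
    ],
}


def synonyms_at(keyword, year, synonym_table):
    """Return the synonyms of `keyword` whose validity interval covers `year`."""
    return [
        synonym
        for synonym, first_year, last_year in synonym_table.get(keyword, [])
        if first_year <= year <= last_year
    ]


def expand_temporal_query(keyword, year, synonym_table):
    """TSQ-style expansion: the keyword plus its synonyms valid at `year`."""
    terms = [keyword]
    for synonym in synonyms_at(keyword, year, synonym_table):
        if synonym not in terms:
            terms.append(synonym)
    return terms


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k
```

A plain TQ run would search with `[keyword]` only, restricted to time t_q; the TSQ run searches with the list returned by `expand_temporal_query`. With a single relevant document in the top 10, `precision_at_k` gives 1/10, which matches the scale of the TQ row above.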

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search

IntroductionSynonym Detection

Query ExpansionEvaluation

Conclusions

Conclusions and Future Work

Conclusions and future work

Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work

Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms

QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest

Thank you
