Intent Subtopic Mining for Web Search Diversification

Intent Subtopic Mining for Web

Search Diversification

Aymeric Damien, Min Zhang, Yiqun Liu, Shaoping MaState Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer

Science and Technology, Tsinghua University, Beijing 100084, Chinaaymeric.damien@gmail.com, {z-m, yiqunliu, msp}@tsinghua.edu.cn

CONTENT

1. Introduction

2. Subtopic Miningi. External resources based subtopic mining

ii. Top results based subtopic mining

3. Fusion & Optimization

4. Conclusion

INTRODUCTION

Intent Subtopic Mining

•Extraction of topics related to a larger ambiguous or broad topic

“Star Wars” => “Star Wars Movies” => “Star Wars Episode 1” …“Star Wars Books” => “The Last Commando” …“Star Wars Video Games” => …“Star Wars Goodies” => …

SUBTOPIC MINING

External Resources

Based Subtopic Mining

SUBTOPIC MINING

ResourcesExternal Resources Based Subtopic Mining

Query Suggestion

•From Google, Bing and Yahoo

Query Completion

•From Google, Bing and Yahoo

Google Insights

•Top Searches

Google Keyword Tools

•Related Keywords

Wikipedia• Disambiguation Feature • Sub-Categories

Filtering, Clustering and

RankingExternal Resources Based Subtopic Mining

Filtering

•Keyword Large Inclusion FilteringoFilter all candidate subtopics that do not contain, in any order, the

original query words without the stop words

Snippet Based Clustering

•Use of top results page snippets to compare the similarity of two candidate intent subtopics

•Jaccard Similarity:

Snippet Based Clustering

•Bottom-up hierarchical clustering algorithm with extended Jaccard similarity coefficient

1. Select k (define experimentally)

2. Create for every subtopic candidate a cluster

3. For each cluster

1. For each remaining cluster

1. If Ext. Jacc. similarity of the two clusters > k Then combine

clusters

4. Repeat 3 while the similarity between two clusters is above k.

Ranking

•Ranking based on intent subtopics popularity (amount of search per month)

•Scores source weightoJaccard Similarity between the subtopic and the original query: 5%oNormalized Google Insights score: 15%oNormalized Google Keywords Generator score: 75%oBelongs to the query suggestion/completion: 5%

•Scores normalization•Every subtopic candidate score is normalized in a percentage of the

same resource’s top subtopic candidate score

Evaluation and Results

External Resources Based Subtopic Mining

Evaluation

•Experimentation SetupoBased on a 50 query set, used for TREC Web Track 2012oAnnotation of resultsoCompute D#-nDCG score

•RunsoBaseline: Query Suggestion + Query CompletionoRun 1: Baseline + WikipediaoRun 2: Baseline + Google InsightsoRun 3: Baseline + Google Keywords GeneratoroRun 4: Baseline + Google Keywords Generator + Google Insights +

Wikipedia

Results

D#-nDCG% inc /

baselineI-rec

% inc / baseline

D-nDCG% inc /

baseline

Baseline 0.23 - 0.2398 - 0.2203 -

E.R. Mining Run 1 0.2627 14.2% 0.2735 14.1% 0.2519 14.3%

E.R. Mining Run 2 0.3294 43.2% 0.3116 29.9% 0.3472 37.6%

E.R. Mining Run 3 0.367 59.6% 0.3811 58.9% 0.3529 60.2%

E.R. Mining Run 4 0.3707 61.2% 0.3908 63.0% 0.3506 59.1%

Wikipedia

Google InsightsGoogle

KeywordsInsights+Keywords+Wilkpedia

Intent Subtopic Mining for Web Search Diversification

Documents

Subtopic HeadingSource # Note – Summary, Paraphrase, or Quote Page #

Outlining. Thesis Statement I.First major topic A. first subtopic 1. first subdivision 2. second subdivision B. second subtopic C. third subtopic II.Second

Malawi -- Letter of Intent, Memorandum of Economic … · classification and presentation of the budget, ... anti-money-laundering and combating of financing of terrorism. ... diversification

GO Virginia Region 3 Growth & Diversification Plan · businesses, and citizens committed to crafting a positive regional brand. The intent of the Region 3 Growth and Diversification

recommender systems Diversity in - Tampereen yliopistokostas.stefanidis/dbir16/... · Jacek Wasilewski and Neil Hurley. 2016. Intent-Aware Diversification Using a Constrained PLSA

SumCR: A new subtopic-based extractive approach for text ...eeeweba.ntu.edu.sg/ELHCHEN/ChenLH-ResearchPaper/SumCR-A New Subtopic... · 18.07.2011 · SumCR: A new subtopic-based extractive

Approved Experiential Essay Topics Social · PDF file3 UD 3 UD 3 UD 3 UD 3 UD 3 UD 3 UD 3 UD 3 UD Credit Award 3 LD Subtopic 1: Subtopic 2: Subtopic 3: Subtopic 4: Subtopic 5: Subtopic

Introduction- can use any one of the following: file · Web view5. SHARE. Essay about me. Outline: Theme. 1.Topic. A.subtopic. B.subtopic. C subtopic. Topic. subtopic. subtopic. subtopic

SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrieval Diversification

Slide subtopic 4

Slide subtopic 5

WEEK 2 & 3 ; SUBTOPIC 1&2.ppt

Subtopic annotation and automatic segmentation for news ...mtaboada/docs/publications/Cardoso_Pardo... · Subtopic annotation and automatic segmentation for ... (NLP), such as automatic

Diversification Optimization Diversification Visualization Diversification Measurement Diversification Search Diversification Analysis & Prediction James

Introduction Product Facet Identification Subtopic Summarization Discussion and Conclusion Introduction Product Facet Identification Subtopic Summarization

2ème leçon : Diversification génétique, diversification

Educational Materials for Abraham Lincoln Subtopic

WEEK 5 subtopic 4.ppt

SUBTOPIC SPECIAL TOOLS SERVICES - arl.org · RESEARCH LIBRARY VIRTUAL RESOURCES & INSTRUCTIONAL INITIATIVES SUBTOPIC: SPECIAL TOOLS & SERVICES The Virtual Resources and Instructional

10th Grade English ESLR · 2019. 12. 7. · I. Introduction II. Subtopic 1 III. Subtopic 2 IV. Subtopic 3** (Add additional subtopics and Roman numerals as needed.) V. Conclusions