Concept-Based Information Retrieval using Explicit Semantic Analysis

OFER EGOZI, SHAUL MARKOVITCH, and EVGENIY GABRILOVICH
Technion-Israel Institute of Technology



Content

• Information Retrieval
• Keyword-Based Retrieval
  - Bag of Words (BOW)
• Irrelevant Data
• Concept-Based Retrieval
• Explicit Semantic Analysis
• MORAG System
• Conclusion


Information Retrieval Systems

[Diagram labels: Query, IR, Recall, Precision]
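The two measures named on this slide can be made concrete with a small sketch. It uses the standard set-based definitions; the function name `precision_recall` and the document IDs are illustrative assumptions, not part of the original slides.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs: four documents returned, three truly relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Two of the four retrieved documents are relevant, and two of the three relevant documents were found, so the two measures differ even on the same result list.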


Keyword-Based Retrieval

[Diagram labels: Query, IR, Bag of Words (BOW)]
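A minimal sketch of keyword matching over a bag-of-words representation may help here. It assumes plain whitespace tokenization and raw term counts with cosine similarity; the helper names `bow` and `cosine` and the sample texts are invented for illustration, and a real system would add stemming, stop-word removal, and term weighting.

```python
import math
from collections import Counter


def bow(text):
    """Lowercase, split on whitespace, and count terms."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = bow("shipwreck treasure")
doc_same_words = bow("divers found treasure in the shipwreck")
doc_other_words = bow("divers recovered artifacts from a sunken roman vessel")

print(cosine(query, doc_same_words))   # positive: shares two terms with the query
print(cosine(query, doc_other_words))  # zero: no shared terms
```

The second document is about the same topic but shares no terms with the query, so a purely keyword-based score cannot retrieve it.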


Irrelevant Data?

• Vocabulary problems
  - Synonymy
  - World knowledge


Concept Based IR

• Transform text into a domain of concepts rather than a domain of words

• Less dependent on specific terms


Explicit Semantic Analysis


Wikipedia Based ESA
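As a rough sketch of the idea, ESA represents a text as a weighted vector of Wikipedia concepts by combining the concept vectors of its words. The word-to-concept index below is a tiny hand-made stand-in with invented weights; in ESA proper these weights are derived from TF-IDF scores of words in Wikipedia articles. The names `WORD_TO_CONCEPTS`, `esa_vector`, and `top_concepts` are assumptions for this example.

```python
from collections import defaultdict

# Hypothetical inverted index: word -> {Wikipedia concept: weight}.
# All weights are invented for illustration.
WORD_TO_CONCEPTS = {
    "shipwreck": {"SHIPWRECK": 0.9, "WRECK DIVING": 0.6, "MARITIME ARCHAEOLOGY": 0.4},
    "treasure": {"TREASURE": 0.9, "SPANISH TREASURE FLEET": 0.5, "SHIPWRECK": 0.2},
    "salvaging": {"MARINE SALVAGE": 0.8, "SHIPWRECK": 0.3},
}


def esa_vector(text, index=WORD_TO_CONCEPTS):
    """Sum the concept vectors of all known words in the text."""
    vec = defaultdict(float)
    for word in text.lower().split():
        for concept, weight in index.get(word, {}).items():
            vec[concept] += weight
    return dict(vec)


def top_concepts(vec, k):
    """Return the k highest-weighted concepts (ties broken alphabetically)."""
    ranked = sorted(vec.items(), key=lambda item: (-item[1], item[0]))
    return [concept for concept, _ in ranked[:k]]


q = esa_vector("salvaging shipwreck treasure")
print(top_concepts(q, 3))
```

Because every query word contributes to SHIPWRECK in this toy index, that concept ends up with the highest weight, which is the kind of aggregation that lets related texts meet in concept space even when their words differ.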


ESA Based Data Retrieval - Example

Query: salvaging shipwreck treasure

Top ESA concepts for the query:
• SHIPWRECK
• TREASURE
• MARITIME ARCHAEOLOGY
• MARINE SALVAGE
• HISTORY OF THE BRITISH VIRGIN ISLANDS
• WRECKING (SHIPWRECK)
• KEY WEST, FLORIDA
• FLOTSAM AND JETSAM
• WRECK DIVING
• SPANISH TREASURE FLEET

Document: “ANCIENT ARTIFACTS FOUND. Divers have recovered artifacts lying underwater for more than 2,000 years in the wreck of a Roman ship that sank in the Gulf of Baratti, 12 miles off the island of Elba, newspapers reported Saturday.”

Top ESA concepts for the document:
• SCUBA DIVING
• WRECK DIVING
• RMS TITANIC
• USS HOEL (DD-533)
• SHIPWRECK
• UNDERWATER ARCHAEOLOGY
• USS MAINE (ACR-1)
• MARITIME ARCHAEOLOGY
• TOMB RAIDER II
• USS MEADE (DD-602)


Irrelevant Docs

Top ESA concepts for the document:
• ESTONIA AT THE 2000 SUMMER OLYMPICS
• ESTONIA AT THE 2004 SUMMER OLYMPICS
• 2006 COMMONWEALTH GAMES
• ESTONIA AT THE 2006 WINTER OLYMPICS
• 1992 SUMMER OLYMPICS
• ATHLETICS AT THE 2004 SUMMER OLYMPICS
• 2000 SUMMER OLYMPICS
• 2006 WINTER OLYMPICS
• CROSS-COUNTRY SKIING 2006 WINTER OLYMPICS
• NEW ZEALAND AT THE 2006 WINTER OLYMPICS

Document: “Olympic News In Brief: Cycling win for Estonia. Erika Salumae won Estonia's first Olympic gold when retaining the women's cycling individual sprint title she won four years ago in Seoul as a Soviet athlete.”

Query: Estonia Economy

Top ESA concepts for the query:
• ESTONIA
• ECONOMY OF ESTONIA
• ESTONIA AT THE 2000 SUMMER OLYMPICS
• ESTONIA AT THE 2004 SUMMER OLYMPICS
• ESTONIA NATIONAL FOOTBALL TEAM
• ESTONIA AT THE 2006 WINTER OLYMPICS
• BALTIC SEA ??
• EUROZONE
• TIIT VÄHI
• MILITARY OF ESTONIA ??


Selecting Query Features

• Selection could remove noisy ESA concepts
• However, the IR task provides no training data…
  - A utility function U(+|-) requires a target measure, which in turn requires a training set

[Diagram: f = ESA(q) passes through a Filter, guided by the utility U, to give f′]

• Focus on query concepts: the query is short and noisy, while feature selection (FS) at indexing lacks context


Pseudo-Relevance Feedback
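One way to obtain labeled examples without a training set is to treat the output of an initial retrieval run as if it were labeled. The sketch below assumes the top-k ranked documents serve as pseudo-positive examples and the bottom-k as pseudo-negative ones; that split, the function name `pseudo_relevance_examples`, and the document IDs are assumptions made for illustration.

```python
def pseudo_relevance_examples(ranked_doc_ids, k):
    """Split a ranked result list into pseudo-positive and pseudo-negative examples.

    ranked_doc_ids: document IDs ordered from best to worst by a first-pass retrieval.
    Returns (positives, negatives): the top-k and the bottom-k documents.
    """
    if k <= 0 or 2 * k > len(ranked_doc_ids):
        raise ValueError("need k > 0 and at least 2*k ranked documents")
    positives = ranked_doc_ids[:k]
    negatives = ranked_doc_ids[-k:]
    return positives, negatives


# Hypothetical first-pass ranking, best match first.
ranking = ["d3", "d1", "d7", "d2", "d9", "d4"]
pos, neg = pseudo_relevance_examples(ranking, k=2)
print(pos, neg)
```

The labels are only as good as the first-pass ranking, so k is usually kept small relative to the length of the result list.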


ESA Feature Selection Methods

• IG: calculate each feature’s information gain in separating positive and negative examples, and keep the best-performing features

• IIG: add concepts in the positive examples to the candidate features, and re-weight all features based on their weights in the examples

• RV: find the subset of features that best separates positive and negative examples, employing heuristic search
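The IG method above can be sketched as follows. This is a simplified illustration, not the authors' implementation: each example is reduced to the set of concepts it contains (weights are ignored), and the names `entropy`, `information_gain`, and `select_by_ig` as well as the example data are assumptions.

```python
import math


def entropy(pos, neg):
    """Binary entropy (in bits) of a set with pos positive and neg negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(concept, positives, negatives):
    """Entropy reduction from splitting the examples on presence of `concept`."""
    total = len(positives) + len(negatives)
    if total == 0:
        return 0.0
    pos_with = sum(concept in ex for ex in positives)
    neg_with = sum(concept in ex for ex in negatives)
    pos_without = len(positives) - pos_with
    neg_without = len(negatives) - neg_with
    before = entropy(len(positives), len(negatives))
    after = ((pos_with + neg_with) / total) * entropy(pos_with, neg_with) \
        + ((pos_without + neg_without) / total) * entropy(pos_without, neg_without)
    return before - after


def select_by_ig(candidates, positives, negatives, k):
    """Keep the k candidates with the highest information gain."""
    return sorted(candidates,
                  key=lambda c: (-information_gain(c, positives, negatives), c))[:k]


# Hypothetical concept sets for pseudo-positive and pseudo-negative documents.
positives = [{"ESTONIA", "ECONOMY OF ESTONIA"},
             {"ESTONIA", "ECONOMY OF ESTONIA", "EUROZONE"}]
negatives = [{"ESTONIA", "ESTONIA AT THE 2004 SUMMER OLYMPICS"},
             {"ESTONIA", "ESTONIA AT THE 2000 SUMMER OLYMPICS"}]
candidates = ["ESTONIA", "ECONOMY OF ESTONIA", "EUROZONE"]
selected = select_by_ig(candidates, positives, negatives, k=2)
print(selected)
```

In this toy data ESTONIA appears in every example and so carries no information, while ECONOMY OF ESTONIA separates the two groups perfectly. Information gain is symmetric, so a concept that marks only negative examples would also score highly; a practical selector would additionally check which side the concept favors.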


MORAG System


MORAG Evaluation


Conclusion

• MORAG: a new methodology for concept-based information retrieval

• Documents and query are enhanced by Wikipedia concepts

• Informative features are selected using pseudo-relevance feedback

• The generated features improve the performance of BOW-based systems


Thank You


Q & A