Page 1: HyKSS: Hybrid Keyword and Semantic Search

HyKSS: Hybrid Keyword and Semantic Search

Andrew Zitzelberger

Page 2: HyKSS: Hybrid Keyword and Semantic Search

Keyword Search

Page 3: HyKSS: Hybrid Keyword and Semantic Search

Form Based Search

Page 4: HyKSS: Hybrid Keyword and Semantic Search

What about?

• over 8,000 meters in elevation
• less than 100K miles
• faster than 100 mph

Page 5: HyKSS: Hybrid Keyword and Semantic Search

Page 6: HyKSS: Hybrid Keyword and Semantic Search

HyKSS

• Hybrid Keyword and Semantic Search
• Semantics – extracted annotations
  – Multiple ontologies
• Keywords – text

Page 7: HyKSS: Hybrid Keyword and Semantic Search

Thesis Statement

• HyKSS (hybrid search)
  – Outperforms keyword and semantic search
  – Dynamic query weighting outperforms various other hybrid search approaches
  – Allows queries over multiple ontologies
  – Allows pay-as-you-go improvement

Page 8: HyKSS: Hybrid Keyword and Semantic Search

Extraction Ontologies

Page 9: HyKSS: Hybrid Keyword and Semantic Search

Data Frames

Page 10: HyKSS: Hybrid Keyword and Semantic Search

Indexing Architecture

[Diagram: the Document Collection feeds a Keyword Indexer and a Semantic Indexer, which build the Keyword Index and the Semantic Index respectively.]

Page 11: HyKSS: Hybrid Keyword and Semantic Search

Indexing Architecture Implementation

[Diagram: the same indexing architecture (Document Collection, Keyword Indexer, Semantic Indexer, Keyword Index, Semantic Index), annotated with the implementation components OntoES, Ontology Library, Sesame, and Lucene.]

Page 12: HyKSS: Hybrid Keyword and Semantic Search

Query Processing

[Diagram: a Free Form Query enters two parallel pipelines, Keyword Processing and Semantic Processing. Each pipeline runs Pre-Process Query, Execute Query, and Post-Process Query; a final Combine Results step merges the two outputs.]

Page 13: HyKSS: Hybrid Keyword and Semantic Search

Keyword Query Pre-Processing

• Remove Lucene special characters (except quotes)
• Remove (inequality) comparison constraints
• Remove non-phrase stopwords

Before: hondas in "excellent condition" in orem for under 12 grand

After: hondas "excellent condition" orem
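
The three steps above can be sketched in Python as follows. This is only an illustration, not the HyKSS implementation: the `COMPARISON` regular expression and the `STOPWORDS` set are hypothetical stand-ins, and the function name `preprocess_keyword_query` is made up for this example. On the slides, comparison constraints come from the semantic side of the system rather than from a hand-written pattern.

```python
import re

# Lucene query-syntax special characters, leaving out the double quote so
# that quoted phrases survive this step.
LUCENE_SPECIAL = re.compile(r'[+\-!(){}\[\]^~*?:\\/&|]')

# Hypothetical stand-in for comparison-constraint detection: a comparison
# word, a number, an optional "K" suffix, and an optional unit word.
COMPARISON = re.compile(
    r'\b(?:under|over|below|above|less than|more than|faster than|slower than)'
    r'\s+\d[\d,.]*(?:\s*[kK]\b)?'
    r'(?:\s+(?:grand|dollars|miles|mph|meters)\b)?',
    re.IGNORECASE,
)

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to", "with"}


def preprocess_keyword_query(query: str) -> str:
    """Apply the three pre-processing steps to a free-form query."""
    # 1. Remove Lucene special characters (quotes are kept).
    query = LUCENE_SPECIAL.sub(" ", query)
    # 2. Remove (inequality) comparison constraints.
    query = COMPARISON.sub(" ", query)
    # 3. Remove stopwords, but only outside quoted phrases.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    kept = [t for t in tokens if t.startswith('"') or t.lower() not in STOPWORDS]
    return " ".join(kept)


print(preprocess_keyword_query(
    'hondas in "excellent condition" in orem for under 12 grand'
))
# prints: hondas "excellent condition" orem
```

Tokenizing into quoted phrases and bare words before filtering is what keeps stopwords inside a phrase intact, which is the "non-phrase" part of the third step.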

Page 14: HyKSS: Hybrid Keyword and Semantic Search

Keyword Query Execution and Post-Processing

• Executed by Lucene
• Empty Post-Processing step

Page 15: HyKSS: Hybrid Keyword and Semantic Search

Semantic Query Pre-Processing: Individual Ontology Scoring

hondas in "excellent condition" in orem for under 12 grand

Page 16: HyKSS: Hybrid Keyword and Semantic Search

Semantic Query Pre-Processing: Ontology Set Creation

• For each ontology sorted by score:
  – For each remaining ontology:
    • Add point for each new or subsuming match
    • If added points > 0 add ontology
• Completely subsumed ontologies are removed during query generation
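
One possible reading of the loop above, as a Python sketch. Several things here are assumptions rather than details from the slides: matches are modeled as half-open token spans of the query, a match counts as "new" if it overlaps nothing already covered and as "subsuming" if it strictly contains a covered span, and the outer loop is simplified to seed the set from the single top-scoring ontology. The names `contributes` and `create_ontology_set` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # half-open [start, end) token range within the query


def contributes(span: Span, covered: Set[Span]) -> bool:
    """True if the span is new, or strictly subsumes an already covered span."""
    overlapping = [c for c in covered if span[0] < c[1] and c[0] < span[1]]
    if not overlapping:
        return True  # new match: nothing selected so far touches this text
    return any(span[0] <= c[0] and c[1] <= span[1] and span != c
               for c in overlapping)


def create_ontology_set(matches: Dict[str, Set[Span]],
                        scores: Dict[str, float]) -> List[str]:
    """Greedily grow an ontology set, starting from the best-scoring ontology."""
    ranked = sorted(matches, key=lambda name: scores[name], reverse=True)
    if not ranked:
        return []
    selected = [ranked[0]]
    covered = set(matches[ranked[0]])
    for name in ranked[1:]:
        # One point per match that is new or subsumes an existing match.
        added_points = sum(1 for span in matches[name] if contributes(span, covered))
        if added_points > 0:
            selected.append(name)
            covered |= matches[name]
    return selected


# Hypothetical match spans for:
#   hondas in "excellent condition" in orem for under 12 grand
matches = {
    "Vehicle": {(0, 1), (7, 10)},        # "hondas", "under 12 grand"
    "ContractualServices": {(7, 10)},    # "under 12 grand"
    "Location": {(5, 6)},                # "orem"
}
scores = {"Vehicle": 2, "ContractualServices": 1, "Location": 1}
print(create_ontology_set(matches, scores))
# prints: ['Vehicle', 'Location']
```

In this made-up example ContractualServices adds no points because its only match is already covered by Vehicle, while Location contributes a new match and is added.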

Page 17: HyKSS: Hybrid Keyword and Semantic Search

Semantic Query Pre-Processing: Ontology Set Creation

[Diagram: worked example with three candidate ontologies: Vehicle, ContractualServices, and Location. Matches shown: Price < 12000 and US_City="orem". Score annotations shown: Vehicle_Score + 1 and ContractualServices_Score + 1.]

Page 18: HyKSS: Hybrid Keyword and Semantic Search

Semantic Query Pre-Processing: Structured Query Generation

• Open world assumption
• SPARQL query
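
A rough Python sketch of what generating such a query could look like. The slides do not show the generated SPARQL, so everything specific here is an assumption: the `ex:` namespace, the `ex:Document` class, the predicate names, and the choice to wrap each constraint in an `OPTIONAL` block. Wrapping constraints this way is one plausible way to honor an open world assumption, since a document that lacks an annotation is not excluded outright. The function name `build_query` is hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical namespace; the real ontology IRIs are not given on the slides.
PREFIX = "PREFIX ex: <http://example.org/hykss#>"


def build_query(constraints: List[Tuple[str, Optional[str]]]) -> str:
    """Build a SPARQL SELECT with one OPTIONAL block per requested concept.

    Each constraint is (concept_name, condition); condition is a comparison
    such as '< 12000', or None when the concept is only projected.
    """
    variables = " ".join(f"?{name}" for name, _ in constraints)
    lines = [PREFIX, f"SELECT ?doc {variables}", "WHERE {", "  ?doc a ex:Document ."]
    for name, condition in constraints:
        pattern = f"?doc ex:{name} ?{name} ."
        if condition:
            pattern += f" FILTER (?{name} {condition})"
        # OPTIONAL keeps documents that are missing this annotation.
        lines.append(f"  OPTIONAL {{ {pattern} }}")
    lines.append("}")
    return "\n".join(lines)


print(build_query([
    ("Make", None),
    ("Price", "< 12000"),
    ("US_City", '= "orem"'),
]))
```

Because unmatched documents are still returned under this reading, a later ranking step is what separates documents that satisfy more of the request from those that satisfy less.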

Page 19: HyKSS: Hybrid Keyword and Semantic Search

Semantic Query Execution and Post-Processing

• Sesame query execution
• Semantic ranking:
  – 1 point for each requested projection satisfied
  – Normalized by # of projections requested

Example: hondas in "excellent condition" in orem for under 12 grand
  – Projections on Make, Price and US_City
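
The ranking rule above amounts to a simple fraction. A minimal Python sketch, assuming each query result is available as a mapping from projection name to a bound value (or `None` when unbound); the function name `semantic_score` and the sample values are hypothetical:

```python
from typing import Iterable, Mapping, Optional


def semantic_score(requested: Iterable[str],
                   result: Mapping[str, Optional[object]]) -> float:
    """One point per requested projection the result satisfies, normalized."""
    requested = list(requested)
    if not requested:
        return 0.0  # assumption: no projections requested means no semantic score
    satisfied = sum(1 for name in requested if result.get(name) is not None)
    return satisfied / len(requested)


# A made-up result that binds Make and Price but has no US_City annotation.
score = semantic_score(["Make", "Price", "US_City"],
                       {"Make": "Honda", "Price": 9500})
print(round(score, 3))
# prints: 0.667
```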

Page 20: HyKSS: Hybrid Keyword and Semantic Search

Hybrid Query Processing

• Linear interpolation:
  – (kw_weight * kw_score) + (sm_weight * sm_score)
• Dynamic solution:
  – # keywords remaining (#kw)
  – concept match score (cms) = ½ * (selections + projections)
  – kw_weight = #kw/(#kw + cms)
  – sm_weight = cms/(#kw + cms)
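
The formulas above translate directly into code. In this Python sketch the function names are hypothetical, the input counts in the example are made up, and the equal-weight fallback when both #kw and cms are zero is an assumption, since the slide does not cover that case.

```python
from typing import Tuple


def concept_match_score(selections: int, projections: int) -> float:
    """cms = 1/2 * (selections + projections)."""
    return 0.5 * (selections + projections)


def dynamic_weights(num_keywords: int, cms: float) -> Tuple[float, float]:
    """Return (kw_weight, sm_weight); the two weights always sum to 1."""
    total = num_keywords + cms
    if total == 0:
        return 0.5, 0.5  # assumption: the slide does not define this case
    return num_keywords / total, cms / total


def hybrid_score(kw_score: float, sm_score: float,
                 num_keywords: int, selections: int, projections: int) -> float:
    """Linear interpolation of keyword and semantic scores with dynamic weights."""
    cms = concept_match_score(selections, projections)
    kw_weight, sm_weight = dynamic_weights(num_keywords, cms)
    return kw_weight * kw_score + sm_weight * sm_score


# Hypothetical counts: 3 keywords remain after pre-processing, and the
# semantic side recognized 3 selections and 3 projections, so cms = 3.
print(dynamic_weights(3, concept_match_score(3, 3)))
# prints: (0.5, 0.5)
```

The effect is that a query dominated by plain keywords leans on the keyword score, while a query dominated by recognized concepts leans on the semantic score.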

Page 21: HyKSS: Hybrid Keyword and Semantic Search

Basic Search

Page 22: HyKSS: Hybrid Keyword and Semantic Search

Results Display

Page 23: HyKSS: Hybrid Keyword and Semantic Search

Form Based Search

Page 24: HyKSS: Hybrid Keyword and Semantic Search

Results Display

Page 25: HyKSS: Hybrid Keyword and Semantic Search

Experimental Setup – Ontology Libraries

• 5 Ontology Levels
  – Number
  – Generic Units
  – Vehicle Units
  – Vehicle
  – Vehicle+

Page 26: HyKSS: Hybrid Keyword and Semantic Search

Experimental Setup – Query Sets

• 113 syntactically unique queries from database students

• 60 syntactically unique queries from linguistics students

Page 27: HyKSS: Hybrid Keyword and Semantic Search

Experimental Setup – Document Collection

• 250 vehicle advertisements (Craigslist)
  – 100 training, 50 validation, 100 test
• 318 mountain pages (Wikipedia)
• 66 roller coaster pages (Wikipedia)
• 88 video game advertisements (Craigslist)

Page 28: HyKSS: Hybrid Keyword and Semantic Search

Experiments

1) Training queries over test vehicle documents
2) Test queries over test vehicle documents
3) Training queries over test vehicle documents + additional noise
4) Test queries over test vehicle documents + additional noise
5) 5 queries over noisy data (Generic Units only)

Page 29: HyKSS: Hybrid Keyword and Semantic Search

Experiments - Metric

• Mean Average Precision
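
For reference, a minimal Python sketch of the standard Mean Average Precision computation. The slides do not say which variant was used (for example, how ties or unretrieved relevant documents were handled), so this follows the common textbook definition; the function names and the document IDs are made up.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k taken at each rank k where a relevant doc appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    # Dividing by all relevant documents penalizes relevant docs never retrieved.
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average the per-query average precision over all queries."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d2", "d1"], {"d1"}),                    # AP = 1/2
]
print(round(mean_average_precision(runs), 4))
# prints: 0.6667
```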

Page 30: HyKSS: Hybrid Keyword and Semantic Search

Experimental Results

Page 31: HyKSS: Hybrid Keyword and Semantic Search

Experimental Results

Page 32: HyKSS: Hybrid Keyword and Semantic Search

Experimental Results

Page 33: HyKSS: Hybrid Keyword and Semantic Search

Conclusions

• Hybrid search outperforms keyword and semantic search

• HyKSS’s dynamic query weighting approach outperforms various other weighting techniques

• Using multiple ontologies does not outperform selecting and using a single ontology

Page 34: HyKSS: Hybrid Keyword and Semantic Search

External Image Citations

• Slide 2 Google search screenshot: http://www.google.com (07/30/11)
• Slide 3 partial car search form screenshots: http://autotrader.com/fyc (07/30/11)
• Slide 4 mountain image: http://en.wikipedia.org/wiki/Lhotse (04/26/11)
• Slide 4 car image: http://en.wikipedia.org/wiki/Honda (04/26/11)
• Slide 4 roller coaster image: http://en.wikipedia.org/wiki/Kingda_Ka (04/26/11)
• Slide 4 Wikipedia logo: http://en.wikipedia.org/wiki/Main_Page (04/26/11)
• Slide 4 craigslist logo: http://provo.craigslist.org/ (04/26/11)