
Page 1: Keyword Search over RDF Graphs

Keyword Search over RDF Graphs

Shady Elbassuoni* and Roi Blanco**

* Max-Planck Institute for Informatics

** Yahoo! Research, Barcelona

Page 2: Keyword Search over RDF Graphs

RDF Datasets

Shady Elbassuoni, Keyword Search over RDF Graphs, CIKM 2011

subject            predicate     object
Traffic            hasWonPrize   Academy_Award
Innerspace         hasWonPrize   Academy_Award
Innerspace         hasGenre      Comedy
Joe_Dante          directed      Innerspace
Toy_Story          hasWonPrize   Academy_Award
Road_Trip          hasGenre      Comedy
Toy_Story          hasGenre      Comedy
Tom_Hanks          actedIn       Toy_Story
Diner              hasWonPrize   Academy_Award
Diner              type          Comedy_films
Steve_Guttenberg   actedIn       Diner
The_Pink_Panther   type          Criminal_comedy_films
The_Pink_Panther   hasWonPrize   Academy_Award
Police_Academy     type          Comedy_films
Steve_Guttenberg   actedIn       Police_Academy
The_Darwin_Awards  type          Comedy_films

Page 3: Keyword Search over RDF Graphs

Searching RDF Data

Structured triple-pattern queries (SPARQL)
Example: comedies that have won an academy award

SELECT ?m WHERE { ?m hasGenre Comedy . ?m hasWonPrize Academy_Award }
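To make the meaning of the two triple patterns concrete, here is a minimal Python sketch that evaluates them over a few triples from the RDF Datasets slide. It is not a SPARQL engine; the function names `subjects_matching` and `comedies_with_award` are hypothetical and exist only for this illustration.

```python
# A few triples from the example dataset, as (subject, predicate, object).
TRIPLES = [
    ("Traffic", "hasWonPrize", "Academy_Award"),
    ("Innerspace", "hasWonPrize", "Academy_Award"),
    ("Innerspace", "hasGenre", "Comedy"),
    ("Toy_Story", "hasWonPrize", "Academy_Award"),
    ("Road_Trip", "hasGenre", "Comedy"),
    ("Toy_Story", "hasGenre", "Comedy"),
]


def subjects_matching(triples, predicate, obj):
    """Return the bindings of ?m for the pattern `?m predicate obj`."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


def comedies_with_award(triples):
    """Evaluate: ?m hasGenre Comedy . ?m hasWonPrize Academy_Award"""
    comedies = subjects_matching(triples, "hasGenre", "Comedy")
    winners = subjects_matching(triples, "hasWonPrize", "Academy_Award")
    # Both patterns share the variable ?m, so the answer is the intersection.
    return sorted(comedies & winners)


print(comedies_with_award(TRIPLES))
```

Road_Trip is a comedy without an award and Traffic has an award without being a comedy, so neither satisfies both patterns.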

Page 4: Keyword Search over RDF Graphs

Searching RDF Data

Triple-pattern queries are very expressive but not very usable.
Most users and search APIs prefer keyword queries.

Support keyword search over RDF graphs


Page 5: Keyword Search over RDF Graphs

Keyword Search over RDF Data

How to process keyword queries?
  Translate keyword queries into SPARQL
  Directly process the queries over the RDF graph

What are the results to a keyword query?
  Resources
  Triples
  Tuples of triples (subgraphs)

Page 9: Keyword Search over RDF Graphs

Processing Keyword Queries

Construct a document D(t) for each triple t
D(t) contains all literals in t and any text associated with the URIs in t

Example:

t: Innerspace hasGenre Comedy

D(t): innerspace USA 1987 science fiction comedy film Joe Dante Michael Finnell Dennis Quaid Martin Short Meg Ryan academy award best visual effects …

We can now create triple-term indexes
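The document construction and the triple-term index can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `URI_TEXT` mapping, the tokenizer, and the function names are all assumptions introduced here, and the text attached to each URI is a shortened stand-in for whatever text the dataset actually provides.

```python
import re
from collections import defaultdict

# Hypothetical text associated with each URI (labels, descriptions, ...).
URI_TEXT = {
    "Innerspace": "innerspace USA 1987 science fiction comedy film",
    "Comedy": "comedy",
    "Academy_Award": "academy award",
    "hasGenre": "has genre",
    "hasWonPrize": "has won prize",
}

TRIPLES = [
    ("Innerspace", "hasGenre", "Comedy"),
    ("Innerspace", "hasWonPrize", "Academy_Award"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_document(triple):
    """D(t): the terms of the text associated with the components of t."""
    terms = []
    for component in triple:
        # Fall back to the identifier itself when no text is known.
        terms.extend(tokenize(URI_TEXT.get(component, component)))
    return terms


def build_index(triples):
    """Map each term to the set of triples whose document contains it."""
    index = defaultdict(set)
    for t in triples:
        for term in build_document(t):
            index[term].add(t)
    return index


index = build_index(TRIPLES)
print(sorted(index["comedy"]))
print(sorted(index["award"]))
```

Because the text for Innerspace mentions "comedy", both triples about Innerspace are indexed under that term, while only the hasWonPrize triple is indexed under "award".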

Page 12: Keyword Search over RDF Graphs

Retrieving Query Results
  Retrieve a list of triples matching a query keyword
  Join the triples from different lists based on their URIs

Query: comedy award

Triples matching "comedy":
  Innerspace hasGenre Comedy
  Road_Trip hasGenre Comedy
  Toy_Story hasGenre Comedy
  Diner type Comedy_films
  Police_Academy type Comedy_films
  The_Darwin_Awards type Comedy_films
  ...

Triples matching "award":
  Traffic hasWonPrize Academy_Award
  Innerspace hasWonPrize Academy_Award
  Toy_Story hasWonPrize Academy_Award
  Diner hasWonPrize Academy_Award
  The_Darwin_Awards type Comedy_films
  ...

Joined results:
  T: Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award
  T: Toy_Story hasGenre Comedy . Toy_Story hasWonPrize Academy_Award
  T: Police_Academy type Comedy_films . The_Darwin_Awards type Comedy_films

Result ranking is crucial!
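The join step can be sketched in Python as below. The slides only say that triples are joined "based on their URIs"; this sketch assumes that two triples join when they share a subject or object, and it skips pairs made of the same triple. Both choices, the shortened keyword lists, and the names `shares_resource` and `join` are assumptions made for illustration.

```python
from itertools import product

# Shortened per-keyword lists taken from the example above.
comedy_list = [
    ("Innerspace", "hasGenre", "Comedy"),
    ("Road_Trip", "hasGenre", "Comedy"),
    ("Toy_Story", "hasGenre", "Comedy"),
    ("Police_Academy", "type", "Comedy_films"),
]
award_list = [
    ("Traffic", "hasWonPrize", "Academy_Award"),
    ("Innerspace", "hasWonPrize", "Academy_Award"),
    ("Toy_Story", "hasWonPrize", "Academy_Award"),
    ("The_Darwin_Awards", "type", "Comedy_films"),
]


def shares_resource(t1, t2):
    """True if the two triples have a subject or object in common."""
    return bool({t1[0], t1[2]} & {t2[0], t2[2]})


def join(list_a, list_b):
    """Pair up triples from the two lists that share a resource."""
    return [
        (a, b)
        for a, b in product(list_a, list_b)
        if a != b and shares_resource(a, b)
    ]


results = join(comedy_list, award_list)
for a, b in results:
    print("T:", " ".join(a), ".", " ".join(b))
```

The third pair joins Police_Academy with The_Darwin_Awards through Comedy_films, which matches both keywords textually but is not about award-winning comedies. A nested-loop join like this is quadratic in the list lengths, so it only serves to show the idea.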

Page 13: Keyword Search over RDF Graphs

Language Models for Triples

t: Innerspace hasGenre Comedy

Estimate P(w|D(t)) from D(t):

w            P(w|D(t))
innerspace   0.234
1987         0.123
science      0.012
fiction      0.020
comedy       0.111
film         0.179
classic      0.111
meg          0.019
ryan         0.019
oscar        0.148
...          ...

[Figure: P(w) plotted against w]
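One simple way to obtain such a distribution is a maximum-likelihood estimate over the terms of D(t), sketched below. The slides do not say which estimator or smoothing is used, so the unsmoothed estimate, the toy document, and the function name `language_model` are assumptions for illustration only.

```python
from collections import Counter


def language_model(document_terms):
    """Maximum-likelihood estimate of P(w | D(t)) from the terms of D(t)."""
    counts = Counter(document_terms)
    total = sum(counts.values())
    if total == 0:
        return {}  # an empty document yields an empty model
    return {w: c / total for w, c in counts.items()}


# Hypothetical toy document; real documents are built from the dataset text.
doc = "innerspace 1987 science fiction comedy film comedy film classic oscar".split()
lm = language_model(doc)

for word, prob in sorted(lm.items(), key=lambda item: -item[1]):
    print(f"{word:12s}{prob:.3f}")
```

Terms that never occur in D(t) get probability zero under this estimate, which is why practical language models are usually smoothed.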

Page 14: Keyword Search over RDF Graphs

Ranking Model


comedy award

T: Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award

but we treat triples as bags of words!

Page 15: Keyword Search over RDF Graphs

Ranking Model


comedy award

T: Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award

probability of the structure of triple t being relevant to keyword w

Page 16: Keyword Search over RDF Graphs

Estimating Structural Relevance

For each keyword, construct a probability distribution over predicates

Example: award

r                 P(r|w)
hasWonPrize       0.459
wasNominatedFor   0.387
type              0.112
directed          0.020
actedIn           0.021
producedIn        0.025
bornIn            0.008
...               ...

P(r|w) is estimated from the whole dataset.

P(Innerspace hasWonPrize Academy_Award | award) = P(hasWonPrize | award)
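A rough sketch of how such a distribution over predicates could be estimated is given below. The slides only state that it is estimated from the whole dataset; this sketch assumes P(r|w) is the fraction of triples matching w that have predicate r. The matching rule (terms taken from the subject and object identifiers, with a naive plural-stripping step) and all function names are assumptions for illustration.

```python
import re
from collections import Counter

TRIPLES = [
    ("Innerspace", "hasWonPrize", "Academy_Award"),
    ("Toy_Story", "hasWonPrize", "Academy_Award"),
    ("Traffic", "hasWonPrize", "Academy_Award"),
    ("The_Darwin_Awards", "type", "Comedy_films"),
    ("Innerspace", "hasGenre", "Comedy"),
]


def normalize(word):
    """Very naive stemming (an assumption): strip a trailing plural 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def terms(triple):
    """Terms of a triple, taken here from its subject and object only."""
    subject, _, obj = triple
    return {normalize(w) for w in re.findall(r"[A-Za-z0-9]+", f"{subject} {obj}")}


def predicate_distribution(triples, keyword):
    """Estimate P(r | w) as the share of w-matching triples with predicate r."""
    keyword = normalize(keyword)
    counts = Counter(t[1] for t in triples if keyword in terms(t))
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items()} if total else {}


dist = predicate_distribution(TRIPLES, "award")
print(dist)
```

On this toy data, three of the four triples matching "award" use hasWonPrize and one uses type, so hasWonPrize receives most of the probability mass, in line with the example distribution.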

Page 17: Keyword Search over RDF Graphs

Example Ranked Query Results

Query: comedy award

Bag of Words
  Combat_Academy type Comedy_films . The_Darwin_Awards type Comedy_films
  Police_Academy type Comedy_films . The_Darwin_Awards type Comedy_films
  Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award

Structure Aware
  Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award
  Toy_Story hasGenre Comedy . Toy_Story hasWonPrize Academy_Award
  Shrek hasWonPrize Academy_Award_Best_Animated_Feature . Shrek hasGenre Comedy

Page 18: Keyword Search over RDF Graphs

Experimental Setup

User study over two RDF datasets:
  movies from IMDB
  books from LibraryThing

Models compared:
  Structure Aware Approach
  Bag of Words Approach
  Language-model-based Object Retrieval
  BANKS (keyword search over databases)

Page 19: Keyword Search over RDF Graphs

Experimental Setup

30 evaluation queries
Gathered relevance assessments for the top-50 results retrieved by each model

Page 20: Keyword Search over RDF Graphs

Experimental Results


P-value < 0.05

Page 21: Keyword Search over RDF Graphs

Conclusion

Keyword search over RDF data is crucial
To support keyword search over RDF data:
  Combine structured triples with text
    Construct a document for each triple
  Retrieve meaningful query results
    Tuples of joined triples
    Can be extended to larger subgraphs of the RDF graph
  Rank the retrieved results
    A language model approach that uses both text and structure

Page 22: Keyword Search over RDF Graphs

Ranking Model


Page 23: Keyword Search over RDF Graphs

RDF Graphs
