Copyright 2009 Digital Enterprise Research Institute. All rights reserved. Digital Enterprise Research Institute www.deri.ie Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases: A Distributional-Compositional Semantics Perspective. André Freitas, Sean O’Riain, Edward Curry. DEOS 2013, Oxford, UK

Page 1: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases: A Distributional-Compositional Semantics Perspective

André Freitas, Sean O’Riain, Edward Curry

DEOS 2013, Oxford, UK

Page 2: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Big Data

Big Data: More complete data-based picture of the world.

Page 3: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Growing Schema Size

10s-100s attributes

1,000s-1,000,000s attributes

Heterogeneous, complex and large-scale databases.

Very-large and dynamic “schemas”.

Page 4: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Growing Semantic Heterogeneity

Multiple perspectives (conceptualizations) of the reality.

Ambiguity, vagueness, inconsistency.

Page 5: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Problem

Structured queries are still the primary way to query databases.

Page 6: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Structured query

[Figure: query construction time (Low to High) against schema size & heterogeneity (Low to High), ranging from 10-100s attributes to 10^3-10^6 attributes.]

Page 7: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Vocabulary Problem for Databases

Who is the daughter of Bill Clinton married to?

Schema-agnostic queries

Possible representations

Page 8: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Vocabulary Problem for Databases

Who is the daughter of Bill Clinton married to?

Semantic Gap

Lexical-level

Abstraction-level

Structural-level

Page 9: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Vocabulary Problem for Databases

Who is the daughter of Bill Clinton married to?

Semantic Gap

Lexical-level

Abstraction-level

Structural-level

Query:

Data
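The lexical side of the gap can be illustrated with a minimal sketch. It assumes the data is stored as Entity-Attribute-Value triples and that the relevant attributes are named `child` and `spouse`; these names, and the helper `exact_lookup`, are hypothetical and chosen only for illustration.

```python
# Toy Entity-Attribute-Value (EAV) triples. Attribute names are illustrative
# assumptions, not taken from the slides.
TRIPLES = [
    ("Bill_Clinton", "child", "Chelsea_Clinton"),
    ("Chelsea_Clinton", "spouse", "Marc_Mezvinsky"),
]


def exact_lookup(entity, term):
    """Return values of `entity` whose attribute name equals `term` exactly."""
    return [value for (e, attr, value) in TRIPLES if e == entity and attr == term]


# The user's word does not appear in the data, so string matching finds nothing.
print(exact_lookup("Bill_Clinton", "daughter"))  # []
# The same fact is reachable only if the user already knows the schema term.
print(exact_lookup("Bill_Clinton", "child"))     # ['Chelsea_Clinton']
```

The query words "daughter" and "married" never match the stored attribute names, which is the mismatch a schema-agnostic query mechanism has to bridge.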

Page 10: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Solution: Schema-agnostic queries

Lexical-level

Abstraction-level

Structural-level

Distributional Semantics

Compositional Semantics

Based on the statistical analysis of large unstructured corpora

Query Processing and Planning
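As a rough illustration of what "statistical analysis of large unstructured corpora" can mean, the sketch below builds co-occurrence vectors from a four-sentence toy corpus and compares words by cosine similarity. The corpus, the sentence-wide context window, and the function names are assumptions made for this example; the slides do not specify the distributional model that was used.

```python
import math
from collections import Counter, defaultdict

# Tiny stand-in for a large text corpus (invented sentences).
CORPUS = [
    "the daughter is a child of her parents",
    "a child or daughter lives with parents",
    "she married her spouse in june",
    "the spouse he married works in finance",
]


def build_vectors(sentences):
    """Map each word to counts of the words it co-occurs with in a sentence."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j, context in enumerate(tokens):
                if i != j:
                    vectors[word][context] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


VECTORS = build_vectors(CORPUS)
# Words used in similar contexts end up with similar vectors.
print(cosine(VECTORS["daughter"], VECTORS["child"])
      > cosine(VECTORS["daughter"], VECTORS["spouse"]))  # True for this toy corpus
```

Because the vectors come from raw text rather than from the database, the relatedness between a query word and a schema term can be estimated without any manually built mapping.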

Page 11: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Statistical analysis

Datasets

Page 12: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Statistical analysis

Datasets

Page 13: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Core Elements of the Proposed Approach

Hybrid model database/IR/QA.

Ranked query results.

Existing IR approaches: traditional Vector Space Models (VSMs) were not able to: (i) capture the structure of data; (ii) support a precise and comprehensive semantic matching.

A VSM supporting these two requirements was formulated: Ƭ-Space.

Ranking function based on a distributional semantic relatedness measure.
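The idea of ranking results by semantic relatedness can be sketched as follows. This is not the Ƭ-Space model itself: the three-dimensional vectors are hand-picked stand-ins for corpus-derived vectors, and the triples, attribute names, and functions `rank_attributes` and `answer` are hypothetical.

```python
import math

# Hypothetical distributional vectors (hand-picked for illustration only).
VECTORS = {
    "daughter":   (0.9, 0.1, 0.0),
    "child":      (0.8, 0.2, 0.1),
    "married":    (0.1, 0.9, 0.1),
    "spouse":     (0.0, 0.9, 0.2),
    "birthPlace": (0.1, 0.0, 0.9),
}

# Toy Entity-Attribute-Value triples (illustrative names).
TRIPLES = [
    ("Bill_Clinton", "child", "Chelsea_Clinton"),
    ("Bill_Clinton", "birthPlace", "Hope,_Arkansas"),
    ("Chelsea_Clinton", "spouse", "Marc_Mezvinsky"),
    ("Chelsea_Clinton", "birthPlace", "Little_Rock,_Arkansas"),
]


def relatedness(a, b):
    """Cosine similarity between the vectors of two terms."""
    u, v = VECTORS[a], VECTORS[b]
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def rank_attributes(entity, term):
    """Rank the attributes of `entity` by relatedness to the query term."""
    scored = [(relatedness(term, attr), attr, value)
              for (e, attr, value) in TRIPLES if e == entity]
    return sorted(scored, reverse=True)


def answer(pivot, terms):
    """Follow the best-ranked attribute for each query term, starting at `pivot`."""
    current = pivot
    for term in terms:
        ranked = rank_attributes(current, term)
        if not ranked:
            return None
        _, _, current = ranked[0]
    return current


print(answer("Bill_Clinton", ["daughter", "married"]))  # Marc_Mezvinsky
```

Returning the full ranked list rather than only the top candidate is what makes the results "ranked query results": lower-scored alternatives remain available when the best match is wrong.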

Page 14: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Does it work?

DBpedia 3.7 + YAGO. 102 natural language queries (QALD 2011).

Entity-Attribute-Value (EAV) Dataset:

45,767 predicates

5,556,492 classes

9,434,677 instances

Page 15: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Page 16: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Page 17: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Page 18: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Selected Publications

André Freitas, Edward Curry, João Gabriel Oliveira, João C. Pereira da Silva, Sean O'Riain, Querying the Semantic Web using Semantic Relatedness: A Vocabulary Independent Approach. Data & Knowledge Engineering (DKE) Journal, 2013. (Article).

André Freitas, Fabricio de Faria, Sean O'Riain, Edward Curry, Answering Natural Language Queries over Linked Data Graphs: A Distributional Semantics Approach. In Proceedings of the 36th Annual ACM SIGIR Conference, Dublin, Ireland, 2013. (Demonstration Paper in Proceedings).

André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends. IEEE Internet Computing, Special Issue on Internet-Scale Data, 2012. (Article).

André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, A Distributional Structured Semantic Space for Querying RDF Graph Data. International Journal of Semantic Computing (IJSC), 2012. (Article).

Page 19: Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

http://treo.deri.ie