Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases: A Distributional-Compositional Semantics Perspective

André Freitas, Sean O’Riain, Edward Curry

DEOS 2013, Oxford, UK

Big Data

Big Data: More complete data-based picture of the world.

Growing Schema Size

From 10s-100s of attributes to 1,000s-1,000,000s of attributes.

Heterogeneous, complex and large-scale databases.

Very large and dynamic “schemas”.

Growing Semantic Heterogeneity

Multiple perspectives (conceptualizations) of reality.

Ambiguity, vagueness, inconsistency.

Problem

Structured queries are still the primary way to query databases.

Structured query

[Figure: query construction time (low to high) plotted against schema size and heterogeneity (low to high), ranging from 10s-100s of attributes to 10^3-10^6 attributes.]

Vocabulary Problem for Databases

Who is the daughter of Bill Clinton married to?

Schema-agnostic queries

Possible representations

Semantic Gap

Lexical-level

Abstraction-level

Structural-level

Solution: Schema-agnostic queries

Lexical-level

Abstraction-level

Structural-level

Distributional Semantics

Compositional Semantics

Based on the statistical analysis of large unstructured corpora

Query Processing and Planning
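The distributional-semantics component above relies on statistics gathered from large unstructured corpora. As a minimal sketch of that idea, and not the authors' implementation, the following builds co-occurrence context vectors from a tiny made-up corpus and compares words by cosine similarity. The corpus, the window size, and the function names `build_vectors` and `relatedness` are all assumptions introduced for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only; a real system would use a
# large text collection.
CORPUS = [
    "the daughter is a child of her father",
    "a child of the president married her husband",
    "the spouse is the husband or wife of a married person",
    "the wife married her husband in a church",
    "the river flows through the city",
]


def build_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def relatedness(vectors, a, b):
    """Cosine similarity between the context vectors of words a and b."""
    va = vectors.get(a, Counter())
    vb = vectors.get(b, Counter())
    dot = sum(count * vb[ctx] for ctx, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # unknown word: no evidence of relatedness
    return dot / (norm_a * norm_b)


vectors = build_vectors(CORPUS)
# "married" shares more contexts with "husband" than with "river".
print(relatedness(vectors, "married", "husband"))
print(relatedness(vectors, "married", "river"))
```

Words that occur in similar contexts end up with similar vectors, which is what lets a query term such as "married" be matched to a differently named schema element without a hand-built thesaurus.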

Statistical analysis

Datasets

Core Elements of the Proposed Approach

Hybrid model: database/IR/QA.

Ranked query results.

Existing IR approaches: traditional Vector Space Models (VSMs) were not able to: (i) capture the structure of data; (ii) support a precise and comprehensive semantic matching.

A VSM supporting these two requirements was formulated: Ƭ-Space.

Ranking function based on a distributional semantic relatedness measure.
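A ranking function of this kind can be sketched as follows. This is a hedged illustration rather than the Ƭ-Space ranking itself: the function `rank_candidates`, the threshold parameter, and the scores in `TOY_SCORES` are hypothetical and stand in for a relatedness measure computed from a corpus.

```python
def rank_candidates(query_term, candidates, relatedness, threshold=0.0):
    """Score each candidate schema term against the query term, drop those
    at or below the threshold, and return the rest, best match first."""
    scored = [(c, relatedness(query_term, c)) for c in candidates]
    kept = [(c, s) for c, s in scored if s > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up relatedness scores, for illustration only.
TOY_SCORES = {
    ("daughter", "child"): 0.8,
    ("daughter", "spouse"): 0.4,
    ("daughter", "birthPlace"): 0.1,
}


def toy_relatedness(a, b):
    """Look up a hypothetical score; unknown pairs count as unrelated."""
    return TOY_SCORES.get((a, b), 0.0)


ranked = rank_candidates(
    "daughter", ["birthPlace", "spouse", "child"], toy_relatedness, threshold=0.2
)
print(ranked)  # [('child', 0.8), ('spouse', 0.4)]
```

Returning a ranked list instead of a single exact match is what gives the approach its IR-style behavior: the user sees the most related schema elements first, even when none matches the query term literally.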

Does it work?

DBpedia 3.7 + YAGO. 102 natural language queries (QALD 2011).

Entity-Attribute-Value (EAV) Dataset:

45,767 predicates

5,556,492 classes

9,434,677 instances
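To make the Entity-Attribute-Value layout concrete, here is a small sketch of EAV triples and a path lookup for the running example question about Bill Clinton's daughter. The triples, attribute names, and helper functions are assumptions written for illustration; they are not copied from DBpedia or YAGO.

```python
from collections import defaultdict

# Illustrative (entity, attribute, value) triples; names are assumptions.
TRIPLES = [
    ("Bill_Clinton", "child", "Chelsea_Clinton"),
    ("Chelsea_Clinton", "spouse", "Marc_Mezvinsky"),
    ("Bill_Clinton", "birthPlace", "Hope_Arkansas"),
]


def build_index(triples):
    """Index triples as entity -> attribute -> list of values."""
    index = defaultdict(lambda: defaultdict(list))
    for entity, attribute, value in triples:
        index[entity][attribute].append(value)
    return index


def follow_path(index, start, attributes):
    """Starting from one entity, follow a sequence of attributes and
    return every value reached at the end of the path."""
    frontier = [start]
    for attribute in attributes:
        frontier = [
            value
            for entity in frontier
            for value in index.get(entity, {}).get(attribute, [])
        ]
    return frontier


index = build_index(TRIPLES)
print(follow_path(index, "Bill_Clinton", ["child", "spouse"]))
# ['Marc_Mezvinsky']
```

In this sketch the attribute path is given explicitly. In a schema-agnostic setting the user would say "daughter" and "married to", and the system would have to map those terms to attributes such as `child` and `spouse` before following the path.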

Selected Publications

André Freitas, Edward Curry, João Gabriel Oliveira, João C. Pereira da Silva, Sean O'Riain, Querying the Semantic Web using Semantic Relatedness: A Vocabulary Independent Approach. Data & Knowledge Engineering (DKE) Journal, 2013 (Article).

André Freitas, Fabricio de Faria, Sean O'Riain, Edward Curry, Answering Natural Language Queries over Linked Data Graphs: A Distributional Semantics Approach. In Proceedings of the 36th Annual ACM SIGIR Conference, Dublin, Ireland, 2013 (Demonstration Paper in Proceedings).

André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends. IEEE Internet Computing, Special Issue on Internet-Scale Data, 2012 (Article).

André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, A Distributional Structured Semantic Space for Querying RDF Graph Data. International Journal of Semantic Computing (IJSC), 2012 (Article).

 

http://treo.deri.ie
