A short intro to PDQ: Proof-driven Querying

Michael Benedikt with Julien Leblay, Efi Tsamoura, and Michael Vanden Boom

DESCRIPTION

Abstract: The data needed to answer queries is often available through Web-based APIs. Indeed, for a given query there may be many Web-based sources which can be used to answer it, with the sources overlapping in their vocabularies, and differing in their access restrictions (required arguments) and cost. We introduce PDQ (Proof-Driven Query Answering), a system for determining a query plan in the presence of Web-based sources. It is: (i) constraint-aware: exploiting relationships between sources to rewrite an expensive query into a cheaper one; (ii) access-aware: abiding by any access restrictions known in the sources; and (iii) cost-aware: making use of any cost information that is available about services. PDQ takes the novel approach of generating query plans from proofs that a query is answerable. We demonstrate the use of PDQ and its effectiveness in generating low-cost plans.

Page 1: PDQ: Proof-driven Querying presentation

A short intro to PDQ: Proof-driven Querying

Michael Benedikt

with Julien Leblay, Efi Tsamoura, and Michael Vanden Boom

Page 2

Background

DBOnto: Semantics for a better world

• Enable new applications

• Deliver better performance for current data-intensive tasks

• Diminish effort in integrating complex data sources

Exploit semantics of data: within a single source, among distributed sources, across data models

Page 3

Background

Dimensions of Semantic Data

Figure: three dimensions of semantic data, with axes Completeness of Sources/Source Access Model; Target Implementation; Data model for queries and constraints.

Page 5

Background

Semantic Data Technology

Semantic Web

• RDF data model, description logic constraints
• Inherently incomplete sources
• Certain answer semantics
• Wide range of target implementations

Page 6

Background

Semantic Data Technology

Query Optimization with Constraints

• Relational data model and constraints
• Complete information
• Access via lookup indices in sources
• Compile to plan language of DBMS

Page 7

Background

Semantic Data Technology

Query Optimization with Constraints via Reformulation

• Relational data model and constraints
• Complete sources
• Compile to query language (e.g. SQL)

Page 8

Background

Semantic Data Technology

Query Rewriting with Exact Views

• Relational sources and constraints
• Base data may not be accessible
• Can still look for exact answers to queries
• Compile to query language (e.g. SQL)

Page 9

Background

Semantic Data Technology

Federated Querying over Web-based Sources

• Model sources and constraints relationally
• Complete information on subset of sources
• Distributed sources with mix of access regimes
• Compile to middleware plan

Page 10

Background

Long-term PDQ vision

Figure: PDQ placed along the three dimensions (Completeness of Sources/Source Access Model; Target Implementation; Data model for queries and constraints).

Page 11

Functionality

PDQ: what it is today

Unified framework for:
• Query Optimization/Reformulation with Constraints
• Querying with Materialized Views
• Federated Querying with Complete Information

System for answering queries Q in the presence of semantic relationships and access restrictions on sources

Targets:
• Relational data model and constraints
• Sufficient accessible information assumption: there is sufficient accessible data to obtain the exact answers to the query Q
• Compilation into a “static plan” (reformulation, physical plan, middleware plan)

Page 12

Functionality

PDQ: what it is

Inputs:
• Query Q
• Metadata, including D (description of access to sources) and integrity constraints C
• Cost information (e.g. cost function on plans)

PDQ planner: outputs Pbest, a plan using the access model described by D, with minimal cost, giving the exact answer to Q for databases satisfying constraints C

PDQ runtime: executes plans on top of Web-based or local data sources
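To make the planner's inputs and the runtime's job concrete, here is a minimal Python sketch under a simplified model that is an assumption, not PDQ's actual API: each source exposes one access method whose input positions must be bound on every call (the "required arguments" of the abstract), and a static plan is a sequence of accesses evaluated as dependent joins. The names `AccessMethod`, `Source`, `run_plan` and the `Directory`/`Office` relations are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Row = Tuple[str, ...]
Atom = Tuple[str, Tuple[str, ...]]  # (relation, one plan variable per position)


@dataclass(frozen=True)
class AccessMethod:
    """Hypothetical access restriction: every call must supply values
    for the positions listed in `inputs` (the required arguments)."""
    relation: str
    inputs: Tuple[int, ...]


class Source:
    """In-memory stand-in for a Web-based source that enforces its access method."""

    def __init__(self, method: AccessMethod, rows: List[Row]):
        self.method = method
        self.rows = rows
        self.calls = 0  # number of lookups issued against this source

    def lookup(self, bound: Dict[int, str]) -> List[Row]:
        missing = set(self.method.inputs) - set(bound)
        if missing:
            raise ValueError(f"{self.method.relation}: input positions {sorted(missing)} are unbound")
        self.calls += 1
        return [r for r in self.rows if all(r[i] == v for i, v in bound.items())]


def run_plan(plan: List[Atom], sources: Dict[str, Source], output: Tuple[str, ...]) -> Set[Row]:
    """Run a static plan as a chain of dependent joins: each access is issued once
    per current binding, with its input values taken from earlier accesses."""
    bindings: List[Dict[str, str]] = [{}]
    for relation, args in plan:
        src = sources[relation]
        extended = []
        for b in bindings:
            bound = {i: b[args[i]] for i in src.method.inputs if args[i] in b}
            for row in src.lookup(bound):
                ext = dict(b)
                # bind new variables and keep the row only if it agrees with the old ones
                if all(ext.setdefault(v, row[i]) == row[i] for i, v in enumerate(args)):
                    extended.append(ext)
        bindings = extended
    return {tuple(b[v] for v in output) for b in bindings}


# Hypothetical example: Office can only be looked up by employee id (position 0),
# so the plan scans Directory first to obtain ids.
directory = Source(AccessMethod("Directory", ()), [("e1", "Ann"), ("e2", "Bob")])
office = Source(AccessMethod("Office", (0,)), [("e1", "301"), ("e2", "117"), ("e3", "999")])
plan = [("Directory", ("e", "n")), ("Office", ("e", "o"))]
print(sorted(run_plan(plan, {"Directory": directory, "Office": office}, ("n", "o"))))
# [('Ann', '301'), ('Bob', '117')]
```

Reversing the two accesses makes `run_plan` raise `ValueError`, since Office would be called before any employee id is bound; an access-aware planner must never emit such a plan.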

Page 13

Under the hood

PDQ: how it works (sort of)

Key observation: under the sufficient accessible information assumption on Q, C, D, there is always a “static plan” PQ (e.g. a relational algebra query) that can be run to answer Q.

We can find such a PQ by looking for a “proof that there is sufficient information to answer Q”.

• First main component: procedures to turn “proofs of answerability” into plans
• The proof-to-plan procedure works for an extremely rich class of integrity constraints
• Adaptable to different target implementations (SQL query, physical plan, distributed plan…)

• These “proof-to-plan” procedures are coupled with a reasoning system for finding the proofs of answerability.
• Plug-in architecture: Chase procedure, Tableau-based FO theorem-prover, …
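The idea of reading a plan off a proof of answerability can be illustrated with a deliberately small chase, sketched in Python under strong assumptions: conjunctive queries without constants, constraints restricted to single-atom TGDs (inclusion dependencies), one access method per relation, and a bounded number of chase rounds. The construction is our own simplification: the chase applies the constraints, copies of them on hypothetical `InfAcc` relations, and assumed accessibility axioms that mark a fact as accessed once its input positions hold accessible values. The firings of those axioms, in order, give the access steps of a plan. All names here (`answerable`, `match`, `InfAcc`, `Directory`, `Office`) are hypothetical, and this is not PDQ's reasoner.

```python
from itertools import count
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation, terms)
TGD = Tuple[Atom, Atom]             # (body atom, head atom); head-only variables are existential


def match(atoms: List[Atom], facts: Set[Atom], binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Backtracking search for a homomorphism from `atoms` into `facts` extending `binding`."""
    if not atoms:
        return binding
    (rel, args), rest = atoms[0], atoms[1:]
    for frel, fargs in sorted(facts):
        if frel != rel or len(fargs) != len(args):
            continue
        b = dict(binding)
        if all(b.setdefault(a, v) == v for a, v in zip(args, fargs)):
            found = match(rest, facts, b)
            if found is not None:
                return found
    return None


def answerable(query: List[Atom], free: Tuple[str, ...], constraints: List[TGD],
               methods: Dict[str, Tuple[int, ...]], max_rounds: int = 20) -> Optional[List[Atom]]:
    """Chase the canonical database of `query` (variables frozen as constants) with the
    constraints, their copies on InfAcc relations, and accessibility axioms. Return the
    accessibility firings in order once the InfAcc copy of the query holds with the
    free variables fixed; return None if the chase stops (or hits max_rounds) first."""
    facts: Set[Atom] = set(query)
    accessible: Set[str] = set()  # no constants in the query, so nothing starts accessible
    fired: Set[Atom] = set()
    proof: List[Atom] = []
    fresh = count()
    copies = [(("InfAcc" + b[0], b[1]), ("InfAcc" + h[0], h[1])) for b, h in constraints]
    goal = [("InfAcc" + rel, args) for rel, args in query]
    for _ in range(max_rounds):
        new: Set[Atom] = set()
        for (brel, bargs), (hrel, hargs) in constraints + copies:
            for frel, fargs in sorted(facts):
                b: Dict[str, str] = {}
                if frel != brel or not all(b.setdefault(a, v) == v for a, v in zip(bargs, fargs)):
                    continue
                if match([(hrel, hargs)], facts | new, b) is None:  # restricted chase step
                    for a in hargs:
                        if a not in b:
                            b[a] = f"_null{next(fresh)}"  # fresh labelled null
                    new.add((hrel, tuple(b[a] for a in hargs)))
        for frel, fargs in sorted(facts):  # accessibility axioms
            inputs = methods.get(frel)
            if inputs is None or (frel, fargs) in fired:
                continue
            if all(fargs[i] in accessible for i in inputs):
                fired.add((frel, fargs))
                proof.append((frel, fargs))
                accessible.update(fargs)  # the access returns every value of the fact
                new.add(("InfAcc" + frel, fargs))
        if match(goal, facts | new, {x: x for x in free}) is not None:
            return proof
        if new <= facts:
            return None
        facts |= new
    return None


q = [("Office", ("e", "o"))]                               # Q(o) :- Office(e, o)
c = [(("Office", ("x", "y")), ("Directory", ("x", "z")))]  # Office(x, y) -> exists z Directory(x, z)
m = {"Office": (0,), "Directory": ()}                      # Office needs an id; Directory is free
print(answerable(q, ("o",), c, m))   # [('Directory', ('e', '_null0')), ('Office', ('e', 'o'))]
print(answerable(q, ("o",), [], m))  # None
```

With the constraint, every employee with an office appears in Directory, whose free access yields the ids needed for the Office lookup, so the proof reads as "scan Directory, then look up Office by id". Without it, no Office fact ever becomes accessible and no proof is found.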

Page 14

Under the hood

PDQ: how it works in a bit more detail

PDQ planner:
• Reasoning system for finding “proofs of answerability”
• Proof-to-plan conversion

Inputs: metadata including D (description of access to sources) and integrity constraints C; query Q; cost information (e.g. cost function on plans)

Page 15

Under the hood

PDQ: how it works, still more

We can find a static plan PQ getting the exact answer to Q by looking for a “proof that Q is answerable” and then applying a proof-to-plan procedure.

Last component – search strategy: we can find a good PQ by searching for a proof that
1. witnesses that Q is answerable
2. generates a low-cost plan

Search is directed by proof goal and cost.
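The search strategy can be sketched as a depth-first branch-and-bound in Python, under simplifying assumptions that are ours, not PDQ's: no integrity constraints, a proof counts as complete once every query atom has been accessed (ignoring redundant atoms), each relation may offer several access methods, and a plan's cost is just the sum of assumed per-method costs. Method names such as `dir_scan` and `office_by_eid` are hypothetical. The proof goal decides when a branch is finished; the best cost found so far prunes the remaining branches.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]            # (relation, variables)
Method = Tuple[str, Tuple[int, ...], float]   # (method name, input positions, assumed cost)
Step = Tuple[str, Atom]                       # one accessibility-axiom firing


def cheapest_proof(query: List[Atom], methods: Dict[str, List[Method]]) -> Optional[Tuple[float, List[Step]]]:
    """Depth-first branch-and-bound over sequences of accessibility-axiom firings.
    The goal is reached once every query atom has been accessed; a partial proof
    is abandoned as soon as its cost reaches that of the best plan found so far."""
    best: Optional[Tuple[float, List[Step]]] = None

    def search(done: frozenset, accessible: frozenset, cost: float, seq: List[Step]) -> None:
        nonlocal best
        if best is not None and cost >= best[0]:
            return                                   # cost-directed pruning
        if len(done) == len(query):
            best = (cost, seq)                       # goal reached: proof is complete
            return
        for i, (rel, args) in enumerate(query):
            if i in done:
                continue
            for name, inputs, c in methods.get(rel, []):
                if all(args[p] in accessible for p in inputs):   # inputs already known
                    search(done | {i}, accessible | set(args), cost + c, seq + [(name, (rel, args))])

    search(frozenset(), frozenset(), 0.0, [])
    return best


query = [("Directory", ("e", "n")), ("Office", ("e", "o"))]
methods = {
    "Directory": [("dir_scan", (), 1.0)],
    "Office": [("office_by_eid", (0,), 2.0), ("office_scan", (), 10.0)],
}
result = cheapest_proof(query, methods)
if result is not None:
    cost, steps = result
    print(cost, [name for name, _ in steps])  # 3.0 ['dir_scan', 'office_by_eid']
```

An additive cost is the simplest choice for illustration; a realistic cost function would also account for how many times each method is called, which depends on intermediate result sizes.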

Page 16

Under the hood

PDQ architecture

Page 17

Status

PDQ today and tomorrow

• Theoretical basis given in a PODS 2014 paper

• Demonstration implemented over web services at VLDB 2014

• Implementation generates SQL reformulation over relational sources (run on top of Postgres)

Moving forward:

• Pilot project beginning Oct 2014 to explore “native implementation” of PDQ on top of the plan language of the LogicBlox DBMS

• Large EPSRC-funded project 2015-2020 to explore diverse uses of PDQ

Page 18

Status

PDQ today and tomorrow

Figure: PDQ 2014 and PDQ 2020 placed along the three dimensions (Completeness of Sources/Source Access Model; Target Implementation; Data model for queries and constraints).

Page 19

PDQ: Next Steps

Next Steps

• More info at http://cs.ox.ac.uk/pdq
• See the demo!