Abstract: The data needed to answer queries is often available through Web-based APIs. Indeed, for a given query there may be many Web-based sources which can be used to answer it, with the sources overlapping in their vocabularies, and differing in their access restrictions (required arguments) and cost. We introduce PDQ (Proof-Driven Query Answering), a system for determining a query plan in the presence of web-based sources. It is: (i) constraint-aware -- exploiting relationships between sources to rewrite an expensive query into a cheaper one, (ii) access-aware -- abiding by any access restrictions known in the sources, and (iii) cost-aware -- making use of any cost information that is available about services. PDQ takes the novel approach of generating query plans from proofs that a query is answerable. We demonstrate the use of PDQ and its effectiveness in generating low-cost plans.
A short intro to PDQ: Proof-driven Querying
Michael Benedikt
with Julien Leblay, Efi Tsamoura, and Michael Vanden Boom
Background
DBOnto: Semantics for a better world
• Enable new applications
• Deliver better performance for current data-intensive tasks
• Diminish effort in integrating complex data sources
Exploit semantics of data: within a single source, among distributed sources, across data models
Background
Dimensions of Semantic Data
• Completeness of sources / source access model
• Target implementation
• Data model for queries and constraints
Background
Semantic Data Technology
Semantic Web
• RDF data model, description logic constraints
• Inherently incomplete sources
• Certain answer semantics
• Wide range of target implementations
Background
Semantic Data Technology
Query Optimization with Constraints
• Relational data model and constraints
• Complete information
• Access via lookup indices in sources
• Compile to plan language of DBMS
Background
Semantic Data Technology
Query Optimization with Constraints via Reformulation
• Relational data model and constraints
• Complete sources
• Compile to query language (e.g. SQL)
Background
Semantic Data Technology
Query Rewriting with Exact Views
• Relational sources and constraints
• Base data may not be accessible
• Can still look for exact answers to queries
• Compile to query language (e.g. SQL)
Background
Semantic Data Technology
Federated Querying Over Web-based Sources
• Model sources and constraints relationally
• Complete information on subset of sources
• Distributed sources with mix of access regimes
• Compile to middleware plan
Background
Long-term PDQ vision
[Figure: PDQ positioned along the three dimensions: completeness of sources / source access model, target implementation, and data model for queries and constraints]
Functionality
PDQ: what it is today
Unified framework for:
• Query Optimization/Reformulation with Constraints
• Querying with Materialized Views
• Federated Querying with Complete Information
System for answering queries Q in the presence of semantic relationships and access restrictions on sources
Targets:
• Relational data model and constraints
• Sufficient accessible information assumption: there is sufficient accessible data to obtain the exact answers to the query Q
• Compilation into a “static plan” (reformulation, physical plan, middleware plan)
Functionality
PDQ: what it is
Inputs:
• Query Q
• Metadata, including D (description of access to sources) and integrity constraints C
• Cost information (e.g. cost function on plans)
PDQ planner: produces Pbest, the plan using the access model described by D with minimal cost giving the exact answer to Q for databases satisfying constraints C
PDQ runtime: executes plans on top of Web-based or local data sources
Under the hood
PDQ: how it works (sort of)
Key observation: Under the sufficient accessible information assumption on Q, C, D there is always a “static plan” (e.g. relational algebra query) PQ that can be run to answer Q
We can find such a PQ by looking for a “proof that there is sufficient information to answer Q”.
• First main component: procedures to turn “proofs of answerability” into plans
• Proof-to-plan procedure works for extremely rich class of integrity constraints
• Adaptable to different target implementations (SQL query, physical plan, distributed plan…)
• These “proof-to-plan” procedures are coupled with a reasoning system for finding the proofs of answerability.
• Plug-in architecture: Chase procedure, Tableau-based FO theorem-prover, …
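The proof-to-plan idea can be illustrated in its simplest setting: no integrity constraints, a conjunctive query, and access methods that require some argument positions to be bound. The sketch below is not PDQ's code or API. The names `Atom`, `AccessMethod`, `plan_from_accessibility_proof`, and the relations `Directory` and `Profile` are hypothetical, chosen only for illustration. Each step records that one more query atom has become accessible (a proof step), and the recorded sequence of accesses is the plan.

```python
from typing import NamedTuple, Optional


class Atom(NamedTuple):
    relation: str
    terms: tuple  # strings starting with "?" are variables, anything else is a constant


class AccessMethod(NamedTuple):
    relation: str
    input_positions: frozenset  # argument positions that must be bound before the call


def is_var(term) -> bool:
    return isinstance(term, str) and term.startswith("?")


def plan_from_accessibility_proof(query_atoms, methods) -> Optional[list]:
    """Return an ordered list of (atom, method) accesses, or None if stuck.

    Assumption: no integrity constraints are used, so this only finds
    plans that access the query's own atoms in some executable order.
    """
    bound = set()                  # variables whose values are already available
    remaining = list(query_atoms)  # atoms not yet shown to be accessible
    plan = []
    progress = True
    while remaining and progress:
        progress = False
        for atom in remaining:
            usable = next(
                (m for m in methods
                 if m.relation == atom.relation
                 and all(not is_var(atom.terms[i]) or atom.terms[i] in bound
                         for i in m.input_positions)),
                None,
            )
            if usable is not None:
                # Proof step: the atom is accessible. Plan step: perform the access.
                plan.append((atom, usable))
                bound.update(t for t in atom.terms if is_var(t))
                remaining.remove(atom)
                progress = True
                break  # restart the scan with the enlarged set of bound variables
    return plan if not remaining else None


# Hypothetical sources: Directory has free access, Profile needs its first argument.
methods = [AccessMethod("Directory", frozenset()),
           AccessMethod("Profile", frozenset({0}))]
query = [Atom("Profile", ("?eid", "?office", "Smith")),
         Atom("Directory", ("?eid", "Smith"))]
for atom, method in plan_from_accessibility_proof(query, methods):
    print(atom.relation, sorted(method.input_positions))
```

With integrity constraints, a reasoning procedure such as the chase would first derive additional facts that may be accessed; that step is deliberately left out here.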
Under the hood
PDQ: how it works in a bit more detail
Inputs to the PDQ planner:
• Query Q
• Metadata, including D (description of access to sources) and integrity constraints C
• Cost information (e.g. cost function on plans)
Components of the PDQ planner:
• Reasoning system for finding “proofs of answerability”
• Proof-to-Plan conversion
Under the hood
PDQ: how it works, still more
We can find a static plan PQ getting the exact answer to Q by looking for a “proof that Q is answerable” and then applying a proof-to-plan procedure.
Last component, search strategy: we can find a good PQ by searching for a proof that
1. witnesses that Q is answerable
2. generates a low-cost plan
Search is directed by proof goal and cost
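A search directed by both the proof goal and cost can be sketched as a uniform-cost search over partial access sequences. This is a simplification under stated assumptions, not PDQ's search algorithm: there are no integrity constraints, the cost of a plan is assumed to be the sum of fixed per-access costs, and the function name `cheapest_plan` and the method names and costs below are hypothetical.

```python
import heapq
from itertools import count


def is_var(term) -> bool:
    return isinstance(term, str) and term.startswith("?")


def cheapest_plan(query_atoms, methods):
    """Uniform-cost search for a minimal-cost executable access sequence.

    query_atoms: list of (relation, terms) pairs; "?x" strings are variables.
    methods: list of (name, relation, input_positions, cost) tuples,
             with non-negative costs (an assumed additive cost model).
    Returns (total_cost, [method names]) or None if the goal is unreachable.
    """
    tie = count()  # tie-breaker so the heap never has to compare plans
    frontier = [(0.0, next(tie), (), frozenset(),
                 frozenset(range(len(query_atoms))))]
    seen = set()
    while frontier:
        cost, _, plan, bound, remaining = heapq.heappop(frontier)
        if not remaining:          # proof goal reached: every atom is accessible
            return cost, list(plan)
        if remaining in seen:      # this state was already expanded at a lower cost
            continue
        seen.add(remaining)
        for i in remaining:
            relation, terms = query_atoms[i]
            for name, m_relation, inputs, m_cost in methods:
                if m_relation != relation:
                    continue
                if any(is_var(terms[p]) and terms[p] not in bound for p in inputs):
                    continue       # a required input is not yet bound
                new_bound = bound | {t for t in terms if is_var(t)}
                heapq.heappush(frontier, (cost + m_cost, next(tie),
                                          plan + (name,), new_bound,
                                          remaining - {i}))
    return None


# Hypothetical sources and costs.
atoms = [("Profile", ("?eid", "?office", "Smith")),
         ("Directory", ("?eid", "Smith"))]
methods = [("dir_scan", "Directory", (), 10.0),
           ("profile_by_id", "Profile", (0,), 1.0),
           ("profile_scan", "Profile", (), 50.0)]
print(cheapest_plan(atoms, methods))  # (11.0, ['dir_scan', 'profile_by_id'])
```

Because costs are non-negative and states are expanded in order of accumulated cost, the first complete sequence popped from the queue is a cheapest one under this toy cost model.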
Under the hood
PDQ architecture
Status
PDQ today and tomorrow
• Theoretical basis given in PODS 2014 paper
• Demonstration implemented over web services in VLDB 2014
• Implementation generates SQL reformulation over relational sources (run on top of Postgres)
Moving forward:
• Pilot project beginning Oct 2014 to explore “native implementation” of PDQ on top of the plan language of the LogicBlox DBMS
• Large EPSRC-funded project 2015-2020 to explore diverse uses of PDQ
Status
PDQ today and tomorrow
[Figure: PDQ2014 and PDQ2020 positioned along the three dimensions: completeness of sources / source access model, target implementation, and data model for queries and constraints]
PDQ: Next Steps
Next Steps
• More info at http://cs.ox.ac.uk/pdq
• See the demo!