Finding Plans from Proofs
PDQ: Proof-driven Query Answering over Web-based Data
Michael Benedikt, Julien Leblay, Efthymia Tsamoura - Oxford University
Supported by EPSRC grant EP/H017690/1, Query-driven Data Acquisition from Web-based Data Sources
Project homepage: http://www.cs.ox.ac.uk/projects/pdq/
Contact: [email protected]
Example: online services for geographic information
r1: Places(id, name, type, coordinates, ...): information about places (city, country, continent, lake, etc.).
r2: BelongsTo(source, target): containment between places, e.g. "China belongs to Asia".
r3: Countries(id, name, iso_code, ...): information about countries.
φ1: Places(x, y, "Country", ...) ↔ Countries(x, y, ...)
Query for countries in Asia: not answerable without considering constraints.
SELECT p1.name
FROM BelongsTo AS bt
JOIN Places AS p1 ON p1.id = bt.source
JOIN Places AS p2 ON p2.id = bt.target
WHERE p1.type = 'Country' AND p2.name = 'Asia'
Pre-processing creates an auxiliary schema by adding the relations InferredAccPlaces, InferredAccBelongsTo, InferredAccCountries and Accessible, together with the constraints:
φ'1: InferredAccPlaces(x, y, "Country", ...) ↔ InferredAccCountries(x, y, ...)
α1: Accessible(y) ∧ Places(x, y, z, ...) → InferredAccPlaces(x, y, z, ...) ∧ Accessible(x) ∧ Accessible(z) ∧ ...
α2: Accessible(x) ∧ BelongsTo(x, y) → InferredAccBelongsTo(x, y) ∧ Accessible(y)
α3: Countries(x, y, z, ...) → InferredAccCountries(x, y, z, ...) ∧ Accessible(x) ∧ …
α4: …
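The accessibility axioms above follow a regular pattern: the body requires the input positions of an access method to be accessible, and the head copies the fact into the InferredAcc relation and marks the remaining positions as accessible. The sketch below illustrates that pattern only; it is not PDQ's code. The `Relation` class, the `accessibility_axiom` function, and the truncation of every relation to the attributes shown before the "..." are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Relation:
    """Hypothetical description of a relation and its single access method."""
    name: str
    variables: Tuple[str, ...]          # one variable per attribute
    inputs: Optional[Tuple[int, ...]]   # bound positions; () = free access; None = inaccessible


def accessibility_axiom(rel: Relation) -> Optional[str]:
    """Render the accessibility axiom for `rel`, or None if it has no access method."""
    if rel.inputs is None:
        return None
    args = ", ".join(rel.variables)
    # Body: every input position must already be accessible, and the fact must hold.
    body = [f"Accessible({rel.variables[i]})" for i in rel.inputs]
    body.append(f"{rel.name}({args})")
    # Head: the fact becomes available, and every output value becomes accessible.
    head = [f"InferredAcc{rel.name}({args})"]
    head += [f"Accessible({v})" for i, v in enumerate(rel.variables) if i not in rel.inputs]
    return " ∧ ".join(body) + " → " + " ∧ ".join(head)


# Relations of the running example, truncated to the attributes used in α1 to α3.
places = Relation("Places", ("x", "y", "z"), inputs=(1,))       # name must be given
belongs_to = Relation("BelongsTo", ("x", "y"), inputs=(0,))     # source must be given
countries = Relation("Countries", ("x", "y", "z"), inputs=())   # free access

for rel in (places, belongs_to, countries):
    print(accessibility_axiom(rel))
```

With these truncated relations the three printed lines have the same shape as α1, α2 and α3.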
Context
Web data sources may have:
• overlapping information,
• access restrictions.
As a result:
• There may be no web query plan for a given user query.
• There may be many plans using different sources with different costs.
We need to reason about integrity constraints and access limitations.
PDQ
System for determining a query plan in the presence of web-based sources. It is:
i. constraint-aware,
ii. access-aware: abiding by access restrictions,
iii. cost-aware: making use of any cost information.
Approach: generating query plans from proofs that a query is answerable.
Input
S: schema ⟨R, Σ⟩, where R is a set of relations with access methods (free, limited, inaccessible) and Σ is a set of integrity constraints (TGDs).
Q: conjunctive query over S.
f: cost function on evaluation plans.
Output
Pbest: plan with minimal cost.
Step 1: Pre-processing. S is augmented with new relations and axioms modelling the access restrictions. A goal query Qinferred is created over the relations of the augmented schema. Q is grounded to form the initial state of the plan search.
Step 2: Basic search step. Each state is closed under the firing of rules (blue arrows) other than accessibility axioms (denoted αi).
Every possible firing of an accessibility axiom (red arrows) gives a new candidate state, inheriting all the facts of its ancestors.
Step 3: Plans and costs. Each new state gives a plan, to which a cost is assigned (orange circles).
If a state corresponds to a match with Qinferred and its plan's cost is lower than the best so far, it becomes the new best state.
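Steps 2 and 3 can be sketched on a hand-grounded version of the running example. This is a simplified illustration, not PDQ's implementation: facts are plain strings, rules are already instantiated, the search is an exhaustive depth-first traversal, and the cost function (number of accesses in the plan) is a placeholder. The names `close`, `search`, and the access labels are hypothetical.

```python
def close(facts, rules):
    """Close a set of facts under the non-accessibility rules (Step 2, first part)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= facts and not head <= facts:
                facts |= head
                changed = True
    return frozenset(facts)


def search(initial, rules, access_axioms, goal, cost):
    """Explore every firing of an accessibility axiom; keep the cheapest matching plan."""
    best_plan, best_cost = None, float("inf")
    stack = [(close(initial, rules), ())]
    while stack:
        facts, plan = stack.pop()
        for name, body, head in access_axioms:
            if body <= facts and not head <= facts:
                # New candidate state: inherits all facts, then is closed again.
                new_facts = close(facts | head, rules)
                new_plan = plan + (name,)
                if goal <= new_facts:
                    if cost(new_plan) < best_cost:
                        best_plan, best_cost = new_plan, cost(new_plan)
                else:
                    stack.append((new_facts, new_plan))
    return best_plan, best_cost


F = frozenset
initial = F({"Places(id1,name1,Country)", "Places(id2,Asia,c2)", "BelongsTo(id1,id2)",
             "Accessible(Asia)", "Accessible(Country)"})
rules = [  # grounded φ1 and φ'1
    (F({"Places(id1,name1,Country)"}), F({"Countries(id1,name1)"})),
    (F({"InferredAccCountries(id1,name1)"}), F({"InferredAccPlaces(id1,name1,Country)"})),
]
access_axioms = [  # grounded α1, α2, α3 with hypothetical access labels
    ("Places[Asia]", F({"Accessible(Asia)", "Places(id2,Asia,c2)"}),
     F({"InferredAccPlaces(id2,Asia,c2)", "Accessible(id2)", "Accessible(c2)"})),
    ("BelongsTo[id1]", F({"Accessible(id1)", "BelongsTo(id1,id2)"}),
     F({"InferredAccBelongsTo(id1,id2)", "Accessible(id2)"})),
    ("Countries", F({"Countries(id1,name1)"}),
     F({"InferredAccCountries(id1,name1)", "Accessible(id1)", "Accessible(name1)"})),
]
goal = F({"InferredAccPlaces(id1,name1,Country)", "InferredAccPlaces(id2,Asia,c2)",
          "InferredAccBelongsTo(id1,id2)"})

best_plan, best_cost = search(initial, rules, access_axioms, goal, cost=len)
print(best_plan, best_cost)
```

Every plan found needs all three accesses, and the BelongsTo access can only fire after the Countries access has made id1 accessible. A realistic cost function would distinguish between the different access orders.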
Queries over Web Data
Architecture & User Experience
User interface for creating and editing schemas and queries
Interactive exploration of the planner’s search space. Online execution of plans.
User interface for creating and configuring planning sessions.
Dashboard
Architecture Runtime Planner
Figure: plan search for the running example. Blue arrows are firings of φ1 and φ'1, red arrows are firings of accessibility axioms αi, and orange circles are plan costs.

Initial State: Places(id1, name1, "Country", …), Places(id2, "Asia", …), BelongsTo(id1, id2), Accessible("Asia"), Accessible("Country")
By φ1: Countries(id1, name1, c1, …)
Goal: Qinferred(name) ← InferredAccPlaces(id1, name1, "Country", …) ∧ InferredAccPlaces(id2, "Asia", …) ∧ InferredAccBelongsTo(id1, id2)

First branch:
α3 (models free access on Countries): InferredAccCountries(id1, name1, c1, …), Accessible(id1), Accessible(name1), Accessible(c1)
By φ'1: InferredAccPlaces(id1, name1, "Country", …)
T1 ⇐ Countries ⇐ Ø
α1: InferredAccPlaces(id2, "Asia", c2, …), Accessible(id2), Accessible(c2), …
T2 ⇐ Places ⇐ ("Asia")
T3 := T1 ⋈ T2
α2: InferredAccBelongsTo(id1, id2)
T4 ⇐ BelongsTo ⇐ π source (T3)
T5 := π name (T3 ⋈ T4)

Second branch:
α1: InferredAccPlaces(id2, "Asia", c2, …), Accessible(id2), Accessible(c2), …
T'1 ⇐ Places ⇐ ("Asia")
α3: InferredAccCountries(id1, name1, c1, …), Accessible(id1), Accessible(name1), …
By φ'1: InferredAccPlaces(id1, name1, "Country", …)
T'2 ⇐ Countries ⇐ Ø
T'3 := T'1 ⋈ T'2
α2: InferredAccBelongsTo(id1, id2)
T'4 ⇐ BelongsTo ⇐ π source (T'3)
T'5 := π name (T'3 ⋈ T'4)

Cost labels shown in the figure: 3, 2, 25, 35, 45, 55.
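The plan T1 to T5 from the search diagram can be run against toy data to see what each access command does. The sketch below is illustrative only: the sample rows, the three `access_*` functions standing in for the web services, and the `natural_join` helper are all assumptions, and the relations are truncated to the attributes the plan needs.

```python
# Hypothetical source data hidden behind the access methods.
PLACES = [(1, "China", "Country"), (2, "Asia", "Continent"),
          (3, "France", "Country"), (4, "Europe", "Continent")]
BELONGS_TO = [(1, 2), (3, 4)]
COUNTRIES = [(1, "China"), (3, "France")]


def access_countries():
    """Free access: no input needed."""
    return [{"id1": cid, "name1": name} for cid, name in COUNTRIES]


def access_places(name):
    """Limited access: the place name must be supplied."""
    return [{"id2": pid} for pid, pname, _ in PLACES if pname == name]


def access_belongs_to(source):
    """Limited access: the source id must be supplied."""
    return [{"id1": s, "id2": t} for s, t in BELONGS_TO if s == source]


def natural_join(left, right):
    """Join rows that agree on all shared attributes (cross product if none are shared)."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out


T1 = access_countries()                       # T1 ⇐ Countries ⇐ Ø
T2 = access_places("Asia")                    # T2 ⇐ Places ⇐ ("Asia")
T3 = natural_join(T1, T2)                     # T3 := T1 ⋈ T2
sources = sorted({row["id1"] for row in T3})  # π source (T3)
T4 = [row for s in sources for row in access_belongs_to(s)]  # T4 ⇐ BelongsTo ⇐ π source (T3)
T5 = [row["name1"] for row in natural_join(T3, T4)]          # T5 := π name (T3 ⋈ T4)
print(T5)
```

On this toy data T3 pairs every country with Asia, and the final join with T4 keeps only the countries whose BelongsTo target really is Asia.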