Upload
net2-project
View
180
Download
2
Tags:
Embed Size (px)
DESCRIPTION
Citation preview
Exchanging More than Complete Data
Jorge PerezUniversidad de Chile
joint work with Marcelo Arenas (PUC-Chile) and Juan Reutter (Univ. Edinburgh)
Father
Andy BobBob DannyDanny Eddie
Mother
Carrie Bob
Father(x , y) → Parent(x , y)Mother(x , y) → Parent(x , y)
Parent(x , y) ∧ Parent(y , z) → Gr-Parent(x , z)
Father
Andy BobBob DannyDanny Eddie
Mother
Carrie Bob
Father(x , y) → Parent(x , y)Mother(x , y) → Parent(x , y)
Parent(x , y) ∧ Parent(y , z) → Gr-Parent(x , z)
SourceFather(x , y) → Pateras(x , y)
Gr-Parent(x , y) → Pappoi(x , y)Target
Father
Andy BobBob DannyDanny Eddie
Mother
Carrie Bob
Father(x , y) → Parent(x , y)Mother(x , y) → Parent(x , y)
Parent(x , y) ∧ Parent(y , z) → Gr-Parent(x , z)
SourceFather(x , y) → Pateras(x , y)
Gr-Parent(x , y) → Pappoi(x , y)Target
Pateras
Andy BobBob DannyDanny Eddie
Pappoi
Andy DannyCarrie DannyBob Eddie
Father
Andy BobBob DannyDanny Eddie
Mother
Carrie Bob
Father(x , y) → Parent(x , y)Mother(x , y) → Parent(x , y)
Parent(x , y) ∧ Parent(y , z) → Gr-Parent(x , z)
SourceFather(x , y) → Pateras(x , y)
Gr-Parent(x , y) → Pappoi(x , y)Target
Pateras(x , y) ∧ Pateras(y , z) → Pappoi(x , z)
Father
Andy BobBob DannyDanny Eddie
Mother
Carrie Bob
Father(x , y) → Parent(x , y)Mother(x , y) → Parent(x , y)
Parent(x , y) ∧ Parent(y , z) → Gr-Parent(x , z)
SourceFather(x , y) → Pateras(x , y)
Gr-Parent(x , y) → Pappoi(x , y)Target
Pateras
Andy BobBob DannyDanny Eddie
Pappoi
Carrie Danny
Pateras(x , y) ∧ Pateras(y , z) → Pappoi(x , z)
Exchanging More than Complete Data
Jorge PerezUniversidad de Chile
joint work with Marcelo Arenas (PUC-Chile) and Juan Reutter (Univ. Edinburgh)
We can exchange more than complete data
◮ In data exchange we begin with a database instancewe begin with complete data
◮ What if we have a representation of a set ofpossible instances?
◮ We propose a new general formalism to exchangerepresentations of possible instancesand apply it to incomplete instances and knowledge bases
Outline
Formalism for exchanging representations systems
Applications to incomplete instances
Applications to knowledge basesWhat is the complexity of testing kb-solutions?How can we compute a good kb-solution?
Concluding remarks
Outline
Formalism for exchanging representations systems
Applications to incomplete instances
Applications to knowledge basesWhat is the complexity of testing kb-solutions?How can we compute a good kb-solution?
Concluding remarks
Representation systems
A representation system is composed of:
◮ a set W of representatives
◮ a function rep from W to sets of instances
Vrep−→ {I1, I2, I3, . . .} = rep(V)
Representation systems
A representation system is composed of:
◮ a set W of representatives
◮ a function rep from W to sets of instances
Vrep−→ {I1, I2, I3, . . .} = rep(V)
◮ Incomplete instances and knowledge bases arerepresentation systems
In classical data exchange
we consider only complete data
M = (S,T,Σ) a schema mapping from S to T
I ∈ Inst(S), J ∈ Inst(T)
J is a solution for I under M iff (I , J) |= Σ
J ∈ SolM(I )
In classical data exchange
we consider only complete data
M = (S,T,Σ) a schema mapping from S to T
I ∈ Inst(S), J ∈ Inst(T)
J is a solution for I under M iff (I , J) |= Σ
J ∈ SolM(I )
We can extend this to set of instances:given a set X ⊆ Inst(S)
SolM(X ) =⋃
I∈X
SolM(I )
Extending the definition to representation systems
Given:
◮ a mapping M = (S,T,Σ)
◮ a representation system R = (W, rep)
◮ U ,V ∈ W
Extending the definition to representation systems
Given:
◮ a mapping M = (S,T,Σ)
◮ a representation system R = (W, rep)
◮ U ,V ∈ W
Definition
U is an R-solution of V if
rep(U) ⊆ SolM(rep(V))
Extending the definition to representation systems
Given:
◮ a mapping M = (S,T,Σ)
◮ a representation system R = (W, rep)
◮ U ,V ∈ W
Definition
U is an R-solution of V if
rep(U) ⊆ SolM(rep(V))
or equivalently,
∀J ∈ rep(U), ∃I ∈ rep(V) such that J ∈ SolM(I )
Universal solutions and strong representation systems
Definition
U is a universal R-solution of V if
rep(U) = SolM(rep(V))
Universal solutions and strong representation systems
Definition
U is a universal R-solution of V if
rep(U) = SolM(rep(V))
Let C be a class of mappings,
Definition
R = (W, rep) is a strong representation system for C iffor every M ∈ C, and V ∈ W, there exists a U ∈ W such that:
rep(U) = SolM(rep(V))
Universal solutions and strong representation systems
Definition
U is a universal R-solution of V if
rep(U) = SolM(rep(V))
Let C be a class of mappings,
Definition
R = (W, rep) is a strong representation system for C iffor every M ∈ C, and V ∈ W, there exists a U ∈ W such that:
rep(U) = SolM(rep(V))
Strong representation systems ⇒ universal solutions forrepresentatives in R can always be represented in the same system.
Outline
Formalism for exchanging representations systems
Applications to incomplete instances
Applications to knowledge basesWhat is the complexity of testing kb-solutions?How can we compute a good kb-solution?
Concluding remarks
Incomplete Instances
We consider:
◮ Naive instances: instances with null values
R(1,N1)R(N1, 2)
Incomplete Instances
We consider:
◮ Naive instances: instances with null values
R(1,N1)R(N1, 2)
◮ Positive conditional instances: naive instances plusconditions for tuples
◮ equalities between nulls, and between nulls and constants◮ conjunctions and disjunctions
T (1,N1) trueR(1,N1,N2) N1 = N2 ∨ (N1 = 2 ∧ N2 = 3)
Incomplete Instances
We consider:
◮ Naive instances: instances with null values
R(1,N1)R(N1, 2)
◮ Positive conditional instances: naive instances plusconditions for tuples
◮ equalities between nulls, and between nulls and constants◮ conjunctions and disjunctions
T (1,N1) trueR(1,N1,N2) N1 = N2 ∨ (N1 = 2 ∧ N2 = 3)
Semantics given by valuations of nulls (+ open world):
rep(I) = {K | µ(I) ⊆ K for some valuation µ}
Strong representation systems for st-tgds
Let M = (S,T,Σ) be specified by source-to-target tgds(st-tgd: φS(x) → ∃yψT(x , y), φS and ψT are CQs)
(FKMP03)
For every complete instance I there exists a naive instance J s.t.
rep(J ) = SolM(I )
Strong representation systems for st-tgds
Let M = (S,T,Σ) be specified by source-to-target tgds(st-tgd: φS(x) → ∃yψT(x , y), φS and ψT are CQs)
(FKMP03)
For every complete instance I there exists a naive instance J s.t.
rep(J ) = SolM(I )
Proposition
Naive instances are not a strong representation system for st-tgds
Idea
Mng(N ,Alice) Mng(x , y) → Reports(x , y)Mng(x , x) → Self-Mng(x)
Strong representation systems for st-tgds
Let M = (S,T,Σ) be specified by source-to-target tgds(st-tgd: φS(x) → ∃yψT(x , y), φS and ψT are CQs)
(FKMP03)
For every complete instance I there exists a naive instance J s.t.
rep(J ) = SolM(I )
Proposition
Naive instances are not a strong representation system for st-tgds
Idea
Mng(N ,Alice) Mng(x , y) → Reports(x , y) ?Mng(x , x) → Self-Mng(x)
Positive conditional instances are enough
Theorem
Positive conditional instances are astrong representation system for st-tgds.
Positive conditional instances are enough
Theorem
Positive conditional instances are astrong representation system for st-tgds.
Mng(N, Alice) Mng(x , y) → Reports(x , y)Mng(x , x) → Self-Mng(x)
Positive conditional instances are enough
Theorem
Positive conditional instances are astrong representation system for st-tgds.
Mng(N, Alice) Mng(x , y) → Reports(x , y) Reports(N ,Alice) true
Mng(x , x) → Self-Mng(x)
Positive conditional instances are enough
Theorem
Positive conditional instances are astrong representation system for st-tgds.
Mng(N, Alice) Mng(x , y) → Reports(x , y) Reports(N ,Alice) true
Mng(x , x) → Self-Mng(x) Self-Mng(Alice)
Positive conditional instances are enough
Theorem
Positive conditional instances are astrong representation system for st-tgds.
Mng(N, Alice) Mng(x , y) → Reports(x , y) Reports(N ,Alice) true
Mng(x , x) → Self-Mng(x) Self-Mng(Alice) N = Alice
Positive conditional instances are enough
Theorem
Positive conditional instances are astrong representation system for st-tgds.
Mng(N, Alice) Mng(x , y) → Reports(x , y) Reports(N ,Alice) true
Mng(x , x) → Self-Mng(x) Self-Mng(Alice) N = Alice
Corollary
Positive conditional instances are astrong representation system for SO-tgds.
Positive conditional instances are
exactly the needed representation system
Theorem
Positive conditional instances are minimal:
◮ equalities between nulls◮ equalities between constant and nulls◮ conjunctions and disjunctions
are all needed for a strong representation system of st-tgds
Positive conditional instances
match the complexity bounds of classical data exchange
Theorem
For positive conditional instances and (fixed) st-tgds
◮ Universal solutions can be constructed in polynomial time.
◮ Certain answers of unions of conjunctive queriescan be computed in polynomial time.
Outline
Formalism for exchanging representations systems
Applications to incomplete instances
Applications to knowledge basesWhat is the complexity of testing kb-solutions?How can we compute a good kb-solution?
Concluding remarks
The semantics of knowledge bases
is given by sets of instances
Knowledge base over S: (I ,Γ) s.t.
I ∈ Inst(S)
Γ a set of rules over S
Semantics: finite models
Mod(I ,Γ) = {K ∈ Inst(S) | I ⊆ K and K |= Γ}
We can apply our formalism
to knowledge bases
(I2,Γ2) is a kb-solution for (I1,Γ1) under M iff
Mod(I2,Γ2) ⊆ SolM(Mod(I1,Γ1))
Outline
Formalism for exchanging representations systems
Applications to incomplete instances
Applications to knowledge basesWhat is the complexity of testing kb-solutions?How can we compute a good kb-solution?
Concluding remarks
We study the following decision problem
Check-KB-Sol:
Input: M = (S,T,Σ) with Σ st-tgds(I1,Γ1) knowledge base over S with Γ1 tgds(I2,Γ2) knowledge base over T with Γ2 tgds
Output: Is (I2,Γ2) a kb-solution of (I1,Γ1) under M?
We study the following decision problem
Check-KB-Sol:
Input: M = (S,T,Σ) with Σ st-tgds(I1,Γ1) knowledge base over S with Γ1 tgds(I2,Γ2) knowledge base over T with Γ2 tgds
Output: Is (I2,Γ2) a kb-solution of (I1,Γ1) under M?
Theorem
Check-KB-Sol is undecidable (even for fixed M).
We study the following decision problem
Check-KB-Sol:
Input: M = (S,T,Σ) with Σ st-tgds(I1,Γ1) knowledge base over S with Γ1 tgds(I2,Γ2) knowledge base over T with Γ2 tgds
Output: Is (I2,Γ2) a kb-solution of (I1,Γ1) under M?
Theorem
Check-KB-Sol is undecidable (even for fixed M).
Undecidability is a consequence of using ∃ in knowledge bases
Check-Full-KB-Sol: Γ1, Γ2 full-tgds (no ∃)
The complexity of Check-Full-KB-Sol
Theorem
Check-Full-KB-Sol is EXPTIME-complete.
The complexity of Check-Full-KB-Sol
Theorem
Check-Full-KB-Sol is EXPTIME-complete.
Theorem
If M = (S,T,Σ) is fixed
Check-Full-KB-Sol is ∆P2 [O(log n)]-complete.
∆P2 [O(log n)] ≡ PNP with only a logarithmic
number of calls to the NP oracle.
Hardness is obtained by reducing
the Max-Odd-Clique problem
Given a graph G with n nodes:
S: E (·, ·), Succ(·, ·), Clique(·)
Hardness is obtained by reducing
the Max-Odd-Clique problem
Given a graph G with n nodes:
S: E (·, ·), Succ(·, ·), Clique(·)T: E ′(·, ·), Succ ′(·, ·), Clique′(·)
Hardness is obtained by reducing
the Max-Odd-Clique problem
Given a graph G with n nodes:
S: E (·, ·), Succ(·, ·), Clique(·)T: E ′(·, ·), Succ ′(·, ·), Clique′(·)
I1 and I2 are copies of G , plus a successor relation over [1, n]
Hardness is obtained by reducing
the Max-Odd-Clique problem
Given a graph G with n nodes:
S: E (·, ·), Succ(·, ·), Clique(·)T: E ′(·, ·), Succ ′(·, ·), Clique′(·)
I1 and I2 are copies of G , plus a successor relation over [1, n]
Γ1: “E has a clique of even size x ≤ n” → Clique(x)
Hardness is obtained by reducing
the Max-Odd-Clique problem
Given a graph G with n nodes:
S: E (·, ·), Succ(·, ·), Clique(·)T: E ′(·, ·), Succ ′(·, ·), Clique′(·)
I1 and I2 are copies of G , plus a successor relation over [1, n]
Γ1: “E has a clique of even size x ≤ n” → Clique(x)Γ2: “E ′ has a clique of odd size x ≤ n” → Clique′(x)
Hardness is obtained by reducing
the Max-Odd-Clique problem
Given a graph G with n nodes:
S: E (·, ·), Succ(·, ·), Clique(·)T: E ′(·, ·), Succ ′(·, ·), Clique′(·)
I1 and I2 are copies of G , plus a successor relation over [1, n]
Γ1: “E has a clique of even size x ≤ n” → Clique(x)Γ2: “E ′ has a clique of odd size x ≤ n” → Clique′(x)
Σ: Clique(x) ∧ Succ(x , y) → Clique′(y)
Hardness is obtained by reducing
the Max-Odd-Clique problem
Given a graph G with n nodes:
S: E (·, ·), Succ(·, ·), Clique(·)T: E ′(·, ·), Succ ′(·, ·), Clique′(·)
I1 and I2 are copies of G , plus a successor relation over [1, n]
Γ1: “E has a clique of even size x ≤ n” → Clique(x)Γ2: “E ′ has a clique of odd size x ≤ n” → Clique′(x)
Σ: Clique(x) ∧ Succ(x , y) → Clique′(y)
(I2,Γ2) is a kb-solution of (I1,Σ1) iffthe maximum size of a clique in G is odd.
Outline
Formalism for exchanging representations systems
Applications to incomplete instances
Applications to knowledge basesWhat is the complexity of testing kb-solutions?How can we compute a good kb-solution?
Concluding remarks
Minimal knowledge bases as good kb-solutions
What are good knowledge-base solutions?
◮ first choice: universal kb-solutions, but
◮ there are some other kb-solutions desirable to materialize
Minimal knowledge bases as good kb-solutions
What are good knowledge-base solutions?
◮ first choice: universal kb-solutions, but
◮ there are some other kb-solutions desirable to materialize
X ≡min Y: X and Y coincide in the minimal instances
Minimal knowledge bases as good kb-solutions
What are good knowledge-base solutions?
◮ first choice: universal kb-solutions, but
◮ there are some other kb-solutions desirable to materialize
X ≡min Y: X and Y coincide in the minimal instances
(I2,Γ2) is a minimal kb-solution of (I1,Γ1) under M if
Mod(I2,Γ2) ≡min SolM(Mod(I1,Γ1))
We consider minimal kb-solutions as the good solutions
Two requirements to construct
minimal knowledge-base solutions
Given (I1,Γ1) and Σ, when constructing aminimal-knowledge base solution (I2,Γ2) we would want:
1. Γ2 to only depend on Γ1 and Σ
Γ2 is safe for Γ1 and Σ
Two requirements to construct
minimal knowledge-base solutions
Given (I1,Γ1) and Σ, when constructing aminimal-knowledge base solution (I2,Γ2) we would want:
1. Γ2 to only depend on Γ1 and Σ
Γ2 is safe for Γ1 and Σ
Definition
Γ2 is safe for Γ1 and Σ, if for every I1 there exists I2 s.t.
(I2,Γ2) is a minimal kb-solution of (I1,Γ1)
Two requirements to construct
minimal knowledge-base solutions
Given (I1,Γ1) and Σ, when constructing aminimal-knowledge base solution (I2,Γ2) we would want:
2. Γ2 to be as informative as possiblethus minimizing the size of I2
Two requirements to construct
minimal knowledge-base solutions
Given (I1,Γ1) and Σ, when constructing aminimal-knowledge base solution (I2,Γ2) we would want:
2. Γ2 to be as informative as possiblethus minimizing the size of I2
Γ2 is optimum-safe if for every other safe set Γ′
Γ2 |= Γ′
Safe sets: data independent target knowledge bases
Unfortunately, optimum-safe sets are not (always) FO-expressible
Theorem
There exist sets Γ1 (full & acyclic) and Σ (full) s.t.
no FO sentence is optimum-safe for Γ1 and Σ
Safe sets: data independent target knowledge bases
Unfortunately, optimum-safe sets are not (always) FO-expressible
Theorem
There exist sets Γ1 (full & acyclic) and Σ (full) s.t.
no FO sentence is optimum-safe for Γ1 and Σ
In our paper:
An algorithm to compute optimum-safe set in SO
Safe sets: data independent target knowledge bases
Unfortunately, optimum-safe sets are not (always) FO-expressible
Theorem
There exist sets Γ1 (full & acyclic) and Σ (full) s.t.
no FO sentence is optimum-safe for Γ1 and Σ
In our paper:
An algorithm to compute optimum-safe set in SO
Can we do better? not optimum but useful set?(non-trivial safe set in a good language)
We can compute a good safe set
for acyclic tgds knowledge bases
Input: Γ1 acyclic full-tgds, Σ full-tgds (from S to T)
We can compute a good safe set
for acyclic tgds knowledge bases
Input: Γ1 acyclic full-tgds, Σ full-tgds (from S to T)
1. Unfold Γ1 to get Γ+1
We can compute a good safe set
for acyclic tgds knowledge bases
Input: Γ1 acyclic full-tgds, Σ full-tgds (from S to T)
1. Unfold Γ1 to get Γ+1
2. Construct Σ− (from T to S) as follows:
for every α(x) → R(x) in Γ+1
We can compute a good safe set
for acyclic tgds knowledge bases
Input: Γ1 acyclic full-tgds, Σ full-tgds (from S to T)
1. Unfold Γ1 to get Γ+1
2. Construct Σ− (from T to S) as follows:
for every α(x) → R(x) in Γ+1
Rewrite α(x) in the target of Σ to obtain β(x)
We can compute a good safe set
for acyclic tgds knowledge bases
Input: Γ1 acyclic full-tgds, Σ full-tgds (from S to T)
1. Unfold Γ1 to get Γ+1
2. Construct Σ− (from T to S) as follows:
for every α(x) → R(x) in Γ+1
Rewrite α(x) in the target of Σ to obtain β(x)
Add β(x) → R(x) to Σ−
We can compute a good safe set
for acyclic tgds knowledge bases
Input: Γ1 acyclic full-tgds, Σ full-tgds (from S to T)
1. Unfold Γ1 to get Γ+1
2. Construct Σ− (from T to S) as follows:
for every α(x) → R(x) in Γ+1
Rewrite α(x) in the target of Σ to obtain β(x)
Add β(x) → R(x) to Σ−
3. Let Γ2 = Σ− ◦ Σ
We can compute a good safe set
for acyclic tgds knowledge bases
Input: Γ1 acyclic full-tgds, Σ full-tgds (from S to T)
1. Unfold Γ1 to get Γ+1
2. Construct Σ− (from T to S) as follows:
for every α(x) → R(x) in Γ+1
Rewrite α(x) in the target of Σ to obtain β(x)
Add β(x) → R(x) to Σ−
3. Let Γ2 = Σ− ◦ Σ
Theorem
Γ2 is safe for Γ1 and Σ
(Γ2 is a set of full-tgds with 6=)
In the paper: a complete strategy for computing
minimal knowledge-base solutions
Given Γ1 and Σ:
◮ We compute Γ2 safe for Γ1 and Σ
◮ We then compute minimal set Γ∗ (implied by Γ1) such that
for every I1
In the paper: a complete strategy for computing
minimal knowledge-base solutions
Given Γ1 and Σ:
◮ We compute Γ2 safe for Γ1 and Σ
◮ We then compute minimal set Γ∗ (implied by Γ1) such that
for every I1
chaseΓ∗(I1)
In the paper: a complete strategy for computing
minimal knowledge-base solutions
Given Γ1 and Σ:
◮ We compute Γ2 safe for Γ1 and Σ
◮ We then compute minimal set Γ∗ (implied by Γ1) such that
for every I1
chaseΣ( chaseΓ∗(I1))
In the paper: a complete strategy for computing
minimal knowledge-base solutions
Given Γ1 and Σ:
◮ We compute Γ2 safe for Γ1 and Σ
◮ We then compute minimal set Γ∗ (implied by Γ1) such that
for every I1(
chaseΣ( chaseΓ∗(I1)),Γ2
)
In the paper: a complete strategy for computing
minimal knowledge-base solutions
Given Γ1 and Σ:
◮ We compute Γ2 safe for Γ1 and Σ
◮ We then compute minimal set Γ∗ (implied by Γ1) such that
for every I1(
chaseΣ( chaseΓ∗(I1)),Γ2
)
is a minimal kb-solution for (I1,Γ1).
Outline
Formalism for exchanging representations systems
Applications to incomplete instances
Applications to knowledge basesWhat is the complexity of testing kb-solutions?How can we compute a good kb-solution?
Concluding remarks
We can exchange more than complete data
We propose a general formalismto exchange representation systems
◮ Applications to incomplete instances
◮ Applications to knowledge bases
We can exchange more than complete data
We propose a general formalismto exchange representation systems
◮ Applications to incomplete instances
◮ Applications to knowledge bases
Next step: apply our general setting to the Semantic Web
◮ Semantic Web data have nulls (blank nodes)
◮ Semantic Web specifications have rules (RDFS, OWL)
Exchanging More than Complete Data
Jorge PerezUniversidad de Chile
joint work with Marcelo Arenas (PUC-Chile) and Juan Reutter (Univ. Edinburgh)
Outline
Formalism for exchanging representations systems
Applications to incomplete instances
Applications to knowledge basesWhat is the complexity of testing kb-solutions?How can we compute a good kb-solution?
Concluding remarks