On Propagation of Deletions and Annotations through Views
Wang-Chiew Tan, University of Pennsylvania
Database Group
Joint work with Peter Buneman and Sanjeev Khanna
Wang-Chiew Tan, Penn Database Group 2
Data Annotations (share annotations)
• Knowledge sharing through “annotations”
• Annotations on data at various levels of granularity; annotations on annotations
• Improve accuracy of data
  – data and annotations can be reviewed by independent parties
• Annotations:
  – loosely structured
• Source data:
  – proprietary
  – fixed schema
• A system that overlays annotations on existing data
• “Big business” in scientific databases
Data Annotations (share annotations)
NYRestaurants (Source Table)
Restaurant           Cost   Type       Zip
Peacock Alley        $$$    French     10022
Bull & Bear          $$$    Seafood    10022
Pacifica             $      Chinese    10013
Soho Kitchen & Bar   $      American   10022

All Restaurants (View 1)
Restaurant           Cost   Type
Peacock Alley        $$$    French
Bull & Bear          $$$    Seafood
Pacifica             $      Chinese
Soho Kitchen & Bar   $      American

Cheap Restaurants (View 2)
Restaurant           Cost   Type
Pacifica             $      Chinese
Soho Kitchen & Bar   $      American

Annotations shown on the slide:
– “Serves fine French Cuisine in elegant setting. Jackets required.”
– “Extensive wine list!”
– “Yummy chicken curry!!”
Data Annotations
• Communicate “meta data” through annotations
  – “bounce” or “spread” annotations around by piggybacking annotations on data items in the source-query-view model
• An annotation is placed in the view
  – where do we place the annotation on the source?
• Annotation placement problem presented in the relational setting
  – results carry over to fragments of XML (hierarchical model)
Model: Source (relational database) → Query → View (result of query applied on source)
Not an easy problem!
Location and Propagation Rules
• A location is a triple (R, t, A): R is a relation name, t is a tuple in R, and A is an attribute in the schema of R
• Propagation rules:
  – Select
  – Project
  – Join
  – Union
[Figure: one diagram per operator (Select and Project on R, Join of R1 and R2, Union of R1 and R2) over attributes A1, A2, A3, showing annotations carried along with the values they are attached to.]
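To make “annotations travel with their values” concrete, here is a minimal Python sketch of the four rules applied to the NYRestaurants example. It assumes a where-provenance reading that the slide only shows pictorially: a selection keeps the annotations of the tuples it keeps, a projection keeps those of the retained attributes, a natural join lets the shared attribute gather annotations from both inputs, and a union merges the annotations of identical tuples. The cell representation, the function names, and the placement of “Yummy chicken curry!!” on Pacifica's Type cell in the source are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical representation: a cell is (value, annotations); a row maps attribute -> cell.
Cell = Tuple[str, FrozenSet[str]]
Row = Dict[str, Cell]


def cell(value: str, *annotations: str) -> Cell:
    return (value, frozenset(annotations))


def select(rel: List[Row], pred: Callable[[Dict[str, str]], bool]) -> List[Row]:
    # A selected tuple keeps all of its annotations; the predicate sees plain values.
    return [row for row in rel if pred({a: v for a, (v, _) in row.items()})]


def project(rel: List[Row], attrs: List[str]) -> List[Row]:
    # Annotations on retained attributes survive; rows that become equal merge annotations.
    out: Dict[Tuple[str, ...], Row] = {}
    for row in rel:
        key = tuple(row[a][0] for a in attrs)
        merged = out.setdefault(key, {a: cell(row[a][0]) for a in attrs})
        for a in attrs:
            merged[a] = (merged[a][0], merged[a][1] | row[a][1])
    return list(out.values())


def join(r1: List[Row], r2: List[Row]) -> List[Row]:
    # Natural join; a shared attribute gathers annotations from both input cells.
    out = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            if all(t1[a][0] == t2[a][0] for a in common):
                row = dict(t1)
                for a, (v, anns) in t2.items():
                    row[a] = (v, row[a][1] | anns) if a in common else (v, anns)
                out.append(row)
    return out


def union(r1: List[Row], r2: List[Row]) -> List[Row]:
    # Assumes one shared schema; a tuple present in both inputs merges annotations.
    rows = r1 + r2
    return project(rows, list(rows[0].keys())) if rows else []


# NYRestaurants from the slide; the annotation's placement on Pacifica's Type is hypothetical.
ny_restaurants = [
    {"Restaurant": cell("Peacock Alley"), "Cost": cell("$$$"), "Type": cell("French"), "Zip": cell("10022")},
    {"Restaurant": cell("Bull & Bear"), "Cost": cell("$$$"), "Type": cell("Seafood"), "Zip": cell("10022")},
    {"Restaurant": cell("Pacifica"), "Cost": cell("$"), "Type": cell("Chinese", "Yummy chicken curry!!"), "Zip": cell("10013")},
    {"Restaurant": cell("Soho Kitchen & Bar"), "Cost": cell("$"), "Type": cell("American"), "Zip": cell("10022")},
]

# Cheap Restaurants (View 2): select Cost = '$', then project away Zip.
cheap = project(select(ny_restaurants, lambda t: t["Cost"] == "$"), ["Restaurant", "Cost", "Type"])
for row in cheap:
    print(row["Restaurant"][0], sorted(row["Type"][1]))
# Pacifica ['Yummy chicken curry!!']
# Soho Kitchen & Bar []
```

Note that this pushes an annotation forward from source to view; the annotation placement problem asks the reverse question.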
Annotation Placement Problem
• Annotation Placement Problem:
  – Given a view V = Q(S), where Q is a query and S is a data source, and an annotation A placed in the view V, decide whether there is an annotation in the source that, when propagated to the view, produces no annotation other than A.
  – “side-effect-free annotation”: an annotation on the source that produces no annotation other than A in the view
[Diagram: S → Q → V = Q(S)]
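The definition can be checked directly once we know where an annotation on each source location would land. The Python sketch below assumes a hypothetical two-relation join V = R1(A, B) JOIN R2(B, C), assumes the join attribute B carries annotations from both sides, and computes that “forward” map by hand; it then keeps the source locations whose annotation reaches the target view location and nothing else. The names `join_forward` and `side_effect_free_sources` are illustrative only.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# A location is (relation name, tuple, attribute), mirroring the (R, t, A) triple.
Loc = Tuple[str, Tuple[str, ...], str]


def join_forward(r1: List[Tuple[str, str]], r2: List[Tuple[str, str]]) -> Dict[Loc, Set[Loc]]:
    """For V = R1(A, B) JOIN R2(B, C), map each source location to the view locations
    an annotation placed there would reach (hypothetical schemas; the join attribute B
    is assumed to carry annotations from both sides)."""
    forward: Dict[Loc, Set[Loc]] = {}
    for t in r1:
        forward[("R1", t, "A")] = set()
        forward[("R1", t, "B")] = set()
    for t in r2:
        forward[("R2", t, "B")] = set()
        forward[("R2", t, "C")] = set()
    for (a, b1), (b2, c) in product(r1, r2):
        if b1 == b2:
            v = (a, b1, c)
            forward[("R1", (a, b1), "A")].add(("V", v, "A"))
            forward[("R1", (a, b1), "B")].add(("V", v, "B"))
            forward[("R2", (b2, c), "B")].add(("V", v, "B"))
            forward[("R2", (b2, c), "C")].add(("V", v, "C"))
    return forward


def side_effect_free_sources(forward: Dict[Loc, Set[Loc]], target: Loc) -> List[Loc]:
    # Candidate source locations whose annotation reaches `target` and nothing else.
    return [src for src, dests in forward.items() if dests == {target}]


fwd = join_forward([("a1", "b")], [("b", "c1"), ("b", "c2")])
print(side_effect_free_sources(fwd, ("V", ("a1", "b", "c1"), "C")))  # [('R2', ('b', 'c1'), 'C')]
print(side_effect_free_sources(fwd, ("V", ("a1", "b", "c1"), "A")))  # []
```

The second call comes back empty: the only source of the A value is R1's tuple (a1, b), which joins with two R2 tuples, so an annotation there would also appear on (a1, b, c2). The sketch takes the propagation map as given and simply tests every candidate source location.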
A Dichotomy Theorem
Theorem:
(a) It is NP-hard to decide whether there is a side-effect-free annotation for a PJ query.
(b) There is a polynomial-time algorithm for queries that do not simultaneously contain a Project and a Join operation.
[Diagram: S → Q → V = Q(S)]
Project and Join Query
• Intuition: PJ can encode 3SAT, e.g., (x1 + x2 + x3) . . . (x3 + x5 + x2)
• For each clause Ci, a relation over the clause's variables and Ci (e.g., x1 x2 x3 C1 for C1, …, x3 x5 x2 Cm for Cm) containing:
  – assignment tuples: all possible satisfying assignments for Ci (T = true, F = false), each paired with the value Ci
  – a dummy tuple (d, d, d, Ci)
• Query: Join, then Project on C1 … Cm; the query output is over C1 … Cm
[Figure: the clause relations for C1 and Cm, each listing its assignment tuples and its dummy tuple, feeding the join and projection.]
Project and Join Query (continued)
[Figure: the same construction with the assignment tuples C1 TTT and Cm TTT set apart, an additional dummy tuple (d, d, d, C′m), and output tuples C1 … Cm and C1 … C′m.]
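A small Python sketch can show why a join followed by a projection captures 3SAT. Under one reading of the construction above (an assumption, since the slide's tables did not survive intact), each clause becomes a relation of its satisfying assignments plus a dummy tuple; joining on shared variables forces the assignments to agree, so join results built only from assignment tuples correspond one-to-one to satisfying assignments of the whole formula. In this sketch the all-dummy combination always joins, so the projected tuple (C1, …, Cm) exists even when the formula is unsatisfiable. The literal encoding and all function names are hypothetical.

```python
from itertools import product

# Hypothetical encoding: a clause is a list of (variable, is_positive) literals.


def clause_relation(name, clause):
    """Relation for one clause: its satisfying assignment tuples plus a dummy tuple.
    Rows are (dict, is_dummy); the clause attribute `name` always holds the value `name`."""
    variables = [v for v, _ in clause]
    rows = []
    for values in product([True, False], repeat=len(clause)):
        if any(val == pos for val, (_, pos) in zip(values, clause)):  # clause satisfied
            row = {v: "T" if val else "F" for v, val in zip(variables, values)}
            row[name] = name
            rows.append((row, False))
    dummy = {v: "d" for v in variables}
    dummy[name] = name
    rows.append((dummy, True))
    return rows


def natural_join(left, right):
    # Combine rows that agree on shared attributes; remember whether a dummy was used.
    out = []
    for lrow, ldummy in left:
        for rrow, rdummy in right:
            if all(lrow[a] == rrow[a] for a in lrow.keys() & rrow.keys()):
                out.append(({**lrow, **rrow}, ldummy or rdummy))
    return out


def encode(formula):
    """Join all clause relations, then project on C1 ... Cm.
    Returns (view, number of join results built only from assignment tuples)."""
    names = [f"C{j + 1}" for j in range(len(formula))]
    joined = [({}, False)]  # the empty row is the identity for natural join
    for name, clause in zip(names, formula):
        joined = natural_join(joined, clause_relation(name, clause))
    view = {tuple(row[n] for n in names) for row, _ in joined}
    real = sum(1 for _, used_dummy in joined if not used_dummy)
    return view, real


def count_models(formula):
    # Brute-force count of satisfying assignments, for comparison.
    variables = sorted({v for clause in formula for v, _ in clause})
    count = 0
    for values in product([True, False], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(any(env[v] == pos for v, pos in clause) for clause in formula):
            count += 1
    return count


f = [[("x1", True), ("x2", True), ("x3", True)], [("x1", False), ("x2", True), ("x4", True)]]
view, real = encode(f)
print(view)                   # {('C1', 'C2')}
print(real, count_models(f))  # 12 12
```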
Related Work on Annotations
• Superimposed Information (D. Maier, L. Delcambre [WebDB’99])
  – data “placed over” existing information, e.g., bookmark files, schema of a database
• Annotation systems
  – Annotea (W3C)
    • annotate web pages
    • location is defined with XPointer
  – Multivalent Browser (R. Wilensky, T. A. Phelps, UC Berkeley DL Project)
    • annotate PDF files, HTML, etc.
    • robust locations
  – BioDAS (Distributed Annotation Server) (L. Stein et al.)
    • annotate genome sequences
    • notion of location is genome-specific
• No one has formally studied the annotation placement problem
The classical view deletion problem
• A view tuple is to be deleted
  – What changes should be made to the source?
• Many kinds of view-to-source deletion translations
  – e.g., deletion-to-insertion, deletion-to-modification, etc.
• Update Semantics of Relational Views (F. Bancilhon, N. Spyratos [TODS’81])
• On the Correct Translation of Update Operations on Relational Views (U. Dayal, P. Bernstein [TODS’82])
• Algorithms for Translating View Updates to Database Updates for Views Involving Selections, Projections, and Joins (A. M. Keller [PODS’85])
  – deletion-to-deletion
• Run-Time Translation of View Tuple Deletions Using Data Lineage (Y. Cui, J. Widom [2001])
  – exploits lineage information to find “side-effect-free” deletions whenever possible
View Deletion Problem (Deletion-to-deletion translation)
• View Deletion Problem (minimize view side effects):
  – Given a view V = Q(S) and a tuple t in V, decide whether there is a side-effect-free deletion for t
  – “side-effect-free deletion”: a set of source tuples whose removal from the database removes only t from the view
Model: Source (relational database) → Query → View (result of query applied on source)
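Read literally, the definition suggests an exhaustive search: try sets of source deletions from smallest to largest and stop at the first one whose only effect on the view is removing t. The Python sketch below does this for a two-relation PJ view PROJ A,C(R1(A, B) JOIN R2(B, C)); it is exponential in the number of source tuples and is meant only to illustrate the definition. The instance, shaped like the two-relation PJ construction discussed in this talk, and all function names are hypothetical.

```python
from itertools import combinations


def view(r1, r2):
    # PROJ A,C (R1(A, B) JOIN R2(B, C)) as a set of (a, c) pairs.
    return {(a, c) for (a, b1) in r1 for (b2, c) in r2 if b1 == b2}


def side_effect_free_deletion(r1, r2, t):
    """Smallest set of source tuples whose removal deletes t, and only t, from the view.
    Exhaustive search over subsets: exponential in the number of source tuples."""
    before = view(r1, r2)
    source = [("R1", x) for x in r1] + [("R2", x) for x in r2]
    for k in range(1, len(source) + 1):
        for dels in combinations(source, k):
            kept1 = [x for x in r1 if ("R1", x) not in dels]
            kept2 = [x for x in r2 if ("R2", x) not in dels]
            if before - view(kept1, kept2) == {t}:
                return set(dels)
    return None  # every deletion of t has a side effect


# Hypothetical instance: (a, c) has one derivation through each of x1 and x2.
r1 = [("a", "x1"), ("a", "x2"), ("c1", "x1")]
r2 = [("x1", "c"), ("x2", "c")]
print(sorted(side_effect_free_deletion(r1, r2, ("a", "c"))))
# [('R1', ('a', 'x1')), ('R1', ('a', 'x2'))]
```

Deleting (x1, c) instead of (a, x1) would also remove (c1, c) from the view, which is exactly the kind of side effect the search rules out.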
A Dichotomy Theorem
Theorem:
(a) It is NP-hard to decide whether there is a side-effect-free deletion for a PJ or JU query in normal form.
(b) There is a polynomial-time algorithm to find the set of source deletions with minimum side effects for all other queries, i.e., queries that involve only S, P, U or only S, J operators.
• Theorem (a) holds even for a constant-size PJ query involving only two relations: PROJ A,C(R1 JOIN R2)
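View Deletion: PJ Query
Theorem: It is NP-hard to decide whether there is a side-effect-free deletion for a PJ query in normal form.
• Example formula: (x1 + x2 + x3)(x2 + x4 + x5)(x4 + x1 + x3)
• Query: PROJ A,C(R1 JOIN R2), with R1(A, B) and R2(B, C)
  – R1 contains (a, xi) and R2 contains (xi, c) for every variable xi, so the view contains (a, c)
  – each clause constant cj is paired with the variables of its clause: c1 with x1, x2, x3; c2 with x2, x4, x5; c3 with x4, x1, x3
  – the view also contains tuples involving the clause constants c1, c2, c3
• Each xi yields one derivation (a, xi) JOIN (xi, c) of the view tuple (a, c), so deleting (a, c) means breaking every such derivation: for each xi, decide whether to delete (a, xi) or (xi, c).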
Ongoing and Future Work
• Implementation of annotation system
  – on RDBMS
    • special cases of PJ queries with a polynomial-time algorithm, e.g., PJ queries that do not project out key information
  – on XML
  – effects on query languages?
Do we need an “annotation-conscious” QL?
• The same query in different languages, but different annotation behavior
  Emp(Name, Sal, Dept): [Name: “Joe”, Sal: 50K, Dept: “Marketing”]
  Department(Dept, Manager): [Dept: “Marketing”, Manager: “Jane”]
  Relational algebra: Emp JOIN Department
  SQL: SELECT e.Name, e.Sal, e.Dept, d.Manager FROM Emp e, Department d WHERE e.Dept = d.Dept
  Both return [Name: “Joe”, Sal: 50K, Dept: “Marketing”, Manager: “Jane”]
• Equivalent queries in the same language, but different annotation behavior
  Q1 = SELECT e.Name, e.Sal FROM Emp e WHERE e.Sal = '50K'
  Q2 = SELECT e.Name, '50K' AS Sal FROM Emp e WHERE e.Sal = '50K'
  Both return [Name: “Joe”, Sal: 50K]
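Here is a minimal Python sketch of how the two examples can diverge on annotations while agreeing on ordinary answers. It models two assumed rules: in the relational-algebra join, the shared Dept cell gathers annotations from both Emp and Department; in SQL, each output column inherits only from the variable it is bound to (e.Dept, e.Sal), and a constant such as '50K' inherits nothing. The annotation texts and function names are hypothetical.

```python
# Cells are (value, set of annotations); the annotation texts are hypothetical.
emp = [{"Name": ("Joe", set()), "Sal": ("50K", {"salary checked"}), "Dept": ("Marketing", {"from Emp"})}]
department = [{"Dept": ("Marketing", {"from Department"}), "Manager": ("Jane", set())}]


def q1(emp):
    # SELECT e.Name, e.Sal FROM Emp e WHERE e.Sal = '50K': Sal is copied from e.Sal.
    return [{"Name": e["Name"], "Sal": e["Sal"]} for e in emp if e["Sal"][0] == "50K"]


def q2(emp):
    # SELECT e.Name, '50K' AS Sal ...: Sal is a constant with no source cell to inherit from.
    return [{"Name": e["Name"], "Sal": ("50K", set())} for e in emp if e["Sal"][0] == "50K"]


def ra_join(emp, dept):
    # Emp JOIN Department: the shared Dept cell gathers annotations from both inputs.
    return [{"Name": e["Name"], "Sal": e["Sal"],
             "Dept": (e["Dept"][0], e["Dept"][1] | d["Dept"][1]), "Manager": d["Manager"]}
            for e in emp for d in dept if e["Dept"][0] == d["Dept"][0]]


def sql_join(emp, dept):
    # SQL with variable bindings: the output column e.Dept is bound to Emp's cell only.
    return [{"Name": e["Name"], "Sal": e["Sal"], "Dept": e["Dept"], "Manager": d["Manager"]}
            for e in emp for d in dept if e["Dept"][0] == d["Dept"][0]]


def values(rows):
    # Strip annotations to compare ordinary query answers.
    return [{k: v for k, (v, _) in r.items()} for r in rows]


assert values(q1(emp)) == values(q2(emp))                   # same answers
assert values(ra_join(emp, department)) == values(sql_join(emp, department))
print(q1(emp)[0]["Sal"][1], q2(emp)[0]["Sal"][1])           # {'salary checked'} set()
print(sorted(ra_join(emp, department)[0]["Dept"][1]))       # ['from Department', 'from Emp']
print(sorted(sql_join(emp, department)[0]["Dept"][1]))      # ['from Emp']
```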
Do we need an “annotation-conscious” QL? (continued)
• Relational algebra seems to suggest a natural set of propagation rules
• SQL seems to suggest another natural propagation rule
  – one that is based on variable bindings
• It is not clear how to extend the semantics of query languages so that annotation propagation is “well-behaved”.
• Should a query language be “annotation-conscious”? OR
• Should the user be allowed to control which annotation gets propagated where?
End of Talk