On Propagation of Deletions and Annotations through Views
Wang-Chiew Tan, University of Pennsylvania Database Group
Joint work with Peter Buneman and Sanjeev Khanna


Page 1: On Propagation of Deletions and Annotations through Views

On Propagation of Deletions and Annotations through Views

Wang-Chiew Tan
University of Pennsylvania Database Group

Joint work with Peter Buneman and Sanjeev Khanna

Page 2: On Propagation of Deletions and Annotations through Views

Wang-Chiew Tan, Penn Database Group 2

Data Annotations (share annotations)

• Knowledge sharing through “annotations”
• Annotations on data at various levels of granularity, annotations on annotations
• Improve accuracy of data
  – data and annotations can be reviewed by independent parties
• Annotations:
  – loosely structured
• Source Data:
  – proprietary
  – fixed schema
• A system that overlays annotations on existing data
• “big business” in scientific databases

Page 3: On Propagation of Deletions and Annotations through Views

Data Annotations (share annotations)

NYRestaurants (Source Table)

Restaurant          Cost  Type      Zip
Peacock Alley       $$$   French    10022
Bull & Bear         $$$   Seafood   10022
Pacifica            $     Chinese   10013
Soho Kitchen & Bar  $     American  10022

All Restaurants (View 1)

Restaurant          Cost  Type
Peacock Alley       $$$   French
Bull & Bear         $$$   Seafood
Pacifica            $     Chinese
Soho Kitchen & Bar  $     American

Cheap Restaurants (View 2)

Restaurant          Cost  Type
Pacifica            $     Chinese
Soho Kitchen & Bar  $     American

Annotations shown on the slide:
– “Yummy chicken curry!!”
– “Serves fine French Cuisine in elegant setting. Jackets required.”
– “Extensive wine list!”

Page 4: On Propagation of Deletions and Annotations through Views

Data Annotations

• Communicate “meta data” through annotations
  – “bounce” or “spread” annotations around by piggybacking annotations on data items in the source-query-view model.
• An annotation is placed in the view
  – where do we place the annotation on source?
• Annotation placement problem presented in relational setting
  – results carry over to fragments of XML (hierarchical model)

Not an easy problem!

Model: Source (relational database) → Query → View (result of query applied on source)

Page 5: On Propagation of Deletions and Annotations through Views

Location and Propagation Rules

• A location is a triple: (R, t, A)
  – R is a relation name
  – t is a tuple in R
  – A is an attribute in the schema of R
• Propagation Rules (one diagram per operator, drawn over relations R, R1, R2 with attributes A1, A2, A3):
  – Select
  – Project
  – Join
  – Union
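The rule diagrams themselves are not reproduced in the text, so the following Python sketch is only an assumption about what they depict: the usual positional reading, in which an annotation at a source location (R, t, A) follows the value wherever an operator copies it. All function names (`row`, `prop_select`, `prop_project`, `prop_join`, `prop_union`) and the output relation name `V` are hypothetical.

```python
# A tuple is stored as sorted (attribute, value) pairs so it can live in a set.
# A location is the triple (R, t, A) from the slide.

def row(**fields):
    """Build a hashable tuple from attribute=value pairs."""
    return tuple(sorted(fields.items()))

def prop_select(loc, pred, out="V"):
    """Select: the location survives only if its tuple satisfies the predicate."""
    _, t, a = loc
    return {(out, t, a)} if pred(dict(t)) else set()

def prop_project(loc, attrs, out="V"):
    """Project: the location survives only if attribute A is kept."""
    _, t, a = loc
    if a not in attrs:
        return set()
    return {(out, tuple((k, v) for k, v in t if k in attrs), a)}

def prop_join(loc, other_rows, out="V"):
    """Natural join: the location reaches every output tuple built from t."""
    _, t, a = loc
    left = dict(t)
    reached = set()
    for u in other_rows:
        right = dict(u)
        if all(left[k] == right[k] for k in left.keys() & right.keys()):
            reached.add((out, row(**{**left, **right}), a))
    return reached

def prop_union(loc, out="V"):
    """Union: the location reaches the same tuple in the union."""
    _, t, a = loc
    return {(out, t, a)}

# Example data taken from the Emp/Department slide later in the talk.
joe = row(Name="Joe", Sal="50K", Dept="Marketing")
marketing = row(Dept="Marketing", Manager="Jane")
joined = prop_join(("Emp", joe, "Sal"), [marketing])
```

Under these assumed rules, a source location can reach zero, one, or several view locations, which is what makes the placement question on the next slide non-trivial.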

Page 6: On Propagation of Deletions and Annotations through Views

Annotation Placement Problem

• Annotation Placement Problem:
  – Given a view V = Q(S) and an annotation A placed in the view V, decide if there is an annotation in the source that, when propagated to the view, produces no other annotation except A.
    • Q = query
    • S = data source
  – “side-effect-free annotation”: an annotation on the source that produces no other annotation except A in the view

Diagram: S → Q → V = Q(S)
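As a rough illustration of the definition, the sketch below enumerates source locations for a natural join and keeps those whose propagated image is exactly the target view location. It assumes the positional propagation reading (an annotation follows the copied value); the function names and the second employee "Ann" are hypothetical, while Joe and the Marketing department come from the Emp/Department example used later in the talk.

```python
from collections import defaultdict

def propagate_join(emp, department):
    """Map every source location (R, t, A) to the view locations it reaches
    under Emp JOIN Department on Dept. Emp tuples are (Name, Sal, Dept),
    Department tuples are (Dept, Manager)."""
    forward = defaultdict(set)
    for e in emp:
        for d in department:
            if e[2] != d[0]:
                continue
            v = (e[0], e[1], e[2], d[1])          # view tuple
            for attr in ("Name", "Sal", "Dept"):
                forward[("Emp", e, attr)].add((v, attr))
            for attr in ("Dept", "Manager"):
                forward[("Department", d, attr)].add((v, attr))
    return forward

def side_effect_free(forward, target):
    """Source locations whose annotation shows up at `target` and nowhere else."""
    return [src for src, reached in forward.items() if reached == {target}]

joe = ("Joe", "50K", "Marketing")
ann = ("Ann", "60K", "Marketing")                 # hypothetical second employee
marketing = ("Marketing", "Jane")
forward = propagate_join([joe, ann], [marketing])
joe_view = ("Joe", "50K", "Marketing", "Jane")

on_sal = side_effect_free(forward, (joe_view, "Sal"))
on_manager = side_effect_free(forward, (joe_view, "Manager"))
```

Here `on_sal` contains the single Emp location for Joe's salary, while `on_manager` is empty: the only source of Jane's name is the Department tuple, and annotating it also annotates Ann's row in the view.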

Page 7: On Propagation of Deletions and Annotations through Views

A Dichotomy Theorem

Theorem:
(a) It is NP-hard to decide if there is a side-effect-free annotation for a PJ query.
(b) There is a polynomial time algorithm for queries which do not simultaneously contain a Project and a Join operation.

Diagram: S → Q → V = Q(S)

Page 8: On Propagation of Deletions and Annotations through Views

Project and Join Query

• Intuition: PJ can encode 3SAT

  (¬x1 + x2 + ¬x3) . . . (x3 + x5 + x2)

  T - true, F - false

Relation for clause C1 (assignment tuples: all possible satisfying assignments for C1, plus a dummy tuple):

x1  x2  x3  C1
F   F   F   C1
T   F   F   C1
F   T   F   C1
T   T   F   C1
F   F   T   C1
F   T   T   C1
T   T   T   C1
d   d   d   C1   (dummy tuple)

. . .

Relation for clause Cm (assignment tuples: all possible satisfying assignments for Cm, plus a dummy tuple):

x3  x5  x2  Cm
T   F   F   Cm
F   T   F   Cm
T   T   F   Cm
F   F   T   Cm
T   F   T   Cm
F   T   T   Cm
T   T   T   Cm
d   d   d   Cm   (dummy tuple)

Query: Join, then Project on C1 … Cm

Query output columns: C1 ... Cm

Page 9: On Propagation of Deletions and Annotations through Views

Project and Join Query

• Intuition: PJ can encode 3SAT

  (¬x1 + x2 + ¬x3) … (x3 + x5 + x2)

  T - true, F - false

Relation for clause C1 (assignment tuples: all possible satisfying assignments for C1, plus a dummy tuple):

x1  x2  x3  C1
F   F   F   C1
T   F   F   C1
F   T   F   C1
T   T   F   C1
F   F   T   C1
F   T   T   C1
T   T   T   C1
d   d   d   C1   (dummy tuple)

...

Relation for clause Cm (assignment tuples: all possible satisfying assignments for Cm, plus dummy tuples):

x3  x5  x2  Cm
T   F   F   Cm
F   T   F   Cm
T   T   F   Cm
F   F   T   Cm
T   F   T   Cm
F   T   T   Cm
T   T   T   Cm
d   d   d   Cm    (dummy tuple)
d   d   d   C’m   (dummy tuple)

Query: Join, then Project on C1 … Cm

Output:

C1  ...  Cm
C1  ...  C’m
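The per-clause relations in this encoding can be generated mechanically. The sketch below builds one such relation, listing every satisfying assignment of a three-literal clause followed by a single dummy tuple; it covers only that construction step, not the full reduction, and the function name `clause_relation` and the literal representation are hypothetical.

```python
from itertools import product

def clause_relation(name, literals):
    """Build the relation for one clause.

    `literals` is a list of (variable, is_positive) pairs. The result is
    (header, rows): one row per satisfying assignment, using "T"/"F" as on
    the slide, plus one dummy row of "d" values.
    """
    header = [var for var, _ in literals] + [name]
    rows = []
    for values in product("FT", repeat=len(literals)):
        # A literal is satisfied when the value's truth matches its polarity.
        satisfied = any((val == "T") == positive
                        for val, (_, positive) in zip(values, literals))
        if satisfied:
            rows.append(values + (name,))
    rows.append(("d",) * len(literals) + (name,))   # dummy tuple
    return header, rows

# The last clause on the slide: (x3 + x5 + x2).
header, rows = clause_relation("Cm", [("x3", True), ("x5", True), ("x2", True)])
```

For a clause with three literals this yields seven assignment rows and one dummy row; the only assignment left out is the one that falsifies every literal.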

Page 10: On Propagation of Deletions and Annotations through Views

Related Work on Annotations

• Superimposed Information (D. Maier, L. Delcambre [WebDB’99])
  – data “placed over” existing information, e.g. bookmark files, schema of a database
• Annotation Systems
  – Annotea (W3C)
    • annotate web pages
    • location is defined with XPointer
  – Multivalent Browser (R. Wilensky, T. A. Phelps, UC Berkeley DL Project)
    • annotate on PDF files, HTML, etc.
    • robust locations
  – BioDAS (Distributed Annotation Server) (L. Stein et al.)
    • annotate on genome sequences
    • notion of location is genome specific
• No one has formally studied the annotation placement problem

Page 11: On Propagation of Deletions and Annotations through Views

The classical view deletion problem

• A view tuple is to be deleted
  – What changes should be made to the source?
• Many kinds of view-to-source deletion translations
  – e.g. deletion-to-insertion, deletion-to-modification, etc.
• Update Semantics of Relational Views (F. Bancilhon, N. Spyratos, [TODS’81])
• On the Correct Translation of Update Operations on Relational Views (U. Dayal, P. Bernstein, [TODS’82])
• Algorithms for Translating View Updates to Database Updates for Views Involving Selections, Projections and Joins (A. M. Keller, [PODS’85])
  – deletion-to-deletion
• Run-Time Translation of View Tuple Deletions Using Data Lineage (Y. Cui, J. Widom, [2001])
  – exploits lineage information to find “side-effect free” deletions whenever possible

Page 12: On Propagation of Deletions and Annotations through Views

View Deletion Problem (Deletion-to-deletion translation)

• View Deletion Problem (minimize view side-effect):
  – Given a view V=Q(S) and a tuple t in V, decide if there is a side-effect-free deletion for t
  – “side-effect-free deletion”: a set of source tuples whose removal from the database will only remove t from the view

Model: Source (relational database) → Query → View (result of query applied on source)
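The definition can be checked directly on small inputs by trying every set of source tuples. The sketch below does this for a view of the form PROJ A,C(R1 JOIN R2); it is an exhaustive search that grows exponentially with the number of source tuples, meant only to make the definition concrete. The function names and the sample tuples are hypothetical.

```python
from itertools import combinations

def view(r1, r2):
    """PROJ A,C (R1 JOIN R2) for R1(A, B) and R2(B, C), as a set of (A, C) pairs."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def side_effect_free_deletions(r1, r2, t):
    """All non-empty sets of source tuples whose removal deletes exactly t
    from the view. Tries every subset, so only usable on tiny inputs."""
    tagged = [("R1", x) for x in sorted(r1)] + [("R2", x) for x in sorted(r2)]
    wanted = view(r1, r2) - {t}
    found = []
    for size in range(1, len(tagged) + 1):
        for subset in combinations(tagged, size):
            d1 = {x for name, x in subset if name == "R1"}
            d2 = {x for name, x in subset if name == "R2"}
            if view(r1 - d1, r2 - d2) == wanted:
                found.append(set(subset))
    return found

# Hypothetical instance: (a, c) has two derivations, (a, e) has one.
r1 = {("a", "x1"), ("a", "x2")}
r2 = {("x1", "c"), ("x2", "c"), ("x2", "e")}
found = side_effect_free_deletions(r1, r2, ("a", "c"))
```

In this instance every side-effect-free deletion must remove (x2, c) from R2, because removing (a, x2) from R1 would also delete the view tuple (a, e).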

Page 13: On Propagation of Deletions and Annotations through Views

A Dichotomy Theorem

Theorem:
(a) It is NP-hard to decide if there is a side-effect-free deletion for a PJ or JU query in normal form.
(b) There is a polynomial time algorithm to find the set of source deletions with minimum side-effects for all other queries, i.e., queries that involve only S, P, U or S, J operators.

• Theorem (a) is true even for a constant size PJ query involving only two relations!

  PROJ A,C(R1 JOIN R2)

Page 14: On Propagation of Deletions and Annotations through Views

View Deletion: PJ Query

Theorem: It is NP-hard to decide if there is a side-effect-free deletion for a PJ query in normal form.

(x1+x2+x3)(¬x2+¬x4+¬x5)(x4+x1+x3)

R1 (A, B): (a, x1), (a, x2), (a, x3), (a, x4), (a, x5), (c2, x2), (c2, x4), (c2, x5)

R2 (B, C): (x1, c), (x2, c), (x3, c), (x4, c), (x5, c), (x1, c1), (x2, c1), (x3, c1), (x4, c3), (x1, c3), (x3, c3)

PROJ A,C(R1 JOIN R2):

A   C
a   c
a   c1
a   c3
c2  c
c2  c1
c2  c3

For each xi, decide whether to delete (a,xi) or (xi,c).
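The instance on this slide can be checked mechanically. The sketch below rests on two assumptions that should be read as such: that R1 holds the (a, xi) tuples plus (c2, x2), (c2, x4), (c2, x5), with R2 holding the (xi, c) tuples plus the c1 and c3 tuples, and that the middle clause is the negated one, i.e. (x1+x2+x3)(¬x2+¬x4+¬x5)(x4+x1+x3). Under that reading, a truth assignment picks, for each xi, which of (a, xi) or (xi, c) to delete. All function names are hypothetical.

```python
def view(r1, r2):
    """PROJ A,C (R1 JOIN R2) for R1(A, B) and R2(B, C)."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

XS = ["x1", "x2", "x3", "x4", "x5"]

# Assumed reading of the slide's instance.
R1 = {("a", x) for x in XS} | {("c2", x) for x in ("x2", "x4", "x5")}
R2 = ({(x, "c") for x in XS}
      | {(x, "c1") for x in ("x1", "x2", "x3")}
      | {(x, "c3") for x in ("x4", "x1", "x3")})

def deletion_for(assignment):
    """xi true  -> keep (a, xi), delete (xi, c) from R2;
    xi false -> delete (a, xi) from R1, keep (xi, c)."""
    d1 = {("a", x) for x in XS if not assignment[x]}
    d2 = {(x, "c") for x in XS if assignment[x]}
    return d1, d2

def is_side_effect_free(assignment):
    """True if the induced deletion removes (a, c) and nothing else from the view."""
    d1, d2 = deletion_for(assignment)
    return view(R1 - d1, R2 - d2) == view(R1, R2) - {("a", "c")}

only_x1 = {x: (x == "x1") for x in XS}   # satisfies all three clauses
all_true = {x: True for x in XS}         # falsifies the negated clause
all_false = {x: False for x in XS}       # falsifies the positive clauses
```

With `only_x1` the view loses only (a, c); with `all_true` the tuples with A = c2 that pair with c disappear as well, and with `all_false` the tuples (a, c1) and (a, c3) are lost, mirroring unsatisfied clauses.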

Page 15: On Propagation of Deletions and Annotations through Views

Ongoing and Future Work

• Implementation of annotation system
  – on RDBMS
    • special cases of PJ queries with polynomial time algorithm
      – PJ queries that do not project out key information
  – on XML
  – effects on query languages?

Page 16: On Propagation of Deletions and Annotations through Views

Do we need an “annotation-conscious” QL?

• The same query in different languages, but different annotation behavior

  Emp(Name, Sal, Dept)
  [Name:“Joe”, Sal:50K, Dept:“Marketing”]

  Department(Dept, Manager)
  [Dept:“Marketing”, Manager:“Jane”]

  Relational Algebra:
  Emp JOIN Department
  [Name:“Joe”, Sal:50K, Dept:“Marketing”, Manager:“Jane”]

  SQL:
  SELECT e.Name, e.Sal, e.Dept, d.Manager
  FROM Emp e, Department d
  WHERE e.Dept = d.Dept
  [Name:“Joe”, Sal:50K, Dept:“Marketing”, Manager:“Jane”]

• Equivalent queries in the same language, but different annotation behavior

  Q1 = SELECT e.Name, e.Sal FROM Emp e WHERE e.Sal = “50K”
  [Name:“Joe”, Sal:50K]

  Q2 = SELECT e.Name, “50K” AS Sal FROM Emp e WHERE e.Sal = “50K”
  [Name:“Joe”, Sal:50K]
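One way to see how Q1 and Q2 can agree on values yet differ on annotations is to model each cell as a value plus a set of notes. The sketch below assumes that copying a source cell carries its notes along, while a constant written in the query starts with none; this is an illustrative assumption, not a rule stated on the slide. The `Cell` class, the function names, and the note text "checked by HR" are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    """A value together with the annotations attached to it."""
    value: str
    notes: frozenset = frozenset()

# Emp with one annotated salary; the note text is made up for illustration.
emp = [{
    "Name": Cell("Joe"),
    "Sal": Cell("50K", frozenset({"checked by HR"})),
    "Dept": Cell("Marketing"),
}]

def q1(rows):
    """SELECT e.Name, e.Sal ... : the Sal cell is copied from the source."""
    return [{"Name": r["Name"], "Sal": r["Sal"]}
            for r in rows if r["Sal"].value == "50K"]

def q2(rows):
    """SELECT e.Name, "50K" AS Sal ... : Sal is a fresh constant."""
    return [{"Name": r["Name"], "Sal": Cell("50K")}
            for r in rows if r["Sal"].value == "50K"]

def values(rows):
    """Strip annotations, keeping plain values only."""
    return [{k: c.value for k, c in r.items()} for r in rows]

out1, out2 = q1(emp), q2(emp)
```

Both outputs have the same plain values, but only `out1` keeps the note on Sal, which is the kind of divergence the slide points at.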

Page 17: On Propagation of Deletions and Annotations through Views

Do we need an “annotation-conscious” QL?

• Relational algebra seems to suggest a natural set of propagation rules
• SQL seems to suggest another natural propagation rule
  – one that is based on variable bindings
• Not clear how we extend the semantics of query languages so that annotation propagation is “well-behaved”.
• Should a query language be “annotation-conscious”?
  OR
• Should the user be allowed to control which annotation gets propagated to where?

Page 18: On Propagation of Deletions and Annotations through Views


End of Talk