LIVE A lineage-supported, versioned DBMS Anish Das Sarma Martin Theobald Jennifer Widom


Agenda

ULDB Data Model and the Trio System
  Uncertainty & Lineage
LIVE Data Model (LDM)
  Uncertainty, Lineage & Versioning
Data Modifications
  Insert/Delete Tuples, Update Values, Update Confidences
Query Evaluation
  Valid-At vs. Snapshot Queries, Interval Computations, Confidence Computations, Complexity
Experiments/Conclusions

21.04.232 LIVE - A lineage-supported, versioned DBMS

ULDB Data Model

Different types of uncertainty:
  1. Tuple Alternatives
  2. ‘?’ (Maybe) Annotations
  3. Confidences

Implementation of the ULDB data model: Trio System
  TriQL query language
  TrioExplorer browser frontend, trioplus client, API
  Enhanced PostgreSQL backend (SPI)
  Search for “Stanford Trio”

ULDBs – Alternatives

1. Alternatives: uncertainty about attribute values
2. ‘?’ (Maybe) Annotations
3. Confidences

Saw (witness, color, car)
Amy    red, Honda ∥ red, Toyota ∥ orange, Mazda

Three possible worlds

ULDBs – Maybe Annotations

1. Alternatives
2. ‘?’ (Maybe): uncertainty about tuple presence
3. Confidences

Saw (witness, color, car)
Amy    red, Honda ∥ red, Toyota ∥ orange, Mazda
Betty  blue, Acura   ?

Six possible worlds

ULDBs – Confidences

1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences: weighted uncertainty

Saw (witness, color, car)
Amy    red, Honda 0.5 ∥ red, Toyota 0.3 ∥ orange, Mazda 0.2
Betty  blue, Acura 0.6   ?

Six possible worlds, each with a probability
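
The six weighted worlds can be enumerated directly. The sketch below is illustrative only: it assumes the two Saw tuples are independent and that a ‘?’ tuple is absent with probability 1 minus the sum of its confidences. The names `choices` and `possible_worlds` are hypothetical helpers, not part of Trio.

```python
from itertools import product

# Each x-tuple lists its alternatives with confidences, as on the slide.
# Assumption: a '?' tuple is absent with probability 1 - sum(confidences).
saw = [
    {"witness": "Amy",
     "alts": [(("red", "Honda"), 0.5), (("red", "Toyota"), 0.3), (("orange", "Mazda"), 0.2)]},
    {"witness": "Betty",
     "alts": [(("blue", "Acura"), 0.6)]},
]

def choices(xtuple):
    """All outcomes for one x-tuple, plus 'absent' when confidences sum below 1."""
    options = [((xtuple["witness"],) + alt, p) for alt, p in xtuple["alts"]]
    missing = 1.0 - sum(p for _, p in xtuple["alts"])
    if missing > 1e-9:
        options.append((None, missing))
    return options

def possible_worlds(relation):
    """Cross product of per-tuple outcomes; assumes tuples are independent."""
    worlds = []
    for combo in product(*(choices(x) for x in relation)):
        prob = 1.0
        rows = []
        for row, p in combo:
            prob *= p
            if row is not None:
                rows.append(row)
        worlds.append((rows, prob))
    return worlds

worlds = possible_worlds(saw)
for rows, prob in worlds:
    print(round(prob, 2), rows)
```

Amy contributes three outcomes and Betty two (present or absent), which gives the 3 × 2 = 6 worlds named on the slide.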

ULDBs – Closure

Saw (witness, car)
Cathy  Mazda ∥ Honda

Drives (person, car)
Jimmy, Toyota ∥ Jimmy, Mazda
Billy, Honda ∥ Frank, Honda
Hank, Honda

Suspects = π_person(Saw ⋈ Drives)

Suspects
Jimmy          ?
Billy ∥ Frank  ?
Hank           ?

This representation CANNOT correctly capture the possible worlds in the result!

ULDBs – Lineage

ID  Saw (witness, car)
11  Cathy  Honda ∥ Mazda

ID  Drives (person, car)
21  Jimmy, Toyota ∥ Jimmy, Mazda
22  Billy, Honda ∥ Frank, Honda
23  Hank, Honda

Suspects = π_person(Saw ⋈ Drives)

ID  Suspects
31  Jimmy          ?
32  Billy ∥ Frank  ?
33  Hank           ?

λ(31) = (11,2) ∧ (21,2)
λ(32,1) = (11,1) ∧ (22,1); λ(32,2) = (11,1) ∧ (22,2)
λ(33) = (11,1) ∧ 23
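
A small sketch of how lineage could be recorded while computing Suspects. It is a simplification, not the Trio implementation: it assumes one result tuple per Drives tuple, and it writes every lineage entry as a (tuple ID, alternative index) pair, so the slide's plain `23` appears as `(23, 1)`. The functions `join_project_person` and `can_coexist` are hypothetical names.

```python
# Alternatives are addressed as (tuple_id, alternative_index), as on the slide.
saw = {11: [("Cathy", "Honda"), ("Cathy", "Mazda")]}
drives = {
    21: [("Jimmy", "Toyota"), ("Jimmy", "Mazda")],
    22: [("Billy", "Honda"), ("Frank", "Honda")],
    23: [("Hank", "Honda")],
}

def join_project_person(saw, drives, first_id=31):
    """Project person from Saw join Drives, recording lineage per result alternative."""
    suspects, lineage = {}, {}
    next_id = first_id
    for d_id, d_alts in drives.items():
        alts = []
        for d_idx, (person, d_car) in enumerate(d_alts, start=1):
            for s_id, s_alts in saw.items():
                for s_idx, (_witness, s_car) in enumerate(s_alts, start=1):
                    if s_car == d_car:
                        alts.append(person)
                        # Conjunction of the two base alternatives that produced it.
                        lineage[(next_id, len(alts))] = [(s_id, s_idx), (d_id, d_idx)]
        if alts:
            suspects[next_id] = alts
            next_id += 1
    return suspects, lineage

def can_coexist(lin_a, lin_b):
    """Two result alternatives share a world only if they agree on every base tuple."""
    chosen = dict(lin_a)
    return all(chosen.get(t_id, idx) == idx for t_id, idx in lin_b)

suspects, lineage = join_project_person(saw, drives)
print(suspects)
print(can_coexist(lineage[(31, 1)], lineage[(33, 1)]))  # Jimmy with Hank
```

Jimmy depends on alternative 2 of tuple 11 (Mazda) while Hank depends on alternative 1 (Honda), so the check reports that they never occur in the same world. The ‘?’ annotations alone cannot express that constraint.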

ULDBs – Summary

1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences
4. Lineage

Uncertainty-Lineage Databases (ULDBs)

ULDBs are closed and complete

Lineage & Confidences

Lineage alone suffices to compute the confidence of a result tuple.
  #P-complete for general Boolean formulas
  Approximation algorithms: Karp-Luby, etc.

Select distinct car from Saw;

ID  Saw(witness, car)
11  (Mary, Honda) : 0.8
12  (Susan, Honda) : 0.9
13  (Betty, Honda) : 0.5

ID  SuspectCars(car)
21  Honda : ?

λ(21) = (11 ∨ 12 ∨ 13)
P(21) = 1 – (1-0.8) × (1-0.9) × (1-0.5) = 0.99
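
The arithmetic for P(21) can be checked with a few lines of code. This is a sketch under the assumption that the three Saw tuples are independent. The closed form only covers a plain disjunction of distinct base tuples, and the brute-force variant is included purely as a cross-check; both function names are hypothetical.

```python
from itertools import product
from math import prod

def disjunction_confidence(confidences):
    """P(t1 or ... or tn) for independent base tuples: 1 - prod(1 - p_i)."""
    return 1.0 - prod(1.0 - p for p in confidences)

def brute_force_confidence(confidences):
    """Sum the probability of every world in which at least one tuple is present."""
    total = 0.0
    for present in product([True, False], repeat=len(confidences)):
        if any(present):
            weight = 1.0
            for here, p in zip(present, confidences):
                weight *= p if here else 1.0 - p
            total += weight
    return total

saw_confidence = {11: 0.8, 12: 0.9, 13: 0.5}
lineage_21 = [11, 12, 13]
p_21 = disjunction_confidence([saw_confidence[t] for t in lineage_21])
p_21_check = brute_force_confidence([saw_confidence[t] for t in lineage_21])
print(round(p_21, 4), round(p_21_check, 4))
```

The brute-force loop visits 2^n worlds, which hints at why exact confidence computation becomes expensive for general Boolean lineage.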

Versioning (LDM Data Model)

Version intervals for tuples
  Contiguous version numbers 0,…,
  Database has current version vD
  Tuples have validity intervals [s, e]

ID  Photo(Number,Name), version 2
11  (1, Amy) [0,1] : 1.0
12  (1, Bob) [0,] : 0.6
13  (2, Carl) [0,1] : 0.3
14  (3, Dale) [1,1] : 0.1

Valid-At Queries: Select * from Photo valid-at 2;

ID  Photo(Number,Name), version 2
12  (1, Bob) [0,] : 0.6

Snapshot Queries: View Photo at 2;

ID  Photo@2(Number,Name)
12  (1, Bob) : 0.6

Possible Worlds: LDM databases encode lists of sets of possible worlds.
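
The difference between the two query forms can be sketched on the Photo table. The code assumes that an interval written `[s,]` is open-ended (modelled here as `end=None`); `VTuple`, `valid_at`, and `snapshot` are illustrative names, not TriQL constructs.

```python
from typing import NamedTuple, Optional

class VTuple(NamedTuple):
    tid: int
    values: tuple
    start: int
    end: Optional[int]   # None models the open end written "[s,]" (an assumption)
    conf: float

photo = [
    VTuple(11, (1, "Amy"), 0, 1, 1.0),
    VTuple(12, (1, "Bob"), 0, None, 0.6),
    VTuple(13, (2, "Carl"), 0, 1, 0.3),
    VTuple(14, (3, "Dale"), 1, 1, 0.1),
]

def is_valid_at(t, v):
    """True when version v lies inside the tuple's validity interval."""
    return t.start <= v and (t.end is None or v <= t.end)

def valid_at(relation, v):
    """Valid-at query: keep matching tuples together with their intervals."""
    return [t for t in relation if is_valid_at(t, v)]

def snapshot(relation, v):
    """Snapshot query: same tuples, but the version interval is dropped."""
    return [(t.tid, t.values, t.conf) for t in relation if is_valid_at(t, v)]

print(valid_at(photo, 2))
print(snapshot(photo, 2))
```

At version 2 both calls return only tuple 12; the valid-at result keeps its interval while the snapshot result does not, matching the two result tables on the slide.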

Data Modifications – Insert

Insert Tuple: Insert t with version [vD+1,]
commit; Increase vD

ID  People(Name, State, Job), versions 0 → 1 → 2
21  (Bob, NY, Analyst) [0,] : 1.0
22  (Carl, IL, Teacher) [0,] : 1.0
23  (David, PA, Manager) [0,] : 0.6
24  (Frank, CA, Eng.) [1,] : 0.3
25  (David, PA, CEO) [2,] : 0.3

Data Modifications – Delete

Insert Tuple: Insert t with version [vD+1,]
Delete Tuple: Set end(t) to vD
commit; Increase vD

ID  People(Name, State, Job), versions 2 → 3
21  (Bob, NY, Analyst) [0,] : 1.0
22  (Carl, IL, Teacher) [0,] : 1.0  →  [0,2] : 1.0
23  (David, PA, Manager) [0,] : 0.6
24  (Frank, CA, Eng.) [1,] : 0.3
25  (David, PA, CEO) [2,] : 0.3

Data Modifications – Update

Insert Tuple: Insert t with version [vD+1,]
Delete Tuple: Set end(t) to vD
Update Value: Set end(t) to vD
  Insert t’ with version [vD+1,]
commit; Increase vD

ID  People(Name, State, Job), versions 3 → 4
21  (Bob, NY, Analyst) [0,] : 1.0  →  [0,3] : 1.0
22  (Carl, IL, Teacher) [0,2] : 1.0
23  (David, PA, Manager) [0,] : 0.6
24  (Frank, CA, Eng.) [1,] : 0.3
25  (David, PA, CEO) [2,] : 0.3
21  (Bob, CA, Student) [4,] : 0.3

Data Modifications – Update

Insert Tuple: Insert t with version [vD+1,]
Delete Tuple: Set end(t) to vD
Update Value: Set end(t) to vD
  Insert t’ with version [vD+1,]
Update Probability: Set end(t) to vD
  Insert t’=t with probability p’ and version [vD+1,]
commit; Increase vD

ID  People(Name, State, Job), versions 4 → 5
21  (Bob, NY, Analyst) [0,3] : 1.0
22  (Carl, IL, Teacher) [0,2] : 1.0
23  (David, PA, Manager) [0,] : 0.6
24  (Frank, CA, Eng.) [1,] : 0.3
25  (David, PA, CEO) [2,] : 0.3
21  (Bob, CA, Student) [4,] : 0.3  →  [4,4] : 0.3
21  (Bob, CA, Student) [5,] : 0.7

Data Modifications – Summary

Insert Tuple: Insert t with version [vD+1,]
Delete Tuple: Set end(t) to vD
Update Value: Set end(t) to vD
  Insert t’ with version [vD+1,]
Update Probability: Set end(t) to vD
  Insert t’=t with probability p’ and version [vD+1,]
Possible worlds: Updates may create duplicate worlds, which are merged (at any version v).

ID  People(Name, State, Job), versions 4 → 5
21  (Bob, NY, Analyst) [0,3] : 1.0
22  (Carl, IL, Teacher) [0,2] : 1.0
23  (David, PA, Manager) [0,] : 0.6
24  (Frank, CA, Eng.) [1,] : 0.3
25  (David, PA, CEO) [2,] : 0.3
26  (Bob, CA, Student) [4,] : 0.3
21  (Bob, CA, Student) [4,4] : 0.3
21  (Bob, CA, Student) [5,] : 0.7
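
The four modification rules can be replayed on the People example with a toy in-memory table. This is a minimal sketch, not the LIVE implementation: `VersionedTable` and its methods are hypothetical, open intervals `[s,]` are modelled as `end=None`, and the new tuple keeps the ID of the tuple it replaces, as the ID 21 rows on the earlier slides suggest.

```python
class VersionedTable:
    """Toy versioned table; each row is [tid, values, start, end, conf]."""

    def __init__(self, initial_rows=()):
        # Initial rows are loaded at version 0 with an open interval [0,].
        self.version = 0   # plays the role of vD
        self.rows = [[tid, values, 0, None, conf] for tid, values, conf in initial_rows]

    def _current(self, tid):
        """Return the row for tid whose interval is still open."""
        for row in self.rows:
            if row[0] == tid and row[3] is None:
                return row
        raise KeyError(tid)

    def commit(self):
        self.version += 1                      # Increase vD

    def insert(self, tid, values, conf):
        self.rows.append([tid, values, self.version + 1, None, conf])

    def delete(self, tid):
        self._current(tid)[3] = self.version   # Set end(t) to vD

    def update_value(self, tid, new_values, conf):
        self.delete(tid)
        self.insert(tid, new_values, conf)

    def update_confidence(self, tid, new_conf):
        old_values = self._current(tid)[1]
        self.delete(tid)
        self.insert(tid, old_values, new_conf)

people = VersionedTable([
    (21, ("Bob", "NY", "Analyst"), 1.0),
    (22, ("Carl", "IL", "Teacher"), 1.0),
    (23, ("David", "PA", "Manager"), 0.6),
])
people.insert(24, ("Frank", "CA", "Eng."), 0.3); people.commit()           # version 1
people.insert(25, ("David", "PA", "CEO"), 0.3); people.commit()            # version 2
people.delete(22); people.commit()                                         # version 3
people.update_value(21, ("Bob", "CA", "Student"), 0.3); people.commit()    # version 4
people.update_confidence(21, 0.7); people.commit()                         # version 5

for row in people.rows:
    print(row)
```

Nothing is ever overwritten: a delete only closes an interval, and both kinds of update are a delete followed by an insert, so every earlier version stays queryable.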

Query Evaluation

1) Data Computation (regular SQL, including lineage)
2) Interval Computation (stored procedure)

[Figure: D is an encoding of possible worlds at versions: D1, D2, …, Dn1 @ (0); D1, D2, …, Dn2 @ (1); …; D1, D2, …, Dnv @ (vD). Applying Q on each world yields Q(D1), Q(D2), …, Q(Dn). The implementation of Q (operational semantics) yields D + Result, which encodes those possible worlds.]

Lineage, Confidences & Versions

Lineage alone suffices to compute the confidence of any result tuple.

Lineage alone suffices to compute the version interval of any result tuple.

Version Interval Computation

Positive Lineage (disjunctions & conjunctions)
In the lineage formula λ(t):
  Replace every tuple t’ by its version interval
  Replace every ∨ with ∪ and every ∧ with ∩

Select distinct car from Saw;

ID  Saw(witness, car), version 3
11  (Mary, Honda) [1,] : 0.8
12  (Susan, Honda) [2,] : 0.9
13  (Betty, Honda) [3,] : 0.5

ID  SuspectCars(car), version 3
21  (Honda) ? : ?  →  [1,] : 0.99

λ(21) = (11 ∨ 12 ∨ 13)
P(21) = 1 – (1-0.8) × (1-0.9) × (1-0.5) = 0.99

Version & Confidence Computation

Positive Lineage (disjunctions & conjunctions)
In the lineage formula λ(t):
  Replace every tuple t’ by its version interval
  Replace every ∨ with ∪ and every ∧ with ∩

ID  Saw(witness, car), version 3
11  (Mary, Honda) [1,] : 0.8
12  (Susan, Honda) [2,] : 0.9
13  (Betty, Honda) [3,] : 0.5

Select distinct car from Saw;

ID  SuspectCars(car), version 3
21  (Honda) [1,] : 0.99

Select distinct car from Saw valid-at 2;

ID  SuspectCars(car), version 2
21  (Honda) ? : ?  →  [1,] : 0.98

λ(21) = (11 ∨ 12)
P(21) = 1 – (1-0.8) × (1-0.9) = 0.98
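
The valid-at result of 0.98 can be reproduced by restricting the lineage to the Saw tuples that are valid at the requested version before combining confidences. The sketch assumes independent base tuples, a purely disjunctive lineage, and an open end for intervals written `[s,]`; `confidence_valid_at` is a hypothetical helper.

```python
from math import prod

# tid: (start, end, confidence); end=None models the open interval "[s,]".
saw = {
    11: (1, None, 0.8),
    12: (2, None, 0.9),
    13: (3, None, 0.5),
}
lineage_21 = [11, 12, 13]   # disjunctive lineage of SuspectCars tuple 21

def confidence_valid_at(lineage, base, v):
    """Confidence of a disjunction over only those base tuples valid at version v."""
    live = [base[t][2] for t in lineage
            if base[t][0] <= v and (base[t][1] is None or v <= base[t][1])]
    if not live:
        return 0.0
    return 1.0 - prod(1.0 - p for p in live)

print(round(confidence_valid_at(lineage_21, saw, 2), 4))   # only tuples 11 and 12 count
print(round(confidence_valid_at(lineage_21, saw, 3), 4))   # all three tuples count
```

At version 2, tuple 13 is not yet valid, so the lineage shrinks to 11 ∨ 12 and the confidence drops from 0.99 to 0.98.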

Interval Computations & Query Plans

Can decouple interval computation from data computation
Or: push interval computation into query plans only when there is no negation.

Select R.A from R EXCEPT ( Select R.A from R EXCEPT Select S.A from S );
r=(a)[0,10]  u=(a)[0,10]
t=(a)[0,10]
r=(a)[0,10]  s=(a)[5,15]

Select R.A from R, S Where R.A=S.A;
r=(a)[0,10]  s=(a)[5,15]
t=(a)[5,10]
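
For positive lineage, interval computation can be sketched as a recursive evaluation of the lineage formula. The code assumes that disjunction maps to interval union and conjunction to interval intersection, which agrees with the two worked results in the slides ([1,] for the distinct query and [5,10] for the join). Formulas are nested tuples such as `("or", 11, 12, 13)`, open ends are modelled as `math.inf`, and negation (EXCEPT) is deliberately not handled. All function names are hypothetical.

```python
import math

def normalize(intervals):
    """Sort and merge overlapping or adjacent integer intervals."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def intersect(xs, ys):
    """Pairwise intersection of two lists of closed intervals."""
    out = []
    for a, b in xs:
        for c, d in ys:
            lo, hi = max(a, c), min(b, d)
            if lo <= hi:
                out.append((lo, hi))
    return normalize(out)

def version_intervals(formula, base):
    """Evaluate a positive lineage formula to a list of disjoint version intervals."""
    if not isinstance(formula, tuple):          # leaf: a base tuple ID
        return [base[formula]]
    op, *args = formula
    parts = [version_intervals(arg, base) for arg in args]
    if op == "or":                              # disjunction -> union
        return normalize([iv for part in parts for iv in part])
    if op == "and":                             # conjunction -> intersection
        result = parts[0]
        for part in parts[1:]:
            result = intersect(result, part)
        return result
    raise ValueError(f"unsupported operator: {op}")

saw_intervals = {11: (1, math.inf), 12: (2, math.inf), 13: (3, math.inf)}
distinct_result = version_intervals(("or", 11, 12, 13), saw_intervals)

join_intervals = {"r": (0, 10), "s": (5, 15)}
join_result = version_intervals(("and", "r", "s"), join_intervals)
print(distinct_result, join_result)
```

A union may yield several disjoint intervals, which is why the evaluator returns a list rather than a single pair.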

Complexity Results

Positive Lineage (disjunctions & conjunctions)
  Version interval computation: PTIME (linear)
  Confidence computation: #P-complete

Arbitrary Lineage (including negation)
  Version interval computation:
    PTIME (linear) if all confidences are known
    NP-hard if confidences are not known (need to check for idempotence of negated tuples)
  Confidence computation: #P-complete

Experiments – Setup

Probabilistic & versioned TPC-H setting
  Queries over Lineitem, Orders tables with varying join selectivity from 0.1% to 1% (6,000-60,000 and 1,500-15,000 tuples for Lineitem & Orders)
  Update 0.1% to 1% of the input data
  Assign probabilities within [0,1] uniform-randomly to tuples

Additional indexes for versioning
  Two B+-trees on (start, end) and end points of intervals
  Rewrite valid-at & snapshot queries using WHERE (start ≤ v ≤ end) predicates
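
The query rewrite can be illustrated with an in-memory SQLite database from Python's standard library. SQLite is only a stand-in here, since the slides describe a PostgreSQL backend. The table layout, the column names `start_v` and `end_v`, the index names, and the use of NULL for an open interval end are all assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# end_v is NULL while a tuple is still valid (open interval), by assumption.
conn.execute(
    "CREATE TABLE photo (id INTEGER, number INTEGER, name TEXT, "
    "start_v INTEGER, end_v INTEGER, conf REAL)"
)
conn.executemany(
    "INSERT INTO photo VALUES (?, ?, ?, ?, ?, ?)",
    [
        (11, 1, "Amy", 0, 1, 1.0),
        (12, 1, "Bob", 0, None, 0.6),
        (13, 2, "Carl", 0, 1, 0.3),
        (14, 3, "Dale", 1, 1, 0.1),
    ],
)
# Two indexes in the spirit of the slide: one on (start, end), one on end points.
conn.execute("CREATE INDEX photo_start_end ON photo (start_v, end_v)")
conn.execute("CREATE INDEX photo_end ON photo (end_v)")

def valid_at(conn, v):
    """Valid-at query rewritten as a plain range predicate on the interval columns."""
    return conn.execute(
        "SELECT id, number, name, start_v, end_v, conf FROM photo "
        "WHERE start_v <= ? AND (end_v IS NULL OR ? <= end_v) ORDER BY id",
        (v, v),
    ).fetchall()

def snapshot(conn, v):
    """Snapshot query: same predicate, interval columns projected away."""
    return conn.execute(
        "SELECT id, number, name, conf FROM photo "
        "WHERE start_v <= ? AND (end_v IS NULL OR ? <= end_v) ORDER BY id",
        (v, v),
    ).fetchall()

rows_at_2 = valid_at(conn, 2)
snap_at_2 = snapshot(conn, 2)
print(rows_at_2, snap_at_2)
```

Because the rewrite produces ordinary range predicates, the versioned queries can run on an unmodified relational engine and use regular indexes.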

Experiments – Results (I)

[Plot: Join query. Overhead of versioned system vs. non-versioned system (versions not computed)]

[Plot: Join query. Overhead of computing versions (versioned system), (%)]

Experiments – Results (II)

[Plot: Join query. Progressive data updates (overwrite multiple times)]

[Plot: Join query. Valid-at queries vs. full version computation]

Experiments – Results (III)

[Plot: Overhead of version computation, different query types (1% data modified)]

Conclusions

LDMs are closed and complete
  Generalizes to full ULDB data model (including value alternatives & maybe (?) annotations)

Can employ lineage also for update propagations
  Supports all of INSERT/DELETE/UPDATE with INTERSECT/UNION/EXCEPT set operations

[Figure: DBMS combining Lineage, Uncertainty, and Versioning]