Mazda R27 1

Preview:

DESCRIPTION

Experience Mazda Zoom Zoom Lifestyle and Culture by Visiting and joining the Official Mazda Community at http://www.MazdaCommunity.org for additional insight into the Zoom Zoom Lifestyle and special offers for Mazda Community Members. If you live in Arizona, check out CardinaleWay Mazda's eCommerce website at http://www.Cardinale-Way-Mazda.com

Citation preview

UncertaintyLineage

DataBases

VeryLargeDataBases

19752006

ULDBs: Databases with ULDBs: Databases with Uncertainty and LineageUncertainty and Lineage

Omar Benjelloun, Anish Das Sarma,

Alon Halevy, Jennifer WidomStanford InfoLab

33

MotMotivivationation

• Many applications involve data that is uncertain (approximate, probabilistic, inexact, incomplete,

imprecise, fuzzy, inaccurate, ...)

• Many of the same applications need to track the lineage of their data

Neither uncertainty nor lineage are supported by conventional DBMSs

Coincidence or Fate?Coincidence or Fate?

44

Sample Applications Needing Sample Applications Needing Uncertainty and LineageUncertainty and Lineage

•Scientific databases

•Sensor databases

•Data cleaning

•Data integration

• Information extraction

55

Trio ProjectTrio Project

Building a new kind of DBMS in which:1. Data2. Uncertainty3. Lineage

are all first-class interrelated concepts

66

Lineage and UncertaintyLineage and Uncertainty

•Lots of independent work in lineage and uncertainty (related work at end of talk)

•Turns out: The connection between uncertainty and lineage goes deeper than just a shared need by several applications

Coincidence or Fate?Coincidence or Fate?

77

Lineage and UncertaintyLineage and Uncertainty

• Lineage...1) Enables simple and consistent

representation of uncertain data2) Correlates uncertainty in query results

with uncertainty in the input data3) Can make computation over uncertain

data more efficient

88

Outline of the TalkOutline of the Talk

•The ULDB data model

•Querying ULDBs

•ULDB properties

•Membership and extraction operations

•Confidences

•Current, related, and future work

99

Running Example: Crime Running Example: Crime SolverSolver

Saw(witness,car) Drives(person,car)

Suspects(person) = πperson(Saw ⋈ Drives)

1010

UncertaintyUncertainty

•An uncertain database represents a set of possible instances. Examples:– Amy saw either a Honda or a Toyota

– Jimmy drives a Toyota, a Mazda, or both

– Betty saw an Acura with confidence 0.5 or a Toyota with confidence 0.3

– Hank is a suspect with confidence 0.7

1111

Uncertainty in a ULDBUncertainty in a ULDB

1. Alternatives2. ‘?’ (Maybe) Annotations3. Confidences

1212

Uncertainty in a ULDBUncertainty in a ULDB

1. Alternatives: uncertainty about value

2. ‘?’ (Maybe) Annotations3. Confidences

Saw (witness,car)

(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)

Three possibleinstances

witness carAmy { Honda, Toyota,

Mazda }

=

1313

Six possibleinstances

Uncertainty in a ULDBUncertainty in a ULDB

1. Alternatives2. ‘?’ (Maybe): uncertainty about

presence3. Confidences

Saw (witness,car)

(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)

(Betty, Acura)?

1414

Uncertainty in a ULDBUncertainty in a ULDB

1. Alternatives2. ‘?’ (Maybe) Annotations3. Confidences: weighted

uncertaintySaw (witness,car)

(Amy, Honda): 0.5 ∥ (Amy,Toyota): 0.3 ∥ (Amy, Mazda): 0.2

(Betty, Acura): 0.6?

Six possible instances, each with a probability

1515

Data Models for Data Models for UncertaintyUncertainty•Our model (so far) is not especially new•We spent some time exploring the

space of models for uncertainty [ICDE 2006]

•Tension between understandability and expressiveness– Our model is understandable– But it is not complete, or even closed under

common operations

1616

Closure and Closure and CompletenessCompleteness

•Completeness Can represent all sets of possible

instances

•Closure Can represent results of operations

•Note: Completeness Closure

1717

Model (so far) Not ClosedModel (so far) Not Closed

Saw (witness,car)

(Cathy, Honda) ∥ (Cathy, Mazda)

Drives (person,car)

(Jimmy, Toyota) ∥ (Jimmy, Mazda)

(Billy, Honda) ∥ (Frank, Honda)

(Hank, Honda)

Suspects

Jimmy

Billy ∥ Frank

Hank

Suspects = πperson(Saw ⋈ Drives)

?

??

Does not correctlycapture possibleinstances in theresult

CANNOT

1818

LineageLineage to the Rescue to the Rescue

•Lineage: “where data came from”– Internal lineage– External lineage (not covered in this

talk)

• In ULDBs: A function λ from alternatives to sets of alternatives (or external sources)

1919

Example with LineageExample with Lineage

ID Saw (witness,car)

11

(Cathy, Honda) ∥ (Cathy, Mazda)

ID Drives (person,car)

21

(Jimmy, Toyota) ∥ (Jimmy, Mazda)

22

(Billy, Honda) ∥ (Frank, Honda)

23

(Hank, Honda)

ID Suspects

31

Jimmy

32

Billy ∥ Frank

33

Hank

?

?

?

Suspects = πperson(Saw ⋈ Drives) λ(31) = (11,2),(21,2)

λ(32,1) = (11,1),(22,1); λ(32,2) = (11,1),(22,2)

λ(33) = (11,1), 23

Correctly captures possible instances inthe result

2020

ULDBsULDBs

1. Alternatives2. ‘?’ (Maybe) Annotations3. Confidences4. Lineage

ULDBs are Closed and Complete

2121

Outline of the TalkOutline of the Talk

•The ULDB data model

•Querying ULDBs

•ULDB properties

•Membership and extraction operations

•Confidences

•Current, related, and future work

2222

Querying ULDBsQuerying ULDBs

•Query Q on ULDB D

DD

D1, D2, …, DnD1, D2, …, Dn

possibleinstances

Q on eachinstance

representationof instances

Q(D1), Q(D2), …, Q(Dn)Q(D1), Q(D2), …, Q(Dn)

D’D’implementation of Q

D + ResultD + Result

2323

Well-Behaved ULDBsWell-Behaved ULDBs

• If we start with a well-behaved ULDB and perform standard queries, it remains well-behaved

• Intuitively (details in paper):– Acyclic: No cycles in the lineage– Deterministic: Non-empty lineages of distinct

alternatives are distinct– Uniform: Alternatives of same tuple are derived

from the same set of tuples

2424

ULDB MinimalityULDB Minimality

• Data-minimality• Does every alternative appear in some

possible instance? (no extraneous alternatives)

• Does every maybe-tuple in R not appear in some possible instance? (no extraneous ‘?’s)

•Lineage-minimality

2525

Data-Minimality ExamplesData-Minimality ExamplesExtraneous ‘?’

. . .

10 (Billy, Honda) ∥ (Frank, Honda)

. . .

. . .

20

Billy ∥ Frank

. . .

?λ(20,1)=(10,1); λ(20,2)=(10,2)

extraneous

2626

Data-Minimality ExamplesData-Minimality ExamplesExtraneous alternative

(Diane, Mazda) ∥ (Diane, Acura)

Dianeextraneous

(Diane, Mazda)

(Diane, Acura)

?

? ?

2727

Data-MinimizationData-Minimization

• Extraneous alternative theorem:– An alternative is extraneous iff it is (possibly

transitively) derived from multiple alternatives of the same tuple.

• Extraneous “?” theorem– A “?” on tuple t is extraneous iff

1. it is derived from base tuples without “?”2. t has as many alternatives as the product of the

number in its base tuples

• Minimization algorithm based on the theorems

(see paper)

2828

Data-minimalLineage-minimal

Lineage-minimal

Data-minimal

Data-minimize

Queries

Lineage-minimize

ULDB Properties and ULDB Properties and OperationsOperations

Mem

bers

hip

Extraction

2929

Membership QuestionsMembership Questions

• Does a given tuple t appear in some (all) possible instance(s) of R ?– Polynomial algorithms based on Data-

minimization

• Is a given table T one of (all of) the possible instances of R ?– NP-Hard

t?, T?

RR

I1, I2, …, InI1, I2, …, In

possibleinstances

3030

ExtractionExtraction

•Extraction algorithm in paper

Suspects Eats

Saw Drives

3131

Outline of the TalkOutline of the Talk

•The ULDB data model

•Querying ULDBs

•ULDB properties

•Membership and extraction operations

•Confidences

•Current, related, and future work

3232

ConfidencesConfidences

• Confidences supplied with base data

• Trio computes confidences on query results– Default probabilistic interpretation– Can choose to plug in different arithmetic

Saw (witness,car)

(Cathy, Honda): 0.6 ∥ (Cathy, Mazda): 0.4

Drives (person,car)

(Jimmy, Mazda): 0.3 ∥ (Bill, Mazda): 0.6

(Hank, Honda): 1.0

Suspects

Jimmy: 0.12 ∥ Bill: 0.24

Hank: 0.6

?

?

?

0.3 0.4

0.6

ProbabilisticProbabilistic Min

3333

Query Processing with Query Processing with ConfidencesConfidences• Previous approach (probabilistic

databases)– Each operator computes confidences during

query execution– Only certain query plans allowed

• In ULDBs– Confidence of alternative A is function of

confidences in its transitive lineage

• Our approach: Decouple data and confidence computation– Use any query plan for data computation– Compute confidences on-demand using

lineage– Can give arbitrarily large improvements

3434

Current Work: AlgorithmsCurrent Work: Algorithms

•Algorithms: confidence computation, extraneous data, membership questions– Minimize lineage traversal– Memoization– Batch computations

3535

The Trio TrioThe Trio Trio

• Data Model– ULDBs (Coming: incomplete relations;

continuous uncertainty; correlation uncertainty)

• Query Language– Simple extension to SQL– Query uncertainty, confidences, and lineage

• System– Did you see our demo? – Version 1: Entirely on top of conventional DBMS– Surprisingly easy and complete, reasonably

efficient

TriQL

3636

Brief Related WorkBrief Related Work

•Uncertainty– Modeling

•C-tables [IL84], Probabilistic Databases [CP87], using Nested Relations [F90]

– Systems•ProbView [LLRS97], MYSTIQ [BDM+05],

ORION [CSP05], Trio [BDHW05]

•Lineage– DBNotes [CTV05], Data Warehouses

[CW03]

but don’t forgetthe lineage…

Thank YouThank You

Search “stanford trio”

(or, http://i.stanford.edu/trio)

Recommended