CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design

CS 347 Notes 02 1

CS 347: Parallel and Distributed

Data ManagementNotes02: Distributed DB

Design

Hector Garcia-Molina

CS 347 Notes 02 2

Distributed DB DesignTop-down approach: - have DB…

- how to split and allocate the sites

Multi-DBs (or bottom-up): no design issues!

Chapter 5Ozsu & Valduriez

CS 347 Notes 02 3

Two issues in DDB design:

• Fragmentation• Allocation

Note: issues not independent, but will cover separately

CS 347 Notes 02 4

Example Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select *

from E from Ewhere loc=Sa where loc=Sband… and ...

CS 347 Notes 02 5

Example Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select *

from E from Ewhere loc=Sa where loc=Sband… and ...

Motivation: Two sites: Sa, Sb Qa QbSa Sb

CS 347 Notes 02 6

• It does not take a rocket scientist to figure out fragmentation...

CS 347 Notes 02 7

# NM Loc Sal E 5

78

Sa 10Sally Sb 25Tom Sa 15

Joe

# NM Loc Sal # NM Loc Sal58

Sa 10Tom Sa 15Joe 7 Sb 25Sally

..

..

....

F

At Sa At Sb

CS 347 Notes 02 8

F = { F1, F2 }

F1 = loc=Sa E F2 = loc=Sb E

CS 347 Notes 02 9

F = { F1, F2 }

F1 = loc=Sa E F2 = loc=Sb E

called primary horizontal fragmentation

CS 347 Notes 02 10

Fragmentation• Horizontal Primary

depends on local attributesR Derived

depends on foreign relation

• VerticalR

CS 347 Notes 02 11

Fragmentation• Horizontal Primary

depends on local attributesR Derived

depends on foreign relation

• VerticalR Fragmentation also called

Sharding

CS 347 Notes 02 12

Three common horizontal partitioning techniques• Round robin• Hash partitioning• Range partitioning

CS 347 Notes 02 13

• Round robinRD0 D1 D2t1 t1t2 t2t3 t3t4 t4... t5• Evenly distributes data• Good for scanning full relation• Not good for point or range queries

CS 347 Notes 02 14

• Hash partitioningRD0 D1 D2t1h(k1)=2 t1t2h(k2)=0 t2t3h(k3)=0 t3t4h(k4)=1 t4...• Good for point queries on key; also for joins• Not good for range queries; point queries not on

key• If hash function good, even distribution

CS 347 Notes 02 15

• Range partitioningRD0 D1 D2t1: A=5 t1t2: A=8 t2t3: A=2 t3t4: A=3 t4...• Good for some range queries on A• Need to select good vector: else unbalance

data skew execution skew

4 7

partitioningvector

V0 V1

CS 347 Notes 02 16

Which are good fragmentations?Example:F = { F1, F2 }

F1 = sal<10 E F2 = sal>20 E

CS 347 Notes 02 17

Which are good fragmentations?Example:F = { F1, F2 }

F1 = sal<10 E F2 = sal>20 E

Problem: Some tuples lost!

CS 347 Notes 02 18

Which are good fragmentations?Second example:F = { F3, F4 }

F3 = sal<10 E F4 = sal>5 E

CS 347 Notes 02 19

Which are good fragmentations?Second example:F = { F3, F4 }

F3 = sal<10 E F4 = sal>5 E

Tuples with 5 < sal < 10 are duplicated...

CS 347 Notes 02 20

Prefer to deal with replication explicitly

Example: F = { F5, F6, F7 }

F5 = sal 5 E F6 = 5< sal <10

EF7 = sal 10 E

Then replicate F6 if convenient (part of allocation problem)

CS 347 Notes 02 21

Desired properties for horizontal fragmentationR F ={ F1, F2, … }

(1) Completenesst R, Fi F such that t

Fi

CS 347 Notes 02 22

(2) Disjointness t Fi, Fj such that tFj, i j, Fi, Fj F

(3) Reconstruction - ignore

CS 347 Notes 02 23

How do we get completeness and disjointness?(1) Check it “manually”!

e.g., F1 = sal<10 E ; F2 = sal10 E

CS 347 Notes 02 24

How do we get completeness and disjointness?(2) “Automatically” generate fragments

with these properties

Desired simple predicates Fragments

CS 347 Notes 02 25

Example of generation• Say queries use predicates:

A<10, A>5, Loc = SA, Loc = SB

• Next: - generate “minterm” predicates

- eliminate useless ones

CS 347 Notes 02 26

Minterm predicates (part I)(1) A<10 A>5 Loc=SA Loc=SB(2) A<10 A>5 Loc=SA ¬(Loc=SB)(3) A<10 A>5 ¬(Loc=SA) Loc=SB(4) A<10 A>5 ¬(Loc=SA) ¬(Loc=SB)(5) A<10 ¬(A>5) Loc=SA Loc=SB(6) A<10 ¬(A>5) Loc=SA ¬(Loc=SB)(7) A<10 ¬(A>5) ¬(Loc=SA) Loc=SB(8) A<10 ¬(A>5) ¬(Loc=SA)

¬(Loc=SB)

CS 347 Notes 02 27


¬(Loc=SB)

CS 347 Notes 02 28


¬(Loc=SB)A 5

5 < A < 10

CS 347 Notes 02 29

Minterm predicates (part II) (9) ¬(A<10) A>5 Loc=SA Loc=SB(10) ¬(A<10) A>5 Loc=SA ¬(Loc=SB)(11) ¬(A<10) A>5 ¬(Loc=SA) Loc=SB(12) ¬(A<10) A>5 ¬(Loc=SA) ¬(Loc=SB)(13) ¬(A<10) ¬(A>5) Loc=SA Loc=SB(14) ¬(A<10) ¬(A>5) Loc=SA ¬(Loc=SB)(15) ¬(A<10) ¬(A>5) ¬(Loc=SA) Loc=SB(16) ¬(A<10) ¬(A>5) ¬(Loc=SA) ¬(Loc=SB)

CS 347 Notes 02 30

Minterm predicates (part II) (9) ¬(A<10) A>5 Loc=SA Loc=SB(10) ¬(A<10) A>5 Loc=SA ¬(Loc=SB)(11) ¬(A<10) A>5 ¬(Loc=SA) Loc=SB(12) ¬(A<10) A>5 ¬(Loc=SA) ¬(Loc=SB)(13) ¬(A<10) ¬(A>5) Loc=SA Loc=SB(14) ¬(A<10) ¬(A>5) Loc=SA ¬(Loc=SB)(15) ¬(A<10) ¬(A>5) ¬(Loc=SA) Loc=SB(16) ¬(A<10) ¬(A>5) ¬(Loc=SA) ¬(Loc=SB)

A 10

CS 347 Notes 02 31

Final fragments:

F2: 5 < A < 10 Loc=SA F3: 5 < A < 10 Loc=SB F6: A 5 Loc=SA F7: A 5 Loc=SB F10: A 10 Loc=SA F11: A 10 Loc=SB

CS 347 Notes 02 32

Note: elimination of useless fragments depends on application semantics:

e.g.: if LOC could be SA, SB, we need to add fragments

F4: 5 <A <10 Loc SA Loc SB

F8: A 5 Loc SA Loc SB F12: A 10 Loc SA Loc SB

CS 347 Notes 02 33

Why does this work?Predicates: p1 p2 p3 p4

p1 p2 p3 ¬ p4

¬ p1 ¬ p2 ¬ p3 ¬ p4

...

CS 347 Notes 02 34

(1) Completeness: Take t Rpi(t) must be T or F!

Say p1(t) =T p2(t) = T p3(t) =F p4(t) =F

Then t is in fragment with predicate p1 p2 ¬ p3 ¬ p4

CS 347 Notes 02 35

(2) DisjointnessSay t Fragment p1 p2 ¬ p3 ¬ p4Then:

p1(t) = T, p2(t) = T, p3(t) = F, p4(t)= F

t cannot be in any other fragment!

CS 347 Notes 02 36

Summary• Given simple predicates Pr= { p1, p2,..

pm }minterm predicates are

M={m | m = pk*, 1 k m }

where pk* is pk or is ¬ pk

pkPr

• Fragments m R for all m M are complete and disjoint

Another Desired Fragmentation Property:Match Access Patterns

CS 347 Notes 02 37

data Adata B

data C

frequentlyaccessedtogether

try toplace insame

fragment

CS 347 Notes 02 38

Return to example:E(#, NM, LOC, SAL,…)Common queries:

Qa: select * Qb: select * from E from E where LOC=Sa where

LOC=Sb and … and ...

CS 347 Notes 02 39

Three choices:

(1) Pr = { } F1 ={ E }(2) Pr = {LOC=Sa, LOC=Sb}

F2={ loc=Sa E, loc=Sb E }(3) Pr = {LOC=Sa, LOC=Sb, Sal<10}

F3={ loc=Sa sal<10 E, loc=Sa sal10 E, loc=Sb sal<10E, loc=Sb sal10 E }

CS 347 Notes 02 40

In other words:Loc=Sa sal < 10

Loc=Sa sal 10

Loc=Sb sal < 10

Loc=Sb sal 10

F1 F3F2

Qa: Select … loc = Sa ...

Qb: Select … loc = Sb ...

CS 347 Notes 02 41

In other words:Loc=Sa sal < 10

Loc=Sa sal 10

Loc=Sb sal < 10

Loc=Sb sal 10

F1 F3F2

Qa: Select … loc = Sa ...

Qb: Select … loc = Sb ...

F2 is good…(not F1 , F3 )

CS 347 Notes 02 42

Derived horizontal fragmentation

Example: E(#, NM, SAL, LOC) F={ E1, E2} by LOC J(#, DES,…)

Common query for project: [Given employee name,

list projects (s)he works in]

CS 347 Notes 02 43

E1

(at Sa) (at Sb)

E2# NM Loc Sal5 Joe Sa 108 Tom Sa 15…

# NM Loc Sal7 Sally Sb 2512 Fred Sb 15…

# Description5 work on 347 hw7 go to moon5 build table12 rest…

J

CS 347 Notes 02 44

E1

(at Sa) (at Sb)

E2# NM Loc Sal5 Joe Sa 108 Tom Sa 15…

# NM Loc Sal7 Sally Sb 2512 Fred Sb 15…

J1 J2

J1 = J E1 J2 = J E2

# Des5 work on 347 hw5 build table…

# Des7 go to moon12 rest…

CS 347 Notes 02 45

Derived horizontal fragmentationR, F = { F1, F2, ... Fn}

S, D = {D1, D2, …Dn} where Di =S

Fi

Convention: R is owner S is member

F could beprimary or derived

CS 347 Notes 02 46

• Checking completeness and disjointness of derived fragmentation

But no #= 33 in E1 nor in E2!

# Des…33 build chair…

Example: Say J is:

This J tuple will not be in J1 nor J2Fragmentation not complete

CS 347 Notes 02 47

Need to enforcereferential integrity constraint:

join attr(#) of member relation

joint attr(#) of owner

relation

To get completeness

CS 347 Notes 02 48

# NM Loc Sal5 Joe Sa 10…

# NM Loc Sal5 Fred Sb 20…

Example:

E1 E2

# Description5 day off…



J1

J

J2

Fragmentationis not

disjoint!

CS 347 Notes 02 49

Join attribute(#) should bekey of owner relationTo get

disjointness

CS 347 Notes 02 50

Summary: horizontal fragmentation• Type: primary, derived• Properties: completeness,

disjointness

CS 347 Notes 02 51

Vertical fragmentation

E1

# NM Loc Sal5 Joe Sa 107 Sally Sb 258 Fred Sa 15…

# NM Loc5 Joe Sa7 Sally Sb8 Fred Sa…

# Sal5 107 258 15…

E

E2

Example:

CS 347 Notes 02 52

R[T] R1[T1] Ti T

Rn[Tn]

Just like normalization of relations

...

CS 347 Notes 02 53

Properties: R[T] Ri[Ti]

(1) Completeness

U Ti = Tall i

CS 347 Notes 02 54

(2) DisjointnessTi Tj = for all i,j ij

E(#,LOC,SAL)E1(#,LOC)

E2(SAL)

CS 347 Notes 02 55

(2) DisjointnessTi Tj = for all i,j ij

E(#,LOC,SAL)E1(#,LOC)

E2(SAL)

Not a desirable property!!(could not reconstruct R!)

CS 347 Notes 02 56

(3) Lossless join

Ri = Rall i

One way to achieve lossless join: Repeat key in all fragments, i.e.,

Key Ti for all i

CS 347 Notes 02 57

How do we decide what attributes are grouped with which?

E1(#,NM,LOC)E2(#,SAL)

Example:E(#,NM,LOC,SAL) E1(#,NM)

E2(#,LOC)E3(#,SAL)?

CS 347 Notes 02 58

A1 A2 A3 A4 A5A1 - - - - -A2 50 - - - -A3 45 48 - - -A4 1 2 0 - -A5 0 0 4 75 -

Attribute affinity matrix

CS 347 Notes 02 59

A1 A2 A3 A4 A5A1 - - - - -A2 50 - - - -A3 45 48 - - -A4 1 2 0 - -A5 0 0 4 75 -

Attribute affinity matrix

R1[K,A1,A2,A3] R2[K,A4,A5]

CS 347 Notes 02 60

• Textbook (Ozsu & Valduriez) discusses– How to build affinity matrix– How to identify attribute clusters– How to partition relation

• You are not responsible for– Clustering and partitioning algorithms (i.e., Skip pages 135-145)

CS 347 Notes 02 61

AllocationExample: E(#,NM,LOC,SAL)

F1 = loc=Sa E ; F2 = loc=Sb E Qa: select … where loc=Sa...Qb: select … where loc=Sb…

Site a Site b

Where do F1,F2 go?

?

CS 347 Notes 02 62

Issues• Where do queries originate• What is communication cost?

and size of answers, relations,…

• What is storage capacity, cost at sites? and size of fragments?

• What is processing power at sites?

CS 347 Notes 02 63

• What is query processing strategy?

– How are joins done?– Where are answers collected?

More Issues

CS 347 Notes 02 64

• Cost of updating copies?• Writes and concurrency control?• ...

Do we replicate fragments?

CS 347 Notes 02 65

Optimization problem:• What is best placement of fragments

and/or best number of copies to:– minimize query response time– maximize throughput– minimize “some cost”– ...

• Subject to constraints?– Available storage– Available bandwidth, power,…– Keep 90% of response time below X– ...

CS 347 Notes 02 66

Optimization problem:• What is best placement of fragments

and/or best number of copies to:– minimize query response time– maximize throughput– minimize “some cost”– ...

• Subject to constraints?– Available storage– Available bandwidth, power,…– Keep 90% of response time below X– ...

This is an incredibly hard problem

CS 347 Notes 02 67

Example: Single fragment F

Read cost: [ti MIN Cij]

i: Originating site of requestti: Read traffic at SiCij: Retrieval cost

Accessing fragment F at Sj from Si

ji=1

m

CS 347 Notes 02 68

Scenario - Read cost1 2

.

. .

.

.

.3

i

C=inf

ci,3

ci,1 ci,2

Stream of readrequests for F ti REQ/SECC=inf

C=infF

F F

CS 347 Notes 02 69

Write cost Xj ui C’ij

i: Originating site of requestj: Site being updatedXj: 0 if F not stored at Sj

1 if F stored at Sjui: Write traffic at Si C’ij: Write cost

Updating F at Sj from Si

i=1 j=1

m m

CS 347 Notes 02 70

Scenario - write cost

Updates ui updates/sec. .

. .

. .i

F

F F

CS 347 Notes 02 71

Storage cost:

Xi di

Xi: 0 if F not stored at Si1 if F stored at Si

di: storage cost at Si

i=1

m

CS 347 Notes 02 72

Target function:

min [ti MIN Cij + Xj ui C’ij ]

+ Xi di

ji=1 j=1

i=1

m m

m

CS 347 Notes 02 73

Can add more complications:

Examples: - Multiple fragments- Fragment sizes- Concurrency control cost

Case Study: PNUTS• Where in the World is My Data?

Sudarshan Kadambi, Jianjun Chen, Brian F. Cooper, David Lomax, Raghu Ramakrishnan, Adam Silberstein, Erwin Tam, Hector Garcia-Molina; VLDB 2011

• Distributed object/tuple store for Yahoo!

CS 347 Notes 02 74

Case Study: PNUTS• Issue: Where to locate data• Issue: What and where to replicate

CS 347 Notes 02 75

PNUTS Discussion

• Dynamic vs Static fragment placement

• Caching vs Replication

CS 347 Notes 02 76

Policy Constraints• MIN_COPIES: The minimum number of

full replicas of the record that must exist.

• INCL_LIST: An inclusion list -- the locations where a full replica of the record must exist.

• EXCL_LIST: An exclusion list -- the locations where a full replica of the record cannot exist.

CS 347 Notes 02 77

Example Rule

• Rule 1: • IF TABLE_NAME = "Users“

THENSET 'MIN_COPIES' = 2

CONSTRAINT_PRI = 0

CS 347 Notes 02 78

Another Example Rule

• Rule 2:• IF TABLE_NAME = "Users" AND

FIELD STR('home location') = 'France‘THEN

SET 'MIN_COPIES' = 3 ANDSET 'EXCL LIST' = 'USWest,

USEast‘CONSTRAINT PRI = 1

CS 347 Notes 02 79

CS 347 Notes 02 80

Summary• Description of fragmentation• Good fragmentations• Design of fragmentation• Allocation

Documents

CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design