Replication and Distribution CSE 444 Spring 2012 University of Washington

Replication and Distribution

CSE 444 Spring 2012

University of Washington

HASH MAPS

Hash Maps

• Precursors to Bloom filters.
• Used to reduce communication while joining.
• S = Set to transmit.
  – S = {x1, x2, …, xn}
• H = Hash Map.
  – An array of m bits.

Operation

• To insert x in H:
  – Compute the hash of x to get a bit position j.
  – Set bit j to 1.

• To send S, insert all of its elements in H.

• Two distinct elements can hash to the same position.
  – This creates false positives.
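The insert and lookup operations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `HashMap`, the method names, and the use of SHA-256 reduced modulo m as "the hash" are all hypothetical choices, not taken from the slides.

```python
import hashlib


class HashMap:
    """An array of m bits with a single hash function (illustrative sketch)."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.bits = bytearray((m + 7) // 8)  # m bits, packed 8 per byte

    def _position(self, x: str) -> int:
        # Assumption: SHA-256 reduced modulo m stands in for "the hash".
        digest = hashlib.sha256(x.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def insert(self, x: str) -> None:
        # Compute bit position j and set bit j to 1.
        j = self._position(x)
        self.bits[j // 8] |= 1 << (j % 8)

    def might_contain(self, x: str) -> bool:
        # True for every inserted element; may also be True for an element
        # that was never inserted but hashes to a set bit (a false positive).
        j = self._position(x)
        return bool(self.bits[j // 8] & (1 << (j % 8)))


def build(S, m: int) -> HashMap:
    """To send S, insert all of its elements in H."""
    H = HashMap(m)
    for x in S:
        H.insert(x)
    return H
```

Only the packed bit array `H.bits` would need to cross the network; the receiver then calls `might_contain` on each of its own elements.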

Question

Data supplier R has N = 1 million documents. Data supplier S also has N = 1 million documents. Each document is 1KB. They have 50 documents in common and they want to compute these. They will proceed as follows:

1. R computes a hash map M with cN bits, where c=8 and sends it to S.

2. S checks its items in M and sends all matches to R.

3. R computes the result and sends the matching 50 documents to S.

Q: Indicate the total number of bytes transferred over the network in each step.

Analysis

• Recall |H| = m.
• Insert one element into H.
• Probability that bit j remains 0?
• p = 1 – 1/m

0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Analysis

• Recall |H| = m.
• Insert all n elements into H.
• Probability that bit j remains 0?
• p = (1 – 1/m)^n ≈ e^(–n/m) (for large m)

0 1 0 0 1 1 0 1 0 0 0 0 1 1 0 0 0 1 0 0

Probability of False Positives

• Take a random element y, and check if its hash is set to 1 in H.

• Probability of FP = probability that the hash is 1.

• Probability that bit j is 1?
• p = 1 – (1 – 1/m)^n ≈ 1 – e^(–n/m) (for large m)

0 1 0 0 1 1 0 1 0 0 0 0 1 1 0 0 0 1 0 0
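As a quick numerical check of the formula above, the following sketch compares the exact expression with its exponential approximation. The function names `fp_exact` and `fp_approx` are hypothetical, chosen only for this illustration.

```python
import math


def fp_exact(n: int, m: int) -> float:
    """Probability that a given bit is 1 after n inserts into m bits."""
    return 1.0 - (1.0 - 1.0 / m) ** n


def fp_approx(n: int, m: int) -> float:
    """Large-m approximation: 1 - e^(-n/m)."""
    return 1.0 - math.exp(-n / m)


if __name__ == "__main__":
    # Small and large examples; the gap shrinks as m grows.
    for n, m in [(7, 20), (1_000, 8_000), (1_000_000, 8_000_000)]:
        print(n, m, fp_exact(n, m), fp_approx(n, m))
```

For small m the two values differ noticeably, which is why the approximation is stated for large m only.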

Solution

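• Step 1: Send the hash map.
  – cN bits = 8 million bits = 1 million bytes = 1 MB.

• Step 2: Send the matched documents (including false positives).
  – FP rate = 1 – e^(–n/m) = 1 – e^(–1/8) ≈ 11.75%, rounded down to 11% here.
  – 110,000 false positive documents.
  – 110,050 documents in total (including the 50 common ones).
  – 110.05 MB.

• Step 3: Send the 50 common documents.
  – 50 documents = 50 KB.

• Total of 111.1 MB.

The naïve solution without hash maps, in which one supplier ships all of its documents to the other, takes 1 GB of data transfer.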
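The arithmetic in this solution can be reproduced with a short script. It is a sketch under assumptions that are not spelled out on the slide: 1 KB is taken as 1000 bytes, false positives are counted over all N documents, and the function name `transfer_bytes` and its parameters are hypothetical.

```python
import math


def transfer_bytes(n_docs, doc_bytes, c, common, fp_rate=None):
    """Return the bytes sent in steps 1, 2 and 3 of the protocol."""
    if fp_rate is None:
        # n/m = N / (cN) = 1/c, so the FP rate is 1 - e^(-1/c).
        fp_rate = 1.0 - math.exp(-1.0 / c)
    step1 = c * n_docs // 8                         # hash map: cN bits, as bytes
    false_positives = round(fp_rate * n_docs)       # assumption: counted over all N
    step2 = (false_positives + common) * doc_bytes  # all matches S sends to R
    step3 = common * doc_bytes                      # true matches R sends to S
    return step1, step2, step3


# Figures with the FP rate taken as 11%.
rounded = transfer_bytes(1_000_000, 1000, 8, 50, fp_rate=0.11)
# Figures with the unrounded rate 1 - e^(-1/8).
unrounded = transfer_bytes(1_000_000, 1000, 8, 50)

if __name__ == "__main__":
    print(rounded, sum(rounded))
    print(unrounded, sum(unrounded))
```

With the rate taken as 11% the three steps sum to 111.1 MB; with the unrounded rate the total comes out at roughly 118.6 MB, still far below the 1 GB of the naïve approach.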

DISTRIBUTED LOCKING

Setup

Five sites; share of the total workload originating at each:

• One site: 50% read only, 2% writes
• Each of the other four sites: 10% read only, 2% writes

Each site can communicate with every other site.

Read-locks-one, Write-locks-all

What is the average number of inter-site messages exchanged?

All reads are local, so no remote locks are acquired. Each write requires locks at the 4 other sites.

Majority locking

What is the average number of inter-site messages?

2 other locks needed for both reads and writes.

What if you could broadcast across sites with 1 message?

Lock acquisition and release each take 1 message for all sites. Lock grants still take 1 message per site.

Primary-copy locking

What is the average number of inter-site messages?

Operations at the non-primary copies need to acquire a lock at the primary site. These make up 48% of the actions.
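The slides leave the averages for the three locking schemes as questions. One way to work them out is sketched below, under assumptions that go beyond the slides: each remote lock is taken to cost 3 messages (request, grant, release), local locks cost none, the local copy counts toward a majority, and the primary copy is placed at the site with 52% of the workload (which is what the 48% figure suggests). All names here are hypothetical.

```python
MSGS_PER_REMOTE_LOCK = 3  # assumption: request + grant + release

# (read-only share, write share) per site: one busy site and four others.
SITES = [(0.50, 0.02)] + [(0.10, 0.02)] * 4


def read_one_write_all(sites, per_lock=MSGS_PER_REMOTE_LOCK):
    """Reads are local; each write locks every other site."""
    write_share = sum(w for _, w in sites)
    return write_share * (len(sites) - 1) * per_lock


def remote_majority(sites):
    """Remote locks needed for a majority, counting the local copy."""
    return (len(sites) // 2 + 1) - 1


def majority(sites, per_lock=MSGS_PER_REMOTE_LOCK):
    """Every read and write needs a majority of the copies."""
    return remote_majority(sites) * per_lock


def majority_with_broadcast(sites):
    """One broadcast request, one grant per remote lock, one broadcast release."""
    return 1 + remote_majority(sites) + 1


def primary_copy(sites, primary=0, per_lock=MSGS_PER_REMOTE_LOCK):
    """Only operations originating away from the primary need a remote lock."""
    remote_share = sum(r + w for i, (r, w) in enumerate(sites) if i != primary)
    return remote_share * per_lock


if __name__ == "__main__":
    print("read-locks-one, write-locks-all:", read_one_write_all(SITES))
    print("majority:", majority(SITES))
    print("majority with broadcast:", majority_with_broadcast(SITES))
    print("primary copy:", primary_copy(SITES))
```

Changing `MSGS_PER_REMOTE_LOCK` or the position of the primary shows how sensitive each scheme is to those assumptions.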

TWO PHASE COMMIT

Two-Phase Commit

• Coordinator: 0
• Three subordinates: {1, 2, 3}
• Messages
  – P (Prepare)
  – C (Commit)
  – A (Abort)
  – Y (Yes vote)
  – N (No vote)
  – Ignore acks.

2PC

• What messages are exchanged for a successful commit?
  – (0,1,P), (0,2,P), (0,3,P), (1,0,Y), (2,0,Y), (3,0,Y), (0,1,C), (0,2,C), (0,3,C)

• When exactly does the commit occur?
  – When the coordinator force-writes the commit record.
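The message sequence above can be generated by a small simulation. This is a minimal sketch, not a full protocol implementation: it ignores acks, logging, timeouts and failures, and it assumes that on abort the coordinator notifies only the subordinates that voted Yes. The function name `two_phase_commit` is hypothetical.

```python
def two_phase_commit(votes, coordinator=0):
    """Return (decision, messages) for one round of 2PC.

    votes maps each subordinate id to True (Yes vote) or False (No vote).
    Each message is a (sender, receiver, type) triple, with types
    P, Y, N, C and A as defined on the slide.
    """
    subordinates = sorted(votes)
    messages = []

    # Phase 1: the coordinator sends Prepare, subordinates reply with votes.
    for s in subordinates:
        messages.append((coordinator, s, "P"))
    for s in subordinates:
        messages.append((s, coordinator, "Y" if votes[s] else "N"))

    # Phase 2: the decision is Commit only if every vote is Yes.
    # In a real system the coordinator force-writes the decision record
    # here, before sending anything; that write is the commit point.
    decision = "C" if all(votes.values()) else "A"

    # Assumption: on abort, only subordinates that voted Yes are notified.
    recipients = subordinates if decision == "C" else [s for s in subordinates if votes[s]]
    for s in recipients:
        messages.append((coordinator, s, decision))
    return decision, messages


if __name__ == "__main__":
    print(two_phase_commit({1: True, 2: True, 3: True}))
    print(two_phase_commit({1: False, 2: True, 3: True}))
```

With all three subordinates voting Yes, the function returns the nine messages listed for a successful commit.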

2PC (continued)

• If the coordinator has sent all the prepare messages but has not yet received a vote from site 1, can it abort the transaction at this point, and send abort messages to the subordinates?

• If the coordinator has sent all the prepare messages, received a No vote from site 1, but has not yet received the votes of sites 2 and 3, should it wait for the two missing votes, or should it proceed to abort?

• If site 1 has received a prepare message and voted Yes, but has not received any commit or abort message, and site 1 contacts all other subordinates and discovers that they have all voted Yes, can site 1 commit the transaction?