Replication and Distribution
CSE 444, Spring 2012
University of Washington
HASH MAPS
Hash Maps
• Precursors to Bloom filters.
• Used to reduce communication while joining.
• S = set to transmit.
– S = {x1, x2, …, xn}
• H = hash map.
– An array of m bits.
Operation
• To insert x in H:
– Compute the hash of x to get a bit position j.
– Set bit j to 1.
• To send S, insert all of its elements in H.
• Two distinct elements can hash to the same position.
– This creates false positives.
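The insert and membership-check operations above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the class name `BitHashMap`, the helper `build`, and the choice of SHA-256 as the hash function are all assumptions made for the example.

```python
import hashlib


class BitHashMap:
    """An array of m bits with a single hash function (hypothetical helper)."""

    def __init__(self, m):
        self.m = m
        self.bits = [0] * m

    def _position(self, x):
        # Assumed hash: first 8 bytes of SHA-256, reduced modulo m.
        digest = hashlib.sha256(str(x).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def insert(self, x):
        # Compute bit position j and set bit j to 1.
        self.bits[self._position(x)] = 1

    def might_contain(self, y):
        # True for every inserted element, and also for any element
        # that happens to collide with one (a false positive).
        return self.bits[self._position(y)] == 1


def build(S, m):
    """To send S, insert all of its elements into a fresh m-bit map."""
    H = BitHashMap(m)
    for x in S:
        H.insert(x)
    return H
```

A receiver would call `might_contain` on each of its own items and keep the ones that return True, accepting that some of them are false positives.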
Question
Data supplier R has N = 1 million documents. Data supplier S also has N = 1 million documents. Each document is 1KB. They have 50 documents in common and they want to compute these. They will proceed as follows:
1. R computes a hash map M with cN bits, where c=8 and sends it to S.
2. S checks its items in M and sends all matches to R.
3. R computes the result and sends the matching 50 documents to S.
Q: Indicate the total number of bytes transferred over the network in each step.
Analysis
• Recall |H| = m.
• Insert one element into H.
• Probability that bit j remains 0?
• $p = 1 - 1/m$
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Analysis
• Recall |H| = m.
• Insert all n elements into H.
• Probability that bit j remains 0?
• $p = (1 - 1/m)^n \approx e^{-n/m}$ (for large m)
0 1 0 0 1 1 0 1 0 0 0 0 1 1 0 0 0 1 0 0
Probability of False Positives
• Take a random element y, and check if its hash is set to 1 in H.
• Probability of FP = probability that the hash is 1.
• Probability that bit j is 1?
• $p = 1 - (1 - 1/m)^n \approx 1 - e^{-n/m}$ (for large m)
0 1 0 0 1 1 0 1 0 0 0 0 1 1 0 0 0 1 0 0
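As a quick numeric check of the approximation, the sketch below compares the exact expression with the exponential form. The function names `fp_exact` and `fp_approx` are illustrative, and the values n = 1 million and m = 8n are taken from the question's setup (c = 8).

```python
import math


def fp_exact(n, m):
    # Probability that a given bit is 1 after inserting n elements.
    return 1.0 - (1.0 - 1.0 / m) ** n


def fp_approx(n, m):
    # Large-m approximation of the same probability.
    return 1.0 - math.exp(-n / m)


n = 1_000_000
m = 8 * n  # c = 8 bits per element
print(fp_exact(n, m), fp_approx(n, m))
```

For these values the two expressions agree to well within one part in a million, so the exponential form is a safe shortcut.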
Solution
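• Step 1: Send the hash map.
– cN bits = 8 million bits = 1 million bytes = 1 MB.
• Step 2: Send the matched documents (including false positives).
– FP rate = $1 - e^{-n/m} = 1 - e^{-1/8}$, about 11.75%; rounded down to 11% in the figures that follow.
– 110,000 false positive documents.
– 110,050 documents in total (including the 50 common ones).
– 110.05 MB.
• Step 3: Send the 50 matching documents.
– 50 documents = 50 KB.
• Total of 111.1 MB.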
The naïve solution without hash maps, shipping all 1 million documents, takes 1 GB of data transfer.
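The per-step arithmetic can be reproduced with a short sketch. The function `transfer_bytes` and its parameters are hypothetical names; it assumes 1 KB = 1000 bytes (matching "1 million bytes = 1 MB" above) and, like the slide, applies the false positive rate to all N documents rather than N minus the 50 common ones.

```python
import math


def transfer_bytes(N, c, doc_bytes, common, fp_rate=None):
    """Return bytes sent in steps 1, 2 and 3 of the protocol."""
    m = c * N  # hash map size in bits
    if fp_rate is None:
        # Use the large-m approximation when no rate is supplied.
        fp_rate = 1.0 - math.exp(-N / m)
    step1 = m // 8  # hash map, converted from bits to bytes
    false_positives = round(fp_rate * N)
    step2 = (false_positives + common) * doc_bytes  # all matches
    step3 = common * doc_bytes  # the true common documents
    return step1, step2, step3


# Slide's rounded rate of 11%.
steps = transfer_bytes(N=1_000_000, c=8, doc_bytes=1000, common=50, fp_rate=0.11)
print(steps, sum(steps))
```

Leaving out `fp_rate` uses the unrounded rate instead, which gives a somewhat larger step 2.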
DISTRIBUTED LOCKING
Setup
Five sites, with the workload split as follows:
• One site: 50% read only, 2% writes.
• Four other sites: 10% read only and 2% writes each.
Each site can communicate with every other site.
Read-locks-one, write-locks-all
What is the average number of inter-site messages exchanged?
All reads are local, so no remote locks are acquired. Each write requires locks at the 4 other sites.
Majority locking
What is the average number of inter-site messages?
2 other locks needed for both reads and writes.
What if you could broadcast across sites with 1 message?
Lock acquisition and release each take 1 message for all sites. Lock grants still take 1 message per site.
Primary-copy locking
What is the average number of inter-site messages?
The non-primary copies need to acquire locks from the primary for each operation. 48% of the actions need such locks.
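One way to turn these observations into averages is sketched below. Everything about the cost model is an assumption, not something the slides state: each remote lock is charged 3 messages (request, grant, release), a broadcast replaces the per-site requests and releases with 1 message each, only the needed grants are counted, the local copy counts toward the majority, and the primary copy sits at the site with 52% of the actions. The function names are illustrative.

```python
MSGS_PER_REMOTE_LOCK = 3  # assumed: request + grant + release

# (read share, write share) of all actions, per site.
SITES = [(0.50, 0.02)] + [(0.10, 0.02)] * 4


def read_one_write_all(sites):
    # Reads are local; each write locks every other site.
    others = len(sites) - 1
    write_share = sum(w for _, w in sites)
    return write_share * others * MSGS_PER_REMOTE_LOCK


def majority(sites, broadcast=False):
    # A majority of n copies is n // 2 + 1; one of them is assumed local.
    remote = len(sites) // 2
    total_share = sum(r + w for r, w in sites)
    if broadcast:
        per_action = 1 + remote + 1  # 1 request, one grant per site, 1 release
    else:
        per_action = remote * MSGS_PER_REMOTE_LOCK
    return total_share * per_action


def primary_copy(sites, primary=0):
    # Only actions issued away from the primary need a remote lock.
    remote_share = sum(r + w for i, (r, w) in enumerate(sites) if i != primary)
    return remote_share * MSGS_PER_REMOTE_LOCK


print(read_one_write_all(SITES), majority(SITES),
      majority(SITES, broadcast=True), primary_copy(SITES))
```

Under a different cost model (for example, counting data shipping or acknowledgments) the absolute numbers change, but the relative ordering of the schemes for this read-heavy workload is the point of the comparison.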
TWO PHASE COMMIT
Two-Phase Commit
• Coordinator: 0
• Three subordinates: {1, 2, 3}
• Messages
– P (Prepare)
– C (Commit)
– A (Abort)
– Y (Yes vote)
– N (No vote)
– Ignore acks.
2PC
• What messages are exchanged for a successful commit?
– As (sender, receiver, message) triples: (0,1,P), (0,2,P), (0,3,P), (1,0,Y), (2,0,Y), (3,0,Y), (0,1,C), (0,2,C), (0,3,C)
• When exactly does the commit occur?
– When the coordinator force-writes the commit record.
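The message trace for one round can be generated with a small sketch. The function `two_phase_commit` is a hypothetical helper; it ignores acks and logging, collects every vote before deciding (a simplification), and assumes that on abort the coordinator notifies all subordinates.

```python
def two_phase_commit(votes, coordinator=0):
    """Return (messages, decision) for one 2PC round.

    votes maps each subordinate id to 'Y' or 'N'.
    Messages are (sender, receiver, message) triples.
    """
    subs = sorted(votes)
    # Phase 1: prepare messages out, votes back.
    msgs = [(coordinator, s, "P") for s in subs]
    msgs += [(s, coordinator, votes[s]) for s in subs]
    # Decision point: the coordinator would force-write its
    # commit or abort record here, before sending anything.
    decision = "C" if all(v == "Y" for v in votes.values()) else "A"
    # Phase 2: send the decision to every subordinate (assumed).
    msgs += [(coordinator, s, decision) for s in subs]
    return msgs, decision


msgs, decision = two_phase_commit({1: "Y", 2: "Y", 3: "Y"})
print(decision, msgs)
```

With all Yes votes the trace matches the nine messages listed above; changing any vote to "N" turns the final three messages into aborts.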
2PC (continued)
• If the coordinator has sent all the prepare messages but has not yet received a vote from site 1, can it abort the transaction at this point, and send abort messages to the subordinates?
• If the coordinator has sent all the prepare messages, received a No vote from site 1, but has not yet received the votes of sites 2 and 3, should it wait for the two missing votes, or should it proceed to abort?
• If site 1 has received a prepare message and voted Yes, but has not received any commit or abort message, and site 1 contacts all other subordinates and discovers that they have all voted Yes, can site 1 commit the transaction?