A Case Study in Building Layered DHT Applications
Yatin Chawathe
Sriram Ramabhadran, Sylvia Ratnasamy, Anthony LaMarca, Scott Shenker, Joseph Hellerstein
Slide 2: Building distributed applications

Distributed systems are designed to be scalable, available, and robust. What about simplicity of implementation and deployment?

DHTs have been proposed as a simplifying building block:
- Simple hash-table API: put, get, remove
- Scalable content-based routing, fault tolerance, and replication
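To make concrete how small this API is, here is a toy in-process stand-in for the hash-table interface (the `DHT` class and the MAC-address key format are illustrative, not OpenDHT's actual client API; OpenDHT-style DHTs allow multiple values per key, modeled here with lists):

```python
class DHT:
    """Toy in-process stand-in for a distributed hash table.

    A real DHT (e.g. OpenDHT) distributes keys across nodes and
    replicates values; the application-facing API is just as small.
    """
    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # DHTs typically allow multiple values under one key
        self._table.setdefault(key, []).append(value)

    def get(self, key):
        return list(self._table.get(key, []))

    def remove(self, key, value):
        if key in self._table and value in self._table[key]:
            self._table[key].remove(value)

dht = DHT()
dht.put("mac:00:0f:66:12:34:56", (47.66, -122.31))
print(dht.get("mac:00:0f:66:12:34:56"))   # → [(47.66, -122.31)]
```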
Slide 3: Can DHTs help?

- Can we layer complex functionality on top of unmodified DHTs?
- Can we outsource the entire DHT operation to a third-party DHT service, e.g., OpenDHT?

Existing DHT applications fall into two classes:
- Simple apps that use an unmodified DHT for rendezvous or storage, e.g., i3, CFS, FOOD
- Complex apps that modify the DHT for enhanced functionality, e.g., Mercury, CoralCDN
Slide 4: Outline

- Motivation
- A case study: Place Lab
- Range queries with Prefix Hash Trees
- Evaluation
- Conclusion
Slide 5: A Case Study: Place Lab

Place Lab is a positioning service for location-enhanced apps:
- Clients locate themselves by listening for known radio beacons (e.g., WiFi APs)
- The Place Lab service maintains a database of APs and their known locations, and computes maps of AP MAC address ↔ (lat, lon)
- “War-drivers” submit neighborhood logs { lat, lon → list of APs }; clients download local WiFi maps { AP → lat, lon }
Slide 6: Why Place Lab

- Developed by a group of ubicomp researchers, not experts in system design and management
- Centralized deployment since March 2004; software downloaded by over 6,000 sites
- Concerns over organizational control motivate decentralizing the service
- But the developers want to avoid the implementation and deployment overhead of a distributed service
Slide 7: How DHTs can help Place Lab

- Automatic content-based routing: route logs by AP MAC address to the appropriate Place Lab server
- Robustness and availability: the DHT is managed entirely by a third party, and provides automatic replication and failure recovery of database content

[Figure: “war-drivers” submit neighborhood logs and clients download local WiFi maps via DHT storage and routing; Place Lab servers compute AP locations]
Slide 8: Downloading WiFi Maps

- Clients perform geographic range queries to download segments of the database, e.g., all access points in Philadelphia
- Can we do this entirely on top of an unmodified third-party DHT? DHTs provide exact-match queries, not range queries
Slide 9: Supporting range queries

Prefix Hash Trees (PHTs):
- Index built entirely with put, get, remove primitives; no changes to DHT topology or routing
- Binary tree structure: a node's label is a binary prefix of the values stored under it, and nodes split when they get too big
- Stored in the DHT with the node label as the key, allowing direct access to interior and leaf nodes
[Figure: PHT over 4-bit keys; interior nodes labeled R, R0, R1, R00, R01, R10, R11, R010, R011, R110, R111; leaves hold keys 0 (0000), 3 (0011), 4 (0100), 5 (0101), 6 (0110), 8 (1000), 12 (1100), 13 (1101), 14 (1110), 15 (1111)]
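The node-per-label storage scheme and splitting rule can be sketched in a few lines (a plain dict stands in for DHT put/get; the leaf capacity `B = 4` and 4-bit keys are assumptions chosen to mirror the figure):

```python
B = 4        # assumed leaf capacity; a real deployment would tune this
BITS = 4     # toy key length in bits

# label -> node; stands in for DHT storage keyed by node label
nodes = {"": {"leaf": True, "keys": []}}

def find_leaf(bits):
    """Walk down from the root following the key's bits to its leaf."""
    label = ""
    while not nodes[label]["leaf"]:
        label += bits[len(label)]
    return label

def insert(k):
    bits = format(k, f"0{BITS}b")
    label = find_leaf(bits)
    node = nodes[label]
    node["keys"].append(k)
    while len(node["keys"]) > B and len(label) < BITS:
        # split: node becomes interior, its keys move to the two children
        keys = node["keys"]
        node["leaf"], node["keys"] = False, []
        for child_bit in "01":
            nodes[label + child_bit] = {"leaf": True, "keys": [
                x for x in keys
                if format(x, f"0{BITS}b")[len(label)] == child_bit]}
        # keep splitting along the inserted key's path if still overfull
        label = find_leaf(bits)
        node = nodes[label]

for k in [0, 3, 4, 5, 6, 8, 12, 13, 14, 15]:
    insert(k)
print(sorted(l for l, n in nodes.items() if n["leaf"]))  # → ['00', '01', '10', '11']
```

Because each node lives in the DHT under its own label, any node (interior or leaf) is one `get` away, which is what the lookup and query operations exploit.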
Slide 10: PHT operations

- Lookup(K): find the leaf node whose label is a prefix of K, via binary search across K's bits; O(log log D) where D = size of the key space
- Insert(K, V): look up the leaf node for K; if it is full, split the node into two; put value V into the leaf node
- Query(K1, K2): look up the node for P, where P = the longest common prefix of K1 and K2, then traverse the subtree rooted at the node for P
[Figure: on the same PHT, Lookup for key 13 (1101) descends R → R1 → R11 → R110 → leaf R1101; a range query over keys 4–6 traverses the subtree under R01, whose leaves R010 and R011 hold 4 (0100), 5 (0101), and 6 (0110)]
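The binary search behind Lookup can be sketched against that label-keyed storage (the hand-built `nodes` map below is an assumed example instance, with leaf contents matching the figure):

```python
# Hand-built PHT over 4-bit keys: label -> ("leaf", keys) or ("interior", None).
# Labels and leaf contents are illustrative.
nodes = {
    "": ("interior", None),
    "0": ("interior", None), "1": ("interior", None),
    "00": ("leaf", [0, 3]), "01": ("leaf", [4, 5, 6]),
    "10": ("leaf", [8]),    "11": ("leaf", [12, 13, 14, 15]),
}

def lookup(k, bits=4):
    """Binary-search the prefix length of k's leaf; one DHT get per probe."""
    key = format(k, f"0{bits}b")
    lo, hi = 0, bits
    while lo <= hi:
        mid = (lo + hi) // 2
        node = nodes.get(key[:mid])      # the DHT get
        if node is None:                 # no such node: prefix too long
            hi = mid - 1
        elif node[0] == "leaf":
            return key[:mid]
        else:                            # interior node: leaf lies deeper
            lo = mid + 1
    return None

print(lookup(13))   # → '11'
```

Each probe is an independent `get` on a label, which is why lookup needs no DHT routing changes, only the standard key-based interface.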
Slide 11: 2-D geographic queries

- Convert lat/lon into a 1-D key using z-curve linearization: interleave the lat/lon bits to create the z-curve key
- Linearized query results may not be contiguous: start at the longest-prefix subtree, and visit child nodes only if they can contribute to the query result
[Figure: z-curve linearization over an 8×8 lat/lon grid rooted at P (= R000…00). Example: (lat, lon) = (5, 6) = (0101, 0110) interleaves, lat bit first, to the z-curve key 00110110 = 54. A rectangular geographic query maps to non-contiguous key ranges, so only intersecting subtrees (e.g., P0100–P0111, P1100, P1101) are visited]
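The bit interleaving can be written directly; ordering the lat bit before the lon bit reproduces the slide's (5, 6) → 00110110 example:

```python
def z_key(lat, lon, bits=4):
    """Interleave lat and lon bits (lat bit first) into one z-curve key."""
    key = 0
    for i in reversed(range(bits)):          # most-significant bit first
        key = (key << 1) | ((lat >> i) & 1)  # lat bit
        key = (key << 1) | ((lon >> i) & 1)  # then lon bit
    return key

print(format(z_key(5, 6), "08b"), z_key(5, 6))   # → 00110110 54
```

Because neighboring prefixes of a z-curve key correspond to nested quadrants of the grid, a PHT over these keys supports the pruned subtree traversal described above.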
Slide 12: PHT Visualization
Slide 13: Ease of implementation and deployment

- 2,100 lines of code to hook Place Lab into the underlying DHT service, compared with 14,000 lines for the DHT itself
- Runs entirely on top of the deployed OpenDHT service; the DHT handles fault tolerance and robustness, and masks failures of Place Lab servers
Slide 14: Flexibility of DHT APIs

- Range queries use only the get operation; updates use a combination of put, get, and remove
- But concurrent updates can cause inefficiencies, and existing DHT APIs offer no support for concurrency
- A test-and-set extension would benefit PHTs and a range of other applications: put_conditional performs the put only if the value has not changed since the previous get
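A sketch of how the proposed put_conditional might behave, using an in-process table (the lock and the return-False-on-conflict convention are assumptions; the slide only specifies the test-and-set semantics):

```python
import threading

class DHT:
    """Toy DHT with the proposed test-and-set extension.

    put_conditional is the paper's proposed API extension, sketched
    here; it is not an existing OpenDHT call.
    """
    def __init__(self):
        self._table, self._lock = {}, threading.Lock()

    def get(self, key):
        return self._table.get(key)

    def put_conditional(self, key, new_value, expected):
        """Store new_value only if the key still holds `expected`."""
        with self._lock:
            if self._table.get(key) != expected:
                return False        # a concurrent update won; caller retries
            self._table[key] = new_value
            return True

dht = DHT()
dht.put_conditional("leaf:R01", [4, 5], None)         # first write
old = dht.get("leaf:R01")
ok = dht.put_conditional("leaf:R01", old + [6], old)  # read-modify-write
print(ok, dht.get("leaf:R01"))   # → True [4, 5, 6]
```

A PHT insert that loses the race simply re-reads the leaf and retries, instead of clobbering a concurrent split or update.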
Slide 15: PHT insert performance

- Median insert latency is 1.45 sec (without caching: 3.25 sec; with caching: 0.76 sec)
Slide 16: PHT query performance

Data size | Latency (sec)
5k        | 2.13
10k       | 2.76
50k       | 3.18
100k      | 3.75

- Queries take 2–4 seconds on average
- Latency varies with block size: smaller (or very large) block sizes imply longer query times
Slide 17: Conclusion

- A concrete example of building a complex application on top of a vanilla DHT service
- The DHT provides ease of implementation and deployment
- Layering lets the application inherit robustness, availability, and scalable routing from the DHT, sacrificing some performance in return