15-744: Computer Networking
L-12 DHT and DTN
Examples of “Overlay” Networks
• Overlay routing infrastructure
• Distributed name lookup
• Content delivery networks
• Peer-to-peer content sharing
• Distributed hash tables
• Disruption tolerant networks
Overview
• Routed Lookups – Chord
• Comparison of DHTs
• I3
• DTNs
Routing: Structured Approaches
• Goal: make sure that an identified item (file) is always found within a reasonable number of steps
• Abstraction: a distributed hash table (DHT) data structure
  • insert(id, item);
  • item = query(id);
  • Note: item can be anything: a data object, document, file, pointer to a file…
• Many proposals
  • CAN (ICIR/Berkeley)
  • Chord (MIT/Berkeley)
  • Pastry (Rice)
  • Tapestry (Berkeley)
  • …
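The insert/query abstraction can be sketched as a plain in-memory interface; this is a hypothetical single-process stand-in, since a real DHT partitions the table across nodes and routes each key to its owner:

```python
# Hypothetical single-process sketch of the DHT interface; a real DHT
# partitions the table across nodes and routes each key to its owner.
class DHT:
    def __init__(self):
        self._table = {}

    def insert(self, ident, item):
        # insert(id, item) -- item can be a data object, document, file, pointer...
        self._table[ident] = item

    def query(self, ident):
        # item = query(id); returns None if the id is not stored
        return self._table.get(ident)
```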
Routing: Chord
• Associate to each node and item a unique id in a uni-dimensional space
• Properties
  • Routing table size O(log(N)), where N is the total number of nodes
  • Guarantees that a file is found in O(log(N)) steps
Aside: Hashing
• Advantages
  • Let nodes be numbered 1..m
  • Client uses a good hash function to map a URL to 1..m
  • Say hash(url) = x; the client then fetches the content from node x
  • One-hop access
• Any problems?
  • What happens if a node goes down?
  • What happens if a node comes back up?
  • What if different nodes have different views?
Robust hashing
• Assume 90 documents, nodes 1..9, and node 10, which was dead, is alive again
• % of documents in the wrong node?
  • 10, 19-20, 28-30, 37-40, 46-50, 55-60, 64-70, 73-80, 82-90
• Disruption coefficient = ½
• Unacceptable; use consistent hashing – the idea behind Akamai!
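The ½ figure can be reproduced with a short sketch, assuming (as the example implies) that documents 1..90 are assigned to nodes in contiguous equal-size ranges:

```python
def node_for(doc, num_nodes, num_docs=90):
    # contiguous range partitioning: each node holds num_docs/num_nodes docs
    per_node = num_docs // num_nodes
    return (doc - 1) // per_node + 1

# documents that land on a different node when dead node 10 comes back
moved = [d for d in range(1, 91) if node_for(d, 9) != node_for(d, 10)]
print(len(moved) / 90)  # disruption coefficient: 0.5
```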
Consistent Hash
• “view” = the subset of all hash buckets that are visible
• Desired features
  • Balanced – in any one view, load is equal across buckets
  • Smoothness – little impact on hash bucket contents when buckets are added/removed
  • Spread – a small set of hash buckets may hold an object, regardless of views
  • Load – across all views, the # of objects assigned to a hash bucket is small
Consistent Hash – Example
• Smoothness – adding a bucket does not cause much movement between existing buckets
• Spread & load – a small set of buckets lies near each object
• Balance – no bucket is responsible for a large number of objects
• Construction
  • Assign each of C hash buckets to random points on a mod 2^n circle, where n = hash key size
  • Map each object to a random position on the circle
  • Hash of object = closest clockwise bucket
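A minimal sketch of this construction, with SHA-1 standing in for the random placement (hypothetical; real deployments also place multiple virtual points per bucket to improve balance):

```python
import bisect
import hashlib

def point(key, bits=32):
    # pseudo-random position on the mod 2^bits circle
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (1 << bits)

class ConsistentHash:
    def __init__(self, buckets):
        # each bucket is assigned a point on the circle
        self.points = sorted((point(b), b) for b in buckets)

    def lookup(self, obj):
        # hash of object = closest clockwise bucket (wrapping around)
        i = bisect.bisect(self.points, (point(obj),))
        return self.points[i % len(self.points)][1]
```

Adding a bucket moves only the objects between the new point and its predecessor, which is exactly the smoothness property.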
[Figure: circle of hash values with bucket points (e.g. 4, 8, 12, 14); an object maps to the closest clockwise bucket]
Routing: Chord Basic Lookup
[Figure: Chord ring with nodes N10, N32, N60, N90, N105, N120 and key K80; the query “Where is key 80?” is forwarded from successor to successor until the answer “N90 has K80” is returned]
Routing: Finger table - Faster Lookups
[Figure: node N80’s fingers span arcs of ½, ¼, 1/8, 1/16, 1/32, 1/64, 1/128 of the ring]
Routing: Chord Summary
• Assume the identifier space is 0…2^m
• Each node maintains
  • Finger table
    • Entry i in the finger table of n is the first node that succeeds or equals n + 2^i
  • Predecessor node
• An item identified by id is stored on the successor node of id
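A toy sketch of these rules over a static ring (m = 3 and illustrative node ids 0, 1, 3, 6; a real Chord node builds and repairs these entries dynamically as nodes join and leave):

```python
M = 3                              # identifier space is 0..2^m - 1
RING = 1 << M
NODES = sorted([0, 1, 3, 6])       # illustrative node ids

def successor(ident):
    # first node that succeeds or equals ident, wrapping around the ring
    for n in NODES:
        if n >= ident:
            return n
    return NODES[0]

def finger_table(n):
    # entry i: first node that succeeds or equals n + 2^i (mod 2^m)
    return [successor((n + (1 << i)) % RING) for i in range(M)]

print(finger_table(1))   # [3, 3, 6]: successors of 2, 3, 5
print(successor(7))      # 0: an item with id 7 is stored on node 0
```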
Routing: Chord Example
• Assume an identifier space 0..7 (mod 8)
• Node n1:(1) joins; all entries in its finger table are initialized to itself
[Figure: the ring of identifiers 0..7 with n1’s successor table]
Succ. Table
  i  id+2^i  succ
  0    2      1
  1    3      1
  2    5      1
Routing: Chord Example
• Node n2:(3) joins
[Figure: the ring of identifiers 0..7 with both nodes’ successor tables]
Succ. Table
  i  id+2^i  succ
  0    2      2
  1    3      1
  2    5      1
Succ. Table
  i  id+2^i  succ
  0    3      1
  1    4      1
  2    6      1
Routing: Chord Example
• Nodes n3:(0), n4:(6) join
[Figure: the ring of identifiers 0..7 with each node’s successor table]
Succ. Table
  i  id+2^i  succ
  0    2      2
  1    3      6
  2    5      6
Succ. Table
  i  id+2^i  succ
  0    3      6
  1    4      6
  2    6      6
Succ. Table
  i  id+2^i  succ
  0    1      1
  1    2      2
  2    4      0
Succ. Table
  i  id+2^i  succ
  0    7      0
  1    0      0
  2    2      2
Routing: Chord Examples
• Nodes: n1:(1), n2:(3), n3:(0), n4:(6)
• Items: f1:(7), f2:(1)
[Figure: the ring of identifiers 0..7 with each node’s successor table; item 7 is stored at node 0 and item 1 at node 1]
Succ. Table
  i  id+2^i  succ
  0    2      2
  1    3      6
  2    5      6
Succ. Table
  i  id+2^i  succ
  0    3      6
  1    4      6
  2    6      6
Succ. Table
  i  id+2^i  succ
  0    1      1
  1    2      2
  2    4      0
Succ. Table
  i  id+2^i  succ
  0    7      0
  1    0      0
  2    2      2
Routing: Query
• Upon receiving a query for item id, a node:
  • checks whether it stores the item locally
  • if not, forwards the query to the largest node in its successor table that does not exceed id
[Figure: the same ring and successor tables as the previous slide; query(7) is forwarded around the ring until it reaches the node that stores item 7]
What can DHTs do for us?
• Distributed object lookup
  • Based on object ID
• Decentralized file systems
  • CFS, PAST, Ivy
• Application-layer multicast
  • Scribe, Bayeux, SplitStream
• Databases
  • PIER
Overview
• Routed Lookups – Chord
• Comparison of DHTs
• I3
• DTNs
Comparison
• Many proposals for DHTs
• Tapestry (UCB) -- Symphony (Stanford) -- 1hop (MIT)
• Pastry (MSR, Rice) -- Tangle (UCB) -- conChord (MIT)
• Chord (MIT, UCB) -- SkipNet (MSR,UW) -- Apocrypha (Stanford)
• CAN (UCB, ICSI) -- Bamboo (UCB) -- LAND (Hebrew Univ.)
• Viceroy (Technion) -- Hieras (U.Cinn) -- ODRI (TexasA&M)
• Kademlia (NYU) -- Sprout (Stanford)
• Kelips (Cornell) -- Calot (Rochester)
• Koorde (MIT) -- JXTA’s (Sun)
• What are the right design choices? Effect on performance?
Deconstructing DHTs
Two observations:
1. Common approach
  • N nodes; each labeled with a virtual identifier (128 bits)
  • define a “distance” function on the identifiers
  • routing works to reduce the distance to the destination
2. DHTs differ primarily in their definition of “distance”
  • typically derived from a (loose) notion of a routing geometry
DHT Routing Geometries
• Geometries:
  • Tree (Plaxton, Tapestry)
  • Ring (Chord)
  • Hypercube (CAN)
  • XOR (Kademlia)
  • Hybrid (Pastry)
• What is the impact of geometry on routing?
Tree (Plaxton, Tapestry)
Geometry
• nodes are leaves in a binary tree
• distance = height of the smallest common subtree
• logN neighbors, in subtrees at distance 1, 2, …, logN
[Figure: binary tree with leaves 000, 001, 010, 011, 100, 101, 110, 111]
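The tree distance can be computed directly from the leaf labels: the smallest common subtree of two leaves has height equal to the number of ID bits minus the length of their common prefix. A small sketch:

```python
# Sketch of the tree distance: node ids are fixed-width bit strings
# (leaves of a binary tree); the smallest common subtree of two leaves
# has height = number of bits - length of their common prefix.
def tree_distance(a: str, b: str) -> int:
    assert len(a) == len(b)
    prefix = 0
    for x, y in zip(a, b):
        if x != y:
            break
        prefix += 1
    return len(a) - prefix

print(tree_distance("000", "001"))  # siblings: distance 1
print(tree_distance("000", "111"))  # only common ancestor is the root: 3
```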
Hypercube (CAN)
[Figure: 3-cube with corners 000, 001, 010, 011, 100, 101, 110, 111]
Geometry
• nodes are the corners of a hypercube
• distance = number of bit positions in which the IDs of two nodes differ
• logN neighbors per node; each at distance 1
Ring (Chord)
Geometry
• nodes are points on a ring
• distance = numeric distance between two node IDs
• logN neighbors, exponentially spaced over 0…N
[Figure: ring with nodes 000, 001, 010, 011, 100, 101, 110, 111]
Hybrid (Pastry)
Geometry:
• combination of a tree and a ring
• two distance metrics
• default routing uses the tree; falls back to the ring under failures
• neighbors picked as on the tree
XOR (Kademlia)
[Figure: 2-bit node IDs 00, 01, 10, 11 under the XOR distance]
Geometry:
• distance(A, B) = A XOR B
• logN neighbors per node, spaced exponentially
• not a ring, because there is no single consistent ordering of all the nodes
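A small sketch of the metric, illustrating the “no single consistent ordering” point: each node sorts the same peers differently by XOR distance:

```python
def xor_distance(a: int, b: int) -> int:
    # Kademlia's metric: bitwise XOR of the two IDs
    return a ^ b

peers = [0b00, 0b01, 0b10, 0b11]
# every node sees a different ordering of the same peers
print(sorted(peers, key=lambda p: xor_distance(0b00, p)))  # [0, 1, 2, 3]
print(sorted(peers, key=lambda p: xor_distance(0b01, p)))  # [1, 0, 3, 2]
```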
Geometry’s Impact on Routing
• Routing
  • Neighbor selection: how a node picks its routing entries
  • Route selection: how a node picks the next hop
• Proposed metric: flexibility
  • amount of freedom to choose neighbors and next-hop paths
  • FNS: flexibility in neighbor selection
  • FRS: flexibility in route selection
  • intuition: captures the ability to “tune” DHT performance
  • a single predictor metric dependent only on routing issues
FNS for Ring Geometry
• The Chord algorithm picks the ith neighbor at distance 2^i
• A different algorithm picks the ith neighbor from [2^i, 2^(i+1))
[Figure: ring of nodes 000–111 illustrating both neighbor choices]
FRS for Ring Geometry
• The Chord algorithm picks the neighbor closest to the destination
• A different algorithm picks the best of the alternate paths
[Figure: ring of nodes 000–111 illustrating alternate next hops]
Flexibility: at a Glance
Flexibility        Ordering of Geometries
Neighbors (FNS)    Hypercube << Tree, XOR, Ring, Hybrid
                   (1)          (2^(i-1))
Routes (FRS)       Tree << XOR, Hybrid < Hypercube < Ring
                   (1)     (logN/2)        (logN/2)    (logN)
Geometry Flexibility Performance?
Validate over three performance metrics:
1. resilience
2. path latency
3. path convergence
These metrics address two typical concerns:
• ability to handle node failure
• ability to incorporate proximity into overlay routing
Analysis of Static Resilience
Two aspects of robust routing:
• Dynamic recovery: how quickly routing state is recovered after failures
• Static resilience: how well the network routes before recovery finishes
  • captures how quickly recovery algorithms need to work
  • depends on FRS
Evaluation:
• Fail a fraction of the nodes, without recovering any state
• Metric: % paths failed
Does flexibility affect static resilience?
Tree << XOR ≈ Hybrid < Hypercube < Ring
Flexibility in route selection matters for static resilience.
[Figure: % failed paths (0–100) vs. % failed nodes (0–90) for the Ring, Hybrid, XOR, Tree, and Hypercube geometries]
Which is more effective, FNS or FRS?
[Figure: CDF of path latency (0–2000 msec) for Plain Ring, FRS Ring, FNS Ring, and FNS+FRS Ring]
Plain << FRS << FNS ≈ FNS+FRS
Neighbor selection is much better than route selection.
Overview
• Routed Lookups – Chord
• Comparison of DHTs
• I3
  • Slides: Yuan; following slides FYI
• DTNs
Multicast
[Figure: IP multicast: sources S1 and S2, receivers C1 and C2, routers R, and a rendezvous point RP. RP: Rendezvous Point]
Mobility
[Figure: Mobile IP: a sender reaches the mobile node (home address 5.0.0.3) via the home agent (HA) on home network 5 and a foreign agent (FA); addresses 5.0.0.1 and 12.0.0.4 appear on the two networks]
i3: Motivation
• Today’s Internet is based on a point-to-point abstraction
• Applications need more:
  • Multicast
  • Mobility
  • Anycast
• Existing solutions:
  • Change the IP layer
  • Overlays
• So, what’s the problem? A different solution for each service

The i3 solution
• Solution:
  • Add an indirection layer on top of IP
  • Implement it using overlay networks
• Solution components:
  • Naming using “identifiers”
  • Subscriptions using “triggers”
  • A DHT as the gluing substrate
Indirection
“Every problem in CS …”
The only primitive needed
i3: Rendezvous Communication
• Packets are addressed to identifiers (“names”)
• Trigger = (Identifier, IP address): inserted by the receiver
[Figure: the sender issues send(ID, data); the trigger (ID, R) inserted by receiver R rewrites it to send(R, data)]
Senders are decoupled from receivers.
i3: Service Model
• API
  • sendPacket(id, p);
  • insertTrigger(id, addr);
  • removeTrigger(id, addr); // optional
• Best-effort service model (like IP)
• Triggers are periodically refreshed by end-hosts
• Reliability, congestion control, and flow control are implemented at the end-hosts
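The service model can be sketched in-process (hypothetical names; a real i3 stores each trigger at the DHT node responsible for the identifier, and triggers expire unless refreshed):

```python
class I3Node:
    """Toy rendezvous point holding (identifier -> receiver addresses) triggers."""
    def __init__(self):
        self.triggers = {}

    def insert_trigger(self, ident, addr):
        self.triggers.setdefault(ident, set()).add(addr)

    def remove_trigger(self, ident, addr):
        self.triggers.get(ident, set()).discard(addr)

    def send_packet(self, ident, data, deliver):
        # best-effort, like IP: no matching trigger means the packet is dropped;
        # several matching triggers give multicast for free
        for addr in self.triggers.get(ident, ()):
            deliver(addr, data)
```

Mobility falls out of the same mechanism: the receiver re-inserts its trigger with its new address, and the sender is unaffected.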
i3: Implementation
• Uses a distributed hash table
  • Scalable, self-organizing, robust
  • Suitable as a substrate for the Internet
[Figure: the trigger insertion and the sender’s send(ID, data) both reach the i3 node responsible for ID via DHT.put(id); the trigger (ID, R) then delivers the packet with IP.route(R) as send(R, data) to receiver R]
Mobility
• The change of the receiver’s address from R to R’ is transparent to the sender
Multicast
• Every packet (id, data) is forwarded to each receiver Ri that inserts the trigger (id, Ri)
Generalization: Identifier Stack
• Stack of identifiers
  • i3 routes the packet through these identifiers
• Receivers
  • a trigger maps an id to a <stack of ids>
• A sender can also specify an id-stack in the packet
• Mechanism:
  • the first id is used to match a trigger
  • the rest are added to the RHS of the trigger
  • recursively continued
Service Composition
• Receiver-mediated: R sets up the chain and passes id_gif/jpg to the sender; the sender is oblivious
• Sender-mediated: S can include (id_gif/jpg, ID) in its packet; the receiver is oblivious
[Figure: the sender (GIF) issues send((ID_GIF/JPG, ID), data); the trigger (ID_GIF/JPG, S_GIF/JPG) routes the packet through S_GIF/JPG, whose send(ID, data) matches the trigger (ID, R) and is delivered as send(R, data) to receiver R (JPG)]
Overview
• Routed Lookups – Chord
• Comparison of DHTs
• I3
• DTNs
  • Slides: Sam; following slides FYI
Delay-Tolerant Networking Architecture
• Goals
  • Support interoperability across ‘radically heterogeneous’ networks
  • Tolerate delay and disruption
    • Acceptable performance in high loss/delay/error/disconnected environments
    • Decent performance for low loss/delay/errors
• Components
  • Flexible naming scheme
  • Message abstraction and API
  • Extensible store-and-forward overlay routing
  • Per-(overlay)-hop reliability and authentication
Disruption Tolerant Networks
Naming Data (DTN)
• Endpoint IDs are processed as names
  • refer to one or more DTN nodes
  • expressed as Internet URIs, matched as strings
• URIs
  • Internet standard naming scheme [RFC3986]
  • Format: <scheme> : <SSP>
  • The SSP can be arbitrary, based on (various) schemes
• More flexible than the DOT/DONA designs, but less secure/scalable
  • Data-centric networking approaches discussed later in the course
Message Abstraction
• Network protocol data unit: bundles
  • “postal-like” message delivery
  • coarse-grained CoS [4 classes]
  • origination and useful life time [assumes sync’d clocks]
  • source, destination, and respond-to EIDs
  • options: return receipt, “traceroute”-like function, alternative reply-to field, custody transfer
  • fragmentation capability
  • overlay atop TCP/IP or other (link) layers [layer ‘agnostic’]
• Applications send/receive messages
  • “Application data units” (ADUs) of possibly large size
  • Adaptation to underlying protocols via a ‘convergence layer’
  • API includes persistent registrations
DTN Routing
• DTN routers form an overlay network
  • only selected/configured nodes participate
  • nodes have persistent storage
• The DTN routing topology is a time-varying multigraph
  • Links come and go, sometimes predictably
  • Use any/all links that can possibly help (multi)
  • Scheduled, predicted, or unscheduled links
    • May be direction-specific [e.g. ISP dialup]
    • May learn from history to predict the schedule
• Messages are fragmented based on dynamics
  • Proactive fragmentation: optimize contact volume
  • Reactive fragmentation: resume where you failed
Example Routing Problem
[Figure: a Village connected via three numbered links (1, 2, 3) to a City and on to the Internet, including a bike data mule]
Example Graph Abstraction
[Figure: graph abstraction with vertices City, Village 1, Village 2; bandwidth vs. time (days) for the Village 1 – City connectivity: bike (data mule) – intermittent, high capacity; Geo satellite – medium/low capacity; dial-up link – low capacity]
The DTN Routing Problem
• Inputs: topology (multi)graph, vertex buffer limits, contact set, message demand matrix (w/ priorities)
• An edge is a possible opportunity to communicate:
  • One-way: (S, D, c(t), d(t))
  • (S, D): ordered source/destination pair of the contact
  • c(t): capacity (rate); d(t): delay
  • A contact is a period [t_k, t_k+1] during which c(t) > 0
• Vertices have buffer limits; an edge is in the graph if it is ever in any contact; a multigraph captures multiple physical connections
• Problem: optimize some metric of delivery on this structure
  • Sub-questions: what metric to optimize? efficiency?
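As a sketch of routing over such a contact set, the following computes the earliest possible delivery time over scheduled contacts, in the spirit of the contacts-oracle (ED) style of algorithm; per-contact delay d(t), capacities, and buffer limits are ignored for brevity, and the contact schedule is made up for illustration:

```python
import heapq

# Each contact is (src, dst, t_start, t_end): the one-way link is usable
# only while it is up; a message may wait at a node for a contact to open.
def earliest_arrival(contacts, src, dst, t0=0):
    best = {src: t0}
    pq = [(t0, src)]
    while pq:
        t, u = heapq.heappop(pq)
        if u == dst:
            return t
        if t > best.get(u, float("inf")):
            continue  # stale queue entry
        for (a, b, ts, te) in contacts:
            if a == u and t <= te:          # contact not yet over
                arrive = max(t, ts)         # wait at u until the contact opens
                if arrive < best.get(b, float("inf")):
                    best[b] = arrive
                    heapq.heappush(pq, (arrive, b))
    return None                             # destination unreachable

contacts = [("village", "city", 10, 12),    # bike available days 10-12
            ("city", "internet", 0, 100)]   # long-lived dial-up link
print(earliest_arrival(contacts, "village", "internet"))  # 10
```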
Knowledge-Performance Tradeoff
Use of Knowledge Oracles

Oracle                          Algorithm
None                            FC
Contacts Summary                MED
Contacts                        ED
Contacts + Queuing (local)      EDLQ
Contacts + Queuing (global)     EDAQ
Contacts + Queuing + Traffic    LP
Routing Solutions – Replication
• “Intelligently” distribute identical data copies to contacts to increase the chances of delivery
  • Flooding (unlimited contacts)
  • Heuristics: random forwarding, history-based forwarding, prediction-based forwarding, etc. (limited contacts)
• Given a “replication budget”, this is difficult
  • With simple replication, only a finite number of copies exist in the network [Juang02, Grossglauser02, Jain04, Chaintreau05]
  • Routing performance (delivery rate, latency, etc.) depends heavily on the “deliverability” of these contacts (or the predictability of the heuristics)
• No single heuristic works for all scenarios!
Using Erasure Codes
• Rather than seeking particular “good” contacts, “split” messages and distribute them to more contacts to increase the chance of delivery
  • The same number of bytes flows in the network, now in the form of coded blocks
  • Partial data arrival can be used to reconstruct the original message
    • Given a replication factor of r, (in theory) any 1/r of the code blocks received can be used to reconstruct the original data
  • Potentially leverages more contact opportunities, resulting in a lower worst-case latency
• Intuition: reduces the “risk” due to outlier bad contacts
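A toy instance of the idea, using a single XOR parity block: the message is split into two data blocks plus one parity block, and any two of the three reconstruct it (replication factor 1.5; real proposals use stronger codes such as Reed-Solomon, but the principle is the same):

```python
# Toy erasure code: two data blocks plus one XOR parity block;
# any 2 of the 3 blocks reconstruct the original message.
def encode(msg: bytes):
    half = (len(msg) + 1) // 2
    a, b = msg[:half], msg[half:].ljust(half, b"\0")   # pad b to equal length
    parity = bytes(x ^ y for x, y in zip(a, b))
    return {"a": a, "b": b, "p": parity}

def decode(blocks, msg_len):
    if "a" in blocks and "b" in blocks:
        a, b = blocks["a"], blocks["b"]
    elif "a" in blocks:                                # b lost: b = a XOR p
        a = blocks["a"]
        b = bytes(x ^ y for x, y in zip(a, blocks["p"]))
    else:                                              # a lost: a = b XOR p
        b = blocks["b"]
        a = bytes(x ^ y for x, y in zip(b, blocks["p"]))
    return (a + b)[:msg_len]
```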
Erasure Codes
[Figure: message (n blocks) → encoding → opportunistic forwarding → decoding → message (n blocks)]
DTN Security
• Payload Security Header (PSH): end-to-end security header
• Bundle Authentication Header (BAH): hop-by-hop security header
[Figure: a bundle travels from the source’s bundle application through a chain of bundle agents (sender, receiver/senders, destination); a BAH protects each overlay hop, the PSH protects the path end-to-end, and a security policy router along the way may check the PSH value. Credit: MITRE]
So, is this just e-mail?
• Many similarities to the (abstract) e-mail service
• Primary differences involve routing, reliability, and security
• E-mail depends on an underlying layer’s routing:
  • Cannot generally move messages ‘closer’ to their destinations in a partitioned network
  • In the Internet (SMTP) case, not disconnection-tolerant or efficient for long RTTs, due to “chattiness”
• E-mail security authenticates only user-to-user
          naming/        routing     flow     multi-  security  reliable  priority
          late binding               control  app               delivery
e-mail    Y              N (static)  N (Y)    N (Y)   opt       Y         N (Y)
DTN       Y              Y (exten.)  Y        Y       opt       opt       Y