Richard T. B. Ma
School of Computing
National University of Singapore
Peer-to-Peer Networks
CS 4226: Internet Architecture
Outline
P2P vs. traditional paradigm: properties, advantages and challenges
Practical P2P systems: Napster, Gnutella, KaZaA, Skype, BitTorrent
Key technologies for P2P lookup services: Distributed Hash Table (DHT)
two example architectures: Chord and CAN
The client/server model and extension
Client/server model: asymmetric traditional communication model
roles: ad-hoc clients vs. dedicated servers
Extended model: delegation, a new role for the server (the client remains the same)
delegation can be recursive or iterative
[Figure: the client sends a request to the server; the server delegates the request to a secondary server, whose response is relayed back to the client]
[Figure: DNS server hierarchy. Root DNS servers at the top; com, org and edu TLD DNS servers below; authoritative DNS servers (poly.edu, umass.edu, yahoo.com, amazon.com, pbs.org) at the bottom]
An example: Domain Name System (DNS)
client wants the IP for www.amazon.com; first approximation:
client queries a root server to find the com DNS server
client queries the com DNS server to get the amazon.com DNS server
client queries the amazon.com DNS server to get the IP address for www.amazon.com
[Figure: iterated query. Requesting host cis.poly.edu asks the local DNS server dns.poly.edu to resolve gaia.cs.umass.edu; the local server contacts the root DNS server, a TLD DNS server and the authoritative DNS server dns.cs.umass.edu in turn (steps 1-8)]
Domain name resolution: iterative vs. recursive
[Figure: recursive query. The same resolution, but each server forwards the query onward on the client's behalf: host to dns.poly.edu, to the root DNS server, to a TLD DNS server, to the authoritative dns.cs.umass.edu, with replies returning along the same path (steps 1-8)]
Server-based vs. peer-to-peer
[Figure: server-based star topology vs. peer-to-peer mesh]
Properties and problems
Properties of (pure) P2P:
no always-on server or central entity
arbitrary end systems communicate directly
no a-priori knowledge/structure
flat architecture/namespace
Problems: peers are intermittently connected and change IP addresses, i.e., unreliable service providers
how to stay connected?
how to do resource lookup?
File Distribution: Server-Client vs P2P
Question: how much time to distribute a file from one server to N peers?
[Figure: server with upload rate $u_s$ connected through a network with abundant bandwidth to N peers; file of size F]
$u_s$: server upload bandwidth
$u_i$: peer i upload bandwidth
$d_i$: peer i download bandwidth
File distribution time: server-client
[Figure: server uploads file F to N clients over a network with abundant bandwidth]
server sequentially sends N copies: $NF/u_s$ time
client i takes $F/d_i$ time to download
Time to distribute F to N clients using the client/server approach:
$d_{cs} = \max\{\, NF/u_s,\ F/\min_i d_i \,\}$
increases linearly in $N$ (for large $N$)
File distribution time: P2P
[Figure: server and N peers, each with upload rate $u_i$ and download rate $d_i$, exchanging file F]
server must send one copy: $F/u_s$ time
client i takes $F/d_i$ time to download
$NF$ bits must be downloaded in aggregate
fastest possible aggregate upload rate: $u_s + \sum_i u_i$
$d_{P2P} = \max\{\, F/u_s,\ F/\min_i d_i,\ NF/(u_s + \sum_i u_i) \,\}$
[Figure: minimum distribution time (hours) vs. N (0 to 35): client-server grows linearly, P2P stays bounded]
Server-client vs. P2P: example
Client upload rate $u$, $F/u = 1$ hour, $u_s = 10u$, $d_{min} \ge u_s$
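The two distribution-time formulas above can be checked numerically for this example; a minimal sketch, with rates normalized so that $u = 1$ (the normalization and function names are illustrative):

```python
def d_client_server(N, F, u_s, d_min):
    # d_cs = max{ N*F/u_s , F/min_i d_i }
    return max(N * F / u_s, F / d_min)

def d_p2p(N, F, u_s, u, d_min):
    # d_P2P = max{ F/u_s , F/min_i d_i , N*F/(u_s + sum_i u_i) }
    return max(F / u_s, F / d_min, N * F / (u_s + N * u))

u = 1.0          # peer upload rate (normalized)
F = 1.0          # file size, so F/u = 1 hour
u_s = 10 * u     # server upload rate = 10u
d_min = u_s      # d_min >= u_s: downloads are never the bottleneck

for N in (10, 20, 30):
    print(N, d_client_server(N, F, u_s, d_min), d_p2p(N, F, u_s, u, d_min))
```

For N = 10, 20, 30 this gives client-server times of 1, 2 and 3 hours (linear in N) against P2P times of 0.5, 2/3 and 0.75 hours, matching the shape of the plot above.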
When and when not P2P?
When is P2P the right/wrong solution?
Claim: the P2P vision is technically feasible; in other words, it is possible to build everything on the Internet without any dedicated servers
but just because it's technically feasible doesn't necessarily mean it makes sense
in other words, just because we can do it P2P doesn't mean that we should do it P2P
So, when is P2P the right solution?!?
Some Criteria
Budget: how much money do we have?
Resource relevance: how widely are resources interesting to users?
Trust: how much trust is there between users?
Rate of system change: how fast do things change in the system?
Criticality: how critical is the service to the users?
P2P Applications and Systems
File sharing: Napster (99-01), KaZaA (01-12), Gnutella
Content distribution: BitTorrent
VoIP and messaging: Skype
Video streaming: PPLive, PPStream
Other applications: P2P computation, P2P storage, …
Napster: How does it work?
Based on a central index server: a user registers with the central server
and sends it the list of files to be shared
the server knows all the peers and files in the network
Searching is based on keywords; search results are a list of files with information about each file and the peer sharing it
e.g., encoding rate, file size, peer's bandwidth
some information is entered by the user, hence unreliable
Napster: How does it work?
Pretty much like the use of delegation
However, the roles of client and server change, making it peer-to-peer
Napster: Pros and Cons
Weaknesses:
downloading from a single peer only
single point of failure at the server
large computation needed to handle queries
unreliable content
vulnerable to attacks
lawsuits
Strengths:
a consistent view of the network
fast and efficient searching
guaranteed correct search answers
Gnutella: How does it work?
Has only peers, all of which are fully equal: conceptually an overlay network
To join the network, a peer needs the address of another active peer, obtained out-of-band, e.g., from a website
Once joined, a peer learns about others and about the topology of the network
Queries are flooded into the network
Downloads happen directly between peers
Gnutella: How does it work?
[Figure: a query floods through the overlay; query hits travel back along the path, then the file is transferred directly over HTTP]
Gnutella: Pros and Cons
Weaknesses:
inefficient queries: flooding wastes a lot of network and peer resources; how to deal with it?
inefficient network management: constant probing is needed
Strengths:
fully distributed
open protocol: easy to write clients (by contrast, there is, e.g., no KaZaA for Linux)
robust against node failures: only true for random failures, as it forms a power-law network
less susceptible to denial-of-service attacks
KaZaA: How does it work?
Two kinds of nodes:
Ordinary Nodes (ON): a normal user peer
Supernodes (SN): a user peer with more resources/responsibilities than an ON
Forms a two-tier hierarchy: the top level has only SNs, the lower level only ONs
an ON belongs to one SN: it can change at will, but has only one SN at a time
an SN acts as a “hub” for all its ON children, keeping track of the files in those peers
KaZaA: supernodes and ordinary nodes
Supernodes:
exchange information among themselves
do not form a complete mesh
Ordinary nodes:
obtain the address of an SN, send a request and give it the list of files to share
the SN starts keeping track of this ON
the ON is not visible to other SNs
KaZaA: Ordinary vs. Super Nodes
An ON can be promoted to SN if it has sufficient resources (bandwidth, uptime); the user can typically refuse to become an SN
typical bandwidth requirement: 160-200 kbps
13% of ONs are responsible for 80% of uploads
SNs change connections to other SNs on a time scale of tens of minutes, which allows a larger network range to be explored
average lifetime of an SN is 2.5 hours, but with high variance
SNs don't cache info from disconnected ONs
an estimated 30,000 SNs exist at any given time; one SN has connections to 30-50 other SNs
Skype
Allows the user to make calls to other computers on the Internet
calls to/from the real phone network and real phone numbers forwarded to Skype (cost money)
very popular: ~300 million downloads, ~15 million concurrent users online
Similar architecture to that of KaZaA: supernodes and ordinary nodes
but Skype is perfectly legal (the affected industry is “only” the telcos, and they sell the DSL lines…)
Skype: How does it work?
inherently P2P: pairs of users communicate
proprietary, encrypted application-layer protocol (inferred via reverse engineering)
hierarchical overlay with SNs
an index maps usernames to IP addresses, distributed over the SNs
[Figure: Skype clients (SC) attach to supernodes (SN); a central Skype login server handles authentication]
Peers (supernodes) as relays
problem: when both Alice and Bob are behind NATs, the NAT prevents an outside peer from initiating a call to an inside peer
solution: using Alice's and Bob's SNs, a relay is chosen; each peer initiates a session with the relay, and the peers can then communicate through the NATs via the relay
BitTorrent: P2P Content Distribution
BitTorrent builds a network (swarm) for every file that is being distributed
Big advantage: one can send a “link” (.torrent) to a friend
the “link” always refers to the same file
not feasible on search-based Napster, Gnutella, or KaZaA (hard to identify a particular file)
Downside: no searching is possible; websites with “link collections” and search capabilities exist, but there is no name service
BitTorrent: How does it work?
For each shared file, there is (initially) one server (seed) which hosts the original copy; the file is broken into chunks
“torrent” file: metadata about the content, typically hosted on a web server
A client downloads the torrent file; the metadata indicates the sizes and checksums of the chunks
and identifies a tracker
BitTorrent: to start with…
[Figure: seed, tracker, a web server hosting the .torrent file (Tracker: 137.89.211.1; Chunks: 42; Chunk 1: …; Chunk 2: …), and a new client]
1. the seed starts the tracker
2. the seed creates the torrent file and hosts it somewhere
3. a new client obtains the torrent file
4. the new client contacts the tracker and obtains the list of peers
5. the new client downloads/exchanges chunks with the peers
BitTorrent: file distribution
tracker: a server that keeps track of which seeds and peers are in the swarm; it does not participate in the actual file distribution
[Figure: a joining peer obtains a list of peers from the tracker, then starts trading chunks]
torrent: group of peers exchanging chunks of a file
swarm: seeds + peers
the file is divided into 256KB chunks
a peer joining the torrent:
has no chunks, but will accumulate them over time
registers with the tracker to get a list of peers, and connects to a subset of them (“neighbors”)
while downloading, a peer uploads chunks to others
peers may come and go
once a peer has the entire file, it may (selfishly) leave or (altruistically) remain
BitTorrent: a bit more detail
Pulling chunks:
at any given time, peers have different subsets of the file's chunks
periodically, a peer (Alice) asks each neighbor for the list of chunks they have
Alice requests her missing chunks, rarest first
Pushing chunks: tit-for-tat
Alice sends chunks to the four neighbors currently sending her chunks at the highest rate, re-evaluating the top 4 every 10 seconds
every 30 seconds she randomly selects another peer and starts sending it chunks; the newly chosen peer may join the top 4 (“optimistic unchoke”)
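The rarest-first policy above can be sketched in a few lines; `rarest_first` and the toy chunk sets below are illustrative, not part of any real BitTorrent client:

```python
from collections import Counter

def rarest_first(my_chunks, neighbor_chunk_sets):
    """Pick the missing chunk held by the fewest neighbors (rarest first)."""
    counts = Counter(c for s in neighbor_chunk_sets for c in s)
    candidates = [c for c in counts if c not in my_chunks]
    if not candidates:
        return None
    return min(candidates, key=lambda c: counts[c])

# Alice has chunk 0; chunk 2 is held by only one neighbor, so she asks for it first.
mine = {0}
neighbors = [{0, 1}, {1, 2}, {0, 1}]
print(rarest_first(mine, neighbors))   # -> 2
```

Requesting rare chunks first keeps every chunk replicated in the swarm, so no chunk disappears when its few holders leave.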
BitTorrent: more details
BitTorrent: tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob's top-four providers; Bob reciprocates
(3) Bob becomes one of Alice's top-four providers
With a higher upload rate, a peer can find better trading partners and get the file faster!
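The unchoking policy described above can be sketched as follows; the peer names, rates and function name are hypothetical, and real clients track rates over sliding windows rather than as fixed numbers:

```python
import random

def choose_unchoked(download_rates, optimistic=False):
    """Tit-for-tat sketch: unchoke the 4 neighbors currently sending us chunks
    fastest; with optimistic=True (every 30s), also unchoke one random peer."""
    top4 = sorted(download_rates, key=download_rates.get, reverse=True)[:4]
    unchoked = set(top4)
    if optimistic:
        others = [p for p in download_rates if p not in unchoked]
        if others:
            unchoked.add(random.choice(others))
    return unchoked

# hypothetical neighbors and the rates (KB/s) at which they send us chunks
rates = {"bob": 90, "carol": 70, "dave": 50, "erin": 30, "frank": 10}
print(choose_unchoked(rates))                    # re-evaluated every 10s
print(choose_unchoked(rates, optimistic=True))   # every 30s: plus one random peer
```

The optimistic unchoke is what lets a newcomer with no chunks (and hence no upload to trade) bootstrap into someone's top four.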
BitTorrent: Open Issues
Everyone must contribute: what about clients behind a firewall?
do low-bandwidth clients have a disadvantage?
BT's impact on the network: fast download != nearby in the network
Optimal chunk-selection algorithm: rarest-first seems to work well in practice
is it optimal? fastest for a single peer or overall?
Is tit-for-tat really necessary? are there situations where free-riding should be allowed or even encouraged?
Related issues
Dealing with today's users: usenet/email worked when users behaved well; now, spam is everywhere!
need accountability: identify individuals, even if “pseudonymously”
Preserve privacy (a somewhat conflicting goal)
Prevent “freeriding”:
reputation-tracking mechanisms help
voting mechanisms and payment schemes
effort went into accountability in P2P systems, e.g., the tit-for-tat scheme in BitTorrent
Outline
P2P vs. traditional paradigm: properties, advantages and challenges
Practical P2P systems: Napster, Gnutella, KaZaA, Skype, BitTorrent
Key technologies for P2P lookup services: Distributed Hash Table (DHT)
two example architectures: Chord and CAN
Searching and Addressing
Two ways to find objects, which determine:
how the network is constructed
how objects are placed
how efficiently objects can be found
Examples (search or addressing?): Google
DNS, IP routing
Napster, Gnutella, KaZaA, BitTorrent
Searching vs. Addressing
Searching: no need to know unique names (more user friendly)
hard to make efficient (can be solved with $$, see Google)
need to compare actual objects to know if they are the same
Addressing: object location can be made efficient
each object is uniquely identifiable
need to know unique names
need to maintain the structure required for addressing
Two types of P2P
Unstructured networks/systems cause the need for searching
this does not mean a complete lack of structure: there is graph structure, e.g., power-law, hierarchy, …
but peers are free to join anywhere and choose neighbors freely, and objects are stored anywhere
Structured networks/systems allow for addressing and deterministic routing
the network structure determines where peers belong in the net and where objects are stored
how can we build such structured networks?
Key Value Store
Database contains entries in the form of (key, value) pairs
key: SS number; value: human name
key: content type; value: IP address
Operations/interface:
Put(key, value)
Get(key) → value
Looks like a table: finding an object takes $O(N)$
how to locate an object efficiently?
key   | value
John  | 8732-7436
Adam  | 2349-5763
Mary  | 8734-7263
Linda | 3682-8923
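The two-operation interface above can be sketched over a naive table; the class and entry names are illustrative, and the $O(N)$ linear scan in `get` is exactly what the rest of this section sets out to avoid:

```python
class KeyValueStore:
    """Sketch of the Put/Get interface over a plain list of (key, value) pairs."""
    def __init__(self):
        self._entries = []            # unindexed table of (key, value) pairs

    def put(self, key, value):
        self._entries.append((key, value))

    def get(self, key):
        # naive table: O(N) linear scan to locate an object
        for k, v in self._entries:
            if k == key:
                return v
        return None

kv = KeyValueStore()
kv.put("John", "8732-7436")
kv.put("Mary", "8734-7263")
print(kv.get("Mary"))   # -> 8734-7263
```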
Recall: Hash Tables
Data structure: fixed-sized array of hash buckets
allows insertions, deletions and lookups in $O(1)$
Hash function maps keys to hash buckets, with desirable properties:
fast to compute
even distribution of keys
[Figure: 8 hash buckets indexed 0-7 with $hash(x) = x \bmod 8$; keys 16, 26, 45, 84, 31, 42 mapped to their buckets]
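The bucket assignment from the figure can be reproduced directly with the slide's hash function $hash(x) = x \bmod 8$:

```python
NUM_BUCKETS = 8

def bucket_of(key):
    # the slide's hash function: hash(x) = x mod 8
    return key % NUM_BUCKETS

# distribute the slide's example keys into their buckets
buckets = {i: [] for i in range(NUM_BUCKETS)}
for key in (16, 26, 45, 84, 31, 42):
    buckets[bucket_of(key)].append(key)

print(buckets[0])   # 16 mod 8 = 0 -> [16]
print(buckets[2])   # 26 and 42 both map to bucket 2 -> [26, 42]
```

The same picture, with buckets handed out to peers instead of array slots, is the starting point for the DHT on the next slide.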
Distributed Hash Table (DHT)
Idea: distribute the hash buckets to peers
Core questions: how to design and implement an efficient mechanism to find which peer is responsible for which hash bucket, and how to route between them?
[Figure: the hash buckets 0-7 and keys from the previous slide, now distributed across peers]
DHT: Principles
Each node is responsible for one or more buckets; as nodes join and leave, the responsibilities change
Nodes communicate among themselves to find the responsible node; scalable communication makes the DHT efficient
DHTs support all the hash table operations
[Figure: the buckets and keys assigned to nodes in the overlay]
DHT: Examples
We’ll study: Chord (2001) and CAN (2001)
Other examples:
Pastry/Tapestry (2001): based on Plaxton routing
Kademlia (2002): based on the XOR metric
All provide the same abstraction:
store key-value pairs; when given a key, can retrieve/store the value
no semantics associated with key or value
Major differences: design of the namespace and routing in the overlay
References
I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup service for internet applications,” in Proc. SIGCOMM, San Diego, CA, Aug. 2001, pp. 149–160.
S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Schenker, “A scalable content-addressable network,” in Proc. SIGCOMM, San Diego, CA, Aug. 2001, pp. 161–172.
Chord: Basics
From MIT; used in P2P storage systems
Uses the SHA-1 hash function in practice, resulting in a 160-bit object/node identifier
the same hash function is used for both objects and nodes:
node ID hashed from the IP address
object ID hashed from the object name
Organized in a ring which wraps around; nodes keep track of their predecessor and successor
Chord
example: namespace $[0, 2^3 - 1]$
an overlay network
who are the successor and predecessor of node 3?
Chord: how to assign indices?
In general, assign an identifier to each node/object in the range $[0, 2^m - 1]$
each identifier can be represented by $m$ bits
Central issue: assign (key, value) pairs to nodes/peers
Rule: assign indices to the node that has the closest ID
convention: closest is the immediate successor
who is taking care of indices 1, 2 and 6? (ring with nodes 0, 1 and 3)
successor(1) = 1
successor(2) = 3
successor(6) = 0
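The successor rule can be sketched for the example ring above (m = 3, nodes 0, 1 and 3); the function name is illustrative:

```python
M = 3                       # m-bit identifier space: [0, 2^m - 1]
NODES = sorted([0, 1, 3])   # the example ring

def successor(key):
    """Immediate successor: first node clockwise whose ID >= key (with wrap)."""
    for n in NODES:
        if n >= key:
            return n
    return NODES[0]         # wrap around the ring

print(successor(1))   # -> 1
print(successor(2))   # -> 3
print(successor(6))   # -> 0  (wraps past 2^m - 1)
```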
Chord: find a particular node
If we look for index 7, starting at node 2, how many steps? successor(7) = 0
2 ⇒ 3 ⇒ 4 ⇒ 6 ⇒ 0
In general, it takes $O(N)$ steps, where $N$ is the number of nodes
too slow for large $N$
Chord: adding shortcuts
Notation | Definition
$finger[k].start$ | $(n + 2^{k-1}) \bmod 2^m$, for $1 \le k \le m$
$finger[k].interval$ | $[\,finger[k].start,\ finger[k+1].start\,)$
$finger[k].node$ | first node $\ge finger[k].start$
successor | the next node on the identifier circle, i.e., $finger[1].node$
predecessor | the previous node on the identifier circle
Each node $n$ maintains a finger table that includes at most $m$ shortcuts
the $i$th finger/shortcut is at least $2^{i-1}$ away
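The $finger[k].start$ column of the tables on the next slides follows directly from the definition above; a minimal sketch:

```python
def finger_starts(n, m):
    """finger[k].start = (n + 2^(k-1)) mod 2^m, for k = 1..m."""
    return [(n + 2 ** (k - 1)) % 2 ** m for k in range(1, m + 1)]

print(finger_starts(3, 3))   # node 3, m = 3 -> [4, 5, 7]
print(finger_starts(6, 3))   # node 6, m = 3 -> [7, 0, 2]
```

These match the start columns of the finger tables for nodes 3 and 6 below.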
Finger table of node 𝑛
Fingers for nodes 3 and 6: start and interval
node 3: start 4, int. [4, 5); start 5, int. [5, 7); start 7, int. [7, 3)
node 6: start 7, int. [7, 0); start 0, int. [0, 2); start 2, int. [2, 6)
Fingers for nodes 3 and 6: successor node
node 3: start 4, int. [4, 5), succ. 4; start 5, int. [5, 7), succ. 6; start 7, int. [7, 3), succ. 0
node 6: start 7, int. [7, 0), succ. 0; start 0, int. [0, 2), succ. 0; start 2, int. [2, 6), succ. 2
Node join: node 1 alone in the ring; its finger table:
start 2, int. [2, 3), succ. 1; start 3, int. [3, 5), succ. 1; start 5, int. [5, 1), succ. 1
Node join: node 2 joins; the tables become:
node 1: start 2, int. [2, 3), succ. 2; start 3, int. [3, 5), succ. 1; start 5, int. [5, 1), succ. 1
node 2: start 3, int. [3, 4), succ. 1; start 4, int. [4, 6), succ. 1; start 6, int. [6, 2), succ. 1
Node join: after nodes 6 and 0 also join:
node 1: start 2, int. [2, 3), succ. 2; start 3, int. [3, 5), succ. 6; start 5, int. [5, 1), succ. 6
node 2: start 3, int. [3, 4), succ. 6; start 4, int. [4, 6), succ. 6; start 6, int. [6, 2), succ. 6
node 6: start 7, int. [7, 0), succ. 0; start 0, int. [0, 2), succ. 0; start 2, int. [2, 6), succ. 2
node 0: start 1, int. [1, 2), succ. 1; start 2, int. [2, 4), succ. 2; start 4, int. [4, 0), succ. 6
Routing (ring with nodes 0, 1, 2, 6):
node 1: start 2, int. [2, 3), succ. 2; start 3, int. [3, 5), succ. 6; start 5, int. [5, 1), succ. 6
node 2: start 3, int. [3, 4), succ. 6; start 4, int. [4, 6), succ. 6; start 6, int. [6, 2), succ. 6
node 6: start 7, int. [7, 0), succ. 0; start 0, int. [0, 2), succ. 0; start 2, int. [2, 6), succ. 2
node 0: start 1, int. [1, 2), succ. 1; start 2, int. [2, 4), succ. 2; start 4, int. [4, 0), succ. 6
query at node 1: hash(key) = 7
where is it located?
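The query can be traced with a sketch of Chord's lookup rule: if the key falls between the current node and its successor, the successor is responsible; otherwise forward to the closest preceding finger. The finger-table entries below are the succ. columns of the routing tables above; everything else is illustrative:

```python
M = 3
# finger[k].node for each node in the ring with nodes 0, 1, 2, 6
FINGERS = {
    0: [1, 2, 6],
    1: [2, 6, 6],
    2: [6, 6, 6],
    6: [0, 0, 2],
}

def in_half_open(x, a, b):
    """x in (a, b] on the identifier circle (handles wrap-around)."""
    return a < x <= b if a < b else (x > a or x <= b)

def in_open(x, a, b):
    """x in (a, b) on the identifier circle (handles wrap-around)."""
    return a < x < b if a < b else (x > a or x < b)

def lookup(start, key):
    """Route a query for `key` via finger tables; returns the responsible node."""
    n = start
    while True:
        succ = FINGERS[n][0]              # finger[1].node is the successor
        if in_half_open(key, n, succ):
            return succ
        nxt = n
        for f in reversed(FINGERS[n]):    # closest preceding finger of `key`
            if in_open(f, n, key):
                nxt = f
                break
        n = nxt

print(lookup(1, 7))   # node 1 queries hash(key) = 7 -> node 0
```

Here node 1 forwards the query to its farthest useful finger (node 6), and node 6 sees that 7 lies in (6, 0], so its successor, node 0, is responsible.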
Node Leave
peer 1 abruptly leaves; peer 0 detects this, makes 2 its immediate successor,
asks 2 who its immediate successor is, and makes 2's immediate successor its second successor
To handle node departure, each node is required to know the IP addresses of its two successors
Each node periodically pings its two successors to see if they are still alive
Chord: Performance
Finding an object takes $O(\log N)$ steps
For $N$ nodes and $K$ objects:
each node is responsible for $O(K/N)$ objects
when an $(N+1)$th node joins or leaves, responsibility for $O(K/N)$ indices changes hands
Any node joining or leaving an $N$-node network uses $O(\log^2 N)$ messages to re-establish the routing and finger tables
initialize finger table and predecessor (for a join)
From a ring to …
Two-dimensional torus
CAN: Basics
Scalable content-addressable network (CAN)
From Berkeley; published in 2001 in the same conference as Chord
The namespace is a $d$-dimensional torus
Keeps track of neighbors only:
no need to store shortcuts
routing in a $d$-dimensional Euclidean space
CAN
a new node A joins via an existing node I
A randomly chooses a coordinate (x, y)
[Figure: node A contacts node I with target point (x, y)]
CAN
the join request is routed from node I toward (x, y)
it discovers that node J owns (x, y)
[Figure: the route from I to J, the owner of (x, y)]
CAN
node J's zone is split in half; node A now owns one half
[Figure: J and A each owning half of the former zone]
Splitting/merging namespace
Splitting a zone when a new node joins: in a sequential order of the coordinates, split along the $X$ dimension first, and then $Y$
for the 2-dimensional space, each zone is a square or a 1:2 rectangle
When an existing node departs: merge the zone back into a neighbor's, if possible
otherwise, a neighbor node temporarily handles multiple zones
CAN
routing is easy: the routing table contains only the 4 neighbors
[Figure: greedy routing from node A toward node J through neighboring zones]
CAN
node B performs insert(K, V):
1. compute $a = h_x(K)$, $b = h_y(K)$
2. route (K, V) to coordinate (a, b)
3. the node that owns the zone containing (a, b) stores (K, V)
[Figure: the point (x = a, y = b) in the coordinate space]
CAN
node C performs retrieve(K):
1. compute $a = h_x(K)$, $b = h_y(K)$
2. route the retrieve(K) request to the node that owns (a, b)
[Figure: C's request routed to the zone containing (a, b)]
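A toy sketch of the path an insert or retrieve takes: hash the key once per dimension to get a point, then route greedily toward it on the torus. The hash functions `h_x`/`h_y`, the torus size, and the neighbor coordinates are all illustrative assumptions, not part of the CAN protocol itself:

```python
SIZE = 8   # toy coordinate space: an 8x8 torus

def h_x(key):
    return hash(key) % SIZE            # assumed per-dimension hash

def h_y(key):
    return (hash(key) // SIZE) % SIZE  # a second, independent coordinate

def torus_dist(p, q):
    """Manhattan distance on the torus (each axis wraps around)."""
    return sum(min(abs(a - b), SIZE - abs(a - b)) for a, b in zip(p, q))

def next_hop(current, neighbors, target):
    """Greedy CAN routing: forward to the neighbor closest to the target."""
    return min(neighbors, key=lambda n: torus_dist(n, target))

target = (5, 1)                                 # (a, b) = (h_x(K), h_y(K))
neighbors = [(1, 3), (3, 1), (2, 0), (2, 2)]    # the 4 neighbors of node (2, 1)
print(next_hop((2, 1), neighbors, target))      # -> (3, 1), the hop closer in x
```

Each node only ever consults its own neighbor list, which is why the routing state is $O(d)$ regardless of the number of nodes.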
CAN: Extension and Performance
Increasing the dimension to $d > 2$:
increases routing table size and the number of hash functions
but yields shorter paths
State information is $O(d)$: maintain information about $2d$ neighbors
Routing takes $O(d\, n^{1/d})$ hops with $n$ nodes
the average path length is $(d/4)\, n^{1/d}$
From 2D to 3D
CAN: Extension and Performance
Multiple realities: multiple independent coordinate spaces
each node gets a different zone in each space
contents are replicated in every reality
gives routing fault tolerance and also shorter paths
Routing weighted by round-trip times: take the network topology into consideration
forward to the “best” neighbor
Dimensions vs. realities:
increasing the dimension reduces the hop count more
but a larger number of realities has other benefits
More References
A. Rowstron and P. Druschel, "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems” IFIP/ACM International Conference on Distributed Systems Platforms (Middleware ’01), 329–350.
B. Zhao, L. Huang, J. Stribling, S. Rhea, A. Joseph, J. Kubiatowicz, ”Tapestry: A Resilient Global-scale Overlay for Service Deployment,” IEEE Journal on Selected Areas in Communications, 22(1): 2004.