Richard T. B. Ma

School of Computing

National University of Singapore

Peer-to-Peer Networks

CS 4226: Internet Architecture

Outline

P2P vs. traditional paradigm Properties, Advantages and Challenges

Practical P2P systems Napster, Gnutella, KaZaa, Skype, BitTorrent

Key technologies for P2P lookup services Distributed Hash Table (DHT)

two example architectures: Chord and CAN

The client/server model and extension

Client/server model: Asymmetric traditional communication model

roles: ad-hoc clients vs. dedicated servers

Extended model: Delegation a new role for server (client remains the same)

can be recursive or iterative

[Figure: client, server, and secondary server, with arrows labeled request, delegation, and response.]

[Figure: DNS hierarchy. Root DNS servers at the top; com, org, and edu DNS servers below them; then the yahoo.com, amazon.com, pbs.org, poly.edu, and umass.edu DNS servers.]

An example: Domain Name System (DNS)

client wants the IP address for www.amazon.com; first approximation:

client queries a root server to find the com DNS server

client queries com DNS server to get amazon.com DNS server

client queries amazon.com DNS server to get IP address for www.amazon.com
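The three queries above can be sketched as a loop that follows referrals until it receives an answer. This is a toy model, not real DNS: the server names (`root`, `com-tld`, `amazon-auth`), the dictionary layout, and the address `192.0.2.10` (taken from a documentation-only range) are all made up for illustration.

```python
# Toy zone data (hypothetical): each server either answers or refers onward.
SERVERS = {
    "root": {"refer": {"com": "com-tld"}, "answer": {}},
    "com-tld": {"refer": {"amazon.com": "amazon-auth"}, "answer": {}},
    "amazon-auth": {"refer": {}, "answer": {"www.amazon.com": "192.0.2.10"}},
}


def query(server, name):
    """Ask one server; it returns an answer or a referral, never both."""
    zone = SERVERS[server]
    if name in zone["answer"]:
        return ("answer", zone["answer"][name])
    for suffix, next_server in zone["refer"].items():
        if name == suffix or name.endswith("." + suffix):
            return ("referral", next_server)
    return ("unknown", None)


def resolve_iteratively(name, start="root"):
    """The client itself contacts each server in turn (iterative style)."""
    server = start
    visited = [server]
    while True:
        kind, value = query(server, name)
        if kind == "answer":
            return value, visited
        if kind != "referral":
            raise LookupError(name)
        server = value
        visited.append(server)


address, path = resolve_iteratively("www.amazon.com")
# path is ["root", "com-tld", "amazon-auth"]: one query per level of the hierarchy
```

In the recursive style, `query` would call the next server itself and return only the final answer, so the delegation would be hidden from the client.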

Domain name resolution: iterative vs. recursive

[Figures: two query paths, one iterative and one recursive, each with steps numbered 1 to 8. The requesting host cis.poly.edu resolves gaia.cs.umass.edu via the local DNS server dns.poly.edu, the root DNS server, the TLD DNS server, and the authoritative DNS server dns.cs.umass.edu.]

Server-based vs. Peer-to-peer

Properties and problems

Properties of (pure) P2P no always-on server or central entity

arbitrary end systems directly communicate

no a-priori knowledge/structure

flat architecture/namespace

Problems: peers are intermittently connected and change IP addresses

unreliable service providers

how to stay connected?

how to do resource lookup?

File Distribution: Server-Client vs P2P

Question : How much time to distribute file from one server to N peers?

[Figure: a server and N peers attached to a network with abundant bandwidth; the server holds a file of size $F$.]

$u_s$: server upload bandwidth

$u_i$: peer $i$ upload bandwidth

$d_i$: peer $i$ download bandwidth

File distribution time: server-client

[Figure: a server and N clients attached to a network with abundant bandwidth; file of size $F$.]

server sequentially sends $N$ copies: $NF/u_s$ time

client $i$ takes $F/d_i$ time to download

Time to distribute $F$ to $N$ clients using the client/server approach:

$$d_{cs} = \max\left\{ \frac{NF}{u_s},\ \frac{F}{\min_i d_i} \right\}$$

increases linearly in $N$ (for large $N$)

File distribution time: P2P

[Figure: a server and N peers attached to a network with abundant bandwidth; file of size $F$.]

server must send one copy: $F/u_s$ time

client $i$ takes $F/d_i$ time to download

$NF$ bits must be downloaded (aggregate)

fastest possible upload rate: $u_s + \sum_i u_i$

$$d_{P2P} = \max\left\{ \frac{F}{u_s},\ \frac{F}{\min_i d_i},\ \frac{NF}{u_s + \sum_i u_i} \right\}$$

Server-client vs. P2P: example

Client upload rate = $u$, $F/u$ = 1 hour, $u_s = 10u$, $d_{\min} \ge u_s$

[Figure: minimum distribution time (axis from 0 to 3.5) versus N (axis from 0 to 35), with one curve for P2P and one for client-server.]
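As a quick check of the example, the two bounds can be evaluated directly: client/server time is the larger of NF/u_s and F/min d_i, and P2P time is the largest of F/u_s, F/min d_i, and NF/(u_s + sum of u_i). The sketch below assumes every client has the same upload rate u and a download rate exactly equal to u_s (the slide only requires d_min ≥ u_s); the function names are illustrative.

```python
def client_server_time(file_size, server_up, peer_down):
    """Lower bound on distribution time when only the server uploads."""
    n = len(peer_down)
    return max(n * file_size / server_up, file_size / min(peer_down))


def p2p_time(file_size, server_up, peer_up, peer_down):
    """Lower bound on distribution time when peers also upload."""
    n = len(peer_down)
    return max(
        file_size / server_up,
        file_size / min(peer_down),
        n * file_size / (server_up + sum(peer_up)),
    )


def slide_example(n, u=1.0):
    """Parameters from the example: F/u = 1 hour, u_s = 10u, d_i = u_s (assumed)."""
    file_size = 1.0 * u          # so that F/u is 1 (hour)
    server_up = 10 * u
    peer_up = [u] * n
    peer_down = [server_up] * n  # assumption: download rate equals u_s
    return (
        client_server_time(file_size, server_up, peer_down),
        p2p_time(file_size, server_up, peer_up, peer_down),
    )


cs_hours, p2p_hours = slide_example(30)
# For N = 30: client/server needs 3.0 hours, P2P needs 0.75 hours
```

Under these assumptions the client/server time is N/10 hours and keeps growing, while the P2P time is N/(10 + N) hours and stays below one hour for any N.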

When and when not P2P?

When is P2P the right/wrong solution?

Claim: P2P vision is technically feasible

in other words, it is possible to build everything on the Internet without any dedicated servers

but just because it’s technically feasible, doesn’t necessarily make sense…

in other words, just because we can do it P2P, doesn’t mean that we should do it P2P

So, when is P2P the right solution?!?

Some Criteria

Budget how much money do we have?

Resource relevance how widely are resources interesting to users?

Trust how much trust there is between users?

Rate of system change how fast do things change in the system?

Criticality how critical is the service to the users?

P2P Applications and Systems

File sharing Napster (99-01), KaZaA (01-12), Gnutella

Content distribution BitTorrent

VoIP and messaging Skype

Video streaming PPLive, PPStream

Other applications: P2P computation, P2P storage, …

Napster: How does it work?

Based on a central index server

user registers with the central server

user sends the server a list of files to be shared

server knows all the peers and files in network

Searching based on keywords search results: a list of files with information

about the file and the peer sharing it

e.g., encoding rate, file size, peer’s bandwidth

some information entered by user, unreliable
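A central index of this kind can be sketched as a single table that maps each registered peer to the files it shares. This is an illustration only, not Napster's actual protocol: the class name, method names, and the substring keyword match are assumptions.

```python
class CentralIndex:
    """Hypothetical central index server: knows every peer and every file."""

    def __init__(self):
        self.files_by_peer = {}

    def register(self, peer, files):
        # A peer registers and hands over the list of files it shares.
        self.files_by_peer[peer] = set(files)

    def unregister(self, peer):
        self.files_by_peer.pop(peer, None)

    def search(self, keyword):
        # Return (file name, peer) pairs; the download itself is peer-to-peer.
        needle = keyword.lower()
        return sorted(
            (name, peer)
            for peer, files in self.files_by_peer.items()
            for name in files
            if needle in name.lower()
        )


index = CentralIndex()
index.register("alice", ["Blue Song.mp3"])
index.register("bob", ["blue song.mp3", "notes.txt"])
results = index.search("blue")
```

Because every query goes through this one object, the server sees a consistent view of the network, and it is also the single point of failure.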

Napster: How does it work?

Pretty much like the use of delegation

However, the client and server roles change, making the file transfer peer-to-peer

Napster: Pros and Cons

Weaknesses:

downloading from a single peer only

single point of failure of the server

large computation to handle queries

unreliable content

vulnerable to attacks

lawsuits

Strengths:

a consistent view of the network

fast and efficient searching

guarantee correct search answers

Gnutella: How does it work?

Has only peers, all of which are fully equal conceptually an overlay network

To join the network, peer needs the address of another active peer out-of-band channel, e.g., get it from a website

Once joined, peer learns about others and learns about the topology of the network

Queries are flooded into the network

Downloads directly between peers
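Query flooding can be sketched as a breadth-first spread over the overlay graph. The hop limit (`ttl`) and duplicate suppression used here are assumptions added to keep the flood finite; the peer names, graph, and file names are made up.

```python
from collections import deque


def flood(graph, files, origin, wanted, ttl):
    """Flood a query from `origin`; return (peers with a hit, messages sent)."""
    seen = {origin}
    hits = []
    messages = 0
    queue = deque([(origin, None, ttl)])
    while queue:
        peer, came_from, hops_left = queue.popleft()
        if peer != origin and wanted in files.get(peer, ()):
            hits.append(peer)
        if hops_left == 0:
            continue
        for neighbor in graph[peer]:
            if neighbor == came_from:
                continue                 # never send the query straight back
            messages += 1                # every forwarded copy costs a message
            if neighbor not in seen:     # duplicates are received but dropped
                seen.add(neighbor)
                queue.append((neighbor, peer, hops_left - 1))
    return hits, messages


# Hypothetical five-peer overlay; only E shares the wanted file.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
FILES = {"E": {"song.mp3"}}

hits, messages = flood(GRAPH, FILES, "A", "song.mp3", ttl=3)
# hits == ["E"], messages == 6
```

Even in this tiny overlay the query costs six messages to find one file, and a hop limit of 2 would miss the file entirely, which illustrates the trade-off behind the inefficiency of flooding.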

Gnutella: How does it work?

[Figure: peers exchanging Query and Hit messages, followed by an HTTP file transfer.]

Gnutella: Pros and Cons

Weaknesses:

inefficient query flooding

• wastes a lot of network and peer resources

• how to deal with it?

inefficient network management

• constant probing is needed

Strengths:

fully distributed

open protocol

• easy to write clients (in contrast, e.g., there is no KaZaA for Linux)

robust against node failures

• only true for random failures, as it forms a power-law network

less susceptible to denial of service attack

KaZaA: How does it work?

Two kinds of nodes Ordinary Nodes (ON): a normal user peer

Supernodes (SN): a user peer with more resources/responsibilities than ON

Forms a two-tier hierarchy top level has only SN, lower level only ON

ON belongs to one SN: can change at will, but only one SN at a time

SN acts as a “hub” for all its ON-children keeps track of files in those ON-children peers

KaZaA

Super nodes

exchange information between themselves

do not form a complete mesh

Ordinary nodes

obtain the address of an SN, send a request, and give a list of files to share

SN starts keeping track of this ON

not visible to other SN

KaZaA: Ordinary vs. Super Nodes

ON can be promoted to SN if it has sufficient resources (bandwidth, up time)

user can typically refuse to become an SN

typical bandwidth requirement: 160-200 kbps

13% of ON responsible for 80% of uploads

SN change connections to other SN on a time scale of tens of minutes

allows for larger network range to be explored

avg. lifetime of SN 2.5 hours, but high variance

SN don’t cache info from disconnected ON

estimated 30,000 SN at any given time

one SN has connections to 30-50 other SN

Skype

Allows the user to make calls to other computers on Internet

real phone network and real phone number forwarded to Skype (costs money)

very popular, ~300 million downloads, ~15 million concurrent users online

Similar architecture to that of KaZaA supernodes and ordinary nodes

but: Skype is perfectly legal (the affected industry is “only” telcos, they sell DSL…)

Skype: How does it work?

inherently P2P: pairs of users communicate.

proprietary, encrypted application-layer protocol (inferred via reverse engineering)

hierarchical overlay with SNs

index maps usernames to IP addresses; distributed over SNs

[Figure: Skype clients (SC), supernodes (SN), and the Skype login server.]

Peers (supernodes) as relays

problem when both Alice and Bob are behind “NATs”: NAT prevents an outside peer from initiating a call to an inside peer

solution: using Alice’s and Bob’s SNs, a relay is chosen

each peer initiates a session with the relay

peers can now communicate through NATs via the relay

BitTorrent: P2P Content Distribution

BitTorrent builds a network (swarm) for every file that is being distributed

Big advantage: can send “link” (.torrent) to a friend

“link” always refers to the same file

not feasible on search-based Napster, Gnutella, or KaZaA (hard to identify particular files)

Downside: no searching possible

websites with “link collections” and search capabilities exist, but no name service

BitTorrent: How does it work?

For each shared file, there is (initially) one server (seed) which hosts the original copy

file is broken into chunks

“torrent” file: metadata about the content

torrent file hosted typically on a web server

Client downloads torrent file

metadata indicates the sizes and checksums of chunks

identifies a tracker

BitTorrent: To start with …

[Figure: seed, tracker, web server hosting the .torrent file, and a new client, with arrows numbered 1 to 5. Example .torrent contents: Tracker: 137.89.211.1, Chunks: 42, Chunk 1: …, Chunk 2: …]

1. seed starts tracker

2. seed creates the torrent file and hosts it somewhere

3. a new client obtains the torrent file

4. the new client contacts the tracker and obtains the “peers”

5. the new client downloads/exchanges chunks with peers

BitTorrent: file distribution

tracker: a server that keeps track of which seeds and peers are in the swarm; doesn’t participate in actual file distribution

torrent: group of peers exchanging chunks of a file

swarm: seeds + peers

[Figure: a peer obtains the list of peers from the tracker, then trades chunks with other peers.]

file divided into 256KB chunks.

peer joining torrent:

has no chunks, but will accumulate them over time

registers with tracker to get list of peers, connects to subset of peers (“neighbors”)

when downloading, peer uploads chunks to others

peers may come and go

once peer has entire file, it may (selfishly) leave or (altruistically) remain

BitTorrent: a few details

Pulling Chunks

at any given time, peers have different subsets of file chunks

periodically, a peer (Alice) asks each neighbor for the list of chunks that they have

Alice sends requests for her missing chunks, rarest first

Pushing Chunks: tit-for-tat

Alice sends chunks to the four neighbors currently sending her chunks at the highest rate

re-evaluate top 4 every 10 secs

every 30 secs: randomly select another peer and start sending it chunks (“optimistically un-choke”)

newly chosen peer may join top 4
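Both rules can be sketched in a few lines. This is a simplified illustration rather than the behavior of any real BitTorrent client: the function names, the tie-break by chunk index, and the peer names are assumptions, and the 10-second and 30-second timers are left to the caller.

```python
import random
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Order the chunks I am missing by how few neighbors hold them."""
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks - my_chunks)   # only chunks I still need
    # Rarest first; ties broken by chunk index (an arbitrary choice).
    return sorted(counts, key=lambda chunk: (counts[chunk], chunk))


def choose_unchoked(rates_to_me, rng, top_n=4):
    """Pick the top senders plus one random 'optimistic' peer, if any remain."""
    ranked = sorted(rates_to_me, key=lambda peer: rates_to_me[peer], reverse=True)
    top = ranked[:top_n]
    others = ranked[top_n:]
    optimistic = rng.choice(others) if others else None
    return top, optimistic


# Alice holds chunk 0; her neighbors hold the sets below.
order = rarest_first({0}, {"bob": {0, 1, 2}, "carol": {1, 3}, "dave": {1, 2}})
# order == [3, 2, 1]: chunk 3 is held by one neighbor, chunk 1 by all three

rates = {"b": 50, "c": 40, "d": 30, "e": 20, "f": 10, "g": 5}
top, optimistic = choose_unchoked(rates, random.Random(0))
# top == ["b", "c", "d", "e"]; optimistic is "f" or "g"
```

The optimistic slot is what lets a newcomer with nothing to offer receive its first chunks and, if it reciprocates quickly enough, displace one of the current top four.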

BitTorrent: more details

BitTorrent: Tit-for-tat

(1) Alice “optimistically unchokes” Bob

(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates

(3) Bob becomes one of Alice’s top-four providers

With higher upload rate, can find better trading partners & get file faster!

BitTorrent: Open Issues

Everyone must contribute

clients behind a firewall?

low-bandwidth clients have a disadvantage?

BT’s impact on the network

fast download != nearby in network

Optimal chunk selection algorithm

rarest-first seems to work well in practice

is it optimal? fastest for single peer or overall?

Is tit-for-tat really necessary?

are there situations where free-riding should be allowed or even be encouraged?

Related issues

Dealing with today’s users

usenet/email worked when users behaved well; now, spam is everywhere!

need accountability: identify individuals, even if “pseudonymously”

Preserve privacy (somewhat conflicting goal)

Prevent “freeriding”

reputation tracking mechanisms help

voting mechanisms and payment schemes

effort went into accountability in P2P systems

tit-for-tat scheme in BitTorrent

Outline

P2P vs. traditional paradigm Properties, Advantages and Challenges

Practical P2P systems Napster, Gnutella, KaZaa, Skype, BitTorrent

Key technologies for P2P lookup services Distributed Hash Table (DHT)

Two example architectures: Chord and CAN

Searching and Addressing

Two ways to find objects, which determine how network is constructed

how objects are placed

how efficiently objects can be found

Examples (search or addressing?) Google

DNS, IP routing

Napster, Gnutella, KaZaa, BitTorrent

Searching vs. Addressing

Searching:

no need to know unique names (more user friendly)

hard to make efficient (can solve with $$, see Google)

need to compare actual objects to know if they are the same

Addressing:

object location can be made efficient

each object uniquely identifiable

need to know unique names

need to maintain structure required for addressing

Two types of P2P

Unstructured networks/systems cause the need for searching

does not mean complete lack of structure

• has graph structure, e.g., power-law, hierarchy …

but peers are free to join anywhere, choose neighbors freely, objects are stored anywhere

Structured networks/systems allow for addressing, deterministic routing

network structure determines where peers belong in the net and where objects are stored

how can we build such structured networks?

Key Value Store

Database contains entries in the form of (key, value) pairs

key: ss number; value: human name

key: content type; value: IP address

Operations/interface

Put(key, value)

Get(key) → value

Looks like a table

finding an object takes $O(N)$

how to locate an object efficiently?

key      value
John     8732-7436
Adam     2349-5763
Mary     8734-7263
Linda    3682-8923

Recall: Hash Tables

Data structure

fixed-size array of hash buckets

allows insertions, deletions and lookups in $O(1)$

Hash function maps keys to hash buckets, with desirable properties

fast to compute

even distribution of keys

Example: $hash(x) = x \bmod 8$

index of hash bucket    keys that map to the bucket
0                       16
1
2                       26, 42
3
4                       84
5                       45
6
7                       31
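The example with hash(x) = x mod 8 can be reproduced with a small chained hash table. This is a minimal sketch storing keys only; the class and method names are illustrative.

```python
NUM_BUCKETS = 8


def bucket_of(key):
    """Hash function from the example: hash(x) = x mod 8."""
    return key % NUM_BUCKETS


class HashTable:
    """Fixed-size array of buckets; each bucket is a short list of keys."""

    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def put(self, key):
        bucket = self.buckets[bucket_of(key)]
        if key not in bucket:
            bucket.append(key)

    def contains(self, key):
        # Only one bucket is inspected, whatever the total number of keys.
        return key in self.buckets[bucket_of(key)]

    def delete(self, key):
        bucket = self.buckets[bucket_of(key)]
        if key in bucket:
            bucket.remove(key)


table = HashTable()
for key in (16, 26, 45, 84, 31, 42):
    table.put(key)
# Bucket 2 now holds both 26 and 42, since each leaves remainder 2 when divided by 8.
```

Lookups stay close to constant time only while buckets remain short, which is why an even distribution of keys matters.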

Distributed Hash Table (DHT)

Idea: distribute hash buckets to peers

Core question: how to design and implement an efficient mechanism to

find which peer is responsible for which hash bucket?

route between them?

[Figure: the eight hash buckets (0 to 7) and their keys distributed across peers.]

DHT: Principles

Each node is responsible for one or more buckets

as nodes join and leave, the responsibilities change

Nodes communicate among themselves to find the responsible node

scalable communications make DHT efficient

DHTs support all the hash table operations

[Figure: the eight hash buckets (0 to 7) and their keys distributed across peers.]

DHT: Examples

We’ll study: Chord (2001) and CAN (2001)

Other examples Pastry/Tapestry (2001): based on Plaxton routing

Kademlia (2002): based on XOR-metric

All provide the same abstraction store key-value pairs

when given a key, can retrieve/store the value

no semantics associated with key or value

Major differences design of namespace and routing in the overlay

References

I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup service for internet applications,” in Proc. SIGCOMM, San Diego, CA, Aug. 2001, pp. 149–160.

S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Schenker, “A scalable content-addressable network,” in Proc. SIGCOMM, San Diego, CA, Aug. 2001, pp. 161–172.

Chord: Basics

From MIT, used in P2P storage systems

Uses SHA-1 hash function

in practice results in a 160-bit object/node identification

same hash function for both objects and nodes

• node ID hashed from IP address

• object ID hashed from object name

Organized in a ring which wraps around

nodes keep track of predecessor and successor

Chord

example: namespace $[0, 2^3 - 1]$

an overlay network

who are the successor and predecessor of node 3?

Chord: how to assign indices?

In general, assign identifier to each node/object in the range $[0, 2^m - 1]$

each identifier can be represented by $m$ bits

Central issue: assign (key, value) pairs to nodes/peers

Rule: assign indices to the node that has the closest ID

convention: closest is the immediate successor

who is taking care of indices 1, 2 and 6?

successor(1) = 1

successor(2) = 3

successor(6) = 0

[Figure: identifier circle with indices 1, 2 and 6 marked.]
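The successor rule can be sketched as a search on a sorted ring that wraps around. The node set {0, 1, 3} is an assumption inferred from the three successors listed above; the function name and signature are illustrative.

```python
import bisect


def successor(ident, nodes, m):
    """First node whose ID is >= ident, wrapping around the 2^m ring."""
    ring = sorted(nodes)
    i = bisect.bisect_left(ring, ident % 2 ** m)
    return ring[i % len(ring)]      # i == len(ring) wraps to the smallest ID


NODES = [0, 1, 3]   # assumed node IDs on a 3-bit ring
owners = {key: successor(key, NODES, m=3) for key in (1, 2, 6)}
# owners == {1: 1, 2: 3, 6: 0}
```

Index 6 wraps past the largest node ID (3) and lands on node 0, which is the ring behavior described above.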

Chord: find a particular node

If we look for index 7, and we start at node 2, how many steps?

successor(7) = 0

2 ⇒ 3 ⇒ 4 ⇒ 6 ⇒ 0

In general, it takes $O(N)$ steps, where $N$ is the number of nodes

too slow for large $N$

Chord: adding shortcuts

Finger table of node $n$

Notation              Definition
finger[k].start       $(n + 2^{k-1}) \bmod 2^m$, $1 \le k \le m$
finger[k].interval    $[\,finger[k].start,\ finger[k+1].start\,)$
finger[k].node        first node $\ge n.finger[k].start$
successor             the next node on the identifier circle; i.e., finger[1].node
predecessor           the previous node on the identifier circle

Each node $n$ maintains a finger table that includes at most $m$ shortcuts

the $i$th finger/shortcut is at least $2^{i-1}$ far apart

Fingers for node 3 and 6

Node 3:
start    int.      succ.
4        [4, 5)    4
5        [5, 7)    6
7        [7, 3)    0

Node 6:
start    int.      succ.
7        [7, 0)    0
0        [0, 2)    0
2        [2, 6)    2
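The finger tables for nodes 3 and 6 can be recomputed from the definitions of finger[k].start, finger[k].interval, and finger[k].node. The sketch assumes a 3-bit ring with nodes {0, 2, 3, 4, 6}, which is the node set implied by the successor column; the helper names are illustrative.

```python
def successor(ident, nodes):
    """First node with ID >= ident, wrapping to the smallest ID."""
    ring = sorted(nodes)
    for node in ring:
        if node >= ident:
            return node
    return ring[0]


def finger_table(n, nodes, m):
    """Rows of (start, (interval_begin, interval_end), node) for k = 1..m."""
    size = 2 ** m
    rows = []
    for k in range(1, m + 1):
        start = (n + 2 ** (k - 1)) % size
        end = (n + 2 ** k) % size        # start of finger k+1; equals n when k == m
        rows.append((start, (start, end), successor(start, nodes)))
    return rows


NODES = [0, 2, 3, 4, 6]   # assumed node IDs, m = 3
for node in (3, 6):
    print(f"node {node}")
    for start, (lo, hi), succ in finger_table(node, NODES, m=3):
        print(f"  start={start}  int.=[{lo}, {hi})  succ.={succ}")
```

Each table has only m rows regardless of the number of nodes, which keeps per-node state logarithmic in the size of the identifier space.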

Node Join

Node 1 alone in the network:

node    start    int.      succ.
1       2        [2, 3)    1
1       3        [3, 5)    1
1       5        [5, 1)    1

Node Join

After node 2 joins:

node    start    int.      succ.
1       2        [2, 3)    2
1       3        [3, 5)    1
1       5        [5, 1)    1
2       3        [3, 4)    1
2       4        [4, 6)    1
2       6        [6, 2)    1

Node Join

With nodes 0, 1, 2 and 6 in the network:

node    start    int.      succ.
1       2        [2, 3)    2
1       3        [3, 5)    6
1       5        [5, 1)    6
2       3        [3, 4)    6
2       4        [4, 6)    6
2       6        [6, 2)    6
6       7        [7, 0)    0
6       0        [0, 2)    0
6       2        [2, 6)    2
0       1        [1, 2)    1
0       2        [2, 4)    2
0       4        [4, 0)    6

Routing

node    start    int.      succ.
1       2        [2, 3)    2
1       3        [3, 5)    6
1       5        [5, 1)    6
2       3        [3, 4)    6
2       4        [4, 6)    6
2       6        [6, 2)    6
6       7        [7, 0)    0
6       0        [0, 2)    0
6       2        [2, 6)    2
0       1        [1, 2)    1
0       2        [2, 4)    2
0       4        [4, 0)    6

query node 1: hash(key) = 7

where is it located?
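The query can be traced with a small simulation of finger-based routing on the ring with nodes {0, 1, 2, 6} and m = 3. This is a sketch in the spirit of Chord's lookup (forward to the farthest finger that still precedes the key), with finger tables computed from global knowledge for brevity; the helper names are illustrative.

```python
M = 3
SIZE = 2 ** M
NODES = sorted([0, 1, 2, 6])


def successor(ident):
    """First node with ID >= ident, wrapping to the smallest ID."""
    for node in NODES:
        if node >= ident:
            return node
    return NODES[0]


def fingers(n):
    """finger[k].node for k = 1..M; the first entry is n's immediate successor."""
    return [successor((n + 2 ** (k - 1)) % SIZE) for k in range(1, M + 1)]


def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    span = (b - a) % SIZE or SIZE
    dist = (x - a) % SIZE or SIZE
    return dist <= span


def in_open(x, a, b):
    """True if x lies in the ring interval (a, b)."""
    span = (b - a) % SIZE or SIZE
    return 0 < (x - a) % SIZE < span


def lookup(start, key):
    """Return (node responsible for key, list of nodes visited)."""
    n = start
    path = [n]
    while True:
        succ = fingers(n)[0]
        if in_half_open(key, n, succ):
            return succ, path
        nxt = n
        for f in reversed(fingers(n)):   # try the farthest finger first
            if in_open(f, n, key):
                nxt = f
                break
        if nxt == n:                      # no closer finger: hand over to successor
            return succ, path
        n = nxt
        path.append(n)


owner, path = lookup(1, 7)
# owner == 0 and path == [1, 6]: node 1 jumps to node 6, whose successor 0 owns key 7
```

Node 1 reaches node 6 in one hop because its third finger covers the interval [5, 1), which contains key 7.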

Node Leave

peer 1 abruptly leaves

peer 0 detects this; makes 2 its immediate successor; asks 2 who its immediate successor is; makes 2’s immediate successor its second successor

To handle node departure, require each node to know the IP address of its two successors

Each node periodically pings its two successors to see if they are still alive.

Chord: Performance

Finding an object takes $O(\log N)$ steps

For $N$ nodes and $K$ objects

each node is responsible for $O(K/N)$ objects

when an $(N+1)$th node joins or leaves, responsibility for $O(K/N)$ indices changes hands

Any node joining or leaving an $N$-node network uses $O(\log N \cdot \log N)$ messages to re-establish the routing and finger tables

initialize finger table and predecessor (for join)

From a ring to …

Two-dimensional torus

CAN: Basics

Scalable content-addressable network (CAN)

From Berkeley, published in 2001 in the same conference as Chord

Namespace is a 𝑑-dimensional torus

Keep track of neighbors only no need to store shortcuts

routing in a 𝑑-dimensional Euclidean space

CAN

a new node A joins via an existing node I

randomly choose a coordinate (x,y)

route to node J from node I

discover that node J owns (x,y)

split node J’s zone in half

now node A owns one half

[Figures: 2-D coordinate space showing nodes A, I, and J and the point (x,y), before and after the split.]

Splitting/merging namespace

Splitting a zone when a new node joins

in a sequential order of the coordinates: split along the $X$ dimension first, and then $Y$

for the 2-dimensional space, each zone is a square or a 1:2 narrow rectangle

When an existing node departs

merge back to a neighbor, if it can be done

otherwise, a neighbor node might temporarily handle multiple zones

CAN

routing is easy: routing table contains 4 neighbors

[Figure: 2-D coordinate space with the zones of nodes J and A.]

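CAN

node B: insert(K,V)

1. $a = h_x(K)$, $b = h_y(K)$

2. route (K,V) to coordinate (a,b)

3. the node that owns (a,b) stores (K,V)

[Figure: node B routes (K,V) to the point $x = a$, $y = b$.]

CAN

node C: retrieve (K,V)

1. $a = h_x(K)$, $b = h_y(K)$

2. route “retrieve(K,V)” to the node that owns (a,b)

[Figure: node C routes the request to the point (a,b).]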
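Insert and retrieve can be sketched on a 2-D unit square split into a few zones. Everything concrete here is an assumption for illustration: the two hash functions are built from SHA-1 with different salts, the three zones and their owners are fixed by hand, and the hop-by-hop routing is replaced by a direct scan for the owning zone.

```python
import hashlib


def h(key, salt):
    """Map a key to a coordinate in [0, 1); salts 'x' and 'y' give h_x and h_y."""
    digest = hashlib.sha1((salt + ":" + key).encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2 ** 48


class Zone:
    """Rectangle [x0, x1) x [y0, y1) owned by one node, with its local store."""

    def __init__(self, owner, x0, x1, y0, y1):
        self.owner = owner
        self.x0, self.x1, self.y0, self.y1 = x0, x1, y0, y1
        self.store = {}

    def owns(self, a, b):
        return self.x0 <= a < self.x1 and self.y0 <= b < self.y1


# Hypothetical layout: J keeps the left half; A and B share the right half.
ZONES = [
    Zone("J", 0.0, 0.5, 0.0, 1.0),
    Zone("A", 0.5, 1.0, 0.0, 0.5),
    Zone("B", 0.5, 1.0, 0.5, 1.0),
]


def owner_of(a, b):
    # Stand-in for routing: find the zone that contains the point (a, b).
    return next(zone for zone in ZONES if zone.owns(a, b))


def insert(key, value):
    a, b = h(key, "x"), h(key, "y")
    zone = owner_of(a, b)
    zone.store[key] = value          # the owner of (a, b) stores (K, V)
    return zone.owner


def retrieve(key):
    a, b = h(key, "x"), h(key, "y")
    return owner_of(a, b).store.get(key)


stored_at = insert("song.mp3", "peer-17")
value = retrieve("song.mp3")         # any node recomputes (a, b) and asks the same owner
```

Because both operations recompute the same point (a,b) from the key alone, no node needs to remember where a key was placed.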

CAN: Extension and Performance

Increase the dimension: $d > 2$

increases routing table size and number of hash functions

but gives shorter paths

State information is $O(d)$

maintain information on $2d$ neighbors

Routing takes $O(d\,n^{1/d})$ hops with $n$ nodes

average path length is $(d/4)\,n^{1/d}$
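The trade-off between state and path length can be made concrete by evaluating the two expressions for a few dimensions. The network size of 4096 nodes is an arbitrary choice for illustration, and the formulas assume equally sized zones.

```python
def can_neighbors(d):
    """Number of neighbors each node tracks in a d-dimensional CAN."""
    return 2 * d


def can_avg_path_length(n, d):
    """Average routing path length (d/4) * n^(1/d) for n nodes."""
    return (d / 4) * n ** (1 / d)


for d in (2, 3, 4):
    hops = can_avg_path_length(4096, d)
    print(f"d={d}: {can_neighbors(d)} neighbors, about {hops:.1f} hops on average")
```

For 4096 nodes this gives roughly 32 hops at d = 2, 12 hops at d = 3, and 8 hops at d = 4, while the neighbor count only grows from 4 to 8.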

From 2D to 3D

CAN: Extension and Performance

Multiple realities

multiple independent coordinate spaces

each node gets a different zone in each space

contents are replicated on every reality

routing fault tolerance and also shorter paths

Routing weighted by round-trip times

take network topology into consideration

forward to the “best” neighbor

Dimensions vs. Realities

increasing the number of dimensions reduces the number of hops more

but more realities have other benefits

More References

A. Rowstron and P. Druschel, “Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems,” in Proc. IFIP/ACM International Conference on Distributed Systems Platforms (Middleware ’01), pp. 329–350.

B. Zhao, L. Huang, J. Stribling, S. Rhea, A. Joseph, and J. Kubiatowicz, “Tapestry: A resilient global-scale overlay for service deployment,” IEEE Journal on Selected Areas in Communications, 22(1), 2004.
