49
winter 2008 P2P 1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks Napster Gnutella KaZaA BitTorrent Distributed Hash Tables (DHT) and Structured Networks Chord Pros and Cons Readings: do required and optional readings if interested

Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

  • View
    244

  • Download
    4

Embed Size (px)

Citation preview

Page 1: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 1

Peer-to-Peer Networks:Unstructured and Structured

• What is a peer-to-peer network?• Unstructured Peer-to-Peer Networks

– Napster– Gnutella– KaZaA– BitTorrent

• Distributed Hash Tables (DHT) and Structured Networks– Chord – Pros and Cons

Readings: do required and optional readings if interested

Page 2: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 2

Peer-to-Peer Networks:How Did it Start?

• A killer application: Naptser– Free music over the Internet

• Key idea: share the content, storage and bandwidth of individual (home) users

Internet

Page 3: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 3

Model

• Each user stores a subset of files• Each user has access (can download)

files from all users in the system

Page 4: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 4

Main Challenge

• Find where a particular file is stored

AB

C

D

E

F

E?

Page 5: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 5

Other Challenges

• Scale: up to hundred of thousands or millions of machines

• Dynamicity: machines can come and go any time

Page 6: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 6

Peer-to-Peer Networks: Napster• Napster history: the rise

– January 1999: Napster version 1.0– May 1999: company founded– September 1999: first lawsuits– 2000: 80 million users

• Napster history: the fall– Mid 2001: out of business due to lawsuits– Mid 2001: dozens of P2P alternatives that were harder to

touch, though these have gradually been constrained– 2003: growth of pay services like iTunes

• Napster history: the resurrection– 2003: Napster reconstituted as a pay service– 2006: still lots of file sharing going on

Shawn Fanning,Northeastern freshman

Page 7: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 7

Napster Technology: Directory Service

• User installing the software– Download the client program– Register name, password, local directory, etc.

• Client contacts Napster (via TCP)– Provides a list of music files it will share– … and Napster’s central server updates the directory

• Client searches on a title or performer– Napster identifies online clients with the file– … and provides IP addresses

• Client requests the file from the chosen supplier– Supplier transmits the file to the client– Both client and supplier report status to Napster

Page 8: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 8

Napster• Assume a centralized index system that

maps files (songs) to machines that are alive

• How to find a file (song)– Query the index system return a machine that stores

the required file• Ideally this is the closest/least-loaded machine

– ftp the file

• Advantages: – Simplicity, easy to implement sophisticated search

engines on top of the index system

• Disadvantages:– Robustness, scalability (?)

Page 9: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 9

Napster: Example

AB

C

D

E

F

m1m2

m3

m4

m5

m6

m1 Am2 Bm3 Cm4 Dm5 Em6 F

E?m5

E? E

Page 10: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 10

Napster Technology: Properties• Server’s directory continually updated

– Always know what music is currently available– Point of vulnerability for legal action

• Peer-to-peer file transfer– No load on the server– Plausible deniability for legal action (but not enough)

• Proprietary protocol– Login, search, upload, download, and status operations– No security: cleartext passwords and other vulnerability

• Bandwidth issues– Suppliers ranked by apparent bandwidth & response time

Page 11: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 11

Napster: Limitations of Central Directory

• Single point of failure• Performance bottleneck• Copyright infringement

• So, later P2P systems were more distributed

File transfer is decentralized, but locating content is highly centralized

Page 12: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 12

Peer-to-Peer Networks: Gnutella

• Gnutella history– 2000: J. Frankel &

T. Pepper released Gnutella

– Soon after: many other clients (e.g., Morpheus, Limewire, Bearshare)

– 2001: protocol enhancements, e.g., “ultrapeers”

• Query flooding– Join: contact a few

nodes to become neighbors

– Publish: no need!– Search: ask neighbors,

who ask their neighbors

– Fetch: get file directly from another node

Page 13: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 13

Gnutella• Distribute file location• Idea: flood the request• Hot to find a file:

– Send request to all neighbors– Neighbors recursively multicast the request– Eventually a machine that has the file receives the

request, and it sends back the answer

• Advantages:– Totally decentralized, highly robust

• Disadvantages:– Not scalable; the entire network can be swamped with

request (to alleviate this problem, each request has a TTL)

Page 14: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 14

Gnutella• Ad-hoc topology• Queries are flooded for bounded number of hops• No guarantees on recall

Query: “xyz”

xyz

xyz

Page 15: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 15

Gnutella: Query Flooding

• Fully distributed– No central server

• Public domain protocol

• Many Gnutella clients implementing protocol

Overlay network: graph

• Edge between peer X and Y if there’s a TCP connection

• All active peers and edges is overlay net

• Given peer will typically be connected with < 10 overlay neighbors

Page 16: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 16

Gnutella: Protocol• Query message sent

over existing TCPconnections

• Peers forwardQuery message

• QueryHit sent over reversepath

Query

QueryHit

Query

Query

QueryHit

Query

Query

Query

Hit

File transfer:HTTP

Scalability:limited scopeflooding

Page 17: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 17

Gnutella: Peer Joining• Joining peer X must find some other peer in

Gnutella network: use list of candidate peers• X sequentially attempts to make TCP with

peers on list until connection setup with Y• X sends Ping message to Y; Y forwards Ping

message. • All peers receiving Ping message respond with

Pong message• X receives many Pong messages. It can then

setup additional TCP connections

Page 18: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 18

Gnutella: Pros and Cons

• Advantages– Fully decentralized– Search cost distributed– Processing per node permits powerful search

semantics

• Disadvantages– Search scope may be quite large– Search time may be quite long– High overhead and nodes come and go often

Page 19: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 19

Peer-to-Peer Networks: KaAzA

• KaZaA history– 2001: created by

Dutch company (Kazaa BV)

– Single network called FastTrack used by other clients as well

– Eventually the protocol changed so other clients could no longer talk to it

• Smart query flooding– Join: on start, the client

contacts a super-node (and may later become one)

– Publish: client sends list of files to its super-node

– Search: send query to super-node, and the super-nodes flood queries among themselves

– Fetch: get file directly from peer(s); can fetch from multiple peers at once

Page 20: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 20

KaZaA: Exploiting Heterogeneity

• Each peer is either a group leader or assigned to a group leader– TCP connection between

peer and its group leader– TCP connections between

some pairs of group leaders

• Group leader tracks the content in all its children

ordinary peer

group-leader peer

neighoring re la tionshipsin overlay network

Page 21: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 21

KaZaA: Motivation for Super-Nodes

• Query consolidation– Many connected nodes may have only a few files– Propagating query to a sub-node may take more

time than for the super-node to answer itself

• Stability– Super-node selection favors nodes with high up-time– How long you’ve been on is a good predictor of how

long you’ll be around in the future

Page 22: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 22

Peer-to-Peer Networks: BitTorrent

• BitTorrent history and motivation– 2002: B. Cohen debuted BitTorrent– Key motivation: popular content

• Popularity exhibits temporal locality (Flash Crowds)• E.g., Slashdot effect, CNN Web site on 9/11,

release of a new movie or game

– Focused on efficient fetching, not searching• Distribute same file to many peers• Single publisher, many downloaders

– Preventing free-loading

Page 23: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 23

BitTorrent: Simultaneous Downloading

• Divide large file into many pieces– Replicate different pieces on different peers– A peer with a complete piece can trade with

other peers– Peer can (hopefully) assemble the entire file

• Allows simultaneous downloading– Retrieving different parts of the file from

different peers at the same time

Page 24: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 24

BitTorrent Components• Seed

– Peer with entire file– Fragmented in pieces

• Leacher– Peer with an incomplete copy of the file

• Torrent file– Passive component– Stores summaries of the pieces to allow peers to

verify their integrity

• Tracker– Allows peers to find each other– Returns a list of random peers

Page 25: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 25

BitTorrent: Overall Architecture

Web page with link to .torrent

A

B

C

Peer

[Leech]

Downloader

“US”

Peer

[Seed]

Peer

[Leech]

TrackerWeb Server

.torr

ent

Page 26: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 26

BitTorrent: Overall Architecture

Web page with link to .torrent

A

B

C

Peer

[Leech]

Downloader

“US”

Peer

[Seed]

Peer

[Leech]

Tracker

Get-announce

Web Server

Page 27: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 27

BitTorrent: Overall Architecture

Web page with link to .torrent

A

B

C

Peer

[Leech]

Downloader

“US”

Peer

[Seed]

Peer

[Leech]

Tracker

Response-peer list

Web Server

Page 28: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 28

BitTorrent: Overall Architecture

Web page with link to .torrent

A

B

C

Peer

[Leech]

Downloader

“US”

Peer

[Seed]

Peer

[Leech]

Tracker

Shake-hand

Web Server

Shake-hand

Page 29: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 29

BitTorrent: Overall Architecture

Web page with link to .torrent

A

B

C

Peer

[Leech]

Downloader

“US”

Peer

[Seed]

Peer

[Leech]

Tracker

pieces

pieces

Web Server

Page 30: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 30

BitTorrent: Overall Architecture

Web page with link to .torrent

A

B

C

Peer

[Leech]

Downloader

“US”

Peer

[Seed]

Peer

[Leech]

Tracker

piecespieces

pieces

Web Server

Page 31: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 31

BitTorrent: Overall Architecture

Web page with link to .torrent

A

B

C

Peer

[Leech]

Downloader

“US”

Peer

[Seed]

Peer

[Leech]

Tracker

Get-

announce

Response-peer list

piecespieces

pieces

Web Server

Page 32: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 32

Free-Riding Problem in P2P Networks

• Vast majority of users are free-riders– Most share no files and answer no queries– Others limit # of connections or upload speed

• A few “peers” essentially act as servers– A few individuals contributing to the public good– Making them hubs that basically act as a server

• BitTorrent prevent free riding– Allow the fastest peers to download from you– Occasionally let some free loaders download

Page 33: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 33

Distributed Hash Tables (DHTs)

• Abstraction: a distributed hash-table data structure – insert(id, item);– item = query(id); (or lookup(id);)– Note: item can be anything: a data object,

document, file, pointer to a file…

• Proposals– CAN, Chord, Kademlia, Pastry, Tapestry, etc

Page 34: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 34

DHT Design Goals

• Make sure that an item (file) identified is always found

• Scales to hundreds of thousands of nodes

• Handles rapid arrival and failure of nodes

Page 35: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 35

• Distributed Hash Tables (DHTs)• Hash table interface: put(key,item), get(key)• O(log n) hops • Guarantees on recall

Structured Networks

K I

K I

K I

K I

K I

K I

K I

K I

K I

put(K1,I1)

(K1,I1)

get (K1)

I1

Page 36: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 36

Chord• Associate to each node and item a unique

id in an uni-dimensional space 0..2m-1

• Key design decision– Decouple correctness from efficiency

• Properties – Routing table size O(log(N)) , where N is the total number

of nodes– Guarantees that a file is found in O(log(N)) steps

Page 37: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 37

Identifier to Node Mapping Example

• Node 8 maps [5,8]• Node 15 maps [9,15]• Node 20 maps [16,

20]• …• Node 4 maps [59, 4]

• Each node maintains a pointer to its successor

4

20

3235

8

15

44

58

Page 38: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 38

Lookup• Each node maintains

its successor

• Route packet (ID, data) to the node responsible for ID using successor pointers

4

20

3235

8

15

44

58

lookup(37)

node=44

Page 39: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 39

Joining Operation

• Each node A periodically sends a stabilize() message to its successor B

• Upon receiving a stabilize() message, node B – returns its predecessor B’=pred(B) to A by sending a

notify(B’) message

• Upon receiving notify(B’) from B, – if B’ is between A and B, A updates its successor to B’ – A doesn’t do anything, otherwise

Page 40: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 40

Joining Operation• Node with id=50

joins the ring• Node 50 needs to

know at least one node already in the system– Assume known

node is 15

4

20

3235

8

15

44

58

50

succ=4pred=44

succ=nilpred=nil

succ=58pred=35

Page 41: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 41

Joining Operation• Node 50:

send join(50) to node 15

• Node 44: returns node 58

• Node 50 updates its successor to 58

4

20

3235

8

15

44

58

50

join(50)

succ=58

succ=4pred=44

succ=nilpred=nil

succ=58pred=35

58

Page 42: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 42

Joining Operation• Node 50:

send stabilize() to node 58

• Node 58: – update

predecessor to 50

– send notify() back

4

20

32

35

8

15

44

58

50

succ=58pred=nil

succ=58pred=35

stabilize()

notif

y(pr

ed=5

0)

pred=50succ=4pred=44

Page 43: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 43

Joining Operation (cont’d)• Node 44 sends a stabilize

message to its successor, node 58

• Node 58 reply with a notify message

• Node 44 updates its successor to 50

4

20

3235

8

15

44

58

50

succ=58stabilize()no

tify(

pred

=50)

succ=50

pred=50succ=4

pred=nil

succ=58pred=35

Page 44: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 44

Joining Operation (cont’d)• Node 44 sends a stabilize

message to its new successor, node 50

• Node 50 sets its predecessor to node 44

4

20

3235

8

15

44

58

50

succ=58

succ=50

Stabilize()pred=44

pred=50

pred=35

succ=4

pred=nil

Page 45: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 45

Joining Operation (cont’d)

• This completes the joining operation!

4

20

3235

8

15

44

58

50succ=58

succ=50

pred=44

pred=50

Page 46: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 46

Achieving Efficiency: finger tables

80 + 2080 + 21

80 + 22

80 + 23

80 + 24

80 + 25

(80 + 26) mod 27 = 16

0Say m=7

ith entry at peer with id n is first peer with id >= )2(mod2 min

i ft[i]0 961 962 963 964 965 1126 20

Finger Table at 80

32

4580

20112

96

Page 47: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 47

Achieving Robustness

• To improve robustness each node maintains the k (> 1) immediate successors instead of only one successor

• In the notify() message, node A can send its k-1 successors to its predecessor B

• Upon receiving notify() message, B can update its successor list by concatenating the successor list received from A with A itself

Page 48: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 48

Chord Optimizations

• Reduce latency– Chose finger that reduces expected time to reach

destination– Chose the closest node from range [N+2i-1,N+2i) as

successor

• Accommodate heterogeneous systems– Multiple virtual nodes per physical node

Page 49: Winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella

winter 2008 P2P 49

DHT Conclusions• Distributed Hash Tables are a key component

of scalable and robust overlay networks• Chord: O(log n) state, O(log n) distance• Both can achieve stretch < 2• Simplicity is key• Services built on top of distributed hash

tables– persistent storage (OpenDHT, Oceanstore)– p2p file storage, i3 (chord)– multicast (CAN, Tapestry)