Why Distributed Systems?
Aggregate resources!
– memory
– disk
– CPU cycles

Proximity to physical stuff
– things with sensors
– things that print
– things that go boom
– other people

Fault tolerance!
– Don’t want one tsunami to take everything down
Peer To Peer
– Lots of reasonable machines
• No one machine loaded more than others
• No one machine irreplaceable!
Peer-to-Peer (P2P)
Where do the machines come from?
– “found” resources
• SETI@home
• BOINC
– existing resources
• computing “clusters” (32, 64, …)
What good is a peer-to-peer system?
– all those things mentioned before, including
• Storage: files, MP3s, leaked documents, porn …
The lookup problem
[Figure: nodes N1–N6 connected through the Internet. A publisher stores (key=“title”, value=MP3 data…) at one node; a client elsewhere issues Lookup(“title”). Which node holds the key?]
Centralized lookup (Napster)
[Figure: nodes N1–N9 around a central database DB. Publisher N4 registers its content with SetLoc(“title”, N4); the client sends Lookup(“title”) to the DB and is directed to N4, which holds (key=“title”, value=MP3 data…).]

Simple, but O(N) state and a single point of failure.
Flooded queries (Gnutella)
[Figure: nodes N1–N9. The client floods Lookup(“title”) to its neighbors, which forward it until it reaches publisher N4, holding (key=“title”, value=MP3 data…).]

Robust, but worst case O(N) messages per lookup.
Routed queries (Freenet, Chord, etc.)
[Figure: nodes N1–N9. The client’s Lookup(“title”) is routed hop by hop toward publisher N4, which holds (key=“title”, value=MP3 data…).]

Bad load balance.
Routing challenges
Define a useful key nearness metric.
Keep the hop count small.
– O(log N)

Keep the routing tables small.
– O(log N)
Stay robust despite rapid changes.
Distributed Hash Tables to the Rescue!
Load Balance: Distributed hash function spreads keys evenly over the nodes (Consistent hashing).
Decentralization: Fully distributed (Robustness).
Scalability: Lookup grows as a log of number of nodes.
Availability: Automatically adjusts internal tables to reflect changes.
Flexible Naming: No constraints on key structure.
What’s a Hash?
Wikipedia: any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum, usually a single integer
Example: Assume N is a large prime, and ‘a’ denotes the ASCII code for the letter ‘a’ (97).
H(“pete”) = H(“pet”) × N + ‘e’
          = (H(“pe”) × N + ‘t’) × N + ‘e’
          = 451845518507

H(“pete”)  mod 1000 = 507
H(“peter”) mod 1000 = 131
H(“petf”)  mod 1000 = 986
It’s a deterministic random number generator!
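The recurrence above can be sketched in a few lines. This is a minimal polynomial rolling hash; the slide does not say which prime it used, so the prime below is an assumption and the concrete hash values will differ from the slide’s.

```python
N = 1000003  # a large prime (arbitrary choice; the slide's prime is unspecified)

def H(s: str) -> int:
    """Rolling hash: H(s + c) = H(s) * N + ord(c), starting from H("") = 0."""
    h = 0
    for c in s:
        h = h * N + ord(c)
    return h

# Deterministic: the same input always hashes to the same value...
assert H("pete") == H("pete")
# ...but small input changes scramble the low-order digits:
print(H("pete") % 1000, H("peter") % 1000, H("petf") % 1000)
```

Taking the result mod 1000 (or mod the table size) is what turns this “deterministic random number” into a bucket index.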
Chord (a DHT)
m-bit identifier space for both keys and nodes.
Key identifier = SHA-1(key).
Node identifier = SHA-1(IP address).
Both are uniformly distributed.
How to map key IDs to node IDs?
Consistent hashing [Karger 97]
[Figure: circular 7-bit ID space containing nodes N32, N90, N105 and keys K5, K20, K80. By the successor rule, K5 and K20 are stored at N32, and K80 at N90.]
A key is stored at its successor: node with next higher ID
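The successor rule is just a sorted lookup with wrap-around. A minimal sketch on the figure’s 7-bit ring (the node IDs are taken from the figure; `ident` shows how Chord would derive them from SHA-1):

```python
import bisect
import hashlib

M = 7  # bits in the identifier space (IDs 0..127)

def ident(name: str) -> int:
    # SHA-1 truncated to m bits, as Chord does for keys and IP addresses
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (2 ** M)

def successor(node_ids, key_id):
    """The node responsible for key_id: the first node at or after it, clockwise."""
    nodes = sorted(node_ids)
    i = bisect.bisect_left(nodes, key_id)
    return nodes[i % len(nodes)]  # wrap around the ring

nodes = [32, 90, 105]            # the ring from the figure
print(successor(nodes, 5))       # -> 32
print(successor(nodes, 80))      # -> 90
print(successor(nodes, 110))     # -> 32 (wraps past the top of the ring)
```

Because a joining or leaving node only shifts keys to or from its immediate neighbor, consistent hashing moves O(K/N) keys per membership change.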
“Finger table” allows log(N)-time lookups
[Figure: node N80’s fingers reach ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring.]
Every node knows m other nodes in the ring
Finger i points to successor of n + 2^(i−1)
[Figure: the same finger structure for N80; the finger targeting ID 112 (= 80 + 32) points to N120, the successor of 112.]
Each node knows more about the portion of the circle close to it.
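Building a finger table is just m successor lookups. A sketch for N80 on a hypothetical ring containing N80 and N120 (the other node IDs here are made up for illustration):

```python
import bisect

M = 7  # 7-bit identifier space

def successor(node_ids, key_id):
    nodes = sorted(node_ids)
    i = bisect.bisect_left(nodes, key_id)
    return nodes[i % len(nodes)]

def finger_table(n, nodes):
    # Finger i (1-based) points to successor of n + 2^(i-1), mod 2^m
    return [successor(nodes, (n + 2 ** (i - 1)) % 2 ** M) for i in range(1, M + 1)]

nodes = [16, 32, 45, 80, 96, 120]   # hypothetical ring
print(finger_table(80, nodes))      # -> [96, 96, 96, 96, 96, 120, 16]
```

Note how the first several fingers collapse onto the nearby node N96, while the later ones jump halfway around the ring: that is exactly the “knows more about the portion close to it” property, and it is what makes lookups take O(log N) hops.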
Joining: linked list insert
[Figure: new node N36 joining between N25 and N40. Step 1: N36 issues Lookup(36) to find its successor, N40, which currently holds keys K30 and K38.]
1. Each node’s successor is correctly maintained.
2. For every key k, node successor(k) is responsible for k.
Join (2)
[Figure: Step 2: N36 sets its own successor pointer to N40, which still holds K30 and K38.]
Initialize the new node’s finger table.
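The two join steps really are a linked-list insert. A minimal sketch, assuming each node keeps only a successor pointer (finger tables and predecessor repair come later, via stabilization):

```python
class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self  # a lone node points to itself

def _between(a, x, b):
    """Is x in the half-open ring interval (a, b]?"""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps around 0

def lookup_successor(start, key_id):
    """Walk the ring until key_id falls between a node and its successor."""
    n = start
    while not _between(n.id, key_id, n.successor.id):
        n = n.successor
    return n.successor

def join(new, existing):
    # Step 1: Lookup(new.id) finds the successor. Step 2: set the pointer.
    new.successor = lookup_successor(existing, new.id)

# The figure's ring: N25 <-> N40, then N36 joins
n25, n40 = Node(25), Node(40)
n25.successor, n40.successor = n40, n25
n36 = Node(36)
join(n36, n25)
print(n36.successor.id)  # -> 40
```

N25’s own successor pointer still names N40 at this point; fixing that is exactly the job of the stabilization protocol below, so the invariants are only temporarily violated.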
Stabilization Protocol
To handle concurrent node joins/fails/leaves.
Keep successor pointers up to date, then verify and correct finger table entries.
Incorrect finger pointers may only increase latency, but incorrect successor pointers may cause lookup failure.
Nodes periodically run stabilization protocol.
Won’t correct a Chord system that has split into multiple disjoint cycles, or a single cycle that loops multiple times around the identifier space.
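The periodic repair loop can be sketched as a stabilize/notify pair. This is a simplified single-process sketch (real nodes invoke these as remote calls on a timer, and must also handle failed peers):

```python
class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

    def stabilize(self):
        # Ask our successor for its predecessor; if that node sits between
        # us and our successor on the ring, adopt it as our new successor.
        x = self.successor.predecessor
        if x is not None and _between(self.id, x.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, other):
        # `other` believes it is our predecessor; accept if it's closer.
        if self.predecessor is None or _between(self.predecessor.id, other.id, self.id):
            self.predecessor = other

def _between(a, x, b):
    """Is x in the open ring interval (a, b), wrapping around 0?"""
    if a < b:
        return a < x < b
    return x > a or x < b

# After N36 joins with successor N40, stabilization repairs the ring:
n25, n36, n40 = Node(25), Node(36), Node(40)
n25.successor, n40.successor = n40, n25
n40.predecessor = n25
n36.successor = n40   # the join from the previous slides
n36.stabilize()       # N40 learns that N36 is its predecessor
n25.stabilize()       # N25 sees N40.predecessor == N36 and adopts it
print(n25.successor.id)  # -> 36
```

Each round of stabilization tightens pointers toward the correct ring, which is why it converges after concurrent joins, but it cannot merge disjoint cycles, as noted above.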