Peer to Peer Technologies
Roy Werber, Idan Gelbourt
Prof. Sagiv's Seminar
The Hebrew University of Jerusalem, 2001
Lecture Overview
1st Part: The P2P communication model, architecture and applications
2nd Part: Chord and CFS
Peer to Peer - Overview
A class of applications that takes advantage of resources - storage, CPU cycles, content, human presence - available at the edges of the Internet
A decentralized system that must cope with the unstable nature of computers located at the network edge
Client/Server Architecture
An architecture in which each process is a client or a server
Servers are powerful computers dedicated to providing services: storage, traffic, etc.
Clients rely on servers for resources
Client/Server Properties
Big, strong server
Well-known port/address of the server
Many-to-one relationship
Different software runs on the client and the server
The client can be dumb (lacks functionality); the server performs the work for it
The client usually initiates the connection
Client Server Architecture
(Diagram: several clients connected to a single server through the Internet)
Client/Server Architecture
GET /index.html HTTP/1.0   (client → server)
HTTP/1.1 200 OK ...   (server → client)
Disadvantages of C/S Architecture
Single point of failure
Strong, expensive server
Dedicated maintenance (a sysadmin)
Not scalable: more users mean more servers
Solutions
Replication of data (several servers)
Problems: redundancy, synchronization, expensive
Brute force (a bigger, faster server)
Problems: not scalable, expensive, still a single point of failure
The Client Side
Although the model hasn't changed over the years, the entities in it have
Today's clients can perform more roles than just forwarding users' requests
Today's clients have: more computing power, storage space
Thin Client
Performs simple tasks: I/O
Properties: cheap, limited processing power, limited storage
Fat Client
Can perform complex tasks: graphics, data manipulation, etc.
Properties: strong computation power, bigger storage, more expensive than a thin client
Evolution at the Client Side
'70: DEC's VT100, no storage
'80: IBM PC @ 4.77 MHz, 360 KB diskettes
2001: A PC @ 2 GHz, 40 GB HD
What Else Has Changed?
The number of home PCs is increasing rapidly
PCs with dynamic IPs
Most of the PCs are "fat clients"
Software cannot keep up with hardware development
As Internet usage grows, more and more PCs are connecting to the global net
Most of the time, PCs are idle
How can we use all this?
Sharing
Definition:
1. To divide and distribute in shares
2. To partake of, use, experience, occupy, or enjoy with others
3. To grant or give a share in (intransitive senses)
Merriam-Webster's online dictionary (www.m-w.com)
There is a direct advantage of a co-operative network versus a single computer
Resources Sharing
What can we share? Computer resources
Shareable computer resources:
"CPU cycles" - SETI@home
Storage - CFS
Information - Napster / Gnutella
Bandwidth - Crowds
SETI@Home
SETI - Search for ExtraTerrestrial Intelligence
@Home - on your own computer
A radio telescope in Puerto Rico scans the sky for radio signals
It fills a 35 GB DAT tape in 15 hours
That data has to be analyzed
SETI@Home (cont.)
The problem – analyzing the data requires a huge amount of computation
Even a supercomputer cannot finish the task on its own
Accessing a supercomputer is expensive
What can be done?
SETI@Home (cont.)
Can we use distributed computing? YEAH
Fortunately, the problem can be solved in parallel. Examples:
Analyzing different parts of the sky
Analyzing different frequencies
Analyzing different time slices
SETI@Home (cont.)
The data can be divided into small segments
A PC is capable of analyzing a segment in a reasonable amount of time
An enthusiastic UFO searcher will lend his spare CPU cycles to the computation
When? While the screensaver is running
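A minimal Python sketch of the idea of cutting a large recording into independent segments and letting idle volunteer machines analyze them one at a time. The segment size, data, and analysis function are made-up placeholders, not SETI@Home's actual protocol:

```python
# Hypothetical sketch of work-unit distribution, not SETI@Home's real protocol.
from queue import Queue

SEGMENT_SIZE = 1024          # assumed size of one work unit, in samples

def split_into_segments(data):
    """Cut the recorded signal into independent, equally sized segments."""
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]

def analyze(segment):
    """Stand-in for the real signal analysis a volunteer PC would run."""
    return max(segment) if segment else 0

# The server queues segments; each idle PC pulls one, analyzes it, reports back.
work_queue = Queue()
for seg in split_into_segments(list(range(10_000))):
    work_queue.put(seg)

results = []
while not work_queue.empty():    # in reality, many PCs pull concurrently
    results.append(analyze(work_queue.get()))

print(f"{len(results)} segments analyzed")
```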
SETI@Home - Example
SETI@Home - Summary
SETI reverses the C/S model
Clients can also provide services
Servers can be weaker, used mainly for storage
Distributed peers serving the center
Not yet P2P, but we're close
Outcome - great results:
Thousands of unused CPU hours tamed for the mission
3+ million users
What Exactly is P2P?
A distributed communication model with these properties:
All nodes have identical responsibilities
All communication is symmetric
P2P Properties
Cooperative, direct sharing of resources
No central servers
Symmetric clients
(Diagram: clients connected directly to one another through the Internet, with no server)
P2P Advantages
Harnesses client resources
Scales with new clients
Provides robustness under failures
Redundancy and fault tolerance
Immune to DoS
Load balance
P2P Disadvantages -- A Tough Design Problem
How do you handle a dynamic network (nodes join and leave frequently)?
A number of constraints and uncontrolled variables:
No central servers
Clients are unreliable
Clients vary widely in the resources they provide
Heterogeneous network (different platforms)
Two Main Architectures
Hybrid Peer-to-Peer: preserves some of the traditional C/S architecture. A central server links clients, stores index tables, etc.
Pure Peer-to-Peer: all nodes are equal and no functionality is centralized
Hybrid P2P
A main server is responsible for various administrative operations:
Users' login and logout
Storing metadata
Directing queries
Example: Napster
Examples - Napster
Napster is a program for sharing information (mp3 music files) over the Internet
Created by Shawn Fanning in 1999, although similar services already existed (but lacked popularity and functionality)
Napster Sharing Style: hybrid center+edge
(Diagram: three users and their libraries - "beastieboy" with song1-3.mp3, "kingrook" with song4-6.mp3, "slashdot" with song5-7.mp3 - connected to a central Napster server whose directory lists each song title, the user holding it, and that user's connection speed: DSL, T1, or 28.8)
1. Users launch Napster and connect to a Napster server
2. Napster creates a dynamic directory from the users' personal .mp3 libraries
3. beastieboy enters search criteria (song5)
4. Napster displays the matches to beastieboy
5. beastieboy makes a direct connection to kingrook for the file transfer
What About Communication Between Servers?
Each Napster server creates its own mp3 exchange community: rock.napster.com, dance.napster.com, etc.
This creates a separation, which is bad
We would like multiple servers to share a common ground
That reduces the centralized nature of each server and expands searchability
Various HP2P Models - 1. Chained Architecture
Chained architecture - a linear chain of servers
Clients log in to a random server
Queries are submitted to that server
If the server satisfies the query - done
Otherwise - the query is forwarded to the next server
Results are forwarded back to the first server
The server merges the results
The server returns the results to the client
Used by the OpenNap network
2. Full Replication Architecture
Replication of constantly updated metadata
A client logs on to a random server
That server sends the updated metadata to all servers
Result: all servers can answer queries immediately
3. Hash Architecture
Each server holds a portion of the metadata
Each server holds the complete inverted list for a subset of all words
A client directs a query to a server that is responsible for at least one of the keywords
That server gets the inverted lists for all the keywords from the other servers
The server returns the relevant results to the client
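A minimal Python sketch of this hashed-metadata idea: a keyword's owning server is picked by hashing the keyword, each server keeps the full inverted list for its keywords, and the queried server intersects the lists. The server count, data structures, and function names are illustrative, not any real OpenNap/Napster API:

```python
# Hypothetical sketch: each server owns the inverted lists for the keywords
# that hash to it.
import hashlib

NUM_SERVERS = 4

def server_for(keyword):
    """Pick the server responsible for a keyword by hashing it."""
    digest = hashlib.sha1(keyword.encode()).digest()
    return int.from_bytes(digest, "big") % NUM_SERVERS

# Each server's slice of the inverted index: keyword -> set of file names.
index = [dict() for _ in range(NUM_SERVERS)]

def publish(filename, keywords):
    for kw in keywords:
        index[server_for(kw)].setdefault(kw, set()).add(filename)

def query(keywords):
    # The query goes to the server owning one keyword; that server gathers the
    # other keywords' lists from their owners and intersects them.
    lists = [index[server_for(kw)].get(kw, set()) for kw in keywords]
    return set.intersection(*lists) if lists else set()

publish("song5.mp3", ["rock", "guitar"])
publish("song6.mp3", ["rock", "dance"])
print(query(["rock", "guitar"]))   # {'song5.mp3'}
```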
4. Unchained Architecture
Independent servers which do not communicate with each other
A client who logs on to one server can only see the files of other users at the same local server
A clear disadvantage of separating users into distinct domains
Used by Napster
Pure P2P
All nodes are equal
No centralized server
Example: Gnutella
A completely distributed P2P network
The Gnutella network is composed of clients
Client software is made of two parts:
A mini search engine - the client
A file serving system - the "server"
Relies on broadcast search
Gnutella - Operations
Connect - establishing a logical connection
Ping/Pong - discovering new nodes (my friends' friends)
Query - looking for something
Download - downloading files (simple HTTP)
Gnutella – Form an Overlay
(Diagram: a new node sends Connect and gets OK from a neighbour; Pings are forwarded through the overlay and Pongs travel back, so the new node learns about its neighbours' neighbours)
How to find a node?
Initially, ad hoc ways: email, online chat, news groups...
Bottom line: you've got to know someone!
Set up some long-lived nodes
A newcomer contacts the well-known nodes
Useful for building a better overlay topology
Gnutella – Search
(Diagram: a node broadcasts the query "Green Toad"; the query is forwarded hop by hop, and two nodes, A and B, answer "I have". Toad A looks nice; Toad B is too far)
On a larger scale, things get more complicated
Gnutella – Scalability Issue
Can the system withstand flooding from every node?
Use a TTL to limit the range of propagation
5^5 = 3125 - how many nodes can you reach?
This creates a "horizon" of computers
The promise: you can expect the horizon to change every time you log in
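A back-of-the-envelope sketch of the horizon size in Python. The degree and TTL of 5 are the slide's illustration; the second formula assumes paths never overlap, so both numbers are optimistic upper bounds:

```python
# Rough horizon estimate, assuming each node forwards to `degree` neighbours
# and forwarding paths never overlap (an upper bound in practice).
def horizon(degree, ttl):
    # nodes reached at hop i (excluding the originator): degree * (degree - 1)^(i - 1)
    return sum(degree * (degree - 1) ** (i - 1) for i in range(1, ttl + 1))

print(5 ** 5)          # the slide's rough figure: 3125
print(horizon(5, 5))   # a slightly more careful count: 1705
```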
The Differences
While the pure P2P model is completely symmetric, in the hybrid model elements of both PP2P and C/S coexist
Each model has its disadvantages:
PP2P still has problems locating information
HP2P has scalability problems, as with ordinary server-oriented models
P2P – Summary
The current setting has allowed P2P to enter the world of PCs
It controls the niche of resource sharing
The model is being studied from both the academic and the commercial point of view
There are still problems out there...
End Of Part I
Part II
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
Robert Morris, Ion Stoica, David Karger, M. Frans Kaashoek, Hari Balakrishnan - MIT and Berkeley
Presented by Roy Werber and Idan Gelbourt
A P2P Problem
Every application in a P2P environment must handle an important problem:
The lookup problem
What is the problem?
A Peer-to-peer Storage Problem
1,000 scattered music enthusiasts
Willing to store and serve replicas
How do you find the data?
The Lookup Problem
(Diagram: nodes N1-N6 scattered across the Internet; a publisher stores key="title", value=MP3 data at one node, and a client elsewhere issues Lookup("title"))
In a dynamic network with N nodes, how can the data be found?
Centralized Lookup (Napster)
(Diagram: a central DB; the publisher at N4 registers SetLoc("title", N4), and the client's Lookup("title") is answered by the DB, which points it to N4, the node holding key="title", value=MP3 data)
Simple, but O(N) state and a single point of failure
Hard to keep the data in the server up to date
Flooded queries (Gnutella)
(Diagram: the client's Lookup("title") is flooded from node to node across N1-N9 until it reaches N4, the publisher holding key="title", value=MP3 data)
Robust, but worst case O(N) messages per lookup
Not scalable
So Far
Centralized:
- Table size - O(N)
- Number of hops - O(1)
Flooded queries:
- Table size - O(1)
- Number of hops - O(N)
We Want
Efficiency: O(log N) messages per lookup, where N is the total number of servers
Scalability: O(log N) state per node
Robustness: surviving massive failures
How Can It Be Done?
How do you search in O(log N) time?
Binary search - but you need an ordered array
How can you order nodes in a network and data items?
Hash function!
Chord: Namespace
The namespace is a fixed-length bit string
Each object is identified by a unique ID
How to get the ID? Hash it:
"Shark" --SHA-1--> Object ID: DE11AC
194.90.1.5:8080 --SHA-1--> ID: AABBCC
Chord Overview
Provides just one operation - a peer-to-peer hash lookup:
Lookup(key) → IP address
Chord does not store the data
Chord is a lookup service, not a search service
It is a building block for P2P applications
Chord IDs
Uses a hash function:
Key identifier = SHA-1(key)
Node identifier = SHA-1(IP address)
Both are uniformly distributed
Both exist in the same ID space
How do we map key IDs to node IDs?
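A minimal Python sketch of deriving both kinds of identifiers with SHA-1. Truncating to a 7-bit space is only to match the slides' examples; real Chord uses the full 160 bits, and the helper name is ours:

```python
# Minimal sketch: keys and nodes share one ID space derived from SHA-1.
import hashlib

M = 7                      # illustrative ID-space size (7 bits, as in the slides)

def chord_id(text, m=M):
    """SHA-1 hash truncated to an m-bit identifier."""
    digest = hashlib.sha1(text.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

key_id  = chord_id("Shark")            # a data item
node_id = chord_id("194.90.1.5:8080")  # a node, hashed by its IP address
print(key_id, node_id)                 # both fall in the same 0..127 space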
Mapping Keys To Nodes
Consistent hashing [Karger 97]
(Diagram: a circular 7-bit ID space, 0..127, with nodes N32, N90, N105 and keys K5, K20, K80 placed on the ring)
A key is stored at its successor: the node with the next higher ID
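A minimal Python sketch of the successor rule over the ring shown above; the node and key IDs come from the diagram, while the helper names are ours, not Chord's API:

```python
# Minimal sketch of "a key is stored at its successor" on a circular ID space.
import bisect

RING_BITS = 7
nodes = sorted([32, 90, 105])          # node IDs from the slide

def successor(key_id):
    """First node whose ID is >= key_id, wrapping around the ring."""
    i = bisect.bisect_left(nodes, key_id)
    return nodes[i % len(nodes)]       # wrap back to the smallest node ID

for key in (5, 20, 80):
    print(f"K{key} is stored at N{successor(key)}")
# K5 -> N32, K20 -> N32, K80 -> N90
```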
Basic Lookup
(Diagram: nodes N10, N32, N60, N90, N105, N120 on the ring; the question "Where is key 80?" is passed from node to node along successor pointers until the answer comes back: "N90 has K80")
“Finger Table” Allows Log(n)-time Lookups
(Diagram: from N80, fingers point 1/2, 1/4, 1/8, ..., 1/128 of the way around the circular 7-bit ID space)
N80 knows of only seven other nodes
Finger i Points to the Successor of N+2^i
(Diagram: for example, N80's finger for 80 + 32 = 112 points to N120, the first node at or after 112)
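A tiny Python sketch of the finger targets for N80 in the 7-bit space; the indexing (i = 0..m-1) is one common convention and is an assumption of this sketch:

```python
# Finger i of node n points at the successor of (n + 2^i) mod 2^m.
M = 7            # 7-bit ID space, 0..127
n = 80

finger_targets = [(n + 2 ** i) % 2 ** M for i in range(M)]
print(finger_targets)    # [81, 82, 84, 88, 96, 112, 16]
# e.g. the finger for 112 points to N120, the first node at or after 112.
```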
Lookups Take O(log(n)) Hops
(Diagram: nodes N5, N10, N20, N32, N60, N80, N99, N110 on the ring; a node resolves Lookup(K19) in a few finger hops, roughly halving the remaining ID-space distance each time, until it reaches N20, the successor of K19)
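A small Python simulation of this routing idea on a static ring: every node keeps fingers at successor(n + 2^i), and a lookup repeatedly hops to the closest finger that still precedes the key. This is a sketch under those assumptions, not the paper's exact pseudocode:

```python
# Sketch of Chord routing on a static ring with full finger tables.
import bisect

M = 7
nodes = sorted([5, 10, 20, 32, 60, 80, 99, 110])    # ring from the slide

def successor(ident):
    i = bisect.bisect_left(nodes, ident % 2 ** M)
    return nodes[i % len(nodes)]

def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b]."""
    x, a, b = x % 2 ** M, a % 2 ** M, b % 2 ** M
    return (a < x <= b) if a < b else (x > a or x <= b)

fingers = {n: [successor(n + 2 ** i) for i in range(M)] for n in nodes}

def lookup(start, key):
    """Return (owner, hops): the node storing `key` and the hops taken."""
    n, hops = start, 0
    while not in_interval(key, n, successor(n + 1)):
        # hop to the finger closest to, but still preceding, the key
        candidates = [f for f in fingers[n] if in_interval(f, n, key - 1)]
        nxt = max(candidates, key=lambda f: (f - n) % 2 ** M) if candidates else successor(n + 1)
        if nxt == n:
            break
        n, hops = nxt, hops + 1
    return successor(key), hops

print(lookup(32, 19))   # (20, 3): N20 owns K19, reached in three finger hops
```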
Joining: Linked List Insert
(Diagram: nodes N25 and N40, with N40 holding keys K30 and K38)
1. N36 wants to join; it looks up its own ID (Lookup(36)) and finds its successor, N40
Join (2)
2. N36 sets its own successor pointer to N40
Join (3)
3. Keys 26..36 are copied from N40 to N36 (K30 moves to N36; K38 stays at N40)
Join (4)
4. N25's successor pointer is set to N36
Finger pointers are updated in the background
Correct successors produce correct lookups
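A toy Python sketch of these four steps on a successor-pointer ring, ignoring fingers, failures, and concurrent joins; the class and helper names are illustrative, not Chord's real interfaces:

```python
# Toy sketch of the join steps: find successor, set pointers, move keys.
class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.keys = {}          # key id -> value

def between(x, a, b):
    """x in the ring interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def find_successor(start, ident):
    n = start
    while not between(ident, n.id, n.successor.id):
        n = n.successor          # a linear walk is enough for this sketch
    return n.successor

def join(new_node, predecessor):
    succ = find_successor(predecessor, new_node.id)   # 1. find the successor
    new_node.successor = succ                         # 2. set own successor
    # 3. keys that now belong to the new node move over from the successor
    for k in [k for k in succ.keys if between(k, predecessor.id, new_node.id)]:
        new_node.keys[k] = succ.keys.pop(k)
    predecessor.successor = new_node                  # 4. fix the predecessor

# N25 -> N40 ring holding K30 and K38; N36 joins between them.
n25, n40 = Node(25), Node(40)
n25.successor, n40.successor = n40, n25
n40.keys = {30: "K30", 38: "K38"}
n36 = Node(36)
join(n36, n25)
print(sorted(n36.keys), sorted(n40.keys))   # [30] [38]
```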
Join: Lazy Finger Update Is OK
(Diagram: N2's finger still points to N40 rather than the newly joined N36)
N2's finger should now point to N36, not N40
Lookup(K30) visits only nodes with IDs < 30, so a stale finger can only undershoot; the final successor step still reaches K30's owner
Failures Might Cause Incorrect Lookup
(Diagram: N80's successors N85, N102, and N113 have all failed; N10's Lookup(90) cannot proceed past N80)
N80 doesn't know its correct successor, so the lookup is incorrect
Solution: Successor Lists
Each node knows its r immediate successors
After a failure, it will know the first live successor
Correct successors guarantee correct lookups
The guarantee holds only with some probability
Choosing the Successor List Length
Assume 1/2 of the nodes fail
P(successor list all dead) = (1/2)^r
i.e. the probability that this node breaks the Chord ring (assuming independent failures)
P(no broken nodes) = (1 - (1/2)^r)^N
If we choose r = 2*log2(N), this probability is about 1 - 1/N
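The slide's arithmetic worked through in Python for N = 1,000 (the size used in the failure experiments later); the only assumption is independent node failures with probability 1/2:

```python
# The slide's arithmetic, assuming each node fails independently with prob. 1/2.
import math

N = 1000
r = round(2 * math.log2(N))                 # successor list length, about 20
p_ring_break_at_node = 0.5 ** r             # whole successor list dead, ~ 1/N^2
p_ring_intact = (1 - p_ring_break_at_node) ** N
print(r, p_ring_intact)                     # r = 20, probability about 1 - 1/N
```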
Chord Properties
O(log N) lookup messages and table space
Well-defined location for each ID
No search required
Natural load balance
No name structure imposed
Minimal join/leave disruption
Does not store documents...
Experimental Overview
Quick lookups in large systems
Low variation in lookup costs
Robust despite massive failures
See the paper for more results
Experiments confirm the theoretical results
Chord Lookup Cost Is O(log N)
(Plot: average messages per lookup vs. number of nodes; the cost grows logarithmically, with a constant of 1/2)
Failure Experimental Setup
Start 1,000 CFS/Chord servers
Successor lists have 20 entries
Wait until they stabilize
Insert 1,000 key/value pairs
Five replicas of each
Stop X% of the servers
Immediately perform 1,000 lookups
Massive Failures Have Little Impact
(Plot: failed lookups (percent) vs. failed nodes (percent); note that (1/2)^6 is 1.6%)
Chord Summary
Chord provides a peer-to-peer hash lookup
Efficient: O(log N) messages per lookup
Robust as nodes fail and join
A good primitive for peer-to-peer systems
http://www.pdos.lcs.mit.edu/chord
Wide-area Cooperative Storage With CFS
Robert Morris, Frank Dabek, M. Frans Kaashoek, David Karger, Ion Stoica
MIT and Berkeley
What Can Be Done With Chord
Cooperative mirroring
Time-shared storage: makes data available even when its owner is offline
Distributed indexes: support Napster-style keyword search
How to Mirror Open-source Distributions?
Multiple independent distributions
Each has a high peak load but a low average
Individual servers are wasteful
Solution: aggregate
Option 1: a single powerful server
Option 2: a distributed service
But how do you find the data?
Design Challenges
Avoid hot spots
Spread the storage burden evenly
Tolerate unreliable participants
Fetch speed comparable to whole-file TCP
Avoid O(#participants) algorithms, such as centralized mechanisms [Napster] and broadcasts [Gnutella]
CFS solves these challenges
CFS Overview
CFS - Cooperative File System:
A P2P read-only storage system
Read-only: only the owner can modify files
Completely decentralized
(Diagram: each node runs both a client and a server, connected through the Internet)
CFS - File System
A set of blocks distributed over the CFS servers
Three layers:
FS - interprets blocks as files (Unix V7 semantics)
DHash - performs block management
Chord - maintains the routing tables used to find blocks
Chord
Uses a 160-bit identifier space
Assigns an identifier to each node and block
Maps a block's ID to a node's ID
Performs key lookups (as we saw earlier)
Dhash – Distributed Hashing
Performs block management on top of Chord:
Block retrieval, storage, and caching
Provides load balance for popular files
Replicates each block at a small number of places (for fault tolerance)
CFS - Properties
Tested on a prototype:
Efficient
Robust
Load-balanced
Scalable
Downloads as fast as FTP
Drawbacks:
No anonymity
Assumes no malicious participants
Design Overview
(Diagram: an FS layer sits on top of DHash, which sits on top of Chord; each server runs DHash over Chord)
DHash stores, balances, replicates, and caches blocks
DHash uses Chord [SIGCOMM 2001] to locate blocks
Client-server Interface
Files have unique names
Files are read-only (single writer, many readers)
Publishers split files into blocks
Clients check files for authenticity
(Diagram: the FS client inserts or looks up a file through its local server, which inserts or looks up the corresponding blocks on the server nodes)
Naming and Authentication
1. The name could be a hash of the file's content
Easy for the client to verify
But an update requires a new file name
2. The name could be a public key
The document contains a digital signature
Allows verified updates with the same name
CFS File Structure
(Diagram: a root block, signed with the publisher's public key, points via H(D) to a directory block D; the directory block points via H(F) to an inode block F; the inode block points via H(B1) and H(B2) to data blocks B1 and B2)
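A minimal Python sketch of the content-hash idea behind this tree: data blocks are named by the SHA-1 of their contents, and an inode-like block lists those names, so a client can verify every block it fetches. The block size, the inode encoding, and the in-memory store are assumptions of this sketch, not CFS's exact formats:

```python
# Minimal sketch of content-hash block naming, in the spirit of the CFS tree.
import hashlib

BLOCK_SIZE = 8192                       # 8 KByte blocks, as in the experiments
store = {}                              # block id (hex SHA-1) -> block bytes

def put(block):
    bid = hashlib.sha1(block).hexdigest()
    store[bid] = block
    return bid

def publish(data):
    """Split a file into blocks, store them, and store an inode listing them."""
    ids = [put(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]
    inode = "\n".join(ids).encode()     # toy inode: a list of child block ids
    return put(inode)

def fetch(inode_id):
    """Fetch the blocks and verify each one against the hash that names it."""
    data = b""
    for bid in store[inode_id].decode().splitlines():
        block = store[bid]
        assert hashlib.sha1(block).hexdigest() == bid, "corrupted block"
        data += block
    return data

root = publish(b"x" * 20000)            # 3 data blocks + 1 inode block
assert fetch(root) == b"x" * 20000
```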
File Storage
Data is stored for an agreed-upon finite interval
Extensions can be requested
There is no explicit delete command
After expiration, the blocks fade away
Storing Blocks
Long-term blocks are stored for a fixed time
Publishers need to refresh them periodically
The cache uses LRU (Least Recently Used) replacement
(Diagram: each server's disk holds a cache area alongside long-term block storage)
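A minimal Python sketch of LRU replacement for the cache area; the capacity, block IDs, and class name are illustrative only:

```python
# Minimal LRU sketch: the least recently used block is evicted first.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()      # block id -> block data

    def get(self, bid):
        if bid not in self.blocks:
            return None
        self.blocks.move_to_end(bid)     # mark as most recently used
        return self.blocks[bid]

    def put(self, bid, data):
        self.blocks[bid] = data
        self.blocks.move_to_end(bid)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict the least recently used

cache = LRUCache(capacity=2)
cache.put("b1", b"...")
cache.put("b2", b"...")
cache.get("b1")                          # touch b1 so b2 becomes the LRU block
cache.put("b3", b"...")                  # evicts b2
print(list(cache.blocks))                # ['b1', 'b3']
```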
Replicate Blocks at k Successors
(Diagram: Block 17 is stored at its successor on the ring and replicated at the k nodes that follow it)
Replica failures are independent
Lookups Find Replicas
(Diagram: RPC steps of a lookup for Block 17: 1. Chord lookup step, 2. get the successor list, 3. a block fetch that fails at one replica, 4. a successful block fetch from another replica)
First Live Successor Manages Replicas
(Diagram: when the node holding Block 17 fails, the first live successor already has a copy of 17 and takes over managing the replicas)
DHash Copies to Caches Along Lookup Path
(Diagram: RPCs of a lookup for Block 45: 1-2. Chord lookups, 3. block fetch, 4. the block is sent back to be cached at the nodes along the lookup path)
Naming and Caching
(Diagram: block D30 is stored at N32; two clients look it up along paths that converge near N32)
Every hop covers a smaller distance in ID space, so lookup paths from different clients are likely to collide near the target
This makes caching along the path efficient
Caching Doesn’t Worsen Load
Only O(log N) nodes have fingers pointing to N32
This limits the single-block load on N32
Virtual Nodes Allow Heterogeneity – Load Balancing
Hosts may differ in disk/network capacity
Hosts may advertise multiple IDs, chosen as SHA-1(IP address, index)
Each ID represents a "virtual node"
A host's load is proportional to its number of virtual nodes
Controlled manually
(Diagram: one host advertises several virtual nodes, e.g. N10, N60, N101, while another advertises only N5)
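A minimal Python sketch of deriving several virtual-node IDs from one host by hashing (IP address, index); the exact encoding of the pair and the helper name are assumptions of this sketch:

```python
# Virtual node IDs: SHA-1 over (IP address, index), one ID per virtual node.
import hashlib

M = 160                                  # Chord's full 160-bit ID space

def virtual_node_ids(ip, count, m=M):
    ids = []
    for index in range(count):
        # the exact encoding of (IP, index) is an assumption of this sketch
        digest = hashlib.sha1(f"{ip}:{index}".encode()).digest()
        ids.append(int.from_bytes(digest, "big") % (2 ** m))
    return ids

# A well-provisioned host advertises more virtual nodes than a weak one.
print(len(virtual_node_ids("194.90.1.5", 4)))   # 4 IDs for the strong host
print(len(virtual_node_ids("10.0.0.7", 1)))     # 1 ID for the weak host
```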
Server Selection By Chord
(Diagram: while resolving Lookup(47), a node compares the measured RTTs to its fingers, e.g. 10 ms vs. 100 ms, before choosing the next hop)
Each node monitors the RTTs to its own fingers
Tradeoff: progress in ID space vs. delay
Why Blocks Instead of Files?
Cost: one lookup per block
The cost can be tailored by choosing a good block size
Benefit: load balance is simple
For large files, the storage cost is spread out
Popular files are served in parallel
CFS Project Status
Working prototype software
Some abuse-prevention mechanisms
Guarantees the authenticity of files, updates, etc.
A Napster-like interface is in the works
Decentralized indexing system
Some measurements on the RON testbed
Simulation results to test scalability
Experimental Setup (12 nodes)
One virtual node per host
8 KByte blocks
RPCs use UDP
Caching turned off
Proximity routing turned off
(Diagram: the 12 hosts span sites including CA-T1, CCI, Aros, Utah, CMU, MIT, MA-Cable, Cisco, Cornell, NYU, and OR-DSL, with links to vu.nl and Lulea.se)
CFS Fetch Time for 1MB File
Average over the 12 hosts
No replication, no caching; 8 KByte blocks
(Plot: fetch time in seconds vs. prefetch window in KBytes)
Distribution of Fetch Times for 1MB
(Plot: fraction of fetches vs. time in seconds, for 8, 24, and 40 KByte prefetch windows)
CFS Fetch Time vs. Whole File TCP
(Plot: fraction of fetches vs. time in seconds, comparing a 40 KByte prefetch window against whole-file TCP)
Robustness vs. Failures
(Plot: fraction of failed lookups vs. fraction of failed nodes, with six replicas per block; note that (1/2)^6 is 0.016)
Future work
Test load balancing with real workloads
Deal better with malicious nodes
Indexing
Other applications
CFS Summary
CFS provides peer-to-peer read-only storage
Structure: DHash and Chord
It is efficient, robust, and load-balanced
It uses block-level distribution
The prototype is as fast as whole-file TCP
http://www.pdos.lcs.mit.edu/chord
The End