Peer-to-Peer (P2P) Networks
Dr. Yingwu Zhu
Overview
  Client/Server (CS) Architecture
  P2P Architecture
  Unstructured P2P Networks
    Napster, Gnutella, KaZaA, Freenet
    BitTorrent
  Structured P2P Networks
    Chord, Pastry, Tapestry, CAN
    Won't be covered here!
Client/Server Architecture
  A well-known, powerful, reliable server is the data source
  Clients request data from the server
  Very successful model
    WWW (HTTP), FTP, Web services, etc.
  As more clients are added, the demand on the server increases!
Client/Server Limitations
  Scalability is hard to achieve
  Presents a single point of failure
  Requires administration
  Unused resources at the network edge
    CPU cycles, storage, etc.
  P2P systems try to address these limitations
Why Study P2P?
  Huge fraction of traffic on networks today
    >= 50%!
  Exciting new applications
  Next level of resource sharing
    Vs. timesharing, client-server, P2P
    E.g., access 10s-100s of TB at low cost
Users and Usage
  60M users of file-sharing in US
  8.5M logged in at a given time on average
  814M units of music sold in US last year
  140M digital tracks sold by music companies
  As of Nov, 35% of all Internet traffic was for BitTorrent, a single file-sharing system
  Major legal battles underway between recording industry and file-sharing companies
Share of Internet Traffic
Number of Users
  "Others" include BitTorrent, eDonkey, iMesh, Overnet, Gnutella
  BitTorrent (and others) gaining share from FastTrack (KaZaA).
P2P Computing
  P2P computing is the sharing of computer resources and services by direct exchange between systems.
  These resources and services include the exchange of information, processing cycles, cache storage, and disk storage for files.
  P2P computing takes advantage of existing computing power, computer storage, and networking connectivity, allowing users to leverage their collective power to the benefit of all.
P2P Architecture
  All nodes are both clients and servers
    Provide and consume data
    Any node can initiate a connection
  No centralized data source
    The ultimate form of democracy on the Internet
    The ultimate threat to copyright protection on the Internet
What is P2P?
  A distributed system architecture
    No centralized control
    Typically many nodes, but unreliable and heterogeneous
    Nodes are symmetric in function
    Take advantage of distributed, shared resources (bandwidth, CPU, storage) on peer nodes
    Fault-tolerant, self-organizing
    Operate in a dynamic environment; frequent joins and leaves are the norm
P2P Network Characteristics
  Clients are also servers and routers
    Nodes contribute content, storage, memory, CPU
  Nodes are autonomous (no administrative authority)
  Network is dynamic: nodes enter and leave the network frequently
  Nodes collaborate directly with each other (not through well-known servers)
  Nodes have widely varying capabilities
P2P vs. Client/Server
  Pure P2P:
    No central server
    For certain requests any peer can function as a client, as a router, or as a server
    The information is not located in a central location but is distributed among all peers
    A peer may need to communicate with multiple peers to locate a piece of information
  As more peers are added, both the demand on the network and its capacity increase!
P2P Benefits
  Efficient use of resources
    Unused bandwidth, storage, processing power at the edge of the network
  Scalability
    Consumers of resources also donate resources
    Aggregate resources grow naturally with utilization
  Reliability
    Replicas
    Geographic distribution
    No single point of failure
  Ease of administration
    Nodes self-organize
    No need to deploy servers to satisfy demand (cf. scalability)
    Built-in fault tolerance, replication, and load balancing
P2P Traffic
  P2P networks generate more traffic than any other Internet application
  2/3 of all bandwidth on some backbones
P2P Data Flow
  Figure: CacheLogic P2P file format analysis (2005)
  Streamsight used for Layer-7 Deep Packet Inspection
Categories of P2P Systems
  Unstructured
    No restriction on overlay structures and data placement
    Napster, Gnutella, KaZaA, Freenet, BitTorrent
  Structured
    Distributed hash tables (DHTs)
    Place restrictions on overlay structures and data placement
    Chord, Pastry, Tapestry, CAN
Napster
  Share music files, MP3 data
  Nodes register their contents (list of files) and IPs with the server
  Centralized server for searches
    The client sends queries to the centralized server for files of interest
    Keyword search (artist, song, album, bitrate, etc.)
  Napster server replies with the IP addresses of users with matching files
  File download done on a peer-to-peer basis
  Poor scalability
  Single point of failure
  Shut down over legal issues
  Figure: clients exchange Query and Reply with the server; File Transfer runs directly between clients
Napster: Publish
  "I have X, Y, and Z!"
  insert(X, 18.104.22.168)
  ...
  22.214.171.124
Napster: Search
  "Where is file A?"
  search(A) --> 126.96.36.199
  188.8.131.52
Napster
  Central Napster server
    Can ensure correct results
    Bottleneck for scalability
    Single point of failure
    Susceptible to denial of service
      Malicious users
      Lawsuits, legislation
  Search is centralized
  File transfer is direct (peer-to-peer)
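The publish and search steps above can be sketched as a single in-memory index. This is only an illustration of the centralized-directory idea, not Napster's actual protocol; the class name `CentralIndex`, its methods, and the `10.0.0.x` addresses are assumptions made for the example.

```python
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for the central server: maps file names to peer addresses."""

    def __init__(self):
        self._peers_by_file = defaultdict(set)

    def insert(self, filename, peer_addr):
        # A peer registers one of its files together with its own address.
        self._peers_by_file[filename].add(peer_addr)

    def search(self, filename):
        # The server only answers "who has it"; the download itself is peer to peer.
        return sorted(self._peers_by_file.get(filename, set()))

    def remove_peer(self, peer_addr):
        # Called when a peer leaves; every lookup depends on this one server.
        for peers in self._peers_by_file.values():
            peers.discard(peer_addr)


index = CentralIndex()
for name in ("X", "Y", "Z"):          # "I have X, Y, and Z!"
    index.insert(name, "10.0.0.1")    # hypothetical peer address
index.insert("X", "10.0.0.2")

print(index.search("X"))  # ['10.0.0.1', '10.0.0.2']
print(index.search("A"))  # []
```

Because every `insert` and `search` goes through one object, the sketch also shows why the server is both the scalability bottleneck and the single point of failure.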
Gnutella: Query Flooding
  Breadth-First Search (BFS)
Gnutella: Query Flooding
  A node/peer connects to a set of Gnutella neighbors
  Forward queries to neighbors
  The client that has the information responds
  Flood the network, with a TTL for termination
  + Results are complete
  - Bandwidth wastage
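A minimal sketch of TTL-limited flooding follows, assuming a small hypothetical overlay given as an adjacency list. The function name `flood_query` and the message-counting rule (every copy sent to a neighbor counts, including duplicates that are dropped on arrival) are simplifications for illustration, not the Gnutella wire protocol.

```python
from collections import deque


def flood_query(graph, start, has_file, ttl):
    """Breadth-first flood from `start`; returns (peers that answered, messages sent)."""
    seen = {start}
    hits = []
    messages = 0
    frontier = deque([(start, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if node in has_file:
            hits.append(node)
        if remaining == 0:
            continue  # TTL expired: do not forward further
        for neighbor in graph[node]:
            messages += 1  # every forwarded copy costs bandwidth
            if neighbor not in seen:  # duplicates are dropped on arrival
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return hits, messages


# Hypothetical six-node overlay; only F holds the file.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "F"],
    "E": ["C", "F"],
    "F": ["D", "E"],
}
print(flood_query(overlay, "A", has_file={"F"}, ttl=2))  # ([], 8)
print(flood_query(overlay, "A", has_file={"F"}, ttl=3))  # (['F'], 12)
```

With a TTL of 2 the query never reaches F, and raising the TTL to 3 finds it at the cost of more messages, which is the completeness versus bandwidth trade-off listed above.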
Gnutella vs. Napster
  Decentralized
    No single point of failure
    Not as susceptible to denial of service
    Cannot ensure correct results
  Flooding queries
    Search is now distributed but still not scalable
Gnutella: Random Walk
  Improved over query flooding
  Same overlay structure as Gnutella
  Forward the query to a random subset of its neighbors
  + Reduced bandwidth requirements
  - Incomplete results
  - High latency
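The random-walk variant can be sketched in the same style. The version below assumes a fixed number of independent walkers, each forwarding to one randomly chosen neighbor per hop; the function name, the `walkers` and `max_hops` parameters, and the overlay are all hypothetical choices for illustration.

```python
import random


def random_walk_query(graph, start, has_file, walkers, max_hops, rng):
    """Send `walkers` independent walkers; each visits one random neighbor per hop."""
    messages = 0
    hits = set()
    for _walker in range(walkers):
        node = start
        for _hop in range(max_hops):
            node = rng.choice(graph[node])
            messages += 1
            if node in has_file:
                hits.add(node)
                break  # this walker stops once it finds a copy
    return hits, messages


# Hypothetical six-node overlay; only F holds the file.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "F"],
    "E": ["C", "F"],
    "F": ["D", "E"],
}
hits, messages = random_walk_query(
    overlay, "A", has_file={"F"}, walkers=2, max_hops=4, rng=random.Random(0)
)
print(hits, messages)
```

The message count is bounded by `walkers * max_hops` regardless of network size, but a walker may wander without ever reaching a peer that holds the file, which matches the incomplete-results and latency drawbacks above.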
KaZaA (FastTrack Networks)
  Hybrid of centralized Napster and decentralized Gnutella
  Super-peers act as local search hubs
    Each super-peer is similar to a Napster server for a small portion of the network
    Super-peers are automatically chosen by the system based on their capacities (storage, bandwidth, etc.) and availability (connection time)
  Users upload their list of files to a super-peer
  Super-peers periodically exchange file lists
  You send queries to a super-peer for files of interest
    The local super-peer may flood a query to other super-peers if it cannot satisfy the query itself
  Exploit the heterogeneity of peer nodes
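The two-tier lookup can be sketched as follows. KaZaA's protocol is closed and encrypted, so this is not its real design; the `SuperPeer` class, its methods, and the one-hop forwarding rule are assumptions that only illustrate "local index first, other super-peers on a miss".

```python
class SuperPeer:
    """Toy super-peer: indexes its own clients and can ask neighboring super-peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}       # filename -> set of client addresses
        self.neighbors = []   # other SuperPeer objects

    def upload_file_list(self, client_addr, filenames):
        # An ordinary peer hands its file list to its super-peer.
        for filename in filenames:
            self.index.setdefault(filename, set()).add(client_addr)

    def query(self, filename):
        local = self.index.get(filename)
        if local:
            return sorted(local)
        # Local miss: forward to the other super-peers (one hop in this sketch).
        found = set()
        for peer in self.neighbors:
            found |= peer.index.get(filename, set())
        return sorted(found)


sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
sp1.neighbors.append(sp2)
sp2.neighbors.append(sp1)
sp1.upload_file_list("10.0.0.1", ["X"])   # hypothetical client addresses
sp2.upload_file_list("10.0.0.9", ["A"])

print(sp1.query("X"))  # ['10.0.0.1']
print(sp1.query("A"))  # ['10.0.0.9']
print(sp1.query("Q"))  # []
```

Only the super-peers carry index and query load here, which is how the design exploits peers with more capacity.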
KaZaA
  Uses supernodes to improve scalability, establish hierarchy
    Uptime, bandwidth
  Closed-source
  Uses HTTP to carry out downloads
  Encrypted protocol; queuing, QoS
KaZaA: Network Design
KaZaA: File Insert
  "I have X!"
  insert(X, 184.108.40.206)
  ...
  220.127.116.11
KaZaA: File Search
  "Where is file A?"
Freenet
  Data flows along the reverse path of the query
    Impossible to know if a user is initiating or forwarding a query
    Impossible to know if a user is consuming or forwarding data
Smart queries
  Requests get routed to the correct peer by incremental discovery
BitTorrent
  A popular P2P application for file exchange!
Problems to Address
  Traditional client/server sharing
    Performance deteriorates rapidly as the number of clients increases
  Free-riding in P2P networks
    Free riders only download without contributing to the network.
Basic Idea
  Chop the file into many pieces
    A piece is broken into sub-pieces, typically 16KB in size
    Policy: until a piece is assembled, only download sub-pieces for that piece
    This policy lets complete pieces assemble quickly
  Replicate DIFFERENT pieces on different peers as soon as possible
  As soon as a peer has a complete piece, it can trade it with other peers
  Hopefully, we will be able to assemble the entire file at the end
File Organization
  File, divided into pieces (labeled 4, 2, 1, 3 in the figure)
  Piece: 256KB
  Block: 16KB
  Incomplete piece
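The piece and block sizes above imply a simple layout calculation, sketched below. The function name `layout` and the 600KB example file are assumptions for illustration; only the 256KB and 16KB figures come from the slide.

```python
PIECE_SIZE = 256 * 1024   # 256KB pieces, as on the slide
BLOCK_SIZE = 16 * 1024    # 16KB blocks (sub-pieces)


def layout(file_size):
    """Return a list of (piece_index, piece_length, block_count) for a file."""
    pieces = []
    offset = 0
    index = 0
    while offset < file_size:
        length = min(PIECE_SIZE, file_size - offset)  # last piece may be short
        blocks = -(-length // BLOCK_SIZE)             # ceiling division
        pieces.append((index, length, blocks))
        offset += length
        index += 1
    return pieces


for piece in layout(600 * 1024):
    print(piece)
# (0, 262144, 16)
# (1, 262144, 16)
# (2, 90112, 6)
```

A full piece holds 16 blocks, and the last piece is simply whatever remains of the file.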
Critical Elements
  1 A web server
    To provide the metainfo file by HTTP
    For example:
      http://bt.btchina.net
      http://bt.ydy.com/
    Figure: a web server hosting The Lord of Ring.torrent and Troy.torrent
Critical Elements
  2 The .torrent file
    Static metainfo file containing the necessary information:
      Name
      Size
      Checksum
      IP address (URL) of the tracker
      Pieces
      Piece length
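As a rough illustration of what the metainfo carries, the sketch below builds a plain dictionary with the fields listed above and checks a piece against its checksum. It is not the real .torrent encoding; the key names, the use of SHA-1 hex digests, the helper names, and the tracker URL are assumptions made for the example.

```python
import hashlib


def build_metainfo(name, data, tracker_url, piece_length):
    """Build a plain dict with the fields listed on the slide."""
    piece_hashes = [
        hashlib.sha1(data[i:i + piece_length]).hexdigest()
        for i in range(0, len(data), piece_length)
    ]
    return {
        "name": name,
        "size": len(data),
        "tracker": tracker_url,
        "piece_length": piece_length,
        "pieces": piece_hashes,  # one checksum per piece
    }


def verify_piece(metainfo, index, piece_bytes):
    """Check a downloaded piece against the checksum stored in the metainfo."""
    return hashlib.sha1(piece_bytes).hexdigest() == metainfo["pieces"][index]


data = bytes(range(256)) * 4  # 1024 bytes of stand-in file content
meta = build_metainfo("demo.bin", data, "http://tracker.example/announce", 400)

print(meta["size"], len(meta["pieces"]))       # 1024 3
print(verify_piece(meta, 0, data[:400]))       # True
print(verify_piece(meta, 1, data[:400]))       # False
```

Since the metainfo is static and fetched from the web server, a peer can check every piece it receives without trusting the peer that sent it.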
Critical Elements
  3 A BitTorrent tracker
    Non-content-sharing node
    Tracks peers
    For example:
      http://bt.cnxp.com:8080/announce
      http://btfans.3322.org:6969/announce
    Peer cache
      IP, port, peer id
    State information
      Completed
      Downloading
    Returns random list
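The tracker's role can be sketched as a small in-memory peer cache that answers each announce with a random list of other peers. The real tracker speaks HTTP; the `Tracker` class, the `announce` signature, and the `want` parameter here are hypothetical and only mirror the fields named above.

```python
import random


class Tracker:
    """Toy tracker: stores peers per torrent and returns a random subset."""

    def __init__(self, rng=None):
        self._swarms = {}  # torrent id -> {peer_id: (ip, port, state)}
        self._rng = rng if rng is not None else random.Random()

    def announce(self, torrent_id, peer_id, ip, port, state, want=50):
        swarm = self._swarms.setdefault(torrent_id, {})
        # Everyone already known for this torrent, except the caller itself.
        others = [info for pid, info in swarm.items() if pid != peer_id]
        swarm[peer_id] = (ip, port, state)  # state: "downloading" or "completed"
        return self._rng.sample(others, min(want, len(others)))


tracker = Tracker(random.Random(1))
print(tracker.announce("t1", "p1", "10.0.0.1", 6881, "completed"))    # []
print(tracker.announce("t1", "p2", "10.0.0.2", 6881, "downloading"))
# [('10.0.0.1', 6881, 'completed')]
```

The tracker never touches file content; it only introduces peers to one another.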
Critical Elements
  4 An end user (peer)
    Users who want to use BitTorrent must install the corresponding software or a web-browser plug-in.
    Downloader (leecher): a peer that has only part (or none) of the file.
    Seeder: a peer that has the complete file and chooses to stay in the system so that other peers can download from it.
Messages
  Peer-Peer messages
    TCP sockets
  Peer-Tracker messages
    HTTP request/response
BitTorrent - joining a torrent
  Peers divided into:
    seeds: have the entire file
    leechers: still downloading
  1. obtain the metadata file
  2. contact the tracker
  3. obtain a peer list (contains seeds & leechers)
  4. contact peers from that list for data
  Figure labels: join, metadata file (.torrent), peer list, data request
BitTorrent - exchanging data
  Download sub-pieces in parallel
  Verify pieces using hashes
  Advertise received pieces to the entire peer list ("I have")
  Look for the rarest pieces
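"Look for the rarest pieces" can be sketched as a count over what each known peer advertises. The function name `rarest_first`, the peer ids, and the lowest-index tie-break are assumptions for illustration; a real client may break ties differently.

```python
from collections import Counter


def rarest_first(my_pieces, peer_pieces):
    """Pick the missing piece held by the fewest peers; return (piece, holders) or None."""
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces - my_pieces)  # only count pieces we still need
    if not counts:
        return None
    piece = min(counts, key=lambda p: (counts[p], p))  # ties broken by lowest index
    holders = sorted(peer for peer, pieces in peer_pieces.items() if piece in pieces)
    return piece, holders


# Hypothetical swarm: which piece indices each peer has advertised.
peers = {
    "p1": {0, 1, 2},
    "p2": {0, 1},
    "p3": {0, 3},
}
print(rarest_first({0}, peers))           # (2, ['p1'])
print(rarest_first({0, 1, 2, 3}, peers))  # None
```

Fetching the least-replicated piece first spreads different pieces across the swarm quickly, in line with the goal of replicating different pieces on different peers as soon as possible.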
BitTorrent - unchoking
  Periodically calculate data-receiving rates
  Upload to (unchoke) the fastest downloaders
  Optimistic unchoking
    periodically select a peer at random and upload t