Models and Algorithms for Complex Networks Searching in P2P Networks

  • View
    213

  • Download
    0

Embed Size (px)

Transcript

  • Models and Algorithms for Complex NetworksSearching in P2P Networks

  • What is P2P?the sharing of computer resources and services by direct exchange of information

  • What is P2P?P2P is a class of applications that take advantage of resources storage, cycles, content, human presence available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable and unpredictable IP addresses P2P nodes must operate outside the DNS system and have significant, or total autonomy from central servers

  • What is P2P?A distributed network architecture may be called a P2P network if the participants share a part of their own resources. These shared resources are necessary to provide the service offered by the network. The participants of such a network are both resource providers and resource consumers

  • What is P2P?Various definitions seem to agree onsharing of resourcesdirect communication between equals (peers)no centralized control

  • Client/Server ArchitectureWell known, powerful, reliable server is a data sourceClients request data from server

    Very successful modelWWW (HTTP), FTP, Web services, etc.* Figure from http://project-iris.net/talks/dht-toronto-03.ppt

  • Client/Server LimitationsScalability is hard to achievePresents a single point of failureRequires administrationUnused resources at the network edge

    P2P systems try to address these limitations

  • P2P ArchitectureClients are also servers and routersNodes contribute content, storage, memory, CPUNodes are autonomous (no administrative authority)Network is dynamic: nodes enter and leave the network frequentlyNodes collaborate directly with each other (not through well-known servers)Nodes have widely varying capabilities

    * Content from http://project-iris.net/talks/dht-toronto-03.ppt

  • P2P Goals and BenefitsEfficient use of resourcesUnused bandwidth, storage, processing power at the edge of the networkScalabilityNo central information, communication and computation bottleneckAggregate resources grow naturally with utilizationReliabilityReplicasGeographic distributionNo single point of failureEase of administrationNodes self-organizeBuilt-in fault tolerance, replication, and load balancingIncreased autonomyAnonymity Privacynot easy in a centralized systemDynamismhighly dynamic environmentad-hoc communication and collaboration

  • P2P ApplicationsAre these P2P systems?

    File sharing (Napster, Gnutella, Kazaa)

    Multiplayer games (Unreal Tournament, DOOM)

    Collaborative applications (ICQ, shared whiteboard)

    Distributed computation (Seti@home)

    Ad-hoc networks

    We will focus on information sharing P2P systems

  • Information sharing P2P systemsThe resource to be shared is information (e.g. files)The participants create an overlay network over a physical network (e.g. the Internet)P2P search problem: locate the requested information in the overlay network efficientlysmall number of messages and hopslow latencyload balanceeasy to update in a highly dynamic setting

  • Popular file sharing P2P SystemsNapster, Gnutella, Kazaa, Freenet

    Large scale sharing of files.User A makes files (music, video, etc.) on their computer available to othersUser B connects to the network, searches for files and downloads files directly from user A

    Issues of copyright infringement

  • Napsterprogram for sharing files over the Interneta disruptive application/technology?history:5/99: Shawn Fanning (freshman, Northeasten U.) founds Napster Online music service12/99: first lawsuit3/00: 25% UWisc traffic Napster2000: est. 60M users2/01: US Circuit Court of Appeals: Napster knew users violating copyright laws 7/01: # simultaneous online users:Napster 160K, Gnutella: 40K, Morpheus: 300K

  • Napster: how does it workApplication-level, client-server protocol over point-to-point TCP

    Four steps:Connect to Napster serverUpload your list of files (push) to server.Give server keywords to search the full list with.Select best of correct answers. (pings)

  • Napsternapster.comusersFile list is uploaded1.

  • Napsternapster.comuserRequestandresultsUser requests search at server.2.

  • Napsternapster.comuserpingspingsUser pings hosts that apparently have data.

    Looks for best transfer rate.3.

  • Napsternapster.comuserRetrievesfileUser retrieves file4.

  • NapsterCentral Napster serverCan ensure correct resultsFast search

    Bottleneck for scalabilitySingle point of failureSusceptible to denial of serviceMalicious usersLawsuits, legislation

    Hybrid P2P system all peers are equal but some are more equal than othersSearch is centralizedFile transfer is direct (peer-to-peer)

  • GnutellaShare any type of files (not just music)Completely decentralized method of searching for filesapplications connect to peer applications

    each application instance serves to:store selected filesroute queries (file searches) from and to its neighboring peersrespond to queries (serve file) if file stored locallyGnutella history:3/14/00: release by AOL, almost immediately withdrawntoo late: 23K users on Gnutella at 8 am this AMreverse engineered. many iterations to fix poor initial design

  • GnutellaSearching by flooding:If you dont have the file you want, query 7 of your neighbors.If they dont have it, they contact 7 of their neighbors, for a maximum hop count of 10.Requests are flooded, but there is no tree structure.No looping but packets may be received twice.Reverse path forwarding

    * Figure from http://computer.howstuffworks.com/file-sharing.htm

  • Gnutella fool.* ? TTL = 2

  • Gnutella TTL = 1TTL = 1IPX:fool.herfool.herXTTL = 1

  • Gnutella fool.youfool.meYIPY:fool.me fool.you

  • Gnutella IPY:fool.me fool.you

  • Gnutella: strengths and weaknessespros:flexibility in query processingcomplete decentralizationsimplicityfault tolerance/self-organizationcons:severe scalability problemssusceptible to attacksPure P2P system

  • Gnutella: initial problems and fixes2000: avg size of reachable network only 400-800 hosts. Why so small?modem users: not enough bandwidth to provide search routing capabilities: routing black holesFix: create peer hierarchy based on capabilitiespreviously: all peers identical, most modem black holespreferential connection:favors routing to well-connected peersfavors replies to clients that themselves serve large number of files: prevent freeloading

  • Kazaa (Fasttrack network)Hybrid of centralized Napster and decentralized Gnutellahybrid P2P system

    Super-peers act as local search hubsEach super-peer is similar to a Napster server for a small portion of the networkSuper-peers are automatically chosen by the system based on their capacities (storage, bandwidth, etc.) and availability (connection time)

    Users upload their list of files to a super-peerSuper-peers periodically exchange file listsYou send queries to a super-peer for files of interest

  • AnonymityNapster, Gnutella, Kazaa dont provide anonymityUsers know who they are downloading fromOthers know who sent a query

    FreenetDesigned to provide anonymity among other features

  • FreenetKeys are mapped to IDsEach node stores a cache of keys, and a routing table for some keysrouting table defines the overlay networkQueries are routed to the node with the most similar keyData flows in reverse path of queryImpossible to know if a user is initiating or forwarding a queryImpossible to know if a user is consuming or forwarding dataKeys replicated in (some) of the nodes along the path

  • Freenet routing

    K117 ?

    TTL = 8

    N01

    N02

    N03

    N04

    N07

    N06

    N05

    N08

    K124N02

    K317N08

    K613N05

  • Freenet routing

    K117 ?

    TTL = 7

    N01

    N02

    N03

    N04

    N07

    N06

    N05

    N08

    K124N02

    K317N08

    K613N05

    K514N01

    TTL = 6

    sorry...

  • Freenet routing

    K117 ?

    TTL = 7

    N01

    N02

    N03

    N04

    N07

    N06

    N05

    N08

    K124N02

    K317N08

    K613N05

    K514N01

    TTL = 6

    K100N07

    K222N06

    K617N05

    TTL = 5

  • Freenet routing

    K117 ?

    TTL = 7

    N01

    N02

    N03

    N04

    N07

    N06

    N05

    N08

    K124N02

    K317N08

    K613N05

    K514N01

    TTL = 6

    K100N07

    K222N06

    K617N05

    TTL = 5

    117

    117

  • Freenet routing

    K117 ?

    TTL = 7

    N01

    N02

    N03

    N04

    N07

    N06

    N05

    N08

    K124N02

    K317N08

    K613N05

    K514N01

    TTL = 6

    K117N07

    K100N07

    K222N06

    K617N05

    TTL = 5

    117

    117

    117

  • Freenet routing

    N01

    N02

    N03

    N04

    N07

    N06

    N05

    N08

    K117N07

    K124N02

    K317N08

    K613N05

    K514N01

    K117N07

    K100N07

    K222N06

    K617N05

    117

    117

    117

  • Freenet routing N01 N02 N03 N04 N07 N06 N05 N08K117 N08K124 N02K317 N08 K613 N05 K514 N01 K117 N08 K100 N07 K222 N06 K617 N05 117 117 117 Inserts are performed similarly they are unsuccessful queries

  • Freenet strengths and weaknessespros:complete decentralizationfault tolerance/self-organizationanonymityscalability (to some degree)cons:questionable efficiency & performancerare keys disappear from the systemimprovement of performance requires high overhead (maintenance of additional information for routing)