1
School of Computing Science, Simon Fraser University
CMPT 880: Internet Architectures and Protocols
Introduction to Peer-to-Peer Systems
Instructor: Dr. Mohamed Hefeeda
3
P2P Computing: Definitions
Peers cooperate to achieve desired functions
- Peers:
• End-systems (typically, user machines)
• Interconnected through an overlay network
• Peer ≡ like the others (similar, or behaving in a similar manner)
- Cooperate:
• Share resources, e.g., data, CPU cycles, storage, bandwidth
• Participate in protocols, e.g., routing, replication, …
- Functions:
• File sharing, distributed computing, communications, content distribution, …
Note: the P2P concept is much wider than file sharing
5
When Did P2P Start?
Napster (late 1990s)
- Court shut Napster down in 2001
Gnutella (2000)
Then the killer FastTrack (Kazaa, ...)
BitTorrent, and many others
Accompanied by significant research interest
Claim
- P2P is much older than Napster!
Proof
- The original Internet!
- Remember UUCP (Unix-to-Unix copy)?
6
What IS and IS NOT New in P2P?
What is not new
- Concepts!
What is new
- The term P2P (maybe!)
- New characteristics of
• The nodes that constitute the system
• The system that we build
7
What IS NOT New in P2P?
Distributed architectures
Distributed resource sharing
Node management (join/leave/fail)
Group communications
Distributed state management
…
8
What IS New in P2P?
Nodes (Peers)
- Quite heterogeneous
• Several orders of magnitude difference in resources
• Compare the bandwidth of a dial-up peer versus a high-speed LAN peer
- Unreliable
• Failure is the norm!
- Offer limited capacity
• Load sharing and balancing are critical
- Autonomous
• Rational, i.e., they maximize their own benefit!
• Peers need incentives to cooperate in a way that optimizes system performance
9
What IS New in P2P? (cont’d)
System
- Scale
• Very large number of peers (millions)
- Structure and topology
• Ad hoc: no control over peer joining/leaving
• Highly dynamic
- Membership/participation
• Typically open
- More security concerns
• Trust, privacy, data integrity, …
- Cost of building and running
• Small fraction of same-scale centralized systems
• How much would it cost to build/run a supercomputer with the processing power of the 3 million SETI@home PCs?
10
What IS New in P2P? (cont’d)
So what?
- We need to design new, lighter-weight algorithms and protocols that scale to millions (or billions!) of nodes given the new characteristics
Question: why now, not two decades ago?
- We did not have such abundant (and underutilized) computing resources back then!
- And network connectivity was very limited
11
Why is it Important to Study P2P?
P2P traffic is a major portion of Internet traffic (50+%), current killer app
P2P traffic has exceeded web traffic (former killer app)!
Direct implications on the design, administration, and use of computer networks and network resources
- Think of ISP designers or campus network administrators
Many potential distributed applications
12
Sample P2P Applications
File sharing
- Gnutella, Kazaa, Napster, …
Distributed cycle sharing
- SETI@home, Genome@home, …
File and storage systems
- OceanStore, CFS, Freenet, Farsite, …
Media streaming and content distribution
- PROMISE
- SplitStream, CoopNet, PeerCast, Bullet, Zigzag, NICE, …
13
P2P vs its Cousin (Grid Computing)
Common goal:
- Aggregate resources (e.g., storage, CPU cycles, and data) into a common pool and provide efficient access to them
Differences along five axes [Foster & Iamnitchi 03]
- Target communities and applications
- Type of shared resources
- Scalability of the system
- Services provided
- Software required
14
P2P vs Grid Computing (cont’d)
Issue: Communities and applications
- Grid: established communities, e.g., scientific institutions; computationally intensive problems
- P2P: grass-roots communities (anonymous); mostly file swapping
Issue: Resources shared
- Grid: powerful and reliable machines, clusters; high-speed connectivity; specialized instruments
- P2P: PCs with limited capacity and connectivity; unreliable; very diverse
15
P2P vs Grid Computing (cont’d)
Issue: System scalability
- Grid: hundreds to thousands of nodes
- P2P: hundreds of thousands to millions of nodes
Issue: Services provided
- Grid: sophisticated services (authentication, resource discovery, scheduling, access control, and membership control); members usually trust each other
- P2P: limited services (resource discovery); limited trust among peers
Issue: Software required
- Grid: sophisticated suite, e.g., Globus, Condor
- P2P: simple (screen saver), e.g., Kazaa, SETI@home
16
P2P vs Grid Computing: Discussion
The differences mentioned are based on the traditional view of each paradigm
- In the future, both paradigms are expected to converge and complement each other [e.g., Butt et al. 03]
Target communities and applications
- Grid: is going open
Type of shared resources
- P2P: is to include various and more powerful resources
Scalability of the system
- Grid: is to increase number of nodes
Services provided
- P2P: is to provide authentication, data integrity, trust management, …
17
P2P Systems: Simple Model
[Figure: software architecture model on a peer. Layered stack, top to bottom: P2P Application, Middleware, P2P Substrate, Operating System, Hardware]
System architecture: peers form an overlay according to the P2P substrate
18
Overlay Network
An abstract layer built on top of the physical network
Neighbors in the overlay can be several hops away in the physical network
Why do we need overlays?
- Flexibility in
• Choosing neighbors
• Forming and customizing topology to fit application needs (e.g., short delay, reliability, high BW, …)
• Designing communication protocols among nodes
- Get around limitations in legacy networks
- Enable new (and old!) network services
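To make the point about overlay neighbors concrete, here is a minimal sketch that measures how many physical links a single overlay link spans. The topology, the host and router names (`A`, `D`, `r1` to `r3`), and both function names are assumptions made up for this illustration; they do not come from the slides.

```python
from collections import deque

# Hypothetical physical topology: hosts A and D attach to routers r1..r3.
PHYSICAL = {
    "A": ["r1"],
    "r1": ["A", "r2"],
    "r2": ["r1", "r3"],
    "r3": ["r2", "D"],
    "D": ["r3"],
}

# Overlay links are chosen by the application, not dictated by the wiring.
OVERLAY = {"A": ["D"], "D": ["A"]}


def physical_hops(graph, src, dst):
    """Return the number of physical links on a shortest path (plain BFS)."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # dst is unreachable from src


def overlay_link_costs(overlay, physical):
    """Map every overlay link (u, v) to its length in physical hops."""
    return {
        (u, v): physical_hops(physical, u, v)
        for u, nbrs in overlay.items()
        for v in nbrs
    }
```

In this toy topology, A and D are direct neighbors in the overlay, yet `physical_hops(PHYSICAL, "A", "D")` returns 4. An overlay that wants short delay would use a measure like this (or measured latency) when choosing neighbors.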
20
Overlay Network (cont’d)
Some applications that use overlays
- Application-level multicast, e.g., ESM, Zigzag, NICE, …
- Reliable inter-domain routing, e.g., RON
- Content Distribution Networks (CDN)
- Peer-to-peer file sharing
Overlay design issues
- Select neighbors
- Handle node arrivals, departures
- Detect and handle failures (nodes, links)
- Monitor and adapt to network dynamics
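The design issues above (arrivals, departures, failure detection) can be sketched with a small neighbor table. This is only one possible approach, assuming that peers exchange periodic heartbeats and that silence longer than a timeout is treated as failure. The class name `NeighborTable` and its methods are hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
class NeighborTable:
    """Tracks overlay neighbors and evicts those that stop sending heartbeats."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence before a peer is presumed failed
        self.last_seen = {}      # peer id -> time of the last heartbeat

    def join(self, peer, now):
        """Handle a node arrival."""
        self.last_seen[peer] = now

    def leave(self, peer):
        """Handle a graceful departure."""
        self.last_seen.pop(peer, None)

    def heartbeat(self, peer, now):
        """Record a liveness message from a known neighbor."""
        if peer in self.last_seen:
            self.last_seen[peer] = now

    def evict_failed(self, now):
        """Remove and return neighbors that have been silent for too long."""
        failed = sorted(
            p for p, t in self.last_seen.items() if now - t > self.timeout
        )
        for p in failed:
            del self.last_seen[p]
        return failed
```

A peer would call `evict_failed` periodically and then select replacement neighbors. The timeout trades detection speed against false suspicion of slow but live peers.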
23
Peer Software Architecture Model
A software client installed on each peer
Three components:
- P2P Substrate
- Middleware
- P2P Application
[Figure: software architecture model on a peer. Layered stack, top to bottom: P2P Application, Middleware, P2P Substrate, Operating System, Hardware]
24
Peer Software Architecture Model (cont’d)
P2P Substrate (key component)
- Overlay management
• Construction
• Maintenance (peer join/leave/fail and network dynamics)
- Resource management
• Allocation (storage)
• Discovery (routing and lookup)
Can be classified according to the flexibility of placing objects at peers
25
P2P Substrates: Classification
Structured (or tightly controlled, DHT)
− Objects are rigidly assigned to specific peers
− Looks like a Distributed Hash Table (DHT)
− Efficient search & guarantee of finding
− Lack of partial name and keyword queries
− Maintenance overhead
− Ex: Chord, CAN, Pastry, Tapestry, Kademlia (Overnet)
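As a rough illustration of "objects are rigidly assigned to specific peers", the sketch below hashes peer names and object keys onto one identifier ring and assigns each key to the first peer at or after it. This successor rule is a simplification in the spirit of Chord, not the scheme of CAN or of any particular system listed above. The class `TinyDHT`, the helper `ring_id`, and the 16-bit identifier space are assumptions chosen to keep the example small.

```python
import hashlib
from bisect import bisect_left


def ring_id(name, bits=16):
    """Hash a string onto a ring of 2**bits identifiers."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class TinyDHT:
    """Assigns each key to the first peer clockwise from the key's identifier."""

    def __init__(self, peers):
        self.ring = sorted((ring_id(p), p) for p in peers)
        self.ids = [i for i, _ in self.ring]

    def responsible_peer(self, key):
        k = ring_id(key)
        # First peer whose identifier is >= k, wrapping around past the end.
        idx = bisect_left(self.ids, k) % len(self.ring)
        return self.ring[idx][1]
```

Every node that knows the same peer set computes the same answer for the same key, which is where the search guarantee comes from. The lookup needs the exact key, though: a partial name such as `"song"` hashes to an unrelated identifier, which is why keyword queries are not supported directly.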
Unstructured (or loosely controlled)
− Objects can be anywhere
− Support partial name and keyword queries
− Inefficient search & no guarantee of finding
− Some heuristics exist to enhance performance
− Ex: Gnutella, Kazaa (super node), GIA [Chawathe et al. 03]
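The unstructured trade-off can be sketched with a TTL-limited flood, loosely modeled on how Gnutella-style queries spread. This is not the actual Gnutella protocol. The function name `flood_search`, the four-peer topology, and the file names are assumptions made up for this example.

```python
from collections import deque


def flood_search(neighbors, files, start, keyword, ttl):
    """Forward a keyword query hop by hop until the TTL runs out.

    Returns (peer, filename) pairs for every match that was reached.
    """
    hits = []
    seen = {start}
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        # Substring match: partial-name queries work naturally.
        hits.extend((peer, f) for f in files.get(peer, []) if keyword in f)
        if remaining == 0:
            continue  # query expires here and is not forwarded further
        for nxt in neighbors.get(peer, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, remaining - 1))
    return hits


# Hypothetical chain-shaped overlay: a - b - c - d
NEIGHBORS = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
FILES = {"b": ["jazz_live.mp3"], "d": ["jazz_studio.mp3"]}
```

With `ttl=2`, a query for `"jazz"` from peer `a` finds only the copy at `b`; the copy at `d` exists but is out of reach, so there is no guarantee of finding an object. Raising the TTL to 3 reaches `d` at the cost of more messages.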
26
Peer Software Architecture Model (cont’d)
Middleware
- Provides auxiliary services to the P2P application, e.g.,
• Peer selection
• Trust management
• Data integrity validation
• Authentication and authorization
• Membership management
• Accounting (economics and rationality)
• …
- Ex: CollectCast, EigenTrust, micropayments
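One of the listed services, data integrity validation, can be sketched as follows. The sketch assumes the peer already holds a trusted list of per-chunk SHA-256 digests; how that list is obtained is outside this example. The function names are hypothetical and do not describe how CollectCast or EigenTrust work.

```python
import hashlib


def chunk_digest(data: bytes) -> str:
    """Return the SHA-256 digest of one chunk as a hex string."""
    return hashlib.sha256(data).hexdigest()


def validate_chunks(chunks, expected_digests):
    """Return the indices of chunks whose digest does not match the trusted one."""
    return [
        i
        for i, (chunk, digest) in enumerate(zip(chunks, expected_digests))
        if chunk_digest(chunk) != digest
    ]
```

Chunks flagged by `validate_chunks` would be discarded and requested again, ideally from a different peer.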
27
Peer Software Architecture Model (cont’d)
P2P Application
- Potentially, there could be multiple applications running on top of a single P2P substrate
- Applications include
• File sharing
• File and storage systems
• Distributed cycle sharing
• Content distribution
- This layer provides some functions and bookkeeping relevant to the target application
• File assembly (file sharing)
• Buffering and rate smoothing (streaming)
Ex: Promise, Bullet, CFS, Gnutella, Kazaa
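The file assembly bookkeeping mentioned above can be sketched as follows, assuming a file is split into numbered chunks that may arrive out of order from different peers. The function name `assemble_file` and the chunk-index scheme are assumptions for illustration only.

```python
def assemble_file(received, total_chunks):
    """Join chunks in index order once all of them have arrived.

    `received` maps chunk index -> bytes. Returns (data, missing), where
    data is None while any chunk is still missing.
    """
    missing = [i for i in range(total_chunks) if i not in received]
    if missing:
        return None, missing
    data = b"".join(received[i] for i in range(total_chunks))
    return data, []
```

The `missing` list tells the application which chunks still have to be requested from other peers.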
28
Outline of the Rest of the Introduction
P2P Substrates
- Structured (DHT)
• Example: CAN
- Unstructured
• Example 1: Gnutella
• Example 2: Kazaa
Middleware and P2P Application
- Example: CollectCast and Promise
Course Roadmap:
- Papers flash overview (1-2 min each!)
Project discussion
29
Summary
In the P2P computing paradigm:
- Peers cooperate to achieve desired functions
Started (or was re-discovered) with Napster (late 1990s)
Old, well-researched distributed concepts
BUT, with new characteristics (e.g., heterogeneity, unreliability, rationality, scale, ad hoc), new and lighter-weight algorithms are needed
Simple model for P2P systems:
- Peers form an abstract layer called an overlay
- A peer software client may have three components
• P2P substrate, middleware, and P2P application
• Borders between components may be blurred