Introduction to Peer-to-Peer Networks


What is a P2P network?

• A P2P network is a large distributed system. It uses the vast resources of PCs distributed at the edge of the Internet to build a network that allows resource sharing without any central authority.

• Client-server vs. peer-to-peer: a peer is both a client and a server. Control is decentralized.

• Much more than a system for sharing pirated music.


Why does P2P need attention?


A P2P network is an overlay network

Network of peers. Each link between peers consists of one or more IP links. The overlay network resides in the application layer.
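The relationship between overlay links and IP links can be sketched in a few lines of Python. This is only an illustration: the peer names come from the slide, while the router names (`r1` to `r4`), the paths, and the helper functions are hypothetical.

```python
# Each overlay link (a pair of peers) is realized by a path of one or more IP links.
# The router names and paths below are invented for illustration.
overlay_links = {
    ("Alice", "Bob"): ["Alice", "r1", "r2", "Bob"],
    ("Bob", "Carol"): ["Bob", "r2", "r3", "r4", "Carol"],
    ("Alice", "Carol"): ["Alice", "Carol"],  # a single IP link
}


def ip_hops(link):
    """Number of IP links underlying one overlay link."""
    path = overlay_links[link]
    return len(path) - 1


def overlay_route_cost(route):
    """Total IP links traversed by a route given as a list of peers."""
    total = 0
    for a, b in zip(route, route[1:]):
        # Overlay links are treated as undirected in this sketch.
        key = (a, b) if (a, b) in overlay_links else (b, a)
        total += ip_hops(key)
    return total


print(ip_hops(("Alice", "Bob")))                      # one overlay hop, three IP links
print(overlay_route_cost(["Alice", "Bob", "Carol"]))  # two overlay hops
```

A route that looks short in the overlay can still cross many IP links, which is one reason overlay routing efficiency matters.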

[Figure: overlay network connecting the peers Alice, Bob, and Carol]


Well-known P2P Systems

• Napster

• Gnutella

• KaZaA

• eDonkey

• Chord

• Tapestry

• CAN

• Pastry

• BitTorrent


Some important issues

Search

Storage

Security

Applications


A Distributed Storage Service

[Figure: distributed storage service shared by the peers Alice, Bob, Carol, and David]


Promises

Consider File Sharing as an Example

– Available 24/7

– Durable despite machine failures

– Information is protected

– Resilient to Denial of Service


Additional Goals

• Massive scalability

• Anonymity

• Deniability

• Resistance to censorship


Challenges

• A P2P network must be self-organizing. Join and leave operations must be self-managed.

• The infrastructure is untrusted and the components are unreliable. The number of faulty nodes grows linearly with system size. Yet, the aggregate behavior has to be trustworthy.


Challenges (continued)

• Tolerance to failures and churn

• Efficient routing even if the structure of the network is unpredictable

• Dealing with free riders

• Load balancing

• Security issues


Looking up data

• How do you locate data/files/objects in a large P2P system built around a dynamic set of nodes, in a scalable manner, without any centralized server or hierarchy?

• Napster index servers used a central database: questionable scalability and poor resilience.

• For comparison, check how names are looked up in the Internet's DNS.


Napster

Developed by Shawn Fanning in 1999. Shut down after 2 years for copyright infringement. Centralized directory servers were a bottleneck.

[Figure: Napster architecture. Users reach a root/redirector and three directory servers over the Internet. The directory servers store indices of songs only.]
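A centralized directory of this kind can be sketched as follows. This is a minimal illustration, not Napster's actual protocol: the class name `DirectoryServer` and its methods are hypothetical. The point is that the server holds only an index from titles to peers, while the files themselves stay on the peers.

```python
from collections import defaultdict


class DirectoryServer:
    """Central index mapping a song title to the peers that announced it."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, peer, titles):
        """A peer announces the titles it is willing to share."""
        for title in titles:
            self._index[title.lower()].add(peer)

    def unregister(self, peer):
        """Remove a departing peer from every index entry."""
        for holders in self._index.values():
            holders.discard(peer)

    def lookup(self, title):
        """Return the peers holding the title; the transfer itself is peer to peer."""
        return sorted(self._index.get(title.lower(), set()))


directory = DirectoryServer()
directory.register("alice", ["Double Helix"])
directory.register("bob", ["double helix", "Another Song"])
print(directory.lookup("Double Helix"))
```

Every lookup passes through the same server, which is why the slides describe the central directory as a bottleneck and a single point of failure.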


Gnutella

Truly decentralized system. A search such as "Where is Double Helix?" is carried out by flooding the query over a graph of arbitrary topology. This has an obvious scalability problem, and the wasted bandwidth caused serious inefficiencies.
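The cost of flooding can be seen in a small simulation. This is a simplified sketch, not the Gnutella wire protocol: the graph, the file placement, and the function `flood_query` are hypothetical, and a hop limit (`ttl`) is assumed as the way to bound the flood.

```python
from collections import deque


def flood_query(graph, files, start, wanted, ttl):
    """Flood a query from `start`; return (peers holding `wanted`, messages sent)."""
    seen = {start}
    hits = []
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if wanted in files.get(node, ()):
            hits.append(node)
        if remaining == 0:
            continue  # hop limit reached, stop forwarding
        # Simplification: a node forwards to all neighbors, including the sender.
        for neighbor in graph[node]:
            messages += 1  # every forwarded copy costs bandwidth
            if neighbor not in seen:  # duplicate copies are dropped on arrival
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
files = {"E": {"double helix"}}

print(flood_query(graph, files, "A", "double helix", ttl=2))  # E is out of reach
print(flood_query(graph, files, "A", "double helix", ttl=3))  # E is found
```

Even in this five-node graph most messages are redundant copies, and a hop limit that is too small misses the file entirely.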


Gnutella graph

[Figure: Gnutella graph showing a client looking for "double helix" and a node labeled "double helix"]


Unstructured vs. Structured

• Unstructured P2P networks allow resources to be placed at any node. The network topology is arbitrary, and the growth is spontaneous.

• Structured P2P networks simplify resource location and load balancing by defining a topology and defining rules for resource placement.


Distributed Hash Table (DHT)

Object-to-machine mapping uses unique keys.

H(object name) = key (H = hash function)

H(machine name) = key

An object whose name is mapped to key k is placed in the machine whose name is mapped to key k.

Simplifies object location.
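The mapping can be sketched in Python as follows. Several choices here are assumptions rather than part of the slides: SHA-1 stands in for H, the keyspace size is set to 2^16, and, because an object key rarely coincides exactly with a machine key, the object is assigned to the machine with the first key at or after the object's key, wrapping around the keyspace. Chord uses a successor rule of this kind, but other DHTs define closeness differently.

```python
import hashlib
from bisect import bisect_left

KEYSPACE = 2 ** 16  # N in the slides; this particular size is an assumption


def H(name):
    """Hash a name (object or machine) to a key in 0 .. KEYSPACE-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % KEYSPACE


def responsible_machine(object_name, machines):
    """Return the machine whose key is the first at or after the object's key."""
    ring = sorted((H(m), m) for m in machines)
    keys = [k for k, _ in ring]
    # Wrap around to the first machine if the object key exceeds every machine key.
    i = bisect_left(keys, H(object_name)) % len(ring)
    return ring[i][1]


machines = ["alice", "bob", "carol", "david"]
print(H("double helix"), responsible_machine("double helix", machines))
```

Any peer that knows the hash function and the set of machine keys computes the same answer, so no flooding or central directory is needed to locate an object.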


Distributed Hash Table (DHT)

[Figure: basic idea. The keyspace runs from 0 to N-1 and contains the keys a, b, and c. A machine name and an object name are both hashed to key b.]