On Improving the Performance Dependability of Unstructured P2P Systems via Replication

On Improving the PerformanceDependability of Unstructured P2P

Systems via Replication

ANIRBAN MONDALANIRBAN MONDALYI LIFU YI LIFU

MASARU KITSUREGAWAMASARU KITSUREGAWAInstitute of Industrial Science,Institute of Industrial Science,

University of Tokyo.University of Tokyo.E-mail: [email protected]: [email protected]

PRESENTATION OUTLINEPRESENTATION OUTLINE

IntroductionIntroduction Related WorkRelated Work System OverviewSystem Overview Proposed Replication SchemeProposed Replication Scheme Performance EvaluationPerformance Evaluation Conclusion and Future WorkConclusion and Future Work

INTRODUCTIONINTRODUCTION P2P systems are becoming increasingly popular

A dependable P2P system is the need of the hour Two perspectives of dependability

system reliabilitythe availability of the individual peers

system performance data availability

We define a performance-dependable P2P system as one that the users can rely on for obtaining data files of their interest in real-time.

We focus on improving the performance-dependability of unstructured P2P systems via dynamic replication.

MotivationMotivationFree-riders

A majority of the peers typically download data from a small percentage of peers that offer data

High skews in the initial data distribution

A disproportionately high number of queries need to be answered by a few ‘hot’ peersSevere load imbalance throughout the system.

Job queues of the ‘hot’ peers keep increasingIncreased waiting times high response times

MotivationMotivationFree-riders

A majority of the peers typically download data from a small percentage of peers that offer data

High skews in the initial data distribution

A disproportionately high number of queries need to be answered by a few ‘hot’ peersSevere load imbalance throughout the system.

Job queues of the ‘hot’ peers keep increasingIncreased waiting times high response times

This decreases the dependability of the system.

The ChallengesThe Challenges

Sheer size of P2P networks Heterogeneity

CPU capacity Available disk space Transfer rate of connections

Dynamism of the environmentPeers joining / leaving the systemPeers joining / leaving the systemHot data becoming cold and vice versaHot data becoming cold and vice versa

MAIN CONTRIBUTIONSMAIN CONTRIBUTIONS

A dynamic data placement strategy involving data replicationObjective: to reduce the loads of the

overloaded peersA dynamic query redirection technique

Objective: to reduce response times



RELATED WORKRELATED WORK

Broadcast (Gnutella)Broadcast (Gnutella)Centralized (Napster)Centralized (Napster)Routing indices [Crespo2002]Routing indices [Crespo2002]Distributed hash tables Distributed hash tables

Chord [Stoica2001]Chord [Stoica2001]Pastry [Rowstron2001]Pastry [Rowstron2001]

RELATED WORK (CONT.)RELATED WORK (CONT.)

[Kangasharju2002] investigates optimal replication of content in P2P

systemsadaptive, fully distributed algorithm that dynamically

replicates content in a near-optimal manner [Cohen2002, Lv2002] facilitate search via replication. Dependability via load-balancing in structured P2P

systems (using DHTs) [Dabek2001][Rao2003]

[Triantafillou2003] divides system into clusters based on semantic categories discusses dependability via inter-cluster and intra-cluster load-

balancing

How this proposal differs from our previous spatial GRID proposal?

Our GRID-related work Imposes structure on system Data movement in KB range Data scattering avoidance Individual nodes are usually

dedicated and expected to be available most of the time.

Main aim is load-balancing

This proposal No structure imposed Data movement in MB/GB Data scattering is ok

Individual nodes may join/leave anytime.

Replication, not load-balancing



SYSTEM OVERVIEWSYSTEM OVERVIEW

Each peer is assigned a globally unique identifier PID Broadcast-based search Every peer maintains its own access statistics

Number of accesses made to each of its data files. List of peers which has downloaded each of its files Given that very ‘hot’ files may be aggressively downloaded

by hundreds of peers very quickly, a peer keeps track only of those peers which have directly downloaded from itself.

Every peer provides a certain amount Space of its disk space for replication. LRU scheme deployed for Space

Periodic deletion of unused replicas We sacrifice replica consistency for improving query response

times.

SYSTEM OVERVIEW (CONT.)SYSTEM OVERVIEW (CONT.)

Distance between two peers: communication time between them

Two peers are regarded as neighbours if they are directly connected to each other.

Periodic exchange of status messages between neighboursLoad informationAvailable disk space information

SYSTEM OVERVIEW (CONT.)SYSTEM OVERVIEW (CONT.)

Load of a peer: number of queries waiting in peer’s job queueLoad normalized w.r.t. CPU capacity

AssumptionsPeers know transfer rates between themselves and

other peers.Every peer knows availability information of its

neighbouring peers.



Replication Scheme Each peer P periodically checks its neighbours’ loads If P’s load exceeds the average loads of its

neighbouring peers by 10%, replication is initiated. Selection of hot data files

Using recent access statistics informationP sorts its files in desc. order of access frequenciesP traverses this sorted list of data files and selects

as ‘hot’ files the top N files whose access frequency exceeds a pre-defined threshold Tfreq.

Number of replicas For every Nd accesses to D, a new replica is

created for D. Tfreq and Nd are pre-specified at design time.

Criteria for Selection of destination peer Dest for replication

Dest should have a high probability of being online. PDest should have adequate available disk space. Load difference with Dest should be significant. Transfer time TRep with Dest should be minimized.

Dest should be chosen from the peers which have already downloaded that data file.

This makes TRep effectively equal to 0.

Replication StrategyReplication Strategy For each ‘hot’ data file D, the ‘hot’ peer PHot sends a

message to each peer which has downloaded D The peers in which a copy of D exists reply to PHot

with their respective load and available disk space Only the peers with high availability and sufficient

available disk space are candidates Among these candidate peers, PHot first puts the peer

MIN with the lowest load into a set Candidate. Peers whose normalized load difference with MIN is

less than δ are also put into Candidate. δ is a small integer

The peer in Candidate whose available disk space is maximum is selected as the destination peer.

Algorithm for selecting the destination peer

Query Redirection to replicas

What happens when a peer PIssue issues a query Q for a data item D to a ‘hot’ peer PHot?PHot needs to redirect Q to a peer REDIRECT

containing Di’s replica, if any such replica exists. Objective: To minimize Q’s response time

PHot checks the list of peers having Di’s replica Selection criteria for query redirection

REDIRECT should be highly available.Load difference between PHot and REDIRECT

should be significant.Transfer time between REDIRECT and PIssue

should be low.

Query Redirection (Cont.)

The ‘hot’ peer PHot first selects a set of peerswhich contain a replica of the data file Dwhose load difference with itself exceeds TDiff

.TDiff is a parameter which is application-

dependent and subjective. Among these selected peers, the peer with the

maximum transfer rate with the query issuing peer PIssue is selected for query redirection.

Query redirection algorithm



Performance EvaluationPerformance Evaluation

Investigates the followingInvestigates the followingEffect of variations in workload skewEffect of variations in workload skewEffect of variations in number of peersEffect of variations in number of peers

Performance metric:Performance metric:Average Response TimeAverage Response Time

PARAMETERS USED IN PERFORMANCE EVALUATION

Average Responsetimes at the hot nodes

Snapshot of Loaddistribution

Snapshot of Loaddistribution

Effect of varying the Workload Skew

Effect of varying the number of peers



CONCLUSION AND FUTURE WORKCONCLUSION AND FUTURE WORK

We have proposed a strategy for enhancing the dependability of P2P systems via dynamic replication.

Our strategy takes free-riders into account. Our performance evaluation demonstrates the

effectiveness of our replication-based strategy. Future Scope of Work

Dealing with very large data items e.g., video filesCost-effective integration into existing P2P systemsLoad-balancingLoad-balancing

Documents

On Improving the Performance Dependability of Unstructured P2P Systems via Replication