On Improving the Performance Dependability of Unstructured P2P Systems via Replication
ANIRBAN MONDAL, YI LIFU, MASARU KITSUREGAWA
Institute of Industrial Science, University of Tokyo.
E-mail: [email protected]
PRESENTATION OUTLINE
Introduction
Related Work
System Overview
Proposed Replication Scheme
Performance Evaluation
Conclusion and Future Work
INTRODUCTION
P2P systems are becoming increasingly popular.
A dependable P2P system is the need of the hour.
Two perspectives of dependability:
  System reliability: the availability of the individual peers
  System performance: data availability
We define a performance-dependable P2P system as one that the users can rely on for obtaining data files of their interest in real-time.
We focus on improving the performance-dependability of unstructured P2P systems via dynamic replication.
Motivation
Free-riders
  A majority of the peers typically download data from a small percentage of peers that offer data.
High skews in the initial data distribution
  A disproportionately high number of queries need to be answered by a few ‘hot’ peers.
  Severe load imbalance throughout the system.
Job queues of the ‘hot’ peers keep increasing.
  Increased waiting times lead to high response times.
This decreases the dependability of the system.
The Challenges
Sheer size of P2P networks
Heterogeneity
  CPU capacity
  Available disk space
  Transfer rate of connections
Dynamism of the environment
  Peers joining / leaving the system
  Hot data becoming cold and vice versa
MAIN CONTRIBUTIONS
A dynamic data placement strategy involving data replication
  Objective: to reduce the loads of the overloaded peers
A dynamic query redirection technique
  Objective: to reduce response times
RELATED WORK
Broadcast (Gnutella)
Centralized (Napster)
Routing indices [Crespo2002]
Distributed hash tables
  Chord [Stoica2001]
  Pastry [Rowstron2001]
RELATED WORK (CONT.)
[Kangasharju2002] investigates optimal replication of content in P2P systems.
  Adaptive, fully distributed algorithm that dynamically replicates content in a near-optimal manner
[Cohen2002, Lv2002] facilitate search via replication.
Dependability via load-balancing in structured P2P systems (using DHTs) [Dabek2001][Rao2003]
[Triantafillou2003]
  Divides the system into clusters based on semantic categories
  Discusses dependability via inter-cluster and intra-cluster load-balancing
How does this proposal differ from our previous spatial GRID proposal?
Our GRID-related work
  Imposes structure on system
  Data movement in KB range
  Data scattering avoidance
  Individual nodes are usually dedicated and expected to be available most of the time.
  Main aim is load-balancing
This proposal
  No structure imposed
  Data movement in MB/GB
  Data scattering is ok
  Individual nodes may join/leave anytime.
  Replication, not load-balancing
SYSTEM OVERVIEW
Each peer is assigned a globally unique identifier PID.
Broadcast-based search
Every peer maintains its own access statistics.
  Number of accesses made to each of its data files
  List of peers which have downloaded each of its files
  Given that very ‘hot’ files may be aggressively downloaded by hundreds of peers very quickly, a peer keeps track only of those peers which have directly downloaded from itself.
Every peer provides a certain amount Space of its disk space for replication.
  LRU scheme deployed for Space
  Periodic deletion of unused replicas
We sacrifice replica consistency for improving query response times.
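The LRU management of Space mentioned on this slide can be illustrated with a minimal sketch. The class name `ReplicaStore`, its methods, and the byte-based accounting are assumptions made for illustration; the slides do not specify how Space is tracked.

```python
from collections import OrderedDict
from typing import List


class ReplicaStore:
    """Hypothetical LRU-managed replica area of `capacity` bytes (the slide's Space)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        # Maps file id -> size in bytes; least recently used entry comes first.
        self._files = OrderedDict()

    def access(self, file_id: str) -> bool:
        """Record a hit on a replica; return False if it is not stored here."""
        if file_id not in self._files:
            return False
        self._files.move_to_end(file_id)  # mark as most recently used
        return True

    def add(self, file_id: str, size: int) -> List[str]:
        """Store a replica, evicting least recently used replicas as needed.

        Returns the ids of the evicted replicas.
        """
        if size > self.capacity:
            raise ValueError("replica is larger than the whole replication area")
        if file_id in self._files:
            # Re-adding an existing replica: release its old space first.
            self.used -= self._files.pop(file_id)
        evicted = []
        while self.used + size > self.capacity:
            old_id, old_size = self._files.popitem(last=False)
            self.used -= old_size
            evicted.append(old_id)
        self._files[file_id] = size
        self.used += size
        return evicted
```

In this sketch a peer would call `access` whenever a query is served from a replica and `add` when it agrees to host a new one; the periodic deletion of unused replicas is not modelled.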
SYSTEM OVERVIEW (CONT.)
Distance between two peers: communication time between them
Two peers are regarded as neighbours if they are directly connected to each other.
Periodic exchange of status messages between neighbours
  Load information
  Available disk space information
SYSTEM OVERVIEW (CONT.)
Load of a peer: number of queries waiting in peer’s job queue
  Load normalized w.r.t. CPU capacity
Assumptions
  Peers know transfer rates between themselves and other peers.
  Every peer knows availability information of its neighbouring peers.
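As a small worked example of the load definition, the sketch below divides the job-queue length by a relative CPU capacity. The slides only say that load is normalized with respect to CPU capacity; the division and the function name `normalized_load` are assumptions.

```python
def normalized_load(queue_length: int, cpu_capacity: float) -> float:
    """Queue length scaled by relative CPU capacity (assumed normalization)."""
    if cpu_capacity <= 0:
        raise ValueError("cpu_capacity must be positive")
    return queue_length / cpu_capacity


# A peer twice as fast with 20 queued queries is less loaded
# than a baseline peer with 15 queued queries.
fast_peer = normalized_load(20, 2.0)
slow_peer = normalized_load(15, 1.0)
```

Under this assumption, comparing normalized loads rather than raw queue lengths keeps a powerful peer from being treated as overloaded merely because it holds more pending queries.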
Replication Scheme
Each peer P periodically checks its neighbours’ loads.
If P’s load exceeds the average load of its neighbouring peers by 10%, replication is initiated.
Selection of hot data files
  Using recent access statistics information
  P sorts its files in descending order of access frequencies.
  P traverses this sorted list of data files and selects as ‘hot’ files the top N files whose access frequency exceeds a pre-defined threshold Tfreq.
Number of replicas
  For every Nd accesses to D, a new replica is created for D.
  Tfreq and Nd are pre-specified at design time.
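The trigger, the hot-file selection, and the replica count described on this slide can be sketched as below. The function names are hypothetical, and "exceeds the average by 10%" is read here as "greater than 1.1 times the neighbours' average load", which is one plausible interpretation of the slide.

```python
from typing import Dict, List


def should_replicate(own_load: float, neighbour_loads: List[float],
                     margin: float = 0.10) -> bool:
    """True if own load exceeds the neighbours' average by more than `margin`."""
    if not neighbour_loads:
        return False  # assumption: no neighbours, nothing to compare against
    average = sum(neighbour_loads) / len(neighbour_loads)
    return own_load > average * (1 + margin)


def select_hot_files(access_counts: Dict[str, int], t_freq: int,
                     top_n: int) -> List[str]:
    """Top N files, by descending access count, whose count exceeds Tfreq."""
    ranked = sorted(access_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [file_id for file_id, count in ranked[:top_n] if count > t_freq]


def replicas_due(accesses: int, n_d: int) -> int:
    """Number of replicas warranted: one per Nd accesses to the file."""
    return accesses // n_d
```

For instance, with Tfreq = 10 and N = 2, a peer holding files accessed 50, 30, 20 and 5 times would select the first two as ‘hot’.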
Criteria for Selection of destination peer Dest for replication
Dest should have a high probability of being online.
Dest should have adequate available disk space.
Load difference with Dest should be significant.
Transfer time TRep with Dest should be minimized.
  Dest should be chosen from the peers which have already downloaded that data file.
  This makes TRep effectively equal to 0.
Replication Strategy
For each ‘hot’ data file D, the ‘hot’ peer PHot sends a message to each peer which has downloaded D.
The peers in which a copy of D exists reply to PHot with their respective load and available disk space.
Only the peers with high availability and sufficient available disk space are candidates.
Among these candidate peers, PHot first puts the peer MIN with the lowest load into a set Candidate.
Peers whose normalized load difference with MIN is less than δ are also put into Candidate.
  δ is a small integer.
The peer in Candidate whose available disk space is maximum is selected as the destination peer.
Algorithm for selecting the destination peer
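The selection steps listed under Replication Strategy can be sketched as follows. This is an illustration, not the authors' algorithm figure: the `PeerStatus` record and the `min_availability` and `min_free_space` thresholds are assumptions, since the slides do not quantify "high availability" or "sufficient" disk space.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PeerStatus:
    pid: str
    load: float          # normalized load reported by the peer
    free_space: int      # available disk space in bytes
    availability: float  # assumed estimate of the probability of being online


def select_destination(downloaders: List[PeerStatus], min_free_space: int,
                       delta: float,
                       min_availability: float = 0.8) -> Optional[PeerStatus]:
    """Pick the destination peer among peers that already hold a copy of D."""
    # Step 1: keep only highly available peers with sufficient disk space.
    eligible = [p for p in downloaders
                if p.availability >= min_availability
                and p.free_space >= min_free_space]
    if not eligible:
        return None
    # Step 2: MIN is the least loaded peer; Candidate also holds every peer
    # whose load difference with MIN is less than delta.
    min_load = min(p.load for p in eligible)
    candidate = [p for p in eligible
                 if p.load == min_load or p.load - min_load < delta]
    # Step 3: among Candidate, choose the peer with the most free disk space.
    return max(candidate, key=lambda p: p.free_space)
```

Returning `None` when no peer qualifies is a choice made for this sketch; the slides do not say what PHot does in that case.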
Query Redirection to replicas
What happens when a peer PIssue issues a query Q for a data item D to a ‘hot’ peer PHot?
  PHot needs to redirect Q to a peer REDIRECT containing D’s replica, if any such replica exists.
  Objective: to minimize Q’s response time
PHot checks the list of peers having D’s replica.
Selection criteria for query redirection
  REDIRECT should be highly available.
  Load difference between PHot and REDIRECT should be significant.
  Transfer time between REDIRECT and PIssue should be low.
Query Redirection (Cont.)
The ‘hot’ peer PHot first selects a set of peers
  which contain a replica of the data file D
  whose load difference with itself exceeds TDiff.
TDiff is a parameter which is application-dependent and subjective.
Among these selected peers, the peer with the maximum transfer rate with the query issuing peer PIssue is selected for query redirection.
Query redirection algorithm
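The redirection rule described on the preceding slides can be sketched as follows. The `ReplicaHolder` record, the `rate_to_issuer` mapping (transfer rate from each replica holder to PIssue), and the `min_availability` threshold are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReplicaHolder:
    pid: str
    load: float          # normalized load of the replica holder
    availability: float  # assumed estimate of the probability of being online


def select_redirect(hot_load: float, holders: List[ReplicaHolder],
                    rate_to_issuer: Dict[str, float], t_diff: float,
                    min_availability: float = 0.8) -> Optional[str]:
    """Return the PID of REDIRECT, or None if no replica holder qualifies."""
    # Keep highly available holders whose load is lower than PHot's
    # by more than TDiff and whose transfer rate to PIssue is known.
    eligible = [h for h in holders
                if h.availability >= min_availability
                and hot_load - h.load > t_diff
                and h.pid in rate_to_issuer]
    if not eligible:
        return None
    # Among them, pick the holder with the highest transfer rate to PIssue.
    best = max(eligible, key=lambda h: rate_to_issuer[h.pid])
    return best.pid
```

In this sketch a `None` result would mean that PHot answers the query itself; that fallback is an assumption, as the slides only cover the case where a suitable replica exists.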
Performance Evaluation
Investigates the following
  Effect of variations in workload skew
  Effect of variations in number of peers
Performance metric: Average Response Time
PARAMETERS USED IN PERFORMANCE EVALUATION
Average response times at the hot nodes
Snapshot of load distribution
Snapshot of load distribution
Effect of varying the Workload Skew
Effect of varying the number of peers
CONCLUSION AND FUTURE WORK
We have proposed a strategy for enhancing the dependability of P2P systems via dynamic replication.
Our strategy takes free-riders into account.
Our performance evaluation demonstrates the effectiveness of our replication-based strategy.
Future Scope of Work
  Dealing with very large data items e.g., video files
  Cost-effective integration into existing P2P systems
  Load-balancing