
Autonomous Replication for High Availability in Unstructured P2P Systems

Francisco Matias Cuenca-Acuna, Richard P. Martin, Thu D. Nguyen
Department of Computer Science, Rutgers University
April 2003

Contents

– Introduction
– Background: PlanetP
– Autonomous Replication
– Performance Evaluation
– Conclusions

Introduction (1)

Peer-to-peer (P2P) computing has become a powerful paradigm for sharing information across the Internet

Problem: providing high availability for shared data. Recent measurements suggest:

– Members of P2P communities may be offline more than they are online

Providing practical availability, say 99-99.9%, would be:
– Expensive storage-wise using traditional replication methods
– Expensive bandwidth-wise as peers leave and rejoin the community

Introduction (2)

Question: is it possible to place replicas of shared files in such a way that

– Files are highly available
– The continual movement of replicas to members currently online is not required

We propose a distributed replication algorithm

– Decisions are made entirely autonomously by individual members
– Only a small amount of loosely synchronized global state is needed

Introduction (3)

Assumptions

– Files are replicated in their entirety only when a member hoards them for disconnected operation
– Otherwise, files are replicated using an erasure code
– We study a very weakly structured system because tight coordination is difficult and costly in a dynamic environment

Background: PlanetP (1)

PlanetP is a publish/subscribe system
– Supports content-based search, ranking, and retrieval

Members publish documents when they wish to share them

Publishing a document
– The member gives PlanetP an XML snippet containing a pointer to the file
– PlanetP indexes the XML snippet and the file
– The local index is used to support content search

Background: PlanetP (2)

Two major components enable community-wide sharing

– A gossiping layer: peers periodically gossip about changes to keep shared data weakly consistent
– A content search, rank, and retrieval service

Two data structures are replicated on every peer

– Membership directory: contains the names and addresses of all current members
– Global content index: contains a term-to-peer mapping

Background: PlanetP (3)

To locate content (as sketched below)

– Users pose queries at a specific node
– That node identifies the subset of target peers using its local copy of the global content index
– The query is passed to these target peers
– The targets evaluate the query against their local indexes and return results (URLs for relevant documents)
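A minimal sketch of this lookup flow, assuming hypothetical in-memory stand-ins for the two replicated data structures (the names `membership_directory`, `global_index`, and `query_peer` are illustrative, not PlanetP's actual API):

```python
# Hypothetical stand-ins for PlanetP's two replicated structures:
# the membership directory (peer name -> address) and the global
# content index (term -> peers that have documents with that term).
membership_directory = {"peerA": "192.0.2.1", "peerB": "192.0.2.2"}
global_index = {"replication": {"peerA", "peerB"}, "gossip": {"peerB"}}

def locate_content(query_terms, query_peer):
    """Find target peers from the local copy of the global index, then
    forward the query; each target evaluates it against its own local
    index and returns URLs of relevant documents."""
    if not query_terms:
        return []
    # Peers indexed under every query term (an assumption; the real
    # query model may be looser than strict intersection).
    targets = set.intersection(*(global_index.get(t, set()) for t in query_terms))
    results = []
    for peer in targets:
        address = membership_directory[peer]
        results.extend(query_peer(address, query_terms))  # stand-in for an RPC
    return results
```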

Results have shown that PlanetP can easily scale to communities of several thousand peers

Autonomous Replication (1)

Member's hoard set
– Members hoard some subset of the shared files entirely on their local storage
– Members take responsibility for ensuring the availability of these hoarded files

Replicator
– A member trying to replicate an erasure-coded fragment of a file

Target
– The peer that the replicator asks to store the fragment

Replication store
– Excess storage space contributed by each member for replication

Autonomous Replication (2)

Each file is identified by a unique ID

Overall algorithm: each member (sketched below)
– Advertises the file IDs in its hoard set and the fragments in its replication store to the global index
– Periodically estimates the availability of its hoarded files and fragments (Estimating Availability)
– Every Tr time units, increases the availability of a file that is below its target availability (Randomized Replication)
– The target peer saves the incoming fragment, evicting stored fragments if necessary (Replacement Scheme)
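A sketch of this per-member loop; the member object and its helpers (`advertise`, `estimate_availability`, `pick_by_lottery`, `replicate_one_fragment`, `hoard_set`, `fragment_files`) are placeholders for the steps named above, not the paper's actual code, and the target value and period are illustrative:

```python
import time

TARGET_AVAILABILITY = 0.999   # e.g. three nines (illustrative target)
T_R = 10 * 60                 # replication period Tr in seconds (illustrative)

def member_loop(member):
    """Outline of each member's autonomous behaviour."""
    while True:
        # 1. Advertise hoarded file IDs and stored fragments to the global index.
        member.advertise(member.hoard_set, member.replication_store)

        # 2. Estimate availability of hoarded files and stored fragments.
        files = member.hoard_set | member.fragment_files
        estimates = {f: member.estimate_availability(f) for f in files}

        # 3. Every Tr time units, push one more fragment of a file that is
        #    below its target availability (randomized replication).
        below = [f for f, a in estimates.items() if a < TARGET_AVAILABILITY]
        if below:
            f = member.pick_by_lottery(below, estimates)  # favours low availability
            member.replicate_one_fragment(f)  # the chosen target runs its replacement scheme

        time.sleep(T_R)
```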

Estimating File Availability (1)

Files are replicated in two forms
– Entire (hoarded) copies
– Erasure-coded file fragments

Notation
– H(f): set of peers hoarding a copy of file f
– F(f): set of peers storing a fragment of f
– A(f): estimated availability of f

A file f is unavailable when both of the following hold (written out below)
– All peers in H(f) are simultaneously offline
– At least n - m + 1 of the peers in F(f) are offline, where n = |F(f)|
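Under the additional assumption that peers fail independently with known per-peer availabilities $a_i$ (an assumption made here for illustration; it is not stated on this slide), the two conditions translate into:

```latex
% f is unavailable iff every hoarder is offline AND fewer than m of the
% n = |F(f)| fragment holders are online (i.e. at least n - m + 1 are offline).
P(f~\text{unavailable})
  = \Big( \prod_{i \in H(f)} (1 - a_i) \Big)
    \cdot P\big( |\{\, j \in F(f) : j~\text{online} \,\}| < m \big),
\qquad
A(f) = 1 - P(f~\text{unavailable})
```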

Estimating File Availability (2)

H(f) and F(f) do not intersect
– When a peer adds a file for which it is storing a fragment to its hoard set, it ejects the fragment immediately

A(f) does not account for the possibility of duplicate fragments
– Acceptable because n >> m (a sketch of the estimate follows below)
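A small sketch of computing this estimate, consistent with the conditions on the previous slide but not necessarily the paper's exact formula; it assumes independent peers with known availabilities and uses a Poisson-binomial dynamic program for the fragment term:

```python
def prob_fewer_than_m_online(avails, m):
    """P(number of online peers < m) for independent peers with the
    given availabilities, via a Poisson-binomial dynamic program."""
    dist = [1.0]                      # dist[k] = P(exactly k peers online so far)
    for a in avails:
        new = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            new[k] += p * (1 - a)     # this peer offline
            new[k + 1] += p * a       # this peer online
        dist = new
    return sum(dist[:m])

def estimate_availability(hoarder_avails, fragment_holder_avails, m):
    """A(f) = 1 - P(all hoarders offline) * P(fewer than m fragments online).
    Assumes H(f) and F(f) are disjoint and all fragments are distinct."""
    p_hoarders_off = 1.0
    for a in hoarder_avails:
        p_hoarders_off *= (1 - a)
    return 1.0 - p_hoarders_off * prob_fewer_than_m_online(fragment_holder_avails, m)

# Example: one hoarder at 30% availability plus 15 fragment holders at 30%
# with m = 10 gives estimate_availability([0.3], [0.3] * 15, 10) of about 0.30.
```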

Randomized Replication (1)

Erasure codes (Reed-Solomon) provide data redundancy
– Divide a file into m fragments and recode them into n fragments (m < n)
– Generating all n fragments makes it possible to detect and regenerate specific lost fragments

Disadvantages in a highly dynamic environment
– Member availability changes over time: n must be changed, forcing re-fragmenting and re-replication of some files
– Peers leave: regenerating lost fragments requires accurate accounting of which peer is storing which fragment
– Peers temporarily go offline: regeneration introduces duplicate fragments

Randomized Replication (2)

Choose n >> m but do not generate all n fragments

To increase the availability of a file
– RANDOMLY generate an additional fragment from the set of n possible fragments (sketched below)

The chance of producing duplicate fragments is small if n is very large

No peer coordination is required
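A sketch of the fragment-selection step; `encode_fragment` is a placeholder for a Reed-Solomon codec that can produce any single fragment by index (not a real library call), and the value of n is illustrative:

```python
import random

N_POSSIBLE = 2 ** 16   # n >> m: large space of possible fragment indices (illustrative)
M = 10                 # any m distinct fragments reconstruct the file

def new_random_fragment(file_bytes, encode_fragment):
    """Pick a fragment index uniformly at random from the n possible ones
    and generate only that fragment; no coordination with other peers."""
    index = random.randrange(N_POSSIBLE)
    return index, encode_fragment(file_bytes, index, M, N_POSSIBLE)

def collision_probability(k, n=N_POSSIBLE):
    """Birthday-style probability that k independently drawn indices
    contain at least one duplicate; negligible when n >> k."""
    p_distinct = 1.0
    for i in range(k):
        p_distinct *= (n - i) / n
    return 1.0 - p_distinct

# e.g. with 30 fragments of one file in the community and n = 65536,
# collision_probability(30) is about 0.007.
```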

Replacement (1)

A target peer receives a replication request; if its store is full, it must
– Decide whether to accept the incoming fragment, OR
– Select other fragments to evict from its store

Fragments with the highest availability are chosen to make space
– A deterministic choice could repeatedly victimize fragments of the same file, causing drastic changes in that file's availability

We therefore propose a weighted random selection process

Replacement (2)

Policy
– Compute the average number of nines in the availability of the stored fragments
– If the incoming fragment's number of nines exceeds this average by more than 10%, reject the incoming fragment
– Otherwise, use lottery scheduling to select victim fragments
  Divide the tickets into two subsets with an 80:20 split
  Each fragment is assigned an equal share of the smaller subset
  Fragments with availability more than 10% above the average are also given a portion of the larger subset

Replacement (3)

Example– Target node has 3 fragments with availability 0.99, 0.9, 0.5– Number of nines => 2, 1, 0.3– Average availability in nines + 10% = 0.76– If we have 100 tickets

First fragment=67+6.6, Second fragment=13+6.6, Third fragment=0+6.6

– Chances of each fragment to be evicted First fragment=0.74, Second fragment=0.2, Third fragment=0.06

Use number of nines rather than the availability– It linearizes the differences between values
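A sketch of the ticket allocation that reproduces these numbers; the threshold (0.76 here) is passed in as computed above, and distributing the larger subset in proportion to each fragment's excess nines over the threshold is an inference from this example rather than something spelled out on the slides:

```python
import math
import random

def nines(availability):
    """Number of nines: 0.9 -> 1, 0.99 -> 2, 0.5 -> about 0.3."""
    return -math.log10(1.0 - availability)

def ticket_allocation(avails, threshold, total=100, big_share=0.8):
    """80:20 lottery tickets: every fragment gets an equal slice of the
    small subset; fragments whose nines exceed the threshold split the
    large subset in proportion to their excess over the threshold."""
    small = total * (1.0 - big_share) / len(avails)
    excess = [max(0.0, nines(a) - threshold) for a in avails]
    total_excess = sum(excess)
    return [small + (total * big_share * e / total_excess if total_excess else 0.0)
            for e in excess]

def pick_victim(avails, threshold):
    """Choose a fragment to evict, weighted by its ticket count."""
    tickets = ticket_allocation(avails, threshold)
    return random.choices(range(len(avails)), weights=tickets, k=1)[0]

# ticket_allocation([0.99, 0.9, 0.5], threshold=0.76)
#   -> about [73.7, 19.6, 6.7], i.e. eviction odds of roughly 0.74, 0.20, 0.07.
```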

Replicators (1)

Selecting a fragment to replicate
– As in replacement, lottery scheduling is used, favoring files with low availability

Finding a target peer (sketched below)
– Select a target randomly
– If the target does not have sufficient space, select another target
– Repeat this process up to five times
– If no target with space is found, randomly choose one of these five targets
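A sketch of the target-selection loop; `online_peers` and `has_free_space` are hypothetical stand-ins for information taken from the replicated membership directory and global index:

```python
import random

MAX_TRIES = 5

def find_target(online_peers, has_free_space, fragment_size):
    """Pick a random target; if it lacks space, retry, up to five times.
    If none of the tried peers has space, push to a random one of them
    anyway, forcing that peer to run its replacement scheme."""
    tried = []
    for _ in range(MAX_TRIES):
        peer = random.choice(online_peers)
        tried.append(peer)
        if has_free_space(peer, fragment_size):
            return peer
    return random.choice(tried)
```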

Experimental Environment (1)

An event-driven simulator is used

Assumptions
– Members replicate files at synchronous intervals
– The detailed timing of message transfers is not simulated
– The full PlanetP gossiping protocol is not simulated

To account for data staleness, file availability is re-estimated only once every 10 minutes

A Reed-Solomon code with m = 10 is used

Experimental Environment (2)

Three different P2P communities are evaluated
– File Sharing (FS): a very loosely coupled community sharing multimedia files
– Commercial (CO): a corporate/university environment
– Workgroup (WG): a distributed development group

Experimental Environment (3)

Parameters
– Per-peer mean uptime and downtime
– Peer arrivals to and exits from the community, modeled as an exponential arrival process
– Number of files per node
– Number of hoarded replicas per file
– Amount of excess space on each node
– File sizes

Comparison points
– BASE: nodes push and accept replicas in the complete absence of information on file availability
– OMNI: replication is driven by a central agent that maximizes the minimum file availability

Availability vs. Excess Storage

CO: only a small amount of excess space is required
FS: 8X excess capacity is needed for 3 nines of availability
File hoarding has little effect on file availability

(Figure: curves for no replication from hoarding, 25% of files with 2 hoarded replicas, and 25% of files with 5 hoarded replicas)

Overall Availability (CDF)

(Figures: availability CDFs for CO with 1X, 1.5X, and 2X; FS with 1X, 3X, and 6X; WG with 9X excess storage)

CO: with 2X excess storage, over 99% of files reach 3 nines of availability
FS: around 6X excess storage is needed for 3 nines of availability
WG: performs better if files and excess storage are uniformly distributed
– With a non-uniform distribution, replicators cannot easily find free space, because the peers with the most files to replicate also hold the most excess storage

Against BASE

(Figures: availability CDF for FS with 3X excess storage; number of fragments per file under BASE and under REP)

BASE: about 16% of files get less than a single nine of availability
The replacement policy increases fairness
BASE's FIFO replacement favors peers who are frequently online
– They push their files to less available peers even if the latter's content should be replicated more

Bandwidth Usage

CO
– REP: increasing excess space from 1X to 2X drops the average number of files replicated per hour from 10 to 0

FS
– REP: increasing excess space from 1X to 3X drops the average number of files replicated per hour from 241 to 0
– BASE: with 3X excess space, 816 files are replicated per hour

Conclusions

We addressed the question of increasing the availability of shared files

We studied a decentralized algorithm in a very loosely coordinated environment

Practical availability (99.9%) is achieved in a completely decentralized system despite low individual peer availability

Such availability levels do not require
– Sophisticated data structures
– Complex replication schemes
– Excessive bandwidth