[IEEE 2009 15th International Conference on Parallel and Distributed Systems - Shenzhen, China (2009.12.8-2009.12.11)] 2009 15th International Conference on Parallel and Distributed

A Replica Placement Algorithm for Hybrid CDN-P2P Architecture

Hai Jiang1, 3, Zhan Wang1, 3, Albert K. Wong2, Jun Li1, Zhongcheng Li1

1Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 2Hong Kong University of Science and Technology, Hong Kong, China 3Graduate University of Chinese Academy of Sciences, Beijing, China

{jianghai, wangzhan}@ict.ac.cn, [email protected] , {lijun, zcli}@ict.ac.cn

Abstract

The Hybrid CDN-P2P architecture, or HCDN, which combines the complementary advantages of CDN and P2P networks, has been proposed to reduce the deployment cost and to improve the quality of service in file sharing and video streaming applications. A replica placement algorithm (RPA) decides where to replicate the specific data. Existing RPAs for pure CDN do not work efficiently in the HCDN architecture because they do not take into consideration the contribution of the peers at the P2P distribution level.

In this paper, we propose a heuristic RPA for HCDN that takes into account the effects of P2P distribution. We evaluate the performance of the proposed algorithm and analyze the impact of several key parameters. The experimental results show clear performance benefits for our approach.

1. Introduction

A key challenge for Internet infrastructure has been the need to deliver increasingly large volumes of content to a rapidly growing user population. CDN and P2P have been playing important roles in addressing this challenge. CDN disseminates content strategically from an origin server to a set of surrogates deployed across the wide-area Internet. It can reduce the user-perceived latency, but has the shortcoming of expensive deployments. In P2P, peers behave as servers as well as clients. The file one peer downloads is made available for uploading to other peers. P2P avoids the deployment cost, but has the weakness of low QoS assurance when there are insufficient peers. Making use of the highly complementary advantages of CDN and P2P, the hybrid content distribution network (HCDN) [1, 2, 3] has been proposed to reduce the deployment cost and provide QoS assurance. In HCDN, a two-level hierarchical architecture is constructed, and clients can concurrently retrieve content from both the CDN system and the P2P network. This same architecture is referred to by some researchers as CDN-P2P, a name that more directly reflects the combined CDN and P2P structure. In this paper, we use the terms HCDN and CDN-P2P interchangeably.

Replica placement algorithms (RPAs) are used in CDN to decide where to replicate specific data so as to reduce the delivery latency, bandwidth consumption and storage cost. In HCDN, where P2P technology is incorporated to achieve cost-efficiency in a two-level hybrid architecture, RPAs for conventional CDN cannot work efficiently. For example, in conventional CDN, when the number of user requests from one region increases, there is a stronger preference to place a replica on a nearby surrogate server to reduce the delivery cost. In HCDN, however, an increase in user requests also means more peer contribution, which may relieve the transmission load on the nearby CDN server. Therefore, our motivation is to take the impact of P2P content delivery into account in order to optimize the replica placement for the HCDN architecture.

The remainder of this paper is organized as follows. Section 2 summarizes the related work and introduces some of our previous work. Section 3 describes the network model of HCDN and its main features. Section 4 presents the heuristic algorithm for replica placement in HCDN in detail. Section 5 reports the numerical results of the performance evaluation, and Section 6 concludes the paper.

2. Related Work

In our previous work, we presented the HCDN hybrid architecture in [1, 4] and evaluated its performance in [5]. Similar work has also been done by Skevik et al. in [2], Huang et al. in [3], Xu et al. in [6] and Cahill et al. in [7]. The main idea of these works is to construct a hybrid architecture integrating CDN and P2P in order to improve the content distribution efficiency.

There has been a considerable amount of research on replica placement algorithms for traditional CDNs. Several placement algorithms are proposed for web server replicas in [8]. These algorithms use workload information, such as client latency and request rates, to make informed placements. In [9], dynamic replica placement and user request redirection are addressed. In [10], dynamic placement algorithms based on a ring topology are proposed. Load balancing on the network links among surrogate servers is also addressed. In [11], a novel segmentation-based technique for large media files is presented, with the key feature that the data distributed to a surrogate server is not the whole document but fine-grained segments. In [14, 15], file replication algorithms for P2P file sharing systems are proposed.

1521-9097/09 $26.00 © 2009 IEEE. DOI 10.1109/ICPADS.2009.25

To the best of our knowledge, there is little work focusing on replica placement in the HCDN hybrid architecture. As mentioned earlier, RPAs for pure CDN do not work effectively for HCDN. In this paper, our main contribution is to propose and evaluate a heuristic replica placement algorithm for HCDN in which the peer-to-peer contribution is explicitly considered.

3. CDN-P2P Hybrid Network Model

The architecture of HCDN [1, 4, 5], or CDN-P2P, can be abstracted to a two-level hierarchical hybrid model as shown in Fig. 1. The process of content delivery is divided into two stages: CDN-level distribution and P2P-level distribution. In the backbone network, the CDN-level system is deployed and content is strategically disseminated to surrogates. With content pushed towards the network edges, client-perceived latency can be reduced. P2P technology can also be used in CDN systems to allow the surrogates to exchange content with each other, reducing the load on origin servers. In our previous work on HCDN, we introduced the use of P2P with centralized indexing so that clients that locate a file through the same surrogate can exchange content with each other. In our scheme, therefore, clients can concurrently retrieve content from the CDN surrogates and from other P2P peers.

Our HCDN makes use of the complementary advantages of CDN and P2P. Compared to the pure CDN architecture, HCDN can reduce the load on surrogates; therefore, the infrastructure cost can be significantly reduced as fewer surrogates need to be deployed. Compared to the pure P2P architecture, HCDN can improve the quality of service and avoid the low performance that occurs when there are insufficient peers in the system. In addition, the autonomic P2P system may also localize the network traffic and thus alleviate congestion in the backbone.

The network model is an overlay network composed of servers and clients. The surrogate is a logical entity and may consist of multiple physical servers. A centralized architecture (e.g. a BitTorrent-like protocol) in the P2P-level distribution has the advantages of simplicity, controllability and efficiency for P2P indexing. It is also worth mentioning that the surrogate and the index server can be integrated or separate in a system implementation.

4. Replica Placement Algorithm in HCDN

4.1 Problem Formulation

The hybrid structure can be described as a graph $G$, which consists of the set of nodes $V$ and the set of edges $E$. The set $V$ can be divided into two sets: the set of surrogate servers $V_S$ (with cardinality $|V_S| = N$) and the set of peers $V_P$. Let $O = \{F_1, F_2, \ldots, F_M\}$ denote the set of files which can be downloaded and shared by peers. The set $S_i$ (with cardinality $|S_i| = n_i$) represents the optimal set of surrogate servers storing file $F_i$, while the set $P_i$ represents the set of peers downloading file $F_i$.

The replica placement problem can be described as follows:

$$\min_{A \subseteq V_S} \Big\{ \sum_{1 \le i \le M} \sum_{v \in A} PlaceCost_{v,i} \Big\} = \min_{A \subseteq V_S} \Big\{ \sum_{1 \le i \le M} \sum_{v \in A} \left( T_{v,i} + \beta R_i \right) \Big\} \qquad (1)$$

where $A$ is not empty, and the parameter $PlaceCost_{v,i}$ for file $F_i$ in node $v$ depends on the transport cost $T_{v,i}$ and the storage cost $R_i$. $T_{v,i}$ depends on the distances to other nodes storing file $F_i$. It is possible that a very popular file is not stored in a surrogate because another nearby surrogate is already storing a replica of it. Since we consider a whole file as the unit of replication, $R_i$ is the size of file $F_i$. The factor $\beta$ describes the relative cost between storage and bandwidth. If $\beta$ is large, the storage cost is considered to be relatively high; if $\beta$ is small, storage is relatively cheap compared with bandwidth.

Our goal is to find the optimal set of surrogate servers $S_i$ for each file $F_i$ by minimizing the total placing cost for content transmission and storage. For HCDN, some of the uploading service is provided by the user peers, so the transport cost may be reduced compared to conventional pure CDN.

Fig. 1. Network model of HCDN (legend: index, data, origin server, surrogate / index server, client)

4.2 Heuristic Replica Placement Algorithm

Notations and definitions used in our formulation are shown in Table 1. For simplicity, we assume that the replica placement algorithm is executed at time $T$ with perfect knowledge of the traffic pattern during the time interval $[0, T]$. Peers are classified into two categories: downloaders and seeds. Downloaders are peers who only have parts of a file, and seeds are peers who have the whole file and who stay in the system to allow other peers to download from them. The factor $\eta$ ($0 \le \eta \le 1$) represents the average uploading effectiveness of all downloaders. If $\eta = 1$, the downloaders' uploading rate is equal to their available outgoing bandwidth; if $\eta = 0$, the downloaders do not upload data to other peers at all.

We consider a homogeneous environment in which each peer has the same uploading bandwidth, denoted as $u$. So, in the P2P-level network, the total uploading rate of all peers for file $F_i$ is $(x_i + \eta y_i)\,u$, where $\eta$ is the uploading effectiveness of the downloaders. To simplify the presentation, we define the variable $m_i$ as shown in eq. (2) below, with the interpretation that the peers' contribution is equivalent to serving $m_i$ user requests.

$$m_i = \frac{(x_i + \eta y_i)\,u}{b_i} \qquad (2)$$
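As a small illustration of eq. (2), the following Python sketch computes the peer contribution for one file. The function name, the argument names and the sample numbers are hypothetical and are not taken from the paper; `eta` stands for the uploading-effectiveness factor of the downloaders.

```python
def peer_equivalent_requests(seeds, downloaders, eta, upload_bw, data_rate):
    """Return m_i from eq. (2): peer upload capacity expressed as served requests."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if data_rate <= 0:
        raise ValueError("data_rate must be positive")
    # Seeds upload at their full rate; downloaders are scaled by eta.
    total_upload_rate = (seeds + eta * downloaders) * upload_bw
    return total_upload_rate / data_rate


# Hypothetical numbers, chosen only for illustration.
m_i = peer_equivalent_requests(seeds=20, downloaders=100, eta=0.5,
                               upload_bw=0.5, data_rate=1.0)
print(m_i)  # 35.0
```

With `eta = 0` only the seeds contribute, so the same call would return 10.0.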

The placement cost can be computed via eq. (3), where $d_S(S_i)$ denotes the average distance between the nearest surrogate server and the surrogate servers storing file $F_i$ in the CDN-level system. The cost function consists of two parts: the transport cost $b_i (r_i - m_i)\, d_S(S_i)$ and the storage cost $\beta s_i n_i$, where $\beta$ is the relative weight of storage against bandwidth.

$$PlaceCost_i = \begin{cases} b_i (r_i - m_i)\, d_S(S_i) + \beta s_i n_i, & r_i > m_i \\ 0, & r_i \le m_i \end{cases} \qquad (3)$$

Next, we consider the total distribution cost, which also includes the transport cost among the peers. In eq. (4), $d_P(P_i)$ denotes the average distance between requesting peers and supplying peers owning file $F_i$. If $m_i$ is greater than $r_i$ (the request rate of file $F_i$), the total cost will be $b_i r_i\, d_P(P_i)$, which is equal to the transport cost among the peers. This is the case where the user requests are served only by the peers.

$$TotalCost_i = \begin{cases} b_i (r_i - m_i)\, d_S(S_i) + b_i m_i\, d_P(P_i) + \beta s_i n_i, & r_i > m_i \\ b_i r_i\, d_P(P_i), & r_i \le m_i \end{cases} \qquad (4)$$
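The two piecewise cost functions of eq. (3) and eq. (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function and argument names are hypothetical, `beta` stands for the storage weight, and the distances `d_s` and `d_p` are passed in as plain numbers rather than derived from a topology.

```python
def place_cost(b, r, m, d_s, s, n, beta):
    """Eq. (3): CDN-level transport plus weighted storage for one file."""
    if r <= m:
        # Peers can serve every request, so no surrogate cost is incurred.
        return 0.0
    return b * (r - m) * d_s + beta * s * n


def total_cost(b, r, m, d_s, d_p, s, n, beta):
    """Eq. (4): placement cost plus the transport cost among peers."""
    if r <= m:
        # All requests are served by peers only.
        return b * r * d_p
    return b * (r - m) * d_s + b * m * d_p + beta * s * n


# Hypothetical values: 100 requests, 40 of them covered by peers.
pc = place_cost(b=1.0, r=100, m=40, d_s=2.0, s=1.0, n=3, beta=6.0)
tc = total_cost(b=1.0, r=100, m=40, d_s=2.0, d_p=3.0, s=1.0, n=3, beta=6.0)
print(pc, tc)  # 138.0 258.0
```

Setting `m = 0` makes both functions return the same value, which is the pure CDN case.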

The case $m_i = 0$ for every file $F_i$ means that the peers do not contribute to the uploading service. This is the situation in conventional pure CDN. According to eq. (3) and (4), we can obtain

$$TotalCost_i = PlaceCost_i = b_i r_i\, d_S(S_i) + \beta s_i n_i \qquad (5)$$

With eq. (1), (3) and (4), we can also obtain

$$PlaceCost = \sum_{i=1}^{M} PlaceCost_i \qquad (6)$$

$$TotalCost = \sum_{i=1}^{M} TotalCost_i \qquad (7)$$

1. FOR each $i = 1, 2, \ldots, M$ DO
2.   Given $P_i$, $\eta$ and $u$, calculate $m_i$.
3.   IF $r_i \le m_i$ THEN
4.     $PlaceCost_i = 0$
5.     GOTO step 1.
6.   ELSE
7.     $n_i = 1$ and $PlaceCost_i(0) = \text{INFINITE}$
8.     Find the optimal set $S_i$ with $n_i$ surrogate servers which yields the minimal cost
       $PlaceCost_i(n_i) = b_i (r_i - m_i)\, d_S(S_i(n_i)) + \beta s_i n_i$ and
       $\Delta PlaceCost_i(n_i) = PlaceCost_i(n_i) - PlaceCost_i(n_i - 1)$
9.     IF $n_i = N$ or $\Delta PlaceCost_i(n_i) \ge 0$ THEN
10.      $PlaceCost_i = PlaceCost_i(n_i)$
11.      GOTO step 1.
12.    ELSE
13.      Increase $n_i$ by 1 and go to step 8.
14.    END IF
15.  END IF
16. END FOR
17. Therefore, we can get the optimal set for every file $F_i$, which yields the minimal placing cost $PlaceCost = \sum_{i=1}^{M} PlaceCost_i$.

Fig. 2. The heuristic RPA for CDN-P2P architecture.

Table 1. Notations and Definitions for Problem Formulation

| Notation | Definition |
| $s_i$ | Size of file $F_i$ |
| $b_i$ | Data rate of file $F_i$ |
| $r_i$ | Number of requests of file $F_i$ during $[0, T]$ |
| $x_i$ | Number of seeds for file $F_i$ during $[0, T]$ |
| $y_i$ | Number of downloaders for file $F_i$ during $[0, T]$ |
| $\eta$ | Effectiveness of the peer-to-peer sharing |
| $u$ | Outgoing bandwidth of one peer |

Our goal is to minimize the total placement cost, denoted as PlaceCost. This can be achieved by minimizing the cost $PlaceCost_i$ of each file $F_i$. Hence, we can obtain the optimal subset of surrogate servers for file $F_i$ by finding the element with minimum cost in the power set of $V$. Our heuristic replica placement algorithm for the CDN-P2P architecture is shown in Fig. 2.
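The per-file loop of Fig. 2 can be sketched in Python as below. This is an illustrative reading, not the authors' implementation: the function name and arguments are hypothetical, `beta` stands for the storage weight, and `best_distance(n)` is an assumed callable that returns the surrogate distance achieved by the best set of `n` surrogates. The sketch stops at the first replica count that fails to lower the cost and reports the previous count, which is one interpretation of the stopping rule.

```python
import math


def choose_replica_count(b, r, m, s, beta, n_max, best_distance):
    """Greedy search over the replica count n for a single file."""
    if r <= m:
        # Peers cover all requests: no replica cost is charged.
        return 0, 0.0
    best_n, best_cost = 0, math.inf  # plays the role of PlaceCost_i(0) = INFINITE
    for n in range(1, n_max + 1):
        cost = b * (r - m) * best_distance(n) + beta * s * n
        if cost >= best_cost:
            break  # one more replica no longer pays off
        best_n, best_cost = n, cost
    return best_n, best_cost


# Example on a 10-node ring, using d(n) = (N^2 - n^2) / (4 N n) as the distance.
ring = lambda n: (10 ** 2 - n ** 2) / (4 * 10 * n)
n_i, cost_i = choose_replica_count(b=1.0, r=100, m=40, s=1.0, beta=6.0,
                                   n_max=10, best_distance=ring)
print(n_i, round(cost_i, 6))  # 6 52.0
```

Stopping at the first non-improving step is safe here because this example cost is convex in `n`; for other distance functions an exhaustive scan over `n` may be the more cautious choice.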

5. Performance Evaluation

In this section, we evaluate the proposed heuristic and compare it with the traditional RPA of pure CDN [10]. We assume that the CDN-level network has 10 surrogate servers and a ring-based topology [10]. Because of the symmetry of the problem, user requests will be equally distributed to all surrogate servers storing an identical file. Therefore, we can obtain an optimal subset of surrogate servers by maximizing the distance between replicas stored in the CDN-level network. Assuming that we place $n_i$ replicas of file $F_i$ on $n_i$ of the $N$ surrogate servers, we can derive the following function:

$$d(n_i) = \frac{N^2 - n_i^2}{4 N n_i}, \qquad 1 \le n_i \le N \qquad (8)$$

where $d(n_i)$ indicates the approximate distance between requesting nodes and surrogate servers storing file $F_i$ in the CDN-level network.
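The behaviour of eq. (8) can be checked with a short Python sketch. The function name is hypothetical, and the brute-force comparison assumes an odd-sized ring with a single replica and hop count as the distance, a case in which the formula gives the exact average.

```python
def ring_distance(n, n_servers):
    """Eq. (8): approximate distance to the nearest of n evenly spaced replicas."""
    if not 1 <= n <= n_servers:
        raise ValueError("n must satisfy 1 <= n <= n_servers")
    return (n_servers ** 2 - n ** 2) / (4 * n_servers * n)


# With N = 10: one replica is far on average, ten replicas give distance zero.
print(ring_distance(1, 10), ring_distance(5, 10), ring_distance(10, 10))

# Brute-force check on a 9-node ring with one replica placed at node 0.
brute = sum(min(k, 9 - k) for k in range(9)) / 9
assert abs(brute - ring_distance(1, 9)) < 1e-12
```

The distance falls quickly for the first few replicas and more slowly afterwards, which is why the storage weight ends up deciding how many replicas are worth placing.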

Instead of using a random distribution or a Zipf-like distribution for the popularity of files, we use the Mandelbrot-Zipf distribution, which is shown in [12, 13] to be more realistic for modeling the popularity of P2P files. In a Mandelbrot-Zipf distribution, the request rate of the i-th most popular file is given as:

$$r_i = R \cdot \frac{1/(i+q)^{\alpha}}{\sum_{j=1}^{M} 1/(j+q)^{\alpha}}, \qquad 1 \le i \le M \qquad (9)$$

In eq. (9), $R$ is the total number of user requests, $\alpha$ is the skewness factor, and $q$ is the plateau factor, so called because it defines the plateau shape in the high-popularity part of the distribution [13]. When $q = 0$, the Mandelbrot-Zipf distribution degenerates to the Zipf-like distribution with skewness factor $\alpha$.

In Fig. 3, we assume that 50 files are shared by nodes in the network. A higher value of the skewness factor $\alpha$ means an increase in the popularity of the files at the lowest rank numbers (the most popular files). A higher value of $q$ means that popular files are requested less often.

Based on [13], the typical values are $\alpha = 0.65$ and $q = 5$, which are also used in our subsequent simulations.
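To make the popularity model concrete, the Python sketch below generates per-file request counts from the Mandelbrot-Zipf form of eq. (9). The function name is hypothetical; the parameter values (10,000 requests, 50 files, skewness 0.65, plateau factor 5) are the ones quoted in this section.

```python
def mandelbrot_zipf_rates(total_requests, n_files, alpha, q):
    """Split total_requests over files ranked 1..n_files following eq. (9)."""
    weights = [(i + q) ** -alpha for i in range(1, n_files + 1)]
    norm = sum(weights)
    return [total_requests * w / norm for w in weights]


rates = mandelbrot_zipf_rates(10_000, 50, alpha=0.65, q=5)
# The rates sum to the total and decrease with rank.
print(round(sum(rates)), rates[0] > rates[-1])  # 10000 True
```

With `q = 0` the same function reduces to a Zipf-like distribution, so the effect of the plateau factor can be inspected by comparing the first few entries of the two lists.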

We assume that a total of 10,000 requests are produced by user peers during the period $[0, T]$. In order to guarantee the quality of service, we store at least one replica in surrogate servers for each file. Let $m$ be the sum of $m_i$ ($1 \le i \le M$). We assume the following basic experimental conditions: $m = 7500$, $d_S(S_i) = 3\,d(n_i)$, $d_P(P_i) = 3$ and storage weight $\beta = 6$. Without loss of generality, we assume that every file has the same bit rate and size, both set to one.

Fig. 3. Cumulative Mandelbrot-Zipf distribution for popularity: (a) effects of parameter $\alpha$ ($\alpha$ = 0.3, 0.65, 1.2); (b) effects of parameter $q$ ($q$ = 0, 5, 14). x-axis: number of files; y-axis: cumulative distribution.

To analyze the effects of peer contribution on the storage cost, we vary the parameter $m$ from 0 to 10,000. As shown in Fig. 4, the storage cost of HCDN is much lower than that of the traditional pure CDN. When $m$ increases, the storage cost of HCDN decreases. This means that the capacity demand on surrogate servers can be reduced in HCDN through our proposed RPA, compared with RPAs in pure CDN.

As shown in Fig. 5, when $m$ increases, the transport cost and the total cost of pure CDN do not change. For HCDN, however, the total cost decreases while the transport cost increases. The greater the contribution provided by peers, the more the total cost is reduced.

As explained earlier, the storage weight $\beta$ represents the relative importance of storage cost against transport cost. To show the impact of $\beta$ on the cost, we increase $\beta$ from 0 to 40. As shown in Fig. 6, when $\beta$ increases, the storage cost decreases while the transport cost increases. This means that when the caching cost is high, we should save more storage capacity, although this may result in a higher transmission cost. Conversely, when the caching cost is low, we can deploy more replicas and save transport cost.

Next, we vary the number of requests from 0 to 10,000. Fig. 7 shows the effects of $r$ on the PlaceCost. When more user requests are produced, PlaceCost, the deployment cost, increases significantly. We observe that the PlaceCost in HCDN is still much lower than that in pure CDN.

Then, we vary the parameter $q$ from 0 to 40. As shown in Fig. 8, the larger the value of $q$, the higher the PlaceCost. When $q$ increases, the transport cost will also increase. This implies that we should store more replicas. We can also see that a higher value of the skewness factor $\alpha$ results in a lower PlaceCost.

Fig. 4. Storage cost with different m (x-axis: m; y-axis: storage cost; curves: CDN, HCDN)

Fig. 5. Total cost and transport cost with different m (x-axis: m; y-axis: cost; curves: total (HCDN), transport (HCDN), total (CDN), transport (CDN))

Fig. 6. Transport cost and storage cost in HCDN (left y-axis: transport cost; right y-axis: storage cost)

Fig. 7. PlaceCost with different r in HCDN (x-axis: r; y-axis: PlaceCost; curves: HCDN, CDN)

Fig. 8. PlaceCost with different q in HCDN (x-axis: q; y-axis: PlaceCost; curves: $\alpha$ = 0.3, 0.65, 1.2, 1.6)

Finally, to investigate the effects of the storage limitation of surrogate servers, we denote the maximum storage capacity as $C_{max}$. Fig. 9 shows the PlaceCost for different values of $C_{max}$. It can be seen that a lower $C_{max}$ results in a higher PlaceCost, because fewer replicas can be placed when the capacity is lower. For a large storage weight $\beta$, which means a high storage cost, less storage capacity is needed, and the curves corresponding to different $C_{max}$ are close together as a result.

6. Conclusion

The CDN-P2P hybrid network is an efficient architecture for large-scale content distribution. The existing RPAs for pure CDN fail to work well for the hybrid network because they do not take into consideration the inherent features of CDN-P2P at the P2P-level. In this work, we propose a heuristic algorithm for replica placement in the CDN-P2P architecture. The peer contribution is taken into account in our approach. As shown in our performance evaluation, compared to RPA for pure CDN, our approach can reduce the placement cost (i.e. the deployment cost).

7. Acknowledgement

This work is partly supported by the national key technology research development program of China under grant No.2006BAH02A11, and the major state basic research development program of China under grant No. 2007CB310702.

8. References

[1] Hai Jiang, Jun Li, Zhongcheng Li, et al. “Efficient hierarchical content distribution using P2P technology,” The IEEE International Conference on Networks (ICON’08). Dec. 2008.

[2] K. Skevik, V. Goebel, T. Plagemann. “Design of a hybrid cdn,” in: MIPS'04: Proceedings of the Second International Workshop on Multimedia Interactive Protocols and Systems, Nov. 2004.

[3] C. Huang, A. Wang, J. Li, et al. “Understanding Hybrid CDN-P2P: Why Limelight Needs its Own Red Swoosh,” Proc. 18th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV'08), Braunschweig, Germany, May 2008.

[4] Hai Jiang, Jun Li, Zhongcheng Li. “Hybrid content distribution network and its performance modeling,” Chinese Journal of Computers. Vol. 32, No. 3, pp. 473-482, Mar. 2009.

[5] Hai Jiang, Jun Li, Zhongcheng Li, et al. “Performance Evaluation of Content Distribution in Hybrid CDN-P2P Network”. The International Conference on Future Generation Communication and Networking (FGCN'08). Dec. 2008.

[6] Dongyan Xu, Sunil Suresh Kulkarni. “Analysis of a CDN-P2P hybrid architecture for cost-effective streaming media distribution,” Multimedia Systems, pp. 383-399, Mar. 2006.

[7] Adrian J. Cahill, Cormac J. Sreenan. “An efficient CDN placement algorithm for the delivery of high-quality TV content,” the 12th annual ACM international conference on Multimedia, New York, NY, USA , 2004.

[8] L. Qiu, V. N. Padmanabhan, and G. M. Voelker. “On the placement of web server replicas,” in Proceedings of IEEE INFOCOM, pp. 1587-1596, 2001.

[9] F. Presti, C. Petrioli, et al. “Distributed Dynamic Replica Placement and User Request Redirection in Content Delivery Networks.” IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, 2007.

[10] T. Wauters, J. Coppens, et al. “Replica placement in ring based content delivery networks,” in Computer Communications, vol. 29, pp.3313-3326, 2005.

[11] Zongming Fei, Mengkun Yang. “A segment-based fine-grained peer sharing technique for delivering large media files in content distributed networks,” IEEE transactions on multimedia, Vol. 8, No.4, pp. 824-829, 2006.

[12] K. Gummadi, R. Dunn, S. Saroiu, et al. “Measurement, modeling, and analysis of a peer-to-peer file-sharing workload,” in Proc. 19th ACM Symp. Operating Systems Principles (SOSP’03), Bolton Landing, NY, Oct. 2003.

[13] M. Hefeeda, and O. Saleh. “Traffic Modeling and Proportional Partial Caching for Peer-to-Peer Systems,” IEEE/ACM Transactions on networking, Vol. 16, No. 6, Dec. 2008.

[14] Haiying Shen. “IRM: Integrated File Replication and Consistency Maintenance in P2P Systems,” IEEE International Conference on Computer Communications and Networks (ICCCN '08), pp. 1-6, Aug. 2008.

[15] Haiying Shen. “EAD: An Efficient and Adaptive Decentralized File Replication Algorithm in P2P File Sharing Systems,” IEEE Transactions on Parallel and Distributed Systems, Vol. 99, No. 1, 2009.

Fig. 9. PlaceCost with limited capacity of surrogate (y-axis: PlaceCost; curves: $C_{max}$ = 40, 30, 20, 10)
