This article was downloaded by: [186.233.152.15] On: 05 May 2014, At: 01:03
Publisher: Institute for Operations Research and the Management Sciences (INFORMS)
INFORMS is located in Maryland, USA

Information Systems Research

Publication details, including instructions for authors and subscription information: http://pubsonline.informs.org

Diffusion Models for Peer-to-Peer (P2P) Media Distribution: On the Impact of Decentralized, Constrained Supply
Kartik Hosanagar, Peng Han, Yong Tan

To cite this article: Kartik Hosanagar, Peng Han, Yong Tan, (2010) Diffusion Models for Peer-to-Peer (P2P) Media Distribution: On the Impact of Decentralized, Constrained Supply. Information Systems Research 21(2):271-287. http://dx.doi.org/10.1287/isre.1080.0221

Full terms and conditions of use: http://pubsonline.informs.org/page/terms-and-conditions

This article may be used only for the purposes of research, teaching, and/or private study. Commercial use or systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisher approval. For more information, contact [email protected].

The Publisher does not warrant or guarantee the article’s accuracy, completeness, merchantability, fitness for a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, or inclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, or support of claims made of that product, publication, or service.

Copyright © 2010, INFORMS

INFORMS is the largest professional society in the world for professionals in the fields of operations research, management science, and analytics. For more information on INFORMS, its publications, membership, or meetings visit http://www.informs.org



Information Systems Research, Vol. 21, No. 2, June 2010, pp. 271–287, ISSN 1047-7047 | EISSN 1526-5536 | 10 | 2102 | 0271


doi 10.1287/isre.1080.0221 © 2010 INFORMS

Diffusion Models for Peer-to-Peer (P2P) Media Distribution: On the Impact of Decentralized, Constrained Supply

Kartik Hosanagar
Operations and Information Management, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104, [email protected]

Peng Han
aQuantive, Inc. (Microsoft), Seattle, Washington 98104, [email protected]

Yong Tan
Michael G. Foster School of Business, University of Washington, Seattle, Washington 98195, [email protected]

In peer-to-peer (P2P) media distribution, users obtain content from other users who already have it. This form of decentralized product distribution demonstrates several unique features. Only a small fraction of users in the network are queried when a potential adopter seeks a file, and many of these users might even free-ride, i.e., not distribute the content to others. As a result, generated demand might not always be fulfilled immediately. We present mixing models for product diffusion in P2P networks that capture decentralized product distribution by current adopters, incomplete demand fulfillment, and other unique aspects of P2P product diffusion. The models serve to demonstrate the important role that the P2P search process and distribution referrals (payments made to users that distribute files) play in efficient P2P media distribution. We demonstrate the ability of our diffusion models to derive normative insights for P2P media distributors by studying the effectiveness of distribution referrals in speeding product diffusion and determining optimal referral policies for fully decentralized and hierarchical P2P networks.

Key words: peer-to-peer file diffusion; P2P; supply-constrained diffusion; free-riding; mixing model of diffusion; distributed systems

History: Sumit Sarkar, Senior Editor; Giri Kumar Tayi, Associate Editor. This paper was received on September 30, 2006, and was with the authors 12 months for 3 revisions. Published online in Articles in Advance August 31, 2009.

1. Introduction

Peer-to-Peer (P2P) networks are distributed networks in which the participants share their own resources in addition to consuming them from others. In P2P-based media distribution, users download content from other users who have the content and in turn redistribute it to future adopters. P2P allows a content provider to distribute content efficiently at a relatively low cost and is also effective in handling flash crowds in content distribution (Padmanabhan and Sripanidkulchai 2002). A number of encryption technologies have also emerged to prevent piracy in P2P networks. Thus, although the early use of the technology was for illegal file sharing, P2P is increasingly being adopted for legitimate media distribution on the Internet. P2P is being used for online radio (e.g., Social.fm), Internet TV (e.g., Joost), and software distribution (e.g., distribution of the Red Hat Linux OS on BitTorrent). Recently, NBC Universal and AOL announced Internet TV initiatives based on P2P technologies. In addition, Altnet, iMesh, Grooveshark, rVibe, We7, and several other firms use a P2P distribution platform to sell music and other digital media licensed from the music labels. In 2004, there were over 50 million legal downloads per month on Kazaa for over 10,000 titles in Altnet’s library (Currah 2004). In 2007, over 9.5 million music files were uploaded by 10,000 beta users of Grooveshark’s P2P music download service.¹

Nodes in a P2P network are potential consumers of a digital product and, by virtue of the design of P2P networks, can redistribute the product upon buying it. P2P distribution differs from centralized media distribution in that decentralized supply often imposes a constraint on demand fulfillment. In most P2P architectures, only a fraction of the nodes in the network are queried in response to a request, to prevent flooding the network with queries. Thus, even if a file exists in a network, nodes containing the file might not be queried. Furthermore, even if a node containing the requested file is queried, the node might not distribute the file. Free riders (users that consume content from the network without sharing or redistributing it to other users) are known to be pervasive in P2P networks (Adar and Huberman 2000). As a result of these two factors, generated demand might not always be satisfied. Thus, “supply-side” factors related to incomplete search of the network and to redistribution incentives have a crucial impact on file diffusion within the network.

Commonly used redistribution incentives include penalties to users who free-ride and rewards to users who contribute. Penalties include intentionally slowing the downloads of free riders, as in the BitTorrent protocol. Penalties are used sparingly in commercial P2P systems where users pay to obtain content. Commercial P2P networks often provide payments, known as distribution referrals, to users who distribute content to others in the network. Distribution referrals have been advocated in various studies (e.g., Golle et al. 2001, Arora et al. 2003) and are used in several commercial P2P networks. For example, Altnet pays users on the Kazaa network who agree to join Altnet as distribution points (New York Times 2003). Grooveshark and rVibe also compensate users for distributing content. In these networks, whenever a new user purchases a track, a small payment is made to the user that distributes the content to the buyer. rVibe currently pays the distributing user $0.05 on a $0.99 sale, and Grooveshark pays the user $0.25 per sale (Techcrunch 2007). Hereafter, we refer to these payments as referrals.²

¹ Source: Personal conversations with Sam Tarantino, CEO of Grooveshark, January 2008.

In this paper, we propose a model for the diffusion of digital products in P2P networks that explicitly captures the supply-side factors described above (file search and redistribution incentives), and we demonstrate the applicability of our model by studying optimal payments to users who distribute content. Modeling product diffusion is of considerable interest to managers. Diffusion models can be used for demand forecasting and for planning a variety of prelaunch and postlaunch strategic decisions, such as the optimal level of product sampling, optimal pricing, and the optimal timing of successive generations of a product (Mahajan et al. 2000). As a result, product diffusion models have been actively studied in marketing for more than forty years. These models focus primarily on the demand generation process. They generally assume that generated demand is always fulfilled and do not model the important supply-side constraint in P2P networks. The few papers in the product diffusion literature that have modeled supply constraints have done so in centralized settings, where supply does not depend on the number of current adopters or on the actions they take. In contrast, current adopters generate the supply in P2P diffusion. These supply-side factors, tied to the file search process and to free-riding in the network, significantly influence product diffusion in P2P networks and are in fact of most interest to P2P network managers. It is thus important to incorporate them into diffusion models.

Our study of product diffusion in P2P networks makes three main contributions. First, we develop a diffusion model that incorporates supply-side constraints unique to P2P networks, and we derive analytical results about the sales dynamics. Specifically, we develop a mixing model of file diffusion that incorporates both incomplete search and free-riding in P2P networks. Second, we present an application on optimal referrals that endogenizes the impact of distribution referrals on file availability and overall profits. We derive expressions for the optimal referral and show that a referral policy that accounts for diffusion dynamics is far more profitable than a myopic policy that ignores these dynamics. Finally, we demonstrate that the file search architecture exerts a strong influence on diffusion. We find that a hierarchical architecture with a few groups demonstrates faster diffusion than flat P2P networks. However, there are diminishing returns from increased centralization.

The rest of this paper is organized as follows. In §2, we review related literature. In §3, starting with existing epidemiological models, we develop analytical models to capture the dynamics of diffusion in P2P networks. Section 3.1 introduces the notation. Section 3.2 develops diffusion models for flat P2P networks and includes a study of the optimal referral policy. Section 3.3 focuses on product diffusion and optimal referrals in hierarchical P2P networks and examines the impact of the search architecture on the rate of diffusion. In §3.4, we use simulations to test the robustness of the models under more general demand processes. Section 4 concludes this study and discusses future work.

² However, it is useful to distinguish the referrals in P2P from traditional referrals. Traditional referral fees are paid to existing customers for bringing new customers to the firm. In contrast, the referral in P2P encourages existing customers to distribute content, which increases file availability and in turn sales. Thus, the primary impact is on the supply side.

2. Literature Review

There are three streams of work highly relevant to our study of P2P content diffusion. The first two relate to the literature on new product diffusion and on epidemiological diffusion, respectively. The third relates to studies of free-riding in P2P networks and the use of referral payments to address it.

2.1. Product Diffusion

Direct work on P2P content diffusion is limited. A notable exception is the empirical study of file diffusion in the BitTorrent network by Izal et al. (2004), although developing an analytical model of file diffusion within networks is not the focal point of that paper. However, there is a vast body of work on new product diffusion models, dating back to the 1960s. Fourt and Woodlock (1960) proposed a product diffusion model in which a fixed fraction of the consumers who have not yet bought the product do so every period. Bass (1969) proposed an extension that additionally incorporates word-of-mouth (WOM) communication between current adopters and potential adopters. Building on this model, work has also been done to incorporate the effects of advertising and promotion (Horsky and Simon 1983), competition (Krishnan et al. 2000), and pricing (Bass 1980).

These diffusion models focus on demand generation and assume that demand is always fulfilled. In P2P networks, a potential adopter can conduct only an incomplete search of the network, and even the queried nodes may free-ride. As a result, demand is often not fulfilled immediately. In this sense, the supply-constrained diffusion models of Ho et al. (2002), Kumar and Swaminathan (2003), and Jain et al. (1991) are more relevant to our diffusion model. These papers study situations where demand is not met because of production capacity constraints. However, they focus on centralized settings wherein the managerial intervention is tied to capacity sizing. In contrast, the distribution infrastructure in P2P networks is decentralized, and the product supply involves a social process. The larger the number of current adopters and the higher their willingness to distribute a product, the greater the file availability in the network. Thus, product adoption directly increases product supply. Furthermore, the relevant managerial interventions relate to the design of the search process and to incentives that encourage nodes to distribute content, rather than to an increase in centralized capacity. Our diffusion model uniquely captures these variables and the dependence between product diffusion and social supply.
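For contrast with the supply-constrained models developed below, the following Python sketch integrates the classic Bass (1969) model, $dF/dt = (p + qF)(1 - F)$, in which every unit of generated demand is fulfilled immediately. It is an illustration only: the coefficient values are assumptions, and the coefficients are named `p_innov` and `q_imit` to avoid confusion with this paper's own $p$ and $q$. Note that nothing on the right-hand side depends on who can supply the product, which is exactly the constraint the P2P model adds.

```python
import math


def bass_rhs(F: float, p_innov: float, q_imit: float) -> float:
    """Bass (1969) adoption rate: (innovation + imitation * F) * (1 - F)."""
    return (p_innov + q_imit * F) * (1.0 - F)


def bass_closed_form(t: float, p_innov: float, q_imit: float) -> float:
    """Standard closed-form Bass solution for cumulative adoption with F(0) = 0."""
    e = math.exp(-(p_innov + q_imit) * t)
    return (1.0 - e) / (1.0 + (q_imit / p_innov) * e)


def bass_rk4(t_end: float, p_innov: float, q_imit: float, dt: float = 0.01) -> float:
    """Integrate the Bass ODE from F(0) = 0 with fixed-step classical RK4."""
    F = 0.0
    for _ in range(int(round(t_end / dt))):
        d1 = bass_rhs(F, p_innov, q_imit)
        d2 = bass_rhs(F + 0.5 * dt * d1, p_innov, q_imit)
        d3 = bass_rhs(F + 0.5 * dt * d2, p_innov, q_imit)
        d4 = bass_rhs(F + dt * d3, p_innov, q_imit)
        F += dt * (d1 + 2 * d2 + 2 * d3 + d4) / 6.0
    return F


# Illustrative (assumed) coefficients; demand here is never supply-limited.
p_innov, q_imit = 0.03, 0.38
for t in (5, 10, 20):
    print(t, round(bass_rk4(t, p_innov, q_imit), 4),
          round(bass_closed_form(t, p_innov, q_imit), 4))
```

The numerical and closed-form columns should agree closely, which is a quick check that the integrator is set up correctly before it is reused on a model where supply matters.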

2.2. Epidemiological Diffusion

Infectious diseases spread from infected nodes to susceptible nodes and are examples of decentralized diffusion processes. Epidemic diffusion models have been used in computer networking research, including in studies of information diffusion in mobile ad hoc networks (Khelil et al. 2002) and the spread of computer viruses (Kephart and White 1991). The diffusion mechanism in P2P networks is in many ways similar to the spread of diseases. Broadly speaking, when a node seeks content in a P2P network, a request is sent out to other nodes in the network. File transfer is completed once a node is found that shares the desired resource. Similarly, in the spread of diseases, an infected individual makes contact with people around her, and disease transmission occurs once a susceptible individual is contacted. Thus, the susceptible individual in epidemic diffusion is analogous to the node seeking content in P2P, and the infected individual is analogous to the node that distributes content.

Despite these similarities, diffusion in a P2P network is unique in several ways. One major difference is that the susceptible agent typically receives a disease passively, whereas the node seeking content initiates the contact in P2P diffusion. This implies that epidemic contact occurs in a semi-random manner, whereas the search process in a P2P network does not. P2P search can be architected, and the design choices have a notable impact on diffusion. For example, although the content in P2P networks is always distributed, the search process can be completely centralized, completely decentralized (flat), or hierarchical (Asvanund et al. 2004). Networks like Napster maintained a single central catalog, and search requests were forwarded to the central server. Gnutella version 0.4 is completely decentralized, with each node maintaining its own catalog and responding to search queries. Gnutella version 0.6 and Kazaa use a hierarchical architecture wherein nodes connect to supernodes, which are in turn connected to each other. The supernodes index the content for their nodes and respond to search requests. An additional difference is that the reproductive capacity of a virus usually grows in proportion to the diffusion of a disease. In contrast, file availability and free-riding impose a constraint on product diffusion in P2P networks. Finally, epidemiologists are interested in slowing diffusion through vaccination or by quarantining the infected, whereas P2P managers are interested in speeding diffusion by appropriately architecting the P2P search process or by using referrals and other incentives to encourage users to distribute content.

2.3. Free-Riding in P2P and Payment-Based Incentives

Free-riding, which has been widely documented in P2P networks (Adar and Huberman 2000, Asvanund et al. 2004), can slow the diffusion of products within the network. Several approaches have been proposed to alleviate this problem, for example, offering a higher quality of service (QoS) to users that share their resources (Kamvar et al. 2003). In commercial P2P networks, where users pay for content, content providers can use payments to encourage users to distribute content to others in the network (Arora et al. 2003, Lang and Vragov 2005). Golle et al. (2001) discuss the use of micropayments to reward peers for distributing content. A number of commercial P2P systems, such as Grooveshark and rVibe, use payment-based incentives to encourage users to share and distribute content.

3. P2P Diffusion Model

The diffusion models we develop are homogeneous mixing models. In homogeneous mixing models, there is no spatial structure (i.e., specific neighbors of a node are not modeled), and all nodes within a compartment (i.e., of the same type) run the same risk of being infected. One can use the equivalent of mean field analysis for large populations ($N \to \infty$) to develop differential equations that capture the dynamics.³ Popular examples of homogeneous mixing models include the SIR/SIS models in epidemiology (Diekmann and Heesterbeek 2000), the Bass model in marketing (Bass 1969), and the Lotka-Volterra model in population ecology (Brauer and Castillo-Chavez 2000). The implication of the homogeneous mixing setup is that we do not model the specific topological connections in the P2P network. The diffusion dynamics depend only on the number of nodes seeking or distributing content rather than on which particular nodes seek or distribute content. We model the search process in P2P networks within the framework of the homogeneous mixing model. Specifically, groups in hierarchical networks are modeled as additional compartments in the diffusion model. We begin by introducing our notation in §3.1. In §§3.2 and 3.3, we focus on diffusion in completely decentralized (flat) networks and hierarchical networks, respectively.

³ In mean field techniques, the distributions of quantities over their randomness are represented instead by their average values (Newman et al. 2000). By ignoring certain dependencies, a closed set of equations for the expected values of variables can be derived (Opper and Saad 2001). Mean field approximations are often very accurate in the limit of large population sizes (e.g., see Newman et al. 2000, Andersson and Britton 2000). We thank an anonymous referee for pointing us to this literature.


3.1. Notation, Definitions, and Assumptions

Consider a digital product, such as a music file, being distributed in a P2P network. Nodes in the network are potential consumers of the product and can obtain it from nodes that already have the product. Whenever a node distributes a copy of the digital product to another node, it gets a referral payment from the firm. In the discussion below, we assume that the firm paying the referral fee is the copyright holder.⁴

⁴ In reality, the firm can be the copyright holder of the content, the online retailer (i.e., the P2P firm), or both. Cooperative promotions between retailers and labels (copyright holders) are common in the music industry.

Let $N$ denote the total number of nodes in the network. We assume that there is no intermediary (i.e., a node that buys and sells the product while not seeking it) and that a node is used by one individual only, so no node consumes a file more than once. Thus, $N$ is also the market potential, or the total number of nodes that might eventually buy the product. The nodes that already have the product are called satisfied nodes. The number of satisfied nodes in the network at time $t$ is denoted by $Q(t)$, and the fraction of nodes that are satisfied is $q(t) = Q(t)/N$. The nodes that do not yet have the product are called seeking nodes. The number of seeking nodes at $t$ is $N - Q(t)$, and the fraction of seeking nodes in the network is $1 - q(t)$. $\lambda$ denotes the rate at which seeking nodes attempt to seek the product. Seeking nodes can obtain the product from satisfied nodes. Note, however, that not all satisfied nodes distribute the product. Satisfied nodes that are willing to distribute the product are called seeds. The number of seeds and the fraction of nodes that are seeds are denoted by $S(t)$ and $s(t) = S(t)/N$, respectively. In the rest of the model development, we simplify notation by denoting the number of seeking nodes, satisfied nodes, and seeds by $N - Q$, $Q$, and $S$, respectively (and the corresponding fractions by $1 - q$, $q$, and $s$). It is implied that these are dynamic variables. Note that a satisfied node does not have to be a seed, but a seed must be a satisfied node first. Thus, $S \le Q$.

We assume that the network contains some altruistic nodes that are always willing to be seeds even if there is no compensation for doing so, and that the remaining nodes are “strategic,” i.e., they will not be a seed unless the distribution referral is greater than their cost of being a seed. The fraction of nodes that are altruistic is denoted by $\alpha$, and the number of altruistic seeds is $\alpha Q$. The assumption of altruistic nodes is without loss of generality, as $\alpha$ might be set equal to zero. However, some users are known to distribute content in P2P networks in which no distribution referral is offered. This willingness to share might be fueled by a sense of community, reciprocity, or other such factors. Let $c$ denote a node’s cost of distributing a file and $r$ the referral payment made to the seed that distributes the file. A strategic satisfied node will be a seed if and only if $c \le r$. Assuming that $c$ is uniformly distributed, $c \sim U(0, C)$, across the strategic nodes, the number of strategic satisfied nodes that act as seeds is $(1 - \alpha) Q (r/C)$ for $r \le C$. Therefore,

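$$
S = \begin{cases}
\left(\alpha + \dfrac{r}{C}(1 - \alpha)\right) Q, & \text{if } r \le C,\\[4pt]
Q, & \text{otherwise.}
\end{cases}
\tag{1}
$$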
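Equation (1) translates directly into code. The following is a minimal Python sketch under the uniform-cost assumption above; `seed_fraction` and `num_seeds` are hypothetical helper names, and the example values are illustrative.

```python
def seed_fraction(alpha: float, r: float, C: float) -> float:
    """Share of satisfied nodes that act as seeds, following equation (1).

    Altruistic nodes (fraction alpha) always seed; a strategic node seeds
    iff its cost c <= r, where c ~ Uniform(0, C).
    """
    if not 0.0 <= alpha <= 1.0 or C <= 0 or r < 0:
        raise ValueError("need 0 <= alpha <= 1, C > 0 and r >= 0")
    if r >= C:
        return 1.0  # the referral covers every strategic node's cost
    return alpha + (1.0 - alpha) * r / C


def num_seeds(Q: float, alpha: float, r: float, C: float) -> float:
    """Expected number of seeds S when there are Q satisfied nodes."""
    return seed_fraction(alpha, r, C) * Q


print(round(seed_fraction(0.1, 0.2, 1.0), 2))  # 0.28
print(num_seeds(100, 0.0, 0.0, 1.0))           # 0.0: no altruists and no referral
```

Capping the fraction at 1 for $r \ge C$ mirrors the second branch of (1): once the referral covers the highest possible cost, raising it further cannot add seeds.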

When $r = 0$, file distribution relies solely on the goodwill of the altruistic nodes. We also assume that the firm is unable to price discriminate, i.e., altruistic nodes also get the referral even though they would share regardless.

Finally, we describe the search process in the network and the associated notation. We consider both a completely decentralized and a hierarchical network in this paper. In a completely decentralized, or flat, network, whenever a seeking node seeks a file, $i$ neighbors of the seeking node are randomly selected and queried. Each of the queried nodes in turn forwards the request to another $i$ unique nodes. This process continues until the maximum number of hops, denoted by $j$, is reached. Thus, the request will be sent to a total of $k = i + i^2 + \cdots + i^j = i(i^j - 1)/(i - 1)$ nodes. The search fails if none of these $k$ nodes is a seed.⁵ In Gnutella 0.4, a seeking node will query 7 of its neighbors. If these nodes do not have the file, they each contact 7 of their neighbors, and so on until the maximum hop count of 10 is reached (Ross and Rubenstein 2003).

⁵ We do not model the network topology. We assume only that a node’s neighbors are randomly assigned and uncorrelated with whether that neighbor is a seed. Thus, each of the $k$ queried nodes is just as likely to be a seed as any other node in the network. This implicitly models the interaction as though the network were fully connected. The assumption is made for analytical tractability but is not as restrictive as it might appear, for two reasons. First, P2P networks are regularly rewired, and the neighbors constantly change as a result. Second, P2P networks have a small diameter (Iamnitchi et al. 2004), so most nodes are reachable in a few hops.

Now consider a two-level hierarchical P2P network in which $N$ nodes are organized into $M$ groups. We assume that all the groups are homogeneous, except that one of them, labeled Group-I, has the initial satisfied node. Because the assignment of nodes to groups in current P2P networks is independent of both the request rate of the nodes and the content supplied by them (Garces-Erice et al. 2003), the assumption of homogeneity across groups is reasonable. The number of nodes in any group is $n = N/M$. The number of satisfied nodes and the fraction of nodes that are satisfied in Group-I are denoted $Q_I$ and $q_I$, respectively. Similarly, $Q$ and $q$ denote the corresponding variables for the other groups. Each group in the network has a supernode that provides indexing services to all the nodes within its group, as shown in Figure 1. Any request from a seeking node will be satisfied if there is a seed within the group, because the supernode indexes all nodes in the group and can forward the request to the seed directly. If there is no seed in the group, the supernode forwards the request to $l - 1$ randomly selected groups.⁶ Thus, a total of $l$ groups in the network are queried. The probability that a randomly queried satisfied node distributes the file is given by $\beta = \alpha + (1 - \alpha) r/C$. Thus, given $nq$ satisfied nodes in a group, the probability that there is no seed in the group is $p = (1 - \beta)^{nq}$. Similarly, the probability that there is no seed in Group-I is $p_I = (1 - \beta)^{n q_I}$.

⁶ Again, we assume that neighbors of supernodes are assigned randomly, uncorrelated with the density of seeds in the groups. This models the interaction as though the supernodes were fully connected.

Figure 1 Hierarchical P2P Network (diagram not reproduced; it shows ordinary nodes attached to super-nodes)
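The two search quantities defined above, the flat-network query bound $k$ and the hierarchical no-seed probability $p$, can be computed as follows. This is a sketch; `nodes_queried` and `prob_no_seed_in_group` are hypothetical names, and the example values of $\beta$, $n$, and $q$ are arbitrary. The geometric-series formula for $k$ is undefined at $i = 1$, where simply $k = j$.

```python
def nodes_queried(i: int, j: int) -> int:
    """k = i + i^2 + ... + i^j for fan-out i and at most j hops (flat network)."""
    if i < 1 or j < 1:
        raise ValueError("i and j must be positive integers")
    if i == 1:
        return j  # i(i^j - 1)/(i - 1) is 0/0 here; the series sums to j
    return i * (i**j - 1) // (i - 1)  # exact: i - 1 always divides i^j - 1


def prob_no_seed_in_group(beta: float, n: int, q: float) -> float:
    """p = (1 - beta)^(n q): none of a group's n q satisfied nodes is a seed."""
    return (1.0 - beta) ** (n * q)


print(nodes_queried(7, 2))                                        # 56 = 7 + 49
print(nodes_queried(7, 10) == sum(7**h for h in range(1, 11)))     # True
print(round(prob_no_seed_in_group(0.28, 100, 0.05), 4))           # 0.1935
```

With the Gnutella 0.4 settings quoted above ($i = 7$, $j = 10$), `nodes_queried(7, 10)` returns 329,554,456, so the bound $k$ grows very quickly with the hop limit.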

Table 1 Glossary of Terms

Symbol                        Definition
$t$                           Time
$N$                           Total number of nodes in the network (also the market potential)
$Q(t)$                        Number of satisfied nodes at time $t$, $Q(0) > 0$; $q = Q/N$
$S(t)$                        Number of seeds at time $t$, $S(0) > 0$; $s = S/N$
$\alpha$                      Fraction of nodes that are altruistic
$\lambda$                     Average rate at which a seeking node seeks content
$r$                           Referral per download offered to a node that distributes content
$c$                           The cost per unit time of being a seed, $c \sim U(0, C)$

Notation for a flat P2P network
$i$                           Number of requests generated by seeking nodes in a flat network
$j$                           Maximum number of hops for requests in a flat network
$k$                           Maximum number of nodes queried, $k = i(i^j - 1)/(i - 1)$

Notation for a hierarchical P2P network
$M$                           Total number of groups in a hierarchical network
$n$                           Number of nodes in a single group ($n = N/M$)
$l$                           Number of groups queried in a hierarchical network
$Q_I$, $q_I$                  Number and fraction of satisfied nodes in Group-I
$Q$, $q$                      Number and fraction of satisfied nodes in a group other than Group-I
$\beta$                       Probability that a randomly queried satisfied node distributes the file
$p = (1 - \beta)^{nq}$        Probability that there is no seed in a group other than Group-I
$p_I = (1 - \beta)^{n q_I}$   Probability that there is no seed in Group-I

Table 1 summarizes the notation that will be used in the rest of this paper.

3.2. Diffusion Model for Flat Networks

We begin by considering a completely decentralized P2P network. Whenever a seeking node seeks content, requests are sent to other nodes and forwarded within the network. A maximum of $k$ nodes are queried, as described in §3.1. If a request reaches a seed, the file is transferred to the seeking node, which then becomes a satisfied node. Otherwise, the request fails and the status of the seeking node remains unchanged; the node can return in a later period to seek the content. Because $\lambda$ is the rate at which a seeking node seeks content, $\lambda(N - Q)$ is the total number of seeking nodes seeking content per unit time when there are $Q$ satisfied nodes. If each of these nodes sends out $k$ requests, $1 - (1 - S/N)^k$ is the fraction of them whose requests reach at least one seed.⁷ Thus, the mean field diffusion equation for $Q$ (i.e., the average rate at which seeking nodes become satisfied nodes) is

dQ

dt= ��N −Q�

(1−

(1− S

N

)k)� or

dq

dt= ��1− q��1− �1− s�k��

(2)

We can substitute the expression for $S$ from (1). A referral equal to $C$ ensures that every satisfied node is a seed, because a non-altruistic node's cost of serving as a seed is at most $C$; thus, a referral greater than $C$ is unnecessary. Substituting (1) into (2) and assuming $r \le C$,

$$\frac{dq}{dt} = \beta(1 - q)\left(1 - (1 - \theta q)^k\right), \tag{3}$$

where $\theta = \alpha + (1 - \alpha)r/C$. Solving the above differential equation with the initial condition $Q(0) = 1$ (or $q(0) = q_0 = 1/N$), we obtain the following solution:⁸

$$\frac{1}{a^k - 1}\ln(x - a) - \frac{1}{k(a - 1)}\ln(x - 1) - \frac{1}{k}\sum_{j=1}^{k-1}\frac{x_j}{a - x_j}\ln(x - x_j) = \beta t + C_1, \tag{4}$$

where $a = (1 - \alpha)(1 - r/C)$, $x = 1 - \theta q$, $C_1$ is a constant of integration computed in the online supplement, and $x_j = e^{i2\pi j/k}$ is the $j$th root of $x^k = 1$, not including $x = 1$. In the supplement, we show that (4) has no imaginary parts and that the solution is unique. We also show that sales ($dq/dt$) are unimodal.
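Because (4) is implicit in $q$, integrating (3) numerically is often the easier way to trace a diffusion curve. The sketch below uses a hand-written fourth-order Runge-Kutta (RK4) step. The function names and step size are our own choices, not the authors'. For $k = 1$, (3) is a logistic equation, so the output can be checked against $1/(1 + (N - 1)e^{-\beta\theta t})$.

```python
import math


def theta(alpha, r, C):
    """Probability that a satisfied node is a seed: alpha + (1 - alpha) r / C, for r <= C."""
    return alpha + (1.0 - alpha) * r / C


def integrate_flat(N, k, alpha, beta, r, C, t_end, dt=0.01):
    """RK4 integration of (3): dq/dt = beta (1 - q)(1 - (1 - theta q)^k), q(0) = 1/N.

    Returns parallel lists of times and product penetration q(t).
    """
    th = theta(alpha, r, C)

    def rhs(q):
        return beta * (1.0 - q) * (1.0 - (1.0 - th * q) ** k)

    q = 1.0 / N
    ts, qs = [0.0], [q]
    for step in range(1, int(round(t_end / dt)) + 1):
        s1 = rhs(q)
        s2 = rhs(q + 0.5 * dt * s1)
        s3 = rhs(q + 0.5 * dt * s2)
        s4 = rhs(q + dt * s3)
        q += dt * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
        ts.append(step * dt)
        qs.append(q)
    return ts, qs


# For k = 1, (3) is logistic, so compare with 1 / (1 + (N - 1) exp(-beta theta t)).
N, alpha, beta, r, C = 10_000, 0.1, 0.5, 0.2, 1.0
ts, qs = integrate_flat(N, 1, alpha, beta, r, C, t_end=100)
exact = 1.0 / (1.0 + (N - 1) * math.exp(-beta * theta(alpha, r, C) * ts[-1]))
print(qs[-1], exact)
```

Calling `integrate_flat` with a larger `k`, for example `k = 8`, gives a curve that rises faster, which is the effect Proposition 1 describes.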

Proposition 1. The product penetration $q(t_c)$ at a given time instant $t_c$ increases with $k$ in a convex fashion early in the file's diffusion (small $q$) and in a concave fashion late in the diffusion (large $q$). The time $t(q_c)$ to achieve a given level of product penetration $q_c$ decreases with $k$ in a convex fashion.

The proof is in the appendix. Figure 2 illustrates Proposition 1 for sample parameter values ($N = 10{,}000$, $Q(0) = 1$, $C = 1$, $\alpha = 0.1$, $\beta = 0.5$, and

⁷ Because neighbor assignment is uncorrelated with whether a node is a seed, the probability that a neighbor is a seed is just $S/N$.
⁸ Additional information is contained in an online appendix to this paper that is available on the Information Systems Research website (http://isr.pubs.informs.org/ecompanion.html).

Figure 2 Impact of Querying More Nodes on Diffusion Speed
[Plot: time to achieve 50% penetration t(q = 0.5), 0 to 75, and product penetration q(t = 8), 0 to 1.0, vs. number of nodes queried (k), 1 to 25]

$r = 0.2$). The time to achieve 50% network penetration decreases with $k$ in a convex manner, and product penetration after 8 time units ($q(8)$) is S-shaped in $k$. Proposition 1 indicates that the marginal benefit of querying more nodes can be increasing early in the diffusion, but once there are sufficient satisfied nodes, querying more nodes no longer provides the significant returns that it did before. Considering the negative effects of request flooding, there should be a threshold beyond which any further increase in $k$ is not desirable. One potential impact of request flooding is that nodes have to respond to many more requests, which in turn might reduce their likelihood of serving as seeds. Our model does not explicitly incorporate these costs of request flooding. However, additional analysis reveals that if the number of seeds decreases with $k$ in a concave manner, there could be a threshold value of $k$ beyond which the gains identified in Proposition 1 are offset by the reduction in seeds. The analysis is omitted from this paper but is available on request from the authors.
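The second claim of Proposition 1 can be checked numerically from the integral representation used in the appendix proof, $t(q_c) = \beta^{-1}\int_{q_0}^{q_c} dx/\left((1 - x)(1 - (1 - \theta x)^k)\right)$. The Python sketch below evaluates this integral with Simpson's rule after substituting $u = \ln x$, which tames the $1/x$-like behavior near $q_0 = 1/N$. The function name and the quadrature scheme are our choices. With the Figure 2 parameters, the first differences of $t(0.5)$ in $k$ should be negative and the second differences positive.

```python
import math


def time_to_penetration(q_c, N, k, alpha, beta, r, C, n=2000):
    """t(q_c) = (1/beta) * integral from 1/N to q_c of dx / ((1 - x)(1 - (1 - theta x)^k)).

    Uses composite Simpson's rule in u = ln x (n must be even); the integrand
    becomes x / ((1 - x)(1 - (1 - theta x)^k)), which is smooth near x = 0.
    """
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    th = alpha + (1.0 - alpha) * r / C
    lo, hi = math.log(1.0 / N), math.log(q_c)
    h = (hi - lo) / n

    def g(u):
        x = math.exp(u)
        return x / ((1.0 - x) * (1.0 - (1.0 - th * x) ** k))

    total = g(lo) + g(hi)
    for m in range(1, n):
        total += (4.0 if m % 2 else 2.0) * g(lo + m * h)
    return total * h / 3.0 / beta


# Figure 2 parameters: N = 10,000, C = 1, alpha = 0.1, beta = 0.5, r = 0.2
times = [time_to_penetration(0.5, 10_000, k, 0.1, 0.5, 0.2, 1.0) for k in range(1, 26)]
first = [b - a for a, b in zip(times, times[1:])]      # t(k + 1) - t(k)
second = [b - a for a, b in zip(first, first[1:])]     # t(k + 1) + t(k - 1) - 2 t(k)
print(times[0], all(d < 0 for d in first), all(d > 0 for d in second))
```

For $k = 1$ the integral has the closed form $\ln(N - 1)/(\beta\theta)$ at $q_c = 0.5$, which gives an independent check on the quadrature.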

Proposition 2. The product penetration $q(t_c)$ at a given time instant $t_c$ increases with the referral $r$. Simultaneously, the time $t(q_c)$ to achieve a given level of product penetration $q_c$ decreases with $r$ in a convex fashion.

The proof is in the appendix. An increase in the referral increases the willingness of satisfied nodes to serve as seeds. This, in turn, increases file availability and helps speed product diffusion. There is a trade-off between offering a high referral to speed diffusion and reducing the referral to increase margins. We evaluate this trade-off in §3.2.1 to compute the optimal referral $r^*$.

Consider a special case in which $k = 1$, i.e., only one node in the network is queried whenever a seeking


node attempts to obtain the content. With $k = 1$, (3) reduces to the logistic equation $dq/dt = \beta\theta q(1 - q)$, which can be solved to obtain the following closed-form expressions:

$$q(t; r) = \frac{1}{1 + (N - 1)e^{-A(r)t}}, \tag{5}$$

and

$$t(q; r) = \frac{1}{A(r)}\left(\ln\frac{q}{1 - q} + \ln(N - 1)\right), \tag{6}$$

where $A(r) = \beta\theta = \beta(\alpha + r(1 - \alpha)/C)$. Equations (5) and (6) specify an S-shaped P2P diffusion curve. These equations can be used to derive the optimal referral that the firm should offer. We turn to that question now.
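Equations (5) and (6) translate directly into code. The sketch below (function names are ours) implements both and checks that they invert each other. With the Figure 2 parameters and $k = 1$, it also shows what Proposition 2 states: a higher referral shortens the time to 50% penetration.

```python
import math


def rate_A(r, alpha, beta, C):
    """A(r) = beta (alpha + r (1 - alpha) / C): the logistic growth rate when k = 1."""
    return beta * (alpha + r * (1.0 - alpha) / C)


def q_of_t(t, r, N, alpha, beta, C):
    """Equation (5): q(t; r) = 1 / (1 + (N - 1) exp(-A(r) t))."""
    return 1.0 / (1.0 + (N - 1) * math.exp(-rate_A(r, alpha, beta, C) * t))


def t_of_q(q, r, N, alpha, beta, C):
    """Equation (6): t(q; r) = (ln(q / (1 - q)) + ln(N - 1)) / A(r)."""
    return (math.log(q / (1.0 - q)) + math.log(N - 1)) / rate_A(r, alpha, beta, C)


params = dict(N=10_000, alpha=0.1, beta=0.5, C=1.0)
t_half = t_of_q(0.5, 0.2, **params)
print(t_half, q_of_t(t_half, 0.2, **params))  # second value is 0.5 up to rounding
print([round(t_of_q(0.5, r, **params), 1) for r in (0.0, 0.2, 0.5, 1.0)])  # decreasing in r
```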

3.2.1. Optimal Referral. We demonstrate the applicability of the diffusion model by studying optimal referral strategies in P2P networks. Diffusion modeling can be particularly useful here relative to myopic policies that determine referrals based on the immediate impact on sales without accounting for the impact on future sales.

Denote the length of the planning horizon by $T$. Assume the referral $r$ is constant throughout the planning horizon, and the unit price of the product is constant and normalized to 1. The firm's profit from each sale is $(1 - r)$. Thus, the referral optimization problem is

$$\max_r \left(\int_0^T (1 - r)\frac{dq}{dt}\,dt\right) \approx (1 - r)q(T). \tag{7}$$

In the equation above,⁹ we do not discount future sales, for analytical tractability. The referral optimization problem for arbitrary values of $k$ is relatively easy to solve numerically, but closed-form results are hard to come by. In the following analysis, we focus on the case in which $k = 1$ because of the tractability it affords. Numerical analysis suggests that although the magnitude of the referrals changes with $k$, the results are qualitatively similar to the ones highlighted below.

Substituting $q(t)$ from (5) into (7) and computing the first-order condition with respect to $r$ gives

$$(N - 1)\exp\left(-\beta T\left(\alpha + (1 - \alpha)\frac{r}{C}\right)\right)\left(\beta T(1 - \alpha)\frac{1 - r}{C} - 1\right) - 1 = 0. \tag{8}$$

⁹ Strictly speaking, the objective function is $(1 - r)(q(T) - 1/N)$. Because $N$ is large, we approximate this by $(1 - r)q(T)$.

Solving (8) and imposing the bounds on $r$ yields the following result (proof in the appendix).

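Proposition 3. The optimal referral for the flat P2P network with diffusion specified by (5) is

$$r^* = \max\left[0, \min\left[C,\; 1 - \frac{C}{\beta T(1 - \alpha)}\left(1 + W\left(\frac{1}{N - 1}\exp\left(\beta T\left(\alpha + \frac{1 - \alpha}{C}\right) - 1\right)\right)\right)\right]\right], \tag{9}$$

where $W(x)$ is the Lambert $W$-function (the solution of $W(x)e^{W(x)} = x$).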

Using the properties of the Lambert $W$-function, the following properties of the optimal referral are derived in the appendix.

Proposition 4. The optimal referral is (a) nonincreasing with the fraction of nodes in the network that are altruistic ($\alpha$), (b) nondecreasing with the P2P network's size ($N$), and (c) nondecreasing with request rate $\beta$ for $\beta < \beta_{th}$ and nonincreasing with $\beta$ for $\beta > \beta_{th}$, where

$$\beta_{th} = \frac{1}{T}\left(2 + W\left((N - 1)e^{-1}\right) + \frac{e}{N - 1}\exp\left(W\left((N - 1)e^{-1}\right)\right)\right)\left(\alpha + \frac{1 - \alpha}{C}\right)^{-1}. \tag{10}$$
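Equations (9) and (10) need only the principal branch of the Lambert $W$-function at nonnegative arguments. The Python sketch below computes it with a few Newton steps instead of calling a library (SciPy, for example, provides `scipy.special.lambertw`). It then evaluates $r^*$ and $\beta_{th}$ and compares $r^*$ with a brute-force grid search over (7). The horizon $T = 20$ is borrowed from the Figure 4 setting for illustration, and all function names are ours.

```python
import math


def lambert_w(z, tol=1e-12, max_iter=200):
    """Principal branch W(z) for z >= 0, i.e., the w >= 0 solving w * exp(w) = z.

    Newton's method started at log1p(z), an upper bound on W(z); because
    w * exp(w) is convex and increasing for w >= 0, the iterates fall monotonically.
    """
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    w = math.log1p(z)
    for _ in range(max_iter):
        step = (w - z * math.exp(-w)) / (1.0 + w)
        w -= step
        if abs(step) <= tol * (1.0 + w):
            break
    return w


def optimal_referral(N, alpha, beta, T, C):
    """Equation (9): clip 1 - C (1 + W(z)) / (beta T (1 - alpha)) to [0, C]."""
    z = math.exp(beta * T * (alpha + (1.0 - alpha) / C) - 1.0) / (N - 1)
    r = 1.0 - C * (1.0 + lambert_w(z)) / (beta * T * (1.0 - alpha))
    return max(0.0, min(C, r))


def beta_threshold(N, alpha, T, C):
    """Equation (10): request rate at which r* switches from rising to falling."""
    v = lambert_w((N - 1) / math.e)
    return (2.0 + v + math.e / (N - 1) * math.exp(v)) / (T * (alpha + (1.0 - alpha) / C))


def terminal_profit(r, N, alpha, beta, T, C):
    """Objective (7) with q(T) from (5): (1 - r) / (1 + (N - 1) exp(-A(r) T))."""
    A = beta * (alpha + r * (1.0 - alpha) / C)
    return (1.0 - r) / (1.0 + (N - 1) * math.exp(-A * T))


N, alpha, T, C = 10_000, 0.1, 20.0, 1.0
r_star = optimal_referral(N, alpha, 0.5, T, C)
grid = [m / 10_000 for m in range(10_001)]
r_grid = max(grid, key=lambda r: terminal_profit(r, N, alpha, 0.5, T, C))
b_th = beta_threshold(N, alpha, T, C)
print(r_star, r_grid, b_th)
```

With these illustrative values, moving $\beta$ 10% below or above `b_th` lowers $r^*$ in both directions, which is consistent with part (c) of Proposition 4. A very small $\beta$ (for example, 0.05) drives $r^*$ to 0.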

As expected, an increase in the number of altruistic nodes reduces the need for a high referral. Similarly, product penetration occurs at a slower rate in a large network (see (5)). As a result, a higher referral is needed to speed the diffusion in a large network.

Figure 3 illustrates the impact of $\beta$ for sample parameters ($N = 10{,}000$, $Q(0) = 1$, $C = 1$, $\alpha = 0.1$). When $\beta$ is extremely small, i.e., the request rate among seeking nodes is low, the best strategy for the firm is to offer no referral. This is because the bottleneck is not free-riding but the low rate of demand generation. Once $\beta$ reaches a certain threshold value, the referral starts having an impact and thus increases. Finally, a very high request rate helps speed up diffusion considerably. This helps generate altruistic satisfied nodes at a faster rate, which in turn reduces the need for a very high referral. As a result, we observe the switch from a nondecreasing relationship to a nonincreasing relationship between $r^*$ and $\beta$.

Now consider a myopic referral policy that does not account for the impact of referrals on future sales. The


Figure 3 Optimal Referral vs. Request Rate
[Plot: optimal referral (r*), 0 to 0.9, vs. request rate (β), 0 to 4.0]

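myopic referral maximizes the instantaneous profit given by

$$\pi(t) = (1 - r)\frac{dq}{dt} = (1 - r)\beta\left(\alpha + \frac{r(1 - \alpha)}{C}\right)(1 - q)q. \tag{11}$$

Setting $d\pi/dr = 0$, the optimal myopic referral is

$$r^*_M = \frac{1}{2} - \frac{C\alpha}{2(1 - \alpha)}. \tag{12}$$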
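Equation (12) follows from a one-line first-order condition on (11), and a numerical check confirms that the maximizer does not move with $\beta$ or $q$. $N$ does not appear in (11) at all. The Python sketch below assumes $k = 1$, as (11) does, and clips the referral to $[0, C]$, as in Proposition 5. A grid search over $r$ returns the same value for every $(\beta, q)$ pair tried. The names and the grid resolution are illustrative choices.

```python
def myopic_referral(alpha, C):
    """Equation (12), clipped to [0, C]: r_M = 1/2 - C alpha / (2 (1 - alpha))."""
    return max(0.0, min(C, 0.5 - C * alpha / (2.0 * (1.0 - alpha))))


def instantaneous_profit(r, q, alpha, beta, C):
    """Equation (11) (k = 1): (1 - r) beta (alpha + r (1 - alpha) / C) (1 - q) q."""
    return (1.0 - r) * beta * (alpha + r * (1.0 - alpha) / C) * (1.0 - q) * q


alpha, C = 0.1, 1.0
grid = [C * m / 20_000 for m in range(20_001)]  # candidate referrals in [0, C]
for beta in (0.1, 0.5, 2.0):
    for q in (0.01, 0.3, 0.8):
        r_best = max(grid, key=lambda r: instantaneous_profit(r, q, alpha, beta, C))
        print(f"beta={beta}, q={q}: grid argmax {r_best:.4f}, formula {myopic_referral(alpha, C):.4f}")
```

The independence follows because $\beta$ and $q(1 - q)$ enter (11) only as positive multiplicative factors, so they rescale the profit without shifting its maximizer.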

Proposition 5. The myopic referral is nonincreasing with the fraction of nodes that are altruistic ($\alpha$) and independent of both the request arrival rate $\beta$ and the network size $N$.

The proof follows from (12) after imposing the bounds on the referral. Interestingly, the myopic referral, unlike the optimal referral, is independent of both the request arrival rate $\beta$ and the network size $N$. This is because the instantaneous profit depends on the instantaneous demand, the fraction that is fulfilled, and the margin per sale. The referral impacts only the fraction of demand that is fulfilled and the margin per sale, but not the instantaneous demand, which is fixed for the purpose of computing the instantaneous sales. As a result, factors such as $\beta$ and $N$, which influence only the instantaneous demand but not demand fulfillment or margins, are irrelevant to the computation of the myopic referral. In contrast, the optimal policy accounts for the fact that the referral impacts the demand generated in future periods, and thus the optimal referral interacts with the demand terms.

Figure 4 Profit Differential Between Optimal and Myopic Referral
[Plot: percentage of increase in profit, 0 to 600, vs. altruistic nodes (α), 0 to 1.0, and request rate (β), 0 to 2.0]

In Figure 4, we plot the percentage increase in profit achieved by the optimal referral relative to the myopic referral against the fraction of altruistic nodes in the network ($\alpha$) and the request rate ($\beta$). The remaining parameters are $N = 10{,}000$, $T = 20$, and $C = 1$. When $\alpha$ is high or $\beta$ is too low or too high, the need to offer referrals diminishes (see Propositions 4 and 5). Hence, it does not matter whether a myopic or optimal referral is offered. However, for intermediate request rates and low levels of altruism, as observed in reality (Izal et al. 2004, Adar and Huberman 2000), the optimal referral significantly outperforms the myopic referral.
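The Figure 4 comparison can be approximated for a single $(\alpha, \beta)$ pair with a few lines of Python. This sketch keeps $k = 1$ and the stated $N = 10{,}000$, $T = 20$, $C = 1$. To stay self-contained, it finds the optimal referral by brute-force grid search over $[0, C]$ instead of the closed form (9). It is illustrative only and is not meant to reproduce the figure's exact values.

```python
import math


def terminal_profit(r, N, alpha, beta, T, C):
    """(1 - r) q(T; r), with q(T; r) from the k = 1 solution (5)."""
    A = beta * (alpha + r * (1.0 - alpha) / C)
    return (1.0 - r) / (1.0 + (N - 1) * math.exp(-A * T))


def percent_gain(alpha, beta, N=10_000, T=20.0, C=1.0, points=20_001):
    """Percentage profit increase of the grid-searched optimal referral over the myopic one (12)."""
    grid = [C * m / (points - 1) for m in range(points)]
    best = max(terminal_profit(r, N, alpha, beta, T, C) for r in grid)
    r_myopic = max(0.0, min(C, 0.5 - C * alpha / (2.0 * (1.0 - alpha))))
    myopic = terminal_profit(r_myopic, N, alpha, beta, T, C)
    return 100.0 * (best - myopic) / myopic


for alpha in (0.1, 0.5, 0.9):
    print(alpha, [round(percent_gain(alpha, beta), 1) for beta in (0.25, 0.5, 1.0, 2.0)])
```

For $\alpha = 0.9$, both referrals are 0 at $\beta = 0.5$ and the gain is zero. This matches the observation that the choice of referral matters little when altruism is high.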

3.3. Diffusion in Hierarchical P2P Networks

A number of P2P networks are hierarchical rather than completely decentralized, to reduce query flooding. Kazaa, a popular P2P network, has a two-level structure in which leaf nodes are organized into groups. A rough estimate from 2003 indicates that there were 10,000 groups in the Kazaa network, with the supernode in each group handling 200 to 500 nodes (Ross and Rubenstein 2003).

Consider a two-level hierarchical P2P network with $M$ groups. As described in §3.1, the supernode of the seeking node's group will query all nodes within the group and also forward the request to $(l - 1)$ neighboring groups. Let us first consider diffusion in Group-I:

$$\frac{dq_I}{dt} = \beta(1 - q_I)\left(1 - p_I p^{l-1}\right) = \beta(1 - q_I)\left(1 - (1 - \theta)^{n(q_I + (l - 1)q)}\right). \tag{13}$$


In (13), $\beta(1 - q_I)$ is the generated demand and $(1 - p_I p^{l-1})$ is the fraction of this demand that is fulfilled ($p_I$ and $p$ represent the probability that there is no seed in Group-I and in another group, respectively).

Now consider diffusion in the remaining $(M - 1)$ groups. The generated demand in any group is $\beta(1 - q)$. The probability that there is a seed within a group is $(1 - p)$. When there is no seed within the group, two cases arise. The first case is one in which all $(l - 1)$ requests are forwarded to groups other than Group-I. The probability of this case (i.e., of not querying Group-I) is

$$\binom{M - 2}{l - 1}\binom{M - 1}{l - 1}^{-1} = \frac{M - l}{M - 1}. \tag{14}$$

The probability that at least one of these $(l - 1)$ groups contains a seed is $(1 - p^{l-1})$. The second case is one in which one of the $(l - 1)$ requests goes to Group-I, and the remaining $(l - 2)$ requests are sent to other groups. The probability of querying Group-I is $1 - (M - l)/(M - 1) = (l - 1)/(M - 1)$. The probability that at least one of these groups contains a seed is $(1 - p_I p^{l-2})$. Hence, the probability that generated demand is fulfilled is

$$(1 - p) + p\left(\frac{M - l}{M - 1}\left(1 - p^{l-1}\right) + \frac{l - 1}{M - 1}\left(1 - p_I p^{l-2}\right)\right). \tag{15}$$
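Equations (14) and (15) are easy to check by brute force. The Python sketch below (names are ours) confirms the binomial identity in (14) with `math.comb`. It also checks that, because the two case probabilities sum to one, (15) equals $1 - p^{l-1}\left(\frac{M-l}{M-1}p + \frac{l-1}{M-1}p_I\right)$, the bracketed expression that appears in (16).

```python
import math


def prob_skip_group_I(M, l):
    """Equation (14): C(M-2, l-1) / C(M-1, l-1), the chance that none of the
    (l - 1) forwarded requests from another group lands in Group-I."""
    return math.comb(M - 2, l - 1) / math.comb(M - 1, l - 1)


def prob_fulfilled(p, p_I, M, l):
    """Equation (15), for l >= 2: demand in a non-Group-I group is fulfilled."""
    skip = (M - l) / (M - 1)
    return (1 - p) + p * (skip * (1 - p ** (l - 1)) + (1 - skip) * (1 - p_I * p ** (l - 2)))


def prob_fulfilled_simplified(p, p_I, M, l):
    """The same probability in the form used inside (16)."""
    return 1 - p ** (l - 1) * ((M - l) / (M - 1) * p + (l - 1) / (M - 1) * p_I)


for M in range(3, 40):
    for l in range(1, M):
        assert abs(prob_skip_group_I(M, l) - (M - l) / (M - 1)) < 1e-12
for p, p_I in ((0.9, 0.2), (0.5, 0.5), (0.99, 0.0)):
    for M, l in ((10, 2), (100, 3), (1_000, 5)):
        assert abs(prob_fulfilled(p, p_I, M, l) - prob_fulfilled_simplified(p, p_I, M, l)) < 1e-12
print("identities hold")
```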

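Because the two case probabilities in (15) sum to one, (15) simplifies to $1 - p^{l-1}\left(\frac{M - l}{M - 1}p + \frac{l - 1}{M - 1}p_I\right)$. Substituting the expressions for $p$ and $p_I$, we get the diffusion equation

$$\frac{dq}{dt} = \beta(1 - q)\left(1 - (1 - \theta)^{n(l-1)q}\left(\frac{M - l}{M - 1}(1 - \theta)^{nq} + \frac{l - 1}{M - 1}(1 - \theta)^{nq_I}\right)\right). \tag{16}$$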

We can jointly solve (16) and (13) to obtain the diffusion trajectory in each of the groups. The diffusion curve for the network as a whole is then obtained by aggregating the diffusion curves across all groups.¹⁰
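One way to integrate (13) and (16) jointly is a Python RK4 sketch like the one below. Two assumptions are ours: the single initial satisfied node sits in Group-I ($q_I(0) = 1/n$, $q(0) = 0$), and network-wide penetration is aggregated as in footnote 10. With the Figure 5 parameters, the sketch shows the qualitative pattern described below: fewer groups give faster diffusion. Each group's demand is also capped at $\beta(1 - q)$, so no configuration can reach 50% penetration much sooner than $\ln 2/\beta$.

```python
import math


def integrate_hierarchical(N, M, l, alpha, beta, r, C, t_end, dt=0.002):
    """RK4 integration of (13) (Group-I) and (16) (every other group).

    Assumes the initial satisfied node is in Group-I: q_I(0) = 1/n, q(0) = 0.
    Returns times and overall penetration ((M - 1) q + q_I) / M (footnote 10).
    """
    if M < 2 or not 1 <= l <= M - 1:
        raise ValueError("need M >= 2 and 1 <= l <= M - 1")
    n = N / M
    a = 1.0 - (alpha + (1.0 - alpha) * r / C)  # 1 - theta

    def rhs(qI, q):
        pI, p = a ** (n * qI), a ** (n * q)
        d_qI = beta * (1.0 - qI) * (1.0 - pI * p ** (l - 1))
        fulfilled = 1.0 - p ** (l - 1) * ((M - l) / (M - 1) * p + (l - 1) / (M - 1) * pI)
        return d_qI, beta * (1.0 - q) * fulfilled

    qI, q = 1.0 / n, 0.0
    ts, overall = [0.0], [qI / M]
    for step in range(1, int(round(t_end / dt)) + 1):
        s1 = rhs(qI, q)
        s2 = rhs(qI + 0.5 * dt * s1[0], q + 0.5 * dt * s1[1])
        s3 = rhs(qI + 0.5 * dt * s2[0], q + 0.5 * dt * s2[1])
        s4 = rhs(qI + dt * s3[0], q + dt * s3[1])
        qI += dt * (s1[0] + 2 * s2[0] + 2 * s3[0] + s4[0]) / 6.0
        q += dt * (s1[1] + 2 * s2[1] + 2 * s3[1] + s4[1]) / 6.0
        ts.append(step * dt)
        overall.append(((M - 1) * q + qI) / M)
    return ts, overall


def time_to_half(ts, overall):
    """First recorded time at which overall penetration reaches 0.5 (None if never)."""
    return next((t for t, v in zip(ts, overall) if v >= 0.5), None)


# Figure 5 parameters: N = 10,000, C = 1, alpha = 0.1, beta = 0.5, l = 2, r = 0.2
for M in (10, 100, 1_000):
    ts, overall = integrate_hierarchical(10_000, M, 2, 0.1, 0.5, 0.2, 1.0, t_end=20)
    print(M, time_to_half(ts, overall))
print("demand-generation bound ln(2)/beta:", math.log(2) / 0.5)
```

As $M$ falls from 1,000 to 10, the time to 50% penetration shrinks toward, but never below, the printed bound. This illustrates why further centralization eventually stops paying off.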

While (13) and (16) can be solved numerically, there is no closed-form solution to these differential equations. The numerical solutions are discussed below.

Figure 5 plots the diffusion curves based on (13) and (16) for different values of $M$. The remaining parameters are $N = 10{,}000$, $C = 1$, $\alpha = 0.1$, $\beta = 0.5$, $l = 2$, and $r = 0.2$. When the number of groups $M$

¹⁰ Overall product penetration in the network is $((M - 1)q + q_I)/M$. For large $M$, this can be approximated by $q$.

Figure 5 Diffusion Process Under Varying Levels of Network Centralization
[Plot: product penetration (q), 0 to 1.0, vs. time (t), 0 to 8; curves for M = 10, 100, 500, and 1,000]

is large, the P2P network is similar to the decentralized network of §3.2. As $M$ decreases, the fraction of requests reaching a seed increases significantly relative to a completely flat network, and product diffusion occurs faster. Figure 6 plots the time to achieve

Figure 6 Impact of Network Centralization on Time to Achieve 50% Network Penetration
[Plot: time to achieve 50% penetration t(q = 0.5), 1.0 to 4.0, vs. 1/M, 0 to 0.10]


Figure 7 Impact of Incomplete Search and Free-Riding on Diffusion Speed
[Plot: time to achieve 50% penetration t(q = 0.5), 0 to 10, vs. degree of centralization (M), 0 to 1,000, and referral (r), 0 to 1.0]

50% network penetration for the same parameters as in Figure 5. Initially, as we move from a flat structure to a hierarchical structure, a small decrease in $M$ results in a significant reduction in diffusion time. However, once $M$ reaches a sufficiently small value, further centralization no longer speeds up the diffusion noticeably. The bottleneck is no longer the incomplete search of the network; rather, the bottlenecks are the rate of demand generation and the willingness of satisfied nodes to distribute the file. Given that the primary motivation for reducing the number of groups is to speed diffusion, there is little reason to prefer a completely (or nearly) centralized P2P architecture, given other drawbacks of a centralized structure, such as the increased load on supernodes. For example, Figures 5 and 6 suggest that there is little to gain by reducing the number of groups below 50 for the given network configuration.

To illustrate the joint impact of incomplete search

and free-riding, Figure 7 plots the time to achieve 50% network penetration against the number of groups in the network $M$ and the distribution referral $r$. Other parameters are the same as in Figure 5. When both aspects of supply are active constraints, there is much to gain by increasing the referral, reducing the number of groups in the network, or both. However, if the referral is very high, then almost all nodes distribute the file, and the gains from reducing $M$ are somewhat modest. Similarly, if the network is highly centralized, the referral might not be as crucial. This,

Figure 8 Impact of Search Process on Optimal Referral
[Plot: optimal referral (r*), 0 to 0.8, vs. number of requests forwarded (l), 0 to 20, and number of groups (M), 0 to 500]

of course, assumes that there are sufficient altruistic nodes in the network ($\alpha = 0.1$ in Figure 7). If most nodes in the network are not altruistic, then a referral will still be needed even with considerable centralization of the network. In summary, there are a number of interesting design parameters ($M$, $r$, $l$) available to managers and network designers that can be carefully tuned based on the diffusion model.

3.3.1. Optimal Referral in Hierarchical Networks. In this section, we investigate the optimal referral in the hierarchical network. Because of the complexity of the diffusion equation for a hierarchical network, we cannot derive a closed-form analytic expression for the optimal referral. We evaluate it numerically using (7) as the objective function and (13) and (16) as the diffusion equations. Figure 8 plots the optimal referral $r^*$ against the number of groups in the network ($M$) and the number of requests forwarded ($l$). The other parameters are $N = 10{,}000$, $C = 1$, $\alpha = 0.1$, and $\beta = 0.5$. Increasing the number of requests helps reduce the magnitude of the referral. Furthermore, increasing the degree of centralization of the P2P network (i.e., reducing $M$) reduces the need to offer high referrals (provided there are a reasonable number of altruistic nodes). At the same time, if there are very few altruistic nodes in the network, then increasing the degree of centralization cannot entirely replace the need for referrals (Figure 9).

3.4. Simulations

Our diffusion models focus on supply-side factors and model a simple demand process based on mean


Figure 9 Impact of Altruism and Network Centralization on Optimal Referral
[Plot: optimal referral, 0 to 0.8, vs. number of groups (M), 0 to 500, and altruistic nodes (α), 0 to 1.0]

field analysis. We now use simulations to study the robustness of the models under more general demand processes. Specifically, the simulations incorporate heterogeneity by allowing $\beta$ to vary across nodes. Furthermore, given the mean request rate $\beta$ for any node, the simulation models a Poisson request generation process for each node (recall that the model assumed a deterministic process). Thus, the timing of requests varies across simulation runs. We then compare the average outcome over these runs with the predictions from the models.

The input to a simulation is the parameter set

$(N, \alpha, \beta, C, r)$. In addition, $\{k\}$ or $\{l, M\}$ is specified, depending on the type of network. First, $N$ nodes are generated. A fraction ($\alpha$) of nodes are randomly selected to be altruistic nodes. For the remaining nodes, the cost of being a seed ($c$) is drawn from a uniform distribution on $[0, C]$. Before the diffusion begins, one of the nodes is randomly chosen as the initial satisfied node. To allow heterogeneity, the content-seeking rate is uniformly distributed from $0.5\beta$ to $1.5\beta$ across the nodes. Given the mean seeking rate of a node, the request process is modeled as a Poisson process. That is, the time between successive attempts by a specific seeking node to locate content is drawn from an exponential distribution. Hierarchical networks are simulated by randomly forming $M$ groups before the diffusion begins. Each group has a randomly selected supernode.

The diffusion process is simulated through a series of discrete steps. In the first step, the time until the first request is drawn for each node from the respective exponential distributions. The node with the lowest value is chosen as the first seeking node. The node is assigned several random neighbors at the time it is ready to request content. Requests are then forwarded to $k$ randomly selected neighbors. If $k$ exceeds the number of neighbors, then these neighbors forward the request to their randomly assigned neighbors until $k$ unique nodes are queried. If none of the requests reach a seed, the status of the seeking node is unchanged and the time before the node's next attempt to locate the content is drawn. If a queried node is a seed, the status of the querying node is changed to a satisfied node. This newly satisfied node will be a seed if it is altruistic or if its cost of distributing content is less than the referral. Next, the node with the lowest time to a request is selected. The process repeats until all nodes are satisfied or the time reaches an upper threshold. The simulation is repeated 100 times with the same parameters, and the average value of $q$ is recorded as the simulation result. The simulation for the hierarchical network is similar, except that nodes are organized into groups and requests are always handled by supernodes.

The predicted diffusion curve and the curve

observed in the simulations (averaged over 100 runs) for a flat P2P network are plotted in Figure 10. The parameter configuration is $N = 10{,}000$, $Q(0) = 1$, $C = 1$, $\alpha = 0.1$, $\beta = 0.5$, $k = 8$, and $r = 0.2$. The mean-field model reasonably approximates the mean diffusion curve. Figure 11 indicates that the model fits the simulations for a hierarchical P2P network as

Figure 10 Simulation Result for Flat P2P with Multiple Requests
[Plot: product penetration (q), 0 to 1.0, vs. time (t), 0 to 25; curves: numerical result and simulation result]


Figure 11 Simulation Result for Hierarchical P2P Network
[Plot: product penetration (q), 0 to 1.0, vs. time (t), 0 to 2.5; curves: hierarchical model and simulation result]

well. The parameter settings are $N = 10{,}000$, $M = 100$, $C = 1$, $\alpha = 0.1$, $\beta = 1$, and $r = 0.001$.
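The simulation protocol above can be prototyped in a few dozen lines. The Python sketch below is a simplified, hypothetical reimplementation for the flat network, not the authors' code. It differs from the protocol in two ways. First, each attempt queries $k$ nodes drawn uniformly at random instead of walking randomly assigned neighbors. Second, it assumes the initial satisfied node is a seed, so that diffusion can start. It keeps the other ingredients described above: altruistic nodes, seeding costs drawn uniformly from $[0, C]$, seeking rates drawn uniformly from $[0.5\beta, 1.5\beta]$, and exponential times between attempts.

```python
import heapq
import random


def simulate_flat_run(N, k, alpha, beta, r, C, t_max, seed=None):
    """One stochastic run; returns (time, q) recorded each time a node becomes satisfied.

    Simplification: each attempt queries k distinct nodes chosen uniformly at random.
    Assumption: the initial satisfied node is a seed.
    """
    rng = random.Random(seed)
    altruistic = [rng.random() < alpha for _ in range(N)]
    cost = [rng.uniform(0.0, C) for _ in range(N)]                  # cost of seeding
    rate = [rng.uniform(0.5 * beta, 1.5 * beta) for _ in range(N)]  # heterogeneous seeking rates
    is_seed = [False] * N

    first = rng.randrange(N)
    is_seed[first] = True
    satisfied = 1
    history = [(0.0, satisfied / N)]

    # Event queue of (time of next attempt, node); exponential gaps give a Poisson process.
    events = [(rng.expovariate(rate[v]), v) for v in range(N) if v != first]
    heapq.heapify(events)
    while events:
        t, v = heapq.heappop(events)
        if t > t_max:
            break
        # A node may draw itself; it is not a seed, so that query is simply wasted.
        if any(is_seed[u] for u in rng.sample(range(N), k)):
            satisfied += 1
            is_seed[v] = altruistic[v] or cost[v] < r  # seed if altruistic or paid enough
            history.append((t, satisfied / N))
        else:
            heapq.heappush(events, (t + rng.expovariate(rate[v]), v))  # retry later
    return history


def penetration_at(history, t):
    """q at time t, reading the step function recorded in history."""
    q = history[0][1]
    for time, value in history:
        if time > t:
            break
        q = value
    return q


runs = [simulate_flat_run(1_000, 8, 0.1, 0.5, 0.2, 1.0, t_max=25, seed=s) for s in range(10)]
print(sum(penetration_at(h, 25) for h in runs) / len(runs))
```

Averaging over runs, as in the protocol above, smooths out the timing noise. Because the query rule is simplified, close agreement with (3) should not be assumed.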

4. Conclusions and Future Directions

With the increasing use of P2P networks for the distribution of digital products, modeling P2P product diffusion is of considerable interest. However, P2P diffusion demonstrates several unique characteristics not captured by traditional models. For example, generated demand is often not fulfilled immediately because of the decentralized distribution in P2P, coupled with incomplete search and free-riding by nodes. P2P media distributors are often most interested in understanding and designing these P2P-specific attributes, such as the P2P search architecture or the use of distribution referrals to encourage file sharing. We developed models to capture the influence of decentralized supply on product diffusion in P2P networks and demonstrated their application in determining optimal distribution referrals. Simulations suggest that the models are robust to heterogeneity across nodes and stochasticity in the request process.

There exist a number of interesting avenues for

future work. On the modeling front, we develop a mixing model and do not model detailed spatial structure in a P2P network. An interesting direction to pursue is that of modeling network topology and the entry and exit of nodes in the network. The network models of Andersson (1998), Durrett (1999), Eubank et al. (2004), and Ganesh et al. (2005) are highly applicable in this regard. Models that incorporate network structure can sometimes generate novel insights on the diffusion process (e.g., see Durrett 1999). Given

the spatial structure, it is interesting to ask what types of network topologies and search strategies are effective in locating seeds in the network. In this context, recent work on decentralized search in complex networks is relevant (Kleinberg 2006, Liben-Nowell et al. 2005). Vega-Redondo (2007) provides an excellent overview of the topic.

We assumed a monopoly setup in which unfulfilled demand returns in a future period. In reality, unfulfilled demand might be permanently lost to a competitor, especially in the presence of a centralized media distributor such as iTunes. In addition, we assumed that the referral does not impact the demand process itself, whereas it can impact the timing of purchases when participants are forward-looking. Modeling the loss of sales to competitors and the impact of distribution referrals on the demand process is likely to be particularly relevant to practice.

An interesting direction for future study lies in applying the models to address managerial questions tied to P2P media distribution. For example, how do prior results on optimal dynamic pricing, promotion, timing of product release, etc., change under decentralized product supply in P2P? How should a firm “seed” a new product, i.e., use free samples to support the distribution of a product whose availability is limited early in its diffusion? Similarly, a more detailed investigation of dynamic referral policies can be particularly useful for P2P managers. We are currently investigating this issue in ongoing work (Hosanagar et al. 2008). It would also be interesting to incorporate congestion costs to determine the optimal number of nodes to query in a decentralized network. Finally, a highly valuable extension would be the estimation of diffusion parameters using data from real P2P networks. Although it is ideal to estimate diffusion parameters from the closed-form solutions of diffusion equations, it might be necessary in our case to estimate parameters using discrete-time difference equations because of the complexity of the P2P diffusion models. Mahajan et al. (2000) provide an excellent overview of maximum likelihood estimation (MLE) and nonlinear least squares (NLS) based estimation techniques for diffusion models.

All of the above comments suggest that there are a number of open problems and that further study on P2P


product diffusion modeling can prove useful. Our research is only a first step in this direction.

Acknowledgments

The authors thank Sumit Sarkar, an associate editor, and two anonymous referees for their valuable feedback. All errors remain our own.

Appendix A. Diffusion in Flat P2P Network

Proposition 1. The product penetration $q(t_c)$ at a given time instant $t_c$ increases with $k$ in a convex fashion early in the file's diffusion (small $q$) and in a concave fashion late in the diffusion (large $q$). The time $t(q_c)$ to achieve a given level of product penetration $q_c$ decreases with $k$ in a convex fashion.

Proof. Suppose we fix $t = t_c$. Given the number of requests forwarded ($k$), let the value of $q$ at this instant be denoted $q = q(k)$. The diffusion equation can be rewritten as $\int_{q_0}^{q(k)} dx/\left((1 - x)(1 - (1 - \theta x)^k)\right) = \beta t_c$. Now suppose we instead had $(k + 1)$ requests forwarded; then

$$\int_{q_0}^{q(k+1)} \frac{dx}{(1 - x)(1 - (1 - \theta x)^{k+1})} = \beta t_c.$$

Therefore,

$$\int_{q_0}^{q(k)} \frac{dx}{(1 - x)(1 - (1 - \theta x)^k)} - \int_{q_0}^{q(k+1)} \frac{dx}{(1 - x)(1 - (1 - \theta x)^{k+1})} = 0.$$

Rewriting it,

$$\int_{q(k)}^{q(k+1)} \frac{dx}{(1 - x)(1 - (1 - \theta x)^k)} = \int_{q_0}^{q(k+1)} \frac{(1 - \theta x)^k\,\theta x\,dx}{(1 - x)(1 - (1 - \theta x)^{k+1})(1 - (1 - \theta x)^k)}. \tag{A1}$$

Because the right-hand side of (A1) is greater than zero, we can conclude that $q(k + 1) > q(k)$. Thus, an increase in the number of nodes queried serves to increase the product penetration.

Now let us evaluate the nature of the increasing relationship. Just as with (A1), we can derive

$$\int_{q(k-1)}^{q(k)} \frac{dx}{(1 - x)(1 - (1 - \theta x)^k)} = \int_{q_0}^{q(k-1)} \frac{(1 - \theta x)^{k-1}\,\theta x\,dx}{(1 - x)(1 - (1 - \theta x)^k)(1 - (1 - \theta x)^{k-1})}. \tag{A2}$$

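Taking the difference between (A1) and (A2), we have

$$\int_{q(k)}^{q(k+1)} \frac{dx}{(1 - x)(1 - (1 - \theta x)^k)} - \int_{q(k-1)}^{q(k)} \frac{dx}{(1 - x)(1 - (1 - \theta x)^k)} = \int_{q(k-1)}^{q(k+1)} \frac{(1 - \theta x)^k\,\theta x\,dx}{(1 - x)(1 - (1 - \theta x)^{k+1})(1 - (1 - \theta x)^k)} + \int_{q_0}^{q(k-1)} \frac{(1 - \theta x)^{k-1}(\theta x)^2\,dx}{(1 - x)(1 - (1 - \theta x)^{k+1})(1 - (1 - \theta x)^{k-1})} > 0.$$

Therefore,

$$\int_{q(k)}^{q(k+1)} \frac{dx}{(1 - x)(1 - (1 - \theta x)^k)} > \int_{q(k-1)}^{q(k)} \frac{dx}{(1 - x)(1 - (1 - \theta x)^k)}. \tag{A3}$$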

The relationship between $q(k + 1)$, $q(k)$, and $q(k - 1)$ depends on the function $\Phi_k(q) = \int^{q} dx/\left((1 - x)(1 - (1 - \theta x)^k)\right)$, which is increasing in $q$. If $\Phi_k(q)$ is concave, then for inequality (A3) to hold, we require that $q(k + 1) + q(k - 1) - 2q(k) > 0$. That is, we require $q(k)$ to be convex. The second derivative of $\Phi_k(q)$ is negative when $q$ is small. Thus, $q(k)$ is convex with $k$ for small $q$.

To prove that $q(k)$ is concave for large $q$, we rewrite (A1)

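and (A2) as

$$\int_{q(k)}^{q(k+1)} \frac{dx}{(1 - x)(1 - (1 - \theta x)^{k+1})} = \int_{q_0}^{q(k)} \frac{(1 - \theta x)^k\,\theta x\,dx}{(1 - x)(1 - (1 - \theta x)^{k+1})(1 - (1 - \theta x)^k)},$$

$$\int_{q(k-1)}^{q(k)} \frac{dx}{(1 - x)(1 - (1 - \theta x)^{k-1})} = \int_{q_0}^{q(k)} \frac{(1 - \theta x)^{k-1}\,\theta x\,dx}{(1 - x)(1 - (1 - \theta x)^k)(1 - (1 - \theta x)^{k-1})}.$$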

Taking the difference between the above two expressions, just as we did with (A1) and (A2) to get (A3),

$$\int_{q(k)}^{q(k+1)} \frac{dx}{(1 - x)(1 - (1 - \theta x)^{k-1})} < \int_{q(k-1)}^{q(k)} \frac{dx}{(1 - x)(1 - (1 - \theta x)^{k-1})}. \tag{A4}$$

Similarly, $\Phi_{k-1}(q) = \int^{q} dx/\left((1 - x)(1 - (1 - \theta x)^{k-1})\right)$ is increasing in $q$. If $\Phi_{k-1}(q)$ is convex, we require $q(k + 1) + q(k - 1) - 2q(k) < 0$ to make inequality (A4) hold. That is, we require $q(k)$ to be concave. The second derivative of $\Phi_{k-1}(q)$ is positive for large $q$. It thus follows that $q(k)$ is concave with $k$ for large $q$.

Now, let us evaluate the impact of $k$ on the time to

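achieve a given level of product penetration. Suppose we fix $q = q_c$. Denote the time taken to achieve this level of product penetration by $t(k)$. Then,

$$t(k + 1) - t(k) = -\frac{1}{\beta}\int_{q_0}^{q_c} \frac{\theta x(1 - \theta x)^k\,dx}{(1 - x)(1 - (1 - \theta x)^k)(1 - (1 - \theta x)^{k+1})} < 0.$$

The second difference is

$$t(k + 1) + t(k - 1) - 2t(k) = \frac{1}{\beta}\int_{q_0}^{q_c} \frac{(\theta x)^2(1 - \theta x)^{k-1}\left(1 + (1 - \theta x)^k\right)dx}{(1 - x)(1 - (1 - \theta x)^{k-1})(1 - (1 - \theta x)^k)(1 - (1 - \theta x)^{k+1})} > 0.$$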


Proposition 2. The product penetration $q(t_c)$ at a given time instant $t_c$ increases with the referral $r$. Simultaneously, the time $t(q_c)$ to achieve a given level of product penetration $q_c$ decreases with $r$ in a convex fashion.

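Proof. Recall that $\int_{q_0}^{q} dx/\left((1 - x)(1 - (1 - \theta x)^k)\right) = \beta t$.

Suppose we fix $t = t_c$. Differentiating with respect to $r$ to examine how $q$ varies with $r$, we have

$$\frac{dq}{dr}\,\frac{1}{(1 - q)(1 - (1 - \theta q)^k)} - \frac{k(1 - \alpha)}{C}\int_{q_0}^{q} \frac{x(1 - \theta x)^{k-1}\,dx}{(1 - x)(1 - (1 - \theta x)^k)^2} = 0,$$

or

$$\frac{dq}{dr} = (1 - q)(1 - (1 - \theta q)^k)\,\frac{k(1 - \alpha)}{C}\int_{q_0}^{q} \frac{x(1 - \theta x)^{k-1}\,dx}{(1 - x)(1 - (1 - \theta x)^k)^2} > 0.$$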

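Now, let us evaluate the impact of $r$ on the time to achieve a given level of product penetration. Suppose we fix $q = q_c$. Taking the derivative of $t$ with respect to $r$, we get

$$\frac{dt}{dr} = -\frac{k(1 - \alpha)}{\beta C}\int_{q_0}^{q_c} \frac{x(1 - \theta x)^{k-1}\,dx}{(1 - x)(1 - (1 - \theta x)^k)^2} < 0,$$

and

$$\frac{d^2t}{dr^2} = \frac{k(1 - \alpha)^2}{\beta C^2}\int_{q_0}^{q_c} \frac{x^2(1 - \theta x)^{k-2}\left(k - 1 + (k + 1)(1 - \theta x)^k\right)dx}{(1 - x)(1 - (1 - \theta x)^k)^3} > 0.$$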

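Proposition 3. The optimal referral for the flat P2P network with diffusion specified by (5) is

$$r^* = \max\left[0, \min\left[C,\; 1 - \frac{C}{\beta T(1 - \alpha)}\left(1 + W\left(\frac{1}{N - 1}\exp\left(\beta T\left(\alpha + \frac{1 - \alpha}{C}\right) - 1\right)\right)\right)\right]\right],$$

where $W(x)$ is the Lambert $W$-function (the solution of $W(x)e^{W(x)} = x$).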

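Proof. Setting the first-order condition specified in (8) equal to zero, we get $(N - 1)\exp\left(-\beta T(\alpha + (1 - \alpha)r/C)\right)\left(\beta T(1 - \alpha)(1 - r)/C - 1\right) = 1$. Rearranging this condition, we have

$$\left(\frac{(1 - r)\beta T(1 - \alpha)}{C} - 1\right)\exp\left(\frac{(1 - r)\beta T(1 - \alpha)}{C} - 1\right) = \frac{1}{N - 1}\exp\left(\beta T\left(\alpha + \frac{1 - \alpha}{C}\right) - 1\right).$$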

This equation is of the form $W(x)\exp(W(x)) = x$, where

$$
W\Bigl(\frac{1}{N-1}\exp\Bigl(\lambda T\Bigl(\alpha+\frac{1-\alpha}{C}\Bigr)-1\Bigr)\Bigr) = \frac{(1-r)\lambda T(1-\alpha)}{C}-1. \tag{A5}
$$

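Given the Lambert W-function $W(\cdot)$, solving (A5) for $r$ gives

$$
r^{*} = 1-\frac{C}{\lambda T(1-\alpha)}\Bigl(1+W\Bigl(\frac{1}{N-1}\exp\Bigl(\lambda T\Bigl(\alpha+\frac{1-\alpha}{C}\Bigr)-1\Bigr)\Bigr)\Bigr).
$$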
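A minimal sketch that evaluates the closed-form $r^*$ and plugs it back into the first-order condition stated at the start of this proof, whose left side should then equal 1. Python's standard library has no Lambert W, so `lambert_w` below is a small Halley iteration for the principal branch with $x > 0$ (SciPy users could substitute `scipy.special.lambertw(x).real`). The parameter values and function names are hypothetical.

```python
import math


def lambert_w(x, tol=1e-14):
    """Principal branch W0(x) for x > 0, computed with Halley's iteration."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    w = math.log1p(x)  # starting point; W0(x) <= log(1 + x) for x > 0
    for _ in range(100):
        ew = math.exp(w)
        f = w * ew - x
        step = f / (ew * (w + 1) - (w + 2) * f / (2 * w + 2))
        w -= step
        if abs(step) <= tol * (1 + abs(w)):
            break
    return w


def optimal_referral(N, T, lam, alpha, C):
    """Closed form of Proposition 3, clamped to [0, C]."""
    x = math.exp(lam * T * (alpha + (1 - alpha) / C) - 1) / (N - 1)
    interior = 1 - C * (1 + lambert_w(x)) / (lam * T * (1 - alpha))
    return max(0.0, min(C, interior))


def foc_lhs(r, N, T, lam, alpha, C):
    """(N - 1) exp(-lam T (alpha + (1 - alpha) r / C)) (lam T (1 - alpha)(1 - r) / C - 1)."""
    return (N - 1) * math.exp(-lam * T * (alpha + (1 - alpha) * r / C)) * (
        lam * T * (1 - alpha) * (1 - r) / C - 1)


# Hypothetical parameter values, for illustration only.
N, T, lam, alpha, C = 1000, 10.0, 1.0, 0.1, 2.0
r_star = optimal_referral(N, T, lam, alpha, C)
print("r* =", r_star, "FOC left side =", foc_lhs(r_star, N, T, lam, alpha, C))
```

If the clamp binds (for example, when the interior value is negative), the first-order condition no longer holds with equality, so the check is only meaningful for an interior $r^*$.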

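The second derivative at $r = r^{*}$ is

$$
-\frac{(N-1)\exp\bigl(-\lambda T\bigl(\alpha+\frac{1-\alpha}{C}r^{*}\bigr)\bigr)(1-r^{*})\bigl(\lambda T\frac{1-\alpha}{C}\bigr)^{2}}{\Bigl(1+(N-1)\exp\bigl(-\lambda T\bigl(\alpha+\frac{1-\alpha}{C}r^{*}\bigr)\bigr)\Bigr)^{2}} < 0,
$$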

because $r^{*} < 1$. Therefore, the function is locally concave and $r^{*}$ is a local maximum. However, the objective function is convex for small $r$ and concave for large $r$. We can nonetheless show that the objective function is unimodal; hence, $r = r^{*}$ is a global maximum. To see this, consider the first derivative with respect to $r$. Its denominator is positive, so its sign is determined by the numerator:

$$
Z(r) = -1 + (N-1)\exp\Bigl(-\lambda T\Bigl(\alpha+\frac{1-\alpha}{C}r\Bigr)\Bigr)\Bigl((1-r)\lambda T\frac{1-\alpha}{C}-1\Bigr).
$$

Suppose that $Z(0) > 0$; this is the condition under which a nonzero optimal $r$ exists. Then

$$
Z'(r) = -(N-1)\exp\Bigl(-\lambda T\Bigl(\alpha+\frac{1-\alpha}{C}r\Bigr)\Bigr)(1-r)\Bigl(\lambda T\frac{1-\alpha}{C}\Bigr)^{2},
$$

which is negative if $r < 1$ and positive if $r > 1$. Therefore $Z(r)$, starting from a positive value at $r = 0$, decreases with $r$, crosses 0 at $r = r^{*} < 1$, and eventually increases with $r$ for $r > 1$. However, $Z(r) < 0$ for $r > 1$, because the factor $(1-r)\lambda T(1-\alpha)/C - 1$ is then negative. So $Z(r) \ge 0$ if $r \le r^{*}$, and $Z(r) < 0$ if $r > r^{*}$. This shows that the objective function is unimodal.
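The unimodality argument can be checked on a grid: the sketch below evaluates $Z(r)$ on $[0, C]$, counts sign changes, and confirms that $Z$ stays negative for $r > 1$. The parameter values are hypothetical and chosen so that $Z(0) > 0$; `Z` and `Z_prime` simply transcribe the two displays above.

```python
import math


def Z(r, N, T, lam, alpha, C):
    """Numerator of the first derivative of the objective with respect to r."""
    a = lam * T * (1 - alpha) / C
    return -1 + (N - 1) * math.exp(-lam * T * (alpha + (1 - alpha) * r / C)) * ((1 - r) * a - 1)


def Z_prime(r, N, T, lam, alpha, C):
    """Closed-form derivative of Z from the proof."""
    a = lam * T * (1 - alpha) / C
    return -(N - 1) * math.exp(-lam * T * (alpha + (1 - alpha) * r / C)) * (1 - r) * a ** 2


# Hypothetical parameter values, for illustration only.
N, T, lam, alpha, C = 1000, 10.0, 1.0, 0.1, 2.0
grid = [i * C / 2000 for i in range(2001)]  # r in [0, C]
positive = [Z(r, N, T, lam, alpha, C) > 0 for r in grid]
sign_changes = sum(p != q for p, q in zip(positive, positive[1:]))
crossing = next(r for r, p in zip(grid, positive) if not p)
print("Z(0) > 0:", positive[0], "| sign changes:", sign_changes, "| first r with Z <= 0:", crossing)
```

A single sign change from positive to negative is exactly the unimodality property; a grid scan cannot prove it, but it is a quick way to catch a sign slip in the derivative.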

Proposition 4. The optimal referral is (a) nonincreasing with the fraction of nodes in the network that are altruistic ($\alpha$), (b) nondecreasing with the P2P network's size ($N$), and (c) nondecreasing with the request rate $\lambda$ for $\lambda < \lambda_{th}$ and nonincreasing with $\lambda$ for $\lambda > \lambda_{th}$, where

$$
\lambda_{th} = \frac{1}{T}\Bigl(2+W\bigl((N-1)e^{-1}\bigr)+\frac{e}{N-1}\exp\bigl(W\bigl((N-1)e^{-1}\bigr)\bigr)\Bigr)\Bigl(\alpha+\frac{1-\alpha}{C}\Bigr)^{-1}.
$$
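A quick numerical illustration of Proposition 4, using the closed form of Proposition 3 with hypothetical parameters: $r^*$ should fall as $\alpha$ rises, rise with $N$, and, as a function of $\lambda$, peak near $\lambda_{th}$. `lambert_w` is a small Halley-iteration helper for $x > 0$, not a library function, and the other names are illustrative.

```python
import math


def lambert_w(x, tol=1e-14):
    """Principal branch W0(x) for x > 0, computed with Halley's iteration."""
    w = math.log1p(x)  # W0(x) <= log(1 + x) for x > 0
    for _ in range(100):
        ew = math.exp(w)
        f = w * ew - x
        step = f / (ew * (w + 1) - (w + 2) * f / (2 * w + 2))
        w -= step
        if abs(step) <= tol * (1 + abs(w)):
            break
    return w


def optimal_referral(N, T, lam, alpha, C):
    """Closed form of Proposition 3, clamped to [0, C]."""
    x = math.exp(lam * T * (alpha + (1 - alpha) / C) - 1) / (N - 1)
    interior = 1 - C * (1 + lambert_w(x)) / (lam * T * (1 - alpha))
    return max(0.0, min(C, interior))


def lambda_threshold(N, T, alpha, C):
    """lambda_th from Proposition 4(c)."""
    w0 = lambert_w((N - 1) / math.e)
    return (2 + w0 + math.e / (N - 1) * math.exp(w0)) / (T * (alpha + (1 - alpha) / C))


# Hypothetical parameter values, for illustration only.
N, T, lam, alpha, C = 1000, 10.0, 1.0, 0.1, 2.0
by_alpha = [optimal_referral(N, T, lam, a, C) for a in (0.05, 0.10, 0.15, 0.20)]  # (a)
by_N = [optimal_referral(n, T, lam, alpha, C) for n in (100, 1000, 10000)]       # (b)

lam_th = lambda_threshold(N, T, alpha, C)                                         # (c)
lams = [0.5 + 0.001 * i for i in range(2501)]  # lambda in [0.5, 3.0]
values = [optimal_referral(N, T, la, alpha, C) for la in lams]
lam_peak = lams[max(range(len(values)), key=values.__getitem__)]
print(by_alpha, by_N, "lambda_th =", lam_th, "grid argmax =", lam_peak)
```

With these values every $r^*$ is interior, so the strict monotonicity seen in the output reflects the unclamped formula; once the clamp at 0 or $C$ binds, the statements weaken to "nonincreasing" and "nondecreasing", as in the proposition.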

Proof. (a) From (8), we have $\lambda T(1-\alpha)(1-r)/C - 1 = 1/\bigl((N-1)\exp(-\lambda T(\alpha + (1-\alpha)r/C))\bigr) > 0$. Thus, $\lambda T(1-\alpha)(1-r^{*})/C > 1$. It then follows that $1 - r^{*} > 0$. We will use this result below.

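$$
\begin{aligned}
\frac{\partial r^{*}}{\partial\alpha} ={}& -\frac{C}{\lambda T(1-\alpha)^{2}}\Bigl(1+W\Bigl(\frac{1}{N-1}\exp\Bigl(\lambda T\Bigl(\alpha+\frac{1-\alpha}{C}\Bigr)-1\Bigr)\Bigr)\Bigr)\\
&-\frac{C}{\lambda T(1-\alpha)}\,W'\Bigl(\frac{1}{N-1}\exp\Bigl(\lambda T\Bigl(\alpha+\frac{1-\alpha}{C}\Bigr)-1\Bigr)\Bigr)\cdot\frac{\lambda T(1-1/C)}{N-1}\exp\Bigl(\lambda T\Bigl(\alpha+\frac{1-\alpha}{C}\Bigr)-1\Bigr).
\end{aligned}
$$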

Using (8) and the identity $W'(x) = W(x)/\bigl((1+W(x))x\bigr)$, we can rewrite the above expression as

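$$
\begin{aligned}
\frac{\partial r^{*}}{\partial\alpha} &= -\frac{C}{1-\alpha}\Bigl(\frac{1-r^{*}}{C}+1-\frac{1}{C}-\frac{C(1-1/C)}{\lambda T(1-r^{*})(1-\alpha)}\Bigr)\\
&< -\frac{C}{1-\alpha}\Bigl(\frac{1-r^{*}}{C}+1-\frac{1}{C}-1+\frac{1}{C}\Bigr) = -\frac{1-r^{*}}{1-\alpha} < 0,
\end{aligned}
$$

where the inequality uses $\lambda T(1-\alpha)(1-r^{*})/C > 1$ and $1 - 1/C > 0$.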
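The closed-form expression for $\partial r^*/\partial\alpha$ can be compared against a central finite difference of the interior solution $1 - C(1+W)/(\lambda T(1-\alpha))$; the sketch also checks the upper bound $-(1-r^*)/(1-\alpha)$. Parameters are hypothetical (with $C > 1$ and an interior $r^*$), and `lambert_w` is a hand-rolled Halley iteration rather than a library call.

```python
import math


def lambert_w(x, tol=1e-14):
    """Principal branch W0(x) for x > 0, computed with Halley's iteration."""
    w = math.log1p(x)
    for _ in range(100):
        ew = math.exp(w)
        f = w * ew - x
        step = f / (ew * (w + 1) - (w + 2) * f / (2 * w + 2))
        w -= step
        if abs(step) <= tol * (1 + abs(w)):
            break
    return w


def interior_referral(N, T, lam, alpha, C):
    """Unclamped r* = 1 - C (1 + W(x)) / (lam T (1 - alpha))."""
    x = math.exp(lam * T * (alpha + (1 - alpha) / C) - 1) / (N - 1)
    return 1 - C * (1 + lambert_w(x)) / (lam * T * (1 - alpha))


def dr_dalpha_closed_form(N, T, lam, alpha, C):
    """The rewritten derivative from the proof of part (a)."""
    r = interior_referral(N, T, lam, alpha, C)
    return -C / (1 - alpha) * ((1 - r) / C + 1 - 1 / C
                               - C * (1 - 1 / C) / (lam * T * (1 - r) * (1 - alpha)))


# Hypothetical parameter values, for illustration only.
N, T, lam, alpha, C = 1000, 10.0, 1.0, 0.1, 2.0
h = 1e-5
fd = (interior_referral(N, T, lam, alpha + h, C)
      - interior_referral(N, T, lam, alpha - h, C)) / (2 * h)
closed = dr_dalpha_closed_form(N, T, lam, alpha, C)
bound = -(1 - interior_referral(N, T, lam, alpha, C)) / (1 - alpha)
print("finite difference:", fd, "closed form:", closed, "bound:", bound)
```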

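(b) The argument of $W$ in the expression for $r^{*}$, $\frac{1}{N-1}\exp\bigl(\lambda T(\alpha + \frac{1-\alpha}{C}) - 1\bigr)$, decreases with $N$, and $W$ is increasing in its argument, so $W$ decreases with $N$. Because $r^{*}$ is decreasing in $W$, it follows that $r^{*}$ increases with $N$.

(c) We have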

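$$
\frac{\partial r^{*}}{\partial(\lambda T)} = \frac{C}{(\lambda T)^{2}(1-\alpha)}(1+w) - \frac{C}{\lambda T(1-\alpha)}\,w'\,\frac{1}{N-1}\exp\Bigl(\lambda T\Bigl(\alpha+\frac{1-\alpha}{C}\Bigr)-1\Bigr)\Bigl(\alpha+\frac{1-\alpha}{C}\Bigr),
$$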

where $w = W\bigl(\frac{1}{N-1}\exp(\lambda T(\alpha + (1-\alpha)/C) - 1)\bigr)$ and $w' = dW(x)/dx$. We apply the following two properties of the Lambert W-function:

$$
w e^{w} = \frac{1}{N-1}\exp\Bigl(\lambda T\Bigl(\alpha+\frac{1-\alpha}{C}\Bigr)-1\Bigr) \tag{A6}
$$

and $(1+w)e^{w}w' = 1$. Using these properties, the derivative can be rewritten as follows:

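$$
\frac{\partial r^{*}}{\partial(\lambda T)} = \frac{C}{(\lambda T)^{2}(1-\alpha)}(1+w) - \frac{C}{\lambda T(1-\alpha)}\,\frac{w}{1+w}\Bigl(\alpha+\frac{1-\alpha}{C}\Bigr).
$$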

Hence, after multiplying by the positive factor $(\lambda T)^{2}(1-\alpha)(1+w)/C$, the sign of the above expression is determined by the expression

$$
(1+w)^{2} - w\,\lambda T\Bigl(\alpha+\frac{1-\alpha}{C}\Bigr). \tag{A7}
$$

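Using (A6), which gives $\lambda T(\alpha + (1-\alpha)/C) = 1 + w + \ln((N-1)w)$, (A7) can be rewritten as $1 + w - w\ln((N-1)w)$, which is positive for small $w$ and negative for $w > w_{th}$, where $w_{th} = \frac{e}{N-1}\exp\bigl(W((N-1)e^{-1})\bigr)$. Because $w$ increases with $\lambda$, it follows that $r^{*}$ increases for $\lambda < \lambda_{th}$ and decreases for $\lambda > \lambda_{th}$, where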

$$
\lambda_{th} = \frac{1}{T}\Bigl(2+W\bigl((N-1)e^{-1}\bigr)+\frac{e}{N-1}\exp\bigl(W\bigl((N-1)e^{-1}\bigr)\bigr)\Bigr)\Bigl(\alpha+\frac{1-\alpha}{C}\Bigr)^{-1}.
$$

Imposing $r \in [0, C]$, it follows that $r^{*}$ is nondecreasing for $\lambda < \lambda_{th}$ and nonincreasing for $\lambda > \lambda_{th}$.
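The rewriting of (A7) and the threshold $w_{th}$ can be checked directly: by (A6), $\lambda T(\alpha + (1-\alpha)/C) = 1 + w + \ln((N-1)w)$, so the two forms of (A7) should agree to rounding error, and $w_{th}$ should be a root of $1 + w - w\ln((N-1)w)$ that is attained exactly at $\lambda = \lambda_{th}$. Parameter values are hypothetical; `lambert_w` is an illustrative Halley iteration, not a library function.

```python
import math


def lambert_w(x, tol=1e-14):
    """Principal branch W0(x) for x > 0, computed with Halley's iteration."""
    w = math.log1p(x)
    for _ in range(100):
        ew = math.exp(w)
        f = w * ew - x
        step = f / (ew * (w + 1) - (w + 2) * f / (2 * w + 2))
        w -= step
        if abs(step) <= tol * (1 + abs(w)):
            break
    return w


# Hypothetical parameter values, for illustration only.
N, T, alpha, C = 1000, 10.0, 0.1, 2.0
s = alpha + (1 - alpha) / C

# (A7) versus its rewritten form, over several request rates.
max_gap = 0.0
for lam in (0.5, 1.0, 1.5, 2.0, 3.0):
    w = lambert_w(math.exp(lam * T * s - 1) / (N - 1))  # w from (A6)
    a7 = (1 + w) ** 2 - w * lam * T * s
    rewritten = 1 + w - w * math.log((N - 1) * w)
    max_gap = max(max_gap, abs(a7 - rewritten))

# Threshold: w_th is a root of 1 + w - w ln((N - 1) w), reached at lambda_th.
w0 = lambert_w((N - 1) / math.e)
w_th = math.e / (N - 1) * math.exp(w0)
g_at_threshold = 1 + w_th - w_th * math.log((N - 1) * w_th)
lam_th = (2 + w0 + w_th) / (T * s)
w_at_lam_th = lambert_w(math.exp(lam_th * T * s - 1) / (N - 1))
print("max gap:", max_gap, "g(w_th):", g_at_threshold, "w(lambda_th) - w_th:", w_at_lam_th - w_th)
```

A by-product of the check is that $w_{th} = 1/W((N-1)e^{-1})$, since $W(z)e^{W(z)} = z$ with $z = (N-1)/e$.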

References

Adar, E., B. A. Huberman. 2000. Free riding on Gnutella. First Monday: Peer-Reviewed J. Internet 5(10). http://www.firstmonday.dk/issues/issue5_10/adar/index.html.

Andersson, H. 1998. Limit theorems for a random graph epidemic model. Ann. Appl. Probab. 8(4) 1331–1349.

Andersson, H., T. Britton. 2000. Lecture Notes in Statistics: Stochastic Epidemic Models and Their Statistical Analysis. Springer-Verlag, New York.

Arora, G., M. Hanneghan, M. Merabti. 2005. P2P commercial digital content exchange. Electronic Commerce Res. Appl. 4(3) 250–263.

Asvanund, A., K. Clay, R. Krishnan, M. Smith. 2004. An empirical analysis of network externalities in peer-to-peer music sharing networks. Inform. Systems Res. 15(2) 155–174.

Bass, F. 1969. A new product growth model for consumer durables. Management Sci. 15(5) 215–227.

Bass, F. 1980. The relationship between diffusion rates, experience curves, and demand elasticities for consumer durable technological innovations. J. Bus. 53(July) 551–567.

Brauer, F., Castillo-Chavez. 2001. Mathematical Models in Population Biology and Epidemiology. Springer, New York.

Currah, A. 2004. The digital storm: The strategic challenge of Internet distribution to the Hollywood studio system. Executive Report, University of Oxford, Oxford, UK.

Diekmann, O., J. A. P. Heesterbeek. 2000. Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation. Mathematical and Computational Biology. Wiley, New York.

Durrett, R. 1999. Stochastic spatial models. SIAM Rev. 41(4) 677–718.

Eubank, S., H. Guclu, V. S. Anil Kumar, M. Marathe, A. Srinivasan, Z. Toroczkai, N. Wang. 2004. Modeling disease outbreaks in realistic urban social networks. Nature 429(6988) 180–184.

Fourt, L. A., J. W. Woodlock. 1960. Early prediction of market success for grocery products. J. Marketing 25 31–38.

Ganesh, A., L. Massoulie, D. Towsley. 2005. The effect of network topology on the spread of epidemics. Proc. 2005 Infocom, Institute of Electrical and Electronics Engineers, Miami, 1455–1466.

Garces-Erice, L., E. Biersack, K. Ross, P. Felber, G. Urvoy-Keller. 2003. Hierarchical peer-to-peer systems. Parallel Processing Lett. 13(4) 643–657.

Golle, P., K. Leyton-Brown, I. Mironov, M. Lillibridge. 2001. Incentives for sharing in peer-to-peer networks. Proc. 3rd ACM Conf. Electronic Commerce, ACM, Tampa, FL.

Ho, T., S. Savin, C. Terwiesch. 2002. Managing demand and sales dynamics in constrained new product diffusion under supply constraint. Management Sci. 48(2) 187–206.

Hosanagar, K., P. Han, Y. Tan. 2008. Optimal dynamic referrals in P2P networks. Working paper, University of Pennsylvania, Philadelphia.

Horsky, D., L. S. Simon. 1983. Advertising and the diffusion of new products. Marketing Sci. 2(1) 1–17.

Iamnitchi, A., M. Ripeanu, I. Foster. 2004. Small-world file-sharing communities. Proc. 23rd Annual Joint Conf. IEEE Comput. Comm. Soc., Hong Kong, 7–11.

Izal, M., G. Urvoy-Keller, E. W. Biersack, P. A. Felber, A. Al Hamra, L. Garcés-Erice. 2004. Dissecting BitTorrent: Five months in a torrent's lifetime. Proc. Passive Active Measurement Workshop, Antibes Juan-les-Pins, France.


Jain, D., V. Mahajan, E. Muller. 1991. Innovation diffusion in presence of supply restrictions. Marketing Sci. 10(1) 100–113.

Kamvar, S. D., M. T. Schlosser, H. Garcia-Molina. 2003. Incentives for combatting freeriding on P2P networks. Lecture Notes Comput. Sci., Vol. 2790, Euro-Par 2003 Parallel Processing. Springer, Berlin/Heidelberg.

Kephart, J. O., S. R. White. 1991. Directed-graph epidemiological models of computer viruses. IEEE Comput. Soc. Sympos. Res. Security Privacy, Oakland, CA, 343–359.

Khelil, A., C. Becker, J. Tian, K. Rothermel. 2002. An epidemic model for information diffusion in MANETs. Proc. Fifth ACM Internat. Workshop Model., Anal. Simulation Wireless Mobile Systems, ACM, New York, 54–60.

Kleinberg, J. 2006. Complex networks and decentralized search algorithms. Proc. Internat. Congress of Mathematicians, Association of the International Congress of Mathematicians, Madrid, Spain.

Krishnan, T. V., F. M. Bass, V. Kumar. 2000. Impact of a late entrant on the diffusion of a new product/service. J. Marketing Res. 37(May) 269–278.

Kumar, S., J. Swaminathan. 2003. Diffusion of innovations under supply constraints. Oper. Res. 51(6) 866–879.

Lang, K., R. Vragov. 2005. A pricing mechanism for digital content distribution over peer-to-peer networks. J. Management Inform. Systems 22(2) 121–139.

Liben-Nowell, D., J. Novak, R. Kumar, P. Raghavan, A. Tomkins. 2005. Geographic routing in social networks. Proc. National Acad. Sci. USA 102(33) 11623–11628.

Mahajan, V., E. Muller, Y. Wind. 2000. New-Product Diffusion Models. IEEE, Springer-Science and Business Media, New York.

Newman, M. E. J., C. Moore, D. J. Watts. 2000. Mean-field solution of the small-world network model. Physical Rev. Lett. 84(14) 3201–3204.

New York Times. 2003. E-Commerce Report: Incentive marketing spreads on the Internet, with offers of discounts or credit toward gifts. (June 9).

Opper, M., D. Saad. 2001. Advanced Mean Field Methods: Theory and Practice. MIT Press, Cambridge.

Padmanabhan, V. N., K. Sripanidkulchai. 2002. The case for cooperative networking. P. Druschel, M. F. Kaashoek, A. I. T. Rowstron, eds. Proc. First Internat. Workshop on Peer-to-Peer Systems, Springer, Cambridge, MA, 178–190.

Ross, K. W., D. Rubenstein. 2003. Tutorial on P2P systems. Proc. 22nd Ann. Joint Conf. Comput. Comm. Societies, IEEE, Washington, DC.

TECHCRUNCH. 2007. P2P music sharing service Grooveshark ups compensation.

Vega-Redondo, F. 2007. Complex Social Networks. Cambridge University Press, Cambridge, UK.
