Influence maximization with viral product design

Influence'Maximiza.on'with''Viral'Product'Design'

Nicola'Barbieri' 'Francesco'Bonchi'''

Yahoo!'Labs'B'Barcelona,'Spain''{barbieri,bonchi}@yahooBinc.com ''

'

Product'design'

A'generic'product'can'be'described'in'terms'of'a'number'of'a2ributes/features.'

Product'design'is'the'process'of'crea.ng'a'new'product,'by'determining'a'set'of'features'that'will'eventually'maximize'its'visibility'and'boost&the&market&share.''

PHASE'1'

PHASE'2'

Sta.s.cal'analysis'of'purchases'to''determine'how'people'value'different'features'that'characterize'an'

individual'product.'

Combinatorial'op.miza.on'problem'to'iden.fy'a'product'with'the'largest'market'share.'''

Share'of'choice'

Influence Maximization with Viral Product Design

Nicola Barbieri ⇤ Francesco Bonchi †

AbstractProduct design and viral marketing are two popular conceptsin the marketing literature that, although following differentpaths, aim at the same goal: maximizing the adoption ofa new product. While the effect of the social networkis nowadays kept in great consideration in any marketing-related activity, the interplay between product design andsocial influence is surprisingly still largely unexplored.

In this paper we move a first step in this directionand study the problem of designing the features of a novelproduct such that its adoption, fueled by peer influenceand “word-of-mouth” effect, is maximized. We model theviral process of product adoption on the basis of socialinfluence and the features of the product, and devise animproved iterative scaling procedure to learn the parametersthat maximize the likelihood of our novel feature-awarepropagation model. In order to design an effective algorithmfor our problem, we study the property of the underlyingpropagation model. In particular we show that the expectedspread, i.e., the objective function to maximize, is monotoneand submodular when we fix the features of the productand seek for the set of users to target in the viral marketingcampaign. Instead, when we fix the set of users and try tofind the optimal features for the product, then the expectedspread is neither submodular nor monotone (as it is thecase, in general, for product design). Therefore, we developan algorithm based on an alternating optimization betweenselecting the features of the product, and the set of users totarget in the campaign. Our experimental evaluation on real-world data from the domain of social music consumption(LastFM) and social movie consumption (Flixster) confirmsthe effectiveness of the proposed framework in integratingproduct design in viral marketing.

1 IntroductionProduct design and viral marketing are two widely studiedconcepts in the marketing literature. The former conceptrefers to all those activities (such as market share forecastingor estimation of customer utilities for a given set of productfeatures), that are aimed at designing a new product, withthe objective of maximizing the number of customers adopt-ing the product. The latter concept refers to marketing tech-

⇤Yahoo Labs, Barcelona, Spain. E-mail: [email protected]†Yahoo Labs, Barcelona, Spain. E-mail: [email protected]

niques which are able to take advantage of peer influenceamong customers, through modern social media and com-munication platforms. The idea is to exploit a pre-existingsocial network in order to increase brand awareness or toachieve other marketing objectives (such as product sales)through self-replicating viral processes.

The share-of-choice (SOC) problem is the combinato-rial optimization problem at the basis of product design. Inthe SOC problem we are given a set of product features Fwith their levels (e.g., the size of the screen of a laptop), anda set of customers V = {v1, v2, . . . , vn}. Customers havedifferent utilities for the feature levels of a product (calledpart-worth utilities in the marketing literature [9]): we letui

f,l

denote the part-worth utility for customer vi

if level l ischosen for feature f 2 F . Moreover we are given a “hurdle”hi

for each user vi

, representing the utility value of equilib-rium between making or not making the purchase.

By denoting with xf,l

2 {0, 1} an indicator variable,which is 1 if the level l has been selected for the feature fand zero otherwise, the assumed adoption model is that acustomer v

i

decides to adopt the product when the sum ofutilities exceeds her hurdle:

(1.1)X

f2F

X

l

xf,l

· ui

f,l

� hi

.

The objective is to select the feature levels for a new productso to maximize the number of customers that will adopt it.

The basic computational problem behind viral mar-keting instead, is that of selecting the set of initial usersthat are more likely to influence the largest number ofusers in a social network, known as influence maximiza-tion (MAXINF) [11]. Given a directed social network G =

(V,E), where a link (vj

, vi

) 2 E means that vj

can poten-tially influence v

i

, and given a budget k, the problem requiresto find k “seed” nodes in the network, such that by activatingthem we can maximize the expected number of nodes thateventually get activated, according to a probabilistic propa-gation model that governs how influence propagates throughthe network.

One of the most studied propagation models is thelinear threshold (LT) model. In this model at a giventimestamp, each node is either active (a customer whichalready purchased the product) or inactive, and each node’stendency to become active increases monotonically as moreof its neighbors become active. An active node never

The'customer'vi'decides'to'adopt'the'product'when'the'sum'of'u.li.es'exceeds'her'hurdle:'

Conjoint'analysis''Data'about'customers'preferences'is'collected'and'analyzed'to'

produce'partAworth&uCliCes:'

•  uif,l'is'the'u.lity'for'customer'vi'if'level'l'is'chosen'for'feature'f'•  hi'is'the'u.lity'value'of'equilibrium'(hurdle)'between'making'or'not'making'the'

purchase.''

The'goal'is'to'select'the'feature'levels'for'a'new'product'that'maximize'the'number'of'customers'that'will'adopt'it.'

Informa.on'propaga.on'in'online'SNS'

Users'performs'ac.ons'(likes,'purchases,'etc.)'and'those'ac.ons'propagate'across'the'network.'

A'PropagaCon&model'governs'how'influence'propagates'across'a'network.'

Linear&Threshold&model:&&•  bj,i'is'the'influence'exert'by'user'vj'on'vi'•  Sum'of'incoming'weights'≤'1'•  Each'vi'picks'a'random'threshold'θi'~'U[0,1].'•  Time'proceeds'in'discrete'steps.'

'

becomes inactive again. In particular, each node vi

isinfluenced by each neighbor v

j

according to a weight bj,i

,such that the sum of incoming weights to v

i

is no morethan 1. At the beginning, each node v

i

picks a threshold ✓i

uniformly at random from [0, 1]. If at time t, the total weightfrom the active neighbors of an inactive node v

i

is at least ✓i

,

(1.2)X

(vj

,v

i

)2E

v

j

is active

bj,i

� ✓i

,

then vi

becomes active at timestamp t+ 1.Observe that the two problems, SOC and MAXINF,

share various similarities: both have a threshold-based acti-vation model, they have the same objective to maximize (thenumber of adoptions), and they are both NP-hard [12, 11].

They have also several differences. In particular, in SOConly a set of user V is considered, overlooking the possiblerelations among the users, i.e., the social network. On theother hand MAXINF considers all the products the same, thusoverlooking an important aspect: different product featuresinterest different users to different extents, so one shouldnot forget the characteristics of the product, when modelinginfluence propagation.

In this paper we incorporate product design in viralmarketing. This leads to the new problem of influencemaximization with viral product design (MAXINF VPD),which builds upon the similarities of SOC and MAXINF, andovercomes their limitations by considering both the socialinfluence and the features of the product to be promoted bya viral marketing campaign.

In a recent paper [10], the economists Gunnec andRaghavan provide the “dual” of our effort: they incorporatepeer influence in SOC. In particular peer influence is injectedinto the objective function of SOC as a decrease in oneperson’s hurdle. The more social contacts adopt the product,the smaller becomes the hurdle. By means of an analyticalexample, Gunnec and Raghavan show the difference inmarket share among SOC with and without peer influence:in the worst case, the loss of market share for ignoring thesocial network effects can be as large as the whole market.

In Figure 1 we report empirical example based on realdata, to show that in viral marketing the features of theproduct being promoted cannot be overlooked. In particularwe show the different expected spread achieved by ouralgorithm when considering items with different featuresset.1 The first plot, from a LastFM dataset, shows theexpected spread (for different sizes of the seed set) of aviral marketing campaign of two different new songs thatwe might want to promote. One song has the features{Electronic, Metal}, indicating a song combining HeavyMetal and Electronic music, while the second song hasthe features {Lana del Rey, Kate Perry}, indicating an

1In this paper for simplicity’s sake and due to the nature of our datasets, the featureswe consider have binary values: either the product has a feature or not.

1 5 10 15 200

50

100

150

200

3 1132

61 7355

81102 112

145

Expe

cted

spre

ad

{Electronic, Metal}{Lana del Rey, Kate Perry}

1 5 10 15 200

1,000

2,000

3,000

14

1,3501,715 1,856 2,0411,877 2,054 2,131 2,252

2,462

Seed set size

Expe

cted

spre

ad

{wizards and magicians, very good for kids}{thriller, aliens}

Figure 1: The different spread of influence achieved byour algorithm when considering items with different featuresets. The plot above refers to an experiment on the LastFM

dataset, while the bottom plot is from the Flixster dataset.

hypothetical collaboration among these two pop singers. Theexpected size of market share in our experiments clearlyspeaks in favor of the second song. Therefore, if we wereabout commercializing a new hit, we would better considersetting up a collaboration between Lana del Rey and KatePerry, than a metal-electronic extravaganza.

2 Background and related workInfluence maximization. Given a probabilistic propagationmodel m (for instance, LT) and a seed set S ✓ V , theexpected number of active nodes at the end of the processis denoted by �

m

(S). The MAXINF problem requiresto find the set S ✓ V , |S| = k, such that �

m

(S) ismaximum. Kempe et al. [11] show that the problem isNP-hard, however, the function �

m

(S) is monotone2 andsubmodular3. When equipped with such properties, thesimple algorithm that at each iteration greedily extends theset of seeds with the node w providing the largest marginalgain �

m

(S [ {w}) � �m

(S), produces a solution with(1� 1/e) approximation guarantee [16].

Following [11], considerable effort has been devoted todevelop methods for improving the efficiency of MAXINF[13, 5, 8], or for learning the strength of influence along eachlink [18, 7] (which is a needed input for MAXINF). Regard-less the fact that users’ authoritativeness, expertise, trust andinfluence are evidently topic-dependent, the research on so-cial influence has surprisingly largely overlooked this aspect:only recently, researchers have started looking at social influ-ence while keeping in consideration the characteristics of theitem that is propagating [2].

2�

m

(S) �

m

(T ) whenever S ✓ T .3�

m

(S [ {w}) � �

m

(S) � �

m

(T [ {w}) � �

m

(T ) whenever S ✓ T .

If'at'.me'(t)'

then'vi'becomes&acCve'at'(t+1).'

Linear Threshold Model (LT)Every arc (u,v) has associated a weight b(u,v) such that the sum of incoming

weights in each node is d 1Time proceeds in discrete steps

Each node v picks a random threshold θv ~ U[0,1]

A node v becomes active when the sum of incoming weights from active neighbors reaches θv

17

.1

.1

.1

.1

.1

.2

.2

.2

.3

.3

.3

.3

.4

.4

.4

.4

.4

.1

.1

ba

c

fe

d

g

h

i

Viral'marke.ng'

Exploit'social'influence'for'markeCng.'

MAXINF&[1]':'find'the'set'S'�'V,'with'|S|'='k,'such'that'the'expected'number'of'ac.ve'nodes'at'the'end'of'the'process'is'maximum.'

MAXINF'is'NPAhard'under'LT!!!'

Target'users'who'are'likely'to'invoke'wordBofBmouth'diffusion.'

σm(S)%=%expected&spread'of'propaga.on,'if'S%is'the'ini.al'set'of'ac.ve'nodes'under'a'propaga.on'model'm.'

[1]'Kempe'et'al.'(KDD’03)'“Maximizing%the%spread%of%influence%through%a%social%network”'

σLT(S)%is'monotone&and&submodular.'

The'greedy&algorithm'produces'a'solu.on'with'(1'−'1/e)'approxima.on'guarantee.'

?'

Product'design'and'viral'marke.ng'aim'at'the'same'goal:'maximizing'the'adop.on'of'a'new'product.'

Social%contagion%through%viral%product%design[2]'

[2]'Aral'Sinan,'and'Dylan'Walker.'CreaCng%social%contagion%through%viral%product%design:%A%randomized%trial%of%peer%influence%in%networks.'Management'Science'57.9'[3]'Gunnec'Dilek,'and'S.'Raghavan.'IntegraCng%Social%Network%Effects%in%the%ShareJofJChoice%Problem.'Technical'report,'2012.'

Social%network%effect%in%the%SoC%problem[3]'

'Randomized'experiment'tes.ng'the'effec.veness'of'viral'product'design'features'in'crea.ng'social'contagion.'

Viral'Characteris.cs'

Viral'Features'

Refer'to'the'content'of'a'product.''

How'the'product'is'shared'–'viral'mechanism'associated'with'the'product.''

Model'social'network'effects'processes'as'an'addi.onal'aUribute'of'the'product.'

The'number'of'neighbors'who'have'already'adopted'the'product'increases'linearly'the'u.lity'of'a'consumer'in'making'the'

purchase.'

becomes inactive again. In particular, each node vi

isinfluenced by each neighbor v

j

according to a weight bj,i

,such that the sum of incoming weights to v

i

is no morethan 1. At the beginning, each node v

i


uniformly at random from [0, 1]. If at time t, the total weightfrom the active neighbors of an inactive node v

i

is at least ✓i

,

(1.2)X

(vj

,v

i

)2E

v

j

is active

bj,i

� ✓i

,

then vi

becomes active at timestamp t+ 1.Observe that the two problems, SOC and MAXINF,

share various similarities: both have a threshold-based acti-vation model, they have the same objective to maximize (thenumber of adoptions), and they are both NP-hard [12, 11].

They have also several differences. In particular, in SOConly a set of user V is considered, overlooking the possiblerelations among the users, i.e., the social network. On theother hand MAXINF considers all the products the same, thusoverlooking an important aspect: different product featuresinterest different users to different extents, so one shouldnot forget the characteristics of the product, when modelinginfluence propagation.

In this paper we incorporate product design in viralmarketing. This leads to the new problem of influencemaximization with viral product design (MAXINF VPD),which builds upon the similarities of SOC and MAXINF, andovercomes their limitations by considering both the socialinfluence and the features of the product to be promoted bya viral marketing campaign.

In a recent paper [10], the economists Gunnec andRaghavan provide the “dual” of our effort: they incorporatepeer influence in SOC. In particular peer influence is injectedinto the objective function of SOC as a decrease in oneperson’s hurdle. The more social contacts adopt the product,the smaller becomes the hurdle. By means of an analyticalexample, Gunnec and Raghavan show the difference inmarket share among SOC with and without peer influence:in the worst case, the loss of market share for ignoring thesocial network effects can be as large as the whole market.

In Figure 1 we report empirical example based on realdata, to show that in viral marketing the features of theproduct being promoted cannot be overlooked. In particularwe show the different expected spread achieved by ouralgorithm when considering items with different featuresset.1 The first plot, from a LastFM dataset, shows theexpected spread (for different sizes of the seed set) of aviral marketing campaign of two different new songs thatwe might want to promote. One song has the features{Electronic, Metal}, indicating a song combining HeavyMetal and Electronic music, while the second song hasthe features {Lana del Rey, Kate Perry}, indicating an

1In this paper for simplicity’s sake and due to the nature of our datasets, the featureswe consider have binary values: either the product has a feature or not.

1 5 10 15 200

50

100

150

200

3 1132

61 7355

81102 112

145

Expe

cted

spre

ad

{Electronic, Metal}{Lana del Rey, Kate Perry}

1 5 10 15 200

1,000

2,000

3,000

14

1,3501,715 1,856 2,0411,877 2,054 2,131 2,252

2,462

Seed set size

Expe

cted

spre

ad

{wizards and magicians, very good for kids}{thriller, aliens}

Figure 1: The different spread of influence achieved byour algorithm when considering items with different featuresets. The plot above refers to an experiment on the LastFM

dataset, while the bottom plot is from the Flixster dataset.

hypothetical collaboration among these two pop singers. Theexpected size of market share in our experiments clearlyspeaks in favor of the second song. Therefore, if we wereabout commercializing a new hit, we would better considersetting up a collaboration between Lana del Rey and KatePerry, than a metal-electronic extravaganza.

2 Background and related workInfluence maximization. Given a probabilistic propagationmodel m (for instance, LT) and a seed set S ✓ V , theexpected number of active nodes at the end of the processis denoted by �

m

(S). The MAXINF problem requiresto find the set S ✓ V , |S| = k, such that �

m

(S) ismaximum. Kempe et al. [11] show that the problem isNP-hard, however, the function �

m

(S) is monotone2 andsubmodular3. When equipped with such properties, thesimple algorithm that at each iteration greedily extends theset of seeds with the node w providing the largest marginalgain �

m

(S [ {w}) � �m

(S), produces a solution with(1� 1/e) approximation guarantee [16].

Following [11], considerable effort has been devoted todevelop methods for improving the efficiency of MAXINF[13, 5, 8], or for learning the strength of influence along eachlink [18, 7] (which is a needed input for MAXINF). Regard-less the fact that users’ authoritativeness, expertise, trust andinfluence are evidently topic-dependent, the research on so-cial influence has surprisingly largely overlooked this aspect:only recently, researchers have started looking at social influ-ence while keeping in consideration the characteristics of theitem that is propagating [2].

2�

m

(S) �

m

(T ) whenever S ✓ T .3�

m

(S [ {w}) � �

m

(S) � �

m

(T [ {w}) � �

m

(T ) whenever S ✓ T .

Influence'Maximiza.on'with'Viral'Product'Design''

Design'the'features'and'a'targe.ng'policy'of'a'novel'product'such'that'its'adop.on,'fueled'by'peer'influence'and'“wordBofBmouth”'

effect,'is'maximized.'

tion model and a set of nodes S ✓ V , the expected numberof nodes “infected” in the viral cascade started with S, iscalled (expected) spread of S and denoted by �(S). The in-fluence maximization problem asks for a set S ✓ V , |S| = k,such that �(S) is maximum, where k is an input parameter.

The most studied propagation model is the so called Inde-pendent Cascade (IC) model. We are given a directed socialgraph G = (V,A) with arcs (u, v) 2 A labeled by influenceprobability p

u,v

2 (0, 1], representing the strength of the in-fluence of u over v. If (u, v) 62 A, we define p

u,v

= 0. Ata given time step, each node is either active (an adopterof product) or inactive. At time 0, a set S of seeds areactivated. Time unfolds deterministically in discrete steps.As time unfolds, more and more of neighbors of an inactivenode v become active, eventually making v become active,and v’s decision may in turn trigger further decisions bythe nodes to which v is connected. In particular, in the ICmodel, when a node u first becomes active, say at time t, ithas one chance at influencing each inactive neighbor v withprobability p

u,v

, independently of the history thus far. Ifthe attempt succeeds, v becomes active at time t+ 1.

Influence maximization is generally NP-hard [?], underIC and other propagation models. Kempe et al., however,show that the function �(S) is monotone1 and submodular2.When equipped with such properties, the simple greedy al-gorithm that at each iteration greedily extends the set ofseeds with the node providing the largest marginal gain,produces a solution with provable approximation guarantee(1� 1/e) [?].Though simple, the greedy algorithm is computationally

prohibitive, since the step of selecting the node provid-ing the largest marginal gain is #P-hard. In their paper,Kempe et al. run Monte Carlo simulations for su�cientlymany times to obtain an accurate estimate of the expectedspread.However, running many propagation simulations isextremely costly on very large real-world social networks.Therefore, following [?], considerable e↵ort has been devotedto develop methods for improving the e�ciency and scalabil-ity of influence maximization [?, ?, ?, ?]. Regardless of suchresearch e↵orts, scalability still remains an open challenge:on the problem instances that we consider in this paper, evena state-of-the-art algorithm as CELF++ [?], takes from fewdays to more than a week in order to extract a seed set of50 nodes. Most of this literature on e�cient algorithms forinfluence maximization assumes the weighted social graphgiven, and do not address how the link influence probabili-ties p

u,v

can be obtained. This problem instead is addressedin [?, ?, ?].Regardless of the fact that users authoritativeness, exper-

tise, trust and influence are evidently topic-dependent, onlyfew papers have looked at social influence from the topicsperspective. Tang et al. [?] study the problem of learn-ing user-to-user topic-wise influence strength. The input totheir problem is the social network and a prior topic distri-bution for each node, which is given as input and inferredseparately. Liu et al. [?] propose a probabilistic model forthe joint inference of the topic distribution and topic-wiseinfluence strength: here the input is an heterogenous so-cial network with nodes that are users and documents. Thegoal is to learn users’ interest (topic distribution) and user-to-user influence. Lin et al. [?] study the joint modelingof influence and topics, by adopting textual models. None1�(S) �(T ) whenever S ✓ T .2�(S [ {w})� �(S) � �(T [ {w})� �(T ) whenever S ✓ T .

Figure 1: Topic-aware influence parameters are learnt fromthe log of past propagations and the social network following[?]. These are the prerequisites to build the INFLEX indexthat we use to e�ciently answer TIM queries.

of these three papers define an influence propagation model,instead in a recent work Barbieri et al. [?] extend the classicIC model to be topic-aware: the resulting model is namedTopic-aware Independent Cascade (TIC) .Barbieri et al. also devise methods to learn, from a log of

past propagations, the model parameters, i.e., topic-awareinfluence strength for each link and topic-distribution foreach item. Their experiments show that (1) topic-aware in-fluence propagation models are more accurate in describingreal-world influence driven propagations than the state-of-the-art topic-blind models, and (2) by considering the char-acteristics of the item, a larger number of adoptions can beobtained in the influence maximization problem. The TICmodel is assumed at the basis of our work and it is intro-duced in details next.

1.2 Problem definitionIn this paper we study indexing schemes for answering

Topic-aware Influence Maximization (TIM) queries. We aregiven a directed social graph G = (V,A) and a space of Ztopics. We assume the TIC propagation model introducedin [?], whose parameters are learned from a log of past prop-agation traces. In particular for each arc (u, v) 2 A and foreach topic z 2 [1, Z] we have a probability pz

u,v

representingthe strength of influence that user u exerts over user v fortopic z. A high level depiction of our setting is provided inFigure ??. An item i is described by a distribution ~�

i

overthe topics: that is for each topic z 2 [1, Z] we are given �z

i

,with

PZ

z=1 �z

i

= 1. In the TIC model a propagation happenslike in the IC model: when a node u first becomes active onitem i, has one chance of influencing each inactive neighborv, independently of the history thus far. The tentative suc-ceeds with a probability that is the weighted average of thelink probability w.r.t. the topic distribution of the item i:

piu,v

=ZX

z=1

�z

i

pzu,v

. (1)

A TIM query Q(~�q

, k), takes as input an item description~�q

and an integer k and it requires to find the seed set S ✓ V ,|S| = k, such that the expected number of nodes adoptingitem q, denoted by �(S,~�

q

), is maximum:

Q(~�q

, k) = argmaxS✓V,|S|=k

�(S,~�q

). (2)

Propaga'on)Log)

Social)Network)

Product( f1( f2( f3( …( fF(p1) ✔) ✔)

p2) ✔) ✔)

…) ✔) ✔) ✔)

Product)9)Feature)Assignments)

FeatureBaware'Propaga.on'model'

Product design can be roughly divided into two mainphases. In the first phase, data about customers preferencesis collected and analyzed to produce part-worth utilities. Themost popular tool for this task is a statistical technique usedin market research known as conjoint analysis [9].

The second phase takes part-worth utilities as inputand aims at selecting the levels of the product features thatmaximizes the product adoption: this is the share-of-choice(SOC) problem which has been shown to be NP-hard [12]and for which several heuristics have been proposed [1, 21].

Aral and Walker coin the term viral product design [19].They distinguish between “viral characteristics” and “viralfeatures” of a product. Viral characteristics refer to thecontent of the product, whereas viral features corresponddirectly to viral mechanism associated to the product.In thispaper we are interested in the former, while Aral and Walkerconduce randomized field experiments with the latter.

Gunnec and Raghavan [10] incorporate peer influence inthe SOC problem and propose a genetic algorithm for theirnew version of the problem. There are many differences withour contribution. First, they consider all social contacts of aperson having the same influence weight, thus overlookingany distinction between more or less influential users. Sec-ond, they assume part-worth utilities as given in input, as itis always the case for SOC. Instead in this paper we studythe problem of learning the part-worth utilities, the hurdlefor each user and the influence strength for each link jointly,by analyzing past data. Last, we incorporate product designin MAXINF, so the overall objective is different.

Miah et al. [14] study the problem of selecting whichattributes of a product the seller should highlight in orderto maximize the visibility of the product in a databaserepresenting an e-marketplace. Das et al. [6] introduce anew problem for web item design: given a training datasetof existing items with their user-submitted tags, and a queryset of desirable tags, design the k best new items expected toattract the maximum number of desirable tags.

3 Paper contributions and roadmapOur contributions are summarized as follows:

• A feature-aware propagation model (F-TM), whichmodels the process of product adoption by consideringsocial influence and part-worth utilities for the featuresof the items being propagated (Section 4).

• An improved iterative scaling procedure to learn part-worth utilities, hurdles, and social influence jointly(Section 4.1): those are the parameters that maximizethe likelihood for the F-TM model, on a log of observedproduct propagations.

• The definition of the influence maximization with viralproduct design (MAXINF VPD) problem and the studyof its properties under the F-TM propagation model(Section 5).

• An iterative alternating optimization algorithm for theMAXINF VPD problem (Section 6).

• A thorough experimental assessment using two real-world, semantically rich datasets from the domain ofsocial music consumption (LastFM) and social movieconsumption (Flixster). The datasets contain (i) thesocial network among users, (ii) a database of past itempropagations in the social network, and (iii) a set offeatures for each item (Section 7).

4 Feature-aware propagation modelWe next introduce our feature-aware propagation model. Forsimplicity’s sake, we focus on binary features: either theproduct has a feature or not.

Let G = (V,E) with V = {v1, v2, . . . , vn} be thesocial graph, where each node v

i

has an associated hurdlehi

2 R+, that represents an internal resistance of theuser to be influenced. For each arc (v

j

, vi

) 2 E, letbj,i

2 R+ represent the strength of the influence exertedby v

j

on vi

. Let F denote the universe of possible productfeatures, and let F(p) represent the set of features of a givenproduct p. The part-worth utility for user v

i

and featuref is denoted ui

f

2 R: it is natural that each feature maycontribute positively for some users, while its contribute maybe negative for others.

The feature-aware threshold model (F-TM) is a variantof the LT model, where peer influence and feature utilitiescooperate, while the hurdle is used as a reduction of theutilities. In accordance with the literature on MAXINF, wekeep the propagation model probabilistic: at the beginningeach node v

i


uniformly at random from[0, 1]. Time unfolds in discrete steps. For a given productp, the set of newly activated nodes at time t is denoted byD

p

(t), while Cp

(t) = [t

0t

Dp

(t0) represents the set of allnodes that adopted p so far. Product adoption is modeledusing a logistic function. At time t the probability that uservi

adopts product p is:

(4.3) Pr(p|i, t) = exp{ i

t

(p)}1 + exp{ i

t

(p)}where:

(4.4) i

t

(p) =X

(vj

,v

i

)2E

v

j

2C

p

(t)

bj,i

+

X

f2F(p)

ui

f

� hi

.

The activation probability is zero if the node vi

has no activeneighbors at the considered time, or if F(p) = ;. At time t ifPr(p|i, t) � ✓

i

then vi

adopts p at time t+ 1. The diffusionends when no new nodes can be activated.

With the choice of the logistic as activation function, wemodel log-odds as a linear function of the user overall utility(social influence and subjective preference) in adopting theproduct. Moreover, this choice allows us employing maxi-mum likelihood for estimating the parameters of the model.

•  Each'node'vi'has'an'associated'hurdle'hi'�'R+,'that'represents'an'internal'resistance'of'the'user'to'adopt'a'product'

•  For'each'arc'(vj','vi)'�'E,'let'bj,i'�'R+'represent'the'strength'of'the'influence'exerted'by'vj'on'vi'

•  F(p)'represents'the'set'of'features'of'a'given'product'p'•  The'partBworth'u.lity'for'user'vi'and'feature'f'is'denoted'uif'�'R.'

The'featureBaware'threshold'model'(FATM)'is'a'general'threshold'model,'where'peer'influence'and'feature'u.li.es'cooperate.'

At'.me't'the'probability'that'user'vi'adopts'product'p'is:'

Learning'the'parameters'of'the'model'

B � C +

X

p2P

X

i:vi

2V

�(i, p)

8

<

:

X

v

j

2N

p

in

(vi

)

�j,i

+

X

f2F(p)

�i

f

� ⌘i

�⇣

1� i

t

p

(i)(p)⌘

9

=

;

�X

p2P

X

i:vi

2V

(1� �(i, p))⇣

1� i

t

p

(i)(p)⌘

�X

p2P

X

i:vi

2V

i

t

p

(i)(p) ·1

�i

p

8

<

:

X

v

j

2N

p

in

(vi

)

exp

n

�j,i

�i

p

o

+

X

f2F(p)

exp

n

�i

f

�i

p

o

� exp

n

⌘i

�i

p

o

9

=

;

Figure 2: Lower bound on the likelihood improvement.

4.1 Learning the parameters of the model. We next fo-cus on the problem of learning social influence strength, part-worth utilities and hurdles from past observations. The in-put of the learning problem is composed by: (i) the directedsocial graph G = (V,E), (ii) a log of past product prop-agations D, and (iii) for each product p the list of its fea-tures F(p). The log D is a relation (User, Product, T ime),where each tuple hi, p, ti indicates that the user v

i

adoptedthe product p at the time t. We assume that the projectionof D over the first column is contained in V , the projectionof D over the second column corresponds to the universe ofproducts P , and the third column is contained in [t0, T ]. Wealso denote the time at which user v

i

adopted the product pby t

p

(i). If vi

does not adopt p, then we set tp

(i) = T + 1.The output of the learning problem is the set of param-

eters of the F-TM model: (i) the influence strength bi,j

foreach arc (v

i

, vj

) 2 E, (ii) the part-worth utility ui

f

for eachuser v

i

2 V and each feature f 2 F , and (iii) the hurdle hi

for each user vi

2 V . Let ⇥ denote the overall set of param-eters of the model. Assuming that each propagation trace isindependent from the others, the likelihood of the data giventhe model parameters ⇥ can be expressed as

L(D;⇥) =

X

p2PlogL(D

p

;⇥) where:

L(Dp

;⇥) =

Y

i : vi

2V

Pr(p|i, tp

(i))�(i,p) (1� Pr(p|i, tp

(i)))1��(i,p)

and Dp

represents the propagation of p, i.e., the selection ofthe relation D where the product is p. Moreover �(i, p) = 1

if vi

adopted p and 0 otherwise.The optimization problem can be defined as follows:

Maximize⇥

logL(D;⇥)

subject to bi,j

� 0, 8 (vi

, vj

) 2 E

and hi

� 0 8 vi

2 V

For solving this learning problem we adopt the improvediterative scaling (IIS) approach[17, 4]. In practice we seekfor an improvement � of the parameters, such that we obtaina higher (log) likelihood:

L(D;⇥+�) � L(D;⇥)

The idea is to express the change in log likelihood B =

L(D;⇥ + �) � L(D;⇥) and optimize it with respect to �.

The � is the set of improvements for all parameters, i.e., a�i,j

for each (vi

, vj

) 2 E, a ⌘i

for each vi

2 V , and a �i

f

for each vi

2 V and f 2 F , such that ⇥+� actually meansbi,j

:= bi,j

+ �i,j

, hi

:= hi

+ ⌘i

, and ui

f

:= ui

f

+ �i

f

.Let N

in

(vi

) denote the set of nodes which may po-tentially influence the node v

i

, i.e: Nin

(vi

) = {vj

2V |(v

j

, vi

) 2 E}, while Nout

(vi

), symmetrically defined, isthe set of nodes which can be influenced by v

i

. For a givenproduct p and a timestamp t, let Np

in

(vi

) = Cp

(tp

(i)) \N

in

(vi

) denote the set of nodes which are exerting influ-ence on v

i

for product p at time t. Moreover, let �i

p

=

|Np

in

(vi

)|+ |F(p)|+ 1.By exploiting the inequality � log x � 1 � x and by

applying Jensen’s inequality, we can derive a lower boundon the improvement of the likelihood as reported in Figure 2,where C is the sum of all terms in the derivation which donot depend on the IIS parameters. Due to lack of space weomit the complete derivation which will be provided in anextended version of this paper.

Following the IIS approach we compute the set ofupdates of parameters � that maximizes our lower bound onthe improvement of the loglikelihood. This is summarizedin Algorithm 1, where we compute @B

@(·) = 0 for each IISparameter by applying Newton Raphson, in order to obtainan incremental update. Note that b

i,j

+ �i,j

or hi

+ ⌘i

mayresult negative. In this case, due to the monotonicity of theactivation function with respect to b

i,j

and hi

, the lowestlegal value that meets our non-negative constraints is zero.

5 Influence maximization with viral product designWe next focus on the MAXINF VPD problem defined overthe F-TM propagation model. The input to the problem isthe social graph and the parameters of F-TM for which wedefined a learning procedure in the previous section.

PROBLEM 1. (MAXINF VPD) Given a directed graphG = (V,E), and the parameters of the F-TM propagationmodel: the influence strength b

i,j

for each arc (vi

, vj

) 2 E,the part-worth utility ui

f

for each user vi

2 V and featuref 2 F , and the hurdle h

i

for each user vi

2 V .Let �(S,H) denote the expected spread of a set of nodes

S ✓ V and a set of features H ✓ F , according to F-TM.Given a budget k of nodes, the problem requires to find S,with |S| k, and H that maximize �(S,H).

B � C +

X

p2P

X

i:vi

2V

�(i, p)

8

<

:

X

v

j

2N

p

in

(vi

)

�j,i

+

X

f2F(p)

�i

f

� ⌘i

�⇣

1� i

t

p

(i)(p)⌘

9

=

;

�X

p2P

X

i:vi

2V

(1� �(i, p))⇣

1� i

t

p

(i)(p)⌘

�X

p2P

X

i:vi

2V

i

t

p

(i)(p) ·1

�i

p

8

<

:

X

v

j

2N

p

in

(vi

)

exp

n

�j,i

�i

p

o

+

X

f2F(p)

exp

n

�i

f

�i

p

o

� exp

n

⌘i

�i

p

o

9

=

;



i


i


p

(i). If vi




foreach arc (v

i

, vj


f

for eachuser v

i


for each user vi


L(D;⇥) =

X

p2PlogL(D

p

;⇥) where:

L(Dp

;⇥) =

Y

i : vi

2V

Pr(p|i, tp

(i))�(i,p) (1� Pr(p|i, tp

(i)))1��(i,p)

and Dp


if vi


Maximize⇥

logL(D;⇥)

subject to bi,j

� 0, 8 (vi

, vj

) 2 E

and hi

� 0 8 vi

2 V


L(D;⇥+�) � L(D;⇥)




for each (vi

, vj

) 2 E, a ⌘i

for each vi

2 V , and a �i

f

for each vi


:= bi,j

+ �i,j

, hi

:= hi

+ ⌘i

, and ui

f

:= ui

f

+ �i

f

.Let N

in

(vi


i

, i.e: Nin

(vi

) = {vj

2V |(v

j

, vi

) 2 E}, while Nout

(vi


i


in

(vi

) = Cp

(tp

(i)) \N

in

(vi


i


p

=

|Np

in

(vi





i,j

+ �i,j

or hi

+ ⌘i


i,j

and hi




i,j

for each arc (vi

, vj


f

for each user vi


i

for each user vi



tion model and a set of nodes S ✓ V , the expected numberof nodes “infected” in the viral cascade started with S, iscalled (expected) spread of S and denoted by �(S). The in-fluence maximization problem asks for a set S ✓ V , |S| = k,such that �(S) is maximum, where k is an input parameter.

The most studied propagation model is the so called Inde-pendent Cascade (IC) model. We are given a directed socialgraph G = (V,A) with arcs (u, v) 2 A labeled by influenceprobability p

u,v

2 (0, 1], representing the strength of the in-fluence of u over v. If (u, v) 62 A, we define p

u,v

= 0. Ata given time step, each node is either active (an adopterof product) or inactive. At time 0, a set S of seeds areactivated. Time unfolds deterministically in discrete steps.As time unfolds, more and more of neighbors of an inactivenode v become active, eventually making v become active,and v’s decision may in turn trigger further decisions bythe nodes to which v is connected. In particular, in the ICmodel, when a node u first becomes active, say at time t, ithas one chance at influencing each inactive neighbor v withprobability p

u,v

, independently of the history thus far. Ifthe attempt succeeds, v becomes active at time t+ 1.

Influence maximization is generally NP-hard [?], underIC and other propagation models. Kempe et al., however,show that the function �(S) is monotone1 and submodular2.When equipped with such properties, the simple greedy al-gorithm that at each iteration greedily extends the set ofseeds with the node providing the largest marginal gain,produces a solution with provable approximation guarantee(1� 1/e) [?].Though simple, the greedy algorithm is computationally

prohibitive, since the step of selecting the node provid-ing the largest marginal gain is #P-hard. In their paper,Kempe et al. run Monte Carlo simulations for su�cientlymany times to obtain an accurate estimate of the expectedspread.However, running many propagation simulations isextremely costly on very large real-world social networks.Therefore, following [?], considerable e↵ort has been devotedto develop methods for improving the e�ciency and scalabil-ity of influence maximization [?, ?, ?, ?]. Regardless of suchresearch e↵orts, scalability still remains an open challenge:on the problem instances that we consider in this paper, evena state-of-the-art algorithm as CELF++ [?], takes from fewdays to more than a week in order to extract a seed set of50 nodes. Most of this literature on e�cient algorithms forinfluence maximization assumes the weighted social graphgiven, and do not address how the link influence probabili-ties p

u,v

can be obtained. This problem instead is addressedin [?, ?, ?].Regardless of the fact that users authoritativeness, exper-

tise, trust and influence are evidently topic-dependent, onlyfew papers have looked at social influence from the topicsperspective. Tang et al. [?] study the problem of learn-ing user-to-user topic-wise influence strength. The input totheir problem is the social network and a prior topic distri-bution for each node, which is given as input and inferredseparately. Liu et al. [?] propose a probabilistic model forthe joint inference of the topic distribution and topic-wiseinfluence strength: here the input is an heterogenous so-cial network with nodes that are users and documents. Thegoal is to learn users’ interest (topic distribution) and user-to-user influence. Lin et al. [?] study the joint modelingof influence and topics, by adopting textual models. None1�(S) �(T ) whenever S ✓ T .2�(S [ {w})� �(S) � �(T [ {w})� �(T ) whenever S ✓ T .

Figure 1: Topic-aware influence parameters are learnt fromthe log of past propagations and the social network following[?]. These are the prerequisites to build the INFLEX indexthat we use to e�ciently answer TIM queries.

of these three papers define an influence propagation model,instead in a recent work Barbieri et al. [?] extend the classicIC model to be topic-aware: the resulting model is namedTopic-aware Independent Cascade (TIC) .Barbieri et al. also devise methods to learn, from a log of

past propagations, the model parameters, i.e., topic-awareinfluence strength for each link and topic-distribution foreach item. Their experiments show that (1) topic-aware in-fluence propagation models are more accurate in describingreal-world influence driven propagations than the state-of-the-art topic-blind models, and (2) by considering the char-acteristics of the item, a larger number of adoptions can beobtained in the influence maximization problem. The TICmodel is assumed at the basis of our work and it is intro-duced in details next.

1.2 Problem definitionIn this paper we study indexing schemes for answering

Topic-aware Influence Maximization (TIM) queries. We aregiven a directed social graph G = (V,A) and a space of Ztopics. We assume the TIC propagation model introducedin [?], whose parameters are learned from a log of past prop-agation traces. In particular for each arc (u, v) 2 A and foreach topic z 2 [1, Z] we have a probability pz

u,v

representingthe strength of influence that user u exerts over user v fortopic z. A high level depiction of our setting is provided inFigure ??. An item i is described by a distribution ~�

i

overthe topics: that is for each topic z 2 [1, Z] we are given �z

i

,with

PZ

z=1 �z

i

= 1. In the TIC model a propagation happenslike in the IC model: when a node u first becomes active onitem i, has one chance of influencing each inactive neighborv, independently of the history thus far. The tentative suc-ceeds with a probability that is the weighted average of thelink probability w.r.t. the topic distribution of the item i:

piu,v

=ZX

z=1

�z

i

pzu,v

. (1)

A TIM query Q(~�q

, k), takes as input an item description~�q

and an integer k and it requires to find the seed set S ✓ V ,|S| = k, such that the expected number of nodes adoptingitem q, denoted by �(S,~�

q

), is maximum:

Q(~�q

, k) = argmaxS✓V,|S|=k

�(S,~�q

). (2)

Propaga'on)Log)

Social)Network)

Product( f1( f2( f3( …( fF(p1) ✔) ✔)

p2) ✔) ✔)

…) ✔) ✔) ✔)

Product)9)Feature)Assignments)

Learn:'•  Influence'strength'bi,j'•  Hurdle'hi'for'each'user'•  ParthBworth'u.li.es'uif'

Op.miza.on'procedure:'Improved&IteraCve&Scaling''

'(details'in'the'paper)'

B � C +

X

p2P

X

i:vi

2V

�(i, p)

8

<

:

X

v

j

2N

p

in

(vi

)

�j,i

+

X

f2F(p)

�i

f

� ⌘i

�⇣

1� i

t

p

(i)(p)⌘

9

=

;

�X

p2P

X

i:vi

2V

(1� �(i, p))⇣

1� i

t

p

(i)(p)⌘

�X

p2P

X

i:vi

2V

i

t

p

(i)(p) ·1

�i

p

8

<

:

X

v

j

2N

p

in

(vi

)

exp

n

�j,i

�i

p

o

+

X

f2F(p)

exp

n

�i

f

�i

p

o

� exp

n

⌘i

�i

p

o

9

=

;



i


i


p

(i). If vi




foreach arc (v

i

, vj


f

for eachuser v

i


for each user vi


L(D;⇥) =

X

p2PlogL(D

p

;⇥) where:

L(Dp

;⇥) =

Y

i : vi

2V

Pr(p|i, tp

(i))�(i,p) (1� Pr(p|i, tp

(i)))1��(i,p)

and Dp


if vi


Maximize⇥

logL(D;⇥)

subject to bi,j

� 0, 8 (vi

, vj

) 2 E

and hi

� 0 8 vi

2 V


L(D;⇥+�) � L(D;⇥)




for each (vi

, vj

) 2 E, a ⌘i

for each vi

2 V , and a �i

f

for each vi


:= bi,j

+ �i,j

, hi

:= hi

+ ⌘i

, and ui

f

:= ui

f

+ �i

f

.Let N

in

(vi


i

, i.e: Nin

(vi

) = {vj

2V |(v

j

, vi

) 2 E}, while Nout

(vi


i


in

(vi

) = Cp

(tp

(i)) \N

in

(vi


i


p

=

|Np

in

(vi





i,j

+ �i,j

or hi

+ ⌘i


i,j

and hi




i,j

for each arc (vi

, vj


f

for each user vi


i

for each user vi



Influence'maximiza.on'with'viral'product'design''

Let'σ(S,'H)'denote'the'expected'spread'of'a'set'of'nodes'S'�'V'and'a'set'of'features'H'�'F,'according'to'FBTM.''

MAXINF&VPD:'Given'a'budget'k'of'nodes,'the'problem'requires'to'find'S,'with'|S|'≤'k,'and'H'that'maximize'σ(S,'H).'

MAXINF'VPD'is'NPBhard'for'FBTM.'

Given'the'FBTM'model,'for'any'fixed'choice'of'the'featureBset'H,'the'spread'func.on'σ(S,H)'is'monotone'and'submodular'in'S.'

In'general,'due'to'possible'nega.ve'partBworth'u.li.es,'the'spread'func.on'under'FBTM'is'not'even'monotone!!'

6 AlgorithmIn the MAXINF VPD problem we have to jointly select aset of users S ✓ V and a set of features H ✓ F , tomaximize the expected viral spread of a viral marketingcampaign started with nodes S, for a new product describedby H . Given the enormous size of the search space ofour problem, i.e., 2|F|�|V |

k

�, we need some property, or at

least a good heuristics. A promising direction has recentlybeen investigated by Singh et al. in [20], where the authorsextends the concept of submodularity to set functions withtwo arguments (dubbed bisubmodularity). In particular, theyshow that for a restricted class of those functions, where (i)the two considered sets are not disjoint and (ii) fixing one ofthe coordinates of the function yields a submodular functionin the free coordinate, that is, if the following property holds

f(A,B)+f(A0, B0) � f(A[A0, B[B0

)+f(A\A0, B\B0)

then the coordinate-wise maximization, in which each set isoptimized by considering the other fixed, is approximatelyoptimal. Unfortunately, the same procedure does not pro-vide, in general, any approximation guarantee if the two setsare disjoint (although authors prove a weaker result that re-quires some additional assumptions, i.e, the existence of asubmodular extension of f ).

A further obstacle in our case is that, due to the possibil-ity of obtaining negative part-worth utilities, the spread func-tion under F-TM is not even monotone. This is a commonchallenge in product design and SOC, for which heuristic ap-proaches, or search techniques based on simulated annealingor genetic algorithms have been adopted [1, 3, 21].

Although we cannot rely on the bisubmodularity results,we can nevertheless borrow, as an heuristic, the coordinate-wise maximization approach. Following this idea we pro-pose a procedure which alternates local greedy choices forthe update of the seed set and the feature set. The procedureis presented in Algorithm 2.

Our algorithm starts by detecting the pair of singletonshv, fi in the cartesian product V ⇥ F , which maximizes thespread (Line 2). Then, iteration by iteration, we alternatethe procedure of greedy seed selection and feature update insuch a way that, at the generic iteration (i), we generate animprovement of the objective function, by forcing one of thetwo following inequalities to hold:

�(S(i), H(i)) �(S(i+1), H(i)

)(6.5)

�(S(i), H(i)) �(S(i), H(i+1)

).(6.6)

When we keep the feature set H fixed, if we increasethe size of the seed set S then inequality in Equation (6.5)trivially holds thanks to the monotonicity in S. Actually, thestep in which we make the greedy selection of the next nodeto add to the seed set (Line 9), has the usual (1�1/e) qualityguarantees, thanks to submodularity.

Algorithm 2: MAXINF VPDInput : Social graph G = (V,E), budget K 2 N, and the

set of all parameters of F-TM: social influencebi,j

, part-worth utilities ui

f

, and hurdle hi

.Output: S ⇢ V , |S| = k, and a set of features H ✓ F .

1 S,H,Hold

;2 (v0

i

, f 0) argmax(v

i

,f)2V ⇥F �({vi

}, {f})3 S S [ {v0

i

}, H H [ {f 0}4 �

curr

�(S,H)

5 iterate true6 it 0

7 while (iterate) do8 if (it mod 2 = 0 ^ |S| < k) then9 [s,�

curr

] greedySelection(S,H)

10 S S [ {s}11 else12 H

old

H13 [H,�

curr

] updateFeatureSet(S,H)

14 if (|S| = k ^Hold

= H) then15 iterate false16 end17 S0 argmax

S

0✓V

|S0||S|�(S0, H)

18 if (�(S0, H) > �curr

) then19 S S0

20 �curr

�(S0, H)

21 end22 end23 it it+ 1

24 end

For the features-update step we do not have any prop-erty, but we can adopt several heuristic techniques for gen-erating “good” solutions, such that Equation (6.6) is alwayssatisfied. In this paper, we experiment with two alternativesfor the the updateFeatureSet(S,H) procedure (Line 13):

Local Update: in this case the update procedure providesthe best feature set (w.r.t. the objective function) as out-put, which can be obtained by performing only one addi-tion/removal of a feature to/from the current feature set. Al-though the procedure is expected to converge naturally toa limited number of features, as the algorithm will reach apoint in which no local change can improve the spread, weput a constraint ( 10) of the size of the selected feature setto make the interpretability of the feature set provided.

Genetic Algorithm Update: the update of the feature setis performed by employing a genetic algorithm. Our imple-mentation is built upon the popular JGAP (http://jgap.sourceforge.net/) framework, by using elitist selec-tion on a initial population of 30 feature sets and by allowing20 evolutions. Better results may be, in principle, obtainedby using a larger number of evolutions, but we found thiscombination of parameters as a good compromise between

CoordinateBwise'procedure'for'MAX'VPD'

6 AlgorithmIn the MAXINF VPD problem we have to jointly select aset of users S ✓ V and a set of features H ✓ F , tomaximize the expected viral spread of a viral marketingcampaign started with nodes S, for a new product describedby H . Given the enormous size of the search space ofour problem, i.e., 2|F|�|V |

k

�, we need some property, or at

least a good heuristics. A promising direction has recentlybeen investigated by Singh et al. in [20], where the authorsextends the concept of submodularity to set functions withtwo arguments (dubbed bisubmodularity). In particular, theyshow that for a restricted class of those functions, where (i)the two considered sets are not disjoint and (ii) fixing one ofthe coordinates of the function yields a submodular functionin the free coordinate, that is, if the following property holds

f(A,B)+f(A0, B0) � f(A[A0, B[B0

)+f(A\A0, B\B0)

then the coordinate-wise maximization, in which each set isoptimized by considering the other fixed, is approximatelyoptimal. Unfortunately, the same procedure does not pro-vide, in general, any approximation guarantee if the two setsare disjoint (although authors prove a weaker result that re-quires some additional assumptions, i.e, the existence of asubmodular extension of f ).

A further obstacle in our case is that, due to the possibil-ity of obtaining negative part-worth utilities, the spread func-tion under F-TM is not even monotone. This is a commonchallenge in product design and SOC, for which heuristic ap-proaches, or search techniques based on simulated annealingor genetic algorithms have been adopted [1, 3, 21].

Although we cannot rely on the bisubmodularity results,we can nevertheless borrow, as an heuristic, the coordinate-wise maximization approach. Following this idea we pro-pose a procedure which alternates local greedy choices forthe update of the seed set and the feature set. The procedureis presented in Algorithm 2.

Our algorithm starts by detecting the pair of singletonshv, fi in the cartesian product V ⇥ F , which maximizes thespread (Line 2). Then, iteration by iteration, we alternatethe procedure of greedy seed selection and feature update insuch a way that, at the generic iteration (i), we generate animprovement of the objective function, by forcing one of thetwo following inequalities to hold:

�(S(i), H(i)) �(S(i+1), H(i)

)(6.5)

�(S(i), H(i)) �(S(i), H(i+1)

).(6.6)

When we keep the feature set H fixed, if we increasethe size of the seed set S then inequality in Equation (6.5)trivially holds thanks to the monotonicity in S. Actually, thestep in which we make the greedy selection of the next nodeto add to the seed set (Line 9), has the usual (1�1/e) qualityguarantees, thanks to submodularity.

Algorithm 2: MAXINF VPDInput : Social graph G = (V,E), budget K 2 N, and the

set of all parameters of F-TM: social influencebi,j

, part-worth utilities ui

f

, and hurdle hi

.Output: S ⇢ V , |S| = k, and a set of features H ✓ F .

1 S,H,Hold

;2 (v0

i

, f 0) argmax(v

i

,f)2V ⇥F �({vi

}, {f})3 S S [ {v0

i

}, H H [ {f 0}4 �

curr

�(S,H)

5 iterate true6 it 0

7 while (iterate) do8 if (it mod 2 = 0 ^ |S| < k) then9 [s,�

curr

] greedySelection(S,H)

10 S S [ {s}11 else12 H

old

H13 [H,�

curr

] updateFeatureSet(S,H)

14 if (|S| = k ^Hold

= H) then15 iterate false16 end17 S0 argmax

S

0✓V

|S0||S|�(S0, H)

18 if (�(S0, H) > �curr

) then19 S S0

20 �curr

�(S0, H)

21 end22 end23 it it+ 1

24 end

For the features-update step we do not have any prop-erty, but we can adopt several heuristic techniques for gen-erating “good” solutions, such that Equation (6.6) is alwayssatisfied. In this paper, we experiment with two alternativesfor the the updateFeatureSet(S,H) procedure (Line 13):

Local Update: in this case the update procedure providesthe best feature set (w.r.t. the objective function) as out-put, which can be obtained by performing only one addi-tion/removal of a feature to/from the current feature set. Al-though the procedure is expected to converge naturally toa limited number of features, as the algorithm will reach apoint in which no local change can improve the spread, weput a constraint ( 10) of the size of the selected feature setto make the interpretability of the feature set provided.

Genetic Algorithm Update: the update of the feature setis performed by employing a genetic algorithm. Our imple-mentation is built upon the popular JGAP (http://jgap.sourceforge.net/) framework, by using elitist selec-tion on a initial population of 30 feature sets and by allowing20 evolutions. Better results may be, in principle, obtainedby using a larger number of evolutions, but we found thiscombination of parameters as a good compromise between

We'generate'an'improvement'of'the'objec.ve'func.on,'such'that'one'of'the'

two'following'inequali.es'holds:'

Greedy'algorithm'

Local'update' Gene.c'algorithm'or'

Greedy'algorithm'

Pick'as'next'seed'node'the'one'with'highest'marginal'gain.''

Check'if'a'beUer'selec.on'of'seed'nodes'can'be'determined'for'the'updated'set'of'features.'

Evalua.on'on'real'data.'Last.fm Flixster

nodes (|V |) 1,372 29,275links (|E|) 7,354 212,106

number of product adoptions (|D|) 1,208,640 5,423,310propagations episodes 322,932 2,768,654

avg number of products per user 880 185avg number of users per product 23 953

products (|P |) 51,495 5,685distinct features (|F|) 13,510 5,116

avg number of features per product 5 8avg number of products per feature 18 9

Table 1: Summary of the datasets.

computational time and effectiveness. In this case the fitnessfunction can be naturally instantiated by the expected spread.However, to penalize large feature sets, we impose a penalty(|H| � t)2 if the number of feature exceeds the “desired”length t of the feature set (10 in our experiments).

Once updated the feature set, we apply the standardMAXINF greedy algorithm with F-TM model, with budget|S(i)| and fixed feature set, to check if a better selection ofseeds exists (Line 17). The procedure ends when we havereached the seed budget and no further update of the featureset produces an improvement of the spread (Line 15).

7 Experimental analysisDatasets. For our purposes we need information-richdatasets containing (i) the social network among users, (ii)a database of past propagations of items in the social net-work, and (iii) a set of features for each item. Gatheringsuch dataset is a non-trivial task. Indeed, the majority ofMAXINF literature uses synthetically-generated data: usu-ally the social graph is a real one, but the influence informa-tion is synthetic. We argue that for the task of studying andanalysing the impact of part-worth utilities and peer influ-ence on the overall diffusion process it is fundamental to relyon real-world data, mainly because real-world data exhibitspatterns which may not be found on synthetic data. There-fore, we collected two real-world datasets: one from the do-main of social music consumption (lastfm.com) and onefrom social movie consumption (flixster.com).

Our LastFM dataset was created starting from the Het-Rec 2011 Workshop dataset4, and enriching it by crawling.Here the action log D contains information about the firsttime that a user listens to a song. If a user v

i

listens to a songand shortly after one of her peers v

j

does the same, we as-sume that the song propagated from v

i

to vj

. More formally,we assume that a generic product p propagates from v

i

to vj

iff (vi

, vj

) 2 E and tp

(i) < tp

(j). Hence, we say that a tu-ple hi, p, ti 2 D is a propagation episode if exists v

j

2 Cp

(t)such that v

i

2 Nout

(vj

).

4http://www.grouplens.org/node/462

LastFm

Iteration

Log−Likelihood

0 20 40 60 80 100−1e+07

−8e+06

−6e+06

−4e+06

Flixster

Iteration

Log−Likelihood

0 20 40 60 80 100

−4e+07

−2.5e+07

−1e+07

Figure 3: Model fitting (Algorithm 1): convergence rate.

Also our Flixster dataset was obtained by conjoiningcrawled information5 with movies’ tags coming from theHetRec 2011 Workshop dataset5. Here a movie propagatesfrom a user to one of her peers, when both post a ratingfor it. In both datasets the social graph is bidirectional andthe features are the tags associated to each product (song inLastFM and movie in Flixster).Analysis of social influence and part-worth utility. Herewe report the analysis of the parameters (peer influence, part-worth utility, and hurdle) learnt from real data, using Algo-rithm 1. To fit the F-TM model we run Algorithm 1 proce-dure for 100 iterations on both datasets: the convergence rateis reported in Figure 3.

Figure 4 shows that influence weights are distributed ac-cording to an exponential distribution, with a high density ofzero-valued influence relationships, while part worth utilitiesare distributed according to a normal distribution, centeredin the interval (0, 20) and (0, 50) on LastFm and Flixster,respectively. Moreover, we can observe that the Flixster

dataset exhibits a higher level of influence, while in LastFm

part-worth utilities play a more important role. Hurdles aredistributed similarly in the two datasets.Viral vs popular features. With the goal of showing thatdifferent features trigger viral cascades differently, we nextcompare the spread achievable under our propagation modelby “viral” and “popular” features.

We define the level of “virality” of each feature bytwo different measures. Let prop(p, i, j) be a predicatethat is true if in the data we observe a propagation ofproduct p from v

i

to vj

, false otherwise. We define thePropagation Coe�cient(PC ) of the feature f as the ratiobetween the number of propagations and the total number ofproduct adoptions which involve the considered feature:

PC (f) =|{hi, p, ti 2 D|f 2 F(p) ^ 9 v

j

2 V : prop(p, j, i)}||{hi, p, ti 2 D|f 2 F(p)}|

As second measure we consider the likelihood that anadoption of a product which exhibits the feature f willtrigger adoptions. The Propagation Likelihood(PL) of the

5http://www.cs.sfu.ca/

˜

sja25/personal/datasets/

Last.fm:''social'music'consump.on.'We'record'the'first'.me'that'a'user'listens'to'a'song.''

Both'the'social'networks'are'bidirec.onal.'Each'song/movie'is'associated'with'a'set'of'tags.'

Flixster:'social'playorm'for'ra.ng'and'discovering'movies.'We'record'the'ra.ng'history'for'each'user.'

( 0,5

)

( 5,1

0 )

( 10,

15 )

( 15,

20 )

( 20,

25 )

( 25,

30 )

( 30,

35 )

( 35,

40 )

( 40,

45 )

( 45,

50 )

( 50,

55 )

Influence Weights − LastFm

Den

sity

1e−0

40.

001

0.01

0.1

( 0,1

0 )

( 10,

20 )

( 20,

30 )

( 30,

40 )

( 40,

50 )

( 50,

60 )

( 60,

70 )

( 70,

80 )

( 80,

90 )

( 90,

100

)( 1

00,1

10 )

( 110

,120

)( 1

20,1

30 )

Influence Weights − Flixster

1e−0

50.

001

0.1

( −12

0,−1

00 )

( −10

0,−8

0 )

( −80

,−60

)( −

60,−

40 )

( −40

,−20

)( −

20,0

)( 0

,20

)( 2

0,40

)( 4

0,60

)( 6

0,80

)( 8

0,10

0 )

( 100

,120

)( 1

20,1

40 )

( 140

,160

)

Part Worth Utilities − LastFm

1e−0

50.

001

0.1

( −20

0,−1

50 )

( −15

0,−1

00 )

( −10

0,−5

0 )

( −50

,0 )

( 0,5

0 )

( 50,

100

)

( 100

,150

)

( 150

,200

)

( 200

,250

)

( 250

,300

)

Part Worth Utilities − Flixster

1e−0

61e−0

40.

01

( 10,

20 )

( 20,

30 )

( 30,

40 )

( 40,

50 )

( 50,

60 )

( 60,

70 )

( 70,

80 )

( 80,

90 )

( 90,

100

)

( 100

,110

)

Hurdles − LastFm

0.00

10.

010.

1

( 0,2

0 )

( 20,

40 )

( 40,

60 )

( 60,

80 )

( 80,

100

)

( 100

,120

)

( 120

,140

)

( 140

,160

)

( 160

,180

)

( 180

,200

)

Hurdles − Flixster

1e−0

40.

001

0.01

0.1

Figure 4: Distribution of influence weights, part-worth utili-ties, and hurdles on the two datasets.feature f can be defined as:

PL(f) =

Phi,p,ti2Df2F(p)

|{vj

2V |prop(p,i,j)}||N

out

(vi

)|

|{hi, p, ti 2 D|f 2 F(p)}|

Both scores have values 2 [0, 1] and the higher the value, thehigher the ability of the feature to trigger viral cascades.

To compare the expected spread achieved by productwhich exhibit different kind of features, we compare on bothdatasets the spread achieved by selecting the top-5 and thetop-10 features selected according to three different criteria:(i) popularity (i.e, frequency, the number of products inwhich the feature is present), (ii) propagation coefficient and(iii) propagation likelihood. Before reporting the MAXINFresults, it is worth mentioning that propagation coefficientand propagation likelihood are positively correlated on bothdatasets (⇢ equals to 0.68 on LastFm and 0.91 on Flixster),while the correlation between the two virality indices andpopularity of the feature is significantly lower (⇡ 0.03 onLastFm, and ⇡ 0.14 on Flixster). Therefore, being populardoes not make a feature viral.

The MAXINF results, given in Figure 5, highlight signif-icant differences in terms of the expected spread achieved by

●●●●

●●●●

●●●●

●●●●

●●●●●

●●●●

●●●●

●●●●●

●

●●●●

●●●●●

●●●●

●●

Top−5 features − LastFm

Seed set size

Expe

cted

spr

ead

1 6 12 18 24 30 36 42 48

5010

015

0

● FrequencyProp CoefficientProp Likelihood

●

●

●

●●●

●●

●●●●●

●●●●●●

●●●●●●●●●

●●●●●●●●●

●●●●●●●●●●

●●●

Top−10 features − LastFm

Seed set size

Expe

cted

spr

ead

1 6 12 18 24 30 36 42 48

5010

020

030

0


●●●●●●●●●●●●●●●●●●●●●●

●●●●●

●

●●

●

●●●●●●

●●●●

●●●●

●●●●●

Top−5 features − Flixster

Seed set size

Expe

cted

spr

ead

1 6 12 18 24 30 36 42 48

500

1000

2000

● FrequencyProp Coefficient/Likelihood

●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●

●●●

●●●

●●●●

●●●●●●●●

●

Top−10 features − Flixster

Seed set size

Expe

cted

spr

ead

1 6 12 18 24 30 36 42 48

500

1000

2000


Figure 5: MAXINF with “viral” and “popular” features.

considering popular features versus “viral” ones. The high-est spread in these cases can be achieved by selecting thefeatures according to their propagation likelihood value. Onthe Flixster dataset, where we have already highlighted analbeit linear correlation between the values of propagationcoefficient and likelihood, the top-5 feature selected accord-ing to the two different virality measures corresponds.Influence maximization with viral product design. Wenext evaluate the performance of our framework for MAX-INF VPD. To initialize the algorithm, we select the 30 fea-tures with highest part-worth utility for each user v

i

, andwe pick the pair hv

i

, fi which achieves the largest expectedspread. Moreover, to reduce the feature search space, weconsider only the top-2, 000 features according to their prop-agation likelihood (PL), since this value, as shown in Fig-ure 5, seems to be a good heuristic.

We report in Figure 6 the results of the maximizationalgorithm with a budget of 20 seed nodes on LastFm and 10

on Flixster. The version of the algorithm which employs agenetic algorithm optimization exhibits better performancesin terms of spread (330 vs 305 on LastFm and 3, 104 vs2, 811 on Flixster), at the cost of a higher computationaloverhead (around 3 times slower than the simpler version onLastFm and 5 times on Flixster).

The incremental selection of seed nodes is generallyrobust: on both datasets, the greedy algorithm is rarelyable to provide a better combination of seeds after updatingthe feature set (considering the version based on geneticoptimization, about 3 times on LastFm and 2 on Flixster).

While the local update version provides 10 featuresas output on both the datasets (which corresponds to themaximum budget), the solution provided by the genetic

Viral'vs'popular'features'We'compare'the'spread'achievable'under'FTM'by'“viral”'and'

“popular”'features.'

PropagaCon&coefficient'is'the'ra.o'between'the'number'of'propaga.ons'and'the'total'number'of'adop.ons'for'the'

considered'feature.'

PropagaCon&likelihood&is'the'ra.o'between'the'number'of'observed'and''poten.al'propaga.ons'that'involve'

products'with'the'considered'feature.'

Being'popular'does'not'make'a'feature'viral!!'

MAX'VPD:'Results'

●

●●

●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

LastFm

Iteration

Expe

cted

spr

ead

0 4 8 13 19 25 31 37

050

100

200

300

● Local UpdateGenetic Algorithm

●

●

●●

●● ●

● ●●

●● ● ● ● ● ● ● ● ●

Flixster

Iteration

Expe

cted

spr

ead

0 2 4 6 8 11 14 17 2050

015

0025

00

● Local UpdateGenetic Algorithm

Figure 6: Influence maximization with viral product design.

version includes 18 features on LastFm, and 23 on Flixster.A very interesting observation is that some features belongto both feature sets provided by the different versions of thealgorithm (6/18 on LastFm and 4/23 on Flixster, startingfrom a set of 2,000 features).

Finally, the most important question of our experimen-tal analysis is whether incorporating product design in viralmarketing produces any benefit, w.r.t. keeping the two tasksseparated. By comparing Figure 6 with Figure 5 we can ob-serve clearly that the spread achieved by integrating the prod-uct design process into the influence maximization frame issignificantly greater than the one achieved by the heuristicselection of features presented. On LastFm, employing 20

seed nodes and 10 features, the MAXINF VPD procedureachieves a spread of 305, while the best result reported fora heuristic selection of the same number of features is 170.Similar gains are visible in Flixster.

8 ConclusionsThe wide diffusion of social media and e-commerce plat-forms provides a great source of information for understand-ing customers’ needs, interests and opinions, which can beexploited to design better marketing strategies and betterproducts. In this paper we explore a novel direction in thefield of computational marketing, by proposing a frameworkwhich allows the integration of the product design processinto viral marketing.

We incorporate part-worth utilities into a social-influence-driven propagation model and define a procedureto fit the model to real data, so that we can learn peer influ-ence strength and part-worth utilities jointly. Based on theseparameters learnt from real data, we introduce the problemof influence maximization with product design and devise aheuristic optimization algorithm, which is empirically shownto provide good solutions on two real-world datasets. In par-ticular, the expected market share gained by our procedure islarger than that achieved by standard influence maximizationwithout product design, where the product is simply createdputting together the most “viral” features.Acknowledgments. This work was supported by MULTI-SENSOR project, funded by the European Commission, un-der the contract number FP7-610411.

References

[1] P. V. S. Balakrishnan and V. S. Jacob. Genetic algorithms forproduct design. Manage. Sci., 42(8):1105–1117, 1996.

[2] N. Barbieri, F. Bonchi, and G. Manco. Topic-aware socialinfluence propagation models. In ICDM, 2012.

[3] A. Belloni, R. Freund, M. Selove, and D. Simester. Optimiz-ing product line designs: Efficient methods and comparisons.Manage. Sci., 54(9):1544–1552, 2008.

[4] A. Berger. The improved iterative scaling algorithm: A gentleintroduction. 1997.

[5] W. Chen, C. Wang, and Y. Wang. Scalable influence max-imization for prevalent viral marketing in large-scale socialnetworks. In KDD, 2010.

[6] M. Das, G. Das, and V. Hristidis. Leveraging collaborativetagging for web item design. In KDD, 2011.

[7] A. Goyal, F. Bonchi, and L. V. S. Lakshmanan. Learninginfluence probabilities in social networks. In WSDM, 2010.

[8] A. Goyal, F. Bonchi, and L. V. S. Lakshmanan. A data-basedapproach to social influence maximization. PVLDB, 5(1):73–84, 2011.

[9] P. Green and V. Rao. Conjoint measurament for quantifyingjudgmental data. Journ. of Marketing Research, 8(3):355–363, 1971.

[10] D. Gunnec and S. Raghavan. Integrating social net-work effects in the share-of-choice problem. Un-published working paper. http://terpconnect.umd.edu/

˜

raghavan/preprints/snesoc.pdf.[11] D. Kempe, J. M. Kleinberg, and E. Tardos. Maximizing the

spread of influence through a social network. In KDD, 2003.[12] R. Kohli and R. Krishnamurti. Optimal product design using

conjoint analysis: Computational complexity and algorithms.Europ. Journ. of Operational Research, 40(2):186–195, 1989.

[13] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. Van-Briesen, and N. S. Glance. Cost-effective outbreak detectionin networks. In KDD, 2007.

[14] M. Miah, G. Das, V. Hristidis, and H. Mannila. Determiningattributes to maximize visibility of objects. IEEE Trans. onKnowl. and Data Eng., 21(7):959–973, 2009.

[15] E. Mossel and S. Roch. Submodularity of influence insocial networks: From local to global. SIAM J. Comput.,39(6):2176–2188, 2010.

[16] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysisof approximations for maximizing submodular set functions -i. Mathematical Programming, 14(1):265–294, 1978.

[17] K. Nigam. Using maximum entropy for text classification. InIJCAI, 1999.

[18] K. Saito, R. Nakano, and M. Kimura. Prediction of informa-tion diffusion probabilities for independent cascade model. InKES, 2008.

[19] A. Sinan and W. Dylan. Creating social contagion throughviral product design: A randomized trial of peer influence innetworks. Manage. Sci.. Forthcoming.

[20] A. Singh, A. Guillory, and J. Bilmes. On bisubmodularmaximization. Journal of Machine Learning Research, 2012.

[21] S. Tsafarakis and N. Matsatsinis. Designing optimal prod-ucts: Algorithms and systems. In Marketing Intelligent Sys-tems Using Soft Computing, 2010.

The'gene.c'algorithm'op.miza.on'exhibits'beUer'performances'in'terms'of'spread,'at'the'cost'of'a'higher'computa.onal'overhead.'

The'joint'selec.on'of'features'and'seed'nodes'outperforms'the'heuris.c'selec.on'of'features'followed'by'standard'influence'maximiza.on.'

Conclusion'Online'ac.vi.es'generate'huge'amounts'of'informa.on'about'users.'

ComputaConal&MarkeCng:&Applica.on'of'computa.onal'science'techniques'to'modeling'and'

understanding'market'behavior.'

Our'work'aims'at'integra.ng'the'product'design'process'into'viral'marke.ng.'

The'final'goal'is'to'design'effec.ve'marke.ng'strategies'and'more'successful'products.'

The'expected'market'share'gained'by'MAX'VPD'is'larger'than'the'one'achieved'by'standard'influence'maximiza.on'without'product'

design.'

THANKS!

Social Media

Influence maximization with viral product design