Inferring the Impacts of Social Media on Crowdfunding. Chun-Ta Lu, Sihong Xie, Xiangnan Kong, Philip S. Yu. Presenter: Chun-Ta Lu, University of Illinois at Chicago

Multiple Views and Multiple Sources on Crowdfundingclu/doc/wsdm14_slides.pdf · Multiple Views and Multiple Sources Ben Tan, Erheng Zhong, Evan Wei Xiang, Qiang Yang ... of the crowdfunding



Inferring the Impacts of Social Media on Crowdfunding

Chun-Ta Lu Sihong Xie Xiangnan Kong Philip S. Yu

Presenter: Chun-Ta Lu, University of Illinois at Chicago


Inferring the Impacts of Social Media on Crowdfunding

1. Introduction to “Crowdfunding”
2. Impacts of Social Media
3. Experiments
4. Conclusion & Future Works


Impacts of Social Media on Crowdfunding

Unique properties: 1. fundraising goal 2. project duration

[Figure: Fundraising Goal (0% to 100%) against Project Duration (0% to 100%).]


Impacts of Social Media on Crowdfunding

[Figure: Fundraising Goal (0% to 100%) against Project Duration (0% to 100%), with two groups of projects labeled Group A and Group B and annotated with $ and $$$.]

Unique properties: 1. fundraising goal 2. project duration


Dataset: Kickstarter and Twitter

a specified duration [15, 16]. If they are interested in the project, investors can contribute funds to it. If the fundraising goal is reached, the funds are awarded to the proposer, and equal benefits are committed to the investors. Most of the crowdfunding sites (e.g. Kickstarter) further apply an all-or-nothing principle: if the fundraising goal is not reached by the deadline, all the funds are returned to the investors, while all commitments are cancelled. Such an all-or-nothing principle helps investors to avoid the risk of paying for nothing.

Besides fundraising activities on crowdfunding websites, typically there are also fundraising-related social activities on social networks like Twitter. Specifically, novel and creative crowdfunding projects are notably discussed on social networks where users are willing to share their interests, provide or respond to comments, promote and announce news of the projects. These activities on social networks can be seen as a complementary data source to crowdfunding websites and provide more insights into the crowdfunding process. We focus on a crowdfunding website called Kickstarter.com and related tweets on Twitter, since they exhibit the basic elements we wish to study, namely, the co-occurring fundraising and social activities on Twitter. Our observations and methodologies are applicable to other crowdfunding hosts, so long as they allow promotions on social media.

3. DATASET CHARACTERISTICS

We start our presentation by describing the data acquisition and processing for our analysis. We then provide the statistical characteristics of our dataset.

3.1 Data acquisition and processing

Crowdfunding data: From Kickstarter, we obtained all the projects launched after Nov. 2012 and completed before Apr. 2013. For each project we recorded its proposer, duration, webpage URL, shortened URL (see footnote 2), fundraising goal and available pledging options. Within each project’s duration we recorded its total pledged amount and total number of backers on a daily basis. We noticed that there were some trial projects having no backers or extremely low fundraising goals (less than $100), and we removed them from our dataset since none of them reached their fundraising goals.

Twitter data: We gathered all the tweets regarding Kickstarter from Nov. 2012 to Apr. 2013. The tweets are extracted by the Twitter stream API with the keywords “kickstarter” and “kck.st/” (see footnote 2), ensuring they contain timestamps, authors, mentioned users, tweet texts, and URLs for our analysis. For each tweet’s author, we queried Twitter API for the metadata about the author as well as the followers and followees of the author. Since a tweet is limited to 140 characters, most URLs are shortened by a URL shortening service. We expanded them to get the original URLs.

Data preprocessing: A crowdfunding project could be shared on different channels and referred to by various titles. An effective promotion of a project should mention its official URL to encourage others to visit it. For each project we consider only the tweets mentioning its URL within its project duration. We call the author of such a tweet a promoter. Besides, Kickstarter provides a Twitter shortcut for users to share the news of sponsoring a project in a fixed format (e.g. “I just backed project title @kickstarter”). Hence, we further filter tweets with the keyword “backed” and the project title, and call the author of such a tweet a patron. In this paper, we distinguish the patron in Twitter from the backer in Kickstarter by assuming that a patron is not only a backer who puts a pledge on a Kickstarter project but also tweets the news of sponsoring it. Figure 2 illustrates the transition of roles in crowdfunding. In Figure 2, a potential target would become a backer of a project if he/she invests in the project, or become a promoter if he/she broadcasts the information of the project. If he/she both invests in and promotes the project, he/she becomes a patron.

We further filter out projects that were seldom discussed (fewer than 5 tweets) in the Twitter dataset, simply because there are not enough data to correctly evaluate the crowdfunding campaign through Twitter. Table 1 and Figure 3 show the basic statistics of our dataset after filtering.

Footnote 2: Kickstarter provides a shortened URL in the format of http://kck.st/* to encourage users to share the project

[Figure 2: Transitions of roles in crowdfunding among Potential Target, Backer, Promoter and Patron. Solid lines show the action of broadcasting and dashed lines show the action of investment.]

Table 1: Statistics of the dataset

  # of projects               1,521
  # of successful projects      841
  # of backers              145,032
  # of tweets                62,473
  # of promoters             39,051
  # of patrons               16,666
  # of mentioned users       14,415

[Figure 3: Percentage of project categories (Film & Video, Music, Games, Publishing, Design, Others).]

[Figure 4: Distribution of the ratio of total pledged amount to the fundraising goal; # of projects (0 to 140) against pledged funds / fundraising goal (0 to 3). An additional 120 projects (7.8% of all projects) were funded more than 3x.]

3.2 Statistical characteristics of dataset

The distribution of the ratio of total amount of pledged funds to the fundraising goal can be found in Figure 4. It illustrates the fact that most Kickstarter projects either fail significantly or meet their goals by relatively small margins. Among all the failed projects, only twenty-five percent of

[Figure: project categories: Film & Video, Music, Games, Publishing, Design, Others.]

[Figure: Transition of roles in crowdfunding among Potential Target, Backer, Promoter and Patron. Edge labels: "Mentions the project in tweets" and "Invests in the project".]
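The promoter and patron definitions from Section 3.1 can be sketched in a few lines of Python. This is an illustrative sketch only: the `Project` and `Tweet` records, their field names, and the `classify_roles` helper are hypothetical and are not the authors' pipeline. The two rules follow the text (the project URL is mentioned within the project duration; a patron's tweet also contains "backed" and the project title). Counting every patron as a promoter is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Project:
    title: str
    url: str      # official project URL, after expanding shortened links
    start: date
    end: date


@dataclass
class Tweet:
    author: str
    text: str
    urls: list    # expanded URLs found in the tweet
    day: date


def classify_roles(project, tweets):
    """Return (promoters, patrons) as sets of author names.

    Promoter: tweeted the project URL within the project duration.
    Patron:   such a tweet also contains "backed" and the project title.
    Assumption: every patron is also counted as a promoter.
    """
    promoters, patrons = set(), set()
    for tw in tweets:
        in_window = project.start <= tw.day <= project.end
        if not in_window or project.url not in tw.urls:
            continue  # ignore tweets outside the duration or without the URL
        promoters.add(tw.author)
        text = tw.text.lower()
        if "backed" in text and project.title.lower() in text:
            patrons.add(tw.author)
    return promoters, patrons
```

Backers who never tweet do not appear in either set; they would have to come from the Kickstarter side of the data.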


Dataset Analysis: success rate

[Figure: # of projects (0 to 140) against pledged funds / fundraising goal (0 to 3), split into failed and successful projects.]

All-or-nothing policy: Projects either fail significantly or meet their goals by relatively small margins.

[Figure: pledged funds / fundraising goal (0 to 3) against # of backers (log base 10, 0 to 5), split into failed and successful projects.]

44% of projects failed to reach goals


Dynamics of Crowdfunding

(1) Three phases of fundraising activities

[Figure (Kickstarter): average number of backers per project (count, 0 to 30) and pledged funds (%, 0 to 15) against elapsed time of total project duration (0% to 100%). Legend: backer, ratio.]

(I) starting bursty phase: 50% of funds come from the first 25% of project duration.

(II) stationary phase: few activities happen in the middle 65% of project duration.

(III) final bursty phase: the success of 60% of projects depends on the final 10% of project duration.
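The three-phase split can be checked on a single project's daily record. The sketch below assumes the daily data is available as a list of cumulative pledged amounts, one entry per day; the function name `phase_shares` and its parameters are hypothetical. The 25% / 65% / 10% boundaries are the ones quoted on the slide.

```python
def phase_shares(cumulative, first=0.25, last=0.10):
    """Fraction of the final pledged total raised in each phase.

    cumulative: cumulative pledged amount at the end of each day.
    Returns (starting, stationary, final) shares, which sum to 1.
    """
    n = len(cumulative)
    start_days = max(1, round(n * first))
    final_days = max(1, round(n * last))
    if start_days + final_days >= n:
        raise ValueError("series too short to split into three phases")
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("project raised no funds")
    after_start = cumulative[start_days - 1]        # end of starting phase
    before_final = cumulative[n - final_days - 1]   # end of stationary phase
    return (after_start / total,
            (before_final - after_start) / total,
            (total - before_final) / total)
```

Averaging these shares over many projects would give curves comparable to the ones summarized on the slide, assuming the same phase boundaries.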


Dynamics of Crowdfunding

(1) Three phases of fundraising activities

[Figure (Kickstarter): average number of backers per project (count, 0 to 30) and pledged funds (%, 0 to 15) against elapsed time of total project duration (0% to 100%). Legend: backer, ratio.]

[Figure (Twitter): average number of promoters and patrons per project (count, 0 to 6) and patrons/promoters (%, 0 to 100) against elapsed time of total project duration (0% to 100%). Legend: promoter, patron, ratio.]

Social activities shift from promoting to investing.


Tasks (Early Prediction)

Starting bursty phase: 50% of funds from the first 25% of project duration.

Prediction Tasks:

(1) Predict number of backers (within 25% of project duration)

(2) Predict whether a project will succeed or fail (within 25% of project duration)


Features of early crowdfunding activities


Project features (SA), 7 features total: project duration, elapsed days since the launch, fundraising goal, median amount of pledge options, number of backers, ratio of current raised funds to fundraising goal, and average amount of pledge per backer.

Social activity features (SB), 6 features total: number of tweets, number of promoters, number of patrons, number of unique mentioned users, fraction of promoters from external stimulations, and fraction of patrons from external stimulations.

Social structure features (SC), 7 features total: average number of followers of promoters, median number of followers of promoters, number of edges, diameter (largest shortest distance), number of connected components, number of triads, and global clustering coefficient.
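Some of the social structure features (SC) listed above can be computed directly from a campaign graph. The sketch below assumes the graph is given as directed (source, target) edges between Twitter users, which this excerpt does not define, and it reads "triads" as closed triangles in the undirected version of the graph. The function name `structure_features` and its dictionary keys are hypothetical. It uses only the standard library and omits the diameter and the follower-count features.

```python
from itertools import combinations


def structure_features(edges, nodes=()):
    """Edges, weakly connected components, triangles, global clustering.

    edges: iterable of directed (source, target) pairs.
    nodes: optional extra nodes, e.g. promoters with no edges at all.
    """
    edge_set = {(u, v) for u, v in edges if u != v}  # drop self-loops
    # Undirected neighbour sets, used for weak connectivity and triangles.
    nbrs = {n: set() for n in nodes}
    for u, v in edge_set:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    # Count weakly connected components with an iterative depth-first search.
    seen, components = set(), 0
    for start in nbrs:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in nbrs[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)

    # A triangle is found once per corner, so "closed" is 3 x triangles.
    closed = sum(1 for n in nbrs
                 for a, b in combinations(nbrs[n], 2) if b in nbrs[a])
    triples = sum(len(s) * (len(s) - 1) // 2 for s in nbrs.values())
    return {
        "edges": len(edge_set),
        "components": components,
        "triangles": closed // 3,
        "clustering": closed / triples if triples else 0.0,
    }
```

The clustering value is the usual global coefficient: closed triples divided by all connected triples.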

Dynamics of Crowdfunding


(2) Correlations between features extracted in early stage and fundraising results

Correlation of the total number of backers at target deadline and the features extracted at specified time.

number of backers is strongly correlated to promotional activity


Table 2: Correlation of the total number of backers at target deadline and the features extracted at specified time.

                                 project duration (%)
  Top 5 Features                   5       10      25
  # of backers                   0.82    0.89    0.92
  # of patrons                   0.79    0.81    0.85
  # of promoters                 0.77    0.78    0.79
  # of tweets                    0.73    0.70    0.65
  # of connected components      0.70    0.69    0.68

Table 3: Correlation of the ratio of raised funds to fundraising goal at target deadline and the features extracted at specified time.

                                 project duration (%)
  Top 5 Features                   5       10      25
  raised funds/goal              0.77    0.89    0.95
  # of triads                    0.47    0.46    0.45
  # of edges                     0.43    0.43    0.38
  # of patrons                   0.26    0.26    0.24
  # of backers                   0.18    0.29    0.30

of 5% means that the correlation between the accumulated number of promoters by the end of the 5% period of the fundraising duration and the number of backers at the target deadline is 0.77. This correlation increases slightly to 0.78 and 0.79 at the end of the 10% and 25% periods, respectively.

For each table, we only report the top 5 features that have the highest correlation at each time frame. As shown in both tables, the feature that has the best correlation is the fundraising result itself. This indicates the early status of the fundraising result has good predictive power on its final value, and we should consider it as the baseline for each task. Interestingly, the other top features for both objectives are extracted from social media, instead of the properties of a project on its own.
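Each coefficient in Tables 2 and 3 relates a feature measured early in the campaign to an outcome measured at the deadline, across projects. This excerpt does not say which correlation measure was used; the sketch below assumes Pearson's r, and the function name and variable names are illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

For one cell of Table 2, `xs` would hold the number of promoters of each project at 5% of its duration and `ys` the number of backers of the same projects at the deadline.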

Table 2 shows that the number of backers is strongly correlated to the volume of promotional activities, i.e. the number of patrons, the number of promoters, and the number of tweets. On the other hand, Table 3 shows that the ratio of fundraising goal achievement is more correlated to the structure of campaign graphs, i.e. the number of triads and the number of edges. These observations are due to the fundamentally different properties of the volume and network structures. The number of backers is measured as the quantity of support and is primarily affected by the volume of promotion, whereas its success is determined by the quality of support, i.e. the amount of money. A popular project may receive lots of very small contributions, e.g. $1, from the general public but still not be able to reach the fundraising goal. However, a successful project would target specific groups that have interests in it and acquire more money from them. The number of edges and the number of triads both suggest the strength of information flows within groups. A large number of edges and/or triads is a sign of intensive interest among groups, and thus they are more correlated to the ratio of fundraising goal achievement.

Concurrent processes of social promotions and external stimulations.

Our above observations show that there are strong correlations between fundraising results and social features. We claim they are primarily due to the fact that if a project is more persuasive to intrigue authors in social networks to discuss it (either by social promotions or external stimulations), it is more attractive to investors. From the aspect of social promotions, we expect such a project to have a larger campaign graph. From the aspect of external stimulations, we expect there to be more groups in the campaign, each of which is represented by a weakly connected component, as each component can be viewed as started by an external stimulus.

[Figure 10 panels, each plotted against # of backers (log base 10, 1 to 3.5): (a) Number of edges (log base 10); (b) Diameter; (c) Number of connected components (log base 10); (d) Fraction of nodes from external stimulations (%).]

Figure 10: Each figure shows, given the promotional activity in the starting bursty phase, the value of a particular property as a function of the number of backers at target deadline. Top row and bottom row display the properties from the aspect of social promotions and external stimulations, respectively.

We consider the relationship between the number of backers and the social structure of its campaign in the starting bursty phase as an example to illustrate our claim. From the aspect of social promotions, as shown in Figure 10(a) and Figure 10(b), we notice that the number of backers increases with the number of edges and/or the diameter. Both features indicate the expansion of crowdfunding campaigns which is activated from the influences of social promotions. From the aspect of external stimulations, as can be seen in Figure 10(c), we observe a similar rise in the number of weakly connected components, which indicates the increase in the number of groups.

These results from both aspects suggest that social promotions and external stimulations concurrently activate the processes of crowdfunding campaigns. One may ask which one is more important. In Figure 10(d), we find that over 60% of nodes are from external stimulations. This reflects the fact that most authors access a project from external sources outside social media and are motivated by personal interests to promote it. However, we also notice if there are very few social promotions, which signals the unenthusiastic response in social media, the number of backers would be

Dynamics of Crowdfunding


Correlation of raised funds/goal (%) at target deadline and the features extracted at specified time.

success rate is more related to the structure of promotional campaign


Experiments (Early Prediction)

[Figure: mRSE (0.1 to 0.4) against elapsed time of total project duration (0% to 25%), comparing S7 with the baseline.]

[Figure: mRSE (0.25 to 0.55) against # of backers (log base 10, 0.5 to 3.5), comparing S7 with the baseline.]

very few social promotions, which signals the unenthusiastic response in social media, the number of backers would be very low. Hence, the effective way to promote a project is to stimulate from social media as well as stimulate from other channels. From the features extracted from social media, we could identify the bottlenecks of the current campaign and devise a better campaign strategy.

6. EXPERIMENTS

The previous section presented a set of properties that govern the process of crowdfunding. Now we show that this understanding is directly applicable to our two prediction tasks.

The practical applications of both tasks would be to predict project results as early as possible. Accordingly, we perform the following tasks using only the information available in the starting bursty phase (within 25% of project duration after the project is posted in crowdfunding sites). We consider the features presented in the previous sections for our prediction tasks. Note that besides focusing on prediction accuracy, we also aim at obtaining explanatory insights into what features are the most relevant and useful as signals for crowdfunding processes.

6.1 Predicting the number of backers

Recall from Section 4 that our first task is to predict the number of backers of a project at target deadline. As suggested in [?, ?], the Relative Squared Error (RSE) is more relevant and meaningful than absolute quadratic error for online content popularity prediction. We here adopt RSE to evaluate the performance of our first task. Let N(p) be the total number of backers project p has received at target deadline, and N(p|t) be the total number of backers predicted for project p based on data from the first t days. The RSE takes the form of

\mathrm{RSE} = \left( \frac{N(p|t) - N(p)}{N(p)} \right)^2 = \left( \frac{N(p|t)}{N(p)} - 1 \right)^2    (1)

For the collection of projects, denoted by C, the mean Relative Squared Error (mRSE) is defined as the arithmetic mean of the RSE values for all projects in C, that is:

\mathrm{mRSE} = \frac{1}{|C|} \sum_{p \in C} \left( \frac{\theta_t \cdot X(p, t)}{N(p)} - 1 \right)^2    (2)
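Equations (1) and (2) translate directly into code. The sketch below is a minimal illustration; the function names `rse` and `mrse` and the example numbers are made up, and predictions are passed in as plain numbers rather than computed from features.

```python
def rse(predicted, actual):
    """Relative Squared Error of one prediction, as in Eq. (1)."""
    if actual == 0:
        raise ValueError("RSE is undefined when the true count is zero")
    return (predicted / actual - 1.0) ** 2


def mrse(predictions, actuals):
    """Mean RSE over a collection of projects, as in Eq. (2)."""
    if len(predictions) != len(actuals) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(rse(p, a) for p, a in zip(predictions, actuals))
    return total / len(actuals)
```

Because the error is relative, predicting 90 backers for a project that ends with 100 costs the same as predicting 220 for a project that ends with 200: both contribute an RSE of about 0.01.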

The correlation between crowdfunding activity and the future number of backers observed in the previous section suggests that the number of backers can be estimated as a linear function of features. Thus, we apply a multivariate linear model [?] to predict the number of backers. We denote the feature vector of the project p at time t as X(p, t). The future number of backers of the project can be estimated as

N(p|t) = \theta_t \cdot X(p, t)    (3)

where \theta_t is the vector of model parameters and depends only on t.

Given a training set C and a time t, we can estimate the optimal values for the elements of \theta_t as the ones that minimize the mRSE on C, i.e.:

\arg\min_{\theta_t} \frac{1}{|C|} \sum_{p \in C} \left( \frac{\theta_t \cdot X(p, t)}{N(p)} - 1 \right)^2    (4)

which is an ordinary least squares (OLS) problem that can be solved via a singular value decomposition [?, ?].

[Figure 11: The performance of the prediction of the number of backers; mRSE (0.1 to 0.4) against elapsed time of total project duration (0% to 25%), S7 compared with the baseline.]

[Figure 12: The mRSE shown as a function of the number of backers at target deadline; mRSE (0.25 to 0.55) against # of backers (log base 10, 0.5 to 3.5), S7 compared with the baseline.]

We conduct feature selection and find a core set of 7 features that we use in this task: number of backers, number of tweets, number of promoters, number of patrons, fraction of promoters from external influence, number of edges, and number of weakly connected components. Note that except for the number of backers, all the other six features are extracted from social media. The first four of the six features are from social activity features, while the last two are from social structure features. We call this set S7 and compare it against the baseline that only uses the number of backers. We perform 5-fold cross validation and report the mRSE.
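The fit in Eq. (4) becomes a plain least-squares problem once each feature vector is divided by N(p): the target is then 1 for every project. The sketch below illustrates that reduction in pure Python. It solves the normal equations with Gauss-Jordan elimination, which is a substitution for the SVD-based solver mentioned in the text, chosen only to keep the example dependency-free. The names `fit_theta` and `predict` and the toy data are hypothetical.

```python
def fit_theta(features, backers):
    """Return theta minimising sum((theta . X(p,t) / N(p) - 1)^2)."""
    # Dividing each row by N(p) turns Eq. (4) into OLS with target 1.
    rows = [[x / n for x in xs] for xs, n in zip(features, backers)]
    k = len(rows[0])
    # Normal equations (A^T A) theta = A^T 1 as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] for r in rows)] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(theta, xs):
    """Estimated final number of backers, as in Eq. (3)."""
    return sum(t * x for t, x in zip(theta, xs))
```

With the S7 feature set, each row of `features` would hold the seven values observed at time t for one training project. For ill-conditioned feature matrices an SVD-based least-squares routine is the more robust choice.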

Figure 11 gives the results of the prediction task. We can observe that the features extracted from social media (S7), inspired by our previous analyses, improve the performance of prediction. For instance, when using only the data available up to 5% of project duration, the mRSE produced by S7 and by the baseline are 0.337 and 0.357, respectively, leading to an error reduction of 5.5%. If the time window is extended to 25% of project duration (after which half of the backers have arrived; see Section 5.1), both mRSE values are significantly reduced, to about 0.16. The performance gains from using social media features are gradually reduced. This is reasonable: as the scale of the backers reveals more information about the final count, predictions simply become easier. However, it is less meaningful to do such predictions at this stage, as it might be too late for project owners to adjust their strategies.

To further understand the performance of prediction for different popularities, in Figure 12, we analyze the mRSE values produced by comparing S7 with baseline at 5% of

very few social promotions, which signals an unenthusiastic response in social media, the number of backers would be very low. Hence, an effective way to promote a project is to stimulate it through social media as well as through other channels. From the features extracted from social media, we could identify the bottlenecks of the current campaign and devise a better campaign strategy.

6. EXPERIMENTS

The previous section presented a set of properties that govern the process of crowdfunding. Now we show that this understanding is directly applicable to our two prediction tasks.

The practical application of both tasks would be to predict project results as early as possible. Accordingly, we perform the following tasks using only the information available in the starting bursty phase (within 25% of project duration after the project is posted on crowdfunding sites). We consider the features presented in the previous sections for our prediction tasks. Note that besides focusing on prediction accuracy, we also aim at obtaining explanatory insights into which features are the most relevant and useful as signals for crowdfunding processes.

6.1 Predicting the number of backers

Recall from Section 4 that our first task is to predict the number of backers of a project at target deadline. As suggested in [?, ?], the Relative Squared Error (RSE) is more relevant and meaningful than absolute quadratic error for online content popularity prediction. We here adopt RSE to evaluate the performance of our first task. Let N(p) be the total number of backers project p has received at target deadline, and N(p|t) be the total number of backers predicted for project p based on data from the first t days. The RSE takes the form of

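$$\mathrm{RSE} = \left( \frac{N(p|t) - N(p)}{N(p)} \right)^2 = \left( \frac{N(p|t)}{N(p)} - 1 \right)^2 \tag{1}$$

For the collection of projects, denoted by C, the mean Relative Squared Error (mRSE) is defined as the arithmetic mean of the RSE values for all projects in C, that is:

$$\mathrm{mRSE} = \frac{1}{|C|} \cdot \sum_{p \in C} \left( \frac{\theta_t \cdot X(p, t)}{N(p)} - 1 \right)^2 \tag{2}$$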
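As a minimal sketch of the metric defined in Eq. (1) and Eq. (2), the snippet below computes the RSE of a single prediction and the mRSE over a small collection. The function names and the numbers are hypothetical and chosen only for illustration; they are not taken from the paper's data.

```python
from typing import Sequence


def rse(predicted: float, actual: float) -> float:
    """Relative squared error of one project, Eq. (1): (N(p|t) / N(p) - 1)^2."""
    if actual <= 0:
        raise ValueError("actual number of backers must be positive")
    return (predicted / actual - 1.0) ** 2


def mrse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean RSE over a collection of projects, Eq. (2)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(rse(p, a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical numbers: being 10% off costs the same for a small project
# as for a large one, which is the point of a relative error measure.
print(rse(110, 100))     # about 0.01
print(rse(1100, 1000))   # about 0.01
print(mrse([110, 45, 1000], [100, 50, 1000]))
```

Because the error is relative, a project with thousands of backers does not dominate the average the way it would under an absolute squared error.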

The correlation between crowdfunding activity and the future number of backers observed in the previous section suggests that the number of backers can be estimated as a linear function of features. Thus, we apply a multivariate linear model [?] to predict the number of backers. We denote the feature vector of the project p at time t as X(p, t). The future number of backers of the project can be estimated as

$$N(p|t) = \theta_t \cdot X(p, t) \tag{3}$$

where θ_t is the vector of model parameters and depends only on t.

Given a training set C and a time t, we can estimate the optimal values for the elements of θ_t as the ones that minimize the mRSE on C, i.e.:

$$\arg\min_{\theta_t} \frac{1}{|C|} \sum_{p \in C} \left( \frac{\theta_t \cdot X(p, t)}{N(p)} - 1 \right)^2 \tag{4}$$

which is an ordinary least squares (OLS) problem, with each feature vector X(p, t) rescaled by 1/N(p) and a constant target of 1, that can be solved via a singular value decomposition [?, ?].

[Figure 11: The performance of the prediction of the number of backers. x-axis: elapsed time of total project duration (%); y-axis: mRSE; series: S7, Baseline.]

[Figure 12: The mRSE shown as a function of the number of backers at target deadline. x-axis: # of backers (log base 10); y-axis: mRSE; series: S7, Baseline.]

We conduct feature selection and find a core set of 7 features that we use in this task: number of backers, number of tweets, number of promoters, number of patrons, fraction of promoters from external influence, number of edges, and number of weakly connected components. Note that except for the number of backers, the other six features are all extracted from social media. The first four of these six are social activity features, while the last two are social structure features. We call this set S7 and compare it against the baseline that only uses the number of backers. We perform 5-fold cross validation and report the mRSE.
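The minimization in Eq. (4) can be sketched as follows. Dividing each feature vector by N(p) reduces the problem to ordinary least squares with a target of 1 for every project. The text mentions a singular value decomposition; to stay dependency-free, this sketch instead solves the normal equations with Gaussian elimination, which is an assumption of the example and not the authors' implementation. The function names and the two-feature toy data are likewise hypothetical.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_theta(features: Sequence[Sequence[float]],
              backers: Sequence[float]) -> List[float]:
    """Minimise sum_p (theta . X(p,t) / N(p) - 1)^2 over theta, as in Eq. (4)."""
    # Rows z_p = X(p,t) / N(p); the regression target is 1 for every project.
    z = [[v / n_p for v in x_p] for x_p, n_p in zip(features, backers)]
    d = len(z[0])
    # Normal equations: (Z^T Z) theta = Z^T 1.
    gram = [[sum(row[i] * row[j] for row in z) for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] for row in z) for i in range(d)]
    return solve_linear_system(gram, rhs)


def predict(theta: Sequence[float], x: Sequence[float]) -> float:
    """Eq. (3): N(p|t) = theta_t . X(p,t)."""
    return sum(t * v for t, v in zip(theta, x))


# Made-up data: [backers so far, tweets so far] and the final number of backers.
X = [[10.0, 5.0], [20.0, 2.0], [50.0, 30.0], [5.0, 1.0]]
N = [40.0, 50.0, 210.0, 13.0]
theta = fit_theta(X, N)
print(theta)
print([round(predict(theta, x), 1) for x in X])
```

A separate θ_t would be fitted for each observation time t, since the parameters depend only on t. With many correlated features, an SVD-based solver is the numerically safer choice over the normal equations used here.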

Figure 11 gives the results of the prediction task. We can observe that the features extracted from social media (S7), inspired by our previous analyses, improve the performance of prediction. For instance, when using only the data available up to 5% of project duration, the mRSE values produced by S7 and by the baseline are 0.337 and 0.357, respectively, an error reduction of 5.5%. If the time window is extended to 25% of project duration, after which half of the backers have arrived (see Section 5.1), both mRSE values are significantly reduced to about 0.16. The performance gain from using social media features gradually shrinks. This is reasonable: as the number of backers so far reveals more information about the final count, predictions simply become easier. However, it is less meaningful to make such predictions at this stage, as it might be too late for project owners to adjust their strategies.

To further understand the performance of prediction for different popularities, in Figure 12, we analyze the mRSE values produced by comparing S7 with baseline at 5% of

Experiments (Early Prediction)

(1) Predict number of backers (within 25% of project duration)

X(p, t): feature vector

[Plots: mRSE vs. elapsed time of total project duration (%), and mRSE vs. # of backers (log base 10); series: S7, Baseline]



5.5% error reduction
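As a rough sanity check (not the authors' exact computation), the reduction can be recomputed from the rounded mRSE values quoted for the 5% time window, 0.337 for S7 and 0.357 for the baseline:

$$\frac{0.357 - 0.337}{0.357} = \frac{0.020}{0.357} \approx 0.056$$

The rounded figures give roughly 5.6%, close to the reported 5.5%; the small gap is presumably due to rounding of the two mRSE values.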


(2) Predict whether a project will succeed or fail (within 25% of project duration)

[Plots: AUC and accuracy (%) vs. elapsed time of total project duration (5%, 10%, 25%); series: A; A,B; A,B,C]

A: Project features; B: Social activity features; C: Social structure features

Page 27: Multiple Views and Multiple Sources on Crowdfundingclu/doc/wsdm14_slides.pdf · Multiple Views and Multiple Sources Ben Tan, Erheng Zhong, Evan Wei Xiang, Qiang Yang ... of the crowdfunding

5 10 250.8

0.85

0.9

0.95

Elapsed time of total project duration (%)AU

C

A A,B A,B,C

(2) Predict whether a project will succeed or fail (within 25% of project duration)

5 10 2570

75

80

85

Elapsed time of total project duration (%)

Accu

racy

A A,B A,B,C

A: Project features; B:Social activity features; C: Social structure feature

Over 75% accuracy using features within only 5% of project duration (2~3 days)


Conclusion & Future Work

1. Temporal distribution of attention is affected by both freshness and the approaching deadline.

2. Fundraising results are more correlated with social promotional activities than with the project's own properties.
   (1) The number of backers is strongly correlated with promotional activity.
   (2) The success rate is more related to the structure of the promotional campaign.

3. Qualitative analysis: why do users invest in a project?


Related work in this session

1. A Better World for All: Understanding and Promoting Micro-finance Activities in kiva.org
   Focus on the crowdfunding platform itself. Key factors that encourage donation. Predict how likely a given lender will fund a new loan.

2. Who Watches (and Shares) What on YouTube? And When? Using Twitter to Understand YouTube Viewership
   Analyze correlated behaviors within Twitter and YouTube. Insights into who watches/shares what on YouTube, and when.


Q&A


Thank you!