Why Do We Need Web Science Research?

Why Do We Need Web Science Research?

December 2009

Sangki Han, Ph.D.

Professor / GSCT

KAIST

2009년 12월 2일 수요일

Web Science by Tim Berners-Lee in 2006

2

2009년 12월 2일 수요일

This is the first report of time-dependentseismic tomography applied to an eruptingvolcano. It builds on earlier work of thesame kind done in geothermal areas inCalifornia and Iceland and the Long ValleyCaldera, California. But the seminal exam-ple of major changes in V

P/V

Scomes from

The Geysers geothermal area in northernCalifornia.

During the 1980s and 1990s, some13,600 tons of steam per hour were extractedfrom The Geysers to generate electricity. Asa result of this overexploitation, the reser-voir became progressively depleted as porewater was replaced by steam. Repeat seis-mic tomography showed the steady growthof a reservoir-wide negative V

P/V

Sanomaly

that coincided with the steam-productionzone. This anomaly was caused by the com-bined effects of the replacement of pore liq-uid with steam, the resulting decrease inpressure, and the drying of clay minerals. Aremarkable series of snapshots showed therelentless growth of a volume of heavydepletion (3, 4). The work helped to increaseawareness of the nonsustainability of suchhigh rates of fluid withdrawal. Production atThe Geysers has now been reduced to sus-tainable levels. Time-dependent tomogra-phy is currently used to monitor the CosoGeothermal Area, southern California (5).

Time-dependent seismic tomographywas first applied to a volcano in a study ofMammoth Mountain, a volcano on the rimof Long Valley Caldera, California. In 1989,an intense swarm of hundreds of earth-quakes accompanied an injection of newmagma into the roots of this volcano,and triggered the outpouring of some300 tons of CO

2per day from the vol-

cano’s surface. Several broad swathsof trees died as a result of high levelsof CO

2in the soil, and the CO

2

also presented an asphyxiation hazardto humans. A comparison of V

P/V

Stomo-

graphic images calculated for 1989 and1997 showed changes that correlated wellwith areas of tree death on the surface above,and were attributed to migration of CO

2in

the volcano (6).By showing that time-dependent seismic

tomography can be used to monitor struc-tural changes directly associated with a vol-canic eruption cycle, Patanè et al. take a crit-ical step toward developing a useful volcano-hazard-reduction tool based on seismictomography. As with all good experiments,however, it ushers in new challenges. V

P/V

S

is affected by several factors, including porefluid phase, pressure, mineralogy, and frac-ture density. However, determining how each

of these has changed when changes in onlytwo quantities (V

Pand V

S) have been mea-

sured is not possible and requires the addi-tion of other kinds of data. Both theoreticaladvances and more data from different vol-canoes are needed before the potential of themethod can be fully assessed.

At present, monitoring of active volca-noes still rests mostly on relatively unso-phisticated seismic networks and the moni-toring of simple parameters, such as thenumbers of earthquakes and the amplitudeof harmonic tremor. Patanè et al. show thatmuch more sophisticated methods can nowbe used. Some of these methods only need tobe automated—a critical factor if they are tobe useful in situations where information isneeded on an hourly basis. It is hoped thatthis automation work will be pushed for-ward rapidly in the near future, putting us on

track to realizing technological capabilitiesresembling those of the fictional VirtualGeophysical Laboratory by 2025.

References and Notes

1. D. Patanè, G. Barberi, O. Cocina, P. De Gori, C.Chiarabba, Science 313, 821 (2006).

2. The compressional and shear waves are the fastest andsecond-fastest waves to be radiated from an earthquakesource, so they arrive first and second on seismograms.Their ratio provides information about pressure andabout the presence of gas and liquid in the study volume.Thus, changes in their ratio can tell us about changes inpressure and gas/liquid, which are thought to accompanythe buildup and occurrence of a volcanic eruption.

3. G. R. Foulger, C. C. Grant, A. Ross, B. R. Julian, Geophys.Res. Lett. 24, 135 (1997).

4. R. C. Gunasekera, G. R. Foulger, B. R. Julian, J. Geophys.Res. 108, 2134 (2003).

5. G. R. Foulger, B. R. Julian, Geotherm. Resour. Counc.Bull. 33, 120 (2004).

6. G. R. Foulger et al., J. Geophys. Res. 108, 2147 (2003).

10.1126/science.1131790

769

Since its inception, the World WideWeb has changed the ways scientistscommunicate, collaborate, and edu-

cate. There is, however, a growing realiza-tion among many researchers that a clear

research agenda aimedat understanding the

current, evolving,and potential Web is

needed. If we want tomodel the Web; if we

want to understand the architectural princi-ples that have provided for its growth; and ifwe want to be sure that it supports the basicsocial values of trustworthiness, privacy,and respect for social boundaries, then wemust chart out a research agenda that targetsthe Web as a primary focus of attention.

When we discuss an agenda for a scienceof the Web, we use the term “science” in twoways. Physical and biological science ana-

lyzes the natural world, and tries to findmicroscopic laws that, extrapolated to themacroscopic realm, would generate thebehavior observed. Computer science, bycontrast, though partly analytic, is princi-pally synthetic: It is concerned with the con-struction of new languages and algorithmsin order to produce novel desired computerbehaviors. Web science is a combination ofthese two features. The Web is an engineeredspace created through formally specifiedlanguages and protocols. However, becausehumans are the creators of Web pages andlinks between them, their interactions formemergent patterns in the Web at a macro-scopic scale. These human interactions are,in turn, governed by social conventions andlaws. Web science, therefore, must be inher-ently interdisciplinary; its goal is to bothunderstand the growth of the Web and to cre-ate approaches that allow new powerful andmore beneficial patterns to occur.

Unfortunately, such a research area doesnot yet exist in a coherent form. Withincomputer science, Web-related research haslargely focused on information-retrievalalgorithms and on algorithms for the routingof information through the underlying Inter-net. Outside of computing, researchers grow

Understanding and fostering the growth of the World Wide Web, both in engineering and societal

terms, will require the development of a new interdisciplinary field.

Creating a Science of the WebTim Berners-Lee, Wendy Hall, James Hendler, Nigel Shadbolt, Daniel J. Weitzner

COMPUTER SCIENCE

T. Berners-Lee and D. J. Weitzner are at the Computer Scienceand Artificial Intelligence Laboratory, Massachusetts Instituteof Technology, Cambridge, MA 02139, USA. W. Hall and N. Shadbolt are in the School of Electronics and ComputerScience, University of Southampton, Southampton SO171BJ, UK. J. Hendler is in the Computer Science Department,University of Maryland, College Park, MD 20742, USA. E-mail: [email protected]

Enhanced online at

www.sciencemag.org/cgi/content/full/313/5788/769

www.sciencemag.org SCIENCE VOL 313 11 AUGUST 2006

PERSPECTIVES

Published by AAAS

on

De

ce

mb

er

1,

20

09

w

ww

.scie

nce

ma

g.o

rgD

ow

nlo

ad

ed

fro

m

3

2009년 12월 2일 수요일

A New Discipline

Model the Web’s structure

Articulate the architectural principles that have fueled its phenomenal growth

Discover how online human interactions are driven by and can change social conventions

4

2009년 12월 2일 수요일

Interdisciplinary Approach

5

2009년 12월 2일 수요일

WSRI & Web Science Trust

The Web Science Research Initiative (WSRI) is a joint endeavour between the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT and the School of Electronics and Computer Science (ECS) at the University of Southampton. The goal of WSRI is to facilitate and produce the fundamental scientific advances necessary to inform the future design and use of the World Wide Web

Publication: Foundations and Trends in Web Science

Events

– Web Science Summer Graduate School

– WebSci09 - Society On-Line

Directors of WSRI are establishing a charitable body - the Web Science Trust (WST)

– Working with WWW Foundation

6

2009년 12월 2일 수요일

WebSci’09: Society On-Line

Understanding of both human behavior and technological design

– How do people and organisations behave on-line – what motivates them to shop, date, make friends, learn, participate in political life or manage their health or tax on-line?

– Which Web-based designs will they trust? To which on-line agents will they delegate?

– How can the dark side of the Web – such as cybercrime, pornography and terrorist networks – be both understood and held in check without compromising the experience of others?

– What are the effects of varying characteristics of Web-based technologies – such as security, privacy, network structure, the linking of data – on on-line behaviour, both criminal and non-criminal?

– And how can the design of the Web of the future ensure that a system on which – as Tim Berners-Lee put it – democracy and commerce depends remains 'stable and pro-human'?

7

Identified the following areas of on-line society and Web development for particular attention:

– E-commerce

– Government and Political Life

– Social Relationships

– Cybercrime and/or the Prevention Thereof

– Health

– Culture On-Line

– E-Learning

The cross-cutting infrastructure issues on which these areas depend including, but not limited to:

– Linked Data and the Semantic Web

– Trust and Reputation

– Security and Privacy

– Networking (Social and Technical)

2009년 12월 2일 수요일

8

32 SC IE NTIF IC AME RIC AN Oc tober 20 0 8

Since the World Wide Web blossomed in the mid-1990s, it has exploded to more than 15 billion pages that touch almost

all aspects of modern life. Today more and more people’s jobs depend on the Web. Media, bank-ing and health care are being revolutionized by it. And governments are even considering how to run their countries with it. Little appreciated, however, is the fact that the Web is more than the sum of its pages. Vast emergent properties have arisen that are transforming society. E-mail led to instant messaging, which has led to social net-works such as Facebook. The transfer of docu-ments led to file-sharing sites such as Napster, which have led to user-generated portals such as YouTube. And tagging content with labels is cre-ating online communities that share everything from concert news to parenting tips.

But few investigators are studying how such emergent properties have actually blossomed, how we might harness them, what new phe-nomena may be coming or what any of this might mean for humankind. A new branch of science—Web science—aims to address such is-sues. The timing fits history: computers were built first, and computer science followed,

which subsequently improved computing significantly. Web science was launched as a formal discipline in November 2006, when the two of us and our col-leagues at the Massachusetts In-stitute of Technology and the University of Southampton in England announced the begin-

ning of a Web Science Research Initiative. Lead-ing researchers from 16 of the world’s top uni-versities have since expanded on that effort.

This new discipline will model the Web’s structure, articulate the architectural principles that have fueled its phenomenal growth, and dis-cover how online human interactions are driven by and can change social conventions. It will elu-cidate the principles that can ensure that the net-work continues to grow productively and settle complex issues such as privacy protection and in-tellectual-property rights. To achieve these ends, Web science will draw on mathematics, physics, computer science, psychology, ecology, sociolo-gy, law, political science, economics, and more.

Of course, we cannot predict what this na-scent endeavor might reveal. Yet Web science has already generated crucial insights, some presented here. Ultimately, the pursuit aims to answer fundamental questions: What evolu-tionary patterns have driven the Web’s growth? Could they burn out? How do tipping points arise, and can that be altered?

Insights AlreadyAlthough Web science as a discipline is new, earlier research has revealed the potential value of such work. As the 1990s progressed, search-ing for information by looking for key words among the mounting number of pages was returning more and more irrelevant content. The founders of Google, Larry Page and Sergey Brin, realized they needed to prioritize the results.

Their big insight was that the importance of a page—how relevant it is—was best understood in terms of the number and importance of the pages linking to it. The difficulty was that part of this definition is recursive: the importance of a page is determined by the importance of the

INFORMATION TECHNOLOGY

KEY CONCEPTS The relentless rise in Web pages and links is creating emer-gent properties, from social net-working to virtual identity theft, that are transforming society.

A new discipline, Web science, aims to discover how Web traits arise and how they can be harnessed or held in check to benefit society.

Important advances are begin-ning to be made; more work can solve major issues such as securing privacy and conveying trust.

—The Editors

By Nigel Shadbolt and Tim Berners-Lee

Studying the Web will reveal better

ways to exploit information,

prevent identity theft,

revolutionize industry and manage

our ever growing online lives

EMERGESWeb Science

2009년 12월 2일 수요일

Power-Law Distribution of theWorld Wide Web

Barabasi and Albert (1) propose an im-proved version of the Erdos-Renyi (ER) the-ory of random networks to account for thescaling properties of a number of systems,including the link structure of the WorldWide Web (WWW). The theory they present,however, is inconsistent with empirically ob-served properties of the Web link structure.

Barabasi and Albert write that because“of the preferential attachment, a vertexthat acquires more connections than anoth-er one will increase its connectivity at ahigher rate; thus, an initial difference in theconnectivity between two vertices will in-crease further as the network grows. . . .Thus older . . . vertices increase their con-nectivity at the expense of the younger . . .ones, leading over time to some verticesthat are highly connected, a ‘rich-get-rich-er’ phenomenon” [figure 2C of (1)]. It isthis prediction of the Barabasi-Albert (BA)model, however, that renders it unable toaccount for the power-law distribution oflinks in the WWW [figure 1B of (1)].

We studied a crawl of 260,000 sites, eachone representing a separate domain name. Wecounted how many links the sites received

from other sites, and found that the distribu-tion of links followed a power law (Fig. 1A).Next, we queried the InterNIC database (us-ing the WHOIS search tool at www.networksolutions.com) for the date on whichthe site was originally registered. Whereasthe BA model predicts that older sites havemore time to acquire links and gather links ata faster rate than newer sites, the results ofour search (Fig. 1B) suggest no correlationbetween the age of a site and its number oflinks.

The absence of correlation between ageand the number of links is hardly surpris-ing; all sites are not created equal. Anexciting site that appears in 1999 will soonhave more links than a bland site created in1993. The rate of acquisition of new links isprobably proportional to the number oflinks the site already has, because the morelinks a site has, the more visible it becomesand the more new links it will get. (Thereshould, however, be an additional propor-tionality factor, or growth rate, that variesfrom site to site.)

Our recently proposed theory (2), whichaccounts for the power-law distribution in thenumber of pages per site, can also be appliedto the number of links a site receives. In thismodel, the number of new links a site re-ceives at each time step is a random fractionof the number of links the site already has.New sites, each with a different growth rate,appear at an exponential rate. This modelyields scatter plots similar to Fig. 1B, and canproduce any power-law exponent ! " 1.

Lada A. AdamicBernardo A. Huberman

Xerox Palo Alto Research Center3333 Coyote Hill Road

Palo Alto, CA 94304, USAE-mail: [email protected]

References1. A.-L. Barabasi and R. Albert, Science 286, 509 (1999).2. B. A. Huberman and L. A. Adamic, Nature 401, 131(1999).

10 November 1999; accepted 4 February 2000

Response: Adamic and Huberman offer ad-ditional support for the evolutionary networkmodel that we offered (1). The apparent messin their fig. 1B is rooted in their choice not toaverage their data. We believe that taking theaverage over all points of the same age, andextracting the trends within those averages,would have unveiled the increasing tendencypredicted by our model.

Although we do not have access to their

data, we can illustrate the same procedure forthe network of movie actors that we dis-cussed (1). When the connectivity of the in-dividual actors is plotted as a function of therelease year of their first movie (Fig. 1A), theresults are very similar to those shown in fig.1B of Adamic and Huberman’s comment.The only difference is that the movie industryhad its boom not 4 years ago, as did theWWW, but rather at the beginning of thecentury; thus, the apparently structureless re-gime persists much longer. When the connec-tivity of the actors that debuted in the sameyear is averaged, however, the average con-nectivity in the last 60 years increases withthe actor’s age, in line with the predictions ofour theory, and the curve follows a power lawfor almost a hundred years (Fig. 1B). Weexpect that a similar increasing tendencywould appear for the WWW data after aver-aging, but the length of the scaling intervalwould be limited by the Web’s comparativelybrief history.

The fluctuations that lead to the appar-ent randomness of Fig. 1A are due to theindividual differences in the rate at whichnodes increase their connectivity. It iseasy to include such differences in themodel and continuum theory proposed by

Fig. 1. (A) The distribution function for thenumber of links, k, to Web sites (from crawl inspring 1997). The dashed line has slope ! #1.94. (B) Scatter plot of the number of links, k,versus age for 120,000 sites. The correlationcoefficient is 0.03.

Fig. 1. (A) Scatter plot of movie actor connec-tivity, k (the number of other actors with whichhe or she performed during his or her career),versus the year of debut. All actors from theInternet Movie Database were included; n #392,340. (B) Average movie actor connectivity,$k%, versus year of debut. To determine $k%, k isaveraged over all actors that debuted in thesame year. The curve shows a systematic in-crease in the average connectivity with theactor’s professional lifetime, t # (2000 & yearof debut). The dotted line follows k(t) ' t(,with ( # 0.49, very close to the prediction ( #0.5 of (1). Inset shows a log-log plot of $k% as afunction of t, which illustrates the presence ofscaling in the last century. The dotted line hasslope 0.5.

T E C H N I C A L C O M M E N T

www.sciencemag.org SCIENCE VOL 287 24 MARCH 2000 2115a

Model the Web’s Structure

PageRank by Page and Brin Web is a scale-free network -.

Northeastern University’s Albert-László Barabási

Web as having short paths and small worlds– While at Cornell University in the

1990s, Duncan J. Watts and Steven H. Strogatz

– Even though the Web was huge, a user could get from one page to any other page in at most 14 clicks

9

2009년 12월 2일 수요일

Analysis of Topological Characteristicsof Huge Online Social Networking Services

Yong-Yeol AhnDepartment of PhysicsKAIST, Deajeon, Korea

[email protected]

Seungyeop Han!

NHN Corp.Korea

[email protected]

Haewoon KwakDivision of Computer Science

KAIST, Daejeon, Korea

[email protected]

Sue MoonDivision of Computer Science

KAIST, Daejeon, Korea

[email protected]

Hawoong JeongDepartment of PhysicsKAIST, Deajeon, Korea

[email protected]

ABSTRACT

Social networking services are a fast-growing business in theInternet. However, it is unknown if online relationships andtheir growth patterns are the same as in real-life social net-works. In this paper, we compare the structures of threeonline social networking services: Cyworld, MySpace, andorkut, each with more than 10 million users, respectively.We have access to complete data of Cyworld’s ilchon (friend)relationships and analyze its degree distribution, clusteringproperty, degree correlation, and evolution over time. Wealso use Cyworld data to evaluate the validity of snowballsampling method, which we use to crawl and obtain par-tial network topologies of MySpace and orkut. Cyworld,the oldest of the three, demonstrates a changing scaling be-havior over time in degree distribution. The latest Cyworlddata’s degree distribution exhibits a multi-scaling behavior,while those of MySpace and orkut have simple scaling be-haviors with di!erent exponents. Very interestingly, eachof the two exponents corresponds to the di!erent segmentsin Cyworld’s degree distribution. Certain online social net-working services encourage online activities that cannot beeasily copied in real life; we show that they deviate fromclose-knit online social networks which show a similar de-gree correlation pattern to real-life social networks.

Categories and Subject Descriptors: J.4 [ComputerApplications]: Social and behavioral sciences

General Terms: Human factors, Measurement

Keywords: Sampling, Social network

1. INTRODUCTION

The Internet has been a vessel to expand our social net-works in many ways. Social networking services (SNSs) areone successful example of such a role. SNSs provide an on-line private space for individuals and tools for interactingwith other people in the Internet. SNSs help people findothers of a common interest, establish a forum for discus-sion, exchange photos and personal news, and many more.

!This work was conducted while Han was at KAIST.

Copyright is held by the International World Wide Web Conference Com-mittee (IW3C2). Distribution of these papers is limited to classroom use,and personal use by others.WWW 2007, May 8–12, 2007, Banff, Alberta, Canada.ACM 978-1-59593-654-7/07/0005.

Cyworld, the largest SNS in South Korea, had already 10million users 2 years ago, one fourth of the entire populationof South Korea. MySpace and orkut, similar social network-ing services, have also more than 10 million users each. Re-cently, the number of MySpace users exceeded 130 millionwith a growing rate of over a hundred thousand people perday. It is reported that these SNSs “attract nearly half of allweb users” [1]. The goal of these services is to help peopleestablish an online presence and build social networks; andto eventually exploit the user base for commercial purposes.Thus the statistics and dynamics of these online social net-works are of tremendous importance to social networkingservice providers and those interested in online commerce.

The notion of a network structure in social relations datesback about half a century. Yet, the focus of most sociologicalstudies has been interactions in small groups, not structuresof large and extensive networks. Di"culty in obtaining largedata sets was one reason behind the lack of structural study.However, as reported in [2] recently, missing data may dis-tort the statistics severely and it is imperative to use largedata sets in network structure analysis.

It is only very recently that we have seen research re-sults from large networks. Novel network structures fromhuman societies and communication systems have been un-veiled; just to name a few are the Internet and WWW [3] andthe patents, Autonomous Systems (AS), and a"liation net-works [4]. Even in the short history of the Internet, SNSs area fairly new phenomenon and their network structures arenot yet studied carefully. The social networks of SNSs arebelieved to reflect the real-life social relationships of peoplemore accurately than any other online networks. Moreover,because of their size, they o!er an unprecedented opportu-nity to study human social networks.

In this paper, we pose and answer the following questions:What are the main characteristics of online social net-

works? Ever since the scale-free nature of the World-WideWeb network has been revealed, a large number of networkshave been analyzed and found to have power-law scaling indegree distribution, large clustering coe"cients, and smallmean degrees of separation (so called the small-world phe-nomenon). The networks we are interested in this work arehuge and those of this magnitude have not yet been ana-lyzed.

How representive is a sample network? In most networks,

Analysis on CyWorld

10

2009년 12월 2일 수요일

Social Drivers

Discover how online human interactions are driven by and can change social conventions– Social drivers-goals, desires, interests and

attitudes-are fundamental aspects of how links are made

– Understanding the Web requires insights from sociology and psychology every bit as much as from mathematics and computer science

– Stanley Milgram (1967) vs. Duncan Watts (2003)

11

An Experimental Study of Searchin Global Social Networks

Peter Sheridan Dodds,1 Roby Muhamad,2 Duncan J. Watts1,2*

We report on a global social-search experiment in which more than 60,000e-mail users attempted to reach one of 18 target persons in 13 countries byforwarding messages to acquaintances. We find that successful social search isconducted primarily through intermediate to weak strength ties, does notrequire highly connected “hubs” to succeed, and, in contrast to unsuccessfulsocial search, disproportionately relies on professional relationships. By ac-counting for the attrition of message chains, we estimate that social searchescan reach their targets in a median of five to seven steps, depending on theseparation of source and target, although small variations in chain lengths andparticipation rates generate large differences in target reachability. We con-clude that although global social networks are, in principle, searchable, actualsuccess depends sensitively on individual incentives.

It has become commonplace to assert that anyindividual in the world can reach any otherindividual through a short chain of social ties(1, 2). Early experimental work by Traversand Milgram (3) suggested that the averagelength of such chains is roughly six, andrecent theoretical (4) and empirical (4–9)work has generalized the claim to a widerange of nonsocial networks. However, muchabout this “small world” hypothesis is poorlyunderstood and empirically unsubstantiated.In particular, individuals in real social net-works have only limited, local informationabout the global social network and, there-fore, finding short paths represents a non-trivial search effort (10–12). Moreover, andcontrary to accepted wisdom, experimentalevidence for short global chain lengths isextremely limited (13–15). For example,Travers and Milgram report 96 messagechains (of which 18 were completed) initiatedby randomly selected individuals from a cityother than the target’s (3). Almost all otherempirical studies of large-scale networks(4–9, 16–19) have focused either on non-social networks or on crude proxies of socialinteraction such as scientific collaboration,and studies specific to e-mail networks haveso far been limited to within single institu-tions (20).

We have addressed these issues by con-ducting a global, Internet-based social searchexperiment (21). Participants registered on-line (http://smallworld.sociology.columbia.edu) and were randomly allocated one of 18target persons from 13 countries (table S1).

Targets included a professor at an Ivy Leagueuniversity, an archival inspector in Estonia, atechnology consultant in India, a policemanin Australia, and a veterinarian in the Norwe-gian army. Participants were informed thattheir task was to help relay a message to theirallocated target by passing the message to asocial acquaintance whom they considered“closer” than themselves to the target. Of the98,847 individuals who registered, about25% provided their personal information andinitiated message chains. Because subsequentsenders were effectively recruited by theirown acquaintances, the participation rate af-ter the first step increased to an average of37%. Including initial and subsequent send-ers, data were recorded on 61,168 individualsfrom 166 countries, constituting 24,163 dis-tinct message chains (table S2). More thanhalf of all participants resided in North Amer-ica and were middle class, professional,college educated, and Christian, reflectingcommonly held notions of the Internet-usingpopulation (22).

In addition to providing his or her chosencontact’s name and e-mail address, eachsender was also required to describe how heor she had come to know the person, alongwith the type and strength of the resultingrelationship. Table 1 lists the frequencieswith which different types of relationships—classified by type, origin, and strength—were

invoked by our population of 61,168 activesenders. When passing messages, senderstypically used friendships in preference tobusiness or family ties; however, almost halfof these friendships were formed through ei-ther work or school affiliations. Furthermore,successful chains in comparison with incom-plete chains disproportionately involved pro-fessional ties (33.9 versus 13.2%) rather thanfriendship and familial relationships (59.8versus 83.4%) (table S3). Successful chainswere also more likely to entail links thatoriginated through work or higher education(65.1 versus 39.6%) (table S4). Men passedmessages more frequently to other men(57%), and women to other women (61%),and this tendency to pass to a same-sex con-tact was strengthened by about 3% if thetarget was the same gender as the sender andsimilarly weakened in the opposite case. In-dividuals in both successful and unsuccessfulchains typically used ties to acquaintancesthey deemed to be “fairly close.” However, insuccessful chains “casual” and “not close”ties were chosen 15.7 and 5.9% more fre-quently than in unsuccessful chains (tableS5), thus adding support, and some resolu-tion, to the longstanding claim that “weak”ties are disproportionately responsible for so-cial connectivity (23).

Senders were also asked why they consid-ered their nominated acquaintance a suit-able recipient (Table 2). Two reasons—geographical proximity of the acquaintanceto the target and similarity of occupation—accounted for at least half of all choices, ingeneral agreement with previous findings(24, 25). Geography clearly dominated theearly stages of a chain (when senders weregeographically distant) but after the third stepwas cited less frequently than other charac-teristics, of which occupation was the mostoften cited. In contrast with previous claims(3, 12), the presence of highly connectedindividuals (hubs) appears to have limitedrelevance to the kind of social search embod-ied by our experiment (social search withlarge associated costs/rewards or otherwisemodified individual incentives may behavedifferently). Participants relatively rarelynominated an acquaintance primarily becausehe or she had many friends (Table 2,“Friends”), and individuals in successful

1Institute for Social and Economic Research and Pol-icy, Columbia University, 420 West 118th Street,New York, NY 10027, USA. 2Department of Sociology,Columbia University, 1180 Amsterdam Avenue, NewYork, NY 10027, USA.

*To whom correspondence should be addressed. E-mail: [email protected]

Table 1. Type, origin, and strength of social ties used to direct messages. Only the top five categories inthe first two columns have been listed. The most useful category of social tie is medium-strengthfriendships that originate in the workplace.

Type of relationship % Origin of relationship % Strength of relationship %

Friend 67 Work 25 Extremely close 18Relatives 10 School/university 22 Very close 23Co-worker 9 Family/relation 19 Fairly close 33Sibling 5 Mutual friend 9 Casual 22Significant other 3 Internet 6 Not close 4

R E P O R T S

www.sciencemag.org SCIENCE VOL 301 8 AUGUST 2003 827

2009년 12월 2일 수요일

60 November 2007/Vol. 50, No. 11 COMMUNICATIONS OF THE ACM

By Oded Nov

illustration by lisa haney

In order to increase and enhance user-generated content contributions, it is important to understand the factors that lead

people to freely share their time and knowledge with others.

The last few years have seen a substantial growth inuser-generated online content [7, 11] deliveredthrough collaborative Internet outlets such asYouTube, Flickr, or Slashdot.org, as well as more tra-ditional media outlets such as BBC News.com [6].Consistent with the Open Information Society’s visionof decreasing restrictions on the creation and deliveryof previously protected information goods [1], user-generated content marks a new way for information tobe created, manipulated, and consumed.

Wikipedia, the Web-based user-created encyclopedia,is a prominent example of a collaborative, user-gener-ated content outlet [11]. With more than 1.9 million

WHAT MOTIVATESWIKIPEDIANS?

12

the ample opportunity for contributors to serve theego and publicly exhibit their knowledge, we wouldexpect the contribution level to be positively associ-ated with the Enhancement level.

Wikipedia relies on the open source model [9]where people contribute their time, talent, and knowl-edge in a collaborative effort to create publicly avail-able knowledge-based products. Therefore, in additionto the six general volunteering motivations, two othermotivations—fun and ideology—used extensively inresearch on open source software development (forexample, [8, 12] ) may also help to understand whypeople contribute to Wikipedia. In bothcases we would expect to see higher con-tribution levels associated with highermotivation levels.

THE SURVEY

Wikipedians volunteer their time andknowledge for no monetary reward,and therefore our questionnaireincluded contribution measures aswell as volunteering motivations mea-sures. The contribution level was mea-sured as hours per week spent oncontributing, a measure commonlyused as a proxy for participant contri-bution [10]. Motivation was measuredthrough the volunteering motivationsscale [2] adjusted to the Wikipediacontext, as well as items adjusted fromresearch on open source motivationmeasuring ideology [12] and fun [8].All of the motivation items in thequestionnaire were presented as state-ments to which Wikipedians wereasked to state how strongly they agreeor disagree on a scale of 1 to 7. Exam-ples of questionnaire items are pro-vided in Table 1.

The English Wikipedia AlphabeticalList of Wikipedians includes 2,847 peo-ple. These are not all the contributors,but rather only those who have createda personal user page beyond merely contributing con-tent, and therefore may be more committed to con-tributing. Of these, a random sample of 370Wikipedians were emailed a request to participate in aWeb-based survey. A total of 151 valid responses werereceived (40.8% response rate), of which 140 (92.7%)were from males. The respondents’ mean age was30.9, and on average they have been contributingcontent to Wikipedia 2.3 years. Like many studiesbased on survey design, this study may potentially suf-

fer from a response bias, whereby, for example, enthu-siastic Wikipedians were more likely to participatethan lesser contributors. To test this, responses of theearliest 30% respondents were compared with the last30% of the sample in terms of the contribution level.No bias was found.

THE RESULTS

The average level of contribution was 8.27 hours perweek—a total that varied across Wikipedians’ demo-graphics and motivation levels. Overall, the topmotivations were found to be Fun and Ideology,

whereas Social, Career, and Protec-tive were not found to be strongmotivations for contribution (seeTable 2). As expected, it was foundthat the levels of each of the sixmotivations positively correlatedwith contribution level (see Table2). However, somewhat unexpect-edly, the contribution level was notcorrelated significantly with the Ide-ology and Social motivations.

The Ideology case is particularlyinteresting, because while ideology isindicated as a strong motivation(ranked second out of eight), themotivation level is not significantlycorrelated with the contributionlevel. In other words, while peoplestate that ideology is high on their listof reasons to contribute, being moreideologically motivated does nottranslate into increased contribution.One way to explain this finding

could be the effect of social desirability [3] onresponses to the questionnaire. This explanation,however, is ruled out since part of Crowne and Mar-lowe’s scale [3] was used to control for social desir-ability. An alternative explanation might be that whilepeople have strong opinions about ideology, these donot translate into actual behavior—thus exhibiting acase of “talk is cheap.” Another possible explanationmight be that contributors who are motivated by ide-ology may also contribute to other ideology-relatedprojects, such as open source software projects, andwould therefore have less time to contribute to eachproject.

The lack of correlation between the Social motiva-tion and the level of contribution may be explained bythe identity of the Wikipedian’s “important others;”that is, if important others are people who do not haveinterest in Wikipedia, then the Wikipedian’s contri-bution is not expected to be influenced by the Social

COMMUNICATIONS OF THE ACM November 2007/Vol. 50, No. 11 63

Motivation

*significant at 0.05 level**significant at 0.01 level***significant at 0.001 level

Fun

Ideology

Values

Understanding

Enhancement

Protective

Career

Social

Mean

6.10 (1.15)

[0.322***]

5.59

(1.71)[0.110]

3.96(1.55)

[0.175*]

3.92(1.48)

[0.296***]

2.97(1.39)

[0.313***]

1.97(1.05)

[0.306***]

1.67 (0.94)

[0.185*]

1.51(0.92)

[0.027]

Table 2. Motivation levelsand correlations

with contributionlevels. Standard

deviations inparentheses.

Pearson correlation

coefficient inbrackets.

62 November 2007/Vol. 50, No. 11 COMMUNICATIONS OF THE ACM

articles created by users inEnglish alone, it is amongthe top 10 fastest growingWeb brands [10] and apromising model for col-laborative knowledgesharing [5].

While collaborativecontent in general, andWikipedia in particular,receive increasing atten-tion in both the researchand the business commu-nities, no empirical,quantitative data is available that illustrates why peo-ple contribute to outlets like Wikipedia. Contribu-tors’ motivations seem to be critical for sustainingWikipedia and other collaborative user-generatedcontent outlets, since the content is contributed byvolunteers [4] who offer their time and talent inreturn for no monetary reward. Therefore, in order tounderstand what underlies user-generated contentcontribution, we must understand what motivatescontent contributors, and identify which motivationsare associated with high or low levels of contribution.

As a volunteering activity, content contribution toWikipedia can be explained by the factors underpin-ning volunteering behavior. In their influential studyof volunteers’ motivations, Clary et al. [2] identifiedsix motivational categories:

Values. Volunteering gives people an opportunityto express values related to altruistic and humanitar-ian concerns for others. Given that contributing con-tent to Wikipedia enables participants to activelyshow their concern by sharing knowledge with others,it is expected that higher levels of the Values functionamong contributors will positively relate to the extentof contribution to Wikipedia.

Social. Volunteering may provide people thechance to be with their friends or to engage in activi-ties viewed favorably by important others. Given the

collaborative nature ofWikipedia, we wouldexpect contribution levelsto be positively associatedwith Social motivation lev-els.

Understanding. Throughvolunteering, individualsmay have an opportunityto learn new things andexercise their knowledge,

skills, and abilities. Thus, as contributing content toWikipedia allows contributors to exercise theirknowledge, skills, and abilities, we would expect tosee higher contribution levels the more Wikipediacontributors are motivated by Understanding.

Career. Volunteering may provide an opportunityto achieve job-related benefits such as preparing for anew career or maintaining career-relevant skills. Inthe Wikipedia context, we would expect to find somecorrelation between contribution levels and theCareer function, as Wikipedia offers contributors away to signal their knowledge and writing skills topotential employers. However, we do not expect thisto be a strong correlation, as most Wikipedians arenot professional writers, or alternatively, becausemany Wikipedians contribute anonymously so theircontribution would not be useful for career purposes.

Protective. This category includes protecting theego from negative features of the self, reducing guiltover being more fortunate than others, or addressingone’s own personal problems. Wikipedia seems toprovide ample opportunity for contributors toaddress such needs and share the fortune of havingknowledge with others who do not have it. Therefore,we would expect contribution to be positively associ-ated with the Protective category.

Enhancement. This category somewhat relates tothe Protective category, however, Enhancementinvolves positive strivings of the ego rather than elim-inating negative ego-based factors. Here, too, given

Motivation

Protective

Values

Career

Social

Understanding

Enhancement

Fun

Ideology

Question example

“By writing/editing in Wikipedia I feel less lonely.”

“I feel it is important to help others.”

“I can make new contacts that might help my business or career.

“People I'm close to want me to write/edit in Wikipedia.”

“Writing/editing in Wikipedia allows me to gain a new perspective on things.”

“Writing/editing in Wikipedia makes me feel needed.”

“Writing/editing in Wikipedia is fun.”

“I think information should be free.”

Table 1. Motivations and questionnaire items.

To understand what underlies user-generated content contribution, we must understand what motivates content

contributors, and identify which motivations are associated with high or low levels of contribution.

2009년 12월 2일 수요일

Small Technical Innovation

Explore how a small technical innovation can launch a large social phenomenon– Emergence of the blogosphere - TrackBack– Twitterverse - ReTweet, Follow/Follower– The growth of Facebook - Facebook Connect

13

2009년 12월 2일 수요일

Understand on the dissemination of an idea or information might change our view of journalism and commentary

What mechanisms can assure blog readers that the facts quoted are trustworthy?

Provenance of Information

14

BLOGOSPHERE has certain patterns of power. Matthew Hurst tracked how blogs link to one another. A visualization of the result (left) dis-plays each blog as a white dot; the few large dots are massively popular sites. Blogs that share numerous cross citations form distinct communi-ties (purple). Isolated groups that communicate frequently among themselves but rarely with oth-ers appear as straight lines along the outer edges.

2009년 12월 2일 수요일

Influentials: Two Viewpoints

15

2009년 12월 2일 수요일

The Rise of Semantic Web

The network of data on the Web RDF (Resource Description Framework)

– Gives meaning to data through sets of ‘triples’– The subjects, verbs and objects are each identified by a Universal

Resource Identifier (URI)

Taxonomies and Ontologies Wiki

– DBpedia project by Chris Bizer• As of November 2008, the DBpedia dataset consists of around 274 million

RDF triples

– Motivation to contribute to the Semantic Web? - Oded Nov of Polytechnic Institute of NYU

16

2009년 12월 2일 수요일

17

S

Corporate applications are well under way, and consumer uses are emerging

2009년 12월 2일 수요일

Graph of Linked Data sets on the Web, as at March 2009

18

2009년 12월 2일 수요일

Research Roadmap

By Nigel Shadbolt, Web Science Research Initiative - November, 2008 A Computational Perspective

– Linked Data Web or Semantic Web: how we are to browse, explore and query such a Web at scale.

– Collective Intelligence with only light rules of social coordination can lead to the emergence of large-scale, coherent resources such as Wikipedia. What are the characteristics of such resources? Why do people contribute and how do they maintain a highly stable core body of connected content?

– How do we support inference at a Web scale? What types of reasoning are possible? How is context represented and supported in Web inference?

– How are concepts such as trust and provenance computationally represented, maintained and repaired on the Web?

– As the Web has grown substantial amounts of it have become disconnected, atrophied or in others ways redundant. How are we to identify such necrotic and non-functional parts of the Web and what should be done about them?

19

2009년 12월 2일 수요일

Research Roadmap

A Mathematical Perspective– How do we model the transient or ephemeral Web? How do

we model this graph beneath the graph that is the Web?– How are Bayesian or other uncertainty representations best

used within the Web?– What is the topological structure of the Web? Can connections always

be established between its various parts, or do particular dynamic and time-dependent conditions create disconnected or sub- regions within it?

– The virtual “shape” of the Web: A particular query about a given subject may organize Web pages, existing or virtual, according to “how close they are” with respect to the given search criteria.

• A different structure to different users. It is a mathematical challenge to develop tools to describe this structure.

– How do we measure the level of complexity of the Web?

20

2009년 12월 2일 수요일

Research Roadmap

A Social Science Perspective– How can we develop inter-disciplinary epistemologies that will enable us to

understand the Web as a complex socio-technical phenomenon?– How can we do mixed methods research to explore the relations between

ethnographic insights to Web practice and the emergence of the Web at the macro level?

– How can we draw on new data sources e.g. digital records of network use to develop understanding of the sociological aspects of the Web?

– What are the on-going iterative relations between use and design of the Web?– How and why do people use newly emergent forms of the Web in the way

that they do? What implications does this have for our understanding of key sociological categories, e.g. kinship, gender, race, class and community, and vice versa? What implications does this have for our understanding of psychological constructs, e.g. personal and group identity, collaborative decision making, perception and attitudes.

– How is the Web situated within networks of power and in relation to social inequalities? To what extent might the Web offer empowering political resources? How might the Web change further as new populations access it?

21

2009년 12월 2일 수요일

Research Roadmap

An Economic Perspective– What are the economics of Web 2.0 (+)?

– What are the economic forces that shape the formation of social networks on the Web? What are the properties of those networks? What is the relationship between the economic structure of the Web, its social and mathematical structure?

– What are the commercial incentives created by the Web? What will be the industrial structure? Or are there forces that will allow smaller scale operations to co-exist with large firms?

– What are the economic arguments for and against open platforms in the Web? Should policy (economic and public) play any role in shaping or determining the openness of Web platforms?

– What (economic and social) mechanisms can be designed to improve the performance of the Web? For example, are there mechanisms that can improve the extent and quality of participation in online communities?

– Economics for piracy, privacy and identity?

22

2009년 12월 2일 수요일

Research Roadmap

A Legal Perspective– Techniques for representing and reasoning over legal and social

rules – explore and understand the impact of law as a driver in shaping the Web development?

– Is the present intellectual property regulatory regime fit for purpose in the Web 2.0 (+) environment? What is content in the Semantic Web and what rights should attach to it particularly when much is likely to be “computer generated”?

– Which technologies within the Web should the law ensure remain “open” rather than becoming the “property” of one or more commercial entities and what are the consequences of the choices available?

– To what extent are the service providers going to become the legal gatekeepers for public authorities in terms of delivering their public policy objectives e.g. Web policing for what is judged to be “illegal and harmful content”?

– What privacy issues arise in a Web environment of increasingly sophisticated information sharing?

23

2009년 12월 2일 수요일

Integrative Research Themes

Collective Intelligence– Technical, socio-economic, legal, psychological

The Openness of the Web– Economic, Legal

The Dynamics of the Web– Mathematical, sociological, legal, linguistic

Security, Privacy and Trust– Economic, social, legal interaction

Inference– Computational, psychological, linguistic

24

2009년 12월 2일 수요일

Thank you and meet me atFacebook: stevehanTwitter: steve3034me2day: steve3001

socialcomputing.tistory.com

25

2009년 12월 2일 수요일