Upload
webscikorea
View
8.926
Download
1
Tags:
Embed Size (px)
DESCRIPTION
Citation preview
Why Do We Need Web Science Research?
December 2009
Sangki Han, Ph.D.
Professor / GSCT
KAIST
2009년 12월 2일 수요일
Web Science by Tim Berners-Lee in 2006
2
2009년 12월 2일 수요일
This is the first report of time-dependentseismic tomography applied to an eruptingvolcano. It builds on earlier work of thesame kind done in geothermal areas inCalifornia and Iceland and the Long ValleyCaldera, California. But the seminal exam-ple of major changes in V
P/V
Scomes from
The Geysers geothermal area in northernCalifornia.
During the 1980s and 1990s, some13,600 tons of steam per hour were extractedfrom The Geysers to generate electricity. Asa result of this overexploitation, the reser-voir became progressively depleted as porewater was replaced by steam. Repeat seis-mic tomography showed the steady growthof a reservoir-wide negative V
P/V
Sanomaly
that coincided with the steam-productionzone. This anomaly was caused by the com-bined effects of the replacement of pore liq-uid with steam, the resulting decrease inpressure, and the drying of clay minerals. Aremarkable series of snapshots showed therelentless growth of a volume of heavydepletion (3, 4). The work helped to increaseawareness of the nonsustainability of suchhigh rates of fluid withdrawal. Production atThe Geysers has now been reduced to sus-tainable levels. Time-dependent tomogra-phy is currently used to monitor the CosoGeothermal Area, southern California (5).
Time-dependent seismic tomographywas first applied to a volcano in a study ofMammoth Mountain, a volcano on the rimof Long Valley Caldera, California. In 1989,an intense swarm of hundreds of earth-quakes accompanied an injection of newmagma into the roots of this volcano,and triggered the outpouring of some300 tons of CO
2per day from the vol-
cano’s surface. Several broad swathsof trees died as a result of high levelsof CO
2in the soil, and the CO
2
also presented an asphyxiation hazardto humans. A comparison of V
P/V
Stomo-
graphic images calculated for 1989 and1997 showed changes that correlated wellwith areas of tree death on the surface above,and were attributed to migration of CO
2in
the volcano (6).By showing that time-dependent seismic
tomography can be used to monitor struc-tural changes directly associated with a vol-canic eruption cycle, Patanè et al. take a crit-ical step toward developing a useful volcano-hazard-reduction tool based on seismictomography. As with all good experiments,however, it ushers in new challenges. V
P/V
S
is affected by several factors, including porefluid phase, pressure, mineralogy, and frac-ture density. However, determining how each
of these has changed when changes in onlytwo quantities (V
Pand V
S) have been mea-
sured is not possible and requires the addi-tion of other kinds of data. Both theoreticaladvances and more data from different vol-canoes are needed before the potential of themethod can be fully assessed.
At present, monitoring of active volca-noes still rests mostly on relatively unso-phisticated seismic networks and the moni-toring of simple parameters, such as thenumbers of earthquakes and the amplitudeof harmonic tremor. Patanè et al. show thatmuch more sophisticated methods can nowbe used. Some of these methods only need tobe automated—a critical factor if they are tobe useful in situations where information isneeded on an hourly basis. It is hoped thatthis automation work will be pushed for-ward rapidly in the near future, putting us on
track to realizing technological capabilitiesresembling those of the fictional VirtualGeophysical Laboratory by 2025.
References and Notes
1. D. Patanè, G. Barberi, O. Cocina, P. De Gori, C.Chiarabba, Science 313, 821 (2006).
2. The compressional and shear waves are the fastest andsecond-fastest waves to be radiated from an earthquakesource, so they arrive first and second on seismograms.Their ratio provides information about pressure andabout the presence of gas and liquid in the study volume.Thus, changes in their ratio can tell us about changes inpressure and gas/liquid, which are thought to accompanythe buildup and occurrence of a volcanic eruption.
3. G. R. Foulger, C. C. Grant, A. Ross, B. R. Julian, Geophys.Res. Lett. 24, 135 (1997).
4. R. C. Gunasekera, G. R. Foulger, B. R. Julian, J. Geophys.Res. 108, 2134 (2003).
5. G. R. Foulger, B. R. Julian, Geotherm. Resour. Counc.Bull. 33, 120 (2004).
6. G. R. Foulger et al., J. Geophys. Res. 108, 2147 (2003).
10.1126/science.1131790
769
Since its inception, the World WideWeb has changed the ways scientistscommunicate, collaborate, and edu-
cate. There is, however, a growing realiza-tion among many researchers that a clear
research agenda aimedat understanding the
current, evolving,and potential Web is
needed. If we want tomodel the Web; if we
want to understand the architectural princi-ples that have provided for its growth; and ifwe want to be sure that it supports the basicsocial values of trustworthiness, privacy,and respect for social boundaries, then wemust chart out a research agenda that targetsthe Web as a primary focus of attention.
When we discuss an agenda for a scienceof the Web, we use the term “science” in twoways. Physical and biological science ana-
lyzes the natural world, and tries to findmicroscopic laws that, extrapolated to themacroscopic realm, would generate thebehavior observed. Computer science, bycontrast, though partly analytic, is princi-pally synthetic: It is concerned with the con-struction of new languages and algorithmsin order to produce novel desired computerbehaviors. Web science is a combination ofthese two features. The Web is an engineeredspace created through formally specifiedlanguages and protocols. However, becausehumans are the creators of Web pages andlinks between them, their interactions formemergent patterns in the Web at a macro-scopic scale. These human interactions are,in turn, governed by social conventions andlaws. Web science, therefore, must be inher-ently interdisciplinary; its goal is to bothunderstand the growth of the Web and to cre-ate approaches that allow new powerful andmore beneficial patterns to occur.
Unfortunately, such a research area doesnot yet exist in a coherent form. Withincomputer science, Web-related research haslargely focused on information-retrievalalgorithms and on algorithms for the routingof information through the underlying Inter-net. Outside of computing, researchers grow
Understanding and fostering the growth of the World Wide Web, both in engineering and societal
terms, will require the development of a new interdisciplinary field.
Creating a Science of the WebTim Berners-Lee, Wendy Hall, James Hendler, Nigel Shadbolt, Daniel J. Weitzner
COMPUTER SCIENCE
T. Berners-Lee and D. J. Weitzner are at the Computer Scienceand Artificial Intelligence Laboratory, Massachusetts Instituteof Technology, Cambridge, MA 02139, USA. W. Hall and N. Shadbolt are in the School of Electronics and ComputerScience, University of Southampton, Southampton SO171BJ, UK. J. Hendler is in the Computer Science Department,University of Maryland, College Park, MD 20742, USA. E-mail: [email protected]
Enhanced online at
www.sciencemag.org/cgi/content/full/313/5788/769
www.sciencemag.org SCIENCE VOL 313 11 AUGUST 2006
PERSPECTIVES
Published by AAAS
on
De
ce
mb
er
1,
20
09
w
ww
.scie
nce
ma
g.o
rgD
ow
nlo
ad
ed
fro
m
3
2009년 12월 2일 수요일
A New Discipline
Model the Web’s structure
Articulate the architectural principles that have fueled its phenomenal growth
Discover how online human interactions are driven by and can change social conventions
4
2009년 12월 2일 수요일
Interdisciplinary Approach
5
2009년 12월 2일 수요일
WSRI & Web Science Trust
The Web Science Research Initiative (WSRI) is a joint endeavour between the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT and the School of Electronics and Computer Science (ECS) at the University of Southampton. The goal of WSRI is to facilitate and produce the fundamental scientific advances necessary to inform the future design and use of the World Wide Web
Publication: Foundations and Trends in Web Science
Events
– Web Science Summer Graduate School
– WebSci09 - Society On-Line
Directors of WSRI are establishing a charitable body - the Web Science Trust (WST)
– Working with WWW Foundation
6
2009년 12월 2일 수요일
WebSci’09: Society On-Line
Understanding of both human behavior and technological design
– How do people and organisations behave on-line – what motivates them to shop, date, make friends, learn, participate in political life or manage their health or tax on-line?
– Which Web-based designs will they trust? To which on-line agents will they delegate?
– How can the dark side of the Web – such as cybercrime, pornography and terrorist networks – be both understood and held in check without compromising the experience of others?
– What are the effects of varying characteristics of Web-based technologies – such as security, privacy, network structure, the linking of data – on on-line behaviour, both criminal and non-criminal?
– And how can the design of the Web of the future ensure that a system on which – as Tim Berners-Lee put it – democracy and commerce depends remains 'stable and pro-human'?
7
Identified the following areas of on-line society and Web development for particular attention:
– E-commerce
– Government and Political Life
– Social Relationships
– Cybercrime and/or the Prevention Thereof
– Health
– Culture On-Line
– E-Learning
The cross-cutting infrastructure issues on which these areas depend including, but not limited to:
– Linked Data and the Semantic Web
– Trust and Reputation
– Security and Privacy
– Networking (Social and Technical)
2009년 12월 2일 수요일
8
32 SC IE NTIF IC AME RIC AN Oc tober 20 0 8
Since the World Wide Web blossomed in the mid-1990s, it has exploded to more than 15 billion pages that touch almost
all aspects of modern life. Today more and more people’s jobs depend on the Web. Media, bank-ing and health care are being revolutionized by it. And governments are even considering how to run their countries with it. Little appreciated, however, is the fact that the Web is more than the sum of its pages. Vast emergent properties have arisen that are transforming society. E-mail led to instant messaging, which has led to social net-works such as Facebook. The transfer of docu-ments led to file-sharing sites such as Napster, which have led to user-generated portals such as YouTube. And tagging content with labels is cre-ating online communities that share everything from concert news to parenting tips.
But few investigators are studying how such emergent properties have actually blossomed, how we might harness them, what new phe-nomena may be coming or what any of this might mean for humankind. A new branch of science—Web science—aims to address such is-sues. The timing fits history: computers were built first, and computer science followed,
which subsequently improved computing significantly. Web science was launched as a formal discipline in November 2006, when the two of us and our col-leagues at the Massachusetts In-stitute of Technology and the University of Southampton in England announced the begin-
ning of a Web Science Research Initiative. Lead-ing researchers from 16 of the world’s top uni-versities have since expanded on that effort.
This new discipline will model the Web’s structure, articulate the architectural principles that have fueled its phenomenal growth, and dis-cover how online human interactions are driven by and can change social conventions. It will elu-cidate the principles that can ensure that the net-work continues to grow productively and settle complex issues such as privacy protection and in-tellectual-property rights. To achieve these ends, Web science will draw on mathematics, physics, computer science, psychology, ecology, sociolo-gy, law, political science, economics, and more.
Of course, we cannot predict what this na-scent endeavor might reveal. Yet Web science has already generated crucial insights, some presented here. Ultimately, the pursuit aims to answer fundamental questions: What evolu-tionary patterns have driven the Web’s growth? Could they burn out? How do tipping points arise, and can that be altered?
Insights AlreadyAlthough Web science as a discipline is new, earlier research has revealed the potential value of such work. As the 1990s progressed, search-ing for information by looking for key words among the mounting number of pages was returning more and more irrelevant content. The founders of Google, Larry Page and Sergey Brin, realized they needed to prioritize the results.
Their big insight was that the importance of a page—how relevant it is—was best understood in terms of the number and importance of the pages linking to it. The difficulty was that part of this definition is recursive: the importance of a page is determined by the importance of the
INFORMATION TECHNOLOGY
KEY CONCEPTS The relentless rise in Web pages and links is creating emer-gent properties, from social net-working to virtual identity theft, that are transforming society.
A new discipline, Web science, aims to discover how Web traits arise and how they can be harnessed or held in check to benefit society.
Important advances are begin-ning to be made; more work can solve major issues such as securing privacy and conveying trust.
—The Editors
By Nigel Shadbolt and Tim Berners-Lee
Studying the Web will reveal better
ways to exploit information,
prevent identity theft,
revolutionize industry and manage
our ever growing online lives
EMERGESWeb Science
2009년 12월 2일 수요일
Power-Law Distribution of theWorld Wide Web
Barabasi and Albert (1) propose an im-proved version of the Erdos-Renyi (ER) the-ory of random networks to account for thescaling properties of a number of systems,including the link structure of the WorldWide Web (WWW). The theory they present,however, is inconsistent with empirically ob-served properties of the Web link structure.
Barabasi and Albert write that because“of the preferential attachment, a vertexthat acquires more connections than anoth-er one will increase its connectivity at ahigher rate; thus, an initial difference in theconnectivity between two vertices will in-crease further as the network grows. . . .Thus older . . . vertices increase their con-nectivity at the expense of the younger . . .ones, leading over time to some verticesthat are highly connected, a ‘rich-get-rich-er’ phenomenon” [figure 2C of (1)]. It isthis prediction of the Barabasi-Albert (BA)model, however, that renders it unable toaccount for the power-law distribution oflinks in the WWW [figure 1B of (1)].
We studied a crawl of 260,000 sites, eachone representing a separate domain name. Wecounted how many links the sites received
from other sites, and found that the distribu-tion of links followed a power law (Fig. 1A).Next, we queried the InterNIC database (us-ing the WHOIS search tool at www.networksolutions.com) for the date on whichthe site was originally registered. Whereasthe BA model predicts that older sites havemore time to acquire links and gather links ata faster rate than newer sites, the results ofour search (Fig. 1B) suggest no correlationbetween the age of a site and its number oflinks.
The absence of correlation between ageand the number of links is hardly surpris-ing; all sites are not created equal. Anexciting site that appears in 1999 will soonhave more links than a bland site created in1993. The rate of acquisition of new links isprobably proportional to the number oflinks the site already has, because the morelinks a site has, the more visible it becomesand the more new links it will get. (Thereshould, however, be an additional propor-tionality factor, or growth rate, that variesfrom site to site.)
Our recently proposed theory (2), whichaccounts for the power-law distribution in thenumber of pages per site, can also be appliedto the number of links a site receives. In thismodel, the number of new links a site re-ceives at each time step is a random fractionof the number of links the site already has.New sites, each with a different growth rate,appear at an exponential rate. This modelyields scatter plots similar to Fig. 1B, and canproduce any power-law exponent ! " 1.
Lada A. AdamicBernardo A. Huberman
Xerox Palo Alto Research Center3333 Coyote Hill Road
Palo Alto, CA 94304, USAE-mail: [email protected]
References1. A.-L. Barabasi and R. Albert, Science 286, 509 (1999).2. B. A. Huberman and L. A. Adamic, Nature 401, 131(1999).
10 November 1999; accepted 4 February 2000
Response: Adamic and Huberman offer ad-ditional support for the evolutionary networkmodel that we offered (1). The apparent messin their fig. 1B is rooted in their choice not toaverage their data. We believe that taking theaverage over all points of the same age, andextracting the trends within those averages,would have unveiled the increasing tendencypredicted by our model.
Although we do not have access to their
data, we can illustrate the same procedure forthe network of movie actors that we dis-cussed (1). When the connectivity of the in-dividual actors is plotted as a function of therelease year of their first movie (Fig. 1A), theresults are very similar to those shown in fig.1B of Adamic and Huberman’s comment.The only difference is that the movie industryhad its boom not 4 years ago, as did theWWW, but rather at the beginning of thecentury; thus, the apparently structureless re-gime persists much longer. When the connec-tivity of the actors that debuted in the sameyear is averaged, however, the average con-nectivity in the last 60 years increases withthe actor’s age, in line with the predictions ofour theory, and the curve follows a power lawfor almost a hundred years (Fig. 1B). Weexpect that a similar increasing tendencywould appear for the WWW data after aver-aging, but the length of the scaling intervalwould be limited by the Web’s comparativelybrief history.
The fluctuations that lead to the appar-ent randomness of Fig. 1A are due to theindividual differences in the rate at whichnodes increase their connectivity. It iseasy to include such differences in themodel and continuum theory proposed by
Fig. 1. (A) The distribution function for thenumber of links, k, to Web sites (from crawl inspring 1997). The dashed line has slope ! #1.94. (B) Scatter plot of the number of links, k,versus age for 120,000 sites. The correlationcoefficient is 0.03.
Fig. 1. (A) Scatter plot of movie actor connec-tivity, k (the number of other actors with whichhe or she performed during his or her career),versus the year of debut. All actors from theInternet Movie Database were included; n #392,340. (B) Average movie actor connectivity,$k%, versus year of debut. To determine $k%, k isaveraged over all actors that debuted in thesame year. The curve shows a systematic in-crease in the average connectivity with theactor’s professional lifetime, t # (2000 & yearof debut). The dotted line follows k(t) ' t(,with ( # 0.49, very close to the prediction ( #0.5 of (1). Inset shows a log-log plot of $k% as afunction of t, which illustrates the presence ofscaling in the last century. The dotted line hasslope 0.5.
T E C H N I C A L C O M M E N T
www.sciencemag.org SCIENCE VOL 287 24 MARCH 2000 2115a
Model the Web’s Structure
PageRank by Page and Brin Web is a scale-free network -.
Northeastern University’s Albert-László Barabási
Web as having short paths and small worlds– While at Cornell University in the
1990s, Duncan J. Watts and Steven H. Strogatz
– Even though the Web was huge, a user could get from one page to any other page in at most 14 clicks
9
2009년 12월 2일 수요일
Analysis of Topological Characteristicsof Huge Online Social Networking Services
Yong-Yeol AhnDepartment of PhysicsKAIST, Deajeon, Korea
Seungyeop Han!
NHN Corp.Korea
Haewoon KwakDivision of Computer Science
KAIST, Daejeon, Korea
Sue MoonDivision of Computer Science
KAIST, Daejeon, Korea
Hawoong JeongDepartment of PhysicsKAIST, Deajeon, Korea
ABSTRACT
Social networking services are a fast-growing business in theInternet. However, it is unknown if online relationships andtheir growth patterns are the same as in real-life social net-works. In this paper, we compare the structures of threeonline social networking services: Cyworld, MySpace, andorkut, each with more than 10 million users, respectively.We have access to complete data of Cyworld’s ilchon (friend)relationships and analyze its degree distribution, clusteringproperty, degree correlation, and evolution over time. Wealso use Cyworld data to evaluate the validity of snowballsampling method, which we use to crawl and obtain par-tial network topologies of MySpace and orkut. Cyworld,the oldest of the three, demonstrates a changing scaling be-havior over time in degree distribution. The latest Cyworlddata’s degree distribution exhibits a multi-scaling behavior,while those of MySpace and orkut have simple scaling be-haviors with di!erent exponents. Very interestingly, eachof the two exponents corresponds to the di!erent segmentsin Cyworld’s degree distribution. Certain online social net-working services encourage online activities that cannot beeasily copied in real life; we show that they deviate fromclose-knit online social networks which show a similar de-gree correlation pattern to real-life social networks.
Categories and Subject Descriptors: J.4 [ComputerApplications]: Social and behavioral sciences
General Terms: Human factors, Measurement
Keywords: Sampling, Social network
1. INTRODUCTION
The Internet has been a vessel to expand our social net-works in many ways. Social networking services (SNSs) areone successful example of such a role. SNSs provide an on-line private space for individuals and tools for interactingwith other people in the Internet. SNSs help people findothers of a common interest, establish a forum for discus-sion, exchange photos and personal news, and many more.
!This work was conducted while Han was at KAIST.
Copyright is held by the International World Wide Web Conference Com-mittee (IW3C2). Distribution of these papers is limited to classroom use,and personal use by others.WWW 2007, May 8–12, 2007, Banff, Alberta, Canada.ACM 978-1-59593-654-7/07/0005.
Cyworld, the largest SNS in South Korea, had already 10million users 2 years ago, one fourth of the entire populationof South Korea. MySpace and orkut, similar social network-ing services, have also more than 10 million users each. Re-cently, the number of MySpace users exceeded 130 millionwith a growing rate of over a hundred thousand people perday. It is reported that these SNSs “attract nearly half of allweb users” [1]. The goal of these services is to help peopleestablish an online presence and build social networks; andto eventually exploit the user base for commercial purposes.Thus the statistics and dynamics of these online social net-works are of tremendous importance to social networkingservice providers and those interested in online commerce.
The notion of a network structure in social relations datesback about half a century. Yet, the focus of most sociologicalstudies has been interactions in small groups, not structuresof large and extensive networks. Di"culty in obtaining largedata sets was one reason behind the lack of structural study.However, as reported in [2] recently, missing data may dis-tort the statistics severely and it is imperative to use largedata sets in network structure analysis.
It is only very recently that we have seen research re-sults from large networks. Novel network structures fromhuman societies and communication systems have been un-veiled; just to name a few are the Internet and WWW [3] andthe patents, Autonomous Systems (AS), and a"liation net-works [4]. Even in the short history of the Internet, SNSs area fairly new phenomenon and their network structures arenot yet studied carefully. The social networks of SNSs arebelieved to reflect the real-life social relationships of peoplemore accurately than any other online networks. Moreover,because of their size, they o!er an unprecedented opportu-nity to study human social networks.
In this paper, we pose and answer the following questions:What are the main characteristics of online social net-
works? Ever since the scale-free nature of the World-WideWeb network has been revealed, a large number of networkshave been analyzed and found to have power-law scaling indegree distribution, large clustering coe"cients, and smallmean degrees of separation (so called the small-world phe-nomenon). The networks we are interested in this work arehuge and those of this magnitude have not yet been ana-lyzed.
How representive is a sample network? In most networks,
Analysis on CyWorld
10
2009년 12월 2일 수요일
Social Drivers
Discover how online human interactions are driven by and can change social conventions– Social drivers-goals, desires, interests and
attitudes-are fundamental aspects of how links are made
– Understanding the Web requires insights from sociology and psychology every bit as much as from mathematics and computer science
– Stanley Milgram (1967) vs. Duncan Watts (2003)
11
An Experimental Study of Searchin Global Social Networks
Peter Sheridan Dodds,1 Roby Muhamad,2 Duncan J. Watts1,2*
We report on a global social-search experiment in which more than 60,000e-mail users attempted to reach one of 18 target persons in 13 countries byforwarding messages to acquaintances. We find that successful social search isconducted primarily through intermediate to weak strength ties, does notrequire highly connected “hubs” to succeed, and, in contrast to unsuccessfulsocial search, disproportionately relies on professional relationships. By ac-counting for the attrition of message chains, we estimate that social searchescan reach their targets in a median of five to seven steps, depending on theseparation of source and target, although small variations in chain lengths andparticipation rates generate large differences in target reachability. We con-clude that although global social networks are, in principle, searchable, actualsuccess depends sensitively on individual incentives.
It has become commonplace to assert that anyindividual in the world can reach any otherindividual through a short chain of social ties(1, 2). Early experimental work by Traversand Milgram (3) suggested that the averagelength of such chains is roughly six, andrecent theoretical (4) and empirical (4–9)work has generalized the claim to a widerange of nonsocial networks. However, muchabout this “small world” hypothesis is poorlyunderstood and empirically unsubstantiated.In particular, individuals in real social net-works have only limited, local informationabout the global social network and, there-fore, finding short paths represents a non-trivial search effort (10–12). Moreover, andcontrary to accepted wisdom, experimentalevidence for short global chain lengths isextremely limited (13–15). For example,Travers and Milgram report 96 messagechains (of which 18 were completed) initiatedby randomly selected individuals from a cityother than the target’s (3). Almost all otherempirical studies of large-scale networks(4–9, 16–19) have focused either on non-social networks or on crude proxies of socialinteraction such as scientific collaboration,and studies specific to e-mail networks haveso far been limited to within single institu-tions (20).
We have addressed these issues by con-ducting a global, Internet-based social searchexperiment (21). Participants registered on-line (http://smallworld.sociology.columbia.edu) and were randomly allocated one of 18target persons from 13 countries (table S1).
Targets included a professor at an Ivy Leagueuniversity, an archival inspector in Estonia, atechnology consultant in India, a policemanin Australia, and a veterinarian in the Norwe-gian army. Participants were informed thattheir task was to help relay a message to theirallocated target by passing the message to asocial acquaintance whom they considered“closer” than themselves to the target. Of the98,847 individuals who registered, about25% provided their personal information andinitiated message chains. Because subsequentsenders were effectively recruited by theirown acquaintances, the participation rate af-ter the first step increased to an average of37%. Including initial and subsequent send-ers, data were recorded on 61,168 individualsfrom 166 countries, constituting 24,163 dis-tinct message chains (table S2). More thanhalf of all participants resided in North Amer-ica and were middle class, professional,college educated, and Christian, reflectingcommonly held notions of the Internet-usingpopulation (22).
In addition to providing his or her chosencontact’s name and e-mail address, eachsender was also required to describe how heor she had come to know the person, alongwith the type and strength of the resultingrelationship. Table 1 lists the frequencieswith which different types of relationships—classified by type, origin, and strength—were
invoked by our population of 61,168 activesenders. When passing messages, senderstypically used friendships in preference tobusiness or family ties; however, almost halfof these friendships were formed through ei-ther work or school affiliations. Furthermore,successful chains in comparison with incom-plete chains disproportionately involved pro-fessional ties (33.9 versus 13.2%) rather thanfriendship and familial relationships (59.8versus 83.4%) (table S3). Successful chainswere also more likely to entail links thatoriginated through work or higher education(65.1 versus 39.6%) (table S4). Men passedmessages more frequently to other men(57%), and women to other women (61%),and this tendency to pass to a same-sex con-tact was strengthened by about 3% if thetarget was the same gender as the sender andsimilarly weakened in the opposite case. In-dividuals in both successful and unsuccessfulchains typically used ties to acquaintancesthey deemed to be “fairly close.” However, insuccessful chains “casual” and “not close”ties were chosen 15.7 and 5.9% more fre-quently than in unsuccessful chains (tableS5), thus adding support, and some resolu-tion, to the longstanding claim that “weak”ties are disproportionately responsible for so-cial connectivity (23).
Senders were also asked why they consid-ered their nominated acquaintance a suit-able recipient (Table 2). Two reasons—geographical proximity of the acquaintanceto the target and similarity of occupation—accounted for at least half of all choices, ingeneral agreement with previous findings(24, 25). Geography clearly dominated theearly stages of a chain (when senders weregeographically distant) but after the third stepwas cited less frequently than other charac-teristics, of which occupation was the mostoften cited. In contrast with previous claims(3, 12), the presence of highly connectedindividuals (hubs) appears to have limitedrelevance to the kind of social search embod-ied by our experiment (social search withlarge associated costs/rewards or otherwisemodified individual incentives may behavedifferently). Participants relatively rarelynominated an acquaintance primarily becausehe or she had many friends (Table 2,“Friends”), and individuals in successful
1Institute for Social and Economic Research and Pol-icy, Columbia University, 420 West 118th Street,New York, NY 10027, USA. 2Department of Sociology,Columbia University, 1180 Amsterdam Avenue, NewYork, NY 10027, USA.
*To whom correspondence should be addressed. E-mail: [email protected]
Table 1. Type, origin, and strength of social ties used to direct messages. Only the top five categories inthe first two columns have been listed. The most useful category of social tie is medium-strengthfriendships that originate in the workplace.
Type of relationship % Origin of relationship % Strength of relationship %
Friend 67 Work 25 Extremely close 18Relatives 10 School/university 22 Very close 23Co-worker 9 Family/relation 19 Fairly close 33Sibling 5 Mutual friend 9 Casual 22Significant other 3 Internet 6 Not close 4
R E P O R T S
www.sciencemag.org SCIENCE VOL 301 8 AUGUST 2003 827
2009년 12월 2일 수요일
60 November 2007/Vol. 50, No. 11 COMMUNICATIONS OF THE ACM
By Oded Nov
illustration by lisa haney
In order to increase and enhance user-generated content contributions, it is important to understand the factors that lead
people to freely share their time and knowledge with others.
The last few years have seen a substantial growth inuser-generated online content [7, 11] deliveredthrough collaborative Internet outlets such asYouTube, Flickr, or Slashdot.org, as well as more tra-ditional media outlets such as BBC News.com [6].Consistent with the Open Information Society’s visionof decreasing restrictions on the creation and deliveryof previously protected information goods [1], user-generated content marks a new way for information tobe created, manipulated, and consumed.
Wikipedia, the Web-based user-created encyclopedia,is a prominent example of a collaborative, user-gener-ated content outlet [11]. With more than 1.9 million
WHAT MOTIVATESWIKIPEDIANS?
12
the ample opportunity for contributors to serve theego and publicly exhibit their knowledge, we wouldexpect the contribution level to be positively associ-ated with the Enhancement level.
Wikipedia relies on the open source model [9]where people contribute their time, talent, and knowl-edge in a collaborative effort to create publicly avail-able knowledge-based products. Therefore, in additionto the six general volunteering motivations, two othermotivations—fun and ideology—used extensively inresearch on open source software development (forexample, [8, 12] ) may also help to understand whypeople contribute to Wikipedia. In bothcases we would expect to see higher con-tribution levels associated with highermotivation levels.
THE SURVEY
Wikipedians volunteer their time andknowledge for no monetary reward,and therefore our questionnaireincluded contribution measures aswell as volunteering motivations mea-sures. The contribution level was mea-sured as hours per week spent oncontributing, a measure commonlyused as a proxy for participant contri-bution [10]. Motivation was measuredthrough the volunteering motivationsscale [2] adjusted to the Wikipediacontext, as well as items adjusted fromresearch on open source motivationmeasuring ideology [12] and fun [8].All of the motivation items in thequestionnaire were presented as state-ments to which Wikipedians wereasked to state how strongly they agreeor disagree on a scale of 1 to 7. Exam-ples of questionnaire items are pro-vided in Table 1.
The English Wikipedia AlphabeticalList of Wikipedians includes 2,847 peo-ple. These are not all the contributors,but rather only those who have createda personal user page beyond merely contributing con-tent, and therefore may be more committed to con-tributing. Of these, a random sample of 370Wikipedians were emailed a request to participate in aWeb-based survey. A total of 151 valid responses werereceived (40.8% response rate), of which 140 (92.7%)were from males. The respondents’ mean age was30.9, and on average they have been contributingcontent to Wikipedia 2.3 years. Like many studiesbased on survey design, this study may potentially suf-
fer from a response bias, whereby, for example, enthu-siastic Wikipedians were more likely to participatethan lesser contributors. To test this, responses of theearliest 30% respondents were compared with the last30% of the sample in terms of the contribution level.No bias was found.
THE RESULTS
The average level of contribution was 8.27 hours perweek—a total that varied across Wikipedians’ demo-graphics and motivation levels. Overall, the topmotivations were found to be Fun and Ideology,
whereas Social, Career, and Protec-tive were not found to be strongmotivations for contribution (seeTable 2). As expected, it was foundthat the levels of each of the sixmotivations positively correlatedwith contribution level (see Table2). However, somewhat unexpect-edly, the contribution level was notcorrelated significantly with the Ide-ology and Social motivations.
The Ideology case is particularlyinteresting, because while ideology isindicated as a strong motivation(ranked second out of eight), themotivation level is not significantlycorrelated with the contributionlevel. In other words, while peoplestate that ideology is high on their listof reasons to contribute, being moreideologically motivated does nottranslate into increased contribution.One way to explain this finding
could be the effect of social desirability [3] onresponses to the questionnaire. This explanation,however, is ruled out since part of Crowne and Mar-lowe’s scale [3] was used to control for social desir-ability. An alternative explanation might be that whilepeople have strong opinions about ideology, these donot translate into actual behavior—thus exhibiting acase of “talk is cheap.” Another possible explanationmight be that contributors who are motivated by ide-ology may also contribute to other ideology-relatedprojects, such as open source software projects, andwould therefore have less time to contribute to eachproject.
The lack of correlation between the Social motiva-tion and the level of contribution may be explained bythe identity of the Wikipedian’s “important others;”that is, if important others are people who do not haveinterest in Wikipedia, then the Wikipedian’s contri-bution is not expected to be influenced by the Social
COMMUNICATIONS OF THE ACM November 2007/Vol. 50, No. 11 63
Motivation
*significant at 0.05 level**significant at 0.01 level***significant at 0.001 level
Fun
Ideology
Values
Understanding
Enhancement
Protective
Career
Social
Mean
6.10 (1.15)
[0.322***]
5.59
(1.71)[0.110]
3.96(1.55)
[0.175*]
3.92(1.48)
[0.296***]
2.97(1.39)
[0.313***]
1.97(1.05)
[0.306***]
1.67 (0.94)
[0.185*]
1.51(0.92)
[0.027]
Table 2. Motivation levelsand correlations
with contributionlevels. Standard
deviations inparentheses.
Pearson correlation
coefficient inbrackets.
62 November 2007/Vol. 50, No. 11 COMMUNICATIONS OF THE ACM
articles created by users inEnglish alone, it is amongthe top 10 fastest growingWeb brands [10] and apromising model for col-laborative knowledgesharing [5].
While collaborativecontent in general, andWikipedia in particular,receive increasing atten-tion in both the researchand the business commu-nities, no empirical,quantitative data is available that illustrates why peo-ple contribute to outlets like Wikipedia. Contribu-tors’ motivations seem to be critical for sustainingWikipedia and other collaborative user-generatedcontent outlets, since the content is contributed byvolunteers [4] who offer their time and talent inreturn for no monetary reward. Therefore, in order tounderstand what underlies user-generated contentcontribution, we must understand what motivatescontent contributors, and identify which motivationsare associated with high or low levels of contribution.
As a volunteering activity, content contribution toWikipedia can be explained by the factors underpin-ning volunteering behavior. In their influential studyof volunteers’ motivations, Clary et al. [2] identifiedsix motivational categories:
Values. Volunteering gives people an opportunityto express values related to altruistic and humanitar-ian concerns for others. Given that contributing con-tent to Wikipedia enables participants to activelyshow their concern by sharing knowledge with others,it is expected that higher levels of the Values functionamong contributors will positively relate to the extentof contribution to Wikipedia.
Social. Volunteering may provide people thechance to be with their friends or to engage in activi-ties viewed favorably by important others. Given the
collaborative nature ofWikipedia, we wouldexpect contribution levelsto be positively associatedwith Social motivation lev-els.
Understanding. Throughvolunteering, individualsmay have an opportunityto learn new things andexercise their knowledge,
skills, and abilities. Thus, as contributing content toWikipedia allows contributors to exercise theirknowledge, skills, and abilities, we would expect tosee higher contribution levels the more Wikipediacontributors are motivated by Understanding.
Career. Volunteering may provide an opportunityto achieve job-related benefits such as preparing for anew career or maintaining career-relevant skills. Inthe Wikipedia context, we would expect to find somecorrelation between contribution levels and theCareer function, as Wikipedia offers contributors away to signal their knowledge and writing skills topotential employers. However, we do not expect thisto be a strong correlation, as most Wikipedians arenot professional writers, or alternatively, becausemany Wikipedians contribute anonymously so theircontribution would not be useful for career purposes.
Protective. This category includes protecting theego from negative features of the self, reducing guiltover being more fortunate than others, or addressingone’s own personal problems. Wikipedia seems toprovide ample opportunity for contributors toaddress such needs and share the fortune of havingknowledge with others who do not have it. Therefore,we would expect contribution to be positively associ-ated with the Protective category.
Enhancement. This category somewhat relates tothe Protective category, however, Enhancementinvolves positive strivings of the ego rather than elim-inating negative ego-based factors. Here, too, given
Motivation
Protective
Values
Career
Social
Understanding
Enhancement
Fun
Ideology
Question example
“By writing/editing in Wikipedia I feel less lonely.”
“I feel it is important to help others.”
“I can make new contacts that might help my business or career.
“People I'm close to want me to write/edit in Wikipedia.”
“Writing/editing in Wikipedia allows me to gain a new perspective on things.”
“Writing/editing in Wikipedia makes me feel needed.”
“Writing/editing in Wikipedia is fun.”
“I think information should be free.”
Table 1. Motivations and questionnaire items.
To understand what underlies user-generated content contribution, we must understand what motivates content
contributors, and identify which motivations are associated with high or low levels of contribution.
2009년 12월 2일 수요일
Small Technical Innovation
Explore how a small technical innovation can launch a large social phenomenon– Emergence of the blogosphere - TrackBack– Twitterverse - ReTweet, Follow/Follower– The growth of Facebook - Facebook Connect
13
2009년 12월 2일 수요일
Understand on the dissemination of an idea or information might change our view of journalism and commentary
What mechanisms can assure blog readers that the facts quoted are trustworthy?
Provenance of Information
14
BLOGOSPHERE has certain patterns of power. Matthew Hurst tracked how blogs link to one another. A visualization of the result (left) dis-plays each blog as a white dot; the few large dots are massively popular sites. Blogs that share numerous cross citations form distinct communi-ties (purple). Isolated groups that communicate frequently among themselves but rarely with oth-ers appear as straight lines along the outer edges.
2009년 12월 2일 수요일
Influentials: Two Viewpoints
15
2009년 12월 2일 수요일
The Rise of Semantic Web
The network of data on the Web RDF (Resource Description Framework)
– Gives meaning to data through sets of ‘triples’– The subjects, verbs and objects are each identified by a Universal
Resource Identifier (URI)
Taxonomies and Ontologies Wiki
– DBpedia project by Chris Bizer• As of November 2008, the DBpedia dataset consists of around 274 million
RDF triples
– Motivation to contribute to the Semantic Web? - Oded Nov of Polytechnic Institute of NYU
16
2009년 12월 2일 수요일
17
S
Corporate applications are well under way, and consumer uses are emerging
2009년 12월 2일 수요일
Graph of Linked Data sets on the Web, as at March 2009
18
2009년 12월 2일 수요일
Research Roadmap
By Nigel Shadbolt, Web Science Research Initiative - November, 2008 A Computational Perspective
– Linked Data Web or Semantic Web: how we are to browse, explore and query such a Web at scale.
– Collective Intelligence with only light rules of social coordination can lead to the emergence of large-scale, coherent resources such as Wikipedia. What are the characteristics of such resources? Why do people contribute and how do they maintain a highly stable core body of connected content?
– How do we support inference at a Web scale? What types of reasoning are possible? How is context represented and supported in Web inference?
– How are concepts such as trust and provenance computationally represented, maintained and repaired on the Web?
– As the Web has grown substantial amounts of it have become disconnected, atrophied or in others ways redundant. How are we to identify such necrotic and non-functional parts of the Web and what should be done about them?
19
2009년 12월 2일 수요일
Research Roadmap
A Mathematical Perspective– How do we model the transient or ephemeral Web? How do
we model this graph beneath the graph that is the Web?– How are Bayesian or other uncertainty representations best
used within the Web?– What is the topological structure of the Web? Can connections always
be established between its various parts, or do particular dynamic and time-dependent conditions create disconnected or sub- regions within it?
– The virtual “shape” of the Web: A particular query about a given subject may organize Web pages, existing or virtual, according to “how close they are” with respect to the given search criteria.
• A different structure to different users. It is a mathematical challenge to develop tools to describe this structure.
– How do we measure the level of complexity of the Web?
20
2009년 12월 2일 수요일
Research Roadmap
A Social Science Perspective– How can we develop inter-disciplinary epistemologies that will enable us to
understand the Web as a complex socio-technical phenomenon?– How can we do mixed methods research to explore the relations between
ethnographic insights to Web practice and the emergence of the Web at the macro level?
– How can we draw on new data sources e.g. digital records of network use to develop understanding of the sociological aspects of the Web?
– What are the on-going iterative relations between use and design of the Web?– How and why do people use newly emergent forms of the Web in the way
that they do? What implications does this have for our understanding of key sociological categories, e.g. kinship, gender, race, class and community, and vice versa? What implications does this have for our understanding of psychological constructs, e.g. personal and group identity, collaborative decision making, perception and attitudes.
– How is the Web situated within networks of power and in relation to social inequalities? To what extent might the Web offer empowering political resources? How might the Web change further as new populations access it?
21
2009년 12월 2일 수요일
Research Roadmap
An Economic Perspective– What are the economics of Web 2.0 (+)?
– What are the economic forces that shape the formation of social networks on the Web? What are the properties of those networks? What is the relationship between the economic structure of the Web, its social and mathematical structure?
– What are the commercial incentives created by the Web? What will be the industrial structure? Or are there forces that will allow smaller scale operations to co-exist with large firms?
– What are the economic arguments for and against open platforms in the Web? Should policy (economic and public) play any role in shaping or determining the openness of Web platforms?
– What (economic and social) mechanisms can be designed to improve the performance of the Web? For example, are there mechanisms that can improve the extent and quality of participation in online communities?
– Economics for piracy, privacy and identity?
22
2009년 12월 2일 수요일
Research Roadmap
A Legal Perspective– Techniques for representing and reasoning over legal and social
rules – explore and understand the impact of law as a driver in shaping the Web development?
– Is the present intellectual property regulatory regime fit for purpose in the Web 2.0 (+) environment? What is content in the Semantic Web and what rights should attach to it particularly when much is likely to be “computer generated”?
– Which technologies within the Web should the law ensure remain “open” rather than becoming the “property” of one or more commercial entities and what are the consequences of the choices available?
– To what extent are the service providers going to become the legal gatekeepers for public authorities in terms of delivering their public policy objectives e.g. Web policing for what is judged to be “illegal and harmful content”?
– What privacy issues arise in a Web environment of increasingly sophisticated information sharing?
23
2009년 12월 2일 수요일
Integrative Research Themes
Collective Intelligence– Technical, socio-economic, legal, psychological
The Openness of the Web– Economic, Legal
The Dynamics of the Web– Mathematical, sociological, legal, linguistic
Security, Privacy and Trust– Economic, social, legal interaction
Inference– Computational, psychological, linguistic
24
2009년 12월 2일 수요일
Thank you and meet me atFacebook: stevehanTwitter: steve3034me2day: steve3001
socialcomputing.tistory.com
25
2009년 12월 2일 수요일