Social Media & Big Data
Robert Ackland
Australian Demographic and Social Research Institute (ADSRI), Australian National University
[email protected]
http://voson.anu.edu.au
Notes prepared for the Big Data Analysis for Social Scientists course, ACSPRI Winter Program, Brisbane, 29 June - 3 July 2015
Plan of lecture
Examples of computer-mediated interaction
Online research methods
Mapping Cyberspace
Construct validity of virtual world data
Big Data
Examples of computer-mediated interaction
Newsgroups - Repositories of emails set up for different topics, often hosted on Usenet.
Wikis
"A wiki is a collection of web pages designed to enable anyone who accesses it to contribute or modify content, using a simplified markup language." (http://en.wikipedia.org/wiki/Wiki)
Folksonomy - a website enabling collaborative creation and management of tags to annotate and categorise content (also known as social classification / tagging).
Blog - a chronologically updated website, typically written by a single author and designed to provide regular commentary on particular topics or to serve as an online diary.
Social network services - websites that allow people to create personal profiles and interact by requesting and accepting "friendships" and joining groups/forums.
Virtual world - a computer-based simulated environment where individuals can assume digital representations (avatars) and interact.
MUD (British Legends), started in 1978 - said to be the oldest virtual world in existence (http://www.british-legends.com/history.htm)
An advanced character in EverQuest 2
Micro-blogging - a service allowing subscribers to broadcast short messages (140 char. max.) to other subscribers of the service.
Video clips: "Twitter in Plain English", "Twouble with Twitter"
Online research methods
Dimensions of Online Research Methods
Method:
  Quantitative - standardised observational form of data collection (e.g. survey) on a sample from a larger population; after coding, typically work with numbers
  Qualitative - exploring concepts; less focus on standardisation; more involvement by the researcher; typically work with text
Mode: experiments, surveys, field research, unobtrusive research
Presence of researcher: Obtrusive (reactive) / Unobtrusive (non-reactive)
Analysis of digital trace data (Facebook profiles, website content, website logs, click behaviour, emails, e-commerce data) is an example of unobtrusive research. Subjects may know they are being observed (or could be observed) when generating the data, but the research can be considered unobtrusive if this knowledge is not likely to lead to biases in the data for the purpose of the present study.
ORM: Method versus Researcher Presence
Unobtrusive social science research: how we used to do it... and how it's done today...
Mapping Cyberspace
"Cyberspace. A consensual hallucination experienced daily by billions of legitimate operators... A graphic representation of data abstracted from the banks of every computer in the human system. Unthinkable complexity. Lines of light ranged in the nonspace of the mind, clusters and constellations of data. Like city lights, receding..." -- William Gibson, Neuromancer, 1984
"It was suggestive of something, but had no real semantic meaning, even for me, as I saw it emerge on the page." -- Gibson on the origin of the term in the 2000 documentary No Maps for These Territories.
Router-level connectivity of the Internet, 1999 (Internet Mapping Project)
Outbound hyperlinks of the Australian Labor Party (Ackland and Gibson, 2004 using HypViewer)
Outbound hyperlinks of an environmental activist organisation (2006, using Large Graph Layout)
Hyperlink network of an environmental activist organisation (each node is a website and the ties are hyperlinks between websites).
Hyperlink data collected using the Virtual Observatory for the Study of Online Networks (VOSON) System (http://voson.anu.edu.au)
Network map rendered using VOSON & Large Graph Layout
Hyperlink network - Australian sites focused on abortion or pregnancy (Ackland and Evans, 2005)
VOSON hyperlink network of Australian web sites focused on abortion (Ackland and Evans, 2005)
Force-directed graphing algorithm clearly displays assortative mixing on the basis of abortion stance
Note the boundary-spanner website with high betweenness
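To make the two network terms concrete, here is a minimal sketch on a made-up seven-site network (the site names, stance labels and ties are all hypothetical, not the Ackland and Evans data). It computes unnormalised betweenness with Brandes' algorithm, and uses the share of ties joining same-stance sites as a crude indicator of assortative mixing.

```python
from collections import deque

def betweenness(adj):
    """Unnormalised betweenness (Brandes' algorithm) for an undirected,
    unweighted graph given as {node: set_of_neighbours}."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was visited from both endpoints.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical toy data: two tightly linked camps and one bridging site "x".
edges = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
         ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
         ("a1", "x"), ("x", "b1")]
stance = {"a1": "A", "a2": "A", "a3": "A",
          "b1": "B", "b2": "B", "b3": "B", "x": "neutral"}

adj = {v: set() for v in stance}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

bc = betweenness(adj)
boundary_spanner = max(bc, key=bc.get)  # site with highest betweenness
same_stance_share = sum(stance[u] == stance[v] for u, v in edges) / len(edges)
```

In this toy network every shortest path between the two camps runs through "x", so it has the highest betweenness even though it has only two ties.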
Divided They Blog - Adamic and Glance (2005)
Network formed by 1500 US political bloggers
Each node is a blogger (red - conservative, blue - liberal) and each tie is a hyperlink
Node size is proportional to indegree
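As a rough illustration of sizing nodes by indegree, the sketch below counts inbound hyperlinks from a list of (source, target) pairs. The blog names, the links and the `scale` parameter are hypothetical, and this is not the procedure used by Adamic and Glance.

```python
from collections import Counter

# Hypothetical hyperlinks: (linking blog, linked-to blog).
links = [("blogA", "blogB"), ("blogC", "blogB"), ("blogD", "blogB"),
         ("blogB", "blogA"), ("blogC", "blogA"), ("blogA", "blogC")]

# Indegree = number of inbound hyperlinks received by each blog.
indegree = Counter(target for _, target in links)
blogs = {name for link in links for name in link}

def node_size(blog, scale=3.0):
    """Drawing size strictly proportional to indegree (scale is arbitrary)."""
    return scale * indegree[blog]

sizes = {blog: node_size(blog) for blog in sorted(blogs)}
```

A blog that receives no links gets size zero under strict proportionality, so in practice a small minimum size is usually added to keep such nodes visible.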
Twitter network (from NodeXL book)
Retweet/mention/reply network of Twitter users who tweeted (#auspol OR #ausvotes) AND (#asylum OR #asylumseeker OR #marriagequality), January 2013
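The boolean hashtag condition in the caption can be expressed as a simple filter. The sketch below is only an illustration: the function names are hypothetical, matching is case-insensitive and on whole hashtags by assumption, and it is not the query mechanism actually used to collect these data.

```python
import re

# The two hashtag groups from the caption, lower-cased.
POLITICS = {"auspol", "ausvotes"}
ISSUES = {"asylum", "asylumseeker", "marriagequality"}

def hashtags(text):
    """Set of hashtags in a tweet, lower-cased and without the leading '#'."""
    return {tag.lower() for tag in re.findall(r"#(\w+)", text)}

def matches(text):
    """True if the tweet has at least one hashtag from each group."""
    tags = hashtags(text)
    return bool(tags & POLITICS) and bool(tags & ISSUES)

# Hypothetical example tweets.
examples = [
    "Debate tonight #auspol #asylum",
    "Polling day #ausvotes",
    "New report #Asylum #AusPol",
    "A personal story #asylumseeker",
]
kept = [text for text in examples if matches(text)]
```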
Big Data
What is the role of social scientists in the Big Data era?
The following draws on González-Bailón, S. (2014): "Social Science in the Era of Big Data," forthcoming in Policy & Internet.
Two views about how Big Data will transform social science:
1) Theory and interpretation will become less necessary; data will speak for themselves - e.g. Anderson (2008)
2) Data-driven approaches underestimate the role of researchers. Disentangling signal from noise is a subjective process. Need (social science) context to identify meaningful correlations (and hopefully causality) in the data.
Perhaps unsurprisingly, I support view #2...
In order to gain insights from Big Data we often need to reduce them, by:
  applying filters (allowing identification of relevant streams of information), or
  aggregating them in a way that helps identify the right temporal scale or spatial resolution.
Social science can help in both of those stages.
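A small sketch of what choosing a temporal scale means in practice: the same (made-up) timestamps are aggregated per hour and per day. The helper name and the data are hypothetical; which scale is "right" depends on the research question, not on the code.

```python
from collections import Counter
from datetime import datetime

def counts_per_bucket(timestamps, fmt):
    """Count events per time bucket; fmt is a strftime pattern that
    defines the bucket (e.g. one label per hour or per day)."""
    return Counter(ts.strftime(fmt) for ts in timestamps)

# Hypothetical posting times.
stamps = [
    datetime(2013, 1, 14, 9, 5),
    datetime(2013, 1, 14, 9, 40),
    datetime(2013, 1, 14, 17, 15),
    datetime(2013, 1, 15, 8, 30),
]

hourly = counts_per_bucket(stamps, "%Y-%m-%d %H:00")
daily = counts_per_bucket(stamps, "%Y-%m-%d")
```

The hourly view shows a morning burst on 14 January that the daily view hides, while the daily view makes the difference between the two days easier to see.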
Filters involve sampling, which social scientists know a lot about. For example with Twitter, research often involves:
  choosing keywords or hashtags that identify the relevant streams of information, or
  identifying a set of seed users from whom to snowball in reconstructing networks of communication.
We access Twitter data via application programming interfaces (APIs). These generally do not give access to the full stream of information, so we don't get a random sample of all activity.
Both of the above can lead to bias, which may lead to incorrect conclusions, e.g. conclusions about the composition of a communication network on Twitter will be biased towards the most central/active actors if snowball sampling is used.
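The snowball bias can be seen in a toy example. The sketch below assumes a hypothetical contact network (one well-connected "hub", a few peripheral users, and an isolated pair) and a simple wave-by-wave snowball from a seed user; real collection via an API would add rate limits and missing data on top of this.

```python
def snowball(contacts, seeds, waves):
    """Users reached from the seeds within the given number of waves.
    contacts maps each user to the set of users they communicate with."""
    sampled = set(seeds)
    frontier = set(seeds)
    for _ in range(waves):
        # Next wave: contacts of the current frontier not yet sampled.
        frontier = {n for u in frontier for n in contacts.get(u, ())} - sampled
        sampled |= frontier
    return sampled

# Hypothetical communication network.
contacts = {
    "hub": {"u1", "u2", "u3", "u4"},
    "u1": {"hub"}, "u2": {"hub"}, "u3": {"hub"},
    "u4": {"hub", "u5"}, "u5": {"u4"},
    "u6": {"u7"}, "u7": {"u6"},   # a pair the snowball never reaches
}

def mean_degree(users):
    """Average number of contacts among the given users."""
    return sum(len(contacts[u]) for u in users) / len(users)

sample = snowball(contacts, ["u1"], waves=2)
sample_mean = mean_degree(sample)        # inflated by the hub
population_mean = mean_degree(contacts)  # all eight users
```

The hub is picked up in the first wave, so the sample looks better connected than the population, and the isolated pair is missed however many waves are run.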
Once we have collected our Twitter data, we need to aggregate them to construct networks of communication. Network ties can be:
  RTs (retweets) - used to broadcast messages previously sent by other users
  @mentions - used to engage in direct communication with others.
Conover et al. (2011) found that there is strong ideological polarisation on Twitter when RTs are used for network ties, but no polarisation when @mentions are used.
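The choice of tie type can be sketched as follows. This assumes old-style text retweets of the form "RT @user: ..." and treats any other @handle as a mention; the tweets, user names and this coding rule are hypothetical simplifications, not the procedure in Conover et al. (2011).

```python
import re
from collections import Counter

RT_PATTERN = re.compile(r"^RT @(\w+)")
MENTION_PATTERN = re.compile(r"@(\w+)")

def build_networks(tweets):
    """Return two weighted edge lists, (retweets, mentions), from
    (author, text) pairs. Each edge is (author, target) -> count."""
    retweets, mentions = Counter(), Counter()
    for author, text in tweets:
        rt = RT_PATTERN.match(text)
        if rt:
            # Retweet: one tie to the retweeted user; handles inside the
            # retweeted text are ignored (a simplifying assumption).
            retweets[(author, rt.group(1))] += 1
        else:
            for target in MENTION_PATTERN.findall(text):
                if target != author:
                    mentions[(author, target)] += 1
    return retweets, mentions

# Hypothetical tweets.
tweets = [
    ("alice", "RT @bob: New poll out today"),
    ("carol", "@bob I disagree with that poll"),
    ("alice", "RT @bob: Second thoughts on the poll"),
    ("dave", "Great thread by @alice and @carol"),
]
rt_net, mention_net = build_networks(tweets)
```

The same four tweets give two different networks: alice is tied to bob only in the retweet network, while carol and dave appear only in the mention network.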
González-Bailón, S. (2014):
"Once again, the data cannot speak by themselves, because a lot of choices are made along the way to determine how best to analyze them - their interpretation very much depends on those choices; which are not data-driven but human.... In other words, Big Data will not bring about the end of theory; quite the contrary. And social science has a crucial role to play in the discovery of the biases that are intrinsic to digital data, as well as in the construction of convincing stories about what those data reveal."
References
Adamic, L., and N. Glance (2005): "The Political Blogosphere and the 2004 U.S. Election: Divided They Blog," Mimeograph. Available at: http://www.blogpulse.com/papers/2005/AdamicGlanceBlogWWW.pdf
Anderson, C. (2008): "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete," Wired magazine.
Burt, R. (2011): "Structural holes in virtual worlds," Booth School of Business (Univ. of Chicago) working paper.
Conover, M. D., J. Ratkiewicz, M. Francisco, B. Goncalves, A. Flammini, and F. Menczer (2011): "Political Polarization on Twitter," in International Conference on Weblogs and Social Media (ICWSM'11).
González-Bailón, S. (2014): "Social Science in the Era of Big Data," forthcoming in Policy & Internet. Available at SSRN: http://ssrn.com/abstract=2238198
Hansen, D. L., B. Shneiderman, and M. A. Smith (2010): Analyzing Social Media Networks with NodeXL: Insights from a connected world. Morgan-Kaufmann, Burlington, MA.
Smith, M., L. Rainie, I. Himelboim, and B. Shneiderman (2014): "Mapping Twitter Topic Networks: From Polarized Crowds to Community Clusters," Pew Research Center report. http://www.pewinternet.org/2014/02/20/mapping-twitter-topic-networks-from-polarized-crowds-to-community-clusters
Williams, D. (2010): "The mapping principle, and a research framework for virtual worlds," Communication Theory, 20(4):451-470.