www.the-data-mine.co.uk
Mining Social Networks
Dr Andy Pryke
Commercial Programming Lecture, October 2011
Contents
What are Social Networks?
Why Analyse Them?
Analysis Techniques
Example Applications
Social Network Analysis
Also called Organizational Network Analysis.
Pre-dates data mining.
Developed by sociologists and anthropologists to formalise their understanding of family and community relationships.
What is a Network
Referred to technically as a "graph".
Each person (or organisation etc.) is represented as a node.
    Visually this is normally a dot or square.
Connections are called "links" or "edges".
    Represented as a line.
    Indicates communications (e.g. emails), purchases, visits, or less tangible things such as emotional relationships.
    Can be "directed" or "undirected", e.g. on Twitter, you follow Stephen Fry, but he doesn't follow you!
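The directed / undirected distinction can be sketched in a few lines of Python. This is only an illustration: the account names and the helper functions `is_reciprocal` and `to_undirected` are made up for the example, and the graph is stored as a plain dictionary of sets rather than with any particular library.

```python
# Hypothetical directed graph: each key "follows" the accounts in its set.
follows = {
    "you": {"stephenfry"},
    "stephenfry": set(),
}

def is_reciprocal(graph, a, b):
    """Return True when a links to b and b links back to a."""
    return b in graph.get(a, set()) and a in graph.get(b, set())

def to_undirected(graph):
    """Drop direction: every link becomes a two-way connection."""
    undirected = {node: set() for node in graph}
    for source, targets in graph.items():
        for target in targets:
            undirected.setdefault(source, set()).add(target)
            undirected.setdefault(target, set()).add(source)
    return undirected

print(is_reciprocal(follows, "you", "stephenfry"))
print(to_undirected(follows))
```

In the directed version the link runs one way only; once direction is dropped, both people appear connected to each other.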
Email communication Graph
Nodes = People
Links = Emails
Source: orgnet.com
Example - Mapping Links between Blogs
Sources:
http://discovermagazine.com/2007/may/map-welcome-to-the-blogosphere
http://datamining.typepad.com/gallery/blog-map-gallery.html
1 - Daily Kos
2 - BoingBoing
3 - LiveJournal Users
4 - Highly Interlinked Blogs
5 - Porn Blogs - not linked in
6 - Sports Blogs - Separate but connected
Example - Twitter Social Network
Source: Bruno Peeters
http://bvlg.blogspot.com/2007/04/twitter-vrienden.html
Video: Nicholas Christakis
The hidden influence of social networks
TED Talk, Feb 2010
Applications of Social Network DM
Typical applications of social network analysis and data mining:
    Detection of criminal activity, counter terrorism, "homeland security" and intelligence
    Analysis of relationships within companies
    Sociological and anthropological studies
    Reciprocal trust schemes such as eBay ratings
    Recommended friends on Facebook
    Filter or recommend social media content
    Etc.
Complex Network Example
How do we Analyse Networks?
Graph Statistics - Individual Nodes
Degree Centrality
    Number of connections to other nodes.
    High values mean many connections.
    Can measure links in and out separately.
Applications
    Who is most listened to on Twitter?
    Who has most contacts within a company?
    Which user's reviews influence others the most?
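As a rough sketch of counting links in and out separately, the Python below tallies in-degree and out-degree from a list of directed links. The names, the data and the `degree_counts` helper are invented for illustration; only the standard library is used.

```python
from collections import Counter

# Hypothetical directed links: (follower, followed).
links = [
    ("ann", "bob"), ("cat", "bob"), ("dan", "bob"),
    ("bob", "ann"), ("dan", "cat"),
]

def degree_counts(edges):
    """Return (in_degree, out_degree) Counters for a list of directed edges."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree

in_degree, out_degree = degree_counts(links)

# "Most listened to" is read here as highest in-degree (most followers).
most_listened_to, followers = in_degree.most_common(1)[0]
print(most_listened_to, followers)
```

Which direction matters depends on the question: followers (links in) for "who is listened to", links out for "who contacts the most people".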
Graph Statistics - Individual Nodes
Closeness Centrality
    The average number of steps required to reach any other node.
    Communications are easier if you don't have to go through too many people.
Applications
    Is this person central to the group?
    Is your message likely to reach the audience?
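A minimal sketch of the "average number of steps" idea, using a breadth-first search over a small made-up network. The node names and the functions `shortest_path_lengths` and `average_steps` are assumptions for the example. Note that many textbooks and tools report closeness as the reciprocal of this average, so that larger values mean "more central"; here the raw average is returned, so smaller is better.

```python
from collections import deque

# Hypothetical undirected network as an adjacency dict: a - b - c - d chain.
network = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}

def shortest_path_lengths(graph, start):
    """Breadth-first search: steps from start to every reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance

def average_steps(graph, node):
    """Average number of steps from node to each other reachable node."""
    distance = shortest_path_lengths(graph, node)
    others = [d for n, d in distance.items() if n != node]
    return sum(others) / len(others) if others else float("inf")

for person in sorted(network):
    print(person, average_steps(network, person))
```

In the chain, the two middle nodes need fewer steps on average than the two end nodes, which matches the intuition that they are more central.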
Graph Statistics - Individual Nodes
Betweenness Centrality
    How much of a link between other nodes is this node?
Applications
    Someone who has a high betweenness centrality is often a broker between others.
    What happens if this person leaves the network?
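One common way to put a number on "how much of a link" a node is counts the shortest paths between other pairs of nodes that pass through it. The sketch below follows Brandes' algorithm for unweighted, undirected graphs and returns unnormalised scores. The network, the node names and the `betweenness` function are made up for the example; this is not the lecture's own implementation.

```python
from collections import deque

# Hypothetical undirected network: two triangles joined through "broker".
network = {
    "a": {"b", "broker"},
    "b": {"a", "broker"},
    "broker": {"a", "b", "x", "y"},
    "x": {"y", "broker"},
    "y": {"x", "broker"},
}

def betweenness(graph):
    """Unnormalised betweenness for an unweighted, undirected graph."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search, counting shortest paths from source.
        order = []
        predecessors = {node: [] for node in graph}
        paths = {node: 0 for node in graph}
        paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in graph[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    paths[neighbour] += paths[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = paths[pred] / paths[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair was counted once from each end.
    return {node: value / 2 for node, value in score.items()}

scores = betweenness(network)
print(scores)
```

Here every shortest path between the two triangles runs through "broker", so it gets the only non-zero score; removing it would split the network in two.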
Graph Statistics - Networks as a Whole
Structural holes
    Gaps in linkage between groups.
Applications
    People who bridge such a gap can access information from both groups, suggesting influence and understanding of an organisation.
    Can we create a bridge?
    Is there an opportunity to control or influence communications between groups?
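If group membership is already known, a simple first look at gaps and bridges is to list the links that cross group boundaries. The sketch below assumes hypothetical people, groups and helper names (`bridging_links`, `links_between`); it does not try to discover the groups itself.

```python
# Hypothetical data: who belongs to which group, and who talks to whom.
group_of = {
    "ann": "sales", "bob": "sales", "cat": "sales",
    "dan": "engineering", "eve": "engineering",
}
links = [
    ("ann", "bob"), ("bob", "cat"), ("ann", "cat"),
    ("dan", "eve"), ("cat", "dan"),
]

def bridging_links(edges, groups):
    """Return the links whose two ends sit in different groups."""
    return [(a, b) for a, b in edges if groups[a] != groups[b]]

def links_between(edges, groups):
    """Count links for each unordered pair of distinct groups."""
    counts = {}
    for a, b in bridging_links(edges, groups):
        pair = tuple(sorted((groups[a], groups[b])))
        counts[pair] = counts.get(pair, 0) + 1
    return counts

print(bridging_links(links, group_of))
print(links_between(links, group_of))
```

A pair of groups with very few crossing links, or none, is a candidate structural hole; the people on the crossing links are the current bridges.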
Graph Statistics - Networks as a Whole
Degree of centralisation
    Is the network held together by just a few nodes? Or is it more cohesive?
    Measures include average and variance of degree centrality.
Applications
    Is a crime network vulnerable to disruption?
    What happens to a company if a few key people leave?
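The average and variance of degree can be computed directly with Python's standard `statistics` module. The two toy networks below (a star and a ring) and the `degree_summary` helper are made up to show the contrast: a high variance suggests a few nodes hold the network together.

```python
from statistics import mean, pvariance

# Hypothetical networks as adjacency dicts.
# Star: everyone is connected only through "hub".
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"},
}
# Ring: every node has exactly two neighbours.
ring = {
    "a": {"b", "e"}, "b": {"a", "c"}, "c": {"b", "d"},
    "d": {"c", "e"}, "e": {"d", "a"},
}

def degree_summary(graph):
    """Return (mean degree, population variance of degree)."""
    degrees = [len(neighbours) for neighbours in graph.values()]
    return mean(degrees), pvariance(degrees)

print("star", degree_summary(star))
print("ring", degree_summary(ring))
```

The ring has zero variance because every node is equally connected; the star's variance is high because removing the hub disconnects everyone.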
Graph Statistics – More…
There are many other measures. For examples see:
    http://faculty.ucr.edu/~hanneman/networkshop/index.html
    http://en.wikipedia.org/wiki/Social_network
Data Mining Approaches to Networks
Structural Equivalence
    Find nodes with similar roles in the network
Cluster Analysis
    Identify groups of nodes which are closely connected - and characterise them
Identifying the Most Influential People
Predicting Node Types (e.g. Fraudster)
Profiling Sub-networks (e.g. terrorist cell)
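As one possible illustration of structural equivalence, the sketch below scores two nodes by how much their neighbour sets overlap (Jaccard similarity). This is a simple stand-in chosen for the example, not a method named in the lecture; the network, node names and helper functions are all hypothetical.

```python
from itertools import combinations

# Hypothetical network: two managers who share the same three reports.
network = {
    "manager1": {"r1", "r2", "r3"},
    "manager2": {"r1", "r2", "r3"},
    "r1": {"manager1", "manager2"},
    "r2": {"manager1", "manager2"},
    "r3": {"manager1", "manager2", "outsider"},
    "outsider": {"r3"},
}

def neighbour_similarity(graph, a, b):
    """Jaccard similarity of two neighbour sets, ignoring a and b themselves."""
    neighbours_a = graph[a] - {a, b}
    neighbours_b = graph[b] - {a, b}
    union = neighbours_a | neighbours_b
    return len(neighbours_a & neighbours_b) / len(union) if union else 0.0

def most_similar_pairs(graph, threshold=1.0):
    """Pairs of nodes whose neighbour overlap reaches the threshold."""
    return sorted(
        pair for pair in combinations(sorted(graph), 2)
        if neighbour_similarity(graph, *pair) >= threshold
    )

print(most_similar_pairs(network))
print(neighbour_similarity(network, "r1", "r3"))
```

Nodes with identical neighbours (similarity 1.0) play interchangeable roles in this toy network; lowering the threshold finds nodes with merely similar roles.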
Twitter - Clustered Network
To reduce clutter, we can cluster people who reference each other, and only show links within clusters.
http://www.neoformix.com/2009/TorontoTwitterCommunity.html
Data Mining Social Networks - Challenges
Standard problems
    Incompleteness – We don't know everything
    Incorrectness – What we think we know is wrong
    Inconsistency – We have contradictions in our data
Data transformation - Getting data into a form accepted by your tools
Fuzzy Boundaries - Networks do not normally have distinct boundaries
Network Dynamics - Relationships change over time
Example Application - Viral Marketing
"In our experiments with the Epinions knowledge-sharing Web site, the most valuable customer had a network value of over 20,000, meaning that marketing to that customer was as effective as marketing to over 20,000 others in the absence of network effects, but the customer's number of direct links to others in the network (i.e., people who read his reviews) was much smaller."
Pedro Domingos, Mining Social Networks for Viral Marketing
http://www.cs.washington.edu/homes/pedrod/papers/iis04.pdf
Example - Identifying Academic Groups
Community Detection in Large-Scale Social Networks
Nan Du, Bin Wu, Xin Pei, Bai Wang and Liutong Xu, SIGKDD Workshop on Web Mining and Social Network Analysis, August 12-15, 2007, San Jose, California
Software for Social Network Analysis / DM
StatNet – R Packages - http://statnet.org/
StatNet Tutorial - http://www.jstatsoft.org/v24/i09/paper
JUNG – Open Source Java toolkit for SNA - http://jung.sourceforge.net/
NetMiner - Commercial, Comprehensive SNA - http://www.netminer.com/
Pajek - Comprehensive Social Network Analysis, free for academic use - http://pajek.imfm.si/doku.php
Subdue - Graph-based data mining tool. Copyrighted but freely downloadable - http://ailab.wsu.edu/subdue/
More - http://en.wikipedia.org/wiki/Social_network_analysis_software
Looking Forward
Lots and lots of network data out there
What about:
    Applications for individuals
    Social Applications (e.g. like TheyWorkForYou.com)
    Applications within a University
    Applications which make money
Potential final year / M.Sc. Projects?
Bibliography
Very out of date - do look for newer papers and references!
Bibliography - Overview
Paper credited with launching the field - Barnes, J. (1954). Class and Committees in a Norwegian Island Parish. Human Relations, 7, 39-58.
List of systems for Mining Graph data - http://hms.liacs.nl/graphs.html
Introduction to Social Network Analysis - http://www.orgnet.com/sna.html
Network Theory and Analysis in Organizations, a brief overview - http://www.tcw.utwente.nl/theorieenoverzicht/Theory%20clusters/Organizational%20Communication/Network%20Theory%20and%20analysis_also_within_organizations.doc/
Bibliography - Journals and Workshops
Social Networks Journal - http://www.elsevier.com/wps/find/journaldescription.cws_home/505596/description
Workshop on Link Analysis and Group Detection - http://kt.ijs.si/Dunja/LinkKDD2006/
SIGKDD Workshop on Web Mining and Social Network Analysis - http://workshops.socialnetworkanalysis.info/websnakdd2007/
Bibliography - Data Mining Papers
Maitrayee Mukherjee and Lawrence B. Holder, Graph-based Data Mining on Social Networks - http://www-2.cs.cmu.edu/~dunja/LinkKDD2004/Maitrayee-Mukherjee-LinkKDD-2004.pdf
Ingrid Fischer and Thorsten Meinl, Graph Based Molecular Data Mining - An Overview - http://www2.informatik.uni-erlangen.de/Forschung/Publikationen/download/graphBasedDM_SMC2004.pdf
Jennifer Xu and Hsinchun Chen, Criminal Network Analysis and Visualization: A Data Mining Perspective, Communications of the ACM - http://ai.eller.arizona.edu/COPLINK/publications/crimenet/Xu_CACM.doc
Bibliography - Data Mining Papers (2)
Pedro Domingos, Mining Social Networks for Viral Marketing - http://www.cs.washington.edu/homes/pedrod/papers/iis04.pdf
David Jensen and Jennifer Neville, Data Mining in Social Networks - Looks specifically at predicting film receipts from IMDB data - http://kdl.cs.umass.edu/papers/jensen-neville-nas2002.pdf
Bootstrapping the FOAF-Web: An Experiment in Social Network Mining - http://www.w3.org/2001/sw/Europe/events/foaf-galway/papers/fp/bootstrapping_the_foaf_web/
Impact of Computers on SNA
The rise in the power and use of computers has had two main impacts.
1. New data is available from logs of email conversations, phone calls, chat and website usage, Facebook friends, tweets etc.
2. Computers can be employed for analysis and data mining.
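To give a feel for how such logs become a network, the sketch below collapses a made-up email log into weighted, directed links. The record layout (sender, recipient), the names and the helper functions are assumptions for illustration; real logs would need parsing and cleaning first.

```python
from collections import Counter

# Hypothetical log records: one (sender, recipient) pair per message.
email_log = [
    ("ann", "bob"), ("ann", "bob"), ("bob", "ann"),
    ("cat", "ann"), ("ann", "bob"),
]

def weighted_links(records):
    """Collapse repeated records into directed links weighted by message count."""
    return Counter(records)

def adjacency(records):
    """Build {sender: {recipient: message_count}} from the raw records."""
    graph = {}
    for (source, target), weight in weighted_links(records).items():
        graph.setdefault(source, {})[target] = weight
        graph.setdefault(target, {})  # make sure recipients appear as nodes too
    return graph

print(weighted_links(email_log))
print(adjacency(email_log))
```

The same pattern applies to phone calls, purchases or tweets: each log line contributes to a link, and the counts become link weights.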
Role of computer analysis
Data collected about social networks can be complex and large.
Imagine a network documenting each purchase you've made using a credit/debit card, every phone call and SMS, each email etc.
When these kinds of data are collected over large populations, the resulting graphs are much too large to be understood by eye.