alice-lee
How to Analyse Social Network? : Part 2
Social networks can be represented by complex networks.
A social network is a social structure made up of individuals (or organizations) called “nodes”, which are connected by one or more types of relationships, represented by “links”: friendship, kinship, common interest, and so on.
Graph-based structures are very complex.
Source: http://followingfactory.com/
Introduction
Various natural and social systems can be described as complex networks: social systems, biological systems, and communication systems.
When a network is represented as a graph, vertices (nodes) represent individuals or organizations and edges (links) represent the interactions among them.
Source: http://www.fmsasg.com/SocialNetworkAnalysis
Why is network anatomy so important to characterize? Because structure always affects function.
For instance, the topology of social networks affects the spread of information.
Network Models
Regular networks: chains, grids, lattices and fully connected graphs
Random network model by Erdős and Rényi: ER model
Small-world phenomenon by Watts and Strogatz: WS model
Scale-free network model by Barabási and Albert: BA model
The evolution mechanisms of network structures are of great interest to many researchers, not only in engineering but also in the physics community.
Types of Network Models
Regular Networks
1. Ring of ten nodes connected to their nearest neighbours.
2. Fully connected network of ten nodes
Random Networks
Place n nodes on a plane, then join pairs of them together at random until m links are used.
Nodes may be chosen more than once, or not at all.
Random Networks
Erdős and Rényi studied how the expected topology of this random graph changes as a function of m.
When m is small, the graph is likely to be fragmented into many small clusters of nodes, called components.
As m increases, the components grow, at first by linking to isolated nodes and later by coalescing with other components.
Random Networks
A phase transition occurs at m = n/2, where many clusters crosslink spontaneously to form a single giant component.
For m > n/2, this giant component contains on the order of n nodes (its size scales linearly with n), while its closest rival contains only about log n nodes.
All nodes in the giant component are connected to each other by short paths: the maximum number of 'degrees of separation' between any two nodes grows slowly, like log n.
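The emergence of the giant component can be observed with a small simulation. The following is a minimal sketch, not a reference implementation: the function name `largest_component_fraction` and the use of a union-find structure are illustrative choices, and repeated links between the same pair are simply allowed.

```python
import random


def largest_component_fraction(n, m, seed=0):
    """Join m random pairs among n nodes; return the share of nodes
    that end up in the largest connected component."""
    rng = random.Random(seed)
    parent = list(range(n))   # union-find forest
    size = [1] * n            # component size stored at each root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(m):
        a, b = rng.sample(range(n), 2)     # two distinct nodes
        ra, rb = find(a), find(b)
        if ra != rb:
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]
    return max(size[find(v)] for v in range(n)) / n


if __name__ == "__main__":
    n = 2000
    for m in (n // 4, n // 2, n, 2 * n):
        print(m, round(largest_component_fraction(n, m), 3))
```

Running it for values of m below and above n/2 should show the largest component jumping from a tiny fraction of the nodes to most of them.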
Random Networks
Application areas:
Gene networks
Ecosystems
Spread of infectious diseases
Computer viruses
Small-World Networks
Watts and Strogatz studied a simple model that can be tuned through the middle ground between regular and random networks: a regular lattice where the original links are replaced by random ones with some probability 0 < p < 1.
Even the slightest bit of rewiring transforms the network into a 'small world', with short paths between any two nodes, just as in the giant component of a random graph.
Small-World Networks
The small-world network is much more highly clustered than a random graph: if A is linked to B and B is linked to C, there is a greatly increased probability that A will also be linked to C.
These two properties, short paths and high clustering, hold for many natural and technological networks.
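The two small-world properties can be measured on a rewired ring lattice. The sketch below is only an illustration under stated assumptions: the helper names (`ring_lattice`, `rewire`, `average_clustering`, `average_path_length`) are hypothetical, the rewiring rule is a simplified Watts-Strogatz style procedure, and path lengths are averaged over reachable pairs only.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for d in range(1, k + 1):
            w = (v + d) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def rewire(adj, k, p, rng):
    """With probability p, replace lattice link (v, v+d) by a link from v
    to a random node that is not yet a neighbour of v (modifies adj)."""
    n = len(adj)
    for d in range(1, k + 1):
        for v in range(n):
            w = (v + d) % n
            if w in adj[v] and rng.random() < p:
                choices = [u for u in range(n) if u != v and u not in adj[v]]
                if choices:
                    u = rng.choice(choices)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(u)
                    adj[u].add(v)
    return adj


def average_clustering(adj):
    """Mean over nodes of (links among neighbours) / (possible links)."""
    total = 0.0
    for v, nbrs in adj.items():
        deg = len(nbrs)
        if deg < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (deg * (deg - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    rng = random.Random(1)
    for p in (0.0, 0.05, 1.0):
        g = rewire(ring_lattice(200, 3), 3, p, rng)
        print(p, round(average_clustering(g), 3), round(average_path_length(g), 2))
```

For a small p, the average path length should drop sharply while the clustering stays close to that of the lattice.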
Small-World Networks
A variant of the model starts with a ring of n nodes, each connected by undirected links to its nearest and next-nearest neighbours out to some range k.
Shortcut links are then added (rather than rewired) between randomly selected pairs of nodes, with probability p per link on the underlying lattice; thus there are typically nkp shortcuts in the graph.
How many steps are required to go from one node to another along the shortest route?
Small-World Networks
Search problems: how to actually find a short chain of acquaintances linking yourself to a random target person.
Scale-Free Networks
Some nodes are more highly connected than others are.
To quantify this effect, let p_k denote the fraction of nodes that have k links.
k is called the degree and p_k is the degree distribution.
In a scale-free network, the probability P(k) that a node connects to k other nodes follows a power-law degree distribution,
$$P(k) \sim k^{-\gamma}$$
where k is the degree of a node and γ is a scalar exponent.
Scale-Free Networks
The probability of attachment is proportional to the degree of the target node; thus richly connected nodes tend to get richer, leading to the formation of hubs and a skewed degree distribution with a heavy tail.
[Figure: Red, k=33 links; blue, k=12; green, k=11. Here n=200 nodes, m=199 links.]
Scale-Free Networks
Scale-free networks are resistant to random failures because a few hubs dominate their topology.
Most large networks have been shown to have scale-free features consistent with the properties of the BA network.
Two features of realistic networks are not captured by either the ER or the WS model.
The first issue is network growth. Both models start with a fixed number of nodes (the size of the network) and never modify it, so the size of the network stays constant.
Most real networks grow continuously; new nodes can be added to the system at any time.
For example, the World-Wide-Web grows as new documents are added.
The second issue is connection probability. In the random network, two nodes are connected by random selection. Most real networks instead exhibit preferential connection.
For example, new documents on the World-Wide-Web tend to link to popular documents that already have high connectivity.
The BA model addresses both issues of realistic networks:
The network expands continuously, and its degree distribution follows a power law.
New nodes are added and connected to existing nodes in the network.
New nodes are connected to existing ones by preferential attachment: a node that already has a large number of connections has a higher probability of receiving a new link.
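Growth with preferential attachment can be sketched in a few lines. This is an illustrative simplification, assuming each new node brings exactly one link; the names `grow_network` and `degree_counts` are hypothetical, and the `endpoints` list is just one convenient way to sample a target with probability proportional to its degree.

```python
import random
from collections import Counter


def grow_network(n, seed=0):
    """Grow a network to n nodes; each new node attaches one link to an
    existing node chosen with probability proportional to its degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]          # start from two connected nodes
    endpoints = [0, 1]        # every node appears once per link it has
    for new in range(2, n):
        target = rng.choice(endpoints)   # degree-proportional choice
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges


def degree_counts(edges):
    """Return a Counter mapping each node to its degree."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


if __name__ == "__main__":
    deg = degree_counts(grow_network(200))
    histogram = Counter(deg.values())    # degree k -> number of nodes
    for k in sorted(histogram):
        print(k, histogram[k] / 200)     # empirical p_k
```

With n = 200 this produces 199 links. The printed fractions should be heavily skewed: most nodes have a single link while a few hubs collect many.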
The network of co-authorship relationships in SEG's journal Geophysics is scale-free
Source: http://www.agilegeoscience.com/journal/tag/networks
Graph Representation of Networks
Simple Graphs
DEF: A simple graph G = (V, E) consists of a non-empty set V of vertices (or nodes) and a set E (possibly empty) of edges where each edge is a subset of V with cardinality 2 (an unordered pair).
How to analyse social networks?
Graph Representation of Networks
Multigraphs: multiple edges are allowed, but still no self-loops.
Pseudographs: self-loops are also allowed.
Undirected Graphs Terminology
Vertices are adjacent if they are the endpoints of the same edge.
Q: Which vertices are adjacent to 1? How about adjacent to 2, 3, and 4?
[Figure: undirected graph on vertices 1, 2, 3, 4 with edges e1 to e6]
A: 1 is adjacent to 2 and 3
2 is adjacent to 1 and 3
3 is adjacent to 1 and 2
4 is not adjacent to any vertex
A vertex is incident with an edge (and the edge is incident with the vertex) if it is the endpoint of the edge.
Q: Which edges are incident to 1? How about incident to 2, 3, and 4?
A: 1 is incident with e1, e2, e3, e6
2 is incident with e1, e2, e4, e5, e6
3 is incident with e3, e4, e5
4 is not incident with any edge
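The adjacency and incidence answers can be checked mechanically. In the sketch below the endpoints of e1 to e6 are an assumption: they are not given explicitly in the text but inferred from the incidence answers, and the function names are hypothetical.

```python
# Edge endpoints inferred from the incidence answers (an assumption,
# since the original figure is not available).
edges = {
    "e1": (1, 2), "e2": (1, 2), "e3": (1, 3),
    "e4": (2, 3), "e5": (2, 3), "e6": (1, 2),
}
vertices = [1, 2, 3, 4]


def adjacent_to(vertex):
    """Vertices sharing at least one edge with the given vertex."""
    neighbours = set()
    for a, b in edges.values():
        if a == vertex:
            neighbours.add(b)
        elif b == vertex:
            neighbours.add(a)
    return sorted(neighbours)


def incident_with(vertex):
    """Names of the edges that have the given vertex as an endpoint."""
    return sorted(name for name, ends in edges.items() if vertex in ends)


if __name__ == "__main__":
    for v in vertices:
        print(v, adjacent_to(v), incident_with(v))
```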
Digraphs
Last time we introduced digraphs as a way of representing relations:
[Figure: digraph on vertices 1, 2, 3, 4]
Q: What type of pair should each edge be (multiple edges not allowed)?
A: Each edge is directed, so it is an ordered pair (or tuple) rather than an unordered pair.
Thus the set of edges E is just the represented relation on V.
[Figure: digraph on vertices 1, 2, 3, 4 with edges (1,1), (1,2), (1,3), (2,2), (2,3), (2,4), (3,3), (3,4), (4,4)]
DEF: A directed graph (or digraph) G = (V, E) consists of a non-empty set V of vertices (or nodes) and a set E of edges with E ⊆ V × V.
The edge (a, b) is also denoted by a → b; a is called the source of the edge while b is called the target of the edge.
Degree: The degree of a vertex counts the number of edges that are incident with it.
Oriented Degree when Edges Directed: The in-degree of a vertex (deg-) counts the number of edges that stick in to the vertex. The out-degree (deg+) counts the number sticking out.
Network Analysis
Handshaking Theorem
THM: In an undirected graph
$$|E| = \frac{1}{2} \sum_{v \in V} \deg(v)$$
In a directed graph
$$|E| = \sum_{v \in V} \deg^{-}(v) = \sum_{v \in V} \deg^{+}(v)$$
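Both statements of the handshaking theorem can be verified on the two example graphs from the earlier slides. This is a small sketch: the directed edge list is the one shown for the digraph example, while the undirected endpoints are an assumption inferred from the incidence answers; the function names are hypothetical.

```python
from collections import Counter

# Undirected example (endpoints of e1..e6 inferred, hence an assumption).
undirected_edges = [(1, 2), (1, 2), (1, 3), (2, 3), (2, 3), (1, 2)]
# Directed example: the ordered pairs listed for the digraph.
directed_edges = [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3),
                  (2, 4), (3, 3), (3, 4), (4, 4)]


def degrees(edges):
    """Degree of each vertex in an undirected (multi)graph."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1   # a self-loop (a == b) would count twice
    return deg


def in_out_degrees(edges):
    """In-degree and out-degree of each vertex in a digraph."""
    indeg, outdeg = Counter(), Counter()
    for source, target in edges:
        outdeg[source] += 1
        indeg[target] += 1
    return indeg, outdeg


if __name__ == "__main__":
    deg = degrees(undirected_edges)
    print(sum(deg.values()), 2 * len(undirected_edges))
    indeg, outdeg = in_out_degrees(directed_edges)
    print(sum(indeg.values()), sum(outdeg.values()), len(directed_edges))
```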
Adjacency Matrix
For a directed graph G = (V, E) define the matrix A_G by:
Rows, columns: one for each vertex in V
Value at the i-th row and j-th column is
1 if the i-th vertex connects to the j-th vertex (i → j)
0 otherwise
For a directed multigraph G = (V, E) define the matrix A_G by:
Rows, columns: one for each vertex in V
Value at the i-th row and j-th column is the number of edges with source the i-th vertex and target the j-th vertex
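The definition translates directly into code. The sketch below builds the matrix for the digraph example from the earlier slide; the function name `adjacency_matrix` is an illustrative choice, and counting edges (rather than storing 0/1) covers the multigraph case as well.

```python
def adjacency_matrix(vertices, edges):
    """Entry [i][j] counts the edges from the i-th to the j-th vertex."""
    index = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for source, target in edges:
        matrix[index[source]][index[target]] += 1
    return matrix


if __name__ == "__main__":
    vertices = [1, 2, 3, 4]
    edges = [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3),
             (2, 4), (3, 3), (3, 4), (4, 4)]
    for row in adjacency_matrix(vertices, edges):
        print(row)
```

Row sums of this matrix give the out-degrees and column sums give the in-degrees.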
Other Types of Graphs
Complete Graphs – Kn
A simple graph is complete if every pair of distinct vertices shares an edge.
Cycle Graphs – Cn
The cycle graph Cn is a circular graph.
Wheel Graphs – Wn
The wheel graph Wn is just a cycle graph with an extra vertex in the middle, joined to every vertex of the cycle.
Bipartite Graphs
A simple graph is bipartite if V can be partitioned into V = V1 ∪ V2 so that any two adjacent vertices are in different parts of the partition. No two vertices of the same part are adjacent.
There are various measures of the centrality of a vertex within a graph that determine the relative importance of the vertex within the graph, for example:
how important a person is within a social network
who is the most well-known author in a citation network
Centrality Measures
Degree centrality
Degree centrality is defined as the number of links incident upon a node (i.e., the number of ties that a node has).
Degree is often interpreted in terms of the immediate risk of a node catching whatever is flowing through the network, such as a virus or some information.
If the network is directed (meaning that ties have direction), then we usually define two separate measures of degree centrality, namely indegree and outdegree.
Indegree is a count of the number of ties directed to the node.
Outdegree is the number of ties that the node directs to others.
For positive relations such as friendship or advice, we normally interpret indegree as a form of popularity, and outdegree as gregariousness.
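Indegree and outdegree are simple counts over the list of directed ties. The following sketch uses a small made-up friendship-nomination network; the names in `ties` and the function name `degree_centrality` are purely hypothetical.

```python
def degree_centrality(ties):
    """Return (indegree, outdegree) dictionaries for directed ties,
    where each tie is a (source, target) pair."""
    indegree, outdegree = {}, {}
    for source, target in ties:
        for name in (source, target):
            indegree.setdefault(name, 0)
            outdegree.setdefault(name, 0)
        outdegree[source] += 1
        indegree[target] += 1
    return indegree, outdegree


if __name__ == "__main__":
    # Hypothetical data: (A, B) means "A names B as a friend".
    ties = [("Ann", "Bob"), ("Cat", "Bob"), ("Dan", "Bob"),
            ("Bob", "Ann"), ("Dan", "Cat")]
    indegree, outdegree = degree_centrality(ties)
    print("most popular:", max(indegree, key=indegree.get))
    print("most gregarious:", max(outdegree, key=outdegree.get))
```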
An entity with high degree centrality:
Is generally an active player in the network.
Is often a connector or hub in the network.
Is not necessarily the most connected entity in the network (an entity may have a large number of relationships, the majority of which point to low-level entities).
May be in an advantaged position in the network.
May have alternative avenues to satisfy organizational needs, and consequently may be less dependent on other individuals.
Can often be identified as third parties or deal makers.
Alice has the highest degree centrality, which means that she is quite active in the network. However, she is not necessarily the most powerful person because she is only directly connected within one degree to people in her clique; she has to go through Rafael to get to other cliques.
Source: http://www.fmsasg.com/SocialNetworkAnalysis/
Betweenness Centrality
Betweenness is a centrality measure of a vertex within a graph.
Vertices that occur on many shortest paths between other vertices have higher betweenness than those that do not.
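Counting shortest paths through each vertex can be done with a breadth-first search from every source, in the style of Brandes' algorithm for unweighted graphs. The sketch below is illustrative only: the small graph (two triangles bridged by one person) is hypothetical and is not the network from the slides, although it reuses the names Alice, Rafael and Aldo.

```python
from collections import deque


def betweenness(adj):
    """Unnormalised betweenness for an undirected, unweighted graph
    given as {node: set_of_neighbours}."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                          # nodes in BFS visiting order
        preds = {v: [] for v in adj}        # predecessors on shortest paths
        sigma = {v: 0 for v in adj}         # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}       # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both ends.
    return {v: b / 2 for v, b in score.items()}


def build_graph(links):
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


if __name__ == "__main__":
    # Hypothetical network: two cliques bridged by Rafael.
    links = [("Alice", "Bea"), ("Alice", "Carl"), ("Bea", "Carl"),
             ("Alice", "Rafael"), ("Rafael", "Aldo"),
             ("Aldo", "Eve"), ("Aldo", "Finn"), ("Eve", "Finn")]
    for name, value in sorted(betweenness(build_graph(links)).items()):
        print(name, value)
```

In this toy graph the bridging vertex lies on every shortest path between the two cliques, so it receives the highest score.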
An entity with a high betweenness centrality generally:
Holds a favored or powerful position in the network.
Represents a single point of failure: take the single betweenness spanner out of a network and you sever ties between cliques.
Has a greater amount of influence over what happens in a network.
Rafael has the highest betweenness because he is between Alice and Aldo, who are between other entities. Alice and Aldo have a slightly lower betweenness because they are essentially only between their own cliques. Therefore, although Alice has a higher degree centrality, Rafael has more importance in the network in certain respects.
Source: http://www.fmsasg.com/SocialNetworkAnalysis/
Closeness Centrality
Closeness is one of the basic concepts in a topological space.
We say two sets are close if they are arbitrarily near to each other.
The concept can be defined naturally in a metric space where a notion of distance between elements of the space is defined, but it can be generalized to topological spaces where we have no concrete way to measure distances.
Closeness is a centrality measure of a vertex within a graph.
Vertices that are 'shallow' to other vertices (that is, those that tend to have short geodesic distances to other vertices within the graph) have higher closeness.
Closeness is preferred in network analysis to mean shortest-path length, as it gives higher values to more central vertices, and so is usually positively associated with other measures such as degree.
Closeness centrality measures how quickly an entity can access more entities in a network.
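One common way to turn this idea into a number is the inverse of the mean geodesic distance, (n - 1) divided by the sum of distances to all other vertices. The sketch below assumes that definition and a connected graph; the graph itself is hypothetical (two triangles bridged by Rafael) and only borrows names from the slides.

```python
from collections import deque


def closeness(adj):
    """Closeness as (n - 1) / (sum of shortest-path distances),
    assuming the graph is connected."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:                       # breadth-first search from s
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        result[s] = (len(adj) - 1) / sum(dist.values())
    return result


def build_graph(links):
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


if __name__ == "__main__":
    # Hypothetical network: two cliques bridged by Rafael.
    links = [("Alice", "Bea"), ("Alice", "Carl"), ("Bea", "Carl"),
             ("Alice", "Rafael"), ("Rafael", "Aldo"),
             ("Aldo", "Eve"), ("Aldo", "Finn"), ("Eve", "Finn")]
    scores = closeness(build_graph(links))
    for name in sorted(scores, key=scores.get, reverse=True):
        print(name, round(scores[name], 3))
```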
An entity with a high closeness centrality generally:
Has quick access to other entities in a network.
Has a short path to other entities.
Is close to other entities.
Has high visibility as to what is happening in the network.
Closeness Centrality
Rafael has the highest closeness centrality because he can reach more entities through shorter paths. As such, Rafael's placement allows him to connect to entities in his own clique, and to entities that span cliques.
Source: http://www.fmsasg.com/SocialNetworkAnalysis/
Hub and Authority (for directed graph)
If an entity has a high number of relationships pointing to it, it has a high authority value, and generally:
Is a knowledge or organizational authority within a domain.
Acts as a definitive source of information.
Hubs are entities that point to a relatively large number of authorities. They are essentially the mutually reinforcing analogues to authorities. High authorities are pointed to by high hubs. Hubs point to high authorities. You cannot have one without the other.
Source: http://www.fmsasg.com/SocialNetworkAnalysis/
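The mutual reinforcement between hubs and authorities can be sketched as an iteration in the spirit of the HITS algorithm. This is a minimal illustration, not the method used by any particular tool: the function name `hits`, the fixed iteration count, and the page names in the example are all assumptions.

```python
import math


def hits(links, iterations=50):
    """Iteratively compute hub and authority scores for directed links
    given as (source, target) pairs. Scores are L2-normalised."""
    nodes = sorted({n for link in links for n in link})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the nodes pointing to it.
        auth = {n: sum(hub[s] for s, t in links if t == n) for n in nodes}
        # Hub: sum of authority scores of the nodes it points to.
        hub = {n: sum(auth[t] for s, t in links if s == n) for n in nodes}
        a_norm = math.sqrt(sum(x * x for x in auth.values()))
        h_norm = math.sqrt(sum(x * x for x in hub.values()))
        auth = {n: x / a_norm for n, x in auth.items()}
        hub = {n: x / h_norm for n, x in hub.items()}
    return hub, auth


if __name__ == "__main__":
    # Hypothetical links: portals pointing to documents.
    links = [("portal_a", "doc_x"), ("portal_a", "doc_y"),
             ("portal_b", "doc_x"), ("portal_b", "doc_y"),
             ("portal_c", "doc_x")]
    hub, auth = hits(links)
    print("top authority:", max(auth, key=auth.get))
    print("hub scores:", {n: round(h, 3) for n, h in hub.items()})
```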
Eigenvector Centrality
Eigenvector centrality is a measure of the importance of a node in a network. It assigns relative scores to all nodes in the network based on the principle that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes.
Google's PageRank is a variant of the eigenvector centrality measure.
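The principle that a node's score is built from its neighbours' scores can be sketched with power iteration on an undirected graph. The code below is a simplified illustration under stated assumptions: the graph is connected and not bipartite (so the iteration settles), the function name `eigenvector_centrality` is hypothetical, and the example network is made up.

```python
def eigenvector_centrality(adj, iterations=100):
    """Power iteration: each node's new score is the sum of its
    neighbours' scores, rescaled so the largest score is 1."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new = {v: sum(score[w] for w in adj[v]) for v in adj}
        largest = max(new.values())
        score = {v: x / largest for v, x in new.items()}
    return score


def build_graph(links):
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


if __name__ == "__main__":
    # Hypothetical network: two cliques bridged by Rafael.
    links = [("Alice", "Bea"), ("Alice", "Carl"), ("Bea", "Carl"),
             ("Alice", "Rafael"), ("Rafael", "Aldo"),
             ("Aldo", "Eve"), ("Aldo", "Finn"), ("Eve", "Finn")]
    scores = eigenvector_centrality(build_graph(links))
    for name in sorted(scores, key=scores.get, reverse=True):
        print(name, round(scores[name], 3))
```

In this toy graph Rafael has fewer links than Bea's neighbours might suggest, yet he outranks Bea because both of his neighbours are themselves high-scoring.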
PageRank
Only Structure Consideration
Knowledge of Global Network Structure
Broken Link Problems
KONECT: the Koblenz Network Collection contains 168 network datasets, for instance:
Animal networks are networks of contacts between animals.
Authorship networks are unweighted bipartite networks consisting of links between authors and their works.
Citation networks consist of documents that reference each other.
Coauthorship networks are unipartite networks connecting authors who have written works together.
Communication networks contain edges that represent individual messages between persons.
KONECT also includes Matlab code to generate statistics and plots about the networks.
Social Network Analysis Software
Source: konect.uni-koblenz.de/networks
“Pajek”: Large Network Analysis Software
Introduction to Slovenian Spider: Pajek
http://vlado.fmf.uni-lj.si/pub/networks/pajek/
Free software
Windows 32 bit
Pajek 2.05
“Whom would you choose as a friend?”
Introduction
Its applications:
Communication networks: links among pages or servers on the Internet, usage of phone calls
Transportation networks
Flow graphs of programs
Bibliographies, citation networks
Data Structures
Six data structures:
Network (*.net) – main object (vertices and lines: arcs, edges);
Partition (*.clu) – nominal property of vertices (e.g. gender);
Vector (*.vec) – numerical property of vertices;
Permutation (*.per) – reordering of vertices;
Cluster (*.cls) – subset of vertices (e.g. a cluster from a partition);
Hierarchy (*.hie) – hierarchically ordered clusters and vertices.
Network Definitions
Graph Theory
Graphs represent the structure of networks
Directed and undirected graphs
Lists of vertices, arcs and edges, where each arc and edge has a value.
To view the network data files: NotePad, EditPlus
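Because a network file is just plain-text lists of vertices, arcs and edges, it can also be read with a short script. The sketch below assumes a simplified layout (a `*Vertices` section with numbered, optionally quoted labels, followed by `*Arcs` and `*Edges` sections whose lines hold source, target and an optional value). Real Pajek files can carry more fields and sections, so treat both the layout and the function name `parse_net` as assumptions.

```python
import shlex

# Hypothetical example file content in the assumed simplified layout.
SAMPLE = '''*Vertices 3
1 "Alice"
2 "Rafael"
3 "Aldo"
*Arcs
1 2 1
*Edges
2 3 1
'''


def parse_net(text):
    """Parse the simplified layout into (labels, arcs, edges).
    Arcs are directed lines, edges are undirected lines."""
    labels, arcs, edges = {}, [], []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            section = line.split()[0].lower()   # e.g. "*vertices"
            continue
        fields = shlex.split(line)              # keeps quoted labels whole
        if section == "*vertices":
            number = int(fields[0])
            labels[number] = fields[1] if len(fields) > 1 else str(number)
        elif section in ("*arcs", "*edges"):
            source, target = int(fields[0]), int(fields[1])
            value = float(fields[2]) if len(fields) > 2 else 1.0
            (arcs if section == "*arcs" else edges).append((source, target, value))
    return labels, arcs, edges


if __name__ == "__main__":
    labels, arcs, edges = parse_net(SAMPLE)
    print(len(labels), "vertices,", len(arcs), "arcs,", len(edges), "edges")
```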
Network Data File
Open Network Data File (*.net)
Number of Vertices
Transform
Report Information
Visualization
Energy – Idea: the network is represented like a physical system, and we are searching for the state with minimal energy. Two algorithms are included:
Layout/Energy/Kamada-Kawai – slower
Layout/Energy/Fruchterman-Reingold – faster, drawing in a plane or space (2D or 3D), and selecting the repulsion factor
Network Creation
Partitions
File name: *.clu
Degree
References
Social Network Analysis: Theory and Applications
Graphs (ppt), Zeph Grunschlag, 2001-2002.
KONECT: http://konect.uni-koblenz.de/networks
Pajek: http://pajek.imfm.si/doku.php?id=download
http://www.fmsasg.com/SocialNetworkAnalysis/