Descriptive Statistics for Networks


  • Networks and Discrete Mathematics

    Descriptive Statistics on Real Networks

    Chronis Moyssiadis, Vassilis Karagiannis

    08/11/2011 WS.04 Webscience: lecture 2

    Aristotle University, School of Mathematics Master in Web Science

  • Lesson 2 Overview

    Scope

    Introduce contemporary basic descriptive statistics on the topology of real networks.

    Provide the skills to explore networks.

    Means

    Mathematical definition and interpretation of each statistical measure on the topology of networks.

    Network examples (using the igraph and NodeXL applications) and simple exercises solved by hand.

    Next Lesson

    Applications of algebraic graph theory in network analysis. Random models of networks (Erdős-Rényi (ER), small-world, scale-free).

    8/11/2011 WS.04 lecture 2:Descriprive Stat on Real Nets - C. Moyssiadis V. Karagiannis 2

  • Main Part (4 hours)

    (after completing the first lesson)

  • Networking Complex Systems


    One stage (probably the final one) of the network of connections among Twitter users who recently tweeted the word "Worldbank", as queried on June 20, 2011, with nodes scaled by number of followers (outliers thresholded). Connections are created when users reply to, mention or follow one another.

  • Networking Complex Systems

    Most researchers would probably agree that a complex system is a system composed of many interacting parts, such that the collective behavior of those parts together is more than the sum of their individual behaviors.

    A complex system can thus be said to be a system of interacting parts that displays emergent behavior.


    2010. Mark Newman, Complex Systems: A Survey

  • Networking Complex Systems

    To quantify the details of the system, one must first specify its topology (who interacts with whom) and then its dynamics (how the individual atoms behave and how they interact).

    Topology is usually specified in terms of lattices or networks, and this is one of the best developed areas of complex systems theory.

    The final step (or the first step of another iteration) is the construction of a model that agrees with the empirical observations and, ideally, can make predictions in a statistical sense. If the model makes significant errors, one continues by gathering new observations and adjusting or reconstructing the initially proposed model.


    2010. Mark Newman, Complex Systems: A Survey

  • Topology helps us understand Function and Evolution


    Tokyo railway network

    Lattice as a chessboard

  • Huge Complicated Topologies


    Most complex systems, however, have more complicated non-regular topologies that require a more general network framework for their representation.

  • The size, the scale, and the shape


    The Greek statisticians' scientific collaboration network over 20 years (2010).

  • Weighted vs. Unweighted Networks


    A common way to represent a complex system as a network is through some transformation of a similarity or dissimilarity measure between objects. Each transformed value becomes the weight of the link between a pair of objects (Horvath, ch. 5). A link (tie) is then set as present if its weight is higher than a cut-off, and removed otherwise. The resulting network is either binary (b, c, d) or weighted (a).

    (a) The weighted net

    (b) The binary net without weights

    (c) The 0-1 net using the cut-off w(link) > 1

    (d) The 0-1 net using the cut-off w(link) > 2
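    A minimal sketch of the cut-off rule, assuming the weighted network is stored as a dictionary from node pairs to weights; the function `binarize`, the variable names and the weights are hypothetical, chosen only for illustration.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]

def binarize(weighted_edges: Dict[Edge, float], cutoff: float) -> Set[Edge]:
    """Keep a link (tie) only if its weight is strictly higher than the cut-off."""
    return {edge for edge, w in weighted_edges.items() if w > cutoff}

# Hypothetical weighted network; the weights are illustrative only.
weights: Dict[Edge, float] = {("a", "b"): 3.0, ("b", "c"): 1.0, ("a", "c"): 2.0, ("c", "d"): 5.0}

binary_net = binarize(weights, 0)  # binary net without weights: every link with w > 0
net_gt1 = binarize(weights, 1)     # 0-1 net with cut-off w(link) > 1
net_gt2 = binarize(weights, 2)     # 0-1 net with cut-off w(link) > 2
```

    Raising the cut-off can only remove links, so the binary nets are nested: every link of the w > 2 net is also in the w > 1 net.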

  • Global and Local Statistical Measures

    Local measures (statistics, indices) are those that characterize individual nodes, links or their neighborhood.

    Global measures (statistics, indices) are either the distribution of a local measure over the set of nodes or links, or a summary statistic computed from that distribution.


  • Network type and Connectivity

  • The first questions to be answered

    Does the network have multiple edges or loops? (Usually we delete them, but the researcher is responsible for this decision.)

    Is it connected? If it is, find the node and edge connectivity numbers; if not, find the components.

    Is it a directed network? If so, find the strongly connected and the weakly connected components.

    Compute the network density.

    Giant component and component distribution. In some disconnected networks, one component is much larger than all the others (the giant component). What proportion of the nodes does it include? How many nodes are in the second largest component? What is the distribution of the components by the number of nodes they include?
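    These checks can be sketched for an undirected network stored as an adjacency dictionary (node -> set of neighbours). The names `connected_components` and `density` and the toy edge list are hypothetical; the density formula 2m / (n(n - 1)) assumes no loops or multiple edges, and a directed network would need strongly and weakly connected components instead.

```python
from collections import Counter, deque
from typing import Dict, List, Set

def connected_components(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Components of an undirected graph, largest first (breadth-first search)."""
    seen: Set[int] = set()
    components: List[Set[int]] = []
    for start in adj:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return sorted(components, key=len, reverse=True)

def density(adj: Dict[int, Set[int]]) -> float:
    """2m / (n(n - 1)) for a simple undirected graph."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # every link appears in two neighbour sets
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

# Hypothetical toy network: components {1, 2, 3, 4}, {5, 6} and the isolated node 7.
adj: Dict[int, Set[int]] = {v: set() for v in range(1, 8)}
for u, v in [(1, 2), (2, 3), (3, 4), (5, 6)]:
    adj[u].add(v)
    adj[v].add(u)

components = connected_components(adj)
giant_share = len(components[0]) / len(adj)              # proportion of nodes in the giant component
second_largest = len(components[1]) if len(components) > 1 else 0
size_distribution = Counter(len(c) for c in components)  # component size -> number of components
```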


  • Example


    The giant component includes 70% of the nodes, while the second largest component has 15% of the nodes.

    Giant Component

  • Example (Scientific Collaboration Net)


    Component distribution (n = 755 nodes in total)

    Nodes in component              1    2    3    4    5    6    8    9   10   11   12   418
    % of n (one such component)   0.1  0.3  0.4  0.5  0.7  0.8  1.1  1.2  1.3  1.5  1.6  55.4
    # of components                67   26   25    6    3    7    1    1    1    1    2     1

    The network is sparse: density = 0.003.

  • A first look at vulnerability: articulation points, bi-components and nodes at the edge of the network

  • Articulation points in a connected net


    Red nodes are the articulation points and blue nodes are those of degree 1 (at the edge of the biological network).

  • Bi-components in a connected net


    Bi-components show dense parts of the network and possible weaknesses.
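    Articulation points and bi-components can both be found in one depth-first search (the Hopcroft-Tarjan approach). The sketch below is a recursive Python version for small, simple undirected graphs; the function name and the example graph are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

def articulation_points_and_bicomponents(
    adj: Dict[int, Set[int]],
) -> Tuple[Set[int], List[Set[Tuple[int, int]]]]:
    """Return the articulation points and the bi-components (each as a set of edges)."""
    disc: Dict[int, int] = {}  # discovery time of each node
    low: Dict[int, int] = {}   # earliest discovery time reachable from the node's subtree
    points: Set[int] = set()
    bicomponents: List[Set[Tuple[int, int]]] = []
    edge_stack: List[Tuple[int, int]] = []
    counter = 0

    def dfs(u: int, parent: Optional[int]) -> None:
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        children = 0
        for v in adj[u]:
            if v not in disc:  # tree edge
                children += 1
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:
                    # v's subtree cannot reach above u, so u closes a bi-component
                    if parent is not None:
                        points.add(u)
                    component = set()
                    while True:
                        edge = edge_stack.pop()
                        component.add(edge)
                        if edge == (u, v):
                            break
                    bicomponents.append(component)
            elif v != parent and disc[v] < disc[u]:  # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])
        if parent is None and children > 1:  # a DFS root is a cut node iff it has 2+ children
            points.add(u)

    for start in adj:
        if start not in disc:
            dfs(start, None)
    return points, bicomponents

# Hypothetical example: two triangles sharing node 3, plus a node 6 hanging from node 5.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4, 6}, 6: {5}}
points, bicomponents = articulation_points_and_bicomponents(adj)
edge_nodes = {v for v, nbrs in adj.items() if len(nbrs) == 1}  # degree-1 nodes
```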

  • A crucial step towards the topology: degree, link weight and weighted degree distributions

  • Find the degree distribution regardless of connectivity


  • Descriptive statistics


    Minimum degree δ(G) = 0.0

    Median degree = 2.0

    Average degree = 2.1

    Maximum degree Δ(G) = 4.0

    SD = 1.12

    Degree sequence: 1, 1, 4, 2, 3, 2, 4, 3, 2, 2, 4, 2, 3, 1, 1, 1, 0, 2, 2, 2

    Coefficient of variation = SD / average degree = 53%
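    These summary statistics can be reproduced with Python's `statistics` module; the variable names are illustrative. SD = 1.12 matches the sample standard deviation (denominator n - 1); the population version would give about 1.09.

```python
import statistics

# Degree sequence of the 20-node example network.
degrees = [1, 1, 4, 2, 3, 2, 4, 3, 2, 2, 4, 2, 3, 1, 1, 1, 0, 2, 2, 2]

min_degree = min(degrees)                   # delta(G)
max_degree = max(degrees)                   # Delta(G)
median_degree = statistics.median(degrees)
average_degree = statistics.mean(degrees)   # 42 / 20
sd = statistics.stdev(degrees)              # sample SD, denominator n - 1
cv = sd / average_degree                    # coefficient of variation
```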

  • Example


    Degree sequence: 1, 1, 4, 2, 3, 2, 4, 3, 2, 2, 4, 2, 3, 1, 1, 1, 0, 2, 2, 2

    Degree k          0     1     2     3     4
    CDF  P(D ≤ k)   0.05  0.30  0.70  0.85  1.00
    CCDF P(D ≥ k)   1.00  0.95  0.70  0.30  0.15
    Freq P(D = k)   0.05  0.25  0.40  0.15  0.15

    Plots of the CCDF: with linear axes, with log(degree), and with log(degree) and log(freq).
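    A sketch of how the three rows of the table can be computed from the degree sequence; `degree_distributions` is a hypothetical helper name. Cumulative counts are kept as integers so the fractions are not affected by floating-point drift. Degree 0 has to be left out of any plot with a logarithmic degree axis.

```python
from collections import Counter
from typing import Dict, List, Tuple

def degree_distributions(degrees: List[int]) -> Tuple[Dict[int, float], Dict[int, float], Dict[int, float]]:
    """Return (pmf, cdf, ccdf) keyed by degree k: P(D = k), P(D <= k) and P(D >= k)."""
    n = len(degrees)
    counts = Counter(degrees)
    pmf, cdf, ccdf = {}, {}, {}
    below = 0  # number of nodes with degree < k
    for k in sorted(counts):
        pmf[k] = counts[k] / n
        ccdf[k] = (n - below) / n
        below += counts[k]
        cdf[k] = below / n
    return pmf, cdf, ccdf

degrees = [1, 1, 4, 2, 3, 2, 4, 3, 2, 2, 4, 2, 3, 1, 1, 1, 0, 2, 2, 2]
pmf, cdf, ccdf = degree_distributions(degrees)
```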

  • Degree distributions


    For noise reduction, a logarithmic binning of the degrees is usually performed, with M bins, where M is more than 10. The bin size is given by

    $$ r = \frac{1}{M}\ln k_{\max} $$

    so the bin number assigned to a node with degree k_i is

    $$ m_i = \left\lfloor \frac{\ln k_i}{r} \right\rfloor $$

    and the end of bin m_i on the k-axis is given by

    $$ k_{m_i} = e^{(m_i + 1) r} $$
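    A sketch of logarithmic binning, assuming the scheme r = ln(k_max)/M, bin number floor(ln k / r) and bin m spanning [e^(m r), e^((m+1) r)); `log_bin` and the sample degrees are hypothetical. Dividing each count by n times the bin width turns counts into a density, so that the wide bins in the tail are not over-weighted.

```python
import math
from collections import Counter
from typing import List, Tuple

def log_bin(degrees: List[int], M: int = 10) -> List[Tuple[float, float, int, float]]:
    """Logarithmic binning of the degrees k >= 1 into M bins.

    Returns (start, end, count, density) for each non-empty bin,
    where density = count / (n * bin width).
    """
    ks = [k for k in degrees if k >= 1]  # ln(0) is undefined, so degree 0 is dropped
    k_max = max(ks)
    if k_max < 2:
        raise ValueError("need at least one degree >= 2 to define log bins")
    r = math.log(k_max) / M
    # the small epsilon keeps exact bin edges (e.g. k = 8 when r = ln 2) from rounding down;
    # the cap at M - 1 puts k_max into the last bin
    counts = Counter(min(int(math.log(k) / r + 1e-9), M - 1) for k in ks)
    n = len(ks)
    bins = []
    for m in range(M):
        if counts[m]:
            start, end = math.exp(m * r), math.exp((m + 1) * r)
            bins.append((start, end, counts[m], counts[m] / (n * (end - start))))
    return bins

# Hypothetical heavy-tailed degree sample.
degrees = [1] * 100 + [2] * 50 + [4] * 20 + [8] * 10 + [16] * 5
bins = log_bin(degrees, M=4)
```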

  • Logarithmic binning on a log-log plot

    We actually do not see all the zero values, because log(0) = -∞.


    Same bins, but plotted on a log-log scale

    [Figure: frequency versus integer value on log-log axes; integer value from 10^0 to 10^4, frequency from 10^0 to 10^6]

    Noise in the tail: Here we have 0, 1 or 2 observations of values of x when x > 500

    here we have tens of thousands of observations when x < 10

    Slide from Lada Adamic


    Newman 2003

  • Distribution of the (link) weights

    Weighted networks are similarly described by a matrix W whose entry w_ij specifies the weight of the edge connecting vertices i and j (w_ij = 0 if nodes i and j are not connected, and w_ij > 0 if they are connected).

    A first characterization of the weights is obtained from the distribution P(w) that any given edge has weight w. This distribution may a priori be homogeneous and characterized by a typical scale or, on the contrary, introduce a new source of heterogeneity.


    2007, Caldarelli and Vespignani book.

  • Weighted Degree Distribution - Strength


    Questions: How can we take the number of links into account when computing the strength? What is the distribution of the node strength? (The answer to this question is similar to the one about estimating the degree distribution.)
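    Taking the strength of node i as s_i = Σ_j w_ij, a minimal sketch follows; the helper names and weights are hypothetical, and dividing the strength by the degree (the mean weight per link) is only one possible answer to the first question, assumed here for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple

Edge = Tuple[str, str]

def strength_and_degree(weighted_edges: Dict[Edge, float]):
    """Return (strength, degree): s_i = sum_j w_ij and d_i = number of links of i."""
    strength: Dict[str, float] = defaultdict(float)
    degree: Dict[str, int] = defaultdict(int)
    for (u, v), w in weighted_edges.items():
        for node in (u, v):
            strength[node] += w
            degree[node] += 1
    return dict(strength), dict(degree)

# Hypothetical weighted network (illustrative weights).
weights = {("a", "b"): 3.0, ("b", "c"): 1.0, ("a", "c"): 2.0, ("c", "d"): 5.0}
strength, degree = strength_and_degree(weights)

# One way to take the number of links into account: the mean weight per link, s_i / d_i.
mean_link_weight = {i: strength[i] / degree[i] for i in strength}
```

    The strength distribution can then be estimated exactly like the degree distribution, for instance with logarithmic binning of the values in `strength`.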

  • Association between degree and strength

    In the absence of correlation between the weights of the links and the degrees of the nodes, the weights w_ij are on average independent of the link {i, j}. Hence, using the mean weight \bar{w}, the strength of each node can be approximated as

    $$ s_i \approx \bar{w}\, d_i $$

    This relation can be checked with a linear regression of strength on degree; when it holds, the weights give the same information as the degrees.

    2007, Caldarelli and Vespignani book.

    In the presence of correlations between topology and weights we obtain, in general, a relation of the form

    $$ s_i = C\, d_i^{\,b}, \quad \text{with } b > 1, \ \text{or } b = 1 \text{ and } C \neq \bar{w}. $$
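    The relation s_i = C d_i^b can be estimated by ordinary least squares on the logarithms, ln s_i = ln C + b ln d_i. This is a sketch; the function name and data are hypothetical, and a log-log fit is one simple choice (fitting on the original scale would weight the residuals differently).

```python
import math
from typing import List, Tuple

def fit_strength_degree(d: List[float], s: List[float]) -> Tuple[float, float]:
    """Least squares of ln s on ln d; returns (C, b) for s = C * d**b.

    All degrees and strengths must be positive (isolated nodes are excluded).
    """
    xs = [math.log(v) for v in d]
    ys = [math.log(v) for v in s]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    C = math.exp(my - b * mx)
    return C, b

# Hypothetical data: strengths that grow faster than linearly with degree.
d = [1, 2, 3, 4, 5, 6]
C, b = fit_strength_degree(d, [2.0 * k ** 1.5 for k in d])
```

    An estimate of b close to 1 together with C close to the mean weight points to the uncorrelated case s_i ≈ \bar{w} d_i.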

  • Example: correlation between topology and weights


    Estimated equation (p < 0.001 for both coefficients):

    $$ \hat{s}_i = 1.161\, d_i^{\,1.414}, \qquad \bar{w} = 1.108 $$

  • Degree-degree correlation (how they choose and how we choose)

  • The average nearest neighbor degree (ANND)


    Are the nodes preferentially connected to other nodes with similar degree or with dissimilar degree?

    2007, Caldarelli and Vespignani book.

  • Assortativity


  • Example


    4, 2, 3, 2, 4, 3, 2, 2, 4, 2, 3, 1, 2, 2, 0, 1, 1, 1, 1, 2

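    Node 3 has neighbors N(3) = {1, 2, 4, 5} with degrees {1, 1, 2, 3}, hence k_nn(3) = (1 + 1 + 2 + 3)/4 = 1.75.

    There are 3 nodes of degree 4, with k_nn(3) = 1.75, k_nn(7) = 2.5 and k_nn(10) = 2.25; hence the average nearest-neighbor degree of the degree-4 nodes is k_nn(d = 4) = (1.75 + 2.5 + 2.25)/3 ≈ 2.17.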
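    A sketch of the average nearest-neighbor degree, k_nn(i) = (1/d_i) Σ_{j ∈ N(i)} d_j, and of its average k_nn(k) over the nodes of degree k; the function names and the small network are hypothetical. A k_nn(k) that decreases with k indicates disassortative mixing.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Set

def annd(adj: Dict[int, Set[int]]) -> Dict[int, float]:
    """k_nn(i): average degree of the neighbours of i (isolated nodes are skipped)."""
    return {i: mean(len(adj[j]) for j in nbrs) for i, nbrs in adj.items() if nbrs}

def annd_by_degree(adj: Dict[int, Set[int]]) -> Dict[int, float]:
    """k_nn(k): average of k_nn(i) over all nodes i of degree k."""
    groups = defaultdict(list)
    for i, value in annd(adj).items():
        groups[len(adj[i])].append(value)
    return {k: mean(values) for k, values in sorted(groups.items())}

# Hypothetical network: a hub 0 with neighbours 1, 2, 3, and node 3 also linked to leaf 4.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3}}
knn = annd(adj)
knn_k = annd_by_degree(adj)
```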

  • Example


    There are large fluctuations around the line, so this is a poor example of a small disassortative network.

  • Example


    A disassortative collaboration network (at HSI conferences, professors collaborate with their students more often than with other professors). Hubs prefer non-hubs (found in biological, social media and technological networks).

  • Weighted ANND


    2007, Caldarelli and Vespignani book.

    A node i with a small average nearest-neighbor degree but a large weighted average nearest-neighbor degree is connected mostly to low-degree nodes, but its link with the largest weight points towards a well-connected hub.
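    A sketch of the weighted average nearest-neighbor degree, assuming the Barrat et al. definition k^w_nn(i) = (1/s_i) Σ_j w_ij k_j used in the Caldarelli-Vespignani book; the function names and the example network are hypothetical. The example reproduces the situation just described: node 0 has a small unweighted k_nn but a large weighted one.

```python
from typing import Dict, Iterable, Tuple

def build_weighted(edges: Iterable[Tuple[int, int, float]]) -> Dict[int, Dict[int, float]]:
    """Symmetric weighted adjacency: w[i][j] = w[j][i] = weight of link {i, j}."""
    w: Dict[int, Dict[int, float]] = {}
    for u, v, weight in edges:
        w.setdefault(u, {})[v] = weight
        w.setdefault(v, {})[u] = weight
    return w

def weighted_annd(w: Dict[int, Dict[int, float]]) -> Dict[int, float]:
    """k^w_nn(i) = (1 / s_i) * sum_j w_ij * k_j, with s_i the strength of i and k_j the degree of j."""
    result = {}
    for i, nbrs in w.items():
        s_i = sum(nbrs.values())
        if s_i > 0:
            result[i] = sum(w_ij * len(w[j]) for j, w_ij in nbrs.items()) / s_i
    return result

# Hypothetical case: node 0 has three weak links to leaves 1-3 and one strong link
# to hub 4, which also has five leaves 5-9.
edges = [(0, 1, 1.0), (0, 2, 1.0), (0, 3, 1.0), (0, 4, 10.0)] + [(4, j, 1.0) for j in range(5, 10)]
w = build_weighted(edges)
knn_w = weighted_annd(w)
knn_0 = sum(len(w[j]) for j in w[0]) / len(w[0])  # unweighted k_nn(0), for comparison
```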

  • Example on Weighted ANND


    The disassortative collaboration network (HSI conferences): using weights, we get a clearer picture of disassortativity. Hubs prefer non-hubs.

  • But in disassortative networks, hubs can be interconnected

  • Example


  • The rich-club coefficient


    Colizza et al., 2006

    What is the tendency of nodes with high connectivity to be connected to each other?
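    A sketch of the (unnormalized) rich-club coefficient of Colizza et al., φ(k) = 2E_{>k} / (N_{>k}(N_{>k} - 1)), i.e. the fraction of possible links that actually exist among the N_{>k} nodes of degree greater than k; the function name and the example are hypothetical.

```python
from typing import Dict, Set

def rich_club(adj: Dict[int, Set[int]], k: int) -> float:
    """phi(k) = 2 E_>k / (N_>k (N_>k - 1)): link density among the nodes of degree > k."""
    rich = {v for v, nbrs in adj.items() if len(nbrs) > k}
    n = len(rich)
    if n < 2:
        return float("nan")
    links = sum(1 for v in rich for u in adj[v] if u in rich) // 2  # each link is seen twice
    return 2 * links / (n * (n - 1))

# Hypothetical network: three interconnected hubs (0, 1, 2), each with two leaves.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (1, 5), (1, 6), (2, 7), (2, 8)]
adj: Dict[int, Set[int]] = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
```

    Here the hubs form a complete subgraph, so φ(1) = 1 even though the network as a whole is sparse.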

  • The rich-club coefficient


    Colizza et al., 2006. The rich-club phenomenon: hubs are interconnected in a disassortative network (a property of both computer and social networks).

    Opsahl (2010) proposed the quotient φ^w(r) / φ^w_null(r), where φ^w_null(r) comes from a randomized network.

  • The weighted rich-club coefficient


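    A weighted network with 5 hubs (Opsahl: http://toreopsahl.com/tnet/two-mode-networks/weighted-rich-club-effect/)

    $$ \phi^{w}(r) = \frac{W_{>r}}{\sum_{l=1}^{E_{>r}} w_l^{rank}} $$

    where E_{>r} is the number of links among the nodes with richness greater than r, W_{>r} is the sum of their weights, and w_l^{rank} is the l-th largest link weight in the whole network.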
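    A sketch of Opsahl's weighted rich-club coefficient, assuming richness is measured by degree (strength could be used instead) and the definition φ^w(r) = W_{>r} / Σ_{l=1}^{E_{>r}} w_l^{rank}, where the denominator sums the E_{>r} largest weights in the whole network; the function name and the example weights are hypothetical. In practice the value is divided by the same quantity computed on randomized networks.

```python
from typing import Dict, Tuple

Edge = Tuple[int, int]

def weighted_rich_club(weights: Dict[Edge, float], richness: Dict[int, float], r: float) -> float:
    """phi^w(r) = W_>r / (sum of the E_>r largest link weights in the whole network)."""
    rich_weights = [w for (u, v), w in weights.items() if richness[u] > r and richness[v] > r]
    e_r = len(rich_weights)
    if e_r == 0:
        return float("nan")
    top = sorted(weights.values(), reverse=True)[:e_r]  # w_l^rank for l = 1..E_>r
    return sum(rich_weights) / sum(top)

# Hypothetical network: hubs 0, 1, 2 are linked to each other by weak links (weight 1)
# and each has one strong link (weight 5) to a leaf.
weights = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0, (0, 3): 5.0, (1, 4): 5.0, (2, 5): 5.0}
degree: Dict[int, int] = {}
for u, v in weights:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
```

    The hubs are fully interconnected (binary rich club), yet φ^w(1) = 3/15 = 0.2 because their mutual links are among the weakest in the network.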

  • Example


    Weighted rich-club effect: scientists with at most 10 collaborators tend to collaborate with each other, while this is not the case for the hubs of the network (points under the horizontal line y = 1).

    http://sites.google.com/site/vcolizza2/PhysRevLett_101_168702.pdf?attredirects=0

    Binary rich-club effect: the last result is clearly observed in this picture.

    Weighted rich-club effect: the last result is clearly observed in this picture.

  • The distance

  • Distances in Real-World Networks


  • Distance in binary networks


    Mean distance (giant component): sum all the elements of the distance matrix and divide by 14(14 - 1) = 182, which gives 3.32.

    Diameter (giant component) = 8

    The distance matrix. The distribution of distances is another useful exploration tool.

    When is the distance meaningful? This network is disconnected.
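    A breadth-first-search sketch for a binary network; `mean_distance_and_diameter` is a hypothetical name, and it must be given the nodes of one connected component (for example, the giant component), since distances between different components are infinite.

```python
from collections import deque
from typing import Dict, Set, Tuple

def bfs_distances(adj: Dict[int, Set[int]], source: int) -> Dict[int, int]:
    """Number of links on a shortest path from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mean_distance_and_diameter(adj: Dict[int, Set[int]], component: Set[int]) -> Tuple[float, int]:
    """Mean over the n(n - 1) ordered pairs of the component, and its largest distance."""
    total, diameter = 0, 0
    for s in component:
        dist = bfs_distances(adj, s)
        for t in component:
            if t != s:
                total += dist[t]  # a KeyError here means `component` is not connected
                diameter = max(diameter, dist[t])
    n = len(component)
    return total / (n * (n - 1)), diameter

# Hypothetical component: the path 1 - 2 - 3 - 4.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
mean_d, diam = mean_distance_and_diameter(adj, {1, 2, 3, 4})
```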

  • Distance in Weighted Networks


  • Example


    The giant component of the scientific collaboration network. Diameter: 16; average distance: 6.8 (binary case).

  • Node Importance: Centrality Indices

  • Centrality Indices


    A centrality coefficient is a measure that captures the importance of a node's or a link's position in the network.

  • Degree Centrality and Strength (weighted)


    The degree centrality of a node is its degree. Nodes with more connections tend to have more power.

  • Eigenvector Centrality


  • Example


    Values are 0 for the nodes outside the giant component.

    Eigenvector centrality depends both on the number and on the quality of the connections.

    Node  Eigenvector   Node  Eigenvector
    3     0.094966      11    0.079709
    1     0.033984      12    0.047158
    2     0.033984      13    0.052068
    4     0.082647      14    0.018633
    5     0.114758      15    0
    7     0.135986      16    0
    6     0.08973       18    0
    8     0.092863      19    0
    9     0.061756      20    0
    10    0.061756      17    0
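    A sketch of eigenvector centrality by power iteration. The values in the table sum to about 1, so the sketch normalizes the same way (an assumption about the tool's convention); the function name and test graphs are hypothetical. Iterating with A + I instead of A keeps the same leading eigenvector but avoids oscillation on bipartite graphs; nodes outside the component with the largest eigenvalue then decay towards 0, as in the table.

```python
from typing import Dict, Set

def eigenvector_centrality(adj: Dict[int, Set[int]], tol: float = 1e-12,
                           max_iter: int = 10_000) -> Dict[int, float]:
    """Leading eigenvector of the adjacency matrix A, scaled so the scores sum to 1.

    Power iteration on A + I: same eigenvectors as A, but no oscillation on bipartite graphs.
    """
    x = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}  # (A + I) x
        total = sum(new.values())
        new = {v: value / total for v, value in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x

# Hypothetical path 1 - 2 - 3: the middle node scores sqrt(2) times each end node.
path_scores = eigenvector_centrality({1: {2}, 2: {1, 3}, 3: {2}})
```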

  • Closeness Centrality


    It is based on the concept of distances between nodes.

  • Closeness Centrality


    In a disconnected network, each component has to be examined separately, because in that case closeness is not well defined.

    Node  Closeness     Node  Closeness
    3     0.023256      11    0.025
    1     0.018182      12    0.02
    2     0.018182      13    0.020408
    4     0.026316      14    0.016393
    5     0.027027      15    1
    7     0.03125       16    1
    6     0.025         18    0.5
    8     0.03125       19    0.5
    9     0.027778      20    0.5
    10    0.027778      17    0
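    The table is consistent with the unnormalized closeness C(i) = 1 / Σ_j d(i, j), computed within the component of node i (for example, 1 for the two nodes of an isolated pair, 0.5 for the nodes of an isolated triangle, 0 for an isolated node). A sketch under that assumption; the function name and test graph are hypothetical.

```python
from collections import deque
from typing import Dict, Set

def closeness(adj: Dict[int, Set[int]]) -> Dict[int, float]:
    """C(i) = 1 / (sum of distances from i to every node reachable from i); 0 if isolated."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[s] = 1 / total if total > 0 else 0.0
    return result

# Hypothetical small components: the pair 15-16, the triangle 18-19-20 and the isolated node 17.
adj = {15: {16}, 16: {15}, 18: {19, 20}, 19: {18, 20}, 20: {18, 19}, 17: set()}
scores = closeness(adj)
```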

  • Betweenness Centrality


    It is based on the concept of shortest paths between nodes.

  • Example


    In a disconnected network, each component has to be examined separately, because in that case closeness is not well defined.

    Node  Betweenness   Node  Betweenness
    3     23.5          11    30.5
    1     0             12    0
    2     0             13    12
    4     12            14    0
    5     15            15    0
    7     43.5          16    0
    6     0             18    0
    8     42.5          19    0
    9     16            20    0
    10    16            17    0
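    A sketch of Brandes' algorithm for unweighted, undirected networks, counting each unordered pair of nodes once; the function name and the test graphs are hypothetical. Node pairs with no path between them contribute nothing, so the routine also runs on disconnected networks.

```python
from collections import deque
from typing import Dict, List, Set

def betweenness(adj: Dict[int, Set[int]]) -> Dict[int, float]:
    """Brandes' algorithm, unweighted and undirected; each unordered pair counted once."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order: List[int] = []                        # nodes in non-decreasing distance from s
        preds: Dict[int, List[int]] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)                # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)              # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}         # every pair was reached from both ends

path = betweenness({1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}})
square = betweenness({1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}})
```

    Half-integer values, like those in the table, appear when a pair has several shortest paths: in the square, each opposite pair splits its credit 0.5/0.5 between the two middle nodes.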

  • Comparison between centrality indices


    Kolaczyk 2009

    Panels: closeness, betweenness, eigenvector

  • Clustering, Cliquishness, Cohesiveness and Hierarchical Structure

  • A broader aspect of connectedness and centrality

    One important question about social networks is how tightly clustered they are. For example, the question of to what extent my friends are friends with each other captures one facet of this. Clustering has an interesting history as a term, growing out of the earlier sociology literature on partitioning signed graphs into subsets such that nodes within the same subset have only positive relationships with each other, and only negative relationships exist across subsets (Ch. 6 in Wasserman and Faust).

    Additionally, a variety of concepts measure how cohesive or closely knit a social network is. An early concept related to this is the idea of a clique. One measure of cliquishness is to count the number and size of the cliques of a network. Cliques are generally required to contain at least 3 nodes.

    In recent network literature, the notion of clustering has been related to that of transitivity: a friend of a friend is a friend (Ch. 4 in Wasserman and Faust).


  • Transitivity and the Clustering Coefficient

    A connected triple centered at node i is a path of length 2 with node i as the intermediate node. The number of all possible connected triples at a node i of degree d_i is

    $$ T(i) = \binom{d_i}{2} = \frac{d_i(d_i - 1)}{2} $$

    The transitivity of node i, also called the clustering coefficient of node i, is the fraction of these triples that are closed, i.e. the number t(i) of triangles containing node i divided by T(i):

    $$ C(i) = \frac{t(i)}{T(i)} $$

    Figure: a connected triple centered at node 1, and a single triangle on nodes 1, 2 and 3, which closes three connected triples (centered at nodes 1, 2 and 3).

    Two measures of the idea that a friend of a friend is a friend.
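    A sketch of the local clustering coefficient C(i) = t(i)/T(i), its average over all nodes, and the network transitivity 3 × (number of triangles) / (number of connected triples). Nodes of degree below 2 are given C(i) = 0 and still counted in the average, which reproduces the averages 0.3 and 0.214 of the example that follows. Function names and the test graph are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set

def triangles_at(adj: Dict[int, Set[int]], i: int) -> int:
    """t(i): number of triangles that contain node i."""
    return sum(1 for j, h in combinations(adj[i], 2) if h in adj[j])

def local_clustering(adj: Dict[int, Set[int]], i: int) -> float:
    """C(i) = t(i) / T(i) with T(i) = d_i (d_i - 1) / 2; 0 when d_i < 2."""
    d = len(adj[i])
    return triangles_at(adj, i) / (d * (d - 1) / 2) if d >= 2 else 0.0

def average_clustering(adj: Dict[int, Set[int]]) -> float:
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

def transitivity(adj: Dict[int, Set[int]]) -> float:
    """3 x (number of triangles) / (number of connected triples)."""
    closed = sum(triangles_at(adj, i) for i in adj)  # each triangle is counted at its 3 nodes
    triples = sum(len(nbrs) * (len(nbrs) - 1) / 2 for nbrs in adj.values())
    return closed / triples if triples else 0.0

# Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```

    On this graph the two global measures differ (7/12 versus 0.6) because the average gives the low-degree nodes 0 and 1 the same weight as node 2.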

  • Some facts on transitivity and clustering coefficient of the entire network


    The Watts and Strogatz clustering coefficient tends to weight the contributions of low-degree vertices more heavily than the transitivity coefficient, because such vertices have a small denominator. Bollobás verified that T = C if all nodes have the same degree or all clustering coefficients are equal.

  • Example


    Node  Clustering    Node  Clustering
    3     0             11    0.166667
    1     0             12    1
    2     0             13    0.333333
    4     0             14    0
    5     0.333333      15    0
    7     0.166667      16    0
    6     1             18    1
    8     0             19    1
    9     0             20    1
    10    0             17    0

    Whole network: average clustering coefficient = 0.3, transitivity = 0.257.

    Giant component: average clustering coefficient = 0.214, transitivity = 0.1875.

  • Weighted Transitivity and Clustering Coefficient


  • Facts and Example


    Transitivity = 0.316; clustering coefficient C = 0.46; weighted clustering coefficient C^w = 0.292.

    C^w < C: triples are formed by scientists who are either senior but did not collaborate frequently, or new scientists with close collaboration and few articles.

    If C^w > C, we are in the presence of a network in which the interconnected triples are more likely formed by the edges with larger weights. On the contrary, C^w < C signals a network in which the topological clustering is generated by edges with low weight (Caldarelli book, p. 69).
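    A sketch of the weighted clustering coefficient, assuming the Barrat et al. definition C^w(i) = 1/(s_i(k_i - 1)) Σ_{j,h} (w_ij + w_ih)/2 a_ij a_ih a_jh (the form used in the Caldarelli-Vespignani book); names and the example are hypothetical. In the example, node i has unweighted C(i) = 1/3 but a larger C^w(i), because its only triangle uses its two heavy links.

```python
from itertools import permutations
from typing import Dict

def weighted_clustering(w: Dict[str, Dict[str, float]], i: str) -> float:
    """Barrat et al. C^w(i): sum over ordered pairs (j, h) of linked neighbours of
    (w_ij + w_ih) / 2, divided by s_i (k_i - 1); 0 when k_i < 2."""
    nbrs = w[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    s = sum(nbrs.values())
    total = sum((nbrs[j] + nbrs[h]) / 2 for j, h in permutations(nbrs, 2) if h in w[j])
    return total / (s * (k - 1))

# Hypothetical star around node "i": two heavy links to a and b (which are linked), one light link to c.
w = {
    "i": {"a": 5.0, "b": 5.0, "c": 1.0},
    "a": {"i": 5.0, "b": 1.0},
    "b": {"i": 5.0, "a": 1.0},
    "c": {"i": 1.0},
}
```

    With equal weights the formula reduces to the unweighted C(i), so C^w > C or C^w < C can be read exactly as in the comparison above.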

  • Cliquishness


    The clique number of the network and the maximal cliques (biological network). Clearly, they constitute a cohesive group of proteins.
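    Maximal cliques, and hence the clique number, can be listed with the Bron-Kerbosch algorithm with pivoting. A sketch for small, simple undirected graphs; the function name and the example graph are hypothetical, and the running time is exponential in the worst case.

```python
from typing import Dict, List, Set

def maximal_cliques(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Bron-Kerbosch with pivoting: every maximal clique of a simple undirected graph."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

# Hypothetical graph: a 4-clique {1, 2, 3, 4}, a triangle {4, 5, 6} and the link 6-7.
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3, 5, 6},
       5: {4, 6}, 6: {4, 5, 7}, 7: {6}}
cliques = maximal_cliques(adj)
clique_number = max(len(c) for c in cliques)
```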

  • The k-core decomposition


    Larger values of coreness clearly correspond to groups of vertices with larger degree and a more central position, as a group, in the network's structure. Properties of the network that can be seen: hierarchical arrangement, degree correlations and centrality.

    2005, Ignacio Alvarez-Hamelin et al.
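    A sketch of the k-core decomposition by repeated removal of a node of minimum remaining degree; the coreness of a node is the largest minimum degree seen up to its removal. This simple version is O(n^2); bucket-based implementations run in O(m). The function name and the example are hypothetical.

```python
from typing import Dict, Set

def core_numbers(adj: Dict[int, Set[int]]) -> Dict[int, int]:
    """Coreness of every node of a simple undirected graph (O(n^2) peeling)."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core: Dict[int, int] = {}
    k = 0
    while remaining:
        v = min(remaining, key=lambda u: degree[u])  # node of minimum remaining degree
        k = max(k, degree[v])                         # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

# Hypothetical graph: a 4-clique {1, 2, 3, 4}, node 5 linked to 1 and 2, node 6 linked to 5.
adj = {1: {2, 3, 4, 5}, 2: {1, 3, 4, 5}, 3: {1, 2, 4}, 4: {1, 2, 3}, 5: {1, 2, 6}, 6: {5}}
cores = core_numbers(adj)
```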

  • Hierarchical Structure by the C(k) function


    To investigate if any hierarchical organization is present in real networks we measured the C(k) function for several networks for which large topological maps are available.

    Actor network: the high-k range of C(k) scales as k^-1. The majority of actors with a few links (small k) appear in only one movie. Each such actor has a clustering coefficient equal to one, as all are part of the same cast and are therefore connected to each other. The high-k nodes include many actors who acted in several movies, and thus their neighbors are not necessarily linked to each other, resulting in a smaller C(k).

    The scaling of C(k) for (a) the actor network, (b) the semantic web, connecting two words if they are listed as synonyms in the Merriam-Webster Dictionary, (c) the WWW, (d) the Internet at the Autonomous System level, each node representing a domain. The dashed line in each figure has slope -1.

    Ravasz, 2004
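    C(k) is the average local clustering coefficient over the nodes of degree k; a slope of about -1 in its high-k range on log-log axes is the signature discussed above. A sketch with a hypothetical function name, leaving out nodes of degree below 2, for which C(k) is undefined.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set

def clustering_spectrum(adj: Dict[int, Set[int]]) -> Dict[int, float]:
    """C(k): mean local clustering coefficient of the nodes of degree k (k >= 2 only)."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            closed = sum(1 for j, h in combinations(nbrs, 2) if h in adj[j])
            groups[k].append(closed / (k * (k - 1) / 2))
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}

# Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
spectrum = clustering_spectrum({0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}})
```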

  • Important subgraphs that may uncover functionality and evolutionary principles of the network

  • Motifs


    Disadvantage: we don't know whether a motif is part of a larger cohesive community.

  • Example


    0 0 26 3

  • Example (scientific collaboration net)


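    z-score = (N_real - N_rand) / SD, where N_rand and SD are the mean and the standard deviation of the motif's frequency in the randomized networks.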

    Motif        Frequency [Original]   Mean-Freq [Random]   Standard-Dev [Random]   Z-Score   p-Value
    Open triad   86.573%                99.999%              0.00011008              -1219.6   1
    Triangle     13.427%                0.00101%             0.00011008              1219.6    0

    Although the comparison of the weighted and unweighted clustering coefficients showed that triangles are not due to scientists with frequent collaboration, the motif analysis following Milo et al. makes it clear that the 13.427% of connected triplets that are closed (triangles) constitute a statistically significant characteristic of the network's evolution.
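    A sketch of the z-score idea for one motif, the triangle, assuming a degree-preserving randomization by double edge swaps as the null model (a common choice; the source does not say which randomization was used) and raw counts rather than the frequencies reported in the table. All function names and the example are hypothetical.

```python
import math
import random
import statistics
from itertools import combinations
from typing import List, Tuple

Edge = Tuple[int, int]

def triad_counts(edges: List[Edge]) -> Tuple[int, int]:
    """(open triads, triangles) among the connected 3-node subgraphs of an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    closed = sum(1 for i in adj for j, h in combinations(adj[i], 2) if h in adj[j])
    triples = sum(len(a) * (len(a) - 1) // 2 for a in adj.values())
    return triples - closed, closed // 3  # each triangle closes 3 triples

def degree_preserving_shuffle(edges: List[Edge], n_swaps: int, rng: random.Random) -> List[Edge]:
    """Double edge swaps (a-b, c-d) -> (a-d, c-b), keeping every node's degree."""
    edge_list = list(edges)
    present = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue  # would create a loop, or the two edges share a node
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a multiple edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list

def triangle_z_score(edges: List[Edge], n_random: int = 100, seed: int = 0) -> float:
    """z = (N_real - mean(N_rand)) / SD(N_rand) for the number of triangles."""
    rng = random.Random(seed)
    real = triad_counts(edges)[1]
    rand = [triad_counts(degree_preserving_shuffle(edges, 10 * len(edges), rng))[1]
            for _ in range(n_random)]
    mu, sd = statistics.mean(rand), statistics.pstdev(rand)
    if sd == 0:
        return 0.0 if real == mu else math.copysign(math.inf, real - mu)
    return (real - mu) / sd

# Hypothetical network: four disjoint triangles (randomization mostly destroys them).
edges = [e for t in range(4) for e in ((3 * t, 3 * t + 1), (3 * t + 1, 3 * t + 2), (3 * t, 3 * t + 2))]
```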

  • Community structure of the network

  • Triadic Closure


    A weak link (tie): bridge

    After some time

    A weak link (tie): local bridge

    The strength of weak ties (links): edge betweenness

  • Finding Communities

    Social and other networks have a natural community structure. We want to discover this structure rather than impose a certain community size or fix the number of communities.

    Without looking, can we discover community structure in an automated way?


    Girvan & Newman: betweenness clustering

  • Finding community structure in very large networks (fast greedy algorithm)


    Consider edges that fall within a community or between a community and the rest of the network

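    Define modularity:

    $$ Q = \frac{1}{2m}\sum_{v,w}\left[A_{vw} - \frac{k_v k_w}{2m}\right]\delta(c_v, c_w) $$

    where A_vw is the adjacency matrix, m is the number of edges, k_v k_w / 2m reflects that the probability of an edge between two vertices is proportional to their degrees, and δ(c_v, c_w) = 1 if vertices v and w are in the same community (0 otherwise).

    For a random network, Q = 0: the number of edges within a community is no different from what you would expect. In general, -1/2 ≤ Q < 1.

    A. Clauset, M. E. J. Newman, C. Moore, 2004. Slide from Lada Adamic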
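    Because Σ_v k_v = 2m, the modularity Q = (1/2m) Σ_vw [A_vw - k_v k_w / 2m] δ(c_v, c_w) can be rewritten per community as Q = Σ_c [e_c/m - (K_c/2m)^2], where e_c is the number (or total weight) of links inside community c and K_c is the sum of the degrees (strengths) of its nodes. A sketch using that form; it accepts optional edge weights, which gives Newman's weighted modularity. The function name and the examples are hypothetical, and self-loops are assumed absent.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Sequence

def modularity(edges: Iterable[Sequence], community: Dict[Hashable, Hashable]) -> float:
    """Q = sum_c [ e_c / m - (K_c / 2m)^2 ] for an undirected network without self-loops.

    Each edge is (u, v) or (u, v, weight); unweighted edges count with weight 1.
    """
    m = 0.0
    inside: Dict[Hashable, float] = defaultdict(float)      # e_c: link weight inside community c
    degree_sum: Dict[Hashable, float] = defaultdict(float)  # K_c: sum of degrees (strengths) in c
    for edge in edges:
        u, v = edge[0], edge[1]
        w = edge[2] if len(edge) > 2 else 1.0
        m += w
        degree_sum[community[u]] += w
        degree_sum[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical network: two triangles {1, 2, 3} and {4, 5, 6} joined by the link 3-4.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
q = modularity(edges, split)
```

    Putting every node in a single community always gives Q = 0, which is why a partition is only interesting when it scores clearly above 0.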

  • Communities: edge betweenness


    modularity = 0.45, very low

  • Extensions to weighted networks (with fast greedy algorithm)


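    Betweenness clustering? It will not work: strong ties will carry a disproportionate number of short paths, and those are the ones we want to keep.

    Modularity (Analysis of weighted networks, M. E. J. Newman), with A_vw now the weight of the edge between v and w:

    $$ Q = \frac{1}{2m}\sum_{v,w}\left[A_{vw} - \frac{k_v k_w}{2m}\right]\delta(c_v, c_w), \qquad k_i = \sum_j A_{ij}, \qquad m = \frac{1}{2}\sum_{i,j} A_{ij} $$

    Example: keywords of Reuters news articles.

    Slide from Lada Adamic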

  • Weighted Community structure of the giant component


    Modularity = 0.86, 16 communities.
