
Identifying and Characterizing Nodes Important to Community Structure Using the Spectrum of the Graph


Citation

=> Published in PLoS ONE, volume 6, November 2011

=> Authors: Yang Wang, Zengru Di, Ying Fan, all from the Department of Systems Science, Beijing Normal University, China


Overview

• Networks represent the interaction structure among components in a wide range of real complex systems

• Exploring network communities

  • reveals the organization of the network

  • provides a new perspective on dynamic processes

  • uncovers the relationships among the nodes

• This paper devises a new approach to identify the important nodes without knowing the exact partition of the network


Construction

• Based on the observation that the spectrum of the adjacency matrix indicates the community structure of a network

• Distinguishes two kinds of critical nodes:

  • community cores – eigenvalues of the adjacency matrix

  • bridges – graph Laplacian

• Validated by experiments on synthetic and real networks


Definitions

• Eigenvector: A non-zero column vector v is an eigenvector of a square matrix A iff there exists a number λ such that Av = λv.

• Eigenvalue: The number λ is called the eigenvalue corresponding to the eigenvector v.
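A minimal NumPy illustration of these definitions (not taken from the paper): `numpy.linalg.eigh` is intended for real symmetric matrices such as the adjacency matrix of an undirected network, and every eigenpair it returns satisfies Av = λv. The 4-node path graph is an arbitrary example.

```python
import numpy as np

# Adjacency matrix of a small undirected path graph 0-1-2-3 (arbitrary example).
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors
# as the columns of V.
eigenvalues, V = np.linalg.eigh(A)

for lam, v in zip(eigenvalues, V.T):
    # Each eigenvector satisfies the defining relation A v = lambda v.
    assert np.allclose(A @ v, lam * v)

print(np.round(eigenvalues, 4))
```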


Identifying important nodes

• Proposed method: a centrality metric based on the spectrum of the adjacency matrix A

• Definitions: binary (unweighted) network G = (V, E)

  • |V| = n nodes, |E| = m edges

  • A is real and symmetric, so its eigenvectors v1, …, vn can be chosen orthogonal and normalized

• Objective: use perturbation theory to estimate how strongly each node affects the largest eigenvalues λ of A

  • Pk is the relative change in the c largest eigenvalues when node k is removed


Centrality Metric

• Here vik is the kth element of the eigenvector vi, and Pk lies in the interval [0,1]. If node k is important to the community structure, Pk will be large.

• In a network with n nodes and c communities:

  • To scale the index to 1, define Ik = Pk / c

  • If Ik is larger than 1/n, node k is an important node
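The slides do not reproduce the formula for Pk. The sketch below assumes Pk = Σ_{i=1}^{c} vik², the squared kth components of the eigenvectors of the c largest eigenvalues of A. This assumed form matches the stated properties: each Pk lies in [0,1] and the Pk sum to c, so Ik = Pk / c sums to 1 and averages 1/n. The helper names and the test graph are hypothetical illustrations, not the paper's code.

```python
import numpy as np

def centrality_index(A, c):
    """Hypothetical helper: I_k = P_k / c, assuming P_k = sum over the c
    leading eigenvectors v_i of A of v_ik**2."""
    eigenvalues, V = np.linalg.eigh(A)   # ascending eigenvalues, orthonormal columns
    V_top = V[:, -c:]                    # eigenvectors of the c largest eigenvalues
    P = np.sum(V_top ** 2, axis=1)       # each P_k lies in [0, 1]; the P_k sum to c
    return P / c                         # so the I_k sum to 1 and average 1/n

def two_cliques_with_bridge(size=4):
    """Arbitrary test graph: two complete graphs K_size joined by one edge."""
    n = 2 * size
    A = np.zeros((n, n))
    for block in (range(size), range(size, n)):
        for i in block:
            for j in block:
                if i != j:
                    A[i, j] = 1.0
    A[size - 1, size] = A[size, size - 1] = 1.0   # the single connecting edge
    return A

A = two_cliques_with_bridge()
n = A.shape[0]
I = centrality_index(A, c=2)
important = [k for k in range(n) if I[k] > 1 / n]
print(np.round(I, 4))
print(important)  # [3, 4]: the two endpoints of the connecting edge
```

On this graph only the two endpoints of the connecting edge exceed the 1/n threshold; the remaining clique members fall just below it.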


Distinguish two kinds of important nodes

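• RatioCut technique: for a partition of the nodes into communities C1, …, Cc,

$$\mathrm{RatioCut}(C_1,\dots,C_c) = \sum_{i=1}^{c} \frac{\mathrm{cut}(C_i,\bar{C}_i)}{|C_i|}$$

where the numerator counts the edges between Ci and the rest of the network, and |Ci| is the size of community Ci. The RatioCut problem reduces to the MinCut problem when the communities are of almost the same size.

• Case 1: c = 2

An index vector s with n elements encodes the two-way partition.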


Continued

• The RatioCut function then becomes proportional to the quadratic form sᵀLs, where L is the graph Laplacian, defined as Lij = -Aij for i ≠ j and Lii = ki, with ki the degree of node i.

There are also two constraints on s.
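A standard way to fill in these steps, as in common tutorial treatments of RatioCut spectral clustering, is the choice of s below. The paper's exact scaling of s may differ. With this choice the quadratic form equals n times RatioCut(C1, C2) = cut(C1, C2)(1/|C1| + 1/|C2|), and the two constraints and the resulting problem read:

```latex
\[
s_i =
\begin{cases}
  +\sqrt{|C_2| / |C_1|} & \text{if } i \in C_1,\\
  -\sqrt{|C_1| / |C_2|} & \text{if } i \in C_2,
\end{cases}
\qquad
s^\top L s = \sum_{(i,j) \in E} (s_i - s_j)^2 = n \, \mathrm{RatioCut}(C_1, C_2)
\]
\[
\sum_{i=1}^{n} s_i = 0, \qquad \sum_{i=1}^{n} s_i^2 = n
\]
\[
\min_{s \in \mathbb{R}^n} \; s^\top L s
\quad \text{subject to} \quad s \perp \mathbf{1}, \;\; \|s\|^2 = n
\]
```

Relaxing s from these two discrete values to any real vector satisfying the constraints gives the minimization problem. By the Courant–Fischer (Rayleigh quotient) characterization, for a connected graph its minimizer is √n times the eigenvector u2 of the second-smallest eigenvalue of L.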


Continued

• The partition problem can then be posed as a minimization of sᵀLs over s, relaxed so that s may take any real values satisfying the two constraints

• The solution to this relaxed problem is the eigenvector corresponding to the second-smallest eigenvalue of L, denoted by u2

• Community core nodes: |ui2| (the magnitude of the ith element of u2) is relatively large

• Bridge nodes: |ui2| is near zero
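A small sketch of this two-community rule, using the Laplacian L = D - A as defined above. The test graph is hypothetical: two 4-cliques plus one node (node 4) attached to both, so by symmetry its entry in u2 is exactly zero and it should be flagged as the bridge.

```python
import numpy as np

def laplacian(A):
    """Graph Laplacian L = D - A, i.e. L_ii = k_i and L_ij = -A_ij for i != j."""
    return np.diag(A.sum(axis=1)) - A

# Hypothetical test graph: 4-cliques {0..3} and {5..8}, plus node 4
# linked to node 3 and node 5, so node 4 sits between the two groups.
n = 9
A = np.zeros((n, n))
for block in (range(0, 4), range(5, 9)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
for i, j in [(3, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

betas, U = np.linalg.eigh(laplacian(A))  # ascending order: betas[0] is ~0
u2 = U[:, 1]                              # eigenvector of the second-smallest eigenvalue
magnitude = np.abs(u2)

core_order = np.argsort(magnitude)[::-1]  # large |u_i2| first: core candidates
bridge = int(np.argmin(magnitude))        # |u_i2| closest to zero: bridge
print(core_order, bridge)                 # bridge is 4
```

The sign of u2 returned by `eigh` is arbitrary, which is why the rule works with |ui2| rather than ui2 itself.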


Continued

• Case 2: c > 2

A new n × c index matrix S is defined as

sij = 1/√|Cj| if vertex i ∈ Cj, else 0

Then RatioCut = Tr(SᵀLS). Since L is symmetric, it can be written as L = UDUᵀ, where the columns of U are the eigenvectors of L and D is the diagonal matrix of eigenvalues, Dii = βi.
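A quick numerical check of the identity RatioCut = Tr(SᵀLS), with S built exactly as defined above. The helper names and the 6-cycle example are hypothetical; the direct count sums, for each community, the edges leaving it divided by its size.

```python
import numpy as np

def ratio_cut_direct(A, labels):
    """Sum over communities C_j of (edges leaving C_j) / |C_j|."""
    total = 0.0
    for j in np.unique(labels):
        inside = labels == j
        cut = A[np.ix_(inside, ~inside)].sum()   # edges between C_j and the rest
        total += cut / inside.sum()
    return total

def ratio_cut_trace(A, labels):
    """Same quantity as Tr(S^T L S), with s_ij = 1/sqrt(|C_j|) if i in C_j, else 0."""
    L = np.diag(A.sum(axis=1)) - A
    communities = np.unique(labels)
    S = np.zeros((len(labels), len(communities)))
    for col, j in enumerate(communities):
        members = labels == j
        S[members, col] = 1.0 / np.sqrt(members.sum())
    return np.trace(S.T @ L @ S)

# Example: a 6-cycle split into three communities of two nodes each.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
labels = np.array([0, 0, 1, 1, 2, 2])
print(ratio_cut_direct(A, labels), ratio_cut_trace(A, labels))  # both 3.0
```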

Substituting this decomposition, RatioCut can be expanded in terms of the eigenvalues βi and the eigenvectors of L.
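One way to write this expansion, derived here directly from the definitions of S, U and D (not reproduced from the paper). Write uj for the jth column of U and sk for the kth column of S; since sik = 1/√|Ck| for i ∈ Ck, ujᵀsk = (1/√|Ck|) Σ_{i∈Ck} Uij:

```latex
\[
\mathrm{RatioCut}
  = \mathrm{Tr}\!\left(S^\top L S\right)
  = \mathrm{Tr}\!\left(S^\top U D U^\top S\right)
  = \sum_{j=1}^{n} \beta_j \sum_{k=1}^{c} \left(u_j^\top s_k\right)^2
  = \sum_{k=1}^{c} \frac{1}{|C_k|} \sum_{j=1}^{n} \beta_j
      \Big( \sum_{i \in C_k} U_{ij} \Big)^2
\]
```

With vertex vectors [ri]j = Uij, the innermost sum is the jth component of Σ_{i∈Ck} ri. Because the Laplacian's eigenvalues are non-negative (the smallest is 0), the components j with small βj contribute least to RatioCut.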


Continued

• Defining the vertex vector of node i as ri, with [ri]j = Uij, RatioCut can be rewritten in terms of the vertex vectors, given that the network has communities of almost equal size [Gk: set of vertices in community k]

Minimizing the RatioCut is then equivalent to a maximization problem that involves a parameter p. For a clear community structure, p = c can be chosen.


Continued

• If the community structure is quite clear, the magnitude of the vertex vector |ri|, restricted to its first p components, identifies the bridge nodes; this index is denoted by b

If the index b of a given vertex is near zero, the presence of that node results in a large RatioCut, and hence it is a bridge node.


Continued

• To scale the index to 1, a new term wk is defined as wk = bk / c

• Taking an Erdős–Rényi (ER) random network with n nodes as a null model, the index of each node would be 1/n

• If the w-score of a node is smaller than 1/n, the vertex has nearly equal membership in more than one community, and hence it is a bridge node.
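The slides do not give the exact formula for b. The sketch below assumes bk = Σ_{j=1}^{p} Ukj², the squared norm of node k's vertex vector restricted to the p eigenvectors of L with the smallest eigenvalues, with p = c. Under that assumption the columns of U are orthonormal, so the bk sum to c and the wk sum to 1, making 1/n the average score, which is consistent with the ER null-model threshold. The helper `w_scores` and the test graph are hypothetical.

```python
import numpy as np

def w_scores(A, c, p=None):
    """Hypothetical helper: w_k = b_k / c, with the assumed form
    b_k = sum over the first p Laplacian eigenvectors of U_kj**2."""
    p = c if p is None else p              # p = c for a clear community structure
    L = np.diag(A.sum(axis=1)) - A         # graph Laplacian
    betas, U = np.linalg.eigh(L)           # ascending eigenvalues, orthonormal columns
    b = np.sum(U[:, :p] ** 2, axis=1)      # |r_k|^2 over the first p components
    return b / c

# Hypothetical test graph: 4-cliques {0..3} and {5..8}, node 4 between them.
n = 9
A = np.zeros((n, n))
for block in (range(0, 4), range(5, 9)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
for i, j in [(3, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

w = w_scores(A, c=2)
bridges = [k for k in range(n) if w[k] < 1 / n]
print(np.round(w, 4))
print(bridges)  # [3, 4, 5]; node 4 has the smallest w-score
```

Here node 4 scores far below 1/n, while its two neighbours 3 and 5, which sit on the boundary of their cliques, also fall just under the threshold.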


Pros of this approach

• Lower computational cost: O(mn)


Experimental Results: Synthetic Network

The centrality metric I predicts nodes 1, 8 and 15 as important nodes. The w-score identifies node 15 as the bridge node.

The ΔH index also gives the correct prediction, but at a significant computational cost.

M can identify only the cores.


Experimental Results (contd.)

Real-World Network

Zachary’s karate club (social network) with c = 2

The centrality metric I identifies the community cores: node 1 (the instructor) and node 34 (the administrator).

The w-score identifies node 3 as the overlapping node, i.e. the bridge between these two communities.


Zachary’s karate club visualization

The diameter of each vertex is proportional to its index I

A large diameter indicates an important vertex

The color of each vertex is related to its w-score

Red vertices behave like “overlapping” nodes or bridges

Yellow vertices lie inside their own communities


Word Association Network

Four communities: Intelligence, Astronomy, Light, Colors

The word Bright is related to all of them, and so is Sun

Community critical nodes: Bright, Sun, Moon, Smart

Community cores: Moon and Smart

Bridges: Bright and Sun


Scientist Collaboration Network

The network represents scientists whose research centers on the properties of networks of one kind or another

Edges are placed between scientists who have published at least one paper together

The centrality metric I identifies the group leaders: Newman, Boccaletti, Barabasi

Their w-scores are not large, because they also collaborate with scientists outside their own communities


C. elegans neural network

The network is divided into 3 communities (sensory neurons, interneurons, motor neurons)

Each node represents a neuron and each edge represents a synaptic connection between neurons

High centrality metric I: important interneurons (AVA, AVB, … )

The w-score is very small because most of the important nodes act as bridges, since connections between the communities are especially necessary in this network


Applications in weighted networks

Artificial Network

The adjacency matrix of an undirected weighted network is still real and symmetric, so the method applies

It works well in a small artificial network:

10 nodes with two communities

A higher weight means a closer relationship between vertices

Nodes 4 and 9 are the cores of the communities

Node 11 is the bridge between the communities
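Because a weighted undirected adjacency matrix W is still real and symmetric, the same eigendecomposition applies unchanged. A minimal sketch, assuming the centrality index is computed on W exactly as for a binary A (with the assumed form Pk = Σ_{i≤c} vik², Ik = Pk / c), on a small hypothetical weighted graph rather than the paper's artificial network:

```python
import numpy as np

# Hypothetical weighted, undirected network: two triangles with strong internal
# weights, joined by one weak edge (weights are arbitrary illustration values).
edges = [(0, 1, 3.0), (0, 2, 3.0), (1, 2, 3.0),
         (3, 4, 3.0), (3, 5, 3.0), (4, 5, 3.0),
         (2, 3, 0.5)]
n = 6
W = np.zeros((n, n))
for i, j, weight in edges:
    W[i, j] = W[j, i] = weight              # undirected: store both directions

assert np.allclose(W, W.T)                  # real symmetric, so eigh applies
eigenvalues, V = np.linalg.eigh(W)          # real eigenvalues, orthonormal eigenvectors
c = 2
I = np.sum(V[:, -c:] ** 2, axis=1) / c      # assumed form of the index, as for binary A
print(np.round(I, 4))
```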


Applications in weighted networks (Contd.)

Real Network: SFI (Santa Fe Institute)

SFI collaboration network

Vertices 2, 12 and 24 are group leaders (community cores)

Vertices 1, 9 and 11 are bridges

The result differs from that of the corresponding unweighted network: edge weights can affect the results


Limitations

When community sizes are highly heterogeneous, the community identification fails

This limitation results from a property of the adjacency matrix spectrum

If $N_{\mathrm{small}}^2 < N_{\mathrm{large}}$, the small communities cannot be detected

$\delta = N_{\mathrm{large}} / N_{\mathrm{small}}$ (ratio of community sizes)

The index I cannot identify the important nodes in the small communities when the communities are of very different sizes


Conclusion/Observation

The proposed method works well in many cases without knowing the exact community structure, although the number of communities must be known

The paper does not discuss the effect of removing or adding nodes

Changes in the underlying community structure are not taken into consideration

The directed case is not considered; it is left for future research

The identification of such key nodes is important and could potentially be used:

• to identify the organizer of a community in social networks,

• to develop an immunization strategy in an epidemic process,

• to identify key nodes in biological networks