Bioinformatics Lab. Graph Element Analysis Woochang Hwang Department of Computer Science and Engineering State University of New York at Buffalo


Bioinformatics Lab.

Graph Element Analysis

Woochang Hwang

Department of Computer Science and Engineering

State University of New York at Buffalo


Outline: Introduction, Real World Networks, Centralities, Motivation, Bridging Centrality, Bridge Cut, Discussion, Future Works


Introduction

Many real-world systems can be described as networks.

Social relationships: e.g., collaboration relationships in academic, entertainment, and business areas.

Technological systems: e.g., Internet topology, the WWW, or mobile networks.

Biological systems: e.g., regulatory, metabolic, or interaction relationships.

Almost all of these real-world networks are scale-free.


Real World Networks

Yeast PPI network


Real World Networks

Proteome Size (PDB)


Real World Networks

Power-law degree distribution: rich get richer.

Small world: a small average path length (the mean shortest node-to-node path). Any node can be reached in a small number of hops, about 5 to 6.

Robustness: resilient and strongly resistant to failure under random attacks, but vulnerable to targeted attacks.

Hierarchical modularity: a large clustering coefficient, which measures how many of a node's neighbors are connected to each other.

Disassortative or assortative: biological networks are disassortative; social networks are assortative.
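The clustering coefficient and the average path length named above can be computed on a small graph as follows. This is a minimal sketch in plain Python, assuming an undirected, connected graph stored as a dictionary of neighbor sets; the function names and the toy graph are illustrative and not part of the original slides.

```python
from collections import deque

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Toy graph (hypothetical): triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
```

In the toy graph, node a sits in a closed triangle, while node c has one neighbor (d) that is not connected to the others, so c has a lower clustering coefficient than a.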


Real World Networks

E. Ravasz et al., Science, 2002


Real World Networks

Protein Networks; Metabolic Networks


Real World Networks

Complex systems maintain their basic functions even under errors and failures (cell mutations; Internet router breakdowns).

node failure


Real World Networks

Robust. For a power-law exponent γ < 3, removing nodes at random does not break the network into islands.

Very resistant to random attacks, but attacks targeting key nodes are more dangerous.

[Figure axes: Max Cluster Size; Path Length]
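The contrast between random failures and targeted attacks can be illustrated by measuring the largest connected cluster after node removal. The sketch below is an illustration only, assuming an undirected graph stored as a dictionary of neighbor sets; the star graph and the function name are hypothetical and not taken from the slides.

```python
def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected cluster after deleting the removed nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

# Hypothetical star graph: hub 0 connected to leaves 1..5.
star = {0: {1, 2, 3, 4, 5}}
star.update({leaf: {0} for leaf in range(1, 6)})

hub = max(star, key=lambda v: len(star[v]))           # highest-degree node
after_targeted = largest_component_size(star, {hub})  # attack on the hub
after_leaf = largest_component_size(star, {1})        # failure of a low-degree node
```

Removing the hub shatters the star into isolated leaves, whereas removing a single leaf leaves the remaining nodes connected, which mirrors the slide's point about key nodes.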


Centralities

Degree centrality: the number of direct neighbors of node v,

$$d(v) = |N(v)|$$

where N(v) is the set of direct neighbors of node v.

Stress centrality: the simple accumulation of the number of shortest paths between all node pairs,

$$C_S(v) = \sum_{s \neq v \neq t \in V} \rho_{st}(v)$$

where ρst(v) is the number of shortest paths between s and t passing through node v.


Centralities

Closeness centrality: the reciprocal of the total distance from a node v to all the other nodes in a network,

$$C_c(v) = \frac{1}{\sum_{u \in V} \delta(u, v)}$$

where δ(u,v) is the distance between nodes u and v.

Eccentricity: the reciprocal of the greatest distance between v and any other vertex,

$$C_E(v) = \frac{1}{\max\{dist(u, v) : u \in V\}}$$
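Closeness and eccentricity centrality both follow directly from breadth-first-search distances. The following is a minimal sketch, assuming an unweighted, undirected, connected graph stored as a dictionary of neighbor sets; the names and the three-node path are illustrative.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(adj, v):
    """Reciprocal of the total distance from v to all other nodes."""
    total = sum(d for u, d in bfs_distances(adj, v).items() if u != v)
    return 1.0 / total if total else 0.0

def eccentricity_centrality(adj, v):
    """Reciprocal of the greatest distance from v to any other node."""
    far = max(bfs_distances(adj, v).values())
    return 1.0 / far if far else 0.0

# Hypothetical path graph a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

On this path the middle node b scores higher than the end nodes under both measures, since it is one hop from everything.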


Centralities

Shortest-path-based betweenness centrality: the ratio of the number of shortest paths passing through a node v to all shortest paths between all node pairs in a network,

$$C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where σst is the number of shortest paths between nodes s and t, and σst(v) is the number of those paths that pass through node v.

Current-flow-based betweenness centrality: the amount of current that flows through v in a network,

$$C_{CB}(v) = \frac{\sum_{s, t \in V} I_v^{(st)}}{\frac{1}{2} n (n - 1)}$$

where $I_v^{(st)}$ is the current through v for the source-target pair (s, t) and n is the number of nodes.

Random-walk-based betweenness centrality
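Shortest-path betweenness is usually computed with Brandes' accumulation scheme rather than by enumerating paths. The sketch below is one such implementation for unweighted, undirected graphs stored as dictionaries of neighbor sets. It is an illustration, not the code used for the slides, and it assumes each unordered pair (s, t) is counted once.

```python
from collections import deque

def betweenness(adj):
    """Shortest-path betweenness: sum over pairs of sigma_st(v) / sigma_st."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Back-propagate pair dependencies from the farthest nodes inward.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Each unordered pair was visited from both endpoints, so halve the totals.
    return {v: c / 2.0 for v, c in cb.items()}

# Hypothetical examples: a three-node path and a star with three leaves.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
```

In the star, every leaf-to-leaf shortest path passes through the hub, so the hub's betweenness equals the number of leaf pairs.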


Centralities

Markov centrality: the values show which nodes are closer to the center of mass. More central nodes can be reached from all other nodes in a shorter average time.

$$C_M(v) = \frac{n}{\sum_{s \in V} m_{sv}}$$

where msv is the mean first passage time (MFPT) from s to v in the Markov chain, and n = |R| for a given root set R.

$$m_{st} = \sum_{n=1}^{\infty} n f_{st}^{(n)}$$

where n denotes the number of steps taken and $f_{st}^{(n)}$ denotes the probability that the chain, starting at state s, first reaches state t in exactly n steps.


Centralities

Information centrality: incorporates the set of all possible paths between two nodes, weighted by an information-based value for each path.

$$C_I(s)^{-1} = n C^I_{ss} + \mathrm{trace}(C^I) - \frac{2}{n}$$

where $C^I = (L + J)^{-1}$ with Laplacian L and $J = \mathbf{1}\mathbf{1}^T$, and $C^I_{ss}$ is the element in the s-th row and s-th column of $C^I$.

It measures the harmonic mean length of paths ending at a vertex s, which is smaller if s has many short paths connecting it to other vertices.

Random-walk-based closeness centrality is equivalent to information centrality.


Centralities

Eigenvector centrality: a measure of the importance of nodes in a network using the adjacency matrix and its eigenvectors.

$$\lambda C_{IV} = A C_{IV}$$

where A is the adjacency matrix, C_IV is an eigenvector, and λ is an eigenvalue. Only the eigenvector of the largest eigenvalue gives the desired centrality measurement.

Hubbell index, Katz status index, etc.
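The eigenvector for the largest eigenvalue can be approximated by power iteration. The following is a minimal sketch, assuming an undirected, connected graph stored as a dictionary of neighbor sets. It iterates with A + I instead of A, which is a choice made here (not something stated in the slides): the shift leaves the eigenvectors unchanged but avoids oscillation on bipartite graphs.

```python
import math

def eigenvector_centrality(adj, iterations=200):
    """Approximate the principal eigenvector of the adjacency matrix."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Multiply by (A + I): each node keeps its own value and adds its neighbors'.
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        x = {v: val / norm for v, val in nxt.items()}
    return x

# Hypothetical star graph: hub 0 with three leaves.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
scores = eigenvector_centrality(star)
```

For a star with three leaves, the largest eigenvalue of A is √3, and the hub's score is √3 times that of each leaf, which gives a quick sanity check on the iteration.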


Centralities

Bargain centrality: in bargaining situations, it is advantageous to be connected to those who have few options; power comes from being connected to those who are powerless. Being connected to powerful people who have many competitive trading partners weakens one's own bargaining power.

$$C_{Bar} = \alpha (I - \beta A)^{-1} A \mathbf{1}$$

where α is a scaling factor, β is the influence parameter, A is the adjacency matrix, and $\mathbf{1}$ is the n-dimensional vector in which every entry is 1.


Centralities

PageRank: a link analysis method that scores the relative importance of web pages in a web network.

$$C_{PR}(v) = (1 - d) + d \left( \frac{C_{PR}(t_1)}{C(t_1)} + \cdots + \frac{C_{PR}(t_n)}{C(t_n)} \right)$$

where t_1, ..., t_n are the pages that link to v, C(t_i) is the number of outgoing links of page t_i, and d is the damping factor.

The PageRank of a web page is defined recursively; a page has a high importance if it has a large number of incoming links from highly important web pages.

PageRank can also be viewed as a probability distribution of the likelihood that a random surfer will arrive at any particular page at a certain time.

Hypertext Induced Topic Selection (HITS), etc.
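The recursive PageRank definition can be evaluated by repeated substitution until the scores settle. The sketch below applies the unnormalized form (1 - d) + d · Σ PR(t)/C(t) directly. It assumes a directed graph given as a dictionary from each page to its list of outgoing links, with every page having at least one outgoing link; the damping value 0.85, the iteration count, and the toy pages are illustrative choices, not values from the slides.

```python
def pagerank(out_links, d=0.85, iterations=200):
    """Iterate PR(v) = (1 - d) + d * sum(PR(t) / C(t)) over pages t linking to v."""
    pages = list(out_links)
    in_links = {p: [] for p in pages}
    for p, targets in out_links.items():
        for t in targets:
            in_links[t].append(p)
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        pr = {p: (1 - d) + d * sum(pr[t] / len(out_links[t]) for t in in_links[p])
              for p in pages}
    return pr

# Hypothetical web: a and b link to each other, c links to a, nothing links to c.
web = {"a": ["b"], "b": ["a"], "c": ["a"]}
pr = pagerank(web)
```

Page c has no incoming links, so its score stays at the baseline 1 - d, while a collects score from both b and c.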


Centralities

Subgraph centrality: accounts for the participation of a node in all subgraphs of the network.

$$SC(v) = \sum_{k=0}^{\infty} \frac{\mu_k(v)}{k!}$$

where the number of closed walks of length k starting and ending at node v in the network is given by the local spectral moments μk(v).
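Since μk(v) is the v-th diagonal entry of the k-th power of the adjacency matrix, the series can be approximated by truncating it after a fixed number of terms. The sketch below does this with plain nested lists, assuming an undirected graph stored as a dictionary of neighbor sets; the truncation at 30 terms is an arbitrary choice that is adequate only for small graphs.

```python
def subgraph_centrality(adj, terms=30):
    """Truncated series sum_k mu_k(v) / k!, with mu_k(v) = (A^k)[v][v]."""
    nodes = list(adj)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    a = [[0.0] * n for _ in range(n)]
    for v in nodes:
        for u in adj[v]:
            a[index[v]][index[u]] = 1.0
    # power holds A^k / k!; it starts as the identity matrix (k = 0).
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    sc = [1.0] * n
    for k in range(1, terms):
        power = [[sum(power[i][m] * a[m][j] for m in range(n)) / k
                  for j in range(n)] for i in range(n)]
        for i in range(n):
            sc[i] += power[i][i]
    return {v: sc[index[v]] for v in nodes}

# Hypothetical two-node graph joined by one edge.
pair = subgraph_centrality({"a": {"b"}, "b": {"a"}})
```

For two nodes joined by a single edge, closed walks exist only for even k, so the series sums to cosh(1) for each node.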


Observation

Scale-free networks, basic properties:
o Power-law degree distribution
o Small world
o Robustness
o Hierarchical modularity
o Disassortative or assortative

Based on these observations, there should be some bridging nodes/edges between modules in scale-free networks, and we did recognize bridging nodes/edges by visual inspection of small example networks.

Finding the bridging nodes/edges, which are located between modules, is an interesting and important problem for many applications in many different fields (network robustness, path protection, effective target finding, etc.).


Motivation

Existing measurements are not enough for identifying the bridging nodes/edges: the existing indices are dominated by the degree of the node of interest.

The betweenness of an edge also has a strong inclination to attach to high-degree nodes.

High-scoring elements tend to clutter in the center of the network, so it is hard to differentiate the bridging nodes/edges from other kinds of nodes/edges.

Our focus in this research is to target vulnerable and central components in a network from a totally different point of view.


Bridge

A bridge should be located on an important path, e.g., a shortest path.

A bridge should be located between modules.

The neighboring regions of a bridging node should have little in common with one another.


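Betweenness and Bridging Coefficient

Betweenness: the global importance of a node/edge from the shortest-paths viewpoint.

$$\Phi(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}} \qquad \Phi(e) = \sum_{s \neq t \in V} \frac{\sigma_{st}(e)}{\sigma_{st}}$$

where σst is the number of shortest paths between s and t, and σst(v) and σst(e) are the numbers of those paths passing through node v and edge e.

Bridging coefficient: a measure of how well a node or edge is located between well-connected regions. For a node v, it is the average probability of leaving the direct-neighbor subgraph of v.

$$\Psi(v) = \frac{1}{d(v)} \sum_{i \in N(v)} \frac{\delta(i)}{d(i) - 1}$$

where d(v) is the degree of node v and δ(i) is the number of edges leaving the direct-neighbor subgraph of v among the edges incident to neighbor i.

$$\Psi(e) = \frac{d(i)\Psi(i) + d(j)\Psi(j)}{(d(i) + d(j))(|C(i, j)| + 1)}, \quad e = (i, j)$$

where C(i, j) is the set of common direct neighbors of nodes i and j.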
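The node bridging coefficient can be computed from neighbor sets alone. The sketch below reads "leaving the direct-neighbor subgraph of v" as an edge from a neighbor i to a node that is neither v nor another neighbor of v. It also assumes that a neighbor whose only edge goes to v contributes zero. Both readings are assumptions made for this illustration, as are the function name and the toy graphs.

```python
def bridging_coefficient(adj, v):
    """Average, over neighbors i of v, of the fraction of i's other edges
    that leave the direct-neighbor subgraph of v."""
    region = adj[v] | {v}  # v together with its direct neighbors
    total = 0.0
    for i in adj[v]:
        other_edges = len(adj[i]) - 1  # edges of i other than the one to v
        if other_edges > 0:            # assumption: degree-one neighbors add 0
            leaving = sum(1 for w in adj[i] if w not in region)
            total += leaving / other_edges
    return total / len(adj[v]) if adj[v] else 0.0

# Hypothetical graphs: a five-node path and a triangle.
path5 = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
```

The middle node of the path scores 1.0 because every other edge of its neighbors leads away from its neighborhood, while every node in the triangle scores 0.0 because its neighbors only connect to each other.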


Bridging Coefficient

Figure 1. Bridging Coefficient


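Bridging Centrality

Bridging centrality is defined as the product of the rank of the betweenness and the rank of the bridging coefficient.

$$C_{Br}(v) = R_{\Phi(v)} \cdot R_{\Psi(v)}$$

$$C_{Br}(e) = R_{\Phi(e)} \cdot R_{\Psi(e)}$$

where R_Φ is the rank of the betweenness and R_Ψ is the rank of the bridging coefficient of the node v or edge e.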
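The rank product is straightforward once the two score vectors are available. The sketch below takes precomputed betweenness and bridging-coefficient scores as inputs. The slides do not state the ranking direction, so the code assumes that rank 1 goes to the smallest score, which makes a larger product mean a more bridge-like element. The score values are made up for illustration.

```python
def ranks(scores):
    """Assumed convention: rank 1 = smallest score, rank n = largest score."""
    ordered = sorted(scores, key=lambda v: (scores[v], str(v)))
    return {v: position + 1 for position, v in enumerate(ordered)}

def bridging_centrality(betweenness, bridging_coefficient):
    """Product of the betweenness rank and the bridging-coefficient rank."""
    r_phi = ranks(betweenness)
    r_psi = ranks(bridging_coefficient)
    return {v: r_phi[v] * r_psi[v] for v in betweenness}

# Hypothetical precomputed scores for three nodes.
phi = {"a": 0.0, "b": 3.0, "c": 1.0}
psi = {"a": 0.2, "b": 0.9, "c": 0.5}
c_br = bridging_centrality(phi, psi)
```

Using ranks rather than raw scores keeps either measure from dominating the product when their scales differ.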


Application on a synthetic network

Figure 2. Figures 2A and 2B show the results of bridging centrality and betweenness centrality, respectively, in the synthetic network. The network contained 162 nodes and 362 edges and was created by adding bridging nodes to three independently generated sub-networks. Figure 2C shows the results for a synthetic network in which 500 nodes were added to each subgraph in Figure 2A while keeping the same bridging nodes.


Application on Web Network Examples

Figure 3. Results for Web Networks: Figures 3A and 3B show the results for the AT&T Web Network and the RPI Web Network, respectively. The nodes with the highest values of bridging centrality (0-5th percentile) are highlighted in red circles; the nodes with the lowest values (85th-100th percentiles) are highlighted in white circles. The color map for the percentile values is shown in the figure.


Application on Social Network Examples

Figure 4. Results for Social Networks: Figures 4A and 4B show the results for the Les Misérables Character Network and the Physics Collaboration Network, respectively. The nodes with the highest values of bridging centrality (0-5th percentile) are highlighted in red circles; the nodes with the lowest values (85th-100th percentiles) are highlighted in white circles. The nodes corresponding to Valjean (V), Javert (J), Pontmercy (P) and Cosette (C) are labeled in Figure 4A. The nodes corresponding to Rothman (R), Redner (R2), Dodds (D), Krapivsky (K) and Stanley (S) are labeled in Figure 4B. The color map for the percentile values is shown in the figure.


Application on Biological Network Examples

Figure 5. Results for Biological Networks: Figures 5A and 5B show the results for the Cardiac Arrest Network and the Yeast Metabolic Network, respectively. The nodes corresponding to Src, Shc and Jak2 (J2) are labeled in Figure 5A. The nodes with the highest values of bridging centrality (0-5th percentile) are highlighted in red circles; the nodes with the lowest values (85th-100th percentiles) are highlighted in white circles. The color map for the percentile values is shown in the figure.


Assessing Network Disruption, Structural Integrity and Modularity

Figure 6. Sequential node removal analysis on the yeast metabolic network


Assessing Ability To Occupy Topological Position

Figure 7A shows the clique affiliation of the nodes detected by three metrics: bridging centrality (black squares), degree centrality (open circles), and betweenness centrality (black circles). Maximal cliques were identified in the Yeast PPI network, and we then checked whether the nodes detected by each metric are in the identified cliques. In Figure 7B, the random betweenness between detected cliques was measured in the clique graph for each metric: bridging centrality (black squares), degree centrality (open circles), and betweenness centrality (black circles). Figure 7C compares the number of singletons generated by sequential node deletion for each metric: bridging centrality (dotted line), degree centrality (gray line), and betweenness centrality (black line). The nodes with the highest values for each of these network metrics were sequentially deleted, and the number of singletons produced was counted.


Assessing Ability To Occupy Modulating Position

Figure 8. The biological and topological characteristics of the direct neighbors of the nodes ordered by two metrics: bridging centrality (black bars) and betweenness centrality (white bars). Figure 8(a) shows the gene expression correlation among the direct neighbors for each percentile. Figure 8(b) shows the average clustering coefficient of the nodes in each percentile.


Druggability

The nodes corresponding to SHC, SRC, and JAK2 had the highest, second-highest, and third-highest bridging centrality values, respectively.

The target of receptor antagonist drugs such as losartan also signals via SRC and SHC in cardiac fibroblasts (cardiac structural tissue).

JAK2 activation is a key mediator of aldosterone-induced angiotensin-converting enzyme expression; the latter is the target of drugs such as captopril, enalapril, and other angiotensin-converting enzyme inhibitors (related to high blood pressure).


Druggability: C21 Steroid Hormone Metabolism Network

The metabolites with the highest values of bridging centrality were: i) Corticosterone, ii) Cortisol, iii) 11β-Hydroxyprogesterone, iv) Pregnenolone, and v) 21-deoxy-cortisol.

Corticosterone and cortisol are produced by the adrenal glands and mediate the fight-or-flight stress response, which includes changes to blood sugar, blood pressure, and immune modulation.


Druggability: Steroid Biosynthesis Network

The metabolites with the highest values of bridging centrality were: i) Presqualene diphosphate, ii) Squalene, iii) (S)-2,3-epoxy-squalene, iv) Prephytoene diphosphate, and v) Phytoene.

Anti-fungal agents, a promising target for anti-cholesterol drugs (25) and the anti-cholesterolemic activity


Bridge Cut Algorithm

Iterative Graph Partitioning Algorithm

1. Compute the bridging centrality of each edge.

2. Cut the edge with the highest bridging centrality.

3. Identify an isolated module as a cluster if the density of the isolated module is greater than a threshold.
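The three steps above can be sketched as a loop over a working copy of the graph. This is an illustrative reading rather than the authors' implementation, and several details are assumptions made here:

- Ranks ascend with the score, so the largest rank product marks the edge to cut.
- Density is the fraction of possible edges present in a module.
- A module is checked only when a cut disconnects it.
- The loop runs until no edges remain.
- A neighbor with no other edges contributes zero to the node bridging coefficient.

The graph is undirected, unweighted, and stored as a dictionary of neighbor sets with sortable node labels.

```python
from collections import deque

def edge_betweenness(adj):
    # Brandes-style accumulation of shortest-path shares on edges (unweighted BFS).
    eb = {}
    for s in adj:
        sigma, dist, preds, order = {s: 1}, {s: 0}, {}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[u] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for u in preds.get(w, []):
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((u, w))
                eb[edge] = eb.get(edge, 0.0) + share
                delta[u] += share
    return eb

def node_bridging_coefficient(adj, v):
    # Assumption: a neighbor whose only edge goes to v contributes 0.
    region = adj[v] | {v}
    total = sum(sum(1 for w in adj[i] if w not in region) / (len(adj[i]) - 1)
                for i in adj[v] if len(adj[i]) > 1)
    return total / len(adj[v]) if adj[v] else 0.0

def edge_bridging_coefficient(adj, edge, psi):
    i, j = tuple(edge)
    di, dj = len(adj[i]), len(adj[j])
    common = len(adj[i] & adj[j])  # shared direct neighbors of i and j
    return (di * psi[i] + dj * psi[j]) / ((di + dj) * (common + 1))

def rank(scores):
    # Assumed convention: rank 1 = smallest score.
    ordered = sorted(scores, key=lambda e: (scores[e], sorted(e)))
    return {e: r + 1 for r, e in enumerate(ordered)}

def component(adj, start):
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def density(adj, nodes):
    n = len(nodes)
    edges = sum(len(adj[v] & nodes) for v in nodes) / 2
    return 2.0 * edges / (n * (n - 1)) if n > 1 else 0.0

def bridge_cut(graph, threshold):
    adj = {v: set(nbrs) for v, nbrs in graph.items()}  # working copy
    clusters = []
    while any(adj.values()):
        # Step 1: bridging centrality of every remaining edge.
        psi = {v: node_bridging_coefficient(adj, v) for v in adj}
        phi = edge_betweenness(adj)
        r_phi = rank(phi)
        r_psi = rank({e: edge_bridging_coefficient(adj, e, psi) for e in phi})
        # Step 2: cut the edge with the largest rank product.
        u, v = tuple(max(phi, key=lambda e: (r_phi[e] * r_psi[e], sorted(e))))
        adj[u].discard(v)
        adj[v].discard(u)
        side_u = component(adj, u)
        if v in side_u:
            continue  # the cut did not isolate anything yet
        # Step 3: keep a newly isolated module if it is dense enough.
        for side in (side_u, component(adj, v)):
            if density(adj, side) > threshold:
                clusters.append(side)
                for node in side:
                    del adj[node]
    return clusters

# Hypothetical example: two triangles joined by the single edge 3-4.
barbell = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
clusters = bridge_cut(barbell, threshold=0.5)
```

On this example the joining edge ranks highest on both measures, so the first cut separates the two triangles and both pass the density check. Recomputing all scores after every cut is simple but costly on large graphs.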


Clustering Validation

F-measure

$$F\text{-}measure = \frac{2 (Precision \cdot Recall)}{Precision + Recall}$$

Davies-Bouldin Index

$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{i \neq j} \left\{ \frac{diam(C_i) + diam(C_j)}{d(C_i, C_j)} \right\}$$

where k is the number of clusters, diam(Ci) is the diameter of cluster Ci, and d(Ci, Cj) is the distance between clusters Ci and Cj. The ratio is small if clusters i and j are compact and their centers are far away from each other. Therefore, DB will have a small value for a good clustering.
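Both validation measures are short to compute once the cluster comparison is fixed. The sketch below makes two assumptions that the slides do not spell out: precision and recall are taken as the overlap between a predicted cluster and a reference module divided by the size of each, and the diameter and inter-cluster distance are passed in as functions. The one-dimensional points and helper names are hypothetical.

```python
def f_measure(predicted, reference):
    """Harmonic mean of precision and recall for one cluster vs. one reference set."""
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

def davies_bouldin(clusters, diam, dist):
    """Average, over clusters, of the worst (diam_i + diam_j) / d(C_i, C_j) ratio."""
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((diam(clusters[i]) + diam(clusters[j])) / dist(clusters[i], clusters[j])
                     for j in range(k) if j != i)
    return total / k

# Hypothetical 1-D data: diameter = spread of a cluster, distance = gap between means.
def spread(points):
    return max(points) - min(points)

def mean_gap(a, b):
    return abs(sum(a) / len(a) - sum(b) / len(b))

db = davies_bouldin([[0.0, 1.0], [10.0, 11.0]], spread, mean_gap)
f = f_measure({"p1", "p2", "p3"}, {"p2", "p3", "p4"})
```

Two tight, well-separated clusters give a small DB value, matching the statement that a good clustering has a small index.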


Bridge Cut

Table 1. Comparative analysis. Performance of the bridge cut method on the DIP PPI dataset (2339 nodes, 5595 edges) is compared with seven graph clustering approaches (betweenness cut, maximal clique, quasi clique, Rives, minimum cut, Markov clustering, Samanta). The fourth column represents the average F-measure of the clusters for MIPS complex modules. The fifth column indicates the Davies-Bouldin cluster quality index. Comparisons are performed on the clusters with 4 or more components.

Methods            Clusters   Size   MIPS complex (F-measure)   DB
Bridge Cut         114        7.6    0.53                       4.78
Betweenness Cut    131        6.3    0.49                       6.2
Max Cliq           120        4.7    0.49                       N/A
Quasi Cliq         103        9.2    0.46                       N/A
Rives              74         31     0.33                       13.5
Mincut             227        8.7    0.35                       7.23
MCL                210        8.4    0.47                       6.82
Samanta            138        7.2    0.43                       6.8


Bridge Cut

Table 2. Comparative analysis. Performance of the bridge cut method on the school friendship dataset (551 nodes, 2066 edges) is compared with seven graph clustering approaches (betweenness cut, maximal clique, quasi clique, Rives, minimum cut, Markov clustering, Samanta). Column descriptions are the same as in Table 1.

Methods            Clusters   Size   DB
Bridge Cut         40         8.6    5.46
Betweenness Cut    48         7.1    5.57
Max Cliq           133        4.4    N/A
Quasi Cliq         109        9.5    N/A
Rives              46         10.9   10.4
Mincut             53         9.3    6.29
MCL                50         8.0    5.47
Samanta            40         13.5   7.1


Discussion

The recognition of bridges should be very valuable information for many different applications in many different areas.

Identifying functional or physical modules, or key components, using bridging centrality will provide an effective and totally new way of looking at biological systems.

Discovering sub-communities or important components in social network systems.

Network robustness improvement, network protection, and path protection using bridging information.

Drug target identification.


Future Works

Directed networks

Complexity


Thank You!