20
BioMed Central Page 1 of 20 (page number not for citation purposes) BMC Systems Biology Open Access Research article Understanding network concepts in modules Jun Dong and Steve Horvath* Address: Department of Human Genetics and Department of Biostatistics, University of California, Los Angeles, CA 90095, USA Email: Jun Dong - [email protected]; Steve Horvath* - [email protected] * Corresponding author Abstract Background: Network concepts are increasingly used in biology and genetics. For example, the clustering coefficient has been used to understand network architecture; the connectivity (also known as degree) has been used to screen for cancer targets; and the topological overlap matrix has been used to define modules and to annotate genes. Dozens of potentially useful network concepts are known from graph theory. Results: Here we study network concepts in special types of networks, which we refer to as approximately factorizable networks. In these networks, the pairwise connection strength (adjacency) between 2 network nodes can be factored into node specific contributions, named node 'conformity'. The node conformity turns out to be highly related to the connectivity. To provide a formalism for relating network concepts to each other, we define three types of network concepts: fundamental-, conformity-based-, and approximate conformity-based concepts. Fundamental concepts include the standard definitions of connectivity, density, centralization, heterogeneity, clustering coefficient, and topological overlap. The approximate conformity-based analogs of fundamental network concepts have several theoretical advantages. First, they allow one to derive simple relationships between seemingly disparate networks concepts. For example, we derive simple relationships between the clustering coefficient, the heterogeneity, the density, the centralization, and the topological overlap. The second advantage of approximate conformity-based network concepts is that they allow one to show that fundamental network concepts can be approximated by simple functions of the connectivity in module networks. Conclusion: Using protein-protein interaction, gene co-expression, and simulated data, we show that a) many networks comprised of module nodes are approximately factorizable and b) in these types of networks, simple relationships exist between seemingly disparate network concepts. Our results are implemented in freely available R software code, which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/ModuleConformity/ ModuleNetworks Background Network terminology is used to study important ques- tions in systems biology. For example, networks are used to study functional enrichment [1], to analyze the struc- ture of cellular networks [2], to model biological signal- ling or regulatory networks [1,3], to reconstruct metabolic networks [4], and to study the dynamic behavior of gene regulatory networks [5]. Published: 4 June 2007 BMC Systems Biology 2007, 1:24 doi:10.1186/1752-0509-1-24 Received: 22 January 2007 Accepted: 4 June 2007 This article is available from: http://www.biomedcentral.com/1752-0509/1/24 © 2007 Dong and Horvath; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Understanding network concepts in modules

Embed Size (px)

Citation preview

BioMed CentralBMC Systems Biology

ss

Open AcceResearch articleUnderstanding network concepts in modulesJun Dong and Steve Horvath*

Address: Department of Human Genetics and Department of Biostatistics, University of California, Los Angeles, CA 90095, USA

Email: Jun Dong - [email protected]; Steve Horvath* - [email protected]

* Corresponding author

AbstractBackground: Network concepts are increasingly used in biology and genetics. For example, theclustering coefficient has been used to understand network architecture; the connectivity (alsoknown as degree) has been used to screen for cancer targets; and the topological overlap matrixhas been used to define modules and to annotate genes. Dozens of potentially useful networkconcepts are known from graph theory.

Results: Here we study network concepts in special types of networks, which we refer to asapproximately factorizable networks. In these networks, the pairwise connection strength(adjacency) between 2 network nodes can be factored into node specific contributions, namednode 'conformity'. The node conformity turns out to be highly related to the connectivity. Toprovide a formalism for relating network concepts to each other, we define three types of networkconcepts: fundamental-, conformity-based-, and approximate conformity-based concepts.Fundamental concepts include the standard definitions of connectivity, density, centralization,heterogeneity, clustering coefficient, and topological overlap. The approximate conformity-basedanalogs of fundamental network concepts have several theoretical advantages. First, they allow oneto derive simple relationships between seemingly disparate networks concepts. For example, wederive simple relationships between the clustering coefficient, the heterogeneity, the density, thecentralization, and the topological overlap. The second advantage of approximate conformity-basednetwork concepts is that they allow one to show that fundamental network concepts can beapproximated by simple functions of the connectivity in module networks.

Conclusion: Using protein-protein interaction, gene co-expression, and simulated data, we showthat a) many networks comprised of module nodes are approximately factorizable and b) in thesetypes of networks, simple relationships exist between seemingly disparate network concepts. Ourresults are implemented in freely available R software code, which can be downloaded from thefollowing webpage: http://www.genetics.ucla.edu/labs/horvath/ModuleConformity/ModuleNetworks

BackgroundNetwork terminology is used to study important ques-tions in systems biology. For example, networks are usedto study functional enrichment [1], to analyze the struc-

ture of cellular networks [2], to model biological signal-ling or regulatory networks [1,3], to reconstruct metabolicnetworks [4], and to study the dynamic behavior of generegulatory networks [5].

Published: 4 June 2007

BMC Systems Biology 2007, 1:24 doi:10.1186/1752-0509-1-24

Received: 22 January 2007Accepted: 4 June 2007

This article is available from: http://www.biomedcentral.com/1752-0509/1/24

© 2007 Dong and Horvath; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Page 1 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

Here we study the meaning of network concepts in rela-tively simple networks, e.g. gene co-expression networksand protein-protein interaction (PPI) networks. Specifi-cally, we consider undirected networks that can be repre-sented by a symmetric adjacency matrix A = [aij], wherethe pairwise adjacency (connection strength) aij takes val-ues in the unit interval, i.e., 0 ≤ aij ≤ 1. For an unweightednetwork, the adjacency aij = 1 if nodes i and j are con-nected and 0 otherwise. For a weighted network, 0 ≤ aij ≤ 1.For notational convenience, we set the diagonal elementsto 1.

Fundamental network conceptsOther authors refer to network concepts as network statis-tics or network indices. Network concepts include connec-tivity, mean connectivity, density, variance of theconnectivity (related to the heterogeneity) etc. Networkconcepts can be used as descriptive statistics for networks.While some network concepts (e.g. connectivity) havefound important uses in biology and genetics, other net-work concepts (e.g. network centralization) appear lessinteresting to biologists. Before attempting to understandwhy some concepts are more interesting than others, it isimportant to understand how network concepts relate toeach other in biologically interesting networks. As a steptoward this goal, we explore the meaning of network con-cepts in module networks, which are defined below.

In the following, we review fundamental network con-cepts. Further details on the definitions and notations canbe found in the Methods section.

The node connectivity is given by

In unweighted networks, the connectivity ki of node iequals the number of directly linked neighbors. Inweighted networks, the connectivity equals the sum ofconnection weights with all other nodes. Highly con-nected 'hub' genes are thought to play an important rolein organizing the behavior of biological networks [6-9].Connectivity has been found to be an important comple-mentary gene screening variable for finding biologicallysignificant genes in cancer [10,11] and primate braindevelopment [12].

The line density [13] is defined as the mean off-diagonaladjacency and is closely related to the mean connectivity.

where the function Sp(·) is defined for a vector v as Sp(v)

= ∑i = (vp)τ 1.

The normalized connectivity centralization (also knownas degree centralization) is a simple and widely used indexof the connectivity distribution. By definition [14], thenormalized connectivity centralization is given by

A frequent question of social network analysis concernsthe causes and consequences of centralization in networkstructure, i.e. the extent to which certain nodes are farmore central than others within the network in question.The centralization index has been used to describe struc-tural differences of metabolic networks [15].

Many measures of network heterogeneity are based on thevariance of the connectivity, and authors differ on how toscale the variance [13]. Our definition of the network het-erogeneity equals the coefficient of variation of the con-nectivity distribution, i.e.

This heterogeneity measure is scale invariant with respectto multiplying the connectivity by a scalar. Biological net-works tend to be very heterogeneous: while some 'hub'nodes are highly connected, the majority of nodes tend tohave very few connections. Describing the heterogeneity(inhomogeneity) of the connectivity (degree) distributionhas been the focus of considerable research in recent years[6,16-18].

The clustering coefficient of node i is a density measureof local connections, or 'cliquishness' [19,20]. Specifi-cally,

In unweighted networks, ni equals twice the number ofdirect connections among the nodes connected to node i,and πi equals twice the maximum possible number ofdirect connections among the nodes connected to node i.Consequently, ClusterCoefi equals 1 if and only if allneighbors of i are also connected to each other. For gen-eral weighted networks with 0 ≤ aij ≤ 1, one can prove 0 ≤ClusterCoefi ≤ 1 [21]. The relationship between the cluster-

Connectivity k ai i ijj i

= =≠∑ . (1)

Densitya

n n

S

n n

mean

n

ijj ii=−

=−

=−

≠∑∑( )

( )

( )

( ),

1 1 11 k k (2)

vip

Centralizationn

n nDensity

nDensity=

− −−⎛

⎝⎜⎞⎠⎟

≈ −2 1

max( ) max( )k k..

(3)

Heterogeneityiance

mean

nS

S= = −

var ( )

( )

( )

( ).

k

k

k

k2

12

1 (4)

ClusterCoefn a a a

ai

i

i

il lm mim i ll i

ill i

= =( ) −{ }

≠≠

≠ ≠

∑∑∑ ∑π

,

aill i

2 2.. (5)

Page 2 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

Page 3 of 20(page number not for citation purposes)

Hierarchical clustering dendrogram and module definitionFigure 1Hierarchical clustering dendrogram and module definition. A) Drosophila PPI network. The dendrogram results from average linkage hierarchical clustering. The color-band below the dendrogram denotes the modules, which are defined as branches in the dendrogram. Of the 1371 proteins, 862 were clustered into 28 proper modules, and the remaining proteins are colored in grey; B) yeast PPI network; C) weighted gene co-expression network (yeast); D) unweighted gene co-expres-sion network (yeast). To facilitate a comparison between the weighted and the unweighted gene co-expression networks, we used the module assignment of C) in D). Note that the colors of C) tend to stay together in D), which illustrates high module preservation.

A B

0.0

0.2

0.4

0.6

0.8

1.0

Fly Protein−Protein Network

Colored by module membership

0.3

0.4

0.5

0.6

0.7

0.8

0.9

Yeast Protein−Protein Network

Colored by module membership

C D

0.5

0.6

0.7

0.8

0.9

1.0

Yeast Co−expression Network: Soft Thresholding

Colored by module membership

0.0

0.2

0.4

0.6

0.8

1.0

Yeast Co−expression Network: Hard Thresholding

Colored by module membership of Soft Thresholding

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

ing coefficient and modular structure has been investi-gated by several authors [20,22-24].

The topological overlap between nodes i and j reflectstheir relative interconnectedness [20,25]. It is defined by

where lij = ∑u≠i,jaiuauj. In an unweighted network, lij equalsthe number of nodes to which both i and j are connected.In this case, TopOverlapij = 1 if the node with fewer connec-tions satisfies two conditions: (a) all of its neighbors arealso neighbors of the other node, and (b) it is connectedto the other node. In contrast, TopOverlapij = 0 if i and j areun-connected and the two nodes do not share any neigh-bors. By convention, TopOverlapii = 1. One can prove that0 ≤ aij ≤ 1 implies 0 ≤ TopOverlapij ≤ 1 [21].

The Topological Overlap Matrix Can Be Considered as Adjacency MatrixSince the matrix TopOverlap = [TopOverlapij] is symmetricand its entries lie in [0, 1], it satisfies our assumptions onan adjacency matrix. Roughly speaking, the topologicaloverlap matrix can be considered as a 'smoothed out' ver-sion of the adjacency matrix. The elements of TopOverlapprovide an alternative measure of connection strengthbased on shared neighbors. There is evidence that replac-ing A by TopOverlap may counter the adverse effects ofspurious or missing adjacencies [25,26]. Since the adja-cency matrices of the PPI networks in our applicationswere very sparse, we replaced them by the correspondingtopological overlap matrices. In contrast, we used theoriginal adjacency matrix when analyzing gene co-expres-sion networks since high specificity is desirable for meas-uring interconnectedness in co-expression networks.

The topological overlap matrix can be used for module definitionOur main interest lies in (sub-)networks comprised ofnodes that form a module inside a larger network. Since aparticular module network may encode a pathway or aprotein complex, these special types of networks havegreat practical importance. Similar to the term 'cluster', noconsensus on the meaning of the term 'module' seems toexist in the literature. In our applications, we use a cluster-ing procedure to identify modules (clusters) of nodes withhigh topological overlap. We follow the suggestion of [20]to turn the topological overlap matrix TopOverlap into adissimilarity measure by subtracting it from 1, i.e. dissTop-Overlapij = 1 - TopOverlapij.

We use dissTopOverlapij as input of average linkage hierar-chical clustering to arrive at a dendrogram (clustering

tree) [27]. Modules are defined as the branches of the den-drogram. For example, in Figure 1 we show the dendro-grams of our network applications. Genes or proteins ofproper modules are assigned a color (e.g. turquoise, blueetc). Genes outside any proper module are colored grey.Our module definition depends on how the branches arecut off the dendrogram. Several methods and criteria foridentifying branches in a dendrogram have been pro-posed, see e.g. [20,21,28]. In practice, it is advisable tostudy how robust the results are with respect to alternativemodule detection methods. In our online R software tuto-rial, we show that our findings are highly robust withrespect to alternative module definitions. In addition, weuse a functional enrichment analysis of the resulting mod-ules to provide indirect evidence that the modules are bio-logically meaningful. Our module detection approach hasled to biologically meaningful modules in several applica-tions [9,10,12,20,28-30] but we make no claim that it isoptimal. Our theoretical results will apply to all moduledetection methods that result in approximately factoriza-ble networks.

ResultsConformity and factorizable networksWe define an adjacency matrix A to be exactly factorizableif, and only if, there exists a vector CF with non-negativeelements such that

aij = CFiCFj for all i ≠ j (7)If the non-negative solution of equation (7) is unique, itis referred to as conformity vector CF and CFi is the con-formity of node i. One can easily show that the vector CFis not unique if the network contains only n = 2 nodes.However, for n > 2 it is unique for a weighted network, seeour derivations surrounding equation (20).

We also define the concept of conformity for a general,non-factorizable network. The idea is to find an exactlyfactorizable adjacency matrix ACF = CF CFτ - diag(CF2) + Ithat best approximates A. Note that the diagonal elementsof ACF and A equal 1.

In the appendix, we define the conformity as a maximizerof the factorizability function

. Alternative methods of

decomposing an adjacency matrix are briefly discussedbelow.

In equation (43), we define a measure of network factor-izability as follows

TopOverlapl a

k k aijij ij

i j ij=

+

{ } + −min ,,

1(6)

Fa v v

aA

ij i jj ii

ijj ii

( )( )

( )v = −

−≠

∑∑∑∑

1

2

2

Page 4 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

The factorizability F(A) is normalized to take on values inthe unit interval [0, 1]. The higher F(A), the better ACF - Iapproximates A - I.

Modules can be approximately factorizableApproximate factorizability is a very strong structuralassumption on an adjacency matrix. It certainly does nothold for general networks. However, we provide empiricalevidence that many clusters (modules) of genes or pro-teins in real networks are approximately factorizable.Table 1 reports the mean values of F(A) for the applica-tions considered in this paper. For example in the Dro-sophila PPI network, the mean factorizability F(A) is 0.82across 'proper' modules defined as clusters in the network.In contrast, the factorizability of the subnetwork com-prised of non-module nodes is only 0.17. In the yeast PPInetwork, the mean factorizability of proper modules is0.85 while it equals only 0.20 for the grey module. In theweighted yeast gene co-expression network, the mean fac-torizability of proper modules equals 0.73 while it is only0.18 for the improper module. Similarly in theunweighted yeast gene co-expression network, the meanfactorizability of proper modules equals 0.62 while it isonly 0.11 for the improper module. A more detailed tablepresenting network concepts in each module is also pro-vided [see Additional file 1].

Our empirical results support the following

Observation 1 For many modules defined with a clusteringprocedure, the subnetwork comprised of the module nodes isapproximately factorizable.

This observation motivates us to study network conceptsin approximately factorizable networks.

Conformity-based network conceptsWe refer to the standard network concepts known fromthe literature as fundamental network concepts. In general,fundamental network concepts are functions of the off-diagonal elements of the adjacency matrix A. More pre-cisely, we use network concept functions to define differenttypes of network concepts depending on the input matrix(see Table 2 and equation (21)). For example, wheninputting an adjacency matrix with its diagonal elementsreplaced by 0, one arrives at fundamental network con-cepts (see Definition 5 in the Methods section). Wheninputting the conformity-based (CF-based) adjacencymatrix ACF with its diagonal elements replaced by 0, onearrives at CF-based network concepts (see Definition 6 inthe Methods section). The conformity vector can be usedto define the approximate CF-based matrix

ACF,app = CF CFτ = [CFiCFj].

Note that the i-th diagonal element of ACF,app equals .

When ACF,app is used as input of a network concept func-

tion, one arrives at an approximate CF-based concept (seeDefinition 7 in the Methods section).

We will demonstrate that approximate CF-based conceptssatisfy simple relationships. Below, we show that thesesimple relationships carry over to fundamental networkconcepts in approximately factorizable networks.

In Definition 7, we provide a formula for calculatingapproximate CF-based analogs of the fundamental net-work concepts. Specifically, we find

F AA I A I

A I

CF F

F

( )( ) ( )

.= −− − −

−1

2

2

CFi2

Table 1: Summary of fundamental network concepts in real network applications.

Fly Protein Yeast Protein Yeast (Weighted) Yeast (Unweighted)

Concept Proper Grey Proper Grey Proper Grey Proper Grey

Factorizability .82 (.086) .170 .85 (.100) .200 .73 (.084) .180 .62 (.130) .110Density .21 (.074) .017 .28 (.120) .026 .08 (.056) .005 .40 (.150) .024Centralization .18 (.091) .052 .20 (.055) .036 .10 (.026) .021 .41 (.110) .140Heterogeneity .35 (.130) .460 .36 (.140) .430 .56 (.066) .580 .51 (.097) .830Mean Cluster Coef. .28 (.110) .050 .36 (.120) .093 .13 (.072) .032 .72 (.087) .370Mean Conformity .45 (.076) .130 .51 (.120) .150 .26 (.084) .062 .63 (.100) .120

Each network contained several proper modules. Non-module genes were grouped into a single (improper) grey module. For each concept, we report the mean and standard error across the proper modules. A more detailed table presenting network concepts in each module is also provided [see Additional file 1].

Page 5 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

where Sp(CF) = ∑i(CFi)p. Note that the approximate CF-based clustering coefficient does not depend on the i-index. This is why we sometimes omit this index and sim-ply write ClusterCoefCF,app.

Approximate CF-based network concepts satisfy simple relationshipsHere we demonstrate a major advantage of approximateCF-based network concepts: they exhibit simple relation-ships. Using the fact that S1(kCF,app) = S1(CF)2, and theapproximation n/(n - 1) ≈ 1, equations (8) imply the fol-lowing relationship

or equivalently,

Further, it is straightforward to derive a simple relation-ship between approximate CF-based topological overlap,connectivity and heterogeneity under the following mild

assumptions: and .

Specifically, we find

In the following subsection, we outline the conditionswhen equations (9) and (10) hold approximately for fun-damental network concepts in approximately factorizablemodule networks.

Relating fundamental- to approximate CF-based conceptsIn the Methods section, we provide a heuristic argumentfor the following

Observation 2 In approximately factorizable networks, funda-mental network concepts are approximately equal to theirapproximate CF-based analogs,

FundamentalNetworkConcept ≈ NetworkConceptCF,app.

The observation implies that in approximately factoriza-ble networks, Connectivity ≈ ConnectivityCF,app and Density ≈DensityCF,app, etc. Observation 2 is illustrated for networkdensity, centralization, heterogeneity, and clustering coef-ficients in Figure 2 (Drosophila PPI network), Figure 3(yeast PPI network), and Figure 4 (weighted andunweighted yeast gene co-expression networks; density isnot included due to limited space). A consequence of thisobservation is that the simple relationships satisfied byapproximate CF-based network concepts also apply totheir corresponding fundamental network concepts inapproximately factorizable networks. In particular, equa-tions (9) and (10) imply the following

Observation 3 In approximately factorizable networks,the following relationships hold among fundamental net-work concepts

mean(ClusterCoef) ≈ (1 + Heterogeneity2)2 × Density,(11)

and

k CF S

DensityS

n n

S

n

CF app i i

CF app

, ,

,

( ),

( )

( )

( )

=

=−

≈ ⎛⎝⎜

1

12

1

1

CF

CF CF ⎞⎞⎠⎟

=− −

2

1 1

1 2

,

( )

( )( )max( )

( ),Centralization

nS

n n

SCF app

CFCF

CF

nn

S

n

S

n

HeterogeneityCF app

⎛⎝⎜

⎞⎠⎟

≈ −⎛⎝⎜

⎞⎠⎟

=

1 1( )max( )

( ),

,

CFCF

CF

nnS

S

ClusterCoefS

SCF app i

2

12

2

1

1( )

( ( )),

( )

( ), ,

CF

CF

CF

CF

=⎛

⎝⎜

⎠⎟22

2

1

1

,

( ( ) )

min( , ) ( ), ,TopOverlapCF CF S

CF CF SCF app iji j

i j=

++

CF

CF 11 − CF CFi j,

(8)

HeterogeneityClusterCoef

DensityCF appCF app

CF app,

,

,,≈ −1

ClusterCoef Heterogeneity DensityCF app i CF app CF ap, , , ,( )≈ + ×1 2 2pp .

(9)

10

2S ( )CF≈

10

1

−≈

CF CF

CF CF Si j

i jmin( , ) ( )CF

TopOverlap CF CFS

S

CF SCF app ij i j

i, , max( , )

( )

( )

max( ( )≈ =2

1

1CF

CF

CF ,, ( )) ( )

( )

max( , )(

, , , ,

CF S

n

nS

S

k k

nHe

j

CF app i CF app j

1 2

12

1

CF CF

CF

≈ + tterogeneityCF app, ).2

(10)

TopOverlapk k

nHeterogeneityij

i j≈ +max( , )

( ).1 2

(12)

Table 2: Brief overview of different types of network concepts.

Input Matrix Type of Concept

Example: Connectivity

A - I fundamental Connectivityi(A - I) = ∑j≠iaij

ACF - I = CF CFτ - diag(CF2) CF-based Connectivityi(ACF - I) = CFi∑j≠iCFj

ACF,app = CF CFτ approximate CF-based

Connectivityi(ACF,app) = CFi∑jCFj

A network concept arises by evaluating a network concept function on a special type of input matrix. We assume that the diagonal elements of the matrix A - I are 0.

Page 6 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

Observation 3 is important since it highlights the fact thatseemingly disparate network concepts satisfy simple andintuitive relationships in approximately factorizable net-works. Equations (11) and (12) are illustrated in Figure 5(Drosophila PPI network), Figure 6 (yeast PPI network),and Figure 7 (weighted and unweighted yeast gene co-expression networks; TOM plots are not included due tolimited space). Equation (12) has several important con-sequences. To begin with, it illustrates that the topologicaloverlap between the most highly connected node and allother nodes is approximately constant. Specifically, if wedenote the index of the most highly connected node by[1] and its connectivity by k[1] = max(k), then

As an aside, we briefly mention that TopOverlap[1]j has asimple interpretation in terms of the hierarchical cluster-ing dendrogram that results from using dissTopOverlapij =1 -TopOverlapij as input. In this case, TopOverlap[1]j isrelated to the longest branch length in the dendrogram.

In the following, we relate TopOverlap[1]j to the funda-

mental network concept Centralization. According to

equation (3), ≈ Centralization + Density. Substi-

tuting this expression in equation (13) implies

TopOverlap[1]j ≈ (Centralization + Density)(1 + Heterogeneity2) (14)

Equation (14) is illustrated in Figure 5 (Drosophila PPInetwork), Figure 6 (yeast PPI network), and Figure 7(weighted and unweighted yeast gene co-expression net-works).

In factorizable networks, fundamental network concepts are simple functions of the connectivityHere we demonstrate another advantage of approximateCF-based network concepts. They allow one to relatefundamental network concepts to simple functions ofthe connectivity. Toward this end, note the followingsimple relationship between the conformity CF and theapproximate CF-based connectivity kCF,app:

TopOverlapk

nHeterogeneityj[ ]

[ ]( ).1

1 21≈ + (13)

max( )k

n

Yeast PPI module networks: the relationship between funda-mental network concepts NetworkConcept(A - I) (y-axis) and their approximate CF-based analogs NetworkConceptCF,app (x-axis)Figure 3Yeast PPI module networks: the relationship between fundamental network concepts Network-Concept(A - I) (y-axis) and their approximate CF-based analogs NetworkConceptCF,app (x-axis). This figure demonstrates Observation 2. A) Density versus DensityCF,app; B) Centralization versus CentralizationCF,app; C) Heterogeneity versus HeterogeneityCF,app; D) Intramodular clustering coeffi-cients ClusterCoefi versus ClusterCoefCF,app. In Figures A), B) and C), each dot corresponds to a module since these net-work concepts summarize an entire network module. In Fig-ure D), each dot corresponds to a node since these network concepts are node specific. A reference line with intercept 0 and slope 1 has been added to each plot.

A B

0.0 0.1 0.2 0.3 0.4 0.5 0.6

0.1

0.2

0.3

0.4

0.5

0.6

R^2=0.99

Approximate CF−Based Density

Den

sity

0.05 0.10 0.15 0.20 0.25 0.30 0.35

0.05

0.10

0.15

0.20

0.25

0.30

0.35

R^2=0.71

Approximate CF−Based Centralization

Cen

tral

izat

ion

C D

0.2 0.4 0.6 0.8 1.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

R^2=0.88

Approximate CF−Based Heterogeneity

Het

erog

enei

ty

− −

−−

− −−−

− −−

−−

−−

−−− −

−−

−−

−−

−−

−−

−−

− −

0.1 0.2 0.3 0.4 0.5 0.6

0.0

0.2

0.4

0.6

0.8

Approximate CF−Based Clustering Coefficients

Clu

ster

ing

Coe

ffici

ents

oo

ooooo

ooooo

oo

oooo

o

oo

o

oooooo

o

o

o

ooo

o

o

o

ooo

ooo

o

oo

o

oo

o

oo

o

oooooo

o

ooo

o

oo

o

oo

o

oo

o

o

ooooooooo

oo

o

ooooooooo

oooooooo

o

o

ooo

o

o

o

oooo

o

o

o

ooo

ooooooooooooo

o

o

oo

o

ooo

oo

o

oooo

oo

oo

o

o

o

ooo

oo

ooo

ooo

ooooo

ooo

ooo

oooooooooo

oo

o

oooooo

o

ooooooooooo

oo

oo

o

ooo

oo ooo

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo

o

o

oo

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

oo

o

oo

o

o

oo

o

oo

oo

oo

o

o

oo

oo

oo

oooo

o

oo

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

oo

o

o

oo

o

o

oo

o

o

o

o

oo

o

o

o

o

o

oooo

o

o

o

ooo

oo

ooooo

o

oo

o

o

o

o

o

ooooo

o

oo

oo

oo

o

o

oo

o

o

o

ooo

o

ooo

o

o

o

o

oo

o

o

o

ooo

o

o

o

ooo

o

o

oo

o

o

o

o oo

ooooo

oooooooo

oo

oo

o

o

ooo

o

ooo

oo

o

ooo

o

o

ooo

o

o

o

o

o

oooooo

oo

o

oo

oo

o

o

oo

oo

oo

o

o

o

o

o

oo

o

oooo

oo

oo

oo

oooooooooooo

oooo

oooooooooooooooooo

o

o

o

ooooooooo

ooo

o

o

o

o

oo

o

o

o

o

o

oo

o

ooo

o

o

o

ooooooo

o

o

oo

o

o

o

o

oooo

o

o

o

o

o

o

o

o

o

oooo

o

o

o

oo

o

ooo

o

o

ooo

o

ooo

o

oo

oo

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

ooo

o

o

o

o

o

o

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

oo

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

ooooo

o

o

o

o

o

oo

oooo

o

o

o

ooo

o

o

o

o

o

o

o

o

o

oo

o

o

oo

o

o

oo

oo

ooooo

o

o

ooo

oo

ooo

ooo

o

oo

o

oo

o

o

o

o

oo

ooo

ooooo

oooooooo

oo

o

oo

oo

o

oo

oo

o

o

oooo

o

o

ooooo

o

ooo

o

oooo

o

o

o

o

o

o

o

ooooooooooooooo

oo

o

o

ooo

o

ooo

o

oooooo

ooooo

oo

o

ooo

ooooo

o

oooo

o

oo

o

ooo

o

o

o

ooooo

o

oooooooo

ooo

o

o

oo

o

oo

o

oo

ooo

oooooooooo

oooo

oo

ooooo

oo

o

o

o

o

o

o

o

oo

o

o

oo

o

o

ooo

o

oo

oo

o

o

o

oo

o

o

o

o

o

o

o

oo

o

oo

oo

o

o

o

o

o

o

o

o

o

o

ooooooooo

oo

o

oo

o

ooo

o

oo

o

oo

o

oo

o

oo

o

o

o

o

o

o

o

ooo

o

oo

o

o

o

oo

o

o

o

o

o

ooo

o

oooo

o

ooo

ooooooo

o

o

oo

oooooooo

ooooo

o

o

oooooo

ooo

oo

o

oooo

o

oo

o

o

o

o

o

o

ooooo

o

ooo

ooooo

o

o

o

o

ooo

o

o

oo

oo

oo

ooo

oo

oo

oo

o

oo

o

o

o

o

ooo

o

o

oo

o

o

oo

o

o

oo

o

ooooo

o

o

oo

o

ooo

oo

oo

o

o

ooo

o

o

o

oo

oo

o

oooooo

oo

o

oo

o

oooo

oo

o

ooo

oo

oo

o

o

oooo

o

o

oo

oo

oo

o

oo

o

oo

o

o

oo

o

o

o

oo

o

oo

o

o

ooo

o

o

oo

o

oooo

o

o

o

oo

ooo

o

o

oo

o

o

o

ooooooooooooooooooooooooooooooooooooooooooooooooooo

oooooooooooooooooooooooooooooooooooooooooooo

o

o

ooooo

ooo

ooooooooooooooo

o

o

o

ooooooooo

ooo

o

oo

oooo

oo

o

o

ooo

o

o

o

o

oooooooooooo

oooo

o

o

o

o

o

o

o

oo

o

o

oooo

o

o

o

o

o

oo

oo

o

o

o

o

o

o

o

oo

o

oo

o

ooo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

ooo

oooo

ooooo

o

o

oooo

o

o

oo

o

o

ooo

o

oo

o

oo

o

oooooooooooooooooooo

oooooo

o

oooooooooo

ooooo

o

oo

oo

o

oo

o

o

oooooo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

oo

oo

oooo

o

o

o

oo

oo

o

o

o

o

oo

o

oo

o

o

ooo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

ooooo

ooo

ooo

o

o

o

ooo

oo

ooooooooooooo

o

ooo

o

oooooooooo

ooooooooo

o

oo

ooo

o

ooo

oo

o

o

o

ooo

o

oo

o

oo

o

o

o

oo

oooooooooooo

o

o

o

o

o

oo

o

o

o

oooooo

o

o

ooo

oo

ooooo o

o

o

o

o

o

o

o

o

o

o

o

oo

oooo

o

o

o

ooo

o

o

o

o

o

oooo

o

oo

o

o

oo

oo

ooooooooo

o

ooo

o

ooo

o

o

o

o

o

o

o

oo

o

oo

oo

o

o

o

oo

o

ooooo

o

oooo

oo

ooo

o

o

o

o

o

oo

o

o

ooo

o

o

o

o

o

ooooooooooooooo

oo

o

oooo

o

o

oo

o

o

oo

o

ooo

o

ooooo

o

o

o

o

ooo

o

oo

oooo

oo

o

ooo

o

oo

oo

o

o

ooo

oooo

oooooooo

o

ooo

ooo

oo

ooo

ooo

oooooooo

ooooooooo

oo o

oo

ooo

oo

oo

o

o

oo

oo

o

ooo

o

o

o

o

o

o

ooo

oo

o

o

oooooo

o

o

o

oooooo

ooo

ooooooo

o

o

o

ooo

oooooo

ooooooo

o

ooo

o

oo

o

o

o

o

oo

oo

o

oo

o

ooo

ooo

o

ooooo

o

ooooooooooooo

oooo

o

oo

o

ooooo

o

oooo

oo

o

o

oo

oo

o

oo

oo

ooooooooo

o

ooooo

o

oo

o

o

o

o

oooo

o

oooooo

o

o

o

oooo

o

oo

o

oooooooo

o

o

o

o

o

oooo

o

oooooo

ooooo

oooooo

o

o

o

o

oo

o

ooo

o

oo

o

o

o

o

o

oo

R^2=0.7

Drosophila PPI module networks: the relationship between fundamental network concepts NetworkConcept(A - I) (y-axis) and their approximate CF-based analogs NetworkConceptCF,app (x-axis)Figure 2Drosophila PPI module networks: the relationship between fundamental network concepts Network-Concept(A - I) (y-axis) and their approximate CF-based analogs NetworkConceptCF,app (x-axis). This figure demonstrates Observation 2. A) Density versus DensityCF,app; B) Centralization versus CentralizationCF,app; C) Heterogeneity versus HeterogeneityCF,app; D) Intramodular clustering coeffi-cients ClusterCoefi versus ClusterCoefCF,app. In Figures A), B) and C), each dot corresponds to a module since these net-work concepts summarize an entire network module. In Fig-ure D), each dot corresponds to a node since these network concepts are node specific. A reference line with intercept 0 and slope 1 has been added to each plot.

A B

0.0 0.1 0.2 0.3 0.4

0.0

0.1

0.2

0.3

0.4

R^2=0.99

Approximate CF−Based Density

Den

sity

0.1 0.2 0.3 0.4 0.5

0.1

0.2

0.3

0.4

R^2=0.9

Approximate CF−Based Centralization

Cen

tral

izat

ion

C D

0.2 0.3 0.4 0.5 0.6 0.7 0.8

0.2

0.3

0.4

0.5

0.6

0.7

R^2=0.87

Approximate CF−Based Heterogeneity

Het

erog

enei

ty

−− −−

−−−

−−

−−

−− −−

−−− −−

−−−

−−

0.1 0.2 0.3 0.4 0.5 0.6

0.0

0.2

0.4

0.6

0.8

1.0

Approximate CF−Based Clustering Coefficients

Clu

ster

ing

Coe

ffici

ents

oooooooooo

ooo

ooo

ooo

o

ooooo

oo

o

o

oooo

oo

ooooo

ooooooo

o

ooo

o

o

ooooo

oo

ooo

oo

ooo

o

ooo

o

oo

oooo

oo

o

ooooooo

o

o

ooooo

o

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

ooooo

o

oooo

oo

oo

o

o

o

o

o

o

o

ooo

oo

o

o

oo

o

ooo

o

o

o

o

o

oo

o

ooo

o

o

o

o

o

oo

o

o

o

o

o

oo

o

oo

o

ooo

o

oooo

o

o

o

ooo

oooooooo

oo

o

oooo

oooooo

oo

o

ooo

oooooooooo

o

ooo

o

ooo

ooo

o

o

ooooooo

oooooooooooooooooooooooooo

o

oooooooooooooooooooooooooo

oooo

ooo

ooo

oo

o

o

oo

o

oooo

oo

o

oooo

o

oooo

o

ooo

o

oo

o

ooo

o

oooooooo

o

oooo

oo

o

o

o

o

o

o

o

oo

o

oo

ooooooo

oo

oo

ooooooo

o

oo

oooo

oooo

o

o

o

o

o

o

o

o

oo

oo

o

oo

o

ooo

oo

o

ooo

oo

o

oo

o

ooooo

o

o

o

o

ooo

oo

oo

ooooo

o

oo

o

ooooooooooooooooo

oooooo

o

o

oooooooooooooo

oooooooooooooooooooo

o

o

ooooooooooo

o

o

ooooooo

o

oooooo

ooo

oo

o

oooooooooooo

oooooooo

ooo

ooooo

o

o

o

o

ooo

ooo

oo

oo

oooo

o

oo

o

oo

o

ooooooooooooooooooooooooooooooo

ooo

o

ooooooooooo

o

oooooooo

ooooooo

o

o

oo

oooooooooo

o

oooooooooo

ooooooooooo

o

o

o

ooooo

oo

oo

o

o

o

o

o

oooooo

o

o

oooooooo

oo

ooo

oooo

o

o

oooo

oooooooooo

o

o

oo

o

ooo

o

ooooo

o

oooooooo

ooooooooo

oooo

oo

ooooo

o

oooooooo

o

ooo

o

o

ooo

oo

o

o

oo

o

o

o

ooooooooo

ooooooooooooo

o

oo

o

ooooo

o

ooooooooooooooooooooooooooooooooooooo

o

oooooooooooooooooo

ooooooooo

oo

o

o

oooooooo

oooooo

o

o

o

oooo

oooooooooooooooooo

oooooooooooooooooooo

o

oooooooooooooo

o

oooo

o

oooooooooooo

o

oooooooooooooooooooooooooooo

o

oooooooooooooooooooooooooooooooooooooooooooooooo

o

ooooo

ooooo

oo

o

oooooooooooooooo

o

oo

o

oooooo

o

ooooooooooooooo

o

o

o

oo

o

oooooooooooooooooooooooooooooooooooooooooooooooooooo

o

ooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo

o

oooooo

o

oooooo

oooooooooooooooooooooooooooooooooo

o

oooooooooooooooooooooooooooooooooooo

o

oo

o

ooo

R^2=0.87

Page 7 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

Since in approximately factorizable networks kCF,i ≈ ki, wefind that the conformity CF is approximately given by thescaled connectivity, i.e.

This equation shows that conformity can be interpreted asa scaled connectivity in approximately factorizable net-works. Since approximate CF-based network concepts are

simple functions of the conformity, substituting

for CF implies that approximate CF-based concepts can beapproximated by simple functions of the connectivity. Forexample, we find the following simple expressions for thecluster coefficient and the topological overlap.

Observation 4CFk

SiCF app i

CF app= , ,

,( ).

1 k(15)

CFk

k≈

S1( ). (16)

k

kS1( )

ClusterCoefS

S

TopOverlapk k S S

i

iji j

≈+

( ( ))

( ( )),

( ( ) (

22

13

2 1

k

k

k k)))

min( , ) ( ) ( )

( , ) ( )

( ),

k k S S k k

k k

n

S

Si j i j

i j

1 1

2

1k kkk+ −

≈max

Drosophila PPI module networks: the relationship between fundamental network conceptsFigure 5Drosophila PPI module networks: the relationship between fundamental network concepts. This figure demonstrates Observation 3 and equation (14). In Figures A) and B), each point is a protein colored by its module assign-ment, and the red line has intercept 0 and slope 1. Figure A) illustrates the relationship between the mean clustering coef-ficient (short horizonal line) and (1 + Heterogeneity2)2 * Den-sity (equation (11)). Figure B) illustrates the relationship between the topological overlap with the hub node and (Den-sity + Centralization) * (1 + Heterogeneity2) (equation (14)). Figure C) is a color-coded depiction of the topological over-lap matrix TopOverlapij in the turquoise module network. Fig-ure D) represents the corresponding approximation max(ki,kj)(1 + Heterogeneity2)/n (equation (12)). Figures E) and F) are their analogs for the brown module. The turquoise and the brown module represent the largest and third largest module. Analogous plots for the other modules can be found in our online supplement.

A B

−− −−

−−−

−−

−−

−− −−

−−

− −−

−−

−−−

−0.1 0.2 0.3 0.4 0.5

0.0

0.2

0.4

0.6

0.8

1.0

(1+Heterogeneity^2)^2*Density

Clu

ster

ing

Coe

ffici

ent

oooooooo

oo

o

oo

oo

o

ooo

o

ooooo

oo

o

o

oo

oo

oo

o

oooo

oo

o

oo

o

o

o

ooo

o

o

oo

oo

o

oo

ooo

oo

ooo

o

ooo

o

oo

ooo

o

oo

o

ooooooo

o

o

ooo

oo

o

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

ooooo

o

oooo

oo

oo

o

o

o

o

o

o

o

ooo

oo

o

o

oo

o

ooo

o

o

o

o

o

oo

o

ooo

o

o

o

o

o

oo

o

o

o

o

o

oo

o

oo

o

ooo

o

oooo

o

o

o

ooo

oooooooo

oo

o

o

ooo

o

ooooo

oo

o

ooo

o

oooooo

ooo

o

ooo

o

o

oo

ooo

o

o

oooooo

o

oooooooooooooooooooooooooo

o

oooo

ooooooooooooooooooooo

o

oooo

ooo

o

oo

oo

o

o

oo

o

oooo

oo

o

oooo

o

ooo

o

o

ooo

o

o

o

o

ooo

o

oooooo

oo

o

oooo

oo

o

o

o

o

o

o

o

oo

o

oo

ooo

oo

oo

oo

oo

ooooooo

o

oo

oooo

oooo

o

o

o

o

o

o

o

o

oo

oo

o

oo

o

oo

o

oo

o

ooo

oo

o

oo

o

ooooo

o

o

o

o

ooo

oo

oo

ooooo

o

oo

o

ooooooooooooooooo

oooo

oo

o

o

ooooo

ooooooooo

oooooooooooooooooooo

o

o

ooooooooooo

o

o

ooo

ooo

o

o

oooooo

ooo

oo

o

ooo

o

ooooooo

o

oooooooo

o

oo

oo

ooo

o

o

o

o

ooo

ooo

oo

oo

oooo

o

o

o

o

oo

o

ooo

ooooooooooooooooooooo

ooooooo

ooo

o

ooo

oooooooo

o

oooooooo

ooooooo

o

o

o

o

o

oooo

o

oooo

o

oooooooooo

ooooooooooo

o

o

o

ooooo

o

o

oo

o

o

o

o

o

o

ooooo

o

o

oooooooo

oo

oo

o

ooo

o

o

o

oooo

oooooooooo

o

o

oo

o

ooo

o

ooo

oo

o

ooooooo

o

ooooooooo

oooo

oo

ooooo

o

ooooooo

o

o

ooo

o

o

oo

o

oo

o

o

oo

o

o

o

ooooooo

oo

o

ooo

ooooooooo

o

oo

o

ooooo

o

oooooooooooooooooooooooooooooooooo

ooo

o

ooooooooooooo

o

oooo

o

oooooooo

oo

o

o

ooooooo

o

oooooo

o

o

o

oooo

ooooooooooooooooo

o

oooooooooooooooooooo

o

oooooooooooooo

o

oooo

o

oooooooooooo

o

ooooooooooooooooooooooooooo

o

o

ooooooooooooooooooooo

o

oooooooooooooooooooooooooo

o

oooo

o

ooooo

oo

o

o

ooooooooooooooo

o

oo

o

oooooo

o

ooooooooooooooo

o

o

o

oo

o

oooooooo

oo

ooo

oooooooooooooooo

o

oooooooooooooooooooooo

o

oooooooooooooooooooooooooooooooooooooooooooooo

ooooooooooooooooo

o

oooooo

o

oooooo

oooooooooooooooooooooooooooooooooo

o

oooooooooooooooooooooooooooooooooooo

o

oo

o

ooo

R^2=0.87

o

o

o

o

o

o

o

oo

o

o

ooo

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

oo

ooo

oo

ooo

ooo

oo

o

o

o

oooo

o

oo

oo

o

o

o

ooo

ooooo

o

o

oo

o

ooo

o

o

o

o

o

oooo

o

o

o

oo

o o

o

o

o

o

o

o

o

oo

o

o

o

o

o

oo

o

o

o

o

o

o

ooo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

oo

o

o

oo

oo

oooo

ooo

o

oo

o

o

oo

o

o

oo

o

o

o

o

o

o

o

o

oo

ooo

o

oooo

oooo

oo

ooo

ooo

ooo

o

oooo

o

oo

oo

ooooo

o

o

o

o

o

oooooooo

o

oooooooooooo

o

oooo

ooo

o

o

o

o

oo

o

o

o

oo

o

o

o

o

oo

o

o

oooooo

o

o

oo

o

oooo

o

o

o

o

oooo

oo

ooo

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

ooooooooo

o

oooo

oo

o

o

oo

ooooo

o

oo

o

o

o

o

o

ooo

o

oo

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

ooo

o

oo

oo

o

o

oo

ooo

oo

oooooooooooooooooooooooooooooooooooo

o

o

o

ooooo

ooo

o

oo

oo

oooo

o

oo

o

o

o

oooooo

oo

o

ooooo

ooo

ooo

o

oo

oo

o

oo

oooo

oooo

o

oo

oo

ooo

oooo

o

o

oo

ooo

o

o

o

oo

o

ooooooo

ooo

o

o

oo

o

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

oo

ooo

oo

o

ooooooo

o

oooooo

o

oo

o

ooooooo

ooo

o

o

o

o

ooo

o

oo

o

o

o

o

o

oo

o

ooo

o

o

o

oo

o

o

o

o

oo

o

oooo

o

oooo

o

o

oo

ooooo

oo

oo

o

oooo

o

o

o

o

o

o

o

o

o

ooo

oo

oo

o

o

o

o

o

oooooo

o

ooooooooo

oo

o

o

o

o

oo

o

o

ooo

o

ooo

oo

o

oo

ooooo

oo

o

ooo

o

o

o

o

oo

o

o

o

o

ooo

ooo

o

oooooo

o

o

o

o

o

o

o

o

o

oooo

o

o

o

o

oo

oooooo

o

o

o

o

oo

o

ooo

o

o

o

ooooooooooo

oooooooo

o

ooo

o

oo

oo

o

oo

o

o

oo

o

ooo

oo

oo

oooooo

oooo

ooo

ooo

oo

oooo

o

oo

o

o

o

o

oo

o

o

o

o

ooooo

o

oo

oooooo

o

oo

o

oo

o

oooo

o

oooo

oooooo

o

o

oooo

o

ooooo

ooo

o

ooooo

o

o

oo

o

o

ooo

ooo

o

o

o

o

o

ooo

o

oo

o

o

oo

oo

oo

o

o

o

oooo

o

oo

o

ooooooooo

oooo

oo

oo

o

o

oo

oooo

o

o

o

o

o

ooo

ooo

ooooooo

o

o

o

o

ooooooo

oo

o

o

o

o

oooooo

o

ooo

o

ooo

ooooooo

o

o

o

o

oo

o

oooo

o

oooo

o

oo

o

ooooooo

o

o

o

ooooo

ooo

oo

o

oo

o

o

oo

o

o

o

oo

oo

o

oo

o

oo

o

o

oooo

o

oo

o

o

o

o

oo

o

oo

oo

ooooo

o

o

ooooo

o

ooooooo

ooooo

o

o

o

ooo

ooo

oo

oooo

ooo

o

oooo

o

o

o

o

o

oo

oo

o

oooo

o

oo

oo

o

ooo

o

oooo

o

oo

oo

o

o

oo

o

o

o

o

oo

o

oo

o

o

oooooo

o

oooooo

ooo

o

oooooo

o

o

o

ooo

ooo

o

o

oo

oo

o

ooo

o

o

ooooo

o

o

oo

ooo

oo

ooo

oo

o

o

o

o

ooo

ooo

o

ooo

o

ooooo

o

o

oo

oo

oo

o

oo

o

o

oooo

ooo

o

oo

o

o

0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

(Centralization+Density)*(1+Heterogeneity^2)

TO

M w

ith th

e H

ub N

ode

R^2=0.91

C D E F

Yeast gene co-expression module networks: the relationship between fundamental network concepts NetworkConcept(A - I) (y-axis) and their approximate CF-based analogs Network-ConceptCF,app (x-axis)Figure 4Yeast gene co-expression module networks: the rela-tionship between fundamental network concepts NetworkConcept(A - I) (y-axis) and their approximate CF-based analogs NetworkConceptCF,app (x-axis). This figure demonstrates Observation 2. A reference line with intercept 0 and slope 1 has been added to each plot. The fig-ures on the left (right) hand side depict network concepts from the weighted (unweighted) network. A) and B) Central-ization versus CentralizationCF,app; C) and D) Heterogeneity versus HeterogeneityCF,app; E) and F) Intramodular clustering coefficients ClusterCoefi versus ClusterCoefCF,app. The analogous plots for Density are not presented since the fundamental network concepts and their approximate CF-based analogs are almost identical and the dots fall near the reference line with R2 = 1 for both weighted and unweighted networks, and thus are omitted due to limited space. In Figures A), B), C) and D), each dot corresponds to a module since these net-work concepts summarize an entire network module. In Fig-ure E) and F), each dot corresponds to a node since these network concepts are node specific.

A B

0.04 0.06 0.08 0.10 0.12 0.14 0.16

0.02

0.04

0.06

0.08

0.10

0.12

0.14

p=7, R^2=0.99

Approximate CF−Based Centralization

Cen

tral

izat

ion

0.15 0.20 0.25 0.30 0.35 0.40

0.2

0.3

0.4

0.5

tau=0.65, R^2=0.97

Approximate CF−Based Centralization

Cen

tral

izat

ion

C D

0.5 0.6 0.7 0.8 0.9 1.0

0.50

0.55

0.60

0.65

p=7

Approximate CF−Based Heterogeneity

Het

erog

enei

ty

0.4 0.6 0.8 1.0 1.2 1.4 1.6

0.3

0.4

0.5

0.6

0.7

0.8

tau=0.65

Approximate CF−Based Heterogeneity

Het

erog

enei

ty

E F

−−−−

−−−

0.05 0.10 0.15 0.20 0.25 0.30

0.00

0.10

0.20

0.30

Approximate CF−Based Clustering Coefficients

Clu

ster

ing

Coe

ffici

ents

o

o

o

o

o

ooooooo

oo

o

o

o

o

o

o

oo

o

oo

o

o

oo

o

ooo

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

oo

o

o

o

oo

ooo

o

o

o

o

o

ooo

oo

o

o

oo

ooo

o

oo

o

oooo

o

oo

oooooo

ooooooo

oo

o

oo

o

oo

o

o

oooooooooo

o

ooooooooooo

oo

o

o

oo

oo

oo

oo

oo

o

o

oooo

o

oo

o

o

ooooo

oo

o

oo

o

ooo

oo

oo

ooooo

o

ooooo

o

o

oo

o

ooooo

oo

o

ooo

o

o

o

ooo

oo

ooooo

o

oo

ooo

o

ooooooooo

ooo

o

o

o

oooo

o

o

o

o

o

oo

o

o

oo

o

oo

o

ooo

o

o

ooo

o

o

o

oooo

oooo

o

ooo

oooooooo

oooooooo

o

ooooo

oooooooo

o

oooooo

oo

o

ooooo

o

o

o

o

oo

ooooo

o

oooooooo

o

o

oooo

o

oooooo

o

ooo

oo

o

oooooooo

oooooooo

o

ooooo

o

ooooooo

ooooooo

o

oooo

oo

oo

oo

o

oo

ooo

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

ooo

oo

ooooo

o

oooo

o

o

o

o

o

ooooo

ooo

o

ooo

ooo

oo

o

o

oooo

ooo

o

o

oo

oo

ooooo

o

o

oooooooo

oo

o

oo

o

ooooo

o

o

oooo

oo

ooooooooooooo

o

oooooooooooo

o

oooooo

o

oooooo

o

o

o

oooooo

o

ooo

o

ooooooo

o

oooo

o

oo

o

oooooooooooooooooooo

ooooooo

ooooooooooooooooooooooooooooo

o

o

o

ooooo

o

ooooooooo

o

oooooooo

ooo

o

o

o

ooooooo

o

ooo

o

ooooooooooo

o

ooooooooooo

o

oooo

o

ooooo

ooooooooooooooooooooooooo

o

ooo

o

oooooooo

ooooo

o

o

oo

oooooooooo

oooooo

o

ooooo

o

oooooo

o

o

oooooooooooo

ooo

o

o

ooooooooooooo

o

oooooooo

oooooooooooooooooooooooooooooo

oooooooo

o

ooooooooooo

oo

oooo

ooooo

ooo

ooooo

o

oo

o

oooooooooooooooooooo

ooooooo

o

oooooooooo

o

oo

o

o

o

ooo

o

o

ooooooooooooooooooooooo

o

ooooooo

o

ooo

ooooooo

ooooooooooooooo

oo

ooo

o

o

oo

o

ooo

o

oooo

o

o

o

o

ooo

oooooo

ooo

o

ooo

o

ooooooooooooo

o

ooooooooo

o

o

oooooooo

o

ooooooooo

o

oooo

ooooooooooooooooooo

o

oo

oooooo

o

oo

ooooo

ooooooooooooooooooooo

o

ooooooo

oooooooooooooooo

o

oooo

o

ooooooooooooooooo

ooo

o

oooo

o

oooooooooooooo

o

oo

oooooooo

o

ooo

o

oo

o

ooooooooooooooooo

o

ooooooooooo

o

ooooo

oo

ooooooooo

ooo

ooooooooooooooo

o

oo

ooooooooooo

oooooooooooooo

o

ooooooooo

o

o

o

o

ooo

o

ooooooooooooooooooooooo

o

ooooooo

o

ooooooo

o

oooooooooooo

o

o

ooo

ooooo

oo

ooooo

oooooooooo

o

ooooooo

o

oooooo

o

o

o

oo

o

o

ooo

o

oo

o

o

o

o

o

o

o

oo

o

oo

o

o

o

ooo

o

o

oooooooo

ooo

ooo

o

oo

o

o

o

oooo

ooo

ooooo

o

oo

o

oooo

o

ooooooo

oo

oooo

o

o

oo

o

ooooo

o

o

oooo

oo

oo

oo

oooooo

o

oo

o

ooo

o

ooooo

o

oo

oooooooo

ooo

oooooooooooooo

oooooooooooo

o

oooooo

o

ooo

ooooooooooooooooo

oo

oooooooooo

ooooo

oooo

o

o

ooooo

ooo

o

o

o

ooooooooooooo

o

oooooooooooo

oooooooooo

ooooooooooooooooo

oooooooo

oo

o

oooooooo

oooo

ooooooooooooooooo

oooooooo

o

ooooooooooooooooooo

oo

o

o

o

oooo

ooooooooooooooooo

ooooo

oooo

oo

ooooo

o

o

oooooooooo

o

ooo

ooo

ooooo

oooooooo

o

ooo

o

oo

ooooooooooooooooooooo

o

oo

o

ooooo

oo

o

o

o

oooooo

oooo

ooo

o

ooooooooooooooooooooooo

o

oo

oooo

oooooooooo

oooooooooo

o

ooooo

p=7, R^2=0.82

−− −

−−−

0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

0.8

1.0

Approximate CF−Based Clustering Coefficients

Clu

ster

ing

Coe

ffici

ents

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

oo

o

oo

o

o

o

o

o

ooo

oo

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

ooo

o

o

oo

o

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

oo

ooo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

ooo

o

o

o

oo

oo

o

ooo

o

ooo

o

o

ooo

o

ooo

ooo

oo

oo

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

oo

o

oo

o

o

ooo

ooo

ooo

o

o

oo

o

o

o

oo

o

oo

o

o

o

o

o

o

ooo

o

o

oo

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

ooo

o

o

o

o

o

o

o

o

o

oooo

o

o

o

oo

o

o

ooo

o

oo

ooo

o

o

o

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo

oo

o

oo

o

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

oo

o

o

oo

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

ooo

o

o

oo

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

oo

oo

o

o

o

o

oo

o

o

o

oo

o

o

oooo

o

o

o

o

o

o

o

ooo

o

oo

o

o

o

o

ooo

oo

ooo

o

o

o

o

o

o

o

o

o

o

ooo

o

oo

o

o

oo

o

o

o

o

o

o

oo

oo

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

oooo

ooo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

oooo

oo

o

o

oo

o

o

o

o

o

ooo

ooo

o

o

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo

oo

oo

o

o

o

o

o

o

o

oo

o

o

o

ooo

o

oo

o

o

o

o

o

o

oooo

o

o

o

o

o

o

o

ooo

o

ooo

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

oooooo

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

o

o

oo

o

o

o

o

o

ooo

o

o

o

o

o

o

ooo

o

oo

o

o

o

o

o

o

o

o

o

o

o

oooo

oo

o

oo

o

oo

o

o

o

ooo

o

o

o

o

o

o

o

oo

o

o

oo

o

o

oo

oo

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

oo

o

o

o

o

o

oo

o

o

o

ooo

o

o

o

o

o

o

o

o

o

oo

o

oo

o

o

o

oo

ooo

oo

o

o

ooo

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

oo

o

o

oo

o

o

o

oo

oo

o

o

o

o

o

o

o

o

o

oo

o

ooo

oo

o

oooo

oo

o

o

o

oo

o

o

o

o

o

o

ooo

o

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

oo

oo

oo

o

o

o

oo

o

o

oooo

o

ooo

o

o

oo

o

o

o

oo

o

o

oo

o

o

ooo

o

o

o

ooo

o

oo

oo

ooo

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

ooo

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

ooo

o

oooo

o

o

o

oo

o

oo

o

o

oo

o

o

o

oo

o

oo

oo

o

o

o

oo

oo

o

o

oo

o

oo

o

o

o

o

oo

ooooo

o

oo

o

oo

o

o

o

o

o

o

oo

oo

o

o

o

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

ooo

o

o

o

o

o

ooo

o

o

o

o

o

o

oo

oo

oooo

o

o

o

o

oo

o

o

oo

o

o

o

oo

ooo

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

oo

o

oo

o

o

oo

o

oo

o

oo

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

oo

o

oo

o

o

oo

o

oo

o

o

o

o

o

o

o

ooo

o

o

o

ooo

o

o

o

o

o

o

oo

oo

oo

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

oo

ooo

oooo

o

oo

ooo

oooo

o

o

ooooooo

ooooo

o

o

o

o

o

oo

o

oo

oo

oooooo

o

ooo

o

o

o

o

o

o

o

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

oo

o

o

o

o

oo

oo

oo

oo

oo

oo

o

oo

oo

oo

o

oo

oooo

o

o

o

o

o

o

o

ooo

o

o

oo

o

o

o

oo

o

o

o

o

ooooo

o

o

oo

oo

oo

o

o

o

o

o

oo

o

o

o

oo

o

o

o

o

o

ooooo

o

o

o

o

o

ooo

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

ooo

o

oo

oooo

oo

o

oo

o

o

o

o

o

o

o

o

o

oo

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

oo

oo

o

o

oo

ooo

o

o

o

oo

oo

o

o

ooo

o

oo

o

o

oo

o

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

ooo

o

o

o

oo

oo

ooo

o

o

o

o

o

o

o

o

oo

o

oo

o

o

o

oo

o

o

o

o

oo

o

o

o

o

o

o

oo

ooo

o

oo

oooo

o

o

o

o

o

o

oo

o

o

o

o

o

oo

oo

o

oo

o

ooo

o

o

o

o

o

o

o

o

o

ooooo

o

ooo

oo

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

oo

oo

oo

oo

o

o

oo

o

oo

o

o

o

o

o

o

o

o

oooo

o

o

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

oo

o

o

o

oo

o

o

o

o

oo

o

o

o

o

o

oo

o

o

o

oo

o

oo

o

o

o

o

o

o

o

oo

oo

o

o

oo

o

o

oo

o

o

o

o

o

o

o

oo

oo

o

tau=0.65, R^2=0.6

5

Page 8 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

where the last approximation assumes and

Protein-protein interaction and gene co-expression network applicationsDrosophila and yeast protein-protein networkTo illustrate our results, we computed network conceptsin module networks based on Drosophila and yeast pro-tein-protein interaction (PPI) networks downloaded fromBioGrid [31]. As described before, we defined the mod-ules as branches of the hierarchical clustering dendro-gram, see Figure 1.

Of the 1371 proteins in the Drosophila PPI network, 862were clustered into 28 modules, and the remaining pro-teins grouped into an improper (grey) module. The mod-ule sizes of the proper modules range from 10 to 96, mean30.79, median 23, and interquartile range 24.

Of the 2292 proteins in the yeast PPI network, 2050 wereclustered into 44 proper modules, and the remaining pro-teins grouped into an improper module. The module sizesof the proper modules range from 10 to 219, mean 46.59,median 24, and interquartile range 38.8.

Yeast gene co-expression networksWe now illustrate our theoretical results using gene co-expression networks that have been used by manyauthors, e.g. [11,21,32]. Gene co-expression networks areconstructed on the basis of microarray data from the tran-scriptional response of cells to changing conditions. Thereis evidence that genes with similar expression profiles aremore likely to encode interacting proteins [33,34].

In gene co-expression networks, nodes correspond to geneexpression profiles. The corresponding adjacency matrixis determined from a measure of co-expression betweenthe genes. In the examples below, we will use the absolutevalue of the Pearson correlation coefficient between thegene expression profiles to measure co-expressionbetween gene pairs. As detailed at the end of the Methodssection, one can transform the Pearson correlation matrixinto an adjacency matrix by hard thresholding or softthresholding. Hard thresholding results in an unweighted

S

S1

20

( )

( )

kk

S k k

k k Si j

i j

1

10

( )

min( , ) ( )

k

k

−≈

Yeast PPI module networks: the relationship between funda-mental network conceptsFigure 6Yeast PPI module networks: the relationship between fundamental network concepts. This figure demonstrates Observation 3 and equation (14). In Figures A) and B), each point is a protein colored by its module assign-ment and the red line has intercept 0 and slope 1. Figure A) illustrates the relationship between the mean clustering coef-ficient (short horizonal line) and (1 + Heterogeneity2)2 * Den-sity (equation (11)). Figure B) illustrates the relationship between the topological overlap with the hub node and (Den-sity + Centralization) * (1 + Heterogeneity2) (equation (14)). Figure C) is a color-coded depiction of the topological over-lap matrix TopOverlapij in the turquoise module network. Fig-ure D) represents the corresponding approximation max(ki,kj)(1 + Heterogeneity2)/n (equation (12)). Figures E) and F) are their analogs for the brown module. The turquoise and the brown module represent the largest and third largest module. Analogous plots for the other modules can be found in our online supplement.

A B

−−−

−−

− −−−

−−

−−

− −

−−−

−−

−−

−−

−−

−−

−−

−−

−−

0.1 0.2 0.3 0.4 0.5 0.6

0.0

0.2

0.4

0.6

0.8

(1+Heterogeneity^2)^2*Density

Clu

ster

ing

Coe

ffici

ent

oo

oo

ooo

oooo

o

oo

oooo

o

oo

o

ooo

o

oo

o

o

o

ooo

o

o

o

oo

o

ooo

o

oo

o

oo

o

oo

o

ooooo

o

o

o

oo

o

oo

o

oo

o

oo

o

o

ooooo

oooo

oo

o

ooooooooo

oooooooo

o

o

o

oo

o

o

o

ooo

o

o

o

o

ooo

oooooo

ooooooo

o

o

oo

o

ooo

oo

o

oooo

oo

oo

o

o

o

ooo

oo

ooo

ooo

ooooo

ooo

ooo

oooo

oooooo

oo

o

o

ooooo

o

oooo

ooooooo

oo

oo

o

ooo

o

o ooo

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo

o

o

oo

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo

oo

o

o

oo

oo

oo

o

ooo

o

oo

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

oo

o

o

oo

o

o

oo

o

o

o

o

oo

o

o

o

o

o

oooo

o

o

o

ooo

oo

o

oooo

o

oo

o

o

o

o

o

ooooo

o

o

o

oo

oo

o

o

o

o

o

o

o

ooo

o

ooo

o

o

o

o

o

o

o

o

o

ooo

o

o

o

ooo

o

o

oo

o

o

o

ooo

ooooo

oooooo

oo

oo

oo

o

o

ooo

o

ooo

o

o

o

ooo

o

o

ooo

o

o

o

o

o

oooooo

o

o

o

oo

oo

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

oo

o

oo

oo

oo

oooooooooooo

oooo

oooo

ooooooooo

ooooo

o

o

o

ooooooooo

ooo

o

o

o

o

oo

o

o

o

o

o

oo

o

ooo

o

o

o

ooooooo

o

o

oo

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

ooo

o

o

ooo

o

ooo

o

oo

oo

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

ooo

o

o

o

o

o

o

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oooo

o

o

o

o

o

oo

o

ooo

o

o

o

ooo

o

o

o

o

o

o

o

o

o

oo

o

o

oo

o

o

oo

oo

ooooo

o

o

ooo

oo

ooo

ooo

o

oo

o

oo

o

o

o

o

oo

oo

o

o

oo

oo

ooo

oo

oo

o

oo

o

oo

o

o

o

o

o

oo

o

o

oooo

o

o

o

oooo

o

ooo

o

oooo

o

o

o

o

o

o

o

oooooo

oo

oooo

oo

o

oo

o

o

oo

o

o

oo

o

o

oooooo

ooooo

oo

o

o

oo

o

o

ooo

o

oo

oo

o

o

o

o

oo

o

o

o

o

o

oooo

o

oooooooo

ooo

o

o

oo

o

o

o

o

oo

o

oo

oooooo

oo

oo

oooo

oo

ooooo

oo

o

o

o

o

o

o

o

oo

o

o

oo

o

o

ooo

o

oo

oo

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

oo

oo

o

o

o

o

o

o

o

o

o

o

ooooooooo

oo

o

o

o

o

oo

o

o

oo

o

oo

o

oo

o

oo

o

o

o

o

o

o

o

ooo

o

oo

o

o

o

oo

o

o

o

o

o

ooo

o

oooo

o

ooo

oo

ooo

oo

o

o

oo

oooooooo

o

ooo

o

o

o

ooooo

o

ooo

oo

o

ooo

o

o

oo

o

o

o

o

o

o

oooo

o

o

ooo

oo

ooo

o

o

o

o

ooo

o

o

o

o

oo

oo

ooo

o

o

oo

oo

o

oo

o

o

o

o

ooo

o

o

oo

o

o

o

o

o

o

oo

o

ooooo

o

o

oo

o

ooo

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

ooooo

o

oo

o

oo

o

oooo

oo

o

oo

o

oo

o

o

o

o

oooo

o

o

oo

oo

oo

o

oo

o

o

o

o

o

oo

o

o

o

oo

o

oo

o

o

o

oo

o

o

oo

o

oooo

o

o

o

oo

ooo

o

o

o

o

o

o

o

ooooooooooooooooooooooooooooooooooooooooooooooooooo

ooooooooooooooooooo

ooooooooooooooooooooooooo

o

o

o

o

ooo

o

oo

o

oooooooooooooo

o

o

o

ooo

oooooo

ooo

o

o

o

o

ooo

oo

o

o

oo

o

o

o

o

o

oooo

oo

ooooo

o

o

ooo

o

o

o

o

o

o

o

oo

o

o

ooo

o

o

o

o

o

o

oo

oo

o

o

o

o

o

o

o

oo

o

oo

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

ooo

ooo

o

ooooo

o

o

oooo

o

o

oo

o

o

o

oo

o

oo

o

oo

o

oooooooooooo

oooooooo

oooooo

o

oooooooooo

ooooo

o

oo

oo

o

oo

o

o

o

ooooo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

oo

oo

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

ooooo

ooo

ooo

o

o

o

ooo

o

o

ooooooooooooo

o

ooo

o

o

ooooooooo

ooooooooo

o

oo

ooo

o

ooo

o

o

o

o

o

o

oo

o

oo

o

oo

o

o

o

oo

oooooooo

oooo

o

o

o

o

o

oo

o

o

o

o

ooo

oo

o

o

ooo

oo

ooo

o

o o

o

o

o

o

o

o

o

o

o

o

o

oo

oooo

o

o

o

oo

o

o

o

o

o

o

oooo

o

oo

o

o

oo

oo

oooooooo

o

o

ooo

o

ooo

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

oo

o

ooooo

o

oooo

o

o

ooo

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

ooooooo

o

ooooooo

oo

o

o

ooo

o

o

oo

o

o

oo

o

oo

o

o

ooooo

o

o

o

o

o

oo

o

oo

oooo

oo

o

ooo

o

oo

oo

o

o

ooo

oooo

o

ooooooo

o

ooo

o

o

o

o

o

o

oo

o

oo

ooooooo

o

ooooooooo

oo o

oo

ooo

oo

oo

o

o

oo

oo

o

ooo

o

o

o

o

o

o

ooo

oo

o

o

oo

o

oo

o

o

o

o

o

ooooo

ooo

ooooooo

o

o

o

ooo

ooooo

o

ooooooo

o

oo

o

o

oo

o

o

o

o

oo

oo

o

oo

o

ooo

oo

o

o

ooooo

o

ooooooooooooo

oooo

o

oo

o

ooooo

o

oooo

oo

o

o

oo

o

o

o

oo

oo

ooooo

oooo

o

ooooo

o

oo

o

o

o

o

oooo

o

oooooo

o

o

o

oooo

o

oo

o

oooooooo

o

o

o

o

o

oo

oo

o

oooooo

ooooo

oooo

oo

o

o

o

o

oo

o

ooo

o

oo

o

o

o

o

o

oo

R^2=0.74

ooooo

oo

o

o

o

oo

o

ooo

oo

ooooooo

ooooooooooooooooooooooooo

oo

o

ooo

ooo

o

ooooooooo

o

ooo

oo

o

ooo

o

oooo

oo

ooooooooo

ooooo

oo

o

o

o

ooooooooo

o

oooooooooooooooo

o

o

oooooooo

ooooooo

o

oo

oooo

o

oooooo

o

oooo

o

oooo

ooo

ooo

o

ooo

ooo

o

o

o

oo

o

oooooooo

o

oo

oo

oooo

o

oo

ooooo

oo

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

ooooo

oooo

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

oo

oo

o

oo

o

o

o

o

oo

oo

oo

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

ooo

o

ooo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

oo

o

o

o

o

o

oooo

o

o

o

o

oooo

o

o

o

ooooo

o

o

o

o

o

o

oooo

o

oo

o

o

oo

o

o

o

o

o

o

o

ooo

o

ooo

o

o

o

o

o

o

o

o

o

ooo

o

o

o

ooo

o

o

oo

o

o

o

o

oo

oooooooooooooooooo

o

ooooooooooooooooooooooooooooooooooooooooooooo

oooooooooooooooooooooooooooooooo

o

ooo

oooooooooooooooooooooooooooooo

o

o

o

o

o

o

o

ooo

o

o

o

o

o

oo

o

o

o

o

o

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

oo

o

o

o

ooo

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

ooo

o

oo

oo

o

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

ooo

o

o

oo

oo

ooo

o

oo

o

oooo

o

o

o

o

o

oo

ooo

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

ooooo

oooo

oo

o

ooo

o

oo

o

oooo

oooo

ooooo

ooooo

o

ooooooo

o

oooo

o

o

o

o

oooo

oo

o

o

o

o

o

oooo

o

o

oo

o

oooo

o

o

oooo

ooooo

ooooooooo

o

oooooooo

o

o

o

o

o

o

ooo

oooo

o

ooooo

oo

ooo

oo

o

oo

o

o

o

oooo

ooo

oo

ooo

o

o

oooooo

o

ooo

oo

oo

o

o

o

o

o

o

oo

o

o

o

o

oo

ooo

o

o

o

oo

o

o

ooo

o

oo

o

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oooo

oooo

ooo

o

oo

o

ooo

o

oo

o

o

o

o

oo

o

oo

o

oo

o

o

o

ooo

o

o

oo

o

o

o

oo

o

o

o

o

o

oo

oooo

oo

o

o

oooo

o

ooo

o

o

o

ooooooo

o

oooooooo

o

oooo

o

oooo

o

oo

oo

oo

o

oo

ooo

o

o

oo

o

o

o

o

o

o

ooooooooo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

ooo

o

o

oo

o

oo

o

o

o

oooo

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

oo

o

o

o

oo

o

o

o

o

oo

oo

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

oooo

o

o

oo

o

ooooooo

ooo

o

o

o

o

o

o

o

o

ooo

oo

o

o

ooo

o

o

o

o

o

oo

oo

o

ooo

o

o

o

o

o

o

oo

oo

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

ooooooooooooooooooooooooooooooooooooooooooooooooooo

ooooooooooooooooooo

ooooooooooooooooooooooooo

o

o

oooooo

ooooo

oo

ooooooo

oooo

oo

ooooo

o

oo

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

o

o

o

o

o

o

oooo

oo

o

o

o

ooo

oo

o

o

oo

o

oo

o

oo

o

o

o

oo

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

oo

o

oo

o

oo

o

o

o

o

o

o

o

o

o

oo

o

oo

o

o

oooooooo

o

oo

o

o

o

o

ooo

o

o

oooooo

o

o

oo

oo

o

o

oo

o

ooo

oooo

oooo

ooooooo

oooo

ooooo

ooooooo

oo

o

oo

o

oo

oo

o

oo

o

oooooo

oo

o

o

o

oo

o

o

ooo

oo

o

o

ooo

o

o

o

oo

oo

o

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

ooo

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

ooo

o

o

oo

oooo

o

oo

ooo

o

o

o

o

o

o

oo

o

oo

ooooo

ooooo

o

o

oo

o

o

ooooo

o

o

o

o

ooooooo

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

oo

oooooo

o

oo

o

o

o

o

o

oo

o

o

o

o

o

oo

oo

oo

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

oo

oo

o

o

o

o

o

o

oooo

ooo

oo

oo

ooo

o

ooooooooooooo

o

o

o

oo

o

o

o

o

o

oo

oooo

o

oooo

o

o

o

o

oooo

oooo

oo

o

o

o

o

o

o

o

o

o

o

oo

o

oo

oo

o

o

o

o

o

ooo

o

o

o

o

o

o

o

ooooo

o

oo

oooooo

o

o

oo

oo

o

oo

o

o

ooo

oo

o

o

o

o

o

oo

o

oo

oooo

o

o

o

o

oo

o

o

ooo

o

oo

o

o

oooo

oo

o

ooo

o

o

o

oo

o

o

oo

ooooo

o

o

o

ooo

o

o

o

oo

o

oooo

o

oooo

o

o

o

o

o

o

o

oo

oo

oo

oooooo

oo

ooo

o

oo

o

o

ooo

ooooo

o

o

oooo

ooooooo

ooo

o

ooooooo

ooo

o

ooooo

o

o

o

oooo

ooo

o

ooo

ooo

o

o

oooo

oo

o

ooo

oooo

oo

oooooo

o

o

o

o

oo

ooo

oooooo

ooooooooooooooooooo

o

o

o

oo

oooo

oooo

ooooooo

o

oooo

oo

oooooooo

o

o

oo

oo

o

o

o

oooooooooo

o

ooo

o

oo

oo

oooooooo

oo

o

o

oo

oooo

ooo

ooo

oo

o

oooooooooooooo

oo

o

o

0.1 0.2 0.3 0.4 0.5 0.6 0.7

0.0

0.2

0.4

0.6

0.8

(Centralization+Density)*(1+Heterogeneity^2)

TO

M w

ith th

e H

ub N

ode

R^2=0.76

C D E F

Yeast gene co-expression module networks: the relationship between fundamental network conceptsFigure 7Yeast gene co-expression module networks: the rela-tionship between fundamental network concepts. This figure demonstrates Observation 3 and equation (14). The figures on the left (right) hand side depict network con-cepts from the weighted (unweighted) network. Each point is a gene colored by its module assignment. The red line has intercept 0 and slope 1. Figures A) and B) illustrate the rela-tionship between the mean clustering coefficient (short hori-zonal line) and (1 + Heterogeneity2)2 * Density (equation (11)). Figure C) and D) illustrates the relationship between the top-ological overlap with the hub node and (Density + Centraliza-tion) * (1 + Heterogeneity2) (equation (14)).

A B

−−−−

−−−

0.00 0.05 0.10 0.15 0.20 0.25 0.30

0.00

0.10

0.20

0.30

(1+Heterogeneity^2)^2*Density

Clu

ster

ing

Coe

ffici

ent

o

o

o

o

o

ooooooo

oo

o

o

o

o

o

o

o

o

o

oo

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

o

o

oo

o

oo

o

o

o

o

o

ooo

o

o

o

o

oo

ooo

o

oo

o

oooo

o

oo

ooooo

o

oo

ooooo

oo

o

o

o

o

oo

o

o

oooooooooo

o

oooooooooo

o

o

o

o

o

oo

oo

o

o

oo

oo

o

o

oooo

o

oo

o

o

ooooo

oo

o

oo

o

o

oo

oo

oo

oooo

o

o

ooooo

o

o

oo

o

o

oo

oo

oo

o

ooo

o

o

o

ooo

oo

ooooo

o

o

o

ooo

o

oooo

o

oooo

ooo

o

o

o

oooo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

ooo

o

o

ooo

o

o

o

oooo

ooo

o

o

ooo

oooooooo

oooooooo

o

oo

oo

o

oooooooo

o

oooooo

oo

o

ooooo

o

o

o

o

oo

ooooo

o

oooooooo

o

o

oooo

o

oooo

oo

o

oo

o

o

o

o

oo

o

ooooo

oooooooo

o

o

ooo

o

o

ooooooo

ooooooo

o

oooo

oo

oo

oo

o

oo

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

oo

ooooo

o

oooo

o

o

o

o

o

ooooo

ooo

o

o

oo

ooo

o

o

o

o

ooo

o

ooo

o

o

oo

oo

ooooo

o

o

oooo

ooo

o

oo

o

oo

o

oooo

o

o

o

oooo

oo

oooooo

oo

ooooo

o

oooooooooooo

o

oooooo

o

oooooo

o

o

o

oooooo

o

ooo

o

oooooo

o

o

oooo

o

oo

o

oooooo

oooooooooooooo

oooooo

o

oooooooooooo

oooo

o

oooooo

oooooo

o

o

o

ooooo

o

ooo

oooooo

o

ooo

ooooo

ooo

o

o

o

ooooooo

o

ooo

o

oooooooooo

o

o

oooo

ooooooo

o

oooo

o

ooooo

ooooooooooo

oooooooooooooo

o

ooo

o

oooooooo

ooooo

o

o

oo

ooooo

ooo

oo

o

ooooo

o

ooooo

o

oooooo

o

o

o

ooooooooooo

ooo

o

o

oooo

ooooooooo

o

oooooooo

oooo

oo

oooooooooooooo

oooooooooo

o

ooooooo

o

oooooooooo

o

o

o

oooo

ooooo

ooo

o

oooo

o

oo

o

oooooooooooooooooooo

ooooo

oo

o

oooooooooo

o

o

o

o

o

o

oo

o

o

o

oooooooooooooooooooooo

o

o

ooo

o

ooo

o

ooo

ooooo

o

o

o

ooooooo

ooooooo

oo

o

oo

o

o

oo

o

ooo

o

oooo

o

o

o

o

ooo

o

ooooo

o

oo

o

ooo

o

ooooooooooooo

o

ooooooooo

o

o

oooooooo

o

ooooooooo

o

oooo

ooooooooooooooooooo

o

oo

oooooo

o

oo

o

oooo

ooooooo

oooooooooooooo

o

ooooooo

ooooooooooooooo

o

o

oooo

o

o

oooooooooooooooo

oo

o

o

oooo

o

oooooooooooooo

o

oo

o

oooooo

o

o

ooo

o

oo

o

ooooooooooooooooo

o

ooooooo

oooo

o

ooooo

oo

ooooooooo

ooo

o

oooooooooooooo

o

oo

ooooooooooo

oooooooooo

oooo

o

ooooooooo

o

o

o

o

oo

o

o

ooooooooooooooooooooooo

o

oooooo

o

o

oooo

ooo

o

oo

oooooooooo

o

o

ooo

oo

ooo

oo

o

o

ooo

oooooooooo

o

ooooooo

o

o

ooooo

o

o

o

oo

o

o

oo

o

o

oo

o

o

o

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

oooooooo

o

o

o

oo

o

o

oo

o

o

o

ooo

o

o

oo

ooo

oo

o

oo

o

oooo

o

ooooooo

oo

ooo

o

o

o

oo

o

o

oooo

o

o

oooo

o

o

oo

oo

o

ooo

o

o

o

oo

o

ooo

o

ooooo

o

oo

ooooooo

o

o

oo

o

oooooooooooo

o

oooooooooooo

o

oooooo

o

ooo

ooooooooooo

oooooo

oo

oooooooooo

oo

ooo

oooo

o

o

ooo

oo

ooo

o

o

o

o

o

oooo

ooooooo

o

oooo

oooooooo

o

oooo

ooo

o

o

oooooooo

o

ooooooo

o

oooooooo

o

o

o

oooooooo

oooo

ooo

oooooo

ooooooo

o

oooooooo

o

oooo

o

ooooo

ooooooooo

oo

o

o

o

oooo

oo

ooooo

ooooo

ooooo

ooooo

o

ooo

oo

ooooo

o

o

oooooooooo

o

oo

o

ooo

oooo

o

oooooooo

o

ooo

o

oo

oooooooooo

ooooo

oooooo

o

oo

o

oo

ooo

o

o

o

o

o

oooooo

o

ooo

ooo

o

ooooooooooooooooooooooo

o

oo

ooo

o

oooooooooo

oooooooo

oo

o

ooooo

p=7, R^2=0.82

−− −

−−−

0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

0.8

1.0

(1+Heterogeneity^2)^2*Density

Clu

ster

ing

Coe

ffici

ent o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

oo

o

oo

o

o

o

o

o

ooo

oo

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

ooo

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

ooo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

oo

oo

o

oo

o

o

ooo

o

o

ooo

o

ooo

ooo

oo

oo

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

ooo

ooo

o

o

o

o

o

o

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

oo

o

o

o

oo

o

oo

ooo

o

o

o

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

oo

o

o

oo

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

oo

oo

o

o

o

o

o

o

o

o

o

oo

o

o

oooo

o

o

o

o

o

o

o

ooo

o

oo

o

o

o

o

ooo

oo

ooo

o

o

o

o

o

o

o

o

o

o

ooo

o

oo

o

o

oo

o

o

o

o

o

o

oo

oo

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

ooo

ooo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

oooo

oo

o

o

oo

o

o

o

o

o

ooo

ooo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo

oo

oo

o

o

o

o

o

o

o

oo

o

o

o

oo

o

o

oo

o

o

o

o

o

o

oooo

o

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

ooooo

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

oo

oo

o

oo

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

oo

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

ooo

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

oo

oo

o

oooo

oo

o

o

o

oo

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

oo

oo

oo

o

o

o

oo

o

o

oooo

o

oo

o

o

o

oo

o

o

o

oo

o

o

oo

o

o

ooo

o

o

o

ooo

o

oo

oo

o

oo

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

ooo

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

ooo

o

ooo

o

o

o

o

oo

o

o

o

o

o

oo

o

o

o

oo

o

oo

oo

o

o

o

oo

o

o

o

o

o

o

o

oo

o

o

o

o

oo

ooo

oo

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

oo

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

oo

o

oo

o

o

o

o

o

oo

o

o

oo

o

o

o

oo

o

oo

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

o

oo

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

oo

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

oo

o

oo

oooo

o

oo

o

oo

oooo

o

o

ooooooo

oo

ooo

o

o

o

o

o

oo

o

oo

oo

oooo

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

oo

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

oo

oo

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

o

o

ooooo

o

o

oo

oo

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

ooooo

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

ooo

o

oo

o

oo

o

oo

o

oo

o

o

o

o

o

o

o

o

o

oo

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

oo

oo

o

o

oo

ooo

o

o

o

o

o

oo

o

o

o

oo

o

oo

o

o

oo

o

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

ooo

o

o

o

oo

oo

ooo

o

o

o

o

o

o

o

o

oo

o

oo

o

o

o

oo

o

o

o

o

oo

o

o

o

o

o

o

oo

ooo

o

oo

ooo

o

o

o

o

o

o

o

oo

o

o

o

o

o

oo

oo

o

oo

o

ooo

o

o

o

o

o

o

o

o

o

ooooo

o

ooo

oo

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

oo

oo

oo

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

oo

oo

o

o

oo

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

tau=0.65, R^2=0.6

C D

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

ooooo

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

oo

o

o

o

o

o

o

ooo

o

oo

o

oo

oo

o

oo

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

oo

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oooo

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

oo

oo

o

o

o

o

o

o

o

o

o

oooo

oo

o

oo

o

oo

o

ooo

oo

oo

o

o

o

oo

o

o

o

o

o

o

ooo

o

o

o

ooo

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

oo

o

o

ooo

o

oo

oo

o

o

ooo

o

o

ooo

oo

o

oo

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

o

oooo

o

ooo

o

o

o

oo

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

oooo

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

ooo

o

ooo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

o

o

oo

oo

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

oo

o

o

o

oo

o

o

o

o

oo

o

oo

o

oo

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

oo

o

oo

o

o

o

oo

oo

o

o

o

o

o

o

o

oo

o

ooo

o

o

o

o

o

oo

o

o

o

o

ooooo

o

oooooo

o

oo

o

o

o

o

o

o

oo

ooo

o

oo

ooo

oooo

o

o

o

o

ooooooo

oo

oo

o

o

o

o

oo

o

o

o

o

oo

o

o

o

oooo

o

oo

oo

o

o

o

ooo

o

o

o

o

o

ooo

o

o

o

ooo

oo

o

o

o

o

o

oooo

o

o

o

oooo

oo

o

oo

o

ooo

o

o

oo

o

ooo

o

o

o

o

o

o

oo

oo

o

ooo

o

o

o

o

o

ooo

o

o

o

o

o

ooo

o

o

ooo

oo

o

o

o

oooo

o

oooooo

ooo

o

ooo

o

oo

o

o

o

o

oo

oooo

oooo

o

o

o

ooo

oo

o

o

o

o

o

oo

o

o

o

o

oo

oo

o

ooo

o

o

oooo

o

o

o

o

o

o

o

o

oooooo

oo

ooo

o

ooo

o

oo

o

ooo

o

o

o

o

o

o

o

o

o

o

o

oo

oooo

oo

o

ooooo

o

oo

o

o

o

o

oo

oo

o

o

o

oo

oooo

oo

o

oo

oo

o

o

ooo

o

o

oo

ooo

ooo

oo

o

o

o

oo

o

o

oo

o

o

o

oooo

o

o

o

o

oooo

o

o

oo

o

oo

ooo

ooo

oo

o

o

o

o

o

o

o

oo

oo

oo

oo

o

oo

o

o

o

o

o

o

oo

o

ooo

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

oo

o

o

o

oo

o

oo

o

o

o

o

o

ooo

o

o

o

o

o

o

oo

o

o

oo

o

o

oooo

oooo

o

o

o

o

oo

o

o

o

o

oo

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

o

oo

o

ooo

oo

o

o

oooo

oo

o

oo

o

oo

o

o

o

oo

ooo

o

o

o

o

o

oo

o

o

o

o

o

o

ooo

o

ooo

o

o

o

o

oooo

o

ooo

oo

oo

o

ooo

o

oo

o

oo

ooooo

ooo

oo

o

o

o

oo

o

o

o

ooooo

ooo

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

oo

o

o

oo

ooo

ooo

o

oo

o

o

o

o

oooo

o

o

ooo

o

o

o

o

o

ooo

o

o

ooo

o

o

o

o

o

o

o

o

o

oo

oooo

o

o

oo

oooo

oo

o

o

oooo

o

o

oo

ooo

o

o

o

o

oo

o

o

o

oo

o

ooooo

o

o

o

oo

oo

o

o

ooo

o

ooo

o

o

o

oo

o

o

ooo

ooo

oo

o

oo

o

o

oo

o

ooo

o

oooooo

oo

o

o

o

ooo

o

o

o

o

o

o

o

o

oo

oo

o

oo

o

o

o

ooo

o

o

o

o

o

o

oo

o

o

ooooo

o

o

o

o

o

o

o

o

o

o

ooo

o

oo

o

o

o

o

oo

oo

o

oo

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

oo

o

o

oo

o

o

oo

o

oo

oooo

oo

o

o

ooo

o

oooo

oo

o

o

o

o

oooo

o

oooooo

ooo

o

oo

oo

oo

o

o

ooooo

o

o

o

o

o

o

o

o

o

oo

o

oo

o

o

o

oo

o

o

o

ooo

o

o

o

o

oo

o

o

o

o

o

oo

o

o

oo

o

ooooo

o

oo

o

oo

o

ooo

o

o

oo

o

o

o

o

o

o

o

o

o

ooo

oo

o

o

oo

oo

oo

o

ooo

o

o

o

o

o

o

o

o

oo

o

o

o

oo

o

oo

ooo

o

ooo

o

o

o

o

oo

o

oo

o

o

o

o

oo

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

ooo

o

o

ooooo

o

o

o

o

ooo

oo

o

o

oo

o

oooo

o

o

oo

oo

o

o

o

oo

oo

oo

o

o

o

o

o

o

o

ooo

oo

o

o

o

o

o

o

o

oo

o

oo

o

oo

o

o

oo

o

o

o

o

o

oo

oo

oo

o

o

o

o

o

oo

oo

oo

ooo

oo

ooo

o

o

o

oo

o

o

o

o

o

oo

oo

oo

o

o

o

o

o

o

o

oooo

o

oo

oo

oo

o

o

o

ooo

o

oo

o

o

o

o

oo

o

oo

oo

o

o

o

o

o

ooo

ooo

o

o

o

oo

oo

o

oo

ooo

oo

oo

o

oo

o

o

o

o

o

o

ooo

o

o

o

o

ooo

o

o

o

o

o

o

o

o

oooo

o

o

oo

o

o

o

ooo

o

o

oo

o

oo

ooooo

o

o

oo

oo

ooooo

o

o

o

o

o

ooo

oo

oo

o

oo

o

o

ooo

o

oo

o

oo

o

o

o

o

o

oo

oo

o

o

o

o

o

ooo

o

o

oo

o

o

o

o

oooo

ooo

o

o

o

o

o

o

o

o

oo

oo

ooo

o

oooo

o

oooo

o

o

oo

o

o

o

o

o

oo

o

o

o

oo

o

o

o

ooo

o

o

o

o

o

oo

oo

o

o

ooo

o

o

o

o

o

oo

oo

o

o

o

oo

oo

o

oo

o

oo

ooo

0.1 0.2 0.3 0.4

0.0

0.1

0.2

0.3

0.4

(Centralization+Density)*(1+Heterogeneity^2)

TO

M w

ith th

e H

ub N

ode

p=7, R^2=0.81

oo

o

o

o

o

o

o

oo

o

o

o

o

o

ooo

o

oooo

oo

oo

o

o

oo

oo

o

o

o

o

o

o

o

oo

ooo

oo

o

oo

ooo

o

o

o

ooo

o

o

o

o

o

o

o

o

oo

o

o

o

ooo

o

ooo

o

oo

o

oooo

oo

o

oooo

o

o

o

o

o

oooo

o

o

o

oooo

o

o

o

oooooooo

oo

o

o

oo

o

o

o

o

o

o

o

o

oo

o

oo

o

oo

o

ooo

o

o

o

o

o

oo

oo

o

o

o

oo

ooo

ooo

o

o

o

oo

oo

oo

o

o

o

oo

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

oooo

o

oooooo

oo

oo

o

o

o

ooooooo

ooo

oo

oo

o

o

o

o

o

ooo

o

o

o

o

oo

oo

ooo

o

o

oo

o

o

o

o

o

o

o

oo

o

o

o

oo

oo

ooooo

ooo

oo

o

o

o

o

o

o

ooo

oo

o

ooo

o

o

o

o

o

oo

ooo

o

ooo

oo

oo

o

ooo

o

o

o

o

o

oo

oo

o

oooo

o

oo

o

oo

oo

o

o

o

oo

oo

o

oo

o

o

o

o

o

o

o

oo

o

oo

o

ooo

o

o

oo

o

o

o

oo

ooooo

o

o

o

oo

o

o

ooo

o

ooo

o

oo

o

o

oooo

o

o

ooo

o

o

oo

o

oooo

oo

o

oo

o

o

o

oo

o

o

o

o

ooo

oo

o

ooo

ooo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo

o

oooo

o

o

oo

o

o

oo

o

o

o

o

o

o

o

oo

o

o

o

ooooo

o

o

o

o

ooo

o

oo

o

o

oo

o

o

o

oo

o

o

o

o

o

ooo

o

oo

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo

oooo

o

o

o

o

ooooooo

o

oo

o

o

o

o

o

oo

o

o

o

o

oo

o

o

o

oooo

o

o

o

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

ooo

o

o

o

o

o

oo

o

o

o

o

oo

o

oo

oo

o

oo

o

o

o

o

o

o

ooo

o

o

o

o

o

ooo

o

oooo

oo

o

o

o

ooo

o

o

oooooo

o

oo

o

ooo

o

oo

o

o

oo

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

ooo

o

oooo

o

o

o

ooo

o

o

o

o

o

o

oo

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

o

ooo

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

o

o

o

o

o

o

o

oooo

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

oo

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

o

oo

o

oo

o

o

o

o

o

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

ooo

o

o

o

o

o

o

oo

o

o

o

oo

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

ooo

o

oo

o

o

o

o

o

oooo

o

ooo

oo

oo

o

o

o

o

o

oo

o

o

o

ooooo

o

o

o

oo

o

o

o

oo

o

o

o

ooooo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

ooo

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

o

oo

o

o

o

ooo

o

oo

o

o

o

o

o

o

o

o

o

oo

o

oooooo

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

ooooo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

oo

ooo

o

o

o

oo

o

oo

o

o

o

o

oo

o

o

ooooo

o

o

o

o

o

ooo

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

o

oo

o

o

o

o

o

oo

o

o

o

o

oooo

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oooo

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

ooo

o

o

ooo

o

o

o

o

o

o

o

o

oo

oooooooooooooooooo

o

oooooo

o

oooo

o

o

ooo

o

oo

o

ooooo

o

oooooooo

oo

oooooooo

o

o

ooo

ooo

o

o

o

oo

o

o

ooooooo

o

oo

o

o

o

oo

ooooooooooo

o

o

oo

o

o

o

o

o

o

o

o

o

oo

ooo

o

o

o

o

oo

o

o

oo

o

oo

ooo

o

oo

ooo

o

o

ooo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

ooo

oo

o

o

o

o

o

o

o

o

ooo

o

o

o

o

o

o

oo

o

o

oo

ooo

o

oo

oo

o

oo

o

o

o

o

o

oo

o

o

o

o

o

o

oooo

o

o

o

o

o

oo

ooo

oo

o

oo

o

o

oooo

o

o

o

o

o

o

o

ooooo

oo

o

o

o

o

o

o

o

oo

oo

o

oo

o

ooooo

o

oo

o

o

o

oo

o

o

o

ooooooo

o

oo

ooooooo

oo

oo

o

o

ooooooo

o

oo

ooo

o

o

o

o

o

o

o

o

o

o

oo

oooooo

o

o

o

oo

o

o

o

o

o

o

ooooo

o

ooo

o

o

oo

oo

o

o

o

o

oooo

oo

o

oo

oo

ooo

o

o

oo

o

o

oo

o

oo

o

o

oo

o

o

o

o

o

o

o

oo

oo

o

o

o

o

oo

o

oo

oo

o

o

oo

oo

o

oo

ooo

oo

ooooo

ooo

o

oo

oooo

o

oooo

oo

o

oooo

oo

o

o

o

o

ooo

o

o

oo

oo

o

o

oo

o

o

oo

ooo

o

o

o

o

o

o

o

o

oo

o

o

o

ooo

o

o

o

ooo

oo

o

o

o

o

o

o

o

oooo

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

oooo

oo

o

o

o

o

o

o

ooo

0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

(Centralization+Density)*(1+Heterogeneity^2)

TO

M w

ith th

e H

ub N

ode

tau=0.65, R^2=0.74

Page 9 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

network and soft thresholding results in a weighted net-work [21]. We applied our methods to a yeast cell cyclemicroarray data comprised of 44 microarrays and 2001genes. This dataset recorded gene expression levels duringdifferent stages of cell cycles in yeasts and has been widelyused before to illustrate clustering methods [35].

Of the 2001 genes (microarray probesets) in the weightedyeast gene co-expression network, 1081 were clusteredinto 8 proper modules. The module sizes of the propermodules range from 53 to 308, mean 135.1, median101.5, and interquartile range 69.3. To facilitate a com-parison between the weighted and the unweighted geneco-expression networks, we used the module assignmentof the weighted network for the unweighted network aswell. It turns out that the module assignment is highlypreserved between the weighted and the unweighted geneco-expression networks, see Figures 1C) and 1D).

Functional annotation of modulesSince the scope of this paper is a mathematical and topo-logical analysis of module networks, we defined moduleswithout regard to external gene ontology information.Also we do not provide an in-depth analysis of the biolog-ical meaning of the network modules. But we briefly men-tion that there is indirect evidence that most of theresulting modules are biologically meaningful. We usedthe functional gene annotation tools from the Databasefor Annotation, Visualization and Integrated Discovery(DAVID) [36] to test for both enriched biochemical path-ways and subcellular compartmentalization. We find thatmost modules are significantly enriched with known geneontologies. A functional enrichment analysis for each net-work application is provided. For the Drosophila PPI net-work, [see Additional file 3]; for the yeast PPI network,[see Additional file 4]; for the weighted and unweightedyeast gene co-expression networks, [see Additional file 5].

Empirical relationships in 4 different networksIn accordance with Observation 2, we find a close rela-tionship (R2 ≥ 0.6) between the fundamental networkconcepts and their approximate CF-based analogs. Specif-ically, we relate the network density, centralization, heter-ogeneity and clustering coefficients to their approximateCF-based analogs in Figures 2 (Drosophila PPI network),Figure 3 (yeast PPI network), and Figure 4 (weighted andunweighted yeast gene co-expression networks).

In accordance with Observation 3, we find a close rela-tionship (R2 ≥ 0.6) between the mean clustering coeffi-cient mean (ClusterCoef ) and (1 + Heterogeneity2)2 ×Density. Further, we find a close relationship between Top-Overlap[1]j and (Centralization + Density)(1 +Heterogeneity2), see Figures 5 (Drosophila PPI network),Figure 6 (yeast PPI network), and Figure 7 (weighted andunweighted yeast gene co-expression networks).

We find that our theoretical observations fit better in theweighted- than in the unweighted yeast gene co-expres-sion network.

Network concepts and module sizeSince the number of genes inside a module (module size)varies greatly among the modules, it is natural to wonderwhether the reported relationships between network con-cepts are due to the underlying module sizes. We find thatthe relationship between fundamental network conceptsand their approximate CF-based analogs remains highlysignificant even after correcting for module sizes [seeAdditional file 2]. The same holds for the relationshipsbetween network concepts. Thus, none of the reportedrelationships is trivially due to module sizes. But we findthat many network concepts depend on the underlyingmodule size. We find that large modules are less factoriz-able than small modules: there is a strong negative corre-lation between module factorizability F(A) and modulesize. We also find that fundamental network concepts(e.g. density) depend on module size in our applications.For the factorizability, density, centralization, heterogene-ity and mean clustering coefficient, the correlation coeffi-cients with module size are -0.84, -0.46, -0.17, 0.26, and -0.36 in Drosophila PPI module networks; they are -0.55,-0.52, 0.05, 0.5, and -0.44 in yeast PPI module networks;they are -0.93, -0.52, -0.82, 0.27, and -0.55 in weightedyeast gene co-expression module networks; they are -0.86,-0.77, -0.56, 0.87, and -0.85 in unweighted yeast gene co-expression module networks. A more detailed analysis ispresented in the Additional files [see Additional file 2].

A simple exactly factorizable network example: constant network

A simple, exactly factorizable network is given by an adja-

cency matrix A with constant adjacencies (aij = b, b ∈ (0,

1]). The adjacency matrix is exactly factorizable since aij =

CFiCFj where CFi = . This network can be interpreted as

the expected adjacency matrix of an Erdös-Rényi network[37]. One can easily derive the following expressions forthe fundamental network concepts: Connectivityi = (n - 1)b,

Density = b, Centralization = 0, Heterogeneity = 0, Cluster-Coefi = b and TopOverlapij = b.

Since A is exactly factorizable, the fundamental networkconcepts equal their CF-based analogs. However, theapproximate CF-based concepts are different from theirexact counterparts, see Table 3. For reasonably large valuesof n, the fundamental network concepts are very close totheir approximate CF-based analogs. This illustratesObservation 2. With the results in Table 3, one can easilyverify Observation 3 and equation (16) in this example.

b

Page 10 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

Example: block diagonal adjacency matrixIn the following, we will consider a block diagonal adja-cency matrix where each block has constant adjacencies,i.e.

We assume that the first and second blocks have dimen-sions n1 × n1 and n2 × n2, respectively. Such a block diago-nal matrix can be interpreted as a network with twodistinct modules. Setting n2 = 0 results in the simple con-stant adjacency matrix, which we considered before.

We denote by f1 = (1, 1,..., 1, 0, 0, ..., 0) the vector whosefirst n1 components equal 1 and the remaining compo-nents equal 0. Similarly, we define f2 = (0, 0, ...,0, 1, 1, ...,1) = 1 - f1. To simplify the calculation of the conformity,we further assume that

Then the conformity is uniquely defined by

as one can show using equations (36) and (37) in theappendix. Further, using Proposition 10 in the appendix,one can show that the factorizability is given by

A

b b

b b

b b

b b

b

=

1 0 0 0

1 0 0 0

1 0 0 0

0 0 0 1

0 0 0 1

1 1

1 1

1 1

2 2

2 bb

b b

2

2 20 0 0 1

⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜

⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟

. (17)

n n b

n n b2 2 2

2

1 1 12

1

11

( )

( ).

−−

< (18)

CF f= b1 1,

F An n b

n n b n n b( )

( )

( ) ( ).= −

− + −1 1 1

2

1 1 12

2 2 22

1

1 1(19)

Table 4: Network concepts in the simulated block-diagonal network.

Concept Fundamental CF-based Approx CF-based

Connectivityi

Density

Centralization

Heterogeneity

TopOverlapij

ClusterCoefi

The indicator function Ind(·) takes on the value 1 if the condition is satisfied and 0 otherwise.

( ) ( )n b Ind n b Indi n i n1 1 2 21 11 1

− + −≤ > ( )n b Indi n1 111

− ≤ n b Indi n1 1 1≤

n n b n n b

n n n n1 1 1 2 2 2

1 2 1 2

1 1

1

( ) ( )

( )( )

− + −+ + −

n n b

n n n n1 1 1

1 2 1 2

1

1

( )

( )( )

−+ + −

n b

n n n n12

1

1 2 1 2 1( )( )+ + −

n n b n b

n n n n2 1 1 2 2

1 2 1 2

1 1

1 2

(( ) ( ) )

( )( )

− + −+ − + −

( )

( )( )

n n b

n n n n1 2 1

1 2 1 2

1

1 2

−+ − + −

n n b

n n n n1 2 1

1 2 1 21 2( )( )+ − + −

( )[ ( ) ( )

[ ( ) ( ) ]

n n n n b n n b

n n b n n b1 2 1 1

212

2 22

22

1 1 1 2 2 22

1 1

1 1

+ − + −− + −

n

n2

1

n

n2

1

b Ind b Indi j n i j n1 21 1, ,≤ >+ b Indi j n1 1, ≤ bn b

n bIndi j n1

1 1

1 1

1

1 1 1

+− + ≤( ) ,

b Ind b Indi n i n1 21 1≤ >+ b Indi n1 1≤ b Indi n1 1≤

Table 3: Network concepts in the constant Erdös-Rényi network.

Network Concepts Fundamental Approximate CF-based

Connectivityi (n - 1)b nb

Density b

Centralization 0 0Heterogeneity 0 0

TopOverlapij b

ClusterCoefi b b

bn

n −1

bnb

n b

+− +

11 1( )

Page 11 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

In particular, if n1 ≈ n2 and b1 = b2, i.e. if the adjacencymatrix is comprised of two nearly identical blocks, the fac-torizability is F(A) ≈ 1/2. Similarly, one can show that ifthe matrix A is comprised of B identical blocks, then F(A)≈ 1/B.

This block diagonal network allows one to arrive atexplicit formulas for fundamental-, CF-based-, andapproximate CF-based network concepts, see Table 4.

In the following, we study the relationship between fun-damental network concepts and their approximate CF-based analogs in the limit when the block diagonal net-work becomes approximately factorizable. Specifically,we calculate network concepts in the limit b2 → 0 when n1,n2 and b1 are kept fixed. Under this assumption, b2→ 0 isequivalent to F(A) → 1. Then, one can easily show that

For reasonably large values of n1 (say n1 > 20), these limitsillustrate Observation 2. Similarly, one can easily verifyObservation 3 and equation (16) in the case when the fac-torizability F(A) is close to 1 and n1 is reasonably large.

DiscussionThis paper does not describe a new software or method forconstructing networks. Instead, it presents theoreticalresults which clarify the mathematical relationshipbetween network concepts in module networks. A deeperunderstanding of network concepts may guide the dataanalyst on how to construct and use networks in practice.Our results will pertain to any network that is approxi-mately factorizable irrespective of its constructionmethod. While the term 'factorizable' network is new,numerous examples of these types of networks can befound in the literature, e.g. [38]. A recent physical modelfor experimentally determined protein-protein interac-tions is exactly factorizable [39]. In that model, the 'affin-ity' aij between proteins i and j is the product of thecorresponding conformities. The conformities are approx-imately given by CFi = exp(-Ki) where Ki is the number of

hydrophobic residues in the i-th protein. Another relatedexample is an exactly factorizable random network modelfor which the edges between pairs of nodes are drawnaccording to a linking probability function [40,41].

We find that in many applications, the conformity ishighly related to the first eigenvector of the adjacencymatrix. The idea of using a variant of the singular valuedecomposition for decomposing an adjacency matrix hasbeen proposed by several authors [42-45]. However, weprefer to define the conformity as a maximizer of the fac-

torizability function for

the following reasons: First, the factorizability satisfiesthat FA(CF) = 1 if, and only if, A is exactly factorizable net-

work with aij = CFiCFj. Second, we prefer to define the con-

formity without reference to the diagonal elements aii of

the adjacency matrix. Third, the definition naturally fitswithin the framework of least squares factor analysiswhere conformity can be interpreted as the first factor[46]. An algorithm for computing the conformity in gen-eral networks is presented in the appendix. While networkanalysis focuses on the adjacency matrix, factor analysistakes as input a correlation or covariance matrix. In mod-ule networks, the first factor (conformity) corresponds toa normalized connectivity measure, see equation (16).Future research could explore the network interpretationof higher order factors.

The topological structure of complex networks has beenthe focus of numerous studies, e.g. [7,8,16-18,20,38,47].Here we explore the structure of special types of networks,which we refer to as module networks.

To derive results for factorizable module networks, wedefine several novel terms including a measure of networkfactorizability F(A), conformity, CF-based network con-cepts, approximate CF-based network concepts.

The first result (Observation 1) uses both PPI and gene co-expression network data to show empirically that subnet-works comprised of module nodes are often approxi-mately factorizable. This insight could be interesting toresearchers who develop module detection methods.Approximate factorizability is a very stringent structuralassumption that is not satisfied in general networks.While modules in gene co-expression networks tend to beapproximately factorizable if the corresponding expres-sion profiles are highly correlated, the situation is morecomplicated for modules in PPI networks: only afterreplacing the original adjacency matrix by a 'smoothed

lim ,

lim

( ), ,

( )

F Ai CF app i

F A

Connectivityn

nConnectivity

= −1

1

1

1

11

1

1

1

1

1

1

Densityn

nDensity

Centralizationn

CF app

F A

= −

= −→

,

( )

,

limnn

Centralization

Heterogeneity Heterogenei

CF app

F A

1

1

,

( )

,

lim→

= tty

TopOverlapn b

n bTopOverlap

CF app

F Aij

,

( )

,

lim( )

→= − +

+1

1 1

1 1

1 1

1 CCF app ij

F Ai CF app iClusterCoef ClusterCoef

, ,

( ), ,

,

lim .→

=1

Fa v v

aA

ij i jj ii

ijj ii

( )( )

( )v = −

−≠

∑∑∑∑

1

2

2

Page 12 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

out' version (the topological overlap matrix), do we findthat the resulting modules are approximately factorizable.

The second result (Observation 2) shows that fundamen-tal network concepts are approximately equal to theirapproximate CF-based analogs in approximately factoriz-able networks (e.g. modules). While fundamental net-work concepts are defined with respect to the adjacencymatrix, approximate CF-based network concepts aredefined with respect to the conformity vector. The closerelationship between fundamental and approximate CF-based concepts in module networks can be used to pro-vide an intuitive interpretation of network concepts inmodules. We demonstrate that these high correlationsbetween module concepts remain significant even afteradjusting the analysis for differences in module size [seeAdditional file 2].

The third result (Observation 3) shows that the meanclustering coefficient is determined by the density and thenetwork heterogeneity in approximately factorizable net-works. Further, the topological overlap between twonodes is determined by the maximum of their respectiveconnectivities and the heterogeneity. Thus, seemingly dis-parate network concepts satisfy simple and intuitive rela-tionships in these special but biologically important typesof networks.

The fourth result (Observation 4) is that in approximatelyfactorizable networks, fundamental network concepts canbe expressed as simple functions of the connectivity.Under mild assumptions, we argue that the clusteringcoefficient and the topological overlap matrix can beapproximated by simple functions of the connectivity.

Our empirical data also highlight how network conceptsdiffer between subnetworks of 'proper' modules and thesubnetwork comprised of improper (grey) module nodes,see Table 1. For all applications, we find that proper mod-ules have high factorizability, high density, high meanconformity. Based on our theoretical derivations, it comesas no surprise that proper modules also have a high aver-age clustering coefficient and a high centralization whencompared to the improper module. But we find no differ-ence in heterogeneity between proper and improper mod-ule networks.

As a consequence of approximate factorizability, networkconcepts with disparate meanings in social network the-ory are closely related in module networks. Our resultsshed some light on the relationship between network con-cepts traditionally used by social scientists (e.g. centraliza-tion, heterogeneity) and concepts used by systemsbiologists (e.g. topological overlap). For example, equa-tion (13) shows that in module networks, the topological

overlap between a hub gene and other module genes isrelated to the centralization.

ConclusionUsing several protein-protein interaction and gene co-expression networks, we provide empirical evidence thatsubnetworks comprised of module nodes often satisfy animportant structural property, which we call 'approximatefactorizability'. In these types of networks, simple rela-tionships exist between seemingly disparate network con-cepts. Several network concepts with very differentmeanings in general networks turn out to be highly corre-lated across modules. These results are pertinent for sys-tems biology since a biological pathways may correspondto an approximately factorizable module network.

MethodsThe adjacency matrix and notationWe study the properties of an adjacency matrix (network)A that satisfies the following three conditions:

(A.1) A is symmetric and has dimension n × n.

(A.2) The entries of A are bounded within [0, 1], that is, 0≤ aij ≤ 1 for all 1 ≤ i,j ≤ n.

(A.3) The diagonal elements of A are all 1, that is, aii = 1for all 1 ≤ i ≤ n.

Matrix and vector notation

We will make use of the following notations. We denoteby ei the unit vector whose i-th entry equals 1 and by 1 the

'one' vector whose components all equal 1. The Frobenius

matrix norm is denoted by . The

transpose of a matrix or vector is denoted by the super-script τ . For any real number p, we use the notation Mp andvp to denote the element-wise power of a matrix M and avector v respectively. We define the function Sp(·) for a

vector v as Sp(v) = ∑i = (vp)τ 1. Further denote by I the

identity matrix and by diag(v2) a diagonal matrix with its

i-th diagonal component given by ,i = 1, ...,n. We

define the maximum function max(M) as the maximumentry of matrix M and max(v) as the maximum entry ofthe vector v. Similarly we define the minimum functionmin(·). Also, we define mean(v) = S1(v)/n and variance(v)

= S2(v)/n - (S1(v)/n)2.

M mF ijji= ∑∑ 2

vip

vi2

Page 13 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

Uniqueness of the conformity for an exactly factorizable networkOne can easily show that the vector CF is not unique if anexactly factorizable network contains only n = 2 nodes.However, for n > 2 the conformity is uniquely definedwhen dealing with a weighted network where aij > 0.

Specifically, we prove the following statement. If A is an n× n (n ≥ 3) dimensional adjacency matrix with positiveentries (aij > 0), then the system of equations in (7) has atmost one solution CF with positive entries. If the solutionexists, it is given by

where denotes the 'product connectivity' of

the i-th node.

Proof: by assumption, we have aij = CFiCFj for a positive

vector CF and n ≥ 3. Multiplying both sides of equation(7) yields

.

Since is positive, we find

. Similarly, eliminat-

ing the i-th row and column from A yields

. Since , we conclude that CFi is

uniquely defined by

Network concept functions and fundamental network conceptsIn general, we define a network concept function to be a ten-sor valued function (e.g. the connectivity vector) that

takes a square matrix (e.g. the network adjacency matrix)as input.

Denote by M = [mij] a general n × n matrix. Then we willstudy the following network concept functions:

where the components of matrix BM in the denominatorof the clustering coefficient function are given by bij = 1 ifi ≠ j and bii = Ind(mii > 0). Here the indicator functionInd(·) takes on the value 1 if the condition is satisfied and0 otherwise.

For the sake of brevity, we study only a limited selectionof network concept functions and do not claim that theseare more important than others studied in the literature.Our general formalism for relating fundamental networkconcepts to their approximate CF-based analogs shouldallow the reader to adapt our derivations to alternativeconcepts as well.

Now we are ready to define the fundamental network con-cepts that are studied in this article.

Definition 5 (Fundamental Network Concept) The fun-damental network concepts of a network A are defined by eval-uating the network functions (equation (21)) on A - I, i.e.

FundamentalNetworkConcept = NetworkConcept(A - I).

As special cases of this definition, we find the followingconcepts. The connectivity (also known as degree) of thei-th node is given by

CFp

pi

i

mmn n

n

=

( )⎛

⎜⎜⎜⎜

⎟⎟⎟⎟=

∏ 1

1 2 1

12

/( ( )), (20)

p ai ijjn= =∏ 1

a CFCF CFlml mm l ml mm lln n

≠ ≠ =

−∏∏ ∏∏ ∏= = ( )1

2 1( )

CFlln=∏ 1

CF alln

lml mmn

= ≠−∏ ∏∏= ( )1

12 1( )

CF a a all i lml m im in

lml mm lil i≠ ≠≠−

≠ ≠∏ ∏∏ ∏∏ ∏= ( ) = ( )⎛⎝⎜

⎞⎠⎟,

( )1

2 12

12(( )n−1

CF CF CFi lln

ll i= = ≠∏ ∏1

CFa

a

ai

lml mmn

lml m im in

lml mm=( )

( )=

( )≠−

≠≠−

≠∏∏

∏∏

∏∏1

2 1

12 1

12( )

,( )

(( )

( )

/

n

lml mmn

imm

n n

i

mmn

a

a

p

p

≠− =

=

∏∏∏

( )⎛

⎝⎜⎜

⎠⎟⎟

=

( )

1

12 2 1

12

1

1 (( ( )).

2 1

12

n

n

−⎛

⎜⎜⎜⎜

⎟⎟⎟⎟

Connectivity M m M

Density Mm

n n

Cent

i ij ij

ijji

( ) ,

( )( )

,

= =

=−

∑∑

e 1τ

1

rralization Mn

n

M

nDensity M

Heterogenei

( )max( )

( ) ,=− −

−⎛⎝⎜

⎞⎠⎟2 1

1

tty Mn MM

M

TopOverlap MMM M

iji j i j

( )( )

( ),

( )min

= −

=+

1 1

1 1

e e e e

τ

τ

τ τ

21

{{ , },

( ) ,

e 1 e 1 e e

e e

e e

i j i j

ii i

i M i

M M M

ClusterCoef MMMM

MB M

τ τ τ

τ

τ

+ −

=

1

(21)

k Connectivity A I ai i ijj i

= − =≠∑( ) .

Page 14 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

The line density [13] equals the mean adjacency, i.e

For notational convenience, we sometimes omit the refer-ence to the adjacency matrix and simply write Density todenote the fundamental network concepts.

The normalized connectivity centralization (also knownas degree centralization) [14] is given by

Our definition of the network heterogeneity equals thecoefficient of variation of the connectivity distribution,i.e.

Note that Heterogeneity(b * M) = Heterogeneity(M) for ascalar b ≠ 0.

The clustering coefficient of node i is a density measureof local connections, or 'cliquishness' [19,20]. Specifi-cally,

The topological overlap between nodes i and j reflectstheir relative interconnectedness. It is defined by

where lij = ∑u≠i,jaiuauj.

Network concepts in exactly factorizable networksIn the following, we will present explicit formulas for thefundamental network concepts in Definition 5 when theadjacency matrix A is exactly factorizable, i.e. if aij =CFiCFj. We define the CF-based adjacency matrix as fol-lows

ACF := CF CFτ - diag(CF2) + I, (27)

where diag(CF2) denotes the diagonal matrix with diago-

nal elements , i = 1 ...n. Then one can easily show that

for exactly factorizable networks

Using our definition of network concept functions inequations (21), one can easily derive the following formu-las for NetworkConcept(ACF - I) in terms of the quantities

Sp(CF) = ∑i .

Approximate CF-based network concepts in general networksWhen ACF - I is used as input of a network concept func-tion, it gives rise to a CF-based network concept asdetailed in the following

Definition 6 (CF-based Network Concepts) Assume thatthe conformity vector CF can be defined for a general adjacencymatrix A. Then the CF-based network concepts are defined byevaluating the network concept functions on ACF - I = CF CFτ -diag(CF2), i.e.

NetworkConceptCF := NetworkConcept(ACF - I).

By definition, fundamental network concepts are equal totheir CF-based analogs if A is exactly factorizable.

In the following, we define approximate CF-based analogsof the fundamental network concepts. The theoreticaladvantage of these approximate CF-based concepts is thatthey satisfy simple relationships. Define the approximateCF-based adjacency matrix as follows

ACF,app = CF CFτ. (30)Note that only the diagonal elements differ betweenACF,app and ACF . We define the approximate CF-based net-

Density A Ia

n n

S

n n

mean

n

ijj ii( )

( )

( )

( )

( ).− =

−=

−=

−≠∑∑1 1 1

1 k k

(22)

Centralization A In

n nDensity

n

n n( )

max( )

( )(− =

− −−⎛

⎝⎜⎞⎠⎟

=− −2 1 2 1

k

))(max( ) ( )).k k− mean

(23)

Heterogeneity A Iiance

mean

nS

S( )

( )

( )

( )

( ).− = = −

var k

k

k

k2

12

1

(24)

ClusterCoef ClusterCoef A In a a a

ai i

i

i

il lm mim i ll i

il

= − = = ≠≠ ∑∑( ) ,

πll i ill i

a≠ ≠∑ ∑( ) −{ }2 2.

(25)

TopOverlap TopOverlap A Il a

k k aij ijij ij

i j ij= − =

++ −

( )min{ , }

,1

(26)

CFi2

A A

NetworkConcept A I NetworkConcept A ICF

CF

=− = −

,

( ) ( ).

(28)

CFip

Connectivity A I CF S CF

Density A IS

i CF i i

CF

( ) ( ) ,

( )( )

− = −

− =

12

12

CF

CF −−−

− =−

S

n n

Centralization A In

n

ConnectivityCF

2

1

2

( )

( ),

( )max( (

CF

AA I

nDensity A I

Heterogeneity A In S

CFCF

CF

−−

− −⎛⎝⎜

⎞⎠⎟

− =

))( ) ,

( )(

1

2(( ) ( ) ( ) ( ) ( ))

( ( ) ( )),

CF CF CF CF CF

CF CF

S S S S

S S

Clus

12

3 1 4

12

22

21

− +−

tterCoef A IS CF S CF

S CFi CF

i i

i

( )( ( ) ) ( ( ) )

( ( ) )− =

− − −−

22 2

44

12

CF CF

CF −− −

− =− −

( ( ) ).

( )( ( )

S CF

TopOverlap A ICF CF S CF CF

i

ij CFi j i

22 2

22

CF

CF jj i j

i i j j i j

CF CF

CF S CF CF S CF CF CF

2

1 1 1

)

min( ( ( ) ), ( ( ) ))

+− − + −CF CF

(29)

Page 15 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

work concepts by using ACF,app as input of the networkconcept functions as detailed in the following

Definition 7 (Approximate CF-based Network Con-cepts) The approximate CF-based network concepts of a net-work A with conformity CF are defined by evaluating thenetwork functions (equations (21)) on ACF,app = CF CFτ, i.e.

NetworkConceptCF,app := NetworkConcept(ACF,app).

In approximately factorizable networks, fundamental network concepts are approximately equal to their approximate CF-based analogs

Here we will provide a heuristic derivation of Observation2. Since the components of CF are positive, one can easily

show that S4(CF) ≤ S2(CF)2. For many large, exactly factor-

izable networks, the ratio S4(CF)/S2(CF)2 is close to 0.

Since S4(CF)/S2(CF)2 =

, this implies that ACF -

I ≈ ACF,app. Since the network concept functions are contin-

uous functions, this implies NetworkConcept(ACF - I) ≈ Net-

workConcept(ACF,app). These derivations are summarized in

the following

Observation 8 (Approximate Formulas for CF-basedConcepts) If S4(CF)/S2(CF)2 ≈ 0, then

NetworkConcept(ACF - I) ≈ NetworkConcept(ACF,app).(31)

In particular, for exactly factorizable networks (i.e. A - I =ACF - I), this means that the fundamental network con-cepts can be approximated by their approximate CF-basedanalogs.

In our real data applications, we show empirically thatequation (31) holds even in networks that satisfy theassumptions of Observation 8 only approximately.

In the appendix (equation (43)), we define a measure ofnetwork factorizability as follows

Thus, in approximately factorizable networks (i.e. F(A)close to 1), A - I can be approximated by ACF - I. For a con-tinuous network functions, this implies

NetworkConcept(A -I) ≈ NetworkConcept(ACF - I),

i.e. the fundamental network concepts are approximatelyequal to their CF-based analogs in approximately factoriz-able networks. Observation 8 states that

NetworkConcept(ACF -I) ≈ NetworkConcept(ACF,app).

Combining the last two equations leads to NetworkCon-cept(A - I) ≈ NetworkConcept(ACF,app). These derivations aresummarized as follows.

In approximately factorizable networks, the fundamentalnetwork concepts are approximately equal to theirapproximate CF-based analogs, i.e.

FundamentalNetworkConcept ≈ NetworkConceptCF,app.

Construction of gene co-expression networksGene co-expression networks are constructed from micro-array data that measures the transcriptional response ofcells to changing conditions. We consider the case of ngenes with gene expression profiles across m microarraysamples. Thus, the gene expression profiles are given byan n × m matrix

X = [xij] = (x1 x2 � xn)τ, i = 1, ..., n; j = 1, ..., m,(33)

where the i-th row is the transcriptional responses of

the i-th gene.

Recently, several groups have suggested thresholding thepairwise Pearson correlation coefficient cor(xi, xj) in orderto arrive at gene co-expression networks, which are some-times referred to as 'relevance' networks [11,32]. In thesenetworks, a node corresponds to the gene expression pro-file of a given gene. The corresponding adjacency matrix isdetermined from a measure of co-expression between thegenes. In the examples below, we will use the absolutevalue of the Pearson correlation coefficient between thegene expression profiles to measure co-expression.

To transform the co-expression measure into an adja-cency, one can make use of an adjacency function. Thechoice of the adjacency function determines whether theresulting network will be weighted (soft-thresholding) orunweighted (hard-thresholding). The adjacency functionis a monotonically increasing function that maps theinterval [0, 1] into [0, 1]. A widely used adjacency func-tion is the signum function which implements 'hard'thresholding involving the threshold parameter τ. Specif-ically,

aij = Signum(|cor(xi, xj)|, τ) = Ind(|cor(xi, xj)| ≥ τ),(34)

( ) , ,A I A ACF CF app F CF app F− −

2 2

F AA I A I

A I

CF F

F

( )( ) ( )

,= −− − −

−1

2

2(32)

xiτ

Page 16 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

where the indicator function Ind(·) takes on the value 1 ifthe condition is satisfied and 0 otherwise. Hard threshold-ing using the signum function leads to intuitive networkconcepts (e.g., the node connectivity equals the numberof direct neighbors), but it may lead to a loss of informa-tion: if τ has been set to 0.8, there will be no connectionbetween two nodes if their similarity equals 0.79.

To avoid the disadvantages of hard thresholding, we pro-posed a 'soft' thresholding approach that raises the abso-lute value of the correlation to the power β ≥ 1 [21], i.e.

aij = Power(|cor(xi, xj)|, β) = |cor(xi, xj)|β. (35)In our yeast cell cycle gene co-expression network analy-sis, we followed the analysis steps described in [21].Briefly, we used the 2001 most varying and connectedgenes. Next, we used the power adjacency function with β= 7 (equation (35)) to construct a weighted gene co-expression network and the signum adjacency functionwith τ = 0.65 (equation (34)) to construct an unweightednetwork.

Using our R software tutorial, the reader can easily verifythat our conclusions are highly robust with respect to a)different ways of constructing co-expression networks andb) different ways of constructing modules.

Availability and requirementsAn R implementation and the data can be obtained fromthe internet: http://www.genetics.ucla.edu/labs/horvath/ModuleConformity/ModuleNetworks

Appendix: node conformity and factorizability of a general networkEquation (20) provides an explicit formula for the con-formity of a weighted, exactly factorizable network. For ageneral, non-factorizable network, we describe here howto compute the conformity by optimizing an objectivefunction. In the following, we assume a general n × n adja-cency matrix A where n > 2. Let v = (v1,v2, ...,vn)τ be a vectorof length n. We could define the conformity as a vector v*that minimizes the following objective function f(v) = ∑i∑j≠i(aij - vivj)2. But instead, we find the following equiva-lent formulation as a maximization problem more usefulsince it naturally gives rise to a measure of factorizability.

Specifically, we define the objective function

It is clear that FA(CF) = 1 for an exactly factorizable net-work with aij = CFiCFj for i ≠ j. Note that FA(v) ≤ 1 and FA(0

) = 0. One can easily show that if v* maximizes FA(v), then-v* also maximizes FA(v). Further, all components of v*must have the same sign since otherwise, flipping the signof the negative components leads to a higher value ofFA(v). This leads us to the following

Definition 9 (Conformity, Factorizability) We define theconformity CF as the vector with non-negative entries thatmaximizes FA(v). If there is more than one such maximizer,

then a maximizer closest to k/ is chosen. Further, we

define the factorizability F (A) as the corresponding maximumvalue FA(CF).

Our definition of the conformity is a generalization ofDefinition 7 since F(A) = 1 if, and only if, A is exactly fac-torizable with aij = CFiCFj for i ≠ j. The advantages of thisDefinition are briefly described in the discussion section.

In general, FA(v) may have multiple maximizers as can bedemonstrated with the block diagonal simulated example(equation (17)) by choosing n1 = n2 and b1 = b2. By form-ing the first derivative of the factorizability function FA(v)in terms of vi, one can show that a local maximum satisfies

i.e.

Proposition 10 (Expressions for the Factorizability) Ifthe conformity vector CF of the adjacency matrix A exists, thenthe factorizability F(A) is given by

Proof Since ,

it will be sufficient to show that

. From the

definition of the Frobenius norm of a matrix B, one can

show that where the trace of a matrix is

the sum of its diagonal elements. Thus,

. Using equation (38), we find that trace((A - I)(ACF - I)) =

Fa v v

a

A I diagA

ij i jj ii

ijj ii

F( ) :( )

( )

( )v

v vv= −

−= −

− + −≠

∑∑∑∑

1 1

2

2

2 τ 22

2A I F−.

(36)

S1( )k

a CF CF CFij jj i

i jj i≠ ≠

∑ ∑= 2, (37)

( ( )) || || .A I diag− + =CF CF CF CF222 (38)

F AA I

A I

S S

A I

CF F

F F

( )( ) ( )

.=−

−= −

2

22

242

CF CF (39)

F AA I diag

A I

F

F

( )( ) ( )

= −− + −

−1

2 2

2

CF CF CF τ

( ) ( )A I A I A I A ICF F F CF F− − − = − − −2 2 2

B trace B BF2 = ( )τ

( ) ( ) (( )( ))A I A I A I A I trace A I A ICF F F CF F CF− − − = − + − − × − −2 2 2 2

Page 17 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

tr((A - I)CF CFτ) - tr((A - I)diag(CF2)) = CFτ(A - I)CF =

. Thus,

The

remainder of the proof is straightforward.

Equation (38) suggests that the conformity is an eigenvec-tor of the 'hat' adjacency matrix

:= A - I + diag(CF2).

An algorithm for computing the conformity is based onthe following

Lemma 11 If A denotes a symmetric real matrix with eigenval-ues d1, ..., dn sorted according to their absolute values, i.e., |d1|

≥ |d2| ≥ ... ≥ |dn|, and the corresponding orthonormal eigenvec-

tors are denoted by u1, ..., un, then is minimized at

v* = u1.

The proof can be found in Horn and Johnson (1991).

Denote by CF(i - 1) an estimate of the conformity CF.Next define

(i - 1) = A - I + diag(CF(i - 1)2). (40)Define a new estimate of the conformity by

where (i - 1) and (i - 1) denote the largest eigen-

value and corresponding unit length eigenvector of (i -

1). One can easily show that all the components of (i

- 1) must have the same sign and we assume without lossof generality non-negative components. Lemma 11 with A

= (i - 1) implies that

Considering the diagonal elements, one can easily showthat

Thus, we arrive at the following

FA(CF(i)) ≥ FA(CF(i - 1)), (42)

which suggests a monotonic algorithm for computing CF.

Equation 16 suggests to choose k/ as a starting

value of the algorithm. These comments give rise to thefollowing

Definition 12 (Algorithmic Definition of Conformity,Factorizability) For a general network A, set CF(1) = k/

and apply the monotonic iterative algorithm described

by equations (40) and (41). If the limit CF(∞) exists, we

define it as the conformity CF = CF(∞). Further, we define thenetwork factorizability as

Note that the conformity satisfies equation (38) by defini-tion of convergence. One can easily show that 0 ≤ F(A) ≤1. Further, one can easily show that F(A) = 1 if, and onlyif, A is exactly factorizable with aij = CFiCFj, i.e. A - I = ACF- I.

The algorithm described by equations (40) and (41) ismonotonic (equation (42)). It is a special case of an algo-rithm described in [46] for fitting a least squares factoranalysis model with one factor. Theoretical properties ofthe algorithm are discussed in [46] and [48].

We find that for most real networks, the conformity ishighly related to the first eigenvector of the adjacencymatrix, i.e. the conformity vector CF is roughly equal to

u1 where d1 is the largest eigenvalue of A and u1 is the

corresponding unit length eigenvector with positive com-ponents.

A ICF F− 2

( ) ( )A I A I A I A ICF F F CF F− − − = − − −2 2 2

A

AF

− vvτ 2

d1

A

CF u( ) ( ) ( ),i d i i= − −1 11 1 (41)

d1 u1

A

u1

A

A I diag i i i

A I diag i

F− + − − − −

≥ − + − −

( ( ) ) ( ) ( )

( ( ) )

CF CF CF

CF CF

1 1 1

1

2 2

2

τ

(( ) ( ) .i iF

CF τ 2

A I diag i i i

A I diag i i

F− + − −

≥ − + −

( ( ) ) ( ) ( )

( ( ) ) ( ) (

CF CF CF

CF CF CF

1 2 2

2

τ

iiF

) .τ 2

S1( )k

S1( )k

F AA I A I

A I

CF F

F

( )( ) ( )

.= −− − −

−1

2

2(43)

d1

Page 18 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

Additional material

AcknowledgementsWe would like to acknowledge the grant support from Program Project Grant 1U19AI063603-01 and NINDS/NIMH 1U24NS043562-01. We are grateful for discussions with Andy Yip, Lora Bagryanova, Dan Geschwind, Johanna Hardin, Ken Lange, Peter Langfelder, Ai Li, Jake Lusis, Paul Mischel, Stan Nelson, Nan Zhang and Wei Zhao.

References1. Kim J, Bates DG, Postlethwaite I, Heslop-Harrison P, Cho KH: Least-

Squares Methods for Identifying Biochemical RegulatoryNetworks from Noisy Measurements. BMC Bioinformatics 2007,8:8.

2. Klamt S, Saez-Rodriguez J, Gilles E: Structural and FunctionalAnalysis of Cellular Networks with Cellnetanalyzer. BMC Sys-tems Biology 2007, 1(2):.

3. Schaub M, Henzinger T, Fisher J: Qualitative Networks: A Sym-bolic Approach to Analyze Biological Signaling Networks.BMC Systems Biology 2007, 1(4):.

4. Gille C, Hoffmann S, Holzhuetter H: METANNOGEN: CompilingFeatures of Biochemical Reactions Needed for the Recon-struction of Metabolic Networks. BMC Systems Biology 2007,1(5):.

5. Sotiropoulos V, Kaznessis Y: Synthetic Tetracycline-InducibleRegulatory Networks: Computer-Aided Design of DynamicPhenotypes. BMC Systems Biology 2007, 1(7):.

6. Albert R, Jeong H, Barabasi AL: Error and attack tolerance ofcomplex networks. Nature 2000, 406(6794):378-382.

7. Jeong H, Mason SP, Barabasi AL, Oltvai ZN: Lethality and central-ity in protein networks. Nature 2001, 411:41.

8. Han JD, Bertin N, Hao T, Goldberg D, Berriz G, Zhang L, Dupuy D,Walhout A, Cusick M, Roth F, Vidal M: Evidence for dynamicallyorganized modularity in the yeast protein-protein interac-tion network. Nature 2004, 430(6995):88-93.

9. Carlson M, Zhang B, Fang Z, Mischel P, Horvath S, Nelson SF: Geneconnectivity, function, and sequence conservation: Predic-tions from modular yeast co-expression networks. BMCGenomics 2006, 7(40):.

10. Horvath S, Zhang B, Carlson M, Lu K, Zhu S, Felciano R, Laurance M,Zhao W, Shu Q, Lee Y, Scheck A, Liau L, Wu H, Geschwind D, FebboP, Kornblum H, Cloughesy T, Nelson S, Mischel P: Analysis of onco-genic signaling networks in Glioblastoma identifies ASPM asa novel molecular target. Proc Natl Acad Sci USA 2006,103(46):17402-17407.

11. Carter S, Brechbuler C, MGriffin , Bond A: Gene co-expressionnetwork topology provides a framework for molecular char-acterization of cellular state. Bioinformatics 2004,20(14):2242-2250.

12. Oldham M, Horvath S, Geschwind D: Conservation and evolutionof gene co-expression networks in human and chimpanzeebrain. Proc Natl Acad Sci USA 2006, 103(47):17973-8.

13. Snijders T: The degree variance: An index of graph heteroge-neity. Social Networks 1981, 3:163-174.

14. Freeman L: Centrality in social networks: Conceptual clarifica-tion. Social Networks 1978, 1:215-239.

15. Ma HW, Zeng AP: The connectivity structure, giant strongcomponent and centrality of metabolic networks. Bioinformat-ics 2003, 19(11):1423-1430.

16. Barabasi AL, Oltvai ZN: Network Biology: Understanding thecells's functional organization. Nature Reviews Genetics 2004,5:101-113.

17. Pagel M, Meade A, Scott D: Assembly rules for protein networksderived from phylogenetic-statistical analysis of wholegenomes. BMC Evolutionary Biology 2007, 7(Suppl 1):S16.

18. Watts DJ: A simple model of global cascades on random net-works. PNAS 2002, 99(9):5766-5771.

19. Watts DJ, Strogatz SH: Collective dynamics of 'small-world'networks. Nature 1998, 393(6684):440-2.

20. Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL: Hierar-chical organization of modularity in metabolic networks. Sci-ence 2002, 297(5586):1551-5.

21. Zhang B, Horvath S: A general framework for weighted geneco-expression network analysis. Statistical Applications in Geneticsand Molecular Biology 2005, 4:17.

22. Gutteridge A, Kanehisa M, Goto S: Regulation of metabolic net-works by small molecule metabolites. BMC Bioinformatics 2007,8:88.

23. Zhao J, Yu H, Luo JH, Cao ZW, Li YX: Hierarchical modularity ofnested bow-ties in metabolic networks. BMC Bioinformatics2006, 7:386.

24. Spirin V, Gelfand MS, Mironov AA, Mirny LA: A metabolic networkin the evolutionary context: Multiscale structure and modu-larity. PNAS 2006, 103(23):8774-8779.

25. Yip AM, Horvath S: Gene network interconnectedness and thegeneralized topological overlap measure. BMC Bioinformatics2007, 8:22.

26. Li A, Horvath S: Network neighborhood analysis with themulti-node topological overlap measure. Bioinformatics 2007,23(2):222-231.

27. Kaufman L, Rousseeuw PJ: Finding Groups in Data. An Introduction toCluster Analysis New York: John Wiley & Sons, Inc; 1990.

28. Ghazalpour A, Doss S, Zhang B, Plaisier C, Wang S, Schadt E, ThomasA, Drake T, Lusis A, Horvath S: Integrating genetics and net-work analysis to characterize genes related to mouseweight. PloS Genetics 2006, 2(8):.

29. Ye Y, Godzik A: Comparative Analysis of Protein DomainOrganization. Genome Biology 2004, 14(3):343-353.

30. Gargalovic P, Imura M, Zhang B, Gharavi N, Clark M, Pagnon J, YangW, He A, Truong A, Patel S, Nelson S, Horvath S, Berliner J, Kirch-gessner T, Lusis A: Identification of inflammatory gene mod-ules based on variations of human endothelial cell responsesto oxidized lipids. PNAS 2006, 103(34):12741-6.

31. Breitkreutz B, Stark C, Tyers M: The GRID: the general reposi-tory for interaction datasets. Genome Biol 2003, 4:R23.

Additional file 1Complete list of network concepts in the modules. An extended version of Table 1.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-24-S1.xls]

Additional file 3Functional enrichment analysis (gene ontology) of the Drosophila PPI modules (DAVID software).Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-24-S3.xls]

Additional file 4Functional enrichment analysis (gene ontology) of the yeast PPI modules (DAVID software).Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-24-S4.xls]

Additional file 5Functional enrichment analysis (gene ontology) of the yeast gene co-expression modules (DAVID software).Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-24-S5.xls]

Additional file 2Network concepts and module size. Descriptions of how module concepts are related to module sizes in the Drosophila PPI, yeast PPI networks, and yeast gene co-expression networks.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-24-S2.pdf]

Page 19 of 20(page number not for citation purposes)

BMC Systems Biology 2007, 1:24 http://www.biomedcentral.com/1752-0509/1/24

Publish with BioMed Central and every scientist can read your work free of charge

"BioMed Central will be the most significant development for disseminating the results of biomedical research in our lifetime."

Sir Paul Nurse, Cancer Research UK

Your research papers will be:

available free of charge to the entire biomedical community

peer reviewed and published immediately upon acceptance

cited in PubMed and archived on PubMed Central

yours — you keep the copyright

Submit your manuscript here:http://www.biomedcentral.com/info/publishing_adv.asp

BioMedcentral

32. Butte AJ, Kohane IS: Mutual information relevance networks:Functional genomic clustering using pairwise entropy meas-urements. PSB 2000, 5:415-426.

33. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysisand display of genome-wide expression patterns. PNAS 1998,95(25):14863-14868.

34. Ge H, Liu Z, Church GM, Vidal M: Correlation between tran-scriptome and interactome mapping data from Saccharomy-ces cerevisiae. Nat Genet 2003, 33:15-16.

35. Spellman P, Sherlock G, Zhang M, Iyer V, Anders K: Comprehen-sive Identification of Cell Cycle-regulated Genes of the YeastSaccharomyces Cerevisiae by Microarray Hybridization. MolBiol Cell 1998, 9(12):3273-3297.

36. Dennis G, Sherman B, Hosack D, Yang J, Gao W, Lane HC, LempickiR: DAVID: Database for Annotation, Visualization, and Inte-grated Discovery. Genome Biology 2003, 4(5):P3.

37. Erdös P, Rényi A: On the evolution of random graphs. Publ MathInst Hung Acad Sci 1960, 5:17-60.

38. Barrat A, Barthelemy M, Pastor-Satorras R, Vespignani A: The archi-tecture of complex weighted networks. Proc Natl Acad Sci USA2004, 101:3747-52.

39. Deeds EJ, Ashenberg O, Shakhnovich EI: From The Cover: A sim-ple physical model for scaling in protein-protein interactionnetworks. PNAS 2006, 103(2):311-316.

40. Servedio V, Caldarelli G, Butta P: Vertex intrinsic fitness: How toproduce arbitrary scale-free networks. Physical Review E (Statis-tical, Nonlinear, and Soft Matter Physics) 2004, 70(5 Pt 2):056126.

41. Masuda N, Miwa H, Konno N: Geographical threshold graphswith small-world and scale-free properties. Physical Review E(Statistical, Nonlinear, and Soft Matter Physics) 2005, 71(3 Pt2A):036108.

42. Yeung MS, Tegner J, Collins J: Reverse engineering gene net-works using singular value decomposition and robust regres-sion. PNAS 2002, 99(9):6163-6168.

43. Alter O, Golub G: Reconstructing the pathways of a cellularsystem from genome-scale signals by using matrix and ten-sor computations. PNAS 2005, 102(49):17559-17564.

44. Price N, Reed J, Papin J, Famili I, Palsson B: Analysis of metaboliccapabilities using singular value Decomposition of ExtremePathway Matrices. Biophys J 2003, 84(2):794-804.

45. Bu D, Zhao Y, Cai L, Xue H, Zhu X, Lu H, Zhang J, Sun S, Ling L,Zhang N, Li G, Chen R: Topological structure analysis of theprotein-protein interaction network in budding yeast. NuclAcids Res 2003, 31(9):2443-2450.

46. de Leeuw J, Michailidis G: Block relaxation algorithms in statis-tics. Journal of Computational and Graphical Statistics 2000, 9:26-31.

47. Barabasi AL, Albert R: Emergence of scaling in random net-works. Science 1999, 286(5439):509-12.

48. Gifi A: Nonlinear multivariate analysis Wiley, Chichester, England; 1990.

Page 20 of 20(page number not for citation purposes)