“An Extension of Weighted Gene Co-Expression Network Analysis to Include Signed Interactions” Michael Mason Department of Statistics, UCLA

Contents

• Here we consider the application of a generalized WGCNA that keeps track of the sign of the co-expression information.

• Standard unsigned networks are based on $s_{ij} = |\mathrm{cor}(x_i, x_j)|$

• Here we focus on signed networks based on $s_{ij} = \frac{1 + \mathrm{cor}(x_i, x_j)}{2}$

General Framework of Network Construction

Step 1: Define a Gene Co-expression Similarity

Step 2: Define a Family of Adjacency Functions

Step 3: Determine the AF Parameters

Step 4: Define a Measure of Node Dissimilarity

Step 5: Identify Network Modules (Clustering)

Step 6: Find Biologically Interesting Modules

Step 7: Find Key Genes in Interesting Modules

Adjacency Functions: Hard and Soft Thresholding

• A network can be represented by an adjacency matrix, $A = [a_{ij}]$, that encodes how a pair of nodes is connected.

– A is a symmetric matrix with entries in [0,1].

– For unweighted networks, hard thresholding is applied to the similarity matrix S to yield A: if $s_{ij} > \tau$, then $a_{ij} = 1$; otherwise $a_{ij} = 0$.

– For weighted networks, soft thresholding is applied with $0 \le a_{ij} \le 1$ and $a_{ij} = s_{ij}^{\beta}$.

– Both types of adjacency functions can be applied to unsigned and signed co-expression similarity measures. In this analysis we employ soft thresholding.
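The two adjacency functions can be sketched in a few lines of Python. This is only an illustration: the function names, the toy similarity matrix, and the values of τ and β are made up for the example and do not come from the analysis.

```python
def hard_threshold(similarity, tau):
    """Unweighted adjacency: a_ij = 1 if s_ij > tau, else 0."""
    return [[1 if s > tau else 0 for s in row] for row in similarity]


def soft_threshold(similarity, beta):
    """Weighted adjacency: a_ij = s_ij ** beta."""
    return [[s ** beta for s in row] for row in similarity]


# Toy 3-gene similarity matrix (hypothetical values, symmetric, entries in [0, 1]).
S = [
    [1.0, 0.9, 0.5],
    [0.9, 1.0, 0.2],
    [0.5, 0.2, 1.0],
]

A_hard = hard_threshold(S, tau=0.6)  # tau chosen arbitrarily for the example
A_soft = soft_threshold(S, beta=2)   # beta chosen arbitrarily for the example
```

Hard thresholding discards the difference between similarities 0.5 and 0.2 (both become 0), whereas soft thresholding keeps them as distinct, down-weighted connection strengths.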

Defining a co-expression similarity measure that keeps track of the sign

Unsigned networks are based on the absolute value of the correlation: $s_{ij} = |\mathrm{cor}(x_i, x_j)|$

Signed networks preserve sign information from the correlation: $s_{ij} = \frac{1 + \mathrm{cor}(x_i, x_j)}{2}$

[Figure: two panels, one per similarity measure, each with axis cor(x_i, x_j).]
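A small sketch of how the two similarity measures treat a negatively correlated gene pair. The helper names and the toy expression profiles are assumptions made for the illustration; the Pearson correlation is written out by hand so the block has no dependencies.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long profiles (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def unsigned_similarity(x, y):
    """s_ij = |cor(x_i, x_j)|"""
    return abs(pearson(x, y))


def signed_similarity(x, y):
    """s_ij = (1 + cor(x_i, x_j)) / 2"""
    return (1 + pearson(x, y)) / 2


# Hypothetical expression profiles; x_j is perfectly anti-correlated with x_i.
x_i = [1.0, 2.0, 3.0, 4.0]
x_j = [8.0, 6.0, 4.0, 2.0]

s_unsigned = unsigned_similarity(x_i, x_j)
s_signed = signed_similarity(x_i, x_j)
```

For this pair the correlation is -1, so the unsigned similarity is at its maximum of 1 while the signed similarity is at its minimum of 0.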

Generalized Connectivity

• A gene’s connectivity (also known as degree) equals the row sum of the adjacency matrix. Intuitively for unweighted networks this is the number of direct neighbors a gene has.

• For our signed networks, the connectivity of the i-th gene measures the extent of positive correlations with the other genes in the network.

$k_i = \sum_j a_{ij}$
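As a minimal sketch, connectivity is just a row sum. The toy adjacency matrix below is hypothetical, and its diagonal is set to zero so that self-connections do not contribute; that choice is an assumption of the example.

```python
def connectivity(adjacency):
    """k_i = sum_j a_ij, the row sums of the adjacency matrix."""
    return [sum(row) for row in adjacency]


# Hypothetical weighted adjacency matrix with a zero diagonal (assumption).
A = [
    [0.00, 0.81, 0.25],
    [0.81, 0.00, 0.04],
    [0.25, 0.04, 0.00],
]

k = connectivity(A)
```

In a weighted network the result is a real number rather than a neighbor count: the first gene has two strong partners and the largest connectivity, the third gene the smallest.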

For high powers β, signed weighted networks exhibit approximate scale free topology

• Scale free topology refers to the frequency distribution of the connectivity k: $p(k) \sim k^{-\lambda}$

• p(k) = proportion of nodes that have connectivity k

[Figure: Frequency Distribution of Connectivity. Histogram with connectivity k on the horizontal axis and frequency on the vertical axis.]

How to check Scale Free Topology?

• Idea: log-transform p(k) and k and look at the scatter plot.

• The $R^2$ index of a linear model fit can be used to quantify scale free topology.

• In our cancer and mouse embryonic stem cell applications, we find $R^2$ = 0.97 and 0.94 for β = 12 and 22, respectively.
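The check can be sketched as follows: bin the connectivities, regress log10 p(k) on log10 k, and report the slope and R². The function name, the equal-width binning, the number of bins, and the toy data are all assumptions made for this illustration, not the procedure used to obtain the values above.

```python
import math


def scale_free_fit(connectivities, n_bins=10):
    """Regress log10 p(k) on log10 k over equal-width bins; return (slope, R^2)."""
    lo, hi = min(connectivities), max(connectivities)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("all connectivities are identical")
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    for k in connectivities:
        b = min(int((k - lo) / width), n_bins - 1)
        counts[b] += 1
        sums[b] += k
    xs, ys = [], []
    for count, total in zip(counts, sums):
        if count > 0 and total > 0:  # skip empty bins, which have no logarithm
            xs.append(math.log10(total / count))                # log10 of mean k in bin
            ys.append(math.log10(count / len(connectivities)))  # log10 of p(k)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared


# Hypothetical heavy-tailed connectivities: many small values, few large ones.
toy_k = [1.0 / (i + 1) for i in range(200)]
slope, r_squared = scale_free_fit(toy_k)
```

A negative slope corresponds to a positive exponent λ in p(k) ~ k^(-λ), and an R² close to 1 indicates that the log-log relationship is close to linear.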

The scale free topology criterion for choosing the parameter values of an adjacency function

A) CONSIDER ONLY THOSE PARAMETER VALUES THAT RESULT IN APPROXIMATE SCALE FREE TOPOLOGY

B) SELECT THE PARAMETERS THAT RESULT IN THE HIGHEST MEAN NUMBER OF CONNECTIONS

• Criterion A is motivated by the finding that most metabolic networks (including gene co-expression networks, protein-protein interaction networks and cellular networks) have been found to exhibit a scale free topology

• Criterion B leads to high power for detecting modules (clusters of genes) and hub genes.
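The two criteria can be combined into a simple selection rule, sketched below. The R² cut-off that defines "approximate" scale free topology, the candidate table, and all numbers in it are hypothetical; the slides do not state a specific threshold.

```python
def choose_power(candidates, r2_min):
    """Criterion A: keep powers with R^2 >= r2_min.
    Criterion B: among those, pick the power with the highest mean connectivity."""
    admissible = [c for c in candidates if c["r2"] >= r2_min]
    if not admissible:
        return None
    return max(admissible, key=lambda c: c["mean_k"])["beta"]


# Hypothetical fit results for three candidate powers (made-up numbers).
candidates = [
    {"beta": 4, "r2": 0.60, "mean_k": 40.0},
    {"beta": 8, "r2": 0.85, "mean_k": 12.0},
    {"beta": 12, "r2": 0.95, "mean_k": 4.0},
]

chosen = choose_power(candidates, r2_min=0.8)  # r2_min is an assumed cut-off
```

Raising β typically improves the scale free fit but lowers the mean connectivity, so the rule picks the smallest-cost power that still passes criterion A.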

[Figure: Trade-off between criterion A and criterion B when varying the power β in the signed cancer network]

[Figure: Trade-off between criterion A and criterion B when varying the power β in the signed mouse embryonic stem cell network]

How to measure distance in a network?

• Biological answer: look at shared neighbors with the topological overlap matrix.

– Intuition: if two people share the same friends, they are close in a social network.

– In an unsigned network, negatively correlated genes are treated as friends, while in the signed network they are treated as enemies.

– Two genes have high topological overlap if they share (positively correlated) friends.

Topological Overlap leads to a network distance measure (Ravasz et al 2002)

• Generalized in Zhang and Horvath (2005) to the case of weighted networks.

$TOM_{ij} = \frac{\sum_u a_{iu} a_{uj} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}$

$DistTOM_{ij} = 1 - TOM_{ij}$
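The formula above translates into a short Python sketch. Several details are assumptions made for the illustration: the sum over u skips u = i and u = j, the connectivities exclude the diagonal, the diagonal of the TOM matrix is set to 1, and the toy adjacency matrix is made up.

```python
def tom_matrix(adjacency):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adjacency)
    # Connectivity without the self-connection (assumption).
    k = [sum(adjacency[i][j] for j in range(n) if j != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]  # diagonal fixed at 1 (assumed convention)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Shared-neighbor term, skipping u = i and u = j (assumption).
            shared = sum(adjacency[i][u] * adjacency[u][j]
                         for u in range(n) if u != i and u != j)
            a_ij = adjacency[i][j]
            tom[i][j] = (shared + a_ij) / (min(k[i], k[j]) + 1 - a_ij)
    return tom


def tom_dissimilarity(tom):
    """DistTOM_ij = 1 - TOM_ij."""
    return [[1 - t for t in row] for row in tom]


# Hypothetical 3-gene network in which every pair has adjacency 0.5.
A = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]

T = tom_matrix(A)
D = tom_dissimilarity(T)
```

The denominator is always at least 1 because each connectivity is at least as large as a_ij, so the ratio is well defined for any adjacency matrix with entries in [0,1].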

SIMPLE TOM example

• In this simple example $TOM_{1,2}$ reduces to a.

• If $\mathrm{cor}(x_1, x_u) = \mathrm{cor}(x_u, x_2) = -1$, then in an unsigned network $TOM_{1,2} = 1$, while in a signed network $TOM_{1,2} = 0$.

Application: comparing Signed to Unsigned Networks using brain cancer data described in

Horvath S, Zhang B, Carlson M, Lu KV, Zhu S, Felciano RM, Laurance MF, Zhao W, Shu Q, Lee Y, Scheck AC, Liau LM, Wu H, Geschwind DH, Febbo PG, Kornblum HI, Cloughesy TF, Nelson SF, Mischel PS (2006) "Analysis of Oncogenic Signaling Networks in Glioblastoma Identifies ASPM as a Novel Molecular Target", PNAS, November 14, 2006, vol. 103, no. 46, 17402-17407

Preservation of Modules between Unsigned and Signed Methods in Brain Cancer

[Figure panels: Unsigned Network, Signed Network]

Message: no difference between signed and unsigned analysis

Analysis of Networks in Mouse ESC data described in Ivanova et al

Preservation of Large Modules between Unsigned and Signed Methods in Mouse Embryonic Stem Cells

Signed network exhibits 4 additional modules

[Figure panels: Unsigned Network, Signed Network]

Gene Significance Definition

• Differential gene expression test between control versus knockout

– Control: mouse microarray samples treated with empty virus

– Knockout: microarray samples treated with an Oct4 RNAi (Oct4 is of major biological importance in ES pluripotency)

• Individual gene significance = t-test statistic

– Note that the t-test keeps track of the sign

• Goal: to relate gene significance to intramodular connectivity
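A signed gene significance of this kind can be sketched as a two-sample t statistic. The slides do not say which variant of the t-test was used, so the pooled-variance (Student) form, the direction of the difference (knockout minus control), and the toy expression values below are all assumptions.

```python
import math
from statistics import mean, variance


def gene_significance(control, knockout):
    """Pooled-variance two-sample t statistic, knockout minus control (assumed)."""
    n0, n1 = len(control), len(knockout)
    pooled_var = ((n0 - 1) * variance(control) + (n1 - 1) * variance(knockout)) / (n0 + n1 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n0 + 1 / n1))
    return (mean(knockout) - mean(control)) / standard_error


# Hypothetical expression values for one gene (made-up numbers).
control_samples = [5.0, 6.0, 7.0]
knockout_samples = [2.0, 3.0, 4.0]

gs = gene_significance(control_samples, knockout_samples)
```

Under the assumed direction, a negative value means the gene is expressed at a lower level after the knockdown, and swapping the two groups flips the sign, which is the property the signed analysis relies on.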

Absolute Mean Significance Increases Once New Modules are Found via Signed WGCNA

[Figure panels: Unsigned, Signed]

Message: signed networks allowed us to split large modules into smaller, biologically more significant modules

Behind the Scenes: Brown Module is Hidden within Turquoise

[Figure panels: Unsigned, Signed]

Signed WGCNA shows influence of known pluripotency transcription factors

• Once the transcription factors (TFs) are separated into their own module, both their connectivity and their relative gene significance increase.

Brown Module

• Shows that Oct4 is a highly connected hub and that it is highly significant in this module.

• This module could not have been detected in an unsigned network.

• Note that the signed intramodular connectivity is a biologically important screening variable.

• Biological importance of the module is verified by 2-fold enrichment of Oct4 and Nanog binding.

Conclusion

• Signed weighted gene co-expression network analysis is a robust extension of unsigned WGCNA. It preserves large modules while finding new and biologically interesting modules, thus facilitating a systems-level understanding of gene and/or protein interactions.

Acknowledgement

Biostatistics/Bioinformatics

• Steve Horvath

• Qing Zhou

• Peter Langfelder
