“An Extension of Weighted Gene Co-Expression Network
Analysis to Include Signed Interactions”
Michael Mason
Department of Statistics, UCLA
Contents
• Here we consider the application of a generalized WGCNA that keeps track of the sign of the co-expression information.
• Standard unsigned networks are based on $s_{ij} = |\mathrm{cor}(x_i, x_j)|$
• Here we focus on signed networks based on $s_{ij} = \frac{1 + \mathrm{cor}(x_i, x_j)}{2}$
General Framework Of Network Construction
Step 1: Define a Gene Co-expression Similarity
Step 2: Define a Family of Adjacency Functions
Step 3: Determine the AF Parameters
Step 4: Define a Measure of Node Dissimilarity
Step 5: Identify Network Modules (Clustering)
Step 6: Find Biologically Interesting Modules
Step 7: Find Key Genes in Interesting Modules
Adjacency Functions: Hard and Soft Thresholding
• A network can be represented by an adjacency matrix, $A = [a_{ij}]$, that encodes how a pair of nodes is connected.
– A is a symmetric matrix with entries in [0,1].
– For unweighted networks, hard thresholding is applied to the similarity matrix S to yield A: if $s_{ij} > \tau$ then $a_{ij} = 1$, else $a_{ij} = 0$.
– For weighted networks, soft thresholding is applied with $0 < a_{ij} < 1$ and $a_{ij} = s_{ij}^{\beta}$.
– Both types of adjacency functions can be applied to unsigned and signed co-expression similarity measures. In this analysis we employ soft thresholding.
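The two adjacency functions can be sketched in a few lines of plain Python. This is only an illustration: the function names, the small similarity matrix and the values of tau and beta are made up for the example and do not come from the talk.

```python
def hard_threshold(similarity, tau):
    """Unweighted adjacency: a_ij = 1 if s_ij > tau, else 0."""
    return [[1.0 if s > tau else 0.0 for s in row] for row in similarity]


def soft_threshold(similarity, beta):
    """Weighted adjacency: a_ij = s_ij ** beta."""
    return [[s ** beta for s in row] for row in similarity]


# Hypothetical 3-gene similarity matrix (symmetric, entries in [0, 1]).
S = [
    [1.0, 0.9, 0.4],
    [0.9, 1.0, 0.5],
    [0.4, 0.5, 1.0],
]

A_hard = hard_threshold(S, tau=0.6)   # pairs either connected or not
A_soft = soft_threshold(S, beta=2)    # connection strengths stay continuous
```

Soft thresholding keeps the ranking of the similarities while pushing weak ones toward zero, so no information is lost at an arbitrary cut-off.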
Defining a co-expression similarity measure that keeps track of the sign
Unsigned networks are based on the absolute value of the correlation: $s_{ij} = |\mathrm{cor}(x_i, x_j)|$
Signed networks preserve sign information from the correlation: $s_{ij} = \frac{1 + \mathrm{cor}(x_i, x_j)}{2}$
[Figure: two panels, one per similarity definition; x-axis: Cor(xi,xj)]
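A minimal sketch of the two similarity measures, assuming Pearson correlation between two expression profiles. The helper names and the toy profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def unsigned_similarity(x, y):
    """s_ij = |cor(x_i, x_j)|: the sign of the correlation is discarded."""
    return abs(pearson(x, y))


def signed_similarity(x, y):
    """s_ij = (1 + cor(x_i, x_j)) / 2: maps [-1, 1] onto [0, 1]."""
    return (1.0 + pearson(x, y)) / 2.0


# Two perfectly anti-correlated toy profiles.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [8.0, 6.0, 4.0, 2.0]

s_unsigned = unsigned_similarity(gene_a, gene_b)
s_signed = signed_similarity(gene_a, gene_b)
```

For this anti-correlated pair the unsigned similarity is maximal while the signed similarity is minimal, which is exactly the distinction the two definitions are meant to capture.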
Generalized Connectivity
• A gene’s connectivity (also known as degree) equals the row sum of the adjacency matrix. Intuitively for unweighted networks this is the number of direct neighbors a gene has.
• For our signed networks, the connectivity of the i-th gene measures the extent of positive correlations with the other genes in the network.
$k_i = \sum_{j} a_{ij}$
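As a small illustration, connectivity is just a row sum. The sketch below assumes the diagonal of the adjacency matrix is zero (no self-connections); the matrices are made-up examples.

```python
def connectivity(adjacency):
    """k_i = sum over j of a_ij (row sums of the adjacency matrix)."""
    return [sum(row) for row in adjacency]


# Unweighted example: gene 0 is connected to genes 1 and 2.
A_unweighted = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]

# Weighted example with hypothetical soft-thresholded adjacencies.
A_weighted = [
    [0.0, 0.81, 0.16],
    [0.81, 0.0, 0.25],
    [0.16, 0.25, 0.0],
]

k_unweighted = connectivity(A_unweighted)  # counts of direct neighbors
k_weighted = connectivity(A_weighted)      # sums of connection strengths
```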
For high powers of beta, signed weighted networks exhibit approximate scale free topology
• Scale Free Topology refers to the frequency distribution of the connectivity k, $p(k) \sim k^{-\lambda}$
• p(k)=proportion of nodes that have connectivity k
[Figure: Frequency Distribution of Connectivity; x-axis: Connectivity k, y-axis: Frequency]
How to check Scale Free Topology?
• Idea: log-transform p(k) and k and look at scatter plots.
• The $R^2$ index of a linear model fit can be used to quantify scale free topology.
• In our cancer and mouse embryonic stem cell applications, we find $R^2$ = 0.97 and 0.94 for β = 12 and 22, respectively.
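The check can be sketched as follows: bin the connectivities, take logs of k and p(k), and fit a straight line. The binning scheme (equal-width bins, bin mean as k) and all names are assumptions made for this illustration, not necessarily what was used in the talk.

```python
import math


def binned_distribution(connectivities, n_bins=10):
    """Return (mean k per bin, proportion of nodes per bin) for non-empty bins."""
    lo, hi = min(connectivities), max(connectivities)
    if hi == lo:
        raise ValueError("all connectivities are identical")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    for k in connectivities:
        b = min(int((k - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        counts[b] += 1
        sums[b] += k
    n = len(connectivities)
    ks = [s / c for s, c in zip(sums, counts) if c > 0]
    ps = [c / n for c in counts if c > 0]
    return ks, ps


def loglog_fit(ks, ps):
    """Least-squares fit of log10 p(k) on log10 k; returns (slope, R^2)."""
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(p) for p in ps]
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


# An exact power law p(k) proportional to k^-2 gives a straight line on the log-log scale.
slope, r2 = loglog_fit([1, 2, 4, 8], [1.0, 0.25, 0.0625, 0.015625])
```

A slope near minus lambda with R^2 close to 1 indicates approximate scale free topology; strictly positive connectivities are assumed so that the logarithms are defined.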
The scale free topology criterion for choosing the parameter values of an adjacency function.
A) CONSIDER ONLY THOSE PARAMETER VALUES THAT RESULT IN APPROXIMATE SCALE FREE TOPOLOGY
B) SELECT THE PARAMETERS THAT RESULT IN THE HIGHEST MEAN NUMBER OF CONNECTIONS
• Criterion A is motivated by the finding that most metabolic networks (including gene co-expression networks, protein-protein interaction networks and cellular networks) exhibit a scale free topology.
• Criterion B leads to high power for detecting modules (clusters of genes) and hub genes.
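Criteria A and B can be combined into a simple selection rule, sketched below. The R^2 cut-off of 0.8 and the candidate values are hypothetical and chosen only to make the example run; they are not results from the talk.

```python
def choose_power(candidates, r2_cutoff=0.8):
    """Pick beta by the two-step criterion.

    candidates: list of (beta, scale_free_r2, mean_connectivity) tuples.
    A) keep only powers whose scale-free fit R^2 reaches the cut-off;
    B) among those, return the power with the highest mean connectivity.
    Returns None if no power satisfies criterion A.
    """
    admissible = [c for c in candidates if c[1] >= r2_cutoff]
    if not admissible:
        return None
    return max(admissible, key=lambda c: c[2])[0]


# Made-up numbers: R^2 rises with beta while mean connectivity falls.
candidates = [
    (4, 0.55, 40.0),
    (8, 0.82, 12.0),
    (12, 0.95, 4.0),
    (16, 0.96, 1.5),
]

beta_loose = choose_power(candidates, r2_cutoff=0.8)
beta_strict = choose_power(candidates, r2_cutoff=0.9)
```

Because mean connectivity typically decreases as the power grows, this rule tends to pick the smallest power that passes the scale-free cut-off.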
Trade-off between criterion A and criterion B when varying the power β in signed cancer network
Trade-off between criterion A and criterion B when varying the power β in signed mouse embryonic stem cell network
How to measure distance in a network?
• Biological Answer: look at shared neighbors with the topological overlap matrix.
– Intuition: if 2 people share the same friends they are close in a social network.
– In an unsigned network negatively correlated genes are treated as friends while in the signed network they are treated as enemies.
– Two genes have high topological overlap if they share (positively correlated) friends.
Topological Overlap leads to a network distance measure (Ravasz et al 2002)
• Generalized in Zhang and Horvath (2005) to the case of weighted networks.
$\mathrm{TOM}_{ij} = \frac{\sum_{u} a_{iu} a_{uj} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}$
$\mathrm{DistTOM}_{ij} = 1 - \mathrm{TOM}_{ij}$
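A direct, unoptimized sketch of the topological overlap computation, TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij) with DistTOM_ij = 1 - TOM_ij. It assumes the sum runs over nodes u other than i and j, that connectivities exclude the diagonal, and that the diagonal of TOM is set to 1; these conventions and the function names are assumptions of the example.

```python
def topological_overlap(A):
    """Weighted topological overlap matrix for a symmetric adjacency matrix A."""
    n = len(A)
    # Connectivity of each node, excluding any self-connection.
    k = [sum(A[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Strength of connections routed through shared neighbors u.
            shared = sum(A[i][u] * A[u][j] for u in range(n) if u != i and u != j)
            tom[i][j] = (shared + A[i][j]) / (min(k[i], k[j]) + 1.0 - A[i][j])
    return tom


def tom_distance(A):
    """DistTOM_ij = 1 - TOM_ij."""
    return [[1.0 - t for t in row] for row in topological_overlap(A)]


# Three nodes, every pair connected with the same adjacency a = 0.5.
a = 0.5
A = [
    [0.0, a, a],
    [a, 0.0, a],
    [a, a, 0.0],
]

tom = topological_overlap(A)
dist = tom_distance(A)
```

In a three-node network where every pair has the same adjacency a, the formula gives (a^2 + a) / (a + 1) = a, so the overlap equals a itself. The denominator is always at least 1 because k_i is never smaller than a_ij.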
SIMPLE TOM example
• In this simple example $\mathrm{TOM}_{1,2}$ reduces to a.
• If $\mathrm{cor}(x_1, x_u) = \mathrm{cor}(x_u, x_2) = -1$, then in an unsigned network $\mathrm{TOM}_{1,2} = 1$, while in a signed network $\mathrm{TOM}_{1,2} = 0$.
Application: comparing Signed to Unsigned Networks using brain cancer data described in
Horvath S, Zhang B, Carlson M, Lu KV, Zhu S, Felciano RM, Laurance MF, Zhao W, Shu Q, Lee Y, Scheck AC, Liau LM, Wu H, Geschwind DH, Febbo PG, Kornblum HI, Cloughesy TF, Nelson SF, Mischel PS (2006) "Analysis of Oncogenic Signaling Networks in Glioblastoma Identifies ASPM as a Novel Molecular Target", PNAS, November 14, 2006, vol. 103, no. 46, 17402-17407
Preservation of Modules between Unsigned and Signed Methods in Brain Cancer
[Figure: two panels, Unsigned Network and Signed Network]
Message: no difference between signed and unsigned analysis
Analysis of Networks in Mouse ESC data described in Ivanova et al
Preservation of Large Modules between Unsigned and Signed Methods in Mouse embryonic stem cells.
Signed network exhibits 4 additional modules
[Figure: two panels, Unsigned Network and Signed Network]
Gene significance
Definition
• Differential gene expression test between control versus knockout
– Control: mouse microarray samples treated with empty virus
– Knockout: microarray samples treated with an Oct4 RNAi (Oct4 is of major biological importance in ES pluripotency)
• Individual gene significance = t-test statistic
– Note that the t-test keeps track of the sign
• Goal: To relate gene significance to intramodular connectivity
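A signed gene significance of this kind could be computed as in the sketch below. The slides do not say which t-test variant was used, so the unequal-variance (Welch) form, the knockout-minus-control sign convention and the toy expression values are all assumptions.

```python
import math


def t_statistic(control, knockout):
    """Welch t statistic for knockout versus control (sign is preserved).

    Negative values mean lower expression in the knockout samples.
    """
    n1, n2 = len(control), len(knockout)
    mean1, mean2 = sum(control) / n1, sum(knockout) / n2
    # Unbiased sample variances.
    var1 = sum((x - mean1) ** 2 for x in control) / (n1 - 1)
    var2 = sum((x - mean2) ** 2 for x in knockout) / (n2 - 1)
    return (mean2 - mean1) / math.sqrt(var1 / n1 + var2 / n2)


# Hypothetical expression values for one gene.
control_samples = [5.0, 6.0, 7.0]
knockout_samples = [2.0, 3.0, 4.0]

gene_significance = t_statistic(control_samples, knockout_samples)
```

Keeping the sign matters here: a gene that goes down after Oct4 knockdown and one that goes up get opposite significances, which matches how signed networks treat positive and negative correlations differently.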
Absolute Mean Significance Increases Once New Modules are Found via Signed WGCNA
[Figure: two panels, Unsigned and Signed]
Message: signed networks allowed us to split large modules into smaller, biologically more significant modules
Behind the Scenes: Brown Module is Hidden within Turquoise
[Figure: two panels, Unsigned and Signed]
Signed WGCNA shows influence of known pluripotency transcription factors
• Once the TFs are separated into their own module, both their connectivity and their relative gene significance increase.
Brown Module
• Shows Oct4 is a highly connected hub and it is highly significant in this module.
• This module could not have been detected in an unsigned network.
• Note that the signed intramodular connectivity is a biologically important screening variable.
• Biological importance of module is verified by 2-fold enrichment of Oct4 and Nanog binding.
Conclusion
• Signed weighted gene co-expression network analysis is a robust extension of unsigned WGCNA, preserving large modules while finding new and biologically interesting modules, thus facilitating a systems-level understanding of gene and/or protein interactions.
Acknowledgement
Biostatistics/Bioinformatics
• Steve Horvath
• Qing Zhou
• Peter Langfelder