Patching the Puzzle of Genetic Network
Grace S. Shieh
Institute of Statistical Science, Academia Sinica
Outline
What is a genetic network?
Why is this area one of the frontiers?
How do statistical modeling and computational algorithms simplify the complex puzzle?
Applications
Central dogma of biology
DNA -> mRNA -> Protein
Proteins: the elements that function in organisms, e.g. yeast and human.
Somatic mutations affect key pathways in lung adenocarcinoma (Nature, Oct. 2008)
Science, Sept. 2008
Complex human disease
Digenic effects may underlie:
Type II diabetes, schizophrenia, retinitis pigmentosa, glaucoma
Tong et al., Science 2004
Complex human disease
These diseases may have synthetic effects similar to those in the yeast genetic interaction map
The topology of the genetic network in the neighborhood of SGS1 (Tong et al., 2004)
Elements of genetic networks derived from model organisms, e.g. yeast, are likely to be conserved
Experimental methods to reveal genetic interactions
Systematic genetic analysis with ordered arrays of yeast deletion mutants (Tong et al., 2001, Science)
Global mapping of the yeast genetic interaction network (Tong et al., 2004, Science)
The genetic landscape of a cell (Costanzo et al., 2010, Science)
Synthetic sick or lethal (SSL) gene pairs: when both genes are mutated, the organism becomes sick or dies, but neither mutation alone is lethal
SSL is important for understanding how an organism tolerates genetic mutations
Hartman, Garvik and Hartwell, 2001, Science
Scenarios resulting in synthetic interaction (SSL)
[Figure: four pathway diagrams. Panels: 2 partially redundant pathways; partially redundant genes; 3 partially redundant pathways, 2 required; protein complex tolerating 1 but not 2 destabilizing mutations. Annotations: < 2%, < 4% *]
A Pattern Recognition Approach to Infer Gene Networks
Grace S. Shieh
Joint work with C.-L. Chuang, C.-H. Jen and C.-M. Chen
Bioinformatics 2008
Excerpted from Tong et al. (2001) Science
Transcriptional compensation (TC) and transcriptional reverse compensation (TRC) interactions (Lesage et al., 2004; Wong & Roth, 2005, Genetics; Kafri et al., 2005, Nature Genetics):
among paralogues or SSL gene pairs, when one gene is mutated, its partner gene's expression increases (TC) or decreases (TRC)
Goal: to predict TC and TRC interactions among SSL gene pairs
Four sets of yeast (Saccharomyces cerevisiae) microarray gene expression data (Spellman et al., 1998) were used. The red channel R: intensities of yeast synchronized by alpha-factor arrest, arrest of a cdc15 or cdc28 mutant, and elutriation. The green channel G: average of non-synchronized yeast.
Cell cycles of CLN2 gene
qRT-PCR experiments
For a given pair of SSL genes A and B:
Experimental group: gene A's expression with gene B knocked out
Control group: gene A's expression with gene B wild-type
If experimental >> control => A and B may be TC
If experimental << control => A and B may be TRC
Gene expression of Transcription Compensation (TC) pairs
Gene expression of Transcription Reverse Compensation (TRC) pairs
The dependence of patterns and their associated interactions
Assumption for PARE: the dependence between CP (SP) and TC (TD) interactions is significant.
To test this hypothesis: Fisher's exact test
The proportion of complementary pattern (CP) in TC
Screening genes with significant changes over time, by thresholding $\max_t G_i(t)$ against $\min_t G_i(t)$ at 1.5, resulted in 35 gene pairs.
Fisher's exact test: p-value < 0.02, significant at the 5% level
         CP    SP    Total
TC       13     9     22
TD        2    11     13
Total    15    20     35
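The p-value quoted for this table can be checked with a short sketch. The function name `fisher_exact_two_sided` and the choice of the two-sided variant (summing all tables no more probable than the observed one) are assumptions made for illustration; the slides do not say which variant was used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Counts from the table: TC row (CP=13, SP=9), TD row (CP=2, SP=11).
p_value = fisher_exact_two_sided(13, 9, 2, 11)
print(f"p = {p_value:.4f}")
```

With these counts the result falls below 0.02, in line with the value reported on the slide.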
PARE
The gene expression of the regulating gene is treated as the object contour, and the lag-1 expression of the target gene as the boundary of interest in an image segmentation algorithm.
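$$t' = t + 1$$

$$E^{\mathrm{D1}}_{i,j} \overset{\mathrm{def}}{=} \sum_{t} \frac{\partial G_i(t)}{\partial t} \cdot \frac{\partial G_j(t')}{\partial t'}$$

$$E^{\mathrm{D2}}_{i,j} \overset{\mathrm{def}}{=} \sum_{t} \frac{\partial^2 G_i(t)}{\partial t^2} \cdot \frac{\partial^2 G_j(t')}{\partial t'^2}$$

$$E^{\mathrm{Area}}_{i,j} \overset{\mathrm{def}}{=} \frac{1}{2} \sum_{t \in \Im} \vec{g}_i(t) \times \vec{g}_j(t')$$

$$\Im = \left\{\, t : \theta\big(\vec{g}_i(t), \vec{g}_j(t')\big) > 90^\circ \,\right\}$$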
Discrete Signals
Because gene expression is a discrete signal, the 1st- and 2nd-order partial differential terms can be modified as follows:

$$\frac{\partial G_i(t)}{\partial t} = \frac{G_i(t+1) - G_i(t)}{\Delta t}$$

$$\frac{\partial^2 G_i(t)}{\partial t^2} = \frac{G_i(t+2) - 2\,G_i(t+1) + G_i(t)}{\Delta t^2}$$

The interaction $S_{i,j}$ can be determined as a weighted sum of the internal and external energies:

$$S_{i,j} = \alpha \cdot E^{\mathrm{D1}}_{i,j} + \beta \cdot E^{\mathrm{D2}}_{i,j} - \gamma \cdot E^{\mathrm{Area}}_{i,j}$$
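The discrete derivative terms and the weighted score can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the lag is t' = t + 1, sums run over the time points where both terms exist, and the area energy is passed in as a number because the slides do not define the vectors it is built from.

```python
def first_diff(g, dt=1.0):
    """Forward difference (G(t+1) - G(t)) / dt."""
    return [(g[t + 1] - g[t]) / dt for t in range(len(g) - 1)]

def second_diff(g, dt=1.0):
    """Second difference (G(t+2) - 2 G(t+1) + G(t)) / dt^2."""
    return [(g[t + 2] - 2 * g[t + 1] + g[t]) / dt ** 2 for t in range(len(g) - 2)]

def lagged_energy(di, dj):
    """Sum over t of di(t) * dj(t + 1), i.e. the target is lagged by one step."""
    return sum(di[t] * dj[t + 1] for t in range(len(dj) - 1))

def energy_d1(gi, gj, dt=1.0):
    return lagged_energy(first_diff(gi, dt), first_diff(gj, dt))

def energy_d2(gi, gj, dt=1.0):
    return lagged_energy(second_diff(gi, dt), second_diff(gj, dt))

def interaction_score(gi, gj, e_area, alpha, beta, gamma, dt=1.0):
    """S = alpha * E_D1 + beta * E_D2 - gamma * E_Area (e_area supplied by the caller)."""
    return alpha * energy_d1(gi, gj, dt) + beta * energy_d2(gi, gj, dt) - gamma * e_area

# Made-up expression profiles for illustration only.
g_regulator = [0.0, 1.0, 4.0, 9.0, 16.0]
g_target = [0.0, 0.0, 1.0, 4.0, 9.0]
score = interaction_score(g_regulator, g_target, e_area=10.0,
                          alpha=1.0, beta=0.5, gamma=0.2)
```

The weights alpha, beta and gamma are placeholders here; in the talk they are estimated from training pairs.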
PARE
In this study, each gene is represented by a node in a graphical model, which is denoted by $G_i$, where $i = 1, 2, \ldots, N$. The edge $S_{i,j}$ represents the gene-gene interaction between $G_i$ and $G_j$, where the enhancer gene $G_i$ plays a key role in activating or repressing the target gene $G_j$.
Training set vs test set
Leave-one-out cross validation: among n pairs, use n-1 pairs to train PARE, then predict the remaining pair; repeat for each of the n pairs.
3-fold cross validation: among all pairs, use 2/3 of the pairs to train, then predict the remaining 1/3, over all combinations; repeat this N times.
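The two validation schemes above can be sketched as index splits. The helper names, the fixed seed, and the use of 35 pairs and 5 repeats are assumptions for illustration; the talk itself reports 500 repeats of 3-fold cross validation.

```python
import random

def leave_one_out_splits(n):
    """Yield (train, test) index lists, holding out each pair exactly once."""
    for k in range(n):
        yield [i for i in range(n) if i != k], [k]

def three_fold_splits(n, rng):
    """Shuffle the n pairs, cut them into 3 folds, and hold out each fold once."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[k::3] for k in range(3)]
    for k in range(3):
        train = [i for f, fold in enumerate(folds) if f != k for i in fold]
        yield train, folds[k]

rng = random.Random(0)  # fixed seed, chosen only so the sketch is reproducible
loo = list(leave_one_out_splits(35))
repeated = [list(three_fold_splits(35, rng)) for _ in range(5)]
```

Each repeat reshuffles the pairs, so averaging a metric over the repeats smooths out the dependence on any one partition.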
Experimental Results (TC/TRC)
alpha data set (18 time points)
Table 1. The prediction results, checked against the qRT-PCR experiments

                 Training          Test
                 TPR     FPR       TPR     Std    FPR
PARE, n-fold     76%     20%       73%            23%
PARE, 3-fold     78%*    18%*      71%*    3%     23%*

Other methods (one rate reported each): Lagged Corr. 46%; EB-GGMs 52%
*Since 3-fold CV was performed 500 times, only averages of the TPRs are reported.
Experimental Results (TC/TRC)
For the alpha dataset, PARE yields
a true-positive rate of 71-73%
a prediction accuracy of 81%
The FPR for predicting TC (TD) interactions was bounded by 12% (10%) genome-wide.
Experimental Results (TC/TRC)
Checking against published literature
These genetic interactions are consistent with the following experimental results:
Sgs1 and Srs2 are known to act in redundant pathways in replication (Ira et al., 1999; Lee et al., 1999).
Ex: Srs2 and Sgs1-Top3 suppress crossovers during double-strand break repair in yeast.
The Sgs1/Top3/Rmi1 and Mus81/Mms4 complexes are both involved in double-strand break repair and homologous recombination (Fabre et al., 2002).
This indicates that Sgs1/Top3/Rmi1 and Mus81/Mms4 are alternative pathways to resolve recombination intermediates.
Inferring transcriptional interactions
132 pairs of activator-target (AT) and repressor-target (RT) gene interactions were collected from the published literature (MIPS, Mewes et al., 1999, Nucleic Acids Research; Gancedo, 1998, Microbiology & Molecular Biology; Draper et al., 1994, Molecular & Cellular Biology, etc.)
Test for association of CP (SP) with RT (AT) pairs in the data: chi-squared test
Experimental Results (AT/RT)
Table 2. The prediction results using the Elu data set, checked against the 132 TIs from the literature.

                 Training          Test
                 TPR     FPR       TPR     Std    FPR
PARE, n-fold     79%     16%       77%            17%
PARE, 3-fold     81%*    16%*      74%*    3%     19%*

Other methods (one rate reported each): Lagged Corr. 51%; EB-GGMs 59%
*The average of 500 repeats.
FPRs for genome-wide TI predictions are bounded by 21%.
Conclusions
The proposed PARE learns gene expression patterns and can then predict similar genetic interactions using microarray data.
TPRs of PARE applied to the alpha (Elu) dataset are about 73% (77%) for inferring TC/TD interactions (TIs), respectively.
Inferring genesis of obesity in human (joint work with Karine Clement and Jean-Daniel Zucker)
MGED from human adipocyte-derived cell lines
Adipocytes: cells that primarily compose adipose tissue, specialized in storing energy as fat
[Figure: time-course MGED for C/EBP alpha over days 0 to 10. Panel "C/EBP alpha (time-course)": expression level (log2) vs. day. Panel "C/EBP alpha (MGED in ratio)": J_i/J_{i-1} vs. day.]
PARE to infer genesis of obesity in human
Training stage: MGED of human adipocyte-derived cell lines
70 known transcriptional interactions (TIs) from iHOP
Prediction results: 40+ pairs of TIs and some genetic interactions predicted
Some are consistent with existing experimental results; some are novel
Inferring TIs
Data preparation: select significantly expressed genes:
P-value < 0.01
Significantly expressed in at least 1 time point (5 time points in total)
-> 36 genes with a function of interest
Interact with 14 genes of interest (AP2, CCL2, CCL5, LEP, etc…)
-> 504 gene pairs (36 x 14)
WebPARE: a web-computing service for PARE (Chuang+, Wu+, Cheng and Shieh*, 2010, Bioinformatics)
To provide a simple web interface for users to infer GIs/TIs using time-course gene expression data and existing knowledge, e.g. pre-stored validated TIs in yeast, mouse, human, etc. (TRANSFAC)
An example:
A list of genes involved in the cell cycle and a data set (e.g. Elu) were uploaded to WebPARE; the TIs of these pairs were of interest.
Using integrated (pre-stored) pairs of TIs in yeast, PARE correctly predicted 118 out of 176 TIs, mTPR=67%
e.g. The significant predicted network from 66 pairs ->
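The rate quoted in this example follows directly from the two counts. A minimal sketch, with a hypothetical helper name:

```python
def true_positive_rate(correct_predictions, known_interactions):
    """Fraction of known interactions that were predicted correctly."""
    return correct_predictions / known_interactions

# Counts from the WebPARE example: 118 of 176 pre-stored yeast TIs recovered.
tpr = true_positive_rate(118, 176)
print(f"TPR = {tpr:.0%}")  # prints "TPR = 67%"
```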
Demo
WebPARE can be accessed at:
http://www.stat.sinica.edu.tw/WebPARE
Acknowledgement Dr. Ting-Fang Wang and Da-Yow Huang, Inst. of Biological Chemistry, Academia Sinica
Drs. Karine Clement and J-D. Zucker, INSERM & IRD, France
Cheng-Long Chuang, Chin-Yuan Guo, Chia-Chang Wang, Dr. Shi-Fong Guo, Yu-Bin Wang, Jia-Hung Wu, Inst. of Statistical Science
Thank you for your attention!
Wanted
Part-time PhD students and research assistants to work at the Shieh lab (Prof. Shieh's laboratory), Institute of Statistical Science, Academia Sinica
Parameter estimation
Next, we estimate parameters via the particle swarm optimization (PSO) algorithm (Kennedy and Eberhart, 1995), a stochastic optimization technique that simulates the behavior of a flock of birds.
Example (finding largest gradient)
Evolutionary Process of PSO
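To make the PSO idea concrete, here is a minimal sketch of a common inertia-weight variant. It is not the implementation used in PARE: the swarm size, the coefficients, the bounds, the seed and the toy objective (whose minimum is placed at made-up weights 1.0, 0.5, 0.2) are all assumptions chosen for illustration.

```python
import random

def pso_minimize(objective, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize objective(x) over box bounds with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # each particle's best position so far
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]   # best position found by the swarm
    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull to own best, and pull to swarm best.
                vel[k][d] = (inertia * vel[k][d]
                             + c_personal * r1 * (best_pos[k][d] - pos[k][d])
                             + c_global * r2 * (g_pos[d] - pos[k][d]))
                lo, hi = bounds[d]
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))
            val = objective(pos[k])
            if val < best_val[k]:
                best_val[k], best_pos[k] = val, pos[k][:]
                if val < g_val:
                    g_val, g_pos = val, pos[k][:]
    return g_pos, g_val

def toy_objective(w):
    """Hypothetical loss whose minimum sits at (alpha, beta, gamma) = (1.0, 0.5, 0.2)."""
    alpha, beta, gamma = w
    return (alpha - 1.0) ** 2 + (beta - 0.5) ** 2 + (gamma - 0.2) ** 2

weights, loss = pso_minimize(toy_objective, [(0.0, 2.0)] * 3)
```

In a real fit, the toy objective would be replaced by a training loss over the known gene pairs, with the three coordinates playing the role of the weights in the interaction score.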
Gene expression of Activator-Target (AT) gene pairs
Gene expression of Repressor-Target (RT) gene pairs