Patching the Puzzle of Genetic Network


Grace S. Shieh

Institute of Statistical Science, Academia Sinica

gshieh@stat.sinica.edu.tw

Outline

What is a genetic network?

Why is the area one of the frontiers?

How do statistical modeling and computational algorithms simplify the complex puzzle?

Applications

Dogma of biology

DNA -> mRNA -> Protein

Proteins: the elements that function in organisms, e.g. yeast and human.

Somatic mutations affect key pathways in lung adenocarcinoma (Nature, Oct. 2008)

Science, Sept. 2008

Complex human disease

Digenic effects may underlie:

Type II diabetes, schizophrenia, retinitis pigmentosa, glaucoma

Tong et al., Science 2004

Complex human disease

These diseases may have similar synthetic effects in the yeast genetic interaction map

The topology of the genetic network of neighborhood of SGS1 (Tong et al., 2004)

Elements of genetic network derived from model organism, e.g. yeast, are likely to be conserved

Experimental method to reveal genetic interactions

Systematic genetic analysis with ordered arrays of yeast deletion mutants (Tong et al., 2001, Science)

Global mapping of the yeast genetic interaction network (Tong et al., 2004, Science)

The genetic landscape of a cell (Costanzo et al., 2010, Science)

Synthetic sick or lethal (SSL) gene pairs: when both genes are mutated, the organism grows poorly or dies, although neither single mutation is lethal

SSL is important for understanding how an organism tolerates genetic mutations

Hartman, Garvik and Hartwell, 2001, Science

Scenarios resulting in synthetic interaction (SSL)

[Figure: four schematic scenarios, with genes drawn as lettered nodes]

2 partially redundant pathways

Partially redundant genes

3 partially redundant pathways, 2 required

Protein complex tolerating 1 but not 2 destabilizing mutations

< 2% < 4% *

A Pattern Recognition Approach to Infer Gene Networks

Grace S. Shieh

joint work with C.-L. Chuang, C.-H. Jen and C.-M. Chen

Bioinformatics 2008

Excerpted from Tong et al. (2001) Science

Transcriptional compensation (TC) and transcriptional reverse compensation (TRC) interactions (Lesage et al., 2004; Wong & Roth, 2005, Genetics; Kafri et al., 2005, Nature Genetics):

among paralogues or SSL gene pairs, when one gene is mutated, its partner gene's expression increases (TC) or decreases (TRC)

Goal: to predict TC and TRC interactions among SSL gene pairs

Four sets of yeast (Saccharomyces cerevisiae) microarray gene expression data (Spellman et al., 1998) were used. The red channel R: intensities of yeast synchronized by alpha factor arrest, arrest of a cdc15 or cdc28 mutant, and elutriation. The green channel G: average of non-synchronized yeast.

Cell cycles of CLN2 gene

qRT-PCR experiments

For a given pair of SSL genes:

Experimental group: gene A's expression, with gene B knocked out

Control group: gene A's expression, with gene B wild type

If A >> B => A & B may be TC

If A << B => A & B may be TRC
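The decision rule above can be sketched as a small function. This is only an illustration: it assumes that ">>" on the slide compares gene A's expression in the experimental group (partner knocked out) against the control group (partner wild type), and the function name `classify_ssl_pair` and the log2 fold-change cutoff `min_log2_fold` are hypothetical choices, not values from the talk.

```python
import math

def classify_ssl_pair(expr_knockout, expr_wildtype, min_log2_fold=1.0):
    """Label a pair from gene A's expression with its partner knocked out
    versus wild type. The cutoff is an assumed, illustrative threshold."""
    log2_fold = math.log2(expr_knockout / expr_wildtype)
    if log2_fold >= min_log2_fold:
        return "TC"            # partner's loss raises A's expression
    if log2_fold <= -min_log2_fold:
        return "TRC"           # partner's loss lowers A's expression
    return "undetermined"      # change too small to call either way

label = classify_ssl_pair(expr_knockout=8.0, expr_wildtype=2.0)
```

In practice the cutoff would be set from replicate variability of the qRT-PCR measurements rather than fixed in advance.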

Gene expression of Transcription Compensation (TC) pairs

Gene expression of Transcription Reverse Compensation (TRC) pairs

The dependence of patterns and their associated interactions

Assumption for PARE:

the dependence between CP (SP) patterns and TC (TD) interactions is significant. To test this hypothesis: Fisher's exact test

The proportion of complementary pattern (CP) in TC

Genes with significant changes over time were screened by comparing $\max_t G_i(t)$ and $\min_t G_i(t)$ against a threshold of 1.5, which resulted in 35 gene pairs.

Fisher's exact test: p-value < 0.02, significant at the 95% level

        CP   SP   Total
TC      13    9     22
TD       2   11     13
Total   15   20     35
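The test on the 2x2 table above can be reproduced with a few lines of standard-library Python. This is a minimal sketch of a two-sided Fisher's exact test; the function name `fisher_exact_two_sided` is introduced here for illustration, and whether the talk used a one-sided or two-sided test is not stated, so the two-sided form is an assumption.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k counts in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more probable than the observed one.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Rows: TC, TD; columns: CP, SP (counts from the table above).
p_value = fisher_exact_two_sided(13, 9, 2, 11)
```

With these counts the p-value falls below the 0.02 bound quoted on the slide.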

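PARE

The gene expression of the regulating gene is treated as the object contour, and the lag-1 expression of the target gene as the boundary of interest in an image segmentation algorithm, with

$$t' = t + 1$$

$$E^{D1}_{i,j} \overset{\text{def}}{=} \sum_t \frac{\partial G_i(t)}{\partial t} \cdot \frac{\partial G_j(t')}{\partial t'}$$

$$E^{D2}_{i,j} \overset{\text{def}}{=} \sum_t \frac{\partial^2 G_i(t)}{\partial t^2} \cdot \frac{\partial^2 G_j(t')}{\partial {t'}^2}$$

$$E^{\text{Area}}_{i,j} \overset{\text{def}}{=} \frac{1}{2} \sum_{t \in \Im} \vec{g}_i(t) \times \vec{g}_j(t'), \qquad \Im = \{\, t : \theta(\vec{g}_i(t), \vec{g}_j(t')) > 90^\circ \,\}$$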

Discrete signals

Because gene expression is a discrete signal, the 1st- and 2nd-order partial differential terms can be modified as follows:

$$\frac{\partial G_i(t)}{\partial t} = \frac{G_i(t+1) - G_i(t)}{\Delta t}$$

$$\frac{\partial^2 G_i(t)}{\partial t^2} = \frac{G_i(t+2) - 2G_i(t+1) + G_i(t)}{\Delta t^2}$$

The interaction $S_{i,j}$ can be determined as a weighted sum of the internal and external energies:

$$S_{i,j} = \alpha \cdot E^{D1}_{i,j} + \beta \cdot E^{D2}_{i,j} - \gamma \cdot E^{\text{Area}}_{i,j}$$
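The discrete energies and their weighted sum can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the slides do not define the vectors g_i(t), so the code assumes they are the tangent vectors (Δt, G_i(t+1) − G_i(t)), takes the magnitude of the 2-D cross product, and uses placeholder weights of 1 for α, β and γ. The function name `pare_score` is hypothetical.

```python
def pare_score(g_i, g_j, alpha=1.0, beta=1.0, gamma=1.0, dt=1.0):
    """Score S_ij for a regulator series g_i and a target series g_j of
    equal length, with the target read at lag 1 (t' = t + 1)."""
    n = len(g_i)
    # Forward differences standing in for the 1st and 2nd derivatives.
    d1_i = [(g_i[t + 1] - g_i[t]) / dt for t in range(n - 1)]
    d1_j = [(g_j[t + 1] - g_j[t]) / dt for t in range(n - 1)]
    d2_i = [(g_i[t + 2] - 2 * g_i[t + 1] + g_i[t]) / dt ** 2 for t in range(n - 2)]
    d2_j = [(g_j[t + 2] - 2 * g_j[t + 1] + g_j[t]) / dt ** 2 for t in range(n - 2)]

    # Because the target index is t + 1, each sum stops one step early.
    e_d1 = sum(d1_i[t] * d1_j[t + 1] for t in range(len(d1_j) - 1))
    e_d2 = sum(d2_i[t] * d2_j[t + 1] for t in range(len(d2_j) - 1))

    e_area = 0.0
    for t in range(len(d1_j) - 1):
        vi = (dt, d1_i[t] * dt)        # assumed tangent vector of the regulator
        vj = (dt, d1_j[t + 1] * dt)    # assumed tangent vector of the lagged target
        dot = vi[0] * vj[0] + vi[1] * vj[1]
        if dot < 0:                    # angle between the vectors exceeds 90 degrees
            cross = vi[0] * vj[1] - vi[1] * vj[0]
            e_area += 0.5 * abs(cross)

    return alpha * e_d1 + beta * e_d2 - gamma * e_area

# Toy series: the target repeats the regulator one time step later.
score_similar = pare_score([0, 1, 0, -1, 0], [0, 0, 1, 0, -1])
```

On toy input, a target that follows the regulator with a one-step delay gets a positive score and a mirrored target a negative one; how scores map to interaction types depends on the trained weights.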

PARE

In this study, each gene is represented by a node in a graphical model, which is denoted by $G_i$, where $i = 1, 2, \ldots, N$. The edge $S_{i,j}$ represents the gene-gene interaction between $G_i$ and $G_j$, where the enhancer gene $G_i$ plays a key role in activating or repressing the target gene $G_j$.

Training set vs test set

Leave-one-out cross validation: among n pairs, use n-1 pairs to train PARE, then predict the remaining pair; repeat for each of the n pairs.

3-fold cross validation: among all pairs, use 2/3 of the pairs to train, then predict the remaining 1/3 for all combinations; repeat this N times.
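The two validation schemes can be sketched as index generators. This is a minimal illustration: the function names are hypothetical, "all combinations" is read here as each of the three folds serving once as the test set, and the 35 pairs and 500 repeats are taken from the surrounding slides.

```python
import random

def leave_one_out_splits(n):
    """Yield (train, test) index lists; each pair is held out exactly once."""
    for held_out in range(n):
        train = [k for k in range(n) if k != held_out]
        yield train, [held_out]

def repeated_three_fold_splits(n, repeats, seed=0):
    """Yield (train, test) index lists for `repeats` random 3-fold partitions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[f::3] for f in range(3)]   # three near-equal folds
        for f in range(3):
            test = folds[f]
            train = [k for g in range(3) if g != f for k in folds[g]]
            yield train, test

loo = list(leave_one_out_splits(35))
cv3 = list(repeated_three_fold_splits(35, repeats=500))
```

Each split would be used to fit the PARE weights on `train` and score predictions on `test`; averaging over the repeated partitions gives the starred entries reported in the tables.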

Experimental Results (TC/TRC)

alpha data set (18 time points)

Table 1. The prediction results, checked against the qRT-PCR experiments

                   Training          Test
                   TPR     FPR       TPR     Std    FPR
PARE (n-fold)      76%     20%       73%            23%
PARE (3-fold)      78%*    18%*      71%*    3%     23%*

Lagged Corr.: 46%
EB-GGMs: 52%

*Since 3-fold CV was performed 500 times, only averages of TPRs are reported.

Experimental Results (TC/TRC)

For the alpha dataset, PARE yields

a true-positive rate of 71-73%

a prediction accuracy of 81%

The FPR for predicting TC (TD) interactions was bounded by 12% (10%) genome-wide.
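For reference, the three reported quantities (TPR, FPR and accuracy) can be computed from predicted and validated labels as in the sketch below. The function name `confusion_rates` and the toy labels are hypothetical; a label of 1 stands for the interaction class of interest (for example TC) and 0 for everything else.

```python
def confusion_rates(y_true, y_pred):
    """Return (TPR, FPR, accuracy) for binary labels given as 0/1 sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn)              # share of true interactions recovered
    fpr = fp / (fp + tn)              # share of non-interactions wrongly called
    accuracy = (tp + tn) / len(pairs)
    return tpr, fpr, accuracy

# Toy labels, not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
tpr, fpr, accuracy = confusion_rates(y_true, y_pred)
```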

Experimental Results (TC/TRC)

Checking against published literature

These genetic interactions are consistent with the following experimental results:

Sgs1 and Srs2 are known to act in redundant pathways in replication (Ira et al., 1999; Lee et al., 1999)

Ex: Srs2 and Sgs1-Top3 suppress crossovers during double-strand break repair in yeast.

The Sgs1/Top3/Rmi1 and Mus81/Mms4 complexes are involved in both double-strand break repair and homologous recombination (Fabre et al., 2002).

This indicates that Sgs1/Top3/Rmi1 and Mus81/Mms4 are alternative pathways to resolve recombination intermediates.

Inferring transcriptional interactions

132 pairs of activator-target (AT) and repressor-target (RT) gene interactions were collected from the published literature (MIPS, Mewes et al., 1999, Nucleic Acids Research; Gancedo, 1998, Microbiology & Molecular Biology; Draper et al., 1994, Molecular & Cellular Biology, etc.)

Test for CP (SP) associated with RT (AT) pairs in the data

Chi-squared test

Experimental Results (AT/RT)

Table 2. The prediction results using the Elu data set, checked against the 132 TIs from the literature.

                   Training          Test
                   TPR     FPR       TPR     Std    FPR
PARE (n-fold)      79%     16%       77%            17%
PARE (3-fold)      81%*    16%*      74%*    3%     19%*

Lagged Corr.: 51%
EB-GGMs: 59%

*The average of 500 repeats

FPRs for genome-wide TI predictions are bounded by 21%.

Conclusions

The proposed PARE learns gene expression patterns, and can then predict similar genetic interactions using microarray data.

TPRs of PARE applied to the alpha (Elu) dataset are about 73% (77%) for inferring TC/TD interactions (TIs).

Inferring genesis of obesity in human (joint work with Karine & Jean-Daniel)

MGED from human adipocyte-derived cell lines

Adipocytes: cells that primarily compose adipose tissue, specialized in storing energy as fat

[Figure: C/EBP alpha, two panels. Time course: expression level (log2) vs. day. MGED in ratio: J_i/J_{i-1} vs. day.]

Time-course MGED

PARE to infer genesis of obesity in human

Training stage: MGED of human adipocyte-derived cell lines

70 known transcriptional interactions (TIs) from iHOP

Prediction results: 40+ pairs of TIs and some genetic interactions predicted

Some are consistent with existing experimental results, some are novel

Inferring TIs

Data preparation: select significantly expressed genes:

P-value < 0.01

Significantly expressed in at least 1 time point (5 time points in total)

-> 36 genes with a function of interest

Interact with 14 genes of interest (AP2, CCL2, CCL5, LEP, etc.) -> 504 gene pairs

WebPARE: webcomputing service of PARE (Chuang+, Wu+, Cheng and Shieh*, 2010, Bioinformatics)

To provide a simple web-interface for users to infer GIs/TIs using time course gene expression data and existing knowledge, e.g. pre-stored validated TIs in yeast, mouse, human, etc (TRANSFAC)


An example:

A list of genes involved in cell cycle and a data set (e.g. Elu) were uploaded to WebPARE, TIs of these pairs were of interest.

Using integrated (pre-stored) pairs of TIs in yeast, PARE correctly predicted 118 out of 176 TIs, mTPR=67%

e.g. The significant predicted network from 66 pairs ->


Demo

WebPARE can be accessed at:

http://www.stat.sinica.edu.tw/WebPARE

Acknowledgement

Dr. Ting-Fang Wang and Da-Yow Huang, Inst. of Biological Chemistry, Academia Sinica

Drs. Karine Clement and J-D. Zucker, INSERM & IRD, France

Cheng-Long Chuang, Chin-Yuan Guo, Chia-Chang Wang, Dr. Shi-Fong Guo, Yu-Bin Wang, Jia-Hung Wu, Inst. of Statistical Science

Thank you for your attention!

Wanted

Part-time PhD students and research assistants to work at the Shieh lab (Prof. Grace S. Shieh's laboratory), Institute of Statistical Science, Academia Sinica

 

Parameter estimation

Next, we estimate the parameters via the particle swarm optimization (PSO) algorithm (Kennedy and Eberhart, 1995), a stochastic optimization technique that simulates the behavior of a flock of birds.

Example (finding largest gradient)

Evolutionary Process of PSO
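A minimal sketch of a global-best PSO is given below to make the update rule concrete. It is not the configuration used for PARE: the inertia and acceleration constants (`w`, `c1`, `c2`), swarm size, iteration count, search bounds and the quadratic toy objective are all assumptions chosen for illustration, and `pso_minimize` is a hypothetical name. In the PARE setting, the objective would instead measure prediction error on the training pairs as a function of the weights.

```python
import random

def pso_minimize(objective, dim, bounds, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward swarm best.
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                # Move, then clamp the coordinate to the search box.
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))
            val = objective(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val

def toy_objective(x):
    # Hypothetical stand-in for a training loss, with its minimum at (1, 2, 0.5).
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2 + (x[2] - 0.5) ** 2

best_weights, best_value = pso_minimize(toy_objective, dim=3, bounds=(-5.0, 5.0))
```

PSO needs only objective evaluations, not gradients, which is convenient when the quantity being optimized is a count-based measure such as prediction accuracy.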

Gene expression of Activator-Target (AT) gene pairs

Gene expression of Repressor-Target (RT) gene pairs