Cis-regulatory module 10/24/07


TFs often work synergistically

(Harbison 2004)


Combinatorial control

[Figure: life cycle of λ phage in E. coli, showing lysogenic growth and lytic growth (source: Gary Kaiser)]

[Figure: the λ operon, with labels OR, cI and cro]

[Figure: the λ operon with cI on and cro off: lysogenic growth]

[Figure: the λ operon with cI off and cro on: lytic growth. The operator OR is made up of the sites OR1, OR2 and OR3]

[Figure: the λ operon in the lysogenic and lytic states, with labels cI, cro and Pol II]


Cis-regulatory module (CRM)

• “A CRM is a DNA segment, typically a few hundred base pairs in length containing multiple binding sites, that recruits several cooperating factors to a particular genomic location” – Ji and Wong (2006)


Statistical Methods

• Predict modules when the motifs are known (simpler).
  – LRA, by Wasserman and Fickett (1998)

• Predict modules when the motifs also need to be discovered (more difficult).
  – CisModule, by Zhou and Wong (2004)
  – EMCModule, by Gupta and Liu (2005)


LRA (logistic regression analysis)

Cooperative motifs:

Basic idea: True regulatory regions are likely to have multiple motif sites.

[Figure: plot with vertical axis "Probability for being regulatory"]


LRA

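• Training data contain a subset of known regulatory and control regions.

$$\operatorname{logit}(p) = \log\frac{p}{1-p}$$

$$\operatorname{logit}(p) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$$

where
  – $p$: probability of being a regulatory region
  – $x_i$: highest motif matching score within a given sequence
  – $\beta_i$: regression coefficient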
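As a rough illustration of the model on this slide, the sketch below computes a covariate x as the highest log-odds match score of a position weight matrix (PWM) over a sequence, then converts the linear predictor into a probability with the inverse logit. The PWM, the uniform background, the example sequence and the coefficients are made-up values for illustration; they are not fitted parameters from Wasserman and Fickett (1998).

```python
import math

# Hypothetical 4-position PWM: one dict of base probabilities per position.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background frequency for every base


def max_match_score(seq, pwm):
    """Highest log-odds score of the PWM over all windows of seq."""
    w = len(pwm)
    scores = [
        sum(math.log(pwm[j][seq[i + j]] / BACKGROUND) for j in range(w))
        for i in range(len(seq) - w + 1)
    ]
    return max(scores)


def inverse_logit(z):
    """Map a linear predictor z = logit(p) back to the probability p."""
    return 1.0 / (1.0 + math.exp(-z))


def regulatory_probability(x, beta):
    """p from logit(p) = beta[0] + beta[1]*x[0] + ... + beta[n]*x[n-1]."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return inverse_logit(z)


x1 = max_match_score("TTAGCTGG", PWM)          # best window is "AGCT"
p = regulatory_probability([x1], [-3.0, 1.0])  # illustrative coefficients
```

With several TFs, the list x would simply hold one or more scores per motif, and beta would gain one coefficient per score.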


Application: skeletal-muscle gene regulation

• 5 muscle-specific TFs are known:
  – Mef-2, Myf, SRF, Tef, Sp-1

• 29 regulatory regions are known.

• Can we predict the regulatory regions just from sequence motif information?


Computational Procedure

• Motif matrices are identified by Gibbs sampling using sequence information from the 29 regulatory regions.

• For some TFs, motifs cannot be found by the de novo approach; literature motifs are used instead.

• The top two matching scores for each TF are included as covariates.

• Apply the LRA model. Use leave-one-out cross-validation to evaluate model performance.
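The last two steps can be sketched as follows: each sequence is described by its top two matching scores, a logistic model is fitted on all sequences but one, and the held-out sequence is then classified. This is a minimal sketch under stated assumptions: the fitting routine is plain gradient ascent written for illustration (not the procedure used in the original paper), the function names are hypothetical, and the scores are toy numbers for a single TF.

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(beta, xi):
    """Predicted probability; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))


def fit_logistic(X, y, lr=0.1, steps=2000):
    """Fit beta by batch gradient ascent on the mean log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = yi - predict(beta, xi)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta


def leave_one_out(X, y, threshold=0.5):
    """Hold out each sequence in turn, train on the rest, classify it."""
    calls = []
    for i in range(len(X)):
        beta = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        calls.append(1 if predict(beta, X[i]) >= threshold else 0)
    return calls


# Toy covariates: [best score, second-best score] for one hypothetical TF.
X = [[4.0, 3.5], [3.8, 3.0], [4.2, 3.9], [3.6, 3.2],   # regulatory (1)
     [1.0, 0.5], [1.5, 0.2], [0.8, 0.7], [1.2, 0.9]]   # control (0)
y = [1, 1, 1, 1, 0, 0, 0, 0]

calls = leave_one_out(X, y)
accuracy = sum(c == t for c, t in zip(calls, y)) / len(y)
```

Because every prediction is made on a sequence the model never saw during fitting, the resulting accuracy is a less optimistic estimate than accuracy on the training set.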


Results

• Single motifs are highly non-specific.

• Simple multi-site analysis improves specificity at the cost of reduced sensitivity.

• Logistic regression further improves specificity at a smaller cost in sensitivity.
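For reference, sensitivity is the fraction of true regulatory regions that are called regulatory, and specificity is the fraction of control regions that are called control. The sketch below computes both for two made-up sets of calls that mimic the trade-off described above; the labels and calls are invented for illustration and are not results from the slides.

```python
def sensitivity_specificity(calls, truth):
    """Return (sensitivity, specificity) for 0/1 calls against 0/1 truth."""
    tp = sum(1 for c, t in zip(calls, truth) if c == 1 and t == 1)
    fn = sum(1 for c, t in zip(calls, truth) if c == 0 and t == 1)
    tn = sum(1 for c, t in zip(calls, truth) if c == 0 and t == 0)
    fp = sum(1 for c, t in zip(calls, truth) if c == 1 and t == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels: 4 regulatory regions followed by 6 control regions.
truth        = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
single_motif = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]   # rule: any one motif matches
multi_site   = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]   # rule: at least two motifs match

single_result = sensitivity_specificity(single_motif, truth)
multi_result = sensitivity_specificity(multi_site, truth)
```

In this toy case the stricter rule misses half of the true regions but rejects most of the controls.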


Limitations of LRA

• Motifs must be known in advance.

• When known regulatory sequences are few, it is difficult to identify motifs by using traditional methods.

Objective:

• Integrate motif discovery and module finding in a single statistical model.


De novo module identification

Two tasks

• Identify TF motifs

• Identify CRMs.


Why a module approach can help motif discovery

• Because a single short motif has poor specificity, a short sequence can appear enriched simply by chance.

• The probability of random matches is much smaller for co-occurring motifs.
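A back-of-the-envelope calculation makes the second point concrete. The sketch below assumes independent, uniformly distributed bases, exact matches to a fixed w-mer, and independence between start positions and between motifs; all of these are simplifying assumptions, and the motif width and window length are arbitrary example values.

```python
def p_chance_match(w, window):
    """Approximate P(at least one exact match to a fixed w-mer in a window).

    Assumes i.i.d. uniform bases and treats the start positions as
    independent, which is only an approximation.
    """
    n_starts = window - w + 1
    return 1.0 - (1.0 - 0.25 ** w) ** n_starts


# One 6-bp motif in a 200-bp window.
p_single = p_chance_match(6, 200)

# Two different 6-bp motifs in the same window, assumed independent.
p_pair = p_single * p_chance_match(6, 200)

# Expected number of chance hits among 1000 unrelated 200-bp sequences.
expected_single = 1000 * p_single
expected_pair = 1000 * p_pair
```

Under these assumptions a single motif turns up by chance in a few percent of windows, while the pair does so in a few per thousand. Real motifs are degenerate, so the absolute numbers would be higher, but the product rule still favors co-occurrence.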


CisModule

Basic idea:

• A two-level hierarchical mixture model (HMx).
  – Level 1: modules within sequences
  – Level 2: motifs within modules

(Zhou and Wong 2004)


HMx Model as a Stochastic Process

• Treat the HMx model as a stochastic machine that generates sequences.
  – Starting from the first sequence position, make a series of random decisions on whether to initiate a module of length l or to generate a letter from the background model.
  – Inside a module, if a site for the kth motif is initiated at position n, generate wk letters from its PWM and place them at [n, n+wk-1]; otherwise generate a letter from the background.
  – After reaching the end of the current module, decide whether to sample from the background or to initiate a new module.

(Zhou and Wong 2004)
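The generative process described on this slide can be simulated in a few lines. This is a minimal sketch, not the CisModule implementation: the parameter names (p_module, p_site), the two PWMs and the background are hypothetical, and the rule that a site is only started when it fits inside the current module is a simplification chosen here to keep modules at a fixed length.

```python
import random

BASES = "ACGT"


def sample_base(probs, rng):
    """Draw one base from a probability list ordered as A, C, G, T."""
    return rng.choices(BASES, weights=probs, k=1)[0]


def pick_motif(p_site, rng):
    """Return a motif index k with probability p_site[k], else None."""
    u, cum = rng.random(), 0.0
    for k, p in enumerate(p_site):
        cum += p
        if u < cum:
            return k
    return None


def generate_sequence(length, module_len, p_module, p_site, pwms, background, rng):
    """Generate one sequence plus its module and site annotations."""
    seq, modules, sites = [], [], []
    while len(seq) < length:
        # Level 1: start a module here, or emit one background letter.
        if len(seq) + module_len <= length and rng.random() < p_module:
            end = len(seq) + module_len
            modules.append((len(seq), end))
            while len(seq) < end:
                # Level 2: start a motif site, or emit one background letter.
                k = pick_motif(p_site, rng)
                if k is not None and len(seq) + len(pwms[k]) <= end:
                    sites.append((len(seq), k))
                    seq.extend(sample_base(col, rng) for col in pwms[k])
                else:
                    seq.append(sample_base(background, rng))
        else:
            seq.append(sample_base(background, rng))
    return "".join(seq), modules, sites


# Hypothetical motifs: each row is [P(A), P(C), P(G), P(T)] for one position.
pwms = [
    [[0.85, 0.05, 0.05, 0.05]] * 6,                          # A-rich, width 6
    [[0.05, 0.05, 0.85, 0.05], [0.05, 0.85, 0.05, 0.05]] * 4,  # GC repeat, width 8
]
background = [0.25, 0.25, 0.25, 0.25]
rng = random.Random(0)
seq, modules, sites = generate_sequence(300, 50, 0.02, [0.05, 0.05],
                                        pwms, background, rng)
```

Inference runs this machine in reverse: the sequences are observed, and the module positions, site positions and PWMs are the unknowns.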


Model inference: Gibbs sampling

• Given the alignment, update the model parameters.

• Given the model parameters, update the module/motif locations.
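The two alternating steps can be illustrated with a much simpler sampler than the one CisModule uses. The sketch below is a basic site sampler that assumes exactly one site of a fixed width per sequence, has no module level, uses a uniform background, and plugs in a pseudocount-smoothed PWM estimate instead of sampling the parameters from their posterior. Function names and the toy sequences are hypothetical.

```python
import random

BASES = "ACGT"


def estimate_pwm(seqs, starts, width, skip, pseudo=0.5):
    """Given the alignment, update the PWM (sequence `skip` is left out)."""
    counts = [[pseudo] * 4 for _ in range(width)]
    for i, (s, a) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][BASES.index(s[a + j])] += 1
    return [[c / sum(col) for c in col] for col in counts]


def sample_start(seq, pwm, background, rng):
    """Given the PWM, resample a site start in proportion to its likelihood ratio."""
    width = len(pwm)
    weights = []
    for a in range(len(seq) - width + 1):
        ratio = 1.0
        for j in range(width):
            b = BASES.index(seq[a + j])
            ratio *= pwm[j][b] / background[b]
        weights.append(ratio)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def gibbs_site_sampler(seqs, width, sweeps, rng):
    """Alternate the two updates, one sequence at a time."""
    background = [0.25] * 4                      # assumed uniform background
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(sweeps):
        for i, s in enumerate(seqs):
            pwm = estimate_pwm(seqs, starts, width, skip=i)
            starts[i] = sample_start(s, pwm, background, rng)
    return starts


# Toy sequences that all contain the made-up site "TTGACGTC".
seqs = ["ACGTTGACGTCATT", "TTGACGTCAGGCAA", "CCATTGACGTCAGT", "GGTTGACGTCACCA"]
starts = gibbs_site_sampler(seqs, width=8, sweeps=20, rng=random.Random(1))
```

The sampler is stochastic, so different seeds can settle on different alignments; in practice several runs from different starting points are compared.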


A numerical experiment

• Merge the 29 regulatory regions with a set of sequences randomly selected from ENSEMBL promoters.

• Test the ability of CisModule to identify motifs in a “noisy” environment.


Results


Limitations of CisModule

• The module length and the number of motifs must be provided externally.

• Convergence can be slow. Multiple runs are needed, each starting from a new seed.

• Combinations of different motifs are assumed to be independent.


EMCModule

• Gupta and Liu (2005) developed a similar approach called EMCModule.

• Main differences:
  – They use a collection of literature motifs as initial “seeds” for motif discovery.
  – Their method improves the convergence speed.
  – Their definition of a CRM is a little different: the number of motifs is fixed within one module, but the order of and distance between different motifs can vary.


Further issues

• A comparative genomics approach can also be incorporated into module discovery (Zhou and Wong 2007).

• The modules identified by these methods can be viewed as belonging to one “type”. New methods need to be developed to discover multiple module types.

• While a module-based approach is helpful for finding cooperative motifs, it may hurt the discovery of single motifs.

[Four figure slides, each credited to Yuh et al. 1998]


Reading List

• Wasserman and Fickett (1998)
  – LRA. One of the first works on cis-regulatory modules.

• Zhou and Wong (2004)
  – CisModule. A statistical method to identify cis-regulatory modules without prior knowledge of the motifs.

• Yuh et al. (1998)
  – An influential biological paper on how information from different modules can be integrated to regulate gene expression.