The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘08 Genome Research 2008


Page 1: The AMADEUS Motif Discovery Platform

The AMADEUS Motif Discovery Platform

C. Linhart, Y. Halperin, R. Shamir
Tel-Aviv University

ApoSys workshop, May ‘08
Genome Research 2008

Page 2: The AMADEUS Motif Discovery Platform

Promoter Analysis: Extremely brief intro

• Transcription is regulated primarily by transcription factors (TFs): proteins that bind to DNA subsequences called binding sites (BSs)

• TFBSs are located mainly (not always!) in the gene’s promoter: the DNA sequence upstream of the gene’s transcription start site (TSS)

• TFs can promote or repress transcription

[Figure: a gene drawn 5’ to 3’, with labels TF, TF, BS, BS, TSS, Gene]

Page 3: The AMADEUS Motif Discovery Platform

Promoter Analysis (cont.): TFBS models

• The BSs of a particular TF share a common pattern, or motif, which is often modeled using:
– Consensus string: TASDAC (S={C,G}, D={A,G,T})
– Position weight matrix (PWM / PSSM):

Position   1     2     3     4     5     6
A          0.1   0.8   0     0.7   0.2   0
C          0     0.1   0.5   0.1   0.4   0.6
G          0     0     0.5   0.1   0.4   0.1
T          0.9   0.1   0     0.1   0     0.3

Sites scoring above threshold = 0.01:
TACACC (0.06)
TAGAGC (0.06)
TACAAT (0.015)
…
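The two models above can be sketched in a few lines of Python. This is only an illustration, assuming a site's PWM score is the product of its per-position probabilities (which reproduces the 0.06 and 0.015 values on the slide). The function names are hypothetical, and the consensus string and the PWM are treated as two separate examples.

```python
from itertools import product

# PWM from the slide: one row per base, one column per motif position.
PWM = {
    "A": [0.1, 0.8, 0.0, 0.7, 0.2, 0.0],
    "C": [0.0, 0.1, 0.5, 0.1, 0.4, 0.6],
    "G": [0.0, 0.0, 0.5, 0.1, 0.4, 0.1],
    "T": [0.9, 0.1, 0.0, 0.1, 0.0, 0.3],
}
MOTIF_LEN = 6

# Degenerate symbols used by the consensus string TASDAC.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG", "D": "AGT"}


def pwm_score(site, pwm=PWM):
    """Product of the per-position probabilities of `site` under `pwm`."""
    score = 1.0
    for pos, base in enumerate(site):
        score *= pwm[base][pos]
    return score


def sites_above(threshold, pwm=PWM):
    """All 6-mers whose PWM score exceeds `threshold`, best first."""
    hits = []
    for bases in product("ACGT", repeat=MOTIF_LEN):
        site = "".join(bases)
        score = pwm_score(site, pwm)
        if score > threshold:
            hits.append((site, score))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


def matches_consensus(site, consensus="TASDAC"):
    """True if every base of `site` is allowed by the consensus symbol."""
    return len(site) == len(consensus) and all(
        base in IUPAC[symbol] for base, symbol in zip(site, consensus)
    )
```

Calling `sites_above(0.01)` enumerates all 4^6 candidate sites, which is cheap at this motif length; longer motifs would need a smarter search.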

Page 4: The AMADEUS Motif Discovery Platform

Promoter Analysis (cont.): Typical pipeline

[Figure: pipeline flowchart]

• Sources of a co-regulated gene set:
– Gene expression microarrays → Clustering (Cluster I, Cluster II, Cluster III)
– Location analysis (ChIP-chip, …)
– Functional group (e.g., GO term)
• Co-regulated gene set → Promoter sequences → Motif discovery

Page 5: The AMADEUS Motif Discovery Platform

Promoter Analysis (cont.): Goals

Reverse-engineer the transcriptional regulatory network = find the TFs (and their BSs) that regulate the studied biological process

Input: A set of co-expressed genes
Output: “Interesting” motif(s):
1. Known motifs: PRIMA, ROVER, …
2. Novel motifs: MEME, AlignACE, …
3. A group of co-occurring motifs = cis-regulatory module (CRM): MITRA, CREME, …

AMADEUS

Page 6: The AMADEUS Motif Discovery Platform

Promoter Analysis: Status of motif discovery tools

• Extant tools perform reasonably well for:
– Finding known/novel motifs in organisms with short, simple promoters, e.g., yeast
– Identifying some of the known motifs in complex species, e.g., TFs whose BSs are usually close to the TSS
• … but often fail in other cases!
• Each tool is custom-built for a specific target score, often parametric (i.e., assumes a background (BG) model) or uses a small part of the genome as BG reference
• The majority of tools can efficiently handle only dozens of genes
• Comparison of tools: [Tompa et al. ’05]

Page 7: The AMADEUS Motif Discovery Platform

AMADEUS: A Motif Algorithm for Detecting Enrichment in mUltiple Species

• Research platform:
– Extensible: add new algs, scores, motif models
– Flexible: control params, algs, scores of execution
• Experimental tool:
– Sensitive: find subtle signals
– Efficient: analyze many long sequences
– Informative: show lots of info on motifs
– User-friendly: nice GUI

Page 8: The AMADEUS Motif Discovery Platform

Main features: I/O

Input:
• Type: target set / expression data
• Multiple species / target-sets
• Sequence region (promoter, 1st intron, 3’ UTR, …)

Output:
• Non-redundant set of motifs
• Rich info per output motif:
1. Graphical motif logo
2. Multiple scores & combined p-value
3. Similarity to known TFBS models
4. List of target genes
5. BS localization graph
6. Targets mean expression graph

Page 9: The AMADEUS Motif Discovery Platform

Main features: alg.

Algorithm: Multiple refinement phases:
• Each phase receives the best candidates of the previous phase and refines them (e.g., uses a more complex motif model)
• First phases are simple and fast (e.g., try all k-mers); last phases are more complex (e.g., optimize PWM using EM)
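A toy two-phase version of this idea is sketched below; it is not the AMADEUS implementation. The assumptions are loud ones: sequences contain only A, C, G, T; the score is a stand-in (hit fraction in targets minus hit fraction in background) in place of the platform's real scores; and the second phase simply builds a PWM from occurrences with at most one mismatch instead of running EM. All function names are hypothetical.

```python
from collections import Counter


def count_hits(kmer, seqs, max_mismatches=0):
    """Number of sequences containing `kmer` with at most `max_mismatches`."""
    k = len(kmer)
    hits = 0
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if sum(a != b for a, b in zip(kmer, window)) <= max_mismatches:
                hits += 1
                break  # count each sequence once
    return hits


def enrichment(kmer, targets, background):
    """Stand-in score: hit fraction in targets minus hit fraction in background."""
    return (count_hits(kmer, targets) / len(targets)
            - count_hits(kmer, background) / len(background))


def phase1_all_kmers(targets, background, k, keep):
    """Fast phase: score every k-mer seen in the targets, keep the best ones."""
    kmers = {seq[i:i + k] for seq in targets for i in range(len(seq) - k + 1)}
    ranked = sorted(kmers, key=lambda m: (-enrichment(m, targets, background), m))
    return ranked[:keep]


def phase2_build_pwm(kmer, targets, pseudocount=0.01):
    """Slower phase: build a PWM from target windows within one mismatch."""
    k = len(kmer)
    counts = [Counter() for _ in range(k)]
    for seq in targets:
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if sum(a != b for a, b in zip(kmer, window)) <= 1:
                for pos, base in enumerate(window):
                    counts[pos][base] += 1
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({b: (column[b] + pseudocount) / total for b in "ACGT"})
    return pwm


def discover(targets, background, k=6, keep=3):
    """Run phase 1, then refine each surviving candidate in phase 2."""
    candidates = phase1_all_kmers(targets, background, k, keep)
    return [(kmer, phase2_build_pwm(kmer, targets)) for kmer in candidates]
```

The point of the structure is that the expensive refinement runs only on the `keep` candidates that survive the cheap exhaustive phase.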

Page 10: The AMADEUS Motif Discovery Platform

Main features: scores

Motif scores:
• User selects scores to use, a subset of:
─ Target-set: Over/under-representation:
1. Hypergeometric
2. GC-content+length binned binomial
─ Expression:
1. Enrichment of ranked expression (multiple conditions) (Not yet in the public version)
─ Global/spatial:
1. Localization
2. Strand-bias
3. Chromosomal preference
• Scores are combined into a single p-value
• Doesn’t assume specific models for distribution of BSs and/or expression values
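As an illustration of the first target-set score, the standard hypergeometric tail for over-representation can be written as follows. This is a textbook formulation, not necessarily how AMADEUS computes it; the function and parameter names are hypothetical, and how the platform combines several scores into one p-value is not shown here.

```python
from math import comb


def hypergeom_enrichment_pvalue(total, total_hits, targets, target_hits):
    """P(X >= target_hits) when `targets` genes are drawn without replacement
    from `total` genes, of which `total_hits` contain a motif occurrence."""
    upper = min(targets, total_hits)
    tail = sum(
        comb(total_hits, i) * comb(total - total_hits, targets - i)
        for i in range(target_hits, upper + 1)
    )
    return tail / comb(total, targets)
```

For example, `hypergeom_enrichment_pvalue(10, 4, 3, 2)` asks how surprising it is that at least 2 of 3 target genes carry a motif found in 4 of 10 genes overall. Under-representation would use the opposite tail, P(X <= target_hits).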

Page 11: The AMADEUS Motif Discovery Platform

Main features: misc.

GUI:
• Control all parameters
• Save/load parameters from file
• Save textual+graphical output to file
• TFBS viewer

Other:
• Ignore redundant sequences (with identical subsequence)
• Applicable to multiple genome-scale promoter sequences
• Bootstrapping: Empirical p-value estimation using random target sets / shuffled data
• Execution modes: GUI, batch
• Interoperability: Java application
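The bootstrapping bullet can be sketched as below, assuming the random-target-set variant: draw many random gene sets of the same size as the real target set, score each, and report how often a random set scores at least as well. The `score_fn` callback, the parameter names, and the (r + 1) / (n + 1) estimator are assumptions made for this sketch, not details taken from AMADEUS.

```python
import random


def empirical_pvalue(observed, score_fn, all_genes, target_size,
                     n_random=1000, seed=0):
    """Fraction of random target sets scoring at least as high as `observed`.

    `all_genes` must be a list; `score_fn` maps a list of genes to a number.
    The (r + 1) / (n + 1) estimator keeps the estimate from being exactly 0.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    as_good = 0
    for _ in range(n_random):
        random_targets = rng.sample(all_genes, target_size)
        if score_fn(random_targets) >= observed:
            as_good += 1
    return (as_good + 1) / (n_random + 1)
```

The smallest p-value this can report is 1 / (n_random + 1), so the number of random sets bounds the resolution of the estimate.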

Page 12: The AMADEUS Motif Discovery Platform

Case study: G2 & G2/M phases of human cell cycle [Whitfield et al. ’02]

CHR (not in TRANSFAC)

NF-Y

Module: CHR and NF-Y motifs co-occur
(Module was reported in [Linhart et al., ’05], [Tabach et al. ’05])

Page 13: The AMADEUS Motif Discovery Platform

Benchmark I: Yeast TF target sets [Harbison et al. ’04]

Source: ChIP-chip [Harbison et al., ’04]
Data: target-sets of 83 TFs with known BS motifs
Average set size: 58 genes (=35 Kbps)
Success rates: (for top 2 motifs of lengths 8 & 10)

Page 14: The AMADEUS Motif Discovery Platform

Performance on metazoan datasets

Results on 42 target-sets:
• Collected from 29 publications
• Based on high-throughput experiments
• Species: human, mouse, fly, worm
• Sets: 26 TFs, 8 microRNAs
• All have known motifs

Page 15: The AMADEUS Motif Discovery Platform

Global Analysis I: Localized human+mouse motifs

Input:
• All human & mouse promoters (2 x ~20,000)
• Region: -500…100 (w.r.t. TSS)
• Total sequence length: ~26 Mbps
• [No target-set / expression data]
• Score: localization

Results:
• Recovered known TFs: Sp1, NF-Y, GABP, TATA, Nrf-1, ATF/CREB, Myc, RFX1
• Recovered the splice donor site
• Identified several novel motifs

Page 16: The AMADEUS Motif Discovery Platform

Global Analysis II: Chromosomal preference

Input:
• All fly promoters (~14,000)
• Region: -1000…200 (w.r.t. TSS)
• Total sequence length: ~11 Mbps
• [No target-set / expression data]
• Score: chromosomal preference

Results:
• DNA Replication Element Factor (DREF) on X chromosome

Page 17: The AMADEUS Motif Discovery Platform

Global Analysis II: Chromosomal preference (cont.)

Input:
• All worm promoters (~18,000)
• Region: -500…100 (w.r.t. TSS)
• Total sequence length: 6.6 Mbps
• [No target-set / expression data]
• Score: chromosomal preference

Results:
• Novel motif on chrom IV

Page 18: The AMADEUS Motif Discovery Platform

Summary

• Developed Amadeus motif discovery platform:
– Easy to use
– Feature-rich, informative
– Sensitive & efficient
• Constructed a large, real-life, heterogeneous benchmark for testing motif finding tools
• Demonstrated various applications of motif discovery
• http://acgt.cs.tau.ac.il/amadeus

Page 19: The AMADEUS Motif Discovery Platform

Acknowledgements

Tel-Aviv University
Chaim Linhart
Yonit Halperin
Ron Shamir

The Hebrew University of Jerusalem
Gidi Weber