© 2007 Genomatix Software GmbH http://www.genomatix.de - http://www.genomatix-software.com

Too many matches…

Page 1: Too many matches…

Too many matches…

Page 2: Too many matches…

A typical question:

What are the potential TF sites involved in regulation of my gene of interest?

A typical approach:

“Let's run MatInspector over the promoter region of my gene”

Too many matches…

Page 3: Too many matches…

A typical question:

Where do I get my input promoter DNA sequence from?

A typical approach:

“Let's extract 3 kb upstream of the TSS from NCBI, to be sure to have the promoter…”

Too many matches…

Page 4: Too many matches…

A typical result:

Too many matches…

Which of those matches are relevant? How do I get rid of all those “false positives”?

Page 5: Too many matches…

Important facts to consider:

There is not a single false positive match

MatInspector gives you all physical TF binding sites

A single isolated TF binding site carries no function

TFs work through complexes, which are represented at the sequence level by sets of TF binding sites in a certain distance relationship and orientation -> promoter frameworks

A physical TFBS is found every 10 to 15 bp throughout the genome
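To get a feeling for what "every 10 to 15 bp" means for a typical input, here is a small back-of-the-envelope sketch. The sequence lengths are only illustrative assumptions; the spacing values are the ones stated above.

```python
def expected_site_count(sequence_length_bp, spacing_bp):
    """Rough expected number of physical TFBS matches,
    assuming one match every `spacing_bp` base pairs."""
    return sequence_length_bp / spacing_bp


# Illustrative input lengths (assumed values, not tool defaults)
for length in (500, 3000, 10000):
    low = expected_site_count(length, 15)
    high = expected_site_count(length, 10)
    print(f"{length} bp: roughly {low:.0f} to {high:.0f} matches")
```

So a long input sequence alone is enough to explain a very long match list.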

TF binding sites…

Page 6: Too many matches…

Okay, so what is a physical TF binding site?

What is a functional TF binding site?

TF binding sites…

Page 7: Too many matches…

Physical binding sites have no function in transcription on their own

A physical binding site is invariable

A physical binding site is a fixed part of the genome

This DNA sequence usually can bind to its cognate protein(s)

Physical binding sites can be detected by MatInspector

= weight matrix / IUPAC string
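As a minimal sketch of what "detecting a physical binding site" with an IUPAC string can look like, the following scans both strands of a sequence. This is not MatInspector's algorithm (which also uses weight matrices); the pattern "TATAWA" and the function names are assumptions chosen for illustration only.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(sequence, iupac_pattern):
    """Return sorted (start, strand) pairs; start is 0-based on the plus strand."""
    body = "".join(IUPAC[c] for c in iupac_pattern.upper())
    regex = re.compile("(?=(" + body + "))")  # lookahead keeps overlapping hits
    seq = sequence.upper()
    n, k = len(seq), len(iupac_pattern)
    hits = [(m.start(), "+") for m in regex.finditer(seq)]
    # Matches on the reverse complement are mapped back to plus-strand coordinates
    hits += [(n - m.start() - k, "-") for m in regex.finditer(reverse_complement(seq))]
    return sorted(hits)


print(find_sites("GGTATAAAGG", "TATAWA"))
```

Every hit is a physical site in the sense used here: the string is there, whether or not it does anything.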

False positives?

Page 8: Too many matches…

Transcriptional function is defined by the cellular and genomic context

One binding site, five cell types...

...but binding proteins are present only in 2 cell types!

-> no functional binding site in the other 3 cell types!

A functional binding site depends on context!

A functional binding site requires a cellular context

A functional binding site requires a genomic context

Even when binding proteins are present...

...biological function may require additional binding sites!
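The two context requirements can be written down as a simple check. This is only a toy model under stated assumptions: the factor names, cell types and the function name are hypothetical, and passing the check makes a site a candidate, not a proven functional site.

```python
def is_candidate_functional(site_factor, partner_factors, cell_type,
                            expressed, promoter_site_families):
    """Toy rule: a physical site is a functional candidate only if
    - cellular context: its binding protein is present in the cell type, and
    - genomic context: every required partner has a site in the promoter
      and is itself present in the cell type."""
    present = expressed.get(cell_type, set())
    cellular_ok = site_factor in present
    genomic_ok = all(
        partner in promoter_site_families and partner in present
        for partner in partner_factors
    )
    return cellular_ok and genomic_ok


# Hypothetical expression data: which factors are present in which cell type
expressed = {"T cell": {"NFKB", "CREB"}, "hepatocyte": {"CREB"}}

for cell in expressed:
    print(cell, is_candidate_functional("NFKB", ["CREB"], cell,
                                        expressed, {"NFKB", "CREB"}))
```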

Module

Physical vs functional TFBS

Page 9: Too many matches…

A transcriptional module is the smallest functional unit

A transcriptional module consists of two or more TFBSs

Strand orientation, relative order and distance of TFBSs are important

A module also has a strand orientation and can shift within a promoter

[Figure: module elements with their strand orientation: F1 (+), F2 (-), F3 (+/-)]

Transcriptional modules are present in promoters and enhancers

Transcriptional modules

TATA box

INR box

The core promoter - just another module

Transcriptional modules integrate signals via the interacting TFs
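A two-element module with strand orientation, order and distance can be sketched as a small data structure. This is a minimal illustration and not the format used by ModelInspector; the class names, field names and family names are assumptions. The sketch also ignores that a whole module can sit on either strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    family: str    # TF family of the physical site
    position: int  # position in the promoter (bp)
    strand: str    # "+" or "-"


@dataclass(frozen=True)
class TwoElementModule:
    first: str
    first_strand: str
    second: str
    second_strand: str
    min_distance: int  # bp from first to second element
    max_distance: int


def find_module_matches(sites, module):
    """Return all (first, second) site pairs that satisfy the module definition."""
    matches = []
    for a in sites:
        if (a.family, a.strand) != (module.first, module.first_strand):
            continue
        for b in sites:
            if (b.family, b.strand) != (module.second, module.second_strand):
                continue
            distance = b.position - a.position  # positive: b lies downstream of a
            if module.min_distance <= distance <= module.max_distance:
                matches.append((a, b))
    return matches


sites = [Site("F1", 100, "+"), Site("F2", 120, "+"),
         Site("F2", 130, "-"), Site("F2", 400, "-")]
module = TwoElementModule("F1", "+", "F2", "-", 10, 50)
print(find_module_matches(sites, module))
```

Only one of the three F2 sites completes the module: the others fail on strand or on distance.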

Page 10: Too many matches…

No common organization? Common modules!

[Figure: three promoters, each containing the modules A, B and C]

Why does nature use modules?

Page 11: Too many matches…

Promoter modules can work in three different ways

[Figure: three modes of module action. Labels: synergistic or antagonistic interaction; “composite elements”; “short range module”, distance ≤ 50 bp; “looping module”, distance up to 300 bp. Binding affinity: High / Low is possible, or High / High only.]

Transcriptional modules

Page 12: Too many matches…

Modules are the basic elements of regulatory pathways and networks

Transcriptional modules define target genes of pathways

NFkappaB regulates a number of “target genes”

NFkappaB

[Figure: NFkappaB and its target genes: IL-6, IL-8, ICAM-1, SAA-1, SAA-2, ELAM-1, IFN-ß, IP-10, G-CSF, IL-2, HLA-A, HLA-B, IL-1, E-Selectin. Other factors shown: C/EBP, CREB, IRF-1. Module labels: NFkB CREB, NFkB C/EBP, NFkB IRF-1, NFkB NFkB.]

NFkappaB is involved in regulation of target genes of several pathways

Induced by 2 pathways !

Transcriptional modules

Page 13: Too many matches…

Key – lock principle

[Figure: transcription initiation complex. Labels: TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, TBP and RNA polymerase II at the core promoter (TATA, INR); TF binding sites in the proximal promoter and in the distal promoter/enhancer; “DNA looping”; protein complex binding to the transcription factor binding sites.]

Transcriptional modules

Page 14: Too many matches…

Transcription regulation mechanism

Gene A, transcript n

Transcription regulation implies a regulatory network

Protein complex

Exon, Promoter

Gene B, transcript p

Gene C, transcript m

Primary transcript

Transcriptional modules

Page 15: Too many matches…

Context dependent expression by different protein complexes

[Figure: two initiation complexes. Labels in both: TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, TBP, TATA; INR is labeled once.]

Same lock – different keys: Same gene - different biological context

Transcriptional modules

Page 16: Too many matches…

Context specific transcription regulation

Example: Analysis of the RANTES promoter in different cell lines

Transcriptional modules

Experimentally verified evidence that TFBSs from modules, which are crucial for regulation in one biological context (cell type), are totally irrelevant in another!

Fessele, S., Maier, H., Zischek, C., Nelson, P.J., Werner, T. (2002) "Regulatory context is a crucial part of gene function" Trends in Genetics 18, 60-63 (MEDLINE 1181130)

Page 17: Too many matches…

Module matches reduce experimental efforts by orders of magnitude

Modules contribute strongly to functional promoter analysis

Modules are usually linked to at least one known biological function

A module match in a promoter makes this gene a good candidate

Additional independent evidence is required to prove the target

A module match in a promoter does not prove the gene to be a target

A module match immediately suggests experimental verification

Transcriptional modules

Page 18: Too many matches…

Very interesting – but how does all this help me with my original question ?

The question still is:

What are the potential TF sites involved in regulation of my gene of interest ?

Promoter sequences

Page 19: Too many matches…

More things to consider before asking that question !

There was another one:

Where do I get my input promoter DNA sequence from?

“Let's extract 3 kb upstream of the TSS from NCBI, to be sure to have the promoter…”

Promoter sequences

Page 20: Too many matches…

More things to consider …

3 kb is too large for meaningful analysis

even going 10 kb upstream of the TSS is no guarantee to have the relevant promoter sequence

multiple promoters are the rule, not the exception

the non-coding first exon is always part of the promoter
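One way to act on these points is to define the promoter relative to each annotated TSS, strand-aware, and to include a stretch downstream of the TSS so that the start of the first exon is covered. The sketch below assumes 500 bp upstream and 100 bp downstream purely for illustration; these are not stated ElDorado defaults, and the function name is hypothetical.

```python
def promoter_region(tss, strand, upstream=500, downstream=100):
    """Return 1-based inclusive genomic coordinates (start, end) of a promoter
    window around a TSS. `upstream` and `downstream` are assumed example values."""
    if strand == "+":
        return max(1, tss - upstream), tss + downstream
    if strand == "-":
        # On the minus strand, "upstream" means higher genomic coordinates
        return max(1, tss - downstream), tss + upstream
    raise ValueError("strand must be '+' or '-'")


# One window per alternative TSS of the same gene (hypothetical positions)
for tss in (10_000, 42_500):
    print(promoter_region(tss, "+"), promoter_region(tss, "-"))
```

With several alternative TSSs you get several short windows instead of one long upstream stretch.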

Huh? What does this mean? Where do I get this damn promoter now?

Promoter sequences

Page 21: Too many matches…

Which promoter? One gene = one promoter ?

Genes usually have alternative transcripts with alternative promoters

Gene A?

Gene A?

Gene A?

Alternative transcripts/promoters

Page 22: Too many matches…

Context dependent expression via different promoters

Example: Glucokinase

Coding exons

Hepatic promoter

Pancreatic promoter

Y Tanizawa, A Matsutani, KC Chiu, and MA Permutt. Human glucokinase gene: isolation, structural characterization, and identification of a microsatellite repeat polymorphism. Mol. Endocrinol., Jul 1992; 6: 1070-1081.

Alternative transcripts/promoters

Page 23: Too many matches…

Comparative genomic map of the Glucokinase GCK

Promoter set 1

Pancreatic promoter

Data from ElDorado

Alternative transcripts/promoters

Promoter set 2

Hepatic promoter

Page 24: Too many matches…

Important facts to consider:

Alternative promoter usage is often tied to regulation of tissue-specific gene expression

Alternative promoter usage is of very high biological relevance. There are several examples where aberrant regulation of the identical primary transcript leads to severe biological effects

Alternative transcripts/promoters

Page 25: Too many matches…

Aromatase: Switch in promoter usage is associated with disease

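[Figure: aromatase gene structure. Exon labels: 1.1, 1.4, 1.f, 1.6, 1.3, 1, II, III, IV, V, VI, VII, VIII, IX, X; two AATAAA signals; promoter usage compared between normal breast and breast cancer.]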

Aromatase

The gene product is absolutely identical. The only difference is in the alternative promoter usage. At the transcript level this can be seen only in the non-coding first exon.

Alternative transcripts/promoters

Page 26: Too many matches…

The aim of in silico promoter analysis - summary

context 1, context 2, context 3, …, context n

1. Identification of the promoter sequence

2. Prediction of physical transcription factor binding sites

3. Functional context

4. Context dependent functional transcription factor binding sites

Promoter Analysis

Page 27: Too many matches…

Yes! I know all of this! I just wanted to know from where I can get my promoter sequence(s) easily!

ElDorado promoter sequence retrieval

www.genomatix.de

If you don't have one already, sign up for a free evaluation account first...

... then login here!

Page 28: Too many matches…

ElDorado promoter sequence retrieval

Page 29: Too many matches…

Either enter here the locus ID, or the gene name

…or choose a sequence file from your directory...

… or copy & paste a raw sequence here. It can be cDNA or whatever you have. It will be mapped exactly to the genomes within seconds.

Upload a file from your local disk…

...accession number…

… or exact contig position

ElDorado promoter sequence retrieval

Choose the organism.

Page 30: Too many matches…

HMGCS1 (for example)

ElDorado promoter sequence retrieval

Input in this section delivers results based on gene name or keyword search. Over a million names, synonyms and gene IDs help you find what you want - fast!

IMPORTANT! Affymetrix probe-set-ID input:

Our annotation is NOT based on the Affymetrix NetAffx assignment! Instead, it is based on genomic mapping of each single probe. A transcript will be retrieved if at least one probe of the set (usually 11 probes) matches. For mixed probe sets (cross-hybridisation), all relevant transcripts will be retrieved, which might lead to a result with transcripts from different loci.
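The retrieval rule just described ("at least one probe matches") can be sketched as a set union. The probe IDs, transcript IDs and locus names below are hypothetical, and the function names are not part of any Genomatix interface.

```python
def transcripts_for_probe_set(probe_hits):
    """probe_hits maps each probe of one probe set to the transcripts it maps to.
    A transcript is retrieved as soon as a single probe matches it."""
    retrieved = set()
    for transcripts in probe_hits.values():
        retrieved |= set(transcripts)
    return retrieved


def loci_for_transcripts(transcripts, transcript_to_locus):
    """Collect the loci behind the retrieved transcripts."""
    return {transcript_to_locus[t] for t in transcripts}


# Hypothetical mixed probe set: one probe cross-hybridises to another locus
probe_hits = {
    "probe_01": {"tx_A1"},
    "probe_02": {"tx_A1", "tx_A2"},
    "probe_03": {"tx_B1"},
    "probe_04": set(),  # probe without any genomic match
}
transcript_to_locus = {"tx_A1": "locus_A", "tx_A2": "locus_A", "tx_B1": "locus_B"}

transcripts = transcripts_for_probe_set(probe_hits)
print(sorted(transcripts))
print(sorted(loci_for_transcripts(transcripts, transcript_to_locus)))
```

A single cross-hybridising probe is enough to bring a second locus into the result.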

Input in this section delivers results based on ultra-fast sequence mapping. Copy and paste raw sequence data here (min. 15 nucleotides) or enter an accession number. In contrast to the entry of an accession number above, here the sequence is actually retrieved from the database and mapped onto the genome(s). NOTE: many EST-based accession numbers have poor sequence homology and deliver no result.

Page 31: Too many matches…

… here you can choose which chip's probes to see...

ElDorado promoter sequence retrieval

… licensed customers can add their own sequence data

Page 32: Too many matches…

This gives you an interactive graphical representation of the genomic context of your gene

ElDorado promoter sequence retrieval

Page 33: Too many matches…

Everything is clickable, just play around!

ElDorado promoter sequence retrieval

Here you can scale the view

switch display of components on and off

mapping positions of Affymetrix single probes!

scale/slide the retrieved genomic "window"

Orange indicates your input. In this case a gene name. It is very informative when your query is based on sequence data. Then you see the mapping positions.

select regions of the graphics and save them into a file

Page 34: Too many matches…

ElDorado promoter sequence retrieval

Now we have zoomed into the promoter region

Clicking on this transcriptional start region (TSR)...

...displays this hyperlink to ...

Page 35: Too many matches…

ElDorado promoter sequence retrieval

...this profile of the different experimentally verified TSS (CAGE tags) in the different tissue types.

Page 36: Too many matches…

This is a table-like representation of all annotated elements. It is especially useful for quick and easy retrieval of the DNA sequence(s) of interest.

ElDorado promoter sequence retrieval

Page 37: Too many matches…

ElDorado promoter sequence retrieval

Tick/un-tick the boxes of what you would like to see, and then...

Page 38: Too many matches…

ElDorado promoter sequence retrieval

This for instance...

...tells you that this SNP deletes three potential TF binding sites and creates a new one. A potentially regulatory SNP...
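The idea behind such a SNP report can be sketched by scanning the reference and the variant sequence and comparing the two site lists. This is a minimal sketch with IUPAC strings, not the ElDorado or MatInspector implementation; the motif names, patterns and the example sequence are assumptions for illustration.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def scan(sequence, motifs):
    """Return the set of (motif name, plus-strand start, strand) on both strands."""
    found = set()
    n = len(sequence)
    revcomp = sequence.translate(COMPLEMENT)[::-1]
    for name, pattern in motifs.items():
        regex = re.compile("(?=" + "".join(IUPAC[c] for c in pattern) + ")")
        for m in regex.finditer(sequence):
            found.add((name, m.start(), "+"))
        for m in regex.finditer(revcomp):
            found.add((name, n - m.start() - len(pattern), "-"))
    return found


def snp_effect(reference, position, alt_base, motifs):
    """Return (lost sites, gained sites) for a single-base substitution (0-based)."""
    variant = reference[:position] + alt_base + reference[position + 1:]
    before, after = scan(reference, motifs), scan(variant, motifs)
    return sorted(before - after), sorted(after - before)


# Hypothetical motifs and sequence
motifs = {"TATA_box": "TATAWA", "GC_box": "GGGCGG"}
reference = "CCGGGAGGTATAAAGC"
print(snp_effect(reference, 5, "C", motifs))   # substitution that creates a site
print(snp_effect(reference, 10, "G", motifs))  # substitution that destroys a site
```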

Page 39: Too many matches…

From here you can directly run a MatInspector analysis for this promoter...

...again, play around with the interactive graphics...

Click the symbols and jump right into MatBase, the TF knowledge base...

ElDorado promoter sequence retrieval

Page 40: Too many matches…

ElDorado promoter sequence retrieval

Now, finally, the first way to extract a promoter sequence ...

...and/or any other element displayed in the list below.

Choose your desired length. Unless you have good reason to change the length of the proximal promoter, leave the defaults!

Page 41: Too many matches…

ElDorado promoter sequence retrieval

This shows you all annotated alternative transcripts, plus all single-probe mappings of the Affymetrix probe sets, plus another way to extract your promoter sequence(s)

Page 42: Too many matches…

ElDorado promoter sequence retrieval

You know this already... Three different known transcripts for this locus...

... and four distinct promoters! How that comes about, I'll tell you in a minute

Page 43: Too many matches…

ElDorado promoter sequence retrieval

Tick the promoter of your interest...

...choose format...

...and extract the sequence.

Or submit the promoter directly to MatInspector for graphical analysis. It works on a single sequence, too.

Or submit sequences directly to one of those tasks. But they only make sense with multiple sequences. More on that later!

Page 44: Too many matches…

ElDorado promoter sequence retrieval

But why do I have four promoters here?

And two don't even have a transcript assigned, as it is written here!

And what's all that CompGen thing about?

The multiple promoter thing I showed you before. Remember the GCK example, liver and pancreas?

Now to the CompGen promoters. They are derived by a proprietary comparative genomics approach.

Page 45: Too many matches…

ElDorado promoter sequence retrieval

Page 46: Too many matches…

ElDorado promoter sequence retrieval

Exhaustive cross-mapping of all transcripts to all genomes of all organisms in ElDorado generates our homology groups.

The tick-boxes you know already...

We need them for later promoter retrieval. Note the Promoter Set number! For our example we have a homologous locus assigned in chimp, macaca, human, rat, dog, cow, opossum, chicken, and zebrafish.

Page 47: Too many matches…

ElDorado promoter sequence retrieval

Get a feeling for the degree of phylogenetic conservation of the respective promoter.

See how much experimental evidence supports this promoter

Page 48: Too many matches…

ElDorado promoter sequence retrieval

You should be familiar with this view, now.

Here the orange indicates a promoter belonging to a promoter set.

With these tick-boxes you can switch on and off the display of the different Promoter Sets

A Promoter Set represents phylogenetically conserved promoters

Page 49: Too many matches…

ElDorado promoter sequence retrieval

Don't waste my time here!

How do I get my promoter sequence now?

And which one of all those promoters should I take ?

Well, which one? If you do not have any other information (experimental or from literature), I would recommend that you consider all available alternative promoters for further analysis

Page 50: Too many matches…

A few minutes ago I showed you two easy ways to retrieve a promoter sequence with two mouse clicks.

There are more...

ElDorado promoter sequence retrieval

Oh... you cannot access these options?

Page 51: Too many matches…

You should license GenomatixSuite with at least the 10-fold evaluation account upgrade.

Otherwise it is slightly more cumbersome...

ElDorado promoter sequence retrieval

Use one of the options I showed you before and get Contig and positional information...

... and use that for sequence retrieval from your second-favorite system after Genomatix, e.g. NCBI
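If you go the NCBI route, the contig accession and positions can be turned into a sequence request against the NCBI E-utilities EFetch service. The sketch below only builds the request URL; the accession number and coordinates are placeholders, and the function name is hypothetical.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, start, end, minus_strand=False):
    """Build an EFetch URL for a FASTA sub-sequence of a nucleotide record.
    `start` and `end` are 1-based positions on the record."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,
        "seq_stop": end,
        "strand": 2 if minus_strand else 1,  # 1 = plus strand, 2 = minus strand
    }
    return EFETCH + "?" + urlencode(params)


# Placeholder accession and coordinates, for illustration only
print(efetch_url("NT_000000.0", 9500, 10100, minus_strand=True))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the sub-sequence as FASTA, provided the accession exists.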

Hint: If you are interested in the TF results rather than the sequence, use the “search for common transcription factor binding sites” option as shown before.

Page 52: Too many matches…

From physical to functional TF site

Quite interesting…

But I am not a single step closer to the answer of my real question:

What are the potential TF sites involved in regulation of my gene of interest ?

Well, I think you are. The essential first step is to analyze the right sequence at a length that allows for meaningful results.

Now that you have the real promoter sequence(s), let's see how to go on from here...

Page 53: Too many matches…

The ideal situation for determining potential functional binding sites would be to have a set of genes that are apparently co-regulated in the given cellular and experimental context, e.g. from a microarray experiment. A comparative promoter analysis with FrameWorker would very likely give you a pattern of involved TFs, as shown in numerous publications (see our web site at “About us -> Publications”).

But I have only a single gene.

And that's the one I am interested in!

Then we have to look for additional evidence that some of the physical TF sites might be functional ones. The best option would be a chromatin IP experiment. However, for that you would need some hint as to which TFs to make or buy antibodies for. Further computer analysis is required anyhow!

There are three different roads to go...

From physical to functional TF site

Page 54: Too many matches…

We talked about promoter modules before. Search your sequence for promoter modules with ModelInspector. Our Promoter Module Library contains over 550 promoter modules, each of them experimentally verified to carry transcriptional regulatory activity. A module match increases the probability that an involved TF site is functional.

Look for phylogenetically conserved patterns of TF sites in a comparative genomics promoter set with FrameWorker. TFs that are part of such phylogenetically conserved frameworks have a higher probability of being functional.

Do extensive literature data mining with BiblioSpherePE for known TF correlations, pathway analysis and gene set creation for comparative promoter analysis. TFs showing biological activity in another experimental context are functional (at least in that context).

From physical to functional TF site

Okay, how do I do this?

Let's go!

Page 55: Too many matches…

ElDorado promoter sequence retrieval

Let's start with an analysis for promoter modules...

Page 56: Too many matches…

Search for promoter modules

If you are licensed, you can have a quick look at the promoter module library. Each module is experimentally verified to carry regulatory activity.

Page 57: Too many matches…

Choose a sequence file from your directory

Or copy & paste a raw sequence here. Or… you know the rest!

Don't click anything below, unless you want to scan an entire database!

Search for promoter modules

Page 58: Too many matches…

go for vertebrate modules...

Click here! You can wait for the result…

Search for promoter modules

Page 59: Too many matches…

Search for promoter modules

Page 60: Too many matches…

Search for promoter modules

Page 61: Too many matches…

Search for promoter modules

Now we have narrowed it down to 21 very interesting positions in this promoter, with modules that are composed of a total of 11 different transcription factor binding sites. Our arbitrarily chosen example HMGCS1 belongs to the cholesterol biosynthesis pathway. Some of the promoter modules found do have a proven function in sterol regulation!

…Wow! That's impressive!

But that example is a mock-up, isn't it?

Page 62: Too many matches…

Not really. It is a nice example to show this approach. Very frequently one finds functionally related modules. However, there is no guarantee… It just adds another line of evidence.

Okay, how does the other thing help? What did you call it, phylogenetically conserved frameworks?

That's right. For this approach you first need a set of phylogenetically conserved promoters. Remember, from several slides back?

Phylogenetically conserved frameworks

Page 63: Too many matches…

Inspect and choose your Promoter Set...

ElDorado promoter sequence retrieval

...scroll to the top of the page...

and tick the promoters of one set.

In this example I choose Promoter Set 3 for human, rat, dog and cow.

Page 64: Too many matches…

From here you can have a look at TF binding sites which are common to the input promoters

Phylogenetically conserved frameworks

...scroll down...

Great !

That is what I really want to know: Which TF sites do they have in common?

Page 65: Too many matches…

Be careful !!

Phylogenetically conserved frameworks

This is no more than a tiny hint! I can show you many cases where totally unrelated exons have more TF sites in common than closely co-regulated promoters. What you are really looking for is a conserved pattern of TF sites. And that is what we are going to do. But first let's have a look at the nucleotide sequence level...

Page 66: Too many matches…

DiAlign TF gives an overlay of a true multiple sequence alignment (not pairwise) and common TF sites. Check DiAlign for other sequences (including amino acids)! It is extremely fast and especially powerful for finding short homologies in largely unrelated sequences.

Phylogenetically conserved frameworks

Page 67: Too many matches…

The parameters should be self-explanatory. You can always click for help

Phylogenetically conserved frameworks

Page 68: Too many matches…

Here is an output example.

Phylogenetically conserved frameworks

Page 69: Too many matches…

Why did you do this?

What does it tell me?

It is pretty informative to get a feeling for the degree of homology: which parts are more conserved than others, and which TF binding sites reside in the homologous parts. It is also of interest to see where the evolutionary pressure was on functional conservation (TFBS) rather than on sequence conservation.

Phylogenetically conserved frameworks

Page 70: Too many matches…

Also, if you do a framework analysis on two highly homologous sequences, you run into a combinatorial explosion. FrameWorker checks for this and might give you a warning. However, in this case everything is fine...
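Why an explosion? Highly homologous sequences share many TF sites at nearly identical spacing, so almost every subset of the shared sites is a candidate pattern. The sketch below counts unordered subsets as a rough upper bound; it is not FrameWorker's algorithm, and the site counts and size limits are assumed example values.

```python
from math import comb


def candidate_subsets(common_sites, min_elements, max_elements):
    """Number of ways to pick between min_elements and max_elements sites
    out of `common_sites` shared sites (order and distance ignored)."""
    return sum(comb(common_sites, k) for k in range(min_elements, max_elements + 1))


# Assumed numbers of shared sites; frameworks of 2 up to 6 elements
for shared in (10, 18, 40):
    print(shared, "shared sites:", candidate_subsets(shared, 2, 6), "candidate subsets")
```

In less similar sequences the distance constraints rule out most of these subsets early, which keeps the search manageable.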

Phylogenetically conserved frameworks

Page 71: Too many matches…

Phylogenetically conserved frameworks

Now, we finally go to the FrameWorker analysis!

Page 72: Too many matches…

Here you can restrict the selection to TFs known to be associated with certain tissues.

Here you can choose the matrix library

Phylogenetically conserved frameworks

This filter is a positive filter! Only TFs known to be associated with a tissue are listed here. A TF not listed in a certain tissue does NOT mean that it is not expressed there! It just has not been reported yet.

Page 73: Too many matches…

More options gives you...

Phylogenetically conserved frameworks

...well, more options! Don't change those parameters unless you know exactly what you are doing!

Page 74: Too many matches…

Phylogenetically conserved frameworks

Use it with care! It slows down FrameWorker considerably!

This decides the number of input sequences which have to show a common pattern of TF sites

This sets the distance constraints between two adjacent TF sites. More important than the absolute distance is the distance variance. Always start at default values (unless you know already better) and relax gradually if nothing meaningful is found.

If you know that a certain TF is involved in the regulation of your gene, make it a mandatory element and search only for frameworks containing it. Mandatory elements are most helpful in focusing your analysis. If you don't know one a priori, I'll show you later in BiblioSpherePE how to get to those. Toggle multiple choices by holding the "Ctrl" key when clicking!

This option gives you an idea of the specificity of the found frameworks. It checks how often a framework would be found in a background of 5,000 random human promoter sequences.

One word on this parameter. It decides the minimum/maximum number of TF sites being allowed in one framework. In this case I increased the default value from 6 up to 10 since we want to identify the largest conserved pattern in this phylogenetic promoter set. We might lower this later.
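To make the quorum and distance parameters more tangible, here is a minimal sketch for a two-element framework. It is not FrameWorker's implementation: strand is ignored, and all names, positions and thresholds are assumptions. It asks how many input sequences carry family A followed by family B at distances that differ by at most `max_variance` bp.

```python
def framework_support(sequences, first, second, min_dist, max_dist, max_variance):
    """Return the indices of the sequences supporting a common first -> second
    pair whose distances lie within one window of width `max_variance`."""
    observed = []  # (distance, sequence index)
    for index, sites in enumerate(sequences):
        for fam_a, pos_a in sites:
            if fam_a != first:
                continue
            for fam_b, pos_b in sites:
                distance = pos_b - pos_a
                if fam_b == second and min_dist <= distance <= max_dist:
                    observed.append((distance, index))
    best = set()
    for low, _ in observed:
        supporters = {i for d, i in observed if low <= d <= low + max_variance}
        if len(supporters) > len(best):
            best = supporters
    return best


def is_common_framework(sequences, first, second, min_dist, max_dist,
                        max_variance, quorum):
    support = framework_support(sequences, first, second,
                                min_dist, max_dist, max_variance)
    return len(support) >= quorum


# Hypothetical site lists (family, position) for four promoters
promoters = [
    [("A", 10), ("B", 30)],    # distance 20
    [("A", 50), ("B", 72)],    # distance 22
    [("A", 5), ("B", 60)],     # distance 55
    [("A", 100), ("B", 121)],  # distance 21
]
print(sorted(framework_support(promoters, "A", "B", 5, 200, 5)))
print(is_common_framework(promoters, "A", "B", 5, 200, 5, quorum=3))
```

The third promoter has both sites, but at a very different distance, so it does not support the framework. That is the sense in which distance variance matters more than absolute distance.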

And always think about the HELP pages !

Page 75: Too many matches…

All four promoters have 18 TF sites in common. This number might differ from the "search for common TF" job earlier, since we now take strand specificity into account.

The longest frameworks contain 8 TF sites. There are 4 different frameworks. If you click the link, you jump directly to the graphical representation.

Phylogenetically conserved frameworks

Page 76: Too many matches…

Here you have a graphical representation. You already know how this works...

You can save this framework in your personal directory for subsequent sequence or database scans

Phylogenetically conserved frameworks

Scroll down to the bottom of the page...

Here you see the detailed description of the framework. It is perfectly conserved throughout the species

Page 77: Too many matches…

Phylogenetically conserved frameworks

At the bottom of the output you find this list. Now we have identified not only the TFs but also the exact positions which are worth a closer look. You can scan all of our promoter databases with your saved frameworks for promoters with similar organization.

Why should I do this?

Would this give me additional information ?

Page 78: Too many matches…

In this example, with an 8-element framework and almost no distance variation between the TF sites, you will find exactly 1 match in over 56,000 human promoters: the input gene.

How can this approach be used with less selective frameworks to identify similarly organized promoters?

I'll show you later…

Phylogenetically conserved frameworks

Page 79: Too many matches…

Fine! I think I have now seen two strategies. You mentioned three?

Yes. The third is knowledge-driven and is based on a combination of literature data mining, sequence analysis and pathway/network analysis. For this you first need to download and install the Java client of BiblioSpherePE.

Knowledge based analysis

Page 80: Too many matches…

Knowledge based analysis

Page 81: Too many matches…

Knowledge based analysis

For a more detailed introduction to BiblioSpherePE, please have a look at

http://www.genomatix.de/products/BiblioSphere/BiblioSpherePE5.html

Page 82: Too many matches…

...un-tick this box... We are interested in the full network around our gene, not only the connected transcription factors.

HMGCS1

Knowledge based analysis

Choose "single gene" here...

Page 83: Too many matches…

Knowledge based analysis

Page 84: Too many matches…

Here you have a list of all other genes that are connected to your input gene by at least one co-citation at abstract level in the whole of PubMed.
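The idea of an abstract-level co-citation can be sketched in a few lines of Python. This is only an illustration and not how BiblioSpherePE is implemented; the function name and the toy abstracts (each reduced to the set of gene symbols it mentions) are invented for the example.

```python
from collections import Counter


def cocited_genes(abstracts, query_gene, min_count=1):
    """Count in how many abstracts each gene is mentioned together with query_gene."""
    counts = Counter()
    for genes in abstracts:
        if query_gene in genes:
            counts.update(g for g in genes if g != query_gene)
    return {gene: n for gene, n in counts.items() if n >= min_count}


# Toy data: each set stands for the gene symbols found in one abstract.
toy_abstracts = [
    {"HMGCS1", "SREBF1"},
    {"HMGCS1", "SREBF1", "LDLR"},
    {"LDLR", "APOA1"},
]

partners = cocited_genes(toy_abstracts, "HMGCS1")
frequent = cocited_genes(toy_abstracts, "HMGCS1", min_count=2)
```

Raising `min_count` in this sketch plays the role of a co-citation frequency filter: genes mentioned together with the query gene only once drop out of the list.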

Knowledge based analysis

Click around and see what happens!

This sets the stringency of the context-sensitive filter. The most stringent level that still relies on computer-based semantic analysis is the ordered Gene1 – function word – Gene2 level (B3). (B4) shows expert-curated gene-gene relationships only. Expert knowledge is derived from different sources, such as Genomatix experts, Molecular Connection's NetPro database, STKE, etc.

Page 85: Too many matches…

This filters the co-citation frequency.

I have intentionally chosen an example with no expert curation available, since I want to demonstrate how to generate new knowledge!

Knowledge based analysis

Page 86: Too many matches…

Knowledge based analysis

Here you see the network around HMGCS1, with all other genes connected at GFG level.

Page 87: Too many matches…

Knowledge based analysis

Here you see only the transcription factors connected at GFG level...

Page 88: Too many matches…

Knowledge based analysis

Now all connected transcription factors..

Page 89: Too many matches…

Knowledge based analysis

A connection line between two genes means that there is a bibliographic connection at abstract level (B0)...

Page 90: Too many matches…

Knowledge based analysis

"Mouse over" and clicking gives you more information...

Page 91: Too many matches…

Knowledge based analysis

The green indicates that there is a binding site for SREBF1 (V$SREB) in at least one of the promoters of HMGCS1

Page 92: Too many matches…

Knowledge based analysis

There is more encoded in the connection lines...

Page 93: Too many matches…

The little symbols give you some information about the gene and its association with pathways

Knowledge based analysis

Page 94: Too many matches…

Knowledge based analysis

Some more helpful options from this page...

The tagged text tells us that the TF SREBF1 is involved in the regulation of HMGCS1.

Page 95: Too many matches…

Knowledge based analysis

You can get all the information about any gene you click up there...

over here...

This you know already...

Page 96: Too many matches…

Knowledge based analysis

..as well as this.

Page 97: Too many matches…

Knowledge based analysis

..as well as this.

Hey, hey hey ! Stop it !

I want to know about the regulation of my gene, not to play around with your Biblio...thing!

Page 98: Too many matches…

Knowledge based analysis

BiblioSphere PathwayEdition! We have already found TFs of interest that are known to be involved in the regulation of our gene. Now let's see the biological environment of our gene and find a group of related genes which might share some regulatory motifs. Let's go back and display all genes contained in this network...

Page 99: Too many matches…

Knowledge based analysis

Let's load the GO filter "biological process"...

Page 100: Too many matches…

Knowledge based analysis

Here you see the tree for the selected filter. Expand and collapse it by clicking on the +/-.

Go to the table view via this tab...

Page 101: Too many matches…

Knowledge based analysis

The Z-score measures whether certain categories are significantly over- or under-represented in the displayed gene set.

Top scoring is sterol and cholesterol metabolism...

Everything above 3 is statistically significant!
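The slides do not say how BiblioSpherePE computes its Z-score. One common formulation, shown here as a hedged Python sketch, compares the observed number of category members in the gene set with the mean and variance expected under a hypergeometric model. The function name and all numbers in the example are assumptions chosen for illustration.

```python
import math


def category_z_score(hits, set_size, category_size, population_size):
    """Z-score of observing `hits` category members in a gene set of `set_size`,
    using the mean and variance of the hypergeometric distribution."""
    p = category_size / population_size
    expected = set_size * p
    variance = (set_size * p * (1 - p)
                * (population_size - set_size) / (population_size - 1))
    if variance == 0:
        raise ValueError("variance is zero; the Z-score is undefined")
    return (hits - expected) / math.sqrt(variance)


# Hypothetical numbers: 8 displayed genes, 6 of them in a category that
# holds 100 of 20,000 annotated genes.
z_over = category_z_score(hits=6, set_size=8, category_size=100, population_size=20000)

# Same set, but none of the 8 genes falls into the category.
z_none = category_z_score(hits=0, set_size=8, category_size=100, population_size=20000)
```

With these made-up numbers the first score lies far above the threshold of 3 mentioned on the slide, while the second is slightly negative, i.e. the category is mildly (and not significantly) under-represented.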

Clicking here opens the tree on the left and highlights the category as well as the respective genes in the pathway view.

Page 102: Too many matches…

Knowledge based analysis

This finally applies the filter to your gene set.

Superimpose as many filters as you'd like!

Page 103: Too many matches…

Knowledge based analysis

We see two TFs in here, SREBF1 and SREBF2, both Sterol Regulatory Element Binding Protein factors.

The "redraw" button

Double-click on SREBF1 in order to see all connections to that TF

Page 104: Too many matches…

Knowledge based analysis

Another table view...

Page 105: Too many matches…

Knowledge based analysis

...the colors encode for...

Highlight those genes with your mouse and copy them...

Page 106: Too many matches…

Knowledge based analysis

They all are connected with my original gene in PubMed

Now we have expanded our single input gene with a set of seven additional genes! And we know already quite a lot about them!

All genes belong, with very high statistical significance, to the GO category "Cholesterol Metabolic Process".

SREB transcription factors seem to play a role in the regulation of those genes.

Now let's check whether the promoters of those genes share a complex framework.

For this we first need to export those genes into GenomatixSuite's Gene2Promoter.

Page 107: Too many matches…

Oh my god... more...

Where do I find this now ?

Back to sequence level

Relax! It's easy and not far away...

Page 108: Too many matches…

Paste here the gene symbols which we just copied in BiblioSpherePE.

Don't forget this! Otherwise you will be asked for all findings in all organisms.

Back to sequence level

APOA1, LDLR, SREBF2, VLDLR, FDFT1, FDPS, MVK, HMGCS1

Page 109: Too many matches…

Back to sequence level

Page 110: Too many matches…

Hey, stop!

Haven't I seen this before?

You are right! It is pretty much the same display as the comparative genomics page which we generated earlier.

The difference in this case is that we now compare promoters of different genes within one organism…

Back to sequence level

Page 111: Too many matches…

Back to sequence level

Eight loci with 26 different unique promoters!

9,216 combinations possible for exhaustive analysis!

Combinatorial explosion!
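The number of combinations presumably comes from choosing one promoter per locus, i.e. from multiplying the promoter counts of all loci. The short Python check below works under that assumption. The per-locus split is hypothetical: the slides give only the totals (8 loci, 26 promoters, 9,216 combinations), and the list is simply one split that reproduces them.

```python
from math import prod

# Hypothetical split of the 26 promoters over the 8 loci. The slides give only
# the totals; this split is one that reproduces them.
promoters_per_locus = [4, 4, 4, 4, 3, 3, 2, 2]

total_promoters = sum(promoters_per_locus)
combinations = prod(promoters_per_locus)  # one promoter chosen per locus

print(total_promoters, combinations)
```

Because the counts are multiplied, every additional alternative promoter at a locus scales the whole search space, which is why removing promoters before the analysis pays off so quickly.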

We have to find a way to circumvent this

Since we are concentrating on SREB TF sites, let's focus on those promoters which contain a V$SREB binding site.

How should I know which ones?

How do I do this ?

Very easy! Just scroll down to the bottom of the page...

Page 112: Too many matches…

Back to sequence level

Select the desired TF-matrix family here

Page 113: Too many matches…

Back to sequence level

...and all relevant promoters are checked already for you

Now we have reduced the set to 12 different promoters from 8 different loci, each containing at least one SREB site.
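The effect of this pre-selection can be sketched in Python. This is an illustration only, not Gene2Promoter's implementation: the function name, the dictionary layout, the promoter IDs and the placeholder families `V$AAA` and `V$BBB` are invented, and the number of combinations is again taken as the product of promoters per locus.

```python
from math import prod


def keep_promoters_with(promoters_by_locus, family):
    """Keep, per locus, only promoters whose site list contains `family`."""
    return {
        locus: [p for p in promoters if family in p["families"]]
        for locus, promoters in promoters_by_locus.items()
    }


# Invented toy input: two loci; promoter IDs and families are placeholders.
toy = {
    "locus_A": [
        {"id": "A1", "families": {"V$SREB", "V$AAA"}},
        {"id": "A2", "families": {"V$AAA"}},
        {"id": "A3", "families": {"V$SREB"}},
    ],
    "locus_B": [
        {"id": "B1", "families": {"V$BBB"}},
        {"id": "B2", "families": {"V$SREB", "V$BBB"}},
    ],
}

filtered = keep_promoters_with(toy, "V$SREB")
before = prod(len(p) for p in toy.values())      # 3 * 2 combinations
after = prod(len(p) for p in filtered.values())  # 2 * 1 combinations
```

In the toy data the filter drops one promoter per locus and cuts the combinations from 6 to 2; the same mechanism is what makes the reduction from 26 to 12 promoters in the example worthwhile.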

Page 114: Too many matches…

Scroll to the bottom of the Gene2Promoter result page...

Back to sequence level

We have done this before...

Page 115: Too many matches…

You see? Now we have tolerable combinatorics and can perform an exhaustive promoter analysis.

Back to sequence level

Page 116: Too many matches…

Remember? We have been here before, too...

Back to sequence level

...but now we choose V$SREB as a mandatory element for our framework.

Hint: you can select multiple elements by holding the "Ctrl" key while clicking.

...and with these parameters you have to play around a little bit. Start at the defaults. Gradually relax the stringency. Go down in Quorum Constraint step by step, or allow for a higher distance variance (e.g. 20, 30, 40, 50, etc.). The lower the distance variance and the more elements per model, the higher the resulting model selectivity.
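The "start strict, relax step by step" advice can be written down as a simple search loop. The Python sketch below is hypothetical: FrameWorker is operated through its web interface in these slides, so `fake_search` merely stands in for a real run, and the quorum steps and the starting variance of 10 bp are assumed values (only the variance steps 20 to 50 come from the slide).

```python
def relaxation_schedule(quorums, variances):
    """Yield (quorum, variance) pairs from most to least stringent:
    variance is relaxed first, then the quorum is lowered."""
    for quorum in quorums:
        for variance in variances:
            yield quorum, variance


def first_successful(schedule, search):
    """Return the first setting for which `search` finds any framework."""
    for quorum, variance in schedule:
        frameworks = search(quorum, variance)
        if frameworks:
            return (quorum, variance), frameworks
    return None, []


# Stand-in for a real FrameWorker run (hypothetical): pretends that a
# framework only appears once quorum <= 50 % and variance >= 30 bp.
def fake_search(quorum, variance):
    return ["framework_1"] if quorum <= 50 and variance >= 30 else []


schedule = relaxation_schedule(quorums=[100, 75, 50, 30],
                               variances=[10, 20, 30, 40, 50])
setting, found = first_successful(schedule, fake_search)
```

Stopping at the first setting that yields a framework keeps the model as selective as the data allow; any further relaxation would only add less specific frameworks.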

Page 117: Too many matches…

Back to sequence level

For example, at a quorum of 30%, an allowed distance range of 5 to 200 bp, a distance variance of 50 bp and a maximum of 10 elements allowed, we find quite a lot of frameworks in the different promoter combinations.

There are frameworks with 6 elements! This is quite significant and expected to be extremely selective.

Tick the boxes of the models you want to use in the subsequent database search for other promoters with similar organization.

With 6 elements I expect to find only the 3 genes from which these models were derived: SREBF2, HMGCS1, and MVK.

Page 118: Too many matches…

Back to sequence level

Scroll all the way down...

This list is quite interesting! Here we have the different TF sites in this set of frameworks. This list represents those TFs which we should concentrate on when analyzing the regulation of the original input gene. It is pretty comparable to the list from our phylogenetic approach before.

There is now good evidence that those factors play a role in regulation in the biological context of cholesterol metabolism.

Now let's see how selective this model is...

Page 119: Too many matches…

It is just one click away...

Back to sequence level

Page 120: Too many matches…

This should look familiar to you !

But now we are going for the database section...

Back to sequence level

Unless you have a good reason to do so, always go for the database of promoters of annotated genes. This allows for GO-group Z-scoring of the database hits later on...

Page 121: Too many matches…

This is a termination parameter. If this number of hits is reached before the end of the database, the search is terminated
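What such a termination parameter does can be shown with a minimal Python sketch. It is a generic early-stopping loop, not ModelInspector's code; the function name, the `matches_model` callback and the toy database are assumptions for illustration.

```python
def scan_database(promoters, matches_model, max_hits):
    """Scan promoters in order and stop as soon as max_hits matches are found."""
    hits = []
    for promoter in promoters:
        if matches_model(promoter):
            hits.append(promoter)
            if len(hits) >= max_hits:
                break  # termination parameter reached before the end
    return hits


# Toy run: promoters are just numbers, and every third one "matches".
toy_db = range(1, 101)
hits = scan_database(toy_db, lambda p: p % 3 == 0, max_hits=5)
```

Note the consequence: when the limit is reached, the hits returned are those found first in database order, not a representative sample of the whole database.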

Back to sequence level

Careful!

Some browsers crash when there are too many hits to display in HTML (>10,000)!

A database search usually takes several minutes. In order to avoid a server time-out, go for the e-mail option. You'll receive a mail with a direct link to your result file (it will be kept in your "Results Directory", too).

Page 122: Too many matches…

Eight matches!

Back to sequence level

Wow !

In four sequences. Each model matches exactly once per sequence...

...out of a total of 56,193 different promoters

The three genes of our "training set"...

Page 123: Too many matches…

Back to sequence level

...plus one additional "new" gene! This one was not in our input list and is identified only by common promoter organization!

Page 124: Too many matches…

Back to sequence level

Those four genes now are extremely likely to share common regulation in the given biological context!

The TFs in the framework now are the top candidates for further inspection.

Page 125: Too many matches…

Back to sequence level

STOP!! First I had too many matches in MatInspector, now there are too many slides!!

Page 126: Too many matches…

New Knowledge

I am terribly sorry for that! However, eukaryotic transcriptional regulation is pretty complex.

Our group of researchers has been working in this field for more than two decades.

As you have seen, our tools, though pretty easy to use, require some explanation and sometimes a slightly different mindset that goes beyond looking at single, isolated TF binding sites.

I hope I was able to show you some basic strategies to follow.

Nevertheless, let's have a final look at the additional gene which we found with the database search in our example...

Page 127: Too many matches…

New Knowledge

Page 128: Too many matches…

New Knowledge

Page 129: Too many matches…

New Knowledge

MMAB is a transferase involved in vitamin B(12) activation and linked to a disease: methylmalonic aciduria

Page 130: Too many matches…

New Knowledge

Feeding all 4 genes from ModelInspector into BiblioSpherePE shows that they are all connected

plus...

Page 131: Too many matches…

In our example, we

started with a single gene (HMGCS1) [ElDorado],

put it into biological context and concentrated on a potential regulator (SREB) [BiblioSpherePE],

identified common promoter organization (TF framework) [GEMS Launcher, FrameWorker],

searched for additional genes with similar promoter organization [GEMS Launcher, ModelInspector], and

put the genes back into biological context [BiblioSpherePE].

Literature confirmed that we indeed found a co-regulated network and identified the molecular basis for such.

This could NEVER be achieved by statistical analysis of isolated TFBS

Page 132: Too many matches…

There is so much more in GenomatixSuite PE

I have not said a word about matrix generation, about direct experimental planning for knock-out/knock-in experiments with SequenceShaper, about expanding the hit list by shortening the framework, etc.

Get in touch with us via [email protected] and we will give you a tour through the entire system in a web meeting.

Some informative links:
http://www.genomatix.de/company/publications1.html
http://www.genomatix.de/training/tasks.html
http://www.genomatix.de/download/download4.html
http://www.genomatix.de/cgi-bin/UMapps/register.pl