Regulatory Sequences (Basics) Alexander Kel Senior Vice President of Genome Informatics, BIOBASE GmbH, Halchtersche Strasse 33 D-38304 Wolfenbuettel Germany www.biobase.de


Page 1:

Regulatory Sequences

(Basics) 

Alexander Kel 

Senior Vice President of Genome Informatics, BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbuettel, Germany, www.biobase.de

 

Page 2:

TRANSCompel

TRANSFAC

TRANSPATH

PathoDB, S/MARt DB

- mechanistic
- semantic

Match, Patch

Catch

Pathway builder, Array analyser

Cytomer, TRANSGenome, TRANSPLORER

CMFinder

Page 3:

BIOBASE customers*

* not complete

TRANSFAC: Syngenta, Celera, Monsanto, Pfizer, Merck Sharp & Dohme, Amgen, Takeda, Novartis, GlaxoSmithKline

TRANSPATH: Vertex

Both: Aventis, Eli Lilly, Schering-Plough, Hoffmann-La Roche, Akzo Nobel

More than 200 academic labs including:

Harvard Stanford Tokyo University Riken Labs Max Planck

More than 7000 registered users on our portal

gene-regulation.com

Page 4:

Same blocks - different structures

LEGO system

Page 5:

Concepts of gene regulation

Page 6:

DNA

RNA

protein

transcription

translation

amplification, methylation, chromatin structure

splicing, degradation

modification, degradation

information carrier 1

transformation

carrier organization

information carrier 2

Page 7:

TRANSFAC

Gene structure

[Figure labels: Contig; Gene; Splice Variants; mRNA; Regulatory Elements; 5'-UTR; CDS; 3'-UTR; 5' end; 3' end; Transcription; primary transcript; Splicing; alternative exon]


Page 9:

General schema of the modular hierarchical structure of transcription regulatory regions of eukaryotic genes.

[Figure labels: promoter; enhancer 1; enhancer 2; TSS; TATA box; initiator (Inr); composite element; box A; box B; box C; box A'; box E; box D; box D'; box F; box G; box A'']

Page 10:

…cis

trans

Page 11:

Human genes: sequences and positions of AP-1 binding sites

Gene                          Position
glutathione P-transferase     enhancer at -2500
hemoglobin, epsilon           -80 bp
Akt-2                         -100 bp
IFN-                          -89 bp
Apo AII                       -792 bp
Melanotransferrin             -2013 bp
Collagenase                   -72 bp
proto-oncogene c-myc          -335 bp
porphobilinogen deaminase     -162 bp
GM-CSF                        enhancer at -3500

AP-1 site sequences (slide order):

TGACTTT
TGACATC
TGTCACC
TGACTCA
TGAGTCA
TGAGTCA
TGATTTA
TGACTCA
TGACTCA
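The site sequences above can be summarised column by column. The following is a minimal Python sketch, not part of the original slides: it counts bases per position across the nine listed sequences and derives a simple majority consensus. The function names `count_matrix` and `majority_consensus` are illustrative choices, and alphabetical tie-breaking is an assumption.

```python
from collections import Counter

# The nine AP-1 site sequences listed on the slide (all 7 bp long).
AP1_SITES = [
    "TGACTTT", "TGACATC", "TGTCACC", "TGACTCA", "TGAGTCA",
    "TGAGTCA", "TGATTTA", "TGACTCA", "TGACTCA",
]


def count_matrix(sites):
    """Return one Counter of base counts per alignment column."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(site[i] for site in sites) for i in range(length)]


def majority_consensus(sites):
    """Most frequent base per column; ties are broken alphabetically (assumption)."""
    return "".join(
        min(column, key=lambda base: (-column[base], base))
        for column in count_matrix(sites)
    )


if __name__ == "__main__":
    for position, column in enumerate(count_matrix(AP1_SITES), start=1):
        print(position, dict(column))
    print(majority_consensus(AP1_SITES))
```

For these nine sequences the first two positions (TG) are invariant and the majority consensus is TGACTCA, while individual sites deviate from it at one or more of the remaining positions.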

Page 12:

What is a transcription factor?

A transcription factor is a protein that regulates transcription after nuclear translocation by specific interaction with DNA or by stoichiometric interaction with a protein that can be assembled into a sequence-specific DNA-protein complex.

Page 13:

Transcription factors

Sequence-specific DNA binding

Non-DNA binding

TF1 TF2 TF3 TF4

adapter

Co-activator

HAT

DNA

Layer I

Layer III

Layer II

Page 14:

Structure of transcription factors

USF-1, dimer

Page 15:

DNA binding domain

Activation domain

oligomerization domain

Ligand-binding domain

Protein-protein interaction domain

Structure of transcription factors

Page 16:
Page 17:

N    Gene                                   Schema and positions of a CE    Factors          TRANSCompel accession number

1.   Scavenger receptor, Homo sapiens       Enhancer -4500/-4100            AP-1, Ets        C00080
2.   GM-CSF, Mus musculus                   -53, -40                        AP-1, Ets        C00081
3.   Collagenase, Homo sapiens              -89, -82, -72, -66              AP-1, Ets        C00083
4.   IgH, Mus musculus                      Enhancer at 3' flank            AP-1, Ets        C00133
5.   Interleukin 2, Homo sapiens            -283, -268                      AP-1, NFAT       C00109
6.   Interleukin 2, Homo sapiens            -167, -142                      AP-1, NF-κB      C00165
7.   Interleukin 2, Mus musculus            -167, -142                      AP-1, Oct-2      C00158
8.   IgH, Homo sapiens                                                      Ets, CBF         C00173
9.   Serum amyloid A1, Rattus norvegicus    -117, -73                       NF-κB, C/EBP     C00101
10.  IRF-1, Mus musculus                    -123, -113, -49, -40            NF-κB, STAT-1    C00192

Page 18:

Ternary complex NFATp - AP1 - DNA

Page 19:

Synergistic activation of transcription

[Figure labels: F1; F2; low level of transcription (two panels)]

Composite elements

Minimal functional units where both protein-DNA and protein-protein interactions contribute to a highly specific pattern of gene expression and provide cross-coupling of different signal transduction pathways.
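One practical consequence of this definition is that a composite element involves binding sites for two different factors lying close together. The sketch below is only an illustration of that idea, not a method from the slides: given annotated site intervals, it reports pairs of sites for two chosen factors separated by at most `max_gap` bases. The `Site` type, the function name and the `max_gap` default are hypothetical. The example uses the collagenase positions from the composite element table; the AP-1 site at -72 matches the AP-1 site list, while assigning the interval -89 to -82 to Ets is an assumption.

```python
from typing import NamedTuple


class Site(NamedTuple):
    factor: str
    start: int  # position relative to the transcription start site
    end: int    # inclusive end position, start <= end


def composite_candidates(sites, factor_a, factor_b, max_gap=20):
    """Return (site_a, site_b, gap) for site pairs at most max_gap bases apart.

    gap is the number of bases strictly between the two intervals;
    it is negative when the intervals overlap.
    """
    pairs = []
    for a in (s for s in sites if s.factor == factor_a):
        for b in (s for s in sites if s.factor == factor_b):
            gap = max(a.start, b.start) - min(a.end, b.end) - 1
            if gap <= max_gap:
                pairs.append((a, b, gap))
    return pairs


if __name__ == "__main__":
    # Collagenase positions from the slide; the Ets assignment is an assumption.
    collagenase = [Site("Ets", -89, -82), Site("AP-1", -72, -66)]
    for a, b, gap in composite_candidates(collagenase, "AP-1", "Ets"):
        print(a, b, gap)
```

Proximity alone does not establish a composite element; the definition above also requires a functional protein-protein interaction, so such pairs are only candidates.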

Page 20:

Integration of signals. Cross-coupling of signal transduction pathways

[Figure labels: Membrane receptor; Src; SH2; SH3; Ras (GDP, GTP); Adaptors; PLC; PI3-K; PKB/Akt; Phosphorylation; IP3; Ca2+; Ca2+-dependent channel; Calcineurin; ERK; JNK; p38 MAPK; NFATp; c-Fos; c-Jun; ATF-2; Composite element; IL-2; cytoplasm; nucleus]

Page 21:

Mechanisms of functioning of synergistic composite elements

1) Cooperative binding to DNA and ternary complex formation

2) A new protein surface for DNA recognition could be formed

3) Simultaneous interaction of activation domains with the components of the basal complex

Page 22:

Mechanisms of functioning of synergistic composite elements

4) Forming a new protein surface for interaction with the basal complex

5) Relief of autoinhibition as a result of protein-protein interactions

Page 23:

Mechanisms of functioning of synergistic composite elements

6) DNA bending by one of the transcription factors

7) DNA wrapping around a nucleosome allows transcription factors to interact

8) Recruitment of a HAT complex by one of the transcription factors

Page 24:

Mechanisms of functioning of antagonistic composite elements

1) Mutually exclusive binding of factor F1 (activator) and F2 (repressor)

[Figure labels: HAT complex; HDAC complex]

Page 25:

Mechanisms of functioning of antagonistic composite elements

2) Binding of F2 (repressor) results in conformational changes of F1 (activator)

[Figure labels: HAT complex; HDAC complex]

Page 26:

Mouse IL-4 promoter

[Figure: promoter map ending at +1 (ST). Labeled positions: -249, -180, -150, -114, -88, -60, -28. Labeled sites: AP-1; NFAT; HMG Y; STAT 6; NF-Y; c-MAF; TATA; composite elements (CE)]

Page 27:

GM-CSF, Homo sapiens

[Figure: map of the T-cell specific inducible enhancer at -3500 bp and of the promoter ending at +1 (ST). Labeled positions: -114, -88, -54. Labeled sites: AP-1; NFAT; composite elements (CE); NF-κB p50/p65; NF-κB c-Rel/p65; HMG Y(I); CBF; CD28 response element; TATTT]

Page 28:

Recruitment of CIITA to MHC-II promoters. A prototypical MHC-II promoter (HLA-DRA) is represented schematically with the W, X, X2, and Y sequences conserved in all MHC-II, Ii, and HLA-DM promoters. RFX, X2BP, NF-Y, and an as yet undefined W-binding protein bind cooperatively to these sequences and assemble into a stable higher order nucleoprotein complex referred to here as the MHC-II enhanceosome. CIITA is tethered to the enhanceosome via multiple weak protein-protein interactions with the W, X, X2, and Y-binding factors. The octamer site found in the HLA-DRA promoter (O), and its cognate activators (Oct and OBF-1) are not required for recruitment of CIITA. CIITA is proposed to activate transcription (arrow) via its amino-terminal activation domains (AD), which contact the RNA polymerase II basal transcription machinery.

Masternak K et al., Genes Dev 2000 May 1;14(9):1156-66

Enhanceosome

Page 29:
Page 30:

[Figure labels: TFIIA; TFIIB; TFIID; TFIIE; TFIIF; TFIIH; RNA pol II; site-specific TF; co-activator p300/CBP; acetylase PCAF; acetylase; acetylation; closed nucleosomes]

Page 31:

Scaffold/matrix attached regions (S/MARs) are regions of the DNA strand that are found at the bases of chromatin loops. They anchor the DNA to the proteinaceous nuclear matrix.

Each loop is considered to be a functional domain.

S/MARs may act as border elements and thus protect gene expression from position effects.

[Figure labels: S/MARs; genes; residual DNA]

S/MARs

Page 32:

S/MARs

[Figure (J. Bode / E. Wingender 1993) labels: enhancer; promoter; gene (transcribed region); SAR; LCR; open chromatin; compact chromatin; (regulated); nuclear scaffold]

Page 33:

Databases on gene regulation

Page 34:

• Clear identification of where you are (which species and which protein).

• Tabular presentation of controlled-vocabulary terms.

• Annotations linked to PubMed references.

• Clear paths of navigation between protein reports, within a species and between species.

• Links to ‘public domain’ databases.

BKL: collected information is displayed in a ‘one page per protein’ format = Protein Reports

Page 35:

N    Databases containing gene regulation information    URL

1.   EMBL Nucleotide sequence database    http://www.ebi.ac.uk/embl.html
2.   GenBank    http://www.ncbi.nlm.nih.gov/Web/Genbank/index.html
3.   SWISS-PROT    http://www.expasy.ch
4.   PIR: Protein Information Resource    http://www-nbrf.georgetown.edu/pir
5.   PDB    http://www.pdb.bnl.gov/
6.   EPD - Eukaryotic promoter database    http://www.epd.isb-sib.ch
7.   TRANSFAC    http://transfac.gbf.de/TRANSFAC
8.   TRRD    http://www.bionet.nsc.ru/trrd/
9.   COMPEL    http://compel.bionet.nsc.ru/
10.  TFD - Transcription factor database    http://www.ifti.org/
11.  RegulonDB    http://www.cifn.unam.mx/Computational_Biology/regulondb
12.  SCPD - The Promoter Database of Saccharomyces cerevisiae    http://cgsigma.cshl.org/jian/
13.  Muscle-Specific Regulation of Transcription (A Catalogue of Regulatory Elements)    http://agave.humgen.upenn.edu/MTIR/HomePage.html
14.  EpoDB (database of genes that relate to vertebrate red blood cells)    http://agave.hum-gen.upenn.edu/epodb/
15.  GENET    http://www.iephb.ru/~spirov/genet00.html
16.  PlantCARE    http://sphinx.rug.ac.be:8080/PlantCARE/
17.  PLACE    http://www.dna.affrc.go.jp/htdocs/PLACE/
18.  DBTSS    http://dbtss.hgc.jp/

Page 36:

EMBL data library

Feature gene

Definition region of biological interest identified as a gene and for which a name has been assigned;

Optional Qualifiers

/allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /label=feature_label /map="text" /note="text" /product="text" /pseudo /phenotype="text" /standard_name="text" /usedin=accnum:feature_label

Comments the gene feature describes the interval of DNA that corresponds to a genetic trait or phenotype; the feature is, by definition, not strictly bound to its positions at the ends; it is meant to represent a region where the gene is located.

Page 37:

EMBL data library

Feature promoter

Definition region on a DNA molecule involved in RNA polymerase binding to initiate transcription;

Optional Qualifiers /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /map="text" /note="text" /phenotype="text" /pseudo /standard_name="text" /usedin=accnum:feature_label

Molecule Scope DNA

or look for: (start of) mRNA, or precursor_RNA, or prim_transcript, or exon /number=1, ...

Page 38:

EMBL data library

Feature misc_feature

Definition region of biological interest which cannot be described by any other feature key; a new or rare feature;

Optional Qualifiers /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /map="text" /note="text" /number=unquoted /phenotype="text" /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label

Comments this key should not be used when the need is merely to mark a region in order to comment on it or to use it in another feature's location; use the '-' pseudo-key instead.

e.g.:
FT   misc_feature    4538
FT                   /note="transcription initiation site"
FT                   /gene="CDC6"

Page 39:

EMBL data library

Feature enhancer

Definition a cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter;

Optional Qualifiers /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /label=feature_label /gene="text" /map="text" /note="text" /standard_name="text" /usedin=accnum:feature_label

Organism Scope eukaryotes and eukaryotic viruses

Page 40:

EMBL data library

Feature protein_bind

Definition non-covalent protein binding site on nucleic acid;

Mandatory Qualifiers /bound_moiety="text"

Optional Qualifiers /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /map="text" /note="text" /standard_name="text" /usedin=accnum:feature_label

Comments note that RBS is used for ribosome binding sites.

Page 41:

EMBL data library

Qualifier bound_moiety

Definition moiety bound

Value Format "text"

Example /bound_moiety="repressor"

Qualifier usedin

Definition indicates that the feature is used in a compound feature in another entry

Value Format Accession-number:feature-name or Database_name::Acc_number:feature_label

Example /usedin=X10087:proteinx

Comment database_name is an abbreviation for the name of the database in which the entry for the accession number can be found.

Page 43:

EMBL data library

FH   Key             Location/Qualifiers
FH
FT   source          1..3204
FT                   /db_xref="taxon:9606"
FT                   /sequenced_mol="DNA"
FT                   /organism="Homo sapiens"
FT   promoter        1..3201
FT                   /note="melanocortin-1 receptor"
FT                   /gene="MC1R"
FT   misc_signal     570..575
FT                   /note="E-BOX"
...
FT   TATA_signal     922..941
FT   protein_bind    1343..1350
FT                   /evidence=EXPERIMENTAL
FT                   /bound_moiety="AP-1"
...
FT   TATA_signal     1553..1559
...
FT   misc_binding    1957..1964
FT                   /evidence=EXPERIMENTAL
FT                   /bound_moiety="AP-2"
FT   misc_binding    2060..2067
FT                   /evidence=EXPERIMENTAL
FT                   /bound_moiety="AP-2"
FT   misc_binding    2069..2074
FT                   /evidence=EXPERIMENTAL
FT                   /bound_moiety="SP-1"
FT   misc_binding    2603..2608
FT                   /evidence=EXPERIMENTAL
FT                   /bound_moiety="SP-1"

Here: misc_signal "E-BOX" and TATA_signal are identified by homology and positional reasoning; AP-1 and AP-2 binding sites are suggested by homology; Sp1 sites are confirmed by gel shift analysis.
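Feature-table lines of this kind can be read programmatically. The following is a minimal Python sketch under simplifying assumptions, not an official EMBL parser: it splits `FT` lines on whitespace, treats any line whose content starts with `/` as a qualifier of the preceding feature, does not handle qualifier values continued over several lines, and keeps only the last value when a qualifier is repeated. The function name `parse_features` and the dictionary layout are illustrative.

```python
SAMPLE = """\
FT   promoter        1..3201
FT                   /note="melanocortin-1 receptor"
FT                   /gene="MC1R"
FT   protein_bind    1343..1350
FT                   /evidence=EXPERIMENTAL
FT                   /bound_moiety="AP-1"
FT   misc_binding    2603..2608
FT                   /evidence=EXPERIMENTAL
FT                   /bound_moiety="SP-1"
"""


def parse_features(text):
    """Parse FT lines into a list of {'key', 'location', 'qualifiers'} dicts."""
    features = []
    for line in text.splitlines():
        if not line.startswith("FT"):
            continue
        body = line[2:].strip()
        if not body:
            continue
        if body.startswith("/"):
            if not features:
                continue  # qualifier without a preceding feature: skip it
            name, _, value = body[1:].partition("=")
            features[-1]["qualifiers"][name] = value.strip('"')
        else:
            key, _, location = body.partition(" ")
            features.append(
                {"key": key, "location": location.strip(), "qualifiers": {}}
            )
    return features


if __name__ == "__main__":
    for feature in parse_features(SAMPLE):
        moiety = feature["qualifiers"].get("bound_moiety", "")
        print(feature["key"], feature["location"], moiety)
```

As the note above points out, the `/evidence` qualifier alone does not say how a site was identified, so a script like this can list annotated sites but cannot judge how well they are supported.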

Page 44:

EMBL data library

Feature TATA_signal

Definition

TATA box; Goldberg-Hogness box; a conserved AT-rich septamer found about 25 bp before the start point of each eukaryotic RNA polymerase II transcript unit which may be involved in positioning the enzyme for correct initiation; consensus=TATA(A or T)A(A or T) [1,2];

Optional Qualifiers /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /map="text" /note="text" /usedin=accnum:feature_label

Organism Scope eukaryotes and eukaryotic viruses

Molecule Scope DNA

References [1] Efstratiadis, A. et al. Cell 21, 653-668 (1980) [2] Corden, J., et al. "Promoter sequences of eukaryotic protein-encoding genes" Science 209, 1406-1414 (1980)
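The consensus TATA(A or T)A(A or T) given in the definition translates directly into a regular expression. The sketch below is an illustration only: it scans one strand for exact consensus matches, and the test sequence is hypothetical. Real TATA boxes often deviate from the consensus, so an exact-match scan will miss some annotated sites.

```python
import re

# TATA(A or T)A(A or T); the lookahead allows overlapping matches to be reported.
TATA_CONSENSUS = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata(sequence):
    """Return (0-based start, matched text) for each exact consensus match."""
    return [
        (match.start(), match.group(1))
        for match in TATA_CONSENSUS.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    # Hypothetical sequence, used only to exercise the function.
    print(find_tata("ggccTATAAAAggcgc"))
```

A fuller search would also consider the expected position, about 25 bp before the start point according to the definition above, which this sketch does not check.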

Page 45:

EMBL data library

Feature CAAT_signal

Definition CAAT box; part of a conserved sequence located about 75 bp up-stream of the start point of eukaryotic transcription units which may be involved in RNA polymerase binding; consensus=GG(C or T)CAATCT [1,2].

Optional Qualifiers /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /map="text" /note="text" /usedin=accnum:feature_label

Organism Scope eukaryotes and eukaryotic viruses

Molecule Scope DNA

References [1] Efstratiadis, A. et al. Cell 21, 653-668 (1980) [2] Nevins, J.R. "The pathway of eukaryotic mRNA formation" Ann Rev Biochem 52, 441-466 (1983)

Feature GC_signal

Definition GC box; a conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG;

Optional Qualifiers /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /map="text" /note="text" /usedin=accnum:feature_label
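Because the GC box may occur in either orientation, a search for the consensus GGGCGG has to look at both strands. The following Python sketch, with a hypothetical test sequence and illustrative function names, reports exact matches of a motif and of its reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def find_motif_both_strands(sequence, motif="GGGCGG"):
    """Return (0-based start, strand) for exact matches on either strand.

    Positions refer to the given (forward) sequence. A palindromic motif
    is reported once, as a "+" hit.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    reverse = reverse_complement(motif)
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        if window == motif:
            hits.append((start, "+"))
        elif window == reverse:
            hits.append((start, "-"))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence with one GC box on each strand.
    print(find_motif_both_strands("aaGGGCGGttCCGCCCaa"))
```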

Page 46:

EMBL data library

Feature misc_signal

Definition

any region containing a signal controlling or altering gene function or expression that cannot be described by other signal keys (promoter, CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS, polyA_signal, enhancer, attenuator, terminator, and rep_origin).

Optional Qualifiers /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /map="text" /note="text" /phenotype="text" /standard_name="text" /usedin=accnum:feature_label

Page 47:

EMBL data library

ID   MMIGHALP   standard; DNA; MUS; 17956 BP.
XX
AC   X96607;
...
FT   enhancer        4537..6107
FT                   /note="locus control region"
FT                   /note="alpha"
FT                   /gene="IgH"

ID   SSLCREG1   standard; DNA; MAM; 1190 BP.
XX
AC   X86793;
XX
SV   X86793.1
XX
DT   10-MAY-1995 (Rel. 43, Created)
DT   30-MAY-1995 (Rel. 43, Last updated, Version 3)
XX
DE   S.scrofa locus control region (1190 bp)
XX
KW   locus control region.
...
FT   source          1..1190
FT                   /chromosome="9"
FT                   /db_xref="taxon:9823"
FT                   /organism="Sus scrofa"
FT                   /clone_lib="clonetech"
FT                   /map="p2.4"
FT   -               5..1190
FT                   /note="locus control region (HSI)"

Page 48:

Eukaryotic Promoter Database (EPD)

Praz et al., Nucleic Acids Res. 30, 322-324 (2002) http://www.epd.isb-sib.ch

Page 49:

Eukaryotic Promoter Database (EPD)

Praz et al., Nucleic Acids Res. 30, 322-324 (2002) http://www.epd.isb-sib.ch

All EPD 4809

Vertebrate promoters 2540

Arthropod promoters 2000

Plant promoters 198

Viral 129

Nematode promoters 26

Page 50:

Eukaryotic Promoter Database (EPD)

ID   HS_MYC_1   standard; single; VRT.
XX
AC   EP11146;
XX
DT   ??-APR-1987 (Rel. 11, created)
DT   10-OCT-2001 (Rel. 69, Last annotation update).
XX
DE   c-myc (cellular homologue of myelocytomatosis virus 29 oncogene),
DE   promoter 1, MYC gene.
OS   Homo sapiens (human).
XX
HG   Homology group 52; Mammalian c-myc proto-oncogene, promoter 1.
AP   Alternative promoter #1 of 2; exon 1; site 1.
NP   none.
XX
DR   EPD; EP11148; HS_MYC_2; alternative promoter; [+162; +].
DR   EPDEX; HS_MYC.
DR   EMBL; X00364.2; HSMYCC; [-2327, 8669]. [ EMBL; GenBank; DDBJ ]
...
DR   SWISS-PROT; P01106; MYC_HUMAN.
DR   TRANSFAC; R01157; HS$CMYC_01; [-49,-27]; by position.
...
DR   MIM; 190080.

Page 51:

Eukaryotic Promoter Database (EPD)

...
DR   MIM; 190080.
XX
RN   [1]
RX   MEDLINE; 84026482.
RA   Battey J., Moulding C., Taub R., Murphy W., Stewart T., Potter H.,
RA   Lenoir G., Leder P.;
RT   "The human c-myc oncogene: structural consequences of
RT   translocation into the IgH locus in Burkitt lymphoma";
RL   Cell 34:779-787(1983).
...
XX
ME   Nuclease protection [2].
ME   Nuclease protection; transfected or transformed cells [3].
ME   Primer extension [2].
XX
SE   aatctccgcccaccggccctttataatgcgagggtctggacggctgaggACCCCCGAGCT
XX
TX   6. Vertebrate promoters
TX   6.1. Chromosomal genes
TX   6.1.5. Hormones, growth factors, regulatory proteins
TX   6.1.5.16. Various cellular protooncogenes
XX
KW   Proto-oncogene, Nuclear protein, DNA-binding, Glycoprotein,
KW   Transcription regulation.
XX
FP   Hs c-myc P1 :+S EM:X00364.2 1+ 2328; 11146.052 010*1
XX
DO   Experimental evidence: 3,3#,6
DO   Expression/Regulation: +mitogen;+IL-2
RF   Cell34:779 PNAS80:6307 MCB7:1393 MCB7:2988
//
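The `DR` lines of such an entry link the promoter to other databases (EMBL, SWISS-PROT, TRANSFAC, MIM). As a rough sketch, assuming only the line layout visible in the entry shown here, the cross-references can be collected by taking the first two semicolon-separated fields of each `DR` line; the function name and the returned structure are illustrative, and the remaining fields (entry names, position ranges) are ignored.

```python
from collections import defaultdict

SAMPLE_ENTRY = """\
DR   EPD; EP11148; HS_MYC_2; alternative promoter; [+162; +].
DR   EPDEX; HS_MYC.
DR   EMBL; X00364.2; HSMYCC; [-2327, 8669].
DR   SWISS-PROT; P01106; MYC_HUMAN.
DR   TRANSFAC; R01157; HS$CMYC_01; [-49,-27]; by position.
DR   MIM; 190080.
"""


def parse_cross_references(entry_text):
    """Map database name -> list of primary identifiers found on DR lines."""
    references = defaultdict(list)
    for line in entry_text.splitlines():
        if not line.startswith("DR"):
            continue
        fields = line[2:].strip().split(";")
        if len(fields) < 2:
            continue
        database = fields[0].strip()
        # A trailing "." is the line terminator, not part of the identifier.
        identifier = fields[1].strip().rstrip(".")
        references[database].append(identifier)
    return dict(references)


if __name__ == "__main__":
    for database, identifiers in parse_cross_references(SAMPLE_ENTRY).items():
        print(database, identifiers)
```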

Page 52:

RegulonDB

Salgado et al., Nucleic Acids Res. 29, 72-74 (2001) http://www.cifn.unam.mx/Computational_Genomics/regulondb/

Page 53:

SCPD

Zhu & Zhang, Bioinformatics 15, 607-611 (1999) http://cgsigma.cshl.org/jian/

Page 54:

PlantCARE

Rombauts et al., Nucleic Acids Res. 27, 295-296 (1999) http://sphinx.rug.ac.be:8080/PlantCARE/cgi/index.html

Page 55:
Page 56:

Schematic representation of the "Oligo-capping" method

Page 57:

TRRD

Kolchanov et al., Nucleic Acids Res. 30, 312-317 (2002) http://wwwmgs.bionet.nsc.ru/mgs/gnw/trrd/

Page 58:

TRRD

Kolchanov et al., Nucleic Acids Res. 30, 312-317 (2002) http://wwwmgs.bionet.nsc.ru/mgs/gnw/trrd/

Page 59:

GENE

SITE FACTOR

MATRIX

encodes for

contains

binds to and regulates interacts

is used to construct

is an attribute of

TRANSFAC®

a database on gene transcription regulation
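The relationships named on this slide (encodes for, contains, binds to and regulates, interacts, is used to construct, is an attribute of) can be pictured as a small data model. The Python sketch below is only one possible reading of the diagram, not the actual TRANSFAC schema: the assignment of each relation to a pair of tables, all field names, and the example objects are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Matrix:
    name: str
    built_from: list = field(default_factory=list)  # site sequences "used to construct" it


@dataclass
class Factor:
    name: str
    matrices: list = field(default_factory=list)        # MATRIX "is an attribute of" FACTOR
    interacts_with: list = field(default_factory=list)  # names of interacting factors


@dataclass
class Site:
    sequence: str
    bound_by: list = field(default_factory=list)  # FACTOR "binds to" SITE


@dataclass
class Gene:
    name: str
    sites: list = field(default_factory=list)    # GENE "contains" SITE
    encodes: list = field(default_factory=list)  # GENE "encodes for" FACTOR


def regulators_of(gene):
    """Names of all factors bound to any site of the gene, sorted."""
    return sorted({factor.name for site in gene.sites for factor in site.bound_by})


if __name__ == "__main__":
    # Hypothetical objects, used only to show how the links are followed.
    jun = Factor("c-Jun", interacts_with=["c-Fos"])
    fos = Factor("c-Fos", interacts_with=["c-Jun"])
    site = Site("TGACTCA", bound_by=[jun, fos])
    gene = Gene("example_gene", sites=[site])
    print(regulators_of(gene))
```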

Page 60:

TRANSFAC structure

[Figure labels: gene; coding region; regulatory region; expression; interacting factor; SITE; FACTOR; GENE; SYNONYMS; FEATURES; CLASS; SPECIES; MATRIX; SEQUENCE; METHOD; CELL; Q; FUNCTIONAL ELEMENT]

Page 61:

Manual annotation of the databases: input client

Page 62:

TRANSFAC: FACTOR table, protein sequence

Page 63:

TRANSFAC: FACTOR table, protein domains

Page 64:

TRANSFAC: FACTOR table, structural and functional features

Page 65:

TRANSFAC: FACTOR table, links to other databases

Page 66:

TRANSFAC: classification of transcription factors

Page 67:

TRANSFAC: CLASS table

Page 68:

TRANSFAC 8.1 (2004-03-31): number of factor entries for different species

[Bar chart. Categories: human; mouse; rat; other vertebrates; fruit fly; plants; fungi; other. Axis: 0 to 1400 entries]

Page 69:

TRANSFAC 8.1 (2004-03-31): distribution of experimentally known TFBS in 5' regions of genes.

[Chart. Axis: 0 to 800]

Page 70:

TRANSFAC: FACTOR table, protein-DNA and protein-protein interactions

Page 71:

TRANSFAC: MATRIX table

Page 72:

TRANSFAC®: accompanying tools

Patch™: pattern search

Match™: PWM-based search

gATTGGCGCGAAGtttt

aCAGGGCGCCAAAcgcg

aTTTCGCGCCAAActtg

aTTTCGCGCCAAActtg

aTTTCGCGCCAAActtg

GGCTGCGGCCAAAtctc

ATCTCCCGCCAGGtcag

aGTTCGCGGGCAAatgc

cTTCGGCGCGCGGtgtt

tTTTCGCGCCAAAgtca

tTTTGCCGCGAAAagac

q1 q2

Page 74:

Selection of DNA binding sites by regulatory proteins

Statistical-mechanical theory

O.G. Berg and P.H. von Hippel

Mutational drift

Match

Mismatch

Page 75:

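1) The binding affinity of the protein to DNA lies in some useful range.
2) The number of sequences is large.
3) All possible sequences are equiprobable.
4) $\varepsilon_{lB}$ expresses the decrease in binding energy when the cognate base pair at position $l$ is replaced by base pair $B$, with $\varepsilon_{l0} = 0$ for the cognate base pair.
5) Individual base-pair contributions are independent and therefore additive: a loss of binding affinity at one position may be compensated by a gain at another position.

Matrix of $\varepsilon_{lB}$ values (positions 1 ... l ... s; only two columns are filled in on the slide):

      1     2    ...   l   ...   s
A    0.5   0.9
T    0.0   0.1
G    1.2   0.0
C    0.1   0.8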

Page 76:

Sites have binding affinity in a limited range $\Delta E$ around a required level $E$.

In such a set of sites, the local contributions $\varepsilon_{lB}$ from all positions must sum to $E$.

What is the frequency $f_{lB}$ with which a certain base pair $B$ appears at a certain position $l$ in a site?

The same question is asked in statistical mechanics:

Given $S$ independent particles in a system and a given total energy $E$, what is the probability that particle $l$ will have the energy $\varepsilon_{lB}$?

Page 77:

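$$f_{lB}(E) = \frac{1}{4}\,\frac{e^{-\lambda \varepsilon_{lB}}}{q_l}$$

$\lambda$ is determined by the density of potential sites, i.e. by the number of possible sequence combinations that have the required discrimination energy $E$; $q_l$ normalizes the frequencies at position $l$.

$$\lambda \varepsilon_{lB} = \ln\left(\frac{f_{l0}^{obs}}{f_{lB}^{obs}}\right)$$

For any sequence $X$ of length $s$, the actual discrimination energy is:

$$E(X) = \sum_{l=1}^{s} \varepsilon_{lB_l} = \frac{1}{\lambda} \sum_{l=1}^{s} \ln\left(\frac{f_{l0}^{obs}}{f_{lB_l}^{obs}}\right)$$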

Page 78:

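Small-sample effect

$$f_{lB} = \frac{n_{lB} + 1}{N + 4}$$

$$\lambda \varepsilon_{lB} = \ln\left(\frac{n_{l0} + 1}{n_{lB} + 1}\right)$$

where $n_{lB}$ is the number of occurrences of base pair $B$ at position $l$ among the $N$ known sites, and $n_{l0}$ is the count of the most frequent (cognate) base pair at that position.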
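The small-sample correction and the discrimination energy can be illustrated with a minimal sketch. It assumes a gap-free alignment of equally long sites and treats the most frequent base at each position as the cognate base; the function names and the toy sites are hypothetical and are not taken from the slides. Energies are returned in units of $1/\lambda$.

```python
import math

BASES = "ACGT"

def lambda_eps(sites):
    """Per-position dict base -> lambda*eps_lB = ln((n_l0 + 1) / (n_lB + 1)).

    n_l0 is the count of the most frequent base at position l (assumed cognate).
    """
    length = len(sites[0])
    weights = []
    for l in range(length):
        counts = {b: 0 for b in BASES}
        for site in sites:
            counts[site[l].upper()] += 1
        n_l0 = max(counts.values())
        weights.append({b: math.log((n_l0 + 1) / (counts[b] + 1)) for b in BASES})
    return weights

def discrimination_energy(seq, weights):
    """lambda * E(X): sum of the per-position contributions for sequence X."""
    return sum(w[b] for b, w in zip(seq.upper(), weights))

# Toy alignment (hypothetical data, for illustration only).
toy_sites = ["ACG", "ACG", "ATG", "ACC"]
toy_weights = lambda_eps(toy_sites)
print(discrimination_energy("ACG", toy_weights))  # consensus sequence
print(discrimination_energy("ATC", toy_weights))  # two non-cognate positions
```

The pseudo-count of 1 keeps the logarithm finite for bases that were never observed at a position, which is the point of the small-sample correction.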

Page 79:

Problems:

1. Small sets of sites
2. Homology between sites
3. Specific function of nucleotides in certain positions
4. Correlations between positions (effects are not additive)

Page 80:

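TFS identification

$$q = \frac{\sum_{i=1}^{L} I(i)\, f_{i,b_i} - \sum_{i=1}^{L} I(i)\, f_i^{min}}{\sum_{i=1}^{L} I(i)\, f_i^{max} - \sum_{i=1}^{L} I(i)\, f_i^{min}}$$

with:
$b_i$, the nucleotide found in the i-th position of the test sequence,
$f_{i,b_i}$, the frequency of nucleotide $b_i$ in the i-th position of the aligned training sequences,
$f_i^{min}$, the minimum frequency in position i,
$f_i^{max}$, the maximum frequency in position i,
and

$$I(i) = \sum_{B \in \{A,T,G,C\}} f_{i,B} \ln(4 f_{i,B}), \quad i = 1, 2, \ldots, L$$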
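As an illustration of the score $q$ defined above, here is a minimal sketch. It assumes the matrix is given as one dict of nucleotide counts per position and that the test sequence has the same length as the matrix; the function names and the toy matrix are hypothetical.

```python
import math

BASES = "ACGT"

def frequencies(counts):
    """Convert per-position count dicts into per-position frequency dicts."""
    freqs = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES)
        freqs.append({b: column.get(b, 0) / total for b in BASES})
    return freqs

def information(column):
    """I(i) = sum over B of f_iB * ln(4 * f_iB); zero-frequency terms contribute 0."""
    return sum(f * math.log(4 * f) for f in column.values() if f > 0)

def matrix_similarity(seq, freqs):
    """q = (Current - Min) / (Max - Min), each term weighted by I(i)."""
    current = minimum = maximum = 0.0
    for base, column in zip(seq.upper(), freqs):
        weight = information(column)
        current += weight * column[base]
        minimum += weight * min(column.values())
        maximum += weight * max(column.values())
    # Undefined when every position is uniform (Max == Min); not handled here.
    return (current - minimum) / (maximum - minimum)

# Toy matrix (hypothetical): position 1 fully conserved, position 3 uninformative.
toy_counts = [
    {"A": 4},
    {"C": 3, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 1},
]
toy_freqs = frequencies(toy_counts)
print(matrix_similarity("ACA", toy_freqs))
print(matrix_similarity("ATG", toy_freqs))
```

Because a uniform position has $I(i) = 0$, it contributes nothing to the score, so mismatches at poorly conserved positions are not penalized.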

Page 81:

Calculating the Ci-values

$$C_i(i) = \frac{100}{\ln 5}\left(\sum_{B \in \{A,C,G,T,gap\}} P(i,B)\,\ln P(i,B) + \ln 5\right)$$

Position                 1      2      3      4      5      6      7      8      9     10     11

A                        1      2      0      0      4      0      0      1      5      2      2
C                        1      2      0      0      1      3      0      4      0      0      2
G                        1      1      0      5      0      2      0      0      0      3      1
T                        1      0      5      0      0      0      5      0      0      0      0
gap                      1      0      0      0      0      0      0      0      0      0      0

P(A)                   0.2    0.4      0      0    0.8      0      0    0.2      1    0.4    0.4
P(C)                   0.2    0.4      0      0    0.2    0.6      0    0.8      0      0    0.4
P(G)                   0.2    0.2      0      1      0    0.4      0      0      0    0.6    0.2
P(T)                   0.2      0      1      0      0      0      1      0      0      0      0
P(gap)                 0.2      0      0      0      0      0      0      0      0      0      0

P(A) ln P(A)         -0.32  -0.37      0      0  -0.18      0      0  -0.32      0  -0.37  -0.37
P(C) ln P(C)         -0.32  -0.37      0      0  -0.32  -0.31      0  -0.18      0      0  -0.37
P(G) ln P(G)         -0.32  -0.32      0      0      0  -0.37      0      0      0  -0.31  -0.32
P(T) ln P(T)         -0.32      0      0      0      0      0      0      0      0      0      0
P(gap) ln P(gap)     -0.32      0      0      0      0      0      0      0      0      0      0
Sum + ln 5            0.00   0.55   1.61   1.61   1.11   0.94   1.61   1.11   1.61   0.94   0.55

Ci                       0     34    100    100     69     58    100     69    100     58     34
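The Ci row of the table can be recomputed from the count rows with a short sketch. The function name and the data layout (one list of counts per symbol) are assumptions made for this illustration; the counts themselves are the ones shown on the slide.

```python
import math

SYMBOLS = ["A", "C", "G", "T", "gap"]

def ci_vector(counts):
    """Ci per position: 100/ln(5) * (sum_B P(B) ln P(B) + ln 5), rounded."""
    n_positions = len(counts["A"])
    result = []
    for i in range(n_positions):
        column = [counts[s][i] for s in SYMBOLS]
        total = sum(column)
        # Zero counts contribute nothing (P ln P tends to 0 as P tends to 0).
        plogp = sum((n / total) * math.log(n / total) for n in column if n > 0)
        result.append(round(100 / math.log(5) * (plogp + math.log(5))))
    return result

slide_counts = {
    "A":   [1, 2, 0, 0, 4, 0, 0, 1, 5, 2, 2],
    "C":   [1, 2, 0, 0, 1, 3, 0, 4, 0, 0, 2],
    "G":   [1, 1, 0, 5, 0, 2, 0, 0, 0, 3, 1],
    "T":   [1, 0, 5, 0, 0, 0, 5, 0, 0, 0, 0],
    "gap": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
print(ci_vector(slide_counts))
```

A fully conserved position scores 100 and a position where all five symbols are equally frequent scores 0, which matches positions 3 and 1 of the table.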

Page 82:

Scoring of the match

Position     1     2     3     4     5     6     7     8     9    10    11
A            1     2     0     0     4     0     0     1     5     2     2
C            1     2     0     0     1     3     0     4     0     0     2
G            1     1     0     5     0     2     0     0     0     3     1
T            1     0     5     0     0     0     5     0     0     0     0
-            1     0     0     0     0     0     0     0     0     0     0

Ci           0    34   100   100    69    58   100    69   100    58    34

To make it fast: preselection with the core.

Core (positions 3 to 7): T G A C T

Page 83:

TRANSFAC: Match™ tool

Page 84:

TRANSFAC: Match™ output

Page 85:

Selection of optimal cut-offs

[Plot: underprediction error, overprediction error and error sum (vertical axis, 0 to 100) against the cut-off (horizontal axis, 0.75 to 1). Marked cut-offs: minFN, minFP, minSUM.]
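The idea of a cut-off that minimizes the error sum can be sketched as follows. This is only an illustration under stated assumptions: underprediction is taken as the fraction of known sites scoring below the cut-off, overprediction as the fraction of background positions scoring at or above it, and the scores are made-up toy values. The exact definitions used to derive minFN and minFP are not given on the slide and are not reproduced here.

```python
def error_rates(site_scores, background_scores, cutoff):
    """Return (underprediction, overprediction) rates for a given cut-off."""
    false_neg = sum(s < cutoff for s in site_scores) / len(site_scores)
    false_pos = sum(s >= cutoff for s in background_scores) / len(background_scores)
    return false_neg, false_pos

def min_sum_cutoff(site_scores, background_scores, cutoffs):
    """Cut-off with the smallest sum of both error rates (first one on ties)."""
    return min(cutoffs, key=lambda c: sum(error_rates(site_scores, background_scores, c)))

# Hypothetical scores, for illustration only.
toy_sites = [0.95, 0.90, 0.88, 0.97]
toy_background = [0.76, 0.80, 0.82, 0.91]
toy_cutoffs = [0.75, 0.80, 0.85, 0.90, 0.95, 1.00]

best = min_sum_cutoff(toy_sites, toy_background, toy_cutoffs)
print(best, error_rates(toy_sites, toy_background, best))
```

Raising the cut-off trades overprediction for underprediction, so the error sum typically has a minimum somewhere between the two extremes.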

Page 86:

Example of a search using cut-offs to minimize false negative matches

[Figure: feature table of the GenBank entry and the corresponding hits found by Match. Columns of the Match output: Matrix identifier, Position, Matrix similarity, Core similarity, Sequence, Factor name.]

In this example we searched the Homo sapiens angiotensinogen gene (5' region and exon 1) for all binding sites listed in the features of its GenBank entry. We used the cut-offs that minimize false negative matches, because these cut-offs are recommended to reduce the probability that Match misses a potential binding site. Corresponding hits were found in the Match output for every feature-table entry that describes a binding site.

Page 87:

TRANSPLORER (TRANScription exPLORER) is a software package for the analysis of transcription regulatory sequences. Currently, the TRANSPLORER site prediction tool uses collections of position weight matrices (PWMs). It can use several matrix sources: the largest and most up-to-date library of matrices, derived from the TRANSFAC® Professional database, other matrix libraries, and any user-developed matrix libraries. It can therefore search for a great variety of transcription factor binding sites. A search can be made using all matrices from the libraries or subsets of them.

Page 89:

Search for most probable binding sites regulating gene expression

Page 90:

Search for binding sites coinciding with SNPs

Page 91:

TRANSCompel®: a database on composite regulatory elements

Key topics:

• pairs of closely situated binding sites for TFs;

• cooperative functioning of transcription factors;

• direct protein-protein interactions;

• combinatorial regulation of gene transcription.

Page 92:

individual entry

Description of the evidence (experiment, cell type, two individual interactions)

Link to the TRANSFAC GENE table

Link to EMBL

Link to the TRANSFAC FACTOR table

Page 93:

TRANSCompel®: combinatorial regulation, more than 360 CEs

N   Gene (positions marked in the scheme, where given)
1.  IgH, Mus musculus
2.  IL-2, Homo sapiens (-283, -268)
3.  IL-2, Homo sapiens (-167, -142)
4.  Il-2, Mus musculus (-167, -142)
5.  IgH, Homo sapiens
6.  Serum amyloid A1, Rattus norv. (-117, -73)
7.  IRF-1, Mus musculus (-123, -113, -49, -40)

[Column "Scheme of CE": the drawings are not recoverable. Factor pairs shown in the schemes: AP-1 / Ets, AP-1 / NFAT, AP-1 / NF-κB, Ets / CBF, AP-1 / Oct-2, NF-κB / C/EBP, NF-κB / STAT-1.]

Page 94:

TRANSCompel®: functional classification of the composite elements

inducible/inducible
- Ca2+ and PKC response: NFAT / AP1
- IFN-gamma and TNF-alpha response: NF-kappaB / IRF

inducible/constitutive
- cholesterol level response: SREBP / Sp1
- acute-phase response: STAT-3 / Sp1

inducible/tissue-restricted
- TGF-beta response in B-cells: SMAD / AML

tissue-restricted/tissue-restricted
- pancreas islet beta-cells (insulin-producing): HNF3 / BETA2
- pituitary gonadotropes: Ptx1 / SF-1

tissue-restricted/ubiquitous
- macrophages: PU.1 / Sp1

Page 95:

Number of CEs by class of the two factors (F1, F2):

F1 / F2                    Tissue-specific   Inducible   Cell-cycle dep.   Dev. stage-dependent   Ubiquit. constitut.
Tissue-specific                  32
Inducible                        44             119
Cell-cycle dependent                              1             2
Dev. stage-dependent                              3
Ubiquitous constitutive          39              60             2                 12                     2

Inducible/inducible

19 CEs: ETS / AP-1, providing cross-coupling of Ras/Raf- and PKC-dependent signalling pathways;

15 CEs: NFATp / AP-1, providing cross-coupling of Ca2+- and PKC-dependent signalling pathways;

14 CEs: NF-κB / C/EBP. NF-κB is inducible by IL-1 and TNF-α; C/EBP is inducible by IL-6.

Page 96:

Inducible/constitutive

9 CEs: ETS / Sp1. ETS factors are inducible through the Ras/Raf-dependent signalling pathway;

5 CEs: Smad / TEF3. Smads are inducible by TGF-β signalling.

Page 97:

Inducible/tissue-restricted

CEs: Pit-1 / AP-1. Pit-1 is a pituitary-restricted transcription factor, whereas AP-1 and Ets are ubiquitous inducible factors.

Page 98:

TRANSCompel®: antagonistic type of CEs

human c-fos

acaggaTGTCCATATTAGGacatctgcg

Sites: YY-1, SRF

SRF mediates the rapid, transient induction of the c-fos proto-oncogene by serum growth factors.

YY1 diminishes both basal and serum-induced expression of c-fos.

Page 99:

Antagonistic composite elements

COMPEL: C00006, chicken embryonic -globin gene

GGTGGGcctccggagtgaccaatgagtgTGGACAGATGCCA

Sites: Sp1, NF-Y, NF-1

Sp1 cooperatively with NF-Y activates transcription in primitive erythroid cells.

NF-1 represses transcription in adult cells.

COMPEL: C00009, human c-fos proto-oncogene

acaggaTGTCCATATTAGGacatctgcg

Sites: YY-1, SRF

SRF mediates the rapid, transient induction of the c-fos proto-oncogene by serum growth factors.

YY1 diminishes both basal and serum-induced expression of c-fos.

COMPEL: C00054, rat serum amyloid A1 gene

TGGTAGTCTTGCACAGGAAATGACATggtGGGACTTTCCCcaggg

Sites: C/EBP, NF-κB, YY-1

C/EBP and NF-κB synergistically activate transcription in liver cells during the acute-phase response.

YY1 represses inducible transcription of this gene.

Page 100:

Catch®: pattern-based search for potential composite elements in DNA sequences

• All CEs are used as individual search patterns;

• Several parameters are available to restrict the search:

mismatches in site 1 and site 2,

distance between the two sites,

composite score
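A pattern-based search of this kind can be sketched in a few lines. This is a simplified illustration, not the Catch implementation: it scans one strand only, requires site 1 to lie upstream of site 2, and does not compute a composite score. The function names and parameters are hypothetical; the example patterns are the upper-case parts of one CE from the set shown on the following slide.

```python
def mismatches(a, b):
    """Number of positions at which two equally long strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_site(seq, pattern, max_mismatch):
    """Start positions where pattern matches seq with at most max_mismatch mismatches."""
    seq, pattern = seq.upper(), pattern.upper()
    n = len(pattern)
    return [i for i in range(len(seq) - n + 1)
            if mismatches(seq[i:i + n], pattern) <= max_mismatch]

def find_composite(seq, site1, site2, max_mm1, max_mm2, min_gap, max_gap):
    """Pairs (start1, start2, gap) with site1 upstream of site2 and gap in range."""
    hits = []
    for i in find_site(seq, site1, max_mm1):
        end1 = i + len(site1)
        for j in find_site(seq, site2, max_mm2):
            gap = j - end1
            if min_gap <= gap <= max_gap:
                hits.append((i, j, gap))
    return hits

# Two flanking bases added on each side of the CE (hypothetical test sequence).
test_seq = "ttACAGGAATgacctggtgcCTCGCCCaa"
print(find_composite(test_seq, "ACAGGAAT", "CTCGCCC", 0, 0, 5, 15))
```

Allowing mismatches and a range of distances widens the search around each known CE; tightening them makes the search closer to an exact lookup.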

Page 101:

TRANSCompel®: CEs of similar structure can be used to construct models

Set of CEs:

CCACCCATTTCCTC

ACAGGAATgacctggtgcCTCGCCC

TTCCTCctgtgccttag...ctgtttttctaaCCGCCC

GAAGGGCGGGGAcagtt...aagcaaaaAAAGGGAACTGA

AAAGGGAACTGAgtggctgcgaaAGGGTGGGG

GGAAgcaaccagCCCACCA

CCGGAAGCaaccagCCCACC

aaAAGGAAGTGGGCGTGGTttaaag

ACTTCCTC...GGCTCCTCCTCC

Rules derived from the set:

1. matrix rule: $q_{M1} > n_1$ for matrix M1 and $q_{M2} > n_2$ for matrix M2

2. distance rule

3. orientation rule

The rules are then used to search for potential CEs.

Page 102:

Application of CE models for promoter analysis

Four CE types are over-represented in promoters compared with the other sequence sets tested.

[Bar chart. CE types: Myb/Aml, NF-kB/Sp1, Ets/Sp1, E2F/Sp1. Vertical axis: in 1 000 000 bp, 0 to 180. Sequence sets: promoters -350/+50, exons_3d, h_chr_15 (whole), h_chr_15_Alu, h_chr_15_L1, h_chr_15_L2.]

Page 103:

Gene expression profiling

Page 104:

GENE ONTOLOGY™

Page 105:

TRANSGENOME

Genome - Reference Sequence

Gene

Transcripts

Splicing variants

Polypeptides

Composite elements

Regulatory regions

TF binding sites

Repeats

S/MARs

TRANSGENOME provides the hierarchical structure of the most important elements of a genome, in coding regions as well as in regulatory regions. This structure makes it possible to use a unique reference sequence and to store the locations of all gene regulatory and structural elements.

Page 106:

Gene

pre-mRNAs (from RefSeq)

spliced mRNAs

CDS

site

5' UTR, 3' UTR

RefSeq-derived potential starts of transcription (first exons)

TRANSFAC-derived start of transcription (by relative site positions)

EPD-derived start of transcription

DBTSS-derived start of transcription

Page 107:

Cytomer/Content

Human body, Lung, Bronchial tree, Main bronchus

Bronchial tree and Intrapulmonary Airways

Lobar bronchus

Segmental bronchus

Bronchus

Bronchiolus

Terminal bronchiolus

Respiratory bronchiolus

Alveolar duct

Alveolar sac

Pulmonary alveolus

Alveolar pore

Alveolar epithelium

Alveolar septa

Pneumocytes

Page 108:

CYTOMER structure

[Schema diagram; tables and fields as far as they can be read from the slide]

Species: ID, Name

Cell: ID, Name, Description

Organ: ID, Name, Parent

System: ID, Name

HUB: ID, Cytomer_no, Organ_ID, Cell_ID, System_ID, Period_ID, Species_ID

Stage2Period: Stage_ID, Period_ID

CP: ID, TFacc, Cytomer_no, Cacc, CP

CN: ID, TFacc, Cytomer_no, Cacc, CN

Transfac Factor: ID, Acc

Period: ID, T1, T2

Stage: ID, T1, T2, description

Page 109:

human body abdominal regions acetabular artery acetabular ramus anastomatic vessel aortic plexus apocrine gland blood blood vessel body cavities bronchi cardiac thoracic nerve cartilage cartilaginous tissue of anular radial ligament central nervous system

brain brain stem meninges of brain mesencephalon

cerebral peduncle base of peduncle cerebral crus lateral groove of midbrain substantia nigra

compact part of substantia nigra lateral part of substantia nigra reticular part of substantia nigra retrorubral part of substantia nigra

tegmentum of midbrain of cerebral peduncle frenulum of superior medullary velum of midbrain interpeduncular fossa

TRANSGENOME

EST

UniGene

Gene expression group 1

Gene expression group 2

Gene expression group 3

CYTOMER®: a database on gene expression sources

Page 110:

The gene expression space

Expression space $E$

Expression space $E_g$ of gene $g$

Axes:

t (temporal, developmental axis)

x (spatial axis: systems, organs, cells)

c (conditional)

Spatio-temporal coordinates: CYTOMER

Conditional determinants: TRANSPATH

Factors controlling transcription: TRANSFAC

Page 111:

Gene expression profiling

Expression matrix:
- rows represent genes
- columns represent samples (various tissues, developmental stages, ...)

        E1      E2      ..    ES
g1      x1,1    x2,1    ..    xS,1
g2      x1,2    x2,2    ..    xS,2
:       :       :             :
gh      x1,h    x2,h    ..    xS,h

A row is the expression pattern of a gene (e.g. of gene g1).

A column is the expression profile of a state (e.g. of state E1, in organ O at stage t).
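The distinction between an expression pattern (one gene across all states) and an expression profile (one state across all genes) can be made concrete with a small sketch. The gene names, state names and values are hypothetical and serve only to show the row and column access.

```python
# Rows are genes, columns are expression states (samples).
genes = ["g1", "g2", "g3"]
states = ["E1", "E2"]
matrix = [
    [5.0, 0.5],   # g1
    [1.2, 3.4],   # g2
    [0.0, 7.1],   # g3
]

def expression_pattern(matrix, genes, gene):
    """Row of the matrix: expression of one gene across all states."""
    return matrix[genes.index(gene)]

def expression_profile(matrix, states, state):
    """Column of the matrix: expression of all genes in one state."""
    j = states.index(state)
    return [row[j] for row in matrix]

print(expression_pattern(matrix, genes, "g1"))
print(expression_profile(matrix, states, "E1"))
```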

Page 112:

Gene

Expression state

Gene expression profiling

Page 113:

Gene expression profiling