Protein Prediction 2 (for Bioinformaticians) - Protein function: Infer function by motifs
short title: pp2_introfunc3
lecture: Protein Prediction 2 - Protein function, TUM winter semester
© Burkhard Rost · Rostlab · 2014-11-05

Past - TOC today - Next

So far: Function introduction
• Molecular biology is just at an exciting beginning
• We can compute some aspects of molecular life
• Most accurate inference of function: based on homology

Today
• Motifs
• Function by association

Next
• "Compute" enzyme function?
• Predict localization

I.2b Function Intro: Sequence motifs

Motifs - intro

Sequence vs motif

Full sequence (ADH1_human, 95 aa): MANEVIKCKAAVAWEAGKPLSIEEIEVAPPKAHEVRIKIIATAVCHTDAYTLSGADPEGCFPVILGHEGAGIVESVGEGVTKLKAVWRMQILSKS

A motif could be an exact stretch: MANEVIKCKAA
Or a pattern that allows alternatives at some positions: MAN[ED]hh[KR]C[KR]
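To make the distinction concrete, here is a minimal Python sketch that searches the ADH1 fragment above for the exact stretch and for the pattern. It assumes that the lowercase `h` stands for a hydrophobic residue; the residue set `HYDROPHOBIC` and the helper `motif_to_regex` are illustrative choices, not definitions taken from the slide.

```python
import re

# Assumption: the slide's lowercase 'h' means "hydrophobic"; this residue
# set is chosen for illustration only.
HYDROPHOBIC = "AVILMFWC"

def motif_to_regex(motif: str) -> str:
    """Turn the slide-style motif into a regex: literals and [..] sets pass
    through unchanged, 'h' becomes a set of hydrophobic residues."""
    return motif.replace("h", "[" + HYDROPHOBIC + "]")

seq = ("MANEVIKCKAAVAWEAGKPLSIEEIEVAPPKAHEVRIKIIATAVCHTDAY"
       "TLSGADPEGCFPVILGHEGAGIVESVGEGVTKLKAVWRMQILSKS")

exact = "MANEVIKCKAA"           # motif as a fixed string
pattern = "MAN[ED]hh[KR]C[KR]"  # motif with allowed alternatives

regex = re.compile(motif_to_regex(pattern))
print(seq.find(exact))                           # 0-based start of the exact motif
print([m.start() for m in regex.finditer(seq)])  # start positions of pattern hits

variant = "MANDLLRCRAA"  # made-up relative with conservative substitutions
print(exact in variant, bool(regex.match(variant)))  # False True
```

The exact motif only recovers the original sequence, while the pattern also accepts the made-up relative. That extra sensitivity is what makes patterns useful for search, and it is also what costs specificity.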

How can we use this concept to search?

Resources for motifs/patterns

PROSITE: http://us.expasy.org/prosite/ [Hulo et al., Nucl. Acids Res. 32:D134-D137 (2004)]
PRINTS: http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/ [Attwood, Briefings in Bioinformatics 3(3):252-263 (2002)]
BLOCKS: http://www.blocks.fhcrc.org/ [Henikoff et al., Nucl. Acids Res. 28:228-230 (2000)]

PROSITE

Shapers and Shakers: Amos Bairoch

• 1986 starts SWISS-PROT
• 1988 starts PROSITE
• 1993 starts ExPASy (with Ron Appel)
• 1998 SIB: Swiss Institute of Bioinformatics
• 2009 CALIPHO: Computer and Laboratory Investigation of Proteins of Human Origin

Shapers and Shakers: Amos Bairoch

SwissProt · PROSITE · ExPASy · CALIPHO · SIB - Swiss Institute of Bioinformatics

Papers:
• >220 papers (Nov 2013)
• 4 with >1,000 citations (Nov 2013)
• 70 with over 100 citations (Nov 2013)
• H-index 79 (ISI Nov 2013)

What's good?

Motifs and patterns

Manually align a family and annotate its motifs; then use the motifs for automatic alignment and annotation of unknown proteins.

• Find a motif or a pattern in a functionally characterized family
• Search for the motif pattern in a new protein
• Transfer function annotation

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)
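The three steps can be sketched as a scan against a small motif library. In this minimal Python example, `MOTIF_LIBRARY` and `transfer_annotation` are hypothetical names, and the two regexes simply re-express the patterns GxxGxxG (membrane) and [RK](2)-x-[ST] (phosphorylation) quoted on the PROSITE history slides; a real resource such as PROSITE attaches curated documentation to every entry.

```python
import re

# Hypothetical motif library: each regex stands for a motif found in a
# functionally characterized family, mapped to the annotation to transfer.
MOTIF_LIBRARY = {
    r"[RK]{2}.[ST]": "phosphorylation",   # [RK](2)-x-[ST]
    r"G..G..G": "membrane",               # GxxGxxG
}

def transfer_annotation(sequence):
    """Scan a new protein; return (1-based position, matched text, annotation)."""
    hits = []
    for regex, annotation in MOTIF_LIBRARY.items():
        for m in re.finditer(regex, sequence):
            hits.append((m.start() + 1, m.group(), annotation))
    return sorted(hits)

print(transfer_annotation("LRRASLGAGLLGAVG"))
```

Every hit transfers a label, whether or not the protein really has that function, which is why the database concepts on the next slide stress specificity.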

PROSITE: Concepts for DB

• Completeness: DB with as many motifs as possible
• High specificity: no false positives at a level at which most true members are found
• Documentation
• Periodic reviewing
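Why specificity is hard for short patterns can be estimated with a back-of-the-envelope calculation. The sketch below assumes that all 20 amino acids are equally frequent and independent along the sequence (real compositions are not uniform) and counts how often a pattern would match purely by chance. `match_probability` and `expected_chance_hits` are hypothetical helpers.

```python
def match_probability(allowed_per_position, alphabet_size=20):
    """Probability that one random window matches the pattern, assuming all
    residues are equally frequent and independent (a simplification)."""
    p = 1.0
    for n_allowed in allowed_per_position:
        p *= n_allowed / alphabet_size
    return p

def expected_chance_hits(allowed_per_position, seq_len, alphabet_size=20):
    """Expected number of matches in a random sequence of length seq_len."""
    windows = max(seq_len - len(allowed_per_position) + 1, 0)
    return windows * match_probability(allowed_per_position, alphabet_size)

# [RK](2)-x-[ST]: residues allowed per position are 2, 2, 20 (x = any), 2
phospho = [2, 2, 20, 2]
print(expected_chance_hits(phospho, 500))   # about 0.5 chance hits per 500 residues

# A longer, stricter pattern: eight fully specified positions
strict = [1] * 8
print(expected_chance_hits(strict, 500))    # about 2e-8
```

Under these assumptions a short, permissive pattern such as [RK](2)-x-[ST] is expected by chance about once every 1,000 residues, whereas a long, fully specified motif practically never matches at random.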

PROSITE history

Bairoch A (1991) NAR 19: 2241-5. PROSITE: a dictionary of sites and patterns in proteins (repeated: 1992, 1993)

PROSITE history

Solution: GxxGxxG (membrane); [RK](2)-x-[ST] (phosphorylation)
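PROSITE writes patterns as dash-separated elements: `x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats, and `<` / `>` for N- and C-terminal anchors. In this syntax GxxGxxG is written `G-x(2)-G-x(2)-G`. A minimal Python translator into regular expressions might look like this; `prosite_to_regex` is a hypothetical helper that covers only this common subset, not ScanProsite itself.

```python
import re

# One PROSITE element: optional '<', then x / residue / [allowed] / {excluded},
# optional (n) or (n,m) repeat, optional '>'.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as '[RK](2)-x-[ST].' to a Python regex.

    Covers the common syntax only; e.g. '>' inside brackets is not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_term, core, low, high, c_term = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + core + ("$" if c_term else ""))
    return "".join(parts)

print(prosite_to_regex("[RK](2)-x-[ST]."))    # [RK]{2}.[ST]
print(prosite_to_regex("G-x(2)-G-x(2)-G"))    # G.{2}G.{2}G
hits = re.finditer(prosite_to_regex("[RK](2)-x-[ST]."), "LRRASLG")
print([m.start() + 1 for m in hits])          # [2]
```

Applied to LRRASLG (kemptide, a classic protein kinase A substrate), the phosphorylation pattern reports a match starting at residue 2.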

PROSITE history

• Bairoch A (1991) NAR 19: 2241-5. PROSITE: a dictionary of sites and patterns in proteins; repeated 1992, 1993
• A Bairoch & P Bucher (1994) NAR 22: 3583-9. PROSITE: recent developments (profiles)
• A Bairoch, P Bucher & K Hofmann (1996) NAR 24: 189-96; repeated 1997, 1999 (Hofmann, Bucher, Falquet, Bairoch)
• L Falquet, M Pagni, P Bucher, N Hulo, CJ Sigrist, K Hofmann & A Bairoch (2002) NAR 30: 235-8

Philipp Bucher

PROSITE history

• CJ Sigrist, L Cerutti, N Hulo, A Gattiker, L Falquet, M Pagni, A Bairoch, P Bucher (2002) Brief Bioinform 3: 265-74
• N Hulo, CJ Sigrist, V Le Saux, PS Langendijk-Genevaux, L Bordoli, A Gattiker, E De Castro, P Bucher, A Bairoch (2004) NAR 32: D134-7
• A Gattiker, E Gasteiger, A Bairoch (2002) Appl Bioinformatics 1: 107-8. ScanProsite: a reference implementation of a PROSITE scanning tool

PROSITE - evolution of method

• A Bairoch (1991) NAR 19 Suppl: 2241-5
• prev (1992) NAR 20 Suppl: 2013-8
• x (1993) NAR 21: 3097-103
• A Bairoch and P Bucher (1994) NAR 22: 3583-9
• A Bairoch, P Bucher and K Hofmann (1996) NAR 24: 189-96
• prev (1997) NAR 25: 217-21
• K Hofmann, P Bucher, L Falquet and A Bairoch (1999) NAR 27: 215-9
• L Falquet, M Pagni, P Bucher, N Hulo, CJ Sigrist, K Hofmann and A Bairoch (2002) NAR 30: 235-8
• A Gattiker, E Gasteiger and A Bairoch (2002) Appl Bioinformatics 1: 107-8
• CJ Sigrist, L Cerutti, N Hulo, A Gattiker, L Falquet, M Pagni, A Bairoch and P Bucher (2002) Brief Bioinform 3: 265-74
• N Hulo, CJ Sigrist, V Le Saux, PS Langendijk-Genevaux, L Bordoli, A Gattiker, E De Castro, P Bucher and A Bairoch (2004) NAR 32: D134-7
• CJ Sigrist, E De Castro, PS Langendijk-Genevaux, V Le Saux, A Bairoch and N Hulo (2005) Bioinformatics 21: 4060-6
• E de Castro, CJ Sigrist, A Gattiker, V Bulliard, PS Langendijk-Genevaux, E Gasteiger, A Bairoch and N Hulo (2006) NAR 34: W362-5
• N Hulo, A Bairoch, V Bulliard, L Cerutti, E De Castro, PS Langendijk-Genevaux, M Pagni and CJ Sigrist (2006) NAR 34: D227-30
• N Hulo, A Bairoch, V Bulliard, L Cerutti, BA Cuche, E de Castro, C Lachaize, PS Langendijk-Genevaux and CJ Sigrist (2008) NAR 36: D245-9

PROSITE / ScanProsite

K Hofmann, P Bucher, L Falquet & A Bairoch (1999) Nucl Acids Res 27: 215-9
N Hulo et al. (2004) Nucleic Acids Res 32: D134-7

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)

PRINTS

Terry K Attwood

University of Manchester (Faculty of Life Sciences & School of Computer Sciences)
PRINTS: diagnostic fingerprint database
TK Attwood & ME Beck (1994) PRINTS - a protein motif fingerprint database

PRINTS concept

• Motifs are evolutionarily conserved stretches; fingerprints are groups of such motifs
• Version 42.0 (Manchester Univ, Feb 2012): 2,156 fingerprints encoding 12,444 single motifs

TK Attwood, P Bradley, DR Flower, A Gaulton, N Maudling, A Mitchell, G Moulton, A Nordle, K Paine, P Taylor, A Uddin, C Zygouri (2003) NAR 31: 400-2

PRINTS: example

Homeobox: The homeobox is a 60-residue motif first identified in a number of Drosophila homeotic and segmentation proteins, but now known to be well-conserved in many other animals, including vertebrates [1-3]. Proteins containing homeobox domains are likely to play an important role in development; most are known to be sequence-specific DNA-binding transcription factors. The domain binds DNA through a helix-turn-helix (HTH) structure.

BLOCKS

Shapers and Shakers: Jorja & Steven Henikoff

Fred Hutchinson Cancer Center, Seattle; HHMI (Howard Hughes Medical Institute)

Papers:
• >300 papers (Nov 2011)
• 3 with >1,000 citations (end 2011)
• 72 with over 100 citations
• H-index 83 (ISI Nov 2011)

Paradigm changes:
• Gene in gene: in an intron (1986)
• Histones NOT only in octamers (2004)
• DNA methylation and histones: H2A.Z in the histone spool promotes gene expression (2008), as opposed to DNA methylation, which shuts off genes (important for cancer drug development)

BLOSUM

• Compile log-odds ratios
• BLOSUMn = threshold at n% pairwise sequence identity (sequences in a block that are at least n% identical are clustered and counted as one)

S Henikoff & Jorja Henikoff (1992) PNAS 89: 10915-9

Steven Henikoff
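The log-odds idea can be written out for a single block. This minimal Python sketch, with the hypothetical function `blosum_like_scores`, counts residue pairs within each column, derives expected pair frequencies from the single-residue frequencies, and reports 2·log2(observed/expected), i.e. half-bit units as used for BLOSUM62. It deliberately skips the clustering of sequences at n% identity and the pooling over thousands of blocks, so the numbers are toy values only.

```python
import math
from collections import Counter
from itertools import combinations

def blosum_like_scores(block):
    """Log-odds scores (half-bit units) from one ungapped block of aligned sequences.

    Simplification: real BLOSUMn first clusters sequences sharing >= n% identity
    and pools counts over many blocks; both steps are omitted here.
    """
    # 1. count residue pairs within every column
    pair_counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed pair frequencies

    # 2. single-residue frequencies implied by the pair frequencies
    p = Counter()
    for (a, b), q_ab in q.items():
        p[a] += q_ab / 2
        p[b] += q_ab / 2

    # 3. log-odds of observed vs. expected-by-chance pair frequency
    scores = {}
    for (a, b), q_ab in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(q_ab / expected))
    return scores

block = ["AAC", "AAC", "ASC", "GSC"]   # toy block: 4 sequences, 3 columns
print(blosum_like_scores(block))
```

Pairs that never co-occur in a column get no score here; real matrices avoid such gaps by pooling counts over a very large set of blocks.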

BLOSUM

BLOcks of amino acid SUbstitution Matrices: align only conserved regions

JG Henikoff and S Henikoff (1996) Meth Enzymology 266: 88-104
S Pietrokovski, JG Henikoff & S Henikoff (1996) NAR 24: 197-201

BLOCKS

Idea taken from multiple alignments
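One way to picture "align only conserved regions" is to cut a multiple alignment into ungapped, conserved segments. The Python sketch below does that with a hypothetical `find_blocks` helper; the thresholds `min_len` and `min_conservation` are arbitrary illustration values, and this is not the procedure used to build the BLOCKS database.

```python
from collections import Counter

def find_blocks(msa, min_len=3, min_conservation=0.5):
    """Return (start, end) column ranges (0-based, end exclusive) of ungapped,
    conserved segments in a multiple sequence alignment.

    A column counts as conserved if it has no gap '-' and its most common
    residue makes up at least `min_conservation` of the column.
    """
    def conserved(column):
        if "-" in column:
            return False
        top_count = Counter(column).most_common(1)[0][1]
        return top_count / len(column) >= min_conservation

    blocks, start = [], None
    n_cols = len(msa[0])
    for i, column in enumerate(zip(*msa)):
        if conserved(column):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                blocks.append((start, i))
            start = None
    if start is not None and n_cols - start >= min_len:
        blocks.append((start, n_cols))
    return blocks

msa = ["GKSTLL--AKDE",   # toy alignment of three sequences
       "GKSTLLQQARDE",
       "GRSTLL--AKEE"]
print(find_blocks(msa))
```

The gapped columns split the toy alignment into two blocks; raising `min_conservation` makes the blocks shorter and stricter.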

BLOCKS: length distribution

J Liu & B Rost (2003) Current Opinion in Chemical Biology 7, 5-11

Pfam

Shapers and Shakers: Alex Bateman

Classify all proteins and RNA into families to better understand their function and evolution
• 1997 starts Pfam (Protein families)
• 2003 Rfam (RNA families)

Citation giant:
• 229 papers (Nov 2011)
• 1 with >8,800 citations (Nov 2011)
• 6 with >1,000 citations (11/2011)
• 32 with >100 citations (11/2011)
• Hirsch index: 48

Pfam: Protein families

• EL Sonnhammer, SR Eddy, R Durbin (1997) Pfam: a comprehensive database of protein families based on seed alignments. Proteins 28: 405-20
• EL Sonnhammer, SR Eddy, E Birney, A Bateman, R Durbin (1998) NAR 26: 320-2
• A Bateman, E Birney, R Durbin, SR Eddy, RD Finn, EL Sonnhammer (1999) Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. NAR 27: 260-2
• SJ Sammut, RD Finn, A Bateman (2008) Pfam 10 years on: 10,000 families and still growing. Brief Bioinform 9: 210-9

Pfam: how it's done

Manual alignment
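Pfam builds a profile HMM from a curated seed alignment, as the Pfam paper titles above indicate. A full HMM with insert and delete states is beyond a short example, so this Python sketch swaps in its simpler relative: a position-specific scoring matrix (PSSM) built from an ungapped seed alignment with a uniform background and pseudocounts. `build_pssm`, `best_hit`, and the toy seed are assumptions for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(seed, pseudocount=1.0):
    """Log-odds PSSM (bits) from an ungapped seed alignment.

    Assumes a uniform background of 1/20 per residue and adds `pseudocount`
    to every residue count so unseen residues get a finite score.
    """
    background = 1 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*seed):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm

def best_hit(pssm, sequence):
    """Return (0-based start, score) of the best-scoring ungapped window."""
    width = len(pssm)
    windows = [(sum(pssm[j][sequence[i + j]] for j in range(width)), i)
               for i in range(len(sequence) - width + 1)]
    score, start = max(windows)
    return start, score

seed = ["GHEGAG", "GHEAAG", "GHEGVG"]   # toy seed alignment
pssm = build_pssm(seed)
print(best_hit(pssm, "MKLGHEGAGIVE"))   # best window starts at index 3
```

Unlike this PSSM, a profile HMM can also score insertions and deletions relative to the seed, which is what lets Pfam families match distant members.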

Pfam - current stats

[Figure: number of Pfam families per version]

Pfam-7TM

A Bateman et al. (2004) Nucleic Acids Res 32: D138-41

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)

Clusters & Families

DB/Method   Version  Latest update  Entries  Update       URL (all begin with http://)

Short sequence motifs
PROSITE     17.23    10/2002        1573     manual       www.expasy.ch/prosite/
Blocks+              8/2001         8656     manual       blocks.fhcrc.org/blocks/
PRINTS      35.0     7/2002         1750     manual       www.bioinf.man.ac.uk/dbbrowser/PRINTS/

Structural domain-like regions
Pfam-A      7.6      9/2002         4463     manual       pfam.wustl.edu
TIGRFAM     2.1      9/2002         1622     manual       www.tigr.org/TIGRFAMs/
SMART       3.4      10/2002        654      manual       smart.embl-heidelberg.de
SBASE       9.0      10/2002        483      semi-manual  hydra.icgeb.trieste.it/~kristian/SBASE/
DOMO        2.0      4/1998                  automatic    www.infobiogen.fr/services/domo/
ProDom      2001.3   12/2001                 automatic    prodes.toulouse.inra.fr/prodom/doc/prodom.htm
GeneRAGE                                     automatic    www.ebi.ac.uk/research/cgg/services/rage/
TribeMCL                                     automatic    www.ebi.ac.uk/research/cgg/tribe/
CHOP                 10/2002                 automatic    cubic.bioc.columbia.edu/db/chop/

Integration
InterPro    5.2      9/2002         5875     N/A          www.ebi.ac.uk/interpro/
MetaFam     4.1      9/2002                  N/A          metafam.ahc.umn.edu

Clusters of proteins
CluSTr                                       automatic    www.ebi.ac.uk/clustr/
SYSTERS     3.0                              automatic    systers.molgen.mpg.de
PICASSO     0        3/1998                  automatic    systers.molgen.mpg.de
ProtoNet    1.4      9/2002                  automatic    www.protonet.cs.huji.ac.il/protonet/
ProClust    1.0                              automatic    promoter.mi.uni-koeln.de/~proclust/

J Liu & B Rost (2003) Cur Op Chem Biol 7, 5-11

Some overlap between databases

J Liu & B Rost (2003) Cur Op Chem Biol 7, 5-11

… not everything that shines is copper

J Liu & B Rost (2003) Cur Op Chem Biol 7, 5-11

Localization motifs

Motif-based inference of localization

Rajesh Nair

now: FDA, Washington

Similar proteins may differ in localization

R Nair & B Rost (2002) Protein Science 11: 2836-47

Shuttle into the nucleus

[Figure: transport between cytoplasm and nucleus; labels: NLS, M9, importin, transportin]

M Cokol, R Nair & B Rost (2000) EMBO Rep 1: 411-415

Types of zip-codes

After: B Alberts, D Bray, J Lewis, M Raff, K Roberts, JD Watson: Molecular Biology of the Cell, Garland, 1994

How many NLS motifs in databases?

Only ONE NLS motif in PROSITE: the bipartite motif

Set                 N NLS  Nprot nuc  Nfam nuc  Accuracy  Coverage
PROSITE             1      96         31        90 %      3 %
SWISS-PROT          322    290        n.a.                9 %
NLS-lit cleaned     91     309        35        100 %     10 %
NLS-lit consensus   91     537        35        100 %     17 %
PredictNLS_DB       214    1354       186       100 %     43 %
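Assuming Accuracy is the fraction of motif-containing proteins that are nuclear and Coverage the fraction of nuclear proteins that contain at least one motif (our reading of the table, not a definition given on the slide), both numbers can be computed as below. `accuracy_and_coverage` is a hypothetical helper and the sequences are made up; only PKKKRKV, the SV40 large-T antigen NLS, comes from the literature.

```python
def accuracy_and_coverage(proteins, has_motif):
    """Accuracy and coverage of a motif-based nuclear prediction.

    proteins  -- list of (sequence, is_nuclear) pairs
    has_motif -- function returning True if a sequence contains an NLS motif
    Accuracy: fraction of motif-containing proteins that are nuclear.
    Coverage: fraction of nuclear proteins that contain a motif.
    """
    hit_labels = [is_nuclear for seq, is_nuclear in proteins if has_motif(seq)]
    n_nuclear = sum(1 for _, is_nuclear in proteins if is_nuclear)
    n_correct = sum(hit_labels)          # True counts as 1
    accuracy = n_correct / len(hit_labels) if hit_labels else float("nan")
    coverage = n_correct / n_nuclear if n_nuclear else float("nan")
    return accuracy, coverage

# Made-up toy data
toy = [
    ("MAPKKKRKVEDP", True),
    ("MSTNQPKKKRKVG", True),
    ("MGSSHHHHHHSS", True),
    ("MDEQRLLSNE", True),
    ("MLLPKKKRKVAA", False),
    ("MKTAYIAKQR", False),
]
acc, cov = accuracy_and_coverage(toy, lambda s: "PKKKRKV" in s)
print(f"accuracy {acc:.0%}, coverage {cov:.0%}")
```

The trade-off in the table follows directly: adding more (and more permissive) motifs raises coverage, but every motif that also occurs in non-nuclear proteins lowers accuracy.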

Experimental NLS: positive charges

NLS                     Protein          Reference
RKRKK                   YstDNApolalpha   Hsieh et al., 1998
RKRRR                   Amida            Irie et al., 2000
KKKKRKREK               LEF-1            Prieve et al., 1998
KKKRRSREK               TCF-1            Prieve et al., 1998
RQARRNRRRRWR            HIV-1 Rev        Truant et al., 1999
RRMKWKK                 PDX-1            Moede et al., 1999
PKKKRKV                 SV40 LrgT        Kalderon et al., 1984
PRRRK                   SRY              Sudbeck and Scherer, 1997
GKKRSKA                 H2B              Moreland et al., 1987
KAKRQR                  v-Rel            Gilmore and Temin, 1988
RGRRRRQR                Amida            Irie et al., 2000
PPVKRERTS               RanBP3           Welch et al., 1999
PYLNKRKGKP              Pho4p            Welch et al., 1999
KRx{7,9}PQPKKKP         p53-NLS1         Liang and Clarke, 1999
KVTKRKHDNEGSGSKRPK      Hum-Ku70         Koike et al., 1999
RLKKLKCSKx{19}KTKR      GAL4             Chan et al., 1998
RKRIREDRKx{18}RKRKR     TCPTP            Chan et al., 1998
RRERx{4}RPRKIPR         BDV-P            Schwemmle et al., 1999
KKKKKEEEGEGKKK          act/inh betaA    Blauer et al., 1999
PRPRKIPR                BDV-P            Shoya et al., 1998
PPRIYPQLPSAPT           BDV-P            Shoya et al., 1998
KDCVINKHHRNRCQYCRLQR    TR2              Yu et al., 1998
APKRKSGVSKC             PolyomaVP1       Chang et al., 1992
RKKRRQRRR               HIV-1 Tat        Truant et al., 1999
MPKTRRRPRRSQRKRPPT      Rex              Palmeri and Malim, 1999
KRPMNAFIVWSRDQRRK       SRY              Sudbeck and Scherer, 1997
KRPMNAFMVWAQAARRK       SOX9             Sudbeck and Scherer, 1997
PPRKKRTVV               NS5A             Ide et al., 1996
YKRPCKRSFIRFI           DNAse EBV        Liu et al., 1998
LKDVRKRKLGPGH           DNAse EBV        Lyons et al., 1987
KRPRP                   AdenovE1a        Bouvier and Baldacci, 1995
RRSMKRK                 hVDR             Vihinen-Ranta et al., 1997
PAKRARRGYK              CPV capsid       Kaneko et al., 1997
RKCLQAGMNLEARKTKK       hGlu.cort.       Kaneko et al., 1997
RRERNKMAAAKCRNRRR       CFOS             Kaneko et al., 1997
KRMRNRIAASKCRKRKL       CJUN             Kaneko et al., 1997

Experimental NLS: more complicated

NLS                       Protein        Reference
CYGSKNTGAKKRKIDDA         DNAhelicaseQ1  Miyamoto et al., 1997
[AKR]TPIQKHWRPTVLTEGPPVKIRIETGEWE[KA]  ASVintegrase   Kukolj G., 1998
GGGx{3}KNRRx{6}RGGRN      Nab2           Truant et al., 1998
KRxxxxxxxxxKTKK           THOV NP        Weber et al., 1998
EYLSRKGKLEL               VirD2-Nterm    Tinland et al., 1992
KRPACTLKPECVQQLLVCSQEAKK  HCDA           Somasekaram et al., 1999
RVHPYQR                   QKI-5          Wu et al., 1999
                          HARNT          Eguchi et al., 1997
YNNQSSNFGPMKGGN           M9             Bonifaci et al., 1997
SxGTKRSYxxM               InfluenzaNP    Wang et al., 1997
TKRSxxxM                  InfluenzaNP    Wang et al., 1997
VNEAFETLKRC               MyoD           Vandromme et al., 1995
MNKIPIKDLLNPG             Mat-alpha      Hall et al., 1984
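The table notation is close to a regular expression: `x` is any residue and `x{n,m}` a gap of n to m residues. The Python sketch below, using the hypothetical `KNOWN_NLS` dictionary and `scan_nls` function, searches a sequence for three of the experimental NLSs listed in the tables; the test sequences are made up.

```python
import re

# Hypothetical subset of the experimental NLSs listed above.
# Notation: 'x' = any residue, 'x{n,m}' = gap of n to m residues.
KNOWN_NLS = {
    "SV40 LrgT": "PKKKRKV",
    "p53-NLS1": "KRx{7,9}PQPKKKP",
    "GAL4": "RLKKLKCSKx{19}KTKR",
}

def nls_to_regex(nls):
    """Translate the table notation into a regex by mapping 'x' to '.'."""
    return re.compile(nls.replace("x", "."))

def scan_nls(sequence):
    """Return (NLS name, 1-based start, matched text) for every hit."""
    hits = []
    for name, nls in KNOWN_NLS.items():
        for m in nls_to_regex(nls).finditer(sequence):
            hits.append((name, m.start() + 1, m.group()))
    return hits

print(scan_nls("MAPKKKRKVEDP"))
print(scan_nls("MSKRAAGSTEDLPQPKKKPLE"))
```

Exact experimental NLSs like these are highly specific but rare, which is why the coverage of such lists stays low.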

In silico mutagenesis
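As we understand the PredictNLS work (Cokol, Nair & Rost 2000, cited on the nuclear-shuttle slide), known NLSs were extended by mutating them in silico and keeping variants that occur in nuclear but not in non-nuclear proteins. The sketch below illustrates that idea with single-residue substitutions on toy data; `single_mutants`, `accepted_variants`, the substitution scheme, and the acceptance rule are simplifying assumptions, not the published procedure.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_mutants(motif):
    """Yield every motif that differs from `motif` at exactly one position."""
    for i, original in enumerate(motif):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield motif[:i] + aa + motif[i + 1:]

def accepted_variants(motif, nuclear, non_nuclear):
    """Keep variants seen in at least one nuclear and in no non-nuclear protein."""
    kept = []
    for variant in single_mutants(motif):
        in_nuclear = any(variant in seq for seq in nuclear)
        in_other = any(variant in seq for seq in non_nuclear)
        if in_nuclear and not in_other:
            kept.append(variant)
    return kept

# Made-up toy sets; PKKKRKV is the SV40 large-T NLS
nuclear = ["MAPKKKRKVEDP", "MSPKRKRKVGG"]
non_nuclear = ["MLPKKKRKIAA"]
print(accepted_variants("PKKKRKV", nuclear, non_nuclear))   # ['PKRKRKV']
```

The variant ending in I is rejected because it occurs in a non-nuclear protein; keeping only "clean" variants is how coverage can grow without sacrificing accuracy.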

Increasing accuracy and coverage

Types of zip-codes

Sarah Gilman

Kaz Wrzeszczynski

ER & Golgi

Sequence motif (1)                  ER/Golgi      Non-ER/Golgi
                                    N      %      N       %
Endoplasmic reticulum (ER) motifs (2)
KDEL-C-term                         56     92     5       8
KDEL                                61     7      714     92
HDEL-C-term                         45     92     4       8
HDEL                                46     15     269     2
HDEF-C-term                         2      50     2       50
HDEF                                2      2      89      98
Golgi apparatus motifs (3)
YQRL                                3      1      270     99
YKGL                                5      1      442     99
YHPL                                4      5      76      95
YXXZ                                477    1      83112   99
NPFKD                               0      0      14      100
FXFXD                               31     1      3169    99
FQFND                               1      25     3       75
PXPXP                               65     1      8477    99
X                                   479    1      80461   99
GRIP-motif (5)                      1      50     1       50
GRIP-motif (shortened) (6)          1      3      28      97
C-term variations (4)
PROSITE Pattern (7)                 134    77     39      23
{KH}DEL                             86     78     5       4
{KHR}{DENQ}EL                       125    80     32      20
{KHR}{DENQ}L                        125    71     49      29
{KHRDENQAS}{DENQIYCV}{DENQ}L        156    25     477     75
{KRDEAVYF}{KRDEVYFMQ}{KHED}{DK}EL   39     89     5       11

KO Wrzeszczynski & B Rost (2004) CMLS 61: 1341-53
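The table shows why position matters: KDEL at the C-terminus occurs almost only in ER/Golgi proteins (56 vs 5), while KDEL anywhere in the sequence mostly occurs elsewhere (61 vs 714). A minimal Python check for the C-terminal signal might look like this. It assumes that `{KH}DEL` in the table means "K or H, followed by DEL"; in PROSITE syntax braces would mean the opposite (excluded residues), so the sketch uses `[KH]`. `er_signal_positions` is a hypothetical helper and the test sequence is made up.

```python
import re

# Assumption: in the table, {KH}DEL means "K or H, then DEL" (in PROSITE
# syntax braces would instead mean "not K or H"), so the regex uses [KH].
ER_SIGNAL = "[KH]DEL"

def er_signal_positions(sequence, c_terminal_only=True):
    """1-based start positions of KDEL/HDEL; by default only at the C-terminus."""
    regex = ER_SIGNAL + ("$" if c_terminal_only else "")
    return [m.start() + 1 for m in re.finditer(regex, sequence)]

toy = "MAKDELGSTSAEKDEL"   # made-up sequence: one internal and one C-terminal KDEL
print(er_signal_positions(toy, c_terminal_only=False))  # [3, 13]
print(er_signal_positions(toy))                         # [13]
```

Anchoring the pattern to the C-terminus is a one-character change in the regex, yet according to the table it turns a signal that is mostly found outside the ER/Golgi into one that is about 92 % specific.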

Open challenges - motifs and patterns

• Automate
• Unify
• Remote homologues

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)


Identify active site / functional element

Search for this structural pattern in a new protein

Transfer function annotation

S Jones & J Thornton (2004) Curr Opin Chem Biol 8:3-7

Manual identification of active site

Automatic structural alignment?

63

Structural motifs

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)
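The three steps on this slide (define the functional site, search for it in a new structure, transfer the annotation) can be illustrated with a toy geometric search: compare the inter-residue distances of a template, for example a catalytic triad, with every combination of same-type residues in a query structure. This is a minimal sketch under simplifying assumptions: it compares distance sets instead of computing an optimal superposition, it brute-forces all combinations, and the template coordinates, residue IDs and tolerance are hypothetical.

```python
import itertools
import math

Point = tuple[float, float, float]


def pairwise_distances(points: list[Point]) -> list[float]:
    """All pairwise Euclidean distances, in itertools.combinations order."""
    return [math.dist(a, b) for a, b in itertools.combinations(points, 2)]


def search_template(template, residues, tolerance=1.0):
    """Find residue combinations whose geometry matches a structural template.

    template: list of (residue_type, representative_atom_coordinate)
    residues: list of (residue_id, residue_type, coordinate) in the query
    Returns (residue_ids, rms_distance_deviation) for each hit within tolerance.
    """
    wanted_types = [res_type for res_type, _ in template]
    template_d = pairwise_distances([coord for _, coord in template])
    hits = []
    # Brute force over ordered residue tuples: fine for a toy, too slow for real structures.
    for combo in itertools.permutations(residues, len(template)):
        if [res_type for _, res_type, _ in combo] != wanted_types:
            continue
        query_d = pairwise_distances([coord for _, _, coord in combo])
        deviation = math.sqrt(
            sum((q - t) ** 2 for q, t in zip(query_d, template_d)) / len(template_d)
        )
        if deviation <= tolerance:
            hits.append(([res_id for res_id, _, _ in combo], deviation))
    return hits


# Hypothetical Ser-His-Asp template and a query containing one matching triad.
template = [("SER", (0.0, 0.0, 0.0)), ("HIS", (3.0, 0.0, 0.0)), ("ASP", (3.0, 4.0, 0.0))]
query = [
    ("S1", "SER", (10.0, 10.0, 10.0)),
    ("H2", "HIS", (13.0, 10.0, 10.0)),
    ("D3", "ASP", (13.0, 14.0, 10.0)),
    ("D4", "ASP", (30.0, 30.0, 30.0)),
]
print(search_template(template, query))  # [(['S1', 'H2', 'D3'], 0.0)]
```

Comparing internal distances makes the search independent of where the site sits in space; the open challenge of adding site biophysics would mean scoring more than geometry.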


Find

Search

Add biophysics of the site to the spatial search

64

Open challenges - structural motifs

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)


Example 3: Voltage-gated potassium channel

65


66

Example: Voltage-gated potassium channel

V Ruta et al. & R MacKinnon (2003) Nature, 422:180-5

• Eukaryotic voltage-gated potassium channel (VG-K+)
• Prokaryotic membrane proteins are easier to crystallize than eukaryotic ones

• Find a prokaryotic VG-K+ channel with functional and structural features similar to those of the eukaryotic one

© Marco Punta


67

Voltage-gated K+ channel: sequence

1 MAAVAGLYGLGEDRQHRKKQQQQQQHQKEQLEQKEEQKKIAERKLQLREQQLQRNSLDGY

GSLPKLSSQDEEGGAGHGFGGGPQHFEPIPHDHDFCERVVINVSGLRFETQLRTLNQFPD

TLLGDPARRLRYFDPLRNEYFFDRSRPSFDAILYYYQSGGRLRRPVNVPLDVFSEEIKFY

ELGDQAINKFREDEGFIKEEERPLPDNEKQRKVWLLFEYPESSQAARVVAIISVFVILLS

IVIFCLETLPEFKHYKVFNTTTNGTKIEEDEVPDITDPFFLIETLCIIWFTFELTVRFLA

CPNKLNFCRDVMNVIDIIAIIPYFITLATVVAEEEDTLNLPKAPVSPQDKSSNQAMSLAI

LRVIRLVRVFRIFKLSRHSKGLQILGRTLKASMRELGLLIFFLFIGVVLFSSAVYFAEAG

SENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIALPVPVIVSN

FNYFYHRETDQEEMQSQNFNHVTSCPYLPGTLGQHMKKSSLSESSSDMMDLDDGVESTPG

LTETHPGRSAVAPFLGAQQQQQQQPVASSLSMSIDKQLQHPLQHVTQTQLYQQQQQQQQQ

QQNGFKQQQQQTQQQLQQQQSHTINASAAAATSGSGSSGLTMRHNNALAVSIETDV

The template: the voltage-gated potassium channel Shaker

© Marco Punta


68

Why called shaker?

??

???


69

Why called shaker?

© Wikipedia

The shaker (Sh) gene, when mutated, causes a variety of atypical behaviors in the fruit fly … Under ether anesthesia, the fly’s legs shake …, and it exhibits aberrant movements. Sh-mutant flies have a shorter lifespan than regular flies; in their larvae, action potentials fire repetitively and neuromuscular junctions are exposed to neurotransmitters for prolonged periods.


70

Voltage-gated K+ channel: search

PSI-BLAST: http://www.ncbi.nih.gov/BLAST/

© Marco Punta


71

Voltage-gated K+ channel: alignment

Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218

the alignment

© Marco Punta

~30% PIDE (percentage of identical residues) over 80 aligned residues: enough?
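PIDE counts identical residues among the aligned positions. A minimal sketch, assuming both rows are equal-length strings with '-' for gaps and that gapped columns are excluded from the denominator (tools differ on this convention). The example feeds in the ungapped alignment excerpt printed on this slide; the result depends on which aligned span is counted, so it need not reproduce the rounded figure quoted above.

```python
def percent_identity(row_a: str, row_b: str) -> tuple[int, int, float]:
    """Return (identical positions, aligned positions, PIDE in %) for two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Count only columns in which neither sequence has a gap.
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    identical = sum(1 for a, b in columns if a == b)
    return identical, len(columns), 100.0 * identical / len(columns)


# Alignment excerpt from the slide (Shaker 413-481 vs target 150-218), no gaps.
shaker = "AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL" + "PVPVIVSNF"
target = "AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL" + "LIGTVSNMF"
identical, aligned, pide = percent_identity(shaker, target)
print(identical, aligned, round(pide, 1))
```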


72

Voltage-gated K+ channel: filter

© Marco Punta


73

Voltage-gated K+ channel: alignment

Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218

the alignment

© Marco Punta

~30% PIDE over 80 aligned residues: not quite enough to infer structural similarity
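Whether ~30% identity suffices depends on alignment length: short alignments need higher identity. One published length-dependent cut-off is the HSSP curve as revised in B Rost (1999) Protein Eng 12:85-94. The sketch below uses the constants as we recall them from that paper (480, -0.32, 1000, with plateaus for very short and very long alignments); treat them as an assumption and check them against the paper before relying on exact values.

```python
import math


def hssp_threshold(length: int) -> float:
    """Minimal PIDE (%) for inferring structural similarity at a given alignment length.

    Constants assumed from the HSSP curve of Rost (1999); capped at 100%.
    """
    if length <= 11:
        return 100.0
    if length > 450:
        return 19.5
    value = 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))
    return min(100.0, value)


def above_hssp_curve(pide: float, length: int) -> bool:
    """True if an alignment's PIDE lies on or above the length-dependent threshold."""
    return pide >= hssp_threshold(length)


print(round(hssp_threshold(80), 1))  # 32.4
print(above_hssp_curve(30.0, 80))    # False: just below the curve
```

At 80 residues the assumed curve asks for roughly 32% identity, so ~30% falls just short, in line with the "not quite enough" verdict above.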


74

Voltage-gated K+ channel: alignment

Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218

the alignment

Target (residues 1-295): the entire sequence of the identified protein

MSVERWVFPGCSVMARFRRGLSDLGGRVRNIGDVMEHPLVELGVSYAALLSVIVVVVEYT

MQLSGEYLVRLYLVDLILVIILWADYAYRAYKSGDPAGYVKKTLYEIPALVPAGLLALIE

GHLAGLGLFRLVRLLRFLRILLIISRGSKFLSAIADAADKIRFYHLFGAVMLTVLYGAFA

IYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTLL

IGTVSNMFQKILVGEPEPSCSPAKLAEMVSSMSEEEFEEFVRTLKNLRRLENSMK

© Marco Punta


75

Voltage-gated K+ channel: function?

Shaker channel

• Membrane protein?

© Marco Punta


76

Voltage-gated K+ channel:

[Figure: transmembrane α-helical bundle and β-barrel, membrane sides labelled Out and In]

© Marco Punta


77

Voltage-gated K+ channel: TMH predicted

Side view: single subunit

Top view: tetramer

© Marco Punta


78

Voltage-gated K+ channel: TMH predicted

1 MAAVAGLYGLGEDRQHRKKQQQQQQHQKEQLEQKEEQKKIAERKLQLREQQLQRNSLDGY

GSLPKLSSQDEEGGAGHGFGGGPQHFEPIPHDHDFCERVVINVSGLRFETQLRTLNQFPD

TLLGDPARRLRYFDPLRNEYFFDRSRPSFDAILYYYQSGGRLRRPVNVPLDVFSEEIKFY

ELGDQAINKFREDEGFIKEEERPLPDNEKQRKVWLLFEYPESSQAARVVAIISVFVILLS

IVIFCLETLPEFKHYKVFNTTTNGTKIEEDEVPDITDPFFLIETLCIIWFTFELTVRFLA

CPNKLNFCRDVMNVIDIIAIIPYFITLATVVAEEEDTLNLPKAPVSPQDKSSNQAMSLAI

LRVIRLVRVFRIFKLSRHSKGLQILGRTLKASMRELGLLIFFLFIGVVLFSSAVYFAEAG

SENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIALPVPVIVSN

FNYFYHRETDQEEMQSQNFNHVTSCPYLPGTLGQHMKKSSLSESSSDMMDLDDGVESTPG

LTETHPGRSAVAPFLGAQQQQQQQPVASSLSMSIDKQLQHPLQHVTQTQLYQQQQQQQQQ

QQNGFKQQQQQTQQQLQQQQSHTINASAAAATSGSGSSGLTMRHNNALAVSIETDV

[Segments marked on the sequence: S1, S2, S3, S4, S5, P, S6]

© Marco Punta


79

Voltage-gated K+ channel: TMHs predicted

MSVERWVFPGCSVMARFRRGLSDLGGRVRNIGDVMEHPLVELGVSYAALLSVIVVVVEYT

MQLSGEYLVRLYLVDLILVIILWADYAYRAYKSGDPAGYVKKTLYEIPALVPAGLLALIE

GHLAGLGLFRLVRLLRFLRILLIISRGSKFLSAIADAADKIRFYHLFGAVMLTVLYGAFA

IYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTLL

IGTVSNMFQKILVGEPEPSCSPAKLAEMVSSMSEEEFEEFVRTLKNLRRLENSMK

[Segments marked on the sequence: S1, S2, S3, S4, S5, P, S6]

TMH predictions on the target sequence

© Marco Punta
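The slides do not say which method produced these TMH predictions. Purely as an illustration, the classic Kyte-Doolittle hydropathy scan flags candidate transmembrane helices where the mean hydropathy over a 19-residue window reaches about 1.6; dedicated TMH predictors are considerably more accurate. A minimal sketch, assuming the standard Kyte-Doolittle scale; the function name and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (J Kyte & RF Doolittle, J Mol Biol 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(sequence: str, window: int = 19, threshold: float = 1.6):
    """Return 1-based (start, end) residue ranges covered by windows with mean >= threshold."""
    scores = [
        sum(KYTE_DOOLITTLE[aa] for aa in sequence[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]
    segments, run_start = [], None
    # A trailing sentinel score closes a run that reaches the last window.
    for i, score in enumerate(scores + [float("-inf")]):
        if score >= threshold and run_start is None:
            run_start = i
        elif score < threshold and run_start is not None:
            # Windows run_start..i-1 cover 0-based residues run_start..i+window-2.
            segments.append((run_start + 1, i - 1 + window))
            run_start = None
    return segments


# Toy sequence: a 25-residue leucine stretch (residues 21-45) flanked by lysines.
print(hydrophobic_segments("K" * 20 + "L" * 25 + "K" * 20))  # [(16, 50)]
```

The reported range (16-50) is wider than the leucine stretch because every window that is mostly hydrophobic counts; window-based scans blur helix boundaries, one reason dedicated predictors do better.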


80

Voltage-gated K+ channel: function of template

Shaker channel

• Membrane protein

• K+ selectivity?

© Marco Punta


81

Voltage-gated K+ channel:

[Figure: channel spanning the membrane (Out/In), with positive and negative charges marked]

© Marco Punta


82

Voltage-gated K+ channel: conservation of outer pore

Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218

[Segments P and S6 marked on the alignment]

the selectivity filter

[Figure: subunit topology (segments S1-S6 and P) with the selectivity-filter residues highlighted]

© Marco Punta
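The selectivity filter is among the most conserved parts of K+ channels, and both aligned sequences share the TVGYG stretch. A minimal sketch that scans a sequence for a simplified filter signature, here assumed to be the regular expression `T.GYG` (threonine, any residue, then GYG); the pattern and helper name are our own choice, not a curated database signature.

```python
import re


def find_filter_signature(sequence: str, first_residue: int = 1, pattern: str = "T.GYG"):
    """Return (residue number, matched text) for every occurrence of the pattern.

    first_residue is the residue number of sequence[0], e.g. the start
    coordinate of an alignment excerpt.
    """
    return [(m.start() + first_residue, m.group()) for m in re.finditer(pattern, sequence)]


shaker = "AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL"  # Shaker 413-472
target = "AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL"  # target 150-209
print(find_filter_signature(shaker, 413))  # [(442, 'TVGYG')]
print(find_filter_signature(target, 150))  # [(179, 'TVGYG')]
```

Residue numbers follow the coordinates printed on the alignment lines.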


83

Voltage-gated K+ channel: functional characterization of target

Shaker channel

• Membrane protein

• K+ selectivity

© Marco Punta


84

Voltage-gated K+ channel: functional characterization of target

Shaker channel

• Membrane protein

• K+ selectivity

• Voltage gating

© Marco Punta


85

Voltage-gated K+ channel:

[Figure: channel in the closed state, membrane sides labelled Out and In]

© Marco Punta


86

Voltage-gated K+ channel:

[Figure: channel in the open state, membrane sides labelled Out and In, with charges (+/-) marked]

© Marco Punta


87

Voltage-gated K+ channel: Conservation of functional residues in target

[Figure: subunit topology, segments S1-S6 and P]

Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Sbjct : 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Sbjct : 210 LIGTVSNMF 218

[Segments P and S6 marked on the alignment]

the gating hinge

© Marco Punta


88

Voltage-gated K+ channel: Conservation of functional residues in target

[Figure: subunit topology, segments S1-S6 and P]

Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218

[Segments P and S6 marked on the alignment]

[Figure: S4 with its positive charges (+) marked]

voltage sensor

© Marco Punta
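The voltage sensor S4 carries positively charged residues (Arg/Lys) at roughly every third position. A minimal heuristic sketch that finds the longest such ladder in a stretch of sequence; the period of 3, the residue set {R, K} and the two stretches (copied by eye from the Shaker and target sequences on the earlier slides) are our assumptions, not the S4 boundaries drawn on the slides.

```python
def longest_charge_ladder(segment: str, period: int = 3, charged: str = "RK"):
    """Return (number of charges, 0-based start) of the longest run of charged
    residues spaced exactly `period` positions apart."""
    best_count, best_start = 0, 0
    for start in range(len(segment)):
        count, pos = 0, start
        # Follow the ladder while every `period`-th residue is charged.
        while pos < len(segment) and segment[pos] in charged:
            count += 1
            pos += period
        if count > best_count:
            best_count, best_start = count, start
    return best_count, best_start


shaker_stretch = "LRVIRLVRVFRIFKLSRHSK"     # Shaker residues 361-380
target_stretch = "LFRLVRLLRFLRILLIISRGSKF"  # taken from the target sequence
print(longest_charge_ladder(shaker_stretch))  # (7, 1)
print(longest_charge_ladder(target_stretch))  # (4, 2)
```

Both stretches show the i, i+3 arginine ladder; the target's is shorter, which such a crude count cannot judge functionally on its own.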


89

Voltage-gated K+ channel: Conservation of functional residues in target

[Figure: subunit topology, segments S1-S6 and P]

Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218

[Segments P and S6 marked on the alignment; S4 highlighted]

other voltage-sensing residues

© Marco Punta


90

Voltage-gated K+ channel: Function of target

Shaker channel

• Membrane protein

• K+ selectivity

• Voltage gating

© Marco Punta


91

Roderick MacKinnon’s Nobel Prize

© Wikipedia

Roderick MacKinnon (Rockefeller Univ New York)

Nobel Prize in Chemistry 2003: “for structural and mechanistic studies of ion channels”

© Nobel Prize Foundation

[Figure labels: potassium, sodium]

DA Doyle, J Morais Cabral, RA Pfuetzner, A Kuo, JM Gulbis, SL Cohen, BT Chait & R MacKinnon (1998) The structure of the potassium channel: molecular basis of K+ conduction and selectivity. Science 280:69-77.

JH Morais-Cabral, Y Zhou & R MacKinnon (2001) Energetic optimization of ion conduction rate by the K+ selectivity filter. Nature 414:37-47.

Y Jiang, A Lee, J Chen, M Cadene, BT Chait & R MacKinnon (2002) Crystal structure and mechanism of a calcium-gated potassium channel. Nature 417:515-522.


I.2c Function Intro: Function by association

92


93

Co-expression

Expression data → Machine learning / clustering → Functional classes

For example: P Brown et al. (2000) PNAS 97:262-267

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)
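The idea behind co-expression: genes with similar expression profiles across conditions often act in the same process, so an unannotated gene can inherit the function of its best-correlated annotated partner. A minimal nearest-neighbour sketch using Pearson correlation; it is a simple stand-in, not the method of the cited study, and the gene names, profiles and function labels are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def predict_by_coexpression(query: list[float], annotated: dict) -> tuple[str, str]:
    """Return (function, gene) of the annotated gene best correlated with the query profile."""
    best_gene = max(annotated, key=lambda gene: pearson(query, annotated[gene][0]))
    return annotated[best_gene][1], best_gene


# Hypothetical profiles over five conditions: gene -> (profile, function label).
annotated = {
    "geneA": ([1.0, 2.0, 3.0, 4.0, 5.0], "ribosome biogenesis"),
    "geneB": ([5.0, 4.0, 3.0, 2.0, 1.0], "stress response"),
}
print(predict_by_coexpression([1.1, 2.0, 2.9, 4.2, 5.1], annotated))
# ('ribosome biogenesis', 'geneA')
```

Clustering all profiles first and labelling whole clusters is the natural extension; the single nearest neighbour keeps the sketch short.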


94

Interactions / networks

For example: AH Tong et al. (2002) Science 295:321-324

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)
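In an interaction network, the simplest association rule is "guilt by association": predict the function most common among a protein's annotated interaction partners. A minimal majority-vote sketch; the protein names, edges and labels are hypothetical, and the cited study is not claimed to use this rule.

```python
from collections import Counter


def predict_by_neighbours(protein: str, edges: list[tuple[str, str]], annotations: dict[str, str]):
    """Majority vote over the annotated direct neighbours of `protein` (None if none are annotated)."""
    # Edges are undirected: collect partners from either end.
    neighbours = {b for a, b in edges if a == protein} | {a for a, b in edges if b == protein}
    votes = Counter(annotations[n] for n in neighbours if n in annotations)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


edges = [("P1", "P2"), ("P1", "P3"), ("P4", "P1"), ("P5", "P6")]
annotations = {"P2": "DNA repair", "P3": "DNA repair", "P4": "transport"}
print(predict_by_neighbours("P1", edges, annotations))  # DNA repair
print(predict_by_neighbours("P5", edges, annotations))  # None
```

Ties between equally common labels are resolved arbitrarily here; separating functional from purely physical interactions, one of the open challenges listed next, would change which edges are allowed to vote.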

A Bairoch (2000) Nucleic Acids Res 28:304-305

Differentiate functional and physical interaction

Improve accuracy and coverage (data, algorithm)

Ab initio/de novo prediction

95

Open challenges - function by association

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)


Sub-cellular localization (nucleus, membrane, etc.)

Post-translational modifications

Functionally important residues

Interaction sites

96

Predict aspects of function

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)


Function introduction
• Molecular biology is just at an exciting beginning
• We can compute some aspects of molecular life
• Most accurate inference of function: based on homology
• Homology-based inference of function can be improved by motifs
• Problem: the definition of motifs is still not fully automated

NEXT
• Computing chemistry - enzyme function
• Prediction of subcellular localization

97

Conclusions today