134
/134 © Burkhard Rost 1 title: Computational Biology 1 - Protein structure: Sequence alignments 2 short title: cb1_alignments2 lecture: Protein Prediction 1 - Protein structure Computational Biology 1 - TUM Summer 2016

Sequence alignments 2 cb1 alignments2 - Rostlab · 2017. 5. 16. · /134 © Burkhard Rost 1 title: Computational Biology 1 - Protein structure: Sequence alignments 2 short title:

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

  • /134© Burkhard Rost

    1

    title: Computational Biology 1 - Protein structure: 
 Sequence alignments 2

    short title: cb1_alignments2

    lecture: Protein Prediction 1 - Protein structure
 Computational Biology 1 - TUM Summer 2016

  • /134© Burkhard Rost

    Videos: YouTube / www.rostlab.org 
THANKS :

. EXERCISES: Special lectures: • 07/xx Predrag Radivojac - Indiana Univ. • 06/xx Yana Bromberg - Rutgers Univ. No lecture: • 05/09 no lecture • 05/15 Ascension day • 05/23 Student assembly (SVV) • 06/06 Whitsun holiday • 06/15 Corpus Christi LAST lecture: bef: Jul 11
 after: Jul 28 Examen: WEDNESDAY(!!) July 12: 18:00-19:30 TBA • Makeup: TBC: Oct 17 & 19, 2017 - lecture time

    2

    CONTACT: Lothar Richter [email protected]

    Dmitrij Nechaev

    Lothar Richter

    Christian Dallago

    http://www.rostlab.orgmailto:[email protected]?subject=

  • © Burkhard Rost (TUM Munich)

    Recap: 3D comparison

    3

  • /134© Burkhard Rost

    4

    Notation: protein structure 1D, 2D, 3DPQITLWQRPLVTIKIGGQLKEALLDTGADDTVL

    PP PQQQYFFQVISSIVRLLSTLWWQEDRKQAKRRRPQPPPPPVVTKFVVLIITTKEKAALIVHYKKFIILVIEENGGGGGTGQQKRRPPLWWVVFKVEESKKVVGLGLLILLLLLVVDDDDDTTTTTGGGGGAAAAADDDDDDDAKESSTTVIIVIVVVIVL

    1281757077

    120238169200247114740

    904

    466268

    11831

    1241

    292449726217

    102691

    140

    1109760691481976248590

    690

    730

    415371597395000

    5851300

    79586900

    EEEEE

    EEEEEE

    EEEEEEE

    EE

    EEEEE

    EEEEEE

    EE

    kcal/mol0 -1 -2 -3 -4 -5

    1 10 20 30 40 50 60 70 80 90

    1

    10

    20

    30

    40

    50

    60

    70

    80

    90

    1D1D 2D2D 3D3D

  • /134© Burkhard Rost

    5

    Dynamic programming: Global

    GGQLAKEEALEGQPVEVL

    GGQLAKEEAL EGQPVEVL

    GGQLAKEEAL..EGQPVEVL

  • /134© Burkhard Rost

    6

    Dynamic programming concept no gaps

    G G Q L A K E E A LE 1 1G 1 1 1 1Q 1 2 1 1P 1 2 1V 1 2E 1 2V 1 2L 1 2P 1 2

  • © Burkhard Rost /134

    allowing for gaps

    7

    GGQLAKEEAL GQ..PVEVL

    GGQLAKEEAL GQPVEVL

    no gap with gap

  • /134© Burkhard Rost

    8

    Dynamic programming concept gaps

    G G Q L A K E E A LE 1 1G 1 1 1 1Q 2 1 1P 2 1V 2E 3V 3L 4PScore = ∑ 𝛿ij - Go * Ngap

    GQPVE•V•LGQLAKEEAL

    GQGQ

  • /134© Burkhard Rost

    9

    Dynamic programming: optimal alignment

    SW = MUkTkk=1

    Lali∑ - Go ⋅ Ngap - Ge ⋅ (Lgap - Ngap )

    • Global/no gap:

    SB Needleman and CD Wunsch 1970 J Mol Biol 48, 443-53

    • Local/Gap:

    TF Smith and MS Waterman 1981 J Mol Biol 147, 195-197

  • /134© Burkhard Rost

    10

    Alignments: scoring matrixsynonyms:

    [scoring|substitution| exchange|log-odds]

    [matrix|metric|table]

    one particular: Blossum65

    S Henikoff & J Henikoff (1992) PNAS 89:10915-9

  • © Burkhard Rost /13411

    Many more substitution matrices exist today

    Most use: BLOSUM62

  • /134© Burkhard Rost

    12

    Interactive software toolIgnacio Ibarra & Francisco Melo:

Interactive software tool to comprehend the calculation of optimal sequence alignments with dynamic programming

Bioinformatics 2010, 26:1664-5



    http:/melolab.org/sat


    http:/melolab.org/sat

  • /134© Burkhard Rost

    13

    BLAST: fast matching of single ‘words’

    TTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEKTIYKLILQGRTIKAELITEGVDGATGEKVYKQYGNQNAVDAEYTYDNATRTFTITQK

    Default “word” size for “seeds” = 3

    #1 seed=3

    #2 extendTTYKLIL AATAEKVFKQYA WTYDDATKTFTIYKLIL GATGEKVYKQYG YTYDNATRTF

    ? ?

  • /134© Burkhard Rost

    the major challenge for word search algorithms is to get the statistics right

    14

    Word searches: challenge: statistics

  • /134© Burkhard Rost

    15

    Significance of match (e.g. BLAST E-values)

    Hits

  • © Burkhard Rost (TUM Munich)

    Sequence comparisons:

    multiple alignment

    16

  • © Burkhard Rost (TUM Munich)

    How accurate are pairwise alignments?

    17

  • /134© Burkhard Rost

    18

    Zones

    Day

    light

    Zon

    e

    Twili

    ght Z

    one

    Mid

    nigh

    t Zon

    eprofile - pro

    filesequence - profile

    sequence - sequence

    sequ

    ence

    sim

    ilar

    ->

    stru

    ctur

    e sim

    ilar

    B Rost (1997) Fold Des 2:S19-24

    B Rost (1999) Protein Eng 12:85-94

  • /134© Burkhard Rost

    19

    Zones

    Day

    light

    Zon

    e

    Twili

    ght Z

    one

    Mid

    nigh

    t Zon

    eprofile - pro

    filesequence - profile

    sequence - sequence

    sequ

    ence

    sim

    ilar

    ->

    stru

    ctur

    e sim

    ilar

    B Rost (1997) Fold Des 2:S19-24

    B Rost (1999) Protein Eng 12:85-94

  • /134© Burkhard Rost

    20

    Zones

    Day

    light

    Zon

    e

    Twili

    ght Z

    one

    Mid

    nigh

    t Zon

    eprofile - pro

    filesequence - profile

    sequence - sequence

    sequ

    ence

    sim

    ilar

    ->

    stru

    ctur

    e sim

    ilar

    B Rost (1997) Fold Des 2:S19-24

    B Rost (1999) Protein Eng 12:85-94

  • /134© Burkhard Rost

    21

    40

    50

    60

    70

    80

    90

    100

    15 20 25 30 35

    Schematic: structural similarity in Twilight zone

    not true data: just to illustrate the idea of the twilight

    zone

    True Positives=pairs of proteins with similar structure

    Percentage:

    accuracy

    -or-

    specificity

  • /134© Burkhard Rost

    22

    1

    10

    100

    1,000

    15 20 25 30 35

    Schematic: true and false in Twilight zone

    True Positives = pairs of proteins with similar structureFalse Positives=pairs of proteins with DIFFERENT structure

    Number of

    pairs:

  • /134© Burkhard Rost

    23

    All-vs-all: PDB

    3D =

    structural alignment

    1D =

    sequence alignment

    0.5nm rmsd

    SAME 3D

    ignore

    DIFFER in 3D

  • /134© Burkhard Rost

    24

    PDB all-against-all ok?

  • /134© Burkhard Rost

    25

    Databases biased: MUST remove bias!

  • /134© Burkhard Rost

    26

    Databases biased: MUST remove bias!

    Day

    light

    Zo

    ne

    Twili

    ght

    Zone

    Mid

    nigh

    t Zo

    ne

    redundancy-reduced vs.

    redundancy-reduced?

  • /134© Burkhard Rost

    27

    Databases biased: MUST remove bias!

  • /134© Burkhard Rost

    28

    Sequence conservation of protein structure

    B Rost 1999 Prot Engin 12, 85-94 C Sander & R Schneider 1991 Proteins 9:56-69

  • /134© Burkhard Rost

    29

    Raw data: density? lesson learned?

    B Rost 1999 Prot Engin 12, 85-94 C Sander & R Schneider 1991 Proteins 9:56-69

  • /134© Burkhard Rost

    30

    How to estimate performance from the curves?

    ?gro

    ups

    groups

    groupsgroupsgroups

  • /134© Burkhard Rost

    31

    Distance from new HSSP-curve

    B Rost 1999 Prot Engin 12, 85-94 C Sander & R Schneider 1991 Proteins 9:56-69

  • /134© Burkhard Rost

    32

    Twilight zone = true positives explode

    101

    102

    103

    104

    105

    106

    -15 -10 -5 0 5 10

    10 15 20 25 30 35

    Num

    ber o

    f pro

    tein

    pai

    rs

    Distance from HSSP threshold

    Percentage sequence identity

    0

    20

    40

    60

    80

    100

    50 100 150 200 250 300 350 400

    %

    Seq

    uenc

    e id

    entit

    y

    Number of residues aligned

    +25+20

    +15+10

    + 5 0

    - 5- 10

    - 15- 20

    B Rost 1999 Prot Engin 12, 85-94

  • /134© Burkhard Rost

    33

    Twilight zone = false positives bazoom!!

    101

    102

    103

    104

    105

    106

    -15 -10 -5 0 5 10

    10 15 20 25 30 35

    Num

    ber o

    f pro

    tein

    pai

    rs

    Distance from HSSP threshold

    Percentage sequence identity

    0

    20

    40

    60

    80

    100

    50 100 150 200 250 300 350 400

    %

    Seq

    uenc

    e id

    entit

    y

    Number of residues aligned

    +25+20

    +15+10

    + 5 0

    - 5- 10

    - 15- 20

    B Rost 1999 Prot Engin 12, 85-94

  • © Burkhard Rost (TUM Munich)

    Multiple alignment:

    simple hacks

    34

  • /134© Burkhard Rost

    Dynamic programming?
for 3 sequences: O(N1 x N2 x N3)
NP-complete (L Wang & T Jiang (1994) JCB 1: 337-48)

    35

    Multiple alignments

    G G Q L A K E E A L

    E 1 1

    G 1 1 1 1

    Q 2 1 1P 2 1

    V 2

    E 3

    V

    L 4

    P3D

  • /134© Burkhard Rost

    Dynamic programming?
for 3 sequences: O(N1 x N2 x N3)
NP-complete (L Wang & T Jiang (1994) JCB 1: 337-48)

    claim: computer: up to 6 
~60 TB main memory
no quote-> unsure


    36

    Multiple alignments

    G G Q L A K E E A LE 1 1G 1 1 1 1Q 2 1 1P 2 1V 2E 3VL 4P

  • /134© Burkhard Rost

    Dynamic programming?
for 3 sequences: O(N1 x N2 x N3)
NP-complete (L Wang & T Jiang (1994) JCB 1: 337-48) hack 1: 
dynamic programming: pairwise, only space in vicinity of intersection searched n-wise – H Carrillo & DJ Lipman (1988) SIAM J Applied Math. 48: 1073-82

    – DJ Lipman, SF Altschul, JC Kececioglu (1989) PNAS 86:4412-5

    37

    Multiple alignments

  • /134© Burkhard Rost

    Dynamic programming?
for 3 sequences: O(N1 x N2 x N3)
NP-complete (L Wang & T Jiang (1994) JCB 1: 337-48) hack 1: 
dynamic programming: pairwise, only space in vicinity of intersection searched n-wise – H Carrillo & DJ Lipman (1988) SIAM J Applied Math. 48: 1073-82

    – DJ Lipman, SF Altschul, JC Kececioglu (1989) PNAS 86:4412-5

    hack 2:
map to tree / pairwise
Russell Doolittle, UCSD

    38

    Multiple alignments

    Russell Doolittle

    Shapers and Shakers

  • /134© Burkhard Rost

    39

    Multiple alignment: progressive 1

    A B C DA 90 80 70B 90 80C 90D

    GGQLAKEEALGGQLAKDEALGGQIAKDEALGGQIAKDEAI

  • /134© Burkhard Rost

    40

    Multiple alignment: progressive

    A B C DA 90 80 70B 90 80C 90D

    GGQLAKEEALGGQLAKDEALGGQIAKDEALGGQIAKDEAI

    GGQLAKEEALGGQLAKDEALggqlakeeal

    Step 1

    Step 1

  • /134© Burkhard Rost

    41

    Multiple alignment: progressive

    A B C DA 90 80 70B 90 80C 90D

    GGQLAKEEALGGQLAKDEALGGQIAKDEALGGQIAKDEAI

    GGQLAKEEALGGQLAKDEALggqlakeeal

    Step 1

    Step 2

    Step 1 Step 2GGQIAKDEALGGQIAKDEAIggqiakdeal

  • /134© Burkhard Rost

    42

    Multiple alignment: progressive 1

    A B C DA 90 80 70B 90 80C 90D

    GGQLAKEEALGGQLAKDEALGGQIAKDEALGGQIAKDEAI

    GGQLAKEEALGGQLAKDEALggqlakeeal

    Step 1

    Step 2

    Step 1 Step 2GGQIAKDEALGGQIAKDEAIggqiakdeal

    ggqlakeealggqiakdeal

    Step 3

    Step 3

  • /134© Burkhard Rost

    43

    Multiple alignment: progressive 2

    A B C DA 90 80 70B 90 80C 90D

    GGQLAKEEALGGQLAKDEALGGQIAKDEALGGQIAKDEAI

    GGQLAKEEALGGQLAKDEALggqlakeeal

    Step 1

    Step 1

  • /134© Burkhard Rost

    44

    Multiple alignment: progressive 2

    A B C DA 90 80 70B 90 80C 90D

    GGQLAKEEALGGQLAKDEALGGQIAKDEALGGQIAKDEAI

    GGQLAKEEALGGQLAKDEALggqlakeeal

    Step 1

    Step 2

    Step 1 Step 2ggqlakeealGGQIAKDEAL ggqlakeeal

  • /134© Burkhard Rost

    45

    Multiple alignment: progressive 2

    A B C DA 90 80 70B 90 80C 90D

    GGQLAKEEALGGQLAKDEALGGQIAKDEALGGQIAKDEAI

    GGQLAKEEALGGQLAKDEALggqlakeeal

    Step 1

    Step 2

    Step 1 Step 2ggqlakeealGGQIAKDEAL ggqlakeeal

    Step 3

    Step 3ggqlakeealGGQIAKDEAI

  • © Burkhard Rost (TUM Munich)

    Profiles: concept

    46

  • /134© Burkhard Rost

    47

    Pairwise vs. multiple: built-up

    GGQQLAKEEALGGQQLAKDEAL G.GQQLAKEEAL

    GGAQGLAKDEAL

  • /134© Burkhard Rost

    48

    Pairwise vs. multiple: built-up

    GGQQLAKEEALGGQQLAKDEAL G.GQQLAKEEAL

    GGAQGLAKDEAL

    GGQQLAKEEALGGQQLAKDEAL

    GGAQGLAKDEAL

  • /134© Burkhard Rost

    49

    Pairwise vs. multiple: built-up

    GGQQLAKEEALGGQQLAKDEAL G.GQQLAKEEAL

    GGAQGLAKDEAL

    GGQQLAKEEALGGQQLAKDEAL

    GGAQGLAKDEAL

    GG.QQLAKEEALGG.QQLAKDEALGGAQGLAKDEAL

  • /134© Burkhard Rost

    50

    Profiles profit from relation of “families”

  • /134© Burkhard Rost

    51

    Computationally: motifs

    K Robison et al (1998) JMB 284, 241-54retrieved from http://weblogo.berkeley.edu/examples.html

    Transcription factor binding sites

    Following slides:

    thanks kudos to Theresa Wirth

    http://weblogo.berkeley.edu/examples.html

  • /134© Burkhard Rost

    52Fig. 7: M Zvelebil & JO Baum (2008) Understanding Bioinformatics, Garland

    Sequence motifs

    Any AAOne-letter code

    Two or more possible AA Disallowed AA Repetition range X(n,m) of X

    Matching sequences e.g.

    GLLMSACVVVGILMSAYPPGLLMSAES

    Representation of a sequence with more than one possible amino acid (AA) or nuclear acid (NA) at a single position

    G-[LI]-L-M-S-A-{RK}-X(1,3)

    slide: Theresa Wirth

  • /134© Burkhard Rost

    53S Henikoff & J Henikoff (1992) PNAS 89:10915-9

    Recap: substitution matrix (BLOSUM)

    GENERIC (for all proteins)

  • /134© Burkhard Rost

    54

    Scoring matrix: generic vs. specific

    slide: Theresa Wirth

    Generic

    (here Blossum62)

    Position-specific

    scoring matrix

    PSSM

  • /134© Burkhard Rost

    • Matrix of numbers with scores for each residue or nucleotide at each position

    55

    PSSM - Position specific substitution matrix: concept

    PSSM: Position-specific scoring matrix

    Building PSSM:

    ◦ Absolute frequencies

    ◦ Add pseudo-counts if necessary

    ◦ Relative frequencies

    ◦ Log likelihoods

    Starting point: 123456

    ATGCTA

    ATTGCT

    TCTGAG

    GTTGAG

    CCATCC

    slide: Theresa Wirth

  • /134© Burkhard Rost

    56

    PSSM - Position specific substitution matrix: one solution

    Absolutefrequency

    A:2T:1G:1C:1

    Relativefrequency

    A:0.4T:0.2G:0.2C:0.2

    Σ=1

    1 2 3 4 5 6

    A 0.4 0 0.2 0 0.4 0.2

    T 0.2 0.6 0.6 0.2 0.2 0.2

    G 0.2 0 0.2 0.6 0 0.4

    C 0.2 0.4 0 0.2 0.4 0.2

    1 2 3 4 5 6

    A 0.47 ∞ -0.22 ∞ 0.47 -0.22

    T -0.22 0.86 0.86 -0.22 -0.22 -0.22

    G -0.22 ∞ -0.22 0.86 ∞ 0.47

    C -0.22 0.47 ∞ -0.22 0.47 -0.22

    Logodds

    slide: Theresa Wirth

  • /134© Burkhard Rost

    57Olga Zhaxybayeva http://carrot.mcb.uconn.edu/~olgazh/bioinf2010/class10.html

    PSSM - Position specific substitution matrix: example

    slide: Theresa Wirth

    http://carrot.mcb.uconn.edu/~olgazh/bioinf2010/class10.htmlhttp://carrot.mcb.uconn.edu/~olgazh/bioinf2010/class10.htmlhttp://carrot.mcb.uconn.edu/~olgazh/bioinf2010/class10.html

  • © Burkhard Rost (TUM Munich)

    PSI-Blast

    Position-Specific Iterative-

    Basic Local Alignment Tool

    58

  • /134© Burkhard Rost

    59

    Profile-based comparison

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

  • /134© Burkhard Rost

    60BLOSUM: S Henikoff & J Henikoff (1992) PNAS 89:10915-9

    Alignment scores use generic scoring matrix

    Generic scoring matrix

    (here BLOSUM62)

  • /134© Burkhard Rost

    61

    Idea: replace generic by specific scoring

    BLOSUM: S Henikoff & J Henikoff (1992) PNAS 89:10915-9

    Generic scoring matrix

    (here BLOSUM62)

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

  • © Burkhard Rost (TUM Munich)

    concept of PSI-BLAST

    62

  • /134© Burkhard Rost

    1. fast hashing

    63

    PSI-BLAST in steps

  • /134© Burkhard Rost

    64

    Like BLAST match ‘words’

    TTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEKTTYKLILLLLLLLLLLLLLLLLAWTVEKAFKTFAAAAAAAAAWTVEKAFKTFAAAAA

    TTYKLILTTYKLIL

    WTYDDATKTFWTVEKAFKTF

    AATAEKVFKQYAAWTVEKAFKTFA

    ? ?

    Default “word” size for “seeds” = 3

    #1 seed=3

    #2 extend

  • /134© Burkhard Rost

    1. fast hashing 2. dynamic programming extension between

    matches

    65

    PSI-BLAST in steps

  • /134© Burkhard Rost

    66

    BLAST + Smith-Waterman

    TTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEKTTYKLILLLLLLLLLLLLLLLLAWTVEKAFKTFAAAAAAAAAWTVEKAFKTFAAAAA

    TTYKLILTTYKLIL

    WTYDDATKTFWTVEKAFKTF

    AATAEKVFKQYAAWTVEKAFKTFA

    #1 seed=3

    #2 extend

    dynamic programming to extend

  • /134© Burkhard Rost

    67

    BLAST + Smith-Waterman

    TTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEKTTYKLILLLLLLLLLLLLLLLLAWTVEKAFKTFAAAAAAAAAWTVEKAFKTFAAAAA

    TTYKLILTTYKLIL

    WTYDDATKTFWTVEKAFKTF

    AATAEKVFKQYAAWTVEKAFKTFA

    #1 seed=3

    #2 extend

    dynamic programming to extend

    Why is this fast?

  • /134© Burkhard Rost

    1. fast hashing 2. dynamic programming extension between

    matches 3. compile statistics


    EVAL - Expectation values

    68

    PSI-BLAST in steps

    Hits

  • /134© Burkhard Rost

    1. fast hashing 2. dynamic programming extension between

    matches 3. compile statistics 4. collect all pairs and build profile

    69

    PSI-BLAST in steps

  • /134© Burkhard Rost

    70

    Sequence-profile comparison

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

    YDFHGVGEDDISIKRG

    PSI-BLAST SF Altschul 1997 Nucl Acids Res 25 3389-3402

    PS- position specific

  • /134© Burkhard Rost

    1. fast hashing 2. dynamic programming extension between

    matches 3. compile statistics 4. collect all pairs and build profile 5. ?

    71

    PSI-BLAST in steps

    ??? 1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

    YDFHGVGEDDISIKRG

  • /134© Burkhard Rost

    1. fast hashing 2. dynamic programming extension between

    matches 3. compile statistics 4. collect all pairs and build profile 5. iterate

    72

    PSI-BLAST in steps

  • /134© Burkhard Rost

    73

    Sequence-profile comparison

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

    YDFHGVGEDDISIKRG

    PSI-BLAST SF Altschul 1997 Nucl Acids Res 25 3389-3402

    PSI- position specific iteration

  • /134© Burkhard Rost

    74

    Steps involved for profile-based alignments

    sequence-sequence query database using substitution metric

    results -> profile

  • /134© Burkhard Rost

    75

    Steps involved for profile-based alignments

    sequence-sequence query database using substitution metric

    results -> profile

    profile-sequence query database using PSSM/profile

    results -> update profile

  • /134© Burkhard Rost

    76

    Steps involved for profile-based alignments

    sequence-sequence query database using substitution metric

    results -> profile

    profile-sequence query database using PSSM/profile

    results -> update profile ite

    rate

  • /134© Burkhard Rost

    PSI-BLAST fast, partial dynamic programming
SF Altschul (1997) NAR 25:3389-3402

    ClustalW/ClustalX slow, dynamic programming, for experts
JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80

    77

    Sequence-profile methods

  • /134© Burkhard Rost

    all against all (pairs) 
by dynamic programming 
(varying substitution matrices) build phylogenetic tree

    78

    Clustal (ClustalW, ClustalX)

    A B C DA 90 80 70B 90 80C 90D A B C D

    JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80

  • /134© Burkhard Rost

    PSI-BLAST fast, partial dynamic programming
SF Altschul (1997) NAR 25:3389-3402

    ClustalW/ClustalX slow, dynamic programming, for experts
JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80

    MaxHom relatively slow, dynamic programming, good first guess
C Sander & R Schneider (1991) Proteins 9:56-69

    79

    Sequence-profile methods

  • /134© Burkhard Rost

    PSI-BLAST fast, partial dynamic programming
SF Altschul (1997) NAR 25:3389-3402

    ClustalW/ClustalX slow, dynamic programming, for experts
JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80

    SAM/HMMer slow, need preprocess, HMM (statistics), very accurate
R Hughey & A Krogh (1996) CABIOS 12:95-107 
S Eddy (1998) Bioinformatics 14:755-63

    80

    Sequence-profile methods

  • /134© Burkhard Rost

    81

    Zones

    Day

    light

    Zon

    e

    Twili

    ght Z

    one

    Mid

    nigh

    t Zon

    e

    profile - profile

    sequence - profilesequence - sequence

    sequ

    ence

    sim

    ilar

    ->

    stru

    ctur

    e sim

    ilar

    B Rost (1997) Fold Des 2:S19-24

    B Rost (1999) Protein Eng 12:85-94

  • © Burkhard Rost (TUM Munich)

    another way to do math: HMM -

    Hidden Markov

    Models82

  • /134© Burkhard Rost

    83SR Eddy (1998) Bioinformatics 755-63

    HMM: Hidden Markov Models

    Fig. 1. A toy HMM, modeling sequences of as and bs as two regions of potentially different residue composition. The model is drawn (top) with circles for states and arrows for state transitions. A possible state sequence generated from the model is shown, followed by a possible symbol sequence. The joint probability P(x,π|HMM) of the symbol sequence and the state sequence is a product of all the transition and emission probabilities. Notice that another state sequence (1-2-2) could have generated the same symbol sequence, though probably with a different total probability. This is the distinction between HMMs and a standard Markov model with nothing to hide: in an HMM, the state sequence (e.g. the biologically meaningful alignment) is not uniquely determined by the observed symbol sequence, but must be inferred probabilistically from it.

  • /134© Burkhard Rost

    84

    ProfileHMM: example footballFC Augsburg 2013/14 Bundesliga Fixtures

    Date Status Home Score Away Attendance Competition

    Aug 10 FT FC Augsburg 0-4 Borussia Dort. 30,660 Bundesliga

    Aug 17 FT Werder Bremen 1-0 FC Augsburg 40,112 Bundesliga

    Aug 25 FT FC Augsburg 2-1 VfB Stuttgart 30,030 Bundesliga

    Aug 31 FT Nurnberg 0-1 FC Augsburg 37,239 Bundesliga

    Sep 14 FT FC Augsburg 2-1 SC Freiburg 28,453 Bundesliga

    Sep 21 FT Hannover 96 2-1 FC Augsburg 39,200 Bundesliga

    Sep 27 FT FC Augsburg 2-2 Borussia Mon. 30,352 Bundesliga

    Oct 5 FT Schalke 04 4-1 FC Augsburg 60,731 Bundesliga

    Oct 20 FT FC Augsburg 1-2 VfL Wolfsburg 27,554 Bundesliga

    Oct 26 FT Bayer Leverk 2-1 FC Augsburg 27,811 Bundesliga

    Nov 3 FT FC Augsburg 2-1 Mainz 28,007 Bundesliga

    Nov 9 FT Bayern Munich 3-0 FC Augsburg 71,000 Bundesliga

    slide: Marco Punta

  • /134© Burkhard Rost

    85

    ProfileHMM: example footballFC Augsburg 2013/14 Bundesliga Fixtures

    Date Status Home Score Away Attendance Competition

    Aug 10 FT FC Augsburg 0-4 Borussia Dort. 30,660 Bundesliga

    Aug 17 FT Werder Bremen 1-0 FC Augsburg 40,112 Bundesliga

    Aug 25 FT FC Augsburg 2-1 VfB Stuttgart 30,030 Bundesliga

    Aug 31 FT Nurnberg 0-1 FC Augsburg 37,239 Bundesliga

    Sep 14 FT FC Augsburg 2-1 SC Freiburg 28,453 Bundesliga

    Sep 21 FT Hannover 96 2-1 FC Augsburg 39,200 Bundesliga

    Sep 27 FT FC Augsburg 2-2 Borussia Mon. 30,352 Bundesliga

    Oct 5 FT Schalke 04 4-1 FC Augsburg 60,731 Bundesliga

    Oct 20 FT FC Augsburg 1-2 VfL Wolfsburg 27,554 Bundesliga

    Oct 26 FT Bayer Leverk 2-1 FC Augsburg 27,811 Bundesliga

    Nov 3 FT FC Augsburg 2-1 Mainz 28,007 Bundesliga

    Nov 9 FT Bayern Munich 3-0 FC Augsburg 71,000 Bundesliga

    slide: Marco Punta

    W

    L

    D

    WW

    W

    L

    LLL

    L

    L

  • /134© Burkhard Rost

    86

    ProfileHMM: example football

    slide: Marco Punta

    W D LProbabilistic

    W D L

  • /134© Burkhard Rost

    87

    ProfileHMM: example football

    slide: Marco Punta

    Our model for Augsburg’s Bundesliga results:

    3 states: W, D, L

    S(t)=F(S(t-1))

    States S connected by probabilities

    pij≥0; j

    pij=1Σ

  • /134© Burkhard Rost

    88

    ProfileHMM: example footballFC Augsburg 2013/14 Bundesliga Fixtures

    Date Status Home Score Away Attendance Competition

    Aug 10 FT FC Augsburg 0-4 Borussia Dort. 30,660 Bundesliga

    Aug 17 FT Werder Bremen 1-0 FC Augsburg 40,112 Bundesliga

    Aug 25 FT FC Augsburg 2-1 VfB Stuttgart 30,030 Bundesliga

    Aug 31 FT Nurnberg 0-1 FC Augsburg 37,239 Bundesliga

    Sep 14 FT FC Augsburg 2-1 SC Freiburg 28,453 Bundesliga

    Sep 21 FT Hannover 96 2-1 FC Augsburg 39,200 Bundesliga

    Sep 27 FT FC Augsburg 2-2 Borussia Mon. 30,352 Bundesliga

    Oct 5 FT Schalke 04 4-1 FC Augsburg 60,731 Bundesliga

    Oct 20 FT FC Augsburg 1-2 VfL Wolfsburg 27,554 Bundesliga

    Oct 26 FT Bayer Leverk 2-1 FC Augsburg 27,811 Bundesliga

    Nov 3 FT FC Augsburg 2-1 Mainz 28,007 Bundesliga

    Nov 9 FT Bayern Munich 3-0 FC Augsburg 71,000 Bundesliga

    slide: Marco Punta

    W

    L

    D

    WW

    W

    L

    LLL

    L

    L

  • /134© Burkhard Rost

    89

    ProfileHMM: example footballFC Augsburg 2013/14 Bundesliga Fixtures

    Date Status Home Score Away Attendance Competition

    Aug 10 FT FC Augsburg 0-4 Borussia Dort. 30,660 Bundesliga

    Aug 17 FT Werder Bremen 1-0 FC Augsburg 40,112 Bundesliga

    Aug 25 FT FC Augsburg 2-1 VfB Stuttgart 30,030 Bundesliga

    Aug 31 FT Nurnberg 0-1 FC Augsburg 37,239 Bundesliga

    Sep 14 FT FC Augsburg 2-1 SC Freiburg 28,453 Bundesliga

    Sep 21 FT Hannover 96 2-1 FC Augsburg 39,200 Bundesliga

    Sep 27 FT FC Augsburg 2-2 Borussia Mon. 30,352 Bundesliga

    Oct 5 FT Schalke 04 4-1 FC Augsburg 60,731 Bundesliga

    Oct 20 FT FC Augsburg 1-2 VfL Wolfsburg 27,554 Bundesliga

    Oct 26 FT Bayer Leverk 2-1 FC Augsburg 27,811 Bundesliga

    Nov 3 FT FC Augsburg 2-1 Mainz 28,007 Bundesliga

    Nov 9 FT Bayern Munich 3-0 FC Augsburg 71,000 Bundesliga

    slide: Marco Punta

    W

    L

    D

    WW

    W

    L

    LLL

    L

    L

    H

    A

    A

    H

    H

    H

    A

    H

    A

    H

    A

    A

  • /134© Burkhard Rost

    90

    ProfileHMM: example football

    slide: Marco Punta

    W D L

    H A

  • /134© Burkhard Rost

    91

    ProfileHMM: formalism

    slide: Marco Punta

    HMMs are probabilistic models defined by:

    ! A finite set S of states

    ! A discrete alphabet A of symbols (observed objects)

    ! A probability transition matrix T=(tij) , i,j states

    ! A probability emission matrix E=(eix), i state, x symbol

    … …

    states

    symbols

    tij tij tij tij

    eix eix eix eix

  • /134© Burkhard Rost

    92

    ProfileHMM - intro

    Profile-HMMs are probabilistic models…

    Model -> simulates a system Probabilistic -> produces outcomes based on probabilities

  • /134© Burkhard Rost

    93

    ProfileHMM: example 2: traffic light

  • /134© Burkhard Rost

    94

    ProfileHMM: example 2: traffic light

    R Y G

  • /134© Burkhard Rost

    95

    ProfileHMM: example 2: traffic light

    R Y G

    Deterministic

  • /134© Burkhard Rost

    96

    ProfileHMM: example 3: weather

  • /134© Burkhard Rost

    97

    ProfileHMM: example 3: weather

    S C R

  • /134© Burkhard Rost

    98

    ProfileHMM: example 3: weather

    S C R

    Probabilistic

    *In fact, chaotic, deterministic

    *

  • /134© Burkhard Rost

    99

    ProfileHMM: example 3: weather

    Probabilistic

    pijTransition probability from symbol i to symbol j

    *

    *In fact, chaotic, deterministic

  • /134© Burkhard Rost

    100

    Probabilistic, 1st order Markov model

    Example 2: The weather

    P(Xn+1=x|X1=x1,…,Xn=xn)=P(Xn+1=x|Xn=xn)

  • © Burkhard Rost (TUM Munich)

    HMM for alignment

    101

  • /134© Burkhard Rost

    102M Zvelebil & JO Baum (2008) Understanding Bioinformatics, Garland

    Generic Profile-HMM for alignment

    M1 M2 M3 M4

    I0 I1 I3I2 I4

    D1 D4D3D2

    START END

    ◦ Captures matches, insertions and deletions

    ◦ Transition and emission probabilites

    ◦ Gap penalty handled by variation of transition probabilities

    ◦ Calculation of probability by multiplication of path variables

    slide: Theresa Wirth

  • /134© Burkhard Rost

    consider residue position i 
BEFORE any amino acid is aligned, 


    103

    Entropy in alignment

    0

    1

    2

    bits

    sav

    ed

    1 2CPQNDHWERKTMSGAVIYLF

    3HCYMFPQRDEGNKILVAST

    4CMYFPIVLHQARETKGSDN

    5HQNPDCYREGKMSFTALIV

    6MYCHFPIQRVLDEKNGTAS

    7WHQYPMERDNKFGTSIVLAC

    8HCYMFPQRDEGNKILVAST

    9HCYMFPQRDEGNKILVAST

    10

    MYCHFPI

    QRVLDEKNGTAS

    11

    MFYHIPVTNDLGQSAERK

    12

    MFYIH

    PVGLTNRSAQKDE

    13

    WHQYPMERDNKFGTSIVLAC

    14

    PCHNDMQERKTSVIA

    GLFYW

    15

    MYCHFPI

    QRVLDEKNGTAS

    16

    HQNPDCYREGKMSFTALIV

    17

    WHQYPMERDNKFGTSIVLAC

    18

    FYMPIHVGTNLDSARKEQ

    19

    CMYFPHIVDGTNSALQEKR

    20

    WHNPDCQEGYRSKTAMFVIL

    21

    CMPIVTFDGLQSAKRENYH

    22

    CMYFPIVLHQARETKGSDN

    23

    HCYMFPQRDEGNKILVAST

    24

    MYCHFPI

    QRVLDEKNGTAS

    25

    CMYFPHIVDGTNSALQEKR

    26

    WMCYHFPQI

    RVELTDKNSAG

    27

    MFYHIPVTNDLGQSAERK

    28

    WHQYPMERDNKFGTSIVLAC

    29

    CHPDNYEGQRSKTFAVIML

    30

    CMYFPIVLHQARETKGSDN

    31

    MFYHIPVTNDLGQSAERK

    32

    MFYHIPVTNDLGQSAERK

    33

    WHQYPMERDNKFGTSIVLAC

    34

    CMYFPHIVDGTNSALQEKR

    35

    WHQYPMERDNKFGTSIVLAC

    36

    CPMDQNGEWRTKSAIHVLFY

    37MYCHFPI

    QRVLDEKNGTAS

    © Kevin Karplus UCSC

  • /134© Burkhard Rost

    consider residue position i
BEFORE any amino acid is aligned, 
we expect a particular acid according to some prior or background probability, P0, with entropy H0


    104

    Entropy in alignment

    © Kevin Karplus UCSC

    0

    1

    2

    bits

    sav

    ed

    1 2CPQNDHWERKTMSGAVIYLF

    3HCYMFPQRDEGNKILVAST

    4CMYFPIVLHQARETKGSDN

    5HQNPDCYREGKMSFTALIV

    6MYCHFPIQRVLDEKNGTAS

    7WHQYPMERDNKFGTSIVLAC

    8HCYMFPQRDEGNKILVAST

    9HCYMFPQRDEGNKILVAST

    10

    MYCHFPI

    QRVLDEKNGTAS

    11

    MFYHIPVTNDLGQSAERK

    12

    MFYIH

    PVGLTNRSAQKDE

    13

    WHQYPMERDNKFGTSIVLAC

    14

    PCHNDMQERKTSVIA

    GLFYW

    15

    MYCHFPI

    QRVLDEKNGTAS

    16

    HQNPDCYREGKMSFTALIV

    17

    WHQYPMERDNKFGTSIVLAC

    18

    FYMPIHVGTNLDSARKEQ

    19

    CMYFPHIVDGTNSALQEKR

    20

    WHNPDCQEGYRSKTAMFVIL

    21

    CMPIVTFDGLQSAKRENYH

    22

    CMYFPIVLHQARETKGSDN

    23

    HCYMFPQRDEGNKILVAST

    24

    MYCHFPI

    QRVLDEKNGTAS

    25

    CMYFPHIVDGTNSALQEKR

    26

    WMCYHFPQI

    RVELTDKNSAG

    27

    MFYHIPVTNDLGQSAERK

    28

    WHQYPMERDNKFGTSIVLAC

    29

    CHPDNYEGQRSKTFAVIML

    30

    CMYFPIVLHQARETKGSDN

    31

    MFYHIPVTNDLGQSAERK

    32

    MFYHIPVTNDLGQSAERK

    33

    WHQYPMERDNKFGTSIVLAC

    34

    CMYFPHIVDGTNSALQEKR

    35

    WHQYPMERDNKFGTSIVLAC

    36

    CPMDQNGEWRTKSAIHVLFY

    37MYCHFPI

    QRVLDEKNGTAS

  • /134© Burkhard Rost

    consider residue position i
BEFORE any amino acid is aligned, 
we expect a particular acid according to some prior or background probability, P0, with entropy H0 now consider same column AFTER alignment
posterior probability Pi + priors -> Hi


    105

    Entropy in alignment

    © Kevin Karplus UCSC

    0

    1

    2

    bits

    sav

    ed

    1 2CPQNDHWERKTMSGAVIYLF

    3HCYMFPQRDEGNKILVAST

    4CMYFPIVLHQARETKGSDN

    5HQNPDCYREGKMSFTALIV

    6MYCHFPIQRVLDEKNGTAS

    7WHQYPMERDNKFGTSIVLAC

    8HCYMFPQRDEGNKILVAST

    9HCYMFPQRDEGNKILVAST

    10

    MYCHFPI

    QRVLDEKNGTAS

    11

    MFYHIPVTNDLGQSAERK

    12

    MFYIH

    PVGLTNRSAQKDE

    13

    WHQYPMERDNKFGTSIVLAC

    14

    PCHNDMQERKTSVIA

    GLFYW

    15

    MYCHFPI

    QRVLDEKNGTAS

    16

    HQNPDCYREGKMSFTALIV

    17

    WHQYPMERDNKFGTSIVLAC

    18

    FYMPIHVGTNLDSARKEQ

    19

    CMYFPHIVDGTNSALQEKR

    20

    WHNPDCQEGYRSKTAMFVIL

    21

    CMPIVTFDGLQSAKRENYH

    22

    CMYFPIVLHQARETKGSDN

    23

    HCYMFPQRDEGNKILVAST

    24

    MYCHFPI

    QRVLDEKNGTAS

    25

    CMYFPHIVDGTNSALQEKR

    26

    WMCYHFPQI

    RVELTDKNSAG

    27

    MFYHIPVTNDLGQSAERK

    28

    WHQYPMERDNKFGTSIVLAC

    29

    CHPDNYEGQRSKTFAVIML

    30

    CMYFPIVLHQARETKGSDN

    31

    MFYHIPVTNDLGQSAERK

    32

    MFYHIPVTNDLGQSAERK

    33

    WHQYPMERDNKFGTSIVLAC

    34

    CMYFPHIVDGTNSALQEKR

    35

    WHQYPMERDNKFGTSIVLAC

    36

    CPMDQNGEWRTKSAIHVLFY

    37MYCHFPI

    QRVLDEKNGTAS

    0

    1

    2

    bits

    sav

    ed

    1 2CPQNDHWERKTMSGAVIYLF

    3HCYMFPQRDEGNKILVAST

    4CMYFPIVLHQARETKGSDN

    5HQNPDCYREGKMSFTALIV

    6MYCHFPIQRVLDEKNGTAS

    7WHQYPMERDNKFGTSIVLAC

    8HCYMFPQRDEGNKILVAST

    9HCYMFPQRDEGNKILVAST

    10

    MYCHFPI

    QRVLDEKNGTAS

    11

    MFYHIPVTNDLGQSAERK

    12

    MFYIH

    PVGLTNRSAQKDE

    13

    WHQYPMERDNKFGTSIVLAC

    14

    PCHNDMQERKTSVIA

    GLFYW

    15

    MYCHFPI

    QRVLDEKNGTAS

    16

    HQNPDCYREGKMSFTALIV

    17

    WHQYPMERDNKFGTSIVLAC

    18

    FYMPIHVGTNLDSARKEQ

    19

    CMYFPHIVDGTNSALQEKR

    20

    WHNPDCQEGYRSKTAMFVIL

    21

    CMPIVTFDGLQSAKRENYH

    22

    CMYFPIVLHQARETKGSDN

    23

    HCYMFPQRDEGNKILVAST

    24

    MYCHFPI

    QRVLDEKNGTAS

    25

    CMYFPHIVDGTNSALQEKR

    26

    WMCYHFPQI

    RVELTDKNSAG

    27

    MFYHIPVTNDLGQSAERK

    28

    WHQYPMERDNKFGTSIVLAC

    29

    CHPDNYEGQRSKTFAVIML

    30CMYFPIVLHQARETKGSDN

    31MFYHIPVTNDLGQSAERK

    32MFYHIPVTNDLGQSAERK

    33WHQYPMERDNKFGTSIVLAC

    34

    CMYFPHIVDGTNSALQEKR

    35

    WHQYPMERDNKFGTSIVLAC

    36

    CPMDQNGEWRTKSAIHVLFY

    37

    MYCHFPI

    QRVLDEKNGTAS

  • /134© Burkhard Rost

    consider residue position i
BEFORE any amino acid is aligned, 
we expect a particular acid according to some prior or background probability, P0, with entropy H0 now consider same column AFTER alignment
posterior probability Pi + priors -> Hi if position i conserved: Hi =?


    106

    Entropy in alignment

    © Kevin Karplus UCSC

    0

    1

    2

    bits

    sav

    ed

    1 2CPQNDHWERKTMSGAVIYLF

    3HCYMFPQRDEGNKILVAST

    4CMYFPIVLHQARETKGSDN

    5HQNPDCYREGKMSFTALIV

    6MYCHFPIQRVLDEKNGTAS

    7WHQYPMERDNKFGTSIVLAC

    8HCYMFPQRDEGNKILVAST

    9HCYMFPQRDEGNKILVAST

    10

    MYCHFPI

    QRVLDEKNGTAS

    11

    MFYHIPVTNDLGQSAERK

    12

    MFYIH

    PVGLTNRSAQKDE

    13

    WHQYPMERDNKFGTSIVLAC

    14

    PCHNDMQERKTSVIA

    GLFYW

    15

    MYCHFPI

    QRVLDEKNGTAS

    16

    HQNPDCYREGKMSFTALIV

    17

    WHQYPMERDNKFGTSIVLAC

    18

    FYMPIHVGTNLDSARKEQ

    19

    CMYFPHIVDGTNSALQEKR

    20

    WHNPDCQEGYRSKTAMFVIL

    21

    CMPIVTFDGLQSAKRENYH

    22

    CMYFPIVLHQARETKGSDN

    23

    HCYMFPQRDEGNKILVAST

    24

    MYCHFPI

    QRVLDEKNGTAS

    25

    CMYFPHIVDGTNSALQEKR

    26

    WMCYHFPQI

    RVELTDKNSAG

    27

    MFYHIPVTNDLGQSAERK

    28

    WHQYPMERDNKFGTSIVLAC

    29

    CHPDNYEGQRSKTFAVIML

    30

    CMYFPIVLHQARETKGSDN

    31

    MFYHIPVTNDLGQSAERK

    32

    MFYHIPVTNDLGQSAERK

    33

    WHQYPMERDNKFGTSIVLAC

    34

    CMYFPHIVDGTNSALQEKR

    35

    WHQYPMERDNKFGTSIVLAC

    36

    CPMDQNGEWRTKSAIHVLFY

    37MYCHFPI

    QRVLDEKNGTAS

    0

    1

    2

    bits

    sav

    ed

    1 2CPQNDHWERKTMSGAVIYLF

    3HCYMFPQRDEGNKILVAST

    4CMYFPIVLHQARETKGSDN

    5HQNPDCYREGKMSFTALIV

    6MYCHFPIQRVLDEKNGTAS

    7WHQYPMERDNKFGTSIVLAC

    8HCYMFPQRDEGNKILVAST

    9HCYMFPQRDEGNKILVAST

    10

    MYCHFPI

    QRVLDEKNGTAS

    11

    MFYHIPVTNDLGQSAERK

    12

    MFYIH

    PVGLTNRSAQKDE

    13

    WHQYPMERDNKFGTSIVLAC

    14

    PCHNDMQERKTSVIA

    GLFYW

    15

    MYCHFPI

    QRVLDEKNGTAS

    16

    HQNPDCYREGKMSFTALIV

    17

    WHQYPMERDNKFGTSIVLAC

    18

    FYMPIHVGTNLDSARKEQ

    19

    CMYFPHIVDGTNSALQEKR

    20

    WHNPDCQEGYRSKTAMFVIL

    21

    CMPIVTFDGLQSAKRENYH

    22

    CMYFPIVLHQARETKGSDN

    23

    HCYMFPQRDEGNKILVAST

    24

    MYCHFPI

    QRVLDEKNGTAS

    25

    CMYFPHIVDGTNSALQEKR

    26

    WMCYHFPQI

    RVELTDKNSAG

    27

    MFYHIPVTNDLGQSAERK

    28

    WHQYPMERDNKFGTSIVLAC

    29

    CHPDNYEGQRSKTFAVIML

    30CMYFPIVLHQARETKGSDN

    31MFYHIPVTNDLGQSAERK

    32MFYHIPVTNDLGQSAERK

    33WHQYPMERDNKFGTSIVLAC

    34

    CMYFPHIVDGTNSALQEKR

    35

    WHQYPMERDNKFGTSIVLAC

    36

    CPMDQNGEWRTKSAIHVLFY

    37

    MYCHFPI

    QRVLDEKNGTAS

  • /134© Burkhard Rost

    consider residue position i
BEFORE any amino acid is aligned, 
we expect a particular acid according to some prior or background probability, P0, with entropy H0 now consider same column AFTER alignment
posterior probability Pi + priors -> Hi if position i conserved: Hi -> 0
if position i completely varied: Hi=?


    107

    Entropy in alignment

    © Kevin Karplus UCSC

    0

    1

    2

    bits

    sav

    ed

    1 2CPQNDHWERKTMSGAVIYLF

    3HCYMFPQRDEGNKILVAST

    4CMYFPIVLHQARETKGSDN

    5HQNPDCYREGKMSFTALIV

    6MYCHFPIQRVLDEKNGTAS

    7WHQYPMERDNKFGTSIVLAC

    8HCYMFPQRDEGNKILVAST

    9HCYMFPQRDEGNKILVAST

    10

    MYCHFPI

    QRVLDEKNGTAS

    11

    MFYHIPVTNDLGQSAERK

    12

    MFYIH

    PVGLTNRSAQKDE

    13

    WHQYPMERDNKFGTSIVLAC

    14

    PCHNDMQERKTSVIA

    GLFYW

    15

    MYCHFPI

    QRVLDEKNGTAS

    16

    HQNPDCYREGKMSFTALIV

    17

    WHQYPMERDNKFGTSIVLAC

    18

    FYMPIHVGTNLDSARKEQ

    19

    CMYFPHIVDGTNSALQEKR

    20

    WHNPDCQEGYRSKTAMFVIL

    21

    CMPIVTFDGLQSAKRENYH

    22

    CMYFPIVLHQARETKGSDN

    23

    HCYMFPQRDEGNKILVAST

    24

    MYCHFPI

    QRVLDEKNGTAS

    25

    CMYFPHIVDGTNSALQEKR

    26

    WMCYHFPQI

    RVELTDKNSAG

    27

    MFYHIPVTNDLGQSAERK

    28

    WHQYPMERDNKFGTSIVLAC

    29

    CHPDNYEGQRSKTFAVIML

    30

    CMYFPIVLHQARETKGSDN

    31

    MFYHIPVTNDLGQSAERK

    32

    MFYHIPVTNDLGQSAERK

    33

    WHQYPMERDNKFGTSIVLAC

    34

    CMYFPHIVDGTNSALQEKR

    35

    WHQYPMERDNKFGTSIVLAC

    36

    CPMDQNGEWRTKSAIHVLFY

    37MYCHFPI

    QRVLDEKNGTAS

    0

    1

    2

    bits

    sav

    ed

    1 2CPQNDHWERKTMSGAVIYLF

    3HCYMFPQRDEGNKILVAST

    4CMYFPIVLHQARETKGSDN

    5HQNPDCYREGKMSFTALIV

    6MYCHFPIQRVLDEKNGTAS

    7WHQYPMERDNKFGTSIVLAC

    8HCYMFPQRDEGNKILVAST

    9HCYMFPQRDEGNKILVAST

    10

    MYCHFPI

    QRVLDEKNGTAS

    11

    MFYHIPVTNDLGQSAERK

    12

    MFYIH

    PVGLTNRSAQKDE

    13

    WHQYPMERDNKFGTSIVLAC

    14

    PCHNDMQERKTSVIA

    GLFYW

    15

    MYCHFPI

    QRVLDEKNGTAS

    16

    HQNPDCYREGKMSFTALIV

    17

    WHQYPMERDNKFGTSIVLAC

    18

    FYMPIHVGTNLDSARKEQ

    19

    CMYFPHIVDGTNSALQEKR

    20

    WHNPDCQEGYRSKTAMFVIL

    21

    CMPIVTFDGLQSAKRENYH

    22

    CMYFPIVLHQARETKGSDN

    23

    HCYMFPQRDEGNKILVAST

    24

    MYCHFPI

    QRVLDEKNGTAS

    25

    CMYFPHIVDGTNSALQEKR

    26

    WMCYHFPQI

    RVELTDKNSAG

    27

    MFYHIPVTNDLGQSAERK

    28

    WHQYPMERDNKFGTSIVLAC

    29

    CHPDNYEGQRSKTFAVIML

    30CMYFPIVLHQARETKGSDN

    31MFYHIPVTNDLGQSAERK

    32MFYHIPVTNDLGQSAERK

    33WHQYPMERDNKFGTSIVLAC

    34

    CMYFPHIVDGTNSALQEKR

    35

    WHQYPMERDNKFGTSIVLAC

    36

    CPMDQNGEWRTKSAIHVLFY

    37

    MYCHFPI

    QRVLDEKNGTAS

  • /134© Burkhard Rost

    consider residue position i
BEFORE any amino acid is aligned, 
we expect a particular acid according to some prior or background probability, P0, with entropy H0 now consider same column AFTER alignment
posterior probability Pi + priors -> Hi if position i conserved: Hi -> 0
if position i completely varied: Hi -> H0


    108

    Entropy in alignment

    © Kevin Karplus UCSC

    0

    1

    2

    bits

    sav

    ed

    1 2CPQNDHWERKTMSGAVIYLF

    3HCYMFPQRDEGNKILVAST

    4CMYFPIVLHQARETKGSDN

    5HQNPDCYREGKMSFTALIV

    6MYCHFPIQRVLDEKNGTAS

    7WHQYPMERDNKFGTSIVLAC

    8HCYMFPQRDEGNKILVAST

    9HCYMFPQRDEGNKILVAST

    10

    MYCHFPI

    QRVLDEKNGTAS

    11

    MFYHIPVTNDLGQSAERK

    12

    MFYIH

    PVGLTNRSAQKDE

    13

    WHQYPMERDNKFGTSIVLAC

    14

    PCHNDMQERKTSVIA

    GLFYW

    15

    MYCHFPI

    QRVLDEKNGTAS

    16

    HQNPDCYREGKMSFTALIV

    17

    WHQYPMERDNKFGTSIVLAC

    18

    FYMPIHVGTNLDSARKEQ

    19

    CMYFPHIVDGTNSALQEKR

    20

    WHNPDCQEGYRSKTAMFVIL

    21

    CMPIVTFDGLQSAKRENYH

    22

    CMYFPIVLHQARETKGSDN

    23

    HCYMFPQRDEGNKILVAST

    24

    MYCHFPI

    QRVLDEKNGTAS

    25

    CMYFPHIVDGTNSALQEKR

    26

    WMCYHFPQI

    RVELTDKNSAG

    27

    MFYHIPVTNDLGQSAERK

    28

    WHQYPMERDNKFGTSIVLAC

    29

    CHPDNYEGQRSKTFAVIML

    30

    CMYFPIVLHQARETKGSDN

    31

    MFYHIPVTNDLGQSAERK

    32

    MFYHIPVTNDLGQSAERK

    33

    WHQYPMERDNKFGTSIVLAC

    34

    CMYFPHIVDGTNSALQEKR

    35

    WHQYPMERDNKFGTSIVLAC

    36

    CPMDQNGEWRTKSAIHVLFY

    37MYCHFPI

    QRVLDEKNGTAS

    0

    1

    2

    bits

    sav

    ed

    1 2CPQNDHWERKTMSGAVIYLF

    3HCYMFPQRDEGNKILVAST

    4CMYFPIVLHQARETKGSDN

    5HQNPDCYREGKMSFTALIV

    6MYCHFPIQRVLDEKNGTAS

    7WHQYPMERDNKFGTSIVLAC

    8HCYMFPQRDEGNKILVAST

    9HCYMFPQRDEGNKILVAST

    10

    MYCHFPI

    QRVLDEKNGTAS

    11

    MFYHIPVTNDLGQSAERK

    12

    MFYIH

    PVGLTNRSAQKDE

    13

    WHQYPMERDNKFGTSIVLAC

    14

    PCHNDMQERKTSVIA

    GLFYW

    15

    MYCHFPI

    QRVLDEKNGTAS

    16

    HQNPDCYREGKMSFTALIV

    17

    WHQYPMERDNKFGTSIVLAC

    18

    FYMPIHVGTNLDSARKEQ

    19

    CMYFPHIVDGTNSALQEKR

    20

    WHNPDCQEGYRSKTAMFVIL

    21

    CMPIVTFDGLQSAKRENYH

    22

    CMYFPIVLHQARETKGSDN

    23

    HCYMFPQRDEGNKILVAST

    24

    MYCHFPI

    QRVLDEKNGTAS

    25

    CMYFPHIVDGTNSALQEKR

    26

    WMCYHFPQI

    RVELTDKNSAG

    27

    MFYHIPVTNDLGQSAERK

    28

    WHQYPMERDNKFGTSIVLAC

    29

    CHPDNYEGQRSKTFAVIML

    30CMYFPIVLHQARETKGSDN

    31MFYHIPVTNDLGQSAERK

    32MFYHIPVTNDLGQSAERK

    33WHQYPMERDNKFGTSIVLAC

    34

    CMYFPHIVDGTNSALQEKR

    35

    WHQYPMERDNKFGTSIVLAC

    36

    CPMDQNGEWRTKSAIHVLFY

    37

    MYCHFPI

    QRVLDEKNGTAS

  • /134© Burkhard Rost

    consider residue position i
BEFORE any amino acid is aligned, 
we expect a particular acid according to some prior or background probability, P0, with entropy H0 now consider same column AFTER alignment
posterior probability Pi + priors -> Hi if conserved: Hi -> 0; if varied: Hi -> H0 Hi-H0 reflects the “bits saved” by the alignment

    109

    Entropy in alignment

    © Kevin Karplus UCSC

    0

    1

    2

    bits

    sav

    ed

    1 2CPQNDHWERKTMSGAVIYLF

    3HCYMFPQRDEGNKILVAST

    4CMYFPIVLHQARETKGSDN

    5HQNPDCYREGKMSFTALIV

    6MYCHFPIQRVLDEKNGTAS

    7WHQYPMERDNKFGTSIVLAC

    8HCYMFPQRDEGNKILVAST

    9HCYMFPQRDEGNKILVAST

    10

    MYCHFPI

    QRVLDEKNGTAS

    11

    MFYHIPVTNDLGQSAERK

    12

    MFYIH

    PVGLTNRSAQKDE

    13

    WHQYPMERDNKFGTSIVLAC

    14

    PCHNDMQERKTSVIA

    GLFYW

    15

    MYCHFPI

    QRVLDEKNGTAS

    16

    HQNPDCYREGKMSFTALIV

    17

    WHQYPMERDNKFGTSIVLAC

    18

    FYMPIHVGTNLDSARKEQ

    19

    CMYFPHIVDGTNSALQEKR

    20

    WHNPDCQEGYRSKTAMFVIL

    21

    CMPIVTFDGLQSAKRENYH

    22

    CMYFPIVLHQARETKGSDN

    23

    HCYMFPQRDEGNKILVAST

    24

    MYCHFPI

    QRVLDEKNGTAS

    25

    CMYFPHIVDGTNSALQEKR

    26

    WMCYHFPQI

    RVELTDKNSAG

    27

    MFYHIPVTNDLGQSAERK

    28

    WHQYPMERDNKFGTSIVLAC

    29

    CHPDNYEGQRSKTFAVIML

    30

    CMYFPIVLHQARETKGSDN

    31

    MFYHIPVTNDLGQSAERK

    32

    MFYHIPVTNDLGQSAERK

    33

    WHQYPMERDNKFGTSIVLAC

    34

    CMYFPHIVDGTNSALQEKR

    35

    WHQYPMERDNKFGTSIVLAC

    36

    CPMDQNGEWRTKSAIHVLFY

    37MYCHFPI

    QRVLDEKNGTAS

    0

    1

    2

    bits

    sav

    ed

    1 2CPQNDHWERKTMSGAVIYLF

    3HCYMFPQRDEGNKILVAST

    4CMYFPIVLHQARETKGSDN

    5HQNPDCYREGKMSFTALIV

    6MYCHFPIQRVLDEKNGTAS

    7WHQYPMERDNKFGTSIVLAC

    8HCYMFPQRDEGNKILVAST

    9HCYMFPQRDEGNKILVAST

    10

    MYCHFPI

    QRVLDEKNGTAS

    11

    MFYHIPVTNDLGQSAERK

    12

    MFYIH

    PVGLTNRSAQKDE

    13

    WHQYPMERDNKFGTSIVLAC

    14

    PCHNDMQERKTSVIA

    GLFYW

    15

    MYCHFPI

    QRVLDEKNGTAS

    16

    HQNPDCYREGKMSFTALIV

    17

    WHQYPMERDNKFGTSIVLAC

    18

    FYMPIHVGTNLDSARKEQ

    19

    CMYFPHIVDGTNSALQEKR

    20

    WHNPDCQEGYRSKTAMFVIL

    21

    CMPIVTFDGLQSAKRENYH

    22

    CMYFPIVLHQARETKGSDN

    23

    HCYMFPQRDEGNKILVAST

    24

    MYCHFPI

    QRVLDEKNGTAS

    25

    CMYFPHIVDGTNSALQEKR

    26

    WMCYHFPQI

    RVELTDKNSAG

    27

    MFYHIPVTNDLGQSAERK

    28

    WHQYPMERDNKFGTSIVLAC

    29

    CHPDNYEGQRSKTFAVIML

    30CMYFPIVLHQARETKGSDN

    31MFYHIPVTNDLGQSAERK

    32MFYHIPVTNDLGQSAERK

    33WHQYPMERDNKFGTSIVLAC

    34

    CMYFPHIVDGTNSALQEKR

    35

    WHQYPMERDNKFGTSIVLAC

    36

    CPMDQNGEWRTKSAIHVLFY

    37

    MYCHFPI

    QRVLDEKNGTAS

  • /134© Burkhard Rost

    • few members / little divergence 
entropy dominated by priors
-> the background signal dominates

    110

    0

    1

    2

    bits

    sav

    ed

    1 2CPQNDHWERKTMSGAVIYLF

    3HCYMFPQRDEGNKILVAST

    4CMYFPIVLHQARETKGSDN

    5HQNPDCYREGKMSFTALIV

    6MYCHFPIQRVLDEKNGTAS

    7WHQYPMERDNKFGTSIVLAC

    8HCYMFPQRDEGNKILVAST

    9HCYMFPQRDEGNKILVAST

    10

    MYCHFPI

    QRVLDEKNGTAS

    11

    MFYHIPVTNDLGQSAERK

    12

    MFYIH

    PVGLTNRSAQKDE

    13

    WHQYPMERDNKFGTSIVLAC

    14

    PCHNDMQERKTSVIA

    GLFYW

    15

    MYCHFPI

    QRVLDEKNGTAS

    16

    HQNPDCYREGKMSFTALIV

    17

    WHQYPMERDNKFGTSIVLAC

    18

    FYMPIHVGTNLDSARKEQ

    19

    CMYFPHIVDGTNSALQEKR

    20

    WHNPDCQEGYRSKTAMFVIL

    21

    CMPIVTFDGLQSAKRENYH

    22

    CMYFPIVLHQARETKGSDN

    23

    HCYMFPQRDEGNKILVAST

    24

    MYCHFPI

    QRVLDEKNGTAS

    25

    CMYFPHIVDGTNSALQEKR

    26

    WMCYHFPQI

    RVELTDKNSAG

    27

    MFYHIPVTNDLGQSAERK

    28

    WHQYPMERDNKFGTSIVLAC

    29

    CHPDNYEGQRSKTFAVIML

    30

    CMYFPIVLHQARETKGSDN

    31

    MFYHIPVTNDLGQSAERK

    32

    MFYHIPVTNDLGQSAERK

    33

    WHQYPMERDNKFGTSIVLAC

    34CMYFPHIVDGTNSALQEKR

    35WHQYPMERDNKFGTSIVLAC

    36CPMDQNGEWRTKSAIHVLFY

    37MYCHFPI

    QRVLDEKNGTAS

    Alignment entropy for small families

    0

    1

    2

    bits

    sav

    ed

    1MHYFQPIN

    RGDKELVAST

    2MFYHIVPLNGTQDSAERK

    3CMYHFIQNRVTLDGKESAP

    4CMPQDNGHWERKTSIAVLFY

    5MFYHI

    PVNGDTLQSAEKR

    6WMCFYIHVQLPRTKENDSAG

    7MIPFVQTYGNLDARSEKH

    8MFYHI

    PVNGDTLQSAEKR

    9CDQNPHGEKRMWSTAVIYLF

    10

    MHYFQPINRGDKELVAST

    11

    MFYHIV

    PLNGTQDSAERK

    12

    MFYHIVLPGNTRQSAKDE

    13

    FIYHVPLQRATKESGDN

    14

    WHCNDQGPYRMKESFTALIV

    15

    MFYHI

    PVNGDTLQSAEKR

    16

    WHCNDQGPREKYSMTFAVLI

    17

    CWHNDQGPERSKYTMAFVIL

    18

    MFYHIVLPGNTRQSAKDE

    19

    MFHYIQVLPRNKDEGTAS

    20

    CPMQHDNKEGRTSAIVLYFW

    21

    CDQNPHGEKRMWSTAVIYLF

    22

    HCMYFNQRPDIKTELSGVA

    23

    MFYHIV

    PLNGTQDSAERK

    24

    FIYHVPLQRATKESGDN

    25

    WHCNDQGPREKYSMTFAVLI

    26

    MFYHIVLPGNTRQSAKDE

    27

    FIYHVPLQRATKESGDN

    28

    CMYHFI

    QNRVTLDGKESAP

    29

    CMPQDNGHWERKTSIAVLFY

    30CWHNDQGPERSKYTMAFVIL

    31MFIYHVLPRQTAKGSNED

    32MHYFQPINRGDKELVAST

    33

    MFYHIV

    PLNGTQDSAERK

    34

    WMCFYIHVQLPRTKENDSAG

    35

    CWHNDQGPERSKYTMAFVIL

    36

    MFYHIVLPGNTRQSAKDE

    37

    FIYHVPLQRATKESGDN

    38

    CWHNDQGPERSKYTMAFVIL

    39

    CHNDPGQYREKSTFAIVLM

    40

    MFYHIV

    PLNGTQDSAERK

    41

    FIYHVPLQRATKESGDN

    42

    MHYFQPINRGDKELVAST

    0

    1

    2

    3

    4

    bits

    sav

    ed

    1HVLPTNGQASDREK

    2LDPATNSEQGRK 3IYVHPLTANGDSQEKR

    4YIVPHDTLSANGEQKR

    5PGK 6FMYIPVHDGTNLSAEQKR

    7GYEFKRQSIAVLT

    8SIVTAL 9PQGMNEKRSHWTAVLIYF

    10

    LHQRAGPNKDEST

    11

    PGQNTADERSK

    12

    ASE

    13

    FPYGNIDRSTLVKEAQ

    14

    QEYFSKARIVTL

    15

    QEVAL

    16

    QFKSRTIVLAE

    17

    CTYAFVMIL

    18

    HIPGVLTSNADQRKE

    19

    HPLGTQDNSAERK

    20

    AVLE

    21

    RTSHAMVIWLYF

    22

    AL

    23

    REAVKFL

    24

    PRAGQKEHDTSN

    25

    AIVL

    26

    LDTRASEK

    27 28

    EANSQKRP

    29

    MPDQGHENSI

    RATVFLKY

    30

    NGQEKRYSMTFAVIPL

    31

    HPQRKAEGNDTS

    32

    SEIAVLR

    33

    RSAE

    34

    PMIGNLVSTADQKRE

    35

    DPHGMNYFQESTAVILKR

    36

    KRTVAILE

    37

    GVNTDLRSKAQE

    38

    CKSYTAFMVIL

    39

    HYFNDQCPRMKEILTVGSA

    40

    TDNVLSQRAEK

    41

    VEKASL

    42

    NPGQEKCRYMSFAIVTL

    © Kevin Karplus UCSC

  • /134© Burkhard Rost

    111

    SAM-T98: Build alignment

    © Kevin Karplus UCSC

    Reestimate the alignment with the new homologs

    Use the model to search for additional homologs

    Build a model from the sequence or alignment

    SAM-T98 Alignment Building

    (Iterations 1 - 3)

    Start: a single sequence

    End: a SAM-T98 alignment(Iteration 4)

  • /134© Burkhard Rost

    PSI-BLAST fast, partial dynamic programming
SF Altschul (1997) NAR 25:3389-3402

    ClustalW/ClustalX slow, dynamic programming, for experts
JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80

    SAM/HMMer slow, need preprocess, HMM (statistics), very accurate
R Hughey & A Krogh (1996) CABIOS 12:95-107/ S Eddy (1998) Bioinformatics 14:755-63

    T-Coffee much slower, requires preprocessing, Genetic Algorithm
Cedric Notredame, DG Higgins, Jaap Heringa (2000) JMB 302:205-17

    112

    Sequence-profile methods

  • © Burkhard Rost (TUM Munich)

    Genetic Algorithm for alignment

    113

  • /134© Burkhard Rost

    114

    Independence assumption

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

    dynamic programming

    (smith-waterman) PSI-BLAST sequence-sequence

    or

    sequence-profile

    SAM

    HMMer

  • /134© Burkhard Rost

    115

    Independence assumption

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

    dynamic programming

    (smith-waterman) PSI-BLAST sequence-sequence

    or

    sequence-profile

    SAM

    HMMer

    ALL assume that alignment at position i independent of alignment at position j

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

    E E R E E D

    K K K E E R

  • © Burkhard Rost /134

    Genetic algorithm does not make the

    independence assumption

    116

  • © Burkhard Rost /134

    Genetic algorithm operates on segments

    117

  • /134© Burkhard Rost

    118

    Genetic algorithm - concept

    The 2006 NASA ST5 spacecraft antenna. This complicated shape was found through a genetic algorithm optimizing the radiation

    pattern. It is known as an Evolved antenna.

    © Wikipedia

    http://en.wikipedia.org/wiki/Space_Technology_5http://en.wikipedia.org/wiki/Evolved_antenna

  • /134© Burkhard Rost

    119

    Genetic algorithm - concept

    The 2006 NASA ST5 spacecraft antenna.

    This complicated shape was found through a genetic algorithm optimizing the

    radiation pattern. It is known as an Evolved

    antenna.

    © W

    ikipe

    dia

    Initialize population

    Compute fitness

    STOP

    ? • roulette parents

    • cross-over to children

    • mutate children

    • compute fitness

    START

    END

    Crossover

    parents

    child

    mutation

    http://en.wikipedia.org/wiki/Space_Technology_5http://en.wikipedia.org/wiki/Evolved_antennahttp://en.wikipedia.org/wiki/Evolved_antenna

  • /134© Burkhard Rost

    Begin with “library” of local and global pairwise alignments

    120© Cedric Notredame CRG

    T-Coffee Genetic algorithm (GA)

  • /134© Burkhard Rost

    Local Alignment Global Alignment

    Extension

    Multiple Sequence Alignment

    121

    T-Coffee: Mix local and global alignment

    © Cedric Notredame, CRG Barcelona

  • /134© Burkhard Rost

    Local Alignment Global Alignment

    Multiple Sequence Alignment

    Multiple Alignment

    StructuralSpecialist

    122

    T-Coffee: Use more information

    © Cedric Notredame, CRG Barcelona

  • /134© Burkhard Rost

    PSI-BLAST fast, partial dynamic programming
SF Altschul (1997) NAR 25:3389-3402

    ClustalW/ClustalX slow, dynamic programming, for experts
JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80

    MaxHom relatively slow, dynamic programming, good first guess
C Sander & R Schneider (1991) Proteins 9:56-69

    T-Coffee much slower, requires preprocessing, Genetic Algorithm
Cedric Notredame, DG Higgins, Jaap Heringa (2000) JMB 302:205-17

    SSEARCH/PSI-Search SSEARCH - similar to MaxHom with SW only

    PSI-Search: iterated SSEARCH
W Liu, H McWilliam, M Goujon, A Cowley, R Lopez, WR Pearson (2013) Bioinformatics 28:1650-1

    123

    Sequence-profile methods

  • /134© Burkhard Rost

    124

    Sequence-profile comparison

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

    YDFHGVGEDDISIKRG

  • /134© Burkhard Rost

    125

    Zones

    Day

    light

    Zon

    e

    Twili

    ght Z

    one

    Mid

    nigh

    t Zon

    e

    profile - profile

    sequence - profilesequence - sequence

    sequ

    ence

    sim

    ilar

    ->

    stru

    ctur

    e sim

    ilar

    B Rost (1997) Fold Des 2:S19-24

    B Rost (1999) Protein Eng 12:85-94

  • /134© Burkhard Rost

    pairwise multiple sequence-profile ?

    126

    evolution of alignment methods

  • © Burkhard Rost (TUM Munich)

    Profile-profile alignments

    127

  • /134© Burkhard Rost

    128

    Profile-profile comparison

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

    1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF