Swiss Institute of Bioinformatics Institut Suisse de Bioinformatique LF-2002.10 Multiple alignments, PATTERNS, PSI-BLAST

Multiple alignments, PATTERNS, PSI-BLAST

Page 1: Multiple alignments,  PATTERNS, PSI-BLAST

Multiple alignments, PATTERNS, PSI-BLAST

Page 2: Multiple alignments,  PATTERNS, PSI-BLAST

Overview

Multiple alignments: how-to, goal, problems, use

Patterns: PROSITE database, syntax, use

PSI-BLAST: BLAST, matrices, use

[ Profiles/HMMs ] …

Page 3: Multiple alignments,  PATTERNS, PSI-BLAST

What is a multiple sequence alignment?

What can it do for me? How can I produce one of these? How can I use it?

chite    ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat    --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr    KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse    -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
         ***. ::: .: .. . : . . * . *: *

chite    AATAKQNYIRALQEYERNGG-
wheat    ANKLKGEYNKAIAAYNKGESA
trybr    AEKDKERYKREM---------
mouse    AKDDRIRYDNEMKSWEEQMAE
         * : .* . :
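The line of symbols under each block marks how conserved each column is. As a minimal sketch (not the alignment program's own code, which also assigns ':' and '.' from residue groups), the fully conserved '*' columns can be recomputed from the aligned rows. The helper name `conserved_columns` is hypothetical.

```python
def conserved_columns(rows):
    """Return one character per column: '*' where all rows share the same residue."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    marks = []
    for column in zip(*rows):
        # A column made only of gaps is not counted as conserved.
        identical = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if identical else " ")
    return "".join(marks)


# First block of the alignment shown on this slide.
block = [
    "---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD",  # chite
    "--DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE",  # wheat
    "KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP",  # trybr
    "-----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP",  # mouse
]
marks = conserved_columns(block)
print(marks)
```

For this block the sketch marks six columns, the same number of '*' symbols as in the conservation line of the slide.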

Page 4: Multiple alignments,  PATTERNS, PSI-BLAST

How can I use a multiple alignment?

chite    ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat    --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr    KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
unknown  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
         ***. ::: .: .. . : . . * . *: *

chite    AATAKQNYIRALQEYERNGG-
wheat    ANKLKGEYNKAIAAYNKGESA
trybr    AEKDKERYKREM---------
unknown  AKDDRIRYDNEMKSWEEQMAE
         * : .* . :

Extrapolation

SwissProt

Unknown Sequence

Homology?

Page 5: Multiple alignments,  PATTERNS, PSI-BLAST

How can I use a multiple alignment?

SwissProt

Unknown Sequence

Match?

chite    ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat    --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr    KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse    -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
         ***. ::: .: .. . : . . * . *: *

chite    AATAKQNYIRALQEYERNGG-
wheat    ANKLKGEYNKAIAAYNKGESA
trybr    AEKDKERYKREM---------
mouse    AKDDRIRYDNEMKSWEEQMAE
         * : .* . :

Extrapolation

Prosite Patterns

Page 6: Multiple alignments,  PATTERNS, PSI-BLAST

How can I use a multiple alignment?

Extrapolation

Prosite Patterns

chite    ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat    --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr    KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse    -----KPKRPRSAYNIYVSESFQ----EAKDDS-IQGKLKLVNEAWKNLSP
         ***. ::: .: .. . : . . * . *: *

chite    AATAKQNYIRALQEYERNGG-
wheat    ANKLKGEYNKAIAAYNKGESA
trybr    AEKDKERYKREM---------
mouse    AKDDRIRYDNEMKSWEEQMAE
         * : .* . :

L?K>R

Prosite Profiles

-More Sensitive

-More Specific

AFDEFGHQIVLW
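A pattern extracted from an alignment can be scanned against other sequences once it is turned into a regular expression. The sketch below assumes the standard PROSITE notation (elements separated by '-', 'x' for any residue, '[...]' for allowed residues, '{...}' for forbidden ones, '(n)' or '(n,m)' for repeats, '<' and '>' for the sequence ends). The function name `prosite_to_regex` and the demo pattern are hypothetical; the pattern was written by hand for this slide's alignment and is not a real PROSITE entry.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern such as 'C-x(2,4)-C-{P}-[LIVM]' into a regex."""
    pattern = pattern.rstrip(".")
    anchored_start = pattern.startswith("<")
    anchored_end = pattern.endswith(">")
    pattern = pattern.strip("<>")
    parts = []
    for element in pattern.split("-"):
        # Separate an optional repeat count: 'x(2,4)' -> ('x', '2,4').
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        residue, repeat = match.group(1), match.group(2)
        if residue == "x":
            token = "."                          # any residue
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"   # forbidden residues
        else:
            token = residue                      # single letter or '[ABC]' class
        if repeat:
            token += "{" + repeat + "}"
        parts.append(token)
    regex = "".join(parts)
    return ("^" if anchored_start else "") + regex + ("$" if anchored_end else "")


# Hypothetical pattern for the conserved KPKR region of the slide's alignment.
demo = prosite_to_regex("K-P-K-R-[AP]-x-[ST]-[AS]-[FY]")
print(demo)
print(bool(re.search(demo, "ADKPKRPLSAYMLWLN")))   # ungapped chite fragment
print(bool(re.search(demo, "KKDSNAPKRAMTSFMFF")))  # ungapped trybr fragment
```

The demo pattern is deliberately strict: it matches the chite fragment but misses trybr, which has A where the pattern demands K. This is the kind of sensitivity limit that motivates profiles.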

Page 7: Multiple alignments,  PATTERNS, PSI-BLAST

How can I use a multiple alignment?

Phylogeny

chite    ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat    --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr    KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse    -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
         ***. ::: .: .. . : . . * . *: *

chite    AATAKQNYIRALQEYERNGG-
wheat    ANKLKGEYNKAIAAYNKGESA
trybr    AEKDKERYKREM---------
mouse    AKDDRIRYDNEMKSWEEQMAE
         * : .* . :

[Figure: tree relating chite, wheat, trybr and mouse]

-Evolution

-Paralogy/Orthology

Page 8: Multiple alignments,  PATTERNS, PSI-BLAST

How can I use a multiple alignment?

Phylogeny

chite    ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat    --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr    KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse    -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
         ***. ::: .: .. . : . . * . *: *

chite    AATAKQNYIRALQEYERNGG-
wheat    ANKLKGEYNKAIAAYNKGESA
trybr    AEKDKERYKREM---------
mouse    AKDDRIRYDNEMKSWEEQMAE
         * : .* . :

Struc. Prediction

PHD for secondary structure prediction: 75% accurate.

Threading is improving but is not yet as good.

Page 9: Multiple alignments,  PATTERNS, PSI-BLAST

How can I use a multiple alignment?

Phylogeny

Struc. Prediction

chite    ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat    --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr    KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse    -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
         ***. ::: .: .. . : . . * . *: *

chite    AATAKQNYIRALQEYERNGG-
wheat    ANKLKGEYNKAIAAYNKGESA
trybr    AEKDKERYKREM---------
mouse    AKDDRIRYDNEMKSWEEQMAE
         * : .* . :

Caution!

Automatic multiple sequence alignment methods are not always perfect…

Page 10: Multiple alignments,  PATTERNS, PSI-BLAST

The problem

Why is it difficult to compute a multiple sequence alignment?

chite    ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat    --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr    KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse    -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
         ***. ::: .: .. . : . . * . *: *

Computation

What is the good alignment?

Biology

What is a good alignment?

Page 11: Multiple alignments,  PATTERNS, PSI-BLAST

The problem

Why is it difficult to compute a multiple sequence alignment?

CIRCULAR PROBLEM...

Good Sequences

Good Alignment

Page 12: Multiple alignments,  PATTERNS, PSI-BLAST

What do I need to know to make a good multiple alignment?

-How do sequences evolve?

-How does the computer align the sequences?

-How can I choose my sequences?

-What is the best program?

-How can I use my alignment?

Page 13: Multiple alignments,  PATTERNS, PSI-BLAST

An alignment is a story

ADKPKRPLSAYMLWLN

ADKPKRPLSAYMLWLN    ADKPKRPLSAYMLWLN

ADKPRRPLS-YMLWLN    ADKPKRPKPRLSAYMLWLN

Mutations + Selection

ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN

Insertion, Deletion, Mutation

Page 14: Multiple alignments,  PATTERNS, PSI-BLAST

Homology

Same sequences -> same origin? -> same function? -> same 3D fold?

[Figure: % sequence identity plotted against length, with marks at 30% and 100, separating a "Same 3D Fold" region from the "Twilight Zone"]
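The quantity on the vertical axis can be computed directly from two aligned rows. The sketch below is one common convention (identity over columns where both rows have a residue); other conventions divide by the full alignment length. The `likely_same_fold` helper is a hypothetical, deliberately crude reading of the figure's 30% and 100 marks, not a published threshold curve.

```python
def percent_identity(row_a, row_b):
    """Percent identity over the columns where both aligned rows have a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    matches = sum(1 for a, b in aligned if a == b)
    return 100.0 * matches / len(aligned)


def likely_same_fold(identity, aligned_length, min_identity=30.0, min_length=100):
    """Assumed rule of thumb: enough identity over a long enough alignment."""
    return aligned_length >= min_length and identity >= min_identity


# Two rows of the small alignment used throughout these slides (first block).
chite = "---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD"
wheat = "--DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE"
print(round(percent_identity(chite, wheat), 1))
```

Pairs that fall below the threshold are not necessarily unrelated; in the twilight zone, identity alone cannot settle the question.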

Page 15: Multiple alignments,  PATTERNS, PSI-BLAST

Convergent evolution

AFGP with (ThrAlaAla)n, similar to trypsinogen

AFGP with (ThrAlaAla)n, NOT similar to trypsinogen

[Figure labels: N, S]

Chen et al., 1997, PNAS 94: 3811-16

Page 16: Multiple alignments,  PATTERNS, PSI-BLAST

Residues and mutations

All residues are equal, but some more than others…

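[Figure: Venn diagram placing the amino acids (P, G, S, C, L, I, T, V, A, W, Y, F, Q, H, K, R, E, D, N, M) in overlapping classes: Aliphatic, Aromatic, Hydrophobic, Polar, Small]

Accurate matrices are data driven rather than knowledge driven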

Page 17: Multiple alignments,  PATTERNS, PSI-BLAST

Substitution matrices

Different Flavors:

• PAM: 250, 350

• BLOSUM: 45, 62

• …

Page 18: Multiple alignments,  PATTERNS, PSI-BLAST

What is the best substitution matrix?

Mutation rates depend on families

Choosing the right matrix may be tricky

• Gonnet250 > BLOSUM62 > PAM250

• Depends on the family, the program used and its tuning

Family            S     N
Histone3          6.4   0
Insulin           4.0   0.1
Interleukin I     4.6   1.4
Globin            5.1   0.6
Apolipoprot. AI   4.5   1.6
Interferon G      8.6   2.8

Rates in substitutions/site/billion years, as measured on mouse vs. human (0.08 billion years)
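To get a feel for what these rates mean over the mouse/human divergence time, the sketch below converts each rate into an expected number of substitutions per site between the two species. It assumes the usual relation K = 2 * r * T (both lineages change independently for time T); the slide does not state how the rates were derived, so treat this as an illustration only. The names `RATES` and `substitutions_per_site` are hypothetical.

```python
DIVERGENCE_TIME_GYR = 0.08  # mouse vs human, in billion years (from the slide)

# (S, N) rates copied from the table, in substitutions/site/billion years.
RATES = {
    "Histone3": (6.4, 0.0),
    "Insulin": (4.0, 0.1),
    "Interleukin I": (4.6, 1.4),
    "Globin": (5.1, 0.6),
    "Apolipoprot. AI": (4.5, 1.6),
    "Interferon G": (8.6, 2.8),
}


def substitutions_per_site(rate, time_gyr=DIVERGENCE_TIME_GYR):
    """Assumed relation K = 2 * r * T for two lineages separated for time T."""
    return 2.0 * rate * time_gyr


for family, (s_rate, n_rate) in RATES.items():
    k_s = substitutions_per_site(s_rate)
    k_n = substitutions_per_site(n_rate)
    print(f"{family:16s} S: {k_s:.3f}  N: {k_n:.3f}")
```

The spread between families in the N column is the point of the slide: a single matrix tuned to one divergence level cannot fit a family like Histone3 and a family like Interferon G equally well.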

Page 19: Multiple alignments,  PATTERNS, PSI-BLAST

Insertions and deletions?

Indel Cost

[Figure: three plots of gap cost against gap length L]

Affine Gap Penalty: Cost = GOP + GEP*L
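The affine formula is easy to check by hand on a gapped row. The sketch below implements Cost = GOP + GEP*L as written on the slide and sums it over every run of gaps in one aligned row. The default values for `gop` and `gep` are placeholders chosen for illustration, not the defaults of any particular program, and the helper names are hypothetical.

```python
import re


def affine_gap_cost(length, gop=10.0, gep=0.5):
    """Cost of one gap of the given length: GOP + GEP * L (zero for no gap)."""
    if length < 0:
        raise ValueError("gap length cannot be negative")
    if length == 0:
        return 0.0
    return gop + gep * length


def row_gap_cost(aligned_row, gop=10.0, gep=0.5):
    """Sum the affine cost of every run of '-' in one aligned row."""
    runs = re.findall(r"-+", aligned_row)
    return sum(affine_gap_cost(len(run), gop, gep) for run in runs)


# Gapped row from the "An alignment is a story" slide: one gap of 3, one of 1.
print(row_gap_cost("ADKPRRP---LS-YMLWLN"))
```

With an affine penalty, one long gap is cheaper than several short gaps of the same total length, because the opening cost GOP is paid only once.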

Page 20: Multiple alignments,  PATTERNS, PSI-BLAST

How to align many sequences?

Exact algorithms (Needleman & Wunsch, Smith & Waterman) are computationally expensive

-> heuristic required!

Sequences     Computing time
2 Globins     1 sec
3 Globins     2 min
4 Globins     5 hours
5 Globins     3 weeks
6 Globins     9 years
7 Globins     1000 years
8 Globins     150 000 years
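The growth in computing time follows from how exact dynamic programming works. The sketch below is a minimal Needleman and Wunsch style global alignment of two sequences with a simple match/mismatch score and a linear gap penalty (an assumption made to keep it short; real programs use a substitution matrix and affine gaps). The `dp_cells` helper shows why the same idea does not scale: the table has one dimension per sequence. All names and scoring values are hypothetical.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two residues
                              score[i - 1][j] + gap,       # gap in seq_b
                              score[i][j - 1] + gap)       # gap in seq_a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and seq_a[i - 1] == seq_b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            row_a.append(seq_a[i - 1]); row_b.append(seq_b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(seq_a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(seq_b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


def dp_cells(length, n_sequences):
    """Number of table cells for an exact alignment of n sequences of equal length."""
    return (length + 1) ** n_sequences


print(needleman_wunsch("ACGT", "AGT"))
for n_sequences in range(2, 9):
    # 150 residues is an assumed round figure for a globin-sized protein.
    print(n_sequences, dp_cells(150, n_sequences))
```

Each extra sequence multiplies the table size by roughly the sequence length, which is consistent with the jump from seconds to years in the table.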

Page 27: Multiple alignments,  PATTERNS, PSI-BLAST

Existing methods

1-Carrillo and Lipman:

-MSA, DCA.

-Few Small Closely Related Sequences.

2-Segment Based:

-DIALIGN, MACAW.

-May Align Too Few Residues

-Do Well When They Can Run.

3-Iterative:

-HMMs, HMMER, SAM.

-Slow, Sometimes Inaccurate

-Good Profile Generators

4-Progressive:

-ClustalW, Pileup, Multalign…

-Fast and Sensitive

Page 28: Multiple alignments,  PATTERNS, PSI-BLAST

Progressive alignment

Feng and Doolittle, 1980; Taylor 1981

Dynamic Programming Using A Substitution Matrix

Page 29: Multiple alignments,  PATTERNS, PSI-BLAST

Progressive alignment

Feng and Doolittle, 1980; Taylor 1981

-Depends on the ORDER of the sequences (Tree).

-Depends on the CHOICE of the sequences.

-Depends on the PARAMETERS:

•Substitution Matrix.

•Penalties (Gop, Gep).

•Sequence Weight.

•Tree making Algorithm.
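The dependence on ORDER can be made concrete with a toy guide tree. The sketch below clusters sequences greedily by average linkage and reports the order in which groups would be merged, closest pair first. The distance used here is a crude shared k-mer measure standing in for a real alignment-based distance, and the clustering is a simplification; actual programs build their trees differently. All names are hypothetical.

```python
from itertools import combinations


def kmer_distance(seq_a, seq_b, k=2):
    """1 minus the fraction of shared k-mers; a stand-in for an alignment distance."""
    kmers_a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    union = kmers_a | kmers_b
    if not union:
        return 0.0
    return 1.0 - len(kmers_a & kmers_b) / len(union)


def join_order(sequences, k=2):
    """Return the merges, in order, of a greedy average-linkage clustering."""
    clusters = [(name,) for name in sequences]
    merges = []

    def cluster_distance(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(kmer_distance(sequences[a], sequences[b], k) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)   # the merged group is aligned as one unit later
        merges.append((c1, c2))
    return merges


# Ungapped fragments taken from the small alignment used in these slides.
fragments = {
    "chite": "ADKPKRPLSAYMLWLNSARESIKRENPDFK",
    "wheat": "DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKS",
    "trybr": "KKDSNAPKRAMTSFMFFSSDFRSKHSDLS",
    "mouse": "KPKRPRSAYNIYVSESFQEAKDDS",
}
for left, right in join_order(fragments):
    print(left, "+", right)
```

Because each merge freezes the gaps introduced so far, a different distance measure or tree algorithm can change the merge order and therefore the final alignment.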

Page 30: Multiple alignments,  PATTERNS, PSI-BLAST

Selecting sequences from a BLAST output

Page 31: Multiple alignments,  PATTERNS, PSI-BLAST

A common mistake

Sequences too closely related

Identical sequences bring no information

Multiple sequence alignments thrive on diversity

PRVA_MACFU  SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE
PRVA_HUMAN  SMTDLLNAEDIKKAVGAFSATDSFDHKKFFQMVGLKKKSADDVKKVFHMLDKDKSGFIEE
PRVA_GERSP  SMTDLLSAEDIKKAIGAFAAADSFDHKKFFQMVGLKKKTPDDVKKVFHILDKDKSGFIEE
PRVA_MOUSE  SMTDVLSAEDIKKAIGAFAAADSFDHKKFFQMVGLKKKNPDEVKKVFHILDKDKSGFIEE
PRVA_RAT    SMTDLLSAEDIKKAIGAFTAADSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE
PRVA_RABIT  AMTELLNAEDIKKAIGAFAAAESFDHKKFFQMVGLKKKSTEDVKKVFHILDKDKSGFIEE
            :**::*.*******:***:* :****************..::******:***********

PRVA_MACFU  DELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_HUMAN  DELGFILKGFSPDARDLSAKETKMLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_GERSP  DELGFILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVSES
PRVA_MOUSE  DELGSILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVAES
PRVA_RAT    DELGSILKGFSSDARDLSAKETKTLMAAGDKDGDGKIGVEEFSTLVAES
PRVA_RABIT  EELGFILKGFSPDARDLSVKETKTLMAAGDKDGDGKIGADEFSTLVSES
            :*** ******.******.**** *:************.:******:**
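One practical way to avoid this mistake is to drop near-duplicates before aligning in earnest. The sketch below keeps a sequence only if it is not too similar to any sequence already kept. It works on rows that are already aligned, and the 90% cutoff is an arbitrary placeholder, not a recommendation from the slides. Function names are hypothetical.

```python
def percent_identity(row_a, row_b):
    """Percent identity over the columns where both aligned rows have a residue."""
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    return 100.0 * sum(1 for a, b in aligned if a == b) / len(aligned)


def drop_near_duplicates(alignment, max_identity=90.0):
    """Greedy filter: keep a row unless it exceeds max_identity to a kept row."""
    kept = {}
    for name, row in alignment.items():
        too_close = any(percent_identity(row, other) > max_identity
                        for other in kept.values())
        if not too_close:
            kept[name] = row
    return kept


# First block of two rows from the slide; they differ at only two positions.
rows = {
    "PRVA_MACFU": "SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE",
    "PRVA_HUMAN": "SMTDLLNAEDIKKAVGAFSATDSFDHKKFFQMVGLKKKSADDVKKVFHMLDKDKSGFIEE",
}
print(list(drop_near_duplicates(rows)))
```

The filter is order dependent: whichever sequence comes first in a cluster of near-duplicates is the one that survives, so it helps to list the best-annotated sequences first.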

Page 32: Multiple alignments,  PATTERNS, PSI-BLAST

Respect information!

PRVA_MACFU  ------------------------------------------SMTDLLN----AEDIKKA
PRVA_HUMAN  ------------------------------------------SMTDLLN----AEDIKKA
PRVA_GERSP  ------------------------------------------SMTDLLS----AEDIKKA
PRVA_MOUSE  ------------------------------------------SMTDVLS----AEDIKKA
PRVA_RAT    ------------------------------------------SMTDLLS----AEDIKKA
PRVA_RABIT  ------------------------------------------AMTELLN----AEDIKKA
TPCC_MOUSE  MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQNPTPEELQEM
            : :*. .*::::

PRVA_MACFU  VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVA_HUMAN  VGAFSATDS--FDHKKFFQMVG------LKKKSADDVKKVFHMLDKDKSGFIEEDELGFI
PRVA_GERSP  IGAFAAADS--FDHKKFFQMVG------LKKKTPDDVKKVFHILDKDKSGFIEEDELGFI
PRVA_MOUSE  IGAFAAADS--FDHKKFFQMVG------LKKKNPDEVKKVFHILDKDKSGFIEEDELGSI
PRVA_RAT    IGAFTAADS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGSI
PRVA_RABIT  IGAFAAAES--FDHKKFFQMVG------LKKKSTEDVKKVFHILDKDKSGFIEEEELGFI
TPCC_MOUSE  IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM
            :. . * .*..:*: *: * *. :::..:*:::**: .*:*: :** :

PRVA_MACFU  LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES-
PRVA_HUMAN  LKGFSPDARDLSAKETKMLMAAGDKDGDGKIGVDEFSTLVAES-
PRVA_GERSP  LKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVSES-
PRVA_MOUSE  LKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVAES-
PRVA_RAT    LKGFSSDARDLSAKETKTLMAAGDKDGDGKIGVEEFSTLVAES-
PRVA_RABIT  LKGFSPDARDLSVKETKTLMAAGDKDGDGKIGADEFSTLVSES-
TPCC_MOUSE  LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE
            *: . .. :: .: : *: ***:.**:*. :** ::

-This alignment is not informative about the relation between TPCC_MOUSE and the rest of the sequences.

-A better spread of the sequences is needed

Page 33: Multiple alignments,  PATTERNS, PSI-BLAST

Selecting diverse sequences

PRVB_CYPCA  -AFAGVLNDADIAAALEACKAADSFNHKAFFAKVGLTSKSADDVKKAFAIIDQDKSGFIE
PRVB_BOACO  -AFAGILSDADIAAGLQSCQAADSFSCKTFFAKSGLHSKSKDQLTKVFGVIDRDKSGYIE
PRV1_SALSA  MACAHLCKEADIKTALEACKAADTFSFKTFFHTIGFASKSADDVKKAFKVIDQDASGFIE
PRVB_LATCH  -AVAKLLAAADVTAALEGCKADDSFNHKVFFQKTGLAKKSNEELEAIFKILDQDKSGFIE
PRVB_RANES  -SITDIVSEKDIDAALESVKAAGSFNYKIFFQKVGLAGKSAADAKKVFEILDRDKSGFIE
PRVA_MACFU  -SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIE
PRVA_ESOLU  --AKDLLKADDIKKALDAVKAEGSFNHKKFFALVGLKAMSANDVKKVFKAIDADASGFIE
            : *: .: . .* .:*. * ** *: * : * :* * **:**

PRVB_CYPCA  EDELKLFLQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVKA-
PRVB_BOACO  EDELKKFLQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG
PRV1_SALSA  VEELKLFLQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ-
PRVB_LATCH  DEELELFLQNFSAGARTLTKTETETFLKAGDSDGDGKIGVDEFQKLVKA-
PRVB_RANES  QDELGLFLQNFRASARVLSDAETSAFLKAGDSDGDGKIGVEEFQALVKA-
PRVA_MACFU  EDELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_ESOLU  EEELKFVLKSFAADGRDLTDAETKAFLKAADKDGDGKIGIDEFETLVHEA
            :** .*:.* .* *: ** :: .* **** **::** **

-A REASONABLE model now exists.

-Going further: remote homologues.

Page 34: Multiple alignments,  PATTERNS, PSI-BLAST

Aligning remote homologues

PRVA_MACFU  ------------------------------------------SMTDLLNA----EDIKKA
PRVA_ESOLU  -------------------------------------------AKDLLKA----DDIKKA
PRVB_CYPCA  ------------------------------------------AFAGVLND----ADIAAA
PRVB_BOACO  ------------------------------------------AFAGILSD----ADIAAG
PRV1_SALSA  -----------------------------------------MACAHLCKE----ADIKTA
PRVB_LATCH  ------------------------------------------AVAKLLAA----ADVTAA
PRVB_RANES  ------------------------------------------SITDIVSE----KDIDAA
TPCS_RABIT  -TDQQAEARSYLSEEMIAEFKAAFDMFDADGG-GDISVKELGTVMRMLGQTPTKEELDAI
TPCS_PIG    -TDQQAEARSYLSEEMIAEFKAAFDMFDADGG-GDISVKELGTVMRMLGQTPTKEELDAI
TPCC_MOUSE  MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQNPTPEELQEM
            : ::

PRVA_MACFU  VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVA_ESOLU  LDAVKAEGS--FNHKKFFALVG------LKAMSANDVKKVFKAIDADASGFIEEEELKFV
PRVB_CYPCA  LEACKAADS--FNHKAFFAKVG------LTSKSADDVKKAFAIIDQDKSGFIEEDELKLF
PRVB_BOACO  LQSCQAADS--FSCKTFFAKSG------LHSKSKDQLTKVFGVIDRDKSGYIEEDELKKF
PRV1_SALSA  LEACKAADT--FSFKTFFHTIG------FASKSADDVKKAFKVIDQDASGFIEVEELKLF
PRVB_LATCH  LEGCKADDS--FNHKVFFQKTG------LAKKSNEELEAIFKILDQDKSGFIEDEELELF
PRVB_RANES  LESVKAAGS--FNYKIFFQKVG------LAGKSAADAKKVFEILDRDKSGFIEQDELGLF
TPCS_RABIT  IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEI
TPCS_PIG    IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNMDGYIDAEELAEI
TPCC_MOUSE  IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM
            : . .: .. . *: * : * :* : .*:*: :** .

PRVA_MACFU  LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES-
PRVA_ESOLU  LKSFAADGRDLTDAETKAFLKAADKDGDGKIGIDEFETLVHEA-
PRVB_CYPCA  LQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVKA--
PRVB_BOACO  LQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG-
PRV1_SALSA  LQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ--
PRVB_LATCH  LQNFSAGARTLTKTETETFLKAGDSDGDGKIGVDEFQKLVKA--
PRVB_RANES  LQNFRASARVLSDAETSAFLKAGDSDGDGKIGVEEFQALVKA--
TPCS_RABIT  FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ
TPCS_PIG    FR---ASGEHVTDEEIESIMKDGDKNNDGRIDFDEFLKMMEGVQ
TPCC_MOUSE  LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE
            :: .. :: : :: .* :.** *. :** ::

Page 35: Multiple alignments,  PATTERNS, PSI-BLAST

Going further…

PRVA_MACFU  VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVB_BOACO  LQSCQAADS--FSCKTFFAKSG------LHSKSKDQLTKVFGVIDRDKSGYIEEDELKKF
PRV1_SALSA  LEACKAADT--FSFKTFFHTIG------FASKSADDVKKAFKVIDQDASGFIEVEELKLF
TPCS_RABIT  IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEI
TPCS_PIG    IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNMDGYIDAEELAEI
TPCC_MOUSE  IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM
TPC_PATYE   SDEMDEEATGRLNCDAWIQLFER---KLKEDLDERELKEAFRVLDKEKKGVIKVDVLRWI
            . : .. . :: . : * :* : .* *. : * .

PRVA_MACFU  LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES--
PRVB_BOACO  LQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG--
PRV1_SALSA  LQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ---
TPCS_RABIT  FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ-
TPCS_PIG    FR---ASGEHVTDEEIESIMKDGDKNNDGRIDFDEFLKMMEGVQ-
TPCC_MOUSE  LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE-
TPC_PATYE   LS---SLGDELTEEEIENMIAETDTDGSGTVDYEEFKCLMMSSDA
            : . :: : :: * :..* :. :** ::

Page 36: Multiple alignments,  PATTERNS, PSI-BLAST

What makes a good alignment…

-The more divergent the sequences, the better

-The fewer indels, the better

-Nice ungapped blocks separated with indels

-Different classes of residues within a block:

•Completely conserved

•Size and hydropathy conserved

•Size or hydropathy conserved

The ultimate evaluation is a matter of personal judgment and knowledge
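The "ungapped blocks separated with indels" criterion can be checked mechanically before the alignment is judged by eye. The sketch below lists the column ranges in which no row has a gap; the name `ungapped_blocks` and the `min_length` parameter are hypothetical, and the result is only an aid, not a quality score.

```python
def ungapped_blocks(rows, min_length=1):
    """Return (start, end) column ranges, 0-based and end-exclusive, without any gap."""
    blocks = []
    start = None
    for index, column in enumerate(zip(*rows)):
        if "-" in column:
            # A gap closes the current block, if one is open.
            if start is not None and index - start >= min_length:
                blocks.append((start, index))
            start = None
        elif start is None:
            start = index
    width = len(rows[0]) if rows else 0
    if start is not None and width - start >= min_length:
        blocks.append((start, width))
    return blocks


# First block of the small alignment used throughout these slides.
block = [
    "---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD",  # chite
    "--DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE",  # wheat
    "KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP",  # trybr
    "-----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP",  # mouse
]
for start, end in ungapped_blocks(block, min_length=5):
    print(start, end, end - start)
```

An alignment with a few long blocks is usually easier to trust than one with many short ones, but the final call still rests on biological knowledge, as the slide says.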

Page 37: Multiple alignments,  PATTERNS, PSI-BLAST

Avoiding pitfalls

Page 38: Multiple alignments,  PATTERNS, PSI-BLAST

Keep a biological perspective

chite    ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat    --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr    KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse    -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
         ***. ::: .: .. . : . . * . *: *

chite    AATAKQNYIRALQEYERNGG-
wheat    ANKLKGEYNKAIAAYNKGESA
trybr    AEKDKERYKREM---------
mouse    AKDDRIRYDNEMKSWEEQMAE
         * : .* . :

DIFFERENT PARAMETERS

chite    AD--K----PKR-PLYMLWLNS-ARESIKRENPDFK-VT-EVAKKGGELWRGL-
wheat    -DPNK----PKRAP-FFVFMGE-FREEFKQKNPKNKSVA-AVGKAAGERWKSLS
trybr    -K--KDSNAPKR-AMT-MFFSSDFR-S-KH-S-DLS-IV-EMSKAAGAAWKELG
mouse    ----K----PKR-PRYNIYVSESFQEA-K--D-D-S-AQGKL-KLVNEAWKNLS
         * *** .:: ::... : * . . . : * . *: *

chite    KSEWEAKAATAKQNY-I--RALQE-YERNG-G-
wheat    KAPYVAKANKLKGEY-N--KAIAA-YNK-GESA
trybr    RKVYEEMAEKDKERY----K--RE-M-------
mouse    KQAYIQLAKDDRIRYDNEMKSWEEQMAE-----
         : : * : .* :

Page 39: Multiple alignments,  PATTERNS, PSI-BLAST

Do not overtune!!!

DO NOT PLAY WITH PARAMETERS!

IF YOU KNOW THE ALIGNMENT YOU WANT: MAKE IT YOURSELF!

chite    ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat    --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr    KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse    -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
         ***. ::: .: .. . : . . * . *: *

chite    AATAKQNYIRALQEYERNGG-
wheat    ANKLKGEYNKAIAAYNKGESA
trybr    AEKDKERYKREM---------
mouse    AKDDRIRYDNEMKSWEEQMAE
         * : .* . :

chite    ---ADKPKRPL-SAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat    --DPNKPKRAP-SAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr    KKDSNAPKRAMTSFMFFSSDFRS-----KHSDLS-IVEMSKAAGAAWKELGP
mouse    -----KPKRPR-SAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
         ***. * .: .. . : . . * . *: *

chite    AATAKQNYIRALQEYERNGG-
wheat    ANKLKGEYNKAIAAYNKGESA
trybr    AEKDKERYKREM---------
mouse    AKDDRIRYDNEMKSWEEQMAE
         * : .* . :

Page 40: Multiple alignments,  PATTERNS, PSI-BLAST

Choosing the right method

[Table with columns PROBLEM, PROGRAM, METHOD. Programs listed, in order: ClustalW, ClustalW, MSA, DIALIGN II, DIALIGN II]

Source: BaliBase

Thompson et al., NAR, 1999

Page 41: Multiple alignments,  PATTERNS, PSI-BLAST

Conclusion

The best alignment method:

-Your brain

-The right data

The best evaluation method:

-Your eyes

-Experimental information (SwissProt)

What can I conclude?

-Homology -> information extrapolation

How can I go further?

-Patterns

-Profiles

-HMMs

-…