Multiple Sequence Alignments It is God’s privilege to conceal things, but the kings’ pride is to research them. (Proverbs 25:2; ascribed to King Solomon of Israel, BC 1000) 1-4, Jan, 2006 Protein Folding Winter School Keehyoung Joo School of Computational Sciences, KIAS, Seoul, Korea


Page 1: Multiple Sequence Alignments

Multiple Sequence Alignments

It is God’s privilege to conceal things, but the kings’ pride is to research them.

(Proverbs 25:2; ascribed to King Solomon of Israel, BC 1000)

1-4, Jan, 2006 Protein Folding Winter School

Keehyoung Joo

School of Computational Sciences, KIAS, Seoul, Korea

Page 2: Multiple Sequence Alignments

The major goal of computational sequence analysis is to predict the structure and function of genes and proteins from their sequence.

Page 3: Multiple Sequence Alignments

Contents

  • How to make your model from sequence?
  • What is a Multiple Sequence Alignment (MSA)?
  • How can I use an MSA (Motivation)?
  • What is the matter of MSA?
      - The choice of the sequences
      - The choice of an objective function
      - The optimization of that function
  • How to make an MSA?

Page 4: Multiple Sequence Alignments

How to make your model from sequence?

Tertiary structure prediction methods:

  • Homology modeling
  • Fold recognition
  • Ab initio method

[Figure: modeling workflow. An unknown sequence is searched against the fold DB (Protein Data Bank) to find template folds and an alignment; the model is then built from the templates and the alignment.]

Page 5: Multiple Sequence Alignments

What is a Multiple Sequence Alignment

MSA can be seen as a generalization of Pairwise Sequence Alignment.

Page 6: Multiple Sequence Alignments

How can I use an MSA (Motivation)

  • Clustering, classification, or categorization of genes/proteins.
  • Identification of conserved regions.
  • Detecting point mutations.
  • Deducing evolutionary relationships and phylogenetic trees.
  • Assisting in the prediction of secondary and tertiary structure.

Page 7: Multiple Sequence Alignments

What is the matter of MSA?

It stands at the crossroads of three distinct technical difficulties:

  • Choice of the sequences: database search, starting from the unknown sequence
  • Choice of an objective function: What is a good alignment? (Biology)
  • Optimization of that function: What is the good alignment? (Computation)

Page 8: Multiple Sequence Alignments

The choice of the sequences

  • Sequences sharing a common ancestor (homologous sequences)
  • PSI-BLAST, FASTA, various search tools

The choice of an objective function

  • A biological problem that lies in the definition of correctness
  • Sum of pairs, entropy score, consistency based, …

The optimization of that function

  • Exact algorithms (dynamic programming)
  • Progressive alignment (ClustalW)
  • Iterative approaches (simulated annealing, genetic algorithms, …)
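For two sequences, the exact dynamic programming approach is small enough to sketch in a few lines. The Python function below computes the optimal global alignment score only (no traceback). The name `nw_score` and the match, mismatch and gap values are illustrative assumptions, not parameters given in the slides, and a real aligner would use a PAM or BLOSUM matrix instead.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b (linear gap cost).

    The scoring values are assumed toy values, not taken from the slides.
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j characters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * gap]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            up = prev[j] + gap        # x aligned against a gap
            left = curr[j - 1] + gap  # y aligned against a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


score_same = nw_score("ACGT", "ACGT")
score_gap = nw_score("ACGT", "AGT")
```

The table has (len(a)+1) x (len(b)+1) cells; extending the same recurrence to n sequences needs an n-dimensional table, which is why the exact method becomes impractical for more than a handful of sequences.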

Page 9: Multiple Sequence Alignments

Example : Sum of pair score

Seq A: ARGTCAGATACGLAG---PGMCTETWV

Seq B: ARATCGGAT---IAGTIYPGMCTHTWV

Substitution scores are represented in matrices. The popular ones are PAM and BLOSUM.

Score of a pairwise sequence alignment of length L:

$$S(A_L, B_L) = \sum_{i=1}^{L} \mathrm{sub}(a_i, b_i)$$
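As a rough illustration, the pairwise score can be computed column by column in Python. The function `toy_sub` is a hypothetical stand-in for a PAM or BLOSUM lookup (+1 match, -1 mismatch), and scoring a column that contains a gap as -2 is an assumption, since the slide does not say how gap columns are scored.

```python
GAP = "-"


def toy_sub(a, b):
    """Toy substitution score (assumption): +1 match, -1 mismatch, -2 with a gap."""
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def pairwise_score(seq_a, seq_b, sub=toy_sub):
    """Sum sub(a_i, b_i) over the L columns of a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(sub(a, b) for a, b in zip(seq_a, seq_b))


# The two aligned sequences from the slide.
seq_a = "ARGTCAGATACGLAG---PGMCTETWV"
seq_b = "ARATCGGAT---IAGTIYPGMCTHTWV"
score = pairwise_score(seq_a, seq_b)
```

With these toy values the slide's alignment has 17 matches, 4 mismatches and 6 gap columns, so the score is 17 - 4 - 12 = 1.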

Page 10: Multiple Sequence Alignments

Example: Sum of pair score (cont.)

Multiple sequence alignments

Seq A1: ARGTCAGATACGLAG---PGMCTETWV----

Seq A2: ARATCGGAT---IAGTIYPGMCTHTWVIAGQ

Seq A3: ARATCE--TACG--GTI-PGMCTHTWVIA--

$$\mathrm{Score}(A_n) = \sum_{k=1}^{L} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mathrm{sub}(A_{i,k}, A_{j,k}) - \sum_{i=1}^{n} G(A_i)$$

$$G(A_i) = a + bn$$

Exact method: multi-dimensional dynamic programming

  - Time complexity $O(L^n 2^n)$, space complexity $O(L^n)$
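A sum-of-pairs score for a multiple alignment can be sketched in Python as follows. Several things here are assumptions rather than statements from the slides: the toy substitution function, scoring any pair that involves a gap as 0 in the substitution term, and charging an affine penalty `a + b * length` for every run of gaps in a row, with `a = 1.0` and `b = 0.5` chosen arbitrarily.

```python
import re
from itertools import combinations

GAP = "-"


def toy_sub(x, y):
    """Toy substitution score (assumption): +1 match, -1 mismatch, 0 with a gap."""
    if x == GAP or y == GAP:
        return 0
    return 1 if x == y else -1


def gap_penalty(row, a=1.0, b=0.5):
    """Affine penalty a + b*length for each gap run in one aligned row (assumed form)."""
    return sum(a + b * len(run) for run in re.findall(r"-+", row))


def sum_of_pairs(msa, sub=toy_sub, a=1.0, b=0.5):
    """Substitution scores over all columns and sequence pairs, minus gap penalties."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all rows of the alignment must have equal length")
    pair_term = sum(
        sub(x, y) for column in zip(*msa) for x, y in combinations(column, 2)
    )
    gap_term = sum(gap_penalty(row, a, b) for row in msa)
    return pair_term - gap_term


toy_msa = ["AC-T", "ACGT", "A--T"]
toy_score = sum_of_pairs(toy_msa)

# The three aligned sequences from the slide.
slide_msa = [
    "ARGTCAGATACGLAG---PGMCTETWV----",
    "ARATCGGAT---IAGTIYPGMCTHTWVIAGQ",
    "ARATCE--TACG--GTI-PGMCTHTWVIA--",
]
slide_score = sum_of_pairs(slide_msa)
```

Evaluating this score for a given alignment is cheap; the cost quoted for the exact method comes from searching over all possible alignments.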

Page 11: Multiple Sequence Alignments

How to make a MSA (Methods)

Page 12: Multiple Sequence Alignments

Recent research in the literature

  • MAFFT (2002): based on the fast Fourier transform
  • MUSCLE (2004): progressive alignment, pairwise profile alignment, position-specific gap penalty
  • PROBCONS (2005): progressive alignment, probability table using HMM, probabilistic consistency-based MSA

Page 13: Multiple Sequence Alignments

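Example: Progressive alignment

MSA by adding sequences

  • Pairwise alignment of all sequence pairs: 1 + 2, 1 + 3, 1 + 4, 2 + 3, 2 + 4, 3 + 4
  • Guide tree built from the pairwise alignments

[Figure: pairwise alignments of sequences 1 to 4 and the resulting guide tree.]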

Page 14: Multiple Sequence Alignments

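Progressive alignment (cont.)

Distance matrix: displays the distances of all sequence pairs (sequences 1 to 5), computed from the pairwise similarity S as

D = 1 - S

Guide tree: built from the distance matrix with UPGMA (unweighted pair group method with arithmetic averages) or the Neighbour-Joining method.

[Figure: 5 x 5 distance matrix and the guide tree over sequences 1 to 5.]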
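A minimal Python sketch of the distance matrix step, assuming S is the fractional sequence identity over columns where neither sequence has a gap. The slides do not define S, so this choice and the function names are assumptions. For brevity the rows are taken from one common alignment, whereas a progressive aligner would align each pair separately.

```python
def identity(row_a, row_b):
    """Fraction of identical residues over columns without a gap in either row."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Symmetric matrix with D[i][j] = 1 - S[i][j] and zeros on the diagonal."""
    n = len(aligned)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = 1.0 - identity(aligned[i], aligned[j])
    return d


rows = ["ACGT", "ACGA", "A-GT"]
dist = distance_matrix(rows)
```

Here rows 0 and 1 share 3 of 4 residues, giving a distance of 0.25, while rows 0 and 2 are identical over their shared columns and get distance 0.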

Page 15: Multiple Sequence Alignments

UPGMA Clustering (Guide Tree)

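Step 1: initial distance matrix. The closest pair is (1, 2) with d = 2; they are merged into cluster u.

d_ij   1     2     3     4     5
1      0     2     6     9     7
2            0     5     7     7
3                  0     5     4
4                        0     3
5                              0

Step 2: distances to u are arithmetic averages, e.g. d(u, 3) = (6 + 5) / 2 = 5.5. The closest pair is (4, 5) with d = 3; they are merged into cluster v.

d_ij   u     3     4     5
u      0     5.5   8     7
3            0     5     4
4                  0     3
5                        0

Step 3: the closest pair is (3, v) with d = 4.5; they are merged into cluster w.

d_ij   u     3     v
u      0     5.5   7.5
3            0     4.5
v                  0

Step 4: the two remaining clusters u and w are merged at d = 6.8.

d_ij   u     w
u      0     6.8
w            0

[Figure: the guide tree over leaves 1, 2, 3, 5, 4, growing by one merge per step.]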
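The clustering on this slide can be reproduced with a short Python sketch of UPGMA. The function name `upgma` and the string labels for merged clusters are illustrative choices. The distance between two clusters is taken as the average over all leaf pairs, which is equivalent to the size-weighted update used by UPGMA.

```python
def upgma(dist, labels):
    """Return the merges (cluster_a, cluster_b, distance) in the order UPGMA makes them."""
    # Each cluster name maps to the tuple of original leaf indices it contains.
    clusters = {label: (i,) for i, label in enumerate(labels)}
    merges = []

    def average(c1, c2):
        """Mean distance over all leaf pairs drawn from the two clusters."""
        leaves1, leaves2 = clusters[c1], clusters[c2]
        total = sum(dist[i][j] for i in leaves1 for j in leaves2)
        return total / (len(leaves1) * len(leaves2))

    while len(clusters) > 1:
        names = list(clusters)
        a, b = min(
            ((x, y) for k, x in enumerate(names) for y in names[k + 1:]),
            key=lambda pair: average(pair[0], pair[1]),
        )
        merges.append((a, b, average(a, b)))
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges


# Distance matrix from the slide (symmetric form).
D = [
    [0, 2, 6, 9, 7],
    [2, 0, 5, 7, 7],
    [6, 5, 0, 5, 4],
    [9, 7, 5, 0, 3],
    [7, 7, 4, 3, 0],
]
merges = upgma(D, ["1", "2", "3", "4", "5"])
```

On this matrix the sketch merges 1 and 2 at distance 2, then 4 and 5 at 3, then 3 with (4,5) at 4.5, and finally the two remaining clusters at about 6.83.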

Page 16: Multiple Sequence Alignments

Progressive alignment (cont.)

  • Columns, once aligned, are never changed; only new gaps are inserted.
  • The result depends strongly on the pairwise alignments and on the initial starting sequences.
  • There is no guarantee that the globally optimal solution will be found.
  • When sequence identity is below 25-30%, this approach becomes much less reliable.

[Figure: guide tree over sequences 1 to 5 and the alignment of alignments.]

Page 17: Multiple Sequence Alignments

Progressive Alignment: Discussion

Strengths:

  • Speed
  • Progression is biologically sensible (aligns using a tree)

Weaknesses:

  • No objective function: no way of quantifying whether or not the alignment is good
  • Local minimum problem

Page 18: Multiple Sequence Alignments

Consistency based score function

COFFEE score function (Cédric Notredame): given a set of sequences, the optimal MSA is defined as the one that agrees the most with all the possible optimal pairwise alignments.

$$\mathrm{SCORE}_{total} = \frac{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{ij} \, \mathrm{Score}(A_{ij})}{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{ij} \, \mathrm{Len}(A_{ij})}$$

Here A_ij is the pairwise alignment of sequences i and j taken from the MSA, W_ij is the weight of that pair, and Len(A_ij) is its length.

Score(A_ij) = number of aligned pairs of residues that are shared between A_ij and the library.

  - does not depend on a specific substitution matrix
  - position-dependent alignment
  - the most consistent alignments are often closer to the truth
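A small Python sketch of a COFFEE-style score, under simplifying assumptions: the library is represented as a dictionary that maps each sequence pair (i, j) to the set of residue index pairs aligned in its pairwise alignment, the weights are supplied by the caller, and Len(A_ij) counts the columns of the pairwise projection that are not gap against gap. The function names are illustrative and this is not the implementation used by any published tool.

```python
from itertools import combinations


def residue_pairs(row_i, row_j):
    """Set of (pos_i, pos_j) residue indices aligned to each other in two gapped rows."""
    pairs, pos_i, pos_j = set(), 0, 0
    for x, y in zip(row_i, row_j):
        if x != "-" and y != "-":
            pairs.add((pos_i, pos_j))
        if x != "-":
            pos_i += 1
        if y != "-":
            pos_j += 1
    return pairs


def projection_length(row_i, row_j):
    """Number of columns left after removing columns where both rows have a gap."""
    return sum(1 for x, y in zip(row_i, row_j) if not (x == "-" and y == "-"))


def coffee_score(msa, library, weights):
    """Weighted fraction of the MSA's pairwise projections that agrees with the library."""
    numerator = denominator = 0.0
    for i, j in combinations(range(len(msa)), 2):
        shared = residue_pairs(msa[i], msa[j]) & library[(i, j)]
        numerator += weights[(i, j)] * len(shared)
        denominator += weights[(i, j)] * projection_length(msa[i], msa[j])
    return numerator / denominator


# The library holds one pairwise alignment that disagrees with the MSA in one column.
library = {(0, 1): residue_pairs("A-CT", "ACGT")}
weights = {(0, 1): 1.0}
msa = ["AC-T", "ACGT"]
score = coffee_score(msa, library, weights)
```

In this toy case the MSA and the library agree on two of the three aligned residue pairs, and the projection has four columns, so the score is 2 / 4 = 0.5.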

Page 19: Multiple Sequence Alignments

Summary

  • MSAs are essential tools in computational biology and bioinformatics. They are required for structure/function analysis and structure prediction.
  • No perfect method exists for assembling an MSA, and all available methods rely on approximations.
  • The most commonly used methods for MSA use a progressive alignment algorithm (ClustalW).
  • Recent progress has focused on the design of iterative (Prrp, SAGA) and consistency based methods (T-Coffee, ProbCons).

Page 20: Multiple Sequence Alignments

MSA applications

  • Profile-profile alignment
  • Profile: a table that lists the frequencies of each amino acid at each position of an MSA.
  • A profile can be used in database searches
      - Find new sequences that match the profile
      - Improve search sensitivity
      - Improve search accuracy

Page 21: Multiple Sequence Alignments

Example: Profiles

  • Profile: a table that lists the frequencies of each amino acid at each position of a protein sequence.
  • Frequencies are calculated from an MSA containing a domain of interest.
  • Allows us to identify a consensus sequence.
  • The derived scoring scheme allows us to align a new sequence to the profile.
  • A profile can be used in database searches
      - Find new sequences that match the profile
  • Profiles are also used to compute multiple alignments heuristically
      - Progressive alignment
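A profile of this kind can be sketched in Python as a list of per-column frequency tables. Ignoring gaps when computing frequencies and taking the most frequent residue as the consensus are assumptions made for this sketch, and the function names are illustrative.

```python
from collections import Counter


def build_profile(msa):
    """Per-column amino acid frequencies; gap characters are ignored (assumption)."""
    profile = []
    for column in zip(*msa):
        counts = Counter(c for c in column if c != "-")
        total = sum(counts.values())
        if total:
            profile.append({aa: n / total for aa, n in counts.items()})
        else:
            profile.append({})  # column made only of gaps
    return profile


def consensus(profile):
    """Most frequent residue per column, '-' for all-gap columns."""
    return "".join(max(col, key=col.get) if col else "-" for col in profile)


msa = ["ACGT", "ACGA", "A-GT"]
profile = build_profile(msa)
cons = consensus(profile)
```

For this toy alignment the last column has T with frequency 2/3 and A with frequency 1/3, and the consensus sequence is ACGT. A scoring scheme for aligning a new sequence to the profile could then weight substitution scores by these frequencies.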