30
Multiple Sequence Alignment Multiple Sequence Alignment Sum-of-Pairs and ClustalW Ulf Leser

Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Multiple Sequence AlignmentMultiple Sequence AlignmentSum-of-Pairs and ClustalW

Ulf Leser

Page 2: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

This Lecture

• Multiple Sequence AlignmentMultiple Sequence Alignment

• The problem• The problem• Theoretical approach: Sum-of-Pairs scores• Practical approach: ClustalW• Practical approach: ClustalW

Ulf Leser: Bioinformatics, Summer Semester 2011 2

Page 3: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Multiple Sequence Alignmentp q g

• We now align multiple (k>2) sequencesWe now align multiple (k>2) sequences– Note: Also BLAST aligns only two sequences

• Why?y– Imagine k sequences of the promoter region of genes, all regulated

by the same transcription factor f. Which subsequence within the k sequences is recognized by f?sequences is recognized by f?

– Imagine k sequences of proteins that bind to DNA. Which subsequence of the k sequences code for the part of the proteins that performs the binding?

• GeneralW t t k th t( ) i k– We want to know the common part(s) in k sequences

– “common” does not mean identical– This part can be anywhere within the sequences

Ulf Leser: Bioinformatics, Summer Semester 2011 3

This part can be anywhere within the sequences

Page 4: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Definition

• DefinitionDefinition– A multiple sequence alignment (MSA) von k Strings si, 1≤i≤k, is a

table of k rows and l columns (l≥max(|sk|)), such thath f h b b f bl k• Row i contains the sequence of si, with an arbitrary number of blanks

being inserted at arbitrary positions• Every symbol of every si stands in exactly one column• No column contains only blanks

AACGTGATTGACTCGAGTGCTTTACAGTGCCGTGCTAGTCG

AACGTGATTGA____C_____TCGAGT__GCTTTACAGT_GCCGTGCTAGT____C_G__

TTCAGTGGACGTGGTAGGTGCAGACC

TTCA_GTG_GACGTG__GTAG__GTGCA_GAC___C____

Ulf Leser: Bioinformatics, Summer Semester 2011 4

Page 5: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Good MSA

• We are searching for good (optimal) MSAsWe are searching for good (optimal) MSAs• Defining „optimal“ here is not as simple as in the k=2 case• IntuitionIntuition

– All sequences had a common ancestor and evolved by evolution– We want to assume as few evolutionary events as possible– Thus, we want few columns (~ few INSDELs)– Thus, we want homogeneous columns (~ few replacements)

AACGTGATTGA____C_____TCGAGT__GCTTTACAGT_GCCGTGCTAGT C G

AAC__GTG_AT_T_GAC___TCGAGTGC_TTTACA_GTGCCG TGC TA GTCGGCCGTGCTAGT____C_G__

TTCA_GTG_GACGTG__GTAG__GTGCA_GAC___C____

GCCG__TGC_TA__GTCG_TTC_AGTGGACGTG__GTAG____GTGCA__TGACC__

Ulf Leser: Bioinformatics, Summer Semester 2011 5

Page 6: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

This Lecture

• Multiple Sequence AlignmentMultiple Sequence Alignment

• The problem• The problem• Theoretical approach: Sum-of-Pairs scores• Practical approach: ClustalW• Practical approach: ClustalW

Ulf Leser: Bioinformatics, Summer Semester 2011 6

Page 7: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

What Should we Count?

• For two sequencesFor two sequences– We scored each column using a scoring matrix– Find the alignment such that the total score is maximal

• But – how do we score a column with 5*T, 3*A, 1*_?– We would need an exponentially big scoring matrix

• Alternative: Sum-of-Pairs Score– We score an entire MSA

W th li t f h i f i th l– We score the alignment of each pair of sequences in the usual way– We aggregate over all pairs to score the MSA– We need a clever algorithm to find the MSA with the best scoreWe need a clever algorithm to find the MSA with the best score

Ulf Leser: Bioinformatics, Summer Semester 2011 7

Page 8: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Formallyy

• DefinitionDefinition– Let M be a MSA for the set S of k sequences S={s1,...,sk}– The alignment of si with sj induced by M is generated as follows

• Remove from M all rows except i and j• Remove all columns that contain only blanks

– The sum-of-pairs score (sop) of M is the sum of all pair-wiseThe sum of pairs score (sop) of M is the sum of all pair wise induced alignment scores

– The optimal MSA for S wrt. to sop is the MSA with the highest sop-ll ibl MSA f Sscore over all possible MSA for S

Ulf Leser: Bioinformatics, Summer Semester 2011 8

Page 9: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Examplep

d/i = 11

AAGAA_AAT_AATGCTG_G_G

5 }54

14r = 1m = 0

AAGAA_A 4 16}_ATAATGC_TGG_G

754 16}

• Given a MSA over k sequences of length l – how complex is it to compute its sop-score?

• How do we find the best MSA?

Ulf Leser: Bioinformatics, Summer Semester 2011 9

Page 10: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Analogygy

• Think of the k=2 caseThink of the k 2 case• Every alignment is a path

through the matrixg• The three possible

directions (down, right, down-right) conform to the three possible constellations in a columnconstellations in a column (XX, X_, _X)

• With growing paths we• With growing paths, we align growing prefixes of both sequences

Ulf Leser: Bioinformatics, Summer Semester 2011 10

q

Page 11: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Analogygy

• Assume k=3Assume k 3• Think of a 3-dimensional

cube with the three sequences giving the values in each dimension

• Now, we have paths aligning growing prefixes of three sequencesthree sequences

• Every column has seven possible constellationspossible constellations (XXX, XX_, X_X, _XX, X__, _X_, __X)

Ulf Leser: Bioinformatics, Summer Semester 2011 11

_ _, __ )

Page 12: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Dynamic Programming in two Dimensionsy g g

• We compute the best possible alignment d(i,j) for everyWe compute the best possible alignment d(i,j) for every pair of prefixes (lengths i,j) using the following formula

⎪⎬

⎫⎪⎨

⎧+−+−

= 1),1(1)1,(

min),( jidjid

jidInsertion in one seq

Insertion in the other seq⎪⎭

⎬⎪⎩

⎨+−− ),()1,1(),(),(

jitjidjj q

Match or mismatch

Ulf Leser: Bioinformatics, Summer Semester 2011 12

Page 13: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Dynamic Programming in three Dimensionsy g g

• We compute the best possible alignment d(i,j,k) for everyWe compute the best possible alignment d(i,j,k) for every triple of prefixes (lengths i,j,k ) using the following formula

d(i-1,j-1,k-1) + cij + cik + cjkd(i-1,j-1,k) + cij + 2d(i 1 j k 1) + c + 2

Three (mis)matchesOne (mis)match, two ins …

d(i,j,k) = mind(i-1,j,k-1) + cik + 2d(i,j-1,k-1) + cjk + 2d(i-1,j,k) + 2d(i j 1 k) + 2d(i,j-1,k) + 2d(i,j,k-1) + 2

Let cij = 0, if S1(i) = S2(j), else 1Let cik = 0, if S1(i) = S3(k), else 1Let c = 0 if S (j) = S (k) else 1

Ulf Leser: Bioinformatics, Summer Semester 2011 13

Let cjk = 0, if S2(j) = S3(k), else 1

Page 14: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

All Possible Stepsp

• d(i-1 j-1 k-1)1 G2 A• d(i-1,j-1,k-1)

• d(i,j-1,k-1)

2 A3 C 1 -

2 A3 C

• d(i,j,k-1)

• d(i,j-1,k)

1 -2 -3 C

1 -d(i,j 1,k)

• d(i-1,j,k)

1 -2 A3 -

1 G2 -

• d(i-1,j-1,k)

• d(i-1,j,k-1)

23 -

1 G2 A3 -1 G

2 -3 C

3 -

Ulf Leser: Bioinformatics, Summer Semester 2011 14

Page 15: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Concrete Examplesp

d(i,j-1,k) d(i-1,j,k-1)( ,j , )

1 G2 -

1 -2 A

( ,j, )

3 C3 -

• Best sop-score for d(i,j-1,k) is known

• Best sop-score for d(i-1, j,k-1) is known

• We want to compute d(i,j,k)• This requires to align one

symbol with two blanks

• We want to compute d(i,j,k)• This requires aligning a blank

with s1[i-1] and with s3[k-1]symbol with two blanks (blank/blank does not count)

• d(i,j,k) = d(i,j-1,k) + 2

with s1[i 1] and with s3[k 1] and to align s1[i-1] and s3[k-1]

• d(i,j,k) = d(i-1,j,k-1) + 2 + cik

Ulf Leser: Bioinformatics, Summer Semester 2011 15

Page 16: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Initialization

• We need to start somewhereWe need to start somewhere

• Of course, we have d(0,0,0)=0Of course, we have d(0,0,0) 0• Aligning in one dimension: d(i,0,0)=2*i

– Dito for d(0,j,0), d(0,0,k)( ,j, ), ( , , )

• Aligning in two dimensions: d(i,j,0)= …– Let da,b(i,j) be the alignment score for Sa[1..i] with Sb[1..j],

– d(i, j, 0) = d1,2(i, j) + (i+j) – d(i, 0, k) = d1,3(i, k) + (i+k)– d(0 j k) = d2 3(j k) + (j+k)d(0, j, k) = d2,3(j, k) + (j+k)

Ulf Leser: Bioinformatics, Summer Semester 2011 16

Page 17: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Algorithmg

initialize matrix d;for i := 1 to |S1|

for j := 1 to |S2|for k := 1 to |S3|

if (S1(i) = S2(j)) then c := 0; else c := 1;if (S1(i) = S2(j)) then cij := 0; else cij := 1;if (S1(i) = S3(k)) then cik := 0; else cik := 1;if (S2(j) = S3(k)) then cjk := 0; else cjk := 1;d1 := d[i – 1,j – 1,k - 1] + cij + cik + cjk;d2 := d[i – 1,j – 1,k] + cij + 2;d3 := d[i – 1,j,k - 1] + cik + 2;d4 := d[i,j – 1,k - 1] + cjk + 2;d : d[i 1 j k) + 2;d5 := d[i – 1,j,k) + 2;d6 := d[i,j – 1,k) + 2;d7 := d[i,j,k - 1) + 2;d[i,j,k] := min(d1, d2, d3, d4, d5, d6, d7);[ ,j, ] ( 1, 2, 3, 4, 5, 6, 7)

end for;end for;

end for;

Ulf Leser: Bioinformatics, Summer Semester 2011 17

Page 18: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Bad News – Complexity p y

• For 3 sequences of length n, we have to performFor 3 sequences of length n, we have to perform– There are n3 cells in the cube– For each cell (top-left-front corner), we need to look at 7 corners– Together: O(7*n3) computations

• For k sequences of length n– There are nk cell corners in the cube– For each corner, we need to look at 2k-1 other corners– Together: O(2k * nk) computations– Together: O(2k * nk) computations

• Actually the problem is NP-complete• Actually, the problem is NP-complete

Ulf Leser: Bioinformatics, Summer Semester 2011 18

Page 19: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

This Lecture

• Multiple Sequence AlignmentMultiple Sequence Alignment

• The problem• The problem• Theoretical approach: Sum-of-Pairs scores• Practical approach: ClustalW• Practical approach: ClustalW

Ulf Leser: Bioinformatics, Summer Semester 2011 19

Page 20: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Scoring a MSAg

• Let‘s take one step backLet s take one step back• What happened during evolution?

CTTGCA

GTTTCAGTTGCA

CTTGCA

GTTGACA

GTTTTCA GTTGTTA

GTATTTCAGTATTTCT

• Real number of events: 8

GTATTTCAGTATTTGA

CT TGC A• Real number of events: 8• sop-score: 2+3+6+6+2+…

– Everything is counted multiple times

_ _GT_TGACAGT_TGTTAGTATTTCT

Ulf Leser: Bioinformatics, Summer Semester 2011 20

Everything is counted multiple times GTATTTGA

Page 21: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Different Scoring Functiong

• If we knew the phylogenetic tree of the k sequencesIf we knew the phylogenetic tree of the k sequences– Align every parent with all its children– Aggregate all alignment scores– This gives the “real” number of evolutionary operations

• But: Finding the true phylogenetic tree requires a MSA– Not be covered here

• Use a heuristic: ClustalWHiggins D G and Sha p P M (1988) "CLUSTAL a package fo– Higgins, D. G. and Sharp, P. M. (1988). "CLUSTAL: a package for performing multiple sequence alignment on a microcomputer." Gene73(1): 237-44.Th J D Hi i D G d Gib T J (1994) "CLUSTAL W– Thompson, J. D., Higgins, D. G. and Gibson, T. J. (1994). "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice " Nucleic Acids Res 22(22): 4673 80

Ulf Leser: Bioinformatics, Summer Semester 2011 21

matrix choice." Nucleic Acids Res 22(22): 4673-80.

Page 22: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

ClustalW

• Main ideaMain idea– Compute a “good enough” phylogeny – the guide tree– Use the guide tree to iteratively align small MSA to larger MSA

• Starting from single sequences

– To escape the curse of dimensionality, compute MSA iteratively• “Progressive” MSA• Progressive MSA• Does not necessarily find the best solution• Needs a fast method to align two MSAs

• Works quite well in practice• Many other, newer (better) proposals

– DAlign, T-Coffee, HMMT, PRRT, MULTALIGN, ...

Ulf Leser: Bioinformatics, Summer Semester 2011 22

Page 23: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Step 1: Compute the Guide Treep p

• Compute all pair-wise alignments and store in similarityCompute all pair wise alignments and store in similarity matrix M– M[i,j] = sim(si, sj)

• Compute the guide tree using hierarchical clustering– Choose the smallest M[i,j]– Let si and sj form a new (next) branch of the tree – Compute the distance from the ancestor of si and sj to all other

sequences as the average of the distances to si and sjsequences as the average of the distances to si and sj

• Set M’ = M• Delete rows and columns i and j

Add l d (ij)• Add a new column and row (ij)• For all k≠ij: M‘[ij,k] = (M[i,k] + M[j,k]) / 2

– Iterate until M’ has only one column / row

Ulf Leser: Bioinformatics, Summer Semester 2011 23

y /

Page 24: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Sketch

A ABCDEFGA

A ACEFGaAB

CDE

AB.C..D...E

(B,D)→a

BCDE

AC.E..F...

(E,F)→b

ABCDEF

G

E....F.....G......

FG

G....a.....

EFG

ACGab ACGac A

AC.G..a...

(A,b)→c

BCDEF

CGacCG.a..c

(C,G)→d

ABCDE

b.... FG

c...

AB ae

EFG

ABacd

ac.d..

(d,c)→e BCDEF

(a,e)→fae

ae.

BCDEF

Ulf Leser: Bioinformatics, Summer Semester 2011 24

FG

FG

Page 25: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Examplep

A B C D EA

A 17 59 59 77

B 37 61 53

C 13 41

ABCD

D 21

A B E CD

E

AA B E CD

A 17 77 59

B 53 49

BCDEE 31 E

AB

AE CD AB

E 31 65

CD 54

BCDE

BCDE

Ulf Leser: Bioinformatics, Summer Semester 2011 25

E

Page 26: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Examplep

C PADKTNVKAAWGKVGAHAGEYGAD AADKTNVKAAWSKVGGHAGEYGAD AADKTNVKAAWSKVGGHAGEYGA

A PEEKSAVTALWGKVNVDEYGGB GEEKAAVLALWDKVNEEEYGGA

B

C

B GEEKAAVLALWDKVNEEEYGG

C PADKTNVKAAWG_KVGAHAGEYGAD AADKTNVKAAWS KVGGHAGEYGA

D

E

D AADKTNVKAAWS_KVGGHAGEYGAE AA__TNVKTAWSSKVGGHAPA__A

A PEEKSAV TALWG KVN VDEYGG_ _ __B GEEKAAV_LALWD_KVN__EEEYGG C PADKTNVKAA_WG_KVGAHAGEYGAD AADKTNVKAA WS KVGGHAGEYGA_ _E AA__TNVKTA_WSSKVGGHAPA__A

Once a gap, always a gap

Ulf Leser: Bioinformatics, Summer Semester 2011 26

Once a gap, always a gap

Page 27: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Step 2: Progressive MSAp g

• Pair-wise alignment in the order given by the guide treePair wise alignment in the order given by the guide tree • Aligning a MSA M1 with a MSA M2

– Use the usual (global) alignment algorithm (g ) g g– To score a column, compute the average score over all pairs of

symbols in this columns

E l• ExampleA …P…B G Score of this columnB …G…C …P…

Score of this column(2*s(P,A)+s(P,Y)+2*s(G,A)+s(G,Y)+

D …A…E …A…F …Y…

2*s(P,A)+s(P,Y) ) / 9

Ulf Leser: Bioinformatics, Summer Semester 2011 27

Page 28: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Issues

• There is a lot to say about whether hierarchical clusteringThere is a lot to say about whether hierarchical clustering actually computes the “correct” tree– See phylogenetic algorithms, „Algorithms in Bioinformatics“

• Clustal-W actually uses a different, more accurate algorithm called “neighbor-joining”

• Clustal-W is fast (O(k2*log(k))– For k sequences; plus cost for computing alignments, especially M

d b hi d i li• Idea behind progressive alignment– Find strong signals (highly conserved blocks) first

Outliers are added last– Outliers are added last– Increases the chances that conserved blocks survive– Several improvements to this scheme are known

Ulf Leser: Bioinformatics, Summer Semester 2011 28

p

Page 29: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Problems with progressive MSAp g

Ulf Leser: Bioinformatics, Summer Semester 2011 29

Source: Cedric Notredame, 2001

Page 30: Multiple Sequence AlignmentMultiple Sequence Alignment Sum ...€¦ · TTCA_GTG_GACGTG__GTA G__GTGCA_GAC___C____ Ulf Leser: Bioinformatics, Summer Semester 2011 4. Good MSA ... –

Further Readingg

• Merkl & Waack, chapter 13Merkl & Waack, chapter 13• Böckenhauer & Bongartz, chapter 5.3

Ulf Leser: Bioinformatics, Summer Semester 2011 30