Dynamic Programming for Pairwise Alignment

Dr Alexei Drummond

Department of Computer Science

BIOSCI 359, Semester 2, 2006

Dynamic Programming

• method for solving combinatorial optimisation problems

• guaranteed to give optimal solution

• generalisation of “divide-and-conquer”

• relies on “Principle of Optimality”

i.e. a sub-optimal solution to a subproblem cannot be part of an optimal solution to the original problem instance.

Principle of Optimality

[Figure: Auckland, Te Kuiti and Wellington, illustrating the Principle of Optimality on a route between cities.]

Key to efficiency

• computation is carried out bottom-up

• store solutions to subproblems in a table

• all possible subproblems solved once each, beginning with smallest subproblems

• work up to original problem instance

• only optimal solutions to subproblems are used to compute solution to problem at next level

• DO NOT carry out computation in a plain recursive, top-down manner (without storing results): the same subproblems would be solved many times
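The cost of repeated subproblems can be made concrete with a small count. The sketch below is illustrative only: it assumes a hypothetical subproblem (i, j) that depends on (i-1, j-1), (i-1, j) and (i, j-1), and the function names are made up for this example. It compares the number of calls made by plain recursion with the number of table entries a bottom-up computation fills.

```python
def naive_calls(i, j):
    """Count the calls made by plain top-down recursion for subproblem (i, j).

    Assumed dependency pattern (hypothetical, for illustration): (i, j) needs
    (i-1, j-1), (i-1, j) and (i, j-1); row 0 and column 0 are base cases.
    """
    if i == 0 or j == 0:
        return 1
    return (1
            + naive_calls(i - 1, j - 1)
            + naive_calls(i - 1, j)
            + naive_calls(i, j - 1))


def table_entries(m, n):
    """Entries in a bottom-up table: every subproblem is solved exactly once."""
    return (m + 1) * (n + 1)


if __name__ == "__main__":
    for size in (2, 4, 8):
        print(size, naive_calls(size, size), table_entries(size, size))
```

The recursive count grows exponentially with the problem size, while the table grows only with the product of the two dimensions.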

Pairwise alignment

Sequences

x = a c g g t s
y = a w g c c t t

Alignment

x = a - c g g - t s
y = a w - g c c t t

Scoring

• Numeric score associated with each column

• Total score = sum of column scores

• Column types:

(1) Identical (+ve)
(2) Conservative (+ve)
(3) Non-conservative (-ve)
(4) Gap (-ve)

x = a - c g g - t s
y = a w - g c c t t
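As a rough illustration of "total score = sum of column scores", the sketch below scores the alignment shown above. The numeric values (+1 for identical, -1 for any other pair, -2 for a gap) are assumptions chosen for this example, not values from the lecture, and the sketch does not distinguish conservative from non-conservative substitutions.

```python
def column_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one alignment column (hypothetical scoring values)."""
    if a == "-" or b == "-":
        return gap          # column type (4): gap
    if a == b:
        return match        # column type (1): identical
    return mismatch         # types (2) and (3) are not separated here


def alignment_score(x_row, y_row, **scores):
    """Total score = sum of the column scores of two equal-length rows."""
    if len(x_row) != len(y_row):
        raise ValueError("aligned rows must have the same length")
    return sum(column_score(a, b, **scores) for a, b in zip(x_row, y_row))


if __name__ == "__main__":
    # The alignment from the slide, with "-" as the gap symbol.
    print(alignment_score("a-cgg-ts", "aw-gcctt"))
```

With these assumed values the eight columns contribute +1, -2, -2, +1, -1, -2, +1, -1.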

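Gap penalties

[Figure: alignment of x and y containing a gap of length g.]

• Linear score: $\gamma(g) = -gd$, where d is the gap penalty

• Affine score: $\gamma(g) = -d - (g-1)e$, where d is the gap-open penalty and e is the gap-extension penalty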
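The two gap score functions can be written down directly. This is a minimal sketch; the function names and the numeric penalties in the usage lines are assumptions for illustration.

```python
def linear_gap(g, d):
    """Linear gap score: gamma(g) = -g * d for a gap of length g."""
    return -g * d


def affine_gap(g, d, e):
    """Affine gap score: gamma(g) = -d - (g - 1) * e.

    d is the gap-open penalty and e the gap-extension penalty.
    """
    return -d - (g - 1) * e


if __name__ == "__main__":
    for g in (1, 2, 3):
        print(g, linear_gap(g, 8), affine_gap(g, 12, 2))
```

When e equals d the affine score reduces to the linear one; with e smaller than d, one long gap costs less than several short gaps of the same total length.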

BLOSUM50 matrix

“Blocks Amino Acid Substitution Matrix”

[Figure: the BLOSUM50 matrix; image not available.]

Needleman & Wunsch algorithm

• Dynamic programming algorithm for global alignment

• Needleman & Wunsch (’70), modified by Gotoh (’82)

Assumptions:

Linear gap score with gap penalty d

Symmetric scoring matrix S

s(a,b) = s(b,a): score from lining up a and b

s(a,-) = s(-,a) = -d: score from lining up a with -

Given sequences:

\[ X = (x_1, x_2, \ldots, x_m), \qquad Y = (y_1, y_2, \ldots, y_n) \]

Define:

F(i,j) = score of best alignment between $(x_1, x_2, \ldots, x_i)$ and $(y_1, y_2, \ldots, y_j)$

Optimal alignment

The optimal alignment of $x_1, x_2, x_3, \ldots, x_i$ with $y_1, y_2, y_3, \ldots, y_j$ has score $F(i,j)$.

Looks like:

\[
\begin{array}{ll}
x_1, x_2, x_3, \ldots, x_{i-1} & x_i \\
y_1, y_2, y_3, \ldots, y_{j-1} & y_j
\end{array}
\qquad \text{score } F(i-1, j-1) + s(x_i, y_j)
\]

or:

\[
\begin{array}{ll}
x_1, x_2, x_3, \ldots, x_i & - \\
y_1, y_2, y_3, \ldots, y_{j-1} & y_j
\end{array}
\qquad \text{score } F(i, j-1) - d
\]

or:

\[
\begin{array}{ll}
x_1, x_2, x_3, \ldots, x_{i-1} & x_i \\
y_1, y_2, y_3, \ldots, y_j & -
\end{array}
\qquad \text{score } F(i-1, j) - d
\]

so:

\[
F(i,j) = \max \begin{cases}
F(i-1, j-1) + s(x_i, y_j) \\
F(i, j-1) - d \\
F(i-1, j) - d
\end{cases}
\]

Basis:

\[ F(0,0) = 0 \]

$x_1, x_2, x_3, \ldots, x_i$ aligned with $-, -, -, \ldots, -$:

\[ F(i,0) = F(i-1,0) + s(x_i, -) \]

$-, -, -, \ldots, -$ aligned with $y_1, y_2, y_3, \ldots, y_j$:

\[ F(0,j) = F(0,j-1) + s(-, y_j) \]
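The basis and the recurrence together are enough to fill the whole F matrix. The following is a minimal sketch, not the lecture's own code: the function names, the +1/-1 scoring function and the gap penalty in the usage lines are assumptions for illustration. Sequences are Python strings, so x_i corresponds to x[i - 1].

```python
def simple_score(a, b):
    """Hypothetical substitution score: +1 if identical, -1 otherwise."""
    return 1 if a == b else -1


def nw_fill(x, y, s, d):
    """Fill the F matrix for global alignment with linear gap penalty d.

    F[i][j] is the best score for aligning x[:i] with y[:j].
    """
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    # Basis: a prefix aligned entirely against gaps, using s(a, -) = -d.
    for i in range(1, m + 1):
        F[i][0] = F[i - 1][0] - d
    for j in range(1, n + 1):
        F[0][j] = F[0][j - 1] - d
    # Recurrence: best of the three possible final columns.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i][j - 1] - d,
                          F[i - 1][j] - d)
    return F


if __name__ == "__main__":
    for row in nw_fill("AC", "AGC", simple_score, 1):
        print(row)
```

The bottom-right entry F[m][n] is the score of the best global alignment of the two full sequences.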

Filling up table

[Figure: animation of the F matrix being filled in. Rows 0, 1, 2, ..., m are indexed by X and columns 0, 1, 2, ..., n by Y, starting from F(0,0) = 0. The final entry F(m,n) is the optimal alignment score.]

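Constructing alignment

[Figure: the F matrix (rows 0, 1, 2, ..., m for X; columns 0, 1, 2, ..., n for Y) from which the alignment is constructed by tracing back through the table.]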
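One way to construct the alignment is to walk back from F(m,n) to F(0,0), at each cell checking which of the three recurrence cases produced its value. The sketch below assumes that approach; the function names and the scoring values are hypothetical, and when several cases tie it simply takes the first one that matches, so it returns one optimal alignment rather than all of them.

```python
def simple_score(a, b):
    """Hypothetical substitution score: +1 if identical, -1 otherwise."""
    return 1 if a == b else -1


def needleman_wunsch(x, y, s, d):
    """Global alignment with linear gap penalty d.

    Returns (score, x_row, y_row) with "-" as the gap symbol.
    """
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = F[i - 1][0] - d
    for j in range(1, n + 1):
        F[0][j] = F[0][j - 1] - d
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i][j - 1] - d,
                          F[i - 1][j] - d)

    # Trace back from F(m, n); columns are produced right to left.
    x_row, y_row = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            x_row.append(x[i - 1])
            y_row.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            x_row.append(x[i - 1])   # x_i aligned with a gap
            y_row.append("-")
            i -= 1
        else:
            x_row.append("-")        # y_j aligned with a gap
            y_row.append(y[j - 1])
            j -= 1
    return F[m][n], "".join(reversed(x_row)), "".join(reversed(y_row))


if __name__ == "__main__":
    score, x_row, y_row = needleman_wunsch("AC", "AGC", simple_score, 1)
    print(score)
    print(x_row)
    print(y_row)
```

Removing the gap symbols from the two returned rows gives back the original sequences, which is a convenient sanity check.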


Example

F matrix, with X = PAWHEAE indexing rows 0, 1, 2, ..., m and Y = HEAGAWGHEE indexing columns 0, 1, 2, ..., n:

F         H    E    A    G    A    W    G    H    E    E
     0   -8  -16  -24  -32  -40  -48  -56  -64  -72  -80
P   -8   -2   -9  -17  -25  -33  -42  -49  -57  -65  -73
A  -16  -10   -3   -4  -12  -20  -28  -36  -44  -52  -60
W  -24  -18  -11   -6   -7  -15   -5  -13  -21  -29  -37
H  -32  -14  -18  -13   -8   -9  -13   -7   -3  -11  -19
E  -40  -22   -8  -16  -16   -9  -12  -15   -7    3   -5
A  -48  -30  -16   -3  -11  -11  -12  -12  -15   -5    2
E  -56  -38  -24  -11   -6  -12  -14  -15  -12   -9    1

Alignment, built one column at a time from right to left while tracing back from the bottom-right entry:

Y   H E A G A W G H E - E
X   - - P - A W - H E A E

Time and space

[Figure: F matrix with rows 0, 1, 2, ..., m and columns 0, 1, 2, ..., n.]

(m+1) × (n+1) table entries ⇒ Θ(mn) space

Each entry computed in constant time ⇒ Θ(mn) time

Smith & Waterman algorithm

Computes local alignment.

i.e. look for the best alignment of subsequences of X and Y, ignoring scores of regions on either side

[Figure: X and Y with the best subsequence alignment marked.]

Given sequences

\[ X = (x_1, x_2, \ldots, x_m), \qquad Y = (y_1, y_2, \ldots, y_n) \]

Define F(i,j) = score of best suffix alignment between $(x_s, x_{s+1}, \ldots, x_i)$ where $s \le i$ and $(y_r, y_{r+1}, \ldots, y_j)$ where $r \le j$

N.B. Includes empty alignment with score 0

Dynamic Programming recurrences

The optimal alignment of $x_s, x_{s+1}, x_{s+2}, \ldots, x_i$ with $y_r, y_{r+1}, y_{r+2}, \ldots, y_j$ has score $F(i,j)$.

Looks like:

\[
\begin{array}{ll}
x_s, x_{s+1}, x_{s+2}, \ldots, x_{i-1} & x_i \\
y_r, y_{r+1}, y_{r+2}, \ldots, y_{j-1} & y_j
\end{array}
\qquad \text{score } F(i-1, j-1) + s(x_i, y_j)
\]

or:

\[
\begin{array}{ll}
x_s, x_{s+1}, x_{s+2}, \ldots, x_i & - \\
y_r, y_{r+1}, y_{r+2}, \ldots, y_{j-1} & y_j
\end{array}
\qquad \text{score } F(i, j-1) - d
\]

or:

\[
\begin{array}{ll}
x_s, x_{s+1}, x_{s+2}, \ldots, x_{i-1} & x_i \\
y_r, y_{r+1}, y_{r+2}, \ldots, y_j & -
\end{array}
\qquad \text{score } F(i-1, j) - d
\]

or the empty alignment, in which nothing of $x_s, \ldots, x_i$ and $y_r, \ldots, y_j$ is aligned: score 0

so:

\[
F(i,j) = \max \begin{cases}
0 \\
F(i-1, j-1) + s(x_i, y_j) \\
F(i, j-1) - d \\
F(i-1, j) - d
\end{cases}
\]

Basis:

\[ F(i,0) = F(0,j) = 0 \]
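The local-alignment recurrence differs from the global one only in the extra 0 option and the zero basis. The sketch below is one possible reading: it assumes the traceback starts at the largest entry of F and stops at the first entry equal to 0. The function names and scoring values (+2 match, -1 mismatch, d = 2) are assumptions for illustration.

```python
def match_score(a, b):
    """Hypothetical substitution score: +2 if identical, -1 otherwise."""
    return 2 if a == b else -1


def smith_waterman(x, y, s, d):
    """Best local alignment with linear gap penalty d.

    Returns (score, x_part, y_part) with "-" as the gap symbol.
    """
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]   # basis: F(i,0) = F(0,j) = 0
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i][j - 1] - d,
                          F[i - 1][j] - d)
            if F[i][j] > best:
                best, bi, bj = F[i][j], i, j

    # Trace back from the best entry until an entry with score 0 is reached.
    x_part, y_part = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            x_part.append(x[i - 1])
            y_part.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - d:
            x_part.append(x[i - 1])
            y_part.append("-")
            i -= 1
        else:
            x_part.append("-")
            y_part.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(x_part)), "".join(reversed(y_part))


if __name__ == "__main__":
    print(smith_waterman("TTACGTT", "GGACGAA", match_score, 2))
```

Because no entry can fall below 0, poorly matching regions on either side of the best subsequence alignment do not reduce its score.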

Example

F        H   E   A   G   A   W   G   H   E   E
     0   0   0   0   0   0   0   0   0   0   0
P    0   0   0   0   0   0   0   0   0   0   0
A    0   0   0   5   0   5   0   0   0   0   0
W    0   0   0   0   2   0  20  12   4   0   0
H    0  10   2   0   0   0  12  18  22  14   6
E    0   2  16   8   0   0   4  10  18  28  20
A    0   0   8  21  13   5   0   4  10  20  27
E    0   0   6  13  18  12   4   0   4  16  26

Alignment (the largest entry in the table is 28):

Y   A W G H E
X   A W - H E

Repeated (local) matches

Long sequences: interested in all local alignments with significant score, i.e. score > threshold T.

e.g. copies of repeated domain or motif in a protein.

X = sequence containing motif

Y = target sequence

Method is asymmetric

[Figure: target sequence Y with the regions matching parts of X marked.]

Given sequences

\[ X = (x_1, x_2, \ldots, x_m), \qquad Y = (y_1, y_2, \ldots, y_n) \]

Define F(i,j) (i ≥ 1) = best sum of match scores in $(x_1, x_2, \ldots, x_i)$ and $(y_1, y_2, \ldots, y_j)$, assuming $y_j$ is in a matched region and match ends in $x_i$ or $y_j$

Ends of matches

\[ F(0,0) = 0 \]

F(0,j) = best sum of completed match scores to $(y_1, y_2, \ldots, y_j)$, assuming that $y_j$ is not in a matched region

\[
F(0,j) = \max \begin{cases}
F(0, j-1) \\
F(i, j-1) - T, & i = 1, \ldots, m
\end{cases}
\]

Row 0 therefore marks unmatched regions and ends of matches in Y.

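General recurrence

\[
F(i,j) = \max \begin{cases}
F(0, j) & \text{start of new match} \\
F(i-1, j-1) + s(x_i, y_j) & \text{extension of previous match} \\
F(i, j-1) - d & \text{extension of previous match} \\
F(i-1, j) - d & \text{extension of previous match}
\end{cases}
\]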
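The row-0 rule and the general recurrence can be combined into a column-by-column fill. The sketch below is one reading of those recurrences, with some assumptions made explicit: F(i,0) is taken to be 0 for every i, X must be non-empty, and the final total is computed by applying the row-0 rule once more after the last column. The function names and the scoring values (+5 match, -4 mismatch, d = 6, T = 10) are hypothetical.

```python
def match_score(a, b):
    """Hypothetical substitution score: +5 if identical, -4 otherwise."""
    return 5 if a == b else -4


def repeated_matches(x, y, s, d, T):
    """Fill F for repeated matches of x (motif) in y (target).

    Rows i = 0..m index x, columns j = 0..n index y, threshold T.
    Returns (total, F), where total is the sum over all reported
    matches of (match score - T).
    """
    m, n = len(x), len(y)
    if m == 0:
        raise ValueError("x must be non-empty")
    F = [[0] * (n + 1) for _ in range(m + 1)]   # assumption: F(i,0) = 0
    for j in range(1, n + 1):
        # Row 0: stay unmatched, or end a match that scored above T.
        F[0][j] = max(F[0][j - 1],
                      max(F[i][j - 1] for i in range(1, m + 1)) - T)
        for i in range(1, m + 1):
            F[i][j] = max(F[0][j],                                   # new match
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),   # extend
                          F[i][j - 1] - d,
                          F[i - 1][j] - d)
    # Extra cell: apply the row-0 rule once more to close a match ending at y_n.
    total = max(F[0][n], max(F[i][n] for i in range(1, m + 1)) - T)
    return total, F


if __name__ == "__main__":
    total, F = repeated_matches("ACG", "ACGTTACG", match_score, 6, 10)
    print(total)
```

In this toy input the motif ACG occurs twice in the target; each occurrence scores 15 and contributes 15 - T to the total.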

Example

F        H   E   A   G   A   W   G   H   E   E
     0   0   0   0   1   1   1   1   1   3   9
P    0   0   0   0   1   1   1   1   1   3   9
A    0   0   0   5   1   6   1   1   1   3   9
W    0   0   0   0   2   1  21  13   5   3   9
H    0  10   2   0   1   1  13  19  23  15   9
E    0   2  16   8   1   1   5  11  19  29  21
A    0   0   8  21  13   6   1   5  11  21  28
E    0   0   6  13  18  12   4   1   5  17  27

Extra cell for final total score: 9

Alignment:

Y   H E A G A W G H E E
X   H E A . A W - H E .

Overlap matches

[Figure: four possible overlap configurations of X and Y.]

Don’t penalise overhanging ends, i.e. set F(i,0) = F(0,j) = 0

Otherwise:

\[
F(i,j) = \max \begin{cases}
F(i-1, j-1) + s(x_i, y_j) \\
F(i, j-1) - d \\
F(i-1, j) - d
\end{cases}
\]
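A sketch of the overlap variant follows. The zero basis and the recurrence are as stated above; the choice of end point is an assumption not spelled out in the slides, namely that the alignment ends at the largest entry in the last row or last column, so that the trailing overhang is not penalised either. The function names and scoring values are hypothetical.

```python
def simple_score(a, b):
    """Hypothetical substitution score: +1 if identical, -1 otherwise."""
    return 1 if a == b else -1


def overlap_score(x, y, s, d):
    """Best overlap match with linear gap penalty d.

    Returns (score, i, j), where (i, j) is the end cell of the match,
    taken (by assumption) from the last row or the last column of F.
    """
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]   # F(i,0) = F(0,j) = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i][j - 1] - d,
                          F[i - 1][j] - d)
    # Candidate end cells: anywhere on the bottom or right border.
    border = [(F[m][j], m, j) for j in range(n + 1)]
    border += [(F[i][n], i, n) for i in range(m + 1)]
    return max(border)


if __name__ == "__main__":
    # The suffix GTT of x overlaps the prefix GTT of y.
    print(overlap_score("ACGTT", "GTTCA", simple_score, 2))
```

Unlike local alignment there is no 0 option in the recurrence, so entries inside the table can be negative; only the overhanging ends are free.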

Example

F        H   E   A   G   A   W   G   H   E   E
     0   0   0   0   0   0   0   0   0   0   0
P    0  -2  -1  -1  -2  -1  -4  -2  -2  -1  -1
A    0  -2  -2   4  -1   3  -4  -4  -4  -3  -2
W    0  -3  -5  -4   1  -4  18  10   2   6  -6
H    0  10   2   6  -6  -1  10  16  20  12   4
E    0   2  16   8   0   7   2   8  16  26  18
A    0  -2   8  21  13   5   3   2   8  18  25
E    0   0   4  13  18  12   4   4   2  14  24

Alignment:

Y   G A W G H E E
X   P A W - H E A

Affine gap penalties

• Affine score: $\gamma(g) = -d - (g-1)e$, where d is the gap-open penalty and e is the gap-extension penalty

• Different penalties associated with extending alignment with gap symbol:

Y = C C T W P
X = C S T W -

different from

Y = C C T W P
X = C S T - -

General recurrence

\[
F(i,j) = \max \begin{cases}
F(i-1, j-1) + s(x_i, y_j) & \text{extend by matching } x_i \text{ and } y_j \\
F(k, j) + \gamma(i-k), \quad k = 0, 1, \ldots, i-1 & \text{extend by matching suffix of X to gap of length } i-k \\
F(i, k) + \gamma(j-k), \quad k = 0, 1, \ldots, j-1 & \text{extend by matching suffix of Y to gap of length } j-k
\end{cases}
\qquad (i, j > 0)
\]

Problem: Procedure runs in worst-case time $\Theta(n^3)$

$\Theta(n^2)$ version

Extra variables

M(i,j) = best score of alignment of $(x_1, x_2, \ldots, x_i)$ and $(y_1, y_2, \ldots, y_j)$ given that $x_i$ is aligned with $y_j$

$I_x(i,j)$ = best score of alignment of $(x_1, x_2, \ldots, x_i)$ and $(y_1, y_2, \ldots, y_j)$ given that $x_i$ is aligned with a gap

$I_y(i,j)$ = best score of alignment of $(x_1, x_2, \ldots, x_i)$ and $(y_1, y_2, \ldots, y_j)$ given that $y_j$ is aligned with a gap

Recurrences

\[
M(i,j) = \max \begin{cases}
M(i-1, j-1) + s(x_i, y_j) \\
I_x(i-1, j-1) + s(x_i, y_j) \\
I_y(i-1, j-1) + s(x_i, y_j)
\end{cases}
\qquad (i, j > 0)
\]

\[
I_x(i,j) = \max \begin{cases}
M(i-1, j) - d & x_i \text{ aligned to start of gap} \\
I_x(i-1, j) - e & x_i \text{ aligned to continuation of gap}
\end{cases}
\qquad (i, j > 0)
\]

\[
I_y(i,j) = \max \begin{cases}
M(i, j-1) - d & y_j \text{ aligned to start of gap} \\
I_y(i, j-1) - e & y_j \text{ aligned to continuation of gap}
\end{cases}
\qquad (i, j > 0)
\]

Procedure runs in worst-case time $\Theta(n^2)$
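The three recurrences can be filled side by side in a single pass over the table. The sketch below computes only the optimal global score. The boundary conditions are an assumption, since the slides give the recurrences for i, j > 0 only: here M(0,0) = 0, a leading gap of length g costs d + (g-1)e, and every other boundary entry is minus infinity. The function names and scoring values are hypothetical.

```python
NEG_INF = float("-inf")


def match_score(a, b):
    """Hypothetical substitution score: +2 if identical, -1 otherwise."""
    return 2 if a == b else -1


def affine_global_score(x, y, s, d, e):
    """Optimal global alignment score with affine gaps (open d, extend e)."""
    m, n = len(x), len(y)
    M = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    Ix = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    Iy = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    # Assumed boundary conditions (not given in the slides).
    M[0][0] = 0
    for i in range(1, m + 1):
        Ix[i][0] = -d - (i - 1) * e     # x_1..x_i against one leading gap
    for j in range(1, n + 1):
        Iy[0][j] = -d - (j - 1) * e     # y_1..y_j against one leading gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = s(x[i - 1], y[j - 1])
            M[i][j] = max(M[i - 1][j - 1],
                          Ix[i - 1][j - 1],
                          Iy[i - 1][j - 1]) + sub
            Ix[i][j] = max(M[i - 1][j] - d,      # x_i starts a gap
                           Ix[i - 1][j] - e)     # x_i continues a gap
            Iy[i][j] = max(M[i][j - 1] - d,      # y_j starts a gap
                           Iy[i][j - 1] - e)     # y_j continues a gap
    return max(M[m][n], Ix[m][n], Iy[m][n])


if __name__ == "__main__":
    # A-A, then CG against a gap of length 2, then T-T.
    print(affine_global_score("ACGT", "AT", match_score, 3, 1))
```

Each of the three tables has (m+1) × (n+1) entries and each entry takes constant time, which is where the quadratic running time comes from.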

Linear space alignment

Hirschberg’s insight

[Figure: F matrix with rows 0 to m and columns 0 to n, with row $\lfloor m/2 \rfloor$ marked.]
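The slide gives only the picture, so the following is not Hirschberg's full algorithm. It is a sketch of one fact that linear-space methods rely on: each row of F depends only on the row above it, so the scores in any given row (for example row ⌊m/2⌋, or the last row) can be computed while keeping just two rows in memory. The function names and scoring values are assumptions for illustration.

```python
def simple_score(a, b):
    """Hypothetical substitution score: +1 if identical, -1 otherwise."""
    return 1 if a == b else -1


def nw_last_row(x, y, s, d):
    """Last row of the global-alignment F matrix, using O(len(y)) space."""
    n = len(y)
    prev = [-j * d for j in range(n + 1)]          # row 0
    for i in range(1, len(x) + 1):
        curr = [-i * d] + [0] * n                  # F(i, 0) = -i * d
        for j in range(1, n + 1):
            curr[j] = max(prev[j - 1] + s(x[i - 1], y[j - 1]),
                          curr[j - 1] - d,
                          prev[j] - d)
        prev = curr                                # discard the older row
    return prev


if __name__ == "__main__":
    print(nw_last_row("AC", "AGC", simple_score, 1))
```

This yields scores only; recovering the alignment itself in linear space needs the additional divide-and-conquer step, which is not shown here.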
