Page 1

Intro to Alignment Algorithms: Global and Local

Algorithmic Functions of Computational Biology

Professor Istrail

Page 2

Sequence Comparison

Biomolecular sequences:

• DNA sequences (strings over the 4-letter alphabet {A, C, G, T})

• RNA sequences (strings over the 4-letter alphabet {A, C, G, U})

• Protein sequences (strings over the 20-letter alphabet of amino acids)

Sequence similarity helps in the discovery of genes, and the prediction of structure and function of proteins.

Page 3

The Basic Similarity Analysis Algorithm

Global Similarity

• Scoring Schemes

• Edit Graphs

• Alignment = Path in the Edit Graph

• The Principle of Optimality

• The Dynamic Programming Algorithm

• The Traceback


Page 4

Jupiter’s code: Alignment

Massa
Master

Mas-sa
Master

Mass-a
Master

Massa-
Master

Page 5

Sequence Alignment

Input: two sequences over the same alphabet

Output: an alignment of the two sequences

Example: GCGCATTTGAGCGA and TGCGTTAGGGTGACCA

A possible alignment:

-GCGCATTTGAGCGA--
TGCG--TTAGGGTGACC

Each column of an alignment is a match, a mismatch, or an indel.
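The three column types can be told apart mechanically: a column containing the gap symbol is an indel, a column with two equal letters is a match, and any other column is a mismatch. A minimal Python sketch of that rule follows; the function name `column_types` and the use of "-" as the gap symbol are assumptions made for illustration, not part of the slides.

```python
def column_types(top, bottom, gap="-"):
    """Label each column of an alignment as 'match', 'mismatch', or 'indel'."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    labels = []
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            labels.append("indel")      # a letter aligned with the gap symbol
        elif a == b:
            labels.append("match")      # two identical letters
        else:
            labels.append("mismatch")   # two different letters
    return labels


if __name__ == "__main__":
    # The alignment shown on the slide, written without the extra spaces.
    top = "-GCGCATTTGAGCGA--"
    bottom = "TGCG--TTAGGGTGACC"
    for a, b, label in zip(top, bottom, column_types(top, bottom)):
        print(a, b, label)
```

Running the script prints one line per column, which makes it easy to compare against the annotated figure.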

Page 6

Consider two sequences

$X = x_1 x_2 \ldots x_n$

$Y = y_1 y_2 \ldots y_m$

over the alphabet $\{A, C, G, T\}$, where each $x_i$ and $y_j$ belongs to this alphabet.

Page 7

Scoring Schemes

Unit-score

      A  C  G  T  -
  A   1  0  0  0  0
  C   0  1  0  0  0
  G   0  0  1  0  0
  T   0  0  0  1  0
  -   0  0  0  0  0

Page 8

Alignment

A C G
| | |
A G G

A|A: A is aligned with A

C|G: C is aligned with G

G|G: G is aligned with G

Under the unit-score scheme:

Score = $\delta(A,A) + \delta(C,G) + \delta(G,G) = 1 + 0 + 1 = 2$

Page 9

Gaps

“-” is the gap symbol

Without gaps:

ACATGGAAT
ACAGGAAAT
Score 7

AAAGGG
GGGAAA
Score 0

Optimal alignments:

ACATGG-AAT
ACA-GGAAAT
Score 8

---AAAGGG
GGGAAA---
Score 3

Page 10

$\delta(x, y)$ = the score for aligning x with y

$\delta(-, y)$ = the score for aligning - with y

$\delta(x, -)$ = the score for aligning x with -

Page 11

Alignment

A-CG-G
ATCGTG

Score

$\delta(A,A) + \delta(-,T) + \delta(C,C) + \delta(G,G) + \delta(-,T) + \delta(G,G)$

The score of an alignment is the sum of the scores of the pairwise aligned symbols.
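The rule that an alignment's score is the sum of its column scores can be checked directly. The sketch below assumes the unit-score scheme (1 for two identical letters, 0 for a mismatch or a gap); the names `unit_score` and `alignment_score` are illustrative choices, not taken from the slides.

```python
def unit_score(a, b, gap="-"):
    """Unit-score scheme: 1 for two identical letters, 0 for everything else."""
    return 1 if a == b and a != gap else 0


def alignment_score(top, bottom, score=unit_score):
    """Sum the scores of the aligned symbol pairs, one term per column."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    return sum(score(a, b) for a, b in zip(top, bottom))


if __name__ == "__main__":
    # Four matching columns and two gap columns.
    print(alignment_score("A-CG-G", "ATCGTG"))
    # The gapped alignment from the "Gaps" slide.
    print(alignment_score("ACATGG-AAT", "ACA-GGAAAT"))
```

Passing a different `score` function, for example one backed by a substitution matrix, changes the scheme without touching the summation.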

Page 12

Scoring Scheme

Dayhoff score

Partial alignment for Monkey and Trout somatotropin proteins:

...
PTIPLSRLFDNAMLRAHRLHQ
SAIENQRLFNIAVSRVQHLHL

      -   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  -  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8
  A  -8   3  -3   0   0  -3  -1   0   1  -3  -1  -3  -2  -2  -4   1   1   1  -7  -4   0
  R  ...
  N  ...
  D  ...

Page 13

Scoring Functions

Scoring function = a sum of terms, one for each pair of aligned residues and one for each gap

The meaning = the log of the relative likelihood that the sequences are related, compared to being unrelated

Identities and conservative substitutions are positive terms

Non-conservative substitutions are negative terms

Page 14

The Edit Graph

Suppose that we want to align AGT with AT

We are going to construct a graph where alignments between the two sequences correspond to paths between the Begin and End nodes of the graph.

This is the Edit Graph

Page 15

[Figure: a grid of nodes with columns numbered 0 to 3 along the sequence AGT and rows numbered 0 to 2 along the sequence AT]

AGT has length 3

AT has length 2

The Edit Graph has (3+1)*(2+1) = 12 nodes

Page 16

[Figure: the grid with columns 0 to 3 labeled A, G, T and rows 0 to 2 labeled A, T]

AGT indexes the columns, and AT indexes the rows of this “table”

Page 17

[Figure: the same grid for AGT (columns) and AT (rows)]

The graph is directed. The nodes (i,j) will hold values.

Page 18

[Figure: the same grid for AGT (columns) and AT (rows)]

Page 19

[Figure: the edit graph for AGT (columns 0 to 3) and AT (rows 0 to 2). Horizontal edges are labeled with a letter of AGT over a gap (A/-, G/-, T/-), vertical edges with a gap over a letter of AT (-/A, -/T), and diagonal edges with a pair of letters (A/A, G/A, T/A, A/T, G/T, T/T).]

Directed edges get as labels pairs of aligned letters.

Page 20

Alignment = Path in the Edit Graph

[Figure: the labeled edit graph for AGT (columns) and AT (rows), shown with the alignment below]

AGT
A-T

Every path from Begin to End corresponds to an alignment

Every alignment corresponds to a path between Begin and End
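The correspondence between paths and alignments can be made concrete by walking every Begin-to-End path and writing down the edge labels along it. The sketch below is a brute-force enumeration meant only for tiny inputs; the function name `alignments` is an illustrative choice, and the step directions follow the convention used here (AGT along the columns, AT along the rows).

```python
def alignments(x, y, gap="-"):
    """Yield every alignment of x and y as a (top, bottom) pair of strings.

    Each alignment is one Begin-to-End path: a horizontal step consumes a
    letter of x, a vertical step a letter of y, a diagonal step one of each.
    """
    if not x and not y:
        yield "", ""                       # reached the End node
        return
    if x and y:                            # diagonal edge labeled (x[0], y[0])
        for top, bottom in alignments(x[1:], y[1:], gap):
            yield x[0] + top, y[0] + bottom
    if x:                                  # horizontal edge labeled (x[0], -)
        for top, bottom in alignments(x[1:], y, gap):
            yield x[0] + top, gap + bottom
    if y:                                  # vertical edge labeled (-, y[0])
        for top, bottom in alignments(x, y[1:], gap):
            yield gap + top, y[0] + bottom


if __name__ == "__main__":
    paths = list(alignments("AGT", "AT"))
    print(len(paths), "paths")
    for top, bottom in paths:
        print(top, bottom)
```

The number of paths grows very quickly with sequence length, which is why the optimal path is found by dynamic programming rather than by listing all of them.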

Page 21

The Principle of Optimality

The optimal answer to a problem is expressed in terms of optimal answers to its sub-problems

Page 22

Dynamic Programming

Part 1: Compute first the optimal alignment score

Part 2: Construct optimal alignment

We are looking for the optimal alignment = maximal score path in the Edit Graph from the Begin vertex to the End vertex

Given: Two sequences X and Y

Find: An optimal alignment of X with Y

Page 23

The DP Matrix S(i,j)

[Figure: the grid for AGT (columns 0 to 3) and AT (rows 0 to 2), with the entries S(1,0) and S(2,1) marked]

Page 24

The DP Matrix

Matrix S = [S(i,j)]

S(i,j) = the score of the maximum-score path from the Begin vertex to the vertex (i,j)

The optimal path to (i,j) must pass through one of the vertices

(i-1,j-1)

(i-1,j)

(i,j-1)

[Figure: the vertex (i,j) with its three predecessors (i-1,j-1), (i-1,j), and (i,j-1)]

Page 25

Optimal path

[Figure: the vertex (i,j) with its three predecessors (i-1,j-1), (i-1,j), and (i,j-1)]

Optimal path to (i-1,j) + $(x_i, -)$

$S(i-1, j) + \delta(x_i, -)$

Page 26

Optimal path

[Figure: the vertex (i,j) with its three predecessors (i-1,j-1), (i-1,j), and (i,j-1)]

Optimal path to (i-1,j-1) + $(x_i, y_j)$

$S(i-1, j-1) + \delta(x_i, y_j)$

Page 27

Optimal path

[Figure: the vertex (i,j) with its three predecessors (i-1,j-1), (i-1,j), and (i,j-1)]

Optimal path to (i,j-1) + $(-, y_j)$

$S(i, j-1) + \delta(-, y_j)$

Page 28

The Basic ALGORITHM

$$
S(i,j) = \max
\begin{cases}
S(i-1, j-1) + \delta(x_i, y_j) \\
S(i-1, j) + \delta(x_i, -) \\
S(i, j-1) + \delta(-, y_j)
\end{cases}
$$
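The recurrence translates almost line by line into a table-filling loop. The sketch below is one possible rendering, not the course's reference code. It assumes the unit-score scheme, and it assumes the usual boundary values (the first row and column accumulate gap scores starting from S(0,0) = 0), which the slide does not state explicitly. The names `unit_score` and `global_score_matrix` are illustrative.

```python
def unit_score(a, b, gap="-"):
    """Unit-score scheme: 1 for two identical letters, 0 for everything else."""
    return 1 if a == b and a != gap else 0


def global_score_matrix(x, y, score=unit_score, gap="-"):
    """Fill S so that S[i][j] is the best score of aligning x[:i] with y[:j]."""
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    # Assumed boundary: only one predecessor exists along the first row/column.
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + score(x[i - 1], gap)
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + score(gap, y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # x_i aligned with y_j
                S[i - 1][j] + score(x[i - 1], gap),           # x_i aligned with a gap
                S[i][j - 1] + score(gap, y[j - 1]),           # y_j aligned with a gap
            )
    return S


if __name__ == "__main__":
    # Rows follow the first argument, so AT gives the rows and AGT the columns.
    for row in global_score_matrix("AT", "AGT"):
        print(row)
```

Each cell is computed once from three already-filled neighbors, so the work is proportional to the product of the two sequence lengths.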

Page 29

Optimal Alignment and Traceback

[Figure: the labeled edit graph for AGT (columns) and AT (rows), with the value S(i,j) written at each node]

         A  G  T
      0  0  0  0
   A  0  1  1  1
   T  0  1  1  2

Optimal Alignment

AGT
A-T
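Once the matrix is filled, an optimal alignment is recovered by walking back from the End vertex and, at each vertex, following a predecessor whose value explains the current one. The sketch below is a minimal version under the unit-score scheme. The tie-breaking order (diagonal first, then a gap in the second sequence, then a gap in the first) and the boundary values are assumptions, and other orders can return a different alignment with the same score. The name `global_alignment` is illustrative.

```python
def unit_score(a, b, gap="-"):
    """Unit-score scheme: 1 for two identical letters, 0 for everything else."""
    return 1 if a == b and a != gap else 0


def global_alignment(x, y, score=unit_score, gap="-"):
    """Return (optimal score, aligned x, aligned y) for a global alignment."""
    n, m = len(x), len(y)
    # Part 1: compute the optimal alignment score.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + score(x[i - 1], gap)
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + score(gap, y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                S[i - 1][j] + score(x[i - 1], gap),
                S[i][j - 1] + score(gap, y[j - 1]),
            )
    # Part 2: trace back from (n, m) to (0, 0), collecting columns in reverse.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + score(x[i - 1], gap):
            top.append(x[i - 1])
            bottom.append(gap)
            i -= 1
        else:
            top.append(gap)
            bottom.append(y[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(global_alignment("AGT", "AT"))
```

For AGT and AT this returns the score 2 together with the rows AGT and A-T, matching the optimal alignment on the slide.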

Page 30

The Basic ALGORITHM: Local Similarity

$$
S(i,j) = \max
\begin{cases}
0 \\
S(i-1, j-1) + \delta(x_i, y_j) \\
S(i-1, j) + \delta(x_i, -) \\
S(i, j-1) + \delta(-, y_j)
\end{cases}
$$

We add the 0 term to the global recurrence.
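The extra 0 lets an alignment start afresh at any cell, so the best local alignment ends wherever the matrix reaches its maximum. This variant is commonly known as the Smith-Waterman algorithm. The sketch below computes only the best local score and the cell where it is attained. It assumes a scoring scheme with negative terms (match +1, mismatch -1, gap -1), chosen for illustration because local similarity is only informative when poor columns are penalized, and it assumes that the first row and column are 0. The names `simple_score` and `local_similarity` are illustrative.

```python
def simple_score(a, b, gap="-"):
    """Assumed scheme for illustration: match +1, mismatch -1, gap -1."""
    if a == gap or b == gap:
        return -1
    return 1 if a == b else -1


def local_similarity(x, y, score=simple_score, gap="-"):
    """Return (best local score, (i, j)) where the best local alignment ends."""
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]   # row 0 and column 0 stay at 0
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                0,                                            # start a new local alignment
                S[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                S[i - 1][j] + score(x[i - 1], gap),
                S[i][j - 1] + score(gap, y[j - 1]),
            )
            if S[i][j] > best:
                best, best_cell = S[i][j], (i, j)
    return best, best_cell


if __name__ == "__main__":
    print(local_similarity("AAAGGG", "GGGAAA"))
```

A traceback for the local case would start at the returned cell and stop at the first cell whose value is 0.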

Page 31

General Scoring Schemes

Assumptions

1. Independence of mutations at different sites

Additive scoring scheme

2. Gaps of any length are considered one mutation

All of the efficient alignment algorithms employing the dynamic programming method are based fundamentally on the fact that the scoring function is additive.

Page 32

Substitution Matrices

$X = x_1 x_2 \ldots x_n$

$Y = y_1 y_2 \ldots y_m$

where each $x_i$ and $y_j$ belongs to the alphabet

Consider ungapped alignment of equal length sequences

Compute the probability that the two sequences are related

Compute the probability that the two sequences are not related

Compute the ratio of the two probabilities

Page 33

Random Model R

Every letter $z$ occurs independently with probability $q_z$

$$P(x, y \mid R) = \prod_i q_{x_i} \prod_j q_{y_j}$$

Algorithmic Functions of Computational Biology - Course 3

Professor Istrail

Page 34

Match Model M

Aligned pairs of residues $a, b$ occur with joint probability $p_{ab}$

$$P(x, y \mid M) = \prod_i p_{x_i y_i}$$

Page 35

Log-odds ratio

$$\frac{P(x, y \mid M)}{P(x, y \mid R)} = \prod_i \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}}$$

$$\log \frac{P(x, y \mid M)}{P(x, y \mid R)} = \sum_i s(x_i, y_i)$$

where

$$s(a, b) = \log \frac{p_{ab}}{q_a q_b}$$

The scores $s(a,b)$ form the substitution matrix
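The identity between the log of the probability ratio and the sum of per-column scores can be checked numerically. The sketch below uses a made-up two-letter alphabet with hypothetical probabilities, chosen only so that the joint table `p` has the background frequencies `q` as its marginals; none of these numbers come from the slides or from a real substitution matrix. The function names are illustrative.

```python
import math

# Hypothetical toy values over the alphabet {A, B}, for illustration only.
q = {"A": 0.6, "B": 0.4}                         # random model R
p = {("A", "A"): 0.5, ("A", "B"): 0.1,           # match model M
     ("B", "A"): 0.1, ("B", "B"): 0.3}


def substitution_matrix(p, q):
    """s(a, b) = log(p_ab / (q_a * q_b)) for every pair of letters."""
    return {(a, b): math.log(p_ab / (q[a] * q[b])) for (a, b), p_ab in p.items()}


def log_odds(x, y, p, q):
    """log of P(x, y | M) / P(x, y | R) for an ungapped, equal-length pair."""
    if len(x) != len(y):
        raise ValueError("ungapped alignment needs sequences of equal length")
    match = math.prod(p[(a, b)] for a, b in zip(x, y))
    random = math.prod(q[a] for a in x) * math.prod(q[b] for b in y)
    return math.log(match / random)


if __name__ == "__main__":
    s = substitution_matrix(p, q)
    x, y = "AABB", "ABBB"
    print(log_odds(x, y, p, q))
    print(sum(s[(a, b)] for a, b in zip(x, y)))   # same value up to rounding
```

With these toy numbers the identities A/A and B/B get positive scores and the substitution A/B gets a negative one, in line with the sign convention stated under Scoring Functions.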