New Local Sequence Alignment - Home di homes.di.unimi.it · 2017. 10. 9. · Global vs local...

Local Sequence Alignment

Gabriella Trucco

Email: gabriella.trucco@unimi.it

Global Alignment

Global Alignment problem: seeks similarities between two entire strings

Useful when the similarity between the strings extends over their entire length

The score of an alignment between two substrings might be larger than the score of an alignment between the two entire input strings
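To make this concrete, the following minimal sketch computes both the global score and the best substring (local) score for one pair of strings. The function name `alignment_scores`, the two example sequences, and the use of the +1 / -1 / -1 scores from the exercise at the end of these slides are assumptions made for illustration only.

```python
def alignment_scores(v, w, match=1, mismatch=-1, gap=-1):
    """Return (global_score, best_local_score) for strings v and w."""
    n, m = len(v), len(w)

    # Global table: the first row and column accumulate gap penalties.
    g = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        g[i][0] = i * gap
    for j in range(1, m + 1):
        g[0][j] = j * gap

    # Local table: the first row and column stay 0.
    loc = [[0] * (m + 1) for _ in range(n + 1)]
    best_local = 0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = match if v[i - 1] == w[j - 1] else mismatch
            g[i][j] = max(g[i - 1][j - 1] + d,
                          g[i - 1][j] + gap,
                          g[i][j - 1] + gap)
            # Same three choices, plus the option to restart at score 0.
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + d,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])

    return g[n][m], best_local


# Hypothetical sequences sharing only the substring ACGT.
global_score, local_score = alignment_scores("GGGACGTCCC", "TTTACGTAAA")
print(global_score, local_score)  # prints: -2 4
```

Here the shared substring ACGT scores 4 on its own, while aligning the full strings also has to pay for the six dissimilar positions around it.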

Local Alignment Problem

Example: homeobox genes

How to find the conserved area and ignore the areas that show little similarity?

Motivation: Many genes are composed of domains, which are subsequences that perform a particular function.

1981: Temple Smith and Michael Waterman proposed a modification of the global sequence alignment dynamic programming algorithm that solves the Local Alignment problem

Global vs local alignment

Global and local alignments of two hypothetical genes that each have a conserved domain.

The local alignment has a much worse score according to the global scoring scheme, but it correctly locates the conserved domain.

Local Alignment problem

Local Alignment

Inefficient approach: find the longest path between every pair of vertices, and then select the longest of these computed paths

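Good approach: add edges of weight 0 from the source (0,0) to every other vertex in the edit graph, then find the longest paths from the source to every vertex; a single longest-path computation then covers every possible starting point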

The Smith-Waterman local alignment algorithm introduces edges of weight 0 (dashed lines) from the source vertex (0, 0) to every other vertex in the edit graph

Local Alignment

The largest value of $s_{i,j}$ over the whole edit graph represents the score of the best local alignment of v and w
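The zero-weight edges show up in the dynamic programming recurrence as one extra term. As a sketch, using s, v and w as above and assuming a scoring function $\delta$ (not defined in these slides) that gives the score of aligning two symbols or a symbol against a gap "-", the global and local recurrences can be written side by side:

```latex
% Global alignment: three incoming edges per vertex.
s_{i,j} = \max
\begin{cases}
  s_{i-1,j}   + \delta(v_i, -) \\
  s_{i,j-1}   + \delta(-, w_j) \\
  s_{i-1,j-1} + \delta(v_i, w_j)
\end{cases}

% Local alignment: the same three edges plus the weight-0 edge from (0,0).
s_{i,j} = \max
\begin{cases}
  0 \\
  s_{i-1,j}   + \delta(v_i, -) \\
  s_{i,j-1}   + \delta(-, w_j) \\
  s_{i-1,j-1} + \delta(v_i, w_j)
\end{cases}
```

With the scores used in the exercise, $\delta(x, x) = +1$, $\delta(x, y) = -1$ for $x \neq y$, and $\delta(x, -) = \delta(-, x) = -1$.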

Recall: global alignment matrix

Local alignment

Initialize first row and first column to be 0

The score of the best local alignment is the largest value in the entire array

To find the actual local alignment: start at an entry with the maximum score

Trace back as usual, stopping when we reach an entry with a score of 0
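The steps above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not a reference implementation: the function name `smith_waterman`, the linear gap penalty, the default scores (+1 match, -1 mismatch, -1 gap) and the example sequences are all chosen here for illustration, and when several entries or several trace-back moves tie, this sketch keeps the first one it encounters.

```python
def smith_waterman(v, w, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_v, aligned_w) for one best local alignment."""
    n, m = len(v), len(w)

    # Score table; the first row and first column stay 0.
    s = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(0,
                          s[i - 1][j - 1] + d,   # align v[i-1] with w[j-1]
                          s[i - 1][j] + gap,     # v[i-1] against a gap
                          s[i][j - 1] + gap)     # w[j-1] against a gap
            # Remember the first entry holding the largest value.
            if s[i][j] > best:
                best, best_i, best_j = s[i][j], i, j

    # Trace back from the maximum entry until an entry with score 0.
    out_v, out_w = [], []
    i, j = best_i, best_j
    while s[i][j] > 0:
        d = match if v[i - 1] == w[j - 1] else mismatch
        if s[i][j] == s[i - 1][j - 1] + d:
            out_v.append(v[i - 1])
            out_w.append(w[j - 1])
            i, j = i - 1, j - 1
        elif s[i][j] == s[i - 1][j] + gap:
            out_v.append(v[i - 1])
            out_w.append("-")
            i -= 1
        else:
            out_v.append("-")
            out_w.append(w[j - 1])
            j -= 1

    return best, "".join(reversed(out_v)), "".join(reversed(out_w))


# Hypothetical sequences sharing only the substring ACGT.
print(smith_waterman("GGGACGTCCC", "TTTACGTAAA"))  # prints: (4, 'ACGT', 'ACGT')
```

Both the table fill and the storage are proportional to the product of the two sequence lengths, the same as for global alignment.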

Example 1

Example 2

Other examples

Exercise

Given the two sequences

s: AACCTATAGCT

t: GCGATATA

and the following score values:

Gap penalty: -1

Match: +1

Mismatch: -1

compute a local sequence alignment of the input sequences.
