21
Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

Embed Size (px)

Citation preview

Page 1: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

Sequence comparison: Local alignment

Genome 559: Introduction to Statistical and Computational Genomics

Prof. William Stafford Noble

Page 2: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

One-minute responses• It would be helpful to somehow get the solutions for the sample

problems in our lecture printouts.– You can see the solutions by visiting the class web page and opening

the slides there.• I am still a little confused about the difference between strings and

lists.– A string is like a tuple of characters. Unlike a string a list is (1) mutable

and (2) may contain other objects besides characters.• The Biotechniques paper went into a lot of detail -- how much of this

should we understand?– I intend the paper to provide background for those who are interested.

You should be sure you understand just what I go over in lecture.• I am slightly worried because I never seem to do things in the most

straightforward way.– This just takes practice. Often, there is no single best way.

Page 3: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

One-minute responses• There was perhaps a bit too much

programming in this class.• There was more class time for

Python, which was nice.• I really liked the sample problem

times.• Problem set is very reasonable.• The examples and practice are

most useful teaching methods for me at least. I am getting comfortable with the code through practice.

• I like the sample problems. In the last few classes I felt rushed to finish them, but this time I was able to do all 3. It's very satisfying when they work.

• I had somewhat more difficulty with today's exercises. I think it was due to the inherent complexity of adding new types to the repertoire.

• Class moved at a good speed today.

• I enjoyed the pace today.• Today's pace was good.• The pace was good -- it was

helpful for me to have more time for problems.

• Good pace.• Programming problems were a

good speed today.• The biostats portion was a little

fast but manageable.

Page 4: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

One-minute responses

• The cheat sheet really helped.

• I really liked the list of operations and methods on the back of the lecture notes.

• Lists of commands in slides were helpful.

• Reviewing the DP matrix was very helpful.

• I'm glad we reviewed the Needleman-Wunsch algorithm.

• The traceback review helped me realize I'd forgotten how to do it.

Page 5: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

Local alignment

• A single-domain protein may be homologous to a region within a multi-domain protein.

• Usually, an alignment that spans the complete length of both sequences is not required.

Page 6: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

BLAST allows local alignments

Global alignment

Local alignment

Page 7: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

Global alignment DP

• Align sequence x and y.

• F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.

djiF

djiF

yxsjiF

jiF

F

ji

1,

,1

,1,1

max,

00,0

Page 8: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

Local alignment DP

• Align sequence x and y.

• F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.

0

1,,1

,1,1

max,

00,0

djiFdjiF

yxsjiF

jiF

F

ji

Page 9: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

Local DP in equation form

1,1 jiF

jiF , jiF ,1

1, jiF

d

d ji yxs ,

0

Page 10: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

A simple example

A C G T

A 2 -7 -5 -7

C -7 2 -7 -5

G -5 -7 2 -7

T -7 -5 -7 2

A A G

A

G

C 1,1 jiF

jiF , jiF ,1

1, jiF

d

d ji yxs ,

Find the optimal local alignment of AAG and AGC.Use a gap penalty of d=-5.

0

Page 11: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

A simple example

A C G T

A 2 -7 -5 -7

C -7 2 -7 -5

G -5 -7 2 -7

T -7 -5 -7 2

A A G

0 0 0 0

A 0

G 0

C 0 1,1 jiF

jiF , jiF ,1

1, jiF

d

d ji yxs ,

Find the optimal local alignment of AAG and AGC.Use a gap penalty of d=-5.

0

Page 12: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

A simple example

A C G T

A 2 -7 -5 -7

C -7 2 -7 -5

G -5 -7 2 -7

T -7 -5 -7 2

A A G

0 0 0 0

A 0

G 0

C 0 1,1 jiF

jiF , jiF ,1

1, jiF

d

d ji yxs ,

Find the optimal local alignment of AAG and AGC.Use a gap penalty of d=-5.

0

0 -5

0-5

2

Page 13: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

A simple example

A C G T

A 2 -7 -5 -7

C -7 2 -7 -5

G -5 -7 2 -7

T -7 -5 -7 2

A A G

0 0 0 0

A 0 2

G 0 ?

C 0 ? 1,1 jiF

jiF , jiF ,1

1, jiF

d

d ji yxs ,

Find the optimal local alignment of AAG and AGC.Use a gap penalty of d=-5.

0

Page 14: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

A simple example

A C G T

A 2 -7 -5 -7

C -7 2 -7 -5

G -5 -7 2 -7

T -7 -5 -7 2

A A G

0 0 0 0

A 0 2 ? ?

G 0 0 ? ?

C 0 0 ? ? 1,1 jiF

jiF , jiF ,1

1, jiF

d

d ji yxs ,

Find the optimal local alignment of AAG and AGC.Use a gap penalty of d=-5.

0

Page 15: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

A simple example

A C G T

A 2 -7 -5 -7

C -7 2 -7 -5

G -5 -7 2 -7

T -7 -5 -7 2

A A G

0 0 0 0

A 0 2 2 0

G 0 0 0 4

C 0 0 0 0 1,1 jiF

jiF , jiF ,1

1, jiF

d

d ji yxs ,

Find the optimal local alignment of AAG and AGC.Use a gap penalty of d=-5.

0

Page 16: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

Local alignment

• Two differences with respect to global alignment:– No score is negative.– Traceback begins at the highest score in the

matrix and continues until you reach 0.

• Global alignment algorithm: Needleman-Wunsch.

• Local alignment algorithm: Smith-Waterman.

Page 17: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

A simple example

A C G T

A 2 -7 -5 -7

C -7 2 -7 -5

G -5 -7 2 -7

T -7 -5 -7 2

A A G

0 0 0 0

A 0 2 2 0

G 0 0 0 4

C 0 0 0 0 1,1 jiF

jiF , jiF ,1

1, jiF

d

d ji yxs ,

Find the optimal local alignment of AAG and AGC.Use a gap penalty of d=-5.

0

AGAG

Page 18: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

Local alignment

A C G T

A 2 -7 -5 -7

C -7 2 -7 -5

G -5 -7 2 -7

T -7 -5 -7 2

A A G

0 0 0 0

G 0

A 0

A 0

G 0

G 0

C 0

1,1 jiF

jiF , jiF ,1

1, jiF

d

d ji yxs ,

Find the optimal local alignment of AAG and GAAGGC.Use a gap penalty of d=-5.

0

Page 19: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

Local alignment

A C G T

A 2 -7 -5 -7

C -7 2 -7 -5

G -5 -7 2 -7

T -7 -5 -7 2

A A G

0 0 0 0

G 0 0 0 2

A 0 2 2 0

A 0 2 4 0

G 0 0 0 6

G 0 0 0 2

C 0 0 0 0

1,1 jiF

jiF , jiF ,1

1, jiF

d

d ji yxs ,

Find the optimal local alignment of AAG and GAAGGC.Use a gap penalty of d=-5.

0

Page 20: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

Local alignment

A C G T

A 2 -7 -5 -7

C -7 2 -7 -5

G -5 -7 2 -7

T -7 -5 -7 2

A A G

0 0 0 0

G 0 0 0 2

A 0 2 2 0

A 0 2 4 0

G 0 0 0 6

G 0 0 0 2

C 0 0 0 0

1,1 jiF

jiF , jiF ,1

1, jiF

d

d ji yxs ,

Find the optimal local alignment of AAG and GAAGGC.Use a gap penalty of d=-5.

0

AAGAAG

Page 21: Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

Summary

• Local alignment finds the best match between subsequences.

• Smith-Waterman local alignment algorithm:– No score is negative.– Trace back from the largest score in the

matrix.