Upload
russell-west
View
226
Download
4
Tags:
Embed Size (px)
Citation preview
Sequence comparison: Local alignment
Genome 559: Introduction to Statistical and Computational Genomics
Prof. William Stafford Noble
One-minute responses• It would be helpful to somehow get the solutions for the sample
problems in our lecture printouts.– You can see the solutions by visiting the class web page and opening
the slides there.• I am still a little confused about the difference between strings and
lists.– A string is like a tuple of characters. Unlike a string a list is (1) mutable
and (2) may contain other objects besides characters.• The Biotechniques paper went into a lot of detail -- how much of this
should we understand?– I intend the paper to provide background for those who are interested.
You should be sure you understand just what I go over in lecture.• I am slightly worried because I never seem to do things in the most
straightforward way.– This just takes practice. Often, there is no single best way.
One-minute responses• There was perhaps a bit too much
programming in this class.• There was more class time for
Python, which was nice.• I really liked the sample problem
times.• Problem set is very reasonable.• The examples and practice are
most useful teaching methods for me at least. I am getting comfortable with the code through practice.
• I like the sample problems. In the last few classes I felt rushed to finish them, but this time I was able to do all 3. It's very satisfying when they work.
• I had somewhat more difficulty with today's exercises. I think it was due to the inherent complexity of adding new types to the repertoire.
• Class moved at a good speed today.
• I enjoyed the pace today.• Today's pace was good.• The pace was good -- it was
helpful for me to have more time for problems.
• Good pace.• Programming problems were a
good speed today.• The biostats portion was a little
fast but manageable.
One-minute responses
• The cheat sheet really helped.
• I really liked the list of operations and methods on the back of the lecture notes.
• Lists of commands in slides were helpful.
• Reviewing the DP matrix was very helpful.
• I'm glad we reviewed the Needleman-Wunsch algorithm.
• The traceback review helped me realize I'd forgotten how to do it.
Local alignment
• A single-domain protein may be homologous to a region within a multi-domain protein.
• Usually, an alignment that spans the complete length of both sequences is not required.
BLAST allows local alignments
Global alignment
Local alignment
Global alignment DP
• Align sequence x and y.
• F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.
djiF
djiF
yxsjiF
jiF
F
ji
1,
,1
,1,1
max,
00,0
Local alignment DP
• Align sequence x and y.
• F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.
0
1,,1
,1,1
max,
00,0
djiFdjiF
yxsjiF
jiF
F
ji
Local DP in equation form
1,1 jiF
jiF , jiF ,1
1, jiF
d
d ji yxs ,
0
A simple example
A C G T
A 2 -7 -5 -7
C -7 2 -7 -5
G -5 -7 2 -7
T -7 -5 -7 2
A A G
A
G
C 1,1 jiF
jiF , jiF ,1
1, jiF
d
d ji yxs ,
Find the optimal local alignment of AAG and AGC.Use a gap penalty of d=-5.
0
A simple example
A C G T
A 2 -7 -5 -7
C -7 2 -7 -5
G -5 -7 2 -7
T -7 -5 -7 2
A A G
0 0 0 0
A 0
G 0
C 0 1,1 jiF
jiF , jiF ,1
1, jiF
d
d ji yxs ,
Find the optimal local alignment of AAG and AGC.Use a gap penalty of d=-5.
0
A simple example
A C G T
A 2 -7 -5 -7
C -7 2 -7 -5
G -5 -7 2 -7
T -7 -5 -7 2
A A G
0 0 0 0
A 0
G 0
C 0 1,1 jiF
jiF , jiF ,1
1, jiF
d
d ji yxs ,
Find the optimal local alignment of AAG and AGC.Use a gap penalty of d=-5.
0
0 -5
0-5
2
A simple example
A C G T
A 2 -7 -5 -7
C -7 2 -7 -5
G -5 -7 2 -7
T -7 -5 -7 2
A A G
0 0 0 0
A 0 2
G 0 ?
C 0 ? 1,1 jiF
jiF , jiF ,1
1, jiF
d
d ji yxs ,
Find the optimal local alignment of AAG and AGC.Use a gap penalty of d=-5.
0
A simple example
A C G T
A 2 -7 -5 -7
C -7 2 -7 -5
G -5 -7 2 -7
T -7 -5 -7 2
A A G
0 0 0 0
A 0 2 ? ?
G 0 0 ? ?
C 0 0 ? ? 1,1 jiF
jiF , jiF ,1
1, jiF
d
d ji yxs ,
Find the optimal local alignment of AAG and AGC.Use a gap penalty of d=-5.
0
A simple example
A C G T
A 2 -7 -5 -7
C -7 2 -7 -5
G -5 -7 2 -7
T -7 -5 -7 2
A A G
0 0 0 0
A 0 2 2 0
G 0 0 0 4
C 0 0 0 0 1,1 jiF
jiF , jiF ,1
1, jiF
d
d ji yxs ,
Find the optimal local alignment of AAG and AGC.Use a gap penalty of d=-5.
0
Local alignment
• Two differences with respect to global alignment:– No score is negative.– Traceback begins at the highest score in the
matrix and continues until you reach 0.
• Global alignment algorithm: Needleman-Wunsch.
• Local alignment algorithm: Smith-Waterman.
A simple example
A C G T
A 2 -7 -5 -7
C -7 2 -7 -5
G -5 -7 2 -7
T -7 -5 -7 2
A A G
0 0 0 0
A 0 2 2 0
G 0 0 0 4
C 0 0 0 0 1,1 jiF
jiF , jiF ,1
1, jiF
d
d ji yxs ,
Find the optimal local alignment of AAG and AGC.Use a gap penalty of d=-5.
0
AGAG
Local alignment
A C G T
A 2 -7 -5 -7
C -7 2 -7 -5
G -5 -7 2 -7
T -7 -5 -7 2
A A G
0 0 0 0
G 0
A 0
A 0
G 0
G 0
C 0
1,1 jiF
jiF , jiF ,1
1, jiF
d
d ji yxs ,
Find the optimal local alignment of AAG and GAAGGC.Use a gap penalty of d=-5.
0
Local alignment
A C G T
A 2 -7 -5 -7
C -7 2 -7 -5
G -5 -7 2 -7
T -7 -5 -7 2
A A G
0 0 0 0
G 0 0 0 2
A 0 2 2 0
A 0 2 4 0
G 0 0 0 6
G 0 0 0 2
C 0 0 0 0
1,1 jiF
jiF , jiF ,1
1, jiF
d
d ji yxs ,
Find the optimal local alignment of AAG and GAAGGC.Use a gap penalty of d=-5.
0
Local alignment
A C G T
A 2 -7 -5 -7
C -7 2 -7 -5
G -5 -7 2 -7
T -7 -5 -7 2
A A G
0 0 0 0
G 0 0 0 2
A 0 2 2 0
A 0 2 4 0
G 0 0 0 6
G 0 0 0 2
C 0 0 0 0
1,1 jiF
jiF , jiF ,1
1, jiF
d
d ji yxs ,
Find the optimal local alignment of AAG and GAAGGC.Use a gap penalty of d=-5.
0
AAGAAG
Summary
• Local alignment finds the best match between subsequences.
• Smith-Waterman local alignment algorithm:– No score is negative.– Trace back from the largest score in the
matrix.