Genome Revolution: COMPSCI 004G 8.1
BLAST
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html
What is BLAST? What is it good for?
  Basic Local Alignment Search Tool
  Given a query (DNA or protein), find “matches”
  What is a match? How do we judge a good one?
Two kinds of alignment, or matches
  Global alignment (sequence to sequence)
  Local alignment (subsequence to subsequence)
Global Alignment
Words explain it (see O’Reilly BLAST)
  Align ‘coelacanth’ and ‘pelican’
  Score +1 for match, -1 for mismatch, -1 for gap

  coelacanth    coelacanth
  p-elican--    -pelican--

What are the scores of these alignments? What’s the best score?
  Needleman-Wunsch algorithm
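To check the two candidate alignments by hand, it can help to score them column by column. The sketch below assumes the slide's scoring (+1 match, -1 mismatch, -1 gap) and uses `-` for a gap; the function name `alignment_score` is made up for illustration.

```python
def alignment_score(top, bottom, match=1, mismatch=-1, gap=-1):
    """Score two equal-length gapped strings, one column at a time."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap        # a letter lined up against a gap
        elif a == b:
            score += match      # identical letters
        else:
            score += mismatch   # two different letters
    return score


print(alignment_score("coelacanth", "p-elican--"))  # 0
print(alignment_score("coelacanth", "-pelican--"))  # 0
```

Both alignments have five matches and five penalties (mismatches or gaps), so under this scoring they tie at 0.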
Global Alignment

         C   O   E   L   A   C   A   N   T   H
     0  -1  -2  -3  -4  -5  -6  -7  -8  -9 -10
P   -1  -1  -2
E   -2  -2  -2  -1
L   -3
I   -4
C   -5
A   -6
N   -7                                       0
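The rest of the grid can be filled in mechanically. This is a minimal sketch of the Needleman-Wunsch fill step, assuming the same +1/-1/-1 scoring as the slides; each cell takes the best of the diagonal (match or mismatch), upper (gap), and left (gap) neighbors. The name `needleman_wunsch_matrix` is illustrative only.

```python
def needleman_wunsch_matrix(cols, rows, match=1, mismatch=-1, gap=-1):
    """Fill the global-alignment score matrix (rows x cols, plus a border)."""
    n, m = len(rows), len(cols)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        F[0][j] = j * gap          # top border: only gaps so far
    for i in range(1, n + 1):
        F[i][0] = i * gap          # left border: only gaps so far
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if rows[i - 1] == cols[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # align the two letters
                          F[i - 1][j] + gap,     # gap in the column word
                          F[i][j - 1] + gap)     # gap in the row word
    return F


F = needleman_wunsch_matrix("coelacanth", "pelican")
for label, row in zip(" pelican", F):
    print(label, " ".join(f"{v:3d}" for v in row))
print("best global score:", F[-1][-1])
```

The bottom-right cell holds the best global score, which comes out to 0 for these two words.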
Local Alignment
Subsequence alignment rather than global
  Advantages? Tradeoffs?
  Score +1 for match, -1 for mismatch, -1 for gap

  (co)ELACAN(th)
  (p)ELICAN

Smith-Waterman: initialize to zero, never let a cell score drop below zero, trace back from the highest score
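The three Smith-Waterman rules can be sketched as follows. This is a minimal illustration under the slide's +1/-1/-1 scoring, not a tuned implementation; the function name `smith_waterman` and the tie-breaking order in the trace-back (diagonal first) are choices made here.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best score, aligned piece of a, aligned piece of b)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]  # initialize to zero
    best, best_pos = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Floor at zero: a bad prefix is dropped instead of carried along.
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("coelacanth", "pelican"))  # (4, 'elacan', 'elican')
```

The result matches the slide's example: five matches and one mismatch (a against i) give a local score of 4.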
Local Alignment
[Smith-Waterman matrix for columns C O E L A C A N T H and rows P E L I C A N: top row initialized to 0, highest cell score 4]
Analysis
How long does this algorithm take to execute?
  How do we measure the complexity/size?
  Time v. memory
Do we need a different measure of gap, match, and mismatch?
  Just using +1 or -1 doesn’t provide domain-specific analysis
  In practice, use a scoring matrix; see the NCBI site
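Both questions can be explored with one small sketch. Filling the matrix touches every cell once, so time grows with len(a) * len(b); if only the score is needed (no trace-back), two rows are enough, so memory grows with len(b) alone. The scoring is passed in as a function so that +1/-1 can be swapped for something domain specific. The names `global_score`, `unit_score`, and `toy_dna_score` are hypothetical, and the DNA scores are invented for illustration, not taken from any published matrix.

```python
def global_score(a, b, score, gap=-1):
    """Needleman-Wunsch score only, keeping just two rows of the matrix."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            curr.append(max(prev[j - 1] + score(a[i - 1], b[j - 1]),
                            prev[j] + gap,
                            curr[j - 1] + gap))
        prev = curr          # the older row is no longer needed
    return prev[-1]


def unit_score(x, y):
    """The slide's scoring: +1 for a match, -1 for a mismatch."""
    return 1 if x == y else -1


PURINES = {"A", "G"}

def toy_dna_score(x, y):
    """Made-up DNA scoring: A<->G or C<->T swaps cost less than other swaps."""
    if x == y:
        return 2
    same_class = (x in PURINES) == (y in PURINES)
    return -1 if same_class else -2


print(global_score("coelacanth", "pelican", unit_score))        # 0
print(global_score("GATTACA", "GGTTACA", toy_dna_score, gap=-2))  # 11
```

A real protein search would replace `toy_dna_score` with a lookup into a published table such as BLOSUM 62.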
BLOSUM 62 scoring matrix
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=sef.figgrp.194