CAP3
Algorithm lecture: CAP3
Felipe Rodrigues da Silva
Embrapa Recursos Genéticos e Biotecnologia

Sequencing

DNA polymerization
5' ATGCTTCTGGCAGATCTGAACAGTGTTACTGATATTGCTT 3'
   ||||||||||||||||||||||||||||||||||||||||
3' TACGAAGACCGTCTAGACTTGTCACAATGACTATAACGAA 5'
[Animation: with ATGCTTC already paired to the template, free A, T, G and C nucleotides are added to the new strand one at a time in the 5'→3' direction (ATGCTTCTG, ATGCTTCTGGCAGATCT, ...) until the full strand ATGCTTCTGGCAGATCTGAACAGTGTTACTGATATTGCTT has been synthesized.]
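
The pairing rule that the animation illustrates can be sketched in a few lines of Python. This is only an illustration: the function name `polymerize` is not from the lecture, and the template is the 3'→5' strand shown on the slide.

```python
# Watson-Crick pairing rules.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def polymerize(template_3to5):
    """Return the new strand (5'->3') built along a template given 3'->5'."""
    new_strand = []
    for base in template_3to5:         # the polymerase advances one base at a time
        new_strand.append(PAIR[base])  # and adds the complementary nucleotide
    return "".join(new_strand)

template = "TACGAAGACCGTCTAGACTTGTCACAATGACTATAACGAA"  # 3'->5', from the slide
print(polymerize(template))
```

The printed strand is the one the animation finishes with.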

1977: Walter Gilbert and Frederick Sanger
DNA sequencing

Dideoxynucleotide
dideoxynucleotide: triphosphate (P-P-P) linked through O-CH2 to the sugar that carries the BASE; H at the 3' position, H at the 2' position
deoxynucleotide: same backbone; OH at the 3' position, H at the 2' position

DNA sequencing

DNA polymerization with dideoxynucleotides
[Animation: extension starts as in ordinary polymerization, but once a dideoxy-T is incorporated the strand stops at ATGCTTCTGGCAGAT; the last frames show it unchanged while the free nucleotides keep moving.]

DNA sequencing
Template (3'→5'): TACGAAGACCGTCTAGACTTGTCACAATGACTATAACGAA
Strands synthesized in the presence of dideoxy-T, each ending in T:
ATGCTTCTGGCAGAT
ATGCTTCTGGCAGATCTGAACAGTGTTACTGAT
ATGCTTCTGGCAGATCTGAACAGTGT
ATGCTTCTGGCAGATCTGAACAGTGTTACTGATATTGCTT
ATGCTTCTGGCAGATCTGAACAGTGTTACTGATAT
ATGCTTCTGGCAGATCTGAACAGTGTTACT
ATGCTTCTGGCAGATCTGAACAGT
ATGCTTCTGGCAGATCTGAACAGTGTTACTGATATTGCT
ATGCTTCTGGCAGATCTGAACAGTGTTACTGATATT
ATGCTTCT
ATGCTTCTGGCAGATCT
ATGCTTCTGGCAGATCTGAACAGTGTTACTGATATTGCTT (full length)
ATGCTTCTGGCAGATCTGAACAGTGTT
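
The set of terminated strands can be reproduced with a short Python sketch. It assumes, as the slides suggest, that the first 7 bases (ATGCTTC) are already in place and that termination can happen at any later T; the function name and the `primer_len` parameter are illustrative.

```python
def terminated_fragments(product, primer_len, dd_base):
    """All strands that stop at an occurrence of dd_base beyond the primer."""
    return [product[:i + 1]
            for i in range(primer_len, len(product))
            if product[i] == dd_base]

product = "ATGCTTCTGGCAGATCTGAACAGTGTTACTGATATTGCTT"  # full-length new strand
frags = terminated_fragments(product, primer_len=7, dd_base="T")
for f in frags:
    print(len(f), f)
```

Each printed strand ends in T, and the lengths mark every position after the primer where the sequence has a T.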

DNA sequencing
Four reactions: G A T C
• In every reaction: template, polymerase, dNTPs
• One per reaction: ddGTPs, ddATPs, ddTTPs, ddCTPs

DNA sequencing
Strands ending in dideoxy-T:
ATGCTTCTGGCAGAT
ATGCTTCTGGCAGATCTGAACAGTGTTACTGAT
ATGCTTCTGGCAGATCTGAACAGTGT
ATGCTTCTGGCAGATCTGAACAGTGTTACTGATATTGCTT
ATGCTTCTGGCAGATCTGAACAGTGTTACTGATAT
ATGCTTCTGGCAGATCTGAACAGTGTTACT
ATGCTTCTGGCAGATCTGAACAGT
ATGCTTCTGGCAGATCTGAACAGTGTTACTGATATTGCT
ATGCTTCTGGCAGATCTGAACAGTGTTACTGATATT
ATGCTTCT
ATGCTTCTGGCAGATCT
ATGCTTCTGGCAGATCTGAACAGTGTTACTGATATTGCTT (full length)
ATGCTTCTGGCAGATCTGAACAGTGTT
Strands ending in dideoxy-G:
ATGCTTCTG
ATGCTTCTGG
ATGCTTCTGGCAG
ATGCTTCTGGCAGATCTG
ATGCTTCTGGCAGATCTGAACAG
ATGCTTCTGGCAGATCTGAACAGTG
ATGCTTCTGGCAGATCTGAACAGTGTTACTG
ATGCTTCTGGCAGATCTGAACAGTGTTACTGATATTG
Strands ending in dideoxy-C:
ATGCTTCTGGC
ATGCTTCTGGCAGATC
ATGCTTCTGGCAGATCTGAAC
ATGCTTCTGGCAGATCTGAACAGTGTTAC
ATGCTTCTGGCAGATCTGAACAGTGTTACTGATATTGC
Strands ending in dideoxy-A:
ATGCTTCTGGCA
ATGCTTCTGGCAGA
ATGCTTCTGGCAGATCTGA
ATGCTTCTGGCAGATCTGAA
ATGCTTCTGGCAGATCTGAACA
ATGCTTCTGGCAGATCTGAACAGTGTTA
ATGCTTCTGGCAGATCTGAACAGTGTTACTGA
ATGCTTCTGGCAGATCTGAACAGTGTTACTGATA
Gel lanes: G A T C
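
Reading the gel amounts to sorting the bands of all four lanes by length and noting which lane each band is in. The sketch below uses the strand lengths that appear on the slide for each lane; the function name `read_gel` is illustrative.

```python
# Strand lengths per lane, taken from the fragments on the slide.
lanes = {
    "G": [9, 10, 13, 18, 23, 25, 31, 37],
    "A": [12, 14, 19, 20, 22, 28, 32, 34],
    "T": [8, 15, 17, 24, 26, 27, 30, 33, 35, 36, 39, 40],
    "C": [11, 16, 21, 29, 38],
}

def read_gel(lanes):
    """Order all bands from shortest to longest and read off their lanes."""
    bands = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)

print(read_gel(lanes))
```

The result is the sequence that follows the initial ATGCTTC, read in the 5'→3' direction.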

Whole gel

Gel and chromatogram

DNA sequencing

Assembly

Shotgun
• Sample fragments of the target sequence as randomly as possible.
• Determine as much as possible of the sequence at the ends of these fragments.
Sanger F, Coulson AR, Hong GF, Hill DF, Petersen GB. (1982) Nucleotide sequence of bacteriophage lambda DNA. J Mol Biol 162(4): 729-773.
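
A rough Python sketch of the two shotgun steps, under simplifying assumptions: fragments have a fixed length, reads are error-free, and both end reads are kept on the same strand (in practice the second end is read from the opposite strand). All names and parameter values are illustrative.

```python
import random

def shotgun_reads(target, n_fragments, frag_len, read_len, seed=0):
    """Sample random fragments and return the sequence at both ends of each."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    reads = []
    for _ in range(n_fragments):
        start = rng.randrange(len(target) - frag_len + 1)
        fragment = target[start:start + frag_len]
        reads.append(fragment[:read_len])   # one end of the fragment
        reads.append(fragment[-read_len:])  # the other end
    return reads

target = "ATGCTTCTGGCAGATCTGAACAGTGTTACTGATATTGCTT"
for read in shotgun_reads(target, n_fragments=5, frag_len=20, read_len=8):
    print(read)
```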

Shotgun assembly
Original DNA

Main computational problem
Determine the layout of the fragment sequences that is most consistent with the overlaps found.
This is an NP-complete problem!
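
Because the exact problem is intractable, assemblers rely on heuristics. The sketch below is a generic greedy merge of the best-overlapping pair, shown only to make the problem concrete; it is not CAP3's method, it assumes error-free reads on one strand, and the names and the `min_len` threshold are illustrative.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: the remaining reads are separate contigs
        merged = reads[i] + reads[j][k:]
        reads = [r for n, r in enumerate(reads) if n not in (i, j)] + [merged]
    return reads

print(greedy_assemble(["ATGCTTCTGGCAG", "TGGCAGATCTGAAC", "CTGAACAGTGTTACTG"]))
```

A greedy choice can be misled by repeats and "false" overlaps, which is one reason real assemblers add further checks.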

To complicate matters...
1. The reads contain a certain percentage of errors
2. A read may come from either of the two strands
3. Sampling is biased: some regions are over- or under-represented
4. There are "false" overlaps
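
Point 2 means every read has to be considered in both orientations. A minimal Python sketch, assuming exact matching against a known target purely for illustration (the helper names are not from the lecture):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the opposite strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def find_read(read, target):
    """Return (position, strand) of a read in the target, trying both strands."""
    pos = target.find(read)
    if pos != -1:
        return pos, "+"
    pos = target.find(reverse_complement(read))
    if pos != -1:
        return pos, "-"
    return None

target = "ATGCTTCTGGCAGATCTGAACAGTGTTACTGATATTGCTT"
print(find_read("TGGCAGAT", target))  # read from the strand shown
print(find_read("ATCTGCCA", target))  # same region, read from the other strand
```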

CAP3: a DNA sequence assembly program
Xiaoqiu Huang (1) and Anup Madan (2)
(1) Department of Computer Science, Michigan Technological University, Houghton, Michigan;
(2) Department of Molecular Biotechnology, University of Washington, School of Medicine, Seattle, Washington
Genome Research 9: 868-877 (1999)

Three phases of the assembly algorithm
1. Removal of low-quality ends; computation of read overlaps; removal of "false" overlaps
2. Construction of contigs
3. Multiple alignment and consensus generation

Overlap identification
• All reads are concatenated into a single combined sequence
• High-scoring segments are found
– Separator character between reads
• Binary search over the sorted list of read positions in the combined sequence
• Portions before the current read are not analyzed
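
The bookkeeping behind the concatenation and the binary search can be sketched as follows. This only illustrates the idea, not CAP3's code; the separator `#` and the function names are assumptions.

```python
from bisect import bisect_right

SEP = "#"  # hypothetical separator; any character outside A, C, G, T works

def combine(reads):
    """Concatenate reads with a separator and record where each read starts."""
    starts, pos = [], 0
    for read in reads:
        starts.append(pos)
        pos += len(read) + len(SEP)
    return SEP.join(reads), starts

def read_at(starts, position):
    """Index of the read covering `position` in the combined sequence."""
    return bisect_right(starts, position) - 1

reads = ["ATGCTTC", "TGGCAGAT", "CTGAACAG"]
combined, starts = combine(reads)
print(combined)
print(starts)
print(read_at(starts, 10))
```

Since `starts` is sorted by construction, each lookup costs a logarithmic number of comparisons. In this sketch a separator position is attributed to the read before it.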

Computing read clipping positions
[Figure: reads h, f and g, with the 5' clip and 3' clip positions marked]

Weighted Smith-Waterman
• Match = m * min(q1, q2)
• Mismatch = n * min(q1, q2)
• Gap = -q * min(q1, q2)
where q1 and q2 are the quality values of the two bases involved
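
A minimal Python sketch of a local alignment with these quality-weighted scores. It makes several assumptions not stated on the slide: a linear (not affine) gap cost, the gap weight taken from the two bases at the current cell, and hypothetical parameter values m=2, n=-5 and gap factor g=6 (written `g` here to avoid confusion with the quality values).

```python
def weighted_sw(a, qa, b, qb, m=2, n=-5, g=6):
    """Best local alignment score with quality-weighted match/mismatch/gap."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            w = min(qa[i - 1], qb[j - 1])          # min(q1, q2)
            sub = (m if a[i - 1] == b[j - 1] else n) * w
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,   # match or mismatch
                          H[i - 1][j] - g * w,     # gap in b
                          H[i][j - 1] - g * w)     # gap in a
            best = max(best, H[i][j])
    return best

high = [10, 10, 10, 10]
print(weighted_sw("ACGT", high, "ACGT", high))            # identical reads
print(weighted_sw("ACGT", high, "AGGT", high))            # confident mismatch
print(weighted_sw("ACGT", [10, 1, 10, 10], "AGGT", high)) # low-quality mismatch
```

A mismatch at a low-quality base costs little, so the alignment runs through it, while a mismatch between two confident bases breaks the alignment.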

Computing read overlaps
• Done by global alignment
• Search band twice as wide as in the local alignment
• Evaluation
– length
– identity
– similarity score
– HQDs: each difference scores max[0, min(q1, q2) - b]; d = sum of these scores
– difference rate (r1 + r2 + e)
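
The HQD sum d can be computed directly from an aligned pair of reads. The sketch assumes the two reads are already aligned column by column with one quality value per column; the cutoff value b=12 and the function name are hypothetical.

```python
def hqd_score(a, qa, b, qb, cutoff_b=12):
    """Sum of max(0, min(q1, q2) - b) over the columns where the reads differ."""
    d = 0
    for x, q1, y, q2 in zip(a, qa, b, qb):
        if x != y:
            d += max(0, min(q1, q2) - cutoff_b)
    return d

a, qa = "ACGTA", [30, 30, 30, 30, 10]
b, qb = "ACCTT", [30, 30, 20, 30, 40]
print(hqd_score(a, qa, b, qb))
```

Only the G/C difference contributes (min(30, 20) - 12 = 8); the A/T difference involves a base of quality 10, below the cutoff, so it adds nothing.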

Use of constraints
1. Preliminary layout
2. Quality check
3. Correct the bad ones
• those with more than u problems
4. Join contigs
• those with more than v satisfied constraints
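
Alignment and consensus
• Each read is aligned, in increasing order of position, to the consensus assembled so far
• Quality-weighted sum for each base
– consensus base = base with the largest sum
– consensus quality = sum for base w - sum for base x - sum for base y ... (w is the consensus base; x, y, ... are the other bases in the column)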
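
For a single alignment column, the rule can be sketched as below. The function name is illustrative, and ties are resolved in favor of the base seen first, which is an arbitrary choice of this sketch.

```python
from collections import defaultdict

def consensus_column(bases, quals):
    """Consensus base and quality for one column of the multiple alignment."""
    sums = defaultdict(int)
    for base, q in zip(bases, quals):
        sums[base] += q                     # quality-weighted vote per base
    best = max(sums, key=sums.get)          # base with the largest sum
    quality = sums[best] - sum(v for k, v in sums.items() if k != best)
    return best, quality

print(consensus_column("AAGA", [30, 20, 40, 10]))
```

Here A collects 30 + 20 + 10 = 60 and G collects 40, so the consensus is A with quality 60 - 40 = 20.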

Average quality

Score calculation

CAP3 results
Data set  GenBank accession no.  No. of reads  Average length of reads  Length of provided sequence  Running time (min)  No. of large contigs  Length of CAP3 sequence  No. of differences
203       AC004669               1812          598                      89,779                       37                  1                     90,292                   0
216       AC004638               2353          614                      124,645                      154                 1                     132,057                  17
322F16    AF111103               4297          1011                     159,179                      127                 1                     157,982                  11
526N18    AF123462               3221          965                      180,182                      73                  2                     180,128                  10

Scaffold construction
Data set  Length of answer sequence  No. of reads per kb  Ability to make scaffold with CAP3  Ability to make scaffold with PHRAP
188A7     112,773                    10.6                 yes                                 yes
201G24    184,666                    10.8                 yes                                 yes
213L3     135,545                    10.3                 yes                                 yes
257P13    184,998                    10.1                 yes                                 yes
488C13    187,237                    11.1                 yes                                 yes
501I4     231,464                    11.7                 yes                                 no

Program  Data set  Longest contig  No. of large contigs  Length of gaps  Internal errors
CAP3 3XA 6,189 57 52,885 443
PHRAP 3XA 6,396 54 38,146 529
CAP3 3XB 12,368 44 71,761 71
PHRAP 3XB 13,116 47 60,436 228
CAP3 3XC 10,709 49 54,229 227
PHRAP 3XC 11,406 45 34,727 332
CAP3 3XD 11,408 43 67,586 115
PHRAP 3XD 11,350 49 60,312 240
CAP3 5XA 10,582 42 27,965 249
PHRAP 5XA 18,268 31 14,396 252
CAP3 5XB 26,034 17 10,405 100
PHRAP 5XB 33,693 18 7,322 115
CAP3 5XC 20,939 29 20,520 172
PHRAP 5XC 20,912 27 16,617 261
CAP3 5XD 14,219 35 23,635 46
PHRAP 5XD 14,696 33 17,113 129
CAP3 8XA 71,025 12 4,681 83
PHRAP 8XA 71,395 8 1,061 80
CAP3 8XB 53,127 8 883 59
PHRAP 8XB 53,078 7 542 36
CAP3 8XC 52,134 8 752 4
PHRAP 8XC 76,922 6 774 6
CAP3 8XD 72,690 7 1,241 35
PHRAP 8XD 102,523 6 648 60
CAP3 10XA 91,380 4 0 28
PHRAP 10XA 91,329 3 0 11
CAP3 10XB 167,655 1 0 5
PHRAP 10XB 138,551 2 0 7
CAP3 10XC 106,631 5 321 44
PHRAP 10XC 77,747 4 330 12
CAP3 10XD 79,900 4 468 2
PHRAP 10XD 79,978 3 346 2

Excellent review of assemblers
• http://students.cec.wustl.edu/~cs547/Literature/