Similar Techniques for Molecular Sequencing and Network Security
Doug Madory
27 APR 05
Big Picture
  Protein Structure
  Sequencing using Profile HMM

Big Picture
  PQS for Network Security (Us)
    Design HMM for network event
    Find event within linear stream of observed network events
  Sequencing using Profile HMM (Bioinformatics)
    Train HMM using known information about subsequence
    Find subsequence within linear protein / genome sequence

  Q: Did an event happen?
  Q: If it exists, where is sequence?
Profile HMM - Simple Case
  Train HMM
  Viterbi Scoring
  Backtrace Viterbi

  Query: A-, AA, TA
  DB: ATA
HMM Training

Build HMM with 2 M states because there are 2 columns in query

[Diagram: Begin, M1, M2, End, with insert states I0, I1, I2 and delete states D1, D2; M1 and M2 each carry an emission table over A, C, G, T]
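The two-column layout on this slide can be enumerated programmatically. The following is a minimal sketch, assuming the usual profile-HMM wiring in which every match, insert and delete state may move to the next match state, the next delete state, or the insert state of its own column. The function name `profile_hmm_transitions` and the state labels are illustrative and not taken from the deck.

```python
def profile_hmm_transitions(n_match):
    """List the allowed (source, target) transitions for n_match match columns."""
    transitions = []
    for k in range(n_match + 1):
        # Sources in column k: Begin for k = 0, otherwise M_k and D_k, plus I_k.
        sources = ["Begin"] if k == 0 else [f"M{k}", f"D{k}"]
        sources.append(f"I{k}")
        # Targets: the next match/delete column, or End after the last column,
        # plus the insert state of the current column.
        if k < n_match:
            targets = [f"M{k + 1}", f"D{k + 1}", f"I{k}"]
        else:
            targets = ["End", f"I{k}"]
        for source in sources:
            for target in targets:
                transitions.append((source, target))
    return transitions


transitions = profile_hmm_transitions(2)
states = sorted({state for pair in transitions for state in pair})
print(len(states), "states,", len(transitions), "transitions")
```

For the query used here (2 columns) this yields the nine states named on the slide: Begin, End, M1, M2, I0, I1, I2, D1 and D2.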
HMM Training

Step 1 – add pseudocount to each transition and emission

[Diagram: same HMM; every transition count is 1]

Emission counts:
        A  C  G  T
  M1    1  1  1  1
  M2    1  1  1  1
HMM Training

Step 2 – train with A-

[Diagram: same HMM; transition counts Begin→M1 = 2, M1→D2 = 2, D2→End = 2, all others 1]

Emission counts:
        A  C  G  T
  M1    2  1  1  1
  M2    1  1  1  1
HMM Training

Step 3 – train with AA

[Diagram: same HMM; transition counts Begin→M1 = 3, M1→M2 = 2, M2→End = 2, M1→D2 = 2, D2→End = 2, all others 1]

Emission counts:
        A  C  G  T
  M1    3  1  1  1
  M2    2  1  1  1
HMM Training

Step 4 – train with TA

[Diagram: same HMM; transition counts Begin→M1 = 4, M1→M2 = 3, M2→End = 3, M1→D2 = 2, D2→End = 2, all others 1]

Emission counts:
        A  C  G  T
  M1    3  1  1  2
  M2    3  1  1  1
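The four training steps amount to counting: start every count at the pseudocount, then walk each query row through the model and increment what it uses. A minimal sketch follows, assuming every query column is a match column (a letter goes to M_k, a gap `-` goes to D_k) and that no training row uses an insert state. The names `train_counts`, `emissions` and `transitions` are illustrative.

```python
from collections import defaultdict

ALPHABET = "ACGT"


def train_counts(aligned_rows, pseudocount=1):
    """Count match emissions and transitions for gapped rows of equal length."""
    n_cols = len(aligned_rows[0])
    # Step 1: every match emission starts at the pseudocount.
    emissions = {k: {ch: pseudocount for ch in ALPHABET}
                 for k in range(1, n_cols + 1)}
    # Transitions never touched by training implicitly keep the pseudocount.
    transitions = defaultdict(lambda: pseudocount)
    for row in aligned_rows:
        previous = "Begin"
        for k, ch in enumerate(row, start=1):
            if ch == "-":
                state = f"D{k}"          # gap: delete state, nothing emitted
            else:
                state = f"M{k}"          # letter: match state emits it
                emissions[k][ch] += 1
            transitions[(previous, state)] += 1
            previous = state
        transitions[(previous, "End")] += 1
    return emissions, transitions


emissions, transitions = train_counts(["A-", "AA", "TA"])
print("M1:", emissions[1])
print("M2:", emissions[2])
print("Begin->M1:", transitions[("Begin", "M1")])
```

With the query rows A-, AA and TA this reproduces the emission counts shown for step 4 (M1: A 3, T 2; M2: A 3).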
HMM Training

Fully trained HMM

[Diagram: same HMM; transition counts Begin→M1 = 4, M1→M2 = 3, M2→End = 3, M1→D2 = 2, D2→End = 2, all others 1]

Emission counts:
        A  C  G  T
  M1    3  1  1  2
  M2    3  1  1  1
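Scoring works with log probabilities, so the trained counts have to be normalised per state and then logged. The sketch below assumes base-10 logs and assumes the assignment of counts to arrows that follows from training on A-, AA and TA with pseudocount 1. `COUNTS`, `log10_probs` and `LOG_A` are illustrative names.

```python
import math

# Outgoing transition counts per state. The mapping of counts to arrows is an
# assumed reading of the trained model (pseudocount 1 plus A-, AA, TA).
COUNTS = {
    "Begin": {"M1": 4, "I0": 1, "D1": 1},
    "I0": {"M1": 1, "I0": 1, "D1": 1},
    "M1": {"M2": 3, "I1": 1, "D2": 2},
    "M2": {"End": 3, "I2": 1},
    "D2": {"End": 2, "I2": 1},
}


def log10_probs(row):
    """Normalise one state's outgoing counts and return log10 probabilities."""
    total = sum(row.values())
    return {target: math.log10(count / total) for target, count in row.items()}


LOG_A = {state: log10_probs(row) for state, row in COUNTS.items()}
for state, row in LOG_A.items():
    print(state, {target: round(value, 3) for target, value in row.items()})
```

Begin→M1 comes out near -0.176 and Begin→I0 near -0.778. The scoring slides quote -0.17 and -0.78, which is consistent with base-10 logs cut to two decimals.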
Viterbi Scoring

[Matrix: columns are the observations X (position 0), A (1), T (2), A (3); rows are the states B/I0, M1, I1, D1, M2, I2, D2, E. Legend: Insert, Match and Delete moves; Illegal Moves.]

V_B = 0
V_I0(1) = log a(B→I0)
V_M1(0) = 0
V_I1(0) = 0
V_D1(0) = log a(B→D1)
Viterbi Scoring

Matrix so far (columns: observations, rows: states; M2, I2, D2 and E still empty):

          X (0)    A (1)    T (2)    A (3)
  B/I0     0       -0.78
  M1       0
  I1       0
  D1      -0.78

V_I0(2) = V_I0(1) + log a(I0→I0)
V_I0(3) = V_I0(2) + log a(I0→I0)
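The I0 row is a running sum: each further observation absorbed by I0 adds one I0→I0 transition. A small sketch, assuming base-10 logs, Begin counts of 4/1/1 (M1/I0/D1), I0 counts of 1/1/1, and an insert emission log-odds of 0 (insert emissions equal to the background). The variable names are illustrative.

```python
import math

log_a_b_i0 = math.log10(1 / 6)    # Begin -> I0, assumed counts 4, 1, 1
log_a_i0_i0 = math.log10(1 / 3)   # I0 -> I0, assumed counts 1, 1, 1

observations = "ATA"
v_i0 = {1: log_a_b_i0}            # V_I0(1) = log a(B->I0)
for i in range(2, len(observations) + 1):
    # Insert emissions are assumed to score 0, so only the transition is added.
    v_i0[i] = v_i0[i - 1] + log_a_i0_i0

for i, value in v_i0.items():
    print(f"V_I0({i}) = {value:.3f}")
```

Unrounded, the three values are about -0.778, -1.255 and -1.732; adding the two-decimal figures -0.78 and -0.47 instead gives -1.25 and -1.72.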
Viterbi Scoring

Matrix so far (columns: observations, rows: states; M2, I2, D2 and E still empty):

          X (0)    A (1)    T (2)    A (3)
  B/I0     0       -0.78    -1.25    -1.72
  M1       0
  I1       0
  D1      -0.78

V_M1(1) = log(e(A)/q) + V_B + log a(B→M1)
V_M1(1) = log((3/7)/(1/4)) + 0 - 0.17
V_M1(1) = 0.23 - 0.17 = 0.06
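The match-state score is a log-odds emission term plus the best predecessor score plus the transition into the state. A short sketch of this one cell, assuming base-10 logs; `match_score` is an illustrative helper, and 4/6 is the assumed Begin→M1 probability behind the -0.17 on the slide.

```python
import math


def match_score(emission_prob, background_prob, prev_score, log_transition):
    """Log-odds of the emission, plus predecessor score, plus log transition."""
    return math.log10(emission_prob / background_prob) + prev_score + log_transition


# e(A) = 3/7 in M1, q = 1/4, V_B = 0, a(B->M1) = 4/6 (assumed from the counts)
v_m1_1 = match_score(3 / 7, 1 / 4, 0.0, math.log10(4 / 6))
print(f"V_M1(1) = {v_m1_1:.3f}")
```

The unrounded result is about 0.058, which agrees with the 0.06 obtained from the two-decimal terms.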
Viterbi Scoring

Matrix so far (columns: observations, rows: states; M2, I2, D2 and E still empty):

          X (0)    A (1)    T (2)    A (3)
  B/I0     0       -0.78    -1.25    -1.72
  M1       0        0.06
  I1       0
  D1      -0.78

V_D1(1) = V_I0(1) + log a(I0→D1)
V_D1(1) = -0.78 - 0.47 = -1.25

V_I1(1) = 0 + max { V_M1(0) + log a(M1→I1),
                    V_I1(0) + log a(I1→I1),
                    V_D1(0) + log a(D1→I1) }
V_I1(1) = 0 + max { 0 - 0.78, 0 - 0.47, -0.78 - 0.47 }
V_I1(1) = -0.47
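The insert-state cell takes the maximum over its three possible predecessors. The sketch below reproduces that step under the slide's own convention that the boundary cells V_M1(0) and V_I1(0) are 0, with assumed base-10 log transitions of log(1/6) for M1→I1 and log(1/3) for I1→I1 and D1→I1. The dictionary name `candidates` is illustrative.

```python
import math

v_m1_0 = 0.0                      # boundary value used on the slide
v_i1_0 = 0.0                      # boundary value used on the slide
v_d1_0 = math.log10(1 / 6)        # V_D1(0) = log a(B->D1)

candidates = {
    "M1": v_m1_0 + math.log10(1 / 6),   # M1 -> I1, assumed counts 3, 1, 2
    "I1": v_i1_0 + math.log10(1 / 3),   # I1 -> I1, assumed counts 1, 1, 1
    "D1": v_d1_0 + math.log10(1 / 3),   # D1 -> I1, assumed counts 1, 1, 1
}
best_prev = max(candidates, key=candidates.get)
v_i1_1 = 0.0 + candidates[best_prev]    # insert emission log-odds assumed 0
print(best_prev, round(v_i1_1, 3))
```

Recording `best_prev` alongside the score is what later makes a backtrace possible: following these pointers from the End state recovers the path.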
Viterbi Scoring

Matrix so far (columns: observations, rows: states; M2, I2, D2 and E still empty):

          X (0)    A (1)    T (2)    A (3)
  B/I0     0       -0.78    -1.25    -1.72
  M1       0        0.06
  I1       0       -0.47
  D1      -0.78    -1.25
Viterbi Scoring

Completed matrix (columns: observations, rows: states):

          X (0)    A (1)    T (2)    A (3)
  B/I0     0       -0.78    -1.25    -1.72
  M1       0        0.06    -1.19    -1.49
  I1       0       -0.47    -0.72    -1.19
  D1      -0.78    -1.25    -1.72    -1.25
  M2       0       -0.47    -0.41    -1.19
  I2       0       -1.85    -1.07    -1.01
  D2      -1.25    -1.25    -0.58    -1.36

  V_E = -1.31
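For reference, the whole scoring pass plus the backtrace fits in a short routine. The following is a sketch of textbook profile-HMM Viterbi in log-odds form, not the deck's own demo: it assumes base-10 logs, the count-to-arrow mapping implied by training on A-, AA and TA, insert emissions equal to the background, and it treats unreachable cells as absent rather than 0. Its scores are therefore not expected to match the matrix above digit for digit. All names (`N_COLS`, `EMIS`, `TRANS`, `log_a`, `log_odds`, `relax`, `viterbi`) are illustrative.

```python
import math

N_COLS = 2
BACKGROUND = 0.25   # uniform background frequency q for A, C, G, T

# Match emission counts after training (pseudocount 1 plus A-, AA, TA).
EMIS = {1: {"A": 3, "C": 1, "G": 1, "T": 2},
        2: {"A": 3, "C": 1, "G": 1, "T": 1}}
# Key: (source kind, source column, target kind). "M" in column 0 is Begin and
# a target "M" leaving the last column is End. Missing keys keep pseudocount 1.
TRANS = {("M", 0, "M"): 4, ("M", 1, "M"): 3, ("M", 1, "D"): 2,
         ("M", 2, "M"): 3, ("D", 2, "M"): 2}


def log_a(kind, k, target):
    targets = "MID" if k < N_COLS else "MI"   # no delete after the last column
    total = sum(TRANS.get((kind, k, t), 1) for t in targets)
    return math.log10(TRANS.get((kind, k, target), 1) / total)


def log_odds(k, symbol):
    return math.log10(EMIS[k][symbol] / sum(EMIS[k].values()) / BACKGROUND)


def viterbi(seq):
    V = {("M", 0, 0): 0.0}   # reachable cells: (kind, column, symbols consumed)
    back = {}

    def relax(cell, k, i, target, bonus=0.0):
        # Best predecessor among M/I/D in column k after consuming i symbols.
        cands = [(V[(kind, k, i)] + log_a(kind, k, target) + bonus, (kind, k, i))
                 for kind in "MID" if (kind, k, i) in V]
        if cands:
            V[cell], back[cell] = max(cands, key=lambda c: c[0])

    for i in range(len(seq) + 1):
        for k in range(N_COLS + 1):
            if k >= 1 and i >= 1:   # match: consumes a symbol, advances a column
                relax(("M", k, i), k - 1, i - 1, "M", log_odds(k, seq[i - 1]))
            if i >= 1:              # insert: consumes a symbol, same column
                relax(("I", k, i), k, i - 1, "I")
            if k >= 1:              # delete: advances a column, no symbol
                relax(("D", k, i), k - 1, i, "D")
    relax("End", N_COLS, len(seq), "M")

    # Backtrace: follow the stored predecessors from End back to Begin.
    path, cell = [], back["End"]
    while cell != ("M", 0, 0):
        path.append(f"{cell[0]}{cell[1]}")
        cell = back[cell]
    return V["End"], ["Begin"] + path[::-1] + ["End"]


score, path = viterbi("ATA")
print(round(score, 2), path)
```

Under these assumptions the best path for DB = ATA is Begin, M1, I1, M2, End: the two A's align to the match columns and the T is absorbed by I1. The path is what answers "where is the sequence", while the End score addresses "did it happen".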
Profile HMM - Simple Case

Demo in Python
Big Picture Revisited
  PQS for Network Security (Us)
    Design HMM for network event
    Find event within linear stream of observed network events
  Sequencing using Profile HMM (Bioinformatics)
    Train HMM using known information about subsequence
    Find subsequence within linear protein / genome sequence

  Q: Did an event happen?
  Q: If it exists, where is sequence?