Characterization of Transmembrane Helices
Madhavi Ganapathiraju
Nov 07, 2005, JKS-seminar

Summary
Completion of classification procedures for TM prediction using the LSA features
Web-tool for the TM prediction has been designed; it is being developed by Christopher Jursa
TMPDB, a set of 119 transmembrane proteins, has also been processed and included in evaluations
KChannelDB, the database of KChannel proteins subdivided into families of 1, 2, 4 and 6 TMs each, has been collected and processed. The first 2 have been evaluated.
Decision tree and support vector machine classifiers have been evaluated
Paper summarizing the work has been written
Qok metric was found to be incorrect in previous evaluations; it has been corrected.

Recap: TM prediction method
(A) Map amino acid sequence to 5 different property sequences
    Example: MDPML…
    Example: -n----p---....RO..OOaDDad
(B) Window analysis from left to right
    Place a moving window at position i
    Count Ci1, Ci2…Ci10
    i = i + 1
    Matrix of Counts: (L-l+1) x 10
(C) Singular Value Decomposition (PCA)
    Features: (L-l+1) x 4
(D) Neural Network (4 input nodes, 1 output node)
(E) Hidden Markov Model
Outputs:
    Prediction & confidence: L x 1 & L x 1
    Prediction: L x 1
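
The window analysis in step (B) can be sketched roughly as follows. This is only an illustration, not the TMpro implementation: the slide says that 5 property sequences yield 10 counts per window but does not list the property symbols, so the alphabets in `ALPHABETS` are placeholders, and `window_counts` is a hypothetical helper name. The SVD projection of step (C) is left out.

```python
# Hypothetical alphabets: 5 properties with 2 values each, giving 10 counts.
# The real property symbols are not listed on the slide.
ALPHABETS = ["ab", "cd", "ef", "gh", "ij"]


def window_counts(tracks, window):
    """Slide a window left to right and count each property symbol.

    tracks: 5 property sequences of equal length L, one per property.
    Returns L - window + 1 rows with 10 counts each.
    """
    length = len(tracks[0])
    rows = []
    for i in range(length - window + 1):      # place the window at position i
        row = []
        for track, alphabet in zip(tracks, ALPHABETS):
            segment = track[i:i + window]
            row.extend(segment.count(symbol) for symbol in alphabet)
        rows.append(row)                       # counts Ci1 ... Ci10
    return rows


# Toy property sequences of length L = 8 (made-up data).
tracks = ["aabbabab", "ccdcdccd", "eeeeefef", "ghghgghh", "ijjijiij"]
matrix = window_counts(tracks, window=3)
print(len(matrix), len(matrix[0]))  # 6 10
```

With L = 8 and a window length of 3, the result has L-l+1 = 6 rows and 10 columns, matching the shape of the Matrix of Counts on the slide.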

Neural Net Classifier
Input Layer: the 4 dimensions of the vector obtained by LSA form the input
    Dimension 1
    Dimension 2
    Dimension 3
    Dimension 4
Hidden Layer
Output Layer
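
A forward pass through a network of this shape might look like the sketch below. The slide fixes only the 4 inputs and the single output. The number of hidden units, the activation functions (tanh and logistic) and all weight values here are assumptions chosen for illustration, and `forward` is a hypothetical name.

```python
import math


def forward(features, hidden_weights, hidden_bias, output_weights, output_bias):
    """One forward pass: 4 inputs -> hidden layer -> 1 output in (0, 1)."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, features)) + b)
        for weights, b in zip(hidden_weights, hidden_bias)
    ]
    z = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return 1.0 / (1.0 + math.exp(-z))          # logistic output (assumed)


# Placeholder parameters; 3 hidden units is an arbitrary choice.
hidden_weights = [
    [0.5, -0.2, 0.1, 0.0],
    [0.3, 0.3, -0.4, 0.2],
    [-0.1, 0.6, 0.2, -0.3],
]
hidden_bias = [0.0, 0.1, -0.1]
output_weights = [1.0, -1.0, 0.5]
output_bias = 0.0

# One 4-dimensional LSA feature vector (made-up values).
score = forward([0.2, -0.1, 0.4, 0.0],
                hidden_weights, hidden_bias, output_weights, output_bias)
print(0.0 < score < 1.0)
```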

Decision Tree & SVM Classifiers
Used MATLABArsenal, the wrapper tools developed by Rong (LTI), to see the performance of classifiers on the feature set
– Decision Trees
– SVM (2nd degree polynomial kernel)

Evaluation Data Sets
Benchmark
– 36 proteins of high resolution TM information
TMPDB
– 119 proteins of known 3D structure
KChannelDB
– Multiple sequence alignments of KChannel proteins of 1 and 2 TM segments

Results: 36 high res
Set: 36 high resolution proteins

                        Segment                Residue level
Symbol  Method          Qok  F   Qobs  Qpred   Q2  F2T  F2N
1       TMHMM*          71   90  90    90      80  74   77
2       TMpro (LC)*     61   94  94    94      76  ?    ?
3       TMpro (HMM)*    66   95  97    92      77  76   76
4       TMpro (NN)*     75   95  95    94      73  70   75

* Evaluations were performed by submitting data to the benchmark server

Results: TMPDB

                Segment           Residue level
Method          F   Qobs  Qpred   Q2  F2T  F2N
TMHMM           90  89    90      89  80   90
NN              90  90    89      86  75   90
HMM             85  90    80      84  74   77
SVM             93  95    90      84  77   88
Decision Trees  92  97    86      83  75   87
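
The slides do not define the measures in these tables. The sketch below shows one common reading, stated as an assumption: Q2 as the percentage of residues whose two-state label is predicted correctly, Qobs as the percentage of observed TM segments overlapped by a predicted segment, and Qpred as the percentage of predicted segments overlapping an observed one. The one-residue overlap criterion and the names `segments`, `overlaps` and `evaluate` are illustrative, and the benchmark server may score differently.

```python
def segments(labels):
    """Return (start, end) index pairs for runs of 'T' in a label string."""
    runs, start = [], None
    for i, ch in enumerate(labels + "-"):     # sentinel closes a trailing run
        if ch == "T" and start is None:
            start = i
        elif ch != "T" and start is not None:
            runs.append((start, i))
            start = None
    return runs


def overlaps(a, b, min_overlap=1):
    """True if half-open intervals a and b share at least min_overlap residues."""
    return min(a[1], b[1]) - max(a[0], b[0]) >= min_overlap


def evaluate(observed, predicted):
    """Return (Q2, Qobs, Qpred) as percentages under the assumptions above."""
    q2 = 100.0 * sum(o == p for o, p in zip(observed, predicted)) / len(observed)
    obs, pred = segments(observed), segments(predicted)
    hit_obs = sum(any(overlaps(o, p) for p in pred) for o in obs)
    hit_pred = sum(any(overlaps(p, o) for o in obs) for p in pred)
    q_obs = 100.0 * hit_obs / len(obs) if obs else 0.0
    q_pred = 100.0 * hit_pred / len(pred) if pred else 0.0
    return q2, q_obs, q_pred


# Toy example: 'T' marks TM residues, '-' marks non-TM residues.
observed = "--TTTT----TTTT--"
predicted = "---TTTT---------"
print(evaluate(observed, predicted))  # (62.5, 50.0, 100.0)
```

In the toy example, one of the two observed segments is found (Qobs = 50), the single predicted segment is correct (Qpred = 100), and 10 of 16 residues match (Q2 = 62.5).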

Other things
Processed KChannelDB proteins for evaluation
– Initial evaluations are done, but not ready for discussion …

TMpro web service
TMpro website is being developed by Christopher, Dr. Karimi’s student
– Should be up in 2 weeks' time
Developed standalone versions of the feature processing required by the web service for DT and SVM

Charge rich proteins
I seem not to have mailed myself the latest figures; I will show them separately

Ongoing work
Qok is not high for the TMPDB data set
To overcome this, error analysis is being performed
– Measure how far away from “truth” the prediction is (what threshold would have classified the segment correctly as TM or non-TM)
– Characteristics of the misclassified segments
    Are they traditional globular hydrophobic segments only? Can aromatic and other properties be used to recover from error?
Combination with TMHMM prediction for improved performance
– Rule-based combination on the aromatic property has previously been shown to improve TMHMM predictions (March/June 2005?) on high resolution proteins
– Do this on the TMPDB set as well
Other architectures of NN to be studied?
Error TM segments to be studied further with DT rules that fail
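
The threshold question in the error analysis can be made concrete with a small sketch. It assumes, purely for illustration, that a segment is scored by the mean of its per-residue classifier outputs and called TM when that mean reaches a threshold of 0.5. Neither the scoring rule nor the default threshold comes from the slides, and `threshold_gap` is a hypothetical name.

```python
def threshold_gap(residue_scores, is_tm, threshold=0.5):
    """Return how far the decision threshold would have to move for the
    segment to be classified correctly (0.0 if it already is)."""
    score = sum(residue_scores) / len(residue_scores)   # assumed segment score
    predicted_tm = score >= threshold
    if predicted_tm == is_tm:
        return 0.0
    return abs(score - threshold)


# A true TM segment whose mean score falls below the threshold (made-up values).
print(threshold_gap([0.25, 0.5, 0.25, 0.5], is_tm=True))  # 0.125
```

A small gap suggests a near miss that a slightly different threshold would recover, while a large gap points to a segment the features do not separate at all.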