Identification of Helix-Turn-Helix (HTH) DNA-Binding Motifs
Changhui Yan
Department of Computer Science
Utah State University

HTH Motifs
Protein sequences sharing low similarities can fold into a similar HTH structure.
Identifying HTH motifs from sequence is extremely challenging.
7 families containing HTH motifs from the Pfam database.
Positive data set: 2,198 proteins.
Negative data set: 1,518 proteins.

Combination of Amino Acid Sequence and Predicted Secondary Structure
[Figure: example input for each model]
HMM_AA:    LQQITHIANQL-GLE----KDVVRVWF
HMM_AA_SS: LQQITHIANQL-GLE----KDVVRVWF
           HHHEEHEEEHMHE----HHEEMMEH
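
One plausible way to give a single HMM both tracks is to pair each residue with its predicted secondary-structure state, so that every position becomes one composite symbol. The sketch below is an illustration under assumptions, not the encoding used in the slides: it assumes equal-length, position-aligned inputs and a three-state H/E/C code, and the function name `combine_tracks` and the toy input are hypothetical.

```python
def combine_tracks(aa: str, ss: str) -> list[str]:
    """Pair each residue with the predicted structure state at the same position."""
    if len(aa) != len(ss):
        raise ValueError("sequence and structure strings must have the same length")
    return [residue + state for residue, state in zip(aa, ss)]


# Toy input (hypothetical, not taken from the slides).
symbols = combine_tracks("LQQIT", "HHHEC")
print(symbols)
```

Under these assumptions, 20 amino acids and 3 structure states give up to 60 composite symbols; encoding the residues with a smaller amino acid alphabet shrinks that number proportionally.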

Reduced Alphabets
[Figure caption] Schemes for reducing amino acid alphabet based on the BLOSUM50 matrix by Henikoff and Henikoff (1992), derived by grouping and averaging the similarity matrix elements as described in the text. (Murphy et al. 2000)
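
Applying a reduced alphabet amounts to replacing each residue with a representative letter for its group. The sketch below uses the 15-, 10- and 8-letter groupings as they are commonly cited from Murphy et al. (2000); they are reproduced from memory and should be checked against the paper before use. The names `MURPHY_GROUPS`, `build_table` and `reduce_sequence` are hypothetical, as is the choice of the first letter of each group as its representative.

```python
# Groupings as commonly cited from Murphy et al. (2000); verify before use.
MURPHY_GROUPS = {
    15: ["LVIM", "C", "A", "G", "S", "T", "P", "FY", "W",
         "E", "D", "N", "Q", "KR", "H"],
    10: ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"],
    8: ["LVIMC", "AG", "ST", "P", "FYW", "EDNQ", "KR", "H"],
}


def build_table(groups):
    """Map every residue to the first letter of the group it belongs to."""
    return {residue: group[0] for group in groups for residue in group}


def reduce_sequence(seq, size):
    """Re-encode a sequence with the reduced alphabet of the given size."""
    table = build_table(MURPHY_GROUPS[size])
    # Characters outside the 20 standard residues (such as '-') pass through.
    return "".join(table.get(ch, ch) for ch in seq)


# The example sequence from the previous slide, in the 10-letter alphabet.
print(reduce_sequence("LQQITHIANQL-GLE----KDVVRVWF", 10))
```

Gap characters are left untouched so that the reduced sequence stays the same length as the original.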

Results

Table 1. Cross-Families Evaluations

Method (alphabet (3))      True Positive (1)   False Positive (2)
HMM_AA                     3                   0
HMM_AA_SS (20 letters)     227                 0
HMM_AA_SS (Murphy_15)      474                 0
HMM_AA_SS (Murphy_10)      470                 3
HMM_AA_SS (Murphy_8)       431                 5

1. True positive: HTH motifs that are correctly identified as such.
2. False positive: Non-HTH motifs that are identified as HTH motifs.
3. The alphabet used to encode amino acid sequences.
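
To compare the alphabets on a single scale, the true-positive and false-positive counts can be turned into precision, TP / (TP + FP). The slides do not report this metric; the sketch below only derives it from the HMM_AA_SS rows of Table 1, and the names `TABLE_1` and `precision` are hypothetical.

```python
# (true positives, false positives) for the HMM_AA_SS rows of Table 1.
TABLE_1 = {
    "HMM_AA_SS (20 letters)": (227, 0),
    "HMM_AA_SS (Murphy_15)": (474, 0),
    "HMM_AA_SS (Murphy_10)": (470, 3),
    "HMM_AA_SS (Murphy_8)": (431, 5),
}


def precision(tp, fp):
    """Fraction of predicted HTH motifs that are real; NaN if nothing was predicted."""
    total = tp + fp
    return tp / total if total else float("nan")


for method, (tp, fp) in TABLE_1.items():
    print(f"{method:<24} TP={tp:<4} FP={fp:<2} precision={precision(tp, fp):.4f}")
```

Precision alone does not capture how many motifs are missed, so it is best read alongside the true-positive counts.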

Results

Table 2. Comparisons with a method based on profile-profile comparisons

Total HTH motifs   FFAS03 and HMM_AA_SS   FFAS03 only   HMM_AA_SS only
563                135                    24            71

Table 3. Putative HTH motifs in Ureaplasma parvum

Protein                   Location   Annotation from UniProt
sp|Q9PQE5|SCPB_UREPA      176-214    Participates in chromosomal partition during cell division
sp|Q9PQV6|RPOB_UREPA      540-587    DNA-directed RNA polymerase
sp|Q9PR27|SYY_UREPA       340-380    Tyrosyl-tRNA synthetase
sp|Q9PQC2|SYA_UREPA       217-265    Alanyl-tRNA synthetase
sp|Q9PQ74|DPO3A_UREPA     365-400    DNA polymerase III subunit alpha
sp|Q9PQX7|Y166_UREPA      507-553    Hypothetical protein
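
The Table 2 counts can be combined into per-method totals. The sketch below assumes that "both", "FFAS03 only" and "HMM_AA_SS only" are disjoint groups of identified motifs; the slides do not state how these groups relate to the 563 total, so no ratio against it is computed. The variable names are hypothetical.

```python
# Counts from Table 2.
both = 135            # identified by FFAS03 and HMM_AA_SS
ffas03_only = 24      # identified by FFAS03 only
hmm_aa_ss_only = 71   # identified by HMM_AA_SS only

# Per-method totals, assuming the three groups do not overlap.
ffas03_total = both + ffas03_only
hmm_aa_ss_total = both + hmm_aa_ss_only
either_method = both + ffas03_only + hmm_aa_ss_only

print("FFAS03 total:    ", ffas03_total)
print("HMM_AA_SS total: ", hmm_aa_ss_total)
print("Either method:   ", either_method)
```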