
Identification of Helix-Turn-Helix (HTH) DNA-Binding Motifs

Changhui Yan
Department of Computer Science
Utah State University




HTH Motifs

Protein sequences sharing low similarity can fold into a similar HTH structure.

Identifying HTH motifs from sequence is extremely challenging.

7 families containing HTH motifs from the Pfam database.

Positive data set: 2,198 proteins.

Negative data set: 1,518 proteins.


Combination of Amino Acid Sequence and Predicted Secondary Structure

HMM_AA:
LQQITHIANQL-GLE----KDVVRVWF

HMM_AA_SS:
LQQITHIANQL-GLE----KDVVRVWF
HHHEEHEEEHMHE----HHEEMMEH
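One way to feed both strings to a single model is to pair each residue with the secondary-structure state predicted at the same position. The slides do not say how HMM_AA_SS encodes its input, so the following is only a sketch under that assumption; the function name `combine_tracks` and the toy strings are hypothetical.

```python
from typing import List, Tuple


def combine_tracks(aa: str, ss: str) -> List[Tuple[str, str]]:
    """Pair each residue with the secondary-structure state at the same position.

    Assumption: both strings are already aligned position by position.
    """
    if len(aa) != len(ss):
        raise ValueError("sequence and secondary-structure strings must have equal length")
    return list(zip(aa, ss))


# Toy input, not taken from the slides: 8 residues with made-up H/E states.
pairs = combine_tracks("LQQITHIA", "HHHHEEEE")
print(pairs[:3])
```

Each pair can then be treated as one emission symbol, so the symbol count grows with the product of the two alphabet sizes.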


Reduced Alphabets

Schemes for reducing the amino acid alphabet, based on the BLOSUM50 matrix by Henikoff and Henikoff (1992), derived by grouping and averaging the similarity matrix elements as described in the text (Murphy et al. 2000).
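As an illustration of what a reduced alphabet does to a sequence, the sketch below maps each residue to a representative letter of its group. The 10-group partition shown is the one commonly quoted for Murphy et al. (2000); it is reproduced here from memory as an assumption and should be checked against the paper before use. The names `MURPHY_10_GROUPS`, `build_mapping`, and `reduce_sequence` are hypothetical.

```python
# Assumed Murphy_10 grouping (verify against Murphy et al. 2000).
MURPHY_10_GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]


def build_mapping(groups):
    """Map every residue to the first letter of its group."""
    return {aa: group[0] for group in groups for aa in group}


MURPHY_10 = build_mapping(MURPHY_10_GROUPS)


def reduce_sequence(seq, mapping=MURPHY_10):
    """Re-encode a sequence; gap characters and unknown symbols pass through unchanged."""
    return "".join(mapping.get(ch, ch) for ch in seq)


# First eight residues of the example sequence from the previous slide.
print(reduce_sequence("LQQITHIA"))
```

Under this grouping, residues that BLOSUM50 scores as similar (for example L, V, I, and M) become indistinguishable to the model.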


Results

Method                        True Positive [1]   False Positive [2]
HMM_AA                        3                   0
HMM_AA_SS (20 letters) [3]    227                 0
HMM_AA_SS (Murphy_15) [3]     474                 0
HMM_AA_SS (Murphy_10) [3]     470                 3
HMM_AA_SS (Murphy_8) [3]      431                 5

1. True positive: HTH motifs that are correctly identified as such.
2. False positive: Non-HTH motifs that are identified as HTH motifs.
3. The alphabet used to encode amino acid sequences.

Table 1. Cross-Families Evaluations
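The counts in Table 1 can be turned into a precision figure, TP / (TP + FP), for each model. The slides report only raw counts, so this derived metric is an addition rather than something the author presents; the counts below are as read from Table 1, and the names `TABLE_1` and `precision` are hypothetical.

```python
# (true positives, false positives) as read from Table 1.
TABLE_1 = {
    "HMM_AA": (3, 0),
    "HMM_AA_SS (20 letters)": (227, 0),
    "HMM_AA_SS (Murphy_15)": (474, 0),
    "HMM_AA_SS (Murphy_10)": (470, 3),
    "HMM_AA_SS (Murphy_8)": (431, 5),
}


def precision(tp, fp):
    """Fraction of reported motifs that are true HTH motifs."""
    total = tp + fp
    return tp / total if total else float("nan")


for method, (tp, fp) in TABLE_1.items():
    print(f"{method:<24} precision = {precision(tp, fp):.3f}")
```

Sensitivity cannot be computed the same way, because the slide does not state how many HTH motifs each cross-family test contained.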


Results

Total HTH motifs   FFAS03 and HMM_AA_SS   FFAS03 only   HMM_AA_SS only
563                135                    24            71

Table 2. Comparisons with a method based on profile-profile comparisons

Protein                   Location   Annotation from UniProt
sp|Q9PQE5|SCPB_UREPA      176-214    Participates in chromosomal partition during cell division
sp|Q9PQV6|RPOB_UREPA      540-587    DNA-directed RNA polymerase
sp|Q9PR27|SYY_UREPA       340-380    Tyrosyl-tRNA synthetase
sp|Q9PQC2|SYA_UREPA       217-265    Alanyl-tRNA synthetase
sp|Q9PQ74|DPO3A_UREPA     365-400    DNA polymerase III subunit alpha
sp|Q9PQX7|Y166_UREPA      507-553    Hypothetical protein

Table 3. Putative HTH motifs in Ureaplasma parvum
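The per-method totals implied by Table 2 follow from simple set arithmetic. The sketch below assumes that the three "identified by" columns are disjoint and that 563 is the number of known HTH motifs in the test set; neither assumption is stated on the slide, and the variable names are hypothetical.

```python
# Counts as given in Table 2.
TOTAL_HTH_MOTIFS = 563
BOTH = 135            # found by FFAS03 and by HMM_AA_SS
FFAS03_ONLY = 24
HMM_AA_SS_ONLY = 71

# Assumption: the three categories above do not overlap.
ffas03_total = BOTH + FFAS03_ONLY
hmm_aa_ss_total = BOTH + HMM_AA_SS_ONLY
found_by_either = BOTH + FFAS03_ONLY + HMM_AA_SS_ONLY
found_by_neither = TOTAL_HTH_MOTIFS - found_by_either

for label, count in [
    ("FFAS03", ffas03_total),
    ("HMM_AA_SS", hmm_aa_ss_total),
    ("either method", found_by_either),
    ("neither method", found_by_neither),
]:
    print(f"{label:<15} {count:>4}  ({count / TOTAL_HTH_MOTIFS:.1%} of total)")
```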