Structure-based Analysis of Protein Function PTPs and Serine Hydrolases Jacquelyn S. Fetrow Wake Forest University Jacquelyn S. Fetrow Reynolds Professor of Computational Biophysics Departments of Physics and Computer Science Wake Forest University


Structure-based Analysis of Protein Function

PTPs and Serine Hydrolases

Jacquelyn S. Fetrow

Reynolds Professor of Computational Biophysics

Departments of Physics and Computer Science

Wake Forest University


Need for Improved Proteome Analyses

• Powerful genomics and proteomics methods identify large numbers of protein sequences

• Need to identify biochemical function and functional state accurately

• Need to increase quality of annotations: decrease false positive and false negative identifications


Knowing the Sequence is Not Enough to Determine the Function

• Except in model organisms, over 50% of all proteins identified by large-scale sequencing projects are annotated as “function unknown”

• Existing annotations do not adequately describe the functional complexity of proteins

• Annotation transfer methods can assign incorrect function in a significant number of cases

[Figure: S. cerevisiae Fsh3p, S. pombe DYR_SCHPO, and S. cerevisiae DHFR]


Structural proteomics approach to function annotation

• Most common method:

– structural superposition

– function annotation transfer based on structural similarity

[Figure: COX-1 (1cqe) and COX-2 (1cx2)]


But, Knowing the Structure is Not Enough to Predict the Function

See also:

Martin, et al. 1998, Structure 6:875-884

Hegyi and Gerstein, 1999, J. Mol. Biol. 288:147-164.

[Figure: new structures divided into four categories (Similar Structure, Similar Function; Similar Structure, Different Function; Different Structure, Different Function; Different Structure, Similar Function); values shown: 48%, 27%, 23%, 1.5%]

Analysis of high resolution structures released in 1998 compared to pre-1998 PDB structures

Koppensteiner, W., Lackner, P., Wiederstein, M., & Sippl, M. J. Mol. Biol. (2000) 296:1139.


But, then, what do we really mean by function?

• Two isoforms of human cyclooxygenase, COX-1 & COX-2

• COX-1 is expressed in healthy tissues; COX-2 is induced in inflammatory response

• COX-1 and COX-2 have ~60% sequence identity, very similar overall structures, and identical catalytic residues

[Figure: COX-1 (1cqe) and COX-2 (1cx2)]


But, then, what do we really mean by function?

• Aspirin/NSAIDs inhibit both isoforms; COX-1 inhibition can lead to gastrointestinal side effects

• Newer COX-2 selective inhibitors (Vioxx™, Celebrex™) have the anti-inflammatory and pain-killing benefits of NSAIDs with reduced side effects

Goal: accurate identification of active sites and their similarities and differences

[Figure: COX-1 (1cqe) and COX-2 (1cx2)]

1cqe: P RLVLTVRSNLI AQ TF -EFNQLYHWH -R FGM Y- GESMIEMGAPFSLK
1cx2: P -YVLTSRSYLI AQ TF SEFNTLYHWH YR FSL YL GETMVELGAPFSLK
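The COX-1/COX-2 fragments on this slide can be compared column by column to see where the two active sites agree and differ. The sketch below is only an illustration: the function name `compare_fragments` is hypothetical, gaps are written as `-`, and columns containing a gap are skipped, which is an assumption rather than anything stated on the slide.

```python
def compare_fragments(a: str, b: str, gap: str = "-"):
    """Compare two space-separated sets of aligned fragments column by column."""
    frags_a, frags_b = a.split(), b.split()
    if [len(f) for f in frags_a] != [len(f) for f in frags_b]:
        raise ValueError("fragments must align column for column")
    matches = compared = column = 0
    differences = []
    for frag_a, frag_b in zip(frags_a, frags_b):
        for x, y in zip(frag_a, frag_b):
            column += 1
            if x == gap or y == gap:
                continue  # assumption: gap columns are not scored
            compared += 1
            if x == y:
                matches += 1
            else:
                differences.append((column, x, y))
    return matches, compared, differences


# Fragments copied from the slide (gap characters normalized to "-").
COX1_1CQE = "P RLVLTVRSNLI AQ TF -EFNQLYHWH -R FGM Y- GESMIEMGAPFSLK"
COX2_1CX2 = "P -YVLTSRSYLI AQ TF SEFNTLYHWH YR FSL YL GETMVELGAPFSLK"

matches, compared, differences = compare_fragments(COX1_1CQE, COX2_1CX2)
print(f"{matches} of {compared} ungapped columns are identical")
for column, x, y in differences:
    print(f"column {column}: 1cqe={x} 1cx2={y}")
```

The differing columns are the candidates for positions that distinguish the two isoforms.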


Fuzzy Functional Forms and Active Site Profiling

• Advantage: computational method based on structure

– Use of structural (not just sequence) information

– Identification of key functional features (not annotation transfer via global sequence alignment)

– Fast; can be globally applied to protein sequences

• Disadvantage: computational method

– Scoring function cutoffs

– False positive and negative rates

– Size of FFF library

Fetrow & Skolnick. J. Mol. Biol. (1998) 282: 949-968.

Cammer, Hoffman, Speir, Canady, Nelson, Knutson, Gallina, Baxter, Fetrow. J. Mol. Biol. (2003) 334:387-401.


Geometric definition of an FFF

• Defined by three metrics

– Key residues (and their identity) involved in active site chemistry

– Geometric constraints (distances between alpha carbons)

– Allowed variability for geometric constraints

• Training

– Against all PDB structures

– Relax constraints to identify all true positive structures, but no false positives

– Cross validation

[Figure: three key residues labeled A, B, and C]

Fetrow & Skolnick. J. Mol. Biol. (1998) 282: 949-968.

Fetrow, Godzik & Skolnick. (1998) J. Mol. Biol. 282:703-711.
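The three metrics above (residue identity, alpha-carbon distances, and allowed variability) can be pictured as a simple geometric search. The following is a minimal sketch, not the published implementation: the data layout, the function name `find_fff_hits`, the single shared tolerance, and all numbers in the toy example are assumptions made for illustration.

```python
import math
from itertools import permutations


def find_fff_hits(structure, fff):
    """Return residue-number triples that satisfy a three-residue FFF.

    structure: list of (residue number, one-letter code, C-alpha xyz).
    fff: dict with "residues" (A, B, C identities), "distances"
         (A-B, B-C, A-C in angstroms) and "tolerance" (allowed variability).
    """
    hits = []
    for a, b, c in permutations(structure, 3):
        if (a[1], b[1], c[1]) != fff["residues"]:
            continue  # residue identities must match first
        observed = (
            math.dist(a[2], b[2]),
            math.dist(b[2], c[2]),
            math.dist(a[2], c[2]),
        )
        if all(abs(o - t) <= fff["tolerance"]
               for o, t in zip(observed, fff["distances"])):
            hits.append((a[0], b[0], c[0]))
    return hits


# Hypothetical FFF and toy coordinates, invented for this example.
toy_fff = {"residues": ("S", "H", "D"),
           "distances": (8.0, 6.0, 10.0),
           "tolerance": 1.0}
toy_structure = [
    (1, "S", (0.0, 0.0, 0.0)),
    (2, "H", (8.0, 0.0, 0.0)),
    (3, "D", (8.0, 6.0, 0.0)),
    (4, "S", (30.0, 0.0, 0.0)),  # right identity, wrong geometry
    (5, "G", (4.0, 4.0, 4.0)),
]
hits = find_fff_hits(toy_structure, toy_fff)
print(hits)
```

Widening `tolerance` corresponds to the "relax constraints" step in training: more true positives are found until the first false positive appears.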


Advantages of the FFF approach

• Use of structural information enables:

– Function annotation farther into “twilight zone”

– Identification of similar functional sites in proteins of different structure

• Functional complexity

– Identification of multiple chemistries within a single functional site

– Identification of multiple functions within a protein domain

[Figure: serine-threonine phosphatase; FFF 1 = metal binding site; FFF 2 = metal binding site; FFF 3 = phosphatase catalytic residues; FFF for redox regulatory site]

Fetrow, Siew, Skolnick. FASEB J (1999) 13:1866-74


Comparison of putative redox active site residues

PP1             PP2A            PP2B
P30366 C C P    P48726 V Y V    P48456 A C F
P48487 C C P    P32838 I Y F    Q27889 A C F
P48486 C C P    P32345 V Y K    P16299 A S F
P48483 C C P    P23595 L Y I    P20651 A S F
P48488 C C P    P48580 L Y I    P48453 A S F
P48481 C C P    P23696 L Y L    Q08209 A C F
P48480 C C P    P11493 L Y L    P20652 A C F
P48484 C C P    P11611 L Y L    P48452 A C F
P48489 C C P    P11082 L Y L    P48455 T C F
P22198 C C P    P48463 L Y L    P48454 T C F
P48485 T C P    P13353 L Y L    Q12705 S A F
P48482 S C P    P05323 L Y L    O42773 S N F
P23880 C C P    P48577 L F I    P48457 S A F
Q05547 C C P    P48579 L Y I    Q05681 S C F
P12982 C C P    Q07099 L Y I    P23287 S V F
P48461 C C P    Q07098 L Y I    P14747 S N F
P36874 C C P    P23778 L Y V
P36873 C C P    Q06009 L Y V
P37139 C C P    Q07100 L Y V
P08128 C C P    P48578 L Y V
P08129 C C P    P23635 L Y V
P48462 C C P    P23636 L Y V
P37140 C C P    P23594 L Y I
P13681 C C P
P32598 C C P
P20654 C C P
P23777 C C P
P48490 C C P
P23733 T C P
P23734 T C P
P20604 V F A


Cluster analysis of PP1, PP2A, and PP2B subfamilies

[Figure: clusters labeled PP1, PP2A, and PP2B]


Comparison of putative redox active site residues

PP1             PP2A            PP2B
P30366 C C P    P48726 V Y V    P48456 A C F
P48487 C C P    P32838 I Y F    Q27889 A C F
P48486 C C P    P32345 V Y K    P16299 A S F
P48483 C C P    P20604 V F A    P20651 A S F
P48488 C C P    P48580 L Y I    P48453 A S F
P48481 C C P    P23696 L Y L    Q08209 A C F
P48480 C C P    P11493 L Y L    P20652 A C F
P48484 C C P    P11611 L Y L    P48452 A C F
P48489 C C P    P11082 L Y L    P48455 T C F
P22198 C C P    P48463 L Y L    P48454 T C F
P48485 T C P    P13353 L Y L    Q12705 S A F
P48482 S C P    P05323 L Y L    O42773 S N F
P23880 C C P    P48577 L F I    P48457 S A F
Q05547 C C P    P48579 L Y I    Q05681 S C F
P12982 C C P    Q07099 L Y I    P23287 S V F
P48461 C C P    Q07098 L Y I    P14747 S N F
P36874 C C P    P23778 L Y V
P36873 C C P    Q06009 L Y V
P37139 C C P    Q07100 L Y V
P08128 C C P    P48578 L Y V
P08129 C C P    P23635 L Y V
P48462 C C P    P23636 L Y V
P37140 C C P    P23594 L Y I
P13681 C C P    P23595 L Y I
P32598 C C P
P20654 C C P
P23777 C C P
P48490 C C P
P23733 T C P
P23734 T C P


Limitations of the FFF Approach

• FFFs use the identities of only three residues

– Leads to false positive identifications

• An FFF hit is only yes/no

– Does not have a score or confidence associated with it

• FFFs only identify key residues

– Do not identify specificity (substrate or small molecule specificity)


Active site signature: first step in active site profiling

• Use FFF to identify key functional residues

• Extract fragments in structural proximity to FFF residues

• Arrange fragments to form a linear sequence—active site signature

Cammer, Hoffman, Speir, Canady, Nelson, Knutson, Gallina, Baxter, Fetrow. J. Mol. Biol. (2003) 334:387-401.
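The three steps above can be sketched as a small routine: select residues near the FFF residues, group consecutive residues into fragments, and join the fragments in chain order. This is an illustrative sketch only; the distance cutoff, the use of alpha-carbon coordinates, the function name `active_site_signature`, and the toy chain are all assumptions, not details given on the slide.

```python
import math


def active_site_signature(residues, key_indices, radius=10.0):
    """Return the fragments that lie within `radius` of any key residue.

    residues: list of (one-letter code, C-alpha xyz) in chain order.
    key_indices: positions of the FFF residues in that list.
    radius: hypothetical proximity cutoff in angstroms.
    """
    keys = [residues[i][1] for i in key_indices]
    near = [i for i, (_, xyz) in enumerate(residues)
            if any(math.dist(xyz, key) <= radius for key in keys)]
    fragments, current = [], []
    for i in near:
        if current and i != current[-1] + 1:
            fragments.append(current)  # a break in numbering ends a fragment
            current = []
        current.append(i)
    if current:
        fragments.append(current)
    return ["".join(residues[i][0] for i in frag) for frag in fragments]


# Toy chain with invented coordinates; residue 1 plays the FFF residue.
toy_chain = [
    ("A", (0, 0, 0)), ("C", (3, 0, 0)), ("D", (6, 0, 0)),
    ("E", (30, 0, 0)), ("F", (40, 0, 0)), ("G", (50, 0, 0)),
    ("H", (40, 10, 0)), ("I", (6, 5, 0)), ("K", (3, 5, 0)),
    ("L", (30, 10, 0)),
]
fragments = active_site_signature(toy_chain, key_indices=[1], radius=7.0)
signature = "".join(fragments)
print(fragments, signature)
```

Residues that are far apart in sequence but close in space end up adjacent in the signature, which is what lets the signature describe the site rather than the whole chain.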


Align signatures to create active site profile

Profile segments for 7 enzymes identified by one FFF:

1mucA_A RHRVFKLKIGA-ASIFALKIAKNGGPVTA--GLYGGTMLEGSIGTLASAHAF--LTWGTELFGPLLL
2mucA_B RHRVFKLKIGA-ASIFALKIAKNGGPVTA--GLYGGTMLEGSIGTLASAHAF--LTWGTELIGPLLL
1bkhC_C RHRVFKLKIGA-ASIFALKIAKNGGPVTA--GLYGGTMLEGSIGTLASAHAF--LTWGTELFGPLLL
1bkhB_D RHRVFKLKIGA-ASIFALKIAKNGGPVTA--GLYGGTMLEGSIGTLASAHAF--LTWGTELFGPLLL
3mucA_E RHRVFKLKIG--ASIFALKIAKNGGPVTA--GLYGGTMLEGSIGTLASAHAF--LTWGTELFGPLLL
1chrA_F RHNRFKVKLGF-VDVFSLKLCNMG-VTIA--ASYGGTMLDGSIGTLASAHAF-SLPFGCELIGPFVL
2chr__G RHNRFKVKLGF-VDVFSLKLCNMGGVTIA--ASYGGTMLDSTIGTSVALQLYS-LPFGCELIGPFVL

Examples of residues identical across family

Examples of residues different between family members: possible specificity determinants?
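Once signatures are aligned, identical and differing columns can be listed mechanically. The sketch below uses three of the seven rows from the slide (with gap characters normalized to `-`); the function name `classify_columns` and the choice to skip gap-containing columns are assumptions for illustration.

```python
def classify_columns(rows):
    """Split ungapped alignment columns into conserved and variable sets."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same aligned length")
    conserved, variable = [], []
    for col in range(width):
        letters = {row[col] for row in rows}
        if "-" in letters:
            continue  # assumption: columns with gaps are not classified
        (conserved if len(letters) == 1 else variable).append(col)
    return conserved, variable


# Three of the seven profile segments shown on the slide.
profile_rows = [
    "RHRVFKLKIGA-ASIFALKIAKNGGPVTA--GLYGGTMLEGSIGTLASAHAF--LTWGTELFGPLLL",  # 1mucA
    "RHNRFKVKLGF-VDVFSLKLCNMG-VTIA--ASYGGTMLDGSIGTLASAHAF-SLPFGCELIGPFVL",  # 1chrA
    "RHNRFKVKLGF-VDVFSLKLCNMGGVTIA--ASYGGTMLDSTIGTSVALQLYS-LPFGCELIGPFVL",  # 2chr
]
conserved, variable = classify_columns(profile_rows)
print("identical across rows:", conserved)
print("possible specificity determinants:", variable)
```

Column indices are zero-based positions in the aligned signature, not residue numbers in the original structures.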


Active Site Profile Score

• Empirically derived function takes into account sequence similarity

• Enables approaches based on active site information

– Clustering of functional families (profile score)

– Novel sequence family and subfamily assignment (pairwise score)

Weights: Identity 1.0; Strong 0.2; Weak 0.1

1cozA_1 GTFDLLHWGHIKLLEAYRTISTTKIKEE
1cozB_1 GTFDLLHWGHIKLLEAYRTISTTKIKEE
BS002557__1cozA GTFDPPHNGHLLMANDYREVSSTMIRER
**** * **: : : ** :*:* *:*.

[Equation: Score, defined in terms of N and sums indexed by n, m, k, and l over identity (I), strong (S), weak (W), and gap (g) terms]
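The weights on this slide (identity 1.0, strong 0.2, weak 0.1) suggest a per-column weighted sum. The sketch below is one plausible reading, not the published scoring function: it assumes the score is the summed weight of the ClustalW-style consensus symbols (`*`, `:`, `.`) divided by the number of aligned columns, and it ignores any gap term. The 0.25 cutoff is the significance threshold quoted elsewhere in the talk.

```python
# Weights from the slide; the mapping to consensus symbols is an assumption.
WEIGHTS = {"*": 1.0, ":": 0.2, ".": 0.1}


def profile_score(consensus: str, n_columns: int) -> float:
    """Assumed form: sum of per-column weights divided by column count."""
    if n_columns <= 0:
        raise ValueError("n_columns must be positive")
    total = sum(WEIGHTS.get(symbol, 0.0) for symbol in consensus)
    return total / n_columns


# Consensus symbols as they appear on the slide; the aligned
# signatures are 28 residues long.
consensus = "**** * **: : : ** :*:* *:*."
score = profile_score(consensus, n_columns=28)
is_significant = score >= 0.25
print(round(score, 3), is_significant)
```

Because identities dominate the weights, a profile needs a substantial fraction of fully conserved columns to clear the threshold under this assumed form.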


Validation of Active Site Profile Score

• 193 real functional families

– 193 FFFs applied to known structures from PDB to identify functional families

– For each protein in each family, extract active site signature

– Align all signatures in a given family to create profile

– Calculate profile score

• 193 decoy functional families

– Geometric criteria “relaxed” slightly to identify first “false positive”

– (Automatically identified as part of training procedure)

– Extract signatures, align to create profile, calculate score

[Figure: two FFF diagrams, each with residues A, B, and C]
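The decoy construction can be read as a loop: widen the geometric tolerance step by step until a structure outside the true family first matches. The sketch below illustrates that loop under strong simplifications; each structure is reduced to a single worst-case deviation from the FFF distances, and every number, step size, and the `hypo1` entry are invented. The structure identifiers are borrowed from the serine carboxypeptidase example in this talk purely as labels.

```python
def first_false_positive(deviations, family, start=0.5, step=0.25, limit=5.0):
    """Relax the tolerance until a non-family structure first matches.

    deviations: {structure id: worst-case deviation (angstroms) from the
                 FFF distance constraints}, a simplification of a full match.
    family: ids of the true functional family.
    """
    tolerance = start
    while tolerance <= limit:
        false_hits = sorted(
            sid for sid, dev in deviations.items()
            if dev <= tolerance and sid not in family
        )
        if false_hits:
            return tolerance, false_hits
        tolerance += step
    return None, []


# Hypothetical deviations; only the ids come from the slides.
toy_deviations = {"1ivyA": 0.2, "1ysc_": 0.4, "1ac5_": 0.3,
                  "1cpy_": 0.5, "1c4xA": 1.1, "hypo1": 2.0}
true_family = {"1ivyA", "1ysc_", "1ac5_", "1cpy_"}
result = first_false_positive(toy_deviations, true_family)
print(result)
```

The decoy family is then the true family plus the first false positive, and its profile is scored in the same way as the real one.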


Validation of the active site profile score

Active site profile for serine carboxypeptidases

Profile score=0.42

1ac5_ LNGGPC-GESYAGQY-IGNGWI-----NMYNFN-NGDKDLICNN-NASHMVPFD
1ivyA LNGGP--GESYAGIYIVGNGLSLFNIYNLY--N-NGDVDMACNF-GAGHMVPTD
1ysc_ LNGGP--GESYAGHY-IGNGLTMAGE-NVYDIRKAGDKDFICNWLNGGHMVPFD
1ivyB LNGGP--GESYAGIYIVGNGLSLFNIYNLYA-N-NGDVDMACNF-GAGHMVPTD
1cpy_ LNGGP-AGASYAGHYIIGNGLTMAG--NVYDIR-AGDKDFICNWLNGGHMVPFD
***** * **** * :*** *:* . ** *: ** ...**** *


Validation of Active Site Profile Score

Serine carboxypeptidase profile

Score=0.42

1ac5_ LNGGPC-GESYAGQY-IGNGWI-----NMYNFN-NGDKDLICNN-NASHMVPFD
1ivyA LNGGP--GESYAGIYIVGNGLSLFNIYNLY--N-NGDVDMACNF-GAGHMVPTD
1ysc_ LNGGP--GESYAGHY-IGNGLTMAGE-NVYDIRKAGDKDFICNWLNGGHMVPFD
1ivyB LNGGP--GESYAGIYIVGNGLSLFNIYNLYA-N-NGDVDMACNF-GAGHMVPTD
1cpy_ LNGGP-AGASYAGHYIIGNGLTMAG--NVYDIR-AGDKDFICNWLNGGHMVPFD
***** * **** * :*** *:* . ** *: ** ...**** *

Serine carboxypeptidase decoy profile

Score=0.14

1ac5_ LNGGPC-GESYAGQY--IGNGWI-----NMYNFN-NGDKDLICNN---NASHMVPFD
1ivyA LNGGP--GESYAGIYI-VGNGLSLFNIYNLY--N-NGDVDMACNF---GAGHMVPTD
1ysc_ LNGGP--GESYAGHY--IGNGLTMAGE-NVYDIRKAGDKDFICNWL--NGGHMVPFD
1ivyB LNGGP--GESYAGIYI-VGNGLSLFNIYNLYA-N-NGDVDMACNF---GAGHMVPTD
1cpy_ LNGGP-AGASYAGHYI-IGNGLTMAG--NVYDIR-AGDKDFICNWL--NGGHMVPFD
1c4xA LHGAG--GNSMGGAVTLMGSVG-----SFVY----HGRQDRIVPLTLDRCGHWAQLE
*:*. * * .* :*. :* * * .* . :


Validation of Active Site Profile Score

• Profile score compared to decoy profile score shows clear separation for most families

• Separation less distinct when decoy is functionally related to FFF family

• Profile score ≥0.25 considered significant

[Figure: profile score versus FFF functional family in two panels, A (families 1 to 171) and B (families 173 to 194), comparing true profiles with decoy profiles]


Prospective validation of the method

• Human protein tyrosine phosphatases (PTPs)

– PTPs are important signal transduction proteins

– Analysis demonstrates accuracy and throughput

• Yeast serine hydrolases

– Serine hydrolases are crucial for many cellular processes

– Analysis demonstrates experimental validation of sensitivity and accuracy of function annotations

– Performance compared to other tools


Method for genome analysis

• Download protein sequences encoded by human or yeast genome

• Run Prospector (Skolnick, et al) fold recognition program

• For any protein sequence that aligns with structure used to create FFF:

– Take top 20 alignments (top five hits for four scoring functions)

– Determine if FFF residues conserved

• If yes:

– Predict FFF function

– Identify active site signature

– Align and calculate pairwise profile score
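The selection and conservation steps above can be outlined in a few lines. This sketch does not call Prospector or reproduce its output format; the data structures, function names, and the convention that a higher score is better are all assumptions made so the control flow can be shown.

```python
def top_alignments(hits_by_function, per_function=5):
    """Collect the best `per_function` alignments for each scoring function.

    hits_by_function: {scoring function name: [(score, alignment), ...]},
    where an alignment maps template position -> query residue.
    """
    selected = []
    for hits in hits_by_function.values():
        ranked = sorted(hits, key=lambda hit: hit[0], reverse=True)
        selected.extend(alignment for _, alignment in ranked[:per_function])
    return selected


def fff_conserved(alignment, fff_positions):
    """True if every FFF position is aligned to an allowed residue."""
    return all(alignment.get(pos) in allowed
               for pos, allowed in fff_positions.items())


def predict_function(hits_by_function, fff_positions):
    """Predict the FFF function if any selected alignment conserves it."""
    return any(fff_conserved(alignment, fff_positions)
               for alignment in top_alignments(hits_by_function))


# Hypothetical FFF positions and threading hits for one query sequence.
toy_fff_positions = {12: {"C"}, 18: {"R"}}
toy_hits = {
    "scoring_fn_1": [(0.9, {12: "C", 18: "R"}), (0.5, {12: "S", 18: "R"})],
    "scoring_fn_2": [(0.7, {12: "A", 18: "R"})],
}
predicted = predict_function(toy_hits, toy_fff_positions)
print(predicted)
```

With four scoring functions and five hits each, `top_alignments` returns the 20 alignments mentioned on the slide; sequences that pass would then go on to signature extraction and pairwise scoring.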


PTP Functional Family

• Catalytic site is found in multiple protein structures

• Active site structure is conserved

[Figure: 2hnp, a classical PTP; 1vhr, a dual specificity PTP; 1phr, a low molecular weight PTP]


Annotation of human genome sequences for PTP function

• Identified over 150 human PTPs

– Comparison to experimentally-verified PTPs shows that over 95% of known PTPs identified: false negative rate < 5%

• Over 40 unique PTPs identified

– Sequences that are not recognized as PTPs by any other method (including BLAST, Blocks, Prints and Pfam)

[Figure: bar chart, 0% to 100%, showing PTPs unique to FFF versus FFF + other tools]

How good are these function assignments?


Functional Characterization of PTP Proteins

• Clone, express, and purify

• Test PTPs for biochemical function

• Progress (before termination of project)

– 49 soluble PTP domains purified

– 37 PTPs active in vitro

– Four active PTPs that were not previously recognized by other methods (including no recognizable similarity to any PTP in the public databases)

[Figure: hydrolysis of pNPP by PTP #1; V (A405nm/sec) versus pNPP (mM)]

[Figure: number of proteins in target set (Total, Structure-based novels, Clean novels), showing target set, soluble protein, and active in vitro; percentages shown: 30%, 15%, 65%, 35%, 75%, 66%]


Functional Characterization of PTP Proteins

• False positive rate cannot be absolutely determined; PTP project shows:

– Total PTP proteins: 49 soluble proteins, with 37 active in pNPP hydrolysis assay (~25% not validated in assay)

– PTP proteins unrecognized by other methods: 6 soluble proteins, with 4 active in pNPP hydrolysis assay (~33% not validated in assay)

– Maximum false positive rate: ~25-33%

• Why a maximum?

– Only one substrate and assay condition tested

– Small sample set
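The "maximum false positive rate" is simply the fraction of soluble proteins that were not active in the assay. A short worked calculation with the counts from this slide follows; the function name is illustrative only.

```python
def fraction_not_validated(soluble: int, active: int) -> float:
    """Fraction of soluble proteins that showed no activity in the assay."""
    if soluble <= 0 or not 0 <= active <= soluble:
        raise ValueError("need 0 <= active <= soluble and soluble > 0")
    return (soluble - active) / soluble


# Counts reported on the slide.
all_ptps = fraction_not_validated(soluble=49, active=37)   # 12 of 49
novel_ptps = fraction_not_validated(soluble=6, active=4)   # 2 of 6

print(f"all tested PTPs: {all_ptps:.1%} not validated")
print(f"PTPs unique to FFF: {novel_ptps:.1%} not validated")
```

These come out at roughly 24.5% and 33.3%, which the slide rounds to ~25% and ~33%. They are upper bounds because a protein inactive toward pNPP under one condition may still be a functional PTP.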


Active Site Profiling of Human PTPs: Identification of Sub-families

• Identified over 150 human PTPs

• Identify active site signature from each PTP sequence

• Align to create active site profile for PTP family

• Cluster to identify subfamilies of PTPs

[Figure: bar chart, 0% to 100%, showing PTPs unique to FFF versus FFF + other tools]
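The slide does not say which clustering method was used, so the following is only a generic sketch of the idea: group signatures whose pairwise similarity exceeds a threshold. It substitutes plain fractional identity for the pairwise profile score and uses single-linkage grouping via union-find; the signatures, names, and the 0.6 threshold are all invented for illustration.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical residues over ungapped aligned columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0


def cluster(signatures, threshold=0.6):
    """Single-linkage clusters of signatures with identity >= threshold."""
    names = list(signatures)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if identity(signatures[a], signatures[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical aligned signatures, not taken from the talk.
toy_signatures = {
    "ptp_a": "HCSAGVGRSG",
    "ptp_b": "HCSAGIGRSG",
    "ptp_c": "VCLGNICRSP",
    "ptp_d": "VCTGNICRSP",
}
subfamilies = cluster(toy_signatures)
print(subfamilies)
```

Because the comparison is restricted to active site signatures, the resulting groups can differ from those produced by global sequence alignment.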


Active Site Profiling of Human PTPs: Identification of Sub-families

-- Novel PTP#5

-- BLAST (global sequence similarity) indicates that PTP#5 is a dual specificity PTP

-- Clustering of active site profile indicates “PTP#5” falls into class 1

[Figure: clustering of all PTPs into classical PTPs, dual specificity PTPs and PTEN, and low molecular weight PTPs, with Subfamilies 1 through 8]


Summary of human PTP annotation project

• 150 PTPs identified in human genome

– Over 95% of previously annotated PTPs identified (false negative rate <5%)

– Of those tested in our lab, 75% exhibited PTP function

• 40 proteins not identified by other methods (BLAST, Blocks, Pfam)

– Of those tested, 66% exhibited PTP function

• Maximum false positive rate: 25-33%

• Active site profiling subclassifies proteins differently than global sequence alignment


FFFs for Serine Hydrolases

• 35 serine hydrolase FFFs describing 25 EC-defined functions

– Nucleophilic serine in active site

– Protease, lipase, esterase, amidase or transacylase function (FAD-independent-S-hydroxynitrile lyase, too)

– Several “family” FFFs, including hydrolase “family” FFF

• 35 FFFs cover approximately 63% of known structural space and 23% of potential functional space

[Figure: bar chart of coverage (0% to 100%) by function: (S) Hydroxynitrile Lyases, Serine Transacylases, Serine Amidases, Serine Esterases, Serine Lipases, Serine Proteases, Total Structural Space]


Identification of Yeast Serine Hydrolases by FFFs and Profiling

• 6946 yeast protein sequences (NCBI and SGD)

• Threading with PROSPECTOR against PDB structures

• Analysis of top 20 threads (top five scores, four scoring functions) with serine hydrolase FFFs

• If thread is “hit” by FFF, sequence is identified as a serine hydrolase (yes or no)

• Active site profile scoring provides rank ordering of identified serine hydrolases; ≥0.25 is considered significant

Skolnick & Kihara. (2001) Proteins 42:319-331.

DiGennaro, Siew, Hoffman, Zhang, Skolnick, Neilson, Fetrow. (2001) J. Struct. Biol. 134:232-245.

Fetrow, Godzik & Skolnick. (1998) J. Mol. Biol. 282:703-711.


Annotation of yeast genome for serine hydrolase functions

• 147 proteins identified by combination of PROSPECTOR and serine hydrolase FFFs

• 52 of 147 proteins identified by more than one serine hydrolase FFF

• 55 of 147 proteins identified with significant active site profile score (≥0.25)

• 7 proteins were previously identified* as serine hydrolases (“knowns”)

– Profile score≥0.25: Dap2, Kex1, Prb1, Prc1, Ste13, and Yjl068c

– Profile score=0.23: Ppe1

*Previously identified in SGD (http://genome-www.stanford.edu/Saccharomyces/)

How good are these function assignments?


Activity-based Probe Technology

• Advantage: probe chemistry

– Identifies functional proteins in complex mixtures

– Fractionates proteome on basis of chemical reactivity (not protein abundance)

• Disadvantage: probe chemistry

– Specific for serine hydrolases?

[Figure: Biological Samples, Activity Probes, High Throughput Screening]

Patricelli, Giang, Stamp, Burbaum. (2001) Proteomics 1:1067-1071.

Kidd, Liu & Cravatt. (2001) Biochemistry 40:4005-4015.

Cravatt & Sorenson. (2000) Curr. Opin. Chem. Biol. 4:663-668.


Identification of Serine Hydrolases by ABPs

• Yeast grown under four culture conditions

• Cultures lysed, centrifuged, fractions labeled with ABP

• Affinity chromatography; separation of labeled proteins by 1D PAGE

• In-gel tryptic digest and LC-MS identification of peptides

• High quality identifications: More than one peptide identified for a given protein
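The "more than one peptide" criterion is easy to state as a filter over peptide-to-protein assignments. The sketch below assumes that peptides are counted as distinct sequences per protein, which the slide does not spell out; the peptide strings are invented and the protein names are used only as labels.

```python
from collections import defaultdict


def high_quality_identifications(peptide_hits):
    """Keep proteins supported by more than one distinct peptide.

    peptide_hits: iterable of (protein name, peptide sequence) pairs,
    for example as reported by an LC-MS search.
    """
    peptides = defaultdict(set)
    for protein, peptide in peptide_hits:
        peptides[protein].add(peptide)
    return sorted(p for p, found in peptides.items() if len(found) > 1)


# Hypothetical peptide assignments for illustration.
toy_hits = [
    ("Kex1", "LVDQPGGTGFSVEQNK"),
    ("Kex1", "DTYSWNSK"),
    ("Prc1", "YDEEFASQK"),
    ("Prc1", "YDEEFASQK"),   # same peptide seen twice counts once
    ("Dap2", "AVNLYQVSR"),
    ("Dap2", "TGWTDPSVR"),
]
confident = high_quality_identifications(toy_hits)
print(confident)
```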


Results of ABP labeling experiments

• 80 proteins uniquely labeled by ABP

• 23 of 80 proteins identified with high quality mass spec data

– 8 of 23 proteins were previously identified* as serine hydrolases (“knowns”): Dap2, Kex1, Ppe1, Prb1, Prc1, Ste13, Yjl068c and Amd2

– “unknowns”: Ygl039w, Ygl157w, Yml059c, Fas2, Ydr428c, Ynl123w, Yor084w, Eht1, Yju3, Ybr139w, Ybr204c, Yhr049w, Ylr118c, Ymr222c, and Yor280c

*Previously identified in Saccharomyces Genome Database (SGD) (http://genome-www.stanford.edu/Saccharomyces/)


Comparison of computational and experimental results

• Chemical proteomics: 23 high quality identifications

• Computational/structural proteomics: 55 proteins identified with significant active site profile score (≥0.25)

• 15 proteins identified by both methods (high quality identifications by both methods)


How well did the FFFs identify ABP-labeled proteins?

• If all 23 proteins identified by ABP labeling are correct, then:

– FFF identification: 15/23=65%

– FFF coverage of structure space (“the best we could expect to do”): 65%

– FFF coverage of biological function space (“the worst we could expect to do”): 23%

• But, are all the ABP identifications actually serine hydrolases?


What did the FFFs miss?

• 8 proteins identified by high quality ABP data, but not serine hydrolase FFFs

– Amd2 (“8th known”) identified by ABP, but not FFF because no amidase FFF had been constructed

– 3 proteins identified by dehydrogenase FFFs, not serine hydrolase FFFs (discussed subsequently)

– 3 proteins with significant threading scores, no FFF hit

   • Yor084w (1a8uA): chloroperoxidase T (known serine hydrolase)

   • Fas2 (1kas): 3-oxo-ACP-reductase/synthase

   • Ynl123w (1pysB): tRNA synthetase

– 1 protein (Ydr428c) yields no computational results


Advantages of Combining Methods: Clarification of ABP identifications

• 3 proteins identified by high quality ABP data, but not serine hydrolase FFFs

– Ygl039w, Ygl157w, and Yml059c

– All three labeled by another family of FFFs (UDP-galactose-4-epimerase, estradiol-17-beta dehydrogenase, and 3-alpha, 20-beta-hydroxysteroid dehydrogenase)

– Proteins in this family all have active site serine and tyrosine: possible site of ABP labeling

• If these protein functions are correctly identified by the FFFs AND if other five ABP identifications are correct, then:

– FFF identification: 18/23=78% (better than expected)
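The 15/23 and 18/23 figures can be reproduced with set arithmetic over the protein names listed in this talk. The sketch below only restates those lists; names are spelled Yjl068c and Yhr049w, as on the slides that discuss the proteins found by both methods, and the variable names are illustrative.

```python
# The 23 high quality ABP identifications: 8 "knowns" plus 15 "unknowns".
abp_knowns = {"Dap2", "Kex1", "Ppe1", "Prb1", "Prc1", "Ste13",
              "Yjl068c", "Amd2"}
abp_unknowns = {"Ygl039w", "Ygl157w", "Yml059c", "Fas2", "Ydr428c",
                "Ynl123w", "Yor084w", "Eht1", "Yju3", "Ybr139w",
                "Ybr204c", "Yhr049w", "Ylr118c", "Ymr222c", "Yor280c"}
abp_all = abp_knowns | abp_unknowns

# Proteins also identified by the serine hydrolase FFFs.
found_by_both = {"Dap2", "Kex1", "Ppe1", "Prb1", "Prc1", "Ste13", "Yjl068c",
                 "Eht1", "Yju3", "Ybr139w", "Ybr204c", "Yhr049w",
                 "Ylr118c", "Ymr222c", "Yor280c"}

# Proteins hit by the dehydrogenase/epimerase family of FFFs instead.
other_fff_family = {"Ygl039w", "Ygl157w", "Yml059c"}

missed = abp_all - found_by_both
serine_hydrolase_rate = len(found_by_both) / len(abp_all)
any_fff_rate = len(found_by_both | other_fff_family) / len(abp_all)

print(sorted(missed))
print(f"serine hydrolase FFFs: {serine_hydrolase_rate:.0%}")
print(f"including the other FFF family: {any_fff_rate:.0%}")
```

The eight names in `missed` match the breakdown given for what the FFFs missed: Amd2, the three proteins hit by the other FFF family, the three with threading hits but no FFF hit, and Ydr428c.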


What about the “unknowns”?

• 15 proteins identified by both methods

– 7 of 8 “knowns” identified by both methods (Dap2, Kex1, Ppe1, Prb1, Prc1, Ste13, and Yjl068c)

– 8 novel annotations of proteins as serine hydrolases (Eht1, Yju3, Ybr139w, Ybr204c, Yhr049w, Ylr118c, Ymr222c, and Yor280c)

• All 8 annotated as “function unknown” or “hypothetical protein” in SGD

• High confidence in novel annotations (two independently applied methods)



New Family of Eukaryotic Serine Hydrolases (FSH)

• 3 yeast proteins (Yhr049w, Ymr222c, and Yor280c) identified by both ABP and FFFs

• 3 sequences related by sequence similarity

• All annotated as “function unknown” at SGD

• None annotated with confidence by other computational methods (Prints, Pfam or Blocks)


New Family of Eukaryotic Serine Hydrolases (FSH)

• These 3 proteins related to proteins from other eukaryotic proteomes (human, mouse, worm, fruit fly, mosquito, plant)

• No NCBI biochemical annotations for any of these proteins (except one—see next slide)


Cautionary Tale for Annotation Transfer

• One FSH protein, DYR_SCHPO, from S. pombe was annotated as a dihydrofolate reductase (DHFR)

• Sequence analysis indicates a multidomain protein: contains both DHFR and serine hydrolase function

– Possible biological connection between serine hydrolase and DHFR functions?

• Annotation transfer methods would have assigned incorrect function to FSH family of proteins

[Figure: S. cerevisiae Fsh3p, S. pombe DYR_SCHPO, S. cerevisiae Fsh2p, S. cerevisiae Fsh1p, and S. cerevisiae DHFR]


Comparison to other computational methods: How much information does structure add?

• ABPs identified 23 proteins with high confidence

• FFFs identified 15 (65%) as serine hydrolases

• Pfam identified 10 (43%) as serine hydrolases

[Figure: bar chart (0 to 25 proteins) comparing Total, FFF, and Pfam for all experimental hits and for experimental hits with SGD “molecular function unknown”]


Summary of yeast serine hydrolase annotation project

• 15 serine hydrolase sequences identified by both methods

– 7 of 8 known serine hydrolases identified by both methods (all eight identified by ABP labeling)

– 8 new serine hydrolases identified (formerly annotated as “function unknown”)

– New family of eukaryotic serine hydrolases (FSH)

• FFF annotation clarifies molecular function of the three proteins identified by ABP labeling

• More accurately identify limits of FFF and active site profiling accuracy

– If 23 ABP identifications are correct, FFF correctly identifies function of 78%

Baxter, et al. (2004) Mol. Cell Prot.


Structure-based annotation of protein function

• Prospective experimental validation of predictions demonstrates accuracies (and limitations) of current methods

• Mis-annotation of function continues to be a problem—found in all databases

• Results suggest that a significant number of proteins will exhibit well-studied functions, but are not identified by current computational methods

• Profiling of sequences around functional site provides additional information on function and specificity


Acknowledgements

– Susan Baxter (NCGR)

– Melanie Nelson (SAIC)

– Stephen Cammer (SDSC)

– Brian Hoffman (Scitegic)

– Jen Montimurro (Wadsworth Ctr)

– Stacy Knutson (Wake Forest)

– Jeff Speir (Scripps)

– Jeannine DiGennaro (GeneVault)

– Steve Betz (Neurocrine)

– Marijo Galina

– Susan Okuley

– Chris Scott

ActivX

– Jonathan Burbaum

– Jonathan Rosenblum

– Dan Giang

(now Cengent Therapeutics)