Structure-based Analysis of Protein Function

PTPs and Serine Hydrolases

Jacquelyn S. Fetrow
Reynolds Professor of Computational Biophysics
Departments of Physics and Computer Science
Wake Forest University

Need for Improved Proteome Analyses

• Powerful genomics and proteomics methods identify large numbers of protein sequences

• Need to identify biochemical function and functional state accurately

• Need to increase quality of annotations: decrease false positive and false negative identifications

• Except in model organisms, over 50% of all proteins identified by large-scale sequencing projects are annotated as “function unknown”

• Existing annotations are often inadequate: they do not capture the functional complexity of proteins

• Annotation transfer methods can assign incorrect function in a significant number of cases

Knowing the Sequence is Not Enough to Determine the Function

Figure labels: S. cerevisiae Fsh3p; S. pombe DYR_SCHPO; S. cerevisiae DHFR

Structural proteomics approach to function annotation

• Most common method:
– structural superposition
– function annotation transfer based on structural similarity

Figure labels: COX-1 (1cqe); COX-2 (1cx2)

But, Knowing the Structure is Not Enough to Predict the Function

See also:

Martin, et al. 1998, Structure 6:875-884

Hegyi and Gerstein, 1999, J. Mol. Biol. 288:147-164.

Figure (chart) categories: Similar Structure, Similar Function; Similar Structure, Different Function; Different Structure, Different Function; Different Structure, Similar Function

Values shown: 48%, 27%, 23%, 1.5%

Koppensteiner, W., Lackner, P., Wiederstein, M., & Sippl, M. J. (2000) J. Mol. Biol. 296:1139.

Analysis of high resolution structures released in 1998 compared to pre-1998 PDB structures

But, then, what do we really mean by function?

• Two isoforms of human cyclooxygenase, COX-1 & COX-2

• COX-1 is expressed in healthy tissues; COX-2 is induced in inflammatory response

• COX-1 and COX-2 have ~60% sequence identity, very similar overall structures, and identical catalytic residues


• Aspirin/NSAIDs inhibit both isoforms; COX-1 inhibition can lead to gastrointestinal side effects

• Newer COX-2 selective inhibitors (Vioxx™, Celebrex™) have the anti-inflammatory and pain-relieving benefits of NSAIDs with reduced side effects

Goal: accurate identification of active sites and their similarities and differences


1cqe: P RLVLTVRSNLI AQ TF –EFNQLYHWH –R FGM Y- GESMIEMGAPFSLK

1cx2: P –YVLTSRSYLI AQ TF SEFNTLYHWH YR FSL YL GETMVELGAPFSLK


Fuzzy Functional Forms and Active Site Profiling

• Advantage: computational method based on structure
– Use of structural (not just sequence) information
– Identification of key functional features (not annotation transfer via global sequence alignment)
– Fast; can be globally applied to protein sequences

• Disadvantage: computational method
– Scoring function cutoffs
– False positive and negative rates
– Size of FFF library

Fetrow & Skolnick. J. Mol. Biol. (1998) 282: 949-968.
Cammer, Hoffman, Speir, Canady, Nelson, Knutson, Gallina, Baxter, Fetrow. J. Mol. Biol. (2003) 334:387-401.

Geometric definition of an FFF

• Defined by three metrics
– Key residues (and their identity) involved in active site chemistry
– Geometric constraints (distances between alpha carbons)
– Allowed variability for geometric constraints

• Training
– Against all PDB structures
– Relax constraints to identify all true positive structures, but no false positives
– Cross validation

Figure labels: A, B, C

Fetrow & Skolnick. J. Mol. Biol. (1998) 282: 949-968.
Fetrow, Godzik & Skolnick. (1998) J. Mol. Biol. 282:703-711.
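The three metrics above can be illustrated with a minimal sketch. It is not the published FFF implementation: the function name `match_fff`, the data layout, the residue names, and the toy coordinates and tolerances are all assumptions chosen for illustration. It searches for a set of residues with the required identities whose alpha-carbon distances fall within the allowed variability.

```python
from itertools import permutations
from math import dist

def match_fff(residues, allowed, constraints):
    """Return indices of residues satisfying the motif, or None.

    residues:    list of (residue_name, (x, y, z)) alpha-carbon coordinates
    allowed:     one set of permitted residue names per key position
    constraints: {(i, j): (target_distance, tolerance)} between key positions
    """
    for combo in permutations(range(len(residues)), len(allowed)):
        # Metric 1: residue identity at each key position
        if any(residues[r][0] not in allowed[k] for k, r in enumerate(combo)):
            continue
        # Metrics 2 and 3: distance constraints with allowed variability
        if all(abs(dist(residues[combo[i]][1], residues[combo[j]][1]) - target) <= tol
               for (i, j), (target, tol) in constraints.items()):
            return combo
    return None

# Hypothetical toy structure and motif (not taken from any PDB entry)
residues = [("GLY", (0, 0, 0)), ("CYS", (3, 0, 0)),
            ("HIS", (3, 4, 0)), ("ASP", (0, 4, 0))]
allowed = [{"CYS"}, {"HIS"}, {"ASP"}]
constraints = {(0, 1): (4.0, 0.5), (1, 2): (3.0, 0.5), (0, 2): (5.0, 0.5)}
hit = match_fff(residues, allowed, constraints)
```

Relaxing the tolerances in `constraints` corresponds to the training step in which constraints are loosened until all true positives are found; the exhaustive search here is only practical for small examples.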

Advantages of the FFF approach

• Use of structural information enables:

– Function annotation farther into “twilight zone”

– Identification of similar functional sites in proteins of different structure

• Functional complexity

– Identification of multiple chemistries within a single functional site

– Identification of multiple functions within a protein domain

Figure labels (serine-threonine phosphatase): FFF 1 = metal binding site; FFF 2 = metal binding site; FFF 3 = phosphatase catalytic residues; FFF for redox regulatory site

Fetrow, Siew, Skolnick. FASEB J (1999) 13:1866-74

Comparison of putative redox active site residues

PP1             PP2A            PP2B
P30366 C C P    P48726 V Y V    P48456 A C F
P48487 C C P    P32838 I Y F    Q27889 A C F
P48486 C C P    P32345 V Y K    P16299 A S F
P48483 C C P    P23595 L Y I    P20651 A S F
P48488 C C P    P48580 L Y I    P48453 A S F
P48481 C C P    P23696 L Y L    Q08209 A C F
P48480 C C P    P11493 L Y L    P20652 A C F
P48484 C C P    P11611 L Y L    P48452 A C F
P48489 C C P    P11082 L Y L    P48455 T C F
P22198 C C P    P48463 L Y L    P48454 T C F
P48485 T C P    P13353 L Y L    Q12705 S A F
P48482 S C P    P05323 L Y L    O42773 S N F
P23880 C C P    P48577 L F I    P48457 S A F
Q05547 C C P    P48579 L Y I    Q05681 S C F
P12982 C C P    Q07099 L Y I    P23287 S V F
P48461 C C P    Q07098 L Y I    P14747 S N F
P36874 C C P    P23778 L Y V
P36873 C C P    Q06009 L Y V
P37139 C C P    Q07100 L Y V
P08128 C C P    P48578 L Y V
P08129 C C P    P23635 L Y V
P48462 C C P    P23636 L Y V
P37140 C C P    P23594 L Y I
P13681 C C P
P32598 C C P
P20654 C C P
P23777 C C P
P48490 C C P
P23733 T C P
P23734 T C P

P20604 V F A

Cluster analysis of PP1, PP2A, and PP2B subfamilies

Figure labels: PP1, PP2A, PP2B

Comparison of putative redox active site residues

PP1             PP2A            PP2B
P30366 C C P    P48726 V Y V    P48456 A C F
P48487 C C P    P32838 I Y F    Q27889 A C F
P48486 C C P    P32345 V Y K    P16299 A S F
P48483 C C P    P20604 V F A    P20651 A S F
P48488 C C P    P48580 L Y I    P48453 A S F
P48481 C C P    P23696 L Y L    Q08209 A C F
P48480 C C P    P11493 L Y L    P20652 A C F
P48484 C C P    P11611 L Y L    P48452 A C F
P48489 C C P    P11082 L Y L    P48455 T C F
P22198 C C P    P48463 L Y L    P48454 T C F
P48485 T C P    P13353 L Y L    Q12705 S A F
P48482 S C P    P05323 L Y L    O42773 S N F
P23880 C C P    P48577 L F I    P48457 S A F
Q05547 C C P    P48579 L Y I    Q05681 S C F
P12982 C C P    Q07099 L Y I    P23287 S V F
P48461 C C P    Q07098 L Y I    P14747 S N F
P36874 C C P    P23778 L Y V
P36873 C C P    Q06009 L Y V
P37139 C C P    Q07100 L Y V
P08128 C C P    P48578 L Y V
P08129 C C P    P23635 L Y V
P48462 C C P    P23636 L Y V
P37140 C C P    P23594 L Y I
P13681 C C P    P23595 L Y I
P32598 C C P
P20654 C C P
P23777 C C P
P48490 C C P
P23733 T C P
P23734 T C P

Limitations of the FFF Approach

• FFFs use only the identities of three residues
– Leads to false positive identifications

• An FFF hit is only yes/no
– Does not have a score or confidence associated with it

• FFFs identify only key residues
– Do not identify specificity (substrate or small molecule specificity)

Active site signature: first step in active site profiling

• Use FFF to identify key functional residues

• Extract fragments in structural proximity to FFF residues

• Arrange fragments to form a linear sequence—active site signature

Cammer, Hoffman, Speir, Canady, Nelson, Knutson, Gallina, Baxter, Fetrow. J. Mol. Biol. (2003) 334:387-401.
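The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the published procedure: the function name, the use of alpha-carbon distances, the `radius` cutoff, and the toy coordinates are hypothetical, since the slide says only "structural proximity".

```python
from math import dist

def active_site_signature(ca_coords, sequence, key_positions, radius):
    """Join, in sequence order, the fragments lying near any key residue.

    ca_coords:     one (x, y, z) alpha-carbon coordinate per residue
    sequence:      one-letter residue codes, same length as ca_coords
    key_positions: indices of the FFF key residues
    radius:        proximity cutoff in angstroms (an assumed parameter)
    """
    near = [any(dist(ca_coords[i], ca_coords[k]) <= radius for k in key_positions)
            for i in range(len(sequence))]
    fragments, current = [], ""
    for residue, keep in zip(sequence, near):
        if keep:
            current += residue          # extend the current fragment
        elif current:
            fragments.append(current)   # close a finished fragment
            current = ""
    if current:
        fragments.append(current)
    return "".join(fragments), fragments

# Hypothetical hairpin-shaped chain: the last residue folds back near residue 1
coords = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0), (15.2, 0, 0),
          (15.2, 3.8, 0), (11.4, 3.8, 0), (7.6, 3.8, 0), (3.8, 3.8, 0)]
signature, fragments = active_site_signature(coords, "ACDEFGHIK", [1], radius=4.0)
```

The example shows why a signature is not a contiguous stretch of sequence: residues far apart in the chain can end up adjacent in the signature because they are close in space.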

Profile segments for 7 enzymes identified by one FFF

1mucA_A RHRVFKLKIGA-ASIFALKIAKNGGPVTA--GLYGGTMLEGSIGTLASAHAF--LTWGTELFGPLLL
2mucA_B RHRVFKLKIGA-ASIFALKIAKNGGPVTA--GLYGGTMLEGSIGTLASAHAF--LTWGTELIGPLLL
1bkhC_C RHRVFKLKIGA-ASIFALKIAKNGGPVTA--GLYGGTMLEGSIGTLASAHAF--LTWGTELFGPLLL
1bkhB_D RHRVFKLKIGA-ASIFALKIAKNGGPVTA--GLYGGTMLEGSIGTLASAHAF--LTWGTELFGPLLL
3mucA_E RHRVFKLKIG--ASIFALKIAKNGGPVTA--GLYGGTMLEGSIGTLASAHAF--LTWGTELFGPLLL
1chrA_F RHNRFKVKLGF-VDVFSLKLCNMG-VTIA--ASYGGTMLDGSIGTLASAHAF-SLPFGCELIGPFVL
2chr__G RHNRFKVKLGF-VDVFSLKLCNMGGVTIA--ASYGGTMLDSTIGTSVALQLYS-LPFGCELIGPFVL

Align signatures to create active site profile

Examples of residues identical across family

Examples of residues different between family members—possible specificity determinants?

Active Site Profile Score

• Empirically derived function takes into account sequence similarity

• Enables approaches based on active site information
– Clustering of functional families (profile score)
– Novel sequence family and subfamily assignment (pairwise score)

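Weights: Identity (*) = 1.0; Strong (:) = 0.2; Weak (.) = 0.1

1cozA_1          GTFDLLHWGHIKLLEAYRTISTTKIKEE
1cozB_1          GTFDLLHWGHIKLLEAYRTISTTKIKEE
BS002557__1cozA  GTFDPPHNGHLLMANDYREVSSTMIRER
                 ****  * **: : : ** :*:* *:*.

\[
\mathrm{Score} = \frac{1}{N}\left(\sum_{n=1} I_n + \sum_{m=1} S_m + \sum_{k=1} W_k + \sum_{l=1} g_l\right)
\]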
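A rough sketch of how such a score could be computed from the consensus line of the alignment above. Several things here are assumptions rather than the published function: the score is taken to be the weighted count of consensus symbols divided by the alignment length, the g term is omitted entirely, and the names `WEIGHTS` and `profile_score` are hypothetical. Only the three weights come from the slide.

```python
# Weights from the slide: identity (*), strong (:), weak (.) conservation
WEIGHTS = {"*": 1.0, ":": 0.2, ".": 0.1}

def profile_score(consensus, length):
    """Weighted count of consensus symbols divided by alignment length.

    Unweighted columns (blanks) contribute nothing, so whitespace in the
    consensus string does not affect the result. The g term is NOT modeled.
    """
    return sum(WEIGHTS.get(symbol, 0.0) for symbol in consensus) / length

reference = "GTFDLLHWGHIKLLEAYRTISTTKIKEE"       # 1cozA_1 signature from the slide
consensus = "****  * **: : : ** :*:* *:*."       # consensus line for the three signatures
score = profile_score(consensus, len(reference))
```

Because the g term is left out, this sketch should not be expected to reproduce the profile scores quoted on the slides.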

Validation of Active Site Profile Score

• 193 real functional families
– 193 FFFs applied to known structures from PDB to identify functional families
– For each protein in each family, extract active site signature
– Align all signatures in a given family to create profile
– Calculate profile score

• 193 decoy functional families
– Geometric criteria “relaxed” slightly to identify first “false positive”
– (Automatically identified as part of training procedure)
– Extract signatures, align to create profile, calculate score

Figure labels: A, B, C (shown for both the real and the decoy family)

Validation of the active site profile score

Active site profile for serine carboxypeptidases (profile score = 0.42)

1ac5_ LNGGPC-GESYAGQY-IGNGWI-----NMYNFN-NGDKDLICNN-NASHMVPFD
1ivyA LNGGP--GESYAGIYIVGNGLSLFNIYNLY--N-NGDVDMACNF-GAGHMVPTD
1ysc_ LNGGP--GESYAGHY-IGNGLTMAGE-NVYDIRKAGDKDFICNWLNGGHMVPFD
1ivyB LNGGP--GESYAGIYIVGNGLSLFNIYNLYA-N-NGDVDMACNF-GAGHMVPTD
1cpy_ LNGGP-AGASYAGHYIIGNGLTMAG--NVYDIR-AGDKDFICNWLNGGHMVPFD
      *****  * **** * :***       *:*  .  ** *: **  ...**** *

Serine carboxypeptidase decoy profile (profile score = 0.14)

1ac5_ LNGGPC-GESYAGQY--IGNGWI-----NMYNFN-NGDKDLICNN---NASHMVPFD
1ivyA LNGGP--GESYAGIYI-VGNGLSLFNIYNLY--N-NGDVDMACNF---GAGHMVPTD
1ysc_ LNGGP--GESYAGHY--IGNGLTMAGE-NVYDIRKAGDKDFICNWL--NGGHMVPFD
1ivyB LNGGP--GESYAGIYI-VGNGLSLFNIYNLYA-N-NGDVDMACNF---GAGHMVPTD
1cpy_ LNGGP-AGASYAGHYI-IGNGLTMAG--NVYDIR-AGDKDFICNWL--NGGHMVPFD
1c4xA LHGAG--GNSMGGAVTLMGSVG-----SFVY----HGRQDRIVPLTLDRCGHWAQLE
      *:*.   * * .*    :*.         :*     *  *          .* . :

• Profile score compared to decoy profile score shows clear separation for most families

• Separation less distinct when decoy is functionally related to FFF family

• Profile score ≥0.25 considered significant

Figure: Profile Score (y-axis) versus FFF Functional Family (x-axis) for true profiles and decoy profiles. Panel A: families 1 to 171, scores from -0.4 to 1.2. Panel B: families 173 to 194, scores from 0 to 1.2.

Prospective validation of the method

• Human protein tyrosine phosphatases (PTPs)
– PTPs are important signal transduction proteins
– Analysis demonstrates accuracy and throughput

• Yeast serine hydrolases
– Serine hydrolases are crucial for many cellular processes
– Analysis demonstrates experimental validation of sensitivity and accuracy of function annotations
– Performance compared to other tools

Method for genome analysis

• Download protein sequences encoded by human or yeast genome

• Run Prospector (Skolnick, et al) fold recognition program

• For any protein sequence that aligns with structure used to create FFF:
– Take top 20 alignments (top five hits for four scoring functions)
– Determine if FFF residues conserved

• If yes:
– Predict FFF function
– Identify active site signature
– Align and calculate pairwise profile score
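The decision logic in the steps above might be sketched as below. This does not call Prospector; it assumes the threading alignments are already available as plain dictionaries, and it simplifies "top five hits for four scoring functions" to a single ranked list. The function name `annotate`, the data layout, and the positions and residues in the example are hypothetical.

```python
def annotate(alignments, fff_positions, fff_allowed, top_n=20):
    """Check whether FFF key residues are conserved in any top alignment.

    alignments:    list of {"score": float, "aligned": {template_pos: query_residue}}
    fff_positions: template positions of the FFF key residues
    fff_allowed:   one set of permitted residues per key position
    Returns (True, residues) for the first alignment that conserves every
    key residue, otherwise (False, None).
    """
    best = sorted(alignments, key=lambda a: a["score"], reverse=True)[:top_n]
    for aln in best:
        residues = [aln["aligned"].get(pos) for pos in fff_positions]
        if all(res in ok for res, ok in zip(residues, fff_allowed)):
            return True, residues
    return False, None

# Hypothetical alignments of one query sequence to an FFF template structure
alignments = [
    {"score": 12.5, "aligned": {10: "C", 16: "R"}},
    {"score": 30.1, "aligned": {10: "S", 16: "R"}},
]
predicted, key_residues = annotate(alignments, [10, 16], [{"C"}, {"R"}])
```

In the example the best-scoring alignment fails the residue check and a lower-ranked one passes, which is the reason for examining more than the single top hit.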

PTP Functional Family

• Catalytic site is found in multiple protein structures

• Active site structure is conserved

Figure labels: 2hnp, a classical PTP; 1vhr, a dual specificity PTP; 1phr, a low molecular weight PTP

Annotation of human genome sequences for PTP function

• Identified over 150 human PTPs
– Comparison to experimentally-verified PTPs shows that over 95% of known PTPs identified: false negative rate < 5%

• Over 40 unique PTPs identified
– Sequences that are not recognized as PTPs by any other method (including BLAST, Blocks, Prints and Pfam)

Figure: proportion of identified PTPs (0% to 100%) that are unique to FFF versus found by FFF + other tools

How good are these function assignments?

Functional Characterization of PTP Proteins

• Clone, express, and purify

• Test PTPs for biochemical function

• Progress (before termination of project)

– 49 soluble PTP domains purified

– 37 PTPs active in vitro

– Four active PTPs that were not previously recognized by other methods (including no recognizable similarity to any PTP in the public databases)

Figure: Hydrolysis of pNPP by PTP #1. x-axis: pNPP (mM), 5 to 20; y-axis: V (A405nm/sec), 100 to 500 x 10^-6.

Figure: Proteins in Target Set. y-axis: Number of Proteins, 0 to 180; categories: TOTAL, Structure-based novels, Clean novels; series: Target Set, Soluble Protein, Active in vitro; percentages shown: 30%, 15%, 65%, 35%, 75%, 66%.

• False positive rate cannot be absolutely determined; PTP project shows:
– Total PTP proteins: 49 soluble proteins, with 37 active in pNPP hydrolysis assay (~25% not validated in assay)
– PTP proteins unrecognized by other methods: 6 soluble proteins, with 4 active in pNPP hydrolysis assay (~33% not validated in assay)
– Maximum false positive rate: ~25-33%

• Why a maximum?
– Only one substrate and assay condition tested
– Small sample set
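The percentages above follow directly from the counts on the slide. A short check of the arithmetic, with a hypothetical helper name:

```python
def unvalidated_fraction(soluble, active):
    """Fraction of soluble proteins that showed no activity in the assay."""
    return (soluble - active) / soluble

all_ptps = unvalidated_fraction(49, 37)    # all purified PTP domains: 12 of 49
novel_ptps = unvalidated_fraction(6, 4)    # PTPs unrecognized by other methods: 2 of 6
```

These fractions are upper bounds on the false positive rate because an inactive protein in a single pNPP assay is not necessarily a wrong prediction.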

Functional Characterization of PTP Proteins

• Identified over 150 human PTPs

• Identify active site signature from each PTP sequence

• Align to create active site profile for PTP family

• Cluster to identify subfamilies of PTPs

Figure: proportion of identified PTPs (0% to 100%) that are unique to FFF versus found by FFF + other tools

Active Site Profiling of Human PTPs: Identification of Sub-families


• Novel PTP#5
– BLAST (global sequence similarity) indicates that PTP#5 is a dual specificity PTP
– Clustering of active site profile indicates “PTP#5” falls into class 1

Figure: clustering of all PTPs into Subfamilies 1 to 8, with groups labeled Classical PTPs, Dual specificity PTPs and PTEN, and Low molecular weight PTPs



Summary of human PTP annotation project

• 150 PTPs identified in human genome
– Over 95% of previously annotated PTPs identified (false negative rate <5%)
– Of those tested in our lab, 75% exhibited PTP function

• 40 proteins not identified by other methods (BLAST, Blocks, Pfam)
– Of those tested, 66% exhibited PTP function

• Maximum false positive rate: 25-33%

• Active site profiling subclassifies proteins differently than global sequence alignment

FFFs for Serine Hydrolases

• 35 serine hydrolase FFFs describing 25 EC-defined functions
– Nucleophilic serine in active site
– Protease, lipase, esterase, amidase or transacylase function (FAD-independent S-hydroxynitrile lyase, too)
– Several “family” FFFs, including hydrolase “family” FFF

• 35 FFFs cover approximately 63% of known structural space and 23% of potential functional space

Figure: coverage (0% to 100%) by function: (S) Hydroxynitrile Lyases, Serine Transacylases, Serine Amidases, Serine Esterases, Serine Lipases, Serine Proteases, Total Structural Space

Identification of Yeast Serine Hydrolases by FFFs and Profiling

• 6946 yeast protein sequences (NCBI and SGD)

• Threading with PROSPECTOR against PDB structures

• Analysis of top 20 threads (top five scores, four scoring functions) with serine hydrolase FFFs

• If thread is “hit” by FFF, sequence is identified as a serine hydrolase (yes or no)

• Active site profile scoring provides rank ordering of identified serine hydrolases; ≥0.25 is considered significant

Skolnick & Kihara. (2001) Proteins 42:319-331.
DiGennaro, Siew, Hoffman, Zhang, Skolnick, Neilson, Fetrow. (2001) J. Struct. Biol. 134:232-245.
Fetrow, Godzik & Skolnick. (1998) J. Mol. Biol. 282:703-711.
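The last step, rank ordering FFF hits and flagging those at or above the 0.25 threshold, could look like the sketch below. The sequence identifiers and scores are made up for illustration, and `rank_hits` is a hypothetical name; only the threshold comes from the slide.

```python
SIGNIFICANT = 0.25  # significance threshold stated on the slide

def rank_hits(scores):
    """Sort FFF hits by profile score and flag the significant ones.

    scores: {sequence_id: active site profile score}
    Returns a list of (sequence_id, score, is_significant), best first.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(seq_id, score, score >= SIGNIFICANT) for seq_id, score in ranked]

# Hypothetical scores for three sequences that were all "hit" by an FFF
example_scores = {"seq_a": 0.41, "seq_b": 0.23, "seq_c": 0.25}
ranked = rank_hits(example_scores)
```

The yes/no FFF hit and the score play different roles here: the hit decides membership in the list, while the score supplies the ordering and confidence that the FFF alone lacks.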

Annotation of yeast genome for serine hydrolase functions

• 147 proteins identified by combination of PROSPECTOR and serine hydrolase FFFs

• 52 of 147 proteins identified by more than one serine hydrolase FFF

• 55 of 147 proteins identified with significant active site profile score (≥0.25)

• 7 proteins were previously identified* as serine hydrolases (“knowns”)
– Profile score ≥0.25: Dap2, Kex1, Prb1, Prc1, Ste13, and Yjl068c
– Profile score = 0.23: Ppe1

*Previously identified in SGD (http://genome-www.stanford.edu/Saccharomyces/)

How good are these function assignments?

Activity-based Probe Technology

• Advantage: probe chemistry
– Identifies functional proteins in complex mixtures
– Fractionates proteome on basis of chemical reactivity (not protein abundance)

• Disadvantage: probe chemistry
– Specific for serine hydrolases?

Figure labels: Biological Samples; Activity Probes; High Throughput Screening

Patricelli, Giang, Stamp, Burbaum. (2001) Proteomics 1:1067-1071.
Kidd, Liu & Cravatt. (2001) Biochemistry 40:4005-4015.
Cravatt & Sorenson. (2000) Curr. Opin. Chem. Biol. 4:663-668.

Identification of Serine Hydrolases by ABPs

• Yeast grown under four culture conditions

• Cultures lysed, centrifuged, fractions labeled with ABP

• Affinity chromatography; separation of labeled proteins by 1D PAGE

• In-gel tryptic digest and LC-MS identification of peptides

• High quality identifications: More than one peptide identified for a given protein

Results of ABP labeling experiments

• 80 proteins uniquely labeled by ABP

• 23 of 80 proteins identified with high quality mass spec data
– 8 of 23 proteins were previously identified* as serine hydrolases (“knowns”): Dap2, Kex1, Ppe1, Prb1, Prc1, Ste13, Yjl068c and Amd2
– 15 “unknowns”: Ygl039w, Ygl157w, Yml059c, Fas2, Ydr428c, Ynl123w, Yor084w, Eht1, Yju3, Ybr139w, Ybr204c, Yhr049w, Ylr118c, Ymr222c, and Yor280c

*Previously identified in Saccharomyces Genome Database (SGD) (http://genome-www.stanford.edu/Saccharomyces/)

Comparison of computational and experimental results

• Chemical proteomics: 23 high quality identifications

• Computational/structural proteomics: 55 proteins identified with significant active site profile score (≥0.25)

• 15 proteins identified by both methods (high quality identifications by both methods)

How well did the FFFs identify ABP-labeled proteins?

• If all 23 proteins identified by ABP labeling are correct, then:
– FFF identification: 15/23=65%
– FFF coverage of structure space (“the best we could expect to do”): 65%
– FFF coverage of biological function space (“the worst we could expect to do”): 23%

• But, are all the ABP identifications actually serine hydrolases?

What did the FFFs miss?

• 8 proteins identified by high quality ABP data, but not serine hydrolase FFFs
– Amd2 (“8th known”) identified by ABP, but not FFF because no amidase FFF had been constructed
– 3 proteins identified by dehydrogenase FFFs, not serine hydrolase FFFs (discussed subsequently)
– 3 proteins with significant threading scores, no FFF hit
  • Yor084w (1a8uA): chloroperoxidase T (known serine hydrolase)
  • Fas2 (1kas): 3-oxo-ACP-reductase/synthase
  • Ynl123w (1pysB): tRNA synthetase
– 1 protein (Ydr428c) yields no computational results

Advantages of Combining Methods: Clarification of ABP identifications

• 3 proteins identified by high quality ABP data, but not serine hydrolase FFFs
– Ygl039w, Ygl157w, and Yml059c
– All three labeled by another family of FFFs (UDP-galactose-4-epimerase, estradiol-17-beta dehydrogenase, and 3-alpha, 20-beta-hydroxysteroid dehydrogenase)
– Proteins in this family all have active site serine and tyrosine: possible site of ABP labeling

• If these protein functions are correctly identified by the FFFs AND if other five ABP identifications are correct, then:
– FFF identification: 18/23=78% (better than expected)
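The 65% and 78% figures can be checked with simple set operations on the protein names listed in the slides. The variable names are hypothetical, and the sketch uses the spellings Yjl068c and Yhr049w, which is how these two proteins appear in most of the slides.

```python
# 23 high quality ABP identifications: 8 "knowns" plus 15 "unknowns"
abp_knowns = {"Dap2", "Kex1", "Ppe1", "Prb1", "Prc1", "Ste13", "Yjl068c", "Amd2"}
abp_unknowns = {"Ygl039w", "Ygl157w", "Yml059c", "Fas2", "Ydr428c", "Ynl123w",
                "Yor084w", "Eht1", "Yju3", "Ybr139w", "Ybr204c", "Yhr049w",
                "Ylr118c", "Ymr222c", "Yor280c"}
abp = abp_knowns | abp_unknowns

# ABP-labeled proteins that were also hit by serine hydrolase FFFs
serine_hydrolase_fff = {"Dap2", "Kex1", "Ppe1", "Prb1", "Prc1", "Ste13", "Yjl068c",
                        "Eht1", "Yju3", "Ybr139w", "Ybr204c", "Yhr049w",
                        "Ylr118c", "Ymr222c", "Yor280c"}
# ABP-labeled proteins hit instead by the dehydrogenase/epimerase FFF family
dehydrogenase_fff = {"Ygl039w", "Ygl157w", "Yml059c"}

both = abp & serine_hydrolase_fff
explained = abp & (serine_hydrolase_fff | dehydrogenase_fff)
unexplained = abp - explained

serine_fraction = len(both) / len(abp)          # 15/23
explained_fraction = len(explained) / len(abp)  # 18/23
```

The five proteins left in `unexplained` are the ones the slides discuss individually: Amd2, Yor084w, Fas2, Ynl123w, and Ydr428c.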

What about the “unknowns”?

• 15 proteins identified by both methods
– 7 of 8 “knowns” identified by both methods (Dap2, Kex1, Ppe1, Prb1, Prc1, Ste13, and Yjl068c)

– 8 novel annotations of proteins as serine hydrolases (Eht1, Yju3, Ybr139w, Ybr204c, Yhr049w, Ylr118c, Ymr222c, and Yor280c)

• All 8 annotated as “function unknown” or “hypothetical protein” in SGD

• High confidence in novel annotations (two independently applied methods)


New Family of Eukaryotic Serine Hydrolases (FSH)

• 3 yeast proteins (Yhr049w, Ymr222c, and Yor280c) identified by both ABP and FFFs

• 3 sequences related by sequence similarity

• All annotated as “function unknown” at SGD

• None annotated with confidence by other computational methods (Prints, Pfam or Blocks)

New Family of Eukaryotic Serine Hydrolases (FSH)

• These 3 proteins related to proteins from other eukaryotic proteomes (human, mouse, worm, fruit fly, mosquito, plant)

• No NCBI biochemical annotations for any of these proteins (except one—see next slide)

Cautionary Tale for Annotation Transfer

• One FSH protein, DYR_SCHPO, from S. pombe was annotated as a dihydrofolate reductase (DHFR)

• Sequence analysis indicates a multidomain protein: contains both DHFR and serine hydrolase function
– Possible biological connection between serine hydrolase and DHFR functions?

• Annotation transfer methods would have assigned incorrect function to FSH family of proteins

Figure labels: S. cerevisiae Fsh3p; S. pombe DYR_SCHPO; S. cerevisiae Fsh2p; S. cerevisiae Fsh1p; S. cerevisiae DHFR

Comparison to other computational methods: How much information does structure add?

• ABPs identified 23 proteins with high confidence

• FFFs identified 15 (65%) as serine hydrolases

• Pfam identified 10 (43%) as serine hydrolases

Figure: number of proteins (0 to 25) for All Experimental Hits and for Experimental Hits with SGD "molecular function unknown"; series: Total, FFF, Pfam

Summary of yeast serine hydrolase annotation project

• 15 serine hydrolase sequences identified by both methods
– 7 of 8 known serine hydrolases identified by both methods (all eight identified by ABP labeling)
– 8 new serine hydrolases identified (formerly annotated as “function unknown”)
– New family of eukaryotic serine hydrolases (FSH)

• FFF annotation clarifies molecular function of the three proteins identified by ABP labeling

• More accurately identify limits of FFF and active site profiling accuracy
– If 23 ABP identifications are correct, FFF correctly identifies function of 78%

Baxter, et al. (2004) Mol. Cell Prot.

Structure-based annotation of protein function

• Prospective experimental validation of predictions demonstrates accuracies (and limitations) of current methods

• Mis-annotation of function continues to be a problem—found in all databases

• Results suggest that a significant number of proteins will exhibit well-studied functions, but are not identified by current computational methods

• Profiling of sequences around functional site provides additional information on function and specificity

Acknowledgements

– Susan Baxter (NCGR)
– Melanie Nelson (SAIC)
– Stephen Cammer (SDSC)
– Brian Hoffman (Scitegic)
– Jen Montimurro (Wadsworth Ctr)
– Stacy Knutson (Wake Forest)
– Jeff Speir (Scripps)
– Jeannine DiGennaro (GeneVault)
– Steve Betz (Neurocrine)
– Marijo Galina
– Susan Okuley
– Chris Scott

ActivX
– Jonathan Burbaum
– Jonathan Rosenblum
– Dan Giang

(now Cengent Therapeutics)
