System approaches to the prediction of protein function

Søren Brunak
Center for Biological Sequence Analysis
Technical University of Denmark
brunak@cbs.dtu.dk
www.cbs.dtu.dk


40-60% of the proteins in the human genome have unknown function


Diverse functional categories of cell cycle regulated yeast proteins

Level 1 GO categories for 349 cell cycle regulated yeast genes. Only 95 of these belong to the "Cell Cycle" category (biological process).

Molecular_function unknown (134)
Catalytic activity (109)
Binding (54)
Enzyme regulator activity (19)
Transcription regulator activity (17)
Structural molecule activity (13)
Transporter activity (8)
Motor activity (7)
Signal transducer activity (7)
Chaperone activity (1)


Diverse functional categories for human nucleolus proteins

Level 1 GO categories for 148 human genes located in the nucleolus. Only 5 of these belong to the "Nucleolus" category (cellular component).

Binding (77)
Structural molecule activity (51)
Catalytic activity (35)
Chaperone activity (5)
Enzyme regulator activity (4)
Transporter activity (4)
Transcription regulator activity (3)
Translation regulator activity (1)


Pairwise alignment

>carp Cyprinus carpio growth hormone, 210 aa
vs.
>chicken Gallus gallus growth hormone, 216 aa

Scoring matrix: BLOSUM50; gap penalties: -12/-2
40.6% identity; global alignment score: 487

carp    MA--RVLVLLSVVLVSLLVNQGRASDN-----QRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD
chicken MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE

carp    YIEAPAGKDETQKSSMLKLLRISFHLIESWEFPSQSLSGTVSNSLTVGNPNQLTEKLADLKMGISVLIQACLDGQPNMDDN
chicken TIPAPTGKDDAQQKSDMELLRFSLVLIQSWLTPVQYLSKVFTNNLVFGTSDRVFEKLKDLEEGIQALMRELEDRSPR---G

carp    DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL
chicken PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI
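The two numbers reported for this alignment, percent identity and a global alignment score, can be illustrated with a minimal sketch. It assumes a simplified scoring scheme (fixed match/mismatch scores and a linear gap penalty) in place of BLOSUM50 with -12/-2 gap penalties, and it counts identity over all alignment columns, which is only one of several conventions. The function names are hypothetical.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of alignment columns with the same residue in both rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        return 0.0
    # A column counts as identical only if it holds a residue, not a gap.
    identical = sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")
    return 100.0 * identical / len(row_a)


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Simplification (assumption): a substitution matrix such as BLOSUM50 and
    separate gap-open/gap-extend penalties are replaced by three constants.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

Only two rows of the dynamic programming table are kept, which is enough when the score is needed but not the alignment itself.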


An enzyme (1AOZ) and a non-enzyme (1PLC) from the Cupredoxin superfamily


1AOZ (129 aa) vs. 1PLC (99 aa)
Scoring matrix: BLOSUM50; gap penalties: -12/-2
15.5% identity; global alignment score: -23

1AOZ SQIRHYKWEVEYMFWAPNCNENIVMGINGQFPGPTIRANAGDSVVVELTNKLHTEGVVIH
1PLC ---------IDVLLGA---DDGSLAFVPSEFS-----ISPGEKIVFK-NNAGFPHNIVFD

1AOZ WHGILQRGTPWADGTASISQCAINPGETFFYNFTVDNPGTFFYHGHLGMQRSAGLYGSLI
1PLC EDSI-PSGVDASKISMSEEDLLNAKGETFEVALSNKGEYSFYCSPHQG----AGMVGKVT

1AOZ VDPPQGKKE
1PLC VN-------


Transfer of functional information – in what space?

Recognize function in:
• Sequence space – sequence alignment
• Structure space – structural comparison
• Gene expression spaces – array data
• Interaction spaces – network/pathway extraction
• Paper space – text mining
• Protein feature space


Predict orphan protein function in feature space

Orphan sequences have to use the standard cellular machinery for sorting, post-translational modification, etc.

Similar pattern of modification may imply similar function

Predict sequence attributes independently, e.g. local and global properties such as
- post-translational modifications
- localization signals
- degradation signals
- structure
- composition, length, isoelectric point, ...

Then integrate and correlate using neural networks
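As an illustration of what a point in protein feature space might look like, the sketch below turns a sequence into a small vector of global properties (length and composition). The feature names and the crude charge proxy are assumptions made for this example; the attributes used in the work presented here (predicted modifications, sorting signals and so on) come from dedicated predictors that are not reproduced.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def global_features(seq: str) -> dict:
    """Map a protein sequence to a dictionary of simple global features."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    features = {"length": float(n)}
    # Amino acid composition: one fraction per standard residue.
    for aa in AMINO_ACIDS:
        features["frac_" + aa] = counts[aa] / n
    # Fraction of potential Ser/Thr acceptor residues (hypothetical feature).
    features["frac_ST"] = (counts["S"] + counts["T"]) / n
    # Crude stand-in for charge: (K + R - D - E) per residue. This is not an
    # isoelectric point calculation.
    features["net_charge_proxy"] = (
        counts["K"] + counts["R"] - counts["D"] - counts["E"]
    ) / n
    return features
```

A vector like this, extended with predicted attributes, is the kind of input a neural network could integrate and correlate with functional classes.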


Serine phosphorylation sites

Acceptor site   Pos.    Target
AKKG S EQES     S-10    PKA (1CMK)
GFGD S IEAQ     S-87    Ovalbumin (1OVA)
EVVG S AEAG     S-350   Ovalbumin (1OVA)
GDLG S CEFH     S-80    Cystatin (1CEW)
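The acceptor sites above are shown as nine-residue windows: the serine plus four residues on either side. A minimal sketch of extracting such windows from a sequence follows; the padding character and the function name are assumptions for the example, and no claim is made that this matches the input encoding of any particular predictor.

```python
def serine_windows(seq: str, flank: int = 4, pad: str = "X") -> list:
    """Return (1-based position, window) for every serine in the sequence.

    Each window holds `flank` residues on either side of the serine; positions
    beyond the sequence ends are filled with the `pad` character.
    """
    seq = seq.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "S":
            # In `padded`, the serine sits at index i + flank, so the window
            # starts at index i and spans 2 * flank + 1 characters.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows
```

For the first window in the table, `serine_windows("AKKGSEQES")` yields the central serine at position 5 with the window `AKKGSEQES`, plus a padded window for the terminal serine.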


Length distributions and functional role categories


Propeptide cleavage sites

Post-translational processing by limited proteolysis of inactive secretory precursors produces active proteins and peptides

Furin-specific (a) and other proprotein convertase cleavage sites (b)
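As a rough illustration of site recognition, the sketch below scans a sequence for the commonly cited furin consensus motif R-X-[K/R]-R, with cleavage after the final arginine. The motif is textbook knowledge rather than something stated on these slides, and a plain pattern match is a deliberate simplification: it ignores sequence context, so it should be read as a baseline and not as the prediction method presented here. The names are hypothetical.

```python
import re

# Lookahead so that overlapping motif occurrences are all reported.
FURIN_LIKE = re.compile(r"(?=(R.[KR]R))")


def candidate_cleavage_sites(seq: str) -> list:
    """Return (1-based position of the P1 arginine, matched motif) pairs."""
    # The motif is four residues long, so the P1 arginine is the fourth
    # residue of the match: 0-based start + 3, i.e. 1-based start + 4.
    return [(m.start() + 4, m.group(1)) for m in FURIN_LIKE.finditer(seq.upper())]
```

A consensus scan like this will report many sites that are never cleaved and miss atypical ones, which is the usual motivation for learning cleavage sites from examples.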


Proprotein convertases (PCs) activate a large variety of proteins

Peptide hormones, neuropeptides, growth and differentiation factors, adhesion factors, receptors, blood coagulation factors, plasma proteins, extracellular matrix proteins, proteases, exogenous proteins such as coat glycoproteins from infectious viruses (e.g. HIV-1 and Influenza) and bacterial toxins (e.g. diphtheria and anthrax toxin).

PCs play an essential role in many vital biological processes like embryonic development and neural function, and in viral and bacterial pathogenesis. PCs are implicated in pathologies such as cancer and neurodegenerative diseases.


Mucin-type O-glycosylation

N-acetylgalactosamine (GalNAc) α-1-linked to the hydroxyl group of a serine or threonine

Responsible for the high carbohydrate content of mucin proteins (>50% of the dry weight)

Mucins, the principal component of mucus, protect epithelial surfaces from dehydration, mechanical injury, proteases and pathogens

Mucin-type glycosylation contributes to this by making the protein structure stiff and extended and by adding charge, so that the protein binds more water


Mucin-type O-glycosylation site conservation


Positional preference of N-Glyc sites across cellular role categories


Functional classes predicted

Functional role (Monica Riley categories)
• The original scheme had 14 categories
• Reduced to 12 categories by skipping the category "other" and combining replication and transcription

Enzyme prediction
• Enzyme vs. non-enzyme
• Major enzyme class in the EC system

Gene Ontology
• A subset of classes can be predicted

Systems biology related categories
• For example "cell cycle regulated", secreted, nucleolar


Predicting Gene Ontology categories

The GO system is designed for proteins to belong to multiple classes rather than one

Different kinds of function can be annotated:
• Molecular function
• Biological process
• Cellular component

GO assigns the ”function” at several levels of detail rather than only one


The concept of ProtFun

Predict as many biologically relevant features as we can from the sequence

Train artificial neural networks for each category

Assign a probability for each category from the NN outputs


1AOZ and 1PLC predictions

# Functional category               1AOZ    1PLC
Amino_acid_biosynthesis             0.126   0.070
Biosynthesis_of_cofactors           0.100   0.075
Cell_envelope                       0.429   0.032
Cellular_processes                  0.057   0.059
Central_intermediary_metabolism     0.063   0.041
Energy_metabolism                   0.126   0.268
Fatty_acid_metabolism               0.027   0.072
Purines_and_pyrimidines             0.439   0.088
Regulatory_functions                0.102   0.019
Replication_and_transcription       0.052   0.089
Translation                         0.079   0.150
Transport_and_binding               0.032   0.052

# Enzyme/nonenzyme
Enzyme                              0.773   0.310
Nonenzyme                           0.227   0.690

# Enzyme class
Oxidoreductase (EC 1.-.-.-)         0.077   0.077
Transferase (EC 2.-.-.-)            0.260   0.099
Hydrolase (EC 3.-.-.-)              0.114   0.071
Lyase (EC 4.-.-.-)                  0.025   0.020
Isomerase (EC 5.-.-.-)              0.010   0.068
Ligase (EC 6.-.-.-)                 0.017   0.017
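One simple way to read such output is to take the highest-scoring class within each group. The sketch below does this for the enzyme/nonenzyme scores listed above; picking the maximum is an assumption made for illustration, and the actual decision rule of the prediction server may differ. The variable and function names are hypothetical.

```python
# Enzyme/nonenzyme scores transcribed from the prediction output above.
ENZYME_SCORES = {
    "1AOZ": {"Enzyme": 0.773, "Nonenzyme": 0.227},
    "1PLC": {"Enzyme": 0.310, "Nonenzyme": 0.690},
}


def top_category(scores: dict) -> str:
    """Return the class with the highest score in one group of predictions."""
    return max(scores, key=scores.get)


calls = {protein: top_category(scores) for protein, scores in ENZYME_SCORES.items()}
```

With these numbers, 1AOZ is called an enzyme and 1PLC a non-enzyme.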


Similar structure, different functions

Many examples exist of structurally similar proteins which have different functions

Two PDB structures from the Cupredoxin superfamily
• 1AOZ is an ascorbate oxidase (enzyme)
• 1PLC performs electron transport (non-enzyme)

Despite their structural similarity, our method predicts the enzyme/non-enzyme status of both correctly


Performance on Gene Ontology categories (worst case)


Example: Eukaryotic Cell Cycle

Systems Biology – Whole system description

• Focus on whole systems, rather than individual units
• Requires identification of all units in the system
• High diversity in biological systems
• Inference of system features/functions from experimental data
• Ultimate goal is in-silico modeling of the temporal aspects of the cell cycle in different organisms


Microarray identification of periodic genes

Workflow (figure labels): synchronous yeast cells, DNA chips, gene expression, temporal expression

Look for those with a periodic expression: periodic vs. non-periodic


Identification of periodically expressed genes

1) Visual inspection of expression profiles (Cho et al., 1998)
2) Fourier analysis and correlation with profiles of known genes (Spellman et al., 1998)
3) Statistical modeling (single pulse model) (Zhao et al., 2001)

Figure labels: 70%, 91%, 47%; 104 known genes

Problems
• Cho uses non-objective criteria
• Spellman identifies too many genes
• Zhao identifies less than half of the previously identified cell cycle regulated genes
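To make the Fourier approach concrete, the sketch below scores how strongly a single expression profile oscillates at a given period. It is a generic one-frequency Fourier magnitude, not the exact statistic of any of the cited studies; the function name and the way the period is supplied are assumptions for the example.

```python
import cmath
import math


def fourier_score(expression: list, times: list, period: float) -> float:
    """Magnitude of the Fourier component of a profile at one period.

    `expression[k]` is the measurement taken at `times[k]`; `period` is the
    assumed cell cycle length in the same time units.
    """
    if len(expression) != len(times) or not expression:
        raise ValueError("expression and times must be non-empty and equal in length")
    omega = 2.0 * math.pi / period
    mean = sum(expression) / len(expression)
    # Subtracting the mean means a flat profile scores zero.
    component = sum(
        (x - mean) * cmath.exp(-1j * omega * t) for x, t in zip(expression, times)
    )
    return abs(component) / len(expression)
```

A cosine of unit amplitude sampled over whole cycles scores 0.5, while a constant profile scores 0. Choosing the cutoff above which a gene counts as periodic is the step where the methods listed above disagree.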


Our novel strategy

Sequence-based "machine learning approach"

6200 genes, passed through a consistency filter:
• Periodic genes: positive set (97 sequences)
• Non-periodic genes: negative set (556 sequences)
• Grey zone area (~5600 genes)

Learn from the positive and negative sets


Prediction of cell cycle regulated genes from protein sequence


Features of cell cycle regulated genes used by neural net ensemble


Non-linear function prediction! Responds to single AA change


ORF       ANN    F-score   Intensity   Protein function
YIL169C   0.98   2.8       176         Protein of unknown function
YNL322C   0.98   1.7       870         Cell wall protein needed for cell wall beta-1,6-glucan assembly
YJL078C   0.98   5.5       86          Protein that may have a role in mating efficiency
YDL038C   0.98   5.3       165         Protein of unknown function
YOL155C   0.97   3.0       391         Protein with similarity to glucan 1,4-alpha-glucosidase
YJR151C   0.97   1.3       251         Member of the seripauperin (PAU) family
YLR286C   0.97   9.3       520         Endochitinase
YOL030W   0.97   4.1       817         Protein with similarity to Gas1p
YOR220W   0.97   2.5       340         Protein of unknown function
YNR044W   0.97   6.5       172         Anchor subunit of a-agglutinin
YGR023W   0.97   1.8       129         Signal transduction of cell wall stress during morphogenesis
YDL016C   0.97   0.8       338         Protein of unknown function
YDL152W   0.97   1.0       156         Protein of unknown function
YPR136C   0.97   1.1       76          Protein of unknown function
YGR115C   0.97   1.0       71          Protein of unknown function, questionable ORF
YMR317W   0.97   2.1       260         Protein of unknown function
YCR089W   0.97   3.4       104         Protein involved in mating induction
YLR194C   0.96   5.4       1870        Protein of unknown function
YIL011W   0.96   2.6       565         Member of the seripauperin (PAU) family
YGR161C   0.96   2.4       190         Protein of unknown function
YBR067C   0.96   5.9       825         Cold- and heat-shock induced mannoprotein of the cell wall
YNL228W   0.96   1.9       250         Protein of unknown function; questionable ORF
YNL327W   0.96   8.7       1320        Cell-cycle regulation protein involved in cell separation
YLR332W   0.96   1.5       642         Putative sensor for cell wall integrity signaling during growth
YNR067C   0.96   6.3       222         Protein with similarity to endo-1,3-beta-glucanase


Top 250 genes predicted from the entire genome

Functional grouping (chart labels): unknown, kinase & phosphatase, transcription, RNA binding, Serine rich, hydrolase, other

Subcellular localization (chart labels): unknown, wall, nuclear, membrane, cytoplasmic, cytoskeleton, other

Among the "top 250 predicted" genes not used for training are
• 75 previously identified as cell cycle regulated genes
• 175 new potentially cell cycle regulated genes


Experimental validation results

More than 100 new periodic genes identified/validated

For many of them, a role in the cell cycle is supported by other sources of evidence

About 30% of them have no known functional role

Gene     p-value   Neural network score   GO biological process & gene description
Gene A   0.0009    0.76                   Regulates the cell size requirement for passage through Start and commitment to cell division
Gene B   0.0026    0.70                   cyclin involved in G1/S transition of mitotic cell cycle
Gene C   0.0081    0.59                   Involved in cell cycle dependent gene expression
Gene D   0.0111    0.76                   cell wall organization and biogenesis*
Gene E   0.0142    0.90                   Required for spindle pole body duplication and a mitotic checkpoint function.
Gene F   0.0169    0.85                   DNA repair*
Gene G   0.0192    0.74                   G1/S transition of mitotic cell cycle*
Gene H   0.0222    0.76                   DNA repair*
Gene I   0.0247    0.75                   cellular morphogenesis*
Gene J   0.0255    0.81                   regulation of exit from mitosis
Gene K   0.0353    0.46                   Protein with similarity to putative glycosidase of the cell wall
Gene L   0.0482    0.74                   G2/M transition of mitotic cell cycle*
Gene M   0.0520    0.81                   chromatin assembly/disassembly*
Gene N   0.0630    0.92                   actin cytoskeleton organization and biogenesis*


High confidence set


The eukaryotic cell cycle

The cell division process is divided into four phases:

• G1 growth/synthesis

• S replication of DNA

• G2 growth/synthesis

• M mitosis/cell division


Temporal variation in feature space


S phase?

40% into the cell cycle, the plots show:

• High isoelectric point

• Many nuclear proteins

• Short proteins

• Low potential for N-glycosylation

• Low potential for Ser/Thr-phosphorylation

• Few PEST regions

• Low aliphatic index

S phase feature snapshot
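Several of the features in this snapshot are simple functions of the sequence. As one example, the aliphatic index can be computed from composition alone using the standard definition (Ikai, 1980): AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The sketch below is a straightforward implementation of that formula; whether the plots were produced with exactly this definition is an assumption.

```python
def aliphatic_index(seq: str) -> float:
    """Aliphatic index of a protein from its amino acid composition."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / n

    # Weights 2.9 and 3.9 reflect the relative side-chain volumes of Val and
    # Ile/Leu compared with Ala.
    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )
```

Averaging such a value over the genes that peak at each point of the cell cycle gives one curve of the kind summarized in the snapshot.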


S phase peaking genes

Name      Fscore   Avg. Int.   pI     Length   Protein function or role
IRS4      0.98     122         9.8    615      Protein involved in silencing of ribosomal DNA
SHE1      2.09     60          10.4   338      Protein that causes lethality when overexpressed
HHT1      8.89     2920        11.4   136      Histone H3, identical to Hht2p
YGR079W   1.06     194         5.4    370      Protein of unknown function
HTB1      9.68     1171        10.1   131      Histone H2B
MKC7      2.00     533         4.6    596      Aspartyl protease found in the periplasmic space
YNL228W   1.92     250         4.9    258      Protein of unknown function; questionable ORF
HTB2      9.70     1071        10.1   131      Histone H2B, nearly identical to Htb1p
HHF2      9.18     1955        11.4   103      Histone H4, identical to Hhf1p
TOF2      4.15     270         8.0    771      Protein that interacts with DNA topoisomerase I
ENT4      1.47     73          9.4    247      Protein of unknown function
HTA1      9.82     1340        10.7   132      Histone H2A, nearly identical to Hta2p
HHT2      7.86     2084        11.4   136      Histone H3, core component of the nucleosome
YPL150W   0.66     95          9.4    901      Serine/threonine protein kinase with unknown role
YKR045C   1.01     242         11.0   191      Protein of unknown function
YNR014W   1.80     312         8.7    212      Protein of unknown function
HHO1      9.17     625         10.2   258      Histone H1
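The pI and length columns of this table can be summarized directly, which gives a quick check against the S phase snapshot (high isoelectric point, short proteins). The sketch below uses the values transcribed from the table; the cutoff of 9.0 for calling a protein basic is an assumption chosen for illustration.

```python
from statistics import median

# (name, pI, length) transcribed from the table of S phase peaking genes.
S_PHASE_GENES = [
    ("IRS4", 9.8, 615), ("SHE1", 10.4, 338), ("HHT1", 11.4, 136),
    ("YGR079W", 5.4, 370), ("HTB1", 10.1, 131), ("MKC7", 4.6, 596),
    ("YNL228W", 4.9, 258), ("HTB2", 10.1, 131), ("HHF2", 11.4, 103),
    ("TOF2", 8.0, 771), ("ENT4", 9.4, 247), ("HTA1", 10.7, 132),
    ("HHT2", 11.4, 136), ("YPL150W", 9.4, 901), ("YKR045C", 11.0, 191),
    ("YNR014W", 8.7, 212), ("HHO1", 10.2, 258),
]

BASIC_PI_CUTOFF = 9.0  # assumption: threshold used only for this example

median_pi = median(pi for _, pi, _ in S_PHASE_GENES)
median_length = median(length for _, _, length in S_PHASE_GENES)
n_basic = sum(1 for _, pi, _ in S_PHASE_GENES if pi > BASIC_PI_CUTOFF)
```

For these 17 genes the median pI is 10.1, the median length is 247 residues, and 12 genes have a pI above the cutoff, with the histones accounting for much of the basic, short end of the list.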


Identify areas where prediction approaches can clean up noisy experimental data
• High-throughput proteomics data
• DNA array data

The strengths of prediction approaches can be complementary to experimental data because of experimental constraints

Generate hypotheses on the dynamics of protein feature space, e.g. the periodicity of the phospho-proteome.


Acknowledgements

People at CBS

• Lars Juhl Jensen
• Ramneek Gupta
• + 20 others

• Karin Julenius (O-glyc conservation)

• Thomas Skøt Jensen (cell cycle)
• Ulrik de Lichtenberg (cell cycle)
• Rasmus Wernersson (Febit experiments)

• Jannick Bendtsen (SecretomeP)
• Lars Kiemer (NucleolusP)
• Anders Fausbøll (NucleolusP)

• Thomas Schiritz-Ponten (new ProtFun method)

Febit AG
• Peer Smith

CNB/CSIC, Madrid
• Alfonso Valencia
• Javier Tamames
• Damien Devos

Gunnar von Heijne, Stockholm (SecretomeP)


References

www.cbs.dtu.dk/services/Protfun
www.cbs.dtu.dk/cellcycle

L.J. Jensen, R. Gupta, N. Blom, D. Devos, J. Tamames, C. Kesmir, H. Nielsen, H.H. Stærfeldt, K. Rapacki, C. Workman, C.A.F. Andersen, S. Knudsen, A. Krogh, A. Valencia, and S. Brunak, "Prediction of human protein function from post-translational modifications and localization features", J. Mol. Biol., 319, 1257-1265, 2002.

L.J. Jensen, M. Skovgaard, and S. Brunak, "Prediction of novel archaeal enzymes from sequence derived features", Protein Sci., 11, 2894-2898, 2002.

L.J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories", Bioinformatics, 19, 635-642, 2003.

L.J. Jensen, D.W. Ussery, and S. Brunak, "Functionality of system components: Conservation of protein function in protein feature space", Genome Res., Oct 14, 2003.

U. de Lichtenberg, T.S. Jensen, L.J. Jensen, and S. Brunak, "Protein feature based identification of cell cycle regulated proteins in yeast", J. Mol. Biol., 329, 663-674, 2003.