Page 1: Mining protein lists from proteomics studies: applications for drug discovery

1. Introduction

2. General overview of protein lists reported by proteomics studies

3. Local properties of reported protein lists

4. Global properties of reported protein lists

5. Expert opinion: mining protein lists for drug discovery

Review

Mining protein lists from proteomics studies: applications for drug discovery

Alexey V Antonov

Institute for Bioinformatics and Systems Biology, Helmholtz Zentrum Munchen -- German Research Center for Environmental Health (GmbH), Neuherberg, Germany

Importance of the field: In recent years, proteomics has become a common technique applied to a wide spectrum of scientific problems, including the identification of diagnostic biomarkers, monitoring the effects of drug treatments and identification of the chemical properties of a protein or a drug. Although these studies differ substantially in their scientific aims, the end result of most proteomics studies is a protein list. Thousands of independent proteomics studies have reported protein lists in various functional contexts.

Areas covered in this review: We review the spectrum of scientific problems to which proteomics technology has recently been applied to deliver protein lists, and compare the bioinformatics methods commonly used to understand the properties of these lists.

What the reader will gain: The types and common functional properties of the reported protein lists are discussed. The range of scientific problems where this knowledge could potentially be helpful is explored, with a focus on drug discovery.

Take home message: Reported protein lists represent a valuable resource that can be used for a variety of goals, ranging from biomarker discovery to the identification of novel therapeutic implications of known drugs.

Keywords: databases, gene list, PLIPS, protein list, proteomics, text mining

Expert Opin. Drug Discov. (2010) 5(4):323-331

1. Introduction

In recent years, several proteomic platforms have been developed and successfully applied to help understand the cell proteome [1]. The term proteomics itself has been constantly expanding and acquiring new meanings. Although the approach was initially used to characterize all proteins within a given cell, many researchers now also use proteomic technology to reveal changes in protein concentrations between different physiological conditions of the cell [2-4]. Another direction commonly referred to as proteomics is ‘sub-proteomics’ [5]. In this case, a subset of proteins sharing specific characteristics is isolated from a complex mixture of proteins. The sub-proteomic approach relies on enrichment techniques to isolate proteins with similar characteristics or biophysical and chemical properties (e.g., isoelectric point, molecular mass, cellular compartment).

The spectrum of problems covered by proteomics studies continues to expand rapidly. Proteomic analyses have recently been conducted on tissues, biofluids and subcellular components in both animal models and humans [1]. Various clinical applications of proteomics, including the identification of prognostic and early diagnostic markers and monitoring the effects of drug treatments, are of particular interest [5]. In addition, proteomics is frequently combined with other genomics and/or metabolomic technologies to profile molecular cell mechanisms at a systems biology level [6].

10.1517/17460441003716796 © 2010 Informa UK Ltd ISSN 1746-0441 323 All rights reserved: reproduction in whole or in part not permitted

There are several databases for capturing and disseminating proteomics data [7,8]. For example, the PRIDE database [9,10] has been developed to provide a standards-compliant repository for mass spectrometry-based proteomics data comprising identifications of proteins, peptides and post-translational modifications. Public repositories also collect much valuable supplementary information about the experimental setup and technical parameters.

Independent of the primary aim of the proteomics study or the proteomics technology used, in most cases the experimental part delivers a list of proteins found to be expressed (or differentially expressed) in the context of the studied biological phenomenon. For example, we identified >400 papers published in the last 5 years in the ‘Proteomics’ journal that report one or several lists of proteins (in total, ‘Proteomics’ published about 2000 papers in this period). Although publicly available, this information is scattered across hundreds of papers. We recently developed a Web mining tool that collects this information: by searching through full-text papers, it automatically selects tables containing a list of protein identifiers. This information was compiled into the PLIPS database [11]. Currently, the database covers about 1500 different protein lists reported by ~1200 independent proteomics studies.

We review here the general and individual properties of reported protein lists. We also show how the PLIPS database can be used to generate novel hypotheses in various clinical contexts. The spectrum of potential applications includes typical biomarker discovery projects and the search for novel therapeutic implications of known or developing drugs.

2. General overview of protein lists reported by proteomics studies

Although the reported protein lists cover a variety of biological, clinical and chemical issues, most proteomics studies can be grouped into a relatively small number of classes. According to the biological essence of the studied phenomena, we can generally split reported protein lists into the following five classes:

• proteins specifically expressed in a tissue

• proteins specifically expressed in a cell compartment

• proteins differentially expressed between different cell types

• proteins differentially expressed between treated/untreated cells

• proteins with a common chemical property.

Of course, there are a number of specific cases in which it is hard to assign a study to a particular class, as well as many cases in which a study can be attributed to several classes. Next, we give a brief overview of each class.

2.1 Proteins expressed in a specific cell type

A number of projects have recently been launched to generate catalogues of proteins specifically expressed in different normal human tissues [12]. Consequently, a considerable share of recently reported protein lists comes from this type of proteomics study. For example, the HUPO Proteome Projects are initiatives coordinating proteomics studies to characterize the human proteomes of different tissues [13,14]; a non-redundant set of 1804 proteins was identified in human brain samples [13]. At the moment, most human tissues have been profiled, including such unusual ones as the ‘enamel pellicle’ [15].

Some of these projects focused mainly on technical issues of protein extraction and/or purification. A notable example is the study in [16], which compared the results of protein extraction from different ocular regions using different detergents. It demonstrated that the extraction strategy may affect the final outcome of protein profiling by mass spectrometry (MS) or by other methods.

Although commonly considered to be of less importance for drug discovery projects, this class of protein lists can also be of great value. As we show later, these lists should also be taken into account to reduce the risk of adverse drug effects.

2.2 Proteins specifically expressed in a cell compartment

The next issue abundantly addressed by proteomics studies is the distribution of protein expression across cell compartments. In comparison to the previous class, additional steps in sample preparation may be required to separate proteins from different cell compartments.

Article highlights.

• Proteomics has become a common technique applied to a wide spectrum of scientific problems. Although these studies differ substantially in their scientific aims, the end result of most proteomics studies is a protein list.

• Thousands of independent proteomics studies have reported protein lists in various functional contexts.

• The PLIPS database is a collection of proteomics papers that reported a protein list.

• According to the biological essence of the studied phenomena, the reported protein lists can be split into five classes.

• Analyses of global properties of reported protein lists indicate that most reported protein lists are highly interdependent. On average, each list shares a significant subset of proteins with >20 other protein lists.

• Significant similarities between protein lists can be indicative of similarity in molecular mechanisms between the corresponding phenomena.

• Information from PLIPS can be of great value for drug discovery projects in various contexts: i) to select a proper protein target and ii) to identify new therapeutic implications of novel and known drugs.

This box summarizes key points contained in the article.

Proteins embedded in the plasma membrane largely define cell functionality. A number of proteomics studies have delivered plasma membrane-enriched protein fractions for different cell types [17,18]. Specific sub-membrane fractions, such as lipid rafts, have also been reported [19,20]. Lipid rafts are glycolipid- and cholesterol-enriched membrane microdomains implicated in membrane signaling and trafficking. Mitochondria are also an intensively explored cell compartment [21], as are several other compartments [22,23]. Knowledge of the compartment-specific cell proteome can be useful for a variety of scientific questions.

2.3 Proteins differentially expressed between different cell types

Different factors can significantly affect the landscape of the cell proteome. The next logical step is therefore the construction of protein catalogues not only for normal tissues or cell compartments but also for tissues under abnormal conditions. Comparative analyses of the proteomes of normal versus abnormal cells often lead to the inference of differentially expressed proteins. Numerous proteomics studies covering different clinical issues and delivering lists of differentially expressed proteins as output are currently available. Most of them aimed to discover biomarkers for early disease detection [24-27], for stratification of disease into distinct subtypes [28-30] and/or for monitoring disease progression [31-33].

Different types of cancer are regular targets of proteomics studies that deliver lists of up- or downregulated proteins [24,25,32]. The discovery of suitable biomarkers for early detection promises significant improvements in clinical outcomes for cancer patients. However, despite recent progress in proteomics technologies, such lists should be interpreted with caution. This is partly due to the inherent complexity of cancer, where almost every case of the disease is partially unique at the molecular level. The second reason relates to the inherent biases of the whole technological chain, from preparation of biospecimens to protein detection by MS.

2.4 Proteins differentially expressed between treated/untreated cells

Systematic investigation of the mechanisms of drug action represents a large share of proteomics studies whose primary output is a protein list. Proteomics is well suited to quantifying the cell response to a drug [34,35] or to other clinically relevant environmental conditions [36,37]. The discovered protein lists can provide new insights into the mechanisms of drug action, such as induction of apoptosis or activation of other disease-related signaling or regulatory/metabolic pathways.

A large number of small molecules with a wide spectrum of proposed mechanisms of action have been explored [34-36,38]. In addition, the cell response to silencing or overexpression of some potential targets for anticancer therapy (transcription factors, protein kinases) that regulate cell-cycle progression or apoptosis has been quantified [39]. The protein lists inferred in these studies can be used for a variety of purposes in drug discovery projects.

2.5 Proteins with a common chemical property

A number of proteomics projects were devoted to identifying proteins with common chemical properties [40-46]. A significant share of such studies aims to identify proteins subjected to post-translational modifications. For example, proteomics has been used intensively to study protein phosphorylation in the cell [40,41]. Reversible phosphorylation of proteins is a key mechanism for the control of signal transduction, and phosphorylation is known to regulate enzymatic activity, subcellular localization, protein--protein interaction (PPI) and degradation of proteins. In [40], 118 tyrosine-phosphorylated proteins were identified by coupling stable isotope labeling with amino acids in cell culture to mass spectrometry.

The objective of another study [42] was to identify targets of S-nitrosylation in human sperm. Spermatozoa were incubated with nitric oxide donors, and S-nitrosylated proteins were identified using the biotin switch assay and a proteomic approach based on tandem mass spectrometry (MS/MS). In total, 240 S-nitrosylated proteins were detected in sperm incubated with S-nitroso-glutathione.

Another very promising direction is the application of proteomics to identify the genome-wide spectrum of proteins that bind a given molecular probe [47]. This is frequently referred to as chemical proteomics or activity-based proteomics. Molecular probes are used to target a selective group of functionally related proteins. An affinity chromatography protocol is used to select proteins with binding potential, and MS/MS is then applied to identify the recovered proteins.

Chemical proteomics was used in [46] to identify the nucleotide-binding proteome of active and resting platelets, using an affinity chromatography protocol with immobilized adenosine triphosphate, cyclic adenosine monophosphate and cyclic guanosine monophosphate. Several platelet proteins showing statistically significant differences between the active and resting nucleotide-binding proteomes were reported.

This type of proteomics study has been used to identify the genome-wide binding proteome of several available drugs or drugs in development [43-45]. For example, the study in [43] reported the protein targets of bosutinib, a promiscuous kinase inhibitor. Bosutinib (SKI-606) is an ATP-competitive third-generation kinase inhibitor for the treatment of chronic myeloid leukemia (CML) and is currently in clinical trials for the treatment of CML (Phase III) and breast cancer (Phase II).


3. Local properties of reported protein lists

A common challenge faced by experimental researchers is to understand the role of the discovered proteins in a biological context. A vast amount of knowledge has accumulated on individual protein function, biological role and chemical properties [48]. Several public databases [49,50] store information about protein interactions in the context of regulatory, signaling or metabolic pathways, and abundant experimental data are available on binary PPIs and the composition of protein complexes [51]. All this information has previously been used to establish the local functional properties of inferred protein lists.

3.1 Enrichment analysis

A widely accepted strategy is to infer biological processes that are statistically overrepresented in the inferred list of proteins. The Gene Ontology (GO) database or the Kyoto Encyclopedia of Genes and Genomes (KEGG) is commonly used as reference knowledge. The number of proteins from the list found in each GO category (or KEGG pathway) is compared with the number expected to be found in that category by chance. If the observed and expected numbers differ substantially, the category is reported as enriched [52-55].

As pointed out in [56], based on statistics from ~300 protein lists recently reported in the Proteomics journal, enrichment analysis is usually inefficient when the analyzed protein list is small (<30 proteins): in the majority of such cases (>70%), not a single GO term was found to be significantly enriched. When the reported protein list contained >50 proteins, enrichment analysis provided some insight into the functional context in most cases (>80%), that is, at least one GO term related to biological processes was significantly overrepresented. However, the coverage of even the largest model was <15% of the size of the reported protein list. This means that for a protein list of about 50 proteins, in the best case only about 8 proteins would be assigned to the same biological process, and the role of the remaining ~40 proteins would be unclear. Protein lists derived from proteomics studies that address specific biological questions (Sections 2.3, 2.4, 2.5) are frequently small (10 -- 50 proteins) compared with those from genome-wide proteomics studies (Section 2.1). Therefore, enrichment analysis is expected to be very inefficient for understanding the local properties of a given list of differentially expressed proteins or of proteins with a common chemical property.
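The observed-versus-expected comparison described above is most often formalized as a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below assumes that test plus a Bonferroni correction over all annotation terms; the tools cited in [52-55] may use other tests and corrections. The function names, the `annotations` mapping (term to set of protein IDs, e.g. built from GO or KEGG) and the `background_size` parameter are illustrative assumptions, not part of any cited tool.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for a hypergeometric X: n draws without replacement
    from N proteins, K of which are annotated with the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment(protein_list, annotations, background_size, alpha=0.05):
    """Return (term, observed, expected, corrected_p) for enriched terms.

    protein_list    -- reported protein IDs (duplicates are ignored)
    annotations     -- dict: term -> set of protein IDs annotated with it
    background_size -- number of proteins in the reference universe
    """
    query = set(protein_list)
    n = len(query)
    m = max(len(annotations), 1)  # Bonferroni: number of terms tested
    enriched = []
    for term, members in annotations.items():
        observed = len(query & members)
        expected = n * len(members) / background_size
        if observed == 0 or observed <= expected:
            continue  # only over-representation is of interest here
        p = min(1.0, hypergeom_sf(observed, background_size, len(members), n) * m)
        if p < alpha:
            enriched.append((term, observed, expected, p))
    return sorted(enriched, key=lambda r: r[3])
```

With a list of only a few dozen proteins and thousands of GO terms to correct for, even a term that captures several list members may fail to reach significance, which is in line with the low success rate for small lists reported in [56].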

3.2 PPI network and protein lists

Although knowledge of overrepresented GO terms or KEGG pathways is helpful, it only partially resolves the molecular mechanism relevant to the explored list of proteins. PPI data represent abundant information that is often used to interpret proteomics studies. From many perspectives, this information is more suitable, as the identified proteins are expected to be involved in cooperative activities.

Several experimental proteomics studies [57-59] use PPI networks to interpret inferred protein lists. By mapping proteins onto a PPI network and using visualization tools, the authors can usually demonstrate that the identified proteins lie close to each other on the network. However, visual inspection of a graphical representation of interacting proteins gives only an intuitive impression that the discovered proteins are related. Given the density of PPI networks, the value of statistical treatment must not be underestimated: even for a randomly generated protein list, it is possible to connect many proteins from the list into a sub-network via one or two intermediate partners (proteins that are not on the experimentally discovered list).
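To make the statistical point concrete, the sketch below scores a list by the number of its proteins in the largest connected module of a PPI network and compares that score with random lists of the same size drawn from the same protein universe, giving an empirical permutation p-value. Setting `linkers=True` also admits non-list proteins that interact with at least two list proteins (one-step intermediates); because this kind of gap-filling inflates connectivity for random lists too, the same rule must be applied to the random lists. This is an illustrative permutation test, not the specific method of [56] or of the PLIPS analysis; all function names are hypothetical, and uniform sampling ignores the degree bias of real PPI networks.

```python
import random
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from (protein_a, protein_b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def module_size(query, adj, linkers=False):
    """Number of query proteins in the largest connected module.

    The module lives in the subgraph induced by the query proteins; with
    linkers=True, non-query proteins touching >= 2 query proteins may also
    be used as one-step intermediates.
    """
    query = set(query)
    allowed = set(query)
    if linkers:
        allowed |= {v for v, nbrs in adj.items()
                    if v not in query and len(nbrs & query) >= 2}
    seen, best = set(), 0
    for start in query:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), 0
        while queue:  # breadth-first search restricted to allowed nodes
            v = queue.popleft()
            if v in query:
                members += 1
            for w in adj.get(v, ()):
                if w in allowed and w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, members)
    return best


def module_pvalue(query, adj, universe, linkers=False, n_perm=999, seed=0):
    """Empirical p-value: fraction of random same-size lists whose largest
    module is at least as large as the observed one (add-one corrected)."""
    rng = random.Random(seed)
    pool = sorted(set(universe))
    query = set(query) & set(pool)
    observed = module_size(query, adj, linkers)
    hits = sum(
        module_size(rng.sample(pool, len(query)), adj, linkers) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)
```

On a toy chain P0-P1-...-P9, the list P0, P2, P4, P6, P8 has no direct interactions (largest module of 1), yet with one-step linkers all five proteins fall into a single module.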

As shown in [56], a network-based interpretation of a protein list in the context of PPI networks can be very efficient. It is almost independent of the size of the protein list and provides good coverage (about 75% of the input protein list) for both small and large reported lists. In addition, it provides a global view of protein relations that is not limited to the size of an individual pathway or GO category. The graphical representation of the inferred network models is amenable to further analyses.

According to an estimate based on about 1500 protein lists stored in PLIPS [11], >80% of recently reported protein lists contain a statistically significant share of proteins that form an uninterrupted sub-network of interacting protein pairs (according to available PPI data).

3.3 The metabolism component in disease-specific protein lists

Based on our personal experience, protein lists reported by proteomics studies [11] contain large numbers of metabolism-related proteins much more frequently than gene lists reported by genomics studies (http://mips.helmholtz-muenchen.de/proj/ccancer/). An analysis of 16 disease-specific protein lists (proteins differentially expressed between normal and diseased cells, or in response to treatment) can be found in [60]. In many cases, the deregulated metabolism-related proteins came from several canonical KEGG pathways. At the same time, the proteins could be organized into an uninterrupted (a maximum of one or two nodes missing) disease-specific network that runs through several canonical pathways. These results support the hypothesis that disease-specific metabolism proteins are in most cases not functionally independent: deregulated proteins from different pathways are linked to each other via consecutive one- or two-step metabolic reactions.
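The notion of an uninterrupted network with at most one or two missing nodes can be sketched as a bounded breadth-first search: two deregulated proteins are linked if a path between them passes through at most `max_gap` intermediate nodes, and modules are the connected groups under this relation. The adjacency is assumed to connect enzymes that catalyze consecutive reactions (for example, derived from KEGG reaction data); the construction actually used in [60] may differ, and the function names are hypothetical.

```python
from collections import deque


def adjacency(edges):
    """Undirected adjacency sets, e.g. enzyme-enzyme links for consecutive reactions."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def reachable(adj, source, max_gap):
    """Nodes reachable from `source` via at most `max_gap` intermediate nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == max_gap + 1:
            continue  # a longer path would need more than max_gap intermediates
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist) - {source}


def gap_tolerant_modules(deregulated, adj, max_gap=2):
    """Group deregulated proteins linked through at most `max_gap` missing nodes."""
    dereg = set(deregulated)
    links = {p: reachable(adj, p, max_gap) & dereg for p in dereg}
    modules, seen = [], set()
    for start in sorted(dereg):
        if start in seen:
            continue
        module, stack = set(), [start]
        while stack:  # connected component under the gap-tolerant link relation
            v = stack.pop()
            if v not in module:
                module.add(v)
                stack.extend(links[v] - module)
        seen |= module
        modules.append(module)
    return sorted(modules, key=len, reverse=True)
```

For example, on a chain A-x-B-y-z-C-q-r-s-D with A, B, C and D deregulated and `max_gap=2`, the modules are {A, B, C} and {D}, because C and D are separated by three missing nodes.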

4. Global properties of reported protein lists

To understand the common global properties of reported lists, we used PLIPS. The PLIPS database is a collection of proteomics papers that reported a protein list. PLIPS covers papers published during the last 7 years in major proteomics journals: Proteomics, Journal of Proteome Research, Molecular & Cellular Proteomics and Proteomics -- Clinical Applications. The tables are extracted from each paper and searched for protein/gene identifiers. Only tables that contain >10 unique protein/gene identifiers of the same type (‘UniProt/Swiss-Prot’, ‘Gene Symbol’, ‘Ensembl’, ‘RefSeq Protein ID’, ‘RefSeq Transcript ID’) are selected. The collected protein/gene lists are systematically organized to allow various types of analyses.
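The table-selection rule above can be sketched as follows. The regular expressions are assumptions: the UniProt pattern follows the accession format documented by UniProt (entry names such as P53_HUMAN are not handled), the Ensembl and RefSeq patterns cover common stable-ID and accession prefixes, and gene symbols are recognized only by lookup in a caller-supplied set (for example, approved HGNC symbols), because a bare symbol is otherwise indistinguishable from ordinary table text. This is not the PLIPS implementation; the function and parameter names are hypothetical.

```python
import re

# Assumed identifier formats (not the PLIPS rules). Order matters: the first
# matching pattern wins.
ID_PATTERNS = {
    "UniProt/Swiss-Prot": re.compile(
        r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})(?:-\d+)?"
    ),
    "Ensembl": re.compile(r"ENS[A-Z]*[GTP]\d{11}(?:\.\d+)?"),
    "RefSeq Protein ID": re.compile(r"[NXYW]P_\d+(?:\.\d+)?"),
    "RefSeq Transcript ID": re.compile(r"[NX][MR]_\d+(?:\.\d+)?"),
}


def identifier_type(token, known_symbols=frozenset()):
    """Classify one table token, or return None if it is not an identifier."""
    for id_type, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(token):
            return id_type
    if token.upper() in known_symbols:  # known_symbols: upper-case gene symbols
        return "Gene Symbol"
    return None


def extract_protein_list(table_cells, min_unique=10, known_symbols=frozenset()):
    """Return (id_type, ids) if the table holds > min_unique unique identifiers
    of a single type, else None (the table is skipped)."""
    by_type = {}
    for cell in table_cells:
        for token in re.split(r"[\s,;]+", str(cell)):
            id_type = identifier_type(token, known_symbols)
            if id_type is not None:
                by_type.setdefault(id_type, set()).add(token)
    if not by_type:
        return None
    id_type, ids = max(by_type.items(), key=lambda item: len(item[1]))
    return (id_type, ids) if len(ids) > min_unique else None
```

Picking the most frequent identifier type per table is one way to enforce "of the same type"; mixed tables could instead be split into one candidate list per type.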

4.1 Cross-reference analysis of reported protein lists

Analyses of the global properties of reported protein lists indicate that most reported protein lists are highly interdependent: on average, each list shares a significant subset of proteins with >20 other protein lists. Each protein list in PLIPS was compared with all other identified protein lists to find commonly shared proteins and to link together those that have statistically significant similarity. This information is available at PLIPS and can be browsed online.
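The all-against-all comparison can be sketched as a pairwise overlap test. Here each pair of lists is scored with a one-sided hypergeometric test against a shared protein universe, and pairs that remain significant after Bonferroni correction over all pairs are linked; the number of links per list is the kind of quantity behind the ">20 other lists" statistic. The choice of test, correction and universe size are assumptions, since the source does not state which statistic PLIPS uses, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def overlap_pvalue(a, b, universe_size):
    """One-sided hypergeometric P(overlap >= observed) for two protein lists."""
    a, b = set(a), set(b)
    k, n, K, N = len(a & b), len(a), len(b), universe_size
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def similarity_links(lists, universe_size, alpha=0.05):
    """Link every pair of lists whose overlap is significant after Bonferroni
    correction over all pairs. `lists` maps list name -> protein IDs."""
    pairs = list(combinations(sorted(lists), 2))
    m = max(len(pairs), 1)
    links = {name: set() for name in lists}
    for x, y in pairs:
        if overlap_pvalue(lists[x], lists[y], universe_size) * m < alpha:
            links[x].add(y)
            links[y].add(x)
    return links
```

`sum(len(v) for v in links.values()) / len(links)` then gives the average number of significantly similar lists per list. For about 1500 lists this means roughly 1.1 million pairs, so a production version would need a faster test implementation than exact big-integer sums.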

According to the widely accepted guilt-by-association principle, significant similarities between protein lists can be indicative of similarity in molecular mechanisms between the corresponding phenomena. Indeed, browsing PLIPS (http://mips.helmholtz-muenchen.de/proj/plips/human/human.1.html), one can see that lists reported in a cancer context are generally associated with other cancer-related proteomics studies. These relations are expected and can be used to identify similar molecular cancer phenotypes.

However, in some cases somewhat unexpected associations can be found. For example, in [61] hepatic proteins with copper-binding ability were isolated using a proteomics strategy; in total, 48 cytosolic proteins and 19 microsomal proteins displaying copper-binding ability were reported. About 100 protein lists from PLIPS share a significant number of proteins with the 48 reported cytosolic proteins, and most of them were reported in a cancer context. The role of copper in the development of cancer is not completely clear. Cancer patients are known to usually have increased levels of serum copper; however, it is accepted that this elevation is part of the body's biological response to the cancer rather than its cause [62]. Given that most of the reported proteins were not previously known to have copper-binding ability, such a strong association of the reported list with cancer studies suggests a more complex role of copper in cancer development.

4.2 Individual proteins

The distribution of the number of times each protein was reported in different studies follows a power law. This is consistent with the fact that many biological networks are scale-free [63], that is, their degree distribution approximates a power law. The set of top proteins in terms of the number of times they were reported consists of 53 proteins, each reported by at least 70 different proteomics studies. We analyzed the functional context of these 53 proteins. Several GO terms, such as ‘anti-apoptosis’ and ‘regulation of apoptosis’, are overrepresented (p-value <0.05, estimated by a Monte Carlo simulation procedure). This probably indicates that a large proportion of the collected protein lists relate to proteins found to be differentially expressed between various cancer cell phenotypes.
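The two steps in this paragraph, counting how many studies report each protein and testing a GO term for overrepresentation by Monte Carlo simulation, can be sketched as below. The input is assumed to map each study to the proteins it reported (all of its lists pooled), so that a study counts once per protein. The uniform null, in which random protein sets are drawn from the whole universe, is also an assumption: the text does not state how the random sets were drawn, and a null that preserves how often each protein is reported could give different p-values. All names are hypothetical.

```python
import random
from collections import Counter


def report_counts(study_proteins):
    """Count in how many studies each protein was reported.

    study_proteins maps study ID -> proteins reported by that study (all of
    its lists pooled), so a study reporting a protein twice counts once.
    """
    return Counter(p for proteins in study_proteins.values() for p in set(proteins))


def top_proteins(counts, min_studies=70):
    """Proteins reported by at least `min_studies` different studies."""
    return {p for p, c in counts.items() if c >= min_studies}


def monte_carlo_pvalue(selected, term_members, universe, n_sim=10000, seed=0):
    """Empirical p-value that a random protein set of the same size contains
    at least as many members of the GO term as `selected` (uniform null)."""
    rng = random.Random(seed)
    pool = sorted(set(universe))
    selected = set(selected)
    observed = len(selected & term_members)
    hits = sum(
        len(term_members.intersection(rng.sample(pool, len(selected)))) >= observed
        for _ in range(n_sim)
    )
    return observed, (hits + 1) / (n_sim + 1)
```

The add-one correction keeps the estimate above zero when no simulated set reaches the observed count, so the smallest attainable p-value is 1/(n_sim + 1).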

5. Expert opinion: mining protein lists for drug discovery

Information from protein lists reported by proteomics studies can be of great value for drug discovery projects in various contexts. Here, we review several possible applications.

5.1 Selecting a proper protein target

Selection of the proper protein target is one of the key factors in the overall success of modern drug discovery projects. It is usually the starting point, and selecting the wrong target can doom a project from the beginning. Here, knowledge of the expression profile of the selected proteins can be of high value. Attention should be paid not only to lists of proteins differentially expressed between normal and disease conditions but also to the overall expression profile of the protein across different cell types and to its compartment-specific expression. If the protein appears on many lists of differentially expressed proteins, this is a good indication that a drug targeting this protein could have a wide spectrum of applications. On the other hand, if the protein is expressed in many tissues, this increases the probability of adverse side effects, or the drug may simply be toxic.

This is, of course, a very simplified scheme. Many factors must be taken into account, including the functional role of the protein. To better understand the importance of the protein, one should analyze it in the context of the other proteins from the corresponding list; here, a network-based analysis of the local properties of the protein list can provide strong support for conclusions.
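A minimal sketch of the two counts behind this simplified scheme: how many differential-expression lists contain a candidate target, and in how many tissue- or compartment-specific expression lists it appears. The inputs are assumed to be PLIPS-style mappings from list name to protein IDs; the function name and toy data are hypothetical, and the sketch deliberately reports both numbers side by side instead of folding them into a single score, since the text gives no weighting.

```python
def target_profile(protein, differential_lists, expression_lists):
    """Summarize how often `protein` appears in two kinds of reported lists.

    differential_lists -- list name -> IDs differentially expressed
                          (normal vs disease, treated vs untreated)
    expression_lists   -- list name -> IDs found in a tissue or compartment
    """
    diff = sorted(n for n, ids in differential_lists.items() if protein in ids)
    expr = sorted(n for n, ids in expression_lists.items() if protein in ids)
    return {
        "differential": len(diff),  # breadth of possible applications
        "differential_share": len(diff) / max(len(differential_lists), 1),
        "expressed_in": len(expr),  # breadth of expression: side-effect risk
        "expressed_share": len(expr) / max(len(expression_lists), 1),
        "differential_lists": diff,
        "expression_lists": expr,
    }


if __name__ == "__main__":
    # Toy data (hypothetical list names and protein IDs).
    differential = {"d1": {"A", "B"}, "d2": {"A"}, "d3": {"C"}}
    tissues = {"liver": {"B", "C"}, "brain": {"B"}, "kidney": {"B"}}
    for candidate in ("A", "B"):
        profile = target_profile(candidate, differential, tissues)
        print(candidate, profile["differential"], profile["expressed_in"])
```

On the toy data, A appears on two of three differential lists and in no tissue catalogue, whereas B appears on one differential list and in all three tissue catalogues; under the simplified scheme A looks like the more attractive candidate, subject to the caveats about functional role discussed above.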

5.2 Mining novel therapeutic implications of known drugs

One of the promising directions in the pharmaceutical industry is the identification of novel uses for old drugs based on novel, previously unknown mechanisms of action. For example, several established drugs for treating major mental illnesses have been found to have new therapeutic uses. Monoamine oxidase (MAO) inhibitors can be used to alleviate the symptoms associated with Parkinson's disease in addition to their traditional use in treating mental disorders. The neuroprotective effect of MAO inhibitors, exerted by disrupting the translocation of glyceraldehyde-3-phosphate dehydrogenase (GAPDH) into the cell nucleus, is one innovative area of investigation, because GAPDH has recently been confirmed to mediate cell death by translocating into the nucleus and triggering cell death pathways.

For many drugs on the market, the full list of affected targets is still in question, and the precise molecular mechanisms of many diseases remain unclear. Proteomics has been used intensively to address both problems. For a variety of cell disorders, comparative proteomics profiling has delivered lists of differentially expressed proteins (Section 2.3). For a number of known or developing drugs, lists of proteins that either respond to drug treatment (Section 2.4) or are direct targets (Section 2.5) have been identified. This type of information can be mined to identify new potential therapeutic implications of either known drugs or drugs under development.

For example, camptothecins are known to target the intranuclear enzyme topoisomerase and are used intensively in cancer chemotherapy. In [38], the potential effects of NSC606985, a novel and rarely studied camptothecin analogue, against human ovarian cancer were investigated. Acute myeloid leukemia cells were treated, and about 90 proteins deregulated in NSC606985-induced apoptotic NB4 cells were reported. Although the reported proteins are not necessarily direct targets of NSC606985, they are affected by the treatment through internal cellular mechanisms. If a significant percentage of them have been reported as deregulated in other diseases, this might hint at novel therapeutic implications of NSC606985.

Searching the PLIPS database retrieves about 300 previously reported protein lists that share a significant number of proteins with the 90 proteins deregulated by NSC606985 treatment. About 20 of these proteins were previously reported as expressed in breast cancer cells (in a catalogue of 162 proteins). Such strong similarity suggests breast cancer as a potential therapeutic implication of NSC606985. Renal cell carcinoma (RCC) may be another: a list of 33 potential RCC biomarkers identified by several proteomics studies was compiled and reported in [64], and six of these proteins are also deregulated by NSC606985.

Several cancer-unrelated applications can also be hypothesized. Two protein lists reported independently in [65] and [66] in the context of cystic fibrosis lung disease show statistically significant similarity to the proteins affected by NSC606985. In both cases, the lists of proteins expressed in cystic fibrosis lung epithelial cells share 20 proteins with the NSC606985 list.

The same strategy can be applied in reverse: given a list of proteins known to be differentially expressed between normal and diseased tissues, one can search for significantly similar protein lists reported either as deregulated upon drug treatment or as direct drug targets identified by chemical proteomics. If such a list is found, this can be a good indication that the given drug may be useful for the given disease.
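Both directions of this search, a drug-response list queried against disease lists or a disease list queried against drug-response and drug-target lists, reduce to ranking a collection of lists by their overlap with a query list. The sketch below assumes each list in the collection carries a context label (for example "disease", "drug response" or "drug targets"), scores overlaps with a one-sided hypergeometric test and controls the false discovery rate with the Benjamini-Hochberg procedure. These are assumptions for illustration; the source does not specify the statistics behind the PLIPS searches, and all names are hypothetical.

```python
from math import comb


def overlap_pvalue(query, other, universe_size):
    """One-sided hypergeometric P(overlap >= observed)."""
    q, o = set(query), set(other)
    k, n, K, N = len(q & o), len(q), len(o), universe_size
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        adjusted[i] = running
    return adjusted


def rank_related_lists(query, collection, universe_size, contexts=None, fdr=0.05):
    """Rank lists in `collection` (name -> (context, protein IDs)) by overlap
    with `query`, keeping only the given contexts and hits below the FDR."""
    names = sorted(n for n, (ctx, _) in collection.items()
                   if contexts is None or ctx in contexts)
    pvals = [overlap_pvalue(query, collection[n][1], universe_size) for n in names]
    hits = []
    for name, q in zip(names, benjamini_hochberg(pvals)):
        if q < fdr:
            ctx, ids = collection[name]
            hits.append((name, ctx, len(set(query) & set(ids)), q))
    return sorted(hits, key=lambda h: h[3])
```

For a drug-response query one would pass `contexts={"disease"}`; for a disease query, `contexts={"drug response", "drug targets"}`. Each hit reports the list name, its context, the number of shared proteins and the adjusted p-value.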

Declaration of interest

The author states no conflict of interest and has received no payment in preparation of this manuscript.


Bibliography

1. Cox J, Mann M. Is proteomics the new genomics? Cell 2007;130:395-8

2. Rifai N, Gillette MA, Carr SA. Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat Biotechnol 2006;24:971-83

3. Findeisen P, Neumaier M. Mass spectrometry-based clinical proteomics profiling: current status and future directions. Expert Rev Proteomics 2009;6:457-9

4. Findeisen P, Neumaier M. Mass spectrometry based proteomics profiling as diagnostic tool in oncology: current status and future perspective. Clin Chem Lab Med 2009;47:666-84

5. Colantonio DA, Chan DW. The clinical application of proteomics. Clin Chim Acta 2005;357:151-8

6. Perroud B, Lee J, Valkova N, et al. Pathway analysis of kidney cancer using proteomics and metabolic profiling. Mol Cancer 2006;5:64

7. Craig R, Cortens JP, Beavis RC. Open source system for analyzing, validating, and storing protein identification data. J Proteome Res 2004;3:1234-42

8. Desiere F, Deutsch EW, King NL, et al. The PeptideAtlas project. Nucleic Acids Res 2006;34:D655-8

9. Jones P, Cote RG, Martens L, et al. PRIDE: a public repository of protein and peptide identifications for the proteomics community. Nucleic Acids Res 2006;34:D659-63

10. Jones P, Cote RG, Cho SY, et al. PRIDE: new developments and new datasets. Nucleic Acids Res 2008;36:D878-83

11. Antonov AV, Dietmann S, Wong P, et al. PLIPS, an automatically collected database of protein lists reported by proteomics studies. J Proteome Res 2009;8:1193-7

12. Gronborg M, Bunkenborg J, Kristiansen TZ, et al. Comprehensive proteomic analysis of human pancreatic juice. J Proteome Res 2004;3:1042-55

13. Mueller M, Martens L, Reidegeld KA, et al. Functional annotation of proteins identified in human brain during the HUPO Brain Proteome Project pilot study. Proteomics 2006;6:5059-75

14. Zheng J, Gao X, Beretta L, et al. The Human Liver Proteome Project (HLPP) workshop during the 4th HUPO World Congress. Proteomics 2006;6:1716-18

15. Siqueira WL, Zhang W, Helmerhorst EJ, et al. Identification of protein components in in vivo human acquired enamel pellicle using LC-ESI-MS/MS. J Proteome Res 2007;6:2152-60

16. Patel N, Solanki E, Picciani R, et al. Strategies to recover proteins from ocular tissues for proteomics. Proteomics 2008;8:1055-70

17. Rietschel B, Bornemann S, Arrey TN, et al. Membrane protein analysis using an improved peptic in-solution digestion protocol. Proteomics 2009;9:5553-7

18. Jeong JA, Ko KM, Park HS, et al. Membrane proteomic analysis of human mesenchymal stromal cells during adipogenesis. Proteomics 2007;7:4181-91

19. Li N, Shaw AR, Zhang N, et al. Lipid raft proteomics: analysis of in-solution digest of sodium dodecyl sulfate-solubilized lipid raft proteins by liquid chromatography-matrix-assisted laser desorption/ionization tandem mass spectrometry. Proteomics 2004;4:3156-66

20. Zhang N, Shaw AR, Li N, et al. Liquid chromatography electrospray ionization and matrix-assisted laser desorption ionization tandem mass spectrometry for the analysis of lipid raft proteome of monocytes. Anal Chim Acta 2008;627:82-90

21. Wang J, Gutierrez P, Edwards N, et al. Integration of 18O labeling and solution isoelectric focusing in a shotgun analysis of mitochondrial proteins. J Proteome Res 2007;6:4601-7

22. Song H, Sokolov M. Analysis of protein expression and compartmentalization in retinal neurons using serial tangential sectioning of the retina. J Proteome Res 2009;8:346-51

23. An Y, Fu Z, Gutierrez P, et al. Solution isoelectric focusing for peptide analysis: comparative investigation of an insoluble nuclear protein fraction. J Proteome Res 2005;4:2126-32

24. Okamura N, Masuda T, Gotoh A, et al. Quantitative proteomic analysis to discover potential diagnostic markers and therapeutic targets in human renal cell carcinoma. Proteomics 2008;8:3194-203

25. Bianchi L, Canton C, Bini L, et al. Protein profile changes in the human breast cancer cell line MCF-7 in response to SEL1L gene induction. Proteomics 2005;5:2433-42

26. Morita A, Miyagi E, Yasumitsu H, et al. Proteomic search for potential diagnostic markers and therapeutic targets for ovarian clear cell adenocarcinoma. Proteomics 2006;6:5880-90

27. Meuwis MA, Fillet M, Chapelle JP, et al. New biomarkers of Crohn’s disease: serum biomarkers and development of diagnostic tools. Expert Rev Mol Diagn 2008;8:327-37

28. Chen YR, Juan HF, Huang HC, et al. Quantitative proteomic and genomic profiling reveals metastasis-related protein expression patterns in gastric cancer cells. J Proteome Res 2006;5:2727-42

29. Li LS, Kim H, Rhee H, et al. Proteomic analysis distinguishes basaloid carcinoma as a distinct subtype of nonsmall cell lung carcinoma. Proteomics 2004;4:3394-400

30. Kikuta K, Gotoh M, Kanda T, et al. Pfetin as a prognostic biomarker in gastrointestinal stromal tumor: novel monoclonal antibody and external validation study in multiple clinical facilities. Jpn J Clin Oncol 2010;40:60-72

31. Gromov P, Gromova I, Bunkenborg J, et al. Up-regulated proteins in the fluid bathing the tumour cell microenvironment as potential serological markers for early detection of cancer of the breast. Mol Oncol 2009;4:65-89

32. Moreira JM, Ohlsson G, Gromov P, et al. Bladder cancer associated protein: a potential prognostic biomarker in human bladder cancer. Mol Cell Proteomics 2009;9:161-77




33. Ohlsson G, Moreira JM, Gromov P, et al. Loss of expression of the adipocyte-type fatty acid-binding protein (A-FABP) is associated with progression of human urothelial carcinomas. Mol Cell Proteomics 2005;4:570-81

34. Meuwis MA, Fillet M, Lutteri L, et al. Proteomics for prediction and characterization of response to infliximab in Crohn’s disease: a pilot study. Clin Biochem 2008;41:960-7

35. Fillet M, Cren-Olive C, Renert AF, et al. Differential expression of proteins in response to ceramide-mediated stress signal in colon cancer cells by 2-D gel electrophoresis and MALDI-TOF-MS. J Proteome Res 2005;4:870-80

36. Short DM, Heron ID, Birse-Archbold JL, et al. Apoptosis induced by staurosporine alters chaperone and endoplasmic reticulum proteins: identification by quantitative proteomics. Proteomics 2007;7:3085-96

37. Li Z, Kreutzer M, Mikkat S, et al. Proteomic analysis of the E2F1 response in p53-negative cancer cells: new aspects in the regulation of cell survival and death. Proteomics 2006;6:5735-45

38. Yu Y, Wang LS, Shen SM, et al. Subcellular proteome analysis of camptothecin analogue NSC606985-treated acute myeloid leukemic cells. J Proteome Res 2007;6:3808-18

39. Zhao J, Zhu K, Lubman DM, et al. Proteomic analysis of estrogen response of premalignant human breast cells using a 2-D liquid separation/mass mapping technique. Proteomics 2006;6:3847-61

40. Amanchy R, Kalume DE, Iwahori A, et al. Phosphoproteome analysis of HeLa cells using stable isotope labeling with amino acids in cell culture (SILAC). J Proteome Res 2005;4:1661-71

41. Amanchy R, Kalume DE, Pandey A. Stable isotope labeling with amino acids in cell culture (SILAC) for studying dynamics of protein abundance and posttranslational modifications. Sci STKE 2005;2005:pl2

42. Lefievre L, Chen Y, Conner SJ, et al. Human spermatozoa contain multiple targets for protein S-nitrosylation: an alternative mechanism of the modulation of sperm function by nitric oxide? Proteomics 2007;7:3066-84

43. Fernbach NV, Planyavsky M, Muller A, et al. Acid elution and one-dimensional shotgun analysis on an Orbitrap mass spectrometer: an application to drug affinity chromatography. J Proteome Res 2009;8:4753-65

44. Rix U, Hantschel O, Durnberger G, et al. Chemical proteomic profiles of the BCR-ABL inhibitors imatinib, nilotinib, and dasatinib reveal novel kinase and nonkinase targets. Blood 2007;110:4055-63

45. Rix U, Remsing Rix LL, Terker AS, et al. A comprehensive target selectivity survey of the BCR-ABL kinase inhibitor INNO-406 by kinase profiling and chemical proteomics in chronic myeloid leukemia cells. Leukemia 2009;1:44-50

46. Wong JW, McRedmond JP, Cagney G. Activity profiling of platelets by chemical proteomics. Proteomics 2009;9:40-50

47. Burckstummer T, Bennett KL, Preradovic A, et al. An efficient tandem affinity purification procedure for interaction proteomics in mammalian cells. Nat Methods 2006;3:1013-19

48. Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000;25:25-9

49. Kanehisa M, Goto S, Hattori M, et al. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res 2006;34:D354-7

50. Vastrik I, D’Eustachio P, Schmidt E, et al. Reactome: a knowledge base of biologic pathways and processes. Genome Biol 2007;8:R39

51. Aranda B, Achuthan P, Alam-Faruque Y, et al. The IntAct molecular interaction database in 2010. Nucleic Acids Res 2010;38:D525-31

52. Antonov AV, Schmidt T, Wang Y, et al. ProfCom: a web tool for profiling the complex functionality of gene groups identified from high-throughput data. Nucleic Acids Res 2008;36:W347-51

53. Khatri P, Draghici S. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 2005;21:3587-95

54. Khatri P, Sellamuthu S, Malhotra P, et al. Recent additions and improvements to the Onto-Tools. Nucleic Acids Res 2005;33:W762-5

55. Antonov AV, Dietmann S, Wong P, et al. GeneSet2miRNA: finding the signature of cooperative miRNA activities in the gene lists. Nucleic Acids Res 2009;37:W323-8

56. Antonov AV, Dietmann S, Rodchenkov I, et al. PPI spider: a tool for the interpretation of proteomics data in the context of protein-protein interaction networks. Proteomics 2009;9:2740-9

57. Martin B, Sanz R, Aragues R, et al. Functional clustering of metastasis proteins describes plastic adaptation resources of breast-cancer cells to new microenvironments. J Proteome Res 2008;7:3242-53

58. Martin B, Aragues R, Sanz R, et al. Biological pathways contributing to organ-specific phenotype of brain metastatic cells. J Proteome Res 2008;7:908-20

59. Tu LC, Yan X, Hood L, et al. Proteomics analysis of the interactome of N-myc downstream regulated gene 1 and its interactions with the androgen response program in prostate cancer cells. Mol Cell Proteomics 2007;6:575-88

60. Antonov AV, Dietmann S, Mewes HW. KEGG spider: interpretation of genomics data in the context of the global gene metabolic network. Genome Biol 2008;9:R179

61. Smith SD, She YM, Roberts EA, et al. Using immobilized metal affinity chromatography, two-dimensional electrophoresis and mass spectrometry to identify hepatocellular proteins with copper-binding ability. J Proteome Res 2004;3:834-40

62. Inutsuka S, Araki S, Kusaba I, et al. Copper and zinc content of the blood of patients with malignant tumors (especially on Cu-Zn ratio). Rinsho Byori 1973;21:632-6

63. Barabasi AL, Oltvai ZN. Network biology: understanding the cell’s functional organization. Nat Rev Genet 2004;5:101-13




64. Seliger B, Dressler SP, Lichtenfels R, et al. Candidate biomarkers in renal cell carcinoma. Proteomics 2007;7:4601-12

65. Roxo-Rosa M, da Costa G, Luider TM, et al. Proteomic analysis of nasal cells from cystic fibrosis patients and non-cystic fibrosis control individuals: search for novel biomarkers of cystic fibrosis lung disease. Proteomics 2006;6:2314-25

66. Pollard HB, Ji XD, Jozwik C, et al. High abundance protein profiling of cystic fibrosis lung epithelial cells. Proteomics 2005;5:2210-26

Affiliation

Alexey V Antonov PhD

Senior Scientist,

Institute for Bioinformatics and Systems Biology,

Helmholtz Zentrum Munchen -- German Research Center for Environmental Health (GmbH),

Ingolstadter Landstraße 1,

D-85764, Neuherberg,

Germany

Tel: +49 89 3187 2788;

Fax: +49 89 3187 3585;

E-mail: [email protected]

