1. Introduction
2. General overview of protein lists reported by proteomics studies
3. Local properties of reported protein lists
4. Global properties of reported protein lists
5. Expert opinion: mining protein lists for drug discovery
Review
Mining protein lists from proteomics studies: applications for drug discovery
Alexey V Antonov
Institute for Bioinformatics and Systems Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Neuherberg, Germany
Importance of the field: In recent years, proteomics has become a common
technique applied to a wide spectrum of scientific problems, including the
identification of diagnostic biomarkers, monitoring the effects of drug
treatments or identification of chemical properties of a protein or a drug.
Although these studies differ substantially in their scientific aims, the ultimate
result of the majority of them is a protein list. Thousands of
independent proteomics studies have reported protein lists in various
functional contexts.
Areas covered in this review: We review here the spectrum of scientific problems
to which proteomics technology has recently been applied to deliver protein
lists. The available bioinformatics methods commonly used to understand
the properties of the protein lists are compared.
What the reader will gain: The types and common functional properties of
the reported protein lists are discussed. The range of scientific problems
for which this knowledge could be helpful is explored, with a focus on drug
discovery.
Take home message: Reported protein lists represent a valuable resource
that can be used for a variety of goals, ranging from biomarker discovery
to identification of novel therapeutic implications of known drugs.
Keywords: databases, gene list, PLIPS, protein list, proteomics, text mining
Expert Opin. Drug Discov. (2010) 5(4):323-331
1. Introduction
In recent years, several proteomic platforms have been developed and applied successfully to help understand the cell proteome [1]. The term proteomics itself has been constantly expanding and acquiring new meanings. Although the approach was initially used to characterize all proteins within a given cell, many researchers now also take advantage of proteomic technology to reveal changes in the concentration of proteins between different physiological conditions of the cell [2-4]. Another direction commonly referred to as proteomics is ‘sub-proteomics’ [5]. In this case, a subset of proteins sharing specific characteristics is isolated from a complex mixture of proteins. The sub-proteomic approach relies on enrichment techniques to isolate proteins with similar characteristics or biophysical and chemical properties (e.g., isoelectric point, molecular mass, cellular compartment).
The spectrum of problems covered by proteomics studies continues to expand rapidly. Proteomic analyses have recently been conducted on tissues, biofluids and subcellular components in both animal models and humans [1]. Various clinical applications of proteomics, including the identification of prognostic and early diagnostic markers and monitoring the effects of drug treatments, are of particular interest [5]. In addition, proteomics is frequently combined with other genomics and/or metabolomic technologies to profile molecular cell mechanisms at a systems biology level [6].
10.1517/17460441003716796 © 2010 Informa UK Ltd ISSN 1746-0441. All rights reserved: reproduction in whole or in part not permitted
There are several databases for the purpose of capturing and
disseminating proteomics data [7,8]. For example, the PRIDE database [9,10] has been developed to provide a standards-compliant repository for mass spectrometry-based proteomics data comprising identifications of proteins, peptides and post-translational modifications. Additionally, public repositories collect a lot of valuable supplemental information about experimental setup and technical parameters.
Independent of the primary target of the proteomics study or the proteomics technology used, in most cases the experimental part delivers a list of proteins found to be expressed (or differentially expressed) in the context of the studied biological phenomenon. For example, we identified >400 papers published in the last 5 years in the journal ‘Proteomics’ that report one or several lists of proteins (in total, ‘Proteomics’ published about 2000 papers in this period). Although publicly available, this information was scattered across hundreds of papers. Recently, we developed a Web mining tool to collect it. By searching through full-text papers, the tool automatically selects tables that contain a list of protein identifiers. This information was compiled into the PLIPS database [11]. Currently, the database covers about 1500 different protein lists, which have been reported by ~ 1200 independent proteomics studies.
We review here the general and individual properties of reported protein lists. We also show how the PLIPS database can be used to deliver novel hypotheses in various clinical contexts. Potential applications include typical biomarker discovery projects and the search for novel therapeutic implications of known or developing drugs.
2. General overview of protein lists reported by proteomics studies
Although the spectrum of reported protein lists covers a variety of different biological, clinical and chemical issues, most proteomics studies can be grouped into a relatively small number of classes. According to the biological essence of the studied phenomena, reported protein lists can generally be split into the following five classes:
• proteins specifically expressed in a tissue
• proteins specifically expressed in a cell compartment
• proteins differentially expressed between different cell types
• proteins differentially expressed between treated/untreated cells
• proteins with a common chemical property.
Of course, in a number of cases it is hard to assign a study to a particular class, and in many cases a study can be attributed to several classes. Next, we give a brief overview of each class.
2.1 Proteins expressed in a specific cell type
A number of projects have been launched recently to generate a catalogue of proteins specifically expressed in different normal human tissues [12]. Therefore, a considerable share of recently reported protein lists comes from this type of proteomics study. For example, the HUPO Proteome Projects are initiatives coordinating proteomics studies to characterize the human proteomes of different tissues [13,14]. In one of them, a non-redundant set of 1804 proteins was identified in human brain samples [13]. At the moment, most human tissues have been profiled, including such special ones as ‘enamel pellicle’ [15].
Some of these projects focused mainly on technological issues of protein extraction and/or purification. A notable example is the study in [16], where the results of protein extraction from different ocular regions using different detergents were compared. It was demonstrated that the extraction strategy may affect the final outcome of protein profiling by mass spectrometry (MS) or by other methods.
Although commonly considered to be of less importance for drug discovery projects, this class of protein lists can also be of great value. As we demonstrate further, such lists should also be taken into account to reduce the risk of potential adverse drug effects.
Article highlights.
• Proteomics has become a common technique applied to a wide spectrum of scientific problems. Although these studies differ substantially in their scientific aims, the ultimate result of the majority of them is a protein list.
• Thousands of independent proteomics studies have reported protein lists in various functional contexts.
• The PLIPS database is a collection of proteomics papers which reported a protein list.
• According to the biological essence of the studied phenomena, the reported protein lists can be split into five classes.
• Analyses of global properties of reported protein lists indicate that most reported protein lists are highly dependent. On average, each list shares a significant subset of proteins with >20 other protein lists.
• Significant similarities between protein lists can be indicative of similarity in molecular mechanisms between corresponding phenomena.
• Information from PLIPS can be of great value for drug discovery projects in various contexts: i) to select a proper protein target and ii) to identify new therapeutic implications of novel and known drugs.
This box summarizes key points contained in the article.
2.2 Proteins specifically expressed in a cell compartment
The next issue that has been abundantly addressed by proteomics studies is the distribution of protein expression across cell compartments. In comparison to the previous class, additional steps in sample preparation may be required to separate proteins from different cell compartments.
Proteins embedded in the plasma membrane largely define cell functionality. A number of proteomics studies have delivered plasma membrane-containing fractions of proteins for different cell types [17,18]. Specific sub-membrane fractions, such as lipid rafts, have also been reported [19,20]. Lipid rafts are glycolipid- and cholesterol-enriched membrane microdomains implicated in membrane signaling and trafficking. The mitochondrion is also an intensively explored cell compartment [21], as are several other compartments [22,23]. Knowledge of the compartment-specific cell proteome can be of use for a variety of scientific issues.
2.3 Proteins differentially expressed between different cell types
Different factors can significantly affect the landscape of the cell proteome. The next logical step forward is the construction of protein catalogues not only for normal tissues or cell compartments but also for tissues under abnormal conditions. Comparative analyses of the proteomes of normal versus abnormal cells often lead to the inference of differentially expressed proteins. Numerous examples of proteomics studies that cover different clinical issues and deliver lists of differentially expressed proteins as their output are available at the moment. Most of them aimed to discover biomarkers for early disease detection [24-27], for stratification of disease into distinct subtypes [28-30] and/or for monitoring disease progression [31-33].
Different types of cancer are regular targets of proteomics studies that deliver lists of up- or downregulated proteins [24,25,32]. The discovery of suitable biomarkers for early detection promises significant improvements in clinical outcomes for cancer patients. However, despite the recent progress in proteomics technologies, one should be cautious in interpreting the results. This is partly due to the inherent complexity of cancer, where almost every case of the disease is partially unique at the molecular level. The second reason relates to the inherent biases of the whole technological chain, from preparation of biospecimens to protein detection by MS.
2.4 Proteins differentially expressed between treated/untreated cells
Systematic investigation of the mechanisms of drug action represents a large share of the proteomics studies whose primary output is a protein list. Proteomics can be considered a very suitable technique to quantify the cell response to drug exposure [34,35] or to other clinically related environmental conditions [36,37]. The discovered protein lists can provide new insights into the mechanisms of drug action, such as induction of apoptosis or activation of other disease-related signaling or regulatory/metabolic pathways.
A large number of small molecules with a wide spectrum of proposed mechanisms of action have been explored [34-36,38]. In addition, the cell response to silencing or overexpression of some potential targets for anticancer therapy (transcription factors, phosphorylation kinases) that regulate cell-cycle progression or apoptosis was quantified [39]. The protein lists inferred in these studies can be used for a variety of purposes in drug discovery projects.
2.5 Proteins with a common chemical property
A number of proteomics projects were devoted to delivering proteins with common chemical properties [40-46]. A significant share of such studies is devoted to the identification of proteins subjected to post-translational modifications. For example, proteomics was used intensively to study protein phosphorylation in the cell [40,41]. Reversible phosphorylation of proteins is a key mechanism for the control of signal transduction. Phosphorylation of proteins is known to regulate enzymatic activity, subcellular localization, protein-protein interaction (PPI) and degradation of proteins. In [40], 118 tyrosine-phosphorylated proteins were identified by coupling stable isotope labeling with amino acids in cell culture to mass spectrometry.
The objective of another study [42] was to identify targets for S-nitrosylation in human sperm. Spermatozoa were incubated with nitric oxide donors, and S-nitrosylated proteins were identified using the biotin switch assay and a proteomic approach using tandem mass spectrometry (MS/MS). In total, 240 S-nitrosylated proteins were detected in sperm incubated with S-nitroso-glutathione.
Another very promising direction is the application of proteomics to identify the genome-wide binding spectrum of a given molecular probe [47]. This is frequently referred to as chemical proteomics or activity-based proteomics. Molecular probes are used to target a selective group of functionally related proteins. An affinity chromatography protocol is used to select proteins with binding potential. In the next step, the MS/MS technique is applied to identify the recovered proteins.
Chemical proteomics was used in [46] to identify the nucleotide-binding proteome of active and resting platelets. An affinity chromatography protocol using immobilized adenosine triphosphate, cyclic adenosine monophosphate and cyclic guanosine monophosphate was used. Several platelet proteins that show a statistically significant difference between the active and resting nucleotide-binding proteomes were reported.
This type of proteomics study was used to identify the genome-wide binding proteome of several available drugs or drugs in the development phase [43-45]. For example, the study in [43] reported protein targets of bosutinib, a promiscuous kinase inhibitor. Bosutinib (SKI-606) is an ATP-competitive third-generation kinase inhibitor for the treatment of chronic myeloid leukemia (CML) and is currently in clinical trials for the treatment of CML (Phase III) and breast cancer (Phase II).
3. Local properties of reported protein lists
A common challenge faced by experimental researchers is to understand the role of the discovered proteins in a biological context. At the moment, a vast amount of knowledge has accumulated on individual protein function, biological role and chemical properties [48]. Several public databases are available [49,50] that store information about protein interactions in the context of regulatory, signaling or metabolic pathways. A lot of experimental data is available for binary PPIs and the composition of protein complexes [51]. All this information has been used previously to establish local functional properties of inferred protein lists.
3.1 Enrichment analysis
A widely accepted strategy is to infer biological processes that are statistically overrepresented in the inferred list of proteins. The Gene Ontology (GO) database or the Kyoto Encyclopedia of Genes and Genomes (KEGG) are commonly used as reference knowledge. The number of proteins (from the protein list) found in each GO category (or KEGG pathway) is compared with the number of proteins expected to be found in the given category by chance. If the observed and expected numbers are substantially different, the category is reported as enriched [52-55].
As pointed out in [56], based on statistics from ~ 300 protein lists reported recently in the Proteomics journal, enrichment analysis is usually inefficient when the analyzed protein list is small (<30). In the majority of such cases (>70%), not a single GO term was found to be significantly enriched. When the size of the reported protein list is >50, it was shown that in most cases (>80%) enrichment analysis was able to provide some insight into the functional context, that is, at least one GO term related to biological processes was significantly overrepresented. However, the coverage of the largest enriched term was <15% of the size of the reported protein list. This means that for a protein list of about 50 proteins, in the best case only 8 proteins (15% of 50, rounded up) would be classified by the same biological process; the role of the remaining 40 or so proteins would be unclear. Protein lists derived from proteomics studies that address specific biological questions (Sections 2.3, 2.4, 2.5) are frequently small (10-50 proteins) in comparison to those from genome-wide proteomics studies (Section 2.1). Therefore, enrichment analysis is expected to be very inefficient for understanding the local properties of a given list of differentially expressed proteins or proteins with a common chemical property.
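The comparison of observed and expected counts can be illustrated with a short sketch. It assumes a hypergeometric model with a Bonferroni correction, which is one common choice rather than necessarily the procedure used in [52-56]; the function names, the toy category annotations and the background set are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, list_size, category_size, background_size):
    """P(X >= k) when list_size proteins are drawn without replacement from
    a background that contains category_size annotated proteins."""
    total = comb(background_size, list_size)
    tail = sum(
        comb(category_size, i) * comb(background_size - category_size, list_size - i)
        for i in range(k, min(list_size, category_size) + 1)
    )
    return tail / total


def enriched_categories(protein_list, categories, background, alpha=0.05):
    """Return (category, observed, expected, corrected_p) for categories that
    are overrepresented in protein_list (assumed Bonferroni correction)."""
    universe = set(background)
    proteins = set(protein_list) & universe
    results = []
    for name, members in categories.items():
        members = set(members) & universe
        observed = len(proteins & members)
        # Number of list proteins expected in the category by chance.
        expected = len(proteins) * len(members) / len(universe)
        p = hypergeom_upper_tail(observed, len(proteins), len(members), len(universe))
        corrected = min(1.0, p * len(categories))
        if observed > expected and corrected < alpha:
            results.append((name, observed, expected, corrected))
    return sorted(results, key=lambda row: row[3])
```

With a small list, the observed count in any single category rarely exceeds the expected count by enough to survive the correction, which is consistent with the low efficiency reported for lists of fewer than 30 proteins.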
3.2 PPI network and protein lists
Although the knowledge of overrepresented GO terms or KEGG pathways is helpful, it only partially resolves the molecular mechanism relevant to the explored list of proteins. PPI data represent abundant information that is often used for the interpretation of proteomics studies. From many perspectives, this information is more suitable, as the identified proteins are expected to be involved in cooperative activities.
Several experimental proteomics studies [57-59] use PPI networks to interpret inferred protein lists. By mapping proteins to a PPI network and using visualization tools, the authors can usually demonstrate that the identified proteins lie close to each other in the network. However, visual analysis of a graphical representation of interacting proteins gives only an intuitive feeling that the discovered proteins are related. Taking into account the density of PPI networks, one must not underestimate the value of statistical treatment. Even for a randomly generated protein list, it is possible to connect many proteins from the list into a sub-network via one or two intermediate partners (proteins that are not on the experimentally discovered list).
As shown in [56], a network-based interpretation of a protein list in the context of PPI networks can be very efficient. It is almost independent of the size of the protein list and provides good coverage (about 75% of the input protein list) for both small and large reported lists. In addition, it provides a global view of protein relations that is not limited to the size of an individual pathway or GO category. The graphical representation of the inferred network models is amenable to further analyses.
According to an estimate based on about 1500 protein lists stored in PLIPS [11], >80% of recently reported protein lists contain a statistically significant share of proteins that form an uninterrupted sub-network of interacting protein pairs (according to available PPI data).
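One way to replace visual inspection with a statistical statement is sketched below. It measures the largest group of listed proteins connected by direct interactions and compares it with randomly drawn lists of the same size. This is a minimal illustration under stated assumptions, not the network model of [56]: the function names are hypothetical, the PPI data are assumed to be a plain list of protein pairs, and intermediate partners are not allowed.

```python
import random
from collections import deque


def largest_connected_subnetwork(proteins, interactions):
    """Size of the largest group of listed proteins joined by direct
    interactions (no unlisted intermediate partners are used)."""
    listed = set(proteins)
    neighbours = {p: set() for p in listed}
    for a, b in interactions:
        if a in listed and b in listed and a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, best = set(), 0
    for start in listed:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over one connected component
            node = queue.popleft()
            size += 1
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        best = max(best, size)
    return best


def subnetwork_p_value(proteins, interactions, background, trials=1000, seed=0):
    """Empirical probability that a random list of the same size is at
    least as connected as the observed list."""
    rng = random.Random(seed)
    observed = largest_connected_subnetwork(proteins, interactions)
    pool = sorted(background)
    size = len(set(proteins))
    at_least = sum(
        largest_connected_subnetwork(rng.sample(pool, size), interactions) >= observed
        for _ in range(trials)
    )
    return (at_least + 1) / (trials + 1)
```

Sampling the random lists from the same background of detectable proteins matters here: in a dense network, a random list can look well connected, which is exactly the effect the statistical treatment is meant to control.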
3.3 The metabolism component in disease specific protein lists
In our experience, protein lists reported by proteomics studies [11] contain large numbers of metabolism-related proteins much more frequently than gene lists reported by genomics studies (http://mips.helmholtz-muenchen.de/proj/ccancer/). An analysis of 16 disease-specific protein lists (proteins differentially expressed between normal/disease cells or differentially expressed in response to treatment) can be found in [60]. In many cases, the deregulated metabolism-related proteins came from several canonical KEGG pathways. At the same time, the proteins can be organized into an uninterrupted (a maximum of one or two nodes are missing) disease-specific network that runs through several canonical pathways. The results support the hypothesis that disease-specific metabolism proteins are in most cases not functionally independent. Deregulated proteins from different pathways are linked to each other via consecutive one- or two-step metabolic reactions.
4. Global properties of reported protein lists
To understand common global properties of reported lists we used PLIPS. The PLIPS database is a collection of proteomics papers which reported a protein list. PLIPS covers papers published during the last 7 years in major proteomics journals: Proteomics, Journal of Proteome Research, Molecular & Cellular Proteomics and Proteomics - Clinical Applications. From each paper, the tables are extracted and searched for protein/gene identifiers. Only those tables that contain >10 unique protein/gene identifiers of the same type (‘UniProt/Swiss-Prot’, ‘Gene Symbol’, ‘Ensembl’, ‘RefSeq Protein ID’, ‘RefSeq Transcript ID’) are selected. The collected protein/gene lists are systematically organized to allow various types of analyses.
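The table selection rule can be sketched as follows. The sketch assumes that identifier recognition has already been done and that each table arrives as (identifier type, identifier) pairs; the function name and this input format are hypothetical and do not describe the actual PLIPS implementation.

```python
from collections import defaultdict

MIN_IDENTIFIERS = 10  # a table must hold more than 10 unique identifiers


def select_protein_list(table_cells):
    """table_cells: iterable of (identifier_type, identifier) pairs already
    recognised in one table. Returns (identifier_type, sorted identifiers)
    for the best-represented type, or None if the table does not qualify."""
    by_type = defaultdict(set)
    for id_type, identifier in table_cells:
        by_type[id_type].add(identifier.strip())
    if not by_type:
        return None
    best_type = max(by_type, key=lambda t: len(by_type[t]))
    if len(by_type[best_type]) > MIN_IDENTIFIERS:
        return best_type, sorted(by_type[best_type])
    return None
```

Counting unique identifiers per type, rather than all identifier-like strings in a table, keeps a table with a few gene symbols and a few accession numbers from being mistaken for a protein list.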
4.1 Cross-reference analysis of reported protein lists
Analyses of global properties of reported protein lists indicate that most reported protein lists are highly dependent on one another. On average, each list shares a significant subset of proteins with >20 other protein lists. Each protein list from PLIPS was compared with all other identified protein lists to find commonly shared proteins and to link together those lists that have statistically significant similarity. This information is available at PLIPS and can be browsed online.
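An all-against-all comparison of this kind can be sketched as below. The significance test used by PLIPS is not specified here, so the sketch assumes a hypergeometric model with a fixed background size; the function names, the background size and the threshold are hypothetical.

```python
from itertools import combinations
from math import comb


def overlap_p_value(shared, size_a, size_b, background_size):
    """Probability that two random lists of the given sizes share at least
    `shared` proteins (hypergeometric upper tail)."""
    total = comb(background_size, size_b)
    tail = sum(
        comb(size_a, i) * comb(background_size - size_a, size_b - i)
        for i in range(shared, min(size_a, size_b) + 1)
    )
    return tail / total


def link_similar_lists(lists, background_size, alpha=0.01):
    """lists: dict mapping a list name to a set of protein identifiers.
    Returns (name_a, name_b, shared_count, p_value) for every pair whose
    overlap is significant, most significant first."""
    links = []
    for (name_a, a), (name_b, b) in combinations(lists.items(), 2):
        shared = len(a & b)
        if shared == 0:
            continue
        p = overlap_p_value(shared, len(a), len(b), background_size)
        if p < alpha:
            links.append((name_a, name_b, shared, p))
    return sorted(links, key=lambda link: link[3])
```

Because roughly 1500 lists produce over a million pairs, a real analysis would also need a multiple-testing correction, which this sketch omits.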
According to the widely accepted guilt-by-association principle, significant similarities between protein lists can be indicative of similarity in molecular mechanisms between the corresponding phenomena. Indeed, browsing PLIPS (http://mips.helmholtz-muenchen.de/proj/plips/human/human.1.html), one can see that lists reported in a cancerous context are generally associated with other cancer-related proteomics studies. These relations are expected and can be used to identify similar molecular cancer phenotypes.
In some cases, however, rather unexpected associations can be found. For example, in [61] hepatic proteins with copper-binding ability were isolated using a proteomics strategy. In total, 48 cytosolic proteins and 19 microsomal proteins displaying copper-binding ability were reported. About 100 protein lists from PLIPS share a significant number of proteins with the 48 reported cytosolic proteins. Most of them were reported in a cancerous context. The role of copper in the development of cancer is not completely clear. It is known that cancer patients usually have increased levels of serum copper. However, it is accepted that the elevation of serum copper is part of the body’s biological response to the cancer, rather than its cause [62]. Taking into account that most of the reported proteins were not previously known to have copper-binding ability, such a strong association of the reported list with cancer studies suggests a more complex role of copper in cancer development.
4.2 Individual protein
The distribution of the number of times each protein was reported in different studies follows a power-law distribution. This is consistent with the fact that many biological networks are scale-free [63], that is, their degree distribution approximates a power law. The set of top proteins in terms of the number of times they were reported in different studies consists of 53 proteins. Each of them was reported by at least 70 different proteomics studies. We analyzed the functional context of these 53 proteins. Several GO terms, such as ‘anti-apoptosis’ and ‘regulation of apoptosis’, are overrepresented (p value <0.05, estimated by a Monte Carlo simulation procedure). This probably indicates that a large proportion of the collected protein lists consist of proteins found to be differentially expressed between various cancerous cell phenotypes.
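The counting step and a Monte Carlo estimate of this kind can be sketched as follows. The exact simulation procedure is not described in the text, so the sketch assumes that random protein sets of the same size are drawn uniformly from a background; all function names are hypothetical.

```python
import random
from collections import Counter


def report_counts(lists):
    """Number of distinct lists in which each protein appears."""
    counts = Counter()
    for proteins in lists.values():
        counts.update(set(proteins))  # count each protein once per list
    return counts


def frequently_reported(counts, min_reports):
    """Proteins reported by at least min_reports lists."""
    return {protein for protein, n in counts.items() if n >= min_reports}


def monte_carlo_p_value(top_set, annotated, background, trials=10000, seed=0):
    """Empirical probability that a random set of the same size contains at
    least as many proteins annotated with a term as top_set does."""
    rng = random.Random(seed)
    pool = sorted(background)
    annotated = set(annotated)
    top = set(top_set)
    observed = len(top & annotated)
    hits = sum(
        len(annotated.intersection(rng.sample(pool, len(top)))) >= observed
        for _ in range(trials)
    )
    return (hits + 1) / (trials + 1)
```

In the setting described above, `frequently_reported(counts, 70)` would correspond to the set of 53 top proteins, and `annotated` to the proteins carrying a GO term such as ‘anti-apoptosis’.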
5. Expert opinion: mining protein lists for drug discovery
Information from protein lists reported by proteomics studiescan be of great value for drug discovery projects in variouscontexts. Here, we review several possible applications.
5.1 Selecting a proper protein target
Selection of the proper protein targets is one of the key factors in the overall success of modern drug discovery projects. It is usually the starting point, and selection of the wrong target can lead to failure from the very beginning. In this case, knowledge of the expression profile of the selected proteins can be of high value. Attention should be paid not only to protein lists with differential expression between normal/disease cell conditions but also to the overall expression profile of the protein across different cell types and to its compartment-specific expression. If the protein appears on many lists of differentially expressed proteins, this is a good indication that a potential drug targeting it would have a wide spectrum of applications. On the other hand, if the protein is expressed in many tissues, this increases the probability of adverse side effects, or the drug may simply be toxic.
Certainly, this is a very simplified scheme. One should take into account many factors, including the functional role of the protein. To get a better understanding of the importance of a protein, one should analyze it in the context of the other proteins from the corresponding list. In this case, a network-based analysis of the local properties of a given protein list can strongly support the conclusions.
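The simplified scheme can be expressed as a small lookup over classified protein lists. The sketch below is only an illustration: the class labels, the data layout and the function names are hypothetical, and the resulting counts are raw indicators rather than a validated target score.

```python
def target_profile(protein, lists):
    """lists: dict mapping a list name to (list_class, set of proteins).
    Returns the names of the lists containing the protein, grouped by class.
    The class labels used here are hypothetical."""
    profile = {}
    for name, (list_class, proteins) in lists.items():
        if protein in proteins:
            profile.setdefault(list_class, []).append(name)
    return profile


def summarise_target(protein, lists):
    """Counts that mirror the two indicators discussed in the text: breadth
    of differential expression and breadth of tissue expression."""
    profile = target_profile(protein, lists)
    return {
        "differential_lists": len(profile.get("cell_type_differential", [])),
        "tissues": len(profile.get("tissue", [])),
        "compartments": len(profile.get("compartment", [])),
    }
```

A high `differential_lists` count together with a low `tissues` count would match the favourable case described above; as the text notes, such counts need to be weighed against the functional role of the protein.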
5.2 Mining novel therapeutic implications of known drugs
One of the promising directions in pharma is the identification of novel uses for old drugs based on novel, previously unknown mechanisms of action. For example, several established drugs for treating major mental illnesses have been found to possess new therapeutic uses. Inhibitors of monoamine oxidase (MAO inhibitors) can be used to alleviate the symptoms associated with Parkinson’s disease in addition to their traditional use in treating mental disorders. The neuroprotective effect of MAO inhibitors through disruption of the movement of glyceraldehyde-3-phosphate dehydrogenase (GAPDH) into the cell nucleus is one innovative area under investigation, because GAPDH has recently been confirmed to be a cell death mediator that translocates into the nucleus and triggers cell death pathways.
For many drugs available on the market, the full list of affected targets is still in question. The precise molecular mechanism of many diseases also remains unclear. Proteomics has been used intensively to address both problems. For a variety of cell disorders, comparative proteomics profiling has delivered lists of differentially expressed proteins (Section 2.3). For a number of known or developing drugs, lists of proteins that either respond to drug treatment (Section 2.4) or are direct targets (Section 2.5) have been identified. This type of information can be mined to identify new potential therapeutic implications of either known drugs or drugs under development.
For example, camptothecins are known to target the intranuclear enzyme topoisomerase and are used intensively in cancer chemotherapy. In [38], the potential anti-human ovarian cancer effects of NSC606985, a novel and rarely studied camptothecin analogue, were investigated. Acute myeloid leukemic cells were treated and about 90 deregulated proteins in NSC606985-induced apoptotic NB4 cells were reported. Although the reported proteins are not necessarily direct targets of NSC606985, they are affected by the treatment via internal molecular mechanisms of the cell. If a significant percentage of them were reported to be deregulated in some other diseases, this might hint at possible novel therapeutic implications of NSC606985.
Searching through the PLIPS database yields about 300 previously reported protein lists that share a significant number of proteins with the 90 proteins deregulated by treatment with NSC606985. About 20 of these proteins were previously reported as being expressed in breast cancer cells (a catalogue of 162 proteins was reported). Such strong similarity suggests breast cancer as a potential therapeutic implication of NSC606985. Renal cell carcinoma (RCC) can be another possible therapeutic implication of NSC606985. A list of 33 potential RCC biomarkers identified by several proteomics studies was compiled and reported in [64]. Six of these proteins are also deregulated by NSC606985.
Several applications unrelated to cancer can also be hypothesized. Two protein lists reported independently in [65] and [66] in the context of cystic fibrosis lung disease have statistically significant similarity to the proteins affected by NSC606985. In both cases, the proteins expressed in cystic fibrosis lung epithelial cells share 20 proteins with the NSC606985 list.
The same strategy can be applied in reverse: given a list of proteins known to be differentially expressed between normal and disease tissues, one can try to identify significantly similar protein lists reported either to be deregulated on drug treatment or identified by chemical proteomics as direct drug targets. If such a list is found, this can be a good indication of the potential usability of the given drug for the given disease.
Declaration of interest
The author states no conflict of interest and has received no payment in preparation of this manuscript.
Bibliography
1. Cox J, Mann M. Is proteomics the new
genomics? Cell 2007;130:395-8
2. Rifai N, Gillette MA, Carr SA.
Protein biomarker discovery and
validation: the long and uncertain path
to clinical utility. Nat Biotechnol
2006;24:971-83
3. Findeisen P, Neumaier M. Mass
spectrometry-based clinical proteomics
profiling: current status and future
directions. Expert Rev Proteomics
2009;6:457-9
4. Findeisen P, Neumaier M. Mass
spectrometry based proteomics
profiling as diagnostic tool in
oncology: current status and future
perspective. Clin Chem Lab Med
2009;47:666-84
5. Colantonio DA, Chan DW.
The clinical application of
proteomics. Clin Chim Acta
2005;357:151-8
6. Perroud B, Lee J, Valkova N, et al.
Pathway analysis of kidney cancer using
proteomics and metabolic profiling.
Mol Cancer 2006;5:64
7. Craig R, Cortens JP, Beavis RC. Open
source system for analyzing, validating,
and storing protein identification data.
J Proteome Res 2004;3:1234-42
8. Desiere F, Deutsch EW, King NL, et al.
The PeptideAtlas project.
Nucleic Acids Res 2006;34:D655-8
9. Jones P, Cote RG, Martens L, et al.
PRIDE: a public repository of
protein and peptide identifications
for the proteomics
community. Nucleic Acids Res
2006;34:D659-63
10. Jones P, Cote RG, Cho SY, et al.
PRIDE: new developments and new
datasets. Nucleic Acids Res
2008;36:D878-83
11. Antonov AV, Dietmann S, Wong P,
et al. PLIPS, an automatically collected
database of protein lists reported by
proteomics studies. J Proteome Res
2009;8:1193-7
12. Gronborg M, Bunkenborg J,
Kristiansen TZ, et al. Comprehensive
proteomic analysis of human
pancreatic juice. J Proteome Res
2004;3:1042-55
13. Mueller M, Martens L, Reidegeld KA,
et al. Functional annotation of proteins
identified in human brain during
the HUPO Brain Proteome Project pilot
study. Proteomics 2006;6:5059-75
14. Zheng J, Gao X, Beretta L, et al. The
Human Liver Proteome Project (HLPP)
workshop during the 4th HUPO
World Congress. Proteomics
2006;6:1716-18
15. Siqueira WL, Zhang W, Helmerhorst EJ,
et al. Identification of protein
components in in vivo human acquired
enamel pellicle using LC-ESI-MS/MS.
J Proteome Res 2007;6:2152-60
16. Patel N, Solanki E, Picciani R, et al.
Strategies to recover proteins from ocular
tissues for proteomics. Proteomics
2008;8:1055-70
17. Rietschel B, Bornemann S, Arrey TN,
et al. Membrane protein analysis
using an improved peptic
in-solution digestion protocol.
Proteomics 2009;9:5553-7
18. Jeong JA, Ko KM, Park HS, et al.
Membrane proteomic analysis of
human mesenchymal stromal cells
during adipogenesis. Proteomics
2007;7:4181-91
19. Li N, Shaw AR, Zhang N, et al. Lipid
raft proteomics: analysis of
in-solution digest of sodium dodecyl
sulfate-solubilized lipid raft proteins by
liquid chromatography-matrix-assisted
laser desorption/ionization tandem mass
spectrometry. Proteomics
2004;4:3156-66
20. Zhang N, Shaw AR, Li N, et al. Liquid
chromatography electrospray ionization
and matrix-assisted laser desorption
ionization tandem mass spectrometry for
the analysis of lipid raft proteome of
monocytes. Anal Chim Acta
2008;627:82-90
21. Wang J, Gutierrez P, Edwards N, et al.
Integration of 18O labeling and solution
isoelectric focusing in a shotgun analysis
of mitochondrial proteins.
J Proteome Res 2007;6:4601-7
22. Song H, Sokolov M. Analysis of protein
expression and compartmentalization in
retinal neurons using serial tangential
sectioning of the retina. J Proteome Res
2009;8:346-51
23. An Y, Fu Z, Gutierrez P, et al. Solution
isoelectric focusing for peptide analysis:
comparative investigation of an insoluble
nuclear protein fraction. J Proteome Res
2005;4:2126-32
24. Okamura N, Masuda T, Gotoh A, et al.
Quantitative proteomic analysis to
discover potential diagnostic markers
and therapeutic targets in human renal
cell carcinoma. Proteomics
2008;8:3194-203
25. Bianchi L, Canton C, Bini L, et al.
Protein profile changes in the human
breast cancer cell line MCF-7 in response
to SEL1L gene induction. Proteomics
2005;5:2433-42
26. Morita A, Miyagi E, Yasumitsu H, et al.
Proteomic search for potential
diagnostic markers and therapeutic
targets for ovarian clear cell
adenocarcinoma. Proteomics
2006;6:5880-90
27. Meuwis MA, Fillet M, Chapelle JP, et al.
New biomarkers of Crohn’s disease:
serum biomarkers and development of
diagnostic tools. Expert Rev Mol Diagn
2008;8:327-37
28. Chen YR, Juan HF, Huang HC, et al.
Quantitative proteomic and genomic
profiling reveals metastasis-related
protein expression patterns in gastric
cancer cells. J Proteome Res
2006;5:2727-42
29. Li LS, Kim H, Rhee H, et al. Proteomic
analysis distinguishes basaloid carcinoma
as a distinct subtype of nonsmall cell
lung carcinoma. Proteomics
2004;4:3394-400
30. Kikuta K, Gotoh M, Kanda T, et al.
Pfetin as a prognostic biomarker in
gastrointestinal stromal tumor: novel
monoclonal antibody and external
validation study in multiple clinical
facilities. Jpn J Clin Oncol
2010;40:60-72
31. Gromov P, Gromova I, Bunkenborg J,
et al. Up-regulated proteins in the
fluid bathing the tumour cell
microenvironment as potential
serological markers for early
detection of cancer of the breast.
Mol Oncol 2009;4(1):65-89
32. Moreira JM, Ohlsson G, Gromov P,
et al. Bladder cancer associated protein: a
potential prognostic biomarker in human
bladder cancer. Mol Cell Proteomics
2009;9(1):161-77
33. Ohlsson G, Moreira JM, Gromov P,
et al. Loss of expression of the
adipocyte-type fatty acid-binding
protein (A-FABP) is associated with
progression of human urothelial
carcinomas. Mol Cell Proteomics
2005;4:570-81
34. Meuwis MA, Fillet M, Lutteri L, et al.
Proteomics for prediction and
characterization of response to infliximab
in Crohn’s disease: a pilot study.
Clin Biochem 2008;41:960-7
35. Fillet M, Cren-Olive C, Renert AF, et al.
Differential expression of proteins in
response to ceramide-mediated stress
signal in colon cancer cells by 2-D gel
electrophoresis and MALDI-TOF-MS.
J Proteome Res 2005;4:870-80
36. Short DM, Heron ID,
Birse-Archbold JL, et al. Apoptosis
induced by staurosporine alters
chaperone and endoplasmic reticulum
proteins: identification by quantitative
proteomics. Proteomics 2007;7:3085-96
37. Li Z, Kreutzer M, Mikkat S, et al.
Proteomic analysis of the E2F1 response
in p53-negative cancer cells: new
aspects in the regulation of cell
survival and death. Proteomics
2006;6:5735-45
38. Yu Y, Wang LS, Shen SM, et al.
Subcellular proteome analysis of
camptothecin analogue
NSC606985-treated acute myeloid
leukemic cells. J Proteome Res
2007;6:3808-18
39. Zhao J, Zhu K, Lubman DM, et al.
Proteomic analysis of estrogen response
of premalignant human breast cells using
a 2-D liquid separation/mass mapping
technique. Proteomics 2006;6:3847-61
40. Amanchy R, Kalume DE, Iwahori A,
et al. Phosphoproteome analysis of
HeLa cells using stable isotope
labeling with amino acids in cell
culture (SILAC). J Proteome Res
2005;4:1661-71
41. Amanchy R, Kalume DE, Pandey A.
Stable isotope labeling with amino acids
in cell culture (SILAC) for studying
dynamics of protein abundance and
posttranslational modifications.
Sci STKE 2005;2005:l2
42. Lefievre L, Chen Y, Conner SJ, et al.
Human spermatozoa contain multiple
targets for protein S-nitrosylation: an
alternative mechanism of the modulation
of sperm function by nitric oxide?
Proteomics 2007;7:3066-84
43. Fernbach NV, Planyavsky M, Muller A,
et al. Acid elution and
one-dimensional shotgun analysis on an
Orbitrap mass spectrometer: an
application to drug affinity
chromatography. J Proteome Res
2009;8:4753-65
44. Rix U, Hantschel O, Durnberger G,
et al. Chemical proteomic profiles of the
BCR-ABL inhibitors imatinib, nilotinib,
and dasatinib reveal novel kinase and
nonkinase targets. Blood
2007;110:4055-63
45. Rix U, Remsing Rix LL, Terker AS,
et al. A comprehensive target selectivity
survey of the BCR-ABL kinase inhibitor
INNO-406 by kinase profiling and
chemical proteomics in chronic myeloid
leukemia cells. Leukemia 2009;1:44-50
46. Wong JW, McRedmond JP, Cagney G.
Activity profiling of platelets by
chemical proteomics. Proteomics
2009;9:40-50
47. Burckstummer T, Bennett KL,
Preradovic A, et al. An efficient
tandem affinity purification
procedure for interaction proteomics
in mammalian cells. Nat Methods
2006;3:1013-19
48. Ashburner M, Ball CA, Blake JA, et al.
Gene ontology: tool for the
unification of biology. The Gene
Ontology Consortium. Nat Genet
2000;25:25-9
49. Kanehisa M, Goto S, Hattori M, et al.
From genomics to chemical genomics:
new developments in KEGG.
Nucleic Acids Res 2006;34:D354-7
50. Vastrik I, D’Eustachio P, Schmidt E,
et al. Reactome: a knowledge base of
biologic pathways and processes.
Genome Biol 2007;8:R39
51. Aranda B, Achuthan P, Alam-Faruque Y,
et al. The IntAct molecular interaction
database in 2010. Nucleic Acids Res
2010;38:D525-31
52. Antonov AV, Schmidt T, Wang Y, et al.
ProfCom: a web tool for profiling
the complex functionality of gene
groups identified from
high-throughput data. Nucleic Acids
Res 2008;36:W347-51
53. Khatri P, Draghici S. Ontological
analysis of gene expression data: current
tools, limitations, and open problems.
Bioinformatics 2005;21:3587-95
54. Khatri P, Sellamuthu S, Malhotra P,
et al. Recent additions and improvements
to the Onto-Tools. Nucleic Acids Res
2005;33:W762-5
55. Antonov AV, Dietmann S, Wong P,
et al. GeneSet2miRNA: finding the
signature of cooperative miRNA activities
in the gene lists. Nucleic Acids Res
2009;37:W323-8
56. Antonov AV, Dietmann S,
Rodchenkov I, et al. PPI spider: a tool
for the interpretation of proteomics
data in the context of
protein-protein interaction networks.
Proteomics 2009;9:2740-9
57. Martin B, Sanz R, Aragues R, et al.
Functional clustering of metastasis
proteins describes plastic adaptation
resources of breast-cancer cells to new
microenvironments. J Proteome Res
2008;7:3242-53
58. Martin B, Aragues R, Sanz R, et al.
Biological pathways contributing to
organ-specific phenotype of brain
metastatic cells. J Proteome Res
2008;7:908-20
59. Tu LC, Yan X, Hood L, et al.
Proteomics analysis of the interactome
of N-myc downstream regulated
gene 1 and its interactions with the
androgen response program in prostate
cancer cells. Mol Cell Proteomics
2007;6:575-88
60. Antonov AV, Dietmann S, Mewes HW.
KEGG spider: interpretation of genomics
data in the context of the global gene
metabolic network. Genome Biol
2008;9:R179
61. Smith SD, She YM, Roberts EA, et al.
Using immobilized metal affinity
chromatography,
two-dimensional electrophoresis
and mass spectrometry to identify
hepatocellular proteins with
copper-binding ability. J Proteome Res
2004;3:834-40
62. Inutsuka S, Araki S, Kusaba I, et al.
Copper and zinc content of the blood of
patients with malignant tumors
(especially on Cu-Zn ratio).
Rinsho Byori 1973;21:632-6
63. Barabasi AL, Oltvai ZN. Network
biology: understanding the cell’s
functional organization. Nat Rev Genet
2004;5:101-13
64. Seliger B, Dressler SP, Lichtenfels R,
et al. Candidate biomarkers in
renal cell carcinoma. Proteomics
2007;7:4601-12
65. Roxo-Rosa M, da CG, Luider TM, et al.
Proteomic analysis of nasal cells
from cystic fibrosis patients and
non-cystic fibrosis control individuals:
search for novel biomarkers of cystic
fibrosis lung disease. Proteomics
2006;6:2314-25
66. Pollard HB, Ji XD, Jozwik C, et al.
High abundance protein profiling of
cystic fibrosis lung epithelial cells.
Proteomics 2005;5:2210-26
Affiliation
Alexey V Antonov PhD
Senior Scientist,
Institute for Bioinformatics and Systems Biology,
Helmholtz Zentrum Munchen -- German
Research Center for Environmental Health
(GmbH),
Ingolstadter Landstraße 1,
D-85764, Neuherberg,
Germany
Tel: +49 89 3187 2788;
Fax: +49 89 3187 3585;
E-mail: [email protected]