Mining protein lists from proteomics studies: applications for drug discovery
Alexey V Antonov
Institute for Bioinformatics and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Neuherberg, Germany
Importance of the field: In recent years, proteomics has become a common
technique applied to a wide spectrum of scientific problems, including the
identification of diagnostic biomarkers, monitoring the effects of drug
treatments or identification of chemical properties of a protein or a drug.
Although these studies differ significantly in scientific essence, the ultimate
result of the majority of them is a protein list. Thousands of
independent proteomics studies have reported protein lists in various
functional contexts.
Areas covered in this review: We review here the spectrum of scientific
problems where proteomics technology was applied recently to deliver protein
lists. The available bioinformatics methods commonly used to understand
the properties of the protein lists are compared.
What the reader will gain: The types and common functional properties of
the reported protein lists are discussed. The range of scientific problems
where this knowledge could be potentially helpful with a focus on drug
discovery issues is explored.
Take home message: Reported protein lists represent a valuable resource
which can be used for a variety of goals, ranging from biomarker discovery
to identification of novel therapeutic implications of known drugs.
Keywords: databases, gene list, PLIPS, protein list, proteomics, text mining
Expert Opin. Drug Discov. (2010) 5(4):323-331
In recent years, several proteomic platforms have been developed and applied successfully to help understand the cell proteome. The term proteomics itself has been constantly expanding, acquiring novel meanings. Although the approach was initially used to characterize all proteins within a given cell, many researchers now also take advantage of proteomic technology to reveal changes in the concentration of proteins between different physiological conditions of the cell [2-4]. Another direction commonly referred to as proteomics is sub-proteomics. In this case, a subset of proteins sharing specific characteristics is isolated from a complex mixture of proteins. The sub-proteomic approach relies on enrichment techniques for isolation of proteins with similar characteristics or biophysical and chemical properties (e.g., isoelectric point, molecular mass, cellular compartment).
DOI: 10.1517/17460441003716796. 2010 Informa UK Ltd, ISSN 1746-0441. All rights reserved: reproduction in whole or in part not permitted.
The spectrum of problems covered by proteomics studies continues to expand rapidly. Proteomic analyses have recently been conducted on tissues, biofluids and subcellular components in both animal models and humans. Various clinical applications of proteomics, including the identification of prognostic and early diagnostic markers and monitoring the effects of drug treatments, are of particular interest. In addition, proteomics is frequently combined with other genomic and/or metabolomic technologies to profile molecular cell mechanisms at a systems biology level.
There are several databases for capturing and disseminating proteomics data [7,8]. For example, the PRIDE database [9,10] has been developed to provide a standards-compliant repository for mass spectrometry-based proteomics data comprising identifications of proteins, peptides and post-translational modifications. Additionally, public repositories collect a lot of valuable supplemental information about experimental setup and technical parameters.
Independent of the primary target of the proteomics study or the proteomics technology used, in most cases the experimental part delivers a list of proteins found to be expressed (or differentially expressed) in the context of the studied biological phenomenon. For example, we identified >400 papers published in the last 5 years in the Proteomics journal that report one or several lists of proteins (in total, Proteomics published about 2000 papers in this period). Although publicly available, this information was scattered across hundreds of papers. Recently, we developed a Web mining tool to collect this information. By searching through full-text papers, it automatically selects tables with a list of protein identifiers. This information was compiled into the PLIPS database. Currently, the database covers about 1500 different protein lists, which have been reported by ~1200 independent proteomics studies.
We review here the general and individual properties of reported protein lists. We also show the ways the PLIPS database can be utilized to deliver novel hypotheses in various clinical contexts. The spectrum of potential applications includes typical biomarker discovery projects and the search for novel therapeutic implications of known or developing drugs.
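
As an illustration of the table-selection step performed by the Web mining tool described above, the following Python sketch flags a table column as a protein identifier column when most of its cells look like UniProt accessions. It is not the PLIPS implementation, which is not described here. The function names (`identifier_column`, `extract_protein_list`), the thresholds (`min_rows`, `min_fraction`) and the restriction to UniProt-style accessions are all assumptions made for this example; a real tool would probably need to recognize several identifier types.

```python
import re

# Assumption: identifiers are UniProt accessions. The pattern follows the
# accession format documented by UniProt (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def identifier_column(table, min_rows=5, min_fraction=0.8):
    """Return the index of the first column that looks like a list of
    protein identifiers, or None if no column qualifies.

    `table` is a list of rows, each row a list of cell strings.
    `min_rows` and `min_fraction` are hypothetical thresholds.
    """
    if not table:
        return None
    n_cols = max(len(row) for row in table)
    for col in range(n_cols):
        # Collect the non-empty cells of this column (rows may be ragged).
        cells = [row[col].strip() for row in table
                 if col < len(row) and row[col].strip()]
        if len(cells) < min_rows:
            continue
        hits = sum(1 for cell in cells if UNIPROT_ACCESSION.match(cell))
        if hits / len(cells) >= min_fraction:
            return col
    return None


def extract_protein_list(table, **thresholds):
    """Return the unique accessions from the identifier column, in order
    of first appearance; an empty list if the table has no such column."""
    col = identifier_column(table, **thresholds)
    if col is None:
        return []
    proteins = []
    for row in table:
        if col < len(row):
            cell = row[col].strip()
            if UNIPROT_ACCESSION.match(cell) and cell not in proteins:
                proteins.append(cell)
    return proteins
```

Header cells such as "Accession" do not match the pattern, which is why the column is accepted on a fraction of matching cells rather than requiring every cell to match.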
2. General overview of protein lists reported by proteomics studies
Although the spectrum of reported protein lists covers a variety of different biological, clinical and chemical issues, most of the proteomics studies can be grouped into a relatively small number of classes. According to the biological essence of the studied phenomena, we can generally split reported protein lists into the following five classes:
proteins specifically expressed in a tissue
proteins specifically expressed in a cell compartment
proteins differentially expressed between different cell types
proteins differentially expressed between treated/untreated cells
proteins with a common chemical property.
Of course, there are a number of specific cases in which it is hard to assign a study to a particular class, as well as many cases in which a study can be attributed to several classes. Next, we give a brief overview of each class.
2.1 Proteins expressed in a specific cell type
A number of projects have been launched recently to generate a catalogue of proteins specifically expressed in different normal human tissues. Therefore, a considerable share of recently reported protein lists represents this type of proteomics study. For example, the HUPO Proteome Projects are initiatives coordinating proteomics studies to characterize human proteomes of different tissues [13,14]. In one such study, a non-redundant set of 1804 proteins was identified in human brain samples. At the moment, most human tissues have been profiled, including such special ones as enamel pellicle.
Some of these projects were focused mainly on technological issues of protein extraction and/or purification. A notable example is a study in which the results of protein extraction from different ocular regions using different detergents were compared. It was demonstrated that the extraction strategy may affect the final outcome in protein profiling by mass spectrometry (MS) or by other methods.
Although commonly considered to be of less importance for drug discovery projects, this class of protein lists can also be of great value. As we demonstrate further, these lists should also be taken into account to reduce the risk of potential adverse drug effects.
Proteomics has become a common technique applied to a wide spectrum of scientific problems. Although these studies differ significantly in scientific essence, the ultimate result of the majority of them is a protein list.
Thousands of independent proteomics studies have reported protein lists in various functional contexts.
The PLIPS database is a collection of proteomics papers which reported a protein list.
According to the biological essence of the studied phenomena, the reported protein lists can be split into five classes.
Analyses of global properties of reported protein lists indicate that most reported protein lists are highly dependent. On average, each list shares a significant subset of proteins with >20 other protein lists.
Significant similarities between protein lists can be indicative of similarity in molecular mechanisms between corresponding phenomena.
Information from PLIPS can be of great value for drug discovery projects in various contexts: i) to select a proper protein target and ii) to identify new therapeutic implications of novel and known drugs.
This box summarizes key points contained in the article.
2.2 Proteins specifically expressed in a cell compartment
The next issue abundantly addressed by proteomics studies is the distribution of protein expression across cell compartments. In comparison to the previous class, additional steps in sample preparation may be required to separate proteins from different cell compartments.
Proteins embedded in the plasma membrane largely define the cell functionality. A number of proteomics studies have delivered plasma membrane-containing fractions of proteins for different cell types [17,18]. Specific sub-membrane fractions, such as lipid rafts, were also reported [19,20]. Lipid rafts are glycolipid- and cholesterol-enriched membrane microdomains implicated in membrane signaling and trafficking. The mitochondrion is also an intensively explored cell compartment, as are several other compartments [22,23]. Knowledge of the compartment-specific cell proteome can be of use for a variety of scientific issues.
2.3 Proteins differentially expressed between different cell types
Different factors can significantly affect the landscape of the cell proteome. The next logical step forward would be the construction of protein catalogues not only for normal tissues or cell compartments but also for tissues under abnormal conditions. Comparative analyses of the proteomes of normal cells versus abnormal cells often lead to the inference of differentially expressed proteins. Numerous examples of proteomics studies covering different clinical issues and delivering, as an output, lists of differentially expressed proteins are available at the moment. Most of them aimed to discover biomarkers for early disease detection [24-27], for stratifying disease into distinct subtypes [28-30] and/or for monitoring disease progression [31-33].
Different types of cancer are regular targets of proteomics studies which deliver lists of up- or downregulated proteins [24,25,32]. The discovery of suitable biomarkers for early detection promises significant improvements in clinical outcomes for cancer patients. However, despite the recent progress in proteomics technologies, the results should be interpreted with caution. This is partly due to the inherent complexity of cancer, where almost every case of disease is partially unique on the molecular level. The second reason relates to the inherent biases of the whole technological chain, from preparation of biospecimens to protein detection by MS.
2.4 Proteins differentially expressed between treated/untreated cells
Systematic investigation of the mechanisms of drug action represents a large share of proteomics studies whose primary output is a protein list. Proteomics can be considered a very suitable technique to quantify cell response to exposure to a drug [34,35] or to other clinically related environmental conditions [36,37]. The discovered protein lists can provide new insights into the mechanisms of drug action, such as induction of apoptosis or activation of other disease-related signaling or regulatory/metabolic pathways.
A large number of small molecules with a wide spectrum of proposed mechanisms of action have been explored [34-36,38]. In addition, cell response to silencing or overexpression of some potential targets for anticancer therapy (transcription factors, phosphorylation kinases) that regulate cell-cycle progression or apoptosis was quantified. The protein lists inferred in these studies can be used for a variety of purposes in drug discovery projects.
2.5 Proteins with a common chemical property
A number of proteomics projects were devoted to identifying proteins with common chemical properties [40-46]. A significant share of such studies is devoted to identification of proteins subjected to post-translational modifications. For example, proteomics was used intensively to study protein phosphorylation in the cell [40,41]. Reversible phosphorylation of proteins is a key mechanism for control of signal transduction. Phosphorylation of proteins is known to regulate enzymatic activity, subcellular localization, protein-protein interaction (PPI) and degradation of proteins. In one study, 118 tyrosine-phosphorylated proteins were identified by coupling stable isotope labeling with amino acids in cell culture (SILAC) to mass spectrometry.
The objective of another study was to identify targets for S-nitrosylation in human sperm. Spermatozoa were incubated with nitric oxide donors and S-nitrosylated proteins were identified using the biotin switch assay and a proteomic approach using tandem mass spectrometry (MS/MS). In total, 240 S-nitrosylated proteins were detected in sperm incubated with S-nitroso-glutathione.
Another very promising direction is the application of proteomics to identify the genome-wide binding spectrum for a given molecular probe. It is frequently referred to as chemical proteomics or activity-based proteomics. Molecular probes are used to target a selective group of functionally related proteins. An affinity chromatography protocol is used to select proteins with binding potential. In the next step, the MS/MS technique is applied to identify the recovered proteins.
In one study, chemical proteomics was used to identify the nucleotide-binding proteome of active and resting platelets. An affinity chromatography protocol using immobilized adenosine triphosphate, cyclic adenosine monophosphate and cyclic guanosine monophosphate was used. Several platelet proteins that show a statistically significant difference between the active and resting nucleotide-binding proteomes were reported.
This type of proteomics study was used to identify the genome-wide binding proteome for several available drugs or drugs in the development phase [43-45]. For example, one study reported protein targets of bosutinib, a promiscuous kinase inhibitor. Bosutinib (SKI-606) is an ATP-competitive t...