Mining protein lists from proteomics studies: applications for drug discovery
Alexey V Antonov
Institute for Bioinformatics and Systems Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Neuherberg, Germany
Importance of the field: In recent years, proteomics has become a common
technique applied to a wide spectrum of scientific problems, including the
identification of diagnostic biomarkers, monitoring the effects of drug
treatments or identification of chemical properties of a protein or a drug.
Although these studies differ substantially in their scientific aims, the
ultimate result of most proteomics studies is a protein list. Thousands of
independent proteomics studies have reported protein lists in various
functional contexts.
Areas covered in this review: We review the spectrum of scientific problems
to which proteomics technology has recently been applied to deliver protein
lists, and compare the bioinformatics methods commonly used to characterize
the properties of these lists.
What the reader will gain: The types and common functional properties of
reported protein lists are discussed, and the range of scientific problems
where this knowledge could be helpful is explored, with a focus on drug
discovery.
Take home message: Reported protein lists represent a valuable resource
that can be used for a variety of goals, ranging from biomarker discovery
to the identification of novel therapeutic implications of known drugs.
Keywords: databases, gene list, PLIPS, protein list, proteomics, text mining
Expert Opin. Drug Discov. (2010) 5(4):323-331
In recent years, several proteomic platforms have been developed and successfully applied to help understand the cell proteome. The term proteomics itself has been constantly expanding and acquiring new meanings. Although the approach was initially used to characterize all proteins within a given cell, many researchers now also use proteomic technology to reveal changes in protein concentration between different physiological conditions of the cell [2-4]. Another direction commonly referred to as proteomics is sub-proteomics. In this case, a subset of proteins sharing specific characteristics is isolated from a complex mixture of proteins. The sub-proteomic approach relies on enrichment techniques to isolate proteins with similar characteristics or biophysical and chemical properties (e.g., isoelectric point, molecular mass, cellular compartment).
The spectrum of problems covered by proteomics studies continues to expand rapidly. Proteomic analyses have recently been conducted on tissues, biofluids and subcellular components in both animal models and humans. Various clinical applications of proteomics, including the identification of prognostic and early diagnostic markers and monitoring the effects of drug treatments, are of particular interest. In addition, proteomics is frequently combined with other genomic and/or metabolomic technologies to profile molecular cell mechanisms at a systems biology level.
10.1517/17460441003716796 2010 Informa UK Ltd ISSN 1746-0441 323 All rights reserved: reproduction in whole or in part not permitted
There are several databases for capturing and
disseminating proteomics data [7,8]. For example, the PRIDE database [9,10] has been developed to provide a standards-compliant repository for mass spectrometry-based proteomics data, comprising identifications of proteins, peptides and post-translational modifications. In addition, public repositories collect much valuable supplementary information about the experimental setup and technical parameters.
Independent of the primary target of the proteomics study
or the proteomics technology used, in most cases the experimental part delivers a list of proteins found to be expressed (or differentially expressed) in the context of the biological phenomenon under study. For example, we identified >400 papers published in the last 5 years in the journal Proteomics that report one or several lists of proteins (in total, Proteomics published about 2000 papers in this period). Although publicly available, this information is scattered across hundreds of papers. Recently, we developed a Web mining tool to collect it: by searching through full-text papers, the tool automatically selects tables that contain a list of protein identifiers. The collected lists were compiled into the PLIPS database. Currently, the database covers about 1500 different protein lists reported by ~1200 independent proteomics studies.
We review here the general and individual properties of reported protein lists. We also show how the PLIPS database can be used to generate novel hypotheses in various clinical contexts. Potential applications range from typical biomarker discovery projects to the search for novel therapeutic implications of known or developing drugs.
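The article does not describe how the PLIPS mining tool recognizes a table that holds a protein list. As an illustration only, the sketch below assumes that a table has already been parsed into rows of cell strings and that the identifiers are UniProtKB accessions; it flags columns in which most cells match the accession format published by UniProt. The names `identifier_columns` and `extract_protein_list` and the `min_fraction` threshold are hypothetical and not part of PLIPS.

```python
import re

# UniProtKB accession format as documented by UniProt: 6-character
# accessions (e.g. P04637) and the longer 10-character A0A... form.
UNIPROT_AC = re.compile(
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)


def identifier_columns(table, min_fraction=0.5):
    """Indices of columns whose non-empty body cells are mostly UniProt accessions.

    `table` is a list of rows (lists of cell strings); row 0 is treated as the header.
    `min_fraction` is a hypothetical tuning parameter.
    """
    body = table[1:]
    if not body:
        return []
    n_cols = max(len(row) for row in body)
    columns = []
    for col in range(n_cols):
        cells = [row[col].strip() for row in body if col < len(row) and row[col].strip()]
        if not cells:
            continue
        matched = sum(1 for cell in cells if UNIPROT_AC.fullmatch(cell))
        if matched / len(cells) >= min_fraction:
            columns.append(col)
    return columns


def extract_protein_list(table, min_fraction=0.5):
    """Distinct accessions from all identifier-like columns, in order of first appearance."""
    found = {}  # dict preserves insertion order, so it doubles as an ordered set
    for col in identifier_columns(table, min_fraction):
        for row in table[1:]:
            cell = row[col].strip() if col < len(row) else ""
            if UNIPROT_AC.fullmatch(cell):
                found.setdefault(cell, None)
    return list(found)
```

A real mining tool would also have to cope with other identifier types (gene symbols, IPI or GenBank numbers) and with tables split across pages; those cases are outside this sketch.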
2. General overview of protein lists reported by proteomics studies
Although reported protein lists cover a variety of biological, clinical and chemical issues, most proteomics studies can be grouped into a relatively small number of classes. According to the biological essence of the phenomena studied, reported protein lists can generally be split into the following five classes:
proteins specifically expressed in a tissue
proteins specifically expressed in a cell compartment
proteins differentially expressed between different cell types
proteins differentially expressed between treated and untreated cells
proteins with a common chemical property.
Of course, in some cases it is hard to assign a study to a particular class, and in many cases a study can be attributed to several classes. Below, we give a brief overview of each class.
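Because one study can fall into several of these classes, a single label per list is not enough. The sketch below shows one way to encode this, using Python's `enum.Flag` so that labels combine with `|`. The member names are our own shorthand and the annotated study is hypothetical; the article does not say how PLIPS stores class annotations.

```python
from enum import Flag, auto


class ListClass(Flag):
    """The five classes of reported protein lists; member names are illustrative shorthand."""
    TISSUE = auto()             # proteins specifically expressed in a tissue
    COMPARTMENT = auto()        # proteins specifically expressed in a cell compartment
    CELL_TYPE_DIFF = auto()     # differentially expressed between different cell types
    TREATMENT_DIFF = auto()     # differentially expressed between treated/untreated cells
    CHEMICAL_PROPERTY = auto()  # proteins with a common chemical property


def class_names(label):
    """Names of every class contained in a (possibly combined) label, in definition order."""
    return [member.name for member in ListClass if member in label]


# Hypothetical annotation: a subcellular fraction profiled before and after treatment
# belongs to two classes at once; ListClass(0) marks a study that fits none of them.
subcellular_drug_study = ListClass.COMPARTMENT | ListClass.TREATMENT_DIFF
print(class_names(subcellular_drug_study))  # ['COMPARTMENT', 'TREATMENT_DIFF']
```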
2.1 Proteins expressed in a specific cell type
A number of projects have recently been launched to generate a catalogue of proteins specifically expressed in different normal human tissues. Consequently, a considerable share of recently reported protein lists comes from this type of proteomics study. For example, the HUPO Proteome Projects are initiatives that coordinate proteomics studies to characterize the proteomes of different human tissues [13,14]. In one such study, a non-redundant set of 1804 proteins was identified in human brain samples. To date, most human tissues have been profiled, including specialized ones such as the enamel pellicle.
Some of these projects focused mainly on technological issues of protein extraction and/or purification. A notable example is a study that compared protein extraction from different ocular regions using different detergents. It demonstrated that the extraction strategy may affect the final outcome of protein profiling by mass spectrometry (MS) or by other methods.
Although commonly considered to be of less importance for drug discovery projects, this class of protein lists can also be of great value. As we demonstrate later, such lists should also be taken into account to reduce the risk of adverse drug effects.
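Comparisons such as the detergent study above, and the significant list-to-list overlaps reported for PLIPS, come down to measuring how many proteins two lists share and whether that number exceeds chance. The statistic used for PLIPS is not given in this section, so the sketch below assumes a common choice: a one-sided hypergeometric test against a universe of N proteins (N is a user choice, for instance roughly the 20,000 human protein-coding genes), plus a Bonferroni correction when one list is screened against many. All function names are hypothetical.

```python
from math import comb


def overlap_pvalue(list_a, list_b, universe_size):
    """One-sided hypergeometric p-value: probability of at least the observed number
    of shared proteins if list_b were a random draw of its size from the universe."""
    a, b = set(list_a), set(list_b)
    n_a, n_b, shared = len(a), len(b), len(a & b)
    if max(n_a, n_b) > universe_size:
        raise ValueError("a list cannot be larger than the protein universe")
    # Sum the hypergeometric probabilities for overlaps >= shared (exact integer arithmetic).
    tail = sum(
        comb(n_a, k) * comb(universe_size - n_a, n_b - k)
        for k in range(shared, min(n_a, n_b) + 1)
    )
    return tail / comb(universe_size, n_b)


def jaccard(list_a, list_b):
    """Shared proteins as a fraction of the union (0.0 for two empty lists)."""
    a, b = set(list_a), set(list_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def significant_neighbours(query, collection, universe_size, alpha=0.01):
    """Lists in `collection` (name -> proteins) whose overlap with `query` passes a
    Bonferroni-corrected threshold, sorted from most to least significant."""
    threshold = alpha / max(len(collection), 1)
    hits = []
    for name, proteins in collection.items():
        p = overlap_pvalue(query, proteins, universe_size)
        if p < threshold:
            hits.append((name, p))
    return sorted(hits, key=lambda hit: hit[1])
```

The p-value says whether an overlap is unlikely by chance, while the Jaccard index says how large it is; two large lists can overlap significantly yet share only a small fraction of their proteins, so reporting both is usually more informative than either alone.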
2.2 Proteins specifically expressed in a cell compartment
The next issue extensively addressed by proteomics studies is understanding the distribution of
Proteomics has become a common technique applied to a wide spectrum of scientific problems. Although these studies differ substantially in their scientific aims, the ultimate result of most proteomics studies is a protein list.
Thousands of independent proteomics studies have reported protein lists in various functional contexts.
The PLIPS database is a collection of proteomics papers that reported a protein list.
According to the biological essence of the phenomena studied, the reported protein lists can be split into five classes.
Analyses of global properties of reported protein lists indicate that most reported protein lists are highly interdependent: on average, each list shares a significant subset of proteins with >20 other protein lists.
Significant similarities between protein lists can indicate similar molecular mechanisms underlying the corresponding phenomena.
Information from PLIPS can be of great value for drug discovery projects in various contexts: i) to select a proper protein target and ii) to identify new therapeutic implications of novel and known drugs.
This box summarizes key points contained in the article.