Upload
utpalmtbi
View
217
Download
0
Embed Size (px)
Citation preview
8/10/2019 Using the Reactome Database
1/16
UNIT 8.7Using the Reactome Database
The completion of multiple genomes in recent years has led to an explosion of information
about known and predicted gene products. This information explosion has been acceler-
ated by the invention of high-throughput experimental techniques, such as microarrays
(see Chapter 7), yeast two-hybrid screens, and ChIP on Chip techniques, which allow
experimentalists to ask questions about tens of thousands of genes simultaneously. As a
result, biological researchers now face an embarrassment of riches: there is simply too
much information to easily digest and interpret.
One way to reduce the complexity of this information is to adopt a high-level view
of biological pathways. A microarray experiment that changes the expression pattern
of thousands of genes may only affect the expression patterns of a small handful of
biochemical pathways. Hence there is a high degree of interest in the bioinformatics
community in creating pathway databases. The Reactome project, covered in this unit,
is one such database. It is a curated collection of well documented molecular reactions
that span the gamut from simple intermediate metabolism (e.g., sugar catabolism) to
complex cellular events such as the mitotic cell cycle. These reactions are gathered by
experts in the field, peer reviewed, and edited by professional staff members prior to being
published in the database. A semiautomated procedure supplements this information by
identifying likely orthologous molecular reactions in mouse, rat, zebrafish, and othermodel organisms.
The protocols in this unit illustrate how to use Reactome to learn the steps of a biological
pathway and see how one pathway interacts with another. Basic Protocol 1 describes how
to navigate and browse through the Reactome database. Basic Protocol 2 and Alternate
Protocol 1 explain how to identify thepathways in which a molecule of interest is involved
using either the common name or accession number, respectively. Basic Protocol 3 details
how to use the Pathfinder tool to search the database for possible connections within and
between pathways. Alternate Protocol 2 describes when and how to use the Advanced
Search feature.
NOTE: This information is based on Reactome in July 2004. Some of the Web pages may
have changed somewhat since the unit written.
BASIC
PRO TO CO L 1
BROWSING A REACTOME PATHWAY
This protocol will introduce the basic navigational techniques needed to browse the
Reactome database.
Necessary Resources
Hardware
Computer capable of supporting a Web browser, and an Internet connection
Software
Any modern Web browser will work. The formatting of the Reactome pages maylook best using Internet Explorer 4.0 or higher, or Netscape 7.0 or higher.
Contributed by Lincoln D. SteinCurrent Protocols in Bioinformatics(2004) 8.7.1-8.7.16
Copyright C 2004 by John Wiley & Sons, Inc.
AnalyzingMolecularInteractions
8.7.1
Supplement 7
8/10/2019 Using the Reactome Database
2/16
8/10/2019 Using the Reactome Database
3/16
8/10/2019 Using the Reactome Database
4/16
Using theReactomeDatabase
8.7.4
Supplement 7 Current Protocols in Bioinformatics
Repair and Double-Strand Break Repair. The+ marks mean that there are subheadings
underneath the headings. Clicking on a+ will expand the topic to show its subparts.
The main screen, to the right of the navigation panel, containing the description of the
pathway. This is the meat of the information contained within Reactome. The main screen
begins with the authors, peer reviewers, and editors for this pathway, along with the date
that the pathway wasfirst released. This is followed by a textsummationthat describes
the pathway. Below the summation are more details about the pathway, including the
taxon in which the reaction occurs, the Gene Ontology classification(s) of the pathway,
and the cellular compartment in which the pathway is known to occur. Further down
are two importantfields. Thefield that readsEquivalent event(s) in other organism(s)allows one to jump to the corresponding processes in the other model organism systems.
TheParticipating molecules field lists all proteins, nucleic acids, complexes and small
molecules, and complexes of these entities that are involved in any of the myriad aspects
of DNA repair.
3. Drill down into the Global Genomic Nucleotide Excision Repair subpathway as fol-
lows. The last entry in the navigation panel is Nucleotide Excision Repair. Click on
it to open this level of the hierarchy, revealing the subentries Global Genomic NER
(GG-NER)andTranscription-coupled NER (TC-NER).Click on Global Genomic
NER (GG-NER), to reveal the page shown in Figure 8.7.3.
Notice that the navigation panel has now expanded by a level to reveal the relation-
ship between global genomic nucleotide excision repair and the more general pathways
that it belongs to on the one hand, and to the more specific pathways (DNA Damage
Recognition. . ., Formation of incision complex. . .,etc.) on the other hand. Further, the
Figure 8.7.3 The main screen after drilling down to the Global Genomic NER (GG-NER) sub-
pathway. The navigation panel on the left has opened up to indicate the subpathways of GG-NER,
and the highlighting in the reaction map now indicates the reactions involved in this subpathway
only.
8/10/2019 Using the Reactome Database
5/16
AnalyzingMolecularInteractions
8.7.5
Current Protocols in Bioinformatics Supplement 7
Figure 8.7.4 An individual reaction. Notice that a single reaction arrow is highlighted in the
reaction map, and that the information in the main screen now shows the constituent input and
output molecular compounds that participate in this reaction.
highlighting in the reaction map is now restricted to the reactions that are involvedin global
genomic nucleotide excision repair. The main screen describes the process in text form, and
is accompanied by a cartoon overview. A much smaller set of participating molecules (par-
tially scrolled out of view in thefigure) lists the proteins, complexes, and other molecules
that participate in this process.
4. In order to drill down to the reaction level, continue to click on subpathways. Eventually
the reaction level will be reached, where processes are described as the interactions
of individual molecules. To see this, return to the navigation panel and clickfirst on
DNA Damage Recognition in GG-NERand then on XPC:HR23B complex binds
to damaged DNA site with lesion [Homo sapiens]to go to the page shown in Figure
8.7.4.
A reaction-level page is similar to the upper-level pages, with a few important differences.
First of all, the reaction map on the reaction-level page highlights a single reaction arrow
only, indicating that one is at the lowest level of a pathway. Second, several additional
fields appear below the text description of the reaction. These new fields include Input,
which lists the molecules that enter the reaction, and Output, which lists the molecules thatresult from the reaction. In the case of the current reaction, the inputs are the damaged
DNA substrate and the XPC:HR23B nucleotide excision complex, while the output is the
complex of XPC:HR23B with the damaged DNA. In other words, this reaction describes
the binding of XPC:HR23B to damaged DNA prior to the subsequent enzymatic reactions
that cleave the DNA and excise the damaged base pair.
Two other new fields are also shown. Preceding event(s) describes the reaction that
immediately precedes this one temporally, in this caseXPC binds to HR23B forming a
heterodimeric complex. . . ..Following event(s)describes the reaction that immediately
follows this one:Recruitment of repair factors to form preincision complex. . .. One can
8/10/2019 Using the Reactome Database
6/16
Using theReactomeDatabase
8.7.6
Supplement 7 Current Protocols in Bioinformatics
Figure 8.7.5 After clicking theFollowing event(s)link in the previousfigure, the next step in the
GG-NER pathway is displayed.
click on the preceding and following events to follow the reactions backward and forward
in time.
5. Move to the next reaction by clicking on theFollowing event(s)link,Recruitment
of repair factors to form preincision complex. This will lead to the page shown in
Figure 8.7.5, which describes the recruitment of six new proteins and complexes tocreate a single complex bound at the site of the damaged DNA. This page shows a
preceding event ofXPC:HR23B complex binds to damaged DNA site with lesion
[Homo sapiens], which is the page shown in Figure 8.7.4, and a following event
ofFormation of open bubble structure in DNA by helicases [Homo sapiens]. By
clicking on the Following event(s)link, it would be possible to continue to follow
the process forward in time.
The relationship between thelevelsof the navigation bar on the one hand and thePre-
ceding event(s) andFollowing event(s) links, on the other hand, may not be immediately
clear. These represent two distinct ways of viewing pathways. The nested levels of the nav-
igation bar reflect levels of abstraction in the conceptual organization of pathways. As
one moves deeper into the hierarchy, the contents of the main screen become more and
more specific and move closer to the biochemical reaction level. ThePreceding event(s)
andFollowing event(s)links, on the other hand, usually only appear when one is at the
reaction level, and move backward and forward in time, remaining always at individual
reactions. It might seem to be redundant to have this dual mode of navigation, but it is there
for a good reason. Because biological knowledge is incomplete, there are many instances
where it is known thatsomething happens next, but the specific molecules that are in-
volved in this next step are not yet characterized. In this case, theFollowing event(s)
link will be missing, and one must step up in the hierarchy to a more general description
of the pathway in order to connect to the next known, well characterized reaction in the
process.
8/10/2019 Using the Reactome Database
7/16
AnalyzingMolecularInteractions
8.7.7
Current Protocols in Bioinformatics Supplement 7
Figure 8.7.5 also illustrates an important aspect of Reactome, the References section at
the bottom of the screen. Every reaction described in the database is supported by some
type of provenance. The three main types of provenance are direct literature citations, an
indirect assertion made by arguing from protein-based similarity in a model organism, and
an assertion made by the author of the module. In the case of direct literature citations,
the citation describes experiments performed using a system derived from the taxon under
consideration. For example, the first reference in the current reaction describes in vivo
experiments performed on human tissue culture cells that provided direct evidence via
molecular cross-linking of an association between the XPC:HR23B/DNA complex and the
repair factors recruited during this step.
Often, knowledge of human biology is derived from work on model organisms. If under-
standing of a reaction is derived from work on a model organism system, the references will
describe those experiments. Internally, direct evidence and indirect evidence from model
organisms are kept distinct, but the user interface does not currently reflect that fact.
Finally, the high-level, more general pathways will usually be based on an assertion by the
author of the module and supported by one or more review articles. Click on the authors
name at the top of the main screen to see the list of review articles that describe the
pathway.
6. Reactome provides information about the subunits of a complex, as well as the larger
ensembles of proteins that a complex participates in. In this example, from the Re-
cruitment of repair factors to form preincision complexpage, click on the TFIIHlink in the Input section. This will load a page that contains information about the
TFIIH (transcription factor IIH) complex (Fig. 8.7.6).
Figure 8.7.6 This page describes the TFIIH protein. In addition to describing its subunit structure,
the page notes all the macromolecular complexes and pathways in which TFIIH participates. The
reaction map highlighting indicates that, in addition to DNA repair processes, TFIIH is involved in
mRNA transcription (arrow).
8/10/2019 Using the Reactome Database
8/16
Using theReactomeDatabase
8.7.8
Supplement 7 Current Protocols in Bioinformatics
Because this page describes a molecule and not a reaction or pathway, there is no navigation
panel on the left. However, the reaction map at the top of the page is still present, and it lights
up to highlight the reactions in which TFIIH participates. Mousing over the highlighted
pathways reveals that, in addition to the DNA excision repair pathway that has been browsed
in the steps above, TFIIH also participates in PolII-mediated RNA transcription. This
connection between RNA transcription and DNA repair might surprise biologists who are
not well acquainted with DNA excision repair, and illustrates how Reactome bridges the
disciplines.
The section near the bottom of Figure 8.7.6 labeledParticipates in processes lists all
the pathways and reactions in which TFIIH participates. Although not shown in Figure
8.7.6, at the top of this section there is an extensive list of all the reactions in which the
current molecule participates. This is organized in a hierarchical manner that mirrors the
pathway hierarchy of the navigation panel. At the bottom, these events are organized into
three groups: all events that produce TFIIH, all that consume it, and all that are catalyzed
by it.
7. To learn more about a protein subunit, click on the subunit of interest. In this case, one
of the subunits of TFIIH is Cdk7 (shown in Fig. 8.7.6; it complexes with Cyclin H and
MAT1 to form the CAK subcomplex, which in turn is one of the major components
of TFIIH). Click on the Cdk7 link to load a page that describes it (Fig. 8.7.7). In
addition to highlighting the DNA repair and RNA transcription constellations, the
reaction map now shows highlighting in the Mitotic Cell Cycle constellation as well
(upper left quadrant of the image), reflecting Cdk7s role as a cell-cycle checkpointmolecule.
This page is called the reference entity page because it contains links to UniProt,
Ensembl, and other reference databases that describe the molecule.
Figure 8.7.7 The reference entity page describes the relationship between a molecule as it is
represented in Reactome and one or more entries in a third-party database such as SwissProt.
8/10/2019 Using the Reactome Database
9/16
AnalyzingMolecularInteractions
8.7.9
Current Protocols in Bioinformatics Supplement 7
BASIC
PRO TO CO L 2
FINDING THE PATHWAYS INVOLVING A GENE OR PROTEIN
This protocol will describe how to identify pathways and reactions that involve a gene or
protein of interest. For the purposes of illustration, the cyclin-dependent kinase 7 gene
will be used, which has the following identifiers:
Protein product: Common name: Cdk7
UniProt (SwissProt): CDK7 HUMAN
Gene: LocusLink: 1022
GenBank: NM 001799
Ensembl: ENSG00000134058.
See Alternate Protocol 1 to search by a database accession number rather than by a
common name.
Necessary Resources
Hardware
Computer capable of supporting a Web browser, and an Internet connection
Software
Any modern Web browser will work. The formatting of the Reactome pages maylook best using Internet Explorer 4.0 or higher, or Netscape 7.0 or higher.
1. Point the browser to the Reactome home page at http://www.reactome.org.
2. On the home page (Fig. 8.7.1), in the search bar near the top of the page (see annotation
to step 1 of Basic Protocol 1), click the text box (second box from the right-hand side
of the search bar), type Cdk7, then press the Enter key (or click the Go! button). This
brings up the search results page shown in Figure 8.7.8.
For now, ignore the textfields and buttons that occupy most of the real estate at the top of
the page, and focus on the section at the bottom under the headingFound 8 instances in
the following categories.This section tells the user that Reactome knows of 1 Literature
reference, 1 summation, 4 ReferenceEntities, and 2 PhysicalEntities that have something
to do with Cdk7. Summations are the text paragraphs that appear at the top of pages that
describe pathways and reactions. ReferenceEntities are lists of protein and gene entries
that appear in online genome databases. What are needed, although not apparent from the
name, are the PhysicalEntities, which is the term that Reactome uses for anything that has
Figure 8.7.8 Results from the quick search on the Reactome home page are displayed at the
bottom of the full-featured Advanced Search page.
8/10/2019 Using the Reactome Database
10/16
Using theReactomeDatabase
8.7.10
Supplement 7 Current Protocols in Bioinformatics
Figure 8.7.9 Following the search results to the Cdk7 page displays the structure of Cdk7 and a
hierarchical list of the pathways in which it is known to participate.
mass, such as a macromolecule. The search interface will be modified in the near future to
make it easier to interpret.
3. Navigate to the Cdk7 entry by clicking on the2 link that appears after the Physi-
calEntity label. This will lead to a list of two entries in Reactome, MAT1 (also known
asCdk7 assembly factor) and Cdk7 itself. Click on the Cdk7link. This will lead
to the page shown in Figure 8.7.9.
This page, which is similar to the TFIIH page shown in Figure 8.7.7, describes everything
that Reactome knows about Cdk7, including its names in other online databases, the protein
complexes that it belongs to, and the pathways and reactions that it participates in. Any of
these links can be clicked to begin browsing the pathways involving Cdk7 as described in
Basic Protocol 1.
ALTER NATE
PRO TOCO L 1
FINDING THE PATHWAYS INVOLVING A GENE OR PROTEIN USINGSwissProt, Ensembl, OR LocusLink NAME
Instead of searching for a gene or protein using its common name, as described in Basic
Protocol 2, one may wish to use the accession number by which it is known in SwissProt,
Ensembl, or LocusLink. The steps for doing so, using a SwissProt accession number,
are presented here. The same procedure works for Ensembl or LocusLink identi fiers.
However, Reactome does not currently recognize GenBank accession numbers, e.g.,
NM 001799, because of the redundancy of GenBank entries. If one wish to find a proteinbased on its GenBank accession number, one should first use NCBI LocusLink to find
the correct LocusLink number, and then use this number to access the appropriate entry
in Reactome.
Necessary Resources
Hardware
Computer capable of supporting a Web browser, and an Internet connection
8/10/2019 Using the Reactome Database
11/16
AnalyzingMolecularInteractions
8.7.11
Current Protocols in Bioinformatics Supplement 7
Software
Any modern Web browser will work. The formatting of the Reactome pages maylook best using Internet Explorer 4.0 or higher, or Netscape 7.0 or higher.
1. Point the browser to the Reactome home page at http://www.reactome.org.
2. On the home page (Fig. 8.7.1), in the search bar near the top of the page (see annotation
to step 1 of Basic Protocol 1), click the text box (second box from the right-hand side
of the search bar), type CDK7 HUMAN, then press the Enter key (or Click the Go!
button).
This brings up a reference entity page (see Basic Protocol 1, step 7) similar to the one
shown in Figure 8.7.7.
3. Navigate to the molecule or pathway of interest. The reference entity page is similar
in most respects to the PhysicalEntity page shown in Figure 8.7.9. From here it is
possible to navigate to the pathways and reactions in which Cdk7 takes part, view the
complexes that contain Cdk7, or link to the PhysicalEntity page shown in Figure 8.7.9.
ALTER NATE
PRO TO CO L 2
USING ADVANCED SEARCH
The simple searches shown in Basic Protocol 2 and Alternate Protocol 1 will suffice for
many situations. However, the default search casts a very wide net and may return morehits than one wants. If this is the case, one may wish to use the Advanced Search, which
gives much finer control over the search. To illustrate, this protocol describe how to search
forpyruvate dehydrogenase,whose default search returns multiple hits on compounds,
events, literature references, and other database entries.
Necessary Resources
Hardware
Computer capable of supporting a Web browser, and an Internet connection
Software
Any modern Web browser will work. The formatting of the Reactome pages maylook best using Internet Explorer 4.0 or higher, or Netscape 7.0 or higher.
1. Point the Web browser to the Reactome home page at http://www.reactome.org.
2. On the home page (Fig. 8.7.1), in the search bar near the top of the page (see annotation
to step 1 of Basic Protocol 1), click the text box (second box from the right-hand side
of the search bar), and typepyruvate dehydrogenase.
Pyruvate dehydrogenase is a protein complex, and one might like to limit the search to
database entries for complexes, as in step 3.
3. Go to the pull-down menu on the far left of the search bar and change the scope of the
search fromeverythingtocomplexes,then press the Go! button to the right of the
search bar.
This will return a list of 13 complexes that contain the wordspyruvate dehydrogenase,
including FADH2-linked pyruvate dehydrogenase complex, pyruvate dehydrogenase E2
holoenzyme, S-acetyldihydrolipoamide linked, and pyruvate dehydrogenase E2 trimer.
By default, the search willfind matches in Homo sapiens. If one wishes to see matches in
another species, one can change the search parameters as in step 4.
8/10/2019 Using the Reactome Database
12/16
8/10/2019 Using the Reactome Database
13/16
AnalyzingMolecularInteractions
8.7.13
Current Protocols in Bioinformatics Supplement 7
Figure 8.7.10 After entering the names of the start and end compounds, the Pathfinder willdisplay pull-down menus of candidate compounds known to Reactome. Pull down the menus in
order tofine tune Reactomes choice of compounds.
Figure 8.7.11 The PathFinder graphic display shows all the steps necessary to traverse from
the starting compound to the ending compound.
the menus and select the best match to what was intended. In the current example,
Reactome got the start compound right, but guessed incorrectly for the end compound,
returning an intermediate complex that happens to involve pol II transcription. Click on
the end compound menu and select pol II transcription complex.If the sought-afteritem is not found at first, try rephrasing it and typing it into the appropriate textfield,
then pressing Enter. This will update the list of candidates in the pull-down menu.
5. When the correct start and end compounds have been selected, press the Go! button
at the bottom of the panel. In a few seconds the page will refresh and display a list
of reactions that together connect the origin of replication to the pol II transcription
complex. The found path traverses the DNA repair pathways, which involve both
DNA replication and transcription factors. One can click on any of these steps to
begin browsing Reactome at that point.
8/10/2019 Using the Reactome Database
14/16
Using theReactomeDatabase
8.7.14
Supplement 7 Current Protocols in Bioinformatics
6. Press the button labeled View in Pathway underneath the Pathfinder list. Provided that
Java is installed and running on the system, a new image window will pop up that
shows this pathway in a graphical form (Fig. 8.7.11).
The user can interact with the pathway visualization in a limited manner in order to make
it more visually appealing. To do this, press the button labeledStop relaxingto stop the
automatic layout process andfix the reaction boxes in place. Next, grab the boxes with the
mouse and move them into the preferred positions.
The Pathfinder visualization does not currently support exporting the display as a static
image. However, it is possible to use the screenshot feature of ones local computer (Alt-
Print Scr on the PC) in order to capture the pathway. Also note that the View in Pathwaybutton will not appear unless Java is installed.
COMMENTARY
Background InformationThe Reactome project is a collaboration be-
tween Cold Spring HarborLaboratory andThe
European Bioinformatics Institute, and aims
to collect structured information on all the bi-
ological pathways in the human (Joshi-Tope
et al., 2003; see Internet Resources for online
version of this paper). The project is build-ing its database by inviting faculty-level lab-
oratory researchers to contribute a pathway or
sub-pathway to the database. To achieve this,
contributors are instructed on the use of a spe-
cializedpieceof authoringsoftwareand areas-
sisted in their work by a staff of curators based
at the two institutions. After authoring, each
pathway is checked for consistency both man-
ually and automatically and then sent to one
or more external peer reviewers. The pathway
is published to the Web when all internal and
external peer review is satisfactory. In many
ways, the project resembles a review journal,
except that its output is a database rather than
a series of papers.
In order to assist authors in organizing their
domain of knowledge into a set of defined
pathways,Reactomerelies on frequentmini-
jamborees of roughly a half-dozen authors.
Duringthesejamborees,whichare held in con-
junction with international meetings, authors
working on a set of related pathways get to-
gether in the same room and work out the
logical structure of their topic. This is also an
opportunity for Reactome curators to train theauthors in the use of the authoring software.
Reactome uses a simple scheme for describ-
ing biological pathways in which all molecu-
lar interactions are defined as reactions. A re-
action takes a series of inputs and transforms
them into a series of outputs, where inputs and
outputs are any type of molecular compound.
For example, the reaction in which proinsulin
is cleaved to form the and chains takes as
its input proinsulin, and produces the insulin
and polypeptides.
Representing biology as a set of molecular
reactions turns out to have broad expressive
power, but sometimes the results are disorient-
ing. For example, the reaction in which insulin
binds to the insulin receptor takes as its in-
puts extracellular insulin and the extracellularportion of the insulin receptor, and produces
as its output the complex of insulin and its
receptor, which, in Reactome, is represented
as a distinct molecular entity. The reaction by
which extracellular glucose is transported into
the cytosol transforms extracellularD-glucose
into intracellular D-glucose. Hence, a search
of Reactome forD-glucosewillfind bothD-
glucose (intracellular cytosolic) and D-glucose
(extracellular).
In addition to inputs and outputs, Reactome
reactions have a discrete set of additional at-
tributes. For those reactions that are mediated
by catalysts, the catalyst enzyme and its ac-
tivity are noted. Reactions are also annotated
using the cellular compartment in which they
occur. While Reactomedoes notpretendto be a
definitive source of information on the cellular
location of macromolecules, its data model is
set up to work smoothly with future databases
of subcellular localization; information on the
subcellular location of macromolecules will
help automated path-prediction software dis-
tinguish plausible pathways from impossible
ones. Finally, each reaction is supported by lit-erature citations, either those reporting experi-
ments performed directly in the humansystem,
or those performed on model systems when
there is high-quality protein similarity data to
suggest that thesame reaction is likelyto occur
in humans.
In order to assist with the comprehensibil-
ity of the resource, the reactions are annotated
with text narratives and illustrations, and are
8/10/2019 Using the Reactome Database
15/16
AnalyzingMolecularInteractions
8.7.15
Current Protocols in Bioinformatics Supplement 7
organized into a series of discrete goal-driven
pathways.
Reactome is related to several other path-
way databases, but has distinct methodolo-
gies and aims. The Human Protein Reference
Database (HPRD; Peri et al., 2003) is also
a hand-curated database of biological path-
ways. The HPRD focus, however, is to an-
notate individual proteins and their physical
and genetic interactions. HPRD contains in-
formation derived from large-scale screening
studies as well as individual papers that report
pairwise interactions. A result of this method-
ology is that many of the interactions found in
HPRD are speculative and subject to change.
Reactome takes a much more conservative ap-
proach; it represents far fewer molecular in-
teractions than HPRD does, but they are more
likely to be correct and less subject to revision.
HumanCyc (Krieger et al., 2004) is a
database of biological pathways that uses a
data model generally similar to Reactome,
although the user interface and underlyingdatabase technology are quite different in de-
tail. The focus of HumanCyc is intermediate
metabolism, however. It tends to have more
information on the creation and utilization of
small molecules than does Reactome, but less
information on such higher-level processes as
transcription, translation, and the cell cycle.
The Kyoto Encyclopedia of Genes and
Genomes, or KEGG (Kanehisa et al., 2004)
features an extensive set of biological path-
way charts. Like HumanCyc, KEGG focuses
on intermediatemetabolismrather thanhigher-
level pathways. Its data model differs funda-mentally from Reactomes by representing the
motivating force of all reactions in the form
of catalyst activities via Enzyme Commission
EC numbers. Because there is not a one-to-one
mapping between EC activity and polypeptide,
it can be problematic to relate a protein repre-
sented in SwissProt to a reaction represented
in KEGG.
Finally, the BioCarta project (http://www.
biocarta.com) represents human biology as
a series of colorful high-resolution diagrams.
Unlike Reactome or the other projects men-
tioned earlier, these diagrams are the end prod-uct of the project; there is no underlying
database. The focus of BioCarta is to be an
education and visualization tool, rather than to
support data mining and pattern discovery.
The Reactome database is far from com-
plete. At the time this module was written, Re-
actome covered just 8% of the human genome,
a number conservatively estimated by divid-
ing the number of human SwissProt entries
that take part in Reactome reactions by the
total number of human entries in the entire
SwissProt database. Because all of the other
pathway databases mentioned here are also in-
complete, the biologist faces the daunting task
of visiting each of these sites in an attempt
to fill in the holes in one databases coverage
with information from the others. The BioPAX
project (http://www.biopax.org) promises to
improve this situation by creating a standard-
ized file format for representing biological
pathways and reactions. Reactome and many
of the other pathway databases have commit-
ted to exporting their data in BioPAX format.
In the future, this will enable the databases
to exchange pathways and to co-curate data,
thereby accelerating the rate in which the gaps
are closed.
Reactomeis a fully open-source project.All
the software developed for use in Reactome
is available for download and redistribution,
and the data itself is available in a variety of
formats. The Download link on theReactome
Web site provides instructions for obtaining
data and software.
The Reactome dataset is available as
relational database tables in a format com-
patible with MySQL (http//www.mysql.com;
UNIT 9.2) and as files compatible with
the Protege-2000 knowledgebase editor
(http://protege.stanford.edu) and will soon be
available as tab-delimited textfiles.
Literature CitedJoshi-Tope, G., Vastrik, I., Gopinath, G.R.,
Matthews, L., Schmidt, E., Gillespie, M.,DEustachio, P., Jassal, B., Lewis, S., Wu, G.,Birney, E., and Stein, L. 2003. The GenomeKnowledgebase: A Resource for Biologists andBioinformaticists. Cold Spring Harbor Sym-posia on Quantitative Biology LXVIII:237-244.Cold Spring Harbor Laboratory Press, ColdSpring Harbor, N.Y.
Kanehisa, M., Goto, S., Kawashima, S., Okuno,Y., and Hattori, M. 2004. The KEGG resourcefor deciphering the genome.Nucleic Acids Res.32:D277-D280.
Krieger, C.J., Zhang, P., Mueller, L.A., Wang, A.,Paley, S., Arnaud, M., Pick, J., Rhee, S.Y., andKarp, P.D. 2004. MetaCyc: A multiorganismdatabase of metabolic pathways and enzymes.
Nucleic Acids Res.32:D438-D442.
Peri, S., Navarro, J.D., Amanchy, R., Kristiansen,T.Z., Jonnalagadda, C.K., Surendranath, V.,Niranjan, V., Muthusamy, B., Gandhi, T.K.,Gronborg, M., Ibarrola, N., Deshpande, N.,Shanker, K., Shivashankar, H.N., Rashmi, B.P.,Ramya, M.A., Zhao, Z., Chandrika, K.N.,Padma, N., Harsha, H.C., Yatish, A.J., Kavitha,M.P., Menezes, M., Choudhury, D.R., Suresh,S., Ghosh, N., Saravana, R., Chandran, S.,
8/10/2019 Using the Reactome Database
16/16