1
Annotation for Gene Expression Analysis with Reactome.db Package
Utah State University – Spring 2012
STAT 6570: Statistical Bioinformatics
Cody Tramp
2
References
Ligtenberg W. 2011. Reactome.db: How to use the reactome.db package.
www.reactome.org
3
Reactome.db Overview
“Open source, open access, manually curated, and peer-reviewed pathway database” – www.reactome.org
Reactome.db is an R interface that allows queries to the SQL database containing pathway information
Contains functions for converting between annotation IDs and names for GO, Entrez, and Reactome
4
Getting Help on Specific Reactome.db Functions
#Load the Reactome.db package
library(reactome.db)

#Check for main manual pages
?reactome.db #This won't get the actual manual

#List all reactome.db objects
ls("package:reactome.db")
# [1] "reactome"              "reactome_dbconn"       "reactome_dbfile"
# [4] "reactome_dbInfo"       "reactome_dbschema"     "reactomeEXTID2PATHID"
# [7] "reactomeGO2REACTOMEID" "reactomeMAPCOUNTS"     "reactomePATHID2EXTID"
#[10] "reactomePATHID2NAME"   "reactomePATHNAME2ID"   "reactomeREACTOMEID2GO"

#Look up specific manual for an object
?reactome_dbInfo #Still not very useful - poor documentation
5
How IDs and names are stored in Reactome.db
The reactome.db package links to a SQL database
Functions are interfaces to the database
SQL databases are relational databases (think of Excel spreadsheets, but better)
Data is stored as key:value pairs

Key    Value
15869  Homo sapiens: Metabolism of nucleotides
68616  Homo sapiens: Assembly of the ORC complex at the origin of replication
68827  Homo sapiens: CDC6 association with the ORC:origin complex
68867  Homo sapiens: CDT1 association with the CDC6:ORC:origin complex
68874  Homo sapiens: Assembly of the pre-replicative complex
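The key:value idea can be sketched in plain R without the package. The small data frame below is a hand-typed stand-in for a mapping table (the first three rows of the table above), not output from reactome.db, and the helper name lookup_value is hypothetical.

```r
# Toy stand-in for a key:value mapping table (rows copied by hand from the slide)
kv <- data.frame(
  key   = c("15869", "68616", "68827"),
  value = c("Homo sapiens: Metabolism of nucleotides",
            "Homo sapiens: Assembly of the ORC complex at the origin of replication",
            "Homo sapiens: CDC6 association with the ORC:origin complex"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the value(s) stored under one key
lookup_value <- function(tbl, key) {
  tbl$value[tbl$key == key]
}

lookup_value(kv, "15869")
```

A key that is not in the table simply returns a zero-length character vector.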
6
Reactome.db Function Uses (NOTE: all return a key:value list)
Converting Between Entrez and Reactome
reactomeEXTID2PATHID = Entrez ID to Reactome.db ID
reactomePATHID2EXTID = Reactome.db ID to Entrez ID

> xx <- toTable(reactomeEXTID2PATHID)
> head(xx)
  reactome_id gene_id
1      168253   10898
2      168254   10898
3      168253    8106
4      168254    8106
5      168253    5610
6      168254    5610

Use toTable() rather than the as.list() shown in the manual
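One practical reason to prefer the long table: it can be subset directly, and it can still be regrouped into a named list when that shape is wanted. A minimal sketch, using a hand-typed copy of the six head(xx) rows above rather than a live query (so it runs without the package); storing the IDs as character strings is an assumption.

```r
# Hand-typed copy of the six rows shown above (not a live query)
xx <- data.frame(
  reactome_id = c("168253", "168254", "168253", "168254", "168253", "168254"),
  gene_id     = c("10898", "10898", "8106", "8106", "5610", "5610"),
  stringsAsFactors = FALSE
)

# All pathway IDs listed for Entrez gene 8106
paths_8106 <- xx$reactome_id[xx$gene_id == "8106"]

# Regroup the long table into a named list: one element per gene
by_gene <- split(xx$reactome_id, xx$gene_id)
by_gene[["5610"]]
```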
7
Reactome.db Function Uses (NOTE: all return a key:value list)
Converting between GO ID and Reactome ID
reactomeREACTOMEID2GO = Reactome.db ID to GO IDs
reactomeGO2REACTOMEID = GO ID to Reactome.db ID

> xx <- toTable(reactomeGO2REACTOMEID)
> head(xx)
  reactome_id go_id
1      168276 GO:0019054
2      168276 GO:0019048
3      168276 GO:0044068
4      168276 GO:0022415
5      168276 GO:0051701
6      168276 GO:0044003
8
Reactome.db Function Uses (NOTE: all return a key:value list)
Retrieving Pathway Names from Reactome IDs
reactomePATHNAME2ID = Reactome.db Name to Reactome.db ID
reactomePATHID2NAME = Reactome.db ID to Reactome.db Name

> xx <- toTable(reactomePATHID2NAME)
> head(xx)
  reactome_id path_name
1       15869 Homo sapiens: Metabolism of nucleotides
2       68616 Homo sapiens: Assembly of the ORC complex at the origin of replication
3       68689 Homo sapiens: CDC6 association with the ORC:origin complex
4       68827 Homo sapiens: CDT1 association with the CDC6:ORC:origin complex
5       68867 Homo sapiens: Assembly of the pre-replicative complex
6       68874 Homo sapiens: M/G1 Transition
9
Reactome.db Function Uses (NOTE: all return a key:value list)
reactomeMAPCOUNTS = shows number of rows in each function’s relational database (not very useful unless error checking)

> xx <- as.list(reactomeMAPCOUNTS)
> xx
$reactomeEXTID2PATHID
[1] 28363

$reactomeGO2REACTOMEID
[1] 3217

$reactomePATHID2EXTID
[1] 8320

$reactomePATHID2NAME
[1] 13778

$reactomePATHNAME2ID
[1] 13876

$reactomeREACTOMEID2GO
[1] 47575
10
Ex: Find apoptosis induction-related ID (compare to Notes 6.1 slide 10)
# Get data.frame summarizing all reactome.db pathways including a certain string
xx <- toTable(reactomePATHNAME2ID)
all.pathways <- xx$path_name # get name of each reactome.db pathway
t <- grep('apoptosis', all.pathways) # get index where pathway name includes the search string
#use agrep() for approximate term searching
reactome.Term <- unlist(all.pathways[t])
reactome.IDs <- unlist(xx$reactome_id[t])
reactome.frame <- data.frame(reactome.ID=reactome.IDs, reactome.Term=reactome.Term)
rownames(reactome.frame) <- 1:length(reactome.IDs)
reactome.frame # 13 terms
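The same kind of summary frame can also be built by subsetting the table rows directly with grepl(). This is a sketch on a toy table: the IDs and the two "toy apoptosis" names are invented for illustration and are not reactome.db content.

```r
# Toy stand-in for toTable(reactomePATHNAME2ID); IDs and "toy" names are invented
xx <- data.frame(
  reactome_id = c("1", "2", "3", "4"),
  path_name   = c("Homo sapiens: Metabolism of nucleotides",
                  "Homo sapiens: toy apoptosis pathway A",
                  "Homo sapiens: M/G1 Transition",
                  "Homo sapiens: Toy Apoptosis pathway B"),
  stringsAsFactors = FALSE
)

# Keep rows whose name contains the string (case-sensitive, like grep() above)
hits <- xx[grepl("apoptosis", xx$path_name), ]
rownames(hits) <- NULL  # renumber rows from 1

# ignore.case = TRUE also keeps names that capitalize the word
hits_any_case <- xx[grepl("apoptosis", xx$path_name, ignore.case = TRUE), ]
```

Whether case matters for the real pathway names is worth checking before trusting a count such as the 13 terms above.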
11
Ex: Find apoptosis induction-related ID (compare to Notes 6.1 slide 10)
12
Ex. Pathway Term Search Function
##Define Function to search for pathways with given key word
##agrep.bool is indicator to use agrep (TRUE) or grep (FALSE)
searchPathways2REACTOMEID <- function(term, agrep.bool) {
  xx <- toTable(reactomePATHNAME2ID)
  all.pathways <- xx$path_name # get name of each reactome.db pathway
  #get index where Term is found
  if (agrep.bool) {
    t <- agrep(term, all.pathways)
  } else {
    t <- grep(term, all.pathways)
  }
  unlist(xx$reactome_id[t])
}

apop.IDs <- searchPathways2REACTOMEID("apoptosis", FALSE)
length(apop.IDs) #13 pathways matched

apop.IDs <- searchPathways2REACTOMEID("apoptosis", TRUE)
length(apop.IDs) #85 pathways matched
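A larger count from agrep() is expected, because it accepts near matches as well as exact ones (by default within a maximum distance of 10% of the pattern length, rounded up, which allows one edit for a 9-letter pattern). A toy illustration on three hand-typed strings, not on reactome.db data:

```r
# Hand-typed example names (not pulled from reactome.db)
names_toy <- c("Induction of apoptosis",
               "Apoptosis",
               "Metabolism of nucleotides")

# Exact, case-sensitive substring match
exact_idx <- grep("apoptosis", names_toy)

# Approximate match: also accepts strings within one edit of the pattern
approx_idx <- agrep("apoptosis", names_toy)

names_toy[setdiff(approx_idx, exact_idx)]  # names found only by agrep()
```

Inspecting the names found only by agrep() is a quick way to judge whether the extra hits are wanted.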
13
Getting GO Terms from single Reactome ID
##Get List of GO Terms from Reactome ID
xx <- toTable(reactomeGO2REACTOMEID)
t <- xx$reactome_id == "15869"
GOTerms <- xx$go_id[t]

> GOTerms
 [1] "GO:0055086" "GO:0006139" "GO:0044281"
 [4] "GO:0034641" "GO:0044238" "GO:0008152"
 [7] "GO:0006807" "GO:0044237" "GO:0008150"
[10] "GO:0009987"

> xx <- toTable(reactomeGO2REACTOMEID)
> head(xx)
  reactome_id go_id
1      168276 GO:0019054
2      168276 GO:0019048
3      168276 GO:0044068
4      168276 GO:0022415
5      168276 GO:0051701
6      168276 GO:0044003
14
Getting GO Terms from list of Reactome IDs
##Define Function to get all GO Terms for all Reactome IDs in a list
getGOTerms <- function(list_reactome) {
  listGO = list()
  xx <- toTable(reactomeGO2REACTOMEID)
  for(i in seq_along(list_reactome)) { # seq_along() is safe for an empty list
    t <- xx$reactome_id==list_reactome[i]
    temp_list = xx$go_id[t]
    listGO = c(listGO, temp_list)
  }
  unlist(listGO)
}

GOTerms.all <- getGOTerms(apop.IDs) #apop.IDs from slide 12 (grep version; same 13 IDs as slide 10)
length(GOTerms.all) #136 GO Terms from 13 apop.IDs
Should have yielded 169 terms (Notes 4.1 slide 10) – reactome.db might not be complete
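The loop in getGOTerms() can also be written as one vectorized subset with %in%. This is a sketch on a toy table with obviously fake IDs ("R1", "GO:a", and so on); the name getGOTermsVec and its second argument are hypothetical, added so the example runs without the package.

```r
# Toy stand-in for toTable(reactomeGO2REACTOMEID); all IDs are fake
xx <- data.frame(
  reactome_id = c("R1", "R1", "R2", "R2", "R3"),
  go_id       = c("GO:a", "GO:b", "GO:b", "GO:c", "GO:d"),
  stringsAsFactors = FALSE
)

# Hypothetical vectorized variant: GO IDs of every row whose pathway is in ids
getGOTermsVec <- function(ids, tbl) {
  tbl$go_id[tbl$reactome_id %in% ids]
}

all_terms    <- getGOTermsVec(c("R1", "R2"), xx)  # keeps repeated GO IDs
unique_terms <- unique(all_terms)                 # one entry per GO ID
```

Because a GO ID can appear under more than one pathway, counts with and without unique() can differ, which is worth keeping in mind when comparing term counts between sources.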
15
Reactome.org Online Tools
16
Pathway Viewer on reactome.org
http://www.reactome.org/userguide/Usersguide.html#Introduction
17
Pathway Viewer on reactome.org Details Panel
18
Pathway Viewer on reactome.org
http://www.reactome.org/entitylevelview/PathwayBrowser.html#DB=gk_current&FOCUS_SPECIES_ID=48887&FOCUS_PATHWAY_ID=71387&ID=76213&VID=3422142
19
Reactome Pathway Symbols
Upregulation and participating proteins
Inhibition
http://www.reactome.org/entitylevelview/PathwayBrowser.html#DB=gk_current&FOCUS_SPECIES_ID=48887&FOCUS_PATHWAY_ID=71387&ID=76213&VID=3422142
20
Reactome Database Assignment Method
Genes seem to be assigned to pathways in a similar manner to the GO database
If a gene is up-regulated, it is included
Genes that are down-regulated in a condition are NOT mapped to the condition/pathway
Haven’t received official response from reactome.org, but from general browsing this seems to be the case
21
Pathway Analysis Tool
http://www.reactome.org/ReactomeGWT/entrypoint.html#PathwayAnalysisDataUploadPage
22
Pathway Analysis Tool
http://www.reactome.org/ReactomeGWT/entrypoint.html#PathwayAnalysisDataUploadPage
23
Expression Set Data Analysis
24
Expression Set Data Analysis
25
Summary
Reactome.db provides an interface to the SQL database containing IDs
Functions for converting between ID types
No functionality for gene testing through R
Online tools include pathway maps and ID lookup tables
Some limited expression testing (with unknown statistical methods)
26
Questions?