1
Developing Semantic Pathway Alignment Algorithms for Systems Biology
Jonas Gamalielsson, 2006-09-06
2
Systems Biology
  Study the behaviour of complex biological systems
  Consider the interaction of all cellular/molecular parts
  Often studied over time
  Goal: develop models for system understanding
  Powerful computational tools are required
  Organisation: "Life's complexity pyramid" (Oltvai & Barabasi, 2002)
[Figure: pyramid with levels labelled "genes, mRNA, proteins & metabolites"; "regulatory motifs & metabolic pathways"; "higher order networks, revealing functional modules"; "functional modules, which are hierarchically clustered"]
3
Thesis aim
To develop semantic pathway alignment algorithms for systems biology
Three related algorithms
  GOTEM (GO-based regulatory TEMplates)
  GOSAP (GO-based Semantic Alignment of biological Pathways)
  EGOSAP (Evolutionary GO-based Semantic Alignment of biological Pathways)
4
Gene Ontology (GO)
[Figure: the Gene_Ontology root with its three sub-graphs. Function sub-graph: molecular_function (catalytic activity, transporter activity). Process sub-graph: biological_process (cellular process, development). Component sub-graph: cellular_component (cell, extracellular). Gene products G1-G4 are attached to terms in each sub-graph. Gx = gene product]
5
Semantic similarity
EXAMPLE: Resnik (1995)
  ms(A,B) = D
  p_ms(A,B) = p(D) = 0.10
  SS(A,B) = -log2(0.10) = 3.32
[Figure: is-a graph with nodes D (p=0.10), B and C (p=0.03 and p=0.07) and A (p=0.01), connected by is-a edges. Note: not all nodes are shown in the graph.]
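The Resnik example can be reproduced with a short sketch. The edge layout of the graph and the assignment of p=0.03 and p=0.07 to B and C are assumptions, since the slide figure is not fully recoverable; the function names are illustrative only.

```python
import math

# Assumed is-a edges (child -> parents); the exact shape of the slide graph is a guess.
PARENTS = {"A": ["C"], "B": ["D"], "C": ["D"], "D": []}
# Term probabilities from the slide; which of B and C has 0.03 or 0.07 is assumed.
P = {"A": 0.01, "B": 0.03, "C": 0.07, "D": 0.10}

def ancestors(term):
    """Return the term itself plus every term reachable through is-a edges."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def minimum_subsumer(a, b):
    """Common ancestor of a and b with the lowest probability."""
    common = ancestors(a) & ancestors(b)
    return min(common, key=P.get)

def resnik(a, b):
    """Semantic similarity: information content of the minimum subsumer."""
    return -math.log2(P[minimum_subsumer(a, b)])

print(minimum_subsumer("A", "B"), round(resnik("A", "B"), 2))
```

Under these assumptions the minimum subsumer of A and B is D, and the similarity is the information content of D.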
6
GOTEM: background 1(2)
  Highly desirable to derive gene regulatory networks using gene expression data
  Reverse engineering (RE) algorithms derive a model (set of rules) that fits the data
    Examples: Boolean networks, neural networks, Bayesian networks
  Limitations of RE algorithms
    Many derived model networks can fit the same data
    Few derived networks are actually biologically feasible
    RE algorithms do not distinguish between biologically plausible and implausible networks. Reduce search space?
7
GOTEM: background 2(2)
  We propose GOTEM: GO-based regulatory TEMplates [1,2]
  Contribution
    Means to distinguish between plausible and implausible networks
    GOTEM generalises knowledge about gene products using the molecular function part of Gene Ontology
    Binary semantic templates encoding general knowledge of regulation are derived from documented pathways and used to assess the biological plausibility of regulatory hypotheses
[1] Gamalielsson, J., Olsson, B., Nilsson, P. (2005). A Gene Ontology based Method for Assessing the Biological Plausibility of Regulatory Hypotheses. Technical report HS-IKI-TR-05-004, University of Skövde, Sweden.
[2] Gamalielsson, J., Nilsson, P., Olsson, B. (2006). A GO-based Method for Assessing the Biological Plausibility of Regulatory Hypotheses. In Proceedings of the 2nd International Workshop on Bioinformatics Research and Applications (IWBRA 2006), Reading, Great Britain (May 2006).
8
GOTEM
[Figure: GOTEM workflow.
  Data/information: GO, annotation databases, enriched GO graph, model pathway databases, binary relations, templates, regulatory hypotheses, scored & ranked hypotheses.
  Method/algorithm: GO term probability calculation, extract binary pathway relations, template generation, hypothesis assessment.]
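The "GO term probability calculation" step is not specified on the slide. A minimal sketch, assuming the common convention that a term's probability is the fraction of gene products annotated to it or to any of its descendants; the tiny graph, the gene product names and the function name are all hypothetical.

```python
def term_probabilities(parents, annotations):
    """Enrich a GO graph with term probabilities.

    parents: term -> list of parent terms (is-a edges).
    annotations: gene product -> set of directly annotated terms.
    Assumption: p(term) = share of gene products annotated to the term
    or to one of its descendants.
    """
    counts = {term: 0 for term in parents}
    for terms in annotations.values():
        covered = set()
        for term in terms:
            stack = [term]
            while stack:
                current = stack.pop()
                if current not in covered:
                    covered.add(current)
                    stack.extend(parents[current])
        # Each gene product counts once per term it reaches.
        for term in covered:
            counts[term] += 1
    total = len(annotations)
    return {term: counts[term] / total for term in counts}

# Hypothetical miniature function sub-graph and annotation set.
parents = {
    "molecular_function": [],
    "binding": ["molecular_function"],
    "DNA binding": ["binding"],
    "catalytic activity": ["molecular_function"],
}
annotations = {
    "G1": {"DNA binding"},
    "G2": {"binding"},
    "G3": {"catalytic activity"},
    "G4": {"DNA binding", "catalytic activity"},
}
probabilities = term_probabilities(parents, annotations)
```

With this convention the root term always receives probability 1, so it carries no information, while more specific terms receive lower probabilities.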
9
GOTEM: example
Generation
  Binary relations:
    RAD24 [act] MEC3
    SWI4 [expr] CLN1
    SWI4 [expr] CLN2
    SWI6 [expr] CLN1
    SWI6 [expr] CLN2
    CLN1 [phos] SIC1
    CLN2 [phos] SIC1
    CDC28 [phos] SIC1
    ..
  Templates:
    T1: GO:0003689 [act] GO:0003677
    T2: GO:0003689 [act] GO:0003676
    T3: GO:0003689 [act] GO:0005488
    T4: GO:0003689 [act] GO:0003674
    T5: GO:0003677 [act] GO:0003677
    T6: GO:0003677 [act] GO:0003676
    T7: GO:0003677 [act] GO:0005488
    ..
  GO-score(Tx) = -log2((p(GOID_LHS) + p(GOID_RHS)) / 2)
Assessment
  Hypotheses:
    RAD24 [?] MEC3
    CLN1 [?] SWI4
    MBP1 [?] CLN2
    ...
  Matching templates:
    TM1: GO:0003689 [act] GO:0003677 (GO-score=6.80)
    ..
    TM1: GO:0003674 [exp] GO:0003674 (GO-score=0)
    ..
    TM1: GO:0003700 [exp] GO:0016538 (GO-score=5.88)
    ..
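The GO-score formula and the pattern of templates T1-T7 can be sketched as follows. The formula is taken from the slide; the assumption that templates pair every term in the generalisation chain of the left-hand gene product with every term in the chain of the right-hand one is only inferred from the listed templates, and the chains below are truncated illustrations.

```python
import math
from itertools import product

def go_score(p_lhs, p_rhs):
    """GO-score of a template: -log2 of the mean probability of its two GO terms."""
    return -math.log2((p_lhs + p_rhs) / 2)

def generate_templates(lhs_chain, relation, rhs_chain):
    """Pair every term in lhs_chain with every term in rhs_chain (assumed scheme)."""
    return [(lhs, relation, rhs) for lhs, rhs in product(lhs_chain, rhs_chain)]

# Assumed generalisation chains: annotated term first, then increasingly general terms.
lhs_chain = ["GO:0003689", "GO:0003677"]
rhs_chain = ["GO:0003677", "GO:0003676", "GO:0005488", "GO:0003674"]
templates = generate_templates(lhs_chain, "act", rhs_chain)

for lhs, relation, rhs in templates[:3]:
    print(lhs, f"[{relation}]", rhs)
```

A template built from two root terms (probability 1 each) scores 0, which matches the GO-score=0 entry on the slide: the most general template says nothing specific about the interaction.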
10
GOTEM: results
  Test
    Templates created from KEGG S. cerevisiae cell cycle
    Reverse engineered hypotheses from microarray gene expression data
    Assess how well templates can separate true positive interactions from false positive ones
  Results
    Method can filter out a large proportion of implausible hypotheses
    Hence, improves specificity of network reconstruction
11
GOSAP: background 1(2)
  Large base of biological pathways
  Need for pathway analysis methods:
    Inter-species comparisons
    Intra-species comparisons
    Assess hypothetical pathways
  Limitations of related work
    Previous efforts on metabolic pathways
    Little work on approximate matching by semantic similarity
    EC hierarchy used before, which only covers the molecular function of enzymes
12
GOSAP: background 2(2)
  We propose GOSAP: GO-based Semantic Alignment of biological Pathways [3,4]
  Contribution
    GO has not been used before for semantic pathway alignment
    GOSAP generalises about any kind of gene product using GO, not only enzymes
    Richer semantic description of gene products by combining the function, process and component ontologies of GO in similarity calculations
[3] Gamalielsson, J., Olsson, B. (2005). GOSAP: Gene Ontology Based Semantic Alignment of Biological Pathways. Technical report HS-IKI-TR-05-005, University of Skövde, Sweden.
[4] Gamalielsson, J., Olsson, B. (200x). GOSAP: GO-based Semantic Alignment of Biological Pathways. Manuscript in preparation.
13
GOSAP
[Figure: GOSAP workflow.
  Data/information: GO graph, organism annotation databases, enriched GO graph, model pathway database, model paths, query pathway database, query paths, parameter settings, scored & ranked path alignments.
  Procedure/algorithm: GO term probability calculation, extraction of super-paths, path alignment.]
14
GOSAP: example
Path extraction
  Only super-paths, extracted by depth-first based algorithm, e.g.
    1. SWI4 [e]> CLN1
    2. SWI4 [e]> CLN2 [p]> SIC1
    3. MBP1 [e]> CLB5 [p]> CDC6
    ..
Path alignment
  Query paths
    1. SWI4 [?]> CLN2
    2. MBP1 [?]> CLN1 [?]> CDC6
  Model paths
    1. SWI4 [e]> CLN1
    2. SWI4 [e]> CLN2 [p]> SIC1
    3. MBP1 [e]> CLB5 [p]> CDC6
  Query paths are aligned against model paths
Example alignment
  Q: FAR1 ?> SIC1 (GAP) CLN2 ?> SIC1
  M: FAR1 i> CLN1 p> SWI6 e> CLN2 p> SIC1
  F: GO:0004861 > GO:0019207 (GAP) GO:0016538 > GO:0019210
  P: GO:0007050|GO:0045786 > GO:0000079 (GAP) GO:0000320|GO:0000321 > GO:0000079
  C: GO:0005634 > GO:0005634 (GAP) GO:0005634 > GO:0005634
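The path extraction step can be illustrated with a depth-first sketch. It assumes that a super-path is a path that starts at a node without incoming edges and cannot be extended further; the slide only says that super-paths are extracted by a depth-first based algorithm, so this definition, the function name and the small edge list are assumptions.

```python
def extract_super_paths(edges):
    """Enumerate maximal paths by depth-first search.

    edges: list of (source, relation, target) triples.
    Assumption: paths start at nodes without incoming edges, so a pathway
    consisting only of cycles would yield no paths in this sketch.
    """
    successors = {}
    targets = set()
    for source, relation, target in edges:
        successors.setdefault(source, []).append((relation, target))
        targets.add(target)
    starts = [node for node in successors if node not in targets]
    paths = []

    def dfs(node, path, visited):
        extended = False
        for relation, nxt in successors.get(node, []):
            if nxt not in visited:  # avoid walking around cycles
                extended = True
                dfs(nxt, path + [relation, nxt], visited | {nxt})
        if not extended:
            paths.append(path)  # the path cannot grow any further

    for start in starts:
        dfs(start, [start], {start})
    return paths

# Hypothetical edge list chosen to mirror the example paths.
edges = [
    ("SWI4", "e", "CLN1"),
    ("SWI4", "e", "CLN2"),
    ("CLN2", "p", "SIC1"),
    ("MBP1", "e", "CLB5"),
    ("CLB5", "p", "CDC6"),
]
super_paths = extract_super_paths(edges)
```

Restricting extraction to maximal paths keeps the model path set small, because every shorter path is already contained in one of the super-paths.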
15
GOSAP: results
  Test
    Model pathways: KEGG S. cerevisiae cell cycle, metabolic pathways
    Query pathways: Reverse engineered (RE) regulatory pathways, KEGG MAPK, metabolic pathways
    Assess if GOSAP can find significant alignments of biological interest
  Results
    Method is able to
      detect significant alignments between RE paths and model paths and between different metabolic pathways
      suggest missing gene products in query paths
    Combined ontologies resulted in significant alignments when molecular function alone did not
16
EGOSAP: background 1(2)
  Large base of biological pathways and microarray gene expression data
  Sometimes only hypothetical sets of gene products are known
  Highly desirable to derive interactions between gene products
  Limitations of related work
    Previous efforts merely map genes onto known pathways by identity
    No work on approximate matching by semantic similarity
    Related methods do not attempt to assemble hypothetical paths using a query set of gene products
17
EGOSAP: background 2(2)
  We propose EGOSAP: Evolutionary GO-based Semantic Alignment of biological Pathways [5]
  Contribution
    GO has not been used before for semantic pathway alignment
    GOSAP generalises about any kind of gene product using GO, not only enzymes
    Richer semantic description of gene products by combining the function, process and component ontologies of GO in similarity calculations
    Hypothetical paths are assembled using an evolutionary algorithm and a query set of gene products
[5] Gamalielsson, J., Corne, D. W., Olsson, B. (200x). EGOSAP: Evolutionary Gene Ontology Based Semantic Alignment of Biological Pathways. Manuscript in preparation.
18
EGOSAP
[Figure: EGOSAP workflow.
  Data/information: GO graph, organism annotation databases, enriched GO graph, model pathway database, model paths, query set of gene products, parameter settings, path alignments.
  Procedure/algorithm: GO term probability calculation, path extraction, evolution of path alignments.]
19
EGOSAP: example
Evolutionary algorithm
  t ← 0
  initialise P(t)
  evaluate P(t)
  while (not term-cond) do
  begin
    t ← t+1
    select P(t) from P(t-1)
    alter P(t)
    evaluate P(t)
    apply elitism to P(t)
  end
P(t): a set of gene product permutations initialised from query alphabet
evaluate: calculate fitness, i.e. semantic similarity score between the model path and each evolved path in P(t)
select: tournament selection
alter: partially mapped crossover, mutation
Example alignment (fitness=0.73, p=0.01):
  Query (mouse): MEF2C > NR2F6 > NRBF2 > AFG3L2 > TRIM28
  Model (yeast): SWI6 > SWI4 > NDD1 > ACE2 > SFL1
  Function: GO:0003713 > GO:0003700 > GO:0016563 > GO:0008237 > GO:0016564
  Process: GO:0006366 > GO:0007049 > GO:0006357 > GO:0006508 > GO:0000122
  Component: GO:0005634 > GO:0005634 > GO:0005634 > GO:0016021 > GO:0005694
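The evolutionary loop can be sketched in runnable form. Tournament selection, partially mapped crossover (PMX), mutation and elitism follow the slide; the swap mutation, all parameter values and the toy fitness (position matches against a fixed order, standing in for the semantic similarity score) are assumptions made only for illustration.

```python
import random

def pmx(p1, p2, a, b):
    """Partially mapped crossover: copy p1[a:b], fill the rest from p2."""
    child = [None] * len(p1)
    child[a:b] = p1[a:b]
    segment = set(p1[a:b])
    for i in list(range(a)) + list(range(b, len(p1))):
        gene = p2[i]
        # Follow the p1 -> p2 mapping until the gene no longer collides.
        while gene in segment:
            gene = p2[p1.index(gene)]
        child[i] = gene
    return child

def evolve(alphabet, fitness, pop_size=30, generations=40,
           tournament=3, mutation_rate=0.2, seed=0):
    """Evolve permutations of the query alphabet; returns the best one found."""
    rng = random.Random(seed)
    n = len(alphabet)
    population = [rng.sample(alphabet, n) for _ in range(pop_size)]
    best = max(population, key=fitness)

    def select():
        # Tournament selection: best of a few randomly drawn individuals.
        return max(rng.sample(population, tournament), key=fitness)

    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            a, b = sorted(rng.sample(range(n + 1), 2))
            child = pmx(select(), select(), a, b)
            if rng.random() < mutation_rate:
                i, j = rng.sample(range(n), 2)  # swap mutation (assumed)
                child[i], child[j] = child[j], child[i]
            offspring.append(child)
        offspring[0] = best  # elitism: the best individual always survives
        population = offspring
        best = max(population, key=fitness)
    return best

# Toy stand-in for the semantic similarity between an evolved path and a model path.
model_order = ["g1", "g2", "g3", "g4", "g5", "g6"]

def toy_fitness(path):
    return sum(x == y for x, y in zip(path, model_order)) / len(model_order)

best = evolve(model_order, toy_fitness)
print(best, toy_fitness(best))
```

PMX and swap mutation both preserve the permutation property, so every evolved path uses each query gene product exactly once, and elitism guarantees that the best fitness never decreases between generations.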
20
EGOSAP: results
  Test
    Model pathway: S. cerevisiae regulatory chain motifs
    Query set: Differentially expressed genes for transgenic and knock-out mice
    Assess if EGOSAP can evolve significant alignments (of biological interest)
  Results
    Method is able to detect significant alignments between evolved paths and model paths
    As for GOSAP, combined ontologies resulted in significant alignments when molecular function alone did not
21
Conclusions
  Three methods for semantic analysis of biological pathways are developed
  Methods
    assess biological plausibility of derived pathways
    compare different pathways for semantic similarities
    evolve hypothetical pathways similar to model pathways
  Methods are novel
  Methods are believed to be useful to biologists
22
Write-up schedule
  September 2006
    Thesis contributions, thesis skeleton, set of chapters, draft of GOSAP paper
  October 2006
    Submission GOSAP paper, redrafts of earlier material, set of new chapters, draft of EGOSAP paper
  November 2006
    Submission EGOSAP paper, nearly complete thesis draft
  December 2006 - February 2007
    Continual refinement of thesis
  March 2007
    Submission of thesis