Evaluating a hypothesis and its claims against experimental data is an essential scientific activity. However, this task is increasingly challenging given the ever-growing volume of publications and data sets. Towards addressing this challenge, we previously developed HyQue, a system for hypothesis formulation and evaluation. HyQue uses domain-specific rulesets to evaluate hypotheses based on well-understood scientific principles. However, because scientists may apply differing scientific premises when exploring a hypothesis, flexibility is required in both crafting and executing rulesets to evaluate hypotheses. Here, we report on an extension of HyQue that incorporates rules specified using the SPARQL Inferencing Notation (SPIN). Hypotheses, background knowledge, queries, results and now rulesets are represented and executed using Semantic Web technologies, enabling users to explicitly trace a hypothesis to its evaluation as Linked Data, including the data and rules used by HyQue. We demonstrate the use of HyQue to evaluate hypotheses concerning the yeast galactose gene system.
Evaluating scientific hypotheses using the SPARQL Inferencing Notation
Alison Callahan and Michel Dumontier
Department of Biology, Carleton University
ESWC2012::HyQue-SPIN
Uncovering all the evidence to support/refute a hypothesis is becoming increasingly difficult
and requires a lot of digging around
Continuous growth in research outputs
Source: http://www.nlm.nih.gov/bsd/stats/cit_added.html
http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=151
Semantic Web technologies for biological knowledge management and discovery
• Capability to publish, link, retrieve and query de-centralized data
• A powerful integrative platform across data, ontology and services
• Formal knowledge representation allows for automated reasoning
• Massive growth in dataset availability, and soon, in application development
A rapidly growing web of linked data
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
Bio2RDF covers the major biological databases
BioPortal gives up-to-date access to bio-ontologies
Mark Wilkinson, UBC
Michel Dumontier, Carleton University
Christopher Baker, UNB
The Semantic Automated Discovery and Integration (SADI) framework makes it easy to create Semantic Web Services using OWL classes as service inputs and outputs
http://sadiframework.org
~700 bioinformatic services as of May 29, 2012
SADI provides access to Semantic Web Services
HyQue
HyQue is the Hypothesis query and evaluation system
• A platform for knowledge discovery
• Facilitates hypothesis formulation and evaluation
• Leverages Semantic Web technologies to provide access to facts, expert knowledge and web services
• Conforms to a simplified event-based model
• Supports evaluation against positive and negative findings
• Transparent and reproducible evidence prioritization
• Provenance across all elements of hypothesis testing
– trace a hypothesis to its evaluation, including the data and rules used
Callahan A, Dumontier M, Shah NH. HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
HyQue
• Background knowledge as OWL ontologies
hypotheses (HO), processes/events (GO), measurement values (SIO), units (UO), evidence (ECO), molecules (ChEBI), biopolymers (SO), etc
• Facts as RDF data
model organism data - genes and their chromosomal location, proteins and their functions, localization and participation in interactions, complexes, pathways, biological processes, etc
• Evaluation rules defined using SPIN
Domain-specific rules - scores based on external knowledge
System rules - scores based on hypothesis structure
Callahan A, Dumontier M. Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012.
HyQue Architecture
A HyQue hypothesis is a collection of propositions
• proposition: “a statement expressing something true or false”
• HyQue propositions specify events
• complex propositions can be formulated using logical operators (AND, OR, XOR…) or decomposed using component relations
HyQue hypothesis ≡ ‘proposition’ that ‘specifies’ only ‘event’
HyQue hypothesis ≡ ‘proposition’ that ‘has component part’ only (‘proposition’ that ‘specifies’ only ‘event’)
Event-based data model
HyQue events denote a phenomenon involving two objects: ‘agent’ and ‘target’. In addition, we can specify the context of this event (e.g. located in nucleus, or under some genetic background).
Event
‘has agent’ agent
‘has target’ target
‘is located in’ location
‘is negated’ boolean
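As a rough illustration of the event model above, the four relations can be pictured as fields of a small record. This is only a sketch in Python, not HyQue's implementation (HyQue represents events in RDF); the class name, field names and the `describe` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical in-memory mirror of a HyQue event (illustration only)."""
    event_type: str                  # e.g. "gene induction"
    agent: str                       # 'has agent'
    target: str                      # 'has target'
    location: Optional[str] = None   # 'is located in' (optional context)
    negated: bool = False            # 'is negated'

    def describe(self) -> str:
        # Render the event as a short human-readable string.
        relation = ("NOT " if self.negated else "") + self.event_type
        text = f"{self.agent} --[{relation}]--> {self.target}"
        if self.location is not None:
            text += f" (located in {self.location})"
        return text


e1 = Event("gene induction", agent="Gal4p", target="GAL1", location="nucleus")
print(e1.describe())
```

The optional `location` field reflects that context is something an event may, but need not, specify.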
Currently supported events
1. protein-protein binding
2. protein-nucleic acid binding
3. molecular activation
4. molecular inhibition
5. gene induction
6. gene repression
7. transport
Example Hypothesis
• HyQue’s demonstrative knowledge base is focused on galactose metabolism and regulation.
The paper describes a union of hypotheses:
(Gal4p induces the expression of GAL1 AND
Gal4p induces the expression of GAL7 AND
Gal3p induces the expression of GAL2)
OR
(Gal4p induces the expression of GAL7 AND
Gal80p induces the expression of GAL7 AND
Gal80p does not inhibit the activity of Gal4p
WHEN GAL3 is over-expressed)
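To make the nesting of this example explicit, the hypothesis can be written as a small tree of logical operators over event statements. The following is a minimal Python sketch for illustration only; the `Compound` class and both helper functions are hypothetical, and the WHEN clause is simply kept inside the text of the last statement.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Compound:
    """Hypothetical node joining sub-propositions with a logical operator."""
    operator: str                          # "AND" or "OR"
    parts: List[Union["Compound", str]]    # nested nodes or event statements


def render(node) -> str:
    # Turn the tree back into a parenthesised expression.
    if isinstance(node, str):
        return node
    return "(" + f" {node.operator} ".join(render(p) for p in node.parts) + ")"


def leaf_events(node) -> List[str]:
    # Collect the event statements in left-to-right order.
    if isinstance(node, str):
        return [node]
    return [leaf for part in node.parts for leaf in leaf_events(part)]


hypothesis = Compound("OR", [
    Compound("AND", [
        "Gal4p induces the expression of GAL1",
        "Gal4p induces the expression of GAL7",
        "Gal3p induces the expression of GAL2",
    ]),
    Compound("AND", [
        "Gal4p induces the expression of GAL7",
        "Gal80p induces the expression of GAL7",
        "Gal80p does not inhibit the activity of Gal4p WHEN GAL3 is over-expressed",
    ]),
])

print(render(hypothesis))
print(len(leaf_events(hypothesis)), "event statements")
```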
User Interface with auto-completion
http://hyque.semanticscience.org
Users don’t need to know RDF to formulate hypotheses
Hypothesis RDF Representation
[Diagram: hypothesis ‘has component part’ proposition; proposition ‘specifies’ event]
:h rdf:type hyque:Hypothesis ;
hyque:has-component-part :p1 .
:p1 rdf:type hyque:Proposition ;
hyque:specifies :e1 .
:e1 rdf:type hyque:Event .
Event RDF representation
event: Gal4p positively regulates the expression of GAL1
:e1 rdf:type hyque:Event ;
# positive regulation of gene expression
rdf:type <http://bio2rdf.org/go:0010628> ;
hyque:agent <http://bio2rdf.org/sgd:Gal4p> ;
hyque:target <http://bio2rdf.org/sgd:GAL1> ;
hyque:is_negated "0" ;
….
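Because users formulate hypotheses without writing RDF, something has to turn a form entry into triples like the ones above. The sketch below shows one way that could look in plain Python; it is an assumption for illustration, not HyQue's code. The function name is hypothetical, prefix declarations are omitted because the slides do not give the `hyque:` namespace, and encoding a negated event as "1" is a guess (the slides only show "0").

```python
def event_to_turtle(subject: str, go_class: str, agent: str, target: str,
                    negated: bool = False) -> str:
    """Emit Turtle-style triples for one event (prefixes assumed declared elsewhere)."""
    # Assumption: a negated event is written as "1"; the slides only show "0".
    flag = "1" if negated else "0"
    lines = [
        f"{subject} rdf:type hyque:Event ;",
        f"    rdf:type <{go_class}> ;",
        f"    hyque:agent <{agent}> ;",
        f"    hyque:target <{target}> ;",
        f'    hyque:is_negated "{flag}" .',
    ]
    return "\n".join(lines)


turtle = event_to_turtle(
    ":e1",
    "http://bio2rdf.org/go:0010628",   # positive regulation of gene expression
    "http://bio2rdf.org/sgd:Gal4p",
    "http://bio2rdf.org/sgd:GAL1",
)
print(turtle)
```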
HyQue’s SPIN rules retrieve event data, and then score it and the overall hypothesis
HyQue currently contains 63 SPIN rules to evaluate hypotheses: 18 system rules and 45 domain-specific rules
Combination of system and domain rules to retrieve and score data, and add new triples
:e1 a go:0010628 ;
hyque:agent sgd:Gal4p ;
hyque:target sgd:GAL1 ;
hyque:is_negated "0" .
Event - induction SPIN induction rule
SPIN System Rule : Link Hypothesis to Evaluation
CONSTRUCT {
  ?this ‘has attribute’ ?hypothesisEval .
  ?hypothesisEval a ‘evaluation’ .
  ?hypothesisEval ‘obtained from’ ?propositionEval .
  ?hypothesisEval ‘has value’ ?hypothesisEvalScore .
}
WHERE {
  ?this ‘has component part’ ?proposition .
  ?proposition ‘has attribute’ ?propositionEval .
  BIND(:calculateHypothesisScore(?this) AS ?hypothesisEvalScore) .
  BIND(IRI(fn:concat(afn:namespace(?this), afn:localname(?this), "_", "evaluation")) AS ?hypothesisEval) .
}
SPIN Domain Rule: Score experimental evidence of Gene Expression Induction Event
SELECT ?induceEventScore
WHERE {
  BIND (:calculateInduceAgentTypeScore(?arg1) AS ?agentTypeScore) .
  BIND (:calculateInduceAgentFunctionTypeScore(?arg1) AS ?agentFunctionTypeScore) .
  BIND (:calculateInduceTargetTypeScore(?arg1) AS ?targetTypeScore) .
  BIND (:calculateInduceLogicalOperatorScore(?arg1) AS ?logicalOperatorScore) .
  BIND (:calculateInduceEventLocationScore(?arg1) AS ?eventLocationScore) .
  BIND (:penalizeNegation(?arg1) AS ?negationScore) .
  BIND (5 AS ?maxScore) .
  BIND (((((((?agentTypeScore + ?agentFunctionTypeScore) + ?targetTypeScore) + ?logicalOperatorScore) + ?eventLocationScore) + ?negationScore) / ?maxScore) AS ?induceEventScore) .
}
HyQue domain rules CALCULATE a quantitative measure of evidence for an event
‘induce’ rule (maximum score: 5):
– Is event negated?
• If yes, subtract 2
– Is event of type ‘induce’?
• If yes, add 1; if no, subtract 1
– Is agent of type ‘protein’ or ‘RNA’?
• If yes, add 1; if of type ‘gene’, subtract 1
– Is target of type ‘gene’?
• If yes, add 1; if no, subtract 1
– Does agent have known ‘transcription factor activity’?
• If yes, add 1
– Is event located in the ‘nucleus’?
• If yes, add 1; if no, subtract 1
Ontology terms referenced in the rule diagram: GO:0010628, CHEBI:36080, SO:0000236, GO:0003700, GO:0005634
SPIN rule, outcome and score for a GAL gene induction event
4/5 = 0.80
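The arithmetic of the ‘induce’ rule can be sketched in a few lines of Python. This is an illustrative re-statement of the checklist, not the SPIN rule itself; the function name and parameters are hypothetical, and the slides do not say which criterion was unmet in the 4/5 example, so the second call below is just one combination that yields the same value.

```python
def score_induce_event(*, negated: bool, is_induce: bool, agent_type: str,
                       target_is_gene: bool, agent_has_tf_activity: bool,
                       in_nucleus: bool, max_score: int = 5) -> float:
    """Apply the 'induce' checklist and normalise by the maximum score."""
    score = 0
    if negated:                              # negated events are penalised
        score -= 2
    score += 1 if is_induce else -1          # event of type 'induce'?
    if agent_type in ("protein", "RNA"):     # agent type check
        score += 1
    elif agent_type == "gene":
        score -= 1
    score += 1 if target_is_gene else -1     # target of type 'gene'?
    if agent_has_tf_activity:                # known transcription factor activity
        score += 1
    score += 1 if in_nucleus else -1         # event located in the nucleus?
    return score / max_score


# Every criterion met: 5/5.
full = score_induce_event(negated=False, is_induce=True, agent_type="protein",
                          target_is_gene=True, agent_has_tf_activity=True,
                          in_nucleus=True)

# Hypothetical case: no known transcription factor activity, so 4/5.
partial = score_induce_event(negated=False, is_induce=True, agent_type="protein",
                             target_is_gene=True, agent_has_tf_activity=False,
                             in_nucleus=True)
print(full, partial)
```

Missing evidence for a "subtract 1" criterion costs two points relative to the best case (no gain plus a penalty), which is why adding a criterion can lower a score when the data is absent.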
Can customize rules to get more evidence, but at a cost if not found
• calculateInhibitEventScore does not take into account the physical location of the event participants
• Experimental evidence suggests that physical location in the context of an inhibition event is important
• Inhibition of Gal4p activity by Gal80p is known to take place in the nucleus, yet this inhibition is interrupted when Gal80p is bound by Gal3p, which is typically found in the cytoplasm
Adding a new rule to consider location weakens the event score due to lack of data (0.87 -> 0.78)
(Gal4p induces the expression of GAL1 [e1] AND
Gal3p induces the expression of GAL2 [e2] AND
Gal4p induces the expression of GAL7)
OR
(Gal4p induces the expression of GAL7 AND
Gal80p induces the expression of GAL7 AND
Gal80p does not inhibit the activity of Gal4p
WHEN GAL3 is over-expressed)
Customization of rules and rulesets can generate different evidence-based evaluations
Reproducible eScience LOD for Hypothesis, Rules, Data and Evaluation
Summary
• HyQue is a system that facilitates the formulation and evaluation of scientific hypotheses against formalized knowledge on the Semantic Web.
• This work focused on the development and incorporation of recursive SPIN rules to obtain and score events and multi-event hypotheses using OWL ontologies and RDF-based LOD.
Future Directions
• Collaborative, end user-centered environment to engineer, share, compare and evaluate hypotheses
• Investigate alternative scoring systems
• Structure knowledge beyond the GAL network
– EU/US Collaborations on disease-centered research hypotheses
– Applications for clinical decision support
Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier