4/1/2008 OWL-ED 2008, Gaithersburg, MD
OWL: PAX of Mind or the AX?
Experiences of Using OWL in the Development of BioPAX
Joanne Luciano (1) & Robert Stevens (2)
(1) Harvard Medical School, (2) Manchester University
OWL-ED DC, April 1-2, 2008
Gaithersburg, MD, USA
BioPAX
The Vision
– Integrate biological process data of different types
The Reality
– An abstraction of the different types of processes that enables them to co-exist
– A controlled vocabulary for them
What went wrong?
– No real interest in using OWL features or reasoners
– No real examples of why using them would be of any value
The domain: Biological pathways
Main categories:
• Metabolic Pathways
• Molecular Interaction Networks
• Signaling Pathways
A few technical factors
• the complexity of the language and its syntax
• the open world assumption was foreign to people
• the logical framework was unfamiliar
• the steep learning curve
• the lack of tutorials and examples
• the lack of tools of any quality
• the general lack of experience (new language)
• the BioPAX community did not have a coherent set of requirements
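The open world assumption named in the list above can be illustrated with a toy sketch in Python. This is not OWL and not a reasoner. The facts, the property name participatesIn, and both query functions are hypothetical, and exist only to contrast the two readings of a statement that was never asserted.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical asserted facts (illustration only, not BioPAX data).
FACTS: Set[Triple] = {("glucose", "participatesIn", "glycolysis")}
# Hypothetical statements explicitly asserted to be false.
NEGATED: Set[Triple] = {("glucose", "participatesIn", "apoptosis")}


def closed_world(query: Triple) -> bool:
    """Database-style reading: anything not asserted is false."""
    return query in FACTS


def open_world(query: Triple) -> Optional[bool]:
    """Open-world reading: a missing statement is unknown (None), not false."""
    if query in FACTS:
        return True
    if query in NEGATED:
        return False
    return None


unknown = ("pyruvate", "participatesIn", "glycolysis")
print(closed_world(unknown))  # False
print(open_world(unknown))    # None
```

People used to databases and XML Schema tend to expect the first behaviour. OWL semantics follow the second, which is one plausible reason the assumption felt foreign.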
A few social factors
• there was disagreement (two camps, OWL and XML Schema)
• OWL was not seen as necessary by all members of the community and it required considerably more work
• there were existing known methods
• mentality: do just enough to get the job at hand done
• human nature: to resist the new or unknown
Why bother?
• Much basic scientific research produces pathway data
– environmental research, energy research, genetic and clinical research, and virtually all of life science research today
– At some point, the question is asked “What pathways are involved?”
• Therefore, it is important to provide a mechanism for access to and reuse of these data
– enable it to have broad impact for science
• The major problem for researchers who use pathway databases has been that the representations of pathway data within these resources are not consistent or interchangeable
At the conceptual level
• In signaling pathways, it is the activation or inhibition of a process (apoptosis)
• In metabolic pathways, a series of chemical reactions transforms a chemical molecule (glucose → pyruvate)
At the syntax level
• HumanCyc’s term: D-glucose-6-phosphate
• KEGG’s term: D-Glucose-6P
It is clear we are referring to the same molecule, i.e. the same real-world class of instances.
The vocabulary label used to name these instances differs, and while this difference is insignificant for a human reader, it is significant for computational processing.
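A minimal Python sketch of why the label difference matters computationally. The two labels are the ones quoted on this slide. The shared identifier example:glucose_6_phosphate, the lookup table, and the function same_molecule are hypothetical, and are not part of HumanCyc, KEGG, or BioPAX.

```python
from typing import Dict, Tuple

humancyc_label = "D-glucose-6-phosphate"
kegg_label = "D-Glucose-6P"

# Naive string comparison treats the two labels as different things,
# and ignoring case does not help.
labels_match = humancyc_label == kegg_label
casefold_match = humancyc_label.casefold() == kegg_label.casefold()

# Hypothetical mapping from (source, label) to a shared identifier.
SHARED_ID: Dict[Tuple[str, str], str] = {
    ("HumanCyc", "D-glucose-6-phosphate"): "example:glucose_6_phosphate",
    ("KEGG", "D-Glucose-6P"): "example:glucose_6_phosphate",
}


def same_molecule(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """Two source terms match only if both map to the same shared identifier."""
    id_a = SHARED_ID.get(a)
    id_b = SHARED_ID.get(b)
    return id_a is not None and id_a == id_b


print(labels_match, casefold_match)  # False False
print(same_molecule(("HumanCyc", humancyc_label), ("KEGG", kegg_label)))  # True
```

The point of the sketch is only that agreement has to be stated explicitly somewhere. A program cannot recover it from the labels themselves.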
Reasons for choosing OWL-DL
• OWL’s Expressivity
• Future uses: Enable reasoning
Reasons for choosing OWL-DL
• Future uses: Enable reasoning (whatever that means)
– Future (not us, not now)
– Reasoning
• By choosing OWL-DL it would “be enabled”
– (didn’t really think much about this)
Mistakes in using OWL (nothing new here)
• Bad Conceptualizations
– confusion about what was being represented
• biological processes or database records of biological processes
– Utility class, a concept used in Java, not in biology, was a
• Poor Understanding of OWL
– Assumed axioms were disjoint
– Domain and range
– Open world assumptions and implications
– Semantics in comments rather than in the ontology
– What was said in the ontology was not what was meant
• OWL as an EXPORT File Format
• For more details see Luciano and Stevens (2007)
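Two of the misunderstandings listed above, domain and range and assumed disjointness, can be sketched with a toy model in Python. This is not an OWL reasoner. The class names, the property hasParticipant, and both functions are hypothetical. The sketch also reads the "assumed disjoint" bullet as referring to classes, which is an interpretation rather than something the slide states.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical domain axiom: the subject of hasParticipant is an Interaction.
DOMAIN: Dict[str, str] = {"hasParticipant": "Interaction"}


def infer_types(
    asserted: Dict[str, Set[str]],
    triples: Iterable[Tuple[str, str, str]],
) -> Dict[str, Set[str]]:
    """Treat domain axioms as inference rules, not as validation constraints:
    the subject of a property is inferred to belong to its domain class."""
    types = {name: set(classes) for name, classes in asserted.items()}
    for subject, prop, _obj in triples:
        if prop in DOMAIN:
            types.setdefault(subject, set()).add(DOMAIN[prop])
    return types


def is_inconsistent(
    types: Dict[str, Set[str]],
    disjoint_pairs: Set[Tuple[str, str]],
) -> bool:
    """An individual is a problem only if it belongs to two classes that
    were explicitly declared disjoint."""
    return any(
        a in classes and b in classes
        for classes in types.values()
        for a, b in disjoint_pairs
    )


asserted = {"glucose": {"SmallMolecule"}}
# A modelling mistake: a molecule used as the subject of hasParticipant.
triples = [("glucose", "hasParticipant", "atp")]
types = infer_types(asserted, triples)

print(sorted(types["glucose"]))  # ['Interaction', 'SmallMolecule']
print(is_inconsistent(types, set()))  # False
print(is_inconsistent(types, {("SmallMolecule", "Interaction")}))  # True
```

Without the explicit disjointness declaration, the mistaken statement is silently accepted and the molecule is also classified as an interaction. This is one way that what is said in an ontology can differ from what was meant.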
Social Factors
• The BioPathways Consortium enabled Chris Sander by obtaining a commitment for funding from the Dept of Energy
• Chris in turn funded 2 people to get the initiative organized (The DEF Group)
• An initial group of stakeholders decided to organize a “core group” for decisions and hold meetings “by-invitation-only”
Social Factors
• Ignorance
– of the task
– of how to achieve it
• Internal biases…
– Which tool
• first there were none
• then there were (promises)
• then there were none
– XML Schema vs. OWL
Social Factors
• Understanding of OWL increased
– Knowledge (papers, tutorials) became increasingly available
– Tools for OWL became available
• Discovery of mistakes made
• However, remember “future”
– Pressure to release
• Breakdown – in-fighting, undermining, ugliness (more mistakes!)
What Went Right
• Helped the Semantic Web community spread the word about OWL by having a user to point to
• Community outreach helped BioPAX adoption
• BioPAX brought the wider community together
• Created the higher-level abstraction that included generalized concepts common to the different “pathway” conceptualizations
– Upper level ontology for pathways
Looking ahead
Current Starting Point
• multiple OWL syntaxes
• multiple tools
• support materials
• methodologies for development
Looking ahead
• NEED tools to support developers:
– analyze the semantic complexity needed to support use cases
– facilitate development in a staged process with increasing complexity at each stage
– support basic requirements first: controlled vocabularies, taxonomies (XML data exchange), then interoperability (SBML/BioPAX)
– then support richer semantics enabling integration, inference, and possibly integrated or in-line rules
Conclusions and Lessons Learnt
OUCH, that hurt, don’t do that again
• Assess
– complexity of the use cases
– needs of the community
– capability of the language (and its limitations)
– tools available
• Process
– support subsequent levels of complexity on sound foundation (O??-Foundry)
• Evaluate
– correct (specification)
– complete/comprehensive (concepts and detail)
– utility/effectiveness (use cases)