Foundations I: Methodologies, Knowledge Representation

Deborah McGuinness and Joanne Luciano

CSCI/ITEC-6962-01

Week 2, September 13, 2010



Review of reading Assignment 1

• Ontologies 101, Semantic Web, e-Science, RDFS, OWL guide

• Any comments, questions?


Contents

• Review of methodologies

• Elements of KR in semantic web context

• And in e-Science

• Choices of representation, models

• Examples of KR

• Encoding and understanding representations

• Assignment 1


Semantic Web Methodology and Technology Development Process

• Establish and improve a well-defined methodology vision for Semantic Technology-based application development

• Leverage controlled vocabularies, etc.

[Figure: development process diagram with the stages Use Case; Small Team, mixed skills; Analysis; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Use Tools; Science/Expert Review & Iteration; Develop model/ontology; Evaluation]


KR and methodologies

• Procedural Knowledge: Knowledge is encoded in functions/procedures.

This can be viewed as hard-coded and therefore less flexible.

E.g.: function Person(X) return boolean is

if (X = "Socrates") or (X = "Hillary")

then return true else return false;

OR

function Mortal(X) return boolean is return Person(X);

• Networks: A compromise between declarative and procedural schemes. Knowledge is represented in a labeled, directed graph whose nodes represent concepts and entities, while its arcs represent relationships between these entities and concepts.
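A labeled, directed graph of this kind can be stored as (node, arc label, node) edges and queried by following arcs. The minimal Python sketch below reuses the Socrates/Hillary individuals from the procedural example; the arc labels instance-of, subclass-of, and teacher-of are assumptions chosen for illustration, not part of any particular network formalism.

```python
from collections import defaultdict

# Labeled, directed arcs: (from_node, label, to_node).
# Labels and nodes are illustrative only.
EDGES = [
    ("Socrates", "instance-of", "Person"),
    ("Hillary", "instance-of", "Person"),
    ("Person", "subclass-of", "Mortal"),
    ("Socrates", "teacher-of", "Plato"),
]


def build_index(edges):
    """Map each node to the list of its outgoing (label, target) arcs."""
    index = defaultdict(list)
    for source, label, target in edges:
        index[source].append((label, target))
    return index


def follow(index, start, labels):
    """Return all nodes reachable from start using only arcs whose label is in labels."""
    reached, stack = set(), [start]
    while stack:
        node = stack.pop()
        for label, target in index.get(node, []):
            if label in labels and target not in reached:
                reached.add(target)
                stack.append(target)
    return reached


index = build_index(EDGES)
# Socrates -instance-of-> Person -subclass-of-> Mortal
print(sorted(follow(index, "Socrates", {"instance-of", "subclass-of"})))  # ['Mortal', 'Person']
```

Restricting which labels may be followed is what lets the same graph answer different questions: following only teacher-of from Socrates reaches Plato and nothing else.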


KR and methodologies

• Frames: Much like a semantic network except each node represents prototypical concepts and/or situations. Each node has several property slots whose values may be specified or inherited.

• Logic: A way of declaratively representing knowledge. For example:

– person(Socrates).

– person(Hillary).

– forall X [person(X) ---> mortal(X)]

– Description Logics (DL), First-Order Logic (FOL), Higher-Order Logic (HOL)
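In contrast to the procedural Person/Mortal functions, the declarative version keeps the facts and the rule as data and lets one generic procedure derive the consequences. The Python sketch below is a naive forward chainer for rules of the single-premise form forall X [p(X) ---> q(X)]; the tuple encoding of facts and rules is an assumption made for illustration, not a logic programming system.

```python
# Facts are (predicate, individual) pairs, e.g. person(Socrates).
FACTS = {("person", "Socrates"), ("person", "Hillary")}

# Rules of the form forall X [premise(X) ---> conclusion(X)].
RULES = [("person", "mortal")]


def forward_chain(facts, rules):
    """Apply the rules until no new fact can be derived (a fixpoint)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, individual in list(derived):
                new_fact = (conclusion, individual)
                if predicate == premise and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


closure = forward_chain(FACTS, RULES)
print(("mortal", "Socrates") in closure)  # True

# Adding knowledge means adding a fact, not editing a function body.
print(("mortal", "Plato") in forward_chain(FACTS | {("person", "Plato")}, RULES))  # True
```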


KR and methodologies

• Decision Trees: Concepts are organized in the form of a tree.

• Statistical Knowledge: The use of certainty factors, Bayesian Networks, Dempster-Shafer Theory, Fuzzy Logics, etc.

• Rules: The use of Production Systems to encode condition-action rules (as in expert systems).
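The recognize-act cycle of a production system can be sketched in a few lines. This is a minimal Python illustration, assuming facts are plain strings, each rule is a (name, conditions, additions) triple, and conflicts are resolved by firing the first applicable rule in list order; the toy bird rules are invented for the example, and real expert-system shells use far more elaborate matching (for example the Rete algorithm).

```python
# Each rule: (name, facts that must all be present, facts the action adds).
# Toy rules invented for illustration.
RULES = [
    ("r1", {"has feathers"}, {"is bird"}),
    ("r2", {"is bird", "swims"}, {"may be penguin"}),
]


def run(initial_memory, rules, max_cycles=100):
    """Recognize-act cycle: in each cycle, fire the first rule whose conditions
    are all in working memory and whose action would add something new.
    Stop when no rule can fire (or after max_cycles as a safety net)."""
    memory = set(initial_memory)
    fired = []
    for _ in range(max_cycles):
        for name, conditions, additions in rules:
            if conditions <= memory and not additions <= memory:
                memory |= additions
                fired.append(name)
                break
        else:
            break  # no rule fired in this cycle: stop
    return memory, fired


memory, fired = run({"has feathers", "swims"}, RULES)
print(fired)  # ['r1', 'r2']
```

Rule r2 can only fire after r1 has added "is bird", which is the chaining behavior that condition-action rules provide.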


KR and methodologies

• Parallel Distributed Processing: The use of connectionist models.

• Subsumption Architectures: Behaviors are encoded (represented) using layers of simple (numeric) finite-state machine elements.

• Hybrid Schemes: Any representation formalism employing a combination of KR schemes.


Remember, in science!

• Some knowledge is lost when it is placed into any particular representation structure, or it may not be reusable (e.g., frames)

• So you may ask something that cannot be answered or inferred

• Knowledge evolves, i.e., it changes

• Knowledge and understanding are very often context-dependent (and discipline-, language-, and skill-level-dependent, and …)


And, if you are used to logic

• You are working mostly within the world of logic, whereas we are trying to represent knowledge with logic, and we are usually dealing with tangible objects such as trees, clouds, rocks, and storms.

• Because of this, we have to be very careful when translating real things into logical symbols; this can, surprisingly, be a difficult challenge.

• Consider your method of representation (yes, we do want to compute with it).


Thus

• A person who wants to encode knowledge needs to decouple the ambiguities of interpretation from the mathematical certainty of (any form of) logic.

• The nature of interpretation is critical in formal knowledge representation and is carefully formalized by KR scientists in order to guarantee that no ambiguity exists in the logical structure of the represented knowledge.


Representing Knowledge With Objects

• Take all individuals that we need to keep track of and place them into different buckets based on how similar they are to each other. Each bucket is given a descriptive name based on the objects it contains.

• Since the individuals in a given bucket are at least somewhat similar, we can avoid describing every inconsequential detail about each individual. Instead, properties that are common to all individuals in a bucket can be assigned to the entire bucket at once. Properties are typically either primitive values (such as numbers or text strings) or references to other buckets.


Representing Knowledge With Objects

• Some buckets will be more similar to each other than others, and we can arrange the buckets into a hierarchy based on that similarity.

• If all buckets in a branch of the tree of buckets share a property, the information can be further simplified by assigning the property only to the parent bucket. The other buckets (and individuals) are said to inherit that property.

• Buckets go by different names, e.g., classes, frames, or nodes

• BUT once we move to, e.g., description logics (DL), not all object-oriented rules apply; for example, inherited properties cannot be overridden

• Multiple inheritance is not always obvious to people
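The bucket idea, together with the DL caveat that inherited properties are not overridden, can be sketched with plain dictionaries. In this minimal Python illustration the bucket names reuse instrument classes from later in the lecture, while the property names (hasOperatingMode, hasFocalLength) and their values are assumptions. A bucket may have several parents; a property asserted at any level applies to everything below it, and a child that asserts a different value adds a second constraint instead of replacing the first, roughly as statements accumulate in a description logic.

```python
# Hypothetical buckets: parent buckets plus properties asserted directly on each.
BUCKETS = {
    "Instrument": {"parents": [], "props": {"hasOperatingMode": "InstrumentOperatingMode"}},
    "OpticalInstrument": {"parents": ["Instrument"], "props": {"hasFocalLength": "Length"}},
    "Interferometer": {"parents": ["OpticalInstrument"], "props": {}},
    "FabryPerot": {"parents": ["Interferometer"], "props": {}},
}


def ancestors(name):
    """The bucket itself plus every bucket above it (several parents allowed)."""
    found, stack = [], [name]
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(BUCKETS[current]["parents"])
    return found


def effective_props(name):
    """Collect every value asserted for each property anywhere above the bucket.

    Values accumulate rather than override: what is said about a parent still
    holds for the child, so a child's own value is an additional constraint.
    """
    props = {}
    for bucket in ancestors(name):
        for prop, value in BUCKETS[bucket]["props"].items():
            props.setdefault(prop, set()).add(value)
    return props


print(effective_props("FabryPerot"))
```

An object-oriented programmer might expect a subclass's own hasFocalLength to hide the inherited one; here both values remain attached, which is the behavior the slide warns about.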


Re-enter Semantic Web

At its core, the Semantic Web can be thought of as a methodology for linking up pieces of structured and unstructured information into commonly shared description logic ontologies.


Semantic Web Layers

http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/


Elements of KR in Semantic Web

• Declarative Knowledge

• Statements as triples: {subject-predicate-object}

interferometer is-a optical instrument
Fabry-Perot is-a interferometer
Optical instrument has focal length
Optical instrument is-a instrument
Instrument has instrument operating mode
Instrument has measured parameter
Instrument operating mode has measured parameter
NeutralTemperature is-a temperature
Temperature is-a parameter

• A query: select all optical instruments which have operating mode vertical

• An inference: infer operating modes for a Fabry-Perot Interferometer which measures neutral temperature
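The triples, the query, and the inference above can be prototyped without any RDF tooling. The Python sketch below transcribes the slide's triples as tuples (names CamelCased), computes the transitive closure of is-a, and answers the "optical instruments with vertical mode" query. The instance data (fpi_1, fpi_2, radar_1) and the property names type and operatingMode are invented for illustration; a real system would use an RDF store and RDFS/OWL reasoning rather than these hand-written helpers.

```python
# Schema-level triples transcribed from the slide.
SCHEMA = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("FabryPerot", "is-a", "Interferometer"),
    ("OpticalInstrument", "has", "FocalLength"),
    ("OpticalInstrument", "is-a", "Instrument"),
    ("Instrument", "has", "InstrumentOperatingMode"),
    ("Instrument", "has", "MeasuredParameter"),
    ("InstrumentOperatingMode", "has", "MeasuredParameter"),
    ("NeutralTemperature", "is-a", "Temperature"),
    ("Temperature", "is-a", "Parameter"),
}

# Hypothetical instance data, invented for illustration only.
INSTANCES = {
    ("fpi_1", "type", "FabryPerot"),
    ("fpi_1", "operatingMode", "vertical"),
    ("fpi_2", "type", "FabryPerot"),
    ("fpi_2", "operatingMode", "non-vertical"),
    ("radar_1", "type", "Instrument"),
    ("radar_1", "operatingMode", "vertical"),
}


def superclasses(cls):
    """Every class reachable from cls along is-a links (transitive closure)."""
    found, frontier = set(), {cls}
    while frontier:
        step = {o for (s, p, o) in SCHEMA if p == "is-a" and s in frontier}
        frontier = step - found
        found |= step
    return found


def inherited_has(cls):
    """Objects of 'has' statements on cls or on any of its superclasses."""
    classes = {cls} | superclasses(cls)
    return {o for (s, p, o) in SCHEMA if p == "has" and s in classes}


def instances_of(cls):
    """Individuals typed as cls or as any subclass of cls."""
    return {s for (s, p, o) in INSTANCES
            if p == "type" and (o == cls or cls in superclasses(o))}


def optical_instruments_with_mode(mode):
    """The query from the slide: optical instruments with a given operating mode."""
    return {i for i in instances_of("OpticalInstrument")
            if (i, "operatingMode", mode) in INSTANCES}


print(sorted(superclasses("FabryPerot")))
print(sorted(inherited_has("FabryPerot")))
print(optical_instruments_with_mode("vertical"))
```

Here radar_1 is excluded even though its mode is vertical, because Instrument is not below OpticalInstrument; fpi_1 is found only because is-a is followed transitively from FabryPerot. The same closure shows that NeutralTemperature is a Parameter, so it can serve as a measured parameter of a Fabry-Perot operating mode.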


Ontology Spectrum

[Figure: ontology spectrum, from Catalog/ID, Terms/glossary, Thesauri (“narrower term” relation), Informal is-a, Formal is-a, Formal instance, Frames (properties), Value Restrs., Selected Logical Constraints (disjointness, inverse, …), to General Logical constraints]

Originally from the AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, and Welty; updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html


OWL or RDF or OWL 2 RL?

• In representing knowledge you will need to balance expressivity with implementability

• OWL 1 (Lite, DL, Full) or OWL 2?

• RDF and RDFS

• Rules, e.g., SWRL or OWL 2 RL

• You will need to consider the sources of your knowledge

• You will need to consider what you want to do with the represented knowledge


The knowledge base

• Using, re-using, re-purposing, extending, subsetting

• Approach:
– Bottom-up (instance level or vocabularies)
– Top-down (upper-level or foundational)
– Mid-level (use case)

• Coding and testing (understanding)

• Using tools (some this class, more over the next two classes)

• Iterating (later)

• Maintaining and evolving (curation, preservation) (later)


‘Collecting’ the ‘data’

• Part of the (meta)data information is present in tools ... but thrown away at output; e.g., a business chart can be generated by a tool: it ‘knows’ the structure, the classification, etc. of the chart, but usually this information is lost. Storing it in web data would be easy!

• Semantic Web-aware tools are around (even if you do not know it...), though more would be good:
– Photoshop CS stores metadata in RDF in, say, jpg files (using XMP)
– RSS 1.0 feeds are generated by (almost) all blogging systems (a huge amount of RDF data!)

• Scraping: different tools, services, etc. come around every day:
– get RDF data associated with images, for example a service to get RDF from flickr images
– a service to get RDF from XMP
– XSLT scripts to retrieve microformat data from XHTML files
– RSS scraping in use in Virtual Observatory projects in Japan
– scripts to convert spreadsheets to RDF

• SQL: a huge amount of data is in relational databases
– Although tools exist, it is not feasible to convert that data into RDF
– Instead, SQL ⇋ RDF ‘bridges’ are being developed: a query to RDF data is transformed into SQL on-the-fly
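A "script to convert spreadsheets to RDF" can be as small as mapping each row to a subject and each column to a predicate. The Python sketch below uses only the standard library and emits N-Triples; the example.org base IRIs, the key-column convention, and the choice to write every cell as a plain string literal are assumptions for illustration, not a recommended mapping.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base IRIs (example.org); substitute a real vocabulary in practice.
BASE = "http://example.org/instrument/"
VOCAB = "http://example.org/vocab#"


def literal(value):
    """Render a plain N-Triples string literal with backslash, quote, LF, CR escaped."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def rows_to_ntriples(csv_text, key_column):
    """One subject per row (from key_column), one predicate per other column."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{quote(row[key_column])}>"
        for column, value in row.items():
            if column == key_column or not value:
                continue  # the key becomes the subject; empty cells are skipped
            predicate = f"<{VOCAB}{quote(column)}>"
            lines.append(f"{subject} {predicate} {literal(value)} .")
    return lines


SAMPLE = "id,name,observatory\nfpi_1,Fabry-Perot,Millstone Hill\n"
for triple in rows_to_ntriples(SAMPLE, "id"):
    print(triple)
```

Typing every cell as a string keeps the sketch short; a fuller converter would map numeric and date columns to XSD datatypes and reuse existing vocabulary terms instead of minting one predicate per column.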


More Collecting

• RDFa (formerly known as RDF/A) extends XHTML by:
– extending the link and meta elements to include child elements
– adding metadata to any element (a bit like the class attribute in microformats, but via dedicated properties)

• It is very similar to microformats, but with more rigor:
– it is a general framework (instead of an “agreement” on the meaning of, say, a class attribute value)
– terminologies can be mixed more easily

• GRDDL - Gleaning Resource Descriptions from Dialects of Languages

• Atom - XML-based Web content and metadata syndication format (an alternative to RSS)


Foundational Ontologies

Domain-independent concepts and relations: physical object, process, event, …, participates, …

(Usually) rigorously defined: formal logic, philosophical principles, highly structured

Examples:

DOLCE – Descriptive Ontology for Linguistic and Cognitive Engineering

SUMO – Suggested Upper Merged Ontology

CYC Upper Level Ontology

BFO – Basic Formal Ontology

GFO – General Formal Ontology (developed by Onto-Med)


Foundational Ontologies

PURPOSE: help integrate domain ontologies

[Figure: Geophysics, Marine, Water, Planetary, Geology, Struc, and Rock ontologies integrated under a single foundational ontology: “…and then there was one…”]

Courtesy: Boyan Brodaric


Foundational Ontologies

PURPOSE: help organize domain ontologies

“…a place for everything, and everything in its place…”

[Figure: domain terms (shale, rock, formation, lithification) placed within a foundational ontology]

Courtesy: Boyan Brodaric


Problem scenario

Little work has been done on linking foundational ontologies with geoscience ontologies

Such linkage might benefit various scenarios requiring cross-disciplinary knowledge, e.g.:

water budgets: groundwater (geology) and surface water (hydro)

hazard risk: hazard potential (geology, geophysics) and items at threat (infrastructure, people, environment, economic)

health: toxic substances (geochemistry) and people, wildlife

many others…

Courtesy: Boyan Brodaric


DOLCE - Descriptive Ontology for Linguistic and Cognitive Engineering


SUMO - Suggested Upper Merged Ontology

• Physical
  • Object
    • SelfConnectedObject
      • ContinuousObject
      • CorpuscularObject
    • Collection
  • Process
• Abstract
  • SetClass
    • Relation
  • Proposition
  • Quantity
    • Number
    • PhysicalQuantity
  • Attribute


• http://www.ifomis.org/Research/IFOMISReports/IFOMIS%20Report%2005_2003.pdf


BFO – Basic Formal Ontology

Snap comes from a snapshot at any given time


Span comes from spanning time; sometimes considered a 4D description


Using SNAP/SPAN


SWEET 2.0 Modular Design

[Figure: modules layered from theoretical to applied (Math, Time, Space; Basic Science; Geoscience Processes; Geophysical Phenomena; Applications), connected by importation]

• Supports easy extension by domain specialists

• Organized by subject (theoretical to applied)

• Reorganization of classes, but no significant changes to content

• Importation is unidirectional
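One way to keep importation unidirectional in a layered design like this is to give each module a layer number and flag any import that points toward a more applied layer. The Python sketch below assumes hypothetical module names and imports (they are not SWEET's actual files); only the five-layer ordering comes from the slide.

```python
# Layer order from the slide, most theoretical first.
LAYERS = ["Math, Time, Space", "Basic Science", "Geoscience Processes",
          "Geophysical Phenomena", "Applications"]

# Hypothetical modules and imports; the names are invented, not SWEET's files.
MODULE_LAYER = {
    "math_time_space": 0,
    "basic_physics": 1,
    "atmospheric_processes": 2,
    "atmospheric_phenomena": 3,
    "upper_atmosphere_app": 4,
}
IMPORTS = {
    "basic_physics": ["math_time_space"],
    "atmospheric_processes": ["basic_physics", "math_time_space"],
    "atmospheric_phenomena": ["atmospheric_processes"],
    "upper_atmosphere_app": ["atmospheric_phenomena"],
}


def upward_imports(module_layer, imports):
    """Return (importer, imported) pairs where a module imports from a more
    applied layer, which would break one-way importation."""
    return [
        (importer, imported)
        for importer, targets in imports.items()
        for imported in targets
        if module_layer[imported] > module_layer[importer]
    ]


for name, layer in sorted(MODULE_LAYER.items(), key=lambda item: item[1]):
    print(f"{LAYERS[layer]:<22} {name}")
print(upward_imports(MODULE_LAYER, IMPORTS))  # []
```

Because every import points downward, domain specialists can add an application module without touching the layers beneath it, which is what makes extension easy.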


SWEET 2.0 Ontologies


Using SWEET

• Plug in (import) detailed domain modules

• Lots of classes, few relations (properties)

• Version 2.0 is re-usable and extensible


Mix-n-Match

• The hybrid example:

– Collect a lot of different ontologies representing different terms, levels of concepts, etc. into a base form: RDF


Mid-Level: Developing ontologies

• Use cases and small team (7-8: 2-3 domain experts, 2 knowledge experts, 1 software engineer, 1 facilitator, 1 scribe)

• Identify classes and properties (leverage controlled vocabularies)
– Start with narrower terms; generalize when needed or possible
– Adopt a suitable conceptual decomposition (e.g., SWEET)
– Import modules when concepts are orthogonal

• Review, vet, publish

• Only code them (in RDF or OWL) when needed (CMAP, …)

• Ontologies: small and modular


Use Case example

• Plot the neutral temperature from the Millstone Hill Fabry-Perot, operating in the non-vertical mode during January 2000, as a time series.

• Objects:
– Neutral temperature is a (temperature is a) parameter
– Millstone Hill is a (ground-based observatory is an) observatory
– Fabry-Perot is an interferometer is an optical instrument is an instrument
– Non-vertical mode is an instrument operating mode
– January 2000 is a date-time range
– Time is an independent variable/coordinate
– Time series is a data plot is a data product


Class and property example

• Parameter
– Has coordinates (independent variables)

• Observatory
– Operates instruments

• Instrument
– Has operating mode

• Instrument operating mode
– Has measured parameters

• Date-time interval

• Data product


Higher level use case

• Find data which represents the state of the neutral atmosphere above 100 km, toward the Arctic Circle, at any time of high geomagnetic activity


Extending the KR for a purpose

Input

Physical properties: State of neutral atmosphere

Spatial:

• Above 100 km

• Toward the Arctic Circle (above 45N)

Conditions:

• High geomagnetic activity

Action: Return Data

Specification needed for query to CEDARWEB

Instrument

Parameter(s)

Operating Mode

Observatory

Date/time

Return-type: data

GeoMagneticActivity has ProxyRepresentation

GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)

Kp is a GeophysicalIndex hasTemporalDomain: “daily”

hasHighThreshold: xsd_number = 8

Date/time when Kp >= 8
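Once the use case is translated into a specification, the remaining step is a filter: keep data above 100 km and above 45N on dates when the daily Kp reaches the high threshold of 8. The Python sketch below assumes a hypothetical record layout and invented daily Kp values (not real observations), and it does not model CEDARWEB's actual query interface.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class QuerySpec:
    min_altitude_km: float   # "Above 100km"
    min_latitude_deg: float  # "Toward arctic circle (above 45N)"
    kp_threshold: float      # Kp hasHighThreshold = 8


@dataclass(frozen=True)
class Record:
    # Hypothetical record layout for a single measurement.
    parameter: str
    altitude_km: float
    latitude_deg: float
    day: date


def high_activity_days(daily_kp, threshold):
    """Dates whose daily Kp value reaches the threshold (Kp >= 8 here)."""
    return {day for day, kp in daily_kp.items() if kp >= threshold}


def select_records(records, spec, daily_kp):
    """Keep records that satisfy the spatial constraints on high-activity days."""
    active = high_activity_days(daily_kp, spec.kp_threshold)
    return [r for r in records
            if r.altitude_km > spec.min_altitude_km
            and r.latitude_deg > spec.min_latitude_deg
            and r.day in active]


SPEC = QuerySpec(min_altitude_km=100, min_latitude_deg=45, kp_threshold=8)

# Invented values for illustration only; not real Kp observations or data.
DAILY_KP = {date(2000, 1, 1): 3.0, date(2000, 1, 2): 8.3, date(2000, 1, 3): 8.0}
RECORDS = [
    Record("NeutralTemperature", 120, 60, date(2000, 1, 2)),  # kept
    Record("NeutralTemperature", 90, 60, date(2000, 1, 2)),   # too low
    Record("NeutralWind", 120, 40, date(2000, 1, 3)),         # too far south
    Record("NeutralTemperature", 130, 70, date(2000, 1, 1)),  # quiet day
]

print(select_records(RECORDS, SPEC, DAILY_KP))
```

Keeping the condition ("high geomagnetic activity") as a separate proxy lookup mirrors the slide's modeling choice: GeoMagneticActivity is reached through a GeophysicalIndex rather than stored on the data records themselves.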


Translating the Use-Case - ctd.


NeutralAtmosphere is a subRealm of TerrestrialAtmosphere

hasPhysicalProperties: NeutralTemperature, Neutral Wind, etc.

hasSpatialDomain: [0,360],[0,180],[100,150]

hasTemporalDomain:

NeutralTemperature is a Temperature (which) is a Parameter

FabryPerotInterferometer is an Interferometer, (which) is an Optical Instrument (which) is an Instrument

hasFilterCentralWavelength: Wavelength

hasLowerBoundFormationHeight: Height

ArcticCircle is a GeographicRegion

hasLatitudeBoundary:

hasLatitudeUpperBoundary:



Knowledge representation - visual

• UML – Unified Modeling Language
– Ontology Definition Metamodel/Meta Object Facility (OMG) for UML
– Provides standardized notation

• CMAP Ontology Editor (concept mapping tool from IHMC, http://cmap.ihmc.us/coe)
– Drag/drop visual development of classes, subclass (is-a) and property relationships
– Reads and writes OWL
– Formal convention (OWL/RDF tags, etc.)

• White board, text file


Representing processes


Is OWL/RDF the only option? No…

• SKOS - Simple Knowledge Organization System, for taxonomies http://www.w3.org/2004/02/skos/

• Annotations (RDFa) – for un- or semi-structured information sources http://www.w3.org/TR/xhtml-rdfa-primer/ http://rdfa.info

• Atom (and RSS) – for representing syndication feeds – structured http://tools.ietf.org/html/rfc4287

• More expressive languages: IKL, CL, …

• Languages aimed at different paradigms, e.g., rule languages


Query

• Querying knowledge representations in OWL and/or RDF

• SPARQL for RDF http://www.sparql.org/ and http://www.w3.org/TR/rdf-sparql-query/

• OWL-QL (for OWL) http://projects.semwebcentral.org/projects/owl-ql/

• XQuery (for XML)

• SeRQL (for Sesame)

• RDFQuery (RDF)

• Few as yet for natural language representations
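At their core, SPARQL-style queries match a conjunction of triple patterns containing variables against the data (a basic graph pattern). The Python sketch below illustrates only that matching step, with invented triples and "?"-prefixed variables; it has no FILTER, OPTIONAL, or inference, and it is not how any particular query engine is implemented.

```python
# Invented triples; "?"-prefixed terms in a pattern are variables.
TRIPLES = [
    ("fpi_1", "type", "FabryPerot"),
    ("fpi_1", "operatingMode", "vertical"),
    ("fpi_2", "type", "FabryPerot"),
    ("fpi_2", "operatingMode", "non-vertical"),
    ("radar_1", "type", "Radar"),
    ("radar_1", "operatingMode", "vertical"),
]


def is_variable(term):
    return term.startswith("?")


def extend(binding, pattern, triple):
    """Extend binding so pattern matches triple, or return None if it cannot."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_variable(p_term):
            if result.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None      # constant does not match
    return result


def match(patterns, triples):
    """Evaluate a conjunction of triple patterns, joining one pattern at a time."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = extend(binding, pattern, triple)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Comparable SPARQL (prefixes omitted):
#   SELECT ?i ?m WHERE { ?i a :FabryPerot . ?i :operatingMode ?m }
for row in match([("?i", "type", "FabryPerot"), ("?i", "operatingMode", "?m")], TRIPLES):
    print(row)
```

The shared variable ?i is what joins the two patterns: a binding produced by the first pattern survives only if the second pattern can be matched with the same value.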


Best practices (some)

• Ontologies/vocabularies must be shared and reused: swoogle.umbc.edu, BioPortal, OOR

• Examine ‘core vocabularies’ to start with
– SKOS Core: about knowledge systems
– Dublin Core: about information resources, digital libraries, with extensions for rights, permissions, digital rights management
– FOAF: about people and their organizations
– SIOC: about communities
– DOAP: on the descriptions of software projects
– DOLCE seems the most promising to match science ontologies

• Go “Lite” as much as possible, then increase the logic, balancing expressivity vs. implementability

• Minimal properties to start; add only when needed


Assembling Knowledge: Aggregation, Integration, Inference

Case Study: BioDASH Aggregation

Case Study: Flux Balance Analysis

Case Study: BioPAX Integration

“When it comes to data cleaning, there’s no such thing as a free lunch.” Tim Berners-Lee

Some tasks are specific to a use case, some are common to more than one and there’s no escaping others.


The Siderean Demo Aggregation Case Study

• Question: What drugs can be used as candidates for treating B-cell lymphoma patients?

• By comparing gene expression patterns between patients with and without B-cell lymphoma, a top biomarker was found: BRKCB-1


Seamark Demo: Background & Concepts

• Demonstration premise
– RDF offers high value during early-stage research

• Leveraging strengths of Oracle 10g & Seamark v3.6
– Oracle: large datasets / scalability
– Seamark: useful subsets / flexible navigation

• Project elapsed time: about one week
– Locating and identifying data sources represented the greatest time element
– Data sources in RDF required minimal integration time
– Non-RDF data sources required transformation and linking values (non-trivial but straightforward)


[Figure: data sources GO2Keyword.rdf, UniProt.rdf, GO.rdf, Keywords.rdf, Taxonomy.rdf, PubMed.xml, IntAct.rdf, Enzymes.rdf, OMIM.rdf, GO2OMIM.rdf, GO2Enzyme.rdf, KEGG.rdf, GO2UniProt.rdf, and ProbeSet.rdf, linked through the entity types Citation, Organism, MIM Id, Keyword, Protein, Enzyme, Gene, Probe, Pathway, and Compound]

1. Differentiate different forms of disease

2. Identify patient subgroups.

3. Identify top biomarkers

4. Identify function

5. Identify biological and chemical properties and disease associations of biomarker

6. Identify documents

7. Identify role in metabolic pathways

8. Identify compounds that interact

9. Identify and compare function in other organisms

10. Identify any prior art

Seamark Demonstration: Identification of new drug candidates

Siderean Seamark Demonstration in collaboration with Joanne Luciano, Predictive Medicine, Inc.


BioPAX: Biological PAthway eXchange

An abstract data model for biological pathway integration

Initiative arose from the community


[Figure: Biological Pathways of the Cell (Metabolic Pathways, Molecular Interaction Networks, Signaling Pathways, Gene Regulation) alongside BioPAX Level 1, Level 2, Level 3, and Level 4]


Different representations of the same pathways

BioCarta Reference Pathway GLYCOLYSIS

Does not compute.

Pretty, but useless

Starts at Glucose (but it doesn’t matter)

Reactions clickable but...


How bad is it? Pathway Databases

So many pathway databases, so little time.

Pathway Data (domain)

Graphic from Mike Cary and Gary Bader


Exchange Formats in Pathway Data Space

(Scope)

[Figure: exchange formats compared by scope. Database exchange formats (BioPAX, PSI-MI 2) and simulation model exchange formats (SBML, CellML) are placed against genetic interactions; molecular interactions (Pro:Pro, All:All); interaction networks (molecular and non-molecular: Pro:Pro, TF:Gene, genetic); regulatory pathways, metabolic pathways, and small molecules (each from low to high detail); biochemical reactions; and rate formulas]

Graphic from Mike Cary & Gary Bader


BioPAX Motivation

Common format will make data more accessible, promoting data sharing and distributed curation efforts

[Figure: connections among >180 DBs and tools (Database, Application, User), before BioPAX and with BioPAX]


BioPAX Objectives

• Accommodate existing database representations

• Integration and exchange of pathway data

• Interchange through a common (standard) representation

• Provide a basis for future databases

• Enable development of tools for searching and reasoning over the data


Data Aggregation, Integration and Inference with BioPAX

1. Multiple kinds of pathway databases
– metabolic
– molecular interactions
– signal transduction

2. Constructs designed for integration
– DB References
– XRefs (Publication, Unification, Relationship)
– synonyms
– provenance

3. OWL DL – to enable reasoning


BioPAX Biochemical Reaction

[Figure: the phosphoglucose isomerase (5.3.1.9) reaction, shown as the OWL (schema) and its instances (individuals), i.e., the data]


BioPAX Ontology: Overview

Level 1 v1.0 (July 7th, 2004)

[Figure annotations: parts; how the parts are known to interact; a set of interactions]


BioDASH: Bridging Chemistry and Molecular Biology

Uniprot:P49841

• Different views have different semantics: lenses

• When there is a correspondence between objects, a semantic binding is possible

Apply Correspondence Rule:
if ?target.xref.lsid == ?bpx:prot.xref.lsid
then ?target.correspondsTo.?bpx:prot

Source: Eric Neumann Haystack BioDASH Demo http://www.w3.org/2005/04/swls/BioDash/Demo/
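The correspondence rule is essentially a join on the LSID found in each entity's cross-reference. The Python sketch below assumes each entity exposes a single xref LSID; the entity identifiers and the LSID string format are hypothetical (only the accession P49841 comes from the slide), and this is not BioDASH's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    ident: str      # identifier within its own view (hypothetical)
    xref_lsid: str  # LSID taken from the entity's cross-reference


def correspondences(targets, proteins):
    """For every target and protein whose xref LSIDs are equal, emit
    (target, "correspondsTo", protein), mirroring the rule on the slide."""
    by_lsid = {}
    for prot in proteins:
        by_lsid.setdefault(prot.xref_lsid, []).append(prot)
    return [(t.ident, "correspondsTo", p.ident)
            for t in targets
            for p in by_lsid.get(t.xref_lsid, [])]


# Hypothetical identifiers; the LSID string format is assumed for illustration.
LSID = "urn:lsid:uniprot.org:uniprot:P49841"
targets = [Entity("chem:target_1", LSID),
           Entity("chem:target_2", "urn:lsid:example.org:other:1")]
proteins = [Entity("bpx:protein_7", LSID)]

print(correspondences(targets, proteins))
```

Indexing the proteins by LSID first turns the pairwise comparison in the rule into a hash join, so each target is checked against only the proteins that share its identifier.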


Summary• The science of knowledge representation has, throughout its

history, consisted of a compromise between pragmatism, scientific rigor, and accessibility to domain experts

• Many different options for ontology development and encoding, i.e. knowledge representation

• Sometimes, your choice of representation may need to change based on language and tools availability/ capability…

• Balancing expressivity and implementability leads us to favor an object-based representation such as a description logic (DL), but it also suggests the need for a meta-representation, e.g. KIF (Knowledge Interchange Format)

• Next class (3): ontology engineering

• Use cases should drive the functional requirements of both your ontology and how you will ‘build’ it (see class 4)


Assignment for Week 2

• Reading:
– Semantic Web for the Working Ontologist
– Alternate reading: Pizza Tutorial

• Assignment 1: Representing Knowledge and Understanding Representations


Extras



DOLCE + SWEET

DOLCE class: aligned SWEET classes (each equal to, =, or below, <, the DOLCE class)

Physical-body: BodyofGround, BodyofWater,…
Material-Artifact: Infrastructure, Dam, Product,…
Physical-Object: LivingThing, MarineAnimal
Amount-of-Matter: Substance
Activity: HumanActivity
Physical-Phenomenon: Phenomena
Process: Process
State: StateOfMatter
Quality: Quantity, Moisture,…
Physical-Region: Basalt,…
Temporal-Region: Ordovician,…

Benefits:
• full coverage
• rich relations
• home for orphans
• single superclasses

Issues:
• individuals (e.g. Planet Earth)
• roles (e.g. contaminant)
• features (e.g. SeaFloor)

Courtesy: Boyan Brodaric


Conclusions

• Surprisingly good fit among the ontologies so far: no show-stopper conflicts, a few difficult conflicts

• DOLCE's richness benefits geoscience ontologies
– a good conceptual foundation helps clear up some existing problems

• Unresolved issues in modeling science entities
– modeling classifications, interpretations, theories, models,…

Courtesy: Boyan Brodaric

Same procedure with GeoSciML


(Figure components: CF attributes; SWEET ontologies (OWL); search terms; CF Standard Names (RDF object); IRIDL terms; NC basic attributes; IRIDL attributes/objects; SWEET as terms; CF Standard Names as terms; gazetteer terms; CF data objects; location.)

Blumenthal


IRI RDF Architecture

(Figure components: data servers; ontologies; MMI; JPL; standards organizations; start point; RDF crawler; RDFS semantics, OWL semantics, and SWRL rules; SeRQL CONSTRUCT; search queries; location canonicalizer; time canonicalizer; Sesame; search interface; bibliography.)

Blumenthal


CLCE - Common Logic Controlled English

CLCE: If a set x is the set of (a cat, a dog, and an elephant), then the cat is an element of x, the dog is an element of x, and the elephant is an element of x.

PC (predicate calculus): ~(∃x:Set)(∃x1:Cat)(∃x2:Dog)(∃x3:Elephant)(Set(x,x1,x2,x3) ∧ ~(x1∈x ∧ x2∈x ∧ x3∈x))


Use Case

• Provide a decision-support capability that lets an analyst determine an individual’s susceptibility to avian flu without having to be precise in terminology (-nyms)


Building SKOS

• ThManager

• Protégé 4 plugin for SKOS
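A SKOS vocabulary is one way to meet the avian-flu use case's requirement that the analyst need not be precise in terminology: every preferred and alternative label resolves to the same concept. The following is a minimal in-memory sketch modeled on `skos:prefLabel`, `skos:altLabel`, and `skos:broader`. The `ex:` URIs and the helper names are hypothetical, and a real system would load the vocabulary from RDF built with a tool such as those listed above:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A SKOS-style concept (modeled on skos:prefLabel, skos:altLabel, skos:broader)."""
    uri: str
    pref_label: str
    alt_labels: set[str] = field(default_factory=set)
    broader: Optional[str] = None


# Illustrative vocabulary; the ex: URIs are hypothetical.
CONCEPTS = [
    Concept("ex:Influenza", "influenza", {"flu"}),
    Concept("ex:AvianInfluenza", "avian influenza", {"avian flu", "bird flu"}, broader="ex:Influenza"),
]


def normalize(term: str) -> str:
    """Case- and whitespace-insensitive key for label matching."""
    return " ".join(term.lower().split())


def build_label_index(concepts):
    """Map every preferred and alternative label to its concept URI."""
    index = {}
    for c in concepts:
        for label in {c.pref_label, *c.alt_labels}:
            index[normalize(label)] = c.uri
    return index


def lookup(index, term):
    """Resolve an analyst's term to a concept URI, or None if no label matches."""
    return index.get(normalize(term))


def broader_chain(concepts, uri):
    """Follow skos:broader links upward (assumes the hierarchy has no cycles)."""
    by_uri = {c.uri: c for c in concepts}
    chain = []
    while uri in by_uri and by_uri[uri].broader:
        uri = by_uri[uri].broader
        chain.append(uri)
    return chain


index = build_label_index(CONCEPTS)
print(lookup(index, "Bird Flu"), broader_chain(CONCEPTS, "ex:AvianInfluenza"))
```

Here "Bird Flu", "avian flu", and "avian influenza" all resolve to `ex:AvianInfluenza`, and the broader chain lets a query widen to influenza in general when needed.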


Is OWL the only option II? No…

• Natural language (NL)

– Read results from a web search and transform them into a usable form

– Find/filter out inconsistencies and concepts/relations that cannot be represented

• Popular options:

– CLCE (Common Logic Controlled English)

– Rabbit, e.g. “ShellfishCourse is a Meal Course that (if has drink) always has drink Potable Liquid that has Full body and which either has Moderate or Strong flavour”

– PENG (Processable English)

• What we really need is PSCI (process-able science), but that’s another story (a research project)
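To show how a controlled-English sentence maps onto an OWL class axiom, here is one plausible description-logic reading of the Rabbit example. It is a sketch, not Rabbit's official translation: the property names hasDrink, hasBody, and hasFlavor and the individuals Full, Moderate, and Strong are assumptions.

```latex
\mathit{ShellfishCourse} \sqsubseteq \mathit{MealCourse} \sqcap
  \forall \mathit{hasDrink}.\big(\mathit{PotableLiquid} \sqcap
  \exists \mathit{hasBody}.\{\mathit{Full}\} \sqcap
  \exists \mathit{hasFlavor}.\{\mathit{Moderate}, \mathit{Strong}\}\big)
```

The phrase "(if has drink) always has" is what signals a universal restriction (∀). A shellfish course with no drink still satisfies the axiom, while any drink it does have must be a full-bodied potable liquid with moderate or strong flavour.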


Sydney syntax

If X has Y as a father then Y is the only father of X.

The class person is equivalent to male or female, and male and female are mutually exclusive.

is equivalent to:

The classes male and female are mutually exclusive. The class person is fully defined as anything that is a male or a female.
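For comparison with the Sydney syntax sentences, the same statements can be written as description-logic axioms. This is a sketch in which the names hasFather, Person, Male, and Female are assumed:

```latex
\begin{aligned}
&\top \sqsubseteq {\leq} 1\,\mathit{hasFather} && \text{(each X has at most one father: hasFather is functional)}\\
&\mathit{Person} \equiv \mathit{Male} \sqcup \mathit{Female} && \text{(person is fully defined as male or female)}\\
&\mathit{Male} \sqcap \mathit{Female} \sqsubseteq \bot && \text{(male and female are mutually exclusive)}
\end{aligned}
```

The two English paraphrases on the slide map onto the same pair of axioms, Person ≡ Male ⊔ Female and Male ⊓ Female ⊑ ⊥. That is why they are equivalent even though they state the facts in a different order.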


PENG - Processable English

1. If X is a research programmer then X is a programmer.

2. Bill Smith is a research programmer who works at the CLT.

3. Who is a programmer and works at the CLT?
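Behind the three PENG sentences sit a rule, two facts, and a query. The sketch below reproduces that reasoning step by plain forward chaining over (subject, predicate, object) triples. The triple encoding and the function names are assumptions for illustration, not PENG's actual translation or reasoner:

```python
# Facts from sentence 2, as (subject, predicate, object) triples.
FACTS = {
    ("Bill Smith", "is_a", "research programmer"),
    ("Bill Smith", "works_at", "CLT"),
}

# Sentence 1 as a (subclass, superclass) rule: if X is a research programmer then X is a programmer.
RULES = [("research programmer", "programmer")]


def saturate(facts, rules):
    """Forward-chain the is_a rules until no new fact is derived (a fixpoint)."""
    kb = set(facts)
    changed = True
    while changed:
        changed = False
        for sub, sup in rules:
            for s, p, o in list(kb):
                if p == "is_a" and o == sub and (s, "is_a", sup) not in kb:
                    kb.add((s, "is_a", sup))
                    changed = True
    return kb


def who_is_programmer_and_works_at(facts, rules, place):
    """Sentence 3: Who is a programmer and works at <place>?"""
    kb = saturate(facts, rules)
    return sorted(
        s for s, p, o in kb
        if p == "is_a" and o == "programmer" and (s, "works_at", place) in kb
    )


print(who_is_programmer_and_works_at(FACTS, RULES, "CLT"))  # ['Bill Smith']
```

Bill Smith is never stated to be a programmer; the answer depends on the rule from sentence 1 being applied first.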


Rules (aka ‘Logic’)

• OWL is based on description logic (DL)

• OWL DL follows it precisely

• There are things that DL cannot express (though there are also things that are difficult to express with rules and easy in DL...)

– A well-known example is Horn rules (e.g., the ‘uncle’ relationship): (P1 ∧ P2 ∧ ...) → C

– e.g.: parent(?x,?y) ∧ brother(?y,?z) ⇒ uncle(?x,?z)

– Or, for any X, Y and Z: if Y is a parent of X, and Z is a brother of Y, then Z is an uncle of X


Examples from http://www.w3.org/Submission/SWRL/

• A simple use of these rules would be to assert that the combination of the hasParent and hasBrother properties implies the hasUncle property. Informally, this rule could be written as:

– hasParent(?x1,?x2) ∧ hasBrother(?x2,?x3) ⇒ hasUncle(?x1,?x3)

• In the abstract syntax the rule would be written as:

– Implies(Antecedent(hasParent(I-variable(x1) I-variable(x2))
                     hasBrother(I-variable(x2) I-variable(x3)))
          Consequent(hasUncle(I-variable(x1) I-variable(x3))))

• From this rule, if John has Mary as a parent and Mary has Bill as a brother then John has Bill as an uncle.
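Operationally, the uncle rule is a join of the hasParent and hasBrother facts on the shared variable ?x2. The sketch below is a toy evaluation of that one rule over Python triples, assumed for illustration. It is not a SWRL engine and does not use any SWRL library API:

```python
def apply_uncle_rule(facts):
    """hasParent(?x1,?x2) ∧ hasBrother(?x2,?x3) ⇒ hasUncle(?x1,?x3), evaluated as a join on ?x2.

    One pass suffices because hasUncle never appears in the rule body.
    """
    brothers = {}  # ?x2 -> set of ?x3 such that hasBrother(?x2, ?x3)
    for s, p, o in facts:
        if p == "hasBrother":
            brothers.setdefault(s, set()).add(o)
    inferred = {
        (x1, "hasUncle", x3)
        for x1, p, x2 in facts
        if p == "hasParent"
        for x3 in brothers.get(x2, ())
    }
    return set(facts) | inferred


facts = {("John", "hasParent", "Mary"), ("Mary", "hasBrother", "Bill")}
print(sorted(t for t in apply_uncle_rule(facts) if t[1] == "hasUncle"))
# [('John', 'hasUncle', 'Bill')]
```

This reproduces the John, Mary, and Bill example: John's parent Mary has a brother Bill, so the rule derives hasUncle(John, Bill).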


Examples

• An even simpler rule would be to assert that Students are Persons, as in

– Student(?x1) ⇒ Person(?x1).

– Implies(Antecedent(Student(I-variable(x1))) Consequent(Person(I-variable(x1))))

– However, this kind of use for rules in OWL just duplicates the OWL subclass facility. It is logically equivalent to write instead

• Class(Student partial Person) or

• SubClassOf(Student Person)

– which would make the information directly available to an OWL reasoner.
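Why the rule and the subclass axiom are interchangeable can be seen by computing entailed types both ways on the same data. The toy illustration below is not how an OWL reasoner is implemented. The individuals "alice" and "bob" are hypothetical, and `SUBCLASS_OF` stands in for `SubClassOf(Student Person)`:

```python
# Hypothetical individuals and their asserted types.
ASSERTED = {"alice": {"Student"}, "bob": {"Person"}}
SUBCLASS_OF = {"Student": {"Person"}}  # stands in for SubClassOf(Student Person)


def types_via_rule(asserted):
    """Apply the rule Student(?x1) ⇒ Person(?x1) to every individual."""
    return {
        ind: types | ({"Person"} if "Student" in types else set())
        for ind, types in asserted.items()
    }


def types_via_subclass(asserted, subclass_of):
    """Close each individual's types under the subclass hierarchy (transitively)."""
    closed_types = {}
    for ind, types in asserted.items():
        closed = set(types)
        frontier = list(types)
        while frontier:
            for sup in subclass_of.get(frontier.pop(), ()):
                if sup not in closed:
                    closed.add(sup)
                    frontier.append(sup)
        closed_types[ind] = closed
    return closed_types


print(types_via_rule(ASSERTED) == types_via_subclass(ASSERTED, SUBCLASS_OF))  # True
```

Both routes give alice the types Student and Person. The subclass form, however, is part of the ontology itself, so an OWL reasoner can use it for classification and not only for instance checking.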


Semantic Web with Rules

• Metalog
• RuleML
• SWRL
• RIF
• OWL 2 RL
• WRL
• Cwm
• Jess (rules engine)


Developing a service ontology

• Use case: find and display, in the same projection, sea surface temperature and land surface temperature from a global climate model.


• Classes/concepts:
– Temperature
– Surface (sea/land)
– Model
– Climate
– Global
– Projection
– Display …


Service ontology

• Climate model is a model
• Model has domain
• Climate model has component representation
• Land surface is-a component representation
• Ocean is-a component representation
• Sea surface is part of ocean
• Model has spatial representation (and temporal representation)
• Spatial representation has dimensions
• Latitude-longitude is a horizontal spatial representation
• Displaced pole is a horizontal spatial representation
• Ocean model has displaced pole representation
• Land surface model has latitude-longitude representation
• Lambert conformal is a geographic spatial representation
• Reprojection is a transform between spatial representations
• …


Service ontology

• A sea surface model has a displaced-pole grid representation and a land surface model has a latitude-longitude grid representation, and both must be transformed to Lambert conformal for display
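Once the service ontology records each model's grid representation, the reprojection steps the use case needs can be derived rather than hard-coded. A minimal sketch, assuming a hypothetical triple encoding of a few statements above; the identifiers such as `hasGridRepresentation` and `isA` are illustrative and not part of any published ontology:

```python
# Hypothetical triples transcribing part of the service ontology above.
TRIPLES = {
    ("SeaSurfaceModel", "hasGridRepresentation", "DisplacedPole"),
    ("LandSurfaceModel", "hasGridRepresentation", "LatitudeLongitude"),
    ("DisplacedPole", "isA", "HorizontalSpatialRepresentation"),
    ("LatitudeLongitude", "isA", "HorizontalSpatialRepresentation"),
    ("LambertConformal", "isA", "GeographicSpatialRepresentation"),
}

SPATIAL_REPRESENTATION_KINDS = {"HorizontalSpatialRepresentation", "GeographicSpatialRepresentation"}


def is_spatial_representation(triples, name):
    """True if `name` is declared to be some kind of spatial representation."""
    return any(s == name and p == "isA" and o in SPATIAL_REPRESENTATION_KINDS for s, p, o in triples)


def plan_reprojections(triples, models, display):
    """List the (model, from, to) reprojections needed to show all models in `display`."""
    if not is_spatial_representation(triples, display):
        raise ValueError(f"{display} is not a known spatial representation")
    plan = []
    for model in models:
        grids = sorted(o for s, p, o in triples if s == model and p == "hasGridRepresentation")
        plan.extend((model, grid, display) for grid in grids if grid != display)
    return plan


print(plan_reprojections(TRIPLES, ["SeaSurfaceModel", "LandSurfaceModel"], "LambertConformal"))
# [('SeaSurfaceModel', 'DisplacedPole', 'LambertConformal'),
#  ('LandSurfaceModel', 'LatitudeLongitude', 'LambertConformal')]
```

Both models are flagged for reprojection to Lambert conformal, matching the statement above. A model already on the display grid would produce no step.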