Introduction to Protégé for Absolute Beginners University at Buffalo August 11-12, 2012


Goal and Content of Tutorial

• The goal of the tutorial is to explain how to translate ontologies into a language that can be processed by computers

• Three main sections by content:
  – Overview of the Web Ontology Language (OWL)
  – Hands-on training in Protégé, an OWL editor
  – Overview of SPARQL Protocol and RDF Query Language (SPARQL), a query language for retrieving and modifying ontologically grounded information

IS THE GOAL WORTHWHILE?

The Current State of Data Integration on the Web

• Search engines return some remarkably precise results but the precision degrades as the topics become less standardized

A Query Containing Standardized Terms…

…Yields Very Good Results

But as the Terms Become Less Standardized…

…the Results Become Less Precise

The Current State of Data Integration in the Enterprise

• Using more than a single software application carries a risk of added cost to combine the information they create.
  – Databases carry very little metadata about the content of the information they contain
  – Spreadsheets most often carry even less

In the Social Network, Hashtags Cluster Information Into Categories

• But the ambiguities of language reappear in the categories

• And the lack of rigor in relating one category to another is an obstacle to machine-based validation of usage.

The Value Added by OWL Ontologies to Data Integration

• Ontologies endow terms with machine processable definitions and disambiguate different senses of the same expression

• Ontologies place restrictions on how terms can be related to other terms so that misuse and inconsistencies can be detected.

The Ontologized Web, Enterprise and Social Network

• What if creators of web pages, databases, and blogs used terminology from curated ontologies to annotate their content?

• Standardized ways of describing the structures that represent data are already accepted; why not extend that acceptance to the annotation of content?

• Expected Benefits:
  – The precision of search increases dramatically
  – Data from different sources can be merged
  – Gaps in information can be identified
  – Falsehoods and incoherent expressions can be detected

OVERVIEW OF RESOURCE DESCRIPTION FRAMEWORK (RDF)

Resource Description Framework (RDF)

• Designed to be a language for making assertions about resources

• A Resource* is
  – an electronic document, an image, a source of information with a consistent purpose
  – not necessarily accessible via the Internet; e.g., human beings, corporations, and books in a library can also be resources
  – an abstract concept such as the operators and operands of a mathematical equation or types of a relationship (e.g., "parent" or "employee")

*derived from RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, from http://tools.ietf.org/html/rfc3986

Expressing Information in RDF

• Statements are always expressed in the form of a triple: Subject – Predicate – Object (a.k.a. RDF Triple)

• Translating the statement “Austria’s GDP per capita is 30,500 Euros” into RDF requires breaking it into triples

Subject | Predicate | Object
Austria | has economic indicator | Austria’s GDP per capita
Austria’s GDP per capita | has value | 30,500 Euros
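
The decomposition above can be pictured as a list of three-part records. The following is a minimal Python sketch, not part of the original tutorial; the `Triple` type and the `ex:` names are hypothetical and were invented only for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# "Austria's GDP per capita is 30,500 Euros" broken into two triples.
# The ex: names are illustrative placeholders, not real vocabulary terms.
statements = [
    Triple("ex:Austria", "ex:has_economic_indicator", "ex:Austrian_GDP_per_capita"),
    Triple("ex:Austrian_GDP_per_capita", "ex:has_value", '"30,500 Euros"'),
]

for s in statements:
    print(f"{s.subject}  --{s.predicate}-->  {s.obj}")
```

Note that the object of the first triple is the subject of the second; this chaining is what lets a single sentence be spread over several triples.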

Uniform Resource Identifiers (URIs) and Literals

• URIs are unique names of resources
  – http://dbpedia.org/page/Austria
  – http://en.wikipedia.org/wiki/Austria

• Literals
  – can be a simple raw text value
  – can be annotated with a language tag as in “Austria”@en
  – can be typed with a datatype as in “30,500Euros”^^xsd:string

Rules for RDF Statements

• Subject and Predicate have to be URI named resources

• Object – can be either a URI named resource or a literal

Applying the Rules

Using “dbpedia:”, “ro”, and “example:” as prefixes for http://dbpedia.org/page, http://www.obofoundry.org/ro, and http://www.myexample.com/resource respectively, which of the following are well-formed RDF statements?

Subject | Predicate | Object
dbpedia:Austria | ro:part_of | dbpedia:Europe
dbpedia:Austria | ro:part_of | “Europe”@en
“Europe” | ro:has_part | dbpedia:Austria
dbpedia:Austria | “is trading partner with” | dbpedia:Germany
dbpedia:Europe | ro:part_of | dbpedia:Austria
example:30500Euro | example:is_value_of | example:AustrianGDPperCapita
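
The rules from the previous slide can be applied mechanically. The sketch below is an illustration under a simplifying assumption: any term that starts with a double quote is treated as a literal and everything else as a URI-named resource. The helper names `is_literal` and `is_well_formed` are hypothetical.

```python
def is_literal(term: str) -> bool:
    """Assumption for this sketch: literals are written starting with a double quote."""
    return term.startswith('"')


def is_well_formed(subject: str, predicate: str, obj: str) -> bool:
    """Subject and predicate must be URI-named resources.
    The object may be either a resource or a literal, so it is not checked."""
    return not is_literal(subject) and not is_literal(predicate)


candidates = [
    ("dbpedia:Austria", "ro:part_of", "dbpedia:Europe"),
    ("dbpedia:Austria", "ro:part_of", '"Europe"@en'),
    ('"Europe"', "ro:has_part", "dbpedia:Austria"),
    ("dbpedia:Austria", '"is trading partner with"', "dbpedia:Germany"),
    ("dbpedia:Europe", "ro:part_of", "dbpedia:Austria"),
    ("example:30500Euro", "example:is_value_of", "example:AustrianGDPperCapita"),
]

results = [is_well_formed(*c) for c in candidates]
for candidate, ok in zip(candidates, results):
    print(ok, candidate)
```

The check is purely syntactic. The fifth statement (Europe is part of Austria) passes because it is well-formed, even though it is false; detecting that kind of error is outside what RDF alone provides.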

RDF Graphs

[Figure: an RDF graph made of nodes and edges. The node dbpedia:Austria is linked by the edge example:has_economic_indicator to the node example:Austrian_GDPperCapita, which is linked by the edge example:has_value to the literal “30,500Euros”^^xsd:string.]

The direction of the edges is always away from the subject and towards the object of the statement

Graphing RDF

How would the following be represented in an RDF Graph?

Subject | Predicate | Object
game1:MonopolyPlayer_1 | rdf:type | mnply:MonopolyPlayer
game1:MonopolyPlayer_1 | mnply:has_role | game1:MonopolyBanker_Game1
game1:MonopolyPlayer_1 | mnply:represented_by | game1:MonopolyTokenBoot_Game1
game1:MonopolyPlayer_1 | mnply:competes_in | game1:MonopolyGame_Game1

Graphing RDF

[Figure: an RDF graph in which the node game1:MonopolyPlayer_1 has four outgoing edges: rdf:type to mnply:MonopolyPlayer, mnply:has_role to game1:MonopolyBanker_Game1, mnply:represented_by to game1:MonopolyTokenBoot_Game1, and mnply:competes_in to game1:MonopolyGame_Game1.]
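
Turning a table of triples into a graph amounts to treating each subject and object as a node and each predicate as a labeled, directed edge. A minimal Python sketch of that idea, using only the triples from the exercise (the variable names are illustrative):

```python
from collections import defaultdict

triples = [
    ("game1:MonopolyPlayer_1", "rdf:type", "mnply:MonopolyPlayer"),
    ("game1:MonopolyPlayer_1", "mnply:has_role", "game1:MonopolyBanker_Game1"),
    ("game1:MonopolyPlayer_1", "mnply:represented_by", "game1:MonopolyTokenBoot_Game1"),
    ("game1:MonopolyPlayer_1", "mnply:competes_in", "game1:MonopolyGame_Game1"),
]

# Each subject node maps to its outgoing (edge label, target node) pairs,
# so every edge points away from the subject and towards the object.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

# Every subject and every object is a node of the graph.
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}

for label, target in graph["game1:MonopolyPlayer_1"]:
    print(f"game1:MonopolyPlayer_1 --{label}--> {target}")
```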

How far does RDF take us toward our goal?

• The value of RDF lies in the use of URIs, as it allows distinct information sources to share a common meaning for terms
  – Every occurrence of the same URI is a reference to the same resource

• There is no inference with RDF and no way to validate the use of URIs.

OVERVIEW OF RDF SCHEMA (RDFS)

RDF Schema (RDFS)

“RDF Schema defines classes and properties that may be used to describe classes, properties and other resources”*

RDFS defines terms that can describe classes of things and the relationships that hold between these classes

*RDF Vocabulary Description Language 1.0: RDF Schema from http://www.w3.org/TR/rdf-schema/

The Need for RDFS

• RDF can name, but not define, resources or the relationships that hold between them

• But what about…

Apples are a kind of fruit

Subject | Predicate | Object
dbpedia:Apple | ex:is_kind_of | dbpedia:Fruit

The Need for RDFS

• Machines cannot process elements of an expression that lie outside of RDF. To a machine our example looks like:

Apples are a kind of fruit

Subject | Predicate | Object
tuvwxyz:Abcde | ef:ij_klmn_op | tuvwxyz:Fghij

• We need language elements that enable a machine to process relationships between entities

RDFS Types

• Allows a resource to be typed as a class (i.e. a collection of individuals)

• Allows a class to be defined as a subclass of another class (i.e. all individuals that it contains are contained in the other)

• Allows a property to be defined as a subproperty of another property

RDFS Taxonomies

• Enables the creation of taxonomies of both classes and properties

[Figure: two taxonomies shown side by side. Class Taxonomy: Fruit at the top, Apple below it, and Cortland Apple and Gala Apple below Apple. Property Taxonomy: is related to at the top, is sibling of below it, and is brother of and is sister of below is sibling of.]

RDFS Vocabulary

rdfs:Resource, rdfs:Class, rdfs:Literal, rdfs:Datatype, rdfs:range, rdfs:domain, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:label, rdfs:comment, rdfs:ContainerMembershipProperty, rdfs:member, rdfs:seeAlso, rdfs:isDefinedBy

RDFS Vocabulary in Action

• rdfs:subClassOf is used to assert that every instance of a class is an instance of another.

• If a resource is rdf:type dbpedia:Apple, a reasoner will assert that the resource is also rdf:type dbpedia:Fruit

Apples are a kind of fruit

Subject | Predicate | Object
dbpedia:Apple | rdfs:subClassOf | dbpedia:Fruit

[Figure: example:NewtonsApple is rdf:type dbpedia:Apple, and dbpedia:Apple is rdfs:subClassOf dbpedia:Fruit; a second rdf:type edge links example:NewtonsApple to dbpedia:Fruit.]
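
The inference a reasoner draws from rdfs:subClassOf can be imitated with a small loop over triples. This is a simplified sketch for illustration only, not how any particular reasoner is implemented; the function name `infer_types` is hypothetical.

```python
def infer_types(triples):
    """Return the rdf:type triples implied by rdfs:subClassOf.
    The loop repeats until nothing new appears, so chains of
    subclasses (A subClassOf B subClassOf C) are followed as well."""
    known = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(known):
            if p != "rdf:type":
                continue
            for c, q, d in list(known):
                if q == "rdfs:subClassOf" and c == o:
                    new = (s, "rdf:type", d)
                    if new not in known:
                        known.add(new)
                        changed = True
    return known - set(triples)


facts = [
    ("example:NewtonsApple", "rdf:type", "dbpedia:Apple"),
    ("dbpedia:Apple", "rdfs:subClassOf", "dbpedia:Fruit"),
]
inferred = infer_types(facts)
print(inferred)
```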

RDFS Vocabulary in Action

• rdfs:subPropertyOf is used to assert that every pair of resources that are related by a property are also related by another.

• If Ann is the sister of Ben and is sister of is a subproperty of is sibling of, then a reasoner will assert that Ann is a sibling of Ben

Every sister of a person is a sibling of that person

Subject | Predicate | Object
ex:is_sister_of | rdfs:subPropertyOf | ex:is_sibling_of
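
The Ann and Ben example can be sketched in the same style. The identifiers ex:Ann and ex:Ben are assumed names for the two people, and `infer_superproperties` is a hypothetical helper, not a function of any library.

```python
def infer_superproperties(triples):
    """Return the triples implied by rdfs:subPropertyOf: whenever s and o
    are related by a subproperty, they are also related by its superproperty."""
    known = set(triples)
    sub_props = {(s, o) for s, p, o in known if p == "rdfs:subPropertyOf"}
    changed = True
    while changed:
        changed = False
        for s, p, o in list(known):
            for sub, sup in sub_props:
                if p == sub and (s, sup, o) not in known:
                    known.add((s, sup, o))
                    changed = True
    return known - set(triples)


facts = [
    ("ex:Ann", "ex:is_sister_of", "ex:Ben"),
    ("ex:is_sister_of", "rdfs:subPropertyOf", "ex:is_sibling_of"),
]
inferred = infer_superproperties(facts)
print(inferred)
```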

RDFS Vocabulary in Action

• rdfs:domain is used to assert that a property is always applied to instances of one or more classes.

• If Ann is related to Ben via the ex:is_sister_of property, a reasoner will assert that Ann is rdf:type ex:Female

Only females can be sisters of others

Subject | Predicate | Object
ex:is_sister_of | rdfs:domain | ex:Female

[Figure: example:Ann is linked to example:Ben by example:is_sister_of; an rdf:type edge links example:Ann to example:Female.]

RDFS Vocabulary in Action

• rdfs:range is used to assert that the objects (values) of a property are always instances of one or more classes or datatypes

• If Newton’s apple is related to Newton’s apple tree via the ex:is_borne_by property, a reasoner will assert that Newton’s apple tree is rdf:type dbpedia:Plant

Only plants can bear fruit

Subject | Predicate | Object
ex:is_borne_by | rdfs:range | dbpedia:Plant

[Figure: example:Newton’s Apple is linked to example:Newton’s Apple Tree by example:is_borne_by; an rdf:type edge links example:Newton’s Apple Tree to dbpedia:Plant.]
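
Domain and range work as a pair: rdfs:domain types the subject of a statement and rdfs:range types its object. The sketch below illustrates both with the Ann and Newton's apple examples. The individual names (ex:Ann, ex:Ben, ex:NewtonsApple, ex:NewtonsAppleTree) and the function name are assumptions made for this illustration.

```python
def infer_types_from_domain_and_range(triples):
    """Return rdf:type triples implied by rdfs:domain and rdfs:range."""
    domains = [(s, o) for s, p, o in triples if p == "rdfs:domain"]
    ranges = [(s, o) for s, p, o in triples if p == "rdfs:range"]
    inferred = set()
    for s, p, o in triples:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, "rdf:type", cls))  # subject gets the domain class
        for prop, cls in ranges:
            if p == prop:
                inferred.add((o, "rdf:type", cls))  # object gets the range class
    return inferred


facts = [
    ("ex:Ann", "ex:is_sister_of", "ex:Ben"),
    ("ex:is_sister_of", "rdfs:domain", "ex:Female"),
    ("ex:NewtonsApple", "ex:is_borne_by", "ex:NewtonsAppleTree"),
    ("ex:is_borne_by", "rdfs:range", "dbpedia:Plant"),
]
inferred = infer_types_from_domain_and_range(facts)
print(inferred)
```

Note that domain and range do not reject anything here: a statement with an unexpected subject or object simply causes that resource to be typed with the class.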

RDFS Vocabulary in Action

• rdfs:label is used to provide a human readable version of a resource’s name.

• If a GUID is used as the identifier for the class of Apple, then use rdfs:label to assign as many human readable versions as desired.

Subject | Predicate | Object
ex:EXO_0002032 | rdfs:label | “Apple”@en
ex:EXO_0002032 | rdfs:label | “Manzana”@es
ex:EXO_0002032 | rdfs:label | “Mela”@it

RDFS Vocabulary in Action

• rdfs:comment is used to provide a human-readable description of a resource

Subject | Predicate | Object
dbpedia:Apple | rdfs:comment | “The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family”@en
dbpedia:Apple | rdfs:comment | “La mela è il frutto (più precisamente si tratta di un falso frutto a pomo) del melo.”@it

Both comments are reused from http://dbpedia.org/page/Apple

RDFS Vocabulary in Action

• rdfs:seeAlso is used to assert that a resource provides additional information about the subject resource.

Subject | Predicate | Object
dbpedia:Apple | rdfs:seeAlso | wiki:Apple
dbpedia:Apple | rdfs:seeAlso | ex:Apple

RDFS Vocabulary in Action

• rdfs:isDefinedBy is used to assert that a resource defines the subject resource.

Subject | Predicate | Object
dbpedia:Apple | rdfs:isDefinedBy | wiktionary:apple
dbpedia:Apple | rdfs:isDefinedBy | wordnet:apple

How far does RDFS take us toward our goal?

• Contains elements that enable machine inferencing on necessary conditions (e.g. every apple is a fruit of the apple tree)

• Doesn’t allow restrictions on classes that would enable inferencing on sufficient conditions (e.g. every fruit of the apple tree is an apple)

• Doesn’t provide a way to exclude resources from class membership, so it can’t validate assertions.

OVERVIEW OF THE WEB ONTOLOGY LANGUAGE (OWL)

Web Ontology Language (OWL*)

• OWL is the descendant of Knowledge Representation Languages of the 1990s such as Simple HTML Ontology Extensions (SHOE) and Ontology Inference Layer (OIL), and of the DARPA Agent Markup Language (DAML)

• The initial version of OWL became a formal W3C Recommendation on February 10, 2004

• OWL 2 became a W3C Recommendation on October 27, 2009

* why “OWL” instead of “WOL”: http://lists.w3.org/Archives/Public/www-webont-wg/2001Dec/0169.html

The Need for OWL

• RDFS lacks the expressive power to allow inferences about individuals beyond their class membership.

Subject | Predicate | Object
dbpedia:Apple | rdf:type | rdfs:Class
dbpedia:Apple | rdfs:subClassOf | ex:FruitOfAppleTree
ex:FruitOfAppleTree | rdf:type | rdfs:Class
ex:FruitOfAppleTree | rdfs:subClassOf | dbpedia:Apple

• Based on this equivalence (each class is a subclass of the other) a machine can infer only that the two classes have the same instances.

• We want to enable a machine to infer the attributes of an individual based upon the definition of the class of which it is a member

OWL Usage

“The W3C OWL 2 Web Ontology Language (OWL) is a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things. OWL is a computational logic-based language such that knowledge expressed in OWL can be reasoned with by computer programs either to verify the consistency of that knowledge or to make implicit knowledge explicit.”*

* http://www.w3.org/TR/owl2-primer/

Defining Classes - Enumeration

Use owl:oneOf to enumerate the members of a class

In Manchester Syntax

Class: MonopolyToken
    EquivalentTo: {Battleship, Boot, Car, Dog, Thimble, Top_Hat, Wheelbarrow, Iron}
    SubClassOf: Thing

Defining Classes - Restrictions

• owl:Restriction creates a class defined using an object property and either:
  – a value constraint, which places a constraint on the range of the property when applied to this particular class
    • e.g. the rdfs:range of the is_borne_by property might be plant, but when defining apple we would constrain the range to the class of apple trees
  – a cardinality constraint, which places a constraint on the number of values a property can take in the context of a particular class
    • e.g. there can be no more than 8 players in a game of Monopoly

Additional Inferences Gained Through Restrictions

Without a restriction all that can be inferred about an improved property is that it must also be a property

Class: MonopolyImprovedProperty
    SubClassOf: MonopolyProperty

Adding a restriction adds the information that an improved property must be a property and that it must be the location of some building

Class: MonopolyImprovedProperty
    EquivalentTo: location_of some MonopolyBuilding
    SubClassOf: MonopolyProperty
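
Because the restriction is stated with EquivalentTo, it works in both directions: any individual known to be the location of some building can be classified as an improved property. The sketch below imitates that single inference over asserted facts only. It is an illustration under simplifying assumptions, not a description of how an OWL reasoner works; the individuals Virginia_Place and House_1 and the helper names are hypothetical.

```python
# Asserted class memberships and property values (illustrative data).
types = {
    "House_1": {"MonopolyBuilding"},
    "Virginia_Place": set(),  # no class asserted for this individual
}
location_of = {"Virginia_Place": {"House_1"}}


def satisfies_some(individual, prop, filler_class):
    """True if the individual has at least one asserted value of prop
    that is a member of filler_class (the 'some' restriction)."""
    values = prop.get(individual, set())
    return any(filler_class in types.get(v, set()) for v in values)


def classify_improved(individual):
    """Apply 'EquivalentTo: location_of some MonopolyBuilding' as a
    sufficient condition, then the SubClassOf axiom as a necessary one."""
    if satisfies_some(individual, location_of, "MonopolyBuilding"):
        member_of = types.setdefault(individual, set())
        member_of.add("MonopolyImprovedProperty")
        member_of.add("MonopolyProperty")
    return types.get(individual, set())


print(classify_improved("Virginia_Place"))
```

With only the SubClassOf version of the definition, nothing could be concluded about Virginia_Place, because a necessary condition alone never adds an individual to a class.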

rdfs:subClassOf vs. owl:equivalentClass

[Figure: two diagrams built around the statement “Virginia Place is the location of House 1”. In the first, “improved property” is a subclass of “property that is the location of a building”. In the second, “improved property” is an equivalent class of “property that is the location of a building”. Each diagram contains a question mark.]

owl:allValuesFrom vs. owl:someValuesFrom

• owl:allValuesFrom constrains the object property so that all of its values must come from the specified class or data range
  – Example: A mortgaged property is one such that it is owned only by the bank

• owl:someValuesFrom constrains the object property so that at least one of its values must come from the specified class or data range
  – Example: An improved property is the location of some building

owl:hasValue

• The owl:hasValue constraint limits a property to a given value, which can be either an individual or a data value. For example, we could use this constraint to assert that all Monopoly railroads have a price of 200.

Class: MonopolyRailroad
    SubClassOf: has_price value 200, MonopolyProperty

• Given a resource that is a Monopoly Railroad, a reasoner will infer that its price is 200.

[Figure: game1:ReadingRailroad is rdf:type mnply:MonopolyRailroad, and mnply:MonopolyRailroad is rdfs:subClassOf the restriction mnply:has_price = 200; an edge mnply:has_price links game1:ReadingRailroad to 200.]
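
The railroad inference can be sketched as a lookup from a class to the property value that all of its members must have. This is a simplified illustration; the dictionary `class_has_value` and the function `infer_has_value` are hypothetical names.

```python
# Each class maps to the (property, value) pair fixed by its hasValue restriction.
class_has_value = {"mnply:MonopolyRailroad": ("mnply:has_price", 200)}


def infer_has_value(triples):
    """For every individual typed with a restricted class, return the
    property assertion that the hasValue restriction implies."""
    inferred = set()
    for s, p, o in triples:
        if p == "rdf:type" and o in class_has_value:
            prop, value = class_has_value[o]
            inferred.add((s, prop, value))
    return inferred


facts = [("game1:ReadingRailroad", "rdf:type", "mnply:MonopolyRailroad")]
inferred = infer_has_value(facts)
print(inferred)
```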

owl:hasValue

• To define the class of New York City building we can use owl:hasValue on the property of located_in and the individual NewYorkCity

Class: NewYorkCityBuilding
    SubClassOf: located_in value NewYorkCity, Building

• Given a resource that is a New York City building, a reasoner will infer that its location is New York City.

[Figure: example:EmpireStateBuilding is rdf:type example:NewYorkCityBuilding, and example:NewYorkCityBuilding is rdfs:subClassOf the restriction example:located_in NYC; an edge example:located_in links example:EmpireStateBuilding to example:NewYorkCity.]

Cardinality Constraints

• Useful in expressing that a class has an exact number of relationships to another class or data range.

Example: A turn has exactly one player as a participant and exactly one integer as its ordinal value

Class: MonopolyTurn
    Annotations: rdfs:label "Monopoly turn"^^xsd:string
    SubClassOf:
        has_ordinal_value exactly 1 xsd:integer,
        has_participant exactly 1 MonopolyPlayer,
        occurs_containing some MonopolyRollOfDice,
        occurs_during some MonopolyRound,
        MonopolyEvent

Cardinality Constraints

• Can also express that the number of instances of a given relationship between a class and another class or data range can span a range of values

Example: A color group can have between 2 and 3 properties as members.

Class: MonopolyColorGroup
    SubClassOf:
        owl:Thing,
        (has_member min 2 MonopolyProperty) and (has_member max 3 MonopolyProperty)

Set Operators

• owl:intersectionOf - a class is formed from the individuals that are common to two or more classes

• owl:unionOf – a class is formed from the individuals that are in any of two or more classes

• owl:complementOf – a class is formed from the individuals that are not members of a class
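
The three operators correspond to ordinary set operations on the individuals that belong to each class. The sketch below uses Python sets with invented individuals and classes. It assumes a small, fully known universe of individuals, which is needed to compute a complement by enumeration and is a simplification that a real OWL reasoner does not make.

```python
# Hypothetical individuals and class memberships, invented for illustration.
universe = {"Boardwalk", "Park_Place", "Reading_Railroad", "Electric_Company"}
street = {"Boardwalk", "Park_Place"}
railroad = {"Reading_Railroad"}
mortgaged = {"Park_Place", "Reading_Railroad"}

# owl:intersectionOf: individuals common to both classes
mortgaged_street = street & mortgaged

# owl:unionOf: individuals in either class
street_or_railroad = street | railroad

# owl:complementOf: individuals that are not members of the class,
# computed here relative to the assumed closed universe
not_mortgaged = universe - mortgaged

print(mortgaged_street, street_or_railroad, not_mortgaged)
```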

owl:equivalentClass and owl:disjointWith

• owl:equivalentClass establishes that two classes have the same instances
  – this is similar to owl:sameAs, which establishes that two classes have the same intension

• owl:disjointWith establishes that two classes have no members in common

Defining Properties - Subtypes

• Object Property – used to link individuals to individuals

• Datatype Property – used to link individuals to data values

• Annotation Property – used to link ontology elements to metadata

Defining Properties – Relations to Other Properties

• owl:equivalentProperty – behaves similarly to owl:equivalentClass: two properties are equivalent if and only if they have the same members (i.e. they have the same extension)

• owl:inverseOf – if x is related to y by property A and property A is the inverse of property B, then y is related to x by property B

Defining Properties – Cardinality Constraints

• owl:FunctionalProperty is used to place a uniqueness constraint on the value of the range of a property for each value in the domain of that property: each subject can have at most one value for the property.

[Figure: game1:MonopolyPlayer_1 is linked by mnply:is_represented_by to both game1:MonopolyTokenRailroad_1 and game1:MonopolyTokenRailroad_2; an owl:sameAs edge connects the two tokens.]
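
A functional property does not reject a second value. Since unique names are not assumed, a reasoner concludes instead that the two values name the same individual. The sketch below imitates that conclusion; `infer_same_as` is a hypothetical helper written for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def infer_same_as(triples, functional_props):
    """If a subject has several values for a functional property,
    return owl:sameAs triples between those values."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p in functional_props:
            values[(s, p)].add(o)
    same = set()
    for objs in values.values():
        # Pair up every two distinct values recorded for the same subject.
        for a, b in combinations(sorted(objs), 2):
            same.add((a, "owl:sameAs", b))
    return same


facts = [
    ("game1:MonopolyPlayer_1", "mnply:is_represented_by", "game1:MonopolyTokenRailroad_1"),
    ("game1:MonopolyPlayer_1", "mnply:is_represented_by", "game1:MonopolyTokenRailroad_2"),
]
same = infer_same_as(facts, {"mnply:is_represented_by"})
print(same)
```

Grouping by object instead of by subject would give the corresponding sketch for an inverse functional property.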

Defining Properties – Cardinality Constraints

• owl:InverseFunctionalProperty is used to place a uniqueness constraint on the value of the domain of a property for each value in the range of that property: each value can belong to at most one subject.

[Figure: game1:MonopolyPlayer_1 and game1:MonopolyPlayer_2 are both linked by mnply:is_represented_by to game1:MonopolyTokenRailroad_1; an owl:sameAs edge connects the two players.]

Defining Properties – Logical Characteristics

• Symmetric Property – P is a symmetric property if aPb implies bPa

• Asymmetric Property – P is an asymmetric property if aPb implies not bPa

• Reflexive Property – P is a reflexive property if aPa holds for every a

• Irreflexive Property – P is an irreflexive property if aPa holds for no a

• Transitive Property – P is a transitive property if aPb and bPc imply aPc

A Few Examples

• The relationship of being adjacent to is symmetric
  – If Mediterranean Avenue is adjacent to Go, then Go is adjacent to Mediterranean Avenue

• The relationship of having a role is asymmetric
  – If Player_1 has the role of banker, then the role of banker does not have the role of Player_1

• The relationship of occurring prior to is transitive
  – If Round_1 occurs prior to Round_2 and Round_2 occurs prior to Round_3, then Round_1 occurs prior to Round_3
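
The symmetric and transitive examples can be reproduced by computing closures over pairs of individuals. The following Python sketch is an illustration only; the function names are hypothetical and each property is represented simply as a set of (a, b) pairs.

```python
def symmetric_closure(pairs):
    """Add (b, a) for every (a, b): if aPb then bPa."""
    return set(pairs) | {(b, a) for a, b in pairs}


def transitive_closure(pairs):
    """Repeatedly add (a, c) whenever (a, b) and (b, c) are present."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


adjacent_to = symmetric_closure({("Mediterranean_Avenue", "Go")})
occurs_prior_to = transitive_closure({("Round_1", "Round_2"), ("Round_2", "Round_3")})

print(adjacent_to)
print(occurs_prior_to)
```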

Multi-typed Properties

Properties can be typed by more than one of the logical characteristics

ObjectProperty: adjacent_to
    Annotations: rdfs:label "adjacent to"^^xsd:string
    Characteristics: Irreflexive, Symmetric
    Domain: MonopolyBoardSpace
    Range: MonopolyBoardSpace

A COUPLE OF IMPORTANT ASSUMPTIONS

No Assumption of Unique Names

• There is no assumption that two resources with different names represent distinct entities

• This holds for any type of resource: class, property, datatype or instance

Open World Assumption

• Some data management systems use the Closed World assumption, meaning that if a fact is not found among the data in the system, it is assumed to be false.
  – In a sales database, if the name “Steve Wozniak” does not appear in the customer table, then Mr. Wozniak is not a customer of that company

• In Semantic Web applications, the Open World assumption is used, meaning that if a fact is not found among a set of data it is not assumed to be false.
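
The difference between the two assumptions shows up in how a missing fact is answered. The sketch below is an illustration with a hypothetical customer table. It returns `None` to stand for "unknown" in the open-world case, which is one possible way to model the idea, not a feature of any Semantic Web tool.

```python
# Hypothetical customer table, invented for illustration.
customers = {"Ada Lovelace", "Grace Hopper"}


def is_customer_closed_world(name):
    """Closed world: a name missing from the data is taken to be a non-customer."""
    return name in customers


def is_customer_open_world(name):
    """Open world: a name missing from the data is unknown (None), not False."""
    return True if name in customers else None


print(is_customer_closed_world("Steve Wozniak"))
print(is_customer_open_world("Steve Wozniak"))
```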