An Introduction to RDF and the Semantic Web
Dr. Randy Kaplan
2
Resource Description Framework
RDF
The least understood standard to come from the W3C
May be the most powerful and the most important standard if the web is to achieve its potential
3
Resource Description Framework
Why RDF?
With HTML and XML we can swap our documents easily
No meaning is attached to them - they are just data
RDF addresses the problem of meaning in the data on the web
4
What We Need To Know
When we exchange data we need to know things like,
Who wrote the data
When was the data written
When was the data last updated
These pieces of data are not data per se but the data about the data or meta data
5
XML
Promised to deliver us from the unstructured data that makes up the Internet
XML brings structure to the data
Because HTML combined the appearance of the document with its content, the content was extremely hard to extract
XML separated content from presentation
6
XML
XML specifically dealt with the data of the content
<music genre="classical">
  <title>Eine Kleine Nacht Muzik</title>
  <composer>Mozart</composer>
  <key>E Flat</key>
  <tempo>2/4</tempo>
</music>
7
XML
We could convey some of the same information with different data
<document type="classical music">
  <name>Eine Kleine Nacht Muzik</name>
  <author>Mozart</author>
</document>
8
XML
What if we wanted to find all pieces of music composed by Mozart?
We would have to find all documents where the <composer> element had a value of ‘Mozart’.
We would also have to find all documents where the <author> element had a value of ‘Mozart’.
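A minimal Python sketch of this search, using the two example documents from the earlier slides. The function name `created_by` and the tuple `CREATOR_TAGS` are illustrative assumptions; the point is that every element name that might denote the creator has to be listed by hand.

```python
import xml.etree.ElementTree as ET

doc_a = """<music genre="classical">
  <title>Eine Kleine Nacht Muzik</title>
  <composer>Mozart</composer>
</music>"""

doc_b = """<document type="classical music">
  <name>Eine Kleine Nacht Muzik</name>
  <author>Mozart</author>
</document>"""

# Every element name that might denote the creator must be known in advance.
CREATOR_TAGS = ("composer", "author")

def created_by(xml_text, person):
    """Return True if any known creator element has the given value."""
    root = ET.fromstring(xml_text)
    return any(el.text == person
               for tag in CREATOR_TAGS
               for el in root.iter(tag))

matches = [created_by(doc, "Mozart") for doc in (doc_a, doc_b)]
```

A third document that used a different element name for the creator would be missed until that name was added to `CREATOR_TAGS`.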
9
XML
If another element were used to denote the creator of the music, that term would have to be searched for as well
To find all compositions written by Mozart without having to identify every element that designates the creator, the same term would have to be used everywhere to identify the creator
10
XML
This problem could also be solved by indicating that the term composer in one document means the same as written by in another document and created by in a third
This would be quite an undertaking though as it involves identifying all words and phrases in all languages having this meaning
11
Missing
Our ability to know that one or more terms mean the same thing is the thing that is missing from the Internet
If we can build this layer into the Internet, it will take the information to a fundamentally different level
12
Dublin Core
1995
Conference in Dublin, Ohio
Discussed issues of semantics
Agreed to a core set of themes common to all documents
Set of properties became known as the Dublin Core (DC) initiative
13
Dublin Core
Three of the core properties
DC.Title
DC.Creator
DC.Subject
The Dublin Core element set defines 15 core properties in all
14
Dublin Core
The Dublin Core can be applied to XML
<music genre="classical">
  <title>Eine Kleine Nacht Muzik</title>
  <Creator>Mozart</Creator>
  <key>E Flat</key>
  <tempo>2/4</tempo>
</music>
<document type="classical music">
  <name>Eine Kleine Nacht Muzik</name>
  <Creator>Mozart</Creator>
</document>
15
Dublin Core
Even though we now use the same element to identify the entity responsible for creating the music, we don’t know if the meaning of “Creator” is the same in both of these instances
The only way to be sure is to use a very precise mechanism to identify the element being used
16
Dublin Core
The Dublin Core can be applied to XML
<music genre="classical">
  <title>Eine Kleine Nacht Muzik</title>
  <dc:Creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mozart</dc:Creator>
  <key>E Flat</key>
  <tempo>2/4</tempo>
</music>
<document type="classical music">
  <name>Eine Kleine Nacht Muzik</name>
  <dc:Creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mozart</dc:Creator>
</document>
Now we can see that these elements refer to exactly the same concept
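A short Python sketch of why the namespace makes the two elements comparable. The standard library parser expands each qualified name to `{namespace-URI}local-name`, so the prefix itself is irrelevant. The prefix `x` in the second document and the function name `creators` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

doc_a = f"""<music genre="classical">
  <title>Eine Kleine Nacht Muzik</title>
  <dc:Creator xmlns:dc="{DC}">Mozart</dc:Creator>
</music>"""

# A different prefix (hypothetical) bound to the same namespace URI.
doc_b = f"""<document type="classical music">
  <name>Eine Kleine Nacht Muzik</name>
  <x:Creator xmlns:x="{DC}">Mozart</x:Creator>
</document>"""

def creators(xml_text):
    """Return the values of all Dublin Core Creator elements."""
    root = ET.fromstring(xml_text)
    # ElementTree names a namespaced element "{uri}local".
    return [el.text for el in root.iter(f"{{{DC}}}Creator")]

result_a = creators(doc_a)
result_b = creators(doc_b)
```

Both calls find the creator with a single query, because both elements expand to the same URI-qualified name.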
17
CD Database
Suppose you keep a small database of CDs on your computer
There is a table in the database as below
Primary Key | Album Name | Artist
1 | The Ecleftic: Two Sides II a Book | Wyclef Jean
2 | Eine Kleine Nacht Muzik | Mozart
3 | Soultrane | John Coltrane
4 | The Real | Eminem
18
Another CD Database
There is a second database kept by another person who has a CD collection
A table in the database is shown below
Key | Title | Performer
1 | Eine Kleine Nacht Muzik | Mozart
2 | The Ecleftic | Wyclef Jean
3 | Kind of Blue | Miles Davis
19
Comparing Databases
Exchanging Information
If we wanted to share information there would be a problem since the column names are different
The same solution we used in the XML can be used in the database - the unique identifier
20
CD Database
The first database, with its columns relabeled using Dublin Core URIs
PrimaryKey | http://purl.org/dc/elements/1.1/Title | http://purl.org/dc/elements/1.1/Creator
1 | The Ecleftic: Two Sides II a Book | Wyclef Jean
2 | Eine Kleine Nacht Muzik | Mozart
3 | Soultrane | John Coltrane
4 | The Real | Eminem
21
Another CD Database
There is a second database kept by another person who has a CD collection
A table in the database is shown below
Key | http://purl.org/dc/elements/1.1/Title | http://purl.org/dc/elements/1.1/Creator
1 | Eine Kleine Nacht Muzik | Wolfgang Amadeus Mozart
2 | The Ecleftic | Wyclef Jean
3 | Kind of Blue | Miles Davis
22
URIs
Uniform Resource Identifiers (URIs) give us a way to ensure that a column of data means the same thing in different databases, so long as the column is labeled with the same URI
23
Other Problems
Unfortunately when we look at the databases we notice some other problems
PrimaryKey | http://purl.org/dc/elements/1.1/Title | http://purl.org/dc/elements/1.1/Creator
1 | The Ecleftic: Two Sides II a Book | Wyclef Jean
2 | Eine Kleine Nacht Muzik | Mozart
3 | Soultrane | John Coltrane
4 | The Real | Eminem

Key | http://purl.org/dc/elements/1.1/Title | http://purl.org/dc/elements/1.1/Creator
1 | Eine Kleine Nacht Muzik | Wolfgang Amadeus Mozart
2 | The Ecleftic | Wyclef Jean
3 | Kind of Blue | Miles Davis
24
Other Problems
Problem 1
Albums which may be the same have different names
Problem 2
Different names are used to denote the same composers
25
Taxonomies
These problems can be solved through the use of taxonomy
A taxonomy is a -
Controlled vocabulary of words
Usually about a constrained topic
Unique identifiers are key to developing taxonomies
26
Taxonomies
If we were to devise a controlled classification list to tell which CDs belong to which genre, we would avoid problems like having one CD labeled as classical and another CD labeled as classic
27
Taxonomies
CD Taxonomy
Jazz
Classical
Soul
Pop
Hip Hop
Folk
28
Taxonomies
We are not limited to taxonomies of music
We could have type of performance, i.e., play, movie, live performance, etc.
29
Moving the Problem
We really didn’t solve the problem we described earlier
We only moved the problem up a level
We now have the problem with having more than one taxonomy for the same thing
30
Moving the Problem
Consider
http://taxonomies.org/Plays/PorgyAndBess
http://taxonomies.org/Albums/PorgyAndBess
We do not know whether the PorgyAndBess in the first reference is the same as the PorgyAndBess in the second reference
31
We Need An Authority Figure
Let us imagine that there is some authority that keeps track of all CDs that are released
This is similar to books and their ISBN numbers which are unique
We will call the fictitious authority MuzicBiz.org
MuzicBiz.org maintains a central database of CDs that have been released
32
Tables Now ...
Key | http://purl.org/dc/elements/1.1/Title | http://purl.org/dc/elements/1.1/Creator
http://MuzicBiz.org/Album/1011234 | Eine Kleine Nacht Muzik | Wolfgang Amadeus Mozart
http://MuzicBiz.org/Album/7655432 | The Ecleftic | Wyclef Jean
http://MuzicBiz.org/Album/8997654 | Kind of Blue | Miles Davis

Key | http://ebiz.org/Stock | http://ebiz.org/Cost
http://MuzicBiz.org/Album/1011234 | 5 | $16.00
http://MuzicBiz.org/Album/7655432 | 4 | $19.00
http://MuzicBiz.org/Album/8997654 | 10 | $12.00
33
Unique Identifiers
Since we are guaranteed that these identifiers ALWAYS refer to the same CD any table row having a specific key will ALWAYS refer to the same CD - there is NO reason to doubt this
Data validity is enforced
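As a rough illustration, two tables that share the same URI keys can be combined without any guesswork about which rows belong together. The Python sketch below uses the MuzicBiz.org keys and column URIs from the slides; the function name `merge` and the dictionary layout are assumptions.

```python
DC = "http://purl.org/dc/elements/1.1/"
EBIZ = "http://ebiz.org/"

# Catalogue table: key URI -> {column URI: value}
catalogue = {
    "http://MuzicBiz.org/Album/1011234": {
        DC + "Title": "Eine Kleine Nacht Muzik",
        DC + "Creator": "Wolfgang Amadeus Mozart"},
    "http://MuzicBiz.org/Album/7655432": {
        DC + "Title": "The Ecleftic",
        DC + "Creator": "Wyclef Jean"},
    "http://MuzicBiz.org/Album/8997654": {
        DC + "Title": "Kind of Blue",
        DC + "Creator": "Miles Davis"},
}

# Stock table kept by a different party, keyed by the same URIs.
stock = {
    "http://MuzicBiz.org/Album/1011234": {EBIZ + "Stock": 5, EBIZ + "Cost": "$16.00"},
    "http://MuzicBiz.org/Album/7655432": {EBIZ + "Stock": 4, EBIZ + "Cost": "$19.00"},
    "http://MuzicBiz.org/Album/8997654": {EBIZ + "Stock": 10, EBIZ + "Cost": "$12.00"},
}

def merge(*tables):
    """Combine rows from several tables that use the same key URIs."""
    merged = {}
    for table in tables:
        for key, row in table.items():
            merged.setdefault(key, {}).update(row)
    return merged

combined = merge(catalogue, stock)
kind_of_blue = combined["http://MuzicBiz.org/Album/8997654"]
```

Because both the keys and the column names are URIs, no mapping between local names is needed before merging.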
34
Meta-Data
Meta-Data
Data that describes data
Creator, Type, Date are all kinds of meta-data
So far the meta-data we have described consists of two values - an attribute name and an attribute value
35
Meta-Data
To be precise we need to add one more piece of meta-data to complete any meta-data we might have
Since many different things can have a Creator with the value Mozart, we need to identify what Mozart is the creator of - the so-called DOCUMENT
36
Triples
The combination of Source, Attribute name, and Value makes what is called in the RDF-biz a TRIPLE and that constitutes a fundamental element in RDF
37
Transporting Triples
We will assume the following -
Meta-data can be expressed as a set of triples
Key to sharing meta-data is the URI
Now given that we accept this representation, the next challenge is to decide how we will share this information (transport)
38
Sharing Meta-Data and Data
The database contains the information as organized in the table below
We need to transform this data into the accepted form, i.e., triples
Key | http://ebiz.org/Stock | http://ebiz.org/Cost
http://MuzicBiz.org/Album/1011234 | 5 | $16.00
http://MuzicBiz.org/Album/7655432 | 4 | $19.00
http://MuzicBiz.org/Album/8997654 | 10 | $12.00
39
Sharing Data and Meta-Data
Document | Name | Value
http://MuzicBiz.org/Album/1011234 | http://ebiz.org/Stock | 5
http://MuzicBiz.org/Album/1011234 | http://ebiz.org/Cost | $16.00
http://MuzicBiz.org/Album/7655432 | http://ebiz.org/Stock | 4
http://MuzicBiz.org/Album/7655432 | http://ebiz.org/Cost | $19.00
http://MuzicBiz.org/Album/8997654 | http://ebiz.org/Stock | 10
http://MuzicBiz.org/Album/8997654 | http://ebiz.org/Cost | $12.00
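The transformation from table rows to triples can be sketched in a few lines of Python. The function name `to_triples` and the row layout (one dictionary per row, with the key column named `Key`) are assumptions; the data is the stock table from the slides.

```python
rows = [
    {"Key": "http://MuzicBiz.org/Album/1011234",
     "http://ebiz.org/Stock": "5", "http://ebiz.org/Cost": "$16.00"},
    {"Key": "http://MuzicBiz.org/Album/7655432",
     "http://ebiz.org/Stock": "4", "http://ebiz.org/Cost": "$19.00"},
    {"Key": "http://MuzicBiz.org/Album/8997654",
     "http://ebiz.org/Stock": "10", "http://ebiz.org/Cost": "$12.00"},
]

def to_triples(rows, key_column="Key"):
    """Turn each non-key cell into a (document, name, value) triple."""
    triples = []
    for row in rows:
        document = row[key_column]
        for name, value in row.items():
            if name != key_column:
                triples.append((document, name, value))
    return triples

triples = to_triples(rows)
```

Each row with two data columns yields two triples, so the three rows produce the six triples listed in the Document / Name / Value table.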
40
Sharing Data and Meta-Data
We have adequately represented the meta-data and it is “ready” for transport via XML
But this table only represents the meta-data and does not relate to any data described by it
41
Sharing Data and Meta-Data
We need a way to identify the document that the meta-data describes
For this purpose we add a name/value pair that names the URL of the document
42
Sharing Data and Meta-Data
<document type="News Item" url="http://www.ePolitix.com/Articles/0000005a4787.htm" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:Title>I will stand says Portillo</dc:Title>
  <dc:Creator>Craig Hoy</dc:Creator>
  <dc:Subject>Tory leadership contest</dc:Subject>
</document>
43
RDF: Model and Syntax
RDF Model
In this case the model we are speaking of is the set of triples
The definition of RDF is representation independent
This means that XML is only one way of writing RDF
44
RDF Terminology
In RDF terminology a STATEMENT is used to describe a triple
This term arises from using a triple to make a statement about a document
45
RDF Terminology
Triples
Resources and Properties
In the RDF specification the name part of the name/value pair is regarded as a PROPERTY
The subject of the meta data is regarded as a RESOURCE
46
RDF Terminology
Triples
A triple is the combination of the three parts - a resource with a property and a value
47
RDF Terminology
A triple can express a relationship between resources
Resource | Property | Value
http://MuzicBiz.org/Albums/7655432 | http://MuzicBiz.org/Prop/Track | http://MuzicBiz.org/Tracks/1667653
[http://MuzicBiz.org/Albums/7655432] --Track--> [http://MuzicBiz.org/Tracks/1667653]
48
RDF Terminology
In the terminology of this model, the album is the SUBJECT of our statement and the track is the OBJECT
The two resources are joined by a PREDICATE
The predicate specifies the nature of the relationship between the two resources
[http://MuzicBiz.org/Albums/7655432] --Track--> [http://MuzicBiz.org/Tracks/1667653]
49
RDF Terminology
Notation
When writing about RDF it is useful to be able to show statements or sets of triples for discussion
50
Notation
English
English is simplest
Craig Hoy is the author of http://www.ePolitix.com/Articles/0000005a4787.htm
51
Notation
SUBJECT has a PREDICATE of OBJECT
Example
http://www.ePolitix.com/Articles/0000005a4787.htm has an author of Craig Hoy
52
Notation
Directed labeled graphs
[.../Articles/000000005a4787.htm] --author--> [Craig Hoy]
53
Notation
Three parts of a triple
{[http://MuzicBiz.org/Review],[http://MuzicBiz.org/Albums/101234],“A relaxing album to prune to.”}
{[http://MuzicBiz.org/Review],[http://MuzicBiz.org/Albums/7655432],“Lively! Perfect when mowing the lawn.”}
{[http://MuzicBiz.org/Review],[http://MuzicBiz.org/Albums/8997654],“Very moody. Great when planning your next planting.”}
54
Notation
Complex sets of data can most compactly be represented in a graph
[Figure: graph with nodes .../Articles/000000005a4787.htm, .../Authors/Craig%20Hoy, .../companynumber/3935644, and Editor; arcs labeled <dc:Creator>, <dc:Publisher>, <dc:Creator>, and <xyz:JobTitle>]
55
RDF Syntax
So far we’ve seen how RDF models meta data
Now we need to look at how these models are expressed in XML
56
RDF/XML
57
How is a statement formed?
Statement begins -
Reference to the resource that the statement is about (SUBJECT)
This is in the rdf:about attribute of the <rdf:Description> element
58
How is a statement formed?
The statement is located inside the <rdf:Description> element
Says there is a property of this resource - dc:Creator that has a value of “Craig Hoy”
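A statement of this shape can be written in RDF/XML and read back as a triple with a few lines of Python. This is a minimal sketch that handles only the simplest form (an `<rdf:Description>` with literal-valued child elements), not a full RDF parser; the function name `statements` is an assumption.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_xml = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ePolitix.com/Articles/0000005a4787.htm">
    <dc:Creator>Craig Hoy</dc:Creator>
  </rdf:Description>
</rdf:RDF>"""

def statements(xml_text):
    """Return (subject, predicate, object) for each simple property element."""
    result = []
    for desc in ET.fromstring(xml_text).iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")          # the SUBJECT
        for prop in desc:
            # ElementTree names elements "{ns}local"; the predicate URI
            # is the namespace followed by the local name.
            predicate = prop.tag[1:].replace("}", "", 1)
            result.append((subject, predicate, prop.text))
    return result

found = statements(rdf_xml)
```

The single triple has the article URL as subject, `http://purl.org/dc/elements/1.1/Creator` as predicate, and the string Craig Hoy as object.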
59
Many Namespaces
When there are many namespaces to be defined in an RDF document grouping them in one place makes them stand out
60
RDF Elements
<rdf:Description> Element
Contains the URI for the resource being described
The <rdf:Description> element identifies the subject
A child element defines a predicate/object pair
61
<rdf:Description>
More detail about this element -
Multiple properties for the same resource
String literals and resource URI’s
Nesting statements
rdf:about attribute
62
<rdf:Description>
More detail about this element -
The rdf:ID attribute
Anonymous resources
The rdf:type attribute
63
<rdf:Description>
The <rdf:Description> element is actually a container for as many predicate/object pairs as you want
[.../Articles/000000005a4787.htm] --dc:Creator--> [Craig Hoy]
[.../Articles/000000005a4787.htm] --dc:Publisher--> [ePolitix]
64
<rdf:Description>
One or more properties may be specified for the same resource
65
<rdf:Description>
An alternative syntax
Attributes take the place of child elements
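The two syntaxes can be compared side by side. In the Python sketch below, the long form uses child elements and the short form uses attributes on `<rdf:Description>`; the small reader (`triples`, an assumed name, covering only these two forms) shows that both yield the same set of triples.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = ('xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
      'xmlns:dc="http://purl.org/dc/elements/1.1/"')
ARTICLE = "http://www.ePolitix.com/Articles/0000005a4787.htm"

long_form = f"""<rdf:RDF {NS}>
  <rdf:Description rdf:about="{ARTICLE}">
    <dc:Creator>Craig Hoy</dc:Creator>
    <dc:Publisher>ePolitix</dc:Publisher>
  </rdf:Description>
</rdf:RDF>"""

short_form = f"""<rdf:RDF {NS}>
  <rdf:Description rdf:about="{ARTICLE}"
                   dc:Creator="Craig Hoy" dc:Publisher="ePolitix"/>
</rdf:RDF>"""

def uri(name):
    """Convert ElementTree's "{ns}local" form to the full URI."""
    return name[1:].replace("}", "", 1)

def triples(xml_text):
    out = set()
    for desc in ET.fromstring(xml_text).iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        # Non-RDF attributes are properties with literal values.
        for name, value in desc.attrib.items():
            if not name.startswith(f"{{{RDF}}}"):
                out.add((subject, uri(name), value))
        # Child elements are properties too.
        for child in desc:
            out.add((subject, uri(child.tag), child.text))
    return out

same = triples(long_form) == triples(short_form)
```

The attribute form only works for string literal values, which is why the child-element form remains the general one.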
66
<rdf:Description>
So that a resource is not confused with a string literal, there is an RDF attribute, rdf:resource, that marks the value as a resource
67
<rdf:Description>
Supposing we wanted to add some information to the description
[Figure: graph with nodes ../Articles/000000005a4787.htm, ../companynumber/3935644.htm, ../Authors/Craig%20Hoy, and Editor; arcs labeled <dc:Publisher> and <dc:Creator>, plus one arc whose label is missing]
68
<rdf:Description>
One way to code this in RDF is to simply add a statement that contains the new information
69
<rdf:Description>
RDF allows for the <rdf:Description> element to be nested
70
<rdf:Description>
Both representations are correct and the underlying model is the same in both cases
Which to use depends on context
If there are many articles, the nested information would be repeated
Therefore the first representation would be preferable in this case
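The claim that both representations share one underlying model can be checked with a small sketch. The Python below encodes the author's job title once as a separate statement and once as a nested `<rdf:Description>`, then compares the resulting triples. The `xyz` namespace URI is a hypothetical placeholder, and the reader handles only the constructs used here.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = ('xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
      'xmlns:dc="http://purl.org/dc/elements/1.1/" '
      'xmlns:xyz="http://example.org/xyz#"')   # hypothetical namespace
ARTICLE = "http://www.ePolitix.com/Articles/0000005a4787.htm"
AUTHOR = "http://www.ePolitix.com/Authors/Craig%20Hoy"

separate = f"""<rdf:RDF {NS}>
  <rdf:Description rdf:about="{ARTICLE}">
    <dc:Creator rdf:resource="{AUTHOR}"/>
  </rdf:Description>
  <rdf:Description rdf:about="{AUTHOR}">
    <xyz:JobTitle>Editor</xyz:JobTitle>
  </rdf:Description>
</rdf:RDF>"""

nested = f"""<rdf:RDF {NS}>
  <rdf:Description rdf:about="{ARTICLE}">
    <dc:Creator>
      <rdf:Description rdf:about="{AUTHOR}">
        <xyz:JobTitle>Editor</xyz:JobTitle>
      </rdf:Description>
    </dc:Creator>
  </rdf:Description>
</rdf:RDF>"""

def uri(tag):
    return tag[1:].replace("}", "", 1)

def describe(desc, out):
    """Collect triples for one description; return its subject URI."""
    subject = desc.get(f"{{{RDF}}}about")
    for prop in desc:
        resource = prop.get(f"{{{RDF}}}resource")
        inner = prop.find(f"{{{RDF}}}Description")
        if resource is not None:
            value = resource
        elif inner is not None:
            value = describe(inner, out)    # nested statement
        else:
            value = prop.text
        out.add((subject, uri(prop.tag), value))
    return subject

def triples(xml_text):
    out = set()
    for desc in ET.fromstring(xml_text):    # top-level descriptions
        describe(desc, out)
    return out

same_model = triples(separate) == triples(nested)
```

Both documents produce the same two triples; only the XML layout differs.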
71
<rdf:Description>
Attributes
We know about the rdf:about attribute
The contents of the rdf:about attribute are a URI
72
<rdf:Description>
rdf:ID
This attribute allows a resource in a document to be named and then referred to with this name
The ID attribute and the about attribute are mutually exclusive - only one or the other can be used
73
rdf:ID
74
Anonymous Resources
An option for the <rdf:Description> element would be to NOT specify an rdf:about or rdf:ID attribute
This would be the way to introduce anonymous resources as part of an RDF description
The description element would exist for no other reason than to be given properties
75
Anonymous Resources
76
Anonymous Resources
<dc:creator>
<v.name>
<v.email>
<xyz:jobtitle>
../Articles/000000005a4787.htm
Craig Hoy
editor
77
Anonymous Resources
Back to Mozart
Assume that some authority has given the piece “Eine Kleine Nacht Muzik” the URL:
http://MuzikBiz.org/233456
We can also give this piece of music a code from the Dewey Decimal Classification
781.68
78
Anonymous Resources
The resulting statement describing this would be:
http://MuzikBiz.org/233456 has a dc:Subject of 781.68
The RDF is shown following
79
Anonymous Resources
80
Anonymous Resources
If we want now to identify the source of this classification we can do so with the RDF value tag (shown following)
81
Anonymous Resources
82
Anonymous Resources
When representing an anonymous resource like this one, we know that there is some resource we are representing, we just don’t know how to name it
This is why we introduce an <rdf:Description> element into the RDF without an rdf:about attribute
The result is a graph with an anonymous node
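A sketch of how a reader might handle such a description: when `<rdf:Description>` carries no rdf:about, the code below invents a local label of the form `_:b1` for the anonymous node. The `xyz` namespace URI and the function names are hypothetical; the album URL and the values 781.68 and DDC come from the slides.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_xml = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xyz="http://example.org/xyz#">
  <rdf:Description rdf:about="http://MuzikBiz.org/233456">
    <dc:Subject>
      <rdf:Description>
        <rdf:value>781.68</rdf:value>
        <xyz:Classification>DDC</xyz:Classification>
      </rdf:Description>
    </dc:Subject>
  </rdf:Description>
</rdf:RDF>"""

def uri(tag):
    return tag[1:].replace("}", "", 1)

def describe(desc, out, ids):
    """Collect triples; a description without rdf:about gets a generated label."""
    subject = desc.get(f"{{{RDF}}}about") or f"_:b{next(ids)}"
    for prop in desc:
        inner = prop.find(f"{{{RDF}}}Description")
        value = describe(inner, out, ids) if inner is not None else prop.text
        out.append((subject, uri(prop.tag), value))
    return subject

def triples(xml_text):
    out, ids = [], itertools.count(1)
    for desc in ET.fromstring(xml_text):
        describe(desc, out, ids)
    return out

result = triples(rdf_xml)
anonymous = [o for s, p, o in result if p.endswith("Subject")][0]
```

The generated label is meaningful only inside this one graph; it is not a URI that other documents could refer to.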
83
Anonymous Resources
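[Figure: graph in which http://muzicBiz.com/233456 has a <dc:Subject> arc to an anonymous node; the anonymous node has an <rdf:value> arc to 781.68 and an <xya:Classification> arc to DDC]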
84
rdf:type Attribute
Applies to <rdf:Description>
Powerful
Links worlds of knowledge representation to object orientation (ooh ... aah)
Allows us to specify that the resource being referred to is of a particular class
Allows parsers to understand more about the meta data
85
rdf:type Attribute
Assume that an organization named the International Press Telecommunications Council (IPTC) is responsible for the XML format used in the ePolitix articles we have been using
86
rdf:type Attribute
IPTC has defined a URI that allows us to indicate that the article being referred to is in their NITF format
NITF refers to News Industry Text Format
This format is used widely to transfer news between organizations
87
rdf:type Attribute
The URL for all object types that belong to the NITF group of objects is something like -
http://www.iptc.org/schema/NITF#
88
rdf:type Attribute
This information could be used to enhance the RDF used in the ePolitix XML
89
rdf:type Attribute
Now the rdf:type attribute gives us a very powerful capability akin to one that we would find in object-oriented programming
Once we know that a particular resource is of a particular type, we can use that information to check its meta-data and ensure that only properties appropriate to that type are used
90
rdf:type Attribute
For example if we are referring to a person resource AND we have said that a person has a FORMAT then this is probably incorrect
(The dc:Format property is used to specify the MIME type of a document)
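The kind of check described here can be sketched over a list of triples. Everything below except the rdf:type, Dublin Core, and NITF URIs is hypothetical: the `Person` class URI, the job-title property, and the `ALLOWED` rule table are invented for illustration, and RDF itself does not define such a table.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC = "http://purl.org/dc/elements/1.1/"
NEWS_ARTICLE = "http://www.iptc.org/schema/NITF#NewsArticle"
PERSON = "http://example.org/types#Person"       # hypothetical class URI

# Hypothetical rule table: properties that make sense for each class.
ALLOWED = {
    NEWS_ARTICLE: {DC + "Creator", DC + "Format", DC + "Title"},
    PERSON: {"http://example.org/props#JobTitle"},
}

ARTICLE = "http://www.ePolitix.com/Articles/0000005a4787.htm"
AUTHOR = "http://www.ePolitix.com/Authors/Craig%20Hoy"

triples = [
    (ARTICLE, RDF_TYPE, NEWS_ARTICLE),
    (ARTICLE, DC + "Format", "text/xml"),
    (AUTHOR, RDF_TYPE, PERSON),
    (AUTHOR, DC + "Format", "text/html"),        # a person with a FORMAT
]

def suspicious(triples):
    """Return (subject, property) pairs not allowed for the subject's type."""
    types = {s: o for s, p, o in triples if p == RDF_TYPE}
    return [
        (s, p) for s, p, o in triples
        if p != RDF_TYPE and s in types
        # Unknown types are not checked: the default set contains p itself.
        and p not in ALLOWED.get(types[s], {p})
    ]

flagged = suspicious(triples)
```

Only the person resource is flagged, because dc:Format is allowed for the news article but not for a person in this rule table.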
91
rdf:type Attribute
But we know nothing more at this point about the resource
http://www.ePolitix.com/Authors/Craig%20Hoy
By specifying an rdf:type we can give the RDF processor more information
92
rdf:type Attribute
93
Typed Elements
An alternative syntax for expressing the same type information is known as TYPED ELEMENTS
In this notation the resource that would be used in the rdf:type attribute would be turned into a namespace qualified element
94
Typed Elements
We assumed a namespace prefix for objects created by the IPTC for their NITF standard
The namespace prefix was:
http://www.iptc.org/schema/NITF
It is now possible to create object types or references to schemas by specifying a URI as in:
http://www.iptc.org/schema/NITF#NewsArticle
95
Typed Elements
By assigning the namespace that was just defined to a prefix, and using the class name as the name of an element, the <rdf:Description> element can be replaced
<rdf:type rdf:resource="http://www.iptc.org/schema/NITF#NewsArticle"/>
==>
<rdf:RDF xmlns:nitf="http://www.iptc.org/schema/NITF#"
96
Typed Elements
becomes
Namespace definitions
97
Typed Elements
This feature is very important to RDF
Anything which can appear in an RDF description tag,
is valid when used as a typed element
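A sketch of the equivalence: the Python below reads one document that uses `<rdf:Description>` with an explicit rdf:type property and one that uses a typed element, and compares the triples. The reader is deliberately minimal and the function names are assumptions; the NITF class URI is the one used in the slides.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = ('xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
      'xmlns:dc="http://purl.org/dc/elements/1.1/" '
      'xmlns:nitf="http://www.iptc.org/schema/NITF#"')
ARTICLE = "http://www.ePolitix.com/Articles/0000005a4787.htm"

with_description = f"""<rdf:RDF {NS}>
  <rdf:Description rdf:about="{ARTICLE}">
    <rdf:type rdf:resource="http://www.iptc.org/schema/NITF#NewsArticle"/>
    <dc:Creator>Craig Hoy</dc:Creator>
  </rdf:Description>
</rdf:RDF>"""

with_typed_element = f"""<rdf:RDF {NS}>
  <nitf:NewsArticle rdf:about="{ARTICLE}">
    <dc:Creator>Craig Hoy</dc:Creator>
  </nitf:NewsArticle>
</rdf:RDF>"""

def uri(tag):
    return tag[1:].replace("}", "", 1)

def triples(xml_text):
    out = set()
    for node in ET.fromstring(xml_text):
        subject = node.get(f"{{{RDF}}}about")
        # A typed element implies an rdf:type statement for its subject.
        if node.tag != f"{{{RDF}}}Description":
            out.add((subject, RDF + "type", uri(node.tag)))
        for prop in node:
            value = prop.get(f"{{{RDF}}}resource") or prop.text
            out.add((subject, uri(prop.tag), value))
    return out

equivalent = triples(with_description) == triples(with_typed_element)
```

The element name `nitf:NewsArticle` expands to the same URI that the rdf:type property pointed to, so nothing is lost in the shorter form.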
98
Typed Elements
Observe the change to attributes
99
Typed Elements
Being able to do this allows you to extract data from existing XML documents in the form of triples
100
Property Elements
Property Information can be expressed through -
String literals
value for a predicate defined by the name of the element containing the literal
101
Property Elements
We have said a lot about the <rdf:Description> element so far
Recap
String Literals
Value for a predicate that is defined by the name of the element containing the literal
102
Property Elements
Example
103
Property Elements
Resources
Express properties of a resource
The value of the predicate is actually another resource
Use a URI to specify which resource it is
104
Property Elements
Example
105
Property Elements
Yet another way to accomplish this is to nest RDF statements one within another
This says that the value of the property <dc:Creator> is itself a resource
106
Property Elements
Type information can also be specified in the content of a property element
107
Property Elements
Taking a type resource and turning it into a namespace-qualified element name could abbreviate this
108
parseType=”Literal”
Sometimes it is necessary to tell the parser that it should NOT parse a particular part of the RDF
The RDF should be stored as is
Consider the following example
109
parseType=”Literal”
We are writing a mathematical paper entitled “Ramifications of (a+b)² to World Peace”
We would like to use MathML to specify the title since it can help us format the various symbols properly
If we place the MathML inside the <dc:Title> tag we need a way to tell the RDF parser that the MathML is not RDF
110
Ramifications of ...
The contents of this element are not simply a string
The text must be well-formed XML otherwise the parser will fail
111
parseType=”Resource”
There are times when the parser cannot tell the difference between a property value and a resource
Property values are usually inside an rdf:Description element
112
parseType=”Resource”
If this were all there is to it then all would be well. Unfortunately RDF allows us to make statements about the author as follows
113
parseType=”Resource”
What if all we wanted to do was to provide the email of the author?
We really don’t care about identifying the author
114
parseType=”Resource”
This still seems too elaborate
We could simply express this information as follows
115
parseType=”Resource”
So if we were to interpret this we would come up with two different interpretations making its meaning ambiguous
On the one hand if we evaluated the representation from the inside out we would have an anonymous <dc:Creator> element which has a <v:Email> property
116
parseType=”Resource”
[Figure: first interpretation, “inside out”: node ../Articles.0000005a4787.htm with arcs labeled <dc:Creator> and <v:email>]
117
parseType=”Resource”
If you interpret the RDF representation from the outside in you would say you had a resource of a web page that had a <dc:Creator> property and that this <dc:Creator> property refers to an anonymous resource of rdf:type v:Email
118
parseType=”Resource”
[Figure: second interpretation, “outside in”: node ../Articles.0000005a4787.htm with arcs labeled <dc:Creator>, <rdf:value>, and <rdf:type>; the type is http://www.vCard.org/Schemas#Email]
119
parseType=”Resource”
This second interpretation of the RDF/XML is the one that we would prefer but the parser cannot distinguish which of these two models it should create
The problem is that we need the <dc:Creator> element to be interpreted both as a property of the web page and as an anonymous resource to which properties can attach
120
parseType=”Resource”
RDF/XML does allow us to force the <dc:Creator> to be interpreted as
a predicate
an anonymous resource
121
parseType=”Resource”
which is exactly the same as specifying the anonymous resource explicitly
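The equivalence can be sketched as follows. One document marks `<dc:Creator>` with rdf:parseType="Resource"; the other spells out the anonymous `<rdf:Description>`. The email address is a made-up placeholder, the vCard namespace is the one shown in the earlier figure, and the reader below covers only these two constructs.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = ('xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
      'xmlns:dc="http://purl.org/dc/elements/1.1/" '
      'xmlns:v="http://www.vCard.org/Schemas#"')
ARTICLE = "http://www.ePolitix.com/Articles/0000005a4787.htm"

short_form = f"""<rdf:RDF {NS}>
  <rdf:Description rdf:about="{ARTICLE}">
    <dc:Creator rdf:parseType="Resource">
      <v:Email>craig.hoy@example.org</v:Email>
    </dc:Creator>
  </rdf:Description>
</rdf:RDF>"""

explicit_form = f"""<rdf:RDF {NS}>
  <rdf:Description rdf:about="{ARTICLE}">
    <dc:Creator>
      <rdf:Description>
        <v:Email>craig.hoy@example.org</v:Email>
      </rdf:Description>
    </dc:Creator>
  </rdf:Description>
</rdf:RDF>"""

def uri(tag):
    return tag[1:].replace("}", "", 1)

def triples(xml_text):
    out, ids = set(), itertools.count(1)

    def describe(node, subject):
        for prop in node:
            predicate = uri(prop.tag)
            inner = prop.find(f"{{{RDF}}}Description")
            if prop.get(f"{{{RDF}}}parseType") == "Resource":
                # The property element is both predicate and anonymous node.
                blank = f"_:b{next(ids)}"
                out.add((subject, predicate, blank))
                describe(prop, blank)
            elif inner is not None:
                blank = inner.get(f"{{{RDF}}}about") or f"_:b{next(ids)}"
                out.add((subject, predicate, blank))
                describe(inner, blank)
            else:
                out.add((subject, predicate, prop.text))

    for desc in ET.fromstring(xml_text):
        describe(desc, desc.get(f"{{{RDF}}}about"))
    return out

same_graph = triples(short_form) == triples(explicit_form)
```

In both cases the article has a dc:Creator arc to an anonymous node, and that node carries the v:Email property.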
122
Containers
Containers
list of resources
collection of resources
Example
List of articles that make up a web site
List of authors who have contributed to an article
123
Containers
RDF
Three types of containers
bag
sequence
alternative
can be used anywhere the <rdf:Description> element can be used
124
<rdf:Bag>
simplest container
used to contain multiple values for a property
no significance to the order of the values
125
<rdf:Bag>
Example
The elements in a bag may also be literals
126
<rdf:Seq>
Whereas a bag does not impose any order on the elements in the list that is associated with the element, <rdf:Seq> does require that the list attached to it will be in a specific order
127
<rdf:Seq>
Example
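One possible example, sketched in Python with the RDF/XML embedded as a string: an ordered list of authors for an article. Each `<rdf:li>` member of a container corresponds to a numbered membership property (rdf:_1, rdf:_2, ...), which is how the order is preserved. The second author URI and the function name `members` are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_xml = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ePolitix.com/Articles/0000005a4787.htm">
    <dc:Creator>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.ePolitix.com/Authors/Craig%20Hoy"/>
        <rdf:li rdf:resource="http://www.ePolitix.com/Authors/Second%20Author"/>
      </rdf:Seq>
    </dc:Creator>
  </rdf:Description>
</rdf:RDF>"""

def members(xml_text):
    """Return (membership property, value) pairs for the first rdf:Seq."""
    container = ET.fromstring(xml_text).find(f".//{{{RDF}}}Seq")
    return [
        # The n-th rdf:li stands for the property rdf:_n.
        (f"{RDF}_{n}", li.get(f"{{{RDF}}}resource") or li.text)
        for n, li in enumerate(container.findall(f"{{{RDF}}}li"), start=1)
    ]

ordered = members(rdf_xml)
```

With `<rdf:Bag>` the XML would look the same apart from the element name, but a consumer would be free to ignore the numbering.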
128
<rdf:Alt>
<rdf:Alt> provides us with a way to select one specific resource from a list of resources
In other words <rdf:Alt> provides a way of specifying alternative options
An RDF processor could choose a resource based on some desirable property
129
<rdf:Alt>
Example