Pragmatics of knowledge engineering on the Web
Guus Schreiber
Free University Amsterdam
Co-chair W3C Semantic Web Deployment WG
Overview
• Principles for ontology engineering on Web scale
– Some remarks about web standards
• RDF/OWL conversion issues
• SKOS: pragmatics of publishing Web vocabularies
– Context: W3C SWD Working Group
Principles for ontology engineering in a distributed world
1. Modesty principle
• Ontology engineers should refrain from developing their own idiosyncratic ontologies
• Instead, they should make existing rich vocabularies, thesauri and databases available in web format
• Initially, only add the originally intended semantics
2. Scale principle: “Think large!”
"Once you have a truly massive amount of information integrated as knowledge, then the human-software system will be superhuman, in the same sense that mankind with writing is superhuman compared to mankind before writing."
Doug Lenat
Applications require many ontologies
3. Pattern principle: don’t try to be too creative!
• Ontology engineering should not be an art but a discipline
• Patterns play a key role in methodology for ontology engineering
• See for example patterns developed by the W3C Semantic Web Best Practices group
http://www.w3.org/2001/sw/BestPractices/
SKOS: pattern for thesaurus modeling
• Based on ISO standard
• RDF representation
• Documentation: http://www.w3.org/TR/swbp-skos-core-guide/
• Base class: SKOS Concept
4. Enrichment principle
• Don’t modify, but add!
• Techniques:
– Learning ontology relations/mappings
– Semantic analysis, e.g. OntoClean
– Processing of scope notes in thesauri
Example enrichment
• Learning relations between art styles in AAT and artists in ULAN through NLP of art-historic texts
• But don’t learn things that already exist!
DERAIN, Andre: The Turning Road
MATISSE, Henri: Le Bonheur de vivre
Extracting additional knowledge from scope notes
Thesauri / vocabularies
• Large bodies of domain-specific knowledge that represent consensus in particular domains
• Typically weak semantic structure
• Often lots of implicit semantics available
• Representation is typically relational database and/or XML
• Semantic Web Challenge showed that thesauri are important resources for SW applications
WordNet: internal representation
s(108644031,1,'bed',n,3,2).
s(108644031,2,'bottom',n,5,1).
s(102719813,1,'bed',n,1,51).
s(102720436,1,'bed',n,2,3).
g(108644031,'(a depression forming the ground under a body of water; "he searched for treasure on the ocean bed")').
g(102719813,'(a piece of furniture that provides a place to sleep; "he sat on the edge of the bed"; "the room had only a bed and chair")').
g(102720436,'(a plot of ground in which plants are growing; "the gardener planted a bed of roses")').
SynsetID, Order, LexForm, Type, SenseNum
Synset 108644031
• 3rd sense of “bed” (noun)
• 5th sense of “bottom” (noun)
• a depression forming the ground under a body of water; "he searched for treasure on the ocean bed"
WordNet URIs
• What URIs should be chosen?
– SynSet, WordSense, Word
• URI name:
– ID? => difficult for human interpretation
– Concatenated unique, human readable
wn:synset-bank-noun-2 (first sense in synset denoted by second sense of “bank”)
wn:wordsense-bank-noun-1
wn:word-bank
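The naming scheme above can be sketched in Python as follows. This is a minimal illustration, not the converter used for the actual WordNet RDF: it assumes the synset URI is built from the word listed first in the synset (Order = 1), and the `POS` table beyond "noun" is a guess at how the other type codes would be spelled out. It also ignores words that contain escaped quotes.

```python
import re

# One Prolog fact: s(SynsetID, Order, 'LexForm', Type, SenseNum, <last field>).
FACT = re.compile(r"s\((\d+),(\d+),'([^']*)',(\w),(\d+),(\d+)\)\.")

# Assumed spelling of the type codes; only "noun" appears on the slide.
POS = {"n": "noun", "v": "verb", "a": "adjective", "r": "adverb"}


def mint_uris(prolog_text):
    """Return ({synset_id: synset_uri}, [(synset_id, wordsense_uri, word_uri)])."""
    synsets = {}
    wordsenses = []
    for match in FACT.finditer(prolog_text):
        synset_id, order, word, ss_type, sense_num, _last = match.groups()
        pos = POS[ss_type]
        wordsenses.append(
            (synset_id, f"wn:wordsense-{word}-{pos}-{sense_num}", f"wn:word-{word}")
        )
        if order == "1":
            # Assumption: the first word sense in the synset names the synset.
            synsets[synset_id] = f"wn:synset-{word}-{pos}-{sense_num}"
    return synsets, wordsenses


FACTS = "s(108644031,1,'bed',n,3,2).s(108644031,2,'bottom',n,5,1)."
synsets, wordsenses = mint_uris(FACTS)
print(synsets["108644031"])   # wn:synset-bed-noun-3
print(wordsenses[1][1])       # wn:wordsense-bottom-noun-5
```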
XML fragment of ULAN
<Associative_Relationships>
  <Associative_Relationship>
    <Historic_Flag>NA</Historic_Flag>
    <Relationship_Type>1102/student of</Relationship_Type>
    <Related_Subject_ID>
      <VP_Subject_ID>500011051</VP_Subject_ID>
    </Related_Subject_ID>
  </Associative_Relationship>
</Associative_Relationships>
Conversion issues
• XML and RDF/OWL are inherently different
– XML = thesaurus document structure
– RDF = thesaurus document content
• Redundant information in XML file
<Associative_Relationships>
<Historic_Flag>NA</Historic_Flag>
• How to represent “student of”?
– Subproperty of Associative_Relationship is probably preferred
– Needs to be derived from the data; not part of schema
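A rough sketch of the "subproperty derived from the data" option, using only the Python standard library. The `ulan:` prefix, the property naming (`ulan:studentOf`, `ulan:associativeRelationship`) and the subject placeholder `ulan:SUBJECT` are all hypothetical; the fragment does not show which record the relationship belongs to. The wrapper element and the `NA` flag are dropped as redundant.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<Associative_Relationships>
  <Associative_Relationship>
    <Historic_Flag>NA</Historic_Flag>
    <Relationship_Type>1102/student of</Relationship_Type>
    <Related_Subject_ID>
      <VP_Subject_ID>500011051</VP_Subject_ID>
    </Related_Subject_ID>
  </Associative_Relationship>
</Associative_Relationships>"""


def property_name(relationship_type):
    """Turn '1102/student of' into a hypothetical property name 'ulan:studentOf'."""
    _code, label = relationship_type.strip().split("/", 1)
    first, *rest = label.split()
    return "ulan:" + first.lower() + "".join(word.capitalize() for word in rest)


def convert(xml_text, subject_uri):
    """Emit (subject, predicate, object) triples for each relationship."""
    triples = []
    root = ET.fromstring(xml_text)
    for rel in root.iter("Associative_Relationship"):
        prop = property_name(rel.findtext("Relationship_Type"))
        target = rel.findtext("Related_Subject_ID/VP_Subject_ID").strip()
        # The specific relation is not in the schema, so it is declared on the fly.
        triples.append((prop, "rdfs:subPropertyOf", "ulan:associativeRelationship"))
        triples.append((subject_uri, prop, "ulan:" + target))
    return triples


for triple in convert(FRAGMENT, "ulan:SUBJECT"):
    print(triple)
```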
XML fragment of ULAN (2)
<Non-Preferred_Term>
  <Term_Text>Koning, Philips Aertsz. de</Term_Text>
  <Term_ID>1500207734</Term_ID>
  <Display_Order>34</Display_Order>
  <Vernacular>Vernacular</Vernacular>
</Non-Preferred_Term>
Conversion issues
• Do we include all information in the conversion?
– Display-order example
– Source and revisions information
• Should each term have a URI?
• Making language explicit
– “vernacular” means the string is written in the original language
– Multi-linguality is an important issue for thesauri
SWD goals
• Schema for interoperable RDF/OWL representation of vocabularies
– SKOS
• Publication guidelines:
– URI management, representation of versions
• Embedding RDF in (X)HTML pages
– RDFa
ISO standard for representing thesauri
• Term
– Preferred term (USE)
– Non-preferred term (USED FOR)
• Hierarchical relation between terms
– Broader/narrower term (BT/NT)
  • Generic
  • Partitive
• Association between terms (RT)
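As a rough illustration of how these thesaurus constructs line up with SKOS, the sketch below turns one preferred-term record into triples. The record layout, the `ex:` prefix and the sample terms are made up for the example; only the SKOS property names are taken from the SKOS vocabulary.

```python
# Thesaurus relation code -> SKOS property linking two concepts.
RELATIONS = (("BT", "skos:broader"), ("NT", "skos:narrower"), ("RT", "skos:related"))


def concept_uri(term):
    """Hypothetical URI scheme: one concept per preferred term."""
    return "ex:" + term.replace(" ", "_")


def record_to_skos(record):
    """Convert a preferred-term record (a dict) into SKOS-style triples."""
    concept = concept_uri(record["term"])
    triples = [
        (concept, "rdf:type", "skos:Concept"),
        (concept, "skos:prefLabel", record["term"]),
    ]
    # Non-preferred terms (USED FOR) become alternative labels of the concept.
    for alt in record.get("UF", []):
        triples.append((concept, "skos:altLabel", alt))
    for code, prop in RELATIONS:
        for other in record.get(code, []):
            triples.append((concept, prop, concept_uri(other)))
    return triples


record = {"term": "sunflowers", "UF": ["Helianthus"], "BT": ["flowers"]}
for triple in record_to_skos(record):
    print(triple)
```

Note that the generic/partitive distinction is lost in this mapping: both end up as skos:broader or skos:narrower.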
Multi-lingual labels for concepts
Semantic relation: broader and narrower
• No subclass semantics assumed!
Semantic relations: related
• Symmetry is an issue (OWL use)
Indexing a resource with a SKOS concept
• primarySubject is defined as subproperty
Collections: role-type trees
Adding semantics
• Adding OWL statements
• Interpretations of thesaurus relations such as narrower as subclass-of are often imprecise (but can still be useful)
• Learning relations between thesauri is an important form of additional semantics
– Example: AAT contains styles; ULAN contains artists, but there is no link
– Availability of this kind of alignment knowledge is extremely useful
SKOS semantics: inference rules
• Collection membership rule
(?i skos:subject ?x) (?x skos:broader ?y) -> (?i skos:subject ?y)
• If a painting of Van Gogh has as subject SunFlowers and if Flowers is a broader term of SunFlowers, then Flowers is also the subject of the painting.
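The rule can be tried out with a small forward-chaining loop over a set of triples. This is only a sketch in plain Python, not a rule engine; the `ex:` names, including the extra `ex:Plants` level, are invented for the example.

```python
def apply_subject_rule(triples):
    """Apply (?i skos:subject ?x) (?x skos:broader ?y) -> (?i skos:subject ?y)
    repeatedly until no new triples can be derived."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        broader = [(s, o) for (s, p, o) in inferred if p == "skos:broader"]
        subjects = [(s, o) for (s, p, o) in inferred if p == "skos:subject"]
        for item, concept in subjects:
            for narrower, wider in broader:
                if concept == narrower:
                    new = (item, "skos:subject", wider)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


facts = {
    ("ex:painting", "skos:subject", "ex:SunFlowers"),
    ("ex:SunFlowers", "skos:broader", "ex:Flowers"),
    ("ex:Flowers", "skos:broader", "ex:Plants"),
}
result = apply_subject_rule(facts)
print(("ex:painting", "skos:subject", "ex:Flowers") in result)  # True
print(("ex:painting", "skos:subject", "ex:Plants") in result)   # True
```

Because the loop runs to a fixed point, the subject propagates up the whole broader chain, not just one step.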
W3C standardization process
• Input: draft specification
• Collect use cases
• Derive requirements
• Create issues list: requirements that cannot be handled by the draft spec
• Propose resolutions for issues
• Continuously: ask for public feedback/comments
• Get consensus on amended spec
• Find two independent implementations for each feature in the spec
Example issue: relationships between lexical labels
• In draft SKOS spec lexical labels of concepts are represented as datatype properties
• Use cases require relations between labels, e.g. “AAT” is an acronym of “Art & Architecture Thesaurus”
• This is a problem because literals have no URI (so cannot be subject of an RDF property)
• Possible resolutions:
– Labels/terms as classes
– Relaxing constraints on label property
– …..
Recipes for vocabulary URIs
• Simplified rule:
– Use “hash” variant for vocabularies that are relatively small and require frequent access
http://www.w3.org/2004/02/skos/core#Concept
– Use “slash” variant for large vocabularies, where you do not always want the whole vocabulary to be retrieved
http://xmlns.com/foaf/0.1/Person
• For more information and other recipes, see: http://www.w3.org/TR/swbp-vocab-pub/
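The practical difference between the two variants comes from how HTTP treats fragments: the part after "#" is never sent to the server. A small sketch with the standard library makes this visible for the two example URIs; the helper name `document_requested` is made up for the illustration.

```python
from urllib.parse import urldefrag


def document_requested(term_uri):
    """Split a term URI into the URL a client actually requests
    and the fragment that stays on the client side."""
    url, fragment = urldefrag(term_uri)
    return url, fragment


# Hash variant: every term resolves to the same document, the whole vocabulary.
print(document_requested("http://www.w3.org/2004/02/skos/core#Concept"))
# ('http://www.w3.org/2004/02/skos/core', 'Concept')

# Slash variant: each term has its own URL, so it can be served separately.
print(document_requested("http://xmlns.com/foaf/0.1/Person"))
# ('http://xmlns.com/foaf/0.1/Person', '')
```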
More information
Query for WordNet URI returns “concept-bounded description”
An RDFa sample
Regular HTML
Resulting RDF statements
HTML with RDFa
Adding datatypes and informal representation
Linking to other resources
Regular HTML
HTML with embedded RDF
Statements about other resources: photo example