Logics for Data and Knowledge Representation
Introduction to Semantic Web
Fausto Giunchiglia, Feroz Farazi
Semantic Web
An extension of the WWW, in which information is given well-defined
meaning, better enabling computers and people to work in
cooperation [T. Berners-Lee et al., 2001]
A new form of Web content that is computer comprehensible will
open up a revolution of new possibilities [T. Berners-Lee et al., 2001]
An alternative approach that represents Web content in a machine-
processable way and uses intelligent techniques to take advantage
of these representations [G. Antoniou and F. van Harmelen, 2004]
An extra abstraction layer, a so-called semantic layer, to be built on
top of the Web [F. Giunchiglia et al., 2010]
Definitions
Semantic Web
Semantics
Data and documents are assigned semantics
Semantics are codified as metadata
Logic
Logic as a tool for expressing knowledge and semantics
Ontology
A set of terms and semantic relations among them
For example, ZIP code and postal code are equivalent
Language and Vocabulary
Semantic Web Languages (e.g., RDF and OWL)
Standard Vocabularies (e.g., Dublin Core and FOAF)
Keys
World Wide Web
An enormous collection of data and documents
– Any kind
– Mixed
– Keeps growing
– Open to all
Suffers from some well-known limitations in information
– Searching
– Extracting
– Maintaining
– Unveiling
With all these limitations and features, it is still quite useful and interesting
Nevertheless, for a better user experience we want to build a more
integrated and consistent Web
Dumb Web to Smart Web
Consider that you are planning a vacation to the major excavation region of
Heraklion on the island of Crete
– You find a list of hotels by location
– The list shows that a hotel chain you know, Aldemar, has a branch there
– Unfortunately, you do not see that branch on Aldemar’s website
– What would you call it? Dumb? Here, by dumb we mean inconsistent
Consider that you are planning a conference trip to Crete
– You find many branches of Aldemar in the surroundings of the conference venue
– You want to know the nearest one (minimum walking distance)
– Many mapping sites (e.g., Google Maps) compute the distance between addresses given as input
– You are the one spending time copying and pasting addresses into the site. Can we make it any better?
[D. Allemang and J. Hendler, 2008]
Dumb Web to Smart Web
Suppose you want to know the municipalities in the Autonomous
Province of Trento
– The municipalities in the province of Trento were reorganized in 2010
– They were reduced from 223 to 217
– Many sites still list the former figure instead of the latter
– because the information is hard-coded in HTML pages, or retrieved from the
authorities' databases and presented on the Web
– in a form meant for human consumption only
– not for machines, which prevents other parties from picking up changes automatically
Considering all of the above, what should we build to get a smart Web?
Smart applications or a smart Web infrastructure?
Why
Smart Web Applications
The Web is full of smart applications, and new ones appear every day
Great advances have been made in implementing ideas once
considered very hard or even impossible
To name a few applications
– Search engines return non-trivial matches that seem deep and intuitive
– Commerce sites make intelligent recommendations based on customer purchase patterns
– Mapping sites can plan routes and provide detailed geographic information
What role can the Web infrastructure play?
– All these smart applications are only as smart as the data provided to them
– Inconsistent data will lead to dumb results even from smart applications
– The Web infrastructure needs to be improved to support better data consistency,
so that smart applications can perform to their potential
Smarter Web
A Web with an infrastructure that enhances the whole Web
experience by
– enabling connections among data
– letting users connect data to smart Web applications
– not surprising us with inconsistencies
In the case of the Aldemar hotel branch in the major excavation region of
Heraklion, we need coordination
– between the Aldemar site and the site listing hotels by location, at the level of data
– that would help update the list when the location of a hotel changes
In the mapping site scenario, we would like the site to understand
– the data from the conference and hotel sites
– without requiring human intervention in copying and pasting
Semantic Data and Web of Data
Semantic data is computer-understandable data
– e.g., representing the hotels as real-world entities and their addresses as attributes,
in Semantic Web languages using standard vocabularies
– e.g., representing each municipality of Trento as a part_meronym of the province,
giving entity-to-entity connectivity within a dataset
The Semantic Web is a web of interconnected datasets where
– one data element can point to another (through URIs), rather than one webpage
pointing to another, forming a web of data
– the Web infrastructure provides a data model that allows the description of a single
entity to be distributed over the Web
– the coherence of the data model is part of the Web infrastructure
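The idea of a single entity being described across several datasets can be sketched in a few lines of Python. This is only an illustration: the URIs under example.org, the vocabulary terms, the address and the two "sites" are all hypothetical, and plain tuples stand in for a real Semantic Web language.

```python
# All URIs below are hypothetical placeholders under example.org.
HOTEL = "http://example.org/hotels/aldemar-heraklion"
CHAIN = "http://example.org/chains/aldemar"

# Statements published by a (hypothetical) hotel listing site.
listing_site = {
    (HOTEL, "http://example.org/vocab/locatedIn", "http://example.org/places/heraklion"),
}

# Statements published by a (hypothetical) hotel chain site about the same URI.
chain_site = {
    (HOTEL, "http://example.org/vocab/branchOf", CHAIN),
    (HOTEL, "http://example.org/vocab/address", "1 Example Street, Heraklion"),
}


def describe(entity, *datasets):
    """Collect every (predicate, object) pair stated about entity in any dataset."""
    return {(p, o) for ds in datasets for (s, p, o) in ds if s == entity}


# Because both sites use the same URI, their statements merge without manual copying.
merged = describe(HOTEL, listing_site, chain_site)
```

Since both datasets identify the hotel with the same URI, a consumer can combine them mechanically, which is the kind of coordination at the level of data that the hotel scenario asks for.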
Linked Data
The Linked Data approach forms the basis of data publishing guidelines
– pinpointing how data from government, public and private sectors
can be made more valuable for consumers
The Linked Data approach came up with
– a set of principles
– the star rating system
Principles
– the use of HTTP URIs as the identifiers of things (concepts, entities and attributes)
– the provision of meaningful content published in RDF for each such URI reference
– the production of navigable content via links
Linked Open Data
The star rating system rates published data on a
scale from 1 star to 5 stars
– Getting 1 star requires publishing data on the Web under an open license, regardless
of format, e.g., datasets can be published as images; this is also called Open Data
– Producing 2-star data requires the Open Data to be made available in a structured
format (e.g., Excel, which is proprietary) so that it is machine readable
– Producing 3-star data requires non-proprietary formats, e.g., CSV or TSV, on top of
the previous rating levels
– Getting 4 stars requires publishing data using W3C open standards, e.g., RDF
– Achieving 5 stars, the highest level in the rating spectrum, demands establishing
links to RDF datasets published by others
A dataset that reaches 5 stars is also called Linked Open Data
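Because each level builds on the previous ones, the rating can be read as a chain of cumulative checks. The following is a minimal sketch of that reading; the function name and its boolean parameters are hypothetical, chosen only for illustration.

```python
def star_rating(open_license, structured, non_proprietary, w3c_standard, linked):
    """Return the number of stars (0 to 5) for a published dataset.

    Each argument is a boolean answering one requirement, in order:
    open license on the Web, structured format, non-proprietary format,
    W3C open standard such as RDF, links to RDF datasets published by others.
    """
    stars = 0
    for satisfied in (open_license, structured, non_proprietary, w3c_standard, linked):
        if not satisfied:
            break  # levels are cumulative: a failed level caps the rating
        stars += 1
    return stars


image_scan = star_rating(True, False, False, False, False)
csv_file = star_rating(True, True, True, False, False)
linked_open_data = star_rating(True, True, True, True, True)
```

Under this reading, an openly licensed image scan gets 1 star, an openly licensed CSV file gets 3 stars, and only a dataset that also links to RDF published by others reaches 5 stars.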
What is an entity?
We organize our world (ground) knowledge around entities
» Entities are objects which are so important in our everyday life that they are referred to by a name
» Each entity has its own metadata (e.g., name, latitude, longitude, height, …)
» Each entity is in relation with many other entities (e.g., Eiffel Tower is located in Paris, Fausto is a friend of Raffaella)
» There are relatively “few” commonsense entity types (person, …, event)
» There are many application/focus dependent entities (artifacts, maths, …)
Eiffel Tower
Entitypedia – the key ideas
• Clear separation between the
– knowledge (about entities/instances) and the
– language (classes/concepts) used to express the knowledge
• Knowledge as very carefully designed (2)
– Lattice of entity types (attributes, relations, services)
– … unifying most (all?) standards (de jure, de facto) (Dublin Core, FOAF, Facebook, …)
• Language as very carefully designed (1)
– Linguistic resource (WordNet + (CoreLex + homographs) + multiple NLs)
– … + a faceted domain knowledge organization infrastructure, developed using the analytico-synthetic approach (extending Library Science PMEST/DEPA frameworks)
• Direct linear time encoding into RDF/DL (3)
– but (!) with fine-tuned, very fast data structures (for search, entity matching, …)
• (Relatively) large-scale bootstrapping + continuous evolution (4)
– via system-sourcing and crowd-sourcing (under study now)
• Data certification (5)
– … via quality certification pipeline (under study now)
Natural language and formal language
AUTOMOBILE CAR MACCHINA
The same concept can be expressed in different ways in the same language and
across languages
Different languages and terminology
Formal language: domains
DERA domains (D for Domain) organize the (formal concept) language into any number of domains (“any area of knowledge, chosen subjectively, that we want to reason or communicate about”). Examples: medicine, music, pop music, people, movies, skiing, my garden, …
[Figure: a fragment of the Space Domain. Labels: LOCATION, MONUMENT, BODY OF WATER, RIVER, EIFFEL TOWER, COLOSSEUM, GARDA LAKE, MISSISSIPPI, AMAZON RIVER]
» Inspired by Ranganathan's faceted approach
» Following precise design principles (analytico-synthetic approach)
» Organize entities as classes of similar objects
» Independent of the specific chosen domains
» Lattice of (overlapping) domains
» Top level domain = upper level ontology
Formal language: Facets
» A DERA Domain contains any number of facets (hierarchy of terms, each denoting an atomic concept, often corresponding to a NL multiword)
» A DERA Facet is of one of three types (E for Entity, R for Relation, A for Attribute)
[Figure: a fragment of an entity facet in the Space Domain. Labels: LOCATION, MONUMENT, BODY OF WATER, RIVER, EIFFEL TOWER, COLOSSEUM, GARDA LAKE, MISSISSIPPI, AMAZON RIVER]
» Entity: see picture (classes of entities and entities)
» Relation: Far, near, east, … with roles playing the double role of entity and relation
» Attribute: qualities / quantities (high, low, 23 m), descriptive attributes (“India is a democratic country”)
Knowledge
» A set of entity types, each entity type defined in terms of:
˃ Attributes (e.g., height, latitude)
˃ Relations (e.g., locatedIn, friend)
˃ Services (e.g., computeAge, computeFoFs, computeInverseRelation, …)
˃ Many (categories of) meta-attributes (e.g., mandatory, identifying, permanent, timespan, provenance, …)
» Entity types organized in a lattice
˃ coherent with the domain lattice
˃ with an ordering on <attributes, relations, services> but also subsumption, value ranges, …
» Entities:
˃ A name and a URI
˃ Etype <attributes, relations, services> plus free
˃ One reference etype and many induced etypes
Knowledge services
» CRUD on entities
» EntitySearch(“metadata of E1”) (* useful in NER *)
» EntityMatch(E1, E2)
» Etypes(“some element of an entity”)
» Extension(etype) (* same as search(etype) *)
» Navigate(E1, R) (* Navigate(Fausto, Friends) *)
» Distance(E1, E2, R) (* Distance(Fausto, Obama, Friend) *)
» …
» … many etype and application dependent services
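The slides only name the services, so the following is a hedged sketch of what Navigate and Distance could look like, not the actual Entitypedia implementation. It assumes that Distance means the number of hops on a shortest path along relation R, that relations are directed, and that facts are stored as plain tuples; the entities other than Fausto and Raffaella are made up.

```python
from collections import deque

# Toy fact store of (subject, relation, object) tuples; illustrative data only.
FACTS = {
    ("Fausto", "friend", "Raffaella"),
    ("Raffaella", "friend", "Anna"),
    ("Anna", "friend", "Marco"),
}


def navigate(entity, relation, facts=FACTS):
    """Entities reachable from entity in one step along relation."""
    return {o for (s, r, o) in facts if s == entity and r == relation}


def distance(start, goal, relation, facts=FACTS):
    """Hops on a shortest path from start to goal along relation, or None.

    Assumption: breadth-first search over a directed relation.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in navigate(node, relation, facts):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


friends_of_fausto = navigate("Fausto", "friend")
hops_to_marco = distance("Fausto", "Marco", "friend")
```

Treating a relation such as friend as symmetric would only require adding the inverse facts to the store before searching.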
Some examples of etypes
ENTITY
– Name String [ ]
– Description SString [ ]
– Part Of <Entity>
– Homepage URL [ ]
– Start Moment
– End Moment
– Duration Duration
EVENT extends ABSTRACT ENTITY
– Participant <Person> [ ] | <Organization> [ ]
– Location <Location>
– Status Enum <SString>
– …
LOCATION extends PHYSICAL ENTITY
– Latitude float
– Longitude float
– Altitude float
– …
PHYSICAL ENTITY extends ENTITY
– Height float
– Length float
– Width float
– Weight float
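The "extends" relation between etypes maps naturally onto class inheritance. The sketch below mirrors part of the listing with Python dataclasses; it is an illustration under assumptions, not the authors' data model: "[ ]" is read as a list of values, every attribute is made optional, and the time-related attributes and the EVENT etype are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    name: List[str] = field(default_factory=list)         # Name String [ ]
    description: List[str] = field(default_factory=list)  # Description SString [ ]
    part_of: Optional["Entity"] = None                     # Part Of <Entity>
    homepage: List[str] = field(default_factory=list)     # Homepage URL [ ]


@dataclass
class PhysicalEntity(Entity):
    height: Optional[float] = None
    length: Optional[float] = None
    width: Optional[float] = None
    weight: Optional[float] = None


@dataclass
class Location(PhysicalEntity):
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    altitude: Optional[float] = None


# Attribute values are deliberately left unset; no real measurements are implied.
paris = Location(name=["Paris"])
eiffel_tower = Location(name=["Eiffel Tower"], part_of=paris)
```

A Location instance inherits the attributes of both PHYSICAL ENTITY and ENTITY, which is the ordering on attributes that the etype lattice is meant to capture.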
Example of entities
[Figure: example entities and their relations. Entities: ETH Zurich, Albert Einstein, Mileva Maric, Ulm, Germany. Etypes: UNIVERSITY, SCIENTIST, PERSON, CITY, COUNTRY. Relations: part-of, birth place, spouse, affiliation]
A critical issue: dot-objects
[Figure: ETH Zurich as UNIVERSITY (as organization) and as UNIVERSITY (as building)]
Some entities have a clear inherent polysemy (Pustejovsky)
» According to the situation either one aspect or the other (typically the physical or abstract aspect) of the entity is emphasized. This generates polysemy in language.
» Since it depends on the situation, it would be wrong to permanently disambiguate it in one or the other way
» We need a systematic way to represent these entities
Encoding into RDF
» Choose (sub)domain
» E facet translates into TBOX concept subsumption axioms (e.g., river LG “body of water”, reading LG as “less general than” and MG as “more general than”)
» R facet translates into TBOX role subsumption (e.g., parentOf MG fatherOf)
» A facet translates into TBOX subsumption (e.g., angularDistance MG latitude)
» Entity properties translate into ABOX axioms (e.g., livesIn(Fausto, Trento))
NOTE: The RDF encoding is used only for interoperability, open data, …; reasoning runs on native data structures as special-purpose services
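The translation can be sketched as a single pass over the facets that emits one triple per axiom. This is an assumption-laden illustration: it reads LG/MG as less/more general than, maps concept subsumption to rdfs:subClassOf and both role and attribute subsumption to rdfs:subPropertyOf, and uses a hypothetical example.org namespace. Facet entries are written as (less general, more general) pairs.

```python
EX = "http://example.org/"  # hypothetical namespace
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"


def encode(e_facet, r_facet, a_facet, entity_facts):
    """Translate facets and entity properties into a list of triples.

    Each input item is visited once, so the work is linear in the input size.
    """
    triples = []
    for narrower, broader in e_facet:   # TBox: concept subsumption
        triples.append((EX + narrower, SUBCLASS, EX + broader))
    for narrower, broader in r_facet:   # TBox: role subsumption
        triples.append((EX + narrower, SUBPROP, EX + broader))
    for narrower, broader in a_facet:   # TBox: attribute subsumption (assumed mapping)
        triples.append((EX + narrower, SUBPROP, EX + broader))
    for subj, prop, obj in entity_facts:  # ABox: assertions about entities
        triples.append((EX + subj, EX + prop, EX + obj))
    return triples


triples = encode(
    e_facet=[("river", "bodyOfWater")],
    r_facet=[("fatherOf", "parentOf")],
    a_facet=[("latitude", "angularDistance")],
    entity_facts=[("Fausto", "livesIn", "Trento")],
)
```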
Features of a Semantic Web
A radically new way of thinking about representing information for better
results and better management
The Web is characterized by the AAA slogan (Anyone can
say Anything about Any topic)
On the Semantic Web any individual has to be allowed to contribute a
piece of data about some entity that can be linked to the information
from other sources
This requirement
– was taken into account while designing RDF
– has the consequence that there could always be something more to know (something new
that someone will express): the Open World Assumption
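The practical effect of the Open World Assumption shows up when a statement is missing from the data. The sketch below contrasts the two readings; the function names are hypothetical, and the example omits explicit negative statements, so an open-world query here can only answer "true" or "unknown".

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    UNKNOWN = "unknown"


# A tiny knowledge base of (subject, predicate, object) statements.
KB = {("Fausto", "livesIn", "Trento")}


def ask_closed_world(kb, triple):
    """Closed world: whatever is not stated is taken to be false."""
    return triple in kb


def ask_open_world(kb, triple):
    """Open world: a missing statement is unknown, since someone may still assert it."""
    return Answer.TRUE if triple in kb else Answer.UNKNOWN


missing = ("Fausto", "worksIn", "Trento")
closed_answer = ask_closed_world(KB, missing)
open_answer = ask_open_world(KB, missing)
```

For the missing statement, the closed-world query answers False while the open-world query answers unknown, leaving room for another source to contribute it later.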
RDF
RDF (Resource Description Framework)
– A language for representing data in the Semantic Web
– a simple data model for making statements
– the capability to perform inference on the statements
Data model in RDF
– The data model in RDF is a graph data model
– An edge together with the two nodes it connects forms a triple
– Triple elements are subject, predicate and object
RDF representation
– URIs to identify subjects, predicates and objects
– Objects can also be literals
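A triple with URI and literal terms can be modelled directly. The sketch below is a simplified illustration, not a full RDF implementation: it ignores blank nodes, datatypes and language tags, the class and function names are made up, and the subject URI is a hypothetical example.org identifier. The predicate is the FOAF name property, and the output follows the basic N-Triples line shape.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]  # only the object position may hold a literal


def to_ntriples(triple):
    """Serialize one triple as a simplified N-Triples line."""
    def term(t):
        if isinstance(t, URI):
            return f"<{t.value}>"
        escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"{term(triple.subject)} {term(triple.predicate)} {term(triple.obj)} ."


statement = Triple(
    URI("http://example.org/people/fausto"),   # hypothetical subject URI
    URI("http://xmlns.com/foaf/0.1/name"),     # FOAF name property
    Literal("Fausto Giunchiglia"),
)
line = to_ntriples(statement)
```

A set of such triples is the graph: subjects and objects are the nodes, and each predicate is a labelled edge between them.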
References
T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American 284, 34–43, May 2001.
G. Antoniou and F. van Harmelen. A Semantic Web Primer (Cooperative Information Systems). MIT Press, Cambridge, MA, USA, 2004.
F. Giunchiglia, F. Farazi, L. Tanca, and R. De Virgilio. The semantic web languages. In Semantic Web Information Management: A Model-Based Perspective. Roberto De Virgilio, Fausto Giunchiglia, Letizia Tanca (Eds.), Springer, 2009.
D. Allemang and J. Hendler. Semantic Web for the Working Ontologist: Modeling in RDF, RDFS and OWL. Morgan Kaufmann Elsevier, Amsterdam, NL, 2008.
T. Berners-Lee. Linked Data. Design Issues for the World Wide Web - W3C, http://www.w3.org/DesignIssues/LinkedData.html, 2006.