Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
Knowledge Technologies
Manolis Koubarakis & Eleni Tsalapati
1
Introduction to Knowledge Technologies
Knowledge Graphs, the Semantic Web and Linked
Data
Presentation Outline
• Knowledge Technologies & Data Science
• The vision of the Semantic Web
• Linked Data
– RDF, RDFS and SPARQL
• Ontologies
– Very large ontologies, building ontologies, Description logics, OWL
• Linked geospatial and temporal data
• Applications
Knowledge Technologies
Manolis Koubarakis
2
Presentation Outline
• Knowledge Technologies & Data Science
• The vision of the Semantic Web
• Linked Data
– RDF, RDFS and SPARQL
• Ontologies
– Very large ontologies, building ontologies, Description logics, OWL
• Linked geospatial and temporal data
• Applications
Knowledge Technologies
Manolis Koubarakis
3
Knowledge Technologies &
Data Science
When knowledge is:
-from multiple resources
-heterogeneous
Then we need to standardize the datasets into a unified framework that will:
-easier analysis (e.g. consolidating data with different unit systems)
-minimal loss of information
-machine & human readable
-can be explored efficiently
4
Deep learning technologies: need to semantically enrich the data to understand
their meaning
Semantic technologies: need to manipulate data more efficiently
5
Knowledge Technologies &
Data Science
Presentation Outline
• Knowledge Technologies & Data Science
• The vision of the Semantic Web
• Linked Data
– RDF, RDFS and SPARQL
• Ontologies
– Very large ontologies, building ontologies, Description logics, OWL
• Linked geospatial and temporal data
• Applications
Knowledge Technologies
Manolis Koubarakis
6
Knowledge Technologies
Manolis Koubarakis
7
The Web Today
• Universal resource identifiers (URIs) to identify documents.
• The Hypertext Transfer Protocol (HTTP) to exchange documents
between a client and a server.
• HTML for marking up information to be presented to human readers
through a browser.
• Search Engines (Google, duckduckgo) to discover information.
Knowledge Technologies
Manolis Koubarakis
8
Using Search Engines – Example I
• Assume that you want to find information about the actor Sean
Connery.
• What do you do?
– Google “Sean Connery”!
Result
Knowledge Technologies
Manolis Koubarakis
9
Discussion
• Gone are the days when Google would simply return a list of links to
Web pages containing the words in the query ordered by importance
(using PageRank).
• Now Google is able to understand that a user question is about an
entity and returns structured information about this entity (the
infobox) together with relevant web links.
Knowledge Technologies
Manolis Koubarakis
10
Using Search Engines – Example II
• Assume you would like to find information about Prof. Manolis
Koubarakis.
• Google “Prof. Manolis Koubarakis”.
Knowledge Technologies
Manolis Koubarakis
11
Result
Knowledge Technologies
Manolis Koubarakis
12
Discussion
• Good work but no structured information about this entity.
Knowledge Technologies
Manolis Koubarakis
13
Using Search Engines – Example III
• Let us now google: “Who is married to Angelina Jolie?”
Knowledge Technologies
Manolis Koubarakis
14
Using Search Engines – Example III
Knowledge Technologies
Manolis Koubarakis
15
Discussion
• Not only Google knows about entities, it also knows about
relationships e.g., “is married to”.
• Google can understand the semantics (meaning) of a user query.
Knowledge Technologies
Manolis Koubarakis
16
• The Google Knowledge Graph stores information about real-world
entities and relationships in a structured way.
• Similar knowledge graphs are employed in all big Web companies
today like Microsoft, IBM, eBay, Facebook etc. See the article
https://dl.acm.org/citation.cfm?doid=3351434.3331166
Knowledge Technologies
Manolis Koubarakis
17
The Google Knowledge Graph
Discussion
• Constructing and using knowledge graphs (or knowledge bases) has
been, for a long time, the subject of research in Knowledge
Representation and Reasoning (a subarea of Artificial
Intelligence).
• The problem of knowledge representation and reasoning on the Web
has been studied by the areas of Semantic Web, Linked Data which
are the topic of this course.
Knowledge Technologies
Manolis Koubarakis
18
Knowledge Technologies
Manolis Koubarakis
19
Towards a Better Web!
• The Semantic Web vision was articulated in a Scientific American
article by Tim Berners-Lee, James Hendler and Ora Lassila (May
2001).
– “The Semantic Web will bring structure to the meaningful content
of Web pages, creating an environment where agents roaming from
page to page readily carry out sophisticated tasks for users.”
• You can find the article on various Web sites e.g.,
http://www.dcc.uchile.cl/~cgutierr/cursos/IC/semantic-web.pdf
• Notice the words meaning (semantics) and agents (a role for all of
Artificial Intelligence).
20
Towards a Better Web!
Knowledge Technologies
Manolis Koubarakis
21
How Can we Achieve the Semantic
Web? – The Original Vision
• Instead of publishing information to be consumed by humans, publish
machine-processable data and metadata using terms/languages that
can be understood by machines.
• Build machines (agents) that will explore, integrate etc. this data.
• Make sure all agents understand your terms/languages.
Knowledge Technologies
Manolis Koubarakis
22
The Semantic Web and Linked Data
Vision Today
• From the Introduction section of the Semantic Web activity of the W3C http://www.w3.org/2001/sw/
“The Semantic Web is a web of data. There is lots of data we all use every day, and it is not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar?
Why not?
• common formats for integration and combination of data drawn from diverse sources
• language for recording how the data relates to real world objects.
Knowledge Technologies
Manolis Koubarakis
23
The Semantic Web Vision Today
(cont’d)
• Why linked data and Web data integration?
– Lots of important Web applications demand this e.g., e-science
and e-government.
• Why Web standards/languages for expressing shared meaning?
– This is important if we want agents that are not handcrafted only for
particular tasks to be developed (“roaming from page to page”)
• See the paper “The Semantic Web Revisited” by Nigel Shadbolt,
Wendy Hall and Tim Berners-Lee at
http://eprints.ecs.soton.ac.uk/12614/1/Semantic_Web_Revisted.pdf .
Knowledge Technologies
Manolis Koubarakis
24
The Semantic Web Layer Cake
Knowledge Technologies
Manolis Koubarakis
25
Emphasis of this Course
• The Semantic Web topics that we will cover in this course are:
– Linked data
• RDF, RDFS, SPARQL and RDF stores (Sesame).
– Ontologies
• Description logics, OWL 2, tools for developing ontologies
(Protégé) and reasoners (Pellet, Hermit).
– Rules
• SWRL and others
– Ontology engineering
– Linked geospatial data (the work of Prof Koubarakis’ group)
• stSPARQL, GeoSPARQL and Strabon
Presentation Outline
• Knowledge Technologies & Data Science
• The vision of the Semantic Web
• Linked Data
– RDF, RDFS and SPARQL
• Ontologies
– Very large ontologies, building ontologies, Description logics, OWL
• Linked geospatial and temporal data
• Applications
Knowledge Technologies
Manolis Koubarakis
26
Knowledge Technologies
Manolis Koubarakis
27
Open Data
• The open data movement is about making data public so they can be
used freely by everyone.
• Lots of such data becoming available today is government data.
https://data.europa.eu/euodp/en/linked-data
Knowledge Technologies
Manolis Koubarakis
28
Open Data Directive
• Εntered into force on 16 July 2019 (replacing the PSI Directive of 2013)
• The goal is to make public sector and publicly funded data re-usable
and transparent.
• Focuses on the economic aspects of the re-use of information.
• Strengthens the transparency requirements for public–private
agreements
• Encourages the Member States to make as much information available
for re-use as possible
• High value datasets: beneficial for the society and economy:
– Geospatial, earth observation and environment, meteorological, statistics, companies
and company ownership, mobility
30
Linked Open Data
• The goal of this W3C community effort is “to extend the Web with a
data commons by publishing various open data sets as RDF on the
Web and by setting RDF links between data items from different data
sources.”
• The LOD community is developing a set of best practices for achieving
this.
Knowledge Technologies
Manolis Koubarakis
31
Knowledge Technologies
Manolis Koubarakis
32
The Linked Open Data Cloud
The dataset currently contains 1260 datasets with 16187 links (as of May 2020)
Linked Geo-data
Knowledge Technologies
Manolis Koubarakis
33
Linked Geo-Data (cont’d)
Knowledge Technologies
Manolis Koubarakis
34
http://www.linkedopendata.gr
Knowledge Technologies
Manolis Koubarakis
35
http://www.linkedopendata.gr
Knowledge Technologies
Manolis Koubarakis
36
37
The Greek Administrative Geography
(GAG)
• According to the Kallikratis administrative reform of 2010, Greece consists
of:
• 7 decentralized administrations (e.g., Macedonia and
Thrace)
• 13 regions (e.g., West Macedonia)
• Each region consists of regional units (e.g., Heraklion)
• Each regional unit consists of municipalities (e.g., Dimos
Chersonisou)
• …
• You can find the dataset at http://www.linkedopendata.gr/dataset/greek-
administrative-geography
Querying GAG
Knowledge Technologies
Manolis Koubarakis
38
Tim Berners-Lee outlined four principles of linked data in his "Linked Data" note
of 2006:
1. Use URIs to name (identify) things.
2. Use HTTP URIs so that these things can be looked up (interpreted,
"dereferenced").
3. When someone looks up a URI, provide useful information, using
the standards (RDF, SPARQL, etc)
4. Include links to other URIs so that they can discover more things.
39
Linked data principles
Key Linked Data Technologies
• URIs: a generic means to identify entities or concepts in the world
• HTTP: a simple, yet universal, mechanism for retrieving resources, or
descriptions of resources
• RDF: a data model for structuring and linking data that describes
things in the world
• RDFS: is a general-purpose language for representing simple RDF
vocabularies on the Web.
• SPARQL: a query language for querying linked RDF data
Knowledge Technologies
Manolis Koubarakis
40
Knowledge Technologies
Manolis Koubarakis
41
Uniform Resource Identifiers
• The Web provides a general form of identifier, called the Uniform Resource Identifier (URI), for identifying (naming) resources on the Web.
• Unlike URLs, URIs are not limited to identifying things that have network locations, or use other computer access mechanisms.
• A number of different URI schemes (URI forms) have been already been developed, and are being used, for various purposes:
• Examples:– http: (Hypertext Transfer Protocol, for Web pages) – mailto: (email addresses), e.g., mailto:[email protected]– ftp: (File Transfer Protocol) – urn: (Uniform Resource Names, intended to be persistent location-independent resource
identifiers), e.g., urn:isbn:0-520-02356-0 (for a book)
• No one person or organization controls who makes URIs or how they can be used. While some URI schemes, such as URL's http:, depend on centralized systems such as DNS, other schemes, such as freenet:, are completely decentralized.
Example
• http://en.wikipedia.org/wiki/Angelina_Jolie
• http://dbpedia.org/page/Angelina_Jolie
• http://www.imdb.com/name/nm0001401/
• http://myfavouriteactors.gr/AngelinaJolie.html (does not exist right now)
• Note the difference between the person Angelina Jolie and the various Web pages for her. It is common to use “hash URIs” to refer to the real person:– http://en.wikipedia.org/wiki/Angelina_Jolie#this
Example (cont’d)
http://dbpedia.org/page/Changeling_(film)
• dbpedia-owl:starring: http://dbpedia.org/page/Angelina_Jolie
• dbpedia-owl:director : http://dbpedia.org/page/Clint_Eastwood
Problem
• Data sets will typically talk about the same object using different
URIs. For example:
Knowledge Technologies
Manolis Koubarakis
44
http://dbpedia.org/page/Angelina_Jolie
http://www.freebase.com/view/en/angelina_jolie
Solution: Linking
• Use the special kind of link owl:sameAs that identifies the resources:
Knowledge Technologies
Manolis Koubarakis
45
http://dbpedia.org/page/Angelina_Jolie
http://www.freebase.com/view/en/angelina_jolie
46
Other Links Beyond owl:sameAs
• Natura 2000
• Example of link to GAG: This protected area is contained
in the perfecture of Macedonia.
• Public transport
• Examples of links:
• This bus stop is in the municipality of Athens.
• This bus stop is near a public building.
• You can find these datasets as well at
http://www.linkedopendata.gr
47
5-star deployment scheme for
Linked Open Data
Knowledge Technologies
Manolis Koubarakis
48
RDF Basics
• The world consists of resources.
• Resources are identified by URIs and described by simple propertiesand property values.
• Example: “Angelina Jolie stars in the film Changeling directed by Clint Eastwood.”
RDF Statements
• RDF statements are triples of the formsubject predicate object .
• Example:
<http://dbpedia.org/page/Changeling_(film)>
<http://dbpedia.org/ontology/starring>
<http://dbpedia.org/page/Angelina_Jolie> .
<http://dbpedia.org/page/Changeling_(film)>
< http://dbpedia.org/ontology/director>
<http://dbpedia.org/page/Clint_Eastwood> .
Knowledge Technologies
Manolis Koubarakis
49
Example (cont’d)
• To avoid writing long URIs we can use namespace prefixes:
@prefix dbpedia: http://dbpedia.org/page/ .
@prefix dbpedia-owl: http://dbpedia.org/ontology/ .
dbpedia:Changeling_(film) dpedia-owl:starring dbpedia:Angelina_Jolie .
dbpedia:Changeling_(film) dpedia-owl:director dbpedia:Clint_Eastwood .
Knowledge Technologies
Manolis Koubarakis
50
Knowledge Technologies
Manolis Koubarakis
51
RDF Graphs
dbpedia:Changeling_(Film)
dbpedia:Angelina_Joliedbpedia:Clint_Eastwood
dbpedia-owl:starringdbpedia-owl:director
Knowledge Technologies
Manolis Koubarakis
52
RDF Schema
• The RDF Vocabulary Description Language 1.0 (or RDF Schema or
RDFS):
– is a simple schema language for RDF.
– allows to declare classes/properties using the predefined language
level class rdfs:Class/ property rdfs:Property
– is an ontology definition language (a simple one; we can only
define taxonomies and do some basic inference about them).
– like a schema definition language in the relational or object-
oriented data models (hence the alternative name RDF Schema –
we will use this name and its shorthand RDFS mostly!).
Knowledge Technologies
Manolis Koubarakis
53
Example – Class Hierarchy
owl:Thing
dbpedia-owl:Actor dbpedia-owl:Film_Director
rdfs:subClassOf
dbpedia-owl:Person
rdfs:subClassOfrdfs:subClassOf
dbpedia:Angelina_Joliedbpedia:Clint_Eastwood
rdf:type/a rdf:type/a rdf:type/a
Knowledge Technologies
Manolis Koubarakis
54
Example – Property Hierarchy
dbpedia-owl:husbandOf dbpedia-owl:wifeOf
dbpedia-owl:spouseOf
rdfs:subPropertyOf rdfs:subPropertyOf
Standard Vocabularies
• Linked data are written using standard vocabularies i.e., vocabularies
that have been agreed by certain communities for describing
certain kinds of resources.
• Examples:
– FOAF
– Dublin Core
– schema.org
Knowledge Technologies
Manolis Koubarakis
55
Knowledge Technologies
Manolis Koubarakis
56
FOAF
• The Friend of a Friend (FOAF) project has created a standard
vocabulary for describing people, the links between them and the
things they create and do.
– e.g. foaf:name, foaf:knows, foaf:homepage
• See http://xmlns.com/foaf/spec/ for more information.
Example
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://en.wikipedia.org/wiki/Angelina_Jolie#this>
rdf:type foaf:Person ;
foaf:name “Angelina Jolie" ;
foaf:mbox <mailto:[email protected]> ;
foaf:homepage <http://www.AngelinaJolie.com/> ;
foaf:interest < http://en.wikipedia.org/wiki/UNHCR> ;
foaf:knows [ rdf:type foaf:Person ;
foaf:name “Michael Douglas" ] .
Knowledge Technologies
Manolis Koubarakis
57
Knowledge Technologies
Manolis Koubarakis
58
Dublin Core
• Dublin Core is a community effort that has defined a set of metadata
attributes (e.g., creator, creation date, title etc.) for describing a wide
range of digital resources.
• See http://dublincore.org for more information.
Knowledge Technologies
Manolis Koubarakis
59
Example
http://schema.org
• This was an initiative by Bing, Google, Yahoo! and Yandex to
provide a collection of schemas that webmasters can use to markup
their pages in ways recognized by major search providers.
• Standard ontology of classes and relations.
• Since April 2015, the W3C Schema.org community group manages this
activity.
Knowledge Technologies
Manolis Koubarakis
60
http://schema.org/LocalBusiness
Knowledge Technologies
Manolis Koubarakis
62
Microdata
• Microdata provide a simple way to annotate HTML elements with
machine readable tags.
• Microdata can use standardized vocabularies to capture the semantics
of HTML items.
• See http://www.w3.org/TR/microdata/ for more details.
• Similar (earlier) approaches are RDFa (RDF in attributes-W3C
Recommendation that adds a set of attribute-level extensions to
HTML), microformats and JSON-LD.
Knowledge Technologies
Manolis Koubarakis
63
Example – Original HTML
• The following is some HTML code for describing a local
business called “Beachwalk Beachwear and Giftware”.
<h1>Beachwalk Beachwear & Giftware</h1>
A superb collection of fine gifts and clothing to
accent your stay in Mexico Beach.
3102 Highway 98
Mexico Beach, FL
Phone: 850-648-4200
Knowledge Technologies
Manolis Koubarakis
64
Example (cont’d)– HTML With
Microdata
<div itemscope itemtype="http://schema.org/LocalBusiness">
<h1><span itemprop="name">Beachwalk Beachwear & Giftware</span></h1>
<span itemprop="description"> A superb collection of fine gifts and
clothing to accent your stay in Mexico Beach.</span>
<div itemprop="address" itemscope
itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">3102 Highway 98</span>
<span itemprop="addressLocality">Mexico Beach</span>,
<span itemprop="addressRegion">FL</span>
</div>
Phone: <span itemprop="telephone">850-648-4200</span>
</div>
Knowledge Technologies
Manolis Koubarakis
65
SPARQL
• SPARQL stands for “SPARQL Protocol and RDF Query Language”.
• It is the standard query language for RDF dataproposed by the W3C.
• It is based on matching graph patterns against RDF graphs.
• The simplest kind of graph pattern is a triple pattern.
– A triple pattern is like an RDF triple, but with the option of a variable in the subject, predicate or object
positions.
Example Dataset
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://en.wikipedia.org/wiki/Angelina_Jolie#this>
rdf:type foaf:Person ;
foaf:name “Angelina Jolie" ;
foaf:mbox <mailto:[email protected]> ;
foaf:homepage <http://www.AngelinaJolie.com/> ;
foaf:interest <http://en.wikipedia.org/wiki/UNHCR> ;
foaf:knows [ rdf:type foaf:Person ;
foaf:name “Michael Douglas" ] .
Knowledge Technologies
Manolis Koubarakis
67
Example SPARQL Query
SELECT ?name
WHERE { ?x foaf:name ?name .
?x rdf:type foaf:Person .
?x foaf:knows ?y .
?y foaf:name “Michael Douglas” .
}
• Result:?name
“Angelina Jolie"
Presentation Outline
• Knowledge Technologies & Data Science
• The vision of the Semantic Web
• Linked Data
– RDF, RDFS and SPARQL
• Ontologies
– Very large ontologies, building ontologies, Description logics, OWL
• Linked geospatial and temporal data
• Applications
Knowledge Technologies
Manolis Koubarakis
69
Knowledge Technologies
Manolis Koubarakis
70
Ontologies
• An ontology is a formal, explicit, shared specification of a conceptualization of a domain (Gruber, 1993).
• Conceptualization: the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them. A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose.
• The term ontology is borrowed from Philosophy, where ontology is a systematic account of existence (what things exist, how they can be differentiated from each other etc.).
• Sometimes the word ontology is a synonym for a shared knowledge base. Some other times it is used to refer only to intentional (schema-level) knowledge.
Formal Languages for Ontologies
• Ontologies are typically expressed in some formal logic-based language (e.g., first-order logic).
• The literature also offers special formalisms for defining ontologies that
contain mainly taxonomic knowledge:
– Semantic networks
– Frames
– Description logics
– RDF, RDFS and OWL in the Semantic Web.
Knowledge Technologies
Manolis Koubarakis
71
Formal Languages for Ontologies
(cont’d)
• You can think about these formalisms as being object-oriented logics:
– They have special constructs for representing knowledge about
individuals (or objects), categories (or classes) and relationships (or
roles).
– Categories are organized into taxonomies.
– They have special reasoning methods to deal with these constructs.
• Description logics, RDF, RDFS and OWL 2 are a core part of this course.
Knowledge Technologies
Manolis Koubarakis
72
Taxonomies
• Taxonomies have been used profitably for centuries in various
technical fields (biology, medicine, library science etc.).
• Taxonomic information plays a central role in other Computer
Science areas e.g., programming languages, databases and
software engineering.
• Taxonomies are very important in modern applications: Web
information retrieval and integration, knowledge management, e-
commerce, e-science, e-government etc.
Knowledge Technologies
Manolis Koubarakis
73
Knowledge Technologies
Manolis Koubarakis
74
Example: an RDFS ontology of vehicles
Example: the GAG RDFS ontology
75
Very Large Ontologies and Knowledge
Graphs
• Various areas of human knowledge and deploying this knowledge in
applications such as search engines or question answering.
• Example: Watson, IBM’s question answering system that beat humans
in the quiz show Jeopardy (2011)
(https://www.youtube.com/watch?v=P18EdAKuC1U).
Knowledge Technologies
Manolis Koubarakis
76
Examples of Very Large Ontologies and
Knowledge Graphs
• Cyc
• Wordnet
• DBpedia
• Freebase
• Wikidata
• YAGO
• CaLiGraph
• Google Knowledge Graph
• Facebook Graph Search
• TextRunner
• NELL
Knowledge Technologies
Manolis Koubarakis
77
Cyc (1984)
• From the Cyc Web site (http://www.cyc.com/):
– The Cyc KB is a formalized representation of a vast quantity of
human common sense knowledge: facts, rules of thumb, and
heuristics for reasoning about the objects and events of everyday
life.
– At the present time, the Cyc KB contains nearly 500,000 terms,
including about 15,000 types of relations, and about 5,000,000
facts (assertions) relating these terms.
– New assertions were added to the KB through a combination of
automated and manual means.
– Latest release: 2017
Knowledge Technologies
Manolis Koubarakis
78
What does Cyc know?
Knowledge Technologies
Manolis Koubarakis
79
Cyc (cont’d)
• Cyc is written in CycL: an extension of FOL with equality, default
reasoning, skolemization, and some second-order features.
• The reasoning engine of Cyc uses around 1000 specialized
reasoners.
• There is an open source version called OpenCyc.
• Applications: Glaxo, Cleveland Clinic Foundation, Terrorist KB (2005)
Knowledge Technologies
Manolis Koubarakis
80
Wordnet
• From the WordNet Web site (http://wordnet.princeton.edu/):
– WordNet is a large lexical database of English (around
200,000 of words).
– Nouns, verbs, adjectives and adverbs are grouped into sets of
cognitive synonyms (synsets), each expressing a distinct
concept.
– Synsets are interlinked by means of conceptual-semantic
(hyponym, hypernym, meronym etc.) and lexical relations.
– WordNet is not really an ontology but parts of it can be used into
an ontology containing taxonomic information.
Knowledge Technologies
Manolis Koubarakis
81
Example
Knowledge Technologies
Manolis Koubarakis
82
Example (cont’d)
Knowledge Technologies
Manolis Koubarakis
83
• DBpedia is a community effort to extract structured information from
Wikipedia.
• This structured information resembles an open knowledge graph
• Part of the Linked Open Data effort.
• The English version:
– 4.58 million things (1,445,000 persons, 735,000 places,etc)
• Advantages:
– multi-domain
– represents real community agreement
– automatically evolves as Wikipedia changes
– multilingual (125 languages-but with different content for each one)
• Emphasis on individuals, not general knowledge about various
domains (compared to Cyc and Wordnet).Knowledge Technologies
Manolis Koubarakis
84
http://dbpedia.org/page/Angelina_Jolie
Knowledge Technologies
Manolis Koubarakis
85
http://dbpedia.org/page/Angelina_Jolie
Knowledge Technologies
Manolis Koubarakis
86
Knowledge Technologies
Manolis Koubarakis
87
Freebase• Large collaborative knowledge base consisting of data harvested from
many sources, including individual, user-submitted wiki contributions
• Freebase was an open, Creative Commons licensed repository of
structured data of almost 20 million entities.
• Bought by Google and its Knowledge Graph technology build on it.
• In December of 2014, Google closed it down.
• In September 2018 Google has published at github.com sources of graphd
server,which is a Freebase backend. https://github.com/google/graphd
• The contents of Freebase were exploited by Wikidata:
https://static.googleusercontent.com/media/research.google.com/en//pubs/
archive/44818.pdf
Knowledge Technologies
Manolis Koubarakis
88
Wikidata (https://www.wikidata.org/)
Knowledge Technologies
Manolis Koubarakis
89
Wikidata
• Provides high quality structured data in real-time (500 editors/minute)
• Largest general-purpose knowledge base on the Semantic Web
maintained by open communities
• Provides content to Wikipedia
• Multilingual: same information for each language
Knowledge Technologies
Manolis Koubarakis
90
YAGO
• From the YAGO Web page (https://yago-knowledge.org/):
– Max Planck Institute – YAGO contains both entities (such as movies, people, cities, countries, etc.) and
relations between these entities (who played in which movie, which city is located
in which country, etc.).
– Contains more than 50 million entities and 2 billion facts.
– YAGO4 was released on March 2020. It is open source.
– Combines two resources:
• Wikidata: difficult taxonomy and no human-readable entity identifiers.
• schema.org: ontology of relations and classes but without entities
• Logical constraints: ensure consistency (e.g. no entity can be at the same
time a person and a place) =>
reasoning techniques can be applied.
Knowledge Technologies
Manolis Koubarakis
91
92
CaLiGraph
• Under development (started in 2019)
• DBpedia and YAGO 2 create an entity for each page in Wikipedia.
• Wikipedia's policies permit pages about subjects only if they have a
certain popularity
• Lack information about less well-known entities
• Extraction of entities from Wikipedia's categories and listpages
• Classification of entities with ML techniques.
93
Google’s Knowledge Graph
Knowledge Technologies
Manolis Koubarakis
94
• Information gathered from a variety of sources.
• The information is presented to users in a knowledge panel next to the search results.
• It covers 570 million entities and 18 billion facts
Facebook’s Graph Search
Knowledge Technologies
Manolis Koubarakis
95
• On June 7th 2019, Facebook shut down its Graph Search options
• “The vast majority of people on Facebook search using keywords, a
factor which led us to pause some aspects of graph search and
focus more on improving keyword search. We are working closely
with researchers to make sure they have the tools they need to use
our platform.“ Buzzfeed
• The majority of urls for graph search queries is no longer working
TextRunner
• TextRunner is a knowledge base built by parsing 500,000 Web pages
and extracting relationships among entities from them.
• TextRunner is part of the research project KnowItAll at the University
of Washington. This project concentrates on two fundamental
questions:
– How can a computer accumulate a massive body of knowledge?
– What will Web search engines look like in ten years?-wait, that it is
now
Knowledge Technologies
Manolis Koubarakis
96
TextRunner (cont’d)
Knowledge Technologies
Manolis Koubarakis
97
NELL (Never-Ending Language
Learning)
Knowledge Technologies
Manolis Koubarakis
98
Interacting with NELL
Knowledge Technologies
Manolis Koubarakis
99
Building Ontologies
• The example ontologies and knowledge graphs we saw have been
created using different means:
– Cyc, schema.org: trained ontology engineers
– WordNet: trained experts (lexicographers).
– DBpedia, FreeBase, YAGO4: automatically importing structured
facts from various Web sources possibly with some inference.
– CaLiGraph, TextRunner, NELL: parsing textual data and extracting
information from them
Knowledge Technologies
Manolis Koubarakis
100
Building Ontologies (cont’d)
• Ontology engineering offers us general methodologies of how to
create, maintain and evolve well-structured ontologies.
• See the OWL tutorial available at http://www.co-
ode.org/resources/tutorials/intro/.
Knowledge Technologies
Manolis Koubarakis
101
Description Logics
• The origins of DLs lie in research on semantic networks and frames.
DLs are languages for describing the nature and structure of
objects.
• The DL approach to KR was developed in the 80's and 90's in
parallel with pure FOL approaches and other languages for
structured objects like Telos and F-logic.
• DLs have been used to provide the foundations for ontology
languages for the Web e.g., OWL.
• Many DLs are decidable fragments of first-order logic (FOL)
• Some DLs have features that are not covered in FOL
Knowledge Technologies
Manolis Koubarakis
102
The Origins: KL-ONE
Knowledge Technologies
Manolis Koubarakis
103
Examples
• The set of female doctors.
Female ∏ Doctor
• The set of individuals that have at least 3 children that are male.
• The set of individuals such that all their children have graduated from at
least one Greek University.
• The set of individuals that have at least three children such that all their
degrees are from Greek Universities.
Knowledge Technologies
Manolis Koubarakis
104
Examples (cont’d)
• Ann is a female doctor.
(Female ∏ Doctor)(ANN)
• John is a child of Ann.
hasChild(ANN,JOHN)
• Ann has at least 3 children that are male.
Knowledge Technologies
Manolis Koubarakis
105
Examples (cont’d)
• A woman is a female person.
Woman ≡ Person ∏ Female
• A parent is a person who has at least one child.
Parent ≡ Person ∏
Knowledge Technologies
Manolis Koubarakis
106
Knowledge Technologies
Manolis Koubarakis
107
OWL
• OWL 2 is the current version of the
Web Ontology Language and a
W3C recommendation as of
October 2009.
• The previous version of OWL (OWL 1) became a W3C recommendation in 2004.
• All W3C documents about OWL 2 can be found at http://www.w3.org/TR/2009/REC-owl2-overview-20091027/ .
Knowledge Technologies
Manolis Koubarakis
108
The Structure of OWL 2
Example
SubClassOf(:Student :Person)
EquivalentClasses(:Man
ObjectIntersectionOf(:Person :Male))
EquivalentClasses(:Woman
ObjectIntersectionOf(:Person :Female))
EquivalentClasses(:Parent
ObjectSomeValuesFrom(:hasChild :Person))
Knowledge Technologies
Manolis Koubarakis
109
Example (cont’d)
EquivalentClasses(:Teenager
ObjectIntersectionOf(Person
DataSomeValuesFrom(:hasAge
DatatypeRestriction(xsd:integer
xsd:minExclusive "12"^^xsd:integer
xsd:maxInclusive "19"^^xsd:integer ))))
Knowledge Technologies
Manolis Koubarakis
110
Example (cont’d)
EquivalentClasses(:ChildlessPerson
ObjectIntersectionOf(:Person
ObjectComplementOf(ObjectSomeValuesFrom(
ObjectInverseOf(:hasParent) owl:Thing))))
Knowledge Technologies
Manolis Koubarakis
111
Example (cont’d)
ClassAssertion(:Person :John)
ClassAssertion(:Male :John)
ClassAssertion(:Person :Mary)
ClassAssertion(:Female :Mary)
ObjectPropertyAssertion(:hasChild :John :Mary)
NegativeObjectPropertyAssertion(:hasChild :John :Tina)
DataPropertyAssertion(:hasAge :Maria "18"^^xsd:integer)
Knowledge Technologies
Manolis Koubarakis
112
Example (cont’d)
SameIndividual(:John :Jack)
SameIndividual(:John otherOnt:JohnBrown)
SameIndividual(:Mary otherOnt:MaryBrown)
DifferentIndividuals(:John :Bill)
Knowledge Technologies
Manolis Koubarakis
113
Presentation Outline
• Knowledge Technologies & Data Science
• The vision of the Semantic Web
• Linked Data
– RDF, RDFS and SPARQL
• Ontologies
– Very large ontologies, building ontologies, Description logics, OWL
• Linked geospatial and temporal data
• Applications
Knowledge Technologies
Manolis Koubarakis
114
Geospatial Data on the Web
Knowledge Technologies
Manolis Koubarakis
115
Earth Observation Data
Knowledge Technologies
Manolis Koubarakis
116
ExtremeEarth
Linked Geospatial and Temporal Data
• This is a research area studied by Prof Koubarakis’ group since 2010.
• The basic research question is how to manage geospatial and
temporal data on the Web using linked data technologies.
• See our team web site for more: http://ai.di.uoa.gr/
Knowledge Technologies
Manolis Koubarakis
117
Research Contributions
• The model stRDF and stSPARQL
• The systems GeoTriples and Silk
• The spatiotemporal RDF store Strabon
• The ontology-based data access system Ontop-spatial.
• The visualization tool Sextant.
• Many applications, especially with Earth Observation
data.
• You will use most of our tools in the course project.
• See our survey paper:
http://cgi.di.uoa.gr/~koubarak/publications/2017/ic4ld-
manolis.pdf
Knowledge Technologies
Manolis Koubarakis
118
Example of stRDF (Geospatial
Dimension)gag:Olympia
rdf:type gag:MunicipalCommunity;
gag:name "Ancient Olympia";
gag:population "184"^^xsd:int;
strdf:hasGeometry "POLYGON((21.5 18.5,23.5 18.5, 23.5
21,21.5 21,21.5 18.5));
http://www.opengis.net/def/crs/OGC/1.3/CRS84"^^strdf:WKT.
Knowledge Technologies
Manolis Koubarakis
119
Ancient Olympia
Example of stSPARQL
(Geospatial Dimension)Query: Compute the parts of burnt areas that lie in coniferous forests
SELECT ?burntArea (strdf:intersection(?baGeom,
strdf:union(?fGeom)) AS ?burntForest)
WHERE { ?burntArea rdf:type noa:BurntArea;
strdf:hasGeometry ?baGeom.
?forest rdf:type clc:Region;
clc:hasLandCover clc:ConiferousForest;
strdf:hasGeometry ?fGeom.
FILTER (strdf:intersects(?baGeom,?fGeom)) }
GROUP BY ?burntArea ?baGeom
Knowledge Technologies
Manolis Koubarakis
120
Geospatial Question Answering
• Try the following question on Google:
“What rivers cross London?”
Knowledge Technologies
Manolis Koubarakis
121
Geospatial Question Answering
(cont’d)• Although a Web page about crossings of River Thames
comes up first in the results, it is now clear that we do
not get a definite answer like we got for Sean Connery
and Angelina Jolie earlier.
• The reason is that the Google knowledge graph
currently does not support geospatial question
answering (similarly, temporal or with quantities etc.)
• See our paper on this topic:
http://cgi.di.uoa.gr/~koubarak/publications/2018/template
-based-GeoQA.pdf
Knowledge Technologies
Manolis Koubarakis
122
Presentation Outline
• The Semantic Web vision
• Linked data
• Linked geospatial and temporal data
• Ontologies
• Applications
Knowledge Technologies
Manolis Koubarakis
123
Applications
• Industry (e.g., see project Optique http://www.optique-
project.eu/)
• E-government (e.g., open data)
• E-commerce
• Tourism
• Medicine
• Biology
• Earth Observation (see the work of my group in projects
TELEIOS http://www.earthobservatory.eu/Earth
Observation (see the work of my group in projects
TELEIOS http://www.earthobservatory.eu/ and LEO
http://www.linkedeodata.eu/ ).
• …Knowledge Technologies
Manolis Koubarakis
124
KGs in Enterprise (neo4j)
125Knowledge Graphs: The Path to Enterprise — Michael Moore and AI Omar Azhar, EY
Diffbot
126
https://www.youtube.com/watch?v
=d58BwcyTwEo&t=16s
Knowledge Technologies
Manolis Koubarakis
127
Readings
• The papers mentioned in the presentation.
• Gomez-Perez J.M., Pan J.Z., Vetere G., Wu H. (2017) Enterprise Knowledge Graph: An Introduction. In: Pan J., Vetere G., Gomez-Perez J., Wu H. (eds) Exploiting Linked Data and Knowledge Graphs in Large Organisations. Springer, Cham. https://doi.org/10.1007/978-3-319-45654-6_1
• Chapter 1 of the Semantic Web Primer by Grigoris Antoniou and Frank van Harmelen available from http://www.csd.uoc.gr/~hy566/
• Chapter 1 of the book “Foundations of Semantic Web Technologies” by Pascal Hitzler, Markus Krötzsch and Sebastian Rudolph. See http://semantic-web-book.org.
• The book “Linked Data: Evolving the Web into a Global Data Space”by Tom Heath and Christian Bizer available from http://linkeddatabook.com/editions/1.0/ .
Readings (cont’d)
• The W3C Semantic Web Activity web site (http://www.w3.org/2001/sw/). Browse! There are many interesting introductory tutorials.
• The ESWC invited talk of Jim Hendler available at http://videolectures.net/eswc2011_hendler_work/ which looks at what the Semantic Web has achieved so far and deals with various criticisms of the area.
Knowledge Technologies
Manolis Koubarakis
128
What to do after this Course
• Join my research group ☺
• Innovate using Semantic Web Technologies and Linked
Data! Two nice examples:
– Nomothesi@ (http://legislation.di.uoa.gr/, Ilias
Chalkidis, Babis Nikolaou and Panagiotis Soursos)
– Diavgeia Redefined (https://eellak.github.io/gsoc17-
diavgeia/index.html , Themis Beris)
Knowledge Technologies
Manolis Koubarakis
129
Nomothesi@
Knowledge Technologies
Manolis Koubarakis
130
Diavgeia Redefined
Knowledge Technologies
Manolis Koubarakis
131
geoQA
-Question Answering over Large Geographic KBs
-Yago2geo
-Scalable KG update
-QA via KG embeddings
-QA via templates
132
Knowledge Technologies
Manolis Koubarakis
133
Outline of the Rest of the
Course• The Resource Description Framework (RDF and
RDFS)
• SPARQL: A Query language for RDF and RDFS
• Description logics
• The Web Ontology Language (OWL 2)
• Rule languages for the Semantic Web (SWRL)
• Ontology Engineering
• Linked geospatial and temporal data. Geospatial and temporal extensions of RDF and SPARQL