41
Ivan Herman

Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

Ivan Herman

Page 4: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(4)(4)

Site editors roam the Web for new facts

may discover further links while roaming

They update the site manually

And the site gets soon out-of-date

Page 5: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(5)(5)

Editors roam the Web for new data published on

Web sites

“Scrape” the sites with a program to extract the

information

Ie, write some code to incorporate the new data

Easily get out of date again…

Page 6: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(6)(6)

Editors roam the Web for new data via API-s

Understand those… input, output arguments, datatypes used, etc

Write some code to incorporate the new data

Easily get out of date again…

Page 7: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(7)(7)

Use external, public datasets

Wikipedia, MusicBrainz, …

They are available as data

not API-s or hidden on a Web site

data can be extracted using, eg, HTTP requests or standard

queries

Page 8: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(8)(8)

Use the Web of Data as a Content Management

System

Use the community at large as content editors

Page 9: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(9)(9)

Page 10: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(10)(10)

There are more an more data on the Web

government data, health related data, general knowledge,

company information, flight information, restaurants,…

More and more applications rely on the availability of

that data

Page 11: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(11)(11)Photo credit “nepatterson”, Flickr

Page 12: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(12)(12)

A “Web” where

documents are available for download on the Internet

but there would be no hyperlinks among them

Page 13: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(13)(13)

Page 14: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(14)(14)

We need a proper infrastructure for a real Web of

Data

data is available on the Web

• accessible via standard Web technologies

data are interlinked over the Web

ie, data can be integrated over the Web

This is where Semantic Web technologies come in

Page 15: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(15)(15)Photo credit “kxlly”, Flickr

Page 16: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(16)(16)

We will use a simplistic example to introduce the

main Semantic Web concepts

Page 17: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(17)(17)

Map the various data onto an abstract data

representation make the data independent of its internal representation…

Merge the resulting representations

Start making queries on the whole! queries not possible on the individual data sets

Page 18: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(18)(18)

Page 19: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(19)(19)

ISBN Author Title Publisher Year

0006511409X id_xyz The Glass Palace id_qpr 2000

ID Name Homepage

id_xyz Ghosh, Amitav http://www.amitavghosh.com

ID Publisher’s name City

id_qpr Harper Collins London

Page 20: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(20)(20)

http://…isbn/000651409X

Ghosh, Amitav http://www.amitavghosh.com

The Glass Palace

2000

London

Harper Collins

a:namea:homepage

a:author

Page 21: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(21)(21)

Relations form a graph

the nodes refer to the “real” data or contain some literal

how the graph is represented in machine is immaterial for

now

Page 22: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(22)(22)

Page 23: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(23)(23)

A B C D

1 ID Titre Traducteur Original

2 ISBN 2020286682 Le Palais des Miroirs

$A12$ ISBN 0-00-6511409-X

3

4

5

6 ID Auteur

7 ISBN 0-00-6511409-X

$A11$

8

9

10 Nom

11 Ghosh, Amitav

12 Besse, Christianne

Page 24: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(24)(24)

http://…isbn/000651409X

Ghosh, Amitav

Besse, Christianne

Le palais des miroirs

f:nom

f:traducteur

f:auteur

http://…isbn/2020386682

f:nom

Page 25: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(25)(25)

http://…isbn/000651409X

Ghosh, Amitav

Besse, Christianne

Le palais des miroirs

f:nom

f:traducteur

f:auteur

http://…isbn/2020386682

f:nom

http://…isbn/000651409X

Ghosh, Amitav

http://www.amitavghosh.com

The Glass Palace

2000

London

Harper Collins

a:name

a:homepage

a:author

Page 26: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(26)(26)

http://…isbn/000651409X

Ghosh, Amitav

Besse, Christianne

Le palais des miroirs

f:nom

f:traducteur

f:auteur

http://…isbn/2020386682

f:nom

http://…isbn/000651409X

Ghosh, Amitav

http://www.amitavghosh.com

The Glass Palace

2000

London

Harper Collins

a:name

a:homepage

a:author

Same URI!

Page 27: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(27)(27)

Ghosh, Amitav

Besse, Christianne

Le palais des miroirs

f:original

f:nom

f:traducteur

f:auteur

http://…isbn/2020386682

f:nom

Ghosh, Amitav

http://www.amitavghosh.com

The Glass Palace

2000

London

Harper Collins

a:name

a:homepage

a:author

http://…isbn/000651409X

Page 28: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(28)(28)

User of data “F” can now ask queries like:

“give me the title of the original”

• well, … « donnes-moi le titre de l‟original »

This information is not in the dataset “F”…

…but can be retrieved by merging with dataset “A”!

Page 29: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(29)(29)

We “feel” that a:author and f:auteur should be the

same

But an automatic merge doest not know that!

Let us add some extra information to the merged

data: a:author same as f:auteur

both identify a “Person”

a term that a community may have already defined:

• a “Person” is uniquely identified by his/her name and, say,

homepage

• it can be used as a “category” for certain type of resources

Page 30: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(30)(30)

Besse, Christianne

Le palais des miroirsf:original

f:nom

f:traducteur

f:auteurhttp://…isbn/2020386682

f:nom

Ghosh, Amitav

http://www.amitavghosh.com

The Glass Palace

2000

London

Harper Collins

a:name

a:homepage

a:author

http://…isbn/000651409X

http://…foaf/Person

r:type

r:type

Page 31: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(31)(31)

User of dataset “F” can now query:

“donnes-moi la page d‟accueil de l‟auteur de l‟original”

• well… “give me the home page of the original‟s „auteur‟”

The information is not in datasets “F” or “A”…

…but was made available by: merging datasets “A” and datasets “F”

adding three simple extra statements as an extra “glue”

Page 32: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(32)(32)

Using, e.g., the “Person”, the dataset can be

combined with other sources

For example, data in Wikipedia can be extracted

using dedicated tools

e.g., the “dbpedia” project can extract the “infobox”

information from Wikipedia already…

Page 33: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(33)(33)

Besse, Christianne

Le palais des miroirsf:original

f:nom

f:traducteur

f:auteurhttp://…isbn/2020386682

f:nom

Ghosh, Amitav http://www.amitavghosh.com

The Glass Palace

2000

London

Harper Collins

a:name

a:homepage

a:author

http://…isbn/000651409X

http://…foaf/Personr:type

r:type

http://dbpedia.org/../Amitav_Ghosh

r:type

foaf:name w:reference

Page 34: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(34)(34)

Besse, Christianne

Le palais des miroirsf:original

f:nom

f:traducteur

f:auteurhttp://…isbn/2020386682

f:nom

Ghosh, Amitav http://www.amitavghosh.com

The Glass Palace

2000

London

Harper Collins

a:name

a:homepage

a:author

http://…isbn/000651409X

http://…foaf/Personr:type

r:type

http://dbpedia.org/../Amitav_Ghosh

http://dbpedia.org/../The_Hungry_Tide

http://dbpedia.org/../The_Calcutta_Chromosome

http://dbpedia.org/../The_Glass_Palace

r:type

foaf:name w:reference

w:author_of

w:author_of

w:author_of

w:isbn

Page 35: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(35)(35)

Besse, Christianne

Le palais des miroirsf:original

f:nom

f:traducteur

f:auteurhttp://…isbn/2020386682

f:nom

Ghosh, Amitav http://www.amitavghosh.com

The Glass Palace

2000

London

Harper Collins

a:name

a:homepage

a:author

http://…isbn/000651409X

http://…foaf/Personr:type

r:type

http://dbpedia.org/../Amitav_Ghosh

http://dbpedia.org/../The_Hungry_Tide

http://dbpedia.org/../The_Calcutta_Chromosome

http://dbpedia.org/../Kolkata

http://dbpedia.org/../The_Glass_Palace

r:type

foaf:name w:reference

w:author_of

w:author_of

w:author_of

w:born_in

w:isbn

w:long w:lat

Page 36: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(36)(36)

It may look like it but, in fact, it should not be…

What happened via automatic means is done every

day by Web users!

The difference: a bit of extra rigour so that machines

could do this, too

Page 37: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(37)(37)

We could add extra knowledge to the merged

datasets e.g., a full classification of various types of library data

geographical information

etc.

This is where ontologies, extra rules, etc, come in ontologies/rule sets can be relatively simple and small, or

huge, or anything in between…

Even more powerful queries can be asked as a

result

Page 38: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(38)(38)

Data in various formats

Data represented in abstract format

Applications

Map,Expose,…

ManipulateQuery…

Page 39: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(39)(39)

… the graph representation is independent of the

exact structures

… a change in local database schema‟s, XHTML

structures, etc, do not affect the whole

“schema independence”

… new data, new connections can be added

seamlessly

Page 40: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(40)(40)

Through URI-s we can link any data to any data

The “network effect” is extended to the (Web) data

“Mashup on steroids” become possible

Page 41: Tutorial on Semantic Web - unibo.it · (5()5) Editors roam the Web for new data published on Web sites Scrape the sites with a program to extract the information Ie, write some code

(41)(41)

The Semantic Web provides technologies

to make such integration possible!