Ontologies, Knowledge Bases, Wikidata
MPRI 2.26.2: Web Data Management

Antoine Amarilli
Friday, January 11th


Reminder

• Ontology: vocabulary (classes and relations) to describe things
• Knowledge base: set of facts in one or several ontologies
→ Focus on Wikidata: a general-purpose knowledge base and ontology

Ontologies


• Various domain-specific vocabularies used across knowledge bases
• One general-purpose ontology used by Google, Microsoft, Yahoo, Yandex: schema.org
• Other ontologies that come together with a knowledge base

Friend of a friend (FOAF)

Describes people, relationships, profiles, and activities (social networks)

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<#JW>
    a foaf:Person ;
    foaf:name "Jimmy Wales" ;
    foaf:mbox <mailto:[email protected]> ;
    foaf:homepage <http://www.jimmywales.com> ;
    foaf:nick "Jimbo" ;
    foaf:depiction <http://www.jimmywales.com/aus_img_small.jpg> ;
    foaf:interest <http://www.wikimedia.org> ;
    foaf:knows [
        a foaf:Person ;
        foaf:name "Angela Beesley"
    ] .

Creative Commons

Describes the license and rights on documents

<div about="http://lessig.org/blog/"
     xmlns:cc="http://creativecommons.org/ns#">
  This page, by
  <a property="cc:attributionName"
     rel="cc:attributionURL"
     href="http://lessig.org/">Lawrence Lessig</a>,
  is licensed under a
  <a rel="license"
     href="http://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution License</a>.
</div>

• Many content providers add this kind of markup (e.g., Flickr)
• Search engines can use it (e.g., Google)

Other domain-specific ontologies

• Dublin Core (DC): describes digital resources (videos, images, etc.) and physical resources (books, CDs, etc.)
• Simple Knowledge Organization System (SKOS): describes thesauri, taxonomies, etc.
• Open Graph Protocol: metadata for Web pages to be integrated in Facebook’s social graph; also Twitter Cards for Twitter
• DOAP (Description of a Project): describes software projects
• VoID (Vocabulary of Interlinked Datasets): describes a linked dataset
• Countless others

Schema.org: a general-purpose ontology

• General-purpose ontology: 598 types and 862 properties in version 3.5
• Intended to be used on Web pages to annotate the semantics of elements
• Used by search engines for rich search results
• Used on over 10 million sites¹

¹ Source: https://schema.org/

Format: Microdata

<div class="event-wrapper" itemscope itemtype="http://schema.org/Event">
  <div class="event-date" itemprop="startDate"
       content="2013-09-14T21:30">Sat Sep 14</div>
  <div class="event-title" itemprop="name">
    Typhoon with Radiation City
  </div>
  <div class="event-venue" itemprop="location"
       itemscope itemtype="http://schema.org/Place">
    <span itemprop="name">The Hi-Dive</span>
    <div class="address" itemprop="address" itemscope
         itemtype="http://schema.org/PostalAddress">
      <span itemprop="streetAddress">7 S. Broadway</span><br>
      <span itemprop="addressLocality">Denver</span>,
      <span itemprop="addressRegion">CO</span>
      <span itemprop="postalCode">80209</span>
    </div>
  </div>
  <div class="event-time">9:30 PM</div>
</div>

• itemscope creates an item and itemtype gives its type
• itemprop gives values for properties of the item
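To make the roles of itemscope, itemtype and itemprop concrete, here is a minimal sketch of an extractor built on Python's standard html.parser module. It is not a spec-complete Microdata parser: the names MicrodataSketch and extract_microdata are made up for this illustration, repeated properties overwrite each other, and reading the content attribute on any element is an assumption chosen to match the slide's example rather than the exact rules of the HTML specification.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MicrodataSketch(HTMLParser):
    """Collects microdata items as nested dicts (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items found in the page
        self.stack = []   # one frame per currently open element

    def _current_item(self):
        # The innermost open element carrying itemscope owns new properties.
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        owner = self._current_item()
        prop = attrs.get("itemprop")
        item = None
        if "itemscope" in attrs:
            # itemscope creates an item and itemtype gives its type.
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if prop and owner is not None:
                owner["properties"][prop] = item   # nested item as a value
            else:
                self.items.append(item)            # top-level item
            prop = None                            # value already recorded
        if tag in VOID_TAGS:
            # Void elements (e.g. meta) can only give a value via an attribute.
            if prop and owner is not None and "content" in attrs:
                owner["properties"][prop] = attrs["content"]
            return
        self.stack.append({"tag": tag, "item": item, "owner": owner,
                           "prop": prop, "content": attrs.get("content"),
                           "text": []})

    def handle_data(self, data):
        # Text counts for every open element still waiting for a value.
        for frame in self.stack:
            if frame["prop"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not any(f["tag"] == tag for f in self.stack):
            return  # ignore stray end tags
        while self.stack:
            frame = self.stack.pop()
            if frame["prop"] and frame["owner"] is not None:
                text = "".join(frame["text"]).strip()
                value = frame["content"] if frame["content"] is not None else text
                frame["owner"]["properties"][frame["prop"]] = value
            if frame["tag"] == tag:
                break


def extract_microdata(html):
    """Returns the list of top-level items found in an HTML string."""
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.items
```

Calling extract_microdata on the event markup above should yield a single Event item whose location property is itself a Place item containing a PostalAddress item.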

Format: RDFa

Competing format to Microdata, seems less common²

<div vocab="http://schema.org/" class="event-wrapper" typeof="Event">
  <div class="event-date" property="startDate"
       content="2013-09-14T21:30">Sat Sep 14</div>
  <div class="event-title" property="name">
    Typhoon with Radiation City
  </div>
  <div class="event-venue" property="location" typeof="Place">
    <span property="name">The Hi-Dive</span>
    <div class="address" property="address" typeof="PostalAddress">
      <span property="streetAddress">7 S. Broadway</span><br>
      <span property="addressLocality">Denver</span>,
      <span property="addressRegion">CO</span>
      <span property="postalCode">80209</span>
    </div>
  </div>
  <div class="event-time">9:30 PM</div>
</div>

² http://webdatacommons.org/structureddata/index.html#toc2

Format: JSON-LD

Alternative approach: give the structured data separately in JSON

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Event",
  "location": {
    "@type": "Place",
    "address": {
      "@type": "PostalAddress",
      "addressLocality": "Denver",
      "addressRegion": "CO",
      "postalCode": "80209",
      "streetAddress": "7 S. Broadway"
    },
    "name": "The Hi-Dive"
  },
  "name": "Typhoon with Radiation City",
  "startDate": "2013-09-14T21:30"
}
</script>

• The @context attribute gives the namespace for the @type.
• No longer gives any link to the page contents
• Also @id to give a URI to a node
• Many other features (editor’s draft of the spec is 167 pages)
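Because JSON-LD lives in its own script element, reading it back needs no knowledge of the page layout. The sketch below, using only Python's standard library, collects every application/ld+json block from an HTML string. The names JsonLdExtractor, extract_jsonld and full_type are hypothetical, and full_type is a deliberate simplification: it only handles a @context that is a plain namespace string, whereas real JSON-LD context processing is much richer.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []          # one parsed JSON value per script element
        self._in_jsonld = False   # True while inside a JSON-LD script element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html):
    """Returns the list of JSON-LD documents embedded in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def full_type(node, context):
    # Simplified reading of "@context gives the namespace for the @type":
    # assumes the context is a single namespace URL given as a string.
    return context.rstrip("/") + "/" + node["@type"]
```

For the event above, full_type(doc, doc["@context"]) would give http://schema.org/Event, and the nested location node can be resolved against the same context.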

Web Data Commons Structured Data

• Extraction of semantic content from the Common Crawl
• Also useful to measure usage of structured data:
  • In November 2017, the Common Crawl contained 66 TB (compressed), 260 TB (uncompressed), 3.2G pages
  • 39% of pages (and 28% of domains) contained semantic data
  • 9G entities and 38G triples
  • http://webdatacommons.org/structureddata/

Knowledge bases


Common Knowledge bases

• General-purpose: DBpedia, YAGO, Freebase (defunct), Wikidata
• Proprietary: Google Knowledge Graph, Bing Knowledge Graph (aka Satori)
• Domain-specific
• We will focus afterwards on Wikidata

DBpedia

• Started in 2007
• License: CC-BY-SA
• Code license: GPLv2
• Actors: Leipzig University, University of Mannheim, OpenLink Software
• Latest release: 2016-10
• Extracted from Wikimedia projects
• 6M entities and 10G triples in 2016-04³

³ https://blog.dbpedia.org/2016/10/19/yeah-we-did-it-again-new-2016-04-dbpedia-release/

YAGO

• Started in 2008
• License: CC-BY
• Code license: GPLv3
• Actors: Max Planck Institute for Informatics, Télécom ParisTech
• Latest release: YAGO 3.1 (2017)
• Extracted from Wikipedias and other sources; manual evaluation
• 10M entities and 120M triples⁴

⁴ http://yago-knowledge.org/

Freebase

• Started in 2007, discontinued in 2016
• License: CC-BY
• Code license: Apache 2 (provided after the fact by Google)
• Actors: Metaweb, acquired by Google in 2010
• Initially imported from various sources
• Could be edited by anyone
• Partially imported into Wikidata (but not completely)
• Last release: 2016
• Last dump has 1.9G triples

Wikidata

• Started in 2012
• License: public domain
• Code license: GPLv2
• Actors: Wikimedia Deutschland, Wikimedia
• Last release: weekly
• Around 650M statements and 54M items
• Can be edited by anyone! Around 20k active users.

Domain-specific

• MusicBrainz, for CDs and music in general (20 million recordings)
• British National Bibliography: bibliographic details about books published in the UK since 1950
• data.bnf.fr, data from the French national library
• OpenStreetMap and GeoNames
• Medicine and chemistry with SNOMED CT, and other databases: DrugBank, KEGG, UniProt, ChEMBL, etc.
• Linguistic resources, e.g., BabelNet
• Bibliography, e.g., DBLP, Crossref

Linked Open Data

[Figure: The Linked Open Data Cloud from lod-cloud.net. Legend: Cross Domain, Geography, Government, Life Sciences, Linguistics, Media, Publications, Social Networking, User Generated.]

Gathering Semantic Web Data

• Browsing online versions of KBs
• Using ad-hoc APIs to retrieve relevant triples
• Using a SPARQL endpoint
• Downloading a dump
• Crawling other knowledge bases, e.g., dereferencing Cool URIs
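Two of these access paths are easy to sketch with Python's standard library: sending a query to a SPARQL endpoint over HTTP, and dereferencing an entity URI with content negotiation. The code below only builds the requests and sends nothing. The helper names and the User-Agent string are made up for the illustration, and the example assumes the public Wikidata endpoint at https://query.wikidata.org/sparql, where the wd: and wdt: prefixes are predefined.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the public Wikidata query service is the target endpoint.
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical client identifier; public endpoints usually expect one.
USER_AGENT = "example-course-script/0.1"


def build_sparql_request(query, endpoint=WIKIDATA_SPARQL_ENDPOINT):
    """Builds (but does not send) a GET request carrying a SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query, "format": "json"})
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": USER_AGENT,
    }
    return Request(url, headers=headers)


def build_cool_uri_request(uri, mime_type="text/turtle"):
    """Builds a request that dereferences a URI, asking for RDF via Accept."""
    headers = {"Accept": mime_type, "User-Agent": USER_AGENT}
    return Request(uri, headers=headers)


# Example query: five items that are instances (P31) of house cat (Q146).
EXAMPLE_QUERY = "SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5"
```

Passing either request to urllib.request.urlopen would perform the actual HTTP call; for anything beyond a handful of entities, downloading a dump is usually kinder to the server than crawling.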

Systems

• RDF stores (triplestores) with relational or native backend, open-source or commercial, related to graph databases
  • Apache Jena
  • Virtuoso
  • Blazegraph, essentially acquired by Amazon
  • Amazon Neptune
• SPARQL engines, usually on top of a triplestore. http://en.wikipedia.org/wiki/SPARQL
• Tool to view semantic data in Web pages: http://www.google.com/webmasters/tools/richsnippets

Semantic Web challenges

• Complexity:
  • Writing structured content is harder than writing text!
  • Using structured content (with heterogeneous schema) is complicated!
  • Discoverability problem for knowledge bases, vocabularies
• Performance:
  • Data is large
  • Running queries on graphs is tricky
  • Reasoning makes it even worse
  • Federation makes things worse again

Semantic Web challenges, cont’d

• Data quality:
  • Vagueness and modeling issues
  • Trust (anyone can add a triple)
  • Canonicity and alignment
  • Temporality, sources often complicated to represent
  • Open-world semantics: missing values vs no values
• Incentives: many data providers do not want to be eaten by others

Wikidata


Why Wikidata matters

• Backed by the Wikimedia Foundation: credible and noncommercial
• Not run by academics, but some academics are involved
• Genuine uses on Wikipedia (to some extent)
• Centralized model, which is a good idea for now
• Good tradeoffs in terms of expressiveness, scope...
• Uses the successful wiki model

Wikidata basics

• Entities: Q1, Q2, Q3, ..., Q60527475 and beyond
• Properties: P1, P2, P3, ..., P6343 and beyond
• Entities and properties have a label and short description in each language, along with aliases (search engine)
• Entities can also have sitelinks to Wikimedia projects (e.g., the corresponding Wikimedia pages)
• For each entity and property, we can have facts (or claims) with different objects
• Everyone can create and edit entities and facts
• Discussion is needed before creating a property
• Software: Wikibase, a set of extensions to MediaWiki
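The identifier scheme is simple enough to handle with a regular expression. The sketch below tells entity identifiers (Q...) from property identifiers (P...) and derives two URLs from them. The function names are hypothetical; the sketch assumes the usual Wikidata conventions, namely the http://www.wikidata.org/entity/ URI prefix and the Special:EntityData page for a JSON export.

```python
import re

# Assumed conventions: concept URIs and the JSON export page on wikidata.org.
ENTITY_URI_PREFIX = "http://www.wikidata.org/entity/"
ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/"

# A letter Q or P followed by a positive integer without leading zeros.
_ID_PATTERN = re.compile(r"([QP])([1-9][0-9]*)")


def parse_wikidata_id(identifier):
    """Returns ("entity" or "property", number) for identifiers like Q42 or P69."""
    match = _ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError("not a Wikidata identifier: %r" % identifier)
    kind = "entity" if match.group(1) == "Q" else "property"
    return kind, int(match.group(2))


def entity_uri(identifier):
    """URI naming the entity or property in RDF exports."""
    parse_wikidata_id(identifier)  # validate first
    return ENTITY_URI_PREFIX + identifier


def entity_data_url(identifier):
    """URL of the JSON description (labels, descriptions, aliases, claims)."""
    parse_wikidata_id(identifier)  # validate first
    return ENTITY_DATA_URL + identifier + ".json"
```

For instance, parse_wikidata_id("P69") reports a property with number 69, and entity_data_url("Q42") points at the JSON description of item Q42.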


Qualifiers, references, ranks, data types

• Each fact can have qualifiers to indicate things like start/end time, details (e.g., major/degree for P69 “educated at”)

• Each fact can also have sources to indicate where it comes from (a source is a set of key–value pairs)

• Each fact can have a rank among “normal”, “preferred” (e.g., for the current value), or “deprecated”

• Literal values can have data types: https://www.wikidata.org/wiki/Special:ListDatatypes

• Also two special values
  • “unknown value” (a value exists but is unknown)
  • “no value” (it is known that there is no value)
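
As a rough illustration of how ranks and the two special values might be handled by a consumer, the sketch below works on statements shaped like a simplified version of Wikidata's JSON export (`mainsnak`, `snaktype`, `rank`, `qualifiers`). The sample values are made up, datavalues are flattened to plain strings, and the helper names `best_statements` and `describe_value` are hypothetical.

```python
def best_statements(statements):
    """Preferred statements if any exist, otherwise the normal ones.

    Deprecated statements are never returned.
    """
    preferred = [s for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s["rank"] == "normal"]


def describe_value(statement):
    """Render the main value, handling the two special values."""
    snak = statement["mainsnak"]
    if snak["snaktype"] == "somevalue":
        return "unknown value"
    if snak["snaktype"] == "novalue":
        return "no value"
    return snak["datavalue"]["value"]


# Made-up statements for one property of one entity (simplified structure).
statements = [
    {"mainsnak": {"snaktype": "value", "datavalue": {"value": "former mayor"}},
     "rank": "normal",
     "qualifiers": {"P582": "2014"}},   # P582: end time
    {"mainsnak": {"snaktype": "value", "datavalue": {"value": "current mayor"}},
     "rank": "preferred",
     "qualifiers": {"P580": "2014"}},   # P580: start time
    {"mainsnak": {"snaktype": "value", "datavalue": {"value": "wrong mayor"}},
     "rank": "deprecated"},
]

print([describe_value(s) for s in best_statements(statements)])
```

The choice to fall back to normal-rank statements when nothing is preferred mirrors the "preferred for the current value" idea on the slide; a consumer interested in history would instead keep all non-deprecated statements and read the qualifiers.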

25/31

Constraints

• Wikidata has constraints which are only advisory (= you can create violations) and are quite simple. Main ones:
  • “single (best) value constraint”
  • “inverse constraint” (mother vs child), “symmetric constraint”
  • “type constraint”, or requiring/disallowing certain facts
  • “range constraint”, “contemporary constraint”, “format constraint”
  • “one-of/none-of constraint” (list of allowed/forbidden values)
  • Requiring/allowing qualifiers or units
  • Allowing use as a qualifier/unit

• There is a mechanism for exceptions

• Many constraint violations in practice
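
Because constraints are advisory, a checker reports violations rather than rejecting edits. The sketch below shows what three of the constraints above could look like over a toy set of (subject, property, object) triples. It is an illustration under assumed names (`single_value_violations`, `inverse_violations`, `format_violations`, and the toy property names), not how Wikidata implements its constraint checks.

```python
import re
from collections import defaultdict


def single_value_violations(facts, prop):
    """Subjects that have more than one distinct value for prop."""
    values = defaultdict(set)
    for s, p, o in facts:
        if p == prop:
            values[s].add(o)
    return sorted(s for s, objs in values.items() if len(objs) > 1)


def inverse_violations(facts, prop, inverse_prop):
    """Facts (s, prop, o) for which (o, inverse_prop, s) is missing."""
    present = set(facts)
    return sorted(
        (s, p, o) for s, p, o in present
        if p == prop and (o, inverse_prop, s) not in present
    )


def format_violations(facts, prop, pattern):
    """Facts whose value for prop does not fully match the regex."""
    regex = re.compile(pattern)
    return sorted(
        (s, p, o) for s, p, o in facts
        if p == prop and not regex.fullmatch(o)
    )


# Toy data with hypothetical subject and property names.
facts = [
    ("alice", "mother", "carol"),
    ("carol", "child", "alice"),
    ("bob", "mother", "carol"),        # no matching ("carol", "child", "bob")
    ("alice", "birth_year", "1980"),
    ("alice", "birth_year", "1981"),   # two values for a single-value property
    ("bob", "birth_year", "198x"),     # does not match the expected format
]

print(single_value_violations(facts, "birth_year"))
print(inverse_violations(facts, "mother", "child"))
print(format_violations(facts, "birth_year", r"\d{4}"))
```

An exception mechanism could be layered on top by filtering the reported subjects against a per-constraint allow-list.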

26/31

Usage on Wikipedia

• Used for interwiki links, i.e., the links between Wikipedia pages across languages

• Used in some infoboxes on Wikipedia, e.g., to automatically populate some fields

• Can be used for other things, e.g., filling tables, or external links to other sources

• Policy depends on each Wikipedia: some communities are more welcoming than others...

27/31

Ongoing Wikidata discussions

• Project scope: what belongs in Wikidata?

• The public domain license is a strong requirement

• Concerns, e.g., about the high number of bibliographic entities (almost half of the entities)

• Some external datasets are imported, but Wikipedia (historically) gave much importance to human validation of imports

• Some support for federation in queries; and many external links

• Notability: essentially no policy currently

• Managing vandalism?

• Importance of references?

28/31

Accessing Wikidata data

• Simply by browsing

• Can retrieve in multiple formats, e.g., https://www.wikidata.org/wiki/Special:EntityData/Q42.json

• For simple queries (triple patterns), Linked data fragments: https://query.wikidata.org/bigdata/ldf

• Wikimedia API, e.g., API for recent changes

• SPARQL queries, https://query.wikidata.org/ (and API)

• Weekly dumps in JSON, RDF, XML (around 50 GB compressed)
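
Two of the access paths above can be sketched with the Python standard library. The `Special:EntityData` URL pattern comes from the slide; the SPARQL endpoint path `https://query.wikidata.org/sparql`, the `format=json` parameter, and the User-Agent string are assumptions to check against the query service documentation. The helper names are hypothetical, and `fetch_json` is defined but not called, so nothing here touches the network.

```python
import json
import urllib.parse
import urllib.request

ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"  # assumed endpoint path


def entity_data_url(qid):
    """URL of the JSON export of one entity, e.g. Q42."""
    return ENTITY_DATA.format(qid=qid)


def sparql_url(query):
    """GET URL for a SPARQL query, asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return SPARQL_ENDPOINT + "?" + params


def fetch_json(url, user_agent="example-script/0.1 (hypothetical contact)"):
    """Download and decode a JSON document (performs a network request)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def label(entity_doc, qid, lang="en"):
    """Extract one label from an EntityData-style JSON document."""
    return entity_doc["entities"][qid]["labels"][lang]["value"]


# Offline example: a hand-written fragment shaped like the JSON export.
sample = {"entities": {"Q42": {"labels": {
    "en": {"language": "en", "value": "Douglas Adams"}}}}}

print(entity_data_url("Q42"))
print(label(sample, "Q42"))
print(sparql_url("SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 3"))
```

With network access, `label(fetch_json(entity_data_url("Q42")), "Q42")` would be the online counterpart of the offline call. For bulk processing the weekly dumps are the better fit, since per-entity requests do not scale to tens of millions of entities.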

29/31

Other cool Wikidata stuff

• Distributed Wikidata Game: crowdsourcing edits on Wikidata https://tools.wmflabs.org/wikidata-game/distributed/

• Reasonator: automatically generate a Wikipedia-like page from a Wikidata entity https://tools.wmflabs.org/reasonator/

• Lexemes: ongoing effort to add linguistic data to Wikidata

• OWL ontology: http://wikiba.se/ontology

• askplatyp.us: natural language question answering tool

• File captions on Wikimedia Commons to have a structured way to give labels to images (deployed on January 10)

• OpenRefine to reconcile datasets with Wikidata and add Wikidata facts https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Tutorials/Video

30/31

Slide acknowledgements

• Many thanks to Thomas Pellissier-Tanon for his helpful feedback

• Slide 4: https://en.wikipedia.org/wiki/FOAF_(ontology)

• Slide 5: https://www.w3.org/Submission/ccREL/

• Slides 8–10: https://schema.org/Event

• Slide 13: https://commons.wikimedia.org/wiki/File:DBpediaLogo.svg

• Slide 14: https://en.wikipedia.org/wiki/File:YAGO.svg

• Slide 15: https://commons.wikimedia.org/wiki/File:Freebase_Logo_optimised.svg

• Slides 16, 23: https://en.wikipedia.org/wiki/File:Wikidata-logo-en.svg

31/31