DBpedia: A Nucleus for a Web of Open Data

Original presentation by
Christian Bizer, Freie Universität Berlin
Sören Auer, Universität Leipzig
Georgi Kobilarov, Freie Universität Berlin
Jens Lehmann, Universität Leipzig
Richard Cyganiak, Freie Universität Berlin

Edited by Sangkeun Lee


DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web

Outline
1. Extracting Structured Information from Wikipedia
2. The DBpedia Dataset
3. Accessing the DBpedia Dataset over the Web
4. Use Cases
   • Improving Wikipedia Search
   • Royalty-Free Data Source for other Applications
   • Nucleus for the Emerging Web of Data

• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
  - Other languages
  - Other wiki pages
  - To the web
• Redirects
• Disambiguates

Extracting Structured Information from Wikipedia

• Wikipedia consists of
  - 6.9 million articles
  - in 251 languages
  - monthly growth rate: 4%
• Wikipedia articles contain structured information
  - infoboxes, which use a template mechanism
  - images depicting the article's topic
  - categorization of the article
  - links to external webpages
  - intra-wiki links to other articles
  - inter-language links to articles about the same topic in different languages

Overview of the DBpedia components

[Architecture diagram. Extraction: Wikipedia dumps (article texts, DB tables) are processed into DBpedia datasets (articles, infobox, categories). Storage: the datasets are loaded into Virtuoso and MySQL. Publication: the data is published via a SPARQL endpoint and Linked Data. Clients: traditional web browser, Web 2.0 mashups, Semantic Web browsers, SNORQL browser, Query Builder.]

Wikitext Syntax

Extracting Infobox Data (RDF Representation)

Wikipedia article: http://en.wikipedia.org/wiki/Calgary
DBpedia resource: http://dbpedia.org/resource/Calgary

    dbpedia:native_name       "Calgary"
    dbpedia:altitude          "1048"
    dbpedia:population_city   "988193"
    dbpedia:population_metro  "1079310"
    mayor_name                dbpedia:Dave_Bronconnier
    governing_body            dbpedia:Calgary_City_Council

Question
• How good is the extraction from the markup in Wiki pages?
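As a rough illustration of the mapping on this slide, the following Python sketch turns the Calgary infobox fields into N-Triples lines. The namespace URIs, the helper names wikipedia_to_dbpedia and infobox_to_triples, and the choice of which fields are treated as links are assumptions made for the example; the presentation does not show the actual DBpedia extraction code.

```python
# Assumed namespaces; the slide only shows the abbreviated "dbpedia:" prefix.
RESOURCE_NS = "http://dbpedia.org/resource/"
PROPERTY_NS = "http://dbpedia.org/property/"


def wikipedia_to_dbpedia(article_url: str) -> str:
    """Map an English Wikipedia article URL to a DBpedia resource URI."""
    prefix = "http://en.wikipedia.org/wiki/"
    if not article_url.startswith(prefix):
        raise ValueError(f"not an English Wikipedia article URL: {article_url}")
    return RESOURCE_NS + article_url[len(prefix):]


def infobox_to_triples(subject: str, infobox: dict[str, str], links: set[str]) -> list[str]:
    """Serialize infobox fields as N-Triples lines.

    Fields named in `links` point at other articles and become resource URIs;
    every other value is emitted as a plain literal.
    """
    triples = []
    for field, value in infobox.items():
        predicate = f"<{PROPERTY_NS}{field}>"
        if field in links:
            obj = f"<{RESOURCE_NS}{value.replace(' ', '_')}>"
        else:
            # Escape backslashes and double quotes inside the literal.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        triples.append(f"<{subject}> {predicate} {obj} .")
    return triples


calgary = wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Calgary")
infobox = {
    "native_name": "Calgary",
    "altitude": "1048",
    "population_city": "988193",
    "population_metro": "1079310",
    "mayor_name": "Dave Bronconnier",
    "governing_body": "Calgary City Council",
}
triples = infobox_to_triples(calgary, infobox, links={"mayor_name", "governing_body"})
for line in triples:
    print(line)
```

Literal fields stay as strings here, mirroring the quoted values on the slide; a fuller extractor would presumably also assign datatypes, which is exactly where the extraction-quality question above becomes relevant.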

• Short and long abstracts in 10 different languages

    dbpedia:Calgary
        dbpedia:abstract "Calgary is the largest ..."@en ;
        dbpedia:abstract "Calgary ist eine Stadt ..."@de .

• Categorization information

    dbpedia:Calgary
        skos:subject dbpedia:Category_Cities_in_Alberta ;
        skos:subject dbpedia:Host_cities_Olympic_Games .

• Links to the original Wikipedia articles, pictures and relevant external web pages

    dbpedia:Calgary
        foaf:page <http://en.wikipedia.org/wiki/Calgary> ;
        dbpedia:wikipage-de <http://de.wikipedia.org/wiki/Calgary> ;
        foaf:depiction <http://upload.wikimedia.org/thumb/3/32/...> ;
        dbpedia:reference <http://www.calgary.ca> ;
        dbpedia:reference <http://www.tourismcalgary.com> .

DBpedia Basics

The structured information extracted from Wikipedia can serve as a basis for sophisticated queries against Wikipedia content.

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing the extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. The Developers Guide to Semantic Web Toolkits lists development toolkits for processing DBpedia data in your preferred programming language.

The DBpedia Dataset

• 1,600,000 concepts
• including
  - 58,000 persons
  - 70,000 places
  - 35,000 music albums
  - 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories

Accessing the DBpedia Dataset over the Web

1. SPARQL Endpoint
2. Linked Data Interface
3. DB Dumps for Download

SPARQL

• SPARQL is a query language for RDF.
• RDF is a directed, labeled graph data format for representing information in the Web.
• The W3C SPARQL specification defines the syntax and semantics of the SPARQL query language for RDF.
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware.

The DBpedia SPARQL Endpoint

• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  - Give me all sitcoms that are set in NYC
  - All tennis players from Moscow
  - All films by Quentin Tarantino
  - All German musicians that were born in Berlin in the 19th century
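To make the endpoint access concrete, here is a minimal Python sketch that encodes a query into a GET URL using the standard "query" parameter of the SPARQL protocol. The query text, the LIMIT value and the helper name build_query_url are assumptions chosen for illustration; the sketch only builds the URL and does not contact the server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint URL given on the slide

# Illustrative query: list a few property/value pairs of the Calgary resource.
QUERY = """\
SELECT ?property ?value WHERE {
  <http://dbpedia.org/resource/Calgary> ?property ?value .
}
LIMIT 10
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

Passing the resulting URL to urllib.request.urlopen would send the request; that step is left out so the sketch runs without network access.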

Interesting Example

To find everything Bart wrote on the blackboard in season 12 of The Simpsons:
• The Wikipedia pages for the Simpsons episodes are the identified things that we treat as the subjects of our RDF triples.
• The bottom of the Wikipedia page for the "Tennis the Menace" episode tells us that it is a member of the Wikipedia category "The Simpsons episodes, season 12".
• The episode's DBpedia page tells us that p:blackboard is the property name for the Wikipedia infobox "Chalkboard" field.

    # Prefix declarations; dbpedia2: is assumed to map to the DBpedia property namespace
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX dbpedia2: <http://dbpedia.org/property/>

    SELECT ?episode ?chalkboard_gag WHERE {
      ?episode skos:subject <http://dbpedia.org/resource/Category:The_Simpsons_episodes%2C_season_12> .
      ?episode dbpedia2:blackboard ?chalkboard_gag .
    }

[Figure: result table with one row per episode entity and its chalkboard gag]
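The result table of such a query can be rebuilt on the client side. The Python sketch below flattens a response in the W3C SPARQL JSON results layout into plain rows. The sample response is a hand-written placeholder, not real DBpedia output; the episode URI is only assumed from the resource URI pattern shown earlier, and the function name bindings_to_rows is hypothetical.

```python
import json

# Placeholder response in the SPARQL JSON results layout.
# The values are illustrative stand-ins, not real DBpedia output.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["episode", "chalkboard_gag"]},
  "results": {"bindings": [
    {"episode": {"type": "uri",
                 "value": "http://dbpedia.org/resource/Tennis_the_Menace"},
     "chalkboard_gag": {"type": "literal",
                        "value": "(placeholder gag text)"}}
  ]}
}
"""


def bindings_to_rows(response_text: str) -> list[dict[str, str]]:
    """Turn SPARQL JSON results into a list of {variable: value} rows."""
    data = json.loads(response_text)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # A variable may be unbound in a solution; use "" in that case.
        rows.append({var: binding.get(var, {}).get("value", "") for var in variables})
    return rows


rows = bindings_to_rows(SAMPLE_RESPONSE)
for row in rows:
    print(row["episode"], "|", row["chalkboard_gag"])
```

Each row corresponds to one line of the result table, with one column per variable named in the SELECT clause.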

The Linked Data Interface

• A large body of information and knowledge is often already available in structured form, yet not accessible as such on the Web.
• Integrating open data provides real value. It saves the time and effort of re-entering data that is already out there, and it leaves the data and its editing where they belong: at the origin.
• Linked Data on the Web can be accessed using Semantic Web browsers, just as the traditional Web of documents is accessed using HTML browsers.
• Semantic Web browsers enable users to navigate between different data sources by following RDF links. RDF links also allow the robots of Semantic Web search engines to follow them and crawl the Semantic Web.

The Linked Data Interface

• The project follows the Linked Data principles
  - All concepts are identified using Uniform Resource Identifier (URI) references. A URI is a compact string of characters used to identify or name a resource.
• The Linked Data interface can be used by
  - Semantic Web browsers like
      - DISCO Hyperdata Browser
      - Tabulator Browser
      - OpenLink RDF Browser
  - Semantic Web crawlers like
      - Zitgist (Zitgist LLC, USA)
      - SWSE (DERI, Ireland)
      - Swoogle (UMBC, USA)
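A Semantic Web browser or crawler would typically dereference a concept URI and ask for an RDF description of it. The slides do not spell out this mechanism, so the Python sketch below rests on an assumption: that the Linked Data interface honours an HTTP Accept header requesting application/rdf+xml. The helper name linked_data_request is hypothetical, and the request is only constructed, not sent.

```python
from urllib.request import Request


def linked_data_request(resource_uri: str,
                        media_type: str = "application/rdf+xml") -> Request:
    """Build (but do not send) a GET request asking for an RDF description."""
    return Request(resource_uri, headers={"Accept": media_type}, method="GET")


request = linked_data_request("http://dbpedia.org/resource/Calgary")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request object to urllib.request.urlopen would perform the lookup; a crawler would then parse the returned RDF and follow the links it contains to other resources.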

DBpedia Use Cases

1. Improving Wikipedia Search
2. Royalty-Free Data Source for other Applications
3. Nucleus for the Emerging Web of Data

Improving Wikipedia Search (Various Interfaces)

Query to find all web browser software at http://wikipedia.askw.org

Improving Wikipedia Search

Royalty-Free Data Source for other Applications

• DBpedia is published under the GNU Free Documentation License
• Example use case: SPARQL-generated tables within webpages
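The table use case can be pictured with a short Python sketch that renders query result rows as an HTML table. The function name rows_to_html_table and the column names are assumptions for the example; the single row reuses the Calgary values shown earlier in the presentation rather than live query output.

```python
from html import escape


def rows_to_html_table(columns: list[str], rows: list[dict[str, str]]) -> str:
    """Render result rows as a minimal HTML table, escaping every cell."""
    header = "".join(f"<th>{escape(col)}</th>" for col in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(row.get(col, ''))}</td>" for col in columns) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"


rows = [{"city": "http://dbpedia.org/resource/Calgary", "population_city": "988193"}]
table = rows_to_html_table(["city", "population_city"], rows)
print(table)
```

Escaping each cell matters because literals extracted from wiki markup can contain characters such as < or & that would otherwise break the page.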

Nucleus for the Emerging Web of Data

• W3C SWEO Linking Open Data Project
• Overall size of the dataset: over 1 billion RDF triples
• Outbound RDF links within DBpedia: 75,000

Proposed Improvements

• Better data cleansing required
• Improvement in the classification
• Interlink DBpedia with more datasets
• Improvement in the user interfaces
• Performance
• Scalability
• More expressiveness

Discussion

• DBpedia is the first and largest source of structured data on the Internet covering topics of general knowledge.
• DBpedia gains new information when it extracts data from the latest Wikipedia dump, whereas Freebase, in addition to Wikipedia extractions, gains new information through its userbase of editors.
  - Which one is the better approach?
• Can Freebase or DBpedia be a substitute for Wikipedia?
  - Freebase: not good, in that we would have two similar things (Wikipedia and Freebase).
  - DBpedia: not good, in that it extracts data from a dump.
• How can we interlink Freebase and DBpedia?
• What can be killer applications using DBpedia?
  - If there is one, okay.
  - If there is none, do we really need a large general structured knowledge base?

Page 5: DBpedia: A Nucleus for a Web of Open Data

Extracting Structured Information from Wikipedia

• Wikipedia consists of
  – 6.9 million articles
  – in 251 languages
  – monthly growth rate: 4%
• Wikipedia articles contain structured information
  – infoboxes, which use a template mechanism
  – images depicting the article's topic
  – categorization of the article
  – links to external webpages
  – intra-wiki links to other articles
  – inter-language links to articles about the same topic in different languages

Overview of the DBpedia components

[Figure: architecture diagram. Labels: Wikipedia Dumps (article texts, DB tables); Extraction; DBpedia datasets (Articles, Infobox, Categories) loaded into Virtuoso and MySQL; published via SPARQL Endpoint and Linked Data; accessed by Traditional Web Browser, Web 2.0 Mashups, Semantic Web Browsers, SNORQL Browser, and Query Builder.]

Wikitext Syntax

Extracting Infobox Data (RDF Representation)

http://en.wikipedia.org/wiki/Calgary

http://dbpedia.org/resource/Calgary
  dbpedia:native_name "Calgary"
  dbpedia:altitude "1048"
  dbpedia:population_city "988193"
  dbpedia:population_metro "1079310"
  mayor_name dbpedia:Dave_Bronconnier
  governing_body dbpedia:Calgary_City_Council
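The mapping from infobox fields to RDF triples can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the DBpedia extraction code: the function name `infobox_to_triples` is hypothetical, the infobox is assumed to be already parsed into key/value pairs, and only plain values and `[[wiki links]]` are handled.

```python
# Illustrative sketch only: a simplified stand-in for infobox extraction,
# not the DBpedia extraction code.

RESOURCE_BASE = "http://dbpedia.org/resource/"   # URI pattern shown on the slide
PROPERTY_PREFIX = "dbpedia:"                     # prefix as written on the slide


def infobox_to_triples(article_title, infobox):
    """Turn infobox key/value pairs into (subject, predicate, object) triples."""
    subject = RESOURCE_BASE + article_title.replace(" ", "_")
    triples = []
    for key, value in infobox.items():
        predicate = PROPERTY_PREFIX + key
        if value.startswith("[[") and value.endswith("]]"):
            # A wiki link becomes a reference to another resource.
            obj = RESOURCE_BASE + value[2:-2].replace(" ", "_")
        else:
            # Anything else is kept as a plain string literal.
            obj = '"' + value + '"'
        triples.append((subject, predicate, obj))
    return triples


# Values taken from the Calgary example on the slide.
calgary_infobox = {
    "altitude": "1048",
    "population_city": "988193",
    "mayor_name": "[[Dave Bronconnier]]",
}

for triple in infobox_to_triples("Calgary", calgary_infobox):
    print(" ".join(triple))
```

Treating linked values as resources rather than strings is what lets a query later follow the mayor or the governing body to its own description.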

Question
• How good is the extraction from the markup in Wiki pages?

• Short and long abstracts in 10 different languages
    dbpedia:Calgary
      dbpedia:abstract "Calgary is the largest ..."@en
      dbpedia:abstract "Calgary ist eine Stadt ..."@de
• Categorization information
    dbpedia:Calgary
      skos:subject dbpedia:Category_Cities_in_Alberta
      skos:subject dbpedia:Host_cities_Olympic_Games
• Links to the original Wikipedia articles, pictures, and relevant external web pages
    dbpedia:Calgary
      foaf:page <http://en.wikipedia.org/wiki/Calgary>
      dbpedia:wikipage-de <http://de.wikipedia.org/wiki/Calgary>
      foaf:depiction <http://upload.wikimedia.org/thumb/3/32/...>
      dbpedia:reference <http://www.calgary.ca>
      dbpedia:reference <http://www.tourismcalgary.com>

DBpedia Basics

Structured information can be extracted from Wikipedia and can serve as a basis for sophisticated queries against Wikipedia content.

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. The Developers Guide to Semantic Web Toolkits lists development toolkits for processing DBpedia data in your preferred programming language.

The DBpedia Dataset

• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories

Accessing the DBpedia Dataset over the Web

1. SPARQL Endpoint
2. Linked Data Interface
3. DB Dumps for Download

SPARQL

• SPARQL is a query language for RDF.
• RDF is a directed, labeled graph data format for representing information on the Web.
• The W3C SPARQL specification defines the syntax and semantics of the query language.
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware.

The DBpedia SPARQL Endpoint

• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
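A client usually talks to a SPARQL endpoint over plain HTTP, passing the query text in a `query` parameter as the SPARQL protocol describes. The sketch below only builds such a request with the Python standard library and does not send it. The helper name `build_sparql_request` and the placeholder query are assumptions for illustration, and whether the endpoint address from the slide is still reachable or returns JSON results is not something the slides confirm.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint address from the slide


def build_sparql_request(query):
    """Build (but do not send) an HTTP GET request carrying a SPARQL query."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Placeholder query: fetch any ten triples.
example_query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
request = build_sparql_request(example_query)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually execute the query; that step is left out here so the sketch runs without network access.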

Interesting Example

To find everything Bart wrote on the blackboard in season 12 of The Simpsons:
• The Simpsons episode Wikipedia pages are the identified "things" that we treat as the subjects of our RDF triples.
• The bottom of the Wikipedia page for the "Tennis the Menace" episode tells us that it is a member of the Wikipedia category "The Simpsons episodes, season 12".
• The episode's DBpedia page tells us that p:blackboard is the property name for the Wikipedia infobox "Chalkboard" field.

SELECT ?episode ?chalkboard_gag WHERE {
  ?episode skos:subject <http://dbpedia.org/resource/Category:The_Simpsons_episodes%2C_season_12> .
  ?episode dbpedia2:blackboard ?chalkboard_gag
}

[Figure: query results shown as entities and as a table]
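The Simpsons query is a join of two triple patterns on the shared `?episode` variable. The following Python sketch mimics that join over an in-memory list of triples. The episode names and gag texts are placeholders invented for the illustration, not DBpedia data, and the helper names `match` and `chalkboard_gags` are hypothetical.

```python
# Toy triples with placeholder values; not real DBpedia data.
SEASON_12 = "Category:The_Simpsons_episodes,_season_12"
triples = [
    ("Episode_A", "skos:subject", SEASON_12),
    ("Episode_A", "dbpedia2:blackboard", "placeholder gag A"),
    ("Episode_B", "skos:subject", SEASON_12),
    ("Episode_B", "dbpedia2:blackboard", "placeholder gag B"),
    ("Episode_C", "skos:subject", "Category:Some_other_category"),
    ("Episode_C", "dbpedia2:blackboard", "placeholder gag C"),
]


def match(triples, predicate, obj=None):
    """Yield (subject, object) pairs for triples with the given predicate
    and, if obj is set, the given object."""
    for s, p, o in triples:
        if p == predicate and (obj is None or o == obj):
            yield s, o


def chalkboard_gags(triples, category):
    # First pattern: ?episode skos:subject <category>
    episodes = {s for s, _ in match(triples, "skos:subject", category)}
    # Second pattern: ?episode dbpedia2:blackboard ?chalkboard_gag,
    # joined with the first on the shared ?episode variable.
    return sorted(
        (s, o) for s, o in match(triples, "dbpedia2:blackboard") if s in episodes
    )


for episode, gag in chalkboard_gags(triples, SEASON_12):
    print(episode, gag)
```

A real triple store evaluates the same join with indexes rather than linear scans, but the result has the same shape: one row per episode and gag.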

The Linked Data Interface

• A large body of information and knowledge is often already available in structured form, yet not accessible as such on the Web.
• Integrating open data provides real value: it saves the time and effort of re-entering data that is already out there, and it leaves the data and its editing where they belong, at the origin.
• Linked Data on the Web can be accessed using Semantic Web browsers, just as the traditional Web of documents is accessed using HTML browsers.
• Semantic Web browsers enable users to navigate between different data sources by following RDF links. This also allows the robots of Semantic Web search engines to follow these links to crawl the Semantic Web.

The Linked Data Interface

• The project follows the Linked Data principles
  – All concepts are identified using Uniform Resource Identifier (URI) references. A URI is a compact string of characters used to identify or name a resource.
• The Linked Data interface can be used by
  – Semantic Web browsers like
    - DISCO Hyperdata Browser
    - Tabulator Browser
    - OpenLink RDF Browser
  – Semantic Web crawlers like
    - Zitgist (Zitgist LLC, USA)
    - SWSE (DERI, Ireland)
    - Swoogle (UMBC, USA)
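Because every concept is identified by a URI, a Semantic Web browser or crawler can ask for a machine-readable description of a resource simply by requesting that URI with an RDF media type in the `Accept` header. The sketch below builds such a request with the Python standard library without sending it. The helper name `build_linked_data_request` is hypothetical, and how the server responds to this header is an assumption not stated on the slides.

```python
from urllib.request import Request


def build_linked_data_request(resource_uri):
    """Build (but do not send) a request asking for RDF rather than HTML."""
    return Request(resource_uri, headers={"Accept": "application/rdf+xml"})


# Resource URI taken from the Calgary example in the slides.
request = build_linked_data_request("http://dbpedia.org/resource/Calgary")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

An HTML browser would send a different `Accept` header for the same URI, which is how one identifier can serve both people and programs.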

DBpedia Use Cases

1. Improving Wikipedia Search
2. Royalty-Free Data Source for other Applications
3. Nucleus for the Emerging Web of Data

Improving Wikipedia Search (Various Interfaces)

Query to find all web browser software at http://wikipedia.askw.org

Improving Wikipedia Search

Royalty-Free Data Source for other Applications

• DBpedia is published under the GNU Free Documentation License
• Example use case: SPARQL-generated tables within webpages

Nucleus for the Emerging Web of Data

• W3C SWEO Linking Open Data Project
• Overall size of the dataset: over 1 billion RDF triples
• Outbound RDF links within DBpedia: 75,000

Proposed Improvements

• Better data cleansing required
• Improvement in the classification
• Interlink DBpedia with more datasets
• Improvement in the user interfaces
• Performance
• Scalability
• More expressiveness

Discussion

• DBpedia is the first and largest source of structured data on the Internet covering topics of general knowledge.
• DBpedia gains new information when it extracts data from the latest Wikipedia dump, whereas Freebase, in addition to Wikipedia extractions, gains new information through its userbase of editors.
  – Which one is the better approach?
• Can Freebase or DBpedia be a substitute for Wikipedia?
  – Freebase: not good, in that we would have two similar things (Wikipedia and Freebase).
  – DBpedia: not good, in that it extracts data from a dump.
• How can we interlink Freebase and DBpedia?
• What could be killer applications using DBpedia?
  – If there is one, okay.
  – If there is none, do we really need a large, general body of structured knowledge?

Page 9: DBpedia: A Nucleus for a Web of Open Data

httpenwikipediaorgwikiCalgary

httpdbpediaorgresourceCalgary

dbpedianative_name Calgaryrdquo

dbpediaaltitude ldquo1048rdquo

dbpediapopulation_city ldquo988193rdquo

dbpediapopulation_metro ldquo1079310rdquo

mayor_name

dbpediaDave_Bronconnier

governing_body

dbpediaCalgary_City_Council

Extracting Infobox Data (RDF Representation)

Questionbull How good is the extraction from

the markup in Wiki pages

1048607Short and long abstracts in 10 different languagesdbpediaCalgary

dbpediaabstract ldquoCalgary is the largest rdquoen dbpediaabstract ldquoCalgary ist eine Stadt rdquode

1048607Categorization informationdbpediaCalgary

skossubject dbpediaCategory_Cities_in_Alberta skossubject dbpediaHost_cities_Olympic_Games

1048607Links to the original Wikipedia articles pictures and relevantexternal web pages

dbpediaCalgaryfoafpage lthttpenwikipediaorgwikiCalgarygt dbpediawikipage-delthttpdewikipediaorgwiki

Calgarygt foafdepiction

lthttpuploadwikimediaorgthumb332gt dbpediareference lthttpwwwcalgarycagt dbpediareference lthttpwwwtourismcalgarycomgt

DBpedia Basics

Structured information can be extracted from Wikipedia and can serve as a basis for sophisticated queries against Wikipedia content.

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing the extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. The Developers Guide to Semantic Web Toolkits lists development toolkits for processing DBpedia data in your preferred programming language.

The DBpedia Dataset

• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories

Accessing the DBpedia Dataset over the Web

1. SPARQL Endpoint
2. Linked Data Interface
3. DB Dumps for Download

SPARQL

• SPARQL is a query language for RDF.
• RDF is a directed, labeled graph data format for representing information in the Web.
• The SPARQL specification defines the syntax and semantics of the SPARQL query language for RDF.
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware.

The DBpedia SPARQL Endpoint

• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
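A client typically reaches a SPARQL endpoint by sending the query text as the `query` parameter of an HTTP request, as defined by the SPARQL Protocol. The sketch below only builds such a request URL and does not contact the server. The function name `build_query_url` and the generic example query are assumptions made for illustration; the endpoint address is the one given on the slide.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint address as given on the slide.
ENDPOINT = "http://dbpedia.org/sparql"


def build_query_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL carrying the SPARQL query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


# A deliberately generic query; any SPARQL SELECT text could be used here.
example_query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"
url = build_query_url(example_query)
print(url)

# Decoding the URL again recovers the original query text unchanged.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == example_query)
```

Encoding the query with `urlencode` keeps characters such as `?`, `{` and spaces from being misread as part of the URL itself.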

Interesting Example

To find everything Bart wrote on the blackboard in season 12 of The Simpsons:

• The Wikipedia pages for the Simpsons episodes are the identified things that we treat as the subjects of our RDF triples.
• The bottom of the Wikipedia page for the "Tennis the Menace" episode tells us that it is a member of the Wikipedia category "The Simpsons episodes, season 12".
• The episode's DBpedia page tells us that p:blackboard is the property name for the Wikipedia infobox "Chalkboard" field.

SELECT ?episode ?chalkboard_gag
WHERE {
  ?episode skos:subject <http://dbpedia.org/resource/Category:The_Simpsons_episodes%2C_season_12> .
  ?episode dbpedia2:blackboard ?chalkboard_gag
}

[Figure labels: entities, Table]
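To make the two triple patterns in the query above more concrete, here is a minimal in-memory sketch of how such patterns can be matched against a set of triples. It is a toy matcher, not how Virtuoso or any real SPARQL engine evaluates queries. The function names, the shortened identifiers (`ep:`, `cat:`) and the gag texts are all hypothetical placeholders, not real DBpedia data.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(pattern: Triple, triple: Triple,
                  binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern equals triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def run_query(patterns: List[Triple], data: Iterable[Triple]) -> List[Binding]:
    """Join all patterns: keep bindings that satisfy every pattern."""
    triples = list(data)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


# Placeholder data: identifiers are shortened and gag texts are invented.
SEASON_12 = "cat:The_Simpsons_episodes_season_12"
data = [
    ("ep:Tennis_the_Menace", "skos:subject", SEASON_12),
    ("ep:Tennis_the_Menace", "dbpedia2:blackboard", "placeholder gag A"),
    ("ep:Other_Episode", "skos:subject", "cat:Some_other_category"),
    ("ep:Other_Episode", "dbpedia2:blackboard", "placeholder gag B"),
]
results = run_query(
    [("?episode", "skos:subject", SEASON_12),
     ("?episode", "dbpedia2:blackboard", "?chalkboard_gag")],
    data,
)
print(results)
```

Because `?episode` appears in both patterns, only episodes that are in the season 12 category and also have a blackboard value survive the join. The query on the slide expresses the same restriction.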

The Linked Data Interface

• A large body of information and knowledge is often already available in structured form, yet not accessible as such on the Web.
• Integrating open data provides real value: it saves the time and effort of re-entering data that is already out there, and it leaves the data and its editing where they belong, at the origin.
• Linked Data on the Web can be accessed using Semantic Web browsers, just as the traditional Web of documents is accessed using HTML browsers.
• Semantic Web browsers enable users to navigate between different data sources by following RDF links. The same links allow the robots of Semantic Web search engines to crawl the Semantic Web.

The Linked Data Interface

• The project follows the Linked Data principles
  – All concepts are identified using Uniform Resource Identifier (URI) references. A URI is a compact string of characters used to identify or name a resource.
• The Linked Data interface can be used by
  – Semantic Web browsers like
    - DISCO Hyperdata Browser
    - Tabulator Browser
    - OpenLink RDF Browser
  – Semantic Web crawlers like
    - Zitgist (Zitgist LLC, USA)
    - SWSE (DERI, Ireland)
    - Swoogle (UMBC, USA)
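The slides do not spell out how a Semantic Web browser or crawler asks for data rather than a human-readable page. A common Linked Data convention is HTTP content negotiation: the client requests the concept's URI and states the format it wants in the `Accept` header. The sketch below only constructs such a request and does not send it. The helper name `linked_data_request` and the choice of `application/rdf+xml` as the RDF media type are assumptions made for the example.

```python
from urllib.request import Request


def linked_data_request(resource_uri: str, want_rdf: bool = True) -> Request:
    """Build (but do not send) a request for a resource URI.

    Hypothetical helper: asks for RDF/XML when want_rdf is true and for an
    HTML page otherwise, using the HTTP Accept header.
    """
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(resource_uri, headers={"Accept": accept})


# The Calgary resource URI is the one shown earlier in the slides.
rdf_request = linked_data_request("http://dbpedia.org/resource/Calgary")
html_request = linked_data_request("http://dbpedia.org/resource/Calgary",
                                   want_rdf=False)
print(rdf_request.full_url, rdf_request.get_header("Accept"))
print(html_request.full_url, html_request.get_header("Accept"))
```

With this convention the same URI serves both audiences: a browser for people gets a page, while a crawler gets triples whose links it can follow to further resources.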

DBpedia Use Cases

1. Improving Wikipedia Search
2. Royalty-Free Data Source for other Applications
3. Nucleus for the Emerging Web of Data

Improving Wikipedia Search (Various Interfaces)

Improving Wikipedia Search

• Query to find all web browser software at http://wikipedia.askw.org

Royalty-Free Data Source for other Applications

• DBpedia is published under the GNU Free Documentation License
• Example use case: SPARQL-generated tables within web pages

Nucleus for the Emerging Web of Data

• W3C SWEO Linking Open Data Project
• Overall size of the dataset: over 1 billion RDF triples
• Outbound RDF links within DBpedia: 75,000

Proposed Improvements

• Better data cleansing required
• Improvement in the classification
• Interlink DBpedia with more datasets
• Improvement in the user interfaces
• Performance
• Scalability
• More expressiveness

Discussion

• DBpedia is the first and largest source of structured data on the Internet covering topics of general knowledge.
• DBpedia gains new information when it extracts data from the latest Wikipedia dump, whereas Freebase, in addition to Wikipedia extractions, gains new information through its user base of editors.
  – Which one is the better approach?
• Can Freebase or DBpedia be a substitute for Wikipedia?
  – Freebase: not good, in that we would have two similar things (Wikipedia and Freebase)
  – DBpedia: not good, in that it extracts data from a dump
• How can we interlink Freebase and DBpedia?
• What can be killer applications using DBpedia?
  – If there are some, okay
  – If there are none, do we really need a large body of general structured knowledge?

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
Page 14: DBpedia: A Nucleus for a Web of Open Data

1 SPARQL Endpoint

2 Linked Data Interface

3 DB Dumps for Download

Accessing the DBpedia Dataset over the Web

SPARQL

bull SPARQL is a query language for RDF

bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF

bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware

1048607httpdbpediaorgsparql

1048607hosted on a OpenLink Virtuoso server

1048607can answer SPARQL queries like

1048698 Give me all Sitcoms that are set in NYC

1048698 All tennis players from Moscow

1048698 All films by Quentin Tarentino

1048698 All German musicians that were born in Berlin in the 19th century

The DBpedia SPARQL Endpoint

To know everything Bart wrote on blackboard board in season 12 of SimpsonsbullThe Simpson episode Wikipedia pages are the identified things that we would consider as the subjects of our RDF triplesbullThe bottom of the Wikipedia page for the Tennis the Menace episode tells us that it is a member of the Wikipedia category The Simpsons episodes season 12bullThe episodes DBpedia page tells us that pblackboard is the property name for the Wikipedia infobox Chalkboard field

SELECT episodechalkboard_gag WHERE episode skossubject lthttpdbpediaorgresourceCategoryThe_Simpsons_episodes2C_season_12gt episode dbpedia2blackboard chalkboard_gag

entities

Table

Interesting Example

The Linked Data Interface

bull A large body of information and knowledge is often already available in structured form yet not accessible as such on the Web

bull Integrating open data provides real value It saves the time and effort to re-enter data that is already out there and it leaves the data and editing where it belongs at its origin

bull Linked Data on the Web can be accessed using Semantic Web browsers just as the traditional Web of documents is accessed using HTML browsers

bull Semantic Web browsers enable users to navigate between different data sources by following RDF links

It also allows the robots of Semantic Web search engines to follow these links to crawl the Semantic Web

1048607The project follows the Linked Data principles

bull All concepts are identified using Uniform Resource Identifier references URI is a compact string of characters used to identify or name a resource

1048698 The Linked Data interface can be used by

bull Semantic Web Browsers like

- DISCO Hyperdata Browser

- Tabulator Browser

- OpenLink RDF Browser

bull Semantic Web Crawlers like

- Zitgist (Zitgist LLC USA)

- SWSE (DERI Ireland)

- Swoogle (UMBC USA )

The Linked Data Interface

DBpedia Use Cases

1. Improving Wikipedia Search
2. Royalty-Free Data Source for other Applications
3. Nucleus for the Emerging Web of Data

Improving Wikipedia Search (Various Interfaces)

Improving Wikipedia Search

Query to find all web browser software at http://wikipedia.askw.org

Royalty-Free Data Source for other Applications

• DBpedia is published under the GNU Free Documentation License
• Example use case: SPARQL-generated tables within webpages
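One way a webpage could produce a SPARQL-generated table is to take a result set in the standard SPARQL JSON results format and render it as HTML. The sketch below does this under stated assumptions: the `sample` data is invented for illustration, and `results_to_html_table` is a hypothetical helper, not part of DBpedia.

```python
from html import escape


def results_to_html_table(results: dict) -> str:
    """Render a SPARQL JSON result set (head.vars + results.bindings) as an HTML table."""
    variables = results["head"]["vars"]
    rows = ["<tr>" + "".join(f"<th>{escape(v)}</th>" for v in variables) + "</tr>"]
    for binding in results["results"]["bindings"]:
        cells = []
        for v in variables:
            # Unbound variables are simply absent from a binding.
            value = binding.get(v, {}).get("value", "")
            cells.append(f"<td>{escape(value)}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Invented sample data, shaped like a SPARQL JSON response.
sample = {
    "head": {"vars": ["episode", "chalkboard_gag"]},
    "results": {"bindings": [
        {"episode": {"type": "uri", "value": "http://example.org/episode/1"},
         "chalkboard_gag": {"type": "literal", "value": "Example <gag> text"}},
    ]},
}

html_table = results_to_html_table(sample)
print(html_table)
```

Escaping every value matters here because literals extracted from wiki text may contain markup characters.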

Nucleus for the Emerging Web of Data

• W3C SWEO Linking Open Data Project
• Overall size of the dataset: over 1 billion RDF triples
• Outbound RDF links within DBpedia: 75,000

Proposed Improvements

• Better data cleansing required
• Improvement in the classification
• Interlink DBpedia with more datasets
• Improvement in the user interfaces
• Performance
• Scalability
• More expressiveness

Discussion

• DBpedia is the first and largest source of structured data on the Internet covering topics of general knowledge
• DBpedia gains new information when it extracts data from the latest Wikipedia dump, whereas Freebase, in addition to Wikipedia extractions, gains new information through its userbase of editors
  – Which one is the better approach?
• Can Freebase or DBpedia be a substitute for Wikipedia?
  – Freebase: not good, in that we would have two similar things (Wikipedia and Freebase)
  – DBpedia: not good, in that it extracts data from a dump
• How can we interlink Freebase and DBpedia?
• What can be killer applications using DBpedia?
  – If there is one, okay
  – If there is none, do we really need a large, general body of structured knowledge?

Page 15: DBpedia: A Nucleus for a Web of Open Data

SPARQL

• SPARQL is a query language for RDF
  – RDF is a directed, labeled graph data format for representing information in the Web
  – The W3C SPARQL specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
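To make the idea of querying a directed, labeled graph concrete, here is a minimal sketch of matching triple patterns against a list of triples, which is the core of a SPARQL basic graph pattern. It is a toy in-memory matcher and not how a real SPARQL engine is implemented. All `ex:` names and literal values are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Bindings = Dict[str, str]       # variable name -> bound value


def match_pattern(pattern: Triple, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Unify one triple pattern with one triple, extending a copy of `bindings`."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                 # a variable such as ?episode
            if result.get(term, value) != value:
                return None                      # conflicts with an earlier binding
            result[term] = value
        elif term != value:                      # a constant must match exactly
            return None
    return result


def query(graph: List[Triple], patterns: List[Triple]) -> List[Bindings]:
    """Return every set of bindings that satisfies all patterns together."""
    solutions: List[Bindings] = [{}]
    for pattern in patterns:
        next_solutions: List[Bindings] = []
        for solution in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, solution)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Invented sample graph.
graph = [
    ("ex:Episode1", "ex:inSeason", "ex:Season12"),
    ("ex:Episode1", "ex:blackboard", "gag one"),
    ("ex:Episode2", "ex:inSeason", "ex:Season11"),
    ("ex:Episode2", "ex:blackboard", "gag two"),
]
patterns = [
    ("?episode", "ex:inSeason", "ex:Season12"),
    ("?episode", "ex:blackboard", "?gag"),
]
solutions = query(graph, patterns)
print(solutions)
```

Because both patterns share the variable `?episode`, only triples about the same subject can satisfy them together, which is what joins the two patterns.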

The DBpedia SPARQL Endpoint

• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
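A question such as "all films by Quentin Tarantino" has to be written as triple patterns before the endpoint can answer it. The sketch below assembles such a query as text. The property name `dbpedia2:director`, the prefix IRI, and the resource URI are hypothetical placeholders, since the slides do not say which names the dataset uses, and `build_select` is a made-up helper.

```python
from typing import Dict, List, Tuple


def build_select(prefixes: Dict[str, str], variables: List[str],
                 patterns: List[Tuple[str, str, str]]) -> str:
    """Assemble a simple SPARQL SELECT query from prefixes and triple patterns."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    lines.append("SELECT " + " ".join(f"?{v}" for v in variables) + " WHERE {")
    lines.extend(f"  {s} {p} {o} ." for s, p, o in patterns)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical vocabulary: placeholder names for illustration only.
films_query = build_select(
    prefixes={"dbpedia2": "http://dbpedia.org/property/"},
    variables=["film"],
    patterns=[("?film", "dbpedia2:director",
               "<http://dbpedia.org/resource/Quentin_Tarantino>")],
)
print(films_query)
```

The other example questions would follow the same shape with more patterns, for instance one pattern for the birthplace and a date filter for the century.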

Page 25: DBpedia: A Nucleus for a Web of Open Data

Nucleus for the Emerging Web of Data

1048607W3C SWEO Linking Open Data Project

1048607Over all size of the dataset over 1 billion RDF triples

1048607Out-bound RDF links within DBpedia 75000


Page 26: DBpedia: A Nucleus for a Web of Open Data

Proposed Improvements

• Better data cleansing required
• Improvement in the classification
• Interlink DBpedia with more datasets
• Improvement in the user interfaces
• Performance
• Scalability
• More expressiveness


Page 27: DBpedia: A Nucleus for a Web of Open Data

Discussion

• DBpedia is the first and largest source of structured data on the Internet covering topics of general knowledge.
• DBpedia gains new information when it extracts data from the latest Wikipedia dump, whereas Freebase, in addition to Wikipedia extractions, gains new information through its user base of editors.
  - Which is the better approach?
• Can Freebase or DBpedia be a substitute for Wikipedia?
  - Freebase: not ideal, because we would then have two similar things (Wikipedia and Freebase).
  - DBpedia: not ideal, because it extracts its data from a dump.
• How can we interlink Freebase & DBpedia?
• What could be killer applications using DBpedia?
  - If there are some, fine.
  - If there are none, do we really need a large, general structured knowledge base?
