DBpedia: A Nucleus for a Web of Open Data
Original presentation by
Christian Bizer, Freie Universität Berlin
Sören Auer, Universität Leipzig
Georgi Kobilarov, Freie Universität Berlin
Jens Lehmann, Universität Leipzig
Richard Cyganiak, Freie Universität Berlin
Edited by Sangkeun Lee
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
Outline
1. Extracting Structured Information from Wikipedia
2. The DBpedia Dataset
3. Accessing the DBpedia Dataset over the Web
4. Use Cases
   • Improving Wikipedia Search
   • Royalty-Free Data Source for other Applications
   • Nucleus for the Emerging Web of Data
• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
• Other languages
• Other wiki pages
• To the web
• Redirects
• Disambiguations
Extracting Structured Information from Wikipedia
Wikipedia consists of
• 6.9 million articles
• in 251 languages
• monthly growth rate: 4%
Wikipedia articles contain structured information
• infoboxes, which use a template mechanism
• images depicting the article's topic
• categorization of the article
• links to external webpages
• intra-wiki links to other articles
• inter-language links to articles about the same topic in different languages
Overview of the DBpedia components
[Figure: architecture diagram. Wikipedia dumps (article texts, DB tables) pass through extraction into the DBpedia datasets (articles, infobox, categories). The datasets are loaded into Virtuoso and MySQL and published via the SPARQL endpoint, Linked Data, the SNORQL browser and the query builder, for use by traditional Web browsers, Web 2.0 mashups and Semantic Web browsers.]
Extracting Infobox Data (RDF Representation)
[Figure: the infobox of http://en.wikipedia.org/wiki/Calgary in wikitext syntax, next to the RDF extracted for http://dbpedia.org/resource/Calgary]
  dbpedia:native_name "Calgary"
  dbpedia:altitude "1048"
  dbpedia:population_city "988193"
  dbpedia:population_metro "1079310"
  mayor_name dbpedia:Dave_Bronconnier
  governing_body dbpedia:Calgary_City_Council
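The mapping from infobox fields to RDF triples shown for Calgary can be sketched in a few lines of Python. This is a minimal illustration, not DBpedia's actual extraction code: the function name `infobox_to_triples`, the property namespace, and the rule that a value wrapped in `[[...]]` becomes a link to another resource are assumptions made for the example.

```python
from urllib.parse import quote

RESOURCE_NS = "http://dbpedia.org/resource/"
PROPERTY_NS = "http://dbpedia.org/property/"  # assumed namespace for this sketch


def infobox_to_triples(page_title, infobox):
    """Turn a dict of infobox fields into (subject, predicate, object) triples.

    Values of the form [[Some Page]] are treated as links to other
    resources; everything else is kept as a plain string literal.
    """
    subject = RESOURCE_NS + quote(page_title.replace(" ", "_"))
    triples = []
    for field, value in infobox.items():
        predicate = PROPERTY_NS + field
        value = value.strip()
        if value.startswith("[[") and value.endswith("]]"):
            # Drop an optional display label: [[Target|label]] -> Target
            target = value[2:-2].split("|")[0].strip()
            obj = ("uri", RESOURCE_NS + quote(target.replace(" ", "_")))
        else:
            obj = ("literal", value)
        triples.append((subject, predicate, obj))
    return triples


# Field names and values taken from the Calgary example in the slides.
calgary_infobox = {
    "native_name": "Calgary",
    "altitude": "1048",
    "population_city": "988193",
    "mayor_name": "[[Dave Bronconnier]]",
    "governing_body": "[[Calgary City Council]]",
}

for s, p, o in infobox_to_triples("Calgary", calgary_infobox):
    print(s, p, o)
```

Tagging each object as `uri` or `literal` keeps the distinction that RDF itself makes between links to other resources and plain values.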
Question
• How good is the extraction from the markup in wiki pages?
• Short and long abstracts in 10 different languages
  dbpedia:Calgary
    dbpedia:abstract "Calgary is the largest ..."@en
    dbpedia:abstract "Calgary ist eine Stadt ..."@de
• Categorization information
  dbpedia:Calgary
    skos:subject dbpedia:Category_Cities_in_Alberta
    skos:subject dbpedia:Host_cities_Olympic_Games
• Links to the original Wikipedia articles, pictures and relevant external web pages
  dbpedia:Calgary
    foaf:page <http://en.wikipedia.org/wiki/Calgary>
    dbpedia:wikipage-de <http://de.wikipedia.org/wiki/Calgary>
    foaf:depiction <http://upload.wikimedia.org/thumb/332>
    dbpedia:reference <http://www.calgary.ca>
    dbpedia:reference <http://www.tourismcalgary.com>
DBpedia Basics
Structured information can be extracted from Wikipedia and can serve as a basis for sophisticated queries against Wikipedia content.
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing the extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. The Developers Guide to Semantic Web Toolkits lists development toolkits for processing DBpedia data in your preferred programming language.
The DBpedia Dataset
• 1,600,000 concepts, including
  - 58,000 persons
  - 70,000 places
  - 35,000 music albums
  - 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
Accessing the DBpedia Dataset over the Web
1. SPARQL Endpoint
2. Linked Data Interface
3. DB Dumps for Download
SPARQL
• SPARQL is a query language for RDF.
• RDF is a directed, labeled graph data format for representing information in the Web.
• The SPARQL specification defines the syntax and semantics of the query language.
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware.
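What a SPARQL engine does with a single triple pattern can be illustrated without any RDF library. The following sketch is a toy matcher, assuming triples are plain Python tuples and variables are strings starting with `?`. The names `match_pattern` and `GRAPH` are invented for the example, and real SPARQL engines do far more (joins across patterns, filters, optimization).

```python
# A tiny graph built from the Calgary statements in the slides.
GRAPH = [
    ("dbpedia:Calgary", "dbpedia:altitude", "1048"),
    ("dbpedia:Calgary", "dbpedia:population_city", "988193"),
    ("dbpedia:Calgary", "skos:subject", "dbpedia:Category_Cities_in_Alberta"),
]


def is_variable(term):
    """Variables are written like SPARQL variables: ?name."""
    return term.startswith("?")


def match_pattern(pattern, graph):
    """Return one binding dict per triple that matches the pattern."""
    results = []
    for triple in graph:
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if is_variable(p_term):
                # A variable that occurs twice must bind to the same value.
                if binding.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            # No term mismatched, so the triple matches the pattern.
            results.append(binding)
    return results


print(match_pattern(("dbpedia:Calgary", "?property", "?value"), GRAPH))
print(match_pattern(("?city", "dbpedia:altitude", "?alt"), GRAPH))
```

Each result is a set of variable bindings, which is also the shape of the answer a SELECT query returns.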
The DBpedia SPARQL Endpoint
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  - Give me all sitcoms that are set in NYC
  - All tennis players from Moscow
  - All films by Quentin Tarantino
  - All German musicians who were born in Berlin in the 19th century
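A question such as "all films by Quentin Tarantino" reaches the endpoint as an HTTP request carrying a SPARQL query. The sketch below only builds the request URL and does not send it. The `query` parameter comes from the SPARQL Protocol. The `format` parameter is a convenience offered by Virtuoso-based endpoints and should be treated as an assumption here, as should the prefix, property and resource names in the example query, which are illustrative and may not match the live dataset.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query: the prefix, property and resource names are assumptions.
QUERY = """
PREFIX dbpedia2: <http://dbpedia.org/property/>
SELECT ?film WHERE {
  ?film dbpedia2:director <http://dbpedia.org/resource/Quentin_Tarantino>
}
"""


def build_request_url(endpoint, query,
                      result_format="application/sparql-results+json"):
    """Encode a SPARQL query as a GET request URL for the endpoint."""
    params = {"query": query.strip(), "format": result_format}
    return endpoint + "?" + urlencode(params)


url = build_request_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text and format.
decoded = parse_qs(urlsplit(url).query)
print(decoded["format"][0])
```

Building the URL with `urlencode` rather than string concatenation ensures that spaces, angle brackets and line breaks in the query are percent-encoded correctly.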
Interesting Example
To find everything Bart wrote on the blackboard in season 12 of The Simpsons:
• The Wikipedia pages for the Simpsons episodes are the identified things that we treat as the subjects of our RDF triples.
• The bottom of the Wikipedia page for the "Tennis the Menace" episode tells us that it is a member of the Wikipedia category "The Simpsons episodes, season 12".
• The episode's DBpedia page tells us that p:blackboard is the property name for the Wikipedia infobox "Chalkboard" field.
  SELECT ?episode ?chalkboard_gag WHERE {
    ?episode skos:subject <http://dbpedia.org/resource/Category:The_Simpsons_episodes%2C_season_12> .
    ?episode dbpedia2:blackboard ?chalkboard_gag
  }
[Figure: the query results, shown as a table of entities]
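An endpoint typically returns the bindings of such a SELECT query in the SPARQL JSON results format. The sketch below flattens a response of that shape into rows. The sample response is hand-written and hypothetical: its episode URIs and chalkboard text are placeholders, not data retrieved from DBpedia, and the function name `bindings_to_rows` is invented for the example.

```python
import json

# Hand-written, hypothetical response in the SPARQL JSON results layout.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["episode", "chalkboard_gag"]},
  "results": {"bindings": [
    {
      "episode": {"type": "uri",
                  "value": "http://dbpedia.org/resource/Example_Episode"},
      "chalkboard_gag": {"type": "literal", "value": "placeholder gag text"}
    },
    {
      "episode": {"type": "uri",
                  "value": "http://dbpedia.org/resource/Another_Episode"}
    }
  ]}
}
"""


def bindings_to_rows(response_text):
    """Return (variables, rows); variables missing from a binding become None."""
    data = json.loads(response_text)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        rows.append(tuple(
            binding[var]["value"] if var in binding else None
            for var in variables
        ))
    return variables, rows


variables, rows = bindings_to_rows(SAMPLE_RESPONSE)
print(variables)
for row in rows:
    print(row)
```

The second sample binding omits one variable only to show how the format represents unbound values; the query above has no optional parts, so its real results would bind both variables in every row.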
The Linked Data Interface
• A large body of information and knowledge is often already available in structured form, yet not accessible as such on the Web.
• Integrating open data provides real value: it saves the time and effort of re-entering data that is already out there, and it leaves the data and its editing where they belong, at the origin.
• Linked Data on the Web can be accessed using Semantic Web browsers, just as the traditional Web of documents is accessed using HTML browsers.
• Semantic Web browsers enable users to navigate between different data sources by following RDF links. These links also allow the robots of Semantic Web search engines to crawl the Semantic Web.
The Linked Data Interface
• The project follows the Linked Data principles.
• All concepts are identified using Uniform Resource Identifier (URI) references. A URI is a compact string of characters used to identify or name a resource.
• The Linked Data interface can be used by
  - Semantic Web browsers like
    - DISCO Hyperdata Browser
    - Tabulator Browser
    - OpenLink RDF Browser
  - Semantic Web crawlers like
    - Zitgist (Zitgist LLC, USA)
    - SWSE (DERI, Ireland)
    - Swoogle (UMBC, USA)
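The way a Semantic Web crawler follows RDF links from one resource to the next can be sketched as a breadth-first traversal. To keep the example runnable offline, the "Web" here is a small in-memory dictionary. The link structure, the `example.org` URI and the function name `crawl` are hypothetical; a real crawler would dereference each URI over HTTP and parse the RDF it gets back.

```python
from collections import deque

# Hypothetical link structure: each URI maps to the URIs its description links to.
LINKED_DATA = {
    "http://dbpedia.org/resource/Calgary": [
        "http://dbpedia.org/resource/Dave_Bronconnier",
        "http://dbpedia.org/resource/Calgary_City_Council",
    ],
    "http://dbpedia.org/resource/Dave_Bronconnier": [
        "http://dbpedia.org/resource/Calgary",
    ],
    "http://dbpedia.org/resource/Calgary_City_Council": [
        "http://example.org/other-dataset/city-councils",
    ],
}


def crawl(start_uri, lookup, max_resources=10):
    """Visit resources breadth-first by following their outgoing links."""
    visited = []
    seen = {start_uri}
    queue = deque([start_uri])
    while queue and len(visited) < max_resources:
        uri = queue.popleft()
        visited.append(uri)
        # Stand-in for an HTTP lookup of the URI's RDF description.
        for target in lookup.get(uri, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


for uri in crawl("http://dbpedia.org/resource/Calgary", LINKED_DATA):
    print(uri)
```

The `seen` set prevents the crawler from looping on mutual links, and the last URI visited lies outside DBpedia, which is the point of interlinking datasets.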
DBpedia Use Cases
1. Improving Wikipedia Search
2. Royalty-Free Data Source for other Applications
3. Nucleus for the Emerging Web of Data
Improving Wikipedia Search (Various Interfaces)
Improving Wikipedia Search
Query to find all web browser software at http://wikipedia.askw.org
Royalty-Free Data Source for other Applications
• DBpedia is published under the GNU Free Documentation License
• Example use case: SPARQL-generated tables within webpages
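One way to picture the "SPARQL-generated tables" use case is to take the rows returned by a query and render them as an HTML table for embedding in a page. The following sketch uses only the Python standard library. The function name `render_table` and the sample rows are assumptions for illustration; the two figures are the Calgary values quoted in the slides.

```python
from html import escape


def render_table(headers, rows):
    """Render query results as a minimal HTML table, escaping all cell text."""
    lines = ["<table>"]
    header_cells = "".join(f"<th>{escape(h)}</th>" for h in headers)
    lines.append("  <tr>" + header_cells + "</tr>")
    for row in rows:
        cells = "".join(f"<td>{escape(str(cell))}</td>" for cell in row)
        lines.append("  <tr>" + cells + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


headers = ["property", "value"]
rows = [
    ("altitude", "1048"),
    ("population_city", "988193"),
    ("note", "<unescaped & unsafe>"),  # shows that markup in data is escaped
]

html_table = render_table(headers, rows)
print(html_table)
```

Escaping every cell matters because the values come from community-edited pages and may contain characters that would otherwise be interpreted as markup.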
Nucleus for the Emerging Web of Data
• W3C SWEO Linking Open Data Project
• Overall size of the dataset: over 1 billion RDF triples
• Outbound RDF links within DBpedia: 75,000
Proposed Improvements
• Better data cleansing required
• Improvement in the classification
• Interlink DBpedia with more datasets
• Improvement in the user interfaces
• Performance
• Scalability
• More expressiveness
Discussion
• DBpedia is the first and largest source of structured data on the Internet covering topics of general knowledge.
• DBpedia gains new information when it extracts data from the latest Wikipedia dump, whereas Freebase, in addition to Wikipedia extractions, gains new information through its userbase of editors.
  - Which one is the better approach?
• Can Freebase or DBpedia be a substitute for Wikipedia?
  - Freebase: not good, in that we would have two similar things (Wikipedia and Freebase)
  - DBpedia: not good, in that it extracts data from a dump
• How can we interlink Freebase and DBpedia?
• What can be killer applications using DBpedia?
  - If there is one, okay.
  - If there is none, do we really need a large, general structured knowledge base?
DBpediaorg is a effort to bull extract structured information from
Wikipediabull make this information available on the Web
under an open licensebull interlink the DBpedia dataset with other
datasets on the Web
Outline1 Extracting Structured Information from Wikipedia2 The DBpedia Dataset3 Accessing the DBpedia Dataset over the Web4 Use Casesbull Improving Wikipedia Searchbull Royalty-Free Data Source for other Applicationsbull Nucleus for the Emerging Web of Data
bullTitle
bullAbstract
bullInfoboxes
bullGeo-coordinates
bullCategories
bullImages
bullLinks
bullOther languages
bullOther wiki pages
bullTo the web
bullRedirects
bullDisambiguates
Extracting Structured Information from Wikipedia
1048607Wikipedia consists ofbull 1048698 69 million articlesbull 1048698 in 251 languagesbull 1048698 monthly growth-rate 4
1048607Wikipedia articles contain structured informationbull 1048698 infoboxes which use a template mechanismbull 1048698 images depicting the articlersquos topicbull 1048698 categorization of the articlebull 1048698 links to external webpagesbull 1048698 intra-wiki links to other articlesbull 1048698 inter-language links to articles about the same topic in
different languages
Overview of the DBpedia component
TraditionalWeb Browser
Web 20Mashups
Semantic WebBrowsers
SPARQLEndpoint
Linked Data SNORQLBrowser
QueryBuilder
Virtuoso
Articles
MySQL
Infobox Categories
Wikipedia Dumps
DB tablesArticle texts
DBpedia datasets loaded into
published via
Extraction
Wikitext Syntax
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
Questionbull How good is the extraction from
the markup in Wiki pages
1048607Short and long abstracts in 10 different languagesdbpediaCalgary
dbpediaabstract ldquoCalgary is the largest rdquoen dbpediaabstract ldquoCalgary ist eine Stadt rdquode
1048607Categorization informationdbpediaCalgary
skossubject dbpediaCategory_Cities_in_Alberta skossubject dbpediaHost_cities_Olympic_Games
1048607Links to the original Wikipedia articles pictures and relevantexternal web pages
dbpediaCalgaryfoafpage lthttpenwikipediaorgwikiCalgarygt dbpediawikipage-delthttpdewikipediaorgwiki
Calgarygt foafdepiction
lthttpuploadwikimediaorgthumb332gt dbpediareference lthttpwwwcalgarycagt dbpediareference lthttpwwwtourismcalgarycomgt
The structured information can be extracted from Wikipedia and can serve as a basis for enabling sophisticated queries against Wikipedia content
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
DBpedia Basics
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
1 SPARQL Endpoint
2 Linked Data Interface
3 DB Dumps for Download
Accessing the DBpedia Dataset over the Web
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
To know everything Bart wrote on blackboard board in season 12 of SimpsonsbullThe Simpson episode Wikipedia pages are the identified things that we would consider as the subjects of our RDF triplesbullThe bottom of the Wikipedia page for the Tennis the Menace episode tells us that it is a member of the Wikipedia category The Simpsons episodes season 12bullThe episodes DBpedia page tells us that pblackboard is the property name for the Wikipedia infobox Chalkboard field
SELECT episodechalkboard_gag WHERE episode skossubject lthttpdbpediaorgresourceCategoryThe_Simpsons_episodes2C_season_12gt episode dbpedia2blackboard chalkboard_gag
entities
Table
Interesting Example
The Linked Data Interface
bull A large body of information and knowledge is often already available in structured form yet not accessible as such on the Web
bull Integrating open data provides real value It saves the time and effort to re-enter data that is already out there and it leaves the data and editing where it belongs at its origin
bull Linked Data on the Web can be accessed using Semantic Web browsers just as the traditional Web of documents is accessed using HTML browsers
bull Semantic Web browsers enable users to navigate between different data sources by following RDF links
It also allows the robots of Semantic Web search engines to follow these links to crawl the Semantic Web
1048607The project follows the Linked Data principles
bull All concepts are identified using Uniform Resource Identifier references URI is a compact string of characters used to identify or name a resource
1048698 The Linked Data interface can be used by
bull Semantic Web Browsers like
- DISCO Hyperdata Browser
- Tabulator Browser
- OpenLink RDF Browser
bull Semantic Web Crawlers like
- Zitgist (Zitgist LLC USA)
- SWSE (DERI Ireland)
- Swoogle (UMBC USA )
The Linked Data Interface
DBpedia Use Cases
1 Improving Wikipedia Search
2 Royalty-Free Data Source for other Applications
3 Nucleus for the Emerging Web of Data
Improving Wikipedia Search (Various Interfaces)
Query to find all web browser SW at httpwikipediaaskworg
Improving Wikipedia Search
Royalty-Free Data Source for other Applications
1048607DBpedia is published under GNU Free Documentation License
1048607Example use case SPARQL generated tables within webpages
Nucleus for the Emerging Web of Data
1048607W3C SWEO Linking Open Data Project
1048607Over all size of the dataset over 1 billion RDF triples
1048607Out-bound RDF links within DBpedia 75000
1048607Better data cleansing required
1048607Improvement in the classification
1048607Interlink DBpedia with more datasets
1048607Improvement in the user interfaces
1048607Performance
1048607Scalability
1048607 More Expressiveness
Proposed Improvements
Discussionbull DBpedia is the first and largest source of structured data on the Internet
covering topics of general knowledge
bull DBpedia gains new information when it extracts data from the latest Wikipedia dump whereas Freebase in addition to Wikipedia extractions gains new information through its userbase of editors
ndash Which one is better approachbull Can Freebase or DBpedia be substitute for Wikipedia
ndash Freebase Not good in that we have two similar things ndash Wikipedia Freebasendash DBPedia Not good in that it extracts data from dump
bull How can we interlink Freebase amp DBpediabull What can be killer applications using Dbpedia
ndash If there is okayndash If there is no do we really need a large general structured knowledge
Outline1 Extracting Structured Information from Wikipedia2 The DBpedia Dataset3 Accessing the DBpedia Dataset over the Web4 Use Casesbull Improving Wikipedia Searchbull Royalty-Free Data Source for other Applicationsbull Nucleus for the Emerging Web of Data
bullTitle
bullAbstract
bullInfoboxes
bullGeo-coordinates
bullCategories
bullImages
bullLinks
bullOther languages
bullOther wiki pages
bullTo the web
bullRedirects
bullDisambiguates
Extracting Structured Information from Wikipedia
1048607Wikipedia consists ofbull 1048698 69 million articlesbull 1048698 in 251 languagesbull 1048698 monthly growth-rate 4
1048607Wikipedia articles contain structured informationbull 1048698 infoboxes which use a template mechanismbull 1048698 images depicting the articlersquos topicbull 1048698 categorization of the articlebull 1048698 links to external webpagesbull 1048698 intra-wiki links to other articlesbull 1048698 inter-language links to articles about the same topic in
different languages
Overview of the DBpedia component
TraditionalWeb Browser
Web 20Mashups
Semantic WebBrowsers
SPARQLEndpoint
Linked Data SNORQLBrowser
QueryBuilder
Virtuoso
Articles
MySQL
Infobox Categories
Wikipedia Dumps
DB tablesArticle texts
DBpedia datasets loaded into
published via
Extraction
Wikitext Syntax
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
Questionbull How good is the extraction from
the markup in Wiki pages
1048607Short and long abstracts in 10 different languagesdbpediaCalgary
dbpediaabstract ldquoCalgary is the largest rdquoen dbpediaabstract ldquoCalgary ist eine Stadt rdquode
1048607Categorization informationdbpediaCalgary
skossubject dbpediaCategory_Cities_in_Alberta skossubject dbpediaHost_cities_Olympic_Games
1048607Links to the original Wikipedia articles pictures and relevantexternal web pages
dbpediaCalgaryfoafpage lthttpenwikipediaorgwikiCalgarygt dbpediawikipage-delthttpdewikipediaorgwiki
Calgarygt foafdepiction
lthttpuploadwikimediaorgthumb332gt dbpediareference lthttpwwwcalgarycagt dbpediareference lthttpwwwtourismcalgarycomgt
The structured information can be extracted from Wikipedia and can serve as a basis for enabling sophisticated queries against Wikipedia content
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
DBpedia Basics
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
1 SPARQL Endpoint
2 Linked Data Interface
3 DB Dumps for Download
Accessing the DBpedia Dataset over the Web
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
To know everything Bart wrote on blackboard board in season 12 of SimpsonsbullThe Simpson episode Wikipedia pages are the identified things that we would consider as the subjects of our RDF triplesbullThe bottom of the Wikipedia page for the Tennis the Menace episode tells us that it is a member of the Wikipedia category The Simpsons episodes season 12bullThe episodes DBpedia page tells us that pblackboard is the property name for the Wikipedia infobox Chalkboard field
SELECT episodechalkboard_gag WHERE episode skossubject lthttpdbpediaorgresourceCategoryThe_Simpsons_episodes2C_season_12gt episode dbpedia2blackboard chalkboard_gag
entities
Table
Interesting Example
The Linked Data Interface
bull A large body of information and knowledge is often already available in structured form yet not accessible as such on the Web
bull Integrating open data provides real value It saves the time and effort to re-enter data that is already out there and it leaves the data and editing where it belongs at its origin
bull Linked Data on the Web can be accessed using Semantic Web browsers just as the traditional Web of documents is accessed using HTML browsers
bull Semantic Web browsers enable users to navigate between different data sources by following RDF links
It also allows the robots of Semantic Web search engines to follow these links to crawl the Semantic Web
1048607The project follows the Linked Data principles
bull All concepts are identified using Uniform Resource Identifier references URI is a compact string of characters used to identify or name a resource
1048698 The Linked Data interface can be used by
bull Semantic Web Browsers like
- DISCO Hyperdata Browser
- Tabulator Browser
- OpenLink RDF Browser
bull Semantic Web Crawlers like
- Zitgist (Zitgist LLC USA)
- SWSE (DERI Ireland)
- Swoogle (UMBC USA )
The Linked Data Interface
DBpedia Use Cases
1 Improving Wikipedia Search
2 Royalty-Free Data Source for other Applications
3 Nucleus for the Emerging Web of Data
Improving Wikipedia Search (Various Interfaces)
Query to find all web browser SW at httpwikipediaaskworg
Improving Wikipedia Search
Royalty-Free Data Source for other Applications
1048607DBpedia is published under GNU Free Documentation License
1048607Example use case SPARQL generated tables within webpages
Nucleus for the Emerging Web of Data
1048607W3C SWEO Linking Open Data Project
1048607Over all size of the dataset over 1 billion RDF triples
1048607Out-bound RDF links within DBpedia 75000
1048607Better data cleansing required
1048607Improvement in the classification
1048607Interlink DBpedia with more datasets
1048607Improvement in the user interfaces
1048607Performance
1048607Scalability
1048607 More Expressiveness
Proposed Improvements
Discussionbull DBpedia is the first and largest source of structured data on the Internet
covering topics of general knowledge
bull DBpedia gains new information when it extracts data from the latest Wikipedia dump whereas Freebase in addition to Wikipedia extractions gains new information through its userbase of editors
ndash Which one is better approachbull Can Freebase or DBpedia be substitute for Wikipedia
ndash Freebase Not good in that we have two similar things ndash Wikipedia Freebasendash DBPedia Not good in that it extracts data from dump
bull How can we interlink Freebase amp DBpediabull What can be killer applications using Dbpedia
ndash If there is okayndash If there is no do we really need a large general structured knowledge
bullTitle
bullAbstract
bullInfoboxes
bullGeo-coordinates
bullCategories
bullImages
bullLinks
bullOther languages
bullOther wiki pages
bullTo the web
bullRedirects
bullDisambiguates
Extracting Structured Information from Wikipedia
1048607Wikipedia consists ofbull 1048698 69 million articlesbull 1048698 in 251 languagesbull 1048698 monthly growth-rate 4
1048607Wikipedia articles contain structured informationbull 1048698 infoboxes which use a template mechanismbull 1048698 images depicting the articlersquos topicbull 1048698 categorization of the articlebull 1048698 links to external webpagesbull 1048698 intra-wiki links to other articlesbull 1048698 inter-language links to articles about the same topic in
different languages
Overview of the DBpedia component
TraditionalWeb Browser
Web 20Mashups
Semantic WebBrowsers
SPARQLEndpoint
Linked Data SNORQLBrowser
QueryBuilder
Virtuoso
Articles
MySQL
Infobox Categories
Wikipedia Dumps
DB tablesArticle texts
DBpedia datasets loaded into
published via
Extraction
Wikitext Syntax
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
Questionbull How good is the extraction from
the markup in Wiki pages
1048607Short and long abstracts in 10 different languagesdbpediaCalgary
dbpediaabstract ldquoCalgary is the largest rdquoen dbpediaabstract ldquoCalgary ist eine Stadt rdquode
1048607Categorization informationdbpediaCalgary
skossubject dbpediaCategory_Cities_in_Alberta skossubject dbpediaHost_cities_Olympic_Games
1048607Links to the original Wikipedia articles pictures and relevantexternal web pages
dbpediaCalgaryfoafpage lthttpenwikipediaorgwikiCalgarygt dbpediawikipage-delthttpdewikipediaorgwiki
Calgarygt foafdepiction
lthttpuploadwikimediaorgthumb332gt dbpediareference lthttpwwwcalgarycagt dbpediareference lthttpwwwtourismcalgarycomgt
The structured information can be extracted from Wikipedia and can serve as a basis for enabling sophisticated queries against Wikipedia content
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
DBpedia Basics
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
1 SPARQL Endpoint
2 Linked Data Interface
3 DB Dumps for Download
Accessing the DBpedia Dataset over the Web
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
To know everything Bart wrote on blackboard board in season 12 of SimpsonsbullThe Simpson episode Wikipedia pages are the identified things that we would consider as the subjects of our RDF triplesbullThe bottom of the Wikipedia page for the Tennis the Menace episode tells us that it is a member of the Wikipedia category The Simpsons episodes season 12bullThe episodes DBpedia page tells us that pblackboard is the property name for the Wikipedia infobox Chalkboard field
SELECT episodechalkboard_gag WHERE episode skossubject lthttpdbpediaorgresourceCategoryThe_Simpsons_episodes2C_season_12gt episode dbpedia2blackboard chalkboard_gag
entities
Table
Interesting Example
The Linked Data Interface
bull A large body of information and knowledge is often already available in structured form yet not accessible as such on the Web
bull Integrating open data provides real value It saves the time and effort to re-enter data that is already out there and it leaves the data and editing where it belongs at its origin
bull Linked Data on the Web can be accessed using Semantic Web browsers just as the traditional Web of documents is accessed using HTML browsers
bull Semantic Web browsers enable users to navigate between different data sources by following RDF links
It also allows the robots of Semantic Web search engines to follow these links to crawl the Semantic Web
1048607The project follows the Linked Data principles
bull All concepts are identified using Uniform Resource Identifier references URI is a compact string of characters used to identify or name a resource
1048698 The Linked Data interface can be used by
bull Semantic Web Browsers like
- DISCO Hyperdata Browser
- Tabulator Browser
- OpenLink RDF Browser
bull Semantic Web Crawlers like
- Zitgist (Zitgist LLC USA)
- SWSE (DERI Ireland)
- Swoogle (UMBC USA )
The Linked Data Interface
DBpedia Use Cases
1 Improving Wikipedia Search
2 Royalty-Free Data Source for other Applications
3 Nucleus for the Emerging Web of Data
Improving Wikipedia Search (Various Interfaces)
Query to find all web browser SW at httpwikipediaaskworg
Improving Wikipedia Search
Royalty-Free Data Source for other Applications
1048607DBpedia is published under GNU Free Documentation License
1048607Example use case SPARQL generated tables within webpages
Nucleus for the Emerging Web of Data
1048607W3C SWEO Linking Open Data Project
1048607Over all size of the dataset over 1 billion RDF triples
1048607Out-bound RDF links within DBpedia 75000
1048607Better data cleansing required
1048607Improvement in the classification
1048607Interlink DBpedia with more datasets
1048607Improvement in the user interfaces
1048607Performance
1048607Scalability
1048607 More Expressiveness
Proposed Improvements
Discussionbull DBpedia is the first and largest source of structured data on the Internet
covering topics of general knowledge
bull DBpedia gains new information when it extracts data from the latest Wikipedia dump whereas Freebase in addition to Wikipedia extractions gains new information through its userbase of editors
ndash Which one is better approachbull Can Freebase or DBpedia be substitute for Wikipedia
ndash Freebase Not good in that we have two similar things ndash Wikipedia Freebasendash DBPedia Not good in that it extracts data from dump
bull How can we interlink Freebase amp DBpediabull What can be killer applications using Dbpedia
ndash If there is okayndash If there is no do we really need a large general structured knowledge
Extracting Structured Information from Wikipedia
1048607Wikipedia consists ofbull 1048698 69 million articlesbull 1048698 in 251 languagesbull 1048698 monthly growth-rate 4
1048607Wikipedia articles contain structured informationbull 1048698 infoboxes which use a template mechanismbull 1048698 images depicting the articlersquos topicbull 1048698 categorization of the articlebull 1048698 links to external webpagesbull 1048698 intra-wiki links to other articlesbull 1048698 inter-language links to articles about the same topic in
different languages
Overview of the DBpedia component
TraditionalWeb Browser
Web 20Mashups
Semantic WebBrowsers
SPARQLEndpoint
Linked Data SNORQLBrowser
QueryBuilder
Virtuoso
Articles
MySQL
Infobox Categories
Wikipedia Dumps
DB tablesArticle texts
DBpedia datasets loaded into
published via
Extraction
Extracting Infobox Data (RDF Representation)
Wikitext Syntax
http://en.wikipedia.org/wiki/Calgary
RDF Representation
http://dbpedia.org/resource/Calgary
    dbpedia:native_name "Calgary" ;
    dbpedia:altitude "1048" ;
    dbpedia:population_city "988193" ;
    dbpedia:population_metro "1079310" ;
    dbpedia:mayor_name dbpedia:Dave_Bronconnier ;
    dbpedia:governing_body dbpedia:Calgary_City_Council .
Question
• How good is the extraction from the markup in wiki pages?
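To make the infobox-to-RDF step concrete, here is a minimal sketch in Python. It is not the DBpedia extraction code: the sample wikitext, the regular expressions, the function name `infobox_to_triples`, and the choice of `http://dbpedia.org/property/` as the property namespace are all assumptions made for illustration. It only handles simple `| key = value` lines and plain wiki links.

```python
import re

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
# Assumption: infobox properties live under this namespace.
DBPEDIA_PROPERTY = "http://dbpedia.org/property/"

# Hypothetical infobox fragment; the values are the ones shown on the slide.
SAMPLE_WIKITEXT = """{{Infobox City
| native_name = Calgary
| altitude = 1048
| population_city = 988193
| mayor_name = [[Dave Bronconnier]]
}}"""

# One "| key = value" line of an infobox template.
FIELD = re.compile(r"^\|[ \t]*(\w+)[ \t]*=[ \t]*(.+?)[ \t]*$", re.MULTILINE)
# A value that is exactly one wiki link, optionally with a display label.
WIKILINK = re.compile(r"^\[\[([^\]|]+)(?:\|[^\]]*)?\]\]$")


def infobox_to_triples(page_title, wikitext):
    """Turn simple infobox fields into (subject, predicate, object) strings."""
    subject = "<" + DBPEDIA_RESOURCE + page_title.replace(" ", "_") + ">"
    triples = []
    for key, value in FIELD.findall(wikitext):
        link = WIKILINK.match(value)
        if link:
            # Wiki links become references to other resources.
            target = link.group(1).strip().replace(" ", "_")
            obj = "<" + DBPEDIA_RESOURCE + target + ">"
        else:
            # Everything else is kept as a plain string literal.
            obj = '"' + value.replace('"', '\\"') + '"'
        triples.append((subject, "<" + DBPEDIA_PROPERTY + key + ">", obj))
    return triples


if __name__ == "__main__":
    for s, p, o in infobox_to_triples("Calgary", SAMPLE_WIKITEXT):
        print(s, p, o, ".")
```

Real infoboxes contain nested templates, units, and multi-line values, which is where the question of extraction quality raised above becomes relevant; this sketch ignores all of them.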
• Short and long abstracts in 10 different languages
  dbpedia:Calgary
      dbpedia:abstract "Calgary is the largest ..."@en ;
      dbpedia:abstract "Calgary ist eine Stadt ..."@de .
• Categorization information
  dbpedia:Calgary
      skos:subject dbpedia:Category_Cities_in_Alberta ;
      skos:subject dbpedia:Host_cities_Olympic_Games .
• Links to the original Wikipedia articles, pictures, and relevant external web pages
  dbpedia:Calgary
      foaf:page <http://en.wikipedia.org/wiki/Calgary> ;
      dbpedia:wikipage-de <http://de.wikipedia.org/wiki/Calgary> ;
      foaf:depiction <http://upload.wikimedia.org/thumb/3/32/...> ;
      dbpedia:reference <http://www.calgary.ca/> ;
      dbpedia:reference <http://www.tourismcalgary.com/> .
DBpedia Basics
Structured information can be extracted from Wikipedia and can serve as a basis for sophisticated queries against Wikipedia content.
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing the extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. The Developers Guide to Semantic Web Toolkits lists development toolkits for processing DBpedia data in your preferred programming language.
The DBpedia Dataset
• 1,600,000 concepts
• including
  - 58,000 persons
  - 70,000 places
  - 35,000 music albums
  - 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
Accessing the DBpedia Dataset over the Web
1. SPARQL Endpoint
2. Linked Data Interface
3. DB Dumps for Download
SPARQL
• SPARQL is a query language for RDF.
• RDF is a directed, labeled graph data format for representing information on the Web.
• The W3C SPARQL specification defines the syntax and semantics of the SPARQL query language for RDF.
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware.
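The idea of querying a directed, labeled graph can be illustrated with a small Python sketch. It matches a single triple pattern against an in-memory list of triples, which is the basic building block of a SPARQL query, not a SPARQL engine. The function name `match_pattern` and the tiny graph (reusing Calgary values from the earlier slide) are assumptions made for illustration.

```python
# A tiny in-memory graph: each edge is a (subject, predicate, object) triple.
GRAPH = [
    ("dbpedia:Calgary", "dbpedia:mayor_name", "dbpedia:Dave_Bronconnier"),
    ("dbpedia:Calgary", "dbpedia:governing_body", "dbpedia:Calgary_City_Council"),
    ("dbpedia:Calgary", "skos:subject", "dbpedia:Category_Cities_in_Alberta"),
]


def match_pattern(graph, pattern):
    """Return one variable binding (a dict) per triple that matches the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    A variable that occurs twice must bind to the same value both times.
    """
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable, conflicting values
                binding[term] = value
            elif term != value:
                break  # constant term does not match
        else:
            results.append(binding)
    return results


if __name__ == "__main__":
    # Roughly: SELECT ?mayor WHERE { dbpedia:Calgary dbpedia:mayor_name ?mayor }
    print(match_pattern(GRAPH, ("dbpedia:Calgary", "dbpedia:mayor_name", "?mayor")))
```

A full query joins the bindings of several such patterns on their shared variables.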
The DBpedia SPARQL Endpoint
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  - Give me all sitcoms that are set in NYC
  - All tennis players from Moscow
  - All films by Quentin Tarantino
  - All German musicians that were born in Berlin in the 19th century
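As a rough sketch of how a client might address the endpoint, the Python snippet below builds the URL for an HTTP GET request without sending it. The `query` parameter is the one defined by the SPARQL protocol; the `format` parameter, the helper name `build_query_url`, and the sample query are assumptions, and the endpoint may expect different options.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"


def build_query_url(query, endpoint=ENDPOINT,
                    result_format="application/sparql-results+json"):
    """Return the GET URL that would submit `query` to `endpoint`."""
    # Assumption: the server accepts a "format" parameter for the result type.
    params = {"query": query, "format": result_format}
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical query, used only to show the encoding.
    url = build_query_url("SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
    print(url)
    # Decoding the URL again recovers the original query text.
    print(parse_qs(urlsplit(url).query)["query"][0])
```

Building the URL separately from sending it keeps the sketch free of network access; an actual client would pass the URL to an HTTP library.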
Interesting Example
To find everything Bart wrote on the blackboard in season 12 of The Simpsons:
• The Wikipedia pages for the Simpsons episodes are the identified things that we treat as the subjects of our RDF triples.
• The bottom of the Wikipedia page for the "Tennis the Menace" episode tells us that it is a member of the Wikipedia category "The Simpsons episodes, season 12".
• The episode's DBpedia page tells us that p:blackboard (written dbpedia2:blackboard in the query) is the property name for the Wikipedia infobox "Chalkboard" field.
SELECT ?episode ?chalkboard_gag WHERE {
    ?episode skos:subject <http://dbpedia.org/resource/Category:The_Simpsons_episodes%2C_season_12> .
    ?episode dbpedia2:blackboard ?chalkboard_gag
}
[Figure: screenshot with the labels "entities" and "Table"]
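The category URI in the Simpsons query contains a percent-encoded comma (%2C), which is easy to get wrong by hand. The following Python sketch assembles the same query text from a plain category name. The helper names, the `PREFIX` declarations, and in particular the namespace assumed for `dbpedia2:` are illustrative assumptions, not taken from the slides.

```python
from urllib.parse import quote

PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    # Assumption: dbpedia2: abbreviates the infobox property namespace.
    "dbpedia2": "http://dbpedia.org/property/",
}


def category_uri(category_name):
    """Build a category URI: spaces become underscores, commas become %2C."""
    local_name = quote(category_name.replace(" ", "_"), safe="_")
    return "http://dbpedia.org/resource/Category:" + local_name


def chalkboard_query(category_name):
    """Return the text of the chalkboard-gag query for one episode category."""
    lines = [f"PREFIX {prefix}: <{ns}>" for prefix, ns in PREFIXES.items()]
    lines += [
        "SELECT ?episode ?chalkboard_gag WHERE {",
        f"  ?episode skos:subject <{category_uri(category_name)}> .",
        "  ?episode dbpedia2:blackboard ?chalkboard_gag",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(chalkboard_query("The Simpsons episodes, season 12"))
```

Changing the argument to another season's category name would yield the corresponding query, provided that category exists in the dataset.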
The Linked Data Interface
• A large body of information and knowledge is often already available in structured form, yet not accessible as such on the Web.
• Integrating open data provides real value. It saves the time and effort of re-entering data that is already out there, and it leaves the data and its editing where they belong: at their origin.
• Linked Data on the Web can be accessed using Semantic Web browsers, just as the traditional Web of documents is accessed using HTML browsers.
• Semantic Web browsers enable users to navigate between different data sources by following RDF links.
• RDF links also allow the robots of Semantic Web search engines to follow these links and crawl the Semantic Web.
The Linked Data Interface (continued)
• The project follows the Linked Data principles
  - All concepts are identified using Uniform Resource Identifier (URI) references. A URI is a compact string of characters used to identify or name a resource.
• The Linked Data interface can be used by
  - Semantic Web browsers like
      DISCO Hyperdata Browser
      Tabulator Browser
      OpenLink RDF Browser
  - Semantic Web crawlers like
      Zitgist (Zitgist LLC, USA)
      SWSE (DERI, Ireland)
      Swoogle (UMBC, USA)
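A Semantic Web browser or crawler typically dereferences a concept URI and asks for RDF rather than HTML. The Python sketch below only prepares such a request and does not send it. That the server honours the `Accept` header in this way, the media type `application/rdf+xml` as the default, and the helper name `linked_data_request` are assumptions for illustration.

```python
from urllib.request import Request


def linked_data_request(resource_uri, accept="application/rdf+xml"):
    """Prepare (but do not send) a GET request asking for an RDF description."""
    # Assumption: the server selects the representation from the Accept header.
    return Request(resource_uri, headers={"Accept": accept})


if __name__ == "__main__":
    request = linked_data_request("http://dbpedia.org/resource/Calgary")
    print(request.get_method(), request.full_url)
    print("Accept:", request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; a crawler would then parse the returned RDF and follow the links it contains.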
DBpedia Use Cases
1. Improving Wikipedia Search
2. Royalty-Free Data Source for other Applications
3. Nucleus for the Emerging Web of Data
Improving Wikipedia Search (Various Interfaces)
Improving Wikipedia Search
Query to find all web browser software at http://wikipedia.askw.org
Royalty-Free Data Source for other Applications
• DBpedia is published under the GNU Free Documentation License
• Example use case: SPARQL-generated tables within web pages
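The "SPARQL-generated tables" use case can be sketched as follows, assuming the query results arrive in the standard SPARQL JSON results layout (`head.vars` and `results.bindings`). The sample result, its variable names, and the function name `results_to_html_table` are hypothetical; no query is actually executed here.

```python
from html import escape

# Hypothetical result in the SPARQL JSON results layout.
SAMPLE_RESULTS = {
    "head": {"vars": ["city", "population"]},
    "results": {"bindings": [
        {
            "city": {"type": "uri", "value": "http://dbpedia.org/resource/Calgary"},
            "population": {"type": "literal", "value": "988193"},
        },
    ]},
}


def results_to_html_table(results):
    """Render SPARQL JSON results as a plain HTML table string."""
    variables = results["head"]["vars"]
    # Header row: one column per query variable.
    rows = ["<tr>" + "".join(f"<th>{escape(v)}</th>" for v in variables) + "</tr>"]
    for binding in results["results"]["bindings"]:
        # Unbound variables are rendered as empty cells.
        cells = [escape(binding.get(v, {}).get("value", "")) for v in variables]
        rows.append("<tr>" + "".join(f"<td>{c}</td>" for c in cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print(results_to_html_table(SAMPLE_RESULTS))
```

Escaping every value before it is placed in the markup matters here because the data ultimately comes from freely editable wiki pages.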
Nucleus for the Emerging Web of Data
• W3C SWEO Linking Open Data Project
• Overall size of the dataset: over 1 billion RDF triples
• Outbound RDF links within DBpedia: 75,000
Proposed Improvements
• Better data cleansing required
• Improvement in the classification
• Interlink DBpedia with more datasets
• Improvement in the user interfaces
• Performance
• Scalability
• More expressiveness
Discussion
• DBpedia is the first and largest source of structured data on the Internet covering topics of general knowledge.
• DBpedia gains new information when it extracts data from the latest Wikipedia dump, whereas Freebase, in addition to Wikipedia extractions, gains new information through its userbase of editors.
  - Which one is the better approach?
• Can Freebase or DBpedia be a substitute for Wikipedia?
  - Freebase: not good, in that we would have two similar things (Wikipedia and Freebase).
  - DBpedia: not good, in that it extracts data from a dump.
• How can we interlink Freebase and DBpedia?
• What could be killer applications using DBpedia?
  - If there are some, fine.
  - If there are none, do we really need large, general structured knowledge?
Questionbull How good is the extraction from
the markup in Wiki pages
• Short and long abstracts in 10 different languages

    dbpedia:Calgary
        dbpedia:abstract "Calgary is the largest ..."@en ;
        dbpedia:abstract "Calgary ist eine Stadt ..."@de .

• Categorization information

    dbpedia:Calgary
        skos:subject dbpedia:Category_Cities_in_Alberta ;
        skos:subject dbpedia:Host_cities_Olympic_Games .

• Links to the original Wikipedia articles, pictures, and relevant external web pages

    dbpedia:Calgary
        foaf:page <http://en.wikipedia.org/wiki/Calgary> ;
        dbpedia:wikipage-de <http://de.wikipedia.org/wiki/Calgary> ;
        foaf:depiction <http://upload.wikimedia.org/thumb332> ;
        dbpedia:reference <http://www.calgary.ca> ;
        dbpedia:reference <http://www.tourismcalgary.com> .
DBpedia Basics
Structured information can be extracted from Wikipedia and can serve as a basis for enabling sophisticated queries against Wikipedia content.
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. The "Developers Guide to Semantic Web Toolkits" lists development toolkits for processing DBpedia data in your preferred programming language.
The DBpedia Dataset
• 1,600,000 concepts, including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
Accessing the DBpedia Dataset over the Web
1. SPARQL Endpoint
2. Linked Data Interface
3. DB Dumps for Download
SPARQL
• SPARQL is a query language for RDF.
  – RDF is a directed, labeled graph data format for representing information on the Web.
  – The W3C SPARQL specification defines the syntax and semantics of the SPARQL query language for RDF.
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware.
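To make the idea of querying a labeled graph concrete, here is a minimal sketch of triple-pattern matching over an in-memory list of triples. It is not SPARQL and not how any particular triple store works; the helper names are hypothetical, and the sample triples are illustrative data loosely modelled on the Calgary example (the Edmonton triple is invented for the demonstration).

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Illustrative sample graph; the Edmonton triple is a made-up addition.
TRIPLES: List[Triple] = [
    ("dbpedia:Calgary", "skos:subject", "dbpedia:Category_Cities_in_Alberta"),
    ("dbpedia:Calgary", "foaf:page", "http://en.wikipedia.org/wiki/Calgary"),
    ("dbpedia:Edmonton", "skos:subject", "dbpedia:Category_Cities_in_Alberta"),
]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to extend `binding` so that `pattern` matches `triple`."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            # A constant must match the triple's term exactly.
            return None
    return result


def query(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Return every variable binding that satisfies all patterns."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


cities = query(
    [("?city", "skos:subject", "dbpedia:Category_Cities_in_Alberta")], TRIPLES
)
print(cities)
```

Passing several patterns that share a variable behaves like a join: only bindings consistent with every pattern survive.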
The DBpedia SPARQL Endpoint
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians who were born in Berlin in the 19th century
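A client could send a query to such an endpoint as an HTTP GET request with the query text in a `query` parameter, as the SPARQL Protocol describes. The sketch below only builds the request and does not send it. The helper name and the example query are hypothetical, and whether the server honours the JSON results media type in the `Accept` header is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint address from the slide


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    # urlencode percent-encodes the query text so it is safe inside a URL.
    url = endpoint + "?" + urlencode({"query": query})
    # Assumption: the server can return results in the SPARQL JSON format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical query, used only to show the encoding.
example_query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"
request = build_sparql_request(example_query)
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual call; it is left out so the sketch runs without network access.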
Interesting Example
To find everything Bart wrote on the blackboard in season 12 of The Simpsons:
• The Simpsons episode Wikipedia pages are the identified things that we would consider as the subjects of our RDF triples.
• The bottom of the Wikipedia page for the "Tennis the Menace" episode tells us that it is a member of the Wikipedia category "The Simpsons episodes, season 12".
• The episode's DBpedia page tells us that p:blackboard is the property name for the Wikipedia infobox "Chalkboard" field.

    SELECT ?episode ?chalkboard_gag WHERE {
      ?episode skos:subject <http://dbpedia.org/resource/Category:The_Simpsons_episodes%2C_season_12> .
      ?episode dbpedia2:blackboard ?chalkboard_gag
    }
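The answer to a SELECT query like this one is a table with one column per variable. As a sketch, assuming the endpoint returns the standard SPARQL JSON results format, the bindings can be flattened into rows as shown below. The sample document is invented placeholder data, not real query output, and the function name is hypothetical.

```python
import json
from typing import Dict, List

# Placeholder response in the SPARQL JSON results layout (not real data).
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["episode", "chalkboard_gag"]},
  "results": {"bindings": [
    {"episode": {"type": "uri", "value": "http://dbpedia.org/resource/Example_episode"},
     "chalkboard_gag": {"type": "literal", "value": "Example gag text"}},
    {"episode": {"type": "uri", "value": "http://dbpedia.org/resource/Another_example"}}
  ]}
}
"""


def bindings_to_rows(document: str) -> List[Dict[str, str]]:
    """Turn a SPARQL JSON results document into a list of plain row dicts."""
    data = json.loads(document)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # A variable may be unbound in a given solution; use "" in that case.
        rows.append(
            {v: (binding[v]["value"] if v in binding else "") for v in variables}
        )
    return rows


for row in bindings_to_rows(SAMPLE_RESPONSE):
    print(row["episode"], "|", row["chalkboard_gag"])
```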
The Linked Data Interface
• A large body of information and knowledge is often already available in structured form, yet not accessible as such on the Web.
• Integrating open data provides real value. It saves the time and effort of re-entering data that is already out there, and it leaves the data and its editing where they belong: at the origin.
• Linked Data on the Web can be accessed using Semantic Web browsers, just as the traditional Web of documents is accessed using HTML browsers.
• Semantic Web browsers enable users to navigate between different data sources by following RDF links. This also allows the robots of Semantic Web search engines to follow these links and crawl the Semantic Web.
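The link-following behaviour described above can be sketched as a breadth-first traversal. The example below crawls a small in-memory stand-in for the Web of Data instead of making HTTP requests. The `WEB` dictionary, the predicate names, and the `example.org` URI are all hypothetical illustration data, and real crawlers work differently in many details.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical stand-in for the Web: URI -> triples returned when it is looked up.
WEB: Dict[str, List[Triple]] = {
    "http://dbpedia.org/resource/Calgary": [
        ("http://dbpedia.org/resource/Calgary", "ex:locatedIn",
         "http://dbpedia.org/resource/Alberta"),
        ("http://dbpedia.org/resource/Calgary", "owl:sameAs",
         "http://example.org/geo/Calgary"),
    ],
    "http://dbpedia.org/resource/Alberta": [
        ("http://dbpedia.org/resource/Alberta", "rdfs:label", "Alberta"),
    ],
    "http://example.org/geo/Calgary": [
        ("http://example.org/geo/Calgary", "owl:sameAs",
         "http://dbpedia.org/resource/Calgary"),
    ],
}


def crawl(start: str, fetch: Callable[[str], List[Triple]], limit: int = 10) -> List[str]:
    """Visit URIs breadth-first by following object URIs in fetched triples."""
    seen = {start}
    queue = deque([start])
    visited: List[str] = []
    while queue and len(visited) < limit:
        uri = queue.popleft()
        visited.append(uri)
        for _subject, _predicate, obj in fetch(uri):
            # Only URI objects are links; literals such as labels are skipped.
            if obj.startswith("http://") and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return visited


order = crawl("http://dbpedia.org/resource/Calgary", lambda uri: WEB.get(uri, []))
print(order)
```

The `seen` set keeps the crawler from looping when two data sources link to each other, which interlinked datasets routinely do.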
• The project follows the Linked Data principles
  – All concepts are identified using Uniform Resource Identifier (URI) references. A URI is a compact string of characters used to identify or name a resource.
• The Linked Data interface can be used by
  – Semantic Web browsers like
    - DISCO Hyperdata Browser
    - Tabulator Browser
    - OpenLink RDF Browser
  – Semantic Web crawlers like
    - Zitgist (Zitgist LLC, USA)
    - SWSE (DERI, Ireland)
    - Swoogle (UMBC, USA)
DBpedia Use Cases
1 Improving Wikipedia Search
2 Royalty-Free Data Source for other Applications
3 Nucleus for the Emerging Web of Data
Improving Wikipedia Search (Various Interfaces)
Query to find all web browser SW at httpwikipediaaskworg
Improving Wikipedia Search
Royalty-Free Data Source for other Applications
1048607DBpedia is published under GNU Free Documentation License
1048607Example use case SPARQL generated tables within webpages
Nucleus for the Emerging Web of Data
1048607W3C SWEO Linking Open Data Project
1048607Over all size of the dataset over 1 billion RDF triples
1048607Out-bound RDF links within DBpedia 75000
1048607Better data cleansing required
1048607Improvement in the classification
1048607Interlink DBpedia with more datasets
1048607Improvement in the user interfaces
1048607Performance
1048607Scalability
1048607 More Expressiveness
Proposed Improvements
Discussionbull DBpedia is the first and largest source of structured data on the Internet
covering topics of general knowledge
bull DBpedia gains new information when it extracts data from the latest Wikipedia dump whereas Freebase in addition to Wikipedia extractions gains new information through its userbase of editors
ndash Which one is better approachbull Can Freebase or DBpedia be substitute for Wikipedia
ndash Freebase Not good in that we have two similar things ndash Wikipedia Freebasendash DBPedia Not good in that it extracts data from dump
bull How can we interlink Freebase amp DBpediabull What can be killer applications using Dbpedia
ndash If there is okayndash If there is no do we really need a large general structured knowledge
1048607Short and long abstracts in 10 different languagesdbpediaCalgary
dbpediaabstract ldquoCalgary is the largest rdquoen dbpediaabstract ldquoCalgary ist eine Stadt rdquode
1048607Categorization informationdbpediaCalgary
skossubject dbpediaCategory_Cities_in_Alberta skossubject dbpediaHost_cities_Olympic_Games
1048607Links to the original Wikipedia articles pictures and relevantexternal web pages
dbpediaCalgaryfoafpage lthttpenwikipediaorgwikiCalgarygt dbpediawikipage-delthttpdewikipediaorgwiki
Calgarygt foafdepiction
lthttpuploadwikimediaorgthumb332gt dbpediareference lthttpwwwcalgarycagt dbpediareference lthttpwwwtourismcalgarycomgt
The structured information can be extracted from Wikipedia and can serve as a basis for enabling sophisticated queries against Wikipedia content
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
DBpedia Basics
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
1 SPARQL Endpoint
2 Linked Data Interface
3 DB Dumps for Download
Accessing the DBpedia Dataset over the Web
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
To know everything Bart wrote on blackboard board in season 12 of SimpsonsbullThe Simpson episode Wikipedia pages are the identified things that we would consider as the subjects of our RDF triplesbullThe bottom of the Wikipedia page for the Tennis the Menace episode tells us that it is a member of the Wikipedia category The Simpsons episodes season 12bullThe episodes DBpedia page tells us that pblackboard is the property name for the Wikipedia infobox Chalkboard field
SELECT episodechalkboard_gag WHERE episode skossubject lthttpdbpediaorgresourceCategoryThe_Simpsons_episodes2C_season_12gt episode dbpedia2blackboard chalkboard_gag
entities
Table
Interesting Example
The Linked Data Interface
bull A large body of information and knowledge is often already available in structured form yet not accessible as such on the Web
bull Integrating open data provides real value It saves the time and effort to re-enter data that is already out there and it leaves the data and editing where it belongs at its origin
bull Linked Data on the Web can be accessed using Semantic Web browsers just as the traditional Web of documents is accessed using HTML browsers
bull Semantic Web browsers enable users to navigate between different data sources by following RDF links
It also allows the robots of Semantic Web search engines to follow these links to crawl the Semantic Web
1048607The project follows the Linked Data principles
bull All concepts are identified using Uniform Resource Identifier references URI is a compact string of characters used to identify or name a resource
1048698 The Linked Data interface can be used by
bull Semantic Web Browsers like
- DISCO Hyperdata Browser
- Tabulator Browser
- OpenLink RDF Browser
bull Semantic Web Crawlers like
- Zitgist (Zitgist LLC USA)
- SWSE (DERI Ireland)
- Swoogle (UMBC USA )
The Linked Data Interface
DBpedia Use Cases
1 Improving Wikipedia Search
2 Royalty-Free Data Source for other Applications
3 Nucleus for the Emerging Web of Data
Improving Wikipedia Search (Various Interfaces)
Query to find all web browser SW at httpwikipediaaskworg
Improving Wikipedia Search
Royalty-Free Data Source for other Applications
1048607DBpedia is published under GNU Free Documentation License
1048607Example use case SPARQL generated tables within webpages
Nucleus for the Emerging Web of Data
1048607W3C SWEO Linking Open Data Project
1048607Over all size of the dataset over 1 billion RDF triples
1048607Out-bound RDF links within DBpedia 75000
1048607Better data cleansing required
1048607Improvement in the classification
1048607Interlink DBpedia with more datasets
1048607Improvement in the user interfaces
1048607Performance
1048607Scalability
1048607 More Expressiveness
Proposed Improvements
Discussionbull DBpedia is the first and largest source of structured data on the Internet
covering topics of general knowledge
bull DBpedia gains new information when it extracts data from the latest Wikipedia dump whereas Freebase in addition to Wikipedia extractions gains new information through its userbase of editors
ndash Which one is better approachbull Can Freebase or DBpedia be substitute for Wikipedia
ndash Freebase Not good in that we have two similar things ndash Wikipedia Freebasendash DBPedia Not good in that it extracts data from dump
bull How can we interlink Freebase amp DBpediabull What can be killer applications using Dbpedia
ndash If there is okayndash If there is no do we really need a large general structured knowledge
The structured information can be extracted from Wikipedia and can serve as a basis for enabling sophisticated queries against Wikipedia content
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
DBpedia Basics
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
1 SPARQL Endpoint
2 Linked Data Interface
3 DB Dumps for Download
Accessing the DBpedia Dataset over the Web
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
To know everything Bart wrote on blackboard board in season 12 of SimpsonsbullThe Simpson episode Wikipedia pages are the identified things that we would consider as the subjects of our RDF triplesbullThe bottom of the Wikipedia page for the Tennis the Menace episode tells us that it is a member of the Wikipedia category The Simpsons episodes season 12bullThe episodes DBpedia page tells us that pblackboard is the property name for the Wikipedia infobox Chalkboard field
SELECT episodechalkboard_gag WHERE episode skossubject lthttpdbpediaorgresourceCategoryThe_Simpsons_episodes2C_season_12gt episode dbpedia2blackboard chalkboard_gag
entities
Table
Interesting Example
The Linked Data Interface
bull A large body of information and knowledge is often already available in structured form yet not accessible as such on the Web
bull Integrating open data provides real value It saves the time and effort to re-enter data that is already out there and it leaves the data and editing where it belongs at its origin
bull Linked Data on the Web can be accessed using Semantic Web browsers just as the traditional Web of documents is accessed using HTML browsers
bull Semantic Web browsers enable users to navigate between different data sources by following RDF links
It also allows the robots of Semantic Web search engines to follow these links to crawl the Semantic Web
1048607The project follows the Linked Data principles
bull All concepts are identified using Uniform Resource Identifier references URI is a compact string of characters used to identify or name a resource
1048698 The Linked Data interface can be used by
bull Semantic Web Browsers like
- DISCO Hyperdata Browser
- Tabulator Browser
- OpenLink RDF Browser
bull Semantic Web Crawlers like
- Zitgist (Zitgist LLC USA)
- SWSE (DERI Ireland)
- Swoogle (UMBC USA )
The Linked Data Interface
DBpedia Use Cases
1 Improving Wikipedia Search
2 Royalty-Free Data Source for other Applications
3 Nucleus for the Emerging Web of Data
Improving Wikipedia Search (Various Interfaces)
Query to find all web browser SW at httpwikipediaaskworg
Improving Wikipedia Search
Royalty-Free Data Source for other Applications
1048607DBpedia is published under GNU Free Documentation License
1048607Example use case SPARQL generated tables within webpages
Nucleus for the Emerging Web of Data
1048607W3C SWEO Linking Open Data Project
1048607Over all size of the dataset over 1 billion RDF triples
1048607Out-bound RDF links within DBpedia 75000
1048607Better data cleansing required
1048607Improvement in the classification
1048607Interlink DBpedia with more datasets
1048607Improvement in the user interfaces
1048607Performance
1048607Scalability
1048607 More Expressiveness
Proposed Improvements
Discussionbull DBpedia is the first and largest source of structured data on the Internet
covering topics of general knowledge
bull DBpedia gains new information when it extracts data from the latest Wikipedia dump whereas Freebase in addition to Wikipedia extractions gains new information through its userbase of editors
ndash Which one is better approachbull Can Freebase or DBpedia be substitute for Wikipedia
ndash Freebase Not good in that we have two similar things ndash Wikipedia Freebasendash DBPedia Not good in that it extracts data from dump
bull How can we interlink Freebase amp DBpediabull What can be killer applications using Dbpedia
ndash If there is okayndash If there is no do we really need a large general structured knowledge
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
1 SPARQL Endpoint
2 Linked Data Interface
3 DB Dumps for Download
Accessing the DBpedia Dataset over the Web
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
To know everything Bart wrote on blackboard board in season 12 of SimpsonsbullThe Simpson episode Wikipedia pages are the identified things that we would consider as the subjects of our RDF triplesbullThe bottom of the Wikipedia page for the Tennis the Menace episode tells us that it is a member of the Wikipedia category The Simpsons episodes season 12bullThe episodes DBpedia page tells us that pblackboard is the property name for the Wikipedia infobox Chalkboard field
SELECT episodechalkboard_gag WHERE episode skossubject lthttpdbpediaorgresourceCategoryThe_Simpsons_episodes2C_season_12gt episode dbpedia2blackboard chalkboard_gag
entities
Table
Interesting Example
The Linked Data Interface
bull A large body of information and knowledge is often already available in structured form yet not accessible as such on the Web
bull Integrating open data provides real value It saves the time and effort to re-enter data that is already out there and it leaves the data and editing where it belongs at its origin
bull Linked Data on the Web can be accessed using Semantic Web browsers just as the traditional Web of documents is accessed using HTML browsers
bull Semantic Web browsers enable users to navigate between different data sources by following RDF links
It also allows the robots of Semantic Web search engines to follow these links to crawl the Semantic Web
1048607The project follows the Linked Data principles
bull All concepts are identified using Uniform Resource Identifier references URI is a compact string of characters used to identify or name a resource
1048698 The Linked Data interface can be used by
bull Semantic Web Browsers like
- DISCO Hyperdata Browser
- Tabulator Browser
- OpenLink RDF Browser
bull Semantic Web Crawlers like
- Zitgist (Zitgist LLC USA)
- SWSE (DERI Ireland)
- Swoogle (UMBC USA )
The Linked Data Interface
DBpedia Use Cases
1 Improving Wikipedia Search
2 Royalty-Free Data Source for other Applications
3 Nucleus for the Emerging Web of Data
Improving Wikipedia Search (Various Interfaces)
Query to find all web browser SW at httpwikipediaaskworg
Improving Wikipedia Search
Royalty-Free Data Source for other Applications
1048607DBpedia is published under GNU Free Documentation License
1048607Example use case SPARQL generated tables within webpages
Nucleus for the Emerging Web of Data
1048607W3C SWEO Linking Open Data Project
1048607Over all size of the dataset over 1 billion RDF triples
1048607Out-bound RDF links within DBpedia 75000
1048607Better data cleansing required
1048607Improvement in the classification
1048607Interlink DBpedia with more datasets
1048607Improvement in the user interfaces
1048607Performance
1048607Scalability
1048607 More Expressiveness
Proposed Improvements
Discussionbull DBpedia is the first and largest source of structured data on the Internet
covering topics of general knowledge
bull DBpedia gains new information when it extracts data from the latest Wikipedia dump whereas Freebase in addition to Wikipedia extractions gains new information through its userbase of editors
ndash Which one is better approachbull Can Freebase or DBpedia be substitute for Wikipedia
ndash Freebase Not good in that we have two similar things ndash Wikipedia Freebasendash DBPedia Not good in that it extracts data from dump
bull How can we interlink Freebase amp DBpediabull What can be killer applications using Dbpedia
ndash If there is okayndash If there is no do we really need a large general structured knowledge
1 SPARQL Endpoint
2 Linked Data Interface
3 DB Dumps for Download
Accessing the DBpedia Dataset over the Web
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
To know everything Bart wrote on blackboard board in season 12 of SimpsonsbullThe Simpson episode Wikipedia pages are the identified things that we would consider as the subjects of our RDF triplesbullThe bottom of the Wikipedia page for the Tennis the Menace episode tells us that it is a member of the Wikipedia category The Simpsons episodes season 12bullThe episodes DBpedia page tells us that pblackboard is the property name for the Wikipedia infobox Chalkboard field
SELECT episodechalkboard_gag WHERE episode skossubject lthttpdbpediaorgresourceCategoryThe_Simpsons_episodes2C_season_12gt episode dbpedia2blackboard chalkboard_gag
entities
Table
Interesting Example
The Linked Data Interface
bull A large body of information and knowledge is often already available in structured form yet not accessible as such on the Web
bull Integrating open data provides real value It saves the time and effort to re-enter data that is already out there and it leaves the data and editing where it belongs at its origin
bull Linked Data on the Web can be accessed using Semantic Web browsers just as the traditional Web of documents is accessed using HTML browsers
bull Semantic Web browsers enable users to navigate between different data sources by following RDF links
It also allows the robots of Semantic Web search engines to follow these links to crawl the Semantic Web
1048607The project follows the Linked Data principles
bull All concepts are identified using Uniform Resource Identifier references URI is a compact string of characters used to identify or name a resource
1048698 The Linked Data interface can be used by
bull Semantic Web Browsers like
- DISCO Hyperdata Browser
- Tabulator Browser
- OpenLink RDF Browser
bull Semantic Web Crawlers like
- Zitgist (Zitgist LLC USA)
- SWSE (DERI Ireland)
- Swoogle (UMBC USA )
The Linked Data Interface
DBpedia Use Cases
1 Improving Wikipedia Search
2 Royalty-Free Data Source for other Applications
3 Nucleus for the Emerging Web of Data
Improving Wikipedia Search (Various Interfaces)
Query to find all web browser SW at httpwikipediaaskworg
Improving Wikipedia Search
Royalty-Free Data Source for other Applications
1048607DBpedia is published under GNU Free Documentation License
1048607Example use case SPARQL generated tables within webpages
Nucleus for the Emerging Web of Data
1048607W3C SWEO Linking Open Data Project
1048607Over all size of the dataset over 1 billion RDF triples
1048607Out-bound RDF links within DBpedia 75000
1048607Better data cleansing required
1048607Improvement in the classification
1048607Interlink DBpedia with more datasets
1048607Improvement in the user interfaces
1048607Performance
1048607Scalability
1048607 More Expressiveness
Proposed Improvements
Discussionbull DBpedia is the first and largest source of structured data on the Internet
covering topics of general knowledge
bull DBpedia gains new information when it extracts data from the latest Wikipedia dump whereas Freebase in addition to Wikipedia extractions gains new information through its userbase of editors
ndash Which one is better approachbull Can Freebase or DBpedia be substitute for Wikipedia
ndash Freebase Not good in that we have two similar things ndash Wikipedia Freebasendash DBPedia Not good in that it extracts data from dump
bull How can we interlink Freebase amp DBpediabull What can be killer applications using Dbpedia
ndash If there is okayndash If there is no do we really need a large general structured knowledge
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
To know everything Bart wrote on blackboard board in season 12 of SimpsonsbullThe Simpson episode Wikipedia pages are the identified things that we would consider as the subjects of our RDF triplesbullThe bottom of the Wikipedia page for the Tennis the Menace episode tells us that it is a member of the Wikipedia category The Simpsons episodes season 12bullThe episodes DBpedia page tells us that pblackboard is the property name for the Wikipedia infobox Chalkboard field
SELECT episodechalkboard_gag WHERE episode skossubject lthttpdbpediaorgresourceCategoryThe_Simpsons_episodes2C_season_12gt episode dbpedia2blackboard chalkboard_gag
entities
Table
Interesting Example
The Linked Data Interface
bull A large body of information and knowledge is often already available in structured form yet not accessible as such on the Web
bull Integrating open data provides real value It saves the time and effort to re-enter data that is already out there and it leaves the data and editing where it belongs at its origin
bull Linked Data on the Web can be accessed using Semantic Web browsers just as the traditional Web of documents is accessed using HTML browsers
bull Semantic Web browsers enable users to navigate between different data sources by following RDF links
It also allows the robots of Semantic Web search engines to follow these links to crawl the Semantic Web
1048607The project follows the Linked Data principles
bull All concepts are identified using Uniform Resource Identifier references URI is a compact string of characters used to identify or name a resource
1048698 The Linked Data interface can be used by
bull Semantic Web Browsers like
- DISCO Hyperdata Browser
- Tabulator Browser
- OpenLink RDF Browser
bull Semantic Web Crawlers like
- Zitgist (Zitgist LLC USA)
- SWSE (DERI Ireland)
- Swoogle (UMBC USA )
The Linked Data Interface
DBpedia Use Cases
1 Improving Wikipedia Search
2 Royalty-Free Data Source for other Applications
3 Nucleus for the Emerging Web of Data
Improving Wikipedia Search (Various Interfaces)
Query to find all web browser SW at httpwikipediaaskworg
Improving Wikipedia Search
Royalty-Free Data Source for other Applications
1048607DBpedia is published under GNU Free Documentation License
1048607Example use case SPARQL generated tables within webpages
Nucleus for the Emerging Web of Data
1048607W3C SWEO Linking Open Data Project
1048607Over all size of the dataset over 1 billion RDF triples
1048607Out-bound RDF links within DBpedia 75000
1048607Better data cleansing required
1048607Improvement in the classification
1048607Interlink DBpedia with more datasets
1048607Improvement in the user interfaces
1048607Performance
1048607Scalability
1048607 More Expressiveness
Proposed Improvements
Discussionbull DBpedia is the first and largest source of structured data on the Internet
covering topics of general knowledge
bull DBpedia gains new information when it extracts data from the latest Wikipedia dump whereas Freebase in addition to Wikipedia extractions gains new information through its userbase of editors
ndash Which one is better approachbull Can Freebase or DBpedia be substitute for Wikipedia
ndash Freebase Not good in that we have two similar things ndash Wikipedia Freebasendash DBPedia Not good in that it extracts data from dump
bull How can we interlink Freebase amp DBpediabull What can be killer applications using Dbpedia
ndash If there is okayndash If there is no do we really need a large general structured knowledge
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
To know everything Bart wrote on blackboard board in season 12 of SimpsonsbullThe Simpson episode Wikipedia pages are the identified things that we would consider as the subjects of our RDF triplesbullThe bottom of the Wikipedia page for the Tennis the Menace episode tells us that it is a member of the Wikipedia category The Simpsons episodes season 12bullThe episodes DBpedia page tells us that pblackboard is the property name for the Wikipedia infobox Chalkboard field
SELECT episodechalkboard_gag WHERE episode skossubject lthttpdbpediaorgresourceCategoryThe_Simpsons_episodes2C_season_12gt episode dbpedia2blackboard chalkboard_gag
entities
Table
Interesting Example
The Linked Data Interface
bull A large body of information and knowledge is often already available in structured form yet not accessible as such on the Web
bull Integrating open data provides real value It saves the time and effort to re-enter data that is already out there and it leaves the data and editing where it belongs at its origin
bull Linked Data on the Web can be accessed using Semantic Web browsers just as the traditional Web of documents is accessed using HTML browsers
bull Semantic Web browsers enable users to navigate between different data sources by following RDF links
It also allows the robots of Semantic Web search engines to follow these links to crawl the Semantic Web
1048607The project follows the Linked Data principles
bull All concepts are identified using Uniform Resource Identifier references URI is a compact string of characters used to identify or name a resource
1048698 The Linked Data interface can be used by
bull Semantic Web Browsers like
- DISCO Hyperdata Browser
- Tabulator Browser
- OpenLink RDF Browser
bull Semantic Web Crawlers like
- Zitgist (Zitgist LLC USA)
- SWSE (DERI Ireland)
- Swoogle (UMBC USA )
The Linked Data Interface
DBpedia Use Cases
1. Improving Wikipedia Search
2. Royalty-Free Data Source for other Applications
3. Nucleus for the Emerging Web of Data
Improving Wikipedia Search (Various Interfaces)
Query to find all web browser software at http://wikipedia.askw.org
Improving Wikipedia Search
Royalty-Free Data Source for other Applications
• DBpedia is published under the GNU Free Documentation License
• Example use case: SPARQL-generated tables within webpages
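The table use case above can be sketched in Python as follows. The function turns a result set in the W3C SPARQL Query Results JSON layout (`head.vars` plus `results.bindings`) into an HTML table. The function name `results_to_html_table` is hypothetical, and `SAMPLE` holds made-up placeholder values, not real DBpedia output.

```python
from html import escape


def results_to_html_table(results: dict) -> str:
    """Render SPARQL JSON results as a plain HTML table, one row per binding."""
    variables = results["head"]["vars"]
    rows = ["<tr>" + "".join(f"<th>{escape(v)}</th>" for v in variables) + "</tr>"]
    for binding in results["results"]["bindings"]:
        cells = []
        for v in variables:
            # A variable may be unbound in a row, so fall back to an empty cell.
            value = binding.get(v, {}).get("value", "")
            cells.append(f"<td>{escape(value)}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Placeholder data for illustration only; these are not real query results.
SAMPLE = {
    "head": {"vars": ["episode", "chalkboard_gag"]},
    "results": {"bindings": [
        {"episode": {"type": "uri", "value": "http://example.org/episode/1"},
         "chalkboard_gag": {"type": "literal", "value": "Placeholder <gag> text"}},
        {"episode": {"type": "uri", "value": "http://example.org/episode/2"}},
    ]},
}

if __name__ == "__main__":
    print(results_to_html_table(SAMPLE))
```

Values are escaped before they are placed in the markup, because literals taken from infoboxes can contain characters such as `<` or `&`.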
Nucleus for the Emerging Web of Data
• W3C SWEO Linking Open Data Project
• Overall size of the dataset: over 1 billion RDF triples
• Outbound RDF links within DBpedia: 75,000
Proposed Improvements
• Better data cleansing required
• Improvement in the classification
• Interlink DBpedia with more datasets
• Improvement in the user interfaces
• Performance
• Scalability
• More expressiveness
Discussion
• DBpedia is the first and largest source of structured data on the Internet covering topics of general knowledge
• DBpedia gains new information when it extracts data from the latest Wikipedia dump, whereas Freebase, in addition to Wikipedia extractions, gains new information through its userbase of editors
  - Which one is the better approach?
• Can Freebase or DBpedia be a substitute for Wikipedia?
  - Freebase: not good, in that we would have two similar things (Wikipedia and Freebase)
  - DBpedia: not good, in that it extracts data from a dump
• How can we interlink Freebase & DBpedia?
• What can be killer applications using DBpedia?
  - If there is one, okay
  - If there is none, do we really need large, general structured knowledge?