How to get your data into Sindice and Google with sitemap4rdf

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Boris Villazón-Terrazas (OEG), Richard Cyganiak (DERI)

Publishing Linked Data from a triple store

Linked Data frontends for triple stores

Source: Pubby website, http://www4.wiwiss.fu-berlin.de/pubby/

Search engines

Sindice: the best RDF search engine

- 120M+ documents
- Continuously updating since 2006
- Low-latency search API
- RDF/XML, Turtle, RDFa, microformats

The Sitemap protocol

Sitemap Protocol

- Used by web crawlers
- Efficiently find all your content and discover what has been updated

http://sitemaps.org/

Sitemap Protocol: Simple example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yoursite/</loc>
  </url>
  <url>
    <loc>http://yoursite/products/53546</loc>
  </url>
  <url>
    <loc>http://yoursite/products/98421</loc>
  </url>
  <url>
    <loc>http://yoursite/products/41003</loc>
  </url>
</urlset>
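
A document like the one above can be produced with a few lines of code. The following is a minimal sketch in Python using only the standard library; the function name `build_sitemap` is made up for illustration and is not part of sitemap4rdf.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document (bytes) with one <url><loc> entry per URL.

    Hypothetical helper for illustration only.
    """
    # Serialize the sitemap namespace as the default xmlns, without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


sitemap = build_sitemap([
    "http://yoursite/",
    "http://yoursite/products/53546",
])
print(sitemap.decode("utf-8"))
```

ElementTree escapes characters such as `&` in the URLs, which the protocol requires.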

Sitemap Protocol: Optional parts

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yoursite/</loc>
    <lastmod>2010-06-24</lastmod>
    <changefreq>daily</changefreq>
  </url>
</urlset>
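
The optional `lastmod` element is what lets a crawler discover what has been updated. As a rough sketch of the crawler side (not how Sindice or Google actually implement it), the Python below selects the URLs changed on or after a given date. The function name and the sample data are assumptions, and the code assumes `lastmod` values carry at least a full `YYYY-MM-DD` date.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_updated_since(sitemap_xml, since):
    """Return the <loc> values whose <lastmod> is on or after `since`.

    Hypothetical helper. Entries without <lastmod> are kept, because the
    element is optional and its absence says nothing about freshness.
    """
    root = ET.fromstring(sitemap_xml)
    updated = []
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", namespaces=NS)
        if loc is None:
            continue  # malformed entry, skip it
        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        # Only the date part is compared; any time-of-day suffix is ignored.
        if lastmod is None or date.fromisoformat(lastmod.strip()[:10]) >= since:
            updated.append(loc.strip())
    return updated


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://yoursite/</loc><lastmod>2010-06-24</lastmod></url>
  <url><loc>http://yoursite/old</loc><lastmod>2010-01-01</lastmod></url>
  <url><loc>http://yoursite/unknown</loc></url>
</urlset>"""

print(urls_updated_since(SAMPLE, date(2010, 6, 1)))
```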

Sitemap Protocol: Huge sitemaps

- Gzip-compress your sitemap
- Limit: 50k URLs or 10MB
  - split into multiple sitemap files
  - add a sitemap index file
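
The splitting step can be sketched as follows: chunk the URL list at the 50k limit, gzip each part, and write an index that points at the parts. This is an illustrative Python sketch, not sitemap4rdf's implementation. The file names (`sitemap1.xml.gz`, `sitemap_index.xml`) and function names are assumptions, and the sketch checks only the URL count, not the 10MB size limit.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit from the slide


def _document(root_tag, child_tag, locs):
    """Serialize <root_tag> with one <child_tag><loc> per entry."""
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}{root_tag}")
    for loc in locs:
        child = ET.SubElement(root, f"{{{SITEMAP_NS}}}{child_tag}")
        ET.SubElement(child, f"{{{SITEMAP_NS}}}loc").text = loc
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def split_sitemap(urls, base_url, max_urls=MAX_URLS):
    """Return {filename: bytes}: gzipped sitemap parts plus an index file.

    Hypothetical helper; file naming is an assumption.
    """
    files = {}
    part_locs = []
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        name = f"sitemap{n}.xml.gz"
        chunk = urls[start:start + max_urls]
        files[name] = gzip.compress(_document("urlset", "url", chunk))
        part_locs.append(base_url + name)
    # The index lists the location of every part in <sitemap><loc> entries.
    files["sitemap_index.xml"] = _document("sitemapindex", "sitemap", part_locs)
    return files


urls = [f"http://yoursite/products/{i}" for i in range(5)]
for name in split_sitemap(urls, "http://yoursite/", max_urls=2):
    print(name)
```

The small `max_urls=2` in the usage lines is only there to make the split visible with five URLs.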

Sitemap Protocol: Discovery

- Publish the sitemap file
- Add a line to http://yoursite/robots.txt

Sitemap: http://yoursite/sitemap.xml
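
To check that the line is picked up, the robots.txt can be read back the way a crawler might. A small sketch using Python's standard `urllib.robotparser` (its `site_maps()` method requires Python 3.8 or later); the helper name and the sample robots.txt content are assumptions.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in a robots.txt body (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


ROBOTS = """User-agent: *
Disallow:
Sitemap: http://yoursite/sitemap.xml
"""

print(sitemaps_from_robots(ROBOTS))
```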

sitemap4rdf: Generate Sitemap files from a SPARQL endpoint

sitemap4rdf

- Simple command line tool
- Sends a SPARQL query to list all URIs
- Generates sitemap

sitemap4rdf http://yoursite/sparql http://yoursite/resource/
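
The slides do not show the query that sitemap4rdf sends, so the following Python sketch only illustrates the general idea: ask the endpoint for all subject URIs under a prefix and read them from the SPARQL JSON results. The query text (which uses the SPARQL 1.1 `STRSTARTS` function), the function names, and the sample response are all assumptions, not the tool's actual behavior.

```python
import json
from urllib.parse import urlencode


def build_query(uri_prefix):
    """Build a query listing distinct subject URIs that start with uri_prefix.

    Assumed query for illustration; sitemap4rdf may use a different one.
    """
    # Escape backslashes and double quotes for the SPARQL string literal.
    escaped = uri_prefix.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?s WHERE { "
        "?s ?p ?o . "
        f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{escaped}")) '
        "}"
    )


def request_url(endpoint, uri_prefix):
    """URL for a SPARQL protocol GET request (endpoint without a query string)."""
    return endpoint + "?" + urlencode({"query": build_query(uri_prefix)})


def uris_from_results(results_json):
    """Extract URI values bound to ?s from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [
        binding["s"]["value"]
        for binding in data["results"]["bindings"]
        if binding["s"]["type"] == "uri"
    ]


# Made-up response, shaped like the SPARQL JSON results format.
SAMPLE_RESULTS = json.dumps({
    "head": {"vars": ["s"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://yoursite/resource/A"}},
        {"s": {"type": "uri", "value": "http://yoursite/resource/B"}},
    ]},
})

print(request_url("http://yoursite/sparql", "http://yoursite/resource/"))
print(uris_from_results(SAMPLE_RESULTS))
```

The resulting list of URIs is what would then be written out as `<loc>` entries in the sitemap.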

Submit the sitemap location - Sindice

http://sindice.com/main/submit

Submit the sitemap location - Google

https://www.google.com/webmasters/tools/

Summary

- Sitemap protocol informs search engines about available pages
- Supported by Sindice!
- sitemap4rdf generates Sitemap files by listing URIs in a SPARQL endpoint
- Open source, Java
- http://lab.linkeddata.deri.ie/2010/sitemap4rdf/