This ResourceSync tutorial was presented at OAI8, June 19 2013
ResourceSync Tutorial, June 19th 2013, OAI8, Geneva, Switzerland
ResourceSync: A Web-Based Resource Synchronization Framework
OAI8 version, June 19 2013
ResourceSync is funded by The Sloan Foundation & JISC
#resourcesync
These slides were presented at OAI8, Geneva, Switzerland, June 19 2013
The most recent version of the slides is available at
http://www.slideshare.net/OpenArchivesInitiative/resourcesync-tutorial
ResourceSync Tutorial Presenters
Herbert Van de Sompel, Los Alamos National Laboratory, @hvdsomp
Robert Sanderson, Los Alamos National Laboratory, @azaroth42
Richard Jones, Cottage Labs, @cottagelabs
ResourceSync Tutorial Contributors
Martin Klein, Los Alamos National Laboratory, @mart1nkle1n
Simeon Warner, Cornell University
ResourceSync Core Team
OAI
Herbert Van de Sompel, Martin Klein, Robert Sanderson (Los Alamos National Laboratory)
Simeon Warner (Cornell University)
Bernhard Haslhofer (University of Vienna)
Michael L. Nelson (Old Dominion University)
Carl Lagoze (University of Michigan)
NISO
Todd Carpenter, Nettie Lagace
Lyrasis
Peter Murray
ResourceSync Technical Group
JISC
Richard Jones, Graham Klyne
Stuart Lewis
OCLC
Jeff Young
LOCKSS
David Rosenthal
RedHat
Christian Sadilek
Ex Libris Inc.
Shlomo Sanders
Library of Congress
Kevin Ford
ResourceSync - Agenda
1. ResourceSync: Problem Perspective & Conceptual Approach
2. Motivation & Use Cases
3. Framework Walkthrough
4. Framework (Technical) Details
5. Implementation
6. Q&A
1. ResourceSync: Problem Perspective & Conceptual Approach
Synchronize What?
• Web resources
  o things with a URI that can be dereferenced
• Focus on needs of research communication and cultural heritage organizations
  o but aim for generality
Synchronize What?
• Small websites/repositories (a few resources) to large repositories/datasets/linked data collections (many millions of resources)
Synchronize What?
• Low change frequency (weeks/months) to high change frequency (seconds)
Synchronize What?
• Synchronization latency and accuracy needs may vary
Why?
… because lots of projects and services are doing synchronization but have to resort to ad hoc, case-by-case approaches!
• Project team involved with projects that need this
• Experience with OAI-PMH: widely used in repos but
  o XML metadata only
  o Attempts at synchronizing actual content via OAI-PMH (complex object formats, dc:identifier) not successful.
  o Web technology has moved on since 1999
• Devise a shared solution for data, metadata, linked data?
ResourceSync Problem
• Consideration:
  • Source (server) A has resources that change over time: they get created, modified, deleted
  • Destination (servers) X, Y, and Z leverage (some) resources of Source A.
• Problem:
  • Destinations want to keep in step with the resource changes at Source A: resource synchronization.
• Goal:
  • Design an approach for resource synchronization aligned with the Web Architecture that has a fair chance of adoption by different communities.
  • The approach must scale better than recurrent HTTP HEAD/GET on resources.
Source: 4 Core Synchronization Capabilities
1. Describing content – publish a list of resources subject to synchronization to enable Destinations to perform an initial load or catch-up with a Source
2. Packaging content – bundle resources to enable bulk download for destinations
3. Describing changes – publish a list of resource changes to enable destinations to stay synchronized and decrease latency
4. Packaging changes – bundle resource changes for bulk download for destinations
Source: Synchronization Features
5. Linking to related resources – provide links from resources that are subject to synchronization to related resources
applicable to all core capabilities (1..4)
6. Access to historical data – provide archives of 1..4
7. Discovery of capabilities – support Destinations in discovering all offered capabilities 1..4
Destination: Synchronization Needs
1. Baseline synchronization – A destination must be able to perform an initial load or catch-up with a source
- avoid out-of-band setup
2. Incremental synchronization – A destination must have some way to keep up-to-date with changes at a source
- subject to some latency; minimal: create/update/delete
- allow to catch-up after destination has been offline
3. Audit – A destination should be able to determine whether it is synchronized with a source
- subject to some latency
2. Motivation & Use Cases
Use Cases – The Basics: figures a) through d)
Use Cases – The not-so-Basics: figures e) through h)
1. Use Case: arXiv Mirroring and Data Sharing
• Repository of scholarly articles in physics, mathematics, computer science, etc.
• > 850k articles
• approx. 1.5 revisions per article on average
• approx. 75k new articles per year
• Each article has full-text and separate metadata record
• approx. 3.8M resources
1. Use Case: arXiv Mirroring and Data Sharing
• 2,700 updates daily
  o at 8pm EST
  o Currently using homebrew mirroring solution (running with minor modifications since 1994!)
  o occasional rsync (file system-specific, auth issues)
Mirroring arXiv: 1994 - 2013
• Operated since the very early days of the Web!
1. HTTP trigger from the main site
2. HTTP pull update specific to mirror site
3. HTTP download of the resources
4. HTTP trigger to main site when mirror process complete
5. HTTP verification (via HEAD) by the main site which updates the update list specific to mirror site
6. periodic repeat as long as there are updates in the inventory for that mirror
• Requires trusted set of servers operating with the same internal organization
• Does not support synchronization check (so rsync is used periodically)
1. Use Case: arXiv
Mirroring
• GOAL: Keep mirror sites synchronized with daily changes
• WANT:
  o high consistency
  o moderate latency
  o robustness to global network outages (low admin effort)
  o ability to verify sync status in case of questions
1. Use Case: arXiv
Data Sharing
• GOAL: Make resources and update information publicly available so that any other service may synchronize at the frequency it needs, e.g.
  o Math Front at UC Davis
  o EprintWeb from IOP in UK
  o Data for bibliometric and scientometric analysis
• WANT:
  o low admin effort (i.e. standard approach, standard tools)
  o reasonable consistency, latency, efficiency
2. Use Case: DBpedia Live Duplication
• Average of 2 updates per second
• Low latency desirable => need for a push technology
2. Use Case: DBpedia Live Duplication
• Initial experiment with distributed infrastructure
2. Use Case: DBpedia Live Duplication
• Daily traffic:
  o 99% updates
  o 0.6% deletions
  o 0.03% creations
2. Use Case: DBpedia Live Duplication
• # of content transfer events in two 8 hour intervals
• Max. queue size of remote duplication process
3. Framework Walkthrough
Source Capability 1: Describing Content
In order to advertise the resources that a source wants destinations to know about, it may describe them:
o Publish a Resource List, a list of resource URIs and possibly associated metadata
  - Destination GETs the Resource List
  - Destination GETs listed resources by their URI
o Describes state of set of resources at one point in time (snapshot)
Source Capability 2: Packaging Content
By default, content is transferred in response to a GET issued by a destination against a URI of a source’s resource. But a source may support additional mechanisms:
o Publish a Resource Dump, a document that points to packages of resource representations and necessary metadata
  - Destination GETs the package
  - Destination unpacks the package
  - ZIP format supported
o Packages set of resources at one point in time (snapshot)
Source: Modular Capabilities
Source Capability 3: Describing Changes
In order to achieve lower latency, a source may communicate about changes to its resources:
o Publish a Change List, a list of recent change events (created, updated, deleted resource)
  - Destination acts upon change events, e.g. GETs created/updated resources, removes deleted resources.
o Describes changes to resources that occurred in a temporal interval with a start- and an end-date
Source Capability 4: Packaging Changes
In order to reduce the number of requests to obtain resource changes, a source may provide packaged bitstreams for changed resources:
o Publish a Change Dump, a document that points to packages of recently changed resource representations and necessary metadata
  - Destination GETs the package
  - Destination unpacks the package
  - ZIP format supported
o Packages resources that changed in a temporal interval with a start- and an end-date
Source: Modular Capabilities
Framework Structure (light)
Framework Structure (complete)
Destination: Key Processes
ResourceSync - Agenda
4. Framework (Technical) Details
1. Sitemaps
2. Pull method
3. Linking between resources
4. Discovery
5. Push method
6. Archives
4. Framework (Technical) Details – 1. Sitemaps
So Many Choices
XMPP
AtomPub
SDShare
RSS
Atom
PubSubHubbub
Sitemap
XMPP
rsync
OAI-PMH
WebDAV Col. Syn.
OAI-ORE
DSNotify
RDFsync
Crawl
Push
Pull
SWORD
SPARQLpush
A Framework Based on Sitemaps
• Modular framework allowing selective deployment
• Sitemap is the core format throughout the framework
  o Introduce extension elements and attributes:
    - In ResourceSync namespace (rs:) to accommodate synchronization needs
  o Reuse Sitemap format for all capability documents: Resource List, Resource Dump, Change List, Change Dump, as well as for manifest in Dumps
  o Utilize Sitemap index format where needed/allowed
Sitemap Format
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
  </url>
  …
</urlset>
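A Destination can read a document in this format with any namespace-aware XML parser. The following is a minimal sketch in Python using only the standard library; the function name `parse_sitemap` and the `SAMPLE` document are illustrative assumptions, not part of the Sitemap or ResourceSync specifications.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def parse_sitemap(xml_text):
    """Return a list of (loc, lastmod) tuples from a Sitemap <urlset>."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc")
        lastmod = url.findtext(f"{{{SITEMAP_NS}}}lastmod")  # None when absent
        entries.append((loc, lastmod))
    return entries

# Hypothetical sample mirroring the slide's example.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/res1</loc><lastmod>2013-01-02T13:00:00Z</lastmod></url>
  <url><loc>http://example.com/res2</loc><lastmod>2013-01-02T14:00:00Z</lastmod></url>
</urlset>"""

for loc, lastmod in parse_sitemap(SAMPLE):
    print(loc, lastmod)
```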
Sitemap Index Format
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://example.com/sitemap1.xml</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://example.com/sitemap2.xml</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
  </sitemap>
  …
</sitemapindex>
ResourceSync Sitemap Extensions
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln …/>
  <rs:md …/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:ln …/>
    <rs:md …/>
  </url>
  <url>
    …
  </url>
</urlset>
ResourceSync Sitemap Extensions
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln …/>
  <rs:md …/>
  <sitemap>
    <loc>http://example.com/sitemap1.xml</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:ln …/>
    <rs:md …/>
  </sitemap>
  …
</sitemapindex>
4. Framework (Technical) Details – 2. Pull method
Capability 1: Resource List
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" from="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="8876" type="text/html"/>
  </url>
  <url>
    …
  </url>
</urlset>
Resource List
• Describe Source’s resources that are subject to synchronization
  • At one point in time (snapshot)
• Typical Destination use: Baseline Synchronization, Audit
• Each URI typically listed only once
• Might be expensive to generate
• Destinations use @from to determine freshness
• Issue GETs against URIs to obtain resources
• Very similar to current Sitemaps
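For baseline synchronization or audit, a Destination might read the Resource List and check each downloaded representation against the `length` and `hash` values in `rs:md`. The sketch below assumes the attribute layout shown in the Resource List example; the function names `parse_resource_list` and `matches` are hypothetical, and only the `md5:` hash prefix is handled.

```python
import hashlib
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

def parse_resource_list(xml_text):
    """Return (snapshot_time, {uri: rs:md attributes}) for a Resource List."""
    root = ET.fromstring(xml_text)
    top_md = root.find("rs:md", NS)  # document-level rs:md carries @from
    snapshot_time = top_md.get("from") if top_md is not None else None
    resources = {}
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        uri = url.findtext("sm:loc", namespaces=NS)
        resources[uri] = dict(md.attrib) if md is not None else {}
    return snapshot_time, resources

def matches(content, md_attrs):
    """Check downloaded bytes against the length and md5 hash, when given."""
    if "length" in md_attrs and len(content) != int(md_attrs["length"]):
        return False
    for token in md_attrs.get("hash", "").split():
        algo, _, digest = token.partition(":")
        if algo == "md5" and hashlib.md5(content).hexdigest() != digest:
            return False
    return True
```

A Destination would call `matches(body, resources[uri])` after each GET; a mismatch suggests the resource changed after the snapshot time and should be re-checked.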
Capability 2: Resource Dump
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump" from="2013-01-02T09:00:00Z"/>
  <url>
    <loc>http://example.com/resourcedump_part1.zip</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md length="97553" type="application/zip"/>
  </url>
  <url>
    <loc>http://example.com/resourcedump_part2.zip</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md length="21294" type="application/zip"/>
  </url>
</urlset>
Resource Dump Manifest
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump-manifest" from="2013-01-02T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md type="text/html" path="/resources/res1"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md type="application/pdf" path="/resources/res2"/>
  </url>
</urlset>
Resource Dump
• Package Source’s resources that are subject to synchronization
  • At one point in time (snapshot)
• Points to ZIP packages
  • Mandatory, even for only one ZIP
• ZIP package contains manifest, listing contained bitstreams
• Typical Destination use: Baseline Synchronization, bulk download
• Each URI typically listed only once
• Might be expensive to generate
• Destinations use @from to determine freshness
• GETs against individual URIs from Resource List achieve the same result (ignoring varying freshness)
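Unpacking a dump package can be sketched as follows: read the manifest from the ZIP, then use each entry's `path` to locate the bitstream for its URI. The slides only state that the ZIP contains a manifest, so the member name `manifest.xml` is an assumption here, as are the function name `read_dump_package` and the choice to strip a leading slash from `path` before looking up the ZIP member.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

def read_dump_package(zip_bytes, manifest_name="manifest.xml"):
    """Map each resource URI to its bitstream using the manifest's path values."""
    contents = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        manifest = ET.fromstring(zf.read(manifest_name))
        for url in manifest.findall("sm:url", NS):
            uri = url.findtext("sm:loc", namespaces=NS)
            path = url.find("rs:md", NS).get("path")
            # Assumption: ZIP member names carry no leading slash.
            contents[uri] = zf.read(path.lstrip("/"))
    return contents
```

The same approach would apply to a Change Dump package, with the `change` attribute deciding whether a bitstream is stored or a resource is removed.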
Capability 3: Change List
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated" hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="8876" type="text/html"/>
  </url>
  <url>
    …
  </url>
</urlset>
Change List
• Describe Source’s resource changes
  • Occurring during temporal interval with start- and end-date
• Typical Destination use: Incremental Synchronization, Audit
• Changes are listed in chronological order
• Multiple changes to one URI may result in multiple listings of the same URI
• Source determines duration of temporal interval
• Destinations use @from and @until to determine freshness
• Issue GETs against URIs to obtain changed resources
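Incremental synchronization from a Change List can be sketched as a loop over the entries in document order (which the Source keeps chronological), fetching created or updated resources and dropping deleted ones. The function name `apply_change_list`, the dict-based local store, and the caller-supplied `fetch` callable are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

def apply_change_list(xml_text, local, fetch):
    """Apply change events in document order to `local` (a dict URI -> bytes).

    `fetch` is any callable that returns the current bytes for a URI,
    for example a thin wrapper around an HTTP GET.
    """
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", NS):
        uri = url.findtext("sm:loc", namespaces=NS)
        change = url.find("rs:md", NS).get("change")
        if change in ("created", "updated"):
            local[uri] = fetch(uri)
        elif change == "deleted":
            local.pop(uri, None)  # tolerate resources never mirrored
    return local
```

Because the same URI may be listed several times, a Destination that only needs the final state could first collapse the list to the last event per URI and skip redundant GETs.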
Capability 4: Change Dump
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changedump" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/change_dump_part1.zip</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md length="887" type="application/zip"/>
  </url>
  <url>
    <loc>http://example.com/change_dump_part2.zip</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md length="9767" type="application/zip"/>
  </url>
</urlset>
Change Dump Manifest
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changedump-manifest" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated" length="2887" type="text/html" path="changes/res1"/>
  </url>
  <url>
    …
  </url>
</urlset>
Change Dump
• Package Source’s resources that have changed
  • during temporal interval with start- and end-date
• Points to ZIP packages
  • Mandatory, even for only one ZIP
• ZIP package contains manifest, listing contained bitstreams
• Typical Destination use: Incremental Synchronization, bulk download of changes
• Changes in Change Dump Manifest listed in chronological order
• Same URI can be listed multiple times
• Might be expensive to generate
• Destinations use @from and @until to determine freshness
Recall… *Index
<changelist_index.xml>
<changelist1.xml>
Change List Index <changelist_index.xml>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <sitemap>
    <loc>http://example.com/changelist1.xml</loc>
    <lastmod>2013-01-02T11:00:00Z</lastmod>
    <rs:md type="application/xml"/>
  </sitemap>
  <sitemap>
    <loc>http://example.com/changelist2.xml</loc>
    <lastmod>2013-01-02T23:00:00Z</lastmod>
    <rs:md type="application/xml"/>
  </sitemap>
</sitemapindex>
Change List <changelist1.xml>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="up" href="http://example.com/changelist_index.xml"/>
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-02T21:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated" hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="8876" type="text/html"/>
  </url>
</urlset>
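When a Source splits its changes across several Change Lists behind an index, a Destination may want to fetch only those lists modified since its last synchronization. The sketch below filters index entries on `<lastmod>`; treating `<lastmod>` as the selection criterion is an assumption of this example rather than something the slides prescribe, and the names `parse_time` and `change_lists_to_fetch` are hypothetical.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

def parse_time(value):
    """Parse the UTC timestamp form used in the slides, e.g. 2013-01-02T13:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")

def change_lists_to_fetch(index_xml, last_sync):
    """Return the URIs of Change Lists whose lastmod is later than last_sync."""
    root = ET.fromstring(index_xml)
    cutoff = parse_time(last_sync)
    todo = []
    for sitemap in root.findall("sm:sitemap", NS):
        lastmod = sitemap.findtext("sm:lastmod", namespaces=NS)
        if lastmod is None or parse_time(lastmod) > cutoff:
            # Entries without lastmod are fetched to stay on the safe side.
            todo.append(sitemap.findtext("sm:loc", namespaces=NS))
    return todo
```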
4. Framework (Technical) Details – 3. Linking between resources
Supported Linking Use Cases
The web is based on links between resources, many of which are important to understand for synchronization.
1. Mirrored content with multiple download locations
2. Alternate representations of the same content
3. Patching content rather than replacing
4. Resources and their metadata
5. Prior versions of resources
6. Collection membership of resources
7. Republishing synchronized resources
All cases are handled with a <rs:ln> element referring to the remote resource
Notes about Linked Resources
Some important things to keep in mind about linked resources:
• They may also be subject to synchronization
• They may be updated on a very different schedule from the resource they are linked from
• Therefore, it is recommended to convey metadata about the linked resource too
• Links can be bi-directional – the linked resource can link back to the linking resource
Linking #1 - Mirror
1. Mirrored content with multiple download locations
This might occur due to:
• Content distribution networks
• Mirror sites
• Backup locations
• Load balancing
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="duplicate" pri="1" href="http://mirror1.example.com/res1"/>
    <rs:ln rel="duplicate" pri="2" href="http://mirror2.example.com/res1"/>
  </url>
</urlset>
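A Destination could turn the `duplicate` links into an ordered list of download locations. This sketch assumes that a lower `pri` value means a more preferred mirror and that the original `<loc>` is kept as the last fallback; both are assumptions of the example, as is the function name `download_candidates`.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

def download_candidates(url_elem):
    """Return mirror URIs ordered by pri, followed by the resource's own URI."""
    mirrors = [ln for ln in url_elem.findall("rs:ln", NS)
               if ln.get("rel") == "duplicate"]
    # Links without pri sort last (assumed upper bound for pri values).
    mirrors.sort(key=lambda ln: int(ln.get("pri", "999999")))
    original = url_elem.findtext("sm:loc", namespaces=NS)
    return [ln.get("href") for ln in mirrors] + [original]
```

A caller would try each location in turn and stop at the first successful GET.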
Linking #2 – Alternate Representations
2. Alternate representations of the same content
This might occur due to:
• Server supports HTTP content negotiation
• Multiple copies of the same resource
• Format migration for preservation reasons
• Different clients wanting different formats
• Multiple languages of the content
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="alternate" type="text/html" href="http://example.com/res1.html"/>
    <rs:ln rel="alternate" type="application/pdf" href="http://example.com/res1.pdf"/>
  </url>
</urlset>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1.html</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="canonical" href="http://example.com/res1"/>
  </url>
</urlset>
Linking #3 – Patching Content
3. Patching content rather than replacing
This might occur due to:
• Resources are very large and server wishes to conserve bandwidth where possible
• Changes are frequent and small
• Changes are managed in a CMS that tracks differences
• Format exists or can be described that is machine processable to replicate the change
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1.json</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated" length="398723"/>
    <rs:ln rel="http://www.openarchives.org/rs/terms/patch"
           type="application/json-patch"
           modified="2013-01-02T17:00:00Z"
           length="58"
           href="http://example.com/res1-patch.json"/>
  </url>
</urlset>
Linking #4 – Metadata about Resources
4. Resources and their metadata
This might occur due to:
• Resources have additional metadata records, which are useful for understanding the resource
• Such as cultural heritage images, audio, video
• Collections with descriptive metadata
• Resources with technical metadata
• Administrative or Rights metadata
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="describedby" type="application/xml" href="http://example.com/metadata/res1.xml"/>
  </url>
</urlset>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/metadata/res1.xml</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="describes" type="text/html" href="http://example.com/res1"/>
  </url>
</urlset>
Linking #5 – Prior Versions of Resources
But first…
Memento Intermezzo
http://www.mementoweb.org/
URI for Original, URI for Version
URI-M - http://web.archive.org/web/20010911203610/http://www.cnn.com/
Web Archive
URI-R - http://www.cnn.com/
URI for Original, URI for Version
URI-M - http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333
CMS
URI-R - http://en.wikipedia.org/wiki/September_11_attacks
Linking #5 – Prior Versions of Resources
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="memento" href="http://example.com/past/20130102130000/res1"/>
    <rs:ln rel="timegate" href="http://example.com/timegate/res1"/>
    <rs:ln rel="timemap" href="http://example.com/timemap/res1" type="application/link-format"/>
  </url>
</urlset>
Linking #6 – Collection Membership
6. Collection membership of resources
This might occur due to:
• Resources being part of OAI-ORE aggregations
• Resources being part of OAI-PMH sets
• Or any other type of collections of resources
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="collection" href="http://example.com/aggregation/allres"/>
  </url>
</urlset>
Linking #7 – Republishing Resources
7. Republishing synchronized resources
This might occur due to:
• Aggregator systems that harvest resources from remote sites and then republish them at new URIs
• Examples include Blog republishing, content distribution networks, mirrored or combined collections
• Hypothetical scenario: Lots of little museums with small collections, and a large European/American aggregating digital library system that wants to provide fast, combined access to the content (with permission)
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="via" modified="2013-01-02T10:00:00Z" href="http://original.example.org/res1"/>
  </url>
</urlset>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://aggregator.example.com/res1</loc>
    <lastmod>2013-01-02T18:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="via" modified="2013-01-02T13:00:00Z" href="http://example.org/res1"/>
    <rs:ln rel="via" modified="2013-01-02T10:00:00Z" href="http://original.example.org/res1"/>
  </url>
</urlset>
ResourceSync - Agenda
4. Framework (Technical) Details
1. Sitemaps
2. Pull method
3. Linking between resources
4. Discovery
5. Push method
6. Archives
Discovery of Capabilities
Discovery of Capability Documents
Requirements:
• Need to discover capability documents, i.e. Resource List, Resource Dump, Change List, Change Dump, Archives
• Need to know the type of capability each document represents
Approach:
• The Capability List provides links to these capability documents, if the Source supports them
• These links have appropriate relation types, e.g. "resourcelist", "changelist", etc.
Capability List
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="capabilitylist"/>
  <rs:ln rel="resourcesync" href="http://example.com/.well-known/resourcesync"/>
  <url>
    <loc>http://aggregator.example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
  <url>
    <loc>http://aggregator.example.com/dataset1/changelist.xml</loc>
    <rs:md capability="changelist"/>
  </url>
</urlset>
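A Destination could turn a Capability List like this into a simple lookup from capability name to document URL. This is a sketch under the assumption that each <url> entry carries exactly one rs:md element with a capability attribute, as in the example; the function name capabilities is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

CAPABILITY_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="capabilitylist"/>
  <rs:ln rel="resourcesync" href="http://example.com/.well-known/resourcesync"/>
  <url>
    <loc>http://aggregator.example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
  <url>
    <loc>http://aggregator.example.com/dataset1/changelist.xml</loc>
    <rs:md capability="changelist"/>
  </url>
</urlset>"""


def capabilities(xml_text):
    """Return {capability name: document URL} for every <url> entry."""
    root = ET.fromstring(xml_text)
    found = {}
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        if md is not None and md.get("capability"):
            found[md.get("capability")] = url.findtext("sm:loc", namespaces=NS)
    return found


if __name__ == "__main__":
    for name, loc in capabilities(CAPABILITY_LIST).items():
        print(name, "->", loc)
```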
Discovery of Capability Lists
Requirements:
• Need to discover a Capability List
Approach:
• HTTP Link header from resources subject to synchronization, relation type "resourcesync"
• Links from HTML document <head>, relation type "resourcesync"
• Links from Capability documents, relation type "up"
Link header on example.com/res1.pdf:
Link: <http://example.com/dataset1/capabilitylist.xml>; rel="resourcesync"
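On the client side, the Link header approach amounts to scanning the header value for a link whose relation type is "resourcesync". The sketch below is a deliberately simplified parser: it assumes no commas or semicolons appear inside quoted parameter values, and the function name find_capability_list is hypothetical.

```python
import re

# One link-value: <target> followed by zero or more ";param" parts.
LINK_PATTERN = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')


def find_capability_list(link_header):
    """Return the first link target whose rel includes 'resourcesync', else None."""
    for target, params in LINK_PATTERN.findall(link_header):
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            rels = value.strip().strip('"').split()
            if name.strip().lower() == "rel" and "resourcesync" in rels:
                return target
    return None


if __name__ == "__main__":
    header = '<http://example.com/dataset1/capabilitylist.xml>; rel="resourcesync"'
    print(find_capability_list(header))
```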
Discovery: ResourceSync Description
Requirements:
• Support for multiple Capability Lists, one per "set of resources"
• Need to discover these Capability Lists
• Need descriptive information about each set of resources that a Capability List pertains to
• Useful to have descriptive information about the Source itself
Approach:
• The ResourceSync Description document meets these requirements.
• It should be at a particular location to avoid having registries: http://(hostname)/.well-known/resourcesync
• It can be linked to from the Capability Lists as well.
ResourceSync Description
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcesync"/>
  <rs:ln rel="describedby" href="http://example.com/info_about_source.xml"/>
  <url>
    <loc>http://aggregator.example.com/dataset1/capabilitylist.xml</loc>
    <rs:md capability="capabilitylist"/>
    <rs:ln rel="describedby" href="http://example.com/dataset1/info_about_dataset1.xml"/>
  </url>
  <url>
    <loc>http://aggregator.example.com/dataset2/capabilitylist.xml</loc>
    <rs:md capability="capabilitylist"/>
    <rs:ln rel="describedby" href="http://example.com/dataset2/info_about_dataset2.xml"/>
  </url>
</urlset>
ResourceSync - Agenda
4. Framework (Technical) Details
1. Sitemaps
2. Pull method
3. Linking between resources
4. Discovery
5. Push method
6. Archives
Motivation for a Push Component in ResourceSync
• Reduce synchronization latency by having the Source push out resource change information
  • To avoid continuous pull of Change Lists by Destinations
• Share information about changes to the Source's ResourceSync implementation, e.g. announcement of new Resource List, new Capability List, etc.
  • To avoid continuous polling of e.g. Resource Lists, ResourceSync Description
Notification Types
• Events pertaining to a resource
  • updated | created | deleted for a resource
  • 3rd party defined events
• Events pertaining to a set of resources
  • updated | created | deleted for a Resource List, Resource Dump, Change List, Change Dump, Archives
  • 3rd party defined events
• Events pertaining to the overall ResourceSync implementation
  • updated | created | deleted for a Capability List, ResourceSync Description
  • 3rd party defined events
Possible Push Technology: XMPP PubSub
Other technologies: WebSockets, HTTP callback
Notification Payload
• Payload the same irrespective of transport protocol
• Use <urlset> as encapsulating element
• One <url> element per notification
Notification Payload – Resource Update (XMPP)
<xmpp:iq from="[email protected]" to="[email protected]" type="set" id="liAJUz3S">
  <xmpp:pubsub>
    <xmpp:publish node="resource_notification_channel">
      <xmpp:item id="1234577">
        <sm:urlset xmlns:sm="http://www.sitemaps.org/schemas/sitemap/0.9"
                   xmlns:rs="http://www.openarchives.org/rs/terms/">
          <sm:url>
            <sm:loc>http://example.com/res1</sm:loc>
            <sm:lastmod>2013-01-02T14:00:00Z</sm:lastmod>
            <rs:md change="updated"
                   hash="md5:12324324jhhjl234234"
                   length="987665"
                   type="application/pdf"/>
          </sm:url>
        </sm:urlset>
      </xmpp:item>
    </xmpp:publish>
  </xmpp:pubsub>
</xmpp:iq>
Notification Payload – Capability Update (XMPP)
<xmpp:iq from="[email protected]" to="[email protected]" type="set" id="liAJUz3S">
  <xmpp:pubsub>
    <xmpp:publish node="changelist_notification_channel">
      <xmpp:item id="1234577">
        <sm:urlset xmlns:sm="http://www.sitemaps.org/schemas/sitemap/0.9"
                   xmlns:rs="http://www.openarchives.org/rs/terms/">
          <sm:url>
            <sm:loc>http://example.com/dataset1/changelist.xml</sm:loc>
            <sm:lastmod>2013-01-02T14:00:00Z</sm:lastmod>
            <rs:md capability="changelist" change="updated"/>
          </sm:url>
        </sm:urlset>
      </xmpp:item>
    </xmpp:publish>
  </xmpp:pubsub>
</xmpp:iq>
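Since the payload is the same whatever the transport, a Source could build the inner <sm:urlset> once and hand it to whichever push technology it uses. The following is a sketch of that inner payload only, with no XMPP wrapping; the function name build_notification and its keyword-argument interface are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS = "http://www.openarchives.org/rs/terms/"

# Use the same prefixes as the slides when serializing.
ET.register_namespace("sm", SM)
ET.register_namespace("rs", RS)


def build_notification(loc, lastmod, **md_attrs):
    """Return a <sm:urlset> string holding a single <sm:url> notification.

    md_attrs become attributes of <rs:md>, e.g. change="updated".
    """
    urlset = ET.Element(f"{{{SM}}}urlset")
    url = ET.SubElement(urlset, f"{{{SM}}}url")
    ET.SubElement(url, f"{{{SM}}}loc").text = loc
    ET.SubElement(url, f"{{{SM}}}lastmod").text = lastmod
    ET.SubElement(url, f"{{{RS}}}md", md_attrs)
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_notification(
        "http://example.com/dataset1/changelist.xml",
        "2013-01-02T14:00:00Z",
        capability="changelist",
        change="updated",
    ))
```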
Considerations
• Notification channels
  • Multiple channels per Source to divide up notifications, e.g.
    • a channel for changes pertaining to all resources that belong to a set of resources
    • a channel for changes to capabilities for a set of resources
  • Server-side filtering preferred over client-side
• Authentication/Authorization
  • To subscribe/create channels
Considerations
• Delayed notification
  • Assurance that a Destination does not miss anything
• Discovery
  • Links to channels, e.g. from a Capability List
  • Links from channels to other channels
  • Provide channel metadata (transport protocol info etc.)
Push Channel Discovery
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  …
  <url>
    <loc>xmpp:pubsub.example.com/dataset1?;node=resource_notification_channel</loc>
    <rs:md capability="resource-notification"/>
    <rs:ln rel="alternate" href="ws://example.com/dataset1/meta_notification_channel"/>
  </url>
  <url>
    <loc>xmpp:pubsub.example.com/dataset1?;node=capability_notification_channel</loc>
    <rs:md capability="capability-notification"/>
  </url>
  <url>
    <loc>xmpp:pubsub.example.com/dataset1?;node=resourcesync_notification_channel</loc>
    <rs:md capability="resourcesync-notification"/>
  </url>
</urlset>
ResourceSync - Agenda
4. Framework (Technical) Details
1. Sitemaps
2. Pull method
3. Linking between resources
4. Discovery
5. Push method
6. Archives
ResourceSync Framework Component: Archives
To allow a Source to hold on to historical data, and Destinations to catch up with events they have missed:
o Publish a Resource List Archive, Resource Dump Archive, Change List Archive, and/or Change Dump Archive
o These are documents that list historical capability documents
Resource List Archive
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist-archive" from="2013-01-09T13:00:00Z"/>
  <url>
    <loc>http://example.com/resourcelist1.xml</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/resourcelist2.xml</loc>
    <lastmod>2013-01-09T13:00:00Z</lastmod>
  </url>
  <url> … </url>
</urlset>
Resource Dump Archive
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump-archive" from="2013-02-10T03:00:00Z"/>
  <url>
    <loc>http://example.com/resourcedump1.xml</loc>
    <lastmod>2013-01-10T03:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/resourcedump2.xml</loc>
    <lastmod>2013-02-10T03:00:00Z</lastmod>
  </url>
  <url> … </url>
</urlset>
Change List Archive
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist-archive"
         from="2013-02-01T23:00:00Z"
         until="2013-02-03T23:00:00Z"/>
  <url>
    <loc>http://example.com/changelist1.xml</loc>
    <lastmod>2013-02-01T23:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/changelist2.xml</loc>
    <lastmod>2013-02-02T23:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/changelist3.xml</loc>
    <lastmod>2013-02-03T23:00:00Z</lastmod>
  </url>
</urlset>
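A Destination that has been offline could use a Change List Archive to work out which Change Lists it still needs to process. The sketch below assumes that a Change List's <lastmod> is later than every change it contains, so that any list with a <lastmod> after the Destination's last synchronization time may hold missed changes; that reading, and the function name change_lists_since, are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

ARCHIVE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist-archive"
         from="2013-02-01T23:00:00Z" until="2013-02-03T23:00:00Z"/>
  <url>
    <loc>http://example.com/changelist1.xml</loc>
    <lastmod>2013-02-01T23:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/changelist2.xml</loc>
    <lastmod>2013-02-02T23:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/changelist3.xml</loc>
    <lastmod>2013-02-03T23:00:00Z</lastmod>
  </url>
</urlset>"""


def change_lists_since(archive_xml, last_sync):
    """Return Change List URLs with lastmod after last_sync, oldest first.

    Timestamps are compared as strings, which is only safe when all of them
    are UTC values in the same ISO 8601 layout, as in the example.
    """
    root = ET.fromstring(archive_xml)
    entries = []
    for url in root.findall("sm:url", NS):
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod and lastmod > last_sync:
            entries.append((lastmod, url.findtext("sm:loc", namespaces=NS)))
    return [loc for _, loc in sorted(entries)]


if __name__ == "__main__":
    print(change_lists_since(ARCHIVE, "2013-02-02T00:00:00Z"))
```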
Change Dump Archive
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changedump-archive"
         from="2013-02-10T03:00:00Z"
         until="2013-02-17T03:00:00Z"/>
  <url>
    <loc>http://example.com/changedump1.xml</loc>
    <lastmod>2013-02-10T03:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/changedump2.xml</loc>
    <lastmod>2013-02-17T03:00:00Z</lastmod>
  </url>
  <url> … </url>
</urlset>
ResourceSync - Agenda
1. ResourceSync: Problem Perspective & Conceptual Approach
2. Motivation & Use Cases
3. Framework Walkthrough
4. Framework (Technical) Details
5. Implementation
6. Q&A
Implementation #1: The Metadata Harvesting Use Case
The Metadata Harvesting Use Case
1. Identification of metadata records within a service
2. Use of standards in metadata formats
3. Incremental updates
4. Create, Update, Delete
5. Sets
The Metadata Harvesting Use Case
1. Identification of metadata records within a service
2. Use of standards in metadata formats
ResourceSync does not specifically care about metadata records, only resources. It is up to the server to identify which of those resources are metadata.
We are free to annotate a resource's entry with appropriate metadata to indicate the format.
The Metadata Harvesting Use Case
3. Incremental updates
4. Create, Update, Delete
5. Sets
All resources that can be obtained from a change list will be annotated with the kind of change that happened to them.
ResourceSync allows the server to publish lists of resources and changes and indexes of those lists all annotated with metadata.
ResourceSync publishes changes as static documents. The client is then free to walk up and down the change lists provided by the server.
(Required) Documents for metadata harvesting use case
Describing Metadata Resources
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" from="2013-05-05T13:00:00Z"/>
  <url>
    <loc>http://mydspace.edu/dspace-rs/resource/123456789/7/qdc</loc>
    <lastmod>2013-05-01T19:09:35Z</lastmod>
    <changefreq>never</changefreq>
    <rs:md type="application/xml"/>
    <rs:ln href="http://mydspace.edu/bitstream/123456789/7/1/bitstream.pdf" rel="describes"/>
    <rs:ln href="http://mydspace.edu/bitstream/123456789/7/2/image.jpg" rel="describes"/>
    <rs:ln href="http://mydspace.edu/123456789/3" rel="collection"/>
  </url>
</urlset>
Describing Bitstream Resources
<urlset …>
  <url>
    <loc>http://mydspace.edu/bitstream/123456789/7/1/bitstream.pdf</loc>
    <lastmod>2013-05-01T19:09:35Z</lastmod>
    <changefreq>never</changefreq>
    <rs:md hash="md5:75d0ea94097a05fce9aca5b079e2f209"
           length="419805"
           type="application/pdf"/>
    <rs:ln href="http://mydspace.edu/dspace-rs/resource/123456789/7/qdc" rel="describedby"/>
    <rs:ln href="http://mydspace.edu/dspace-rs/resource/123456789/7/mets" rel="describedby"/>
    <rs:ln href="http://mydspace.edu/dspace-rs/resource/123456789/12/qdc" rel="describedby"/>
    <rs:ln href="http://mydspace.edu/123456789/2" rel="collection"/>
  </url>
</urlset>
Serving Metadata Resources
http://mydspace.edu/dspace-rs/resource/123456789/7/qdc
  ResourceSync webapp: dspace-rs
  Item handle: 123456789/7
  Metadata format: qdc
metadata.formats = \
    qdc = http://purl.org/dc/terms/, \
    mets = http://www.loc.gov/METS/
metadata.types = \
    qdc = application/xml, \
    mets = application/xml
<loc>http://mydspace.edu/dspace-rs/resource/123456789/7/qdc</loc>
<rs:md type="application/xml"/>
<rs:ln href="http://purl.org/dc/terms/" rel="describedby"/>

<loc>http://mydspace.edu/dspace-rs/resource/123456789/7/mets</loc>
<rs:md type="application/xml"/>
<rs:ln href="http://www.loc.gov/METS/" rel="describedby"/>
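A client could take one of these metadata resource URLs apart to recover the item handle and the metadata format, then look up the format's namespace. This sketch assumes the path layout /dspace-rs/resource/(handle)/(format) seen in the examples; the function name parse_metadata_url and the FORMATS table (copied from the metadata.formats setting) are illustrative, not part of the DSpace module's API.

```python
from urllib.parse import urlsplit

# Mirrors the metadata.formats configuration shown on the slide.
FORMATS = {
    "qdc": "http://purl.org/dc/terms/",
    "mets": "http://www.loc.gov/METS/",
}


def parse_metadata_url(url, webapp="dspace-rs"):
    """Split a metadata resource URL into (item handle, format, format namespace)."""
    segments = urlsplit(url).path.strip("/").split("/")
    # Expected layout: <webapp>/resource/<handle prefix>/<handle suffix>/<format>
    if len(segments) < 5 or segments[0] != webapp or segments[1] != "resource":
        raise ValueError(f"not a metadata resource URL: {url}")
    handle = "/".join(segments[2:-1])
    fmt = segments[-1]
    return handle, fmt, FORMATS.get(fmt)


if __name__ == "__main__":
    print(parse_metadata_url("http://mydspace.edu/dspace-rs/resource/123456789/7/qdc"))
```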
Generating Documents
1. Initialise
Creates initial Capability List and Resource List documents
[dspace]/bin/dspace dsrun org.dspace.resourcesync.ResourceSyncGenerator -i
2. Update
Creates a new Change List which covers the period since the last Change List was created
[dspace]/bin/dspace dsrun org.dspace.resourcesync.ResourceSyncGenerator -u
3. Rebase
A combination of both Initialise and Update.
[dspace]/bin/dspace dsrun org.dspace.resourcesync.ResourceSyncGenerator -r
Usage of Resources by clients
Impact on DSpace
URLs
• Stable identifiers for archived items
• Stable identifiers for unarchived items
• Stable identifiers for metadata resources (in their various formats)
• Stable identifiers for previous versions
Provenance
• History of changes to an item/bitstream
• Item/bitstream deletions (vs withdraw)
• Bitstream create/update dates
• Item create/update dates
Versioning
• Access of previous versions of both metadata and bitstreams
• Stable identifiers for previous versions of both metadata and bitstreams
Metadata Resources
• Metadata in a variety of formats
• Metadata as file/bitstream
Admin Files
• ResourceSync documents (Resource Lists, Change Lists, etc.)
• ResourceSync exports: Resource Dumps, Change Dumps
• Metadata exports in a number of formats
Scheduled Tasks
• Regular generation of RS documents
Complex Objects
• Item/bitstream relationships
• Collections of content
154
ResourceSync TutorialJune 19th 2013 OAI8, Geneva, Switzerland
Get the software!
DSpace module: https://github.com/CottageLabs/DSpaceResourceSync
  depends on the common Java library: https://github.com/CottageLabs/ResourceSyncJava
PHP client: https://github.com/stuartlewis/resync-php
  depends on the SWORDv2 client library: https://github.com/swordapp/swordappv2-php-library/
Implementation #2: ResourceSync at arXiv.org
ResourceSync @ arXiv
• Use ResourceSync for both mirroring and public data access
  o efficient updates
  o ability to do periodic audits
  o public synchronization capability
  o reduce admin burden
• Likely start with metadata + source for mirroring use case (doing experiments now)
• Open access use case also requires processed PDF
• Some concerns about likely use/load…
Alternate download location
• Likely want to separate machine accesses from human accesses to preserve response time on main server
=> Use Mirrored Content part of spec
o <loc> specifies canonical URI, e.g. http://arxiv.org/pdf/1306.1073v1.pdf
o <rs:ln rel="duplicate"> specifies preferred download location, e.g. http://export.arxiv.org/pdf/1306.1073v1.pdf
<url>
  <loc>http://arxiv.org/pdf/1306.1073v1.pdf</loc>
  <lastmod>2013-06-06T00:57:12Z</lastmod>
  <rs:md hash="md5:e08e0c4e4d7b0895120014f0aa09e7c4"
         length="287714"
         type="application/pdf"/>
  <rs:ln rel="duplicate"
         pri="1"
         href="http://export.arxiv.org/pdf/1306.1073v1.pdf"
         modified="2013-06-06T02:00:59Z"/>
</url>
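A Destination could honour the preferred download location by checking each entry for "duplicate" links before falling back to <loc>. The sketch below assumes that a lower pri value means a higher priority and that links without pri rank last; the function name download_url is hypothetical, and the namespace declarations are added to the <url> element only so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

URL_ENTRY = """<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
     xmlns:rs="http://www.openarchives.org/rs/terms/">
  <loc>http://arxiv.org/pdf/1306.1073v1.pdf</loc>
  <lastmod>2013-06-06T00:57:12Z</lastmod>
  <rs:md hash="md5:e08e0c4e4d7b0895120014f0aa09e7c4" length="287714" type="application/pdf"/>
  <rs:ln rel="duplicate" pri="1" href="http://export.arxiv.org/pdf/1306.1073v1.pdf"
         modified="2013-06-06T02:00:59Z"/>
</url>"""

LOWEST_PRIORITY = 999999  # assumed default rank for links without a pri attribute


def download_url(url_xml):
    """Return the best 'duplicate' href if any, otherwise the canonical <loc>."""
    url = ET.fromstring(url_xml)
    duplicates = [
        ln for ln in url.findall("rs:ln", NS) if ln.get("rel") == "duplicate"
    ]
    if duplicates:
        best = min(duplicates, key=lambda ln: int(ln.get("pri", LOWEST_PRIORITY)))
        return best.get("href")
    return url.findtext("sm:loc", namespaces=NS)


if __name__ == "__main__":
    print(download_url(URL_ENTRY))
```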
Getting a copy of arXiv
It might be as easy as:
(of course, you probably have to wait a while but it is nice to know ResourceSync is stateless so one can efficiently restart)
ResourceSync - Agenda
1. ResourceSync: Problem Perspective & Conceptual Approach
2. Motivation & Use Cases
3. Framework Walkthrough
4. Framework (Technical) Details
5. Implementation
6. Q&A
Timeline
• June 2013
  o Version 0.9 of ResourceSync framework specification released
  o Soliciting broad feedback
• July 2013
  o Version 0.x of Push-based methods for ResourceSync
• Fall 2013
  o Specification becomes NISO standard
Pointers
• Specification
  o http://www.openarchives.org/rs/
  o http://www.openarchives.org/rs/0.9/resourcesync
  o http://www.openarchives.org/rs/0.9/archives
• List for public comment
  o https://groups.google.com/d/forum/resourcesync
• Simulator code
  o http://github.com/resync/simulator
ResourceSync: A Web-Based Resource Synchronization Framework
ResourceSync is funded by The Sloan Foundation & JISC
#resourcesync