This presentation provides updated technical information regarding the Memento framework to support time travel on the Web. Its technical content overrides the first Memento presentation (http://www.slideshare.net/hvdsomp/memento-time-travel-for-the-web). More Memento information is available at http://www.mementoweb.org.
Memento: Time Travel for the Web
Updated Technical Details (03/2010)
The Memento Team
Herbert Van de Sompel Michael L. Nelson Robert Sanderson
Lyudmila Balakireva Scott Ainsworth Harihar Shankar
Memento is partially funded by the Library of Congress
Memento wants to make navigating the Web’s Past Easy
http://www.mementoweb.org http://groups.google.com/group/memento-dev
Recap of the Basics …
W3C Web Architecture: Resource – URI - Representation
[Diagram: a URI identifies a Resource; dereferencing the URI yields a Representation that represents the Resource.]
W3C Web Architecture: Resource – URI – Representation
[Diagram: a URI identifies a Resource; dereferencing the URI with content negotiation yields Representation 1 or Representation 2, each of which represents the Resource.]
Problem Statement …
Resources
Resources have Representations
Resources have Representations that Change over Time
Only the Current Representation is Available from a Resource
Old Representations are Lost Forever
Archived Resources Exist
Archived Resources
• http://web.archive.org/web/20010911203610/http://www.cnn.com/ is an archived resource for http://cnn.com (Sep 11 2001, 20:36:10 UTC).
• http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333 is an archived resource for http://en.wikipedia.org/wiki/September_11_attacks (Dec 20 2001, 4:51:00 UTC).
Finding Archived Resources
• Go to http://www.archive.org/ and search for http://cnn.com.
• On http://web.archive.org/web/*/http://cnn.com, select the desired datetime.
Finding Archived Resources
• Go to http://en.wikipedia.org/wiki/September_11_attacks and click History.
• Browse History.
Navigating Archived Resources
• http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333 is an archived resource for http://en.wikipedia.org/wiki/September_11_attacks (Dec 20 2001, 4:51:00 UTC).
• Its "Pentagon" link leads to http://en.wikipedia.org/wiki/The_Pentagon, which is the current version.
Navigating Archived Resources
• http://web.archive.org/web/20010911203610/http://www.cnn.com/ is an archived resource for http://cnn.com (Sep 11 2001, 20:36:10 UTC).
• Its "SPACE" link leads to http://web.archive.org/web/20010911213855/www.cnn.com/TECH/space/ (Sep 11 2001, 21:38:55 UTC).
Current and Past Web are Not Integrated
• Current and Past Web are based on the same technology.
• But going from the Current to the Past Web is a matter of (manual) discovery.
• Memento wants to make going from the Current to the Past Web an (HTTP) protocol matter.
• Memento wants to integrate the Current and Past Web.
The Memento Approach …
Navigate the Web of the Past
[Screenshot sequence: one browsing session with the browser time dial set to Oct 11 2009, 05:30:33 UTC.]
• Request http://en.wikipedia.org/wiki/Web_Archiving. The version shown comes from Wikipedia History and is dated Oct 01 2009, 16:30:00 UTC.
• Follow the "Robots Exclusion Protocol" link. The browser time dial is still at Oct 11 2009, 05:30:33 UTC, so http://en.wikipedia.org/wiki/Robots_exclusion_protocol is shown in the version from Wikipedia History dated Sep 15 2009, 20:49:00 UTC.
• Follow the "Robots Exclusion" link to http://www.robotstxt.org/. The browser time dial is still at Oct 11 2009, 05:30:33 UTC, and the version shown comes from the Internet Archive, dated Nov 09 2007, 06:21:04 UTC.
How does Memento achieve this?
There are two components to the Memento Solution:
• Component 1: Navigation towards an archived resource via its original resource, by leveraging content negotiation.
• Component 2: A discovery API for archives that allows requesting a list of all archived versions it holds for a resource with a given URI.
How does Memento achieve this?
• Component 1: Navigation towards an archived resource via its original resource, by leveraging content negotiation.
The Web without a Time Dimension
Need to use a different URI to access archived versions of a resource and its current version
The Web with Time Dimension added by Memento
In Memento: use URI of the current version to access archived versions, but qualify it with datetime
… and magically arrive at an archived version
How does Memento achieve this?
In order to fully understand how Memento introduces a time dimension to the Web, we present a brief recap of Transparent Content Negotiation (conneg) in HTTP.
RFC 2295. Transparent Content Negotiation in HTTP, http://www.ietf.org/rfc/rfc2295.txt
HTTP GET on URI A
GET with conneg on URI T – Server Choice – 302 Found – Step 1
transparently negotiable resource
GET with conneg on URI T – Server Choice – 302 Found – Step 2
GET with conneg on URI T – Server List – 406 Not Acceptable
How does Memento do This?
• Component 1: Navigation towards an archived resource via its original resource, by leveraging content negotiation.
Terminology Intermission
We introduce the term Memento to refer to an archived version of a resource.
A Memento for a resource URI-R (as it existed) at time ti is a resource URI-Mi [URI-R@ti] for which the representation at any moment past its creation time tc is the same as the representation that was available from URI-R at time ti, with tc >= ti. Implicit in this definition is the notion that, once created, a Memento always keeps the same representation.
DT-conneg: Content Negotiation in the datetime dimension
• RFC 2295 introduces conneg in the following dimensions: media type, language, compression, character set, e.g.:
o HTTP Request:
- Accept-Language: en-US
o HTTP Response:
- Content-Language: en-US
• Inspired by RFC 2295, Memento introduces datetime conneg:
o HTTP Request:
- Accept-Datetime: Mon, 12 Oct 2009 14:20:33 GMT
o HTTP Response:
- Content-Datetime: Sun, 11 Oct 2009 11:18:05 GMT
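The Accept-Datetime value above uses the usual HTTP date layout. As a minimal sketch, assuming a client written in Python, the header could be built with the standard library as follows. The function name accept_datetime_header is hypothetical and not part of Memento.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict


def accept_datetime_header(moment: datetime) -> Dict[str, str]:
    """Build an Accept-Datetime request header (hypothetical helper).

    `moment` is assumed to be a timezone-aware datetime.
    """
    # HTTP dates are expressed in GMT, so convert to UTC before formatting.
    utc_moment = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc_moment, usegmt=True)}
```

Called with datetime(2009, 10, 12, 14, 20, 33, tzinfo=timezone.utc), this sketch produces the same header value as the request example on this slide.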
DT-conneg: Content Negotiation in the datetime dimension
• This means that somewhere, we will need transparently negotiable resources (cf. slides 38-40) that support the datetime dimension to get to appropriate Mementos.
• This will be discussed for 2 classes of servers:
o Web servers without internal archival capabilities;
o Web servers with internal archival capabilities.
Servers Without Internal Archival Capabilities
• This type includes:
o Servers that are crawled by a web archive, such as the Internet Archive
o Servers with an associated transactional archive
• These servers are not aware of the details of Mementos of their resources held by external archives.
• These servers do not have the essential information (URI-Ms, and associated datetimes) to respond to a DT-conneg request.
• But they can be constructive by pointing (HTTP Link) a client to an archive that can respond to the DT-conneg request.
o Unconditionally do this for resources for which Mementos are conceivably available in the archive.
[Diagram: the original resource http://lanlsource.lanl.gov/hello (current) and its archived versions dated Oct 04 2009, 12:00:01 UTC; Oct 10 2009, 12:00:03 UTC; and Oct 21 2009, 12:00:01 UTC. One of them is http://mementoarchive.lanl.gov/store/ta/20091021120001/http://lanlsource.lanl.gov/hello.]
[Diagram: the original resource resides on the original server; its Mementos reside on the archival server.]
[Diagram: the original resource resides on the original server. On the archival server, a transparently negotiable resource has the Mementos as its variant resources; DT-conneg with URI-G is used to get URI-M.]
[Diagram: the original resource on the original server points, via HTTP Link, to the transparently negotiable resource on the archival server, whose variant resources are the Mementos; DT-conneg with URI-G is used to get URI-M.]
Terminology Intermission
We introduce the term TimeGate to refer to a transparently negotiable resource that supports the datetime dimension.
A TimeGate for an original resource URI-R is a transparently negotiable resource URI-G[URI-R] for which all variant resources are Mementos URI-Mi[URI-R@ti] of the resource URI-R. Since multiple archives may host versions of URI-R, multiple TimeGates may exist for any given resource, i.e. one per archive.
[Diagram: the original resource on the original server points, via HTTP Link, to the TimeGate (the transparently negotiable resource) on the archival server, whose variant resources are the Mementos; DT-conneg with URI-G is used to get URI-M.]
How to redirect from Original Resource to its (external) TimeGate
• Q1: Which archive to HTTP Link to?
o The archive with the best coverage for the server at hand.
o Always redirect to an Aggregator (see slides 110-139).
o No redirection by server: client takes control, accessing its preferred TimeGate, bypassing Original Resource.
• Q2: What is the TimeGate URI-G for URI-R on the chosen archive?
o Convention for syntax of URI-G as function of URI-R.
- http://web.archive.org/web/timegate/http://cnn.com
o Always redirect to an Aggregator (see slides 110-139).
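The URI-G convention in Q2 can be sketched as a simple prefix rule. This is only an illustration under the assumption that an archive builds URI-G by appending URI-R to a fixed prefix, as in the example URI above; the function name and the constant are hypothetical.

```python
# Prefix taken from the example URI-G on this slide; other archives
# would use their own prefix (assumption).
EXAMPLE_TIMEGATE_PREFIX = "http://web.archive.org/web/timegate/"


def timegate_uri(uri_r: str, timegate_prefix: str = EXAMPLE_TIMEGATE_PREFIX) -> str:
    """Derive URI-G from URI-R by prefixing (hypothetical helper)."""
    # Normalise the prefix so that exactly one slash separates it from URI-R.
    return timegate_prefix.rstrip("/") + "/" + uri_r
```

For instance, timegate_uri("http://cnn.com") yields the URI-G shown in the bullet above.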
Servers With Internal Archival Capabilities
• This type includes:
o Content Management Systems
o Version Control Systems
o Servers that archive resource representations in the cloud and keep track of the URIs and datetimes of remotely archived resources.
• These servers have all the essential information (URI-Ms, and associated datetimes) to respond to a DT-conneg request.
• The previous architectural solution is maintained to enforce strict distinction between handling requests for current and past representations.
[Diagram: the original resource http://en.wikipedia.org/wiki/September_11_attacks (current) and its archived versions dated Dec 20 2001, 4:51:00 UTC; Dec 31 2004, 20:46:00 UTC; and Dec 20 2008, 22:21:00 UTC. One of them is http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=259237305.]
[Diagram: the original resource and its Mementos reside on the same original server.]
[Diagram: all on the original server, the original resource points, via HTTP Link, to the TimeGate (the transparently negotiable resource), whose variant resources are the Mementos; DT-conneg with URI-G is used to get URI-M.]
How to redirect from Original Resource to its (internal) TimeGate
• Q1: Which archive to HTTP Link to?
o No problem as the archive and the original server coincide: Original Resources, TimeGates, Mementos are on same server.
• Q2: What is the TimeGate URI-G for URI-R on the chosen archive?
o Can be internal convention for syntax of URI-G as function of URI-R.
- For example, MediaWiki: http://a.wiki.org/wiki/Special:TimeGate/http://a.wiki.org/wiki/the_title
HTTP Headers in Memento
The HTTP Headers used in Memento …
• Define two new headers:
– request: Accept-Datetime:
– response: Content-Datetime:
• Introduce new content for two existing headers:
– response: Vary:, Link:
• Use two existing headers without modification:
– response: Location:, TCN:
HTTP Request Headers for DT-conneg
• Accept-Datetime:
o Issued against TimeGate, (Original Resource), (Memento)
o Header content: desired datetime of Memento
Accept-Datetime: Mon, 12 Oct 2009 14:20:33 GMT
HTTP Response Headers for DT-conneg
• Content-Datetime:
o Returned by Mementos
- Even when not as a result of DT-conneg.
o Symmetrical with regular conneg e.g. Content-Type, Content-Language, …
o Header content: archival datetime of the Memento that is being returned.
• Note: This header is crucial to allow a client to understand it has arrived at a Memento.
Content-Datetime: Mon, 12 Oct 2009 14:20:33 GMT
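Since Content-Datetime is what tells a client it has arrived at a Memento, a client-side check might look like the sketch below. It assumes response headers are available as a simple name-to-value mapping; the function name memento_datetime is hypothetical.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def memento_datetime(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the archival datetime if the response is a Memento, else None."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "content-datetime":
            return parsedate_to_datetime(value)
    return None
```

A response without Content-Datetime returns None here, which a client could read as "not a Memento".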
HTTP Response Headers for DT-conneg
• TCN:
o Returned by TimeGate
o Same use as in regular conneg
o Header content:
- Choice:
– 302 response
– Chosen Memento in Location header
– Alternative Mementos listed in Link header
- List:
– 406, 300 response
– Possible Mementos listed in Link header (and optionally in body as HTML)
TCN: choice
HTTP Response Headers for DT-conneg
• Vary:
o Returned by TimeGate
o Similar to regular conneg
o Header content:
- negotiate, accept-datetime
• Note: accept-datetime content in Vary header is crucial to allow a client to understand it has arrived at a TimeGate.
Vary: negotiate, accept-datetime
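The note above suggests a simple client-side test for "have I arrived at a TimeGate?". The following is a minimal sketch, assuming headers are available as a name-to-value mapping; the function name is hypothetical.

```python
from typing import Mapping


def is_timegate_response(headers: Mapping[str, str]) -> bool:
    """Return True when the Vary header lists accept-datetime."""
    for name, value in headers.items():
        if name.lower() == "vary":
            # Vary is a comma-separated list of case-insensitive tokens.
            tokens = {token.strip().lower() for token in value.split(",")}
            return "accept-datetime" in tokens
    return False
```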
HTTP Response Headers for DT-conneg
• Location:
o Returned by TimeGate
o Similar to regular conneg
o Header content: Location of Memento
Location: http://web.archive.org/web/20010911223004/http://cnn.com
HTTP Response Headers for DT-conneg
• Link: o Returned by Original Resource, TimeGate and Mementos
HTTP Response Headers for DT-conneg
• Link header content for Original Resource:
o Recommended: URI-G of TimeGate
- rel="timegate"
Link: <http://web.archive.org/web/timegate/http://cnn.com/>; rel="timegate"
HTTP Response Headers for DT-conneg
• Link header content for TimeGate, Mementos:
o Required: URI-R of Original Resource
- rel="original". Note: even when not as part of DT-conneg.
o Required: URI-M of first and last available Mementos
- rel="first-memento"; datetime="…"
- rel="last-memento"; datetime="…"
o Recommended: URI-M of time-adjacent Mementos
- rel="prev-memento"; datetime="…"
- rel="next-memento"; datetime="…"
o Recommended: URI-B of TimeBundle
- rel="timebundle"
o Optional: URI-M of other Mementos
- rel="memento"; datetime="…"
HTTP Response Headers for DT-conneg
• Link header content for TimeGate, Mementos:
Link: <http://cnn.com/>; rel="original",
 <http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle",
 <http://web.archive.org/web/20000915112826/http://www.cnn.com>; rel="first-memento"; datetime="Tue, 15 Sep 2000 11:28:26 GMT",
 <http://web.archive.org/web/20080708093433/http://www.cnn.com>; rel="last-memento"; datetime="Tue, 08 Jul 2008 09:34:33 GMT",
 <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="prev-memento"; datetime="Tue, 11 Sep 2001 20:30:51 GMT",
 <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="next-memento"; datetime="Tue, 11 Sep 2001 20:47:33 GMT"
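A client has to split such a Link header into its individual links, taking care that the datetime values themselves contain commas. The sketch below is one possible way to do that with regular expressions; it assumes every parameter value is double-quoted, as in the example above, and the function name parse_link_header is hypothetical.

```python
import re
from typing import Dict, List

# One link: <URI> followed by zero or more ;name="value" parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
_PARAM_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def parse_link_header(value: str) -> List[Dict[str, str]]:
    """Parse a Link header value into dicts with 'uri' plus its parameters."""
    links = []
    for match in _LINK_RE.finditer(value):
        link = {"uri": match.group(1).strip()}
        # Each parameter becomes a key, e.g. 'rel' and 'datetime'.
        link.update(_PARAM_RE.findall(match.group(2)))
        links.append(link)
    return links
```

Matching whole links, rather than splitting on commas, keeps a value such as "Tue, 15 Sep 2000 11:28:26 GMT" in one piece.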
Two Memento HTTP Navigations
Details at http://www.mementoweb.org/guide/http/
Memento HTTP Flow
• HEAD R, Accept-Datetime. Response: Link (G).
• GET G, Accept-Datetime. Response: 302 (M), Vary, TCN, Link (R, B, M).
• GET M, Accept-Datetime. Response: 200, Content-Datetime, Link (R, B, M).
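The three steps of the flow can be sketched as client code. This is an illustration only: the fetch callable stands in for whatever HTTP library a client uses, header names are assumed to arrive in the capitalisation shown on the slides, and the fallback prefix models a client-preferred TimeGate. All names here are hypothetical.

```python
import re
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str]]                    # (status code, headers)
Fetch = Callable[[str, str, Dict[str, str]], Response]   # (method, URI, headers)


def find_timegate(link_header: str) -> Optional[str]:
    """Extract URI-G from a Link header carrying rel="timegate"."""
    match = re.search(r'<([^>]*)>\s*;\s*rel="timegate"', link_header)
    return match.group(1) if match else None


def resolve_memento(uri_r: str, accept_datetime: str, fetch: Fetch,
                    preferred_timegate_prefix: str) -> Optional[Tuple[str, str]]:
    """Follow R -> G -> M and return (URI-M, Content-Datetime), or None."""
    request_headers = {"Accept-Datetime": accept_datetime}

    # Step 1: HEAD R; the response may point at a TimeGate via Link.
    _, headers = fetch("HEAD", uri_r, request_headers)
    uri_g = find_timegate(headers.get("Link", ""))
    if uri_g is None:
        # No Link header: fall back to the client's preferred TimeGate.
        uri_g = preferred_timegate_prefix + uri_r

    # Step 2: GET G; a 302 carries the chosen Memento in Location.
    status, headers = fetch("GET", uri_g, request_headers)
    if status != 302 or "Location" not in headers:
        return None
    uri_m = headers["Location"]

    # Step 3: GET M; Content-Datetime confirms arrival at a Memento.
    status, headers = fetch("GET", uri_m, request_headers)
    if status == 200 and "Content-Datetime" in headers:
        return uri_m, headers["Content-Datetime"]
    return None
```

Passing the transport in as a parameter keeps the sketch independent of any particular HTTP library and makes the flow easy to exercise with canned responses.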
Memento HTTP Flow: Notes (1)
GET G, Accept-Datetime
• Accept-Datetime header is only essential when communicating with URI-G. It is not necessary when communicating with URI-R or URI-M.
• GET G can be optimized to HEAD G
Memento HTTP Flow: Notes (2)
• Link header pointing at URI-G is only essential when the server of URI-R has its own special-purpose or preferred archive(s) for its own resources.
• If Link is missing, HTTP client can resort to its preferred TimeGates.
Memento HTTP Flow: Notes (3)
Three aspects of the Memento HTTP flow ensure maximal leverage of established Web caching infrastructure:
• URI-R separate from URI-G: Coinciding URI-R and URI-G would cause problems because caching regimes for regular resources and negotiable resources are different.
• Link header pointing from URI-R to URI-G (as opposed to HTTP 302 used in prior Memento design) allows for caching of responses from URI-R (many thanks to Erik Hetzner of California Digital Library).
• URI-G separate from URI-M: Allows for caching of Mementos.
Two Memento HTTP Navigations
Scenario 1
• cnn.com includes Link to TimeGate at Internet Archive
• URI-R on one server, URI-G & URI-M on another
Memento HTTP Flow: URI-R
HEAD http://cnn.com/ HTTP/1.1
Host: cnn.com
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close
Memento HTTP Flow: Success – URI-R
HTTP/1.1 200 OK
Date: Thu, 21 Jan 2010 00:02:12 GMT
Server: Apache
Link: <http://web.archive.org/web/timegate/http://cnn.com>; rel="timegate"
Content-Length: 255
Connection: close
Content-Type: text/html; charset=iso-8859-1
Memento HTTP Flow: URI-G
GET http://web.archive.org/web/timegate/http://cnn.com HTTP/1.1
Host: web.archive.org
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close
Memento HTTP Flow: Success – URI-G
HTTP/1.1 302 Found
Date: Thu, 21 Jan 2010 00:06:50 GMT
Server: Apache
TCN: choice
Vary: negotiate, accept-datetime
Location: http://web.archive.org/web/20010911203610/http://www.cnn.com
Link: <http://cnn.com/>; rel="original",
 <http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle",
 <http://web.archive.org/web/20000915112826/http://www.cnn.com>; rel="first-memento"; datetime="Tue, 15 Sep 2000 11:28:26 GMT",
 <http://web.archive.org/web/20080708093433/http://www.cnn.com>; rel="last-memento"; datetime="Tue, 08 Jul 2008 09:34:33 GMT",
 <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="prev-memento"; datetime="Tue, 11 Sep 2001 20:30:51 GMT",
 <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="next-memento"; datetime="Tue, 11 Sep 2001 20:47:33 GMT"
Content-Length: 0
Connection: close
Content-Type: text/plain; charset=UTF-8
Memento HTTP Flow: URI-M
GET http://web.archive.org/web/20010911203610/http://www.cnn.com HTTP/1.1
Host: web.archive.org
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close
Memento HTTP Flow: Success – URI-M
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
X-Archive-Orig-Accept-Ranges: bytes
…
Content-Type: text/html;charset=utf-8
Content-Length: 23364
Date: Thu, 21 Jan 2010 00:09:40 GMT
Content-Datetime: Tue, 11 Sep 2001 20:36:10 GMT
Link: <http://cnn.com/>; rel="original",
 <http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle",
 <http://web.archive.org/web/20000915112826/http://www.cnn.com>; rel="first-memento"; datetime="Tue, 15 Sep 2000 11:28:26 GMT",
 <http://web.archive.org/web/20080708093433/http://www.cnn.com>; rel="last-memento"; datetime="Tue, 08 Jul 2008 09:34:33 GMT",
 <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="prev-memento"; datetime="Tue, 11 Sep 2001 20:30:51 GMT",
 <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="next-memento"; datetime="Tue, 11 Sep 2001 20:47:33 GMT"
Connection: close
Two Memento HTTP Navigations
Scenario 2
• wikipedia.org natively supports Memento
• URI-R, URI-G & URI-M on one server
Memento HTTP Flow: URI-R
HEAD /wiki/DJ_Shadow HTTP/1.1
Host: en.wikipedia.org
Accept-Datetime: Thu, 05 Nov 2009 00:00:00 GMT
Connection: close
Memento HTTP Flow: Success – URI-R
HTTP/1.1 200 OK
Date: Thu, 21 Jan 2010 00:02:12 GMT
Server: Apache
Link: <http://en.wikipedia.org/Special:TimeGate/http://en.wikipedia.org/wiki/DJ_Shadow>; rel="timegate"
Content-Length: 1462
Connection: close
Content-Type: text/plain; charset=UTF-8
Memento HTTP Flow: URI-G
GET /Special:TimeGate/http://en.wikipedia.org/wiki/DJ_Shadow HTTP/1.1
Host: en.wikipedia.org
Accept-Datetime: Thu, 05 Nov 2009 00:00:00 GMT
Connection: close
Memento HTTP Flow: Success – URI-G
HTTP/1.1 302 Found
Date: Thu, 21 Jan 2010 00:06:50 GMT
Server: Apache
TCN: choice
Vary: negotiate, accept-datetime
Location: http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=324178040
Link: <http://en.wikipedia.org/Special:TimeBundle/http://en.wikipedia.org/wiki/DJ_Shadow>; rel="timebundle",
 <http://en.wikipedia.org/wiki/DJ_Shadow>; rel="original",
 <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=1493688>; rel="first-memento"; datetime="Sun, 28 Sep 2003 01:42:00 GMT",
 <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=337446696>; rel="last-memento"; datetime="Tue, 12 Jan 2010 19:55:00 GMT",
 <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=322586071>; rel="prev-memento"; datetime="Wed, 28 Oct 2009 14:307:00 GMT",
 <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=326164283>; rel="next-memento"; datetime="Thu, 26 Nov 2009 23:50:00 GMT"
Content-Length: 0
Connection: close
Content-Type: text/plain; charset=UTF-8
Memento HTTP Flow: URI-M
GET /w/index.php?title=DJ_Shadow&oldid=324178040 HTTP/1.1
Host: en.wikipedia.org
Accept-Datetime: Thu, 05 Nov 2009 00:00:00 GMT
Connection: close
Memento HTTP Flow: Success – URI-M
HTTP/1.1 200 OK
Server: Apache
Content-Length: 82705
Content-Type: text/html; charset=utf-8
Date: Thu, 21 Jan 2010 00:09:40 GMT
Content-Datetime: Thu, 05 Nov 2009 23:41:00 GMT
Link: <http://en.wikipedia.org/Special:TimeBundle/http://en.wikipedia.org/wiki/DJ_Shadow>; rel="timebundle",
 <http://en.wikipedia.org/wiki/DJ_Shadow>; rel="original",
 <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=1493688>; rel="first-memento"; datetime="Sun, 28 Sep 2003 01:42:00 GMT",
 <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=337446696>; rel="last-memento"; datetime="Tue, 12 Jan 2010 19:55:00 GMT",
 <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=322586071>; rel="prev-memento"; datetime="Wed, 28 Oct 2009 14:307:00 GMT",
 <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=326164283>; rel="next-memento"; datetime="Thu, 26 Nov 2009 23:50:00 GMT"
Connection: close
Memento HTTP Navigations involving codes other than 200, 302
300 Multiple Choices
• Two scenarios that generate a 300 at the TimeGate:
– A client requests a 300 using the "Negotiate: 1.0" request header
– An archive has two or more Mementos with the same Datetime (HTTP only supports second-level granularity)
HTTP/1.1 300 Multiple Choices
Server: Apache
Content-Length: 705
Content-Type: text/html; charset=utf-8
Date: Thu, 21 Jan 2010 00:09:40 GMT
TCN: list
Vary: negotiate, accept-datetime
Link: <http://en.wikipedia.org/Special:TimeBundle/http://en.wikipedia.org/wiki/DJ_Shadow>; rel="timebundle",
 <http://en.wikipedia.org/wiki/DJ_Shadow>; rel="original",
 <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=1493688>; rel="first-memento"; datetime="Sun, 28 Sep 2003 01:42:00 GMT",
 <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=337446696>; rel="last-memento"; datetime="Tue, 12 Jan 2010 19:55:00 GMT",
 <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=322586071>; rel="memento"; datetime="Sun, 31 May 2009 15:43:00 GMT",
 <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=326164283>; rel="memento"; datetime="Sun, 31 May 2009 15:43:00 GMT"
Connection: close
406 Not Acceptable
• A client request for a Memento with a datetime outside the first and last values will generate a 406.
• For example, a request in Wikipedia with: Accept-Datetime: Mon, 31 May 1999 00:00:00 GMT
HTTP/1.1 406 Not Acceptable
Server: Apache
Content-Length: 709
Content-Type: text/html; charset=utf-8
Date: Thu, 21 Jan 2010 00:09:40 GMT
Vary: negotiate, accept-datetime
TCN: list
Link: <http://en.wikipedia.org/Special:TimeBundle/http://en.wikipedia.org/wiki/DJ_Shadow>; rel="timebundle",
 <http://en.wikipedia.org/wiki/DJ_Shadow>; rel="original",
 <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=1493688>; rel="first-memento"; datetime="Sun, 28 Sep 2003 01:42:00 GMT",
 <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=337446696>; rel="last-memento"; datetime="Tue, 12 Jan 2010 19:55:00 GMT"
Connection: close
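The status codes discussed on these slides (302, 300, 406) can be tied together in a sketch of a TimeGate's decision logic. The slides do not spell out how a TimeGate picks among Mementos, so choosing the Memento nearest in time to the requested datetime is an assumption here, as are the function name, the tuple layout, the handling of an empty archive, and the tie-break toward the earlier Memento.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Memento = Tuple[str, datetime]  # (URI-M, archival datetime)


def negotiate(mementos: List[Memento], requested: datetime,
              client_wants_list: bool = False) -> Tuple[int, Optional[str]]:
    """Return (status code, URI-M for the Location header or None)."""
    ordered = sorted(mementos, key=lambda memento: memento[1])
    if not ordered:
        return 406, None  # assumption: nothing archived is treated like out of range
    first, last = ordered[0][1], ordered[-1][1]
    # Requested datetime outside the first/last Memento range: 406.
    if requested < first or requested > last:
        return 406, None
    # Client asked for a list (e.g. via "Negotiate: 1.0"): 300.
    if client_wants_list:
        return 300, None
    # Assumption: pick the datetime closest to the request (earlier wins ties).
    nearest = min((moment for _, moment in ordered),
                  key=lambda moment: abs(moment - requested))
    candidates = [uri for uri, moment in ordered if moment == nearest]
    # Two or more Mementos share the chosen datetime: 300.
    if len(candidates) > 1:
        return 300, None
    return 302, candidates[0]
```

In the 300 and 406 cases a real TimeGate would also list Mementos in the Link header, as in the responses above; the sketch only covers the choice of status code.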
The Web with Time Dimension added by Memento
How does Memento do This?
There are two components to the Memento Solution:
• Component 1: Navigation towards an archived resource via its original resource, by leveraging content negotiation. (Done)
• Component 2: A discovery API for archives that allows requesting a list of all archived versions it holds for a resource with a given URI.
How does Memento do This?
• Component 2: A discovery API for archives that allows requesting a list of all archived versions it holds for a resource with a given URI.
Why an API?
• Mementos for any given URI-R are distributed across archives.
• In order to get a correct perspective of available Mementos, different archives need to be consulted.
• Can do so in distributed consultation mode (slooow), or by consulting an aggregator.
Terminology Intermission
We introduce the term TimeBundle to refer to a resource via which an overview of all Mementos for an original resource URI-R is available.
A TimeBundle for a resource URI-R is a resource URI-B[URI-R] that is an aggregation of:
(a) All Mementos URI-Mi [URI-R@ti] available from an archive,
(b) The archive's TimeGate URI-G for URI-R,
(c) The original resource URI-R itself.
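The definition above maps naturally onto a small record type. The following is only a sketch of that data structure: the class and field names are hypothetical, and the example.org URIs are placeholders, not real Memento endpoints.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TimeBundle:
    uri_b: str      # URI-B, identifies the TimeBundle itself
    original: str   # URI-R, the original resource
    timegate: str   # URI-G, the archive's TimeGate for URI-R
    mementos: List[str] = field(default_factory=list)  # URI-M1 .. URI-Mn

    def aggregated(self) -> List[str]:
        """All aggregated resources: (a) the Mementos, (b) the TimeGate, (c) the original."""
        return [*self.mementos, self.timegate, self.original]


# Placeholder URIs for illustration only.
bundle = TimeBundle(
    uri_b="http://archive.example.org/timebundle/http://example.org/",
    original="http://example.org/",
    timegate="http://archive.example.org/timegate/http://example.org/",
    mementos=["http://archive.example.org/20090101000000/http://example.org/"],
)
```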
[Figure: an Original resource URI-R and its Mementos URI-M1, URI-M2, URI-M3.]
[Figure: Memento DT-conneg component. An HTTP Link header with rel="timegate" leads from the Original resource URI-R to the TimeGate URI-G; DT-conneg at URI-G leads to the Mementos URI-M1, URI-M2, URI-M3.]
[Figure: Memento DT-conneg component, extended with the TimeBundle URI-B, which aggregates (ore:aggregates) the Original resource URI-R, the TimeGate URI-G, and the Mementos URI-M1, URI-M2, URI-M3.]
See OAI-ORE: http://www.openarchives.org/ore/1.0/toc/
[Figure: Memento DT-conneg component and Memento discovery component. An HTTP Link header with rel="timebundle" points at the TimeBundle URI-B, which aggregates (ore:aggregates) the Original resource URI-R, the TimeGate URI-G, and the Mementos URI-M1, URI-M2, URI-M3. An HTTP 303 leads to the TimeMap URI-T.]
The Aggregator: A Service using the TimeBundle API
TimeBundle API: For Discovery, Cross-Archive Services
• Archive uses common approaches to make TimeBundles/TimeMaps discoverable:
  o SiteMaps,
  o Atom Feeds,
  o OAI-PMH.
• Aggregator harvests and merges TimeMaps. Based on this information, the Aggregator exposes its own TimeGates.
  o Cross-archive
  o Finer datetime granularity
  o Better chances of matching a client’s datetime preference.
  o Can become a shared target for redirection for many web servers.
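The harvest-and-merge step can be sketched as follows. This is an illustration under stated assumptions: each harvested TimeMap is reduced to a list of (datetime, URI-M) pairs, the function names are hypothetical, and "nearest in time" is one plausible selection policy, since the slides do not specify how an Aggregator picks a Memento. The sample values are taken from the digitalpreservation.gov responses in this deck.

```python
from datetime import datetime, timezone


def merge_timemaps(timemaps):
    """Merge per-archive lists of (datetime, URI-M) into one de-duplicated, time-sorted list."""
    merged = {(moment, uri) for timemap in timemaps for moment, uri in timemap}
    return sorted(merged)


def closest_memento(merged, requested):
    """Pick the entry whose datetime is nearest to the requested datetime (assumed policy)."""
    return min(merged, key=lambda entry: abs(entry[0] - requested))


def utc(*args):
    """Small helper to build timezone-aware UTC datetimes."""
    return datetime(*args, tzinfo=timezone.utc)


archive_it = [
    (utc(2009, 9, 28, 17, 14, 5),
     "http://wayback.archive-it.org/1610/20090928171405/http://www.digitalpreservation.gov/"),
]
webcitation = [
    (utc(2008, 6, 9, 20, 34, 23), "http://webcitation.org/query?id=1213058061345794"),
    (utc(2009, 10, 31, 18, 30, 35), "http://webcitation.org/query?id=1257028234035091"),
]

merged = merge_timemaps([archive_it, webcitation])
best = closest_memento(merged, utc(2009, 10, 10))
```

Because the merged list spans several archives, the Aggregator's TimeGate can answer with a Memento that no single archive would have offered as its best match.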
Aggregator Conceptualization
[Figure: Aggregator using TimeBundle API. Actors: browser, Original server, Aggregator, Memento server, TimeMap servers. Labeled interactions: Accept-Datetime; HTTP Link rel="timegate"; datetime content negotiation; HTTP Link rel="original"; HTTP Link rel="timegate"; Content-Datetime; TimeMaps harvested by the Aggregator.]
Two Memento HTTP Navigations involving an Aggregator
A Memento HTTP Navigation involving an Aggregator
Scenario 3
• www.digitalpreservation.gov points at TimeGate provided by an Aggregator
• URI-R, URI-G, URI-M on different servers
Memento HTTP Flow
• HEAD R, Accept-Datetime => 200, Link G
• GET G, Accept-Datetime => 302 to M, Vary, TCN, Link R,B,M
• GET M, Accept-Datetime => 200, Content-Datetime, Link R,B,M
Memento HTTP Flow: URI-R
HEAD R, Accept-Datetime
HEAD / HTTP/1.1
Host: www.digitalpreservation.gov
Accept-Datetime: Sat, 10 Oct 2009 00:00:00 GMT
Connection: close
Memento HTTP Flow: Success – URI-R
HTTP/1.1 200 OK
Date: Thu, 21 Jan 2010 00:02:12 GMT
Server: Apache
Link: <http://mementoproxy.lanl.gov/aggr/timegate/http://www.digitalpreservation.gov/>; rel="timegate"
Content-Length: 255
Connection: close
Content-Type: text/html; charset=iso-8859-1
Memento HTTP Flow: URI-G
GET G, Accept-Datetime
GET /aggr/timegate/http://www.digitalpreservation.gov/ HTTP/1.1
Host: mementoproxy.lanl.gov
Accept-Datetime: Sat, 10 Oct 2009 00:00:00 GMT
Connection: close
Memento HTTP Flow: Success – URI-G
302 to M, Vary, Link R,B,M
HTTP/1.1 302 Found
Date: Thu, 21 Jan 2010 00:06:50 GMT
Server: Apache
TCN: choice
Vary: negotiate, accept-datetime
Location: http://wayback.archive-it.org/1610/20090928171405/http://www.digitalpreservation.gov/
Link: <http://www.digitalpreservation.gov/>; rel="original",
  <http://mementoproxy.lanl.gov/aggr/timebundle/http://www.digitalpreservation.gov/>; rel="timebundle",
  <http://wayback.archive-it.org/256/20051108162921/http://www.digitalpreservation.gov/>; rel="first-memento"; datetime="Tue, 08 Nov 2005 00:00:00 GMT",
  <http://webcitation.org/query?id=1257028234035091>; rel="next-memento"; datetime="Sat, 31 Oct 2009 18:30:35 GMT",
  <http://webcitation.org/query?id=1213058061345794>; rel="prev-memento"; datetime="Mon, 09 Jun 2008 20:34:23 GMT",
  <http://wayback.archive-it.org/256/20100120102000/http://www.digitalpreservation.gov/>; rel="last-memento"; datetime="Wed, 20 Jan 2010 10:20:00 GMT"
Content-Length: 0
Connection: close
Memento HTTP Flow: URI-M
GET M, Accept-Datetime
GET /1610/20090928171405/http://www.digitalpreservation.gov/ HTTP/1.1
Host: wayback.archive-it.org
Accept-Datetime: Sat, 10 Oct 2009 00:00:00 GMT
Connection: close
Memento HTTP Flow: Success – URI-M
200, Content-Datetime, Link R,B,M
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
X-Archive-Orig-Accept-Ranges: bytes
…
Content-Type: text/html;charset=utf-8
Content-Length: 23364
Date: Thu, 21 Jan 2010 00:09:40 GMT
Content-Datetime: Mon, 28 Sep 2009 17:14:05 GMT
Link: <http://www.digitalpreservation.gov/>; rel="original",
  <http://wayback.archive-it.org/web/timebundle/http://www.digitalpreservation.gov/>; rel="timebundle",
  <http://wayback.archive-it.org/256/20051108162921/http://www.digitalpreservation.gov/>; rel="first-memento"; datetime="Tue, 08 Nov 2005 00:00:00 GMT",
  <http://wayback.archive-it.org/256/20100120102000/http://www.digitalpreservation.gov/>; rel="last-memento"; datetime="Wed, 20 Jan 2010 10:20:00 GMT"
Connection: close
Note: Link header values are local to wayback.archive-it.org and differ from those provided by URI-G
A Memento HTTP Navigation involving an Aggregator
Scenario 4
• cnn.com does not include a Link to a TimeGate
• client takes control by directly contacting its preferred Aggregator
• URI-R, URI-G, URI-M on different servers
Memento HTTP Flow
• HEAD R, Accept-Datetime => 200 (no Link to a TimeGate)
• GET G, Accept-Datetime => 302 to M, Vary, TCN, Link R,B,M
• GET M, Accept-Datetime => 200, Content-Datetime, Link R,B,M
Memento HTTP Flow: URI-R
HEAD R, Accept-Datetime
HEAD http://cnn.com/ HTTP/1.1
Host: cnn.com
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close
Memento HTTP Flow: No Success – URI-R
HTTP/1.1 200 OK
Date: Thu, 21 Jan 2010 00:02:12 GMT
Server: Apache
Content-Length: 255
Connection: close
Content-Type: text/html; charset=iso-8859-1
Memento HTTP Flow: URI-G
GET G, Accept-Datetime
GET /aggr/timegate/http://cnn.com/ HTTP/1.1
Host: mementoproxy.lanl.gov
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close
Memento HTTP Flow: Success – URI-G
302 to M, Vary, Link R,B,M
HTTP/1.1 302 Found
Date: Thu, 21 Jan 2010 00:06:50 GMT
Server: Apache
TCN: choice
Vary: negotiate, accept-datetime
Location: http://web.archive.org/web/20010911203610/http://www.cnn.com
Link: <http://cnn.com/>; rel="original",
  <http://mementoproxy.lanl.gov/aggr/timebundle/http://cnn.com/>; rel="timebundle",
  <http://web.archive.org/web/20000915112826/http://www.cnn.com>; rel="first-memento"; datetime="Fri, 15 Sep 2000 11:28:26 GMT",
  <http://archive-it.org/2245/20100310102000/http://www.cnn.com>; rel="last-memento"; datetime="Wed, 10 Mar 2010 10:20:00 GMT",
  <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="prev-memento"; datetime="Tue, 11 Sep 2001 20:30:51 GMT",
  <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="next-memento"; datetime="Tue, 11 Sep 2001 20:47:33 GMT"
Content-Length: 0
Connection: close
Content-Type: text/plain; charset=UTF-8
Memento HTTP Flow: URI-M
GET M, Accept-Datetime
GET http://web.archive.org/web/20010911203610/http://www.cnn.com HTTP/1.1
Host: web.archive.org
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close
Memento HTTP Flow: Success – URI-M
200, Content-Datetime, Link R,B,M
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
X-Archive-Orig-Accept-Ranges: bytes
…
Content-Type: text/html;charset=utf-8
Content-Length: 23364
Date: Thu, 21 Jan 2010 00:09:40 GMT
Content-Datetime: Tue, 11 Sep 2001 20:36:10 GMT
Link: <http://cnn.com/>; rel="original",
  <http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle",
  <http://web.archive.org/web/20000915112826/http://www.cnn.com>; rel="first-memento"; datetime="Fri, 15 Sep 2000 11:28:26 GMT",
  <http://web.archive.org/web/20080708093433/http://www.cnn.com>; rel="last-memento"; datetime="Tue, 08 Jul 2008 09:34:33 GMT",
  <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="prev-memento"; datetime="Tue, 11 Sep 2001 20:30:51 GMT",
  <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="next-memento"; datetime="Tue, 11 Sep 2001 20:47:33 GMT"
Connection: close
Note: Link header values are local to web.archive.org and differ from those provided by URI-G
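Both scenarios follow the same three requests and differ only in how the client learns URI-G. The sketch below illustrates that client logic under stated assumptions: `navigate`, `find_timegate`, and the injected `fetch(method, uri, headers)` callable are hypothetical names, header lookup is case-sensitive for brevity, and `canned_fetch` replays the Scenario 4 responses instead of making real HTTP calls.

```python
import re

_TIMEGATE_RE = re.compile(r'<([^>]*)>\s*;\s*rel="timegate"')


def find_timegate(headers):
    """Return the rel="timegate" target from a Link header, or None if absent."""
    match = _TIMEGATE_RE.search(headers.get("Link", ""))
    return match.group(1).strip() if match else None


def navigate(uri_r, accept_datetime, fetch, aggregator_prefix):
    """Follow URI-R => URI-G => URI-M and return (URI-M, Content-Datetime).

    fetch(method, uri, headers) must return (status, response_headers).
    """
    request_headers = {"Accept-Datetime": accept_datetime}
    _, headers = fetch("HEAD", uri_r, request_headers)
    # Scenario 3: the original server advertises a TimeGate.
    # Scenario 4: it does not, so fall back to the client's preferred Aggregator.
    uri_g = find_timegate(headers) or aggregator_prefix + uri_r
    status, headers = fetch("GET", uri_g, request_headers)
    if status != 302:
        raise LookupError("TimeGate answered %d for %s" % (status, uri_r))
    uri_m = headers["Location"]
    _, headers = fetch("GET", uri_m, request_headers)
    return uri_m, headers.get("Content-Datetime")


def canned_fetch(method, uri, headers):
    """Stand-in for real HTTP calls, replaying the Scenario 4 responses."""
    responses = {
        "http://cnn.com/": (200, {}),
        "http://mementoproxy.lanl.gov/aggr/timegate/http://cnn.com/": (
            302, {"Location": "http://web.archive.org/web/20010911203610/http://www.cnn.com"}),
        "http://web.archive.org/web/20010911203610/http://www.cnn.com": (
            200, {"Content-Datetime": "Tue, 11 Sep 2001 20:36:10 GMT"}),
    }
    return responses[uri]


result = navigate(
    "http://cnn.com/",
    "Tue, 11 Sep 2001 20:35:00 GMT",
    canned_fetch,
    "http://mementoproxy.lanl.gov/aggr/timegate/",
)
```

Injecting `fetch` keeps the navigation logic independent of any particular HTTP library; a real client would also need to handle the 300 and 406 responses described earlier.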
The Memento Profile of OAI-ORE
Memento Profile of OAI-ORE
[Figure: the TimeMap URI-T is an ORE Resource Map that describes (ore:describes) the TimeBundle URI-B, an ORE Aggregation. URI-B aggregates (ore:aggregates) the Original resource URI-R, the TimeGate URI-G, and the Mementos URI-M1 and URI-M2, each an ORE Aggregated Resource.]
Information in the TimeMap about the TimeGate
• Coverage period the TimeGate has Mementos for
  Predicate: mem:covers Object: mem:TimeSpan
• Which Original Resource it is a TimeGate for
  Predicate: mem:timeGateFor Object: mem:OriginalResource
• TimeSpan Class:
  • mem:start datestamp
  • mem:end datestamp
Information in the TimeMap about the TimeGate
[Figure: URI-T ore:describes URI-B, which ore:aggregates URI-R, URI-G, and URI-M1. URI-G is linked to URI-R by mem:timeGateFor and has mem:covers pointing at a TimeSpan with mem:start and mem:end datetimes.]
Information in the TimeMap about Mementos
• The representation's mime-type: dc:format
• The representation's byte size: dc:extent
• Over what period the representation was observed as the current representation of the Original Resource:
  Predicate: mem:observedOver Object: mem:TimeSpan
• Or over what period the representation was known to be active as the current representation of the Original Resource:
  Predicate: mem:validOver Object: mem:TimeSpan
• The number of observations of the representation in the given time period: mem:observations integer
• Any further information that is available about Mementos
Information in the TimeMap about Mementos
[Figure: URI-T ore:describes URI-B, which ore:aggregates URI-R, URI-G, and URI-M1. URI-M1 is linked to URI-R by mem:mementoFor and carries dc:format (string), dc:extent (integer), and mem:observedOver pointing at a TimeSpan with mem:start and mem:end datetimes and mem:observations (integer).]
Memento Ontology: Classes
• TimeMap SubClassOf ore:ResourceMap
• TimeBundle SubClassOf ore:Aggregation
• OriginalResource SubClassOf gen:TimeGenericResource
• TimeGate SubClassOf irw:WebResource
• Memento SubClassOf gen:TimeSpecificResource
• TimeSpan Typed Blank Node
Memento Ontology: Relationships
• timeGateFor subject: TimeGate object: OriginalResource
• hasTimeGate subject: OriginalResource object: TimeGate
• mementoFor subject: Memento object: OriginalResource
• covers subject: TimeGate object: TimeSpan
• observedOver subject: Memento object: TimeSpan
• validOver subject: Memento object: TimeSpan
Memento Ontology: Properties
• start subject: TimeSpan type: datestamp
• end subject: TimeSpan type: datestamp
• observations subject: TimeSpan type: integer
Ontologies
Memento – mem: http://www.mementoweb.org/terms/tb/
Generic/Specific Resource – gen: http://www.w3.org/2006/gen/ont
Identity of Resource on the Web – irw: http://ontologydesignpatterns.org/ont/web/irw.owl
Object Reuse & Exchange – ore: http://www.openarchives.org/ore/terms/
Dublin Core – dc: http://purl.org/dc/elements/1.1/
Sample TimeMap in RDF/XML
<rdf:RDF>
  <mem:TimeMap rdf:about="http://mementoproxy.lanl.gov/ia/timemap/rdf/http://www.archive.org/">
    <ore:describes>
      <mem:TimeBundle rdf:about="http://mementoproxy.lanl.gov/ia/timebundle/http://www.archive.org/">
        <ore:aggregates>
          <mem:OriginalResource rdf:about="http://www.archive.org/"/>
        </ore:aggregates>
        <ore:aggregates>
          <mem:TimeGate rdf:about="http://mementoproxy.lanl.gov/ia/timegate/http://www.archive.org/">
            <mem:timeGateFor rdf:resource="http://www.archive.org/"/>
            <mem:covers>
              <mem:TimeSpan>
                <mem:start>1999-10-11T06:44:03 GMT</mem:start>
                <mem:end>2008-07-19T08:52:28 GMT</mem:end>
              </mem:TimeSpan>
            </mem:covers>
          </mem:TimeGate>
        </ore:aggregates>
        <ore:aggregates rdf:resource="http://web.archive.org/web/19971011064403/http://www.archive.org/"/>
        …
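A client could list the resources aggregated by a TimeBundle from such a TimeMap roughly as follows. This is a sketch, not a general RDF parser: it reads this one RDF/XML layout with the standard library, the function name `aggregated_uris` is hypothetical, and the namespace declarations and closing tags in the embedded document are assumptions added to complete the truncated sample (the mem: and ore: namespace URIs are those listed on the Ontologies slide).

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ore": "http://www.openarchives.org/ore/terms/",
    "mem": "http://www.mementoweb.org/terms/tb/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Shortened TimeMap; xmlns attributes and closing tags are added for well-formedness.
TIMEMAP = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ore="http://www.openarchives.org/ore/terms/"
    xmlns:mem="http://www.mementoweb.org/terms/tb/">
  <mem:TimeMap rdf:about="http://mementoproxy.lanl.gov/ia/timemap/rdf/http://www.archive.org/">
    <ore:describes>
      <mem:TimeBundle rdf:about="http://mementoproxy.lanl.gov/ia/timebundle/http://www.archive.org/">
        <ore:aggregates>
          <mem:OriginalResource rdf:about="http://www.archive.org/"/>
        </ore:aggregates>
        <ore:aggregates>
          <mem:TimeGate rdf:about="http://mementoproxy.lanl.gov/ia/timegate/http://www.archive.org/"/>
        </ore:aggregates>
        <ore:aggregates rdf:resource="http://web.archive.org/web/19971011064403/http://www.archive.org/"/>
      </mem:TimeBundle>
    </ore:describes>
  </mem:TimeMap>
</rdf:RDF>"""


def aggregated_uris(rdf_xml):
    """List the URIs of all resources aggregated by the TimeBundle, in document order."""
    root = ET.fromstring(rdf_xml)
    uris = []
    for aggregates in root.iterfind(".//mem:TimeBundle/ore:aggregates", NS):
        if RDF_RESOURCE in aggregates.attrib:
            # Reference form: <ore:aggregates rdf:resource="..."/>
            uris.append(aggregates.attrib[RDF_RESOURCE])
        else:
            # Nested form: a typed node carrying rdf:about
            for node in aggregates:
                uris.append(node.attrib[RDF_ABOUT])
    return uris


uris = aggregated_uris(TIMEMAP)
```

For production use, an RDF library that handles every RDF/XML serialization variant would be a safer choice than element-level parsing.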
How does Memento do This?
There are two components to the Memento Solution:
• Component 1 (done): Navigation towards an archived resource via its original resource, by leveraging content negotiation.
• Component 2 (done): A discovery API for archives that allows requesting a list of all archived versions it holds for a resource with a given URI.
Memento in a non-Memento Web …
Memento and Client Development
• Browser plug-in must use a different HTTP request/response cycle when in time travel mode.
• Browser plug-in needs to track its position in the chain Original Resource => TimeGate => Memento.
• Interesting user interface challenges arise, among others from the spread in time of the Mementos embedded in an encompassing Memento, e.g. images in an HTML page.
Memento and Web Archives
• Web Archives rewrite URLs in archived pages, in order to avoid:
  o Serving current representations of embedded resources;
  o Linking to current representations of resources.
• The upside: Archived pages are self-contained.
• The downside:
  o Cannot navigate beyond the archive’s content;
  o Other servers (archives or original) may have appropriate version of (missing) embedded or linked resource.
• Memento does not require URL-rewriting.
BBC home page from Internet Archive; URL rewriting
Embedded & linked resources still available at BBC site
BBC home page from Internet Archive; no URL-rewriting
Standardization
• Link relationships original, timegate, timebundle, memento, first-memento, last-memento, prev-memento, next-memento will be registered as per upcoming HTTP Link header RFC.
• Headers Accept-Datetime, Content-Datetime will be registered in the provisional registry for Message Header Fields as per RFC 3864.
• A process will be initiated aimed at publishing an RFC specifying the Memento solution.
• TimeGate URI syntax convention for Web archives will be discussed in the context of International Internet Preservation Consortium.
Towards Acceptance …
Towards Acceptance: Hyping
Towards Acceptance: Discussing
http://efoundations.typepad.com/efoundations/2009/11/memento-and-negotiating-on-time.html
Is the Memento proposal OK from the perspective of the Architecture of the World Wide Web, and REST?
Resulted in some adjustments to the proposed framework.
Towards Acceptance: Discussing
http://www.mementoweb.org/events/IA201002/
Two-day discussion about Memento with Internet Archive, California Digital Library, LOCKSS, and Library of Congress.
Resulted in some adjustments to the proposed framework.
Resulted in commitment by all represented archives to deploy experimental Memento support.
Towards Acceptance: Discussing
http://groups.google.com/group/memento-dev
Discussion list regarding Memento
Resulted in some adjustments to the proposed framework.
Towards Acceptance: Thinking
http://www.open.ac.uk/blogs/telstar/2009/11/24/the-when-of-the-web/
Use Memento to go from a formal citation of a Web page (URL, datetime) to the appropriate archival version of that page.
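Turning a citation into a DT-conneg request mostly amounts to formatting the cited datetime as an HTTP date. The sketch below uses a hypothetical helper name, `citation_to_request`, and assumes that a cited datetime without timezone information is meant as UTC.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def citation_to_request(url, accessed):
    """Turn a (URL, datetime) citation into URI-R plus an Accept-Datetime request header."""
    if accessed.tzinfo is None:
        # Assumption: naive datetimes in a citation are interpreted as UTC.
        accessed = accessed.replace(tzinfo=timezone.utc)
    http_date = format_datetime(accessed.astimezone(timezone.utc), usegmt=True)
    return url, {"Accept-Datetime": http_date}


# A citation of cnn.com as accessed on 11 September 2001, 20:35 UTC.
uri_r, headers = citation_to_request("http://cnn.com/", datetime(2001, 9, 11, 20, 35))
```

The resulting header value matches the Accept-Datetime used in the cnn.com navigation shown earlier in this deck.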
Towards Acceptance: Developing
• Apache plug-in rule that adds a Link header pointing at a TimeGate to responses to HTTP HEAD/GET requests for Original Resources.
  o http://mementoweb.org/tools/apache
• Memento plug-in for the MediaWiki platform.
  o http://www.mediawiki.org/wiki/Extension:Memento
  o Working towards community acceptance.
  o Started similar process for Drupal.
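The idea behind the Apache rule can be illustrated in a few lines of Python. This is not the Apache plug-in itself, whose configuration is not shown in these slides; it is an analogous sketch written as WSGI middleware, with the hypothetical name `add_timegate_link`, and it ignores query strings for brevity.

```python
def add_timegate_link(app, timegate_prefix):
    """Wrap a WSGI app so every response carries a Link header pointing at a TimeGate."""

    def middleware(environ, start_response):
        # Reconstruct URI-R from the request (query string omitted for brevity).
        uri_r = "{}://{}{}".format(
            environ.get("wsgi.url_scheme", "http"),
            environ.get("HTTP_HOST", "localhost"),
            environ.get("PATH_INFO", "/"),
        )
        link = '<{}{}>; rel="timegate"'.format(timegate_prefix, uri_r)

        def patched_start_response(status, headers, exc_info=None):
            # Append the Link header to whatever the wrapped app already set.
            return start_response(status, list(headers) + [("Link", link)], exc_info)

        return app(environ, patched_start_response)

    return middleware


def hello_app(environ, start_response):
    """Trivial application standing in for the original server."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


captured = {}


def record(status, headers, exc_info=None):
    """Test double for the server's start_response callable."""
    captured["status"] = status
    captured["headers"] = dict(headers)


wrapped = add_timegate_link(hello_app, "http://mementoproxy.lanl.gov/aggr/timegate/")
body = wrapped(
    {"wsgi.url_scheme": "http", "HTTP_HOST": "www.digitalpreservation.gov", "PATH_INFO": "/"},
    record,
)
```

With these inputs the added header equals the Link header returned by www.digitalpreservation.gov in Scenario 3.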
Towards Acceptance: Developing
• Created demonstration client:
  o Firefox plug-in: http://www.mementoweb.org/tools/ (many thanks to Sam Adams, Cambridge University)
  o Currently working on plug-in for Internet Explorer
Towards Acceptance: Ongoing
• JISC funded developer contest for the creation of Memento-based prototype demonstrators: http://wiki.2010.dev8d.org/w/Talk_5
• The Memento Team will engage in a funded, collaborative agreement with the Library of Congress for:
  o Standardization (I-D => RFC).
  o Outreach.
  o Research.
  o Software development.
Some Interesting Consequences of Memento …
Memento and Resource Persistence (1)
• URI-R vanishes, but the server that used to serve it is still operational:
o The server should still include an HTTP Link header with rel="timegate" pointing at a TimeGate for URI-R, irrespective of whether the client issues a DT-conneg request or not. This allows seamless access to a Memento of URI-R, even if the server no longer hosts the original.
o If the server does not include an HTTP Link header pointing at a TimeGate for URI-R, the client resorts to interaction with archives (or with an Aggregator) and arrives at the most recent Memento of the resource.
Memento and Resource Persistence (2)
• A domain vanishes:
o The client is looking for a current representation of URI-R that was hosted by the domain, but fails.
o The client resorts to interaction with archives (or with an Aggregator) and arrives at the most recent Memento of the resource.
Memento and Resource Persistence (3)
• A domain is taken over by a new custodian:
o The new custodian applies a different policy regarding the archive to which DT-conneg requests are redirected.
o The client understands from the first-memento/last-memento HTTP Link header of that archive of choice, that it does not cover the time range in which the previous custodian operated the domain.
o The client resorts to interaction with other archives (or with an Aggregator) and arrives at an appropriate Memento.
Memento and Resource Versioning (1)
Resource state may evolve over time. Requiring a URI owner to publish a new URI for each change in resource state would lead to a significant number of broken references. For robustness, Web architecture promotes independence between an identifier and the state of the identified resource.
From: The Architecture of the World Wide Web, http://www.w3.org/TR/webarch/
Memento and Resource Versioning (2)
But there are many use cases that require resource versioning, including community-based content creation, scientific communication, open government, etc.
How does resource versioning work currently? How can it work in a Memento Web?
Paper at http://arxiv.org/abs/1003.3661
Current Resource Versioning (3)
• Version Identification: HTTP URIs
• Versioning Strategy: new URI for new version
• Version Relationships: RDF or HTTP Links
• Version Datetime: RDF
Memento and Resource Versioning (4)
• Version Identification: HTTP URIs
• Versioning Strategy: stable, cool URI for "current" version; new URI for old version
• Version Relationships: HTTP Links, HTTP datetime conneg
• Version Datetime: HTTP Content-Datetime response header
Memento and Non-Information Resources (1)
[Figure: the Non-Information resource URI-R leads via HTTP 303 to the Information resource URI-S, the current description of URI-R.]
Paper at http://arxiv.org/abs/1003.3661
Memento and Non-Information Resources (2)
[Figure: the Non-Information resource URI-R leads via HTTP 303 to the Information resource URI-S, the current description of URI-R. An HTTP Link header with rel="timegate" points at the TimeGate URI-G; DT-conneg at URI-G leads to the Memento URI-M, a past description of URI-R.]
Memento and Non-Information Resources (3)
• DBpedia subject URI, e.g. http://dbpedia.org/resource/France, leads to current description of the subject.
• Archive of prior descriptions of DBpedia subject URIs implemented at LANL:
  o Archive exposes a TimeGate per DBpedia subject URI, e.g. http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/resource/France
  o These TimeGates support DT-conneg.
• DBpedia provides HTTP Link header with rel="timegate" pointing at these TimeGates.
• Result: Clients can access current and prior descriptions using "follow your nose" HTTP navigation.
• Proof-of-Concept: "follow your nose" time-series analysis across DBpedia versions.
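A time-series analysis of this kind boils down to issuing one DT-conneg request per sample point against the same TimeGate. The sketch below only builds that request plan and performs no HTTP calls; `time_series_requests` is a hypothetical name, yearly sampling is an arbitrary choice, and a real client would discover URI-G from the Link header rather than construct it from the prefix shown in the slide.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# TimeGate prefix as shown in the LANL DBpedia archive example.
TIMEGATE_PREFIX = "http://mementoarchive.lanl.gov/dbpedia/timegate/"


def time_series_requests(subject_uri, years):
    """Build one DT-conneg request (URI-G, headers) per year for a DBpedia subject URI."""
    uri_g = TIMEGATE_PREFIX + subject_uri
    plan = []
    for year in years:
        moment = datetime(year, 1, 1, tzinfo=timezone.utc)
        plan.append((uri_g, {"Accept-Datetime": format_datetime(moment, usegmt=True)}))
    return plan


plan = time_series_requests("http://dbpedia.org/resource/France", [2008, 2009, 2010])
```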
Memento and Non-Information Resources (4)
Time-Series analysis across DBpedia versions
Memento and Time-Persistent Web Annotations
Paper at http://arxiv.org/abs/1003.2643
Memento wants to make navigating the Web’s Past Easy
http://www.mementoweb.org http://groups.google.com/group/memento-dev