29
The Design Of A Web Document Snapshots Delivery System David Chao College of Business San Francisco State University

The Design Of A Web Document Snapshots Delivery System

Embed Size (px)

DESCRIPTION

The Design Of A Web Document Snapshots Delivery System. David Chao College of Business San Francisco State University. What Is A Web Document Snapshot?. A web document snapshot is the state of a web document at a point in time (snaptime). Applications Of Web Document Snapshots. - PowerPoint PPT Presentation

Citation preview

Page 1: The Design Of A Web Document Snapshots Delivery System

The Design Of A Web Document Snapshots Delivery System

David Chao

College of Business

San Francisco State University

Page 2: The Design Of A Web Document Snapshots Delivery System

What Is A Web Document Snapshot?

• A web document snapshot is the state of a web document at a point in time (snaptime).

Page 3: The Design Of A Web Document Snapshots Delivery System

Applications Of Web Document Snapshots

• It enables an organization to audit a web document’s contents in the past.

• Perform business analyses with historical information recorded in it.

• It is also an archived copy of a web document when it is changed.

Page 4: The Design Of A Web Document Snapshots Delivery System

The State Of A Web Document At The Snaptime

• The code that creates the contents

• The rendering of the web document as displayed with a browser:– Dynamic web documents

Page 5: The Design Of A Web Document Snapshots Delivery System

Factors Affecting The State Of A Web Document

• Web document code • The state of internal resources it references:

– Internal resources are files managed by a web site and are available in creating the web site’s contents.

– Images, style sheet, components, script files, databases, etc.

• The state of external resources it references:– External resources are files not managed by the web site

but can be referenced in creating the web site’s contents.

• Web site host environment variables:– System clock

Page 6: The Design Of A Web Document Snapshots Delivery System

Four Levels Of Web Document Snapshot

• Level 1 snapshot: A web document snapshot is the state of web document code at snaptime. – Creating level 1 snapshot enables a web site to trace the

changes to the web document code over time.

• Level 2 snapshot: A level 2 snapshot is a level 1 snapshot with the additional requirement that all the internal resources it references are at least level 1 snapshots at the same snaptime. – Referencing database snapshots

Page 7: The Design Of A Web Document Snapshots Delivery System

• Level 3 snapshot: A level 3 snapshot is a level 2 snapshot with the additional requirement that all the external resources it references are at least level 2 snapshots at the snaptime.

• Level 4 snapshot: A level 4 snapshot is a level 3 snapshot with the additional requirement that all the web site host’s environment variables are reset to their values at snaptime.

Page 8: The Design Of A Web Document Snapshots Delivery System

Objective Of This Research

• The level 3 and level 4 snapshots involve resources that are not managed by the web site and it’s difficult, if not impossible, for the web site to keep track of changes to these resources. This research develops a web document snapshot management system to deliver web documents’ level 1 and level 2 snapshots.

Page 9: The Design Of A Web Document Snapshots Delivery System

The Design Of The Web Document Snapshot Delivery System

• The system consists of two components:– Database Snapshot Manager for maintaining

database snapshots– Web Document Snapshot Manager for

maintaining web document snapshots.

• The system is designed to deliver snapshots with any snaptime requirement, and snapshots are created only when requested.

Page 10: The Design Of A Web Document Snapshots Delivery System

Database Snapshot Manager

• The objective of this module is to provide a database snapshot at any snaptime requested by users.

• This requires recording all updates in a log. The log uses time stamp to record update time, and use flags to indicate deletions and insertions where a modification is treated as the deletion of the old version followed by an insertion of the new version.

Page 11: The Design Of A Web Document Snapshots Delivery System

Database Snapshot Management

• Defining snapshots:CREATE SNAPSHOT snapshotname

AS query

AS OF snaptime

• Refreshing snapshots:REFRESH SNAPSHOT snapshotname

AS OF new snaptime

Page 12: The Design Of A Web Document Snapshots Delivery System

Web Document Snapshot Manager

• The objective of the Web Document Snapshot Manager is to generate level 2 snapshots for all internal non-database files.

Page 13: The Design Of A Web Document Snapshots Delivery System

The M:M Relationship Between A URL And A Web Documdent

URLx

D1 D11 D12

D2

D31 D32

D33

URLA

URLB

Figure 1: Relationship between URLs and web documents.

a b.

Page 14: The Design Of A Web Document Snapshots Delivery System

Historical Links

• The historical links of a web site include the URLs invalidated due to:– web site reorganization– document removal, renaming or relocation

• and links to document snapshots:– document’s contents as of a specific point in

time.

Page 15: The Design Of A Web Document Snapshots Delivery System

HP0 HP1 HP2 HP3 HP4 P1 P2 P3 P3 P5 P3 P4 P5 P1 P4 P4 P4 P8 P2 P2 P3 P3 P3 P3 P3 P5 P5 P5 P7 P6 P6 P1 P1 VP0 VP1 VP2 VP3 VP4

T1 T2 T3 T4 T0

Figure 2: An example of the progression of a web site with changes since its initiation.

Time

Historical links

Current links

Page 16: The Design Of A Web Document Snapshots Delivery System

Logging Scheme

• The log, named TemporalURLLog, is designed to keep the history of changes to web documents. It has four fields: – URL: document’s URL– PublishDate: document publish time– ExpireDate: document URL expire date– NewURL: document’s new URL if any

Page 17: The Design Of A Web Document Snapshots Delivery System

TemporalURLLog Maintenance Algorithm • New document: An entry is entered with its URL and PublishDate;

ExpireDate and NewURL are null.

• Deleted document: The ExpireDate of the document’s entry is changed to its deletion time.

• Modified document: The ExpireDate of the document’s entry is changed to its modification time and a new entry is entered with its URL and modification time as PublishDate; ExpireDate and NewURL are null.

• Renamed document: The ExpireDate of the document’s entry is changed to the time it is renamed and the NewURL is changed to its new URL. Then, it adds a new entry with its new URL and the PublishDate is set to the time the document is renamed.

Page 18: The Design Of A Web Document Snapshots Delivery System

URL PublishDate ExpireDate NewURL P1 T0 T1 P4 P2 T0 T2 Null P3 T0 T2 Null P4 T1 T4 P8 P5 T1 T3 Null P3 T2 T3 Null P3 T3 T4 Null P5 T3 T4 P7 P6 T3 Null Null P1 T3 Null Null P8 T4 Null Null P7 T4 Null Null

Figure 3: The contents of TemporalURLLog based on changes described in Figure 2.

Page 19: The Design Of A Web Document Snapshots Delivery System

Archiving Scheme

• Deleted document: The deleted document in the Archive with URL + PublishDate as file name.

• Modified document: The old version is saved in the Archive with URL + PublishDate as file name.

Page 20: The Design Of A Web Document Snapshots Delivery System

URL PublishDate ExpireDate NewURL P1 T0 T1 P4 P2 T0 T2 Null P3 T0 T2 Null P4 T1 T4 P8 P5 T1 T3 Null P3 T2 T3 Null P3 T3 T4 Null P5 T3 T4 P7 P6 T3 Null Null P1 T3 Null Null P8 T4 Null Null P7 T4 Null Null

Figure 3: The contents of TemporalURLLog based on changes described in Figure 2.

1. A URL P2 valid between T0 and T1 is deleted

2. A URL P3 has been modified repeatedly and is eventually deleted.

3. An old URL P5 is now renamed to P7. It has been modified at T3

4. The log is able to determine that a historical link P1 is now renamed to P8.

5. A URL P12 has never existed in the web site.

With the scheme we can determine:

Page 21: The Design Of A Web Document Snapshots Delivery System

Patterns of Log Entries for a URL

• 1. If a URL has a log entry with a non-null PublishDate and null ExpireDate field then it is a current URL; such as P6 in figure 3.

• 2. If all entries of a URL have a non-null PublishDate, ExpireDate and null NewURL field, then this URL is deleted from the web site; such as P3 in Figure 3.

• 3. If a URL has a log entry with a non-null NewURL field, then it has been renamed, and the log entries for the new URL may again have these three patterns of changes.

Page 22: The Design Of A Web Document Snapshots Delivery System

Backward Search For A Document’s Snapshot At A Specific Time

• This algorithm processes log entries backward starting from a document’s current entry to trace back its changes in order to locate the snapshot at time T. If the current URL’s PublishDate is less than T, then the current document itself is its snapshot at T. Otherwise the backward search starts.

Page 23: The Design Of A Web Document Snapshots Delivery System

An entry’s predecessor has one of the following properties:

• 1. If the entry has a null NewURL then its successor must have the same URL and the successor’s PublishDate must equal the entry’s ExpireDate. If no such successor is found, then this entry must have been generated due to a deletion.

•  2. If the entry has a non-null NewURL then it must have a successor with a URL equal to the NewURL and the PublishDate equals to the entry’s ExpireDate. Renaming or relocation must have generated the successor.

Page 24: The Design Of A Web Document Snapshots Delivery System

Retrieving Web Document P7’s

Snapshot at time=T2 • Entries processed:

– (P5, T1, T3, Null)

– (P5, T3, T4, P7)

– (P7, T4, Null, Null)

• Document retrieved:– Archive(P5 + T1)

Page 25: The Design Of A Web Document Snapshots Delivery System

Retrieving Web Document P8’s Snapshot at time =T0

• Entries processed:– (P1, T0, T1, P4)

– (P4, T1, T4, P8)

– (P8, T4, Null, Null)

– P8 has been renamed at T1 and T4. It has not been modified since T0.

• Document retrieved:– P8

Page 26: The Design Of A Web Document Snapshots Delivery System

Requesting Web Document Snapshot with Temporal URL

• A temporal URL is a URL submitted with temporal requirements of which the documents associated with the URL must meet.

• Entering temporal requirements with QueryString• Example:

– URL?SnapshotAsOf=date

Page 27: The Design Of A Web Document Snapshots Delivery System

Requesting Web Document Snapshot With A Web Service

• A web service is an application logic accessible via the Internet.

• The snapshot-retrieving algorithm can be implemented as a web service that takes URL and snaptime as its inputs and returns the document snapshot as output.

Page 28: The Design Of A Web Document Snapshots Delivery System

Creating A Web Document Snapshots Management Site

• This is a site designed to manage web document snapshots and handle the requests for snapshot. Its objectives are: – (1) maintaining web document snapshots.– (2) providing interface for users to enter request

for snapshots.– (3) educating users about the web document

snapshot systems.

Page 29: The Design Of A Web Document Snapshots Delivery System

Summary

• This paper has two contributions: – 1. It presents an analysis for defining four

levels of snapshots for a web document. – 2. It presents a design of the web document

snapshot management system that is capable of dynamically creating web document snapshots.

• Future research:– Designing a web document snapshots

management site.