Not All Mementos Are Created Equal: Measuring the Impact of Missing Resources


Slides presented by Justin F. Brunelle at Digital Preservation 2014 in London.


Not All Mementos are Created Equal: Measuring the Impact of Missing Resources

Justin F. Brunelle, Mat Kelly, Hany SalahEldeen, Michele C. Weigle, Michael L. Nelson

Old Dominion University

{jbrunelle, mkelly, hany, mweigle, mln}@cs.odu.edu

Goal: Automatically measure the quality of the archives

• 20% missing

• 14% missing

• 28% missing

• 7% missing

“Live” XKCD

• Missing 17% of embedded resources

• Looks complete

• Take three resources:

  • Logo

  • Main Comic

  • Navigation Strip

• Relative importance?

• All present in “Live” XKCD

Damaging XKCD

• Created a local memento

• Removed the logo and navigation strip

• Now missing 29% of embedded resources

• Human assessment: looks OK


Damaging XKCD

• From our local memento

• Removed the Main Comic

• Now missing 24% of embedded resources

• Human assessment: Not a usable memento


• The percentage of missing embedded resources is not a suitable metric for memento quality

Image Importance

• Size (as a percentage of all pixels)

• Position (in the viewport?)

• Centrality (in the vertical or horizontal center?)
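The three image-importance factors can be combined into a single score in many ways. Below is a minimal sketch, not the authors' weighting. It assumes a hypothetical `ImageBox` bounding box for each rendered image and measures the factors as follows:

• Size: the share of all page pixels the image covers.

• Position: the fraction of the image's height that lies inside the initial viewport.

• Centrality: whether the image crosses the page's vertical or horizontal center line.

It then averages the three with equal weights, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class ImageBox:
    """Hypothetical bounding box of an embedded image in the rendered page, in pixels."""
    x: int
    y: int
    width: int
    height: int


def image_importance(img: ImageBox, page_w: int, page_h: int, viewport_h: int) -> float:
    """Illustrative importance score from size, position, and centrality.

    The equal-weight average is an assumption, not the formula from the slides.
    """
    # Size: share of all page pixels covered by the image.
    size = (img.width * img.height) / (page_w * page_h)

    # Position: fraction of the image's height inside the initial viewport.
    visible = max(0, min(img.y + img.height, viewport_h) - max(img.y, 0))
    position = visible / img.height if img.height else 0.0

    # Centrality: 1.0 if the image crosses the vertical or horizontal center line.
    cx, cy = page_w / 2, page_h / 2
    crosses_vertical = img.x <= cx <= img.x + img.width
    crosses_horizontal = img.y <= cy <= img.y + img.height
    centrality = 1.0 if (crosses_vertical or crosses_horizontal) else 0.0

    return (size + position + centrality) / 3
```

On an 800×1200 page with a 600-pixel viewport, a large centered comic scores well above a small corner logo. This matches the human assessment that losing the comic matters more.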

Missing CSS

• Damage is not limited to images

• When CSS is missing, content shifts to the left

• Partitioned the snapshot into vertical thirds

• Determined the background color

• Pixel-by-pixel comparison

• Calculated the amount of content in each vertical third

• If ≥ 80% of the content is in the left third and CSS is missing, the CSS is important

• Only performed if stylesheets are missing
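The missing-CSS check can be sketched on a screenshot represented as a grid of pixel colors. Three parts of this sketch are assumptions for illustration:

• The function name `css_is_important`.

• The rule that the background is the most common color.

• The list-of-rows grid representation.

The vertical thirds, the 80% threshold, and the missing-stylesheet precondition come from the slides.

```python
from collections import Counter


def css_is_important(pixels, stylesheets_missing, threshold=0.8):
    """Sketch of the missing-CSS heuristic.

    `pixels` is a list of rows; each row is a list of hashable colors (e.g. RGB tuples).
    """
    # Only performed if stylesheets are missing.
    if not stylesheets_missing:
        return False

    # Background color: assumed to be the most common color in the snapshot.
    background, _ = Counter(c for row in pixels for c in row).most_common(1)[0]

    # Pixel-by-pixel: count non-background ("content") pixels per vertical third.
    width = len(pixels[0])
    thirds = [0, 0, 0]
    for row in pixels:
        for x, color in enumerate(row):
            if color != background:
                thirds[min(3 * x // width, 2)] += 1

    total = sum(thirds)
    if total == 0:
        return False
    # Content crowded into the left third suggests the layout collapsed.
    return thirds[0] / total >= threshold
```

Only the left third's share is compared against the threshold. This follows the observation that content shifts left when CSS is missing.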

Percent Missing vs. Weighted Damage

• $M_M$ = Percent of embedded resources missing

$$M_M = \frac{\text{Embedded Resources Missing}}{\text{Total Embedded Resources}}$$

• $D_M$ = Damage rating of missing embedded resources

$$D_M = \frac{D_{M,\text{Actual}}}{D_{M,\text{Potential}}}$$

$$D_{M,\text{Potential}} = \frac{\sum_{i=1}^{n_{[I|MM]}} D_{[I|MM]}(i)}{n_{[I|MM]}} + \frac{\sum_{i=1}^{n_{[C]}} D_{[C]}(i)}{n_{[C]}}$$

$I$ = Image

$MM$ = MultiMedia

$C$ = CSS

Calculated Damage


• Logo and navigation strip removed: $M_M = 0.29$, $D_M = 0.36$

• Main comic removed: $M_M = 0.24$, $D_M = 0.41$
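The $M_M$ and $D_M$ definitions can be sketched in code to show how the two metrics can rank the same pair of mementos in opposite orders. The sketch makes three assumptions:

• The per-resource damage values $D(i)$ are hypothetical inputs.

• Images and multimedia are pooled into one group, following the $[I|MM]$ subscript.

• $D_{M,\text{Actual}}$, which the slides do not spell out, uses the same per-group averages as $D_{M,\text{Potential}}$ but sums only over the missing resources.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    kind: str       # "image", "multimedia", or "css"
    damage: float   # per-resource damage D(i); hypothetical, assumed precomputed
    missing: bool


def percent_missing(resources):
    """M_M: embedded resources missing / total embedded resources."""
    if not resources:
        return 0.0
    return sum(r.missing for r in resources) / len(resources)


def _group_term(resources, kinds, only_missing):
    """Sum of D(i) over a group (optionally only missing ones), divided by the group size n."""
    group = [r for r in resources if r.kind in kinds]
    if not group:
        return 0.0  # assumption: an empty group contributes nothing
    total = sum(r.damage for r in group if r.missing or not only_missing)
    return total / len(group)


def damage(resources):
    """D_M = D_M,Actual / D_M,Potential (D_M,Actual is an assumed definition)."""
    media, css = {"image", "multimedia"}, {"css"}
    potential = _group_term(resources, media, False) + _group_term(resources, css, False)
    actual = _group_term(resources, media, True) + _group_term(resources, css, True)
    return actual / potential if potential else 0.0
```

Consider hypothetical weights: 0.7 for a main comic, 0.1 each for a logo and navigation strip, 0.5 for one stylesheet. Under those weights, losing the logo and strip gives $M_M = 0.5$ but a lower $D_M$ than losing only the comic, which gives $M_M = 0.25$. This is the same reversal as in the XKCD example.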

What do Web users think?

Setting up the Turk Test

• Amazon Mechanical Turk workers ("turkers") represent real web users

• Two legs of the experiment:

  • Manually damaged memento vs. live resource

    • 10 manually damaged mementos and resources

  • Real memento vs. real memento

    • 100 URI-Rs, one memento per year

Quantifying Turker Response

• 5 turkers for each comparison

• Assume $D_A < D_B$ (i.e., A is less damaged)

• Measure turker agreement, defined only by 4-1 and 5-0 splits:

  • 5-0 (5 votes for Image A, 0 for Image B): agreement

  • 4-1 (4 votes for Image A, 1 for Image B): agreement

  • 0-5 (0 votes for Image A, 5 for Image B): no agreement

  • 3-2 (3 votes for Image A, 2 for Image B): split decision, no agreement
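The agreement rule can be sketched as a small classifier. This sketch assumes the following:

• Each comparison is recorded as a hypothetical pair (votes for Image A, votes for Image B).

• Image A is the memento the metric rates as less damaged.

• The true-positive rate is the share of all comparisons that reach agreement. The slides do not state the denominator.

```python
def classify_split(votes_a: int, votes_b: int) -> str:
    """Classify a 5-turker comparison in which the metric says D_A < D_B."""
    if votes_a + votes_b != 5:
        raise ValueError("expected exactly 5 votes")
    # Agreement is defined only by 5-0 and 4-1 splits toward A;
    # 3-2 and 2-3 are split decisions, and 1-4 or 0-5 go against the metric.
    return "agreement" if votes_a >= 4 else "no agreement"


def true_positive_rate(splits):
    """Share of comparisons whose turkers agreed with the metric (assumed denominator: all)."""
    if not splits:
        return 0.0
    agreed = sum(classify_split(a, b) == "agreement" for a, b in splits)
    return agreed / len(splits)
```

Under this rule, the four example splits on the slide (5-0, 4-1, 0-5, 3-2) yield a rate of 0.5.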

Turk Results

• Compared damage ($D_M$) and percent missing ($M_M$)

  • M0: Manually damaged mementos

  • D: Internet Archive Mementos

  • M: Percent missing in Internet Archive Mementos

• $D_M$ vs. live: 78.9% true positives

• $M_M$ vs. live: 47.2% true positives

  • Worse than a 50/50 chance!

• $D_M$ vs. $D_M$: 58.4% true positives

Damage in the Internet Archive

• 1,000 URI-Rs from Bitly

• 1,000 URI-Rs from Archive-It

• Remove non-HTML representations

• 1,861 URI-Rs remaining

• Sample one memento per year from the Internet Archive

• Measure damage
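Sampling one memento per year can be sketched as follows. The sketch assumes each URI-R's TimeMap has already been parsed into (URI-M, Memento-Datetime) pairs. Keeping the earliest memento in each calendar year is also an assumption, since the slides do not say which memento per year was chosen. The function name is hypothetical.

```python
from datetime import datetime


def one_memento_per_year(mementos):
    """Keep one memento per calendar year (the earliest, by assumption).

    `mementos` is an iterable of (uri_m, datetime) pairs for a single URI-R.
    Returns the chosen pairs ordered by year.
    """
    chosen = {}
    # Visit mementos in chronological order so the first seen per year is the earliest.
    for uri_m, dt in sorted(mementos, key=lambda m: m[1]):
        chosen.setdefault(dt.year, (uri_m, dt))
    return [chosen[year] for year in sorted(chosen)]


if __name__ == "__main__":
    timemap = [("m3", datetime(2006, 5, 1)), ("m1", datetime(2005, 3, 1)),
               ("m2", datetime(2005, 9, 1))]
    print([uri for uri, _ in one_memento_per_year(timemap)])  # ['m1', 'm3']
```

Damage would then be measured for each selected memento and compared across years.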

• Measured Internet Archive mementos

• Damage generally decreases (quality improves) over time

• Despite missing more embedded resources over time

Conclusions

• $D_M$ is a better measure of memento quality than $M_M$

  • On average, the Internet Archive is improving its quality over time

  • The Internet Archive is also missing more embedded resources over time

• Improved damage weighting (the 58.4% correct rate can be improved)

• Measure cumulative temporal damage ratings

  • E.g., a logo that never changes for 10 years and is used by 100 mementos is more important than one used in a single memento.