Not All Mementos are Created Equal: Measuring the Impact of Missing Resources

Justin F. Brunelle, Mat Kelly, Hany SalahEldeen, Michele C. Weigle, Michael L. Nelson

Old Dominion University

{jbrunelle, mkelly, hany, mweigle, mln}@cs.odu.edu

Goal: Automatically measure the quality of the archives

Example mementos shown: 20% missing, 14% missing, 28% missing, 7% missing

“Live” XKCD

• Missing 17% of embedded resources

• Looks complete

“Live” XKCD

• Take three resources:

  • Logo

  • Main Comic

  • Navigation Strip

• Relative importance?

• All present in “Live” XKCD

Damaging XKCD

• Created a local memento

• Removed the logo and navigation strip

• Now missing 29% of embedded resources

• Human assessment: looks OK

Damaging XKCD

• From our local memento

• Removed the Main Comic

• Human assessment: Not a usable memento

• Percent of missing embedded resources is not a suitable metric for memento quality

Image Importance

• Size (as percentage of all pixels)

• Position (in viewport?)

• Centrality (in the vertical or horizontal center?)
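The three signals above can be sketched in code. This is only an illustration under stated assumptions: the `Box` type, the function name, and the exact tests for "in viewport" and "central" are hypothetical, and the slides do not say how the signals are weighted, so none are combined here.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Rendered bounding box of an image in page pixels (hypothetical type)."""
    x: float
    y: float
    width: float
    height: float


def image_importance_features(img, page_width, page_height, viewport_height):
    """Return the three signals named on the slides; no weights are applied."""
    # Size: share of all page pixels covered by the image.
    size_fraction = (img.width * img.height) / (page_width * page_height)
    # Position: assumed here to mean the image starts above the fold.
    in_viewport = img.y < viewport_height
    # Centrality: assumed here to mean the image spans the midpoint of the
    # page width (horizontal) or the midpoint of the viewport height (vertical).
    horizontally_centered = img.x <= page_width / 2 <= img.x + img.width
    vertically_centered = img.y <= viewport_height / 2 <= img.y + img.height
    return {
        "size_fraction": size_fraction,
        "in_viewport": in_viewport,
        "horizontally_centered": horizontally_centered,
        "vertically_centered": vertically_centered,
    }


# Illustrative values only: a large image near the top of a 1000x2000 page.
features = image_importance_features(Box(200, 100, 600, 400), 1000, 2000, 800)
```

A large, centered, above-the-fold image such as the main comic would score high on all three signals, while a small footer image would score low on all of them.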

Missing CSS

• Damage not limited to images

• When missing CSS, content shifts left

Missing CSS

• Partitioned snapshot into thirds

• Background color determined

• Pixel-by-pixel comparison

Missing CSS

• Calculated the amount of non-background content in each vertical third

• If >=80% of the content is in the left third and stylesheets are missing, the CSS is considered important

• Only performed if stylesheets are missing
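The stylesheet check described above might look roughly like the following. It is a minimal sketch, not the authors' implementation: the snapshot is assumed to be a list of pixel rows, the background is assumed to be the most frequent color, and all function names are hypothetical.

```python
from collections import Counter


def dominant_color(pixels):
    """Assumption: the background is the most frequent color in the snapshot."""
    counts = Counter(color for row in pixels for color in row)
    return counts.most_common(1)[0][0]


def content_share_by_third(pixels, background):
    """Fraction of non-background pixels in the left, middle, and right thirds."""
    width = len(pixels[0])
    counts = [0, 0, 0]
    for row in pixels:
        for x, color in enumerate(row):
            if color != background:
                # Map the column index to third 0, 1, or 2.
                counts[min(3 * x // width, 2)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [c / total for c in counts]


def css_is_important(pixels, stylesheets_missing, threshold=0.80):
    """Apply the >=80%-in-left-third rule, only when stylesheets are missing."""
    if not stylesheets_missing:
        return False
    background = dominant_color(pixels)
    return content_share_by_third(pixels, background)[0] >= threshold


# Toy 9-pixel-wide snapshot: 0 is background, 1 is content shifted left.
shifted_left = [[1, 1, 1, 0, 0, 0, 0, 0, 0]] * 2
print(css_is_important(shifted_left, stylesheets_missing=True))
```

Real snapshots would need a tolerance when comparing colors; exact equality is used here only to keep the sketch short.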

Percent Missing vs. Weighted Damage

• $M_m$ = Percent of embedded resources missing

$M_m = \frac{\text{Embedded Resources Missing}}{\text{Total Embedded Resources}}$

• $D_m$ = Damage rating of missing embedded resources

$D_m = \frac{D_{m,\mathrm{Actual}}}{D_{m,\mathrm{Potential}}}$

$D_{m,\mathrm{Potential}} = \frac{\sum_{i=1}^{n_{[I|MM]}} D_{[I|MM]}(i)}{n_{[I|MM]}} + \frac{\sum_{i=1}^{n_{[C]}} D_{[C]}(i)}{n_{[C]}}$

$I$ = Image, $MM$ = Multimedia, $C$ = CSS
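The two metrics can be compared on a toy page with a short sketch. The `Resource` type, the per-resource damage values, and the function names are hypothetical. The slides define only the potential damage explicitly; the sketch assumes the actual damage uses the same two terms restricted to the missing resources.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    kind: str      # "image", "multimedia", or "css"
    damage: float  # per-resource damage D(i); values below are illustrative
    missing: bool


def percent_missing(resources):
    """M_m: missing embedded resources over total embedded resources."""
    if not resources:
        return 0.0
    return sum(r.missing for r in resources) / len(resources)


def _term(resources, kinds, only_missing):
    """One summand of the formula: sum of D(i) over the group, divided by n."""
    group = [r for r in resources if r.kind in kinds]
    if not group:
        return 0.0
    total = sum(r.damage for r in group if r.missing or not only_missing)
    return total / len(group)


def damage_rating(resources):
    """D_m = actual / potential (actual is assumed to count missing items only)."""
    media, css = ("image", "multimedia"), ("css",)
    potential = _term(resources, media, False) + _term(resources, css, False)
    actual = _term(resources, media, True) + _term(resources, css, True)
    return actual / potential if potential else 0.0


# Toy page: two small images missing, one large image and the CSS present.
page = [
    Resource("image", 0.1, True),
    Resource("image", 0.8, False),
    Resource("image", 0.1, True),
    Resource("css", 0.5, False),
]
print(percent_missing(page), damage_rating(page))
```

With these made-up values, half of the resources are missing but the damage rating stays low, because the missing images carry little weight. Flipping which image is missing raises the damage rating while lowering the percent missing, which is the contrast the slides draw.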

Calculated Damage

• $M_m$ = Percent of embedded resources missing

• $D_m$ = Damage rating of missing embedded resources

$M_m = 0.29$, $D_m = 0.36$

$M_m = 0.24$, $D_m = 0.41$

What do Web users think?

Setting up the Turk Test

• Amazon Mechanical Turk workers ("turkers") represent real web users

• Two legs of the experiment:

  • Manually damaged memento vs. live resource

    • 10 manually damaged mementos and resources

  • Real memento vs. real memento

    • 100 URI-Rs, one memento per year

Quantifying Turker Response

• 5 turkers for each comparison

• Assume $D_A < D_B$ (i.e., A is less damaged)

• Measure turker agreement: defined only by 4-1 and 5-0 splits

Example outcomes (votes from 5 turkers):

Image A   Image B   Split   Outcome
5         0         5-0     Agreement
4         1         4-1     Agreement
0         5         0-5     No agreement!
3         2         3-2     Split decision: no agreement!
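The agreement rule can be written down as a small function. This is a sketch with hypothetical names; it assumes, as the slides do, that Image A is the less damaged one ($D_A < D_B$) and that only 4-1 and 5-0 splits in favor of A count as agreement.

```python
def classify_split(votes_for_a, votes_for_b, turkers=5):
    """Label one comparison, assuming A is the less damaged image."""
    if votes_for_a + votes_for_b != turkers:
        raise ValueError("every turker must vote for exactly one image")
    split = f"{votes_for_a}-{votes_for_b}"
    # Only 4-1 and 5-0 splits in favor of A count as agreement.
    if votes_for_a >= turkers - 1:
        return split, "agreement"
    return split, "no agreement"


for a, b in [(5, 0), (4, 1), (0, 5), (3, 2)]:
    print(classify_split(a, b))
```

Under this rule a 0-5 split counts as no agreement even though the turkers are unanimous, because they chose the image the metric rated as more damaged.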

Turk Results

• Compared damage ($D_m$) and percent missing ($M_m$)

  • M0: Manually damaged mementos

  • D: Internet Archive mementos

  • M: Percent missing in Internet Archive mementos

• $D_m$ vs. Live: 78.9% true positives

• $M_m$ vs. Live: 47.2% true positives

  • Worse than a 50/50 chance!

• $D_m$ vs. $D_m$: 58.4% true positives

Damage in the Internet Archive

• 1,000 URI-Rs from Bitly

• 1,000 URI-Rs from Archive-it

• Remove non-HTML representations

• 1,861 URI-Rs remaining

• Sample 1 memento per year from Internet Archive

• Measure damage
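The per-year sampling step could be sketched as follows. The slides do not say which memento is chosen within a year, so taking the earliest capture is an assumption, and the function name is hypothetical.

```python
from datetime import datetime


def one_memento_per_year(memento_datetimes):
    """Keep a single capture per calendar year (assumed: the earliest one)."""
    chosen = {}
    for captured in sorted(memento_datetimes):
        # setdefault keeps the first (earliest) datetime seen for each year.
        chosen.setdefault(captured.year, captured)
    return [chosen[year] for year in sorted(chosen)]


# Illustrative capture times for one URI-R.
captures = [
    datetime(2011, 6, 1),
    datetime(2010, 3, 15),
    datetime(2011, 1, 20),
    datetime(2013, 9, 9),
]
sample = one_memento_per_year(captures)
```

Each sampled memento would then be passed to the damage measurement, giving one damage value per URI-R per year.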

Damage in the Internet Archive

• Measured Internet Archive mementos

• Damage generally decreases (quality improves) over time

• Despite missing more resources over time

Conclusions

• $D_m$ is a better measure of memento quality than $M_m$

• On average, the Internet Archive is improving its quality over time

• Internet Archive is also missing more embedded resources over time

• Improved damage weighting (58.4% correct can be improved)

• Measure cumulative temporal damage ratings

  • E.g., a logo that never changes for 10 years and is used by 100 mementos is more important than one used in a single memento.
