User Access Patterns in Web Archives


Robot sessions outnumber human sessions 10:1 in the Internet Archive

Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson

{yasmin, mweigle, mln}@cs.odu.edu

How do Users access Web Archives?

Although user patterns in the live web are well understood, there has been no corresponding study of how users, both humans and robots, access web archives.

Abstract Models for Accessing Web Archives

Methodology

Data Set: a random sample of 2M requests from the Internet Archive’s Wayback Machine access logs of Feb. 2, 2012.

Robots vs Humans

User     Raw Requests        Filtered Requests   Sessions          MBs Transferred
Robots   1,002,573 (50.1%)   396,627 (93.0%)     34,203 (90.9%)    20,010
Humans   810,049 (40.5%)     29,690 (7.0%)       3,431 (9.1%)      4,459
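The percentages in the table can be checked against the counts. The sketch below assumes that the filtered-request and session percentages are each taken relative to the robot-plus-human total for that column; the raw-request percentages do not sum to 100%, so they appear to be relative to the full 2M-request sample instead. The variable and function names are illustrative and do not come from the poster.

```python
# Counts copied from the "Robots vs Humans" table: (robots, humans).
FILTERED_REQUESTS = (396_627, 29_690)
SESSIONS = (34_203, 3_431)


def shares(robots: int, humans: int) -> tuple[float, float]:
    """Return (robot %, human %) of the combined robot + human total."""
    total = robots + humans
    return 100 * robots / total, 100 * humans / total


filtered_shares = shares(*FILTERED_REQUESTS)
session_shares = shares(*SESSIONS)

print(f"Filtered requests: {filtered_shares[0]:.1f}% robots, {filtered_shares[1]:.1f}% humans")
print(f"Sessions:          {session_shares[0]:.1f}% robots, {session_shares[1]:.1f}% humans")
```

Under that assumption the computed shares round to the 93.0% / 7.0% and 90.9% / 9.1% figures reported in the table.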

Results

[Figure: percentage of sessions per access pattern. X-axis: Dip, Dive, Slide & Dive, Skim, Slide. Y-axis: Percentage. Panels: Robots, Humans. Legend: TimeMap, Memento.]

Robots and humans exhibit different access patterns.

Conclusion

• Robots outnumber humans 10:1 in terms of sessions, 5:4 in terms of raw HTTP accesses, and 4:1 in terms of MB transferred.

• Robots mainly exhibit the Dip and Skim patterns, with about 49% of their sessions for each pattern, and they access TimeMaps almost exclusively.

• Humans exhibit the Dip pattern in 39% of their sessions and the Dive pattern in 30%. Unlike robots, humans mainly access archived pages rather than TimeMaps.
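The robot-to-human ratios quoted in the conclusion follow from the counts in the "Robots vs Humans" table. The sketch below recomputes them as a plain division of the robot count by the human count; the dictionary keys and helper name are illustrative, not taken from the poster.

```python
# Counts copied from the "Robots vs Humans" table: (robots, humans).
TABLE = {
    "raw_requests": (1_002_573, 810_049),
    "sessions": (34_203, 3_431),
    "mb_transferred": (20_010, 4_459),
}


def robot_to_human_ratio(robots: int, humans: int) -> float:
    """Return how many robot units there are per human unit."""
    return robots / humans


ratios = {name: robot_to_human_ratio(r, h) for name, (r, h) in TABLE.items()}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}:1")
```

The session ratio comes out just under 10:1 and the raw-request ratio close to 5:4 (1.25:1). The MB-transferred ratio computes to roughly 4.5:1, which the poster reports as 4:1.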

References

1. Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson. Access Patterns for Robots and Humans in Web Archives. IEEE/ACM Joint Conference on Digital Libraries, 2013.