NetarchiveSuite Meeting, BnF, 24./25.11.2011 1

Curator Track

Web@rchive Austria

Michaela Mayr

Austrian National Library
webarchiv@onb.ac.at
www.onb.ac.at

Selecting Websites with External Partners (1)

Selection @ ANL
• No team of curators
• Selection usually by the WA team
• Experience: very difficult to involve subject librarians
• Austrian Literature collection
  – List of URLs submitted by the Literature Archive
  – No tool used
  – Monthly crawl
  – Exchange with external literature WA project
• (IIPC projects: Nomination tool UNT)

Selecting Websites with External Partners (2)

Ideas for the Future…
• Universities, students, researchers
• Use of bookmarking tools, e.g. Diigo, Delicious etc.
• Public invitation for nomination via social media

Crowdsourcing of Selection?

Selecting Websites with External Partners (3)

Questions & Discussion
• Why?
• What partners are you working with?
• What selection tools are you using? Social media?
• Special topics?
• Are nominations suggestions or binding?
• Selectors' involvement in QA?

Metrics (1)

• Dynamically generated from the data warehouse
• Reports:
  – Storage distribution
  – Daily use of storage
  – Storage per harvest definition (total)
  – Storage per harvest definition (daily)
  – Storage and objects (per year)
  – Storage and objects (monthly)
  – Storage and objects (daily)
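The slides do not say how these reports are computed. As a minimal sketch, assuming each warehouse record carries a harvest definition name, a crawl date, stored bytes and an object count (a hypothetical schema, not the actual ONB warehouse layout), several of the listed reports reduce to simple group-by aggregations:

```python
from collections import defaultdict
from datetime import date

# Hypothetical record shape: (harvest_definition, crawl_date, stored_bytes, object_count).
# The real data warehouse schema is not described in the slides.


def storage_per_definition_total(records):
    """Total stored bytes per harvest definition."""
    totals = defaultdict(int)
    for hd, _day, size, _objs in records:
        totals[hd] += size
    return dict(totals)


def storage_per_definition_daily(records):
    """Stored bytes per (harvest definition, day) pair."""
    daily = defaultdict(int)
    for hd, day, size, _objs in records:
        daily[(hd, day)] += size
    return dict(daily)


def storage_and_objects(records, granularity):
    """Sum of bytes and objects per year, month or day."""
    keyfuncs = {
        "year": lambda d: (d.year,),
        "month": lambda d: (d.year, d.month),
        "day": lambda d: (d.year, d.month, d.day),
    }
    key = keyfuncs[granularity]
    buckets = defaultdict(lambda: [0, 0])  # [bytes, objects]
    for _hd, day, size, objs in records:
        bucket = buckets[key(day)]
        bucket[0] += size
        bucket[1] += objs
    return {k: tuple(v) for k, v in buckets.items()}


# Hypothetical sample data; definition names and figures are illustrative only.
records = [
    ("news_daily", date(2011, 4, 1), 500, 10),
    ("news_daily", date(2011, 4, 2), 700, 12),
    ("domain_crawl_2010", date(2010, 3, 1), 10_000, 200),
]

print(storage_per_definition_total(records))
print(storage_and_objects(records, "year"))
```

In a real warehouse the same grouping would more likely be expressed as SQL `GROUP BY` queries; the in-memory version only illustrates the shape of each report.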

Metrics (2)

Domain Crawl 2009/2010

• Approx. 900,000 domains
• Physical storage: approx. 6 TB (originally approx. 8.5 TB; compressed and deduplicated)
• Approx. 386 million objects
• Findings on .at websites:
  – 14% (115,000) are > 10 MB
  – 71% (580,000) are < 1 MB
  – 10% (90,000) contain 0 objects
  – 53% (470,000) contain < 10 objects
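As a rough worked check of the storage figures above, treating the rounded "approx." values as exact, compression and deduplication together reduced the stored volume by about 29%:

```latex
\frac{8.5\,\text{TB} - 6\,\text{TB}}{8.5\,\text{TB}} = \frac{2.5}{8.5} \approx 0.29
```

The slides do not break this saving down between compression and deduplication.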

Rich and Social Media (1)

• No special harvest definitions

Rich and Social Media (2)

News and Media Harvesting (1)

News and Media Harvesting (2)

• Started April 2011
• 23 websites
• Weekly, daily, hourly
• QA tool
• 310 GB

Further Information: http://webarchiv.onb.ac.at

Social Media:
http://twitter.com/AT_Webarchive
http://www.facebook.com/ATWebarchive
http://www.slideshare.net/ATWebarchive
http://screenr.com/user/AT_Webarchive

Questions?
