Team BC 2017 Politics - Archives Unleashed

What we learned

1. Working with web archives is highly dependent on the quality of the data and on the context and constraints under which it was collected.

2. The framework is cool, but requires programming skillz (command-line comfort), which means more digital training opportunities are needed for humanities scholars.

3. You have to know how to create your own datasets or get ahold of WARC files. MOU Economy? How do we surface WARC files for this type of work?

4. Need to link efforts among Canadian institutions and build a community of web-archiving people. Archives Unleashed can be a hybrid community that brings the Librarian/Archivist folks together with the Historian folks.

Problems

1. Crawler trap baggage (West Point Grey):

510255 URLs

510242 were 404s.

Of the 13 that were “good”, only 7 were real information; the other 6 were redirects.
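
A tally like the one above can be produced with a few lines of Python. This is only a minimal sketch, assuming the crawl has already been reduced to (URL, HTTP status) pairs; the function name `summarize_statuses` and the placeholder URLs are hypothetical and are not the team's actual workflow. The synthetic records simply reuse the counts reported on this slide.

```python
def summarize_statuses(records):
    """Tally (url, status) pairs into 404s, redirects (3xx), real pages (2xx), other."""
    counts = {"total": 0, "not_found": 0, "redirect": 0, "ok": 0, "other": 0}
    for _url, status in records:
        counts["total"] += 1
        if status == 404:
            counts["not_found"] += 1
        elif 300 <= status < 400:
            counts["redirect"] += 1
        elif 200 <= status < 300:
            counts["ok"] += 1
        else:
            counts["other"] += 1
    return counts


# Synthetic records built from the counts on the slide (placeholder URLs,
# not the real crawl): 510242 404s, 7 real pages, 6 redirects.
records = (
    [(f"http://example.org/trap/{i}", 404) for i in range(510242)]
    + [(f"http://example.org/page/{i}", 200) for i in range(7)]
    + [(f"http://example.org/moved/{i}", 301) for i in range(6)]
)

summary = summarize_statuses(records)
# Share of captured URLs that were neither 404s nor anything else unusable.
good_share = (summary["ok"] + summary["redirect"]) / summary["total"]
print(summary, f"{good_share:.6%}")
```

Running a check like this before any analysis gives an early warning that a collection is dominated by crawler-trap URLs.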

2. Topic modeling: noise in the data. Special characters (null characters, encoding).
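
One way to reduce this kind of noise before topic modeling is to strip null and other control characters from the extracted text. The following is a minimal sketch using only the Python standard library; the function name `clean_text` is hypothetical, and the choice to drop every Unicode "C" (control/format/unassigned) category character is an assumption, not something the team reported doing.

```python
import unicodedata


def clean_text(text):
    """Remove null bytes, replacement characters, and control characters,
    then collapse runs of whitespace into single spaces."""
    # Null characters and U+FFFD (the marker left by failed decoding).
    text = text.replace("\x00", "").replace("\ufffd", "")
    # Normalize compatibility forms so visually identical tokens match.
    text = unicodedata.normalize("NFKC", text)
    # Keep newlines and tabs for now; drop every other "C" category character.
    kept = [
        ch for ch in text
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    ]
    # Collapse all whitespace (including the kept newlines/tabs) to one space.
    return " ".join("".join(kept).split())


sample = "BC\x00 Liberals\x07  2017\n"
cleaned = clean_text(sample)
print(repr(cleaned))
```

Cleaning at this stage keeps stray bytes from showing up as spurious tokens in the resulting topics.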

With Ian and future historians in mind… where's the measure of completeness, and how do you build that into the web archive?

Historian | Archivist | Librarian | Systems |