Introduction to British Library digital resources for social scientists

Presentation to NCRM event DIGITAL METHODS AS MAINSTREAM METHODOLOGY

Welcome and introduction to British Library digital resources for social scientists

John Kaye – Lead Curator Digital Social Science

Peter Webster - Web Archiving Engagement and Liaison Manager

7th December 2012

www.slideshare.net/johnkayebl

What kind of library are we?

“We exist for everyone who wants to do research – for academic, personal or commercial purposes”

Our collections cover all known subject areas: sciences, technology, medicine, arts & humanities, social sciences…

We have a copy of every item published in the UK

Our collections cover all formats: sound, images, video, newspapers, maps, manuscripts, databases, books and journals, and much more…

News, newspapers and magazines

News and current events

Broadcast news, recordings from May 2010

Political change in Middle East

Olympic Games

Occupy movement

Images and photographs

Images online

Online gallery

Photographically illustrated books

Online Services

Social Science online resources for researchers

ESRC online resource

Management and Business Studies Portal

Social Welfare Portal

www.bl.uk/oralhistory

Social Science blog

Management and Business Studies portal

http://britishlibrary.typepad.co.uk/socialscience/

Oral history at a glance

www.bl.uk/oralhistory

370 collections, ranging from 1 tape to 5,500 (Millennium Memory Bank)

100-150 hours of new digital fieldwork recordings per month

2,200 catalogue records added or updated per year

4,000 public enquiries per year

40 talks and lectures per year

60 training sessions per year with OHS (500+ people)

Guides and support

Reference services: reading room, telephone, email

Help for Researchers web pages

Collection guides, eg for government publications: http://www.bl.uk/reshelp/findhelprestype/offpubs/guides/govtguides.html

Topical bibliographies, eg Globalisation and employment, Gang culture and knife crime, Corporate Social Responsibility, Far Right in Britain …

Welfare Reform on the Web

Exhibitions and events

www.bl.uk/whatson

Doctoral Open Days 2013

11 February – Social Sciences

18 February – Media, Cultural Studies and Journalism

http://www.bl.uk/whatson/events/docopendays/index.html

Web archives and digital method

Dr Peter Webster

Web Archiving Engagement and Liaison Officer

@UKWebArchive / @pj_webster

Peter.Webster@bl.uk

http://www.webarchive.org.uk

December 7th 2012

The lost web: people

[votedavidcameron.com (archived 24/5/05)]

The lost web: people

[robincook.org.uk (archived 8/8/05)]

The lost web: organisations

[tvpa.police.uk (archived 21/11/12)]

The lost web: organisations

[woolworthsgroupplc.com (archived 12/12/08)]

Our mission:

Collect, preserve, and make accessible web sites of cultural and scholarly importance from the UK domain

UK Web Archive http://www.webarchive.org.uk

Selective web archive: over 11,000 websites collected since 2004

Over 50,000 instances

Over 16TB of compressed data

British Library, National Library of Wales, JISC

Also National Library of Scotland, the National Archives, Wellcome Library

Many collaborators

eg Women’s Library, Live Arts Development Agency, Quakers in Britain

A typical event-based special collection

The orphaned web

A comprehensive special collection

Web archiving: the basics

What: Selecting, capturing, storing, preserving and managing access to snapshots of websites over time

How:

- Use crawler software to download websites automatically

- Selective or domain archiving

- Provide access in a Web Archive

When: Since mid 1990s

Who:

- Heritage and memory organisations, eg BL, The National Archives

- University libraries

- Not-for-profit and commercial organisations, eg Internet Archive

- Individual researchers

Why:

- Global information resource

- Artefact of cultural and technology change

- Representative sample of the web: historical and sociological data that may not be found elsewhere

- Part of national digital heritage - legal requirements
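The "How" step, using crawler software to download websites automatically, rests on one repeated operation: read a captured page, pull out its links, and queue those that belong to the site being archived. The following is a minimal sketch of that single step only, not the software the UK Web Archive runs. The names `LinkCollector` and `links_to_follow`, the sample page and the example host are all assumptions made for illustration, and no network access is involved.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_follow(page_url, html, seen):
    """Return same-host links found in `html` that are not already in `seen`."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for href in collector.hrefs:
        # Resolve relative links and drop the #fragment part.
        absolute = urljoin(page_url, href).split("#")[0]
        same_host = urlparse(absolute).netloc == host
        if same_host and absolute not in seen and absolute not in found:
            found.append(absolute)
    return found


# Hypothetical page content, standing in for a downloaded response body.
sample = (
    '<a href="/about">About</a> '
    '<a href="http://other.example/">Other</a> '
    '<a href="/about#team">Team</a>'
)
frontier = links_to_follow(
    "http://www.example.org.uk/", sample, seen={"http://www.example.org.uk/"}
)
print(frontier)
```

A real crawler would add politeness delays, robots.txt handling, capture of embedded resources and storage of each response, all of which are left out here.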

Selective versus domain archiving

Two complementary approaches: selective and domain archiving

[Diagram axes: Width, Depth]

Selective archiving:

- More frequent gathers; manual QA

- Guided by collection policy

- Can be based on events or themes e.g. credit crunch

-- manual & expensive

Domain harvesting:

- Typically once/twice a year

- Domain wide snapshot

- Supported by national legislative framework

-- automated & cost-effective

Non-print Legal Deposit 2013: what will we collect?

A deposit library is entitled to copy UK publications from the open web.

A deposit library is entitled to collect other password-protected material by harvesting, subject to giving at least 1 month’s written notice for the publisher to provide a password or access credentials.

What will we be collecting?

Includes resources:

• that are issued from a .uk or other UK geographic top-level domain, or

• where part of the publishing process takes place in the UK;

• but excluding any which are only accessible to audiences outside the UK.
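The three criteria above read naturally as a decision rule, so a short sketch may help show how they combine. This is an illustration of the slide's wording only, not the logic of any British Library system and not a statement of the regulations. The function name `in_scope`, its parameters and the `UK_TLDS` tuple are assumptions; in particular, the slide does not list which other UK geographic top-level domains count, so only `.uk` is included.

```python
from urllib.parse import urlparse

# Assumption: only ".uk" is listed here. Any other UK geographic
# top-level domains would need to be added to this tuple.
UK_TLDS = (".uk",)


def in_scope(url, part_published_in_uk=False, only_accessible_outside_uk=False):
    """Apply the three collecting criteria from the slide to a single URL.

    The two flags cannot be derived from the URL alone, so the caller
    has to supply them from other evidence.
    """
    # Exclusion: material only accessible to audiences outside the UK.
    if only_accessible_outside_uk:
        return False
    host = (urlparse(url).hostname or "").lower()
    issued_from_uk_domain = host.endswith(UK_TLDS)
    return issued_from_uk_domain or part_published_in_uk


print(in_scope("http://www.bl.uk/"))
print(in_scope("http://www.example.com/"))
print(in_scope("http://www.example.com/", part_published_in_uk=True))
```

The domain test is the only part a crawler can check automatically; deciding whether part of the publishing process takes place in the UK needs information from outside the URL.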

What will we NOT be collecting?

Film and recorded sound where the audio-visual content predominates

Private intranets and emails

Personal data on social networking sites, or data that is only available to restricted groups.

What will users be able to do with it?

Users may:

• access deposited material while on “library premises controlled by a deposit library”.

• print one copy of a restricted amount of any deposited material, for non-commercial research or other defined ‘fair dealing’ purposes such as court proceedings, statutory enquiry, criticism and review or journalism.

What will users NOT be able to do with it?

Users may NOT:

• use an item simultaneously with another user;

• make any digital copies, except by specific and explicit licence of the publisher.

A web archiving strategy based on prioritisation

[Diagram labels: Domain Crawl; Event; Event; Event]

Domain harvesting:

• Broad sweep of .uk domain

• Survey and discovery

• Implement Legal Deposit

Events:

• Political, cultural, social and economic events of national interest, eg Olympics 2012

Special Collection:

• Focused, thematic collections

• Support priority subjects

JISC UK Web Domain Dataset (1996-2010)

Funded by JISC to create a research collection of UK websites

Collaboration between the Internet Archive, JISC and the British Library

Copy of subset of the Internet Archive’s web collection that relates to the UK

470,466 files, mostly arc.gz, with 4,494 warc.gz. Total size: 32TB

No local access – possible through the Internet Archive

Can be used to generate secondary datasets and make these available

Analytical access the main route
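For readers who have not met the file formats named above, a warc.gz file is a gzip-compressed sequence of records, each with a plain-text header block followed by the captured content. The sketch below reads the headers and payload of one WARC record using only the Python standard library. It is a simplified illustration under stated assumptions: the record is built in memory with made-up values, the function name `read_warc_headers` is hypothetical, and the older ARC files that make up most of the dataset use a different header layout that this does not handle. Real analysis would more likely use a maintained WARC-reading library.

```python
import gzip
import io


def read_warc_headers(stream):
    """Read the header block of the next WARC record from a binary stream."""
    version = stream.readline().strip()
    if not version.startswith(b"WARC/"):
        raise ValueError("not positioned at the start of a WARC record")
    headers = {}
    while True:
        line = stream.readline().strip()
        if not line:  # a blank line ends the header block
            break
        name, _, value = line.partition(b":")
        headers[name.decode("utf-8").strip()] = value.decode("utf-8").strip()
    return headers


# A made-up record, standing in for one entry in a warc.gz file.
payload = b"hello"
record = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://www.example.org.uk/\r\n"
    b"WARC-Date: 2010-01-01T00:00:00Z\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n" + payload + b"\r\n\r\n"
)

with gzip.open(io.BytesIO(gzip.compress(record)), "rb") as stream:
    headers = read_warc_headers(stream)
    body = stream.read(int(headers["Content-Length"]))

print(headers["WARC-Target-URI"], headers["WARC-Date"], body)
```

Header fields such as the target URI and capture date are what make it possible to derive secondary datasets, for example link lists or per-year counts, without redistributing the captured pages themselves.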

Historical Archive – HTML Version Analysis

N-Gram Search: Prime Ministers

N-Gram Search: Social Media
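The N-gram searches named in these slide titles chart how often a phrase appears in the archive over time. The slides do not describe how the counts are produced, so the following is only a sketch of the general idea: split each document into word sequences of length n, then report the share of those sequences that match the phrase, grouped by year. The function names and the two sample documents are made up for illustration.

```python
from collections import Counter


def ngrams(text, n):
    """Split text on whitespace and return its n-word sequences, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def phrase_share_by_year(docs, phrase):
    """docs is an iterable of (year, text) pairs.

    Returns {year: fraction of that year's n-grams equal to the phrase}.
    """
    n = len(phrase.split())
    target = " ".join(phrase.lower().split())
    hits, totals = Counter(), Counter()
    for year, text in docs:
        grams = ngrams(text, n)
        totals[year] += len(grams)
        hits[year] += grams.count(target)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Made-up documents, standing in for text extracted from archived pages.
docs = [
    (1997, "the prime minister spoke"),
    (2010, "a new prime minister and a new deputy prime minister"),
]
shares = phrase_share_by_year(docs, "prime minister")
print(shares)
```

Reporting a share rather than a raw count matters because the volume of archived text differs greatly from year to year.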

Questions?

John.Kaye@bl.uk

Twitter: @johnkayeBL

Peter.Webster@bl.uk

Twitter: @UKWebArchive / @pj_webster

UK Web Archive: http://www.webarchive.org.uk