Hub Distributed Model 2009

The Archives Hub: Interoperability, Spokes and the Distributed Model

The Hub in a Nutshell

• Based at Mimas, University of Manchester

• In service since 2000

• Over 23,000 collection descriptions

• 170 repositories

• JISC funded

• Management and service team at Manchester

• Development team at Liverpool

• Cheshire software

• Cheshire for Archives – works with EAD descriptions

• Distributed system

Hub Workshop 2009

Content and contributors

• Strategic aim: build and enhance content

• Meeting the needs of the UK research community

• Meeting the needs of the wider community

• Archives for education and research

Flickr cc licence: eileenaway's photostream

The success of the Hub is a reflection of the rich content available from Hub contributors

Current contributors

• Higher/Further Education

• Consortium contributions

• Institutions with a research agenda

• Others on a case-by-case basis

• We encourage institutions to contact us

John Rylands Library, Manchester

Collection or lower-level…?

• Originally funded for collection-level

• Software/searches effective with both

• Complementary approaches

• Researchers ask for detail

Flickr cc licence: Muffet’s photostream

Flickr cc licence: soylentgreen23’s photostream

• Images useful at item level

JISC Information Environment

… a vast and sometimes bewildering range of potential sources of electronic information. Each source of information has its own name, its own interface, features and search facilities. Little wonder, then, that many users remain unaware of their existence or fail to discover their value for their own learning, teaching or research.

A key challenge is therefore to achieve a managed, coherent and shared information environment that will overcome these obstacles.

Being able to cross-search and use customised, value added and other services will considerably simplify users’ interactions with online resources. This should encourage take-up and greatly improve means of accessing these resources.

…these activities need to be based on standards for the creation, access, use, preservation and interoperability of networked resources.

http://www.jisc.ac.uk/index.cfm?name=ie_home

JISC Information Environment

Most content providers will already offer a Web site through which end-users can access their content. To be a part of the JISC-IE, content providers also need to support machine oriented interfaces to their resources.

1. Support searching using Z39.50/SRW

2. Support metadata harvesting using OAI-PMH

Andy Powell

5 step guide to becoming a content provider in the JISC Information Environment

http://www.ariadne.ac.uk/issue33/info-environment
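
A minimal sketch of what the harvesting interface looks like from the client side, assuming a hypothetical endpoint (`OAI_BASE_URL` is invented for illustration and is not a real Hub or Spoke address). It only builds OAI-PMH `ListRecords` request URLs; the `verb`, `metadataPrefix`, `from`, `set` and `resumptionToken` arguments come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
OAI_BASE_URL = "http://spoke.example.ac.uk/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since this date
    if set_spec:
        params["set"] = set_spec  # restrict the harvest to one set, if the repository defines any
    return base_url + "?" + urlencode(params)


def resume_url(base_url, token):
    """Build the follow-up request for the next batch of a long result list."""
    # resumptionToken is exclusive: it replaces every argument except verb.
    return base_url + "?" + urlencode({"verb": "ListRecords", "resumptionToken": token})


print(list_records_url(OAI_BASE_URL, from_date="2009-01-01"))
```

A harvester would fetch the first URL, then keep calling `resume_url` with each token the repository returns until no token is given.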

E-GIF, open source and open standards

e-GIF version 6.1 (18th March 2005)

– The e-Government Interoperability Framework (e-GIF) sets out the government’s technical policies and specifications for achieving interoperability… across the public sector.

– There is a strategic decision to adopt XML and XSL as the core standards for data integration and management.

– It is a pragmatic strategy that aims to reduce cost and risk for government systems while aligning them to the global Internet revolution.

http://www.govtalk.gov.uk/documents/eGIF%20v6_1(1).pdf

Open Source, Open Standards and Re–Use: Government Action Plan http://www.netvibes.com/cabinetoffice#Open_Source

Isn’t technology brilliant?!!

• Technical know-how

• XML

• Data creation/editing template

• Web interface

• Machine interfaces

• Distributed model

• Web 2.0

• Dissemination

= Satisfying user experience

+ understanding users

Hub Data Flow

• Sustainable model

• Data held as XML

• Efficient search mechanism

• Flexible access

• Easy to become a Spoke

The Distributed Hub

Flickr cc licence: Thomas Hawk

The main goal of a distributed computing system is to connect users and resources in a transparent, open, and scalable way. Ideally this arrangement is drastically more fault tolerant and more powerful than many combinations of stand-alone computer systems.

[Wikipedia]

• Administration interface

• Customisable web front-end

• Machine-to-machine interfaces

• Data Creation Template

• Local control

• Technical support locally

• Hub team support
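
The distributed arrangement described above can be sketched as a cross-search that sends one query to every Spoke and carries on when one of them is unavailable. This is an illustration only, not the Hub's implementation: the `spoke_a` and `spoke_b` functions are invented stand-ins for real network calls.

```python
from concurrent.futures import ThreadPoolExecutor


def cross_search(spokes, query):
    """Send one query to every Spoke; collect hits and note which Spokes failed."""
    hits, failed = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(spokes))) as pool:
        # Start all searches at once so a slow Spoke does not hold up the others.
        futures = {name: pool.submit(search, query) for name, search in spokes.items()}
        for name, future in futures.items():
            try:
                hits.extend((name, title) for title in future.result(timeout=5))
            except Exception:
                # One unavailable Spoke should not break the whole search.
                failed.append(name)
    return hits, failed


# Invented stand-ins for real Spoke searches.
def spoke_a(query):
    return [f"Spoke A description matching '{query}'"]


def spoke_b(query):
    raise ConnectionError("Spoke B is offline")


hits, failed = cross_search({"spoke-a": spoke_a, "spoke-b": spoke_b}, "railways")
print(hits)
print(failed)
```

Reporting the failed Spokes separately lets a front-end show partial results while telling the user which sources were not searched.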

Spokes software

• Offers a means of storing and sharing archival descriptions in XML

• Provides machine-to-machine access to the descriptions through Z39.50 and SRU (Search and Retrieve via URL) & OAI-PMH for harvesting records

• Provides a customisable Web search interface

• Is open source and based on open standards

• Includes a data creation and editing template
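
As a rough illustration of what "archival descriptions in XML" means in practice, the sketch below reads a few fields from a tiny EAD fragment. The sample record is invented, and the element paths assume the DTD-based EAD 2002 layout with no XML namespace; real descriptions are far richer.

```python
import xml.etree.ElementTree as ET

# Invented sample record, for illustration only.
SAMPLE_EAD = """<ead>
  <eadheader>
    <eadid>gb-0000-example</eadid>
    <filedesc><titlestmt><titleproper>Example Papers</titleproper></titlestmt></filedesc>
  </eadheader>
  <archdesc level="fonds">
    <did>
      <unittitle>Example Papers</unittitle>
      <unitdate>1900-1950</unitdate>
    </did>
    <scopecontent><p>Invented sample text.</p></scopecontent>
  </archdesc>
</ead>"""


def summarise(ead_xml):
    """Pull a handful of commonly indexed fields out of an EAD description."""
    root = ET.fromstring(ead_xml)

    def text(path):
        node = root.find(path)
        # itertext() gathers text from nested elements such as <p>.
        return "".join(node.itertext()).strip() if node is not None else None

    return {
        "eadid": text("eadheader/eadid"),
        "titleproper": text("eadheader/filedesc/titlestmt/titleproper"),
        "unitdate": text("archdesc/did/unitdate"),
        "scopecontent": text("archdesc/scopecontent"),
    }


print(summarise(SAMPLE_EAD))
```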

Anatomy of a Spoke

EAD XML files

Web search interface

Direct searching access for other applications through standards-based machine-to-machine protocols

Including the central Hub!

Z39.50

SRU

HTTP

Cheshire indexes of EAD data

Spokes indexes

The database will provide indexes based on the following standards:

Data standard        Data field(s)
cql.anywhere         full text
dc.description       unittitle, controlaccess, and scopecontent fields
dc.title             collection title (titleproper)
dc.creator           creator of the collection
dc.identifier        eadid
dc.date              unitdate
dc.subjects          controlaccess fields
bath.name            personal, family, corporate and geographic names
bath.personalName    personal names
bath.corporateName   corporate names
bath.geographicName  geographic names
bath.genreForm       genre
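
The index names listed above are what a client would use in a CQL query sent over SRU. The following is a sketch under stated assumptions: `SRU_BASE_URL` is a made-up address, and whether a given Spoke accepts these exact parameters depends on its configuration. The `operation`, `version`, `query`, `startRecord` and `maximumRecords` parameters are those defined by SRU 1.1.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
SRU_BASE_URL = "http://spoke.example.ac.uk/sru"


def cql_clause(index, term, relation="all"):
    """Build one CQL search clause, e.g. dc.title all "railway papers"."""
    # Escape backslashes and double quotes so the term stays a single quoted string.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} {relation} "{escaped}"'


def sru_search_url(base_url, cql, start=1, maximum=10):
    """Build an SRU searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return base_url + "?" + urlencode(params)


print(sru_search_url(SRU_BASE_URL, cql_clause("dc.title", "railway papers")))
```

Swapping `dc.title` for `bath.personalName` or `cql.anywhere` targets a different index without changing anything else in the request.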

Administration Interface

http://spoke.mimas.archiveshub.ac.uk/ead/admin/

Liverpool Spoke

John Rylands Spoke

Agreement with Spokes

Hosted Spokes

• Spokes at Manchester

– Configuration

– Agreement between parties

• Manchester team undertake agreed level of support

• Institution still responsible for the data

Being an Archives Hub Spoke…

• Gives you control over your own EAD files

– Allows you to update and add new files when you need to

• Exposes your EAD to other applications which need to cross-search the descriptions

– Using standards-compliant methods

• Means you benefit from using software that has been developed with the Archives Hub community

Collaboration & Sharing

• Networks and communities – the National Archives Network

• Cross-service and cross-domain collaboration

– Copac

– Intute

– Digitisation Projects

• Expand and share content

– import/export/M2M

• Links to other archive services

– NRA

The National Archives Network

‘Our vision of the future of British archives is of a flow of archival information which takes account of all the opportunities offered by digital networks and offers opportunity for exploration - historical, personal, social - to the broadest possible range of people wherever they can use it - in the home, the classroom or the office.’

British Archives: The Way Forward (NCA, 2000)

A comprehensive national resource discovery mechanism

The importance

‘There can be no higher priority for archives than the creation of this collaborative electronic network, overcoming the limitations of geography, crossing the many archival sectors and creating a truly unified digital directory or encyclopaedia of British historical documents.’

British Archives: The Way Forward (NCA, 2000)

National Archives Network

The opportunity

‘Outreach has been a developing preoccupation for archives in recent years, but the arrival of the internet age provides the opportunities to take archives, as never before, to the doorstep of the community at large.’

British Archives: The Way Forward (NCA, 2000)

Progress of the NAN

• Many archives took part in this drive towards a national archives network

• …many are still taking part

• The importance of recognised standards

• Intention to create collection level catalogues of all substantial collections within a defined timeframe

Success of the NAN

• Strands of the national archives network provide access to archives that were previously inaccessible

• The HLF has played a major role in enabling access and online discovery

• Users of archives have benefited enormously

• Data standards have become of central importance

Shortcomings of the NAN

• We don’t have a single national network

• Differences in data structure; content; search capabilities; look and feel

• Strands are not fully interoperable

• Politics, funding and willpower may not combine in favour of this approach

• The landscape has changed substantially since 2000 – maybe this solution is no longer appropriate?

The NAN today

• Many ‘strands’

• Only a few use EAD (support EAD export)

• Lack of funding for a joint solution

Key is interoperability and machine-to-machine interfaces:

• NAN as a community, sharing knowledge and experiences

• NAN as a promoter of standards and facilitator for data sharing

• NAN strands as promoters of flexible and open approaches

The Interoperable Hub

The ability of software and hardware on different machines to share data

• Content standards

• Structural standards

• Validation of content

• Data Editor

• Training and awareness

• Contributor responsibility

• Networking and community building

Machine-to-machine interfaces

• Web access is just one means of access to the data

• Machine access provides flexible access, so people can set their own agendas

– Z39.50

– SRU

– OAI-PMH (harvester)

• Need to provide semantic data – properly marked up, well-structured
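
To show why well-structured data matters for machine access, here is a small sketch of a client reading an SRU response. The sample response is invented and much simplified; the `srw` namespace and the `numberOfRecords`, `records`, `record` and `recordData` elements follow SRU 1.1, while the bare `title` element inside `recordData` is an assumption made for brevity.

```python
import xml.etree.ElementTree as ET

SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}

# Invented, simplified response for illustration only.
SAMPLE_RESPONSE = """<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>2</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordData><title>Invented record one</title></srw:recordData>
    </srw:record>
    <srw:record>
      <srw:recordData><title>Invented record two</title></srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>"""


def read_response(xml_text):
    """Return the total hit count and the titles of the records in this batch."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("srw:numberOfRecords", namespaces=SRW_NS))
    titles = [
        record.findtext("srw:recordData/title", namespaces=SRW_NS)
        for record in root.iterfind("srw:records/srw:record", SRW_NS)
    ]
    return total, titles


print(read_response(SAMPLE_RESPONSE))
```

Because the fields are marked up consistently, the client can pick out exactly the parts it wants and present them in its own interface.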

Pilot project for SRU: Genesis portal for Women’s Studies

• Hub hosts data

• Genesis searches the Hub using SRU

• Implications for data – how to search for just the appropriate descriptions?

• Possible issues with search speeds
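
One possible way to restrict a portal's searches to relevant descriptions is to wrap every user query in a fixed subject filter. This is a sketch of the idea only, not a description of how Genesis or the Hub handled it; the subject terms are invented, and `dc.subjects` is taken from the Spokes index list as an assumed index name.

```python
def scoped_query(user_cql, scope_terms, scope_index="dc.subjects"):
    """Combine a user's CQL query with a fixed subject filter for a themed portal."""

    def quote(term):
        # Escape backslashes and double quotes so each term stays one quoted string.
        return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'

    scope = " or ".join(f"{scope_index} = {quote(term)}" for term in scope_terms)
    # Parentheses keep the filter and the user's query from mixing their booleans.
    return f"({scope}) and ({user_cql})"


print(scoped_query('cql.anywhere all "suffrage"', ["women", "feminism"]))
```

A filter like this depends on descriptions carrying consistent subject terms, which is the data implication the slide raises.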

Persistent Identifiers

• All Hub descriptions have their own identifiers – a unique reference

• Gives them their own web address – can point to any description

• Facilitates linking, e.g. from National Register of Archives

• Enables bookmarking of content

http://www.archiveshub.ac.uk/arch/glossary.shtml#identifier

Challenges (of which there are many)

• Understanding our users

• Encouraging item-level descriptions

• Encouraging images/links to content

• Which technology?

• Perceptions of relevancy

• Understanding Impact

• Sustainability

Flickr cc licence: hoodwink’s photostream

Moving Forward

• Increasing content and contributors

• Branding and new Website

• More engagement with users / user generated content

• Continuing to be standards-based, open and interoperable
