Using OAI-PMH to Aggregate Metadata Describing Cultural Heritage Resources

Timothy W. Cole (t-cole3@uiuc.edu), University of Illinois at Urbana-Champaign

http://dli.grainger.uiuc.edu/Publications/TWCole/ALA2003OAI/

ALA/CLA Annual Meeting, 22 June 2003, Toronto, CA

Order of Presentation

Perspectives on OAI-PMH

Illinois OAI metadata harvesting project
  Goals & objectives
  Findings regarding metadata
  Findings regarding search & discovery

New OAI projects at Illinois
  IMLS digital collections & content
  CIC OAI metadata harvesting project

OAI Protocol for Metadata Harvesting

Harvesting approach to interoperability at metadata level

Divides world into Metadata Providers & Service Providers

Builds on HTTP, XML, & Dublin Core

http://www.openarchives.org/

OAI Antecedents

Call to other E-Print archives (July 1999)

Paul Ginsparg, Rick Luce, & Herbert Van de Sompel: “…mobilize core group to work towards achieving a universal service for author self-archived scholarly literature.”

Santa Fe Mtgs. (Oct. 1999 & June 2000)

OAI-PMH version history:
  First Alpha Release, Sept. 2000
  1.0 (Beta) Release, January 2001
  1.1 (Beta 2) Release, July 2001
  2.0 (Production) Release, June 2002

Original OAI Organization

OAI Executive: Carl Lagoze & Herbert Van de Sompel

OAI Steering Committee: Co-Chairs: Dan Greenstein, Cliff Lynch

OAI Technical Committee

Funded by NSF, DLF & CNI

Seeks to be user community driven

OAI-PMH as a tool

All about moving metadata around

Designed to be a building block, usable by many different communities

Can facilitate (in some cases enable) services & functions

Assumes widely distributed content, but centralized indexing(!) & services

Build once, use for many applications

Focus of OAI is interoperability

Harvesting vs. Broadcast

Competing approaches to interoperability

Distributed/Broadcast searching: search and discovery over remote services and data

Harvesting is when data/metadata is transferred from the remote source to the destination where search & discovery services are located (e.g. Union catalogs)

OAI-PMH is a harvesting protocol
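A minimal sketch of the harvesting side of this contrast, assuming the metadata has already been copied from the remote sources: the service provider builds one local index and answers every search from it, without contacting the data providers at query time. The record identifiers and the `build_index` and `search` helpers are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def build_index(harvested):
    """Build an inverted index (word -> set of record ids) from harvested metadata."""
    index = defaultdict(set)
    for record_id, text in harvested.items():
        for word in text.lower().split():
            index[word].add(record_id)
    return index


def search(index, term):
    """Search the local index only; no remote repository is queried."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical records, as if already harvested from two remote repositories.
harvested = {
    "repoA:1": "Cotton coverlet with embroidered butterfly design",
    "repoB:7": "American woven coverlet in overshot weave",
    "repoB:8": "Collection of Roman coins",
}

index = build_index(harvested)
print(search(index, "coverlet"))  # ['repoA:1', 'repoB:7']
```

In a broadcast design the same query would instead be forwarded to each repository and the answers merged as they arrive.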

As Compared to Z39.50

                        Z39.50            OAI
Content (Objects)       Distributed       Distributed
World View              Bibliographic     Bibliographic
Object Presentation     Data provider     Data provider
Searching is            Distributed       Centralized
Search done by          Data provider     Service provider
Metadata searched is    Up to date        Stale
Semantic Mapping        When searching    Metadata delivery

Metadata vs. Resources

Resource refers to information objects or digital representations of information objects

Metadata item is a collection of properties about a resource (e.g. title, author, etc.)

Metadata record is a metadata item expressed in a specific syntax according to an XSD

OAI focuses on metadata, with the implicit understanding that metadata contains useful links to the source information object(s)
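To make the distinction concrete, here is a small sketch that reads a metadata record (XML in the unqualified Dublin Core format, `oai_dc`) and turns it into a metadata item, i.e. a set of properties about a resource. The sample record is hypothetical: the title and type echo a record excerpted later in this talk, and the identifier URL is invented to stand in for the link back to the resource.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record; only the XML namespaces are taken from the oai_dc format.
SAMPLE_RECORD = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>American woven coverlet</dc:title>
  <dc:type>physical object</dc:type>
  <dc:identifier>http://example.org/items/42</dc:identifier>
</oai_dc:dc>
"""


def metadata_item(record_xml):
    """Turn a metadata record (XML syntax) into a metadata item (property -> values)."""
    root = ET.fromstring(record_xml.strip())
    item = {}
    for child in root:
        # Keep only Dublin Core elements; DC elements are repeatable, so collect lists.
        if child.tag.startswith("{" + DC_NS + "}"):
            name = child.tag.split("}", 1)[1]
            item.setdefault(name, []).append((child.text or "").strip())
    return item


item = metadata_item(SAMPLE_RECORD)
print(item["title"])       # ['American woven coverlet']
print(item["identifier"])  # ['http://example.org/items/42']
```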

When to use OAI-PMH

Metadata is sufficient for services desired

Normalization, deduping, metadata augmentation desired

Content is widely distributed across small, non-Z39.50 enabled repositories

OAI-PMH is more lightweight than Z39.50

Portals can use BOTH Z39.50 & OAI-PMH

What OAI-PMH Is Not

Not search & discovery on its own

Not a database management system

Not a single metadata schema

Not OAIS

How OAI Works

OAI “verbs”: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord

[Diagram: the harvester (OAI service provider) sends an HTTP request carrying an OAI verb to the repository (OAI metadata provider), which returns an HTTP response containing valid XML.]
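As a rough illustration of the request side, the sketch below builds the URL a harvester would send for a given verb. The six verb names and the `metadataPrefix=oai_dc` argument come from the protocol; the base URL and the `oai_request_url` helper are hypothetical. No network call is made, and the XML response is not shown.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(base_url, verb, **arguments):
    """Build the HTTP GET URL for one OAI-PMH request."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


BASE_URL = "http://repository.example.org/oai"  # hypothetical repository

print(oai_request_url(BASE_URL, "Identify"))
# http://repository.example.org/oai?verb=Identify
print(oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
# http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```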

OAI Provider Architectures

[Diagram: descriptive metadata (DBMS, XML, HTML <meta>) and OAI administrative metadata (e.g., IDs, datestamps, sets, formats) feed an OAI application (CGI, ASP, PHP, etc.) running on a web server, which answers OAI harvesters over HTTP.]

A few projects using OAI-PMH

Basic building block of the National Science Digital Library

Large-scale implementations in E-Prints, OLAC, NDLTD, …

Built into ENCompass, ContentDM, Michigan’s DLXS, D-Space, and other products

Open Archives Forum in Europe; will be part of federation activities in the UK and EU

Univ. of Illinois OAI Metadata Harvesting Project

Funded by Andrew W. Mellon Foundation (July 2001 – May 2003)

Primary objectives:
  Develop & make available OAI harvesting tools
  Build search services for aggregated metadata in the domain of cultural heritage
  Examine metadata aggregation issues, including use of EAD in OAI context
  Investigate utility of aggregated metadata, including preliminary testing with end-users

Type of resources

39 data providers: academic libraries, museums / cultural orgs, digital libraries, public library

1.1 million original DC records + 1.5 million derived from EAD

[Pie chart of resource types: Text & Sheet Music 50%, Images 25%, Artifact 20%, Other 5%]

Variations in DC element usage

Records containing subject & description elements

Provider type                                               SUBJECT   DESCRIPTION
Digital libraries (10 total, 122,719 records)               78%       36%
Museums, hist. societies, etc. (6 total, 255,800 records)   93%       93%
Academic libraries (7 total, 235,294 records)               15%       13%

Many different controlled and local vocabularies in use

Granularity: a record may describe a collection of coins, or one coin
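Figures like these can be computed directly from harvested records. The sketch below assumes each record has been reduced to a dictionary mapping a DC element name to its list of values; the four sample records are invented for illustration and are not project data.

```python
def element_usage(records, element):
    """Percentage of records with at least one non-empty value for `element`."""
    if not records:
        return 0.0
    having = sum(
        1 for record in records
        if any(value.strip() for value in record.get(element, []))
    )
    return 100.0 * having / len(records)


# Hypothetical harvested records (DC element name -> list of values).
records = [
    {"title": ["Coin"], "subject": ["Numismatics"], "description": ["One coin"]},
    {"title": ["Coins"], "subject": ["Numismatics"]},
    {"title": ["Map"], "description": [""]},  # element present but empty
    {"title": ["Letter"]},
]

print(element_usage(records, "subject"))      # 50.0
print(element_usage(records, "description"))  # 25.0
```

Counting only non-empty values is a design choice here: an empty element says nothing about the resource, so it is treated the same as a missing one.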

Excerpt of a metadata record describing a cotton coverlet

Description: Digital image of a single-sized cotton coverlet for a bed with embroidered butterfly design. Handmade by Anna F. Ginsberg Hayutin.

Source: Materials: cotton and embroidery floss. Dimensions: 71 in. x 86 in. Markings: top right hand corner has 1 1/2 in. x 1/2 in. label cut outs at upper left and right hand side for head board; fabric is woven in a variation of a rib weave; color each of yellow and gray; hand-embroidered cotton butterflies and flowers from two shades of each color of embroidery floss - blue, pink, green and purple and single top 20 in. bordered with blue and black cotton embroidery thread; stitches used for embroidery: running stitch, chain stitch, French knot and back stitches; selvage edges left unfinished; lower edges turned under and finished with large gray running stitches made with embroidery floss.

Format: Epson Expression 836 XL Scanner with Adobe Photoshop version 5.5; 300 dpi; 21-53K bytes. Available via the World Wide Web.

Coverage: —

Date Created: 2001-09-19 09:45:18; Updated: 20011107162451; Created: 2001-04-05; Created: 1912-1920?

Type: Image

Excerpt of a metadata record describing "American woven coverlet"

Description: Materials: Textile--Multi, Pigment—Dye; Manufacturing Process: Weaving--Hand, Spinning, Dyeing, Hand-loomed blue wool and white linen coverlet, worked in overshot weave in plain geometric variant of a checkerboard pattern.Coverlet is constructed from finely spun, indigo-dyed wool and undyed linen, woven with considerable skill. Although the pattern is simpler, the overall craftsmanship is higher than 1934.01.0094A. - D. Schrishuhn, 11/19/99 This coverlet is an example of early "overshot" weaving construction, probably dating to the 1820's and is not attributable to any particular weaver. -- Georgette Meredith, 10/9/1973

Source: —

Format: 228 x 169 x 1.2 cm (1,629 g)

Coverage: Euro-American; America, North; United States; Indiana? Illinois?

Date: Early 19th c. CE

Type: cultural; physical object; original

Implications

Service providers
  Automatically normalize metadata encoding where possible (e.g., dates)
  Normalize for and co-locate by type / format where possible

Metadata providers
  Create metadata for interoperability
  Consider more expressive schema, e.g., Qualified DC, MARC
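One way a service provider might normalize date encodings is sketched below, using the date strings from the cotton coverlet record excerpted earlier. The list of known formats is an assumption chosen to cover those examples, not the project's actual rule set. Values that match no format are returned as `None` so they can be kept as free text or reviewed by hand.

```python
from datetime import datetime

# Assumed set of encodings to recognize; a real harvester would need many more.
KNOWN_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y%m%d%H%M%S", "%Y-%m-%d")


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = value.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


for raw in ["2001-09-19 09:45:18", "20011107162451", "2001-04-05", "1912-1920?"]:
    print(raw, "->", normalize_date(raw))
# 2001-09-19 09:45:18 -> 2001-09-19
# 20011107162451 -> 2001-11-07
# 2001-04-05 -> 2001-04-05
# 1912-1920? -> None
```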

Original interface

Portal had two search pages: simple (keyword) and advanced.

Pilot study with student teachers

23 users in honors-level C&I class

Assignment: Use the site in preparing a lesson plan (high school social studies)

Introduced to “aggregated metadata” concept

Focus group interviews conducted

Students’ papers examined

Transaction logs analyzed

Results of initial user testing

1. Users expected all links to point to digital objects
   Some records pointed to finding aids
   Some records pointed to collection’s web site
   Some records described analog objects

2. Users unable to make use of search results
   Simple searches produced 1000s of unranked results
   Advanced search (with limits) rarely used

3. Distinction between portal and data providers unimportant to users

What does “online access” mean?

To librarian & curator

To student teacher

Response to test results

EAD-derived records segregated

Analog only collections excluded

Categories of resource types reduced to 3:
  Images and Video
  Text, Sheet Music, and Websites
  Museums and Archival Collections

Revised interface

Simple keyword & advanced search put on one page

Clarify “online access”

Natural language in Boolean operators

Revised search results

Link goes to finding aid or collection page? “Learn more.”

Link displays object? “View item.”

Subj/Desc expanded

IMLS Digital Collections & Content

Build a registry of all National Leadership Grant collections with digital content.

Assist and guide NLG projects in making item-level metadata sharable using OAI.

Build a repository and search & discovery tools for integrated access to the content of NLG collections (unique metadata schema?).

Research best practices for sharing metadata about diverse digital content and for supporting the interests of diverse user communities.

http://imlsdcc.grainger.uiuc.edu/

CIC OAI metadata harvesting

Univ. of Illinois at UC will host an OAI-PMH metadata harvesting service for 10 CIC libraries

Project Goals (3 year experimentation phase)
  Improve access to selected resources at CIC libraries
  Advertise these resources (internally & externally)
  Prepare member institutions for future grant-mandated OAI-based resource sharing
  Serve as a useful testbed for experimentation with OAI-PMH, development of metadata best practices, usability and user needs testing, etc.
