Metadata Training for Staff and Librarians for the New Data Environment

Presented at the 2011 DLF Forum in Baltimore, Maryland.

Today’s Task

Part 1: Audiences, current training strategies, cost-effectiveness

Part 2: A taste of the training “From Metadata to a Web of Data”

Part 3: Structured feedback session
  Can you help us make this better?

DLF Forum, Nov. 2, 2011 2

Why Are We Doing This?

Increasing frustration with webinars
  Not particularly good for anything but introductions
  Very few opportunities for interaction or follow-up

One-day seminars at various institutions and conferences also seem limited in terms of participation

The ‘older’ model of repeatable workshops (with a group of trainers) is still useful if tweaked
  Better opportunities for participation and learning

Goals

Offer direct training for libraries in a format that encourages participatory learning
  Building on the successful library workshop model is one option

Encourage other library organizations and conference planners to include training options in their regular meetings
  Generally requires members to lobby for workshops, pre-conferences, etc.

Part I: Intro to Metadata

Questions:
  Do we have a shared understanding of metadata?
  What are some of the practical definitions and modes of thinking that you can use in practice?
  What is the basis for understanding the technology context of today’s data?

Intro to Metadata

What is metadata?
  Not: data about data

Instead: data with a purpose
  Constructed (human-made, artificial)
  Constructive (designed for a purpose, not theoretical)
  Computable (all metadata today will be used by computer applications as well as managed and understood by humans)

Exercise 1: Data With a Purpose

Each group has a book on the table. What metadata is needed for:
  A warehouse that will ship books to bookstores
  A brick-and-mortar bookstore that orders books, displays and sells them
  An online bookstore that will take orders and ship books to customers

Look over your lists: it will cost you $1 for every metadata field you create. If you use this field in your operation, you get back the $1.
  Have you changed your mind?

Part II: Understanding DATA

Goals:
  Understand the difference between data and text by thinking about computability
  Learn some basic data types
  Recognize data types in library data

Standard Data Types

Text – ‘text’ (we know this one!)

Defined data types:
  Date (& time)
  Currency
  Numbers (integers, etc.)

Controlled lists: finite sets of values to use
  Languages (ISO)
  Countries (ISO)
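
As a rough illustration of the difference between text and typed data, the sketch below converts the same values from plain strings into defined data types using only the Python standard library. The field names, sample values, and the tiny language list are assumptions made for this example; a real controlled list would hold the full ISO code table.

```python
from datetime import date
from decimal import Decimal

# Hypothetical sample record: every value starts out as plain text.
raw = {"published": "2011-11-02", "price": "19.95", "pages": "320", "language": "en"}

# A deliberately tiny controlled list (assumption: two-letter ISO 639-1 codes).
LANGUAGE_CODES = frozenset({"en", "fr", "de", "es"})


def to_typed(record):
    """Convert text values into defined data types; reject values outside the controlled list."""
    if record["language"] not in LANGUAGE_CODES:
        raise ValueError(f"unknown language code: {record['language']!r}")
    return {
        "published": date.fromisoformat(record["published"]),  # date
        "price": Decimal(record["price"]),                      # currency amount
        "pages": int(record["pages"]),                          # integer
        "language": record["language"],                         # controlled value
    }


typed = to_typed(raw)

# As text, "320" < "40" is True (character-by-character comparison);
# as integers, 320 < 40 is False, which is what a machine needs for computing.
text_comparison = "320" < "40"
number_comparison = typed["pages"] < 40
```

The last two lines are the point of the exercise: the same characters give different answers depending on whether they are treated as text or as data.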

Why Data?

Enables machine processing of amounts of data too large for humans to grasp (which is just about all of our information)
  Processing across patron files or bibliographic databases
  Processing on retrieved sets (e.g. extracting facets)

Enables libraries to move beyond ‘artisanal metadata’ towards more efficient and cost-effective assignment of tasks to humans and machines
  Comes with new sources of data and new collaborations

Data Use Examples

Making decisions
  If user for more than 5 years, then …
  If book height greater than x, then …

Making connections
  These books have the same author
  These books have the same (or similar) topic
  These CDs have the same orchestra
  This place of publication has lat/long info and can be located on a map
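
A minimal sketch of both uses, assuming a handful of invented book records. The field names, the identifiers, and the 30 cm threshold (standing in for the unspecified "x") are hypothetical; the code only evaluates the condition and does not guess at what follows the "then".

```python
from collections import defaultdict

# Hypothetical records; field names and values are made up for this sketch.
books = [
    {"title": "Book A", "author_id": "author:1", "height_cm": 24},
    {"title": "Book B", "author_id": "author:1", "height_cm": 38},
    {"title": "Book C", "author_id": "author:2", "height_cm": 21},
]

OVERSIZE_CM = 30  # assumption: stands in for the unspecified "x"


def is_oversize(book):
    """Making a decision: a numeric comparison needs a number, not free text."""
    return book["height_cm"] > OVERSIZE_CM


def group_by_author(records):
    """Making a connection: records sharing an author identifier belong together."""
    groups = defaultdict(list)
    for record in records:
        groups[record["author_id"]].append(record["title"])
    return dict(groups)


oversize_titles = [b["title"] for b in books if is_oversize(b)]
titles_by_author = group_by_author(books)
```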

Things: What Your Metadata Talks About

Book

Author

Place

Person (in subject)

Historical period

All of these exist outside your metadata, and are independent of it
  You can talk about these ‘things’ in many different contexts

If you assign them identifiers that can be shared with others, then you have a ‘thing’ or entity
  Things become points of connection between metadata descriptions (e.g., all books by the same author)

Strings: Limited Connections

Metadata statements using strings don’t represent (to machines) something outside the metadata
  They aren’t linkable to other things or strings
  They often can’t be effectively parsed by machines

Transcribed data in traditional library metadata is often ‘strings’
  Titles are good examples

Some strings are intended to identify something else (controlled author names, for instance) but may be used for display as well
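
One way to see the limit of strings is to compare two descriptions of works by the same author. In this sketch the records and the identifier `ex:person/melville` are invented for illustration and do not come from any real authority file.

```python
# Hypothetical descriptions from two sources. The identifier is invented for this sketch.
record_a = {
    "title": "Moby Dick",
    "author_string": "Melville, Herman",
    "author_id": "ex:person/melville",
}
record_b = {
    "title": "Typee",
    "author_string": "Herman Melville",
    "author_id": "ex:person/melville",
}


def same_author_by_string(a, b):
    """String comparison: fails as soon as the transcription differs."""
    return a["author_string"] == b["author_string"]


def same_author_by_thing(a, b):
    """Identifier comparison: both records point at the same 'thing'."""
    return a["author_id"] == b["author_id"]


string_match = same_author_by_string(record_a, record_b)  # False: the strings differ
thing_match = same_author_by_thing(record_a, record_b)    # True: same identifier
```

The display string can still be kept for humans; the identifier is what lets a machine make the connection.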

Exercise 2: Things & Strings

Start with a simple file

Each group has a ‘record’ (BBC, etc.—not MARC)

A general description is provided of the purpose of the data

Tasks:
  Pick out the strings and things in your example
  Bonus points: any data types?
  Reporting by groups and discussion

Identifiers

Uniquely identify a variety of resources
  On the web they use http and domain names

Advantages
  Language independent
  Display independent
  Unambiguous

Usage should be oriented towards machines, hidden from humans
  Humans have different requirements

Identifiers: What They Identify

Easier to attach an identifier than understand what it actually identifies
  ISBN – identifies publisher’s product
  LCCN – identifies LC-created metadata; ≠ ISBN, even though it may have very similar metadata to the publisher’s
  DOI – identifies item in DOI system, but may link to a general sales page

Identifiers must …

Be unique within a domain (private db; web)

Be consistent (identifier must always ID the same thing; DO NOT RE-USE!)

Be persistent (must live as long as thing it identifies)

Be in a standard format
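
These four requirements can be sketched as a small in-memory registry. Everything here is an assumption for illustration: the class name, the `ex` prefix, and the eight-digit format are hypothetical, and a real system would need durable storage to be persistent in any meaningful sense.

```python
import itertools


class IdentifierRegistry:
    """Mints identifiers that are unique, never re-used, and in one fixed format."""

    def __init__(self, prefix):
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._things = {}       # identifier -> description of the thing
        self._retired = set()   # withdrawn identifiers, kept so they are never re-issued

    def mint(self, thing):
        # Standard format (an assumption): prefix plus an 8-digit zero-padded number.
        identifier = f"{self._prefix}{next(self._counter):08d}"
        self._things[identifier] = thing
        return identifier

    def retire(self, identifier):
        # The identifier stays on record after the thing is withdrawn.
        del self._things[identifier]
        self._retired.add(identifier)

    def resolve(self, identifier):
        if identifier in self._retired:
            raise LookupError(f"{identifier} is retired and will not be re-assigned")
        return self._things[identifier]


registry = IdentifierRegistry("ex")
first = registry.mint("Moby Dick (work)")
second = registry.mint("Herman Melville (person)")
registry.retire(first)
third = registry.mint("Typee (work)")  # gets a fresh number, never the retired one
```

Because the counter only moves forward, retiring an identifier can never free it up for a different thing.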

Note on “Consistent”

The same thing may have more than one identifier – this happens naturally in the creation of metadata. It’s not a huge problem as long as you have a way of saying that:

A = B

… so that you can bring together the identifiers for the same thing. (cf. VIAF; also xISBN)

This is the basis for mapping between vocabularies so that metadata can be more easily re-used
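
A minimal sketch of recording "A = B" statements so that identifiers for the same thing can be brought together. It uses a small union-find structure, which is one possible technique and not a description of how VIAF or xISBN work; the identifiers are invented.

```python
class Equivalences:
    """Tracks 'A = B' statements with a small union-find structure."""

    def __init__(self):
        self._parent = {}

    def _find(self, identifier):
        # Unknown identifiers start out as their own representative.
        self._parent.setdefault(identifier, identifier)
        while self._parent[identifier] != identifier:
            # Path halving keeps later lookups short.
            self._parent[identifier] = self._parent[self._parent[identifier]]
            identifier = self._parent[identifier]
        return identifier

    def same_as(self, a, b):
        """Record that identifiers a and b name the same thing."""
        self._parent[self._find(a)] = self._find(b)

    def are_same(self, a, b):
        """True if a chain of 'A = B' statements connects the two identifiers."""
        return self._find(a) == self._find(b)


# Invented identifiers from three hypothetical sources.
eq = Equivalences()
eq.same_as("sourceA:123", "sourceB:n456")
eq.same_as("sourceB:n456", "sourceC:xyz")

linked = eq.are_same("sourceA:123", "sourceC:xyz")    # connected through sourceB
unrelated = eq.are_same("sourceA:123", "sourceD:1")   # no statement links these
```

Note that A and C were never directly declared equal; the link follows from the two statements that share B.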

Identifier Readability

Opaque: no meaning to the identifier, ex.: LCCN example (just a number)

Readable: makes sense to a human, ex.: Wikipedia page IDs (include page name or partial page name)

Can be both: system can add readable bit to opaque identifier, ex.: Open Library thing IDs

Choices here are controversial, and have a big impact on multilingual efforts

The Open World Assumption

“The open world assumption (OWA) is used in knowledge representation to codify the informal notion that in general no single agent or observer has complete knowledge, and therefore cannot make the closed world assumption.”

--Wikipedia
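
In practical terms, the difference shows up when a question is asked about something the data does not mention. The sketch below is a simplified illustration with invented statements: the closed-world function answers "false", the open-world function answers "unknown" (represented here as `None`).

```python
# A tiny store of known statements (hypothetical data).
known = {("Moby Dick", "has author", "Herman Melville")}


def ask_closed_world(statement):
    """Closed world: anything not stated is treated as false."""
    return statement in known


def ask_open_world(statement):
    """Open world: anything not stated is simply unknown (None), not false."""
    return True if statement in known else None


missing = ("Moby Dick", "has translator", "Someone")
closed_answer = ask_closed_world(missing)  # False
open_answer = ask_open_world(missing)      # None: someone else may know
```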

Things with relationships to other things

Thing -- Relationship --> Thing

Subject -- Predicate (verb) --> Object

Object can be a URI or a "string"; a URI is a thing

Some examples:

book -- has author -- [lcname#]
book -- has author -- "John Doe"

Subject and predicate must be URIs
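
The rule can be sketched as a small data structure that rejects statements breaking it. This is a simplified illustration, not an RDF library: the URI check is deliberately loose, and every `example.org` URI is a placeholder, not a real vocabulary term.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


def is_uri(value):
    """Loose check used only for this sketch: an http(s) URL with a host."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


@dataclass(frozen=True)
class Literal:
    """A plain string value; it can only appear as the object of a triple."""
    text: str


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: object  # either a URI string (a thing) or a Literal (a string)

    def __post_init__(self):
        if not is_uri(self.subject) or not is_uri(self.predicate):
            raise ValueError("subject and predicate must be URIs")
        if not isinstance(self.obj, Literal) and not is_uri(self.obj):
            raise ValueError("object must be a URI or a Literal")


# Placeholder URIs invented for this sketch.
BOOK = "http://example.org/book/1"
HAS_AUTHOR = "http://example.org/vocab/hasAuthor"

thing_statement = Triple(BOOK, HAS_AUTHOR, "http://example.org/person/42")
string_statement = Triple(BOOK, HAS_AUTHOR, Literal("John Doe"))
```

Trying `Triple("book", HAS_AUTHOR, Literal("John Doe"))` raises `ValueError`, because the subject is a bare string.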

[diagram that shows this -- i have a slide]

Triples or Graphs?

Machines work with triples
  Statements about the same thing have the same subject

Graphs are easier for humans to understand
  In libraries we’re not used to visualizing data as graphs
  More used to databases, files, hierarchies

Making this new world work for us is as much about changing how we think as it is changing what we do
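
Going from a flat list of triples to a graph is mostly a matter of grouping by subject. The sketch below assumes a few invented triples, with short labels standing in for URIs to keep it readable.

```python
from collections import defaultdict

# Hypothetical triples; short labels stand in for URIs.
triples = [
    ("book:1", "has title", "Moby Dick"),
    ("book:1", "has author", "person:1"),
    ("book:2", "has author", "person:1"),
    ("person:1", "has name", "Herman Melville"),
]


def to_graph(statements):
    """Group statements by subject: each subject becomes a node with outgoing edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return dict(graph)


def subjects_pointing_to(statements, target):
    """Follow edges backwards: which subjects have this thing as an object?"""
    return sorted({s for s, _, o in statements if o == target})


graph = to_graph(triples)
books_by_person = subjects_pointing_to(triples, "person:1")
```

Here `person:1` is both an object (of the two books) and a subject (of its own name statement), which is what turns separate statements into a connected graph.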

DLF Forum, Nov. 2, 2011 30

Page 31: Metadata Training for Staff and Librarians for the New Data Environment

DLF Forum, Nov. 2, 2011 31

http://milicicvuk.com/blog/2011/10/04/the-web-is-just-a-bunch-of-trees-plus-shorcuts/

“Graph Thinking”

Graph relationships are different than tree relationships …

Exercise 3: Statements

Present a set of triples and ask participants to turn them into sentences
  Ex.: Book has title ‘Moby Dick’
  Ex.: Book has author [lcna] or ‘Herman Melville’
  Ex.: Author has death date XXXX

Suggest participants try drawing graphs to represent statements with the same subject

Suggest that participants represent how ‘strings’ create dead ends and ‘things’ can be linked
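
For trainers who want a machine-made answer key, the exercise could be mirrored in a few lines of code. This is only a sketch: the `Thing` marker class and the identifier `ex:author/1` are invented here to distinguish linkable objects from plain strings.

```python
class Thing(str):
    """Marker type (hypothetical) for an object that is an identifier, not a string."""


# Hypothetical triples modeled on the exercise examples.
triples = [
    ("Book", "has title", "Moby Dick"),
    ("Book", "has author", Thing("ex:author/1")),
]


def to_sentence(triple):
    """Render a triple as a sentence and note whether its object can be linked further."""
    subject, predicate, obj = triple
    if isinstance(obj, Thing):
        return f"{subject} {predicate} [{obj}] (thing, linkable)"
    return f"{subject} {predicate} '{obj}' (string, dead end)"


sentences = [to_sentence(t) for t in triples]
```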

Exercise 4: Statements

Give each group a web page with a description

Ask them to organize the data as statements

See if the site you are using has data for persons, subjects or places

Discussion
  How hard was it to find the ‘things’?
  Did you always have the predicates you needed?
  How different is this from today’s metadata?

Properties and Classes

Record-based metadata is often in the form of ‘records’, using elements from only one schema

Statement-based metadata is often more flexible
  Proper declaration, definition and management of the elements is very important
  Mix and match is part of the value

Some current schemas might find the transition from records to statements more challenging
  Especially where the definition of the property depends on its place in a hierarchy (MODS and ONIX for example)
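
A small sketch of what mix and match can look like in statement form: one description drawing properties from two vocabularies. The property URIs are from Dublin Core Terms and FOAF; the `example.org` subjects and the choice of these two vocabularies are assumptions made for illustration.

```python
# Property URIs from two widely used vocabularies: Dublin Core Terms and FOAF.
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# The example.org subjects are placeholders invented for this sketch.
statements = [
    ("http://example.org/book/1", DCT_TITLE, "Moby Dick"),
    ("http://example.org/book/1", DCT_CREATOR, "http://example.org/person/1"),
    ("http://example.org/person/1", FOAF_NAME, "Herman Melville"),
]


def namespaces_used(triples):
    """List the vocabularies a description draws on, judged by property URI prefix."""
    found = set()
    for _, predicate, _ in triples:
        # Split at the last '/' or '#', whichever comes later.
        cut = max(predicate.rfind("/"), predicate.rfind("#"))
        found.add(predicate[: cut + 1])
    return sorted(found)


vocabularies = namespaces_used(statements)
```

Each property carries its own definition through its URI, so it keeps its meaning regardless of which other properties appear beside it.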

Hierarchy (top-down organization)

A: Military assets        B: Pets
   Guns, Dogs                Cats, Dogs

A / Military assets / Dogs ≠ B / Pets / Dogs

Caveats

Unless … there is a definition of dog and it can be used in either hierarchy

But if the meaning is defined by the hierarchy, the hierarchy is part of its meaning

Bottom-up organization

Dogs

Military assets Pets

“Dogs” has meaning on its own, and can be used in multiple contexts.
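
The contrast between the two approaches can be sketched in a few lines. In the top-down case an element is identified by its full path; in the bottom-up case it has its own identifier. The `ex:concept/...` identifiers are invented for this illustration.

```python
# Top-down: an element is identified by its full path, so the path is part of its meaning.
dogs_in_a = ("A", "Military assets", "Dogs")
dogs_in_b = ("B", "Pets", "Dogs")
same_top_down = dogs_in_a == dogs_in_b  # False: different paths, different elements

# Bottom-up: "Dogs" is defined once under its own identifier (invented here)
# and then referenced from any number of contexts.
DOGS = "ex:concept/dogs"
contexts = {
    "Military assets": {DOGS, "ex:concept/guns"},
    "Pets": {DOGS, "ex:concept/cats"},
}
shared = contexts["Military assets"] & contexts["Pets"]  # the one concept both contexts use
```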

Exercise 6: Mix & Match

Each group is assigned an entity to describe in metadata

Around the room are poster-sized depictions of various vocabularies and their definitions

Groups are instructed to study their task, determine what elements they need, then get up and look at the posters
  Getting up and contemplating the posters encourages conversation!
  Discussion: How do you decide what’s fit for purpose?

Overview of Training Plan

Feedback

Important questions as we continue to build this program
  Does the program plan seem useful? If not, what’s missing?
  Does the content of the session seem at an appropriate level? What could be improved?

What advice can you give about bringing this program to libraries?
  Is there a place for F2F training in your budgets?
  Would you pay for personalized online training for staff or local trainers?

Slide Credits: Karen Coyle

Diane Hillmann

Contact info: [email protected]

Metadata Matters:

http://managemetadata.com/blog
