diane-i-hillmann
Presented at the 2011 DLF Forum in Baltimore, Maryland.
Today’s Task
Part 1: Audiences, current training strategies, cost-effectiveness
Part 2: A taste of the training “From Metadata to a Web of Data”
Part 3: Structured feedback session
  Can you help us make this better?
DLF Forum, Nov. 2, 2011 2
Why Are We Doing This?
Increasing frustration with webinars
  Not particularly good for anything but introductions
  Very few opportunities for interaction or follow-up
One-day seminars at various institutions and conferences also seem limited in terms of participation
‘Older’ model of repeatable workshops (with a group of trainers) is still useful if tweaked
  Better opportunities for participation and learning
Goals
Offer direct training for libraries in a format that encourages participatory learning
  Building on the successful library workshop model is one option
Encourage other library organizations and conference planners to include training options in their regular meetings
  Generally requires members to lobby for workshops, pre-conferences, etc.
Part I: Intro to Metadata
Questions:
  Do we have a shared understanding of metadata?
  What are some of the practical definitions and modes of thinking that you can use in practice?
  What is the basis for understanding the technology context of today’s data?
Intro to Metadata
What is metadata?
  Not: data about data
  Instead: data with a purpose
    Constructed (human-made, artificial)
    Constructive (designed for a purpose, not theoretical)
    Computable (all metadata today will be used by computer applications as well as managed and understood by humans)
Exercise 1: Data With a Purpose
Each group has a book on the table. What metadata is needed for:
  A warehouse that will ship books to bookstores
  A brick-and-mortar bookstore that orders books, displays and sells them
  An online bookstore that will take orders and ship books to customers
Look over your lists: it will cost you $1 for every metadata field you create. If you use this field in your operation, you get back the $1.
  Have you changed your mind?
Part II: Understanding DATA
Goals:
  Understand the difference between data and text by thinking about computability
  Learn some basic data types
  Recognize data types in library data
Standard Data Types
Text – ‘text’ (we know this one!)
Defined data types:
  Date (& time)
  Currency
  Numbers (integers, etc.)
Controlled lists: finite sets of values to use
  Languages (ISO)
  Countries (ISO)
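The data types above can be made concrete with a small sketch. Everything named here (`BookData`, the field names, the three-item code sets) is an assumption for illustration; a real system would load the full ISO lists rather than these subsets.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

# Illustrative subsets only; a real system would load the full ISO code lists.
LANGUAGE_CODES = {"eng", "fre", "ger"}   # ISO 639-2 style codes
COUNTRY_CODES = {"US", "GB", "FR"}       # ISO 3166-1 alpha-2 style codes


@dataclass
class BookData:
    """Hypothetical record mixing text, defined types, and controlled lists."""
    title: str          # text
    published: date     # date
    price: Decimal      # currency amount
    pages: int          # number
    language: str       # controlled list
    country: str        # controlled list

    def __post_init__(self):
        # Controlled lists are checked against a finite set of allowed values.
        if self.language not in LANGUAGE_CODES:
            raise ValueError(f"unknown language code: {self.language!r}")
        if self.country not in COUNTRY_CODES:
            raise ValueError(f"unknown country code: {self.country!r}")


book = BookData("An Example Book", date(2011, 11, 2), Decimal("12.50"), 320, "eng", "US")

# Typed values can be computed with; the title is just text.
age_in_days = (date(2011, 12, 2) - book.published).days
cost_of_two = book.price * 2
```

The date and the price support arithmetic because they are typed; the same values stored as free text would first have to be parsed, which is where machine processing usually breaks down.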
Why Data?
Enables machine processing of amounts of data too large for humans to grasp (which is just about all of our information)
  Processing across patron files or a bibliographic database
  Processing on retrieved sets (e.g. extracting facets)
Enables libraries to move beyond ‘artisanal metadata’ towards more efficient and cost-effective assignment of tasks to humans and machines
  Comes with new sources of data and new collaborations
Data Use Examples
Making decisions
  If user for more than 5 years, then …
  If book height greater than x, then …
Making connections
  These books have the same author
  These books have the same (or a similar) topic
  These CDs have the same orchestra
  This place of publication has lat/long info and can be located on a map
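A minimal sketch of both uses, assuming made-up records: `is_long_term_user` is one possible reading of the "more than 5 years" rule, and `group_titles_by` shows a connection made through a shared author identifier. All names and values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def is_long_term_user(joined: date, today: date, years: int = 5) -> bool:
    """Decision: has this user been registered for more than `years` years?"""
    try:
        anniversary = joined.replace(year=joined.year + years)
    except ValueError:
        # Joined on Feb 29 and the target year is not a leap year.
        anniversary = joined.replace(year=joined.year + years, day=28)
    return today > anniversary


def group_titles_by(records, key):
    """Connection: collect titles that share the same value for `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["title"])
    return dict(groups)


# Hypothetical records; the author identifiers are invented for this sketch.
books = [
    {"title": "Book A", "author_id": "author/1"},
    {"title": "Book B", "author_id": "author/2"},
    {"title": "Book C", "author_id": "author/1"},
]

by_author = group_titles_by(books, "author_id")
veteran = is_long_term_user(date(2005, 1, 15), today=date(2011, 11, 2))
newcomer = is_long_term_user(date(2008, 6, 1), today=date(2011, 11, 2))
```

Both functions only work because the join date is a date and the author is an identifier; neither would be reliable on free text.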
Things: What Your Metadata Talks About
Book
Author
Place
Person (in subject)
Historical period
All of these exist outside your metadata, and are independent of it
  You can talk about these ‘things’ in many different contexts
If you assign them identifiers that can be shared with others, then you have a ‘thing’ or entity
  Things become points of connection between metadata descriptions (e.g., all books by the same author)
Strings: Limited Connections
Metadata statements using strings don’t represent (to machines) something outside the metadata
  They aren’t linkable to other things or strings
  They often can’t be effectively parsed by machines
Transcribed data in traditional library metadata is often ‘strings’
  Titles are good examples
Some strings are intended to identify something else (controlled author names, for instance) but may be used for display as well
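To see why strings give limited connections, here is a small sketch with hypothetical records. The same author appears under three different strings in one set and under a single made-up identifier (`http://example.org/person/42`) in the other.

```python
# Hypothetical records: one author, transcribed three different ways.
string_records = [
    {"title": "Book A", "author": "Doe, John"},
    {"title": "Book B", "author": "John Doe"},
    {"title": "Book C", "author": "Doe, J."},
]

# The same books, with the author given as a shared (invented) identifier.
AUTHOR = "http://example.org/person/42"
thing_records = [
    {"title": "Book A", "author": AUTHOR},
    {"title": "Book B", "author": AUTHOR},
    {"title": "Book C", "author": AUTHOR},
]


def titles_by(records, author):
    """Return the titles whose author value matches exactly."""
    return [r["title"] for r in records if r["author"] == author]


found_by_string = titles_by(string_records, "John Doe")
found_by_thing = titles_by(thing_records, AUTHOR)
```

A machine comparing strings finds one of the three books; comparing identifiers finds all three without any guessing about name forms.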
Exercise 2: Things & Strings
Start with a simple file
Each group has a ‘record’ (BBC, etc.—not MARC)
A general description is provided of the purpose of the data
Tasks:
  Pick out the strings and things in your example
  Bonus points: any data types?
  Reporting by groups and discussion
Identifiers
Uniquely identify a variety of resources
  On the web they use http and domain names
Advantages
  Language independent
  Display independent
  Unambiguous
Usage should be oriented towards machines, hidden from humans
  Humans have different requirements
Identifiers: What They Identify
Easier to attach an identifier than understand what it actually identifies
  ISBN – identifies publisher’s product
  LCCN – identifies LC-created metadata; ≠ ISBN, even though it may have very similar metadata to the publisher’s
  DOI – identifies item in DOI system, but may link to a general sales page
Identifiers must …
Be unique within a domain (private db; web)
Be consistent (identifier must always ID the same thing; DO NOT RE-USE!)
Be persistent (must live as long as thing it identifies)
Be in a standard format
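Some of these rules can be enforced in software. The sketch below is an assumption-laden illustration, not a real service: `IdentifierRegistry` and the `http://example.org/id` base are invented. It covers uniqueness, consistency, and a standard format; persistence is a policy commitment that code alone cannot show.

```python
import itertools


class IdentifierRegistry:
    """Hypothetical registry that mints identifiers in one fixed format."""

    def __init__(self, base):
        self.base = base.rstrip("/")
        self._numbers = itertools.count(1)   # only counts upward, so no re-use
        self._assigned = {}                  # identifier -> thing it identifies

    def mint(self, thing):
        """Create a new identifier for `thing`, unique within this registry."""
        identifier = f"{self.base}/{next(self._numbers)}"
        self._assigned[identifier] = thing
        return identifier

    def assign(self, identifier, thing):
        """Record an identifier, refusing to point it at a different thing."""
        current = self._assigned.get(identifier)
        if current is not None and current != thing:
            raise ValueError(f"{identifier} already identifies something else")
        self._assigned[identifier] = thing

    def resolve(self, identifier):
        """Look up the thing an identifier stands for."""
        return self._assigned[identifier]


registry = IdentifierRegistry("http://example.org/id/")
first = registry.mint("Book A")
second = registry.mint("Book B")
```

Calling `registry.assign(first, "Book C")` raises `ValueError`, which is the "DO NOT RE-USE" rule expressed as a check.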
Note on “Consistent”
The same thing may have more than one identifier – this happens naturally in the creation of metadata. It’s not a huge problem as long as you have a way of saying that:
A = B
… so that you can bring together the identifiers for the same thing. (cf. VIAF; also xISBN)
This is the basis for mapping between vocabularies so that metadata can be more easily re-used
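One way to bring identifiers together from "A = B" statements is to treat them as links and collect everything that is connected. This is a sketch using a standard union-find structure with invented identifiers; it does not describe how VIAF or xISBN are implemented.

```python
def merge_equivalents(pairs):
    """Group identifiers into clusters, given (a, b) pairs meaning 'a = b'."""
    parent = {}

    def find(x):
        # Follow parent links to the representative of x's cluster.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # shorten the path as we go
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a         # merge the two clusters

    clusters = {}
    for identifier in list(parent):
        clusters.setdefault(find(identifier), set()).add(identifier)
    return list(clusters.values())


# Hypothetical identifiers from three different systems.
statements = [
    ("lib-a:123", "lib-b:xyz"),
    ("lib-b:xyz", "lib-c:77"),
    ("lib-a:999", "lib-c:5"),
]
clusters = merge_equivalents(statements)
```

Because equivalence is transitive, `lib-a:123` and `lib-c:77` end up in the same cluster even though no statement links them directly.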
Identifier Readability
Opaque: no meaning to the identifier, ex.: LCCN example (just a number)
Readable: makes sense to a human, ex.: Wikipedia page IDs (include page name or partial page name)
Can be both: system can add readable bit to opaque identifier, ex.: Open Library thing IDs
Choices here are controversial, and have a big impact on multilingual efforts
The Open World Assumption
“The open world assumption (OWA) is used in knowledge representation to codify the informal notion that in general no single agent or observer has complete knowledge, and therefore cannot make the closed world assumption.”
--Wikipedia
Things with relationships to other things
Thing -- Relationship -- Thing
Subject -- Predicate (verb) -- Object
Subject and Predicate Must Be URIs
Object can be a URI or a "string"; a URI is a thing
Some examples:
  book -- has author -- [lcname#]
  book -- has author -- "John Doe"
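The rule can be written down as a small data structure. This is a sketch in plain Python, not the API of any RDF library; `URI`, `Literal`, `make_triple`, and the `example.org` addresses are all invented for illustration.

```python
from typing import NamedTuple, Union


class URI(str):
    """A thing: an identifier that other statements can point to."""


class Literal(str):
    """A string: a value with nothing behind it to link to."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


def make_triple(subject, predicate, obj):
    """Build a triple, enforcing that subject and predicate are URIs."""
    if not isinstance(subject, URI) or not isinstance(predicate, URI):
        raise TypeError("subject and predicate must be URIs")
    if not isinstance(obj, (URI, Literal)):
        raise TypeError("object must be a URI or a Literal")
    return Triple(subject, predicate, obj)


# Hypothetical identifiers standing in for real ones.
BOOK = URI("http://example.org/book/1")
HAS_AUTHOR = URI("http://example.org/vocab/hasAuthor")

linked = make_triple(BOOK, HAS_AUTHOR, URI("http://example.org/person/42"))
unlinked = make_triple(BOOK, HAS_AUTHOR, Literal("John Doe"))
```

Both statements are valid, but only `linked` gives a machine something to follow; passing a `Literal` as the subject raises `TypeError`.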
[diagram that shows this -- i have a slide]
Triples or Graphs?
Machines work with triples
  Statements about the same thing have the same subject
Graphs are easier for humans to understand
  In libraries we’re not used to visualizing data as graphs
  More used to databases, files, hierarchies
Making this new world work for us is as much about changing how we think as it is changing what we do
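Triples and graphs are two views of the same data. The sketch below groups a flat list of triples by subject and then walks the result as a graph. The `ex:` identifiers and the two helper functions are assumptions for illustration; values without the `ex:` prefix stand for plain strings.

```python
from collections import defaultdict

# Hypothetical triples as (subject, predicate, object).
triples = [
    ("ex:book1", "ex:hasTitle", "A Title"),
    ("ex:book1", "ex:hasAuthor", "ex:person42"),
    ("ex:book2", "ex:hasAuthor", "ex:person42"),
    ("ex:person42", "ex:hasName", "John Doe"),
]


def to_graph(triples):
    """Group statements by subject: subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


def reachable(graph, start):
    """Every node that can be reached from `start` by following statements."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(obj for _, obj in graph.get(node, []))
    return seen


graph = to_graph(triples)
from_book1 = reachable(graph, "ex:book1")
dead_ends = {node for node in from_book1 if node not in graph}
```

Starting from `ex:book1`, the walk continues through `ex:person42` because it is a thing with statements of its own, and stops at the two strings.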
http://milicicvuk.com/blog/2011/10/04/the-web-is-just-a-bunch-of-trees-plus-shorcuts/
“Graph Thinking”
Graph relationships are different than tree relationships …
Exercise 3: Statements
Present a set of triples and ask participants to turn them into sentences
  Ex.: Book has title ‘Moby Dick’
  Ex.: Book has author [lcna] or ‘Herman Melville’
  Ex.: Author has death date XXXX
Suggest participants try drawing graphs to represent statements with the same subject
Suggest that participants represent how ‘strings’ create dead ends and ‘things’ can be linked
Exercise 4: Statements
Give each group a web page with a description
Ask them to organize the data as statements
See if the site you are using has data for persons, subjects or places
Discussion
  How hard was it to find the ‘things’?
  Did you always have the predicates you needed?
  How different is this from today’s metadata?
Properties and Classes
Record-based metadata is often in the form of ‘records’, using elements from only one schema
Statement-based metadata is often more flexible
  Proper declaration, definition and management of the elements is very important
  Mix and match is part of the value
Some current schemas might find the transition from records to statements more challenging
  Especially where the definition of the property depends on its place in a hierarchy (MODS and ONIX for example)
Hierarchy (top-down organization)
A: Military assets
  Guns
  Dogs
B: Pets
  Cats
  Dogs
A > Military assets > Dogs ≠ B > Pets > Dogs
Caveats
Unless … there is a definition of dog and it can be used in either hierarchy
But if the meaning is defined by the hierarchy, the hierarchy is part of its meaning
Bottom-up organization
Dogs
Military assets Pets
“Dogs” has meaning on its own, and can be used in multiple contexts.
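The two ways of organizing can be compared in a few lines. This is a sketch under stated assumptions: paths are modelled as tuples, and the `example.org` vocabulary URIs are invented.

```python
# Top-down: an element is identified by its full path, so the hierarchy
# is part of its meaning.
military_dogs = ("A", "Military assets", "Dogs")
pet_dogs = ("B", "Pets", "Dogs")
same_label = military_dogs[-1] == pet_dogs[-1]    # both are labelled "Dogs"
same_element = military_dogs == pet_dogs          # but the paths differ

# Bottom-up: each concept is defined once under its own (hypothetical) URI
# and can then be used in as many contexts as needed.
DOGS = "http://example.org/vocab/Dogs"
GUNS = "http://example.org/vocab/Guns"
CATS = "http://example.org/vocab/Cats"

contexts = {
    "Military assets": {GUNS, DOGS},
    "Pets": {CATS, DOGS},
}
shared = contexts["Military assets"] & contexts["Pets"]
```

In the top-down case the two "Dogs" only look alike; in the bottom-up case `shared` contains the one `DOGS` definition that both contexts use.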
Exercise 6: Mix & Match
Each group is assigned an entity to describe in metadata
Around the room are poster-sized depictions of various vocabularies and their definitions
Groups are instructed to study their task, determine what elements they need, then get up and look at the posters
  Getting up and contemplating the posters encourages conversation!
  Discussion: How do you decide what’s fit for purpose?
Overview of Training Plan
Feedback
Important questions as we continue to build this program
  Does the program plan seem useful? If not, what’s missing?
  Does the content of the session seem at an appropriate level? What could be improved?
What advice can you give about bringing this program to libraries?
  Is there a place for F2F training in your budgets?
  Would you pay for personalized online training for staff or local trainers?
Slide Credits: Karen Coyle
Diane Hillmann
Contact info: [email protected]
Metadata Matters:
http://managemetadata.com/blog