IT’S ALL ABOUT THE METADATA Shana L. McDanold June 10, 2014 1

It's All About the Metadata

DESCRIPTION

Updated and replaced with a new file in June 2014. A simplified presentation on the evolution of library metadata, the perils of not curating metadata properly, and how it is being used "in the wild". But... it’s all on the internet and a keyword search will find it, right? Not exactly. Cataloging in libraries has changed massively with the rise of the internet. Everything is connected, including our metadata. Catalogers are no longer isolated, and metadata management is no longer just an internal process. Everything we do now links to the wider world of metadata, pushing libraries to repurpose our long-standing work for the new frontiers of identity management and linked data.

Page 1: It's All About the Metadata

IT’S ALL ABOUT THE METADATA

Shana L. McDanold

June 10, 2014

Page 2: It's All About the Metadata

WHY DOES METADATA MATTER? – GEORGE

Search: Koran Search: Quran

Page 3: It's All About the Metadata

WHY DOES METADATA MATTER? – ONESEARCH

Search: Koran Search: Quran

Page 4: It's All About the Metadata

WHY DOES METADATA MATTER? – GEORGE

Search: 9/11 Search: 9-11

Page 5: It's All About the Metadata

WHY DOES METADATA MATTER? – ONESEARCH

Search: 9/11 Search: 9-11

Page 6: It's All About the Metadata

WHY DOES METADATA MATTER? – DIGITALGEORGETOWN

Author Index Subject Index

Page 7: It's All About the Metadata

WHY DOES METADATA MATTER?

“This town built a memorial to the wrong guy” Ottawa, Canada

“It’s the metadata, stupid: and it’s not just for your audience” (Joshua Lasky, posted 5/21/2014)

“To succeed in the digital age is to be able to easily aggregate all of your articles in the most meaningful way for each of your visitors. Competitors such as Circa actively use metadata to surface relevant content during breaking news events.”

Page 8: It's All About the Metadata

WHY DOES METADATA MATTER?

What are we trying to identify? OR What are people trying to find?
  Works
  Individuals
  Places
  Things/objects
  Concepts
Discovery and discovery enhancement
Relationships
“On the fly” collections of resources
Users start elsewhere

Page 9: It's All About the Metadata

WHAT DO WE DO WHEN WE CURATE [CREATE] METADATA?

Create and enhance descriptive metadata
Apply controlled vocabularies
Disambiguation of works, authors, etc.
Unique identification of editions, works, etc.
Collocation of editions, works, etc.
Use agreed-upon standards for data elements to ensure consistent application/use
  MARC
  DigitalGeorgetown (Dublin Core)
  RDF (Resource Description Framework)
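
Applying a controlled vocabulary can be pictured as a lookup from variant forms to one preferred heading, so that records described with different spellings collocate. The sketch below is only an illustration under stated assumptions: the preferred headings, the variant table, and the record titles are made up for the example, and only the search terms (Koran/Quran, 9/11/9-11) come from the earlier slides.

```python
# Hypothetical variant-to-preferred table. A real catalog would draw this
# from an authority file rather than a hand-written dict.
VARIANT_TO_PREFERRED = {
    "koran": "Qur'an",
    "quran": "Qur'an",
    "9/11": "September 11 Terrorist Attacks, 2001",
    "9-11": "September 11 Terrorist Attacks, 2001",
}


def preferred_heading(term: str) -> str:
    """Return the preferred heading for a term, or the term itself if unmapped."""
    cleaned = term.strip()
    return VARIANT_TO_PREFERRED.get(cleaned.lower(), cleaned)


def collocate(records: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (title, subject term) pairs under their preferred heading."""
    grouped: dict[str, list[str]] = {}
    for title, subject in records:
        grouped.setdefault(preferred_heading(subject), []).append(title)
    return grouped


# Made-up records that use different spellings for the same subject.
records = [
    ("Record A", "Koran"),
    ("Record B", "Quran"),
    ("Record C", "9/11"),
    ("Record D", "9-11"),
]
grouped = collocate(records)
```

With this table, the four records fall under two headings instead of four, which is the collocation effect a plain keyword search does not provide.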

Page 10: It's All About the Metadata

HOW DO WE EXPOSE “OUR” METADATA?

Controlled vocabulary and mapping
  Genres
  Subjects/Concepts
  Classification
Identification:
  People
  Places/Geographic
  Works
OWL (Web Ontology Language)
SKOS (Simple Knowledge Organization System)
Normalization
Indexing
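
Normalization and indexing can be sketched together: the same normalization is applied to the stored label and to the query, and the index is keyed on the normalized form. This is a minimal sketch, not how any particular discovery system works; the normalization rules and the record ids are assumptions chosen for the example.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip accents, and drop punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", "", without_marks.lower())


def build_index(records: dict[str, str]) -> dict[str, set[str]]:
    """Map each normalized key to the ids of the records that carry it."""
    index: dict[str, set[str]] = defaultdict(set)
    for record_id, label in records.items():
        index[normalize(label)].add(record_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Look up a query after applying the same normalization as the index."""
    return index.get(normalize(query), set())


# Made-up record ids and labels.
records = {"r1": "9/11", "r2": "9-11", "r3": "Qur'an"}
index = build_index(records)
```

Here `search(index, "9/11")` and `search(index, "9-11")` reach the same records, because both normalize to the same key. `search(index, "Koran")` still finds nothing: normalization handles punctuation and case, while spelling variants need the controlled vocabulary mapping. Aggressive normalization is also a trade-off, since unrelated strings can collapse to the same key.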

Page 11: It's All About the Metadata

OWL: WEB ONTOLOGY LANGUAGE

Utilizes RDF (Resource Description Framework)

5.2 Individual identity

Many languages have a so-called "unique names" assumption: different names refer to different things in the world. On the web, such an assumption is not possible. For example, the same person could be referred to in many different ways (i.e. with different URI references). For this reason OWL does not make this assumption. Unless an explicit statement is being made that two URI references refer to the same or to different individuals, OWL tools should in principle assume either situation is possible.

OWL provides three constructs for stating facts about the identity of individuals:

owl:sameAs is used to state that two URI references refer to the same individual.

owl:differentFrom is used to state that two URI references refer to different individuals

owl:AllDifferent provides an idiom for stating that a list of individuals are all different.
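
The consequence of dropping the unique names assumption can be sketched with a small store that answers "same", "different", or "unknown" for any two URIs. This is an illustrative sketch, not an OWL reasoner: the class and method names are hypothetical, the `example.org` URIs are placeholders, and only the three constructs named above are modeled.

```python
class IdentityStore:
    """Tracks owl:sameAs and owl:differentFrom statements between URIs."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}
        self._different: list[tuple[str, str]] = []

    def _find(self, uri: str) -> str:
        """Return the representative of the sameAs group containing uri."""
        self._parent.setdefault(uri, uri)
        while self._parent[uri] != uri:
            self._parent[uri] = self._parent[self._parent[uri]]  # path halving
            uri = self._parent[uri]
        return uri

    def same_as(self, a: str, b: str) -> None:
        """State that two URIs refer to the same individual."""
        self._parent[self._find(a)] = self._find(b)

    def different_from(self, a: str, b: str) -> None:
        """State that two URIs refer to different individuals."""
        self._different.append((a, b))

    def all_different(self, uris: list[str]) -> None:
        """State pairwise difference for a list, like owl:AllDifferent."""
        for i, a in enumerate(uris):
            for b in uris[i + 1:]:
                self.different_from(a, b)

    def relation(self, a: str, b: str) -> str:
        """Return 'same', 'different', or 'unknown' for two URIs."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a == root_b:
            return "same"
        for x, y in self._different:
            if {self._find(x), self._find(y)} == {root_a, root_b}:
                return "different"
        return "unknown"  # no unique names assumption: either is possible


# Placeholder URIs for the example.
A, B, C, D, E = (f"http://example.org/id/{name}" for name in "abcde")
store = IdentityStore()
store.same_as(A, B)
store.same_as(B, C)
store.different_from(A, D)
```

In this sketch `store.relation(C, D)` is "different" because C is in the same group as A, while `store.relation(A, E)` stays "unknown" since nothing has been stated about E.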

Page 12: It's All About the Metadata

SKOS: SIMPLE KNOWLEDGE ORGANIZATION SYSTEM

Utilizes RDF (Resource Description Framework)

2.3 Semantic Relationships

In KOSs semantic relations play a crucial role for defining concepts. The meaning of a concept is defined not just by the natural-language words in its labels but also by its links to other concepts in the vocabulary. Mirroring the fundamental categories of relations that are used in vocabularies such as thesauri [ISO2788], SKOS supplies three standard properties:

skos:broader and skos:narrower enable the representation of hierarchical links, such as the relationship between one genre and its more specific species, or, depending on interpretations, the relationship between one whole and its parts;

skos:related enables the representation of associative (non-hierarchical) links, such as the relationship between one type of event and a category of entities which typically participate in it. Another use for skos:related is between two categories where neither is more general or more specific. Note that skos:related enables the representation of associative (non-hierarchical) links, which can also be used to represent part-whole links that are not meant as hierarchical relationships.
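
The three properties can be sketched as a small in-memory concept scheme. This is a simplified illustration rather than an RDF implementation: the class is hypothetical and the concept labels are invented for the example. It records `skos:narrower` as the inverse of `skos:broader`, stores `skos:related` in both directions because the property is symmetric, and walks broader links upward in the way `skos:broaderTransitive` is meant to support.

```python
from collections import defaultdict


class ConceptScheme:
    """A tiny in-memory stand-in for a SKOS concept scheme."""

    def __init__(self) -> None:
        self.broader: dict[str, set[str]] = defaultdict(set)
        self.narrower: dict[str, set[str]] = defaultdict(set)
        self.related: dict[str, set[str]] = defaultdict(set)

    def add_broader(self, concept: str, parent: str) -> None:
        """Record concept skos:broader parent, plus the inverse skos:narrower."""
        self.broader[concept].add(parent)
        self.narrower[parent].add(concept)

    def add_related(self, a: str, b: str) -> None:
        """Record an associative link in both directions."""
        self.related[a].add(b)
        self.related[b].add(a)

    def ancestors(self, concept: str) -> set[str]:
        """Collect every concept reachable by following broader links."""
        seen: set[str] = set()
        stack = list(self.broader.get(concept, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.broader.get(current, ()))
        return seen


# Invented labels, for illustration only.
scheme = ConceptScheme()
scheme.add_broader("Tragedies", "Drama")
scheme.add_broader("Drama", "Literature")
scheme.add_related("Drama", "Theater")
```

A search interface could use `scheme.ancestors("Tragedies")` to broaden a query, and `scheme.related["Drama"]` to suggest associated concepts that are neither broader nor narrower.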

Page 13: It's All About the Metadata

CURATED METADATA IN THE WILD – LIBRARY OF CONGRESS

Library of Congress data exposed as linked data

“The Library of Congress Linked Data Service enables both humans and machines to programmatically access authority data at the Library of Congress. This service is influenced by -- and implements -- the Linked Data movement's approach of exposing and inter-connecting data on the Web via dereferenceable URIs.”
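
Programmatic access to a dereferenceable URI usually means asking for a machine-readable representation instead of the HTML page. The sketch below only builds such a request and does not send it. It rests on two assumptions that should be checked against the service's own documentation: that dropping the `.html` suffix from the page address gives the URI of the authority record, and that the service honors an `Accept` header for the media type shown. The function name is hypothetical; the address is the LC name authority link cited later in this deck.

```python
from urllib.request import Request


def authority_request(html_url: str, media_type: str = "application/rdf+xml") -> Request:
    """Build (but do not send) a request for a machine-readable authority record."""
    uri = html_url.removesuffix(".html")  # assumption: the bare URI is dereferenceable
    return Request(uri, headers={"Accept": media_type})


request = authority_request("http://id.loc.gov/authorities/names/n79007035.html")
```

Passing `request` to `urllib.request.urlopen` would perform the actual fetch; whether RDF/XML comes back depends on the assumptions above.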

Page 14: It's All About the Metadata

CURATED METADATA IN THE WILD - WORLDCAT

Bibliographic records

Page 15: It's All About the Metadata

CURATED METADATA IN THE WILD - WORLDCAT

Google searches!

Page 16: It's All About the Metadata

CURATED METADATA IN THE WILD - OTHERS

Wikipedia/DBpedia

WorldCat: links to WorldCat Identities http://www.worldcat.org/identities/lccn-n79-007035/

LCCN: links to LC National Authority File (NAF) http://id.loc.gov/authorities/names/n79007035.html

VIAF record https://viaf.org/viaf/88919448/

ISNI (International Standard Name Identifier) record http://isni-url.oclc.nl/isni/0000000121429031
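
The WorldCat Identities and LC links above appear to carry the same control number in two spellings (`lccn-n79-007035` and `n79007035`). A crosswalk between them can be sketched as below. The rule used here (drop the `lccn-` prefix and the hyphens) is inferred only from those two URLs and is an assumption, not a documented mapping; the function names are hypothetical.

```python
from urllib.parse import urlparse


def last_path_segment(url: str) -> str:
    """Return the final path segment of a URL, ignoring a trailing slash."""
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]


def worldcat_key_to_lccn(key: str) -> str:
    """Turn a key such as 'lccn-n79-007035' into 'n79007035' (assumed rule)."""
    if not key.startswith("lccn-"):
        raise ValueError(f"not an LCCN-based key: {key!r}")
    return key[len("lccn-"):].replace("-", "")


def same_lccn(worldcat_url: str, lc_url: str) -> bool:
    """Check whether a WorldCat Identities URL and an LC URL share an LCCN."""
    from_worldcat = worldcat_key_to_lccn(last_path_segment(worldcat_url))
    from_lc = last_path_segment(lc_url).removesuffix(".html")
    return from_worldcat == from_lc


match = same_lccn(
    "http://www.worldcat.org/identities/lccn-n79-007035/",
    "http://id.loc.gov/authorities/names/n79007035.html",
)
```

A shared identifier like this is what lets separate services be linked to the same identity without comparing name strings.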

Page 17: It's All About the Metadata

CURATED METADATA IN THE WILD - OTHERS

Wikipedia/DBpedia
Disambiguation: http://en.wikipedia.org/w/index.php?title=Category:All_disambiguation_pages
Identity management:
  John Smith http://en.wikipedia.org/wiki/John_Smith
  St. Mary’s Church http://en.wikipedia.org/wiki/St._Mary%27s_Church
  Georgetown http://en.wikipedia.org/wiki/Georgetown
  Hamlet http://en.wikipedia.org/wiki/Hamlet_(disambiguation)

Page 18: It's All About the Metadata

CURATED METADATA IN THE WILD - OTHERS

“MARC 21 records for CONSER serials either cataloged or processed by LC or by CONSER (Cooperative Online Serials Program) participants. Also includes records with ISSN assignments and U.S. Newspaper Program cataloging. Records include all languages. Available in MARC 21 and MARCXML formats.”

eCIP CONSER

Page 19: It's All About the Metadata

BUILDING CURATED METADATA: OTHER OPTIONS

Crowd sourcing
  Archives and Alumni
    Identification of individuals for identity control
  Penn Provenance project
    “We are trying to identify former owners and virtually reunite dispersed collections, and we welcome any information you have about the images posted here.”
    Incorporate data into records; establish identities
    https://www.flickr.com/photos/58558794@N07

Page 20: It's All About the Metadata

CONCLUSION

It all comes back to the basics of metadata work:
  DESCRIPTION
  COLLOCATION
  DISAMBIGUATION (uniquely identifiable)
  RELATIONSHIPS