November 22, 2003 DASER Conference. Copyright MIT, 2003 1
METS: Metadata Encoding & Transmission Standard
Part One: Problem definition
Digital (Library) Objects
• Reformatted to digital
  – scanned photographs, books and journals
  – digitized audio/video files
• “Born digital”
  – TEI-encoded texts
  – digital images, audio, video files
  – GIS, statistical datasets
  – interactive content
Digital (Library) Objects
• Simple Objects
  – single files, e.g.
    • visual TIFF images
    • MP3 files
    • TEI-encoded text
  – objects stand alone
    • no relationships to other objects
Digital (Library) Objects
• Complex Objects
  – multiple related files, e.g.
    • page images from books or articles
    • multiple channels in digital audio files
    • related sound and text files (multimedia)
    • statistical dataset and codebook
  – objects cannot stand alone
    • multiple files required to interpret the object
    • requires structural metadata to model
Structural metadata
• Maps physical files (digital assets) to logical items (complex digital objects)
• Examples
  – Scanned print material
    • complex publication structures (e.g. journal runs)
    • ordered relationship between digital page images
  – A/V material
    • multiple resolutions of an image
    • multiple channels of an audio file
Structural metadata
• Examples, continued
  – Multimedia presentations
    • relationship between images, text, sound, video, etc. (time-based or other)
  – Web sites
    • linkages between web pages
    • sitemaps
  – Databases
    • table models and ER diagrams
Digital (Library) Objects
• Also have other (non-structural) metadata
  – descriptive
    • MARC, DC, FGDC, VRA core, other ontologies
  – administrative
    • rights, provenance
  – technical
    • format details, OAIS “representation information”
• Standards exist or are emerging for these
Part Two: Introduction to METS
METS Scope
• Supports
  – Structural metadata
    • complex reformatted or born digital objects
  – Metadata wrapper framework
    • descriptive, administrative, structural, etc.
    • structural required
    • others use namespaces to reference “extension schemas”
Brief History
• 1997-2001 Making Of America II project
  – Funded by DLF and NEH
  – Included Berkeley, Cornell, NYPL, Penn State, Stanford, U of Michigan
  – Designed for scanned archival collections
  – SGML DTD included pre-defined descriptive, administrative, structural metadata
• February 2001 DLF workshop on structural metadata produced METS framework
METS metadata “buckets”
• METS Header (optional)
• Descriptive metadata (optional)
• Administrative metadata (optional)
• File Inventory (optional)
• Structure map (required)
• Behavioral metadata (optional)
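The required/optional split on this slide can be checked mechanically. The following is a minimal sketch, assuming the element names and namespace URI of the published METS schema (`metsHdr`, `dmdSec`, `amdSec`, `fileSec`, `structMap`, `behaviorSec`); the function names and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"  # namespace URI of the METS schema

# Per the slide, only the structure map is required.
REQUIRED = ["structMap"]
OPTIONAL = ["metsHdr", "dmdSec", "amdSec", "fileSec", "behaviorSec"]


def section_report(xml_text):
    """Count each top-level METS section ("bucket") in a document."""
    root = ET.fromstring(xml_text)
    return {name: len(root.findall(f"{{{METS_NS}}}{name}"))
            for name in REQUIRED + OPTIONAL}


def missing_required(xml_text):
    """List the required sections that do not appear at all."""
    report = section_report(xml_text)
    return [name for name in REQUIRED if report[name] == 0]


# Hypothetical documents: one with only a structure map, one empty.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/">
  <structMap><div LABEL="Whole object"/></structMap>
</mets>"""
EMPTY = '<mets xmlns="http://www.loc.gov/METS/"/>'

print(section_report(SAMPLE))
print(missing_required(SAMPLE))
print(missing_required(EMPTY))
```

A full check would validate against the schema itself; this sketch only counts sections.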
METS metadata
• XML “extension schemas”
  – descriptive metadata
    • Dublin Core, MARC, FGDC, VRA, etc.
    • Berkeley’s GDM schema (from MOA2)
  – administrative/technical metadata
    • NISO image technical metadata
    • LC schemas for A/V technical metadata
    • Rights metadata (e.g. PRISM, XrML, etc.)
    • Provenance metadata
METS Descriptive Metadata Section
Diagram: Descriptive Metadata contains Metadata Reference and Metadata Wrapper.
Metadata Reference (mdRef): A link to external descriptive metadata. The type of link (URN/Handle/etc.) is included as an attribute, as is the metadata type.
Metadata Wrapper (mdWrap): Included descriptive metadata, as either binary data (Base64 encoded) or arbitrary XML using the namespace mechanism. The metadata type is specified as an attribute.
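To make the two mechanisms concrete, here is a small sketch that reads one descriptive section of each kind. It assumes the schema's `LOCTYPE`, `MDTYPE` and `xlink:href` attributes and the `xmlData` child of `mdWrap`. The sample document, its identifiers, the URN and the `describe` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical document: dmd1 points at an external MARC record,
# dmd2 wraps a Dublin Core title inline.
DOC = f"""<mets xmlns="{METS}" xmlns:xlink="{XLINK}">
  <dmdSec ID="dmd1">
    <mdRef LOCTYPE="URN" MDTYPE="MARC" xlink:href="urn:x-example:record42"/>
  </dmdSec>
  <dmdSec ID="dmd2">
    <mdWrap MDTYPE="DC">
      <xmlData>
        <dc:title xmlns:dc="{DC}">Oral history interview (hypothetical)</dc:title>
      </xmlData>
    </mdWrap>
  </dmdSec>
</mets>"""


def describe(xml_text):
    """Return (section ID, kind, metadata type, link or title) tuples."""
    root = ET.fromstring(xml_text)
    out = []
    for sec in root.findall(f"{{{METS}}}dmdSec"):
        ref = sec.find(f"{{{METS}}}mdRef")
        wrap = sec.find(f"{{{METS}}}mdWrap")
        if ref is not None:
            # External metadata: only the link and its type are stored.
            out.append((sec.get("ID"), "external", ref.get("MDTYPE"),
                        ref.get(f"{{{XLINK}}}href")))
        elif wrap is not None:
            # Embedded metadata: the foreign-namespace XML sits inside xmlData.
            title = wrap.find(f".//{{{DC}}}title")
            out.append((sec.get("ID"), "embedded", wrap.get("MDTYPE"),
                        title.text if title is not None else None))
    return out


for row in describe(DOC):
    print(row)
```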
METS Administrative Metadata Section
Diagram: Administrative Metadata contains Technical Metadata, IP Rights Metadata, Source Metadata and Preservation Metadata.
Technical Metadata (techMD): technical metadata regarding content files
IP Rights Metadata (rightsMD): rights metadata regarding content files or primary source material
Source Metadata (sourceMD): provenance information for content files
Preservation Metadata (preservationMD): metadata to assist in preservation of digital content
All sections use generic metadata reference and wrapper subelements.
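One way these sections get used is by linking a content file to them by ID. The sketch below assumes the schema's `ADMID` attribute on `file`, a space-separated list of section IDs. The sample document, the `urn:x-example:notes` namespace, the note text and the `admin_for_file` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
EX = "urn:x-example:notes"  # hypothetical extension namespace for the notes

DOC = f"""<mets xmlns="{METS}">
  <amdSec>
    <techMD ID="tech1">
      <mdWrap MDTYPE="OTHER"><xmlData>
        <note xmlns="{EX}">44.1 kHz, 16 bit (illustrative)</note>
      </xmlData></mdWrap>
    </techMD>
    <rightsMD ID="rights1">
      <mdWrap MDTYPE="OTHER"><xmlData>
        <note xmlns="{EX}">Campus use only (illustrative)</note>
      </xmlData></mdWrap>
    </rightsMD>
  </amdSec>
  <fileSec>
    <fileGrp>
      <file ID="f1" MIMETYPE="audio/x-wav" ADMID="tech1 rights1"/>
    </fileGrp>
  </fileSec>
</mets>"""


def admin_for_file(xml_text, file_id):
    """Resolve a file's ADMID references to (section kind, ID, note text)."""
    root = ET.fromstring(xml_text)
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID")}
    result = []
    for ref in by_id[file_id].get("ADMID", "").split():
        section = by_id[ref]
        kind = section.tag.split("}")[1]  # drop the "{namespace}" prefix
        note = section.find(f".//{{{EX}}}note")
        result.append((kind, ref, note.text))
    return result


for row in admin_for_file(DOC, "f1"):
    print(row)
```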
METS File Inventory
Diagram: File Inventory (File Group) contains nested File Groups, each containing Files, etc.
File Group (fileGrp): provides a mechanism for hierarchically subdividing physical files, for example by type
File (file): provides a pointer to an external file (FLocat) or includes file content internally (FContent) in Base64 encoding
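The hierarchical grouping can be flattened into a simple listing. This is a sketch only, assuming the schema spellings `fileSec`, `fileGrp`, `file` and `FLocat`, the `USE` attribute on groups, and `xlink:href` on `FLocat`. The sample document, the example.org URLs and the `inventory` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical inventory: one group of page images split by type.
DOC = f"""<mets xmlns="{METS}" xmlns:xlink="{XLINK}">
  <fileSec>
    <fileGrp USE="page images">
      <fileGrp USE="master">
        <file ID="m1" MIMETYPE="image/tiff">
          <FLocat LOCTYPE="URL" xlink:href="http://example.org/p1.tif"/>
        </file>
      </fileGrp>
      <fileGrp USE="thumbnail">
        <file ID="t1" MIMETYPE="image/jpeg">
          <FLocat LOCTYPE="URL" xlink:href="http://example.org/p1.jpg"/>
        </file>
      </fileGrp>
    </fileGrp>
  </fileSec>
</mets>"""


def inventory(xml_text):
    """Return (group path, file ID, MIME type, location) for every file."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(group, path):
        path = path + [group.get("USE", "?")]
        for f in group.findall(f"{{{METS}}}file"):
            loc = f.find(f"{{{METS}}}FLocat")
            rows.append(("/".join(path), f.get("ID"), f.get("MIMETYPE"),
                         loc.get(f"{{{XLINK}}}href")))
        # File groups may nest, so recurse into child groups.
        for sub in group.findall(f"{{{METS}}}fileGrp"):
            walk(sub, path)

    for top in root.findall(f"{{{METS}}}fileSec/{{{METS}}}fileGrp"):
        walk(top, [])
    return rows


for row in inventory(DOC):
    print(row)
```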
METS Structural Map
Diagram: Structural Map contains a Division, which contains nested Divisions, each with METS Pointers and File Pointers, etc.
The Structural Map provides a tree structure describing the original document. Each division (div) element is a node in that tree, and can identify content files associated with that division by a METS Pointer (mptr) or a File Pointer (fptr).
METS Pointer and File Pointer
METS Pointer (mptr): xlink to another METS file containing the content for the associated div. Useful for breaking up large objects (e.g., a journal run) into a series of smaller METS documents.
File Pointer (fptr): Identifies one or more entries in the File Inventory section containing the content for the associated div element. Can also limit the link from a div element to a portion of a content file (e.g., a segment of an audio or video file, a subarea of an image or video file, etc.).
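As a rough illustration of how a structure map ties divisions to content, the sketch below walks the div tree, resolving each `fptr` to a file location and reporting each `mptr` target. It assumes the schema's `FILEID` attribute on `fptr` and `xlink:href` on `mptr` and `FLocat`. The sample journal, its URLs and the `outline` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical journal run: Issue 1 is described inline,
# Issue 2 is delegated to a separate METS document via mptr.
DOC = f"""<mets xmlns="{METS}" xmlns:xlink="{XLINK}">
  <fileSec><fileGrp>
    <file ID="p1" MIMETYPE="image/tiff">
      <FLocat LOCTYPE="URL" xlink:href="http://example.org/p1.tif"/>
    </file>
    <file ID="p2" MIMETYPE="image/tiff">
      <FLocat LOCTYPE="URL" xlink:href="http://example.org/p2.tif"/>
    </file>
  </fileGrp></fileSec>
  <structMap TYPE="physical">
    <div TYPE="journal" LABEL="Journal run">
      <div TYPE="issue" LABEL="Issue 1">
        <div TYPE="page" LABEL="Page 1"><fptr FILEID="p1"/></div>
        <div TYPE="page" LABEL="Page 2"><fptr FILEID="p2"/></div>
      </div>
      <div TYPE="issue" LABEL="Issue 2">
        <mptr LOCTYPE="URL" xlink:href="http://example.org/issue2-mets.xml"/>
      </div>
    </div>
  </structMap>
</mets>"""


def outline(xml_text):
    """Return an indented outline of the structure map with resolved targets."""
    root = ET.fromstring(xml_text)
    locations = {}
    for f in root.iter(f"{{{METS}}}file"):
        loc = f.find(f"{{{METS}}}FLocat")
        locations[f.get("ID")] = loc.get(f"{{{XLINK}}}href")
    lines = []

    def walk(div, depth):
        # File pointers resolve through the file inventory by ID.
        targets = [locations[p.get("FILEID")]
                   for p in div.findall(f"{{{METS}}}fptr")]
        # METS pointers name another METS document holding the content.
        targets += ["METS document " + m.get(f"{{{XLINK}}}href")
                    for m in div.findall(f"{{{METS}}}mptr")]
        suffix = " -> " + ", ".join(targets) if targets else ""
        lines.append("  " * depth + div.get("LABEL", "") + suffix)
        for child in div.findall(f"{{{METS}}}div"):
            walk(child, depth + 1)

    for top in root.findall(f"{{{METS}}}structMap/{{{METS}}}div"):
        walk(top, 0)
    return lines


print("\n".join(outline(DOC)))
```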
METS File Pointer Mechanisms
Diagram: File Pointer contains Parallel Files and Sequential Files, each containing one or more Areas.
File Pointer (fptr): Can identify a single file in File Inventory using ID/IDREF linking
Parallel/Sequential (par/seq): Allows a div to be associated with several content files that should be played/displayed in parallel (video with separate audio track file) or sequentially.
Area (area): identifies a point, linear segment, or 2D area within a content file that corresponds with the associated div element.
METS Area Element Attributes
FILE: ID for File element in File Inventory
SHAPE: As in HTML Area element
COORDS: As in HTML Area element
BEGIN: A start point within a file for defining a segment
END: An end point within a file for defining a segment
BETYPE: Begin/End type: IDREF, Byte Offset, or SMPTE time code
EXTENT: Length or duration of segment
EXTYPE: Extent Type: Bytes, or SMPTE
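The BEGIN/END pair with a SMPTE BETYPE can be turned into a playable segment length with simple arithmetic. The sketch below assumes `HH:MM:SS:FF` time codes and a non-drop-frame rate of 30 frames per second, which is an assumption since the slides do not state a frame rate. The `Area` class and function names are hypothetical, and the sample values are those used in the deck's structure example.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """A subset of the area attributes listed on the slide."""
    file: str
    begin: str
    end: str
    betype: str


def smpte_to_seconds(timecode, fps=30):
    """Convert HH:MM:SS:FF to seconds, assuming non-drop-frame counting."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds + frames / fps


def duration_seconds(area, fps=30):
    """Length of the segment between BEGIN and END."""
    if area.betype != "SMPTE":
        raise ValueError("this sketch only handles SMPTE begin/end values")
    return smpte_to_seconds(area.end, fps) - smpte_to_seconds(area.begin, fps)


question5 = Area(file="f1", begin="00:23:17:00", end="00:23:38:00", betype="SMPTE")
print(duration_seconds(question5))  # 21.0
```

Byte offsets and IDREF values would need their own handling, which is omitted here.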
Structure Example
<file ID="f1" MIMETYPE="audio/x-wav" SEQ="1">
  <FLocat LOCTYPE="URN">urn:x-nyu:violet42</FLocat>
</file>
<div N="5" LABEL="Question 5">
  <fptr>
    <seq>
      <area FILE="f1" BEGIN="00:23:17:00" END="00:23:38:00" BETYPE="SMPTE"/>
    </seq>
  </fptr>
</div>
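To show how a display application might read this fragment, here is a minimal sketch that follows each area back to its file. The attribute names (`FILE`, `N`) and the location given as element text follow the slide and may differ from the published schema. The element name is spelled `FLocat` as in the schema, and the `<fragment>` wrapper and `segments` helper are hypothetical additions needed to parse the two sibling elements.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, wrapped in a hypothetical root element so that
# the two sibling elements form one well-formed document.
FRAGMENT = """<fragment>
  <file ID="f1" MIMETYPE="audio/x-wav" SEQ="1">
    <FLocat LOCTYPE="URN">urn:x-nyu:violet42</FLocat>
  </file>
  <div N="5" LABEL="Question 5">
    <fptr>
      <seq>
        <area FILE="f1" BEGIN="00:23:17:00" END="00:23:38:00" BETYPE="SMPTE"/>
      </seq>
    </fptr>
  </div>
</fragment>"""


def segments(xml_text):
    """Return (div label, file URN, begin, end) for every area in every div."""
    root = ET.fromstring(xml_text)
    urns = {f.get("ID"): f.findtext("FLocat").strip() for f in root.iter("file")}
    found = []
    for div in root.iter("div"):
        for area in div.iter("area"):
            found.append((div.get("LABEL"), urns[area.get("FILE")],
                          area.get("BEGIN"), area.get("END")))
    return found


print(segments(FRAGMENT))
```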
Related standards: SMIL (W3C), MPEG-7 (ISO)
• Created for multimedia structural encoding
• SMIL has “time-based” orientation
  – for playing multimedia presentations
• Very complex
• May eventually be incorporated
Related standards: RDF (W3C)
• Also metadata wrapper framework
• Structural metadata could be supported, but doesn’t specify how…
• Opaque to use
  – No element semantics provided
  – element names deliberately meaningless
• Originally designed for descriptive metadata
Related standards: OAIS framework
METS and OAIS framework
• Submission Information Package (SIP)
  – METS as transfer syntax
• Dissemination Information Package (DIP)
  – METS as transfer syntax
  – METS as input to display applications
• Archival Information Package (AIP)
  – METS stored internally in an archive
Library Applications
• Digital Object transfer syntax
  – between systems
    • enables interoperability
  – between institutions
    • enables collection sharing
  – implements OAIS SIP/DIP/AIP
Library Applications
• Input to Digital Object delivery systems (aka “disseminators”)
  – Simple bit-streaming
  – XSL stylesheet
  – Custom program for complex digital object display
Part Three: METS Summary
METS summary
• Descriptive/technical/administrative metadata
  – not defined internally
  – points to external standard schemas
    • Dublin Core, MARC, MPEG-7, etc.
    • AES audio metadata
  – set of “best practice” schemas being identified
METS summary
• Structural metadata
  – defined internally and required
  – SMIL-lite
    • simple support for multimedia, audio/visual
    • SMIL may replace eventually
METS summary
• Current users include
  – UC Berkeley (archival collections)
  – Harvard (scanned print publications, e-journals)
  – Library of Congress (audio/visual collections)
  – British Library
  – RLG and OCLC
  – EU METAe project (historic newspapers)
  – Michigan State (oral history collections)
  – Univ of Virginia (FEDORA digital objects)
  – more daily...
METS summary
• Tools under development for
  – metadata capture
  – transformation
  – transfer
  – dissemination/display
• Profiles necessary for interoperation
  – Which extension schemas used?
  – How structure maps are organized…
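The two profile questions above can be answered for a given document by inspection. The following is a sketch, assuming the schema's `MDTYPE` attribute on `mdWrap` and `mdRef` and `TYPE` attributes on `structMap` and `div`. The sample document, its URN and the `profile_summary` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical document using Dublin Core and MARC, with a physical structure map.
DOC = f"""<mets xmlns="{METS}" xmlns:xlink="{XLINK}">
  <dmdSec ID="d1"><mdWrap MDTYPE="DC"><xmlData/></mdWrap></dmdSec>
  <dmdSec ID="d2">
    <mdRef LOCTYPE="URN" MDTYPE="MARC" xlink:href="urn:x-example:rec1"/>
  </dmdSec>
  <dmdSec ID="d3"><mdWrap MDTYPE="DC"><xmlData/></mdWrap></dmdSec>
  <structMap TYPE="physical">
    <div TYPE="book">
      <div TYPE="page"/>
      <div TYPE="page"/>
    </div>
  </structMap>
</mets>"""


def profile_summary(xml_text):
    """Summarize which metadata types are used and how structure maps are typed."""
    root = ET.fromstring(xml_text)
    md_tags = {f"{{{METS}}}mdWrap", f"{{{METS}}}mdRef"}
    schemas = Counter(el.get("MDTYPE") for el in root.iter() if el.tag in md_tags)
    struct_maps = []
    for smap in root.findall(f"{{{METS}}}structMap"):
        # Division types in document order show how the map is organized.
        div_types = [d.get("TYPE") for d in smap.iter(f"{{{METS}}}div")]
        struct_maps.append((smap.get("TYPE"), div_types))
    return {"extension_schemas": dict(schemas), "struct_maps": struct_maps}


print(profile_summary(DOC))
```

Comparing such summaries across institutions is one informal way to see whether their documents follow a common profile.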
METS summary
• Current status
  – version 1.3 available from LC
  – editorial board in place
  – LC standards office for maintenance agency
  – DLF and RLG underwriting
    • RLG will host editorial board, offer documentation and training, develop tools
  – Several extension schemas available
  – Opening Day in October 2004
METS summary
• METS is not all things to all people…
  – Designed for local institutional application support
    • Solving an immediate local problem
    • Common to many institutions
    • Flexible framework supports many institutional situations
  – Profiling necessary to interoperate
    • For OAIS packages
    • For shared tools
    • For other kinds of interoperation (e.g. cross-repository search)