Journal of Library Metadata, 11:83–99, 2011
Copyright © Taylor & Francis Group, LLC
ISSN: 1938-6389 print / 1937-5034 online
DOI: 10.1080/19386389.2011.570662

Implementing METS, MIX, and DC for Sustaining Digital Preservation at the University of Houston Libraries

MINGYU CHEN and MICHELE REILLY
Digital Services Department, University of Houston Libraries, Houston, Texas, USA

Address correspondence to Mingyu Chen, 601 Experian Parkway, Allen, TX 75013, USA. E-mail: [email protected]

The University of Houston Digital Library (UHDL) provides access to collections of digital materials related to the institutional memory of the university and to areas connected to its teaching, research, and cultural missions. Recently, a variety of image archives have been processed and preserved. This article demonstrates the development of preservation metadata strategies at UHDL and the preservation of Metadata Encoding and Transmission Standard (METS) records generated from customized “7train” based on Dublin Core (DC) descriptive metadata and NISO Metadata for Images in XML Schema (MIX) technical metadata using two open-source software tools (JHOVE and 7train). We are able to produce complete METS records for digital objects preserved.

KEYWORDS digital libraries, preservation, metadata, XML, METS, DC, MIX

Ever since the University of Houston Digital Library (UHDL) was founded in 2009, it has been a platform to provide infrastructure for digital collections reflecting the history of the University of Houston, the City of Houston, and the State of Texas, as well as other historically and culturally significant materials related to the University’s teaching and research mission.

A comprehensive digital library program is being developed that will provide our students, faculty, and the greater community a rich and exciting environment of digital resources and knowledge. The goal of UHDL is also to trigger more activities and collaborations throughout the UH campus, ensuring that research and scholarship materials are collected and preserved in digital form. UHDL considers balancing the collection of materials with access to them and their preservation to be the backbone of our efforts.

This deeper preservation effort demands greater resources (money, staff hours, and technology) than simple collection and access do, which makes automated processing of metadata imperative. To support UHDL’s strategic preservation plan, the metadata group has been exploring a better and more productive way to acquire preservation metadata automatically.

This article demonstrates the development of preservation metadata strategies at UHDL and the preservation of Metadata Encoding and Transmission Standard (METS) records generated by a customized version of 7train from Dublin Core (DC) descriptive metadata and NISO Metadata for Images in XML Schema (MIX) technical metadata, using two open-source software tools (JHOVE and 7train). Using these tools, we are able to automatically produce complete METS records for the digital objects we preserve.

PROBLEM STATEMENT

A number of studies have been conducted in combining multiple schemas, such as METS, MODS (Metadata Object Description Schema), and MIX, while implementing preservation. Most of the studies have focused on MODS because MODS can effectively get around the deficiencies in DC and “offers a richer set of approximately 80 elements which allows a much greater degree of precision” (Gartner, 2008). Dappert and Enders (2008) adopted METS, MODS, and PREMIS (Preservation Metadata Implementation Strategies) to archive electronic journals. They concluded that the three standards are suitable for complex cases like e-journals. They stated that METS provides a robust and flexible way to define digital objects, while MODS provides ways to describe objects and PREMIS provides ways to describe objects and processes that are essential for digital preservation.

In Dulock and Cronin’s study (2009), the METS, MODS, and MIX schemas were adopted to preserve metadata for a digitized map collection project. They asserted that one of the strengths of METS was that it could “facilitate the association, organization, and collocation of technical, structural, descriptive, provenance, and behavioral metadata for digital objects within a single METS wrapper.” At the time their article was written, METS creation tools were barely available commercially, and none were available as open source. This forced them to develop a homegrown system as a platform to process the dispersed metadata schemas and combine them. Dulock and Cronin’s descriptive metadata is embedded in the METS records instead of pointing to external descriptive metadata.

Various institutions have implemented METS and registered on the “METS Implementation Registry” page of the official METS website (http://www.loc.gov/standards/mets/mets-registry.html). Most implementations are based on a homegrown system and on adopting existing metadata standards that the institutions have already used to describe the digital objects. However, none of them tried to modify existing software tools to better process preservation metadata.

The Digital Services Department (UHDS) is a relatively small department within the University of Houston Libraries. The mission of the department is to provide access to digital objects within the digital library, facilitate ingest of electronic theses and dissertations into the institutional repository, and ensure enduring long-term and stable storage of digital objects. Balancing these initiatives against available resources presents a variety of challenges. One such challenge is adopting a metadata schema that adequately describes all the digital objects in our preservation storage and is also easy to implement with a limited staff.

The situation at the UHDL is not unique. More and more digital repositories are confronted with the puzzle of how to provide, with limited resources, a rich metadata schema that can be transformed to another one in the future if necessary. This paper explains how the UHDL developed an automated system to address this problem.

UHDL PRESERVATION

The University of Houston Digital Library is a member of the Texas Digital Library (TDL) consortium. TDL provides 160 terabytes of archival storage space to TDL member institutions. The space is allotted to TDL members based on membership tiers and constitutes the “dark archive” for the TDL Preservation Network. The UHDS has been actively collaborating with TDL to preserve digital archival data within the University of Houston, especially some recent historical image archives. Below is a description of our workflow:

• Preserve original TIFF image files
• Preserve METS records generated from customized “7train” software
• Copy both TIFF image files and their corresponding METS records to the “dark archive” provided by TDL on a prearranged schedule determined by the UHDL
• TDL transfers data to the Texas Advanced Computing Center (TACC) for replication and multigeographic disbursement
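
Before the scheduled copy, it can help to confirm that every TIFF has a matching METS record. The following is a minimal sketch, not part of the UHDL workflow as described: the folder layout and the naming rule (a METS file stored beside the TIFF with the same stem and an `.xml` extension) are assumptions made only for illustration.

```python
from pathlib import Path
from typing import List, Tuple


def pair_tiffs_with_mets(folder: Path) -> Tuple[List[Tuple[Path, Path]], List[Path]]:
    """Split the TIFFs in `folder` into (tiff, mets) pairs and unpaired TIFFs.

    Assumption (hypothetical naming rule): the METS record for `name.tif`
    is stored beside it as `name.xml`.
    """
    pairs: List[Tuple[Path, Path]] = []
    unpaired: List[Path] = []
    for tiff in sorted(folder.glob("*.tif")):
        mets = tiff.with_suffix(".xml")
        if mets.is_file():
            pairs.append((tiff, mets))   # ready to be copied together
        else:
            unpaired.append(tiff)        # METS record still has to be generated
    return pairs, unpaired
```

Only the paired files would then be handed to the copy step; the unpaired list shows which images still need a METS record.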


PRESERVATION METADATA STANDARDS METS/DC/MIX

It is easy for descriptive metadata to meet access requirements, but it does not completely meet preservation requirements. According to Lavoie and Gartner (2005), “A preservation metadata schema will include descriptive, structural, and administrative metadata elements.” Preservation metadata needs to provide higher level structural information about a digital object than descriptive metadata does, such as file format, system requirements, and the hardware and software tools used to process the digital asset. Administrative metadata includes technical, rights, provenance, and source information. Since preservation is an information management process, preservation metadata is categorized mainly as administrative metadata. Recording this information ensures that, when the system is migrated or file formats become obsolete, useful information is not lost.

Dublin Core is the most widely used descriptive metadata schema among digital libraries. It defines a resource and describes its content using 15 basic elements (http://dublincore.org/). The Dublin Core metadata element set is defined by ANSI/NISO and is extensible and flexible to use. The UHDL uses Dublin Core for its access metadata. However, descriptive metadata is not enough for preservation purposes. As Hunter and Choudhury (2003) noted: “Such metadata standards are important to avoid a completely chaotic repository of information on the web, they do not guarantee continual long-term access to digital resources.”

Gartner (2008) illustrated that “the choice of standards for technical metadata will inevitably depend on the type of files that make up the digital object.” Most recent digital preservation projects in the UHDL have focused on historical image archives. The Metadata for Digital Still Images in XML (MIX) standard defines file formats, file properties, and technical characteristics. It provides ways to describe objects and processes that are essential for digital preservation, making MIX a good fit for the objects currently housed within the UHDL. MIX is maintained for NISO by the Network Development and MARC Standards Office of the Library of Congress (http://www.loc.gov/standards/mix/). Table 1 shows the MIX metadata profile adopted by UHDL.

TABLE 1 UHDL MIX Metadata Profile

Category              Element                             Example
Source Information    Source type                         Photograph
                      Source width                        Provide the width of the archival TIFF as shown in Microsoft Office Document Imaging.
                      Source width unit                   Inches
                      Source height                       Provide the height of the archival TIFF as shown in Microsoft Office Document Imaging.
                      Source height unit                  Inches
Capture Information   Capture device                      Scanner
                      Scanner manufacturer                Epson
                      Scanner model name                  Epson Expression
                      Scanner model number                10000XL
                      Scanner model serial number         AJR0005717
                      Maximum optical resolution          12800 × 12800 dpi
                      Scanner sensor                      Undefined
                      Scanning software name              Adobe Photoshop CS2
                      Scanning software version number    9.0.2

Besides descriptive and technical metadata schemas, we also need a structural metadata schema that brings both descriptive and technical metadata into one record. Integrating these metadata standards is essential for the digital library’s development of preservation strategies. “Although none of these schemas on its own provides all of the elements needed to meet the requirements of a comprehensive digital library metadata environment, taken together they do form a viable system of this kind” (Gartner, 2008). METS (the Metadata Encoding and Transmission Standard) encodes metadata via a standardized XML schema and is able to handle all types of metadata that are relevant to preservation: descriptive, administrative, and technical/structural metadata. A METS record includes all the information about a digital object, and it has been widely adopted as a standard schema in academic libraries. Dulock and Cronin (2009) also illustrated that METS allows academic libraries to access, reuse, recondition, or repackage digital objects and metadata.

METS, DC, and MIX all use the same Extensible Markup Language (XML) syntax. XML enables data exchange and interoperability, making it possible to blend DC records and MIX records into one METS record.
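
Because all three standards share XML syntax, a single parser can read descriptive and technical values out of one record simply by switching namespaces. The sketch below illustrates this with Python's standard library. The namespace URIs are those used in the Appendix record; the trimmed sample record and the `summarize` helper are illustrative assumptions, not UHDL code.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the METS record in the Appendix.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "mix": "http://www.loc.gov/mix/",
}

# A heavily trimmed record, written only for this illustration.
SAMPLE = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:mix="http://www.loc.gov/mix/">
  <mets:dmdSec ID="DC">
    <mets:mdWrap MDTYPE="DC">
      <mets:xmlData>
        <dc:title>Police headquarters</dc:title>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:amdSec>
    <mets:techMD ID="txcy002">
      <mets:mdWrap MDTYPE="OTHER" LABEL="mix">
        <mets:xmlData>
          <mix:mix>
            <mix:BasicImageParameters>
              <mix:Format>
                <mix:MIMEType>image/tiff</mix:MIMEType>
              </mix:Format>
            </mix:BasicImageParameters>
          </mix:mix>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:techMD>
  </mets:amdSec>
</mets:mets>
"""


def summarize(mets_xml: str) -> dict:
    """Read one DC value and one MIX value from the same METS document."""
    root = ET.fromstring(mets_xml)
    return {
        "title": root.findtext(".//dc:title", namespaces=NS),
        "mime_type": root.findtext(".//mix:MIMEType", namespaces=NS),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```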

IMPLEMENTATION OF METADATA PRESERVATION AT UHDL

Descriptive Metadata: Exporting DC Records From CONTENTdm

UHDL selected CONTENTdm, developed and managed by OCLC (Online Computer Library Center), as the content management system for ingest and access of digital items.

The Dublin Core metadata schema is used to describe digital content in CONTENTdm. CONTENTdm can output three different formats of Dublin Core XML files: “Standard Dublin Core XML,” “CONTENTdm Standard XML,” and “Custom XML.” Figure 1 is a screenshot captured from the CONTENTdm admin module showing how to output Dublin Core records to standard XML files. The 7train software, an open-source tool designed by Erick Hetzner and Paul Fogel from the California Digital Library (CDL) and selected as the automation tool, requires CONTENTdm standard XML as the input in order to generate METS output records (http://seventrain.sourceforge.net/). All page-level metadata should be extracted as the input for the 7train software tool so that a complete record can be generated.

FIGURE 1 Output Dublin Core records to standard XML files in CONTENTdm.
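
Since a complete METS record depends on complete input, an exported file can be screened before it is handed to 7train. The sketch below is illustrative only. It assumes an export whose records are `<record>` elements carrying an `<identifier>` child, which is inferred from the `match="record"` and `identifier[1]` expressions in the XSLT of Figure 5; the `<metadata>` root element and the sample values are assumptions, and the real CONTENTdm export may differ.

```python
import xml.etree.ElementTree as ET
from typing import List

# Made-up export for illustration; the real CONTENTdm layout may differ.
SAMPLE_EXPORT = """\
<metadata>
  <record>
    <title>Police headquarters</title>
    <identifier>txcy002</identifier>
  </record>
  <record>
    <title>Untitled view</title>
  </record>
</metadata>
"""


def records_missing_identifier(export_xml: str) -> List[int]:
    """Return the 1-based positions of records with no usable identifier."""
    root = ET.fromstring(export_xml)
    missing = []
    for position, record in enumerate(root.iter("record"), start=1):
        identifier = record.findtext("identifier")
        if identifier is None or not identifier.strip():
            missing.append(position)
    return missing


if __name__ == "__main__":
    print(records_missing_identifier(SAMPLE_EXPORT))
```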

Technical Metadata: Generating MIX Records With JHOVE

Since most preservation repositories have to deal with huge volumes of digital content and data, metadata creation should be automated as much as possible to enhance productivity. The UHDL metadata team explored JHOVE (http://hul.harvard.edu/JHOVE/) for the possibility of processing technical metadata automatically. JHOVE was designed and developed by JSTOR and the Harvard University Library. JSTOR (Journal Storage) is a nonprofit organization helping academic communities use digital technologies to preserve scholarly records and to advance research and teaching in sustainable ways. JHOVE can capture technical aspects of digital items with regard to object storage ingest, access, and preservation. Moreover, JHOVE can automatically generate MIX records, which describe technical information about how digital objects have been processed. JHOVE supports a wide variety of data formats of digital objects such as GIF, JPEG, JPEG 2000, TIFF, PDF, UTF-8, WAVE, XML, and HTML. Figure 2 shows the user interface for JHOVE when extracting a MIX record from a scanned image file. In this example, a TIFF file format is selected.

FIGURE 2 User interface for JHOVE.

According to our current digital imaging workflow, the digital team decides what kind of tools to use to convert physical items into digital format based on the characteristics of the materials to be digitized. For most collections, we select more than one tool. For example, collections that contain photographs will be scanned on a flatbed scanner and pamphlets might be scanned on a Bookeye scanner. Either way, scanner information (manufacturer, model, etc.) is stored in the scanned TIFF files and is ultimately extracted by JHOVE into MIX records. Figure 3 shows part of a MIX record that includes scanner equipment and software tool information.
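
The article shows JHOVE's graphical interface, but the same extraction can be scripted for batch work. The following is a hedged sketch of how a command-line call might be assembled. It assumes a `jhove` launcher on the PATH, the `TIFF-hul` module, and the `XML` output handler; the report file name is hypothetical. JHOVE's XML handler wraps the MIX data inside its own report elements, so a further extraction step may be needed before the result can be used as a bare MIX record.

```python
import subprocess
from pathlib import Path
from typing import List


def jhove_command(tiff: Path, report: Path, launcher: str = "jhove") -> List[str]:
    """Build the argument list for one JHOVE run (nothing is executed here).

    Assumptions: `launcher` is on the PATH, and the TIFF-hul module and
    XML handler are enabled in the local JHOVE configuration.
    """
    return [launcher, "-m", "TIFF-hul", "-h", "XML", "-o", str(report), str(tiff)]


def run_jhove(tiff: Path, report: Path) -> None:
    """Run JHOVE and raise CalledProcessError if it exits with an error."""
    subprocess.run(jhove_command(tiff, report), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_jhove() on a machine with JHOVE installed.
    print(" ".join(jhove_command(Path("txcy002.tif"), Path("txcy002_jhove.xml"))))
```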

Structural Metadata: Establishing METS Profile

Before combining descriptive metadata and technical metadata into one complete record, the metadata structure for the final METS records needs to be defined. This is a reverse engineering problem. XSLT (Extensible Stylesheet Language Transformations) is a technique that can transform XML documents from one metadata schema to another. In order to create XSLT code that automates the process from MIX to METS, a sample METS record must first be provided. Such a record is called an “application profile.” It provides a solid framework that combines three independent and flexible standards. The application profile offers guidance and suggests best practices for using the DC schema and MIX schema in a METS wrapper. Heery and Patel (2000) emphasized the importance of application profiles because they allow implementers to declare how they are using standard schemas and what outputs they are expecting. Greenberg and Severiens claimed that “an application profile includes the set of metadata elements, policies, and guidelines defined for a particular application or implementation” (Greenberg and Severiens, 2007).

    <mix:ImageCaptureMetadata>
      <mix:ScannerCapture>
        <mix:scannerManufacturer>Image access, Inc.</mix:scannerManufacturer>
        <mix:ScannerModel>
          <mix:scannerModelName>Bookeye 2 Plus</mix:scannerModelName>
          <mix:scannerModelNumber>GS400</mix:scannerModelNumber>
          <mix:scannerModelSerialNo>[Scanner Model Serial Number]</mix:scannerModelSerialNo>
        </mix:ScannerModel>
        <mix:maximumOpticalResolution>600 x 400 dpi</mix:maximumOpticalResolution>
        <mix:scannerSensor>[Scanner Sensor]</mix:scannerSensor>
        <mix:scanningSystemSoftware>
          <mix:scanningSoftwareName>OPUS</mix:scanningSoftwareName>
          <mix:scanningSoftwareVersionNo>3.1.7</mix:scanningSoftwareVersionNo>
        </mix:scanningSystemSoftware>
      </mix:ScannerCapture>

FIGURE 3 Scanner equipment and software tools information in MIX.

In an application profile, one can choose either to embed the MIX record inside the METS record or to reference an external MIX record using an xlink attribute (http://www.loc.gov/standards/mets/). The first method inserts the MIX record from the JHOVE output directly into the <mets:techMD> section of the 7train output. The second method includes a URL in an xlink attribute that provides a path to the MIX record.
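
The two options can be compared side by side in a short sketch. The embedded form mirrors the <mets:techMD> wrapping shown in Figure 4. The referenced form uses the METS <mets:mdRef> element with an `xlink:href` attribute, which is an assumption about how the external link would be expressed, since the article only mentions an xlink attribute. The helper names and the example URL are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
MIX = "http://www.loc.gov/mix/"

# Keep the familiar prefixes when the trees are serialized.
for prefix, uri in (("mets", METS), ("xlink", XLINK), ("mix", MIX)):
    ET.register_namespace(prefix, uri)


def tech_md_embedded(tech_id: str, mix_root: ET.Element) -> ET.Element:
    """Method 1: place the MIX record inside techMD/mdWrap/xmlData."""
    tech = ET.Element(f"{{{METS}}}techMD", {"ID": tech_id})
    wrap = ET.SubElement(tech, f"{{{METS}}}mdWrap", {"MDTYPE": "OTHER", "LABEL": "mix"})
    ET.SubElement(wrap, f"{{{METS}}}xmlData").append(mix_root)
    return tech


def tech_md_referenced(tech_id: str, mix_url: str) -> ET.Element:
    """Method 2: point to an external MIX record (assumed mdRef form)."""
    tech = ET.Element(f"{{{METS}}}techMD", {"ID": tech_id})
    ET.SubElement(
        tech,
        f"{{{METS}}}mdRef",
        {"LOCTYPE": "URL", "MDTYPE": "OTHER", "LABEL": "mix", f"{{{XLINK}}}href": mix_url},
    )
    return tech


if __name__ == "__main__":
    mix = ET.fromstring(f'<mix:mix xmlns:mix="{MIX}"/>')
    print(ET.tostring(tech_md_embedded("txcy002", mix), encoding="unicode"))
    # Hypothetical location of the external MIX file.
    print(ET.tostring(tech_md_referenced("txcy002", "http://example.org/mix/txcy002.xml"), encoding="unicode"))
```

Embedding keeps the record self-contained, while referencing keeps the METS file small and lets the MIX file change independently.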

Based on our current workflow, the first method, embedding a MIX record in the METS document, has more advantages for our data structure because at this juncture our technical metadata are fairly simple. However, if a more complex workflow is required because of additional technical or copyright information, then the second method, linking to the MIX record with an xlink attribute, may prove more useful. The complete xlink profile is too long to be included in this article. Figure 4 is a portion of the embedded METS application profile. The profile is based on a historical image archive project called “Texas City Disaster.”


As can be seen in Figure 4, the DC record is embedded in the <mets:dmdSec> section. According to the METS guidelines, every <mets:dmdSec> section must have an ID attribute. The descriptive metadata elements must be embedded using the <mets:mdWrap> element. The <mets:mdWrap> element must have a <mets:xmlData> child element including all the descriptive data.

The MIX record is embedded in the <mets:amdSec> section. METS guidelines stipulate that technical metadata be stored in the <mets:amdSec> section. Within it, each <mets:techMD> element is identified by its ID attribute, as shown in Figure 4.

    <mets:dmdSec ID="DC" CREATED="2010-07-22T12:16:53.347-05:00">
      <mets:mdWrap MIMETYPE="text/xml" MDTYPE="DC" LABEL="DC">
        <mets:xmlData>
          <dc:title>Police headquarters</dc:title>
          <dc:date>1947-04-17</dc:date>
          <dc:type>black-and-white photographs;</dc:type>
          <dc:subject>Texas City, Texas</dc:subject>
          <dc:subject>man-made disasters</dc:subject>
          <dc:format>3.52 x 4.61 inches</dc:format>
          <dc:source>Texas City Disaster</dc:source>
          <dc:publisher>University of Houston, Special Collections</dc:publisher>
          <dc:source>04/1969-035</dc:source>
          <dc:format>Photograph</dc:format>
          <dc:source>1/2</dc:source>
          <dc:identifier>txcy002</dc:identifier>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:dmdSec>
    <mets:amdSec>
      <mets:techMD ID="txcy002">
        <mets:mdWrap MDTYPE="OTHER" LABEL="mix">
          <mets:xmlData>
            <mix:mix>
              <mix:BasicImageParameters>
                <mix:Format>
                  <mix:MIMEType>image/tiff</mix:MIMEType>
                  <mix:ByteOrder>little-endian</mix:ByteOrder>
                  <mix:Compression>
                    <mix:CompressionScheme>1</mix:CompressionScheme>
                  </mix:Compression>
                  <mix:PhotometricInterpretation>
                    <mix:ColorSpace>2</mix:ColorSpace>
                    <mix:ReferenceBlackWhite>0.0 255.0 0.0 255.0 0.0 255.0</mix:ReferenceBlackWhite>
                  </mix:PhotometricInterpretation>
                  <mix:Segments>
                    <mix:StripOffsets>3446</mix:StripOffsets>
                    <mix:RowsPerStrip>2665</mix:RowsPerStrip>
                    <mix:StripByteCounts>14518920</mix:StripByteCounts>
                  </mix:Segments>
                  <mix:PlanarConfiguration>1</mix:PlanarConfiguration>
                </mix:Format>
                <mix:File>
                  <mix:Orientation>1</mix:Orientation>
                </mix:File>
              </mix:BasicImageParameters>
              <mix:ImageCreation>
              </mix:ImageCreation>
              <mix:ImagingPerformanceAssessment>
                <mix:SpatialMetrics>
                  <mix:SamplingFrequencyUnit>2</mix:SamplingFrequencyUnit>
                  <mix:XSamplingFrequency>600</mix:XSamplingFrequency>
                  <mix:YSamplingFrequency>600</mix:YSamplingFrequency>
                  <mix:ImageWidth>1816</mix:ImageWidth>
                  <mix:ImageLength>2665</mix:ImageLength>
                </mix:SpatialMetrics>
                <mix:Energetics>
                  <mix:BitsPerSample>8,8,8</mix:BitsPerSample>
                  <mix:SamplesPerPixel>3</mix:SamplesPerPixel>
                </mix:Energetics>
              </mix:ImagingPerformanceAssessment>
            </mix:mix>
          </mets:xmlData>
        </mets:mdWrap>
      </mets:techMD>
    </mets:amdSec>

FIGURE 4 A portion of UHDL embedded METS application profile.


    <xsl:template match="record" mode="seventrain:mets-amdSec">
      <mets:amdSec>
        <!-- The MIX file is expected at the input document's URI with "_mix" appended. -->
        <xsl:variable name="mix-file-name">
          <xsl:value-of select="base-uri(/)" />
          <xsl:text>_mix</xsl:text>
        </xsl:variable>
        <xsl:choose>
          <!-- Embed the MIX record only if the file can be loaded. -->
          <xsl:when test="doc-available($mix-file-name)">
            <mets:techMD ID="mix-{identifier[1]}">
              <mets:mdWrap MDTYPE="OTHER" LABEL="mix">
                <mets:xmlData>
                  <xsl:apply-templates
                      select="document($mix-file-name)"
                      mode="local:mix-copy"/>
                </mets:xmlData>
              </mets:mdWrap>
            </mets:techMD>
          </xsl:when>
          <xsl:otherwise>
            <xsl:message>
              <xsl:text>Could not find MIX file named: </xsl:text>
              <xsl:value-of select="$mix-file-name"/>
            </xsl:message>
          </xsl:otherwise>
        </xsl:choose>
      </mets:amdSec>
    </xsl:template>

    <!-- Copy every element, attribute, and other node of the MIX record unchanged. -->
    <xsl:template match="element()" mode="local:mix-copy">
      <xsl:copy>
        <xsl:apply-templates select="@*,node()" mode="local:mix-copy"/>
      </xsl:copy>
    </xsl:template>

    <xsl:template match="attribute()|text()|comment()|processing-instruction()"
        mode="local:mix-copy">
      <xsl:copy/>
    </xsl:template>

    <!-- Leave mix:ObjectIdentifier out of the copied record. -->
    <xsl:template match="mix:ObjectIdentifier" mode="local:mix-copy"/>

    </xsl:transform>

FIGURE 5 Additional revised XSLT source code transforming MIX record into METS record.


Similar to <mets:dmdSec>, the <mets:techMD> element must contain a <mets:mdWrap> element. The <mets:xmlData> element is a child element wrapped within the <mets:mdWrap> element.
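
The wrapping rules stated for <mets:dmdSec> and <mets:techMD> lend themselves to a mechanical check on generated records. The sketch below is one possible way to do that with Python's standard library. The `check_wrapping` function and its message wording are assumptions for illustration; it covers only the rules named in the text and is no substitute for validation against the METS schema.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {"mets": "http://www.loc.gov/METS/"}

GOOD = (
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/">'
    '<mets:dmdSec ID="DC"><mets:mdWrap MDTYPE="DC"><mets:xmlData/></mets:mdWrap></mets:dmdSec>'
    '<mets:amdSec><mets:techMD ID="txcy002"><mets:mdWrap MDTYPE="OTHER">'
    "<mets:xmlData/></mets:mdWrap></mets:techMD></mets:amdSec>"
    "</mets:mets>"
)

# Same record with two deliberate defects: no dmdSec ID, no xmlData in techMD.
BAD = (
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/">'
    '<mets:dmdSec><mets:mdWrap MDTYPE="DC"><mets:xmlData/></mets:mdWrap></mets:dmdSec>'
    '<mets:amdSec><mets:techMD ID="txcy002"><mets:mdWrap MDTYPE="OTHER"/>'
    "</mets:techMD></mets:amdSec>"
    "</mets:mets>"
)


def check_wrapping(mets_root: ET.Element) -> List[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    sections = mets_root.findall("mets:dmdSec", NS)
    sections += mets_root.findall("mets:amdSec/mets:techMD", NS)
    for section in sections:
        name = section.tag.split("}")[-1]  # strip the namespace part of the tag
        if name == "dmdSec" and not section.get("ID"):
            problems.append("dmdSec is missing its ID attribute")
        wrap = section.find("mets:mdWrap", NS)
        if wrap is None:
            problems.append(f"{name} has no mdWrap child")
        elif wrap.find("mets:xmlData", NS) is None:
            problems.append(f"{name}/mdWrap has no xmlData child")
    return problems


if __name__ == "__main__":
    print(check_wrapping(ET.fromstring(GOOD)))
    print(check_wrapping(ET.fromstring(BAD)))
```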

7train Customization

Once the application profile, MIX record (technical metadata), and DC record (descriptive metadata) have been defined, the last step is to customize the 7train software tool for batch processing. The 7train software tool (http://seventrain.sourceforge.net/7train.html) is an XSLT 2.0-based tool for generating METS records from CONTENTdm standard XML files. Version 1 of 7train was designed to transform CONTENTdm standard XML DC records into METS records.

XSLT is an XML-based language for transforming XML documents in one metadata schema into XML documents in another. It was developed by the World Wide Web Consortium (W3C). Many digital library technologies and tools have been developed using this language. With altered XSLT code, the UHDL metadata team was able to customize 7train to automatically combine a DC record and a MIX record into a complete METS record.

The original 7train XSLT code can only transform CONTENTdm standard XML files (DC records) into METS records. Figure 5 shows the additional revised XSLT code that handles the MIX record transformation. The code inserts the MIX record into the <mets:amdSec> section so that all the information originally in the MIX record is carried into the METS record.
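
For readers less familiar with XSLT, the copy logic of Figure 5 can be restated in Python. This is an illustrative analogue, not the 7train code: it copies a MIX tree node by node and leaves out any `mix:ObjectIdentifier` element, as the `local:mix-copy` templates do. One difference is that Python's default parser discards comments and processing instructions, which the XSLT version copies. The `mix_copy` name and the sample record are assumptions.

```python
import xml.etree.ElementTree as ET

MIX_NS = "http://www.loc.gov/mix/"
SKIPPED_TAG = f"{{{MIX_NS}}}ObjectIdentifier"

# Made-up MIX fragment for illustration.
SAMPLE_MIX = (
    '<mix:mix xmlns:mix="http://www.loc.gov/mix/">'
    "<mix:ObjectIdentifier>placeholder</mix:ObjectIdentifier>"
    "<mix:BasicImageParameters><mix:Format>"
    "<mix:MIMEType>image/tiff</mix:MIMEType>"
    "</mix:Format></mix:BasicImageParameters>"
    "</mix:mix>"
)


def mix_copy(element: ET.Element) -> ET.Element:
    """Return a deep copy of `element` without mix:ObjectIdentifier descendants."""
    clone = ET.Element(element.tag, dict(element.attrib))
    clone.text = element.text
    clone.tail = element.tail
    for child in element:
        if child.tag != SKIPPED_TAG:  # counterpart of the empty XSLT template
            clone.append(mix_copy(child))
    return clone


if __name__ == "__main__":
    copied = mix_copy(ET.fromstring(SAMPLE_MIX))
    print(ET.tostring(copied, encoding="unicode"))
```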

Output Complete METS Records From 7train

A complete METS record is included in the Appendix. This complete METS record is transformed from a DC record (output from CONTENTdm) and a MIX record (output from JHOVE) using the customized 7train software tool with revised XSLT code.

CONCLUSION

There are many ways to combine different metadata schemas to provide a complete preservation solution. A strategic preservation plan should be developed in advance based on one’s own institution’s situation. Even where no mature infrastructure and workflow exist to support the process, one can still achieve preservation goals by modifying existing metadata processing tools and utilizing available resources.


As demonstrated in this paper, we have been able to automate the process of transforming DC descriptive metadata records and MIX technical metadata records into new METS metadata records with the help of two open source software tools: JHOVE and 7train. Currently the solution is able to transform multiple DC records and a single MIX record into METS records. Future work will focus on processing multiple DC records and multiple MIX records so that more complicated digital object projects can be automated as well. Based on our workflow and the nature of the collections, we adopted three types of metadata in this preservation effort: descriptive metadata, technical metadata, and structural metadata. However, technical metadata is only one aspect of administrative metadata. In the future, we will also try to address preservation automation of rights metadata. In the long term, further work might also address our local preservation activities in our metadata records by referring to the PREMIS (Preservation Metadata Implementation Strategies) data dictionary (http://www.loc.gov/standards/premis/).

One of the biggest challenges in preservation automation is to develop a strategic preservation metadata plan and decide how much information we need to record and whether the information can be accurately recorded. We will also explore and develop better automated tools to integrate into our workflow.

ACKNOWLEDGMENTS

Part of the work described in this article was supported by Erick Hetzner, who is one of the creators of the 7train open source software. We thank him for his great help and support during the implementation process.

REFERENCES

Dappert, A., & Enders, M. (2008). Using METS, PREMIS and MODS for archiving eJournals. D-Lib Magazine, 14(9/10).

Dublin Core Metadata Initiative (DCMI) official Web Site. http://dublincore.org/

Dulock, M., & Cronin, C. (2009). Providing metadata for compound digital objects: Strategic planning for an institution’s first use of METS, MODS, and MIX. Journal of Library Metadata, 9(3), 289–304.

Gartner, R. (2008). Metadata for digital libraries: State of the art and future directions. Retrieved from http://www.jisc.ac.uk/techwatch

Greenberg, J., & Severiens, T. (2007). DCMI-Tools: Ontologies for digital application description. Retrieved from http://elpub.scix.net/data/works/att/123 elpub2007.content.pdf

Heery, R., & Patel, M. (2000). Application profiles: Mixing and matching metadata schemas. Ariadne, 25. Retrieved from http://www.ariadne.ac.uk/issue25/app-profiles/

Hunter, J., & Choudhury, S. (2003). Implementing preservation strategies for complex multimedia objects. Retrieved from http://metadata.net/panic/Papers/ECDL2003 paper.pdf

Lavoie, B., & Gartner, R. (2005). Preservation metadata. A joint report of OCLC, Oxford Library Services, and the Digital Preservation Coalition (DPC), published electronically as a DPC Technology Watch Report (No. 05-01). Retrieved from http://www.dpconline.org/docs/reports/dpctw05-01.pdf

Metadata Encoding and Transmission Standard (METS) official Web Site. http://www.loc.gov/standards/mets/

NISO Metadata for Images in XML Schema (MIX) official Web Site. http://www.loc.gov/standards/mix/


APPENDIX

A complete METS record transformed from a DC record (output from CONTENTdm) and a MIX record (output from JHOVE) using the customized 7train software tool with revised XSLT code.

<?xml version="1.0" encoding="UTF-8"?>

<mets:mets xmlns:dc="http://purl.org/dc/elements/1.1/"

xmlns:xs="http://www.w3.org/2001/XMLSchema"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xmlns:xlink="http://www.w3.org/1999/xlink"

xmlns:mets="http://www.loc.gov/METS/"

xsi:schemaLocation="http://www.loc.gov/METS/

http://www.loc.gov/standards/mets/mets.xsd"

OBJID="ghp001-cc.tif "

LABEL="Waves on Seawall 4:30-5:00 p.m."

TYPE="image"

PROFILE="http://www.loc.gov/mets/profiles/00000010.xml">

<mets:metsHdr CREATEDATE ="2010-09-08T16:49:53.382-05:00"

LASTMODDATE ="2010-09-08T16:49:53.382-05:00"

RECORDSTATUS ="NEW"

ID="d1953">

<mets:agent ROLE="EDITOR" TYPE="ORGANIZATION">

<mets:name>University of Houston Digital Library</mets:name>

<mets:note>Record created by conversion of CONTENTdm XML metadata</mets:note>

<mets:note>Created using 7train</mets:note>

</mets:agent>

</mets:metsHdr>

<mets:dmdSec ID="DC" CREATED="2010-09-08T16:49:53.382-05:00">

<mets:mdWrap MIMETYPE="text/xml" MDTYPE="DC" LABEL="DC">

<mets:xmlData>

<dc:title>Waves on Seawall 4:30-5:00 p.m. </dc:title>

<dc:creator>Frazier, Rex Dunbar [photographer]; </dc:creator>

<dc:format>Waves on the Galveston Seawall between 4:30 and 5:00 p.m. before the hurricane. </dc:format>

<dc:date>1915-08-16</dc:date>

<dc:format>1 photograph</dc:format>

<dc:language>English</dc:language>

<dc:type>black-and-white photographs;</dc:type>

<dc:subject>natural disasters</dc:subject>

<dc:subject>hurricanes</dc:subject>

<dc:subject>seawalls</dc:subject>

<dc:source>Galveston 1915 Hurricane Photographs, 1915 http://archon.lib.uh.edu/index.php?p=collections/controlcard&amp;id=295</dc:source>

<dc:source>Special Collections, University of Houston Libraries</dc:source>

<dc:source>ID 04/1997-002, Box 1997-002, Folder 1</dc:source>

<dc:rights>This image is in the public domain and may be used freely. If publishing in print, electronically, or on a website, please use the citation: "Courtesy of Special Collections, University of Houston Libraries." To order a higher resolution reproduction, see http://info.lib.uh.edu/services/sca/rphotographs.html</dc:rights>

<dc:format>electronic</dc:format>

<dc:type>still image</dc:type>

<dc:format>image/tiff</dc:format>

<dc:source>reformatted digital</dc:source>

<dc:identifier>ghp001-cc.tif </dc:identifier>

</mets:xmlData>

</mets:mdWrap>

</mets:dmdSec>

<mets:amdSec>

<mets:techMD ID="mix-ghp001-cc.tif ">

<mets:mdWrap MDTYPE="OTHER" LABEL="mix">

<mets:xmlData>

<mix:mix xmlns:mix="http://www.loc.gov/mix/" xsi:schemaLocation="http://www.loc.gov/mix/">

<mix:BasicImageParameters>

<mix:Format>

<mix:MIMEType>image/tiff</mix:MIMEType>

<mix:ByteOrder>little-endian</mix:ByteOrder>

<mix:Compression>

<mix:CompressionScheme>1</mix:CompressionScheme>

</mix:Compression>

<mix:PhotometricInterpretation>

<mix:ColorSpace>2</mix:ColorSpace>

<mix:ReferenceBlackWhite>0.0 255.0 0.0 255.0 0.0 255.0</mix:ReferenceBlackWhite>

</mix:PhotometricInterpretation>

<mix:Segments>

<mix:StripOffsets>240</mix:StripOffsets>

<mix:RowsPerStrip>2126</mix:RowsPerStrip>

<mix:StripByteCounts>22310244</mix:StripByteCounts>

</mix:Segments>

<mix:PlanarConfiguration>1</mix:PlanarConfiguration>

</mix:Format>

<mix:ScanningSystemCapture>

<mix:ScanningSystemHardware>

<mix:ScannerManufacturer>Epson</mix:ScannerManufacturer>

<mix:ScannerModel>

<mix:ScannerModelName>Expression</mix:ScannerModelName>

<mix:ScannerModelNumber>836XL</mix:ScannerModelNumber>

<mix:ScannerModelSerialNo>AJR0005717</mix:ScannerModelSerialNo>

<mix:maximumOpticalResolution>800 dpi</mix:maximumOpticalResolution>

</mix:ScannerModel>

</mix:ScanningSystemHardware>

<mix:ScanningSystemSoftware>

<mix:ScanningSoftware>Adobe Photoshop CS2</mix:ScanningSoftware>

<mix:ScanningSoftwareVersionNo>9.0.2</mix:ScanningSoftwareVersionNo>

</mix:ScanningSystemSoftware>

</mix:ScanningSystemCapture>

<mix:File>

<mix:Orientation>1</mix:Orientation>

</mix:File>

</mix:BasicImageParameters>

<mix:ImageCreation/>

<mix:ImagingPerformanceAssessment>

<mix:SpatialMetrics>

<mix:SamplingFrequencyUnit>2</mix:SamplingFrequencyUnit>

<mix:XSamplingFrequency>600</mix:XSamplingFrequency>

<mix:YSamplingFrequency>600</mix:YSamplingFrequency>

<mix:ImageWidth>3498</mix:ImageWidth>

<mix:ImageLength>2126</mix:ImageLength>

</mix:SpatialMetrics>

<mix:Energetics>

<mix:BitsPerSample>8,8,8</mix:BitsPerSample>

<mix:SamplesPerPixel>3</mix:SamplesPerPixel>

</mix:Energetics>

</mix:ImagingPerformanceAssessment>

</mix:mix>

</mets:xmlData>

</mets:mdWrap>

</mets:techMD>

</mets:amdSec>

<mets:fileSec ID="d1959">

<mets:fileGrp USE="thumbnail image">

<mets:file ID="d3e9091">

<mets:FLocat LOCTYPE="URL" xlink:role="thumbnail"

xlink:href="http://cdm15195.contentdm.oclc.org/cgi-bin/thumbnail.exe?CISOROOT=/p15195coll5&amp;CISOPTR=114" />

</mets:file>

</mets:fileGrp>

<mets:fileGrp USE="reference image">

<mets:file ID="d3e9095">

<mets:FLocat LOCTYPE="URL" xlink:role="access"

xlink:href="http://cdm15195.contentdm.oclc.org/cgi-bin/showfile.exe?CISOROOT=/p15195coll5&amp;CISOPTR=114" />

</mets:file>

</mets:fileGrp>

</mets:fileSec>

<mets:structMap>

<mets:div ID="d1960" DMDID="DC" LABEL="Waves on Seawall 4:30-5:00 p.m." >

<mets:div ID="d1961" TYPE="thumbnail image">

<mets:fptr ID="d1962" FILEID="d3e9091"/>

</mets:div>

<mets:div ID="d1963" TYPE="reference image">

<mets:fptr ID="d1964" FILEID="d3e9095"/>

</mets:div>

</mets:div>

</mets:structMap>

</mets:mets>
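
A record of this shape can be checked after generation. The sketch below is one possible approach and is not part of 7train or JHOVE: it uses only Python's standard xml.etree.ElementTree module, and the function names (dangling_file_pointers, dc_values, file_locations) and the SAMPLE string, which is a trimmed copy of the record above, are assumptions made for illustration. It confirms that every mets:fptr in the structMap points to a mets:file in the fileSec, and it reads back the DC values and file URLs.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the appendix record.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Trimmed, illustrative copy of the appendix record (assumption: only the
# parts needed for the checks below are kept).
SAMPLE = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:dmdSec ID="DC">
    <mets:mdWrap MDTYPE="DC">
      <mets:xmlData>
        <dc:title>Waves on Seawall 4:30-5:00 p.m. </dc:title>
        <dc:subject>hurricanes</dc:subject>
        <dc:subject>seawalls</dc:subject>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:fileSec>
    <mets:fileGrp USE="thumbnail image">
      <mets:file ID="d3e9091">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://cdm15195.contentdm.oclc.org/cgi-bin/thumbnail.exe?CISOROOT=/p15195coll5&amp;CISOPTR=114"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="reference image">
      <mets:file ID="d3e9095">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://cdm15195.contentdm.oclc.org/cgi-bin/showfile.exe?CISOROOT=/p15195coll5&amp;CISOPTR=114"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div DMDID="DC">
      <mets:div TYPE="thumbnail image"><mets:fptr FILEID="d3e9091"/></mets:div>
      <mets:div TYPE="reference image"><mets:fptr FILEID="d3e9095"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def dangling_file_pointers(root):
    """Return FILEID values in the structMap that match no mets:file ID."""
    file_ids = {f.get("ID") for f in root.iterfind(".//mets:fileSec//mets:file", NS)}
    return [
        p.get("FILEID")
        for p in root.iterfind(".//mets:structMap//mets:fptr", NS)
        if p.get("FILEID") not in file_ids
    ]


def dc_values(root, element):
    """Return the stripped text of every dc:<element> inside the dmdSec."""
    path = f".//mets:dmdSec//dc:{element}"
    return [(e.text or "").strip() for e in root.iterfind(path, NS)]


def file_locations(root):
    """Map each fileGrp USE value to the list of xlink:href URLs it holds."""
    href = "{%s}href" % NS["xlink"]
    return {
        grp.get("USE"): [loc.get(href) for loc in grp.iterfind("mets:file/mets:FLocat", NS)]
        for grp in root.iterfind(".//mets:fileGrp", NS)
    }


if __name__ == "__main__":
    record = ET.fromstring(SAMPLE)
    print("dangling pointers:", dangling_file_pointers(record))
    print("subjects:", dc_values(record, "subject"))
    print("files:", file_locations(record))
```

A check of this kind tests only the record's internal consistency. It does not replace validation against the METS and MIX schemas.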