This article was downloaded by: [Newcastle University]
On: 17 March 2014, At: 13:11
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Cataloging & Classification Quarterly
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/wccq20

Complementarity in Subject Metadata in Large-Scale Digital Libraries: A Comparative Analysis
Oksana L. Zavalina a
a Department of Library and Information Sciences, University of North Texas, Denton, Texas, USA
Published online: 18 Dec 2013.

To cite this article: Oksana L. Zavalina (2014) Complementarity in Subject Metadata in Large-Scale Digital Libraries: A Comparative Analysis, Cataloging & Classification Quarterly, 52:1, 77-89, DOI: 10.1080/01639374.2013.848316

To link to this article: http://dx.doi.org/10.1080/01639374.2013.848316

Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


Cataloging & Classification Quarterly, 52:77–89, 2014
Published with license by Taylor & Francis
ISSN: 0163-9374 print / 1544-4554 online
DOI: 10.1080/01639374.2013.848316

Complementarity in Subject Metadata in Large-Scale Digital Libraries: A Comparative Analysis

OKSANA L. ZAVALINA
Department of Library and Information Sciences, University of North Texas, Denton, Texas, USA

Provision of high-quality subject metadata is crucial for organizing adequate subject access to rich content aggregated by digital libraries. A number of large-scale digital libraries worldwide are now generating subject metadata to describe not only individual objects but entire digital collections as an integral whole. However, little research to date has been conducted to empirically evaluate the quality of this collection-level subject metadata. The study presented in this article compares free-text and controlled-vocabulary collection-level subject metadata in three large-scale cultural heritage digital libraries in the United States and the European Union. As revealed by this study, the emerging best practices for creating rich collection-level subject metadata include describing a collection’s subject matter with mutually complementary data values in controlled-vocabulary and free-text subject metadata elements. Three kinds of complementarity were observed in this study: one-way complementarity, two-way complementarity, and multiple complementarity.

KEYWORDS digital libraries, metadata quality, collection-level metadata, subject metadata

© Oksana L. Zavalina
Received September 2013; revised September 2013; accepted September 2013.

The author thanks the developers of The European Library, American Memory, and Opening History digital libraries for providing collection-level metadata for this analysis. Special thanks to Drs. Carole L. Palmer, Allen Renear, and Kathryn La Barre at the University of Illinois at Urbana-Champaign (USA) and Dr. Dietmar Wolfram at the University of Wisconsin (USA) for valuable feedback on this study.

Address correspondence to Oksana L. Zavalina, Department of Library and Information Sciences, College of Information, University of North Texas, 1155 Union Circle, Denton, TX 76203-5017, USA. E-mail: [email protected]


INTRODUCTION

Both cultural heritage institutions and funding agencies worldwide have invested intensively in digitization projects. Large-scale digital libraries now bring together hundreds of individual digital collections produced by these digitization projects.

Metadata, defined as “structured data about an object that supports functions associated with the designated object,”1 is used in digital libraries to organize information for effective retrieval via search and browse functions. Metadata can be divided into two distinct kinds: controlled-vocabulary metadata, which draws values from a formally maintained list of terms, and free-text metadata, which relies on natural language. Subject metadata, “information concerning what the resource is about and what it is relevant for,”2 is crucial for providing subject access to information objects in digital collections and aggregations. To help achieve optimal recall and precision, the inclusion of Subject, Type, and Coverage elements in digital library metadata records is recommended3 to accommodate different subject-related facets: topic, place, time period, language, and so on.

Metadata that describes collections as an integral whole has long been applied in the archival community. Many digital aggregations are now supplying collection-level metadata, that is, “metadata providing a high-level description of an aggregation of individual items,”4 as a means of providing context for the digital items harvested from distributed collections. However, virtually no research to date has evaluated and compared the collection-level metadata in digital aggregations.

In discussions of metadata, the terms “richness,” “detailed description,” “level of description,” or “quality” of metadata seem to be used interchangeably.5,6 The three most important metadata quality criteria are accuracy, consistency, and completeness.7,8 Metadata accuracy is measured as the degree to which the metadata values match characteristics of the described object.9

Metadata consistency is further subdivided into semantic and structural consistency.10 Semantic consistency refers to the extent to which the same values or elements are used for representing similar concepts, while structural consistency is evaluated as the degree to which the same structure is followed in representing information in certain metadata elements.11 Metadata completeness is evaluated as the extent to which objects are described using all applicable metadata elements to their full access capacity. Some of the assessment criteria used to evaluate metadata completeness12 include the number of metadata elements per record, the practice of presenting blank (i.e., nonpopulated but displayed) metadata elements, and the utilization and selected characteristics of mandatory and optional elements.
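Two of the completeness criteria above, the number of populated elements per record and the presence of blank (displayed but nonpopulated) elements, are easy to operationalize. The following is a minimal sketch, assuming a record is represented as a Python dictionary mapping element names to values; this representation and the helper names (`profile_completeness`, `CompletenessProfile`) are illustrative assumptions, not a format used by any of the digital libraries discussed here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical record representation: element name -> value.
# None means the element is absent from the record; an empty or
# whitespace-only string means it is displayed but not populated (blank).
Record = Dict[str, Optional[str]]


@dataclass
class CompletenessProfile:
    populated: int       # elements carrying a non-empty value
    blank: int           # elements present in the record but left empty
    missing: List[str]   # expected elements absent altogether


def profile_completeness(record: Record, expected: List[str]) -> CompletenessProfile:
    """Count populated and blank elements, and list expected elements that are absent."""
    populated = sum(1 for v in record.values() if v is not None and v.strip())
    blank = sum(1 for v in record.values() if v is not None and not v.strip())
    missing = [name for name in expected if record.get(name) is None]
    return CompletenessProfile(populated, blank, missing)


EXPECTED = ["Title", "Description", "Subject", "Object Type",
            "Temporal Coverage", "Geographic Coverage"]
record = {"Title": "Sample collection", "Description": "Historical photographs",
          "Subject": "History", "Object Type": ""}
print(profile_completeness(record, EXPECTED))
# CompletenessProfile(populated=3, blank=1, missing=['Temporal Coverage', 'Geographic Coverage'])
```

The list of expected elements is an assumption here; in practice it would come from the application profile of the digital library being evaluated, where it would also distinguish mandatory from optional elements.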

The evaluation of metadata in digital libraries, which has not yet become a common practice, is expected to become increasingly important for ensuring metadata quality,13 and yet almost no research to date has attempted to evaluate collection-level metadata. Zavalina, Palmer, Jackson, and Han14 started addressing this research gap by assessing collection-level metadata in the Digital Collections and Content registry of digital collections funded by the Institute of Museum and Library Services. However, because that study focused on a single digital library, the generalizability of its results is limited.

To produce more generalizable results, Zavalina’s study15 examined and compared the free-text collection-level subject metadata (i.e., data values in the Description metadata element) across multiple digital libraries, and found that a variety of information about a digital collection is included in the free-text collection-level Description metadata element. This includes both subject-specific information (topical, geographic, and temporal coverage, and types/genres of objects in a digital collection) and non-subject-specific information: title, size, provenance, collection development, copyright, audience, navigation and functionality, language of items in a digital collection, frequency of additions, institutions that host a digital collection or contribute to it, funding sources, item creators, and the importance, uniqueness, and comprehensiveness of a digital collection. The study presented in this article extends the comparative analysis reported in Zavalina’s study16 by comparing the data values in the free-text Description element and four controlled-vocabulary subject metadata fields in three large-scale digital libraries. The aim of this study was to empirically test whether (1) free-text Description element data values alone can comprehensively describe a digital collection, with the data values of controlled-vocabulary metadata elements adding little to this representation, or (2) a combination of free-text and controlled-vocabulary subject metadata adds value through a more comprehensive representation than free-text metadata alone.

DATA COLLECTION AND ANALYSIS

Three large-scale cultural heritage digital libraries were selected for analysis: American Memory17 developed by the United States Library of Congress, Opening History18 developed by the University of Illinois at Urbana-Champaign, and The European Library,19 which aggregates digital collections created by the national libraries in the European Union. Among these three digital libraries, only Opening History displays complete human-readable collection-level metadata records. American Memory and The European Library keep most collection-level metadata (except for the Title and free-text Description elements) behind the scenes to support search and faceted browse functions. For this study, Extensible Markup Language (XML) files with complete collection metadata records were obtained from the developers of The European Library and American Memory.
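Because most collection-level elements were available only in XML files, an analysis like this one starts by pulling the subject-related elements out of each record. The sketch below uses Python's standard `xml.etree.ElementTree`. The tag names (`description`, `subject`, `objectType`, `temporalCoverage`, `geographicCoverage`) and the function name `extract_subject_metadata` are hypothetical stand-ins: the actual schemas of the files supplied by the developers are not described in this article.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical mapping from XML tag names to the subject metadata
# elements compared in this study; real schemas will differ.
SUBJECT_TAGS = {
    "description": "Description",
    "subject": "Subject",
    "objectType": "Object Type",
    "temporalCoverage": "Temporal Coverage",
    "geographicCoverage": "Geographic Coverage",
}


def extract_subject_metadata(xml_text: str) -> Dict[str, List[str]]:
    """Collect the non-empty text of every subject-related element in one record."""
    root = ET.fromstring(xml_text)
    values: Dict[str, List[str]] = {name: [] for name in SUBJECT_TAGS.values()}
    for elem in root.iter():
        label = SUBJECT_TAGS.get(elem.tag)
        if label and elem.text and elem.text.strip():
            values[label].append(elem.text.strip())
    return values


# Hypothetical sample record.
sample = """<collection>
  <title>Sample collection</title>
  <description>Historical photographs of Illinois towns</description>
  <subject>History</subject>
  <subject>Coal mining</subject>
  <geographicCoverage>Illinois (state)</geographicCoverage>
</collection>"""

print(extract_subject_metadata(sample))
```

Namespaced schemas (for example, Dublin Core elements) would need namespace-qualified tag names in the mapping, since `ElementTree` reports such tags as `{namespace-uri}localname`.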


A systematic sample of collection-level metadata records in the three digital libraries was analyzed: 39 records from American Memory, 33 records from Opening History, and 27 records from The European Library. The resulting 99 collection-level metadata records were subjected to detailed manual qualitative content analysis to determine how the data values in different collection-level subject metadata elements within a record relate to each other. The findings of the study are presented below.
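The article does not report the population sizes, sampling interval, or starting point behind its systematic sample. As a minimal sketch of the general technique only, assuming records can be listed in some fixed order, systematic sampling selects every k-th record after a random start within the first interval. The function name and the 390-record population below are hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(population: Sequence[T], n: int,
                      rng: Optional[random.Random] = None) -> List[T]:
    """Take every k-th item after a random start, with k = len(population) // n."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    rng = rng if rng is not None else random.Random()
    k = len(population) // n          # sampling interval
    start = rng.randrange(k)          # random start within the first interval
    return [population[start + i * k] for i in range(n)]


# Hypothetical population of 390 record identifiers; the article does not
# report population sizes, the interval, or the starting point.
records = [f"record-{i:03d}" for i in range(390)]
picked = systematic_sample(records, 39, random.Random(42))
print(len(picked), picked[:3])
```

When the population size is not a multiple of n, the last few records can never be selected with this floor-based interval; a fractional interval is one common alternative.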

FINDINGS

Among the digital libraries examined in this study, only The European Library had a noticeable proportion of records (19%) with redundancy between the data values in free-text and controlled-vocabulary collection-level subject metadata elements. Very little redundancy was observed in the Opening History and American Memory collection-level metadata records. Examples of redundancy include restating identical geographic information (e.g., “Estonia,” “Netherlands,” “Ljubljana”) in both the Description and Geographic Coverage metadata elements, temporal information (e.g., “1763”) in both Description and Temporal Coverage, and genre information (e.g., “photographs”) in both Description and Subject.

In contrast, a large proportion of collection metadata records in the sample included cases of one-way complementarity, that is, cases in which the data value in one collection-level subject metadata element complemented information in one or more other metadata elements by providing additional details absent elsewhere. The highest occurrence of one-way complementarity between collection-level subject metadata elements was observed in Opening History. In 76% of the collection metadata records analyzed in this study, it was the free-text Description metadata element that complemented information found in one or more of the controlled-vocabulary subject metadata elements: Subject, Geographic Coverage, Temporal Coverage, and Object Type.
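The relations described here (redundancy, one-way complementarity, and, later in this section, two-way complementarity) were judged manually in this study. As a simplified sketch, assuming an analyst has first reduced each element's data value to a set of normalized concept labels, the relations can be expressed as set differences: one element complements another when it supplies at least one concept the other lacks. The function names and the concept labels are hypothetical; the example values echo ones quoted in this article.

```python
from typing import Set


def complements(a: Set[str], b: Set[str]) -> bool:
    """True if element value a supplies at least one concept absent from b."""
    return bool(a - b)


def relation(a: Set[str], b: Set[str]) -> str:
    """Classify how two elements' concept sets relate to each other."""
    a_adds, b_adds = complements(a, b), complements(b, a)
    if a_adds and b_adds:
        return "two-way complementarity"
    if a_adds:
        return "one-way (first complements second)"
    if b_adds:
        return "one-way (second complements first)"
    return "redundant" if a else "both empty"


# Description vs. Geographic Coverage restating the same place.
print(relation({"estonia"}, {"estonia"}))
# Geographic Coverage vs. Description adding countries.
print(relation({"poland", "lithuania", "ukraine", "belarus"}, {"poland"}))
# Description towns vs. Geographic Coverage state and county.
print(relation({"coal city", "braidwood", "wilmington"},
               {"illinois (state)", "grundy (county)"}))
```

The set model is deliberately coarse: it cannot tell whether a concept is broader or narrower than another, which is exactly the kind of judgment the manual content analysis captured.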

As shown in Figure 1 and Figure 2, the free-text Description metadata element data values most often complemented topical information found in the Subject element (76% of records overall: 86% in American Memory, 76% in Opening History, and 70% in The European Library). Representative examples include: “Spanish cartographer, . . . history, urbanism, public works and agriculture from a strictly geographic point of view” in Description versus “900 History and geography, 911 Historical geography” in Subject; “interior design, . . . homes of U.S. presidents” in Description, with these topics not mentioned in Subject; “early developments in the National Park, . . . landscape and park facilities” in Description versus “Great Basin, Social studies, State history” in Subject. Figure 3 provides an example of a collection-level metadata record that includes complementing topical data values.

The Object Type metadata element was the second most frequently complemented element, by object-type or genre-specific information in Description data values (49% overall: 70% in American Memory, 44% in The European Library, and 30% in Opening History). Representative examples included: “uniform books, ego documents, photographs and sketches” in Description versus “images” in Object Type; “digital pre-print originals and online publications” in Description while the Object Type field was missing; “historical photographs, . . . portraits, . . . aerial shots” in Description versus “photographs/slides/negatives” in Object Type; “rare books, government documents, manuscripts, maps, musical scores, plays, films, and recordings” in Description versus “software, multimedia” in Object Type. Figure 3 provides an example of a collection-level metadata record that includes complementing object type and genre data values.

[Figure 1: bar chart for Subject, Object Type, Temporal Coverage, and Geographic Coverage; series: free-text Description complements controlled-vocabulary subject metadata elements; controlled-vocabulary subject metadata elements complement free-text Description]
FIGURE 1 Complementarity between Collection-Level Subject Metadata Elements (color figure available online).
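The "overall" figures in this section appear to be computed over the pooled sample of 99 records, so that each digital library is weighted by its sample size (39 American Memory, 27 European Library, and 33 Opening History records). The article does not state this explicitly; it is an inference from the reported numbers. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
from typing import Sequence


def pooled_percentage(rates_pct: Sequence[float], sizes: Sequence[int]) -> float:
    """Overall percentage over the pooled records, weighting each library by its sample size."""
    if len(rates_pct) != len(sizes) or not sizes:
        raise ValueError("need one rate per sample size")
    hits = sum(rate / 100 * n for rate, n in zip(rates_pct, sizes))
    return 100 * hits / sum(sizes)


# Description complements Object Type (percentages from the text):
# American Memory 70% of 39, The European Library 44% of 27, Opening History 30% of 33.
print(round(pooled_percentage([70, 44, 30], [39, 27, 33]), 1))  # prints 49.6
```

The result lands close to the reported 49% overall; small differences are expected because the per-library percentages are themselves rounded, so reconstructing counts from them is only approximate.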

[Figure 2: bar chart by digital library (The European Library, American Memory, Opening History) for Description complements Subject, Object Type, Temp. Coverage, and Geo. Coverage]
FIGURE 2 Free-Text to Controlled-Vocabulary Subject Metadata Complementarity (color figure available online).


FIGURE 3 An Example of Collection-Level Metadata Record with Highlighted Complementary Data Values. This Figure Displays Partial (Truncated to Show Only Subject Metadata) Record from The Opening History (color figure available online).

Data values in the Temporal Coverage metadata element were also often complemented by Description (46% overall: 67% in Opening History, 51% in American Memory, and 15% in The European Library). Representative examples included: “16th century, 17th century, 18th century, 19th century, 20th century” in Temporal Coverage versus “Since the Eighty Years’ War” in Description; “from 1895–1920s” in Description versus “1850–1899, 1900–1929” in Temporal Coverage.

Geographic Coverage data values were complemented by the Description metadata element least often (29% overall: 39% in Opening History, 33% in The European Library, and 19% in American Memory). Representative examples included: “Hispanic America . . . Spanish territories in America and Oceania” in Description versus “Hispanic America” in Geographic Coverage; “Hungary or the Central European region” in Description versus the machine-readable MARC country code “hu” in Geographic Coverage; “American states, the District of Columbia, and London, England” in Description versus “United States” in Geographic Coverage; “Baja California, Mexico in an area south-east of Ensenada” in Description versus “Mexico (nation)” in Geographic Coverage. Figure 3 provides an example of a collection-level metadata record that includes complementing geographic data values.

Cases of data values in the free-text Description metadata element complementing information contained in several controlled-vocabulary subject metadata elements in the same collection-level metadata record were also observed. In the example record in Figure 4, Description includes keywords that complement both Subject and Object Type with topical information (“foodways, religious traditions, Native American culture, maritime traditions, ethnic folk culture, material culture”), genre information (“children’s lore,” “occupational lore,” “performances,” “interviews,” “surveys”), and occupational subject information (“musicians, craftpersons, storytellers, folklife interpreters”), while also specifying the dates encoded in the Temporal Coverage field. In fact, in 22% of collection metadata records in the sample (45% in Opening History, 11% in The European Library, and 10% in American Memory), the Description field complemented two or more controlled-vocabulary subject metadata fields.

FIGURE 4 An Example Collection-Level Metadata Record: Three Kinds of Complementarity. This Figure Displays Partial (Truncated to Show Only English-Language Data Values) Record from The European Library (color figure available online).

[Figure 5: bar chart by digital library (The European Library, American Memory, Opening History) for Subject, Object Type, Temp. Coverage, and Geo. Coverage complementing Description]
FIGURE 5 Controlled-Vocabulary to Free-Text Metadata Complementarity (color figure available online).
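The 22% figure counts records in which Description complemented two or more controlled-vocabulary fields at once (multiple complementarity). A minimal sketch of that count, assuming a hypothetical coding scheme in which an analyst assigns each concept found in the free-text Description to the facet of the controlled element it relates to (so that, for instance, a topic is never compared against Temporal Coverage). The function names and example records are illustrative and are not taken from Figure 4.

```python
from typing import Dict, List, Sequence, Set, Tuple

FACETS = ("Subject", "Object Type", "Temporal Coverage", "Geographic Coverage")

# Hypothetical coding scheme: facet -> set of normalized concept labels.
Coded = Dict[str, Set[str]]


def complemented_fields(description: Coded, controlled: Coded) -> List[str]:
    """Controlled-vocabulary fields for which Description supplies a concept they lack."""
    return [f for f in FACETS
            if description.get(f, set()) - controlled.get(f, set())]


def share_with_multiple(records: Sequence[Tuple[Coded, Coded]]) -> float:
    """Percentage of records in which Description complements two or more fields."""
    hits = sum(1 for d, c in records if len(complemented_fields(d, c)) >= 2)
    return 100 * hits / len(records)


# Hypothetical coded records: (Description concepts, controlled-element concepts).
rec_a = ({"Subject": {"foodways"}, "Object Type": {"interviews"}},
         {"Subject": {"architecture"}, "Object Type": {"sound recordings"}})
rec_b = ({"Subject": {"history"}}, {"Subject": {"history"}})
print(complemented_fields(*rec_a))            # ['Subject', 'Object Type']
print(share_with_multiple([rec_a, rec_b]))    # 50.0
```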

Data values in controlled-vocabulary subject metadata elements also complemented data values in free-text Description (Figure 1). For example, in the same collection-level metadata record (Figure 4), Geographic Coverage provided spatial information absent from Description (“United States (nation), Southern U.S. (general region), Florida (state)”), while Subject listed additional topics (e.g., “Architecture”) not covered by Description.

The Subject metadata element was found to complement Description most often (Figure 1, Figure 5): in 52% of collection metadata records overall (70% in Opening History, 60% in The European Library, and 30% in American Memory). Representative examples included: “860 Spanish and Portuguese literatures” in Subject when this topic was not mentioned at all in Description; “Tennessee Valley Authority, African Americans, forestry” in Subject when these topics were not mentioned at all in Description; 15 specific subject strings (e.g., “North Carolina—African-Americans, North Carolina—Agriculture, North Carolina—Economics and Business”) in Subject versus much broader topical and spatial coverage in Description (“North Carolina, . . . story of the Tar Heel State”).

The Temporal Coverage metadata element was found to complement Description data values in 43% of collection metadata records (72% in The European Library and 67% in Opening History, but only 3% in American Memory). Representative examples included: “1400s–1699, 1700–1799, 1800–1849, 1850–1899, 1900–1929, 1930–1949, 1950–1969, 1970–1999, 2000 to present, Pre-1400” in Temporal Coverage when no time information was provided in Description; “1783–1789” in Temporal Coverage when no time information was provided in Description; “1200–1900” in Temporal Coverage versus “European age of chivalry” in Description. Figure 3 provides an example of a collection-level metadata record that includes complementing temporal data values.
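Comparing temporal information across elements is easier once values such as those quoted above are normalized into year intervals. The sketch below handles only a few patterns seen in these examples (year ranges, "to present," "Pre-" and "prior to," and single years). Reading "1400s" as starting in 1400, "Pre-1400" as ending in 1399, and resolving "present" to a caller-supplied year are all assumptions of this sketch, and `parse_period` is a hypothetical name.

```python
import re
from datetime import date
from typing import Optional, Tuple

Interval = Tuple[Optional[int], Optional[int]]  # None marks an open bound

_BEFORE = re.compile(r"(?:pre-?|prior to\s*)(\d{4})")
_RANGE = re.compile(r"(\d{4})s?\s*(?:-|\u2013|to)\s*(\d{4}|present)")
_YEAR = re.compile(r"(\d{4})")


def parse_period(value: str, present: Optional[int] = None) -> Interval:
    """Normalize a few Temporal Coverage patterns into (start, end) years."""
    v = value.strip().lower()
    if present is None:
        present = date.today().year
    m = _BEFORE.fullmatch(v)
    if m:
        return (None, int(m.group(1)) - 1)
    m = _RANGE.fullmatch(v)
    if m:
        end = present if m.group(2) == "present" else int(m.group(2))
        return (int(m.group(1)), end)
    m = _YEAR.fullmatch(v)
    if m:
        return (int(m.group(1)), int(m.group(1)))
    raise ValueError(f"unrecognized period: {value!r}")


# "present" is resolved to a fixed year here so the output is reproducible.
for v in ["1400s–1699", "2000 to present", "Pre-1400", "1783–1789"]:
    print(v, parse_period(v, present=2013))
```

With intervals in hand, a value such as "prior to 1900" parses to an open lower bound, which a Temporal Coverage value like "1200–1900" can close; free-text periods such as "European age of chivalry" still require human interpretation.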

The Geographic Coverage metadata element was found to complement Description much more often than Description complemented Geographic Coverage (Figure 1): in 43% of collection metadata records overall (56% in The European Library, 55% in Opening History, and 24% in American Memory). Representative examples included: “Poland, Lithuania, Ukraine, Belarus” in Geographic Coverage versus only “Poland” in Description; “Germany” in Geographic Coverage when no geographic information was provided at all in Description; “Europe, Italy, Great Britain” in Geographic Coverage versus “US and abroad” in Description; “United States (nation), Midwest U.S. (general region), Illinois (state), Randolph (county), Knox (county)” in Geographic Coverage versus “Randolph County, Illinois” in Description.

Object Type metadata element values also complemented information found in Description in two digital libraries, Opening History (52%) and American Memory (14%), for 23% of analyzed collection metadata records overall. No such trend was observed in The European Library, which can be explained by inconsistent application of the Object Type metadata element in this digital library. In 59% of collection-level metadata records in The European Library sample, the Object Type metadata element was blank or missing, while in the remaining 41% this field contained a broad single-word term (e.g., “images,” “maps”). Representative examples of Object Type metadata element values complementing Description included: “Film transparencies—Color, Cityscape photographs” in Object Type versus “photographs” in Description; “Gelatin silver prints, Safety film negatives, Nitrate negatives” in Object Type versus “original negatives and photographic prints” in Description; “books and pamphlets, photographs / slides / negatives, newspapers, posters and broadsides, periodicals, prints and drawings” in Object Type versus “manuscripts, photographs, ephemera and published materials” in Description.

One-way complementarity between different controlled-vocabulary metadata elements was also observed. In particular, geographic subdivisions (as in “Japanese Americans—California—Manzanar”) and temporal qualifiers (as in “World War, 1914–1918”) in the Subject metadata element included information that complemented Geographic Coverage and Temporal Coverage values, respectively. In Opening History, Subject complemented Geographic Coverage in 12% of collection metadata records and Temporal Coverage in 18% of the records in the sample. In American Memory and The European Library, Subject field values frequently complemented data values in the Object Type metadata field: in 28% and 33% of analyzed collection-level metadata records, respectively.
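The geographic and temporal information embedded in subject strings like the two quoted above can be surfaced automatically to some extent. The sketch below splits a display-form subject heading on its subdivision separator and extracts year ranges. Deciding which subdivisions are places requires authority data, so a small hypothetical gazetteer (`places`) stands in for it here; `facets_from_heading` and `HeadingFacets` are illustrative names. When MARC source records are available, the subfield codes of 6XX fields (for example, $z for geographic and $y for chronological subdivisions) make this distinction explicit and are the more reliable route.

```python
import re
from typing import List, NamedTuple, Set, Tuple

# Subdivisions are displayed with an em dash or, in many catalogs, "--".
_SEPARATOR = re.compile(r"\s*(?:--|\u2014)\s*")
_YEAR_RANGE = re.compile(r"\b(\d{4})\s*[-\u2013]\s*(\d{4})\b")


class HeadingFacets(NamedTuple):
    main: str
    geographic: List[str]
    temporal: List[Tuple[int, int]]


def facets_from_heading(heading: str, places: Set[str]) -> HeadingFacets:
    """Pull geographic and temporal hints out of a display-form subject heading.

    `places` is a hypothetical gazetteer standing in for authority data.
    """
    parts = _SEPARATOR.split(heading.strip())
    geographic = [p for p in parts[1:] if p in places]
    temporal = [(int(a), int(b)) for a, b in _YEAR_RANGE.findall(heading)]
    return HeadingFacets(parts[0], geographic, temporal)


places = {"California", "Manzanar", "United States"}
print(facets_from_heading("Japanese Americans\u2014California\u2014Manzanar", places))
print(facets_from_heading("World War, 1914\u20131918", places))
```

Any place or period extracted this way that does not appear in the record's Geographic Coverage or Temporal Coverage values is a candidate instance of Subject complementing those elements.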

Cases of two-way complementarity between two collection-level subject metadata elements were less numerous than cases of one-way complementarity. No cases of two-way complementarity were observed between two or more controlled-vocabulary subject metadata elements. Two-way complementarity between the free-text (Description) and controlled-vocabulary subject metadata elements, in contrast, occurred in 40% of collection-level metadata records overall. Two-way complementarity was widespread in Opening History (79% of records), but occurred less often in The European Library (41%) and much less often in American Memory (8%). Two-way complementarity was most often observed between the Description and Subject elements (29% of collection metadata records overall: 58% in Opening History, 30% in The European Library, and 5% in American Memory). Two-way complementarity between Description and Temporal Coverage was observed only in Opening History (39% of Opening History records, or 13% overall). Two-way complementarity between Description and Geographic Coverage was observed in 11% of the records overall: 24% in Opening History, 11% in The European Library, and none of the American Memory collection metadata records. The least two-way complementarity overall was observed between Description and the Object Type metadata element (7% overall: 18% in Opening History, 3% in American Memory, and 0% in The European Library). Representative examples of two-way complementarity included:

• “letters” in Description versus “autograph albums” in Subject (taken together, the data values in the two fields provide more comprehensive genre information).

• “dance instruction manuals, anti-dance manuals, histories, treatises on etiquette” in Description versus “Ballroom dancing—United States” in Subject (Subject narrows the Description information from “dance” to “ballroom dancing” and adds geographic coverage information, while Description adds information on a specific aspect of dancing, “etiquette,” and on the genres of materials in the collection, which are not covered by any other metadata field in this record).

• “towns of Coal City, Braidwood, and Wilmington” in Description versus “Illinois (state), Grundy (county)” in Geographic Coverage (state and county information in Geographic Coverage and town information in Description complement each other for a more specific geographic representation).

• “contemporary, . . . European age of chivalry, . . . prior to 1900” in Description versus “1200–1900” in Temporal Coverage (while Temporal Coverage specifies the lower limit of the “prior to 1900” range of years, “1200,” and provides the time frame for “European age of chivalry,” Description introduces another time period, “contemporary,” not covered by Temporal Coverage).

• “newspaper photographs” in Description versus “photographs/slides/negatives, archival finding aids” in Object Type (Description narrows the genre information in Object Type from general “photographs” to “newspaper photographs,” while Object Type adds another genre not mentioned in Description: “archival finding aids”).

DISCUSSION AND CONCLUSIONS

The findings presented in this article demonstrate a high level of mutual complementarity between free-text and controlled-vocabulary subject metadata in collection-level metadata records in three large-scale digital libraries that aggregate cultural heritage digital collections: American Memory and Opening History in the United States, and The European Library in Europe. Quite predictably, the data values in the free-text Description metadata element, owing to their natural-language form and greater length, often complemented information in controlled-vocabulary subject metadata elements. However, this study also found that data values in controlled-vocabulary subject metadata elements, especially Geographic Coverage, quite often complemented information encoded in Description. Both one-way and two-way complementarity were observed, with little redundancy. The results of this study empirically demonstrate that more detailed collection-level metadata records, which include both free-text and controlled-vocabulary subject metadata, allow a fuller representation of the intellectual content of information objects, which should ultimately improve subject access for users.

Completeness of metadata records, that is, the extent to which objects are described using all applicable metadata elements to their full access capacity, has long been emphasized as one of the most important metadata quality criteria.20–23 Findings of user studies conducted both decades ago, with card catalogs and early computerized library catalogs (e.g., studies summarized by Krikelas24), and more recently, with various online information retrieval systems,25–28 demonstrate that users perceive both free-text subject metadata (e.g., the data values in 5XX MARC fields or the Dublin Core Description element) and controlled-vocabulary subject metadata (e.g., the data values in 6XX MARC fields or the Dublin Core Subject and Coverage metadata elements) to be among the most useful metadata for judging the relevance of retrieved documents. Item-level metadata records in digital libraries usually meet these user expectations by providing both free-text and controlled-vocabulary subject metadata. However, most newly created digital libraries limit their collection-level metadata to free-text Title and Description elements for various reasons: lack of the resources needed to create detailed collection-level metadata records, limitations introduced by the default settings in popular content management systems such as DSpace, or even a belief that full-text indexing and keyword searching make controlled-vocabulary subject metadata redundant. The lack of best practice guidelines for creating collection-level metadata arguably contributes to this situation.
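A simple check that follows from this grouping is whether a collection-level record contains both kinds of subject metadata, rather than only free-text Title and Description. The sketch below classifies MARC tags and Dublin Core element names according to the grouping in the paragraph above. The rule for 5XX is deliberately broad: not every 5XX note concerns the subject (500 is the General Note), and a stricter rule might count only notes such as 520 (Summary). Function names are hypothetical.

```python
from typing import Iterable, Optional

# Grouping used in the text: Dublin Core Description is free-text subject
# metadata; Dublin Core Subject and Coverage are controlled-vocabulary.
_DC_GROUPS = {"description": "free-text", "subject": "controlled", "coverage": "controlled"}


def subject_kind(field: str) -> Optional[str]:
    """Classify a MARC tag (e.g. "520", "650") or a Dublin Core element name."""
    f = field.strip().lower().split(":")[-1]   # drop a prefix such as "dc:"
    if len(f) == 3 and f.isdigit():
        if f.startswith("5"):
            return "free-text"     # 5XX notes (broad rule: not every note is about the subject)
        if f.startswith("6"):
            return "controlled"    # 6XX subject access fields
        return None
    return _DC_GROUPS.get(f)


def has_both_kinds(fields: Iterable[str]) -> bool:
    """True if a record uses both free-text and controlled-vocabulary subject metadata."""
    kinds = {subject_kind(f) for f in fields}
    return {"free-text", "controlled"} <= kinds


print(has_both_kinds(["Title", "Description"]))             # False
print(has_both_kinds(["Title", "Description", "Subject"]))  # True
```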

Results of this study indicate that some large-scale digital libraries already commonly include mutually complementary subject information in both free-text and controlled-vocabulary collection-level metadata elements. Digital library developers may already recognize this practice as a benchmark for crafting rich collection-level metadata. Best practice recommendations for creating collection-level metadata, including subject metadata, are not currently available. The findings of this study could be instrumental in developing them. Such guidelines could be incorporated into the next edition of the Framework of Guidance for Building Good Digital Collections29. Another possible venue is the Guidelines for Digital Libraries, currently being prepared by the International Federation of Library Associations and Institutions (IFLA) working group jointly with the World Digital Library Project.

This exploratory study focused on collection-level subject metadata in domain-specific digital libraries. It covered one domain, aggregations of cultural heritage digital collections created for history scholars, educators, and enthusiasts, at two scales: national and international. Developing best practice guidelines warrants a more extensive content analysis of collection-level subject metadata in several other kinds of digital libraries:

- domain-specific digital libraries with a subject focus other than history (e.g., National Science Digital Library);
- non-domain-specific digital libraries with wide subject coverage (e.g., IMLS Digital Collections and Content Collection Registry);
- digital libraries of different scales, such as state-level aggregations (e.g., Missouri Digital Heritage) or regional-level aggregations (e.g., Mountain West Digital Library and Documenting the American South);
- digital libraries representing geographic areas beyond Europe and North America.

NOTES

1. Jane Greenberg, “Metadata and the World Wide Web,” in Encyclopedia of Library and Information Science, 2nd ed., ed. Miriam Drake (New York: Marcel Dekker, 2003), 1876.

2. Dagobert Soergel, “Digital Libraries and Knowledge Organization,” in Semantic Digital Libraries, ed. S. R. Kruk and B. McDaniel (Berlin: Springer, 2009), 9–39.

3. ALCTS/CCS/SAC/Subcommittee on Metadata and Subject Analysis, “Subject Data in the Metadata Record: Recommendations and Rationale: A Report from the ALCTS/CCS/SAC/Subcommittee on Metadata and Subject Analysis,” http://archive.ala.org/alcts/organization/ccs/sac/metarept2.html (accessed October 18, 2013).

4. George Macgregor, “Collection-Level Descriptions: Metadata of the Future?” Library Review 52, no. 6 (2003): 247–250.

5. Caroline R. Arms, “Historical Collections for the National Digital Library: Lessons and Challenges at the Library of Congress,” D-Lib Magazine, April–May 1996, http://www.dlib.org/dlib/april96/loc/04c-arms.html and http://www.dlib.org/dlib/may96/loc/05c-arms.html (accessed October 18, 2013).

6. Erik Duval, Wayne Hodgins, Stuart Sutton, and Stuart L. Weibel, “Metadata Principles and Practicalities,” D-Lib Magazine 8, no. 4 (2002), http://www.dlib.org/dlib/april02/weibel/04weibel.html (accessed October 18, 2013).

7. Jung-Ran Park, “Metadata Quality in Digital Repositories: A Survey of the Current State of the Art,” Cataloging & Classification Quarterly 47, no. 3 (2009): 213–228.

8. Jung-Ran Park and Yuji Tosaka, “Metadata Quality Control in Digital Repositories and Collections: Criteria, Semantics, and Mechanisms,” Cataloging & Classification Quarterly 48, no. 8 (2010): 696–715.

9. Besiki Stvilia, Les Gasser, Michael B. Twidale, and Linda C. Smith, “A Framework for Information Quality Assessment,” Journal of the American Society for Information Science and Technology 58, no. 12 (2007): 1720–1733.

10. Park, “Metadata Quality in Digital Repositories.”

11. Thomas R. Bruce and Diane I. Hillmann, “The Continuum of Metadata Quality: Defining, Expressing, Exploiting,” in Metadata in Practice, ed. Diane I. Hillmann and Elaine L. Westbrooks (Chicago, IL: American Library Association, 2004), 238–256.

12. William E. Moen, Erin L. Stewart, and Charles R. McClure, “The Role of Content Analysis in Evaluating Metadata for the U.S. Government Information Locator Service (GILS): Results from an Exploratory Study,” http://www.unt.edu/wmoen/publications/GILSMDContentAnalysis.htm (accessed October 18, 2013).

13. Diane I. Hillmann, “Metadata Quality: From Evaluation to Augmentation,” Cataloging & Classification Quarterly 46, no. 1 (2008): 65–80.

14. Oksana L. Zavalina, Carole L. Palmer, Amy S. Jackson, and Myung-Ja Han, “Evaluating Descriptive Richness in Collection-Level Metadata,” Journal of Library Metadata 8, no. 4 (2008): 263–292.

15. Oksana L. Zavalina, “Free-Text Collection-Level Subject Metadata in Large-Scale Digital Libraries: A Comparative Content Analysis,” in Proceedings of the International Conference on Dublin Core and Metadata Applications, ed. T. Baker, D. I. Hillmann, and A. Isaac (The Hague: Dublin Core Metadata Initiative, 2011), 147–157.

16. Zavalina, “Free-Text Collection-Level Subject Metadata in Large-Scale Digital Libraries.”

17. American Memory, http://memory.loc.gov (accessed October 18, 2013).

18. Opening History, http://imlsdcc.grainger.uiuc.edu/history (accessed October 18, 2013).

19. The European Library, http://www.theeuropeanlibrary.org (accessed October 18, 2013).

20. Moen, Stewart, and McClure, “The Role of Content Analysis in Evaluating Metadata for the U.S. Government Information Locator Service (GILS).”

21. Bruce and Hillmann, “The Continuum of Metadata Quality.”

22. Park, “Metadata Quality in Digital Repositories.”

23. Park and Tosaka, “Metadata Quality Control in Digital Repositories and Collections.”

24. James Krikelas, “Catalog Use Studies and Their Implications,” Advances in Librarianship 3 (1972): 195–220.

25. Peiling Wang and Dagobert Soergel, “A Cognitive Model of Document Use during a Research Project. Study 1. Document Selection,” Journal of the American Society for Information Science 49, no. 2 (1998): 115–133.

26. Offer Drori, “How to Display Search Results in Digital Libraries: User Study,” in Proceedings of the New Developments in Digital Libraries, ed. P. T. Isaias, F. Sedes, J. C. Augusto, and U. Ultes-Nitsche (Angers, France: ICEIS Press, 2003), 13–28, http://www.global-report.com/drori/?l=he&a=3330 (accessed October 18, 2013).

27. Abe Crystal and Jane Greenberg, “Relevance Criteria Identified by Health Information Users during Web Searches,” Journal of the American Society for Information Science and Technology 57, no. 10 (2006): 1368–1382.

28. Karen Smith-Yoshimura et al., eds., Implications of MARC Tag Usage in Library Metadata Practices (Dublin, OH: OCLC, 2010), http://www.oclc.org/research/publications/library/2010/2010-06.pdf (accessed October 18, 2013).

29. NISO Framework Working Group, A Framework of Guidance for Building Good Digital Collections, 3rd ed. (Bethesda, MD: National Information Standards Organization, 2007), http://www.niso.org/publications/rp/framework3.pdf (accessed October 18, 2013).