CONTROLLED VOCABULARY IN INFORMATION ORGANIZATION. F. Miksa, The University of Texas at Austin, School of Information. INF 389K: Lifecycle Metadata for Digital Objects. Professor Pat Galloway, Instructor. 9 October 2006


CONTROLLED VOCABULARY IN INFORMATION ORGANIZATION

F. Miksa

The University of Texas at Austin, School of Information

INF 389K: Lifecycle Metadata for Digital Objects

Professor Pat Galloway, Instructor

9 October 2006


Topics to be Covered

1. Background Concepts related to Information Organization

2. Definitions

3. Purposes of and Special Considerations when Creating Controlled Vocabularies

4. Elements of Controlled Vocabularies

5. Sample Tools of Controlled Vocabularies


1. General Concepts of Information Organization-I

1. Information organization—a broad term standing for the process of making information systems.

2. Information systems are created to provide access to:
   a. Informational objects (visual, audio, tactile) found in or with a variety of different
      • material media (stone, skins, paper, celluloid, electronic, etc.)
      • production states (unique, reproduced; eye-readable, non-eye-readable requiring special mechanisms for reading, etc.)
      • production methods (hand-created, mechanically or electronically produced, etc.)
      • symbol systems (language, graphic, etc.)
      • genre or kinds (books, articles, poems, tracts, pictures, spoken word sound recordings, music sound recordings, motion pictures, electronic data bases, websites, email, etc.)
   b. Information inside informational objects (inside books, articles, music sound recordings, websites, databases, email, etc.) [i.e., “data strings”]

“Information” as it is used in this lecture refers to either or both of these things.


General Concepts of Information Organization-II

Information organization systems have been the product of modern social traditions, for ex.,

a. Bibliography (15th c. +)
b. Library cataloging (16th c. +)
c. Indexing & abstracting (Late 19th c. +)
d. Documentation (1890s-1960s)
e. Archival organization (French Rev. +)
f. Records organization (1900+, especially Post-WWII)
g. Museum organization (19th c. +, especially 1990s +)
h. Computerized Information Storage & Retrieval (1950s+)

Convergence of information organization traditions


General Concepts of Information Organization-III

Information organization system components:
• Environments-contexts
• Content
• Users (needs, desires, habits in searching for information)

File types: Item files // Surrogate files (or combinations of these)

System vocabulary: the set of terms in a system available for searching and to which information is linked (i.e., each given system has its own “vocabulary” used in searching). Note: system vocabulary vs. entry vocabulary.

Terms:
• Words, codes, & other metadata that represent attributes of information or information objects (names, titles, concepts, other attributes, etc.) and by means of which that information is searched in a given system.
• Constitute metadata used in searching
• Generated from or imposed on information or information objects to which they refer

System vocabularies and traditions of information organization have to do with how information objects have been represented.


Information Object Representation

A system’s vocabulary pertains to the attribute terms used for searching. Shall it be “natural language” (NL), i.e., strictly as found in the information or information object, or controlled (CV) in some manner?


Information Object Representation in Library Cataloging


Library Cataloging: MARC Tagging

See MARC 21 Concise Format at <http://www.loc.gov/marc/bibliographic/ecbdhome.html>


ENCODED ARCHIVAL DESCRIPTION (EAD)

(See its TAG Library--http://www.loc.gov/ead/tglib/element_index.html)


FULL TEXT INDEXING (Natural Language)

(e.g., Google, for the term “Controlled Vocabulary”)

(Queensland Univ. of Tech’y)


2. Definitions

CONTROLLED VOCABULARY

1. A controlled vocabulary is an established list of standardized terminology for use in indexing and retrieval of information. An example of a controlled vocabulary is subject headings used to describe library resources. (Library & Archives Canada)

2. A list of terms that have been enumerated explicitly. This list is controlled by and is available from a controlled vocabulary registration authority. All terms in a controlled vocabulary must have an unambiguous, non-redundant definition. (ANSI/NISO Z39.19-2005)

3. “[O]rganized lists of words and phrases, or notation systems, that are used to initially tag content, and then to find it through navigation or search.” (Amy Warner)

4. “A controlled list of index terms is generally known as a controlled vocabulary or as an authority list.” (F. W. Lancaster, Vocabulary Control for Information Retrieval)


Definitions (cont’d)

AUTHORITY CONTROL/AUTHORITY WORK

1. Authority control is the means by which catalogers maintain consistency of form, or a controlled vocabulary, in catalog headings (names, places, titles, subjects). (Moving Image Collections [MIC] website)

2. “[T]he consistent use and maintenance of the forms of names, subjects, uniform titles, etc. used as headings in a catalog.” An authority file is “a set of authority records listing the chosen form of a heading and its appropriate cross-references. Types of authority files include name authority files, series authority files, and subject authority files.” (University of Buffalo Library. Central Technical Services Website)
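The idea of an authority file as "a set of authority records listing the chosen form of a heading and its appropriate cross-references" can be pictured with a small sketch. This is only an illustration under stated assumptions: the class names `AuthorityRecord` and `AuthorityFile`, the `see_from` field, and the normalization rule are hypothetical and do not reproduce any library system or the MARC authority format. The sample heading is illustrative data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuthorityRecord:
    """One authority record: the chosen heading plus its cross-references."""
    heading: str                                        # chosen (authorized) form
    see_from: list[str] = field(default_factory=list)   # variant forms that point to the heading


class AuthorityFile:
    """A set of authority records, searchable by any recorded form (hypothetical design)."""

    def __init__(self) -> None:
        self._by_form: dict[str, AuthorityRecord] = {}

    @staticmethod
    def _key(form: str) -> str:
        # Assumed normalization: ignore case and extra spaces when matching forms.
        return " ".join(form.lower().split())

    def add(self, record: AuthorityRecord) -> None:
        # Index the record under its heading and under every cross-reference.
        for form in [record.heading, *record.see_from]:
            self._by_form[self._key(form)] = record

    def authorized_form(self, form: str) -> Optional[str]:
        """Return the chosen heading for any known form, or None if unknown."""
        record = self._by_form.get(self._key(form))
        return record.heading if record else None


# Illustrative sample data for a name authority file.
names = AuthorityFile()
names.add(AuthorityRecord(
    heading="Twain, Mark, 1835-1910",
    see_from=["Clemens, Samuel Langhorne, 1835-1910"],
))
```

Looking up either form, e.g. `names.authorized_form("Clemens, Samuel Langhorne, 1835-1910")`, leads to the single chosen heading, which is what keeps catalog headings consistent.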


Definitions Compared

1. Controlled vocabulary is ordinarily spoken of in terms of subject or topical indexing.

2. Authority work is ordinarily spoken of in terms of the specially created headings constructed in library catalogs, including those related to authors and titles, as well as those related to subjects (i.e., both subject headings and classification call numbers).

3. Keeping a record of CV is ordinarily done in the form of a thesaurus, whereas keeping a record of names, titles, and subject headings in authority work is ordinarily done in the form of an authority file. Most such files of the latter kind are open-ended and incomplete with respect to listing all possible terms.


Purposes of Controlled Vocabulary (CV)

1. From the standpoint of the searcher:
   a. It disambiguates
      • equivalent terms
      • homographic terms
   b. It provides term relationships to aid system navigation
      • for assisting in query formulation or reformulation
      • for searching efficiency

2. From the standpoint of the information objects & data strings to which it refers
   a. It links similar or like objects and data strings
   b. It gathers together similar or like objects and data strings

In short, CV accomplishes the act of “collocation.”
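A rough sketch of these purposes in code, assuming a toy vocabulary: equivalent entry terms are mapped to one preferred term, a homographic term is split by qualifiers, and records are then gathered (collocated) under the preferred term. The tables `EQUIVALENTS` and `HOMOGRAPHS`, the function names, and the sample terms are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Equivalent terms: several entry terms lead to one preferred system term.
EQUIVALENTS = {
    "automobiles": "Automobiles",
    "cars": "Automobiles",
    "motor cars": "Automobiles",
}

# Homographic terms: one spelling, several meanings, separated by qualifiers.
HOMOGRAPHS = {
    "mercury": ["Mercury (Planet)", "Mercury (Element)"],
}


def resolve(entry_term):
    """Return the preferred term(s) that a searcher's entry term could mean."""
    key = entry_term.strip().lower()
    if key in HOMOGRAPHS:
        return list(HOMOGRAPHS[key])      # the searcher must choose among meanings
    if key in EQUIVALENTS:
        return [EQUIVALENTS[key]]
    return []                             # term is not in the vocabulary


def collocate(records):
    """Gather record ids under the preferred term for each record's entry term."""
    groups = defaultdict(list)
    for record_id, entry_term in records:
        candidates = resolve(entry_term)
        if len(candidates) == 1:          # unambiguous: file under the preferred term
            groups[candidates[0]].append(record_id)
        else:                             # unknown or homographic: needs a human decision
            groups["(unresolved)"].append(record_id)
    return dict(groups)


records = [("doc1", "Cars"), ("doc2", "automobiles"), ("doc3", "Motor cars")]
collocated = collocate(records)
```

Here all three records end up together under "Automobiles" even though each used a different entry term, while a search on "Mercury" returns two qualified candidates rather than a mixed result set.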


Considerations when Creating a Controlled Vocabulary (CV)

1. Relationship to “automatic” indexing
2. Labor-intensive (even though it represents a value-added activity of information organization)
   a. Thus, expensive
   b. Thus, given human work, will contain errors
   c. Not all retrieval needs it
      • Retrieval as “mapping”
        vs.
      • Retrieval as Question-Answer


Elements of Controlled Vocabularies

Disambiguation of Equivalent Names & Titles

1. Names
   a. Persons, family names
      • With surnames
        – Single
        – Compound
      • Forenames only
   b. Corporate body names (inc. private & public sectors)
   c. Geographic names
2. Titles
   a. Variant titles
   b. Ambiguous titles
   c. Constructed titles
3. Concepts/Subject terms, etc.
4. Near-synonymy [synonym rings in Zeng, Kent State]

For examples of names and titles, see Miksa, Kinds of Access Points, or, his Chapter 7 (Access Points: Kinds and Forms)


Relationships between Terms & Examples (from Zeng, Kent State 4)

1. Semantic Linking

2. Equivalency

3. Hierarchy

4. Associative

5. [Other]
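The equivalency, hierarchy, and associative relationships listed above can be sketched as reciprocal links between terms. The sketch below is an assumption-laden illustration, not Zeng's model: the `Thesaurus` class, its method names, and the sample terms are hypothetical. Equivalency is stored as use / used-for pairs, hierarchy as broader / narrower pairs, and associative links as symmetric related-term pairs.

```python
from collections import defaultdict


class Thesaurus:
    """Toy store for three kinds of term relationships (hypothetical design)."""

    def __init__(self):
        self.use = {}                      # non-preferred term -> preferred term
        self.used_for = defaultdict(set)   # preferred term -> non-preferred terms
        self.broader = defaultdict(set)    # term -> broader terms
        self.narrower = defaultdict(set)   # term -> narrower terms
        self.related = defaultdict(set)    # term -> associated terms

    def add_equivalent(self, non_preferred, preferred):
        # Equivalency: record the link in both directions.
        self.use[non_preferred] = preferred
        self.used_for[preferred].add(non_preferred)

    def add_hierarchy(self, broader, narrower):
        # Hierarchy: every broader link has a reciprocal narrower link.
        self.narrower[broader].add(narrower)
        self.broader[narrower].add(broader)

    def add_associative(self, a, b):
        # Associative: a symmetric link between terms that are neither
        # equivalent nor hierarchically related.
        self.related[a].add(b)
        self.related[b].add(a)

    def all_narrower(self, term):
        """Every term below `term` in the hierarchy, at any depth."""
        found, stack = set(), [term]
        while stack:
            for child in self.narrower.get(stack.pop(), ()):
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found


# Illustrative sample terms.
t = Thesaurus()
t.add_equivalent("Felis catus", "Cats")
t.add_hierarchy("Mammals", "Cats")
t.add_hierarchy("Cats", "Siamese cats")
t.add_associative("Cats", "Pet care")
```

Storing each relationship reciprocally is what lets a searcher navigate in either direction, for example broadening a query from "Siamese cats" to "Cats" or following an associative link from "Pet care" back to "Cats".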


Structures of CVs (from Zeng, Kent State 3)

1. Lists

2. Synonym rings

3. Taxonomy

4. Thesaurus
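One way to see how these structures differ is to look at the data each one needs. The sketch below is a simplified illustration with hypothetical names and sample terms: a list only checks membership, a synonym ring expands a query to all equivalent terms without preferring any of them, and a taxonomy adds parent links. A thesaurus would add equivalence and associative links on top of the hierarchy and is not shown here.

```python
# 1. List: a flat set of allowed terms; the only control is membership.
FORMATS = {"Book", "Map", "Sound recording"}

# 2. Synonym ring: terms treated as equivalent for retrieval, none preferred.
SYNONYM_RINGS = [
    {"car", "automobile", "motor car"},
]

# 3. Taxonomy: each term points to its parent (None marks a top term).
TAXONOMY = {
    "Vehicles": None,
    "Cars": "Vehicles",
    "Sports cars": "Cars",
}


def is_allowed(term):
    """List check: is the term one of the enumerated values?"""
    return term in FORMATS


def expand_query(term):
    """Synonym ring: search on every member of the ring that contains the term."""
    for ring in SYNONYM_RINGS:
        if term in ring:
            return set(ring)
    return {term}


def path_to_top(term):
    """Taxonomy: walk from a term up through its parents to the top term."""
    path = []
    while term is not None:
        path.append(term)
        term = TAXONOMY[term]
    return path
```

For instance, `expand_query("car")` yields all three ring members, and `path_to_top("Sports cars")` walks up through "Cars" to "Vehicles".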


Establishing Terms and the Idea of “Warrant”

1. See Zeng, Kent State 2, specifically section 2.4

a. Literary warrant

b. User warrant

c. Organizational warrant

2. See the many writings of Claire Beghtol on the idea of “warrant”


Sources of CV terms

1. Library of Congress

a. LoC “Authorities” website

b. LoC ClassificationWeb for Library of Congress Subject Headings (LCSH) and the Library of Congress Classification System (LCC)

c. DDC website

d. LoC Cataloger’s Desktop for Subject Cataloging Manual (SCM:SH)


Established Thesauri & Taxonomies

1. E.g., UNESCO thesaurus

2. Queensland University of Technology List of Sources (Queensland Univ. of Tech’y)

3. Library and Archives Canada (Thesauri and Controlled Vocabularies—Bibliography)

4. Resource Description Framework (RDF)—Schema Web