IS 257 – Fall 2007 2007.04.04 - SLIDE 1 Thesaurus Construction and Use University of California, Berkeley School of Information IS 245: Organization of Information In Collections

IS 257 – Fall 2007 2007.04.04 - SLIDE 1

Thesaurus Construction and Use

University of California, Berkeley School of Information

IS 245: Organization of Information In Collections

IS 257 – Fall 2007 2007.04.04 - SLIDE 2

Lecture Overview

• Review
  – Facetted Classification
    • Traditional vs. Facetted Classification
    • Designing Facetted Classifications
• Today
  – Thesaurus design
  – Steps in Thesaurus development
  – Indexing

IS 257 – Fall 2007 2007.04.04 - SLIDE 3

Hierarchical Classification

[Figure: tree diagram of a hierarchical classification]
Literature
  English | French | Spanish
    Prose | Poetry | Drama (under each language)
      16th | 17th | 18th | 19th (under each genre)
      ...

Slide author: Marti Hearst

IS 257 – Fall 2007 2007.04.04 - SLIDE 4

Labeled Categories for Hierarchical Classification

• LITERATURE
  – 100 English Literature
    • 110 English Prose
      – English Prose 16th Century
      – English Prose 17th Century
      – English Prose 18th Century
      – ...
    • 111 English Poetry
      – 121 English Poetry 16th Century
      – 122 English Poetry 17th Century
      – ...
    • 112 English Drama
      – 130 English Drama 16th Century
      – …
  – 200 French Literature

Slide author: Marti Hearst

IS 257 – Fall 2007 2007.04.04 - SLIDE 5

Facetted Categories

• Mutually exclusive
  – Non-overlapping, distinct categories
• Relational
  – Relations between facets, subfacets, and foci (elements) are not restricted to hierarchical generalization-specialization relations
• Composable
  – Combined using grammars of order and relation to form compound descriptions

IS 257 – Fall 2007 2007.04.04 - SLIDE 6

Facetted Classification Along With Labeled Categories

• A Language
  – a English
  – b French
  – c Spanish
• B Genre
  – a Prose
  – b Poetry
  – c Drama
• C Period
  – a 16th Century
  – b 17th Century
  – c 18th Century
  – d 19th Century

• Aa English Literature
• AaBa English Prose
• AaBaCa English Prose 16th Century
• AbBbCd French Poetry 19th Century
• BcCd Drama 19th Century

Slide author: Marti Hearst
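The labeled notation on this slide can be sketched in a few lines of Python. This is only an illustration: the facet letters and focus labels are copied from the slide, while the names `FACETS`, `decode`, and `encode` are assumptions introduced here, not part of any published scheme.

```python
# Facets and foci copied from the slide; the function names are illustrative.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}


def decode(notation: str) -> dict:
    """Split a compound code such as 'AaBaCa' into facet name -> focus label."""
    if len(notation) % 2 != 0:
        raise ValueError(f"notation must be facet/focus pairs: {notation!r}")
    result = {}
    for i in range(0, len(notation), 2):
        facet_code, focus_code = notation[i], notation[i + 1]
        facet_name, foci = FACETS[facet_code]
        result[facet_name] = foci[focus_code]
    return result


def encode(**foci_by_facet: str) -> str:
    """Build a code from labels, always citing facets in A, B, C order."""
    parts = []
    for facet_code, (facet_name, foci) in FACETS.items():
        label = foci_by_facet.get(facet_name)
        if label is None:
            continue  # a facet may be omitted, as in "Aa English Literature"
        focus_code = next(code for code, name in foci.items() if name == label)
        parts.append(facet_code + focus_code)
    return "".join(parts)
```

For example, `encode(Language="French", Genre="Poetry", Period="19th Century")` yields the slide's `AbBbCd`, and `decode("AaBa")` yields English Prose. Fixing the citation order (A, then B, then C) is one simple "grammar of order"; real schemes define their own.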

IS 257 – Fall 2007 2007.04.04 - SLIDE 7

Ranganathan

• PMEST Facets
  – P(ersonality)
    • WHO: The most important types or names of things for the particular discipline
  – M(atter)
    • WHAT: Constituent materials
  – E(nergy)
    • HOW: Action or activity terms
  – S(pace)
    • WHERE: Where things occur
  – T(ime)
    • WHEN: When things occur

IS 257 – Fall 2007 2007.04.04 - SLIDE 8

“Classical” CRG/BC2 Facet Analysis

• Entity

• Kind

• Part

• Property

• Material

• Process

• Operation

• Patient

• Product

• By-Product

• Agent

• Space

• Time

IS 257 – Fall 2007 2007.04.04 - SLIDE 9

“Classical” Facet Analysis

• What is being done?
  – Entity
  – Kind
  – Product
  – By-Product
• What are its parts?
  – Part
• What are its properties?
  – Property
  – Material
• How is this achieved?
  – Process
• By what means?
  – Operation
• By whom?
  – Agent
  – Patient
• Where?
  – Space
• When?
  – Time

IS 257 – Fall 2007 2007.04.04 - SLIDE 10

“Classical” Facet Analysis

• Nouns
  – Entity
  – Kind
  – Part
  – Patient
  – Product
  – By-Product
  – Agent
• Adjectives
  – Property
  – Material
• Intransitive Verb
  – Process
• Transitive Verb
  – Operation
• Adverb
  – Space
  – Time

IS 257 – Fall 2007 2007.04.04 - SLIDE 11

Semantic and Syntactic Relationships

• Semantic relationships
  – Is-A (thing/kind, genus/species)
    • Mammals
      – Primates
        » Humans
  – Has-Parts
    • Human
      – Head
        » Eyes
• Syntactic relationships
  – Compounds
    • Wheat + harvesting = “wheat harvesting”
    • Object + operation = operation on object

IS 257 – Fall 2007 2007.04.04 - SLIDE 12

Facetted Classification

• Clearly distinguishes between semantic relationships and syntactic relationships
  – Semantic relationships
    • Within a facet
    • Containment relations
  – Syntactic relationships
    • Across facets
    • Combinatoric relations
• Have a “syntax” for syntactic combination of semantic terms

IS 257 – Fall 2007 2007.04.04 - SLIDE 13

Power of Facet Combinations

• The syntactic relations of facetted classifications enable a small controlled vocabulary to produce
  – Many, many structured descriptions
  – Complex, but formally structured descriptions using nested compound descriptions
  – Descriptions for things we do not have words for
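A rough way to see the combinatorial effect is to count the descriptions a small vocabulary can generate. The sketch below reuses the Language, Genre, and Period facets from the earlier labeled-categories example; the helper `optional` and the choice to count partial descriptions (a facet left unspecified) are assumptions made for illustration.

```python
from itertools import product

LANGUAGE = ["English", "French", "Spanish"]
GENRE = ["Prose", "Poetry", "Drama"]
PERIOD = ["16th Century", "17th Century", "18th Century", "19th Century"]


def optional(foci):
    # None stands for "facet not specified", so partial descriptions such as
    # "English Prose" or "Poetry 19th Century" are counted as well.
    return [None] + foci


vocabulary_size = len(LANGUAGE) + len(GENRE) + len(PERIOD)

descriptions = [
    " ".join(part for part in combo if part is not None)
    for combo in product(optional(LANGUAGE), optional(GENRE), optional(PERIOD))
    if any(part is not None for part in combo)  # drop the empty description
]

print(vocabulary_size)    # 10 controlled terms
print(len(descriptions))  # 4 * 4 * 5 - 1 = 79 distinct descriptions
```

Ten terms yield 79 descriptions here, and the ratio grows quickly as facets or foci are added, since the counts multiply across facets rather than add.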

IS 257 – Fall 2007 2007.04.04 - SLIDE 14

Today

• More on thesaurus standards and examples

IS 257 – Fall 2007 2007.04.04 - SLIDE 15

Types of Indexing Languages

• Uncontrolled keyword indexing

• Indexing languages
  – Controlled, but not structured

• Thesauri
  – Controlled and structured

• Classification systems
  – Controlled, structured, and coded

• Facetted classification systems

IS 257 – Fall 2007 2007.04.04 - SLIDE 16

Thesauri

• A Thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among synonymous, equivalent, broader, narrower and other related terms
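One possible in-memory representation of this definition is sketched below. The class and method names (`Entry`, `Thesaurus`, `add`, `preferred`) and the sample terms are assumptions for illustration; the link types follow the conventional UF (used for), BT (broader term), NT (narrower term), and RT (related term) relationships.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One preferred term (descriptor) and its links."""
    term: str
    used_for: set = field(default_factory=set)   # UF: synonymous/equivalent terms
    broader: set = field(default_factory=set)    # BT
    narrower: set = field(default_factory=set)   # NT
    related: set = field(default_factory=set)    # RT


class Thesaurus:
    def __init__(self):
        self.entries = {}   # descriptor -> Entry
        self.use = {}       # non-preferred term -> descriptor

    def add(self, term, used_for=(), broader=(), related=()):
        entry = self.entries.setdefault(term, Entry(term))
        for synonym in used_for:
            entry.used_for.add(synonym)
            self.use[synonym] = term
        for parent in broader:
            # Keep BT/NT reciprocal: a broader link implies the inverse.
            entry.broader.add(parent)
            self.entries.setdefault(parent, Entry(parent)).narrower.add(term)
        for other in related:
            # RT is symmetric.
            entry.related.add(other)
            self.entries.setdefault(other, Entry(other)).related.add(term)
        return entry

    def preferred(self, term):
        """Return the descriptor for any known term, or None."""
        if term in self.entries:
            return term
        return self.use.get(term)


# Made-up sample data, for illustration only.
t = Thesaurus()
t.add("Automobiles", used_for=["Cars", "Motor cars"], broader=["Vehicles"])
t.add("Trucks", broader=["Vehicles"], related=["Automobiles"])
```

With this data, `t.preferred("Cars")` returns `"Automobiles"`, and the entry for `"Vehicles"` lists both `"Automobiles"` and `"Trucks"` as narrower terms because `add` maintains the reciprocal links.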

IS 257 – Fall 2007 2007.04.04 - SLIDE 17

Thesaurus Standards

• National and International Standards for Thesauri
  – ANSI/NISO Z39.19-1994 -- American National Standard Guidelines for the Construction, Format and Management of Monolingual Thesauri
  – ANSI/NISO Draft Standard Z39.4-199x -- American National Standard Guidelines for Indexes in Information Retrieval
  – ISO 2788 -- Documentation -- Guidelines for the establishment and development of monolingual thesauri
  – ISO 5964 -- Documentation -- Guidelines for the establishment and development of multilingual thesauri

IS 257 – Fall 2007 2007.04.04 - SLIDE 18

Thesaurus Examples

• Examples
  – Non-Facetted
    • The ERIC Thesaurus of Descriptors
  – Semi-Facetted
    • The Medical Subject Headings (MESH) of the National Library of Medicine
  – Facetted
    • The Art and Architecture Thesaurus

IS 257 – Fall 2007 2007.04.04 - SLIDE 19

ERIC Thesaurus – Entry

IS 257 – Fall 2007 2007.04.04 - SLIDE 20

ERIC Thesaurus – Alphabetic

IS 257 – Fall 2007 2007.04.04 - SLIDE 21

ERIC Thesaurus – KWIC Index

IS 257 – Fall 2007 2007.04.04 - SLIDE 22

ERIC Thesaurus – Hierarchies

IS 257 – Fall 2007 2007.04.04 - SLIDE 23

ERIC Thesaurus – Groups

IS 257 – Fall 2007 2007.04.04 - SLIDE 24

ERIC Thesaurus – Online

http://www.ericfacility.net/extra/pub/thessearch.cfm

IS 257 – Fall 2007 2007.04.04 - SLIDE 25

MESH – Entry

IS 257 – Fall 2007 2007.04.04 - SLIDE 26

MESH – Alphabetic

IS 257 – Fall 2007 2007.04.04 - SLIDE 27

MESH – Tree Structures

IS 257 – Fall 2007 2007.04.04 - SLIDE 28

MESH – KWOC Index

IS 257 – Fall 2007 2007.04.04 - SLIDE 29

MESH - Online

http://www.nlm.nih.gov/mesh/meshhome.html

IS 257 – Fall 2007 2007.04.04 - SLIDE 30

AAT – Facets

IS 257 – Fall 2007 2007.04.04 - SLIDE 31

AAT – Hierarchies (print)

IS 257 – Fall 2007 2007.04.04 - SLIDE 32

AAT – Hierarchies (online)

http://www.getty.edu/research/tools/vocabulary/aat/

IS 257 – Fall 2007 2007.04.04 - SLIDE 33

AAT – Entry (online)

IS 257 – Fall 2007 2007.04.04 - SLIDE 34

Lecture Overview

• Thesaurus Design and Development
  – Controlled Vocabularies for topical description
  – Thesaurus Design
  – Steps In Thesaurus Development (intro)

IS 257 – Fall 2007 2007.04.04 - SLIDE 35

Why Develop a Thesaurus?

• To provide a conceptual structure or “space” for a body of information
  – To make it possible to adequately describe the topical content of information resources at an appropriate level of generality or specificity
  – To provide enhanced search capabilities and to improve the effectiveness of searching (i.e., to retrieve most of the relevant material without too much irrelevant material)

IS 257 – Fall 2007 2007.04.04 - SLIDE 36

Why Develop a Thesaurus?

• To provide vocabulary (or terminological) control
  – When there are several possible terms designating a single concept, the thesaurus should lead the indexer or searcher to the appropriate concept, regardless of the terms they start with

IS 257 – Fall 2007 2007.04.04 - SLIDE 37

Preliminary Considerations

• What is used now?
  – Continue using an existing thesaurus?
  – Ad hoc modification of existing thesaurus?
  – Develop a new well-structured thesaurus?

• What is the scope and complexity of the subject field?

• What kind of retrieval objects or data will be dealt with?

• How exhaustive and specific is the desired description of objects?

IS 257 – Fall 2007 2007.04.04 - SLIDE 38

Preliminary Considerations

• The scope and complexity of the field will provide some indication of the scope and complexity of the thesaurus
  – It is better to plan for a larger and more comprehensive system than for a smaller system that will rapidly become inadequate as the database grows

• Development of a good thesaurus requires a major intellectual effort as well as clerical operations like data entry and production of sorted lists

IS 257 – Fall 2007 2007.04.04 - SLIDE 39

Development of a Thesaurus

• Term Selection.

• Merging and Development of Concept Classes.

• Definition of Broad Subject Fields and Subfields.

• Development of Classificatory structure

• Review, Testing, Application, Revision.

IS 257 – Fall 2007 2007.04.04 - SLIDE 40

1. Term Selection

• Select sources for the collection of terms.
  – Prearranged Sources
  – Open-ended Sources

• Assign codes to each source.

• Selection of terms
  – For part of pre-arranged and for all open-ended sources

• Enter terms into database with all information.

IS 257 – Fall 2007 2007.04.04 - SLIDE 41

1.1 Kinds of Sources

• Prearranged Sources
  – Existing descriptor lists, classification schemes, thesauri. This includes universal schemes like DDC or LCSH.
  – Nomenclatures of single disciplines
  – Treatises on the terminology of a field
  – Encyclopedias, lexica, dictionaries and glossaries.
  – Tables of contents of textbooks and handbooks
  – Indexes of journals or abstracting journals
  – Indexes of other publications in the field

IS 257 – Fall 2007 2007.04.04 - SLIDE 42

1.1 Kinds of Sources

• Open-ended sources
  – Lists of search requests or interest profiles
  – Description of projects/activities to be served by the information retrieval system.
  – Discussion with specialists in the field
  – Sample of documents in the field
    • Ask users why and how these documents relate to the field.
    • Have documents indexed by experts in the field
  – Lists of titles of documents in the field
  – Abstracts and reviews of documents
  – Your own knowledge

IS 257 – Fall 2007 2007.04.04 - SLIDE 43

Selection of sources

• Prearranged sources require less effort in gathering the material, and may already indicate some relationships among terms and between terms and concepts.

• Open-ended sources can reflect current terminology and may provide more complete coverage.

• Choose a set of sources that are current, as complete as possible, and considered authoritative.

IS 257 – Fall 2007 2007.04.04 - SLIDE 44

Selection of Sources

• Each selected source is assigned an ID for tracking its use in the development of the thesaurus.
  – Useful when making decisions about which terms to prefer
  – Useful for backtracking when questions arise (where did this come from?)
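Source tracking of this kind could be recorded as sketched below. The source codes (`S01` and so on), the sample terms, and the function names are all hypothetical; the point is only that each candidate term carries the IDs of the sources that supplied it.

```python
from collections import defaultdict

# Hypothetical source codes; a real project would assign its own.
SOURCES = {
    "S01": "Existing descriptor list (prearranged)",
    "S02": "Textbook index (prearranged)",
    "S03": "Sample of search requests (open-ended)",
}

term_sources = defaultdict(set)  # candidate term -> codes of sources using it


def record(term, source_id):
    """Note that a candidate term was found in the given source."""
    if source_id not in SOURCES:
        raise KeyError(f"unknown source code: {source_id}")
    term_sources[term].add(source_id)


def provenance(term):
    """Answer 'where did this come from?' for one term."""
    return sorted(term_sources.get(term, ()))


def by_support():
    """Rank candidate terms by how many sources attest them."""
    return sorted(term_sources, key=lambda t: (-len(term_sources[t]), t))


# Made-up candidate terms, for illustration only.
record("thesauri", "S01")
record("thesauri", "S03")
record("controlled vocabularies", "S01")
record("word lists", "S03")
```

Here `provenance("thesauri")` returns both source codes, and `by_support()` puts the term attested by the most sources first, which is one simple signal when deciding which term to prefer.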

IS 257 – Fall 2007 2007.04.04 - SLIDE 45

Selection of Terms

• Terms can be transferred directly from prearranged sources to the recording medium (cards or database)
  – Have to decide which terms and references to include, or to take the whole source

IS 257 – Fall 2007 2007.04.04 - SLIDE 46

Selection of Terms

• In open-ended sources you read through the source and pick out terms (i.e., words and phrases) that might be useful in retrieval or as references to other terms.

• Alternatively, use keyword and phrase extraction software to create lists of terms and select from those.

• Transfer selected terms to the recording medium (cards or database).
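As a very rough stand-in for keyword and phrase extraction software, the sketch below counts single words and two-word phrases that contain no stopwords. The stopword list, the thresholds, and the sample titles are assumptions; production tools use far more sophisticated linguistic processing, and a person still selects from the resulting list.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "and", "as", "for", "in", "is", "of", "or", "the", "to", "with"}


def candidate_terms(texts, max_words=2, min_count=2):
    """Count words and short phrases that contain no stopwords."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                if not any(word in STOPWORDS for word in gram):
                    counts[" ".join(gram)] += 1
    # Keep only candidates seen at least min_count times, most frequent first.
    return [(term, c) for term, c in counts.most_common() if c >= min_count]


# Made-up document titles standing in for an open-ended source.
titles = [
    "Subject indexing with a controlled vocabulary",
    "Controlled vocabulary design for image retrieval",
    "Evaluating subject indexing in retrieval systems",
]
for term, count in candidate_terms(titles):
    print(count, term)
```

Phrases such as "controlled vocabulary" and "subject indexing" surface because they recur across titles, while one-off words fall below the `min_count` threshold.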

IS 257 – Fall 2007 2007.04.04 - SLIDE 47

2. Merging and Development of Concept Classes

• Sort Term DB into alphabetical order.

• First Round: Merge information for identical terms -- possibly pulling info from additional sources.

• Second Round: Merge synonyms or terms in the same concept class.
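The sort and the two merge rounds might look like the following sketch. The records, the source codes, and the `concept_of` synonym table are made up for illustration; deciding which terms belong to the same concept class is an intellectual judgment that code can only record, not make.

```python
from collections import defaultdict

# (term, source code, note) records as they might come out of term selection.
# All values are made up for illustration.
records = [
    ("Cars", "S03", "from search requests"),
    ("automobiles", "S01", "descriptor in existing list"),
    ("Automobiles", "S02", "textbook index entry"),
    ("motor cars", "S02", "see Automobiles"),
    ("Trucks", "S01", "descriptor in existing list"),
]

# Sort the term database alphabetically, ignoring case.
records.sort(key=lambda r: r[0].lower())

# First round: merge information for identical terms.
merged = defaultdict(lambda: {"sources": set(), "notes": []})
for term, source, note in records:
    key = term.lower()
    merged[key]["sources"].add(source)
    merged[key]["notes"].append(note)

# Second round: merge terms in the same concept class, using a hand-made
# table that maps each synonym to the term standing for its concept.
concept_of = {"cars": "automobiles", "motor cars": "automobiles"}

classes = defaultdict(lambda: {"terms": set(), "sources": set()})
for term, info in merged.items():
    concept = concept_of.get(term, term)
    classes[concept]["terms"].add(term)
    classes[concept]["sources"] |= info["sources"]
```

After both rounds, the five input records collapse into two concept classes, and the "automobiles" class carries the combined source codes of all three of its terms.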

IS 257 – Fall 2007 2007.04.04 - SLIDE 48

3. Definition of Broad Subject Fields and Subfields

• Define Broad Subject fields and sort terms into these broad fields

• Define subfields within each broad field and sort terms into these subfields.

• Work out the detailed structure
  – Select Preferred Terms
  – Merge information for terms in the same concept class

• Repeat these steps
  – for each subfield within a broad field
  – and for each broad field
  – Until all terms have been consolidated and preferred terms selected

IS 257 – Fall 2007 2007.04.04 - SLIDE 49

4. Development of Classificatory Structure

• Produce preliminary version of classified index and update the working database.

• Improve classificatory structure

• Reality check: produce a version of the classified index and distribute it to users/experts.

IS 257 – Fall 2007 2007.04.04 - SLIDE 50

5. Final Stages

• Review

• Testing

• Application

• Revision

IS 257 – Fall 2007 2007.04.04 - SLIDE 51

Review

• Discuss classified index with users/experts.
  – Select descriptors and checklist descriptors.

• Assign Notational Symbols

• Produce Main Thesaurus & Indexes

IS 257 – Fall 2007 2007.04.04 - SLIDE 52

Review (cont.)

• Check cross references and insert where needed

• Produce Test Version

• Test by Indexing

• Modify as needed

• Produce Production Version.

IS 257 – Fall 2007 2007.04.04 - SLIDE 53

Testing a Thesaurus

• Assign descriptors to a sample set of NEW documents (use enough to get an idea of any gaps in the thesaurus).

• Test retrieval using sample questions and see how effectively the thesaurus maps to the appropriate descriptor

IS 257 – Fall 2007 2007.04.04 - SLIDE 54

Flow of Work in Thesaurus Construction

[Flowchart. Box labels are listed in the order they were extracted from the slide; the connecting arrows and the targets of the Yes/No branches did not survive extraction.]
• Select Sources
• Assign codes
• Select Terms
• Record Selected Terms
• Sort Terms
• Merge identical Terms
• Define Broad Subject Fields
• Merge Terms in Same Concept class
• Sort Terms into Broad Subject Fields
• Define Subfields within one Subject Field
• Work out detailed structure of the Subject Field
• Select Preferred Terms
• All Subfields of Broad Subject finished? (Yes/No)
• All Broad Subjects finished? (Yes/No)
• Improve Class Structure
• Print Classified Index and review
• Discuss with Experts and Users
• Select descriptors and checklist items
• Produce Full Thesaurus and Check references
• Assign Notation
• Review and Test
• Many Modifications? (Yes/No)
• Revise as needed

Based on Soergel, pp 327-333

IS 257 – Fall 2007 2007.04.04 - SLIDE 55

The Indexing Process

• Concept identification

• Term selection (via thesaurus)

• Term assignment
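The three steps can be sketched as a lookup from identified concepts to descriptors. The `USE` table, its sample terms, and the function `index_document` are hypothetical; concept identification itself is assumed to have been done by a human indexer and arrives here as a list of strings.

```python
# Hypothetical miniature thesaurus: entry term -> preferred term (descriptor).
# Descriptors map to themselves.
USE = {
    "automobiles": "automobiles",
    "cars": "automobiles",
    "motor cars": "automobiles",
    "trucks": "trucks",
    "lorries": "trucks",
}


def index_document(concepts):
    """Map identified concepts to descriptors; report the ones not covered."""
    assigned, unmatched = [], []
    for concept in concepts:                    # 1. concepts identified by the indexer
        descriptor = USE.get(concept.lower())   # 2. term selection via thesaurus
        if descriptor is None:
            unmatched.append(concept)           # candidate for thesaurus revision
        elif descriptor not in assigned:
            assigned.append(descriptor)         # 3. term assignment
    return assigned, unmatched


assigned, unmatched = index_document(["Cars", "Lorries", "Motor cars", "Bicycles"])
```

Different entry terms for the same concept collapse to a single descriptor, and concepts the thesaurus cannot express are collected separately so they can feed later revision.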

IS 257 – Fall 2007 2007.04.04 - SLIDE 56

Application: The Indexing Process (Manual)

Adapted from ISO 5963, p.5

[Flowchart. Box and decision labels recovered from the slide; the connecting arrows and the targets of the YES/NO branches did not survive extraction.]
• Start
• Examine Document and Identify Significant Concepts
• Consider First Concept
• Establish Term Denoting Concept
• Does Thesaurus contain term for Concept? (YES/NO)
• Preferred Term? (YES/NO)
• Select Preferred Term
• Consider Preferred Term
• Is Term suitable? (YES/NO)
• Consider any associated terms in Thesaurus (NT, BT)
• Would Concept be better represented by one of these terms? (YES/NO)
• Prefer Alternative Term(s)
• Can Concept be expressed combining terms? (YES/NO)
• Consider Each of These Terms
• Select Alternative term to represent Concept
• Admit New Term Into Thesaurus
• Is There Another Concept? (YES/NO)
• Assign Terms to Document
• End

IS 257 – Fall 2007 2007.04.04 - SLIDE 57

Thesaurus Revision and Updates

• There will always be new concepts, products, or expressions that need to be added to the thesaurus.
  – Set a regular schedule of reviews and revisions.
  – Collect complaints, problems, etc. and fold into revision of the thesaurus

IS 257 – Fall 2007 2007.04.04 - SLIDE 58

References

• Soergel, D. Indexing Languages and Thesauri: Construction and Maintenance. Los Angeles: Melville Publishing Co., 1974.

• Foskett, A.C. The Subject Approach to Information. London: Clive Bingley, 1982.

• Standards:
  – ANSI/NISO Z39.19-1994 -- American National Standard Guidelines for the Construction, Format and Management of Monolingual Thesauri
  – ANSI/NISO Draft Standard Z39.4-199x -- American National Standard Guidelines for Indexes in Information Retrieval
  – ISO 2788 -- Documentation -- Guidelines for the establishment and development of monolingual thesauri
  – ISO 5964 -- Documentation -- Guidelines for the establishment and development of multilingual thesauri