
Knowledge Organization Tools (4): Thesaurus (索引典)


藍文欽 [email protected]

05/15/2003

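Prelude

• A thesaurus is one of the tools of vocabulary control.
• A thesaurus is the authority list for indexing terms and search vocabulary.
• A thesaurus leads from a known concept to the appropriate term for that concept. [concept → term]
• Through the selection of standardized vocabulary, a thesaurus brings material on the same concept together (grouping).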

Introduction

• The original meaning of “thesaurus” is treasury or collection; the word usually refers to a dictionary of synonyms.
  “A book of words and their synonyms” (Merriam-Webster’s Dictionary)
  “A book of words that are put in groups together according to connections between their meanings rather than in an alphabetical list.” (Longman Dictionary of Contemporary English)
  e.g., Roget’s Thesaurus of English Words and Phrases
• 1957: H. P. Luhn was the first to use “thesaurus” for a dictionary of subject indexing terms (索引典 in Chinese) and to treat it as a tool for vocabulary control. (By another account, Brownson formally introduced the term in 1957.)

Definition

“The vocabulary of a controlled indexing language, formally organized so that the a priori relationships between concepts (for example as “broader” and “narrower”) are made explicit.”

(Source: Guidelines for the establishment and development of monolingual thesauri, ISO 2788:1986)

Definition (cont.)

“A thesaurus may be defined either in terms of its function or its structure. In terms of function, a thesaurus is a terminological control device used in translating from the natural language of documents, indexers or users into a more constrained “system language” (documentation language, information language). In terms of structure, a thesaurus is a controlled and dynamic vocabulary of semantically and generically related terms which covers a specific domain of knowledge.”

(Source: Unesco. The UNISIST Guidelines for the Establishment and Development of Monolingual Thesauri)

Definition (cont.)

“A compilation of words and phrases showing synonymous, hierarchical, and other relationships and dependencies, the function of which is to provide a standardized vocabulary for information storage and retrieval.”

“A controlled vocabulary arranged in a known order in which equivalence, homographic, hierarchical, and associative relationships among terms are clearly displayed and identified by standardized relationship indicators, which must be employed reciprocally.”

(Source: Guidelines for the Construction, Format, and Management of Monolingual Thesauri, ANSI/NISO Z39.19-1993)

Definition (cont.)

A thesaurus in the field of information storage and retrieval is a list of terms and/or of other signs (or symbols) indicating relationships among these elements, provided that the following criteria hold:

(a) the list contains a significant proportion of non-preferred terms and/or of preferred terms not used as descriptors;

(b) terminological control is intended.

(Source: Dagobert Soergel. Indexing Languages and Thesauri: Construction and Maintenance. Los Angeles: Melville, 1974. pp. 38-39.)

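Definition (cont.)

“Within the scope of information storage and retrieval, a thesaurus collects the words or terms that can express concepts of knowledge and arranges them in a specific structure. This vocabulary controls synonyms, distinguishes homographs, and shows the hierarchical and semantic relationships among related terms, so that indexers analyzing and processing materials and readers searching for them can choose consistent, controlled vocabulary. In other words, it provides standardized terms for information storage and retrieval.” (translated from the Chinese)

(Source: 蔡明月. 線上資訊檢索:理論與應用. Taipei: 台灣學生, 民 80 [1991]. p. 177.)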

Brief History

• 1959 – the Engineering Information Center of E. I. Du Pont de Nemours developed the first true thesaurus
• 1960 – the Armed Services Technical Information Agency (ASTIA) produced the Thesaurus of ASTIA Descriptors
• 1961 – the American Institute of Chemical Engineers (AIChE) published the Chemical Engineering Thesaurus
• 1964 – the Engineers Joint Council (EJC) published the Thesaurus of Engineering Terms
• 1967 – Thesaurus of Engineering and Scientific Terms (TEST)

Brief History (cont.)

• 1967 – the Committee on Scientific and Technical Information (COSATI) published the first set of guidelines for thesaurus construction
• 1970 – Unesco Guidelines for the Establishment and Development of Monolingual Scientific and Technical Thesauri
• 1974 – ANSI (American National Standards Institute) Z39.19 [a US national standard for thesaurus construction]
• 1974 – the first international standard for thesaurus construction: ISO 2788

Purposes and Use of Thesauri

“Its purposes are to promote consistency in the indexing of documents, predominantly for postcoordinated information storage and retrieval systems, and to facilitate searching by linking entry terms with descriptors.” (ANSI Z39.19-1993, p. 38)

Four principal purposes are served by a thesaurus:
a) Translation. To provide a means for translating the natural language of authors, indexers, and users into a controlled vocabulary used for indexing and retrieval.
b) Consistency. To promote consistency in the assignment of index terms.
c) Indication of Relationships. To indicate semantic relationships among terms.
d) Retrieval. To serve as a searching aid in retrieval of documents.
(ANSI Z39.19-1993, p. 1)

Vocabulary Control

The need to control the formation and use of terms stems mainly from two basic features of natural language:
• Synonyms – different terms representing the same concept
• Polysemes – words with multiple meanings [in spoken language, polysemes are homonyms; in written language, they are homographs, that is, terms with the same spelling representing different concepts. Only the latter is relevant to thesauri.]

Vocabulary Control (cont.)

Vocabulary control in a thesaurus is achieved through three principal means:
a) the delineation of the scope, or meaning, of descriptors → Scope Note (SN)
b) the linking of synonymous and nearly synonymous (quasi-synonymous) terms through the equivalence relationship → USE and UF
c) the disambiguation of homographs → Qualifier
(Source: ANSI Z39.19-1993, p. 1)
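The three means of control listed above can be pictured as fields on a single descriptor record. The following Python sketch is only an illustration and is not taken from ANSI Z39.19: the class name `Descriptor`, its fields, and the "Mercury" entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """Hypothetical record for one preferred term."""
    label: str                      # the term itself
    qualifier: str = ""             # disambiguates homographs
    scope_note: str = ""            # SN: delimits the intended meaning
    used_for: list = field(default_factory=list)  # UF: non-preferred synonyms

    def display(self):
        # Homographs are shown with a parenthetical qualifier.
        return f"{self.label} ({self.qualifier})" if self.qualifier else self.label


# Illustrative entries only; not taken from any published thesaurus.
metal = Descriptor("Mercury", qualifier="metal",
                   scope_note="The chemical element Hg.",
                   used_for=["Quicksilver"])
planet = Descriptor("Mercury", qualifier="planet")

print(metal.display())   # Mercury (metal)
print(planet.display())  # Mercury (planet)
```

The qualifier keeps the two homographs apart, while the scope note and the UF list carry the other two forms of control.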

Structure and Relationships

An intrinsic feature of a thesaurus is its ability to distinguish and display the structural relationships between the terms it contains.

There are two broad types of relationships within a thesaurus:
• Micro level – the semantic links between individual terms
• Macro level – how the terms and their inter-relationships relate to the overall structure of the subject field

(Source: J. Aitchison, A. Gilchrist, & D. Bawden. Thesaurus Construction and Use: A Practical Manual. 3rd ed. London: Aslib, 1997. p. 47)

Basic Thesaural Relationships

Three basic inter-term relationships:
• Equivalence: the relationship between preferred and non-preferred terms where two or more terms are regarded, for indexing purposes, as referring to the same concept
• Hierarchical: this relationship shows levels of superordination and subordination. The superordinate term represents a class or whole, and the subordinate terms refer to its members or parts
• Associative: this relationship is found between terms which are closely related conceptually but not hierarchically and are not members of an equivalence set

(The descriptions of the relationships on this and the following slides are based mainly on: Aitchison, Gilchrist, & Bawden, 1997, Section F)

Equivalence Relationships

• Descriptors – preferred terms
• Lead-in terms (entry terms) – non-preferred terms

Lead-in term
  USE DESCRIPTOR
DESCRIPTOR
  UF Lead-in term

Example:
耗子 USE 老鼠 (preferred term)
老鼠 UF 耗子 (non-preferred term)
(耗子 is a colloquial word and 老鼠 the standard word for “mouse” or “rat”.)
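The reciprocity of USE and UF can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular thesaurus software works; the class name `EquivalenceTable` and its methods are hypothetical.

```python
from collections import defaultdict


class EquivalenceTable:
    """Hypothetical helper: stores USE/UF links between lead-in terms and descriptors."""

    def __init__(self):
        self._use = {}                 # lead-in term -> descriptor (USE)
        self._uf = defaultdict(list)   # descriptor -> lead-in terms (UF)

    def add(self, lead_in, descriptor):
        # Record both directions at once so the pair stays reciprocal.
        self._use[lead_in] = descriptor
        self._uf[descriptor].append(lead_in)

    def preferred(self, term):
        # A descriptor maps to itself; a lead-in term maps to its descriptor.
        return self._use.get(term, term)

    def used_for(self, descriptor):
        return list(self._uf.get(descriptor, []))


table = EquivalenceTable()
table.add("耗子", "老鼠")
print(table.preferred("耗子"))   # 老鼠
print(table.used_for("老鼠"))    # ['耗子']
```

Because `add` writes the USE and UF links together, a lead-in term can never point to a descriptor that does not list it in return.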

Equivalence Relationships (cont.)

Synonyms – terms that are virtually interchangeable or regarded as the same:
• Popular names and scientific names
• Common nouns or scientific names, and trade names
• Standard names and slang
• Terms originating from different cultures sharing a common language (e.g., pavements/sidewalks)
• Competing names for emerging concepts (e.g., the various Chinese translations of “metadata”)
• Current or favored term versus outdated or deprecated term (e.g., dishwashers/washing-up machines)

Equivalence Relationships (cont.)

Lexical variants – different word forms of the same expression, such as variant spellings, grammatical variations, irregular plurals, direct versus indirect order, and abbreviated formats:
• Variant spellings (e.g., moslems/muslims; mouse/mice; colour/color)
• Direct and indirect form (e.g., academic library vs. library, academic)
• Abbreviations and full names (e.g., ALA vs. American Library Association)

Equivalence Relationships (cont.)

Quasi-synonyms, or near-synonyms – terms whose meanings are generally regarded as different in ordinary usage but which are treated as synonyms for indexing purposes:
• Terms having a significant overlap (e.g., urban areas/cities; gifted people/geniuses)
• Antonyms, or terms representing different viewpoints of the same property continuum (e.g., dryness/wetness; equality/inequality)

Equivalence Relationships (cont.)

Upward posting (generic posting) – a technique which treats narrower terms as if they were equivalent to, rather than a species of, their broader terms. The effect is to reduce the size of the vocabulary.

SOCIAL CLASS
  UF Elite
     Middle class
     Working class
……
Elite
  USE SOCIAL CLASS
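The effect of upward posting at indexing time can be sketched as a simple lookup. The Python below is only an illustration built on the SOCIAL CLASS example; the names `UPWARD_POSTING` and `index_terms` are hypothetical, and "Education" is an invented extra term.

```python
# Illustrative mapping based on the SOCIAL CLASS example above.
UPWARD_POSTING = {
    "Elite": "SOCIAL CLASS",
    "Middle class": "SOCIAL CLASS",
    "Working class": "SOCIAL CLASS",
}


def index_terms(candidate_terms):
    """Replace each upward-posted term with its broader descriptor, dropping duplicates."""
    result = []
    for term in candidate_terms:
        descriptor = UPWARD_POSTING.get(term, term)
        if descriptor not in result:
            result.append(descriptor)
    return result


print(index_terms(["Elite", "Working class", "Education"]))
# ['SOCIAL CLASS', 'Education']
```

Two distinct narrower concepts collapse into one descriptor, which is how the technique keeps the vocabulary small at the cost of specificity.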

Hierarchical Relationships

The relationship is reciprocal and is set out in a thesaurus using the following conventions:
BT (Broader Term)
NT (Narrower Term)

e.g.,
Public Libraries
  BT Libraries
Libraries
  NT Academic Libraries
     Children’s Libraries
     Public Libraries
     ……
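One way to keep BT and NT reciprocal is to store only the BT links and derive the NT lists from them. The following Python sketch assumes that design; the names `BROADER` and `narrower_index` are hypothetical.

```python
from collections import defaultdict

# BT links from the example above: narrower term -> broader term.
BROADER = {
    "Academic Libraries": "Libraries",
    "Children's Libraries": "Libraries",
    "Public Libraries": "Libraries",
}


def narrower_index(broader):
    """Derive the reciprocal NT lists from the BT links."""
    nt = defaultdict(list)
    for term, parent in broader.items():
        nt[parent].append(term)
    # Sort so that the display order is alphabetical.
    return {parent: sorted(children) for parent, children in nt.items()}


print(narrower_index(BROADER)["Libraries"])
# ['Academic Libraries', "Children's Libraries", 'Public Libraries']
```

Since every NT is computed from a BT, the two directions cannot drift apart.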

Hierarchical Relationships (cont.)

• Generic/species relationship – identifies the link between a class or category and its members or species (e.g., Bird / Robin)
• Whole/part relationship
  Systems and organs of the body (e.g., digestive system / stomach)
  Geographical location (e.g., Taipei / Ta-an District)
  Discipline or field of study (e.g., Chemistry / Organic chemistry)
  Hierarchical social structure (e.g., army and its rank system)

Hierarchical Relationships (cont.)

• Instance relationship – links a general category of things or events, expressed by a common noun, to an individual instance of that category; the instance forms a class of one, represented by a proper name (e.g., SEAS / Pacific Ocean)
• Polyhierarchical relationships – a term that has two or more superordinate terms is said to be polyhierarchical.

NURSES
  NT Nurse Administrators
HEALTH ADMINISTRATORS
  NT Nurse Administrators
NURSE ADMINISTRATORS
  BT Health Administrators
     Nurses
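A polyhierarchy means a term may carry a list of broader terms rather than a single one. The Python sketch below collects every broader term reachable from a starting term; the structure `BROADER` and the function `all_broader` are hypothetical names used only for illustration.

```python
# term -> list of broader terms; a term may have more than one BT.
BROADER = {
    "Nurse Administrators": ["Health Administrators", "Nurses"],
    "Health Administrators": [],
    "Nurses": [],
}


def all_broader(term, broader):
    """Return every term reachable through BT links, without duplicates."""
    seen = []
    stack = list(broader.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(broader.get(current, []))
    return sorted(seen)


print(all_broader("Nurse Administrators", BROADER))
# ['Health Administrators', 'Nurses']
```

A search on either broader term could then be expanded to include documents indexed under Nurse Administrators.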

Associative Relationships

The relation is reciprocal, and is distinguished by the abbreviation “RT” (Related Term).

e.g.,
TEACHING
  RT Teaching aids
TEACHING AIDS
  RT Teaching

Associative Relationships (cont.)

Two types of associative relationship:
• Terms belonging to the same category (e.g., motorcycle / bicycle)
• Terms belonging to different categories
  Whole-part (e.g., buildings / doors)
  A discipline and the objects studied (e.g., ethnography / primitive societies)
  An operation or process and the agent or instrument (e.g., motor racing / racing cars)
  An occupation and the person in that occupation (e.g., accountancy / accountants)
  An action and the product of the action (e.g., publishing / music scores)

Associative Relationships (cont.)

• Terms belonging to different categories (cont.)
  An action and its patient (e.g., data analysis / data)
  Concepts related to their properties (e.g., women / femininity)
  Concepts linked by causal dependence (e.g., injury / accidents)
  A thing or action and its counter-agent (e.g., pests / pesticides)
  A raw material and its product (e.g., leather / leather garments)
  An action and a property associated with it (e.g., precision measurement / accuracy)
  A concept and its opposite (e.g., single people / married people)

A Sample Thesaurus Entry – from Thesaurus of ERIC Descriptors

COMPETENCY BASED EDUCATION          Mar. 1980
  CIJE: 884   RIE: 2881   GC: 330
SN Educational system that emphasizes the specification, learning, and demonstrating of those competencies (knowledge, skills, behaviors) that are of central importance to a given task, activity, or career.
UF Consequence Based Education
   Criterion Referenced Education
   Output Oriented Education
NT Competency Based Teacher Education
BT Education
RT Academic Standards
   Accountability
   Back to Basics
   Individualized Instruction
   ……
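An entry of this kind is regular enough to be read by a program. The Python sketch below parses a simplified version of the entry into its relationship fields. It assumes a layout in which each tag starts a line and continuation terms are indented; that layout, the names `ENTRY`, `TAGS`, and `parse_entry`, and the omission of the scope note and posting counts are all simplifications made for illustration, not the ERIC file format.

```python
ENTRY = """\
COMPETENCY BASED EDUCATION
UF Consequence Based Education
   Criterion Referenced Education
   Output Oriented Education
NT Competency Based Teacher Education
BT Education
RT Academic Standards
   Accountability
"""

TAGS = ("UF", "NT", "BT", "RT")


def parse_entry(text):
    """Parse one entry into (descriptor, {tag: [terms]}); the layout is an assumption."""
    lines = [line for line in text.splitlines() if line.strip()]
    descriptor = lines[0].strip()
    fields = {}
    current = None
    for line in lines[1:]:
        head, _, rest = line.partition(" ")
        if head in TAGS:
            current = head
            fields.setdefault(current, []).append(rest.strip())
        elif current is not None:
            # Indented line: another term under the most recent tag.
            fields[current].append(line.strip())
    return descriptor, fields


descriptor, fields = parse_entry(ENTRY)
print(descriptor)     # COMPETENCY BASED EDUCATION
print(fields["BT"])   # ['Education']
```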

Display

• Alphabetical
• Classified
• Hierarchical
• Permuted Keyword Index
• Graphical

(See the handouts distributed in class for details.)

Planning and Design of Thesauri – Two Check Points

• Is a thesaurus necessary?
• If it is, which of the following would be the better or more suitable approach?
  Buying
  Compiling
  Adapting

A very useful Web site for information about thesaurus construction and use, prepared by Willpower Information: http://www.willpower.demon.co.uk/thesbibl.htm

Planning and Design of Thesauri – Information System Considerations

• Subject field
• Type of literature/data
• Quantity of literature/data
• Language considerations
• System users
• Questions, searches, profiles
• Resources available

(Source: Aitchison, Gilchrist, & Bawden, 1997, Section B)

How to Build a Thesaurus – The Top-Down Method

• Convene a group of subject experts to decide on the scope and broad categories of terms to be included.
• Use existing dictionaries and thesauri to decide on the terms and their relationships.
• Review and organize the preliminary term set: decide on preferred terms and make USE references from the variants and synonyms; and build hierarchical and associative relationships among the preferred terms.
• Produce a draft thesaurus, test index, and revise.

(Source: http://www.asindexing.org/site/thesbuild.shtml)

How to Build a Thesaurus – The Bottom-up Method

• Develop a group of subject experts to serve as advisors; work with them to determine the scope if it is not already set.
• If there is a set of representative already-indexed documents, use the index terms from this set as your preliminary term list.
• If not, index a set of representative documents using free language (i.e., no vocabulary control), and take this term set as your preliminary list.
• Build your thesaurus by reviewing and organizing these terms, using a variety of resources as aids, as in the top-down method.
• Refer to your subject experts on terms whose meaning or usage is unclear, and for advice on which variant or synonym to prefer (or on whether two terms really are synonyms in the field).
• Produce a draft thesaurus, test index, and revise.

(Source: http://www.asindexing.org/site/thesbuild.shtml)
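The step of turning free-language index terms into a preliminary term list can be sketched as a frequency count. The Python below is a minimal illustration under stated assumptions: the sample documents are invented, the function name `preliminary_terms` is hypothetical, and the only normalization applied is lowercasing and whitespace cleanup.

```python
from collections import Counter

# Hypothetical free-language index terms assigned to three sample documents.
indexed_documents = [
    ["public libraries", "children", "reading"],
    ["Public Libraries", "funding"],
    ["academic libraries", "funding", "reading"],
]


def preliminary_terms(documents):
    """Count how many documents use each term, after light normalization."""
    counts = Counter()
    for terms in documents:
        # Normalize case and whitespace; count each term once per document.
        counts.update({" ".join(t.lower().split()) for t in terms})
    # Most frequent first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


for term, count in preliminary_terms(indexed_documents):
    print(count, term)
```

The resulting list is only raw material: choosing preferred terms and building relationships remains the editorial work described in the steps above.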

Procedures Involved in Thesaurus Construction

• Collecting terms
• Modifying and inventing terms
• Choosing preferred terms and standardizing the form of words
• Establishing semantic relationships
• Thesaurus arrangement and display
• Testing and revising
• Thesaurus maintenance

The American Society of Indexers provides a list of thesaurus management software: http://www.asindexing.org/site/thessoft.shtml

Standards

• The UNISIST Guidelines for the Establishment and Development of Monolingual Thesauri. 2nd rev. ed. (Paris: UNESCO, 1981)
• Guidelines for the establishment and development of monolingual thesauri, ISO 2788:1986 (http://www.nlc-bnc.ca/iso/tc46sc9/standard/2788e.htm)
• Guidelines for the establishment and development of multilingual thesauri, ISO 5964:1985 (http://www.nlc-bnc.ca/iso/tc46sc9/standard/5964e.htm)

Standards (cont.)

• Guidelines for the Construction, Format, and Management of Monolingual Thesauri, ANSI/NISO Z39.19-1993 (R1998) (http://www.niso.org/standards/resources/Z39-19.html)
• Guidelines for Forming Language Equivalents: A Model Based on the Art & Architecture Thesaurus, prepared by International Terminology Working Group, 1999 (http://www.chin.gc.ca/Resources/Publications/Guidelines/English/index.html)
• 西方單一語文索引典編製標準 (Standard for the Compilation of Western Monolingual Thesauri), CNS 13224

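Examples

• 農業科技索引典 (Agricultural Science and Technology Thesaurus)
• 水資源索引典 (Water Resources Thesaurus)
• 立法資訊系統主題索引典 (Legislative Information System Subject Thesaurus) http://lis.ly.gov.tw/lghtml/crshelp/search.htm
• 食品科技索引典 (Food Science and Technology Thesaurus)
• 科技索引典 (Science and Technology Thesaurus)
• 中文教育類詞庫 (Chinese Education Thesaurus) (http://140.122.127.251/ttscgi/ttsweb1?@0:0:1:ericthe::http|//140.122.127.251/edd/edd.htm@@0.57560553)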

Examples (cont.)

• Unesco Thesaurus: A Structured List of Descriptors for Indexing and Retrieving Literature in the Fields of Education, Science, Social and Human Science, Culture, Communication and Information
• The Unesco: IBE Education Thesaurus
• Thesaurus of ERIC Descriptors
• Thesaurus of Sociological Research Terminology
• Thesaurus of Sociological Indexing Terms

Examples (cont.)

• Art & Architecture Thesaurus (http://www.getty.edu/research/tools/vocabulary/aat/index.html)
• Thesaurus for Graphic Materials I: Subject Terms (TGM I) (http://www.loc.gov/rr/print/tgm1/)
• Thesaurus for Graphic Materials II: Genre and Physical Characteristic Terms (TGM II) (http://www.loc.gov/rr/print/tgm2/)
• British Museum Materials Thesaurus (http://www.mda.org.uk/bmmat/matintro.htm)
• Vocabulary of Basic Terms for Cataloguing Costume (http://www.mda.org.uk/costume/vbt00e.htm)

Examples (cont.)

• British Museum Object Names Thesaurus
• Union List of Artist Names (ULAN) http://www.getty.edu/research/tools/vocabulary/ulan/index.html
• Getty Thesaurus of Geographic Names (TGN) http://www.getty.edu/research/tools/vocabulary/tgn/
• Thesaurus of Monument Types
• mda Archaeological Objects Thesaurus
• Building Materials Thesaurus
• INSCRIPTION (http://www.mda.org.uk/fish/i_lists.htm)

Examples (cont.)

• Macrothesaurus for Information Processing in the Field of Economic and Social Development
• Social Science and Business Microthesaurus: A Hierarchical List of Indexing Terms Used by NTIS
• Political Science Thesaurus
• SPINES Thesaurus: A Controlled and Structured Vocabulary of Science and Technology for Policy Making
• Thesaurus of Psychological Index Terms

Examples (cont.)

• Thesaurus of Engineering and Scientific Terms (TEST)
• INSPEC Thesaurus
• NASA Thesaurus
• Thesaurus of Computing Terms
• Thesaurus of Scientific, Technical and Engineering Terms
• International Road Research Documentation (IRRD) Thesaurus
• Construction Industry Thesaurus

Examples (cont.)

• ASIS Thesaurus of Information Science and Librarianship
• Thesaurus of Information Science Terminology
• Zoological Record Online Thesaurus
• Food: Multilingual Thesaurus
• Thesaurus of Agricultural Terms
• Medical Subject Headings (MeSH)
• The ISDD Thesaurus: Keywords Relating to Non-Medical Use of Drugs and Drug Dependence