INDEXES AND INDEXING

Ma. Theresa B. Villanueva
Head, Microforms and Digital Resource Center
Rizal Library, Ateneo De Manila University

April 15-16, 2013
James O’Brien Library-Ateneo de Naga University


DEFINITION OF TERMS

Index

a tool that indicates to a user the information, or the source of information, that the user needs

a systematic guide designed to indicate subjects, topics, or features of documents in order to facilitate their retrieval

Indexing

the process of identifying and assigning index terms to a document, either to describe its physical characteristics, give facts about its creator or distribution, or describe its content

General Purposes of Indexes

To construct representations of documents in a form that is suitable for users to browse

To maximize the searching success of the users

To minimize the time and effort in finding information

Uses of Indexes

• facilitate reference to specific material or help locate wanted information

• serve as a filter to withhold irrelevant materials

• make the information storage and retrieval system useful to individuals

• disclose related information

• serve as a tool for current awareness services

By Arrangement

a. Alphabetical Index – based on the orderly principle of the letters of the alphabet; used for the arrangement of main headings, subheadings, and cross references

b. Classified Index – contents are arranged systematically by classes or subject headings

c. Concordance – an alphabetical index of all principal words appearing in a single text or in a multi-volume work of a single author, with a pointer to the precise point at which each word occurs
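The concordance described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name `build_concordance`, the optional stop list, and the choice of (line number, word position) as the "precise pointer" are not from the presentation.

```python
import re
from collections import defaultdict


def build_concordance(text, stop_words=()):
    """Map each word to the (line number, word position) pairs where it occurs.

    The stop list is an assumption used to keep only "principal" words.
    """
    skip = {w.lower() for w in stop_words}
    locators = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        for position, word in enumerate(re.findall(r"[A-Za-z']+", line), start=1):
            key = word.lower()
            if key not in skip:
                locators[key].append((line_no, position))
    # Arrange the entry words alphabetically, as a concordance would.
    return dict(sorted(locators.items()))


sample = "To be or not to be\nthat is the question"
concordance = build_concordance(sample)
for word, places in concordance.items():
    print(word, places)
```

Each entry word is followed by every place it occurs, which is what separates a concordance from a selective subject index.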

By Physical Form


a) Card index – an index in which 3” x 5” cards are used as the tools

b) Printed index – a tool for indexing or for researching and retrieval of information that is in printed form

c) Microform index – index to microforms such as microfiche and microfilm

d) Computerized index – uses computers to construct indexes

By Type of Material Indexed

a. Audiovisual Material Index

- textual labeling (index terms or description) is needed along with image matching

- a search on words may retrieve a particular image related to the search term, which in turn can be used as input to find other related entries

b. Book Index

- a list of words or groups of words arranged alphabetically at the back of the book, giving the page location of the subject or name associated with each word

c. Periodical Index/Newspaper Index

- open-ended projects usually performed by a group of people

- consistency is a challenge, since each periodical issue may deal with unrelated topics by several authors, written in different styles and aimed at different users

Periodical Indexes

Classified Index

Entry points are arranged in a hierarchy of related topics, starting with generic or broad topics and working down to the specific ones.

Examples:
- Index Medicus – classified index in the field of medicine and related disciplines
- Engineering Index – classified index in the field of engineering and related disciplines

Alphabetical Subject Index

An alphabetical subject index covers a number of different kinds of indexes. The arrangement is in alphabetical order and follows a familiar pattern.

Examples:
- Readers’ Guide to Periodical Literature (RGPL)
- Index to Philippine Periodicals (IPP)

Author Index

Entry points are names of persons, organizations, government agencies, institutions, etc.

Examples:
- Development Bank of the Philippines
- Philippine Chamber of Commerce and Industry
- Romulo, Carlos P.

INDEXING PRINCIPLES

Exhaustivity – refers to the extent to which a document is analyzed to identify its subject content

Specificity – refers to the extent to which a concept or topic in a document is identified by a precise term in the hierarchy of its genus-species relations

Consistency – refers to the extent to which agreement exists on the terms to be used to index the contents of documents

Principle of Exhaustivity

• Exhaustive indexing

use of various index terms to fully cover the major and minor themes of a document

• Selective indexing

use of a few terms to cover only the main or major theme of a document

Exhaustivity results in high recall but low precision.

Principle of Specificity

Example:

Genus: Citrus Fruits

Species: ORANGES, LEMONS, LIMES, GRAPEFRUITS

Specificity results in high precision but low recall.
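The trade-off between recall and precision noted for exhaustivity and specificity can be made concrete with a short calculation. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant). The document identifiers and result sets are invented purely for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: four documents are truly relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}

# Exhaustive indexing gives many access points, so more documents are retrieved.
exhaustive_result = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}

# Highly specific indexing retrieves fewer documents, but all are on target.
specific_result = {"d1", "d2"}

print(precision_recall(exhaustive_result, relevant))  # (0.5, 1.0)
print(precision_recall(specific_result, relevant))    # (1.0, 0.5)
```

In this invented case the exhaustive result finds every relevant document but half of what it returns is noise, while the specific result returns no noise but misses half of the relevant documents.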

Principle of Consistency

There are two types of consistency:

Inter-indexer consistency refers to the agreement between or among indexers in assigning subject terms to a particular article.

Intra-indexer consistency refers to the extent to which one indexer is consistent with himself/herself in assigning subject terms.
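The presentation does not prescribe a formula for measuring consistency. As an illustration only, the sketch below uses one commonly cited measure (often attributed to Hooper): the number of terms both indexers assigned, divided by the total number of distinct terms assigned by either. The function name and the sample terms are assumptions.

```python
def consistency(terms_a, terms_b):
    """Shared terms divided by all distinct terms assigned (assumed measure)."""
    a = {t.lower() for t in terms_a}
    b = {t.lower() for t in terms_b}
    if not a and not b:
        return 1.0  # two empty assignments agree trivially
    return len(a & b) / len(a | b)


# Hypothetical term assignments for the same article by two indexers.
indexer_1 = ["Rice", "Hybrid rice", "Agriculture"]
indexer_2 = ["Rice", "Agriculture", "Food supply"]

score = consistency(indexer_1, indexer_2)
print(score)  # 0.5
```

The same function can estimate intra-indexer consistency by comparing the terms one indexer assigned to the same article on two different occasions.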

Indexing Methods

1. Derived or derivative indexing

– a method by which words and phrases occurring in the title or text of a documentary unit are extracted by a human or computer to serve as indexing terms

– also called extractive indexing

2. Assigned indexing

– a method by which terms, descriptors, or subject headings are selected by a human or computer to represent the topics or features of a documentary unit

– assigned terms are oftentimes taken from a source other than the document itself
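Derived indexing, the first method above, lends itself to a simple sketch: candidate terms are extracted from the words of a title after removing common function words. The stop list and the function name `derive_terms` are assumptions for illustration. Assigned indexing would instead choose terms from an external source such as a thesaurus.

```python
import re

# A small, assumed stop list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "in", "on", "for", "towards"}


def derive_terms(title, stop_words=STOP_WORDS):
    """Extract candidate index terms directly from the words of a title."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", title.lower()):
        if word not in stop_words and word not in terms:
            terms.append(word)
    return terms


title = "Hybrid rice: a new hope towards a bountiful Philippines"
terms = derive_terms(title)
print(terms)
# ['hybrid', 'rice', 'new', 'hope', 'bountiful', 'philippines']
```

The output shows the typical weakness of derived terms: words such as "new" and "hope" are extracted even though they say little about the subject.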


Indexing Language

An indexing language is a language used by the indexer to represent the subject content of a document.

Purposes and Uses of Indexing Language:

• to represent the subject content of a document, either using the words of the author or assigning appropriate descriptors from a controlled vocabulary

• to help users discriminate between terms and reduce ambiguity in the language

Types of Indexing Language

1. Natural Language

- uses index terms/words occurring in the printed text as index entries; it is sometimes called a derived-term system

Characteristics of using Natural Language:

• Improves recall because it provides more access points, but reduces precision

• Redundancy is greater

• Uses more current terms

• Tends to be favored by end-users

2. Controlled vocabulary

- represents the general conceptual structure of one or more subject areas and presents a guide to the users of the index

- categorized as an assigned-term system

Controlled vocabulary provides cross references (such as Use references) to show the three relationships of terms:

a) equivalence
b) hierarchical
c) associative

This is achieved by providing or showing the following under a term:

broader term (BT)
narrower term (NT)
related terms (RT)
use for (UF)
see also (SA)

Relationships of Terms:

a. Equivalence relationship - implies that there will be more than one term denoting the same concept

Equivalence relationship: Example 1

Use for (UF) or Use reference (see reference)

EMPLOYEES
UF: Personnel
UF: Staff
UF: Workers

- refers the user from a non-usable term to the preferred descriptor

Equivalence relationship: Example 2

BIRTH CONTROL
UF: Family Planning

- the reference deals primarily with synonymous or variant forms of the preferred descriptor

- it is also used to lead the indexer to more general terms

Examples that indicate Equivalence relationship:


Synonyms (e.g. Reason; Cause)

Quasi-synonyms (e.g. Law; Law Management)

Preferred spelling (e.g. Catalog; Catalogue)

Acronyms and abbreviations (e.g. ASEAN; Association of Southeast Asian Nations)

Current and established terms (e.g. Cellular Radio; Cellular Phone)

Translation (e.g. Coconut Coir; Bunot)

b. Hierarchical relationship

– refers to the general and specific or broad and narrow type of relationship

Broader term (BT)

Employees
BT: People

- shows the hierarchical relationship upward in the classification ranking

- it differs from the use for reference in that both the basic term and its broader term are descriptor terms and both can be used

Hierarchical relationship: Example 1

Cats
BT: ANIMALS

“ANIMALS” is a broader term to “CATS” because all cats are animals.

Reference: http://publish.uwo.ca/~craven/677/thesaur/main05.htm

Hierarchical relationship: Example 2

Narrower term (NT)

Employees
NT: HOTEL EMPLOYEES
NT: RAILROAD EMPLOYEES

- this reference is similar to the broader term reference, except that it goes down in the classification ranking

Hierarchical relationship: Example 3

Head
NT: NOSE

“NOSE” might be a narrower term to “HEAD”, because noses are normally parts of heads.

Reference: http://publish.uwo.ca/~craven/677/thesaur/main05.htm

Hierarchical relationship: Example 4

Genus-species relationship (represents class inclusion)
Example: Animals > Domestic Animals > Cats

Whole-part relationship
Example: Hand > Fingers

Instance relationship
Example: Mountains > Mount Apo

c. Associative relationship

- refers to a non-hierarchical relationship of terms

Associative relationship: Example 1

Related term (RT)

EMPLOYEE
RT: EMPLOYMENT

- this reference points to a descriptor that can be used in addition to the basic term but is not in a hierarchical relationship with it

Associative relationship: Other Examples

Teachers – Students
Tables – Chairs
Education – Teaching
Men – Women

Scope Note (SN) & Qualifier

- used to inform users about a descriptor’s usage restrictions or to clarify ambiguity; a scope note may also give additional instructions to indexers

Scope Note examples:

INDEXING
(SN) Assigning of natural language terms to documents

HOSPITALIZATION
(SN) Assign also terms for the conditions for which patients were hospitalized, if applicable

Qualifier examples:

Security (Law)
Security (Psychology)

Reference: http://publish.uwo.ca/~craven/677/thesaur/main08.htm

Functions of Controlled Vocabulary:

• To control synonyms by choosing one form as the standard term

• To distinguish among homographs

• To link or bring together terms whose meanings are closely related

Example: Cereals and Wheat

• To control variant spellings
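The term relationships and the synonym-control function described above can be pictured as a small data structure. The sketch below is a minimal illustration, not a real thesaurus package: the class name `Thesaurus` and its methods are assumptions, and the sample entry reuses the EMPLOYEES examples from the earlier slides.

```python
class Thesaurus:
    """Toy controlled vocabulary holding UF, BT, NT and RT references."""

    def __init__(self):
        self.entries = {}  # preferred term -> {"UF": [...], "BT": [...], ...}
        self.use = {}      # non-preferred term (lowercase) -> preferred term

    def add(self, term, uf=(), bt=(), nt=(), rt=()):
        self.entries[term] = {"UF": list(uf), "BT": list(bt),
                              "NT": list(nt), "RT": list(rt)}
        for variant in uf:
            # Equivalence relationship: variant -> "use" the preferred term.
            self.use[variant.lower()] = term

    def preferred(self, word):
        """Return the preferred descriptor for a word, or None if unknown."""
        for term in self.entries:
            if term.lower() == word.lower():
                return term
        return self.use.get(word.lower())

    def display(self, term):
        lines = [term]
        for label in ("UF", "BT", "NT", "RT"):
            for value in self.entries[term][label]:
                lines.append(f"  {label}: {value}")
        return "\n".join(lines)


thesaurus = Thesaurus()
thesaurus.add("EMPLOYEES",
              uf=["Personnel", "Staff", "Workers"],
              bt=["People"],
              nt=["Hotel employees", "Railroad employees"],
              rt=["Employment"])

print(thesaurus.preferred("staff"))  # EMPLOYEES
print(thesaurus.display("EMPLOYEES"))
```

Looking up "staff" returns EMPLOYEES, which is the synonym control listed first among the functions. The display method prints the entry in the UF/BT/NT/RT layout used in the examples.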


A controlled vocabulary may take the form of verbal expressions as illustrated by Subject Headings Lists and Thesauri or coded/nonverbal expressions as shown by Classification schemes.

Subject headings lists – lists of terms representing several subject fields; some focus on specific fields

Thesauri – authority devices that cover more specific or narrower subject fields

Classification schemes – generally contain coded expressions or notations for the relevant topics in a particular class or subclass

INDEXING GUIDELINES & PROCEDURES

Part 2

INDEXING PROCESS:

1. Recording of bibliographic data

- recording of the important information or the elements that identify a particular document

The International Organization for Standardization (ISO) has set a standard for bibliographic references:

ISO 690:1975 (E), “Bibliographic References – Essential and Supplementary Elements”

- When indexing the contents of a collection of documents, locators should give complete information about each document.

- For periodical articles, each entry normally consists of the following essential elements:

Name(s) of author(s), with forenames
Title of the article
Title of the periodical or source
Volume number
Issue number
Date of the issue
Page number

Example:

Name(s) of Author(s): [Xian, Jie]
Title of the article: [Hybrid rice: a new hope towards a bountiful Philippines]
Title of the periodical or Source: [Impact]
Volume Number: [46]
Issue Number: [9]
Date of the issue: [September 2007]
Page number: [4-8]

Sample entry:

________________ (subject/topic)

Xian, Jie. Hybrid rice: a new hope towards a bountiful Philippines. Impact, Vol. 46, no. 9, S ‘12, p. 4-8.

Format comparison:

ISO FORMAT:

_______________ (subject/topic)

Xian, Jie. Hybrid rice: a new hope towards a bountiful Philippines. Impact, Vol. 46, no. 9, S ‘12, p. 4-8.

ATENEO FORMAT:

_______________ (subject/topic)

Hybrid rice: a new hope towards a bountiful Philippines. Xian, Jie. Impact 46 (9) : 4-8. S ‘12.

OTHER FORMAT:

_______________ (subject/topic)

Xian, Jie. Hybrid rice: a new hope towards a bountiful Philippines. Impact 46 (9) : 4-8. S ‘12.
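Because every entry is built from the same essential elements, the sample entries above can be produced from a single record. The sketch below is an illustration only: the record class, the function names `long_style` and `compact_style`, and the exact punctuation are assumptions that loosely mirror the sample entries, not an implementation of any standard. The abbreviated date is copied as it appears in the sample entries.

```python
from dataclasses import dataclass


@dataclass
class ArticleRecord:
    author: str
    title: str
    periodical: str
    volume: int
    issue: int
    date: str
    pages: str


def long_style(r):
    """Author first, with spelled-out 'Vol.' and 'no.' labels."""
    return (f"{r.author}. {r.title}. {r.periodical}, "
            f"Vol. {r.volume}, no. {r.issue}, {r.date}, p. {r.pages}.")


def compact_style(r, title_first=False):
    """Volume (issue) : pages notation, optionally with the title leading."""
    lead = f"{r.title}. {r.author}." if title_first else f"{r.author}. {r.title}."
    return f"{lead} {r.periodical} {r.volume} ({r.issue}) : {r.pages}. {r.date}."


record = ArticleRecord(
    author="Xian, Jie",
    title="Hybrid rice: a new hope towards a bountiful Philippines",
    periodical="Impact",
    volume=46,
    issue=9,
    date="S '12",
    pages="4-8",
)

print(long_style(record))
print(compact_style(record, title_first=True))
print(compact_style(record))
```

Recording the elements once and formatting them on output keeps the bibliographic data consistent across whichever entry style an index adopts.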

2. Subject determination

- determining the “aboutness” of the material and formulating a concept list

• Choose the most appropriate concepts; consider the users and the purpose of the index.

• No arbitrary limit should be set on the number of terms or descriptors that can be assigned to a document. The number should:

- be determined fully by the amount of information contained in the document
- be related to the expected needs of the users of the index

• Modify the indexing guidelines and procedures if needed, but modification should not compromise the structure or logic of the indexing language.

• Concepts should be as specific as possible. More general concepts may be preferred in some circumstances, depending upon the following factors:

– over-specificity might adversely affect the performance of the indexing system

– if an idea is not fully developed, or is referred to only casually by the author, then it might be justified to index at a more general level

3. Content/Conceptual analysis

– identifying the topics discussed in a document and determining which aspects of it users will be interested in

Content Analysis

- Decide which topics in the item are relevant to the potential users of the document.

- Decide which topics truly capture the content of the document.

- Determine terms that come as close as possible to the terminology used in the document.

- Decide on index terms and the specificity of those terms.

Parts of the document that have to be analyzed

Title of the document/article
- it is considered the basic indexing unit
- it is the first stop in determining the subject content

Abstract
- an information-packed miniature of the document
- a good abstract can be a fundamental indicator of subject content

Text itself
- includes the introduction, summary, conclusion, section headings, and first and last sentences of paragraphs

Illustrations, diagrams, tables and captions

References
- reference sources cited by the author may also be considered subject indicators

Factors that may affect content analysis:

• labor shortage or other critical time factors

• the guidelines and policies imposed by institutions, which generally concern the selection of index content

• the indexer’s decisions on which aspects of the subject will be emphasized and which will be de-emphasized

4. Translation

- involves the conversion of terms in the natural language into standard terms drawn from a controlled vocabulary such as a thesaurus, subject headings list, etc.

- match terms in the concept list against those available in the controlled vocabulary

Practices to follow in the Translation process:

- Concepts which are already present in the indexing language should be translated into their preferred terms

- Terms which represent new concepts should be checked for accuracy and acceptability against reference tools such as:

◦ Dictionaries and encyclopedias
◦ Thesauri (UNBIS Thesaurus)
◦ Classification schemes (Library of Congress)
◦ Established indexes (Readers’ Guide to Periodical Literature)

- Subject specialists, particularly those with some knowledge of indexing or documentation, may also be consulted

- If the concepts are not found in an existing thesaurus or classification scheme, these may be:

• expressed by terms or descriptors which are admitted into the indexing language

• represented temporarily by more general terms, with the new concepts proposed as candidates for later addition
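The translation practices above amount to a lookup with a fallback. In the sketch below, each concept from the concept list is matched against the controlled vocabulary; a concept that is not found is represented temporarily by a more general term and recorded as a candidate for later addition. The function name, the dictionaries, and the sample vocabulary are assumptions for illustration.

```python
def translate(concepts, descriptors, use_references, general_terms):
    """Turn a concept list into controlled terms.

    descriptors: lowercase concept -> preferred descriptor
    use_references: lowercase non-preferred term -> preferred descriptor
    general_terms: lowercase new concept -> broader descriptor used for now
    Returns (assigned descriptors, candidate terms for later addition).
    """
    assigned, candidates = [], []
    for concept in concepts:
        key = concept.lower()
        term = descriptors.get(key) or use_references.get(key)
        if term is None:
            candidates.append(concept)     # propose for later addition
            term = general_terms.get(key)  # temporary, more general term
        if term is not None and term not in assigned:
            assigned.append(term)
    return assigned, candidates


# Hypothetical vocabulary and concept list.
descriptors = {"rice": "RICE", "birth control": "BIRTH CONTROL"}
use_references = {"family planning": "BIRTH CONTROL"}
general_terms = {"hybrid rice": "RICE"}

assigned, candidates = translate(
    ["Rice", "Family planning", "Hybrid rice"],
    descriptors, use_references, general_terms,
)
print(assigned)    # ['RICE', 'BIRTH CONTROL']
print(candidates)  # ['Hybrid rice']
```

Keeping the candidate list separate from the assigned terms lets the vocabulary be reviewed later without losing track of which documents used a temporary, more general term.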

Translation

- Group references to information that is scattered in the text of the document.

- Combine headings and subheadings into related multilevel headings.

- By means of see references, direct the user seeking information under terms not used to the terms that are used, and by means of see also references, to related terms.

- Arrange the index into a systematic presentation.

Generating Index Entries

Index entries may be generated manually or by computer.

Manual generation – involves generating index entries one by one using an ordinary or electric typewriter

Machine generation – involves the use of computers in generating index entries; various software packages are available

Indexing Techniques for Periodicals

1. Topics that can be considered for indexing are the following:

- persons
- local politics
- sports events
- entertainment
- economic news
- editorials & columns
- special features
- first and last events
- social trends

• All articles that have permanent value should be indexed under all topics and issues dealt with.

• Editorials should be indexed under their topics like any other article, but differentiated from the others by adding (Ed.) or (E). The titles of editorials may be indexed under the collective heading “Editorials”.

• Letters to the editor, if considered indexable, should be indexed by topic, not under a caption that may have been assigned by the editor. It is advisable to index at least the name of the person who criticized an article as well as the author’s response.

2. Preference and Forms of Headings based on the International Organization for Standardization (ISO 999)

Personal Names:

– Provide as full a form as possible

– Choose the most recent or most commonly used form of the personal name as the heading and add a “see” cross-reference from other forms

– Personal names should take the form used in the document, but if the text is not consistent the indexer should adopt one form

– Compound and multiple surnames, whether hyphenated or not, should be indexed under the first part

e.g. Lee Chua, Queena, Loren ; Perez de Cuellar, Javier

– Persons normally identified by a title of honor or nobility should be indexed under the first name

e.g. Prince Charles see Charles, Prince of Wales
Queen Elizabeth I see Elizabeth I, Queen of England

Corporate Bodies

• Names of corporate bodies should normally be indexed without transposition and in as full a form as necessary. An initial article is omitted, unless specifically required for semantic or grammatical reasons

e.g. Lopez Museum

• Transposition may be used if it is considered that this would help the users of the index

e.g. Department of Energy see Energy, Department of

• Choose the most recent, or the most commonly used, form of corporate name as the main heading and add “see” cross references from other forms

e.g. Philippine Normal College see Philippine Normal University

Geographic Names

• Geographic names should be as full as is necessary for clarity, with additions to avoid confusion with otherwise identical names

Example: J.P. Rizal (Quezon City)
J.P. Rizal (Marikina)

• An article or preposition should be retained in a geographic name of which it forms an integral part

Example: Santolan, Pasig City

• Where the article or preposition does not form an integral part of a name, it should be omitted

Example: New Day rather than The New Day

INDEXING STANDARDS

Part 3

Standards serve as models and guidelines for the analysis of documents, construction and organization of indexes, indexing terminology, construction and use of thesauri, etc. They promote consistency and uniformity.

A. International Organization for Standardization (ISO)

– a network of the national standards institutes of 146 countries, on the basis of one member per country, with a Central Secretariat in Geneva, Switzerland, that coordinates the system

ISO 5963:1985 Documentation – Methods for examining documents, determining their subjects, and selecting indexing terms

ISO 999:1996 Information and documentation – Guidelines for the content, organization and presentation of indexes

ISO 4:1997 Information and documentation – Rules for the abbreviation of title words and titles of publications. It is accompanied by a List of Serial Title Word Abbreviations, which includes title word abbreviations in over 50 languages.

B. National Information Standards Organization (NISO)

A nonprofit association accredited by the American National Standards Institute (ANSI) that identifies, develops, maintains and publishes technical standards to manage information in our changing and ever-more digital environment.

NISO standards apply both traditional and new technologies to the full range of information-related needs, including retrieval, repurposing, storage, metadata, and presentation.

Standards developed by NISO:

– ANSI/NISO Z39.2 – 1994 (R2001) Information interchange format
*Equivalent international standard: ISO 2709

– ANSI/NISO Z39.19 – 2003 Guidelines for the construction, format, and management of Monolingual Thesauri
*Equivalent international standard: ISO 2788

C. British Standards Institution (BSI)

– as the National Standards Body of the UK, it develops standards and applies innovative standardization solutions to meet the needs of business and society.

Standards developed by BSI (related to library and information science):

– BS 1749: 1985 Recommendations for alphabetical arrangement and the filing order of numbers and symbols

• Provides guidance on arranging entries within lists of all kinds, e.g. bibliographies, catalogues, directories and indexes.

– BS ISO 999: 1996 Information and Documentation – Guidelines for the content, organization and presentation of indexes

Automatic Indexing

refers to indexing by machine, or the analysis of text by means of computer algorithms.

- The focus is on automatic methods used behind the scenes with little or no input from individual searchers, with the exception of relevance feedback.

- It does not include searching options and techniques used by human searchers, such as methods for creating effective search statements, adding weights to terms, specifying proximity requirements, using truncation or wild cards, or combining terms with Boolean or role operators.

Four Types of Approaches

• Statistical – based on counts of words, statistical associations, and collation techniques that assign weights and cluster similar words

Example: tf-idf (term frequency-inverse document frequency), which is frequently used in many search engines.

The intuition behind tf-idf is that terms that are frequent in many documents are less suited to discriminating between documents, while terms that are frequent within a single document may indicate that this document has much information about the things the terms refer to.

Source: Cleveland & Cleveland, 2001, p. 211

• Syntactical – stresses grammar and parts of speech, identifying concepts found in designated grammatical combinations, such as noun phrases

• Semantic – systems are concerned with the context sensitivity of words in the text

Example: What does “cat” mean in terms of its context? House cats? Heavy earthmoving equipment?

• Knowledge-based – systems go beyond thesaurus or equivalence relationships to knowing the relationships between words

Example: ‘tibia’ is part of a leg, thus the document is indexed under ‘leg injuries’.
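The tf-idf weighting mentioned under the statistical approach can be sketched in a few lines. There are many variants of the formula. This one assumes term frequency is the term's share of the words in a document and inverse document frequency is the natural logarithm of (number of documents / number of documents containing the term). The sample documents are invented.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per document (one common variant)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "cats chase mice",
    "cats sleep all day",
    "earthmoving equipment moves soil",
]
weights = tf_idf(docs)
for term, weight in sorted(weights[0].items(), key=lambda item: -item[1]):
    print(f"{term}: {weight:.3f}")
```

In the first document, "mice" outweighs "cats" because "cats" also occurs in another document and so discriminates less. A term that occurs in every document receives a weight of zero under this variant.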


Human / Manual Indexing vs. Automatic Indexing

• Automatic methods have trouble handling synonyms, homonyms, and semantic relations, and they conceptualize very poorly. Human indexers go through cognitive processes that may be influenced by their background experience, education, training, intelligence, and common sense.

• Computers can, and humans cannot, organize all words in a text and in a given database and perform statistical operations on them (e.g. tf-idf).

Websites for Indexers

Indexing Services
- H.W. Wilson Home Page (http://www.hwwilson.com/)
- Wright Information (http://mindspring.com/~jancw/)
- Susan Holbert Indexing Services (http://abbington.com/holbert/)

Special Formats and Subjects Indexing
- ASIS Thesaurus of Information Science (http://www.asis.org/Publications/Thesaurus/isframe.htm)
- The Library of Congress Thesauri (http://lcweb.loc.gov/pmei/lexico/liv/bsearch.html)

Standards
- National Information Standards Organization (http://www.niso.org/)
- ANSI/NISO Z39.14-1997 Guidelines for Abstracts (http://www.ansi.org/)
- ANSI Z39.4-1984 Basic Criteria for Indexes (http://www.ansi.org/)

Indexing software
- HTML Indexer (for Windows) http://www.html-indexer.com/
- Cindex (for DOS, Windows, and Macintosh) http://www.indexres.com
