
Information Processing & Management Vol. 28, No. 2, pp. 259-268, 1992. 0306-4573/92 $5.00 + .00. Printed in Great Britain. Copyright © 1992 Pergamon Press plc

PERFORMANCE EVALUATION OF MAPPER

ZORANA ERCEGOVAC and HAROLD BORKO

Graduate School of Library and Information Science, University of California Los Angeles,

Los Angeles, CA 90024, U.S.A.

(Received 26 December 1990; accepted in revised form 22 August 1991)

Abstract: We report on a two-year study, funded by the Online Computer Library Center and the National Science Foundation, to evaluate Mapper, an experimental semi-automatic cataloging advisor designed to assist the intermittent user in the descriptive cataloging of certain US-produced, single-sheet maps. The study addresses questions of research design and performance evaluation. It has shown that library students who are novices in map cataloging achieve significantly better cataloging answers when assisted by Mapper than students who were not assisted by Mapper.

1. INTRODUCTION

We described the design and implementation of Mapper in a previous article (Ercegovac & Borko, 1992b). The main objective of this paper is to describe a study designed to evaluate the performance of Mapper, an experimental semi-automatic map cataloging advisor, in the descriptive cataloging of single-sheet maps emanating from three US publishers: the Central Intelligence Agency, the National Geographic Society, and H.M. Gousha.

The research reported in this paper is based on a three-part study (Ercegovac, 1990b). The study was performed to understand how a semi-automatic advice-giving system could be designed to assist the intermittent user in solving three important cataloging tasks, described shortly.

The first part of the study used a multiple-observation approach to identify and analyze instances of expert judgment in solving the three cataloging tasks (Ercegovac, 1992a). Attention was directed toward the various authorial responsibilities that participate in the process of map making (Ercegovac, 1990a).

The second part of the study formalized a body of expert judgment into a set of rules and then entered these rules, along with a set of published rules, into Mapper’s knowledge base (Ercegovac & Borko, 1992b).

The third part of the study evaluated Mapper’s performance in a laboratory setting. This article addresses Part 3 of the three-part study.

2. RESEARCH DESIGN

The posttest-only control group design used in this study has the following form:

$$
\begin{array}{lll}
RA_{\mathrm{exper}} & X & M_{\mathrm{exper}} \\
RA_{\mathrm{contr}} &   & M_{\mathrm{contr}}
\end{array}
$$

where

• X represents the exposure of a group to the experimental variable, Mapper, the effects of which are to be measured;
• $M_{\mathrm{exper}}$ and $M_{\mathrm{contr}}$ refer to measurements in the experimental and control groups, respectively;
• $RA_{\mathrm{exper}}$ and $RA_{\mathrm{contr}}$ indicate random assignment of subjects to the experimental and control groups, respectively.

This article is based on the first author’s doctoral dissertation (Ercegovac, 1990b), to which the reader is referred for details of the methodology, statistical procedures, and analysis. The study was supported in part by the OCLC Library and Information Science Research Grant Program under grant OCLC-40433058771, and by the National Science Foundation under grant SES-8814111.

The experimental design was used to test the impact of Mapper on cataloging novices. The general null hypothesis was that there would be no significant difference in mean scores between students using Mapper and students not using Mapper. In this article, we present three representative hypotheses of the sixteen described in Ercegovac (1990b). These hypotheses, discussed in Section 3, are:

HYPOTHESIS I

There will be no significant difference in mean scores between the experimental group and control group on all three map types combined with respect to choice.

HYPOTHESIS II

There will be no significant difference in mean scores between the experimental group and the control group within each map type with respect to choice.

HYPOTHESIS III

There will be no significant difference in mean scores between the experimental group and control group on each of the seven task elements and on all three map types combined with respect to choice and form.

The study (Ercegovac, 1990b) considered three classes of independent variables (Section 2.1) and two classes of dependent variables (Section 2.2).

2.1 Independent variables

The independent variables are shown in Table 1. As shown in Table 2, a two-by-three research design was used, with two columns representing the two treatment groups and three rows representing the three map types. The entries indicate the number of maps given to each subject. The subjects did not know which map types were included in the experiment.

Table 1. Independent variables

Variable / level       Description
Treatment group (TG) of subjects
  CTG                  Control Treatment Group (n = 15)
  ETG                  Experimental Treatment Group (n = 15)
Map type (MT) of single-sheet maps
  MT1                  CIA published maps (n = 21)
  MT2                  NGS published maps (n = 21)
  MT3                  HMG published maps (n = 21)
Task elements (TE), with Machine-Readable Cataloging tags
  Task 1 (110)
    TE1.1              to ascertain corporate body main entry (110$a)
  Task 2 (245)
    TE2.1              to determine title proper (245$a)
    TE2.2              to transcribe other title information (245$b)
    TE2.3              to transcribe statements of responsibility (245$c)
  Task 3 (260)
    TE3.1              to determine place of publication (260$a)
    TE3.2              to determine name of publisher (260$b)
    TE3.3              to ascertain date of publication (260$c)

Table 2. Number of maps per subject in treatment groups

Map type    ETG    CTG
MT1           3      3
MT2           3      3
MT3           3      3
Total         9      9

2.1.1 Subjects. The subjects were volunteers: students enrolled in the Master’s program at UCLA’s Graduate School of Library and Information Science (GSLIS). They had just completed an introductory ten-week cataloging course (GSLIS 410), but had no experience in map cataloging. The 410 course focuses on descriptive cataloging principles and practices related to monographs. The course, which is mandatory in the program and offered during the fall quarter in four different sessions, is taught once a year by three cataloging experts. The students had therefore been introduced to Anglo-American descriptive cataloging theory and practice. During the ten-week period, they had ample opportunity to apply cataloging principles through numerous cataloging examples, in-class exercises, discussions, assignments, and written papers.

The subjects were required to use the rules embodied in the Anglo-American Cataloguing Rules, 2nd edition, 1988 revision (AACR2R) (Gorman & Winkler, 1988), the Library of Congress Rule Interpretations, and Cartographic Materials (Cartmell & Parker, 1982), and to follow the formal procedures implemented by the Library of Congress on the OCLC bibliographic database for maps. The subjects were paid for completed tasks. A total of 30 subjects, 15 in the experimental group and 15 in the control group (N1 = N2 = 15), participated in the experiment between February 20 and April 27, 1990.

The population from which the sample was drawn consisted of 192 students in UCLA’s Graduate School of Library & Information Science (GSLIS). A sign-up sheet for our evaluation study was published in GSLIS’s Weekly Bulletin, sent to students’ lockers and mailboxes, and advertised in several GSLIS classes.

Case & Richardson (1990) found UCLA’s GSLIS students to be similar to the students in other schools of library and information science in terms of their academic background. Based on their work, we believe that our findings would hold for subjects drawn from other graduate schools of library and information science in the United States.

Thirty subjects were randomly assigned to the experimental and control groups as they signed up in batches of about five for the experiment.

Homogeneity among subjects was assumed on the following grounds:

1. All subjects were UCLA GSLIS students who had passed the entrance requirements established by the Graduate Division and, separately, by GSLIS.
2. All subjects had completed the introductory course in descriptive cataloging.
3. None of the subjects had experience in map cataloging.
4. All subjects used computers in their work.
5. Eighty-four percent of the 97 first-year GSLIS students had majored in the humanities and the social sciences.
6. Students’ accuracy, measured in terms of the frequency and types of errors (e.g., misspellings, punctuation, spacing, capitalization), was uniformly distributed.

Items 3 and 4 were ensured by the conditions stated in the recruitment notice, “Students Needed in Cataloging Study.” Item 5 was obtained from a survey of first-year GSLIS students reported earlier (Ercegovac, 1990b).

Ercegovac examined the attribute of accuracy (item 6) post facto on the cataloging answer sheets obtained in the experiment. Pretesting for accuracy would have required developing a valid and reliable instrument to measure accuracy on an interval scale, along with other attributes (e.g., learning curve, retention, intelligence, knowledge, talent). Such an effort was beyond the scope of this study.

2.1.2 Maps. The three classes of maps under study were produced by the Central Intelligence Agency (CIA), the National Geographic Society (NGS), and H.M. Gousha (HMG). These maps were selected because they pose difficult and interesting cataloging problems.

CIA maps were drawn from two OCLC online databases: (a) the OCLC Electronic Access to Subject Information (EASI) Reference Database, with a total of 8,211 maps covering the period from 1985 to 1988, as of October 1, 1987 (from an OCLC Fact Sheet); and (b) the OCLC Online Union Catalog (OUC), with a total of 191,807 maps as of the second quarter of 1987.

Within the limits just described, a total of 328 CIA records were printed and prepared for preliminary analysis, retrieval, and examination of the three cataloging tasks. Of these 328 entries, 158 had been cataloged by Library of Congress staff. While the entire printout of 328 entries was analyzed, particular attention was directed to those cataloged by the Library of Congress. Of the 158 entries, we retrieved a total of 75 maps from the Bruman Map Library at UCLA and the Geography and Map Division at the Library of Congress.

The procedure described for CIA entries was replicated for NGS and HMG maps. NGS maps were drawn from OCLC’s EASI database using the BRS software; the search produced 37 entries. OCLC’s OUC was then searched, limited to the cartographic file and five-year coverage. This search produced a total of 88 records, of which 37 were cataloged by Library of Congress catalogers. All these maps were printed and prepared for retrieval and analysis. The remaining 51 records were either duplicates or cataloged by agencies other than the Library of Congress. All 37 maps were retrieved from the two institutions mentioned above.

Finally, HMG records were searched on the OUC. The search retrieved 340 records of which 304 records were cataloged by the Library of Congress staff. Of these records, we retrieved 66 maps for further analysis.

Twenty-one maps in each of the three map classes were selected for the evaluation study so that the features relevant to all seven task elements were present. The word “features” is taken to mean values that certain attributes of a given map may have, such as the presence of a chiefly responsible emanator; no title statement, one title statement, or more than one title; the presence or absence of codes; and formal statements of publication place, publisher’s name, and date.

2.1.3 Cataloging tasks and their corresponding decision points. All subjects were required to complete the three cataloging tasks described earlier. To complete these tasks, the subjects had to make both simple and complex decisions.

Simple decision points refer to decisions that typically do not influence access, description, and identification of a given map. Many of these decisions, which require remembering International Standard Bibliographic Description (ISBD) rules such as punctuation, spacing, and bracketing, as well as Machine-Readable Cataloging (MARC) conventions such as tags for data elements, indicators, and subfield delimiters, were “delegated” (Paisley & Butler, 1977) to Mapper.
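To illustrate what delegating these form decisions involves, the following Python sketch assembles MARC 245 and 260 field strings with ISBD prescribed punctuation (space-colon before other title information and before the publisher, space-slash before the statement of responsibility, comma before the date). It is a hypothetical sketch, not Mapper’s HyperCard implementation: the function names are invented for illustration, indicators are omitted, and only the bracketed-place and “[S.l.]” cases mentioned in this article are handled.

```python
def format_245(title_proper, other_title=None, responsibility=None):
    """Assemble a MARC 245 field (hypothetical helper; indicators omitted).

    ISBD punctuation precedes each subfield code: ' :' before $b
    (other title information) and ' /' before $c (statement of
    responsibility). The field ends with a period.
    """
    field = "$a" + title_proper
    if other_title:
        field += " :$b" + other_title
    if responsibility:
        field += " /$c" + responsibility
    return "245 " + field + "."


def format_260(place, publisher, date, place_inferred=False):
    """Assemble a MARC 260 field (hypothetical helper; indicators omitted).

    place=None means no probable place can be given and yields '[S.l.]';
    place_inferred=True brackets a place supplied by the cataloger.
    """
    if place is None:
        place_text = "[S.l.]"
    elif place_inferred:
        place_text = "[" + place + "]"
    else:
        place_text = place
    return "260 $a" + place_text + " :$b" + publisher + ",$c" + date + "."


if __name__ == "__main__":
    # Illustrative values only; the date is invented for the example.
    print(format_260("Washington, D.C.", "Central Intelligence Agency",
                     "1988", place_inferred=True))
    # prints: 260 $a[Washington, D.C.] :$bCentral Intelligence Agency,$c1988.
```

A date inferred by the cataloger would likewise be bracketed; the sketch leaves that, and other edge cases such as subfields that already end in a period, to the caller.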

Complex decision points refer to decisions that typically influence the description, identification, and retrieval of a given map. Many of these decisions demand catalogers’ experience and understanding of the bibliographic nature of publications and cannot be readily automated. For instance, some published definitions lack operational tests for the criteria of responsibility for the intellectual content of a map. Many of the published cataloging rules expressly call upon catalogers’ judgment and presuppose catalogers’ knowledge of cartographic terms and experience of map publishing. These complex decisions were “augmented” by Mapper rather than “accelerated” or “delegated” entirely to it (Paisley & Butler, 1977).

2.2 Dependent variables

The dependent variables, accuracy measures and affective measures, are discussed next.

2.2.1 Accuracy measures. The study asked the question: Given two groups of subjects, will the treatment group perform the cataloging tasks “more accurately” than the group without Mapper? The term “more accurately” is taken to mean with statistically significantly higher mean accuracy scores on each of the applicable decision points (Section 2.1.3). Each answer sheet completed by the subjects was compared against the printout of the corresponding cataloging record in the machine-readable cataloging database available on the OCLC system. These records were cataloged by a group of map cataloging experts from the Geography and Map Division at the Library of Congress and were examined by an independent judge, a head cataloger at one of UCLA’s libraries. Regarded as “perfectly accurate,” these records were considered accurate on each of the decision points.

2.2.2 Affective measures. Affective measures are taken to mean users’ perceptions of Mapper and their overall satisfaction with the system’s performance. The answers were obtained using a forced-format questionnaire and a final personal interview. The questions ranged from students’ perceptions of how easy Mapper was to learn and how much they liked the format and content of its screens, to their opinion of its response time.

2.3 Map scoring schemes

Map scores were defined with respect to (a) liberal and strict scoring, and (b) partial and complete scoring of choice and form.

When liberal scoring was employed, each person in both the control and experimental groups received a point for each correctly answered task element. Note that a correct answer need not match the “perfectly accurate” answer produced by Mapper in the delegated mode. For example, the answer “s.l.” (sine loco), given by a control-group subject to indicate that no probable place of publication appears on a CIA map, is correct by rule 4C6 and earns a point. For the same task element, however, Mapper would automatically produce “[Washington, D.C.]” for that CIA map. This answer is perfectly accurate and also earns a point: a student using Mapper behaves as map cataloging experts would, drawing upon their experience in map cataloging.

When strict scoring was employed, each person received a point for each correctly answered task element that matched the “perfect score,” whether or not that element was delegated to Mapper.

Partial scoring considered only the choice for each of the cataloging tasks and their subtasks, such as the emanator, the title and statement of responsibility, publication place, publisher, and date.

Complete scoring included both the choice and the form for each of the data elements under study.

All three hypotheses discussed in this article used the non-weighted scoring scheme, which treated each cataloging answer as one of two possible outcomes: correct or incorrect. A correct answer was coded “1”; otherwise, “0.”

The weighted scoring scheme, discussed in detail in Ercegovac (1990b), rated each cataloging answer on the basis of its relative importance to the functions of descriptive cataloging: providing access to bibliographic information, and identifying and describing a given map. The study found similar performance results with the non-weighted and weighted versions. The map scoring schemes just described are summarized in Table 3.
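The non-weighted schemes summarized in Table 3 can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the scoring procedure actually used in the study. We assume that each task element is recorded as a choice (the data value selected) and a form (its transcription with ISBD punctuation and MARC coding); that liberal scoring also accepts alternative choices that are correct by rule, listed by hand in a hypothetical `acceptable` table; and that complete scoring awards a separate point for a form that matches the perfect record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    choice: str  # the data value selected, e.g. a place of publication
    form: str    # the value as transcribed (ISBD punctuation, MARC coding)


def score_sheet(answers, perfect, acceptable=None, strict=False, complete=False):
    """Non-weighted (0/1 per decision) score for one map's answer sheet.

    answers, perfect: dicts mapping a task element (e.g. "TE3.1") to an
        Answer; `perfect` holds the expert (Library of Congress) record.
    acceptable: dict mapping a task element to a set of other choices that
        are correct by rule; consulted only under liberal scoring.
    strict=False, complete=False corresponds to MS1nw;
    strict=True, complete=True corresponds to MS2nw.
    """
    acceptable = acceptable or {}
    score = 0
    for element, expert in perfect.items():
        answer = answers.get(element)
        if answer is None:
            continue  # an unanswered element scores 0
        choice_ok = answer.choice == expert.choice
        if not strict:
            choice_ok = choice_ok or answer.choice in acceptable.get(element, set())
        score += int(choice_ok)
        if complete:
            # Assumption: form earns its own point under complete scoring.
            score += int(answer.form == expert.form)
    return score


# The s.l. example from Section 2.3: correct by rule, but not the expert answer.
perfect = {"TE3.1": Answer("Washington, D.C.", "[Washington, D.C.]")}
control = {"TE3.1": Answer("s.l.", "[S.l.]")}
rules = {"TE3.1": {"s.l."}}

print(score_sheet(control, perfect, rules))                              # 1 (MS1nw)
print(score_sheet(control, perfect, rules, strict=True, complete=True))  # 0 (MS2nw)
```

The weighted variants (MS1w, MS2w) would multiply each point by an element-specific weight; the weights themselves are given in Ercegovac (1990b) and are not reproduced here.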

Table 3. Map scores

Map score (MS)   Liberal vs. strict / partial vs. complete   Weighting scheme
MS1              liberal & partial scoring                   MS1nw  non-weighted
                                                             MS1w   weighted
MS2              strict & complete scoring                   MS2nw  non-weighted
                                                             MS2w   weighted

2.4 Testing environment

The experimental group used Apple Macintosh computers in GSLIS. Mapper is implemented using the Apple HyperCard™ system and runs on any Apple Macintosh computer that supports HyperCard™ 1.2. A detailed description of the design and implementation of Mapper has been reported earlier (Ercegovac & Borko, 1992b). The control group did not use Mapper, but had access to printed cataloging rules, definitions, and answer forms.

Cataloging sessions took place on the UCLA campus in group settings and lasted up to two and a half hours.

Each student was given a packet containing the following material: a set of written instructions, a sample showing how to fill out the answer forms, and a set of nine single-sheet maps, three produced by each of the three publishers.

Subjects were given minimal information about the study, to keep them blind to more specific information that might direct their behavior.

The treatment group was given ten minutes to practice the direct manipulation style of interaction on the Macintosh computer. A direct manipulation interface may be characterized by the visibility of objects and actions, and by rapid, reversible actions performed visually on the screen with hardware and software devices, such as a mouse or tablet, to select and deselect objects, scroll the window, and the like (Shneiderman, 1987).

The control group was given a set of cataloging rules necessary to perform the three cataloging tasks. These rules, compiled from three published sources (AACR2R, 1988; CM, 1982; LCRI), were included in the students’ packets. The subjects were given about 10 minutes to familiarize themselves with the format of the packets, which contained only the rules and definitions pertaining to the three tasks. As noted in Section 2.1.1, all students had just completed a formal ten-week course in descriptive cataloging and were competent in the use of cataloging rules and procedures. Since the course focused on monographic publications, Ercegovac explained verbally to all subjects the meaning of the phrase “chief source of information” as it applies to maps, and defined some cartographic terms, such as “panel-title” and “legend.” The investigator was present most of the time and was available to answer questions of a general nature.

3. PERFORMANCE EVALUATION: RESULTS

In this section we report on the results obtained in the experimental study with respect to (a) accuracy measure scores and (b) affective measures, as defined in Section 2.

3.1 Accuracy measure scores

Accuracy scores were computed as follows. Map Score 1 non-weighted (MS1nw) used liberal scoring and considered answers either correct or incorrect. MS1nw used a partial scoring scheme, which included only the choice for each of the tasks. Map Score 2 non-weighted (MS2nw) used strict scoring, based on the perfect score defined earlier, and also considered answers either correct or incorrect. This scoring scheme used complete scoring, which included both the choice and the form for each of the tasks.

In the evaluation studies we applied a two-tailed t-test to establish the statistical significance of results. The parameters used in all studies were

$$df = (15 - 1) + (15 - 1) = 28, \qquad t_{\mathrm{critical}} = 2.048 \quad \text{for } p < 0.05.$$

A difference between CTG and ETG scores was interpreted as significant if the corresponding t statistic satisfied

$$|t| > t_{\mathrm{critical}}.$$

We also report the relative difference between ETG and CTG scores, in percent:

$$\mathrm{Diff} = \frac{\mathrm{ETG}_{\mathrm{score}} - \mathrm{CTG}_{\mathrm{score}}}{\mathrm{ETG}_{\mathrm{score}}} \times 100\%.$$
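The significance test and the Diff measure can be recomputed from the summary statistics in the tables with a short Python sketch. One assumption here is ours, not the article’s: the tabulated variances appear to be divide-by-n variances, so the sketch converts them to sample variances (times n/(n - 1)) before forming the standard error of the difference; under that assumption the |t| values reported in Table 5 are recovered to within rounding. The critical value 2.048 is taken from the text rather than computed.

```python
import math

T_CRITICAL = 2.048  # two-tailed, p < 0.05, df = 28 (value given in the text)


def t_from_summary(mean_ctg, var_ctg, mean_etg, var_etg, n=15):
    """Two-sample t statistic for two groups of equal size n.

    Assumes var_ctg and var_etg are divide-by-n variances. Converting each
    to a sample variance (times n/(n-1)) and dividing by n for the standard
    error simplifies to (var_ctg + var_etg) / (n - 1).
    """
    standard_error = math.sqrt((var_ctg + var_etg) / (n - 1))
    return (mean_etg - mean_ctg) / standard_error


def relative_diff(mean_ctg, mean_etg):
    """Diff = (ETG score - CTG score) / ETG score, in percent."""
    return 100.0 * (mean_etg - mean_ctg) / mean_etg


# CIA maps, Table 5: CTG mean 12.20, variance 9.60; ETG mean 14.87, variance 0.12
t = t_from_summary(12.20, 9.60, 14.87, 0.12)
diff = relative_diff(12.20, 14.87)
print(f"|t| = {abs(t):.2f}, Diff = {diff:.1f}%, significant: {abs(t) > T_CRITICAL}")
# prints: |t| = 3.20, Diff = 18.0%, significant: True
```

Table 5 reports Diff = 17.94% for CIA maps; the small gap presumably reflects rounding of the published means.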

We now present and discuss three representative hypotheses of the sixteen hypotheses described in Ercegovac (1990b).

Hypothesis I considered all three classes of maps and used only the liberal and partial scoring technique.

HYPOTHESIS I

There will be no significant difference in mean scores between the experimental group and control group on all three map types combined with respect to choice.


Table 4. H1 MS1nw (all map types combined)

Treatment group         CTG      ETG
Mean                  37.53    47.20
Standard deviation     3.98     2.27
Variance              15.84     5.17
Diff                 20.48%
|t|                    7.89
Significant?            yes

As shown in Table 4, the Experimental Treatment Group (ETG) scored significantly better than the Control Treatment Group (CTG) on all three map types combined. We therefore rejected Hypothesis I at the 5% level of significance.

Hypothesis II, which also used the liberal and partial scoring technique, considered all maps within each of the three map types.

HYPOTHESIS II

There will be no significant difference in mean scores between the experimental group and the control group on all three maps within each map type with respect to choice.

As Table 5 shows, the t-test found the difference between the groups to be significant for each of the three map types.

The MS2nw procedure used the strict scoring technique, in which each answer sheet was compared with the perfect copy. MS2nw considered both choice and form for each of the data elements; that is, it used complete scoring.

Hypothesis III tests Mapper’s impact with respect to each of the cataloging tasks and their parts for all map types combined, based on the non-weighted and complete scoring scheme.

HYPOTHESIS III

There will be no significant difference in mean scores between the experimental group and control group on each of the seven task elements and on all three map types combined, with respect to the choice and form.

With respect to choice, ETG performed significantly better on all task elements except title proper (TE2.1) and other title information (TE2.2). With respect to form, ETG did significantly better on all task elements at p < 0.05.

Broken down by map type, significant differences with respect to form were found on all task elements for CIA and NGS maps, and on all task elements except other title information (TE2.2) and publisher’s name (TE3.2) for HMG maps.

3.2 Discussion of results

We report and discuss results based on the three hypotheses just presented. The discussion is organized around seven patterns that emerged from the study’s results.

Table 5. H2 MS1nw (each map type separately)

                              CIA       NGS       HMG
CTG   Mean                  12.20     13.13     12.20
      Standard deviation     3.10      1.41      2.46
      Variance               9.60      1.98      6.03
ETG   Mean                  14.87     18.00     14.33
      Standard deviation     0.35      1.65      1.84
      Variance               0.12      2.71      3.38
Diff                       17.94%    27.04%    14.88%
|t|                          3.20      8.40      2.60
Significant?                  yes       yes       yes


1. Regardless of the type of map scoring used in this study, the experimental treatment group (ETG) performed better than the control treatment group (CTG) on the three hypotheses presented in this article. ETG did better even under liberal scoring, in which the control group was given credit even for partially correct answers.

2. Furthermore, the ETG scored significantly better than the control group when the three map types were combined across all evaluation studies (e.g., Hypothesis I).

3. The experimental group performed significantly better than the control group in making the correct choice of the chiefly responsible emanator (TE1.1) (e.g., Hypothesis III). This success may be due to the private knowledge obtained from experts in map cataloging. This knowledge was especially useful for CIA and HMG maps, for which novices could not determine the entry.

No significant difference between the two treatment groups was found for NGS maps, on which the chiefly responsible emanator was formally stated on every map and signaled by a distinct layout and the wording “produced by.” Even for NGS maps, where both groups received high scores, novices saw a benefit in Mapper’s explanation of why the entry was suggested; more than half of the experimental subjects volunteered this comment.

4. The experimental group performed significantly better than the control group in determining the data elements for the publication area (TE3.1, TE3.2, TE3.3) (e.g., Hypothesis III). The relative success of ETG may be attributed to the experts’ ability to draw on a large body of precompiled knowledge about publication place and publisher once the chiefly responsible emanator had been determined. This feature was implemented in Mapper’s delegated assistance mode.

Understanding the meaning of the unique codes on CIA maps helped us write an algorithm that converted these codes into publication dates automatically, which resulted in consistently and significantly better performance by ETG than by CTG on CIA maps (e.g., Hypothesis II).

The intent was the same for HMG maps: with the help of an expert in the Library of Congress Geography and Map Division, codes were converted into dates. However, the results did not show a significant difference between the two groups on date (TE3.3). Although Ercegovac showed the codes to all subjects before testing, the codes on HMG maps remained difficult to locate.

5. The experimental group scored significantly higher than the control group on the statement of responsibility for NGS maps (TE2.3) (e.g., Hypothesis III). This relatively successful performance could be attributed to the following two reasons.

First, a large body of private knowledge had been gathered from map cataloging experts and publishers in order to understand how different responsibility functions participate in the process of map making.

Second, Mapper gathered data from the subjects by means of a closed-format technique, using text and graphics as well as binary and multiple-choice questions. Thus, the subjects were well prompted and guided throughout the test.

ETG did better than CTG on Gousha maps, but not significantly better.

6. The experimental group was better, but not significantly so, on the tasks related to the title and statement of responsibility area (in particular, TE2.1 and TE2.2) (e.g., Hypothesis III). Part of the explanation for the relatively poor performance on TE2.2 (other title information) may lie in the nature of the chief source of information for maps and in the cataloging rules. First, there is no standardized and predictable layout on cartographic chief sources of information that shows a break between the title proper and other title information. Second, the cataloging rules allow for judgment in deciding whether a word or phrase is to be treated as part of the title proper or as other title information.

7. The experimental group scored significantly better on the form than on the choice when compared with the control group across all map classes and all task elements (e.g., Hypothesis III).

As described in Section 2.1.3, the form component consists of simple decision points. These decisions, which relate to the application of ISBD and MARC rules, have been implemented in Mapper using the delegated assistance mode. The MARC rules include the use of tags that uniquely identify data elements, indicators that may influence access to an item, and subfield delimiters. The ISBD rules are generally well understood and accepted as the international standard for bibliographic description. Both ISBD and MARC rules were automated by the system. Applying these rules correctly requires catalogers to memorize a large number of rules, which change frequently. The relative difference in performance between the treatment groups showed that the CTG had considerable problems with the form component. A detailed examination of students' answers revealed that students had particular problems in applying ISBD rules rather than MARC rules. These results suggest that our decision to automate the form component was justified.

Table 6. Affective measures obtained from Mapper students (% of subjects)

A. Overall opinion of Mapper
   Very favorable: 58.3
   Somewhat favorable: 33.3
   Somewhat unfavorable: 8.3
   Very unfavorable: 0.0

B. Difficulty to learn to use Mapper
   Very difficult: 0.0
   Somewhat difficult: 8.3
   Somewhat easy: 58.3
   Very easy: 33.3

C. Subjects' estimate of time needed to learn to use Mapper
   About 15 minutes: 66.1
   About 1 hour: 33.3
   Much more than 1 hour: 0.0

D. Responses (Very easy / Easy / Undecided / Difficult / Very difficult)
   1. Scanning through a long display: 16.7 / 50.0 / 16.7 / 16.7 / 0.0
   2. Understanding explanations on the screen: 16.7 / 50.0 / 33.3 / 0.0 / 0.0
   3. Remembering what is in Mapper: 33.3 / 41.7 / 8.3 / 16.7 / 0.0
   4. Understanding the order in which items are displayed: 25.0 / 50.0 / 16.7 / 8.3 / 0.0
   5. Reading long displayed messages: 8.3 / 50.0 / 33.3 / 8.3 / 0.0
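The form decisions discussed for the form component (MARC tags, indicators, subfield delimiters, and ISBD punctuation) are mechanical once the content of each element has been chosen. The following Python sketch is our own illustration, not Mapper's implementation: it assembles a display form of a MARC 245 (title statement) field in which subfields $a, $b, and $c carry the title proper, other title information, and statement of responsibility, each preceded by the corresponding ISBD punctuation. The class name, function name, and sample map are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record for the title and statement of responsibility area.
# Field names are illustrative only; they are not taken from Mapper.
@dataclass
class TitleStatement:
    title_proper: str
    other_title_info: Optional[str] = None
    statement_of_responsibility: Optional[str] = None
    title_added_entry: bool = True  # MARC 245 first indicator: 1 = title added entry, 0 = none
    nonfiling_chars: int = 0        # MARC 245 second indicator: leading characters ignored in filing (0-9)


DELIM = "$"  # printable stand-in for the MARC subfield delimiter (0x1F in actual records)


def format_245(ts: TitleStatement) -> str:
    """Return a display form of a MARC 245 field carrying ISBD punctuation."""
    if not 0 <= ts.nonfiling_chars <= 9:
        raise ValueError("second indicator must be a single digit 0-9")
    indicators = ("1" if ts.title_added_entry else "0") + str(ts.nonfiling_chars)
    subfields = [f"{DELIM}a {ts.title_proper}"]
    if ts.other_title_info:
        subfields[-1] += " :"  # ISBD: space-colon precedes other title information
        subfields.append(f"{DELIM}b {ts.other_title_info}")
    if ts.statement_of_responsibility:
        subfields[-1] += " /"  # ISBD: space-slash precedes the statement of responsibility
        subfields.append(f"{DELIM}c {ts.statement_of_responsibility}")
    if not subfields[-1].endswith((".", "?", "!")):
        subfields[-1] += "."   # close the field with a full stop unless one is already present
    return f"245 {indicators} " + " ".join(subfields)


# Hypothetical single-sheet map, used only to exercise the function.
print(format_245(TitleStatement("The world", "physical map",
                                "National Geographic Society", nonfiling_chars=4)))
# 245 14 $a The world : $b physical map / $c National Geographic Society.
```

In this sketch the cataloger's judgment is confined to deciding what goes into each subfield (the choice component); the punctuation, indicators, and delimiters then follow deterministically, which illustrates why such form decisions lend themselves to automation.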

3.3 Affective measures and discussion

Affective measures were obtained at the end of each Mapper session. These were stored online along with the answers to cataloging tasks provided by the experimental group.

As Table 6 shows, the affective measures covered students' overall opinion of Mapper, how difficult it was to learn to use the system, and how much time they estimated was needed to learn it; students were also asked about Mapper's response time. In addition, students rated how easy it was to scan through long displays, to understand explanations on the screen, to remember what was included in Mapper, to understand the order in which items were displayed on the screen, and to read long displayed messages.

As Table 6 demonstrates, students had favorable opinions of Mapper (91.6% very or somewhat favorable) and found the system easy to learn (91.6% very or somewhat easy). Two thirds of the students felt that it would take about 15 minutes to learn how to use the system. All Mapper students found the system fast enough.
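The 91.6% figures are sums of the two positive response categories in Table 6 (58.3% + 33.3%). The minimal Python sketch below recomputes these combined shares from the transcribed table and applies the same "top two categories" grouping to the items in section D. That grouping of the D items is our illustration and is not a measure reported in the study; the function names are hypothetical.

```python
# Percentages transcribed from Table 6 (sections A, B, and D).
OVERALL_OPINION = {"Very favorable": 58.3, "Somewhat favorable": 33.3,
                   "Somewhat unfavorable": 8.3, "Very unfavorable": 0.0}
EASE_OF_LEARNING = {"Very difficult": 0.0, "Somewhat difficult": 8.3,
                    "Somewhat easy": 58.3, "Very easy": 33.3}
D_COLUMNS = ["Very easy", "Easy", "Undecided", "Difficult", "Very difficult"]
D_ROWS = {
    "Scanning through a long display": [16.7, 50.0, 16.7, 16.7, 0.0],
    "Understanding explanations on the screen": [16.7, 50.0, 33.3, 0.0, 0.0],
    "Remembering what is in Mapper": [33.3, 41.7, 8.3, 16.7, 0.0],
    "Understanding the order in which items are displayed": [25.0, 50.0, 16.7, 8.3, 0.0],
    "Reading long displayed messages": [8.3, 50.0, 33.3, 8.3, 0.0],
}


def combined_share(shares: dict[str, float], categories: list[str]) -> float:
    """Sum the percentages of the given response categories, rounded to one decimal."""
    return round(sum(shares[c] for c in categories), 1)


def easy_shares(rows: dict[str, list[float]]) -> dict[str, float]:
    """For each item in section D, the share of subjects answering 'Very easy' or 'Easy'."""
    return {item: combined_share(dict(zip(D_COLUMNS, values)), ["Very easy", "Easy"])
            for item, values in rows.items()}


print(combined_share(OVERALL_OPINION, ["Very favorable", "Somewhat favorable"]))  # 91.6
print(combined_share(EASE_OF_LEARNING, ["Somewhat easy", "Very easy"]))          # 91.6
for item, share in easy_shares(D_ROWS).items():
    print(f"{item}: {share}")
```

Under this grouping, the share of students rating a display activity as very easy or easy ranges from 58.3% (reading long displayed messages) to 75.0% (remembering what is in Mapper, and understanding the order in which items are displayed).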

4. CONCLUSIONS

The main objective of this study has been to evaluate the effectiveness of an advice-giving system in the descriptive cataloging of single-sheet maps issued by three US publishers. Our conclusions are summarized below.

Mapper students produced significantly better mean accuracy scores (p < .05) than the non-Mapper students for all three classes of maps (Hypothesis I), for all maps within each class (Hypothesis II), and for the cataloging tasks and their parts for all maps, except for title proper and other title information (Hypothesis III).

In conclusion, the evaluation study has shown that library students who are map cataloging novices can achieve significantly better cataloging answers when assisted by the Mapper than those who were not assisted by the Mapper.

Mapper students expressed a favorable overall opinion of the system, and found it easy to understand and use (91.6%).

REFERENCES

Cartmell, V., & Parker, V. (Eds.) (1982). Cartographic Materials: A manual of interpretation for AACR2 (1982). Prepared by the Anglo-American Cataloguing Committee for Cartographic Materials. General Ed. Hugo L.P. Stibbe. Chicago: American Library Association.

Case, D.O., & Richardson, J.V. (1990). Predictors of student performance with emphasis on gender and ethnic determinants. Journal of Education for Library and Information Science, 30(3):163-182.

Ercegovac, Z. (1992a). A multiple observation approach in knowledge acquisition for expert systems: A case study. Journal of the American Society for Information Science (in press).

Ercegovac, Z. (1990a). Proposed definitional conditions as a basis to study the concept of map-author. Cataloging & Classification Quarterly, 10(4):19-50.

Ercegovac, Z. (1990b). Research on knowledge-based descriptive cataloging of cartographic publications: An experimental advice-giving system-Mapper. Unpublished doctoral dissertation, University of California Los Angeles.

Ercegovac, Z., & Borko, H. (1992b). Design and implementation of an experimental advisor-Mapper. Information Processing & Management, 28, 241-257.

Gorman, M., & Winkler, P.W. (Eds.) (1988). Anglo-American cataloguing rules, 2nd revised ed. Chicago: American Library Association.

Library of Congress Rule Interpretations of AACR2, 1978-1985 (1985). (Compiled by S.C. Tseng; 2nd cumulated edition). Metuchen, NJ: Scarecrow Press.

Paisley, W., & Butler, M. (1977). Computer assistance in information work. Palo Alto, CA: Applied Communication Research. ERIC: ED 146900.

Shneiderman, B. (1987). Designing the user interface: Strategies for effective human-computer interaction. Reading, MA: Addison-Wesley.