
Information-Seeking Strategies of Novices Using a Full-Text Electronic Encyclopedia

Gary Marchionini College of Library and Information Services, University of Maryland, College Park, MD 20742

An exploratory study was conducted of elementary school children searching a full-text electronic encyclopedia on CD-ROM. Twenty-eight third and fourth graders and 24 sixth graders conducted two assigned searches, one open-ended, the other closed, after two demonstration sessions. Keystrokes captured by the computer and observer notes were used to examine user information-seeking strategies from a mental model perspective. Older searchers were more successful in finding required information, and took less time than younger searchers. No differences in total number of moves were found. Analysis of search patterns showed that novices used a heuristic, highly interactive search strategy. Searchers used sentence and phrase queries, indicating unique mental models for this search system. Most searchers accepted system defaults and used the AND connective in formulating queries. Transition matrix analyses showed that younger searchers generally favored query refining moves and older searchers favored examining title and text moves. Suggestions for system designers were made and future research questions were identified.

Introduction

Electronic systems for storing and retrieving information are used by increasing numbers of people who are not information specialists (end-users), and as Ojala [28] pointed out, this trend is likely to accelerate. The use of Electronic Information Systems (EIS) by novice or casual users is driven by hardware developments, for example, personal computers and optical storage, and by software developments, for example, menu-driven "user friendly" interfaces and pseudo-intelligent front ends. The availability of inexpensive, easy-to-use full-text systems presents opportunities and challenges for information professionals and end-users alike. The information-seeking process is a complex interaction among several factors, and success depends on methods as well as tools. The challenge to information professionals is to design and implement efficient and effective search systems and databases for end-users. The challenge to end-users is to understand the many facets of the information-seeking process so that they can make full use of these emerging systems. This article reports the results of research on information-seeking strategies used by novice users searching a full-text electronic encyclopedia. The research was conducted from a cognitive process perspective. The study examined the following questions:

1. Can novices use such a system successfully with little formal training?

2. What features of full-text retrieval do novices apply?

3. What are the relationships among user, task and search pattern?

4. What search patterns are exhibited and how are these patterns related to information-seeking strategies?

Received December 29, 1986; revised March 4, 1987; accepted March 10, 1987. © 1989 by John Wiley & Sons, Inc.

Answers to these questions will give immediate guidance to designers of full-text retrieval systems meant for end-users, and to designers of instructional materials for their use. Results and methods of data collection and analysis will also contribute to an emerging cognitive theory of information-seeking.

Information-Seeking Theory

Information-seeking is a special case of problem solving. It includes recognizing and interpreting the information problem, establishing a plan of search, conducting the search, evaluating the results, and, if necessary, iterating through the process again. Studies of where and how people look for information (see, for example, [10,24]) highlight the interaction between personal factors, such as experience and knowledge, and the information need. As with problem solving in general, understanding the information-seeking process requires exploration of human cognition, and we lack direct methods for such exploration. A general procedure is to observe behavior in well-controlled situations and use the observations to construct a model of the cognitive process. By incrementally modifying the conditions of observation, the model is refined and generalized.

This article reports on a series of observations of a group of novice users applying a single database and search system to preselected search tasks. The results are therefore specific to this combination of users, tasks and tools but provide a base for future study of information-seeking and the design of similar systems. Three overlapping bodies of literature support a theory of information-seeking and provided guidance in the conduct of this study: online searching, electronic information systems, and cognitive science.

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE. 40(1):54-66, 1989 CCC 0002-8231/89/010054-13$04.00

Online Searching

The literature related to online searching is rich and varied. Many studies have examined the search behavior of expert searchers [1,2,16,26]. Fenichel [15] reviewed the online searching literature and illustrated the complexity of studying information-seeking. She noted that there is great variation in approaches taken even within controlled settings and that even experienced searchers did not take full advantage of system features (see [3,20] for additional reviews).

End-users who do not use an intermediary are typically novice or casual users of the retrieval system, and their direct use of such systems has been considered by several researchers [5,23,25,33]. The results thus far show that novices use simple and direct search strategies. There is a need to systematically examine searches conducted by novices to understand their thinking so that features for improving their strategies can be built into future search systems and instructional materials.

Fidel and Soergel [17] described a framework for online searching that defined the elements for an information-seeking theory. Their framework included setting, user, request (task), database, search system, searcher, search process, and outcome. Each of these factors is complex, having many facets that are not easily quantified. Moreover, these factors are interdependent. This framework, with some modification, was used as the basis for constructing the research reported here. One simplification caused by considering novice users was that the searcher and user were the same person.

Electronic Information Systems (EIS)

Two developments pertinent to this research offer exciting potential for information storage and retrieval: full-text databases and CD-ROM storage systems.

Full-Text Databases. Full-text databases offer great opportunities to professionals and end-users because they can provide primary information efficiently. Full-text databases on CD-ROM make large-scale end-user access possible. However, at present there is no conclusive evidence about the performance of full-text database systems, there are few results about novice users applying such systems, and there is no research on what search strategies are best applied in full-text environments.

Questions about the performance of full-text systems have yielded both positive [32,36] and negative [4] results. Even user satisfaction results are mixed (see [12] for positive indications and [19] for negative results). Tenopir [35] summarized differences between searching full-text and controlled vocabulary databases. She suggested the use of proximity operators before boolean operators in full-text systems, and attention to the use of synonyms and specific natural language terms. She recommended combining full-text and controlled vocabulary techniques, and pointed out how little is really known about full-text searching.

Full-text databases have great implications for end-user information-seeking because full-text systems can accommodate and compensate for end-users' typically simple search strategies. However, because extensive full-text searching is not feasible manually, we have no experiential models upon which to develop conceptual models of full-text searching. Using a full-text database could significantly affect a user's cognitive information-seeking system in a variety of ways. For example, a full-text database can be thought of as having an exhaustive index, and the user can modify his/her information-seeking system by generalizing the existing rules for using paper-based indexes, eventually affecting the use of all indexes. On the other hand, the electronic database can be viewed as a sequentially searchable main file, which enables a simple facet not practical in large print databases, thus calling for modification of the user's internal rules for selecting tactics and terms. Moreover, it is possible that the full-text lookup could be perceived as something entirely new, requiring additional strategies to be added to the user's information-seeking system. Perhaps subjects who fail to generalize are confused by the full-text indexing, and their performance suffers accordingly. Novice users' experience with such systems can begin to illustrate how users cope with full-text databases and provide guidance for database and search system designers as well as for the end-users of these systems.

CD-ROM Technology. The ability to store up to 600 megabytes of data on a single surface twelve centimeters in diameter, with seek times under two seconds, is clearly cause for attention. See Chen, P. [11] for a technical description of CD-ROM technology and Chen, C. [9] or Desmarais [14] for overviews of applications.

The first widely distributed text database delivered on CD-ROM was Grolier Electronic Publishing's The Electronic Encyclopedia. The text of the twenty-volume set occupies about 60 megabytes. The database is full-text searchable through a fully inverted, 50 megabyte index of all non-stop words in the database. This encyclopedia is geared toward middle school and young adult readers. Kister [22] pointed out that the best feature of the print version of this encyclopedia was the excellent use of graphics. It is somewhat ironic that the electronic version is text-only. The search system was designed to require no instruction, and it is delivered on floppy disk with the drivers to enable various CD-ROM players. The software represents the encyclopedia as hypertext, a highly interactive network of articles. It is menu-driven and supports both direct lookup of articles and full-text searching. The full-text searching component supports boolean (and, or and not), character masking, right truncation, and proximity functions. As Connolly [13] pointed out, at approximately one-third the cost of the print version, this encyclopedia is attracting much attention in schools and libraries.
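The retrieval features just listed (a fully inverted index of non-stop words, boolean connectives, right truncation, character masking and proximity) can be illustrated with a small positional inverted index. This is a minimal sketch under stated assumptions, not Grolier's implementation: the stop-word list, the `*`/`?` wildcard syntax (borrowed from Python's `fnmatch`, with `*` intended only at the end of a term), the toy article text, and all function names are hypothetical.

```python
import fnmatch
import re
from collections import defaultdict

# Hypothetical stop-word list; the encyclopedia's actual list is not given.
STOP_WORDS = {"a", "an", "and", "at", "in", "is", "of", "or", "the", "to"}


def tokenize(text):
    """Lower-case word tokens in reading order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(articles):
    """Fully inverted positional index: word -> {article_id: [positions]}."""
    index = defaultdict(dict)
    for art_id, text in articles.items():
        for pos, word in enumerate(tokenize(text)):
            if word not in STOP_WORDS:
                index[word].setdefault(art_id, []).append(pos)
    return index


def lookup(index, pattern):
    """Postings for a term; a trailing '*' truncates, '?' masks one character."""
    postings = defaultdict(list)
    for word, entries in index.items():
        if fnmatch.fnmatchcase(word, pattern):
            for art_id, positions in entries.items():
                postings[art_id].extend(positions)
    return {art_id: sorted(p) for art_id, p in postings.items()}


# Boolean connectives operate on the sets of matching article ids.
def and_(a, b):
    return set(a) & set(b)


def or_(a, b):
    return set(a) | set(b)


def not_(a, b):
    return set(a) - set(b)


def near(index, term1, term2, k):
    """Articles in which term1 and term2 occur within k word positions of each other."""
    p1, p2 = lookup(index, term1), lookup(index, term2)
    return {
        art_id
        for art_id in p1.keys() & p2.keys()
        if any(abs(i - j) <= k for i in p1[art_id] for j in p2[art_id])
    }


# Toy usage with made-up article text.
articles = {
    "A": "Women astronauts have traveled in space aboard the shuttle.",
    "B": "Speed skating is a racing event at the winter games.",
    "C": "The space shuttle carries a crew of astronauts into orbit.",
}
idx = build_index(articles)
print(and_(lookup(idx, "space"), lookup(idx, "wom?n")))  # {'A'}
print(near(idx, "space", "shuttle", 1))                  # {'C'}
```

Positions are counted over all words, including stop words, so a proximity window of k means "within k words of running text". Storing positions is what makes proximity possible; a plain article-level index would support only the boolean connectives.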

Because cost is not as great a constraint when using a system like the electronic encyclopedia, searchers could employ more display tactics in conducting the search, including actually reading articles online. Just as water and electricity seek paths of least resistance, so humans seek the path of least cognitive load. Because this system displays article titles together with the frequency of occurrence of query terms, users can scan these lists and use the frequency data to judge whether to examine the text of articles. Moreover, when an article is displayed, the query terms that caused the hit are highlighted on the screen, facilitating browsing by scanning the text for relevance. By supporting scanning, which is a recognition task, this system provides a low cognitive load strategy for searching. Systems that encourage extensive refinement of queries demand a higher cognitive load, since query formulation involves at least recall procedures, which are themselves executed more slowly (Card, Moran & Newell [7]). In essence, it takes less concentration and cognitive effort to scan lists of items from which to make a choice than to identify and recall synonyms and combine facets using logical connectives. Whether systems that support a scan-and-select strategy yield effective results requires investigation in a variety of databases and settings.
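The scan-and-select display described above can be sketched as two small functions: one lists matching titles with their query-term counts, the other marks query terms in an article's text. This is a hypothetical illustration only. The AND default, the most-hits-first ordering, the bracketed upper-case marking, and the function names are assumptions, not a description of the actual interface.

```python
import re


def count_terms(text, terms):
    """Occurrences of each (single-word, lower-case) query term in an article."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {t: words.count(t) for t in terms}


def title_list(articles, terms):
    """(title, total hits) for articles containing every term, most hits first."""
    rows = []
    for title, text in articles.items():
        counts = count_terms(text, terms)
        if all(counts[t] > 0 for t in terms):  # assumed default AND relation
            rows.append((title, sum(counts.values())))
    return sorted(rows, key=lambda row: (-row[1], row[0]))


def highlight(text, terms):
    """Mark each query term (here as [UPPER CASE]) so it stands out when scanning."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: "[" + m.group(0).upper() + "]", text)
```

Both functions support recognition rather than recall: the user judges a list of counts or spots marked words instead of reformulating the query.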

Mental Models

Of all the factors relevant to information-seeking, the human information seeker is the most complex. Variables such as intelligence, experience, motivation, and a host of other individual characteristics certainly affect information-seeking performance. The perspective taken in this research is based upon a theory of cognition which proposes that performance in applying principles or tools depends on dynamic internal representations of those principles or tools, called mental models. Norman's research [27] demonstrated that mental models are ill-defined, incomplete, and can be illogical. In general, a mental model is a cognitive representation of a problem situation or system which is active in the sense that it can take inputs from the external world and return predictions of effects for those inputs. It can be "run" to allow predictions, which then determine what actions should be taken. Mental models serve the dual purposes of representing entities and relationships, which are refreshed and extended by experience, and simulating the possible effects of acting on these entities and relationships. Thus, mental models allow us both to understand problem situations and to predict consequences of actions contemplated for solving the problems.

Johnson-Laird [21], in a seminal work, explained a mental model theory for inferencing from both empirical and theoretical perspectives. Young [39] provided a taxonomy for mental models which attempted to unify much of the cognitive science research on representations for complex systems and processes. Mental models for specific mathematical and physical systems have been explicated, and Borgman [5] examined how mental models for an information storage and retrieval system are best acquired (see [18] for a collection of studies, and [6] for a general review).

For this research, a system of mental models for the general problem of seeking information is assumed. This system controls the combination of several specific mental models related to a particular information problem. Such a system can be described functionally and structurally. Functionally, an information-seeking system controls search by extracting key concepts from the information problem, identifying criteria for search success, selecting candidate information sources, monitoring lookup (search) and examination procedures, and using results to modify itself. Structurally, an information-seeking system includes a set of mental models associated with various information sources (databases and accompanying search systems), a set of mental models pertinent to a particular information problem (task domain knowledge), an historical record of past applications of the information-seeking system (self-awareness which allows analogy and checks context), and a set of rules for combining these components and monitoring progress.

The information-seeking system is assumed to be a control mechanism which can be "run" when instantiated with inputs for a particular problem. Perhaps the most interesting characteristic of an information-seeking system is its constant evolution. Questions about how the information-seeking system changes structurally and functionally have theoretical and practical implications. Each time the information-seeking system is applied, it must adapt. The execution (run) of an individual's information-seeking system for a particular information problem is considered an Information-Seeking Strategy (ISS). Each information-seeking strategy leads to modification of the general information-seeking system. An ISS is clearly task driven, the task serving as stimulus to activate the information-seeking system. An ISS is manifested behaviorally by the actions taken in conducting a search, that is, by a search pattern. A single action of the information-seeking strategy is considered a tactic. Tactics are manifested behaviorally by individual moves made during a search, for example, looking up a particular term or examining a citation or article. Before the complexities of mental models for particular types of search systems can be understood, the databases, search systems, and task domains must be described and understood. The purpose of this research was to begin these descriptions from the perspective of novice users.

Method

The general procedure was to introduce students to an electronic encyclopedia; assign two search tasks; collect data through observation and keystroke capturing techniques as subjects conducted searches for the tasks; summarize data for group and task by success, time and number of moves; and analyze search patterns by examining query formulations and search moves. Hypotheses relating groups and tasks to the dependent variables success, time and number of moves were tested, but since the research was mainly exploratory with respect to information-seeking strategies, descriptive data were presented and research questions were generated.

Twenty-eight third and fourth graders and twenty-four sixth graders in a talented and gifted program in an urban school setting participated in the study. Subjects were exposed to two forty-five minute explanation/demonstration sessions on use of the electronic encyclopedia. They were then assigned to pairs by their respective teachers and reported to the media center on two separate occasions, conducting one of the two assigned searches on each occasion. Student pairs conducted their searches in a separate, quiet area of the center using an IBM-PC computer driving a Philips 100 CD-ROM player and color monitor. The search topics were given to them upon arrival by a project member who observed and recorded the team's actions during the search. The observer gave a brief review of system commands before students began their searches but answered no questions and made no comments once the searches began. User keystrokes were captured unobtrusively during their searches.

Subjects

All students had had previous computer experience, including an introduction to keyboarding and some computer-assisted instruction. None of the students had ever used an electronic encyclopedia, and the print version of this particular encyclopedia was not available in the school.

The key subject characteristics were the cognitive information-seeking system, the knowledge base(s) for the task domains, and mental models for an encyclopedia and the search system. The effectiveness of a search was presumed to depend on the accuracy of mapping these internal representations onto the actual tasks, source, and system used. Rouse [31] pointed out that "... human information processing abilities plateau at a relatively early age while information-seeking abilities continue to improve" (p. 131). Since mental models develop with experience, it was reasonable to assume that young children would have less developed information-seeking systems and thus exhibit less sophisticated and successful information-seeking traces than older students. Thus, it was assumed that outcomes for older subjects (sixth graders) would be superior to outcomes for younger subjects because they had more highly developed mental models for information-seeking and broader task domain knowledge bases.

Tasks

Information-seeking is problem driven; the problem situation must certainly affect the information-seeking strategy applied and the outcomes of searching. From a cognitive perspective, an information need occurs when a knowledge base for a task domain is activated and requires instantiation or modification. The information processing system is called into action by passing relevant facets of the task domain to it for completion. The interplay between task domain knowledge and the information-seeking system is manifested in the terms used in conducting a search. For this research, the problem situation was controlled; search topics were assigned. Note that this created a somewhat artificial setting for the information-seeking system in that search was externally motivated and the search statement itself presented language which suggested query terms.

Two tasks were designed for use by all student pairs. These tasks were pilot tested and refined before the research began. Students were told to imagine that they were media specialists asked by a teacher to find particular information. Both tasks were stated in paragraph form (see Appendix A). One task required students to find a fact: the first year speed skating was introduced into the Olympic games. This task required students to combine three facets (concepts): place, activity and time. This was termed a closed task. The other task required students to find information about women who have traveled in space. Three main facets were also combined for this task: person, place, and activity. This was termed an open task since there were many possible names and associated facts to retrieve. Both questions were designed to appeal to elementary students' interests.

All student pairs conducted both searches except one third grade pair, who conducted only the open task due to absences from school. The order of the searches was alternated for each group so that half the groups did the closed search first and the other half did the open search first. It was hypothesized that outcome would depend on task, i.e., that one task would take more time and be considered more difficult by students, and that information-seeking strategies would differ for the two tasks.
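The facet structure of the two tasks can be sketched as a query builder that ORs alternative terms within a facet and ANDs the facets together. The facet names follow the text; the particular terms, the query notation, and the function names are hypothetical, and the sketch says nothing about which terms the students actually entered.

```python
import re


def format_term(term):
    """Quote multi-word phrases so they are searched as a unit."""
    return f'"{term}"' if " " in term else term


def build_query(facets):
    """OR the alternative terms within each facet, then AND the facets together."""
    groups = []
    for terms in facets.values():
        joined = " OR ".join(format_term(t) for t in terms)
        groups.append(f"({joined})" if len(terms) > 1 else joined)
    return " AND ".join(groups)


def matches(text, facets):
    """True if the text contains at least one term from every facet."""
    lowered = text.lower()
    return all(
        any(re.search(r"\b" + re.escape(t) + r"\b", lowered) for t in terms)
        for terms in facets.values()
    )


# Hypothetical facet terms for the closed (speed skating) task.
closed_task = {
    "place": ["olympic", "olympics"],
    "activity": ["speed skating"],
    "time": ["first", "introduced"],
}
print(build_query(closed_task))
# (olympic OR olympics) AND "speed skating" AND (first OR introduced)
```

Each extra facet narrows the result set, while each extra synonym within a facet broadens it; this is the recall-heavy formulation work contrasted above with scanning.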

Outcome Measures

Three outcome measures were considered: success, time to complete a search, and total number of moves (tactics) used. Subjects recorded relevant information and titles as they conducted the search. For the closed search, subjects were judged to be successful if they found the correct fact; otherwise they were judged to be unsuccessful. Judging the open search was more difficult since it was possible to find a variety of information about the several women who have explored space. Two measures of success were used for the open task: a dichotomous scale, as in the closed search (subjects were judged to be successful if they found information about at least one female space traveler), and the number of relevant articles they listed on their search worksheets.

Students were given a maximum of 45 minutes to complete each search. The observer noted starting and stopping times for each search and interrupted searches if 45 minutes passed and subjects had not yet completed their search. The total number of moves was determined by corroborating observer notes with keystroke data. These data served as dependent variables for comparing the group and task variables. It was hypothesized that success, time taken to complete search, and total number of moves made would depend on user group and task. To test these six hypotheses, non-parametric statistical procedures were used. Because success was not measured on an interval scale, and neither success nor time was found to be normally distributed, Kolmogorov-Smirnov tests for two independent samples were used at an alpha level of 0.05 for all hypotheses related to outcomes (Siegel [34]). The SPSS-X computer program was used to conduct the data analyses.
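The two-sample Kolmogorov-Smirnov test compares two samples through D, the largest vertical gap between their empirical cumulative distribution functions. A minimal pure-Python sketch of the statistic follows, paired with the common large-sample critical value 1.36 · sqrt((n + m) / (nm)) for alpha = 0.05, two-tailed. For groups as small as those in this study, exact small-sample tables or `scipy.stats.ks_2samp` would be the appropriate reference; the function names are hypothetical and no data from the study are used.

```python
import math


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the two empirical CDFs."""
    x, y = sorted(x), sorted(y)
    n, m = len(x), len(y)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(x[i], y[j])
        # Step both empirical CDFs past every observation equal to v (handles ties).
        while i < n and x[i] == v:
            i += 1
        while j < m and y[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical(n, m, c_alpha=1.36):
    """Large-sample critical value of D; c_alpha = 1.36 corresponds to alpha = 0.05, two-tailed."""
    return c_alpha * math.sqrt((n + m) / (n * m))


def ks_test(x, y):
    """Return (D, critical value, whether H0 of a common distribution is rejected)."""
    d = ks_statistic(x, y)
    crit = ks_critical(len(x), len(y))
    return d, crit, d > crit
```

Because D depends only on the ordering of values, the test needs no normality or interval-scale assumption. With very small samples the large-sample critical value can exceed 1, so no difference could ever be declared significant; that is why exact tables matter here.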

Search Process

It was assumed that the strategy a user applies depends on his/her mental models for the task, database, and search system and on how the information-seeking system manipulates those models. The discrete moves made by searchers are considered as traces of this information-seeking strategy. Which moves are used, and in what combinations, depends on the interaction of user, setting, task, database, and search system factors. The process used by a professional searcher using an online system is surely distinct from the process used by an end-user searching a CD-ROM based encyclopedia. The set of tactics available to users is constrained by the search system as well as the users' mental models. The search process is difficult to understand because it is composed of the interaction of all the factors discussed above.

An approach to analyzing the results of a human-machine interaction is to have the machine record the interactions unobtrusively. Rice and Borgman [30] discussed the advantages of this type of data collection, and Card, Moran and Newell [7] made extensive use of the method in building a theory of information processing systems. Although the method of data collection is appealing, problems of analyzing these large quantities of data remain.

One approach to analyzing search data is to examine key aspects of the data in discrete, descriptive fashion. In this study, this approach was taken for use of system features, query terms used, type of query formulation, and use of system feedback. These data were extracted from keystroke traces and observer notes, and frequencies or proportions were used to summarize them by group and task.

Another approach to organizing and analyzing keystroke-level data is to define a state map of possible moves and characterize each search pattern as a sequence of state changes according to the state map. By assuming that arrival at a certain state depends on the previous state, the search pattern can be modeled as a Markovian process. Transition matrices for various lengths of sequences can then be formed and compared. This method was first used by Penniman [29] to examine user search and system response patterns in a bibliographic database system. Tolle [37] used the method to describe use of different online catalogs, and Tolle and Hah [38] used the method to compare NLM databases. Borgman [5] used transition matrices as one of the measures for comparing training treatments, and Chapman [8] used the method to compare groups of searchers. A variation of this technique was used in this study to identify search patterns and, by inference, information-seeking strategies.

The actual procedure was to develop a state map; count occurrences of moves in each state; form transition matrices; examine these frequencies and matrices; and collapse the state map and original matrices into a simpler state map and corresponding matrices and compare them across groups and tasks. First, the system options for querying and viewing results were considered and a state map of tactics (moves) was constructed. Examination of all possible actions resulted in a state map definition which took into account moves assumed to be related to the information-seeking strategy. It is important to note that the state map was constructed from the user's rather than the system's point of view. The states were abstractions of user behavior rather than simply keystrokes or menu choices. Although some of the states correspond to a single keystroke, for example, show titles and show text, most required a sequence of keystrokes, for example, entering or editing an entire query formulation. By this method, what would appear as a single system state (the key which initiated search) was considered as six states from the user's view: six ways of entering or refining queries. Since the focus was on query formulation and revision, the codes represented types of query revisions rather than system features. Thus, the possible states were constructed to include all possible intellectual moves rather than system moves.

The state map included the following actions, grouped under the two categories of lookup and examine. Lookup: original term, new term, broader term, narrower term, synonym, reorder terms, and change relation (system defaults). Examine: show titles and show text.

Note that lookup was used to group moves that initiated or refined queries or altered system defaults; the query moves were always completed by pressing the function key that signaled the system to begin a search with the entered query, and the change relation moves were signaled by one of two function keys. Since users in this study were not permitted to print or save their results, examine was used to group only the two moves that allowed them to view the results of a search; each of these moves corresponded to a function key.
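As an illustration only, the state map can be written down as a small data structure. The Python names below (Move, Category, QUERY_MOVES) are hypothetical and not part of the original study; the two-letter codes are the ones used for Figure 1, and the grouping follows the lookup/examine split just described.

```python
from enum import Enum


class Category(Enum):
    LOOKUP = "lookup"    # initiate or refine a query, or alter system defaults
    EXAMINE = "examine"  # view the results of a search


class Move(Enum):
    """The nine user-level states, labeled with the codes used for Figure 1."""
    ORIGINAL = "O"
    NEW = "N"
    BROADEN = "B"
    NARROW = "NA"
    SYNONYM = "S"
    REORDER = "R"
    RELATION = "RE"  # change relation (system defaults)
    TITLES = "TI"    # show titles
    TEXT = "TE"      # show text

    @property
    def category(self) -> Category:
        if self in (Move.TITLES, Move.TEXT):
            return Category.EXAMINE
        return Category.LOOKUP


# Six query moves, all completed with the single "begin search" function key;
# change relation is a lookup move but uses its own function keys.
QUERY_MOVES = [m for m in Move
               if m.category is Category.LOOKUP and m is not Move.RELATION]

for move in Move:
    print(f"{move.value:>2}  {move.name.lower():<9} {move.category.value}")
```

Keeping the category as a derived property means the collapse from nine states to two is defined in exactly one place.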

This state map was used to tabulate frequencies of moves for each state and to form first order transition matrices for each search. A first order transition matrix was formed by crossing the state map with itself, producing a nine by nine grid of all possible two-step moves. The matrix was completed by counting all two-step moves in each search. Since judgments about how queries were changed (for example, whether a change broadened or narrowed the query) could only be made by humans, all coding of individual transition matrices was done by hand.

58 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE-January 1989

Page 6: Information-seeking strategies of novices using a full-text

As searches were coded, patterns began to emerge. Two patterns in particular were noted. Some searches showed heavy concentrations in the upper left quadrant (lookup/lookup); that is, these searchers made more query formulation and refinement moves than examine moves. Others were more heavily concentrated in the lower right quadrant (examine/examine). This led to a focus on the two general moves, lookup and examine. The nine by nine first order matrices were collapsed into two by two matrices with the following cells: lookup/lookup, lookup/examine, examine/lookup, and examine/examine. The ratio of each resulting cell to the total number of moves was formed and used to compare groups and tasks. Two other ratios were formed to allow comparisons based on the patterns described above. One ratio compared the number of lookup moves to the number of examine moves. The other represented the tendency to stay in a state and was formed by dividing the sum of lookup/lookup and examine/examine by the sum of lookup/examine and examine/lookup.
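A minimal sketch of the counting and collapsing steps, assuming each search has already been hand-coded into a sequence of state codes (deciding whether a revision broadens or narrows a query is not automated here). The function names and the example sequence are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

STATES = ["O", "N", "B", "NA", "S", "R", "RE", "TI", "TE"]
EXAMINE = {"TI", "TE"}  # every other state is a lookup move


def first_order_matrix(moves: List[str]) -> Dict[Tuple[str, str], int]:
    """Count each two-step move (from, to) in one coded search; 81 cells in all."""
    counts = Counter(zip(moves, moves[1:]))
    return {(a, b): counts.get((a, b), 0) for a in STATES for b in STATES}


def collapse(matrix: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Collapse the nine by nine matrix into the two by two lookup/examine matrix."""
    cells = {"LL": 0, "LE": 0, "EL": 0, "EE": 0}
    for (a, b), n in matrix.items():
        key = ("E" if a in EXAMINE else "L") + ("E" if b in EXAMINE else "L")
        cells[key] += n
    return cells


search = ["O", "TI", "NA", "B", "TI", "TE"]  # hypothetical coded search
print(collapse(first_order_matrix(search)))  # {'LL': 1, 'LE': 2, 'EL': 1, 'EE': 1}
```

Summing the per-search matrices cell by cell gives pooled matrices of the kind shown in Figures 1 and 2.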

To conduct comparison tests on data reflecting frequency of state use or state change, assumptions about population distributions were made that allowed parametric statistical tests. The overall sample distribution of moves did not depart significantly from normality according to a Kolmogorov-Smirnov goodness of fit test (Z = .595, p = .87), which supported the assumption of population normality. The level of measurement was clearly interval, and thus a t-test procedure was used to test group and task differences in types of moves. Because hypotheses were not stated a priori, these analyses yielded suggestive rather than demonstrative results.
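The Kolmogorov-Smirnov Z values reported in the results appear to be the usual large-sample form of the statistic (an assumption; the formula is not given here): Z = D·sqrt(n) for the one-sample goodness of fit test and Z = D·sqrt(nm/(n + m)) for two samples, with p taken from the asymptotic Kolmogorov distribution. The standard-library sketch below, with hypothetical function names, recomputes p from each reported Z; the values agree with the reported p-values to within about 0.001.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_two_sample_z(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D, scaled as Z = D * sqrt(n*m / (n + m))."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    d = 0.0
    for v in set(xs) | set(ys):
        # Both empirical CDFs are step functions that change only at observed values.
        d = max(d, abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m))
    return d * math.sqrt(n * m / (n + m))


def ks_p_value(z: float, terms: int = 100) -> float:
    """Asymptotic P(K > z) = 2 * sum_k (-1)^(k-1) * exp(-2 k^2 z^2)."""
    if z <= 0:
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


# Z values reported in the Results section, paired with their reported p-values.
reported = [(1.78, 0.003), (1.574, 0.014), (1.419, 0.036), (1.598, 0.012), (0.644, 0.802)]
for z, p in reported:
    print(f"Z = {z:<6} reported p = {p:<6} asymptotic p = {ks_p_value(z):.3f}")
```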

Results

Results were organized into three groups:

1. Descriptive results for success, time, number of moves, and use of system features;

2. Analyses of query formulation from three perspectives: distinct facets used in query formulation, combination of facets used in query formulation, and use of system feedback to modify query formulation;

3. Examination of search patterns through transition matrices, in particular, the use of lookup and examine moves.

Descriptive Results

Data for success, time, and number of moves were summarized and compared by group and task.

Success. Most subjects were successful in finding the required information. Frequency of success is reported by group and task in Table 1. Nearly two-thirds of all the searches (32 of 51) were judged successful on the dichotomous scale. Sixth graders were clearly more successful than the younger subjects, and these differences were statistically significant (Kolmogorov-Smirnov Z = 1.78, p = .003). It is not surprising that older, more experienced subjects were more successful in finding information. How much of these differences is due to the task domain knowledge base, information-seeking experience, or skill in manipulating the system requires further study. Since subjects were asked to record all relevant articles, a second measure of success was possible for the open task, which required multiple articles for full consideration. The mean numbers of articles listed were 1.4 for the younger group and 2.9 for the older group. Older subjects consistently found more relevant articles, and these differences were statistically significant (Kolmogorov-Smirnov Z = 1.574, p = 0.014).

TABLE 1. Frequency of success by group and task.

               Successful         Unsuccessful
               Open    Closed     Open    Closed
Grades 3/4        6         5        8         8
Grade 6          11        10        1         2
Total            17        15        9        10

Subjects were equally successful on both tasks; that is, fifteen searcher pairs were successful on both, six were unsuccessful on both, and only four were successful on only one of the tasks. No statistical difference was found (Kolmogorov-Smirnov Z = 0.099, p < 0.99). Although task certainly plays a role in determining what queries are formulated, either the user factor was more dominant in determining success or the two tasks assigned in this study were not discriminating enough to affect success.

Time. The amount of time taken to conduct searches was dependent on both group and task. The mean time taken for all searches was 36 minutes (39.4 minutes for the younger group, 32.2 minutes for the older group). The difference in time taken by the two groups to complete a search was statistically significant (Kolmogorov-Smirnov Z = 1.419, p = 0.036). It should be noted that 18 of the younger group’s searches were stopped at the 45-minute maximum, compared with only 8 of the older group’s searches. Even greater differences would likely have been found had there been no time limit on searches. That younger subjects would take more time was predictable, since their reading rates when examining articles were generally slower. Future studies should consider individual components such as query formulation time and reading time rather than only aggregate search time.

Time was also dependent on task. The mean time was 41.1 minutes for the open task and 30.7 minutes for the closed task. The difference in time between the two tasks was statistically significant (Kolmogorov-Smirnov Z = 1.598, p = 0.012). Since subjects were looking for multiple facts in the open task, it is not surprising that they took longer to search for those facts.
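The overall means are consistent with weighting each group or task mean by its number of searches. A small check, assuming the search counts implied by the Table 1 totals (27 third/fourth grade and 24 sixth grade searches; 26 open and 25 closed searches); the function name is hypothetical.

```python
from typing import Iterable, Tuple


def pooled_mean(groups: Iterable[Tuple[int, float]]) -> float:
    """Overall mean from (number of searches, group mean) pairs."""
    groups = list(groups)
    total = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total


time_by_group = [(27, 39.4), (24, 32.2)]  # third/fourth, sixth (minutes)
time_by_task = [(26, 41.1), (25, 30.7)]   # open, closed (minutes)
print(round(pooled_mean(time_by_group), 1))  # 36.0
print(round(pooled_mean(time_by_task), 1))   # 36.0
```

The same weighting applied to the group means for number of moves (22.4 and 19.5) gives the reported overall mean of 21.0.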

Moves. The number of moves taken to complete each search was examined by group and by task. The mean number of moves for all searches was 21.0 (22.4 for the younger group and 19.5 for the older group). Differences between the groups in number of moves were not statistically significant (Kolmogorov-Smirnov Z = 0.644, p = 0.802). Although older subjects made fewer moves in general, they typically found more relevant information for the open task; their moves were more efficient. As expected, the number of moves was dependent on task. The mean numbers of moves were 24.7 for the open task and 17.2 for the closed task. These differences were statistically significant (Kolmogorov-Smirnov Z = 1.598, p = 0.012).

Overall, results for this sample produced predictable patterns with respect to success, moves, and time. An interesting relationship between success and number of moves was found (Pearson r = -0.43, p < 0.01): success was more likely to occur with a small number of moves. This supports the rather obvious notion that success is related to the quality of moves rather than their quantity.
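Because success was scored on a dichotomous scale, a Pearson r between success (coded 1 or 0) and number of moves is the point-biserial correlation. A minimal standard-library sketch follows; the data are hypothetical, and Python 3.10 and later also provide statistics.correlation for the same computation.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


success = [1, 1, 0, 1, 0, 0]        # hypothetical: 1 = successful search
moves = [12, 15, 30, 18, 26, 22]    # hypothetical: moves per search
print(f"r = {pearson_r(success, moves):.2f}")
```

A negative r, as reported, means that successful searches tended to involve fewer moves.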

Use of System Features. The search system offers a variety of features to control search. The main screen used for query formulation consists of six windows: a function key menu to aid in move selection; an option menu to allow scope changes; a relation menu to allow proximity changes and enable the NOT operator; a result window to display intermediate results of a search; a query formulation window that provides five lines for queries; and a status window that shows a trace of the screens displayed, i.e., where one is in the program. The query formulation window begins the second through fifth lines with the words “along with” to make explicit that terms on separate lines are linked by AND. To change this default to NOT, the relation menu must be selected and NOT chosen from within that menu. No subjects used the NOT feature. To use the OR connective, terms are linked on a single line by a comma, but this feature is indicated nowhere except in the manual. Likewise, masking and right truncation features are explained only in the manual. Table 2 summarizes which system features were used, by group and task.
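The query window conventions described above amount to a simple Boolean form: lines are ANDed, and comma-separated terms on one line are ORed. The sketch below models only that much; the function names and the nested-list representation are hypothetical, and the NOT relation, proximity, masking, and truncation are not modeled.

```python
from typing import List


def window_to_query(lines: List[str]) -> List[List[str]]:
    """Return clauses in conjunctive form: clauses are ANDed, terms within a clause are ORed."""
    clauses = []
    for line in lines[:5]:  # the query window offers five lines
        terms = [t.strip() for t in line.split(",") if t.strip()]
        if terms:  # blank lines contribute nothing
            clauses.append(terms)
    return clauses


def describe(clauses: List[List[str]]) -> str:
    """Render the clauses with explicit connectives."""
    return " AND ".join("(" + " OR ".join(c) + ")" for c in clauses)


window = ["woman, female", "space", ""]  # hypothetical query window contents
print(describe(window_to_query(window)))  # (woman OR female) AND (space)
```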

Note that most subjects used the default conditions of the program. Forty of the fifty-one searches (78%) used the AND connective. Given the subjects’ willingness to explore other features of the system, this nonuse of OR and NOT should be considered in future studies; in particular, the system’s handling of these features should be carefully considered. The use of the truncation feature illustrates that students did learn from the instructional demonstrations, since those were their only introduction to that feature and they were not given access to the manual. Use of proximity features is interesting, but observer notes make clear that many of these efforts were meant to explore the system rather than to focus or expand the search. Two sixth grade pairs were particularly fascinated by proximity selection and changed proximity in both of their searches.

TABLE 2. Number of subjects using system features.

                  Third/Fourth           Sixth
Feature           Open      Closed       Open      Closed
AND               12 (49)   9 (53)       11 (73)   8 (40)
OR                0         0            1 (3)     0
MASK              0         0            0         0
TRUNCATE          5 (20)    4 (14)       2 (15)    0
SCOPE             2 (2)     0            1 (4)     0
PROXIMITY         1 (1)     2 (5)        5 (10)    5 (11)

Note: Figures in parentheses are the total number of times each feature was used.

When success was considered, use of features other than AND differed for the younger and older subjects. Of the fourteen sixth grade searches that used some system feature other than AND, eleven were successful. Of the fourteen third/fourth grade searches that used some system feature other than AND, only one was successful. Clearly, the younger users who were successful attended (wisely) to query formulation and feedback and ignored the complexities of the system by accepting defaults. The correct use of powerful search features deserves attention from both system designers and educators if novices are to take full advantage of electronic information systems. Overall, these summary data reveal reasonable and consistent results. If anything was surprising, it was the generally high level of performance of these young novices.

Query Formulation

Examination of query formulations served a primary role in exploring users’ information-seeking strategies. Query formulation was examined from three perspectives: the use of key facets for each task; the actual formulation of queries by combining facets, other vocabulary, and system features; and the effects of system feedback on refinement of the original query formulation.

Task Facets. Consideration of the selection of terms provided an indication of how well the task was understood and internally represented: an inferred look at users’ task domain knowledge base. The two tasks used in this study were constructed to be concrete and minimally complex. Although the tasks were stated in paragraph form, with distracting information present, subjects typically used terms present in the task statement. This surely gave a skewed view of subjects’ knowledge bases, since the task was formulated for them.

The open task required subjects to find information about women who have travelled in space. The facets (person, place, and activity) were easily grasped by subjects, and five terms were commonly used to represent them. Table 3 presents the proportion of occurrence of these five terms in all open task queries, by age group. Note that multiple terms often occurred in single queries, and thus the percentages do not sum to 100. In general, subjects chose appropriate terms. Other terms used less often included lady, NASA, spacecraft, human, and pilot. The fact that most subjects were able to identify reasonable terms for the tasks indicated that their task domain knowledge and their ability to extract key facets from this task statement were good, and that the results of their searches were due more to their information-seeking experience and use of the EIS itself.

TABLE 3. Proportion of occurrence of key terms in open task.

Term              Third/Fourth    Sixth
Woman/women           32%          50%
Female                18%          22%
Space                 53%          52%
Traveler              27%          20%
Astronaut             11%          19%

Note: Percentages are based on the total number of queries. There were 191 queries for third/fourth graders and 151 queries for sixth graders.

The closed search required subjects to find the year that speed skating was introduced into the Olympic Games. For the closed search, sixth graders made far fewer queries than the third/fourth graders. These results are presented in Table 4.

The older subjects were much more likely to use the specific term “speed skating,” which occurred 10 times in four articles, including the articles on ice skating and the Olympic Games; selecting either of these displayed speed skating on the same page as the date, i.e., completed the task. Whether the ability to select an appropriate level of conceptual specificity results from physiological differences between these two age groups, as suggested by developmental psychology, or from information-seeking experience should be explored in future studies. Like the open task, this task involved three facets: place, activity, and time. The abstract facet, time, was largely ignored, and activity at specific and general levels dominated search. For a well-defined activity like speed skating, this was sufficient to allow quick success.

Terms in Query Formulations. A second view of term selection was gained by examining the actual query formulations. Since subjects were novices with respect to both information-seeking and the search system, it was not surprising that so many used natural language queries. Some subjects actually entered full-length questions to the system. Examination of all queries allowed classification into six categories: a single term on a single line; single terms on multiple lines (terms connected by AND); a phrase consisting of terms and/or adjectives on a single line; phrases on multiple lines (phrases connected by AND); a sentence (term(s), verb(s), and possibly modifiers) on a single line; and sentences on multiple lines. Term here actually meant facet, since expressions such as “speed skating” and “space travel” were counted as single terms. Table 5 presents frequencies and proportions of each type of query by group and task.

TABLE 4. Proportion of occurrence of key terms in closed task.

Term              Third/Fourth    Sixth
skating               14%           2%
speed skating         52%          77%
Olympics              37%          59%
event                 19%          17%

Note: Percentages are based on the total number of queries. There were 139 queries for third/fourth graders and 66 queries for sixth graders.

TABLE 5. Query type by group and task.

                         Third/Fourth             Sixth
Query type               Open       Closed        Open       Closed
one term                 35 (18%)   23 (17%)      22 (15%)   12 (18%)
one term per line        30 (16%)   30 (22%)      40 (26%)   16 (24%)
one phrase               51 (27%)   43 (31%)      35 (23%)   11 (17%)
one phrase per line      20 (10%)   10 (7%)       27 (19%)   17 (26%)
one sentence             52 (27%)   26 (19%)      22 (15%)    5 (8%)
one sentence per line     3 (2%)     7 (5%)        5 (3%)     5 (8%)
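Query types were coded by hand in the study. Purely to illustrate the six categories, the sketch below classifies a query window automatically under hypothetical assumptions: a small facet vocabulary (so multiword facets such as “speed skating” count as one term), a small verb list for detecting sentences, and the rule that a multi-line query takes the most complex type found on any of its lines.

```python
from typing import List

# Hypothetical vocabularies; the study relied on human coding instead.
FACETS = {"speed skating", "space travel", "women", "woman", "space", "olympics", "astronaut"}
VERBS = {"is", "was", "were", "did", "go", "went", "begin", "began", "travel", "find"}
RANK = {"term": 0, "phrase": 1, "sentence": 2}


def line_type(line: str) -> str:
    """Classify one query line as a term (facet), phrase, or sentence."""
    text = line.strip().lower().rstrip("?.")
    if text in FACETS:
        return "term"
    if any(word in VERBS for word in text.split()):
        return "sentence"
    return "phrase"


def query_type(lines: List[str]) -> str:
    """Map a whole query window onto one of the six categories of Table 5."""
    kinds = [line_type(l) for l in lines if l.strip()]
    if not kinds:
        raise ValueError("empty query")
    most_complex = max(kinds, key=RANK.__getitem__)
    return f"one {most_complex}" if len(kinds) == 1 else f"one {most_complex} per line"


print(query_type(["When did women go into space?"]))  # one sentence
print(query_type(["speed skating", "olympics"]))      # one term per line
```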

Younger subjects were more likely to use actual sentences to query the system. In a sense, these novices attributed considerable “intelligence” to the system. This reflected a lack of understanding on the part of these users about how the system worked: a poorly defined mental model of this system. This is not at all surprising, since none of the subjects had any previous online searching experience. One possible explanation is that subjects used their existing mental model of an encyclopedia as a base and added a computer component. The computer component was “intelligent” since all subjects knew that computers are interactive. Future studies should examine how use of natural language (sentences and phrases) lessens as users become more experienced with the system.

Less than half of all the query formulations were single terms or single phrases. These are likely the types of queries these subjects would have formulated with a print encyclopedia, since they had reasonable experience with how lookup is performed in one. Clearly, subjects recognized that the electronic encyclopedia was conceptually different from the familiar print version.

Feedback Effects on Query Formulation. A major advantage of an interactive search system is the ability to plan for and use intermediate results in conducting a search. To examine how subjects used system feedback to modify their queries, initial queries and first refinements were analyzed with respect to the system feedback. Because the subjects were novices, the initial query likely represented a best approximation of an information-seeking system “run.” Since no strategies such as the building block approach were apparent, this is a particularly reasonable assumption.

The results of original query formulation and subsequent modification were categorized into five cases: no hits on the initial query and no hits on the subsequent query (labeled 0->0); no hits on the initial query and some hits on the subsequent query (labeled 0->hits); some hits on the initial query and no hits on the subsequent one (labeled hits->0); hits on both queries (labeled hits->hits); and hits on the first query followed by immediate success (labeled hits->succ). Table 6 presents these results by group and task. The largest share of the searches (43%) yielded no hits on the first two queries. These results were typically due to the use of phrases and sentences in forming queries. In general, subjects eventually dropped modifiers and verbs and found title lists to examine; they adapted their information-seeking tactics.

TABLE 6. Results of first two queries by group and task.

                 Third/Fourth        Sixth
Results          Open   Closed       Open   Closed
0->0               8       4           7       3
0->hits            4       1           3       1
hits->0            2       3           1       3
hits->hits         0       2           1       3
hits->succ         0       3           0       2
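The five cases depend only on whether each of the first two queries returned any hits, plus whether the first query led directly to success. A minimal sketch of that coding rule, with hypothetical function names, followed by a tally of the counts in Table 6:

```python
from typing import Optional


def feedback_case(first_hits: int, second_hits: Optional[int] = None,
                  succeeded_after_first: bool = False) -> str:
    """Label how the initial query and its first refinement turned out."""
    if first_hits > 0 and succeeded_after_first:
        return "hits->succ"
    if second_hits is None:
        raise ValueError("a second query is needed unless the first one succeeded")
    before = "hits" if first_hits > 0 else "0"
    after = "hits" if second_hits > 0 else "0"
    return f"{before}->{after}"


# Table 6 counts: (third/fourth open, third/fourth closed, sixth open, sixth closed)
TABLE_6 = {
    "0->0": (8, 4, 7, 3),
    "0->hits": (4, 1, 3, 1),
    "hits->0": (2, 3, 1, 3),
    "hits->hits": (0, 2, 1, 3),
    "hits->succ": (0, 3, 0, 2),
}
total = sum(sum(row) for row in TABLE_6.values())
print(total, round(100 * sum(TABLE_6["0->0"]) / total))  # 51 43
```

The tally reproduces both the 51 searches of Table 1 and the 43% share of searches with no hits on the first two queries.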

Searchers who began with no hits and adjusted their query formulation to find hits would seem to have a good understanding of the system and to be making progress toward success. Most of the searches in this category (seven of nine) were on the open task, and seven of the nine were eventually successful. For five of these searches, progress was made via the second query. Two of the searcher pairs broadened their original query to “space,” thus retrieving 894 articles, an overreaction to having no hits. The other two searches illustrate some unique characteristics of this full-text system.

One pair of searchers began with “women astronauts,” a reasonable term, which yielded no hits. The modified query, “woman in space,” is conceptually more specific, which logically would not be a good choice if the aim was to move to a broader topic. However, this query retrieved six articles (all relevant). Moreover, had the revised query used the plural form “women in space,” no hits would have been reported. The other search in this category reinforces the somewhat arbitrary nature of full-text retrieval when phrases or multiple terms are used. The original query was a phrase, “year of speed skating,” which yielded no hits. The revised query, “year” AND “of speed” AND “skating,” yielded three articles. (Note that the default proximity condition limited co-occurrences to a paragraph.) This awkwardly constructed query was actually useful in retrieving relevant information. Clearly, such systems should be accompanied by a thesaurus to help control the arbitrary nature of written language and to help users identify word forms and synonyms.
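The asymmetry between “woman in space” and “women in space” follows if the system matches query strings literally, without stemming, and requires terms on separate lines to co-occur within one paragraph. The toy matcher below is a sketch under exactly those assumptions, not a description of the encyclopedia’s actual retrieval engine; the function names, articles, and paragraphs are invented for illustration.

```python
import re
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase, keep word characters only, and pad with spaces for whole-word matching."""
    return " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "


def matches(paragraph: str, query_lines: List[str]) -> bool:
    """True if every query line occurs as an exact word sequence in the paragraph."""
    para = normalize(paragraph)
    return all(normalize(line) in para for line in query_lines)


def search(articles: Dict[str, List[str]], query_lines: List[str]) -> List[str]:
    """Titles of articles with at least one paragraph matching all lines (AND)."""
    return [title for title, paras in articles.items()
            if any(matches(p, query_lines) for p in paras)]


ARTICLES = {  # invented toy corpus
    "Space exploration": ["Valentina Tereshkova became the first woman in space in 1963."],
    "Olympic Games": ["In the year 1924 the first Winter Olympic contests of speed skating were held.",
                      "The Summer Games are held every four years."],
}

print(search(ARTICLES, ["woman in space"]))             # ['Space exploration']
print(search(ARTICLES, ["women in space"]))             # []
print(search(ARTICLES, ["year of speed skating"]))      # []
print(search(ARTICLES, ["year", "of speed", "skating"]))  # ['Olympic Games']
```

A thesaurus or stemming step applied in normalize would remove the singular/plural asymmetry, which is the kind of support the paragraph above recommends.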

Some insight into subjects’ lack of strategy and follow-through was gained by examining the nine searches that began with some hits but yielded no hits after reformulation. These searches would seem to indicate counterproductive moves, and only four of the nine were eventually successful. Most of these searches (seven of nine) involved the closed task. For two of the searches, the original query formulation did not yield fruitful beginnings: one retrieved 894 articles, the other a single false drop. One pair of subjects did not bother to examine a good set of 16 titles at all, and another pair selected a false drop and ignored the relevant articles before reformulating their query. The other five searches were all similar in that they selected a relevant article, “Olympic Games,” which brought up a page of text indicating in two sentences that the winter Olympics began in 1924 and included speed skating as an event. None of these subjects (three pairs of sixth graders and two pairs of third/fourth graders) extracted this fact from the two sentences, and thus they continued with their search. Only two of these pairs eventually succeeded in finding the required fact. The extent of such failure to extract information from appropriate text, and whether this inability to extract information once its context was retrieved is due to subjects losing sight of the goal because they were focused on the system, the situation, or reading, bears future investigation.

Observer notes indicated that some subjects used terms they discovered in reading text to refine subsequent queries. For example, names of astronauts were subsequently used in queries, and the term “astronaut” itself was used in queries immediately following examination of the article “space exploration,” in which it appeared. The observer also noted subjects’ comments about selecting articles to examine because of high frequencies of term occurrence. These observations demonstrated that searchers used feedback from the system in examining title lists and article text, as well as the system’s reports of the number of hits immediately after a query.

Search Pattern Analysis

In this study, a search pattern was the set of moves users made during an entire search session. A general characteristic of search patterns that makes them difficult to compare is that each one is unique. In this study, they varied in length from two to 51 moves, were distinct with respect to query formulation, the order of queries, and use of system features, and yielded a variety of outcomes.

Using the state map described in the methodology section, frequency counts for each state (move) were analyzed, and order one matrices were formed and discussed. The state map used to generate the transition matrices was then collapsed to focus on two conceptual states, lookup and examine. New transition matrices were formed and analyzed by forming ratios among various cells and comparing those ratios by group and task.

State Map Frequency Analysis. Table 7 presents the mean number of times users entered each state (made a particular move), by group. The mean differences between groups were statistically significant for only one type of move, narrow. Younger subjects in general spent more time refining their queries, and narrowing was their most frequent query-refining move. Narrowing was used by subjects even when no hits were obtained. This illogical action may be due to the inverse relationship between use of AND and the resulting outcome: adding terms linked by AND (the default in this system) restricts the search and actually retrieves fewer articles. This relationship was obviously not understood by some of the subjects. Table 8 presents similar data by task.

TABLE 7. Mean number of moves by group.

Move (state)      Third/Fourth    Sixth
original              1.00         1.00
new                   1.52         1.63
broaden               3.81         3.00
narrow                4.67         2.54
synonym               1.26         0.88
reorder               0.00         0.13
relation              0.44         0.54
titles                5.93         5.71
text                  4.78         5.04

Note: Each search had exactly one original (initial) move.

TABLE 8. Mean number of moves by task.

Move (state)      Open     Closed
original          1.00      1.00
new               2.31      0.80
broaden           4.04      2.80
narrow            4.12      3.20
synonym           1.65      0.48
reorder           0.12      0.00
relation          0.46      0.52
titles            6.38      5.24
text              5.62      4.16

Note: Each search had exactly one original (initial) move.

Since the open task required more total moves, it follows that the components of the open task moves should outnumber those of the closed task. Mean differences were statistically significant only for the new and synonym states. This was due to the richer set of terms subjects used for the open task.

The coarser state map, which included only lookup and examine states, was also analyzed. For the entire sample, there were a total of 576 lookup moves and 547 examine moves. The mean numbers of lookup moves were 12.7 and 9.7 for younger and older users, respectively. Although younger searchers made an average of three more lookup moves than older searchers, the difference in these means was not statistically significant (T = 1.43, p = 0.158). The mean numbers of examine moves were 10.7 and 10.8 for the younger and older groups, respectively. These means were much more alike than the lookup means, and the difference was not statistically significant (T = -0.03, p = 0.978). The ratio of lookup moves to examine moves was formed for each search and compared across the two groups. The mean ratio was 1.52 for the younger group and 1.11 for the older group. Although the older searchers showed more balance between lookup and examine, the difference between these means was not statistically significant (T = 0.95, p = 0.347).
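Which form of the t-test was used is not stated. As an assumption, the sketch below computes the classical pooled-variance (Student’s) two-sample t statistic and its degrees of freedom with Python’s standard statistics module. A p-value requires the t distribution, which the standard library does not provide (SciPy’s scipy.stats.ttest_ind computes both). The function name and data are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t with pooled variance; returns (t, degrees of freedom)."""
    n, m = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 denominator).
    sp2 = ((n - 1) * variance(a) + (m - 1) * variance(b)) / (n + m - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2


lookup_young = [14, 9, 20, 11]  # hypothetical lookup moves per search
lookup_old = [8, 12, 7, 10]
t, df = pooled_t(lookup_young, lookup_old)
print(f"t = {t:.2f}, df = {df}")
```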

Similar comparisons were made by task. The mean numbers of lookup moves were 13.7 and 8.8 for the open and closed tasks, respectively. These differences were statistically significant (T = 2.43, p = 0.019). The means for examine moves were 12.0 and 9.4 for the open and closed tasks, respectively. These differences were not statistically significant (T = 1.57, p = 0.122). Since the open task required more total moves than the closed task, it was predictable that the open task would require more lookup and examine moves. The larger disparity between the tasks on lookup may be due to subjects identifying more terms to try for the open task. The ratios of lookup to examine were close for the two tasks, 1.22 and 1.43 for the open and closed tasks, respectively. Perhaps the ratio is a better estimate of user characteristics than of task characteristics.

Overall, the results of considering individual moves paralleled the results for total moves. Although older searchers tended to use fewer moves of all types and had a better balance of lookup and examine moves, these differences were not statistically significant. Perhaps the two groups were more alike as novices using a new system than different due to information-seeking experience. Task differences also paralleled the differences found for total number of moves.

Order One Analysis. The order one transition matrix for all searches is presented as Figure 1. The states original, new, broaden, narrow, synonym, reorder, change relation, show titles, and show text were coded as O, N, B, NA, S, R, RE, TI, and TE, respectively. The zeroes in the original column arise because a search cannot return to the original state once a move has been made; the zeroes in the show text column arise because text can be shown only from a title list or the always-active change relation. The large number of transitions from the narrow to the broaden state was likely due to a combination of narrow being the most common lookup state and the common use of sentence or phrase queries. That is, when a user finally moved from a phrase or sentence to a term or set of terms, this was a broadening of the search. It would be interesting to see whether expert users exhibited the opposite effect: more narrowing to eliminate false drops.

To explore information-seeking patterns by comparing groups and tasks, the coarser state map with only two states was considered. Figure 2 presents the resulting transition matrices by group and by task. As these data show, the two cells in which users repeated the same type of move (lookup/lookup and examine/examine) dominated the search patterns. This is not surprising, since common sequences of moves were to formulate and reformulate a query until some hits were generated and to examine a set of articles resulting from a successful query. Since most users ended their search in the show text state, we would expect the number of transitions from lookup to examine to be generally higher due to a cumulative effect.

                                 Move To
Move From      O    N    B   NA    S    R   RE   TI   TE
Original       0    3   11   10    4    0    3   20    0
New            0   13   20    7    6    0    3   30    0
Broaden        0   11   29   38    7    1    6   82    0
Narrow         0   12   66   39   14    1    6   46    0
Synonym        0    4   16    6   15    0    2   10    0
Reorder        0    0    0    0    1    0    0    0    0
Relation       0    4    9   11    0    0    0    2    0
Titles         0    7    8   25    2    0    2    0  250
Text           0   26   16   51    6    1    3  107    0

FIG. 1.

Transition Matrices by Group

Third/Fourth Graders                Sixth Graders
           Lookup   Examine                    Lookup   Examine
Lookup        233       103         Lookup        145        87
Examine        83       186         Examine        64       171

Transition Matrices by Task

Open Task                           Closed Task
           Lookup   Examine                    Lookup   Examine
Lookup        254        97         Lookup        124        93
Examine        76       215         Examine        71       142

FIG. 2.
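As a consistency check on Figure 1, and a sketch of the collapsing step, the Python below sums the nine by nine matrix into the four lookup/examine cells. With the values as laid out in Figure 1, the pooled cells equal the sums of the group matrices in Figure 2 (233 + 145 = 378, 103 + 87 = 190, 83 + 64 = 147, 186 + 171 = 357), and counting entries into each state, plus one original move for each of the 51 searches, reproduces the 576 lookup and 547 examine moves reported for the whole sample. The variable and function names are illustrative.

```python
STATES = ["O", "N", "B", "NA", "S", "R", "RE", "TI", "TE"]
EXAMINE = {"TI", "TE"}

# Figure 1: rows are "move from", columns are "move to", both in STATES order.
FIG1 = [
    [0, 3, 11, 10, 4, 0, 3, 20, 0],    # original
    [0, 13, 20, 7, 6, 0, 3, 30, 0],    # new
    [0, 11, 29, 38, 7, 1, 6, 82, 0],   # broaden
    [0, 12, 66, 39, 14, 1, 6, 46, 0],  # narrow
    [0, 4, 16, 6, 15, 0, 2, 10, 0],    # synonym
    [0, 0, 0, 0, 1, 0, 0, 0, 0],       # reorder
    [0, 4, 9, 11, 0, 0, 0, 2, 0],      # change relation
    [0, 7, 8, 25, 2, 0, 2, 0, 250],    # show titles
    [0, 26, 16, 51, 6, 1, 3, 107, 0],  # show text
]


def collapse(matrix):
    """Sum a nine by nine transition matrix into LL, LE, EL, EE cells."""
    cells = {"LL": 0, "LE": 0, "EL": 0, "EE": 0}
    for i, src in enumerate(STATES):
        for j, dst in enumerate(STATES):
            key = ("E" if src in EXAMINE else "L") + ("E" if dst in EXAMINE else "L")
            cells[key] += matrix[i][j]
    return cells


print(collapse(FIG1))  # {'LL': 378, 'LE': 190, 'EL': 147, 'EE': 357}

# Column sums count how often each state was entered by a transition.
entered = {s: sum(row[j] for row in FIG1) for j, s in enumerate(STATES)}
lookup_moves = 51 + sum(n for s, n in entered.items() if s not in EXAMINE)
examine_moves = sum(n for s, n in entered.items() if s in EXAMINE)
print(lookup_moves, examine_moves)  # 576 547
```

The 1,072 transitions in the matrix also equal 51 searches times the reported mean of 21.0 moves per search.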

Two sets of six ratios were formed to compare data across groups and across tasks. The four cells of a transition matrix were represented as LL (lookup/lookup), LE (lookup/examine), EL (examine/lookup), and EE (examine/examine), and the total number of moves was named T. The first four ratios were simply the number of occurrences of each transition divided by the total number of moves. These ratios represented single transition measures for staying in the lookup state (looklook = LL/T), staying in the examine state (examexam = EE/T), and changing from one state to the other (lookexam = LE/T and examlook = EL/T). The fifth ratio, leratio = LL/EE, compared lookup to examine moves. The sixth ratio, stay = (LL + EE)/(LE + EL), represented the tendency to stay in a state: it divides the number of moves that remained in the same state by the number of moves that changed from one state to the other.

These ratios are presented for group and task in Table 9. With respect to group, none of the mean differences between these ratios was statistically significant. Older searchers again showed greater balance in their searches than younger searchers: they tended to use examine moves more heavily and to move between states more readily. The younger searchers were more likely to get bogged down in query formulation and less likely to take full advantage of title lists or article texts.

TABLE 9. Mean ratios of state transitions by group and task.

                    Group               Task
Ratio       Third/Fourth   Sixth    Open  Closed
looklook            0.37    0.29    0.40    0.25
lookexam            0.18    0.21    0.15    0.24
examlook            0.11    0.13    0.11    0.13
examexam            0.34    0.37    0.35    0.37
leratio             1.94    1.30    1.73    1.54
stay                3.71    2.77    4.05    2.45
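The six ratio definitions above translate directly into a small function. This is a sketch under two assumptions that the text does not settle. First, Table 9 reports *means*, presumably of ratios computed for each searcher, so applying the function to the pooled counts of Fig. 2 will not reproduce the table. The pooled third/fourth-grade matrix gives a stay value of about 2.25, against a mean of 3.71. Second, a searcher with no examine/examine moves or no state changes would make leratio or stay undefined. Returning NaN in that case is a hypothetical convention chosen here, not the study's procedure.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoStateMatrix:
    """Counts for the four cells of a lookup/examine transition matrix."""
    ll: int  # lookup -> lookup
    le: int  # lookup -> examine
    el: int  # examine -> lookup
    ee: int  # examine -> examine


def _safe_div(num, den):
    """Divide, returning NaN when the denominator is zero (an assumed convention)."""
    return num / den if den else math.nan


def transition_ratios(m):
    """Return the six ratios defined in the text for one transition matrix."""
    t = m.ll + m.le + m.el + m.ee  # T, the total number of moves
    return {
        "looklook": _safe_div(m.ll, t),
        "lookexam": _safe_div(m.le, t),
        "examlook": _safe_div(m.el, t),
        "examexam": _safe_div(m.ee, t),
        "leratio": _safe_div(m.ll, m.ee),
        "stay": _safe_div(m.ll + m.ee, m.le + m.el),
    }


if __name__ == "__main__":
    # Pooled third/fourth-grade counts from Fig. 2 (aggregate, not per-subject data).
    grade34 = TwoStateMatrix(ll=233, le=103, el=83, ee=186)
    for name, value in transition_ratios(grade34).items():
        print(f"{name:9} {value:.2f}")
```

Because the first four ratios share the denominator T, they always sum to 1 for a non-empty matrix. This matches the columns of Table 9 to within rounding.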

The older searchers used more examine moves than lookup moves and were more tenacious in examining relevant titles and reading text. Since older subjects were generally more successful, the relationship between lookup and examine moves should be studied more fully in future work.

Ratios for task differed at statistically significant levels in three cases: staying in lookup (looklook, T = 2.39, p = 0.021), changing from lookup to examine (lookexam, T = -3.87, p < 0.001), and the tendency to stay in a state (stay, T = 2.12, p = 0.039). The open task required more moves, and subjects were able to generate more terms for formulating queries for it, which led to more combinations of queries to try.

Overall, these search pattern analyses depicted a strategy that might be called interactive browsing. Novices appeared to make no real plans; instead, once an initial query was formulated, they reacted to system responses.

Conclusions and Recommendations

This research considered information-seeking at the system interaction level only and was further limited by focusing on a single database and its accompanying search system. The first limitation kept attention on the critical human-machine cooperation in information-seeking. The second controlled two key information-seeking factors (database and search system), isolating attention on user and task factors. The results demonstrated that, in general, young novice users could successfully use a full-text electronic encyclopedia with minimal introductory training. Subjects in the third or fourth grade were less successful and took more time than subjects in the sixth grade. Subjects were equally successful on the open and closed tasks, but the open task took longer to complete and required more moves. Although the system provided powerful search features, most novices accepted the system defaults. If designers expect novices to take full advantage of a system, they should carefully consider which features are made explicit to users, which are hidden, and how defaults are set.

User strategies were heuristic in that they were highly interactive rather than planned. Subjects were able to identify key facets of the tasks but had difficulty formulating effective queries. Many, especially the younger searchers, used sentences or phrases as queries. This reflected an ill-defined mental model of the search system, a kind of hybrid between a print encyclopedia and an interactive computer program. System feedback was used to reformulate queries in two ways: by default, when no articles were retrieved for a query, and voluntarily, by using terms found in the text of articles. Feedback was also used to judge the relevance of titles by observing how frequently terms occurred in an article. Some searchers found relevant information because they were lucky enough to use a phrase query that occurred in the text of the encyclopedia. Others found nothing when using logical refinements of previous queries. Adding a thesaurus or usage-sensitive search aid would certainly lead to more efficient searches, and likely to more effective ones as well.
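A toy example shows why sentence queries often failed and how a thesaurus might help. The sketch below evaluates queries against a small inverted index under an implicit AND of all query words, in the spirit of the AND default most searchers accepted. Everything in it is hypothetical: the three articles, the tokenizer (no stemming, no stop-word removal), and the thesaurus entries. This section does not describe the encyclopedia's actual matching rules.

```python
import re
from collections import defaultdict

# Hypothetical miniature "encyclopedia"; titles and texts are invented for illustration.
ARTICLES = {
    "Space Exploration": "Sally Ride was the first American woman astronaut in space.",
    "Tereshkova, Valentina": "Valentina Tereshkova was the first woman cosmonaut to orbit the earth.",
    "Olympic Games": "Speed skating joined the winter games program.",
}

# Hypothetical thesaurus: a query word is replaced by the OR of its listed variants.
THESAURUS = {
    "astronaut": ["astronaut", "cosmonaut"],
    "women": ["women", "woman"],
}


def tokenize(text):
    """Lower-case word tokens; no stemming or stop-word removal (an assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(articles):
    """Inverted index: map each word to the set of article titles containing it."""
    index = defaultdict(set)
    for title, text in articles.items():
        for word in tokenize(title + " " + text):
            index[word].add(title)
    return index


def search(index, query, thesaurus=None):
    """Implicit-AND search: every query word (or one of its variants) must occur."""
    hits = None
    for word in tokenize(query):
        variants = (thesaurus or {}).get(word, [word])
        matches = set().union(*(index.get(v, set()) for v in variants))
        hits = matches if hits is None else hits & matches
    return sorted(hits or [])


if __name__ == "__main__":
    idx = build_index(ARTICLES)
    print(search(idx, "which women were astronauts"))
    print(search(idx, "woman astronaut"))
    print(search(idx, "woman astronaut", THESAURUS))
```

With these toy data, the sentence query "which women were astronauts" retrieves nothing, because words such as "which" and the plural "astronauts" occur in no article. The query "woman astronaut" retrieves only Space Exploration. The same query expanded through the thesaurus also retrieves the Tereshkova article. Under an AND default, every additional word can only shrink the result set, which is one plausible reading of why logical refinements sometimes found nothing.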

Older, typically more successful, searchers exhibited a better balance between lookup and examine moves, actually favoring examine moves overall. Expert searchers could likely locate relevant information in this database more directly using carefully planned strategies, but this type of database may lend itself to highly interactive, heuristic searching. Perhaps a viable strategy in a full-text, no-connect-charge environment is a “scan and select” technique. The searcher uses one general term or phrase to retrieve a title list, then uses scanning and frequency count feedback to judge quickly which articles to examine. Finally, the searcher scans each chosen article, using the highlighted terms in the text to focus on relevant information and to find other terms for subsequent queries. Since these subjects had no previous experience with such a system, it is likely that they formed mental models based on existing models of print encyclopedias and computers. Because they had used computer-assisted instruction and knew that computers are interactive, they sought information from this system by initiating dialogues rather than by controlling it with commands. It may be that full-text systems such as this electronic encyclopedia are inherently compatible with novice users’ simple, interactive information-seeking behaviors.
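The “scan and select” technique suggested above can be sketched in three steps: rank the title list for one general term by how often the term occurs in each article, keep the top few titles, and show short passages with the term highlighted. This is a minimal sketch under stated assumptions. The article texts and the functions `title_list`, `highlighted_passages`, and `scan_and_select` are hypothetical and do not represent the encyclopedia's interface. Brackets stand in for on-screen highlighting.

```python
import re

# Hypothetical article texts, invented for illustration.
ARTICLES = {
    "Olympic Games": "The Olympic Games include summer and winter games. "
                     "Skating events are part of the winter games.",
    "Speed Skating": "Speed skating races are held on an oval track. "
                     "Olympic speed skating is a winter event.",
    "Figure Skating": "Figure skating combines jumps and spins.",
}


def _term_pattern(term):
    """Whole-word, case-insensitive pattern for a single search term."""
    return re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)


def title_list(articles, term):
    """(title, count) pairs for articles containing the term, most occurrences first."""
    pattern = _term_pattern(term)
    counts = [(title, len(pattern.findall(text))) for title, text in articles.items()]
    hits = [(title, n) for title, n in counts if n > 0]
    # Frequency count feedback: sort by count, breaking ties alphabetically by title.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


def highlighted_passages(text, term, width=30):
    """Snippets of up to `width` characters either side of each occurrence, term bracketed."""
    snippets = []
    for m in _term_pattern(term).finditer(text):
        start, end = max(0, m.start() - width), min(len(text), m.end() + width)
        snippets.append(text[start:m.start()] + "[" + m.group(0) + "]" + text[m.end():end])
    return snippets


def scan_and_select(articles, term, top_n=2):
    """One general term -> ranked title list -> highlighted passages in the top articles."""
    selected = title_list(articles, term)[:top_n]
    return {title: highlighted_passages(articles[title], term) for title, _ in selected}


if __name__ == "__main__":
    for title, n in title_list(ARTICLES, "skating"):
        print(f"{n:3}  {title}")
    for title, passages in scan_and_select(ARTICLES, "skating", top_n=1).items():
        print(title, passages)
```

For the term "skating", the ranked list is Speed Skating (2 occurrences), then Figure Skating and Olympic Games (1 each). Breaking ties alphabetically is an arbitrary choice made only to keep the output deterministic.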

Much remains to be learned about information-seeking strategies, full-text databases, and searching by end-users. Questions that bear study include: What mental models do people have for search systems? How do mental models for search systems change as people gain experience with electronic information systems? What conceptual models best help users build appropriate mental models for these systems and improve their information-seeking strategies? From a design perspective, this system was found to be a good first step toward full-text retrieval systems for the end-user market. Questions about actual system performance in terms of recall and precision remain. The problem of providing powerful search features and default conditions in ways that neither clutter screens nor intimidate end-users must also be addressed.

Full-text CD-ROM databases and search systems are sure to proliferate. What remains is to determine whether they are used effectively and what effects they have on information specialists and end-users. End-users can successfully use such electronic information systems; whether they actually will use them remains to be seen.

Acknowledgment

This research was partially funded through a University of Maryland Division of Human and Community Resources Provost’s Award. The author acknowledges the assistance of Dr. Gerald Teague and Ms. Diane Patrick in conducting the data collection.

Appendix A. Search Task Statements

Open Task

Travel in space is for real. First animals were sent into space to orbit the earth. Now humans pilot spacecraft in our universe. Although most pioneers are male, there are some females who have also explored space. Your task is to gather facts about these women who have been space travellers.

Closed Task

The ancient Olympics began in Greece with only a few events of strength and speed for men alone. Today’s Olympics include many more events taking place in the summer and winter games for both men and women. Your task is to identify the year in which the speed skating event was introduced into the modern Olympics.

References

1. Bates, M. J. “Information Search Tactics.” Journal of the American Society for Information Science. 30(4):205-214; 1979.

2. Bates, M. J. “Idea Tactics.” Journal of the American Society for Information Science. 30(5):280-289; 1979.

3. Bates, M. J. “Search Techniques.” In Martha Williams, Ed. Annual Review of Information Science and Technology. White Plains, NY: Knowledge Industry Publications; 1981:139-169.

4. Blair, D. C.; Maron, M. E. “An Evaluation of Retrieval Effectiveness for a Full-text Document-retrieval System.” Communications of the ACM. 28(3):289-299; 1985.

5. Borgman, C. L. “The User’s Mental Model of an Information Retrieval System: An Experiment on a Prototype Online Catalog.” International Journal of Man-Machine Studies. 24(1):47-64; 1986.

6. Borgman, C. L. “Psychological Research in Human-Computer Interaction.” In Martha Williams, Ed. Annual Review of Information Science and Technology. White Plains, NY: Knowledge Industry Publications; 1984:33-64.

7. Card, S. K.; Moran, T. P.; Newell, A. The Psychology of Human-Computer Interaction. Hillsdale, NJ: Lawrence Erlbaum Associates; 1983.

8. Chapman, J. L. “A State Transition Analysis of Online Information-Seeking Behavior.” Journal of the American Society for Information Science. 32(5):325-333; 1981.

9. Chen, C. “Micro-based Optical Videodisc Applications.” Microcomputers for Information Management. 2(4):217-239; 1985.

10. Chen, C.; Hernon, P. Information Seeking. New York: Neal-Schuman Publishers; 1982.

11. Chen, P. P. “The Compact Disk ROM: How it Works.” IEEE Spectrum. April:44-54; 1986.

12. Cohen, M. E.; Flagle, C. D. “Full-text Medical Literature Retrieval by Computer.” Journal of the American Medical Association. 254(19):2768-2774; 1985.

13. Connolly, B. “The Inverted File (Guest Editorial).” Online. November:6-8; 1986.

14. Desmarais, N. “Laser Libraries.” Byte. May:235-246; 1986.

15. Fenichel, C. H. “The Process of Searching Online Databases: A Review of Research.” Library Research. 2:107-127; 1980.

16. Fidel, R. “Online Searching Styles: A Case-study-based Model of Searching Behavior.” Journal of the American Society for Information Science. 35(4):211-221; 1984.


17. Fidel, R.; Soergel, D. “Factors Affecting Online Bibliographic Retrieval: A Conceptual Framework for Research.” Journal of the American Society for Information Science. 34(3):163-180; 1983.

18. Gentner, D.; Stevens, A. L. (Eds.) Mental Models. Hillsdale, NJ: Lawrence Erlbaum Associates; 1983.

19. Harman, J. “Reuters: A Survey of End-User Searching.” Aslib Proceedings. 38(1):35-42; 1986.

20. Hawkins, D. T. “Online Information Retrieval Systems.” In Martha Williams, Ed. Annual Review of Information Science and Technology. White Plains, NY: Knowledge Industry Publications; 1981:171-208.

21. Johnson-Laird, P. N. “Mental Models in Cognitive Science.” Cognitive Science. 4:71-115; 1980.

22. Kister, M. K. F. Encyclopedia Buying Guide (3rd Edition). New York: Bowker; 1981.

23. Lancaster, F. W. Evaluation of On-line Searching in MEDLARS (AIM-TWX) by Biomedical Practitioners. Urbana, IL: University of Illinois, Graduate School of Library Science; 1972 (Occasional Papers No. 101).

24. Mancall, J. C.; Drott, M. C. Measuring Student Information Use. Littleton, CO: Libraries Unlimited; 1983.

25. Marchionini, G.; Teague, J. “Elementary Students’ Use of Electronic Information Services: An Exploratory Study.” Journal of Research on Computing in Education. 20(2):139-155; 1987.

26. Markey, K.; Cochrane, P. A. Online Training and Practice Manual for ERIC Data Base Searchers (2nd Edition). Syracuse, NY: ERIC Clearinghouse on Information Resources (ED 212-296); 1981.

27. Norman, D. A. “Some Observations on Mental Models.” In Dedre Gentner and Albert Stevens, Eds. Mental Models. Hillsdale, NJ: Lawrence Erlbaum Associates; 1983.

28. Ojala, M. “Views on End-user Searching.” Journal of the American Society for Information Science. 37(4):197-203; 1986.

29. Penniman, W. D. “Rhythms of Dialogue in Human-computer Conversation.” Ph.D. dissertation, Ohio State University, Columbus, OH; 1975.

30. Rice, R. E.; Borgman, C. L. “The Use of Computer-monitored Data in Information Science and Communication Research.” Journal of the American Society for Information Science. 34(4):247-257; 1983.

31. Rouse, W. B.; Rouse, S. H. “Human Information Seeking and Design of Information Systems.” Information Processing & Management. 20(1-2):129-138; 1984.

32. Salton, G. “Another Look at Automatic Text-retrieval Systems.” Communications of the ACM. 29(7):648-656; 1986.

33. Sewell, W.; Teitelbaum, S. “Observations of End-user Online Searching Behavior Over Eleven Years.” Journal of the American Society for Information Science. 37(4):234-245; 1986.

34. Siegel, S. Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill Book Co.; 1956.

35. Tenopir, C. “Full-text Databases.” In Martha Williams, Ed. Annual Review of Information Science and Technology. White Plains, NY: Knowledge Industry Publications; 1984:215-246.

36. Tenopir, C. “Full Text Database Retrieval Performance.” Online Review. 9(2):149-164; 1985.

37. Tolle, J. E. Current Utilization of Online Catalogs: Transaction Log Analysis. Final Report to the Council on Library Resources (Volume 1). (Research Report No. OCLCOPIURR-8312). Dublin, OH: OCLC; 1983.

38. Tolle, J. E.; Hah, S. “Online Search Patterns: NLM CATLINE Database.” Journal of the American Society for Information Science. 36(2):82-93; 1985.

39. Young, R. M. “Surrogates and Mappings: Two Kinds of Conceptual Models for Interactive Devices.” In Dedre Gentner and Albert Stevens, Eds. Mental Models. Hillsdale, NJ: Lawrence Erlbaum
