

Document Retrieval Expert System Shell with Worksheet-based Knowledge Acquisition Facility

Chizuko Yasunobu, Rei Itsuki, Hiroshi Tsuji, and Fumihiko Mori

Systems Development Laboratory, HITACHI, Ltd.

1099 Ohzenji, Asao-ku, Kawasaki-shi 215, Japan

Abstract

This paper describes ESOCKS, a domain shell for intelligent document retrieval. An important feature of ESOCKS is its associative retrieval capability. ESOCKS associates input keywords with other keywords, and uses the augmented set of keywords to retrieve documents. Keyword association prevents users from missing suitable documents. The certainty factor attached to each retrieved document enables the user to select suitable documents from among the candidates. Another important feature of ESOCKS is its knowledge acquisition facility, which uses eight kinds of worksheet. For experts, developing an expert system consists of extracting knowledge according to the format and entering it in worksheets. Experience of using ESOCKS indicates that an expert can build an expert system by himself, and that associative retrieval is more intelligent than conventional keyword retrieval.

1. Introduction

This paper presents an expert system shell, called ESOCKS, for intelligent retrieval of documents stored in an electronic filing system. Document retrieval is an important field, and its potential applications are varied enough to justify the development of an expert system shell of its own.

Domain specific shells are now widely used to develop expert systems in various domains, such as classification [2], diagnosis, scheduling and consultation. This is because a shell for a specific domain is equipped with a knowledge representation scheme, an inference mechanism, and a knowledge acquisition support tool suitable for that domain. Therefore, it is easier to use a domain specific shell than to build a system from scratch using general purpose building tools [4].

The ESOCKS system [9] has been developed to support building intelligent document retrieval systems based on the keyword matching method. It provides intelligent retrieval by associating input keywords with other keywords, and using the augmented set of keywords to retrieve relevant documents.

A unique feature of ESOCKS is its knowledge acquisition support function. It can automatically translate the knowledge written on a formatted worksheet into executable rules and frames in the knowledge representation language of the generic tool behind the domain shell. Therefore, an expert user need not know the computer language, but has only to write domain knowledge on worksheets, a ubiquitous medium for intellectual work in offices.

Section 2 treats the domain of ESOCKS and its system structure. Section 3 discusses associative retrieval. Section 4 discusses the knowledge acquisition support based on worksheets. Section 5 presents results and Section 6, applications.

The domain of ESOCKS is intelligent retrieval of documents filed in electronic form.

Document retrieval from a file is a basic activity carried out in offices, research laboratories, and households. The advent of office automation technology, such as multi-media databases, has made it possible to store a large number of documents in electronic files. This in turn has made it very important to facilitate retrieval from those files [5]. High speed paging [6] and browsers are attempts to implement file searching in an electronic environment.

A basic and common search method is keyword matching. Several keywords are attached to each document when it is first filed. When a user inputs certain keywords, those documents with the same keywords are retrieved.

0730-3157/89/0000/0278$01.00 © 1989 IEEE

However, the keyword matching method has several shortcomings. If insufficient keywords are specified by the user, the user may not get the documents he or she really wants. If too many keywords are input, too many documents may be retrieved, so the user may have trouble identifying which ones are relevant. A fundamental problem is the mismatch between the keywords originally given by the person who cataloged a document and the keywords the user inputs to retrieve it.

This means that keyword retrieval of documents is really a knowledge-intensive activity, requiring a lot of knowhow. Therefore, document retrieval needs to be supported by the knowledge engineering approach.

2.2 ESOCKS: A Domain Shell

ESOCKS stands for Empty Software for Common Knowledge Transfer Expert Systems. An important feature of ESOCKS is its associative retrieval capability, in which a certainty factor is assigned to each retrieved document. It provides intelligent retrieval using a knowledge base by associating input keywords with other keywords and using the augmented set of keywords to retrieve documents. Keyword association prevents users from missing suitable documents due to insufficient input keywords. The certainty factor attached to retrieved documents enables the user to select suitable documents from among the candidates.

2.3 System Components

ESOCKS consists of a specialized inference engine for associative retrieval, and knowledge acquisition tools, as shown in Figure 1.

The knowledge acquisition tools elicit knowledge from an expert. The inference engine performs associative retrieval by using the knowledge base.

Figure 1 ESOCKS components

ESOCKS runs on a workstation. Documents to be retrieved are edited with a word processor or with word processing software on the same workstation, and stored in document data bases. Worksheets are also stored in document data bases. ESOCKS's knowledge base is a part of the knowledge base for ES/KERNEL, a general purpose expert system building tool. Together with the inference engine, it provides an executable knowledge base for ES/KERNEL.

3. Associative Retrieval

3.1 Inference Model

This section describes associative retrieval, an important feature of ESOCKS. The associative retrieval method was devised during the development of SOCKS (Software Common Knowledge Transfer System). SOCKS's purpose is to transfer programming knowhow from skilled programmers to unskilled programmers. It allows a user to retrieve instructions, such as "Don't continue waiting with no expectation", by choosing certain keywords, such as "resource waiting" and "DISK". SOCKS's associative retrieval adopts the OR method, retrieving all documents having at least one input keyword, and proved to be effective. In the same way that EMYCIN came from MYCIN [1], ESOCKS's inference engine is a generalization of SOCKS.

In conventional OR keyword retrieval, it is difficult to select keywords. If there are insufficient keywords, the user may miss the desired documents. Conversely, if there are too many keywords, many unrelated documents may be retrieved. Associative retrieval can overcome these problems through keyword association and document relating.

Figure 2 shows an example, where a user inputs the keywords "resource waiting" and "DISK". Keyword association augments the keywords for retrieval by adding the associated keyword "truck overflow" to the input keywords, using the following rule:

if "DISK" is input
then associate "truck overflow" with a strength of 0.8.

The certainty factors of both input keywords, "resource waiting" and "DISK", are set to 1.0. The certainty factor of the associated keyword "truck overflow" is determined by the strength of association, 0.8. (If "truck overflow" were also associated with "resource waiting" with a strength of 0.5, its certainty factor would be set to 0.8 + 0.5 - 0.8 × 0.5 = 0.9.)

[Figure 2: keyword input with "resource waiting" and "DISK" selected; retrieved document No. 10, "Don't continue waiting with no expectation", with certainty factor 0.94, followed by a further document with certainty factor 0.76]

Document relating decides on the relevant document "Don't continue waiting with no expectation", using the following rule:

if the certainty factor of "truck overflow" is greater than 0
then relate "Don't continue waiting with no expectation" with a strength of 0.5.

Then the certainty factor of "Don't continue waiting with no expectation" becomes 0.8 × 0.5 = 0.4, because the certainty factor of "truck overflow" is 0.8 and the strength of relation is 0.5. The document is also related to "resource waiting" with a strength of 0.9, which gives a certainty factor of 1.0 × 0.9 = 0.9. Together, the certainty factor of "Don't continue waiting with no expectation" is 0.4 + 0.9 - 0.4 × 0.9 = 0.94.

Associative retrieval broadens the objects for retrieval by keyword association, and prevents users from missing suitable documents. By assigning certainty factors to retrieved documents and displaying them in certainty factor order, it reduces the number of output documents and makes it easy to find suitable ones.
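The worked example in this section can be reproduced with a short Python sketch. It is an illustration only, not the ESOCKS inference engine (which runs on ES/KERNEL): the function and variable names are hypothetical, and two points are assumptions that the text does not state, namely that association is applied in a single step from the input keywords and that an input keyword keeps its certainty factor of 1.0 even if it is also reached by association.

```python
from collections import defaultdict

DOC = "Don't continue waiting with no expectation"

# Rules taken from the example in the text (dictionary layout is an assumption).
ASSOCIATION_RULES = {"DISK": [("truck overflow", 0.8)]}
RELATION_RULES = {
    "truck overflow": [(DOC, 0.5)],
    "resource waiting": [(DOC, 0.9)],
}


def combine(cf_a, cf_b):
    """Combine two certainty factors for the same item: a + b - a * b."""
    return cf_a + cf_b - cf_a * cf_b


def associative_retrieval(input_keywords, association_rules, relation_rules):
    """Return (keyword certainty factors, document certainty factors)."""
    # Input keywords start with a certainty factor of 1.0.
    keyword_cf = {kw: 1.0 for kw in input_keywords}

    # Keyword association: one step from input keywords (assumption).
    associated = defaultdict(float)
    for kw in input_keywords:
        for target, strength in association_rules.get(kw, []):
            associated[target] = combine(associated[target], strength)
    for target, cf in associated.items():
        keyword_cf.setdefault(target, cf)  # input keywords keep 1.0 (assumption)

    # Document relating: a keyword with cf > 0 contributes cf * strength.
    document_cf = defaultdict(float)
    for kw, cf in keyword_cf.items():
        if cf > 0:
            for doc, strength in relation_rules.get(kw, []):
                document_cf[doc] = combine(document_cf[doc], cf * strength)
    return keyword_cf, dict(document_cf)


keyword_cf, document_cf = associative_retrieval(
    ["resource waiting", "DISK"], ASSOCIATION_RULES, RELATION_RULES
)
print(round(keyword_cf["truck overflow"], 2))  # 0.8
print(round(document_cf[DOC], 2))              # 0.94
```

Because `combine(0, x)` equals `x`, the first contribution to a keyword or document is taken as is, and later contributions are merged with the same formula used in the text.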

3.2 Inference Engine

ESOCKS's inference engine has six functions: retrieval method selection, basic retrieval, keyword retrieval, associative retrieval, keyword input, and document display. These functions are invoked according to the system flow shown in Figure 3.

Figure 3 System flow

ESOCKS has two retrieval methods other than associative retrieval. Basic retrieval shows the user the contents of documents, and the user can select and display a document. Keyword retrieval derives all documents related to the input keywords without considering certainty factors. Associative retrieval displays all associated keywords, and enables the user to remove some of them if he or she regards them as irrelevant.

The keyword input function displays the keyword menu on the screen in Japanese alphabetical order or by classification category, as shown in Figure 4(a). The user selects a couple of keywords with a mouse.

The document display function presents the derived documents on the screen in certainty factor order, as shown in Figure 4(b). Reading the retrieved documents in order, the user selects suitable ones.
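The difference between keyword retrieval and the keyword handling of associative retrieval, and the ordering used by the document display function, might be sketched as follows. This is a minimal illustration under assumptions: the function names and the dictionary layout are hypothetical, and "document A" and "document B" are placeholder titles.

```python
DOC = "Don't continue waiting with no expectation"

# Example rules from Section 3.1; the dictionary layout is an assumption.
ASSOCIATION_RULES = {"DISK": [("truck overflow", 0.8)]}
RELATION_RULES = {
    "truck overflow": [(DOC, 0.5)],
    "resource waiting": [(DOC, 0.9)],
}


def keyword_retrieval(keywords, relation_rules):
    """OR retrieval: every document related to at least one keyword,
    without certainty factors."""
    found = []
    for kw in keywords:
        for doc, _strength in relation_rules.get(kw, []):
            if doc not in found:
                found.append(doc)
    return found


def augmented_keywords(input_keywords, association_rules, removed=()):
    """Input keywords plus associated keywords, minus any associated
    keywords the user removed as irrelevant."""
    result = list(input_keywords)
    for kw in input_keywords:
        for target, _strength in association_rules.get(kw, []):
            if target not in result and target not in removed:
                result.append(target)
    return result


def display_order(document_cf):
    """Documents sorted by descending certainty factor for display."""
    return sorted(document_cf.items(), key=lambda item: item[1], reverse=True)


# "DISK" alone has no relation rule, so plain keyword retrieval finds nothing.
print(keyword_retrieval(["DISK"], RELATION_RULES))
# With the associated keyword added, the document is reached.
print(keyword_retrieval(augmented_keywords(["DISK"], ASSOCIATION_RULES), RELATION_RULES))
print(display_order({"document A": 0.76, "document B": 0.94}))
```

The first two calls show how an input that is too narrow for OR matching can still reach a document once associated keywords are added.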

3.3 Knowledge Base

An expert who intends to build a knowledge base must define four kinds of knowledge: documents, keywords, association rules, and relation rules. These are organized as shown in Figure 5.

Figure 4(a) Sample keyword input (keyword menu with Cancel and Finish buttons; legible entries include material, task, system, price (cost), responsible office, time, necessary items, papers (file/receive), agency, frequency, and procedures)

No.1-1 Import Quota Application    certainty factor = 0.991931

** Import Quota Application (Do not forget to obtain import quota!) **

Certificate of import quota is effective only for four months, starting from the day of issuance. In principle, the term of validity cannot be extended. If the time for delivery of an item is expected to be long, it is necessary to make a contract before filing an import quota application with the authorities. Therefore, you should be very careful not to forget the application procedure for import quota.

Figure 4(b) Sample document display
Note: The original displays in Japanese are translated.

Figure 4(c) Original display in Japanese

Figure 4 Sample screen

Figure 5 Knowledge structure

Documents are objects to be retrieved and consist of contents and texts. Document contents include titles like "Don't continue waiting with no expectation" and pointers to texts. Document contents are described in frames. Texts are stored in document files. A document file can be edited and divided into pages with a word processor. ESOCKS retrieves documents in units of a page.

Keywords, such as "resource waiting", "DISK" and "truck overflow", are used to specify the target documents. Each keyword has a classification category and a position in Japanese alphabetical order, and is described in a frame.

An association rule is given by:

if "DISK" is input
then associate "truck overflow" with a strength of 0.8.

It is used by keyword association, and defines the relationships among keywords and their strength. The relationships from one set of keywords are described by one rule.

On the other hand, a relation rule is given by:

if the certainty factor of "truck overflow" is greater than 0
then relate "Don't continue waiting with no expectation" with a strength of 0.5.

It is used by document relating, and defines relationships between keywords and documents and their strength. The relationships from one keyword are described by one rule.
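The four kinds of knowledge described above could be modelled roughly as follows. This is a hypothetical sketch: ESOCKS stores this knowledge as ES/KERNEL frames and rules, whose actual slot names are not given in the text, so every class and field name here is an assumption, as are the file name, the category, and the reading used in the example.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentFrame:
    """Document contents: a title plus a pointer to the text."""
    number: int
    title: str
    file: str   # document file holding the text (hypothetical field)
    page: int   # documents are retrieved in units of a page


@dataclass
class KeywordFrame:
    name: str
    category: str   # classification category
    reading: str    # sort key for Japanese alphabetical order (assumption)


@dataclass
class AssociationRule:
    """One rule per set of input keywords."""
    input_keywords: tuple
    associated: dict = field(default_factory=dict)  # keyword -> strength


@dataclass
class RelationRule:
    """One rule per keyword."""
    keyword: str
    related: dict = field(default_factory=dict)  # document number -> strength


document = DocumentFrame(10, "Don't continue waiting with no expectation",
                         file="knowhow.doc", page=1)  # file and page are made up
keyword = KeywordFrame("truck overflow", category="trouble", reading="truck overflow")
association = AssociationRule(("DISK",), {"truck overflow": 0.8})
relation = RelationRule("truck overflow", {document.number: 0.5})
```

Keeping strengths in a dictionary on each rule mirrors the statement that all relationships from one keyword (or one set of keywords) belong to a single rule.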

4. Knowledge Acquisition Facility

4.1 Worksheet-based Approach

A domain shell can offer a knowledge acquisition facility suitable for experts. Some knowledge acquisition facilities are based on an interview with an expert [3] [7]. Such methods find deficiencies in the knowledge and ask the expert to augment and refine it. In these methods, the system takes the initiative, and experts have only to reply to the system's questions to build a knowledge base.

However, such methods are not appropriate for practical knowledge base building. A practical knowledge base is relatively large and subjective. If the amount of knowledge is large, an expert may get tired of being interviewed. If he focuses on a piece of subjective knowledge and defines it, later when he makes another definition, he may change the former to conform with the latter.

The ESOCKS knowledge base is also relatively large and subjective. The number of documents and keywords varies from about fifty to five hundred. This implies that the number of rules and frames should be about a hundred to a thousand.

If the number of documents is less than fifty, it is more important that suitable documents are identified accurately. If the number of documents is more than five hundred, it becomes difficult for an expert to define knowledge.

ESOCKS's purpose is to offer a knowledge acquisition environment in which an expert can build a large knowledge base by himself. A worksheet-based approach is adopted as ESOCKS's knowledge acquisition facility, to make experts actively organize their knowledge on worksheets.

There are some merits in using worksheets. Various kinds of worksheet are used widely in offices, and are a medium understandable to many people. Experts in offices understand the format of ESOCKS knowledge structure worksheets, and can fill them in themselves. Worksheets filled in with office automation (OA) devices can be processed by a computer, and can be translated into executable knowledge forms and so on.

The worksheet format should not be so complex that an expert cannot understand how to fill it in. A certain degree of freedom is desirable, but care must be taken to avoid making the burden of maintaining consistency too great for the expert.

4.2 Worksheets and Knowledge Acquisition Tools

ESOCKS offers eight kinds of worksheet and three kinds of tool, based on the knowledge base building method developed through the experience of building SOCKS's knowledge base. The steps of collecting and organizing knowledge, and the worksheets referred to or written at each step, are shown in Figure 6.

Figure 6 Steps and worksheets of knowledge base building method (legible steps include document decision, keyword extraction, relation strength definition, association definition, and association strength definition)

There are two kinds of worksheets for each of the four types of knowledge. Figure 7(a) shows an example of a filled in worksheet for association rules. Formats reflecting the knowledge structure are represented using ruled lines. A user fills in a piece of knowledge in a cell partitioned by ruled lines. The user may write a word at any position in a cell. An empty line, left for readability, is acceptable. An empty cell may be regarded as having the same entry as the cell above. The user may also slide the position of a ruled line or change heading formats.
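The reading conventions just described (free positioning inside a cell, empty lines allowed, an empty cell meaning "same as above") might be handled as in the following sketch. It assumes that a worksheet has already been read into rows of cell strings and that the "same as above" convention applies to one key column; both the function name and these choices are illustrative assumptions, not the actual ESOCKS tool.

```python
def fill_down(rows, key_column=0):
    """Normalize worksheet rows.

    - Whitespace around a cell entry is ignored (a word may be written
      at any position in a cell).
    - Completely empty lines are dropped.
    - An empty key cell takes the entry of the row above.
    """
    filled = []
    last_key = None
    for row in rows:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # empty line left for readability
        if cells[key_column]:
            last_key = cells[key_column]
        elif last_key is None:
            raise ValueError("the first key cell of a worksheet is empty")
        else:
            cells[key_column] = last_key
        filled.append(cells)
    return filled


worksheet = [
    ["IQ items", "import announcement", "0.5"],
    ["", "MITI", "0.3"],
    ["", "", ""],
    ["  ", " import approval ", "0.7"],
]
for row in fill_down(worksheet):
    print(row)
```

Normalizing rows in one place keeps the later translation and consistency checks independent of how freely the expert laid out the worksheet.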

Worksheets in Figure 6 are explained using the example of the import procedure guidance system. WS1 is the list of import procedures to be retrieved, such as "import quota application" and "import approval application". WS2 is a document of every procedure's explanation. WS3 shows keywords, such as "IQ items" and "import licence", given to the procedure "import approval application". WS4 is a list of all keywords in Japanese alphabetical order. WS5 is a list of all keywords by classification category, such as "item" and "task". WS6 shows relationships such as that keyword "nuclear reactor" associates keyword "IQ items" with a strength of 0.9. Conversely, WS7 shows relationships such as that "IQ items" is associated with "nuclear reactor". WS8 shows relationships such as that keyword "IQ items" is related to the procedures "import quota application" and "import approval application", the opposite of WS3. A filled in example of WS6 is shown in Figure 7(a).

Figure 7(a) Sample worksheet (WS6, association rules)

('association rule 10'
  if ('associative retrieval' %'input keywords' 3 'IQ items')
  then (send 'import announcement' raise-cf (0.5))
       (send 'MITI' raise-cf (0.3))
       (send 'import approval' raise-cf (0.7))
       (send 'import quota certificate' raise-cf (0.9))
)

Figure 7(b) Sample rule

Figure 7 Sample knowledge

Note: The original worksheet and rule in Japanese are translated.

Three kinds of tool for knowledge acquisition are described below:

(1) Knowledge base translator.

The knowledge base translator automatically translates knowledge in worksheets into executable forms (rules, frames).

The translator also verifies the consistency of knowledge in one worksheet, or between worksheets, during translation, and indicates duplication or shortage of knowledge. Association rules translated from WS6 in Figure 7(a) are shown in Figure 7(b).
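As an illustration of translating worksheet rows while checking for duplication and shortage, the sketch below groups WS6-style rows into one rule per input keyword. It is not the ESOCKS translator and does not emit ES/KERNEL syntax; the function name, the row layout, the message wording, and the assumption that a strength must lie in (0, 1] are all hypothetical.

```python
def translate_association_rows(rows):
    """Group (input keyword, associated keyword, strength text) rows.

    Returns (rules, problems), where rules maps each input keyword to a
    dictionary {associated keyword: strength} and problems lists rows
    that were rejected.
    """
    rules = {}
    problems = []
    for input_kw, assoc_kw, strength in rows:
        if assoc_kw in rules.get(input_kw, {}):
            problems.append(f"duplicate: {input_kw} -> {assoc_kw}")
            continue
        if not strength.strip():
            problems.append(f"missing strength: {input_kw} -> {assoc_kw}")
            continue
        try:
            value = float(strength)
        except ValueError:
            problems.append(f"strength is not a number: {input_kw} -> {assoc_kw}")
            continue
        if not 0.0 < value <= 1.0:  # assumed valid range
            problems.append(f"strength out of range: {input_kw} -> {assoc_kw}")
            continue
        rules.setdefault(input_kw, {})[assoc_kw] = value
    return rules, problems


rows = [
    ("IQ items", "import announcement", "0.5"),
    ("IQ items", "MITI", "0.3"),
    ("IQ items", "import approval", "0.7"),
    ("IQ items", "import quota certificate", "0.9"),
    ("IQ items", "MITI", "0.4"),       # duplicated entry
    ("nuclear reactor", "IQ items", ""),  # strength not filled in
]
rules, problems = translate_association_rows(rows)
print(rules)
print(problems)
```

The first four rows carry the same keywords and strengths as the sample rule in Figure 7(b); the last two are made-up rows added to trigger the checks.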

(2) Worksheet rearrangement tools.

Two of the worksheets described above (WS7 and WS8) can be automatically generated from knowledge written in other worksheets (WS6 and WS3). The worksheet rearrangement tools automatically rearrange knowledge in one worksheet into another worksheet. In the same way as (1), they verify consistency. A rearranged worksheet is useful for viewing knowledge organized from a different point of view. There are two worksheet rearrangement tools. One rearranges association rules from WS6 to WS7. The other rearranges relation rules from WS3 to WS8.
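Both rearrangements amount to inverting a mapping, which can be sketched in a few lines. The nested-dictionary layout and the function name are assumptions for illustration. The WS6 strength of 0.9 comes from the text; the WS3 strengths below are made-up values, since the text gives none.

```python
def rearrange(forward):
    """Invert {source: {target: strength}} into {target: {source: strength}}."""
    reverse = {}
    for source, targets in forward.items():
        for target, strength in targets.items():
            reverse.setdefault(target, {})[source] = strength
    return reverse


# WS6 (input keyword -> associated keywords) rearranged into WS7.
ws6 = {"nuclear reactor": {"IQ items": 0.9}}
ws7 = rearrange(ws6)
print(ws7)

# WS3 (procedure -> keywords) rearranged into WS8 (keyword -> procedures).
# Strength values here are hypothetical.
ws3 = {
    "import approval application": {"IQ items": 0.7, "import licence": 0.5},
    "import quota application": {"IQ items": 0.8},
}
ws8 = rearrange(ws3)
print(ws8["IQ items"])
```

Applying `rearrange` twice returns the original mapping, which gives a simple consistency check between a worksheet and its rearranged counterpart.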

(3) Worksheet filling support tools.

When an expert fills in a worksheet, careless mistakes such as spelling errors may be entered. Worksheet filling support tools help the user to fill in worksheets. They require fewer operations and cause fewer input mistakes than a word processor, because they display candidate keywords from which the user can select one when the same keyword must be entered in other worksheets. Users also have fewer operations to learn, because the tools use the inference engine's keyword input and document display functions.

There are two worksheet filling support tools:

(a) Keyword extracting.

Keywords are mainly extracted from document texts. This tool displays documents on the screen and enables the user to specify a string of characters as a keyword, as shown in Figure 8, as when using a word processor. It fills in WS3, and can also classify the extracted keywords and fill them in the two keyword lists, one in Japanese alphabetical order (WS4) and one by classification category (WS5).
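Building the two keyword lists from extracted keywords could look like the sketch below. It assumes that each extracted keyword comes with a reading used as the sort key (Japanese alphabetical order depends on the reading rather than on the written form) and a category chosen by the expert. The function name, the tuple layout, and the categories assigned to the sample keywords are hypothetical.

```python
def build_keyword_lists(keywords):
    """Build the two keyword lists from (keyword, reading, category) tuples.

    Returns (alphabetical, by_category): a list sorted by reading (WS4-like)
    and a dictionary from category to sorted keywords (WS5-like).
    """
    unique = {}
    for keyword, reading, category in keywords:
        unique.setdefault(keyword, (reading, category))  # ignore repeats
    alphabetical = sorted(unique, key=lambda kw: unique[kw][0])
    by_category = {}
    for kw in alphabetical:
        by_category.setdefault(unique[kw][1], []).append(kw)
    return alphabetical, by_category


extracted = [
    ("nuclear reactor", "nuclear reactor", "item"),
    ("IQ items", "iq items", "item"),
    ("import licence", "import licence", "task"),
    ("IQ items", "iq items", "item"),  # extracted again from another document
]
alphabetical, by_category = build_keyword_lists(extracted)
print(alphabetical)
print(by_category)
```

Deriving both lists from one set of extracted keywords keeps them consistent with each other by construction.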

Figure 8 Sample screen for keyword extracting

Note: The original display in Japanese is translated.

(b) Association rule generation.

As shown in Figure 6, a knowledge base is built by extracting keywords, linking associated keywords, and specifying the association strength. This tool displays keywords in the same way as the keyword input function of the inference engine, and enables the user to specify associated keywords. It fills in worksheet WS6.

5. Results

5.1 Effectiveness of Associative Retrieval

The effectiveness of associative retrieval was experimentally evaluated using an expert system for software module retrieval, one of ESOCKS's applications. This system selects usable modules from 78 graphic modules and displays the retrieved specifications. Two programmers chose keywords for the 74 modules. One hundred and forty eight cases of retrieval were tested. The results were as follows:

(1) Conventional OR keyword retrieval did not retrieve the intended or the most suitable module in 16 cases. On the other hand, associative retrieval missed in only 13 cases, a decrease of 19%.

(2) The rank of the most suitable module among those retrieved by associative retrieval was 2.2 on average. If the user looked at the first three modules retrieved, he would obtain the most suitable one in 58% of cases.
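The 19% figure in result (1) is a relative decrease in the number of missed cases, which can be checked from the counts given above. The variable names below are only for illustration.

```python
cases = 148              # retrieval cases tested
missed_or = 16           # misses with conventional OR keyword retrieval
missed_associative = 13  # misses with associative retrieval

# Miss rates relative to all tested cases.
miss_rate_or = missed_or / cases
miss_rate_associative = missed_associative / cases

# Relative decrease in misses: (16 - 13) / 16.
relative_decrease = (missed_or - missed_associative) / missed_or

print(f"OR retrieval miss rate:          {miss_rate_or:.1%}")
print(f"Associative retrieval miss rate: {miss_rate_associative:.1%}")
print(f"Relative decrease in misses:     {relative_decrease:.2%}")
```

The relative decrease is 3/16, or 18.75%, which rounds to the 19% reported.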

5.2 Effectiveness of Worksheet-based Knowledge Acquisition Tool

The effectiveness of the worksheet-based approach was evaluated.

(1) Knowledge on worksheets could be translated automatically into executable forms (rules and frames). For example, a knowledge base of 280 frames and 412 rules was translated from entries in 45 A4 sized worksheets. The translation time was estimated to be more than 90% shorter than that of manual translation.

(2) Experts could easily understand the 8 kinds of worksheet after a demonstration of ESOCKS, before building the knowledge base. The time and effort required to learn the tools were far less than for a formal knowledge representation language and operation.

(3) Experts needed to collect and organize knowledge to build a practical knowledge base. However, they were relieved of secondary work such as detecting and correcting careless mistakes and the detailed constraints of the grammar, so they could concentrate on the essential part of knowledge base construction.

(4) Filled in worksheets could be used as documentation for knowledge base management and maintenance, so making separate documents was not necessary.


6. Applications

ESOCKS has been applied in various fields. Some examples are:

(1) SOCKS (Software Common Knowledge Transfer System) [8]

Since SOCKS is the basis of ESOCKS, as mentioned in 3.1, it is not strictly an application of ESOCKS. But it is an application of the associative retrieval discussed in this paper. It is used for programming, design, troubleshooting, and training novice engineers with the aim of improving software products.

(2) Procedure guidance

There is often a large volume of rules and procedures in offices and public agencies. It is not always easy to find the relevant procedures. Therefore, a clerk in charge often has to be consulted in order not to miss a required procedure. ESOCKS has been applied to develop procedure guidance expert systems in the procurement department and in banking offices.

(3) Retrieval of trouble reports

Although records of past troubles are often filed, they are not effectively referred to because of retrieval difficulty. Using ESOCKS, an expert can easily build an expert system that retrieves trouble reports of a similar past situation. This system should have the effect of preventing recurring trouble.

7. Conclusion

A domain shell called ESOCKS has been presented, which facilitates the development of document retrieval type expert systems. Intelligent document retrieval has become necessary in various fields, such as knowhow, technology information, check points, and procedures.

Associative retrieval capability is an important feature of ESOCKS. It prevents users from missing suitable documents due to insufficient input keywords, and enables the user to select suitable documents from among the candidates. Another important feature is its knowledge acquisition support facility. For experts, developing an expert system amounts to extracting knowledge according to the format, and entering it in worksheets.

Experience of using ESOCKS indicates that an expert can build an expert system by himself, and that associative retrieval is more intelligent than conventional keyword retrieval.

Worksheet-based knowledge acquisition can be effectively used for expert system building in offices. Specific worksheets can be designed depending upon the application domain. The knowledge structure should be made as simple as possible so that it can be represented in a table-like format. The simplicity and ease of system development offered by the approach presented in this paper will contribute to the spread of knowledge engineering technology in offices.

Acknowledgement

The authors would like to express sincere thanks to Koichiro Ishihara and Toshiro Yamanaka for their valuable guidance and encouragement. Special thanks are also due to Hajime Hashimoto, Tatsuya Kameda, and Akira Sugino for their useful advice and valuable discussion.

References

[1] Barstow, D. R., et al.: Language and Tools for Knowledge Engineering, in F. Hayes-Roth, et al., ed., Building Expert Systems, Addison-Wesley (1983).

[2] Bylander, T., et al.: CSRL: A Language for Classificatory Problem Solving and Uncertainty Handling, AI Magazine, Vol.7, No.3 (1986).

[3] Boose, J. H.: Personal Construct Theory and the Transfer of Human Expertise, Proc. of AAAI'84 (1984).

[4] Chandrasekaran, B.: Generic Tasks in Knowledge-based Reasoning: High-level Building Blocks for Expert System Design, IEEE Expert, Fall (1986).

[5] Fujisawa, H., et al.: Document Analysis and Decomposition Method for Multimedia Contents Retrieval, Proc. of ISIIS'88 (1988).

[6] Itoh, S., et al.: HITFILE 650 Optical Disk Filing System, Hitachi Review, Vol.36, No.4 (1987).

[7] Kahn, G., et al.: MORE: An Intelligent Knowledge Acquisition Tool, Proc. of IJCAI'85 (1985).

[8] Tsuji, H., et al.: Expert System for Transferring Programming Knowhow from Skilled to Unskilled Programmers, Trans. of JSAI, Vol.3, No.6 (1988).

[9] Yasunobu, C., et al.: IR Type Expert Systems, Proc. of 2nd Annual Convention, JSAI (1988).

[10] Yoshimura, K., et al.: Knowledge Representation and Inference Method in ES/KERNEL, Hitachi Review, Vol.37, No.5 (1988).