An Investigation of Knowledge Based Help Facilities Robert T. Plant Oxford University Computing Laboratory Programming Research Group and Wadham College Oxford September 1985 A dissertation submitted in partial fulfilment of the degree of Master of Science in Computation.



5.4 Language Choice

The next stage was to decide how to represent this BNF syntax as a data structure. Prior to this, however, it was necessary to select a language in which to implement the system.

The choice of languages in which the implementation could be performed was wide, ranging from procedural languages like Pascal, through the traditional artificial intelligence languages Prolog and Lisp, to specialist languages for NLP like LIFER [40]. Another class of languages available was the 'functional' category, e.g. ORWELL [41], KRC [42] and Lispkit [43].

Having considered the possible languages available, it was decided to use the functional language KRC. A functional language was chosen for many reasons, the main advantage over procedural languages being that it effectively reduces the amount of work the programmer has to perform. It does this in two ways: 1) by handling the allocation of storage; 2) more importantly, by assuming all responsibility for the evaluation order of the functions, which removes the problems associated with structuring the program in order to obtain the desired sequence of evaluation.

The "lazy evaluation" strategy [44] used in KRC also has the advantage of greatly simplifying the parsing process. In order to parse ambiguous grammars, it is essential for the system undertaking the task to use a form of lazy evaluation in order to search for all possible solutions. Logic based languages such as Prolog use the technique of backtracking to do this; however, this is not as easy to use as lazy evaluation. Implementation in KRC also saves the programmer from having to write any backtracking code explicitly.

The use of infinite data structures is advantageous in that they are ideal for processing the input and output of a parsing program, which can be regarded as infinite streams of information. This approach to input/output by-passes the problem of explicit 'reads' and of having to decide upon the sequencing of all events prior to processing. Programming in terms of infinite I/O streams also allows program modules to be combined easily.

Pattern matching is a feature of KRC that is extremely useful in the domain of NLP. These techniques allow complex conditions to be expressed very simply; especially when the functions have many arguments, reducing the need to use guards.

The functional approach also allows the use of higher order functions and a recursive equation style of programming. These powerful features combined with the use of set expressions [45] allow for more readable, shorter programs to be developed.


5.5 B.N.F. Implementation in KRC

Having decided upon the BNF form of grammar and to use KRC as an implementation language, the next stage in the development was to devise a representation for the BNF in KRC.

There were four components making up the BNF: i) terminal symbols, e.g. words like "dog"; ii) non-terminal symbols, e.g. <noun>, these being the names of structural units and denoted by being enclosed within angular brackets; iii) and iv) the disjunction and conjunction of the first two components. The 'or' is represented by the '|' symbol, with the 'and' being represented by joining the symbols together.

For example

<s> ::= <noun> | <verb>

here <s> is a <noun> or a <verb>, whereas

<s> ::= <noun><verb>

represents <s> being a <noun> followed by a <verb>. It was decided to represent these four components in terms of lists and by defining some terms which could be regarded by the parser as reserved words.

Terminal symbols remain in a similar format: the word dog becomes the string "dog".

Non-terminal symbols now become lists, e.g. <noun> would become ["noun"] and <verb> similarly ["verb"].

The reserved words "or" and "seq" were then introduced, these allowing the disjunction and conjunction of symbols respectively. For example

<verb> | <noun>

being represented in KRC by

["or",["verb"],["noun"]]

and the sequence

<verb><noun>


by

["seq",["verb"],["noun"]].

It is now only left to complete the BNF expression. Thus

<s>::= <noun><verb>

would be represented by the KRC equation:

bnf = ["s",["or",["verb"],["noun"]]]

Having devised a representation in which to express the four components we were able to write equations for any set of BNF definitions.

The BNF given earlier can now be expressed in KRC as:

g1 = ["s",["seq",["nphr"],["verb"],["rest"]]]
g2 = ["nphr",["seq",["artic"],["adjs"],["nouns"]]]
g3 = ["artic",["or","the","a","an","some"]]
g4 = ["adjs",["or",["adj"],["seq",["adj"],["adjs"]]]]
g5 = ["adj",["or","big","red","quick","brown","lazy"]]
g6 = ["nouns",["or",["noun"],["seq",["noun"],["nouns"]]]]
g7 = ["noun",["or","fruit","fly","flies","banana","fox","dog","dogs"]]
g8 = ["verb",["or","jump","jumps","fly","flies","like","likes"]]
g9 = ["rest",["or",["object"],["descr"]]]
g10 = ["object",["nphr"]]
g11 = ["descr",["seq",["or","like","as"],["nphr"]]]
bnf = [g1,g2,g3,g4,g5,g6,g7,g8,g9,g10,g11]

and the sentence to be parsed could be given as a list of terminal symbols, i.e.

sentence = ["fruit","flies","like","a","banana"]
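The list conventions above can be illustrated outside KRC. The following is a minimal Python sketch, not the dissertation's code: the function name `kind` and the use of Python lists are assumptions made purely for illustration. It classifies a fragment into one of the four components described in this section.

```python
# Hypothetical Python transcription of the list conventions of this section.
# Terminals are plain strings, non-terminals are one-element lists, and
# "or" / "seq" are reserved words at the head of a longer list.

def kind(fragment):
    """Classify a BNF fragment as one of the four components."""
    if isinstance(fragment, str):
        return "terminal"
    if len(fragment) == 1:
        return "nonterminal"
    if fragment[0] == "or":
        return "alternatives"
    if fragment[0] == "seq":
        return "sequence"
    raise ValueError(f"unrecognised fragment: {fragment!r}")

# Two of the grammar equations, each written as [name, definition].
g2 = ["nphr", ["seq", ["artic"], ["adjs"], ["nouns"]]]
g3 = ["artic", ["or", "the", "a", "an", "some"]]

print(kind("dog"))      # terminal
print(kind(["noun"]))   # nonterminal
print(kind(g2[1]))      # sequence
print(kind(g3[1]))      # alternatives
```

Because the reserved words are only recognised at the head of a list with more than one element, a one-element list is always read as a non-terminal in this sketch.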

Once the conventions for the BNF had been devised, it was possible to develop a BNF for the UNIX domain. The normal BNF was created first, and then translated into KRC.

The standard BNF description can be seen in appendix D and its KRC implementation in appendix E, under the heading BNF.TEXT.


5.6 The B.N.F Parser

Having devised the BNF and its KRC representation, the next step was to develop a parser which would input a sentence in English and check whether it was an allowable construct according to the BNF definitions.

The first stage in this process was to allow the user to input an English sentence in free form, the system then taking the input and translating it into a list of words.

Thus the sentence input as:

> fruit flies like a banana

becomes

["fruit","flies","like","a","banana"]
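The translation from free-form input to a list of words might look like the following Python sketch. Splitting on whitespace and lower-casing are assumptions of this example; the text does not say how the original system normalised its input.

```python
def to_words(line):
    """Split a free-form line into a list of words.

    Whitespace splitting and lower-casing are assumptions of this sketch.
    """
    return line.lower().split()

print(to_words("fruit flies like a banana"))
```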

Having achieved this, the next and most important stage was to devise the functions to parse the list of words. The most important function, upon which the recursive nature of the parser is based, is 'match'.

Match is a function whose type ideally would be:

match: BNF x LIST(WORDS) -> PARSE_TREE

It takes two arguments: a fragment of BNF and a list of words, which it then attempts to match together. In the ideal case the function, having performed its matching operation, would return a single parse tree. This, however, can only occur when the grammar produces nothing but unambiguous parses.

Due to the problem of ambiguous grammars the function match has the type:

match: BNF x LIST(WORDS) -> P (PARSE_TREE x LIST(WORDS))

Where the output from match is a set of pairs representing all valid parses, the first element of each pair being the parse tree and the second being the list of as yet unmatched words that remain after match has been applied to the input list of words and the BNF fragment. If the match does not succeed then an empty set results. In the case of an unambiguous parse, match returns a set containing just one pair, where the first element is the only valid parse and the second element is the remaining input. When the whole sentence is matched unambiguously, again a set containing only one pair is returned, this being the whole parse tree and the remaining input, which should be empty as all of the sentence has been parsed.

Thus

i) Failure results in the empty set {}.

ii) An unambiguous parse results in {<the only possible parse>,<remaining input>}.

iii) An unambiguous parse of the whole sentence gives {<the only possible parse tree>,<>}.

The function match has to be able to cope with any fragment of the BNF that it is given.

This can be broken down into four cases:

i) Terminal symbols, i.e. words such as 'dog', which is represented by the string "dog".

ii) Non-terminal symbols, e.g. <noun> becomes ["noun"].

iii) A list of alternatives, e.g. alt1 | alt2 | ... | altn is represented by ["or",alt1,alt2,...,altn].

iv) A sequence of parts, e.g. part1 part2 ... partn is represented by ["seq",part1,part2,...,partn].

When match encounters a terminal symbol it takes the symbol and the list of words and tries to match them. If the symbol matches (i.e. is equal to) the head of the list of words then this is clearly a unique and unambiguous parse, so the result is the set containing one pair as in the description above. This pair consists of the symbol which was parsed and the remaining elements from the list of words. Otherwise the empty set is returned, indicating failure.

Hence the equations of match that cover the terminal symbols are:

match word (word:inp) = [[[word],inp]]
match word (other:inp) = []
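The terminal case can be restated as a short Python sketch. The name `match_terminal` and the use of tuples for the pairs are assumptions of this example; returning an empty list for empty input is also an assumption, since the equations above only cover a non-empty word list.

```python
def match_terminal(word, inp):
    """Return a list of (parse_tree, remaining_words) pairs.

    A one-element list is the unique parse; an empty list signals failure.
    """
    if inp and inp[0] == word:
        return [([word], inp[1:])]
    return []

print(match_terminal("dog", ["dog", "jumps"]))
print(match_terminal("dog", ["fox", "jumps"]))
```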

The case of non-terminal symbols can now be examined. If the match function receives a non-terminal symbol as its first argument, then an attempt is made to expand it into the corresponding fragment of BNF. This is done through a function called lookup, which takes as its arguments the symbol that it is trying to expand and the whole of the BNF definitions. It then searches the left hand sides of the BNF productions for a match; if it is successful the expanded definition is returned, and this expanded definition is given, along with the original list of words, to another call of match.

The equation that covers non-terminal symbols therefore is:

match [nts] inp = name (match (lookup nts bnf) inp) nts

For example, when the root of the BNF, <s>, is given to match along with a list of words:

match ["s"] ["list","of","words"]

the lookup function would transform ["s"] into:

["seq",["nphr"],["verb"],["rest"]]

and the equation above would then call match again, this time with:

match ["seq",["nphr"],["verb"],["rest"]] ["list","of","words"]

[The function 'name' used above will be discussed later.]
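The definition of lookup is not reproduced in this section, so the following Python sketch only illustrates the behaviour described: a search of the left hand sides of the productions. Raising `KeyError` for an unknown symbol is an assumption; the text does not say what the original does in that case.

```python
def lookup(nts, bnf):
    """Return the definition whose left hand side is nts.

    bnf is a list of [name, definition] pairs, as in the grammar equations.
    Raising KeyError when nothing matches is an assumption of this sketch.
    """
    for name, definition in bnf:
        if name == nts:
            return definition
    raise KeyError(nts)

bnf = [
    ["s", ["seq", ["nphr"], ["verb"], ["rest"]]],
    ["object", ["nphr"]],
]

print(lookup("s", bnf))
```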

Having dealt with the cases of terminal (e.g "dog") and non-terminal (e.g. ["noun"]) symbols we can now consider the cases when more complex BNF fragments are given to match.

If the fragment of BNF to be matched against the list of words is composed of alternatives, any of which could be matched, then a separate case is needed. In this case, each of the alternatives is matched against the list of words, each match resulting in a set of possible parses. For example, if the list of words to be matched was ["a","b","c"] and the BNF is defined by:

["or",["seq","a",["seq","b","c"]],["seq",["seq","a","b"],"c"],["seq","c","d","e"]]

Then the resulting sets would be:

{[["a","b","c"],[]]} {[["a","b","c"],[]]} {}

The union of which clearly gives all the valid parses for the whole construct:

{[["a","b","c"],[]]}

However, the language KRC, in which the parser is implemented, does not have the ability to model sets fully. So the union operation is simulated by appending all of the resulting sets together, giving:

[[["a","b","c"],[]],[["a","b","c"],[]]]

The append operation can result in the repetition of pairs within the list (as above). It would be possible to write a function to look for and remove repetitions; however, this is unnecessary because the appearance of a parse more than once in the list does not cause concern, as it is treated as another valid ambiguous parse.

The matching of alternatives is handled by the equations:

match ("or":l) inp = matchor l inp
matchor [] inp = []
matchor (item:l) inp = match item inp ++ matchor l inp
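The simulation of set union by appending can be sketched in Python as follows. Passing the matcher in as a parameter and the helper `match_word` are assumptions made to keep the example self-contained; they are not part of the original equations.

```python
def match_or(alternatives, inp, match):
    """Concatenate the results of matching every alternative against inp."""
    results = []
    for item in alternatives:
        results += match(item, inp)
    return results

def match_word(word, inp):
    """Terminal-only matcher used to exercise match_or."""
    return [([word], inp[1:])] if inp and inp[0] == word else []

# "a" is listed twice, so the same pair appears twice in the result,
# just as repeated parses are retained when lists stand in for sets.
print(match_or(["a", "b", "a"], ["a", "c"], match_word))
```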

If the fragment of BNF to be matched is a sequence of parts, then match has to try to match this sequence with the list of words. The initial attempt is a match between the first part of the sequence and the first of the words in the list; if this match is successful then the parser can attempt to match the rest of the sequence with the rest of the words. In order for the rest of the sequence to be matched it is necessary to know which words remain unparsed (more than one word may have been consumed, as the first part could have been a composition of other sub-parts). Thus match returns a set of pairs as described above, then takes each of these pairs and attempts to match the rest of the sequence with its list of unmatched words. When a successful match is found the new parse tree and the previous one are combined to form the first part of a new pair, the second part being formed by the list of words that are still unmatched. Match thus produces a new list of pairs and, by doing this for all parts in the sequence, on all previous pairs, produces all possible parses for the whole sequence.

The functions that handle sequences are:

match ("seq":l) inp = matchseq l inp
matchseq [] inp = [[[],inp]]
matchseq (item:l) inp = matchseq' (match item inp) l
matchseq' [] l = []
matchseq' ([tree,inp]:rest) l = combine tree (matchseq l inp) ++ matchseq' rest l

Combine is the function which combines trees together. It takes the tree produced by a new application of match and appends it onto the end of the previous version of the tree, building a new tree from the two sub-trees. Combine is defined by:

combine t1 [] = []
combine t1 ([t2,i2]:rest) = [t1++t2,i2]:(combine t1 rest)
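A Python sketch of the sequence case is given below, restricted to terminals and "seq" so that it stays short. The function names mirror the KRC ones but the code is an illustrative transcription, not the original. It shows how the remaining words from each partial match are threaded into the rest of the sequence.

```python
def combine(t1, results):
    """Prefix tree t1 onto the tree of every (tree, remaining) pair."""
    return [(t1 + t2, rest) for t2, rest in results]

def match_seq(parts, inp):
    """Match each part in turn, carrying the unmatched words forward."""
    if not parts:
        return [([], inp)]
    results = []
    for tree, rest in match(parts[0], inp):
        results += combine(tree, match_seq(parts[1:], rest))
    return results

def match(fragment, inp):
    """Matcher for terminals and sequences only (a deliberate restriction)."""
    if isinstance(fragment, str):
        return [([fragment], inp[1:])] if inp and inp[0] == fragment else []
    if fragment[0] == "seq":
        return match_seq(fragment[1:], inp)
    raise ValueError(f"unsupported fragment: {fragment!r}")

words = ["a", "b", "c"]
print(match(["seq", "a", ["seq", "b", "c"]], words))
print(match(["seq", ["seq", "a", "b"], "c"], words))
```

Both nestings give the same flat tree, because combine simply appends the sub-trees.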

So far the matching process has produced parse trees of only two types: lists that contain only one word, or two parse trees appended together. Thus the parse trees produced so far have no structure and are almost the same as the original sentence. In order to put structure into the parse trees we need to place brackets around parts of the tree. For example, if we had the BNF:

<s> ::= <verb><object>
<verb> ::= eat | jump
<object> ::= "the","wall" | "the","apple"

which in KRC is:

["s",["seq",["verb"],["object"]]]
["verb",["or","eat","jump"]]
["object",["or",["seq","the","wall"],["seq","the","apple"]]]

Then the flat tree for the sentence:

["eat","the","apple"]

looks like

["eat","the","apple"]

The function used to add the required structure is 'name', which is called by match when handling non-terminal symbols. It takes a parse tree, adds a label to this and encloses it in an extra set of brackets. Name is defined in the following way:

name [] nts = []
name ([tree,inp,info]:rest) nts = [[nts:tree],inp,info]:name rest nts

When the sentence is completely parsed its structure will have been retained. So for the example above the following tree structure is produced:

["s",["verb"],["object"]]
["s",["verb","eat"],["object"]]
["s",["verb","eat"],["object","the","apple"]]
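The labelling step can be sketched in Python as below. For brevity the sketch works on (tree, remaining) pairs and omits the third 'info' component that appears in the KRC definition; that simplification is an assumption of this example.

```python
def name(results, nts):
    """Label every parse tree with nts and wrap it in one extra list."""
    return [([[nts] + tree], rest) for tree, rest in results]

# The flat result of matching "eat" becomes a labelled <verb> sub-tree.
verb = name([(["eat"], ["the", "apple"])], "verb")
print(verb)

# Appending the labelled sub-trees and labelling again builds the <s> tree.
obj = name([(["the", "apple"], [])], "object")
sentence = name([(verb[0][0] + obj[0][0], [])], "s")
print(sentence[0][0][0])
```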

Finally, the non-terminal symbol <none> should be mentioned. For example the BNF:

<object> ::= <adjs><noun>
<adjs> ::= <> | <adj><adjs>

becomes

["object",["seq",["adjs"],["noun"]]]
["adjs",["or",["none"],["seq",["adj"],["adjs"]]]]

when represented in KRC, this being a tail recursive definition of <adjs>. The use of <none> is often objected to in normal BNF definitions as it allows ambiguities; however, since the role of the system is to cater for ambiguous parses, its use is justified.

The equation covering this is:

match ["none"] inp = [[[],inp]]

The parser thus takes in a list of words such as:

["fruit","flies","like","a","banana"]

and uses the BNF definitions to produce all possible legal parses of the list from it, which in this example results in:

[["s",["nphr",["artic"],["adjs"],["nouns",["noun","fruit"]]],
      ["verb","flies"],
      ["rest",["desc","like",["nphr",["artic","a"],["adjs"],["nouns",["noun","banana"]]]]]],
 ["s",["nphr",["artic"],["adjs"],["nouns",["noun","fruit"],["nouns",["noun","flies"]]]],
      ["verb","like"],
      ["rest",["object",["nphr",["artic","a"],["adjs"],["nouns",["noun","banana"]]]]]]]
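The whole matching process can be assembled into one small Python sketch. Python generators are used here as a stand-in for KRC's lazy lists, so parses are produced only on demand; this is a swapped-in technique, not the original implementation. The grammar is the one from section 5.5 with ["none"] alternatives added to <artic> and <adjs>, an assumption made so that noun phrases without an article or adjective can be parsed, as the output above requires.

```python
# Grammar of section 5.5 as a dictionary (an assumed representation).
# The ["none"] alternatives in "artic" and "adjs" are assumptions.
BNF = {
    "s": ["seq", ["nphr"], ["verb"], ["rest"]],
    "nphr": ["seq", ["artic"], ["adjs"], ["nouns"]],
    "artic": ["or", ["none"], "the", "a", "an", "some"],
    "adjs": ["or", ["none"], ["seq", ["adj"], ["adjs"]]],
    "adj": ["or", "big", "red", "quick", "brown", "lazy"],
    "nouns": ["or", ["noun"], ["seq", ["noun"], ["nouns"]]],
    "noun": ["or", "fruit", "fly", "flies", "banana", "fox", "dog", "dogs"],
    "verb": ["or", "jump", "jumps", "fly", "flies", "like", "likes"],
    "rest": ["or", ["object"], ["descr"]],
    "object": ["nphr"],
    "descr": ["seq", ["or", "like", "as"], ["nphr"]],
}

def match(fragment, inp):
    """Lazily yield (tree, remaining_words) for each way fragment matches."""
    if isinstance(fragment, str):                # terminal symbol
        if inp and inp[0] == fragment:
            yield [fragment], inp[1:]
    elif fragment == ["none"]:                   # empty production
        yield [], inp
    elif len(fragment) == 1:                     # non-terminal: expand, label
        nts = fragment[0]
        for tree, rest in match(BNF[nts], inp):
            yield [[nts] + tree], rest
    elif fragment[0] == "or":                    # alternatives
        for item in fragment[1:]:
            yield from match(item, inp)
    elif fragment[0] == "seq":                   # sequence of parts
        yield from match_seq(fragment[1:], inp)

def match_seq(parts, inp):
    """Match parts in order, threading the unmatched words through."""
    if not parts:
        yield [], inp
        return
    for tree, rest in match(parts[0], inp):
        for tree2, rest2 in match_seq(parts[1:], rest):
            yield tree + tree2, rest2

def parses(words):
    """Keep only the parses that consume the whole sentence."""
    return [tree[0] for tree, rest in match(["s"], words) if not rest]

for tree in parses(["fruit", "flies", "like", "a", "banana"]):
    print(tree)
```

Under these assumptions the sketch finds two complete parses, one with "flies" as the verb and one with "like" as the verb.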


5.7 Knowledge Base

Having constructed a system of the form:

[Diagram: USER, PARSER, BNF]

which allowed the parsing of English sentences according to a BNF definition, the next stage was to make the system respond to the user's queries with meaningful answers. This was done through the use of a knowledge base.

Several representations for the knowledge base having been considered, it was decided that an association list of facts would be the best solution.

The aim of the association list was to use condensed forms of the questions as the references to the answer part of the list. The condensation of the question was necessary to try to remove the 'noise' produced within the query whilst amplifying the meaning.

In order to do this the BNF had to be extended with the addition of a third reserved word, the word 'query' being used to distinguish a section of the BNF as significant. It does this by associating sections with tags that contain information on them. For example, if a section of the BNF contained the terminal symbols ["seq","full","the","disk","is"] then this could be condensed to ["quota"]. This is represented by:

bnf = ["tag-example",["query",["seq","full","the","disk","is"],"quota"]]

in KRC.

The parser, in order to successfully process this new reserved word, needed to be modified. The aim was for the parser to return two lists: a parse tree as before, and an information list containing all of the query tags encountered during the parse.

To achieve this a new equation was added to the function match:

match ("query":l:command) inp info = matchquery l command inp info

which utilises the function:

matchquery l command inp info = [], match l inp info = []
                              = match l inp (info ++ command)

Thus every time a query-tag is encountered it will be appended onto an information list parameter, 'info'. This new parameter allows information to be accumulated which, upon successfully parsing the sentence, can be passed to the knowledge base for analysis.
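The accumulation of query tags can be sketched in Python as follows. Only terminals, "seq" and "query" are handled, and the way 'info' is threaded through a sequence is an assumption, since the modified sequence equations are not shown in this section. Results are (tree, remaining, info) triples.

```python
def match(fragment, inp, info):
    """Return a list of (tree, remaining_words, info) triples."""
    if isinstance(fragment, str):
        if inp and inp[0] == fragment:
            return [([fragment], inp[1:], info)]
        return []
    if fragment[0] == "seq":
        return match_seq(fragment[1:], inp, info)
    if fragment[0] == "query":
        _, body, *command = fragment     # mirrors the pattern "query":l:command
        return match_query(body, command, inp, info)
    raise ValueError(f"unsupported fragment: {fragment!r}")

def match_query(body, command, inp, info):
    """Append the tag to info only if the tagged section matches."""
    if not match(body, inp, info):
        return []
    return match(body, inp, info + command)

def match_seq(parts, inp, info):
    """Assumed threading of info through a sequence of parts."""
    if not parts:
        return [([], inp, info)]
    results = []
    for tree, rest, info2 in match(parts[0], inp, info):
        for tree2, rest2, info3 in match_seq(parts[1:], rest, info2):
            results.append((tree + tree2, rest2, info3))
    return results

tagged = ["query", ["seq", "full", "the", "disk", "is"], "quota"]
print(match(tagged, ["full", "the", "disk", "is"], []))
```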

Thus if the user input the query:

> how can I print a file on the anadex printer

then the parser would return two lists: the parse tree and the information list, which would be of the form:

["print","file","aprint"]

This would then be used by a function which searches the knowledge base for a suitable match, an example of which would be equation k21:

k21 = [["print","file","aprint"],k211]
k211 = ["Type the command: lp -Pan filename",nl]

which outputs the reply:

> Type the command: lp -Pan filename
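An association list of this kind can be sketched in Python as a list of (key, reply) pairs. The names `KB` and `answer`, and the use of exact equality as the "suitable match", are assumptions of this example; the original search function is not reproduced in this section.

```python
# Each entry pairs a condensed query (the information list) with its reply.
KB = [
    (["print", "file", "aprint"], "Type the command: lp -Pan filename"),
]

def answer(info, kb):
    """Return the reply whose key equals info, or None if there is none."""
    for key, reply in kb:
        if key == info:
            return reply
    return None

print(answer(["print", "file", "aprint"], KB))
```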

This form of knowledge base provides a means of outputting textual answers to direct questions. An example of the kind of dialogue that can be achieved is:

This is the UNIX help system - how can i help you?
Type a question or <ctl z> to exit

what is the best way to create a file

The best way to create a file is to use the editor i.e., ed filename

Type a question or <ctl z> to exit


what can you tell me about the file system

The file system provides a hierarchical naming structure.

Each directory contains the names of files or further directories.

Type a question or <ctl z> to exit

unix

UNIX describes a family of computing operating systems developed at Bell Labs. The UNIX system includes both the operating system and its associated commands. The operating system manages the resources of the computing environment by providing a hierarchical file system, process management, editors, assemblers, compilers and text formatters.

A powerful command interpreter is available and this allows individual users or projects to tailor the environment to suit their own style by defining their own commands.

Type a question or <ctl z> to exit

how can i compare two files

The manual reference for cmp can be looked at via: man cmp(1)

The command is used to compare two files. Various options are available but under the default no comment is made if the files are the same. If they differ then it announces the byte and line number at which the difference occurred. If one file is an initial subsequence of the other that fact is noted.

Type a question or <ctl z> to exit

what are the best references to unix

Much of the text output by this system has been drawn from the following sources:

[1] UNIX Programmers Manual

4.2 Berkeley Software Distribution

Virtual VAX-11 version, August 1983.

[2] The UNIX System S.R.Bourne Addison Wesley, 1983

Type a question or <ctl z> to exit

how can i print a file on the anadex printer

Type the command: lp -Pan filename

Type a question or <ctl z> to exit

cp

The manual reference for cp can be found via: man cp(1)
The cp command is used to copy a file from one name to another, the original being retained.

Thus it can be seen that varying types of query were allowed from the user, ranging from questions on specific topics to general queries. Another format that had to be catered for was the input of abbreviated queries. For example, one of the first things that people try when using a help system is to input the mnemonic of the command under investigation (this may be due to the lack of adequate help systems in the past).

The dialogue, however, was now totally one sided, emanating from the user, and thus it was felt that a more balanced dialogue between the user and the system was needed. Thus the next step was to look at developing a "mixed initiative" dialogue to provide the user with a more stimulating interaction, allowing far more probing questions to be asked and more meaningful answers given.

5.8 Achieving a Mixed Initiative Dialogue

Having developed the system into the form:

[Diagram: USER, PARSER]

where a basic dialogue between the user and the system could take place, the next step was to develop its interactive capabilities further.

In order to do this the knowledge base had to be extended. However, it was felt that the specialised knowledge required for a dialogue should be separated from the general information and facts stored in the knowledge base.

The modularisation of the knowledge base was felt to be beneficial for several reasons. Firstly, facts can be entered independently of each other, allowing the knowledge base to grow incrementally as new facts need to be added. Secondly, if modularisation is not used the addition of new facts may sometimes adversely affect other parts of the system. A further feature of the modular approach is that it increases the 'modifiability' of the knowledge base [20], making it easier to add, delete and change facts because the system is easier to understand.

Modularity also helps to maintain completeness and consistency, as problems do arise when large numbers of facts are stored within a knowledge base. This problem is currently an area of active research [46].

Having examined several alternative approaches to the problem of developing the desired mixed initiative dialogue, it was decided to use a form of script based processing [47].

The idea behind scripts is that they specify the normal or default sequence of events, as well as exceptions and possible errors, associated with a particular situation.

One of the underlying aims of developing this help system was to try to simplify the user's task in understanding the parameters associated with running a command. Scripts, it was felt, would provide the ideal mechanism with which to achieve this aim.

Having previously selected a group of commands with which to work (these being the chronological commands: cal, calendar, date and time) and gained experience of working with them, it was decided to develop scripts for these commands in the first instance.

After looking at several forms of script, a form with the following interdependencies was chosen:

[Diagram: USER, PARSER, BNF, KNOWLEDGE BASE, SCRIPTS]

Having accepted the user's query and responded with helpful advice drawn from the knowledge base, the next task for the system to perform is the selection of the most appropriate script, the selection taking into account the current situation and the context upon which the query is based. In order to decide which script to use, the information list produced by the parser is again utilised. As in the knowledge base, an attempt is made to match this information list within the association list. If the attempt is successful, control of the dialogue is passed on to the script; otherwise no existing script is appropriate for that query and the system prompts the user for the next question.

The structure, contents and use of a script are best explained through the use of an example.

If the user were to ask the question:

> what is the cal command

Then firstly the advice generated by the knowledge base would be given, this being:

The manual reference for the cal command is given by

NAME cal - print calendar

SYNOPSIS cal [month] year

DESCRIPTION

Cal prints a calendar for the specified year.

If a month is also specified a calendar just for that month is printed.

Year can be between 1 and 9999.

The month is a number between 1 and 12.

The calendar produced is that for England and her colonies.

Try September 1752.

BUGS

The year is considered to start in January even though this is historically naive.

Once this has been output, the system then searches the association list of scripts for the pattern:

["cal","cmd"]

which is the contents of the information list, a match being found at:

c01 = [["cal","cmd"],c011]

which then directs control to the appropriate script.

Each of the scripts follows a similar format. They start off with a question, asking the user if they want to perform the action that the script undertakes. The rest of the script is divided up into sets of equations, each set dealing with a certain aspect of the script's role.

In the case of the 'cal' command the initial question is held in the following equation:

c0111q kb = "Do you want to simulate entering the cal command?":nl:c0111a kb

and is output as:

> Do you want to simulate entering the cal command?

Once this question has been output, control is passed to a set of equations named c0111a:

c0111a (["yes"]:rest) = "do you wish to enter a month?":nl:c0112a rest 0 0
c0111a (["y"]:rest) = "do you wish to enter a month?":nl:c0112a rest 0 0
c0111a (["no"]:rest) = "Enter a command or <ctl z> to exit:":nl:loop rest
c0111a (["n"]:rest) = "Enter a command or <ctl z> to exit:":nl:loop rest
c0111a ([x]:rest) = "Please enter 'yes' or 'no'":nl:c0111q rest

These accept the user's response and take the appropriate action, including any error handling that may be necessary.

If the response is negative then the system does not enter into a dialogue and returns with the prompt asking the user to enter another question.

If the response to the question is positive then a dialogue is entered into, taking and expanding in turn on each of the parameters in the UNIX command cal.

The first two equations of c0111a prompt the user with the question:

> do you wish to enter a month in the calendar?

This question is necessary due to the definition of cal:

cal [month] year

The square brackets indicate that the parameter 'month' is optional and can be left out if necessary. Thus the user has to be allowed the choice of entering a month or passing straight on to the second parameter.

The response to the question is processed by the set of equations c0112a:

c0112a (["yes"]:rest) m y = "Which month is the calendar for?":nl:c0112aa rest 0 0
c0112a (["y"]:rest) m y = "Which month is the calendar for?":nl:c0112aa rest 0 0
c0112a (["no"]:rest) m y = c0113a rest [" "] y
c0112a (["n"]:rest) m y = c0113a rest [" "] y
c0112a ([x]:rest) m y = "Please enter 'yes' or 'no'":nl:c0112a (["yes"]:rest) 0 0
c0112a (["show"]:rest) m y = "no parameters to the cal command have been entered yet":nl:c0112a (["yes"]:rest) 0 0
c0112a (["why"]:rest) m y = "A month is required for successful execution of the 'cal' command":nl:c0112a (["yes"]:rest) 0 0
c0112aa ([a]:rest) m y = c0112b rest a y, numval a >= 1 & numval a <= 9
                       = c0113a rest a y, numval a >= 1 & numval a <= 12
                       = "The month has to be an integer between 01 and 12":nl:c0112a (["yes"]:rest) 0 0

If the answer is negative, control is passed on to the equations covering the second parameter, having noted that no month will be needed upon production of the final output, when advice is given on how to enter the desired command.

If the answer is positive the first two equations act by prompting the user with the question:

> Which month is the calendar for?

passing control on to a third set of equations, which waits for a response.

The user at this point has several options available. The first option is to ask the system about the data the user has input so far. This is done by using the keyword 'show'. However, as this is the first of cal's parameters and the user has not yet input any data, a suitable response is output.

Thus

> show

yields

> no parameters to the cal command have been entered yet

This facility is handled by the equation:

c0112a (["show"]:rest) m y = "no parameters to the cal command have been entered yet":nl:c0112a (["yes"]:rest) 0 0

The second option that the user has is to ask for an explanation of why the parameter is needed and this is done through the keyword 'why'.

Thus

> why

gives

> A month is required for successful execution of the 'cal' command

The role of this option is to help the user decide what the response should be to the system's questions. The questions asked by the system are aimed at a level above novice to make the system acceptable to a wider range of users.

The usual course of action is for the user to input the answer to the question, the response to this question being the month's number.

When inputting numbers UNIX allows the user to input them with or without leading zeros, and this is catered for by the scripts adding the extra leading zeros when necessary.

The system also caters for the user inputting a wrong answer. For example, if they input:

> 13

in reply to the question asking for a month to be input, then an error message would be given:

> The month has to be an integer between 01 and 12

The input response is covered by the following equations:

c0112aa ([a]:rest) m y = c0112b rest a y, numval a >= 1 & numval a <= 9
                       = c0113a rest a y, numval a >= 1 & numval a <= 12
                       = "The month has to be an integer between 01 and 12":nl:c0112a (["yes"]:rest) 0 0

c0112b kb m y = c0113a kb [["0"]++[m]] y
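The range check and zero padding performed by these equations can be sketched in Python. This is only an illustration of the idea, not a translation of the KRC script: the function name `check_month` is hypothetical, and returning `None` stands in for the branch that prints the error message and asks the question again.

```python
def check_month(reply):
    """Return the month as a two-character string, or None if the reply is invalid.

    Hypothetical helper: it mirrors the guards of c0112aa and the
    zero padding of c0112b, but is not part of the original system.
    """
    if not reply.isdecimal():
        return None                  # e.g. "show", "why" or other non-numeric text
    value = int(reply)
    if 1 <= value <= 9:
        return "0" + str(value)      # single-digit months gain a leading zero
    if 10 <= value <= 12:
        return str(value)            # already two digits
    return None                      # out of range, e.g. 13
```

Under these assumptions a reply of `6` or `06` both yield `06`, while `13` is rejected.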

It is necessary at this point to mention the convention used for the naming of parameters. In general the number of parameters used in a script is equal to the number used in the UNIX command it represents. The parameter names in the script are abbreviated forms of those used in the UNIX command. For example, the manual definition of cal gives its parameter names as 'month' and 'year', and so 'm' and 'y' are used in the script.

Having covered the first parameter, control now moves on to the second with the question:

> which year is the calendar for?

being asked.

This time if the 'show' option is used then the value that was entered for the first parameter is output. Thus if '8' had been entered:

> show


would give

> month = 08

This feature is very useful when a large number of complex parameters have been entered for a command.

The second parameter has a similar form to the first and is composed of the equations:

c0113a kb m y = "which year is the calendar for?":nl:c0113aa kb m y

c0113aa (["show"]:rest) m y = showheading:nl:"month = ":m:nl:c0113a rest m y

c0113aa (["why"]:rest) m y = "A year is required for successful execution of the 'cal' command":nl:c0113a rest m y

c0113aa ([a]:rest) m y = c0113b rest m a, numval a >= 1 & numval a <= 9
                       = c011out rest m a, numval a >= 1 & numval a <= 9999
                       = "The year has to be an integer between 1 and 9999":nl:c0113a rest m y

c0113b kb m y = c011out kb m [["0"]++[y]]

This grouping together of equations aids readability, completeness, consistency and maintainability. For example, if a command's definition were to be changed in, say, an updated version of the operating system, then it would be quite easy to add, delete or modify a set of equations independently of the rest of the system.

Once all of the parameters have been input, the end of the dialogue has been reached and all that is left for the script to do is output its final advisory text.

For example, if the month had been 6 and the year 1966 then:

c011out kb m y = "type the command: cal ":m:sp:y:nl:loop kb

would output

> type the command: cal 06 1966
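The month and year part of the dialogue, including the 'show' and 'why' keywords, can be sketched as a small Python routine that consumes a list of user replies. This is a minimal sketch under stated assumptions rather than the KRC script itself: the names `ask` and `cal_script` are hypothetical, the initial yes/no questions are omitted, and the 'show' heading is left out.

```python
def ask(question, why_text, error_text, entered, low, high, replies, out):
    """Ask one question until a number in [low, high] arrives.

    `replies` is an iterator of user inputs; `out` collects the system's lines.
    Returns the value padded to two characters, or None if input runs out.
    """
    out.append(question)
    for reply in replies:
        if reply == "show":
            # Report what has been entered so far, or say that nothing has.
            out.extend(entered or
                       ["no parameters to the cal command have been entered yet"])
        elif reply == "why":
            out.append(why_text)
        elif reply.isdecimal() and low <= int(reply) <= high:
            return str(int(reply)).zfill(2)
        else:
            out.append(error_text)
        out.append(question)         # every non-answer leads back to the question
    return None


def cal_script(replies):
    """Return the lines the system would print for the given replies."""
    out = []
    it = iter(replies)
    month = ask("Which month is the calendar for?",
                "A month is required for successful execution of the 'cal' command",
                "The month has to be an integer between 01 and 12",
                [], 1, 12, it, out)
    if month is None:
        return out
    year = ask("which year is the calendar for?",
               "A year is required for successful execution of the 'cal' command",
               "The year has to be an integer between 1 and 9999",
               ["month = " + month], 1, 9999, it, out)
    if year is None:
        return out
    out.append("type the command: cal " + month + " " + year)
    return out
```

For example, `cal_script(["show", "13", "6", "show", "1966"])` walks through the same situations as the text: an early 'show', an out-of-range month, a valid month, a second 'show' and finally the year.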

The use of scripts thus provides a means by which users can do more than just access a help system that gives them textual information in a standard form, leaving the user to decipher its often cryptic content. The intention is to animate the manual and for the system to act as an advisor rather than a reference.


To justify this claim of superior performance due to the use of scripts, the process of developing a script for the UNIX sort command was undertaken. The manual reference can be found in appendix F and shows that careful study would be needed before the sort command could be used. This textual reference can be contrasted with the following dialogue:

This is the UNIX help system - how can i help you?
Type a question or <ctl z> to exit

how can i sort a file

In order to sort a file the command 'sort' should be used
sort reads the named files and by default sorts the results
onto the standard output

Do you want to simulate the sort command?
yes

Do you wish to use the standard input?
no

what is the input file called?
inputfile

Do you only wish to merge files?
y

Do you wish to suppress all but one in each set of equal lines?
y

Do you wish to ignore leading blanks(spaces and tabs) in field comparisons?
yes

Do you wish only letters,digits and blanks to be significant in comparisons?
no

Do you wish to fold upper case letters onto lower case?
why

Upper case characters can be regarded as equivalent to lower case ones if this parameter is set

Do you wish to fold upper case letters onto lower case?
show

The value of the parameters entered so far
The input file is = inputfile
merge switch is = m
equal lines switch = u
leading spaces = b
dictionary order =

Do you wish to fold upper case letters onto lower case?
y

Do you wish to ignore characters outside the ascii range 040-0176 in non numeric comparisons?
y

Do you wish to sort in reverse order?
y

how many fields would you like to sort on?
3

Input start of field number 1
4

Input the end of field number 1
6

Input start of field number 2
8

Input the end of field number 2
9

Input start of field number 3
12

Input the end of field number 3
14

Do you wish to use a specific directory in which the temporary files should be made?
yes

what is the directory name?
dirname

Do you wish to use the standard output?
no

what would you like to call the output file?
outfile

Are the fields separated by spaces?
show

The value of the parameters entered so far
The input file is = inputfile
merge switch is = m
equal lines switch = u
leading spaces = b
dictionary order =
UPPER = lower case = f
ascii limits set = i
reverse sortmerge = r
field positions = +4-6 +8-9 +12-14
directory name = -T dirname
field delimiter = 0
output file name = -o outfile

Are the fields separated by spaces?
why

Fields within a record are normally separated by spaces,
however it is possible to assign other characters as a field delimiter

Are the fields separated by spaces?
n

please enter the character the fields are separated by
x

type the command:
sort inputfile mub finr -tx +4-6 +8-9 +12-14 -o outfile -T dirname

From this it can be seen that the system provides a very useful facility for a user, filling the gap between the standard help facility and the human advisor.
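The final step of such a script, assembling the advisory command from the answers collected during the dialogue, can be sketched in Python. This is an illustrative sketch only: the function `build_sort_command` and its parameters are hypothetical, the switches are joined behind a single dash as an assumption (the transcript prints them without one), and the `+start-end` field notation is copied from the transcript rather than checked against any particular version of sort.

```python
def build_sort_command(infile, switches, delimiter, fields, outfile, tmpdir):
    """Assemble an advisory 'sort' command string from collected answers.

    switches  : list of single-letter switches, e.g. ["m", "u", "b"]
    delimiter : field separator character, or "" for the default
    fields    : list of (start, end) pairs as entered by the user
    """
    parts = ["sort", infile]
    if switches:
        parts.append("-" + "".join(switches))    # assumption: one dash, letters joined
    if delimiter:
        parts.append("-t" + delimiter)
    for start, end in fields:
        parts.append("+%d-%d" % (start, end))     # notation as shown in the transcript
    if outfile:
        parts.extend(["-o", outfile])
    if tmpdir:
        parts.extend(["-T", tmpdir])
    return " ".join(parts)
```

Keeping the answers as plain values until the end, as here, is also what makes a 'show' request easy to serve at any point in the dialogue.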

Full listings of the KRC program scripts used by the system are given in appendix E.


CONCLUSIONS

During the course of this report we have looked at the problem of producing an expert system which could act as an assistant to users requiring help on the UNIX operating system.

We then looked at two approaches in detail. The first approach involved the use of an expert system shell and the technique of text animation. This method of solution initially looked very promising. The aim of using a shell was to provide an environment in which a system could be developed quickly whilst not missing any of the explanation facilities that would be present in a system specifically designed for the domain under consideration. In reality the shell approach proved very constricting. The structures offered by its knowledge representation language proved to be less flexible than desired, making the mapping from the conceptual level to the implementation level very difficult. To successfully achieve this mapping a conceptual model was built. This proved to be a very useful tool, forcing consistency and other desirable properties on the system under design. It was possible, after much manipulation, to provide a limited dialogue between the user and the system. The major problem found, however, was that the dialogue was totally driven by the system, not allowing the user to ask their own questions directly. The consequence of this mode of dialogue was that obtaining an answer proved to be a lengthy process.

The second approach was the development of a system that allowed the user to input their questions in English and then to have the system respond with a meaningful answer. A parser based system was built using a BNF grammar to understand the input. It then utilised a knowledge base and a series of dialogue-scripts to generate its advisory text. The system proved to be very successful and allowed mixed initiative dialogues to take place. The system was written in the functional language KRC, which allowed the development to proceed quickly.

Had more time been available, however, it would have been possible to develop both approaches further. An interesting way to develop the first system would be to take the conceptual model and develop a new shell from it. This could be done by writing a translator that could take the frames and transform them into Prolog or some other language. This would then allow developers to use the shell as a prototyping tool and implement directly from the design. The second approach is also very open to further development. It is possible to improve the efficiency of the system and to develop better representations upon which it is based. The BNF can be extended and more information drawn from the parsing process. A useful feature to build in to the system would be a memory enabling it to understand questions that rely on previous contexts. It may also be possible to build tools to turn the entry of BNF definitions and the entry of facts into an automatic process, enabling automatic checking for consistency, completeness etc. to take place. This would enable the system to be offered as a shell for use by experienced expert system developers.


Finally, as a personal note, the project has enabled me to gain a useful insight into the development of an expert system providing stimulus to encourage me to continue research in this area.


References

1. D'Agapeyeff,A. Expert systems, Fifth generation and U.K suppliers. NCC, Manchester. 1983.

2. Newell,A. and Simon,H. GPS - A program that simulates human thought. In E.A.Feigenbaum and J.Feldman(eds), Computers and Thought, McGraw Hill, New York, 1963, pp279-293.

3. Post,E. 1943. Formal reductions of the general combinatorial problem. American Journal of Mathematics 65, pp 197-293.

4. Brachman,R.J. What's in a concept: Structural foundations for semantic networks. International Journal Of Man Machine Studies 9, 1977, pp 127-152.

5. Minsky,M. 1975. A framework for representing knowledge. In P.Winston, The psychology of computer vision. New York, McGraw Hill.

6. Wilks,Y.A. 1977. Knowledge structures and language boundaries. IJCAI 5, pp 151-157.

7. Funt,B.V. 1976. WHISPER: A computer implementation using analogs in reasoning. Rep No. 76-09. Computer Science Department, University Of British Columbia.

8. Quillian,M.R. 1968. Semantic Memory. In Minsky 75, pp 227-270.

9. Winograd,T. 1973. A procedural model of language understanding. In Schank,R.C. and Colby,K.M. 1973, Computer Models of thought and language. San Francisco; Freeman. pp152-186.

10. Nilsson,N.J. 1971. Problem solving methods in A.I., New York; McGraw Hill.

11. Davis,R., and King,J.J. 1977. An overview of production systems. In Elcock,E. and Michie,D.(eds) Machine Intelligence 8. Chichester England, Ellis Horwood 300-332.

12. Lindsay,R., Buchanan,R.G., Feigenbaum,E.A., and Lederberg,J. 1980, DENDRAL, New York; McGraw Hill.

13. Shortliffe,E.H. 1976. Computer Based Medical Consultations: MYCIN. New York; American Elsevier.

14. Duda,R.O., Gaschnig,J., Hart,P.E., Konolige,K., Reboh,R., Barrett,P., & Slocum,J. 1978. Development of the PROSPECTOR consultation system for mineral exploration. Final report SRI projects 5821 and 6415, SRI International Inc., Menlo Park, CA.


15. Feigenbaum,E.A., and McCorduck, The fifth generation, Addison Wesley, Reading, MA, 1984.

16. Hayes-Roth,F. et al., 1983. Building Expert Systems, Addison Wesley.

17. Alvey News, Issue Number 5, June 1984, pp12-15.

18. Nii,H.P., and Aiello,N. 1979. AGE (Attempt to Generalise); A knowledge based program for building knowledge based programs. In IJCAI 6; pp645-655.

19. Forgy,C.L. 1981. The OPS5 users manual. Technical report CMU-CS-81-135, Computer Science Department, CMU, Pittsburgh, PA.

20. Feigenbaum,E.A. 1980, The Handbook of Artificial Intelligence Vols I, II & III; Pitman.

21. Bourne,S.R. 1983, The UNIX System. Addison Wesley.

22. Bell Labs. 1983(August), UNIX Programmers Manual, 4.2 Berkeley Software Distribution, Virtual VAX-11 version.

23. Weizenbaum,J. 1966. A computer program for the study of natural language communication between man and machine. CACM 9: 36-45.

24. Weizenbaum,J. 1967. Contextual understanding by computers. CACM 10: 327-360.

25. Chomsky,N. 1965. Aspects of the theory of syntax. Cambridge, Mass: M.I.T Press.

26. Bruce,B.C. 1975. Case frames for natural language. A.I. 6: 327-360.

27. Woods,W.A. 1968. Procedural semantics for a question answering machine. Proceedings of fall conference, New York. 457-471.

28. Woods,W.A. 1970. Transition network grammars for natural language. CACM 13: 591-606.

29. Kunz,J., et al. 1978. A physiological rule-based system for interpreting pulmonary function test results. Heuristic Programming project report No. HPP-78-19, Computer Science Department, Stanford University.

30. Hendrix,G.G. 1977. Human Engineering for applied natural language processing. IJCAI 5, 183-191.

31. Quinlan,J.R. 1979. Induction inference as a tool for the construction of high performance programs. In R.S.Michalski et al., Machine Learning. Palo Alto, Calif.


32. Software packages for business, 1985. Financial Times survey. F.T. May 1 1985.

33. ESP/Advisor user guide and reference manual, 1984(June), Expert systems international.

34. Sufrin,B. et al. 1985. Notes for a Z handbook, part 1 - mathematical language. Computing Laboratory, Programming Research Group, Oxford University.

35. Chomsky,N. 1957. Syntactic structures. The Hague: Mouton.

36. Postal,P. 1964. Limitations of phrase structure grammars. In J.A.Fodor & J.J.Katz, The structure of language. Englewood Cliffs, N.J.: Prentice Hall p137-151.

37. Fillmore,C. 1968. The case for case. In E.Bach & R.Harms (Eds), Universals in linguistic theory. New York: Holt, Rinehart & Winston, p1-88.

38. Halliday,M.A.K. 1961. Categories of the theory of grammar. Word 17: p241-292.

39. Backus,J. 1959. The syntax and semantics of the proposed international language in Zurich ACM-GAMM conference, In Proc. Int. Conf. Inf. Processing UNESCO, June 1959 pp125-132.

40. Hendrix,G.G. 1977. The LIFER manual: A guide to building practical natural language interfaces. Technical note 138, SRI AI center, Menlo Park, Calif.

41. Wadler,P. 1985. An introduction to ORWELL, Internal publication, Computer Laboratory, Programming Research Group, Oxford.

42. Henderson,P. 1980. Functional programming: application & implementation. Prentice Hall.

43. Henderson,P. et al., 1983. The Lispkit manual Vols I&II. O.U.C.L. P.R.G Technical Monograph 32. Oxford.

44. Henderson,P. & Morris,J.M. 1976. A lazy evaluator. Proceedings 3rd. POPL Symposium, Atlanta Georgia.

45. Turner,D. Recursive equations as a programming language. Newcastle summer school in functional programming.

46. Proceedings of the second workshop on architectures for large knowledge bases. July 1984. Manchester University.

47. Schank,R.C. & Abelson,R.P. Scripts, plans, goals and understanding. Hillsdale N.J. Lawrence Erlbaum.


APPENDIX E

THE NATURAL LANGUAGE UNIX SYSTEM

COMPOSED OF THE KENT RECURSIVE CALCULATOR SCRIPTS:

BNF.TEXT, PARSER.TEXT, KBASE.TEXT, SCRIPT.TEXT, SORT.TEXT

15 Sep 85 BNF.TEXT

bnf = [g010,g020,g030,g040,g050,g060,g070,g080,g090,g100,g110,g120,g130,g140,g150,g160,g170,g180,g190,g200,g210,g220,g230,g240,g250,g260,g270,g280,g290,g300,g310,g320,g330,g340]

g010 = ["s",["or",["genq"],["findq"],["whatq"],["wlongq"],["directq"]]]

g020 = ["genq",["genhow"]]

g030 = ["findq",["seq",["findhow"],["or",["howgen"],["howlong"],["howwho"]]]]

g040 = ["howgen",["seq",["howto"],["rest"]]]

g050 = ["howlong",["seq",["how"],["longhobj"]]]

g060 = ["howwho",["seq",["whoq"],["longwobj"]]]

g070 = ["whatq",["seq",["or",["whatquest"],["longw"]],["or",["seq",["whatobj"],["cmd"]],["seq","the",["whatobj"],["cmd"]],["seq",["cmd"],["whatobj"]],["seq",["whatext"],["rest"]]]]]

g080 = ["wlongq",["seq",["longw"],["or",["seq",["action"],["longquery"]],["rest"],["longquery"]]]]

g090 = ["directq",["or",["shortq"],["longquery"]]]

g100 = ["genhow",["or",["seq",["howq"],["rest"]],["seq",["howq"],["rest"],["direction"],["object"]]]]

g110 = ["findhow",["seq",["howq"],["find"]]]

g120 = ["rest",["seq",["action"],["object"]]]

g130 = ["howq",["or",["seq","how","can","i"],["seq","how","do","i"]]]

g140 = ["action",["or",["actiontwo"],"create",["query","compare","cmp"],["seq","get","help","on"],["query",["seq","references","to"],"references"],["query","sort","sort"],["query",["seq","change",["access"]],"chmod"]]]

g150 = ["actiontwo",["or",["query","compare","diff"],["query","remove","rem"],["query","print","print"]]]

g160 = ["object",["or",["query",["seq",["number"],"files"],"nfiles"],["query",["seq","a","file"],"file"],["query",["seq","a","directory"],"dir"],["query",["seq","anadex","printer"],"aprint"],["query",["seq","laser","printer"],"lprint"],["none"]]]

g170 = ["access",["or",["seq","permission","to"],["seq","access","to"]]]

g180 = ["number",["seq",["digit"],["more"],["none"]]]

g190 = ["digit",["or","1","2","3","4","5","6","7","8","9","one","two","three","four","five","six","seven","eight","nine"]]

g200 = ["more",["or","plus",["seq","more"],["none"]]]

g210 = ["find",["seq","find","out"]]

g220 = ["howto",["seq","how","to"]]

g230 = ["direction",["or",["seq","on","the"],["seq","from","the"],["seq","from","a"],"on","from"]]

g240 = ["how","how"]

g250 = ["longhobj",["query",["seq","full","the","disk","is"],"quota"]]

g260 = ["whoq",["query",["or",["seq","who","has"],["seq","who","is"]],"who"]]

g270 = ["longwobj",["or",["query",["seq","access","to","my","files"],"lsl"],["query",["seq","on","the","system"],"users"]]]

g280 = ["whatquest",["or",["seq","what","is","the"],["seq","what","are","the"],["seq","what","is","a"]]]

g290 = ["whatobj",["or",["query",["seq","search","path"],"spath"],["query","time","time"],["query","sort","sort"],["query","date","date"],["query","cal","cal"],["query","calendar","calendar"]]]

g300 = ["cmd",["query",["or","command",["seq","the","command"],["none"]],"cmd"]]

g310 = ["whatext",["query",["seq","best","way","to"],"best"]]

g320 = ["longw",["or",["seq","what","do","you","know","about"],["seq","what","can","you","tell","me","about"],["seq","what","are","the"]]]

g330 = ["shortq",["or",["query","cal","cal","cmd"],["query","sort","sort"],["query","calendar","calendar","cmd"],["query","chmod","chmod"],["query","cmp","cmp"],["query","date","date"],["query","cp","cp"],["query","help","help"],["query","ls","ls"],["query","quota","quota"],["query","rm","rm"],["query","time","time"]]]

g340 = ["longquery",["or",["query","unix","unix"],["query",["seq","the","file","system"],"filesys"],["query","pipes","pipes"],["query",["seq","the","standard","input"],"stdin"],["query",["seq","the","standard","output"],"stdop"]]]

15 Sep 85 PARSE.TEXT

breakline a [] = a
breakline a (b:x) = (a ++ [sp]):lines x, b = nl
                  = breakline (a ++ [b]) x

clean [] = []
clean ([[tree],[],info]:rest) = [tree,info]:clean rest
clean ([[tree],inp,info]:rest) = clean rest

combine t1 [] = []
combine t1 ([t2,i2,info]:rest) = [t1 ++ t2,i2,info]:combine t1 rest

getinfo l = getinfo' l []
getinfo' ([a,b]:x) c = getinfo' x (b:c)
getinfo' [] c = [c]

getwords x = words [] x

heading = ["This is the UNIX help system - how can i help you?",nl]
headingtwo = ["Type a question or <ctl z> to exit",nl]

kb = "sys$input"

layout l = [l] ++ [nl]

lines [] = []
lines (a:x) = lines x, a = nl
            = breakline [a] x

lookup sub [] = sub
lookup sub ([sub,val]:rest) = val
lookup sub ([other,val]:rest) = lookup sub rest

loop [] = ["end"]
loop (exp:rest) = advise (getinfo (parse exp)) rest

match [] inp info = [[[],inp,info]]
match ["none"] inp info = [[[],inp,info]]
match patt [] info = []
match [nts] inp info = name (match (lookup nts bnf) inp info) nts
match ("or":l) inp info = matchor l inp info
match ("seq":l) inp info = matchseq l inp info
match ("query":l:command) inp info = matchquery l command inp info
match word (word:inp) info = [[[word],inp,info]]
match word (other:inp) info = []

matchor [] inp info = []
matchor (item:l) inp info = match item inp info ++ matchor l inp info

matchquery l command inp info = [], match l inp info = []
                              = match l inp (info ++ command)

matchseq [] inp info = [[[],inp,info]]
matchseq (item:l) inp info = matchseq' (match item inp info) l
matchseq' [] l = []
matchseq' ([tree,inp,info]:rest) l = combine tree (matchseq l inp info) ++ matchseq' rest l

name [] nts = []
name ([tree,inp,info]:rest) nts = [[nts:tree],inp,info]:name rest nts

parse x = clean (match ["s"] x [])

prompt = ["Question or <Ctl Z>",nl]

unix kb = heading:loop (map getwords (lines (read kb)))

words [] [] = []
words w [] = concat w
words w (a:x) = concat w:words [] x, a = sp
              = words (w ++ [a]) x
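The parsing technique used by these equations, in which every matcher returns the list of all ways it can succeed so that alternatives need no explicit backtracking code, can be sketched in Python. The sketch below is loosely modelled on `match`, `matchor`, `matchseq` and `matchquery`, but it is not a line-by-line translation: the dictionary `BNF` is a small hypothetical grammar that only borrows a few rules from the listing, Python lists are eager rather than lazy, and the handling of empty input is simplified.

```python
# A tiny grammar in the same nested-list style as BNF.TEXT (illustrative subset).
BNF = {
    "s": ["seq", ["howq"], ["action"], ["object"]],
    "howq": ["or", ["seq", "how", "can", "i"], ["seq", "how", "do", "i"]],
    "action": ["or", ["query", "sort", "sort"], ["query", "print", "print"]],
    "object": ["or", ["query", ["seq", "a", "file"], "file"], ["none"]],
}


def match(patt, inp, info):
    """Return every (tree, remaining_input, info) triple for matching patt."""
    if patt == [] or patt == ["none"]:
        return [([], inp, info)]                  # matches without consuming input
    if isinstance(patt, list):
        head = patt[0]
        if head == "or":                          # collect results of all alternatives
            return [r for alt in patt[1:] for r in match(alt, inp, info)]
        if head == "seq":
            return match_seq(patt[1:], inp, info)
        if head == "query":                       # on success, record keywords in info
            return match(patt[1], inp, info + patt[2:])
        # Otherwise a one-element list names a non-terminal: expand and label it.
        return [([[head] + tree], rest, out)
                for tree, rest, out in match(BNF[head], inp, info)]
    if inp and inp[0] == patt:                    # terminal word
        return [([patt], inp[1:], info)]
    return []


def match_seq(items, inp, info):
    """Match items one after another, threading remaining input and info."""
    if not items:
        return [([], inp, info)]
    results = []
    for tree, rest, out in match(items[0], inp, info):
        for tree2, rest2, out2 in match_seq(items[1:], rest, out):
            results.append((tree + tree2, rest2, out2))
    return results


def parse(words):
    """Keep only the parses that consume the whole question."""
    return [(tree, info) for tree, rest, info in match(["s"], words, []) if not rest]
```

With this grammar, `parse("how can i sort a file".split())` yields a single parse whose collected keywords are `["sort", "file"]`, which is the kind of key used to index the knowledge base and the scripts.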

15 Sep 85 KBASE.TEXT

advise [[]] kb = advisetext:nl:headingtwo:(loop kb)
advise l kb = (split (hd l) kb)
advisetext = ["I do not have any information on this, perhaps you would like to",nl,"rephrase the question"]

find xx [] = []
find xx (a:yy) = tl a, xx = hd a
               = find xx yy

k01 = [["cal"],k013]

k011 = [["cal","cmd"],k013]

k012 = [["cmd","cal"],k013]

k013 = [k014,k015,k016,k017,k018,k019,k0191,k0192]

k014 = ["The manual reference for the cal command is given by",nl,nl]

k015 = ["NAME",nl," cal - print calendar",nl,nl]

k016 = ["SYNOPSIS",nl," cal [month] year ",nl,nl]

k017 = ["DESCRIPTION",nl," Cal prints a calendar for the specified year. If a month is also specified ",nl]

k018 = [" a calendar just for that month is printed. Year can be between 1 and 9999.",nl]

k019 = [" The month is a number between 1 and 12.",nl," The calendar produced is that for England and her colonies."]

k0191 = [nl,nl," Try September 1752.",nl,nl,"BUGS",nl," The year is considered to start in January even though this is"]

k0192 = [" historically",nl," naive.",nl," Beware that cal 78 refers to the early Christian era, not the 20th century.",nl]

k02 = [["sort","file"],"In order to sort a file the command 'sort' should be used",nl,"sort reads the named files and by default sorts the results",nl,"onto the standard output",nl]

k021 = [["sort"],k0241]

k022 = [["sort","cmd"],k0241]

k023 = [["cmd","sort"],k0241]

k024 = [["sort","file"],k0241]

k0241 = [k0242]

k0242 = ["The manual reference can be looked at via man sort(1)",nl]

k03 = [["calendar"],k033]

k031 = [["calendar","cmd"],k033]

k032 = [["cmd","calendar"],k033]

k033 = [k034,k035,k036,k037,k038,k039,k0391,k0392,k0393,k0394,k0395,k0396,k0397,k0398,k0399,k03991,k03992,k03993]

k034 = ["The manual reference for calendar is given by",nl,nl,"NAME",nl," calendar - reminder service",nl,nl]

k035 = ["SYNOPSIS",nl," calendar [-]",nl,nl]


k036 = ["DESCRIPTION",nl," calendar consults the file calendar in the current directory and prints",nl]

k037 = [" out lines that contain todays or tomorrows date anywhere in a line. Most",nl]

k038 = [" reasonable month-day dates such as 'Dec 7', 'December 7','12/7', etc., are",nl]

k039 = [" recognised but not '7 December' or '7/12'. If you give the month as '*'",nl]

k0391 = [" with a date, i.e. '* 1' that day in any month do. On weekends 'tomorrow'",nl]

k0392 = [" extends through Monday.",nl,nl]

k0393 = [" file 'calendar' in his login directory and sends him any positive results",nl]

k0394 = [" by mail normally this is daily in the wee hours under the control of",nl,nl]

k0395 = [" cron(8). The file calendar is first run through the C preprocessor,",nl]

k0396 = [" /lib/cpp, to include any calendar files specified with the usual",nl]

k0397 = [" '#include' syntax, included calendars will usually be shared by",nl]

k0398 = ["FILES",nl," calendar",nl]

k0399 = [" /usr/lib/calendar to figure out todays and tomorrows dates",nl," /etc/passwd",nl]

k03991 = [" /tmp/cal",nl," /lib/cpp, egrep, sed, mail as subprocesses",nl,nl,"SEE ALSO",nl," at(1),",nl]

k03992 = [" cron(8), mail(1)",nl,nl,"BUGS",nl,nl," calendars extended idea of tomorrow doesn't"]

k03993 = [" account for holidays",nl]

k04 = [["chmod"],k041]

k041 = [k042,k043,k044,k045,k046,k047,k048,k049]

k042 = ["The manual reference for chmod can be looked at via: man chmod",nl,nl]

k043 = ["The command is used to change the mode of access to a file.",nl]

k044 = ["The mode may be set by an absolute or symbolic command.",nl]

k045 = ["Absolute being an octal number constructed from the OR of the",nl]

k046 = ["modes described in chmod(2). The symbolic mode being of the form",nl]

k047 = [" [who] op permission [op permission]... Further details can be",nl]

k048 = ["found in chmod(1). It should be noted that only the owner of a file",nl]

k049 = ["may change its mode.",nl]

k05 = [["cmp","nfile"],k051]

k051 = [k052,k053,k054,k055,k056,k057]


k052 = ["The manual reference for cmp can be looked at via: man cmp(1)",nl,nl]

k053 = ["The command being used to compare two files. Various options are",nl]

k054 = ["available but under the default no comment is made if the files",nl]

k055 = ["are the same. If they differ then it announces the byte and line",nl]

k056 = ["number at which the difference occurred. If one file is an initial",nl]

k057 = ["subsequence of the other that fact is noted",nl]

k06 = [["date"],k063]

k061 = [["date","cmd"],k063]

k062 = [["cmd","date"],k063]

k063 = ["look up man date(1)",nl]

k07 = [["cp"],k071]

k071 = [k072,k073,k074]

k072 = ["The manual reference for cp can be found via: man cp(1)",nl]

k073 = ["The cp command is used to copy a file from one name to another",nl]

k074 = ["the original being retained.",nl]

k08 = [["best","file"],k081]

k081 = [k082]

k082 = ["The best way to create a file is to use the editor i.e., ed filename",nl]

k09 = [["filesystem"],k091]

k091 = [k092,k093]

k092 = ["The file system provides a hierarchical naming structure. Each",nl]

k093 = ["directory contains the names of files or further directories.",nl]

k10 = [["expl","unix"],"help"]

k11 = [["ls"],k111,k112,k113,k114,k115]

k111 = ["The manual reference for ls can be found via: man ls(1)",nl,nl]

k112 = ["The command is used to list the contents of directories and can be",nl]

k113 = ["used in conjunction with parameters for varying types of options.",nl]

k114 = ["i.e., ls -l = long, ls -g gives group-id instead of user-id in ",nl]

k115 = ["long listing.",nl]

k12 = [["rm"],k121,k122]

k121 = ["The manual reference for rm can be found via: man rm(1)",nl]

k122 = ["rm is used to remove one or more files from a directory",nl]

k13 = [["best","references","unix"],k131]

k131 = [k132,k133,k134,k135,k136,k137,k138,k139]


k132 = ["Much of the text output by this system has been drawn from the",nl]

k133 = ["following sources:",nl]

k134 = [" [1] UNIX Programmers Manual",nl]

k135 = [" 4.2 Berkeley Software Distribution",nl]

k136 = [" Virtual VAX-11 version, August 1983.",nl,nl]

k137 = [" [2] The UNIX System",nl]

k138 = [" S.R.Bourne",nl]

k139 = [" Addison Wesley, 1983",nl]

k14 = [["pipes"],k140]

k140 = [k141,k142,k143,k144,k145,k146,k147,k148,k149,k1491,k1492]

k141 = ["pipe - create an inter-process channel",nl,nl]

k142 = ["pipe(fildes)",nl,"int fildes[2];",nl,nl]

k143 = ["The pipe system call creates an input-output mechanism called a pipe. The file",nl]

k144 = ["descriptors returned can be used in read and write operations. When the pipe is",nl]

k145 = ["written using the descriptor fildes[1] up to 4096 bytes of data are buffered",nl]

k146 = ["before the writing process is suspended. The data may be read from fildes[0].",nl]

k147 = [" It is assumed that after the pipe has been created, two (or more) co-operating",nl]

k148 = ["processes (created by subsequent fork calls) will pass data through the pipe",nl]

k149 = ["with read and write calls.",nl]

k1491 = [" Read calls on an empty pipe (no buffered data) with only one end (all write",nl]

k1492 = ["file descriptors closed) returns an end-of-file.",nl]

k15 = [["stdin"],k151]

k151 = [k152,k152,k153,k154]

k152 = ["As everything in the UNIX operating system is regarded as a file",nl]

k153 = ["the input has as its 'standard' or default file the keyboard",nl]

k154 = ["from which data is read or 'input'",nl]

k16 = [["stdop"],k161]

k161 = [k162,k163,k164,k165,k166]

k162 = ["As everything in the UNIX operating system is regarded as a file",nl]

k163 = ["the output has as its 'standard' or default file the screen",nl]

k164 = ["The standard output can be changed and redirected to a normal file",nl]


k165 = ["by using the > symbol, i.e., ls > newfile directs the listing of",nl]

k166 = ["the directory to a file called 'newfile'.",nl]

k17 = [["time"],k171]

k171 = [k172,k173,k174,k175,k176]

k172 = ["The manual listing of the time command can be found via: man time(1)",nl]

k173 = ["The time command is used to find out how long another command takes ",nl]

k174 = ["to execute. ",nl]

k175 = ["If the 'chronological time' is required then the date command",nl]

k176 = ["should be used",nl]

k18 = [["unix"],k181]

k181 = [k182,k183,k184,k185,k186,k187,k188,k189,k1891,k1892]

k182 = ["UNIX describes a family of computing operating systems developed",nl]

k183 = ["at Bell labs.",nl]

k184 = ["The UNIX system includes both the operating systems and its",nl]

k185 = ["associated commands. The operating system manages the resources",nl]

k186 = ["of the computing environment by providing a hierarchical file",nl]

k187 = ["system, process management, editors, assemblers, compilers and",nl]

k188 = ["text formatters. A powerful command interpreter is available and",nl]

k189 = ["this allows individual users or projects to tailor the ",nl]

k1891 = ["environment to suit their own style by defining their own ",nl]

k1892 = ["commands.",nl]

k19 = [["help"],k191]

k191 = [k192,k193,k194,k195,k196]

k192 = ["The primary purpose of this help system is to aid people to use",nl]

k193 = ["the UNIX operating system and this is done by providing information",nl]

k194 = ["such as that which is contained in manuals, more general and",nl]

k195 = ["descriptive information drawn from text books and users of this",nl]

k196 = ["system can be guided through the UNIX commands and their parameters",nl]

k20 = [["print","file"],k201]

k201 = [k202,k203]

k202 = ["Type the command: lpr filename ",nl]

k203 = ["This should output the file to the default printer",nl]

k21 = [["print","file","aprint"],k211]

k211 = [k212]


k212 = ["Type the command: lpr -Pan filename ",nl]

k22 = [["print","file","lprint"],k221]

k221 = [k222]

k222 = ["Type the command: lp -Pis filename ",nl]

k23 = [["quota"],k231]

k231 = [k232]

k232 = ["Typing 'df' will tell you how full the disk is",nl]

k24 = [["who","lsl"],k241]

k241 = [k242,k243]

k242 = ["Typing 'ls -l filename' will tell you what the protection on",nl]

k243 = ["the file named is",nl]

k25 = [["who","users"],k251]

k251 = [k252]

k252 = ["Typing 'who' will tell you who is on unix",nl]

k26 = [["diff","nfile"],k261]

k261 = [k262,k263]

k262 = ["To compare two files, type 'diff file1 file2' and the lines that",nl]

k263 = ["are different will be given",nl]

k27 = [["spath"],k271]

k271 = [k272,k273]

k272 = ["A search path in unix is a list of directories in which the",nl]

k273 = ["operating system searches for programs to execute",nl]

k28 = [["best","dir"],k281]

k281 = [k282]

k282 = ["Typing 'mkdir directoryname' will create a directory with the",nl]

k283 = ["name directoryname in your current directory",nl]

kbase = [k01,k011,k012,k02,k021,k023,k024,k03,k031,k032,k04,k05,k06,k061,k062,k07,k08,k09,k10,k11,k12,k13,k14,k15,k16,k17,k18,k19,k20,k21,k22,k23,k24,k25,k26,k27,k28]
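The knowledge base is an association list: each entry starts with a key (the keywords collected by the parser) followed by the advisory text, and `find` returns the tail of the first entry whose head equals the key. A minimal Python sketch of that lookup follows; the two-entry `KBASE` is an illustrative extract and the flattening of the text into single strings is an assumption made for brevity.

```python
# Illustrative extract: each entry is [key, text...], as in KBASE.TEXT.
KBASE = [
    [["sort", "file"], "In order to sort a file the command 'sort' should be used"],
    [["quota"], "Typing 'df' will tell you how full the disk is"],
]


def find(key, table):
    """Return the text of the first entry whose key matches, else an empty list."""
    for entry in table:
        if entry[0] == key:
            return entry[1:]        # the tail of the entry, like 'tl a' in the KRC
    return []
```

An empty result corresponds to the case in which the system has no information for the parsed question.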

15 Sep 85 SCRIPT.TEXT

split (a:x) kb = find a kbase:split2 a kb

s p l i t Z a kb = (hd ( f i n d a s c r i p t ) ) k b * f i n d a s c r i p t \= [ ]

= headin9twD:loop kb

c01 = [["cal","cmd"],c011]

c02 = [["cmd","cal"],c011]

c03 = [["cal"],c011]

c04 = [["calendar","cmd"],c021]

c05 = [["cmd","calendar"],c021]

c06 = [["calendar"],c021]

c07 = [["sort","cmd"],c031]

c08 = [["cmd","sort"],c031]

c09 = [["sort","file"],c031]

c10 = [["sort"],c031]

c11 = [["date","cmd"],c041]

c12 = [["cmd","date"],c041]

c13 = [["date"],c041]

c011 kb = [c0111q kb]

c0111q kb = "Do you want to simulate entering the cal command?":nl:c0111a kb

c0111a (["yes"]:rest) = "Do you wish to enter a month in the calendar?":nl:c0112a rest 0 0

c0111a (["y"]:rest) = "Do you wish to enter a month in the calendar?":nl:c0112a rest 0 0

c0111a (["no"]:rest) = "Enter a command or <ctl z> to exit:":nl:loop rest

c0111a (["n"]:rest) = "Enter a command or <ctl z> to exit:":nl:loop rest

c0111a ([x]:rest) = "Please enter 'yes' or 'no'":nl:c0111q rest

c0112a (["yes"]:rest) m y = "Which month is the calendar for?":nl:c0112aa rest 0 0

c0112a (["y"]:rest) m y = "Which month is the calendar for?":nl:c0112aa rest 0 0:nl

c0112a (["no"]:rest) m y = c0113a rest [" "] y

c0112a (["n"]:rest) m y = c0113a rest [" "] y

c0112a ([x]:rest) m y = "Please enter 'yes' or 'no'":nl:c0112a (["yes"]:rest) 0 0

c0112a (["show"]:rest) m y = "no parameters to the cal command have been entered yet":nl:c0112a (["yes"]:rest) 0 0

c0112a (["why"]:rest) m y = "A month is required for successful execution of the 'cal' command":nl:c0112a (["yes"]:rest) 0 0

c0112aa ([a]:rest) m y = c0112b rest a y, numval a >= 1 & numval a <= 9

                       = c0113a rest a y, numval a >= 1 & numval a <= 12

                       = "The month has to be an integer between 01 and 12":nl:c0112a (["yes"]:rest) 0 0

c0112b kb m y = c0113a kb [["0"]++[m]] y

c0113a kb m y = "Which year is the calendar for?":nl:c0113aa kb m y

c0113aa (["show"]:rest) m y = showheading:nl:"month = ":m:nl:c0113a rest m y

c0113aa (["why"]:rest) m y = "A year is required for successful execution of the 'cal' command":nl:c0113a rest m y

c0113aa ([a]:rest) m y = c0113b rest m a, numval a >= 1 & numval a <= 9

                       = c011out rest m a, numval a >= 1 & numval a <= 9999

                       = "The year has to be an integer between 1 and 9999":nl:c0113a rest m y

c0113b kb m y = c011out kb m [["0"]++[y]]

c011out kb m y = "type the command: cal ":m:sp:y:nl:loop kb
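The cal dialogue above validates a month and a year, pads single digits with a leading "0", and then prints the command to type. The same logic can be sketched in Python as a single function. This is an illustrative simplification, not the KRC program: the names `pad` and `cal_command` are hypothetical, and the interactive "show" and "why" replies are omitted.

```python
def pad(value):
    """Prefix a single character with '0', as c0112b and c0113b do."""
    return "0" + value if len(value) == 1 else value


def cal_command(month, year):
    """Build the suggested 'cal' command, or return an error message.

    month may be None when the user declines to give one.
    """
    if month is not None:
        if not (month.isdecimal() and 1 <= int(month) <= 12):
            return "The month has to be an integer between 01 and 12"
        month = pad(month)
    if not (year.isdecimal() and 1 <= int(year) <= 9999):
        return "The year has to be an integer between 1 and 9999"
    prefix = month + " " if month else ""
    return "type the command: cal " + prefix + pad(year)
```

The KRC version interleaves these checks with prompts read lazily from the input stream; the sketch keeps only the validation and the final message.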

c021 kb = [c0211q kb]

c0211q kb = "Do you want to simulate entering the calendar command?":nl:c0211a kb

c0211a (["yes"]:rest) = "Which month is the calendar for?":nl:c0211aa rest 0 0

c0211a (["y"]:rest) = "Which month is the calendar for?":nl:c0211aa rest 0 0

c0211a (["no"]:rest) = "Enter a command or <ctl z> to exit":nl:loop rest

c0211a (["n"]:rest) = "Enter a command or <ctl z> to exit":nl:loop rest

c0211a ([x]:rest) = "Please answer 'yes' or 'no'":nl:c0211q rest

c0211aa (["show"]:rest) m d = "No parameters to the calendar command have been entered yet":nl:c0211a (["yes"]:rest)

c0211aa (["why"]:rest) m d = "A day is needed for successful execution of the 'calendar' command":nl:c0211a (["yes"]:rest)

c0211aa ([a]:rest) m d = c0211b rest a d, numval a >= 1 & numval a <= 9

                       = c0212a rest a d, (numval a >= 1 & numval a <= 12) | {a = x; x <- ["Jan.","Feb.","Mar.","Apr.","May.","Jun.","Jul.","Aug.","Sep.","Oct.","Nov.","Dec.","*","January","February","March","April","May","June","July","August","September","October","November","December"]}

                       = "The month has to be an integer between 1 and 12 or of the form 'Jan.' or 'January' or '*' can be used":nl:c0211a (["yes"]:rest) m d

c0211b kb m d = c0212a kb [["0"]++[m]] d

c0212a kb m d = "Which day is the calendar for?":nl:c0212aa kb m d

c0212aa (["show"]:rest) m d = showheading:nl:"month = ":m:nl:c0212a rest m d

c0212aa (["why"]:rest) m d = "A day is needed for successful execution of the 'calendar' command":nl:c0212a rest m d

c0212aa ([a]:rest) m d = c0212b rest m a, numval a >= 1 & numval a <= 9

                       = c021out rest m a, numval a >= 1 & numval a <= 31

                       = "The day should be an integer between 1 and 31":nl:c0212a rest kb m d

c0212b kb m d = c021out kb m [["0"]++[d]]

c021out kb m d = "type the command: calendar ":m:d:nl:loop kb

c041 kb = [c0411q kb]

c0411q kb = "Do you want to simulate the 'date' command?":nl:c0411a kb

c0411a (["yes"]:rest) = "Which year is the date to be set to (last two digits only)?":nl:c0412a rest 0 0 0 0 0 0 0

c0411a (["y"]:rest) = "Which year is the date to be set to (last two digits only)?":nl:c0412a rest 0 0 0 0 0 0 0

c0411a (["no"]:rest) = "Enter a command or <ctl z> to exit":nl:loop rest

c0411a (["n"]:rest) = "Enter a command or <ctl z> to exit":nl:loop rest

c0411a ([x]:rest) = "Please answer 'yes' or 'no'":nl:c0411q rest

c0412a (["show"]:rest) y m d h min s g = "No parameters for the date command have been entered yet":nl:c0411a (["yes"]:rest)

c0412a (["why"]:rest) y m d h min s g = "A year is needed for successful execution of the date command":nl:c0411a (["yes"]:rest)

c0412a ([a]:rest) y m d h min s g = c0412b rest a m d h min s g, numval a >= 0 & numval a <= 9

                                  = c0413a rest a m d h min s g, numval a >= 0 & numval a <= 99

                                  = "The year should be an integer between 0 and 99":nl:c0411a (["yes"]:rest)

c0412b kb y m d h min s g = c0413a kb [["0"]++[y]] m d h min s g

c0413a kb y m d h min s g = "Which month is the date to be set to?":nl:c0413aa kb y m d h min s g

c0413aa (["show"]:rest) y m d h min s g = showheading:nl:"year = ":y:nl:c0413a rest y m d h min s g

c0413aa (["why"]:rest) y m d h min s g = "A month is needed for successful execution of the date command":nl:c0413a rest y m d h min s g

c0413aa ([a]:rest) y m d h min s g = c0413b rest y a d h min s g, numval a >= 1 & numval a <= 9

                                   = c0414a rest y a d h min s g, numval a >= 1 & numval a <= 12

                                   = "The month should be an integer between 1 and 12":nl:c0413a rest y m d h min s g

c0413b kb y m d h min s g = c0414a kb y [["0"]++[m]] d h min s g

c0414a kb y m d h min s g = "Which day is the date to be set to?":nl:c0414aa kb y m d h min s g

c0414aa (["show"]:rest) y m d h min s g = showheading:nl:"year = ":y:nl:"month = ":m:nl:c0414a rest y m d h min s g

c0414aa (["why"]:rest) y m d h min s g = "A day is needed for successful execution of the date command":nl:c0414a rest y m d h min s g

c0414aa ([a]:rest) y m d h min s g = c0414b rest y m a h min s g, numval a >= 1 & numval a <= 9

                                   = c0415a rest y m a h min s g, numval a >= 1 & numval a <= 31

                                   = "The day should be an integer between 1 and 31":nl:c0414a rest y m d h min s g

c0414b kb y m d h min s g = c0415a kb y m [["0"]++[d]] h min s g

c0415a kb y m d h min s g = "What hour is the clock to be set to?":nl:c0415aa kb y m d h min s g

c0415aa (["show"]:rest) y m d h min s g = showheading:nl:"year = ":y:nl:"month = ":m:nl:"day = ":d:nl:c0415a rest y m d h min s g

c0415aa (["why"]:rest) y m d h min s g = "An hour is needed for successful execution of the date command":nl:c0415a rest y m d h min s g

c0415aa ([a]:rest) y m d h min s g = c0415b rest y m d a min s g, numval a >= 0 & numval a <= 9

                                   = c0416a rest y m d a min s g, numval a >= 0 & numval a <= 23

                                   = "The hour should be an integer between 0 and 23":nl:c0415a rest y m d h min s g

c0415b kb y m d h min s g = c0416a kb y m d [["0"]++[h]] min s g

c0416a kb y m d h min s g = "What number of minutes is the date to be set to?":nl:c0416aa kb y m d h min s g

c0416aa (["show"]:rest) y m d h min s g = showheading:nl:"year = ":y:nl:"month = ":m:nl:"day = ":d:nl:"hour = ":h:nl:c0416a rest y m d h min s g

c0416aa (["why"]:rest) y m d h min s g = "A minute figure is needed for successful execution of the date command":nl:c0416a rest y m d h min s g

c0416aa ([a]:rest) y m d h min s g = c0416b rest y m d h a s g, numval a >= 0 & numval a <= 9

                                   = c0417a rest y m d h a s g, numval a >= 0 & numval a <= 59

                                   = "The number of minutes should be an integer between 0 and 59":nl:c0416a rest y m d h min s g

c0416b kb y m d h min s g = c0417a kb y m d h [["0"]++[min]] s g

c0417a kb y m d h min s g = "How many seconds is the date to be set to?":nl:c0417aa kb y m d h min s g

c0417aa (["show"]:rest) y m d h min s g = showheading:nl:"year = ":y:nl:"month = ":m:nl:"day = ":d:nl:"hour = ":h:nl:"minutes = ":min:nl:c0417a rest y m d h min s g

c0417aa (["why"]:rest) y m d h min s g = "A second figure is needed for successful execution of the date command":nl:c0417a rest y m d h min s g

c0417aa ([a]:rest) y m d h min s g = c0417b rest y m d h min a g, numval a >= 0 & numval a <= 9

                                   = c0418a rest y m d h min a g, numval a >= 0 & numval a <= 59

                                   = "The number of seconds should be an integer between 0 and 59":nl:c0417a rest y m d h min s g

c0417b kb y m d h min s g = c0418a kb y m d h min [["0"]++[s]] g

c0418a kb y m d h min s g = "Is the date GMT?":nl:c0418aa kb y m d h min s g

c0418aa (["show"]:rest) y m d h min s g = showheading:nl:"year = ":y:nl:"month = ":m:nl:"day = ":d:nl:"hour = ":h:nl:"minutes = ":min:nl:"seconds = ":s:nl:c0418a rest y m d h min s g

c0418aa (["why"]:rest) y m d h min s g = "if the date is to be set to GMT then a parameter needs to be set":nl:c0418a rest y m d h min s g

c0418aa (["yes"]:rest) y m d h min s g = c041out rest y m d h min s ["-u"]

c0418aa (["y"]:rest) y m d h min s g = c041out rest y m d h min s ["-u"]

c0418aa (["no"]:rest) y m d h min s g = c041out rest y m d h min s [" "]

c0418aa (["n"]:rest) y m d h min s g = c041out rest y m d h min s [" "]

c0418aa ([x]:rest) y m d h min s g = "Please enter 'yes' or 'no' ":nl:c0418a rest y m d h min s g

c041out kb y m d h min s g = "type the command: date ":g:sp:y:m:d:h:min:s:nl:loop kb

script = [c01,c02,c03,c04,c05,c06,c07,c08,c09,c10,c11,c12,c13]

numval l = mknum 0 (explode l)

mknum n [] = n

mknum n (a:x) = mknum (10 * n + digitval a) x, digit a
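The pair `numval` and `mknum` converts a string of digits to a number by accumulating from left to right: each step multiplies the running total by ten and adds the next digit. A Python sketch of the same accumulation follows. The KRC definition gives no result for a non-digit character; raising `ValueError` in that case is an assumption made here.

```python
def mknum(n, chars):
    """Accumulate decimal digits left to right: n becomes 10*n + digit."""
    for ch in chars:
        if not ("0" <= ch <= "9"):
            # Assumption: the KRC guard 'digit a' simply fails here.
            raise ValueError("not a digit: " + repr(ch))
        n = 10 * n + (ord(ch) - ord("0"))
    return n


def numval(text):
    """Numeric value of a digit string, starting the accumulator at 0."""
    return mknum(0, list(text))
```

For the input "12" the accumulator takes the values 0, 1 and 12 in turn.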

showheading = ["The value of the parameters entered so far"]

15 Sep 85 SORT.TEXT

c031 kb = [c0311q kb]

c0311q kb = "Do you want to simulate the sort command?":nl:c0311a kb

c0311a (["yes"]:rest) = "Do you wish to use the standard input?":nl:c0312a rest 0 0 0 0 0 0 0 0 0 0 [" "] 0 0

c0311a (["y"]:rest) = "Do you wish to use the standard input?":nl:c0312a rest 0 0 0 0 0 0 0 0 0 0 [" "] 0 0

c0311a (["no"]:rest) = "Enter a command or <ctl z> to exit":nl:loop rest

c0311a (["n"]:rest) = "Enter a command or <ctl z> to exit":nl:loop rest

c0311a ([x]:rest) = error1:nl:c0311q rest

c0312a (["show"]:rest) si m u b d f i n r tx p o t = "No parameters for the sort command have been input yet":nl:c0311a (["yes"]:rest)

c0312a (["why"]:rest) si m u b d f i n r tx p o t = c0312w:nl:c0311a (["yes"]:rest)

c0312a (["yes"]:rest) si m u b d f i n r tx p o t = c0313a rest ["-"] m u b d f i n r tx p o t

c0312a (["y"]:rest) si m u b d f i n r tx p o t = c0313a rest ["-"] m u b d f i n r tx p o t

c0312a (["no"]:rest) si m u b d f i n r tx p o t = c0312b rest si m u b d f i n r tx p o t

c0312a (["n"]:rest) si m u b d f i n r tx p o t = c0312b rest si m u b d f i n r tx p o t

c0312a ([x]:rest) si m u b d f i n r tx p o t = error1:nl:c0311a (["yes"]:rest)

c0312b kb si m u b d f i n r tx p o t = "what is the input file called?":nl:c0312bb kb si m u b d f i n r tx p o t

c0312bb ([x]:rest) si m u b d f i n r tx p o t = c0313a rest [[x],sp] m u b d f i n r tx p o t

c0313a kb si m u b d f i n r tx p o t = "Do you only wish to merge files?":nl:c0313aa kb si m u b d f i n r tx p o t

c0313aa (["show"]:rest) si m u b d f i n r tx p o t = showheading:nl:"si":si:nl:nl:c0313a rest si m u b d f i n r tx p o t

c0313aa (["why"]:rest) si m u b d f i n r tx p o t = c0313w:nl:c0313a rest si m u b d f i n r tx p o t

c0313aa (["yes"]:rest) si m u b d f i n r tx p o t = c0314a rest si ["m"] u b d f i n r tx p o t

c0313aa (["y"]:rest) si m u b d f i n r tx p o t = c0314a rest si ["m"] u b d f i n r tx p o t

c0313aa (["no"]:rest) si m u b d f i n r tx p o t = c0314a rest si [" "] u b d f i n r tx p o t

c0313aa (["n"]:rest) si m u b d f i n r tx p o t = c0314a rest si [" "] u b d f i n r tx p o t

c0313aa ([x]:rest) si m u b d f i n r tx p o t = error1:nl:c0313a rest si m u b d f i n r tx p o t

c0314a kb si m u b d f i n r tx p o t = "Do you wish to suppress all but one in each set of equal lines?":nl:c0314aa kb si m u b d f i n r tx p o t

c0314aa (["show"]:rest) si m u b d f i n r tx p o t = showheading:nl:param1:si:nl:param2:m:nl:nl:c0314a rest si m u b d f i n r tx p o t

c0314aa (["why"]:rest) si m u b d f i n r tx p o t = c0314w:nl:c0314a rest si m u b d f i n r tx p o t

c0314aa (["yes"]:rest) si m u b d f i n r tx p o t = c0315a rest si m ["u"] b d f i n r tx p o t

c0314aa (["y"]:rest) si m u b d f i n r tx p o t = c0315a rest si m ["u"] b d f i n r tx p o t

c0314aa (["no"]:rest) si m u b d f i n r tx p o t = c0315a rest si m [" "] b d f i n r tx p o t

c0314aa (["n"]:rest) si m u b d f i n r tx p o t = c0315a rest si m [" "] b d f i n r tx p o t

c0314aa ([x]:rest) si m u b d f i n r tx p o t = error1:nl:c0314a rest si m u b d f i n r tx p o t

c0315a kb si m u b d f i n r tx p o t = "Do you wish to ignore leading blanks (spaces and tabs) in field comparisons?":nl:c0315aa kb si m u b d f i n r tx p o t

c0315aa (["show"]:rest) si m u b d f i n r tx p o t = showheading:nl:param1:si:nl:param2:m:nl:param3:u:nl:nl:c0315a rest si m u b d f i n r tx p o t

c0315aa (["why"]:rest) si m u b d f i n r tx p o t = c0315w:nl:c0315a rest si m u b d f i n r tx p o t

c0315aa (["yes"]:rest) si m u b d f i n r tx p o t = c0316a rest si m u ["b"] d f i ["n"] r tx p o t

c0315aa (["y"]:rest) si m u b d f i n r tx p o t = c0316a rest si m u ["b"] d f i ["n"] r tx p o t

c0315aa (["no"]:rest) si m u b d f i n r tx p o t = c0316a rest si m u [" "] d f i [" "] r tx p o t

c0315aa (["n"]:rest) si m u b d f i n r tx p o t = c0316a rest si m u [" "] d f i [" "] r tx p o t

c0315aa ([x]:rest) si m u b d f i n r tx p o t = error1:nl:c0315a rest si m u b d f i n r tx p o t

c0316a kb si m u b d f i n r tx p o t = "Do you wish only letters, digits and blanks to be significant in comparisons?":nl:c0316aa kb si m u b d f i n r tx p o t

c0316aa (["show"]:rest) si m u b d f i n r tx p o t = showheading:nl:"si":si:nl:param2:m:nl:param3:u:nl:param4:b:nl:nl:c0316a rest si m u b d f i n r tx p o t

c0316aa (["why"]:rest) si m u b d f i n r tx p o t = c0316w:nl:c0316a rest si m u b d f i n r tx p o t

c0316aa (["yes"]:rest) si m u b d f i n r tx p o t = c0317a rest si m u b ["d"] f i n r tx p o t

c0316aa (["y"]:rest) si m u b d f i n r tx p o t = c0317a rest si m u b ["d"] f i n r tx p o t

c0316aa (["no"]:rest) si m u b d f i n r tx p o t = c0317a rest si m u b [" "] f i n r tx p o t

c0316aa (["n"]:rest) si m u b d f i n r tx p o t = c0317a rest si m u b [" "] f i n r tx p o t


c0316aa ([x]:rest) si m u b d f i n r tx p o t = error1:nl:c0316a rest si m u b d f i n r tx p o t

c0317a kb si m u b d f i n r tx p o t = "Do you wish to fold upper case letters onto lower case?":nl:c0317aa kb si m u b d f i n r tx p o t

c0317aa (["show"]:rest) si m u b d f i n r tx p o t = showheading:nl:param1:si:nl:param2:m:nl:param3:u:nl:param4:b:nl:param5:d:nl:nl:c0317a rest si m u b d f i n r tx p o t

c0317aa (["why"]:rest) si m u b d f i n r tx p o t = c0317w:nl:c0317a rest si m u b d f i n r tx p o t

c0317aa (["yes"]:rest) si m u b d f i n r tx p o t = c0318a rest si m u b d ["f"] i n r tx p o t

c0317aa (["y"]:rest) si m u b d f i n r tx p o t = c0318a rest si m u b d ["f"] i n r tx p o t

c0317aa (["no"]:rest) si m u b d f i n r tx p o t = c0318a rest si m u b d [" "] i n r tx p o t

c0317aa (["n"]:rest) si m u b d f i n r tx p o t = c0318a rest si m u b d [" "] i n r tx p o t

c0317aa ([x]:rest) si m u b d f i n r tx p o t = error1:nl:c0317a rest si m u b d f i n r tx p o t

c0318a kb si m u b d f i n r tx p o t = "Do you wish to ignore characters outside the ascii range (040-0176) in non numeric comparisons?":nl:c0318aa kb si m u b d f i n r tx p o t

c0318aa (["show"]:rest) si m u b d f i n r tx p o t = showheading:nl:param1:si:nl:param2:m:nl:param3:u:nl:param4:b:nl:param5:d:nl:param6:f:nl:nl:c0318a rest si m u b d f i n r tx p o t

c0318aa (["why"]:rest) si m u b d f i n r tx p o t = c0318w:nl:c0318a rest si m u b d f i n r tx p o t

c0318aa (["yes"]:rest) si m u b d f i n r tx p o t = c0319a rest si m u b d f ["i"] n r tx p o t

c0318aa (["y"]:rest) si m u b d f i n r tx p o t = c0319a rest si m u b d f ["i"] n r tx p o t

c0318aa (["no"]:rest) si m u b d f i n r tx p o t = c0319a rest si m u b d f [" "] n r tx p o t

c0318aa (["n"]:rest) si m u b d f i n r tx p o t = c0319a rest si m u b d f [" "] n r tx p o t

c0318aa ([x]:rest) si m u b d f i n r tx p o t = error1:nl:c0318a rest si m u b d f i n r tx p o t

c0319a kb si m u b d f i n r tx p o t = "Do you wish to sort in reverse order?":nl:c0319aa kb si m u b d f i n r tx p o t

c0319aa (["show"]:rest) si m u b d f i n r tx p o t = showheading:nl:param1:si:nl:param2:m:nl:param3:u:nl:param4:b:nl:param5:d:nl:param6:f:nl:param7:i:nl:nl:c0319a rest si m u b d f i n r tx p o t

c0319aa (["why"]:rest) si m u b d f i n r tx p o t = c0319w:nl:c0319a rest si m u b d f i n r tx p o t

c0319aa (["yes"]:rest) si m u b d f i n r tx p o t = c0320a rest si m u b d f i n ["r"] tx p o t

c0319aa (["y"]:rest) si m u b d f i n r tx p o t = c0320a rest si m u b d f i n ["r"] tx p o t

c0319aa (["no"]:rest) si m u b d f i n r tx p o t = c0320a rest si m u b d f i n [" "] tx p o t

c0319aa (["n"]:rest) si m u b d f i n r tx p o t = c0320a rest si m u b d f i n [" "] tx p o t

c0319aa ([x]:rest) si m u b d f i n r tx p o t = error1:nl:c0319a rest si m u b d f i n r tx p o t

c0320a kb si m u b d f i n r tx p o t = "how many fields would you like to sort on?":nl:c0320aa kb si m u b d f i n r tx p o t 0 0

c0320aa (["show"]:rest) si m u b d f i n r tx p o t count field = showheading:nl:param1:si:nl:param2:m:nl:param3:u:nl:param4:b:nl:param5:d:nl:param6:f:nl:param7:i:nl:param8:r:nl:nl:c0320a rest si m u b d f i n r tx p o t

c0320aa (["why"]:rest) si m u b d f i n r tx p o t count field = c0320w:nl:c0320a rest si m u b d f i n r tx p o t

c0320aa ([a]:rest) si m u b d f i n r tx p o t count field = c0320b rest si m u b d f i n r tx p o t (numval a) (pred (numval a)), numval a >= 0

c0320b kb si m u b d f i n r tx p o t count field = "Input start of field number ":(count-field):nl:c0320bb kb si m u b d f i n r tx p o t count field

c0320bb ([a]:rest) si m u b d f i n r tx p o t count field = c0320bbb rest si m u b d f i n r tx [[p]:["+"]:[a]] o t count field

c0320bbb kb si m u b d f i n r tx p o t count field = "Input the end of field number ":(count-field):nl:c0320c kb si m u b d f i n r tx p o t count field

c0320c ([a]:rest) si m u b d f i n r tx p o t count field = c0321a rest si m u b d f i n r tx [[p]:["-"]:[a],sp] o t, pred field < 0

                                                         = c0320b rest si m u b d f i n r tx [[p]:["-"]:[a],sp] o t count (pred field)

c0321a kb si m u b d f i n r tx p o t = "Do you wish to use a specific directory in which the temporary files should be made?":nl:c0321aa kb si m u b d f i n r tx p o t

c0321aa (["show"]:rest) si m u b d f i n r tx p o t = showheading:nl:param1:si:nl:param2:m:nl:param3:u:nl:param4:b:nl:param5:d:nl:param6:f:nl:param7:i:nl:param8:r:nl:param9:p:nl:nl:c0321a rest si m u b d f i n r tx p o t

c0321aa (["why"]:rest) si m u b d f i n r tx p o t = c0321w:nl:c0321a rest si m u b d f i n r tx p o t

c0321aa (["yes"]:rest) si m u b d f i n r tx p o t = c0321b rest si m u b d f i n r tx p o t

c0321aa (["y"]:rest) si m u b d f i n r tx p o t = c0321b rest si m u b d f i n r tx p o t

c0321aa (["no"]:rest) si m u b d f i n r tx p o t = c0322a rest si m u b d f i n r tx p o [" "]

c0321aa (["n"]:rest) si m u b d f i n r tx p o t = c0322a rest si m u b d f i n r tx p o [" "]

c0321aa ([x]:rest) si m u b d f i n r tx p o t = error1:nl:c0321a rest si m u b d f i n r tx p o t

c0321b kb si m u b d f i n r tx p o t = "what is the directory name?":nl:c0321bb kb si m u b d f i n r tx p o t


c0321bb ([x]:rest) si m u b d f i n r tx p o t = c0322a rest si m u b d f i n r tx p o [["-T"]:[x],sp]

c0322a kb si m u b d f i n r tx p o t = "Do you wish to use the standard output?":nl:c0322aa kb si m u b d f i n r tx p o t

c0322aa (["show"]:rest) si m u b d f i n r tx p o t = showheading:nl:param1:si:nl:param2:m:nl:param3:u:nl:param4:b:nl:param5:d:nl:param6:f:nl:param7:i:nl:param8:r:nl:param9:p:nl:param10:t:nl:nl:c0322a rest si m u b d f i n r tx p o t

c0322aa (["why"]:rest) si m u b d f i n r tx p o t = c0322w:nl:c0322a rest si m u b d f i n r tx p o t

c0322aa (["yes"]:rest) si m u b d f i n r tx p o t = c0323a rest si m u b d f i n r tx p [" "] t

c0322aa (["y"]:rest) si m u b d f i n r tx p o t = c0323a rest si m u b d f i n r tx p [" "] t

c0322aa (["no"]:rest) si m u b d f i n r tx p o t = c0322b rest si m u b d f i n r tx p o t

c0322aa (["n"]:rest) si m u b d f i n r tx p o t = c0322b rest si m u b d f i n r tx p o t

c0322aa ([x]:rest) si m u b d f i n r tx p o t = error1:nl:c0322a rest si m u b d f i n r tx p o t

c0322b kb si m u b d f i n r tx p o t = "what would you like to call the output file?":nl:c0322bb kb si m u b d f i n r tx p o t

c0322bb ([x]:rest) si m u b d f i n r tx p o t = c0323a rest si m u b d f i n r tx p [["-o"]:sp:[x],sp] t

c0323a kb si m u b d f i n r tx p o t = "Are the fields separated by spaces?":nl:c0323aa kb si m u b d f i n r tx p o t

c0323aa (["show"]:rest) si m u b d f i n r tx p o t = showheading:nl:param1:si:nl:param2:m:nl:param3:u:nl:param4:b:nl:param5:d:nl:param6:f:nl:param7:i:nl:param8:r:nl:param9:p:nl:param10:t:nl:param11:tx:nl:param12:o:nl:nl:c0323a rest si m u b d f i n r tx p o t

c0323aa (["why"]:rest) si m u b d f i n r tx p o t = c0323w:nl:c0323a rest si m u b d f i n r tx p o t

c0323aa (["yes"]:rest) si m u b d f i n r tx p o t = c031out rest si m u b d f i n r [" "] p o t

c0323aa (["y"]:rest) si m u b d f i n r tx p o t = c031out rest si m u b d f i n r [" "] p o t

c0323aa (["no"]:rest) si m u b d f i n r tx p o t = c0323b rest si m u b d f i n r tx p o t

c0323aa (["n"]:rest) si m u b d f i n r tx p o t = c0323b rest si m u b d f i n r tx p o t

c0323aa ([x]:rest) si m u b d f i n r tx p o t = error1:nl:c0323a rest si m u b d f i n r tx p o t

c0323b kb si m u b d f i n r tx p o t = "please enter the character the fields are separated by":nl:c0323bb kb si m u b d f i n r tx p o t

c0323bb ([x]:rest) si m u b d f i n r tx p o t = c031out rest si m u b d f i n r [["-t"]++[x]] p o t

c031out kb si m u b d f i n r tx p o t = "type the command: sort ":si:m:u:b:d:f:i:n:r:tx:p:o:t:nl:loop kb
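The sort dialogue threads thirteen parameters through every question and concatenates them at the end. A much reduced Python sketch of that final assembly step is given below. It is an illustration under stated assumptions: the function name `sort_command`, its keyword arguments, the handling of only six of the options, and the placing of the input file last are all choices made here, not features of the KRC listing.

```python
def sort_command(input_file=None, merge=False, unique=False, reverse=False,
                 output_file=None, separator=None):
    """Assemble a sort command line from a subset of the dialogue answers."""
    parts = ["sort"]
    if merge:
        parts.append("-m")            # merge only
    if unique:
        parts.append("-u")            # keep one line of each equal set
    if reverse:
        parts.append("-r")            # reverse the sense of comparisons
    if separator is not None:
        parts.append("-t" + separator)
    if output_file is not None:
        parts.extend(["-o", output_file])
    # "-" stands for the standard input, as in the dialogue's first question.
    parts.append(input_file if input_file is not None else "-")
    return " ".join(parts)
```

Collecting the answers in named arguments avoids repeating the full parameter list on every line, which is where most of the bulk of the KRC version comes from.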


pred x = x - 1

c0312w = [nl,"If no input files are named, the standard input is sorted.",nl]

c0313w = [nl,"The command allows files to be merged as well as sorted",nl]

c0314w = [nl,"There is a choice between deleting all but one equal line or",nl,"outputting them all",nl]

c0315w = [nl,"When records are compared the number of leading spaces may be",nl,"different in each file and thus can be ignored",nl]

c0316w = [nl,"The sort or merge order being the same as a dictionary",nl,"i.e., letters, digits and then blanks. Special characters being ignored",nl]

c0317w = [nl,"Upper case characters can be regarded as equivalent to lower case",nl,"ones if this parameter is set",nl]

c0318w = [nl,"In comparisons only the normal keyboard characters are to be used",nl,"from ascii 040 (sp) to 176 (~)",nl]

c0319w = [nl,"The sort can be descending or 'reversed' i.e., in nondescending order",nl]

c0320w = [nl,"The sort can be based upon one or more keys within a record",nl]

c0321w = [nl,"The argument is the name of a directory in which ",nl,"temporary files should be made",nl]

c0322w = [nl,"The argument is the name of an output file to use instead of ",nl,"the standard output. The file may be the same as one of the inputs",nl]

c0323w = [nl,"Fields within a record are normally separated by spaces,",nl,"however it is possible to assign other characters as a field delimiter",nl]

error1 = [nl,"Please answer 'yes' or 'no'"]

param1 = [nl,"The input file is = "]

param2 = ["merge switch is = "]

param3 = ["equal lines switch = "]

param4 = ["leading spaces = "]

param5 = ["dictionary order = "]

param6 = ["UPPER = lower case = "]

param7 = ["ascii limits set = "]

param8 = ["reverse sort/merge = "]

param9 = ["field positions = "]

param10 = ["directory name = "]

param11 = ["field delimiter = "]

param12 = ["output file name = "]

APPENDIX F

THE MANUAL REFERENCE FOR THE SORT COMMAND

FROM THE UNIX PROGRAMMERS GUIDE

APPENDIX A

THE B.N.F FOR THE CONCEPTUAL MODEL

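[Diagram: three boxes. The first is labelled <section-name> and contains <desc-spec> and <section-body>; the second is labelled <para-type> and contains <body-type>; the third is labelled <text-name> and contains <text>.]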

<section-spec>::= <section-name> <desc-spec> <section-body>

<section-name>::= <simple-string>

<section-body>::= <pred> => ref <section-spec> | <pred> => <text-gen>

<pred>::= (<paragraph-spec>)? | (<paragraph-spec> <operator> <action>)? |

          {(<paragraph-spec>)? <operator> <pred>}*

<paragraph-spec>::= <para-type> <body-type>

<action>::= <pred> | <const>

<const>::= <simple-string>

<para-type>::= <para-name> <type> | <xor> <xor-para>

<xor-para>::= <para-name> | <para-name> <xor-para>

<text-gen>::= <text-name> <text>

<para-name>::= <simple-string>

<text-name>::= <simple-string> <text-no>

<type>::= [<type-body>] | <type-double-body>

<type-body>::= fact | rule | category | number | phrase

<type-double-body>::= [category] [rule] | [fact] [rule]

<body-type>::= <fact-body> | <rule-body> | <category-body> |

               <number-body> | <phrase-body> | <category-rule-body> |

               <fact-rule-body>

<fact-body>::= <desc-spec> <expl-spec> <query-spec>

<rule-body>::= <desc-spec> <expl-spec> <rule-spec> <query-spec>

<category-body>::= <desc-spec> <expl-spec> <options-spec> <query-spec>

<number-body>::= <desc-spec> <expl-spec> <number-spec> <query-spec>

<phrase-body>::= <desc-spec> <expl-spec> <query-spec>

<category-rule-body>::= <desc-spec> <expl-spec> <options-spec>

                        <rule-spec> <query-spec>

<fact-rule-body>::= <desc-spec> <expl-spec> <rule-spec> <query-spec>

<string-item>::= @<para-name> | »<para-name> | <simple-string>

<simple-string>::= <name> | <char-sequence>

<name>::= <lowercase> | <name> <letter> | <name> <digit>

<lowercase>::= a..z

<letter>::= <lowercase> | A..Z

<digit>::= 0..9

<text>::= <string> {<string>}*

<string>::= <string-item> {<string-item>}*

<desc-spec>::= desc: <string>

<expl-spec>::= expl: <string>

<options-spec>::= options: <option> {, <option>}*

<number-spec>::= range: <number>..<number>

<rule-spec>::= rule: <rule> {,<rule>}* |

               rules: <rule> {,<rule>}*

<query-spec>::= query: <text>

<option>::= <name>-<string> | <name>

<rule>::= <expression> [if <pattern>] | <pattern> [if <pattern>]

          | use <goal> [if <pattern>]

<pattern>::= <p1> {or <p1>}*

<p1>::= <p2> {and <p2>}*

<p2>::= not <p2> | <valuespec> <relation> <valuespec> | <name>=<name>

        | (<pattern>) | <true> | <false> | <name>

<valuespec>::= <name> | <number>

<operator>::= <relation> | <boolean-op>

<relation>::= = | <> | < | > | <= | =>

<expression>::= <e1> { <addop> <e1>}*

<addop>::= + | -

<multop>::= * | /

<boolean-op>::= ∧ | ∨ | not

<e2>::= ( <expression> ) | <valuespec>

<valuespec>::=<name><number>

<text-no>::= <digit> | <text-no><digit>

<number>::= <unsigned-number> | <sign> <unsigned-number>

<sign>::= + | -

<unsigned-number>::= <digit-sequence> | <decimal-number> | <exp-number>

<char-sequence>::= <letter> | <char-sequence><letter>

<digit-sequence>::= <digit> | <digit-sequence><digit>

<decimal-number>::= <digit-sequence> . <digit-sequence>

<exp-number>::= <digit-sequence> <mantisa-number> | <decimal-number> <mantisa-number>

<mantisa-number>::= E <sign> <digit-sequence>
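The rules for <pattern>, <p1> and <p2> above fix the precedence of the boolean connectives: "not" binds tightest, then "and", then "or". A small recursive-descent evaluator in Python makes that concrete. It is a sketch only: the function name `evaluate`, the pre-tokenised input and the `facts` dictionary are assumptions, and the relational alternatives of <p2> are left out.

```python
def evaluate(tokens, facts):
    """Evaluate a token list such as ['not', 'a', 'and', 'b'] against facts."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def pattern():              # <pattern> ::= <p1> {or <p1>}*
        value = p1()
        while peek() == "or":
            advance()
            rhs = p1()
            value = value or rhs
        return value

    def p1():                   # <p1> ::= <p2> {and <p2>}*
        value = p2()
        while peek() == "and":
            advance()
            rhs = p2()
            value = value and rhs
        return value

    def p2():                   # not <p2> | (<pattern>) | true | false | <name>
        token = advance()
        if token == "not":
            return not p2()
        if token == "(":
            value = pattern()
            if advance() != ")":
                raise ValueError("expected ')'")
            return value
        if token == "true":
            return True
        if token == "false":
            return False
        return bool(facts[token])   # a bare name is looked up as a fact

    result = pattern()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + repr(tokens[pos]))
    return result
```

One function per non-terminal keeps the code in step with the grammar, so a change to a rule maps onto a change in a single function.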

APPENDIX B

THE CONCEPTUAL MODEL FOR THE UNIX DOMAIN

title

unix help facility

(beginner)? => ref getting-started

(not beginner)? => ref system-commands

getting-started

Desc: This section enables people to get started with Unix.

(not reg-user)? => ref reg-user-inst

(not reg-user)? ∧ (not log-in)? => ref log-in-inst

(not reg-user)? ∧ (not log-in)? ∧ (cont)? => ref system-commands

(not reg-user)? ∧ (not log-in)? ∧ (not cont)? => text1

(reg-user)? ∧ (not log-in)? => ref log-in-inst

(reg-user)? ∧ (not log-in)? ∧ (cont)? => ref system-commands

(reg-user)? ∧ (not log-in)? ∧ (not cont)? => text2

(reg-user)? ∧ (log-in)? ∧ (cont)? => ref system-commands

(reg-user)? ∧ (log-in)? ∧ (not cont)? => text3

beginner [fact]

desc: you are new to this operating system

expl: this is to determine the level of help necessary

query: are you a beginner with UNIX

reg-user [fact]

desc: tests for registered users

expl: to determine the procedure to become a registered user

needs to be explained

query: have you been set up as a registered user

reg-user-inst

desc: this section deals with becoming a registered user

(user-type = D.Phil)? => text4

(user-type = M.Sc)? => text5

(user-type = B.A)? ∧ (student-year = 4)? => text6

(user-type = B.A)? ∧ non-fresher => text7

(user-type = B.A)? ∧ (student-year = 4)? ∧ (not non-fresher)? => text8

(user-type = lecturer)? => text9

log-in [fact]

desc: tests for logins

expl: to determine whether the instructions regarding logging in

need to be specified

query: have you ever logged in

user-type [category]

desc: type to be registered under

expl: tests to see which type of user the system should be set up for

options: D.Phil, M.Sc, B.A, Lecturer

query: what type of user are you

student-year [number]

desc: year of study

expl: tests to see which computers undergraduates can use

range: 1..4

query: which year are you in

cont [fact]

desc: user is not a beginner but may want command information

expl: to see if the user requires further help

query: do you wish to look at the system commands

text1

you should now be able to log in and out N.B please remember your password.

text2

you can now log in please remember your password

text3

you are not considered to be a beginner by this system

text4

The D.Phil students have access to all research machines

text5

The M.Sc students can use PRGV which runs UNIX and VAX2 under VMS

text6

The fourth year undergraduates have access to VAX3

text7

The undergraduates can use the 380Z micro computers

text8

first years should seek permission from their tutors to use the computers

text9

The lecturers can use any machines including those of other universities

via the PAD.

log-in-inst

desc: this section deals with how to log in

((username) ∧ (logok ∨ logko))? => text10

username [phrase]

desc: the user's name

expl: the user's initials are needed in order to compose the user's login code

query: what are your initials

xor

logok

logko

logok [fact]

desc: the user has logged in successfully

expl: the login status of the user is needed

query: is your prompt visible

logko [fact]

desc: the user has not been able to login

expl: the user may still be having problems logging in

query: are you logged in ok

text10

PRG .. @username is your username and your password is .. @password

switch on the terminal and when the prompt

login:

appears, type your username i.e. PRG .. @username

then the prompt

password:

will appear. So type in your password i.e. @password

you should now be logged in and the prompt

$

should appear.

password [phrase]

desc: the user needs a password

expl: @username is your username but a password is needed for security

query: what is your password

non-fresher [fact] [rule]

desc: not first year student

rule: student-year ≠ 1
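
The way the rules of reg-user-inst select their texts can be illustrated with a small sketch in Python. It is not part of the ESP Advisor knowledge base. It assumes the reading given by the KRL source in Appendix C, in which non-fresher is derived as "student-year is not 1" and the first-year rule tests "student-year is not 4". The names non_fresher, REG_USER_INST and fire are hypothetical.

```python
def non_fresher(facts):
    # Derived fact: true for any year of study other than the first.
    return facts["student-year"] != 1


# Each rule pairs a condition over the known facts with the text it selects.
# The user-type test comes first so that student-year is only consulted for
# B.A users, mirroring the order of the conditions in the model.
REG_USER_INST = [
    (lambda f: f["user-type"] == "D.Phil", "text4"),
    (lambda f: f["user-type"] == "M.Sc", "text5"),
    (lambda f: f["user-type"] == "B.A" and f["student-year"] == 4, "text6"),
    (lambda f: f["user-type"] == "B.A" and non_fresher(f), "text7"),
    (lambda f: f["user-type"] == "B.A" and f["student-year"] != 4
               and not non_fresher(f), "text8"),
    (lambda f: f["user-type"] == "lecturer", "text9"),
]


def fire(rules, facts):
    """Return the texts of every rule whose condition holds, in rule order."""
    return [text for condition, text in rules if condition(facts)]
```

For a fourth year B.A student both the fourth year rule and the non-fresher rule hold, so fire returns text6 followed by text7; a first year receives only text8.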

system-commands

desc: this section should cover the UNIX commands and enable the user to

use and select the correct one.

(help-area = basic-login)? => ref section1-empty

(help-area = boolean-functions)? => ref section1-empty

(help-area = change)? => ref section1-empty

(help-area = data-manipulation)? => ref section1-empty

(help-area = editors)? => ref section1-empty

(help-area = help-facilities)? => ref section1-empty

(help-area = hardware-system-calls)? => ref section1-empty

(help-area = library-routines)? => ref section1-empty

(help-area = maths-functions)? => ref section1-empty

(help-area = printing)? => ref section1-empty

(help-area = prog-env-cmds)? => ref section1-empty

(help-area = remote-useage)? => ref section1-empty

(help-area = system-calls)? => ref section1-empty

(help-area = tabs)? => ref section1-empty

(help-area = chronological-functions)? => ref s16-watch

(help-area = others)? => ref section1-empty

(help-area = not-sure)? => ref section1-empty

help-area [category]

desc: area of unix that help is needed on

expl: tests to see which area of the operating system further expansion

is needed upon to aid the user

options: basic-login-functions, boolean-functions, change,

data-manipulation-tools, editors, help-facilities-in-unix,

hardware-system-calls, library-routines, mail-facilities,

maths-functions, printing, programming-environment-commands,

remote-useage-facilities, system-calls, tabs,

chronological-functions, others, not-sure.

query: in which area do you require help

section1-empty

desc: this is a dummy section used whilst the knowledge base is being

built for completeness and consistency

empty body

chronological-functions

desc: this section deals with the calendar, date and time functions.

(option-16 = cal)? => ref section-16.1

(option-16 = calendar)? => ref section-16.2

(option-16 = date)? => ref section-16.3

(option-16 = time)? => ref section-16.4

(option-16 = expanded-format)? => ref section-16.5

(option-16 = none)? => quit

option-16 [category]

desc: the calendar, date and time functions

expl: tests to see which function from the chronological ones is to be used

options: cal, calendar, date, time, expanded-format, none

query: please select the function you are interested in

section-16.1

desc: this section deals with the "CAL" command which prints out the

calendar

(full-listing)? => text16.1

(16.1-example-cmd)? => text16.1b

full-listing [fact]

desc: manual description of the command

expl: gives a full description of the command

query: do you require a full listing of the command's manual reference

text16.1

NAME

cal - print calendar

SYNOPSIS

cal [month] year

DESCRIPTION

cal prints a calendar for the specified year. If a month is also

specified, a calendar for just that month is printed. Year can be

between 1 and 9999. The month is a number between 1 and 12.

The calendar produced is that for England and her colonies.

Try September 1752.

BUGS

The year is always considered to start in January even though this is

historically naive.

Beware that "cal 78" refers to the early Christian era, not the 20th

century.

16.1-example-cmd [fact]

desc: to determine whether an example is needed

expl: the user may require an example

query: do you wish to print a calendar out

text16.1b

type the command: CAL @cal-month @cal-year

cal-month [number]

desc: month of the required calendar

expl: a month is needed for successful execution of the command

range: 1..12

query: which month is the calendar for

cal-year [number]

desc: year of the required calendar

expl: a year is required for successful execution of the cal command

range: 0..9999

query: which year is the calendar for
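
The example texts contain parameter references which are filled in from the user's answers once those answers lie inside the declared range. A minimal sketch of that substitution in Python follows. It assumes references are written as @name and takes the ranges from the cal-month and cal-year declarations above; the names RANGES and fill are hypothetical and do not correspond to anything in the ESP Advisor shell.

```python
import re

# Ranges as declared for the two number parameters above.
RANGES = {"cal-month": (1, 12), "cal-year": (0, 9999)}


def fill(template, answers):
    """Replace each @name in template by the user's answer for name.

    Raises ValueError when an answer lies outside the declared range.
    """
    def substitute(match):
        name = match.group(1)
        value = answers[name]
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must lie in {low}..{high}")
        return str(value)

    return re.sub(r"@([a-z][a-z0-9-]*)", substitute, template)
```

With the answers 9 and 1752 the text "CAL @cal-month @cal-year" becomes "CAL 9 1752", while a month of 13 is refused before any command is shown to the user.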

section-16.2

desc: this section deals with the "CALENDAR" command which is a reminder

service

(full-listing)? => text16.2

(16.2-example-cmd)? => text16.2b

text16.2

NAME

calendar - reminder service

SYNOPSIS

calendar [-]

DESCRIPTION

calendar consults the file 'calendar' in the current directory and prints

out lines that contain today's or tomorrow's date anywhere in a line. Most

reasonable month-day dates such as 'Dec 7', 'December 7', '12/7', etc.,

are recognised, but not '7 December' or '7/12'. If you give the month as

'*' with a date, i.e. '* 1', that day in any month will do. On weekends

'tomorrow' extends through Monday.

When an argument is present, calendar does its job for every user who has

a file 'calendar' in his login directory and sends him any positive

results by mail(1). Normally this is done daily in the wee hours under

the control of cron(8).

The file 'calendar' is first run through the 'C' preprocessor, /lib/cpp,

to include any other calendar files specified with the usual '#include'

syntax. Included calendars will usually be shared by all users,

maintained and documented by the local administration.

FILES

calendar

/usr/lib/calendar to figure out today's and tomorrow's dates

/etc/passwd

/tmp/cal*

/lib/cpp, egrep, sed, mail as subprocesses

SEE ALSO

at(1), cron(8), mail(1)

BUGS

calendar's extended idea of 'tomorrow' doesn't account for holidays.

16.2-example-cmd [fact]

desc: determine whether an example is needed

expl: the user may require an example

query: do you wish to use the calendar reminder service

text16.2b

type the command: CALENDAR @calendar-month @calendar-day

calendar-month [number]

desc: a month of the year is needed

expl: a month is required for successful execution of the command

range: 1..12

query: which month of the year is the calendar for

calendar-day [number]

desc: day of the month is required

expl: a day is needed for successful execution of the command

range: 1..31

query: which day of the month is the calendar for

section-16.3

desc: this section deals with the "DATE" command which prints and sets

the date

(full-listing)? => text16.3

(16.3-example-cmd)? => text16.3b

text16.3

NAME

date - print and set date

SYNOPSIS

date [-u] [yymmddhhmm [.ss]]

DESCRIPTION

If no arguments are given, the current date and time are printed. If a

date is specified, the current date is set. The -u flag is used to

display the date in GMT (universal) time. This flag may also be used to

set GMT; yy is the last two digits of the year; the first mm is the

month number; dd is the day number in the month; hh is the hour number

(24 hour system); the second mm is the minute number; .ss is optional

and is the seconds.

For example: date 10080045

sets the date to Oct 8, 12:45 AM. The year, month and day may be omitted,

the current values being the defaults. The system operates in GMT. Date

takes care of the conversion to and from local standard and daylight time.

FILES

/usr/adm/wtmp to record time setting

SEE ALSO

utmp(5)

DIAGNOSTICS

'Failed to set date: Not owner' if you try to change the date but are not

the super-user

BUGS

The system attempts to keep the date in a format closely compatible with

VMS. VMS however, uses local time (rather than GMT) and does not

understand daylight saving time. Thus if you use both UNIX and VMS, VMS

will be running on GMT.

16.3-example-cmd [fact]

desc: determine whether an example is needed

expl: the user may require an example

query: do you wish to use the print and set date service

text16.3b

type the command: DATE @date-gmt [.. @date-year .. @date-month

.. @date-day .. @date-hours .. @date-mins [. .. @date-secs]]

date-gmt [phrase]

desc: is GMT needed

expl: if GMT is needed a parameter -u needs to be set

query: if GMT is needed type -u

date-year [number]

desc: the year the date is to be set to

expl: a year is needed for successful execution of the date command

range: 0..99

query: which year is the date to be set to (last two digits only)

date-month [number]

desc: the month the date is to be set to

expl: a month is needed for successful execution of the date command

range: 1..12

query: which month is the date to be set to

date-day [number]

desc: day the date is to be set to

expl: a day is needed for successful execution of the date command

range: 1..31

query: which day is the date to be set to

date-hours [number]

desc: hours the day is to be set to

expl: an hour is needed for successful execution of the date command

range: 0..24

query: what hour is the clock to be set to (24 hour clock)

date-mins [number]

desc: minutes the date is to be set to

expl: a minute figure is needed for successful execution of the date command

range: 0..59

query: what number of minutes is the date to be set to

date-secs [number]

desc: seconds the date is to be set to

expl: a seconds figure is needed for successful execution of the date command

range: 0..59

query: how many seconds is the date to be set to
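
The argument of the date command is assembled from these number parameters as fixed two-digit fields. The following Python sketch shows one way of doing so, assuming the synopsis [yymmddhhmm [.ss]] quoted in text16.3 and treating only the year and the seconds as optional. The function name date_argument is hypothetical.

```python
def date_argument(month, day, hour, minute, year=None, second=None):
    """Build the [yy]mmddhhmm[.ss] argument of the date command."""
    fields = [month, day, hour, minute]
    if year is not None:
        # Only the last two digits of the year are used.
        fields.insert(0, year % 100)
    text = "".join(f"{field:02d}" for field in fields)
    if second is not None:
        text += f".{second:02d}"
    return text
```

Calling date_argument(10, 8, 0, 45) gives "10080045", the example quoted in the manual text, which sets the date to Oct 8, 12:45 AM.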

section-16.4

desc: this section deals with the "TIME" command which times how long a

command takes to execute

(full-listing)? => text16.4

(16.4-example-cmd)? => text16.4b

text16.4

NAME

time - time a command

SYNOPSIS

time command

DESCRIPTION

The given command is executed; after it is complete, time prints the

elapsed time during the command, the time spent in the system, and the

time spent in execution of the command.

Times are reported in seconds.

On a PDP-11, the execution time can depend on what kind of memory the

program happens to land in; the user time in MOS is often half what

it is in core.

The times are printed on the diagnostic output stream.

Time is built in to csh(1), using a different output format.

BUGS

elapsed time is accurate to the second, while the cpu times are

measured to the 100th second. Thus the sum of the CPU times can be

up to a second longer than the elapsed time.

Time is a built in command to csh(1), with a much different syntax;

this command is available as '/bin/time' to csh users.

16.4-example-cmd [fact]

desc: to determine whether an example is needed

expl: the user may require an example

query: do you wish to time a command

text16.4b

type the command: TIME .. @time-command

time-command [phrase]

desc: a command is needed to time

expl: the time command will not work without a command

query: what is the command

section-16.5

desc: this section gives a fuller account of the commands open to

interrogation under this heading

text16.5

text16.5

1) CAL - print calendar

- cal [month] year

- cal prints a calendar for a specified month and year

2) CALENDAR - reminder service

- calendar [-]

- calendar consults the file calendar in the current directory

and prints out lines that contain today's or tomorrow's date

anywhere in the line.

3) DATE - prints and sets the date

- date [-u] [yymmddhhmm [.ss]]

- if no arguments are given, the current date and time are

printed, if a date is specified the current date is set.

4) TIME - time a command

- time command

- the given command is executed; after it is complete 'time'

prints the elapsed time during the command.

5) EXPANDED FORMAT - this text

6) NONE - quits the section.


APPENDIX C

KNOWLEDGE BASE FOR THE UNIX DOMAIN

WRITTEN IN ESP ADVISORS

KNOWLEDGE REPRESENTATION LANGUAGE

15 Sep 85 UNIX.TEXT

/*                                                 */
/* A KNOWLEDGE BASED HELP SYSTEM FOR UNIX          */
/* WRITTEN IN KRL FOR THE ESP/ADVISOR SHELL        */
/* AND RUN ON THE IBM PC                           */
/*                                                 */
/* VERSION 1.0 AUGUST 1985                         */
/* R.T.PLANT Programming Research Group            */
/*                                                 */

title 'The UNIX help facility'.

' The aim of this system is to help guide users through the UNIX '& ' operating system and the associated system calls'.

{beginner} reference s1_getting_started.

{not beginner} reference ch2_system_commands.

beginner: 'you are new to this operating system'
    fact
    explanation
        'This is to determine the level of explanation necessary'
    askable
        'are you a beginner with UNIX ?'.

/*                                                 */
/* SECTION 1 : GETTING STARTED                     */
/*                                                 */

section s1_getting_started:
'this section enables people to get started with unix'.

{not s1_reg_user} reference s2_reg_user_inst.

{not s1_reg_user and (not s1_log_in)} reference s3_log_in_inst.

{not s1_reg_user and (not s1_log_in) and s1_cont} reference ch2_system_commands.

{not s1_reg_user and not s1_log_in and not s1_cont}
'you should now be able to log in and out: N.B. Please remember your password'.

{s1_reg_user and (not s1_log_in)} reference s3_log_in_inst.

{s1_reg_user and (not s1_log_in) and s1_cont} reference ch2_system_commands.

{s1_reg_user and (not s1_log_in) and (not s1_cont)}
'you can now log in please remember your password'.

{s1_reg_user and s1_log_in and s1_cont} reference ch2_system_commands.

{s1_reg_user and s1_log_in and (not s1_cont)}
'you are not considered to be a beginner by this system!'.

s1_reg_user: 'tests for registered users'
    fact
    explanation
        'to determine whether the procedure to become a registered user' &
        'needs to be explained'
    askable
        'have you been set up as a UNIX user ?'.

s1_log_in: 'tests for logins'
    fact
    explanation
        'to determine whether instructions regarding ' &
        'logging in need to be specified.'
    askable
        'have you ever logged in ?'.

s1_cont: 'user is not a beginner but may want command information'
    fact
    explanation
        'to see if the user requires any further help'
    askable
        'do you wish to look at the system commands ?'.

/*                                                 */
/* SECTION 2 : REGISTER USER INSTRUCTIONS          */
/*                                                 */

section s2_reg_user_inst:
'this section deals with the details of becoming a registered user'.

{s2_user_type=dphil}
'The D.Phil students have access to all research machines'.

{s2_user_type=msc}
'The MSc students can use the PRGV which runs UNIX and VAX2 under VMS'.

{s2_user_type=ba and s2_student_year=4}
'The fourth year undergraduates have access to the VAX3'.

{s2_user_type=ba and s2_non_fresher}
'The undergraduates can use 380Z micros'.

{s2_user_type=ba and s2_student_year<>4 and (not s2_non_fresher)}
'first years should seek permission from their tutors to use the computers'.

{s2_user_type=lecturer}
'The lecturers can use any machines including those of other universities' &
'via the PAD'.

s2_user_type: 'type to be registered under'
    category
    explanation
        'this parameter tests to see which type of user the ' &
        'system should be set up for.'
    options
        dphil - 'D.Phil',
        msc - 'M.Sc',
        ba - 'B.A',
        lecturer - 'Lecturer'
    askable
        'what type of user are you ?'.

s2_student_year: 'year of study'
    number
    explanation
        'this parameter tests to see which computers undergraduates' &
        'can use.'
    range 1..4
    askable
        'which year are you in ?'.

s2_non_fresher: 'not first year student'
    fact
    rule s2_student_year<>1.

/*                                                 */
/* SECTION 3 : LOGIN INSTRUCTIONS                  */
/*                                                 */

section s3_log_in_inst: 'This section deals with how to log in'.

{username}
'prg' .. @username .. ' is your username and your password is ' .. @password &
' ' &
'switch on the terminal and when the prompt' &
' login:' &
'appears type your user number i.e. prg' .. @username &
'then the prompt' &
' passwd:' &
'will appear so type in your password i.e. ' .. @password &
'you should now be logged in and the prompt' &
' $ ' &
'should appear'.

xor logok, logko.

username: 'the user''s name'
    phrase
    explanation
        'the user''s initials are needed in order to ' &
        'compose the user''s login code'
    askable
        'what are your initials ?'.

password: 'the user needs a password'
    phrase
    explanation
        @username .. ' is your username but a password is also needed ' &
        'for security'
    askable
        'what is your password ?'.

logok: 'user has logged in successfully'
    fact
    explanation
        'the login status of the user is needed'
    askable
        'is your $ prompt visible ?'.

logko: 'the user has not been able to log in'
    fact
    explanation
        'the user may still be having problems logging in'
    askable
        'are you logged in o.k ?'.

/*                                                 */
/* CHAPTER TWO : SYSTEM COMMANDS                   */
/*                                                 */

section ch2_system_commands:
'this section should cover the Unix commands and enable the user to use and' &
'select the correct one'.

{help_area=basic_login_functions} reference section1_empty.

{help_area=boolean_functions} reference section1_empty.

{help_area=change} reference section1_empty.

{help_area=data_manipulation} reference section1_empty.

{help_area=editors} reference section1_empty.

{help_area=help_facilities} reference section1_empty.

{help_area=hardware_system_calls} reference section1_empty.

{help_area=library_routines} reference section1_empty.

{help_area=maths_functions} reference section1_empty.

{help_area=printing} reference section1_empty.

{help_area=prog_env_cmds} reference section1_empty.

{help_area=remote_useage} reference section1_empty.

{help_area=system_calls} reference section1_empty.

{help_area=tabs} reference section1_empty.

{help_area=chronological_functions} reference s16_chronological.

{help_area=others} reference section1_empty.

{help_area=not_sure} reference section1_empty.

help_area: 'area of unix that help is needed on'
    category
    explanation
        'tests to see which area of the operating system further expansion' &
        'is needed upon to aid the user.'
    options
        basic_login_functions - 'basic login',
        boolean_functions - 'boolean values',
        change - 'change',
        data_manipulation - 'data manipulation tools',
        editors - 'on line editors',
        help_facilities - 'help facilities in unix',
        hardware_system_calls - 'hardware system calls',
        library_routines - 'library routines',
        mail_facilities - 'mail facilities',
        maths_functions - 'maths functions',
        printing - 'printing',
        prog_env_cmds - 'programming environment commands',
        remote_useage - 'remote usage',
        system_calls - 'system calls',
        tabs - 'tabs',
        chronological_functions - 'chronological functions',
        others - 'others',
        not_sure - 'not sure'
    askable
        'in which area do you require help ?'.

/*                                                 */
/* SECTION 1 : EMPTY SECTION                       */
/*                                                 */

section section1_empty:
'this is a dummy section used whilst the knowledge ' &
'base is being built for completeness and consistency.'.

/*                                                 */
/* SECTION 16 : CHRONOLOGICAL FUNCTIONS            */
/*                                                 */

section s16_chronological:
'This section deals with the calendar, date and time functions'.

{options_16=cal} reference section_16_1.

{options_16=calendar} reference section_16_2.

{options_16=date} reference section_16_3.

{options_16=time} reference section_16_4.

{options_16=expanded_format} reference section_16_5.

{options_16=none} quit.

options_16: 'The calendar, date and time functions'
    category
    explanation
        'tests to see which function from the chronological ones is to ' &
        'be used'
    options
        cal - 'CAL - print calendar',
        calendar - 'CALENDAR - reminder service',
        date - 'DATE - print and set date',
        time - 'TIME - time a command',
        expanded_format - 'EXPANDED FORMAT - helps explain these options',
        none - 'QUIT this set of functions'
    askable
        'please select the function you are interested in'.

/*                                                 */
/* SECTION 16.1 : THE 'CAL' COMMAND                */
/*                                                 */

section section_16_1:
'This section deals with the ''CAL'' command which prints out the calendar'.

{full_listing} /* TEXT 16.1 */
'NAME' &
'cal - print calendar' &
'SYNOPSIS' &
'cal [month] year' &
' ' &
'DESCRIPTION' &
'cal prints a calendar for the specified year. If a month is also specified,' &
'a calendar for just that month is printed. Year can be between 1 and 9999.' &
'The month is a number between 1 and 12.' &
'The calendar produced is that for England and her colonies.' &
' ' &
'Try September 1752'.

{full_listing}
'BUGS' &
'The year is always considered to start in January even though this is' &
'historically naive.' &
'Beware that ''cal 78'' refers to the early Christian era, not the 20th' &
'century.'.

/* END OF TEXT 16.1 */

{s16_1_example_cmd}
'Type the command: CAL ' .. @cal_month .. ' ' .. @cal_year .. ' '.

s16_1_example_cmd: 'To determine whether an example is needed'
    fact
    explanation
        'The user may require an example'
    askable
        'do you wish to print a calendar out ?'.

full_listing: 'The manual''s description of the command'
    fact
    explanation
        'gives a full description of the command'
    askable
        'do you require a full listing of the command''s manual reference ?'.

cal_month: 'month of the required calendar'
    number
    explanation
        'A month is required for successful execution of the CAL command'
    range 1..12
    askable
        'which month is the calendar for ?'.

cal_year: 'Year of the required calendar'
    number
    explanation
        'A year is required for successful execution of the CAL command'
    range 1..9999
    askable
        'which year is the calendar for ?'.

{full_listing or s16_1_example_cmd} reference s16_chronological.

/*                                                 */
/* SECTION 16.2 : THE 'CALENDAR' COMMAND           */
/*                                                 */

section section_16_2:
'This section deals with the ''CALENDAR'' command, which is a reminder service'.

{full_listing} /* TEXT 16.2 */
'NAME' &
'calendar - reminder service' &
'SYNOPSIS' &
'calendar [-]'.

{full_listing}
'DESCRIPTION' &
'calendar consults the file ''calendar'' in the current directory and' &
'prints out lines that contain today''s or tomorrow''s date anywhere in a' &
'line. Most reasonable month-day dates such as ''Dec. 7'', ''december 7'',' &
'''12/7'', etc. are recognised, but not ''7 December'' or ''7/12''. If you' &
'give the month as ''*'' with a date, i.e. ''* 1'', that day in any month ' &
'will do. On weekends ''tomorrow'' extends through Monday.' &
' ' &
'When an argument is present, calendar does its job for every user who has' &
'a file ''calendar'' in his login directory and sends him any positive ' &
'results by mail(1). Normally this is done daily in the wee hours under' &
'the control of cron(8).' &
' ' &
'The file ''calendar'' is first run through the ''C'' preprocessor, ' &
'/lib/cpp, to include any other calendar files specified with the usual ' &
'''#include'' syntax. Included calendars will usually be shared by all ' &
'users, maintained and documented by the local administration.'.

{full_listing}
'FILES' &
'calendar' &
'/usr/lib/calendar to figure out today''s and tomorrow''s dates' &
'/etc/passwd' &
'/tmp/cal*' &
'/lib/cpp, egrep, sed, mail as subprocesses' &
' ' &
'SEE ALSO' &
'at(1), cron(8), mail(1)' &
' ' &
'BUGS' &
'calendar''s extended idea of ''tomorrow'' doesn''t account for holidays.'.

/* END OF TEXT 16.2 */

{s16_2_example_cmd}
'type the command: CALENDAR ' .. @calendar_month .. @calendar_day .. ' '.

calendar_day: 'day of the month is required'
    number
    explanation
        'A day is needed for successful execution of the command'
    askable
        'which day of the month is the calendar for ?'.

calendar_month: 'A month of the year is required'
    number
    explanation
        'A month is needed for successful execution of the command'
    askable
        'which month of the year is the calendar for ?'.

s16_2_example_cmd: 'To determine whether an example is needed'
    fact
    explanation
        'the user may require an example'
    askable
        'do you wish to use the calendar reminder service ?'.

/*                                                 */
/* SECTION 16.3 : THE 'DATE' COMMAND               */
/*                                                 */

section section_16_3:
'this section deals with the ''DATE'' command ' &
'which prints and sets the date'.

{full_listing} /* TEXT 16.3 */
'NAME' &
'date - print and set date' &
' ' &
'SYNOPSIS' &
'date [-u] [yymmddhhmm [.ss]]'.

{full_listing}
'DESCRIPTION' &
'If no arguments are given, the current date and time are printed. If a ' &
'date is specified, the current date is set. The -u flag is used to display' &
'the date in GMT (universal) time. This flag may also be used to set GMT;' &
'yy is the last two digits of the year; the first mm is the month number' &
'; dd is the day number in the month; hh is the hour number (24 hour system)' &
'; the second mm is the minute number; .ss is optional and is the seconds.' &
'For example :' &
' ' &
'date 10080045' &
'sets the date to Oct 8, 12:45 AM. The year, month and day may be omitted,' &
'the current values being the defaults. The system operates in GMT. Date' &
'takes care of the conversion to and from local standard and daylight time.'.

{full_listing}
'FILES' &
'/usr/adm/wtmp to record time-setting' &
' ' &
'SEE ALSO' &
'utmp(5)' &
' ' &
'DIAGNOSTICS' &
'''Failed to set date: Not owner'' if you try to change the date but are not' &
'the super-user.' &
' ' &
'BUGS' &
'The system attempts to keep the date in a format closely compatible with ' &
'VMS. VMS however, uses local time (rather than GMT) and does not ' &
'understand daylight saving time. Thus if you use both UNIX and VMS, VMS ' &
'will be running on GMT.'.

/* END OF TEXT 16.3 */

{s16_3_example_cmd}
'Type the command: DATE ' .. @date_gmt .. ' [ ' .. @date_year .. @date_month ..
@date_day .. @date_hours .. @date_mins .. ' [. ' .. @date_secs .. ']] '.

s16_3_example_cmd: 'determine whether an example is needed'
    fact
    explanation
        'the user may require an example'
    askable
        'do you wish to use the print and set date service'.

date_gmt: 'Is GMT needed'
    phrase
    explanation
        'If GMT is needed a parameter -u needs to be set'
    askable
        'If GMT is needed type -u'.

date_year: 'the year the date is to be set to'
    number
    explanation
        'A year is needed for successful execution of the date command'
    range 00..99
    askable
        'Which year is the date to be set to (last two digits only).'.

date_month: 'The month the date is to be set to'
    number
    explanation
        'A month is needed for successful execution of the date command'
    range 01..12
    askable
        'Which month is the date to be set to'.

date_day: 'Day the date is to be set to'
    number
    explanation
        'A day is needed for successful execution of the date command'
    range 0..31
    askable
        'which day is the date to be set to'.

date_hours: 'Hours the day is to be set to'
    number
    explanation
        'an hour is needed for successful execution of the date command'
    range 0..24
    askable
        'what hour is the clock to be set to (24 hour clock)'.

date_mins: 'Minutes the date is to be set to'
    number
    explanation
        'a minute figure is needed for successful execution of the date command'
    range 0..59
    askable
        'what number of minutes is the date to be set to'.

date_secs: 'seconds the date is to be set to'
    number
    explanation
        'a seconds figure is needed for successful execution of the date command'
    range 0..59
    askable
        'how many seconds is the date to be set to'.

/*                                                 */
/* SECTION 16.4 : THE 'TIME' COMMAND               */
/*                                                 */

section section_16_4:
'this section deals with the ''TIME'' command which times how long a command' &
'takes to execute'.

{full_listing} /* TEXT 16.4 */
'NAME' &
'time - time a command' &
' ' &
'SYNOPSIS' &
'time command'.

{full_listing}
'DESCRIPTION' &
'The given command is executed; after it is complete, time prints the' &
'elapsed time during the command, the time spent in the system, and the ' &
'time spent in execution of the command.' &
'Times are reported in seconds.' &
' ' &
'On a PDP-11, the execution time can depend on what kind of memory the ' &
'program happens to land in; the user time in MOS is often half what it' &
'is in core.' &
' ' &
'The times are printed on the diagnostic output stream.' &
' ' &
'Time is built in to csh(1), using a different output format.'.

{full_listing}
'BUGS' &
'elapsed time is accurate to the second, while the CPU times are measured' &
'to the 100th second. Thus the sum of the CPU times can be up to a second' &
'longer than the elapsed time.' &
' ' &
'Time is a built in command to csh(1), with a much different syntax.' &
'This command is available as ''/bin/time'' to csh users.'.

/* END OF TEXT 16.4 */

{s16_4_example_cmd}
'Type the command: TIME ' .. @time_command .. ' '.

s16_4_example_cmd: 'Determine whether an example is needed'
    fact
    explanation
        'The user may require an example'
    askable
        'Do you wish to time a command ?'.

time_command: 'A command is needed to time'
    phrase
    explanation
        'The time command will not work without a command'
    askable
        'what is the command ?'.

/*                                                 */
/* SECTION 16.5 : EXPANDED FORMAT                  */
/*                                                 */

section section_16_5:
'This section gives a fuller account of the commands open to interrogation ' &
'under this subset of unix'.

{true}
'1) CAL - print calendar' &
'- cal [month] year' &
'- cal prints a calendar for a specified year' &
' ' &
'2) CALENDAR - reminder service' &
'- calendar [-]' &
'- calendar consults the file calendar in the current directory' &
'and prints out lines that contain today''s or tomorrow''s date' &
'anywhere in the line.'.

{true}
'3) DATE - print and set date' &
'- date [-u] [yymmddhhmm [.ss]]' &
'- if no arguments are given, the current date and time are' &
'printed; if a date is specified the current date is set.' &
' ' &
'4) TIME - time a command' &
'- time command' &
'- the given command is executed; after it is complete ''time'' ' &
'prints the elapsed time during the command.' &
' ' &
'5) EXPANDED FORMAT - This text' &
' ' &
'6) NONE - QUITS this section '.

APPENDIX D

THE B.N.F DEFINITIONS

USED TO CONSTRUCT

THE KRC SCRIPT BNF.TEXT

<s> ::= <genq>|<findq>|<whatq>|<wlongq>|<directq>

<genq> ::= <genhow>

<findq> ::= <findhow>(<howgen>|<howlong>|<how-who>)

<howgen> ::= <howto><rest>

<howlong> ::= <how><longhobj>

<how-who> ::= <whoq><longwobj>

<whatq> ::= (<whatquest>|<longw>)(<whatobj><cmd>|<cmd><whatobj>|<whatext><rest>)

<wlongq> ::= <action><longquery>|(<rest>|<longquery>)

<directq> ::= <shortq>|<longquery>

<genhow> ::= <howq><rest> | <howq><rest><dirction><object>

<findhow> ::= <howq><find>

<rest> ::= <action><object>

<howq> ::= "how can i"|"how do i"

<action> ::= "create"|"get help on"|"references to"|"sort"|<action2>

<action2> ::= "compare"|"remove"|"print"|"change"<access>|"delete"

<object> ::= <number>"files"|"a file"|"a directory"|<object2>

<object2> ::= "anadex printer"|"laser printer"|<none>

<access> ::= "permission to"|"access to"

<number> ::= <digit><more><none>

<digit> ::= "one"|"two"|"three"|"four"|"five"|"six"|"seven"|"eight"|"nine"|<digit2>

<digit2> ::= "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"

<more> ::= "plus"|"or more"|<none>

<find> ::= "find out"

<howto> ::= "how to"

<how> ::= "how"

<dirction> ::= "on the"|"from the"|"from a"|"on"|"from"

<longhobj> ::= "full the disk is"

<whoq> ::= "who has"|"who is"

<longwobj> ::= "has access to my files"|"on the system"

<whatquest> ::= "what is the"|"what are the"|"what is a"

<whatobj> ::= "search path"|"sort"|"time"|"date"|"cal"|"calendar"

<cmd> ::= "command"|"the command"|<none>

<whatext> ::= "best way to"

<longw> ::= "what do you know about"|"what can you tell me about"|"what are the"

<shortq> ::= "cal"|"sort"|"calendar"|"chmod"|"cmp"|"date"|"cp"|<shortq2>

<shortq2> ::= "help"|"ls"|"quota"|"rm"|"time"

<longquery> ::= "unix"|"the file system"|"pipes"|<longquery2>

<longquery2> ::= "the standard input"|"the standard output"
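
A grammar of this kind can be held as a plain data structure and searched for every possible parse. The following is a minimal sketch in Python, not the KRC script of Appendix E; it covers only a hand-picked subset of the rules, and the names GRAMMAR, parse, parse_sequence and accepts are assumptions made for illustration. Generators stand in for lazy evaluation here: each alternative is explored only when another result is asked for, so no explicit backtracking code is written.

```python
from typing import Dict, Iterator, List

# Each non-terminal maps to a list of alternatives; each alternative is a
# sequence of symbols. Symbols not present as keys are terminal phrases.
# Only a subset of the Appendix D rules is included (an assumption).
GRAMMAR: Dict[str, List[List[str]]] = {
    "<s>": [["<genq>"], ["<directq>"]],
    "<genq>": [["<genhow>"]],
    "<genhow>": [["<howq>", "<rest>"],
                 ["<howq>", "<rest>", "<dirction>", "<object>"]],
    "<rest>": [["<action>", "<object>"]],
    "<howq>": [["how can i"], ["how do i"]],
    "<action>": [["create"], ["sort"], ["print"], ["delete"]],
    "<object>": [["a file"], ["a directory"], ["<object2>"]],
    "<object2>": [["anadex printer"], ["laser printer"], ["<none>"]],
    "<dirction>": [["on the"], ["from the"], ["on"], ["from"]],
    "<directq>": [["<shortq>"]],
    "<shortq>": [["cal"], ["sort"], ["date"], ["time"]],
    "<none>": [[]],  # the empty alternative
}


def parse(symbol: str, text: str) -> Iterator[str]:
    """Yield the unparsed remainder of text for every way symbol can match."""
    if symbol not in GRAMMAR:
        # Terminal phrase: must match a whole-word prefix of the text.
        if text == symbol or text.startswith(symbol + " "):
            yield text[len(symbol):].lstrip()
        return
    for alternative in GRAMMAR[symbol]:
        yield from parse_sequence(alternative, text)


def parse_sequence(symbols: List[str], text: str) -> Iterator[str]:
    """Match the symbols one after another, yielding each possible remainder."""
    if not symbols:
        yield text
        return
    for remainder in parse(symbols[0], text):
        yield from parse_sequence(symbols[1:], remainder)


def accepts(question: str) -> bool:
    """True if some parse of <s> consumes the whole question."""
    normalised = " ".join(question.lower().split())
    return any(remainder == "" for remainder in parse("<s>", normalised))
```

For instance, accepts("How do I print a file on the laser printer") succeeds through the second alternative of <genhow>, after the first alternative has been tried and abandoned.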

Abstract

We discuss the development of an intelligent knowledge based help system for UNIX. Two approaches are examined in detail. The first is based on an expert system 'shell'; the second is a natural language approach in which a custom built system was implemented in the functional language KRC.

Acknowledgements

I would like to thank my supervisor Dr. Ib Sorensen for his encouragement and guidance during the project. Many thanks also go to Graham van Terheyden of British Petroleum who initially devised the project and provided encouragement, help and financial backing throughout.

I would also like to thank the many people in the PRG who have given help both during this project and during the MSc. course in general.

I am also grateful to the Science and Engineering Research Council of Great Britain for their financial support during this year.

Finally thanks to Stephen Murrell and David Gold for keeping me sane during the many hectic moments.

CONTENTS

Chapter 0. Introduction.

Chapter 1. An overview of knowledge engineering.
    1.1 - The knowledge acquisition process.

Chapter 2. Problem domain.
    2.1 - Problem definition.
    2.2 - Identification stage.
    2.3 - Comparison of solution methods.

Chapter 3. A shell based approach.
    3.1 - Architecture of an expert system.
    3.2 - The concept of expert system shells.
    3.3 - Shell selection criteria.
    3.4 - Conclusion to shell selection.
    3.5 - ESP Advisor.
        3.5.1 - Knowledge representation language.
        3.5.2 - KRL compiler.
        3.5.3 - Consultation shell.
    3.6 - Conceptualisation stage.

Chapter 4. Conceptual model.
    4.1 - Formalisation stage.
        4.1.1 - A walkthrough of the B.N.F definition.
    4.2 - A conceptual model for the UNIX domain.
    4.3 - Implementation stage.
    4.4 - Evaluation of the conceptual model.
    4.5 - Evaluation of the ESP/Advisor shell's performance.
    4.6 - Conclusions and summary to the shell approach.

Chapter 5. A Natural Language Approach.
    5.1 - An investigation of NLP Techniques.
    5.2 - The UNIX assistant Natural Language Processor.
    5.3 - Grammar and BNF definitions.
    5.4 - Language Choice.
    5.5 - B.N.F implementation in KRC.
    5.6 - The B.N.F Parser.
    5.7 - The Knowledge Base.
    5.8 - Achieving a mixed initiative dialogue.

Conclusions and summary

References

Appendix A: The BNF for the conceptual model.

Appendix B: The conceptual model for UNIX.

Appendix C: Knowledge base for the UNIX domain written in KRL.

Appendix D: The BNF used to construct the KRC script BNF.TEXT.

Appendix E: KRC scripts for the natural language UNIX assistant.

Appendix F: Manual reference for the sort command.

Introduction.

The aim of the project was to develop an expert system to act as a help facility in advising users in the use of the UNIX* operating system.

The problem of communication with an operating system by a user (naive or not) is an appealing domain for study and well suited to the application of artificial intelligence techniques. The domain is complex enough to provide substantial subproblems in many areas of artificial intelligence and well enough defined that the knowledge base, or a subsection of it, is capable of being modelled and constructed.

In the development of this expert system the unusual step was taken of extracting all the knowledge from textual experts. This means that instead of a human expert being consulted and the knowledge acquisition process being carried out with the human, the process took place on texts such as the manuals (on-line and printed) and text books. This area of investigation has previously received very little attention, yet it is potentially a very useful and fruitful one, for there are vast amounts of human knowledge available in textual form.

One of the bases of the project is the new area of knowledge engineering from textual sources. In order to gain experience in knowledge engineering, an investigation of existing techniques was undertaken, a summary of which is given in chapter one.

The problem domain is investigated in detail in chapter two as well as a feasibility study of the different approaches open in solving the problem.

In chapter three the expert system shell based approach is examined and after an investigation of criteria for selection of the best shell the chapter focuses on the ESP/Advisor shell. This section deals with the technique known as 'text animation' and discusses the problems associated with the conceptualisation stage in the development cycle when the knowledge has to be extracted from textual rather than human sources.

In order to overcome the conceptual problems discussed in chapter three a conceptual model was created; this is discussed in chapter four. The chapter takes the conceptual model and produces a representation for the UNIX domain with it. This is then implemented as a knowledge base for the ESP/Advisor shell.

The second approach open to implementing an expert system is to develop a customised system and this is the focus of chapter five, where the area of natural language query systems is investigated. The chapter follows the development of the expert system from formalisation through to implementation in KRC (Kent Recursive Calculator).

* UNIX is a trade mark of AT & T Bell Laboratories.

CHAPTER ONE

An overview of knowledge engineering

Artificial intelligence is the general title that encompasses many facets of computer science, including robotics, computer vision, natural language understanding, cognitive modelling and logical reasoning. From these areas of artificial intelligence has emerged the field of "expert systems" which has been defined by the British Computer Society Committee of the Specialist Group on Expert Systems as:

"The embodiment within a computer of a knowledge based component from an expert skill in such a form that the machine can offer intelligent advice or take an intelligent decision about a processing function. A desirable additional characteristic which many would describe as fundamental is the capability of the system on demand to justify its own line of reasoning in a manner directly intelligible to the enquirer. The style to obtain these characteristics is rule based programming." [1].

The first period of expert systems research was dominated by a naive belief that a few laws of reasoning coupled with powerful computers would produce expert and even superhuman performance. This approach was first attempted by Newell, Shaw and Simon who started work on their General Problem Solver in 1957 [2]. The GPS was the first problem solving program to separate its general problem solving methods from the knowledge specific to the problem domain. This step of separating the problem solving part of the system, which gave no information about the kind of data worked on, from the task specific knowledge was a very significant one in the history of expert systems. This separation of knowledge types, coupled with the theoretical basis laid down by Post in 1943 [3], led to the production system concept.
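
The separation described above can be illustrated with a minimal production system sketch. It is written in Python purely for illustration and is not taken from GPS or from any system described in this dissertation; the rule contents and the names forward_chain and UNIX_RULES are hypothetical. The function knows nothing about UNIX, while the rule list carries all of the domain knowledge.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A production rule: if every condition is a known fact, add the conclusion.
Rule = Tuple[FrozenSet[str], str]


def forward_chain(rules: List[Rule], facts: Iterable[str]) -> Set[str]:
    """General method: keep firing applicable rules until nothing new is added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Domain specific knowledge (hypothetical rules, for illustration only).
UNIX_RULES: List[Rule] = [
    (frozenset({"wants calendar", "year given"}), "suggest cal"),
    (frozenset({"wants reminder"}), "suggest calendar"),
    (frozenset({"suggest cal"}), "explain cal usage"),
]
```

Replacing UNIX_RULES with rules for another domain would leave forward_chain unchanged, which is the point of the separation.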

By the mid 1970's several expert systems had begun to emerge and as more systems were developed it came to be recognised that the knowledge base plays a central role in the performance of these systems. The appreciation of this fact led to two main lines of research, one being the investigation of epistemology [4] and the other the investigation of knowledge representation. The investigation of epistemology concluded that a general theory of knowledge was not possible, on the grounds that it was far too broad an area and that the debate was at times more suited to a philosophical approach. The knowledge representation research was more fruitful and yielded various different approaches to the encapsulation of knowledge in a machine useable form. For example the use of frames [5] is a technique for providing a framework within which new data is interpreted in terms of concepts acquired through previous experiences. Other representational forms are possible such as scripts, semantic primitives [6], direct (analogue) representations [7], semantic networks [8], procedural representations [9], logic [10] and of course production systems [11].

The expert system research had, by the late 1970's, produced many working and useful systems which could deservedly be called expert systems or consultants. Among the outstanding ones are:

DENDRAL [12], which analyses mass spectrographic, nuclear resonance and other experimental data to infer the plausible structures of an unknown compound.

MYCIN [13], the most famous example of an expert system, addresses the problem of diagnosing and treating blood diseases. When a panel of experts evaluated the performance of several different agents including medical experts, interns, and MYCIN, MYCIN's performance was judged to be as good as, if not superior to, that of all others.

PROSPECTOR [14], which assists geologists working on certain problems in "hard rock" mineral exploration, has proved extremely useful. For example, it predicted that a molybdenum deposit would be found in a certain location; the prediction was confirmed with a find worth $100 million.

These working systems seem to show that Feigenbaum is correct in his statement "The expert's knowledge provides the key to expert performance, whilst knowledge representation and inference provide the mechanisms for its use" [15]. The remaining question is: if we have the hardware and software to implement an expert system in a suitable representation, what is preventing the widespread use of expert systems? To answer this, it is necessary to examine the development cycle of an expert system.

Creating an expert system is not an easy process; the DENDRAL project has taken twenty years of effort and resources to achieve the position of being regarded as a true consultant system. General problems in the development of expert systems stem from the fact that they are unable to recognise or deal with problems for which their own knowledge is inapplicable or insufficient. In other words this is due to a lack of knowledge and is a consequence of the phenomenon known as the "knowledge acquisition bottleneck".

Knowledge acquisition is the transfer and transformation of problem solving expertise from some knowledge source to a program. Potential sources of knowledge include human experts, textbooks, data bases and one's own experiences. The knowledge source is thus a specialist in the narrow subject area under consideration. The expertise is encapsulated by a collection of specialised facts, procedures and judgemental rules about the narrow domain area rather than general knowledge about the domain. However, because much of the knowledge is subjective, ill codified and partly judgemental, the process of extracting the knowledge from an expert and representing it in machine compatible form is not an easy one. This process is known as "knowledge acquisition" and involves problem definition, implementation and refinement as well as representing facts in a suitable form (frames, scripts etc) or creating a new or hybrid representation.

Knowledge acquisition is the bottleneck in expert system construction because it is a very slow and difficult process. The knowledge engineer's job is to act as an interface to help an expert to build a system, a very close analogy being the role of a systems analyst who is the interface in a conventional data processing application between user and developer.

1.1 The Knowledge Acquisition Process.

The knowledge acquisition process can be divided into five stages and the following diagram [16], shows how these stages relate together. However, as can be seen the development is not linear and reiteration of stages may be necessary.

[Figure: the five stages of knowledge acquisition [16]. IDENTIFICATION (identify problem characteristics) yields REQUIREMENTS; CONCEPTUALISATION (find concepts to represent knowledge) yields CONCEPTS; FORMALISATION (design structure to organise knowledge) yields STRUCTURE; IMPLEMENTATION (formulate rules to embody knowledge) yields RULES; TESTING (validate rules that organise knowledge). Feedback arcs from testing to the earlier stages are labelled REFORMULATION, REDESIGN and REFINEMENTS.]

The first step in acquiring knowledge for an expert system is to characterise the important aspects of the problem. This is called the identification or requirements stage. The first thing to be clarified is role definition. The members of the knowledge acquisition team must be identified, as must the experts. The usual procedure then is to assign domain experts to knowledge engineers on a one to one basis. The domain expert then acts as an informant telling the knowledge engineer about their subject of expertise. The role of the knowledge engineer is to absorb this output, familiarising themselves with the subject matter (perhaps at a later date by consultation with other sources). In most cases the knowledge engineer works with the domain expert for the greater part of the formalisation, but for clarifying basic points, especially at the start of the project, he may be helped by an assistant of the expert; for example, in a hospital system a houseman may clarify basic points or vocabulary in place of the consultant. The knowledge engineer will check that he has, in fact, correctly understood the expert by producing test cases and example situations for the expert to confirm.

When all of the expert's and engineer's roles have been established, the domain expert and a knowledge engineer can then work towards identifying the specific problem under consideration. This will involve an informal discussion on the various aspects of the problem: its definition, characteristics and subproblems. The aim at this stage is to specify the problem and the necessary knowledge inter-relationships in order to begin the development of the knowledge base. As in the requirements stage of any systems analysis exercise, it is important to understand exactly what problem the system will be expected to solve, and how the problems can be characterised or defined. If the problem is a large one (in terms of analysis for the knowledge engineer) the subproblems must be identified. The data should be inspected and assessed for machine problem solving suitability. A large problem area that the knowledge engineer is initially faced with is that of vocabulary and terminology, so a lexicon of the words used in the particular application should be created. A solution definition should also be devised in order to give an indication of the goal.

In looking at the whole subject area that the expert system is meant to cover, the knowledge engineer should extract from the domain expert information as to which aspects of expertise he uses - consciously or subconsciously - in solving these problems. It is important, but very difficult, for the knowledge engineer to obtain information as to where the domain expert derives his "relevant knowledge" from, for a particular solution to a particular problem. The knowledge engineer and domain expert will need to work closely together to define the problem before looking at the processes behind a solution. An initial informal characterisation of the problem by the domain expert should lead to questions by the knowledge engineer to clarify terms and key concepts. The domain expert gives example solutions to typical problems and after several iterations, the knowledge engineer and domain expert arrive at a definition of the requirements of the expert system.

Knowledge acquisition is a process that requires heavy resource support in terms of time, computer facilities and expert availability, all of which can be very expensive. It is the cost of the expert's time that needs careful assessment on the part of the knowledge engineer in order to utilise it to the full. It may be more cost effective if a knowledge engineer works with domain specialists who vary in levels of expertise and draws the required information at the most appropriate level.

At the end of this first stage it may be a good idea to separate out the goal itself from the sub-tasks from which it is composed, as these may have become slightly merged and the clarity of separation may aid in identifying available approaches.

The second stage in the knowledge acquisition process is that of "conceptualisation" in which the key concepts and inter-relations determined during the identification stage are made explicit. The knowledge engineer can use many different techniques to formalise these ideas, from diagnostic representations to frame-like ideas. These may be of help later in the investigation of the implementation options but this should not bias any of the aims at this stage. In order to guide the conceptualisation process the knowledge engineer must find out what types of data are available, and whether this data is produced in a raw form or has been produced by an inference process. The problem area, which is usually modularised, will need to be examined and each subtask discussed with the domain expert to check whether it is usually defined as a specific problem in its own right within the expert's domain. The concepts that need to be formed around the problem area may be related to existing techniques used within the field of the domain. For example, a particular type of numerical analysis may be known to give a result that solves a certain problem in a more efficient way than any other. It has to be remembered that an expert system should use the methods of the existing domain expert in order to reproduce their actions (improvements can wait until later), and that dissimilar techniques from those used by the domain expert may produce different reasoning paths and different results.

The strategies to overcome each of the subproblems will relate together in order to give a solution to the whole problem. The knowledge engineer must therefore try to discover their hierarchical relationships. This is one of the most difficult tasks faced by a knowledge engineer as the domain expert generally does not think of solving the problem in terms of the underlying concepts but in an integrated environment which draws upon experience. Experience, which is defined by the Collins Dictionary as "Practical knowledge gained by trial or practice; personal proof or trial; continuous practice", is the very essence of a knowledge engineer's task and the major problem is condensing a lifetime's observations into a series of codified rules. In order to do this the knowledge engineer must ask questions not only of how the individual concepts work but how they are linked together and what constraints are placed on them. Techniques used in systems analysis can be used here; for example, data flow analysis can be used to help interpret the expert's information flow between concepts. An underlying theme that needs to be considered all through the conceptualisation phase is the ability to justify and explain any solutions reached. Explanation is an important facet of any expert system; a solution that cannot be justified may not be as useful as one where the reasoning processes have been explained. This is especially important as the result of an expert system will, usually, be used in a situation where it acts as an assistant to another expert. If differences of opinion are found then the reasons for them will need to be investigated. This is not to say that if the system has used a different process of reasoning from that of a human, it is wrong; however this point can become a philosophical one.

The conceptualisation stage, like that of identification, is an iterative one. However, once an outline of the concept has been drawn up, rather than continuing to reiterate on this, a good idea is to develop a small prototype system. By using a small "expert system shell" and an "expert systems development environment" it is possible to make fairly quick progress towards a mini-system, perhaps just dealing with a small subsection of the whole problem. From this mini-system the various concepts and reasoning processes can be examined by the knowledge engineer and domain expert together, both gaining enormous insights into clarifying the situation. The knowledge engineer can ask questions regarding points which he originally considered straight forward but upon secondary analysis are more complex, and it gives the domain expert the chance to review his reasoning processes. The knowledge engineer, however, may not actually show the domain expert this computerised system as it may make him more implementationally orientated than necessary at this stage in the development. The use of a mini prototype system is often more cost effective than developing a full system and hoping that the concepts were identified correctly the first time and that the representation chosen to model them was the most suitable.

The third phase in the development of an expert system is that of "formalisation", or systems design. Formalisation involves mapping the key concepts, subproblems and information flow characteristics isolated during conceptualisation into a more formal representation such as production systems, frames or direct representations. The knowledge engineer now takes a more active role in the development of the system. He discusses how the system could be implemented and with the insight given by the prototype system should be able to develop a suitable knowledge representation for the situation. If an existing formalism is not entirely suitable then a new one can be developed or a hybrid produced.

There are three important factors in the formalisation process that can be identified: the hypothesis space, the underlying model of the process and the characteristics of the data. The hypothesis space is the formalisation of the concepts and the linkages between them. The grain size, scope and structure of the concepts all give insights into the nature of this hypothesis space: whether it is finite, whether it is uncertain (indicating the need for probabilistic apportionment of weighting), or whether different levels of abstraction are needed. The uncovering of an underlying model of the process used to generate solutions in the domain can be an important step in formalising knowledge, whether the model be a simple relational or a mathematical one, and it may aid in maintaining consistency of the knowledge base. Understanding the nature of the data in the problem domain is also important in formalising knowledge. If the data can be explained in terms of an hypothesis, it may help to explain how these hypotheses relate to the goals in the problem solving process. Consideration needs to be given to whether the data is sparse, insufficient, plentiful, redundant, noisy, uncertain, time dependent, reliable, inaccurate, continuous, complete or in need of pre-processing. These questions need to be asked to gain as much from the data as possible, the questions being modified to suit the application involved.

The result of this formalisation stage is the specification of the knowledge representation that will be used to hold the knowledge base and a specification of the domain specific knowledge consisting of the inter-relations, data and information flow, as well as guide lines for the structure of the inference engine that will be created.

The fourth phase in the development cycle is the implementation stage which will involve building from the specification a second prototype system which, depending upon its success, may either be scrapped or will evolve into the full working system. The formalisation stage will have produced a full specification of the knowledge base and how to represent it, the processes of the inference engine and domain dependent knowledge. From this basis the knowledge engineer can either develop his system by means of tools and shells or from scratch in an appropriate language.

The final stage of the development is that of testing. This involves evaluating the system and representational forms used. It should be compared to cases that have been evaluated correctly by the human expert. The domain specialist will be able to pin-point areas in which he thinks the system is liable to be weak. From this, if the system is in a modular form, revisions of the knowledge base and inference engine can be made to make more accurate deductions. The revision of a knowledge base can be a considerable problem as the needs for consistency and completeness have to be adhered to or else inconsistent deduction problems can follow. If the rules use weighted reasoning then the weights and scope of rules may need adjusting.
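
The comparison against cases already evaluated by the human expert can be sketched as a small test harness. This is an illustrative assumption written in Python, not part of the system described here; evaluate, toy_system and CASES are hypothetical names, and the toy system is deliberately incomplete so that one disagreement is reported.

```python
from typing import Callable, List, Tuple

# A test case pairs a question with the answer approved by the expert.
Case = Tuple[str, str]


def evaluate(system: Callable[[str], str],
             cases: List[Case]) -> List[Tuple[str, str, str]]:
    """Return (question, expected, actual) for every case the system gets wrong."""
    failures = []
    for question, expected in cases:
        actual = system(question)
        if actual != expected:
            failures.append((question, expected, actual))
    return failures


def toy_system(question: str) -> str:
    """A stand-in for the expert system; it knows only two answers."""
    answers = {"print a calendar": "cal", "time a command": "time"}
    return answers.get(question, "unknown")


CASES: List[Case] = [
    ("print a calendar", "cal"),
    ("time a command", "time"),
    ("set the date", "date"),
]
```

The list returned by evaluate(toy_system, CASES) points at the part of the knowledge base that needs revising, and rerunning it after each revision guards against a change that fixes one case while breaking another.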

The process of revising the performance of a system is an iterative one which may involve refinements of the rule base, perhaps redesign of the representation used, or even reformulation of the entire system, if the selective refinements do not solve the problem. The use of formal specification techniques may help in the initial stage and also help keep consistency under control, but because of the subjective nature of the domain expert's knowledge this is not an easy task to perform.

The development of an expert system is not therefore a straightforward process but one which involves a lot of careful steps which should not be rushed. The processes also involve a considerable amount of creativity and understanding on the part of the knowledge engineer as many situations they encounter will be new and domain-specific. However, as the last few years have shown, research into expert systems tools, environments and formalisms has been increasing rapidly, permitting far more advanced systems with increased explanation facilities to be developed over a wider spectrum of domains.

CHAPTER TWO

Problem Domain

2.1 Problem definition

The problem was specified as being the creation of an expert system that could act as an assistant to users who require help with the UNIX operating system, with special emphasis being placed on the synthesis of a knowledge base from textual information.

The aim was to produce a UNIX user's assistant which could guide users around the UNIX operating system. The assistant should have enough depth of knowledge to be able to deal with super-users of UNIX as well as being flexible enough to aid the novice user, for whom a different approach and level of help would be necessary.

During the course of the project, several fundamental questions concerning the problems of building expert systems will be investigated. One of the most important areas upon which conclusions should hopefully be drawn is how much of the knowledge engineer's task is aided by the provision of expertise in textual form, and to what extent additional knowledge acquisition is necessary to confirm, justify, explain and fill in missing or assumed knowledge.

2.2 The identification stage

The first task for the knowledge engineer is to perform the identification stage in the development cycle and to try to understand exactly what problem the system is expected to solve. It is known that the system should be capable of acting as an assistant who is thoroughly conversant with UNIX; however, to understand the consequences of this it was necessary for the knowledge engineer also to become familiar with the proposed system's domain, which in this case is the UNIX operating system. This familiarisation process is usually the place where the knowledge engineer and domain expert work together closely so that the knowledge engineer can gain an understanding of the problem's requirements. The domain expert in this problem was the set of UNIX user manuals (both on-line and printed) and text books [21], [22].

A useful introduction was gained by studying the "UNIX for beginners" section in the programmer's manual. This takes the beginner step by step through from logging on for the first time to executing simple commands, and this gave a good feel for the approach used in the operating system command structure. Having gained this introduction the entire set of UNIX commands was then looked at in order to see the scope of the system and gain an insight into the structure of the commands and the "UNIX shell". As it is useful to compile a lexicon of information this was initiated with the commands investigated at this stage.

Part of the identification stage is, usually, to isolate a small but interesting subproblem area which may be identified and used to focus the knowledge acquisition process. This subsection of the problem can then be investigated in far greater depth and a real understanding of it can be gained. Once this has been achieved the consequences are usually that many useful techniques will have been developed which can then be used in the wider domain, simplifying the process of performing knowledge acquisition upon it. To this end the set of commands was subdivided into groups; for example all of the commands that were concerned with the mail facility were grouped together (biff, binmail, from, mail, mesg, ...). This provided a better picture from which a suitable group could be chosen for further investigation. The group chosen was that of the "chronological" commands, which included "cal" - the calendar command, "calendar" - the calendar reminder command, "date" - the print and set date command and "time" - the command that times another command. This group seems to have all the qualities that are needed to represent all the different aspects of UNIX that will need to be understood. For instance, the "date" command is usually only ever used by the super-user whilst the "cal" command is a very simple user command, and "time" can invoke any other command if additional ones need to be drawn into this investigatory subset of the operating system.
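
The grouping of commands and the accompanying lexicon can be pictured as two small tables. The sketch below is in Python and is only an illustration of the idea; the names COMMAND_GROUPS, LEXICON, group_of and describe_group are assumptions. The mail group lists only the commands named in the text (the original list is truncated), and the one-line descriptions are those used in the knowledge base listing of Appendix C.

```python
from typing import Dict, List, Optional

# Commands grouped by the facility they belong to. The mail group is
# incomplete: the source names these five and then trails off.
COMMAND_GROUPS: Dict[str, List[str]] = {
    "mail": ["biff", "binmail", "from", "mail", "mesg"],
    "chronological": ["cal", "calendar", "date", "time"],
}

# Lexicon entries compiled so far (only the chosen subset was investigated).
LEXICON: Dict[str, str] = {
    "cal": "print calendar",
    "calendar": "reminder service",
    "date": "print and set date",
    "time": "time a command",
}


def group_of(command: str) -> Optional[str]:
    """Return the name of the group containing command, or None if ungrouped."""
    for group, commands in COMMAND_GROUPS.items():
        if command in commands:
            return group
    return None


def describe_group(group: str) -> List[str]:
    """List each command in a group with its lexicon entry, if one exists."""
    return [
        f"{name} - {LEXICON.get(name, 'not yet investigated')}"
        for name in COMMAND_GROUPS.get(group, [])
    ]
```

Calling describe_group("chronological") gives the four entries of the chosen subset, while describe_group("mail") shows at a glance which commands still lack a lexicon entry.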

An investigation of the operating system and the subset in particular showed that an assistant would need to be multi-faceted, in that help would need to be in many different contexts and levels.

The assistant would have to cope not only with users of differing experiences and backgrounds but also give help in differing circumstances. For example: 'general' help answers general queries, such as when the user wants to know "what are the different editors available?"; 'command' help is concerned with the issuing of commands, for instance a user may wish to know "what is the command to delete a file?" and may then require further help in interpreting how to execute this command, as quite often the commands are complex with parameters that are not very well defined (the "sort" command is a good example of a poorly defined command); 'direct' help applies when a user asks a direct question such as "is it possible to create subdirectories?", where a more direct answer may be preferred to a 'help' answer. The differences between the types of help are subtle but illustrate that some flexibility and sensitivity will be needed by the expert system in order to give the required output in the correct context.

For example if the user asked

"how can I get more disk space?"

then they should be informed by the assistant to "delete all the files that you do not need" or perhaps "ask the system manager for more disk space allocation" rather than be told to type:

"try rm #"

which if executed would delete all of the user's files.

Thus the identification stage has proved valuable in that the operating system domain was examined and a small subsection located upon which to base an investigation, the aim of the investigation being to discover representations and techniques upon which a large implementation could be based.

The identification stage also showed that a flexible and sensitive system would need to be developed in order to overcome the contextual difficulties involved in coping with a human interface and a complex domain. The use of a textual base as the domain expert did not at this point prove to be any great hindrance, and the advantage of twenty-four hour accessibility was greatly appreciated. The knowledge contained in the manuals, however, is not in the most convenient format and is not particularly user friendly, even to users with past operating systems experience. For example, the commands are in alphabetical order and only limited cross referencing facilities and aids to finding the correct command are provided. The command mnemonics are also not particularly helpful. These factors suggest that this area is indeed in need of a help system that can produce a higher level of performance than the "on-line" manual, which is the only source of advice currently open to the user.

2.3 Comparison of solution methods

Having established the problem definition, selecting a small area of the domain to use as a development and testing base, the next stage was to investigate the possible approaches open that could lead to workable solutions. This investigation showed that the choice was between using a "shell" upon which to develop the expert system, or, to write a system specifically to cover the problem area being considered.

The shell approach would be a quicker method of development, whereby most of the programming workload would be removed; however, the scope for choice in matters such as the representation of knowledge, the user interface and perhaps even the explanation and justification facilities available would also be removed.

The task of writing a custom program to solve the problem is very appealing in that it allows the developer to choose the best representations, a suitable language and other features with which to solve the problem. Unfortunately it has the disadvantage of being a time consuming and complex task. A tailor made system, if developed, could be of several types. It would be possible to develop a system in the conventional expert system mould upon which the shells are based, namely an inference engine and knowledge base. The knowledge base would be developed around a representation specifically designed for the task, and the inference engine would have its deductive, reasoning and explanation facilities all designed to achieve the best from this representation. The second choice is to aim at a system that tends away from "pure" expert systems and towards the natural language processing part of the artificial intelligence spectrum. Here again there are many choices. It would be possible to develop a system based on "matching" techniques to produce an advisory assistant that could hold "conversations" with the user in an ELIZA-like way [23], [24]. The more advanced NLP techniques could also be used, for example transformational grammars [25], case frames [26] or even augmented transition network grammars [27], [28]. These parsing and natural language techniques would allow the user to ask more specific questions than those allowed under the rigid shell framework; however, the development time would be far greater for such a system.

Thus there are several approaches open, the options being:

[Figure: tree of solution approaches for the UNIX assistant. Labels: SHELL; CUSTOM DESIGN; TRADITIONAL EXPERT SYSTEM APPROACH; NLP (MATCHER, PARSER); HYBRID SYSTEM.]

Each solution has its own particular advantages and disadvantages. The most interesting development is that of a "hybrid system" - a cross between a traditional expert system and a natural language processor. This would be the most suitable answer to this problem domain in that an interactive "language rich" system rather than a deductive one is desirable.

In an attempt to gain understanding about the problem domain and to produce an actively working system it was decided that an expert system shell would provide the most appropriate starting point and if this proved too constrictive it could be used as a prototype for the investigation of a custom design.


CHAPTER THREE

A Shell Based Approach

3.1 Architecture of an expert system.

[Figure: components of a hypothetical expert system. Labels: USER; LANGUAGE PROCESSOR; BLACKBOARD (PLAN, AGENDA, SOLUTION); KNOWLEDGE BASE (FACTS AND RULES); JUSTIFIER; INTERPRETER; SCHEDULER; CONSISTENCY ENFORCER.]

The diagram above shows the components of a hypothetical expert system [16]. No such complete system exists, but each expert system has one or more of these components in its composition.

The user interacts with the system through a language processor. This may be via an "object orientated language", menu driven displays, a semi natural language interface or even an attempt at a full natural language dialogue. Graphics and other input/output tools may also be used. Typically the language processor would parse and interpret the user's questions, commands and input; it may also perform the inverse role of taking the information generated by the system and producing output in a user orientated format.

The blackboard records intermediate hypotheses and decisions that the system manipulates during its deduction processes. The blackboard area is divided into three parts. The plan contains information which helps the system to direct its solution. The agenda records the potential actions that are awaiting execution; this includes rules and ideas that seem helpful in developing the solution and have not yet been fully pursued, but may be useful later on. The solution part of the blackboard contains the candidate hypotheses and decisions that the system has generated so far and the links that mark their interdependencies.

The scheduler controls the agenda in relation to the planned direction of solution. The scheduler needs to have intelligence as it has to estimate the effect of applying different actions to the current situation before making its selection.

The interpreter then executes the chosen agenda item. The interpreter will also update the blackboard by placing on it any relevant information that the action produced during its execution.

The consistency enforcer attempts to maintain a consistent representation of the emerging situation. This may be in terms of truth maintenance procedures, wherein the solution elements represent logical deductions and their truth value relationships.

The justifier explains the actions of the system to the user. It is usually able to answer questions from the user as to why a certain conclusion was reached and another was not. This procedure usually involves an edited trace back through the solution path. It is the justifier that has to collect all the intermediate steps and interpret them to answer the user's questions.

The heart of an expert system is the knowledge base which is a collection of facts, rules and information, consulted by the rest of the system in the derivation of solutions.
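The interplay of blackboard, scheduler and interpreter described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any real system: the class names, the numeric `priority` score used by the scheduler and the two toy actions are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """A pending agenda item; `priority` is a made-up usefulness score."""
    name: str
    priority: int
    run: Callable[[Dict[str, str]], Dict[str, str]]


@dataclass
class Blackboard:
    plan: str                                                # direction of the solution
    agenda: List[Action] = field(default_factory=list)       # actions awaiting execution
    solution: Dict[str, str] = field(default_factory=dict)   # hypotheses found so far


def schedule(board: Blackboard) -> Action:
    """Scheduler: select the agenda item judged most useful and remove it."""
    best = max(board.agenda, key=lambda action: action.priority)
    board.agenda.remove(best)
    return best


def interpret(board: Blackboard, action: Action) -> None:
    """Interpreter: execute the action and post its results on the blackboard."""
    board.solution.update(action.run(board.solution))


def solve(board: Blackboard) -> List[str]:
    """Run scheduler and interpreter until the agenda is empty."""
    executed = []
    while board.agenda:
        action = schedule(board)
        interpret(board, action)
        executed.append(action.name)
    return executed


# Hypothetical toy consultation: identify the task, then suggest a command.
board = Blackboard(plan="find a command for the user's task")
board.agenda.append(Action(
    "suggest command", 1,
    lambda s: {"command": "rm" if s.get("task") == "delete a file" else "unknown"}))
board.agenda.append(Action("identify task", 2, lambda s: {"task": "delete a file"}))
order = solve(board)
```

In a fuller system the scheduler would estimate the effect of each action on the current situation rather than read a fixed score, and a justifier could answer "why" questions from the `executed` list.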


3.2 The concept of expert system shells

From the idealised expert system a simpler model of a more realistic system can be drawn:

[Figure: simplified expert system model. Labels: USER; INPUT/OUTPUT PROCESSOR; INFERENCE ENGINE; KNOWLEDGE BASE.]

This generally involves only three parts. The input/output processing can be at any level of sophistication, from a query language through to a semi-natural language front end; it interprets the user's questions and passes them on in a suitable form to the inference engine, which generally uses a fixed plan and agenda to obtain its solution based upon facts contained within the knowledge base.

An example of this type of approach to expert system construction is MYCIN, which uses an inference engine plus a knowledge base in its design and reproduces expert level performance in the domain of infectious diseases.

If the knowledge base were to be removed from MYCIN - possible because the knowledge base and inference engine are independent, separate entities - what is left is EMYCIN (standing for Empty MYCIN), an inference engine with no knowledge base. Thus was developed the idea of having one inference engine to which knowledge bases about any possible domain could be attached. This is a very appealing idea, and is the basis of the expert system "shell" concept.
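The separation that makes a shell possible can be illustrated with a minimal Python sketch: one domain-independent inference function is reused unchanged with two different knowledge bases. The simple forward-chaining strategy and both toy rule sets are assumptions made for illustration; they are not taken from MYCIN, EMYCIN or any other system named in the text.

```python
def infer(rules, facts):
    """Domain-independent engine: apply rules until no new facts appear.

    Each rule is a pair (premises, conclusion); a rule fires when all of
    its premises are already known.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Two hypothetical knowledge bases that share the same rule format.
registration_kb = [
    (("not registered",), "show registration instructions"),
    (("registered", "never logged in"), "show login instructions"),
]
file_help_kb = [
    (("wants to delete a file",), "suggest rm"),
    (("suggest rm", "novice user"), "warn about wildcards"),
]

login_result = infer(registration_kb, {"registered", "never logged in"})
file_result = infer(file_help_kb, {"wants to delete a file", "novice user"})
```

The engine works for both rule sets only because they share one representation, which is the limitation on the shell concept that the following paragraphs discuss.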


EMYCIN has in fact been used in this way with several knowledge bases. The most famous example is the knowledge base on pulmonary disorders, which has produced an expert system known as PUFF [29]. Theoretically, therefore, it could be envisaged that if the inference engine is truly general purpose then it should be possible to combine it with a knowledge base on any domain, resulting in a competent expert system in that domain. Unfortunately this is not yet the case.

It was possible to remove the original knowledge base from MYCIN and add the new one that resulted in PUFF; however, both these knowledge bases are essentially concerned with medical diagnosis and treatment, which allows the knowledge in both cases to be represented in similar ways, i.e., in a form suitable for EMYCIN to understand, the same inference engine being applied to both of them. If the knowledge base of the DENDRAL expert system is considered, the knowledge representational structures are found to be completely incompatible with those of the EMYCIN knowledge bases. DENDRAL is designed to be an expert in the field of analysing molecular structures for chemists on the basis of mass spectrometer and N.M.R. readings. The knowledge base is expressed in terms of directed graphs to represent molecular structures, and the inferencing is carried out on the assumption that there are millions of molecular structures. It can be seen, therefore, that removing the current knowledge base from DENDRAL would not allow a knowledge base in the medical diagnostic field to be added to it and, conversely, EMYCIN will never be able to deduce molecular structures for chemists.

The overall result is that expert system shells do, in general, conform in theory to the ideal expert system shell - but only in a limited sense. The general principle is that the powerful inference engines are only suitable for knowledge bases with domains very similar to that for which the expert system was originally designed. The more general inference engines are only general due to a weakening of the reasoning processes, caused by having to be able to adapt to alternative situations if the need were to arise. This does not imply that the shell idea is not a useful one, for shells have provided the developer with a means of creating a system in a cheaper and quicker way than that of custom design but with many of its advantages. For example, the developer will not have to re-implement the complex and time consuming inference processing, tracking and explanation facilities every time a system is created for a different domain.

The shell based development environment has become a very popular vehicle for constructing expert or prototype systems, and there are many of these semi custom development environments available. For example, GEC have developed APEX 3 [17], a shell for building fault diagnosis expert systems which is derived from PROSPECTOR's inference engine, as is MICRO-EXPERT, developed by ISIS Systems Ltd. [17]. Timeshare U.K. has a shell called REVEAL [17] which features the ability to build up a logical structure of statements of facts and to use fuzzy set theory to allow inquiry on a broad basis rather than within a precise range of parameters. These and other shells can be combined with a selection of expert system tools such as AGE [18], a software tool that aids the design, construction and testing of a variety of frameworks of knowledge based programs, to reduce the construction time of an expert system, making the use of prototypes a feasible option. If no tools or shells are available, a system could be custom built from scratch using languages such as OPS-5 [19], LISP, PROLOG or even Pascal. On existing hardware in a normal computing environment, however, this is a far more complex process.

3.3 Shell selection criteria.

Having defined the problem domain and having chosen to investigate the applicability of developing an expert system through the use of a shell, an assessment of the commercially available shells needed to be undertaken. In order to guide this assessment various aspects needed to be taken into consideration. These considerations can be partitioned into two fairly broad categories: firstly those concerned with technical issues, and secondly those factors which Hendrix calls "human engineering issues" [30].

Technical issues cover areas such as system performance, robustness, versatility, and ease of development and maintenance. Performance factors are important when considering how well the system will run under a given load, so the question to be asked is: does the shell have adequate capabilities for solving significant problems using acceptable resources? This is a very difficult question to answer without previous exposure to the given system.

A feature which is desirable in any computer system is that it is robust, in the sense that it is tolerant of data errors and omissions. The shell should be able to cope with problems of vague, incomplete and sometimes unreliable data by degrading gracefully if the data is inaccurate or missing. In looking at available shells, a point of consideration was that small errors or unknown data should have a proportionally small effect on the outcome and not lead to gross changes in the conclusions drawn.

A shell must be able to cope with a range of problems and decisions without its generality degrading the system's inferential power excessively. Many shells, however, are not as versatile as their developers claim, in that they allow one to choose amongst a set of alternatives but little else. Problems require much more complex solutions than simple choice; consequently many shells make use of the implementation language to allow users to write their own features on top of the existing ones. This can be done, for instance, by having, say, a PROLOG interface to the shell for "external processing"; however, this increases both the cost and the technical skills required. It also loses many of the advantages of having self contained knowledge bases, indicating deficiencies within the inference engine itself.

A significant factor regarding the use of a shell is the ease with which it facilitates development and maintenance. The complexity of the knowledge representation language from which the knowledge base is built needs to be examined, ascertaining how well it will match a representation for the domain for which it is to be used. Careful analysis of the structure of the task needs to be undertaken, together with an assessment of whether any helpful models are available for the user to implement as an intermediate step, if necessary, between the domain representation and the knowledge base.

Maintenance of the system's knowledge base can cause problems. The provision of any tools or environmental aids should be investigated and studied, assessing how usability, intelligibility and accountability are handled by the system. The system should be easy to use and flexible. This is reflected in whether the system is menu driven, consultation driven or gives the user a choice of interface. The user should be able to have as much influence over the flow of control as possible. Most shells tend to be dictatorial in their interaction with the user; advantages would be gained if the user could volunteer information, change information already given, or redirect the system if necessary.

An important aspect of the user interface is the way that it presents knowledge to the user. Shells store knowledge in a technical format or in a knowledge representation language that is highly syntactic and difficult even for trained, experienced users to understand. Some systems, like EMYCIN, have overcome this problem by translating the rules into acceptable English for the developer to review. Another area of importance is the way that the shell handles uncertainty. A common technique is for the shell to say that a fact is unknown unless its truth value is "true"; more sophisticated systems, however, like EMYCIN, allow rules to have weighting factors or allow terms to indicate the level of confidence a factor has.
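The two ways of handling uncertainty mentioned above can be contrasted in a short Python sketch. The first function models the "unknown unless true" convention. The second combines two positive weighting factors using the rule commonly described for MYCIN-style certainty factors; that this exact rule is what the weighting factors referred to above amount to is an assumption made here, and both function names are hypothetical.

```python
def unknown_unless_true(established: bool) -> str:
    """Simple convention: a fact is either established as true or unknown."""
    return "true" if established else "unknown"


def combine(cf_a: float, cf_b: float) -> float:
    """Combine two positive certainty factors supporting the same conclusion.

    Both factors must lie in [0, 1]. Each extra piece of evidence closes
    part of the remaining gap to full certainty, so the result never
    exceeds 1 and does not depend on the order of the two factors.
    """
    if not (0.0 <= cf_a <= 1.0 and 0.0 <= cf_b <= 1.0):
        raise ValueError("this sketch only handles factors between 0 and 1")
    return cf_a + cf_b * (1.0 - cf_a)


# Two rules support the same conclusion with weights 0.6 and 0.5.
combined = combine(0.6, 0.5)
```

Negative or mixed-sign evidence needs additional cases that this sketch deliberately leaves out.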

The third factor that is important from the user's point of view is that of accountability. The resulting conclusion of an expert system is not a truly valid one unless the deduction can be justified. One way for the system to do this is to be able to explain why it has done something and to trace through its reasoning processes, accounting for each step as necessary. This applies not only to the system being able to justify its final conclusion to the user but also to the development of the knowledge base. It is very important that the developer be able to examine the system's inference process and alter the causes of poor reasoning. Other features, apart from the standard explanation, are desirable; these include being able to ask "why do you require this information?", which will force the system to inform the user of the system's future intentions, and questions such as "what if" (the data were different?), which help the user to explore the sensitivity of different findings and look down different avenues in tackling the problem.

Having drawn up these guidelines for the selection of a shell the next step was to examine several of the shells that are commercially available. After an initial survey it was decided that three shells were worth further investigation: Expert-Ease, ESP/Advisor and Apes.

3.4 Conclusion to shell selection.

Having considered the three packages it was decided that, although each package has its own strengths, the ESP/Advisor package with its "text animation" facility would be the most suited to achieving the goal of transferring information from the UNIX manuals into an expert help system.

The Expert-Ease package provided a useful mechanism for building knowledge bases from general instances of rules using Quinlan's ID3 algorithm [31], a useful technique that could be utilised further in future systems. The shell is an important one in that it is the first to use this method of rule creation, and being written in Pascal it gives a far better response time than the logic based systems.
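The central step of ID3, choosing the attribute whose values best separate the example cases, can be sketched in Python as follows. This shows only the entropy-based selection step, not a full tree builder and not Expert-Ease's implementation; the example cases and attribute names are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(examples, attribute, target):
    """Reduction in entropy of `target` obtained by splitting on `attribute`."""
    base = entropy([e[target] for e in examples])
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder


def best_attribute(examples, attributes, target):
    """ID3 selection step: pick the attribute with the highest gain."""
    return max(attributes, key=lambda a: information_gain(examples, a, target))


# Hypothetical example cases: which advice was given to which kind of user.
examples = [
    {"registered": "yes", "year": "1", "advice": "log in"},
    {"registered": "yes", "year": "2", "advice": "log in"},
    {"registered": "no", "year": "1", "advice": "register"},
    {"registered": "no", "year": "2", "advice": "register"},
]
chosen = best_attribute(examples, ["registered", "year"], "advice")
```

A full ID3 implementation would apply this selection recursively to each subset until every subset contains a single class.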

Overall Expert-Ease would have made an interesting development environment, but because the system is suited to a deductive rather than an advisory role it was felt that it was not the most appropriate.

The Apes system was found to be a very good one in that it has a very powerful deductive system, coupled with modules that give considerable flexibility to the system. The package of tools of which the shell is composed was most impressive and covered all of the necessary areas for developing a system; for example it has the capabilities for automatically generating dialogue with a natural language syntax that allowed "restrictive" queries from the user. It also provides the Apes user dialogue facilities i.e. validity constraints and consistency checker which provide a sound basis upon which to develop a system. The Apes system was released in 1983 and has proved [32] to be a reliable system with a wide variety of applications.

The Apes shell was not selected as the development shell, as the syntax of its Prolog base proved to be more conducive to developing deductive systems than to those handling large amounts of text. This is not to say that the expert help system could not be developed using Apes, but merely that it was a less natural representation than that of the ESP/Advisor shell.


3.5 ESP/Advisor

The ESP/Advisor package is developed by Expert Systems International, of Oxford, and is specifically designed to cater for an area of the market that concerns the creation of knowledge bases from text. This process is known as "text animation". This means that the rules, knowledge and advice present in a piece of text can be represented in a precise and formal way using a "knowledge representation language" to create a knowledge base. The knowledge base can then be "animated" (queried) by the consultation shell to provide an interactive consultant for the text involved. It is aimed at producing knowledge bases that capture the kinds of information held by manuals, guide books, legal documents etc. The shell itself is composed of three parts: the knowledge representation language (KRL), the KRL compiler and the consultation shell. These fit together in the following configuration:

[Figure: ESP/Advisor configuration. Labels, in order: TEXT; KRL KNOWLEDGE BASE; KRL COMPILER; COMPILED KNOWLEDGE BASE; CONSULTATION SHELL; USERS.]

In order to start answering the questions raised in the second and third stage of the development cycle, the conceptualisation and formalisation stages, it was necessary to investigate the shell further, its control structures, its facilities and how best to utilise its interactive capabilities.

It was decided to use as an example domain for this "familiarisation period" the problem of setting up a new user at the O.U.C.S. (Oxford University Computer Service). The system should say which machines the user is allowed to use, what usernames and initial passwords they should be given etc. It was hoped that this would prove useful in that the developer would be allowed a "dynamic" rather than "static" view of the shell (without having to get tied down in the structural details of a complex domain). After having worked in this "familiarisation domain", the conceptual model of UNIX, when developed, could be done in the full light of any representational or structural difficulties encountered with the shell and its environment.

3.5.1 The Knowledge Representation Language.

The ESP/Advisor KRL is used for specifying the knowledge needed by the consultation shell. The language is used to express rules, regulations, conditions and advisory text in an "English like" syntax. The aim of KRL is to express the textual base in a form that models the original document as closely as possible.

A KRL knowledge base may be divided into sections each of which may contain three kinds of object: paragraphs of text, section references and quit points, any of which may be associated with a condition. The objects describe actions to be taken by the consultation shell i.e., display a paragraph, use a different section, or quit the current section. If the object has associated with it conditions, then the actions it describes will only be taken if the condition can be shown to be true. Objects which are not conditional describe actions that will always be taken.

Conditionals, the mechanism that drives the system in its backward chaining mode of operation, are evaluated as soon as they are encountered. The parameters (which are unknown quantities whose values are supplied by the user during consultation) occur within each condition and are investigated when they are needed in order to evaluate the condition. The investigation of a parameter may involve evaluating an expression, which may in turn involve the investigation of other parameters, or asking a question. The chaining mechanism is automatically handled by the shell. The shell incorporates a redundancy mechanism: if it is called upon to evaluate the same parameter twice, it will remember the parameter's value from the first instance and will neither evaluate it a second time nor ask the user for it again.
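The demand-driven investigation of parameters, and the redundancy mechanism that prevents a question from being asked twice, can be sketched in Python as below. This is not how the ESP/Advisor shell is implemented; the `Consultation` class, its method names and the three toy parameters are assumptions made purely for illustration.

```python
class Consultation:
    """Investigates parameters on demand and remembers every value found."""

    def __init__(self, askable, derived):
        self.askable = askable    # name -> function returning the user's answer
        self.derived = derived    # name -> function(consultation) -> value
        self.cache = {}           # values already established
        self.asked = []           # questions actually put to the user, in order

    def value(self, name):
        # Redundancy mechanism: reuse a value established earlier.
        if name in self.cache:
            return self.cache[name]
        if name in self.derived:
            # Evaluating an expression may investigate further parameters.
            result = self.derived[name](self)
        else:
            self.asked.append(name)
            result = self.askable[name]()
        self.cache[name] = result
        return result


# Canned answers stand in for the user in this hypothetical session.
answers = {"registered": True, "logged in": False}
session = Consultation(
    askable={name: (lambda n=name: answers[n]) for name in answers},
    derived={"needs login help":
             lambda c: c.value("registered") and not c.value("logged in")},
)
first = session.value("needs login help")
second = session.value("needs login help")
```

Although the condition is evaluated twice, each underlying question is put to the user only once, which is the behaviour the redundancy mechanism is described as providing.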

A full description of KRL is given in [33]; however, a few examples are presented here for clarity.

There are four types of parameters used i) factual, ii) number, iii) category, and iv) phrase.

Factual parameters ask a question to which a boolean answer is required from the user and this is then associated by the system with the parameter name, i.e.

log in: 'this tests to see if the user has ever logged in'
    fact
    explanation
        'To determine whether the instructions regarding'
        'logging in need to be specified to the user'
    askable
        'have you ever logged in?'.

The parameter is held in a self contained definition which includes other information such as text to aid in the explanation of that parameter and a general description of the parameter. The three other parameters follow a similar format of description, explanation and query.

An example of the number parameter is

student year: 'year of study'
    number
    explanation
        'This parameter tests to see which computers'
        'undergraduates can use'
    range 1..4
    askable
        'which year are you in?'.

This is very similar, except that instead of a boolean input being accepted, a number in a pre-specified range must be input.

The phrase parameter is of the form:

username: 'the users name'
    phrase
    explanation
        'The users initials are needed in order to compose'
        'the users login code'
    askable
        'what are your initials?'.

which allows text to be input and held under the parameter name; this can then be used and output later if desired.

The category parameter is used to offer a selection facility to the user and is of the form:

user type: 'type to be registered under'
    category
    explanation
        'This parameter tests to see for which type of user the'
        'system should be set up for'
    options
        dphil - 'D.Phil',
        msc - 'M.Sc',
        ba - 'B.A'
    askable
        'what type of user are you?'.

The options from which the user can select, 'D.Phil, M.Sc, B.A', are then displayed on the screen as a menu to which the user is prompted to respond.
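The four parameter types can be mirrored in a small Python model that checks a user's reply against the type of the parameter. This is a sketch of the idea only, not of the ESP/Advisor implementation: the `Parameter` class, the `parse_answer` function and the accepted spellings of yes and no are assumptions, while the parameter names, questions, range and options are copied from the KRL examples above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Parameter:
    name: str
    kind: str                                  # 'fact', 'number', 'category' or 'phrase'
    question: str                              # the "askable" text
    number_range: Optional[Tuple[int, int]] = None
    options: Optional[Dict[str, str]] = None   # option key -> menu text


def parse_answer(param: Parameter, raw: str):
    """Convert a typed reply into a value, rejecting replies of the wrong type."""
    text = raw.strip()
    if param.kind == "fact":
        if text.lower() in ("yes", "y"):
            return True
        if text.lower() in ("no", "n"):
            return False
        raise ValueError("a yes/no answer is required")
    if param.kind == "number":
        number = int(text)                     # raises ValueError for non-numbers
        low, high = param.number_range
        if not low <= number <= high:
            raise ValueError(f"number must be in the range {low}..{high}")
        return number
    if param.kind == "category":
        if text not in param.options:
            raise ValueError("answer must be one of the listed options")
        return text
    if param.kind == "phrase":
        return text
    raise ValueError(f"unknown parameter type: {param.kind}")


log_in = Parameter("log in", "fact", "have you ever logged in?")
student_year = Parameter("student year", "number", "which year are you in?",
                         number_range=(1, 4))
username = Parameter("username", "phrase", "what are your initials?")
user_type = Parameter("user type", "category", "what type of user are you?",
                      options={"dphil": "D.Phil", "msc": "M.Sc", "ba": "B.A"})
```

For instance, `parse_answer(student_year, "4")` yields the integer 4, whereas a reply of "5" is rejected because it lies outside the declared range.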

The parameters are used in the conditional statements. The simplest form of conditional is one that outputs a textual answer once all the parameters are satisfied, i.e.

{user type = ba and student year = 4} 'The fourth year undergraduates have access to VAX3'

The way that a problem is usually decomposed is by the use of sections, each of which modularises a certain aspect of it. For example, the domain of setting up new users was decomposed into three sections. The first was "getting started", which decided the user's needs, i.e. whether they needed to be told about the process of becoming a registered user or whether they needed to be told how to log in to the machine; the other two sections, "reg user inst" and "log in inst", dealt with these two processes.

Once a control sequence has been decided it is implemented by using conditions in conjunction with section references i.e.,

{not reg user} reference reg user inst.
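The way conditions and section references together produce a control sequence can be sketched in Python. The sketch models the three kinds of object described in section 3.5.1 (paragraphs of text, section references and quit points), each optionally guarded by a condition. The tuple representation, the `run_section` function and the advisory texts are assumptions made for illustration; the section and parameter names follow the example domain.

```python
def run_section(name, sections, values, output=None):
    """Work through one section, returning the paragraphs that are displayed.

    Each object is a triple (condition, kind, payload). A condition of None
    means the action is always taken; otherwise it is a function of the
    parameter values and the action is taken only if it returns True.
    """
    if output is None:
        output = []
    for condition, kind, payload in sections[name]:
        if condition is not None and not condition(values):
            continue
        if kind == "text":
            output.append(payload)                      # display a paragraph
        elif kind == "reference":
            run_section(payload, sections, values, output)  # use another section
        elif kind == "quit":
            break                                       # quit the current section
    return output


# Hypothetical knowledge base for the "new user" familiarisation domain.
sections = {
    "getting started": [
        (lambda v: not v["reg user"], "reference", "reg user inst"),
        (lambda v: not v["log in"], "reference", "log in inst"),
    ],
    "reg user inst": [(None, "text", "How to become a registered user.")],
    "log in inst": [(None, "text", "How to log in to the machine.")],
}

new_user = run_section("getting started", sections,
                       {"reg user": False, "log in": False})
registered_user = run_section("getting started", sections,
                              {"reg user": True, "log in": False})
```

Here the parameter values are supplied up front for brevity; in the shell they would be investigated on demand as each condition is met.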

This then is the basis of the knowledge representation language in which the knowledge base is implemented. However the KRL program is not used by the consultation shell directly but is compiled into a more suitable format.

3.5.2 The KRL compiler

The KRL compiler, which could be more accurately described as a pre-processor, translates the KRL program into a format that the consultation shell can use more easily, this being a form of Prolog-1. The compiler also has a secondary role in that, apart from performing a syntactic check of the knowledge base, it performs consistency checks of the parameters and sections.

Another feature that the compiler provides is the ability to list a summary of the knowledge base sections and parameters either to a file or to a printer. Suppression of the consistency check is also available, as this reduces the compile time considerably; however, this is not recommended.


3.5.3 The consultation shell

The consultation shell is the part of the ESP/Advisor package that "animates" the knowledge base text, allowing the expertise and information to be inspected and used through an interface. As described in section 3.5.1, the dialogue is directed in response to the structure of the sections and is ultimately driven towards solving the parameters of each conditional statement. To do this, the "askable" question for each parameter is displayed on the screen, and the user has to respond with a suitable command. When the system has reached its conclusion to the problem it then outputs the advisory text.

The consultation shell has several tools to help the developer construct the system and the user to run and understand the reasoning process. The screen area of the shell displays five types of information to inform the developer of the status of the system. Two types are displayed by default - GOAL and SECTION; the other three - TRACING, PRINTING and LOGGING - are displayed by request.

GOAL - displays the name of the parameter which is currently being interrogated by the shell.

SECTION - displays the name of the current section of the knowledge base through which the system is working.

TRACING mode - displays each new goal of investigation and reports whether each goal succeeds or fails.

PRINTING mode - This allows any information or advice that is displayed on the screen to be sent to the line printer.

LOGGING mode - this enables a log record of the consultation session to be sent to a file and records everything which occurred during the session whilst the logging mode was active.

If the user requires help in understanding the current question or needs an explanation of what options are open, the shell provides a help system. This lists out the available commands most of which are self explanatory but amongst the most useful are:

EXPLAIN, which shows what information a question is trying to establish.

WHY N, which is used to explain why the system is trying to establish a fact.

HOW, gives the line of reasoning that has been used to establish a fact.

A useful feature is the SAVE command, which allows the user to save a consultation as it stands and return to it at a later date. This allows the user to research a particular point and then return to the system later without having to re-enter any data.

3.6 Conceptualisation stage

Having developed a small system for the domain of adding new users to a computer it was felt that enough experience had been gained to be able to proceed on to the conceptualisation stage of the UNIX domain problem.

The conceptualisation stage involved consideration of the problem area and refining it one stage further towards an implementation, but stopping prior to deciding upon any data structures. The way that this was approached was to consider firstly what types of data were available to be used. The answer to this was that any textual information on UNIX could be used as a potential data source, so there was a large amount of information available upon which the system could draw. The type of data then becomes the next concern: is the knowledge directly represented in the data (the text in this case) or does it need to be inferred from one or more data sources? The answer to this depends upon the types of query posed by the user that the system is trying to solve. If the query were of the type "what is the command to copy a file?" then this could be looked up from the manual directly; however, if the query were "how do I print a page from the manual on the laser printer?" then more information is inferred, i.e., that the user understands "pipes" and "I/O streams". Thus the problem of inferred and contextual information has to be noted and catered for.

The conceptualisation stage would normally involve repeated interactions between the knowledge engineer and the domain expert in an attempt to analyse problems and define solution methods. In this problem, however, the domain expert is "static", in that the textual sources can only be used to provide limited examples; it is for the knowledge engineer to interpret these in a wider sense than in the normal knowledge engineering process. One method developed for identifying the processes and constraints imposed on a problem solution was to ask questions that related a subproblem of the UNIX domain to a similar problem in a domain where the solution is known, i.e., "on the VMS operating system one types 'dir' to list a directory; what is the command on UNIX and how do the concepts relate to one another?" This method proved to be useful.

The conceptualisation stage was felt to be a far more difficult process to undergo with textual domain expertise than with a human expert, who could provide a more stimulating interaction. It is easier to miss subtle points, misunderstand ambiguities and wrongly interpret inter-relationships when working from text than when they are explained by a human tutor who is aware of the knowledge engineer's limited level of understanding and grasp of the points made. Elaborations and clarifications can then be given to ensure that an adequate understanding has been obtained. It is therefore probably the case that the conceptual level reached from text will not, initially, be as high as when a human expert is available. However, once the "conceptually simpler" system has been implemented, possibly in less time than the more complex "human expert/knowledge engineer" system, it is most likely that the knowledge engineer will quickly become more proficient, and upon the reformulation of the conceptualisation stage a greater amount of time will be spent on increasing the depth of the system's knowledge. Overall the level of performance reached could be equal. Further work is, however, required in this area to substantiate the idea.


CHAPTER FOUR

Conceptual model

The implementation of the small "familiarisation" domain in ESP/Advisor's KRL was an extremely useful and beneficial exercise, bringing to light the following points:

1) It showed the structures allowed by the knowledge representation language from which the knowledge base could be created. Traced back one stage, these are the structures into which the domain would need to be decomposed in order to construct a solution in that knowledge representation.

2) It showed how the structure of the interaction and the input/output could be manipulated in order to achieve the best results. The interactive qualities did, however, prove to be very disappointing in respect of their "conversational" capabilities when used in a domain which is not purely deductive. If the domain had been fault diagnosis or queries about house conveyancing, where the action sequence is "linear", then the mode of dialogue would be very suitable; but when used in a conversational mode, where the initiative needs to be user driven rather than system driven, difficulties do arise.

3) The prototype development showed the lack of an adequate tool with which the system developer can structure the design of the knowledge base.

The problem is again amplified by the non-deductive, non-linear nature of the problem. If the problem were linear then the developer would have strong indications about where to start in designing the system, but with a highly interactive help system the dialogue and knowledge base need to be very open ended. The developer is therefore faced with two problems: i) a large amount of information which needs representing in KRL to form the knowledge base; ii) the non-availability of structuring tools with which the problem could be modularised.

The next logical step was to develop a "conceptual model" which would aid developers to map the problems from the domain concepts into the representation language of the knowledge base. This task is undertaken in the formalisation stage of the development cycle.

4.1 Formalisation stage

The solution of the problems put forward in the conceptualisation stage is the task of the formalisation process, which attempts to map the concepts and information isolated during the previous stages onto a more formal representation, the result of which is the specification of the knowledge representation used to hold the knowledge base. At this stage a model to span the conceptual and implementational levels of development should be built.

The lack of any tool with which to structure the knowledge base is a major problem for the developer. Normally in a deductive system the developer can quite simply map the domain expert's knowledge into a series of production rules or a similar representation, which can then be easily mapped into an appropriate implementation, especially if the system is using a convenient production system language. This is why knowledge bases are quite often called 'rule bases' and expert systems 'rule based systems': rules compose most of the knowledge base and can be added, deleted or modified in order to improve the performance of the system. The domain in this problem is, however, slightly different from normal in that it is not wholly composed of condition-action pairs, so a production system based representation would not be the most appropriate way in which to express the knowledge. Therefore an entirely new model, to be known as the "conceptual model", had to be developed.

4.1.1 A walkthrough of the conceptual model.

The model is based upon the two constructs of the KRL, namely the section and the paragraph. To add to the modularisation of problems, it was decided that the two constructs should be modelled by "frames" which contain enough information to allow the desired section or paragraph to be designed and implemented. A full BNF description of this model is given in appendix A; here we describe how the model is constructed by walking through the BNF, giving examples where necessary.

The highest level is the section and is designated by the B.N.F definition:

<section-spec>::= <section-name> <desc-spec> <section-body>

This can alternatively be defined by the frame notation:

-<section-name>-
<desc-spec>
<section-body>

It should be noted that these are not Z or schema definitions [34].


The <desc-spec> is defined by:

<desc-spec> ::= desc: <string>

and is used to supply a description of what the section is trying to achieve. An example of a section definition with <desc-spec> filled in is:

-section-1-
desc: This text describes the role of the section
<section-body>

One of the ideas behind the module frame is that the knowledge engineer can develop it in stages, enlarging upon one part at a time. For instance, in the "section-1" example above the engineer could develop the system at the top level, defining all of the sections in the next level and then filling in the descriptions of each subsection, or alternatively concentrate on developing one particular section in full at a time.

The next logical step in the section-1 frame is to fill in the <section-body> part:

<section-body> ::= <pred> => ref <section-spec> |
                   <pred> => <text-gen>

Here "ref <section-spec>" simply references another section if the predicate is true, and <text-gen> refers to the generation of advisory text to be output. This text is contained in a special text paragraph:

<text-gen>::= <text-name><text>

or in frame format

<text-name>

<text>


Expanding this further

<simple-string><number>
<text>

Thus the text is held in the following format

text1
This is the text that will be output if the predicate referring to it is true.

The format of the frame templates should be explained. A section frame consists of two halves, with its top line broken to allow a section title to be inserted. A paragraph frame is similar but without any break in the top line. A double line at the top of the frame indicates that the parameter is a special type of paragraph: a text paragraph used to hold the advisory text that will ultimately be output. The use of these frame templates allows the different types of entity to be easily distinguished, thus helping the developer to break down the problem into different levels and the implementor to recognise these levels.

Recall that the basis of the <section-body> is thus the production rule,

<pred> => "action to be taken if true"

read as "evaluate predicate and then perform the action if true".
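As an illustration only, the section frame and its production rules can be pictured as small records. The sketch below uses present-day Python rather than the KRL of the project; the names `Rule`, `Section` and `unresolved_refs` are hypothetical and are not part of the model or of ESP/Advisor. It shows how the stepwise style of development could be supported by listing the sections that rules reference but that have not yet been designed, using a rule taken from the UNIX domain.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    pred: str      # predicate text, e.g. "beginner"
    kind: str      # "ref" for a section reference, "text" for advisory text
    target: str    # name of the referenced section or text paragraph


@dataclass
class Section:
    name: str
    desc: str = ""
    body: list = field(default_factory=list)   # the rules of <section-body>


def unresolved_refs(sections):
    """Names of sections referenced by some rule but not yet defined."""
    defined = {s.name for s in sections}
    wanted = {r.target for s in sections for r in s.body if r.kind == "ref"}
    return sorted(wanted - defined)


# Top-level frame with a single rule: (beginner)? => ref getting-started
title = Section("title", body=[Rule("beginner", "ref", "getting-started")])
todo = unresolved_refs([title])
```

Here `todo` names the one section still to be designed; adding a `Section("getting-started")` to the list would empty it.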

It is now necessary to consider the construction of a predicate. The predicate is defined by

<pred> ::= (<paragraph-spec>)? | (<paragraph-spec> <operator> <action>)? | {(<paragraph-spec>)? <operator> <pred>}*

which entails evaluating the specification of parameters.


The simplest type of parameter is that of the "fact" type, that is, a boolean predicate whose value needs to be established. For example

-title-
UNIX help facility
(beginner)? => ref getting-started

This section frame, taken from the UNIX domain, has a single production rule involving the predicate "beginner", which is placed inside some syntactic sugar to emphasise in a designer-friendly way that it is a predicate. The emphasis once again is on design by "development templates": the designer can simply decide to have a predicate "beginner" that, when true, causes the section "getting-started" to be referenced, without needing to develop the structure of "getting-started" until after "beginner" has been developed. To ensure that the conditional part has a correct relationship to the action, it is advised that the predicate is developed first. Returning to the "beginner" predicate: as it involves no operators or other predicates it must be a boolean or factual predicate, which is defined by a fact parameter paragraph.

<para-type>

<body-type>

developing one stage

<para-name> <type>

<fact-body>

which can be developed further

<parameter> [<type-body>]

<desc-spec>

<expl-spec>

<query-spec>

then

<para-name> [fact]

desc: <string>

expl: <string>

query: <text>

It is now a simple exercise to fill out the body of this parameter paragraph frame. The <para-name>, as already decided, is "beginner", the predicate's name. The string following "desc:" describes the parameter in general terms, the string following "expl:" explains why the parameter is needed, and the text following "query:" is the question to which the user has to respond.

The completed frame looks like:

beginner [fact]
desc: you are new to this operating system
expl: this is to determine the level of help necessary
query: are you a beginner with UNIX

In this development, the <type-body> which becomes [fact] is used as a quick reminder to the developer of what type of parameter the frame contains and is in practice an extremely useful aid to development.
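The role of the frame as a template can be sketched in code. The following Python fragment is an illustration under assumptions, not part of the model: the class name `FactFrame` and the method `missing_slots` are hypothetical. It shows a fact frame being filled in stage by stage, with a check for slots that have been left empty.

```python
from dataclasses import dataclass


@dataclass
class FactFrame:
    """A fact (boolean) parameter frame with desc, expl and query slots."""
    name: str
    desc: str = ""
    expl: str = ""
    query: str = ""

    def missing_slots(self):
        """Return the names of the slots the designer has not yet filled in."""
        return [slot for slot in ("desc", "expl", "query")
                if not getattr(self, slot)]


# The "beginner" frame from the text, developed one slot at a time.
beginner = FactFrame(name="beginner")
beginner.desc = "you are new to this operating system"
beginner.expl = "this is to determine the level of help necessary"
missing_before_query = beginner.missing_slots()
beginner.query = "are you a beginner with UNIX"
```

Before the query is supplied the check reports the one outstanding slot; once all three slots are filled it reports none.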


The second parameter type is category. This follows similar lines to that of factual parameters. Category types allow the developer to let a parameter take one of a range of options. Another example taken from the UNIX domain is:

-chronological-functions-
desc: This section deals with the calendar, date and time functions.
(option-16 = cal)? => ref section-16.1
(option-16 = calendar)? => ref section-16.2
(option-16 = date)? => ref section-16.3
(option-16 = time)? => ref section-16.4
(option-16 = expanded-format)? => ref section-16.5
(option-16 = none)? => quit

Where option-16 evolves in a similar way to a boolean parameter.

<para-type>

<body-type>

developing one stage

<para-name> <type>

<category-body>

which can be further developed

<para-name> [<type-body>]

<desc-spec>

<expl-spec>

<options-spec>

<query-spec>

then

<para-name> [category]

desc: <string>

expl: <string>

options: option {, option}*

query: <text>

This can then be filled in with the desired information. The para-name is option-16, and the description and explanation strings fulfil the same roles as they do in the fact parameter frame, as does the query text. The

options: option {, option}*

syntax gives a list of the possible options available to that parameter.


The category frame for the section-16 frame then becomes:

option-16 [category]
desc: the calendar, date and time functions
expl: tests to see which function from the chronological ones is to be used
options: cal, calendar, date, time, expanded-format, none
query: please select the function you are interested in.

The third type of paragraph is the number parameter, which can be developed in a similar way. For example, a section might have as its predicate

-reg-user-inst-
desc: this section deals with becoming a registered user
(user-type = B.A.)? ∧ (student-year = 4)? => text1

where the first part of the conjunction is of a category type and the second is of a number type. The number type works in exactly the same way as category, except that instead of explicitly stating each option an abbreviation is allowed which restricts the values to integers between given limits. For example:

student-year [number]
desc: year of study
expl: tests to see which computers undergraduates can use
range: 1..4
query: which year are you in

which could be written in a category frame:


student-year [category]

desc: year of study

expl: test to see which computers undergraduates can use

options: 1, 2, 3, 4

query: which year are you in

However, if the range is large, say 1..9000, listing every option would be impractical, and so the separate number frame was introduced.
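The equivalence between a number frame and a category frame, and the reason for preferring the range form, can be sketched as follows. This is an illustrative Python fragment; the function names `range_to_options` and `in_range` are hypothetical and do not come from the KRL.

```python
def range_to_options(low, high):
    """Expand a number frame's range low..high into explicit category options."""
    return [str(n) for n in range(low, high + 1)]


def in_range(answer, low, high):
    """Check an answer against the range without listing every option."""
    return answer.isdecimal() and low <= int(answer) <= high


student_year_options = range_to_options(1, 4)
large_option_count = len(range_to_options(1, 9000))
```

For the "student-year" frame the expansion gives the four options of the category version, whereas the range 1..9000 would require nine thousand explicit options; the range check needs only the two limits.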

The fourth parameter type is the phrase parameter. This allows the user to input a piece of text and have it associated with the parameter name of the given frame. The phrase frame is built up in the same way as the other parameter frames. An example of a phrase frame, taken from the familiarisation domain, arises when the user is asked for his initials to enable the system manager to compose a username:

username [phrase]
desc: the user's name
expl: the user's initials are needed in order to compose the user's login code
query: what are your initials

The paragraph parameter frames form a set of building blocks upon which the predicates can be constructed and the sections based. By developing paragraphs in this way the frames act as templates, ensuring that no information that needs to be associated with a parameter is overlooked.

It is now necessary to expand upon what happens when the predicate portion of the section contains more than one predicate. Basically this can be split into two cases depending upon whether the predicates are factual or non-factual.


Factual predicates (as in the section frame "getting-started")

-getting-started-
desc: this section enables people to get started with UNIX
(not reg-user)? => ref reg-user-inst
(not reg-user)? ∧ (not log-in)? => ref log-in-inst
(not reg-user)? ∧ (not log-in)? ∧ (not cont)? => text1
(reg-user)? ∧ (not log-in)? => ref log-in-inst
(reg-user)? ∧ (not log-in)? ∧ (cont)? => ref system-commands
(reg-user)? ∧ (not log-in)? ∧ (not cont)? => text2
(reg-user)? ∧ (log-in)? ∧ (cont)? => ref system-commands
(reg-user)? ∧ (log-in)? ∧ (not cont)? => text3

The rules are investigated in the order in which they are placed in the list. In "getting-started" the first parameter to be established is "reg-user"; if this is false then "reg-user-inst" will be referenced and this path followed. When the path for "reg-user-inst" is exhausted, control returns to the point following the rule that "fired" and an attempt is made to establish the second rule. As the value of the "reg-user" parameter has already been established it need not be investigated again, so "log-in" is solved next, with the second rule being fired if "log-in" proves to be false. This process continues until all of the rules have been investigated, whereupon the consultation ends.
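The rule firing order described above can be simulated with a short sketch. The Python below is an illustration, not the ESP/Advisor mechanism itself: the function `run_section` and its data layout are hypothetical, referenced sections are recorded rather than followed, and it is an assumption that a conjunction stops being evaluated at its first failing predicate.

```python
def run_section(rules, ask):
    """Investigate each rule in order; fire it when all its conditions hold.

    rules: list of (conditions, action); conditions are (parameter, wanted) pairs.
    ask:   function called at most once per parameter to obtain its value.
    """
    known = {}    # parameter values already established
    asked = []    # order in which questions were put to the user
    fired = []    # actions of the rules that fired, in order

    def value(param):
        if param not in known:
            known[param] = ask(param)
            asked.append(param)
        return known[param]

    for conditions, action in rules:
        # all() stops at the first failing condition (assumed behaviour).
        if all(value(p) == wanted for p, wanted in conditions):
            fired.append(action)
    return fired, asked


GETTING_STARTED = [
    ([("reg-user", False)], "ref reg-user-inst"),
    ([("reg-user", False), ("log-in", False)], "ref log-in-inst"),
    ([("reg-user", False), ("log-in", False), ("cont", False)], "text1"),
    ([("reg-user", True), ("log-in", False)], "ref log-in-inst"),
    ([("reg-user", True), ("log-in", False), ("cont", True)], "ref system-commands"),
    ([("reg-user", True), ("log-in", False), ("cont", False)], "text2"),
    ([("reg-user", True), ("log-in", True), ("cont", True)], "ref system-commands"),
    ([("reg-user", True), ("log-in", True), ("cont", False)], "text3"),
]

answers = {"reg-user": False, "log-in": False, "cont": False}
fired, asked = run_section(GETTING_STARTED, answers.__getitem__)
```

For an unregistered user who has not logged in and does not wish to continue, the first three rules fire in turn and each parameter is asked for exactly once.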

The second type of rule firing mechanism is for non-factual predicates such as category rules. If the section frame for "chronological-functions" is reconsidered:

-chronological-functions-
desc: This section deals with the calendar, date and time functions.
(option-16 = cal)? => ref section-16.1
(option-16 = calendar)? => ref section-16.2
(option-16 = date)? => ref section-16.3
(option-16 = time)? => ref section-16.4
(option-16 = expanded-format)? => ref section-16.5
(option-16 = none)? => quit

The way that this is evaluated is for the "option-16" query to be asked and then for each predicate to be matched against the answer. In this case there is only one possible match and if none is found to match then the question is re-stated.
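A minimal sketch of this evaluation, in Python and under stated assumptions, is given below. The function name `consult` is hypothetical, and the list of replies stands in for the interactive query; the real shell's behaviour on an invalid reply is taken only from the description above.

```python
OPTION_16_RULES = {
    "cal": "ref section-16.1",
    "calendar": "ref section-16.2",
    "date": "ref section-16.3",
    "time": "ref section-16.4",
    "expanded-format": "ref section-16.5",
    "none": "quit",
}


def consult(rules, replies):
    """Re-state the query until a reply matches an option, then return the
    action of the single matching rule and the number of times it was asked."""
    attempts = 0
    for reply in replies:
        attempts += 1
        if reply in rules:
            return rules[reply], attempts
    raise ValueError("no valid option was supplied")


# The first reply matches no option, so the question is asked a second time.
action, attempts = consult(OPTION_16_RULES, iter(["clock", "date"]))
```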


There is, however, a problem with this type of section frame: once a parameter has been assigned a value, the assignment cannot be changed. The aim is to reduce the number of questions asked by eliminating repeated and redundant questioning, but it is felt that this is a serious limitation in the KRL design as it inhibits menu driven systems from being constructed. If the user wishes to examine the "calendar" function but selects "cal", and hence "section-16.1", by mistake, he would be unable to re-select the correct section, "section-16.2", via the option parameter "option-16".

Thus this is a problem forced upon the model by the implementation and not a fault of the model. If a feature allowing re-assignment of parameters were to be introduced at a later stage, perhaps as a switch that could be set, then the model could still be used. However, the set of assumptions upon which it is based would need to be changed.
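The effect of such a switch can be sketched as follows. This Python fragment is purely illustrative: the class `ParameterStore` and its `allow_reassign` flag are hypothetical and do not correspond to any feature of the KRL.

```python
class ParameterStore:
    """Holds established parameter values; an assignment is fixed unless the
    (hypothetical) allow_reassign switch is set."""

    def __init__(self, allow_reassign=False):
        self.allow_reassign = allow_reassign
        self._values = {}

    def assign(self, name, value):
        """Record a value; return True if the store now holds `value`."""
        if name in self._values and not self.allow_reassign:
            return self._values[name] == value
        self._values[name] = value
        return True

    def get(self, name):
        return self._values[name]


# Fixed assignment: the mistaken choice "cal" cannot be corrected.
fixed = ParameterStore()
fixed.assign("option-16", "cal")
changed = fixed.assign("option-16", "calendar")

# With the switch set, the user can re-select the intended option.
switchable = ParameterStore(allow_reassign=True)
switchable.assign("option-16", "cal")
switchable.assign("option-16", "calendar")
```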

One point that the selection of rules raised was that of "case analysis", which made it necessary to consider decision structures over time. This analysis uncovered some redundant questions. For instance, in the section frame "getting-started" the first thing to check was whether or not the user was registered. If he was registered, the system could pass on to whether he had logged in before and, if he had, whether he wished to have help on UNIX system commands. If he had not logged in but was registered then he would need to be informed of how to log in before being consulted upon whether further help was desired on the system commands. If the user was not registered then he would need information on how to become registered before being asked if he had logged in. If the user had never been registered then there was no point in asking about logging in, so making this question redundant.


A useful tool for finding redundant questions is to draw a decision tree which shows the structure of the questions. For example:

REG-USER?
  YES: PREVIOUS-LOG-IN?
    YES: CONT?
      YES: [REF SYS-CMDS]
      NO:  "TEXT3"
    NO:  [REF LOG-IN-INST] then CONT?
      YES: [REF SYS-CMDS]
      NO:  "TEXT2"
  NO:  [REF REG-USER-INST] then PREVIOUS-LOG-IN?
    YES: *
    NO:  [REF LOG-IN-INST] then CONT?
      YES: [REF SYS-CMDS]
      NO:  "TEXT1"

shows the structure of the "getting-started" decisions. Strings followed by a question mark represent predicates, references to sections are in boxes, advisory text is in quotes, and a star marks an illegal decision path, which is a cause of redundancy.
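The search for illegal paths can also be sketched mechanically. The Python below is an illustration under assumptions: the nested-tuple encoding of the tree and the function `paths` are hypothetical, the intermediate section references are omitted for brevity, and the position of the starred path is taken from the case analysis above (an unregistered user cannot have logged in).

```python
# Each inner node is (question, {answer: subtree}); each leaf is a string.
TREE = ("reg-user", {
    True: ("log-in", {
        True: ("cont", {True: "ref system-commands", False: "text3"}),
        False: ("cont", {True: "ref system-commands", False: "text2"}),
    }),
    False: ("log-in", {
        True: "*",   # illegal path: not registered yet previously logged in
        False: ("cont", {True: "ref system-commands", False: "text1"}),
    }),
})


def paths(node, trail=()):
    """Yield (answers along the path, leaf) for every path through the tree."""
    if isinstance(node, str):
        yield trail, node
        return
    question, branches = node
    for answer, child in branches.items():
        yield from paths(child, trail + ((question, answer),))


illegal = [trail for trail, leaf in paths(TREE) if leaf == "*"]
```

Listing the starred paths identifies the combination of answers that can never occur, and hence the question that need not be asked on that branch.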

A further point about the syntax of the model is the use of

@<para-name> and #<para-name>

in string items. These refer to phrase and number parameters used in output. For instance, when instructing the user how to log in to the system, the system designer would use a frame like "text10":

text10
PRG@username is your username and your password is @password
switch on the terminal and when the prompt
login:
appears, type your username e.g. PRG@username
then the prompt
password:
will appear. So type in your password e.g. @password
you should now be logged in and the prompt
$
should appear.

The designer may, at this stage, require the system to output some text that models what the user will actually experience when logging in. So rather than use some imaginary general username as an example, e.g. "PRGXYZ", the designer may wish to use the exact username that the user will input. Usernames are composed of two halves: "PRG" (which designates that the user is, in this case, from the Programming Research Group) with the user's initials appended, i.e. if the user's initials are "RTP" then the username is "PRGRTP". It is therefore necessary to obtain the user's initials, and this is done through a phrase parameter called from within the text frame. In order to designate that the string amongst the text is in fact a paragraph name, and also that its value should be output along with the text, it is prefixed by an "@" sign. If the paragraph is a number then it is used in the same way, but prefixed by a "#" symbol. If "RTP" is given as the user's initials and "wadham" as the password then "text10" would represent the following advisory text.

"PRGRTP is your username and your password is wadham switch on the terminal and when the prompt login: appears, type your username e.g. PRGRTP then the prompt password: will appear. So type in your password e.g. wadham you should now be logged in and the prompt $ should appear."

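The substitution of embedded parameters can be sketched in a few lines. This Python fragment is an illustration only: the function `expand` and the regular expression used to recognise parameter names are assumptions, not the mechanism of the ESP/Advisor shell.

```python
import re


def expand(text, values):
    """Replace each @name or #name in text by the value held for that parameter."""
    def lookup(match):
        return str(values[match.group(1)])
    # A name is assumed to be a letter followed by letters, digits or hyphens.
    return re.sub(r"[@#]([A-Za-z][A-Za-z0-9-]*)", lookup, text)


line = "PRG@username is your username and your password is @password"
result = expand(line, {"username": "RTP", "password": "wadham"})
```

With the initials "RTP" and the password "wadham", the first line of the text frame expands to the corresponding line of the advisory text.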

This completes the walkthrough of the conceptual model and its BNF definitions. To summarise, the aim of the conceptual model was to provide a mechanism by which the logic of the problem could be developed in an environment which promoted consistency and structure in the design. The model therefore provides a tool for stepwise development in which the concepts and relationships are considered first and the details second. It also provides a structure that should enable the concepts to be entirely implemented in the KRL of the ESP/Advisor shell upon which the system is to be developed. The KRL has, however, forced a series of assumptions upon the model which constrict its processes in several ways, e.g. the fixed assignment of values to parameters. It is nevertheless felt that the frames are flexible enough to allow for change and that they could easily be extended or adapted to meet other situations and needs.


4.2 A conceptual model for the UNIX domain.

The next step in the construction of an assistant for the UNIX operating system was to perform the formalisation stage for the UNIX domain using the conceptual model and its frame techniques.

The highest point in the design, and the natural starting point, was to see whether or not the user was a beginner. If he was then he would be directed into the "familiarisation" domain through "getting-started". A complete listing of the whole model is given in appendix B. Thus the top level is:

-title-
UNIX help facility
(beginner)? => ref getting-started
(not beginner)? => ref system-commands

which uses the "beginner" parameter, so this is developed next:

beginner [fact]
desc: tests for registered users
expl: this is to determine the level of help necessary
query: are you a beginner with UNIX

The referenced section "system-commands" can then be designed and drafted:


-system-commands-
desc: this section should cover the UNIX commands & enable the user to use and select the correct one.
(help-area = basic-login)? => ref section1-empty
(help-area = boolean-functions)? => ref section1-empty
(help-area = change)? => ref section1-empty
(help-area = data-manipulation)? => ref section1-empty
(help-area = editors)? => ref section1-empty
(help-area = help-facilities)? => ref section1-empty
(help-area = hardware-system-calls)? => ref section1-empty
(help-area = library-routines)? => ref section1-empty
(help-area = maths-functions)? => ref section1-empty
(help-area = printing)? => ref section1-empty
(help-area = prog-env-cmds)? => ref section1-empty
(help-area = remote-useage)? => ref section1-empty
(help-area = system-calls)? => ref section1-empty
(help-area = tabs)? => ref section1-empty
(help-area = chronological-functions)? => ref s16-chronological-functions
(help-area = others)? => ref section1-empty
(help-area = not-sure)? => ref section1-empty

This acts as a top level menu from which the user can select the most appropriate action. In total there are seventeen choices of action including "not sure" which enables the user to examine each choice in further detail. The "help-area" paragraph is a category parameter which was designed and developed as:


help-area [category]
desc: area of unix that help is needed on
expl: tests to see which area of the operating system further expansion is needed upon to aid the user
options: basic-login-functions, boolean-functions, change, data-manipulation-tools, editors, help-facilities-in-unix, hardware-system-calls, library-routines, mail-facilities, maths-functions, printing, programming-environment-commands, remote-useage-facilities, system-calls, tabs, chronological-functions, others, not-sure.
query: in which area do you require help

Due to the large number of choices, and for the reasons stated earlier, only the "s16-chronological-functions" choice was implemented, the rest referencing a dummy section "section1-empty".

-section1-empty-
desc: this is a dummy section used whilst the knowledge base is being built for completeness and consistency
empty body

The section "s16-chronological-functions", whose rules test a category parameter, was designed and developed as:

-s16-chronological-functions-
desc: this section deals with the calendar, date and time functions.
(option-16 = cal)? => ref section-16.1
(option-16 = calendar)? => ref section-16.2
(option-16 = date)? => ref section-16.3
(option-16 = time)? => ref section-16.4
(option-16 = expanded-format)? => ref section-16.5
(option-16 = none)? => quit


with the "option-16" parameter being defined as:

option-16 [category]
desc: the calendar, date and time functions
expl: tests to see which function from the chronological ones is to be used
options: cal, calendar, date, time, expanded format, none
query: please select the function you are interested in

Having defined the "option-16" parameter it was then possible to go ahead and design each of the sections referred to by the rules of the "s16-chronological-functions" section. Due to the nature of the model, these do not need to be developed in any particular order, but were developed top down in this case, each section being used to illustrate a separate command option of the "chronological-functions" domain.

"Section-16.1" was developed to deal with the "cal" command, which is the UNIX command to look at a calendar for a given year and month. Within this section the developer has to decide what kind of information would be of use to the enquiring user. It was decided that he would most probably like to see a listing of the entry in the manual. One of the main reasons for developing this project within the UNIX system commands domain was to try to simplify the user's task of understanding the parameters associated with running a command. In consequence, a useful operation that could be performed is for the system to ask the user questions and then to tell him exactly what command he should type in to the machine. Thus "section-16.1" was designed with these aims in mind.

-section-16.1-
desc: this section deals with the "cal" command which prints out the calendar
(full-listing)? => text16.1
(16.1-example-cmd)? => text16.1b

Looking at the first rule, the predicate part is simply a question asking the user if he does want to use the manual entry facility.


full-listing [fact]
desc: manual description of the command
expl: gives a full description of the command
query: do you require a full listing of the command's manual reference

If the question gets a positive answer then the manual page is requested to be printed out. Thus a text paragraph is needed to hold the advisory text.

text16.1
NAME
cal - print calendar
SYNOPSIS
cal [month] year
DESCRIPTION
cal prints a calendar for the specified year. If a month is also specified, a calendar for just that month is printed. Year can be between 1 and 9999. The month is a number between 1 and 12. The calendar produced is that for England and her colonies.
Try September 1752.
BUGS
The year is always considered to start in January even though this is historically naive.
Beware that "cal 78" refers to the early Christian era, not the 20th century.

Having looked at the first rule the second now needs to be examined. The predicate part simply questions the intent of the user.


16.1-example-cmd [fact]
desc: to determine whether an example is needed
expl: the user may require an example
query: do you wish to print a calendar out

The action part of the rule has to be able to show what the user should type at the UNIX machine, and this consists of the command mnemonic "cal" followed by the required month and year. So embedded phrase parameters are used within the command text paragraph.

text16.1b
type the command: cal @cal-month @cal-year

Here "cal-month" asks for the required month.

cal-month [number]

desc: month of the required calendar

expl: a month is needed for successful execution of the command

range: 1..12

query: which month is the calendar for


and "cal-year" the required year.

cal-year [number]

desc: year of the required calendar

expl: a year is required for successful execution of the cal command

range: 0..9999

query: which year is the calendar for

Thus if "cal-month" = 12 (i.e. December) and "cal-year" = 1966, text16.1b would output

"type the command: cal 12 1966"

Having designed "section-16.1" it was then possible to develop the other sections referenced by the "chronological-functions" section (section-16.2, section-16.3, section-16.4 and section-16.5) in similar ways. See appendix B for a full listing of the model.
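The way the two number frames constrain the command composed by "text16.1b" can be sketched as follows. This Python fragment is illustrative only; the function `cal_command` is hypothetical, and the limits are those given in the "cal-month" and "cal-year" frames.

```python
def cal_command(month, year):
    """Build the line that text16.1b would display, after checking the
    answers against the ranges of the two number frames."""
    if not 1 <= month <= 12:
        raise ValueError("cal-month must lie in 1..12")
    if not 0 <= year <= 9999:
        raise ValueError("cal-year must lie in 0..9999")
    return f"type the command: cal {month} {year}"


message = cal_command(12, 1966)
```

An answer outside either range is rejected before any command text is produced, which mirrors the role of the "range:" slot in the number frame.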


4.3 Implementation stage

Having constructed the conceptual model for the UNIX domain, the next task was to implement it as a knowledge base in KRL. This was not too difficult an undertaking, owing to the nature of the model's constructs, which were deliberately created to aid the implementation process.

The conceptual model's constructs and those of KRL are very similar, the difference being that those of the model are intended to allow the knowledge engineer to develop and design the knowledge base of a system, whilst those of KRL are designed to allow the knowledge base to be translated into Prolog by its compiler. One of the underlying aims of the model is to allow an easy transformation from the frame templates to KRL code.

To show how the different types of frame are mapped to the KRL we will examine an example of each type.

If we again consider the top level section "title", this is represented by the frame:

-title-
unix help facility
(beginner)? => ref getting-started
(not beginner)? => ref system-commands

which is equivalent to the KRL code:

title 'the UNIX help facility'

{beginner} reference getting_started.

{not beginner} reference system_commands.

These entities have a close correspondence, as does the predicate "beginner" in its frame representation:

beginner [fact]

desc: you are new to this operating system

expl: this is to determine the level of help necessary

query: are you a beginner with UNIX


and its KRL code:

beginner: 'you are new to this operating system'
    fact
    explanation
        'this is to determine the level of explanation necessary'
    askable
        'are you a beginner with UNIX ?'.

The other parameter types also have a close mapping between their model representation and their coded form. For example the category parameter of the "chronological-functions" section, "option-16", is defined by the frame:

option-16 [category]

desc: the calendar, date and time functions

expl: tests to see which function from the chronological ones is to be used

options: cal, calendar, date, time, expanded format, none

query: please select the function you are interested in

Which represents the following code:

options_16: 'The calendar, date and time functions'
    category
    explanation
        'tests to see which function from the chronological ones is to '
        'be used'
    options
        cal - 'cal',
        calendar - 'calendar',
        date - 'date',
        time - 'time',
        expanded_format - 'expanded format',
        none - 'quit'
    askable
        'please select the function you are interested in'.

The number parameter for "cal-year" is defined by the frame:

cal-year [number]

desc: year of the required calendar

expl: a year is required for successful execution of the cal command

range: 0..9999

query: which year is the calendar for

and easily maps to the KRL:

cal_year: 'Year of the required calendar'
    number
    explanation
        'A year is required for successful execution of the cal command'
    range 1..9999
    askable
        'which year is the calendar for ?'.

Finally the phrase parameter for the "password" frame:

password [phrase]

desc: the user needs a password

expl: @username is your username but a password is needed for security

query: what is your password

maps to

password: 'the user needs a password'
    phrase
    explanation
        @username..' is your username but a password is also needed '
        'for security'
    askable
        'what is your password ?'.

Thus it can be seen that each parameter frame maps quite easily into KRL code.

The question might be asked: if the mapping is so good, why bother with the model at all? The answer is that the model is, first and foremost, a design tool that enables the knowledge engineer to develop a structured design. It combines consistency and completeness within a format which allows a secondary process to occur: the transformation of the model into an implementation.

It would be useful at this point to consider as an example the implementation of "section-16.1", which deals with the "cal" command. From this example it can be seen that a developer working directly in KRL would soon become tied down in syntactic detail. Secondly, due to the "loose structure" of the KRL language, the developer would lose all direction of design if an attempt was made to implement a non-trivial system directly in KRL.

The design of "section-16.1" incorporates seven frames which were developed in the manner described in chapter 4.12 and 4.2. The seven frames take the problem and produce a model whose structure and direction of reasoning are easy to follow. This model then acts as a representation from which a programmer, independent of the project, could work to implement the system.

The frames of section-16.1 are:

section-16.1, full-listing, 16.1-example-cmd, text16.1b, cal-month, cal-year

and can be seen in Appendix G.

The KRL which was then produced from the seven frames can be seen in appendix C under the heading "Section 16.1 : The 'cal' command".

If the developer were to try to implement this area of the domain by translating his ideas straight from text into the KRL then a tangled, unstructured web of code would result. It can be seen from the code that was actually generated that the syntax and flow of control are not the easiest to follow, even with the high structuring forced upon it.

The KRL code was also subject to other structuring techniques to make it more "readable". These included the numbering of sections, labelling of all the parameters associated with a section by prefix numbers of that section, keeping all parameters of sections together near their section definition, and the use of comments, white space and meaningful identifier names.

Thus the model allowed the implementation of the UNIX domain to follow in a straightforward manner.

A full listing of the KRL knowledge base can be found in appendix C.

4.4 Evaluation of the conceptual model

Having developed a knowledge base through the stages from conceptualisation to implementation by using the model developed at the beginning of the chapter, it is now possible to draw conclusions and evaluate how well it performed.

Initially the problem was one of bringing some structure to an open ended domain and forming a representation which would enable the developer to span the gap from conceptualisation to implementation. In this sense the conceptual model fulfilled its role by producing compact representations which forced the developer to be consistent in the approaches used and also to be complete, in that the use of templates prevents the loss of data items. The model, if used in accordance with the guidelines given in developing the UNIX domain (chapter 4.1), could be used to construct knowledge bases in many different fields. The model is currently tailored to fit the ESP/Advisor shell but it is considered to be flexible enough for adaptation to other shells.

It should be possible to write a translator to process these frames and turn them into KRL, or directly into the Prolog upon which the consultation shell runs. This could be done in conjunction with a graphical editor for manipulating frames. The templates could be established at the start of the design process and each type summoned as necessary as the design unfolds. All that the developer would then have to do would be to fill in the templates. The design process could also be aided by the use of windows and cross referencing. It is envisaged that this would greatly reduce the development time.
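As a rough illustration of such a translator, the Python sketch below renders a parameter frame as KRL text in the layout of the examples in section 4.3. The `Frame` class, its field names and the `to_krl` function are hypothetical names introduced here, only fact, number and phrase frames are handled, and the output layout is an assumption based on those examples rather than a checked KRL grammar.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Frame:
    """A parameter frame from the conceptual model (hypothetical structure)."""
    name: str                                  # e.g. "cal-year"
    kind: str                                  # "fact", "number" or "phrase"
    desc: str
    expl: str
    query: str
    bounds: Optional[Tuple[int, int]] = None   # only used by number frames


def to_krl(frame: Frame) -> str:
    """Render a frame as KRL text, following the examples in section 4.3."""
    # The KRL examples use underscores where the frames use hyphens.
    ident = frame.name.replace("-", "_")
    lines = [f"{ident}: '{frame.desc}'", frame.kind,
             "explanation", f"'{frame.expl}'"]
    if frame.kind == "number" and frame.bounds is not None:
        low, high = frame.bounds
        lines.append(f"range {low}..{high}")
    lines += ["askable", f"'{frame.query} ?'."]
    return "\n".join(lines)


cal_year = Frame(
    name="cal-year",
    kind="number",
    desc="Year of the required calendar",
    expl="A year is required for successful execution of the cal command",
    query="which year is the calendar for",
    bounds=(1, 9999),
)
print(to_krl(cal_year))
```

Category frames, section frames and text frames would need further cases of the same kind; the sketch is only meant to show that the mapping is mechanical once the templates are filled in.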

One interesting area for future research is the exploitation of parallelism in the processing of production rules of the sort contained in the model's section definitions. This could greatly enhance the scope of such systems, allowing larger rule bases with wider domains.

4.5 Evaluation of the ESP/Advisor shell's performance.

Having implemented the UNIX domain in KRL on the ESP/Advisor shell, it was now possible for the first time to evaluate how well the system could perform in a real environment, where an expert level of performance was expected from it.

The initial set of problems encountered were those associated with implementing a knowledge base that was non-trivial in size and complexity. There were basically two areas in which errors kept appearing. The first was caused by "over complex structures", which referred to problems of 'deeply nested structures'. Usually these were Prolog-ESP errors that arose because the output text was not allowed to be greater than twenty lines long, or else it had to be output in twenty line segments. This was due to the limitations imposed by the size of the system's intermediate files. It is a very surprising problem for a system claiming to be a specialised text handling shell. The second of these initial problems was that the system kept running out of workspace and gave the message:

"your knowledge base is so big it has caused the Prolog-1 in which ESP is embedded to run out of workspace.

Try to make your knowledge base smaller and avoid greedy commands like 'status' ".

which terminated the consultation session. There did not prove to be any way around this problem which caused major concern, as this knowledge base is in fact only a very small portion of the size the full UNIX knowledge base would be.

Having overcome these problems, the system restrictions of one way dialogue and fixed value variables were found to be even more important.

The interrogational type of dialogue, which keeps asking questions in an attempt to determine what information the user requires, appeared to be acceptable in the initial stages of designing the system. When actually implemented it was found to be most annoying to have to go through a large number of questions and menu screens in order to home in on the required information, when it would have been much easier to query the system in English, for example "how do I delete a file?". This menu driven system becomes more frustrating when the fixed value variable comes into operation. This property totally rules out any reselection of an option from a menu and traps the user at a certain level if reiteration of a menu is attempted. The only exit is for the user to break out of the system completely.

However, despite all of these problems it was possible to refine and run the whole system in a way that did provide some useful advice for an enquiring user.

An example dialogue would be of the form:

> are you a beginner with UNIX? y
> have you been set up as a UNIX user? n
> what type of user are you ?
    (1) - D.Phil
    (2) - M.Sc
    (3) - B.A
  Enter the number of the relevant entry: 2
> "The M.Sc users can use the PRGV which runs UNIX and VAX2 under VMS"
> have you ever logged in? n
> what are your initials? why

  I am asking because I wish to establish what is the users name (username) which is a necessary precondition for the display of the following paragraph:-
  PRG@username is your username and your password is @password
  switch on the terminal and when the prompt login: appears, type your username e.g. PRG@username
  then the prompt password: will appear. So type in your password e.g. @password
  you should now be logged in and the prompt $ should appear.
> what is your username? rtp
> what is your password? wadham
> "PRGRTP is your username and your password is wadham
  switch on the terminal and when the prompt login: appears, type your username e.g. PRGRTP
  then the prompt password: will appear. So type in your password e.g. wadham
  you should now be logged in and the prompt $ should appear."

> do you wish to look at the system commands? y
> in which area do you require help ?
    (1) - basic login,
    (2) - boolean values,
    (3) - change,
    (4) - data manipulation tools,
    (5) - on line editors,
    (6) - help facilities in unix,
    (7) - hardware system calls,
    (8) - library routines,
    (9) - mail facilities,
    (10) - maths functions,
    (11) - printing,
    (12) - programming environment commands,
    (13) - remote usage,
    (14) - system calls,
    (15) - tabs,
    (16) - chronological functions,
    (17) - others,
    (18) - not sure,
  enter the relevant entry: 16
> please select the function you are interested in
    (1) - 'CAL - print calendar',
    (2) - 'CALENDAR - reminder service',
    (3) - 'DATE - print and set date',
    (4) - 'TIME - time a command',
    (5) - 'EXPANDED FORMAT - helps explain these options',
    (6) - 'QUIT this set of functions'
  enter the relevant entry: 3

> do you require a full listing of the commands manual reference? n
> do you wish to use the print and set date service? y
> If GMT is needed type -u? -u
> which year is the date to be set to (last two digits only)? 66
> which month is the date to be set to? 12
> which day is the date to be set to? 17
> what hour is the clock to be set to (24 hour clock)? 21
> what number of minutes is the date to be set to? 58
> how many seconds is the date to be set to? 10
> "Type the command: date -u [6612172158 (.10)]"
> END OF CONSULTATION

This dialogue shows many of the problems that have been discussed regarding the limitations of this mode of man machine interface. A further problem which is revealed is that numbers cannot be prefixed with a zero. For example, if '6' were answered in response to 'what number of minutes is the date to be set to?' then there is no way of informing the system that this should be of field width two and prefixed by leading zeros if necessary. Thus the wrong answer is given by the output text. A second problem is that the date command needs to know if G.M.T is to be used. If G.M.T is used then the output needs to be prefixed by '-u'. If the question "is GMT to be used" were asked and the answer 'yes' were given, there is unfortunately no way that a '-u' could be assigned to a variable or represented in any form. So the user has to respond artificially by typing in '-u' to the question.
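Both limitations are matters of simple formatting in a general purpose language. The Python sketch below shows what the shell could not express: each field is padded to width two, and a yes/no answer selects the '-u' flag. The function name `date_command` and its parameters are illustrative assumptions, and the field order (year, month, day, hour, minute, then seconds after a point) simply follows the dialogue above rather than any particular version of the date command.

```python
def date_command(year, month, day, hour, minute, second, use_gmt):
    """Build a 'date' command line with every field zero-padded to width two."""
    fields = (year % 100, month, day, hour, minute)
    stamp = "".join(f"{value:02d}" for value in fields)
    flag = "-u " if use_gmt else ""     # a yes/no answer selects the flag
    return f"date {flag}{stamp}.{second:02d}"


# An answer of 6 for the minutes is padded to "06".
print(date_command(1966, 12, 17, 21, 6, 10, use_gmt=True))
```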

The shell does work in certain limited ways and it is felt that ESP/Advisor would be more suited to a linear domain such as house conveyancing or tax advice, where the dialogue is driven by the system and the user just has to feed it information, obtaining an answer at the end of the consultation.

4.6 Conclusions and summary to shell approach

After having performed a thorough investigation of three shells and having chosen what appeared to be a very suitable shell which was specifically designed for the role of developing expert systems from textual sources, the final outcome of the shell approach is far from the desired ideal that was envisaged.

The ESP/Advisor shell initially promised to make the transfer of knowledge from text to knowledge base a very straightforward process. In fact it necessitated the building of a "conceptual model" in order to make this transition possible.

This model proved to be a very useful tool and it is felt that without it the knowledge base would not have been built in such a structured and rigorous way, if at all. It is felt that only 'trivial-toy' knowledge bases can be constructed without some kind of design tool.

The shell itself did not live up to expectations and proved very weak in handling even small to medium sized knowledge bases. It gave the impression of only being half finished. However the ideas upon which it is based, i.e. text animation, plus the explanation facilities offered, proved to be sound and would provide a firm basis for future developments in this area of expert system shells.

The use of a shell decreased the time of developing the expert system and gave a useful insight into their construction. The problem of creating an expert assistant for the UNIX domain had not been solved and so the next step was to use the experience gained in this shell approach and let the ESP/Advisor knowledge base act as a prototype for a custom built expert system.

CHAPTER FIVE

A Natural Language Approach.

The building of a customised system that could act as an assistant on UNIX led directly to the area of natural language processing.

It was felt that the best solution to the problem was an intelligent front end capable of interpreting English queries from a user and then, through the use of a knowledge base, responding with meaningful answers.

Having reached this conclusion it was deemed necessary to investigate the techniques associated with natural language processing (NLP) in order to devise the most suitable approach.

5.1 An Investigation of NLP Techniques.

The most basic technique for analysing natural language is that of keyword matching, the most famous example of this type of system being Weizenbaum's ELIZA. It works by having a series of patterns which are used to try to obtain a match with the input. These patterns usually contain special symbols that are not allowed in the input assertions. These symbols are used to match up the input assertions with the set of patterns. For example the input assertion:

(I have been ill with a cold)

matches with

(I have been ill with +PATTERN)

Where 'PATTERN' is assigned the 'a cold' part of the query. This could be used to ask the user another question of the form:

(I am sorry to hear you have had PATTERN, are you better now?)

which is output as:

"I am sorry to hear you have had a cold, are you better now?"

This type of pattern matching, when combined with a selection of stock phrases and standard patterns with fragments of a previous sentence inserted into them (e.g. "Tell me more about PATTERN", "When did you PATTERN") plus various adjustments (e.g. changing 'you' to 'me'), produces very effective dialogues. The major criticism of this type of NLP is that the dialogues generated tend to be very shallow and superficial, not allowing the user to probe for very meaningful answers from the program. This is mainly because one of the major techniques employed by pattern matchers of this type is to change the subject as soon as the program realises it cannot answer a question, usually by referring back to some previous remark made by the user (e.g. 'Did you say earlier PATTERN').
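The matching and substitution steps described above can be sketched in a few lines of Python. This is only an illustration of the general keyword-matching idea, not a reconstruction of ELIZA itself; the function names `compile_pattern` and `respond` are hypothetical, and the '+NAME' marker is taken from the example pattern above.

```python
import re


def compile_pattern(pattern):
    """Turn a pattern such as 'I have been ill with +PATTERN' into a regex.

    A word starting with '+' captures one or more words at that position
    under the name that follows the '+'.
    """
    parts = []
    for word in pattern.split():
        if word.startswith("+"):
            parts.append(f"(?P<{word[1:]}>.+)")
        else:
            parts.append(re.escape(word))
    return re.compile(r"\s+".join(parts) + r"$", re.IGNORECASE)


def respond(user_input, pattern, reply_template):
    """Return the filled-in reply, or None when the input does not match."""
    match = compile_pattern(pattern).match(user_input.strip())
    if match is None:
        return None
    reply = reply_template
    for name, fragment in match.groupdict().items():
        # Insert the captured fragment of the user's sentence into the reply.
        reply = reply.replace(name, fragment)
    return reply


print(respond(
    "I have been ill with a cold",
    "I have been ill with +PATTERN",
    "I am sorry to hear you have had PATTERN, are you better now?",
))
```

Nothing in this process represents what the sentence means: the captured fragment is copied into the reply unexamined, which is why such dialogues remain shallow.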

The second level of NLP techniques was inspired by Chomsky's work on grammars [35]. A grammar of a language is a scheme for specifying the sentences allowed in the language, indicating the syntactic rules for combining words into well formed phrases and clauses.

Chomsky's ideas have radically influenced NLP. Grammars are used to 'parse', or break down, the structure of a sentence, helping to establish its meaning. This is in contrast to the keyword matchers, which rely on certain keywords being present in the sentence given to them and extract very little meaning from the input. There have been many types of grammars used in NL programs, for example 'phrase structure grammars' [36], transformational grammars [35], case grammars [37] and syntactic grammars [38].

Thus parsing can be seen as the central construct upon which NLP is based. By the use of grammatical rules and other sources of knowledge, the function of words in the input can be determined and the relations between them used to extract some meaning from the sentences.

5.2 The UNIX assistant Natural Language Processor.

Having surveyed the various approaches open towards solving the problem it was decided that the best method of solution would be to design, implement and run a parser based system. This was because parsers have the ability to extract meaning from their processing and not just attempt to match keywords at random.

A justification of this approach can be provided by an example. If a user, new to the UNIX system, needed for some reason to search a file for a pattern, then under keyword pattern matching this can cause some problems. Firstly the user, being a novice, does not know the command's mnemonic name, so the information cannot be retrieved through that. Secondly, if the keyword 'search' was tried then no solution would be found. The next step would, perhaps, be to search under 'file' but this, through the keyword lookup facility of UNIX, 'apropos file', produces five screens of possible commands associated with files. The only way to find the desired 'grep' command would be to divide and conquer the situation. This shows that the ability to extract meaning from a piece of text is a very important part of the process in solving the problem efficiently.

5.3 Grammar and B.N.F definitions.

The first step towards constructing an NL system was to specify a grammar upon which the system could be based. It was decided that the best basis for the grammar would be a series of B.N.F definitions [39].

A simple 'experimental grammar' was first drafted with which to obtain the correct structures:

<s>      = <nphr><verb><rest>
<nphr>   = <artic><adjs><nouns>
<artic>  = "the" | "a" | "an" | "some"
<adjs>   = <adj> | <adj><adjs>
<adj>    = "big" | "red" | "quick" | "brown" | "lazy"
<nouns>  = <noun> | <noun><nouns>
<noun>   = "fruit" | "fly" | "flies" | "banana" | "fox" | "dog" | "dogs"
<verb>   = "jump" | "jumps" | "fly" | "flies" | "like" | "likes"
<rest>   = <object> | <descr>
<object> = <nphr>
<descr>  = "like" | "as" | <nphr>

This could then be used to parse sentences in a top down fashion. For example the sentence "fruit flies like a banana" is a classic ambiguous sentence, and its two parses can be derived from the B.N.F in the following manner.

<s>
    <nphr>
        <nouns>
            <noun> "fruit"
    <verb> "flies"
    <rest>
        <descr>
            "like"
            <nphr>
                <artic> "a"
                <adjs> <none>
                <nouns>
                    <noun> "banana"

or as

<s>
    <nphr>
        <nouns>
            <noun> "fruit"
            <nouns>
                <noun> "flies"
    <verb> "like"
    <rest>
        <object>
            <nphr>
                <artic> "a"
                <adjs> <none>
                <nouns>
                    <noun> "banana"

The ability to cope with ambiguous grammars in such a way that all possible meanings can be deduced is therefore an important requirement that a useful parser should possess.
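To make this requirement concrete, the following Python sketch enumerates every parse of a sentence under a slightly adapted version of the experimental grammar. The adaptations are assumptions made for this illustration: the article and adjectives of a noun phrase are treated as optional (as the `<none>` entries in the parse trees suggest), and `<descr>` is read as "like" or "as" followed by a noun phrase (as in the first tree). All function names are hypothetical, and Python generators are used so that each alternative is produced only on demand; this is not the implementation described in this dissertation.

```python
ARTICLES = {"the", "a", "an", "some"}
ADJECTIVES = {"big", "red", "quick", "brown", "lazy"}
NOUNS = {"fruit", "fly", "flies", "banana", "fox", "dog", "dogs"}
VERBS = {"jump", "jumps", "fly", "flies", "like", "likes"}


def word(tokens, pos, vocabulary, label):
    """Yield (tree, next_pos) if the token at pos belongs to vocabulary."""
    if pos < len(tokens) and tokens[pos] in vocabulary:
        yield (label, tokens[pos]), pos + 1


def nouns(tokens, pos):
    """<nouns> = <noun> | <noun><nouns>: yield every possible run of nouns."""
    for noun, after in word(tokens, pos, NOUNS, "noun"):
        yield ("nouns", noun), after
        for more, end in nouns(tokens, after):
            yield ("nouns", noun, more), end


def nphr(tokens, pos):
    """Noun phrase: optional article, any adjectives, then <nouns>."""
    children = []
    if pos < len(tokens) and tokens[pos] in ARTICLES:
        children.append(("artic", tokens[pos]))
        pos += 1
    while pos < len(tokens) and tokens[pos] in ADJECTIVES:
        children.append(("adj", tokens[pos]))
        pos += 1
    for noun_tree, end in nouns(tokens, pos):
        yield ("nphr", *children, noun_tree), end


def rest(tokens, pos):
    """<rest> = <object> | <descr>: both are tried so no reading is lost."""
    for phrase, end in nphr(tokens, pos):
        yield ("object", phrase), end
    for link, after in word(tokens, pos, {"like", "as"}, "link"):
        for phrase, end in nphr(tokens, after):
            yield ("descr", link, phrase), end


def parses(sentence):
    """Yield every tree for <s> = <nphr><verb><rest> that uses all the words."""
    tokens = sentence.lower().split()
    for subject, p1 in nphr(tokens, 0):
        for verb, p2 in word(tokens, p1, VERBS, "verb"):
            for tail, end in rest(tokens, p2):
                if end == len(tokens):
                    yield ("s", subject, verb, tail)


for tree in parses("fruit flies like a banana"):
    print(tree)
```

For "fruit flies like a banana" the sketch yields two trees, one with "flies" as the verb and one with "like" as the verb, corresponding to the two derivations shown above.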
