Interactive query workstation: Standardizing access to computer-based medical resources

Computer Methods and Programs in Biomedicine, 35 (1991) 293-299 © 1991 Elsevier Science Publishers B.V. All rights reserved 0169-2607/91/$03.50

COMMET 01207


Christopher Cimino, G. Octo Barnett, Laurie Hassan, Dyan Ryan Blewett and Judith L. Piggins

Laboratory of Computer Science - Massachusetts General Hospital, Boston, MA, U.S.A.

Methods of using multiple computer-based medical resources efficiently have previously required either that the user manage the choice of resource and terms, or specialized programming. Standardized descriptions of what resources can do and how they may be accessed would allow the creation of a single interface to multiple resources. This interface would assist a user in formulating queries, accessing the resources and managing the results. This paper describes a working prototype, the Interactive Query Workstation (IQW). The IQW allows users to query multiple resources: a medical knowledge base (DXplain), a clinical database (COSTAR/MQL), a bibliographic database (MEDLINE), a cancer database (PDQ), and a drug interaction database (PDR). Descriptions of each resource were developed to allow IQW to access these resources. The descriptions are composed of information on how data are sent to and received from a resource, information on the types of query to which a resource can respond, and information on what types of information are needed to execute a query. These components form the basis of a standard description of resources.

Unified medical language system; Database access; User interface

1. Introduction

It is often stated that the amount of medical information available to the physician is increasing at an almost exponential rate [1]. The amount of published material pertaining to a particular specialty is almost unmanageable, not only because of its volume but also because it may appear in a wide range of resources unknown to the specialist. To manage this information, a number of computer-based medical resources are being developed. Yet learning to access each of these resources adds another information burden to the physician's load.

Correspondence: C. Cimino, Laboratory of Computer Science, 50 Staniford Street - Room 530, Boston, MA 02114, U.S.A.

As more computer-based medical resources become available, it becomes more difficult for occasional users to be aware of all the options available. For example, a user with a specific question may not know what resources are available through his institution, or which resource is most appropriate to answer his question. He is also unlikely to know how to formulate his question to use the resource most efficiently.

The potential exists for questions raised about information acquired from one resource to be answered by another resource. This potential is lost unless the user is familiar with all the resources available. A computer-based environment would recover this potential if it could provide information about what resources were available, access those resources, and help the user formulate appropriate queries.


2. Background

Previous research in this field has concentrated on interfaces that remove the need for the user to know about a particular aspect of the resource being used; no attempt is made to inform or guide the user in developing a query. The CONIT program was developed at MIT in 1981 for formulating queries for multiple databases [2]. The user did not need any knowledge of the syntax of the databases being searched. Fairly sophisticated queries could be formed and a wide variety of databases searched. The primary disadvantage of CONIT was that, to capture the power of some of the database queries, the CONIT query language was very complex. The CANSEARCH program was developed at Huddersfield (U.K.) in 1986, in part to address this problem. It attempts to incorporate domain knowledge to assist a user in formulating a query [3]. The program uses a rule-based system to develop a query about cancer topics, which is then executed on the MEDLINE database. In comparison to CONIT, this system is very easy to use. Its disadvantages are that extending it to other domains requires incorporating new domain knowledge, and that a query is refined through multiple menus. Selecting from many menus can become tedious, especially if the domain is expanded.

To create an expandable system, the program must rely on information from outside sources. One approach would be to modify each resource to provide a hard-wired link to a common user interface. This would require a developer to have access to the program code for every resource to be linked. It has the additional disadvantage that linking new resources would require a great deal of work. Many proprietary resources, whose source code is not available, could not be linked at all.

An alternative approach would depend on a standardized description for each resource. A database of these resource descriptions could provide textual information for the user and machine-readable information that a computer program could use to access a resource and execute well-formed queries. Two additional useful components would be a description of the vocabularies used by each resource and a description of the types of information dealt with by each resource. These three components (resource descriptions, vocabularies and information types) correspond to the Unified Medical Language System's Information Sources Map, Metathesaurus and Semantic Network, respectively.

2.1. Unified medical language system (UMLS)

The National Library of Medicine's UMLS project is "designed to facilitate the retrieval and integration of information from many machine-readable information sources" [4]. To guide users to appropriate information resources, an Information Sources Map will be developed which "will contain information about the scope, location, vocabulary, syntax rules, and access conditions of publicly available machine-readable biomedical information resources" [4]. The current emphasis is on building a machine-readable knowledge source that will encompass terminology from a variety of controlled vocabularies. This knowledge source is called the UMLS Metathesaurus (META). Each term in META will have information about the vocabularies in which the term appears, as well as semantic information about the term [5]. The UMLS Semantic Network is a description of relationships that can occur between the various semantic types used in META. "The purpose of the semantic types and the associated semantic network is to provide a consistent categorization of all concepts represented in the Metathesaurus and to elucidate the permissible relationships between and among these concepts" [6].
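The sketch below is a minimal model of the two UMLS structures just quoted. It assumes that a Metathesaurus entry can be reduced to a term, its source vocabularies and one semantic type, and that the Semantic Network can be reduced to a set of permissible (subject type, relation, object type) triples. The entries and the single triple are illustrative placeholders, not actual UMLS content. `MetaEntry` and `is_permissible` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetaEntry:
    """Hypothetical Metathesaurus-style entry: a term, its sources, its semantic type."""
    name: str
    sources: frozenset[str]   # vocabularies in which the term appears
    semantic_type: str

# Illustrative entries only; not taken from the real Metathesaurus.
META = {
    "pneumococcal pneumonia": MetaEntry(
        "pneumococcal pneumonia", frozenset({"MeSH", "DXplain"}), "Disease or Syndrome"),
    "penicillin": MetaEntry(
        "penicillin", frozenset({"MeSH", "PDR"}), "Pharmacologic Substance"),
}

# Semantic-network sketch: relationships permitted between semantic types.
RELATIONS = {
    ("Pharmacologic Substance", "treats", "Disease or Syndrome"),
}

def is_permissible(subject: str, relation: str, obj: str) -> bool:
    """True if the semantic types of the two terms may be linked by this relation."""
    s = META[subject].semantic_type
    o = META[obj].semantic_type
    return (s, relation, o) in RELATIONS
```

The relation is checked between semantic types, not between individual terms. A small network can therefore constrain relationships among a very large number of terms.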

2.2. Direct programmed links

As a first approach to linking multiple resources, direct hard-wired links were developed for two applications, a medical knowledge base (DXplain) and a medical literature search program (RAMM) [7]. These were chosen because the source code for both was available and linking the two would give a non-trivial result not available from other computer-based resources. The user first entered a case into DXplain and obtained a differential diagnosis. A disease in the differential was then selected as the basis of a literature search.

Several disadvantages to such hand-coded, specific links were immediately obvious. Each new link between resources required extensive programming. Any change or update in the resource programs would require revision of the links. The links could be created or modified only by someone who had an understanding of the data structures and functions of the resources and access to the source code.

This exercise revealed generalizations that could be made in describing computer-based resources. Both DXplain and RAMM involve two distinct processes. First, terms appropriate to the particular resource are selected; then the application processes the terms to produce a result. Both vocabularies include entry terms (terms recognized as being equivalent to a term in the controlled vocabulary). Both vocabularies use modifiers. Both vocabularies use a tree structure to link more specific and less specific terms.
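The three shared vocabulary features (entry terms, modifiers and a broader/narrower tree) can be captured in one small structure. This is a minimal sketch under several assumptions. The tree is stored as parent pointers, and modifiers are a flat set. Input takes a hypothetical "term/modifier" form, loosely modelled on MeSH subheading notation. The class name, methods and sample terms are hypothetical and are not drawn from DXplain or RAMM.

```python
class ControlledVocabulary:
    """Hypothetical controlled vocabulary with entry terms, modifiers and a tree."""

    def __init__(self):
        self.parent = {}        # preferred term -> broader term (None at a root)
        self.entry_terms = {}   # entry term (lowercased) -> preferred term
        self.modifiers = set()  # qualifiers that may accompany a term

    def add_term(self, term, broader=None, synonyms=()):
        self.parent[term] = broader
        self.entry_terms[term.lower()] = term
        for synonym in synonyms:
            self.entry_terms[synonym.lower()] = term

    def normalize(self, text):
        """Map an entry term to its preferred term, or None if unknown."""
        return self.entry_terms.get(text.strip().lower())

    def parse(self, text):
        """Split 'term/modifier' input into a preferred term and a known modifier."""
        term, _, modifier = text.partition("/")
        modifier = modifier.strip().lower() or None
        if modifier not in self.modifiers:
            modifier = None     # unknown modifiers are dropped in this sketch
        return self.normalize(term), modifier

    def broader_terms(self, term):
        """Walk up the tree from a term to its root, most specific first."""
        path, node = [], self.parent.get(term)
        while node is not None:
            path.append(node)
            node = self.parent.get(node)
        return path

    def narrower_terms(self, term):
        return [t for t, p in self.parent.items() if p == term]

vocab = ControlledVocabulary()
vocab.add_term("Neoplasms")
vocab.add_term("Colonic Neoplasms", broader="Neoplasms", synonyms=["colon cancer"])
vocab.modifiers.add("therapy")
```

In this sketch, term selection corresponds to `normalize`/`parse`. Term processing (the application's own work) would begin once a preferred term and optional modifier are in hand.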

2.3. Partial programmed links

The next exercise consisted of developing links based on what was learned about term selection and term processing. For this exercise, the links for DXplain and RAMM were rewritten so that the interactions could be controlled from a uniform user interface. A link to a clinical database (COSTAR/MQL) was also developed, using a subset of 100 patient records with identifying information removed. Each link still required specialized programming for each resource but provided a uniform interaction with the user interface. The interface program could send messages to a link that would return information about the resource, and this information could then be used to access the resource.

All the links were required to respond to four system-defined messages. These provided information about the interactions the link was capable of, what data were required to perform them, and what results would be returned. The links were also required to respond to at least one other predefined message providing access to the application's controlled vocabulary. For example, the BEST message returned the best match of a string of text to the resource's vocabulary.
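A link that answers such messages can be sketched as a small dispatcher. The text does not name the system-defined messages, so `INTERACTIONS`, `REQUIRED_DATA` and `RESULTS` below are hypothetical stand-ins for the descriptive ones. `BEST` is approximated with the standard library's `difflib.get_close_matches`, because the original matching method is not described. The vocabulary and interaction in the usage lines are also illustrative.

```python
import difflib

class ResourceLink:
    """Hypothetical link object that answers messages sent by the interface."""

    def __init__(self, name, vocabulary, interactions):
        self.name = name
        self.vocabulary = list(vocabulary)
        self.interactions = interactions  # interaction -> (required data, result type)

    def send(self, message, *args):
        """Dispatch a message to the matching msg_* handler."""
        handler = getattr(self, "msg_" + message, None)
        if handler is None:
            raise ValueError(f"{self.name} does not understand {message!r}")
        return handler(*args)

    # Descriptive messages (names are illustrative assumptions).
    def msg_INTERACTIONS(self):
        return sorted(self.interactions)

    def msg_REQUIRED_DATA(self, interaction):
        return self.interactions[interaction][0]

    def msg_RESULTS(self, interaction):
        return self.interactions[interaction][1]

    # Vocabulary access: closest match of free text to the controlled vocabulary.
    def msg_BEST(self, text):
        matches = difflib.get_close_matches(text, self.vocabulary, n=1, cutoff=0.6)
        return matches[0] if matches else None

link = ResourceLink(
    "demo",
    ["obesity", "colon cancer", "aphthous stomatitis"],
    {"differential": (["finding"], "disease list")},
)
```

The interface only ever calls `send`. A resource that adds an interaction changes its own description, not the interface code.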

This exercise demonstrated some interesting capabilities. The system needed only information about how to send messages to a resource link. It could then automatically generate a user interface from the information acquired by sending the definition messages described above. Isolating the user interface from the resources in this way allowed a resource to be modified, and potentially allowed new resources to be added, without affecting the user interface.
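Generating the interface from the definition messages means the menu and prompts are derived from data rather than written per resource. The sketch below makes two assumptions. Each interaction description is a dictionary with a user-readable label and a list of required data items. The "subheading" item is a hypothetical example. `build_menu` and `prompts_for` are hypothetical names.

```python
def build_menu(descriptions):
    """Number the available interactions for display, in a stable order."""
    return [f"{i}. {d['label']}" for i, d in enumerate(descriptions, start=1)]

def prompts_for(description):
    """One prompt per data item the chosen interaction requires."""
    return [f"Enter {item} for '{description['label']}':"
            for item in description["requires"]]

descriptions = [
    {"label": "Differential diagnosis", "requires": ["finding"]},
    {"label": "Literature search", "requires": ["disease", "subheading"]},
]
```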

In this revised system, the user was able to select which resources were used. The system could provide the user with limited descriptions of the resources to aid selection. The user needed to be familiar only with the IQW user interface in order to use multiple resources. However, aside from this static information, no guidance was provided concerning which of the available queries might be appropriate.

Queries could be chained: information derived from a patient record could be used to acquire a differential diagnosis, which could then be used as the basis of a literature search. While chaining of queries increased the potential usefulness of multiple resources, two new issues arose. First, the system did not indicate which terms from one resource's controlled vocabulary might be used in a query to a different resource. Second, all the information returned from a query was available for new queries, sometimes resulting in overwhelming quantities of information from which the user had to select.
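Chaining can be pictured as feeding a term picked from one result into the next query. In the sketch below, stub functions stand in for the clinical database, the knowledge base and the literature search. Their return values are placeholders, not real resource output. The `choose` callback stands in for the user's selection, and `chain` is a hypothetical name.

```python
from typing import Callable

def patient_problems(patient_id: str) -> list[str]:
    return ["obesity", "aphthous stomatitis"]        # stub clinical database

def differential(finding: str) -> list[str]:
    return ["crohn disease", "celiac disease"]       # stub knowledge base

def literature(disease: str) -> list[str]:
    return [f"citation about {disease}"]             # stub bibliographic search

def chain(start: str,
          steps: list[Callable[[str], list[str]]],
          choose: Callable[[list[str]], str]) -> list[str]:
    """Run the first step on the start term, then feed one chosen result to each next step."""
    results = steps[0](start)
    for step in steps[1:]:
        results = step(choose(results))
    return results
```

For example, `chain("AA0099", [patient_problems, differential, literature], choose=lambda r: r[-1])` follows "aphthous stomatitis" to "celiac disease" and returns its citation. Every intermediate list is a candidate pool for the next query, which is the source of the information overload described above.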

The system still had all the disadvantages of requiring source code to create links. The user still needed to understand the purpose of all the available resources, although he no longer needed to know how to formulate queries for each resource. The user could easily be overwhelmed by the number of possible queries and the amount of accumulated results. The next step in development was therefore to create an interface that provides more active guidance in selecting queries and that organizes the results of queries.


2.4. Design considerations

The primary goals in this system are to (1) allow easy addition of new resources, (2) maintain a uniform interface, (3) retrieve information from the resources, (4) provide the user information about the resources, and (5) allow the user to enter and execute a query. In this prototype, efficient performance was not considered a primary goal, nor was any attempt made to minimize hardware requirements.

3. System description

3.1. Kappa

The use of messages to derive characteristics of resources, and the desire to provide a uniform user interface for a variety of resources, suggested that an object-oriented environment would be appropriate for development. Maintaining separate applications and the user interface is simplified by placing an object layer between the user interface and the applications. The small number of reasonable queries relative to the large amount of possible starting information suggested that query guidance would be best performed by a rule-based system. Kappa, an object-oriented, rule-based system, was chosen for development of the IQW. Kappa requires Microsoft® Windows version 3 and at least 1 megabyte of random access memory. Kappa allows the creation of objects with single inheritance (anything defined for a parent object is accessible to the child, and each child can have only one parent). Each object can have slots (variable values) and methods (functions) associated with it. The environment also allows the creation of goals to be achieved, as well as rules for filling in slots. If a goal requires a slot that has not been filled, the appropriate rule is invoked. Either backward chaining of rules to satisfy a specific goal or forward chaining to accomplish any goal can be invoked. Kappa also provides functions for creating a customized user interface and allows new functions to be added easily in the form of C code.
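Kappa's own syntax is not reproduced here. The language-neutral sketch below uses Python classes to show two of the features just described: single inheritance of slots, and a rule that fires when a needed slot is empty (backward chaining). It makes one assumption of its own: rules are also looked up through the parent chain. Forward chaining is not shown. `Frame`, `rules` and `get` are hypothetical names.

```python
class Frame:
    """Object with slots, a single parent, and rules that can fill missing slots."""

    def __init__(self, name, parent=None, **slots):
        self.name, self.parent, self.slots = name, parent, dict(slots)
        self.rules = {}   # slot name -> function(frame) that computes the slot

    def _inherited(self, table_name, key):
        """Search this frame, then its single chain of ancestors."""
        frame = self
        while frame is not None:
            table = getattr(frame, table_name)
            if key in table:
                return table[key]
            frame = frame.parent
        return None

    def get(self, slot):
        """Backward chaining: if a needed slot is empty, invoke the rule that fills it."""
        value = self._inherited("slots", slot)
        if value is None:
            rule = self._inherited("rules", slot)   # inherited rules: an assumption
            if rule is not None:
                value = rule(self)          # the rule may itself call get() on other slots
                self.slots[slot] = value    # cache the derived value on this frame
        return value

query = Frame("Query", format="text")
query.rules["description"] = lambda f: f"{f.get('label')} ({f.get('format')})"
side_effects = Frame("SideEffects", parent=query, label="Side Effects")
```

Asking `side_effects.get("description")` triggers the inherited rule. That rule in turn pulls `label` from the child and `format` from the parent.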

3.2. Bridge 386

Bridge 386 is a commercial product that enhances Windows' interprocess communication utilities. It provides additional capabilities useful to the IQW. Bridge allows a Windows program to run and control another program. IQW can start and stop another application and use it as a resource. Bridge allows IQW to send commands which the resource application receives as if the commands were entered at the keyboard. Currently, in order for the application to be of use to IQW, it must be capable of writing the results of a query to a file. IQW can then read the result file for further processing.
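Bridge's programming interface is not described here, so the sketch abstracts it as a `send_keys` callback (a hypothetical name). The sketch shows the file-based handoff the text requires. IQW issues keystroke commands that tell the application to save its results to an agreed file, then reads that file. The command strings, file name and fake application are hypothetical. A real handoff would also have to wait for the DOS program to finish, and that timing is not modelled.

```python
import os
import tempfile
from typing import Callable

def run_query(send_keys: Callable[[str], None], commands: list[str],
              result_path: str) -> list[str]:
    """Send keystroke commands, then read the result file the application wrote."""
    if os.path.exists(result_path):
        os.remove(result_path)              # never read a stale result
    for command in commands:
        send_keys(command)                  # delivered as if typed at the keyboard
    if not os.path.exists(result_path):
        raise RuntimeError("application did not write a result file")
    with open(result_path, encoding="ascii", errors="replace") as f:
        return [line.strip() for line in f if line.strip()]

result_file = os.path.join(tempfile.mkdtemp(), "RESULT.TXT")

def fake_application(keys: str) -> None:
    # Stand-in for a DOS program: on the save command it writes its results.
    if keys == "SAVE\r":
        with open(result_file, "w", encoding="ascii") as f:
            f.write("Interaction 1\n\nInteraction 2\n")

lines = run_query(fake_application, ["QUERY nicotine\r", "SAVE\r"], result_file)
```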

3.3. Hardware

The software currently runs on a Hewlett Packard Vectra RS-25C, an MS-DOS compatible 386-based machine. Although in theory only 1 megabyte of RAM is needed, running Windows, Bridge and Kappa efficiently alongside other resources requires at least 4 megabytes of RAM. An EGA or VGA monitor and a Hayes compatible modem are also required.

[Figure 1 (Query Selection): a Patient Record panel (ID, Last Name, First Name, Age, Sex, Problems) above three linked columns: Concepts (Obesity, Aphthous Stomatitis, Colon Cancer, Colonoscopy, Nicorette Gum), Queries (Finding Info, Disease Info, Cancer Staging, Citations, Side Effects) and Applications (DXplain, PDQ, MEDLINE, PDRS).]

Fig. 1. In a typical session, the user might start from a patient record. The Interactive Query Workstation (IQW) identifies terms contained in the record and presents them to the user. In this example, the user has selected Nicorette Gum. IQW eliminates queries that are not applicable to this term ('Disease Info' and 'Finding Info') and allows the user to select from those remaining. When the user selects 'Side Effects', IQW checks whether any other information is needed. In this case none is, so it proceeds to send the query to the PDRS program, collect the results and display them to the user.

[Figure 2 (Query Execution): flow from Concepts to Queries, Applications, Bridge, DOS File and Information, and back to Concepts.]

Fig. 2. Currently IQW passes queries to applications through Bridge 386. The results of these queries must be available to IQW in a file. Information collected from the applications can be parsed into terms. If semantic type information is available (or the user can provide it), these terms can become concepts. These concepts can then form the basis of a new query.

3.4. Resource descriptions

For each resource there are three types of object that describe it: APPLICATION objects, QUERY objects, and TYPE objects. An APPLICATION object has slots and methods that allow data to be sent to and received from a resource. For example, the DXplain APPLICATION object contains a method for calling a communications program, the names of appropriate script files, and the name of the file where results will be stored. A QUERY object describes a template for a query. This includes a user-readable description of the query, the name of the APPLICATION object that can handle the query, methods for formatting a valid query in the syntax of the resource, and a list of slots that the user must fill in order to process the query. Continuing the example, the disease information QUERY object would have a pointer to the DXplain APPLICATION object. Each slot has a slot TYPE, which points to a TYPE object. The disease information QUERY object has only one slot, which points to the disease name TYPE object. TYPE objects carry information about types of term. Any terms acquired during a session are stored in the appropriate TYPE object. Queries that need to fill a slot of that TYPE may use these terms or use the TYPE object's method to acquire a new term from the user. The TYPE object also allows validation to ensure that user entries are of the appropriate format.
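The three kinds of description object can be sketched as plain data classes. Fields follow the text: an application's script and result files; a query's readable description, owning application, formatting method and typed slots; and a type's collected terms and validation. Everything else is a hypothetical illustration. This includes the field names, the regular-expression validation, the script and result file names, and the `DISEASE ...` query syntax.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class TypeObject:
    name: str
    pattern: str                                     # validation rule for user entries
    terms: list[str] = field(default_factory=list)   # terms acquired during the session

    def validate(self, entry: str) -> bool:
        return re.fullmatch(self.pattern, entry) is not None

    def remember(self, term: str) -> None:
        """Store a term acquired during the session if it passes validation."""
        if self.validate(term) and term not in self.terms:
            self.terms.append(term)

@dataclass
class ApplicationObject:
    name: str
    script_file: str   # communications script used to drive the resource
    result_file: str   # where the resource writes its results

@dataclass
class QueryObject:
    description: str                              # user-readable description
    application: ApplicationObject                # object that can handle the query
    slots: dict[str, TypeObject]                  # slot name -> slot TYPE
    formatter: Callable[[dict[str, str]], str]    # builds resource-specific syntax

    def format(self, values: dict[str, str]) -> str:
        for slot, slot_type in self.slots.items():
            if slot not in values or not slot_type.validate(values[slot]):
                raise ValueError(f"slot {slot!r} needs a valid {slot_type.name}")
        return self.formatter(values)

disease_name = TypeObject("disease name", r"[A-Za-z][A-Za-z '\-]*")
dxplain = ApplicationObject("DXplain", "dxplain.scr", "dxplain.out")
disease_info = QueryObject(
    "Disease information", dxplain, {"disease": disease_name},
    lambda v: f"DISEASE {v['disease'].upper()}",
)
```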

3.5. Status report

In one form of a typical session the user first looks up a patient record. The patient record is then parsed to extract terms. The user can choose one or more of these terms as the basis of a query. The extraction and typing of terms currently rely on a vocabulary incorporated into the IQW; eventually, this vocabulary will be replaced by the UMLS Metathesaurus. From all possible queries, applicable queries are selected by comparing the slot TYPES needed for each query with the TYPES of the chosen terms. If no query can make use of all the chosen terms, then queries are selected that can make use of any of the chosen terms. If more than one query is applicable, the user selects from the reduced list and that query is made active. If only one query is applicable, it becomes the active query. If there are still unfilled slots in the active query, the user is asked to fill them. The query is then executed and the results displayed.

The user can also select a query from the list of all queries. Once she selects this query, slot filling proceeds in the same manner as above. The user can review the results of any query performed in a session and process them to extract terms for a new query. Text of queries can be copied to the Windows clipboard and pasted into another application such as a word processor. Text from the Windows clipboard can also be parsed to extract terms for use in a new query.

4. Future plans

Now that META-1 is available, it will replace the test vocabulary currently used. The META-1 semantic types will be used as TYPE objects where appropriate. Some TYPE objects, such as "patient ID", which are not META-1 semantic types, will be retained. Currently there are different TYPE objects for each of the resources: for example, DXplain disease, MeSH disease, and COSTAR problem. Because META-1 will contain explicit source vocabulary information, these TYPES would be collapsed into one TYPE, 'disease'.

The current system can only create queries de novo or from terms found in the result of a single previous query; there is no easy way to use terms from the results of several different queries. This limitation results from the number of different TYPES from which a query could be created: when the number is large, there are too many possible queries to be useful to the user. Making use of Kappa's rule algorithms might limit the possibilities and allow the system to take a more active role in query selection. For example, a physician may retrieve a patient record and then perform a literature search for treatment of one of the patient's problems. If the literature search results contain references to therapeutic drugs, the system might suggest that the user search for interactions between the drugs listed in the literature search and those in the patient record.

The UMLS Semantic Network will be tested as a tool for query selection. If terms have been acquired which are typed in META-1 as 'Bacterium' (e.g., Pneumococcus) and no terms have been typed as 'Pharmacologic Substance' (e.g., an antibiotic), the system might infer that "Proper treatment of a specific bacterial infection?" would be an appropriate query.
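Both suggestion rules in this section can be expressed as condition-action checks over the semantic types of the terms gathered so far. These are the drug-interaction suggestion and the Semantic Network example. This is a sketch under stated assumptions: each term carries a semantic type and an origin (record or literature result). The rule functions and the interaction wording are hypothetical, and they are not Kappa rules.

```python
from typing import NamedTuple, Optional

class Term(NamedTuple):
    name: str
    semantic_type: str
    origin: str   # where the term came from, e.g. "record" or "literature"

DRUG = "Pharmacologic Substance"

def suggest_drug_interactions(terms: list[Term]) -> Optional[str]:
    """Drugs from a literature search plus drugs in the record -> interaction query."""
    lit = {t.name for t in terms if t.semantic_type == DRUG and t.origin == "literature"}
    rec = {t.name for t in terms if t.semantic_type == DRUG and t.origin == "record"}
    if lit and rec:
        return f"Check interactions between {', '.join(sorted(lit))} and {', '.join(sorted(rec))}"
    return None

def suggest_treatment(terms: list[Term]) -> Optional[str]:
    """A 'Bacterium' term with no 'Pharmacologic Substance' term -> treatment query."""
    types = {t.semantic_type for t in terms}
    if "Bacterium" in types and DRUG not in types:
        return "Proper treatment of a specific bacterial infection?"
    return None

RULES = [suggest_drug_interactions, suggest_treatment]

def suggestions(terms: list[Term]) -> list[str]:
    """Fire every rule once against the current terms and collect its suggestions."""
    found = []
    for rule in RULES:
        suggestion = rule(terms)
        if suggestion is not None:
            found.append(suggestion)
    return found
```

Each rule inspects only semantic types and origins, never specific drug or organism names. Rules of this kind could therefore be driven by META-1 typing rather than by a hand-built vocabulary.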

The system will collect information about queries executed. This information will be useful for improving future versions. In addition to information about common queries, common se- quences of queries will be useful in refining the query selection algorithms. Information about an individual user could be used to personalize the search strategies for a particular user. For example, some users might prefer a few relevant pieces of information while others might desire all relevant pieces of information. Collection of this information would be done in an unobtrusive way.
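Finding common sequences of queries can start with tallying consecutive pairs of queries in each session. The sketch assumes that a session log is simply a list of query names in execution order. The text implies no particular log format. `common_transitions` is a hypothetical name.

```python
from collections import Counter

def common_transitions(sessions: list[list[str]], top: int = 3) -> list[tuple[tuple[str, str], int]]:
    """Most frequent (previous query, next query) pairs across session logs."""
    pairs = Counter()
    for queries in sessions:
        pairs.update(zip(queries, queries[1:]))   # consecutive pairs within one session
    return pairs.most_common(top)

sessions = [
    ["Patient Record", "Citations", "Side Effects"],
    ["Patient Record", "Citations"],
    ["Disease Info", "Citations"],
]
```

A frequent pair such as "Patient Record" followed by "Citations" is the kind of evidence that could be used to rank the next query offered to the user.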

New resources will be added. A textual database would be the next logical choice of resource to add. A resource such as Scientific American MEDICINE's CONSULT would be an example of both a new type of database (textual) and a new type of resource (CD-ROM). Existing resources might be replaced with network-available counterparts, for example DXplain through Telenet. This would result in a smaller program and smaller hardware requirements at the expense of speed and external costs (connect time charges). Because of the amount of memory swapping that occurs when Bridge controls DOS applications, it is unlikely that the memory requirements will drop below 2 megabytes.

While the resources currently used did not require alteration of the application code, significant program-specific information was needed in the APPLICATION objects to allow interaction. Further generalization of the system would require a more extensive resource description language. This would include a query description general enough to be applicable to all resources, a scripting language for describing interactions with a resource, and a method of translating a query description into a script. Currently, results are handled as free text; a standardized description of the results would improve their processing.
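One possible form for the proposed translation step is a set of per-resource script templates. The text does not specify either the description language or the scripting language. This sketch therefore assumes that a query description is a resource, a query kind and named arguments, and that each resource supplies a list of keystroke-line templates using the standard library's `string.Template`. The template contents, command words and `to_script` are hypothetical.

```python
from string import Template

# Hypothetical per-resource script templates, keyed by (resource, query kind).
SCRIPTS = {
    ("MEDLINE", "citations"): ["FIND $topic", "LIMIT $years", "SAVE $outfile"],
    ("PDR", "side effects"): ["DRUG $drug", "EFFECTS", "SAVE $outfile"],
}

def to_script(resource: str, kind: str, **args: str) -> list[str]:
    """Translate a resource-independent query description into keystroke lines."""
    try:
        templates = SCRIPTS[(resource, kind)]
    except KeyError:
        raise ValueError(f"{resource} has no script for {kind!r}") from None
    # substitute() raises KeyError if the description lacks a required argument.
    return [Template(line).substitute(args) for line in templates]
```

Keeping the resource-specific wording in data rather than in APPLICATION-object code is the generalization the paragraph asks for. Adding a resource would then mean adding templates.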

These planned improvements correspond to concepts suggested for intelligent guidance of user queries [1,9]. These include (1) providing a consistent user interface for several applications, (2) standard data definitions to improve interpretation of results and allow easy transfer of data between applications, (3) profiling of users to permit individualized interaction, (4) obtaining feedback to guide system development, (5) using the context of the user's interaction to guide query selection, and (6) using methods of ranking and presenting results that are based on the user's interests. In addition, the IQW should keep the user informed of IQW abilities and application abilities, especially when these diverge.

Equally important to development will be feedback from practising clinicians. The IQW will be placed in an out-patient environment within the next year. This will provide information about whether it is a useful tool and what further steps can be made to improve it. It may also reveal new insights into the information needs of physicians.

Acknowledgements

This work was supported in part by NLM contract [N01-LM-8-3513] and in part by an educational grant from Hewlett Packard Corporation. C.C. is supported by an NLM training grant [2-T15-LM07037-04]. DXplain, MQL and COSTAR are trademarks of Massachusetts General Hospital. MEDLINE is a trademark of the National Library of Medicine. Kappa is a trademark of Intellicorp, Inc. Bridge is a trademark of SoftBridge, Inc. PDR® Drug Interactions and Side Effects Diskettes is a trademark of Medical Economics Company, Inc. Scientific American MEDICINE CONSULT is a trademark of Online Computer Systems, Inc.

References

[1] R.A. Greenes and E.H. Shortliffe, Medical Informatics: An Emerging Academic Discipline and Institutional Priority, J. Am. Med. Assoc. 263(8) (1990) 1114-1120.

[2] R.S. Marcus and J.F. Reintjes, A translating computer interface for end-user operation of heterogeneous retrieval systems, I: Design, J. Am. Soc. Inform. Sci. 32(4) (1981) 287-303.

[3] A.S. Pollitt, An expert systems approach to document retrieval: A summary of the CANSEARCH research project, in Technical Report Series, Research Report 86/6 (Huddersfield Polytechnic, U.K., 1986).

[4] B.L. Humphreys and D.A. Lindberg, Building the Unified Medical Language System, in Proceedings of the Thirteenth Annual Symposium on Computer Applications in Medical Care, ed. L.C. Kingsland, pp. 475-480 (IEEE Computer Society Press, Washington, DC, 1989).

[5] M. Tuttle, D. Sherertz, M. Erlbaum, N. Olson and S. Nelson, Implementing Meta-1: The First Version of the UMLS Metathesaurus, in Proceedings of the Thirteenth Annual Symposium on Computer Applications in Medical Care, ed. L.C. Kingsland, pp. 483-487 (IEEE Computer Society Press, Washington, DC, 1989).

[6] A.T. McCray, The UMLS Semantic Network, in Proceedings of the Thirteenth Annual Symposium on Computer Applications in Medical Care, ed. L.C. Kingsland, pp. 503-507 (IEEE Computer Society Press, Washington, DC, 1989).

[7] H.J. Lowe, G.O. Barnett, J. Scott, R. Eccles, E. Foster and J. Piggins, Remote Access MicroMeSH: A Microcomputer System for Searching the MEDLINE Database, in Proceedings of the Twelfth Annual Symposium on Computer Applications in Medical Care, ed. R.A. Greenes, pp. 535-539 (IEEE Computer Society Press, Washington, DC, 1988).

[8] T. Barsalou, An Object-Based Architecture for Biomedical Expert Database Systems, in Proceedings of the Twelfth Annual Symposium on Computer Applications in Medical Care, ed. R.A. Greenes, pp. 572-578 (IEEE Computer Society Press, Washington, DC, 1988).

[9] W.W. Stead, IAIMS: An Opportunity for National Collaboration, Integr. Acad. Inform. Man. Syst. Newsl. 2(3) (1989) 1-2 (Duke University Medical Center).
