A formal basis for an abbreviated concept-based query language
Vesper Owei a,*, Shamkant Navathe b
a Information and Decision Sciences Department (M/C 294), University of Illinois at Chicago, Chicago, IL 60607, USA
b College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
Received 22 September 1998; received in revised form 16 December 1999; accepted 3 March 2000
Concept-based query languages allow users to specify queries directly against conceptual schemas. The primary goal
of their development is ease-of-use and user-friendliness. However, existing concept-based query languages require the
end-user to explicitly specify query paths in totality, thereby rendering such systems not as easy to use and user-friendly
as they could be. The conceptual query language (CQL) discussed in this paper also allows end-users to specify queries
directly against the conceptual schemas of database applications, using concepts and constructs that are native to and
exist on the schemas. Unlike other existing concept-based query languages, however, CQL queries are abbreviated, i.e.,
the entire path of a query does not have to be specified. CQL is, therefore, an abbreviated concept-based query language.
CQL is developed with the aim of combining the ease-of-use and user-friendliness of concept-based languages
with the power of formal languages. It does not require end-users to be familiar with the structure and organization of
the application database, but only with the content. Therefore, it makes minimal demands on end-users' cognitive
knowledge of database technology without sacrificing expressive power. In this paper, the formal semantics and the
theoretical basis of CQL are presented. It is shown that, while CQL is easy to use and user-friendly, it is nonetheless
more than first-order complete. A contribution of this study is the use of the semantic roles played by entities in their
associations with other entities to support abbreviated conceptual queries. Although only mentioned here in passing, a
prototype of CQL has been implemented as a front-end to a relational database manager. © 2001 Published by Elsevier Science B.V. All rights reserved.
Keywords: Abbreviated query formulation; Computer-supported query formulation; Concept-based query languages; Conceptual
query language; Query language expressive power
Data & Knowledge Engineering 36 (2001) 109–151
www.elsevier.com/locate/datak
* Corresponding author. Present address: Division of Management Information Systems, University of Oklahoma, 307 West Brooks, Room 306, Norman, OK 73019-4007, USA. Tel.: +1-405-325-0768; fax: +1-405-325-7482.
E-mail addresses: email@example.com (V. Owei), firstname.lastname@example.org (S. Navathe).
0169-023X/01/$ - see front matter © 2001 Published by Elsevier Science B.V. All rights reserved.
PII: S0169-023X(00)00042-2
Query tools that depend on programming skill for their effective and efficient use impose a cognitive burden that may diminish users' productivity with the tools. This underscores the need for database query languages (DBQLs) that are matched to the skills and ability of end-users, necessitating a rethinking of the DBQL design. Concept-based approaches to DB querying support the direct use of conceptual schemas and constructs that are either the same or similar to those in users' mental model. Therefore, concept-based DB querying naturally tends to fit the skills and ability of typical end-users. Conceptual DB querying will be needed with ever-increasing demand as we place more and more complex databases on the Web. This need for concept-based information retrieval has led to research into concept-based DBQLs.
However, because the primary motivation for the development of concept-based query languages is ease-of-use and user-friendliness, they tend to be weak in formalism. For example, visual query languages, which are only a sub-class of concept-based query languages, are usually very weak in expressive power . This paper discusses the conceptual query language (CQL)  which is developed with the aim of combining the ease-of-use and user-friendliness of concept-based languages with the power of formal languages. CQL allows users to formulate queries in a very intuitive way without the need for them to learn about the schema (structure) of the database or to grapple with the syntactic complexity of command-based languages. It, therefore, makes minimal demands on end-users' cognitive knowledge of DB technology without sacrificing expressive power. Experiments in  show that end-users perform better with CQL than with alternative languages such as SQL; they also have a better perception of CQL. Our focus in this paper is on the theoretical basis and formal semantics of CQL. We show that, while CQL is easy to use, it is nonetheless more than first-order complete.
The rest of the paper is organized as follows: In Section 2, we give an example to illustrate the motivation for this study. In Section 3, we discuss some related studies in conceptual query formulation and semantics-based querying. We formally define CQL in Section 4. Section 5 is devoted to discussing the functionality of the different modules of CQL. We examine the claims concerning the expressive power of CQL in Section 6. The paper concludes in Section 7 with a discussion of much earlier work in the development of conceptual interfaces and an examination of other issues, e.g., intelligent interfaces, that are important in interface design. A summary of the paper and an examination of its main contributions and limitations, as well as an indication of related studies planned for the future, are also given in the concluding section.
Query specification in linear keyword languages (LKLs) like SQL and in other visual systems patterned after or similar to query-by-example (QBE) makes use of joins defined either during data definition or during query formulation. ACCESS and PARADOX are examples of QBE systems. Recent QBE implementations, for example in ACCESS, are able to perform joins once the tables to be joined have been specified by the user. This requires the joins to have been defined as relationships during table creation. Where needed joins are not defined, possible joins can be suggested to the user. The domain types of attributes can be used for this task. The existing commercial systems are unable to select joins automatically for the user. The ability to select definite joins is tantamount to specifying a particular query path; this requires the use of meta-knowledge about the schema, in the form of the meaning of a query path, to ensure the semantic correctness of the selected path. Such meta-knowledge is lacking in existing LKL and QBE systems.
We therefore ask the following thematic question: Given the rich semantics of data models like the ER model, is it possible to exploit the meta-knowledge about these models to reduce the cognitive load faced by end-users to facilitate query formulation? This question deals with the issue of further enhancements to the query formulation methods in commercially popular LKL, drag-and-drop and point-and-click query tools.
Since the mid-1980s a number of approaches using meta-knowledge about DB schemas to enhance the facility of end-users in query formulation have been proposed (for example, [1,14,15,35,44,47]). The recent prototypical approaches in [14,15,44,47] elevate query formulation from the logical level to the conceptual schema level by supporting the direct use of concepts and abstractions on conceptual schemas in query statements. Query formulation can be further facilitated in these systems by reducing the cognitive workload entailed by their use. One way to achieve this is to minimize what the end-user is required to specify. The system can then use schema meta-knowledge to determine and select a semantically correct query. CQL is based on this approach.
2.1. Structure of the conceptual query language
Current commercially popular LKL and QBE systems require users to explicitly mention all the tables needed by the system to solve the problem. Furthermore, in LKL and QBE systems the user must also specify query paths. This explicit navigation is a major source of difficulty for a typical end-user. In our proposed language called the CQL, this cognitive burden in formulating DB queries is reduced by migrating much of this task to the underlying DBMS. Unlike LKL and QBE systems, query formulation in CQL does not require the user to specify all the tables needed to solve a query. Also, the user does not have to specify query paths. CQL is, therefore, particularly suitable for business and administrative end-users who, generally speaking, are not programmers.
In CQL only the entities and conditions explicitly mentioned in query statements are required to be specified in their formulations. CQL has a simple and straightforward query syntax. The basic (canonical) form of a CQL query, Q, can be expressed as
$$\text{Query} : Q(t_E, S_E, \{C_{sel}, C_{sem}\})$$
where $t_E$ is the set of targets (entities and attributes about which information is sought), $S_E$ is the set of sources (entities and attributes about which information is given or known), $C_{sel}$ is the set of selection criteria/conditions, and $C_{sem}$ is the set of semantic relationships between implicit sources and implicit targets and the entities semantically adjacent to them on the application conceptual schema. An implicit source is either a source or a target entity of the query. An implicit target may be the target of the query or an intermediate entity that is neither the source nor the target of the specified query, but lies on the query path. As discussed later, the specification of intermediate entities in CQL is optional.
In formulating a query with CQL, therefore, the end-user only needs to state $t_E$, $S_E$, $C_{sel}$ and $C_{sem}$. The formulated query is then automatically passed to the underlying DBMS to determine and select the query path. The CQL system uses semantic information about the schema to perform these tasks. This information is in the form of the semantic roles played by schema entities in their relationships with other entities.
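To make the canonical form concrete, the following Python sketch shows one possible in-memory representation of $Q(t_E, S_E, \{C_{sel}, C_{sem}\})$. It is an illustration only, not the representation used in the CQL prototype: the class name `CQLQuery`, its field names, the tuple encodings of conditions, and the entity, attribute and role values in the example are all assumptions chosen for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CQLQuery:
    """Hypothetical container for Q(t_E, S_E, {C_sel, C_sem})."""
    targets: frozenset      # t_E: entities/attributes about which information is sought
    sources: frozenset      # S_E: entities/attributes about which information is given
    selection: tuple = ()   # C_sel: (entity, attribute, operator, value) tuples (assumed encoding)
    semantics: tuple = ()   # C_sem: (entity, role) tuples (assumed encoding)

    def mentioned_entities(self):
        """Entities the user stated explicitly; no query path is part of the query."""
        return self.targets | self.sources


# Placeholder example: the user names a target, a source and two conditions,
# and leaves the path between Student and Course to the system.
query = CQLQuery(
    targets=frozenset({"Course"}),
    sources=frozenset({"Student"}),
    selection=(("Student", "name", "=", "Marshall"),),
    semantics=(("Student", "enrolled-in"),),
)
print(sorted(query.mentioned_entities()))
```

Note that the structure has no field for intermediate entities or join conditions. Under this reading of the canonical form, deriving them is left entirely to the system.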
2.2. Query abbreviation in the conceptual query language
Concept-based or conceptual query interfaces reduce the cognitive load in querying DBs by allowing users to directly use constructs from conceptual schemas [24,13,41,47]. As exemplified in , instead of specifying the relational condition "Where s.sno = sp.sno and sp.pno = p.pno", concept-based interfaces would allow for a more natural specification like "Where Supplier supplies Parts". The CQL approach provides additional enhancement to this. Where intermediate entities exist on the query path between Supplier and Parts, CQL uses built-in meta-knowledge about the application schema to determine and select the correct intermediate entities. Therefore, in comparison to LKL and QBE queries, conceptual queries in CQL tend to be highly abbreviated, since the user is not required to specify the entire query path. The main problem with abbreviated queries is to derive the corresponding semantically correct full queries . This concern naturally carries over to CQL queries. In this section we use an illustration to explain what CQL is, what its structure is and what it is trying to achieve. The illustration is based on Fig. 1, which is a semantically constrained entity-relationship diagram (SCERD, see footnote 1) of a university department.
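The relational condition quoted in this paragraph can be made concrete with a small, self-contained sketch using Python's standard `sqlite3` module. The table aliases `s`, `sp` and `p` and the columns `sno` and `pno` come from the quoted condition; the columns `sname` and `pname` and all row values are invented for illustration.

```python
import sqlite3

# In-memory database with a toy supplier/parts schema (sample data is hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE s  (sno TEXT PRIMARY KEY, sname TEXT);
CREATE TABLE p  (pno TEXT PRIMARY KEY, pname TEXT);
CREATE TABLE sp (sno TEXT REFERENCES s(sno), pno TEXT REFERENCES p(pno));
INSERT INTO s  VALUES ('S1', 'Acme'), ('S2', 'Globex');
INSERT INTO p  VALUES ('P1', 'bolt'), ('P2', 'nut');
INSERT INTO sp VALUES ('S1', 'P1'), ('S1', 'P2'), ('S2', 'P2');
""")

# The user must name the intermediate table sp and spell out both join conditions.
rows = conn.execute("""
SELECT s.sname, p.pname
FROM s, sp, p
WHERE s.sno = sp.sno AND sp.pno = p.pno
ORDER BY s.sname, p.pname
""").fetchall()

for supplier, part in rows:
    print(supplier, "supplies", part)
conn.close()
```

The explicit `FROM s, sp, p` clause and the two equality conditions are exactly the navigation detail that a statement such as "Supplier supplies Parts" hides from the user.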
Fig. 1. Semantically enhanced ER diagram of a university department schema.
1 SCERD contains other constructs that are used for updates. These have been left out in Fig. 1, since they are not pertinent to the
In SCERD, entity types in the schema bear explicitly named relationships, or associations, among themselves. Each relationship has a semantic meaning. Double-headed arrows are used in an SCERD to indicate that the entities at both heads of the arrows have a direct semantic relationship, and the arrow-heads are labeled with the roles, e.g., works-for, can-teach, advises, etc., played by the entities in specific relationships. The association semantics of the relationships involving entities are constrained by the roles the entities play in the particular relationship. In SCERD, the meaning of the links between entities, therefore, lies in the form of roles. CQL supports the direct use of SCERD constructs in query formulation.
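As a rough illustration of the role-labeled links described above, the following Python sketch stores each double-headed arrow as a tuple carrying one role per arrow-head. It is a hypothetical encoding, not the SCERD data structure itself: the three links are a made-up fragment rather than a transcription of Fig. 1, and the roles `taken-by` and `taught-by` are invented to complete the pairs.

```python
from collections import defaultdict

# Each link: (entity_a, role played by a, entity_b, role played by b).
# Hypothetical fragment; "taken-by" and "taught-by" are invented role names.
LINKS = [
    ("Student", "enrolled-in", "Course", "taken-by"),
    ("Student", "advised-by", "Teacher", "advises"),
    ("Teacher", "can-teach", "Course", "taught-by"),
]


def build_index(links):
    """Index every link from both ends, since the arrows are double-headed."""
    index = defaultdict(list)
    for a, role_a, b, role_b in links:
        index[a].append((role_a, b, role_b))
        index[b].append((role_b, a, role_a))
    return index


def roles_played(index, entity):
    """Return the sorted role names an entity plays across all its relationships."""
    return sorted({role for role, _, _ in index[entity]})


index = build_index(LINKS)
print(roles_played(index, "Teacher"))
```

Keeping a role on each end of a link is what lets the meaning of a path be read off from the roles of the entities along it.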
Example. Suppose the following query is posed and specified on Fig. 1:
Query 1: What course(s) is Marshall taking from associate professor Jones?
An abbreviated CQL formulation of this query requires the user to specify only the stated entities Student, Teacher and Course along with a set of selection predicates on these entities. The system is then required to chart one or more paths through the conceptual schema from Student and Teacher to Course. We refer to such paths as derived paths. In addition to path derivation, the system must also be capable of performing any needed operations, e.g., conjunction or disjunction, on the derived paths. In this case, the meaning of the desired query demands that the sub-paths $\text{Student} \rightarrow \cdots \rightarrow \text{Course}$ and $\text{Teacher} \rightarrow \cdots \rightarrow \text{Course}$ be derived and conjunctively combined, where $\cdots$ indicates segments of the sub-paths that must be determined by the system. Furthermore, these segments must be such that the meaning of the resulting path is the same as that of the desired query. Clearly, the sub-path $\mathrm{STD}|\text{enrolled-in} \rightarrow \mathrm{CR}|\langle P \rangle \rightarrow \mathrm{C}$ is semantically correct. In this notation $P = \{p_1, p_2, \ldots, p_n\}$ is a set of paths, and $|\text{enrolled-in}$ denotes the role played by the Student entity on that path. The path derives its meaning from the totality of the semantics of the roles played by all the entities on the path.
An examination of Fig. 1 shows that multiple paths exist between Student and Course and also between Teacher and Course. What complicates the problem here is that not all the paths have the same meaning. For example, the semantics of $\mathrm{STD}|\text{advised-by} \rightarrow \mathrm{T}|\text{can-teach} \rightarrow \mathrm{C}$, i.e., the sub-path leading from Student to Course via Teacher, deals with the adviser-advisee relationship, and not with students taking classes. It would be semantically incorrect for the system to include this sub-path in constructing the query path.
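The path ambiguity described above can be sketched in a few lines of Python. The graph below is a toy stand-in for Fig. 1, not a transcription of it: the role `offering-of` is invented, `CR` is used only as an opaque intermediate label, and the `required_roles` filter is a simplified assumption about how role semantics might rule out a path, not the mechanism defined in the paper.

```python
# Toy schema: entity -> list of (role played by that entity, next entity).
EDGES = {
    "Student": [("enrolled-in", "CR"), ("advised-by", "Teacher")],
    "CR": [("offering-of", "Course")],       # hypothetical role name
    "Teacher": [("can-teach", "Course")],
    "Course": [],
}


def simple_paths(edges, start, goal, visited=()):
    """Yield every cycle-free path as a tuple of (entity, role) steps ending at goal."""
    visited = visited + (start,)
    if start == goal:
        yield ()
        return
    for role, nxt in edges.get(start, []):
        if nxt in visited:
            continue
        for rest in simple_paths(edges, nxt, goal, visited):
            yield ((start, role),) + rest


def consistent(path, required_roles):
    """Accept a path only if each constrained entity plays its required role on it."""
    return all(required_roles.get(entity, role) == role for entity, role in path)


candidates = list(simple_paths(EDGES, "Student", "Course"))
# "Taking a course" is assumed to require Student in the enrolled-in role.
accepted = [p for p in candidates if consistent(p, {"Student": "enrolled-in"})]

for path in candidates:
    status = "kept" if path in accepted else "rejected"
    print(" -> ".join(f"{e}|{r}" for e, r in path) + " -> Course", status)
```

In this toy setting both candidate paths connect Student to Course, but only the one on which Student plays the enrolled-in role survives the filter. The path through Teacher is rejected because Student plays the advised-by role on it.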
The task of the system, then, is twofold: (1) To determine $P_I \subseteq P$ and $P_{II} \subseteq P$ such that for each $p_i \in P_I$ and $p_k \in P_{II}$, $\mathrm{STD}|\langle P_I \rangle \rightarrow \mathrm{C}$ and $\mathrm{T}|\langle P_{II} \rangle \rightarrow \mathrm{C}$ are semantically correct. In CQL, meta-knowledge (in the form of the semantics of roles) about the relationships that the entities participate in is used to resolve this path ambiguity problem. (2) To select a $p_i$ and a $p_k$ from all the candidate paths in (1). A modification of the path selection algorithm in  is used for this. In the rest of the paper, the formal basis of CQL is...