An abbreviated concept-based query language and its exploratory evaluation


<ul><li><p>An abbreviated concept-based query language and its exploratory evaluation</p><p>Vesper Owei a,*, Shamkant B. Navathe b,1, Hyeun-Suk Rhee c,2</p><p>a Division of Management Information Systems, University of Oklahoma, 307 West Brooks, Room 306, Norman, OK 73019-4007, USA</p><p>b College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA</p><p>c School of Management, University of Texas at Dallas, Richardson, TX 75083, USA</p><p>Received 15 September 1999; received in revised form 23 March 2000; accepted 8 August 2001</p><p>Abstract</p><p>Research on the use of conceptual information in database queries has primarily focused on semantic query optimization. Studies on the important aspects of conceptual query formulation are currently not as extensive. Only a relatively small number of works exist in this area. The existing concept-based query languages are similar in the sense that they require the user to specify the entire query path in formulating a query. In this study, we present the Conceptual Query Language (CQL), which does not require entire query paths to be specified but only their terminal points. CQL is an abbreviated concept-based query language that allows for the conceptual abstraction of database queries and exploits the rich semantics of semantic data models to ease and facilitate query formulation. CQL was developed with the aim of providing typical end-users like secretaries and administrators an easy-to-use database query interface for querying and report generation. A CQL prototype has been implemented and currently runs as a front-end to an underlying relational DBMS. A statistical experiment conducted to probe end-users' reaction to using CQL vis-à-vis SQL as a database query language indicates that end-users perform better with CQL and have a better perception of it than of SQL. 
This paper discusses the design of CQL, the strategies for CQL query processing, and the comparative study between CQL and SQL.</p><p>© 2001 Published by Elsevier Science Inc.</p><p>1. Introduction</p><p>According to a recent study, end-user computing is growing at the rate of 50–90% per year (Cronan and Douglas, 1990). This rapid increase raises concerns about the suitability of database (DB) query languages (DBQLs) for present-day end-users, who are typically non-expert DB users. Query tools that depend on users' programming skill for their effective and efficient use impose a cognitive burden that may diminish users' productivity. This underscores the need for DBQLs that are matched to the limited ability of end-users. A rethinking of DBQL design is therefore called for. We think it is essential that DBQLs be adapted to the user and not the user to DBQLs. This requires that DBQLs use concepts that are as close as possible to those in the users' cognitive mental model and adopt interface techniques that are suited to users' abilities. Because conceptual schemas represent users' real-world view and the mental model of their application universe, concept-based approaches to DB querying tend to support the direct use of concepts and constructs on conceptual schemas which are either the same or similar to those in users' mental model. Therefore, concept-based DB querying naturally tends to fit the skills and ability of typical end-users. 
This has led to research into concept-based DBQLs.</p><p>The Journal of Systems and Software 63 (2002) 45–67</p><p>* Current address: The George Washington University, Management Science Department (Information Systems), Monroe Hall, 2115 G Street, N.W., Washington, DC 20052, USA. Tel.: +1-202-994-4364.</p><p>E-mail addresses: (V. Owei), sham@cc.gatech.edu (S.B. Navathe), (H.-S. Rhee).</p><p>1 Tel.: +404-894-0537.</p><p>2 Tel.: +972-883-4459.</p><p>0164-1212/01/$ - see front matter © 2001 Published by Elsevier Science Inc. PII: S0164-1212(01)00139-X</p></li><li><p>Research on the use of conceptual information in DB queries has, however, mainly focused on semantic query optimization, i.e., on the use of semantic information to reformulate a query more efficiently into a different but semantically equivalent form that yields correct answers (Pittges, 1995a,b). A good discussion on research efforts in semantic query optimization can be found in Pittges (1995a) and Pittges et al. (1995). Studies on the important aspects of conceptual query formulation are currently not as extensive. A few recent works in this area can be found in Bloesch and Halpin (1996, 1997), Catarci and Santucci (1988), Chan (1989), Chan et al. (1993), Chang and Sciore (1992), Halpin and Proper (1995a), Owei (1994), Owei and Higa (1994), Owei et al. (1997a) and Siau et al. (1995). In one respect, the existing concept-based query languages are similar, i.e., they require the user to specify the entire query path in formulating a query. In this study, we present the Conceptual Query Language (CQL), which does not require entire query paths to be specified but only their terminal points. CQL was developed with the aim of providing an easy-to-use DB query interface for typical end-users. CQL makes minimal demands on end-users' cognitive knowledge of DB technology. 
To the best of our knowledge, there are no other concept-based query languages that employ abbreviated querying as in CQL.</p><p>Because SQL has come to be taken as the de facto standard for relational languages, query languages that run on an underlying relational database management system (DBMS) tend to be benchmarked against SQL. Experimental studies have therefore been conducted to attempt to establish the superiority of existing concept-based query languages over SQL (Bell and Rowe, 1992; Chan et al., 1998, 1993; Siau et al., 1995, for example). In this paper, a relative evaluation of CQL against SQL is reported. We conducted a statistical experiment to probe end-users' reaction to using CQL, vis-à-vis SQL, as a database query language. The comparison focused on the effect of the two different database query language interfaces on user performance (as measured by query formulation time, query correctness, and users' perception) in a query writing task with varying difficulty levels. Statistically significant differences between the two query languages were found.</p><p>The results indicate that end-users perform better with CQL and have a better perception of it than of SQL. There were significantly more accurate formulations with CQL than with SQL. Also, the groups with CQL took significantly less time than the groups with SQL. The CQL subjects perceived their query language to be easier to use than their SQL counterparts felt about SQL; they also felt more satisfied with CQL than the SQL subjects were with SQL. These differences were more pronounced when query-difficulty level was considered. The statistical significance of the differences increased with the complexity of the query. 
The scores indicate that users are more likely to perform better with CQL than with SQL, and that they are more likely to harbor a more favorable perception of it than of SQL. The Cronbach alpha values for the user perception factors ranged from 0.80 to 0.93, well above the acceptable level of 0.70, considered to be adequate for behavioral research.</p><p>1.1. Focus and contribution</p><p>In developing CQL, the goal is conceptual query formulation, particularly the use of semantic information on data models to make query formulation intuitive for end-users. The CQL approach allows for the conceptual abstraction of DB queries and exploits the rich semantics of DB schemas. The design of CQL has led to the following contributions:</p><p>• use of relationship semantics of data models to alleviate or free the user from dealing with syntactic complexity of query formulation in current query languages;</p><p>• use of the roles played by entities in relationships in developing semantic graphs of conceptual queries;</p><p>• use of the roles played by entities in relationships in developing pseudo-natural language explanations of queries;</p><p>• use of system-constructed semantic graphs to aid the automatic generation of SQL.</p><p>The evaluation of CQL shows that end-users are likely to perform better with abbreviated concept-based query languages and to have a better perception of them than of SQL. The result of the experiment is pertinent to the current practice in database development and use. The current approach prescribes three steps: conceptual schema design, logical schema design, and logical query. The results here suggest that a transition to a two-step approach should be adopted: conceptual schema design and abbreviated conceptual query. This is consistent with the recommendation in Chan et al. (1998). 
We, however, note that since there are no other existing empirical studies on abbreviated concept-based query languages, the experimental study here should be taken only as seminal, an initial exploratory one.</p><p>In this study, we discuss the design of CQL, the strategies for CQL query processing, and the comparative study between CQL and SQL.</p><p>The rest of the paper is organized as follows: In Section 2, we perform a literature review of concept-based query languages and empirical studies involving concept-based query languages. We discuss the necessity for concept-based query languages in Section 3. Section 4 presents CQL. The syntax of CQL and query formulation in CQL are examined. The strategy for dealing with iterative concept-based queries in CQL is discussed in Section 5. The comparative study on CQL and SQL is reported in Section 6. In Section 7, the results of the experiment are reported. We devote Section 8 to a discussion on the results of the experiment. The paper concludes in Section 9.</p></li><li><p>2. Related works</p><p>The main goal in developing concept-based query languages is to provide end-users with high-level, easy to use, and user-friendly interfaces for data manipulation. As far as we are aware, the universal relation (UR) interface (Maier et al., 1986; Maier and Ullman, 1983) was one of the earliest efforts in that direction. An examination of existing database query languages seems to indicate a continuing trend in this direction. This trend has mainly resulted in the emergence of conceptual DBQLs that employ concepts on semantic schemas to aid query formulation. We examine only a sample set of existing concept-based query languages in this section. We also discuss a set of comparative experimental studies involving concept-based query languages.</p><p>2.1. 
Enabling data manipulation through semantic paths on conceptual schemas</p><p>Chang and Sciore (1992) propose the Universal Relation with Semantic Abstraction (URSA) model, which is an extension of the UR interface. Instead of demanding a universally unique role for each attribute, the URSA model requires this uniqueness of role only within a limited set, called closure, of entities. Querying in URSA is based on the UR query paradigm. Its referencing scheme therefore forces a QUEL-type and an SQL-type syntax. This may render it not suitable for the generality of end-users. Peckham et al. (1996) propose a DB design paradigm that abstracts the relationship semantics of application conceptual data models and uses this as a predictor of query and update paths.</p><p>Peckham et al. show that association roles in semantic schemas define connection paths between objects, and these connections can be used to enable data manipulation. The URSA study shows that the semantics of the association among schema entities can be used to ensure the semantic correctness of queries. CQL extends these ideas by showing that the connection paths have meanings that are derived from the semantic meaning of the association roles, and that the path-meanings can be used to determine and select the correct paths of abbreviated conceptual queries.</p><p>The Intuitive System 3 defines a very intuitive architecture for information retrieval. The system is aimed at end-user interaction with heterogeneous DBs, but is generic enough for non-heterogeneous and single DB scenarios. The point-and-click mode of request formulation in Intuitive presents an ER schema to the user, who can then specify a query by selecting the subschema defining the desired query path. 
Interesting similarities and differences between Intuitive and CQL exist here: With Intuitive, to formulate the query on persons who appear in an interview, the user highlights the entities Person and Interview and the relationship appears_in linking the two entities. In CQL this is specified as Person appears_in Interview. For more complex queries involving longer paths, the Intuitive user still highlights the entire path on the schema; the CQL user does not. Intuitive supports multimedia data, text retrieval from documents, and exploratory search of hypertexts. Intuitive is a much more comprehensive system than CQL, which is currently narrowly focused on the manipulation of data in a single DB via a semantic data model.</p><p>ConQuer-II (Bloesch and Halpin, 1997) is a commercial concept-based query language based on the Object-Role Modeling (ORM) paradigm (Halpin, 1995, 1996; Halpin and Orlowska, 1992; Halpin and Proper, 1995a,b). ORM models applications in terms of the semantic roles played by objects and entities in relationships. ConQuer-II allows queries to be formulated via paths through the conceptual schema. The query paths are constructed from the semantic roles of objects and entities. Data manipulation in the system proposed in Mannino and Shapiro (1990), i.e., the Graph Model, involves finding a path from a set of starting nodes through possible intermediate nodes and edges to a set of terminating nodes. In Graph Model, entities are conceptually conceived of as graph nodes and link semantic types as graph edges. Query formulation involves graphically selecting a set of source and target nodes, then drawing a set of edges between the selected sets of nodes, and finally specifying for each node a set of data retrieval criteria. Users select each node and edge on the graph-path between the source and the target. 
Graphs are manually manipulated until the desired query is obtained. Although CQL adopts these basic ideas, it extends them by requiring users to specify only the end-point, i.e., starting and terminating, entities and relationship roles. The CQL system automatically deduces the correct intermediate nodes to use on a given query path.</p><p>Vizla (Berztiss, 1993) is a visual query language interface for the information control prototyping language SF (Berztiss, 1986). In Vizla, a database is abstracted as a collection of sets (entities) and functions that map from this collection of sets to auxiliary sets (attributes). Queries are formulated in Vizla by pointing to representations of functions, their domains and codomains, or subsets of the domains and codomains, and to various operators in a conceptual model of a database. The items selected in this way are displayed and assembled graphically in a workspace, or window, similar to the query formulation workspace in CQL.</p><p>The workspace concept is used in Vizla to reduce the cognitive burden query formulation imposes on end-users. It achieves this by allowing users t...</p></li></ul>
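<p>The end-point idea described in Section 2.1, where the user names only the starting and terminating entities and the system fills in the intermediate nodes, can be illustrated with a small sketch. This is not the CQL algorithm: the paper states that CQL selects paths using the meanings of association roles, whereas the sketch below substitutes a plain breadth-first search for the shortest path as a stand-in. The schema, the function name deduce_path, and every entity and role other than Person, appears_in, and Interview are hypothetical.</p>

```python
from collections import deque

# Hypothetical conceptual schema: entity -> list of (relationship role, neighbouring entity).
# Only Person / appears_in / Interview comes from the text; the rest is invented for illustration.
SCHEMA = {
    "Person": [("appears_in", "Interview"), ("works_for", "Department")],
    "Interview": [("conducted_for", "Position")],
    "Department": [("offers", "Position")],
    "Position": [],
}


def deduce_path(schema, source, target):
    """Return a shortest role-labelled path from source to target, or None.

    Assumption: shortest path stands in for CQL's semantics-based path
    selection, which the text describes but does not specify here.
    """
    queue = deque([(source, [source])])
    visited = {source}
    while queue:
        entity, path = queue.popleft()
        if entity == target:
            return path
        for role, neighbour in schema.get(entity, []):
            if neighbour not in visited:
                visited.add(neighbour)
                # Extend the path with the role and the entity it leads to.
                queue.append((neighbour, path + [role, neighbour]))
    return None


# The user supplies only the two end points.
print(deduce_path(SCHEMA, "Person", "Interview"))
```

<p>In this toy schema, Person and Position are connected by two paths of equal length, and breadth-first search simply returns whichever it reaches first. That ambiguity is the kind of case in which, according to the text, CQL relies on the meanings of the connection paths to choose the correct one.</p>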