Using Knowledge Anchors to Facilitate User Exploration of ... · Using Knowledge Anchors to Facilitate User Exploration of Data Graphs Editor(s): Name Surname, University, Country

Using Knowledge Anchors to Facilitate UserExploration of Data GraphsEditor(s): Name Surname, University, CountrySolicited review(s): Name Surname, University, CountryOpen review(s): Name Surname, University, Country

Marwan Al-Tawila,*, Vania Dimitrovaa, Dhavalkumar Thakkerb

a School of Computing, University of Leeds, Leeds, UKb School of Electrical Engineering and Computer Science, University of Bradford, Bradford, UK

Abstract. This paper investigates how to support a user’s exploration through a data graph in a way leading to expanding theuser’s domain knowledge. To be effective, approaches to facilitate exploration of data graphs should take into account the utili-ty from a user’s point of view. Our work focuses on knowledge utility – how useful exploration paths through a data graph arefor expanding the user’s knowledge. We propose a new exploration support mechanism underpinned by the subsumption theo-ry for meaningful learning, which postulates that new knowledge is grasped by starting from familiar entities in the data graphwhich serve as knowledge anchors from where links to new knowledge are made. A core algorithmic component for adoptingthe subsumption theory for generating exploration paths is the automatic identification of knowledge anchors in a data graph(KADG). Several metrics for identifying KADG and the corresponding algorithms for implementation have been developed andevaluated against human cognitive structures. A subsumption algorithm which utilises KADG for generating exploration pathsfor knowledge expansion is presented and applied in the context of a data browser in a music domain. The resultant explorationpaths are evaluated in a controlled user study to examine whether they increase the users’ knowledge as compared to free ex-ploration. The findings show that exploration paths using knowledge anchors and subsumption lead to significantly higherincrease in the users’ conceptual knowledge. The approach can be adopted in applications providing data graph exploration tofacilitate learning and sensemaking of layman users who are not fully familiar with the domain presented in the data graph.

Keywords: Data graphs, knowledge utility, data exploration, meaningful learning, knowledge anchors, exploration paths.

*Corresponding author (E-mail: [email protected]) is a PhD candidate supervised at and by University of Leeds.

1. Introduction

In the recent years, linked data (in the form ofRDF graphs) has emerged as the de facto standard forsharing data on the Web. Consequently, data graphshave become widely available on the Web and arebeing adopted in a range of user facing applicationsthat provide search and exploration tasks. In contrastto regular search where the user has a specific needin mind and an idea of the expected search result [1],exploratory search is open-ended requiring signifi-cant amount of exploration [2], has an unclear infor-mation need [3], and is used to conduct learning andinvestigative tasks [4]. For example, when peopleexplore resources in a new domain (like in academic

research tasks) or browse through large informationspaces with many options (like exploring job oppor-tunities, travel and accommodation offers, videos,music). In many cases, the users will have no (orlimited) familiarity with the specific domain. Whenthe users are novices to a domain, the users’ cogni-tive structures about that domain are unlikely tomatch the complex knowledge structures of datagraphs that represent the domain. This can have anegative impact on the user exploration experienceand effectiveness. Users can find themselves in situa-tions where they are not able to formulate knowledgeretrieval queries (users do not know what they do notknow [3]). Users can face an overwhelming amountof exploration options, not being able to identify

which exploration paths are most useful; this can leadto confusion, high cognitive load, frustration and afeeling of being lost.

To overcome these challenges, appropriate ways tofacilitate user exploration through data graphs arerequired. Research on exploration of data graphs hascome a long way from initial works on presentinglinked data in visual or textual forms [5,6]. Recentstudies on data graph exploration have brought to-gether research from diverse, but related, areas suchas Semantic Web, personalisation, adaptive hyper-media, and human-computer interaction; with the aimof reducing user cognitive load and providing supportfor knowledge exploration and discovery [7–9]. Sev-eral attempts addressed supporting layman users, i.e.users who are novices to the domain. Examples in-clude: personalising the exploration path tailored tothe user’s interests [10], presenting RDF patterns togive an overview of the domain [11], or providinggraph visualisations to support navigation [12].

Although a significant amount of work addressesthe problem of facilitating user exploration throughdata graphs, this has been applied mainly in investi-gative tasks. There is limited research on supportinglearning through data graph exploration. The learningperspective has been studied with regard to providinggeneric tools for exploration of interlinked open edu-cational resources [13]. We address an importantoutstanding challenge by focusing on the learningeffect of user exploration, i.e. expanding the user’sdomain knowledge while he/she is exploring a datagraph in an unfamiliar or partially familiar domain.Supporting learning through search is an emergingresearch area in information retrieval [14,15]. It ar-gues that “searching for data on the Web should beconsidered an area in its own right for future researchin the context of search as a learning activity” [16].Motivated by this vision, we investigate how learningcan be supported through exploration of data graphs.

The work presented in this paper opens a new ave-nue in semantic data exploration, which looks at theknowledge utility, i.e. expanding one’s domainknowledge while exploring a data graph. This buildson earlier work showing that while exploring datagraphs in unfamiliar (or partially familiar) domains,users serendipitously learn new things that they wereunaware of [17–19]. However, not all explorationpaths can be beneficial for knowledge expansion, e.g.paths may not bring new knowledge to the user ormay bring too much unfamiliar things so that the userbecomes confused and frustrated [18].

The main goal of our research is to develop auto-matic ways to generate exploration paths that can

expand the user’s domain knowledge. A scoping userstudy which investigated several exploration strate-gies for knowledge expansion [20] pointed us to theadoption of Ausubel’s subsumption theory for mean-ingful learning [21] as the underpinning model forthe generation of exploration paths. This theory pos-tulates that human cognitive structures are hierarchi-cally organised with respect to levels of abstraction,generality, and inclusiveness of concepts; hence, fa-miliar and inclusive entities are used as knowledgeanchors to subsume new knowledge into the users’cognitive structures. Such entities are crucial for ex-panding one’s domain knowledge.

Adopting Ausubel’s meaningful learning theory,our work addressed the following research questions:

RQ1. How to develop automatic ways to identifyknowledge anchors in a data graph? This meansfinding graph entities which correspond to domainconcepts that can be familiar to layman users.

RQ2. How to use knowledge anchors to generateexploration paths to facilitate domain knowledgeexpansion? This means suggesting to the user graphentity sequences for exploration, which can result inlearning new things about the domain.

To address RQ1, we utilise Rosch’s notion ofbasic level objects (BLO) [22] in the context of datagraphs. We devise a formal framework that mapsRosch’s definitions of BLO and cue validity to datagraphs, and develop several metrics and the corre-sponding algorithms to identify knowledge anchorsin a data graph (KADG). These anchors represent con-cepts which are likely to be familiar to layman users,and hence can be used as knowledge bridges of fa-miliar entities from where links to new knowledgecan be made. To evaluate the KADG algorithms, wecompare the derived KADG against human cognitivestructures. Using an experimental approach thatadapts Cognitive Science methods to derive BLO in adomain, we identify human basic level objects in adata graph (BLODG) used to benchmark the KADG

algorithms. Based on the evaluation, we identify hy-bridisation heuristics to improve precision and recall.

To address RQ2, we develop an algorithm that isunderpinned by the subsumption theory for meaning-ful learning [21] to generate exploration paths forknowledge expansion. This involves two steps –identifying the most appropriate knowledge anchor tostart the path, and iterative introduction of newknowledge through subsumption to connected enti-ties. We have conducted a controlled task-driven userstudy with a semantic data browser in the Music do-

main to examine the effectiveness of the suggestedpaths to increase the users’ domain knowledge. Thestudy shows that the generated paths have signifi-cantly higher knowledge utility and provide betterexploration experience than free exploration paths.

The main contributions of this work are: formal description and implementation of metrics

and the corresponding algorithms for identifyingKADG;

formal description and implementation of an al-gorithm for identifying BLODG which correspondto human cognitive structures over a data graph;

analysis of the performance of KADG metrics, andsuggested hybridization heuristics, using BLODG

identified by humans applied to a data graph inthe Music domain;

formal description and implementation of an al-gorithm for generating exploration paths usingKADG by adopting the subsumption theory formeaningful learning; and

analysis of the knowledge utility and usability ofautomatically generated exploration paths basedon knowledge anchors and subsumption againstfree exploration of a data graph; using a semanticdata browser in the Music domain.

The paper is structured as follows. Section 2 posi-tions the work in the relevant literature and comparesto similar approaches. Section 3 presents the researchcontext for experimentation – a semantic data brows-er and describes its data graph. Section 4 providespreliminaries with key definitions, followed by aformal description of metrics for identifying KADG,and the corresponding algorithms for applying thesemetrics over a data graph (Section 5). Sections 6 de-scribes an experimental study to validate the KADG

algorithms against human BLODG. A subsumptionalgorithm used to generate exploration paths forknowledge expansion is described in Section 7, whileSection 8 presents an experimental study to evaluatethe knowledge utility of the generated paths againstfree exploration. Section 9 discusses the findings,followed by conclusions in Section 10.

2. Related Work

In the context of the research presented in this pa-per, we review relevant research on data graph explo-ration approaches to justify the main contributions ofthis work. We also compare to existing approachesfor identifying key entities and generating paths in

data graphs. Since our work involves several evalua-tion steps, we review relevant evaluation approaches.

2.1. Exploration through Data Graphs

Semantic data exploration approaches are dividedinto two broad categories: (i) visualisation approach-es for data graph exploration [23–26] and (ii) text-based semantic data browsers [27–30]. Visualisationprovides an important tool for exploration that lever-ages the human perception and analytical abilities tooffer exploration trajectories. These approaches, inaddition to intuitiveness, focus on the need for man-aging the dimensions in semantic data represented asproperties, similarity and relatedness of concepts.The text-based browser approaches operate on se-mantically augmented data (e.g. tagged content) withlayout browsing trajectories using relationships in theunderpinning ontologies. These are adopting tech-niques and knowledge from learning, human-computer interaction and personalisation to enhancethe data exploration experience of users.

2.1.1. Visualisation Approaches for Data GraphExploration

The state of the art in approaches that harness vis-ualisation as a tool for exploratory discovery andanalysis of linked data graphs is presented in [23].They use Sheiderman’s seminal work on visual in-formation seeking (overview first, zoom and filter,then details-on-demand) [31] to evaluate the usabilityand utility of these approaches and focus on: (i) howwell these approaches generate summary of data; (ii)how well they focus on finding relevant and im-portant data; and (iii) what visualisation techniquesare used. In [32] the authors utilise cartographic met-aphor and visual information seeking principles tooffer overview of data based on instance types, andthen automatically generating SPARQL queriesbased on search and interaction with a map. The fo-cus of their work is the entry point of the vast datagraph the users have to explore. There are numerousapproaches to support visual SPARQL query con-struction and many of these works [33–35] have sim-ilar target audience, i.e. a layman user who is unfa-miliar with the domain of the data graph. Theseworks use visualisation techniques to help laymanusers browse through large data graphs. For examplethe work in [33] introduced a graphical interface(called NITELIGHT) for semantic query constructionwhich is based on the specification of SPARQL que-ry language. The interface allows users to create

SPARQL queries using a set of graphical notationsand editing action. The state of the art approaches tosupport visual SPARQL query construction is pre-sented in [36]. The focus of their work is in support-ing layman users to perform exploratory querying ofRDF graphs in space, time, and theme with interac-tive visual query construction methods. The authorspresent a number of design principles to counter thechallenges and evaluate them in a usability study onfinding maps in a historical map repository [36]. Alt-hough these approaches hide complexity of graphterminologies, their primarily focus on helping lay-man users to generate SPARQL queries instead offocusing on the properties of data graphs to guideuser exploration. [37] is such a work that manipulatethe data graph properties to guide exploration with asystem called Aemoo that helps users to focus on themost important bit of information about an entity firstand then explore other related information. Aemooutilises encyclopaedic knowledge patterns as rele-vance criteria for selecting, organising, and visualis-ing knowledge. They are discovered by mining thelinking structure of Wikipedia to build entity-centricsummaries that can be exploited to help the users inexploratory search tasks. However, this approach isfeasible in multi-knowledge domains that are built byhumans (e.g. Wikipedia) and may not be feasible inspecific domains with complex structures. Further-more, the system considers one level below the root,and does not cover different entities at different levelof abstractions.

These visualization efforts are geared towardshelping layman users explore the complex graphstructures by hiding the complexity of semantic ter-minology from the users. However, the effectivenessof the visualizations depends on the user’s ability tomake sense of the graphical representation which inmany cases can be rather complex. Users who arenew to the domain may struggle to grasp the com-plexity of the knowledge presented in the visualisa-tion. Our approach to automatically identify entitiesthat are close to the human cognitive structures canbe used as complementary to visualisation approach-es to simplify the data graph by pointing at entitiesthat layman users can be familiar with. The primefocus of our approach is on augmenting text-baseddata browsers by offering exploration paths.

2.1.2. Text-based Semantic Data BrowsersTwo types of linked data browsers had emerged

since the early days– (i) pivoting (or set-orientedbrowsing) browsers and (ii) multi-pivoting browsers.

In a pivoting browser, a many-to-many graph brows-ing technique is used to help a user navigate from aset of instances in the graph through common links[27]. Exploration is often restricted to a single startpoint in the data and uses ‘a resource at a time’ tonavigate anywhere in a dataset [28]. This form ofbrowsing is also referred as uni-focal browsing. Asecond type of browsers supports multi-pivoting thatallows a user to start from multiple points of interest.For example, PolyZoom enables multi-focus explora-tion of maps for a user to zoom various parts of themap at the same time [35]. Different multi-pivotinginterfaces have also been proposed, including Paral-lax [29], VisiNav [38], PepeSearch [30].

A noteworthy variation of the pivoting approach isthe use of facets for text-based data browsing oflinked datasets. Faceted browsing is the main ap-proach for exploratory search in many applications.The approach employs classification and propertiesfeatures from linked datasets as a mean to offer facetsand context of exploration. Facet Graphs [39], gFacet[40], and tFacet [41], are early efforts in this area.More recent attempts include Rhizomer [42] thatcombines navigation menus and map to provide flex-ible exploration between different classes, Facete[43] a visualization-based exploration tool that offersfaceted filtering functionalities, Hippalus [44] allowsusers to rank the facets according to preferences de-fined directly by the user, Voyager [45] is a mixed-initiative system that couples faceted browsing withvisualization recommendation to support users explo-ration, and SynopsViz [46] a tool that offers multi-level visual exploration and analysis over RDF andprovides faceted browsing and filtering over classesand properties. Although these approaches providesupport for user exploration, layman users who areperforming exploratory search tasks to learn or inves-tigate new topic, can be cognitively overloaded espe-cially when facets provide lengthy options (i.e. mul-tiple links) for users to explore. The authors in [11]proposed Sview, a browser that utilises a link pattern-based mechanism for entity-centric exploration overLinked Data. Link patterns describe explicit and im-plicit relationships between entities and are used tocategorise linked entities. A link pattern hierarchy isconstructed using Formal Concept Analysis (FCA),and three measures are used to select the top-k pat-terns from the hierarchy. However, the approachlacked considering the user preference in identifyingthe link patterns in supporting exploratory searchtasks such as learning, which would be challengingespecially when the domain is not familiar to the user.

These initial approaches have become more so-phisticated, personalising the exploration in order tonot only hide the complexity of semantics, but also totake into consideration the user’s profile and interests.An approach that explicitly targets personalisation insemantic data exploration using users’ interests ispresented in [47]. Linked dataset concepts are dy-namically categorised into upper mapping and bind-ing exchange layer concepts using a fuzzy retrievalmodel. Results with the same concepts are groupedtogether to form categories, later used during con-cept-based browsing to align the exploration space tothe users’ interests. Recent approaches aimed to im-prove search efficiency over Linked Data graphs byconsidering user interests [48], or to diversify theuser exploration paths with recommendations basedon the browsing history [49]. A method for personal-ised access to Linked Data has been suggested in [50]based on collaborative filtering that estimates thesimilarity between users, and then produces resourcerecommendations from users with similar tastes.Similarity between users is calculated by taking intoaccount the commonalities, the informativeness andthe connectiveness of the shared resources betweenthe users. More recently, a graph-based recommenda-tion methodology based on a personalised PageRankalgorithm has been proposed in [51]. This adopts apersonalisation vector that assignes different weightsto different nodes to get a bias towards nodes closerto user preferences. The personalisation approach in[52] allows the user to rate semantic associationsrepresented as chains of relations that may revealinteresting and unknown connections between differ-ent types of entities for personalised.

The above approaches stress the importance of tai-loring the exploration to the users. None of them in-vestigates the user’s familiarity with the domain,which is the main focus of the approach we presenthere where familiarity is related to understanding andknowledge expansion. Moreover, conventional per-sonalization approaches suffer from the ‘cold start’problem – for a reliable user model to be obtained,the user should have to spend time interacting withthe system to provide sufficient information abouttheir interests. Instead, we exploit the structure of theknowledge graph to identify entities that are likely tobe familiar to users, which overcomes the cold startproblem. Strictly considered our approach is not per-sonalisation, because we do not dynamically adapt tothe user’s knowledge as it expands while the userbrowses through the graph. However, knowledgeanchors can be seen as a way to approximate whatentities in the domain may be familiar to the user

which is used in the algorithms we propose for gen-erating exploration paths.

2.2. Identifying Key Entities in Data Graphs

The most recent statistics 1 show that there are3360 interlinked, heterogeneous datasets containingapproximately 3.9 billion facts. The volume and het-erogeneity of such datasets makes their processingfor the purpose of exploration a daunting challenge.Utilisation of various computational models makes itpossible to handle this challenge in order to offerfruitful exploration of semantic data. Finding keyentities in a data graph is an important aspect of suchcomputational models and is generally implementedusing ontology summarization [53] and formal con-cept analysis [54] techniques.

Ontology summarisation has been seen as an im-portant technology to help ontology engineers tomake sense of an ontology, in order to understand,reuse and build new ontologies [24,55,56]. The pro-cess of summarising an ontology involves identifyingthe key concepts in an ontology, and then hid-ing/missing concepts that are not key concepts [57].A good summary should be concise, yet it needs toconvey enough information to enable a decent under-standing of the original ontology, yet provide an ex-tensive coverage of the entire ontology [58]. Cen-trality measures have been used in [53] to identifykey concepts and produce RDF summaries. The no-tion of relevance based on the relative cardinality andthe in/out degree centrality of a node has been usedin [59] to produce graph summaries. The state of theart ontology summarization approach is presented in[58]. The approach exploits the structure and the se-mantic relationships of a data graph to identify themost important entities using the notion of relevance,and then select edges connecting the entities by max-imising either locally or globally the importance ofthe selected edges. The notion of relevance is basedon the relative cardinality (i.e. judging the im-portance of an entity from the instances it contains)and the in/out degree centrality (i.e. the number andtype of the incoming and outgoing edges) of an entity.

The closest ontology summarisation approaches tothe context of our work relates to extracting key con-cepts as the best representatives of ontology [60,61].It highlights the value of cognitive natural categoriesfor identifying key concepts in an ontology to aidontology engineers to better understand the ontologyand quickly judge the suitability of an ontology in a

1 http://stats.lod2.eu/

knowledge engineering project. The authors applies aname simplicity approach, which is inspired by thecognitive science notion of basic level objects [22] asa way to filter entities with lengthy labels for the on-tology summary. The work in [61] has utilised basiclevel concepts to extract ontologies from collabora-tive tags. The work proposes an algorithm for con-structing an ontology using basic level concepts. Ametric based on the category utility is proposed toidentify basic concepts from collaborative tags,where tags of a concept are inherited by its sub-concepts and a concept has all instances of its de-scendants. However, these approaches lack applyingthe formal definitions of basic level objects and cuevalidity described in [22,62] in the context of a datagraph. Our work has operationalised these definitionsby developing several algorithms with the corre-sponding algorithms for identifying knowledge an-chors in a data graph.

Formal Concept Analysis (FCA) is a method foranalysis of object-attribute data tables [54], wheredata is represented as a table describing objects (i.e.taxonomical concepts), attributes and their relation-ships. FCA has been applied in different applicationareas [63,64] such as Web mining and ontology en-gineering. In Web mining, FCA based approacheshave been used to improve the quality of search re-sults presented to the end users. For example, thework in [65] developed a personalised domain-specific search system that uses logs of keywords andWeb pages previously entered visited by other per-sons to build a concept lattice. More recently, FCAhas been applied to construct a link pattern hierarchyto organise semantic links between entities in a datagraph [11]. In ontology engineering, FCA has beenused in two topics: ontology construction and ontolo-gy refinement. The work in [66] used FCA to con-struct ad hoc ontologies to help the user to better un-derstand the research domain. In [67] the authorspresented OntoComp, an approach for supportingontology engineers to check whether an OWL ontol-ogy covers all relevant concepts in a domain, andsupports the engineers to refine (extend) the ontologywith missing concepts. The psychological approachesto basic level concepts have been formally definedfor selecting important formal concepts in a conceptlattice by considering the cohesion of a formal con-cept [68]. This measures the pair-wise similarity be-tween the concept’s objects based on common attrib-utes. More recently, the work in [69] has reviewedand formalised the main existing psychological ap-proaches to basic level concepts. Five approaches tobasic level objects have been formalised with FCA

[69]. The approaches utilised the validity of formalconcepts to produce informative concepts capable ofreducing the user’s overload from a large number ofconcepts supplied to the user.

Existing studies in ontology summarisation andFCA utilise BLO to identify key concepts in an on-tology in order to help experts to examine or reengi-neer the ontology. They have been evaluated withdomain experts, and are applicable in tasks where theusers have a good understanding of the domain. Incontrast, we apply the notion of BLO in a data graphto identify concepts which are likely to be familiar tousers who are not domain experts. Focusing on lay-man users, we provide unique contribution thatadopts Rosch’s seminal cognitive science work [22]to devise algorithms that identify (i) KADG that repre-sent familiar graph entities, and (ii) BLODG whichcorrespond to human cognitive structures over a datagraph. Crucially, these algorithms are validated withlayman users who are not domain experts.

2.3. Generating Paths in Data Graphs

In data graphs, the notion of path queries uses reg-ular expressions queries to indicate start and end enti-ties of paths in data graphs [70]. For example, in ageographical graph database representing neighbor-hoods (i.e. places) as entities and transport facilities(e.g. Bus, Tram) as edges, the user writes a simplequery such as “I need to go from Place a to Place b”,and the user is then provided with different transpor-tations facilities going through different routes(paths) starting from Place a to reach the destinationPlace b [70]. Another used notion is property paths.A property path defines possible routes between anytwo given entities in a data graph. A trivial case of aproperty path is simple triple pattern of length 1.Property paths are used to capture associations be-tween entities in data graphs where an associationfrom entity a to entity b comprises entity labels andedges [71]. However, in big data graphs there areusually high numbers of associations (i.e. possibleproperty paths) between the entities and ways to re-fine and filter the possible paths, are required. Totackle this challenge, the work in [7] presented Ex-plass2 for recommending patterns (i.e. paths) betweenentities in a data graph. A pattern represents a se-quence of classes and relationships (edges). Explassuses frequency of a pattern to reflect its relevance tothe query. It also uses informativeness of classes andrelationships in the pattern to indicate its informa-

2 http://ws.nju.edu.cn/explass/

tiveness by adding the informativeness of all classesand relationships. A class having fewer instances ismore specific and thus more informative. The idea issimilar to relationships (i.e. an edge) except that in-formativeness is considered to the start (Subject) enti-ty and target (Object) entities of a relationship.Relfinder3 [72] is another approach for helping usersto get an overview of how two entities are associatedtogether by showing all possible paths between twoentities in an RDF graph. This approach may not besuitable for layman users, who may become confuseor overloaded with too much unfamiliar entities.

While several approaches address the problem ofsupporting users’ exploration through data graphs,none of them aims at providing layman users withexploration paths to help such users to expand theirdomain knowledge. Furthermore, earlier approachesgenerate paths that link graph entities, and are suita-ble for tasks where users explore associations be-tween known entities. Whereas, we provide paths foruni-focal exploration where the user starts from asingle entry point and explores the data graph. Theunique feature of our work is the explicit considera-tion of knowledge utility of exploration paths. We arefinding entities that are likely to be familiar to theuser and using them as knowledge bridges to gradu-ally introduce unfamiliar entities and facilitate learn-ing. This can enhance the usability of semantic dataexploration systems, especially when the users arenot domain experts. Therefore, our work can facili-tate further adoption of linked data exploration in thelearning domain. It can also be useful in other appli-cations to facilitate the exploration by users who arenot familiar with the domain presented in the graph.

2.4. Data Exploration Evaluation Approaches

Evaluation of data exploration applications usuallyconsiders the exploration utility from a user’s pointof view or analyses the application’s usability andperformance (e.g. precision, recall, speed etc.) [23].The prime focus is assessing the usability of semanticWeb applications, while assessing how well the ap-plications help the users with their data explorationtasks is still a key challenge [73]. Task driven userstudies have been utilised to assess whether a dataexploration application provides useful recommenda-tions for accomplishing users exploration tasks [37].A task driven benchmark for evaluating semanticdata exploration has been presented in [73]. Thebenchmark presents a set of information-seeking

3 http://relfinder.dbpedia.org

tasks and metrics for measuring the effectiveness ofcompleting the tasks. The evaluation approach in[74] aimed to identify whether the simulated explora-tion paths over information networks are similar tothose produced by human exploration.

In the context of ontology summarisation, there aretwo main approaches for evaluating a user-drivenontology summary [55]: gold standard evaluation,where the quality of the summary is expressed by itssimilarity to a manually built ontology by domainexperts, or corpus coverage evaluation, in which thequality of the ontology is represented by its appropri-ateness to cover the topic of a corpus. The evaluationapproach used in [60] included identifying a goldstandard by asking ontology engineers to select anumber of concepts they considered the most repre-sentative for summarising an ontology.

In this paper, we evaluate algorithms for identify-ing knowledge anchors in data graphs by comparingthe algorithms’ outputs versus a benchmarking set ofbasic level objects identified by humans. To the bestof our knowledge, there are no evaluation approachesthat consider key concepts in data graphs which cor-respond to cognitive structures of users who are notdomain experts. Our evaluation approach that identi-fies BLODG through an experimental method adapt-ing Cognitive Science methods is novel and can beapplied to a range of domains. Our second evaluationstudy which assess the knowledge utility and usabil-ity of the generated exploration paths adopts an es-tablished task-based approach and utilises an educa-tional taxonomy for assessing conceptual knowledge.

3. Application Context

As an application context for algorithm validationand user evaluation we use MusicPinta - a semanticdata browser in the music domain [18]. MusicPintaprovides a uni-focal interface for users to navigatethrough musical instrument information extractedfrom various linked datasets. The MusicPinta datasetincludes several sources, including DBpedia4 for mu-sical instruments and artists, extracted usingSPARQL CONSTRUCT queries. The DBTune5 da-taset is utilised for music-related structured data.Among the datasets on DBTune.org we utilise: (i)Jamendo which is a large repository of CreativeCommons licensed music; (ii) Megatune is an inde-pendent music label; and (iii) MusicBrainz is a com-

4 http://dbpedia.org/About.5 http://dbtune.org/.

munity-maintained open source encyclopaedia ofmusic information. The dataset coming fromDBTune.org (such as MusicBrainz, Jamendo andMegatunes) already contains the “sameAs” links be-tween them for linking same entities. We utilise the“sameAs” links provided by DBpedia to link Mu-sicBrainz and DBpedia datasets. In this way, DBpe-dia is linked to the rest of the datasets fromDBtune.org, enabling exploration via rich intercon-nected datasets. The MusicPinta dataset is madeavailable as an open source on sourceforge6. All da-tasets in MusicPinta are available as a linked RDFdata graph and the Music ontology7 is the ontologyused as the schema to interlink them.

The dataset has 2.4M entities and 38M triplestatements, taking 1.5GB physical space. Table 1shows the main characteristics of the MusicPintadataset.

Table 1. Main characteristics of MusicPinta data graph. Thedata graph includes five class hierarchies. Each class hierarchy has

number of classes linked vie the subsumption relationshiprdfs:subClassOf (e.g. there are 151 classes in the StringInstrument class hierarchy). DBpedia categories are linked toclasses in a hierarchy via the dcterms:subject relationship,and classes are linked via the domain-specific relationship Mu-sicOntology:instrument to musical performances. Thedepth of each class hierarchy is the maximum depth value for of

the entities in that class hierarchy.

Instrument classhierarchy

No. ofclasses

Depth No. of DBpe-dia categories

No. of musicperformances

String 151 7 255 348Wind 108 7 161 1539Percussion 82 5 182 127Electronic 16 1 7 11Other 7 1 0 2

The music ontology provides sufficient class hier-archy for experimentation. For instance, the classhierarchies for the String and Wind musical in-struments have depth of 7, which is considered idealfor applying the cognitive science notion of basiclevel objects [22] on data graphs, as this notion statesthat objects within a hierarchy are classified at leastthree different levels of abstraction (superordinate,basic, subordinate). The MusicPinta dataset providesan adequate setup since it is fairly large and diverse,yet of manageable size for experimentation.

Figures 1 and 2 show examples of the user inter-face in the MusicPinta semantic data browser.

6 http://sourceforge.net/p/pinta/code/38/tree/7 http://musicontology.com/.

Fig. 1. Description page of the entity 'Xylophone' in Mu-sicPinta, extracted from DBpedia using CONSTRUCT queries.

Fig. 2. Semantic Links (i.e. predicates) related to entity Xylo-phone presented in Features and Relevant Information. Featuresinclude the semantic relationships rdf:type (e.g. Xylophoneis an instrument), and semantic relationships rdfs:subclassOf (e.g. the subClass Xylophone belongs to the superClass

Tuned Percussion) and dcterms:subject relationshipthat links an entity to its DBpedia category (e.g. Xylophone

belongs to the category Greek loanwords). Relevant Infor-mation also include the semantic relationship rdfs:subclass

Of (e.g. Celesta is a subClassOf Xylophone)

In a scoping user study, we examined user explo-ration of musical instruments in MusicPinta to identi-fy what strategies would lead to paths with highknowledge utility (details of the study are given in[20]). We examined two dimensions - the user’s fa-miliarity with the domain and the density of entitiesin the data graph. Consequently, three explorationstrategies were suggested based on these two dimen-sions: (i) density-strategy – directing the users todensest entities; (ii) familiarity-strategy – asking theuser to always select entities that are familiar tohim/her; and (iii) unfamiliarity-strategy – asking theuser to always select entities that are unfamiliar tohim/her. Paths which included familiar and denseentities and brought unfamiliar entities led to increas-ing the users’ knowledge. For example, when a par-ticipant was directed to explore the entity Guitarwhich he/she was familiar with, the participant could

see unfamiliar entities linked to Guitar such asResonator Guitar and Dobro. The entityGuitar served as an anchor from where the usermade links to new concepts (Resonator Guitarand Dobro). We also noted that the dense entities,which had many subclasses and were well-connectedin the graph, provided good potential anchors thatcould serve as bridges to learn new concepts.

Subsumption Theory for Meaningful Learning.While the scoping study provided useful insights, itdid not give solid theoretical model for developing anapproach that generalises across domains and datagraphs. However, the study directed us to Ausubel’ssubsumption theory for meaningful learning [21] as asuitable theoretical underpinning for generating ex-ploration paths. This theory [21,75–77] has beenbased on the premise that a human cognitive struc-ture (i.e. individual’s organization, stability, and clar-ity of knowledge in a particular subject matter field)is the main factor that influences the learning andretention of new knowledge [77]. In relation to mean-ingful learning, the subsumption process postulatesthat a human cognitive structure is hierarchicallyorganised with respect to levels of abstraction, gener-ality, and inclusiveness of concepts. Highly inclusiveconcepts in the cognitive structure can be used asknowledge anchors to subsume and learn new, lessinclusive, sub-concepts through meaningful relation-ships [21,76,78,79]. Once the knowledge anchors areidentified, attention can be directed towards identify-ing the presentation and sequential arrangement ofthe new subsumed content [77]. Hence, to subsumenew knowledge, anchoring concepts are first intro-duced to the user, and then used to make links forintroducing new concepts.

In the following sections, we will utilise the mean-ingful learning theory to generate exploration pathsthrough data graphs based on knowledge anchors.We will split this into two stages: (i) identifyingknowledge anchors in data graphs, and (ii) using theknowledge anchors to subsume new knowledge.

4. Preliminaries

We provide here the main definitions that will beused in the formal description of the main algorithms.

Linked Data graphs are built using traditional Webstandards (e.g. Uniform Resource Identifiers (URIs)and Hypertext Transfer Protocol (HTTP) and use acommon data graph model, the Resource Description

Framework (RDF). RDF describes entities and at-tributes (edges) in the data graph, represented as RDFstatements. Each statement is a triple of the form<Subject - Predicate - Object> [80]. The Subjectand Predicate denote entities in the graph. An Objectis either a URI or a string. Each Predicate URI de-notes a directed attribute with Subject as a source andObject as a target.

Definition 1 [Data graph]. Formally, a data graphis a labeled directed graph TEVDG ,, , depict-

ing a set of RDF triples where:

- },...,,{ 21 nvvvV is a finite set of entities;

- },...,,{ 21 meeeE is a finite set of edge labels;

- },...,,{ 21 ktttT is a finite set of triples where

each triple is a proposition in the form of

oiu vev ,, with Vvv ou , , where uv is the

Subject (source entity) and ov is the Object (target

entity); and Eei is the Predicate (edge label).

In our analysis of data graphs, the set of entities Vwill mainly consist of the concepts of the ontologyand can also include individual objects (notable in-stances of certain concepts). The edge labels willcorrespond to semantic relationships between con-cepts and individual objects. These labels will alwaysinclude the subsumption relationship rdfs:subclassOf and may also include the rdf:type rela-tionship. For a given entity

iv , we will be interested

primarily in its direct and inferred subclasses, andinstances. The set of entities V can be divided furtherby using the rdfs:subclassOf subsumption re-lationship denoted as ) and following its transitivi-

ty inference. This includes:- Root entity ( r ) which is superclass for all entities in

the domain;- Category entities ( VC ) which is the set of all

inner entities (other than the root entity r ), thathave at least one subclass, and may also includesome individual objects (notable instances of cer-tain concepts);

- Leaf entities ( VL ) which is the set of entities

that have no subclasses, and may have one or moreindividuals.Starting from the root entity r , the class hierarchy

in a data graph is the set of all entities linked via thesubsumption relationship rdfs:subClassOf. Theset of entities in the class hierarchy include the rootentity r , Category entities C and Leaf entities L .

The set of edge labels E is divided further consid-ering two relationship categories:- Hierarchical relationships )(H is a set of subsump-

tion relationships between the Subject and Objectentities in the corresponding triples.

- Domain-specific relationships )(D represent rele-

vant links in the domain, other than hierarchicallinks, e.g. in a music domain, instruments used inthe same performance are related.

Definition 2 [Data Graph Trajectory]. A trajec-

tory J in a data graph TEVDG ,, is defined as a

sequence of entities and edge labels within the data

graph in the form of 1211 ,,...,,, nnn vevvevJ ,

where:- 1,...,1, niVvi

;

- njEe j ,...,1, ;

-1v and

1nv are the first and the last entities of the

data graph trajectory J , respectively;

- n is the length of the data graph trajectory J .

Definition 3 [Entity Depth]. The depth of an enti-

ty LCv is the length of the shortest data

graph trajectory from the entity v to the root entityr in the class hierarchy of the data graph.

Definition 4 [Exploration Path]. An explorationpath P in a data graph DG is a sequence of finite setof transition narratives generated in the form of:

1322211 ,,,...,,,,,, mmm vnvvnvvnvP , where:

- 1,...,1, miVvi;

- 1v and 1mv are the first and last entities of the

exploration path P , respectively;

- m is the length of the exploration path P ;

- mini ,...,1, is a text string that represents a nar-

rative script;

- 1,, iii vnv presents a transition from iv to

1iv , which is enabled by the narrative scriptin .

Note that an exploration path P is different from a

data graph trajectory J in that iv and 1iv in P

may not be directly linked via an edge label, i.e. the

transition from iv to 1iv in P can be either via

direct link, an edge, or through an implicit link, atrajectory.

Our ultimate goal is to provide an automatic way,starting from an entity Vv , referred as first entityof an exploration path, and a data graph

TEVDG ,, to generate an exploration path P in

DG. For this, we will first identify entities that canserve as knowledge anchors (Section 5) and will thenutilise the knowledge anchors and the subsumptionstrategy to generate an exploration path (Section 7).

5. Identifying Knowledge Anchors in Data Graphs

5.1. Basic Level Objects in Cognitive Science

The notion of Basic Level Objects (BLO) was in-troduced in Cognitive Science research, illustratingthat domains of concrete objects include familiarcategories that exist at an inclusive level of abstrac-tion in humans’ cognitive structures (called the basiclevel), more than categories at the superordinate lev-el (i.e. above the basic level) or categories at the sub-ordinate level (i.e. below the basic level) [22,62],where a human cognitive structure is hierarchicallyorganised with respect to levels of abstraction, gener-ality, and inclusiveness of concepts [77]. Rosch, etal .[22], define BLO as: categories that “carry themost information, possess the highest category cuevalidity, and are, thus, the most differentiated fromone another”. Crucial for identifying basic level cate-gories is calculating cue validity: “the validity of agiven cue x as a predictor of a given category y (theconditional probability of y/x) increases as the fre-quency with which cue x is associated with categoryy increases and decreases as the frequency withwhich cue x is associated with categories other than yincreases” [22].

Most people are likely to recognise and identifyobjects at the basic level. An example from the ex-perimental studies from Rosch et al. [22] of a BLO inthe musical instrument domain is Guitar. The BLOGuitar represents a familiar category that is neithertoo generic (e.g. musical instrument – theroot entity of the musical instrument taxonomy) nortoo specific (e.g. Folk Guitar – subclass of thecategory Guitar). According to Rosch et al. [22],Guitar is at a level within the musical instrumenttaxonomy where its members (e.g. Folk Guitar,Classical Guitar) share attributes that are dif-ferent from the attributes of members (e.g. Grand

Piano, Upright Piano) of another object at thesame level of abstraction (i.e. at the basic level) such

as Piano. This indicates that members of a BLOshare many features (attributes) together, and hencethey have high similarity values in terms of the fea-ture the BLO members share. Based on the abovediscussion, two approaches can be applied to identifyBLO in a domain taxonomy. Consequently, to identi-fy basic level categories in a domain taxonomy, wewill adopt two approaches:

Distinctiveness (highest cue validity). Identifiesmost differentiated category objects. A differentiatedcategory object has most (or all) of its cues (i.e. at-tributes) linked to the category members (i.e. sub-classes) only, and not linked to other category objectsin the taxonomy. Each entity that is linked through anattribute to members of the category will have a sin-gle validity value used as a predictor of the distinc-tiveness of the category among other category objectsin the taxonomy. The aggregation of all validity val-ues will indicate the distinctiveness of the categoryobject.

Homogeneity (highest commonality betweencategory members). Identifies category objectswhose members have high similarity values. Thehigher the similarity between category members, themore likely it is that the category object is at thebasic level of abstraction. This is complementarywith the distinctiveness feature. A category objectwith high cue validity will usually have high numberof entities common to its members.

5.2. Algorithms for identifying Knowledge Anchors inData Graphs

Our work in [81] has formally adopted the Cogni-tive science notion of BLO [22], to describe twogroups of metrics with the corresponding algorithmsfor implementation. The set of all knowledge anchorsin a data graph DG is denoted as KADG. The defini-tions of the KADG metrics followed Formal ConceptAnalysis (FCA) approaches in [69] and adapt them inthe context of identifying KADG .

Formal Concept Analysis. Formal Concept

Analysis (FCA) was presented by Wille in 1982 asa method for data analysis and knowledge representa-tion [82]. It analyses object-attribute data tables,where data is represented as a table describing ob-jects, attributes and binary relationships between theobjects and the properties [54]. The two main notionsin FCA relevant to the metrics for identifying KADG

presented in this Section are: formal context and for-mal concept. A formal context X is represented by a

triple IRM ,, , where M is a set of objects, R is a

set of attributes and I is a binary relation RMI .

For an object Mm and formal attribute Rr , isread as: the object m has the attribute r . Table 2presents an example of a formal context.

Table 2. An example of a formal context in FCA represented assets of objects M and attributes R , where (x) indicates that an

object m has attribute r .

set

of

Ob

ject

sM

set of Attributes R

1r 2r 3r 4r 5r

1m x x x

2m x x x x x

3m x

For a given formal context X , let RA and MB .

The pair BA, is called a formal concept, where

BA ( B is the set of objects having all the attrib-utes belonging to A ), and AB ( A is the set ofattributes applying to all objects belonging to B ). Anexample from the formal context X in Table 2 for a

formal concept is },,{},,{, 52121 rrrmmBA -

the concept objects21,mm are conceptually clustered

based on the three shared attributes521 ,, rrr .

In our work, we adapt the FCA notions of formalcontext and formal concept to data graph DG and

category entity Cv , respectively. Objects M and

attributes R of a formal context IRM ,, comprise

entities V and edge labels E in a data graph DG ,

respectively. The objects MB of a formal concept

BA, comprise members vvv : of a category

entity Cv , and the attributes A in the formal con-

cept represent attribute entities ev linked to members

vvv : of the category entity Cv via an edge

label Ee in a data graph DG .

5.2.1. Distinctiveness MetricsThis group of metrics aims to identify the most

differentiated category entities whose members arelinked to distinctive entities that are shared amongstthe members of the category entities but are notshared to members of other categories. Each entity

Vv that is linked through an edge label e to

members vv of the category entity Cv will

have a single validity value used to distinctive the

category entity Cv among other category entitiesin the data graph). Three distinctiveness metrics weredeveloped:

Attribute Validity (AV). The attribute validitydefinition corresponds to the cue validity definitionin [22] and adapts the formula from [69]. We use‘attribute validity’ to indicate the association withdata graphs - ‘cues’ in data graphs are attributes ofthe entities and are represented as relationships interms of triples. The attribute validity value of anentity Cv is calculated with regard to a relation-ship type e, as the aggregation of the attribute validi-

ty values for all entities v′e linked to subclasses

v : vv . The attribute validity value of v′e increases,

as the number of relationships of type e between v′eand the subclasses v : vv increases; whereas the

attribute validity value of v′e decreases as the number

of relationships of type e between v′e and all entitiesin the data graph increases.

We define the set of vertices W(v,e) related as Sub-jects to the subclasses v : vv , via relationship of

type e :

},,:{),( TvevvvvvevW ee (1)

The following formula defines the attribute validi-ty metric for a given entity v with regard to a rela-tionship type e.

),( |}:,,{|

|}:,,{|),(

evWv aae

e

eVvvev

vvvevevAV (2)

In Figure 3, the AV value for category entity

2v with regard the domain-specific relationship D is

the aggregation of the AV values of the (Subject enti-ties

6543 ,,, eeee ) linked to members of the category

entity2v (Object entities

24232221 ,,, vvvv ) via the

edge label (i.e. Predicate) D . The AV value for theentity

3e equals the number of the triples between the

Subject entity3e and members of the category

2v

(Object entities21v ,

22v ) via the relationship D (2

triples), divided by the number of triples between the

Subject entity3e and all Object entities in the graph

(Object entities12v ,

21v ,22v ) via the relationship D (3

triples). Hence the AV value for3e equals 2/3 = 0.66.

The aggregation AV values for entities6543 ,,, eeee

will identify the AV value for the category entity2v

v2

v23v22v21

e3e2 e6e5

v11 v12

v1

e1

CategoryEntities

Members ofCategoryEntities

Entities linkedto members ofa Category

v24

e4

Fig. 3. A data graph showing entities and relationship typesbetween entities.

Category Attribute Collocation (CAC). This ap-proach was used in [83] to improve the cue validitymetric by adding the so called category-feature collo-cation measure which takes into account the frequen-cy of the attribute within the members of the category.This gives preference to ‘good’ categories that havemany attributes shared by their members. In our case,a good category will be an entity Cv with high

number of relationships of type e between v′e andthe subclasses v : vv , relative to the number of

its subclasses. The following formula defines thecategory-attribute collocation metric for a given enti-ty v with regard to a relationship type e .

||

|}:,,{|

|}:,,{|

|}:,,{|),(

),( V

vvvev

Vvvev

vvvevevCAC e

evWv aae

e

e

(3)

Considering the above example for identifyingthe AV value for the entity

3e , and considering the

relationship D , the CAC adds a weight of (the num-ber of the triples the Subject between

3e and the

members of2v (Object entities

21v ,22v ) via the

relationship D (2 triples) divided by (the number of

members of the category entity2v ( i.e. FOUR

members: 21v , 22v ,23v ,

24v ) . Hence the CAC of entity

3e will be the AV value of3e multiplied by (2/4).

The CAC value for entity3e equals [(2/3) * (2/4) =

0.33]. The aggregation CAC values for entities

6543 ,,, eeee will identify the CAC value for the

category entity2v .

Category Utility (CU). This approach was pre-sented in [84] as an alternative metric for obtainingcategories at the basic level object. The metric takesinto account that a category is useful if it can im-prove the ability to predict the attributes for membersof the category, i.e. a good category will have manyattributes shared by its members (as mentioned in thecategory-attribute collocation metric). At the sametime, it possess ‘unique’ attributes that are not relatedto many other categories (efficiency of categoryrecognition). We adapt the formula in [69] for a datagraph:

22

),(

)||

|}:,,{|()

||

|}:,,{|(

||

||),(

V

Vvvev

V

vvvev

V

VevCU aae

evWv

e

e

(4)

Following the previous examples for calculating theAV and the CAC values for entity

3e (see Figure 3);

in addition to the proportion used by the CAC value(2/4), the CU will include the proportion of all tri-ples between the Subject entity

3e and all Object

entities in the graph (THREE Object entities

12v ,21v ,

22v ) linked via relationship D (i.e. 3 triples)

over the number of entities linked via subsumptionrelationships (e.g. rdfs:subClassOf and rdf:type) in the graph (11 entities). Hence the CUvalue for

3e will be: [(2/4)2 - (3/11)2= 0.177].

The aggregation CU values for entities6543 ,,, eeee ,

multiplied by the number of members of category

2v (FOUR members) divided by number of entities in

the graph linked via the susbsumption relationships(11 entities) will identify the CU value for the cate-

gory entity2v .

Algorithm I describes calculating the distinctive-ness metrics. The algorithm takes a data graph DGand a relationship type (hierarchical or domain-specific relationship) as input and returns values forthe three distinctiveness metrics for each categoryentity Cv .

In (line 1), all category entities are retrieved viaSPARQL query:

SELECT distinct ?categoryWHERE {

?category rdfs:subClassOf r.?subclass rdfs:subClassOf ?category.}

where r is the root entity in the data graph.

After tha, for an entity v, all members are retrieved(line 2). Members of an entity can be subclasses or

instances. In the current implementation, we are us-ing the subsumption relationship rdfs:subClassOf

to retrieve the entity’s members via the followingSPARQL query:

SELECT distinct ?subclassWHERE {?subclass rdfs:subClassOf v.}

For each entity v′e linked to one or more subclassentities v′ via an edge label e veve

,, (line 3),

several steps are conducted: retrieving all triples withSubject v′e and Object any subclass v′ (line 4); re-trieving all triples with Subject v′e and Object anygraph entity v (line 5); applying the formulas for cal-culating the AV, CAC, and CU metrics for v′e (lines6-8); and aggregating values for v′e to the overallvalues for v (lines 9-11).

Algorithm I: Distinctiveness Metrics

Input: EeTEVDG ,,,

Output:vvv CUCACAV ,, for all Cv

1. for all do //all category entities

2. V := the set of all vvv : //all members of v

3. for all do

4. := set of all

5. := set of all

6. :=

7. :=

8. :=

9. := +

10. := +

11. := +

12. end for

13. :=

14. end for

5.2.2. Homogeneity MetricsThese metrics aim to identify categories whose

members share many entities among each other. Inthis work, we have utilised three set-based similaritymetrics [81]: Common Neighbors (CN), Jaccard(Jac), and Cosine (Cos). The homogeneity value fora category entity is identified as the average of theaccumulated pair-wise similarity values between thecategories members with regard to an edge label e .For instance (see Figure 3), the Jaccard similaritybetween the pair-wise members ),( 2221 vv of the enti-

ty2v considering the edge label D is the number of

intersected Subject entities (ONE entity3e ) linked to

vevv ee ,,:

eN

eM

||/|| ee MN

|)|/|(||)|/|(| VNMN eee

22 |)|/|(||)|/|(| VMVN ee

evAV

evCAC

vCACevCAC

vCU

evCU

evCU

vAV

Cv

Vvveve :,,

evAV

Vvvev aae :,,

vCU vCUV

V

||

||

vCUvCAC

vAV

Objects2221,vv via edge label D , divided by the

number of union Subject entities (TWO entities3e ,

4e ) linked to Objects2221 , vv via edge label D .

Accordingly, the Jaccard similarity between mem-bers ),( 2221 vv is (1/2).

Algorithm II describes implementation of the met-rics. The algorithm takes a data graph and a relation-ship type (hierarchical or domain-specific relation-ship) as input and returns values for the three homo-

geneity metrics for each entity Cv

Algorithm II: Homogeneity MetricsInput: EeTEVDG ,,,

Output: , , for all

1. for all do //all category entities

2. V := the set of all vvv : //all members of v

3. for all do

4.

5.

6.

7.

8. //Common Neighbors

9. //Jaccard

10. //Cosine

11.

12.

13.

14. end for

15.

16.

17.

18. end for

The algorithm takes a data graph and a relationshiptype (hierarchical or domain-specific relationship) asinput and returns values for the three homogeneitymetrics for each entity Cv . In (line 1), all categoryentities are retrieved via the same SPARQL queryused in (Algorithm I / line 1).

Then, for every entity v, all members are retrievedusing the subsumption relationship (line 2) using thesame SPARQL query in (Algorithm I / line 2). Foreach pair of subclass entities v′ and v″ (line 3), sever-al steps are conducted: retrieving all entities linkedvia triples with v′ and v″ (lines 4-5); calculating theirintersection and union (lines 6-7); applying the for-mulas for calculating the similarity metrics CN, Jac,and Cos (lines 8-10); and aggregating these values to

the overall values for v (lines 11-13); and normalis-ing the aggregated values (lines 15-17).

All SPARQL queries were run over the MusicPin-ta data graph [18] stored in a triple store. This im-plementation allowed examining the performance ofthe KADG metrics over a specific data graph, as pre-sented next.

6. Evaluating KADG against Human BLODG

To enable impartial comparison of the outputs bythe KADG algorithms and the cognitive structures ofhumans, we conducted a study with humans follow-ing earlier Cognitive Science studies closely to iden-tify basic level objects in domain taxonomies. As ause case in a representative domain for evaluatingknowledge anchors over a data graph, we used a typ-ical semantic data browser, MusicPinta, which wasdeveloped in our earlier research [18] and is intro-duced in section 3. We focus on musical instrumentssince they are objects rich in meaning and provide asuitable domain for studying BLO, as discussed in[85]. It should be noted that it is not appropriate totake the list of BLO derived empirically by earlierCognitive Science studies (e.g. [22,85]) in musicdomain because these BLO presents a commonsenseview of the domain (music in this case) that may notcorrespond to the data graph over which the algo-rithms are run (data graphs are partial abstract mod-els of the real world). Hence, to ensure that the eval-uation focuses on the performance of the algorithms,we have to eliminate any impact of the knowledgemodelling in the data graph (e.g. missing or wrongconcepts/relationships).

6.1. Cognitive Science Experimental Approaches forDeriving BLO

While studying the notion of basic level objects,Rosch et al [22] conducted several experiments com-prising free-naming tasks testing the hypothesis thatobject names at the basic level should be the namesby which objects are most generally designated byadults. In a free-naming task, objects in a domaintaxonomy are shown to a participant as a series ofimages in a fixed amount of time, and the participantis asked to identify the names of objects shown in theimages as quickly as possible. Three types of packetsof images were shown to the participants in their ex-periments: those in which one picture from each su-perordinate category appeared; one in which one im-

vvvv CNCNCN ,

vCNvJac vCos

VvVvvv :),(

Cv

},,:{: vevvV eee

},,:{: vevvV eee

ee VVI :

ee VVU :||:, ICN vv

||/||:, UIJac vv

|)|||/(||:, eevv VVICos

vvvv JacJacJac ,

vvvv CosCosCos ,

)2/)1|.(||/(| VVCNCN vv

)2/)1|.(||/(| VVCosCos vv

)2/)1|.(||/(| VVJacJac vv

Cv

age from each basic level category appeared; and onein which all images appeared. The participants over-whelmingly used names at the basic level while nam-ing objects in the images [22].

To identify BLO, they considered accuracy andfrequency. Accuracy refers to naming an entity cor-rectly by the user. It considers whether a user namesan entity with its exact name, or with a parent (su-perclass) or with a category member (subclass) of theentity. Frequency indicates how many times a partic-ular category was accurately named by different us-ers. In the example of Guitar, when participantswere shown members of Guitar object (e.g. FolkGuitar, Classical Guitar) in a packet, theynamed them with their parent Guitar at the basiclevel more frequently than with names at the super-ordinate level (e.g. Musical instrument) orwith their exact names (e.g. Folk Guitar, Clas-sical Guitar) at the subordinate level.

The selection of object names used in the free-naming tasks in [22] was based on the population ofcategories of concrete nouns in common use in Eng-lish. Every noun with a word frequency of ten orgreater from a sample of written English [86] wasselected as a basic level object. A superordinate cate-gory was considered in common use if at least four ofits members met this criterion.

However, the Cognitive Science approach for se-lecting BLO cannot be applied directly in the contextof a data graph. The principal difference is that weneed to constrain the human cognitive structures up-on the data graph, as opposed to using a bag of wordsfrom popular dictionaries. This is because a datagraph presents a lesser number of concepts from adomain, which belong to the graph scope, and therecan be concepts that have been omitted. Moreover,the Cognitive Science studies included concrete do-mains where images of the objects could be shown toparticipants. Many semantic web applications utilisedata graphs which include more abstract concepts forwhich images cannot be reliably shown to users (e.g.medical illnesses, environmental concepts, profes-sions). Therefore, we adapt the Cognitive Scienceexperimental approach for deriving BLO to considerthe domain coverage of a data graph, which is appli-cable to any domain presented with a data graph.

6.2. Algorithm for Identifying Human Basic LevelObjects in a Data Graph

Following Cognitive Science experimental studiesoutlined above, we present two strategies with the

corresponding algorithm for identifying human basiclevel objects in a data graph (BLODG).

Strategy 1. Takes into account whether a leaf enti-

ty Lv that has no subclasses is presented to a userand named with one of its parents (i.e. super classes).

Strategy 2. Takes into account whether a category

entity Cv that has one or more subclasses is pre-sented and named with its exact name, or with thename of a category parent (i.e. superclass) or a cate-gory member (i.e. subclass that is not a leaf entity).

Algorithm III describes the two strategies for iden-tifying human BLODG using accuracy and frequency.Accuracy refers to naming an entity correctly.

Algorithm III: Identifying Human BLODG

Input: TEVDG ,,Output: two sets of entities: Set1 for human BLODG identified

from Strategy 1, and Set2 for human BLODG identified fromStrategy 2. The Union of the Set1 and Set2 identifies the finalset of Human BLODG

1. for a set of entities Vv do

2. for all );;1:( inii

3. if Lv then //Strategy1

4. )(vshow and ask a user to name v5. if )(vanswer )(vparent then

6. ;acount //count frequency

7. end if;

8. else if Cv then //Strategy2

9. )(vshow and ask a user to name v10. if )(vanswer = )(vlabel then


12. else if )}()({)( vmembervparentvanswer then


14. end if;15. end if;16. end for;17. end for;18. Set1 = }:)({ kcountLvvanswer a

19. Set2 = }:)({ kcountCvvanswer a

The algorithm takes a data graph DG as input andreturns two sets of human BLODG. For any class enti-

ty Vv (line 1), we identify the number of users to

be asked to name the entity (line 2). For Strategy 1(lines 3-7), we consider accurate naming of a catego-ry entity (a parent) when a leaf entity Lv that is amember of this category is seen. For Strategy 2 (lines8-14), we consider naming a category entity

Cv with its exact name (lines 10, 11) or a nameof its parents (superclasses) or category members(subclasses that are not leaf entities) (lines 12-13).

In both the strategies, we use a representation func-

tion )(vshow to create a representation of an entity

v to be shown to the user. The representation of aleaf entity Lv (in Strategy 1) will consider theleaf itself (e.g. show a single label or a single imagefor the leaf entity), while the representation of a cate-gory entity Cv (in Strategy 2) will consider all(or some) of the category’s leaf entities (e.g. showinga random listing of a set of labels of leaf entities orshowing a group of images of leaf entities asa collage). The following SPARQL query is used toget the set of leaf entities of v :SELECT ?leaf ?leaf_labelWHERE {

?leaf rdfs:subClassOf v .?leaf rdfs:label ?leaf_label.

FILTER NOT EXISTS{?member rdfs:subClassOf ?leaf.}}

The two strategies in Algorithm III for obtaininghuman BLODG are applied as follows:

Strategy 1. When a user is shown a representation

of a leaf entity Lv (line 4), the following steps areconducted:- The function )(vanswer assigns a user's answer

to the leaf entity v .- The function )(vparent returns a set of labels (i.e.

names) of the parent(s) of the leaf entity v via thefollowing SPARQL query:SELECT ?parent_label ?labelWHERE {

v rdfs:subClassOf ?parent.?parent rdfs:label ?parent_label.}

- The algorithm III (line 5) checks if the user namedthe leaf entity v with one of its parents. If an ac-curate name of a parent was provided, then thefrequency of the parent entity will be increased byone (line 6).

Strategy 2. When a user is shown a representation

of a category entity Cv (line 9), the followingsteps are conducted:

- The function )(vanswer assigns a user's answer

to the category entity v .

- The function )(vparent returns a set of labels of

parent(s) of the category entity v via SPARQLqueries similar to Strategy 1 above.

- The function )(vmember returns a set of labels (i.e.

names) of member(s) of the category entity v viathe following SPARQL query:

SELECT ?member_labelWHERE {?member rdfs:subClassOf v .?member rdfs:label ?member_label.}

- The function )(vlabel returns the label of the cate-

gory entity v via the following SPARQL query:SELECT ?labelWHERE {

v rdfs:label ?label.}

The algorithm in (lines 10, 12) checks if the usernamed the category entity v with its exact name, or aname of its parents or its members. If there was accu-rate naming of the category, a parent or a member,the frequency of the category name (line 11), theparent name or the member name (line 13) will beincreased by one. K in lines 18 & 19 indicate thenumber of different participants that have correctlynamed a category as indicated in lines (11,13).

6.3. Evaluating KADG against human BLODG

6.3.1. Obtaining human BLODG

To obtain a set of human BLODG that correspondto a human cognitive structure, we conducted a userstudy in the musical instrument domain to apply Al-gorithm III.

Participants. For this user study, 40 participants –university students and professionals, age 18–55,were recruited on a voluntary basis. None of the par-ticipants had any expertise in music.

Method. The participants were asked to freelyname objects that were shown in image stimuli, un-der limited response time (10 seconds). Overall, 364taxonomical musical instruments were extracted fromthe MusicPinta dataset by running SPARQL queriesover the triple-store hosting the dataset to get all mu-sical instrument concepts linked via therdfs:subclassOf relationship. The entities in-cluded: leaf entities (total 256) and category entities(total 108). While applying the two strategies in Al-gorithm III, for each leaf entity, we collected a repre-sentative image from the Musical Instrument Muse-ums Online (MIMO)8 to ensure that pictures of highquality were shown9. For a category entity, all leavesfrom that category entity were shown as a group in asingle image (similarly to a packet of images in [22]).

8 http://www.mimo-international.com/MIMO/9 MIMO provided pictures for most musical instruments. In the

rare occasions when an image did not exist in MIMO, Wikipediaimages were used instead.

We ran ten online surveys10. Among them: (i) eightsurveys presented 256 leaf entities, each showed 32leaves; (ii) two surveys presented 108 category enti-ties, each showed 54 categories.

Free-naming task. Each image was shown for 10seconds on the participant's screen, and the partici-pant was asked to type the name of the given object(for leaf entities) or the category of objects (for cate-gory entities). The image allocation in the surveyswas random. Every survey had four respondents fromthe study participants (corresponds to line 2 in Algo-rithm III). Each participant was allocated only to onesurvey (either leaf entities or category entities). Fig-ures 4-7 show example instrument images and partic-ipant answers (Figure 4 from Strategy 1, and Figures5-7 from Strategy 2). Applying Algorithm III overthe MusicPinta dataset, two sets of human BLODG

were identified.Set1 [resulting from Strategy1]. We consider accu-

rate naming of a category entity (parent) when leafentity that belongs to this category is seen. For ex-ample, as shown in Figure 4, a participant has namedViolotta, a leaf entity in the data graph, with itsparent category Violin. This will be counted as anaccurate naming and will increase the count for Vi-olin. The overall count for Violin will includeall cases when participants named Violin whileseeing any of its leaf members.

Fig. 4. An image of Violotta (a leaf entity in the data graph) wasshown to a user, who named it as Violin.

Set2 [resulting from Strategy 2]. We consider nam-ing a category entity with its exact name or a name ofits parent or subclass member. For example (see Figure5), a participant saw the category Fiddle and named

10 The study was conducted with Qualtrics (www.qualtrics.com).Examples from the surveys are in:https://login.qualtrics.com/jfe/preview/SV_cHhHPPthBFO5r6d?Q_CHL=preview

its parent category Violin; this will increase thecount for Violin. In Figure 6 a participant was shownthe image of category Violin and named it with itsexact name; this will increase the count for Violin.

Fig. 5. An image of Fiddle (a category entity with two leaf enti-ties) was shown to a user, who named it as Violin.

Fig. 6. An image of Violin (a category entity) was shown to a user,who named it as Violin

In Figure 7 a participant saw the category BowedString Instrument and named it as its membercategory Violin; this will also increase the count forViolin.

Fig. 7. An image of Bowed String Instrument (a categoryentity) was shown to a user, who named it as Violin

In each of the two sets, entities with frequencyequal or above two (i.e. named by at least two differ-ent users) were identified as potential human BLODG.The union of Set1 and Set2 gives human BLODG (weidentified 24 human BLODG). The full list of humanBLODG obtained from MusicPinta is available in [87].

6.3.2. Quantitative AnalysisWe used the identified human BLODG to examine

the performance of the KADG metrics. For each met-ric, we aggregated (using union) the KADG entitiesidentified using the hierarchical relationships (H).We noticed that the three homogeneity metrics havethe same values; therefore, we chose one metricwhen reporting the results, namely Jaccard similari-ty11. A cut-off threshold point for the result lists withpotential KADG entities was identified by normalizingthe output values from each metric and taking themean value for the 60th percentile of the normalisedlists 12 . The KADG metrics evaluated included thethree distinctiveness metrics plus the Jaccard homo-geneity metric; each metric was applied over bothfamilies of relationships – hierarchical (H) and do-main-specific (D).

As in ontology summarisation approaches [60], aname simplicity strategy based on data graphs wasapplied to reduce noise when calculating key con-cepts (usually, basic level objects have relativelysimple labels, such as chair or dog).

The name simplicity approach we use is solelybased on the data graph. We identify the weightedmedian for the length of the labels of all data graphentities Vv and filter out all entities whose label

length is higher than the median. For the MusicPintadata graph, the weighted median is 1.2, and hence weonly included entities which consist of one word.Table 3 illustrates precision and recall values com-paring human BLODG and KADG derived using hier-archical and domain specific relationships.

Table 3. MusicPinta: performance of the KADG algorithms com-pared to human BLODG.

11 The Jaccard similarity metric is widely used, and was used inidentifying basic formal concepts in the context of formal con-cept analysis [68].

12We experimented the KADG metrics at the 60th, 70th, 80th and 90th

percentiles on the metrics normalised output lists, and the met-rics performed best using at the 60th percentile.

Hybridisation of Algorithms. Further analysis ofthe False Positive(FP) and False Negative (FN) enti-ties indicated that the algorithms had different per-formance on the different taxonomical levels in thedata graph. This led to the following heuristics forhybridisation of algorithms.

Heuristic 1: Use Jaccard metric with hierarchicalrelationships for the most specific categories in thegraph (i.e. the categories at the bottom quartile of thetaxonomical level). There were FP entities (e.g.Shawm and Oboe) returned by distinctiveness met-rics using the domain-specific relationship Mu-sicOntology:Performance because theseentities are highly associated with musical perfor-mances (e.g. Shawm is linked to 99 performancesand Oboe is linked to 27 performance). Such entitiesmay not be good knowledge anchors for exploration,as their hierarchical structure is flat. The best per-forming metric at the specific level was Jaccard forhierarchical attributes - it excluded entities which hadno (or a very small number of) hierarchical attributes.

Heuristic 2: Take the majority voting for all theother taxonomical levels. Most of the entities at themiddle and top taxonomical level will be well repre-sented in the graph hierarchy and may include do-main-specific relationships. Hence, combining thevalues of all algorithms is sensible. Each algorithmrepresents a voter and provides two lists of votes,each list corresponding to hierarchical or domain-specific associated attributes (H, D). At least half ofthe voters should vote for an entity for it to be identi-fied in KADG. Examples from the list of KADG identi-fied by applying the above hybridisation heuristicsincluded Accordion, Guitar and Xylo-phone. The full KADG list is available here [87].Applying the hybridisation heuristics improved Pre-cision value to 0.65 (average Precision in Table 3 =0.59), Recall value to 0.68 (average Recall in Table 3= 0.56).

6.3.3. Qualitative AnalysisExamining the FP and FN entities for the hybridi-

zation algorithm, we made the following observa-tions that relate to the possible use of KADG as explo-ration anchors.

Observation 1: Missing basic level entities due tounpopulated areas in the data graph. We noticed thatnone of the metrics picked FN entities (such as Har-monica, Banjo or Cello) that belonged tothe bottom quartile of the class hierarchy and had asmall number of subclasses (e.g. Harmonica,Banjo and Cello each have only one subclass

Relationshiptypes

Precision Recall

AV CAC CU Jac AV CAC CU Jac

H 0.60 0.62 0.62 0.60 0.68 0.73 0.73 0.55

D 0.55 0.53 0.55 0.62 0.50 0.45 0.50 0.36

and there are no domain-specific relationships withtheir members). Similarly, none of the metrics pickedTrombone (which is false negative) - althoughTrombone has three subclasses, it is linked only toone performance and is not linked to any DBpediacategories. While these entities belong to the cogni-tive structures of humans and were therefore added inthe benchmarking sets, one could doubt whether suchentities would be useful exploration anchors becausethey are not sufficiently presented in the data graph.These entities would take the user to ’dead-ends’with unpopulated areas which may be confusing forexploration. We therefore argue that such FN casescould be seen as ‘good misses’ of algorithms.

Observation 2: Selecting entities that are superor-dinate of basic level entities. The FP included entities,such as Tambura, Reeds, Bass, Brass,Castanets, and Woodwind, which are well pre-sented in the graph hierarchy (e.g. Reeds has 36subclasses linked to 60 DBpedia categories, Brasshas 26 subclasses linked to 22 DBpedia categories,Woodwind has 72 subclasses linked to 82 DBpediacategories). Also, their members participate in manydomain-specific relationships (e.g. Reeds membersare linked to 606 performances, Brass - 33, andWoodwind - 853). Although, these entities are notclose to the human cognitive structures, they providedirect links KADG entities from the benchmarkingsets (e.g. Reeds links to Accordion, Brass linksto Trumpet and Woodwind links to Flute).We therefore argue that such FP cases could be seenas ‘good picks’ of the algorithms because they canprovide exploration bridges to reach BLO.

7. Exploration Strategies Based on Subsumption

In this section we describe how we adopt the sub-sumption theory for meaningful learning [21] (de-scribed in section 3) as our underpinning theoreticalmodel to develop an automatic approach for generat-ing exploration paths in data graphs.

7.1. Challenges in Applying the Subsumption Theoryfor Meaningful Learning

Applying the subsumption theory for meaningfullearning to generate exploration paths brings forthtwo challenges:- How to find the closest knowledge anchor to the

first entity of an exploration path? In uni-focalbrowsing (pivoting), a user starts his/her explora-

tion from a single entity in the graph, also referred

as a first entity (vs). To start the subsumption pro-cess, the user has to be directed from this first enti-ty to a suitable knowledge anchor in the data graphfrom where links to new entities can be made.However, there can be several knowledge anchorsin a data graph, requiring a mechanism to identify

the closest knowledge anchor vKA to vs.- How to use the closest knowledge anchor to sub-

sume new class entities for generating an explora-tion path? The closest knowledge anchor usuallycan have many subclass entities that exist at differ-ent levels of abstractions. It is important to identifywhich subclasses to subsume and in what orderwhile generating an exploration path for the user.Furthermore, we also need to identify appropriatenarrative scripts between the entities in the explora-tion path. We argue that providing meaningful nar-rative scripts between entities will help layman us-ers to create meaningful relationships between fa-miliar entities they already know in their cognitivestructures and the new subsumed class entities.

7.2. Finding the Closest KADG

Let vs be the first entity of an exploration path. The

first entity vs can be any class entity in the graph (i.e.any entity in the subsumption class hierarchy of allentities linked via the subsumption relationship

rdfs:subClassOf). If vs is a knowledge anchor

(vs KADG), then there is no need to identify theclosest knowledge anchor (i.e. find another anchor inthe data graph), and the subsumption process can

start immediately from vs. However, if vs is not a

knowledge anchor (vs KADG), then vs can be su-perordinate, subordinate, or sibling of one or moreknowledge anchors. In this case, an automatic ap-proach for identifying the closest knowledge anchor

vKA to vs is required.To find the closest knowledge anchor, we calculate

the semantic similarity between vs and every

knowledge anchor in the data graph viKADG. Thesemantic similarity between two entities in the classhierarchy is based on their distances (i.e. length ofthe data graph trajectory between both entities). Dueto the fact that class hierarchies exist in most datagraphs, we adopt the semantic similarity metric from[88] and apply it in the context of a data graph, wheresemantic similarity is based on the lengths betweenthe entities in the class hierarchy. The semantic simi-

larity between vs and a knowledge anchor vi is calcu-lated as:

)()(

)),((.2:),(

is

isis

vdepthvdepth

vvlcadepthvvsim

(5)

where, lca(vs , vi) is the least common ancestor of vs

and vi, and depth(v) is a function for identifying the

depth of the entity v in the class hierarchy.Algorithm IIII describes how the semantic similarity

metric is applied to identify vKA.

Algorithm IIII: Identifying Closest KADG

Input: },...,,{,,,, 21 iDGs vvvKAVvTEVDG

Output: vKA – closest knowledge anchor with highest semanticsimilarity to vs

1. ifDGs KAv then // vs is a knowledge anchor

2.sKA vv : ;

3. else // vs is NOT a knowledge anchor

4. {}:S ; //list for storing similarity values

5. for allDGi KAv do //for all knowledge anchors

6. {}:CA ; //list to store common ancestors of vs and vi

7. {}:L ; //list for storing trajectory lengths

8. ),(_ is vvancestorscommonCA ;

9. for all CAvca //for all common ancestors

10. ),( ica vvlengthL ; //length between vca and vi

11. end for;

12.calca vv : with least length in L ;

13.

)()(

)(2

is

lca

vdepthvdepth

vdepthS

;

14. end for;

15.iKA vv : with maximum similarity value in list S;

16. end if;

The algorithm takes a data graph, the first entity vs ofan exploration path and a set of knowledge anchorsKADG as an input, and identifies the closest

knowledge anchor vKA KADG with highest seman-

tic similarity value to vs. If the first entity vs belongs

to the set of knowledge anchors vs KADG (line 1),

then the first entity vs is identified as the closest

knowledge anchor vKA (line 2). However, if the first

entity vs does not belong to the set of knowledge an-

chors in the data graph vs KADG (line 3), then thefollowing steps are conducted:- The algorithm initialises a list S to store semantic

similarity values between vs and every knowledge

anchor vi KADG (line 4).

- For every knowledge anchor vi KADG (line 5),the algorithm initiates two lists: list CA for storing

the common ancestors (i.e. common superclasses)

of vs and vi (line 6), and list L for storing the tra-jectory lengths between the common ancestors inlist CA and the knowledge anchor vi (line 7).

- The algorithm in (line 8) uses a function com-mon_ancestors(vs,vi) which retrieves all commonancestors of vs and vi in the class hierarchy, andstores them in list CA. The function uses the fol-lowing SPARQL query to retrieve the common an-

cestors of vs and vi:SELECT distinct ?common_ancestorWHERE {

vs rdfs:subClassOf ?common_ancestor.

virdfs:subClassOf ?common_ancestor}.

- For every common ancestor in list CA (line 9), thealgorithm identifies the length of the data graph tra-

jectories between vi and each of the common ances-

tors vca in list CA, via the SPARQL query:

SELECT(count(?intermediate)-1 as ?length)WHERE {

vi rdfs:subClassOf ?intermediate.

?intermediate rdfs:subClassOf vca.}

- The common ancestor vca with least trajectory

length with vi in list L is identifies as the least

common ancestor vlca (line 12).

- After identifying the least common ancestor vlca to

the first entity vs and the knowledge anchor vi, thesemantic similarity metric (Formula 5) is applied(line 13). The metric includes identifying depths of

vs, vi , and vlca. The depth of an entity v is identifiedusing the following SPARQL query:

SELECT (count(?intermediate)-1 as ?depth)WHERE {

v rdfs:subClassOf ?intermediate.?intermediate rdfs:subClassOf r .}


The semantic similarity value is then inserted into thelist S (line 13), and the knowledge anchor with the

highest similarity value to the first entity vs will be

identified as closest knowledge anchor vKA (line 15).

7.3. Subsumption Using the Closest KnowledgeAnchor

The closest knowledge anchor vKA is used to sub-sume new class entities and to generate transitionnarratives in the exploration path. Table 4, describesthe different conditions for the narrative scripts usedbetween entities in an exploration path.

Table 4. Narrative types of transitions between entities in anexploration path.

NarrativeType

Fromentity

Toentity

DescriptionOutput script

(From_ entity, Narrative type,To_entity)

1N vs vKA

vs issubclass of

vKA

“You may find it useful to knowthat vs belongs to a familiar and

well-known class–vKA.Let's explore vKA.”

2N vs vKA

vs issuperclass

of vKA

“You may find it useful to knowthat there is a well-known class –

vKA that belongs to vs .Let's explore vKA”

3N vs vKA

vs and vKA

Aresiblings

“You may find it useful to knowthat vs is similar to a well-known

class – vKA.Let's explore vKA.”

4N vKA v′KA

v′KA issubclass of

vKA and

superclass

of vs

“You may find it useful to knowthat v′KA belongs to vKA, and vs

belongs to v′KA.Let's explore v′KA.”

5N vKA vʺKA

vʺKA issubclass of

vKA and notsuperclass

of sv

“You may find it useful to know

that vʺKA belongs to vKA. Let's

explore vʺKA.”

Algorithm V describes our approach for generatingexploration paths following Ausubels’ subsumptiontheory for meaningful learning. The narrative types(N1, N2, … , N5) used in this algorithm correspond tothe narratives types described in Table 4 above.

The algorithm takes a data graph DG, the first enti-ty vs, the closest knowledge anchor vKA, the length ofthe exploration path (m), and the edge labele = rdfs:subClassOf as an input, and generatesan exploration path P of length m. The algorithmstarts by initialising an empty exploration path P(line 1) used to store m transition narratives. If thefirst entity vs is not the closest knowledge anchor vKA

(line 2), then the algorithm starts identifying the rela-tionship type between vs and vKA. If

sv is subclass of

vKA (lines 3), then the following steps are conducted:- The transition narrative KAs vNscriptv ),(, 1

from vs

to vKA is inserted into P (line 4) where the functionscript(N1) retrieves the script output for narrativetype N1 from Table 4. The length of explorationpath m is decreased by one (line 5).

- The algorithm in (line 6) identifies the set of inter-mediate class entities (

KAV ) between vs and vKA, us-

ing the following SPARQL query:SELECT distinct ?intermediateWHERE {

vs rdfs:subClassOf ?intermediate.

?intermediate rdfs:subClassOf vKA .}

Algorithm V: Subsumption Using Closest KADG

Input: TEVDG ,, , Vvs ,DGKA KAv , m = length of

exploration path, e = rdfs:subClassOf

Output: an exploration path P of length m.1. {};:P //empty exploration path P

2. ifKAs vv then //vs is not vKA

3. if KAs vev ,, then //vs is subclass of vKA

4. ;),(, 1 KAs vNscriptvP //insert narrative

5. m ; //reduce length of P by one

6.KAV is all

KAKAKAsKA vevvevv ,,,,: ;

7. {}:Q ;

8. )( KAVsortDepthQ ;

9. for );0||;1:( imQii

10. ;][),(, 4 iQNscriptvP KA//insert narrative


12. end for;

13. else if sKA vev ,, then

14. ;),(, 2 KAs vNscriptvP15. m ; //reduce length of P by one

16. else if iKAis vevvev ,,,, then

17. ;),(, 3 KAs vNscriptvP //insert narrative


19. end if;

20. else ifKAs vv then //vs is vKA

21.KAV is all Qvvevv KAKAKAKA

,,: ;

22. {}:Q ;

23. )( KAVensitysortDepthDQ ;

24. for );0||;1:( jmQjj do

25. ;][),(, 5 jQNscriptvP KA//insert narrative

26. m ;

27. end for;28. end if;

- A list Q is created in (line 7), and the function

sortDepth() sorts the class entitiesKAKA Vv based

on their depths starting from the least depth classentity (i.e. direct subclass of vKA) to highest depth inascending order, and inserts the sorted class entitiesinto list Q (line 8). The function sortDepth() iden-

tifies the depth of a class entityKAv via the follow-

ing SPARQL query:SELECT (count(?intermediate)-1 as?depth)WHERE {

KAv rdfs:subClassOf ?intermediate.

?intermediate rdfs:subClassOf r .}


- The algorithm in (lines 9 – 12) uses the closest vKA

to subsume the class entities in Q . The transition

narrativ ][),(, 4 iQNscriptvKAfrom vKA to ][iQ

(i.e.KAv ) is inserted into P (line 10) where the func-

tion script(N4) retrieves the script output for narra-tive type N4 from Table 4. The length of the path mis decreased by one (line 11).

If vs is superclass of vKA (lines 13), then the transi-tion narrative KAs vNscriptv ),(, 2

from vs to vKA is

inserted into P (line 14) where the function script(N2)retrieves the script output for narrative type N2 fromTable 4. The length m of the path P is decreased byone (line 15).

If vs and vKA are siblings (i.e. share same super-class) (line 16), then the transition narrative

KAs vNscriptv ),(, 3from vs to vKA is inserted into P

(line 17) where the function script(N3) retrieves thescript output for narrative type N3 from Table 4. Thelength m of the path P is decreased by one (line 18).

However, if the first entity vs equals the closestknowledge anchor vKA (line 20), then the followingsteps are conducted:- Identify the set of subclass entities (

KAV ) of vKA

which do not belong to Q (line 21). A list Q is

created in (line 22), and the function

()ensitysortDepthD sorts the class entities

KAKA Vv based on two their depths and density,

as the following steps: Identify the depth of the class entities (similar to

function sortDepth() described above). Identify the density (using degree centrality) of

the class entities based on number of subclasses. Sort the class entity starting from the least depth

(i.e. direct subclasses of vKA) to highest depth inascending order, and from highest density to leastdensity (i.e. first sort using depth, if two or moreentities are at the same depth, then sort these enti-ties based on their density from highest to low-est). The sorted class entities are inserted into list

Q (line 23).

- The algorithm in (lines 24–27) uses vKA to subsumethe class entities in Q . The transition narrative

][),(, 5 jQNscriptvKAfrom vKA to ][ jQ (i.e.

KAv )

is inserted into P (line 25) where the functionscript(N5) retrieves the script output for narrative

type N5 from Table 4. The length of explorationpath m is decreased by one (line 26).

8. Evaluation of the Subsumption Algorithm

To evaluate the subsumption algorithm, we willconduct an experimental user study with users toexamine the knowledge utility and users’ explorationexperience of the generated exploration paths. Wewill compare two conditions:

Experimental condition (EC): where users followexploration paths generated using the subsumptionalgorithm;

Control condition (CC): where users carry outfree exploration and they are free to select entities tovisit.

Comparing both conditions a controlled task-drivenexperimental user study will be conducted with par-ticipants to examine the following hypotheses:

H1. Users who follow EC expand their domainknowledge.

H2. The expansion in the users’ knowledge whenfollowing EC is higher than when following CC.

H3. The usability when EC is followed is higher thanwhen CC is followed.

8.1. Experimental Condition Setting: ExplorationTask Design

Designing exploration tasks for users is consideredan important requirement for evaluating data explora-tion approaches [89]. A typical exploration task hasto be generic (i.e. the scope of the task is broad andthe user don’t have specific information needs), real-istic (i.e. real-life task that set in a familiar situation),discovery-oriented (i.e. users travel beyond what theyknow), open-ended (i.e. requires a significant amountof exploration, where open-endedness relates to un-certainty over the information available, or incom-plete information on the nature of the search task),and set in an unfamiliar domain for the user[2,89,90]. In this work, we follow a two-step ap-proach (similar to [89]) to design a data explorationtask for the study participants. The approach in-volves: (i) Designing a task template that places theparticipant in a familiar situation which involves ex-ploring multiple entities in an unfamiliar domain ortopic (e.g. a researcher at a university that wants towrite a research paper (familiar situation) about anew topic), and (ii) Identifying unfamiliar candidate

entities (e.g. find new research topic) in the domainthat could be plugged into the task template.

Our aim was to design a generic task template thatencourages layman users to seek knowledge in a do-main unfamiliar to them. Therefore, we designed thetask template in the context of a general knowledgequiz show where layman users need to acquire asmuch knowledge as they can. Inspired by the tasktemplates in [89], the task template in Table wasdesigned to suit the musical instrument domain.

Table 5. Task template used in the experimental user study

Task template

“Imagine that you are a member of a teamwhich will take part in a general knowledge quizshow. You have been asked to explore two musi-cal instruments for 20 minutes in order toprepare a short presentation to describe to

your team what you have learned about theseinstruments”.

The second step in designing data exploration taskwas to identify unfamiliar entities in the domain ofthe user study. For this we ran a questionnaire withusers to identify the unfamiliar entities in theString Instrument and Wind Instrumentclass hierarchies in the MusicPinta data graph. Thesetwo class hierarchies have the richest class represen-tation in terms of the number of classes and the hier-archy depth as discussed in Section 3.5.1, and havethe highest number of knowledge anchors (9 anchorsin the String Instrument class hierarchy and10 anchors in the Wind Instrument class hierar-chy – out of 24 anchors in MusicPinta data graph).We have extracted class entities at the bottom quar-tile of the two class hierarchies (note that the depth ofthe two class hierarchies is 7 – see Table 1, and enti-ties of depth 6 or 7 are considered to be at the bottomquartile of the data graph). This is based on earlierCognitive science studies acknowledging that laymanusers are not familiar with specific objects in a do-main [91]. Overall 61 class entities from the StringInstrument and Wind Instrument class hier-archies were used in the survey. The selected classeswere randomised and distributed among twelve par-ticipants who are not experts in the musical instru-ments (the participants have limited knowledge aboutmusical instruments and may have seen the instru-ment, and none of the participants had played anymusical instruments).

The most unfamiliar instrument from each classhierarchy was Biwa (class hierarchy: String In-strument, origin: Japanese) and Bansuri (classhierarchy: Wind Instrument, origin: Indian).

8.2. Experimental Setup

Participants. 32 participants consisting of univer-sity students and professionals (24 students and 8professionals13) were recruited on a voluntary basis(a compensation of £5 Amazon voucher was offered).Participants varied in age 18–45 (mean age is 30),and cultural background (1 Austrian, 9 British, 1Chinese, 3 Greek, 1 Italian, 5 Jordanian, 1 Libyan, 2Malaysian, 6 Nigerian, 1 Polish, 1 Romanian and 1Saudi).

Method. Four online surveys14 were run (whichcan be found here15). Every survey had 8 participants.Each participant was allocated one survey. Figure 8shows the overall structure of the user study. Eachparticipant explored both musical instruments(i.e. Biwa, Bansuri), where each of the instrumentwas allocated to an exploration strategy (EC or CC).The order of EC and CC was randomised to counterbalance the impact on the results. Every participantsession was conducted separately and observed bythe author. All participants were asked to providefeedback before, during, and after the interactionwith MusicPinta.

Fig. 8. Structure of user study to examine EC against CC in termsof knowledge utility, usability and cognitive load.

Task presentation [1 min] – utilise the task tem-plate (as described in Section 8.1) to present the dataexploration task for the users at the beginning of their

13 University lecturers and private Sector employees (Banking andAirlines).

14 The study was conducted with Qualtrics (www.qualtrics.com).15 https://login.qualtrics.com/jfe/preview/SV_39nOZZWAmCk8Jp

3?Q_CHL=preview

exploration session. We used the task template inTable 5 for Biwa and Bansuri.

Pre-study questionnaire [2 min] - collected infor-mation about the participants’ profiles, and their fa-miliarity with the music domain, focusing on the twomusical instrument class hierarchies which would beexplored – String Instrument and WindInstrument. The participants’ familiarity with thetwo class hierarchies varied from low to medium(63% and 78% of the participants had low familiaritywith String Instrument and Wind Instru-ment, respectively).

Graph exploration [20 min] – each user exploredthe two strategies (EC and CC) where each strategycorresponds to one of the two unfamiliar instruments(Biwa or Bansuri). Figure 9 shows an examplefor generating the exploration path under EC for themusical instrument Biwa (the first entity of the ex-ploration path) using the closest knowledge anchorLute.

Fig. 9. Extract from the String Instrument class hierarchyin MusicPinta showing exploration path of Biwa. This path was

followed in the experimental condition EC.

The first transition narrative in the explorationpath is between the Biwa and the closest knowledgeanchor Lute (using N1 in Table 4). After that, theclosest knowledge anchor Lute is used to subsumenew class entities and generate transition narrativesin the exploration path using the narrative types N4

(subsume intermediate class entities between Luteand Biwa – line 10 in Algorithm V) and N5 (sub-sume subclasses of Lute other than class entitiesthat have been subsumed using N4 – line 25 in Algo-rithm V).

The transition narratives that were followed ingenerating the exploration path for Biwa are listed inTable 6.

Table 6. Transition narratives used for generating the explorationpath for Biwa

Transition Narrativesfor path of Biwa

Narrative Script

KAs vNscriptv ),(, 1You may find it useful to know

that ‘Biwa’ belongs to a familiar andwell-known instrument called‘Lute’. Let's explore ‘Lute’.

]1[),(, 4 QNscriptvKAYou may also find it useful to know

that ‘Oud’ belongs to ‘Lute’ , and‘Biwa’ belongs to ‘Oud’. Let's

explore ‘Oud’.

]1[),(, 5 QNscriptvKAYou may also find it useful to

know that ‘Tambura’ belongs to‘Lute’. Let's explore ‘Tambura’

]2[),(, 5 QNscriptvKAYou may also find it useful to

know that ‘Pipa’ belongs to ‘Lute’.Let's explore ‘Pipa’

Table 7 shows examples of the entities that werefreely visited in the control condition CC for Biwa.

Table 7. Examples of entities the participants have visited duringtheir free exploration of Biwa

Example 1 Example 2 Example 3Biwa Biwa Biwa

Bouzouki StringInstruments

Japanese Musi-cal Instruments

Xalam ‘Guitar’ Lute

Banjitar AcousticGuitar

Moon Lute

Plucked Stringinstruments

ClassicalGuitar

Bouzouki

Figure 10 shows an example for generating the ex-ploration path under EC for the instruments Bansu-ri (the first entity vs in the exploration path) usingthe closest knowledge anchor Flute.

Fig. 10. Extract from the Wind Instrument class hierarchy inMusicPinta showing exploration path of Bansuri. This path was

followed in experimental condition EC.

The exploration path for Bansuri was generatedusing the closest knowledge anchor Flute and nar-

rative types N1, N4 given in Table 4. The first transi-tion narrative is between the first entity Bansuriand the closest knowledge anchor Flute (using N1

in Table 4). After that, the closest knowledge anchorFlute is used to subsume new class entities and togenerate transition narratives in the exploration pathusing narrative type N4 (subsume intermediate classentities between Flute and Bansuri line 10 inAlgorithm V). Transition narratives that were fol-lowed in generating the exploration path for Bansu-ri are listed in Table 8.

Table 8. Narrative scripts used for generating the experimentalcondition EC (i.e. exploration path) for instrument Bansuri

Transition Narrativesfor path of Bansuri

Narrative Script

KAs vNscriptv ),(, 1You may find it useful to know that'Bansuri' belongs to a familiar and

well-known instrument called'Flute'. Let's explore 'Flute'.

]1[),(, 4 QNscriptvKAYou may also find it useful to know that'Fipple Flute' belongs to 'Flute' , and'Bansuri' belongs to 'Fipple Flute'.

Let's explore 'Fipple Flute'.

]2[),(, 4 QNscriptvKAYou may also find it useful to know that‘Transverse Flute' belongs to 'Flute' ,and 'Bansuri' belongs to 'Transverse

Flute'. Let's explore 'Transverse Flute'.

]3[),(, 4 QNscriptvKAYou may also find it useful to know that

‘Indian Bamboo Flutes' belongs to'Flute' , and 'Bansuri' belongs to

'Indian Bamboo Flutes'. Let's explore'Indian Bamboo Flutes'.

Table 9 shows examples of the entities that werefreely visited in the control condition CC forBansuri

Table 9. Examples of entities the participants have visited duringtheir free exploration of Bansuri

Example 1 Example 2 Example 3Bansuri Bansuri Bansuri

TransverseFlute

Fipple Flute Bamboo Musi-cal Instru-

mentsSaw Truck Contrabass

RecorderSide-blown

FluteFipple Flute Recorder Concert FluteFlute D’amour Great bass

recorderFipple Flute

Both, EC and the CC had the same length (EC hadfour transition narratives, and CC had four edges)16.We analysed knowledge utility and user explorationexperience using usability aspects, associated withthe user’s exploration settings under the experimental

16 The length of four edges (5 entities) is based on Miller's Law[100], which indicates the number of objects that an average hu-man can hold in working memory is 7 ± 2.

condition (EC) and the control condition (CC), as thefollowing:

Measuring knowledge utility. To compare theknowledge utility of an exploration path, we need areliable approach for measuring knowledge. For this,we adopt the well-known taxonomy by Bloom usedfor assessing conceptual knowledge [92]. The taxon-omy identifies a set of progressively complex learn-ing objectives that can be used to design or assesslearning experiences over information seeking andsearch tasks, and offers a means of assessing thedepth of learning that occurs through search [93]. Itsuggests linking knowledge to six cognitive process-es: remember, understand, apply, analyze, evaluate,and create. Among these, ‘remember’ and ‘under-stand’ are directly related to browsing and explora-tion activities. The remaining processes require deep-er learning activities, which usually happen outside atool, in our case data browser, and hence will not beconsidered. The ‘remember’ process is about retriev-ing relevant knowledge from the long-term memory,and includes recognition (locating the knowledge)and recall (retrieving it from the memory) [92]. The‘understand’ process is about constructing meaning,from which the most relevant to a semantic browseris categorise (determining that an entity belongs to aparticular category) and compare (detecting similari-ties) [92].

To measure the knowledge utility of a path, we usea schema activation technique used for assessing howusers expand their knowledge after reading text [94].To assess user’s knowledge of a target domain con-cept, the user is asked to name concepts that belongsto and are similar to the concept. The schema activa-tion test is conducted before an exploration and afteran exploration, using three questions related to theselected cognitive processes of remember, categories,and compare:- Q1 [remember] What comes in your mind when

you hear the word X?;- Q2 [categorise] What musical instrument catego-

ries does X belong to?;- Q3 [compare] What musical instruments are

similar to X?The number of accurate concepts named (e.g.

naming an entity with it’s exact name, or with a par-ent or with a member of the entity) by user beforeand after exploration is counted, and the differenceindicates the knowledge utility of the exploration. Forexample, if a user could name correctly two musicalinstruments similar to the musical instrument Biwa(Q3) before his/her exploration and then the user

could name correctly six names of musical instru-ments similar to the instrument Biwa after his/herexploration, then the effect of the exploration on thecognitive process compare is indicated as 4 (i.e. as aresult of the exploration the user learned 4 new simi-lar musical instruments to the musical instrumentBiwa). If a user named one instrument after his/herexploration (i.e. knowledge utility is -1), in such cas-es the knowledge utility equal to zero.

User’s exploration experience of a path. Aftereach exploration, the participants were asked to fill aquestionnaire about their exploration experience andthe cognitive load they have experienced (based on amodified version of the NASA-TLX questionnaire[95]). Furthermore, the participants were asked tothink aloud and notes of all comments were kept.

8.3. Results

In the following, we present the results of evaluat-ing the subsumption algorithm. We have analysed theuser knowledge expansion and user exploration ex-perience by usability aspects, associated with theuser’s exploration settings under the experimentalcondition (EC) and the control condition (CC).

8.3.1. Measuring Knowledge Utility

The user knowledge was measured before and aftereach exploration using the three questions of theschema activation test related to the entities Biwaand Bansuri. Before exploration, none of the userswere able to articulate any item linked to the twomusical instruments (Biwa and Bansuri) using thethree cognitive processes. The knowledge utility forthe three cognitive process before and afterexploration of EC and CC is shown in Figure 11.

Fig. 11. Knowledge utility of the two strategies (EC and CC) ofthe user cognitive processes (median of the knowledge utility of

exploration for all users).

The knowledge utility of the exploration under ex-perimental condition (EC) in the three cognitive pro-cesses was higher than the effect of free explorationunder control condition (CC); and this difference issignificant (See Table 10). The results showed thatall participants were able to remember and categoriseentities with EC (only 5 participants couldn’t com-pare new entities). Whereas not all participant couldremember, categorise or compare new entities afterthey have finished their exploration with CC (therewere 2 participants that could not remember or cate-gorise new entities and 13 participants that could notcompare between entities).

Table 10. Statistically significant differences of the values inFigure 11 (Mann-Whitney, 1-tail, Na=Nb=32)

Difference in KnowledgeUtility between P and F

CognitiveProcess

Z-value p

EC > CCRemember 3.6 P<0.01Categorise 5.1 P<0.0001Compare 2.7 P<0.01

Notably, for the cognitive process categorise thebigger effect on exploration of the subsumption ex-ploration strategy over the free exploration strategy ishighly significant (p<0.0001). The difference be-tween median values for categorise cognitive processunder EC and CC was higher than the remember andcompare cognitive processes. Furthermore, we exam-ined the knowledge utility for the three cognitiveprocesses for each instrument in its correspondingclass hierarchies, as shown in Figure 12.

Fig. 12. Knowledge utility of the two strategies (EC and CC) ofthe user cognitive processes (median of the knowledge utility of

exploration for all users).

The knowledge utility of the exploration under exper-imental condition (EC) in the three cognitive pro-cesses was higher than the effect of free explorationunder control condition (CC) for Biwa and washigher in the cognitive processes compare and cate-gorise for Bansuri (See Figure 12); and this differ-ence in EC and CC is significant except for the cog-

nitive process compare for instrument Bansuri(See Table 11). By inspecting the participants’ an-swers for the cognitive process compare, it was no-ticed that more than half of the participants (9 out of16 participants for EC; 13 out of 16 participants forCC) had 0 or 1 values as their knowledge expansionin the cognitive process compare (0 value: partici-pants didn’t learn similar entities to Bansuri; 1value: participants learned one entity similar toBansuri). Since more than half of the results in thetwo lists (EC and CC) are 0 or 1, this decreases thedifference between EC and CC (i.e. decrease thechance that a randomly selected value from EC willbe higher than a randomly selected value from CC).Notably, for the cognitive process categorise the big-ger effect on exploration of EC over CC is highlysignificant (p<0.001) for Biwa and Bansuri.

Table 11. Statistically significant differences of the values inFigure 12 (Mann-Whitney, 1-tail, Na=Nb=16)

Difference inKnowledge Utilitybetween EC and

CC

Instrument(class

Hierarchy)

CognitiveProcess

Z-value p

EC > CCBiwa

(String)Remember 1.658 P<0.05Categorise 3.373 P<0.001Compare 2.449 P<0.05

EC > CCBansuri

(Wind)Remember 3.467 P<0.001Categorise 3.900 P<0.001

Compare 1.280 P<0.5

To further inspect what caused the low knowledgeutility for the cognitive process compare for instru-ment Bansuri, we looked into the participants’familiarity with the Wind Instrument class hier-archy (the class hierarchy that Bansuri belongs to)and noticed that 78% of the participants had low fa-miliarity (i.e. participants have limited knowledgeand they may have seen some instruments) withWind Instrument, whereas 65% of the partici-pants had low familiarity with the String In-strument class hierarchy. One could argue thatbeing more familiar with the String Instru-ment class hierarchy than the Wind Instrumentclass hierarchy, participants knew more entities tocompare with. Furthermore, entities in the StringInstrument class hierarchy are associated withmore DBpedia categories compared to entities in theWind Instrument class hierarchy (StringInstrument has 255 and Wind Instrumenthas 161 DBpedia categories). Most of these catego-ries are grouping musical instruments based on theircultural origin (e.g. Chinese Musical In-

struments, Japanese Musical instru-ment, Indian Musical Instruments,Greek Musical Instruments), which helpedthe participants to associate entities from their cul-tures. This indicates important considerations of thedata graph and the user familiarity for generatingexploration paths for knowledge expansion.

8.3.2. User Exploration Experience

After each exploration strategy, the participants’feedback on the exploration experience was collectedincluding exploration usability and exploration com-plexity, adapted from NASA-TLX [95] (Table 12).

Table 12. Questions to gather feedback on user exploration ex-perience, adapted from NASA-TLX (mental demand, effort, per-

formance).

Subjectiveprocess

Question text

KnowledgeExpansion

How much the exploration expanded yourknowledge?

ContentDiversity

How diverse was the content you have ex-plored?

MentalDemand

How mentally demanding was this explora-tion?

Effort How hard did you have to work in this explo-ration?

Performance How successful do you think you were in thisexploration?

Figures 13 and 14 give a summary of the users’feedback.

Fig. 13. Users’ exploration experience of the two explorationstrategies (EC and CC). Values show number of user paths rated

with the corresponding characteristics.

In Figure 13, the exploration experience with ECwas the most informative (all 32 participants identi-fied their exploration experience with the explorationpaths under EC as informative, whereas 20 partici-pants indicated their exploration with CC as informa-tive). The participants also found their explorationunder EC to be slightly more interesting and enjoya-ble than CC. Furthermore, the participants found the

exploration paths under EC to be the least boring andleast confusing – only 3% (one participant) and 16%(four participants) of the participants founded theirexploration with EC to be boring or confusing, re-spectively. For instance, on participant indicated hisexploration experience with EC as boring since hewas not able to freely explore through entities in thegraph (his feedback was “Narratives in paths allowsme to explore entities in a hierarchical fashion, and Iwould like to freely explore other types of relation-ships”). Another participant indicated his explorationexperience with EC as confusing (his feedback was“I saw the same instruments several times during myexploration”.

Fig. 14. Users' subjective perception of the two exploration strate-gies (EC and CC), based on an adapted NASA-TLX questionnaire

[96] (median values for all users in the range 1-10).

The effect of the exploration path under EC on thesubjective processes knowledge expansion and per-formance was higher than the effect of the free explo-ration strategy; and this difference was significant(See Table 13 and Fig 14).

Table 13. Statistically significant differences of users’ subjec-tive perception of cognitive process (Mann-Whitney, 1-tail,

Na=Nb=32)

Difference in theusers’ experience

SubjectiveProcess

Zvalue

p

EC > CC

Knowledge Expansion 3.98 P<0.0001Content Diversity 0.32 P<0.5Mental Demand 0.14 P<0.5Effort 0.14 P<0.5Performance 2.15 P<0.05

Notably, for the subjective process knowledge ex-pansion the bigger effect on exploration of EC overCC is highly significant (p<0.0001).

9. Discussion

In this section, we revisit our research questionsrelated to finding computational ways to: (i) identify

knowledge anchors KADG and (ii) generate explora-tion paths for knowledge expansion using KADG. Wereflect on how and to what extent we have answeredthem, discuss the major contributions of our work,and point at limitations.

9.1. RQ1: Identification of KADG

To address the first research question, we utilisedRosch’s definitions of basic level objects and cuevalidity to develop two groups of metrics and thecorresponding algorithms for identifying knowledgeanchors in a data graph. These include: distinctive-ness metrics, which identify the most differentiatedcategories whose attributes are shared amongst thecategory members but are not associated to membersof other categories, and homogeneity metrics, whichidentify categories whose members share many at-tributes together. The formal framework that mapsRosch’s definitions of BLO and cue validity to datagraphs is a major contribution of our work.

9.1.1. KADG AlgorithmsThe KADG algorithms have been formally defined;

they are generic and can be applied over differentapplication domains represented as data graphs. Thealgorithms have been applied over two data graphs –in music (presented here) and in careers (presented in[97]). Although this paper focuses only on the musicdomain (so that we can show a holistic approach il-lustrated with a concrete application), in the discus-sion below we identify key features of the algorithmswhich have been confirmed in broader evaluation(see [97] for further detail).

The implementation showed that the output of theKADG algorithms is sensitive to the quality of thedata graph in terms of the richness of the class enti-ties and the hierarchy depth. Specifically, the algo-rithms tend to pick more anchoring entities when adata graph has many classes and high depth. For in-stance, the algorithms identified 9 anchors in theString Instrument class hierarchy and 10 an-chors in the Wind Instrument class hierarchy –out of 24 anchors (String Instrument andWind Instrument are the richest class hierar-chies in MusicPinta – See table 1). Whereas the algo-rithms did not produce any anchor in the Elec-tronic Instrument class hierarchy (only 15classes with a depth of one). The same observationwas confirmed in the career domain [97].

9.1.2. Identifying Human BLODG

To enable objective comparison of the outputs bythe KADG algorithms and the cognitive structures ofhumans, we presented an algorithm that adapts earli-er Cognitive Science experimental approaches offree-naming tasks to capture human basic level ob-jects in a data graph (BLODG) that correspond to fa-miliar concepts in human cognitive structures over adata graph. A user study in the music domain wasconducted to identify benchmarking sets of humanBLODG used to examine the performance of theKADG metrics. Based on quantitative and qualitativeanalysis, the strengths and limitations of each metricwere assessed, and a hybridisation to enhance theperformance of the metrics approach was proposed.

An important component for applying the humanBLODG algorithm is to identify appropriate stimuli tobe used for representing graph entities and showingthem to humans in a free-naming tasks. One of themain factors that affects choosing appropriate stimuliis how well the stimuli cover the entities in the datagraph. In other words, the chosen stimuli should haverepresentations for all entities in the graph ontology.For instance, the stimuli for MusicPinta were images– taken from an established source (MIMO7). Thechosen stimuli have to be close enough to users’ cog-nitive structures, so the users can understand the rep-resentation of entities.

The formal description of the human BLODG algo-rithm makes it applicable over a broad range of do-mains, by selecting appropriate stimuli. In this paperwe presented an application in a domain with con-crete objects - musical instruments - that can havedigital representations (e.g. image, audio, video). In[97] we have shown how the algorithm can be ap-plied also to a data graph in an abstract domain (ca-reers) where domain objects have text representations(i.e. labels of entities) but no clearly distinguishabledigital representations.

The derived human BLODG set is depends on whatcategories are represented in the data graph. If keydomain concepts, which are well-familiar to people,are missing in the data graph, they cannot be shownto the users, and hence will be missing from the de-rived human BLODG set. This is important for navi-gation through the data graph, as the evaluation willconsider only entities that are in the data graph (andhence can be used in navigation paths). Furthermore,the derived human BLODG can point at deficienciesof the data graph. For instance, if a well-known do-main category is not present in the derived BLODG

this can indicate that the category is unrepresented in

the data graph. Such findings can be useful for ontol-ogy evaluation and re-engineering.

9.1.3. Evaluating KADG Against Human BLODG

An important step in the evaluation is to identify asuitable cut-off point for the KADG metrics. In thecurrent implementation, this was done through exper-imentation – the best performance was by using the60th percentile. The same percentile was identified asbest for the career domain (see [87]). We expect thatwhen applied to a range of data graphs, the 60th per-centile will give reasonable performance (the bestcut-off point for a specific data graph would requireexperimentation comparing different percentiles).

The analysis indicated that hybridisation of themetrics notably improved performance. The samewas observed in the careers domain ([97]). Appropri-ate hybridisation heuristics for the upper level of thedata graph is to combine the KADB metrics using ma-jority voting. The hybridisation heuristics for thebottom level of the hierarchy are dependent on thedomain-specific relationships in the data graph.Hence, to derive appropriate hybridisation heuristicsthat give good performance for categories at the bot-tom level, further experimentation will be required.This will include comparing the KADB derived usingthe various domain-specific relationships againsthuman BLODB.

9.2. RQ2: Generation of Exploration Paths in DataGraphs

9.2.1. Subsumption Algorithms for Generating PathsWe adopted the subsumption theory for meaning-

ful learning [21] to generate exploration paths forknowledge expansion using KADG. For this, we for-mally described two algorithms.

Algorithm for identifying the closest knowledgeanchor to the first entity of an exploration path.It applies a semantic similarity metric based on classhierarchy depth to identify the closest knowledgeanchor to the first entity of an exploration path. Thissimilarity algorithm is not applicable in data graphswith shallow class hierarchies (e.g. class hierarchydepth = 1 or 2). In such cases, there will be no com-mon ancestors for the first entity and the knowledgeanchors, and hence the semantic similarity value willbe zero. Furthermore, two (or more) knowledge an-chors in a data graph can have the same semanticsimilarity value with the first entity. For example,there were two knowledge anchors (Flute andReeds) with the same semantic similarity value with

the first entity Bansuri. One possible way to ad-dress this is to filter the knowledge anchors based ontheir density and give preference to the densest an-chor as it will include many subclass members tosubsume while generating the exploration path.

In some cases, the semantic similarity metric canidentify closest knowledge anchors which are notsuperclass (i.e. N1 in Table 4), subclass (i.e. N2 inTable 4) nor sibling (i.e. N3 in Table 4) to the firstentity, which means that the closest knowledge an-chor can not be reached directly from the first entityusing a narrative transition. For instance, althoughthe anchors Flute and Reeds have the same se-mantic similarity value with the first entity Bansu-ri, Flute is a superclass of Bansuri that can bereached directly using the narrative type (N1 in Table4), whereas Reeds can’t be reached directly fromBansuri. Our solution to address this issue was toapply semantic similarity to knowledge anchorswhich could be reached directly from the first entity.Another solution can be to add a narrative type inTable 4 that indicates that the first entity and theclosest knowledge anchor simply belong to the samedomain but there is no direct trajectory between them.

Algorithm for generating exploration path as a setof transition narratives using the closest knowledgeanchor. The algorithm is underpinned by the sub-sumption theory for meaningful learning to generateexploration paths for knowledge expansion. The al-gorithm uses a knowledge anchor to subsume subor-dinate entities (i.e. subclasses of the closestknowledge anchor) while generating transition narra-tives of an exploration path. The algorithm is de-pendent on the sub-classes of the selected knowledgeanchor – it may not be possible to generate m transi-tion narratives in the exploration path (where m is therequired length of exploration path identified as aninput of the algorithm). Our implementation usesonly the knowledge anchor, and can result in pathswhose length of less than m. Another way to addressthis would be to continue the path, using anotherknowledge anchor. It is also possible to use superor-dinate categories to the knowledge anchor to extendan exploration path. This will be suitable for caseswhen the user is gained knowledge at the subordinatelevel and is ready to generalise to a more abstractlevel. In such cases, a user model will be required.

9.2.2. Evaluation of Exploration PathsOverall, the evaluation results have supported our

hypothesis H1-H3 as set out in Section 8.

H1: When users followed the experimental condi-tion, i.e. the paths generated by the subsumption al-gorithm, they expanded their domain knowledge. Allparticipants in the study have indicated that their ex-ploration in the experimental condition was informa-tive (in other words, they felt that while following thepath they were able to find useful information);whereas 62% of the participants founded their freeexploration trajectories to be informative.

H2: The expansion of the users’ knowledge whenfollowing the generated exploration paths was higherthan when following free exploration - the partici-pants were able to remember, categorize and com-pare significantly more entities. The results alsoshowed that the cognitive process categorise had thebigger effect on expanding the participants’knowledge. This was caused because: (i) the sub-sumption hierarchical relationship (rdfs:subclassOf) was used to create the narrative scriptsbetween entities of the generated exploration paths,which helped the users to categorise new entities atdifferent levels of abstraction at their cognitive struc-tures; and (ii) the subsumption process usesknowledge anchors to subsume and learn new sub-categories similar to the way a human mind workswhile learning a new concept.

H3: The results showed that participants foundthe generated exploration paths to be more enjoyableand less confusing than free exploration paths, andtheir assessment of performance was higher. Oneparticipant founded his experience with hierarchicalnarrative scrips to be boring; and suggested that thesystem should diversify the types of narratives usedbetween entities in the exploration path. For example,to use other relationships, other than the subsumptionrelationship, for generating transition narratives.

9.2.3. Applicability of Evaluation ApproachInstruments used in the evaluation of the explora-

tion paths can be applied in other exploratory searchevaluation studies.

To measure knowledge utility of an explorationpath, the user knowledge is assessed before and afterexploration using Bloom’s cognitive processes ofremember, categorise and compare. These were ex-tracted from the first two cognitive categories inBloom’s taxonomy (namely remember and under-stand) which are directly related to exploration ac-tivities. In other evaluation contexts, more complexcognitive categories can be applied such as the cogni-tive category analyse. This category includes severalcognitive processes, such as differentiate (e.g. differ-

entiate between two entities in the data graph) andarrange (e.g. arrange entities in the data graph fromabstract to specific), which can be related to explora-tory search tasks.

We evaluated the subsumption algorithm for gen-erating exploration paths against free exploration byadopting a task-based approach. It involved twosteps: (i) designing a task template and (ii) identify-ing unfamiliar entities in the domain to be pluggedinto the task template. The task template presented inSection 8 is in the context of a general knowledgequiz that encourages users to seek knowledge in agiven domain represented as a data graph. The tem-plate can easily be adapted for a range of data graphexploration tasks. This will require identifying unfa-miliar entities to include in the template. We did thisbased on a small survey with participants to identifydomain entities (at the bottom quartile of the datagraph) which are likely to be unfamiliar to laymanuser. At a larger scale, crowdsourcing can be used toidentify unfamiliar entities in the data graph.

10. Conclusion and Future Work

Exploration of data to carry out an open-endedtask is becoming a key daily life activity. It usuallyinvolves a journey through large datasets or searchsystems that starts with an entry point, often an initialquery, and then exploring a large amount of datawhile constantly making decisions about which datato explore next. Users who are unfamiliar with thedomain they are exploring can face high cognitiveload and usability challenges when exploring suchlarge amount of data. Our work investigates how tosupport such users’ exploration through a data graphin a way that leads to expansion of the user’s domainknowledge. We introduced a novel exploration sup-port mechanism underpinned by the subsumptiontheory of meaningful learning, which postulates thatnew knowledge is grasped by starting from familiarconcepts in the graph which serve as knowledge an-chors from where links to new knowledge are made.

The KADG algorithms can have several applica-tions. In data graph exploration, KADG enables oper-ationalising the subsumption theory for meaningfullearning [21] to generate exploration paths forknowledge expansion. Furthermore, our approach foridentifying KADG can be applied to ontology summa-risation [53] where KADG allow capturing a laymanview of the domain. KADG approach can also be ap-plied to solve the ‘cold start’ problem in personaliza-

tion and adaptation [98]. One of the popular choicesfor addressing the cold start problem is a dialoguesystem with the user. The data graphs can provide alarge knowledge pool to implement such probingdialogues, however, one needs to select entities fromthe vast amount of possibilities for probing to avoidtoo long interactions with the user. Knowledge an-chors can be used as the starting entities for the prob-ing dialogues [99].

The subsumption algorithm is generic and can beapplied to generate transition narratives for explora-tory search in other domains. Further extensionwould be to maintain a user model and personalisethe path to the individual’s domain knowledge. An-other future direction would be to add a path diversi-fication strategy to suggest interesting concepts thatare more likely to attract people to engage in the ex-ploration, and hence learn more about the domain.

Acknowledgements. This research was supported bya Doctoral Grant from the University of Jordan. TheMusicPinta semantic data browser was developed bythe EU/FP7 project Dicode. We are grateful to allparticipants in the experimental studies.

References

[1] E. Palagi, F. Gandon, A. Giboin, R. Troncy, I.S. Antipolis,F. Gandon, I.S. Antipolis, A. Giboin, and I.S. Antipolis, Asurvey of definitions and models of exploratory search, in:ACM Work. Explor. Search Interact. Data Anal. -ESIDA ’17, 2017: pp. 3–8. doi:10.1145/3038462.3038465.

[2] R.W.W. and R.A. Roth, Exploratory Search Beyond theQuery–Response Paradigm, Morgan & Claypool, 2009.doi:10.1287/orsc.1110.0676.

[3] N. Belkin, Anomalous States of Knowledge as the Basis ofInformation Retrieval., Can. J. Inf. Sci. 5 (1980) 133–143.

[4] G. Marchionini, Exploratory search: from finding tounderstanding, in: Commun. ACM, 2006: p. 41.doi:10.1145/1121949.1121979.

[5] T. Berners-lee, Y. Chen, L. Chilton, D. Connolly, R.Dhanaraj, J. Hollenbach, A. Lerer, and D. Sheets,Tabulator : Exploring and Analyzing linked data on the Semantic Web, in: L. Rutledge, M C Schraefel, A.Bernstein, and D. Degler (Eds.), 3rd Int. Semant. Web UserInteract. Work., Citeseer, 2006.http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.97.950&rep=rep1&type=pdf.

[6] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R.Cyganiak, and S. Hellamnn, DBpedia - A cystallizationpoint for the Web of Data, Web Semant. Sci. Serv. AgentsWorld Wide Web. 7 (2009) 154–165.

[7] G. Cheng, Y. Zhang, and Y. Qu, Explass: ExploringAssociations between Entities via Top-K OntologicalPatterns and Facets, in: ISWC ’13, 2014: pp. 422–437.doi:10.1007/978-3-319-11915-1_27.

[8] G.C. and Y. Qu, Searching linked objects with falcons:Approach, implementation and evaluation, Int. J. Semant.

Web Inf. Syst. 5 (2009) 49–70.[9] L. Zheng, Y. Qu, J. Jiang, and G. Cheng, Facilitating entity

navigation through top-K link patterns, in: Proc. Int. Semant.Web Conf., 2015: pp. 163–179. doi:10.1007/978-3-319-25007-6_10.

[10] M. Sah, and V. Wade, Personalized concept-based searchon the Linked Open Data, Web Semant. Sci. Serv. AgentsWorld Wide Web. 36 (2016) 32–57.doi:10.1016/j.websem.2015.11.004.

[11] L. Zheng, Y. Qu, and G. Cheng, Leveraging link pattern forentity-centric exploration over Linked Data, in: Proc.WWW, World Wide Web, 2017. doi:10.1007/s11280-017-0464-y.

[12] R. Pienta, G. Tech, J. Vreeken, G. Tech, and J. Abello,FACETS : Adaptive Local Exploration of Large Graphs, in: IEEE SDM’17, 2017.

[13] S. Dietze, and E. Kaldoudi, Socio-semantic Integration ofEducational Resources - the Case of the mEducator Project,Univers. Comput. Sci. 19 (2013) 1543–1569.

[14] P. Vakkari, Searching as learning: A systematization basedon literature, J. Inf. Sci. 42 (2016) 7–18.doi:10.1177/0165551515615833.

[15] J. Gwizdka, P. Hansen, C. Hauff, J. He, and N. Kando,Search As Learning (SAL) Workshop 2016, in: Proc. 39thInt. ACM SIGIR Conf. Res. Dev. Inf. Retr., ACM, NewYork, NY, USA, 2016: pp. 1249–1250.doi:10.1145/2911451.2917766.

[16] L. Koesten, E. Kacprzak, and J. Tennison, Learning WhenSearching for Web Data., in: Sal@SigirSearch as Learn.,2016: pp. 2–3. http://dblp.uni-trier.de/db/conf/sigir/sal2016.html#KoestenKT16.

[17] V. Dimitrova, L. Lau, D. Thakker, F. Yang-Turner, and D.Despotakis, Exploring exploratory search: a user study withlinked semantic data, Proc. 2nd Int. Work. Intell. Explor.Semant. Data - IESD ’13. (2013) 1–8. doi:978-1-4503-2006-1.

[18] D. Thakker, V. Dimitrova, L. Lau, F. Yang-Turner, and D.Despotakis, Assisting user browsing over linked data:Requirements elicitation with a user study, in: ICWE’13 Int.Conf. Web Eng., 2013: pp. 376–383. doi:10.1007/978-3-642-39200-9_31.

[19] V. Maccatrozzo, Burst the filter bubble: using semantic webto enable serendipity, in: ISWC ’11, 2012. doi:10.1007/978-3-642-35173-0.

[20] M. Al-Tawil, D. Thakker, and V. Dimitrova, Nudging toExpand User ’ s Domain Knowledge while ExploringLinked Data, in: IESD@ISWC, 2014.

[21] D.P. Ausubel, A subsumption theory of meaningful verballearning and retention, J. Gen. Psychol. 66 (1962) 213–224.doi:10.1080/00221309.1962.9711837.

[22] E. Rosch, C.B. Mervis, W.D. Gray, D.M. Johnson, andP.Boyes-Braem, Basic Objects in Neutral categories, Cogn.Psychol. 8 (1976) 382–439.

[23] S. Mazumdar, D. Petrelli, K. Elbedweihy, V. Lanfranchi,and F. Ciravegna, Affective graphs: The visual appeal ofLinked Data, Semant. Web. 6 (2015) 277–312.doi:10.3233/SW-140162.

[24] B. Fu, N.F. Noy, and M. Storey, Eye Tracking the UserExperience - An Evaluation of Ontology VisualizationTechniques, Semant. Web J. 8 (2017) 23–41.http://www.semantic-web-journal.net/system/files/swj770.pdf.

[25] A.S. Dadzie, and E. Pietriga, Visualisation of Linked Data -Reprise, Semant. Web. 8 (2017) 1–21. doi:10.3233/SW-160249.

[26] M. Javed, S. Payette, J. Blake, and T. Worrall, VIZ –

VIVO : Towards Visualizations-driven Linked Data Navigation, in: VOILA@ISWC, 2016.

[27] I.O. Popov, M.C. Schraefel, W. Hall, and N. Shadbolt,Connecting the Dots: A Multi-pivot Approach to DataExploration, in: ISWC’10, 2011: pp. 553–568.doi:http://dx.doi.org/10.1007/978-3-642-25073-6_35.

[28] S. Araújo, D. Schwabe, and S.D.J. Barbosa, Experimentingwith Explorator: A direct manipulation generic RDFbrowser and querying tool, in: VISSW, 2009.

[29] D. Huynh, and D. Karger, Parallax and companion: Set-based browsing for the data web, WWW Conf. (2009) 2005–2008. http://davidhuynh.net/media/papers/2009/www2009-parallax.pdf.

[30] G. Vega-Gorgojo, M. Giese, S. Heggestøyl, A. Soylu, and A.Waaler, PepeSearch: semantic data for the masses, PLoSOne. 11 (2016).

[31] B. Shneiderman, The eyes have it: a task by data typetaxonomy for information visualizations, in: Proc. 1996IEEE Symp. Vis. Lang., IEEE, 1996: pp. 336--343.http://reference.kfupm.edu.sa/content/e/y/the_eyes_have_it__a_task_by_data_type_ta_43453.pdf.

[32] A. Valsecchi, M. Abrate, C. Bacciu, M. Tesconi, and A.Marchetti, Linked Data Maps: Providing a Visual EntryPoint for the Exploration of Datasets, IESD@ISWC. 1472(2015).

[33] P.R. Smart, A. Russell, D. Braines, Y. Kalfoglou, J. Bao,and N. Shadbolt, A Visual Approach to Semantic QueryDesign Using a Web-Based Graphical Query Designer, in:Proc. 16th Int. Conf. Knowl. Eng. Knowl. Manag., 2008: pp.275–291. doi:10.1007/978-3-540-87696-0_25.

[34] M. Zviedris, and G. Barzdins, Viziquer: A tool to exploreand query SPARQL endpoints, in: Proc. 8th ESWC, 2011:pp. 441–445. doi:10.1007/978-3-642-21064-8.

[35] W. Javed, S. Ghani, and N. Elmqvist, PolyZoom : Multiscale and Multifocus Exploration in 2D Visual Spaces,in: CHI’12, ACM, 2012.http://dl.acm.org/citation.cfm?id=2207716.

[36] S. Scheider, A. Degbelo, R. Lemmens, C. Van Elzakker, P.Zimmerhof, N. Kostic, J. Jones, and G. Banhatti,Exploratory querying of SPARQL endpoints in space andtime, Semant. Web. 8 (2017) 65–86. doi:10.3233/SW-150211.

[37] A.G. Nuzzolese, V. Presutti, A. Gangemi, S. Peroni, and P.Ciancarini, Aemoo: Linked Data exploration based onKnowledge Patterns, Semant. Web. 8 (2017) 87–112.doi:10.3233/SW-160222.

[38] A. Harth, Visinav: Visual web data search and navigation,in: DEXA, 2009: pp. 214–228.

[39] P. Heim, T. Ertl, and J. Ziegler, Facet graphs: complexsemantic querying made easy, in: L. Aroyo, G. Antoniou, E.Hyvönen, A. Teije, H. Stuckenschmidt, L. Cabral, and T.Tudorache (Eds.), ESWC’7th, Springer Berlin Heidelberg,Berlin, Heidelberg, 2010: pp. 288–302.http://dl.acm.org/citation.cfm?id=2176871.2176894(accessed February 26, 2013).

[40] P. Heim, J. Ziegler, and S. Lohmann, GFacet: A browser forthe web of data, in: IMC-SSW’08, 2008: pp. 49–58.doi:10.1.1.143.1026.

[41] S. Brunk, and P. Heim, Tfacet: Hierarchical facetedexploration of semantic data using well-known interactionconcepts, in: DCI@ INTERACT, 2011: pp. 31–36.

[42] J.M. Brunetti, R. García, and S. Auer, From Overview toFacets and Pivoting for Interactive Exploration of SemanticWeb Data, Int. J. Semant. Web Inf. Syst. 9 (2013) 1–20.http://www.igi-global.com/article/overview-facets-pivoting-interactive-exploration/77822 (accessed January 13, 2014).

[43] C. Stadler, M. Martin, and S. Auer, Exploring the web ofspatial data with facete, Proc. 23rd Int. Conf. World WideWeb - WWW ’14 Companion. (2014) 175–178.doi:10.1145/2567948.2577022.

[44] P. Papadakos, and Y. Tzitzikas, Hippalus: Preference-enriched faceted exploration, in: EDBT@ICDT, 2014.

[45] K. Wongsuphasawat, D. Moritz, A. Anand, J. Mackinlay, B.Howe, and J. Heer, Voyager: Exploratory Analysis viaFaceted Browsing of Visualization Recommendations,IEEE Trans. Vis. Comput. Graph. 22 (2016) 649–658.doi:10.1109/TVCG.2015.2467191.

[46] N. Bikakis, G. Papastefanatos, M. Skourla, and T. Sellis, AHierarchical Framework for Efficient Multilevel VisualExploration and Analysis, SWJ. 8 (2017) 139–179.http://arxiv.org/abs/1511.04750.

[47] M. Sah, and V. Wade, Personalized Concept-Based Searchand Exploration on the Web of Data Using ResultsCategorization, in: Proc. Ext. Semant. Web Conf., 2013: pp.532–547.

[48] O. Rossel, Implemention of a “ search and browse ”scenario for the LinkedData, in: IESD@ISWC, 2014.

[49] L. De Vocht, A. Dimou, J. Breuer, M. Van Compernolle, R.Verborgh, E. Mannens, P. Mechant, and R. Van De Walle,A visual exploration workflow as enabler for theexploitation of linked open data, in: IESD@ISWC, 2014.

[50] M. Dojchinovski, and T. Vitvar, Personalised Access toLinked Data, in: Knowl. Eng. Knowl. Manag., 2014: pp.121–136. doi:10.1007/978-3-319-13704-9_10.

[51] C. Musto, P. Lops, P. Basile, M. de Gemmis, and G.Semeraro, Semantics-aware Graph-based RecommenderSystems Exploiting Linked Open Data, in: UMAP ’16,2016: pp. 229–237. doi:10.1145/2930238.2930249.

[52] F. Bianchi, M. Palmonari, M. Cremaschi, and E. Fersini,Actively Learning to Rank Semantic Associations forPersonalized Contextual Exploration of Knowledge Graphs,in: Proc. 14th Int. Conf. ESWC 2017, 2017: pp. 120–135.

[53] X. Zhang, G. Cheng, and Y. Qu, Ontology summarizationbased on rdf sentence graph, in: Proc. 16th Int. Conf. WorldWide Web WWW 07, ACM Press, 2007.http://portal.acm.org/citation.cfm?doid=1242572.1242668.

[54] R. Wille, Formal Concept Analysis as Mathematical Theoryof Concepts and Concept Hierarchies, Form. Concept Anal.(2005) 1–33. doi:10.1007/11528784_1.

[55] N. Li, and E. Motta, Evaluations of user-driven ontologysummarization, Lect. Notes Comput. Sci. (Including Subser.Lect. Notes Artif. Intell. Lect. Notes Bioinformatics). 6317(2010) 544–553. doi:10.1007/978-3-642-16438-5.

[56] N. Li, and E. Motta, Evaluations of user-driven ontologysummarization, EKAW 17th Int. Conf. Knowl. Eng. Manag.by Masses. 6317 (2011) 544–553. doi:10.1007/978-3-642-16438-5.

[57] K. Wang, Z. Wang, R. Topor, J.Z. Pan, and G. Antoniou,Eliminating concepts and roles from ontologies inexpressive descriptive logics, Comput. Intell. 30 (2014)205–232. doi:10.1111/j.1467-8640.2012.00442.x.

[58] G. Troullinou, H. Kondylakis, E. Daskalaki, D. Plexousakis,F. Gandon, M. Sabou, and H. Sack, Ontology understandingwithout tears: The-summarization approach, Semant. Web. 8(2017) 797–815. doi:10.3233/SW-170264.

[59] G. Troullinou, H. Kondylakis, E. Daskalaki, and D.Plexousakis, RDF Digest : Efficient Summarization of RDF / S KBs, in: Gandon F., Sabou M., Sack H., d’Amato C.,Cudré-Mauroux P., Zimmermann A. Semant. Web. LatestAdv. New Domains. ESWC 2015. Lect. Notes Comput. Sci.Vol 9088. Springer, Cham, 2015. doi:10.1007/978-3-319-18818-8.

[60] S. Peroni, E. Motta, and M. Aquin, Identifying key conceptsin an ontology , through the integration of cognitiveprinciples with statistical and topological measures, in:ASWC ’08, 2008.

[61] Y. Cai, W.H. Chen, H.F. Leung, Q. Li, H. Xie, R.Y.K. Lau,H. Min, and F.L. Wang, Context-aware ontologiesgeneration with basic level concepts from collaborative tags,Neurocomputing. 208 (2016) 25–38.doi:10.1016/j.neucom.2016.02.070.

[62] E. Rosch, and B.B. Lloyd, Cognition and Categorization,Lloydia Cincinnati. pp (1978) 27–48.http://www.worldcat.org/title/cognition-and-categorization/oclc/3844382&referer=brief_results.

[63] J. Poelmans, D.I. Ignatov, S.O. Kuznetsov, and G. Dedene,Formal concept analysis in knowledge processing: A surveyon applications, Expert Syst. Appl. 40 (2013) 6538–6560.doi:10.1016/j.eswa.2013.05.009.

[64] P. Cimiano, A. Hotho, and S. Staab, Learning ConceptHierarchies from Text Corpora Using Formal ConceptAnalysis, J. Artif. Int. Res. 24 (2005) 305–339.http://dl.acm.org/citation.cfm?id=1622519.1622528.

[65] W.C. Cho, and D. Richards, Improvement of precision andrecall for information retrieval in a narrow domain: Reuseof concepts by formal concept analysis, in: Proc. -IEEE/WIC/ACM Int. Conf. Web Intell. WI 2004, 2004: pp.370–376. doi:10.1109/WI.2004.10082.

[66] D. Richards, Ad-Hoc and Personal Ontologies: APrototyping Approach to Ontology Engineering, in: PKAW,2006: pp. 13–24.

[67] B. Sertkaya, OntoComP: A Protégé Plugin for CompletingOWL Ontologies, in: L. Aroyo, P. Traverso, F. Ciravegna, P.Cimiano, T. Heath, E. Hyvönen, R. Mizoguchi, E. Oren, M.Sabou, and E. Simperl (Eds.), Semant. Web Res. Appl. 6thEur. Semant. Web Conf. ESWC 2009 Heraklion, Crete,Greece, May 31--June 4, 2009 Proc., Springer BerlinHeidelberg, Berlin, Heidelberg, 2009: pp. 898–902.doi:10.1007/978-3-642-02121-3_78.

[68] R. Belohlavek, and M. Trnecka, Basic level of concepts informal concept analysis, ICFCA’10. 7278 LNAI (2012) 28–44. doi:10.1007/978-3-642-29892-9_9.

[69] R. Belohlavek, and M. Trnecka, Basic level in formalconcept analysis: Interesting concepts and psychologicalramifications, IJCAI Int. Jt. Conf. Artif. Intell. (2013) 1233–1239.

[70] A. Bonifati, R. Ciucanu, and A. Lemay, Learning PathQueries on Graph Databases, in: Proc. 18th Int. Conf.Extending Database Technol., 2015.doi:10.5441/002/edbt.2015.11.

[71] K. Anyanwu, and A. Sheth, ρ-Queries Enabling Querying for Semantic Associationson the Semantic Web.pdf, in:Proc. 12th Int. Conf. World Wide Web, 2003: pp. 690–699.doi:10.1145/775152.775249.

[72] P. Heim, S. Hellmann, J. Lehmann, and S. Lohmann,RelFinder : Revealing Relationships in RDF Knowledge Bases, (n.d.).

[73] R. García, R. Gil, J.M. Gimeno, E. Bakke, and D.R. Karger,BESDUI: A benchmark for end-user structured data userinterfaces, in: ISWC ’16th, 2016: pp. 65–79.doi:10.1007/978-3-319-46547-0_8.

[74] D. Lamprecht, M. Strohmaier, D. Helic, C. Nyulas, T.Tudorache, N.F. Noy, and M.A. Musen, Using ontologies tomodel human navigation behavior in information networks:A study based on Wikipedia, Semant. Web. 6 (2015) 403–422. doi:10.3233/SW-140143.

[75] D.P. Ausubel, and D. Fitzgerald, ANTECEDENTLEARNING VARIABLES IN SEQUENTIAL VERBAL

LEARNING, (1962).[76] D.P. Ausubel, The use of advance organizers in the learning

and retention of meaningful verbal material, J. Educ.Psychol. 51 (1960) 267–272. doi:10.1037/h0046669.

[77] D.P. Ausubel, Cognitive structure and the facilitation ofmeaningful verbal learning, J. Teach. Educ. 14 (1963) 217–222. doi:10.1177/002248716301400220.

[78] D.P. Ausubel, J.D. Novak, and Helen Hanesian, EducationPsychology: A cognitive View, 1968.

[79] D.P. Ausubel, In Defense of Advance Organizers: A Replyto the Critics*, Rev. Educ. Res. Rev. Educ. Res. Spring. 48(1978) 251–257. doi:10.3102/00346543048002251.

[80] C. Bizer, T. Heath, and T. Berners-Lee, Linked data-thestory so far, Int. J. Semant. Web Inf. Syst. (2009).doi:10.4018/jswis.2009081901.

[81] M. Al-Tawil, V. Dimitrova, D. Thakker, and B. Bennett,Identifying knowledge anchors in a data graph, in: HT 2016- Proc. 27th ACM Conf. Hypertext Soc. Media, 2016.doi:10.1145/2914586.2914637.

[82] R. Wille, RESTRUCTURING LATTICE THEORY: ANAPPROACH BASED ON HIERARCHIES OFCONCEPTS, in: S. Ferré, and S. Rudolph (Eds.), Form.Concept Anal. 7th Int. Conf. ICFCA 2009 Darmstadt, Ger.May 21-24, 2009 Proc., Springer Berlin Heidelberg, Berlin,Heidelberg, 2009: pp. 314–339. doi:10.1007/978-3-642-01815-2_23.

[83] G. V Jones, Identifying Basic Categories, Psychol. Bull. 94(1983) 423–428. doi:10.1037//0033-2909.94.3.423.

[84] M.A. Corter, James E.; Gluck, Explaining Basic Categories:Feature Predictability and Information, (1992).

[85] C.F. Palmer, R.K. Jones, B.L. Hennessy, M.G. Unze, andA.D. Pick, How Is a Trumpet Known? The “Basic ObjectLevel” Concept and Perception of Musical Instruments, Am.J. Psychol. 102 (1989).

[86] W.N.F. Henry Kucera, Computational Analysis of Present-Day American English, Am. Doc. (1968).

[87] M. Al-Tawil, Intelligent Support for Exploration of DataGraphs, University of Leeds. Submitted (defence inDecember), 2017.

[88] Z. Wu, and M. Palmer, Verbs semantics and lexicalselection, Proc. 32nd Annu. Meet. Assoc. Comput. Linguist.-. (1994) 133–138. doi:10.3115/981732.981751.

[89] B. Kules, R. Capra, M. Banta, and T. Sierra, What DoExploratory Searchers Look at in a Faceted Search

Interface?, JCDL ’09 Proc. 9th ACM/IEEE-CS Jt. Conf.Digit. Libr. . (2009) 313–322.doi:10.1145/1555400.1555452.

[90] T. Nunes, and D. Schwabe, Frameworks of InformationExploration - Towards the Evaluation of ExplorationSystems Conceptual View of an Exploration Framework, in:IESD@ISWC, 2016.

[91] J.W. Tanaka, and M. Taylor, Object categories andexpertise: Is the basic-level in the eye of the beholder?,Cogn. Psychol. 23 (1991) 457–482. doi:10.1016/0010-0285(91)90016-H.

[92] D.R. Krathwohl, A Revision of Bloom’s Taxonomy : An Overview, Theory Pract. 41 (2002).

[93] L. Freund, S. Dodson, and R. Kopak, On measuringlearning in search: A position paper, in: Search as Learn.Work., 2016: pp. 1–2.

[94] S.C. Carr, and B. Thompson, The effects of priorknowledge and schema activation strategies on theinferential reading comprehension of children with andwithout learning disabilities, Learn. Disabil. Q. 19 (1996)48–61. doi:10.2307/1511053.

[95] S.G. Hart, and L.E. Staveland, Development of NASA-TLX(Task Load Index): Results of Empirical and TheoreticalResearch, Adv. Psychol. 52 (1988) 139–183.doi:10.1016/S0166-4115(08)62386-9.

[96] Sandra G. HartLowell E. Staveland, Development ofNASA-TLX (Task Load Index): Results of Empirical andTheoretical Research, Adv. Psychol. 52 (1988) 139–183.

[97] M. Al-Tawil, V. Dimitrova, D. Thakker, and AlexandraPoulovassilis, Evaluating Knowledge Anchors in DataGraphs against Basic Level Objects, in: ICWE’17 Int. Conf.Web Eng., 2017.

[98] A.I. Schein, A. Popescul, L.H. Ungar, and D.M. Pennock,Methods and metrics for cold-start recommendations, in:Proc. 25th Annu. Int. ACM SIGIR Conf. Res. Dev. Inf.Retrieval, SIGIR ’02, 2002: p. 253.doi:10.1145/564376.564421.

[99] M. Al-Tawil, V. Dimitrova, and D. Thakker, Using BasicLevel Concepts in a Linked Data Graph to Detect User’sDomain Familiarity, in: UMAP, Dublin, Ireland, 2015.

[100] G.A. Miller, The magical number seven, plus or minus two:some limits on our capacity for processing information,Psychol. Rev. 63 (1956).

Documents

Using Knowledge Anchors to Facilitate User Exploration of ... · Using Knowledge Anchors to Facilitate User Exploration of Data Graphs Editor(s): Name Surname, University, Country