Semantic Need Guiding Metadata Annotations by Questions People #ask Hans-Jörg Happel, FZI Karlsruhe, Germany 2010-11-09 @ 9th Int. Semantic Web Conference (ISWC 2010), Shanghai, China

Semantic Need: Guiding Metadata Annotations by Questions People #ask


DESCRIPTION

At its core, the Semantic Web is about the creation, collection and interlinking of metadata on which agents can perform tasks for human users. While many tools and approaches support either the creation or the usage of semantic metadata, there is neither a proper notion of metadata need nor a related theory of guidance on which metadata should be created. In this paper, we propose analyzing structured queries to help identify missing metadata. We conduct a study on Semantic MediaWiki (SMW), one of the most popular Semantic Web applications to date, analyzing structured "ask" queries in public SMW instances. Based on that, we describe Semantic Need, an extension for SMW which guides contributors to provide semantic annotations, and summarize feedback from an online survey among 30 experienced SMW users.

Citation preview

Page 1: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Need: Guiding Metadata Annotations by Questions People #ask

Hans-Jörg Happel, FZI Karlsruhe, Germany
2010-11-09 @ 9th Int. Semantic Web Conference (ISWC 2010), Shanghai, China

Page 2: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Agenda
• Introduction
• Semantic Gap Heuristics
• Semantic MediaWiki Study
• Extension:Semantic Need
• Summary & Outlook

Semantic Need: Guiding Metadata Annotations by Questions People #ask - ISWC 2010; Shanghai, China 2

Page 3: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic MediaWiki (SMW)
• SMW is a popular Semantic Web application that allows users to annotate wiki pages semantically
• Semantic interpretation of the existing wiki categories
• Syntax extension for [[Wiki links]]
– Relations to other pages: [[Capital::Abuja]]
– Literals: [[Inhabitants::182418]]
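To make the annotation syntax concrete, the following is a minimal Python sketch that pulls `[[Property::Value]]` pairs out of a page's wikitext. It is an illustration only, not how SMW itself parses pages; the function name `extract_annotations` and the sample text are assumptions made for this example.

```python
import re

# Matches [[Property::Value]] (optionally with |alternate text).
# Plain [[Wiki links]] and [[Category:X]] contain no "::" and are skipped.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_annotations(wikitext):
    """Return the (property, value) pairs found in a page's wikitext."""
    return [(m.group(1).strip(), m.group(2).strip())
            for m in ANNOTATION.finditer(wikitext)]


# Hypothetical page text, built from the two annotations on the slide.
page = "See [[Abuja]]. Capital: [[Capital::Abuja]]. Inhabitants: [[Inhabitants::182418]]."
print(extract_annotations(page))
```

The plain link `[[Abuja]]` is ignored because it carries no property name; only the two semantic annotations are returned.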

Page 4: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Structured Queries in SMW
• SMW also allows for structured queries

{{#ask: [[Category:Country]] [[OnContinent::Africa]] |?area |?...}}

SMW resembles the Semantic Web in miniature

Page 5: Semantic Need: Guiding Metadata Annotations by Questions People #ask

SMW Query Result

{{#ask: [[Category:Country]] [[OnContinent::Africa]] |?area |?...}}

???

…?

Page 6: Semantic Need: Guiding Metadata Annotations by Questions People #ask

What happened to „Nigeria“?
• Info might be missing
• …not annotated properly
• Different property name
• Distributed data source not available

Page 7: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Gaps
• Observation:
– „Semantic gap between supply and demand on the Semantic Web” [Mik09]
– Due to the evolutionary nature of the (Semantic) Web (open-world assumption, OWA)
• What is missing? I.e.:
– KB: Axioms that are known (e.g. statements about Nigeria)
– XKB: Axioms that are not yet known, but that people would like to know

Page 8: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Towards Semantic Need
• Research questions
– How to identify „Semantic Gaps“?
– Do „Semantic Gaps“ exist?
– If yes, how to close these gaps?
• Research approach
– Propose heuristics
– Explorative: Analyze Public Semantic Web
– Constructive: Design and evaluate tools

Page 9: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Agenda
• Introduction
• Semantic Gap Heuristics
• Semantic MediaWiki Study
• Extension:Semantic Need
• Summary & Outlook

Page 10: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Idea: Guide Annotation by Information Needs
• Means for deriving information needs
– (Structured) queries
– Information access/browsing
– Context
– …?
• We chose to focus on queries
– Explicit; can be captured easily
– Express a „demand“ [Mik09]
– Recur across time and different people (at least in IR! [Smy05, Tee06, Zha09])

Page 11: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Identifying „Semantic Gaps“
• Focus on
– Conjunctive queries
– Retrieving instances and their properties
• Core elements

{{#ask: [[Category:Country]] [[OfContinent::Africa]] |?hasArea |?population |?hasCapital |?Currency}}

– Conditions: [[Category:Country]] [[OfContinent::Africa]]
– Printout statements: ?hasArea, ?population, ?hasCapital, ?Currency
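The two core elements can be separated mechanically. The sketch below, in Python, splits an inline `#ask` query into its conditions and printout statements. It is a simplified illustration that assumes a flat conjunctive query like the one on the slide (no subqueries, no extra parameters such as `format=`); `parse_ask` is a hypothetical helper, not part of SMW.

```python
import re


def parse_ask(query):
    """Split an inline #ask query into (conditions, printouts)."""
    body = query.strip()
    # Drop the surrounding {{#ask: ... }} wrapper.
    body = re.sub(r"^\{\{\s*#ask:\s*", "", body)
    body = re.sub(r"\}\}\s*$", "", body)
    # Conditions are the [[...]] atoms of the conjunctive query.
    conditions = re.findall(r"\[\[([^\[\]]+)\]\]", body)
    # Printout statements are the |?Property parts.
    printouts = [p.strip() for p in re.findall(r"\|\s*\?([^|]+)", body)]
    return conditions, printouts


query = ("{{#ask: [[Category:Country]] [[OfContinent::Africa]] "
         "|?hasArea |?population |?hasCapital |?Currency}}")
conditions, printouts = parse_ask(query)
print(conditions)
print(printouts)
```

Each condition is one atom of the conjunction, which is the unit the heuristics on the following slides operate on.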

Page 12: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Gap Heuristic #1: Near Matches
• Instance I ∈ KB is considered a near match of a query q if:
– I is not in the result set of q in KB
– There is at least one conjunctive query atom of q for which I is part of the result set
– I would be in the result set of q in KB ∪ XKB
• Correspondingly, we consider q to have an incomplete result set if it has „near matches“
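A minimal Python sketch of this heuristic follows. It assumes the knowledge base is modeled as a dictionary from page name to a set of (property, value) facts, with category membership written as ("Category", name); this data model and the toy pages are assumptions for illustration. Only the first two conditions can be checked automatically. Since XKB is unknown by definition, the third has to be confirmed by a person, so the function returns candidates rather than confirmed gaps.

```python
def near_matches(kb, conditions):
    """Pages that satisfy at least one, but not all, atoms of a conjunctive query.

    Returns (page, missing_atoms) pairs: candidates only, because membership
    in the result set over KB ∪ XKB cannot be checked automatically.
    """
    result = []
    for page, facts in kb.items():
        satisfied = [c for c in conditions if c in facts]
        if 0 < len(satisfied) < len(conditions):
            missing = [c for c in conditions if c not in facts]
            result.append((page, missing))
    return result


# Toy knowledge base (hypothetical data).
kb = {
    "Nigeria": {("Category", "Country"), ("OnContinent", "Africa")},
    "Egypt": {("Category", "Country")},   # lacks OnContinent::Africa
    "Abuja": {("Category", "City")},      # satisfies no atom at all
}
conditions = [("Category", "Country"), ("OnContinent", "Africa")]
print(near_matches(kb, conditions))
```

Nigeria is a full match and Abuja matches nothing, so only Egypt is reported, together with the atom it lacks.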

Page 13: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Gap Heuristic #1: Example

{{#ask: [[Category:Country]] [[OnContinent::Africa]] |?area |?...}}

Egypt 1,001,449 km² 83,082,869 Cairo Egyptian pound

„Near Match“: Lacks annotation [[OnContinent::Africa]]

Page 14: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Gap Heuristic #2: Missing Printout Values
• Instance I ∈ KB is considered to have missing printout values for a query q if:
– I is part of the result set of q
– q contains a printout statement x for which no property value of I exists in the KB
• Note: Technically, „missing printout values“ can be considered equivalent to near matches
– SPARQL requires the „OPTIONAL“ modifier to yield missing printout values
– SMW-QL allows printout values to be set as required
• Correspondingly, we consider q to have a sparse result set if it has at least one „missing printout value“
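The second heuristic can be sketched in the same style. The Python example below assumes the same illustrative data model as before (page name mapped to a set of (property, value) facts); the pages and placeholder values are hypothetical.

```python
def missing_printouts(kb, conditions, printouts):
    """For each page in the result set, list printout properties with no value in the KB."""
    gaps = {}
    for page, facts in kb.items():
        # Only pages that satisfy every condition are in the result set.
        if all(c in facts for c in conditions):
            present = {prop for prop, _ in facts}
            missing = [p for p in printouts if p not in present]
            if missing:
                gaps[page] = missing
    return gaps


# Toy knowledge base (hypothetical data, placeholder values).
kb = {
    "Nigeria": {("Category", "Country"), ("OnContinent", "Africa"),
                ("area", "some value"), ("capital", "Abuja")},
    "Kenya": {("Category", "Country"), ("OnContinent", "Africa"),
              ("capital", "some value")},            # no area annotated
    "Egypt": {("Category", "Country")},              # not in the result set
}
conditions = [("Category", "Country"), ("OnContinent", "Africa")]
print(missing_printouts(kb, conditions, ["area", "capital"]))
```

A query is then "sparse" in the sense of the slide whenever this function returns a non-empty dictionary for it.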

Page 15: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Gap Heuristic #2: Example

„Missing Printout Values“

Page 16: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Agenda
• Introduction
• Semantic Gap Heuristics
• Semantic MediaWiki Study
• Extension:Semantic Need
• Summary & Outlook

Page 17: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Public SMW Analysis: Design
• Goal
– Do „Semantic Gaps“ exist?
– Determine the significance of missing result values and near matches in real-world queries
• Crawling public SMW installations
– Collected ~200 public SMW installations via overview lists and search engines
– Selection of 8 SMW instances (filtered for data and technical reasons, plus random choice)
– Those have on average 1880 annotations and 35 inline queries
• Checking for sparse & incomplete query results
– Analyzing 25 (out of 285) queries (only ASK queries, only "Table" output format, only queries with printout statements or conjunctions)
– 17 of these queries were located on Template pages

Page 18: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Public SMW Analysis: Results
• Printout values
– On average, 16% of cells in a result set were empty due to missing annotations (up to 63% for certain queries)
→ Allows for identifying a total of 296 missing printout values
– Validation showed that 13 out of 15 manually investigated empty cells could be considered missing information
• Near matches
– On average, 22% of all potential result pages of a query lack a selective annotation (up to 94% for certain queries)
→ Allows for identifying a total of 147 potentially missing annotations for “selective” properties
– Validation showed that 10 out of 15 manually investigated near matches could be considered missing information
• Note: based on evaluation conditions, only around 9% of the overall inline queries were analyzed
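The percentages implied by these counts can be checked with a few lines of Python. The sketch uses only figures stated on the design and results slides; the helper name `share` is introduced here for illustration.

```python
def share(part, whole):
    """Fraction of `whole` covered by `part`."""
    return part / whole


# 25 of 285 inline queries were analyzed ("around 9%").
print(f"queries analyzed:            {share(25, 285):.1%}")
# Manual validation samples of 15 items each.
print(f"validated empty cells:       {share(13, 15):.1%}")
print(f"validated near matches:      {share(10, 15):.1%}")
```

The first figure rounds to the "around 9%" stated in the note. The validation shares suggest that missing printout values were confirmed more often than near matches, although both samples are small.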

Page 19: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Agenda
• Introduction
• Semantic Gap Heuristics
• Semantic MediaWiki Study
• Extension:Semantic Need
• Summary & Outlook

Page 20: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Extension:Semantic Need: Idea
• Goal:
– How to close „Semantic Gaps“?
– Guide the creation of semantic annotations in SMW
• Design principles
– „Need-driven Knowledge Sharing“ [Hap09b]
– People are willing to contribute missing information if they recognize that there is concrete demand
– Derived from related work and supported by user studies
• Core features
– Capture and store needs (i.e. #ask-queries)
– Guide annotations by extending and modifying the SMW user interface based on information need heuristics (i.e. „near matches“ and „missing printout values“)
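The two core features can be sketched together in Python. This is a hypothetical illustration, not the extension's actual implementation: the class `NeedStore`, its methods, and the data model (facts as (property, value) pairs, queries already split into conditions and printouts) are all assumptions.

```python
from collections import defaultdict


class NeedStore:
    """Hypothetical store of captured #ask queries ("needs")."""

    def __init__(self):
        self.needs = []  # (source_page, conditions, printouts)

    def capture(self, source_page, conditions, printouts):
        """Record a query together with the page it was asked on."""
        self.needs.append((source_page, list(conditions), list(printouts)))

    def hints_for(self, facts):
        """Map suggested annotations for a page to the pages whose queries need them."""
        hints = defaultdict(set)
        props = {prop for prop, _ in facts}
        for source, conditions, printouts in self.needs:
            satisfied = [c for c in conditions if c in facts]
            if not satisfied:
                continue  # the page is unrelated to this query
            if len(satisfied) < len(conditions):
                # Near match: suggest the missing condition atoms.
                for prop, value in conditions:
                    if (prop, value) not in facts:
                        sep = ":" if prop == "Category" else "::"
                        hints[f"[[{prop}{sep}{value}]]"].add(source)
            else:
                # Full match: suggest values for empty printout columns.
                for prop in printouts:
                    if prop not in props:
                        hints[f"[[{prop}::...]]"].add(source)
        return {hint: sorted(sources) for hint, sources in hints.items()}


store = NeedStore()
store.capture("List of African countries",
              [("Category", "Country"), ("OnContinent", "Africa")], ["area"])
print(store.hints_for({("Category", "Country")}))
```

Keeping the source page with each need is what lets a hint point back to where the demand comes from, in line with the design principle that contributors respond to concrete demand.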

Page 21: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Screenshot: In-Page Annotation

Hint

Sources of need

Page 22: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Need Online Survey: Design
• 34 questions on SMW and Semantic Need
• Target group: SMW experts (via mailing list, invitation)
• Data collected in June/July 2010
• 30 complete answers (out of 58)

Page 23: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Need Online Survey: Semantic Need can help
• Problem patterns do occur
– Sparse result set: 12/30 considered problematic
– Incomplete result set: 23/30 considered problematic
  • Stressed in free text
  • Core issue: „invisibility“ of the issue
• Usage of SMW differs
– „Structured“ settings focus on quality
– „Open“ settings focus on guidance
– Semantic Need generally considered helpful by both groups

Page 24: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Maintenance practices: mostly ad hoc
• Methods & tools used to maintain semantic data
– (7: n.a.; due to given external data model)
– 12: none
– 5: „simple“
– 7: „advanced“ (e.g. scripts, documentation, team decisions)
• How to find missing annotations for a given page
– 6: Compare similar pages („extensional“)
– 7: Check schema („intensional“)
– 4: Text analysis
– 10: Use query

Page 25: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Agenda
• Introduction
• Semantic Gap Heuristics
• Semantic MediaWiki Study
• Extension:Semantic Need
• Summary & Outlook

Page 26: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Insights
• „Semantic Gaps“ do exist
– Information needs are a valuable source to find them
– „Missing printout values“ and „near matches“ seem to be useful heuristics
– Especially „incomplete result sets“ are considered problematic
• No systematic guidance & gardening of SMW knowledge bases

Page 27: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Design Implications
• Semantic Annotation
– Issue: Costly, often driven by a pre-defined ontology structure
– Idea: Consider “incentives for annotation” [Han05]
• Semantic Search
– Issue: Decoupling of provision & access
– Idea: Consider information needs
  • Need specification/ontology
  • Maintain semantic query logs
• Data Quality/Gardening/Maturing
– Issue: The Semantic Web evolves continuously
– Idea: Allow for better data quality modeling

Page 28: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Summary & Outlook
• Main contributions
– How to identify „Semantic Gaps“? → Heuristics based on queries
– Do „Semantic Gaps“ exist? → Yes
– If yes, how to close these gaps? → Semantic Need
• Next steps
– Large-scale analysis of „Semantic Gaps“ (more public SMW instances)
– Provide a stable implementation and gather feedback from field usage of Semantic Need
• Further research opportunities
– Use needs to guide the sharing of semantic annotations
– Use needs to create schema-level mappings or for class/property evolution
– Many more (semantic query logs, UI, incentives, …)

Page 29: Semantic Need: Guiding Metadata Annotations by Questions People #ask

References
• Extension:Semantic Need
– http://amazonas.fzi.de/semanticneed/ (Demo Wiki)
– http://www.mediawiki.org/wiki/Extension:Semantic_Need
• Extension:Woogle4MediaWiki (for non-SMW wikis)
– http://amazonas.fzi.de/wooglenative/ (Demo Wiki)
– http://www.mediawiki.org/wiki/Extension:Woogle4MediaWiki
• Literature
– [Han05] Handschuh, Siegfried: Creating ontology-based metadata by annotation for the semantic web. Dissertation, 2005.
– [Hap09b] Happel, Hans-Jörg: Towards Need-driven Knowledge Sharing in Distributed Teams. In: Proceedings of the 9th International Conference on Knowledge Management (I-KNOW 2009).
– [Hap09c] Happel, Hans-Jörg: Social Search and Need-driven Knowledge Sharing in Wikis with Woogle. In: Proceedings of the 5th International Symposium on Wikis and Open Collaboration (WikiSym '09), Orlando, Florida, October 25-27, 2009. New York, NY, USA: ACM, pp. 1–10.
– [Mik09] Mika, P.; Meij, E.; Zaragoza, H.: Investigating the semantic gap through query log analysis. In: International Semantic Web Conference. Lecture Notes in Computer Science, vol. 5823. Springer, 2009, pp. 441–455.
– [Smy05] Smyth, Barry; Balfe, Evelyn; Freyne, Jill; Briggs, Peter; Coyle, Maurice; Boydell, Oisin: Exploiting Query Repetition and Regularity in an Adaptive Community-Based Web Search Engine. In: User Modeling and User-Adapted Interaction 14 (2005), no. 5, pp. 383–423.
– [Tee06] Teevan, Jaime; Adar, Eytan; Jones, Rosie; Potts, Michael: History repeats itself: repeat queries in Yahoo’s logs. In: SIGIR ’06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, 2006, pp. 703–704.
– [Zha09] Zhang, Dell; Lu, Jinsong: What queries are likely to recur in web search? In: SIGIR ’09: Proceedings of the 32nd annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, 2009, pp. 827–828.

Page 30: Semantic Need: Guiding Metadata Annotations by Questions People #ask

The Semantic Web: Problems
• Lack of resources: you might not annotate everything
– Metadata creation is costly
– Access to metadata might be restricted to different spheres of sharing (private, friends, world…)
– “…probably the most important [open question] for the Semantic Web. How to create incentives for annotation?” (Handschuh 2005, p. 198) [Han05]
• Lack of guidance: you might annotate the wrong things
– „Semantic gap between supply and demand on the Semantic Web” [Mik09]
– The two processes of metadata creation and metadata use are decoupled concerning time and actors
– Existing annotation approaches drive the annotation process by the pre-defined ontology structure
→ No unified theory of why metadata is created and how it is shared
– The Semantic Web vision does not address the creator side of metadata: it spends a lot of effort on convincing people to use the Semantic Web, but not to contribute to it