Fariz Darari
Supervisors: Werner Nutt, Sebastian Rudolph
Managing and Consuming Completeness Information
for RDF Data Sources
Why completeness information?
Though generally incomplete, parts of data on the Web are indeed complete!
Completeness information lets us know exactly which parts are complete…
Real-world RDF data sources need a large number of completeness
statements, resulting in long reasoning time.
Data-agnostic reasoning optimization
CORNER
• Generic data source
• Supports data-agnostic reasoning
• Highlights: RDFS extension, federated extension
COOL-WD
• Wikidata-specific
• Supports data-aware reasoning
• Highlights: built-in on Wikidata, completeness analytics, query diagnostics
Complete for all Apollo 11 crew:
Compl(apollo11,crew,?crew)
Give me people who are NOT Apollo 11 crew:
SELECT * WHERE { ?person isA person .
FILTER NOT EXISTS { apollo11 crew ?person } }
Is this query Qneg sound?
Give me the children of Apollo 11 crew:
SELECT * WHERE { apollo11 crew ?crew .
?crew child ?child }
Is this query Qpos complete?*
*Suppose Wikidata is also complete for all children of Neil, Buzz, and Michael
Let’s manage and consume completeness information!
Data-aware completeness reasoning
Darari et al. (ISWC’13) formalized data-agnostic completeness reasoning.
Abstracting from the data graph results in weaker inferences: e.g., data-agnostic reasoning fails to guarantee the completeness of Qpos.
But data-aware reasoning can guarantee it.
Incorporating the data graph increases the complexity from NP-complete (for data-agnostic reasoning) to Π₂^P-complete.
Yet, optimization techniques exist for practical settings.
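To illustrate why looking at the data graph helps for Qpos, here is a minimal Python sketch. It is not the procedure formalized in the cited work: it assumes every completeness statement is a single unconditional triple pattern, it processes the query patterns strictly left to right, and it only gives a sufficient check. The helper names (`covers`, `matches`, `is_complete`) and the three crew triples in `graph` are assumptions made for the example.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def covers(statement: Triple, pattern: Triple) -> bool:
    """True if Compl(statement) guarantees completeness of the triple pattern."""
    binding: Dict[str, str] = {}
    for s_term, p_term in zip(statement, pattern):
        if is_var(s_term):
            # A statement variable must map to one pattern term consistently.
            if binding.setdefault(s_term, p_term) != p_term:
                return False
        elif s_term != p_term:
            return False
    return True


def matches(pattern: Triple, graph: List[Triple]) -> Iterator[Dict[str, str]]:
    """Yield the variable bindings of the pattern against the graph's triples."""
    for triple in graph:
        binding: Dict[str, str] = {}
        if all(
            binding.setdefault(p, t) == t if is_var(p) else p == t
            for p, t in zip(pattern, triple)
        ):
            yield binding


def is_complete(query: List[Triple], statements: List[Triple],
                graph: List[Triple]) -> bool:
    """Sufficient check: a complete first pattern is evaluated over the graph,
    and the rest of the query is checked once per resulting binding."""
    if not query:
        return True
    first, rest = query[0], query[1:]
    if not any(covers(s, first) for s in statements):
        return False
    for binding in matches(first, graph):
        instantiated = [tuple(binding.get(t, t) for t in tp) for tp in rest]
        if not is_complete(instantiated, statements, graph):
            return False
    return True


statements = [
    ("apollo11", "crew", "?crew"),
    ("neil", "child", "?child"),
    ("buzz", "child", "?child"),
    ("michael", "child", "?child"),
]
# Assumed graph content: the three crew members named on the poster.
graph = [
    ("apollo11", "crew", "neil"),
    ("apollo11", "crew", "buzz"),
    ("apollo11", "crew", "michael"),
]
q_pos = [("apollo11", "crew", "?crew"), ("?crew", "child", "?child")]

data_aware = is_complete(q_pos, statements, graph)
# Checking each pattern on its own, without the graph, cannot cover
# "?crew child ?child" because no statement has a variable subject.
pattern_wise = all(any(covers(s, tp) for s in statements) for tp in q_pos)
```

In this sketch `data_aware` is true because instantiating `?crew` with the crew found in the graph turns the second pattern into three patterns that the child statements cover, while `pattern_wise` stays false.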
Optimizing completeness reasoning
Data-aware reasoning optimization
Soundness reasoning
Answer soundness reasoning
Is my query answer sound?
Input: a query P with negation, a set C of completeness statements, a graph G, and an answer mapping u
Output: true iff u is sound wrt. P, C, and G
Characterization: The answer u of P over G wrt. C is sound iff all NOT-EXISTS BGPs of P (= negative parts), after applying u to them, are complete for G wrt. C.
Time-aware completeness reasoning
Completeness statements can sometimes be out-of-date. Capturing this data-dynamicity over time
increases flexibility in completeness reasoning!
Completeness management tools
To increase the potential uptake of our completeness reasoning framework, we have developed two completeness
management tools: CORNER (for Completeness Reasoner) and COOL-WD (for Completeness Tool for Wikidata)
Publications
• Radityo Eko Prasojo, Fariz Darari, Simon Razniewski, Werner Nutt: Managing and Consuming Completeness Information for Wikidata Using COOL-WD. COLD 2016.
• Fariz Darari, Simon Razniewski, Radityo Eko Prasojo, Werner Nutt: Enabling Fine-Grained RDF Data Completeness Assessment. ICWE 2016.
• Fariz Darari, Radityo Eko Prasojo, Werner Nutt: Expressing No-Value Information in RDF. ISWC (P&D) 2015.
• Fariz Darari, Simon Razniewski, Werner Nutt: Bridging the Semantic Gap between RDF and SPARQL using Completeness Statements. ISWC (P&D) 2014.
• Fariz Darari, Radityo Eko Prasojo, Werner Nutt: CORNER: A Completeness Reasoner for SPARQL Queries over RDF Data Sources. ESWC (P&D) 2014.
Cardinality extraction from the Web: Auto-generating completeness information
Cardinality information often expresses complete count information. When it matches the count of the respective data in a KB, completeness statements can be generated automatically!
[Figure: cardinality extraction pipeline. Web documents (e.g., Wikipedia) are parsed and annotated with POS tags and NER tags. Distant supervision learning builds training data from sentences containing a number matching the count of a relation's values and sentences containing a number NOT matching that count. Learning classifiers: Naïve Bayes, Logistic Regression, SVM, Conditional Random Fields. The extracted cardinalities lead to a KB with completeness statements.]
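The distant-supervision labeling step can be sketched as follows. This is a rough illustration, not the pipeline used by the authors: it assumes numbers appear as digits (number words such as "three" are ignored), and the function name `label_sentences`, the example sentences, and the count of 3 are all made up for the example.

```python
import re
from typing import List, Tuple


def label_sentences(sentences: List[str], kb_count: int) -> List[Tuple[str, bool]]:
    """Label a sentence True if one of its numbers equals the KB count of the
    relation's values, False if it has numbers but none of them match."""
    labeled = []
    for sentence in sentences:
        numbers = {int(n) for n in re.findall(r"\d+", sentence)}
        if numbers:  # sentences without any number are not training candidates
            labeled.append((sentence, kb_count in numbers))
    return labeled


# Hypothetical sentences about a relation with 3 values in the KB.
sentences = [
    "The mission carried 3 crew members.",
    "The mission launched in 1969.",
    "The crew trained together.",
]
labeled = label_sentences(sentences, kb_count=3)
```

The labeled sentences would then serve as training data for one of the classifiers listed in the figure.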
Pattern soundness reasoning
Is my query pattern sound?
Input: a minimal query P with negation and a set C of completeness statements
Output: true iff P is sound wrt. C
Characterization: The query P is sound wrt. C iff each BGP of the NOT-EXISTS patterns (= negative parts) is complete wrt. C under the condition of the positive part of P.
It is the case that Qneg is pattern-sound since the statement
Compl(apollo11,crew,?crew) guarantees the completeness of
“apollo11 crew ?person” under any condition
(hence also under the condition of “?person isA person”)
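The Qneg argument can be replayed with a small Python sketch. It is a simplification of the characterization: it only uses statements that hold under any condition, so it ignores the positive part of the query and gives a sufficient check rather than the full test. The names `covers` and `pattern_is_sound` are assumptions for the example.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def covers(statement: Triple, pattern: Triple) -> bool:
    """True if the unconditional statement Compl(statement) guarantees
    completeness of the triple pattern (variables start with '?')."""
    binding: Dict[str, str] = {}
    for s_term, p_term in zip(statement, pattern):
        if s_term.startswith("?"):
            # A statement variable must map to one pattern term consistently.
            if binding.setdefault(s_term, p_term) != p_term:
                return False
        elif s_term != p_term:
            return False
    return True


def pattern_is_sound(negative_bgps: List[List[Triple]],
                     statements: List[Triple]) -> bool:
    """Sufficient check: every triple pattern of every NOT-EXISTS BGP is
    covered by some unconditional completeness statement."""
    return all(
        any(covers(s, tp) for s in statements)
        for bgp in negative_bgps
        for tp in bgp
    )


statements = [("apollo11", "crew", "?crew")]
q_neg_negative_parts = [[("apollo11", "crew", "?person")]]
sound = pattern_is_sound(q_neg_negative_parts, statements)
```

Here `sound` is true because the statement's variable `?crew` matches `?person`, whatever the positive part binds it to.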
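[Figure: data-aware reasoning for Qpos. The pattern "apollo11 crew ?crew" is covered by Compl(apollo11,crew,?crew). The pattern "?crew child ?child" is instantiated to "neil child ?child", "buzz child ?child", and "michael child ?child", which are covered by Compl(neil,child,?child), Compl(buzz,child,?child), and Compl(michael,child,?child).]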
Constant-relevance
“A completeness statement C is relevant to the query Q iff all constants in C appear in Q”
Statements: Compl(neil,spouse,?spouse), Compl(buzz,child,?child), Compl(michael,child,?child)
Constants: {neil, spouse}, {buzz, child}, {michael, child}
Query: michael child ?child
Constants: {michael, child}
The first two statements are ruled out (X).
Retrieval of constant-relevant statements can be reduced to subset-querying.
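The constant-relevance test is a plain subset check, as the following Python sketch shows. It uses a linear scan for clarity; an actual implementation would presumably rely on an index that answers subset queries. The names `constants` and `relevant_statements` are assumptions for the example.

```python
from typing import FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def constants(patterns: Iterable[Triple]) -> FrozenSet[str]:
    """Collect every term that is not a variable (variables start with '?')."""
    return frozenset(
        term for triple in patterns for term in triple if not term.startswith("?")
    )


def relevant_statements(statements: List[Triple], query: List[Triple]) -> List[Triple]:
    """Keep the statements whose constants all appear in the query."""
    query_constants = constants(query)
    return [s for s in statements if constants([s]) <= query_constants]


statements = [
    ("neil", "spouse", "?spouse"),
    ("buzz", "child", "?child"),
    ("michael", "child", "?child"),
]
query = [("michael", "child", "?child")]
relevant = relevant_statements(statements, query)
```

Only the statement about Michael's children survives, matching the two statements crossed out in the poster's example.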
Completeness template
“Generalize similar completeness statements for simultaneous matching process”
Compl(neil,child,?child), Compl(buzz,child,?child), Compl(michael,child,?child)
become Compl[$p,child,?child] with $p = {neil, buzz, michael}
Partial matching
“Filter irrelevant completeness templates by ruling out templates whose body does not overlap with the query’s body”
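A minimal Python sketch of both ideas follows. It makes strong simplifying assumptions that are not stated on the poster: only the subject position is generalized into the placeholder `$p`, and "overlap" is approximated by checking whether the template's predicate occurs in the query. The names `build_templates` and `overlapping_templates` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
TemplateKey = Tuple[str, str]  # (predicate, object) shared by the statements


def build_templates(statements: List[Triple]) -> Dict[TemplateKey, Set[str]]:
    """Group statements that differ only in the subject: the key plays the role
    of Compl[$p, predicate, object] and the value is the set bound to $p."""
    templates: Dict[TemplateKey, Set[str]] = defaultdict(set)
    for subject, predicate, obj in statements:
        templates[(predicate, obj)].add(subject)
    return dict(templates)


def overlapping_templates(templates: Dict[TemplateKey, Set[str]],
                          query: List[Triple]) -> Dict[TemplateKey, Set[str]]:
    """Rule out templates whose predicate occurs in no query triple pattern."""
    query_predicates = {predicate for _, predicate, _ in query}
    return {key: subs for key, subs in templates.items() if key[0] in query_predicates}


statements = [
    ("neil", "child", "?child"),
    ("buzz", "child", "?child"),
    ("michael", "child", "?child"),
    ("neil", "spouse", "?spouse"),
]
templates = build_templates(statements)
query = [("?crew", "child", "?child")]
kept = overlapping_templates(templates, query)
```

The three child statements collapse into one template that is matched once, and the spouse template is filtered out before any matching happens.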
Experiments showed a 50000X speed-up!
Experiments showed a 112X speed-up!
Open-world style vs. closed-world style of negation
Completeness statements: reducing soundness checking to completeness checking!
[Figure: timeline with the dates 2012 and ∞ and the statements Compl(?movie,director,tarantino) and Compl(?movie,actor,tarantino)]
Query: SELECT * WHERE { ?movie actor tarantino . ?movie director tarantino }
“GCD := maximum date d s.t. all parts of the query Q can be guaranteed to be complete”
Guaranteed Completeness Date (GCD) = 2012
Algorithm
Going from the latest date in C to the earliest, incrementally compute the union of the query parts that can be guaranteed to be complete, checking at each step whether all query parts are already included.
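The scan from the latest date to the earliest can be sketched in Python as below. The sketch assumes that the reasoning step has already determined which query parts each dated statement guarantees, so the input is a list of (date, guaranteed parts) pairs. It also assumes one reading of the poster's example, in which the director statement is dated 2012 and the actor statement carries ∞. The function name and the data layout are hypothetical.

```python
import math
from typing import FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]


def guaranteed_completeness_date(
    query_parts: FrozenSet[Triple],
    dated_guarantees: List[Tuple[float, FrozenSet[Triple]]],
) -> Optional[float]:
    """Walk the dates from latest to earliest, accumulate the query parts
    guaranteed so far, and return the first date at which all are included."""
    covered = set()
    for date, parts in sorted(dated_guarantees, key=lambda item: item[0], reverse=True):
        covered |= parts & query_parts
        if covered >= query_parts:
            return date
    return None  # some query part is never guaranteed to be complete


actor = ("?movie", "actor", "tarantino")
director = ("?movie", "director", "tarantino")
query_parts = frozenset({actor, director})

# Assumed dating of the two statements (see the lead-in above).
dated_guarantees = [
    (2012, frozenset({director})),
    (math.inf, frozenset({actor})),
]
gcd = guaranteed_completeness_date(query_parts, dated_guarantees)
```

With this input the scan first covers the actor part at ∞ and then the director part at 2012, so `gcd` equals 2012, the value shown on the poster.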