ABSTAT: Ontology-driven Linked Data Summaries with Pattern Minimalization

Blerina Spahiu, Riccardo Porrini, Matteo Palmonari, Anisa Rula, Andrea Maurino

University of Milano-Bicocca



Outline

• Motivation: dataset understanding, state of the art
• Summarization framework: Abstract Knowledge Patterns (AKPs), pattern minimalization, summary extraction, storage and presentation
• Evaluation: compactness, informativeness, user study
• Conclusion and future work


Introduction

What types of resources are there in a data set? How are they described? What types of resources are linked by a certain property and how frequently?


Motivation

Understanding the content of data sets is challenging:

• Looking at the ontology is not enough:
  - Ontologies may be large and underspecified, e.g., DBpedia 2015-04 has 2795 properties; the domain is not specified for 259 properties and the range for 187
  - Ontologies give no information about usage
• Explorative queries are too expensive:
  - Significant server overload
  - High response times and timeouts


State of the Art

• Relevance-based summarization (Troullinou et al. 2015; Zhang et al. 2007): identifies the subsets of data sets or ontologies that are considered most relevant
• Pattern-based approaches (Mihindukulasooriya et al. 2015; Presutti et al. 2011; Jarrar and Dikaiakos, 2012): aim at extracting knowledge patterns for a complete representation of the data set
• Schema induction (Völker and Niepert, 2011): induces a schema from the data and aims at extracting stronger assertions
• Statistics about the data set (Konrath et al. 2012; Langegger and Wöß, 2009; Auer et al. 2012; Linked Open Vocabularies, http://lov.okfn.org/): aim at reporting statistics about the usage of different vocabularies, properties and types in the data


ABSTAT

• ABSTAT (http://abstat.disco.unimib.it) is an ontology-driven linked data summarization framework
• A summary provides a complete but compact schema-level representation of a data set, made of:
  - A set of Abstract Knowledge Patterns (AKPs): e.g., the AKP <Person, birthplace, Settlement> represents the fact that there are instances of type Person linked to instances of type Settlement by the property birthplace
  - Statistics: how many times each pattern occurs in the data set, how many times a type occurs as a minimal type, and how many times a property occurs in the data set
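To make the two ingredients of a summary concrete, here is a minimal sketch, not ABSTAT's implementation. It assumes the minimal types of every resource have already been computed, falls back to owl:Thing for resources with no known type (an assumption), and tallies the three statistics above with `collections.Counter`. The resource names (`ex:alice`, `ex:milan`, ...) are hypothetical toy data built around the Person / birthplace / Settlement example.

```python
from collections import Counter

OWL_THING = "owl:Thing"  # assumed fallback type for resources with no known type


def summarize(triples, minimal_types):
    """Tally AKP occurrences, minimal-type occurrences and property usage.

    triples: iterable of (subject, property, object) relational assertions
    minimal_types: dict mapping each resource to the set of its minimal types
    """
    triples = list(triples)  # iterated twice below
    pattern_counts = Counter()
    for s, p, o in triples:
        # One occurrence per combination of subject and object minimal types.
        for c in minimal_types.get(s, {OWL_THING}):
            for d in minimal_types.get(o, {OWL_THING}):
                pattern_counts[(c, p, d)] += 1
    type_counts = Counter(t for ts in minimal_types.values() for t in ts)
    property_counts = Counter(p for _, p, _ in triples)
    return pattern_counts, type_counts, property_counts


# Hypothetical toy data around the Person / birthplace / Settlement example.
minimal_types = {
    "ex:alice": {"Person"}, "ex:bob": {"Person"}, "ex:carol": {"Person"},
    "ex:milan": {"Settlement"}, "ex:rome": {"Settlement"},
}
triples = [
    ("ex:alice", "birthplace", "ex:milan"),
    ("ex:bob", "birthplace", "ex:rome"),
    ("ex:carol", "birthplace", "ex:milan"),
]
pattern_counts, type_counts, property_counts = summarize(triples, minimal_types)
print(pattern_counts[("Person", "birthplace", "Settlement")])  # 3
```

Counting each subject/object type combination separately is a simplifying choice of this sketch; it only matters when a resource has more than one minimal type.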


Abstract Knowledge Patterns (AKPs)

ABSTAT adopts a minimalization mechanism based on minimal type patterns

Minimalization is based on a subtype graph which represents the data ontology

Abstract Knowledge Patterns (AKPs) are abstract representations of Knowledge Patterns

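An AKP is a triple (C, P, D) such that C and D are types and P is a property

In ABSTAT we represent only the AKPs occurring in the data set whose types are minimal types of the linked resources (a type of a resource is minimal if no other type of that resource is a subtype of it in the subtype graph)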
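Minimal types can be sketched as follows, assuming the subtype graph is given as a mapping from each type to its direct supertypes and is acyclic: every asserted type that is a strict supertype of another asserted type is dropped. The names `supertypes`, `minimal_types` and `PARENTS` are illustrative; `PARENTS` encodes the subtype graph of the running example (Sportist, Lawyer and Artist under Person, FootballPlayer under Sportist). In this sketch a resource with incomparable types, such as Artist and Lawyer, keeps all of them.

```python
def supertypes(t, parents):
    """All strict supertypes of t, following subclassOf edges transitively."""
    seen, stack = set(), list(parents.get(t, ()))
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(parents.get(u, ()))
    return seen


def minimal_types(asserted, parents):
    """Keep the asserted types that have no asserted subtype (acyclic graph assumed)."""
    asserted = set(asserted)
    covered = set()
    for t in asserted:
        covered |= supertypes(t, parents)  # anything above t is not minimal
    return asserted - covered


# Subtype graph of the running example: type -> direct supertypes.
PARENTS = {
    "Sportist": {"Person"},
    "FootballPlayer": {"Sportist"},
    "Lawyer": {"Person"},
    "Artist": {"Person"},
}
print(minimal_types({"Person", "Sportist", "FootballPlayer"}, PARENTS))  # {'FootballPlayer'}
```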

An example of how AKPs are extracted

[Figure: subtype graph in which Sportist, Lawyer and Artist are subclasses of Person and FootballPlayer is a subclass of Sportist; the instances Jim Brown, Amal Clooney and George Clooney have minimal types FootballPlayer, Lawyer and Artist, respectively; George Clooney hasWife Amal Clooney, and Jim Brown has birthDate "1936-02-17", a literal of type XMLSchema#Date.]

The (minimal-type) patterns extracted by ABSTAT are:
<Artist, hasWife, Lawyer>
<FootballPlayer, birthDate, XMLSchema#Date>

Redundant patterns excluded by the summary:
<Person, hasWife, Person>
<Sportist, birthDate, XMLSchema#Date>
<Person, birthDate, XMLSchema#Date>

In the last step of the example, an Artist also has a birthDate value, so the extracted patterns additionally include <Artist, birthDate, XMLSchema#Date>.
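The example can be replayed in a few lines. This sketch takes the minimal types shown in the figure as given, builds the minimal-type patterns from the two relational triples, and then checks that each excluded pattern is redundant: some extracted pattern has the same property and subject and object types equal to or below the excluded ones in the subtype graph. The resource keys and helper names are hypothetical, and the redundancy check is an illustration rather than ABSTAT's code.

```python
from itertools import product

# Subtype graph of the example: type -> direct supertypes.
PARENTS = {
    "Sportist": {"Person"},
    "FootballPlayer": {"Sportist"},
    "Lawyer": {"Person"},
    "Artist": {"Person"},
}
# Minimal types of the example resources; the literal is typed by its datatype.
MINIMAL = {
    "George Clooney": {"Artist"},
    "Amal Clooney": {"Lawyer"},
    "Jim Brown": {"FootballPlayer"},
    '"1936-02-17"': {"XMLSchema#Date"},
}
TRIPLES = [
    ("George Clooney", "hasWife", "Amal Clooney"),
    ("Jim Brown", "birthDate", '"1936-02-17"'),
]


def self_and_supertypes(t):
    """t together with all of its supertypes in PARENTS."""
    closed, stack = set(), [t]
    while stack:
        u = stack.pop()
        if u not in closed:
            closed.add(u)
            stack.extend(PARENTS.get(u, ()))
    return closed


def minimal_type_patterns(triples):
    """One (C, P, D) per combination of subject and object minimal types."""
    return {(c, p, d)
            for s, p, o in triples
            for c, d in product(MINIMAL[s], MINIMAL[o])}


def is_covered(pattern, extracted):
    """True if some extracted pattern is at least as specific as `pattern`."""
    c, p, d = pattern
    return any(p == p2 and c in self_and_supertypes(c2) and d in self_and_supertypes(d2)
               for c2, p2, d2 in extracted)


extracted = minimal_type_patterns(TRIPLES)
excluded = [
    ("Person", "hasWife", "Person"),
    ("Sportist", "birthDate", "XMLSchema#Date"),
    ("Person", "birthDate", "XMLSchema#Date"),
]
print(sorted(extracted))
# [('Artist', 'hasWife', 'Lawyer'), ('FootballPlayer', 'birthDate', 'XMLSchema#Date')]
```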


Summary Extraction Workflow


ABSTAT User Interfaces

• ABSTAT homepage (http://abstat.disco.unimib.it)
• ABSTATBrowse (http://abstat.disco.unimib.it/browse)
• ABSTATSearch (http://abstat.disco.unimib.it/search)
• SPARQL endpoint (http://abstat.disco.unimib.it/sparql)


Experimental Evaluation

• Summary compactness: number of patterns in the summary vs. number of triples in the data set; comparison with a similar approach without minimalization
• Summary informativeness: insights about the semantics of the properties; small-scale user study


Compactness

Data sets and summaries statistics

Dataset | Relational assertions | Typing assertions | Assertions | Types (Ext.) | Properties (Ext.) | Patterns
DBpedia Core 2014 | 40.5M | 29.7M | 70.1M | 869 (85) | 1439 (15) | 171340
DBpedia 3.9 Infobox | 96.3M | 19.7M | 116.4M | 821 (58) | 62572 (14) | 732418
Linked Brainz | 180.1M | 39.6M | 221.7M | 21 (9) | 33 (0) | 161

Reduction rate = number of patterns / number of assertions in the data set

Dataset | ABSTAT | LOUPE (similar to ABSTAT without minimalization)
DBpedia Core 2014 | 0.002 | 0.01
Linked Brainz | 6.72 × 10^-7 | 7.1 × 10^-7

Minimalization produces more compact summaries. The advantage of minimalization is more observable for data sets with richer subtype graphs and typing assertions.
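The reduction rate is plain arithmetic; the sketch below recomputes the DBpedia Core 2014 value from the statistics table (171,340 patterns over 70.1M assertions). The function name is illustrative.

```python
def reduction_rate(num_patterns, num_assertions):
    """Reduction rate = patterns in the summary / assertions in the data set."""
    return num_patterns / num_assertions


# DBpedia Core 2014: 171,340 patterns and 70.1M assertions.
rate = reduction_rate(171_340, 70.1e6)
print(f"{rate:.4f}")  # 0.0024, reported as 0.002
```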


Informativeness

ABSTAT summaries provide useful insights about the semantics of properties, based on their usage within a data set

Properties whose domain and/or range is not specified in the ontology:

Dataset | Missing Domain (%) | Missing Range (%) | Missing Domain & Range (%)
DBpedia Core 2014 | 259 (18%) | 187 (13%) | 48 (3.3%)
DBpedia 3.9 Infobox | 61368 (98%) | 61309 (98%) | 61161 (97%)
Linked Brainz | 13 (39%) | 15 (45%) | 13 (39%)
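Counting properties with a missing domain or range only requires the ontology's declarations. This sketch assumes they have already been loaded into a dictionary (the `Declaration` type and the `ex:` properties are hypothetical) and also recomputes the 18% figure for DBpedia Core 2014 from the 1439 properties listed in the compactness table.

```python
from typing import NamedTuple, Optional


class Declaration(NamedTuple):
    """Declared domain and range of a property; None when not specified."""
    domain: Optional[str]
    rng: Optional[str]


def missing_counts(decls):
    """Return (#no domain, #no range, #neither) over a property -> Declaration map."""
    no_domain = sum(d.domain is None for d in decls.values())
    no_range = sum(d.rng is None for d in decls.values())
    neither = sum(d.domain is None and d.rng is None for d in decls.values())
    return no_domain, no_range, neither


def percent(part, total):
    """Share of `part` in `total`, as a percentage."""
    return 100 * part / total


# Hypothetical toy ontology with three properties.
toy = {
    "ex:birthPlace": Declaration("ex:Person", "ex:Place"),
    "ex:related": Declaration(None, None),
    "ex:series": Declaration(None, "ex:Work"),
}
print(missing_counts(toy))  # (2, 1, 1)
# DBpedia Core 2014: 259 of its 1439 properties have no declared domain.
print(f"{percent(259, 1439):.1f}%")  # 18.0%
```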


Inferred domain and range for DBpedia Core 2014

[Chart "Extracted minimal types (domain)": number of minimal types (y-axis, 0 to 160) extracted as domain for the properties dbo:type, dbo:successor, dbo:division, dbo:isPartOf, dbo:series, dbo:gender, dbo:source, dbo:localAutho..., dbo:royalAnthem, dbo:mainInterest, dbo:chairLabel, dbo:format, dbo:manage..., dbo:related, dbo:hasVariant, dbo:variantOf, dbo:namedAfter]
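A summary can suggest a domain and range for a property from the minimal types that actually occur as its subjects and objects. The sketch below assumes the AKPs are available as (C, P, D) tuples, uses the patterns of the running example, and groups them by property. Reading the plotted value as the size of each property's domain set is an assumption about the chart.

```python
from collections import defaultdict


def inferred_domains_and_ranges(akps):
    """Group the minimal subject and object types observed for each property."""
    domains, ranges = defaultdict(set), defaultdict(set)
    for c, p, d in akps:
        domains[p].add(c)
        ranges[p].add(d)
    return dict(domains), dict(ranges)


# AKPs of the running example (George Clooney, Amal Clooney, Jim Brown).
akps = [
    ("Artist", "hasWife", "Lawyer"),
    ("FootballPlayer", "birthDate", "XMLSchema#Date"),
    ("Artist", "birthDate", "XMLSchema#Date"),
]
domains, ranges = inferred_domains_and_ranges(akps)
for prop in sorted(domains):
    # Number of distinct minimal types used as subject of the property.
    print(prop, len(domains[prop]), sorted(domains[prop]))
```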


User Study: Setup

Can ABSTAT be useful to support query formulation?

• Queries to DBpedia 3.9 Infobox from the Question Answering over Linked Data (QALD) benchmark
• 5 queries of increasing length (1 of length 1, 2 of length 2 and 2 of length 3)
• 20 participants in 2 groups:
  - abstat group: uses ABSTAT (after 20 min of training)
  - control group: does not use ABSTAT
• Measures: time needed to formulate the query; accuracy of the answer


User Study: Questionnaire


User Study: Results

Query | Length | Group | Avg. completion time (s) | Accuracy
Q1 | 1 | abstat | 358.9 | 0.9
Q1 | 1 | control | 380.6 | 0.8
Q2 | 2 | abstat | 356.3 | 1
Q2 | 2 | control | 346.9 | 0.8
Q3 | 2 | abstat | 476.6 | 0.6
Q3 | 2 | control | 234.24 | 0.7
Q4 | 3 | abstat | 333.4 | 0.9
Q4 | 3 | control | 445.6 | 0.9
Q5 | 3 | abstat | 233.4 | 1
Q5 | 3 | control | 569.8 | 0.7

Q1: How many employees does Google have?
Q2: Give me all people that were born in Vienna and died in Berlin.
Q3: Which professional surfers were born in Australia?
Q4: In which films directed by Gary Marshall was Julia Roberts starring?
Q5: Give me all books by William Goldman with more than 300 pages.

An independent t-test showed a significant difference between the two groups in answering Q5 correctly: t(16) = 10.32, p < .005.
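The slide reports the t statistic for Q5 but neither the per-participant data nor the exact test variant. The following is therefore only a sketch of Student's independent two-sample t-test with pooled variance (degrees of freedom n1 + n2 - 2), applied to hypothetical numbers. `scipy.stats.ttest_ind` computes the same statistic together with a p-value.

```python
import math
import statistics


def student_t(a, b):
    """Student's independent two-sample t statistic (pooled variance) and its df."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / df
    standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error, df


# Hypothetical scores for two groups of three participants each.
t, df = student_t([1, 2, 3], [4, 5, 6])
print(f"t({df}) = {t:.3f}")  # t(4) = -3.674
```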


User Study: Results Analysis

• Users in the abstat group benefit from the ABSTAT summary in terms of average completion time, accuracy, or both; for the length-3 queries they were faster with equal or higher accuracy
• The exception is query 3, because the individual Surfing is classified with no type other than owl:Thing
• Participants in the control group used two strategies to answer the queries:
  - Directly accessing the public web pages describing the DBpedia named individuals mentioned in the query
  - Submitting explorative SPARQL queries to the endpoint (very few participants)


Conclusion and Future Work

• ABSTAT: ontology-driven summarization with minimalization
  - Considerable reduction rate and promising results about the informativeness of the summary
  - Currently extending the user study
• Future work:
  - Apply relevance-oriented summarization methods based on connectivity analysis
  - Take the inheritance of properties into account to produce even more compact summaries
  - We envision a complete analysis of the most important data sets available in the LOD cloud (20+ data sets available)
• APIs available soon


Thank you for your attention!


www.abstat.unimib.it


Feedback is welcome!