Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis Mathieu d’Aquin and Enrico Motta Knowledge Media Institute The Open University, Milton Keynes, UK

Presentation at the KCAP 2011 conference of the paper: http://data.open.ac.uk/applications/kcap2011.pdf

Page 1: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Mathieu d’Aquin and Enrico Motta

Knowledge Media Institute, The Open University, Milton Keynes, UK

Page 2: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Hey, Data! I Love Data!

Page 3: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Let’s see… You there, what are you about?

One great dataset

Me? My name is “one great dataset” and my namespace is http://datasets.com/greatone/

Page 4: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

OK, but what’s there?

One great dataset

1,254,245 triples. I also have a SPARQL endpoint!

Page 5: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Can you be more explicit?

One great dataset

Er… I have a VoID description… with links and all…

Page 6: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Hmm… I mean, what are these triples saying?

One great dataset

You mean you want to see… my ontology?

Page 7: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

That would help… but can you tell me what I can ask you?

One great dataset

Like example SPARQL queries?

Page 8: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Yeah… but I don’t know SPARQL, and how do you choose your examples anyway?

One great dataset

Page 9: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

One great dataset

Well… figure it out by yourself then!

Page 10: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Summarizing an RDF dataset with questions

• We would like to be able to give an entry point to a dataset by showing questions it is good at answering

• In a way that can be navigated

• Example:

Tom Heath’s FOAF profile

Who are the people Tom knows?

Page 11: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

A question

• A list of characteristics of objects (clauses) based on the relationships between objects

– Things that are people, i.e. instances of <Person>

– Related to <tom> through the relation <knows>

• For which the answer is a set of objects

– All the objects that satisfy the clauses of the question
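The definition above can be illustrated with a minimal sketch, assuming a tiny hand-written set of triples and a representation of clauses as triple patterns in which `None` stands for the object being asked about. The data, the `answers` function, and the clause encoding are illustrative assumptions, not taken from the presentation.

```python
# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = {
    ("tom", "a", "Person"),
    ("enrico", "a", "Person"),
    ("jeff", "a", "Person"),
    ("lucero", "a", "Project"),
    ("tom", "knows", "enrico"),
    ("tom", "knows", "jeff"),
    ("tom", "currentProject", "lucero"),
}


def answers(clauses, triples):
    """Return every entity that satisfies all clauses.

    A clause is a triple pattern where None marks the position
    of the entity being asked about (an assumed encoding).
    """
    entities = {s for s, _, _ in triples} | {o for _, _, o in triples}

    def satisfies(entity, clause):
        s, p, o = clause
        bound = (entity if s is None else s, p, entity if o is None else o)
        return bound in triples

    return {e for e in entities if all(satisfies(e, c) for c in clauses)}


# "Who are the people Tom knows?"
people_tom_knows = answers([(None, "a", "Person"), ("tom", "knows", None)], TRIPLES)
```

Here the question has two clauses (being a `Person`, being known by `tom`), and its answer is the set of objects satisfying both.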

Page 12: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Formal concept analysis

Number  Composite (c)  Even (e)  Odd (o)  Prime (p)  Square (s)
1       -              -         X        -          X
2       -              X         -        X          -
3       -              -         X        X          -
4       X              X         -        -          X
5       -              -         X        X          -
6       X              X         -        -          -
7       -              -         X        X          -
8       X              X         -        -          -
9       X              -         X        -          X
10      X              X         -        -          -

Formal context: objects with binary attributes

Lattice of concepts: set of objects (extension) with common properties (intension)

Example from: http://en.wikipedia.org/wiki/Formal_concept_analysis
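As a rough illustration of how a lattice of concepts is derived from a formal context, the sketch below enumerates the concepts of the integer example by closing every subset of attributes. This brute-force approach is only an assumption chosen for readability; it is exponential in the number of attributes and is not the algorithm used by the authors' tooling.

```python
from itertools import combinations

# Formal context for the integers 1..10 (objects -> binary attributes).
CONTEXT = {
    1: {"odd", "square"},
    2: {"even", "prime"},
    3: {"odd", "prime"},
    4: {"composite", "even", "square"},
    5: {"odd", "prime"},
    6: {"composite", "even"},
    7: {"odd", "prime"},
    8: {"composite", "even"},
    9: {"composite", "odd", "square"},
    10: {"composite", "even"},
}
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))


def extent(attributes):
    """Objects that have all the given attributes."""
    return frozenset(o for o, attrs in CONTEXT.items() if set(attributes) <= attrs)


def intent(objects):
    """Attributes shared by all the given objects."""
    if not objects:
        return frozenset(ATTRIBUTES)
    return frozenset(set.intersection(*(CONTEXT[o] for o in objects)))


def concepts():
    """All (extension, intension) pairs, found by closing each attribute subset."""
    found = set()
    for size in range(len(ATTRIBUTES) + 1):
        for combo in combinations(ATTRIBUTES, size):
            ext = extent(combo)
            found.add((ext, intent(ext)))
    return found


LATTICE = concepts()
```

For example, the pair ({4, 6, 8, 10}, {composite, even}) is one of the concepts found: those four numbers are exactly the objects that are both composite and even, and these are the only two attributes they all share.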

Page 13: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

RDF instances as individuals in a formal context

• Present relations of objects as binary attributes:

– RDF: tom a Person. tom knows enrico. jeff knows tom.

– FCA: tom: {Class:-Person, knows:-enrico, jeff-:knows}

• Include implicit information based on the ontology

– tom: {Class:-Person, Class:-Agent, Class:-Thing, knows:-enrico, knows:-Person, knows:-Agent, knows:-Thing, jeff-:knows, Person-:knows, Agent-:knows, Thing-:knows}
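The transformation described on this slide can be sketched as follows. This is a minimal illustration, assuming a hard-coded class hierarchy (`Person` under `Agent` under `Thing`) and assuming that `enrico` and `jeff` are also typed as `Person`, which the slide does not state explicitly. The function names are hypothetical.

```python
# Assumed ontology fragment: each class mapped to all of its superclasses.
SUPERCLASSES = {"Person": ["Agent", "Thing"], "Agent": ["Thing"], "Thing": []}

# Triples from the slide, plus assumed types for enrico and jeff.
TRIPLES = [
    ("tom", "a", "Person"),
    ("enrico", "a", "Person"),
    ("jeff", "a", "Person"),
    ("tom", "knows", "enrico"),
    ("jeff", "knows", "tom"),
]


def classes_of(entity, triples):
    """Direct classes of an entity plus all their superclasses."""
    direct = {o for s, p, o in triples if s == entity and p == "a"}
    result = set(direct)
    for cls in direct:
        result.update(SUPERCLASSES.get(cls, []))
    return result


def formal_context(triples):
    """Map each entity to its set of binary attributes."""
    context = {}
    for s, p, o in triples:
        attrs = context.setdefault(s, set())
        if p == "a":
            attrs.update(f"Class:-{c}" for c in classes_of(s, triples))
        else:
            # Outgoing relation, generalised to the classes of the object.
            attrs.add(f"{p}:-{o}")
            attrs.update(f"{p}:-{c}" for c in classes_of(o, triples))
            # Incoming relation on the object, generalised to the subject's classes.
            inverse = context.setdefault(o, set())
            inverse.add(f"{s}-:{p}")
            inverse.update(f"{c}-:{p}" for c in classes_of(s, triples))
    return context


TOM_ATTRIBUTES = formal_context(TRIPLES)["tom"]
```

Under these assumptions, the attribute set computed for `tom` has the same eleven attributes as the expanded set shown on the slide.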

Page 14: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Example lattice: Tom’s FOAF Profile

Page 15: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Eliminating redundancies

Who are the people Tom knows?

Page 16: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

A concept in the lattice is a question

• Intension = clauses of the question

• Extension = answers

– All the objects of the extension satisfy the clauses of the question

• Different areas of the lattice focus on different topics

• Questions are organized in a hierarchy

{Class:-Person, tom-:knows}

What are the (Person) that (tom knows)?

What are tom’s current projects?

What are the people that tom knows?

What are the people?
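One possible way to turn an intension into the bracketed question form shown above is sketched below. The rendering rules and function names are assumptions made for illustration; the presentation does not describe how its interface verbalises questions.

```python
def clause_to_text(attribute):
    """Render one attribute such as 'Class:-Person' or 'tom-:knows' as words."""
    if attribute.startswith("Class:-"):
        return attribute[len("Class:-"):]
    if "-:" in attribute:
        # Incoming relation: "<subject>-:<relation>".
        subject, relation = attribute.split("-:", 1)
        return f"{subject} {relation}"
    # Outgoing relation: "<relation>:-<object>".
    relation, obj = attribute.split(":-", 1)
    return f"{relation} {obj}"


def question_from_intent(intent):
    """Build a question string from the clauses of a concept's intension."""
    classes = sorted(a for a in intent if a.startswith("Class:-"))
    others = sorted(a for a in intent if not a.startswith("Class:-"))
    head = " and ".join(f"({clause_to_text(a)})" for a in classes) or "things"
    if not others:
        return f"What are the {head}?"
    tail = " and ".join(f"({clause_to_text(a)})" for a in others)
    return f"What are the {head} that {tail}?"


EXAMPLE_QUESTION = question_from_intent({"Class:-Person", "tom-:knows"})
```

A more general concept (fewer clauses) yields a more general question, which is how the hierarchy of the lattice carries over to the questions.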

Page 17: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

But…

• The RDF → Formal Context process can generate a lot of attributes and so a lot of questions

• Ranging from things uninterestingly general: What are the Things?

• To the ones that might be interesting only in very specific cases: What are the Indian restaurants located in San Diego that have been rated OK and are called “Chez Bob”?

• Need to extract a list of questions as an entry point

Dataset      Nb. Instances  Nb. Questions
geography    715            842
jobs         4142           66284
restaurants  9746           6810
drama        19294          10083

Page 18: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

How to measure the interestingness of a question - metrics

• Inspired by ontology summarization:

– Coverage: if providing a list of questions, the questions should cover the entire lattice (i.e., at least one question per branch)

– Level: Too general or too specific questions are not useful

– Density: The number of clauses can have an impact (avoid too complex questions as well as too simple ones)

• Inspired by FCA:

– Support: the cardinality of the extent, i.e. the number of answers

– Intensional Stability: How much a concept depends on particular elements of the extension

– Extensional Stability: How much a concept depends on particular elements of the intension
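The FCA-inspired metrics can be made concrete with a small sketch. The slides do not give formulas, so this assumes the usual FCA definitions of stability: the fraction of subsets of the extension that still yield the same intension (intensional stability), and the fraction of subsets of the intension that still yield the same extension (extensional stability). It reuses the integer context from the earlier slide and enumerates subsets by brute force, so it only suits small concepts.

```python
from itertools import chain, combinations

CONTEXT = {
    1: {"odd", "square"}, 2: {"even", "prime"}, 3: {"odd", "prime"},
    4: {"composite", "even", "square"}, 5: {"odd", "prime"},
    6: {"composite", "even"}, 7: {"odd", "prime"}, 8: {"composite", "even"},
    9: {"composite", "odd", "square"}, 10: {"composite", "even"},
}
ALL_ATTRIBUTES = frozenset().union(*CONTEXT.values())


def intent(objects):
    """Attributes shared by all the given objects."""
    objects = list(objects)
    if not objects:
        return ALL_ATTRIBUTES
    return frozenset(set.intersection(*(CONTEXT[o] for o in objects)))


def extent(attributes):
    """Objects that have all the given attributes."""
    return frozenset(o for o, attrs in CONTEXT.items() if set(attributes) <= attrs)


def subsets(items):
    """All subsets of items, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def support(ext):
    """Number of answers to the question."""
    return len(ext)


def intensional_stability(ext, itt):
    """Share of subsets of the extension whose intent is still itt (assumed definition)."""
    return sum(1 for sub in subsets(ext) if intent(sub) == itt) / 2 ** len(ext)


def extensional_stability(ext, itt):
    """Share of subsets of the intension whose extent is still ext (assumed definition)."""
    return sum(1 for sub in subsets(itt) if extent(sub) == ext) / 2 ** len(itt)


EXT = frozenset({4, 6, 8, 10})
ITT = frozenset({"composite", "even"})
```

For the concept ({4, 6, 8, 10}, {composite, even}), the support is 4, the intensional stability is 14/16 = 0.875 (only the empty subset and {4} change the intent), and the extensional stability is 1/4 = 0.25 (dropping either attribute enlarges the extent).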

Page 19: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Experiment: finding the relevant metrics

• 4 datasets in different domains

• 12 evaluators providing questions of interest for these datasets

• Obtained 44 questions, out of which 27 are valid (no overlap)

– Some are too complicated for our model (include disjunction, negation, aggregation functions)

• “What is the highest point in Florida?”

– A large part do not comply with the initial instructions: should be self-contained and answered by a list of objects

• “How high is mountain x?”

• “What are the restaurant in a given city?”

Page 20: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Results

• Level: Questions between levels 3 and 7, with an average of 4.46. Interesting questions are located around the center of the lattice

• Density: Questions have between 1 and 3 clauses. Simple questions are preferred

• Support: Very large variations amongst the obtained questions

• Intensional Stability: Very large variations amongst the obtained questions

• Extensional Stability: High values (between 0.75 and 1.0), especially compared to the average (0.4)

• Conclusion:

– In order to establish a list of questions most likely to be of interest, a combination of level, density and extensional stability, together with coverage, should be used

Page 21: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Evaluation

• Algorithm to generate a set of questions from the lattice of an RDF dataset that

– Cover the entire lattice

– Are believed to be interesting according to a given measure

• Datasets from data.open.ac.uk

– 614 course descriptions

– 1706 video podcasts

• Using the metrics: random, closeness to middle level, density close to 2, support, extensional stability, and Aggregated = 1/3 level + 1/3 density + 1/3 stability

• 6 users to score the resulting sets of questions (6 metrics in 2 datasets: 12 sets in total) depending on interestingness
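The aggregated measure above weights level, density and extensional stability equally. The sketch below shows one way such a score could be computed. The slides do not say how "closeness to middle level" or "density close to 2" are normalised, so both normalisations here are assumptions, as are the function names.

```python
def level_score(level, max_level):
    """1.0 at the middle level of the lattice, 0.0 at the top or bottom (assumed scaling)."""
    middle = max_level / 2
    return 1 - abs(level - middle) / middle


def density_score(n_clauses, target=2):
    """1.0 when the question has exactly `target` clauses, lower otherwise (assumed scaling)."""
    return 1 / (1 + abs(n_clauses - target))


def aggregated(level, max_level, n_clauses, extensional_stability):
    """Aggregated = 1/3 level + 1/3 density + 1/3 stability, as stated on the slide."""
    return (
        level_score(level, max_level)
        + density_score(n_clauses)
        + extensional_stability
    ) / 3


# A mid-level, two-clause, stable question scores higher than a
# near-top, five-clause, unstable one.
good = aggregated(level=4, max_level=8, n_clauses=2, extensional_stability=0.9)
poor = aggregated(level=1, max_level=8, n_clauses=5, extensional_stability=0.4)
```

A selection procedure would then pick, for each branch of the lattice, the question with the highest score, so that the resulting list also satisfies the coverage requirement.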

Page 22: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Results

Page 23: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Implementation: the whatoask interface

Offline: Dataset with SPARQL endpoint → SPARQL2RCF → Formal Context → CORON → Lattice

Online: Lattice → Lattice Parser → Interface Generation (using metrics) → Interface with navigation in Browser → User

Page 24: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Example: Open educational material(OpenLearn)

Page 25: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Example: Database of reading experiences (Arts History project)

Page 26: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Example: Open University Buildings

Page 27: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Conclusion

• The technique presented provides both a summary and an exploration mechanism over RDF data, using the underlying ontology and formal concept analysis

• It provides an interface for documenting the dataset by examples rather than by specification

• It favors serendipity in the exploration of the dataset, without the need for prior, specialized knowledge

• The current interface in beta is available in an online demo

• Need to improve the question generation and navigation mechanisms

• Ongoing experiment including information gathered through the links to external datasets, to generate unanticipated questions

• Use-cases in research projects in Arts and Humanities

Page 28: Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

More info

– Demo: http://lucero-project.info/lb/2011/06/what-to-ask-linked-data/

– Data.open.ac.uk (for some of the datasets used)

– @mdaquin

Thank you!