Dipso K Mi

Web3.0 and Language Resources

Knowledge Media Institute (KMi), The Open University

Semantic Technologies @ KMi

Outline

• Library of generic problem solving methods
  – To act as active components

• Methods for dealing with heterogeneous knowledge sources
  – Knowledge Fusion (KnoFuss)
  – Ontology Matching (Scarlet)

• Infrastructures for storing large scale semantics (Watson)

• Tools for exploiting distributed semantics
  – Open Domain, Multi-Ontology Question Answering (PowerAqua)

PSM

• Library of generic problem solving methods
  – To act as active components

I include both generic slides and a detailed example. The detailed example is mainly there to help you understand what is going on; I think most of the detailed slides can be skipped in the talk.

Knowledge-level Architectures for Sharing and Reuse

Application of the modelling paradigm to the specification and use of libraries of reusable components for knowledge systems

Modelling Frameworks (1)

• A modelling framework identifies the generic types of knowledge which occur in knowledge systems, thus providing a generic epistemological organization for knowledge systems

• Several exist
  – KADS/CommonKADS - Univ. of Amsterdam
  – Components of Expertise - Steels
  – Generic Tasks - Chandrasekaran
  – Role-limiting Methods - McDermott
  – Protégé - Musen, Stanford
  – TMDA - Motta
  – Ibrow - Fensel, Motta et al.

Modelling Frameworks (2)

• Much in common
  – Emphasis on reusable models
  – Typology of generic tasks
  – Constructivist paradigm

• Some differences
  – Different degrees of coupling between domain-specific and domain-independent knowledge
  – Different degrees of flexibility
  – Different typologies of knowledge categories

A Constructive Approach...

Let’s define our own framework...

Generic Tasks

• Informal definition
  – A generic class of applications - e.g., planning, design, diagnosis, scheduling

• More precise definition
  – A knowledge-level, application-independent description of the goal to be attained by a problem solver.

• Several typologies exist
  – e.g., Breuker, 1994

• Viewpoints over applications
  – No ‘natural categories’
  – Different viewpoints can be imposed on a particular application

Example: Parametric Design

Generic Task Parametric Design
  Inputs: Parameters, Constraints, Requirements, Cost-Function, Preferences
  Output: Design-Model
  Goal: “To produce a complete and consistent design model, which satisfies the given requirements”
  Preconditions: “At least one requirement and one parameter are provided”

Example: Classification

Generic Task Classification
  Inputs: Candidate-classes, Observables, Match-criterion, Solution-criterion
  Output: Best-Matching-Classes
  Preconditions: “At least one candidate class exists”
  Goal: “To find the class that best explains the observables”

Generic Component 2: Reusable PSMs

• A domain-independent, knowledge-level specification of problem solving behaviour, which can be used to solve a class of tasks.

• PSM specifications may be partial

• PSM can be task-specific
  – E.g., heuristic classification

• PSM can be task-independent
  – E.g., search methods, such as hill-climbing, A*, etc.

Functional Specification of a PSM

Problem solving method search
  ontology
    import state-space-terminology
  competence
    roles
      input input: State
      output output: State
    preconditions
      input ≠ 0
    postconditions
      solution_state (output)
    assumptions
      ∃ ?s . solution_state (?s) & successor (input, ?s)

Operational Description

Begin
  states := one x. initialize (input input)
  repeat
    state := one x. select_state (states states)
    if solution_state (state)
    then return state
    else
      succ_states := one x. derive_successor_states (state state)
      states := one x. update_state_space (input1 states input2 succ_states)
    end if
  end repeat
end
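The control loop of the operational description can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the function and parameter names mirror the subtasks named on the slide, the callables passed in are hypothetical, a first-in-first-out choice stands in for the non-deterministic `one x.` selection, and `max_steps` is an added safety bound.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

S = TypeVar("S")

def search_psm(
    initial: S,
    solution_state: Callable[[S], bool],
    derive_successor_states: Callable[[S], Iterable[S]],
    max_steps: int = 10_000,
) -> Optional[S]:
    """Generic state-space search following the slide's control loop."""
    states: List[S] = [initial]                 # initialize (input)
    for _ in range(max_steps):                  # bounded stand-in for 'repeat'
        if not states:
            return None                         # state space exhausted
        state = states.pop(0)                   # select_state: FIFO choice (assumption)
        if solution_state(state):
            return state
        succ_states = list(derive_successor_states(state))
        states.extend(succ_states)              # update_state_space
    return None

# Toy usage: reach 10 starting from 1 by adding 1 or doubling.
result = search_psm(1, lambda n: n == 10, lambda n: [n + 1, n * 2])
```

Swapping the selection step (for example, picking the best-scoring state instead of the oldest one) turns the same skeleton into hill-climbing or best-first search, which is why the PSM is specified independently of any task.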

Task-Method Structures

Problem Type

Primitive PSM

Multi-Functional Domain Models

• Domain-specific models, which are not committed to a specific PSM or task.

• Examples
  – A database of cars
  – The CYC knowledge base, etc.

Picture so far..

[Diagram: an Application Model combining a Generic Task (Classification), a Problem Solving Method (Simple Classifier) and a Multi-Functional Domain model (Lunar rocks)]


Issue

How to link different reusable components?


[Diagram: the Application Model with its Generic Task (Classification), Problem Solving Method (Simple Classifier) and domain model (Lunar rocks), now linked by a Task-Domain Mapping and a PSM-Domain Mapping]


Solution: Mappings

• Mappings model explicitly the relationship between different components in an application model

Task-PSM Mapping

Example

• Scenario: Office Allocation Application

• Generic Task: Parametric Design
• Domain: KB about employees and offices

Task Level -> Domain Level
Parameter -> Employee
Design Model -> Pairs<Employee, Room>

Mappings are an example of application-specific knowledge. Are there others?

Application-specific knowledge

Yes: Application-specific heuristic problem solving knowledge

Elevator Design Example

• A configuration designer only considers two positions for the counterweight
  – Half way between platform and U-bracket
  – A position such that the distance between the counterweight and the platform is at least 0.75 inches

Complete Picture


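[Diagram: the Application Model, built from a Generic Task and the other reusable components, together with Mapping Knowledge and Application-specific Problem-Solving Knowledge; assembling these is the Application Configuration]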

Detailed Example: A Library of Components for Classification

Observables

Candidate Sols.

Criterion

Classification Solution

Classification

• Classification can be seen as the problem of finding the solution (class), which best explains a set of known facts (observables), according to some criterion

Example

Observables: {background=green; area=china...}

Candidate Sols.: {chinese-granny, dutch-granny, etc..}

Criterion: Complete-coverage-criterion (every observable has to be explained)

Solution: {chinese-granny}

Observables

Observables = set_of (Observable);
Observable = {feature, value}.

Well defined Observables (obs):
({f1, v1} ∈ obs ∧ {f1, v2} ∈ obs) -> v1 = v2
({f1, v1} ∈ obs) -> legal_feature_value (f1, v1)

Solutions

Solution = set_of (Feature_Spec);
Feature_Spec = {Feature, Feature_value_spec}
Feature_value_spec = Unary_Relation

Well defined Solution (sol):
({f1, s1} ∈ sol ∧ holds (s1, v1)) -> legal_feature_value (f1, v1)

Matching

Observable = {f1, v1} matches Solution = sol iff:

{f1, c} ∈ sol ∧ holds (c, v1)

Matching Sets of Obs to a Solution

Sol: {{fsol1, c1}...{fsolm, cm}}; Obs: {{fob1, v1}...{fobn, vn}}

Four possible cases:
{fj, cj} ∈ sol ∧ {fj, vj} ∈ obs ∧ holds (cj, vj) -> Explained (fj)
{fj, cj} ∈ sol ∧ {fj, vj} ∈ obs ∧ not holds (cj, vj) -> Inconsistent (fj)
{fj, vj} ∈ obs ∧ {fj, cj} ∉ sol -> Unexplained (fj)
{fj, vj} ∉ obs ∧ {fj, cj} ∈ sol -> Missing (fj)

Default Match Criterion

Match Score: Vector <I, E, U, M>

Match Comparison Relation
S1 = (i1, e1, u1, m1); S2 = (i2, e2, u2, m2)

S1 better_score than S2 iff:

(i1 < i2) ∨ (i2 = i1 ∧ e2 < e1) ∨ (i2 = i1 ∧ e2 = e1 ∧ u1 < u2) ∨ (i2 = i1 ∧ e2 = e1 ∧ u2 = u1 ∧ m1 < m2)
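A minimal Python sketch of the default match criterion may help make the score vector concrete. It rests on several assumptions that are not in the slides: observables are a dictionary from feature to value, a solution is a dictionary from feature to a unary predicate, the vector holds counts, Unexplained is read as an observed feature the solution does not mention, and Missing as a solution feature with no observation. The feature specifications given to the two grannies are invented for illustration.

```python
from typing import Callable, Dict, Tuple

Observables = Dict[str, object]                    # feature -> observed value
Solution = Dict[str, Callable[[object], bool]]     # feature -> unary relation
Score = Tuple[int, int, int, int]                  # <I, E, U, M> as counts

def match_score(obs: Observables, sol: Solution) -> Score:
    """Count inconsistent, explained, unexplained and missing features."""
    inconsistent = explained = unexplained = missing = 0
    for feature, value in obs.items():
        if feature not in sol:
            unexplained += 1
        elif sol[feature](value):
            explained += 1
        else:
            inconsistent += 1
    for feature in sol:
        if feature not in obs:
            missing += 1
    return (inconsistent, explained, unexplained, missing)

def better_score(s1: Score, s2: Score) -> bool:
    """Lexicographic order: fewer I, then more E, then fewer U, then fewer M."""
    i1, e1, u1, m1 = s1
    i2, e2, u2, m2 = s2
    return (i1, -e1, u1, m1) < (i2, -e2, u2, m2)

# Hypothetical feature specifications for two candidate solutions.
obs = {"background": "green", "area": "china"}
chinese_granny = {"background": lambda v: v == "green", "area": lambda v: v == "china"}
dutch_granny = {"background": lambda v: v == "green", "area": lambda v: v == "holland"}

score_chinese = match_score(obs, chinese_granny)
score_dutch = match_score(obs, dutch_granny)
```

Negating the explained count lets a single tuple comparison express the four-way disjunction of the comparison relation.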

Possible Solution Criteria

• Positive Coverage
  – Some feature is explained and none is inconsistent

• Complete Coverage
  – All features are explained and none is inconsistent
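The two criteria can be written as predicates over a match score. The sketch below assumes the score is a tuple of counts in the order <I, E, U, M>; that representation is an assumption made for illustration, and "all features are explained" is read as "no observed feature is left unexplained".

```python
from typing import Tuple

Score = Tuple[int, int, int, int]   # <I, E, U, M> as counts (assumed representation)

def positive_coverage(score: Score) -> bool:
    """Some feature is explained and none is inconsistent."""
    inconsistent, explained, _unexplained, _missing = score
    return explained > 0 and inconsistent == 0

def complete_coverage(score: Score) -> bool:
    """No observed feature is unexplained and none is inconsistent."""
    inconsistent, _explained, unexplained, _missing = score
    return inconsistent == 0 and unexplained == 0

fully_explained = (0, 2, 0, 1)      # two explained, one missing, nothing else
partly_explained = (0, 1, 1, 0)     # one explained, one unexplained
```

Under these definitions `fully_explained` satisfies both criteria, while `partly_explained` satisfies only positive coverage.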

Hierarchy of Criteria

Solution Criterion

Match Criterion

Match Score Comparison Rel

Macro Score Mechanism

Feature Score Mechanism

Match Score Mechanism

Observables

(def-class observables (set) ?obs
  "This is simply a set of observables. An important constraint is that
   there cannot be two values for the same feature in a set of observables"
  :iff-def (every ?obs observable)
  :constraint (not (exists (?ob1 ?ob2)
                     (and (member ?ob1 ?obs)
                          (member ?ob2 ?obs)
                          (has-observable-feature ?ob1 ?f)
                          (has-observable-feature ?ob2 ?f)
                          (has-observable-value ?ob1 ?v1)
                          (has-observable-value ?ob2 ?v2)
                          (not (= ?v1 ?v2))))))

Solutions

(def-class solution () ?x
  "A solution is a set of feature definitions"
  :iff-def (every ?x feature-definition))

(def-class feature-definition () ?x
  ((has-feature-name :type feature)
   (has-feature-value-spec :type unary-relation))
  :constraint (=> (and (has-feature-name ?x ?f)
                       (has-feature-value-spec ?x ?spec))
                  (=> (holds ?spec ?v)
                      (legal-feature-value ?f ?v))))

Solution Criterion

(def-class solution-admissibility-criterion () ?c
  ((applies-to-match-score-type :type match-score-type)
   (has-solution-admissibility-relation :type unary-relation))
  :constraint (=> (and (solution-admissibility-criterion ?c)
                       (has-solution-admissibility-relation ?c ?r)
                       (domain ?r ?d))
                  (subclass-of ?d match-score)))

Monotonicity of Admissible Solutions

(def-axiom admissibility-is-monotonic
  "This axiom states that the admissibility criterion is monotonic. That is,
   if a solution, ?sol, is admissible, then any solution which is better than
   ?sol will also be admissible"
  (forall (?sol1 ?sol2 ?obs ?criterion)
    (=> (and (admissible-solution
               ?sol1
               (apply-match-criterion ?criterion ?obs ?sol1)
               ?criterion)
             (better-match-than ?sol2 ?sol1 ?obs ?criterion))
        (admissible-solution
          ?sol2
          (apply-match-criterion ?criterion ?obs ?sol2)
          ?criterion))))

Complete Coverage

(def-instance complete-coverage-admissibility-criterion solution-admissibility-criterion
  ((applies-to-match-score-type default-match-score)
   (has-solution-admissibility-relation complete-coverage-admissibility-relation)))

(def-relation complete-coverage-admissibility-relation (?score)
  "a solution should be consistent and explain all features"
  :constraint (default-match-score ?score)
  :iff-def (and (= (length (first ?score)) 0)    ;; no inconsistency
                (= (length (third ?score)) 0)))  ;; no unexplained

Classification Task Ontology

• 42 Definitions

• Provides both a theory of classification and a vocabulary to describe classification problems

• Ontology is separated from task specifications

Generic Classification Task

• Input roles
  – Candidate Solutions, Match Criterion, Solution Criterion, Observables

• Precondition
  – Both observables and candidate solutions have to be provided

• Goal
  – To find a solution from the candidate solutions which is admissible with respect to the given observables, solution criterion and match criterion

Specific Classification Tasks

• Single-Solution Classification Task
  – Single-solution assumption

• Optimal Classification Tasks
  – Goal requires optimality

Problem Solving Library

• Based on heuristic classification model

• Supports both data-directed and solution-directed classification

• Based on search paradigm

• Supported by a method ontology

Method Ontology: Main Concepts

• Abstractors
  – Mechanism for performing abstraction on observables
  – Abstractor: Obs* -> Obs

• Refiners
  – Mechanism for specialising a solution
  – Refiner: Sol -> Sol*

• Candidate Exclusion Criterion
  – A criterion which is used to decide when a search path is a dead-end
  – Default criterion rules out inconsistent solutions

Monotonicity of Exclusion Criterion

(def-axiom exclusion-is-monotonic
  (forall (?sol1 ?sol2 ?obs ?criterion)
    (=> (and (ruled-out-solution ?sol1 (the-match-score ?sol1) ?criterion)
             (not (better-match-than ?sol2 ?sol1 ?obs ?criterion)))
        (ruled-out-solution ?sol2 (the-match-score ?sol2) ?criterion))))

Axiom of Congruence

(def-axiom congruent-admissibility-and-exclusion-criteria
  (forall (?sol ?task)
    (=> (member ?sol (the-solution-space ?task))
        (not (and (admissible-solution
                    ?sol
                    (the-match-score ?sol)
                    (role-value ?task 'has-solution-admissibility-criterion))
                  (ruled-out-solution
                    ?sol
                    (the-match-score ?sol)
                    (role-value ?psm 'has-solution-exclusion-criterion)))))))

Three Heuristic Classification PSMs

• Two Data-directed
  – Admissible Solution Classifier
    • Finds one admissible solution according to the given criteria
    • Uses backtracking hill climbing
  – Optimal Classifier
    • Performs complete search looking for optimal solution
    • Uses best-first strategy
    • Uses candidate exclusion criterion to prune search space

• One Solution-directed
  – Goes down the solution hierarchy, acquiring observables as needed
  – Asks for observables with max discrimination power

Task-Method Hierarchy

abstraction

heuristic-classification-psm

classification

rank-solutions refinement

basic-heuristic-match select-abstractor one-step-abstraction collect-refiners apply-refiners

abstraction-psm refinement-psm rank-solutions-psm

KnoFuss

• Methods for dealing with heterogeneous knowledge sources
  – Knowledge Fusion (KnoFuss)

• The story here is that smart products will contain a lot of instance-level semantic data encoded in terms of different ontologies, so it will be important to be able to compare and merge (i.e., fuse) similar instance data.

Knowledge fusion scenario

RDF

Images

Other data

Annotation

Fusion

Text

Internal corporate reports (Intranet)

Pre-defined public sources (WWW)

Domain ontology

KnoFuss

Knowledge base

Knowledge fusion

Ontology integration

Knowledge base integration

Ontology matching

Instance transformation

Coreference resolution

Dependency processing

Source KB

Target KB

SPARQL query translation

Fusion workflow

KnoFuss architecture

Fusion KB

Intermediate data

Main KB

Fusion module

ObjectIdentificationMethod

ConflictDetectionMethod

ConflictResolutionMethod

Method library

New data

Fusion ontology

• Method library
  – Contains implementation of each specific algorithm

• Fusion ontology
  – Describes method capabilities
  – Defines intermediate structures (mappings, conflict sets, etc.)

Steps

• Coreference resolution
  – Attribute similarity algorithms

• Dependency processing
  – Employing additional information:
    • Schema restrictions
    • Links between instances
    • Provenance
  – Using formal uncertainty reasoning
    • Dempster-Shafer belief networks
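As an illustration of the coreference resolution step, the sketch below compares two instances attribute by attribute. It is not KnoFuss code: the slides do not say which similarity measures are used, so the standard library's `difflib.SequenceMatcher` is swapped in here, and the averaging scheme, the 0.8 threshold and the sample instances are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict

Instance = Dict[str, str]   # attribute name -> literal value

def attribute_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def instance_similarity(x: Instance, y: Instance) -> float:
    """Average similarity over the attributes both instances share."""
    shared = set(x) & set(y)
    if not shared:
        return 0.0
    return sum(attribute_similarity(x[k], y[k]) for k in shared) / len(shared)

def coreferent(x: Instance, y: Instance, threshold: float = 0.8) -> bool:
    """Treat two instances as the same individual above the threshold."""
    return instance_similarity(x, y) >= threshold

# Hypothetical instances from a source and a target knowledge base.
source = {"name": "Enrico Motta", "affiliation": "The Open University"}
target = {"name": "enrico motta", "affiliation": "the open university"}
same = coreferent(source, target)
```

Candidate pairs found this way would then go to dependency processing, where schema restrictions, links and provenance can confirm or reject them.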

Scarlet

• Methods for dealing with heterogeneous knowledge sources
  – Ontology Matching (Scarlet)

• The motivation is that we will need to match between the different ontologies used by different products.


– Label similarity methods
  • e.g., Full_Professor = FullProfessor
– Structure similarity methods
  • Using taxonomic/property related information
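A label similarity method of the simplest kind can be sketched as label normalisation followed by string equality. This is an illustrative assumption about how such a method might treat the Full_Professor example, not a description of any particular matcher.

```python
import re

def normalise_label(label: str) -> str:
    """Lower-case the label and drop separators such as '_', '-' and spaces."""
    return re.sub(r"[^a-z0-9]", "", label.lower())

def labels_match(a: str, b: str) -> bool:
    """Two labels match when their normalised forms are identical."""
    return normalise_label(a) == normalise_label(b)

matched = labels_match("Full_Professor", "FullProfessor")
```

Such a method finds nothing when two ontologies name the same concept with unrelated words, which is the limitation that motivates the use of background knowledge.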

Ontology Matching

Most ontology matching techniques work only in cases when:
– There is a sufficient syntactic overlap between the labels of the concepts in the matched ontologies
– The structure of the matched ontologies is rich enough to allow meaningful comparisons

However, this might not be the case for smart products:
– Smart products from different domains will use very different terminology, thus excluding syntactic comparison
– Smart product ontologies will probably be small and structurally shallow due to their resource limitations, thus excluding the use of structure-based techniques

Therefore we propose a new matching paradigm which relies on the use of external knowledge

Ontology Matching

New paradigm: use of background knowledge

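[Diagram: concepts A and B are anchored to A’ and B’ in the Background Knowledge (external source); a relation R found between A’ and B’ yields the relation rel between A and B]

Proposal:
• rely on online ontologies (Semantic Web) to derive mappings
• ontologies are dynamically discovered (using Watson) and combined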

Semantic Web

External Source = Semantic Web

Does not rely on any pre-selected knowledge sources.

Sabou, M., d'Aquin, M., and Motta, E. (2008) Exploring the Semantic Web as Background Knowledge for Ontology Matching, Journal on Data Semantics, XI.

The Question is …

How to combine online ontologies to derive mappings?

Strategy 1 - Definition

Find ontologies that contain equivalent classes for A and B and use their relationship in the ontologies to derive the mapping.

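[Diagram: the relation rel between A and B is derived from ontologies O1, O2, …, On on the Semantic Web, which contain the equivalent classes (A1’, B1’), (A2’, B2’), …, (An’, Bn’)]

For each ontology use these rules:

A’ ≡ B’ ⇒ A ≡ B
A’ ⊆ B’ ⇒ A ⊆ B
A’ ⊇ B’ ⇒ A ⊇ B
A’ ⊥ B’ ⇒ A ⊥ B

…

These rules can be extended to take into account indirect relations between A’ and B’, e.g., between parents of A’ and B’:

A’ ⊆ C’ ∧ C’ ⊥ B’ ⇒ A ⊥ B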
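Strategy 1 can be sketched in Python as a lookup over a set of ontologies, each reduced to a set of triples. The representation, the function name `strategy1`, the reduction of anchoring to exact label equality and the toy data are all assumptions made for illustration; Scarlet itself discovers the ontologies through Watson.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]            # (A', relation, B')
RELATIONS = {"≡", "⊆", "⊇", "⊥"}

def strategy1(a: str, b: str, ontologies: Dict[str, Set[Triple]]) -> List[Tuple[str, str]]:
    """Return (ontology, relation) for each ontology that relates a and b directly."""
    mappings = []
    for name in sorted(ontologies):
        for a_prime, rel, b_prime in sorted(ontologies[name]):
            # Anchoring A -> A' and B -> B' is reduced to exact label equality here.
            if a_prime == a and b_prime == b and rel in RELATIONS:
                mappings.append((name, rel))   # A' rel B'  =>  A rel B
    return mappings

# Hypothetical toy data standing in for online ontologies O1 ... On.
toy_web = {
    "O1": {("Researcher", "⊆", "AcademicStaff")},
    "O2": {("Researcher", "⊆", "AcademicStaff"), ("Beef", "⊆", "RedMeat")},
}
mappings = strategy1("Researcher", "AcademicStaff", toy_web)
no_mapping = strategy1("Beef", "Food", toy_web)
```

Here both toy ontologies agree on the relation, while the query for Beef and Food returns nothing because no single ontology states a direct relation between them.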

Strategy 1- Examples

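[Diagram: Researcher vs. AcademicStaff. Researcher ⊆ AcademicStaff is found in online ontologies on the Semantic Web, giving the mapping Researcher ⊆ AcademicStaff (labels shown: ka2.rdf, ISWC, SWRC)]

[Diagram: Beef vs. Food. In Tap, Beef ⊆ RedMeat ⊆ MeatOrPoultry ⊆ Food, giving the mapping Beef ⊆ Food (labels shown: SR-16, FAO_Agrovoc)]

But what if there exists no ontology that contains both A and B?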

Strategy 2 - Definition

Principle: If no ontologies are found that contain the two terms then combine information from multiple ontologies to find a mapping.

[Diagram: the relation rel between A and B is derived by combining a relation between A’ and C in one ontology on the Semantic Web with a relation between C’ and B’ in another]

Details:
(1) Select all ontologies containing A’ equiv. with A
(2) For each ontology containing A’:
    (a) if A’ ⊆ C, find relation between C and B.
    (b) if A’ ⊇ C, find relation between C and B.

Rules:
(r1) A’ ⊆ C ∧ C ⊆ B ⇒ A ⊆ B
(r2) A’ ⊆ C ∧ C ≡ B ⇒ A ⊆ B
(r3) A’ ⊆ C ∧ C ⊥ B ⇒ A ⊥ B
(r4) A’ ⊇ C ∧ C ⊇ B ⇒ A ⊇ B
(r5) A’ ⊇ C ∧ C ≡ B ⇒ A ⊇ B
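The composition step of Strategy 2 can be sketched as a lookup table over pairs of relations. The table below is one reading of rules (r1) to (r5); the triple representation, the function name `strategy2` and exact-label anchoring are assumptions, and the sample triples are taken from the Ham examples in this deck.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]   # (concept, relation, concept)

# (relation between A' and C, relation between C and B) -> relation between A and B
RULES = {
    ("⊆", "⊆"): "⊆",   # r1
    ("⊆", "≡"): "⊆",   # r2
    ("⊆", "⊥"): "⊥",   # r3
    ("⊇", "⊇"): "⊇",   # r4
    ("⊇", "≡"): "⊇",   # r5
}

def strategy2(a: str, b: str, ontologies: Dict[str, Set[Triple]]) -> Optional[str]:
    """Combine a relation A' rel1 C from one ontology with C rel2 B from any ontology."""
    for first in ontologies.values():
        for x, rel1, c in first:
            if x != a:
                continue
            for second in ontologies.values():
                for c2, rel2, y in second:
                    if c2 == c and y == b and (rel1, rel2) in RULES:
                        return RULES[(rel1, rel2)]
    return None

web = {
    "pizza-to-go": {("Ham", "⊆", "Meat")},
    "SUMO": {("Meat", "⊆", "Food")},
    "wine.owl": {("Meat", "⊥", "Seafood")},
}
ham_food = strategy2("Ham", "Food", web)
ham_seafood = strategy2("Ham", "Seafood", web)
```

The sketch returns the first derivation it finds; a fuller version would collect all derivations and reconcile conflicting ones.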

Strategy 2 - Examples

Ex1: Chicken vs. Food
Chicken ⊆ Poultry (midlevel-onto)
Poultry ⊆ Food (Tap)
⇒ Chicken ⊆ Food (r1)
(Same results for Duck, Goose, Turkey)

Ex2: Ham vs. Food
Ham ⊆ Meat (pizza-to-go)
Meat ⊆ Food (SUMO)
⇒ Ham ⊆ Food (r1)

Ex3: Ham vs. Seafood
Ham ⊆ Meat (pizza-to-go)
Meat ⊥ Seafood (wine.owl)
⇒ Ham ⊥ Seafood (r3)

Large Scale Evaluation

• Ontology Alignment Evaluation Initiative - food

• AGROVOC: UN FAO’s agricultural thesaurus
  – ±16,000 terms
  – multilingual

• NALT: United States National Agricultural Library Thesaurus
  – ±41,000 terms

• Precision obtained: 70%

[Diagram: Scarlet accesses the Semantic Web to deduce the Semantic Relation between Concept_A (e.g., Supermarket) and Concept_B (e.g., Building); in this example the relation is ⊆]

- SCARLET - relation discovery on the SW

- http://scarlet.open.ac.uk/

- Automatically selects and combines multiple online ontologies to derive a relation

Basic functionality used: Relation Discovery

Watson

• Infrastructures for storing large scale semantics (Watson)

• The story here is that Watson could be used as a starting point for building a storage infrastructure for the distributed semantic information in SmartProducts.

Watson is a Search Engine for the Semantic Web

Gateway

Architecture

Web Interface
– Advanced Keyword Search
– Ontology Exploration
– Ontology Metadata
– Querying

APIs

• SOAP and REST APIs that provide the infrastructure to:
  – Find SW documents and retrieve metadata about them
  – Find entities (classes, properties, individuals) and explore their semantic description
  – Apply SPARQL queries to Semantic Web documents

Next Generation Semantic Web Applications

WATSON enables a new generation of Semantic Web applications that need to access and reuse semantic information distributed on the entire Web.

Examples of NGSW

IEEE Intelligent Systems 23(3), pp. 20-28, May/June 2008

• Key aspects of the paradigm
• Tech. Infrastructure
• Concrete Applications

PowerAqua

• Tools for exploiting distributed semantics
  – Open Domain, Multi-Ontology Question Answering (PowerAqua)

PowerAqua

PowerAqua
  – Cross-ontology question answering
  – Selects and combines relevant information from multiple ontologies

PowerAqua

Natural language question

Answers from online semantic data

Open domain QA by exploring distributed semantic data.

PowerAqua: Architecture

• Steps 2 and 3 implement a run-time knowledge matcher that efficiently produces mappings across ontologies and domains

• Performs concept, relation, instance and literal mapping

• No assumptions on the user input

• No assumptions on the domain, structure or complexity of ontologies

Education
