Benchmarking in KW. Sep 10th, 2004 © R. García-Castro, A. Gómez-Pérez
Benchmarking in Knowledge Web

Raúl García-Castro, Asunción Gómez-Pérez <rgarcia,[email protected]>
Jérôme Euzenat <[email protected]>

September 10th, 2004
Industrial Benchmarking ≠ Research Benchmarking
WP 1.2 (from T.A. page 26) / WP 2.1 (from T.A. page 41)

Point of view:
• Industrial: tool recommendation
• Research: research progress

Criteria:
• Industrial: utility
• Research: scalability, robustness, interoperability

Tools:
• Industrial: ontology development tools; annotation tools; querying and reasoning services of ontology development tools; merging and alignment tools
• Research: ontology development tools; annotation tools; querying and reasoning services of ontology development tools; Semantic Web Service technology
Index
• Benchmarking activities in Knowledge Web
• Benchmarking in WP 2.1
• Benchmarking in WP 2.2
• Benchmarking information repository
• Benchmarking in Knowledge Web
Benchmarking activities in KW

Overview of the benchmarking activities:
• Progress
• What to expect from them
• What are their relationships/dependencies
• What could be shared/reused between them
Benchmarking timeline (months 0–48)

Progress: finished / started / not started

WP 1.2 (Roberta Cuel):
• D1.2.1: Utility of ontology development tools
• Utility of merging, alignment, annotation
• Performance of querying, reasoning

WP 1.3 (Luigi Lancieri):
• D1.3.1: Best practices and guidelines for industry
• Best practices and guidelines for business cases

WP 2.1 (Raúl García):
• D2.1.1: Benchmarking SoA
• D2.1.4: Benchmarking methodology, criteria, test suites
• D2.1.6: Benchmarking building tools
• Benchmarking querying, reasoning, annotation
• Benchmarking web service technology

WP 2.2 (Jérôme Euzenat):
• D2.2.2: Benchmarking methodology for alignment
• D2.2.4: Benchmarking alignment results
Benchmarking relationships (months 6–24)

• T 2.1.1 SoA on the technology of the scalability WP → benchmarking overview; SoA of ontology technology evaluation
• T 2.1.4 Definition of a methodology, general criteria for benchmarking → benchmarking methodology; benchmark suites
• T 2.1.6 Benchmarking of ontology building tools
• T 2.2.2 Design of a benchmark suite for alignment → benchmarking methodology for alignment; benchmark suite for alignment
• T 2.2.4 Research on alignment techniques and implementations
• T 1.2.1 Utility of ontology-based tools
• T 1.3.1 Best Practices and Guidelines → best practices
Index
• Benchmarking activities in Knowledge Web
• Benchmarking in WP 2.1
• Benchmarking in WP 2.2
• Benchmarking information repository
• Benchmarking in Knowledge Web
Benchmarking in WP 2.1 (months 0–48)

T 2.1.1 State of the Art:
• Overview of benchmarking, experimentation, and measurement
• SoA of ontology technology evaluation

T 2.1.4 Definition of a methodology, general criteria for ontology tools benchmarking:
• Benchmarking methodology
• Types of tools to be benchmarked: ontology building tools; annotation tools; querying and reasoning services of ontology development tools; Semantic Web Services technology
• General evaluation criteria: interoperability, scalability, robustness
• Test suites for each type of tool
• Benchmarking supporting tools

T 2.1.6 Benchmarking of ontology building tools:
• Specific evaluation criteria: interoperability, scalability, robustness
• Test suites for ontology building tools
• Benchmarking supporting tools

T 2.1.x Benchmarking of querying, reasoning, annotation, web service technology
T 2.1.1: Benchmarking Ontology Technology, in D 2.1.1 Survey of Scalability Techniques for Reasoning with Ontologies

• Overview of benchmarking, experimentation, and measurement
• State of the Art of Ontology-based Technology Evaluation

[Figure: ontology technology/methods are subject to evaluation, measurement, and experimentation; benchmarking builds on these and yields desired attributes, weaknesses, comparative analyses, recommendations, best practices, and continuous improvement.]
T 2.1.4: Benchmarking methodology, criteria, and test suites

Methodology:
Plan:
 1. Goals identification
 2. Subject identification
 3. Management involvement
 4. Participant identification
 5. Planning and resource allocation
 6. Partner selection
Experiment:
 7. Experiment definition
 8. Experiment execution
 9. Experiment results analysis
Improve:
 10. Report writing
 11. Findings communication
 12. Findings implementation
 13. Recalibration

General evaluation criteria:
• Interoperability
• Scalability
• Robustness

Benchmark suites for:
• Ontology building tools
• Annotation tools
• Querying and reasoning services
• Semantic Web Services technology

Benchmarking supporting tools:
• Workload generators
• Test generators
• Statistical packages
• ...
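Among the supporting tools, a workload or test generator for the scalability criterion can be as simple as emitting synthetic ontologies of growing size. A hedged sketch, not a Knowledge Web deliverable; the chain-shaped class hierarchy and the triple representation are illustrative assumptions:

```python
def make_class_hierarchy(n_classes):
    """Generate a synthetic ontology as (subject, predicate, object)
    triples: a chain of n_classes classes, each a subclass of the
    previous one. Illustrative workload for import/reasoning benchmarks;
    the "ex:" prefix is a made-up namespace."""
    triples = []
    for i in range(n_classes):
        triples.append((f"ex:C{i}", "rdf:type", "rdfs:Class"))
        if i > 0:
            triples.append((f"ex:C{i}", "rdfs:subClassOf", f"ex:C{i-1}"))
    return triples

# A scalability suite would sweep the workload size: 10, 100, 1000 classes.
workloads = {n: make_class_hierarchy(n) for n in (10, 100, 1000)}
```

Real generators would vary the shape (depth vs. breadth, properties, instances) as well as the size, since tools often degrade on one dimension but not another.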
T 2.1.6: Benchmarking of ontology building tools

Partners/Tools: UPM, ...

Benchmark suites:
• Interoperability (x tests)
• Scalability (y tests)
• Robustness (z tests)

Interoperability benchmark suites:
• RDF(S) import capability
• OWL import capability
• RDF(S) export capability
• OWL export capability

Experiments:
• Import/export RDF(S) ontologies
• Import/export OWL ontologies
• Check for knowledge loss
• ...

Experiment results: per-test outcomes (test 1, test 2, test 3, ...: OK / not OK)

Benchmarking results:
• Comparative
• Weaknesses
• (Best) practices
• Recommendations

Interoperability questions:
• Do the tools import/export from/to RDF(S)/OWL?
• Are the imported/exported ontologies the same?
• Is there any knowledge loss during import/export?
• ...
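The interoperability questions boil down to a round-trip test: export an ontology, re-import it, and compare the result with the original. A minimal, tool-independent sketch; the triple-set representation and the toy N-Triples-like serialization are assumptions for illustration, not any actual tool's API:

```python
# Hedged sketch: an ontology modeled as a set of (subject, predicate, object)
# triples; export/import stand in for a tool's RDF(S)/OWL serializers.
def export_ntriples(triples):
    """Serialize triples to a simple N-Triples-like text (toy format)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))

def import_ntriples(text):
    """Parse the toy format back into a set of triples."""
    triples = set()
    for line in text.splitlines():
        parts = line.strip().rstrip(" .").split("> <")
        s, p, o = (part.strip("<>") for part in parts)
        triples.add((s, p, o))
    return triples

def knowledge_loss(original, reimported):
    """Triples present before export but missing after re-import."""
    return original - reimported

ontology = {("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
            ("ex:Animal", "rdf:type", "rdfs:Class")}
round_tripped = import_ntriples(export_ntriples(ontology))
assert knowledge_loss(ontology, round_tripped) == set()  # no loss here
```

Against a real tool the comparison is harder than set difference, since blank-node renaming and equivalent syntactic forms must be tolerated; the sketch only shows the shape of the check.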
Index
• Benchmarking activities in Knowledge Web
• Benchmarking in WP 2.1
• Benchmarking in WP 2.2
• Benchmarking information repository
• Benchmarking in Knowledge Web
T 2.2.2 Design of a benchmark suite for alignment

Why evaluate?
• Comparing the possible solutions;
• Detecting the best methods;
• Finding out our weak points.

Goals:
• For the developer: improving the solutions;
• For the user: choosing the best tools;
• For both: testing compliance with a norm.

How to evaluate?
• Take a real-life case and set the deadline;
• Take several cases, normalizing them;
• Take simple cases, identifying what they highlight (benchmark suite);
• Build a challenge (MUC, TREC).

Results:
• Benchmarking methodology for alignment techniques;
• Benchmark suite for alignment;
• First evaluation campaign;
• Greater benchmarking effort.
T 2.2.2 What has been done?

Information Interpretation and Integration Conference (I3CON), held at the NIST Performance Metrics for Intelligent Systems (PerMIS) Workshop: focuses on "real-life" test cases and compares the algorithms' global performance.

Facts:
• 7 ontology pairs;
• 5 participants;
• Undisclosed target alignments (independently made);
• Alignments requested in a normalized format;
• Evaluation on the F-measure.

Results:
• Difficult to find pairs in the wild (they had to be created);
• No dominating algorithm, no single most difficult case for all;
• 5 participants was the targeted number; we must have more next time!
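The evaluation on the F-measure is standard precision/recall arithmetic over sets of correspondences; a sketch (the pair representation and the example entity names are illustrative assumptions, not the contest's actual format):

```python
def precision_recall_f(found, reference):
    """Precision, recall, and F-measure (their harmonic mean) of a
    proposed alignment against a reference alignment, both given as
    sets of (entity1, entity2) correspondences."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# Toy example with made-up entity names:
reference = {("o1#Person", "o2#Human"), ("o1#writes", "o2#authors"),
             ("o1#Paper", "o2#Article")}
found = {("o1#Person", "o2#Human"), ("o1#Paper", "o2#Document")}
p, r, f = precision_recall_f(found, reference)
# precision 1/2, recall 1/3, F-measure 2/5
```

Because the I3CON target alignments were undisclosed, participants submitted alignments in a normalized format and the organizers computed these scores against the hidden reference.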
The Ontology Alignment Contest at the 3rd Evaluation of Ontology-based Tools (EON) Workshop, to be held at the International Semantic Web Conference (ISWC): aims at defining a proper set of benchmark tests for assessing feature-related behavior.

Facts:
• 1 ontology and 20 variations (15 hand-crafted on some particular aspects);
• Target alignment (made on purpose) published;
• Participants asked for a paper, with comments on the tests and on the achieved results (as well as the results in normalized format).

Results: we are currently benchmarking the tools!

See you at the EON Workshop, ISWC 2004, Hiroshima, JP, November …
T 2.2.2 What’s next?
• More consensus on what’s to be done?
• Learn more
• Take advantage of the remarks
• Make a more complete setup: real-world cases + benchmark suite + challenge?
• Provide automated procedures
Index
• Benchmarking activities in Knowledge Web
• Benchmarking in WP 2.1
• Benchmarking in WP 2.2
• Benchmarking information repository
• Benchmarking in Knowledge Web
Benchmarking information repository

Web pages inside the Knowledge Web portal with:
• General benchmarking information (methodology, criteria, test suites, references, ...)
• Information about the different benchmarking activities in Knowledge Web
• Benchmarking results and lessons learned
• ...

Objectives:
• Inform
• Coordinate
• Share/reuse
• ...

Proposal for a benchmarking working group in the SDK cluster.
Index
• Benchmarking activities in Knowledge Web
• Benchmarking in WP 2.1
• Benchmarking in WP 2.2
• Benchmarking information repository
• Benchmarking in Knowledge Web
What is benchmarking in Knowledge Web?

In Knowledge Web:
• Benchmarking is performed over products/methods (not processes)
• Benchmarking is not a continuous process: it ends with findings communication; there is no findings implementation or recalibration
• Benchmarking technology involves evaluating technology
• Benchmarking technology is NOT just evaluating technology: we must extract practices and best practices
• Benchmarking results: comparative analyses, weaknesses, (best) practices, recommendations, (continuous) improvement
• Benchmarking results are needed, both in industry and in research!
• ...
How much do we share?

Benchmarking methodology, criteria, and test suites:
• Is the view of benchmarking from industry “similar” to the view from research?
• Is it viable to have a common methodology? Will anyone use it?
• Can the test suites be reused between industry and research?
• Would a common way of presenting test suites be useful?
• ...

Benchmarking results:
• Can research benchmarking results be (re)used by industry, and vice versa?
• Would a common way of presenting results be useful?
• ...
Next steps

Provide the benchmarking methodology to industry:
• First draft after the Manchester Research meeting: 1st October.
• Feedback from WP 1.2: end of October.
• (Almost) final version by mid-November.

Set up web pages with benchmarking information in the portal:
• Benchmarking activities
• Methodology
• Criteria
• Test suites

Discuss in a mailing list and agree on a definition of “best practice”.

Next meeting? To be decided (around November) (with O2I).
Benchmarking in Knowledge Web

Raúl García-Castro, Asunción Gómez-Pérez <rgarcia,[email protected]>
Jérôme Euzenat <[email protected]>

September 10th, 2004