Semantically Enhanced Quality Assurance in the JURION Business Use Case

Dimitris Kontokostas, Christian Mader, Christian Dirschl, Katja Eck, Michael Leuthold, Jens Lehmann, Sebastian Hellmann

ESWC 2016

Overview

● Wolters Kluwer overview
● Use Case Tools
● Challenges
● Solutions
● Evaluation
● Future Work

Wolters Kluwer

Wolters Kluwer provides solutions to customers in over 170 countries and provides content in at least a dozen languages.

Focusing on legal, tax, finance and health industries.

Wolters Kluwer Transformation

Wolters Kluwer Transformation

Quality

WKD in LOD2 project

WKD in the ALIGNED Project

RDF in the publishing industry

Use Case Tools

RDFUnit

● TDDD: Test Driven (Data) Development
○ Methodology, definitions & tools
● SPARQL
● Reusable unit tests for
○ vocabularies
○ datasets
○ applications
● Test auto generators
○ OWL
○ IBM Shapes
○ DSP (Dublin Core Description Set Profiles)
○ W3C Shapes (in progress)
● Open source
● Stable tool, used in many research & industrial settings

http://rdfunit.aksw.org
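
The test auto generator idea can be illustrated with a minimal Python sketch that turns a schema axiom into a SPARQL test query. This is not RDFUnit's actual pattern library: the function name `domain_violation_query` and the `http://example.org/pci#` URIs are hypothetical, and subclass inference is ignored. It only shows the general pattern that every resource using a property with a declared rdfs:domain should be typed with that domain class.

```python
def domain_violation_query(prop: str, domain: str) -> str:
    """Build a SPARQL query selecting resources that violate an rdfs:domain axiom.

    A resource is reported when it uses `prop` but is not typed as `domain`.
    Simplification (assumption): subclass inference is not considered.
    """
    return (
        "SELECT DISTINCT ?resource WHERE {\n"
        f"  ?resource <{prop}> ?value .\n"
        f"  FILTER NOT EXISTS {{ ?resource a <{domain}> }}\n"
        "}"
    )


# Hypothetical schema axioms as (property, rdfs:domain) pairs
axioms = [
    ("http://example.org/pci#courtName", "http://example.org/pci#CourtDecision"),
    ("http://example.org/pci#isbn", "http://example.org/pci#Book"),
]

# One generated test query per axiom
tests = [domain_violation_query(p, d) for p, d in axioms]
print(tests[0])
```

Each generated query returns the violating resources, so an empty result means the test passes. Because the queries are derived from the schema, they can be regenerated whenever the schema changes.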

PoolParty

https://www.poolparty.biz

● Commercial product developed by Semantic Web Company
● Collaborative thesaurus development
○ From scratch or by extracting terms from a document corpus
● Compliance with the 5-star Open Data principles (RDF & SKOS)
● Automatic retrieval of candidate concepts for inclusion in the thesauri by querying SPARQL endpoints (e.g. DBpedia)
● Identification of, and linking to, related resources from local / remote projects
● Simple ontology editing (rdf:type, rdfs:subClassOf, rdfs:domain/range, ...)
● Automated quality assurance mechanisms
○ Conformance to SKOS or a custom schema
○ Enforcement level of some quality metrics can be configured by the user, e.g. to get an alert if a circular hierarchical relation is detected
○ Check a taxonomy “as a whole” against a set of potential quality violations
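
As an illustration of the circular-hierarchy check mentioned above, the following Python sketch looks for a cycle in broader relations using a depth-first search. It is not PoolParty's implementation: the function `find_broader_cycle`, the dictionary representation of skos:broader links, and the concept labels are all assumptions made for this example.

```python
from typing import Dict, List, Optional


def find_broader_cycle(broader: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in the broader-relation graph, or None if it is acyclic.

    `broader` maps a concept to the concepts it declares as broader.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = GREY
        path.append(node)
        for parent in broader.get(node, []):
            if state.get(parent, WHITE) == GREY:
                # parent is already on the current path: a cycle is closed
                return path[path.index(parent):] + [parent]
            if state.get(parent, WHITE) == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for concept in broader:
        if state.get(concept, WHITE) == WHITE:
            cycle = visit(concept)
            if cycle:
                return cycle
    return None


# Hypothetical thesaurus with a circular hierarchy
thesaurus = {
    "contract law": ["civil law"],
    "civil law": ["law"],
    "law": ["contract law"],  # closes the cycle
    "tax law": ["law"],
}
acyclic = {"contract law": ["civil law"], "civil law": ["law"]}

print(find_broader_cycle(thesaurus))
print(find_broader_cycle(acyclic))
```

A configurable enforcement level could then decide whether a detected cycle only raises an alert or blocks the edit.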

Challenges

Metadata RDF Conversion Verification

Existing Infrastructure

● Platform Content Interface (PCI) ontology

○ proprietary schema that describes legal documents and metadata in OWL

● PCI revisions => verify data conforms to PCI

● Proprietary SOAP-based validation service

○ Package-based validation => errors are hard to detect

○ Asynchronous & complex web service => hard to use

○ Network dependency => potentially unstable

Metadata RDF Conversion Verification

Continuous, high-quality triplification of semi-structured data is a common problem in the information industry. Schema changes and enhancements are routine tasks, but ensuring data quality is still very often a purely manual effort, so any automation will support many real-life use cases in different domains.

Goal: Test cases should be created automatically from the schema and run regularly against the data that needs to be transformed. The errors detected lead to refinements and changes of the XSLT scripts, and sometimes also to schema changes, which in turn produce new automatically created test cases.
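
The loop described in the goal (schema in, generated tests out, tests run against converted data) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the JURION pipeline: triples are plain tuples held in memory, the schema is reduced to hypothetical rdfs:domain axioms, and the `pci:` and `doc:` names are invented for illustration.

```python
RDF_TYPE = "rdf:type"

# Hypothetical schema: property -> class its subjects must have (rdfs:domain)
schema = {"pci:courtName": "pci:CourtDecision"}


def generate_tests(schema):
    """Create one test (a function over a set of triples) per domain axiom."""
    def make_test(prop, cls):
        def test(triples):
            typed = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
            # Subjects that use the property without having the required type
            return sorted({s for s, p, o in triples if p == prop and s not in typed})
        return test
    return {f"domain of {prop}": make_test(prop, cls) for prop, cls in schema.items()}


def run_tests(tests, triples):
    """Run every test and keep only those that report violations."""
    results = {name: test(triples) for name, test in tests.items()}
    return {name: found for name, found in results.items() if found}


# Hypothetical output of the XML-to-RDF conversion
converted = {
    ("doc:1", RDF_TYPE, "pci:CourtDecision"),
    ("doc:1", "pci:courtName", "BGH"),
    ("doc:2", "pci:courtName", "BAG"),  # rdf:type triple is missing
}

failures = run_tests(generate_tests(schema), converted)
print(failures)  # {'domain of pci:courtName': ['doc:2']}
```

When the schema changes, calling `generate_tests` again yields the updated test suite, which mirrors the cycle of schema revision and automatic test regeneration described above.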

RDFUnit / JUnit Integration

Quality Control in Thesaurus Management

● WKD develops multiple controlled vocabularies for annotating documents (e.g., court decision, labour law,...) using PoolParty

● Interconnected to each other
● Consistency and quality must be ensured over all vocabularies
● Various quality issues, e.g.,
○ Duplicates
○ Links to deprecated (deleted) concepts
○ Unresolvable links

● Up to now curated manually in the deployed system, with regular errors in production versions

Quality Control in Thesaurus Management

The creation and maintenance of knowledge models is gaining importance in the Web of Data. These tasks are increasingly being carried out by subject-matter experts (SMEs) in the domain rather than by specialists in knowledge modelling and IT. Therefore, better automatic support of these processes will directly help achieve quality and efficiency gains.

● Automated quality checks over multiple vocabularies
● Improved notifications: email on changes performed by users
● Additional statistics on, e.g., vocabulary dependencies, changes, etc.

Vocabulary link validation (PoolParty)

● Uses project metadata to identify linked vocabularies

● Link is invalid if target concept is either deprecated or deleted

● Creates a report for human curators

● Vocabulary repair still manual process
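
The validity rule above (a link is invalid if its target concept is deprecated or deleted) can be sketched as follows. This is a simplified illustration, not the PoolParty tool: the `Link` class, the `validate_links` function and the `http://example.org/...` URIs are hypothetical, and the step of discovering linked vocabularies from project metadata is replaced by a hand-written dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Link:
    source: str  # concept in the vocabulary being checked
    target: str  # concept in a linked vocabulary


def validate_links(links: List[Link], deprecated: Dict[str, bool]) -> List[Tuple[Link, str]]:
    """Return (link, reason) pairs for every invalid link.

    `deprecated` maps each concept URI that still exists in the linked
    vocabularies to True if it is deprecated; a missing URI means deleted.
    """
    report = []
    for link in links:
        if link.target not in deprecated:
            report.append((link, "target deleted"))
        elif deprecated[link.target]:
            report.append((link, "target deprecated"))
    return report


# Hypothetical data: concepts that exist in the linked vocabulary
linked_vocabulary = {
    "http://example.org/courts/bgh": False,
    "http://example.org/courts/old-court": True,  # deprecated
}
links = [
    Link("http://example.org/labour/c1", "http://example.org/courts/bgh"),
    Link("http://example.org/labour/c2", "http://example.org/courts/old-court"),
    Link("http://example.org/labour/c3", "http://example.org/courts/gone"),
]

# The report is what a human curator would receive; repair stays manual
report = validate_links(links, linked_vocabulary)
for link, reason in report:
    print(f"{link.source} -> {link.target}: {reason}")
```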


Results & Evaluation

The analysis is based on measured metrics and the qualitative feedback of experts and users.

Participants of the evaluation study were selected from WKD staff in the fields of software development and data development. There were seven participants in total: four involved in the expert evaluation and three content experts involved in the usability/interview evaluation.

● Productivity

● Quality

● Agility

Productivity (RDFUnit)

● Total time for quality checks and error detection
● The time needed for manual interaction

What we measured:

● 1 ms to 50 ms per single test (depending on the document / ontology size)
○ As close to real time as possible; currently a couple of minutes

● Quality checks can be triggered manually, but they are always verified automatically by the CI build system

● A total of 44,000 tests with a total duration of 11 minutes
○ May scale up easily when parallelized or clustered
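
As a quick sanity check, the reported totals can be related to the per-test range with a few lines of Python. The average comes straight from the numbers on the slide; the parallel estimate assumes ideal linear speed-up across workers, which is an assumption for illustration and not a measured result.

```python
total_tests = 44_000
total_seconds = 11 * 60  # 11 minutes

# Average time per test in milliseconds
avg_ms = total_seconds / total_tests * 1000
print(f"{avg_ms:.1f} ms per test")  # 15.0 ms per test


def ideal_runtime_minutes(workers: int) -> float:
    """Runtime in minutes under the (optimistic) assumption of linear speed-up."""
    return total_seconds / workers / 60


for workers in (1, 4, 8):
    print(workers, "workers:", round(ideal_runtime_minutes(workers), 2), "min")
```

The average of about 15 ms lies within the reported range of 1 ms to 50 ms per test.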

Quality (RDFUnit)

What kind of errors can be detected and is categorization possible?

● Experts concluded that it is helpful to spot errors introduced by changes, since issues spotted in this way can be assumed to point to actual errors whose causes can be identified and addressed

● Successful tests are less significant: we cannot yet evaluate whether and how the measurements taken correspond to target measures, and these tests do not point to concrete errors.

○ Coverage & other metrics needed

Agility (RDFUnit)

… time to include new requirements

● Including new constraints or adapting existing constraints works by adding new reference documents to the input dataset to make the test environment as representative as possible.

● The process of generating tests and testing is fully automated, so it adapts very easily to changed parameters.

● Adding more documents to the input dataset increases the total runtime

Productivity (Links)

● The number of checked links
● The number of violations
● The total time

What we measured:

The presentation of the results was well understood. In general, the tool was received well by the experts, which was reflected by their feedback in the interviews.

Quality (Links)

● No links were falsely reported as broken
● Prototype still lacks some usability

Agility (Links)

… integration, configuration time and extension

● Very useful for getting an overview

● In some cases it is desirable to limit the link lookups and adapt the way links to external datasets are detected

○ Use a custom base URI or regular expression-based techniques

● Re-configuration is possible, but recompiling the application might be needed

○ Plans to delegate this process to UnifiedViews
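
One way to picture the base URI and regular expression options is a small configurable filter that decides which link targets are looked up. The sketch below is an assumption about how such a setting could behave, not the tool's actual configuration: `make_link_filter`, its parameters and the example URIs are hypothetical.

```python
import re
from typing import Callable, Optional


def make_link_filter(base_uri: Optional[str] = None,
                     pattern: Optional[str] = None) -> Callable[[str], bool]:
    """Return a predicate that tells whether a link target should be looked up.

    base_uri: targets starting with it belong to the local vocabulary and are skipped.
    pattern:  if given, only external targets matching this regular expression are checked.
    """
    regex = re.compile(pattern) if pattern else None

    def should_check(uri: str) -> bool:
        if base_uri is not None and uri.startswith(base_uri):
            return False  # local concept, not an external link
        if regex is not None:
            return regex.search(uri) is not None
        return True

    return should_check


# Hypothetical configuration: check only links into sibling thesauri
should_check = make_link_filter(
    base_uri="http://example.org/thesaurus/courts/",
    pattern=r"^http://example\.org/thesaurus/",
)

targets = [
    "http://example.org/thesaurus/courts/c1",   # local, skipped
    "http://example.org/thesaurus/labour/c7",   # external and matching, checked
    "http://dbpedia.org/resource/Labour_law",   # external but filtered out
]
to_check = [t for t in targets if should_check(t)]
print(to_check)
```

Reading such settings from a configuration file rather than from code would avoid the recompilation step mentioned above.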

Future Work

● Error analysis (statistics, time to fix an issue, regressions)

● Test coverage and better metrics

● Improve the UI of the Link Validation tool

● Provide more advanced settings

● Inter-repository Link Validation

Thank You!

Questions?

(You might want to) take a look at… the RDF and XML Interoperability W3C Community Group: https://www.w3.org/community/rax/