Giorgos Flouris Open Data Tutorials, May 2013 1
Data and Knowledge Evolution
Slides available at: http://www.ics.forth.gr/~fgeo/Publications/WOD13.ppt
Giorgos Flouris
Open Data Tutorials, May 2013
World Wide Web
WWW (and HTML) focus on human readability
Page presentation (fonts, colors, images, …)
Human understanding
Presentation rather than semantical content
Content is not formally described (for a machine to understand)
WWW contains documents, not data
Problems with the Current Web
Search and access become difficult
Software ignorant of the semantical content of a web page
Keyword search
High recall, low precision
Terminological issues
Synonyms (heart disease = cardiac disease)
Hyponyms/hypernyms (parliament members are politicians)
Queries on the semantical content cannot be made
Fetch articles that support B. Obama’s foreign policy
Fetch the home pages of all members of the Greek Parliament
Semantic Web
The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation [BLHL01]
The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries (http://www.w3.org/2001/sw/)
[Semantic Web] is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners (http://www.w3.org/2001/sw/)
Semantic Web in Practice
Web of data, rather than documents
HTML for presentation
Semantical languages for semantical content
Readable and understandable by humans and machines
Semantic Web languages, protocols, etc.
Web page annotation (metadata descriptions etc.)
Publication of data on the Internet
Efficient communication and manipulation of data over the Internet
Different applications
Efficient searching
Sharing of data (e-science, e-government, remote learning, …)
Linked Open Data (more on that later)
Ontologies and Data (Datasets)
An ontology is an explicit specification of a shared conceptualization of a domain [Gru93]
Precise, logical account of the intended meaning of terms
Common (shared) interpretation of terms
Formal vocabulary for information exchange (humans/machines)
Ontologies (vocabularies) allow the description of data
Terminology:
Ontology = vocabulary = schema
Data = instances
Dataset = data and the related ontology (i.e., a dataset may contain schema and/or data)
Dataset Dynamics
Datasets change constantly
World changes (dynamic models)
View on the world changes (new knowledge, measurements, etc.)
Perspective and usage changes
Example:
Gene Ontology (information about gene products): daily versions
DBPedia: 1.4 updates/second (http://live.dbpedia.org/LiveStats/) [MLA+12]
Need methodologies to cope with the problems related to dynamicity
Evolution (modify a dataset in response to a change)
Versioning (keep track of versions and their relations)
Debugging, cleaning, repairing, quality (maintain consistency and quality in a dynamic environment)
Change monitoring, detection and propagation (identify changes and use them to synchronize remote datasets)
…
Linked (Open) Data
Datasets can be interlinked
Sharing knowledge
Reusing knowledge
Modular development
Reuse of schemas
Linked Open Data (LOD) movement
Constantly growing
31 billion triples and 295 datasets as of September 2011
Linked Open Data Cloud Diagram
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Linked Open Data Challenges
Both a blessing and a curse
Added-value benefits
Discovery of unknown correlations, connections, relationships
Vast amount of interrelated knowledge
No central control, everyone can publish and relate to others
Quality of datasets depends on different providers
A change in one dataset affects all related ones
Several new problems related to dynamics
Propagation of changes among interrelated datasets
Maintaining the quality of local datasets
Co-evolution
Scope: Dynamic Linked Datasets
[Figure: Dynamic Datasets and Linked Datasets, with a "You are here" marker indicating the scope of this talk]
Purpose of This Talk
To survey different research areas related to dynamic LOD
Remote Change Management
Repair
Data and Knowledge Evolution
Categorize and classify works in each field
Broad but shallow description
Several references for more in-depth study
No claims of completeness (references are just indicative)
Two relevant surveys: [FMK+08, ZAA+13]
Emphasis on some related work done in FORTH
Will avoid technical discussion
References will be given for further details
Defining Remote Change Management
Managing the effects of remote changes on interlinked datasets
Remote changes have profound effects on local datasets
Good practices are important
—Proper versioning, change logging, adaptation to remote changes, …
Attention exploded after the success of the LOD paradigm
Related research questions
How should I version my data?
How can I efficiently monitor changes in my dataset?
How can I detect changes in remote datasets?
How does the evolution of remote datasets affect my data?
How can I efficiently propagate changes from one dataset to another?
Remote Change Management: Visualization
[Figure: the remote site holds versions RD0 and RD1 (versioning, change monitoring); the local site holds versions LD0 and LD1; change detection and change propagation link the two sites]
Remote Change Management: Structure
Three subfields
Versioning
Change monitoring and detection
Change propagation
Structure
Introduction, definition of subfields
Literature review
An approach for change detection [PFF+13]
Defining Repair
Assessing and improving the quality and the semantical or structural integrity of the data
Maintaining consistency, coherency, validity
Restoring consistency, coherency, validity, when violated
Assessing and improving quality
Preserving quality/integrity in the face of remote changes
Related research questions
How can I preserve the integrity and quality of my data in a dynamic and interlinked environment?
How can I guarantee consistency and validity?
How can I restore consistency and validity, if violated?
Repair: Visualization
[Figure: dataset D0 is turned into D1 by the Repair Process (Cleaning, Debugging, Repairing, Quality Enhancement), together with an Assessment Module (Diagnosis, Quality Assessment)]
Repair: Structure
Four subfields
Cleaning
Debugging
Validity repair
Quality enhancement
Structure
Introduction, definition of subfields
Literature review
An approach for validity repair [RFC11]
Defining Evolution
Modifying a dataset in response to a change in the domain or its conceptualization
Identify the result of applying new information on the dataset
Determine the result of change propagation from remote datasets
Understand the process of change
Related research questions
What is the semantics of evolution and change?
How can I efficiently compute the ideal evolution result?
Evolution: Visualization
[Figure: a change in the real world is fed to the Evolution Algorithm, which applies operations such as Delete_Class(…), Pull_Up_Class(…), Rename_Class(…), … to turn dataset D0 into D1]
Evolution: Summary
Evolution topics
Understanding the evolution challenges
Understanding the process of change
—Balancing between philosophical and practical considerations
Cross-fertilization with belief change
Structure
Introduction, connection with belief change
Understanding the process of change
Literature review
General Structure of this Talk
A. Introduction to RDF/S, DLs, OWL
B. Remote change management
1. Introduction, definition of subfields
2. Literature review
3. An approach for change detection [PFF+13]
C. Repair
1. Introduction, definition of subfields
2. Literature review
3. An approach for validity repair [RFC11]
D. Data and Knowledge Evolution
1. Introduction, connection with belief change
2. Understanding the process of change
3. Literature review
The final few slides contain citations for the references in this talk
Part I (2 hours)
Part II (1 hour)
Talk Structure (A): Introduction to RDF/S, DLs, OWL
Datasets
Basic structures
Classes (or concepts): collections of objects (e.g., Actor, Politician)
Properties (or roles): binary relationships between objects (e.g., started_on, member_of)
Instances (or individuals): objects (e.g., Giorgos, B. Obama)
Relations between them
Subsumption (Parliament_Member subclass of Politician), instantiation (B. Obama instance of Politician), …
The allowed relations and their semantics depend on the language
Different representation languages for LOD
RDF/S, OWL
Visualization, Triples, Serialization
[Figure: graph visualization of the example dataset, with the classes Period, Event, Onset, Birth, Actor, Existing and Stuff, the properties started_on and participants, and the instances G_Birth and Giorgos linked by participants; edges denote instantiation and subsumption]
Triple Representation
Define classes
[Period type Class]
Define properties
[participants type Property]
[participants domain Onset]
[participants range Actor]
Instantiate/define individuals
[G_Birth type Birth]
[Giorgos type Actor]
[G_Birth participants Giorgos]
Define hierarchies
[Event subClass Period]
Serialization (RDF/XML)
<rdfs:Class rdf:ID="Period"/>
<rdf:Property rdf:ID="participants">
  <rdfs:domain rdf:resource="#Onset"/>
  <rdfs:range rdf:resource="#Actor"/>
</rdf:Property>
<Birth rdf:ID="G_Birth">
  <participants>
    <Actor rdf:ID="Giorgos"/>
  </participants>
</Birth>
<rdfs:Class rdf:ID="Event">
  <rdfs:subClassOf rdf:resource="#Period"/>
</rdfs:Class>
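The triple representation above maps naturally onto a set of (subject, predicate, object) tuples. The following is a minimal sketch in Python; the tuple encoding and the helper name `objects_of` are illustrative assumptions, not part of the talk.

```python
# The triples listed on the slide, encoded as (subject, predicate, object) tuples.
# The encoding and the helper below are illustrative assumptions.
dataset = {
    ("Period", "type", "Class"),
    ("participants", "type", "Property"),
    ("participants", "domain", "Onset"),
    ("participants", "range", "Actor"),
    ("G_Birth", "type", "Birth"),
    ("Giorgos", "type", "Actor"),
    ("G_Birth", "participants", "Giorgos"),
    ("Event", "subClass", "Period"),
}


def objects_of(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the dataset."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


print(objects_of(dataset, "participants", "domain"))   # {'Onset'}
print(objects_of(dataset, "G_Birth", "participants"))  # {'Giorgos'}
```

Treating a dataset as a plain set of triples is what makes the set-difference deltas discussed later in the talk straightforward to compute.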
RDF and RDFS
An RDF dataset consists of triples
RDFS adds semantics
Subsumption hierarchies (classes and properties)
—Transitive
Instantiation
—Inheritance, implicit instantiation
Sometimes more than subsumption/instantiation is needed
Combining concepts, roles to form more complex relations
—Concept definitions: a mother is a female who has a child
—Other knowledge: all items stored in warehouse X are flammable
Constraints on data
—Each person must have one mother
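The two RDFS inferences named above (transitive subsumption and implicit instantiation through inheritance) can be sketched as follows. This is a simplified illustration that covers only class subsumption and typing, not the full RDFS entailment rules; the predicate names `subclass` and `type` follow the bracket notation of the slides, and the function names are assumptions.

```python
def subclass_closure(triples):
    """Transitive closure of the subclass relation, as a set of (sub, super) pairs."""
    pairs = {(s, o) for (s, p, o) in triples if p == "subclass"}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(pairs):
            for (c, d) in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs


def inferred_types(triples):
    """Explicit (instance, class) pairs plus those implied by the class hierarchy."""
    explicit = {(s, o) for (s, p, o) in triples if p == "type"}
    closure = subclass_closure(triples)
    implicit = {(inst, sup)
                for (inst, cls) in explicit
                for (sub, sup) in closure if sub == cls}
    return explicit | implicit


triples = {
    ("Birth", "subclass", "Onset"),
    ("Onset", "subclass", "Event"),
    ("Event", "subclass", "Period"),
    ("G_Birth", "type", "Birth"),
}

# G_Birth is explicitly a Birth and implicitly an Onset, an Event and a Period.
print(sorted(c for (i, c) in inferred_types(triples) if i == "G_Birth"))
# ['Birth', 'Event', 'Onset', 'Period']
```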
Extensions of RDF/S: DLs (1/2)
Description Logics (DLs)
http://dl.kr.org/
Formal underpinning of web representation languages
Family of logical formalisms
—Well-defined semantics
—Model-theoretic reasoning based on interpretations
Formally studied
—Expressiveness, reasoning tools, computational complexity, …
Components
Individuals: specific objects (instances) – Giorgos
Concepts: sets of individuals (classes) – Parent
Roles: sets of pairs of individuals (properties) – has_child
Operators: ⊓, {.}, ⊤, …
Connectives: ⊑, ≡, …
Extensions of RDF/S: DLs (2/2)
Definitions, partial definitions, constraints, subsumptions, …
A mother is a female who has a child
—Mother ≡ ∃has_child ⊓ Female
Each person must have one mother
—Person ⊑ ∃has_child⁻¹.Mother
A great variety of DLs (trade-off involved)
Different properties
Different expressive power
Different reasoning complexity
Extensions of RDF/S: OWL
OWL (Web Ontology Language)
http://www.w3.org/2004/OWL/
General-purpose representation language
Compatible with the architecture of the Semantic Web
A family of languages
Flavors: OWL Lite, OWL DL, OWL Full
Profiles: OWL 2 EL, OWL 2 QL, OWL 2 RL
Different expressiveness (and complexity)
Each corresponds to a specific DL
Useful from a modeling perspective
Expressive but not too complex
Appealing computationally
Representation Languages in LOD
Mostly RDF
With RDFS semantics
—Instantiations
—Class subsumption
—Property subsumption is rare
Some OWL
Mostly OWL Lite
Extensive use of owl:sameAs
—Often abusing it [HHM+10]
OWL 2 profiles are gaining ground
Talk Structure (B1): Remote change management, introduction and definition of subfields
Motivation for Remote Change Management
Crucial problem for dynamic linked datasets
Linking: datasets linked to other datasets (e.g., vocabularies)
Dynamics: changes cause problems to linked datasets
No central curation or control
—No control over (or knowledge of) other datasets’ evolution process
Curators don’t bother annotating and logging changes
—Temporal and versioning information is usually missing [RPH+12]
Remote change management seeks solutions to allow:
Keeping track of versions
Restoring previous versions
Assessing compatibility of versions
Monitoring and detecting changes
Tracing back the evolution history (of datasets, concepts, …)
—For visualization and understanding
Propagating changes to synchronize linked datasets
[Figure: dataset DL uses dataset DR]
Subfields of Remote Change Management
Remote Change Management
Versioning
—Keep track of versions
Change monitoring and detection
—Monitoring: record changes as they happen
—Detection: identify changes after they happen
Change propagation
—Propagate changes across linked datasets for synchronization purposes
Versioning
Keep track of versions
Identify different versions of a dataset
Enable transparent access to the “correct” version (smooth interoperation)
Issues involved
Identification
—Determine which versions to store and how to identify them
—Manually or automatically (syntactical, semantical considerations)
—Packaging of changes
Relation between versions
—A sequence or a tree
Compatibility information
—Backwards/forwards compatibility and how to determine it (often manually)
—Dataset-wide compatibility or fine-grained compatibility (e.g., at resource level)
—Metadata on the different versions
Transparent access
—Relate versions with (compatible) data sources, applications etc.
Change Monitoring and Detection
Change monitoring
Record changes as they happen
—Manual (error-prone and often incorrect)
—Automatic (not used in practice)
Depends on the good will of the dataset owner
Sometimes change logs are inaccessible
Change detection
Identify changes after they happen
Based on the previous and current versions
In both cases, a change language is required
Supported set of changes, along with their semantics
Can be low-level or high-level
[Figure: dataset DL uses dataset DR]
Change Propagation
Change propagation
Communicate changes to linked datasets for synchronization
Push-based or pull-based propagation
Push-based: provider-initiated, via “registration” or via monitoring and versioning
Pull-based: consumer-initiated
Communication based on deltas (rather than versions)
Reduce communication overhead
Reduce storage requirements
On average, 2-3% of a dataset changes between versions [OK02]
Deltas are based on a language of changes
Talk Structure (B2): Remote change management, literature review
Versioning Approaches (1/3)
Capture different aspects of versioning, such as:
Detecting versions
Storing versions efficiently
Allowing cross-snapshot queries
—Find gene products whose functions have not changed in the last 50 versions
—Determine price fluctuation for x along different versions of the product catalog
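A cross-snapshot query such as the first example above can be sketched over a list of snapshots, each held as a set of triples. This is only an illustration of the idea, not how any of the cited systems implement it; the data, the predicate name `function` and both function names are hypothetical.

```python
def values_by_subject(snapshot, predicate):
    """Map each subject to the set of objects it has for `predicate` in one snapshot."""
    result = {}
    for s, p, o in snapshot:
        if p == predicate:
            result.setdefault(s, set()).add(o)
    return result


def unchanged_subjects(versions, predicate, last_k):
    """Subjects whose `predicate` values are identical in each of the last `last_k` versions."""
    window = [values_by_subject(v, predicate) for v in versions[-last_k:]]
    first = window[0]
    return {s for s in first if all(w.get(s) == first[s] for w in window)}


# Hypothetical snapshots of a tiny gene-product dataset, oldest first.
versions = [
    {("g1", "function", "binding"), ("g2", "function", "transport")},
    {("g1", "function", "binding"), ("g2", "function", "catalysis")},
    {("g1", "function", "binding"), ("g2", "function", "catalysis")},
]

print(sorted(unchanged_subjects(versions, "function", 3)))  # ['g1']
print(sorted(unchanged_subjects(versions, "function", 2)))  # ['g1', 'g2']
```

Scanning full snapshots like this is the naive baseline; the storage schemes surveyed in this part of the talk aim to answer such queries without materializing every version.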
Early versioning approaches inspired by SVN
Good for files, not directly adaptable to semantical languages
SHOE language [HH00]
Machine-readable version information (e.g., compatibility)
Provided by curator as SHOE statements
Memento [SSN+10]
Fine-grained versioning at URI level (resources, web pages)
Machine-readable version information, in the HTTP header
—Timestamps, traversal information (prior/current versions) etc
Versioning Approaches (2/3)
Theoretical foundations for versioning [HP04]
Formal definitions to capture notions such as:
—Compatibility (between versions)
—Commitment (resources committing to a certain ontology)
—Ontology perspectives (the part of the web committing to an ontology)
Temporal approaches [HS05, PTC05, KLGE07]
For capturing temporal relations between versions
For allowing cross-snapshot queries
Versioning in multi-editor environments [RSDT08]
Via change monitoring
Versioning Approaches (3/3)
Automatically detecting version relationships [AAM09]
Using heuristics based on URIs
Study of “relatedness” between versions [CQ13]
A model of “relatedness” between vocabularies from various sources
Similar to links in web pages
POI: Partial Order Index [TTA08]
Efficient method for storing versions and their differences
Stores several versions, exploiting their common triples for efficient storage
Change Languages (1/2)
Change languages necessary for monitoring, detection, propagation
Granularity
Low-level (or atomic, or elementary)
—Simple add/remove operations
—Add(s,p,o), Delete(s,p,o)
—Simple to detect and define
—Focus on machine-readability: determinism, well-defined semantics
High-level (or complex, or composite)
—More coarse-grained, compact, closer to editor’s perception and intuition
—Generalize_Domain(P,A), Delete_Class(A)
—More interesting; harder to detect and define
—Focus on human-understandability: often unclear and/or informal semantics
Change Languages (2/2)
Many different high-level languages (no standard)
[HGR12, JAP09, PFF+13, SK03, AH06, DA09, PTC07, …]
Some are domain-specific (e.g., [HGR12])
Some are dynamic (e.g., [AH06, DA09, PTC07])
—Allow custom, user-defined changes
Some allow terminological changes (e.g., [PFF+13])
—Rename, merge, split
—Common, but tough to detect (easily confused with add/delete)
Representation Issues
Deltas are just sets of changes from the change language
Changes usually represented using a change ontology
Ontology represents changes
A specific change is an instance of such an ontology
Deltas associated with sets of such instances
Different proposals [NCLM06, KFKO02, KN03, PT05]
Allows the manipulation and communication of deltas/changes using standard Semantic Web technologies
Change Monitoring Approaches
Using a version log [PT05]
Logging actions on the dataset
Use it for change detection, as well as proper versioning
Good quality, high-level change monitoring
Based on a dynamic language of changes
Using migration specifications [ZZL+03]
Similar to logs, but with a more formal structure
DBPedia change monitoring [MLA+12]
http://live.dbpedia.org/
Live versions, as opposed to “standard” versions
Low-Level Change Detection (1/2)
SemVersion [VWS+05]
Developed in Karlsruhe (FZI, AIFB)
Low-level change detection tool for RDF
Also provides versioning functionalities
Allows cross-snapshot queries
For RDF [ILK12]
Low-level change detection based on set difference
Aggregating and compressing deltas
Also dealing with versioning issues
For RDF/S [ZTC11]
Takes into account semantics (RDFS inference)
Four different methods to compute deltas (all based on set difference)
Formal analysis of these methods’ properties and semantics
Extension: effect of blank nodes on change detection [TLZ12]
Low-Level Change Detection (2/2)
Bubastis (http://www.ebi.ac.uk/fgpt/sw/bubastis/index.html)
Simple diff tool (triple-based comparison)
Basically RDF, but also supports OWL
For DL-Lite [KWZ08]
Formal, semantical approach
For EL [KWW08]
Uses a concept-based description of changes
For propositional knowledge bases [FMV10]
Propositional, but generic; it can be applied to DLs
Formal analysis of the problem
Also dealing with propagation semantics
High-Level Change Detection (1/2)
For OWL: PromptDiff [NKKM04], OntoView [KFKO02]
Employ heuristics and probabilistic methods
Evaluation using precision/recall metrics against a gold standard
Integrated into tools that also provide versioning functionalities
For RDF/S [PFF+13]
Dealing with both machine-readability and human-understandability
Also dealing with propagation (applying changes)
To be discussed in detail later
COnto-Diff [HGR12]
Rule-based approach
Also dealing with propagation
Change Propagation Approaches
Usually part of other tools [SMMS02, MMS+03]
Versioning, monitoring tools (push-based propagation)
Detection tools (pull-based propagation)
Evolution and repair tools (pull-based propagation)
—Adapt your data to be “compatible” with the new remote version
SparqlPush [PM10]
Push-based propagation of changes on SPARQL “views”
PRISM, PRISM++ [CMZ08, CMDZ10]
High-level language of schema changes for relational data
—Also supports changes on the integrity constraints
Identifies and propagates the changes required in the data to abide by the new schema
Query and update rewriting
—For applications that try to access the old schema
Other Change Management Approaches
Complete approach for XML [SP10]
Representing changes inline with the data using a graph (“evograph”)
Supports different change representation languages (both low-level and high-level)
Timestamps changes
Monitoring: evograph can be used to log the changes
Propagation: changes can be accessed and propagated
Versioning: timestamps in changes can be used to generate snapshots (versions) at different times
Allows cross-snapshot queries
Fairly generic, can be adapted for RDF
Talk Structure (B3): Remote change management, an approach for change detection [PFF+13]
Our Approach on Change Detection
Purpose of this work: change detection [PFF+13]
Detect, a posteriori, the differences (delta or diff) between versions in a concise, intuitive and correct way
Main design choices
Change detection based on a general-purpose high-level language
Human-understandable, but also machine-readable
Clear, formal semantics
Provable formal properties and functionality guarantees
Detection and application (propagation) semantics
[Figure: sequence of versions V1, V2, V3, V4, V5, with deltas C1, C2, C3, C4 between consecutive versions]
Sample Evolution
[Figure: evolution of the example dataset from Version 1 (V1) to Version 2 (V2). V1 contains the classes Period, Event, Onset, Birth, Actor, Existing and Stuff; V2 contains the classes Event, Onset, Birth, Actor, Persistent and Stuff. Both versions contain the properties started_on and participants and the instances G_Birth and Giorgos, linked by participants; edges denote instantiation and subsumption]
Analyzing the Evolution (Using Triples)
Triples in V1 (partial list)
[Event type Class]
[Period type Class]
[Event subclass Period]
[participants type Property]
[participants domain Onset]
[participants range Actor]
[Giorgos type Actor]
[Existing type Class]
[Stuff subclass Existing]
[started_on domain Existing]
[Onset subclass Event]
[Birth subclass Onset]
…
Triples in V2 (partial list)
[Event type Class]
[participants type Property]
[participants domain Event]
[participants range Actor]
[Giorgos type Actor]
[Persistent type Class]
[Stuff subclass Persistent]
[started_on domain Persistent]
[Onset subclass Event]
[Birth subclass Event]
…
Low-Level Delta
Triples in V2 but not in V1
(added triples)
[participants domain Event]
[Persistent type Class]
[Stuff subclass Persistent]
[started_on domain Persistent]
[Birth subclass Event]
Triples in V1 but not in V2
(deleted triples)
[Period type Class]
[Event subclass Period]
[participants domain Onset]
[Existing type Class]
[Stuff subclass Existing]
[started_on domain Existing]
[Birth subclass Onset]
[Figure: V1 and V2 of the example dataset, repeated from the Sample Evolution slide]
Low-Level Delta
Add([participants domain Event])
Add([Persistent type Class])
…
Del([Period type Class])
…
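The low-level delta above is a plain set difference in both directions. A minimal sketch in Python, using the partial triple lists of V1 and V2 from the previous slide (with the domain triple of V2 written as [participants domain Event]); the tuple encoding and the function name are assumptions.

```python
# Partial triple lists of the two versions, as (subject, predicate, object) tuples.
V1 = {
    ("Event", "type", "Class"), ("Period", "type", "Class"),
    ("Event", "subclass", "Period"), ("participants", "type", "Property"),
    ("participants", "domain", "Onset"), ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"), ("Existing", "type", "Class"),
    ("Stuff", "subclass", "Existing"), ("started_on", "domain", "Existing"),
    ("Onset", "subclass", "Event"), ("Birth", "subclass", "Onset"),
}
V2 = {
    ("Event", "type", "Class"), ("participants", "type", "Property"),
    ("participants", "domain", "Event"), ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"), ("Persistent", "type", "Class"),
    ("Stuff", "subclass", "Persistent"), ("started_on", "domain", "Persistent"),
    ("Onset", "subclass", "Event"), ("Birth", "subclass", "Event"),
}


def low_level_delta(v1, v2):
    """Return (added, deleted): triples only in v2, and triples only in v1."""
    return v2 - v1, v1 - v2


added, deleted = low_level_delta(V1, V2)
for triple in sorted(added):
    print("Add", triple)
for triple in sorted(deleted):
    print("Del", triple)
```

On these partial lists the sketch yields the five added and seven deleted triples shown on the slide. It compares explicit triples only and ignores RDFS inference.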
Analyzing the Evolution (Visually)
[Figure: V1 and V2 of the example dataset, repeated from the Sample Evolution slide]
High-Level Delta
Generalize_Domain(participants, Onset, Event)
Pull_Up_Class(Birth, Onset, Event)
Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)
Rename_Class(Existing, Persistent)
Comparing the Deltas
[Figure: V1 and V2 of the example dataset, repeated from the Sample Evolution slide]
Low-level delta → High-level delta
Del([participants domain Onset]), Add([participants domain Event]) → Generalize_Domain(participants, Onset, Event)
Del([Birth subclass Onset]), Add([Birth subclass Event]) → Pull_Up_Class(Birth, Onset, Event)
Del([Period type Class]), Del([Event subclass Period]) → Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)
Associations (Partitioning)
Low-Level Changes → Associated High-Level Changes
Del([participants domain Onset]), Add([participants domain Event]) → Generalize_Domain(participants, Onset, Event)
Del([Birth subclass Onset]), Add([Birth subclass Event]) → Pull_Up_Class(Birth, Onset, Event)
Del([Period type Class]), Del([Event subclass Period]) → Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)
Del([Existing type Class]), Del([Stuff subclass Existing]), Del([started_on domain Existing]), Add([Persistent type Class]), Add([Stuff subclass Persistent]), Add([started_on domain Persistent]) → Rename_Class(Existing, Persistent)
Challenges for High-Level Languages
High-level deltas are superior
More concise (e.g., Rename_Class)
More intuitive (e.g., Pull_Up_Class)
Carry additional information (e.g., Generalize_Domain)
Challenges for high-level languages
Must be deterministic (exactly one high-level delta)
Must be fine-grained enough to capture subtle changes
Must be coarse-grained enough to be concise
Must be intuitive and close to editor’s perception of the changes
Compatible detection and application algorithms
Intuitive results
Efficient
Proposed Language L
The formal definition of a change consists of:
Changes required in the low-level delta (added/deleted triples)
Conditions that should hold in V1 and/or V2
Generalize_Domain(P, X, Y)
Del([P domain X])
Add([P domain Y])
P existing property in both V1, V2
X, Y existing classes in both V1, V2
X subclass of Y in both V1, V2
Generalize_Domain(participants, Onset, Event): detectable
Similarly for the other changes in L (132 high-level ones)
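The definition of Generalize_Domain above can be turned into a small detector: look for a deleted and an added domain triple on the same property, then check the listed conditions in both versions. This is a simplified sketch, not the detection algorithm of [PFF+13]. It checks explicit triples only (no RDFS inference), and the example data adds the triples [Onset type Class] and [started_on type Property]-style typing where needed as assumptions, since the slides give only partial lists.

```python
def detect_generalize_domain(v1, v2):
    """Find Generalize_Domain(P, X, Y) changes between two sets of triples."""
    added, deleted = v2 - v1, v1 - v2

    def in_both(triple):
        return triple in v1 and triple in v2

    found = []
    for (p, pred, x) in deleted:
        if pred != "domain":
            continue
        for (p2, pred2, y) in added:
            if pred2 != "domain" or p2 != p:
                continue
            # Conditions from the definition: P, X, Y exist in both versions
            # and X is a subclass of Y in both versions.
            if (in_both((p, "type", "Property"))
                    and in_both((x, "type", "Class"))
                    and in_both((y, "type", "Class"))
                    and in_both((x, "subclass", y))):
                found.append(("Generalize_Domain", p, x, y))
    return found


common = {
    ("participants", "type", "Property"),
    ("Onset", "type", "Class"),   # assumed: not in the partial list on the slides
    ("Event", "type", "Class"),
    ("Onset", "subclass", "Event"),
}
v1 = common | {("participants", "domain", "Onset"),
               ("Existing", "type", "Class"),
               ("started_on", "domain", "Existing")}
v2 = common | {("participants", "domain", "Event"),
               ("Persistent", "type", "Class"),
               ("started_on", "domain", "Persistent")}

print(detect_generalize_domain(v1, v2))
# [('Generalize_Domain', 'participants', 'Onset', 'Event')]
```

The domain change of started_on is not reported here because Existing and Persistent do not exist in both versions; in the talk that change is explained by Rename_Class instead.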
Types and Number of Defined Changes
Changes (134)
Low-Level (2): Add, Del
High-Level (132)
Basic (54), e.g., Delete_Subclass, Delete_Domain
Composite (51), e.g., Pull_Up_Class, Change_Domain
Heuristic (27), e.g., Rename_Class, Split_Class
Results on L: Granularity
Granularity problem: solved by defining levels of changes
Basic Changes: fine-grained, roughly correspond to low-level
Composite Changes: coarse-grained, group several basic changes together
Heuristic Changes: based on heuristics, necessary for Rename, Merge, Split etc.; require mappings between URIs
Problems with determinism
One evolution could correspond to different sets of basic/composite changes
Priorities in detection
Heuristic > Composite > Basic
Results on L: Determinism
Each low-level change is associated with exactly one detectable high-level change
Full partitioning of low-level changes into high-level ones
Each pair of versions (V1, V2) is associated with:
Exactly one low-level delta
Exactly one high-level delta
Determinism is necessary
More than one would lead to ambiguities
Less than one would make some inputs (V1, V2) irresolvable
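The partitioning property can be checked mechanically: every low-level change must appear under exactly one high-level change. The sketch below does this for the associations of the running example, with changes written as plain strings. The function name is an assumption, and the extra Delete_Domain entry used to break the partition is hypothetical.

```python
from collections import Counter


def is_full_partition(low_level_changes, associations):
    """True if each low-level change is associated with exactly one high-level change."""
    counts = Counter(c for group in associations.values() for c in group)
    covers_all = set(counts) == set(low_level_changes)
    no_overlap = all(n == 1 for n in counts.values())
    return covers_all and no_overlap


low_level_changes = [
    "Del([participants domain Onset])", "Add([participants domain Event])",
    "Del([Birth subclass Onset])", "Add([Birth subclass Event])",
    "Del([Period type Class])", "Del([Event subclass Period])",
    "Del([Existing type Class])", "Del([Stuff subclass Existing])",
    "Del([started_on domain Existing])", "Add([Persistent type Class])",
    "Add([Stuff subclass Persistent])", "Add([started_on domain Persistent])",
]
associations = {
    "Generalize_Domain(participants, Onset, Event)": low_level_changes[0:2],
    "Pull_Up_Class(Birth, Onset, Event)": low_level_changes[2:4],
    "Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)": low_level_changes[4:6],
    "Rename_Class(Existing, Persistent)": low_level_changes[6:12],
}
print(is_full_partition(low_level_changes, associations))  # True

# Hypothetical second explanation for an already-explained change: ambiguous.
ambiguous = dict(associations)
ambiguous["Delete_Domain(participants, Onset)"] = ["Del([participants domain Onset])"]
print(is_full_partition(low_level_changes, ambiguous))  # False
```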
Results on L: Propagation
[Figure: V1 and V2 of the example dataset, repeated from the Sample Evolution slide; the delta C is detected between V1 and V2, applying C to V1 yields V2, and applying the inverse delta C⁻¹ to V2 yields V1]
Results on L: Deltas Keep Version History
Can reproduce all versions as long as you keep (any) one version and the deltas
Deltas are more concise than the versions themselves
Storage and communication efficiency
[Figure: sequence of versions V1, V2, V3, V4, V5, with deltas C1, C2, C3, C4 between consecutive versions]
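The claim that one stored version plus the deltas is enough to reproduce every version can be illustrated with low-level deltas, which are simple to apply and invert. This is a sketch under that simplification; the talk's language L defines application semantics for high-level changes, which are not modeled here, and the function names and data are assumptions.

```python
def diff(v1, v2):
    """Low-level delta from v1 to v2 as a pair (added, deleted)."""
    return v2 - v1, v1 - v2


def apply_delta(version, delta):
    """Apply a delta: remove the deleted triples, then add the added ones."""
    added, deleted = delta
    return (version - deleted) | added


def invert(delta):
    """Inverse delta: additions become deletions and vice versa."""
    added, deleted = delta
    return deleted, added


v1 = {("Birth", "subclass", "Onset"), ("Onset", "subclass", "Event")}
v2 = {("Birth", "subclass", "Event"), ("Onset", "subclass", "Event")}
v3 = v2 | {("Giorgos", "type", "Actor")}

c1, c2 = diff(v1, v2), diff(v2, v3)

# Keep only the middle version and the deltas; rebuild its neighbours on demand.
rebuilt_v1 = apply_delta(v2, invert(c1))
rebuilt_v3 = apply_delta(v2, c2)
print(rebuilt_v1 == v1, rebuilt_v3 == v3)  # True True
```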
Change Detection: Evaluation
Detection and application algorithms implemented for evaluation
Performance
Complexity: O(max{N1, N2, N²})
Performance depends on the detected changes (type, number)
Bottleneck: calculating the low-level delta (>80% of total time)
Intuitiveness
Changes in our language are used in practice
Results confirmed by literature/editor notes (CIDOC, GO)
Better than CIDOC’s manually recorded changes (18 changes missed)
Conciseness
Basic ≈ Low-Level
Basic + Composite + Heuristic << Low-Level
Up to 80% reduction, depending on the case
Summary and Conclusions: RCM
Remote change management is at the heart of LOD
Uncontrolled character of LOD makes it critical
Various related fields
Versioning, change monitoring and detection, change propagation
Unfortunately, not used in practice in LOD
Presented a formal approach for change detection [PFF+13]
Other possible directions (related to LOD)
Best practices should be studied and promoted
—Automated versioning and monitoring mechanisms embedded in evolution tools/editors
—Understand and use temporal and provenance metadata on versions
Improved change monitoring and detection
—A standard change language?
Talk Structure (C1)
A. Introduction to RDF/S, DLs, OWL
B. Remote change management
1. Introduction, definition of subfields
2. Literature review
3. An approach for change detection [PFF+13]
C. Repair
1. Introduction, definition of subfields
2. Literature review
3. An approach for validity repair [RFC11]
D. Data and Knowledge Evolution
1. Introduction, connection with belief change
2. Understanding the process of change
3. Literature review
Motivation for Repair
Published data is usually problematic
Several different types of problems in LOD [HHP+10]
Pedantic web initiative (http://pedantic-web.org/)
—Advice for data owners on how to prevent common problems in their data
Causes of Data Problems
Several reasons for data problems
Erroneous data (faulty sensors, human mistakes etc)
Different symbolisms and terminology
Modeling errors (e.g., all birds fly)
Requirements (constraints) on the data may change
—E.g., when applications’ needs change
Reuse of data by different providers (no quality guarantees)
Quality jeopardized by re-use and open evolution
Integration/merging of datasets
Generic Approaches
Four ways to deal with problems in data [HHH+05]
Prevent it (careful evolution, merging etc)
—Can only prevent problems caused by changes in the local dataset
Correct it (repair)
—Actively address the problem (after it appears)
Ignore it (consistent query answering, non-monotonic reasoning)
—CQA: popular in database community; prevents user from noticing the problem by rewriting queries (common denominator approach)
—NMR: popular in AI community; avoid trivialization of reasoning (paraconsistent reasoning, defeasible reasoning, default reasoning, …)
Use versions (versioning)
—Make sure you refer to the correct (compatible) version
—Only when the problem is due to a remote change
Subfields of Repair
Cleaning
Mainly related to literal quality
Terminology, symbols, metric units etc
Debugging
Consistency (at least one model)
Coherency (no unsatisfiable classes)
Relevant for DL/OWL only
Validity repair
Satisfaction of custom integrity constraints (e.g., business rules)
Expressed in OWL, DL, Datalog or predicate logic
Quality enhancement
Assessing and improving the quality of data
Different dimensions (timeliness, completeness, reputation, …)
Cleaning
Literals in LOD are often messy, and have to be “cleaned up”
Different formats for names, dates etc
—&gf name “Giorgos Flouris” vs. &gf name “Flouris, Giorgos”
—&gf birth_date 03/05/76 vs. &gf birth_date 05/03/76
—&gf birthplace “Hellas” vs. &gf birthplace “Greece”
Different symbols
—Paris land_area 105,4 vs. Paris land_area 105.4
—Paris population 2.234.105 vs. Paris population 2,234,105
Different metric units
—Paris land_area 105,4 vs. Paris land_area 40,7
—&x price 30 vs. &x price 39
Inconsistent values
—&x price 0 vs. &x price “free”
Data is not in the desired form (data transformation)
—LIP6 addr “4, P. Jussieu” → LIP6 street “P. Jussieu”, LIP6 streetno 4
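Some of the literal problems listed above can be normalized mechanically. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the cleaner is told which decimal mark a source uses, since a value such as 105,4 cannot be disambiguated from the literal alone. The same holds for dates like 03/05/76, which are deliberately not handled.

```python
import re


def normalize_name(name: str) -> str:
    """Rewrite 'Last, First' as 'First Last'; other forms are only trimmed."""
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        return f"{first} {last}"
    return name.strip()


def parse_integer(text: str) -> int:
    """Parse an integer that may use '.' or ',' as thousands separator."""
    if not re.fullmatch(r"\d+|\d{1,3}(?:[.,]\d{3})+", text):
        raise ValueError(f"not a grouped integer: {text!r}")
    return int(re.sub(r"[.,]", "", text))


def parse_decimal(text: str, decimal_mark: str) -> float:
    """Parse a decimal; the caller states which symbol is the decimal mark."""
    if decimal_mark not in (".", ","):
        raise ValueError("decimal_mark must be '.' or ','")
    thousands = "," if decimal_mark == "." else "."
    return float(text.replace(thousands, "").replace(decimal_mark, "."))


# Both spellings from the slide end up as the same value.
same_name = normalize_name("Flouris, Giorgos") == normalize_name("Giorgos Flouris")
same_population = parse_integer("2.234.105") == parse_integer("2,234,105")
same_area = parse_decimal("105,4", ",") == parse_decimal("105.4", ".")
```

Differences in terminology ("Hellas" vs. "Greece") or in metric units need external knowledge such as a synonym table or a unit annotation, so they are outside what these string-level helpers can do.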
Debugging
Coherency
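No unsatisfiable classes
Indicates good modeling
Consistency
At least one model
Avoids reasoning triviality
Relevant for DL/OWL only
[Figure: two example graphs; one with the nodes Pengo, Bird and Penguin and edges labelled canFly, the other with the nodes Horse and Unicorn and edges labelled hasHorns]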
Validity Repair
Validity repair
Satisfaction of custom integrity constraints (e.g., business rules)
Encode context- or application-specific requirements
—PROV-DM: http://www.w3.org/TR/2013/REC-prov-constraints-20130430/
Applications may be useless over invalid data
Expressed in OWL, DL, Datalog, Datalog±, predicate logic, …
Different expressive power
Different semantics (OWA/CWA, UNA) [TSBM10, MHS09]
Various types of constraints
Functional, inverse functional, transitivity, cardinality constraints
Disjointness constraints
Primary key, foreign key, inclusion constraints
Tuple-generating dependencies (tgd), equality-generating dependencies (egd)
Quality Enhancement
Quality is defined as “fitness for use” [Jur74]
Multi-faceted (timeliness, completeness, reputation, …)
Task-dependent
Subjective
Assessing quality
Via assessment functions (e.g., [BC09]) or SPARQL queries (e.g., [FH10])
Some kind of combined scoring over the relevant dimensions
Improving (enhancing) quality
Usually manual
Tries to improve the assessment score
Talk Structure (C2)
A. Introduction to RDF/S, DLs, OWL
B. Remote change management
1. Introduction, definition of subfields
2. Literature review
3. An approach for change detection [PFF+13]
C. Repair
1. Introduction, definition of subfields
2. Literature review
3. An approach for validity repair [RFC11]
D. Data and Knowledge Evolution
1. Introduction, connection with belief change
2. Understanding the process of change
3. Literature review
Cleaning Tool: OpenRefine
Open source
Originally developed by Google (Google Refine)
http://openrefine.org/
Applies to various representations of the input data
CSV/TSV, Excel, JSON, XML, RDF as XML, etc
RDF extension
Functionalities (related to this talk)
Data exploration and cleaning
—Both automated and manual (interface assists in manual cleaning)
Data transformation (format conversion)
—Uses GREL (Google Refine Expression Language) and regular expressions
Cleaning Tool: ODCleanStore
Web application, written in Java
Developed by Charles University (Prague)
http://www.ksi.mff.cuni.cz/~knap/odcs/sections/odcs.html
Functionalities (related to this talk)
Cleaning
—Via “transformers” (policies for cleaning)
—Expressed using SPARQL or regular expressions
Quality assessment
—Transformer assigns a score to data
Validity repair
—Supports conflict resolution for functional properties
—Decides what to drop based on the quality of the data items involved
—Supports aggregation functionalities based on “aggregation policies”
Other Cleaning Approaches
Involve users in the loop [KHS12]
Manual requests for improvements (cleaning, quality, …)
Patch Request Ontology (PRO)
Use a GWAP (Game With A Purpose) for identifying data problems
Debugging: Literature Overview
Identify and resolve inconsistency/incoherency
Two phases
Diagnosis: identify inconsistency/incoherency
Repair: remove inconsistency/incoherency
Literature mostly dealing with diagnosis
Repair requires additional user input
Diagnosis is more than reasoning
Pinpoint the causes of inconsistency/incoherency
Repair
User input required (manual or semi-automatic approaches)
Automatic approaches also require user input or domain knowledge (ad-hoc solutions)
Debugging Approaches
Diagnosis using tableau-based algorithms for various DLs
Identify minimal sets of responsible axioms
—[SC03, MLBP06, PT06, WHR+05]
Identify responsible parts of axioms (more fine-grained)
—[KPS+06, LPSV06]
Repair
Manual: editors and related tools
—Onion [MWK00], PROMPT [NM00], Chimaera [MFRW00]
Semi-automatic
—Interactive approach via suggestions: ORE tool [LB10]
Automatic
—Using external information, e.g., for stratified datasets [QP07, MLB05]
Validity Repair: Literature Overview
Identify and resolve invalidity (custom constraints)
Two phases
Diagnosis: identify invalidity
Repair: remove invalidity
Literature mostly dealing with diagnosis
Repair requires additional user input
Diagnosis is more than validation
Pinpoint the causes of invalidity
Repair
User input required (manual or semi-automatic approaches)
Automatic approaches also require user input or domain knowledge (ad-hoc solutions)
Validity Repair Approaches
Not much work in repairing custom constraints in LOD
A large body of related work for the relational setting
—For various constraint types and repair methodologies
Existing tools
Stardog (http://www.stardog.com/docs/)
—Commercial RDF database that supports validation of custom constraints
Rondo (relational/XML) [Mel04]
—Repair based on a fixed “importance” of data items
Declarative repairing based on preferences [RFC11]
—To be discussed in detail later
Repairing functional properties ([FRPV+12], Sieve [MMB12])
Data Quality Frameworks (1/4)
Many different quality assessment methodologies and frameworks
Several different quality dimensions
Different works consider different dimensions
Different proposals for their classification and organization
There is no single, generally accepted data quality framework
Cannot be one
Different applications have different needs
Data Quality Frameworks (2/4)
Quality dimensions, quality indicators, scoring functions and assessment metrics [BC09]
Different quality dimensions
—Timeliness, completeness, reputation, …
Each dimension associated with different indicators
—Timeliness: last modification date, creation date, …
Each indicator associated with different scoring functions
—E.g., days since last update
Scoring functions from relevant indicators are combined using assessment metrics
—E.g., Reputation_value*0.6 + days_since_update*0.4
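The chain from indicator to scoring function to assessment metric can be sketched as follows. The weights come from the example above; the function names, the dates and the reputation value are made up for illustration and are not part of [BC09].

```python
from datetime import date


def days_since_update(last_modified: date, today: date) -> int:
    """Scoring function for the indicator 'last modification date'."""
    return (today - last_modified).days


def assessment_metric(reputation_value: float, days: int,
                      w_reputation: float = 0.6, w_days: float = 0.4) -> float:
    """Weighted combination of the two scores, as in the slide's example."""
    return reputation_value * w_reputation + days * w_days


# Hypothetical dataset: reputation 0.5, last modified ten days ago.
days = days_since_update(date(2013, 5, 1), date(2013, 5, 11))
score = assessment_metric(0.5, days)
```

A real metric would presumably normalize each score to a common range first, so that a raw day count does not dominate a reputation value; the slide does not say how this is done.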
Data Quality Frameworks (3/4)
[RH09]
Data Quality Frameworks (4/4)
[ADA98]
Talk Structure (C3)
A. Introduction to RDF/S, DLs, OWL
B. Remote change management
1. Introduction, definition of subfields
2. Literature review
3. An approach for change detection [PFF+13]
C. Repair
1. Introduction, definition of subfields
2. Literature review
3. An approach for validity repair [RFC11]
D. Data and Knowledge Evolution
1. Introduction, connection with belief change
2. Understanding the process of change
3. Literature review
Our Approach on Validity Repair
Declarative approach for validity repair [RFC11]
Main design choices
Both diagnosis and repair
Applicable for RDF/S
Adopted relational semantics (CWA) for the constraints
Generality on the supported constraints (DEDs)
Minimal user interaction (all info provided at input)
Automatic diagnosis
Automatic repair using preferences (provided by the user at input)
RDF/S Representation Model
Express RDF/S over an adequate relational schema
Hybrid method
—C_IsA(A,B): A is a subclass of B
—C_Inst(x,A): x is an instance of A
—Domain(P,A): the domain of P is A
—…
Alternatives
Schema-specific
—One table/predicate for each class/property (A(x), B(x), P(x,y), …)
—Not amenable to changes (e.g., delete class)
Schema-agnostic (triple-store)
—One table with three columns (spo)
—Harder to define constraints, less intuitive
Allowed Constraints
Considered a very general class of constraints
Disjunctive Embedded Dependencies (DEDs) [Deu09]
Very general class
Functional, inverse functional, transitivity, cardinality constraints
Disjointness constraints
Primary key, foreign key, inclusion constraints
Tuple-generating dependencies (tgd), equality-generating dependencies (egd)
Constraints
Express validity constraints over the aforementioned schema:
Class subsumption must be acyclic
∀x,y C_IsA(x,y) ∧ C_IsA(y,x) → ⊥
Correct classification in property instances
∀x,y,p,a P_Inst(x,y,p) ∧ Domain(p,a) → C_Inst(x,a)
∀x,y,p,a P_Inst(x,y,p) ∧ Range(p,a) → C_Inst(y,a)
Closed World Assumption (CWA)
Failure to prove something is a proof of its negation
Syntactical manipulations on constraints allow
Diagnosis
—Finding violated constraints
Repair
—Identifying repairing options per violation
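Diagnosis under the CWA amounts to evaluating each constraint against the stored facts. The sketch below checks the acyclicity and domain constraints stated on this slide; the tuple encoding of facts, the function names and the sample data are assumptions made for illustration and are not the implementation of [RFC11].

```python
from typing import List, Set, Tuple

Fact = Tuple[str, ...]  # e.g. ("C_IsA", "A", "B") or ("P_Inst", "x", "y", "p")


def acyclicity_violations(facts: Set[Fact]) -> List[Tuple[str, str]]:
    """Pairs (x, y) with both C_IsA(x,y) and C_IsA(y,x) in the dataset."""
    is_a = {(f[1], f[2]) for f in facts if f[0] == "C_IsA"}
    # Report each unordered pair once (x <= y).
    return sorted((x, y) for (x, y) in is_a if (y, x) in is_a and x <= y)


def domain_violations(facts: Set[Fact]) -> List[Fact]:
    """P_Inst(x,y,p) facts whose subject is not classified under a domain of p."""
    domains = [(f[1], f[2]) for f in facts if f[0] == "Domain"]
    violated = []
    for f in facts:
        if f[0] != "P_Inst":
            continue
        _, x, _y, p = f
        for (q, a) in domains:
            # CWA: C_Inst(x,a) counts as false unless it is stored explicitly.
            if q == p and ("C_Inst", x, a) not in facts:
                violated.append(f)
    return sorted(violated)


# Made-up facts that violate both constraints.
facts = {
    ("C_IsA", "A", "B"), ("C_IsA", "B", "A"),
    ("Domain", "p", "A"),
    ("P_Inst", "x", "y", "p"),
}
cycles = acyclicity_violations(facts)
misclassified = domain_violations(facts)
```

Note that the acyclicity check follows the constraint exactly as written, so longer cycles are only caught if C_IsA is stored transitively closed.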
Repairing Example
Dataset D0
Class(Sensor), Class(SpatialThing), Class(Observation)
Prop(geo:location)
Domain(geo:location,Sensor)
Range(geo:location,SpatialThing)
Inst(Item1), Inst(ST1)
P_Inst(Item1,ST1,geo:location)
C_Inst(Item1,Observation), C_Inst(ST1,SpatialThing)
Correct classification in property instances
∀x,y,p,a P_Inst(x,y,p) ∧ Domain(p,a) → C_Inst(x,a)
[Figure: schema level with the classes Sensor, SpatialThing and Observation and the property geo:location; data level with Item1 and ST1 connected by geo:location]
Violation
Item1 geo:location ST1
Sensor is the domain of geo:location
Item1 is not a Sensor
P_Inst(Item1,ST1,geo:location) ∈ D0
C_Inst(Item1,Sensor) ∉ D0
Domain(geo:location,Sensor) ∈ D0
Repairing options
Remove P_Inst(Item1,ST1,geo:location)
Add C_Inst(Item1,Sensor)
Remove Domain(geo:location,Sensor)
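The three repairing options of this example can be derived mechanically from the violated constraint: falsify one of the two premises or make the conclusion true. The sketch below does this for D0; the tuple encoding and the helper names are hypothetical and only illustrate the idea.

```python
from typing import FrozenSet, List, Tuple

Fact = Tuple[str, ...]
Change = Tuple[str, Fact]  # ("+", fact) adds a fact, ("-", fact) removes it

D0: FrozenSet[Fact] = frozenset({
    ("Class", "Sensor"), ("Class", "SpatialThing"), ("Class", "Observation"),
    ("Prop", "geo:location"),
    ("Domain", "geo:location", "Sensor"),
    ("Range", "geo:location", "SpatialThing"),
    ("Inst", "Item1"), ("Inst", "ST1"),
    ("P_Inst", "Item1", "ST1", "geo:location"),
    ("C_Inst", "Item1", "Observation"), ("C_Inst", "ST1", "SpatialThing"),
})


def domain_violations(facts: FrozenSet[Fact]) -> List[Tuple[Fact, Fact, Fact]]:
    """Return (P_Inst fact, Domain fact, missing C_Inst fact) per violation."""
    found = []
    for f in sorted(facts):
        if f[0] != "P_Inst":
            continue
        _, x, _y, p = f
        for g in sorted(facts):
            if g[0] == "Domain" and g[1] == p:
                needed = ("C_Inst", x, g[2])
                if needed not in facts:  # CWA: absent means false
                    found.append((f, g, needed))
    return found


def resolutions(violation: Tuple[Fact, Fact, Fact]) -> List[Change]:
    """Drop either premise, or add the missing conclusion."""
    p_inst, domain, needed = violation
    return [("-", p_inst), ("+", needed), ("-", domain)]


def apply_change(facts: FrozenSet[Fact], change: Change) -> FrozenSet[Fact]:
    sign, fact = change
    return facts | {fact} if sign == "+" else facts - {fact}


violation = domain_violations(D0)[0]
options = resolutions(violation)
repaired = [apply_change(D0, change) for change in options]
```

Each dataset in `repaired` satisfies the domain constraint; choosing among them is left to the preferences.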
Preferences for Repair
Which repairing option is best?
Data owner determines that via preferences
Preferences
Specified beforehand
High-level “specifications” for the ideal repair
Serve as “instructions” to determine the preferred (optimal) solution
Preferences (On Datasets)
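[Figure: D0 and its three candidate repaired datasets D1, D2 and D3; each repaired dataset is assigned a score (the scores shown are 3, 4 and 6)]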
Preferences (On Deltas)
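[Figure: D0 and its three candidate repaired datasets D1, D2 and D3, reached through the deltas -P_Inst(Item1,ST1,geo:location), +C_Inst(Item1,Sensor) and -Dom(geo:location,Sensor); each delta is assigned a score (the scores shown are 2, 1 and 5)]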
More Details on Preferences
Preferences on datasets are result-oriented
Consider the quality of the repair result
Ignore the impact of repair
Popular options: prefer newest/trustable information, prefer a specific schema structure
Preferences on deltas are impact-oriented
Consider the impact of repair
Ignore the quality of the repair result
Popular options: minimize schema changes, minimize addition/deletion of information, minimize delta size
Properties of preferences
Quality metrics can be used for stating preferences
Metadata on the data can be used (e.g., provenance)
Can be qualitative or quantitative
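A preference on deltas can be modelled as a cost function over the changes, with the preferred repairs being those of minimal cost. The sketch below encodes two of the popular options named above (minimize schema changes, minimize deletions). The labels D1 to D3 for the three candidate deltas of the running example, the set of schema predicates and the "lower is better" convention are assumptions of this sketch.

```python
from typing import Callable, Dict, List, Tuple

Fact = Tuple[str, ...]
Change = Tuple[str, Fact]  # ("+", fact) or ("-", fact)
Delta = List[Change]

# Predicates treated as schema-level in this sketch (an assumption).
SCHEMA = {"Class", "Prop", "Domain", "Range", "C_IsA"}

candidates: Dict[str, Delta] = {
    "D1": [("-", ("P_Inst", "Item1", "ST1", "geo:location"))],
    "D2": [("+", ("C_Inst", "Item1", "Sensor"))],
    "D3": [("-", ("Domain", "geo:location", "Sensor"))],
}


def schema_changes(delta: Delta) -> int:
    """Impact-oriented cost: number of changes touching the schema."""
    return sum(1 for _sign, fact in delta if fact[0] in SCHEMA)


def deletions(delta: Delta) -> int:
    """Impact-oriented cost: number of removed facts."""
    return sum(1 for sign, _fact in delta if sign == "-")


def preferred(cands: Dict[str, Delta], cost: Callable[[Delta], object]) -> List[str]:
    """Names of the candidates with minimal cost (ties are all returned)."""
    best = min(cost(d) for d in cands.values())
    return sorted(name for name, d in cands.items() if cost(d) == best)


by_schema = preferred(candidates, schema_changes)
by_deletions = preferred(candidates, deletions)
# Lexicographic combination: schema changes first, deletions as tie-breaker.
combined = preferred(candidates, lambda d: (schema_changes(d), deletions(d)))
```

A preference on datasets would have the same shape, except that the cost function would inspect the repaired dataset instead of the delta.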
Generalizing the Approach
For one violated constraint
1. Diagnose invalidity
2. Determine minimal ways to resolve it
3. Determine and return preferred solution based on the preference
For many violated constraints
Problem becomes more complicated
More than one resolution step is required
Issues:
1. Resolution order
2. When and how to filter non-optimal solutions?
3. Constraint (and resolution) interdependencies
Constraint Interdependencies
A given resolution may:
Cause other violations (bad)
Resolve other violations (good)
Optimal resolution unknown ‘a priori’
Cannot predict a resolution’s ramifications
Exhaustive, recursive search required (resolution tree)
Two ways to create the resolution tree
Globally-optimal (GO) / locally-optimal (LO)
When and how to filter non-optimal solutions?
Resolution Tree Creation (GO)
– Find all minimal resolutions for all the violated constraints, then find the optimal ones
– Globally-optimal (GO)
Find all minimal resolutions for one violation
Explore them all
Repeat recursively until valid
Return the optimal leaves
Optimal repairs (returned)
Resolution Tree Creation (LO)
– Find the minimal and optimal resolutions for one violated constraint, then repeat for the next
– Locally-optimal (LO)
Find all minimal resolutions for one violation
Explore the optimal one(s)
Repeat recursively until valid
Return all remaining leaves
Optimal repair (returned)
Comparison (GO versus LO)
Characteristics of GO
Exhaustive
Less efficient: large resolution trees
Always returns optimal repairs
Insensitive to constraint syntax
Deterministic (result does not depend on resolution order)
Characteristics of LO
Greedy
More efficient: small resolution trees
May return sub-optimal repairs
Sensitive to constraint syntax
Non-deterministic (result may depend on resolution order)
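The difference between the two strategies can be seen on a small resolution tree. The sketch below is a simplified illustration, not the algorithms of [RFC11]: it supports only a domain constraint and a disjointness constraint, uses a made-up cost (addition 1, data deletion 2, schema deletion 3) as the preference, and forbids undoing an earlier change so that the recursion terminates. The `Disjoint` fact is hypothetical and was added to create an interdependency between constraints.

```python
from typing import FrozenSet, List, Tuple

Fact = Tuple[str, ...]
Change = Tuple[str, Fact]
Delta = Tuple[Change, ...]
SCHEMA = {"Domain", "Disjoint"}


def violations(facts: FrozenSet[Fact]) -> List[List[Change]]:
    """For each violation, the list of its minimal resolutions."""
    found = []
    for f in sorted(facts):
        if f[0] == "P_Inst":  # P_Inst(x,y,p) and Domain(p,a) imply C_Inst(x,a)
            _, x, _y, p = f
            for g in sorted(facts):
                if g[0] == "Domain" and g[1] == p:
                    needed = ("C_Inst", x, g[2])
                    if needed not in facts:
                        found.append([("-", f), ("+", needed), ("-", g)])
        if f[0] == "Disjoint":  # no instance may belong to both classes
            _, a, b = f
            for g in sorted(facts):
                if g[0] == "C_Inst" and g[2] == a and ("C_Inst", g[1], b) in facts:
                    found.append([("-", g), ("-", ("C_Inst", g[1], b)), ("-", f)])
    return found


def apply_change(facts: FrozenSet[Fact], change: Change) -> FrozenSet[Fact]:
    sign, fact = change
    return facts | {fact} if sign == "+" else facts - {fact}


def undoes(change: Change, delta: Delta) -> bool:
    sign, fact = change
    return ("-" if sign == "+" else "+", fact) in delta


def cost(delta: Delta) -> int:
    """Made-up preference on deltas: lower is better."""
    return sum(1 if s == "+" else (3 if f[0] in SCHEMA else 2) for s, f in delta)


def resolution_tree(facts: FrozenSet[Fact], locally_optimal: bool,
                    delta: Delta = ()) -> List[Delta]:
    """Resolve one violation at a time; leaves are deltas leading to valid datasets."""
    pending = violations(facts)
    if not pending:
        return [delta]
    options = [c for c in pending[0] if not undoes(c, delta)]
    if locally_optimal and options:  # LO: keep only the locally best branches
        best = min(cost((c,)) for c in options)
        options = [c for c in options if cost((c,)) == best]
    leaves: List[Delta] = []
    for change in options:
        leaves += resolution_tree(apply_change(facts, change),
                                  locally_optimal, delta + (change,))
    return leaves


def repair(facts: FrozenSet[Fact], locally_optimal: bool) -> List[Delta]:
    leaves = resolution_tree(facts, locally_optimal)
    if leaves and not locally_optimal:  # GO: filter only at the end
        best = min(cost(d) for d in leaves)
        leaves = [d for d in leaves if cost(d) == best]
    return leaves


D = frozenset({
    ("Domain", "geo:location", "Sensor"),
    ("Disjoint", "Sensor", "Observation"),
    ("P_Inst", "Item1", "ST1", "geo:location"),
    ("C_Inst", "Item1", "Observation"),
})
go_repairs = repair(D, locally_optimal=False)
lo_repairs = repair(D, locally_optimal=True)
```

On this input LO greedily adds C_Inst(Item1,Sensor), which triggers the disjointness violation and ends with a total cost of 3, whereas GO explores every branch and returns the single deletion of the P_Inst fact with cost 2.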
Repair: Generality Results
The approach is very general
Thanks to the generality/flexibility of preferences
Repair approaches can be captured using adequately designed preferences
Using either the LO or the GO strategy
All the current approaches that we checked
Practically all future ones
—This has been proved, under some general conditions regarding the behavior of the repair approach
Our model can be viewed as a general approach engulfing other repair approaches
Repair: Algorithms and Complexity
Implemented both algorithms
Detailed complexity analysis for GO/LO and various different types of constraints and preferences
Inherently difficult problem
Exponential complexity (in general)
Main exception: LO is polynomial (in special cases)
Theoretical complexity is misleading as to the actual performance of the algorithms
Performance in Practice
Performance in practice
Linear with respect to dataset size
Linear with respect to tree size
—Types of violated constraints (tree width)
—Number of violations (tree height) – causes the exponential blowup
—Constraint interdependencies (tree height)
—Preference (for LO): affects pruning (tree width)
Further performance improvement
Use optimizations
Use LO with restrictive preference
Currently considering a redesign for further improvement
Summary and Conclusions: Repair
Data usually problematic
Different types of problems
Repair is done using different approaches depending on the type of the problem
Cleaning, debugging, repairing, quality assessment and enrichment
Presented a formal approach for validity repair [RFC11]
Other possible directions (related to LOD)
Most approaches detect problems, but don’t resolve them
Efficiency problems (for repairing algorithms)
Exploit external knowledge on the cause of the problem (e.g., propagation of invalidity by a linked dataset)
Talk Structure (D1)
A. Introduction to RDF/S, DLs, OWL
B. Remote change management
1. Introduction, definition of subfields
2. Literature review
3. An approach for change detection [PFF+13]
C. Repair
1. Introduction, definition of subfields
2. Literature review
3. An approach for validity repair [RFC11]
D. Data and Knowledge Evolution
1. Introduction, connection with belief change
2. Understanding the process of change
3. Literature review
Motivation for Evolution
Reasons for evolution
New observations or experiments
Change in the viewpoint or usage of the dataset
Newly gained access to information (previously classified, unknown or otherwise unavailable)
Incomplete or inaccurate conceptualization
Changes in the world itself
Repairing
Change propagation (cascading evolution in LOD)
Not an LOD-specific problem
But critical for LOD as well
Definition of Evolution
The process of modifying a dataset in response to a change in the domain or its conceptualization
Dealing with both data and schema changes
[Figure: the Original Dataset and the New Data/Knowledge are fed to the Evolution Algorithm, which outputs the Modified Dataset]
Evolution: Setting the Scope
Evolution is an overloaded term
Phases of evolution
Six phases in [SMMS02], five phases in [PT05]
Detecting the need for evolution, change propagation, logging changes, versioning etc
Scope: apply the change and compute the new dataset
Out of scope: deciding on the change, evaluating the result, managing versions, logging changes etc
Explaining Evolution (1/4)
[Figure: Chess dataset graph; schema-level nodes ChessPiece, Plastic, Wooden, White, Black, Red; data-level node King]
Chess Dataset
Representation Language: RDF
Change: Add([King rdf:type Red])
Explaining Evolution (2/4)
[Figure: Chess dataset graph; schema-level nodes ChessPiece, Plastic, Wooden, White, Black, Red; data-level node King]
Chess Dataset
Representation Language: RDF
Change: Del([King rdf:type Black])
Is the King Wooden?
Explaining Evolution (3/4)
[Figure: Chess dataset graph; schema-level nodes ChessPiece, Plastic, Wooden, White, Black, Red; data-level node King]
Chess Dataset
Representation Language: RDF
Change: Del([King rdf:type Wooden])
Some domain knowledge required (extra-logical considerations)
Explaining Evolution (4/4)
[Figure: Chess dataset graph; schema-level nodes ChessPiece, Plastic, Wooden, White, Black, Red, with a link labelled “disjoint”; data-level node King]
Chess Dataset
Representation Language: OWL
Wooden and Plastic are disjoint
[Wooden owl:disjointWith Plastic]
Change: Add([King rdf:type Plastic])
Is the King Black?
Is the King Wooden?
Side-effects in Evolution
Changes should not undermine the “quality” of the dataset
Side-effects: additional changes that need to be applied along with the original change to maintain knowledge integrity and quality
Consistency, coherency, custom constraints, quality metrics, …
Main challenge in determining the evolution result
Determining side-effects
Determining Side-effects
Challenges in determining side-effects
Evolution result not always obvious (even for humans)
—Understand the process of change
—Various philosophical considerations involved
Selection involved (extra-logical considerations)
—Domain expertise
—Preferences (trust, provenance, axiom “strength” or “entrenchment”)
Early evolution approaches rather naïve in this respect
Ignored such issues or addressed them in an ad-hoc manner
Belief Change
Belief change (often referred to as belief revision)
The process of modifying a knowledge base in the face of new, possibly contradictory knowledge
Mature, well-established field
Focuses on logical formalisms (propositional, first-order logic)
Recent survey on belief change [FH11]
Aims to understand the process of change
The philosophical/logical counterpart of dataset evolution
Can provide solutions and inspiration
Cross-Fertilization with Belief Change
Cross-fertilization beneficial [Flo06, FPA05, FPA06]
Benefits
Similar problems
Differences in the underlying intuitions are minimal
Belief change field more mature
Frame problems and provide inspiration towards a solution
Protect from pitfalls
Avoid “reinventing the wheel”
Problems
Representation languages and formalisms are different
Assumptions regarding the underlying representation language
—These assumptions do not hold for LOD representation languages
Can reuse the ideas, not the results themselves
Talk Structure (D2)
A. Introduction to RDF/S, DLs, OWL
B. Remote change management
1. Introduction, definition of subfields
2. Literature review
3. An approach for change detection [PFF+13]
C. Repair
1. Introduction, definition of subfields
2. Literature review
3. An approach for validity repair [RFC11]
D. Data and Knowledge Evolution
1. Introduction, connection with belief change
2. Understanding the process of change
3. Literature review
Challenges and Considerations
List of challenges and problems related to evolution
As well as some answers from the belief change field
Challenges and the complexity of formalisms
Some of the problems do not appear in simpler formalisms (RDF)
Some of the problems are only relevant in the presence of schema
—Data changes are simpler (on a fixed schema)
Part of the discussion only relevant for DL, OWL
Importance of Implicit Data (Example)
[Figure: Chess dataset graph; schema-level nodes ChessPiece, Plastic, Wooden, White, Black, Red; data-level node King]
Chess Dataset
Representation Language: RDF
Change: Del([King rdf:type Black])
Is the King Wooden?
Importance of Implicit Data
Explicit and implicit data equally important
The coherence viewpoint
King is Wooden
The closure of the dataset is considered during changes
—Belief set semantics
Implicit data persistent
—Explicit support not necessary for implicit data
No discrimination
—No need to distinguish explicit data from implicit
—Redundant data can be deleted
Explicit data more important than implicit
The foundational viewpoint
King is not Wooden
Only explicit knowledge is considered during changes
—Belief base semantics
Implicit data volatile
—Retained only as long as there is explicit support
Discrimination
—Explicit data should be explicitly marked as such
—Redundant data should persist
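The two viewpoints can be contrasted on the chess example. The sketch below assumes, since the figure itself is not recoverable from the text, that the dataset states [King rdf:type Black] and [Black rdfs:subClassOf Wooden]; the closure only propagates rdf:type along rdfs:subClassOf, and the deletion functions are deliberately naive illustrations rather than full contraction operators.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def closure(triples: Set[Triple]) -> Set[Triple]:
    """Propagate rdf:type along rdfs:subClassOf until nothing new is derived."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(closed):
            if p != "rdf:type":
                continue
            for (c, q, d) in list(closed):
                if q == "rdfs:subClassOf" and c == o:
                    derived = (s, "rdf:type", d)
                    if derived not in closed:
                        closed.add(derived)
                        changed = True
    return closed


def delete_foundational(explicit: Set[Triple], triple: Triple) -> Set[Triple]:
    """Belief base semantics: change the explicit triples, then re-derive."""
    return closure(explicit - {triple})


def delete_coherence(explicit: Set[Triple], triple: Triple) -> Set[Triple]:
    """Belief set semantics: change the closure, so implicit data persists."""
    return closure(explicit) - {triple}


# Assumed content of the chess dataset (see the lead-in).
chess = {
    ("King", "rdf:type", "Black"),
    ("Black", "rdfs:subClassOf", "Wooden"),
}
change = ("King", "rdf:type", "Black")
king_is_wooden = ("King", "rdf:type", "Wooden")

wooden_under_coherence = king_is_wooden in delete_coherence(chess, change)
wooden_under_foundational = king_is_wooden in delete_foundational(chess, change)
```

Under these assumptions the King stays Wooden in the coherence view and stops being Wooden in the foundational view, matching the two answers on the slide.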
Redundant Data
[Figure: Chess dataset graph; schema-level nodes ChessPiece, Plastic, Wooden, White, Black, Red; data-level node King]
Chess Dataset
Representation Language: RDF
Change: Add([King rdf:type Black])
The King Is Black
[Figure: Chess dataset graph; schema-level nodes ChessPiece, Plastic, Wooden, White, Black, Red; data-level node King]
Chess Dataset
Representation Language: RDF
Observation: the King is Black
Change: Add([King rdf:type Black])
Is the King Wooden?
Paint It Black
[Figure: Chess dataset graph; schema-level nodes ChessPiece, Plastic, Wooden, White, Black, Red; data-level node King]
Chess Dataset
Representation Language: RDF
Action: King is painted Black
Change: Add([King rdf:type Black])
Is the King Wooden?
Static and Dynamic Worlds
Same dataset, same change, but different expected result
Different semantics between the two cases [KM91]
Different operations
Static world change semantics
The world does not change, but our perception of it changes
Modeling or conceptualization problems, new observation etc
Dynamic world change semantics
The world changes, and we need to keep ourselves up-to-date
No problems with the original conceptualization
Types of Operations
Static world
Revision (add)
Contraction (delete)
Dynamic world
Update (add)
Erasure (delete)
Plus some more (forget, expansion, …)
Less well-studied
Ignored for this talk
Irrelevant for LOD or trivial
            Static        Dynamic
Addition    Revision      Update
Deletion    Contraction   Erasure
Example: Revision and Contraction
[Figure: Chess dataset graph; schema-level nodes ChessPiece, Plastic, Wooden, White, Black, Red and NotBlack; data-level node King]
Chess Dataset
Representation Language: OWL
Change #1
I believe that the King is not Black
Add([King rdf:type NotBlack], [NotBlack owl:complementOf Black])
Change #2
I do not believe that the King is Black
Del([King rdf:type Black])
Expressing the Change
Different paradigms for expressing the change
Modification-based
—“Add([King rdf:type NotBlack], [NotBlack owl:complementOf Black])”
—The exact modifications that should be applied to accommodate the new knowledge
—Must know the conceptualization
—Closer to the ontology expert
Fact-based
—“I believe that the King is not Black”
—A new fact that should be accommodated in the dataset
—Extra layer of abstraction (extra step required to determine modifications)
—Closer to the domain expert
Handling multiple changes
Iterated belief change
Package versus choice semantics (contraction and erasure)
Merging
Evolution Principles (Partial List)
Principle of Success (Primacy of New Information)
New information is unconditionally accepted
Non-prioritized belief change
Principle of Validity (Consistency Maintenance)
Belief change: usually logical consistency
LOD evolution: consistency, coherency, custom constraints, …
Principle of Minimal Change
Determine the side-effects that have minimal impact
—But satisfying the other principles
Corresponds to the selection process
Minimality depends on the task, context, user, application, …
Different postulates and intuitions (recovery, relevance etc)
Different metrics (model-based, formula-based, cardinality etc)
Understanding the Principles
[Figure: Chess dataset graph; schema-level nodes ChessPiece, Plastic, Wooden, White, Black, Red, with a link labelled “disjoint”; data-level node King]
Chess Dataset
Representation Language: OWL
Wooden and Plastic are disjoint
[Wooden owl:disjointWith Plastic]
Change: Add([King rdf:type Plastic])
Invalidity (basically, inconsistency)
The King is both Wooden and Plastic
Three options (Minimal Change)
Non-obvious Side-effects
[Figure: Chess dataset graph; schema-level nodes ChessPiece, Plastic, Wooden, White, Black, Red; data-level node King]
Chess Dataset
Representation Language: ALC DL
I don’t believe that all White items are Chess_Pieces
Replace subsumptions with:
White ⊓ Chess_Piece ⊑ Plastic
Plastic ⊑ White ⊔ Chess_Piece
[Figure: diagram of the classes Plastic, Chess_Piece and White]
Talk Structure (D3)
A. Introduction to RDF/S, DLs, OWL
B. Remote change management
1. Introduction, definition of subfields
2. Literature review
3. An approach for change detection [PFF+13]
C. Repair
1. Introduction, definition of subfields
2. Literature review
3. An approach for validity repair [RFC11]
D. Data and Knowledge Evolution
1. Introduction, connection with belief change
2. Understanding the process of change
3. Literature review
Classes of Belief Change Approaches (1/2)
Postulates (one set for each operation)
Formalize the principles, using logical conditions
Essentially define the properties of a rational change operator
—Some principles not considered or given varying semantics
—Principle of Minimal Change is the most controversial
Do not uniquely define an operator
—A class of operators (expected rational results)
—Extra-logical considerations would determine the actual result
—Operator-specific (preferences, axiom strength, hard-coded semantics, …)
Belief change context [AGM85, KM91, Han91]
Evolution context [FKAC13, WWT10, QLB06a, QLB06b]
Classes of Belief Change Approaches (2/2)
Construction methods
Intuitive constructions for a family of operators of a certain type
Representation theorems
—Proof that the constructed family corresponds exactly to the class of operators that satisfy a certain set of postulates
Can be used as “templates” to construct rational change operators
Parameterized selection process
—Preferences, axiom strength, etc
Popular in belief change, not so much in evolution
Explicit algorithms
Implement a specific operator that satisfies some of the postulates
Hard-coded or parameterized selection process
Popular in evolution context, not so much in belief change
Discussion on the Operator Types (1/2)
Connections between the various operators
Static: revision/contraction interdefinable [AGM85]
Dynamic: update/erasure interdefinable [KM91]
Model-theoretic characterization of the connection between static/dynamic worlds (revision-update, contraction-erasure) [KM91]
Postulates critical for establishing those results
Revision and update more useful in practice
Contraction/erasure only used to express agnosticism
Contraction and erasure more interesting from a theoretical perspective
More fundamental operations
Discussion on the Operator Types (2/2)
Revise with φ (in belief change)
Contract ¬φ
—This resolves, a priori, any potential inconsistency problems
Add φ (without side-effects)
Revise with φ (in LOD)
Contract data that could potentially cause problems
—Inconsistency, incoherency, …
Add φ (without side-effects)
Contraction is the basis for revision
Simpler operation
Basically, if you know how to contract, you know how to revise
Most of the focus in belief change and also in LOD evolution
Same for update/erasure
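The "contract ¬φ, then add φ" recipe (known in the belief change literature as the Levi identity) can be sketched in a few lines. This is only an illustration under stated assumptions: the three-atom vocabulary, the example belief base, and the brute-force `contract` function (which keeps the first largest sub-base it finds) are all hypothetical choices, not an operator proposed in the tutorial.

```python
from itertools import combinations, product

ATOMS = ("penguin", "bird", "flies")  # hypothetical vocabulary

def entails(formulas, goal):
    """True if every truth assignment satisfying all formulas also satisfies goal."""
    for values in product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, values))
        if all(f(world) for f in formulas) and not goal(world):
            return False
    return True

def contract(base, phi):
    """Return a largest sub-base that does not entail phi (first one found)."""
    for size in range(len(base), -1, -1):
        for subset in combinations(base, size):
            if not entails([f for _, f in subset], phi):
                return list(subset)
    return []  # phi is a tautology and cannot be given up

def revise(base, label, phi):
    """Levi identity: contract the negation of phi, then add phi."""
    contracted = contract(base, lambda w: not phi(w))
    return contracted + [(label, phi)]

# Assumed example base: each belief is a (label, formula) pair.
base = [
    ("penguin", lambda w: w["penguin"]),
    ("penguin -> bird", lambda w: not w["penguin"] or w["bird"]),
    ("bird -> flies", lambda w: not w["bird"] or w["flies"]),
]
revised = revise(base, "not flies", lambda w: not w["flies"])
labels = [name for name, _ in revised]
```

Which sub-base survives the contraction step is an extra-logical choice; here it is simply enumeration order, where a real operator would use preferences or axiom strength.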
Evolution via Editors
Features
Intuitive interfaces
Easy to add/delete triples (but not facts)
Some help for determining the side-effects of a change
—Embedded reasoners and/or debugging/repair tools to propose side-effects
Additional facilities
—Versioning, monitoring, undo/redo, …
Main problems
User should be both ontology and domain expert
Not applicable in some cases
—Examples: automated agents, time-critical applications, massive streaming input
No formal properties
Examples
Protégé (http://protege.stanford.edu/)
NeOn toolkit (http://neon-toolkit.org/wiki/Main_Page)
OntoStudio (http://www.semafora-systems.com/en/products/ontostudio/)
KAON2 (http://kaon2.semanticweb.org/)
Declarative Approaches
SPARQL Update (http://www.w3.org/TR/sparql11-update/)
For RDF
Fixed semantics, no side-effects
Data and schema operations (also bulk changes)
RUL [MSCK05]
For RDF/S, taking into account RDFS semantics
Fixed semantics, predefined set of side-effects per operation
Only for data operations (also bulk changes)
EvoPat [RHTA10]
Declaratively associate changes with side-effects (using SPARQL)
SPARQL queries determine whether side-effects should be applied
SPARQL update statements represent such side-effects
Tempus fugit [LRV09]
Event-driven, declarative specification of the operators’ semantics
Fixed-Operations Approach
Standard approach in the early days (e.g., [SMMS02])
Set of supported operations (Add_Class, Add_Domain, …)
Identify potential problems and side-effects per operation
—Decision is hard-coded or user-defined (from a set of options)
—Example: when deleting a subsumption, how about implicit subsumptions?
Automatic or semi-automatic
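The subsumption example can be made concrete with a small sketch. It assumes subsumptions are stored as (subclass, superclass) pairs in an acyclic hierarchy; the operation name `delete_subsumption`, the `keep_implicit` flag, and the class names are hypothetical and only illustrate the kind of per-operation decision that a fixed-operations approach hard-codes or leaves to the user.

```python
def closure(edges):
    """Transitive closure of a set of (sub, super) subsumption pairs."""
    closed = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

def delete_subsumption(edges, target, keep_implicit):
    """Remove one explicit subsumption; optionally preserve what it implied."""
    before = closure(edges)
    remaining = set(edges) - {target}
    if keep_implicit:
        # Side-effect: re-assert subsumptions that held only via the deleted pair.
        lost = before - closure(remaining) - {target}
        remaining |= lost
    return remaining

# Assumed example hierarchy: Student < Person < Agent.
edges = {("Student", "Person"), ("Person", "Agent")}
dropped = delete_subsumption(edges, ("Student", "Person"), keep_implicit=False)
kept = delete_subsumption(edges, ("Student", "Person"), keep_implicit=True)
```

With `keep_implicit=False` the implicit fact that Student is an Agent disappears together with the deleted pair; with `keep_implicit=True` it is added explicitly as a side-effect. Neither choice is "the" correct one, which is exactly the difficulty the slide points at.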
Problems
No consensus on the language of changes
No limit on the number of operations
—What about unknown/unsupported operators?
No exhaustive formal analysis of potential side-effects
No formal properties or other guarantees
Incomplete understanding of the change process
Approaches Inspired by Belief Change (1/2)
Revision in ALU DL [LM04]
Using preferences among axioms
Inspired by “epistemic entrenchment”
Revision in generic DLs [QD09]
Three model-based revision operators for DLs
Emphasis on the Principle of Irrelevance of Syntax
—Semantical, rather than syntactical, considerations should drive the result
Revision in DL-Lite [GQW12]
Using a graph-based algorithm
For data changes only (ABox)
Update and erasure in RDF/S [GHV06, GHV11]
Taking into account RDFS inference
Update is trivial, erasure is challenging (due to RDFS inference)
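A minimal sketch of why erasure is harder than update under RDFS inference. It assumes only two inference rules (subClassOf transitivity and type inheritance) and a simplified, cardinality-based notion of erasure candidate; this is an illustration, not the definition used in [GHV06, GHV11]. The example graph is hypothetical.

```python
from itertools import combinations

SC, TYPE = "rdfs:subClassOf", "rdf:type"

def rdfs_closure(triples):
    """Closure under two RDFS rules only: subClassOf transitivity, type inheritance."""
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                if p2 == SC and o == s2 and p in (SC, TYPE):
                    new.add((s, p, o2))
        if new <= closed:
            return closed
        closed |= new

def update(graph, triple):
    """Update is trivial: the new triple is simply added."""
    return set(graph) | {triple}

def erasure_candidates(graph, triple):
    """Sub-graphs that drop as few explicit triples as possible and no longer entail `triple`."""
    graph = sorted(graph)
    for removed in range(len(graph) + 1):
        hits = [set(graph) - set(cut) for cut in combinations(graph, removed)
                if triple not in rdfs_closure(set(graph) - set(cut))]
        if hits:
            return hits

# Assumed example: the triple to erase is implicit, not explicit.
graph = {("tweety", TYPE, "Penguin"), ("Penguin", SC, "Bird")}
target = ("tweety", TYPE, "Bird")
candidates = erasure_candidates(graph, target)
```

Here the erased triple is never stated explicitly, yet two different explicit triples can be given up to stop it from being inferred, so the result of the erasure is not unique.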
Approaches Inspired by Belief Change (2/2)
Using the maxi-adjustment algorithm [MLB05, QLB06a, QLB06b]
Used to repair inconsistencies in propositional knowledge bases
Requires a stratification of the knowledge
Adapted for disjunctive DLs
Using kernel operators [Han94]
Kernels: minimal sets of formulas leading to inconsistency
—Minimal Incoherence-Preserving Sub-TBoxes (MIPS) [SC03]
OWL [HWK06]
DLs [QHHP08]
Generic formalisms with no negation (such as RDF) [RW07]
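To illustrate the notion of a kernel, the sketch below enumerates the minimal inconsistent subsets of a tiny propositional base by brute force and then picks a smallest set of formulas that intersects every kernel. The base, the atom names, and the helper `smallest_incision` are hypothetical; practical systems for DLs and OWL rely on reasoners rather than truth tables.

```python
from itertools import combinations, product

def consistent(formulas, atoms):
    """True if some truth assignment over `atoms` satisfies all formulas."""
    return any(all(f(dict(zip(atoms, vals))) for f in formulas)
               for vals in product([False, True], repeat=len(atoms)))

def kernels(base, atoms):
    """All minimal inconsistent subsets of `base` (a dict label -> formula)."""
    found = []
    labels = sorted(base)
    for size in range(1, len(labels) + 1):
        for subset in combinations(labels, size):
            if any(set(k) <= set(subset) for k in found):
                continue  # contains a smaller kernel, so it is not minimal
            if not consistent([base[l] for l in subset], atoms):
                found.append(subset)
    return found

def smallest_incision(kernel_list):
    """A smallest set of labels that hits every kernel (first one found)."""
    labels = sorted({l for k in kernel_list for l in k})
    for size in range(len(labels) + 1):
        for cut in combinations(labels, size):
            if all(set(cut) & set(k) for k in kernel_list):
                return set(cut)

# Assumed example base over two atoms.
atoms = ("a", "b")
base = {
    "a": lambda w: w["a"],
    "a->b": lambda w: not w["a"] or w["b"],
    "b": lambda w: w["b"],
    "not b": lambda w: not w["b"],
}
ks = kernels(base, atoms)
cut = smallest_incision(ks)
```

Removing the formulas in `cut` breaks every kernel and therefore restores consistency; choosing by size alone is an assumption, and a real incision function would typically use preferences among formulas.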
Postulation Approaches in Evolution (1/3)
AGM: dominating paradigm in belief change [AGM85]
The single most influential work in the field of belief change
Contributions
AGM postulates: two sets of 6 basic and 2 supplementary postulates
—One set for each operator (revision and contraction)
Plus various related results
—Partial meet contraction
—Representation theorems
—Connections between operators
Only for classical logics (satisfying certain assumptions)
Propositional, first-order, modal logics, …
Not for LOD formalisms (RDF/S, DLs, OWL)
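For reference, the six basic contraction postulates can be written as follows. The notation is an assumption of this sketch (K is a logically closed theory, ÷ is contraction, Cn is the consequence operator), and the postulates are named rather than numbered because numbering differs between presentations; the two supplementary postulates are omitted.

```latex
% Basic AGM contraction postulates for a logically closed theory K
\begin{align*}
\text{Closure:}        &\quad K \div \varphi = Cn(K \div \varphi) \\
\text{Inclusion:}      &\quad K \div \varphi \subseteq K \\
\text{Vacuity:}        &\quad \varphi \notin K \implies K \div \varphi = K \\
\text{Success:}        &\quad \varphi \notin Cn(\emptyset) \implies \varphi \notin K \div \varphi \\
\text{Extensionality:} &\quad Cn(\{\varphi\}) = Cn(\{\psi\}) \implies K \div \varphi = K \div \psi \\
\text{Recovery:}       &\quad K \subseteq Cn\bigl((K \div \varphi) \cup \{\varphi\}\bigr)
\end{align*}
```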
Postulation Approaches in Evolution (2/3)
AGM contraction postulates adapted for monotonic logics [Flo06, FPA05, FPA06]
Includes all LOD formalisms
But: no contraction operator satisfying the postulates exists for many such logics
Cannot find a proper result in certain cases
Necessary and sufficient conditions for the existence of such an operator [FPA06, Flo06]
Negative results for RDF/S, OWL, most DLs [FPA05, RWFA13]
Problem stems from the postulate of recovery [AGM85]
Captures the Principle of Minimal Change
Controversial [Han91]
Postulation Approaches in Evolution (3/3)
Replacing recovery with optimal recovery [FPA06, FHP+06]
Equivalent to recovery for classical logics
But weaker in general
Not particularly successful either
Replacing recovery with relevance [Han91]
An intuitive, well-established alternative to recovery
Equivalent to recovery for classical logics
Applicable under quite general conditions [RWFA13]
—Applicable for all compact logics
—Includes RDF/S, practically all DLs and OWL flavors and profiles
Adequate for expressing the principles of contraction in LOD languages
Connections with recovery established for non-classical logics
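The relevance postulate is usually stated as below. The notation is an assumption of this sketch (K is the knowledge base, ÷ is contraction, Cn is the consequence operator): a formula β may be given up when contracting φ only if it contributes, together with some retained part K′ of K, to deriving φ.

```latex
% Relevance: nothing is removed unless it helps to derive the contracted formula
\begin{align*}
&\text{If } \beta \in K \text{ and } \beta \notin K \div \varphi,
  \text{ then there is } K' \text{ with } K \div \varphi \subseteq K' \subseteq K \\
&\text{such that } \varphi \notin Cn(K') \text{ and } \varphi \in Cn(K' \cup \{\beta\}).
\end{align*}
```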
Principle of Adequacy of Representation
The evolution result should be expressible in the same formalism as the original dataset
Obvious and trivial
Not always compatible with our requirements for the evolution result
Postulates (e.g., AGM postulates)
Specific incarnations of the Principle of Minimal Change
Specific computational methods or classes of operators
Two stages for the computation [CGKZ12]
Find the “optimal” evolution result according to the requirements
Express it in the target language (not always possible)
—Inexpressibility results
Inexpressibility for Classes of Operators
Generic contraction methods [CGKZ12]
Syntactic: remove a minimal set of explicit axioms
Formula-based: remove a minimal set of axioms from the closure
—Three different semantics for minimality
Model-based: modify the model in a minimal manner
—Eight different methods to find the “minimal” distance between models
Existing contraction algorithms can be categorized along these generic classes of methods
Different contraction methods not compatible in general (for DLs)
Model-based and formula-based are compatible in classical logics
Inexpressibility results for DL-Lite, EL (i.e., OWL2 QL, OWL2 EL) [CGKZ12]
Proposal: a “hybrid” operator combining ideas from syntactic and formula-based approaches [CGKZ12]
More Inexpressibility Results
DL-Lite evolution [CKNZ10]
Focusing on model-based and formula-based approaches for contraction
Inexpressibility results
Propose a formula-based approach
DL revision [LLMW06]
Model-based approach, limited to ABox only (data level)
Inexpressibility results
Propose a new DL that supports model-based evolution
Approximations
DL-Lite_F [GLPR07, GLPR09]
—Update and erasure approximation algorithms for data-level changes only
—Alternative: extend DL-Lite_F to make sure that the result is expressible
DL-Lite [WWT10]
—Provide postulates and approximation algorithms for revision
Other Approaches
Evolution using ideas from argumentation frameworks [MRF08]
ALC DL
Inconsistency in a dataset is an “attack” between arguments
Acceptability semantics used to resolve such attacks and eliminate inconsistencies
Useful for both debugging and evolution
Evolution can be reduced to debugging/repair [HHH+05]
Apply the change
Then repair the result to resolve problems (Principle of Validity)
—Making sure the change is not “undone” during repair (Principle of Success)
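The "apply, then repair" reduction can be sketched as follows. The sketch assumes a dataset of (resource, class) typing facts and a single kind of problem (one resource typed with two disjoint classes); the `DISJOINT` declaration and the example facts are hypothetical, and the repair strategy is deliberately naive. It is not the framework of [HHH+05].

```python
# Assumed schema knowledge: Person and Organization are disjoint classes.
DISJOINT = {frozenset({"Person", "Organization"})}

def violations(facts):
    """Pairs of facts that type the same resource with two disjoint classes."""
    found = []
    for x, c1 in sorted(facts):
        for y, c2 in sorted(facts):
            if x == y and c1 < c2 and frozenset({c1, c2}) in DISJOINT:
                found.append(((x, c1), (y, c2)))
    return found

def evolve(facts, change):
    """Apply the change, then repair without ever removing the change itself."""
    result = set(facts) | {change}            # Principle of Success
    for left, right in violations(result):    # Principle of Validity
        victim = right if left == change else left
        result.discard(victim)
    return result

# Assumed example data and change.
facts = {("acme", "Organization"), ("bob", "Person")}
change = ("acme", "Person")
evolved = evolve(facts, change)
```

The only difference from a plain repair is the protected fact: when a conflict involves the newly added fact, the other side of the conflict is always the one removed.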
Evolution Under Custom Constraints
Evolution in the presence of custom validity constraints [KFAC07, FKAC13]
Methodology
Apply the change (Principle of Success)
Guarantee satisfaction of constraints (Principle of Validity)
Use a preference to determine minimality (Principle of Minimal Change)
Features
Generic method, applied for RDF/S evolution
A formal expression of the principles for the proposed setting
Exhaustive method to determine all possible side-effects and identify the “best” (according to the preference)
Constrain allowed preferences for rationality and performance
Based on similar ideas as the repairing approach of [RFC11]
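The three-step methodology can be illustrated with a toy instance. Everything concrete here is an assumption made for the sketch: the validity constraint (the subClassOf hierarchy must be acyclic), the numeric `cost` preference, and the brute-force enumeration of side-effects. It is not the algorithm of [KFAC07, FKAC13], and it assumes the change is not itself a self-loop.

```python
from itertools import combinations

def has_cycle(edges):
    """True if the subClassOf graph contains a cycle (assumed validity constraint)."""
    reach = set(edges)
    while True:
        extra = {(a, d) for a, b in reach for c, d in reach if b == c} - reach
        if not extra:
            break
        reach |= extra
    return any(a == b for a, b in reach)

def evolve(edges, change, cost):
    """Add `change`, then pick the cheapest set of deletions that restores validity."""
    others = sorted(set(edges) - {change})
    options = []
    for size in range(len(others) + 1):
        for removed in combinations(others, size):
            result = (set(edges) | {change}) - set(removed)   # change is always kept
            if not has_cycle(result):                         # constraint satisfied
                options.append((sum(cost(e) for e in removed), removed, result))
    best = min(options, key=lambda option: option[0])         # preference decides
    return best[2], best[1]

# Assumed example: adding A < B closes the cycle A < B < C < A.
edges = {("B", "C"), ("C", "A")}
preference = {("B", "C"): 2, ("C", "A"): 1}.get
result, side_effects = evolve(edges, ("A", "B"), preference)
```

Two different deletions would restore acyclicity here; the preference makes the choice, which is the role it plays in the methodology above. Exhaustive enumeration is exponential, which is one reason the allowed preferences are constrained for performance.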
Summary and Conclusions: Evolution
The problem of evolution is very challenging
Several issues need to be considered
—Not obvious to a newcomer
—Often ignored
Evolution approaches
Direct: manual, based on fixed operators, declarative
Indirect: postulation attempts
Adapted: adapting belief change algorithms or methods
Other possible directions (related to LOD)
Adapt for the “linked” character of LOD
—Evolution during propagation or after change detection
—Extra knowledge that can be exploited for adapting preferences, fine-tuning of automated algorithms etc
Thank You!
References (1/18)
[AAM09] C. Allocca, M. d'Aquin, E. Motta. Detecting Different Versions of Ontologies in Large Ontology Repositories. IWOD-09, 2009.
[ADA98] M.L. Abate, K.V. Diegert, H.W. Allen. A Hierarchical Approach to Improving Data Quality. Data Quality Journal, 4(1), 1998.
[AGM85] C. Alchourron, P. Gärdenfors, D. Makinson. On the Logic of Theory Change: Partial Meet Contraction and Revision Functions. Journal of Symbolic Logic, 50:510-530, 1985.
[AH06] S. Auer, H. Herre. A Versioning and Evolution Framework for RDF Knowledge Bases. PSI-06, Revised Papers, 2006.
[BC09] C. Bizer, R. Cyganiak. Quality-driven Information Filtering Using the WIQA Policy Framework. Journal of Web Semantics, 7:1–10, 2009.
[BLHL01] T. Berners-Lee, J. Hendler, O. Lassila. The Semantic Web. Scientific American, 2001.
References (2/18)
[CGKZ12] B. Cuenca Grau, E. Kharlamov, D. Zheleznyakov. Ontology Contraction: Beyond the Propositional Paradise. AMW-12, 2012.
[CKNZ10] D. Calvanese, E. Kharlamov, W. Nutt, D. Zheleznyakov. Evolution of DL-Lite Knowledge Bases. ISWC-10, 2010.
[CMDZ10] C.A. Curino, H.J. Moon, A. Deutsch, C. Zaniolo. Update Rewriting and Integrity Constraint Maintenance in a Schema Evolution Support System: PRISM++. PVLDB 4(2):117-128, 2010.
[CMZ08] C.A. Curino, H.J. Moon, C. Zaniolo. Graceful Database Schema Evolution: The PRISM Workbench. PVLDB 1(1):761-772, 2008.
[CQ13] G. Cheng, Y. Qu. Relatedness Between Vocabularies on the Web of Data: A Taxonomy and an Empirical Study. Web Semantics: Science, Services and Agents on the World Wide Web, 2013. Available at: http://dx.doi.org/10.1016/j.websem.2013.02.001
References (3/18)
[Deu09] A. Deutsch. FOL Modeling of Integrity Constraints (Dependencies). Encyclopedia of Database Systems, 2009.
[DA09] R. Djedidi, M. Aufaure. Change Management Patterns (CMP) for Ontology Evolution Process. IWOD-09, 2009.
References (4/18)
[FH10] C. Furber, M. Hepp. Using Semantic Web Resources for Data Quality Management. EKAW-10, 2010.
[FH11] E. Ferme, S.O. Hansson. AGM 25 Years: Twenty-five Years of Research in Belief Change. Journal of Philosophical Logic 40:295-331, 2011.
[FHP+06] G. Flouris, Z. Huang, J.Z. Pan, D. Plexousakis, H. Wache. Inconsistencies, Negations and Changes in Ontologies. AAAI-06, 2006.
[FKAC13] G. Flouris, G. Konstantinidis, G. Antoniou, V. Christophides. Formal Foundations for RDF/S KB Evolution. International Journal on Knowledge and Information Systems, 35(1):153-191, 2013.
[Flo06] G. Flouris. On Belief Change and Ontology Evolution. Ph.D. thesis, University of Crete, 2006.
References (5/18)
[FPA05] G. Flouris, D. Plexousakis, G. Antoniou. On Applying the AGM Theory to DLs and OWL. ISWC-05, 2005.
[FPA06] G. Flouris, D. Plexousakis, G. Antoniou. On Generalizing the AGM Postulates. STAIRS-06, 2006.
[FMK+08] G. Flouris, D. Manakanatas, H. Kondylakis, D. Plexousakis, G. Antoniou. Ontology Change: Classification and Survey. Knowledge Engineering Review, 23(2):117-152, 2008.
[FMV10] E. Franconi, T. Meyer, I. Varzinczak. Semantic Diff as the Basis for Knowledge Base Versioning. NMR-10, 2010.
[FRPV+12] G. Flouris, Y. Roussakis, M. Poveda-Villalon, P.N. Mendes, I. Fundulaki. Using Provenance for Quality Assessment and Repair in Linked Open Data. EvoDyn-12, 2012.
References (6/18)
[GHV06] C. Gutierrez, C. Hurtado, A. Vaisman. The Meaning of Erasing in RDF Under the Katsuno-Mendelzon Approach. WebDB-06, 2006.
[GHV11] C. Gutierrez, C. Hurtado, A. Vaisman. RDFS Update: From Theory to Practice. ESWC-11, 2011.
[GLPR07] G. De Giacomo, M. Lenzerini, A. Poggi, R. Rosati. On the Approximation of Instance Level Update and Erasure in Description Logics. AAAI-07, 2007.
[GLPR09] G. De Giacomo, M. Lenzerini, A. Poggi, R. Rosati. On Instance-level Update and Erasure in Description Logic Ontologies. Journal of Logic and Computation 19(5):745-770, 2009.
[GQW12] S. Gao, G. Qi, H. Wang. A New Operator for ABox Revision in DL-Lite. AAAI-12, 2012.
[Gru93] T.R. Gruber. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5(2), 1993.
References (7/18)
[Han91] S.O. Hansson. Belief Contraction Without Recovery. Studia Logica 50(2):251-260, 1991.
[Han94] S.O. Hansson. Kernel Contraction. Journal of Symbolic Logic, 59(3):845-859, 1994.
[HGR12] M. Hartung, A. Gross, E. Rahm. COnto-diff: Generation of Complex Evolution Mappings for Life Science Ontologies. Journal of Biomedical Informatics, 2012.
[HH00] J. Heflin, J. Hendler. Dynamic Ontologies on the Web. AAAI-00, 2000.
[HHM+10] H. Halpin, P.J. Hayes, J.P. McCusker, D.L. McGuinness, H.S. Thompson. When owl:sameAs Isn’t the Same: An Analysis of Identity in Linked Data. ISWC-10, 2010.
[HHH+05] P. Haase, F. van Harmelen, Z. Huang, H. Stuckenschmidt, Y. Sure. A Framework for Handling Inconsistency in Changing Ontologies. ISWC-05, 2005.
[HHP+10] A. Hogan, A. Harth, A. Passant, S. Decker, A. Polleres. Weaving the Pedantic Web. LDOW-10, 2010.
[HP04] J. Heflin, J.Z. Pan. A Model Theoretic Semantics for Ontology Versioning. ISWC-04, 2004.
[HS05] Z. Huang, H. Stuckenschmidt. Reasoning with Multi-version Ontologies: A Temporal Logic Approach. ISWC-05, 2005.
[HWK06] C. Halaschek-Wiener, Y. Katz. Belief Base Revision for Expressive Description Logics. OWLED-06, 2006.
References (8/18)
[ILK12] D.H. Im, S.W. Lee, H.J. Kim. A Version Management Framework for RDF Triple Stores. International Journal of Software Engineering and Knowledge Engineering, 22(1):85-106, 2012.
[JAP09] M. Javed, Y. Abgaz, C. Pahl. A Pattern-based Framework of Change Operators for Ontology Evolution. OTM-09, 2009.
[Jur74] J.M. Juran. The Quality Control Handbook. McGraw-Hill, New York, 1974.
References (9/18)
[KFAC07] G. Konstantinidis, G. Flouris, G. Antoniou, V. Christophides. Ontology Evolution: A Framework and its Application to RDF. SWDB-ODBIS-07, 2007.
[KFKO02] M. Klein, D. Fensel, A. Kiryakov, D. Ognyanov. Ontology Versioning and Change Detection on the Web. EKAW-02, 2002.
[KHS12] M. Knuth, J. Hercher, H. Sack. Collaboratively Patching Linked Data. USEWOD-12, 2012.
[KLGE07] N. Keberle, Y. Litvinenko, Y. Gordeyev, V. Ermolayev. Ontology Evolution Analysis with OWL-MeT. IWOD-07, 2007.
[KM91] H. Katsuno, A.O. Mendelzon. On the Difference Between Updating a Knowledge Base and Revising It. KR-91, 1991.
[KN03] M. Klein, N. Noy. A Component-based Framework for Ontology Evolution. IJCAI-03 Workshop on Ontologies and Distributed Systems, CEUR-WS, vol. 71, 2003.
[KPS+06] A. Kalyanpur, B. Parsia, E. Sirin, B. Cuenca Grau. Repairing Unsatisfiable Concepts in OWL Ontologies. ESWC-06, 2006.
[KWW08] B. Konev, D. Walther, F. Wolter. The Logical Difference Problem for Description Logic Terminologies. IJCAR-08, 2008.
[KWZ08] R. Kontchakov, F. Wolter, M. Zakharyaschev. Can you Tell the Difference Between DL-Lite Ontologies? KR-08, 2008.
References (10/18)
[LB10] J. Lehmann, L. Buhmann. ORE - A Tool for Repairing and Enriching Knowledge Bases. ISWC-10, 2010.
[LLMW06] H. Liu, C. Lutz, M. Milicic, F. Wolter. Updating Description Logic ABoxes. KR-06, 2006.
[LM04] K. Lee, T. Meyer. A Classification of Ontology Modification. AI-04, 2004.
[LPSV06] S.C. Lam, J. Pan, D. Sleeman, W. Vasconcelos. A Fine-grained Approach to Resolving Unsatisfiable Ontologies. WI-06, 2006.
[LRV09] U. Lusch, S. Rudolph, D. Vrandecic. Tempus Fugit: Towards an Ontology Update Language. ESWC-09, 2009.
References (11/18)
[Mel04] S. Melnik. Generic Model Management: Concepts and Algorithms. Springer, 2004.
[MFRW00] D.L. McGuinness, R. Fikes, J. Rice, S. Wilder. An Environment for Merging and Testing Large Ontologies. KR-00, 2000.
[MHS09] B. Motik, I. Horrocks, U. Sattler. Bridging the Gap Between OWL and Relational Databases. Journal of Web Semantics, 7(2):74-89, 2009.
[MLA+12] M. Morsey, J. Lehmann, S. Auer, C. Stadler, S. Hellmann. DBpedia and the Live Extraction of Structured Data from Wikipedia. Program: Electronic library and Information Systems, 46(2):157-181, 2012.
[MLB05] T. Meyer, K. Lee, R. Booth. Knowledge Integration for Description Logics. AAAI-05, 2005.
[MLBP06] T. Meyer, K. Lee, R. Booth, J.Z. Pan. Finding Maximally Satisfiable Terminologies for the Description Logic ALC. AAAI-06, 2006.
References (12/18)
[MMB12] P. Mendes, H. Muhleisen, C. Bizer. Sieve: Linked Data Quality Assessment and Fusion. LWDM-12, 2012.
[MMS+03] A. Maedche, B. Motik, L. Stojanovic, R. Studer, R. Volz. An Infrastructure for Searching, Reusing and Evolving Distributed Ontologies. WWW-03, 2003.
[MRF08] M. Moguillansky, N. Rotstein, M. Falappa. A Theoretical Model to Handle Ontology Debugging and Change through Argumentation. IWOD-08, 2008.
[MSCK05] M. Magiridou, S. Sahtouris, V. Christophides, M. Koubarakis. RUL: A Declarative Update Language for RDF. ISWC-05, 2005.
[MWK00] P. Mitra, G. Wiederhold, M.L. Kersten. A Graph-oriented Model for Articulation of Ontology Interdependencies. EDBT-00, 2000.
References (13/18)
[NCLM06] N. Noy, A. Chugh, W. Liu, M. Musen. A Framework for Ontology Evolution in Collaborative Environments. ISWC-06, 2006.
[NKKM04] N. Noy, S. Kunnatur, M. Klein, M. Musen. Tracking Changes During Ontology Evolution. ISWC-04, 2004.
[NM00] N.F. Noy, M.A. Musen. Prompt: Algorithm and Tool for Automated Ontology Merging and Alignment. AAAI/IAAI-00, 2000.
[OK02] D. Ognyanov, A. Kiryakov. Tracking Changes in RDF(S) Repositories. EKAW-02, 2002.
References (14/18)
[PFF+13] V. Papavassiliou, G. Flouris, I. Fundulaki, D. Kotzinos, V. Christophides. High-Level Change Detection in RDF(S) KBs. Transactions on Database Systems (TODS), 38(1), 2013.
[PM10] A. Passant, P.N. Mendes. SparqlPuSH: Proactive Notification of Data Updates in RDF Stores Using PubSubHubbub. SFSW-10, 2010.
[PT05] P. Plessers, O. de Troyer. Ontology Change Detection Using a Version Log. ISWC-05, 2005.
[PT06] P. Plessers, O. de Troyer. Resolving Inconsistencies in Evolving Ontologies. ESWC-06, 2006.
[PTC05] P. Plessers, O. de Troyer, S. Casteleyn. Event-based Modeling of Evolution for Semantic-driven Systems. CAiSE-05, 2005.
[PTC07] P. Plessers, O. de Troyer, S. Casteleyn. Understanding Ontology Evolution: A Change Detection Approach. Web Semantics: Science, Services and Agents on the WWW, 2007.
References (15/18)
[QD09] G. Qi, J. Du. Model-based Revision Operators for Terminologies in Description Logics. IJCAI-09, 2009.
[QHHP08] G. Qi, P. Haase, Z. Huang, J.Z. Pan. A Kernel Revision Operator for Terminologies. DL-08, 2008.
[QLB06a] G. Qi, W. Liu, D. Bell. Knowledge Base Revision in Description Logics. JELIA-06, 2006.
[QLB06b] G. Qi, W. Liu, D. Bell. A Revision-based Approach for Handling Inconsistency in Description Logics. NMR-06, 2006.
[QP07] G. Qi, J. Pan. A Stratification-based Approach for Inconsistency Handling in Description Logics. IWOD-07, 2007.
References (16/18)
[RFC11] Y. Roussakis, G. Flouris, V. Christophides. Declarative Repairing Policies for Curated KBs. HDMS-11, 2011.
[RH09] T. Ravn, M. Hoedbolt. How to Measure and Monitor the Quality of Master Data. 2009. Available at: http://www.information-management.com/issues/2007_58/master_data_management_mdm_quality-10015358-1.html
[RHTA10] C. Riess, N. Heino, S. Tramp, S. Auer. EvoPat - Pattern-based Evolution and Refactoring of RDF Knowledge Bases. ISWC-10, 2010.
[RPH+12] A. Rula, M. Palmonari, A. Harth, S. Stadtmüller, A. Maurino. On the Diversity and Availability of Temporal Information in Linked Open Data. ISWC-12, 2012.
[RSDT08] T. Redmond, M. Smith, N. Drummond, T. Tudorache. Managing Change: An Ontology Version Control System. OWLED-08, 2008.
[RW07] M.M. Ribeiro, R. Wassermann. Base Revision in Description Logics – Preliminary Results. IWOD-07, 2007.
[RWFA13] M.M. Ribeiro, R. Wassermann, G. Flouris, G. Antoniou. Minimal Change: Relevance and Recovery Revisited. AI Journal (to appear), 2013.
References (17/18)
[SC03] S. Schlobach, R. Cornet. Non-Standard Reasoning Services for the Debugging of Description Logic Terminologies. IJCAI-03, 2003.
[SMMS02] L. Stojanovic, A. Maedche, B. Motik, N. Stojanovic. User-driven Ontology Evolution Management. EKAW-02, 2002.
[SK03] H. Stuckenschmidt, M. Klein. Integrity and Change in Modular Ontologies. IJCAI-03, 2003.
[SP10] Y. Stavrakas, G. Papastefanatos. Supporting Complex Changes in Evolving Interrelated Web Databanks. CoopIS-10, 2010.
[SSN+10] H. Van de Sompel, R. Sanderson, M.L. Nelson, L.L. Balakireva, H. Shankar, S. Ainsworth. An HTTP-Based Versioning Mechanism for Linked Data. LDOW-10, 2010.
[TSBM10] J. Tao, E. Sirin, J. Bao, D.L. McGuinness. Integrity Constraints in OWL. AAAI-10, 2010.
[TTA08] Y. Tzitzikas, Y. Theoharis, D. Andreou. On Storage Policies for the Semantic Web Repositories that Support Versioning. ESWC-08, 2008.
[TLZ12] Y. Tzitzikas, C. Lantzaki, D. Zeginis. Blank Node Matching and RDF/S Comparison Functions. ISWC-12, 2012.
References (18/18)
[VWS+05] M. Volkel, W. Winkler, Y. Sure, S. Kruk, M. Synak. SemVersion: A Versioning System for RDF and Ontologies. ESWC-05, 2005.
[WHR+05] H. Wang, M. Horridge, A. Rector, N. Drummond, J. Seidenberg. Debugging OWL-DL Ontologies: A Heuristic Approach. ISWC-05, 2005.
[WWT10] Z. Wang, K. Wang, R. Topor. A New Approach to Knowledge Base Revision in DL-Lite. AAAI-10, 2010.
[ZAA+13] F. Zablith, G. Antoniou, M. d’Aquin, G. Flouris, H. Kondylakis, E. Motta, D. Plexousakis, M. Sabou. Ontology Evolution: A Process Centric Survey. Knowledge Engineering Review (to appear).
[ZTC11] D. Zeginis, Y. Tzitzikas, V. Christophides. On Computing Deltas of RDF/S Knowledge Bases. ACM Transactions on the Web (TWEB) 5(3), 2011.
[ZZL+03] Z. Zhang, L. Zhang, C.X. Lin, Y. Zhao, Y. Yu. Data Migration for Ontology Evolution. Poster ISWC-03, 2003.