Ontological Knowledge Engineering for Cultural Heritage of Andean Textiles
Immanuel Normann
Department of Computer Science and Information Systems
July 20, 2012
Project Context
● Pre-Columbian Latin America had no writing system
● Alternative encoding systems were developed to pass down cultural knowledge
● Hypothesis: weaving patterns as “writing systems” in this sense
● General research endeavour: deciphering these “writing systems”
● Our objective: systematization of knowledge about Andean weaving through an ontological approach
● implementation of ontological knowledge system
● instantiation of the system with facts
Project Team
● La Paz
Instituto de Lengua y Cultura Aymara (Denise Y Arnold)
● Domain experts: knowledge acquisition and creation, building physical and virtual models, creating multimedia data.
● Software developer: web front end
● London
● AHRC (Luciana Martins):
principal investigator & domain experts (iconographic analysis)
● Birkbeck DCS (Sven Helmer):
Knowledge engineering + knowledge system implementation
My Role in this Project
Knowledge engineering
Software engineering
Content processing
Overview
Project status at the beginning of my work
● Project proposal intends ontological approach
● La Paz team already acquainted with ontology-related know-how:
● Methontology
● Protege, CMap tools
● CIDOC-CRM
● Great amount of knowledge/data in spreadsheets
● Relational database schemes developed.
● other
● handwritten museum register documents
● images, videos, other multimedia documents,
● woven samples
Initial Steps
● identification of central research subdomains and their documents
textiles, instruments, processes, historical/cultural backgrounds, iconography, ...
● identification of central docs: concept maps, spreadsheets
● identification of the requirements for the KMS:
● identification of stakeholders
● development of use case scenarios
● competency questions
● setting up a communication platform & versioning system
Example Concept Map
[Figure: concept map (in Spanish) centred on the textile object (objeto textil). It relates the object to raw material (fibre, dye, mordant), instruments (looms, rueca, etc.), activities (events and processes such as shearing, spinning, dyeing, warping, weaving, finishing), actors (person, weaver, group), place (site, route), time (periods such as colonial and contemporary), style, structure, technique, and images (photo, video). Link labels include es (is a), tiene (has), pertenece a (belongs to), se elabora con (is made with), se hizo en (was made in), es elaborado por (is made by), se obtiene mediante (is obtained by).]
Example Competency Questions
● ¿En qué sitios se halla evidencia de la práctica de la técnica x?
● At which sites is there evidence of the practice of technique X?
● ¿En qué culturas se halla evidencia de la práctica de tal técnica?
● In which cultures is there evidence of the practice of this technique?
● ¿Cuál es el registro más antiguo de la técnica T?
● What is the oldest record of technique T?
● ¿En qué tipo de prenda se empleó por primera vez la técnica X?
● In which type of garment was technique X first used?
● ¿Qué tipos de textiles se ha tejido usando la técnica T en un período P y región R?
● Which types of textiles were woven using technique T in period P and region R?
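A competency question such as "At which sites is there evidence of the practice of technique X?" can be read as a pattern over a graph of facts. The following is a minimal sketch in Python, not the project's implementation; the property names (`uses_technique`, `found_at`) and all sample facts are hypothetical and chosen only for illustration.

```python
# Hypothetical facts as (subject, predicate, object) triples.
# None of these identifiers come from the project data.
FACTS = {
    ("textile_1", "uses_technique", "technique_x"),
    ("textile_1", "found_at", "site_a"),
    ("textile_2", "uses_technique", "technique_x"),
    ("textile_2", "found_at", "site_b"),
    ("textile_3", "uses_technique", "technique_y"),
    ("textile_3", "found_at", "site_a"),
}


def sites_with_evidence(facts, technique):
    """Return the sites where some textile made with `technique` was found."""
    # Step 1: textiles that use the technique.
    textiles = {s for (s, p, o) in facts
                if p == "uses_technique" and o == technique}
    # Step 2: sites at which any of those textiles was found.
    return sorted({o for (s, p, o) in facts
                   if p == "found_at" and s in textiles})
```

Calling `sites_with_evidence(FACTS, "technique_x")` joins the two patterns on the shared textile, which is the same shape a graph query language would express with a shared variable.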
Early Results from Requirement Analysis
● How much of ontological reasoning is needed?
● Which system could provide it?
● Early tendency: RDBMS.
● RDB schema already defined
● part of the content already inserted into the RDBMS
● most content in spreadsheets
● ideas for simple reasoning developed
(transitivity, ontological queries translated to SQL)
● Does this approach satisfy the requirements?
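One way the "transitivity translated to SQL" idea could look is a recursive query over a subclass table. The sketch below uses Python's built-in `sqlite3` module; the table name `subclass_of`, the class `Objeto`, and the chain of classes are assumptions for illustration, and the slides do not say which SQL technique the team actually used.

```python
import sqlite3

# In-memory database with a hypothetical subclass table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subclass_of (child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO subclass_of VALUES (?, ?)",
    [("Poncho", "Prenda"),
     ("Prenda", "Producto Textil"),
     ("Producto Textil", "Objeto")],
)

# Recursive common table expression: direct and indirect superclasses.
ANCESTORS_SQL = """
WITH RECURSIVE ancestors(name) AS (
    SELECT parent FROM subclass_of WHERE child = ?
    UNION
    SELECT s.parent FROM subclass_of AS s
    JOIN ancestors AS a ON s.child = a.name
)
SELECT name FROM ancestors
"""


def superclasses(cls):
    """All classes reachable from `cls` by following subclass_of upwards."""
    return sorted(row[0] for row in conn.execute(ANCESTORS_SQL, (cls,)))
```

This covers simple transitive queries, but every new kind of inference needs its own hand-written SQL, which is part of what the following slides weigh against the relational approach.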
Against the RDBMS approach
● Knowledge in concept maps
● graph-like knowledge representation, closer to ontological knowledge representation.
● graph-like queries involving some reasoning.
● Dynamic model evolution
● RDB schema change vs. ontology change.
Relational Database vs. Ontology
Relational database systems
● are well suited to modelling relationships under a static knowledge model (i.e. a static relational schema), but
● make schema changes problematic, and
● have no built-in notion of hierarchies.
Ontology knowledge systems
● allow storing the same datatypes as relational database systems
● allow modelling relationships
– in a different way, closer to concept maps than to relational tables
● have a built-in notion of hierarchies
● and allow even more reasoning.
Requirements for Museum KMS
A museum knowledge management system should
● facilitate relations between entities
● have built-in support for basic reasoning
● be flexible with respect to the evolution of the knowledge model
● facilitate storage of basic datatypes (numbers, boolean, ...), free text, and multimedia.
Conclusion
● the RDB approach is insufficient w.r.t. model evolution and reasoning.
● Ontological storage engine required.
● Which is the best for our purpose?
Review of Triplestores
State of the art surveys:
● http://www.w3.org/wiki/LargeTripleStores
● Europeana RDF Store Report (2011)
● An incomplete list of triple stores:
● Native stores: AllegroGraph, OWLIM, Stardog
● RDBMS based: Oracle, Jena SDB
● hybrid: Virtuoso, Sesame, BigData
Our Decision: Virtuoso
● Why Virtuoso:
● multi-paradigm storage: RDBMS (SQL), XML (XQuery), OWL (SPARQL), reasoning.
● scalable, massive data processing, stable, open-source edition, active community.
● some know-how from former projects
● Possible drawbacks:
● too many ways to implement a knowledge base.
● the manual runs to about 4000 pages.
● reasoning capabilities fall short of dedicated reasoners like Pellet.
Ontology in a nutshell
● unary constructs:
● individuals (e.g. the textile object whose ID is ILCA_BML074)
● class (e.g. the set of all garments classified as Poncho)
● binary constructs:
● object property = relation between individuals (e.g. in custody of: textile object ILCA_BML074 is in custody of the British Museum)
● data property = attribute of an individual (e.g. has width: textile object ILCA_BML074 has width 52 cm)
● instance of (type) = a relation between individuals and classes (e.g. textile object ILCA_BML074 is an instance of the class Facha Ancha)
● subclass relation = relation between classes (e.g. Facha Ancha is a subclass of Accesorios)
● and even more like: union, intersection, complement, quantification, number restriction, ...
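The basic constructs above can be written down as subject-predicate-object triples. The sketch below does this in plain Python for the examples on this slide; the identifier spellings (`Facha_Ancha`, `in_custody_of`, `has_width_cm`) are ad hoc renderings of the slide's names, not an official vocabulary.

```python
# The slide's examples as (subject, predicate, object) triples.
# Predicate and class spellings are illustrative assumptions.
NUTSHELL_TRIPLES = [
    ("ILCA_BML074", "type", "Facha_Ancha"),              # instance of
    ("Facha_Ancha", "subClassOf", "Accesorios"),         # subclass relation
    ("ILCA_BML074", "in_custody_of", "British_Museum"),  # object property
    ("ILCA_BML074", "has_width_cm", 52),                 # data property
]


def objects(subject, predicate):
    """All values o such that (subject, predicate, o) is asserted."""
    return [o for (s, p, o) in NUTSHELL_TRIPLES
            if s == subject and p == predicate]
```

Note that the object property points at another individual, while the data property points at a plain value (here a number).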
Ontology Schema and Facts
Ontology schema (TBox)
● subclass relations (e.g. Poncho is subclass of Producto Textil)
● domain and range restrictions of
● object properties (e.g. in custody of has Producto Textil as domain and Museum as range)
● data properties (e.g. has width has domain Producto Textil and cm as range)
Ontology facts (ABox)
● all relations involving individuals (instance of, object properties, data properties)
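To illustrate how schema and facts interact, the sketch below keeps a tiny TBox and ABox apart and derives type facts from domain and range declarations, in the RDFS style where such declarations license inferences rather than act as validation constraints. The dictionary layout and identifier spellings are assumptions made for this example.

```python
# TBox: schema-level statements (identifier spellings are illustrative).
TBOX = {
    "subClassOf": {"Poncho": "Producto_Textil"},
    "domain": {"in_custody_of": "Producto_Textil"},
    "range": {"in_custody_of": "Museum"},
}

# ABox: a fact about individuals, taken from the earlier example.
ABOX = [("ILCA_BML074", "in_custody_of", "British_Museum")]


def inferred_types(tbox, abox):
    """Derive (individual, class) pairs from domain and range declarations."""
    types = set()
    for s, p, o in abox:
        if p in tbox["domain"]:
            types.add((s, tbox["domain"][p]))  # subject falls in the domain
        if p in tbox["range"]:
            types.add((o, tbox["range"][p]))   # object falls in the range
    return types
```

Because the TBox is separate from the facts, a schema change (for example widening the domain) alters what is inferred without touching the ABox.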
Abstract Entities
● Abstract entities: don't exist in space or in time.
● Concrete entities exist at least in time. For example:
● physical objects (like garments, books, etc.)
● events (like the production of a certain garment)
● Entities like colour, material, and shape are rather time independent.
● what is the appropriate way to model abstract entities?
In OWL we have only two options: as classes or instances.
● For concrete entities it is easy:
● the jacket I am wearing is an instance of the class of all Jackets, which is a subclass of physical objects.
● the discovery of Machu Picchu by Hiram Bingham is an instance of the class of all discoveries which is a subclass of events.
Abstract Entities
● What about abstract entities: can they have subclasses or instances? For example colours:
– is the red we see here one instance and the red we see there another instance?
– If so, isn't it inconsistent to say that they are both the same red? (we introduced the concept of colour occurrence).
– is red a unique colour or a class of colours whose instances are e.g. dark-red, orange-red?
– aren't dark-red and orange-red rather themselves classes of reds?
– are there any colours at all that cannot be subdivided into more granular colour values? (we chose to stop at RGB values; for physicists, wavelength would make more sense).
Semi Abstract Entities
● structure, technique, motif:
● not localized in space: possibly at two different places at the same time.
● not localized in time: may exist even if currently not applied or observed.
● but: techniques / motifs are invented and can be forgotten
● epoch and style
● seem to be clearly bound to a certain time period, but
● at least styles may revive at any time.
● epoch is a highly debated concept anyway.
Anonymous Entities
● How should we formalize “Poncho p1 is made of Alpaca”?
The naive way:
p1 made_of a1. p1 type Poncho. a1 type Alpaca.
p1 is a concrete object we can point to. What about a1?
● Consider: “Poncho p2 is also made of Alpaca”.
p2 made_of a2. p2 type Poncho. a2 type Alpaca.
Is a1=a2 or not?
We don't know and we don't care!
Anonymous Entities
● Proper formalization of “Poncho p1 is made of Alpaca”:
p1 type (made_of some Alpaca)
● meaning:
● p1 is an instance of the class (made_of some Alpaca)
● (made_of some Alpaca) is the class of all x such that there exists an a with x made_of a, where a is an instance of Alpaca.
short: “p1 is made of some instance of Alpaca”
Limited Reasoning in Virtuoso
● (made_of some Alpaca) is a quantified class expression
(some is its quantifier)
● Problem with Virtuoso: it accepts quantified expressions, but does not support reasoning on them.
● Example:
p1 type (made_of some Alpaca)
Alpaca subClassOf Camelido
=> p1 type (made_of some Camelido)
● Virtuoso cannot infer this conclusion.
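The missing inference can be stated as a single rule: from `(prop some C)` and `C subClassOf D`, conclude `(prop some D)`. The sketch below hand-codes that one rule in Python to make the expected conclusion concrete; it is not how Virtuoso or a description-logic reasoner works internally, and the tuple encoding of class expressions is an assumption for illustration.

```python
# Subclass axiom from the slide; a single-parent hierarchy is assumed.
SUBCLASS_OF = {"Alpaca": "Camelido"}


def class_and_ancestors(cls):
    """`cls` itself followed by all its ancestors along SUBCLASS_OF."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def entailed_restrictions(restriction):
    """From (prop, 'some', C) derive (prop, 'some', D) for every D above C."""
    prop, quantifier, filler = restriction
    if quantifier != "some":
        raise ValueError("only existential restrictions are handled")
    return [(prop, "some", d) for d in class_and_ancestors(filler)]
```

Applied to `("made_of", "some", "Alpaca")`, the rule yields the restriction on `Camelido` that the slide says Virtuoso fails to derive.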
Prototypes as Workaround
Workaround for the Quantification Problem
● introduce a class Prototype
● create for every class (if needed) a dedicated instance of Prototype.
● Example:
alpaca type Prototype. alpaca type Alpaca.
alpaca prototype_for Alpaca.
Prototypes as Workaround
Reasoning via prototypes
● Replace p1 type (made_of some Alpaca)
by p1 made_of alpaca.
● Now Virtuoso can deduce:
p1 made_of alpaca. Alpaca subClassOf Camelido.
=> p1 made_of ?x. ?x type Camelido.
● Note:
● prototypes, in contrast to regular physical individuals, are not located in space and time (=> modelling conflict)
● alpaca prototype_for Alpaca is not OWL-conformant.
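The prototype workaround only needs plain subclass inference on type statements, which is why a store with limited reasoning can handle it. The sketch below replays the slide's example in Python with a naive fixpoint loop; the function names are hypothetical and the loop stands in for whatever inference the store performs.

```python
# Facts from the slides, using the prototype individual `alpaca`.
PROTOTYPE_TRIPLES = {
    ("alpaca", "type", "Prototype"),
    ("alpaca", "type", "Alpaca"),
    ("p1", "made_of", "alpaca"),
    ("Alpaca", "subClassOf", "Camelido"),
}


def with_subclass_inference(triples):
    """Add (x type D) whenever (x type C) and (C subClassOf D), to a fixpoint."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for (x, p, c) in list(closed):
            if p != "type":
                continue
            for (c2, q, d) in list(closed):
                if q == "subClassOf" and c2 == c and (x, "type", d) not in closed:
                    closed.add((x, "type", d))
                    changed = True
    return closed


def made_of_some(triples, subject, cls):
    """Match the pattern: subject made_of ?x . ?x type cls"""
    closed = with_subclass_inference(triples)
    return any((subject, "made_of", x) in closed
               for (x, p, c) in closed if p == "type" and c == cls)
```

The query for `Camelido` now succeeds, but so does the query for `Prototype`, which shows the modelling conflict noted above: `p1` appears to be made of a prototype rather than of a physical material.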
Ontological Mistakes
Confusing subclass and instance with part of:
● lake Titicaca is a spatial part of the Andes, but not a subclass of it.
● weaving is a temporal part of garment production (dyeing is another one), but neither an instance nor a subclass of it.
● part of is a super property of spatial- and temporal-part of.
Confusing subclass with instance:
● Poncho (as a generic term) is not an instance of garment but a subclass: the class of all concrete ponchos.
Ontological Mistakes
Confusing determined with undetermined objects:
● in “this poncho (p1) is made of Alpaca”
Alpaca should not be modelled as a certain instance of the class Alpaca!
Confusing equivalence with synonymy and/or translations:
● if cloak same as manto and manto same as coat,
then, by transitivity, cloak same as coat (an unwanted conclusion).
● if chair same as Sessel and Sessel same as armchair,
then, by transitivity, chair same as armchair (an unwanted conclusion).
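The effect of treating translations as strict equivalence can be reproduced with a few lines of Python. The sketch below groups terms into equivalence classes with a small union-find structure, treating `same as` as symmetric and transitive, as an OWL-style identity relation would be; the function name is hypothetical.

```python
def same_as_classes(pairs):
    """Group terms into equivalence classes under symmetric, transitive same_as."""
    parent = {}

    def find(x):
        # Follow parent links to the representative, shortening paths on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # merge the two classes

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return sorted(sorted(group) for group in groups.values())
```

With the pairs `("cloak", "manto")` and `("manto", "coat")`, all three terms end up in one class, so a looser relation than equivalence is the safer choice for translations and near-synonyms.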
Related Work
Controlled vocabularies:
● Getty Thesaurus of Geographic Names (TGN),
● Cataloging Cultural Objects (CCO),
● Categories for the Description of Works of Art (CDWA)
Foundational Ontologies:
● The CIDOC Conceptual Reference Model (CRM):
concepts and relationships used in cultural heritage documentation.
● DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering)
Linking open data (LOD):
● DBpedia, Freebase, GeoNames, ... (http://linkeddata.org/)
● Linked Data and SPARQL service of the British Museum
Migration of Knowledge Representations
Separation of knowledge modelling:
● TBox knowledge created with graph drawing tools (http://www.yworks.com)
● ABox facts created in spreadsheets
Technical challenges:
● migration to target format for TBox and ABox: RDF triples (source node - link - target node)
● TBox migration: easy
● ABox migration: difficult - due to irregular spreadsheets
● TBox & ABox vocabulary alignment: tedious
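The ABox migration step can be pictured as turning each filled spreadsheet cell into a triple (source node, link, target node). The sketch below does this for a made-up, slightly irregular sheet using Python's `csv` module; the column names, row identifiers, and cell values are all hypothetical, and real sheets would need far more cleaning and vocabulary alignment than shown here.

```python
import csv
import io

# A made-up sheet with typical irregularities: stray spaces and empty cells.
SHEET = """id,type,made_of,width_cm
t1, Facha Ancha ,Alpaca,52
t2,Poncho,,
"""


def rows_to_triples(text):
    """Turn each non-empty cell into a (row id, column name, value) triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = row["id"].strip()
        for column, value in row.items():
            value = (value or "").strip()  # tolerate missing or padded cells
            if column == "id" or not value:
                continue
            triples.append((subject, column, value))
    return triples
```

The column headers play the role of properties here, which is where the alignment with the TBox vocabulary would have to happen.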