Experiences in the Development of Geographical Ontologies and Linked Data

OntoGeo Workshop, Toulouse, 18 November 2010

Oscar Corcho, Luis Manuel Vilches Blázquez, José Angel Ramos Gargantilla {ocorcho,lmvilches,jramos}@fi.upm.es

Ontology Engineering Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid

Credits: Asunción Gómez-Pérez, María del Carmen Suárez de Figueroa, Boris Villazón, Alex de León, Víctor Saquicela, Miguel Angel García, Juan Sequeda and many others

Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0

Keynote at the OntoGeo workshop, held in Toulouse, France, on Nov 18th 2010, collocated with SAGEO2010.

Page 2: Experiences in the Development of Geographical Ontologies and Linked Data

Structure of my Talk

• Why did we start developing Geographical Ontologies?
• Methodological guidelines for ontology development
  • The NeOn Methodology
  • The development process for Hydrontology
  • The development process for PhenomenOntology
• Why did we start developing Geographical Linked Data?
• Methodological guidelines for Linked Data generation
• Ontology and Linked Data usage in http://geo.linkeddata.es/

Page 4: Experiences in the Development of Geographical Ontologies and Linked Data

Our main goal: Data Integration

[Figure labels: CGNGG; BCN200; BCN25; PhenomenOntology, hydrOntology]

Step 1: Building PhenomenOntology

Step 2: Mappings between the catalogues and the ontology

Page 5: Experiences in the Development of Geographical Ontologies and Linked Data

Why ontologies? Geographical Information Context

• Great variety of sources
  • Nearly 20 different producers in Spain (national and local cartographic institutions with different interests)
• Various degrees of quality and structuring of information
• Natural language ambiguity
  • Synonymy, polysemy and hyperonymy
• Scale factor

Page 6: Experiences in the Development of Geographical Ontologies and Linked Data

Different producers have different vocabularies

Page 7: Experiences in the Development of Geographical Ontologies and Linked Data

Why ontologies? Geographical Information Context

• Great variety of sources
• Various degrees of quality and structuring of information
  • ICC has 49 types of features in total
  • IGN has (only in the hydrographic domain) 40 types of features
• Natural language ambiguity
  • Synonymy, polysemy and hyperonymy
• Scale factor

Page 8: Experiences in the Development of Geographical Ontologies and Linked Data

Feature Catalogues

Base Cartográfica N. (BCN200)

Base Cartográfica N. (BCN25)

Page 9: Experiences in the Development of Geographical Ontologies and Linked Data

Why ontologies? Geographical Information Context

• Great variety of sources
• Various degrees of quality and structuring of information
• Natural language ambiguity
  • Synonymy: different words with the same meaning
    » riverside, river bank
  • Polysemy: the same word with different meanings, e.g. "bank"
    » Bank: financial institution
    » Bank: to rely upon (trust)
  • Hyperonymy: the meaning of one word includes that of another
    » Bank and Morgan Bank
• Scale factor

Page 10: Experiences in the Development of Geographical Ontologies and Linked Data

Why ontologies? Geographical Information Context

• Great variety of sources
• Various degrees of quality and structuring of information
• Natural language ambiguity
  • Synonymy, polysemy and hyperonymy
• Scale factor
  • E.g., one village may be represented as a point X,Y or as an area XN,YN
  • This can act as a filter for geographical information
  • Different scales normally present different features
  • Generalisation processes are normally a problem, due to the difficulties in finding “feature overlaps” in different feature catalogues

Page 12: Experiences in the Development of Geographical Ontologies and Linked Data

NeOn Scenarios

[Figure: the nine NeOn scenarios drawn as numbered paths through the ontology development activities. Recoverable labels:]

• Core activities (scenario 1): O. Specification, O. Conceptualization, O. Formalization, O. Implementation (RDF(S), OWL, Flogic)
• Knowledge Resources
  • Non Ontological Resources (Thesauri, Dictionaries, Glossaries, Lexicons, Taxonomies, Classification Schemas): Non Ontological Resource Reuse and Non Ontological Resource Reengineering (scenario 2)
  • Ontological Resources (O. Repositories and Registries, in Flogic, RDF(S), OWL): Ontological Resource Reuse (scenario 3), Ontological Resource Reengineering (scenario 4)
  • O. Design Patterns: Ontology Design Pattern Reuse (scenario 7)
• O. Aligning, O. Merging, Alignments (scenarios 5 and 6)
• Ontology Restructuring (Pruning, Extension, Specialization, Modularization) (scenario 8)
• O. Localization (scenario 9)
• Ontology Support Activities (scenarios 1 to 9): Knowledge Acquisition (Elicitation); Documentation; Configuration Management; Evaluation (V&V); Assessment

Page 13: Experiences in the Development of Geographical Ontologies and Linked Data

NeOn Scenarios

1. Building ontology networks from scratch without reusing existing resources.

2. Building ontology networks by reusing and reengineering non ontological resources.

3. Building ontology networks by reusing ontologies or ontology modules.

4. Building ontology networks by reusing and reengineering ontologies or ontology modules.

5. Building ontology networks by reusing and merging ontologies or ontology modules.

6. Building ontology networks by reusing, merging and reengineering ontologies or ontology modules.

7. Building ontology networks by reusing ontology design patterns.

8. Building ontology networks by restructuring ontologies or ontology modules.

9. Building ontology networks by localizing ontologies or ontology modules.

Page 14: Experiences in the Development of Geographical Ontologies and Linked Data

NeOn Methodology

Process and activities covered:

Ontology Specification

Scheduling

Non Ontological Resource Reuse

Non Ontological Resource Reengineering

Reuse General Ontologies

Reuse Domain Ontologies

Reuse Ontology Statements

Reuse Ontology Design Patterns

All processes and activities are described with:

A filling card

A workflow

Examples

Page 16: Experiences in the Development of Geographical Ontologies and Linked Data

Hydrontology Development

NeOn Methodology for Building Ontology Networks: Specification, Scheduling and Reuse

María del Carmen Suárez de Figueroa Baonza

Page 17: Experiences in the Development of Geographical Ontologies and Linked Data

• One of the INSPIRE aims is to harmonise Geographical information sources to give support to formulating, implementing and evaluating EU policies (e.g., Environmental Management).

• Geographical Information Sources: databases from EU Member States at local, regional, national and international levels.

INSPIRE as a context for hydrontology

Luis Manuel Vilches Blázquez

Page 18: Experiences in the Development of Geographical Ontologies and Linked Data

INSPIRE - Annexes

Luis Manuel Vilches Blázquez

Page 19: Experiences in the Development of Geographical Ontologies and Linked Data

Information Sources

GEMET

Feature Catalogues

BCN25

BCN200

EGM & ERM CC.AA.

Nomenclátor Geográfico Nacional

Thesauri and Bibliography

WFD

Nomenclátor Conciso

Dictionaries and Monographs

FTT ADL Getty

Luis Manuel Vilches Blázquez

Page 20: Experiences in the Development of Geographical Ontologies and Linked Data

• Glossary of hydrOntology terms.
  • Feature Catalogues of the Numerical Cartographic Database (1:25,000; 1:200,000; 1:1,000,000)
  • Different Feature Catalogues from other local producers.
  • EuroGlobalMap & EuroRegionalMap
  • Water Framework Directive
  • Alexandria Digital Library, Dewey
  • Thesauri (UNESCO, GEMET, Getty Thesaurus of Geographic Names, etc.)
  • National Geographic Gazetteer
  • Bibliography (Dictionary, Water, Law, etc.)
• This glossary contains more than 120 concepts

Page 21: Experiences in the Development of Geographical Ontologies and Linked Data

Criteria for structuring

• Abstract concepts from:
  • Water Framework Directive
    • Proposed by the EU Parliament and EU Council
    • List of hydrographic phenomena definitions
• Part of the model from:
  • SDIGER Project
    • INSPIRE pilot project
    • Two river basins, two countries, two languages
• Several semantic criteria from:
  • WordNet
  • Encyclopaedia Britannica
  • Diccionario de la Real Academia de la Lengua
  • Wikipedia
  • Several domain references
• Inheritance: from various current catalogues
• Meetings with domain experts who belong to IGN-E

Page 22: Experiences in the Development of Geographical Ontologies and Linked Data

Ontology Development

[Figure: ontology network built around hydrOntology after Ontology Specification; the reused resources are marked with scenario number 4. Recoverable labels:]

• hydrOntology: hydrographical phenomena (rivers, lakes, etc.)
• FAO Geopolitical ontology: names and international code systems for territories and groups
• WGS84 (W3C Vocabulary): WGS84 Geo Positioning, an RDF vocabulary
• GML (GML Specification): ontology for OGC Geography Markup Language
• O. Statistics (SCOVO): scv:Dimension, scv:Item, scv:Dataset
• O. Time (W3C Time): vocabulary for instants, intervals, durations, etc.
• Other resources: UNESCO Thesaurus, EGM / ERM, GeoNames
• Relations: hasStatisticalData, hasLat/Long, hasGeometry, hasLocation/isLocated

Page 23: Experiences in the Development of Geographical Ontologies and Linked Data

Modelling the hydrology domain

[Figure labels: upper level (Nivel superior); lower level (Nivel inferior)]

150+ classes, 47 object properties, 64 data properties and 256 axioms.

Page 25: Experiences in the Development of Geographical Ontologies and Linked Data

Phenomenontology Development

NeOn Methodology for Building Ontology Networks: Specification, Scheduling and Reuse

María del Carmen Suárez de Figueroa Baonza

Page 26: Experiences in the Development of Geographical Ontologies and Linked Data

Knowledge Bases

Conciso Gazetteer

National Geographic Gazetteer

Numerical Cartographic Database (BCN200)

Numerical Cartographic Database (BCN25)

Page 27: Experiences in the Development of Geographical Ontologies and Linked Data

Knowledge Bases

• The National Geographic Gazetteer has 14 item types and 460,000 toponyms (Spanish, Galician, Basque, Catalan and Aranese).

• The Conciso Gazetteer, which follows the recommendations of the United Nations Conferences on the Standardization of Geographical Names, has 17 item types and 3,667 toponyms.

Conciso Gazetteer

• A gazetteer is a directory of instances of a class or classes of features that contains some information regarding position (ISO 19112)

National Geographic Gazetteer

Page 28: Experiences in the Development of Geographical Ontologies and Linked Data

Knowledge Bases

• BCN25 was designed as a product derived from the National Topographic Map, built to obtain cartographic information that complies with the data specifications required for use in GIS.

• BCN200 was developed by digitising analogue provincial maps.

• Information is structured in 8 topics (Administrative boundaries, Relief, Hydrography, Vegetation and so on)

• A feature catalogue presents the abstraction of reality, represented in one or more sets of geographic data, as a defined classification of phenomena (ISO 19110)

Numerical Cartographic Database (BCN25)

Numerical Cartographic Database (BCN200)

Page 29: Experiences in the Development of Geographical Ontologies and Linked Data

BCN25 details

Catalogue columns:

- Group:
    0 - unfixed
    1 - road
    ...

- Code: 3 pairs of digits (XXYYZZ), e.g. 060101
    06 Transportation
    01 Roads
    01 Highway. Axis

- Name:
    Highway. Axis
    Highway under construction. Axis
    ...
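The XXYYZZ code structure lends itself to a small worked example. The following is only a sketch, not the tooling used in the project: the names `FeatureCode`, `parse_feature_code` and `taxonomy_path` are hypothetical, and the labels topic, group and subgroup for the three pairs of digits are taken from the taxonomy criteria given later in the talk.

```python
from typing import List, NamedTuple


class FeatureCode(NamedTuple):
    """The three pairs of digits of a catalogue code (hypothetical helper type)."""
    topic: str     # first pair, e.g. "06" (Transportation)
    group: str     # second pair, e.g. "01" (Roads)
    subgroup: str  # third pair, e.g. "01" (Highway. Axis)


def parse_feature_code(code: str) -> FeatureCode:
    """Split a six-digit XXYYZZ catalogue code into its three pairs."""
    if len(code) != 6 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected six digits, got {code!r}")
    return FeatureCode(code[0:2], code[2:4], code[4:6])


def taxonomy_path(code: str) -> List[str]:
    """Return the code prefixes from the topic down to the full feature code."""
    parse_feature_code(code)  # validates the input before slicing
    return [code[:2], code[:4], code]


print(parse_feature_code("060101"))
print(taxonomy_path("060101"))
```

Each prefix in the path could serve as a candidate superclass of the next one when a taxonomy is derived from the code column.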

Page 30: Experiences in the Development of Geographical Ontologies and Linked Data

Bottom-up process: PhenomenOntology

• Automatic ontology building from BCN25/BTN25

BCN25/BTN25

• Automatic checking of linguistic differences (linsearch): plurals, punctuation marks, capital letters and Spanish-specific characters (e.g., accents)

• Curation process by domain experts from IGN-E

PhenomenOntology

Page 31: Experiences in the Development of Geographical Ontologies and Linked Data

Criteria for taxonomy creation

• Group (Road, Hydrographic...)
• Code column
  • (Topic) - (030501)
  • (Group) - (030501)
  • (Subgroup) - (030501)
• Common lexical parts
  • Highway with 2 lanes
  • Highway with 3 lanes
  • Highway under construction
    • Highway (superclass)
• Lexical heterogeneity in feature names (“Autovía”, “AUTOVIA”, “Autovia”, “Autovía-”)
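The lexical heterogeneity in the last bullet can be illustrated with a short normalisation routine. This is a minimal sketch under stated assumptions, not the linsearch tool mentioned in the talk: `normalise_label` is a hypothetical helper that only folds case, accents and punctuation, and it does not handle plurals.

```python
import unicodedata


def normalise_label(label: str) -> str:
    """Fold case, accents and punctuation so that spelling variants compare equal."""
    # Decompose accented characters (í -> i + combining accent), then drop the marks.
    decomposed = unicodedata.normalize("NFD", label)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Replace punctuation with spaces, lowercase, and collapse repeated whitespace.
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in without_accents
    )
    return " ".join(cleaned.lower().split())


variants = ["Autovía", "AUTOVIA", "Autovia", "Autovía-"]
print({normalise_label(v) for v in variants})
```

One caveat of this design: stripping combining marks also maps "ñ" to "n", which may merge labels that a domain expert would keep apart, so the output is better treated as a hint for curation than as a final decision.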

Numerical Cartographic Database (BCN25)

Page 32: Experiences in the Development of Geographical Ontologies and Linked Data

BCN25 BTN25

Base Cartográfica N. (BCN25)

Page 33: Experiences in the Development of Geographical Ontologies and Linked Data

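BCN25 and PhenomenOntology v3.5

03 ¿?

0301 Río (river)

- Componente de río (river component)
  • Eje (axis)
  • Margen (bank)
  • Eje conexión (connection axis)
- Régimen (regime)
  • Permanente (permanent)
  • No permanente (non-permanent)
- Categoría del río (river category)
  • Desconocida (unknown)
  • Primera (first)
  • Segunda (second)
  • Tercera (third)
  • Cuarta (fourth)

0304 Cauce artificial (artificial watercourse)

- Componente del cauce artificial (artificial watercourse component)
  • Eje (axis)
  • Margen (bank)
  • Eje conexión (connection axis)
- Situación (position)
  • Desconocido (unknown)
  • Subterráneo (underground)
  • Superficial (surface)
  • Elevado (elevated)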

Page 34: Experiences in the Development of Geographical Ontologies and Linked Data

Ontology curation

• Homogenising URIs and labels
• Exploiting “type” hierarchies
• Reducing unnecessary attributes
• Incorporating BTN25 definitions as rdfs:comments

Luis Manuel Vilches Blázquez

Page 35: Experiences in the Development of Geographical Ontologies and Linked Data

Homogenising URIs and labels

- Meaningless labels from the first level in the hierarchy

Page 36: Experiences in the Development of Geographical Ontologies and Linked Data

Homogenising URIs and labels

- All class and property names in lowercase

Page 37: Experiences in the Development of Geographical Ontologies and Linked Data

Homogenising URIs and labels

- Spaces and accents in URIs

Page 38: Experiences in the Development of Geographical Ontologies and Linked Data

Exploiting “type” hierarchies

Attribute “type” normally corresponds to additional taxonomies

Page 39: Experiences in the Development of Geographical Ontologies and Linked Data

Reducing unnecessary/redundant attributes

Page 40: Experiences in the Development of Geographical Ontologies and Linked Data

Completing documentation

Page 41: Experiences in the Development of Geographical Ontologies and Linked Data

Some statistics (from BCN25 to BTN25)

[Figure panels: PhenomenOntology 3.6; PhenomenOntology 4.0]

Page 43: Experiences in the Development of Geographical Ontologies and Linked Data

Some conclusions in ontology development

• Generic ontology development methodologies can be applied with some success
  • Hydrontology took a total of approximately 6 person-months (PM)
  • Initially done by a domain expert after only basic training
  • Ontology debugging was extremely difficult and has provided interesting results in this area
• Top-down vs bottom-up approaches
  • A large curation process is still needed in bottom-up approaches, which may make them inadvisable (research ongoing on this)
  • Bottom-up approaches yield more lightweight ontologies, although these are easier to relate to the underlying catalogues
• Next steps: relating them to upper-level ontologies (e.g., DOLCE) and modularising them to improve reusability

Page 45: Experiences in the Development of Geographical Ontologies and Linked Data

What is the Web of Linked Data?

• An extension of the current Web…
  • … where information and services are given well-defined and explicitly represented meaning, …
  • … so that it can be shared and used by humans and machines, ...
  • ... better enabling them to work in cooperation
• How?
  • Promoting information exchange by tagging web content with machine-processable descriptions of its meaning.
  • And technologies and infrastructure to do this
  • And clear principles on how to publish data

Page 46: Experiences in the Development of Geographical Ontologies and Linked Data

What is Linked Data?

• Linked Data is a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.

• Part of the Semantic Web
• Exposing, sharing and connecting data
• Technologies: URIs and RDF (although others are also important)

Page 47: Experiences in the Development of Geographical Ontologies and Linked Data

The four principles (Tim Berners-Lee, 2006)

1. Use URIs as names for things

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)

4. Include links to other URIs, so that they can discover more things.

• http://www.w3.org/DesignIssues/LinkedData.html

http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
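As a rough illustration of principles 1, 2 and 4, the sketch below names a thing with an HTTP URI, attaches a label to it, and links it to a URI in another dataset, serialising the result as N-Triples. The resource URI pattern and the `example.org` target are assumptions made for illustration only, and `uri_triple` and `literal_triple` are hypothetical helpers, not part of any RDF library.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def uri_triple(subject: str, predicate: str, obj: str) -> str:
    """Serialise a triple whose object is another URI (principle 4: link out)."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def literal_triple(subject: str, predicate: str, text: str, lang: str) -> str:
    """Serialise a triple whose object is a language-tagged literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}"@{lang} .'


# Assumed resource URI, shown only to illustrate an HTTP URI used as a name.
granada = "http://geo.linkeddata.es/resource/Provincia/Granada"

lines = [
    literal_triple(granada, RDFS_LABEL, "Granada", "es"),
    # Placeholder target: a real link would point at another published dataset.
    uri_triple(granada, OWL_SAME_AS, "http://example.org/other-dataset/Granada"),
]
print("\n".join(lines))
```

Principle 3 (returning useful RDF when the URI is looked up) depends on the server configuration and is not covered by this sketch.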

Page 48: Experiences in the Development of Geographical Ontologies and Linked Data

Linked Open Data evolution

2007

2008

2009

Page 49: Experiences in the Development of Geographical Ontologies and Linked Data

LOD clouds

Page 50: Experiences in the Development of Geographical Ontologies and Linked Data

Linked Open Data Evolution

Page 51: Experiences in the Development of Geographical Ontologies and Linked Data

How should we publish data?

• Formats in which data is published nowadays…
  • XML
  • HTML
  • DBs
  • APIs
  • CSV
  • XLS
  • …
• However, main limitations from a Web of Data point of view
  • Difficult to integrate
  • Data is not linked to each other, as happens with Web documents.

Page 52: Experiences in the Development of Geographical Ontologies and Linked Data

How do we publish Linked Data?

1. Exposing Relational Databases or other similar formats as Linked Data
   • D2R
   • Triplify
   • R2O
   • NOR2O
   • Virtuoso
   • Ultrawrap
   • …
2. Using native RDF triplestores
   • Sesame
   • Jena
   • Owlim
   • Talis platform
   • …
3. Incorporating it in the form of RDFa in CMSs like Drupal

Page 53: Experiences in the Development of Geographical Ontologies and Linked Data

How do we consume Linked Data?

• Linked Data browsers
  • To explore things and datasets and to navigate between them.
  • Tabulator Browser (MIT, USA), Marbles (FU Berlin, DE), OpenLink RDF Browser (OpenLink, UK), Zitgist RDF Browser (Zitgist, USA), Disco Hyperdata Browser (FU Berlin, DE), Fenfire (DERI, Ireland)
• Linked Data mashups
  • Sites that mash up (thus combine) Linked Data
  • Revyu.com (KMI, UK), DBtune Slashfacet (Queen Mary, UK), DBPedia Mobile (FU Berlin, DE), Semantic Web Pipes (DERI, Ireland)
• Search engines
  • To search for Linked Data.
  • Falcons (IWS, China), Sindice (DERI, Ireland), MicroSearch (Yahoo, Spain), Watson (Open University, UK), SWSE (DERI, Ireland), Swoogle (UMBC, USA)

Listing on this slide by T. Heath, M. Hausenblas, C. Bizer, R. Cyganiak, O. Hartig

Page 55: Experiences in the Development of Geographical Ontologies and Linked Data

Open Government. USA and UK

TOP-DOWN

BOTTOM-UP

Page 56: Experiences in the Development of Geographical Ontologies and Linked Data

Linked Data Mashup (data.gov)

• Clean Air Status and Trends (CASTNET)
  • http://data-gov.tw.rpi.edu/demo/exhibit/demo-8-castnet.php

Page 58: Experiences in the Development of Geographical Ontologies and Linked Data

GeoLinkedData

• It is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data.

• This initiative has started off by publishing data from diverse information sources, such as the National Geographic Institute of Spain (IGN-E) and the National Statistics Institute (INE)

• http://geo.linkeddata.es

Page 59: Experiences in the Development of Geographical Ontologies and Linked Data

Motivation

The Web of Data is mainly for English speakers

» 99.171 % English
» 0.019 % Spanish

Poor presence of Spanish

Source: Billion Triples dataset at http://km.aifb.kit.edu/projects/btc-2010/

Thanks to Aidan and Richard

Page 60: Experiences in the Development of Geographical Ontologies and Linked Data

Related Work

Page 61: Experiences in the Development of Geographical Ontologies and Linked Data

Impact of geo.linkeddata.es

• Number of triples in Spanish (July 2010): 1,412,248
• Number of triples in Spanish (September 2010): 21,463,088

Asunción Gómez Pérez

Share of triples by language (%):

Language   Before geo.linkeddata.es   After geo.linkeddata.es
en         99.1712875                 94.18744941
ja         0.463849377                0.440538697
fr         0.05447229                 0.051734793
de         0.034225134                0.032505155
pl         0.02532934                 0.024056418
it         0.021982542                0.020877812
es         0.019584648                5.044085342

Page 62: Experiences in the Development of Geographical Ontologies and Linked Data

Process for Publishing Linked Data on the Web

1. Identification of the data sources
2. Vocabulary development
3. Generation of the RDF data
4. Publication of the RDF data
5. Data cleansing
6. Linking the RDF data
7. Enable effective discovery

Page 63: Experiences in the Development of Geographical Ontologies and Linked Data

1. Identification and selection of the data sources

Instituto Geográfico Nacional

[Figure labels: Basque, Catalan, Galician, Spanish]

Page 64: Experiences in the Development of Geographical Ontologies and Linked Data

1. Identification and selection of the data sources

Instituto Nacional de Estadística

[Figure labels: Province, Year]

Page 65: Experiences in the Development of Geographical Ontologies and Linked Data

2. Vocabulary development

http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/#whichvocabs

This is not enough

Page 66: Experiences in the Development of Geographical Ontologies and Linked Data

2. Vocabulary development

• Features
  • Lightweight:
    • Taxonomies and a few properties
  • Consensual vocabularies
    • To avoid the mapping problems
  • Multilingual
    • Linked data are multilingual
• The NeOn methodology can help to
  • Re-engineer non-ontological resources into ontologies
    • Pros: use domain terminology already agreed by domain experts
  • Remove from heavyweight ontologies those features that you don’t need
  • Reuse existing vocabularies

Asunción Gómez Pérez

Page 67: Experiences in the Development of Geographical Ontologies and Linked Data

Vocabulary development: Specification

• Content requirements: identify the set of questions that the ontology should answer
  • Which are the provinces of Spain?
  • Where are the beaches?
  • Where are the reservoirs?
  • Identify the production index in Madrid
  • Which city has the highest production index?
  • Give me the latitude and altitude of Madrid
  • ….
• Non-content requirements
  • The ontology must be in the four official Spanish languages

Asunción Gómez Pérez

Page 68: Experiences in the Development of Geographical Ontologies and Linked Data

2. Vocabulary development: HydrOntology

Asunción Gómez Pérez

Page 69: Experiences in the Development of Geographical Ontologies and Linked Data

3. Generation of RDF

• From the data sources
  • Geographic information (databases)
  • Statistical information (spreadsheets)
  • Geospatial information
• Different technologies for RDF generation
  • Reengineering patterns
  • R2O and ODEMapster
  • Geometry generation

Page 70: Experiences in the Development of Geographical Ontologies and Linked Data

3. Generation of the RDF Data

[Figure labels: INE, NOR2O; IGN, ODEMapster; IGN geospatial column, Geometry2RDF]

Page 71: Experiences in the Development of Geographical Ontologies and Linked Data

3. Generation of the RDF Data

• Preliminaries
  • Select appropriate URIs
• Difficulties
  • Cumbersome URIs in Spanish
    • http://geo.linkeddata.es/ontology/Río
    • RDF allows UTF-8 characters for URIs
    • But Linked Data URIs have to be URLs as well
    • So non-US-ASCII characters have to be percent-encoded (%code)
    • http://geo.linkeddata.es/ontology/R%C3%ADo
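The percent-encoding step above can be reproduced with the Python standard library. This is a minimal sketch: `to_url` is a hypothetical helper, and the base URI is the ontology namespace shown on the slide.

```python
from urllib.parse import quote, unquote

BASE = "http://geo.linkeddata.es/ontology/"


def to_url(local_name: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character in the name."""
    # safe="" also encodes "/" so that a local name cannot add path segments.
    return BASE + quote(local_name, safe="")


url = to_url("Río")
print(url)
# Decoding the encoded part recovers the original Spanish label.
print(unquote(url[len(BASE):]))
```

The character "í" occupies two bytes in UTF-8 (C3 AD), which is why a single accented letter expands to `%C3%AD` in the URL.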

Page 72: Experiences in the Development of Geographical Ontologies and Linked Data

3. Generation of the RDF Data / instances

• NOR2O is a software library that implements the transformations proposed by the Patterns for Re-engineering Non-Ontological Resources (PR-NOR). Currently we have 16 PR-NORs.

• PR-NORs define a procedure that transforms the components of a Non-Ontological Resource (NOR) into ontology elements. http://ontologydesignpatterns.org/

NOR2O

· Classification schemes

· Thesauri

· Lexicons

NOR2O

FAO Water classification

· Classification scheme

· Path enumeration data model

· Implemented in a database

Page 73: Experiences in the Development of Geographical Ontologies and Linked Data

NOR2O Modules

Page 74: Experiences in the Development of Geographical Ontologies and Linked Data

3. Generation of the RDF Data – NOR2O

Industry Production Index

Province

Year

NOR2O

Page 75: Experiences in the Development of Geographical Ontologies and Linked Data

3. Generation of the RDF Data – R2O & ODEMapster

• Creation and execution of R2O Mappings
  • Check out at http://www.neon-toolkit.org/

Page 76: Experiences in the Development of Geographical Ontologies and Linked Data

3. Generation of the RDF Data

Page 77: Experiences in the Development of Geographical Ontologies and Linked Data

3. Generation of the RDF Data – Geometry2RDF

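Oracle SDO_UTIL package

-- Export as GML 3.1.1 the geometry of the BCN200 river features
-- whose label (Etiqueta) is 'Arroyo' (stream)
SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry)) AS Gml311Geometry
FROM "BCN200"."BCN200_0301L_RIO" c
WHERE c.Etiqueta = 'Arroyo'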

Page 78: Experiences in the Development of Geographical Ontologies and Linked Data

3. Generation of the RDF Data – Geometry2RDF

Page 79: Experiences in the Development of Geographical Ontologies and Linked Data

3. Generation of the RDF Data – Geometry2RDF

Page 80: Experiences in the Development of Geographical Ontologies and Linked Data

3. Generation of the RDF data – RDF graphs

• IGN, INE
• So far
  • 7 RDF Named Graphs

BTN25, BCN200, IPI, …

http://geo.linkeddata.es/dataset/IGN/BTN25
http://geo.linkeddata.es/dataset/IGN/BCN200
http://geo.linkeddata.es/dataset/INE/IPI

Page 81: Experiences in the Development of Geographical Ontologies and Linked Data

4. Publication of the RDF Data

[Figure labels: SPARQL; Pubby; Linked Data; HTML; Virtuoso 6.1.0; Pubby 0.3; Including Provenance Support]

Page 82: Experiences in the Development of Geographical Ontologies and Linked Data

4. Publication of the RDF Data

Page 83: Experiences in the Development of Geographical Ontologies and Linked Data

4. Publication of the RDF Data - License

• Data Licenses
  • Official license as published in the Spanish official journal (BOE - Boletín Oficial del Estado)
  • Creative Commons options
  • GNU Free Documentation License
• Each dataset has its own specific license
  • IGN
  • INE

Page 84: Experiences in the Development of Geographical Ontologies and Linked Data

5. Data cleansing

• Lack of documentation of the IGN datasets
• Broken links: Spain, IGN resources
• Lack of documentation of the ontology
• Missing English and Spanish labels
• Building a Spanish ontology and importing some concepts from another ontology (in English). Options:
  • Import the English ontology and add annotations, such as a Spanish label, to its terms.
  • Import the English ontology, create new concepts and properties with Spanish names, and map them to the English equivalents.
  • Re-declare the terms of the English ontology that we need (using the same URIs as in the English ontology) and add a Spanish label.
  • Create our own classes and properties that model the same things as the English ontology.
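To make the options concrete, the sketch below illustrates only the one that reuses the English term's URI and attaches a Spanish label to it. It produces a small Turtle fragment as text. The term URI and the helper name are hypothetical, and the talk does not state which option was finally chosen.

```python
def spanish_label_turtle(term_uri: str, label_es: str) -> str:
    """Return Turtle that adds an rdfs:label tagged @es to an existing term."""
    # Escape backslashes and double quotes so the literal stays valid Turtle.
    escaped = label_es.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
        f'<{term_uri}> rdfs:label "{escaped}"@es .'
    )


if __name__ == "__main__":
    # Hypothetical English ontology term reused under its original URI.
    print(spanish_label_turtle("http://example.org/onto#River", "Río"))
```

Because the URI is unchanged, data that already uses the English term stays compatible, and Spanish-language interfaces can pick the label by its language tag.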

Page 85: Experiences in the Development of Geographical Ontologies and Linked Data

6. Linking of the RDF Data

• Silk - A Link Discovery Framework for the Web of Data

• First set of links: Provinces of Spain
  • 86% accuracy
• Datasets involved: GeoLinkedData, DBPedia, Geonames
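Silk uses its own link specification language, which is not reproduced here. As a rough stand-in for the idea, the sketch below links resources from two datasets by comparing their labels and then measures how many generated links appear in a manually checked reference set. The similarity measure (Python's `difflib`), the 0.9 threshold, the sample URIs and the reading of "accuracy" as the share of correct links are all assumptions.

```python
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lower-case a label and collapse repeated whitespace."""
    return " ".join(label.lower().split())


def discover_links(source: dict[str, str], target: dict[str, str],
                   threshold: float = 0.9) -> list[tuple[str, str]]:
    """Pair each source URI with its best-matching target URI by label."""
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = SequenceMatcher(
                None, normalize(s_label), normalize(t_label)
            ).ratio()
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, best_uri))
    return links


def accuracy(links: list[tuple[str, str]],
             reference: set[tuple[str, str]]) -> float:
    """Share of generated links that appear in the reference set."""
    if not links:
        return 0.0
    return sum(1 for link in links if link in reference) / len(links)


if __name__ == "__main__":
    source = {"http://example.org/a/Granada": "Granada"}
    target = {"http://example.org/b/Granada": "granada"}
    found = discover_links(source, target)
    print(found, accuracy(found, set(found)))
```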

Page 86: Experiences in the Development of Geographical Ontologies and Linked Data

6. Linking of the RDF Data

• http://geo.linkeddata.es/page/Provincia/Granada

Page 87: Experiences in the Development of Geographical Ontologies and Linked Data

7. Enable effective discovery

Page 89: Experiences in the Development of Geographical Ontologies and Linked Data

Provinces

Page 90: Experiences in the Development of Geographical Ontologies and Linked Data

Industry Production Index – Capital of Province

Page 91: Experiences in the Development of Geographical Ontologies and Linked Data

Rivers

Page 92: Experiences in the Development of Geographical Ontologies and Linked Data

Beaches

Page 93: Experiences in the Development of Geographical Ontologies and Linked Data

Future Work

• Generate more datasets from other domains, e.g. universities in Spain.

• Identify more links to DBPedia and Geonames.

• Cover more complex geometries: beyond Point and LineString data, also represent information as polygons.

Page 95: Experiences in the Development of Geographical Ontologies and Linked Data

General conclusions

• Reusable ontologies available for the community
  • Well-founded and well documented
  • Now working on multilinguality/multiculturality issues
  • Work continues on understanding how to provide debugging tools for domain experts
• Reusable tools for geospatial Linked Data generation
  • There is still a lack of understanding of how much benefit we can get from Linked Geographical Data
  • Benefits of linking seem to be clear
  • But geo-processing is still unsolved in RDF, as well as geometry representation
