Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards Stephane Fellah Barry J. Glick Yaser Bishr smartRealm LLC [email protected] Association of American Geographers (AAG) Annual Conference Washington DC, April 15, 2010


Page 1: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Stephane Fellah

Barry J. Glick

Yaser Bishr

smartRealm [email protected]

Association of American Geographers (AAG) Annual Conference

Washington DC, April 15, 2010

Page 2: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards


Agenda

Gazetteer overview

Project overview

Open standards used

Geocoding Process

Prototype application

Page 3: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Gazetteer Overview

Page 4: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Role of gazetteer services

ADEPT, Smith, October 1999

Where is …?

What’s there?

What happened there?

Books

News

Web

Publications

Archives

Geo referencing using gazetteer services

Data


smartRealm LLC confidential 9/16/2009

Page 5: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Semantic gazetteer vs. traditional gazetteers

Comparison criteria (columns: semantic gazetteer, traditional gazetteer services):

Multiple classification schemes

Point geometry

Multi geometry

Geo-spatial semantic relations

Time stamp for features

Time stamp for geometries

Time stamp for other properties

Semantic disambiguation

Profile gazetteer

KB capable

Spatial queries (bbox, AOI) (only points)


Page 6: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Project description

Page 7: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards


R&D Project Goals

Demonstrate the value of geo-enabling a library database by:

Geocoding and spatially indexing a complete library database, namely ASFA

Implementing geographic search of documents, integrated with topic and author search and map-based visualization of results

Assisting users in discovering relevant information by surfacing the controlled vocabularies of ASFA

Testing the prototype with users to assess utility, ease of use, etc.

Demonstrate the value of linked data and semantics by:

Enabling geospatial reasoning

Encoding taxonomies in a machine-processable format

Resolving ambiguity of terms

Making linked data reusable

Page 8: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

ASFA: Aquatic Sciences and Fisheries Abstracts

ASFA series is the premier reference in the field of aquatic resources.

Input to ASFA is provided by a growing international network of information centers monitoring over 5,000 serial publications, books, reports, conference proceedings, translations and limited distribution literature.

ASFA is a component of the Aquatic Sciences and Fisheries Information System (ASFIS), formed by four United Nations agency sponsors of ASFA and a network of international and national partners.

1.3 million records encoded in XML.

Page 9: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

ASFIS 6

Descriptors used for subject indexing and retrieval of information on all aspects of aquatic sciences and technology

6267 vocabulary terms allowing the

We used existing SKOS encoding of the taxonomy

Page 10: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

ASFIS 7

Geographic descriptors used in the ASFA system

Not officially standardized

Inconsistencies due to manual entry

Hardwired in the system

Goals of this project:

Semantically encode the ASFIS 7 taxonomy

Geocode the taxonomy

Enable spatial search in the ASFA database

Page 11: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards


Support Multiple Use Cases

Researcher has a specific research goal: provide a quicker, simpler way to filter results to get to the relevant documents

Example: I’m looking for research on coral reef diseases in the western Caribbean region

Researcher has a specific area of interest: allow the user to use a map or geographic terms to define an area of interest and use it to find relevant research

Example: I am studying the Danube delta region. What research is available in ASFA for this area? (And what topics does the research address?)

Geo-exploration of research: researcher is interested in a specific topic and uses the map to explore relevant documents

Example: My research interest is oyster farming. Where in the world has research been conducted on this topic?

Others:

Where does a specific author conduct his/her research?

Which authors have published the most research on a specific area of interest?

What is the geographic distribution of research on a specific topic? (And where are the gaps?)

Page 12: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Open Standards used

Page 13: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

RDF: Graph Representation

Minimalist model: the triple

Equivalent in the relational model:

association: links an Object to another Object

attribute: links an Object to a Literal
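The triple model on this slide can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the `Triple` and `Literal` types and the two helper functions are hypothetical names introduced here, and the sample statements reuse the San Diego concept that appears later in the deck.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A plain value such as a label (hypothetical helper type)."""
    value: str
    lang: str = ""


class Triple(NamedTuple):
    subject: str                # identifier of the thing being described
    predicate: str              # name of the relationship or property
    obj: Union[str, Literal]    # another identifier, or a literal value


def is_attribute(triple: Triple) -> bool:
    """An attribute points at a literal value."""
    return isinstance(triple.obj, Literal)


def is_association(triple: Triple) -> bool:
    """An association points at another identified object."""
    return not is_attribute(triple)


GRAPH = [
    Triple("asfis7:USA/California/San_Diego_City", "skos:broader",
           "asfis7:USA/California"),
    Triple("asfis7:USA/California/San_Diego_City", "skos:prefLabel",
           Literal("San Diego City", "en")),
]

for t in GRAPH:
    kind = "attribute" if is_attribute(t) else "association"
    print(kind, t.predicate)
```

A graph is then just a collection of such triples, which is why new kinds of statements can be added without changing a schema.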

Page 14: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Linked Open Data

Page 15: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Geospatial Semantic Web Architecture

Source: Berners-Lee, AAAI, July 2006

Geospatial Datatypes

Geospatial Functions

Geospatial Ontology

Extensions

Geospatial Logic

Page 16: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

SKOS

SKOS = Simple Knowledge Organization System

A common data model for sharing and linking knowledge organization systems (KOS) via the Semantic Web.

KOS examples: thesauri, taxonomies, classification schemes, subject heading systems, …

Machine processable and portable representation

Extensible

Page 17: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

SKOS Thesaurus Example

Page 18: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Example of Classification Scheme

90. GEOPHYSICS, ASTRONOMY, AND ASTROPHYSICS

91. Solid Earth physics

91.10.-v Geodesy and gravity

91.10.Pp Gravimetric measurements and instruments

Page 19: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Example of Classification Scheme

Page 20: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards


Semantic Geo-encoding

Arrange geographic places in an order from most general to most specific, e.g.

World/Continent/Country/State or Province/City

World/Ocean/Ocean Region/Sea/Bay

World/Continent/Country/River or Lake

This allows user to move up and down hierarchy in search and to find related, more specific and more general terms

Also helps in distinguishing geographic place names that are ambiguous, e.g. Mississippi as river vs. Mississippi as state, etc.
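A small Python sketch of how hierarchical paths can separate places that share a name. The two paths below are illustrative and follow the level patterns listed on this slide; the function names are hypothetical.

```python
from collections import defaultdict

# Each place is a path of (level, name) pairs, most general first.
# The concrete paths are illustrative, not taken from ASFIS.
PLACES = [
    (("World", "World"), ("Continent", "North America"), ("Country", "USA"),
     ("State or Province", "Mississippi")),
    (("World", "World"), ("Continent", "North America"), ("Country", "USA"),
     ("River or Lake", "Mississippi")),
]


def index_by_name(places):
    """Group paths by the name of their most specific element."""
    index = defaultdict(list)
    for path in places:
        _level, name = path[-1]
        index[name].append(path)
    return index


def broader(path):
    """Move one step up the hierarchy."""
    return path[:-1]


def describe(path):
    """Render a path so that the level disambiguates the name."""
    level, name = path[-1]
    parents = "/".join(n for _, n in path[:-1])
    return f"{name} ({level}) in {parents}"


candidates = index_by_name(PLACES)["Mississippi"]
for path in candidates:
    print(describe(path))
```

A search interface could present both candidates and let the user choose, or use the level of the surrounding terms to choose automatically.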

Page 21: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Geo-SKOS

Defines an extension of SKOS for geospatial concepts.

GeoConcept is a subclass of Concept

GeoConcept has location properties

Specialization of narrower, broader, and related:

Narrower => Narrower-partitive, …

Broader => Broader-partitive, …

Related => Nearby, SW of, west of, …
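The Geo-SKOS idea can be sketched as a class hierarchy plus a table that maps each specialized relation back to the SKOS relation it refines. All class, field, and relation names below are assumptions made for illustration, and the relations attached to Nepal are hypothetical; only the centroid value comes from the geocoded Nepal concept shown later in the deck.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Specialized relation -> the general SKOS relation it refines (hypothetical names).
SUPER_RELATION = {
    "narrowerPartitive": "narrower",
    "broaderPartitive": "broader",
    "nearby": "related",
    "southWestOf": "related",
    "westOf": "related",
}


@dataclass
class Concept:
    uri: str
    pref_label: str
    relations: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class GeoConcept(Concept):
    # Location as WKT text; None when the concept has not been geocoded.
    location: Optional[str] = None


def targets(concept: Concept, relation: str) -> List[str]:
    """Targets linked by `relation` or by any relation that specializes it."""
    return [
        target
        for rel, target in concept.relations
        if rel == relation or SUPER_RELATION.get(rel) == relation
    ]


nepal = GeoConcept(
    uri="asfis7:Nepal",
    pref_label="Nepal",
    relations=[("broaderPartitive", "asfis7:Asia"), ("nearby", "asfis7:India")],
    location="POINT (84 28)",
)
print(targets(nepal, "related"))
```

Because each specialized relation still counts as its general relation, a client that only understands plain SKOS can keep working with the extended vocabulary.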

Page 22: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Geocoding process

Page 23: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Geocoding Process

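[Figure: geocoding pipeline. Stages and the artifacts they produce:]

ASFA XML → Q3 extraction → Q3 list

Q3 list + top concepts (countries, sea zones) → SKOS encoding → ASFIS7 SKOS

ASFIS7 SKOS → Geocoder → geocoded ASFIS7 SKOS

Geocoded ASFIS7 SKOS → Post-processing (bbox, centroid) → post-processed geocoded ASFIS7 SKOS

Post-processed geocoded ASFIS7 SKOS → Reasoning → inferred geocoded ASFIS7 SKOS

Inferred geocoded ASFIS7 SKOS → Indexing → ASFIS7 index

Other elements shown: indexing mapping, SmartRealm Gazetteer, Oracle Spatial index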

Page 24: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Approach

Encode legacy data from the q3 fields in ASFA

The authoritative list is not used because its terms do not match the q3 terms directly

Sea codes are not handled in the authoritative list

Polygons and linestrings take priority over points
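The geometry priority rule can be sketched as a ranking over WKT strings. This is a minimal illustration, not the project's geocoder: the function names are hypothetical, and polygons and linestrings share one rank because the slide only says that both outrank points.

```python
# Lower rank wins. Polygons and linestrings are tied on purpose.
RANK = {
    "POLYGON": 0, "MULTIPOLYGON": 0,
    "LINESTRING": 0, "MULTILINESTRING": 0,
    "POINT": 1, "MULTIPOINT": 1,
}


def geometry_type(wkt):
    """Return the WKT type keyword, e.g. 'POINT' for 'POINT (84 28)'."""
    return wkt.split("(", 1)[0].strip().upper()


def pick_geometry(candidates):
    """Pick the preferred geometry among candidate matches for one place.

    Unknown geometry types are ignored; ties keep the first candidate.
    """
    ranked = [w for w in candidates if geometry_type(w) in RANK]
    if not ranked:
        return None
    return min(ranked, key=lambda w: RANK[geometry_type(w)])


print(pick_geometry(["POINT (84 28)", "LINESTRING (0 0, 1 1)"]))
```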

Page 25: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

ASFA Data

<rec id="16" status="1" type="Journal Article" jdf="Q1;Y">
    <ti>Divergence Among Barking Frogs (Eleutherodactylus Augusti) In The Southwestern United States</ti>
    <ab>Barking frogs (Eleutherodactylus augusti) are distributed from southern Mexico along the Sierra Madre Occidental into Arizona and the Sierra Madre Oriental into Texas and New Mexico. ....</ab>
    <pt>Journal Article</pt>
    <q1>
        <term>Amphibiotic species</term>
        <term>Burrowing organisms</term>
        <term>Burrows</term>
        <term>Coloration</term>
        ......
    </q1>
    <q2>
        <term>Anura</term>
        <term>Eleutherodactylus</term>
        <term>Eleutherodactylus augusti</term>
    </q2>
    <q3>
        <term>ISW,Mexico</term>
        <term>USA, Arizona</term>
        <term>USA, New Mexico</term>
    </q3>
    ......
</rec>
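Pulling the geographic descriptors out of a record like this one can be sketched with Python's standard XML parser. The record below is a trimmed copy of the example, closed so that it parses; the function names are hypothetical and the real extraction step may differ.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example record, keeping only the q3 field.
RECORD = """<rec id="16" status="1" type="Journal Article">
    <q3>
        <term>ISW,Mexico</term>
        <term>USA, Arizona</term>
        <term>USA, New Mexico</term>
    </q3>
</rec>"""


def extract_q3_terms(record_xml):
    """Return the raw geographic descriptors of one record."""
    root = ET.fromstring(record_xml)
    return [
        term.text.strip()
        for term in root.findall("./q3/term")
        if term.text and term.text.strip()
    ]


def split_components(term):
    """Split a descriptor into its comma-separated parts, trimming spaces."""
    return [part.strip() for part in term.split(",") if part.strip()]


def unique_terms(records):
    """Collect the distinct descriptors across many records, sorted."""
    seen = set()
    for record_xml in records:
        seen.update(extract_q3_terms(record_xml))
    return sorted(seen)


for term in extract_q3_terms(RECORD):
    print(split_components(term))
```

Running `unique_terms` over all records would give a list of distinct raw descriptors of the kind shown on the next slide.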

Page 26: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Q3 field extraction

*--MED, Turkey, Bursa, Gemlik Bay

*--Turkey, Bursa

- British-Colimbia

- Canada-Vancouver

A, America

A, America, East Coast

A, Antarctic Bottom Water

A, Atlantic

A, Atlantic Plate

A, Atlantic, Antarctic Bottom Water

A, Atlantic, Gulf Stream

A, Atlantic, Macaronesian Is.

A, Atlantic, Mid-Atlantic Ridge

A, Atlantic, Rio Grande Plateau

A, Central Atlantic

A, Mid-Atlantic Bight

A, Mid-Atlantic Ridge

A, Mid-Atlantic Ridge, Lucky Strike

A, Mid-Atlantic Ridge, Oceanographer Fracture Zone

A, North Atlantic

A, Northwest Atlantic Basin

A, Rockall Trough

A, Sargasso Sea

A, Southern Hemispere Oceans

A, atlantic

A,Atlantic

AE, Africa

AE, Atlantic

AE, Central Atlantic

Page 27: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Challenge: Inconsistent name and conventions

China, Nin gsia Hui Autonomous Region, Yinchwan

China, People'S Rep., Hubei Prov., Wuhan

China, People's R

China, People's R., Hailung Hsien

China, People's Rep, Changjiang Delta

China, People's. Rep., Xizang, Qing Zang Gaoyuan Plateau

China, Peoples Rep

China, Peoples Rep., Fuxian L.

China, Peoples rep., Dayawan Huizhou

China, Peoples's Rep., Yunnan Prov., Yuanjiang R.

China, Peoples, Rep., Ya-Er L.

China, Peoptes Rep. Qingdao

China, Reople's Rep., Yangtze R.

China, Rep., Donghu L.

China, people's Rep.

China, people's rep.

Chinea, People's Rep., Three Gorges Reservoir
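One way to absorb spelling variants such as "Chinea" is approximate string matching on the leading component of each descriptor. The sketch below uses `difflib` from the Python standard library. It is an assumption about how such cleanup could be done, not the method used in the project, and the country list and cutoff are illustrative.

```python
import difflib

# Illustrative subset of canonical top-level names.
KNOWN_COUNTRIES = ["China", "Germany", "Turkey", "USA"]


def canonical_country(term, cutoff=0.8):
    """Map the first component of a descriptor to a known country name.

    Returns None when nothing is similar enough, so that the term can be
    routed to manual review instead of being guessed.
    """
    first = term.split(",", 1)[0].strip().lower()
    lookup = {name.lower(): name for name in KNOWN_COUNTRIES}
    matches = difflib.get_close_matches(first, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


for raw in ["Chinea, People's Rep., Three Gorges Reservoir",
            "China, people's rep.",
            "A, Sargasso Sea"]:
    print(raw, "->", canonical_country(raw))
```

The same idea could be applied to the second component ("People's Rep.", "Peoples rep.", "Reople's Rep.") against a list of known qualifiers.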

Page 28: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Challenge: Legacy names

Germany, F.R

Germany, F.R., Westphalia

Germany, Fed. Rep

Germany, Fed. Rep.

Germany, Fed. Rep., Westphalia

Germany, Fed.Rep

Germany, Fed.Rep., Wuerttemberg

Germany, Feldbach Brook

Germany, D.R., Wipper R

Germany, Dem Rep

Germany, Dem. Rep

Germany, Dem. Rep., Helme R

Germany, Dem.Rep

Germany, Dem.Rep., Harz

Page 29: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

SKOS Encoding

asfis7:USA/California/San_Diego_City
      a       skos:Concept ;
      skos:prefLabel "San Diego City"@en ;
      skos:altLabel "San Diego Cty."@en ;
      skos:altLabel "San Diego"@en ;
      skos:broader   asfis7:USA/California ;
      skos:narrower  asfis7:USA/California/San_Diego_City/San_Luis_Rey_River ,
                     asfis7:USA/California/San_Diego_City/Point_Loma ,
                     asfis7:USA/California/San_Diego_City/Los_Penasquito_Creek ;
      skos:inScheme <http://www.proquest.com/ontologies/asfis7/placescheme> .
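The step from a cleaned descriptor to a concept identifier and its broader concept can be sketched as follows. The function name and the returned dictionary layout are hypothetical; the identifier pattern simply mirrors the one visible in the encoding above.

```python
def encode_term(term, prefix="asfis7:"):
    """Derive a concept id, preferred label and broader id from a descriptor.

    'USA, California, San Diego City' becomes a path-like identifier in
    which each component is one level of the hierarchy.
    """
    parts = [p.strip() for p in term.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty descriptor")
    segments = [p.replace(" ", "_") for p in parts]
    broader = prefix + "/".join(segments[:-1]) if len(segments) > 1 else None
    return {
        "id": prefix + "/".join(segments),
        "prefLabel": parts[-1],
        "broader": broader,  # None marks a top concept
    }


print(encode_term("USA, California, San Diego City"))
```

Top concepts such as countries and sea zones have no broader concept, which is why `broader` is `None` for single-component descriptors.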

Page 30: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Data sources used

Geonames

Digital Chart of the World (DCW)

Countries

Admin1

Admin2

World Seas

World Rivers

Continent

World Regions

FAO Geonetwork

ASFA Data

Page 31: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Knowledge Integration Approach

[Figure: layered architecture. Data layer: Geonames (RDBMS), Country.shp, Admin1.shp, World Seas, and ASFA Sea Zones (feature stores). The RDBMS source goes through a KMS SQL engine and a KMS mapping from the RDBMS model (table/column); each feature store goes through a KMS feature engine and a KMS mapping from the feature model (feature type/attribute). Each mapping yields an RDF graph. Ontological layer: feature model upper ontology, hydrology ontology, administrative division ontology. Data product layer: Semantic Gazetteer API.]

Page 32: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Geocoding information

Geometry (polygon, linestring or point)

Centroid

Bounding box

Feature types

Alternate names

Neighbor places (similar to RT)

Page 33: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Geocoded Concept

<http://www.proquest.com/ontologies/asfis7/place/Nepal>

      a       skos:Concept ;
      skos:altLabel "State of Nepal", "Neipeal", "Nepalia" ... ;
      ft:centroid "POINT (84 28)"^^ks:wkt ;
      ft:featureType <http://www.geonames.org/ontology#A> ;
      ft:geometry "MULTIPOLYGON (((82.70109558105469 27.711105346679688, 82.65790557861328 ....... 82.59803771972656 27.69027328491211, 82.571755981445312 27.690410614013672, 82.70109558105469 27.711105346679688)))"^^ks:wktMultiPolygon ;
      owl:sameAs <http://www.smartrealm.com/gazetteer/feature/countries#NP> .

 

Page 34: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Postprocessing

Centroid computed from geometry

Bounding box computed from polygon geometry

If there is no polygon, the bounding box is inherited from the parent

Centroids are not inherited
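These post-processing rules can be sketched in plain Python for the simplest case of a single closed polygon ring or a point. The function names and the parent bounding box in the example are assumptions for illustration; linestrings, holes, and multipolygons are left out to keep the sketch short.

```python
def bounding_box(ring):
    """Bounding box (min_x, min_y, max_x, max_y) of a list of (x, y) pairs."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def polygon_centroid(ring):
    """Area-weighted centroid of a closed ring (first point == last point)."""
    twice_area = 0.0
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon")
    # Centroid formula divides by 6 * area, i.e. 3 * twice_area.
    return (cx / (3 * twice_area), cy / (3 * twice_area))


def postprocess(kind, coords, parent_bbox=None):
    """Apply the rules: compute what the geometry allows, inherit only the bbox."""
    if kind == "polygon":
        return {"bbox": bounding_box(coords), "centroid": polygon_centroid(coords)}
    if kind == "point":
        return {"bbox": parent_bbox, "centroid": coords}
    # No geometry at all: the bbox comes from the parent, the centroid stays empty.
    return {"bbox": parent_bbox, "centroid": None}


square = [(0, 0), (4, 0), (4, 2), (0, 2), (0, 0)]
print(postprocess("polygon", square))
print(postprocess("point", (84, 28), parent_bbox=(80, 26, 89, 31)))
```

Inheriting the bounding box keeps a place findable by area queries, while refusing to inherit the centroid avoids placing it at a misleading map position.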

Page 35: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Inferencing

asfis7:USA/California/San_Diego_City
      a       skos:Concept ;
      skos:prefLabel "San Diego City"@en ;
      skos:altLabel "San Diego Cty."@en ;
      skos:altLabel "San Diego"@en ;
      skos:broader asfis7:USA/California ;
      skos:broaderTransitive  asfis7:USA ,
                              asfis7:USA/California ;
      skos:narrower asfis7:USA/California/San_Diego_City/San_Luis_Rey_River ,
                    asfis7:USA/California/San_Diego_City/Point_Loma ,
                    asfis7:USA/California/San_Diego_City/Los_Penasquito_Creek ;
      skos:narrowerTransitive asfis7:USA/California/San_Diego_City/San_Luis_Rey_River ,
                              asfis7:USA/California/San_Diego_City/Point_Loma ,
                              asfis7:USA/California/San_Diego_City/Los_Penasquito_Creek ;
      skos:inScheme <http://www.proquest.com/ontologies/asfis7/placescheme> .
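The inferred `skos:broaderTransitive` and `skos:narrowerTransitive` statements amount to a transitive closure over the direct `skos:broader` links. A minimal sketch follows, with hypothetical function names and a dictionary standing in for the RDF graph; a real reasoner would work on the triples themselves.

```python
SD = "asfis7:USA/California/San_Diego_City"

# Direct skos:broader links, child -> list of parents.
BROADER = {
    SD: ["asfis7:USA/California"],
    "asfis7:USA/California": ["asfis7:USA"],
    SD + "/San_Luis_Rey_River": [SD],
    SD + "/Point_Loma": [SD],
    SD + "/Los_Penasquito_Creek": [SD],
}


def transitive_closure(start, direct):
    """All nodes reachable from `start` by following `direct` links."""
    found = []
    stack = list(direct.get(start, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.append(node)
            stack.extend(direct.get(node, []))
    return found


def invert(direct):
    """Turn child -> parents into parent -> children (skos:narrower)."""
    inverse = {}
    for child, parents in direct.items():
        for parent in parents:
            inverse.setdefault(parent, []).append(child)
    return inverse


print(transitive_closure(SD, BROADER))          # broaderTransitive
print(transitive_closure(SD, invert(BROADER)))  # narrowerTransitive
```

Materializing these closures before indexing lets a plain keyword index answer hierarchical queries without running a reasoner at query time.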

Page 36: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

SKOS Indexing

Fields indexed in Lucene/Solr:

Id

Type

Preferred labels, alternate labels

Geometry

Centroid

Bounding box

Narrower, narrower transitive

Broader, broader transitive

Related

Feature types

Equivalent terms

Id, centroid, and geometries are spatially indexed in Oracle Spatial

Page 37: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Prototype Application

Page 38: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards
Page 39: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Advantages of Faceted Search

Lets the user decide how to start, and how to explore and group

After refinement, categories that are not relevant to the current results disappear

Seamlessly integrates keyword search with the organizational structure.

Very easy to expand out (loosen constraints)

Very easy to build up complex queries

Page 40: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Advantages of Faceted Search

Can’t end up with empty result sets (except with keyword search)

Helps avoid feelings of being lost

Easier to explore the collection

Helps users infer what kinds of things are in the collection

Evokes a feeling of “browsing the shelves”

Is preferred over standard search for collection browsing in usability studies (provided the interface is designed properly)
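Two of the properties listed on these slides, irrelevant categories disappearing after refinement and no empty result sets, follow directly from computing facet counts over the current results only. A small sketch with made-up documents and hypothetical function names:

```python
from collections import Counter

# Made-up documents, each tagged with one topic and one place.
DOCS = [
    {"topic": "Oyster culture", "place": "France"},
    {"topic": "Oyster culture", "place": "USA"},
    {"topic": "Coral reef diseases", "place": "Caribbean Sea"},
    {"topic": "Oyster culture", "place": "France"},
]


def refine(docs, **selected):
    """Keep the documents that match every selected facet value."""
    return [d for d in docs if all(d.get(k) == v for k, v in selected.items())]


def facet_counts(docs, facet):
    """Count facet values over the current results only."""
    return Counter(d[facet] for d in docs if facet in d)


results = refine(DOCS, topic="Oyster culture")
print(facet_counts(results, "place"))
```

After the topic is selected, "Caribbean Sea" no longer appears among the place choices, and every place still offered is guaranteed to return at least one document.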

Page 41: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards


Geospatial Hierarchical facet

Page 42: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards

Benefit of semantic approach

Unique identifier for place

Distinction in search between direct place and indirect place (by transitivity)

Multilingual search

Alternate-name searches still point to the same URI (New York, NYC, Big Apple)

Linkable to other data (reusable for different applications)

Reasoning

Easy integration
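The alternate-name benefit can be sketched as a label index in which every label of a concept resolves to the same identifier. The URI below is hypothetical and the function names are introduced here for illustration; labels shared by several places would need the disambiguation discussed earlier, which this sketch does not handle.

```python
# Concept URI (hypothetical) -> all of its labels, preferred and alternate.
CONCEPT_LABELS = {
    "asfis7:USA/New_York/New_York_City": ["New York", "NYC", "Big Apple"],
}


def build_label_index(concept_labels):
    """Map each label, case-insensitively, to its concept URI."""
    index = {}
    for uri, labels in concept_labels.items():
        for label in labels:
            index[label.casefold()] = uri
    return index


def resolve(index, text):
    """Return the URI for a typed place name, or None if it is unknown."""
    return index.get(text.strip().casefold())


INDEX = build_label_index(CONCEPT_LABELS)
print(resolve(INDEX, "big apple"))
```

Multilingual search works the same way once labels in other languages are added to the same concept.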

Page 43: Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards


Accomplishments

Geo-semantic enabled ASFA prototype is a breakthrough

Not just pins on a map: fully integrated geospatial and semantic search with GIS display and operations

Uses geographic knowledge base and map interface to aid search and discovery

Unique aspects:

Tagging research documents not just to points, but to linear features and areal regions on the earth’s surface

Allowing for user-defined areas of interest, including polygons

Creating a geo-semantic structure for the locations to enable enhanced search through inheritance and inference:

e.g. if something is tagged with “Naked Island, Alaska” we know that it is part of North America and USA but also that it is within Prince William Sound, which is within the Gulf of Alaska, which is part of the eastern North Pacific ocean region. Thus a search for research on oil spills in Prince William Sound will also include any documents tagged with Naked Island, Alaska, even without any explicit mention of Prince William Sound in the document.
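The Naked Island example can be sketched as a search that follows containment links upward and separates direct from indirect matches. The place links restate the example above; the document ids, their tags, and the function names are made up for illustration.

```python
# Direct "is part of / is within" links; a place may have several parents.
BROADER = {
    "Naked Island, Alaska": ["USA", "Prince William Sound"],
    "USA": ["North America"],
    "Prince William Sound": ["Gulf of Alaska"],
    "Gulf of Alaska": ["Eastern North Pacific"],
}

# Made-up documents and the place each one is tagged with.
DOC_TAGS = {
    "doc-1": ["Naked Island, Alaska"],
    "doc-2": ["Prince William Sound"],
    "doc-3": ["Gulf of Alaska"],
}


def ancestors(place):
    """Every place that contains `place`, directly or by transitivity."""
    found = set()
    stack = list(BROADER.get(place, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(BROADER.get(node, []))
    return found


def search(place):
    """Split matching documents into direct and indirect (inferred) hits."""
    direct, indirect = [], []
    for doc, tags in DOC_TAGS.items():
        if place in tags:
            direct.append(doc)
        elif any(place in ancestors(tag) for tag in tags):
            indirect.append(doc)
    return {"direct": direct, "indirect": indirect}


print(search("Prince William Sound"))
```

A search for Prince William Sound returns the document tagged with Naked Island as an indirect hit, while the document tagged only with the broader Gulf of Alaska is left out.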