Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using
Open Semantic Web Standards
Stephane Fellah
Barry J. Glick
Yaser Bishr
smartRealm
Association of American Geographers (AAG) Annual Conference
Washington DC, April 15, 2010
Agenda
Gazetteer overview
Project overview
Open standards used
Geocoding Process
Prototype application
Gazetteer Overview
Role of gazetteer services
ADEPT, Smith, October 1999
Where is …?
What’s there?
What happened there?
Books
News
Web
Publications
Archives
Georeferencing using gazetteer services
Data
smartRealm LLC confidential 9/16/2009
Semantic gazetteer vs. traditional gazetteers
Semantic gazetteer
Traditional gazetteer services
Multiple classification schemes
Point geometry
Multi geometry
Geo-spatial semantic relations
Time stamp for features
Time stamp for geometries
Time stamp for other properties
Semantic disambiguation
Profile gazetteer
KB capable
Spatial queries (Bbox, AOI) (only points)
Project description
R&D Project Goals
Demonstrate the value of geo-enabling a library database by:
Geocoding and spatially indexing a complete library database, i.e. ASFA
Implementing geographic search of documents, integrated with topic and author search and map-based visualization of results
Assisting users in discovering relevant information by surfacing the controlled vocabularies of ASFA
Testing the prototype with users to assess utility, ease of use, etc.
Demonstrate the value of linked data and semantics by:
Enabling geospatial reasoning
Encoding taxonomies in a machine-processable format
Resolving ambiguity of terms
Making linked data reusable
ASFA: Aquatic Sciences and Fisheries Abstracts
ASFA series is the premier reference in the field of aquatic resources.
Input to ASFA is provided by a growing international network of information centers monitoring over 5,000 serial publications, books, reports, conference proceedings, translations and limited distribution literature.
ASFA is a component of the Aquatic Sciences and Fisheries Information System (ASFIS), formed by four United Nations agency sponsors of ASFA and a network of international and national partners.
1.3 million records encoded in XML.
ASFIS 6
Descriptors used for subject indexing and retrieval of information on all aspects of aquatic sciences and technology
6267 vocabulary terms allowing the
We used the existing SKOS encoding of the taxonomy
ASFIS 7
Geographic descriptors used in the ASFA system
Not officially standardized
Inconsistencies due to manual entry
Hardwired in the system
Goal of this project:
Semantically encode the ASFIS7 taxonomy
Geocode the taxonomy
Enable spatial search in the ASFA database
Support Multiple Use Cases
Researcher has a specific research goal: provide a quicker, simpler way to filter results and reach the relevant documents.
Example: I’m looking for research on coral reef diseases in the western Caribbean region.
Researcher has a specific area of interest: allow the user to define an area of interest with the map or geographic terms and use it to find relevant research.
Example: I am studying the Danube delta region… what research is available in ASFA for this area? (And what topics does the research address?)
Geo-exploration of research: the researcher is interested in a specific topic and uses the map to explore relevant documents.
Example: My research interest is oyster farming. Where in the world has research been conducted on this topic?
Others:
Where does a specific author conduct his/her research?
Which authors have published the most research on a specific area of interest?
What is the geographic distribution of research on a specific topic? (And where are the gaps?)
Open Standards used
RDF: Graph Representation
Equivalent in relational model
Minimalist model: the triple model
association
attribute
Literal
Object
Object
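The triple model can be sketched in a few lines of plain Python. This is only an illustration, not the representation used in the project: the `ex:` identifiers are hypothetical, and the distinction drawn is the one on the slide, where an attribute points to a literal and an association points to another object.

```python
from typing import NamedTuple, Union


class Resource(NamedTuple):
    uri: str


class Literal(NamedTuple):
    value: str


class Triple(NamedTuple):
    subject: Resource
    predicate: Resource
    obj: Union[Resource, Literal]


def is_attribute(triple: Triple) -> bool:
    """An attribute has a literal value; an association links two resources."""
    return isinstance(triple.obj, Literal)


# Hypothetical example data (not taken from the ASFIS7 taxonomy).
graph = [
    Triple(Resource("ex:SanDiego"), Resource("ex:name"), Literal("San Diego")),
    Triple(Resource("ex:SanDiego"), Resource("ex:locatedIn"), Resource("ex:California")),
]

attributes = [t for t in graph if is_attribute(t)]
associations = [t for t in graph if not is_attribute(t)]
```

In relational terms, an attribute corresponds roughly to a column value and an association to a foreign-key link.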
Linked Open Data
Geospatial Semantic Web Architecture
Source: Berners-Lee, AAAI, July 2006
Geospatial Datatypes
Geospatial Functions
Geospatial Ontology
Extensions
Geospatial Logic
SKOS
SKOS = Simple Knowledge Organization System
A common data model for sharing and linking knowledge organization systems (KOS) via the Semantic Web.
KOS examples: thesauri, taxonomies, classification schemes, subject heading systems, …
Machine processable and portable representation
Extensible
SKOS Thesaurus Example
Example of Classification Scheme
90. GEOPHYSICS, ASTRONOMY, AND ASTROPHYSICS
91. Solid Earth physics
91.10.-v Geodesy and gravity
91.10.Pp Gravimetric measurements and instruments
Semantic Geo-encoding
Arrange geographic places in an order from most general to most specific, e.g.
World/Continent/Country/State or Province/City
World/Ocean/Ocean Region/Sea/Bay
World/Continent/Country/River or Lake
This allows user to move up and down hierarchy in search and to find related, more specific and more general terms
Also helps in distinguishing geographic place names that are ambiguous, e.g. Mississippi as river vs. Mississippi as state, etc.
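A small sketch of how such a hierarchy supports disambiguation. It assumes each place is stored as a path of (level, name) pairs following the patterns above; the two `Mississippi` entries and the function names are illustrative, not part of the ASFIS7 data.

```python
# Each place is a path from most general to most specific: (level, name) pairs.
GAZETTEER = [
    [("World", "World"), ("Continent", "North America"),
     ("Country", "USA"), ("State", "Mississippi")],
    [("World", "World"), ("Continent", "North America"),
     ("Country", "USA"), ("River", "Mississippi")],
]


def candidates(name):
    """All places whose most specific name matches, whatever their level."""
    return [path for path in GAZETTEER if path[-1][1] == name]


def disambiguate(name, level):
    """Keep only the candidates of the requested level (e.g. 'River')."""
    return [path for path in candidates(name) if path[-1][0] == level]


def broader_terms(path):
    """Names of the more general places, from the top of the hierarchy down."""
    return [name for _, name in path[:-1]]
```

A lookup of `Mississippi` returns two candidates; asking for the `River` level narrows it to one, and `broader_terms` gives the terms a user could move up to.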
Geo-SKOS
Define an extension of SKOS for geospatial concepts.
GeoConcept is a subclass of Concept
GeoConcept has location properties
Specialization of narrower and broader
Narrower => Narrower-partitive, …
Broader => Broader-partitive, …
Related => Nearby, SW of, west of, …
Geocoding process
Geocoding Process
[Figure: geocoding pipeline. ASFA XML → Q3 extraction → Q3 list → SKOS encoding (top concepts: countries, sea zones) → ASFIS7 SKOS → Geocoder → geocoded ASFIS7 SKOS → post-processing (bbox, centroid) → post-processed geocoded ASFIS7 SKOS → reasoning → inferred geocoded ASFIS7 SKOS → indexing → ASFIS7 index. Other labeled components: indexing mapping, SmartRealm Gazetteer, Oracle Spatial index.]
Approach
Encode legacy data from q3 fields in ASFA
Not using Authoritative list because no direct matching between terms
Sea codes not handled in authoritative list
Polygons and linestrings take priority over points
ASFA Data
<rec id="16" status="1" type="Journal Article" jdf="Q1;Y">
<ti>Divergence Among Barking Frogs (Eleutherodactylus Augusti) In The
Southwestern United States</ti>
<ab>Barking frogs (Eleutherodactylus augusti) are distributed from southern Mexico along the Sierra Madre Occidental into Arizona and the Sierra Madre Oriental into Texas and New Mexico. .... </ab>
<pt>Journal Article</pt>
<q1>
<term>Amphibiotic species</term>
<term>Burrowing organisms</term>
<term>Burrows</term>
<term>Coloration</term>
......
</q1>
<q2>
<term>Anura</term>
<term>Eleutherodactylus</term>
<term>Eleutherodactylus augusti</term>
</q2>
<q3>
<term>ISW,Mexico</term>
<term>USA, Arizona</term>
<term>USA, New Mexico</term>
</q3>
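A minimal sketch of pulling the geographic terms out of the q3 field with Python's standard XML parser. The sample is a shortened copy of the record above; the closing `</rec>` tag is assumed, since the excerpt on the slide is cut off, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's record; the closing </rec> is an assumption.
SAMPLE_RECORD = """<rec id="16" status="1" type="Journal Article" jdf="Q1;Y">
  <ti>Divergence Among Barking Frogs (Eleutherodactylus Augusti) In The
  Southwestern United States</ti>
  <q3>
    <term>ISW,Mexico</term>
    <term>USA, Arizona</term>
    <term>USA, New Mexico</term>
  </q3>
</rec>"""


def extract_q3_terms(record_xml):
    """Return the raw geographic descriptor strings of one record."""
    record = ET.fromstring(record_xml)
    return [term.text.strip() for term in record.findall("./q3/term") if term.text]


def split_path(term):
    """Split a comma-separated descriptor into its hierarchy components."""
    return [part.strip() for part in term.split(",") if part.strip()]


terms = extract_q3_terms(SAMPLE_RECORD)
paths = [split_path(term) for term in terms]
```

Splitting on commas is only a first step: as the following slides show, the raw terms are inconsistent enough that the components still need cleaning.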
Q3 field extraction
*--MED, Turkey, Bursa, Gemlik Bay
*--Turkey, Bursa
- British-Colimbia
- Canada-Vancouver
A, America
A, America, East Coast
A, Antarctic Bottom Water
A, Atlantic
A, Atlantic Plate
A, Atlantic, Antarctic Bottom Water
A, Atlantic, Gulf Stream
A, Atlantic, Macaronesian Is.
A, Atlantic, Mid-Atlantic Ridge
A, Atlantic, Rio Grande Plateau
A, Central Atlantic
A, Mid-Atlantic Bight
A, Mid-Atlantic Ridge
A, Mid-Atlantic Ridge, Lucky Strike
A, Mid-Atlantic Ridge, Oceanographer Fracture Zone
A, North Atlantic
A, Northwest Atlantic Basin
A, Rockall Trough
A, Sargasso Sea
A, Southern Hemispere Oceans
A, atlantic
A,Atlantic
AE, Africa
AE, Atlantic
AE, Central Atlantic
Challenge: Inconsistent names and conventions
China, Nin gsia Hui Autonomous Region, Yinchwan
China, People'S Rep., Hubei Prov., Wuhan
China, People's R
China, People's R., Hailung Hsien
China, People's Rep, Changjiang Delta
China, People's. Rep., Xizang, Qing Zang Gaoyuan Plateau
China, Peoples Rep
China, Peoples Rep., Fuxian L.
China, Peoples rep., Dayawan Huizhou
China, Peoples's Rep., Yunnan Prov., Yuanjiang R.
China, Peoples, Rep., Ya-Er L.
China, Peoptes Rep. Qingdao
China, Reople's Rep., Yangtze R.
China, Rep., Donghu L.
China, people's Rep.
China, people's rep.
Chinea, People's Rep., Three Gorges Reservoir
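One way to tame variants like these is to normalize punctuation and case and then fuzzy-match the leading components against a list of canonical names. The sketch below uses Python's standard `difflib`; the canonical labels, the 0.85 cutoff and the choice of the first two components are all assumptions for illustration, not the method used in the project, and the sketch does not handle every variant listed above.

```python
import difflib
import re

# Hypothetical canonical labels; the real authority list is not shown on the slides.
CANONICAL_NAMES = ["China, People's Rep.", "Germany, Fed. Rep."]


def normalize(term):
    """Lower-case, drop punctuation other than commas, and tidy spacing."""
    text = re.sub(r"[^a-z, ]", "", term.lower())
    text = re.sub(r"\s*,\s*", ", ", text)
    return re.sub(r"\s+", " ", text).strip()


CANONICAL_BY_KEY = {normalize(name): name for name in CANONICAL_NAMES}


def match_country(term, cutoff=0.85):
    """Fuzzy-match the first two components of a term to a canonical name."""
    parts = [part.strip() for part in term.split(",")]
    candidate = normalize(", ".join(parts[:2]))
    hits = difflib.get_close_matches(candidate, list(CANONICAL_BY_KEY), n=1, cutoff=cutoff)
    return CANONICAL_BY_KEY[hits[0]] if hits else None
```

With these assumptions, `match_country("China, Reople's Rep., Yangtze R.")` resolves to the canonical China label, while a term such as `Germany, Feldbach Brook`, whose second component is a place rather than a country qualifier, stays unmatched and would need separate handling.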
Challenge: Legacy names
Germany, F.R
Germany, F.R., Westphalia
Germany, Fed. Rep
Germany, Fed. Rep.
Germany, Fed. Rep., Westphalia
Germany, Fed.Rep
Germany, Fed.Rep., Wuerttemberg
Germany, Feldbach Brook
Germany, D.R., Wipper R
Germany, Dem Rep
Germany, Dem. Rep
Germany, Dem. Rep., Helme R
Germany, Dem.Rep
Germany, Dem.Rep., Harz
SKOS Encoding
asfis7:USA/California/San_Diego_City
a skos:Concept ;
skos:prefLabel "San Diego City"@en ;
skos:altLabel "San Diego Cty."@en ;
skos:altLabel "San Diego"@en ;
skos:broader asfis7:USA/California ;
skos:narrower asfis7:USA/California/San_Diego_City/San_Luis_Rey_River , asfis7:USA/California/San_Diego_City/Point_Loma , asfis7:USA/California/San_Diego_City/Los_Penasquito_Creek ;
skos:inScheme <http://www.proquest.com/ontologies/asfis7/placescheme>.
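As a rough sketch, an encoding like the one above could be generated from a cleaned hierarchy path. The helper names are illustrative, and the path-style identifiers simply follow the slide's notation (a strict Turtle serializer would need to escape the slashes or use full IRIs).

```python
SCHEME = "http://www.proquest.com/ontologies/asfis7/placescheme"


def concept_id(path):
    """Build a path-style identifier, e.g. asfis7:USA/California."""
    return "asfis7:" + "/".join(part.strip().replace(" ", "_") for part in path)


def encode_concept(path, alt_labels=()):
    """Render one place as SKOS statements in the slide's Turtle-like notation."""
    lines = [
        concept_id(path),
        "    a skos:Concept ;",
        f'    skos:prefLabel "{path[-1]}"@en ;',
    ]
    for label in alt_labels:
        lines.append(f'    skos:altLabel "{label}"@en ;')
    if len(path) > 1:
        # The broader concept is the same path with the last component removed.
        lines.append(f"    skos:broader {concept_id(path[:-1])} ;")
    lines.append(f"    skos:inScheme <{SCHEME}> .")
    return "\n".join(lines)


turtle = encode_concept(
    ["USA", "California", "San Diego City"],
    alt_labels=["San Diego Cty.", "San Diego"],
)
```

Deriving `skos:broader` from the path keeps the hierarchy consistent by construction; `skos:narrower` links can then be filled in as the inverse once all concepts are known.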
Data sources used
Geonames
Digital Chart of the World (DCW)
Countries
Admin1
Admin2
World Seas
World Rivers
Continent
World Regions
FAO Geonetwork
ASFA Data
Knowledge Integration Approach
[Figure: layered knowledge-integration architecture. Data layer: Geonames (RDBMS), Country.shp, Admin1.shp, World Seas and ASFA Sea Zones (feature stores). Each source passes through a KMS engine (KMS SQL Engine with an RDBMS model of tables/columns, or KMS Feature Engine with a feature model of feature types/attributes) and a KMS mapping to produce an RDF graph. Other labeled layers: data product layer; ontological layer (Hydrology ontology, Administrative Division ontology, Feature Model upper ontology); Semantic Gazetteer API.]
Geocoding information
Geometry (polygon, linestring or point)
Centroid
Bounding box
Feature types
Alternate names
Neighbor places (similar to RT)
Geocoded Concept
<http://www.proquest.com/ontologies/asfis7/place/Nepal>
a skos:Concept ;
skos:altLabel "State of Nepal", "Neipeal", "Nepalia"...
ft:centroid "POINT (84 28)"^^ks:wkt ;
ft:featureType <http://www.geonames.org/ontology#A> ;
ft:geometry "MULTIPOLYGON (((82.70109558105469 27.711105346679688, 82.65790557861328 ....... 82.59803771972656 27.69027328491211, 82.571755981445312 27.690410614013672, 82.70109558105469 27.711105346679688)))"^^ks:wktMultiPolygon ;
owl:sameAs <http://www.smartrealm.com/gazetteer/feature/countries#NP>
Postprocessing
Centroid computed from geometry
Bounding box computed from polygon geometry.
If no polygon, inherit bounding box from parent
Centroids are not inherited
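The post-processing rules above can be sketched as follows. This is an illustration under stated assumptions: only simple polygons given as a closed ring of (x, y) pairs are handled, the centroid uses the standard shoelace formula (the slides do not say which method was used), and the `Place` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def bounding_box(ring: List[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) of a polygon ring."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a closed ring (first point == last point)."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return (cx / (3.0 * area2), cy / (3.0 * area2))


@dataclass
class Place:
    name: str
    polygon: Optional[List[Point]] = None
    parent: Optional["Place"] = None


def resolve_bbox(place: Place):
    """Use the place's own polygon, else inherit the bbox of the nearest ancestor."""
    node = place
    while node is not None:
        if node.polygon:
            return bounding_box(node.polygon)
        node = node.parent
    return None


def resolve_centroid(place: Place):
    """Centroids are computed from the place's own geometry and never inherited."""
    return polygon_centroid(place.polygon) if place.polygon else None
```

Inheriting the bounding box keeps an ungeocoded place findable by spatial queries over its parent's extent, while leaving the centroid empty avoids placing it at a misleading point.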
Inferencing
asfis7:USA/California/San_Diego_City
a skos:Concept ;
skos:prefLabel "San Diego City"@en ;
skos:altLabel "San Diego Cty."@en ;
skos:altLabel "San Diego"@en ;
skos:broader asfis7:USA/California ;
skos:broaderTransitive asfis7:USA , asfis7:USA/California ;
skos:narrower asfis7:USA/California/San_Diego_City/San_Luis_Rey_River , asfis7:USA/California/San_Diego_City/Point_Loma , asfis7:USA/California/San_Diego_City/Los_Penasquito_Creek ;
skos:narrowerTransitive asfis7:USA/California/San_Diego_City/San_Luis_Rey_River , asfis7:USA/California/San_Diego_City/Point_Loma , asfis7:USA/California/San_Diego_City/Los_Penasquito_Creek ;
skos:inScheme <http://www.proquest.com/ontologies/asfis7/placescheme> .
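The inferred `skos:broaderTransitive` values are the transitive closure of the asserted `skos:broader` links. A minimal sketch of that step, assuming the asserted links are held in a plain dictionary (a real system would more likely use a reasoner or a rule engine):

```python
# Asserted skos:broader links, taken from the example above.
BROADER = {
    "asfis7:USA/California/San_Diego_City": ["asfis7:USA/California"],
    "asfis7:USA/California": ["asfis7:USA"],
}


def broader_transitive(concept, broader=BROADER):
    """Collect every concept reachable by following skos:broader links."""
    seen = set()
    stack = list(broader.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, ()))
    return seen


inferred = broader_transitive("asfis7:USA/California/San_Diego_City")
```

Materializing these links at indexing time means a query never has to walk the hierarchy itself.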
SKOS Indexing
Fields indexed in Lucene/Solr:
Id
Type
Preferred labels, alternate labels
Geometry
Centroid
Bounding box
Narrower, narrower transitive
Broader, broader transitive
Related
Feature types
Equivalent terms
Id, centroid and geometries are spatially indexed in Oracle Spatial
Prototype Application
Advantages of Faceted Search
Lets the user decide how to start, and how to explore and group
After refinement, categories that are not relevant to the current results disappear
Seamlessly integrates keyword search with the organizational structure.
Very easy to expand out (loosen constraints)
Very easy to build up complex queries
Can’t end up with empty result sets (except with keyword search)
Helps avoid feelings of being lost
Easier to explore the collection
Helps users infer what kinds of things are in the collection
Evokes a feeling of “browsing the shelves”
Is preferred over standard search for collection browsing in usability studies (the interface must be designed properly)
Geospatial Hierarchical facet
Benefits of semantic approach
Unique identifier for each place
Distinction in search between direct place and indirect place (by transitivity)
Multilingual search
Alternate-name searches still point to the same URI (New York, NYC, Big Apple)
Linkable to other data (reusable for different applications)
Reasoning
Easy integration
Accomplishments
Geo-semantic enabled ASFA prototype is a breakthrough
Not just pins on a map: fully integrated geo-spatial and semantic search with GIS display and operations
Uses geographic knowledge base and map interface to aid search and discovery
Unique aspects:
Tagging research documents not just to points, but to linear features and areal regions on the earth’s surface
Allowing for user-defined areas of interest, including polygons
Creating a geo-semantic structure for the locations to enable enhanced search through inheritance and inference
E.g. if something is tagged with “Naked Island, Alaska”, we know that it is part of North America and the USA, but also that it is within Prince William Sound, which is within the Gulf of Alaska, which is part of the eastern North Pacific ocean region. Thus a search for research on oil spills in Prince William Sound will also include any documents tagged with Naked Island, Alaska, even without any explicit mention of Prince William Sound in the document.
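The Naked Island example can be sketched as a search that separates direct hits (documents tagged with the place itself) from indirect hits (documents tagged with a place inside it). The hierarchy below is a simplified rendering of the example, and the document ids and tags are hypothetical.

```python
# Simplified broader links for the example; a place may have several parents.
BROADER = {
    "Naked Island, Alaska": ["USA", "Prince William Sound"],
    "USA": ["North America"],
    "Prince William Sound": ["Gulf of Alaska"],
    "Gulf of Alaska": ["Eastern North Pacific"],
}

# Hypothetical documents and their geographic tags.
DOC_TAGS = {
    "doc-1": {"Naked Island, Alaska"},
    "doc-2": {"Gulf of Alaska"},
    "doc-3": {"USA"},
}


def ancestors(place, broader=BROADER):
    """All places that contain the given place, directly or transitively."""
    found = set()
    for parent in broader.get(place, ()):
        found.add(parent)
        found |= ancestors(parent, broader)
    return found


def search(place, doc_tags=DOC_TAGS):
    """Return (direct, indirect) document ids for a place query."""
    direct = sorted(doc for doc, tags in doc_tags.items() if place in tags)
    indirect = sorted(
        doc for doc, tags in doc_tags.items()
        if doc not in direct and any(place in ancestors(tag) for tag in tags)
    )
    return direct, indirect
```

A query for `Prince William Sound` has no direct hits here but still returns `doc-1` as an indirect hit, which is the behavior described in the example.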