I. F. Cruz Data Mining and Exploration Middleware Workshop 1
Semantic Data Integration
Isabel F. Cruz
University of Illinois at Chicago
Overview
• Motivation: Two Applications
• Data Heterogeneity, Interoperability, and Data Integration
• Declarative Data Integration in GIS
• Two Applications
• Conclusions
Application 1 – Presidential Elections 2000
• Build aggregates of the data for the fifty states, at different granularity levels: e.g., county, municipality, ward
• Election data is in XML
• Each state has a different structure for its election results
• Single interface to display the results in all the states
Application 2 – Land Use
• WLIS (Wisconsin Land Information System): web-based system linking data from distributed, heterogeneous data sources
• Case study: land use codes
• Sample query: “Find all the agricultural lands in Dane and Racine counties.”
• Different authorities use different land use coding systems, leading to syntactic and semantic heterogeneities
[Figure: levels of authority: Federal, State, Regional Planning Commission, County, Village / Town / City]
Heterogeneity
Types of Heterogeneities [Bishr 99]
[Figure: the three types arranged along a "Modeling Power" axis.]
• Syntactic
– Different data models
– Different data formats
– Unifying models, formats
• Schematic
– Different schemas
– Schema integration
• Semantic
– Naming heterogeneity: same entity, different names. Mapping using thesaurus
– Cognitive heterogeneity: entity performs multiple roles in different contexts. Mapping using rules and constraints
Levels of Interoperability [Bishr 99]
[Figure: System A and System B drawn as matching stacks of layers, with INTEROPERABILITY between corresponding layers, ranging from system level to semantics. Layers, top to bottom:]
Application Semantics
Data Model
DBMS
Spatial Data Files
Hardware & OS
Network Protocols
Levels of Interoperability [Bishr 99]
• Single ‘virtual’ global data model
• Abstraction of all underlying remote databases
• Users need not know the location and format of data
• Users need to know semantics of the data
Levels of Interoperability [Bishr 99]
• Single ‘virtual’ global data model
• Abstraction of all underlying remote databases
• Users need not know the location and format of data
• Users need not know semantics of the data
Data Integration [Lenzerini 2001]
[Figure: an Application issues a Query to a Mediator that holds the Global Schema; below it, three Sources, each with its own Wrapper and Local Schema.]
Data Integration [Ludäscher et al. 2001]
[Figure: an Application issues a Query to a Mediator that holds the Global Schema and an Ontology; below it, three Sources, each with its own Wrapper and Local Schema.]
Data Integration [Fonseca & Egenhofer 99]
[Figure: an Application issues a Query to a Mediator that holds an Ontology; below it, three Sources, each with its own Wrapper and Local Ontology.]
Global-As-View (GAV) Model
• Global schema consists of views over local schemas
• Global querying is easy: sub-query unfolding
• Global schema maintenance is difficult
[Figure: the Global Schema linked to two Local Sources by a MAPPING; each link is labeled "query".]
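A minimal sketch of sub-query unfolding under GAV, assuming a toy global relation `parcels` defined as a view over two hypothetical county sources. The source rows, field names, parcel ids, and the Racine code "500" are invented for illustration; only the cropland codes 91 (Dane) and 811 (Racine) come from the land use slides in this talk.

```python
# Hypothetical local sources; rows and field names are illustrative only.
DANE = [{"pid": "D1", "lucode": "91"}, {"pid": "D2", "lucode": "21"}]
RACINE = [{"pid": "R1", "tag": "811"}, {"pid": "R2", "tag": "500"}]


def dane_view():
    """View body for Dane: translate local rows into global tuples."""
    for row in DANE:
        use = "cropland" if row["lucode"] == "91" else "other"
        yield {"parcel": row["pid"], "county": "Dane", "use": use}


def racine_view():
    """View body for Racine: same global shape, different local attribute."""
    for row in RACINE:
        use = "cropland" if row["tag"] == "811" else "other"
        yield {"parcel": row["pid"], "county": "Racine", "use": use}


# GAV mapping: each global relation is defined as views over the sources.
GLOBAL_SCHEMA = {"parcels": [dane_view, racine_view]}


def query_global(relation, predicate):
    """Unfold the global relation into its view definitions, then filter."""
    return [t for view in GLOBAL_SCHEMA[relation] for t in view() if predicate(t)]


croplands = query_global("parcels", lambda t: t["use"] == "cropland")
```

Answering the query only requires replacing `parcels` by its view definitions. Adding a new county, however, means editing `GLOBAL_SCHEMA` itself, which is the maintenance cost the slide refers to.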
Local-As-View (LAV) Model
• Each local source corresponds to a query over the global schema
• Global querying is difficult: inference over partial answers
• Global schema maintenance is easy
[Figure: the Global Schema linked to two Local Sources by a MAPPING; each link is labeled "query".]
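A minimal sketch of the LAV idea, assuming a toy global schema `parcel(county, use)` and source descriptions restricted to conjunctions of equality constraints. The source names, rows, and the `consistent` test are hypothetical simplifications; real LAV query answering requires rewriting queries using views, which this sketch does not attempt.

```python
# Each source is DESCRIBED as a query over the global schema parcel(county, use).
# Names and rows are invented for illustration.
SOURCES = {
    "dane_all": {
        "description": {"county": "Dane"},
        "rows": [
            {"parcel": "D1", "county": "Dane", "use": "cropland"},
            {"parcel": "D2", "county": "Dane", "use": "industrial"},
        ],
    },
    "racine_agri": {
        "description": {"county": "Racine", "use": "cropland"},
        "rows": [{"parcel": "R1", "county": "Racine", "use": "cropland"}],
    },
}


def consistent(description, query):
    """A source is relevant unless it contradicts the query on some attribute."""
    return all(query.get(key, value) == value for key, value in description.items())


def answer(query):
    """Collect sound (possibly partial) answers from every relevant source."""
    results = []
    for source in SOURCES.values():
        if not consistent(source["description"], query):
            continue  # the description rules this source out
        results += [
            row["parcel"]
            for row in source["rows"]
            if all(row[key] == value for key, value in query.items())
        ]
    return results
```

A query for all Racine parcels returns only `R1`: the one Racine source covers croplands only, so the answer is partial. Registering a new source is just another entry in `SOURCES`, with no change to the global schema.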
Modified LAV Model
• Limited update to the ontology
• Alternative classifications are possible
• However, currently only possible for simple cases
[Figure: Ontology: Commercial, with children Retail Sales and Retail Services. Local Ontology: Commercial, with children Intensive and Non-intensive. Modified Ontology: Commercial, with children Retail Sales, Retail Services, Intensive, and Non-intensive.]
Semantic Web
“The Web of data (and connections) with meaning in the sense that a computer program can learn enough about what the data means to process it . . .
. . . Imagine what computers can understand when there is a vast tangle of interconnected terms and data that can automatically be followed.” (Tim Berners-Lee, Weaving the Web, 1999)
Semantic Web: Architecture (Berners-Lee, http://www.w3.org/2000/Talks/1206-xml2k-tbl/)
[Figure: layered architecture, bottom to top:]
Unicode, URI
XML + NS + xmlschema
RDF + rdfschema
Ontology Vocabulary
Logic
Proof
Trust
[Digital Signature runs vertically alongside the layers. Side labels: Self-described document, Data, Data, Rules.]
Approach
• Ontology-driven
– Reduces the problem of knowing the contents and the structure of each data source to the smaller problem of knowing how entities in such sources are mapped to the ontology
– Can replace global schema in a restricted application domain
• Declarative rules
– Define mappings between concepts
– Easy to define and to maintain
– Expressed in XML as agreements
• (Modified) Local-As-View
– Recognizes autonomy of local sources
• Query Processing
– Leverages standard semantic web technology
Data Integration
[Figure: an Application issues a Query to a Mediator that holds the Ontology and the Agreements; below it, three XML Sources, each with its own Wrapper and Local Ontology.]
Agreement Document
• XML document that acts as a wrapper layer for the underlying local data source
• Stores information about how entities in the ontology map to the entities in the local data source
• Uses XML to capture the hierarchical ordering of entities and their mappings
• Supports query operations using XPath/XSLT to hide details of how data is structured in the local data source
• Minimizes need for programmer intervention and maintenance, as it is declaratively specified
Mapping Rules
• One to one
• One to null
• Parent to children
• One to many
• Many to one
Encoded using XPath/XSLT Expression Templates in the Agreement Document
Query Processing – Step 1
[Figure: data flow among the User Query, the Agreement Document, an XSLT File, an XPath Expression Template, and the resulting XPath Expression; edges are labeled "arguments" and "references".]
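One way to read this step is that the arguments of the user query are substituted into the XPath expression template referenced by the agreement document. The sketch below illustrates that reading only: Python's `string.Template` stands in for the XSLT step, the template string is the one from the election agreement fragment in this talk, and `build_xpath` is a hypothetical helper name.

```python
from string import Template

# XPath expression template as it appears in the election agreement fragment.
XPATH_TEMPLATE = "/state[@name='$state_name']/county[@name='$county_name']/*"


def build_xpath(template, arguments):
    """Substitute the user's query arguments into the expression template."""
    # substitute() raises KeyError if the query omits a required argument.
    return Template(template).substitute(arguments)


xpath = build_xpath(
    XPATH_TEMPLATE, {"state_name": "Illinois", "county_name": "ADAMS"}
)
```

The sketch does no escaping, so an argument containing a quote character would produce a malformed expression; a real implementation would need to validate or escape the values.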
Query Processing – Step 2
[Figure: data flow among the XSLT File, the DOM Tree, the Apache XSLT Processor, the XPath Expression, and the Selected DOM Nodes, ending in Accessor and Aggregation Operations.]
Application 1 – Presidential Elections 2000
<state name="Illinois">
  ...
  <county name="ADAMS">
    <candidate name="Gore" votes="12197"/>
    <candidate name="Bush" votes="17331"/>
    <candidate name="Nader" votes="371"/>
    <candidate name="Buchanan" votes="140"/>
    <candidate name="Browne" votes="63"/>
    <candidate name="Hagelin" votes="7"/>
    <candidate name="Phillips" votes="0"/>
    <candidate name="McReynolds" votes="0"/>
    <candidate name="Total" votes="30109"/>
  </county>
  ...
</state>
Mapping Rules
[Figure: two hierarchies side by side. Ontology: State, County, Municipality. Wisconsin State Hierarchy: State, County, Municipality, Wardgroup. The links between them are labeled as two one-to-one mappings and one children mapping.]
Agreement Document Fragment
<xpath-expr>
  /state[@name='$state_name']/county[@name='$county_name']/*
</xpath-expr>
. . .
<argument>
  <name>$state_name</name>
  <value>attribute::name</value>
</argument>
<argument>
  <name>$county_name</name>
  <value>attribute::name</value>
</argument>
. . .
<operation name="getCandidateNames" aggregation="sset">
  <body>child::candidate/attribute::name</body>
</operation>
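The selection and aggregation step can be sketched against the Illinois fragment shown earlier. This is an illustration under stated assumptions, not the system's implementation: Python's `xml.etree.ElementTree` replaces the Apache XSLT processor, its limited XPath subset forces a path relative to `<state>`, the value `sset` is assumed to mean a sorted set, and `total_votes` is a hypothetical extra aggregation.

```python
import xml.etree.ElementTree as ET

ELECTION_XML = """
<state name="Illinois">
  <county name="ADAMS">
    <candidate name="Gore" votes="12197"/>
    <candidate name="Bush" votes="17331"/>
    <candidate name="Nader" votes="371"/>
    <candidate name="Buchanan" votes="140"/>
    <candidate name="Browne" votes="63"/>
    <candidate name="Hagelin" votes="7"/>
    <candidate name="Phillips" votes="0"/>
    <candidate name="McReynolds" votes="0"/>
    <candidate name="Total" votes="30109"/>
  </county>
</state>
"""


def select_county(root, state_name, county_name):
    """Return the <county> element, or None if state or county does not match."""
    if root.get("name") != state_name:
        return None
    # ElementTree supports only an XPath subset, so the path is relative to <state>.
    return root.find(f"./county[@name='{county_name}']")


def get_candidate_names(county):
    """Counterpart of getCandidateNames; 'sset' is ASSUMED to mean a sorted set."""
    return sorted({c.get("name") for c in county.findall("candidate")})


def total_votes(county):
    """Hypothetical aggregation: sum the votes, skipping the 'Total' row."""
    return sum(
        int(c.get("votes"))
        for c in county.findall("candidate")
        if c.get("name") != "Total"
    )


root = ET.fromstring(ELECTION_XML)
adams = select_county(root, "Illinois", "ADAMS")
```

Because every state structures its results differently, only `select_county` would change per state; the accessor and aggregation functions operate on whatever nodes the selection returns.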
Application 2 – Land Use
There are 72 counties and hundreds of cities and towns in the state; each may have its own system of classifying land use codes.
[Figure: several jurisdictions, each labeled with its own Land Use Code.]
Heterogeneity occurs at all levels
Parcel-based example
Each highlighted parcel has its own land use classification code
Classification Semantic Issue
Query: Find all Commercial – Sales areas in Racine and Dane Counties
[Figure: two classification trees. Dane County: Commercial, subdivided into Retail Sales and Retail Services. Racine County: Commercial, with nodes Retail Sales and Services, Intensive, Nonintensive, and Land Under Development.]
Classification Scheme: Exhaustive Model
009 Shopping Center
010 Open Water
111 Single Family
113 Two Family
115 Multiple Family
116 Farm Unit
129 Group Quarters
140 Mobile Home
142 Mobile Home Park
190 Seasonal Residence
021 Food and Kindred
022 Textile and Mill
023 Apparel and Related
024 Lumber and Wood
Classification Scheme: Hierarchical Model
1 Urban and Developed Land
  1.01 Residential
    1.01.01 Single Family Detached or Duplex
    1.01.02 Mobile Homes Not in Parks
    1.01.03 Multi-family Dwellings
      1.01.03.01 Three Unit Multi-family
      1.01.03.02 Four Unit Multi-family
      1.01.03.03 Five or More Multi-family
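Unlike the flat codes of the exhaustive model, a hierarchical code carries its position in the classification, so a query on a category can reach its subcategories by prefix. A minimal sketch using the codes from this slide; the helper names `descendants` and `parent` are hypothetical.

```python
# Codes and labels copied from the hierarchical model slide.
HIERARCHY = {
    "1": "Urban and Developed Land",
    "1.01": "Residential",
    "1.01.01": "Single Family Detached or Duplex",
    "1.01.02": "Mobile Homes Not in Parks",
    "1.01.03": "Multi-family Dwellings",
    "1.01.03.01": "Three Unit Multi-family",
    "1.01.03.02": "Four Unit Multi-family",
    "1.01.03.03": "Five or More Multi-family",
}


def descendants(code):
    """All codes strictly below `code`, found by dotted-prefix match."""
    prefix = code + "."
    return sorted(c for c in HIERARCHY if c.startswith(prefix))


def parent(code):
    """The enclosing category, or None for a top-level code."""
    head, _, _ = code.rpartition(".")
    return head or None
```

Matching on `code + "."` rather than on `code` alone keeps a code such as `1.01` from capturing an unrelated sibling like a hypothetical `1.011`.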
Heterogeneity of Land Use Coding Systems
Classification for Cropland

Planning Authority        Attribute   Land Use Code   Description
Dane County RPC           Lucode      91              Cropland Pasture
Racine County (SEWRPC)    Tag         811             Cropland Pasture
                                      815             Other Agriculture
Eau Claire County         Lu1         AA              General Agriculture
City of Madison           Lu_4_4      8110            Farms

Annotations on the table: Synonyms; Two Land Use Codes
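The thesaurus-style mapping suggested for naming heterogeneity can be sketched with the rows of this table. Two things are assumptions rather than facts from the slide: that code 815 belongs to Racine County under the same `Tag` attribute (the table layout suggests it but does not state it), and the shared concept name `cropland`. The helper `local_terms` is hypothetical.

```python
from collections import defaultdict

# (planning authority, attribute, local code) -> shared concept.
# ASSUMPTION: 815 is a second Racine County code stored in the "Tag" attribute.
THESAURUS = {
    ("Dane County RPC", "Lucode", "91"): "cropland",
    ("Racine County (SEWRPC)", "Tag", "811"): "cropland",
    ("Racine County (SEWRPC)", "Tag", "815"): "cropland",
    ("Eau Claire County", "Lu1", "AA"): "cropland",
    ("City of Madison", "Lu_4_4", "8110"): "cropland",
}


def local_terms(concept):
    """Group the local attribute/code pairs that denote a shared concept."""
    by_authority = defaultdict(list)
    for (authority, attribute, code), name in THESAURUS.items():
        if name == concept:
            by_authority[authority].append((attribute, code))
    return dict(by_authority)


cropland = local_terms("cropland")
```

Each entry of the result tells a query processor which attribute to test and which values to accept for one authority, so a single cropland query can be translated into four differently shaped local conditions.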
Land Use Mapping (1)
Ontology:
Land Use Code
  Agriculture (OA)
  Industrial (OI)
    Manufacturing (OIM)
    Others (OIO)
Dane County Hierarchy:
Land Use Code
  Agriculture (91-99)
  Industrial (21-39)
    21, 22, 35, 36
One-to-many mapping: Others (OIO) corresponds to codes 35 and 36
Code Explanations:
21 – Food and kindred
22 – Textile and mill
35 – Scientific Instruments
36 – Miscellaneous Industrial
Land Use Mapping (2)
Ontology:
Land Use Code
  Agriculture (OA)
  Industrial (OI)
    Manufacturing (OIM)
      Plastics (OIML)
      Rubber (OIMR)
Dane County Hierarchy:
Land Use Code
  Agriculture (91-99)
  Industrial (21-39)
    31
Many-to-one mapping: Plastics (OIML) and Rubber (OIMR) both correspond to code 31
Agreement Document Fragment
Mapping types, with examples from Dane County:

One-to-one
<attrvalue id="OAC" mapping="one-to-one" equiv="91"/>
‘OAC’ (Agriculture – Croplands and Pasture) is directly mapped to the land use code ‘91’ (Cropland and Pasture)

One-to-null
<attrvalue id="OAWN" mapping="one-to-null"/>
‘OAWN’ (Agriculture – Woodlands – Non-forest) does not have an equivalent

One-to-many
<attrvalue id="OIO" mapping="one-to-many">
  <localvalue> 35 </localvalue>
  <localvalue> 36 </localvalue>
</attrvalue>
‘OIO’ (Industrial – Others) is equivalent to the collection of ‘35’ (Scientific Instruments) and ‘36’ (Miscellaneous Industrial)
Agreement Document Fragment
Mapping types, with examples from Dane County:

Many-to-one
<attrvalue id="OIML" mapping="many-to-one" equiv="31"/>
<attrvalue id="OIMR" mapping="many-to-one" equiv="31"/>
‘OIML’ (Industrial – Manufacturing – Plastics) and ‘OIMR’ (Industrial – Manufacturing – Rubber) are both mapped to ‘31’ (Manufacturing – Rubber and Plastics)

Parent-Children
<attrvalue id="OAWF" mapping="children">
  <attrvalue id="OAWFC" equiv="94"/>
  <attrvalue id="OAWFN" equiv="99"/>
</attrvalue>
‘OAWF’ (Agriculture – Woodlands – Forest) is equivalent to the collection of ‘OAWFC’ and ‘OAWFN’
‘OAWFC’ (Agriculture – Woodlands – Forest – Commercial) and ‘OAWFN’ (Agriculture – Woodlands – Forest – Noncommercial) are equivalent to ‘94’ and ‘99’ respectively
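The five mapping types can be resolved uniformly once the fragments are parsed. The sketch below is illustrative only: the `<agreement>` root element and the function names are hypothetical, and Python's `xml.etree.ElementTree` is used in place of the XPath/XSLT templates described in the talk.

```python
import xml.etree.ElementTree as ET

# The Dane County fragments, wrapped in a hypothetical <agreement> root element.
AGREEMENT = """
<agreement>
  <attrvalue id="OAC" mapping="one-to-one" equiv="91"/>
  <attrvalue id="OAWN" mapping="one-to-null"/>
  <attrvalue id="OIO" mapping="one-to-many">
    <localvalue> 35 </localvalue>
    <localvalue> 36 </localvalue>
  </attrvalue>
  <attrvalue id="OIML" mapping="many-to-one" equiv="31"/>
  <attrvalue id="OIMR" mapping="many-to-one" equiv="31"/>
  <attrvalue id="OAWF" mapping="children">
    <attrvalue id="OAWFC" equiv="94"/>
    <attrvalue id="OAWFN" equiv="99"/>
  </attrvalue>
</agreement>
"""


def local_codes(node):
    """Return the set of local codes that one <attrvalue> element stands for."""
    codes = set()
    if node.get("equiv"):  # one-to-one and many-to-one
        codes.add(node.get("equiv"))
    for value in node.findall("localvalue"):  # one-to-many
        codes.add(value.text.strip())
    for child in node.findall("attrvalue"):  # parent-to-children
        codes |= local_codes(child)
    return codes  # stays empty for one-to-null


def resolve(root, ontology_id):
    """Look up an ontology code anywhere in the agreement and resolve it."""
    node = root.find(f".//attrvalue[@id='{ontology_id}']")
    return local_codes(node) if node is not None else set()


root = ET.fromstring(AGREEMENT)
```

For example, `resolve(root, "OAWF")` yields the codes 94 and 99 by recursing into the two child entries, while `resolve(root, "OAWN")` yields an empty set, so a query on that concept simply returns nothing from this source.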
Dane County Parcels
Query: “Find all parcels classified as Agriculture – Woodlands – Forests”

Local Land Use Code   Description                                        Parcel ID(s)
94                    Agriculture – Commercial Forest                    L114
99                    Agriculture – Woodlands (non-commercial forest)    L117, L119

Parcels shown: L114, L117, L119
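The query on this slide can be traced end to end with the parcel table and the parent-children mapping for ‘OAWF’. A minimal sketch: the dictionary layout and the function name `find_parcels` are hypothetical, while the codes and parcel ids are the ones on the slides.

```python
# Parcel id -> local land use code, from the Dane County parcel table.
PARCELS = {"L114": "94", "L117": "99", "L119": "99"}

# Ontology code -> local codes, as given by the parent-children agreement entry.
ONTOLOGY_TO_LOCAL = {
    "OAWF": {"94", "99"},   # Agriculture - Woodlands - Forest
    "OAWFC": {"94"},        # ... Forest - Commercial
    "OAWFN": {"99"},        # ... Forest - Noncommercial
}


def find_parcels(ontology_id):
    """Translate the ontology code to local codes, then filter the parcels."""
    codes = ONTOLOGY_TO_LOCAL.get(ontology_id, set())
    return sorted(pid for pid, code in PARCELS.items() if code in codes)


forest_parcels = find_parcels("OAWF")
```

The user asks in ontology terms only; the local codes 94 and 99 never appear in the query.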
Agreement Maker
• Visual interface for creating agreements easily
• Existing mappings displayed to the user
• Displayed list of mappings updated as user identifies more mappings
Advantages of approach
• Neither problem requires advanced tools like rule engines
• Using just XML to represent the ontology and the mappings is enough for these applications
• Most of the processing is within the XSLT or XPath engine (including rule resolution)
• Ontology representation and query processing evolved between applications
• Can be maintained easily without programming knowledge
• Local-as-View approach did not pose problems
Summary
Data Integration for real-world GIS problems:
– Semantic connections among data
– Declarative approach
– Available standards and tools
– Building block for higher levels of integration
– Identification of the right “level of complexity”
Current/Future Work
• Identification of other types of mappings
• Ontology mapping and alignment [Cruz & Rajendran 2003]
• Integration across multiple themes
• Design of middleware components for semantic data integration
• Schematic data integration using semantic information [Cruz & Xiao 2003]
• Extension to XQuery to handle semantic data heterogeneity [Wiegand et al 2002]
Domainspace Concept in XQuery
DOMAINSPACE Area = "www.co.dane.wi.us/Dane.xml,www.co.racine.wi.us/Racine.xml"
<Result>
{
  FOR $b IN document(Area)//*
  WHERE $b/Area.LandUseCode = "cropland/pasture"
  RETURN $b
}
</Result>
Acknowledgments
Support provided by:
– NSF Digital Government Grant
– ARDA-NIMA
– NSF ITR Grant for Context-aware Computing
At UIC: Afsheen Rajendran, Huiyong Xiao, and William Sunna.
At the U. of Wisconsin-Madison: Nancy Wiegand, Steve Ventura, Dan Patterson, and Naijun Zhou.
Axiomap Visualization Tool: Ilya Zaslavsky, Chaitan Baru.
References
• Object Interoperability for Geospatial Applications: A Case Study, I. F. Cruz and P. Calnan. "The Emerging Semantic Web," IOS Press, 2002 (see also www.semanticweb.org/SWDB).
• Handling Semantic Heterogeneities Using Declarative Agreements, I. F. Cruz, A. Rajendran, W. Sunna, and N. Wiegand. ACM GIS, 168-174, 2002.
• Querying Heterogeneous Land Use Data Over the Web, N. Wiegand, N. Zhou, I. F. Cruz, and A. Rajendran. GIScience, 2002.
• Semantic Data Integration in Hierarchical Domains, I. F. Cruz and A. Rajendran. IEEE Intelligent Systems 18(2): 66-73, 2003.
• Exploring a New Approach to the Alignment of Ontologies, I. F. Cruz and A. Rajendran. Workshop on Semantic Web Technologies for Searching and Retrieving Scientific Data, 2003.
• Using a Layered Approach for Interoperability on the Semantic Web, I. F. Cruz and H. Xiao. WISE, 2003.