Motivation
• Manually integrating information sources is painful because of
– Heterogeneous data sources
– Inconsistent / incomplete data structure information
– Platform dependency
A Solution
• Unity
– Automates the integration process
– Uses ODBC to access multiple data sources
– Uses X-Spec to capture the semantic meaning of the data (standard dictionary)
– Written in C++
– Runs on the Windows platform
Goal of this Project
• Unity JDBC Driver
– Embed the integration function of Unity into a JDBC driver
– X-Spec as the dictionary
– In Java: platform independent
– Access multiple data sources through JDBC
Migrating Unity from ODBC to JDBC
[Figure: two architectures side by side. In Unity, user queries are passed as semantic queries to the Integration Module, which sends SQL through ODBC to DB1, DB2, …, DBn and returns the results. In the Unity JDBC Driver, the same Integration Module sends SQL through JDBC instead of ODBC.]
Basic classes of JDBC API
[Figure: class relationships]
• DriverManager registers Driver
• Driver provides Connection
• Connection creates Statement
• Statement retrieves ResultSet
• ResultSet provides ResultSetMetaData
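The chain of classes above can be sketched with the standard java.sql API. This is a minimal illustration, not code from the Unity driver: the class and method names are hypothetical, and the JDBC URL passed in must belong to a driver that is actually available at run time.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, used only to illustrate the JDBC class chain.
final class JdbcChainSketch {

    /** Runs a query and returns the column labels reported by its ResultSetMetaData. */
    static List<String> columnLabels(String url, String sql) throws SQLException {
        // DriverManager asks its registered Drivers for one that accepts the URL;
        // that Driver provides the Connection. A SQLException is thrown if none does.
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();     // Connection creates Statement
             ResultSet rs = stmt.executeQuery(sql)) {     // Statement retrieves ResultSet
            ResultSetMetaData meta = rs.getMetaData();    // ResultSet provides its metadata
            List<String> labels = new ArrayList<>();
            for (int i = 1; i <= meta.getColumnCount(); i++) { // JDBC columns are 1-based
                labels.add(meta.getColumnLabel(i));
            }
            return labels;
        }
    }
}
```

The try-with-resources statement closes the ResultSet, Statement and Connection in reverse order, even when the query fails.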
JDBC Driver Types
• JDBC-ODBC Bridges plus ODBC drivers
• Native-API partly-Java drivers
• JDBC-Net pure Java drivers
• Native-protocol pure Java drivers
Semantic Query
• An example
SELECT [Employee].id, [Department;Employee].name
WHERE [Employee].age > 30
– All fields/tables that have the same semantic meaning should have the same semantic name.
– A semantic query refers to a field by its semantic name.
– There are no explicit relation specifications (FROM table, join, union) in the query.
– The X-Spec document stores all semantic names and the corresponding system names for every field/table.
– No nested queries.
– A semantic query is parsed to create a sub-query in standard form (SQL'92) for each data source.
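The role of the X-Spec document as a dictionary can be sketched as a lookup from semantic names to system names, per data source. This is only an in-memory illustration: the slides do not show the X-Spec format, and the class name, the data source names and the system names (emp.emp_no, staff.id, staff.age) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for the information held in X-Spec documents.
final class SemanticDictionarySketch {
    // data source name -> (semantic name -> system name)
    private final Map<String, Map<String, String>> entries = new HashMap<>();

    void put(String dataSource, String semanticName, String systemName) {
        entries.computeIfAbsent(dataSource, k -> new HashMap<>()).put(semanticName, systemName);
    }

    /** Returns the system name of a semantic field in one data source, if that source has it. */
    Optional<String> systemName(String dataSource, String semanticName) {
        Map<String, String> fields = entries.get(dataSource);
        return fields == null ? Optional.empty() : Optional.ofNullable(fields.get(semanticName));
    }

    /** Builds a small dictionary with made-up system names for two data sources. */
    static SemanticDictionarySketch demo() {
        SemanticDictionarySketch d = new SemanticDictionarySketch();
        d.put("DB1", "[Employee].id", "emp.emp_no");
        d.put("DB2", "[Employee].id", "staff.id");
        d.put("DB2", "[Employee].age", "staff.age");
        return d;
    }
}
```

With this dictionary, [Employee].id resolves to a different system name in each source, while [Employee].age resolves only in DB2.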
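Semantic Query Grammar
[Figure: syntax diagram. SELECT, followed by ALL or DISTINCT, a comma-separated list of [Column] entries, FROM Tables, WHERE Search Condition, GROUP BY Columns, and ORDER BY with a comma-separated list of [Column] entries, each marked DESC/ASC.]
Semantic Query Grammar
[Figure: syntax diagram for Search Condition. A condition takes one of these forms:
Expression [NOT] LIKE “[%] String”
Column IS [NOT] NULL
(Expression)
Expression (= | < | > | <>) Expression
Conditions are combined with OR and AND and negated with NOT.]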
Parsing X-Spec
[Figure: structure of a parsed X-Spec. It contains X-Spec Table 1, X-Spec Table 2, X-Spec Table 3, …, X-Spec Table n. Each table holds X-Spec Field 1 … X-Spec Field k, X-Spec Key 1 … X-Spec Key j, and its X-Spec Joins.]
Parsing Semantic Query
[Figure: the Query Translator. The semantic query is split into S_list, F_list, C_list, GroupBy_list and OrderBy_list. Pass one produces the selected X-Spec fields. Pass two maps semantic names to system names and builds Sub Query 1, Sub Query 2, …, Sub Query n. The selected fields (Sys_Name) are kept for use in integration.]
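The first step of parsing, separating the selection list from the condition, can be sketched as below. This is a simplified illustration and not the translator's actual parser: the class name is hypothetical, it handles only SELECT and WHERE (no ALL/DISTINCT, GROUP BY or ORDER BY), and it assumes the word WHERE does not appear inside a string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a semantic query into a selection list and a condition.
final class SemanticQuerySplitSketch {
    // Group 1: text between SELECT and WHERE (or the end); group 2: text after WHERE, if any.
    private static final Pattern QUERY = Pattern.compile(
            "^\\s*SELECT\\s+(.*?)(?:\\s+WHERE\\s+(.*))?$",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    final List<String> sList;   // semantic names in the SELECT clause
    final String condition;     // raw WHERE text, or null when there is no WHERE clause

    private SemanticQuerySplitSketch(List<String> sList, String condition) {
        this.sList = sList;
        this.condition = condition;
    }

    static SemanticQuerySplitSketch parse(String query) {
        Matcher m = QUERY.matcher(query);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a SELECT query: " + query);
        }
        List<String> fields = new ArrayList<>();
        for (String field : m.group(1).trim().split("\\s*,\\s*")) {
            fields.add(field);
        }
        String where = m.group(2) == null ? null : m.group(2).trim();
        return new SemanticQuerySplitSketch(fields, where);
    }
}
```

Applied to the example query from the Semantic Query slide, parse yields the two semantic names [Employee].id and [Department;Employee].name, plus the condition text [Employee].age > 30.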
Sub-Query Generation
• S-List (Selection List): only the semantic fields that exist in the data source are replaced by their system names and added to that source's sub-query selection list.
• C-List (Condition List): an expression is added to a sub-query's condition list only when all of its semantic arguments exist in the data source.
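The two rules above can be sketched for a single data source as follows. This is an illustration under stated assumptions rather than the driver's implementation: the class, the Condition type and its format-string templates are hypothetical, each source is assumed to contribute one table, and the accepted conditions are simply joined with AND.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of sub-query generation for one data source.
final class SubQuerySketch {

    /** A condition such as "%s > 30", with one %s per semantic argument. */
    static final class Condition {
        final String template;
        final List<String> semanticArgs;

        Condition(String template, String... semanticArgs) {
            this.template = template;
            this.semanticArgs = Arrays.asList(semanticArgs);
        }
    }

    /** Returns the sub-query for one source, or null if it has none of the selected fields. */
    static String build(String table, Map<String, String> semanticToSystem,
                        List<String> sList, List<Condition> cList) {
        // S-List rule: keep only the fields this source has, under their system names.
        List<String> select = new ArrayList<>();
        for (String semantic : sList) {
            String systemName = semanticToSystem.get(semantic);
            if (systemName != null) {
                select.add(systemName);
            }
        }
        if (select.isEmpty()) {
            return null;
        }
        // C-List rule: keep a condition only if every argument exists in this source.
        List<String> where = new ArrayList<>();
        for (Condition c : cList) {
            if (semanticToSystem.keySet().containsAll(c.semanticArgs)) {
                Object[] systemArgs = c.semanticArgs.stream().map(semanticToSystem::get).toArray();
                where.add(String.format(c.template, systemArgs));
            }
        }
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(String.join(", ", select))
                .append(" FROM ").append(table);
        if (!where.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", where)); // AND is an assumption
        }
        return sql.toString();
    }
}
```

A condition that is dropped from one source's sub-query is not lost: it can still be sent to another source that holds all of its arguments.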
Sub-Query Generation
[Figure: the S-List, C-List and OrderBy-List are used to build the clauses of each sub-query: Sub-S-Clause, Sub-From-Clause, Sub-Where-Clause and Sub-Order By-Clause.]
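Inside JDBC Driver
[Figure: data flow inside the driver. The Query Translator turns the semantic query into Sub Query 1, Sub Query 2, …, Sub Query n and the selected fields (Sys_Name). Each sub-query is sent through its own JDBC driver (JDBC 1, JDBC 2, …, JDBC n) to DB1, DB2, …, DBn, and each returns ResultSet 1, ResultSet 2, …, ResultSet n with its ResultSetMetaData. The Integration step (Join / Union) combines them into a single ResultSet and ResultSetMetaData.]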
Integration Method
• JOIN
– Merge join by global keys
– Multi-value field when the data are inconsistent
• UNION
– Simply append one ResultSet to another
– Does not need to match keys
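Both integration methods can be sketched over in-memory rows. This is an illustration only, with several assumptions: the class is hypothetical, rows are maps that already use semantic column names, each source has at most one row per global key, and the join is done with a keyed map rather than a literal sort-merge. Following the multi-value idea, two differing non-null values for the same field are kept together as a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of result integration over rows keyed by semantic column name.
final class IntegrationSketch {

    /** UNION: append the rows of b after the rows of a; missing columns become null. */
    static List<Map<String, Object>> union(List<Map<String, Object>> a,
                                           List<Map<String, Object>> b,
                                           List<String> columns) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : a) out.add(project(row, columns));
        for (Map<String, Object> row : b) out.add(project(row, columns));
        return out;
    }

    /** JOIN: combine rows with the same global key; unmatched rows are kept with nulls. */
    static List<Map<String, Object>> join(List<Map<String, Object>> a,
                                          List<Map<String, Object>> b,
                                          String key, List<String> columns) {
        Map<Object, Map<String, Object>> byKey = new LinkedHashMap<>();
        for (Map<String, Object> row : a) {
            byKey.put(row.get(key), project(row, columns));
        }
        for (Map<String, Object> row : b) {
            Map<String, Object> merged = byKey.get(row.get(key));
            if (merged == null) {                 // key exists only in b
                byKey.put(row.get(key), project(row, columns));
                continue;
            }
            for (String col : columns) {
                Object left = merged.get(col);
                Object right = row.get(col);
                if (left == null) {
                    merged.put(col, right);       // fill the gap from b
                } else if (right != null && !left.equals(right)) {
                    merged.put(col, Arrays.asList(left, right)); // inconsistent: multi-value
                }
            }
        }
        return new ArrayList<>(byKey.values());
    }

    /** Copies a row into the full column layout, with null for absent columns. */
    private static Map<String, Object> project(Map<String, Object> row, List<String> columns) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String col : columns) out.put(col, row.get(col));
        return out;
    }
}
```

The join keeps keys in first-seen order (rows of the first source, then new keys from the second), which is a choice of this sketch and not something the slides specify.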
A Simple Example
• Semantic query (for two data sources)
SELECT F1, F2, F3, F4, F5
WHERE C1 AND C2
• Sub-query for DB1
SELECT f1a, f3a, f5a FROM tableA WHERE C1
• Sub-query for DB2
SELECT f1b, f2b, f4b FROM tableB WHERE C2
Example Result
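• Join result
Assume KEY_A and KEY_B are the two system names of the global KEY in the two data sources.
– DB 1: columns KEY_A, f1a, f3a, f5a; rows with keys 1, 2, 3
– DB 2: columns KEY_B, f1b, f2b, f4b; rows with keys 2, 3, 4
– Result: columns KEY, F1, F2, F3, F4, F5; rows with keys 1, 2, 3, 4
– Key 1 exists only in DB 1, so F2 and F4 are NULL
– Key 4 exists only in DB 2, so F3 and F5 are NULL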
Example Result
• Union
– DB 1: columns f1a, f3a, f5a
– DB 2: columns f1b, f2b, f4b
– Result: columns F1, F2, F3, F4, F5
– Rows from DB 1: F2 and F4 are NULL
– Rows from DB 2: F3 and F5 are NULL
Future Work
• Operations across data sources
• Complete algorithms for result integration
• Automated updates on heterogeneous data sources
• Implement GROUP BY and FROM in semantic queries