Distributed Database Management Systems
Reading
Textbook: Ch. 4
Farkas, CSCE 824 - Spring 2011
Design Issues
- Placing of data and programs (DBMS and application)
- Network issues
Level of Sharing
- No sharing
- Data sharing
- Data and program sharing
Heterogeneous environment!
Top-Down Design
- Global conceptual schema distribution
  - Fragmentation
  - Replication
  - Allocation
- Figure 3.2
Correctness of Fragmentation
1. Completeness: given the fragmentation $F_R = \{R_1, \ldots, R_n\}$ of $R$, every data item of $R$ appears in some $R_i$
2. Reconstruction: $R = \nabla R_i, \ \forall R_i \in F_R$ (where $\nabla$ is union for horizontal fragments and join for vertical fragments)
3. Disjointness:
  - Horizontal: there does not exist $d_j \in R_i$ such that $d_j \in R_k$, where $k \neq i$
  - Vertical: same as horizontal for non-primary key attributes
1&2: Lossless-join (normalization)
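The three criteria can be checked mechanically for a horizontal fragmentation. The following is a minimal sketch, assuming relations are modeled as Python sets of tuples and reconstruction is done by union; the function name `check_horizontal_fragmentation` and the sample EMP data are hypothetical and not from the textbook.

```python
from itertools import combinations

def check_horizontal_fragmentation(relation, fragments):
    """Return (complete, reconstructible, disjoint) for a horizontal fragmentation.

    relation  -- set of tuples (the original relation R)
    fragments -- list of sets of tuples (F_R = {R_1, ..., R_n})
    """
    union = set().union(*fragments)
    # Completeness: every tuple of R appears in some fragment.
    complete = relation <= union
    # Reconstruction: the union of the fragments gives back exactly R.
    reconstructible = union == relation
    # Disjointness: no tuple appears in two different fragments.
    disjoint = all(a.isdisjoint(b) for a, b in combinations(fragments, 2))
    return complete, reconstructible, disjoint

# Hypothetical relation EMP(id, city), fragmented by city.
emp = {(1, "Columbia"), (2, "Charleston"), (3, "Columbia")}
good = [{t for t in emp if t[1] == "Columbia"},
        {t for t in emp if t[1] == "Charleston"}]
# This fragmentation loses tuple 3 and stores tuple 1 twice.
bad = [{(1, "Columbia")}, {(1, "Columbia"), (2, "Charleston")}]

print(check_horizontal_fragmentation(emp, good))  # (True, True, True)
print(check_horizontal_fragmentation(emp, bad))   # (False, False, False)
```

For vertical fragments the same idea would apply with a join on the primary key in place of the union, and with disjointness tested on non-key attributes only.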
Data Directory
- Global vs. local conceptual schemas
  - How to search?
  - Where to store?
  - Single vs. multiple copies?
Current Research
- Allocation: new requirements, technology, etc.
- Where to store the fragments?
- Dynamic environment
  - Usage pattern
  - Application characteristics
  - Network changes
  - Security
Bottom-Up Approach
- Multi-database systems
- How to integrate them into 1 database?
  - Interoperability
Database Integration
- Physical integration
  - Materialized database: data warehouses
  - Extract-transform-load (ETL) tools
- Logical integration
  - Virtual (not materialized) integration
  - Enterprise Information Integration
Data Warehouses
- On-line Analytical Processing (OLAP) applications:
  - Decision support systems
  - Trend analysis and forecasting
- Complex queries, large databases
- Materialized view maintenance
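Materialized view maintenance is the problem of keeping a stored (warehouse) view consistent with changes at the sources without recomputing it from scratch. The sketch below assumes a single SUM-per-region view that is updated from insert and delete deltas; the class `SumView`, the helper `recompute`, and the sales data are hypothetical illustrations.

```python
from collections import defaultdict

class SumView:
    """Materialized view: total amount per region, maintained incrementally."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.counts = defaultdict(int)  # rows per region, to drop empty groups

    def apply_insert(self, region, amount):
        self.totals[region] += amount
        self.counts[region] += 1

    def apply_delete(self, region, amount):
        # Assumes the deleted row was previously inserted.
        self.totals[region] -= amount
        self.counts[region] -= 1
        if self.counts[region] == 0:
            del self.totals[region]
            del self.counts[region]

    def rows(self):
        return dict(self.totals)

def recompute(sales):
    """Full recomputation of the same view, for comparison."""
    totals = defaultdict(int)
    for region, amount in sales:
        totals[region] += amount
    return dict(totals)

view = SumView()
for region, amount in [("east", 100), ("east", 50), ("west", 70)]:
    view.apply_insert(region, amount)
view.apply_delete("west", 70)

print(view.rows())  # {'east': 150}
```

The incremental result matches what `recompute` would return on the remaining rows, but each update touches only one group instead of rescanning the source data.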
Logical Integration
- No materialized global database
- Virtual integration: data remains at the local (operational) databases
- Global conceptual schema may not contain everything from local schemas
- Autonomous and heterogeneous local systems
Bottom-Up Design
- Global Conceptual Schema (GCS or mediated schema)
  - Defined first: local conceptual schemas (LCSs) are mapped to the GCS
  - Defined during the integration of the LCSs, together with the corresponding mappings from the LCSs to the GCS
GCS Defined First
- Local-as-view (LAV) systems
  - Each LCS is treated as a view over the GCS
  - Query results: constrained to the objects in the local DBs, while the GCS definition may be richer
  - Potentially incomplete answers
- Global-as-view (GAV) systems: the GCS is defined as a set of views over the LCSs
  - The view definition specifies how to derive elements of the GCS
  - Query results: constrained to the GCS, while the local DBs might be richer
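The global-as-view idea can be sketched in a few lines: the global relation is simply a query over the local sources, and a global query is answered by evaluating (unfolding) that view. This is an illustrative sketch only; the local tables `local_hr` and `local_sales`, the global relation Person(name, city), and the function names are all hypothetical.

```python
# Hypothetical local databases (LCSs) with different schemas.
local_hr = [{"emp_name": "Ann", "office": "Columbia"}]
local_sales = [{"rep": "Bob", "territory": "Charleston", "quota": 10}]

def gcs_person():
    """GAV view: the global relation Person(name, city) derived from the local DBs."""
    rows = [{"name": r["emp_name"], "city": r["office"]} for r in local_hr]
    rows += [{"name": r["rep"], "city": r["territory"]} for r in local_sales]
    return rows

def query_city(city):
    """A global query is answered by unfolding the view over the sources."""
    return [p["name"] for p in gcs_person() if p["city"] == city]

print(query_city("Charleston"))  # ['Bob']
```

Note that `quota` exists locally but is not visible through the GCS, which is the sense in which the local DBs might be richer. In a LAV system the direction is reversed: each source would be described as a view over Person, and answering a query requires rewriting it in terms of those source descriptions.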
Design Tasks
- Schema translation
- Schema generation
- Figure 4.3
Intermediate Canonical Representation
- Expressive enough to incorporate all concepts in the local databases
- Simple, intuitive, practical, etc.
- Examples: E/R model, relational model, graph/tree models, etc.
- Tools
Schema Generation
- Schema matching: syntax and semantics
- Integration of common schema elements
- Schema mapping
- See Examples 4.1 and 4.2
Schema Matching
- Defined or discovered (e.g., web data)
- Rules:
  - Correspondence between 2 elements
  - Predicate stating whether the correspondence holds or not
  - Similarity value between the 2 elements
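A matching rule of this form can be sketched as a triple (element 1, element 2, similarity), kept only when a predicate on the similarity holds. The sketch below is one simple, purely syntactic possibility: it assumes name-based matching using the standard library's `difflib.SequenceMatcher` and a hypothetical threshold of 0.8; the function names and attribute names are made up for illustration.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase and drop non-alphanumeric characters (e.g. emp_name -> empname)."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def similarity(a, b):
    """Syntactic similarity of two element names, in [0, 1]."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def match_schemas(schema1, schema2, threshold=0.8):
    """Return correspondences (e1, e2, sim) whose similarity meets the threshold."""
    matches = []
    for e1 in schema1:
        for e2 in schema2:
            sim = similarity(e1, e2)
            if sim >= threshold:  # predicate: does the correspondence hold?
                matches.append((e1, e2, sim))
    return matches

print(match_schemas(["emp_name", "salary"], ["EmpName", "city"]))
# [('emp_name', 'EmpName', 1.0)]
```

A name-based matcher like this cannot detect synonyms or homonyms; it only illustrates the shape of a rule, not a realistic matching algorithm.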
Finding Correspondence
- Difficult process due to schema heterogeneity
- Can it be automated?
  - Insufficient schema and instance information
  - Unavailability of schema documentation
  - Subjectivity of matching
Matching Algorithm Issues
- Schema vs. instance matching
  - Concept match
  - Data instance: semantic inconsistencies
- Element-level vs. structure-level mapping
  - Element name semantics
  - Multiple attribute mapping?
- Matching cardinality
  - One-to-one, one-to-many, many-to-many
Semantic Schema Heterogeneity
- Semantic: meaning, interpretation, and intended use of data
  - Synonyms, homonyms, hypernyms
  - Different ontologies
  - Imprecise wording
Structural Schema Heterogeneity
- Type conflict: attribute vs. entity
- Dependency conflict: mapping cardinality inconsistencies
- Key conflict: different primary keys
- Behavioral conflict: modeling assumptions, e.g., referential integrity, deletion, etc.
Schema Integration
- Binary
- N-ary
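The difference between the two strategies is the order in which schemas are combined. As a rough sketch, assume each schema is a dictionary from relation name to a set of attribute names and that matching has already reconciled the names; under those assumptions, binary integration folds schemas in pairwise, while a one-shot n-ary integration handles all of them in a single pass. The function names and sample schemas are hypothetical.

```python
from functools import reduce

def merge_pair(s1, s2):
    """Integrate two schemas: union of relations, union of attributes per relation."""
    merged = {rel: set(attrs) for rel, attrs in s1.items()}
    for rel, attrs in s2.items():
        merged.setdefault(rel, set()).update(attrs)
    return merged

def integrate_binary(schemas):
    """Binary strategy: fold the schemas in one at a time."""
    return reduce(merge_pair, schemas, {})

def integrate_nary(schemas):
    """One-shot n-ary strategy: consider all schemas in a single pass."""
    merged = {}
    for schema in schemas:
        for rel, attrs in schema.items():
            merged.setdefault(rel, set()).update(attrs)
    return merged

schemas = [
    {"Person": {"name", "city"}},
    {"Person": {"name", "phone"}, "Project": {"title"}},
    {"Project": {"title", "budget"}},
]
print(integrate_binary(schemas) == integrate_nary(schemas))  # True
```

In this toy setting both strategies give the same result; in practice they differ in how conflicts are resolved and how much work each integration step requires.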
Schema Mapping
- How the data from local databases can be mapped to the GCS
- Mapping creation
- Mapping maintenance
Mapping Creation
- Input: LCS, GCS, M (schema matches)
- Output: $Q = \{Q_1, \ldots, Q_k\}$ such that
  - $DB_{GCS} = Q(DB_{LCS})$
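To make the input/output contract concrete, the sketch below builds one query per GCS relation from a list of attribute-level matches and then applies the queries to the local data. It assumes the matches are simple attribute renamings given as hypothetical 4-tuples (local relation, local attribute, GCS relation, GCS attribute); the function names and sample data are likewise assumptions, and real mapping creation must also handle joins and value transformations.

```python
from collections import defaultdict

def create_mappings(matches):
    """Build one query Q_i per GCS relation from attribute-level matches.

    matches -- list of (local_relation, local_attr, gcs_relation, gcs_attr)
    Returns {gcs_relation: query}, where query(db_lcs) -> list of GCS rows.
    """
    by_target = defaultdict(lambda: defaultdict(dict))
    for local_rel, local_attr, gcs_rel, gcs_attr in matches:
        by_target[gcs_rel][local_rel][local_attr] = gcs_attr

    def make_query(sources):
        def query(db_lcs):
            rows = []
            for local_rel, renames in sources.items():
                for row in db_lcs.get(local_rel, []):
                    # Keep only matched attributes, renamed to GCS names.
                    rows.append({g: row[l] for l, g in renames.items()})
            return rows
        return query

    return {gcs_rel: make_query(src) for gcs_rel, src in by_target.items()}

def apply_mappings(queries, db_lcs):
    """Compute DB_GCS = Q(DB_LCS)."""
    return {gcs_rel: q(db_lcs) for gcs_rel, q in queries.items()}

matches = [("EMP", "ename", "Person", "name"), ("EMP", "office", "Person", "city"),
           ("REP", "rep", "Person", "name"), ("REP", "territory", "Person", "city")]
db_lcs = {"EMP": [{"ename": "Ann", "office": "Columbia", "salary": 1}],
          "REP": [{"rep": "Bob", "territory": "Charleston"}]}

db_gcs = apply_mappings(create_mappings(matches), db_lcs)
print(db_gcs)
# {'Person': [{'name': 'Ann', 'city': 'Columbia'}, {'name': 'Bob', 'city': 'Charleston'}]}
```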
Security Objectives
- Confidentiality
- Integrity
- Availability
Question 1
- How do distributed databases impact the security objectives?
  - Confidentiality in traditional vs. distributed DBs
  - Integrity in traditional vs. distributed DBs
  - Availability in traditional vs. distributed DBs
Integrity
- Correctness criteria
  - Top-down design
  - Bottom-up design
Availability
- What are the issues related to availability when dealing with
  - Top-down design
  - Bottom-up design
Confidentiality
- (will be covered in 2nd part of semester but…)
- Centralized vs. distributed security policy
  - Top-down design
  - Bottom-up design
Next Class
- Semantics-based Database Integration