27
IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida [email protected] Distributed Databases (DDB) Introduction to DDB Three DDB Exercises General Electric J.C. Penney Stores – UPS Integrating Autonomous Systems Replication Replication Exercise Implementing Replication

IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida [email protected] Distributed Databases (DDB) Introduction

Embed Size (px)

Citation preview

Page 1: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

1Dr. Lawrence West, Management Dept., University of Central [email protected]

Distributed Databases (DDB)

• Introduction to DDB

• Three DDB Exercises

– General Electric

– J.C. Penney Stores

– UPS

• Integrating Autonomous Systems

• Replication

• Replication Exercise

• Implementing Replication

Page 2: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

2Dr. Lawrence West, Management Dept., University of Central [email protected]

Distributed Databases—Introduction

• What is a Distributed Database (DDB)?

• How many ways can a database be distributed?(And for how many reasons?)

Page 3: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

3Dr. Lawrence West, Management Dept., University of Central [email protected]

Distributed Databases—Introduction (cont.)

• A DDB runs on more than one computer (duhhhh)

– All or part of the data is distributed

– Servers may have unique data or copies of data also on other servers

– Some data may be on clients (not servers) for strictly local use

– Servers may be right next to each other (same room or even rack) or around the world

– DDB may share a strong schema (design) that applies to all instances or may have a logical schema that integrates disparate autonomous DB

Page 4: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

4Dr. Lawrence West, Management Dept., University of Central [email protected]

Distributed Databases—Introduction (cont.)

• Reasons for distributing data

– Workload distribution for scalability

– Getting data closer to users in a distributed organization

– Autonomous operating elements of a diverse organization have different data needs

• Reporting needs to central HQ

• Coordination needs

Page 5: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

5Dr. Lawrence West, Management Dept., University of Central [email protected]

Distributed Databases—Introduction (cont.)

• Three main types/reasons of/for distributing data

1. Diverse distributed organizational divisions have unique data needs with some requirements for centralized coordination and/or reporting

2. Homogeneous distributed organization requires geographically available data but strong needs for centralized management and control

3. Homogeneous organization requires workload distribution to support scalability

• Any combination of these circumstances can exist

Page 6: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

6Dr. Lawrence West, Management Dept., University of Central [email protected]

Exercises

• Three Distributed System Design Exercises

– General Electric

– J.C. Penney

– UPS

• Consider

– Corporate control

– Transaction processing

– Reporting

– Data integrity

– Data warehousing

Page 7: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

7Dr. Lawrence West, Management Dept., University of Central [email protected]

Exercise #1—The General Electric Case

• Consider the GE product line: Light bulbs, large and small consumer appliances, power generation equipment, jet engines, financial services, railroad locomotives, healthcare equipment http://www.ge.com/products_services/index.html

• Assume that each division has the databases and systems needed for its internal operations

1. What are examples of data needs for GE HQ and what technical capabilities must exist to satisfy the needs?

2. Are there any needs for cross-division data communication? If so, identify technical needs.

Page 8: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

8Dr. Lawrence West, Management Dept., University of Central [email protected]

Exercise #2—The J.C. Penney Case

• Consider the operations of the J.C. Penney retail department stores only. More than 1,000 stores throughout US & Puerto Rico.

1. What are examples of data needs for JCP HQ and what technical capabilities must exist to satisfy the needs?

2. Are there unique local needs?

Page 9: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

9Dr. Lawrence West, Management Dept., University of Central [email protected]

Exercise #3—The UPS Case

• Consider UPS customer and package tracking systems characterized by millions of daily transactions, package tracking as a competitive tool, worldwide locations, and disconnected (from the network) transactions at the driver delivery transaction.

1. What transactions (if any) are local?

2. What special data sharing must take place?

3. What is the implication of the disconnected drivers?

Page 10: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

10Dr. Lawrence West, Management Dept., University of Central [email protected]

Integrating Autonomous Systems

• Consider the GE example with additional conditions

– Diverse nodes have different DBMS software

– Similar tables in different nodes have different structures

• Different column structures

• Different data types for same/similar columns

– Different data integrity requirements

• Will most cross-division DB events be:

– Queries/retrievals?

– Inserts/updates?

Page 11: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

11Dr. Lawrence West, Management Dept., University of Central [email protected]

Integrating Autonomous Systems (cont.)

• Data Warehousing—the simple solution

• When corporate needs can be satisfied with a DW solution we can use existing tools

– Ideally suited for disparate data sources

– Transforming disparate data

• Decisions include

– Frequency of updates

– Granularity

– Dimension tables applicable to unique division data

Page 12: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

12Dr. Lawrence West, Management Dept., University of Central [email protected]

Integrating Autonomous Systems (cont.)

• Querying (and Updating)—the Not-So-Simple Solution

• The integration solution includes

– Local DB Management Component (LDBMC)

• Manages the local system

– Global Data Dictionary (GDD)

• An integrated logical data model for the entire DDB with location and structure information

– Distributed DB Management Component (DDBMS)

• Provides the interface to the global system for local users

Page 13: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

13Dr. Lawrence West, Management Dept., University of Central [email protected]

Integrating Autonomous Systems (cont.)

• Distributed DB Management Component (DDBMS)

– Provides logical DB schema interface or data dictionary

• A one-source view of the entire DDB

• Provides location transparency

– Query Processor (more to come)

– Network-wide concurrency control

• More applicable to distributed homogeneous systems with integrated operations

– Query translation for heterogeneous systems (more)

Page 14: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

14Dr. Lawrence West, Management Dept., University of Central [email protected]

Integrating Autonomous Systems (cont.)

• DDBMS Query Processor

– Determines location of data needed to process query

– For one-source queries: Sends query to appropriate local or remote location and returns results

– For multi-source queries:

• Parses query into sub-queries against individual locations

• Receives individual results

• Assembles a result set from the pieces

• May include translation of column elements in heterogeneous systems

Page 15: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

15Dr. Lawrence West, Management Dept., University of Central [email protected]

Integrating Autonomous Systems (cont.)

• A wide variety of commercial products are available for managing distributed autonomous systems

• Think of the integration packages available in our Data Warehouse loading packages

– Difference is that translations act in near real time

– Product an integrated schema with query rules

Page 16: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

16Dr. Lawrence West, Management Dept., University of Central [email protected]

Replication

• Replication consists of

– A single physical source or master database

– Replicated data in distributed locations to support transactions, reporting, and control

– Tools for distributing data

– Tools for integrating changes to data

• Typically has (mostly) homogeneous data structures and DBMS at all locations

– Not all data at every location

Page 17: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

17Dr. Lawrence West, Management Dept., University of Central [email protected]

Replication (cont.)

• Design choices consist of

– What data is distributed to which location

• Filter by row

• Filter by column

– What update rights are available at each location

– How frequently data must be synchronized

– How data is to be synchronized (what replication model is used)

– Data integrity rules at different levels

– Handling data integrity violations

Page 18: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

18Dr. Lawrence West, Management Dept., University of Central [email protected]

Replication—The Publisher/Subscriber Model

• Replication is spoken of using print publication terms

– Publisher—the system node that produces data for consumption by other nodes

– Subscriber—the node that receives data

– Article—a database object (table, view, stored procedure, etc.) that is to be published

– Publication—a collection of articles sent and received as one logical unit

• Nodes may be both publishers and subscribers

Page 19: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

19Dr. Lawrence West, Management Dept., University of Central [email protected]

Snapshot Replication

• Snapshot Replication is what the name implies—a complete copy of data sent to a subscriber

• Often used as an initial copy of data forming the basis for other kinds of replication

• May be read-only data sent to a subscriber

– Product list with prices

– Authorized vendors

• Out-of-synch data may be acceptable for the period between replications

• Data may change infrequently

Page 20: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

20Dr. Lawrence West, Management Dept., University of Central [email protected]

Transactional Replication

• Data is changed at the publisher and propagated to subscribers as changes occur

– E.g., Units in stock at a warehouse

– Customer bank balances

• Data is usually treated as read-only at subscriber but need not be

• Decisions regarding frequency of update messages are required

– Balance workload and network traffic with latency

– Consider the costs of latency

Page 21: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

21Dr. Lawrence West, Management Dept., University of Central [email protected]

Merge Replication

• Typically implemented in a Client-Server environment

• Client (subscriber) makes changes to data that are published back to server (publisher)

• Multiple subscribers may be capable of changing the same data

• Data changes may be made offline

• Update frequency also balances workload and network traffic with latency

• Subscriber changes may be legal at the client but create data integrity violations when updated to the server

Page 22: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

22Dr. Lawrence West, Management Dept., University of Central [email protected]

Replication (cont.)

• All three types of replication may be implemented in same application

– Snapshot for base-level data with transactional replication for updates

– Snapshot for initial data, merge replication by a subscriber, transactional replication to publish changes to other subscribers

• Clients may easily have data not replicated to the Server

– Employee shift schedules

– Detailed transaction data—only summaries published

Page 23: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

23Dr. Lawrence West, Management Dept., University of Central [email protected]

Replication Exercise

• Consider the University Registration DB we used in data warehousing (next slide)

• Assume a state-wide multi-branch university system with a System Central Administration and campuses

• Assume homogeneous DBMS at each location

• Identify:

– Subscription model for each table

– Update rights on each table

– Update frequency for each table

– How much latency will be allowed in each table

Page 24: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

24Dr. Lawrence West, Management Dept., University of Central [email protected]

Replication Exercise

Student

StudentIDLname :Resident Y/N

Enrollment

SectionIDStudentIDGrade

Payment

PaymentIDStudentIDPaymentDatePaymentAmt

Section

SectionIDDeptCodeCourseNumSecNumTermYearDaysTimeInstructorIDCapacityBldgIDRoomNum

Instructor

EmployeeIDLastNameFirstNameDeptIDCurrentSalary

SalaryHistory

EmployeeIDStartDateEndDateSalaryStatus

Course

DeptCodeCourseNumCrHrTitleDescriptionLecHrLabHrDeptID

Department

DeprtIDNameChairIDDeptIDOfficeBldgOfficeRm

Textbook

ISBNTitleYear :WholsalePriceListPrice

SectionBook

ISBNSectionIDQtyOrderedQtySold

Room

BldgIDRoomNumTypeCapacity

Building

BldgIDNameFloorsSqFt

FeeSchedule

AcadYearInStateUGCrHrOutStateUGCrHrInStateGrCrHrOutStateGrCrHrInStateDCrHrOutStateDCrHrHealthFeeActivityFee

BrightFutureStat

StudentIDAcadYearPercentCovered

Page 25: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

25Dr. Lawrence West, Management Dept., University of Central [email protected]

Replication—Data Integrity

• Many data integrity rules in SQL Server have "Not for replication" options available

– Foreign key

– Identity attribute values

– Triggers

– Constraints

• If you do not publish Suppliers data to individual stores you would turn off referential integrity checking from Products to Suppliers

Page 26: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

26Dr. Lawrence West, Management Dept., University of Central [email protected]

Replication—Data Integrity (cont.)

• Publisher (server) may have to detect when a subscriber (client) makes a transaction that is legal to the client but illegal to the server

– Two sales representatives sell the same product that creates a backorder situation

– Two purchasers make purchase on same account that exceeds credit limit

• Violation potential must be identified and incorporated into replication strategy

Page 27: IMS 6217: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Distributed Databases (DDB) Introduction

IMS 6217: Distributed Databases

27Dr. Lawrence West, Management Dept., University of Central [email protected]

Implementing Replication

• SQL Server has a series of Replication Agents that automate replication

– Identify publishers and subscribers

– Build subscriptions, articles, and publications

– Produce snapshots

– Implement transaction replication according to rules

– Implement merge replication notifications

– Manage data integrity rules

• Similar tools available in other systems