IBM Software Group
Serge Bourbonnais
Database Replication, Silicon Valley Laboratory
WMO TECO-WIS Convention
Seoul, November 8th, 2006
Database Replication and Change Propagation Technologies for Continuous Availability
IBM Software Group | DB2 Information Management Software
Abstract - IBM Database Replication Technologies
• Database replication technologies allow an IT infrastructure to achieve continuous availability of the enterprise operations, by providing solutions for Disaster Recovery, Workload Isolation, and Information Integration.
• When used for Disaster Recovery, each replicated database can be fully active, and copies do not need to be identical. Some trade-offs include administrative costs, and the overhead of capturing and applying changes.
• For Workload Isolation, the replication process can manage conflicts that may arise from the application workload, database constraints, loading a target while changes are still occurring at the source, or changes arriving out of order in a multi-node configuration. Conflict resolution either relies on timestamp and the origin of each change, or on a designated master. Configurations for data distribution and consolidation to/from hundreds of databases can also be deployed.
• For Information Integration, the replication process deals with heterogeneous data schemas, data stores, or even data models.
IBM Database replication technologies can capture and propagate changes with low latency and high throughput over long distances, while preserving database transactional integrity and tolerating system outages or intermittent connectivity.
Agenda
• Database Replication Technologies for continuous availability
  - In support of the Global Enterprise: From Continuous Availability to Business Integration
  - Where Database Replication fits
  - Where replication does not fit
• IBM Product Architecture and Capabilities
  - Capture, Apply, Federation, and Transforms
  - Topologies, Conflict Detection and Resolution
• Sample Implementations
Why Replication in an Information System?
1. Disaster Recovery
   Goal: High-Availability
   Applications: Standby copy for failover, Scheduled and Unscheduled outages
   Requirements: Minimize recovery time and eliminate or reduce data loss. Preserve transactional consistency.
2. Workload Isolation
   Goal: High-Availability, Improve Performance
   Applications: Data Distribution/Consolidation, Regional Data Centers, Caches
   Requirements: Maintain live copies or subsets for working in disconnected mode, often geographically distributed. Need to detect and resolve conflicts, if any. Data mappings and transformations.
3. Information Integration
   Goal: High-Availability, Improve Performance, Global Enterprise View
   Applications: Analytics, Enterprise Business Integration
   Requirements: Moving data to/from heterogeneous data stores. Cleansing and transformations. Assembling objects with data from several sources.
[Diagram: arrow labeled from "Less Requirements" to "More Requirements" alongside the three scenarios]
Database Replication Application Space
From Disaster Recovery to Information Integration: Replication needs are cumulating - Semantics are increasing.

Propagated Objects: Database
  Technologies and Products: Log Shipping (HADR on LUW); Disk Mirroring (GDPS, PPRC)
  Requirements and Scenarios: Maintain a full database copy for Disaster Recovery

Propagated Objects: Relational tables
  Technologies and Products: Logical Replication (Data Propagator, Q Replication)
  Requirements and Scenarios: Maintain a logical database subset for Disaster Recovery, Workload Isolation

Propagated Objects: Business Objects, e.g., <order><oid>197</oid> <pid>AS207</pid> <desc>Wheel</desc> <qty>1</qty></order>
  Technologies and Products: Event Publishing (Q Event Publish, II Federation); Integration Software (DataStage)
  Requirements and Scenarios: Publish changes (with transformations) for Disaster Recovery, Workload Isolation, Data Integration
Application Space for Database Replication Technologies
Database Replication is a good fit when:
• Asynchronous Capture and Delivery
  - Outages: network, servers, a site, the RDBMS
  - Occasionally connected
• Non-identical sources and targets
  - Different platforms: OS, RDBMS, even data models
  - Different shapes: sub-setting required
  - Row-level transformations, codepages, schemas
• Update-Anywhere with possible conflicts
  - Only possible with replication
• When some data loss is tolerable in case of a major disaster
  - Often, the solution can be designed to limit loss to a few seconds
• Fast delivery over large distances (1000s of km)
  - Several 10,000s of rows/second achievable (up to 100,000 rows/second)
• Avoid or minimize full refresh of data at the target
• Other factors: Minimize down-time, administrative cost, application performance impact
Replication Technologies guarantee Transactional Consistency with Resilience
Limits of Database Replication technologies
• Zero data loss is required in case of disaster (fire, flood)
  - Use synchronous technologies instead, e.g., HADR, PPRC
• Set-level transformations are required on the data
  - Use ETL software instead
  - However, replication can be used to feed a staging area for ETL tools. Replication can hide the differences between the target and the source (database schema, data model, codepage, hardware architecture) and provide a continuous, asynchronous feed.
• Business Objects need to be assembled
  - Develop applications in the application layer
• Other factors
  - Cost-benefit analysis of the solutions, given the requirements
IBM SQL Replication
[Architecture diagram. Source server: DB2, where Capture reads the database recovery log, or non-DB2, where triggers capture changes; changes go to staging tables, with control tables alongside. Target servers: Apply writes to DB2 (z/OS, iSeries, UDB LUW) or, through DB2 Information Integrator, to non-DB2 databases. Non-DB2 platform labels in the diagram: Informix, Oracle, Sybase, SQL Server, Teradata (one list) and Informix, Oracle, Sybase, SQL Server (other list).]
• Staging is in relational tables
• Control and Monitoring information also in relational tables
• Transport is over a database connection
A parenthesis: Database Federation
Nicknames appear as local tables. For example:
> db2 list tables
Table/View                      Schema          Type
------------------------------- --------------- -----
T1                              BOURBON         T
ORAT3                           BOURBON         N
CUSTOMERS                       BOURBON         T
T1 is a Table; ORAT3 is a Nickname
Remote objects (structured files, tables, spreadsheets) appear to the application as if they were local tables in a DB2 database
Local and non-local data can be manipulated in the same SQL statement
CREATE NICKNAME ORAT3 FOR ORACLE9.SCOTT.T3
INSERT INTO ORAT3 VALUES(5)
SELECT * FROM ORAT3
IBM Q Replication
[Architecture diagram. Source server: Capture reads the database recovery log of a DB2 source and uses control tables, an admin queue, a restart queue, and send queues on a WebSphere Queue Manager (or client). Target servers: Apply programs with their own control tables write to DB2 or, through DB2 Information Integrator, to non-DB2 databases (Informix, Oracle, Sybase, SQL Server, Teradata). DB2 platform labels in the diagram: z/OS, UDB LUW (Apply) and z/OS, UDB LUW, VM/VSE.]
• Staging and Transport over MQSeries persistent message queues
• High throughput, low latency
• Apply with parallel agents
Performance
• Q Replication is 3 to 10 times faster than SQL Replication
  - Higher throughput and shorter latency
  - Capture measured throughput: 49,000 rows/second (V9.1)
  - Latency of less than 2 seconds achievable over 1000s of kilometers
• Measured time to clear the receive queues after an outage
  - 1,000,000 rows accumulated in the target receive queue
  - Continuous arrival rate: 5,000 rows per second
  - Time to re-sync the target database = 91 seconds
(1) Turbo Freeway (2064-216), 2 LPARs: 4 CPs for the source system and 4 CPs for the target system.
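The queue-draining measurement above implies an effective Apply rate. As a rough sketch, assuming constant arrival and apply rates (an assumption that the slide does not state), the backlog shrinks at the apply rate minus the arrival rate. The function names are illustrative only.

```python
def implied_apply_rate(backlog_rows, arrival_rate, drain_seconds):
    """Apply rate (rows/s) that drains the backlog in drain_seconds
    while new rows keep arriving at arrival_rate (rows/s)."""
    return backlog_rows / drain_seconds + arrival_rate


def drain_time(backlog_rows, arrival_rate, apply_rate):
    """Seconds needed to empty the backlog at constant rates."""
    if apply_rate <= arrival_rate:
        raise ValueError("apply rate must exceed arrival rate to catch up")
    return backlog_rows / (apply_rate - arrival_rate)


# Figures from the slide: 1,000,000 queued rows, 5,000 rows/s arriving, 91 s.
rate = implied_apply_rate(1_000_000, 5_000, 91)
print(f"implied apply rate: {rate:.0f} rows/s")
```

Under these assumptions the target applied roughly 16,000 rows per second while catching up, about 11,000 of which went toward reducing the backlog.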
Q Replication subscriptions - defining target copies
• Projection over columns and rows of a table:
  - Only changes for subscribed tables are sent
  - Some transactions can be ignored (e.g., by owner ID, trans ID, with signal or command)
  - Some operations can be ignored (e.g., delete)
  - Filter rows with a predicate (e.g., WHERE :LOCATION = 'EAST' AND :SALES > (SELECT SUM(expense) FROM STORES WHERE stores.deptno = :DEPTNO))
• Database schema mapping examples:
  - 1 column to N columns, e.g., [:C1 || :C2]
  - N columns to 1 column, e.g., [substr(:C2,2,3)]
  - Generated columns, e.g., [CURRENT TIMESTAMP]
  - Capture side: ORDERS (oid, price); Apply side: IBMORDERS (ibmID, price, ts)
• Replication handles codepage conversion and architecture differences
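The filtering and mapping ideas above can be sketched in a few lines. This is an illustration only, not the Q Replication implementation: the change-record layout, the `location` column, and all function names are assumptions made for the example. It maps ORDERS (oid, price) to IBMORDERS (ibmID, price, ts), keeps only rows matching a predicate, and ignores deletes.

```python
from datetime import datetime, timezone


def apply_subscription(change, predicate, column_map, ignored_ops=frozenset()):
    """Return the target row for one captured change, or None if filtered out."""
    if change["op"] in ignored_ops:
        return None  # e.g. deletes are not propagated
    row = change["row"]
    if not predicate(row):
        return None  # row-level filter, in the spirit of the WHERE predicate
    # Each target column is computed from the source row.
    return {target: expr(row) for target, expr in column_map.items()}


def east_only(row):
    return row["location"] == "EAST"


def utc_now():
    return datetime.now(timezone.utc)


column_map = {
    "ibmID": lambda r: r["oid"],   # renamed column
    "price": lambda r: r["price"],  # copied as is
    "ts": lambda r: utc_now(),      # generated column
}

changes = [
    {"op": "I", "row": {"oid": 197, "price": 10.0, "location": "EAST"}},
    {"op": "I", "row": {"oid": 198, "price": 20.0, "location": "WEST"}},
    {"op": "D", "row": {"oid": 197, "price": 10.0, "location": "EAST"}},
]
targets = [apply_subscription(c, east_only, column_map, ignored_ops={"D"}) for c in changes]
replicated = [t for t in targets if t is not None]
print(replicated)
```

Only the first change reaches the target: the second fails the predicate and the third is an ignored operation.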
Q Replication Subscription Types
[Diagram labels: Primary, Secondary/backup]
• Unidirectional
  - Changes are replicated in one direction
  - 1:N and N:1 topologies: Distribution and Consolidation
  - Changes can be filtered and transformed
• Bidirectional (master/slave)
  - Changes replicated in both directions
  - Conflicts detected on data values:
    • Conflict rules: Check key, changed only, or all columns
    • Conflict action: Force, ignore, merge change
  - One server designated as winner
  - Tree topologies only
  - Minimum overhead
• Peer to peer (no master, use timestamps)
  - Conflicts resolved by using most recent version, no master copy
  - Handles out of order arrivals (e.g., delete before insert)
  - Requires extra columns and triggers
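A minimal sketch of timestamp-based, most-recent-version-wins resolution follows. It is not IBM's implementation: the `Replica` class, the (timestamp, origin) version pair standing in for the "extra columns", and the use of tombstones for deletes are all assumptions made for illustration, and it presumes timestamps that are comparable across sites.

```python
class Replica:
    """Keeps, per key, the newest version seen; deletes leave a tombstone."""

    def __init__(self):
        self.rows = {}  # key -> ((timestamp, origin), value or None)

    def apply(self, key, op, value, ts, origin):
        """Apply a change unless a newer or equal version is already stored."""
        version = (ts, origin)  # origin breaks ties between equal timestamps
        current = self.rows.get(key)
        if current is not None and current[0] >= version:
            return False  # stale change: ignore it
        self.rows[key] = (version, None if op == "D" else value)
        return True

    def get(self, key):
        entry = self.rows.get(key)
        return None if entry is None else entry[1]


insert = ("k1", "I", {"qty": 1}, 1, "siteA")
delete = ("k1", "D", None, 2, "siteB")

a, b = Replica(), Replica()
a.apply(*insert)
a.apply(*delete)  # normal order
b.apply(*delete)  # delete arrives before the insert it follows
b.apply(*insert)  # older insert is ignored thanks to the tombstone
print(a.get("k1"), b.get("k1"))
```

Both replicas end with the row deleted, regardless of arrival order, because each one keeps only the version with the highest (timestamp, origin) pair.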
Data Distribution from a (CCD) staging area
[Diagram: a read/write Source Table is replicated by Q Replication (Q Capture, MQSeries, Q Apply) into a read-only CCD Table; several SQL Apply programs then distribute changes from the CCD Table over SQL connections to multiple Target Tables.]
Consistent Changed Data (CCD) Apply targets
Usages:
• AUDIT trail of database changes. Answer: Who changed what, when, and how?
• Staging table for data distribution (with SQL Apply)
• For updates, before values can be optionally present in the CCD (e.g., XPARTNO)
• Condensed CCD: Contains only the latest changed value of each row
• Complete CCD: Initially created with values for all rows from the source table.

PARTS_CCD
COMMITSEQ  AUTHID  OPERATION  LOGMARKER          XPARTNO  PARTNO  XPRICE  PRICE
1          USER_A  U          current timestamp  A7571    A7571   4.31    5.03
2          USER_B  I          current timestamp  null     A7981   null    121.03
3          USER_A  D          current timestamp  null     A7981   null    null
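To make the "condensed" idea concrete, the sketch below reduces the PARTS_CCD rows shown above to the latest change per key. The `condense` helper and the dictionary layout are hypothetical, LOGMARKER is omitted for brevity, and keeping a delete as the latest change of its row is a choice made for this example.

```python
def condense(ccd_rows, key="PARTNO"):
    """Keep only the most recent change (highest COMMITSEQ) for each key."""
    latest = {}
    for row in sorted(ccd_rows, key=lambda r: r["COMMITSEQ"]):
        latest[row[key]] = row  # later changes overwrite earlier ones
    return sorted(latest.values(), key=lambda r: r["COMMITSEQ"])


parts_ccd = [
    {"COMMITSEQ": 1, "AUTHID": "USER_A", "OPERATION": "U",
     "XPARTNO": "A7571", "PARTNO": "A7571", "XPRICE": 4.31, "PRICE": 5.03},
    {"COMMITSEQ": 2, "AUTHID": "USER_B", "OPERATION": "I",
     "XPARTNO": None, "PARTNO": "A7981", "XPRICE": None, "PRICE": 121.03},
    {"COMMITSEQ": 3, "AUTHID": "USER_A", "OPERATION": "D",
     "XPARTNO": None, "PARTNO": "A7981", "XPRICE": None, "PRICE": None},
]

condensed = condense(parts_ccd)
for row in condensed:
    print(row["COMMITSEQ"], row["OPERATION"], row["PARTNO"])
```

The insert of A7981 (COMMITSEQ 2) disappears from the condensed view because the later delete supersedes it, while the full list remains available for audit use.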
Event Publishing
Usage
• Building the Data Warehouse
• Business Integration
• Auditing requirements
Function
• Capture changed data in real time
• Correlate by transactions within a single database
• Output: XML or CSV
[Diagram: log-based capture from DB2 z/OS and LUW, VSAM, IMS, Software AG Adabas, and CA IDMS publishes changes to WebSphere MQ; consumers shown are WebSphere MQ Integrator Broker, DataStage, WebSphere Business Integration, JMS-aware applications, user applications, and target DBs.]
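As a rough sketch of what "correlate by transaction, output XML or CSV" could look like, the example below groups the changes of one transaction into a single message. The element names, attribute names, and CSV column order are hypothetical and are not the actual Event Publishing message format; the sample values reuse the order shown earlier in the deck.

```python
import csv
import io
import xml.etree.ElementTree as ET


def transaction_to_xml(txn):
    """Serialize all changes of one transaction into one XML message."""
    root = ET.Element("transaction", id=str(txn["id"]))
    for change in txn["changes"]:
        row = ET.SubElement(root, "row", table=change["table"], op=change["op"])
        for name, value in change["columns"].items():
            col = ET.SubElement(row, "column", name=name)
            col.text = str(value)
    return ET.tostring(root, encoding="unicode")


def transaction_to_csv(txn):
    """Serialize the same changes as one CSV line per changed row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for change in txn["changes"]:
        writer.writerow([txn["id"], change["table"], change["op"],
                         *change["columns"].values()])
    return buf.getvalue()


txn = {
    "id": 42,
    "changes": [
        {"table": "ORDERS", "op": "I", "columns": {"oid": 197, "pid": "AS207", "qty": 1}},
        {"table": "ORDERS", "op": "U", "columns": {"oid": 197, "pid": "AS207", "qty": 2}},
    ],
}
xml_message = transaction_to_xml(txn)
csv_message = transaction_to_csv(txn)
print(xml_message)
print(csv_message)
```

Keeping every change of a transaction in one message lets a consumer apply or discard the transaction as a unit.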
Mazda
Challenge
• Support 700 dealers in USA
• Trouble matching customer demand with available inventory
• More current data needed to track sales achievements with period-end goals
Solution
• Sales and inventory information is replicated every minute to portal server
• Improved access to current data without changes to existing IT infrastructure
Business benefits
• Increased auto sales
• Improved dealer satisfaction
• Currency of information improved by 93%
Technology benefits
• Re-used existing application and database infrastructure
• Decreased network load compared to full data refreshes 4 times an hour
• Ease and speed of deployment
“Within 5 weeks of receiving the [WebSphere] Information Integrator product we were able to implement it in our … environments. It now provides us up to the minute sales activity.”
Joe Neria, Software Consultant. Mazda
International provider of financial & investment services
Challenge
• Corporate initiative to provide customers better performing real-time queries by utilizing multiple sites.
• Replication of critical order processing details for core business functionality
Solution
• Q Replication for high speed movement of up to 10 million transactions to a secondary site several thousand miles away. The current implementation is unidirectional, with plans for peer-to-peer.
Business benefits
• Replicating 5-10 million transactions with less than 2 seconds latency.
• More efficient and cost-effective resource utilization
• Secondary platform services reporting and business intelligence queries and acts as backup to primary
Technology benefits
• Real-time backup on the secondary system results in increased capacity for peak workloads.
CitiStreet
Overview
• CitiStreet is one of the largest and most experienced global benefits providers, servicing over 9 million plan participants across all markets. CitiStreet was formed in partnership between subsidiaries of State Street Corporation and Citigroup.
Challenge
• Support single sign-on access through both Web and IVR applications, ensuring 24x7 portal access for plan participants and sponsors
Solution
• Support redundant, active single sign-on applications for failover processing, replicating profile changes between them in real time.
Business benefits
• Ensure application availability for plan participants and sponsors
• The new solutions from IBM will improve data integrity with a reduced level of maintenance
Technology benefits
• Maintain bi-directional synchronization of profile updates (approx. 175,000 updates daily) in real time
“Since nearly 10 million of CitiStreet customers are offered 24-hour access to their retirement accounts, the company can't afford downtime and must be able to replicate data changes when they happen. We fully replicate our database over redundancy data lines, so to us the stability and speed of that asynchronous replication is strategic for us.”
Barry Strasnick, CIO, CitiStreet
Summary
IBM develops Data Propagation technologies to provide Continuous Availability and achieve a Global Integrated view of the enterprise in a heterogeneous environment
Q Replication (IBM WebSphere Replication Server) delivers low latency, high throughput, and resilience. It is best-of-breed for heavy OLTP workloads, providing resilience and preserving transactional integrity throughout outages while minimizing the need for full data refreshes.