Enterprise Replication WAIUG Forum 2002


Page 1: Enterprise Replication WAIUG Forum 2002

Enterprise Replication
WAIUG Forum 2002

Page 2: Enterprise Replication WAIUG Forum 2002

Agenda

• What is Enterprise Replication
• How is Enterprise Replication different from HDR
• Internal Overview of ER
• Recent Improvements in Enterprise Replication (9.3/9.4)
• Troubleshooting Enterprise Replication

Page 3: Enterprise Replication WAIUG Forum 2002

IDS Enterprise Replication (ER)

• Log based, transaction oriented replication
• Asynchronous, homogeneous (IDS 7.22+ only)
• Primary/Target + Update anywhere
• Consolidation, Dissemination, Workload partitioning
• Tightly coupled with the server
• Web and command line administration

Page 4: Enterprise Replication WAIUG Forum 2002

ER History

• Initial Release: 7.22 in 12/1996
  • Version I - 7.22 - 7.30 releases
  • 7.30 Grouper Compression improvements
• Version II (7.31 & 9.2x)
  • Queue and NIF redesign, Hierarchical Routing
• Version III (9.3)
  • UDT support, Smart Blob Support, Dynamic DataSync parallelism, Replicate sets, Smart blob queuing, In-place alter to add/drop CRCOLS, Serial Col Primary Key Support, …
• Version III+ (9.4)
  • ER/HDR support, Large transaction support, Quick Queue Recovery, Complex Type support, Performance enhancements

Page 5: Enterprise Replication WAIUG Forum 2002

How do HDR and ER differ?

HDR                                          | ER
---------------------------------------------+------------------------------------------
Provides single primary and single secondary | Allows configurable source(s)/target(s)
Primary and secondary must run the same      | Source/target do not have to be the same
  executables and have similar disk layout   |
Secondary restricted to report processing    | Allows full usage of both source/target
Simple to set up and administer              | Setup and administration more complex
Primary and secondary are mirror images      | Source and target can be totally different
Does not support blobspace blobs             | Supports blobspace blobs
Primary purpose is for high availability     | Primary purpose is for data distribution
Replication can be synchronous               | Replication is asynchronous

Page 6: Enterprise Replication WAIUG Forum 2002

ER – how it works

[Diagram: transaction flow from the Source server to the Target server]

Source Database -> Logical Log -> Snoopy -> Grouper -> Send Queue -> NIF -> Receive Queue -> Data Sync -> Target Database

• Grouper: regroups transactions and performs evaluation
• NIF: transmits Txn to targets
• Data Sync: target apply threads
• Also shown: Spool, Ack Queue, Global Catalog (syscdr)

Page 7: Enterprise Replication WAIUG Forum 2002

ER and onstats

[Diagram: the same Source/Target flow as the previous slide, with each component labeled by the onstat option that reports on it]

• onstat -g cat: Global Catalog (syscdr)
• onstat -g ddr: Snoopy (logical log)
• onstat -g grp: Grouper
• onstat -g rqm: Send, Receive, and Ack queues
• onstat -g nif: NIF
• onstat -g rcv: Receive
• onstat -g dss: Data Sync

Page 8: Enterprise Replication WAIUG Forum 2002

Regrouping Transactions

[Diagram: ddr_snoopy passes log records (Insert, Update, Delete, Commit) to the Open Transaction Array, which holds one TX header per open transaction (10, 11, and 12 in the example). On "Commit 11" the transaction is handed on; the CDRGeval and CDRGfan threads are shown alongside.]

Global Transaction List (onstat -g grp L)
• Ordered by begin work
• Used to control how replay position is advanced
• Transactions remain on the global list until they are ACKed from the target(s) or placed in stable queue

Serial List (onstat -g grp S)
• Ordered by commit point
• Used to control order that replicated transactions are shipped to the target
• Transactions remain on the serial list until they are placed into the queue
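The relationship between the two lists can be illustrated with a toy model. This is only a sketch under stated assumptions, not the grouper's real data structures: the `Grouper` class, its method names, and the use of plain integers as log positions are all hypothetical.

```python
from collections import OrderedDict


class Grouper:
    """Toy model of the two grouper lists (all names here are hypothetical)."""

    def __init__(self):
        # Global list: every transaction not yet ACKed, in begin-work order.
        self.global_list = OrderedDict()
        # Serial list: committed transactions waiting to be queued, in commit order.
        self.serial_list = []

    def begin(self, txid, log_pos):
        self.global_list[txid] = {"begin_pos": log_pos, "ops": []}

    def log_record(self, txid, op):
        self.global_list[txid]["ops"].append(op)

    def commit(self, txid):
        self.serial_list.append(txid)

    def queue_next(self):
        """Move the oldest committed transaction into the send queue."""
        return self.serial_list.pop(0)

    def ack(self, txid):
        """The target ACKed the transaction, so it leaves the global list."""
        del self.global_list[txid]

    def replay_position(self):
        """Oldest begin-work position still on the global list, or None."""
        for tx in self.global_list.values():
            return tx["begin_pos"]
        return None
```

In this model, transactions 10, 11, and 12 begin in that order; if 11 commits before 10, it is shipped first, yet the replay position cannot move past the begin point of 10 until 10 leaves the global list.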

Page 9: Enterprise Replication WAIUG Forum 2002

What is Conflict Resolution?

[Diagram: three servers (A, B, C). Update-1 and Update-2 are propagated among them; on one server Update-1 is still in the queue when Update-2 arrives (marked "?").]

Required for update anywhere

Page 10: Enterprise Replication WAIUG Forum 2002

Conflict Resolution

• Method to determine if the current version or a just received version of the row should ‘win’
  • Ignore
    • Row must be applied as is
  • Timestamp
    • Most recent update wins
  • Stored Procedure
    • User written stored procedure is invoked
  • Upserts
• Requires CRCOLS (shadow columns)
  • CDRTIME, CDRSERVER
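The timestamp rule can be sketched in a few lines. This is an illustration only, not the server's implementation: `RowVersion` and `timestamp_wins` are hypothetical names, the shadow-column values are modeled as plain integers, and keeping the current row on a tie is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RowVersion:
    data: dict
    cdrtime: int    # shadow column: time of the last update
    cdrserver: int  # shadow column: server where that update originated


def timestamp_wins(current, incoming):
    """Return the version to keep under the timestamp rule (most recent wins)."""
    if current is None:
        return incoming  # no local row: apply the replicated row
    if incoming.cdrtime > current.cdrtime:
        return incoming
    return current  # assumption: the current row is kept on a tie
```

The result does not depend on arrival order: whichever version carries the later `cdrtime` is kept on every server, which is what lets update-anywhere servers converge.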

Page 11: Enterprise Replication WAIUG Forum 2002

How do deletes affect conflict resolution?

[Diagram: three servers (A, B, C). Update-1 is still in the queue for one server when the Delete of the same row arrives (marked "?"); a Delete Table is shown beside that server.]

• The row has already been deleted, so how do I prevent the row from being reapplied?
• Place the deleted row into the shadow delete table.
• When the row arrives at the target server, check in the delete table to see if the row has been deleted.
• Rows are pruned from delete tables once they are no longer needed.
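The delete-table check described above might look like the following. It is a sketch under assumptions: the function names are hypothetical, tables are modeled as dictionaries, and comparing the delete time with the update time assumes timestamp-style conflict resolution. Pruning is not modeled.

```python
def apply_delete(table, delete_table, key, cdrtime):
    """Remove the row and remember the delete in the shadow delete table."""
    table.pop(key, None)
    delete_table[key] = cdrtime


def apply_update(table, delete_table, key, row, cdrtime):
    """Apply a replicated update unless a later delete of the same key is recorded.

    `table` maps key -> (row, cdrtime); `delete_table` maps key -> delete time.
    Returns True if the row was applied.
    """
    deleted_at = delete_table.get(key)
    if deleted_at is not None and deleted_at >= cdrtime:
        return False  # the row was deleted after this update was made
    table[key] = (row, cdrtime)
    return True
```

A late-arriving update with an older timestamp is discarded instead of resurrecting the deleted row.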

Page 12: Enterprise Replication WAIUG Forum 2002


So what’s new?

Page 13: Enterprise Replication WAIUG Forum 2002

Improvements in 9.3 ER

• Improved performance
  • Increased parallelism
  • Spooling enhancements
  • Eliminates most of the reasons for DDRBLOCK state
• Support of user defined types (extended opaque data types)
  • Replication enabled for Spatial Datablade 8.11
• Other
  • Serial Primary Key support in update anywhere
  • Inplace alter of CRCOLS

Page 14: Enterprise Replication WAIUG Forum 2002


A Fundamental Problem

With transactional replication, how do I keep the target up with the source when the source has 24 processors with thousands of users doing 500+ multi-row transactions per second and still support referential integrity on the target? Oh yes – this is one of four servers replicating update anywhere around the globe.

Page 15: Enterprise Replication WAIUG Forum 2002

Goal – Keep the replication cost down

[Chart: 9.30 compared with 9.21, showing Transaction Apply on Source and ER Apply on Target, with Suspend Server and Resume Server marked]

9.3 ER target apply is roughly 3 times faster than 9.21 and considerably faster than the original transactions.

Page 16: Enterprise Replication WAIUG Forum 2002

How we did that

• DataSync threads
  • Apply in parallel, but commit in order
  • Knowledgeable of referential integrity rules
  • Is able to serialize operations on a single row
  • Allows parallelism within a replicate
  • Apply always uses buffered logging
  • ACK is coordinated with a log flush
• Allows parallelism to dynamically change based on characteristics of user work
• Requires no configuration
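"Apply in parallel, but commit in order" can be illustrated with a thread pool. This is a simplified sketch, not the DataSync implementation: `apply_txn` and `replicate` are hypothetical, the "apply" step is a stand-in, and the sketch does not model referential-integrity awareness or serialization of operations on a single row.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_txn(txn):
    """Stand-in for applying one replicated transaction's rows (hypothetical)."""
    return [op.upper() for op in txn["ops"]]


def replicate(txns, workers=4):
    """Apply transactions concurrently, then commit them in source commit order."""
    committed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Work is submitted in commit order; the applies may finish in any order.
        futures = [(txn["id"], pool.submit(apply_txn, txn)) for txn in txns]
        # Waiting on the futures in submission order keeps the commits ordered.
        for txn_id, future in futures:
            future.result()
            committed.append(txn_id)
    return committed
```

The design point is that the expensive part (applying rows) overlaps across transactions, while the cheap part (the commit) stays strictly ordered.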

Page 17: Enterprise Replication WAIUG Forum 2002


What’s coming down the road? (9.4)

Page 18: Enterprise Replication WAIUG Forum 2002

Failure Points

[Diagram: topology of replicating servers, with nodes labeled R, N, and L]

Need to replicate shark sightings table

Page 19: Enterprise Replication WAIUG Forum 2002

Failure Points

[Diagram: the same topology of servers labeled R, N, and L]

What happens if Dallas fails?

Can no longer replicate shark sightings

Page 20: Enterprise Replication WAIUG Forum 2002

Failure Points

[Diagram: the same topology of servers labeled R, N, and L, now including an HDR pair]

Now what happens if Dallas fails?

Page 21: Enterprise Replication WAIUG Forum 2002

Why can’t I use ER with HDR now?

[Diagram: an HDR pair (primary P, secondary S), a standard server, and an ER Send Queue]

Only one of the target servers is aware of the updated row. Since the HDR secondary is not aware of the row, we have data inconsistency.

Page 22: Enterprise Replication WAIUG Forum 2002

What we do to coordinate ER with HDR

[Diagram: an HDR pair (primary P, secondary S) and a standard server, each side with an ER Send Queue, and an ACK between them]

Page 23: Enterprise Replication WAIUG Forum 2002

ER Event Coordination

• Coordinated Events
  • Replication of transaction
  • ACK transmission
  • Spooling of send queue
  • Replay position advancement
• DRINTERVAL
• cdrHDRMonitor thread acts as coordinator

Page 24: Enterprise Replication WAIUG Forum 2002

SQL host changes for ER/HDR

Label     Type      Server   Service   Options
srv1      group     -        -         i=1
srv1pri   ontlitcp  dallas   port1     g=srv1
srv1sec   ontlitcp  memphis  port1     g=srv1
srv1shm   onipcshm  dallas   srv1shm1
srv2      group     -        -         i=2
srv2tcp1  ontlitcp  newyork  cdr2      g=srv2
srv2shm   onipcshm  newyork  srv2shm

HDR Pair: srv1pri and srv1sec

Page 25: Enterprise Replication WAIUG Forum 2002

Quick Queue Recovery

• Problem – in the past ER took too long to recover the queue. This meant that it took quite a while to get users back on the system.
• Solution – Quick Queue Recovery
  • Separate table containing summary of each transaction in stable storage.
  • Allow users to connect before ER is fully recovered.
    • PreDDR thread monitors for log wrap until queue is recovered
    • When queue is recovered, PreDDR thread stops and Snoopy begins

Page 26: Enterprise Replication WAIUG Forum 2002

Large Transactions

• Problem – Replicated transaction must be totally in memory to process.
• Solution – Support replication of transactions of up to 4 TB
  • Grouper Paging
  • Temporary sblob located in SBSPACETEMP
  • Process spooled transactions directly from the spool

Page 27: Enterprise Replication WAIUG Forum 2002

Other Stuff

• Collection Support
  • Lists, sets, and multisets
• Support of multiple smartblob stable queues
  • Some using logging and some not
• Dynamic Log support for DDRBLOCK

Page 28: Enterprise Replication WAIUG Forum 2002


When troubles come your way

Page 29: Enterprise Replication WAIUG Forum 2002

SQL Host File Issues

Without group entries:

Label     Type      Server   Service   Options
srv1tcp1  ontlitcp  dallas   port1
srv1tcp2  ontlitcp  dallas   port2
srv1shm   onipcshm  dallas   srv1shm1
srv2tcp1  ontlitcp  houston  cdr2
srv2shm   onipcshm  houston  srv2shm

With group entries:

Label     Type      Server   Service   Options
srv1_g    group     -        -         i=1
srv1tcp1  ontlitcp  dallas   port1     g=srv1_g
srv1tcp2  ontlitcp  dallas   port2
srv1shm   onipcshm  dallas   srv1shm1
srv2_g    group     -        -         i=2
srv2tcp1  ontlitcp  houston  cdr2      g=srv2_g
srv2shm   onipcshm  houston  srv2shm

[Callout on the slide: "Can Cause Errors!!!" (g=srv2_g)]

• Network entry needs to immediately follow the group entry!
• ER information in sqlhost file must be common on all replicating servers
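The rule that a network entry must immediately follow its group entry is easy to check mechanically. The following is a minimal sketch, assuming the five-column layout shown on the slide; `check_sqlhosts` is a hypothetical helper, not an Informix utility, and it checks only this one rule.

```python
def check_sqlhosts(text):
    """Return problems found in sqlhosts-style text (simplified, hypothetical check)."""
    problems = []
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue  # skip blank lines, comments, and incomplete entries
        # Columns: label, type, server, service, then optional options.
        entries.append((fields[0], fields[1], fields[4:]))

    for i, (label, nettype, _options) in enumerate(entries):
        if nettype != "group":
            continue
        nxt = entries[i + 1] if i + 1 < len(entries) else None
        # The entry right after a group must be a network entry naming that group.
        if nxt is None or nxt[1] == "group" or f"g={label}" not in nxt[2]:
            problems.append(f"group {label}: next entry is not a member network entry")
    return problems
```

Running the same check against the sqlhosts file of every replicating server is one way to catch the "must be common on all servers" problem early, although comparing the files directly is also needed.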

Page 30: Enterprise Replication WAIUG Forum 2002


CDR GC errors in the message log file

05:35:42 CDR GC peer processing failed: message 1, error 40, CDR server 2

05:35:43 CDR GC peer processing failed: message 3, error 31, CDR server 2

cdr finderr 40

40 unsupported SQL syntax (join, etc..)

cdr finderr 31

31 undefined replicate

cdr findmsg 1

1 define replicate

cdr findmsg 3

3 start replicate

New in 9.4

cdr error

SERVER:SEQNO REVIEW TIME ERROR

site2:6 N 2001-04-26 03:54:46 40

GC operation define replicate 'rep1' failed: unsupported SQL select clause syntax
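The lookup steps on this slide can be scripted. The sketch below only builds the command strings from a message-log line; it assumes the log format is exactly as shown above, uses the `cdr findmsg` and `cdr finderr` spellings as they appear on the slide, and `lookup_commands` is a hypothetical helper name.

```python
import re

# Pattern for the message-log line shown on the slide.
GC_FAILURE = re.compile(
    r"CDR GC peer processing failed: message (\d+), error (\d+), CDR server (\d+)"
)


def lookup_commands(log_line):
    """Return the lookup commands to run for one log line, or [] if it does not match."""
    match = GC_FAILURE.search(log_line)
    if match is None:
        return []
    message, error, _server = match.groups()
    return [f"cdr findmsg {message}", f"cdr finderr {error}"]
```

For the first log line on the slide this yields the two commands used there, one for the message number and one for the error number.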

Page 31: Enterprise Replication WAIUG Forum 2002

Are any servers suspended or dropped?

onstat -g nif
Id Name State Sent Received
--------------------------------------------------------------------
9 site4 RUN 6 15440 6031

cdr list server
SERVER ID STATE   STATUS    CONNECTION CHANGED
------------------------------------------------------------------
site1  6  Active  Local     0
site2  7  Suspend Dropped   0 Jun 11 14:38:40
site3  8  Suspend Dropped   0 Jun 11 14:38:37
site4  9  Active  Connected 0 Jun 11 14:36:50

Page 32: Enterprise Replication WAIUG Forum 2002

Any Replicates suspended?

cdr list rep

REPLICATE: rep1
STATE: Active
CONFLICT: Ignore
FREQUENCY: immediate
QUEUE SIZE: 0
PARTICIPANT: test:informix.tab1
OPTIONS: row,ris,ats,fullrow

REPLICATE: rep2
STATE: Suspend
CONFLICT: Timestamp
FREQUENCY: immediate
QUEUE SIZE: 0
PARTICIPANT: test:informix.tab2
OPTIONS: row,ris,ats,fullrow

Page 33: Enterprise Replication WAIUG Forum 2002

What is snoopy doing?

onstat -g ddr

DDR -- Running --

# Event  Snoopy  Snoopy    Replay  Replay    Current  Current
Buffers  ID      Position  ID      Position  ID       Position
528      132     393018    130     36f018    132      394000

Log Pages Snooped:
From   From   Tossed
Cache  Disk   (LBC full)
3774   1142   88

If not advancing, it could mean the stable queue is full or a remote server is down.

Page 34: Enterprise Replication WAIUG Forum 2002

What is in the queues? (onstat -g rqm)

RQM Statistics for Queue (0xc379018) trg_send
 Transaction Spool Name: trg_send_stxn
 Insert Stamp: 8007/0
 Flags: SEND_Q, SPOOLED, PROGRESS_TABLE, NEED_ACK
 Txns in queue: 8003
 Log Events in queue: 0
 Txns in memory: 4195
 Txns in spool only: 3808
 Txns spooled: 5142
 Unspooled bytes: 266080
 Size of Data in queue: 1116086 Bytes
 Real memory in use: 505056 Bytes
 Pending Txn Buffers: 0
 Pending Txn Data: 0 Bytes
 Max Real memory data used: 520228 (512000) Bytes
 Max Real memory hdrs used 995768 (512000) Bytes
 Total data queued: 1116316 Bytes
 Total Txns queued: 8007
 Total Txns spooled: 5142
 Total Txns restored: 4665
 Total Txns recovered: 0

[Callouts on the slide mark one part of this output as "real time statistics" and another as "historical statistics"]

Page 35: Enterprise Replication WAIUG Forum 2002

Is the First Txn changing?

First Txn (0xef7d308) Key: 10/523/0x003530c0/0x00000000
 Txn Stamp: 3811/0, Reference Count: 0.
 Txn Flags: Spooled, Restored
 Txn Commit Time: (1023823350) 2002/06/11 14:22:30
 Txn Size in Queue: 100
 First Buf's (0xf116600) Queue Flags: Resident
 First Buf's Buffer Flags: TRG, Stream
 NeedAck: Waiting for Acks from <[000c]>
 No open handles on txn.

NeedAck: Need Acks from these servers.

Anatomy of the key:

First Txn (0xef7d308) Key: 10/523/0x003530c0/0x00100000

Fields, in order: Server ID / Unique LogID / LogPos / Sequence

• LogPos
  • Page Number in log (5 nibbles)
  • Offset In Page (3 nibbles)
• Sequence
  • If the TRG send transaction header is a multiple of 0x100000, then this is a split transaction. It is either a time-based or suspended replicate.
  • Increments by one for each buffer within the transaction.
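The key layout described above can be decoded with a few bit operations. This is a sketch based only on the field description on this slide: `decode_rqm_key` is a hypothetical helper, the server ID and log ID are assumed to be decimal, and "split" is read literally as a nonzero multiple of 0x100000.

```python
def decode_rqm_key(key):
    """Split 'server/logid/logpos/sequence' into its parts (per the slide's layout)."""
    server, log_id, log_pos, sequence = key.split("/")
    pos = int(log_pos, 16)
    seq = int(sequence, 16)
    return {
        "server_id": int(server),       # assumption: printed in decimal
        "unique_log_id": int(log_id),   # assumption: printed in decimal
        "log_page": pos >> 12,          # upper 5 nibbles: page number in the log
        "page_offset": pos & 0xFFF,     # lower 3 nibbles: offset within the page
        "sequence": seq,
        # Assumption: "split" means a nonzero multiple of 0x100000.
        "split": seq != 0 and seq % 0x100000 == 0,
    }
```

For the key `10/523/0x003530c0/0x00000000`, this gives log page 0x353 and page offset 0x0c0 in unique log 523. Comparing the decoded position of the First Txn between two `onstat -g rqm` samples shows whether the head of the queue is moving.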

Page 36: Enterprise Replication WAIUG Forum 2002

Problem – Who needs to ACK???

Txn (0xabbef28) Key: 1/6/0x001c9120/0x00100000

Txn Stamp: 2/0, Reference Count: 0.

Txn Flags: Notify

Txn Commit Time: (991335235) 2001/05/31 11:53:55

Txn Size in Queue: 84

First Buf's (0xabbefc8) Queue Flags: Resident

First Buf's Buffer Flags: TRG, Stream

NeedAck: Waiting for Acks from <[0004]>

No open handles on txn.

$ onstat -g cat

SERVERS

-------------------

Id: 02, Nm: serv2, Or: 0x0004, off: 0, idle: 0, state Suspended

root Id: 00, forward Id: 02, ishub: FALSE, isleaf: FALSE

Server Bits in Waiting ACK bit map
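Matching the NeedAck bitmap against the "Or:" value of each server in `onstat -g cat` is a bitwise AND. The sketch below assumes that reading of the slide; `servers_owing_ack` is a hypothetical helper, and the caller supplies the name-to-mask mapping by hand from the catalog output.

```python
def servers_owing_ack(need_ack_hex, server_bits):
    """Return names of servers whose bit is set in the NeedAck bitmap.

    `server_bits` maps server name -> its 'Or:' mask from `onstat -g cat`.
    """
    bitmap = int(need_ack_hex, 16)
    return [name for name, mask in server_bits.items() if bitmap & mask]
```

With the values on this slide, the bitmap `0004` and serv2's mask `0x0004` identify serv2, which the catalog shows as Suspended, as the server that still owes an ACK.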

Page 37: Enterprise Replication WAIUG Forum 2002

What about the send progress tables?

Progress Table:
 Progress Table is Stable
 On-disk table name............: spttrg_send
 Flush interval (time).........: 30
 Time of last flush............: 988215679
 Flush interval (serial number): 1000
 Serial number of last flush...: 4
 Current serial number.........: 4

Server Group   Bytes Queued Acked        Sent
------------------------------------------------------------------------------
9      0x60001 108054       6/4/6000b8/0 - 6/5/4700b8/0
8      0x60001 108054       6/4/6000b8/0 - 6/5/4700b8/0
7      0x60001 108054       6/4/6000b8/0 - 6/5/4700b8/0

• Acked: rqm key received (acked)
• Sent: rqm key sent. Really highest queued to be sent!!!

Page 38: Enterprise Replication WAIUG Forum 2002


Receive Progress Table

Progress Table:

Progress Table is Stable

On-disk table name............: spttrg_receive

Not keeping dirty list.

Server Group Bytes Queued Acked Sent

----------------------------------------------------------------------------------------------------

1 0x10001 156 1/4/1ea0b0/0 - 1/4/1ee1cc/2

• Acked: highest transaction that has been ACKed
• Sent: highest received transaction (not yet processed)

Traverse handle (0xb61b1e8) for thread CDRNr1 at Head_of_Q, Flags: None

Traverse handle (0xb5fc1e8) for thread CDRD_1 at txn (0xb2cd020): 1/4/0x001ee1cc/0x00000000

Flags: In_Transaction

Traverse handle (0xb6091e8) for thread CDRD_1 at Head_of_Q, Flags: None

Currently processing transaction

Page 39: Enterprise Replication WAIUG Forum 2002

How quickly are txns replicating?

onstat -g rcv full

Statistics by Source

Server 6
Repl    Txn   Ins   Del   Upd  Last Target Apply    Last Source Commit
393217  4002  4000  2001  0    2002/06/11 14:39:19  2002/06/11 14:38:32
393218  2000  2000  0     0    2002/06/11 14:38:37  2002/06/11 14:22:32

These times are on different machines!
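The gap between the two timestamp columns gives a rough apply lag. The sketch below simply subtracts them; `apply_lag_seconds` is a hypothetical helper, and because the two times come from clocks on different machines, the result includes any clock skew between source and target.

```python
from datetime import datetime

FORMAT = "%Y/%m/%d %H:%M:%S"


def apply_lag_seconds(last_target_apply, last_source_commit):
    """Seconds between the source commit and the target apply (clock skew ignored)."""
    applied = datetime.strptime(last_target_apply, FORMAT)
    committed = datetime.strptime(last_source_commit, FORMAT)
    return (applied - committed).total_seconds()
```

For the two replicates shown above this gives 47 seconds for 393217 and 965 seconds (about 16 minutes) for 393218.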

Page 40: Enterprise Replication WAIUG Forum 2002


In Closing - Questions???