Enterprise Replication
WAIUG Forum 2002
Agenda
What is Enterprise Replication
How is Enterprise Replication different from HDR
Internal Overview of ER
Recent Improvements in Enterprise Replication (9.3/9.4)
Troubleshooting Enterprise Replication
IDS Enterprise Replication (ER)
Log based, Transaction oriented replication
Asynchronous, Homogeneous (IDS 7.22+ only)
Primary/Target + Update anywhere
Consolidation, Dissemination, Workload partitioning
Tightly coupled with the server
Web and command line administration
ER History
Initial Release: 7.22 in 12/1996
• Version I - 7.22 - 7.30 releases
• 7.30 Grouper Compression improvements
Version II (7.31 & 9.2x)
• Queue and NIF redesign, Hierarchical Routing
Version III (9.3)
• UDT support, Smart Blob Support, Dynamic DataSync parallelism, Replicate sets, Smart blob queuing, In-place alter to add/drop CRCOLS, Serial Col Primary Key Support, …
Version III+ (9.4)
• ER/HDR support, Large transaction support, Quick Queue Recovery, Complex Type support, Performance enhancements
How do HDR and ER differ?
HDR | ER
Provides single primary and single secondary | Allows configurable source(s)/target(s)
Primary and secondary must run the same executables and have similar disk layout | Source/target do not have to be the same
Secondary restricted to report processing | Allows full usage of both source/target
Simple to set up and administer | Setup and administration more complex
Primary and secondary are mirror images | Source and target can be totally different
Does not support blobspace blobs | Supports blobspace blobs
Primary purpose is for high availability | Primary purpose is for data distribution
Replication can be synchronous | Replication is asynchronous
ER – how it works
[Diagram: replication data flow between a Source and a Target server]
Source: Database, Logical Log, Snoopy, Grouper, Send Queue (with Spool), NIF
Target: NIF, Receive Queue, Data Sync, Database
Also shown: Ack Queue, Global Catalog (syscdr)
Grouper: Regroups transaction and performs evaluation
NIF: Transmits Txn to targets
Data Sync: Target apply threads
ER and onstats
[Diagram: the same Source/Target data flow, annotated with the onstat command for each component]
onstat -g cat: Global Catalog (syscdr)
onstat -g ddr: Snoopy (logical log)
onstat -g grp: Grouper
onstat -g rqm: Send Queue, Receive Queue, Ack Queue
onstat -g nif: NIF
onstat -g rcv: Receive
onstat -g dss: Data Sync
Regrouping Transactions
[Diagram: ddr_snoopy feeds log records into the Open Transaction Array (transactions 10, 11, 12, each a TX header followed by its Insert/Update/Delete records). On "Commit 11" the transaction moves on through CDRGeval and CDRGfan.]
Global Transaction List (onstat -g grp L)
• Ordered by begin work
• Used to control how replay position is advanced
• Transactions remain on the global list until they are ACKed from the target(s) or placed in stable queue
Serial List (onstat -g grp S)
• Ordered by commit point
• Used to control order that replicated transactions are shipped to the target
• Transactions remain on the serial list until they are placed into the queue
What is Conflict Resolution?
[Diagram: Server A, Server B, and Server C. Update-1 has been applied on two servers but is still in the queue for the third; Update-2 to the same row is then replicated to all three. Which version should win?]
Required for update anywhere
Conflict Resolution
Method to determine if the current version or a just received version of the row should ‘win’
• Ignore
  • Row must be applied as is
• Timestamp
  • Most recent update wins
• Stored Procedure
  • User written stored procedure is invoked
• Upserts
Requires CRCOLS (shadow columns)
• CDRTIME, CDRSERVER
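The Timestamp rule above ("most recent update wins") can be sketched in a few lines. This is an illustrative Python model, not Informix code: `Row` and `resolve_timestamp` are hypothetical names, the fields `cdrtime` and `cdrserver` merely stand in for the CDRTIME and CDRSERVER shadow columns, and the tie-break on server ID is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    key: int
    value: str
    cdrtime: int    # stand-in for the CDRTIME shadow column
    cdrserver: int  # stand-in for the CDRSERVER shadow column


def resolve_timestamp(current: Optional[Row], incoming: Row) -> Row:
    """Return the row version that wins under a most-recent-update rule."""
    if current is None:
        # No local version exists, so the received row is applied.
        return incoming
    if incoming.cdrtime != current.cdrtime:
        return incoming if incoming.cdrtime > current.cdrtime else current
    # Assumed tie-break for equal timestamps: the higher server id wins.
    return incoming if incoming.cdrserver > current.cdrserver else current


# Update-2 is already applied locally; the older Update-1 arrives late.
current = Row(key=1, value="update-2", cdrtime=200, cdrserver=2)
late = Row(key=1, value="update-1", cdrtime=100, cdrserver=1)
print(resolve_timestamp(current, late).value)
```

In this sketch the late Update-1 loses because its timestamp is older, so the row already on the target is kept.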
How do deletes affect conflict resolution?
[Diagram: Server A, Server B, and Server C. Update-1 is still in the queue for one server when the Delete of the same row reaches it.]
The row has already been deleted, so how do I prevent the row from being reapplied?
• Place the deleted row into the shadow delete table.
• When the row arrives at the target server, check in the delete table to see if the row has been deleted.
• Rows are pruned from delete tables once they are no longer needed.
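A minimal sketch of the delete-table idea, in Python rather than anything Informix-specific. The `Target` class, its method names, and the use of a simple integer timestamp and a cutoff for pruning are all assumptions made for illustration; the slide does not say how pruning is decided.

```python
class Target:
    """Toy target server that remembers deletes in a shadow delete table."""

    def __init__(self):
        self.rows = {}          # key -> (value, timestamp)
        self.delete_table = {}  # key -> timestamp of the delete

    def apply_delete(self, key, timestamp):
        # Remove the row and record the delete so late updates can be detected.
        self.rows.pop(key, None)
        self.delete_table[key] = timestamp

    def apply_update(self, key, value, timestamp):
        # Check the delete table before applying a received row.
        deleted_at = self.delete_table.get(key)
        if deleted_at is not None and deleted_at >= timestamp:
            return False  # the update is older than the delete: do not reapply
        self.rows[key] = (value, timestamp)
        return True

    def prune(self, cutoff):
        # Assumed policy: entries older than the cutoff are no longer needed.
        self.delete_table = {
            k: t for k, t in self.delete_table.items() if t >= cutoff
        }


target = Target()
target.apply_delete(key=7, timestamp=200)                  # Delete arrives first
applied = target.apply_update(7, "update-1", timestamp=100)  # late Update-1
print(applied, target.rows)
```

The late update is rejected because the delete table still holds a newer delete for the same key.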
So what’s new?
Improvements in 9.3 ER
Improved performance
• Increased parallelism
• Spooling enhancements
  • Eliminates most of the reasons for DDRBLOCK state
Support of user defined types (extended opaque data types)
• replication enabled for Spatial Datablade 8.11
Other
• Serial Primary Key support in update anywhere
• Inplace alter of CRCOLS
A Fundamental Problem
With transactional replication, how do I keep the target up with the source when the source has 24 processors with thousands of users doing 500+ multi-row transactions per second and still support referential integrity on the target? Oh yes – this is one of four servers replicating update anywhere around the globe.
Goal – Keep the replication cost down
[Chart: 9.21 versus 9.30, comparing Transaction Apply on Source with ER Apply on Target; Suspend Server and Resume Server are marked on the chart]
9.3 ER target apply is roughly 3 times faster than 9.21 and considerably faster than the original transactions.
How we did that
DataSync threads
• Apply in parallel, but commit in order
• Knowledgeable of referential integrity rules
• Is able to serialize operations on a single row
• Allows parallelism within a replicate
• Apply always uses buffered logging
• ACK is coordinated with a log flush
Allows parallelism to dynamically change based on characteristics of user work
Requires no configuration
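The "apply in parallel, but commit in order" idea can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: `apply_txn` and `apply_parallel_commit_in_order` are hypothetical names, a thread pool stands in for the DataSync threads, and referential-integrity checks and per-row serialization are left out.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_txn(txn):
    """Stand-in for applying a transaction's rows; returns its commit sequence."""
    _ = [row.upper() for row in txn["rows"]]  # pretend work
    return txn["seq"]


def apply_parallel_commit_in_order(txns, workers=4):
    """Apply transactions concurrently, then commit them in source commit order."""
    ordered = sorted(txns, key=lambda t: t["seq"])
    committed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # All transactions are handed to the pool and may run at the same time.
        futures = [pool.submit(apply_txn, t) for t in ordered]
        # Waiting on the futures in submission order enforces commit order.
        for future in futures:
            committed.append(future.result())
    return committed


txns = [
    {"seq": 3, "rows": ["c"]},
    {"seq": 1, "rows": ["a", "b"]},
    {"seq": 2, "rows": ["d"]},
]
print(apply_parallel_commit_in_order(txns))
```

Whatever order the workers finish in, the commit list follows the source commit sequence.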
What’s coming down the road? (9.4)
Failure Points
[Diagram: replication topology with nodes labeled N, L, and R]
Need to replicate shark sightings table
Failure Points
[Diagram: the same topology with nodes labeled N, L, and R]
What happens if Dallas fails?
Can no longer replicate shark sightings
Failure Points
[Diagram: the same topology with nodes labeled N, L, and R, now including an HDR pair]
Now what happens if Dallas fails?
Why can’t I use ER with HDR now?
[Diagram: an ER Send Queue delivering to an HDR pair (labeled P and S) and to a standard server]
Only one of the target servers is aware of the updated row. Since the HDR secondary is not aware of the row, we have data inconsistency.
What we do to coordinate ER with HDR
[Diagram: the HDR pair (labeled P and S) and a standard server, each with an ER Send Queue, plus an ACK]
ER Event Coordination
Coordinated Events
• Replication of transaction
• ACK transmission
• Spooling of send queue
• Replay position advancement
DRINTERVAL
cdrHDRMonitor thread acts as coordinator
sqlhosts changes for ER/HDR
Label     Type      Server   Service   Options
srv1      group     -        -         i=1
srv1pri   ontlitcp  dallas   port1     g=srv1
srv1sec   ontlitcp  memphis  port1     g=srv1
srv1shm   onipcshm  dallas   srv1shm1
srv2      group     -        -         i=2
srv2tcp1  ontlitcp  newyork  cdr2      g=srv2
srv2shm   onipcshm  newyork  srv2shm
HDR Pair: srv1pri and srv1sec
Quick Queue Recovery
Problem – in the past ER took too long to recover the queue. This meant that it took quite a while to get users back on the system.
Solution – Quick Queue Recovery
• Separate table containing summary of each transaction in stable storage.
• Allow users to connect before ER is fully recovered.
• PreDDR thread monitors for log wrap until queue is recovered
• When queue is recovered, PreDDR thread stops and Snoopy begins
Large Transactions
Problem – Replicated transaction must be totally in memory to process.
Solution – Support for replicating transactions of up to 4 TB
• Grouper Paging
• Temporary sblob located in SBSPACETEMP
• Process spooled transactions directly from the spool
Other Stuff
Collection Support
• Lists, sets, and multisets
Support of multiple smartblob stable queue
• Some using logging and some not
Dynamic Log support for DDRBLOCK
When troubles come your way
SQL Host File Issues
Label     Type      Server   Service   Options
srv1tcp1  ontlitcp  dallas   port1
srv1tcp2  ontlitcp  dallas   port2
srv1shm   onipcshm  dallas   srv1shm1
srv2tcp1  ontlitcp  houston  cdr2
srv2shm   onipcshm  houston  srv2shm

Label     Type      Server   Service   Options
srv1_g    group     -        -         i=1
srv1tcp1  ontlitcp  dallas   port1     g=srv1_g
srv1tcp2  ontlitcp  dallas   port2
srv1shm   onipcshm  dallas   srv1shm1
srv2_g    group     -        -         i=2
srv2tcp1  ontlitcp  houston  cdr2      g=srv2_g
srv2shm   onipcshm  houston  srv2shm

Can Cause Errors!!! [callout: g=srv2_g]
Network entry needs to immediately follow the group entry!
ER information in sqlhosts file must be common on all replicating servers
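The rule that a network entry must immediately follow its group entry is easy to check mechanically. The following Python sketch is a hypothetical helper (`check_group_entries` is not an Informix tool). It assumes whitespace-separated fields in the order Label, Type, Server, Service, Options, with comma-separated options.

```python
def check_group_entries(sqlhosts_text):
    """Return labels of group entries not immediately followed by a member entry."""
    entries = []
    for line in sqlhosts_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # not a usable entry for this check
        options = fields[4] if len(fields) > 4 else ""
        entries.append((fields[0], fields[1], options.split(",")))

    problems = []
    for i, (label, nettype, _options) in enumerate(entries):
        if nettype != "group":
            continue
        following = entries[i + 1] if i + 1 < len(entries) else None
        # The next entry must be a network entry carrying g=<this group>.
        if (
            following is None
            or following[1] == "group"
            or f"g={label}" not in following[2]
        ):
            problems.append(label)
    return problems


good = """
srv1_g    group     -        -         i=1
srv1tcp1  ontlitcp  dallas   port1     g=srv1_g
srv1shm   onipcshm  dallas   srv1shm1
srv2_g    group     -        -         i=2
srv2tcp1  ontlitcp  houston  cdr2      g=srv2_g
"""
print(check_group_entries(good))
```

Running the same check against the sqlhosts file on every replicating server is one way to spot entries that differ between servers.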
CDR GC errors in the message log file
05:35:42 CDR GC peer processing failed: message 1, error 40, CDR server 2
05:35:43 CDR GC peer processing failed: message 3, error 31, CDR server 2
cdr finderr 40
40 unsupported SQL syntax (join, etc..)
cdr finderr 31
31 undefined replicate
cdr findmsg 1
1 define replicate
cdr findmsg 3
3 start replicate
New in 9.4
cdr error
SERVER:SEQNO REVIEW TIME ERROR
site2:6 N 2001-04-26 03:54:46 40
GC operation define replicate 'rep1' failed: unsupported SQL select clause syntax
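Before 9.4, decoding these log lines means running cdr findmsg and cdr finderr by hand. As a small illustration, the lookup can be scripted. The Python sketch below is a hypothetical helper: the two tables contain only the codes shown on this slide, and any other code is reported as unknown rather than guessed.

```python
import re

# Codes copied from the cdr finderr / cdr findmsg output shown above.
ERRORS = {40: "unsupported SQL syntax (join, etc..)", 31: "undefined replicate"}
MESSAGES = {1: "define replicate", 3: "start replicate"}

PATTERN = re.compile(
    r"CDR GC peer processing failed: message (\d+), error (\d+), CDR server (\d+)"
)


def explain(line):
    """Translate one message-log line into operation and error text, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    message, error, server = (int(group) for group in match.groups())
    return {
        "server": server,
        "operation": MESSAGES.get(message, "unknown (run cdr findmsg)"),
        "error": ERRORS.get(error, "unknown (run cdr finderr)"),
    }


log_line = "05:35:42 CDR GC peer processing failed: message 1, error 40, CDR server 2"
print(explain(log_line))
```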
Are any servers suspended or dropped?
onstat -g nif
Id Name State Sent Received
--------------------------------------------------------------------
9 site4 RUN 6 15440 6031

cdr list server
SERVER ID STATE STATUS CONNECTION CHANGED
------------------------------------------------------------------
site1 6 Active Local 0
site2 7 Suspend Dropped 0 Jun 11 14:38:40
site3 8 Suspend Dropped 0 Jun 11 14:38:37
site4 9 Active Connected 0 Jun 11 14:36:50
Any Replicates suspended?
cdr list rep

REPLICATE: rep1
STATE: Active
CONFLICT: Ignore
FREQUENCY: immediate
QUEUE SIZE: 0
PARTICIPANT: test:informix.tab1
OPTIONS: row,ris,ats,fullrow

REPLICATE: rep2
STATE: Suspend
CONFLICT: Timestamp
FREQUENCY: immediate
QUEUE SIZE: 0
PARTICIPANT: test:informix.tab2
OPTIONS: row,ris,ats,fullrow
What is snoopy doing?
onstat -g ddr
DDR -- Running --
# Event  Snoopy  Snoopy    Replay  Replay    Current  Current
Buffers  ID      Position  ID      Position  ID       Position
528      132     393018    130     36f018    132      394000

Log Pages Snooped:
From   From  Tossed
Cache  Disk  (LBC full)
3774   1142  88

If not advancing, it could mean the stable queue is full or a remote server is down.
What is in the queues? (onstat -g rqm)
RQM Statistics for Queue (0xc379018) trg_send
 Transaction Spool Name: trg_send_stxn
 Insert Stamp: 8007/0
 Flags: SEND_Q, SPOOLED, PROGRESS_TABLE, NEED_ACK
 Txns in queue: 8003
 Log Events in queue: 0
 Txns in memory: 4195
 Txns in spool only: 3808
 Txns spooled: 5142
 Unspooled bytes: 266080
 Size of Data in queue: 1116086 Bytes
 Real memory in use: 505056 Bytes
 Pending Txn Buffers: 0
 Pending Txn Data: 0 Bytes
 Max Real memory data used: 520228 (512000) Bytes
 Max Real memory hdrs used 995768 (512000) Bytes
 Total data queued: 1116316 Bytes
 Total Txns queued: 8007
 Total Txns spooled: 5142
 Total Txns restored: 4665
 Total Txns recovered: 0
[Callouts on the output: real time statistics; historical statistics]
Is the First Txn changing?
First Txn (0xef7d308) Key: 10/523/0x003530c0/0x00000000
 Txn Stamp: 3811/0, Reference Count: 0.
 Txn Flags: Spooled, Restored
 Txn Commit Time: (1023823350) 2002/06/11 14:22:30
 Txn Size in Queue: 100
 First Buf's (0xf116600) Queue Flags: Resident
 First Buf's Buffer Flags: TRG, Stream
 NeedAck: Waiting for Acks from <[000c]>
 No open handles on txn.
NeedAck: Need Acks from these servers.
Key layout, using First Txn (0xef7d308) Key: 10/523/0x003530c0/0x00100000
Server ID / Unique LogID / LogPos / Sequence
• LogPos: Page Number in log (5 nibbles), then Offset In Page (3 nibbles)
• Sequence: Increments by one for each buffer within the transaction.
• If TRG send transaction header is a multiple of 0x100000, then this is a split transaction. It is either a time-based or suspended replicate.
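Splitting a key by hand gets tedious when watching a queue, so here is a small Python sketch of the layout described above. `decode_rqm_key` is a hypothetical helper. The first two fields are kept as text because the slide does not state their radix, and the split-transaction flag is only my reading of the "multiple of 0x100000" note.

```python
def decode_rqm_key(key):
    """Split an RQM key into Server ID / Unique LogID / LogPos / Sequence."""
    server_id, unique_log_id, log_pos, sequence = key.split("/")
    pos = int(log_pos, 16)
    seq = int(sequence, 16)
    return {
        "server_id": server_id,          # kept as text: radix not stated
        "unique_log_id": unique_log_id,  # kept as text: radix not stated
        "log_page": pos >> 12,           # upper 5 nibbles: page number in log
        "page_offset": pos & 0xFFF,      # lower 3 nibbles: offset in page
        "sequence": seq,
        # Assumed reading of the slide: a non-zero multiple of 0x100000.
        "split_txn_hint": seq != 0 and seq % 0x100000 == 0,
    }


decoded = decode_rqm_key("10/523/0x003530c0/0x00100000")
print(hex(decoded["log_page"]), hex(decoded["page_offset"]), decoded["split_txn_hint"])
```

Comparing the decoded key of the First Txn across successive onstat runs shows whether the head of the queue is moving.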
Problem – Who needs to ACK???
Txn (0xabbef28) Key: 1/6/0x001c9120/0x00100000
Txn Stamp: 2/0, Reference Count: 0.
Txn Flags: Notify
Txn Commit Time: (991335235) 2001/05/31 11:53:55
Txn Size in Queue: 84
First Buf's (0xabbefc8) Queue Flags: Resident
First Buf's Buffer Flags: TRG, Stream
NeedAck: Waiting for Acks from <[0004]>
No open handles on txn.
$ onstat -g cat
SERVERS
-------------------
Id: 02, Nm: serv2, Or: 0x0004, off: 0, idle: 0, state Suspended
root Id: 00, forward Id: 02, ishub: FALSE, isleaf: FALSE
Server Bits in Waiting ACK bit map
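Matching the NeedAck bit map against the catalog can be automated. The Python sketch below is a hypothetical helper (`servers_to_ack` is not part of any Informix tool). It assumes the `Or:` value in the onstat -g cat output is that server's bit in the waiting-ACK bit map, as the callout on this slide suggests, and that the lines look like the ones shown here.

```python
import re

SERVER_LINE = re.compile(
    r"Id:\s*(\d+), Nm:\s*(\w+), Or:\s*0x([0-9a-fA-F]+).*?state\s+(\w+)"
)


def servers_to_ack(need_ack_line, catalog_text):
    """Return (name, state) for each catalog server whose bit is set in NeedAck."""
    mask_match = re.search(r"<\[([0-9a-fA-F]+)\]>", need_ack_line)
    if mask_match is None:
        return []
    mask = int(mask_match.group(1), 16)
    waiting = []
    for match in SERVER_LINE.finditer(catalog_text):
        _server_id, name, bit, state = match.groups()
        if mask & int(bit, 16):
            waiting.append((name, state))
    return waiting


need_ack = "NeedAck: Waiting for Acks from <[0004]>"
catalog = """Id: 02, Nm: serv2, Or: 0x0004, off: 0, idle: 0, state Suspended
root Id: 00, forward Id: 02, ishub: FALSE, isleaf: FALSE"""
print(servers_to_ack(need_ack, catalog))
```

In this example the transaction is waiting on serv2, which the catalog reports as Suspended.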
What about the send progress tables?
Progress Table:
 Progress Table is Stable
 On-disk table name............: spttrg_send
 Flush interval (time).........: 30
 Time of last flush............: 988215679
 Flush interval (serial number): 1000
 Serial number of last flush...: 4
 Current serial number.........: 4

Server Group Bytes Queued Acked Sent
------------------------------------------------------------------------------
9 0x60001 108054 6/4/6000b8/0 - 6/5/4700b8/0
8 0x60001 108054 6/4/6000b8/0 - 6/5/4700b8/0
7 0x60001 108054 6/4/6000b8/0 - 6/5/4700b8/0

Acked: rqm key received (acked)
Sent: rqm key sent. Really highest queued to be sent!!!
Receive Progress Table
Progress Table:
Progress Table is Stable
On-disk table name............: spttrg_receive
Not keeping dirty list.
Server Group Bytes Queued Acked Sent
----------------------------------------------------------------------------------------------------
1 0x10001 156 1/4/1ea0b0/0 - 1/4/1ee1cc/2
Highest Received Transaction (not yet processed)
Highest Transaction that has been ACKed
Traverse handle (0xb61b1e8) for thread CDRNr1 at Head_of_Q, Flags: None
Traverse handle (0xb5fc1e8) for thread CDRD_1 at txn (0xb2cd020): 1/4/0x001ee1cc/0x00000000
Flags: In_Transaction
Traverse handle (0xb6091e8) for thread CDRD_1 at Head_of_Q, Flags: None
Currently processing transaction
How quickly are txns replicating?
onstat -g rcv full
Statistics by Source
Server 6
Repl    Txn   Ins   Del   Upd  Last Target Apply    Last Source Commit
393217  4002  4000  2001  0    2002/06/11 14:39:19  2002/06/11 14:38:32
393218  2000  2000  0     0    2002/06/11 14:38:37  2002/06/11 14:22:32
These times are on different machines!
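The apply lag for each replicate is just the difference between the two timestamps. A minimal Python sketch follows; `apply_lag_seconds` is a hypothetical helper that assumes the date format shown in the output above. Because the two times come from clocks on different machines, the result includes any clock skew between source and target.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S"


def apply_lag_seconds(last_target_apply, last_source_commit):
    """Seconds between the source commit and the target apply (skew included)."""
    applied = datetime.strptime(last_target_apply, FMT)
    committed = datetime.strptime(last_source_commit, FMT)
    return (applied - committed).total_seconds()


# Values taken from the two replicate rows above.
print(apply_lag_seconds("2002/06/11 14:39:19", "2002/06/11 14:38:32"))
print(apply_lag_seconds("2002/06/11 14:38:37", "2002/06/11 14:22:32"))
```

For the rows above this gives 47 seconds for replicate 393217 and 965 seconds (about 16 minutes) for replicate 393218.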
In Closing - Questions???