Unity Connection 7.0 Cluster/Redundancy TOI


© 2006 Cisco Systems, Inc. All rights reserved.


EDCS – 623130

Ramesh Achuthan, Radha Radhakrishnan

Agenda

• Overview
• Deployment
• Cluster Behavior
• Troubleshooting
• Upgrading
• Future Enhancements

Overview – Active/Active

[Figure: Active/Active deployment. Web clients (HTTP/IMAP) reach the UC servers through DNS, with dynamic load balancing and failover. Voice calls arrive from CCM, with automatic load balancing and failover (SCCP/SIP). Each server box shows DB, CsMgr, SRM, ServM, Singletons, and Other...; the servers exchange a heartbeat, replicate the DB and files, and use remote writes for MbxDb. UC-0: Publisher, normally in the Primary role (active). UC-1..N: Subscribers, normally in the Secondary role (active).]

User Interfaces:
• Voice calls
• Web Admin/CPCA
• IMAP

Load balancing & Failover:
• Handled by external entities (DNS, CCM, etc.)
• PIMG for legacy integration

Roles:
• Primary & secondary
• SRM manages roles

Runs on top of the CCM Platform Cluster.

Some Terminology

CCM Platform Cluster: Publisher and Subscriber
• The Publisher (Pub) is the first node, fixed at install time. All other nodes are Subscribers (Subs).

UC Roles: Primary and Secondary
• The singleton processes (Notifier, MTA, SysAgent tasks, and more) run only on the primary server.
• UnityMbxDb writes are done only through the primary server.
• Certain master files, such as the encryption key and certificates, are managed on the primary and replicated to the secondary.
• In normal operation, the primary is the Publisher in the cluster.

User access

• TUI/VUI: users access the servers transparently.
• IMAP/CPCA clients: access the servers transparently.
• Admin:
  – Server access is transparent; administration can be done at either server.
  – Voice ports and similar per-server settings require node selection.
• Serviceability:
  – Trace/alarm settings are common to both servers.
  – Service start/stop requires node selection.
  – All singleton processes run on the primary server.
• Licensing:
  – Voice ports are server specific and need a server-specific license.
  – User licenses are not server specific and can be installed on either server.
• RTMT must access each server explicitly. Log files are not replicated.


Deployment

Load balancing & Failover

Installing UC in a Cluster

1. Install the first node. Answer yes to the question “Is it the first node in cluster?”

2. Administer the first node and get it running.

3. Add the second node to the cluster:

   a. Using the Admin GUI, add the secondary node under “System Settings > Cluster”.

   b. Install the second node. Answer no to the question “Is it the first node in the cluster?”

   c. Provide the IP address or hostname of the first node.

Once the second node comes up, it is in the cluster with the first node.

CCM setup - SCCP

Dynamic load balancing and failover with Hunt Pilot, Hunt List, and Line Group (CCM 4 and above).

[Figure: Hunt Pilot → Hunt List → Line Group → DNs on UC-1 and UC-2. Distribution algorithm: circular, most-idle, etc.]
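The two distribution algorithms named above can be sketched as follows. This is a simplified model, not CCM's implementation: `Port`, `LineGroup`, and the `idle_since` field are hypothetical names, and a port on a failed UC server is simply marked unavailable so the selection skips it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Port:
    """One voicemail port DN in the line group (hypothetical model)."""
    dn: str
    server: str                # "UC-1" or "UC-2"
    available: bool = True     # False while busy or while its server is down
    idle_since: float = 0.0    # time the port last went idle (smaller = idle longer)


@dataclass
class LineGroup:
    ports: List[Port]
    next_index: int = 0        # where the next circular search starts

    def pick_circular(self) -> Optional[Port]:
        """Circular: continue after the last port used, skipping unavailable ports."""
        n = len(self.ports)
        for step in range(n):
            i = (self.next_index + step) % n
            if self.ports[i].available:
                self.next_index = (i + 1) % n
                return self.ports[i]
        return None  # no free port: the call cannot be delivered

    def pick_most_idle(self) -> Optional[Port]:
        """Most-idle: the available port that has been idle the longest."""
        free = [p for p in self.ports if p.available]
        return min(free, key=lambda p: p.idle_since) if free else None
```

If every UC-1 port is marked unavailable (server down), both methods return only UC-2 ports, which is the failover behavior this setup relies on.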

CCM setup - SIP

Approaches:

1. With DNS-SRV: Route Pattern → SIP Trunk → DNS-SRV FQDN

2. With Route List (simpler): Route Pattern → Route List → Route Group → SIP Trunk. Uses the distribution algorithm in the Route Group.

3. With SIP Gateway DNS-SRV: Route Pattern → SIP Trunk → SIP GW → DNS-SRV FQDN
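The DNS-SRV approaches rely on the SRV record set to spread calls and to fail over. The deck does not say how CCM or the gateway orders SRV targets; the sketch below assumes the generic RFC 2782 procedure (lowest priority first, weighted random choice within a priority). `SrvRecord` and the example hostnames are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SrvRecord:
    priority: int   # lower value = preferred
    weight: int     # relative share among records with equal priority
    port: int
    target: str


def srv_order(records: List[SrvRecord], rng=random) -> List[SrvRecord]:
    """Return SRV records in the order a client should try them (RFC 2782)."""
    ordered: List[SrvRecord] = []
    for prio in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == prio]
        # Zero-weight records go first so they are picked only rarely.
        group.sort(key=lambda r: r.weight != 0)
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)  # inclusive range, as in the RFC
            running = 0
            for r in group:
                running += r.weight
                if running >= pick:
                    ordered.append(r)
                    group.remove(r)
                    break
    return ordered
```

A caller tries the targets in the returned order and moves to the next one when a connection fails, which is how a failed UC server gets bypassed.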

Other Integrations

PIMG: PIMG pings a primary UC server and can redirect calls to the secondary when the primary fails. Load balancing is done at the PBX, and PIMG failures are also handled at the PBX.

[Figure: PBX connected to two PIMG units, which connect to UC-A and UC-B over a primary link and a secondary link.]
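The ping-and-redirect behavior can be modeled roughly as below. The threshold of three consecutive missed pings is a hypothetical parameter; the deck does not state how many failed pings PIMG tolerates before redirecting.

```python
from dataclasses import dataclass


@dataclass
class PimgRouter:
    primary: str
    secondary: str
    max_missed: int = 3   # hypothetical: consecutive failed pings before redirecting
    missed: int = 0

    def record_ping(self, ok: bool) -> None:
        # A successful ping resets the counter; failures accumulate.
        self.missed = 0 if ok else self.missed + 1

    def route_call(self) -> str:
        # Send calls to the primary unless it has missed too many pings.
        return self.secondary if self.missed >= self.max_missed else self.primary
```

Once the primary answers a ping again, calls go back to it.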

IMAP & CPCA Clients

Load balancing and failover are transparent to users.

• DNS: name lookup

• Add A records in DNS

• Users need to log in again after failover.
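From a client's point of view, DNS-based failover amounts to resolving the shared hostname to every A record and trying the addresses in turn. A minimal sketch, assuming plain TCP reachability is a good enough health check; the hostname `uc.example.com` is hypothetical.

```python
import socket
from typing import Callable, Iterable, List, Optional

IMAP_PORT = 143  # standard IMAP port; 993 for IMAP over TLS


def resolve_all(host: str, port: int) -> List[str]:
    """Every address DNS returns for host, in order, without duplicates."""
    addresses: List[str] = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def tcp_reachable(port: int, timeout: float = 2.0) -> Callable[[str], bool]:
    """Build a check that tries a TCP connection to addr:port."""
    def check(addr: str) -> bool:
        try:
            with socket.create_connection((addr, port), timeout=timeout):
                return True
        except OSError:
            return False
    return check


def pick_server(addresses: Iterable[str],
                is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first address that answers, or None if all are down."""
    for addr in addresses:
        if is_reachable(addr):
            return addr
    return None
```

Usage: `pick_server(resolve_all("uc.example.com", IMAP_PORT), tcp_reachable(IMAP_PORT))`. The session is not carried over to the other server, which is why users must log in again after failover.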


Cluster Behavior

Cluster state displayed:
• None: only one node is in the cluster.
• Normal: there is more than one node, and the cluster has not failed over.
• Failedover: the Publisher is not the primary at that time.

Admin changes made on one server should be visible on the other within a few seconds.

Messages left on one server should be available from the other server within a few seconds.

TUI/VUI, CPCA, IMAP, and Admin users should not notice any login issues when one of the servers is down.

MWI and other notifications should continue to work when one of the servers is down.
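The three displayed cluster states follow directly from two facts: how many nodes are in the cluster and whether the Publisher currently holds the primary role. A small sketch of that mapping (the function name is hypothetical):

```python
from enum import Enum


class ClusterState(Enum):
    NONE = "None"
    NORMAL = "Normal"
    FAILEDOVER = "Failedover"


def cluster_state(node_count: int, publisher_is_primary: bool) -> ClusterState:
    """Map cluster size and the Publisher's role to the displayed state."""
    if node_count <= 1:
        return ClusterState.NONE  # only one node in the cluster
    if publisher_is_primary:
        return ClusterState.NORMAL
    return ClusterState.FAILEDOVER  # a Subscriber is acting as primary
```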

Manual failover

[Figure: Node A (Primary) and Node B (Secondary). An admin client triggers failover with make_primary(B); the singletons will be started on Node B.]


Cluster management

Manual failback

[Figure: Node A (Acting Secondary) and Node B (Acting Primary). An admin client triggers failback with make_primary(A); the singletons will be started on Node A.]
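Manual failover and failback are the same operation, make_primary, pointed at different nodes. The sketch below is an in-memory model only, not the SRM implementation; `Node` and the singleton names are illustrative, and it demotes the other nodes first so singletons never run in two places.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Examples of singletons named on the terminology slide.
SINGLETONS = ("Notifier", "MTA", "SysAgent tasks")


@dataclass
class Node:
    name: str
    role: str                                    # "primary" or "secondary"
    running: Set[str] = field(default_factory=set)


def make_primary(nodes: Dict[str, Node], target: str) -> None:
    """Move the primary role, and the singletons, to the target node."""
    if target not in nodes:
        raise KeyError(target)
    # Demote every other node first so singletons never run in two places.
    for name, node in nodes.items():
        if name != target:
            node.role = "secondary"
            node.running -= set(SINGLETONS)
    new_primary = nodes[target]
    new_primary.role = "primary"
    new_primary.running |= set(SINGLETONS)
```

Failover is `make_primary(nodes, "B")`; failback is `make_primary(nodes, "A")`.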

Manual Deactivate

Deactivating a server stops all critical services and base services on it.

Database replication continues in the deactivated state.

Only secondary servers can be deactivated.

The Administration and Serviceability GUIs are available in the deactivated state.

This state is used for maintenance: all calls and web user interactions are directed to the other server.

A deactivated server can be activated to return it to service (as shown).
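The deactivate/activate rules can be captured in a small state model: only a secondary may be deactivated, its services stop, replication keeps running, and it stops taking traffic until activated. `Server` and its attributes are hypothetical names for illustration.

```python
class Server:
    """Hypothetical model of the activate/deactivate rules on this slide."""

    def __init__(self, name: str, role: str) -> None:
        self.name = name
        self.role = role              # "primary" or "secondary"
        self.state = "active"         # "active" or "deactivated"
        self.services_running = True  # critical and base services
        self.replicating = True       # DB replication

    def deactivate(self) -> None:
        if self.role != "secondary":
            raise ValueError("only a secondary server can be deactivated")
        self.state = "deactivated"
        self.services_running = False
        # DB replication deliberately keeps running in this state.

    def activate(self) -> None:
        self.state = "active"
        self.services_running = True

    def accepts_traffic(self) -> bool:
        # Calls and web users go to the other server while deactivated.
        return self.state == "active"
```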


Manual activate

Auto failover

[Figure: Node A (Primary), Node B (Secondary), and ServM. A critical service failure triggers failover from Node A to Node B.]

Acting-Primary failure

[Figure: Node A (Acting Secondary) and Node B (Acting Primary). Node B crashes, and Node A tries to become primary.]

CPCA servlet failure and redirection

[Figure: Node A and Node B each run CPCA. When the CPCA servlet fails on one node, the CPCA web client is redirected to the other node.]

Tomcat failure and DNS resolution

[Figure: Node A and Node B each run Tomcat. When Tomcat fails on one node, the web client uses a DNS lookup (name resolution) to reach the other node.]

Reasons for failover

• Failover can be caused by a 30-second heartbeat failure.

• Failover can also be initiated manually.

• Conditions for auto-failover:
  – A critical process cannot be started or fails (SRM, ServM, DB, DbEventPublisher, CuCsMgr, CuMixer, Notifier, etc.).
  – Too many restarts in some interval. For example, CuCsMgr is allowed a single restart, but 3 deaths in 5 or 10 minutes might exceed the threshold.

• Non-critical processes do not cause failover; ServM restarts them on the same server.
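The failover conditions above can be sketched as one decision object: a heartbeat timeout, immediate failover when a critical process cannot start, and a sliding-window restart threshold for critical process deaths. The 30-second timeout comes from the slide; the 3-deaths-in-5-minutes threshold is an assumption based on the CuCsMgr example, and `FailoverMonitor` is a hypothetical name, not the SRM's actual logic.

```python
from collections import deque

HEARTBEAT_TIMEOUT_S = 30.0  # from the slide: 30 sec heartbeat failure
MAX_DEATHS = 3              # assumption, after "maybe 3 deaths in 5 or 10 min"
WINDOW_S = 5 * 60.0         # assumption: 5-minute window (could be 10)

# Critical processes named on the slide; the real list is longer ("etc.").
CRITICAL = {"SRM", "ServM", "DB", "DbEventPublisher", "CuCsMgr", "CuMixer", "Notifier"}


class FailoverMonitor:
    def __init__(self, now: float) -> None:
        self.last_heartbeat = now
        self.deaths = {}  # process name -> deque of death timestamps

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def heartbeat_lost(self, now: float) -> bool:
        """True once the peer has been silent for the heartbeat timeout."""
        return now - self.last_heartbeat >= HEARTBEAT_TIMEOUT_S

    def on_start_failure(self, name: str) -> str:
        """A critical process that cannot be started triggers failover."""
        return "failover" if name in CRITICAL else "restart"

    def on_process_death(self, name: str, now: float) -> str:
        """Decide between a local restart (by ServM) and failover."""
        if name not in CRITICAL:
            return "restart"  # non-critical: ServM restarts it on the same box
        window = self.deaths.setdefault(name, deque())
        window.append(now)
        # Forget deaths older than the sliding window.
        while now - window[0] > WINDOW_S:
            window.popleft()
        return "failover" if len(window) >= MAX_DEATHS else "restart"
```

A sliding window lets an occasional crash be absorbed by a local restart while a crash loop escalates to failover.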

Failover

When the primary fails:

• Any existing calls or IP traffic to the primary will likely be lost.
• SRM on the secondary detects the failure and updates the status in the DB.
• SRM on the secondary instructs ServM to start the singleton processes.
• The switch/PIMG/DNS determines the failover condition and routes incoming call traffic to the secondary server.
• If DNS is used, CPCA/IMAP traffic is sent to the secondary.
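The SRM's part of this sequence, recording the failure in the DB and then having ServM start the singletons, is sketched below. `ClusterDb`, `ServM`, and `Srm` are hypothetical stand-ins for illustration; call rerouting by the switch, PIMG, or DNS happens outside UC and is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SINGLETONS = ["Notifier", "MTA"]  # examples of singletons from the earlier slides


@dataclass
class ClusterDb:
    status: Dict[str, str] = field(default_factory=dict)  # node -> role/state


@dataclass
class ServM:
    started: List[str] = field(default_factory=list)

    def start(self, process: str) -> None:
        self.started.append(process)


@dataclass
class Srm:
    node: str   # this (secondary) node
    peer: str   # the primary being monitored
    db: ClusterDb
    servm: ServM

    def on_peer_failure(self) -> None:
        # 1. Record the failure and the role change in the DB.
        self.db.status[self.peer] = "failed"
        self.db.status[self.node] = "acting-primary"
        # 2. Ask the local ServM to start the singleton processes.
        for process in SINGLETONS:
            self.servm.start(process)
```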

Two Generals’ Problem (split-brain)

Causes:
• Unreliable communication link between the primary and secondary
• Byzantine failure of SRM

Effect: The secondary thinks the primary is dead and assumes the “acting-primary” role, while the primary continues to operate.

Issue: DB updates continue on both the primary and the secondary after failover.

Solution: Split Brain Resolution (SBR), done automatically.

Troubleshooting – Tip 1

CLI: show cuc cluster status shows the current status of the cluster.

Member ID 0 is the Publisher (that is, the first node).

Exactly one server must be in the primary role.

If both servers are primary, they are not talking to each other. Check that the server hostnames are correct and that the servers can communicate.
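The rules from this tip can be turned into a quick check. The sketch assumes the rows of `show cuc cluster status` have already been parsed into `Member` tuples; the deck does not show the command's output format, so the field names and role strings here are hypothetical.

```python
from typing import List, NamedTuple


class Member(NamedTuple):
    member_id: int
    hostname: str
    role: str  # hypothetical values such as "Primary" or "Secondary"


def check_cluster(members: List[Member]) -> List[str]:
    """Return a list of problems; an empty list means the status looks sane."""
    problems = []
    primaries = [m for m in members if m.role.lower() == "primary"]
    if not primaries:
        problems.append("no server is in the primary role")
    elif len(primaries) > 1:
        names = ", ".join(m.hostname for m in primaries)
        problems.append(f"multiple primaries ({names}): servers are not "
                        "talking to each other; check hostnames and connectivity")
    if not any(m.member_id == 0 for m in members):
        problems.append("no member with ID 0 (publisher) listed")
    return problems
```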

Tip 2 – Check certain required services

Check that these services are running on both servers:
– Server Role Manager
– Conversation Manager and Mixer
– File Sync
– DB Event Publisher

Check that these services are running on the primary server:
– Notifier
– Message Transfer Agent
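The checklist above can be expressed as a small comparison between the required services for a role and the services actually running. How you collect the running list (for example from the Serviceability GUI) is left open; the service names are taken from the slide.

```python
from typing import Iterable, List

# Required on both servers (from the slide).
BOTH_SERVERS = {"Server Role Manager", "Conversation Manager", "Mixer",
                "File Sync", "DB Event Publisher"}
# Required only on the primary (from the slide).
PRIMARY_ONLY = {"Notifier", "Message Transfer Agent"}


def missing_services(role: str, running: Iterable[str]) -> List[str]:
    """Required services for this role that are not in the running list."""
    required = set(BOTH_SERVERS)
    if role == "primary":
        required |= PRIMARY_ONLY
    return sorted(required - set(running))
```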

Tip 3 – Log files

Check the Server Role Manager (SRM) logs for cluster issues:
– /var/opt/cisco/connection/log/diag_CuSrm_*.uc

From RTMT, select the component “Connection Server Role Manager” to download the SRM log.

Logs are not replicated, so check them on both servers.
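Because each server keeps its own SRM logs, a helper that lists and searches the downloaded files per server saves some time. This sketch assumes the files were downloaded through RTMT into a local directory per server; the file-name pattern comes from the slide, while the search string is whatever you are looking for (the deck does not document the log format).

```python
import glob
import os
from typing import List, Tuple


def srm_logs(log_dir: str) -> List[str]:
    """SRM diagnostic logs (diag_CuSrm_*.uc) in log_dir, newest first."""
    pattern = os.path.join(log_dir, "diag_CuSrm_*.uc")
    return sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)


def grep_logs(paths: List[str], needle: str) -> List[Tuple[str, int, str]]:
    """(file name, line number, line) for every line containing needle."""
    hits = []
    for path in paths:
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                if needle in line:
                    hits.append((os.path.basename(path), lineno, line.rstrip("\n")))
    return hits
```

Run it once per server directory (for example one folder for each node) and compare the timelines.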

Upgrading a cluster

The upgrade process is very similar to UC 2.x:

1. Upgrade the first node (primary). Do not switch over.
2. Upgrade the second node (secondary). Do not switch over.
3. At a convenient time, switch over the first node.
4. After the first node's switchover succeeds, switch over the second node.
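The ordering rules in these steps can be checked mechanically before a maintenance window. A minimal sketch, assuming a plan is written as a list of `(action, node)` pairs with the hypothetical labels `"upgrade"`/`"switch"` and `"first"`/`"second"`:

```python
from typing import List, Optional, Tuple


def validate_upgrade_plan(steps: List[Tuple[str, str]]) -> Optional[str]:
    """Return the first ordering violation, or None if the plan is valid."""
    upgraded, switched = set(), set()
    for i, (action, node) in enumerate(steps):
        if action == "upgrade":
            if node == "second" and "first" not in upgraded:
                return f"step {i}: upgrade the first node before the second"
            upgraded.add(node)
        elif action == "switch":
            if upgraded != {"first", "second"}:
                return f"step {i}: upgrade both nodes before any switchover"
            if node == "second" and "first" not in switched:
                return f"step {i}: switch over the first node before the second"
            switched.add(node)
        else:
            return f"step {i}: unknown action {action!r}"
    return None
```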

Future Enhancements

Site Redundancy – Active/Passive

Differences from A/A:
• Deployment model:
  – No load balancing

[Figure: Active/Passive deployment across a WAN. UC-A (active, Primary) and UC-B (passive, Secondary) each show DB, CsMgr, SRM, ServM, Singletons, and Other..., and exchange a heartbeat and DB/file replication. Web clients (HTTP/IMAP): auto failover with DNS, no load balancing. CCM calls (SCCP, or SIP via DNS SRV): auto failover, no load balancing. A call or request is delivered to the secondary only if the primary fails.]

Multi-server Cluster (N > 2)

The current approach implies a single primary that manages the singletons and UnityMbxDb updates.

This means 1 primary + N secondaries in an N + 1 scenario.

When failover happens, one of the N secondary servers assumes the acting-primary role based on predefined criteria.
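The deck leaves the "pre-defined criteria" open. As a hypothetical example only, the sketch elects the healthy, reachable secondary with the lowest member ID; any deterministic rule works, as long as every node evaluates it the same way and reaches the same answer.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    member_id: int
    healthy: bool       # all critical services up (assumed input)
    heartbeat_ok: bool  # reachable by its peers (assumed input)


def elect_acting_primary(secondaries: List[Candidate]) -> Optional[int]:
    """Hypothetical rule: lowest member ID among healthy, reachable secondaries."""
    eligible = [c for c in secondaries if c.healthy and c.heartbeat_ok]
    if not eligible:
        return None  # nobody qualifies; no acting primary
    return min(eligible, key=lambda c: c.member_id).member_id
```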


Q&A
