@itcampro / #itcampro Premium conference on Microsoft’s Dev and ITPro technologies
High Availability for Exchange 2010
Paul Roman, MVP
Managing Partner, PRAS Consulting
E-mail: [email protected]
Blog: paulroman.pras.ro
IT Camp 2011
• Thanks for coming!
• ITCamp is made possible by our sponsors:
Session agenda
• Discuss different HA design dimensions:
– Infrastructure design
– Database Availability Group design
– Client experiences
• Implementation Examples
• Q&A
• Feedback & prizes
INFRASTRUCTURE DESIGN
How should you design your IT infrastructure for Exchange HA?
Infrastructure Design: Active Directory Sites
• Active Directory site assignment controls the association of CAS to Mailbox and Hub to Mailbox
– CAS/HUB service local mailbox servers, “mostly”
– Could be for multiple DAGs
• DAGs can span subnets without special action
– IP address for each MAPI subnet used by DAG
– Configured on DAG object
• Question: When would an AD site span datacenters?
– Answer: When the datacenters have LAN-quality communication
• Follow Active Directory guidance for AD site definition
Infrastructure Design: Cross-Datacenter Network Configuration
• For site resilience configurations use DHCP to assign addresses for replication network
– Enables delivery of the typically required static routes
– If using static IP addresses, use netsh instead of route for configuring static routes
• In terms of latency requirements, Exchange 2010 was designed with a target round-trip latency of 250ms or less
– Remember, the higher the latency, the more impact to replication
• Configure a DNS TTL on “service access connection records” that is consistent with your SLA
– E.g. ~5 minutes for a one hour RTO SLA
– Direct association between this time and recovery
– Remember the records might be in different zones!
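The TTL guidance can be checked with simple arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and it assumes that in the worst case a client keeps resolving the old address for one full TTL after the DNS record is changed. The 5 minute and one hour figures are the ones from the slide.

```python
def ttl_share_of_rto(ttl_seconds: int, rto_seconds: int) -> float:
    """Fraction of the RTO budget that worst-case DNS caching can consume."""
    if ttl_seconds < 0 or rto_seconds <= 0:
        raise ValueError("ttl_seconds must be >= 0 and rto_seconds must be > 0")
    return ttl_seconds / rto_seconds


def remaining_recovery_seconds(ttl_seconds: int, rto_seconds: int) -> int:
    """Time left for detection, decision and switchover after one full TTL."""
    return max(rto_seconds - ttl_seconds, 0)


# Figures from the slide: ~5 minute TTL against a one hour RTO.
share = ttl_share_of_rto(5 * 60, 60 * 60)
left = remaining_recovery_seconds(5 * 60, 60 * 60)
print(f"TTL uses {share:.1%} of the RTO, leaving {left // 60} minutes")
```

Because the records may live in different zones, the same check would have to be repeated for the TTL of each zone involved.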
Infrastructure Design: Namespace Planning (Site Resilience)
• Each datacenter should be considered active when planning for namespaces
• Each datacenter needs the following namespaces
– OWA/OA/EWS/EAS namespace
– POP/IMAP namespace
– RPC Client Access namespace
– SMTP namespace
• In addition, one of the datacenters will maintain the Autodiscover namespace
Infrastructure Design: Leverage Split-brain DNS
• Best Practice: Use “Split DNS” for Exchange hostnames used by clients
• Goal: minimize number of hostnames
– mail.contoso.com for Exchange connectivity on intranet and Internet
– mail.contoso.com has different IP addresses in intranet/Internet DNS
• Important: before moving down this path, be sure to map out all the host names (outside of Exchange) that you will want to create in the internal zone
Infrastructure Design: What does the namespace design look like?
[Diagram: Datacenter 1 and Datacenter 2, each with AD, CAS, HT and MBX servers and its own namespace]
• Datacenter using the contoso.com namespace
– Internal DNS: Mail.contoso.com, Pop.contoso.com, Imap.contoso.com, Autodiscover.contoso.com, Smtp.contoso.com, Outlook.contoso.com
– External DNS: Mail.contoso.com, Pop.contoso.com, Imap.contoso.com, Autodiscover.contoso.com, Smtp.contoso.com
– ExternalURL = mail.contoso.com, CAS Array = outlook.contoso.com, OA endpoint = mail.contoso.com
• Datacenter using the region.contoso.com namespace
– Internal DNS: Mail.region.contoso.com, Pop.region.contoso.com, Imap.region.contoso.com, Smtp.region.contoso.com, Outlook.region.contoso.com
– External DNS: Mail.region.contoso.com, Pop.region.contoso.com, Imap.region.contoso.com, Smtp.region.contoso.com
– ExternalURL = mail.region.contoso.com, CAS Array = outlook.region.contoso.com, OA endpoint = mail.region.contoso.com
Infrastructure Design: Certificate Planning
• Best practice: minimize the number of certificates
– 1 certificate for all CAS servers + reverse proxy + Edge/Hub
– Use “Subject Alternative Name” (SAN) certificate which can cover multiple hostnames
• If leveraging a certificate per datacenter, then ensure that the Certificate Principal Name is the same on all certificates
– Outlook Anywhere won’t connect if the Principal Name on the certificate does not match the value configured in msstd: (default matches OA RPC End Point)
– Set-OutlookProvider EXPR -CertPrincipalName msstd:mail.contoso.com
Infrastructure Design: Site Resilience Models
• There are two key models you have to take into account when designing site resilient solutions
– Datacenter / Namespace Model
– User Distribution Model
• As mentioned, when planning for site resilience, each datacenter needs to be considered active
Infrastructure Design: User Distribution Models
• The locality of the users will ultimately determine your site resilience architecture
– Are users primarily located in one datacenter?
– Are users located in multiple datacenters?
– Is there a requirement to maintain user population in a particular datacenter?
• Active/Passive user distribution model
– Database copies deployed in the secondary datacenter, but no active mailboxes are hosted there
• Active/Active user distribution model
– User population dispersed across both datacenters with each datacenter being the primary datacenter for its specific user population
Infrastructure Design: Client Access Arrays
• 1 CAS array per AD site
– Multiple DAGs within an AD site can use the same CAS array
• FQDN of the CAS array needs to resolve to a load-balanced virtual IP address in DNS
– Should only resolve in internal DNS structure
– CAS Array does not provide any load balancing -> you need a load balancer!
• Set the databases in the AD site to utilize CAS array via Set-MailboxDatabase RPCClientAccessServer property
• By default, new databases will have the RPCClientAccessServer value set on creation
– If database was created prior to creating CAS array, then it is set to random CAS FQDN (or local machine if role co-location)
– If database is created after creating CAS array, then it is set to the CAS array FQDN
DATABASE AVAILABILITY GROUP DESIGN
How should you design your DAGs?
DAG Design: Database Copies
• Each DAG member can host 1 copy of each mailbox database
• Maximum number of databases within a 16 member DAG, by number of copies per database:
– 1 copy: 1600 databases
– 2 copies: 800 databases
– 3 copies: 533 databases
• Two types of database copies
– HA database copies
– Lagged database copies
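The 1600 / 800 / 533 figures can be reproduced with a short calculation. This is a minimal sketch: the function name is hypothetical, and the limit of 100 database copies per server is an assumption that is not stated on the slide (it is the Exchange 2010 Enterprise Edition per-server database limit).

```python
MAX_DAG_MEMBERS = 16
# Assumption (not on the slide): each server can host at most 100 database copies.
MAX_COPIES_PER_SERVER = 100


def max_databases(copies_per_database: int, members: int = MAX_DAG_MEMBERS) -> int:
    """Largest number of databases a DAG can hold for a given copy count."""
    if not 1 <= copies_per_database <= members:
        # Each member hosts at most one copy of a given database.
        raise ValueError("copies_per_database must be between 1 and the member count")
    total_copy_slots = members * MAX_COPIES_PER_SERVER
    return total_copy_slots // copies_per_database


for copies in (1, 2, 3):
    print(f"{copies} copies per database: {max_databases(copies)} databases")
```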
DAG Design: Lagged Database Copies
• Lagged copies are only for point-in-time protection
– Logical corruption and/or mailbox deletion prevention scenarios
– Provide a maximum of 14 days protection
• When should you deploy a lagged copy?
– Useful only to mitigate a risk
– Not needed if deploying a backup solution (e.g. DPM 2010)
• Lagged copies are not HA database copies
– Lagged copies should never be activated!
• Lagged copies have storage implications
DAG Design: Controlling Database Copy Activation
• Various scenarios:
– Don’t want to activate database copies on servers in standby because…
– Want to preclude activation of copies on server X because of hardware issue or lagged copies…
– Block activation of database copies on a server during upgrade
• Two ways to block activation of copies
– Set-MailboxServer <Server> -DatabaseCopyAutoActivationPolicy <Blocked,IntrasiteOnly,Unrestricted>
– Suspend-MailboxDatabaseCopy <DB\Server> -ActivationOnly
DAG Design: Sizing
• Question: How many members should be in a DAG?
– Answer: It depends (maximum would be 16)
• The larger the DAG, the better the resiliency
– Consider the implications of a three copy / six server DAG vs. two DAGs with three servers and three copies of each database
– Larger DAGs continue to provide as much service as they can after more failures
• The larger the DAG, the better the efficiency of the hardware
– Distribute active load across all members
• For server count, consider a multiple of the number of copies you are deploying
DAG Design: Sizing
• Question: How many DAGs should I deploy?
– Answer: It depends
• Obviously you will need to deploy multiple DAGs if you need more than 16 servers
• You may also need multiple DAGs depending on your site resilience architecture
– If deploying an Active/Active user distribution architecture, then you should consider deploying 2+ DAGs
– Allows you to control locality and not perform a site activation in the event of a network failure between datacenters
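DAG Design: Active/Active User Distribution Sizing
[Diagram: a single DAG (DAG1, witness DAG1FSW) spanning the Primary Datacenter and the Secondary Datacenter, with mailbox servers MBX-A, MBX-B, MBX-C and MBX-D, CAS-Pri, CAS-Sec, two HT2010 servers and Outlook clients; copies labeled Active in both datacenters]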
DAG Design: Active/Active User Distribution Sizing
[Diagram: two DAGs spanning the Primary Datacenter and the Secondary Datacenter. DAG1 (witness DAG1FSW) with MBX-A, MBX-B, MBX-C and MBX-D; DAG2 (witness DAG2FSW) with MBX-E, MBX-F, MBX-G and MBX-H; CAS-Pri, CAS-Sec, two HT2010 servers and Outlook clients; copies labeled Active, Active, Passive, Passive]
DAG Design: Two Failure Models
• Design for all database copies activated
– Design for the worst case: server architecture handles 100 percent of all hosted database copies becoming active
• Design for targeted failure scenarios
– Design server architecture to handle the active mailbox load during the worst failure case you plan to handle
• 1 member failure requires 2 or more HA copies and 2 or more servers
• 2 member failure requires 3 or more HA copies and 4 or more servers
– Requires Set-MailboxServer <Server> -MaximumActiveDatabases <Number>
DAG Design: It’s all in the layout
• Consider this scenario
– 8 servers, 40 databases with 2 copies
Server 1 Server 2 Server 3 Server 4 Server 5 Server 6 Server 7 Server 8
DB1 DB6 DB11 DB16 DB21 DB26 DB31 DB36
DB2 DB7 DB12 DB17 DB22 DB27 DB32 DB37
DB3 DB8 DB13 DB18 DB23 DB28 DB33 DB38
DB4 DB9 DB14 DB19 DB24 DB29 DB34 DB39
DB5 DB10 DB15 DB20 DB25 DB30 DB35 DB40
DB36’ DB31’ DB26’ DB21’ DB16’ DB11’ DB6’ DB1’
DB37’ DB32’ DB27’ DB22’ DB17’ DB12’ DB7’ DB2’
DB38’ DB33’ DB28’ DB23’ DB18’ DB13’ DB8’ DB3’
DB39’ DB34’ DB29’ DB24’ DB19’ DB14’ DB9’ DB4’
DB40’ DB35’ DB30’ DB25’ DB20’ DB15’ DB10’ DB5’
DAG Design: It’s all in the layout
• If I have a single server failure
– Life is good: in the 8 server layout above, every database on the failed server still has its second copy on another server
DAG Design: It’s all in the layout
• If I have a double server failure
– Life could be good: if the two failed servers hold no copies of the same databases (for example Server 1 and Server 2 in the 8 server layout above), every database keeps one copy
DAG Design: It’s all in the layout
• If I have a double server failure
– Life could be bad: if the two failed servers hold both copies of the same databases (for example Server 1 and Server 8 in the 8 server layout above), those databases have no copy left
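The good and bad double failures in the 8 server layout can be enumerated mechanically. The following is a sketch under stated assumptions: the function names are illustrative, and the layout is rebuilt from the pattern visible in the table (Server s holds DB(5s-4) to DB(5s) plus the second copies of Server 9 - s).

```python
from itertools import combinations

SERVERS = 8
ACTIVE_PER_SERVER = 5


def databases_of(server: int) -> set[int]:
    """Databases whose first copy sits on the given server (DB1..DB5 on Server 1, ...)."""
    first = (server - 1) * ACTIVE_PER_SERVER + 1
    return set(range(first, first + ACTIVE_PER_SERVER))


def build_layout() -> dict[int, set[int]]:
    """Map each server to every database it holds a copy of."""
    layout = {}
    for server in range(1, SERVERS + 1):
        # In the table, Server s holds the second copies of Server (9 - s).
        mirror = SERVERS + 1 - server
        layout[server] = databases_of(server) | databases_of(mirror)
    return layout


def lost_databases(layout: dict[int, set[int]], failed: set[int]) -> set[int]:
    """Databases with no copy left on a surviving server."""
    everything = set().union(*layout.values())
    surviving = set()
    for server, dbs in layout.items():
        if server not in failed:
            surviving |= dbs
    return everything - surviving


layout = build_layout()
single_losses = [lost_databases(layout, {s}) for s in layout]
bad_pairs = {}
for pair in combinations(layout, 2):
    lost = lost_databases(layout, set(pair))
    if lost:
        bad_pairs[pair] = lost

print("single failures that lose databases:", sum(1 for lost in single_losses if lost))
for pair, lost in bad_pairs.items():
    print(f"servers {pair}: {len(lost)} databases unavailable")
```

With this mirrored layout, only the four mirrored pairs are fatal, but each of them takes out ten databases at once, which is the argument for spreading second copies across more servers.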
DAG Design: It’s all in the layout
• Now let’s consider this scenario
– 4 servers, 12 databases with 3 copies
– With a single server failure: every database still has two copies on the remaining servers
– With a double server failure: every database still has at least one copy on the remaining servers
Server 1 Server 2 Server 3 Server 4
DB1 DB4 DB7 DB10
DB2 DB5 DB8 DB11
DB3 DB6 DB9 DB12
DB4’’ DB1’ DB2’’ DB1’’
DB5’’ DB3’’ DB3’ DB2’
DB6’ DB7’’ DB4’ DB5’
DB7’ DB8’ DB10’’ DB6’’
DB9’’ DB11’ DB11’’ DB8’’
DB10’ DB12’’ DB12’ DB9’
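For the 4 server layout, the more interesting question is how the active load lands on the survivors. The sketch below transcribes the table and assumes, purely for illustration, that a failed database is activated on its ’ copy first and its ’’ copy second; real activation decisions depend on more than this ordering, and the names used here are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Database -> servers holding its copies, ordered: active, ’ copy, ’’ copy
# (transcribed from the 4 server table; the ordering rule is an assumption).
COPIES = {
    1: (1, 2, 4), 2: (1, 4, 3), 3: (1, 3, 2),
    4: (2, 3, 1), 5: (2, 4, 1), 6: (2, 1, 4),
    7: (3, 1, 2), 8: (3, 2, 4), 9: (3, 4, 1),
    10: (4, 1, 3), 11: (4, 2, 3), 12: (4, 3, 2),
}


def active_load(failed: set[int]) -> Counter:
    """Count active databases per surviving server after the given failures."""
    load = Counter()
    for db, servers in COPIES.items():
        survivors = [s for s in servers if s not in failed]
        if not survivors:
            raise RuntimeError(f"DB{db} has no surviving copy")
        load[survivors[0]] += 1
    return load


single = active_load({1})
double = active_load({1, 2})
worst = max(
    max(active_load(set(pair)).values())
    for pair in combinations(range(1, 5), 2)
)

print("after losing Server 1:", dict(single))
print("after losing Servers 1 and 2:", dict(double))
print("busiest survivor over all double failures:", worst)
```

Under these assumptions the busiest surviving server carries 7 active databases, which is the kind of figure to compare against the value chosen for -MaximumActiveDatabases.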
DAG Design: It’s all in the layout – Over Subscription
• If you plan to over subscribe the servers then:
– Don’t plan to be perfect!
– Set soft threshold for number of active databases per server
• In some circumstances databases will fail to mount because of limit
– Put processes in place for redistributing databases per server
• After hardware maintenance
• After software maintenance
• Periodically, because of random failures
– SP1 includes a script to provide automated load balancing
DAG Design: It’s all in the layout – Over Subscription
• If you plan to over subscribe the servers then:
– Educate your operations team on implication of over subscription
– Periodically validate you are not too over subscribed
• Run in your worst case scenario for a period of time
– Have a plan on how you handle being too over subscribed
• Reminders:
– Design storage subsystems to handle all database copy I/O and capacity
– Design CPU and memory to handle the max active database copies and the passive copies
– Design memory to handle the max active database copies
– Design network subsystem to handle the throughput required to sustain the active load, the number of target copies, and CI updates
DAG Design: It’s all in the layout
• Consider physical hardware situations where practical (JBOD in particular)
– If servers in DAG are in multiple racks then spread copies across racks
– If servers are in different rooms in datacenter then factor that into distribution
– If servers reside on the same network switch/router, then a network failure can take out multiple servers
– In summary, minimize possible single points of failure
DAG Design: Storage Architecture
• Deployment on RAID or JBOD will be based on several factors
– Cost
– Hardware
– Number of copies
– Types of copies
– Single or multi-datacenter
DAG Design: Storage Architecture
Copy configuration | Primary Datacenter Servers | Secondary Datacenter Servers
2 HA Copies (Total) | RAID | RAID
3+ HA Copies (Total) | RAID or JBOD | RAID
2+ HA Copies / Datacenter | RAID or JBOD | RAID or JBOD
1 Lagged Copy | RAID | RAID
2+ Lagged Copies / Datacenter | RAID or JBOD | RAID or JBOD
DAG Design: Replication Concerns
• Replication is always from source to target
– Remember if you have multiple copies in a remote datacenter, you will have multiple log streams being shipped across the wire
• Exchange 2010 offers compression for log shipping
– Controllable setting for the DAG
– Default is inter-subnet
– MSIT sees 30% compression, but can vary for each customer based on message profile
• Also have to factor in content indexing
– While an index exists for every copy, the index for a passive copy is updated by getting changes from active copy’s index
– This communication is not compressed
• How do I size for replication and content indexing impact?
– Use the Exchange 2010 Mailbox Server Role Requirements Calculator
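For a first feel of the numbers before opening the calculator, the log shipping part can be estimated by hand. This is a rough sketch, not a replacement for the calculator: the function name and the log generation rate are hypothetical, the 30% figure is read as a 30% size reduction, and content indexing traffic (which is not compressed) is left out. The only product fact relied on is that Exchange 2010 transaction log files are 1 MB each.

```python
LOG_FILE_MB = 1  # Exchange 2010 transaction log files are 1 MB each


def wan_log_traffic_mbps(logs_per_hour: float, remote_copies: int,
                         compression_ratio: float = 0.30) -> float:
    """Average Mbit/s of log shipping traffic sent to a remote datacenter.

    Each remote copy receives its own log stream from the source, so the
    traffic scales with the number of remote copies.
    """
    if logs_per_hour < 0 or remote_copies < 0:
        raise ValueError("logs_per_hour and remote_copies must not be negative")
    if not 0 <= compression_ratio < 1:
        raise ValueError("compression_ratio must be in [0, 1)")
    mb_per_hour = logs_per_hour * LOG_FILE_MB * remote_copies
    compressed_mb_per_hour = mb_per_hour * (1 - compression_ratio)
    return compressed_mb_per_hour * 8 / 3600


# Hypothetical peak: 3600 logs per hour, two copies in the remote datacenter.
estimate = wan_log_traffic_mbps(logs_per_hour=3600, remote_copies=2)
print(f"about {estimate:.1f} Mbit/s of compressed log traffic")
```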
DAG Design: Replication Networks
• Single network DAG members fully supported
– Recommendation: have minimum of two networks on each member server
• Initial DAG network configuration is based on the enumeration of cluster networks
– Cluster enumerates networks based on subnet
– One cluster network is created for each subnet / port
– Recommendation: Collapse into single MAPI and Replication DAG networks
• MAPI network may be replication disabled
– Network will be utilized for replication if no other valid replication path exists
• There is no preference order to replication networks
– Chosen at random by Replication service
DAG Design: Small Scale Architectures
• Small scale / branch office architectures that require high availability
– 2-4 servers typically
– Requires Windows Server Enterprise Edition
• There are many different options:
Option | Hardware | Licensing
2 physical servers (all-in-one)* | Requires hardware load balancer | Fewer licenses
2 physical server architecture utilizing Hyper-V (role separation via VMs)* | Less hardware | More Exchange licenses
4 physical servers (role separation: 2 MBX, 2 HT/CAS) | More hardware | More Exchange and Windows licenses
CLIENT EXPERIENCES
How should you design your DAGs
Client Experiences: Typical Outlook Behavior
• All Outlook versions behave consistently in a single datacenter HA scenario
– Profile points to Client Access Server array
– Profile is unchanged by failovers or loss of CAS
• All Outlook versions should behave consistently in a datacenter failover scenario
– Primary datacenter Client Access Server DNS name is bound to IP address of standby datacenter’s Client Access Server
– Autodiscover continues to hand out primary datacenter CAS name as Outlook RPC endpoint
– Profile remains unchanged
Client Experiences: Cross-Site DB Failover Redirect (Outlook Versions)
[Diagram: a DAG spanning the Primary Datacenter and the Secondary Datacenter, with MBX-A, MBX-B, MBX-C and MBX-D, CAS-Pri, CAS-Sec, two HT2010 servers and Outlook 2003, Outlook 2007 and Outlook 2010 clients; key: Active, Passive]
• Preferred Database Site = PDC (RPCClientAccessServer = CAS-PRI)
• Cross Site Connections = Not Allowed
• Outlook 2007 and Outlook 2010: Autodiscover detects profile change and updates client
• Outlook 2003 updates due to ecWrongServer
• Outlook 2003 can’t update if source CAS is unavailable
Client Experiences: Other Clients
• Other client behavior varies per technology and scenario:
Client | In-Site *Over Scenario | Out-of-Site *Over Scenario | Datacenter Switchover
OWA | Reconnect | Manual Redirect | Reconnect
Active Sync | Reconnect | Redirect or proxy | Reconnect
POP/IMAP | Reconnect | Proxy | Reconnect
EWS | Reconnect | Autodiscover | Reconnect
Autodiscover | N/A | Seamless | Reconnect
SMTP / Powershell | N/A | N/A | Reconnect
IMPLEMENTATION EXAMPLES
Real life implementations
Implementation Examples: Fully Redundant Infrastructure
Implementation Examples: Disaster recovery with lagged copy
Conclusion
• There are many different design dimensions that have to be considered when designing for high availability and site resilience with Exchange 2010
• The choices you will make will determine the number of copies and hardware you deploy
– Design choices should be based on customer requirements
– Exchange 2010 allows you to take advantage of new options which can lower costs
Q&A
Don’t forget!
Get your free Azure pass!
• 30+15 days, no CC req’d
– http://bit.ly/ITCAMP11
– Promo code: ITCAMP11
We want your feedback!
• Win a WP7 smartphone
– Fill in your feedback forms
– Raffle: end of the day