VMworld 2013. Mauricio Barra, VMware; Thomas McQuillan, UnitedHealth Group. Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
vCenter Site Recovery Manager – Solution Overview
and Lessons from a Fortune 500 Health Care
Company Implementation
Mauricio Barra, VMware
Thomas McQuillan, UnitedHealth Group
BCO5733
#BCO5733
Agenda
Context of BC/DR
vCenter Site Recovery Manager 5.5
Licensing, Pricing and Packaging
UnitedHealth Group: An SRM implementation at scale
from a Fortune 500 Company
Context of Business Continuity and Disaster Recovery
Uptime and Protection of Data are Critical for Business
• Revenue: Continuously available services ensure revenue streams
• Productivity: Enables the workforce to work at full capacity
• Compliance: Guarantees responsiveness to auditing entities (SOX, ISO)
• Reputation: Protects relationships with customers and partners
Improving BC/DR Is at the Top of IT Initiatives
Source: Forrester “Server Virtualization Predictions For 2013”, March 2013
Source: Forrester “BC/DR Remain Priorities For 2012 But Take A Backseat To Cost-Saving And Efficiency Initiatives”, October 2011
Among top 5 technology priorities in 2012
• 40% report High Priority
• 20% report Critical Priority
#1 driver for virtualization:
• 57% report it’s “very important” to adopt x86 virtualization
Legacy Disaster Recovery Solutions Are Not Adequate
• Expensive: >$10K per app (software, hosts, storage, facilities)
• Complex recovery plans spanning apps, hosts, storage and network
• Unreliable failovers
Failure to meet business requirements
• Long RTOs – days to weeks
• Too much time and resources consumed
VMware Enables IT Business Continuity at All Levels
Covers planned and unplanned downtime across three levels: Local Application Availability, Site Application Availability, and Data Protection.
• vMotion
• Storage vMotion
• Fault Tolerance
• High Availability
• App HA
• Site Recovery Manager
• DR to the Cloud with SRM
• vSphere Replication
• vSphere Data Protection Advanced
• vSphere APIs for Data Protection (VADP)
vCenter Site Recovery Manager 5.5 Automated DR Management and Orchestration
Key Components of a Disaster Recovery Solution with SRM
• Protected Site: vCenter Server, Site Recovery Manager, vSphere, Storage
• Recovery Site: vCenter Server, Site Recovery Manager, vSphere, Storage
Disaster Recovery: Ensuring recovery or continuation of operations at an alternate site in the case of an outage at the primary site
Replication Options
• vSphere Replication
• Array-Based Replication (3rd party)
Copy Individual Virtual Machines with vSphere Replication
Overview
• Only true hypervisor-based replication for vSphere
• Asynchronous RPOs (15 min to 24 hrs)
• Managed directly from vCenter Server
• Included with vSphere Essentials Plus and above
Benefits
• Reduce replication software costs
• Reduce storage costs using heterogeneous arrays
• Simpler VM-level replication
• SRM integration enables automated DR
[Diagram: vSphere Replication copying VMs from vSphere at Site A (Primary) to vSphere at Site B (Recovery)]
What’s New with vSphere Replication
New Feature: Benefit
• Multiple point-in-time recovery: Choose to revert to a previous ‘known good point’ after failover
• Multiple vSphere Replication appliances per vCenter Server: Enables new topologies with up to 10 vSphere Replication appliances
• Support for Virtual SAN (public beta): Reduce storage costs replicating to and from Virtual SAN
• Support for Storage vMotion and Storage DRS: Leverage this vSphere functionality on VMs being replicated
• New optimizations gain up to 5x speed improvement in replication: Dramatic speed improvement
Site Recovery Manager Delivers Simple and Reliable DR
vCenter Site Recovery Manager
• DR orchestration solution that automates testing and execution of centralized recovery plans
• Leverages vSphere Replication or a broad range of array-based replication solutions
Benefits
• Up to 50% lower TCO for DR
• Set up recovery plans in minutes, not weeks
• Initiate orchestration with one click
• Test as frequently as needed
[Diagram: Site A (Primary) and Site B (Recovery), each with servers, VMware vSphere, VMware vCenter Server and Site Recovery Manager, connected by array-based replication or vSphere Replication]
What’s New with Site Recovery Manager
New Feature: Benefit
• Multiple point-in-time recovery with vSphere Replication: Choose to revert to a previous ‘known good point’ after failover
• Support for Virtual SAN (public beta): Reduce storage costs using Virtual SAN with vSphere Replication
• Support for Storage vMotion and Storage DRS: Leverage this vSphere functionality on VMs being replicated
SRM Transforms Management of Recovery and Migration Plans
From Complex Runbooks…
• Weeks or months to set up recovery plans
• Unstructured and error-prone
• Quickly fall out of sync with app and infrastructure changes
…to Simple Recovery Plans
• Simple setup in minutes
• Defined workflows eliminate errors
• Simple to keep in sync with changes
Frequent Testing Reduces Recovery Risk
Traditional Disaster Recovery
• During the testing gap, organizations can’t be sure that they can recover the current IT environment
• A failover scenario may take days or weeks to complete, leaving the business at extreme risk
• Lack of confidence in DR process
[Chart: recovery risk over time, with a testing gap between DR tests]
Frequent Testing Reduces Recovery Risk
SRM provides assurance that DR objectives will be met.
[Chart: recovery risk over time, comparing traditional disaster recovery (testing gap between DR tests) with Site Recovery Manager (frequent DR testing)]
SRM Automates Every Workflow of DR Orchestration
Non-disruptive Testing
• Automated testing in an isolated network
• Test as frequently as needed for predictable RTOs
Automated Failover
• 1-click initiation
• Automated execution of user-defined recovery plan
Automated Failback
• Automatically re-protect VMs from Site B to Site A
• Reverse original recovery plan
Planned Migrations
• Graceful shutdown of production VMs
• ‘Data sync’ ensures zero data loss
[Diagram: SRM orchestrating replication between the main site and the recovery site]
SRM’s Automation Reduces The Cost of Disaster Recovery
DR Costs per VM per Year
• Manual DR: $2,234 (DR management and testing $1,757; replication $477)
• SRM only: $1,564, -30% (DR management and testing $800; SRM software $288; replication $477)
• SRM + vSphere Replication: $1,087, -21% (DR management and testing $800; SRM software $288)
50% lower DR costs (not factoring cost of downtime)
• Over $1,100 savings per year for each protected VM
• Avg. cost of downtime is $145,000 per hour
• Planned migrations add 5% cost savings
Source: The Total Economic Impact of VMware vCenter Site Recovery Manager, Forrester, May 2013
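The chart's arithmetic can be checked with a short Python sketch. The mapping of each dollar figure to a cost component is an assumption read off the flattened chart, and the names `COSTS`, `totals` and `savings_vs_baseline` are illustrative only. The component sums land within a dollar of the slide's totals ($2,234, $1,564, $1,087), which suggests the slide rounded its inputs.

```python
# Per-VM annual DR cost components in USD, as read off the slide's chart.
# The assignment of figures to components is an assumption, not stated in the source.
COSTS = {
    "Manual DR": {"DR management and testing": 1757, "Replication": 477},
    "SRM only": {"DR management and testing": 800, "SRM software": 288, "Replication": 477},
    "SRM + vSphere Replication": {"DR management and testing": 800, "SRM software": 288},
}


def totals(costs):
    """Sum the components of each scenario."""
    return {name: sum(parts.values()) for name, parts in costs.items()}


def savings_vs_baseline(costs, baseline="Manual DR"):
    """Fractional saving of each scenario relative to the baseline scenario."""
    t = totals(costs)
    base = t[baseline]
    return {name: (base - total) / base for name, total in t.items()}


if __name__ == "__main__":
    for name, total in totals(COSTS).items():
        saving = savings_vs_baseline(COSTS)[name]
        print(f"{name}: ${total:,} per VM per year ({saving:.0%} below manual DR)")
```

Measured against manual DR, the two steps of roughly 30% and 21% add up to the headline figure of about 50%, and the absolute difference exceeds the $1,100 per VM quoted on the slide.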
DR2C Delivers SRM Benefits without Secondary Datacenter
DR to the Cloud with SRM
Cost-efficient DR services:
• Subscription-based
• Shared resources lower cost
• Providers offer variety of pricing, packaging, service levels and deployment options
Partner Ecosystem
[Diagram: main site (vSphere, vCenter Server, SRM) using vSphere Replication to a shared recovery site in the public cloud]
Licensing, Pricing and Packaging
SRM Available a-la-Carte or with vCloud Suite Enterprise
Packaging: A-la-carte
Licensing: Per VM
What is included with each license?
• SRM only
• Two editions – Standard or Enterprise
• Entitlement to protect a certain number of licensed virtual machines
Packaging: vCloud Suite Enterprise
Licensing: Per CPU
What is included with each license?
• SRM, vSphere Ent+ and all the components of vCloud Suite Enterprise
• Entitlement to protect an unlimited number of virtual machines on licensed processors
SRM a-la-Carte Available in Two Editions
Licensing and Pricing
• Per protected virtual machine (license only): Standard $195; Enterprise $495
Scalability Limits
• Maximum protected VMs: Standard 75 VMs (1); Enterprise Unlimited (2)
Features (included in both Standard and Enterprise)
• Centralized recovery plans
• Non-disruptive testing
• Automated DR failover
• Automated failback
• Planned migration
• Array-based replication support
• vSphere Replication support
• Multiple point-in-time recovery with VR (new in SRM 5.5)
• Storage vMotion / Storage DRS support (new in SRM 5.5)
• Virtual SAN (public beta) support (new in SRM 5.5)
1. Maximum of 75 VMs per site and per SRM instance
2. Subject to the product’s technical scalability limits
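As a rough illustration of the a-la-carte pricing above, the following Python sketch computes a license-only cost from the per-VM list prices on the slide. The function name `license_cost` and the table layout are hypothetical. The sketch treats the VM count as a single site's protected VMs and ignores support, discounts and the technical scalability limits mentioned in footnote 2.

```python
# List prices and limits taken from the slide; names and structure are illustrative only.
EDITIONS = {
    "Standard": {"price_per_vm": 195, "max_vms": 75},      # 75 VMs per site and per SRM instance
    "Enterprise": {"price_per_vm": 495, "max_vms": None},  # no licensing cap
}


def license_cost(edition: str, protected_vms: int) -> int:
    """Return the license-only cost in USD for protecting `protected_vms` VMs."""
    if protected_vms < 0:
        raise ValueError("protected_vms must be non-negative")
    spec = EDITIONS[edition]
    limit = spec["max_vms"]
    if limit is not None and protected_vms > limit:
        raise ValueError(f"{edition} is limited to {limit} protected VMs")
    return spec["price_per_vm"] * protected_vms


if __name__ == "__main__":
    print(license_cost("Standard", 75))     # 14625
    print(license_cost("Enterprise", 700))  # 346500
```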
SRM Included in vCloud Suite Enterprise
Updated Q3 2013
Component groups: Cloud Management, Cloud Infrastructure
Editions and price (per CPU, license only): Standard $4,995; Advanced $7,495; Enterprise $11,495
Disaster Recovery Automation: SRM Enterprise (Enterprise edition)
• Automated disaster recovery planning, testing, and execution
Cloud Automation: vCAC Std, vCAC Adv or vCAC Ent, depending on edition
• Application and data services – Application provisioning, changes and data
• Governance – Approvals, reclamation, cost profile and transparency
• Extensibility – Infrastructure integrations, workflows and customizations
• Infrastructure provisioning and management
Operations Management: vCOPS Standard, vCOPS Advanced or vCOPS Enterprise, depending on edition
• Application Monitoring – OS, middleware, databases
• OS-level change, configuration and regulatory compliance management
• Extensibility – Adapters for 3rd party OS and application monitoring tools
• Extensibility – Adapters for 3rd party Infrastructure monitoring tools
• vSphere hardening, change and configuration management
• Application Awareness – Discovery dependency mapping
• Chargeback – Cost metering and reporting
• Operations Dashboard – Health Monitoring and Performance Analytics
• Capacity Management – Planning and Optimization
Virtualized Datacenters: vCD, vCC (all editions)
• Virtualized datacenters and public cloud extensibility
Networking and Security: vCloud Net & Sec (all editions)
• Scalable networking and virtualization-aware security
vSphere Enterprise Plus (all editions)
• Virtualized infrastructure with policy-based automation
UnitedHealth Group An SRM implementation at scale
from a Fortune 500 Company
© 2012 UnitedHealth Group. Any use, copying or distribution without written permission from UnitedHealth Group is prohibited.
Thomas McQuillan
Director IT Architecture
UnitedHealth Group IT
VMware SRM at
UnitedHealth Group
UnitedHealth Group: At a Glance
About Us:
• Serving more than 85 million individuals worldwide with health benefits and services.
• Operations in all 50 states in the United States and 20 other countries worldwide.
• 2012 revenues of $110.6 billion.
• Fortune 500 ranking: No. 17
• Named World’s Most Admired Company in the Insurance and Managed Care sector, 2010, 2011, 2012, by Fortune.
• Member, Dow Jones Industrial Average.
UnitedHealth Group IT: At a Glance
• Data Centers: ~126
• Systems under Management: ~35,000 servers
• VMs under Management: ~29,000 server VMs (~1,200 hosts); ~5,000 VDI
• Storage under Management: 48.2 petabytes (including mainframe); 18.78 petabytes (distributed systems)
• Distributed Systems under DR Management: ~1,200 (distributed systems); ~1.079 PB (1,079 TB)
• VMs under DR Management: 700 (Linux & Windows); 200 TB
UnitedHealth Group: DR Models
Rapid Recovery
• Array Based Model
• Boot from SAN
• RTO < 8 Hrs
• RPO = 0 Min
HOT Recovery
• Backup Protection Model
• RTO = 48-72 Hrs
• RPO < 48 Hrs
Warm Recovery
• Backup Protection Model
• RTO <= 8 weeks
• RPO <= 48 hours
Backup Protected (HOT)
Recovery Methodology:
HOT = Active infrastructure being repurposed for DR recovery.
Not: Active/Standby
RTO = 48 – 72 Hrs
RPO = <48 Hrs
1. Standard Data Protection.
2. Daily maintenance scripting captures VM configuration.
3. VMs are backed up and 2nd copy processes are utilized to move copies of backups to 2nd site.
4. A DR Event is experienced.
5. DR Failover is invoked.
6. Non-Essential VMs are Shut Down and Deleted.
7. Core Infrastructure Reconfig.
8. Backup Proxies reconfig.
9. VMs restored via Backup Proxies.
10. VM NIC configuration script run to restore VM NICs to restored VMs.
11. VMs started.
12. Application Verification.
[Diagram: Site A and Site B, each with vCenterHB and a backup proxy; a “Bad News” marker denotes the DR event]
UnitedHealth Group IT: DR Summary
SUMMARY
Key Findings:
• Rapid Recovery provides desired RTO
• All systems in model.
• Testing requires full environment failover & failback.
• Doesn’t meet Self Service Model.
• HOT Recovery is Time Consuming & Destructive
• All three Recovery Models are labor intensive
• All three Recovery Models require Manual Failover and Failback
A new recovery model is required for virtualization.
UnitedHealth Group IT: Disaster Recovery Models
Virtualization Model Requirements
• Hypervisor Integrated
• Automated Approach
• Non-Disruptive Testing
• Non-Destructive
• Support RTO/RPO Bands
• Support for Array & vSphere Rep.
• Provide path for Self Service Model
UnitedHealth Group IT: DR Models Revised
Rapid Recovery
• Array Based Model
• Boot from SAN
• RTO < 8 Hrs
• RPO = 0 Min
HOT Recovery
• Backup Protection Model
• RTO = 48-72 Hrs
• RPO < 48 Hrs
Virtualization Recovery
• Array & vSphere Rep Enabled
• RTO < 8 Hrs to 72 Hrs
• vSphere RPO 15 Min minimum
• Array RPO > 0 Min
Warm Recovery
• Backup Protection Model
• RTO <= 8 weeks
• RPO <= 48 hours
UnitedHealth Group IT: SRM Replication Options
RPO DECISION FLOW
Start
1. Is the application RPO requirement greater than 15 minutes? No: use array-based replication. Yes: continue.
2. Do any VMs require time-consistent recovery with other VMs? Yes: use array-based replication. No: continue.
3. Is the workload max change rate below the accepted threshold X within the RPO window? Yes: use vSphere Replication. No: use array-based replication.
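The decision flow can be sketched as a small Python function. This is a minimal sketch, assuming the Yes/No branches reconstructed from the flattened flowchart: vSphere Replication is chosen only when all three checks pass. The `Workload` fields, the function name and the unit of the change rate are hypothetical, and the slide leaves threshold X unspecified, so it is passed in as a parameter.

```python
from dataclasses import dataclass

ARRAY = "array-based replication"
VSPHERE = "vSphere Replication"


@dataclass
class Workload:
    rpo_minutes: float                # application RPO requirement
    needs_multi_vm_consistency: bool  # must recover time-consistent with other VMs
    max_change_rate: float            # peak change rate within the RPO window (unit is an assumption)


def choose_replication(w: Workload, change_rate_threshold: float) -> str:
    """Apply the three checks from the RPO decision flow in order."""
    if w.rpo_minutes <= 15:
        return ARRAY    # RPO is not greater than 15 minutes
    if w.needs_multi_vm_consistency:
        return ARRAY    # time-consistent recovery across VMs is required
    if w.max_change_rate >= change_rate_threshold:
        return ARRAY    # change rate is not below threshold X
    return VSPHERE


if __name__ == "__main__":
    print(choose_replication(Workload(60, False, 2.0), change_rate_threshold=10.0))
    print(choose_replication(Workload(5, False, 2.0), change_rate_threshold=10.0))
```

Because every failed check leads to the same outcome, the order of the three questions does not change the result.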
VMware SRM Protected
[Diagram: Site A with primary storage and Site B with 2nd copy storage, each with vCenterHB and VMware SRM; a “Bad News” marker denotes the DR event]
1. SRM Protected VMs based on RTO Band.
2. Replication of VMs based on RPO Decision Flow.
3. A DR Event is experienced.
4. DR Failover is invoked.
5. Non-Essential VMs are Shut Down (Not Deleted).
6. Core Infrastructure Reconfigured.
7. SRM recovery plans executed by RTO Band.
8. Application Verification.
Recovery Methodology:
Site Recovery Manager
vSphere and/or Storage Rep.
Storage Rep RPO <30 Min
vSphere Rep RPO = <24 Hr
RTO Band 1 < 8 hrs
RTO Band 2 = 8 - 24 hrs
RTO Band 3 = 24 - 48 hrs
RTO Band 4 = 48 - 72 hrs
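Grouping VMs by RTO band, so that recovery plans can run band by band, might look like the Python sketch below. The slides do not say which band owns the boundary values, so the sketch assumes upper bounds are inclusive (24 hrs falls in Band 2, 48 hrs in Band 3). The function names and sample VM names are hypothetical.

```python
def rto_band(rto_hours: float) -> int:
    """Map an RTO requirement in hours to one of the four RTO bands."""
    if rto_hours <= 0:
        raise ValueError("RTO must be positive")
    if rto_hours < 8:
        return 1
    if rto_hours <= 24:   # assumption: boundary values belong to the lower band
        return 2
    if rto_hours <= 48:
        return 3
    if rto_hours <= 72:
        return 4
    raise ValueError("RTO beyond 72 hrs falls outside the four bands")


def recovery_order(vm_rtos: dict) -> list:
    """Group VM names by band and return (band, names) pairs, lowest band first."""
    groups = {}
    for name, rto in vm_rtos.items():
        groups.setdefault(rto_band(rto), []).append(name)
    return [(band, sorted(groups[band])) for band in sorted(groups)]


if __name__ == "__main__":
    sample = {"web01": 4, "db01": 4, "report01": 36, "batch01": 60}
    for band, names in recovery_order(sample):
        print(f"RTO Band {band}: {', '.join(names)}")
```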
UnitedHealth Group IT: DR Summary
BENEFITS OF SRM
• Hypervisor Integrated
• Automated Approach
• Non-Disruptive Testing
• Non-Destructive
• Supports RTO/RPO Bands
• Support for Array & vSphere Rep.
UnitedHealth Group: Virtualization DR
THE ROAD AHEAD
Today:
Distributed Systems under DR Management
• 1,200+ (Distributed Systems)
• 1.079 PB (1,079 TB)
VMs under DR Management
• 700 (Linux & Windows)
• 200 TB
Future:
VMs under Management
• 90% of Distributed Environment
• Currently ~70%
100% Virtualized workloads requiring DR
• Virtualization Recovery Model
SRM Fully Integrated for Self Service Environment
• Private & Public Cloud
Other VMware Activities Related to This Session
HOL:
HOL-SDC-1305
Business Continuity and Disaster Recovery In Action
Group Discussions:
BCO1004-GD
vCenter Heartbeat with Harry Smith
BCO5733
THANK YOU