8/3/2019 eBook a Guide to Modern IT Disaster Recovery
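Inside:
Introduction: What are Black Swans, and what do they have to do with your data center?
Chapter 1: Expect the Unexpected
Chapter 2: Start with an Intelligent and Virtual Foundation
Chapter 3: Disaster Recovery Myths and Realities
Chapter 4: Disaster Recovery Best Practices
Conclusion: A Quick-Start Guide to Disaster Recovery
Appendix: Disaster Recovery 101: The Basics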
www.vmware.com
A Guide to Modern IT Disaster Recovery
How to prepare for and mitigate the
Black Swan Events in your data center
INTRODUCTION
Your data center is your castle. That's where
all your critical IT components (hardware, data, and
software) reside. You protect it with the latest
bulletproof security solutions and make it reliable through
redundant multiprocessing, highly scalable platforms,
and super-fast optical networks.
And yet, it is not fully protected from forces beyond
your control, such as natural disasters, man-made events
like road closures, security procedures, or a partner
service interruption at a specific site.
Downtime and loss of data, even if temporary, can have
long-lasting effects on a business and can contribute to
the demise of an otherwise well-oiled business:
• Loss of revenue from your customers' inability
to do business with you
• Diminished market credibility and customer
trust, resulting in churn
• Penalties for violated SLAs with partners,
suppliers, distributors, and franchisees
• Costs of recovering and repairing the lost data
• Legal costs of meeting internal and external
compliance requirements
How do yo
vestment e
investment
q 43% o
reope
q 93% o
days
q 40%
disas
acces
CIOs and I
which norm
adapt prac
deal with p
actions as w
Gartners Top
These sta
within your
1 McGladrey and2 National Arch3 Gartner, Dece
Expect the Unexpected
Our hope is you never need to activate an IT disaster reco
Our job is to provide automated protection if you do nee
CHAPTER 1
You know VMware as the virtualization
company that has been the market leader for
the past 11 years. In fact, according to Gartner,
more than 80% of all virtualized applications
in the world run on VMware today. This ebook
highlights the VMware perspective on disaster
recovery in the data center. But let's forget
about IT for a moment.
The Theory of Black Swan Events is a
metaphor that encapsulates the concept of
surprise events that have a major impact. It
refers to unexpected events of large magnitude
and consequence and their dominant role
in history. Such events, considered extreme
outliers, play vastly larger roles than regular
occurrences.
The Black Swan, a book by Nassim Nicholas
Taleb, explains that while Black Swan Events
are unpredictable, a person or organization
can plan for negative events, and by doing
so strengthen their ability to respond, as well
as exploit positive events. Taleb contends that
people in general, and enterprises specifically,
are very vulnerable to hazardous Black
Swan Events and are exposed to high losses
if unprepared.
There is an obvious parallel between the
Theory of Black Swan Events and the need for
disaster preparedness for your critical IT assets.
Deploying automated disaster recovery (DR)
is the way to protect IT and business from
unpredictable events, even Black Swan
Events. The following chapters explain the
basics of DR and the required infrastructure.
They also offer DR hidden realities and best
practices with real-world advice.
What are Black Swans, and what do they have to do with your data center?
DR is the IT industry's way to prepare and f
Until reliable virtualization management
solutions became available several years ago,
DR solutions fell well short of satisfying business
requirements due to the following:
• High cost
• Complexity
• Lack of reliability
With traditional manual DR solutions, the high cost
came from the need to deploy a second failover site
with dedicated infrastructure, software licenses, and
human personnel. The complexity was high because,
to ensure the recovery of entire business services, the
recovery plans had to manipulate many individual
components and moving parts: applications, hosts, network,
and storage. The lack of reliability of these procedures
stemmed from low automation and the inability to test
any recovery procedure.
Many organizations had limited confidence in meeting
their Recovery Point Objective (RPO) and Recovery
Time Objective (RTO) in the event of a disaster. IT
departments were hesitant to expand disaster protection,
uncertain whether the quality of the insurance was really
worth its cost.
Virtualization is fundamental and critical to the success
of DR planning. Virtualization abstracts the complexity
of hardware and software and allows standardization of
processes, thus making planning and automation of the
recovery procedures much more reliable and repeatable.
In fact, in a recent IDG survey, 70% of customers
achieved improved BC/DR with virtualization.1
An intelligent virtual infrastructure based on VMware is
the right foundation for the modern DR solution. Highly
adaptive and scalable, it is optimized for
business-critical workloads with built-in intelligence.
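The VMware DR solution provides:
• The simplest way to replicate
applications to a secondary site
• The simplest way to set up recovery and
migration plans
• Fully automated, most reliable site
recovery and migration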
Start with an Intelligent and Virtual Foundation
Reliable, Repeatable, Recoverable
CHAPTER 2
[Figure: Physical recovery process, 40 hours: configure hardware, install OS, configure OS, install backup, start "single-step automatic recovery". Virtual recovery process, 4 hours: restore VM, power on VM.]
Cost-efficient DR: With the rapid adoption of
virtualization and the evolution of replication technology, DR is
becoming more cost-efficient. Virtualization enables
infrastructure consolidation at the failover site. Less costly
replication options are more broadly available, using
lower-end storage appliances or stand-alone software
solutions. With these advances, DR can protect
large-scale mission-critical IT assets, as well as smaller sites
and Tier 2 applications.
Automated DR: In virtual environments, end users are
shielded from the complexity of managing each step
in the recovery process. Now, a DR solution can
automatically execute and coordinate all the steps required
to ensure the desired level of protection. Traditional
runbooks are no longer good enough to manage recovery
plans and are replaced with software-driven recovery
plans. Setti
ment is as s
business se
rib si
tion, organ
that they c
provides th
a non-disruare now rep
ing the ris
predictable
The chart
virtualize
along wit
CHAPTER 2 continued
[Figure: Bar chart of survey responses. Question (truncated in the source): "How would you describe your organization's usage of the fo… capabilities with its production environment-based virtual m…". Capabilities charted: automated restart of virtual machines in the event of a physical server hardware failure; backup and recovery solutions integrated with the virtualization platform; site recovery solutions for virtual machines; live migration of virtual machines based on CPU, memory, and network utilization policies; live migration of virtual machines; live migration of storage associated with virtual machines; automated deployment of virtualized servers based on CPU, memory, and network utilization policies; automated enforcement of lifecycle policies and reclamation of resources from expired virtual machines; automated deployment of virtual machines based on power consumption policies. Legend: "We currently use this feature/capability"; "We plan to…" (truncated); "We have no plans to use this feature/capability"; "Don't know".]
1 IDG Research, Benefits of Virtualizing Business Critical Applications, March 2011
Source: ESG white paper: Enterprise Strategy Group, 2011: Virtualizatio
http://www.vmware.com/solutions/cloud-solutions.html
http://www.vmware.com/products/site-recovery-manager/overview.html
MYTH 3: a
y w
REAlITY
out testing
be tested w
validity. SR
recovery p
How its do
Adventist H
zation in th
roughly ou
Services (A
employs m
To ensure A
Zero initia
service and
systems li
record app
Adding SR
IS to stream
DR plannin
ing and tes
CHAPTER 3 continued
With SRM, setting up an automated recovery plan is
effortless and can be done in a matter of minutes,
instead of the weeks required to set up manual runbooks.
How it's done at Swedbank
Swedbank is one of the largest financial institutions
in Scandinavia and the Baltic, with 362 branches in
Sweden and 222 branches in Estonia, Latvia, and
Lithuania. The bank serves 9.5 million private customers and
534,000 corporate customers, with 18,000 employees.
Preventing service disruption is critical to Swedbank.
Swedbank had to meet recovery targets for its legacy
applications through traditional means of backup and
recovery, which were complex and time-consuming.
Swedbank deployed SRM to simplify and automate the
recovery process, management, and testing of recovery
plans. Since the SRM implementation, Swedbank tests
its DR capabilities at least twice a year. It shuts down
one data center completely, moving workloads to the
surviving data center. It runs everything in the backup
data center for 24 hours, and then fails over again to the
original data center.
Mart Nael, head of Core Infrastructure, Group IT at
Swedbank, states, "Our recovery time is below 30
minutes for mission-critical workloads and less than four
hours for the whole data center."
Business results:
• Positive ROI within one year from
hardware-cost avoidance
• IT operational costs reduced 14% year-to-year
• 1,000 virtual machines managed by two
full-time-equivalent staffers
• 30x faster server provisioning
MYTH 1: diss ry is y ; is
si s smig.
REALITY: VMware vCenter Site Recovery Manager
(SRM) gives you the flexibility to define failover
scenarios that meet your choice of coverage, speed, and cost
of recovery. For example, while a dedicated recovery
site is a robust solution (and yes, more expensive), in
many cases it is sufficient to have an active
bidirectional approach where two or more data centers are
complementary, with enough capacity to pick up critical
applications. Therefore, no resources are wasted and
business continuity is maintained.
Overall, SRM customers consistently report significant
savings of money, resources, and time.
How it's done at Challenger Limited
Challenger Limited issues annuities and provides
investment products and services. The organization
runs two co-located data centers, supporting around
500 staff in Australia.
In order to meet business requirements for fast recovery
and minimal data loss, Challenger Limited implemented
a dual-cluster VMware infrastructure that was linked
to networked storage devices in its two co-located
data centers, at about one-third the cost of a
physical disaster recovery environment. SRM has enabled
the organization to dispense with most of the 50 tapes
previously used to back up data, saving one person-day
per week. In addition, Challenger Limited automated
hundreds of steps in its disaster recovery processes.
Business results:
• Improved RPO from 24 hours to 90 minutes
and recovery time from 24 hours to less than
four hours
• Reduced the number of people needed to restore
systems to one person
• Cut capital investment for disaster recovery to a
third of the cost of a physical environment
• Eliminated the need to acquire 15 standby
physical servers at a cost of $200,000
MYTH 2: aiig y mgig dr
si is m sk qiig si skis
si ss.
REALITY: Not with VMware. Physical DR can be
complex because of duplicate and siloed infrastructures
and configuration synchronization issues across sites.
Virtualization encapsulates servers, OS, and
applications, including all configuration data, so the complexity
is greatly reduced. Virtualization and automation ensure
that recovery plans are simple and complete, and can be
reliably executed by staff with no special skills required.
Disaster Recovery Myths and Realities
Disaster Recovery is like an insurance policy that you can
test without having an accident.
CHAPTER 3
http://www.vmware.com/files/pdf/customers/11Q1_Swedbank_Case_Study.pdf
http://www.vmware.com/files/pdf/customers/apac_au_10Q4_ss_vmw_challenger_english.pdf
1. Virtualize. Virtual environments are much more agile
and easier to migrate. Virtualization hides the
complexity by shielding the individual components and moving
parts, thus simplifying the planning and increasing the
visibility into the DR process. It also allows you to use
hypervisor-based replication that is far more flexible and
cost-efficient than storage-based replication.
2. Automate. Don't let human error stand in your way.
Use automated recovery plans, not a stack of notes in
a binder. With the proper automation, a recovery plan
can be done in a matter of minutes instead of weeks.
Automation shields users from having to manage many
of the recovery steps, and automatically coordinates
activities such as preconfiguring networks and virtual
machines, configuring the recovery infrastructure, and
restarting applications.
3. Verify. Test your DR plans often. Use
non-disruptive testing of your recovery and failback plans.
Study a detailed report of the test outcomes, including
the RTO achieved. With this information, you can gain
the confidence that your disaster protection plan meets
the business objectives. It will also provide the
necessary training to the staff and show any possible issues
early so they can be addressed.
4.S recovery ca
For examp
Exchange,
and started
To set your
tions and s
5.a Act early to
actual disas
condence
has been te
possible ts
6.B caused by
gone wron
data maint
migrating y
ris and gre
degradatio
Disaster Recovery Best Practices
As shared by VMware's 5,000+ SRM customers
CHAPTER 4
"In addition to our 10 developmental centers, we're also
responsible for making sure providers throughout the
state get the support they need to receive funding from
the federal government," says Brian Brothers, network
administrator manager. "If our services were to go down
and we couldn't ensure reimbursement for Medicaid
funds, it would have a severe impact on the providers
and the developmentally disabled they serve. Some
providers might shut their doors."
At DODD, SRM provides reliable, verifiable DR
enablement that can be tested and audited. The agency
has tested its disaster recovery solution twice. The
second test involved 50 production servers, which were
successfully failed over to the remote site in about 90
minutes. "If we do have a true disaster someday, our
DR site becomes our production site. We expect to
be up and running in less than two hours," notes Kipp
Bertke, IT manager for infrastructure and operations,
Ohio Department of Developmental Disabilities.
DODD's disaster recovery site doesn't just sit idle.
Instead, the backup site actively supports the
application development team on a daily basis.
Business results:
• A reliable disaster recovery site that can
be up and running in less than two hours
• Fully tested, active disaster recovery
solution implemented for an agile, private
cloud infrastructure
• Online systems providing faster, more
reliable service
CHAPTER 3 continued
a button. The fact that we can run tests as often as we
want gives us a high degree of confidence in the
recoverability of our systems," says Kenneth Newball, senior
disaster recovery administrator at AHS-IS.
Business results:
• Reduced RTO by 75%, from 48 hours to less
than one hour
• Eliminated the cost of flying a team of seven
people to test remote DR
• Cut hardware purchases by 84.5%, maintenance
by 93.1%, and power consumption by 90%
MYTH 4: dr s is sk s, ik i
s ms iky s.
REALITY: Even if the big disaster never happens,
the recovery plan can be used as a migration plan
with similar steps, helping you during planned
downtimes such as site migrations. In addition, DR planning
helps to fulfill compliance where disaster recovery plans
are required. The outcome of recovery testing proves
disaster preparedness and the ability to meet RTOs.
How it's done at Ohio Department of
Developmental Disabilities
The Ohio Department of Developmental Disabilities
(DODD) runs a statewide system of support services for
some 80,000 people with developmental disabilities. A
disaster causing a systemwide failure would have very
real human impact.
http://www.vmware.com/files/pdf/customers/11Q1_DODD_Case_Study.pdf
While your data center is critical to your ability to conduct
business, events beyond your control (or even planned
ones) can make IT services unavailable or highly limited.
This situation, however rare, could be very damaging to the
integrity of your business, your market credibility, and your
customers' satisfaction and loyalty.
You can mitigate this risk by implementing a DR solution to
protect your critical IT assets. A well-designed DR solution
built on an intelligent virtual infrastructure can provide the
required RTO and RPO while keeping the costs in check. Your
DR plans can be tested in a nondisruptive way and benefit
your IT department in areas beyond the typical DR needs.
Your IT infrastructure plays the most critical role in the
feasibility and the ultimate success of your DR plans.
Virtualized infrastructure has proved to be the most reliable
and cost-effective platform for DR by allowing you to
abstract the moving parts and components of your data
center, simplifying the replication architecture and
requiring fewer resources overall.
So how do you start the journey to protect your IT assets?
Use this quick-start list as your guide:
1. Identify your most critical applications and data.
What applications directly generate revenue, maintain
safety, or are otherwise critical to business continuity?
What data is absolutely critical for your customers, your
internal accounting and finances, or compliance?
2. If you haven't already, consider virtualizing your key
applications. This will not only cut much of the operational
and maintenance cost by removing unnecessary complexity,
but it will also make your environment better suited
for effective DR planning.
3.ag you lose? Fo
online with are realistic
4.df iiis
on the data
cally trigge
5. Iiy
is y
will be a comrecovery, an
6. S
ing specic
choices tha
the level o solution or t
Mae sure yactual disas
And nally,
a Blac Swation to reco
VMware is h
F m i
ry M
isivMw
For details aon deliverincontinuity, a
come you to
A Quick-Start Guide to Disaster Recovery
It can be done. It must be done. VMware can help you do
CONCLUSION
9. Plan for failback. Create and test a failback
recovery plan, set up replication in reverse, and know
when to trigger it. Agree on what to consider "the
end of the disaster" so your business can go back
to normal.
10. Don't just throw money at DR. Utilize cheaper,
commodity failover site assets, or use the repurposed
hardware left over after your primary data center has
gone virtual. Consider bidirectional or shared failover
sites, use more software in the cloud (SaaS), and also
look at non-IT DR means (UPS or backup generators,
fuel reserves, better fire protection, etc.).
CHAPTER 4 continued
7. Assign responsibilities. Assign everybody involved
in the DR plan a specific task. Don't expect the relevant
personnel to always be at the disaster site or to be in
control immediately. Implement necessary duplication
and redundancy for people, just like you would do with
computers.
8.K y y s sssib. It is a good practice to prepopulate your
failover site with the data that doesn't change often, or
by much. This will allow you to focus only on the
fast-changing critical data at the time of failover, and
ultimately meet your RTO with less effort.
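The "automate" and "verify" practices above can be illustrated with a small sketch. It is not how SRM works internally; it only shows the idea of a software-driven recovery plan whose test run produces a report that includes the time achieved. The names `Step`, `RunReport`, `run_plan`, and `noop` are hypothetical, and the step names are taken from the activities listed under practice 2.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    """One ordered action in a recovery plan (hypothetical structure)."""
    name: str
    action: Callable[[], None]


@dataclass
class RunReport:
    """Outcome of one test run of the plan."""
    completed: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None
    error: Optional[str] = None
    elapsed_seconds: float = 0.0

    def meets_rto(self, rto_seconds: float) -> bool:
        # A run only counts if every step succeeded within the target time.
        return self.failed_step is None and self.elapsed_seconds <= rto_seconds


def run_plan(steps: List[Step]) -> RunReport:
    """Execute steps in order, stop at the first failure, and time the run."""
    report = RunReport()
    started = time.monotonic()
    for step in steps:
        try:
            step.action()
        except Exception as exc:
            report.failed_step = step.name
            report.error = str(exc)
            break
        report.completed.append(step.name)
    report.elapsed_seconds = time.monotonic() - started
    return report


def noop() -> None:
    """Placeholder for real work; an assumption made for illustration."""


plan = [
    Step("preconfigure networks and virtual machines", noop),
    Step("configure the recovery infrastructure", noop),
    Step("restart applications", noop),
]
report = run_plan(plan)
print(report.completed)
print(report.meets_rto(rto_seconds=2 * 60 * 60))
```

Because the plan is data rather than a binder of notes, the same list of steps can be run in a test and in a real failover, and each run leaves a report that can be compared against the RTO.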
http://www.vmware.com/products/site-recovery-manager/overview.html
Disaster recovery is a key part of a company's
business continuity initiative to ensure the availability of
integral IT-dependent business processes and prevent
any long-term negative effects of both planned and
unplanned disruptions. The goal of DR is to restore critical
IT services as quickly as possible and minimize business
disruption.
Nothing impacts your ability to recover more than the
agility of your IT and applications infrastructure. Just
like fire safety must be built into the building before the
fire occurs, and a car's safety features are engineered to
reduce crash impact, the design of your IT infrastructure
can make or break the success of your DR program.
IT AND APPLICATIONS INFRASTRUCTURE
Your data center's infrastructure plays an instrumental
role in the effectiveness of your DR solution. The
infrastructure can make DR very complex, hard to
implement, and sometimes even impossible; or it can help
to make your IT reliable, verifiable, and effective. The
next section explains how.
Two key processes for simple and reliable
disaster recovery:
FAILOVER
Failover is the capability to switch over to a redundant
or standby server, system, or network upon the failure
or termination of an existing asset. Failover should
happen without any kind of human intervention or warning.
FAILBACK
Failback is the process of restoring a system or another
asset that is in a failover state back to its original state.
Effective failback returns the system to the state of
operation before the disruption.
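As a rough illustration of these two definitions, the sketch below models a protected service as a two-state machine: failover moves it to the standby site, and failback is only valid from a failover state and returns it to its original state. The class and attribute names are hypothetical and do not come from any VMware product.

```python
from enum import Enum


class SiteState(Enum):
    NORMAL = "normal"            # workloads run at the primary site
    FAILED_OVER = "failed_over"  # workloads run at the recovery site


class ProtectedService:
    """Toy model of a service that can fail over and fail back."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = SiteState.NORMAL
        self.active_site = "primary"

    def failover(self) -> None:
        """Switch to the standby site after the primary asset fails."""
        if self.state is not SiteState.NORMAL:
            raise RuntimeError(f"{self.name} is already failed over")
        self.state = SiteState.FAILED_OVER
        self.active_site = "recovery"

    def failback(self) -> None:
        """Return a failed-over service to its original state."""
        if self.state is not SiteState.FAILED_OVER:
            raise RuntimeError(f"{self.name} is not in a failover state")
        self.state = SiteState.NORMAL
        self.active_site = "primary"


service = ProtectedService("billing")
service.failover()
print(service.active_site)  # recovery
service.failback()
print(service.active_site)  # primary
```

Rejecting a failback that is not preceded by a failover mirrors the definition above: failback applies only to an asset that is currently in a failover state.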
Disaster Recovery 101: The Basics
[Figure labels: Primary Site, Recovery Site]
APPENDIX
RTO
While RTO is purely a technical metric, the decision to
trigger the failover is a business one, and RTO can often
take much longer than the actual DR itself. Whether
initiated by humans or by an automatic trigger, the
lead time to start DR should also be accounted for and
included in RTO. Replication is a key element of most DR
processes and is usually provided by the specific
DR solution used.
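The point about lead time can be made concrete with simple arithmetic. In the sketch below, the function names and the 45-minute decision time are assumptions for illustration; the 90-minute failover and two-hour target echo figures quoted elsewhere in this guide.

```python
from datetime import timedelta


def effective_recovery_time(decision_lead_time: timedelta,
                            failover_duration: timedelta) -> timedelta:
    """Outage as the business sees it: time to decide plus time to fail over."""
    return decision_lead_time + failover_duration


def meets_rto(decision_lead_time: timedelta,
              failover_duration: timedelta,
              rto: timedelta) -> bool:
    """True when the whole outage, including lead time, fits within the RTO."""
    return effective_recovery_time(decision_lead_time, failover_duration) <= rto


lead = timedelta(minutes=45)      # hypothetical time to declare a disaster
failover = timedelta(minutes=90)  # technical failover time
target = timedelta(hours=2)       # RTO agreed with the business

print(effective_recovery_time(lead, failover))  # 2:15:00
print(meets_rto(lead, failover, target))        # False
```

In this example the technical failover alone fits within the target, but the recovery as a whole does not once the decision time is counted.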
REPLICATION
In the context of preparing for a failover, replication
provides intentionally architected redundancy of your IT
resources: hardware, data, software, networks, or all of
them together. There are several factors in determining
the depth and amount of replication needed: type of
services to be protected, criticality of different
components, technology, and cost.
DISASTER RECOVERY SCENARIOS
Various DR scenarios and techniques are available to
meet your specific requirements and cost objectives. The
right architecture can make your DR procedures more
efficient, cost-effective, and predictable. Here are a few
commonly used configurations from which to choose:
• Active-Passive: This is a more traditional DR
scenario, where a production site running applications is
recovered at a second site that is idle until failover is
required. In this scenario you are paying for a DR site
that is idle most of the time.
Key metrics for planning and measuring success of
the procedures.
RPO
Recovery Point Objective (RPO) is the point in time to
which you must recover data as defined by your
organization, generally called an "acceptable loss" in a disaster
situation. It allows an organization to define a window
of time before a disaster when data may be lost and is
tightly dependent on the type of data replication used.
The higher the granularity of data replication, the shorter
the RPO.
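The dependence of RPO on replication granularity can be sketched as follows. This is a simplified model, assuming the worst case is a disaster that strikes just before the next copy completes, so the data at risk is one replication interval plus the time needed to ship the copy. The function names are hypothetical; the 24-hour and 90-minute values echo the Challenger Limited example earlier in this guide, while the 15-minute interval and 5-minute transfer time are invented for illustration.

```python
from datetime import timedelta


def worst_case_data_loss(replication_interval: timedelta,
                         transfer_time: timedelta = timedelta(0)) -> timedelta:
    """Oldest possible age of the newest usable copy at the recovery site."""
    return replication_interval + transfer_time


def meets_rpo(replication_interval: timedelta,
              rpo: timedelta,
              transfer_time: timedelta = timedelta(0)) -> bool:
    """True when the worst-case loss window fits inside the RPO."""
    return worst_case_data_loss(replication_interval, transfer_time) <= rpo


rpo = timedelta(minutes=90)

# Nightly backup: up to a day of data can be lost.
print(meets_rpo(timedelta(hours=24), rpo))  # False

# Replication every 15 minutes with 5 minutes of transfer time.
print(meets_rpo(timedelta(minutes=15), rpo,
                transfer_time=timedelta(minutes=5)))  # True
```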
qai-a
worloa
Congu
the virtu
so that y
the wor
qBii
tion so t
at both s
spare ca
virtual e
ql F
ail over
when a s
orces y
qS
deploym
single re
multiple
All prote
this sing
recovery
that nee
This top
recovery
APPENDIX continued
[Figure: two sites, each running Site Recovery Manager, VMware vCenter Server, and VMware vSphere on servers.]