Classes of Recovery

  • 8/9/2019 Classes of Recovery

    1/10

    Developing a truly effective business continuity plan, one that will result in successful recovery should the need actually arise, requires more effort than just implementing a backup solution. This white paper, part of a series on the IT infrastructure of business continuity, will outline a methodology for setting recovery priorities based on business requirements and matching recovery technology to recovery objectives that will result in cost-effective business continuity systems and timely recoveries.

    WHITE PAPER

    Introduction

    Backup and recovery solutions are often thought of as expensive ways for IT personnel to have peace of mind in the event of an interruption to important business functions. This is true in many cases because the solutions are based on budgetary constraints instead of business needs. Enterprises would benefit instead by developing a methodology to determine the characteristics of a recovery, and by placing each business function into an appropriate class of recovery. By classifying each business function and assigning a corresponding priority, performing an actual recovery will be more successful and will result in smaller loss to the company in the event of an outage.

    Recovery methodologies

    We need to ask ourselves, "How successful have most disaster recovery efforts been?" The answer is not particularly easy to obtain, since there have been very few major disasters, and most IT managers have never actually experienced one. When a business does attempt a recovery, it often discovers that the recovery can't be done in the required timeframe. This occurs because the backup capabilities can't support the recovery objectives needed by the business.

    Many companies prepare for disaster recoveries by performing disaster recovery tests, which for the most part are minimally successful. One of the major issues that CNT sees repeatedly during these tests is a lack of prioritization of recovery activities. Instead, companies attempt to recover all of the infrastructure and data as quickly as possible.

    What is needed is a recovery strategy methodology that meets the needs of the business and allows the business functions to be recovered in a pre-determined order. In this way, a company will be better equipped to develop and maintain the right solution at the right cost. Using such a methodology instead of the all-or-nothing approach will benefit the company by providing the proper level of protection for each of the business units instead of being excessive or insufficient.

    Prioritizing the recovery

    All too often, backup/recovery solutions are devised in a vacuum without much regard for the actual needs of the business. Since the primary purpose of recovery is to reinstate critical business function(s) in the event of an outage, it is imperative that the recovery scenario is designed with business needs in mind. This must be done well before the disaster occurs. Trying to recover is difficult enough; trying to prioritize during the recovery is an impossible task.

    The issue that slows down most disaster recoveries is attempting to restore everything at once. The sensible thing is to first restore only what is absolutely necessary, less critical functions second, and non-vital business functions last. Recovering 50 percent of the infrastructure is certainly easier and quicker than recovering the entire enterprise. By reducing the number of tapes, servers, and disk requirements, a seemingly impossible task can turn into an achievable one.

    CNT believes that each enterprise should try to categorize its business functions into several classes of recovery. We recommend using the following general approach:

    Class 0: no reason to recover during a disaster recovery

    Class 1: non-vital business functions

    Class 2: business functions that are vital to the company, but are not the most important

    Class 3: critical, must-have business functions

    Class 4: continuously available, absolutely cannot go offline for any reason

    Table of Contents

    Introduction

    Recovery methodologies

    Prioritizing the recovery

    Business requirements

    Recovery objectives

    Techniques

    Technology

    Cost

    Assessing the classes of recovery

    Conclusion

    Breaking down each component into recovery classes makes it easier to determine what the company's needs are, what the recovery capabilities currently are, and what needs to be done to achieve the desired class of recovery should the current capabilities prove to be inadequate. These requirements can vary from company to company as well as from industry to industry.

    Class 0 and Class 4 require the least amount of detail pertaining to continuance of business.

    Class 4 simply must continue uninterrupted at all cost. This class is normally reserved for the most critical of processes, such as financial market transactions, air traffic control, critical health care systems, and infrastructure such as power and communication. Class 0 environments are not recovered at all in the event of an outage. While these systems may support the overall IT infrastructure of a company, they contain no critical data and would be replaced rather than recovered if lost in a disaster. We feel it is important to acknowledge these different recovery classes as many systems are designated with these recovery objectives. The remainder of this paper will focus on Class 1, Class 2 and Class 3. This is where business continuity is achieved for most business functions.
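
    The restore order recommended earlier (critical functions first, vital second, non-vital last) can be sketched in a few lines of Python. This is an illustration only: the type names, the function names, and the sample business functions are all hypothetical and do not come from any CNT product.

```python
from dataclasses import dataclass
from enum import IntEnum


class RecoveryClass(IntEnum):
    """The five classes of recovery named in this paper."""
    NO_RECOVERY = 0   # not recovered during a disaster recovery
    NON_VITAL = 1
    VITAL = 2
    CRITICAL = 3
    CONTINUOUS = 4    # must never go offline


@dataclass(frozen=True)
class BusinessFunction:
    name: str                      # hypothetical example name
    recovery_class: RecoveryClass


def recovery_order(functions):
    """Return the functions to restore, most critical class first.

    Class 0 is skipped because it is replaced rather than recovered.
    Class 4 is skipped because it is expected to continue uninterrupted,
    so it is not part of a restore sequence.
    """
    restorable = [
        f for f in functions
        if RecoveryClass.NON_VITAL <= f.recovery_class <= RecoveryClass.CRITICAL
    ]
    # sorted() is stable, so functions in the same class keep their input order.
    return sorted(restorable, key=lambda f: f.recovery_class, reverse=True)


functions = [
    BusinessFunction("intranet wiki", RecoveryClass.NON_VITAL),
    BusinessFunction("order entry", RecoveryClass.CRITICAL),
    BusinessFunction("test lab", RecoveryClass.NO_RECOVERY),
    BusinessFunction("payroll", RecoveryClass.VITAL),
]

for f in recovery_order(functions):
    print(f.recovery_class.name, f.name)
```

    Deciding the order ahead of time, as a simple sorted list, is the point: nobody has to prioritize while the recovery is under way.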

    Business requirements

    Figure 1 shows the high-level overview of the process needed to assess the recovery requirements for each business function.

    The initial step in this process involves assessing each of the business requirements and classifying them by their relative importance to the company (see Figure 2). This is traditionally done through a business impact analysis (BIA), which rates the severity of the impact on the company should the business function become unavailable. The impact to the company is perceived in terms of operating costs, infrastructure costs, regulatory fines or sanctions, financial losses, or damage to the company's reputation (loss of market share, decreased customer satisfaction, etc.). While the impact on reputation is intangible and certainly cannot be quantified, it is a real threat and can eventually result in some financial reverses.

    Business functions that are most important and critical will fall into Class 3. Functions that are vital but can be recovered after critical functions are restored will be designated as Class 2. Non-vital functions that can be performed via alternate methods for an extended period of time will be designated Class 1.
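
    As a minimal sketch, the Figure 2 criteria reduce to two questions a BIA would answer for each business function. The function name and parameter names below are hypothetical, and what counts as an "extended period" is left for each company to define.

```python
def classify(can_operate_without: bool, for_extended_period: bool) -> int:
    """Map two BIA answers to recovery class 1, 2, or 3.

    can_operate_without: can the company do business without this process?
    for_extended_period: if so, can it do so for an extended period of time?
    """
    if not can_operate_without:
        return 3  # critical: an outage has an immediate effect
    if not for_extended_period:
        return 2  # vital: an outage hurts after some period of time
    return 1      # non-vital: alternate methods suffice for an extended period


print(classify(can_operate_without=False, for_extended_period=False))  # 3
print(classify(can_operate_without=True, for_extended_period=False))   # 2
print(classify(can_operate_without=True, for_extended_period=True))    # 1
```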

    Figure 1: assessing recovery requirements for each business function

    [Diagram: a cycle in which business requirements define recovery objectives, recovery objectives dictate techniques, techniques determine technology, technology drives cost, and cost prompts a reassessment of business requirements.]

    Recovery objectives

    After each business requirement is classified as to the relative impact to the business in the event of an outage, the recovery objectives for each class must be quantified (see Figure 3). The class of recovery mandated by the business requirements defines the recovery objectives for each business function. For example, business functions that have Class 3 business requirements can only have Class 3 recovery objectives.

    These objectives also need to be quantified in terms of recovery time objectives (RTO) and recovery point objectives (RPO). The RTO is the time it takes to restore the business function to a functioning level. The RPO is the specific point in time the data needs to be restored to in order to effect a successful recovery. Although the RTO and RPO in each class of recovery will vary from company to company, some general guidelines are suggested in Figure 4.

    To keep things in perspective and make sure the actions are carried out within a realistic timeframe, remember that recovery is composed of several factors:

    Time to restore hardware infrastructure:

        Server(s)

        Disk

        Network components

        Storage area network (SAN) components

        Tape drives/libraries

    Time to restore operating system(s)

    Time to establish connectivity

    Time to restore application software and data

    Figure 2: classifying business requirements

    Class 3 (critical): Disaster has immediate effect on company. Cannot do business without process.

    Class 2 (vital): Disaster will adversely affect company after some period of time. Can do business without process but not for extended time period.

    Class 1 (non-vital): Disaster has no short or long term effect on company. Can do business without process for an extended period of time.

    One could also include a Class 4 for business functions which require solutions that are fully fault-tolerant at the hardware level with transaction replication. However, these scenarios are rarely used and are really only practical in mainframe environments.

    Figure 3: quantifying recovery objectives

    Class 3 (critical): Critical process. Requires shortest recovery and time-to-data timeframes.

    Class 2 (vital): Critical process. Must be restored as soon as possible after the critical processes. Next shortest recovery and time-to-data timeframes.

    Class 1 (non-vital): Non-vital process. Can use alternate processes to perform function. Longest recovery and time-to-data timeframes.

    Each of these factors must be considered when planning for recovery. There will be overlap and timing issues that vary from company to company, but breaking down the recovery in a logical manner will result in realistic recovery objectives.
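
    One way to sanity-check a recovery time objective is to add up the factors listed above and compare the total with the RTO. The sketch below assumes the steps run one after another with no overlap, which gives a conservative estimate. All durations are hypothetical placeholders, and the function names are illustrative.

```python
from datetime import timedelta

# Hypothetical step durations for one business function; replace them with
# figures measured during your own recovery tests.
steps = {
    "restore hardware infrastructure": timedelta(hours=3),
    "restore operating system(s)": timedelta(hours=1),
    "establish connectivity": timedelta(minutes=30),
    "restore application software and data": timedelta(hours=2, minutes=30),
}


def serial_recovery_time(step_durations):
    """Sum the steps as if none of them overlap."""
    return sum(step_durations.values(), timedelta())


def meets_rto(step_durations, rto):
    """True if the serial estimate fits within the recovery time objective."""
    return serial_recovery_time(step_durations) <= rto


total = serial_recovery_time(steps)
print(total)                                     # 7:00:00
print(meets_rto(steps, rto=timedelta(hours=8)))  # True
print(meets_rto(steps, rto=timedelta(hours=4)))  # False
```

    If the serial estimate misses the RTO, the breakdown shows which step to shorten or overlap before the objective can be called realistic.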

    Techniques

    The techniques used for each class of recovery are dictated by the recovery objective time frames. Figure 5 details these techniques according to their recovery class. These techniques must take into account the nature of the outage and the extent of the recovery effort. For instance, if a company does not wish to recover from a total site disaster (e.g., fire, flood, earthquake) then the techniques involving any sort of replication of processors or data at a remote site are not pertinent.

    Class 3 recovery techniques are the most robust, and provide the shortest recovery time and the fastest time to data. These include hot backup systems, remote mirrored disk arrays, electronic tape vaulting, SANs, local and remote redundant networks, and use of snapshot disk volumes.

    Class 2 recovery techniques are less robust than Class 3 techniques, but still rely on high-end technology.

    Business functions that only require Class 1 recovery methods employ the low-end techniques such as no redundancy, no failover capability, manual backup processes, and static database backups. Backups are usually performed on a server-by-server basis and the responsibility for them lies with a single person. For the most part, if recovery is required for a Class 1 function, it will be handled at that time with little or no planning.

    Just as important as recovery techniques are backup techniques. For instance, it is imperative to use enterprise class backup software for the business functions requiring Class 3 recovery. Disk mirrors are also very useful, since they protect against disk media failure. Should a virus, sabotage, or software bug corrupt the data in the file or database, however, the ability to recover will rest upon a volume clone or tape backup.

    It is important to remember that no matter what sort of state-of-the-art technology is employed, the ability to recover successfully is only as good as the last backup. It is for this reason that the backup process must be deemed a critical business function. All too often the backup process is insufficient to handle the volume of data required for recovery; if this process does not complete in the allotted time it is cancelled without hesitation.

    Figure 4: typical recovery time and recovery point objectives

    Primary site

              Class 3 (critical)          Class 2 (vital)            Class 1 (non-vital)
    RTO       5 to 30 minutes             30 minutes to 8 hours      > 8 hours
    RPO       < 1 minute before event     1 minute to 24 hours       24 hours (last backup)

    Secondary site

              Class 3 (critical)          Class 2 (vital)            Class 1 (non-vital)
    RTO       < 8 hours                   8 hours to 72 hours        72 hours to 1 week
    RPO       < 15 minutes before event   Last backup (< 24 hours)   Last full backup (< 1 week)

    This table points out that RTO and RPO should be stated differently depending on whether recovery can take place at the primary site as opposed to the secondary site.
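
    The RTO guidelines in Figure 4 can be turned into a small lookup that reports which class a measured recovery time would support. This is a sketch under two assumptions: the table with the shorter timeframes is the primary-site table, and each upper bound is treated as inclusive for simplicity. The dictionary and function names are hypothetical.

```python
from datetime import timedelta

# Upper RTO bounds read from Figure 4. Assumption: the table with the shorter
# timeframes applies to the primary site. None means no upper bound is given.
RTO_UPPER_BOUND = {
    "primary": {
        3: timedelta(minutes=30),
        2: timedelta(hours=8),
        1: None,
    },
    "secondary": {
        3: timedelta(hours=8),
        2: timedelta(hours=72),
        1: timedelta(weeks=1),
    },
}


def highest_class_supported(site, measured_recovery_time):
    """Return the most demanding class (3, 2, or 1) whose RTO guideline the
    measured recovery time satisfies, or None if it satisfies none of them."""
    for recovery_class in (3, 2, 1):
        bound = RTO_UPPER_BOUND[site][recovery_class]
        if bound is None or measured_recovery_time <= bound:
            return recovery_class
    return None


print(highest_class_supported("primary", timedelta(hours=2)))     # 2
print(highest_class_supported("secondary", timedelta(hours=6)))   # 3
print(highest_class_supported("secondary", timedelta(days=10)))   # None
```

    Because the guideline values vary from company to company, the table of bounds is the part to replace with figures agreed with the business.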

    Given the astronomical growth rate of data in most companies, the ability to manage this data becomes a necessity and warrants strong storage management policies and procedures, as well as the personnel to enforce them. Good storage management helps reduce the amount of data being backed up by ensuring that only the data required to recover a business function is backed up. This is a much more effective paradigm than the all-or-nothing approach.
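
    A rough calculation shows why trimming the backup set matters. The sketch below divides data volume by sustained throughput and compares the result with the backup window; every number in it is a hypothetical placeholder, and real throughput would have to be measured.

```python
def backup_hours(data_gb: float, throughput_gb_per_hour: float) -> float:
    """Hours needed to back up data_gb at a sustained throughput."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return data_gb / throughput_gb_per_hour


WINDOW_HOURS = 8    # hypothetical nightly backup window
THROUGHPUT = 100    # hypothetical sustained throughput in GB per hour

everything = backup_hours(1200, THROUGHPUT)     # all data: 12.0 hours
required_only = backup_hours(600, THROUGHPUT)   # recovery data only: 6.0 hours

print(everything <= WINDOW_HOURS)      # False: the job would overrun and be cancelled
print(required_only <= WINDOW_HOURS)   # True: the job completes in the window
```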

    Note that network recovery needs to be included as part of the overall recovery scenario. Both local and remote redundant networks require a level of planning and testing well above that required by the lower classes of recovery. This will ensure that neither performance nor connectivity is compromised.

    Technology

    The technology for each class of recovery is determined by the technique required to support the desired recovery objective. Figure 6 details the technology according to recovery class.

    A Note on Business Continuity

    While often thought of as being one and the same, there is a critical distinction between disaster recovery and business continuity. A disaster recovery plan should be just one component of a broader business continuity strategy to keep business operations continuing as usual no matter what kind of disruption occurs, planned or unplanned. For further reading, please see CNT's paper titled "The IT Infrastructure of Business Continuity."

    Figure 5: classifying techniques used for each class of recovery

    Primary site

    Class 3 (critical): automatic failover capabilities; snapshot volumes of databases; no single points of failure

    Class 2 (vital): manual failover capabilities; enterprise backup and recovery solutions; split mirror database backups; online database backups

    Class 1 (non-vital): ad hoc backup and recovery solutions; static database backups; no failover capabilities

    Secondary site

    Class 3 (critical): real-time or near real-time data replication; hot failover servers (OS, app online and dedicated to production support); no single point of failure

    Class 2 (vital): remote tape vaulting; warm servers available for manual reconfig (OS and app loaded, used for non-production work)

    Class 1 (non-vital): cold servers, rebuilt at someone else's site; offsite tape storage

    Cost

    The cost associated with each class of recovery is driven by the technology needed to employ the appropriate recovery technique for that class (see Figure 7). In general, the more critical the business function, the more expensive the recovery solution.

    When the cost of recovery is deemed to be too high, the business requirements should be reassessed to see whether or not any business functions could be reclassified to a lower recovery class. All too often, this process fails because instead of reassessing the business requirements, the technology is compromised without regard for the business functions. This jeopardizes the integrity of the recovery class for the business functions in that class. Those in charge of the business functions must decide what level of risk is acceptable to the company, and the assumption of risk should never be made solely on the basis of budget constraints.

    Finding the optimum cost of protection is rarely an easy task. How much is enough? Most companies merely look at the cost associated with the technology needed to protect the business functions. A simple benefit analysis would help justify the cost of protection.

    Figure 6: determining technology used for each class of recovery

    Local

    Class 4: fault tolerant hardware

    Class 3: server clustering; WAN server clustering; enterprise storage solutions; redundant networks; networked storage

    Class 2: departmental storage solutions; duplicated hardware configs; automated tape library; dedicated backup networks

    Class 1: JBOD; locally attached tape drives; manual tape movers

    Remote

    Class 4: fault tolerant hardware; transaction replication software

    Class 3: duplicate hot sites; WAN server clustering; storage controller-based disk replication; storage virtualization replication; software based replication; application based replication

    Class 2: co-location sites; departmental storage solutions; duplicated hardware configs; remote accessed automated tape library; dedicated backup networks

    Class 1: rented recovery sites; tapes stored at rented off-site location

    To do this, first determine the expected loss if there is an interruption to a business function and no protection in place (Ln). Then calculate the expected loss due to an interruption of a business function with protection in place (Lp). The purported benefit of protection for a business function is represented as follows: Benefit = Ln - Lp.

    This is only one method of trying to justify the cost associated with protection. Quantifying the expected loss in the event of a business function interruption should always be done and used as one factor in determining what the cost of protection should be for that business function.
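
    A short worked example of the benefit formula follows. The loss and cost figures are hypothetical, and comparing the benefit with the cost of protection is an assumed decision rule added for illustration; the paper itself defines only Benefit = Ln - Lp.

```python
def protection_benefit(loss_unprotected: float, loss_protected: float) -> float:
    """Benefit = Ln - Lp, as defined above."""
    return loss_unprotected - loss_protected


# Hypothetical figures, all in the same currency unit.
ln = 500_000     # expected loss with no protection in place (Ln)
lp = 50_000      # expected loss with protection in place (Lp)
cost = 200_000   # cost of the protection (not part of the formula above)

benefit = protection_benefit(ln, lp)
justified = benefit > cost   # assumed rule: protection should save more than it costs

print(benefit)     # 450000
print(justified)   # True
```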

    Assessing the classes of recovery

    Once the business requirements have been run top-down through the process and the appropriate technologies have been selected for each class of recovery, it is necessary to assess the current technology of each business function and assign each one a specific class of recovery. Doing this provides a view of the present class of recovery capability versus the class of recovery needed by the business. If the current class of recovery is greater than that required by the business function, the level of protection is excessive. On the other hand, if it is lower than the required class of recovery, the level of protection is deficient and needs to be improved. Comparing the differences in technology between the classes of recovery for each business function makes it easy to determine what needs to change in order to achieve the required class of recovery. Only at this point can a realistic upgrade plan be developed.
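
    The comparison described above amounts to a simple gap report. In this sketch the inventory of business functions is hypothetical, and the label "adequate" for a matching class is a term added here for illustration.

```python
def assess(current_class: int, required_class: int) -> str:
    """Compare the class a function has today with the class it needs."""
    if current_class > required_class:
        return "excessive"
    if current_class < required_class:
        return "deficient"
    return "adequate"


# Hypothetical inventory: function name -> (current class, required class)
inventory = {
    "order entry": (2, 3),
    "payroll": (2, 2),
    "intranet wiki": (3, 1),
}

report = {name: assess(cur, req) for name, (cur, req) in inventory.items()}
for name, verdict in report.items():
    print(f"{name}: {verdict}")
```

    Functions flagged as deficient feed the upgrade plan, while those flagged as excessive are candidates for a cheaper class of technology.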

    Conclusion

    Following the classes of recovery methodology outlined in this paper is beneficial to a company for several reasons. First, it ensures that recovery methods selected for each business function are driven by the needs of the business. Second, it allows a company to easily assess its current recovery capabilities and develop a viable strategy to correct deficiencies in its existing recovery scheme. Basing the classes of recovery on the amount of risk that can be assumed by each business function ensures that the expense required to preserve the business correlates directly to its importance to the company.

    Further reading

    Another paper in this series, "Evaluating Your Exposure," details the process of cost justifying business continuity. It begins by discussing factors to consider in establishing the cost of disrupted service for a business function. It also describes a study known as a business impact analysis. This paper is co-authored by CNT's strategic partner in business continuity, Strohl Systems, experts in business impact analysis and business continuity planning.

    Figure 7: costs associated with each class of recovery

    Class 3 (critical): Most expensive to implement. Least loss incurred from disaster.

    Class 2 (vital): Moderate expense to implement. Moderate loss incurred from disaster.

    Class 1 (non-vital): Least expensive to implement. Highest loss incurred from disaster.

    After establishing the desired recovery class of your application environments, and after justifying the costs by understanding your exposure, you are ready to talk technology. Additional white papers in this series focus on current technologies necessary to achieve specific recovery objectives. One, "Primary Site Recovery Techniques," focuses on business continuation technology within the primary data center. A second paper, "Secondary Site Recovery Techniques," focuses on continuation technology at an alternate location. The technology solutions will be compared against others available within a given recovery class. These comparisons will give consideration to costs to implement and complexity to manage.

    CNT has nearly two decades' worth of experience assessing, designing, and deploying IT solutions to support business continuity objectives. Our professional consulting organization can help you effectively evaluate and plan your optimal solution. From business continuity architecture assessments, design, and integration, to remote network management and support, we help you streamline the decision making process, accelerate technology deployment, and meet your IT recovery objectives.

    CNT is one of the world's largest providers of comprehensive storage networking solutions. For over 20 years, our experts have analyzed, designed, and built enterprise storage networks.

    Visit www.cnt.com to learn about our solutions, products, partnerships, career opportunities, and more.

    © 2003 by Computer Network Technology Corporation (Nasdaq: CMNT). All rights reserved. Any reproduction of these materials without the prior written consent of CNT is strictly prohibited. CNT, the CNT logo, Channelink, and UltraNet are registered trademarks of Computer Network Technology Corporation. All other trademarks identified herein are the property of their respective owners. CNT is an equal opportunity employer. CNT corporate headquarters QMS is registered to ISO 9001:2000. Certificate #006765.

    USA: 1-800-638-8324
    Canada: 905-595-1500
    UK: 44-1753-792400
    France: 33-1-4130-1212
    Australia: 61-2-9540-5486
    Germany: 49-89-42 74 11-0
    Switzerland: 41-1-73 35-733
    Belgium: 32-2-737 76 42
    Italy: 39-06-514931
    Brazil: 55-11-5509-1504
    Japan: 813-5403-4858
    Other locations: 1-763-268-600

    PL563 | 0803