NotesOn Risk Management Datacenter Assessment - Part II


  • 8/16/2019 NotesOn Risk Management Datacenter Assessment - Part II

    1/13

     

    © DP Harshman All Rights Reserved  Page 1 of 13 www.fromtheranks.com 

    NotesOn: Risk Management – Datacenter Assessment – Part II

    Introduction (V1.2):

    As promised in my Datacenter Assessment – Part I post, here is the first part of the Datacenter Assessment list, containing details on how to properly assess the risk levels of one or several datacenters. Plan on spending some time working through each section. The second part will be along soon. If you have any questions, be sure to ask via a comment or use my “Contact Us” page (for some reason the most popular method).

    Introduction (V1.2)
    Refs
    Background
    The Risk Scale
    The Datacenter Assessment
    Power
    Cooling
    Fire Detection/Suppression
    Security/Monitoring
    Location/Distance
    Additional Notes
    Summary

    Refs:

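    NotesOn: Risk Management – Datacenter Assessment – Part III (http://www.fromtheranks.com/2402/risk-management/risk-management%E2%80%93datacenter-assessment%E2%80%93part-iii-noteson/)

    NotesOn: Risk Management – Datacenter Assessment – Part I (http://www.fromtheranks.com/2086/risk-management/risk-management-datacenter-assessment-part-i-noteson/)

    NotesOn: Risk Management – Disaster Recovery & Business Continuity Essentials (http://www.fromtheranks.com/1925/risk-management/risk-management-disaster-recovery-business-continuity-essentials-noteson/)

    NotesOn: Risk Management – Disaster Recovery & Business Continuity Definitions (http://www.fromtheranks.com/1987/risk-management/risk-management-disaster-recovery-business-continuity-definitions-noteson/)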

    Background:

    In the DR & BC Essentials post (linked above), under the “Disaster Proofing Steps – At A High Level” section,

    “Step One” was to “do, or have done, a DR Assessment to ferret out, dig out, all of the DR risks in IT.”

    That is a tall order, a lot to ask, so I broke it down into three areas for the initial  assessment (as opposed to

    deep dive analyses). The general areas are:


    1.  The datacenter(s)

    a.  The assessment first looks at the “brick and mortar” facilities, and essential services, provided by the vendor (unless your company has built its own). This is the foundation upon which everything else in IT rests.

    b.  The critical services provided by the datacenter – this part of the assessment looks, at a high level, at the services provided by the vendor and/or your own company’s datacenter team, often referred to as infrastructure. Later on, a focused analysis will be done, but the first pass helps to point out trouble areas and single points of failure (SPOFs).

    2.  Next, the initial assessment looks at the most (or believed to be most) critical systems supported by the datacenter and the infrastructure. This zeroes in on the “heart burn”, “high pain” systems that keep the business “awake at nights” (to be of value, of course, the DR systems must then be built out).

    3.  Finally, during the assessment process, there are, almost invariably, “additional findings” that need to be raised and flagged. There is no set list for these, though one that often arises, “No DR Czar”, has since been added to my DC Assessment master list; you’ll be seeing sections of this shortly.

    It is important to segregate these assessment areas into discrete inquiries; otherwise the entire subject of DR becomes muddied and executive questions such as “What exactly are we trying to solve?” start to arise. Some refer to that approach as “boiling the ocean”. It is not the intent of the initial assessment to “boil any ocean” but to stay very tightly focused on the essentials, the fundamental components that allow IT to provide its services to its users to help them do “it” better, whatever “it” is.

    One last note, then we’ll dive into the details of how to do Step 1a above, the primary focus of this post. I refer to this part of my job as the “initial assessment” because detailed analyses will need to be done later. This first “go around” has the sole purpose of letting IT and the Business know where they stand, DR-wise. It paints a broad, but accurate, picture that helps every individual involved to determine the “next steps”.

    Without this process you could very well end up remediating risks of lesser importance, leaving the gaping

    holes untouched.

    The Risk Scale:

    Before beginning, it is best if you understand the Risk Scale that I use when doing this assessment. The process is different from the one that uses the “Risk Categories” as described in NotesOn: Risk Management – Disaster Recovery & Business Continuity Definitions under the “Business Impact Analysis (BIA)” section.

    I use Risk Categories during my Disaster Impact Analyses (DIAs), as a DIA is focused on estimating the impact “on the business” of the loss of the system being analyzed, and one needs to be very specific as to the type of impact. Though I’m not going to address the Risk Category based process here (I will cover this and DIAs in a future post), the resulting Risk Scale used there is identical to the Risk Scale used during DR Assessments.


    What is most important to understand is that in a DR Assessment one does not focus on the “Probability” of a DR event occurring but on the “Likelihood” of a DR event exploiting the identified risk and the “Impact” thereof. This is because, by the time one is doing DIAs on each system, the “DR Scenario” has almost always been established and thus the probability of a DR event has been accounted for (ex: “the datacenter is lost in an earthquake”, “downtown is unavailable for six to eight weeks”, etc.). Thus you can focus wholly on the likelihood of the incident/scenario causing a loss and the severity of the impact of the loss.

    Risk Scale (rows = Impact, columns = Likelihood):

    Impact    None              Low               Medium            High
    High      None (Green)      Medium (Orange)   High (Red)        High (Red)
    Medium    None (Green)      Low (Yellow)      Medium (Orange)   High (Red)
    Low       None (Green)      Low (Yellow)      Low (Yellow)      Medium (Orange)
    None      None (Green)      None (Green)      None (Green)      None (Green)

    Legend:

    High (Red) = Material Risk
    Medium (Orange) = Significant Risk
    Low (Yellow) = Moderate Risk
    None (Green) = Acceptable Risk

    The definitions for the Risk Results are as follows:

    Risk Level (color code): Definition

    Material Risk (Color code = Red): Findings with the highest risk level, indicating a material departure from Disaster Recovery (DR), Business Continuity (BC) and/or IT Audit best practices and standards, i.e. a decided “fragility” in the ability to respond to a disaster event. A material risk reflects a potential immediate or future harmful impact to the company. Remediation of material risks should have the highest priority.

    Significant Risk (Color code = Orange): Findings that represent a substantial departure from DR, BC and/or IT Audit best practices and standards. A Significant Risk while serious is not as critical as a Material Risk as it poses a lesser probability of, and degree of, immediate or future harm to the company. Remediation is required but at a lower priority than Material Risks.

    Moderate Risk (Color code = Yellow): Findings which are meaningful but potentially less harmful and with a lower probability of detrimental impact on the company. Remediation should occur at the earliest opportunity as a preventive measure but after Material and Significant Risks are mitigated.

    Acceptable (Color code = Green): Observations that generally adhere to DR, BC and/or IT Audit best practices and standards. The potential negative impact on the company is considered immaterial or consistent with acceptable business risks. Little or no remediation required.

    Note: as risks can be cumulative, i.e. because of surrounding risks a “Yellow” may be upgraded to an “Orange”, the reduction of a risk may have the reverse effect of reducing the Risk Level of another or other assessed items. Thus, addressing Material Risks first brings about the greatest collateral improvement.

    To put it another way, Risk Levels are assigned based on an analysis of the likelihood of a disaster event exposing a risk and the subsequent potential impact on the business (and its customers) if it is exposed.
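    For readers who prefer to see the Risk Scale as a lookup, here is a minimal Python sketch. It assumes one reading of the matrix above: four Impact rows by four Likelihood columns, with the trailing entry on each row treated as a legend. The names RISK_MATRIX, RISK_NAMES and risk_level are illustrative only and are not part of the survey template.

```python
# Hypothetical encoding of the Risk Scale; all names here are illustrative.
LEVELS = ("None", "Low", "Medium", "High")

# Rows are Impact; columns are Likelihood in the order None, Low, Medium, High.
RISK_MATRIX = {
    "High":   ("None", "Medium", "High",   "High"),
    "Medium": ("None", "Low",    "Medium", "High"),
    "Low":    ("None", "Low",    "Low",    "Medium"),
    "None":   ("None", "None",   "None",   "None"),
}

# Legend: matrix result -> (risk level name, color code).
RISK_NAMES = {
    "High":   ("Material Risk", "Red"),
    "Medium": ("Significant Risk", "Orange"),
    "Low":    ("Moderate Risk", "Yellow"),
    "None":   ("Acceptable Risk", "Green"),
}


def risk_level(impact, likelihood):
    """Return (risk level name, color) for an impact/likelihood pair."""
    if impact not in RISK_MATRIX or likelihood not in LEVELS:
        raise ValueError(f"unknown rating: {impact!r}, {likelihood!r}")
    result = RISK_MATRIX[impact][LEVELS.index(likelihood)]
    return RISK_NAMES[result]


print(risk_level("High", "Low"))     # ('Significant Risk', 'Orange')
print(risk_level("Medium", "High"))  # ('Material Risk', 'Red')
```

    The sketch does not model the cumulative effect mentioned in the note above; upgrading a finding because of surrounding risks remains a judgment call for the assessor.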


    Keeping the Risk Scale in mind at all times, let’s start tearing apart (and putting back together) the Datacenter

    Assessment.

    The Datacenter Assessment:

    This portion of the overall DR assessment is broken down into ten sections, or categories. They are:

    1.  Power

    2.  Cooling

    3.  Fire Detection/Suppression

    4.  Security/Monitoring

    5.  Location/Distance

    6.  Internet/Communications Feeds

    7.  Expansion Room

    8.  Technical Support

    9.  Disaster Recovery/Bus. Continuity Plans

    10. Misc. (but no less important)

    We’ll take these one by one, basing each discussion around the section of the Datacenter Survey Template

    that I’ve built and refined over the years (and will continue to refine, so check back from time to time for

    updated versions).

    Note: due to the length of the document and the number of graphics (my post editor seems to have an issue

    with overly long docs with lots of images) I have split the Datacenter Assessment List into two parts.

    This document, Part II in the series, will thus cover list categories 1 through 5. Part III in this series will address

    categories 6 through 10.


    Power:

    Before anything else can work in a datacenter there must be power: clean, dependable, reliable power. This is your number one survey area. Not to slight the others, but without this (and cooling) there is almost no point in going further.

    •  Are there at least two (2) power supply entrances?

    Rationale: if one goes down you still have a “normal” power supply.

    •  If there are two or more power entrances, are they from two, or more, substations?

    Rationale: if the transformers, etc., at one substation go down, the other power supply entrance should still be “hot”. Think of resiliency as being a defense in layers.

    •  If there is only one substation, how far is it from the datacenter?

    Rationale: if the substation is “right on top of” the datacenter, one event could very well take out both, extending the recovery period. Also, it helps to know the substation’s name when making inquiries to the power company.

    •  Are there diesel generators and fuel supplies in place to power the entire datacenter at full capacity for a

    minimum of 3 days?

    Rationale: if external power supplies fail, the datacenter can continue to operate and at least maintain the

    equipment, as long as the fuel supplies last and then as long as the supplier(s) is able to make deliveries.


    •  Are the generators, and Automatic (power) Transfer Switches, tested? Regularly?

    Rationale: more than once, gear assumed to be operational proved to be anything but. The only way to

    ensure DR related systems will work is to test them. I know of at least one set of switches that, when

    finally tested, did a complete (expensive) melt-down.

    •  Are there sufficient Uninterruptible Power Supplies (UPSs) and Line Conditioners to cleanly filter and supply the incoming power, no matter what its source?

    Rationale: power surges and brown-outs are not unusual, particularly (though not exclusively) in industrial areas, a common location for datacenters. This line noise must be filtered before it gets to IT’s gear or “mysterious” errors and outright hardware failures will occur.

    •  What is the UPS runtime for the entire facility at current loads? At max loads?

    Rationale: assuming all power sources have been extinguished, all of the IT groups in the DC need a minimum amount of time to “spin down” their hardware. Failure to do so can result in damage to, or complete loss of, one or more systems.

    •  How often are the UPSs and their batteries inspected and/or replaced? Are the inspection records available for review?

    Rationale: batteries have a “life”. If the DC personnel push the batteries towards the end of their life span before replacement, they will most likely fail when needed the most. The minimum is battery replacement every three years, with testing at least semi-annually.
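    The Power questions above carry a few hard minimums (two entrances, two or more substations, 3 days of generator fuel, three-year battery replacement, semi-annual testing). As a rough sketch, those minimums could be checked in Python as follows. The PowerSurvey record and its field names are hypothetical, invented for illustration; only the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class PowerSurvey:
    """Hypothetical record of a site's answers; field names are illustrative."""
    power_entrances: int
    substations: int
    generator_fuel_hours: float          # runtime at full capacity on stored fuel
    battery_age_years: float
    months_between_battery_tests: float


def power_findings(s):
    """Return the Power minimums from the survey that this site misses."""
    findings = []
    if s.power_entrances < 2:
        findings.append("fewer than two power supply entrances")
    if s.substations < 2:
        findings.append("entrances not fed from two or more substations")
    if s.generator_fuel_hours < 3 * 24:
        findings.append("generator fuel covers less than 3 days at full capacity")
    if s.battery_age_years > 3:
        findings.append("UPS batteries older than three years")
    if s.months_between_battery_tests > 6:
        findings.append("UPS batteries tested less often than semi-annually")
    return findings


# Example site: one substation, 48 hours of fuel, batteries tested yearly.
site = PowerSurvey(2, 1, 48.0, 2.5, 12)
for finding in power_findings(site):
    print("FINDING:", finding)  # prints three findings for this example
```

    A list of findings, rather than a single pass/fail flag, keeps each item available for its own rating on the Risk Scale.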

    Cooling:

    As with Power, if your prospective (or current) datacenter has insufficient cooling capability there is virtually

    no good reason to continue the survey further. Especially these days, with high density racks of equipment

    being “the norm” rather than the exception, cooling is critical. Correction: the right type of cooling is critical.

    We could, and I probably should, spend an entire post on this subject alone but I’ll hit the “high points” here.

    •  Does the DC as a whole, and your “cage” specifically, have redundant cooling systems (N+1 is a minimum)?

    Rationale: to state the obvious, rooms full of computer gear generate heat, a good deal of it. Without proper cooling, even with very high MTBF (Mean Time Between Failures) rated equipment, your computer


    gear will quickly shut down or fail outright. The higher the heat density in the rack, the greater the risk. The less cooling capability, the greater the risk. High heat × low cooling = hardware failure.

    •  Have you verified, by personal inspection, the type of cooling used and that it is N+1?

    Rationale: looking supersedes believing. One datacenter promised “the world” and then inserted the customer’s systems into an already resource-exhausted facility. Not only was half of the N+1 incapable of cooling their equipment, the full N+1 wasn’t capable either.

    •  If you are planning on “high density” racks (i.e. approaching or exceeding 30 kW [kilowatts] of power

    consumed per rack = 300 100W light bulbs) does the DC provide hot aisle / cold aisle cooling

    configurations or equivalent?

    Rationale: to save on DC space and equipment costs IT continues to pack more “computing horse power”

    into less space. This architecture requires tightly controlled “hot spot” free redundant cooling systems.

    •  What type of cooling architecture (perimeter, row, rack, hot/cold aisle, or a combination) is used

    throughout the datacenter or at least in your cage?

    Rationale: if it’s a new cage you can build it out as you wish; if you are considering taking over an “old one”, your choices may be limited. Ensure the choice available, or the choice you select, is appropriate for current and future needs. Repairing or repurchasing hardware (and replacing the data lost) is almost always more expensive (on several levels) than the cooling bill.

    •  In “high density” areas are cabling and A/C ducting in separate “chases” so as not to interfere with the

    optimal flow of cold and hot air supplies and returns?

    Rationale: raised floors and isolated rows can work as long as the air channels to and from the racks are

    not clogged with cabling, piping, other ducting, etc.
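    One way to make the N+1 question concrete is to ask whether the cooling units can still carry the heat load after the largest single unit is lost. The Python sketch below shows that check. It assumes, as a simplification, that all power drawn by the racks ends up as heat to be removed; the unit capacities are made-up figures and the function name is illustrative.

```python
def survives_single_failure(unit_capacities_kw, heat_load_kw):
    """True if the remaining units still cover the load after losing the largest one."""
    if len(unit_capacities_kw) < 2:
        return False  # a single unit has no redundancy at all
    remaining = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return remaining >= heat_load_kw


# Four racks at 30 kW each (the "high density" figure used in the text).
heat_load_kw = 4 * 30

print(survives_single_failure([70, 70, 70], heat_load_kw))  # True: 140 kW remain
print(survives_single_failure([70, 70], heat_load_kw))      # False: only 70 kW remain
```

    Removing the largest unit, rather than an average one, is the conservative choice: if the site passes that case it passes the loss of any single unit.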


    Fire Detection/Suppression:

    It doesn’t happen often, but it does happen. For any of a number of reasons (though typically it’s hardware or an electrical panel shorting out) some or all of a datacenter will catch fire. With a proper, effective detection and suppression system the damage can be minimized. Improper, ineffective systems can do as much damage as the fire, or more.

    •  What type of fire detection and suppression system(s) is in use?

    Rationale: there are a few right ways to handle this and any number of not so right ways.

    •  Is the fire detection system automatic?

    Rationale: if the datacenter relies on manual fire detection and physical response for suppression the

    “human factor” intrudes and can result in extensive loss of equipment or the entire datacenter. Not all

    datacenters have auto detect/suppression systems.

    •  Does the triggering of the fire detection system automatically initiate a power shutoff, before fire

    suppression begins?

    Rationale: water, and some extinguishing agents, when injected onto, or into, electrically live systems will

    most likely destroy those systems.

    •  Is the fire detection/suppression system primarily based on standard water sprinkler systems?

    Rationale: as a final option, water-based systems are “okay” if electrical power is cut off first, but a staged system, where the first response is a localized clean-agent extinguisher, is preferred.

    •  How often is the entire fire detection/suppression system inspected? Are the inspection records available

    for review?

    Rationale: obvious
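    The questions above imply an order of events: automatic detection, then power shutoff, then a clean-agent first response, with water as the final option. As an illustration only, the Python sketch below checks a recorded sequence of events against that order. The event names are hypothetical labels chosen for this example, not terms from any fire system.

```python
def sequence_problems(events):
    """Check a list of response events (earliest first, each name at most once)."""
    position = {name: i for i, name in enumerate(events)}
    # If power was never shut off, treat the shutoff as happening after everything.
    power_off_at = position.get("power_off", len(events))
    problems = []
    if "detect" not in position:
        problems.append("no automatic detection event")
    for agent in ("clean_agent", "water"):
        if agent in position and position[agent] < power_off_at:
            problems.append(agent + " discharged before power shutoff")
    if "water" in position and "clean_agent" not in position:
        problems.append("water used without a clean-agent first response")
    return problems


print(sequence_problems(["detect", "power_off", "clean_agent", "water"]))  # []
print(sequence_problems(["detect", "water", "power_off"]))  # two problems reported
```

    The last check reflects a preference rather than a failure: water after a power shutoff is described above as “okay”, so that finding would normally rate lower than a discharge onto live equipment.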


    Security/Monitoring:

    Security is an important factor in selecting a datacenter. Hacking is not the only way that information is illicitly, illegally, extracted from your systems. For an intruder, there is nothing better than having direct access to your servers and being able to create a direct conduit from them, with an “unseen” network cable, to an unknown device. Then there is the issue of the angry ex-employee, or the case of the curious visitor doing damage.

    Security also covers the monitoring of your hardware. The newer boxes have temperature sensors built in that signal out-of-spec conditions. A/C systems can have sensors built in that indicate over/under conditions. And so on. If someone is monitoring these sensors (as well as making occasional “mark one eyeball” inspections), the risk of loss is significantly reduced. Some datacenters also provide intrusion detection and prevention capability as a first line in negating hack attempts.

    •  Is there a Network Operations Center [NOC] for that datacenter? Is it local, or remote? If remote are

    there operations personnel physically on hand 24x7 at that datacenter?

    Rationale: datacenters contain high voltage, high heat, highly expensive equipment and, most often,


    highly sensitive data. A team monitoring the datacenter is a must. An on-site incident response team is an

    absolute requirement.

    •  Is the datacenter monitored by the NOC on a 24x7x365 basis?

    Rationale: see above.

    •  Is there at least one remote backup NOC?

    Rationale: if the local NOC “goes down” a remote location may still be able to monitor the datacenter itself.

    •  Is the datacenter anonymous, i.e. do you have to know where it is and have access to it before you know it

    is a datacenter?

    Rationale: the less obvious it is, the less likely the datacenter is to be hacked, attacked or breached in some fashion.

    •  Are there multi-zone security cameras encompassing the entire exterior of the facility, including all access points?

    Rationale: cameras on “just the main entrance” are insufficient.

    •  Are there security cameras within the datacenter focused on halls, stairs, etc.?

    Rationale: defense in layers. Someone may be authorized for their “cage” or to provide a specific service

    to one area but decide to “look around”.

    •  Are there redundant access systems (cards, biometrics, visual identification, etc) at each and every

    entrance?

    Rationale: just one method is insufficient; defense in layers is the best strategy.

    •  Are both access acceptance and rejection logs maintained and reviewable?

    Rationale: without access to the acceptance and rejection logs no “audit” of the security system is

    possible.

    •  Are intrusion detection and prevention systems (IDS/IDP) stood up by the Datacenter? Or are you

    required to provide your own IDS/IDP monitoring? Or both?

    Rationale: as the datacenter provides initial access to your network devices they can and sometimes do

    provide at least some layer of hacker and virus scanning security. In any case it needs to be clear who does

    what and when and how.

    •  Are cleaning, maintenance and delivery personnel properly vetted and monitored / escorted at all times?

    Rationale: obvious.


    Location/Distance:

    •  Is the datacenter in a high Risk Zone, such as:

        •  a fire zone (forests, woods, grassy plains, etc.)?

        •  a known geologically unstable region (earthquakes, susceptibility to rock slides, volcanic activity, etc.)?

        •  a known severe weather zone (tornado alleys, hurricane areas, avalanche zones, tsunami region, a zone that routinely receives isolating ice/snow storms, etc.)?

        •  an area susceptible to the side-effects of severe weather (flood plain, old river bottom, below a dam, etc.)?

    Rationale: should be obvious. No zone is without risks but intentionally putting your systems in a

    datacenter that has a fair to high probability of being in harm’s way may not be the best idea.

    •  If this is a secondary datacenter, is it located in the same Risk Zone? Or in another high Risk Zone?

    Rationale: if, for example, you have two datacenters in an earthquake fault riddled zone (Los Angeles and San Francisco both lie along the San Andreas Fault) one event could take out both, despite the distance between them. It is better to have the primary and secondary datacenters in different risk zones. It is best if neither is in a risk zone of any significance but, if that is not possible, your final decision must include an understanding of the risk levels of each zone. A “no risk” zone is hard to obtain, but at least pick a secondary risk zone that has no direct cause-effect connection to the primary datacenter’s zone.

    •  Is the datacenter located near or in the immediate vicinity of your primary office centers?

    Rationale: often, in large office facilities, there is a “mini” datacenter in the building to handle telecomm, networking, local file storage, etc. If the datacenter is too close to your primary business center, a singular DR


    event could “take out” or “take down” both leaving you few if any Business Continuity or DR options.

    Similar logic applies to the separation between a primary and a secondary datacenter.

    Note: My rule of thumb is a minimum of 200 miles of separation, with both sites in different types of risk zones, and with the risk zone evaluation being the senior datum. The absolute, bare-bones floor, never to be impinged on, is 20 miles of separation but, when it comes to many DR incidents, that is no separation at all.

    •  Is the datacenter located near or in the immediate vicinity of your tape storage location?

    Rationale: if the datacenter is “next door” to the tape vault site, the same DR event could take out both facilities, or make it difficult if not impossible for supporting personnel to get to not only the datacenter but also the tape vault location necessary to bring a datacenter back up, assuming one of the two is accessible.

    •  Is the datacenter’s immediate environment very high risk by virtue of being located:

        •  Within 1-2 miles of an airport, in any direction?

        •  Near an explosive materials plant (fireworks, ammunition, etc.)?

        •  Near any facility using/purveying easily combustible gases or fuels?

        •  Near railroad tracks or crossings, or interstate highways, where transportation of hazardous materials is common?

        •  Near a nuclear or other power plant?

        •  Near an active military base?

        •  Near a politically “sensitive” facility (city, county, state, federal government buildings, etc.) that could reasonably be deemed to be a likely terrorist target?

        •  Near or in a high crime area (increasing risk of theft, vandalism and/or risk to support personnel)?

    Rationale: sometimes “things” go “boom”, or fall out of the sky, or are targets for protesters, etc. It is best not to have a datacenter within reach of such accidental or manmade disasters. One datacenter I know of was located near the end of the major runway of a major metropolitan airport. Not a good idea from a risk management perspective. The mitigation was to move the datacenter, which we did.

    •  Is the datacenter located in a heavy industrial area subject to equipment generated brown-outs and

    surges?

    Rationale: constant brown-outs and surges are hard on transformers, line conditioners, UPS systems and

    IT’s equipment.

    •  Is the datacenter located on the “other side” of solitary, sole access, bridges or roadways, thereby

    potentially isolating it from all ingress and egress efforts?

    Rationale: there are known datacenters with “one road in and out” only. If, for instance, the road or

    bridge were destroyed in a flood, off-road vehicles or helicopters would likely be your only physical access

    option. Not to mention the sole options for all of the datacenter personnel.
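    The separation rule of thumb in this section (200 miles as the working minimum, 20 miles as the floor, with the risk zone evaluation taking priority) can be sketched as a small Python check. This is an illustration under stated assumptions: distance is straight-line great-circle distance computed with the haversine formula, the coordinates are approximate city-center values for the Los Angeles and San Francisco example, and the function names and rating strings are invented for this sketch.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance between two latitude/longitude points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def separation_rating(miles, same_risk_zone):
    """Apply the 20-mile floor, then the risk zone test, then the 200-mile rule."""
    if miles < 20:
        return "unacceptable: under the 20-mile floor"
    if same_risk_zone:
        return "flag: same risk zone, regardless of distance"
    if miles < 200:
        return "flag: under the 200-mile rule of thumb"
    return "meets the rule of thumb"


# Approximate coordinates (assumed values) for Los Angeles and San Francisco.
miles = great_circle_miles(34.05, -118.24, 37.77, -122.42)
print(round(miles), separation_rating(miles, same_risk_zone=True))
```

    The risk zone test is deliberately placed ahead of the 200-mile test: two sites well over 200 miles apart are still flagged if one event could plausibly reach both.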

    End Of Part I of the List


     Additional Notes:

    1.  If datacenter personnel are vague in their answers (and this has happened more than once) I “ding” them accordingly, i.e. their risk rating goes up. Vagueness is a “tell-tale” for “no” or “probably not”. Treat it accordingly.

    2.  It is very rare that someone is an expert in everything. So, taking Fire Detection and Suppression systems as an example, if you aren’t a “guru” in that area, bring along someone who is, or do your homework based on answers supplied by the datacenter, i.e. take each of their answers and research it to understand the technology, verify whether or not it is an appropriate solution and then rank it accordingly.

     Summary:

    We will continue, and conclude, the Datacenter Assessment List in Part III. Part II, however, should give you a

    good start on, and a good understanding of, the process of selecting the best available datacenter that meets

    your usage, and risk reduction, criteria.

    Hope this helps,

    DP Harshman
