CISSP Week 24

Business Continuity and Disaster Recovery Planning

Domain 8, CISSP Official CBK 3rd Edition, Pages 1092-1155

Tim Jensen, Jem Jensen, StaridLabs

Key Terms

Business Continuity Planning (BCP)

Disaster Recovery Planning (DRP)

Project Initiation and Management

First steps to building a BCP:

Obtain senior management support to go forward with the project

Define a project scope, the objectives to be achieved, and the planning assumptions

Estimate the project resources needed to be successful, both human resources and financial resources.

Define a timeline and major deliverables of the project

Assign a project manager for the initial creation of a BCP and DRP

Senior Leadership Support

Senior Leadership's major goals:

Execute the mission

Protect the organization

Risks you can point out to get buy-in:

Financial

Reputational

Regulatory (lawsuits)

Senior management could be held liable for failing to exercise due care to protect the corporation.

BCP and DRP plans can take a year or more to complete, so management support is critical to keep the process from being postponed halfway through.

Financial Risks

Can be quantified

Determines amount to spend on the recovery program

P * M = C

Probability of harm (P): how likely is a damaging event to occur?

Magnitude of harm (M): what is the financial damage for a single event?

Cost of prevention (C): the cost of putting a countermeasure in place. The cost of the countermeasure should not exceed the expected cost of the event (P * M).
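A minimal Python sketch of the P * M = C reasoning above: it computes the expected loss for a single event and checks whether a proposed countermeasure costs no more than that. The function names and dollar figures are hypothetical, chosen only for illustration.

```python
def expected_loss(probability: float, magnitude: float) -> float:
    """P * M: probability of a damaging event times its financial impact."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * magnitude


def countermeasure_justified(probability: float, magnitude: float, cost: float) -> bool:
    """The cost of prevention (C) should not exceed the expected loss (P * M)."""
    return cost <= expected_loss(probability, magnitude)


# Hypothetical figures: a 25% chance of an event that would cost $200,000.
print(expected_loss(0.25, 200_000))                     # 50000.0
print(countermeasure_justified(0.25, 200_000, 40_000))  # True
print(countermeasure_justified(0.25, 200_000, 75_000))  # False
```

A $40,000 countermeasure fits within the $50,000 expected loss; a $75,000 one does not.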

Additional Benefits of Planning

Locating single points of failure (SPOF)

Process Improvements

Dealing with technical incidents

Project Scope and Plan

It's very important to gain firm agreement on the scope and goals of the DRP and BCP:

Technology only, or include business processes?

Main office only or all offices?

Workforce impairment: pandemic, labor strike, transportation issues

Project manager must agree with leadership on scope, timeline, and deliverables.

Legal and Regulatory Requirements

Many industries have applicable regulations.

Recent regulations:

The 9/11 Commission Recommendations Act of 2007 (Public Law 110-53) recommends that private sector organizations validate their recovery readiness by comparing their programs to an unnamed standard (NFPA 1600 has been proposed)

Endorsed by the US Government, but voluntary

British Standard BS25999

The Ten Professional Practice Areas
NFPA 1600

Project Initiation and Management

Risk Evaluation and Control

Business Impact Analysis

Developing Business Continuity Strategies

Emergency Response and Operations

Developing and Implementing Business Continuity Plans

Awareness and Training Programs

Maintaining and Exercising Business Continuity Plans

Public Relations and Crisis Communications

Coordination with Public Authorities

BS25999

Extension of PAS 56

Intended to let organizations demonstrate compliance with the standard

Stage 1: audit including a desktop review

Must be completed before Stage 2

Stage 2: conformance and certification audit where the planner must demonstrate implementation

If implementation fails, corrective action must be agreed upon.

If both stages are completed successfully, the organization can apply for BS25999 certification.

US Financial Regulations

Federal Financial Institutions Examination Council (FFIEC) specifies that BCP is about maintaining, resuming, and recovering the organization, not just the technology.

The planning process must be conducted enterprise-wide.

BCP and test results should be independently audited and reviewed by the board of directors

The company should be aware of the BCP activities of its third-party providers, key suppliers, and organization partners.

If processes are outsourced, the service provider's BCP must be reviewed to ensure critical services can be restored within acceptable timeframes.

Additional Regulations:

National Association of Insurance Commissioners (NAIC)

National Futures Association Compliance Rule 2-38

Electronic Funds Transfer Act

Basel Committee

Other Regulations and Standards

Australian Prudential Standard CPS 232 (July 2012)

Requires that an institution's BCM include:

BCM Policy

Business Impact Analysis (BIA) including risk assessment

Recovery objectives and strategies

Business Continuity Plan (BCP) including crisis management and recovery

Review and testing of the BCP

Training and awareness

Monetary Authority of Singapore (June 2003)

Standard for Business Continuity/Disaster Recovery Service Providers (SS507), Singapore

Sets stringent standards for DR service providers

HIPAA

Requires a data backup plan, a disaster recovery plan, and an emergency mode operation plan

Sarbanes-Oxley Section 404

Applies to issuers required to file an annual report under Section 13(a) or 15(d) of the Securities Exchange Act of 1934 (15 U.S.C. 78m or 78o(d))

The annual report must contain an internal control report that:

States the responsibility of management for establishing and maintaining an adequate internal control structure and procedures for financial reporting

Contains an assessment, as of the end of the issuer's most recent fiscal year, of the effectiveness of the issuer's internal control structure and procedures for financial reporting.

Internal Control Evaluation and Reporting

BCP and contingency planning are not considered in scope

Legal Standards

Blake v. Woodford Bank & Trust Co. (1977)

Failure to prepare for a foreseeable workload

Sun Cattle Company, Inc. v. Miners Bank (1974)

Foreseeable computer system failure

United States v. Carroll Towing Co. (1947)

Defined breach of the duty of care as B < PL

B = burden (cost) of taking precautions

P = probability of loss

L = gravity of loss

P * L must be greater than B to create a duty of due care for the defendant
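The Carroll Towing test reduces to a single comparison. The Python sketch below encodes it for illustration; the function name and the generator example are hypothetical.

```python
def precaution_required(burden: float, probability: float, loss: float) -> bool:
    """Carroll Towing test: a duty to take the precaution exists when B < P * L.

    burden      -- B, the cost of taking the precaution
    probability -- P, the probability of the loss
    loss        -- L, the gravity of the loss
    """
    return burden < probability * loss


# Hypothetical: a $20,000 generator vs. a 5% chance of a $1,000,000 loss (P * L = $50,000).
print(precaution_required(20_000, 0.05, 1_000_000))  # True
print(precaution_required(60_000, 0.05, 1_000_000))  # False
```

If the cheaper precaution is skipped and the loss occurs, the test points toward a breach of due care; the $60,000 precaution exceeds P * L, so no duty arises under this test.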

Legal Standard Continued

Negligent Standard to Plan or Prepare (pandemic), 2003

Canadian nurses filed suit claiming the federal government was negligent in not preparing for the second wave after the disease was first identified.

Resource Requirements

Plan for both staff and financial resources

Staff resources

Need staff from business operations and technology groups (IT)

Identify recovery priorities

Identify required timeframes

Once timeframes are identified, plan staffing to meet them (e.g., if 24-hour recovery will be required)

The staff who plan the recovery must be the same team that executes it in the event of an incident.

Financial Resources

Funding may be required to:

Hire outside contractors/consultants

Pay for travel to offsite locations

Purchase hardware, software, etc.

Emergency Notification Lists

The BCP/DRP planner should build a contact list of critical staff and leadership.

The list should include, at a minimum:

Title, name, home phone, work phone, mobile phone

Tim also recommends including home addresses

Tim also recommends: Distribute the list and make sure everyone on the list has a physical copy offsite. Storing the list in a computer system housed onsite with no offline copies is stupid.
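One way to keep the notification list complete and easy to print for offsite copies is sketched below in Python. The fields mirror the minimum list above plus the home-address recommendation; the class, functions, and sample contact are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Contact:
    # Minimum fields from the list above; home_address follows Tim's recommendation.
    title: str
    name: str
    home_phone: str
    work_phone: str
    mobile_phone: str
    home_address: str = ""


def missing_fields(contact: Contact) -> list[str]:
    """Names of fields left blank, so gaps can be fixed before the list is printed."""
    return [f.name for f in fields(contact) if not getattr(contact, f.name).strip()]


def printable_list(contacts: list[Contact]) -> str:
    """Plain-text rendering for the physical copies each person keeps offsite."""
    lines = []
    for c in sorted(contacts, key=lambda person: person.name):
        lines.append(f"{c.title}: {c.name}")
        lines.append(f"  home {c.home_phone} | work {c.work_phone} | mobile {c.mobile_phone}")
        if c.home_address:
            lines.append(f"  {c.home_address}")
    return "\n".join(lines)


cio = Contact("CIO", "Pat Example", "555-0100", "555-0101", "555-0102")
print(missing_fields(cio))  # ['home_address']
print(printable_list([cio]))
```

Printing the output, rather than relying on the system that holds it, is the point: the copy has to survive an outage of that system.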

Vital Records

All vital records needed to rebuild the organization must be stored offsite in a secure location that can be accessed following a disaster.

This includes electronic data backups as well as paper record backups

Common Vital Records

Anything with a signature

Customer Correspondence

Customer Conversations

Accounting Records

Justification Proposals/Documents

Transcripts/minutes of meetings with legal significance

Paper with Value (Stock certificates, bonds, commercial paper)

Legal Documents (Letters of incorporation, deeds, etc)

Common Vital Records

Databases and contact lists for employees, customers, vendors, partners, etc

Business unit contingency plans

Procedure/application manuals

Backup files from production servers/applications

Reference documents used regularly

Calendar files or printouts

Source Code

Risk and Business Analysis

The planning team will make recommendations about which risks the organization should mitigate and which systems and processes the plan will recover and when.

Strategy Development

The planner will review different strategies for business recovery based on the required SLAs for critical systems.

A cost/benefit analysis determines whether each strategy is viable.

Alternate Site Selection and Implementation

The planner selects and builds out alternate sites used to recover the organization/technology.

Alternate sites shouldn't be susceptible to the same threats as the primary site.

Example: if Fargo is the main data center location, the backup site shouldn't be in Grand Forks. If one floods, the other is likely to flood at the same time.

Good resources:

www.prep4agthreats.org

www.switchlv.com/wp-content/uploads/disaster_avoidance_2013/disaster-map.html

Video Segue

Documenting the Plan

All of the information is compiled into a plan document.

Procedures are designed for each site and for each technology and/or application to be recovered.

Testing, Maintenance, and Updating

The plan must be validated by testing recovery.

A maintenance schedule must be established so the plan doesn't become obsolete.

Business Impact Analysis

The purpose of a BIA is to decide what needs to be recovered and how quickly.

Priority:

Critical

Essential

Supporting

Non-Essential

Must determine the maximum tolerable downtime (MTD), sometimes equated with the Recovery Time Objective (RTO); in practice, the RTO should never exceed the MTD.
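To illustrate how BIA output drives recovery order, the Python sketch below assigns the four priority levels from each function's MTD and sorts functions so the shortest MTD is recovered first. The hour thresholds and sample functions are hypothetical; each organization sets its own during the BIA.

```python
from dataclasses import dataclass

# Hypothetical MTD cut-offs in hours; anything longer is Non-Essential.
TIERS = [(24, "Critical"), (72, "Essential"), (7 * 24, "Supporting")]


@dataclass
class BusinessFunction:
    name: str
    mtd_hours: float  # maximum tolerable downtime


def priority(func: BusinessFunction) -> str:
    """Map a function's MTD onto the priority levels above."""
    for limit, label in TIERS:
        if func.mtd_hours <= limit:
            return label
    return "Non-Essential"


def recovery_order(funcs: list[BusinessFunction]) -> list[str]:
    """Recover functions with the shortest MTD first."""
    return [f.name for f in sorted(funcs, key=lambda f: f.mtd_hours)]


funcs = [
    BusinessFunction("Payroll", 72),
    BusinessFunction("Order entry", 4),
    BusinessFunction("Archive search", 720),
]
print(recovery_order(funcs))         # ['Order entry', 'Payroll', 'Archive search']
print([priority(f) for f in funcs])  # ['Essential', 'Critical', 'Non-Essential']
```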

Risk Assessments

Three elements of risk:

Threats

Assets

Mitigating Factors

Threats are measured as a probability (e.g., may happen once in 10 years).

Most common threat is power availability.

Second most common is a water event:

Flooding, plumbing leak, broken pipe, leaky roof, water main break

Other common threats: severe weather, cable cuts, fires, labor disputes, transportation mishaps, hardware failures.
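A threat rated at "1 in 10 years" can be turned into numbers a planner can compare. The Python sketch below converts that rating to an annual probability and, under the simplifying assumption that each year is independent, estimates the chance of at least one occurrence over a planning horizon. The function names are hypothetical.

```python
def annual_probability(one_in_years: float) -> float:
    """Convert 'may happen 1 in N years' into a probability per year."""
    if one_in_years <= 0:
        raise ValueError("one_in_years must be positive")
    return 1.0 / one_in_years


def chance_within(years: int, annual_p: float) -> float:
    """Probability of at least one occurrence over `years`, assuming independent years."""
    return 1.0 - (1.0 - annual_p) ** years


p = annual_probability(10)             # 0.1 per year
print(round(chance_within(5, p), 2))   # 0.41
print(round(chance_within(10, p), 2))  # 0.65
```

A "1 in 10 years" event is not guaranteed to occur within 10 years; under this model it has roughly a 65% chance.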

Internal Threats

Equipment fails prematurely:

Improper installation

Improper environment

Equipment fails due to wear and tear:

Most equipment has a mean time between failures (MTBF) rating.

Running equipment beyond its MTBF rating increases the risk of failure.
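A simple way to act on MTBF ratings is to compare each device's hours in service against its rating and flag the ones nearing or past it. The Python sketch below does that; the 90% warning threshold, the class, and the sample inventory are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Equipment:
    name: str
    hours_in_service: float
    mtbf_hours: float  # manufacturer's mean time between failures rating


def replacement_candidates(inventory: list[Equipment], warn_at: float = 0.9) -> list[str]:
    """Equipment that has used at least `warn_at` of its MTBF, most-worn first."""
    flagged = [e for e in inventory if e.hours_in_service >= warn_at * e.mtbf_hours]
    flagged.sort(key=lambda e: e.hours_in_service / e.mtbf_hours, reverse=True)
    return [e.name for e in flagged]


inventory = [
    Equipment("UPS battery string", 40_000, 35_000),
    Equipment("Core switch", 95_000, 100_000),
    Equipment("File server disk", 10_000, 1_000_000),
]
print(replacement_candidates(inventory))  # ['UPS battery string', 'Core switch']
```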

Assets

If the organization doesn't own anything then it won't be concerned about risks because it has little or nothing to lose. (Gotta love IT Security consulting!!!)

Assets include:

Information

Financial

Physical

Human

Mitigating Factors

Controls or safeguards that will be put in place to reduce the impact of a threat.

Example: UPS devices can protect production systems from hard crashes, which could lead to data loss and long recovery times.

When a risk is identified the planner must accept the risk, transfer the risk, avoid the risk, or mitigate the risk.

Mitigation Strategies

Accept: the risk is so unlikely to occur, or its impact so small, that mitigating it would cost more than accepting it.

Transfer: e.g., insurance

Avoidance: have compensating controls so the risk is completely removed. Example: having two call centers in very different climates. In the event of inclement weather at one, the other is still operational.

Mitigation: controls implemented to reduce the likelihood of the risk or to lessen its impact.

Define Recovery Objectives

Identify all the resources necessary to perform each recovery function

Recovery Strategy

Strategies are driven by the recovery timeframes

Surviving Site Strategy

A surviving site strategy is implemented so that while service levels may drop, a function never ceases to be performed because it operates in at least two geographically dispersed buildings that are fully equipped and staffed.

Self Service Strategy

An organization can transfer work to another of its own locations, which has available facilities and/or staff to manage the time sensitive workload until the interruption is over.

Internal Arrangement Strategy

Training rooms, cafeterias, and conference rooms may be equipped to support organizational functions while staff from the impacted site travel to another site and resume operations.

Mutual Aid Agreement Strategies

Other similar organizations may be able to accommodate those affected.

Dedicated Alternate Site Strategy

Built by the company to accommodate organizational function or technology recovery.

Work from Home Strategy

Workers can remote in

External Supplier Strategy

Pay an external company for disaster recovery.

These companies provide data centers, alternate site spaces, mobile units, and temporary staff.

Backup Storage Strategy

Data should be backed up at least once a day, with a copy sent offsite.

The offsite storage should be far enough away from your primary site to be safe and close enough to your recovery site to allow timely recovery operations to start.

Systems should be prioritized to make sure resources are available for the most critical systems and data.

A full backup is normally taken and then incremental backups occur every few hours or every day.
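With full plus incremental backups, a restore needs the most recent full backup and every incremental taken after it, since each incremental holds only the changes since the previous backup. The Python sketch below selects that chain for a target restore time; the `Backup` type and timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Backup:
    taken_at: datetime
    kind: str  # "full" or "incremental"


def restore_chain(backups: list[Backup], target: datetime) -> list[Backup]:
    """Latest full backup at or before `target`, then every later incremental up to `target`."""
    eligible = sorted((b for b in backups if b.taken_at <= target), key=lambda b: b.taken_at)
    full_positions = [i for i, b in enumerate(eligible) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup exists before the target time")
    start = full_positions[-1]
    return [eligible[start]] + [b for b in eligible[start + 1:] if b.kind == "incremental"]


backups = [
    Backup(datetime(2024, 3, 1, 0, 0), "full"),
    Backup(datetime(2024, 3, 1, 6, 0), "incremental"),
    Backup(datetime(2024, 3, 1, 12, 0), "incremental"),
    Backup(datetime(2024, 3, 1, 18, 0), "incremental"),
]
chain = restore_chain(backups, datetime(2024, 3, 1, 13, 0))
print([b.taken_at.hour for b in chain])  # [0, 6, 12]
```

Losing any one incremental in the chain breaks every restore point after it, which is one reason offsite copies need the same care as the full backup.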

Recovery Site Strategies

Dual Data Center

Applications are load balanced or hot swapped between two data centers so downtime is minimized.

Each data center should be able to operate at full load.

Internal Hot Site

The site is on standby with all necessary technology and equipment already in place.

Often used as dev/test until recovery is needed, at which time dev/test is removed and production is implemented.

Hardware, software, etc. should be identical to production.

External Hot Site

Equipment is installed and waiting, but the environment must be rebuilt for recovery.

Often contracted through a recovery service provider.

Equipment and software should be kept as close to identical as possible to speed recovery.

Warm Site

A leased or rented facility which is partially configured with some equipment, but not the actual computers.

Generally has cooling, cabling, and networking in place.

Servers are delivered to the site at the time of the disaster.

Cold Site

Empty data center space with no technology.

All technology must be acquired at the time of a disaster.

Mobile Sites

A mobile home or sea cargo trailer with a data center inside that can be dropped off, hooked up, and made ready to go.
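Because site choice is driven by the recovery timeframe, one rough way to shortlist options is to pick the cheapest site type whose typical time to become operational still meets the requirement. The hour figures in this Python sketch are hypothetical; real values come from the BIA, the cost/benefit analysis, and provider contracts. Internal vs. external hot sites and mobile sites are not distinguished.

```python
# Hypothetical typical hours to become operational, cheapest option first.
SITE_OPTIONS = [
    (168, "Cold site"),
    (72, "Warm site"),
    (12, "Hot site"),
    (0, "Dual data center"),
]


def suggest_site(required_recovery_hours: float) -> str:
    """Cheapest site type whose typical time to operational meets the requirement."""
    for hours_to_operational, site in SITE_OPTIONS:
        if hours_to_operational <= required_recovery_hours:
            return site
    raise ValueError("required recovery time cannot be negative")


print(suggest_site(24))  # Hot site
```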

Processing Agreements

Organizations can contract with other organizations for data processing.

Reciprocal Agreements

Similar organizations can share the risk of an outage by hosting the data and processing of the other organization in the event of a disaster.

Raises many contractual, legal, and compliance issues, depending on what data you process.

Outsourcing

Business processes can be outsourced entirely.

Multiple Processing Sites

Multiple sites inside the organization can be used for processing.

Useful if the company is spread throughout the country or world.

Runs into bandwidth and latency issues.

Disaster Recovery Process

When things are going bad, people get stressed and make bad decisions

Document the plan!

Clear instructions on who will do what and when

Consistent regardless of the event

Define communication strategy

Distribute to everyone who has a role in recovery

Test/verify the plan


Response

Assessment team: evaluates the event and escalates to the appropriate people if needed

Escalation team: contacted by the assessment team

Consists of the event owner, responders, and stakeholders

Emergency notification lists

Response teams must be reachable 24x7

Must be reachable by everyone in the organization

Should be used for every event, from plumbing leaks to Godzilla attacks


Emergency Management Team

Provide management (short-term tactical command)

Assess damage, keep executives in the loop, initiate and organize response

Executive Team

Senior executives

Respond to issues that need direction

Handle PR

Provide leadership (long-term strategic direction)


Emergency Response Team

Execute the recovery process

Retrieve recovery info (potentially offsite)

Communicate with command center

Work with alternate site personnel

Identify/Install replacement equipment or software

Command Center

Should have a copy of the plan so it can ensure the plan is being followed correctly

Should track what's being done and what it costs


Communications

Important to keep everyone informed

Emergency notification list

Team members/managers who disseminate notifications

Contingency line (e.g., phone number printed on badges)

Single number to call to get the latest info

PR

Important that everyone tells the same story

Keep things short and honest

Multiple communication channels

Could actually reduce confusion (e.g., techs on their own conference bridge, since some jargon sounds scary)

Assessment

Process to rate severity of events

Tiered categories like:

Non-incident: limited or no disruption

Incident: causes downtime for a facility or service

Trigger the disaster recovery plan; report to senior management

Severe incident: significant destruction or disruption

Trigger DR; contact senior management and crisis management
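The tiered categories above can be encoded so every responder applies them the same way. In this Python sketch the two yes/no inputs are a hypothetical simplification of the assessment team's judgment; the actions for each tier come from the categories listed.

```python
from enum import Enum


class Severity(Enum):
    NON_INCIDENT = "Non-incident"
    INCIDENT = "Incident"
    SEVERE_INCIDENT = "Severe incident"


# Required actions per tier, taken from the categories above.
ACTIONS = {
    Severity.NON_INCIDENT: [],
    Severity.INCIDENT: ["Trigger DR plan", "Report to senior management"],
    Severity.SEVERE_INCIDENT: ["Trigger DR plan", "Contact senior management",
                               "Contact crisis management"],
}


def assess(causes_downtime: bool, significant_destruction_or_disruption: bool) -> Severity:
    """Pick the highest tier whose condition is met."""
    if significant_destruction_or_disruption:
        return Severity.SEVERE_INCIDENT
    if causes_downtime:
        return Severity.INCIDENT
    return Severity.NON_INCIDENT


tier = assess(causes_downtime=True, significant_destruction_or_disruption=False)
print(tier.value, ACTIONS[tier])  # Incident ['Trigger DR plan', 'Report to senior management']
```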

Restoration

Planned event after recovery

Interim plans (example):

Part of the DR plan was to set up an alternate site

Work from the alternate site until the original site is restored

Slowly transition back to original site

After everything is back at the original site, dismantle the alternate site

Training

Awareness program

Make sure everyone knows the plan before they need to use it

Train all employees on how to raise issues to the assessment team

Train stakeholders on their role in case of an event

Conduct exercises to practice

Reassure customers that a plan is in place so the organization will always be there

New hire training

Exercises

Say "exercise" instead of "test"

"Test" makes people think it's pass/fail

Call exercise: activate the call tree

Verify numbers are correct

Record what percentage of contacts were unavailable

Walkthrough exercise: talk through a scenario with everyone

Make sure everyone has actually read the plan

Find weaknesses
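For the call exercise above, the Python sketch below computes what percentage of contacts could not be reached and lists who needs follow-up (for example, to correct a phone number). The input format and names are hypothetical.

```python
def call_exercise_summary(results: dict[str, bool]) -> tuple[float, list[str]]:
    """results maps each contact to whether they were reached during the exercise.

    Returns the percentage unavailable and the names to follow up with.
    """
    if not results:
        raise ValueError("no contacts were called")
    unreached = [name for name, reached in results.items() if not reached]
    return 100.0 * len(unreached) / len(results), unreached


results = {"Alice": True, "Bob": False, "Carol": True, "Dave": True}
pct, follow_up = call_exercise_summary(results)
print(f"{pct:.0f}% unavailable; follow up with {follow_up}")  # 25% unavailable; follow up with ['Bob']
```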

Exercises

Simulation

Never create a disaster by testing for one

Validate alternative site readiness

Considered successful if everything worked out to get the resources needed to recover

Also successful if it didn't, since you learn what to fix

Compact exercise

Start with a call exercise and run right into a simulation

Fake injuries, pretend reporters, fire drills

Maintaining the Plan

Should be reviewed and updated regularly

Review every 3 months

Formal audit yearly

Version control

Ensures everyone is using the latest version

Keeps a history of what changed and why

Store the latest plan offsite so it's available in a real disaster
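The review cadence above (quarterly reviews, a yearly formal audit) is easy to track automatically. The Python sketch below computes the next due dates and reports which are overdue; the function names are hypothetical, and the month arithmetic clamps to the last day of shorter months.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the last day of shorter months."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_due(last_review: date, last_audit: date) -> dict[str, date]:
    """Quarterly review and yearly formal audit, per the schedule above."""
    return {
        "review": add_months(last_review, 3),
        "audit": add_months(last_audit, 12),
    }


def overdue(last_review: date, last_audit: date, today: date) -> list[str]:
    """Names of the checks whose due date has already passed."""
    return [name for name, due in next_due(last_review, last_audit).items() if today > due]


print(overdue(date(2024, 1, 1), date(2023, 6, 1), date(2024, 5, 1)))  # ['review']
```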

Disaster Recovery Program

Probably will start as a project

Projects have an end; DR must be ongoing

Transition into an ongoing process

Repeat the steps regularly

Use the program to spin off smaller projects like yearly audits and quarterly reviews

Emergency Management Organization (EMO): the department or group responsible for the program

Emergency Operations Center (EOC): provides a location and resources for recovery

Other Risk Areas

Business continuity is closely related to other areas of risk

A good DR plan doesn't matter if records management policy is so poor that offsite backups don't exist or aren't maintained

Good firewall policy doesn't help if alternate site has so little physical security that people could enter it and access the data directly

Need to address all risk areas for complete coverage

Education
