Business Continuity and Disaster Recovery Planning
Domain 8, CISSP Official CBK, 3rd Edition, Pages 1092-1155
Tim Jensen
Jem Jensen
StaridLabs
Key Terms
Business Continuity Planning (BCP)
Disaster Recovery Planning (DRP)
Project Initiation and Management
First steps to building a BCP:
Obtain senior management support to go forward with the project
Define a project scope, the objectives to be achieved, and the planning assumptions
Estimate the project resources needed to be successful, both human resources and financial resources.
Define a timeline and major deliverables of the project
Assign a project manager for the initial creation of a BCP and DRP
Senior Leadership Support
Senior Leadership's major goals:
Execute the mission
Protect the organization
Risks you can point out to get buy-in:
Financial
Reputational
Regulatory (lawsuits)
Senior Management could be held liable for not using due care to protect the corporation.
BCPs and DRPs can take a year or more to complete, so management support is critical to keep the process from being postponed halfway through.
Financial Risks
Can be quantified
Determines amount to spend on the recovery program
P * M = C
Probability of harm (P): How likely is a damaging event to occur?
Magnitude of harm (M): What is the financial damage for a single event?
Cost of prevention (C): The cost of putting in place a countermeasure. The cost of the countermeasure should not be more than the expected cost of the event (P * M).
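As a rough illustration of the P * M = C rule, the sketch below computes the expected cost of an event and checks a proposed countermeasure against it. The function names and the dollar figures are hypothetical, chosen only to show the arithmetic.

```python
def prevention_budget(probability: float, magnitude: float) -> float:
    """Return P * M: the most that should be spent on a countermeasure (C)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if magnitude < 0:
        raise ValueError("magnitude must not be negative")
    return probability * magnitude


def countermeasure_is_justified(cost: float, probability: float, magnitude: float) -> bool:
    """A countermeasure is justified when it costs no more than P * M."""
    return cost <= prevention_budget(probability, magnitude)


# Hypothetical figures: a 25% chance of an outage that would cost $200,000.
budget = prevention_budget(0.25, 200_000)
print(f"Spend at most ${budget:,.0f} on prevention")

# Two made-up vendor quotes checked against that ceiling.
print(countermeasure_is_justified(40_000, 0.25, 200_000))
print(countermeasure_is_justified(75_000, 0.25, 200_000))
```

In practice P and M are estimates, so the result is better read as a spending ceiling to discuss with leadership than as a precise figure.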
Additional Benefits of Planning
Locating single points of failure (SPOF)
Process Improvements
Dealing with technical incidents
Project Scope and Plan
It's very important to gain firm agreement on the scope and goals of the DRP and BCP:
Technology only, or include business processes?
Main office only or all offices?
Workforce impairment: pandemic, labor strike, transportation issues
Project manager must agree with leadership on scope, timeline, and deliverables.
Legal and Regulatory Requirements
Many industries have applicable regulations.
Recent regulations:
Implementing Recommendations of the 9/11 Commission Act of 2007 (Public Law 110-53)
Recommends that private sector organizations validate their recovery readiness by comparing their programs to an unnamed standard (NFPA 1600 has been proposed)
Endorsed by the US Government but voluntary
British Standard BS25999
The Ten Professional Practice Areas
NFPA 1600
Project Initiation and Management
Risk Evaluation and Control
Business Impact Analysis
Developing Business Continuity Strategies
Emergency Response and Operations
Developing and Implementing Business Continuity Plans
Awareness and Training Programs
Maintaining and Exercising Business Continuity Plans
Public Relations and Crisis Communications
Coordination with Public Authorities
BS25999
Extension of PAS56
Intention is to create the ability to demonstrate compliance with the standard
Stage 1: Audit including a desktop review
Must be completed before Stage 2
Stage 2: Conformance and certification audit where the planner must demonstrate implementation
If implementation fails then corrective action must be agreed upon.
If both stages complete then the organization can apply for BS25999 certification.
US Financial Regulations
The Federal Financial Institutions Examination Council (FFIEC) specifies that BCP is about maintaining, resuming, and recovering the organization, not just the technology.
The planning process must be conducted enterprise wide.
BCP and test results should be independently audited and reviewed by the board of directors.
The company should be aware of the BCP activities of its 3rd party providers, key suppliers, and organization partners.
If processes are outsourced, then the service provider's BCP must be reviewed to ensure critical services can be restored within acceptable timeframes.
Additional Regulations:
National Association of Insurance Commissioners (NAIC)
National Futures Association Compliance Rule 2-38
Electronic Funds Transfer Act
Basel Committee
Other Regulations and Standards
Australian Prudential Standard CPS 232 (July 2012)
Requires that an institution's BCM include:
BCM Policy
Business Impact Analysis (BIA) including risk assessment
Recovery objectives and strategies
Business Continuity Plan (BCP) including crisis management and recovery
Review and testing of the BCP
Training and awareness
Monetary Authority of Singapore (June 2003)
Standard for Business Continuity/Disaster Recovery Service Providers (SS507)
Singapore
Sets stringent standards for DR service providers
HIPAA
Requires data backup plan, disaster recovery plan, and emergency mode operations plan
Sarbanes Oxley Section 404
Applicable if the organization is required to file an annual report under Section 13(a) or 15(d) of the Securities Exchange Act of 1934 (15 USC 78m or 78o(d))
The report must:
State the responsibility of management for establishing and maintaining an adequate internal control structure and procedures for financial reporting
Contain an assessment, as of the end of the most recent fiscal year of the issuer, of the effectiveness of the internal control structure and procedures of the issuer
Internal Control Evaluation and Reporting
BCP and contingency planning are not considered in scope
Legal Standards
Blake vs Woodford Bank & Trust Co (1977)
Foreseeable workload: failure to prepare
Sun Cattle Company, Inc vs Miners Bank (1974)
Computer system failure: foreseeable computer failure
US vs Carroll Towing Company (1947)
Defined breach of duty of care where B < PL
B = Burden (cost) of taking precautions
P = Probability of Loss
L = Gravity of Loss
P * L must be greater than B to create a duty of due care for the defendant
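A worked instance of the Carroll Towing test may help. The figures below are hypothetical and only illustrate how the inequality is applied:

```latex
B < P \cdot L
\qquad\text{e.g. } B = \$10{,}000,\quad P = 0.05,\quad L = \$1{,}000{,}000
\;\Longrightarrow\; P \cdot L = \$50{,}000 > B
```

With these assumed numbers the precaution costs less than the expected loss, so under this test a duty of due care would exist.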
Legal Standard Continued
Negligent Standard to Plan or Prepare (pandemic), 2003
Canadian nurses filed suit saying the federal government was negligent in not preparing for the second wave after the disease was first identified.
Resource Requirements
Require plan for both staff and finances
Staff resources
Need staff from business operations and technology groups (IT).
Identify recovery priority
Identify required timeframes
Once timeframes are identified, plan staffing to meet them (for example, if 24-hour recovery will be required).
The staff planning recovery must be the same team who executes the recovery in the event of an incident.
Financial Resources
Finances may be required to:
Hire outside contractors/consultants
Travel to offsite locations
Purchase hardware, software, etc.
Emergency Notification Lists
The BCP/DRP planner should build a contact list of critical staff and leadership.
The list should include at a minimum:
Title, name, home phone, work phone, mobile phone
Tim also recommends: home address
Tim also recommends: Distribute the list and make sure everyone on the list has a physical copy offsite. Storing the list in a computer system housed onsite with no offline copies is stupid.
Vital Records
All vital records needed to rebuild the organization must be stored offsite in a secure location that can be accessed following a disaster.
This includes electronic data backups as well as paper record backups
Common Vital Records
Anything with a signature
Customer Correspondence
Customer Conversations
Accounting Records
Justification Proposals/Documents
Transcripts/minutes of meetings with legal significance
Paper with Value (Stock certificates, bonds, commercial paper)
Legal Documents (Letters of incorporation, deeds, etc)
Common Vital Records
Databases and contact lists for employees, customers, vendors, partners, etc
Business unit contingency plans
Procedure/application manuals
Backup files from production servers/applications
Reference documents used regularly
Calendar files or printouts
Source Code
Risk and Business Analysis
The planning team will make recommendations about which risks the organization should mitigate and which systems and processes the plan will recover and when.
Strategy Development
The planner will review different strategies for business recovery based on required SLA for critical systems.
Cost/Benefit analysis will be done to identify strategy viability.
Alternate Site Selection and Implementation
The planner selects and builds out alternate sites used to recover the organization/technology.
Shouldn't be susceptible to the same threats as the primary site.
Example: If Fargo is the main datacenter location, the backup site shouldn't be in Grand Forks. If one floods, the other is likely to flood at the same time.
Good resources:
www.prep4agthreats.org
www.switchlv.com/wp-content/uploads/disaster_avoidance_2013/disaster-map.html
Video Segue
Documenting the Plan
All of the information is compiled into a plan document.
Procedures are designed for each site and for each technology and/or application to be recovered.
Testing, Maintenance, and Updating
The plan must be validated by testing recovery.
A maintenance schedule must be established so the plan doesn't become obsolete.
Business Impact Analysis
The purpose of a BIA is to decide what needs to be recovered and how quickly.
Priority:
Critical
Essential
Supporting
Non-Essential
Must determine maximum tolerable downtime (MTD). Often used interchangeably with Recovery Time Objective (RTO), although strictly the RTO is the recovery target and should not exceed the MTD.
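One way to picture the output of a BIA is a list of business functions ordered by priority tier and then by MTD. The sketch below is a minimal illustration under that assumption; the class name, field names, and sample functions are hypothetical.

```python
from dataclasses import dataclass

# The four priority tiers from the slide, most urgent first.
PRIORITY_ORDER = ["Critical", "Essential", "Supporting", "Non-Essential"]


@dataclass
class BusinessFunction:
    name: str
    priority: str      # one of PRIORITY_ORDER
    mtd_hours: float   # maximum tolerable downtime, in hours


def recovery_order(functions):
    """Sort by priority tier first, then by shortest MTD within a tier."""
    return sorted(
        functions,
        key=lambda f: (PRIORITY_ORDER.index(f.priority), f.mtd_hours),
    )


# Made-up sample data for illustration only.
functions = [
    BusinessFunction("Intranet wiki", "Supporting", 72),
    BusinessFunction("Payroll", "Essential", 48),
    BusinessFunction("Order processing", "Critical", 4),
    BusinessFunction("Customer support line", "Critical", 8),
]

for f in recovery_order(functions):
    print(f"{f.priority:<14} {f.mtd_hours:>5}h  {f.name}")
```

An unknown priority label raises a ValueError from `list.index`, which is a cheap way to catch typos in the BIA data.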
Risk Assessments
Three elements of risk:
Threats
Assets
Mitigating Factors
Threats are measured as a probability. (May happen 1 in 10 years)
Most common threat is power availability.
Second most common is a water event: flooding, plumbing leak, broken pipe, leaky roof, water main break.
Other common threats: severe weather, cable cuts, fires, labor disputes, transportation mishaps, hardware failures.
Internal Threats
Equipment fails prematurely:
Improper installation
Improper environment
Equipment fails due to wear and tear:
Most equipment has a mean time between failures (MTBF) rating.
Running equipment beyond its MTBF risks failure.
Assets
If the organization doesn't own anything then it won't be concerned about risks because it has little or nothing to lose. (Gotta love IT Security consulting!!!)
Assets include:
Information
Financial
Physical
Human
Mitigating Factors
Controls or safeguards that will be put in place to reduce the impact of a threat.
For example, UPS devices can save production systems from hard crashes, which could lead to data loss and long recovery times.
When a risk is identified the planner must accept the risk, transfer the risk, avoid the risk, or mitigate the risk.
Mitigation Strategies
Accept: The risk is so unlikely to occur or the impact is so small that it would cost more to mitigate.
Transfer: Insurance
Avoidance: Have compensating controls so the risk is completely removed. For example, have 2 call centers in very different climates. In the event of inclement weather at one, the other is still operational.
Mitigation: Controls implemented to avoid the risk or to lessen the impact.
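Since every identified risk has to end up accepted, transferred, avoided, or mitigated, a simple risk register can flag anything still waiting for a decision. This is a minimal sketch; the `Risk` class, the `untreated` helper, and the sample entries are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Treatment(Enum):
    ACCEPT = "accept"
    TRANSFER = "transfer"
    AVOID = "avoid"
    MITIGATE = "mitigate"


@dataclass
class Risk:
    description: str
    treatment: Optional[Treatment] = None  # None means no decision yet
    note: str = ""


def untreated(risks):
    """Return the risks that still need one of the four treatment decisions."""
    return [r for r in risks if r.treatment is None]


# Made-up register entries for illustration only.
register = [
    Risk("Power loss at main data center", Treatment.MITIGATE, "UPS and generator"),
    Risk("Building fire", Treatment.TRANSFER, "Insurance"),
    Risk("Meteor strike", Treatment.ACCEPT, "Too unlikely to justify spending"),
    Risk("Plumbing leak above server room"),
]

for risk in untreated(register):
    print(f"Needs a decision: {risk.description}")
```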
Define Recovery Objectives
Identify all the resources necessary to perform each recovery function
Recovery Strategy
Strategies are driven by the recovery timeframes
Surviving Site Strategy
A surviving site strategy is implemented so that while service levels may drop, a function never ceases to be performed because it operates in at least two geographically dispersed buildings that are fully equipped and staffed.
Self Service Strategy
An organization can transfer work to another of its own locations, which has available facilities and/or staff to manage the time sensitive workload until the interruption is over.
Internal Arrangement Strategy
Training rooms, cafeterias, and conference rooms may be equipped to support organizational functions while staff from the impacted site travel to another site and resume operations.
Mutual Aid Agreement Strategies
Other similar organizations may be able to accommodate those affected.
Dedicated Alternate Site Strategy
Built by the company to accommodate organization function or technology recovery.
Work from Home Strategy
Workers can remote in
External Supplier Strategy
Pay an external company for disaster recovery.
These companies provide data centers, alternate site spaces, mobile units, and temporary staff.
Backup Storage Strategy
Data should be backed up one or more times a day and a copy sent offsite.
The offsite storage should be far enough away from your primary site to be safe and close enough to your recovery site to allow timely recovery operations to start.
Systems should be prioritized to make sure resources are available for the most critical systems and data.
A full backup is normally taken and then incremental backups occur every few hours or every day.
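With a full-plus-incremental scheme, a restore needs the most recent full backup and every incremental taken after it. The sketch below picks that chain for a given point in time. It is an illustration only: the `Backup` class and `restore_chain` function are hypothetical and not tied to any backup product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    kind: str  # "full" or "incremental"


def restore_chain(backups, target):
    """Return the backups to apply, in order, to restore to `target`.

    That is the latest full backup at or before `target`, followed by
    every incremental taken after it and at or before `target`.
    """
    eligible = sorted(
        (b for b in backups if b.taken_at <= target),
        key=lambda b: b.taken_at,
    )
    last_full = None
    for index, backup in enumerate(eligible):
        if backup.kind == "full":
            last_full = index
    if last_full is None:
        raise ValueError("no full backup at or before the target time")
    return eligible[last_full:]


# Made-up schedule: nightly full, incrementals every six hours.
backups = [
    Backup(datetime(2024, 1, 1, 0, 0), "full"),
    Backup(datetime(2024, 1, 1, 6, 0), "incremental"),
    Backup(datetime(2024, 1, 1, 12, 0), "incremental"),
    Backup(datetime(2024, 1, 2, 0, 0), "full"),
    Backup(datetime(2024, 1, 2, 6, 0), "incremental"),
]

for b in restore_chain(backups, datetime(2024, 1, 1, 13, 0)):
    print(b.taken_at.isoformat(), b.kind)
```

The longer the chain, the longer the restore, which is one reason recovery timeframes should drive how often full backups are taken.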
Recovery Site Strategies
Dual Data Center
Applications are load balanced or hot swapped between two data centers so downtime is minimized.
Each data center should be able to operate at full load.
Internal Hot Site
Site is standby ready with all technology and equipment necessary already in place.
Often used as dev/test until recovery is needed, at which time dev/test is removed and production is implemented.
Should be exactly the same hardware, software, etc.
External Hot Site
Equipment is installed and waiting, but the environment must be rebuilt for recovery.
Often contracted through a recovery service provider.
Equipment and software should be kept as close to identical as possible to speed recovery.
Warm Site
A leased or rented facility which is partially configured with some equipment, but not the actual computers.
Generally has cooling, cabling, and networking in place.
Servers are delivered to the site at the time of the disaster.
Cold Site
Empty data center space with no technology.
All technology must be acquired at the time of a disaster.
Mobile Sites
A mobile trailer or sea cargo container with a data center inside, which can be dropped off and hooked up, ready to go.
Processing Agreements
Organizations can contract with other organizations for data processing.
Reciprocal Agreements
Similar organizations can share the risk of an outage by hosting the data and processing of the other organization in the event of a disaster.
Has a lot of contractual, legal, and compliance issues depending on what data you process.
Outsourcing
Business processes can be outsourced entirely.
Multiple Processing Sites
Multiple sites inside the organization can be used for processing.
Useful if the company is spread throughout the country or world.
Runs into bandwidth and latency issues.
Disaster Recovery Process
When things are going bad, people get stressed and make bad decisions
Document the plan!
Clear instructions on who will do what and when
Consistent regardless of the event
Define communication strategy
Distribute to everyone who has a role in recovery
Test/verify the plan
Disaster Recovery Process
Response
Assessment team: evaluates the event and escalates to the appropriate people if needed
Escalation team: contacted by assessment team
Consists of event owner, responders, stakeholders
Emergency notification lists
Response teams must be reachable 24x7
Must be reachable by everyone in the organization
Should be used for every event, from plumbing leaks to Godzilla attacks
Disaster Recovery Process
Emergency Management Team
Provide management (short-term tactical command)
Assess damage, keep executives in the loop, initiate and organize response
Executive Team
Senior executives
Respond to issues that need direction
Handle PR
Provide leadership (long-term strategic direction)
Disaster Recovery Process
Emergency Response Team
Execute the recovery process
Retrieve recovery info (potentially offsite)
Communicate with command center
Work with alternate site personnel
Identify/Install replacement equipment or software
Command Center
Should have copy of the plan so they can ensure it is being followed correctly
Should keep track of what's being done and costs
Disaster Recovery Process
Communications
Important to keep everyone informed
Emergency notification list
Team members/managers who disseminate notifications
Contingency line (e.g. phone number printed on badges)
Single number to call to get the latest info
PR
Important that everyone tells the same story
Keep things short and honest
Multiple communication channels
Could actually reduce confusion (techs on their own conference bridge since some jargon sounds scary)
Assessment
Process to rate severity of events
Tiered categories like:
Non-incident: limited or no disruption
Incident: causes downtime for a facility or service
Trigger disaster recovery plan, report to senior management
Severe incident: significant destruction or disruption
Trigger DR, contact senior management and crisis management
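The tiers above can be sketched as a small lookup from severity to the actions each one triggers. The two yes/no questions used to pick a tier are a simplifying assumption for illustration. A real assessment process would use the organization's own criteria.

```python
from enum import Enum


class Severity(Enum):
    NON_INCIDENT = "non-incident"
    INCIDENT = "incident"
    SEVERE_INCIDENT = "severe incident"


# Actions per tier, taken from the categories listed above.
ACTIONS = {
    Severity.NON_INCIDENT: [],
    Severity.INCIDENT: [
        "trigger disaster recovery plan",
        "report to senior management",
    ],
    Severity.SEVERE_INCIDENT: [
        "trigger disaster recovery plan",
        "contact senior management",
        "contact crisis management",
    ],
}


def assess(causes_downtime: bool, significant_destruction: bool) -> Severity:
    """Rate an event using two simplified yes/no questions (an assumption)."""
    if significant_destruction:
        return Severity.SEVERE_INCIDENT
    if causes_downtime:
        return Severity.INCIDENT
    return Severity.NON_INCIDENT


# Example: a plumbing leak that takes one service offline.
severity = assess(causes_downtime=True, significant_destruction=False)
print(severity.value, ACTIONS[severity])
```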
Restoration
Planned event after recovery
Interim plans (example)
Part of DR plan was to set up alternative site
Work from alt site until original site is restored
Slowly transition back to original site
After everything is back at the original site, dismantle the alternate site
Training
Awareness program
Make sure everyone knows the plan before they need to use it
Train all employees on how to raise issues to the evaluation team
Train stakeholders on their role in case of an event
Conduct exercises to practice
Reassure customers that a plan is in place so the organization will always be there
New hire training
Exercises
Exercise instead of test
"Test" makes people think it's pass/fail
Call exercise: activate the call tree
Verify numbers are correct
What percentage were unavailable?
Walkthrough exercise
Talk through a scenario with everyone
Make sure everyone has actually read the plan
Find weaknesses
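The two questions a call exercise answers (are the numbers correct, and what percentage of people were unavailable) can be tallied with a few lines of code. This is a minimal sketch. The outcome labels and contact names are hypothetical.

```python
from collections import Counter


def summarize_call_exercise(results):
    """Summarize a call tree exercise.

    `results` maps each contact name to one of the (assumed) outcome
    labels "reached", "unavailable", or "wrong_number".
    """
    if not results:
        raise ValueError("no call results recorded")
    counts = Counter(results.values())
    total = len(results)
    return {
        "percent_unavailable": 100.0 * counts["unavailable"] / total,
        "numbers_to_correct": sorted(
            name for name, outcome in results.items() if outcome == "wrong_number"
        ),
    }


# Made-up results for illustration only.
results = {
    "Facilities manager": "reached",
    "IT director": "unavailable",
    "HR lead": "wrong_number",
    "Operations VP": "reached",
}

summary = summarize_call_exercise(results)
print(f"{summary['percent_unavailable']:.0f}% unavailable")
print("Fix numbers for:", ", ".join(summary["numbers_to_correct"]))
```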
Exercises
Simulation
Never create a disaster by testing for one
Validate alternative site readiness
Considered successful if everything worked out to get the resources needed to recover
Also successful if it didn't since you learn what to fix
Compact exercise
Start with call exercise and run right into a simulation
Fake injuries, pretend reporters, fire drills
Maintaining the Plan
Should be reviewed regularly and updated
Review every 3 months
Formal audit yearly
Version control
Ensures everyone is using the latest version
Keeps a history of what changed and why
Store the latest plan offsite so it's available in a real disaster
Disaster Recovery Program
Probably will start as a project
Projects have an end; DR must be ongoing
Transition into an ongoing process
Repeat the steps regularly
Use the program to spin off smaller projects like yearly audits and quarterly reviews
Emergency Management Organization (EMO): Department or group responsible
Emergency Operations Center (EOC): Provides a location and resources for recovery
Other Risk Areas
Business continuity is closely related to other areas of risk
A good DR plan doesn't matter if records management policy is so poor that offsite backups don't exist or aren't maintained
Good firewall policy doesn't help if alternate site has so little physical security that people could enter it and access the data directly
Need to address all risk areas for complete coverage